Riemann Integration
The Riemann Integral is the standard "grid-based" approach to integration taught in introductory calculus.
It relies on partitioning the domain (\(x\)-axis) into increasingly fine intervals to approximate the area under a curve.
While intuitive, this method is fundamentally tied to the geometry of intervals, which limits its ability to handle functions defined on more
complex sets or those with highly irregular structures. Understanding these limitations is the primary motivation for
moving toward Measure Theory.
To begin our formalization, we must first define how we "slice" the input space. This is achieved through the concept of a
partition, which divides a closed interval into finitely many subintervals that overlap only at their endpoints.
Definition: Partition and Subintervals
Let \(f: [a, b] \to \mathbb{R} \) be a bounded function on a closed interval \([a, b]\).
A partition \(\mathcal{P}\) of \([a, b]\) is a finite sequence of points:
\[
\mathcal{P} = \{x_0, x_1, x_2, \cdots, x_n\}
\]
where \(a = x_0 < x_1 < x_2 < \cdots < x_n = b\). The resulting subintervals
are \([x_{i-1}, x_i]\) for \(i = 1, \cdots, n\).
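The definition can be sketched in a few lines of Python. The helper names `uniform_partition`, `is_partition` and `subintervals` are hypothetical and do not come from the text. The sketch assumes that we represent points exactly with the standard `fractions` module, so that the strict inequalities \(x_{i-1} < x_i\) are checked without rounding error.

```python
from fractions import Fraction

def uniform_partition(a, b, n):
    """Return the n + 1 equally spaced points a = x_0 < x_1 < ... < x_n = b."""
    a, b = Fraction(a), Fraction(b)
    if n < 1 or not a < b:
        raise ValueError("need n >= 1 and a < b")
    return [a + (b - a) * i / n for i in range(n + 1)]

def is_partition(points, a, b):
    """Check the defining conditions a = x_0 < x_1 < ... < x_n = b."""
    return (len(points) >= 2
            and points[0] == a
            and points[-1] == b
            and all(p < q for p, q in zip(points, points[1:])))

def subintervals(points):
    """Return the closed subintervals [x_{i-1}, x_i], i = 1, ..., n, as pairs."""
    return list(zip(points, points[1:]))

P = uniform_partition(0, 1, 4)
print([str(x) for x in P])                 # ['0', '1/4', '1/2', '3/4', '1']
print(is_partition(P, 0, 1), len(subintervals(P)))  # True 4
```

Repeated points such as `[0, 0.5, 0.5, 1]` fail `is_partition`, because the definition requires strictly increasing points.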
Once we have partitioned the domain, we need a way to approximate the function's behavior within each
subinterval. Taking the extreme values of the function (the infimum and supremum) on each subinterval gives one rectangle
below the graph and one above it. Summing these rectangles "sandwiches" the true area.
Definition: Lower & Upper Sums
For each subinterval, let \(m_i\) be the infimum (greatest lower bound) and \(M_i\) be the
supremum (least upper bound) of \(f(x)\):
\[
\begin{align*}
m_i &= \inf_{x\in [x_{i-1}, x_i]} f(x), \\
M_i &= \sup_{x\in [x_{i-1}, x_i]} f(x).
\end{align*}
\]
Moreover, we define the Lower Sum \(L(f, \mathcal{P})\) and Upper Sum
\(U(f, \mathcal{P})\) as:
\[
\begin{align*}
L(f, \mathcal{P}) &= \sum_{i=1}^n m_i (x_i - x_{i-1}), \\
U(f, \mathcal{P}) &= \sum_{i=1}^n M_i (x_i - x_{i-1}).
\end{align*}
\]
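In general we cannot compute \(m_i\) and \(M_i\) numerically for an arbitrary bounded \(f\). The sketch below therefore assumes that \(f\) is monotone on \([a, b]\). Under that assumption, the infimum and supremum on each closed subinterval are attained at its endpoints. The function name `lower_upper_sums` is hypothetical. Exact `Fraction` arithmetic keeps the sums exact.

```python
from fractions import Fraction

def lower_upper_sums(f, points):
    """Lower and upper sums L(f, P) and U(f, P).

    Assumption: f is monotone on [a, b], so on each [x_{i-1}, x_i]
    the infimum and supremum are attained at the two endpoints.
    """
    lower = upper = 0
    for left, right in zip(points, points[1:]):
        m_i = min(f(left), f(right))   # infimum on the subinterval
        M_i = max(f(left), f(right))   # supremum on the subinterval
        width = right - left
        lower += m_i * width
        upper += M_i * width
    return lower, upper

# f(x) = x^2 is increasing on [0, 1]; uniform partition with n = 4.
points = [Fraction(i, 4) for i in range(5)]
L, U = lower_upper_sums(lambda x: x * x, points)
print(L, U)   # 7/32 15/32
```

The true area \(\int_0^1 x^2\,dx = 1/3\) lies between these two values, as the sandwich picture predicts.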
Integrability is achieved when these upper and lower bounds converge to the same value as we refine our partition.
This convergence implies that the "gap" between our overestimation and underestimation vanishes, leaving us with a
unique value for the area.
Definition: Riemann Integrability
The lower and upper Riemann integrals of \(f\) over \([a, b]\) are defined, respectively, as
\[
\begin{align*}
&\underline{\int_a^b} f(x)dx = \sup_{\mathcal{P}} L(f, \mathcal{P}), \\
&\overline{\int_a^b} f(x)dx = \inf_{\mathcal{P}} U(f, \mathcal{P}).
\end{align*}
\]
A function \(f\) is said to be Riemann integrable on \([a, b]\) if its lower and upper integrals coincide, that is, if for some real number \(\alpha\)
\[
\underline{\int_a^b} f(x)dx = \overline{\int_a^b} f(x)dx = \alpha.
\]
When this holds, the common value \(\alpha\) is called the Riemann integral of \(f\) over \([a, b]\):
\[
\int_a^b f(x)dx = \alpha.
\]
In practice, we often describe this refinement process using the norm of the partition, denoted
\(\| \mathcal{P} \|\), which is the width of the largest subinterval. For an integrable function, the lower and upper
sums converge to the integral as this width approaches zero.
Theorem: Convergence via Partition Norm
Let \( \| \mathcal{P} \| = \max_{1 \leq i \leq n} (x_i - x_{i-1}) \). A bounded function \(f\) is Riemann integrable
on \([a, b]\) if and only if:
\[
\lim_{ \| \mathcal{P} \| \to 0} L(f, \mathcal{P}) = \lim_{ \| \mathcal{P} \| \to 0} U(f, \mathcal{P}) = \alpha.
\]
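We can check the theorem numerically with a sketch on deliberately non-uniform, randomly generated partitions. The helpers `random_partition` and `norm` are hypothetical. We again assume a monotone \(f\) (here \(x^2\) on \([0, 1]\)) so that the infimum and supremum sit at the endpoints. For a nondecreasing \(f\), the differences telescope, which gives \(U - L \le (f(b) - f(a))\,\| \mathcal{P} \|\). Both sums should therefore approach \(1/3\) as the norm shrinks.

```python
import random

def square(x):
    return x * x

def lower_upper_sums(f, points):
    # Monotone assumption: inf/sup of f on [l, r] are min/max of f(l), f(r).
    pairs = list(zip(points, points[1:]))
    lower = sum(min(f(l), f(r)) * (r - l) for l, r in pairs)
    upper = sum(max(f(l), f(r)) * (r - l) for l, r in pairs)
    return lower, upper

def random_partition(a, b, n, rng):
    """Endpoints a, b plus n - 1 sorted random interior points (distinct with probability 1)."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    return [a] + interior + [b]

def norm(points):
    """||P||: the width of the largest subinterval."""
    return max(r - l for l, r in zip(points, points[1:]))

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 1000, 10000):
    P = random_partition(0.0, 1.0, n, rng)
    L, U = lower_upper_sums(square, P)
    print(f"n={n:6d}  ||P||={norm(P):.5f}  L={L:.6f}  U={U:.6f}")
```

Because the points are random, the norm decreases irregularly. The bound \(U - L \le \| \mathcal{P} \|\) (here \(f(1) - f(0) = 1\)) nevertheless holds on every run.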
Understanding when a function is integrable is key. While some properties are strict requirements (necessary conditions),
others provide a guarantee that the integral exists (sufficient conditions).
Requirements and Sufficient Conditions
- Necessary Condition (Boundedness):
If \(f\) is Riemann integrable, it must be bounded. An unbounded function cannot be Riemann integrable in the classical sense.
- Sufficient Condition (Continuity):
If \(f\) is continuous on \([a, b]\), it is guaranteed to be Riemann integrable.
- Sufficient Condition (Monotonicity):
If \(f\) is monotonic on \([a, b]\), it is guaranteed to be Riemann integrable, even if it has infinitely many points of discontinuity (a monotone function can have at most countably many).
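The monotonicity condition can be illustrated with a short sketch. It uses a hypothetical step function \(f(x) = \lfloor 10x \rfloor / 10\), which is nondecreasing but jumps at \(x = 0.1, 0.2, \ldots, 1\). For a nondecreasing \(f\) on a uniform partition with \(n\) subintervals, the infimum on each piece is the left-endpoint value and the supremum is the right-endpoint value. The sum then telescopes to \(U - L = (f(b) - f(a))/n\), no matter how many jumps \(f\) has. The helper names are assumptions, and `Fraction` keeps the comparison exact.

```python
import math
from fractions import Fraction

def step(x):
    """Nondecreasing step function floor(10x)/10, discontinuous at 0.1, ..., 0.9 and 1."""
    return Fraction(math.floor(10 * x), 10)

def lower_upper_nondecreasing(f, a, b, n):
    """L and U on a uniform partition of [a, b], assuming f is nondecreasing:
    inf = f(left endpoint) and sup = f(right endpoint) on each subinterval."""
    h = Fraction(b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(f(left) * h for left in xs[:-1])
    upper = sum(f(right) * h for right in xs[1:])
    return lower, upper

for n in (10, 100, 1000):
    L, U = lower_upper_nondecreasing(step, 0, 1, n)
    print(n, L, U, U - L)   # U - L equals (f(1) - f(0)) / n = 1/n
```

Because the gap shrinks like \(1/n\), the lower and upper integrals coincide. Their common value here is \(\sum_{k=0}^{9} \frac{k}{10}\cdot\frac{1}{10} = \frac{9}{20}\).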
However, the Riemann integral has its limits, in particular when the function is unbounded or the interval of
integration is unbounded. To address these cases, we extend our toolkit to include improper integrals.
Improper Riemann Integration
In many cases, especially in statistics, a function may fail to be Riemann integrable due to unbounded intervals or
singularities. However, such functions can often still be integrated using the improper Riemann integral,
which extends the classical Riemann integral by incorporating limits to handle these issues. Fortunately, the improper
Riemann integral covers a wide range of integrals commonly encountered in statistics and machine learning.
Case 1: One side of the interval is unbounded
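Let \(f: [a, \infty) \to \mathbb{R}\) be Riemann integrable on \([a, b]\) for every \(b > a\). The improper integral of \(f\) over \([a, \infty)\) is defined as
\[
\int_a^{\infty} f(x)dx = \lim_{b \to \infty} \int_a^b f(x)dx,
\]
provided the limit exists and is finite; in that case \(f\) is said to be improperly integrable. Similarly, for \(f: (-\infty, b] \to \mathbb{R}\) that is Riemann integrable on every \([a, b]\) with \(a < b\),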
\[
\int_{-\infty}^b f(x)dx = \lim_{a \to -\infty} \int_a^b f(x)dx.
\]
Example:
\[
\begin{align*}
\int_0^{\infty} e^{-x} dx &= \lim_{b \to \infty} \int_0^b e^{-x} dx \\
&= \lim_{b \to \infty} \left[-e^{-x}\right]_0^b \\
&= \lim_{b \to \infty} \left(-e^{-b} - (-e^{0})\right) \\
&= \lim_{b \to \infty} \left(-e^{-b} + 1\right) \\
&= 0 + 1 = 1.
\end{align*}
\]
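The limit in this example can be seen numerically with a sketch. For each finite \(b\), the proper integral over \([0, b]\) is approximated by a midpoint Riemann sum. The helper `midpoint_sum` and the choice of 10,000 subintervals are assumptions for illustration. The approximations are compared against the closed form \(1 - e^{-b}\) derived above. Truncating at any finite \(b\) only approximates the value; the improper integral itself is defined by the limit.

```python
import math

def f(x):
    return math.exp(-x)

def midpoint_sum(f, a, b, n):
    """Riemann sum on a uniform partition of [a, b], sampling each subinterval at its midpoint."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

for b in (1, 5, 10, 20):
    approx = midpoint_sum(f, 0.0, b, 10_000)
    exact = 1 - math.exp(-b)   # closed form of the proper integral over [0, b]
    print(f"b={b:2d}  midpoint={approx:.8f}  1 - e^(-b)={exact:.8f}")
```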
Case 2: Both sides of the interval are unbounded
For a function \(f: \mathbb{R} \to \mathbb{R}\), we define the integral over the entire real line by splitting it at any real number \(c\) (commonly \(c = 0\)):
\[
\int_{-\infty}^{\infty} f(x)dx = \int_{-\infty}^c f(x)dx + \int_c^{\infty} f(x)dx.
\]
More formally, this is defined using two independent limits:
\[
\int_{-\infty}^{\infty} f(x)dx = \lim_{a \to -\infty} \int_a^c f(x)dx + \lim_{b \to \infty} \int_c^b f(x)dx.
\]
The integral exists (converges) if and only if both limits exist and are finite, each taken independently of the other. When it converges, its value does not depend on the choice of \(c\).
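The two limits can be illustrated with a sketch using \(f(x) = \frac{1}{1+x^2}\). This function is not from the text; we chose it because its antiderivative is \(\arctan x\) and its integral over \(\mathbb{R}\) is known to be \(\pi\). The sketch sends \(a\) and \(b\) to infinity at unrelated rates. Each one-sided piece still settles at \(\pi/2\), and moving the split point \(c\) leaves the total unchanged.

```python
import math

def F(x):
    """Antiderivative of 1 / (1 + x^2)."""
    return math.atan(x)

c = 0.0
for a, b in [(-10, 1_000), (-1_000, 10), (-1e6, 1e3), (-1e3, 1e6)]:
    left = F(c) - F(a)    # integral over [a, c]
    right = F(b) - F(c)   # integral over [c, b]
    print(f"a={a:>10}  b={b:>10}  left={left:.6f}  right={right:.6f}  total={left + right:.6f}")
```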
Example: Divergence and Cauchy Principal Value
A classic convergent example is the Gaussian integral:
\[
\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}.
\]
Important Caution: The improper integral is not defined as \(\lim_{R \to \infty} \int_{-R}^{R} f(x) dx\). That symmetric limit is known as the
Cauchy Principal Value, which can exist even when the improper integral does not. Consider \(f(x) = \frac{x}{1+x^2}\):
\[
\begin{align*}
\int_{-\infty}^{\infty} \frac{x}{1+x^2}dx &= \lim_{a \to -\infty} \int_a^0 \frac{x}{1+x^2}dx + \lim_{b \to \infty} \int_0^b \frac{x}{1+x^2}dx \\
&= \lim_{a \to -\infty} \left[ \frac{1}{2}\ln(1+x^2) \right]_a^0 + \lim_{b \to \infty} \left[ \frac{1}{2}\ln(1+x^2) \right]_0^b \\
&= \lim_{a \to -\infty} \left( -\frac{1}{2}\ln(1+a^2) \right) + \lim_{b \to \infty} \left( \frac{1}{2}\ln(1+b^2) \right)
\end{align*}
\]
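Neither limit is finite: the first tends to \(-\infty\) and the second to \(+\infty\). Combining them would give the indeterminate form \(-\infty + \infty\), so by definition the improper integral diverges. Since the standard Cauchy density is \(\frac{1}{\pi(1+x^2)}\), its mean would be \(\frac{1}{\pi}\) times this integral. This is why the Cauchy distribution has no defined mean.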
Note that the Cauchy principal value \(\lim_{R \to \infty} \int_{-R}^{R} \frac{x}{1+x^2}dx = 0\) by symmetry, which can be misleading if the independent nature of the limits is ignored.
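A sketch can show why the symmetric limit is misleading. It evaluates the antiderivative \(\frac{1}{2}\ln(1+x^2)\) from the derivation above over a few truncated ranges. The symmetric range \([-R, R]\) always gives exactly 0. The skewed range \([-R, 2R]\) tends to \(\frac{1}{2}\ln 4 = \ln 2\) instead. Meanwhile, the one-sided piece over \([0, R]\) grows without bound. The particular ranges are illustrative choices, not from the text.

```python
import math

def G(x):
    """Antiderivative of x / (1 + x^2): (1/2) ln(1 + x^2)."""
    return 0.5 * math.log1p(x * x)

for R in (1e1, 1e3, 1e5):
    symmetric = G(R) - G(-R)      # integral over [-R, R]: exactly 0 (odd integrand)
    skewed = G(2 * R) - G(-R)     # integral over [-R, 2R]: tends to ln 2
    right_tail = G(R) - G(0)      # integral over [0, R]: grows like ln R
    print(f"R={R:.0e}  [-R,R]={symmetric:.6f}  [-R,2R]={skewed:.6f}  [0,R]={right_tail:.3f}")
```

Different ways of sending the endpoints to infinity give different answers. This dependence is exactly what the definition with two independent limits rules out.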
Case 3: Function has a singularity (becomes unbounded) on a finite interval
The improper integral also extends to functions that are not defined or become unbounded at certain points within a finite interval.
It is important to note that this does not require the function to be continuous; it only requires the function to be
Riemann integrable on every closed subinterval that excludes the singularity.
If \(f\) has a singularity at a point \(c \in (a, b)\), the improper integral is defined by approaching \(c\) from both sides:
\[
\int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_a^{c-\epsilon} f(x)dx + \lim_{\delta \to 0^+} \int_{c+\delta}^b f(x)dx
\]
The integral converges only if both independent limits exist and are finite.
If the singularity occurs at an endpoint, we define:
\[
\int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_{a+\epsilon}^b f(x)dx
\]
if \(f\) has a singularity at \(a\), or
\[
\int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_a^{b-\epsilon} f(x)dx
\]
if \(f\) has a singularity at \(b\).
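As a sketch of the interior case, consider the function \(f(x) = |x|^{-1/2}\) on \([-1, 1]\), which is not from the text. It is unbounded near \(c = 0\). From the antiderivatives \(-2\sqrt{-x}\) for \(x < 0\) and \(2\sqrt{x}\) for \(x > 0\), the two pieces are \(2 - 2\sqrt{\epsilon}\) and \(2 - 2\sqrt{\delta}\), so the improper integral equals 4. The code lets \(\epsilon\) and \(\delta\) shrink at different rates. It also checks with a midpoint Riemann sum (the helper `midpoint_sum` is hypothetical) that \(f\) is Riemann integrable on a closed subinterval that avoids 0.

```python
import math

def f(x):
    """|x|^(-1/2): unbounded near the interior point c = 0 of [-1, 1]."""
    return abs(x) ** -0.5

def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum on a uniform partition of the closed interval [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def left_piece(eps):
    """Closed form of the integral of f over [-1, -eps]."""
    return 2 - 2 * math.sqrt(eps)

def right_piece(delta):
    """Closed form of the integral of f over [delta, 1]."""
    return 2 - 2 * math.sqrt(delta)

# eps and delta shrink independently, here at different rates.
for eps, delta in [(1e-2, 1e-4), (1e-4, 1e-8), (1e-6, 1e-12)]:
    print(f"eps={eps:.0e}  delta={delta:.0e}  total={left_piece(eps) + right_piece(delta):.6f}")

# On [0.01, 1], which excludes the singularity, an ordinary Riemann sum works.
print(midpoint_sum(f, 1e-2, 1.0, 100_000), right_piece(1e-2))
```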
Example: Singularity at an Endpoint
Consider the function \(f(x) = \frac{1}{\sqrt{x}}\) on the interval \([0, 1]\). The function is unbounded at \(x = 0\):
\[
\begin{align*}
\int_0^1 \frac{1}{\sqrt{x}}dx &= \lim_{\epsilon \to 0^+} \int_{\epsilon}^1 x^{-1/2} dx \\
&= \lim_{\epsilon \to 0^+} \left[2\sqrt{x}\right]_{\epsilon}^1 \\
&= \lim_{\epsilon \to 0^+} \left(2\sqrt{1} - 2\sqrt{\epsilon}\right) \\
&= 2 - 0 = 2.
\end{align*}
\]
Even though the function "shoots up" to infinity at the origin, the area under the curve remains finite.
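To see that finiteness is not automatic, a sketch can put the pieces \(\int_\epsilon^1 x^{-1/2}\,dx = 2 - 2\sqrt{\epsilon}\) from the example next to \(\int_\epsilon^1 x^{-1}\,dx = -\ln \epsilon\). The comparison function \(1/x\) is our own addition, not from the text. Both integrands blow up at the origin. Only the first limit is finite, because \(x^{-1/2}\) grows slowly enough near 0.

```python
import math

for k in (2, 4, 8, 16):
    eps = 10.0 ** -k
    sqrt_piece = 2 - 2 * math.sqrt(eps)   # integral of x^(-1/2) over [eps, 1]
    log_piece = -math.log(eps)            # integral of x^(-1) over [eps, 1]
    print(f"eps=1e-{k:<2d}  x^(-1/2): {sqrt_piece:.8f}   x^(-1): {log_piece:.3f}")
```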
Limitations of (Improper) Riemann Integration
Consider the Dirichlet function on \([0, 1]\):
\[
f(x)=
\begin{cases}
1 &\text{if } x \in \mathbb{Q} \\
0 &\text{if } x \in \mathbb{R} \setminus \mathbb{Q}
\end{cases}
\]
Since both the rationals \(\mathbb{Q}\) and the irrationals \(\mathbb{R} \setminus \mathbb{Q}\) are dense
in \([0, 1]\), every subinterval \([x_{i-1}, x_i]\) of a partition \(\mathcal{P}\), no matter how small, contains points of both.
As a result, the upper and lower sums can never agree:
- For every subinterval, \(M_i = \sup f(x) = 1\).
- For every subinterval, \(m_i = \inf f(x) = 0\).
Consequently, \(U(f, \mathcal{P}) = 1\) and \(L(f, \mathcal{P}) = 0\) for every partition. The upper Riemann integral is therefore 1 while the lower
Riemann integral is 0, so the function is not Riemann integrable.
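The same failure shows up in Riemann sums that sample one point (a "tag") in each subinterval. Choosing rational tags gives 1, and choosing irrational tags gives 0, for every partition. Floating-point numbers cannot represent irrationals, so this sketch works under a modeling assumption: points are exact numbers of the form \(p + q\sqrt{2}\) with \(p, q\) rational. Such a number is irrational exactly when \(q \neq 0\). The class `QSqrt2` and the helper `tagged_sum` are hypothetical. The irrational tag \(x_{i-1} + h(\sqrt{2} - 1)\) lies inside each subinterval because \(0 < \sqrt{2} - 1 < 1\).

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    """Exact real number p + q*sqrt(2) with p, q rational (modeling assumption)."""
    p: Fraction
    q: Fraction = Fraction(0)

def dirichlet(x: QSqrt2) -> int:
    # sqrt(2) is irrational, so p + q*sqrt(2) is rational exactly when q == 0.
    return 1 if x.q == 0 else 0

def tagged_sum(n: int, irrational_tags: bool) -> Fraction:
    """Riemann sum of the Dirichlet function over a uniform partition of [0, 1]."""
    h = Fraction(1, n)
    total = Fraction(0)
    for i in range(1, n + 1):
        left = (i - 1) * h
        if irrational_tags:
            tag = QSqrt2(p=left - h, q=h)     # left + h*(sqrt(2) - 1), irrational
        else:
            tag = QSqrt2(p=left + h / 2)      # midpoint of the subinterval, rational
        total += dirichlet(tag) * h
    return total

for n in (10, 1000):
    print(n, tagged_sum(n, irrational_tags=False), tagged_sum(n, irrational_tags=True))  # n 1 0
```

Refining the partition never reconciles the two choices, which mirrors \(U(f, \mathcal{P}) = 1\) and \(L(f, \mathcal{P}) = 0\).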
Moreover, the Dirichlet function is discontinuous at every point of \([0, 1]\). Improper Riemann integrals can handle
isolated problem points (such as singularities) by cutting them out and taking limits, but here there is nothing to cut out:
every point of the interval is a point of discontinuity.
The failure here stems from the Riemann integral's inability to distinguish between the "size" of \(\mathbb{Q}\)
and \(\mathbb{R} \setminus \mathbb{Q}\). While both are dense, Measure Theory reveals that
\(\mathbb{Q}\) is a set of measure zero.
The Lebesgue integral will allow us to "ignore" this set of measure zero, yielding an integral of 0
for the Dirichlet function. This result matches our probabilistic intuition that the "weight" of the rational points
on the real line is negligible.
Side Note:
Definition: Dense Set
A set \(A \subset \mathbb{R}\) is said to be dense in \(\mathbb{R}\) if for any
\(x, y \in \mathbb{R}\) with \(x < y\), there exists \(a \in A\) such that \(x < a < y\).
Equivalently, every open interval in \(\mathbb{R}\) contains at least one point from \(A\).
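The density of \(\mathbb{Q}\), which drives the Dirichlet example, follows from the standard Archimedean argument. Pick \(n\) with \(n(y - x) > 1\) and set \(m = \lfloor nx \rfloor + 1\). Then \(nx < m \le nx + 1 < ny\), so \(x < m/n < y\). Below is a minimal sketch of that construction. The function name `rational_between` is hypothetical, and inputs are converted to exact `Fraction` values (floats become their exact binary values).

```python
import math
from fractions import Fraction

def rational_between(x, y):
    """Return a rational m/n with x < m/n < y, assuming x < y.

    Choosing n > 1/(y - x) makes the spacing 1/n smaller than y - x,
    so the first multiple of 1/n above x must land below y.
    """
    x, y = Fraction(x), Fraction(y)
    if not x < y:
        raise ValueError("need x < y")
    n = math.floor(1 / (y - x)) + 1   # guarantees n * (y - x) > 1
    m = math.floor(n * x) + 1         # smallest integer strictly greater than n * x
    return Fraction(m, n)

x = Fraction(1, 3)
print(rational_between(x, x + Fraction(1, 10**6)))
```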