Abstract Integration
Lebesgue integration is sometimes referred to as abstract integration.
The reason is that the Lebesgue integral is developed in the very general setting of a
measure space \((\Omega, \mathcal{F}, \mu)\).
We want to define the integral:
\[
\int_{\Omega} g(\omega) \, d\mu(\omega)
\]
of a measurable function \(g: \Omega \to \overline{\mathbb{R}}\) defined on a measure space \((\Omega, \mathcal{F}, \mu)\).
Note: \(\overline{\mathbb{R}}\) refers to the extended set of real values, which includes \(\infty\) and \(-\infty\).
Let's check some special cases:
If the measure space is a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), and
\(X: \Omega \to \overline{\mathbb{R}}\) is measurable, which means \(X\) is an extended-valued random variable,
then the integral is the expectation of \(X\):
\[
\int_{\Omega} X \, d\mathbb{P} = \mathbb{E}(X).
\]
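As a minimal sketch of this special case, assume a finite sample space (a fair six-sided die, which is an illustrative choice and not part of the text above). On a finite space the integral against \(\mathbb{P}\) reduces to a weighted sum, which the hypothetical helper `expectation` computes exactly with rational arithmetic.

```python
from fractions import Fraction

# Finite probability space: Omega = {1, ..., 6}, each outcome has mass 1/6.
prob = {w: Fraction(1, 6) for w in range(1, 7)}

def expectation(X, prob):
    """Integral of X with respect to P on a finite sample space."""
    return sum(X(w) * p for w, p in prob.items())

mean_face = expectation(lambda w: w, prob)        # E[X] for X(w) = w
mean_square = expectation(lambda w: w * w, prob)  # E[X^2]
```

Here `mean_face` is \(7/2\) and `mean_square` is \(91/6\), the familiar values for a fair die.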
If we define the measure space as \((\mathbb{R}, \mathcal{B}, \lambda)\) where \(\mathcal{B}\) is the Borel
\(\sigma\)-algebra and \(\lambda\) is the Lebesgue measure, then the integral is a generalization of the usual integral
encountered in calculus:
\[
\int_{\mathbb{R}} g \, d\lambda = \int g(x) dx.
\]
So, even if it isn't always stated explicitly, many of the integrals we have encountered - whether in the context of computing
expectations in probability theory or evaluating integrals on the real line - can be defined using Lebesgue integration.
This approach is the core of modern analysis and probability, providing a robust framework for handling functions that may
be too irregular or complex for Riemann integration.
One of the key advantages of this abstract approach is that the Lebesgue integral is insensitive to what happens on sets
of measure zero. This allows us to integrate highly discontinuous functions, such as the Dirichlet function,
which the Riemann integral fails to handle.
Before we proceed, we pin down the notion of "measurable function" that has been informally invoked above.
Only measurable functions admit a Lebesgue integral — this is the structural analogue of boundedness being necessary
for the Riemann integral.
Definition: Measurable Function
Let \((\Omega, \mathcal{F})\) be a measurable space. A function \(g: \Omega \to \overline{\mathbb{R}}\)
is said to be \(\mathcal{F}\)-measurable
(or simply measurable, when \(\mathcal{F}\) is understood) if for every \(a \in \mathbb{R}\),
\[
\{\omega \in \Omega : g(\omega) \leq a\} \in \mathcal{F}.
\]
Equivalently, \(g^{-1}(B) \in \mathcal{F}\) for every Borel set \(B \subset \overline{\mathbb{R}}\).
In the special case of a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), a measurable function
is called a random variable.
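The definition can be checked mechanically when \(\Omega\) is finite. The sketch below assumes a four-point \(\Omega\) and a \(\sigma\)-algebra generated by the partition \(\{1,2\}, \{3,4\}\); both are illustrative choices, and `is_measurable` is a hypothetical helper. On a finite space the sublevel set \(\{g \leq a\}\) only changes when \(a\) crosses a value of \(g\), so testing those thresholds is enough.

```python
def is_measurable(g, omega, sigma_algebra):
    """Check that {w : g(w) <= a} lies in the sigma-algebra for every real a.

    For finite omega it suffices to test a in the set of values of g:
    thresholds below min(g) give the empty set, which every sigma-algebra
    contains, and the sublevel set is constant between consecutive values.
    """
    for a in {g(w) for w in omega}:
        sublevel = frozenset(w for w in omega if g(w) <= a)
        if sublevel not in sigma_algebra:
            return False
    return True

omega = frozenset({1, 2, 3, 4})
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}

g = lambda w: 0 if w <= 2 else 5   # constant on each block of the partition
h = lambda w: w                    # separates the points 1 and 2

g_ok = is_measurable(g, omega, F)
h_ok = is_measurable(h, omega, F)
```

The function `g` is \(\mathcal{F}\)-measurable, while `h` is not, because \(\{h \leq 1\} = \{1\}\) is not in \(\mathcal{F}\).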
Characteristic function
Before we define the Lebesgue integral in full generality, we need a fundamental tool that allows us to
"isolate" or "focus on" specific regions of our measure space. This is where the characteristic function
comes in.
The characteristic function is deceptively simple — it's essentially a mathematical "switch" that turns a function
"on" over a specific set and "off" everywhere else. Despite its simplicity, this concept is crucial for building
up the Lebesgue integral from basic building blocks to arbitrarily complex measurable functions.
Definition: Characteristic Function (Indicator Function)
The characteristic function (or indicator function) of a set \(B\), denoted
\(\chi_B\) or \(1_B\), is defined as:
\[
\chi_B(\omega) = \begin{cases}
1 & \text{if \(\omega \in B\)}\\
0 & \text{if \(\omega \notin B\)}.
\end{cases}
\]
The integral of \(g\) over a measurable subset \(B\) is then defined by
\[
\int_B g \, d\mu = \int (\chi_B \, g)d\mu
\]
where
\[
(\chi_B \, g) (\omega) = \begin{cases}
g(\omega) & \text{if \(\omega \in B\)}\\
0 & \text{if \(\omega \notin B\)}.
\end{cases}
\]
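A small sketch of this "switch" on a finite measure space, assuming counting measure on \(\{1, \dots, 5\}\) (an illustrative choice). The helpers `indicator`, `integrate`, and `integrate_over` are hypothetical names; on a finite space the integral is a weighted sum over points.

```python
def indicator(B):
    """Characteristic function chi_B of the set B."""
    return lambda w: 1 if w in B else 0

def integrate(g, mu):
    """Integral of g on a finite space; mu maps each point w to mu({w})."""
    return sum(g(w) * m for w, m in mu.items())

def integrate_over(g, B, mu):
    """Integral of g over B, defined as the integral of chi_B * g."""
    chi_B = indicator(B)
    return integrate(lambda w: chi_B(w) * g(w), mu)

mu = {w: 1 for w in range(1, 6)}   # counting measure on {1, ..., 5}
g = lambda w: w * w

result = integrate_over(g, {2, 3}, mu)   # only the points 2 and 3 contribute
```

Here `result` equals \(4 + 9 = 13\); integrating over the empty set gives \(0\), and integrating over all of \(\Omega\) recovers the full integral.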
We will use the term "almost everywhere" (a.e.) to mean "for all \(\omega\) outside
a zero-measure subset of \(\Omega\)". So, we "ignore" a set of \(\omega\)'s that has measure zero.
For the special case of a probability measure, we also use "almost surely" (a.s.).
For example,
\[
g_n \uparrow g, \, a.e.
\]
means that the increasing monotonic convergence of \(g_n(\omega)\) to \(g(\omega)\) holds \(\forall \omega \in \Omega\)
outside a zero-measure set.
The Integral of Simple Functions
Definition: Simple Function
A function \(g: \Omega \to \mathbb{R}\) is said to be simple if it
is measurable, finite, and takes only finitely many different values.
If \(g\) is a simple function of the form:
\[
g(\omega) = \sum_{i=1}^k a_i \chi_{A_i}(\omega), \quad \forall \omega \in \Omega
\]
where \(k\) is a positive integer, \(a_i \in \mathbb{R}\), and \(A_i\) are
pairwise disjoint measurable sets, then its integral is defined by
\[
\int g \, d\mu = \sum_{i=1}^k a_i \mu(A_i).
\]
The sum on the right is independent of the choice of such disjoint-set representation (shown below),
so the integral is well-defined as a property of \(g\) alone.
Convention: We adopt the standard measure-theoretic convention that \(0 \cdot \infty = 0\).
This ensures that \(a_i \mu(A_i) = 0\) whenever \(a_i = 0\), even if \(\mu(A_i) = \infty\).
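The convention matters in code because IEEE floating point does not follow it: in Python, `0.0 * math.inf` evaluates to `nan`. The sketch below is one way to encode the simple-function integral with the convention made explicit. The function name `integrate_simple` and the input format (pairs of a coefficient and the measure of its set) are assumptions for illustration. It does not guard against sums of the form \(\infty - \infty\).

```python
import math

def integrate_simple(terms):
    """Sum of a_i * mu(A_i) with the convention 0 * inf = 0.

    terms: iterable of (a_i, mu_A_i), with a_i real and mu_A_i in [0, inf].
    """
    total = 0.0
    for a, m in terms:
        if a == 0 or m == 0:
            continue  # skip the product: plain float arithmetic would give nan
        total += a * m
    return total

# Value 0 on a set of infinite measure, 2 on a set of measure 1.5,
# and -1 on a set of measure 0.5.
value = integrate_simple([(0.0, math.inf), (2.0, 1.5), (-1.0, 0.5)])
```

Here `value` is \(2 \cdot 1.5 - 1 \cdot 0.5 = 2.5\); without the explicit skip, the first term would turn the whole sum into `nan`.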
Well-definedness.
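Suppose \(g = \sum_i a_i \chi_{A_i} = \sum_j b_j \chi_{B_j}\) are two disjoint-set representations of the same simple function.
Adding a piece with coefficient \(0\) to each representation if necessary, we may assume that both families \(\{A_i\}\) and \(\{B_j\}\) cover \(\Omega\); by the convention \(0 \cdot \infty = 0\), this changes neither sum.
On each non-empty piece \(A_i \cap B_j\) of the common refinement, the two expressions force the common value \(a_i = b_j\).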
Using finite additivity of \(\mu\) on the disjoint families \(\{A_i \cap B_j\}_j\) and \(\{A_i \cap B_j\}_i\),
and interchanging the two finite sums,
\[
\sum_i a_i \, \mu(A_i) = \sum_i \sum_j a_i \, \mu(A_i \cap B_j) = \sum_j \sum_i b_j \, \mu(A_i \cap B_j) = \sum_j b_j \, \mu(B_j).
\]
Hence the value of \(\int g \, d\mu\) does not depend on the chosen representation.
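The argument can be sanity-checked on a concrete case. The sketch below assumes a six-point space with arbitrary rational point masses (illustrative values only) and compares a coarse representation of a simple function with a finer one of the same function.

```python
from fractions import Fraction

# Point masses mu({w}) on Omega = {1, ..., 6}; the values are arbitrary.
mu_point = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6),
            4: Fraction(2), 5: Fraction(1, 4), 6: Fraction(7)}

def mu(A):
    """Measure of a subset A, by finite additivity over its points."""
    return sum(mu_point[w] for w in A)

def integrate_rep(rep):
    """Sum of a_i * mu(A_i) for a disjoint-set representation [(a_i, A_i), ...]."""
    return sum(a * mu(A) for a, A in rep)

# Two representations of the same function: 2 on {1,2,3}, 5 on {4,5}, 0 on {6}.
coarse = [(2, {1, 2, 3}), (5, {4, 5}), (0, {6})]
fine = [(2, {1}), (2, {2, 3}), (5, {4}), (5, {5}), (0, {6})]

integral_coarse = integrate_rep(coarse)
integral_fine = integrate_rep(fine)
```

Both sums equal \(2 \cdot 1 + 5 \cdot \tfrac{9}{4} = \tfrac{53}{4}\), as the proof predicts.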
Example: Dirichlet function
Finally, we can compute the integral of the Dirichlet function on the interval \([0, 1]\):
\[
f(x)=
\begin{cases}
1 &\text{if \(x \in \mathbb{Q}\)} \\
0 &\text{if \(x \in \mathbb{R} \setminus \mathbb{Q}\)}
\end{cases}
\]
The Dirichlet function is simple on \([0, 1]\): it takes only the two values \(1\) and \(0\)
on the disjoint measurable sets \([0,1] \cap \mathbb{Q}\) and \([0,1] \setminus \mathbb{Q}\),
so it fits the form \(\sum_i a_i \chi_{A_i}\) of the definition above.
We first record the measures of the two pieces:
- \(\lambda([0,1] \cap \mathbb{Q}) = 0\), since \([0,1] \cap \mathbb{Q}\) is a countable set
and countable sets have Lebesgue measure zero (established on the previous page).
- \(\lambda([0,1] \setminus \mathbb{Q}) = 1\), by finite additivity on the disjoint decomposition
\([0,1] = ([0,1] \cap \mathbb{Q}) \cup ([0,1] \setminus \mathbb{Q})\), using \(\lambda([0,1]) = 1\).
The simple-function integral then gives
\[
\begin{align*}
\int_0^1 f \, d\lambda &= 1 \cdot \lambda([0, 1] \cap \mathbb{Q}) + 0 \cdot \lambda([0, 1] \setminus \mathbb{Q}) \\\\
&= 1 \cdot 0 + 0 \cdot 1 \\\\
&= 0.
\end{align*}
\]
So, the Dirichlet function is \(0\) almost everywhere. Even though there are infinitely many rational numbers
where the function equals \(1\), their "weight" (measure) in the continuum of real numbers is zero.
Note: The same computation, together with the fact that any countable subset of \(\mathbb{R}\) has Lebesgue measure zero,
shows the Lebesgue integral of the Dirichlet function is \(0\) on every interval \([a, b]\).
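For comparison, here is a short worked computation of why the Riemann integral fails for the same function, using the standard upper and lower Darboux sums \(U(f, \mathcal{P})\) and \(L(f, \mathcal{P})\). Let \(\mathcal{P}\) be any partition of \([0, 1]\) into subintervals \(I_i\) of positive length \(|I_i|\). Every such subinterval contains both rational and irrational points, so \(\sup_{I_i} f = 1\) and \(\inf_{I_i} f = 0\), and therefore
\[
U(f, \mathcal{P}) = \sum_i 1 \cdot |I_i| = 1,
\qquad
L(f, \mathcal{P}) = \sum_i 0 \cdot |I_i| = 0.
\]
The gap \(U(f, \mathcal{P}) - L(f, \mathcal{P}) = 1\) does not shrink under refinement, so \(f\) is not Riemann integrable on \([0, 1]\), even though its Lebesgue integral is \(0\).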
Insight: "Generally Speaking"
In the real world, strict logical implications like \(P \implies Q\) are incredibly rare.
When we say "Generally, A implies B," we are often mentally suppressing thousands of lines
of "exception handling" code.
In measure theory, we can formalize this "generally" using the concept of
"almost everywhere" (a.e.). Instead of demanding that a property holds for
every single point (which is often too brittle for complex systems), we allow for a
set of exceptions, provided that their measure is zero.
As seen in the Dirichlet function example above, even though there are
infinitely many points where the function is 1, they have zero "weight" in the eyes of the
Lebesgue integral.
By using a.e., we gain a powerful way to "compress" information without losing
logical rigor. We aren't just being vague; we are mathematically proving that the exceptions don't
affect the overall "structure" or "integral" of the system.
When measure theory is applied to probability, "almost everywhere" is referred to as "almost surely" (a.s.).
This distinction is vital for understanding how we handle "impossible" events.
Consider flipping a fair coin infinitely many times. From a set-theoretic perspective, a sequence of
"all heads" (H, H, H, ...) is a "valid" element of the sample space \(\Omega\). It is logically "possible" in the sense
that the set is not empty.
However, from a measure-theoretic perspective, the probability measure assigned to this specific singleton set is
exactly zero.
This is why the Strong Law of Large Numbers states that the sample mean converges to \(1/2\) almost surely. It acknowledges that
while non-convergent sequences exist as mathematical objects, the set of all such sequences has probability zero, allowing us to treat
the convergence as a certainty for all practical purposes.
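A simulation cannot prove almost-sure convergence, since it only ever sees one finite prefix of one sample path, but it can illustrate both points made above. The sketch below assumes a pseudo-random generator with a fixed seed as a stand-in for fair coin flips; `mean_of_heads` and `prob_all_heads` are hypothetical helper names.

```python
import random

def mean_of_heads(n_flips, seed=0):
    """Fraction of heads among n_flips simulated fair coin flips."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n_flips):
        if rng.random() < 0.5:
            heads += 1
    return heads / n_flips

def prob_all_heads(n):
    """Probability that the first n flips are all heads: (1/2)^n."""
    return 0.5 ** n

# Sample means for increasing numbers of flips.
means = {n: mean_of_heads(n) for n in (100, 10_000, 200_000)}

# The event "all heads forever" is contained in "first n flips are heads"
# for every n, so its probability is at most (1/2)^n for every n.
bounds = [prob_all_heads(n) for n in (10, 50, 100)]
```

The values in `bounds` shrink toward zero, which is the measure-theoretic reason the all-heads sequence carries no weight even though it belongs to \(\Omega\).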
The Integral of Nonnegative Measurable Functions
We approximate the integral of a nonnegative function \(g\) using simple functions.
For a nonnegative, possibly extended-valued, measurable function \(g: \Omega \to [0, \infty]\), let \(S(g)\)
denote the set of all nonnegative simple functions \(q\) — measurable by definition of "simple" — that satisfy
\[
0 \leq q(\omega) \leq g(\omega) \quad \text{at every } \omega \in \Omega.
\]
Note that \(q\) itself is finite-valued, whereas \(g\) may attain the value \(+\infty\);
at points where \(g(\omega) = +\infty\), the condition \(q(\omega) \leq g(\omega)\) simply imposes no upper bound on \(q(\omega)\).
We then define
\[
\int g \, d\mu = \sup_{q \in S(g)} \int q \, d\mu,
\]
interpreted as a value in \([0, \infty]\) (the supremum is allowed to be \(+\infty\)).
When \(g\) is itself a nonnegative simple function, we have \(g \in S(g)\),
and for every other \(q \in S(g)\) the pointwise inequality \(q \leq g\) passes through the simple-function integral
(by taking a common refinement of their representations) to give \(\int q \, d\mu \leq \int g \, d\mu\).
Hence the supremum is attained by \(q = g\), and this definition agrees with the one given earlier for simple functions.
The two definitions are therefore consistent.
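To see the supremum at work, the following sketch takes \(g(x) = x^2\) on \([0, 1]\) with Lebesgue measure (an illustrative choice) and integrates one particular family of simple functions in \(S(g)\): \(q_n\) rounds \(g\) down to the nearest multiple of \(2^{-n}\). The level sets are intervals whose lengths are known in closed form, so no numerical quadrature is needed. These integrals are only lower bounds for the supremum. That they approach \(1/3\) is consistent with the calculus value but is not proved by the code.

```python
import math

def lower_simple_integral(n):
    """Integral of q_n = floor(2^n * g) / 2^n for g(x) = x^2 on [0, 1]."""
    levels = 2 ** n
    total = 0.0
    for k in range(levels):
        lo, hi = k / levels, (k + 1) / levels
        # q_n equals lo on {x in [0,1] : lo <= x^2 < hi} = [sqrt(lo), sqrt(hi)),
        # an interval of Lebesgue measure sqrt(hi) - sqrt(lo).
        total += lo * (math.sqrt(hi) - math.sqrt(lo))
    return total

approximations = [lower_simple_integral(n) for n in range(1, 11)]
```

Because \(0 \leq g - q_n < 2^{-n}\) pointwise, each entry of `approximations` lies below \(1/3\) by less than \(2^{-n}\), and the sequence increases with \(n\).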
Theorem: Zero Integral Implies Vanishing Almost Everywhere
Let \(g : \Omega \to [0, \infty]\) be nonnegative and measurable. Then
\[
\int_\Omega g \, d\mu = 0
\quad \Longleftrightarrow \quad
g = 0 \text{ almost everywhere.}
\]
Proof:
(\(\Leftarrow\)) Suppose \(g = 0\) a.e., i.e., \(\mu(N) = 0\)
for \(N = \{\omega : g(\omega) > 0\}\). For any \(q \in S(g)\), pointwise
\(0 \leq q \leq g\), so \(q(\omega) = 0\) whenever \(g(\omega) = 0\); in
particular, \(q = 0\) on \(\Omega \setminus N\). Write
\(q = \sum_i a_i \chi_{A_i}\) in its disjoint-set representation. For each
index \(i\) with \(a_i > 0\), we claim \(A_i \subseteq N\): if \(\omega \in A_i\),
disjointness of the representation forces \(q(\omega) = a_i > 0\), hence
\(g(\omega) \geq q(\omega) > 0\) and \(\omega \in N\). By monotonicity of \(\mu\),
\(\mu(A_i) \leq \mu(N) = 0\). Therefore
\(\int q \, d\mu = \sum_i a_i \mu(A_i) = 0\) for every \(q \in S(g)\), so
\(\int g \, d\mu = \sup_{q \in S(g)} \int q \, d\mu = 0\).
(\(\Rightarrow\)) Suppose \(\int g \, d\mu = 0\). For each
\(n \in \mathbb{N}\), define \(A_n = \{\omega : g(\omega) \geq 1/n\}\). Since \(g\) is measurable
and \([1/n, \infty] \subset \overline{\mathbb{R}}\) is Borel, the preimage
\(A_n = g^{-1}([1/n, \infty])\) lies in \(\mathcal{F}\). The function
\(\tfrac{1}{n}\chi_{A_n}\) is a nonnegative simple function satisfying
\(\tfrac{1}{n}\chi_{A_n}(\omega) \leq g(\omega)\) at every \(\omega\), so
\(\tfrac{1}{n}\chi_{A_n} \in S(g)\). By the integral of simple functions and
the definition of \(\int g \, d\mu\) as a supremum over \(S(g)\),
\[
\tfrac{1}{n} \mu(A_n) \;=\; \int \tfrac{1}{n}\chi_{A_n} \, d\mu
\;\leq\; \int g \, d\mu \;=\; 0.
\]
Since \(\tfrac{1}{n} \mu(A_n) \in [0, \infty]\) and is bounded above by \(0\),
we conclude \(\mu(A_n) = 0\).
Finally, \(g(\omega) > 0\) holds iff \(g(\omega) \geq 1/n\) for some
\(n \in \mathbb{N}\) (by the Archimedean property for \(g(\omega) \in (0, \infty)\),
and trivially for \(g(\omega) = \infty\)), so
\(\{\omega : g(\omega) > 0\} = \bigcup_{n=1}^\infty A_n\). This is a countable
union of measurable sets, hence measurable, and by \(\sigma\)-subadditivity of \(\mu\),
\[
\mu(\{g > 0\}) \;\leq\; \sum_{n=1}^\infty \mu(A_n) \;=\; 0.
\]
Since \(g \geq 0\), we have \(\{g \neq 0\} = \{g > 0\}\), so \(\mu(\{g \neq 0\}) = 0\),
i.e., \(g = 0\) almost everywhere.
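The key inequality in the (\(\Rightarrow\)) direction, \(\tfrac{1}{n}\mu(A_n) \leq \int g \, d\mu\), can be checked on a small example. The sketch assumes a three-point measure space with arbitrary rational weights (illustrative values only); `integral` and `measure_of_superlevel` are hypothetical helpers.

```python
from fractions import Fraction

# Point masses and function values on Omega = {"a", "b", "c"}.
mu = {"a": Fraction(3), "b": Fraction(1, 2), "c": Fraction(2)}
g = {"a": Fraction(0), "b": Fraction(1, 4), "c": Fraction(3)}

def integral(g, mu):
    """Integral of a nonnegative function on a finite measure space."""
    return sum(g[w] * mu[w] for w in mu)

def measure_of_superlevel(g, mu, n):
    """mu(A_n) for A_n = {w : g(w) >= 1/n}."""
    return sum(mu[w] for w in mu if g[w] >= Fraction(1, n))

total = integral(g, mu)
# Pairs (1/n * mu(A_n), integral of g) for n = 1, ..., 5.
checks = [(Fraction(1, n) * measure_of_superlevel(g, mu, n), total)
          for n in range(1, 6)]

# If g vanishes wherever mu has mass, the integral is zero.
g_null = {"a": Fraction(0), "b": Fraction(0), "c": Fraction(0)}
zero_total = integral(g_null, mu)
```

In every pair of `checks`, the first entry is at most the second, which is exactly the bound that forces \(\mu(A_n) = 0\) when the integral vanishes.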
The Integral of General Measurable Functions
Consider a measurable function \(g: \Omega \to \overline{\mathbb{R}}\). Let
\[
A_+ = \{\omega \mid g(\omega) > 0\}, \qquad g_+ = g \cdot \chi_{A_+}
\]
and
\[
A_- = \{\omega \mid g(\omega) < 0\}, \qquad g_- = -g \cdot \chi_{A_-}
\]
Note that \(A_+\) and \(A_-\) are measurable sets, and \(g_+\), \(g_-\) are both nonnegative
(possibly extended-valued) measurable functions.
Then we have \(g = g_+ - g_-\) and define
\[
\int g \, d\mu = \int g_+ \, d\mu - \int g_- \, d\mu
\]
if we have both \(\int g_+ \, d\mu < \infty\) and \(\int g_- \, d\mu < \infty\).
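The decomposition is easy to carry out on a finite measure space. The sketch below assumes counting measure on \(\{-3, \dots, 3\}\) and the function \(g(\omega) = \omega^2 - 5\), both chosen only for illustration. It builds \(g_+\) and \(g_-\) exactly as defined above and compares the result with direct summation.

```python
def integrate(g, mu):
    """Integral of g on a finite space; mu maps each point w to mu({w})."""
    return sum(g(w) * m for w, m in mu.items())

def positive_part(g):
    """g_+ = g * chi_{A_+}, where A_+ = {g > 0}."""
    return lambda w: g(w) if g(w) > 0 else 0

def negative_part(g):
    """g_- = -g * chi_{A_-}, where A_- = {g < 0}; note that g_- >= 0."""
    return lambda w: -g(w) if g(w) < 0 else 0

mu = {w: 1 for w in range(-3, 4)}   # counting measure on {-3, ..., 3}
g = lambda w: w * w - 5

int_pos = integrate(positive_part(g), mu)   # contributions from w = -3 and w = 3
int_neg = integrate(negative_part(g), mu)   # contributions from |w| <= 2
int_g = int_pos - int_neg
```

Here `int_pos` is \(8\), `int_neg` is \(15\), and `int_g` is \(-7\), matching the direct sum of the values of \(g\). Both parts are finite, so the subtraction is legitimate.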
Note: The definition implies there exists a function that is NOT Lebesgue integrable over some interval
— we see a concrete example below. First, however, we record how the Lebesgue integral relates to the ordinary Riemann integral
on a bounded interval.
Proposition: Riemann Integrable Implies Lebesgue Integrable
Let \(f: [a, b] \to \mathbb{R}\) be Riemann integrable on \([a, b]\). Then \(f\) is Lebesgue integrable on \([a, b]\)
(with respect to Lebesgue measure \(\lambda\)), and the two integrals agree:
\[
\int_{[a,b]} f \, d\lambda = \int_a^b f(x) \, dx.
\]
The left side is the Lebesgue integral of the measurable function \(f\) against Lebesgue measure;
the right side is the classical Riemann integral.
Sketch of proof.
Since \(f\) is Riemann integrable, it is bounded on \([a, b]\). Let \(\mathcal{P}_n\) be a sequence of partitions
with \(\|\mathcal{P}_n\| \to 0\), and set \(\mathcal{Q}_n = \mathcal{P}_1 \cup \cdots \cup \mathcal{P}_n\)
so that \(\mathcal{Q}_{n+1}\) refines \(\mathcal{Q}_n\) and \(\|\mathcal{Q}_n\| \to 0\) as well. Define the step functions
\[
u_n = \sum_i M_i^{(n)} \chi_{I_i^{(n)}}, \qquad \ell_n = \sum_i m_i^{(n)} \chi_{I_i^{(n)}}
\]
on \([a, b]\) using the sup \(M_i^{(n)}\) and inf \(m_i^{(n)}\) of \(f\) on the subintervals \(I_i^{(n)}\) of \(\mathcal{Q}_n\).
These are simple functions, and by the definition of the simple-function integral,
\[
\int_{[a,b]} \ell_n \, d\lambda = L(f, \mathcal{Q}_n), \qquad \int_{[a,b]} u_n \, d\lambda = U(f, \mathcal{Q}_n).
\]
Refinement forces \(\ell_n \uparrow \ell^*\) and \(u_n \downarrow u^*\) pointwise on \([a, b]\), with \(\ell^* \leq f \leq u^*\).
Both \(\ell^*\) and \(u^*\) are Borel-measurable as pointwise limits of simple functions.
By the monotone and dominated convergence theorems (applied respectively to \(\ell_n\) increasing and \(u_n\) decreasing,
with \(u_1 - \ell_1\) serving as a bounded integrable dominator),
\[
\int_{[a,b]} (u^* - \ell^*) \, d\lambda = \lim_{n \to \infty} \bigl( U(f, \mathcal{Q}_n) - L(f, \mathcal{Q}_n) \bigr) = 0,
\]
the last equality because \(f\) is Riemann integrable. Since \(u^* - \ell^* \geq 0\) with vanishing integral,
\(u^* = \ell^*\) almost everywhere, and therefore \(f = \ell^*\) a.e. This makes \(f\) Lebesgue-measurable: it is an a.e. limit
of Borel-measurable functions, and the completion of the Borel \(\sigma\)-algebra (discussed on the previous page)
absorbs the a.e. exception.
Finally, \(\int_{[a,b]} \ell_n \, d\lambda = L(f, \mathcal{Q}_n) \to \int_a^b f(x)\, dx\) by the definition
of the Riemann integral, while \(\int_{[a,b]} \ell_n \, d\lambda \to \int_{[a,b]} \ell^* \, d\lambda = \int_{[a,b]} f \, d\lambda\)
by monotone convergence. Equating the two limits yields the stated equality.
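The squeeze used in the sketch can be observed numerically. The code below assumes \(f(x) = x^2\) on \([0, 1]\) and nested dyadic partitions (illustrative choices). Because this \(f\) is increasing, its infimum and supremum on each subinterval sit at the left and right endpoints. The helper `darboux_sums` relies on that assumption and is not valid for non-monotone functions.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of an increasing f on [a, b] split into 2**n equal pieces.

    These are the integrals of the step functions l_n and u_n from the proof.
    """
    pieces = 2 ** n
    width = (b - a) / pieces
    lower = upper = 0.0
    for i in range(pieces):
        left = a + i * width
        right = a + (i + 1) * width
        lower += f(left) * width    # inf of an increasing f is at the left endpoint
        upper += f(right) * width   # sup of an increasing f is at the right endpoint
    return lower, upper

sums = [darboux_sums(lambda x: x * x, 0.0, 1.0, n) for n in range(1, 13)]
gaps = [upper - lower for lower, upper in sums]
```

For this \(f\), the gap \(U - L\) telescopes to \((f(1) - f(0)) \cdot 2^{-n} = 2^{-n}\), so it tends to zero while both sums bracket the common value \(1/3\).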
Note that some improperly Riemann integrable functions are not Lebesgue integrable
— the next example shows this failure explicitly.
Example: Sinc function over \([0, \infty)\)
Consider the Dirichlet integral:
\[
\int_0^{\infty} \frac{\sin x}{x} dx
\]
This is known to converge to \(\frac{\pi}{2}\):
\[
\begin{align*}
\int_0^{\infty} \frac{\sin x}{x} dx &= \lim_{b \to \infty} \int_0^b \frac{\sin x}{x} dx \\\\
&= \frac{\pi}{2}
\end{align*}
\]
So, \(f(x) = \frac{\sin x}{x}\) is improperly Riemann integrable on \([0, \infty)\).
On the other hand, in the Lebesgue sense:
\[
\int_0^{\infty} \frac{\sin x}{x} dx = \int_0^{\infty} \left(\frac{\sin x}{x}\right)_+ dx - \int_0^{\infty} \left(\frac{\sin x}{x}\right)_- dx
\]
On each interval \([2\pi n, 2\pi n + \pi]\), \(n = 0, 1, 2, \dots\), we have \(\sin x \geq 0\), while \(\sin x \leq 0\) in between, so
\[
\begin{align*}
\int_0^{\infty} \left(\frac{\sin x}{x}\right)_+ dx &= \sum_{n=0}^{\infty} \int_{2\pi n}^{2\pi n + \pi} \frac{\sin x}{x} dx \\\\
&\geq \sum_{n=0}^{\infty} \int_{2\pi n}^{2\pi n + \pi} \frac{\sin x}{2 \pi n + \pi} dx \tag{**} \\\\
&= \sum_{n=0}^{\infty} \frac{2}{\pi(2n +1)} \\\\
&= \infty
\end{align*}
\]
and similarly, using the intervals \([2\pi n + \pi,\, 2\pi n + 2\pi]\) on which \(\sin x \leq 0\),
\[
\int_0^{\infty} \left(\frac{\sin x}{x}\right)_- dx = \infty.
\]
Thus, by the definition, this is NOT integrable in the Lebesgue sense.
** Since \(x \leq 2\pi n + \pi\) on this interval, we have \(\frac{1}{x} \geq \frac{1}{2\pi n + \pi}\).
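The contrast between the two notions can be seen numerically. The sketch below uses a plain midpoint rule (an assumption made to keep the example dependency-free, not a recommended quadrature method) on \([0, 100\pi]\), and also evaluates partial sums of the lower bound \(\sum_n \frac{2}{\pi(2n+1)}\) from the derivation above. Finite computations can only suggest divergence. The proof is the comparison with the harmonic series.

```python
import math

def midpoint_integral(f, a, b, steps):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def sinc(x):
    """sin(x) / x; midpoints are never 0, so no special case is needed here."""
    return math.sin(x) / x

def lower_bound_partial_sum(terms):
    """Partial sum of 2 / (pi * (2n + 1)) over n = 0, ..., terms - 1."""
    return sum(2.0 / (math.pi * (2 * n + 1)) for n in range(terms))

b = 100 * math.pi
steps = 20_000

signed = midpoint_integral(sinc, 0.0, b, steps)                      # cancellation helps
absolute = midpoint_integral(lambda x: abs(sinc(x)), 0.0, b, steps)  # no cancellation

bounds = [lower_bound_partial_sum(10 ** k) for k in range(1, 6)]
```

The signed value stays close to \(\pi/2\), while the integral of \(|\sin x / x|\) and the entries of `bounds` keep growing, roughly logarithmically, as the range is extended.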
This example shows that Lebesgue integrability is a form of absolute integrability, so it excludes some functions that are improperly Riemann integrable.
The sinc function's integral converges in the improper Riemann sense because of cancellation between its positive and negative parts.
However, Lebesgue integration demands that both \(\int g_+ \, d\mu\) and \(\int g_- \, d\mu\) be finite: there is no room for
"conditional convergence" based on cancellation.
This restriction is actually a feature, not a limitation. It ensures that Lebesgue integration behaves well under
fundamental operations like taking limits and changing the order of integration. The trade-off is clear: we gain substantial
theoretical power (handling highly discontinuous functions like the Dirichlet function) at the cost of excluding some
conditionally convergent improper integrals.
The Foundation Layer for ML and Geometry
With the definition of Lebesgue integration complete, we have established a rigorous framework that:
- Handles functions that are "impossible" for Riemann integration (e.g., Dirichlet function).
- Formalizes the notion of "almost everywhere" to ignore sets of measure zero.
- Provides the foundation for functional analysis and the \(L^p\) spaces where our models reside.
- Enables powerful convergence theorems — Monotone Convergence, Fatou's Lemma, and Dominated Convergence
— which we state and prove formally in the upcoming \(L^p\) spaces chapter.
Modern machine learning rests on this framework indirectly. Expected losses
\(\mathbb{E}_{(x,y) \sim P}[\ell(f_\theta(x), y)] = \int \ell(f_\theta(x), y) \, dP(x, y)\)
are Lebesgue integrals against an unknown data distribution \(P\); the empirical risk
\(\frac{1}{n} \sum_i \ell(f_\theta(x_i), y_i)\) used in practice is a finite-sample
estimate justified by the law of large numbers. Similarly, the Fisher information matrix
and natural gradient of information geometry are defined as integrals against a parametric
density, but estimated in code from sampled data. The grid-free character of Lebesgue
integration — content is assigned to arbitrary measurable sets, not only to intervals —
is what permits the extension to manifolds, where local charts pull back to \(\mathbb{R}^n\).
This is the foundation on which integration on curved spaces and
matrix Lie groups
(such as \(SO(3)\) and \(SE(3)\), used in robotics and equivariant networks) is built.
The recurring pattern is that the rigorous Lebesgue object is what makes the practical
finite-sample computation an approximation of something well-defined. The
broader question of how directly future ML and robotics systems should invoke these
constructions — and whether the present rigor gap contributes to known instabilities —
is one this curriculum revisits in
Conditional Expectation.
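The relationship between expected loss and empirical risk described above can be sketched with a toy model. Everything specific here is an assumption for illustration: the data distribution \(P\) (with \(x \sim N(0, 1)\) and \(y = 2x + \varepsilon\), \(\varepsilon \sim N(0, 0.5^2)\)), the linear model \(f_\theta(x) = \theta x\), the squared loss, and the helper names. Under these assumptions the expected loss is \((\theta - 2)^2 + 0.25\), so the sample average can be compared against a known integral.

```python
import random

def empirical_risk(loss, samples):
    """Finite-sample average of the loss: an estimate of the integral against P."""
    return sum(loss(x, y) for x, y in samples) / len(samples)

def squared_loss_of_model(theta):
    """Squared loss of the hypothetical linear model f_theta(x) = theta * x."""
    return lambda x, y: (theta * x - y) ** 2

# Draw samples from the assumed data distribution P with a fixed seed.
rng = random.Random(0)
samples = []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    y = 2.0 * x + rng.gauss(0.0, 0.5)
    samples.append((x, y))

risk_at_true_slope = empirical_risk(squared_loss_of_model(2.0), samples)
risk_at_wrong_slope = empirical_risk(squared_loss_of_model(1.0), samples)
```

The two estimates land near the exact expected losses \(0.25\) and \(1.25\). The Lebesgue integral is the well-defined target, and the sample average is the quantity actually computed.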