Abstract Integration
Lebesgue integration is sometimes referred to as abstract integration. The reason is that the
Lebesgue integral is developed in the very general setting of a measure space \((\Omega, \mathcal{F}, \mu)\).
We want to define the integral:
\[
\int_{\Omega} g(\omega) \, d\mu(\omega)
\]
of a measurable function \(g: \Omega \to \overline{\mathbb{R}}\) defined on a measure space \((\Omega, \mathcal{F}, \mu)\).
Note: \(\overline{\mathbb{R}}\) denotes the extended real line \(\mathbb{R} \cup \{-\infty, \infty\}\).
Let's check some special cases:
If the measure space is a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), and \(X: \Omega \to \overline{\mathbb{R}}\) is
measurable, which means \(X\) is an extended-valued random variable, then the integral is the expectation of \(X\):
\[
\int_{\Omega} X \, d\mathbb{P} = \mathbb{E}(X).
\]
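To make this identity concrete, here is a minimal sketch on a finite probability space, where the integral reduces to a weighted sum over outcomes. The sample space (a fair six-sided die), the dictionary `P`, and the helper name `expectation` are illustrative assumptions, not part of the definition above.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}  # P({w}) for each outcome w


def expectation(X, P):
    """Integral of X with respect to P on a finite sample space:
    the sum of X(w) * P({w}) over all outcomes w."""
    return sum(X(w) * p for w, p in P.items())


def X(w):
    # Random variable: the face value itself.
    return w


mean = expectation(X, P)
print(mean)  # 7/2
```

Exact rational arithmetic is used here only so that the result is not blurred by floating-point rounding.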
If we define the measure space as \((\mathbb{R}, \mathcal{B}, \lambda)\) where \(\mathcal{B}\) is the Borel
\(\sigma\)-algebra and \(\lambda\) is the Lebesgue measure, then the integral is a generalization of the usual integral
encountered in calculus:
\[
\int_{\mathbb{R}} g \, d\lambda = \int g(x) dx.
\]
So, even if it isn't always stated explicitly, many of the integrals we have encountered, whether in the context of computing
expectations in probability theory or evaluating integrals on the real line, can be defined using Lebesgue integration.
This approach is the core of modern analysis and probability, providing a robust framework for handling functions that may
be too irregular or complex for Riemann integration.
One of the key advantages of this abstract approach is that the Lebesgue integral is insensitive to what happens on sets
of measure zero. This allows us to integrate highly discontinuous functions, such as the Dirichlet function,
which the Riemann integral fails to handle.
Characteristic function
Before we define the Lebesgue integral in full generality, we need a fundamental tool that allows us to
"isolate" or "focus on" specific regions of our measure space. This is where the characteristic function
comes in.
The characteristic function is deceptively simple: it is essentially a mathematical "switch" that turns a function
"on" over a specific set and "off" everywhere else. Despite its simplicity, this concept is crucial for building
up the Lebesgue integral from basic building blocks to arbitrarily complex measurable functions.
Definition: Characteristic Function (Indicator Function)
The characteristic function (or indicator function) of a set \(B\), denoted
\(\chi_B\) or \(1_B\), is defined as:
\[
\chi_B(\omega) = \begin{cases}
1 & \text{if \(\omega \in B\)}\\
0 & \text{if \(\omega \notin B\)}.
\end{cases}
\]
The integral of \(g\) over a measurable subset \(B\) is then defined by
\[
\int_B g \, d\mu = \int (\chi_B \, g)d\mu
\]
where
\[
(\chi_B \, g) (\omega) = \begin{cases}
g(\omega) & \text{if \(\omega \in B\)}\\
0 & \text{if \(\omega \notin B\)}.
\end{cases}
\]
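The role of \(\chi_B\) as a "switch" can be sketched on a finite measure space, where integrals are weighted sums. The point weights in `mu`, the function `g`, and the set `B` below are made-up values chosen for illustration; the helper names `indicator` and `integrate` are likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical finite measure space: points 0..5 with weights mu({w}).
mu = {0: Fraction(1, 2), 1: Fraction(1), 2: Fraction(2),
      3: Fraction(1, 4), 4: Fraction(3), 5: Fraction(1)}


def indicator(B):
    """Return the characteristic function chi_B of the set B."""
    return lambda w: 1 if w in B else 0


def integrate(g, mu):
    """Integral of g over the whole (finite) space: sum of g(w) * mu({w})."""
    return sum(g(w) * m for w, m in mu.items())


def g(w):
    return w * w


B = {1, 2, 3}
chi_B = indicator(B)

# Integral over B, computed as the integral of chi_B * g over the whole space.
via_indicator = integrate(lambda w: chi_B(w) * g(w), mu)

# Direct sum over the points of B only, for comparison.
direct = sum(g(w) * mu[w] for w in B)

print(via_indicator, direct)  # 45/4 45/4
```

Both routes give the same number, which is the content of the definition: restricting the domain and multiplying by the indicator are interchangeable.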
We will use the term "almost everywhere" (a.e.) to mean "for all \(\omega\) outside
a zero-measure subset of \(\Omega\)." So, we "ignore" a set of \(\omega\)'s that has measure zero.
For the special case of a probability measure, we also use "almost surely" (a.s.).
For example,
\[
g_n \uparrow g \quad \text{a.e.}
\]
means that \(g_n(\omega)\) increases monotonically to \(g(\omega)\) for all \(\omega \in \Omega\)
outside a zero-measure set.
The Integral of Finite Nonnegative Measurable Functions
Definition: Simple Function
A function \(g: \Omega \to \mathbb{R}\) is said to be simple if it
is measurable, finite, and takes only finitely many different values.
If \(g\) is a simple function of the form:
\[
g(\omega) = \sum_{i=1}^k a_i \chi_{A_i}(\omega), \quad \forall \omega \in \Omega
\]
where \(k\) is a positive integer, \(a_i \in \mathbb{R}\), and \(A_i\) are
pairwise disjoint measurable sets, then its integral is defined by
\[
\int g \, d\mu = \sum_{i=1}^k a_i \mu(A_i).
\]
Convention: We adopt the standard measure-theoretic convention that \(0 \cdot \infty = 0\).
This ensures that \(a_i \mu(A_i) = 0\) whenever \(a_i = 0\), even if \(\mu(A_i) = \infty\).
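The formula and the convention can be sketched in a few lines. The function name `integrate_simple` and the input format (a list of pairs \((a_i, \mu(A_i))\)) are assumptions made for this illustration. Note that the convention has to be applied explicitly in code, because IEEE floating-point arithmetic evaluates `0 * inf` as NaN rather than 0.

```python
import math


def integrate_simple(terms):
    """Integral of a simple function given as (a_i, mu(A_i)) pairs,
    where the sets A_i are assumed pairwise disjoint and measurable.

    Applies the convention 0 * infinity = 0 explicitly, because
    floating-point arithmetic evaluates 0 * inf as NaN.
    """
    total = 0.0
    for a, measure in terms:
        if a == 0 or measure == 0:
            continue  # convention: the term contributes nothing
        total += a * measure
    return total


# g = 2 on a set of measure 0.5, 0 on a set of infinite measure,
# and -1 on a set of measure 3.
terms = [(2.0, 0.5), (0.0, math.inf), (-1.0, 3.0)]
result = integrate_simple(terms)
print(result)  # -2.0
```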
Example: Dirichlet function
We can now compute the Lebesgue integral of the Dirichlet function
on the interval \([0, 1]\):
\[
f(x)=
\begin{cases}
1 &\text{if \(x \in \mathbb{Q}\)} \\
0 &\text{if \(x \in \mathbb{R} \setminus \mathbb{Q}\)}
\end{cases}
\]
The Dirichlet function is a simple function: it takes only the two values 0 and 1, and it is measurable because \(\mathbb{Q}\) is countable and hence a Borel set. By the definition:
\[
\begin{align*}
\int_0^1 f(x)dx &= \int_0^1 \chi_{\mathbb{Q}}(x) dx\\\\
&= 1 \cdot \lambda([0, 1] \cap \mathbb{Q}) + 0 \cdot \lambda([0, 1] \setminus \mathbb{Q}) \\\\
&= 1 \cdot 0 + 0 \cdot (1 - 0) \\\\
&= 0.
\end{align*}
\]
So, the Dirichlet function is 0 almost everywhere. Even though there are infinitely many rational numbers
where the function is 1, their "weight" (measure) in the continuum of real numbers is zero.
Note: Technically, for any interval \([a, b]\), the Lebesgue integral of the Dirichlet function is 0
because the Lebesgue measure of any countable set is zero.
Insight: "Generally Speaking"
In the real world, strict logical implications like \(P \implies Q\) are incredibly rare.
When we say "Generally, A implies B," we are often mentally suppressing thousands of lines
of "exception handling" code.
In measure theory, we can formalize this "generally" using the concept of
"almost everywhere" (a.e.). Instead of demanding that a property holds for
every single point (which is often too brittle for complex systems), we allow for a
set of exceptions, provided that their measure is zero.
As seen in the Dirichlet function example above, even though there are
infinitely many points where the function is 1, they have zero "weight" in the eyes of the
Lebesgue integral.
By using a.e., we gain a powerful way to "compress" information without losing
logical rigor. We aren't just being vague; we are mathematically proving that the exceptions don't
affect the overall "structure" or "integral" of the system.
When measure theory is applied to probability, "almost everywhere" is referred to as "almost surely" (a.s.).
This distinction is vital for understanding how we handle "impossible" events.
Consider flipping a fair coin infinitely many times. From a set-theoretic perspective, a sequence of
"all heads" (H, H, H, ...) is a "valid" element of the sample space \(\Omega\). It is logically "possible" in the sense
that the set is not empty.
However, from a measure-theoretic perspective, the probability measure assigned to this specific singleton set is
exactly zero.
This is why the Strong Law of Large Numbers states that the sample mean (here, the fraction of heads) converges to \(1/2\) almost surely. It acknowledges that
while non-convergent sequences exist as mathematical objects, the set of all such sequences has probability zero, allowing us to treat the convergence
as a certainty for all practical purposes.
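A short simulation can illustrate this behavior, with the caveat that a finite run can only suggest almost-sure convergence and never prove it. The function name `fraction_of_heads`, the seed, and the sample sizes below are arbitrary choices for this sketch.

```python
import random


def fraction_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The fraction of heads should drift toward 1/2 as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, fraction_of_heads(n))
```

The "all heads" sequence is a possible output of this program in principle, but it belongs to the probability-zero set that the almost-sure statement ignores.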
The Integral of Nonnegative Measurable Functions
We approximate the integral of a nonnegative function \(g\) using simple functions.
For a nonnegative (extended-valued) measurable function \(g: \Omega \to [0, \infty]\), we let \(S(g)\) be the
set of all nonnegative simple (hence automatically measurable) functions \(q\) that satisfy \(0 \leq q \leq g\), and
define
\[
\int g \, d\mu = \sup_{q \in S(g)} \int q \, d\mu.
\]
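As a worked illustration of approximation from below, take \(g(x) = x\) on \([0, 1]\) with Lebesgue measure and the dyadic simple functions \(q_n = \lfloor 2^n g \rfloor / 2^n\), each of which satisfies \(0 \leq q_n \leq g\). The choice of \(g\), the dyadic construction, and the function name are assumptions for this sketch; the code only shows that these particular lower bounds increase toward \(1/2\), not that the supremum over all of \(S(g)\) equals \(1/2\).

```python
from fractions import Fraction


def integral_of_dyadic_approximation(n):
    """Integral of the simple function q_n = floor(2^n * g) / 2^n
    for g(x) = x on [0, 1] with Lebesgue measure.

    q_n takes the value k / 2^n on [k / 2^n, (k + 1) / 2^n), an interval
    of length 2^-n, for k = 0, ..., 2^n - 1. (The single point x = 1 has
    measure zero and does not contribute.)
    """
    step = Fraction(1, 2 ** n)
    # Each term is (value k * step) times (measure step) of its level set.
    return sum(k * step * step for k in range(2 ** n))


values = [integral_of_dyadic_approximation(n) for n in range(1, 8)]
for n, v in enumerate(values, start=1):
    print(n, v, float(v))
```

In closed form each value is \((2^n - 1)/2^{n+1} = 1/2 - 2^{-(n+1)}\), so the sequence starts at \(1/4\) and increases toward \(1/2\).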
The Integral of General Measurable Functions
Consider a measurable function \(g: \Omega \to \overline{\mathbb{R}}\). Let
\[
A_+ = \{\omega \mid g(\omega) > 0\}, \qquad g_+ = g \cdot \chi_{A_+}
\]
and
\[
A_- = \{\omega \mid g(\omega) < 0\}, \qquad g_- = -g \cdot \chi_{A_-}
\]
Note that \(A_+\) and \(A_-\) are measurable sets, and \(g_+\), \(g_-\) are both nonnegative
(possibly extended-valued) measurable functions.
Then we have \(g = g_+ - g_-\) and define
\[
\int g \, d\mu = \int g_+ \, d\mu - \int g_- \, d\mu
\]
if we have both \(\int g_+ \, d\mu < \infty\) and \(\int g_- \, d\mu < \infty\).
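The decomposition can be checked pointwise with a small sketch. The helper names `positive_part` and `negative_part` and the sample function \(g(\omega) = \omega^3 - \omega\) are illustrative assumptions.

```python
def positive_part(g):
    """Return g_+ = g * chi_{g > 0}, i.e. max(g, 0) pointwise."""
    return lambda w: g(w) if g(w) > 0 else 0.0


def negative_part(g):
    """Return g_- = -g * chi_{g < 0}, i.e. max(-g, 0) pointwise."""
    return lambda w: -g(w) if g(w) < 0 else 0.0


def g(w):
    return w ** 3 - w


g_plus, g_minus = positive_part(g), negative_part(g)

for w in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Both parts are nonnegative, and they recombine to g.
    assert g_plus(w) >= 0 and g_minus(w) >= 0
    assert g_plus(w) - g_minus(w) == g(w)
    print(w, g(w), g_plus(w), g_minus(w))
```

At every point at most one of the two parts is nonzero, which is why the difference \(g_+ - g_-\) never produces an \(\infty - \infty\) at the level of function values.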
Note: The definition implies that some measurable functions are NOT Lebesgue integrable. In general,
if a function \(f\) is (properly) Riemann integrable on a bounded interval \([a, b]\), then it is also Lebesgue integrable and the two integrals agree:
\[
\int_{[a, b]} f \, d\lambda = \int_a^b f(x) dx
\]
but some improperly Riemann integrable functions are not Lebesgue integrable.
Example: Sinc function over \([0, \infty)\)
Consider the Dirichlet integral:
\[
\int_0^{\infty} \frac{\sin x}{x} dx
\]
This is known to converge to \(\frac{\pi}{2}\):
\[
\begin{align*}
\int_0^{\infty} \frac{\sin x}{x} dx &= \lim_{b \to \infty} \int_0^b \frac{\sin x}{x} dx \\\\
&= \frac{\pi}{2}
\end{align*}
\]
So, \(f(x) = \frac{\sin x}{x}\) is improperly Riemann integrable on \([0, \infty)\).
On the other hand, in the Lebesgue sense:
\[
\int_0^{\infty} \frac{\sin x}{x} dx = \int_0^{\infty} \left(\frac{\sin x}{x}\right)_+ dx - \int_0^{\infty} \left(\frac{\sin x}{x}\right)_- dx
\]
\(\sin x\) is nonnegative on each interval \([2\pi n, 2\pi n + \pi]\) for \(n = 0, 1, 2, \ldots\), and then
\[
\begin{align*}
\int_0^{\infty} \left(\frac{\sin x}{x}\right)_+ dx &= \sum_{n=0}^{\infty} \int_{2\pi n}^{2\pi n + \pi} \frac{\sin x}{x} dx \\\\
&\geq \sum_{n=0}^{\infty} \int_{2\pi n}^{2\pi n + \pi} \frac{\sin x}{2 \pi n + \pi} dx \tag{**} \\\\
&= \sum_{n=0}^{\infty} \frac{2}{\pi(2n +1)} \\\\
&= \infty
\end{align*}
\]
and similarly, \(\int_0^{\infty} \left(\frac{\sin x}{x}\right)_- dx = \infty\).
Thus, by the definition, this is NOT integrable in the Lebesgue sense.
** Since \(x \leq 2\pi n + \pi\) on this interval, we have \(\frac{1}{x} \geq \frac{1}{2\pi n + \pi}\).
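The contrast between the two notions can also be observed numerically. The sketch below is only a rough illustration under assumed parameters (the midpoint rule, the cutoff \(b = 200\pi\), and the step count are arbitrary choices, and the helper names are hypothetical): the lower bound for the positive part keeps growing, while the signed integral settles near \(\pi/2\).

```python
import math


def lower_bound_partial_sum(n_terms):
    """Partial sum of the series 2 / (pi * (2n + 1)) for n = 0..n_terms-1,
    which bounds the integral of the positive part from below."""
    return sum(2.0 / (math.pi * (2 * n + 1)) for n in range(n_terms))


def midpoint_integral(f, a, b, steps):
    """Composite midpoint rule; midpoints avoid evaluating f at x = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def sinc(x):
    return math.sin(x) / x


# The lower bound keeps growing (roughly like log(N) / pi) ...
for n_terms in (10, 1_000, 100_000):
    print(n_terms, lower_bound_partial_sum(n_terms))

# ... while the signed integral up to b = 200 * pi settles near pi / 2.
approx = midpoint_integral(sinc, 0.0, 200 * math.pi, 200_000)
print(approx, math.pi / 2)
```

The logarithmic growth is slow, so the divergence of the positive part is easy to miss in a numerical experiment even though it is unambiguous in the estimate above.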
This example reveals a profound insight: in this respect, Lebesgue integrability is more restrictive than improper Riemann integrability.
The improper Riemann integral of the sinc function converges only because of cancellation between its positive and negative parts.
Lebesgue integrability, however, demands that \(\int g_+ \, d\mu\) and \(\int g_- \, d\mu\) both be finite, which is equivalent to
\(\int |g| \, d\mu < \infty\). There is no room for "conditional convergence" based on cancellation.
This restriction is actually a feature, not a limitation. It ensures that Lebesgue integration behaves well under
fundamental operations like taking limits and changing the order of integration. The trade-off is clear: we gain extraordinary
theoretical power (handling highly discontinuous functions like the Dirichlet function) at the cost of excluding some
conditionally convergent improper integrals.
The Blueprint of Modern AI and Geometry
With the definition of Lebesgue integration complete, we have established a rigorous framework that:
- Handles functions that are "impossible" for Riemann integration (e.g., Dirichlet function).
- Formalizes the notion of "almost everywhere" to ignore sets of measure zero.
- Provides the foundation for functional analysis and the \(L^p\) spaces where our models reside.
- Enables powerful convergence results (the Monotone Convergence Theorem, the Dominated Convergence Theorem, Fatou's Lemma) essential for justifying stochastic algorithms.
This is the essential language of modern Machine Learning. It allows us to calculate expectations over continuous,
high-dimensional spaces and ensures the stability of algorithms that power today's AI.
Furthermore, this perspective is the key to moving beyond flat Euclidean spaces. By decoupling integration from rigid grids,
we can extend these principles to manifolds and continuous symmetries like \(SE(3)\).
Whether you are analyzing a robot's orientation or the information geometry of a probability manifold,
the Lebesgue framework ensures that your "volume" and "structure" remain consistent under transformation.