In Intro to Functional Analysis,
we introduced the \(L^p\) spaces as the most important examples of Banach spaces in
analysis and machine learning. We stated that \(\|f\|_p = \bigl(\int |f|^p \, d\mu\bigr)^{1/p}\)
defines a norm, that the resulting space is complete, and that the dual of \(L^p\) is \(L^q\)
(where \(1/p + 1/q = 1\) and \(1 \leq p < \infty\)). In the
Dual Spaces chapter,
we relied on Hölder's inequality — without proof — to justify the pairing
\(\varphi_g(f) = \int fg \, d\mu\) in the duality table.
All three of these claims were left unproven. This chapter settles the debt for the first two, and proves Hölder's inequality together with the easy direction of the duality claim:
- \(\|\cdot\|_p\) is genuinely a norm — not merely a seminorm — once we
pass to equivalence classes of functions equal
almost everywhere.
- The triangle inequality for this norm (Minkowski's inequality) follows from
Hölder's inequality, which in turn follows from the elementary Young's inequality.
- The resulting normed space is complete: the Riesz-Fischer theorem
guarantees that every Cauchy sequence in \(L^p\) converges in \(L^p\).
The chain of implications is: Young → Hölder → Minkowski → Norm → Riesz-Fischer → Banach space.
Each link depends logically on the preceding one. We follow this chain from beginning to end.
From Functions to Equivalence Classes
Let \((\Omega, \mathcal{F}, \mu)\) be a
measure space.
In Intro to Functional Analysis,
we defined the \(L^p\) space as the collection of measurable functions \(f\) with
\(\int |f|^p \, d\mu < \infty\), equipped with the quantity
\[
\|f\|_p \;=\; \left( \int_\Omega |f|^p \, d\mu \right)^{1/p}.
\]
For this to be a valid norm, three axioms must hold:
(i) \(\|f\|_p \geq 0\) with equality if and only if \(f = 0\),
(ii) \(\|\alpha f\|_p = |\alpha| \cdot \|f\|_p\), and
(iii) \(\|f + g\|_p \leq \|f\|_p + \|g\|_p\).
Axiom (ii) is straightforward from the linearity of the integral. But axiom (i) immediately
presents a problem.
The Seminorm Problem
Suppose \(\|f\|_p = 0\). Then \(\int |f|^p \, d\mu = 0\), and since \(|f|^p \geq 0\),
this forces \(|f(x)|^p = 0\) for
almost every \(x \in \Omega\) — that is,
\(f(x) = 0\) except on a set of
measure zero.
But this does not mean \(f\) is the zero function. It means only that \(f = 0\)
almost everywhere (a.e.).
For a concrete example, recall the Dirichlet function from
Lebesgue Integration:
\[
f(x) = \chi_{\mathbb{Q}}(x) =
\begin{cases}
1 & \text{if } x \in \mathbb{Q}, \\
0 & \text{if } x \notin \mathbb{Q}.
\end{cases}
\]
Since \(\mu(\mathbb{Q}) = 0\), we have \(\|f\|_p = 0\) for every \(1 \leq p \leq \infty\),
yet \(f\) is not the zero function — it equals \(1\) at every rational point. The quantity
\(\|\cdot\|_p\) therefore fails to distinguish \(f\) from the zero function.
In the language of normed space theory, \(\|\cdot\|_p\) is at best a seminorm,
not a norm: \(\|f\|_p = 0\) can hold even though \(f \neq 0\).
The Equivalence Relation
The resolution is a standard algebraic maneuver: we quotient out the
ambiguity. We declare two measurable functions to be "the same" if they differ only
on a negligible set.
Definition: Equality Almost Everywhere
Let \(f, g : \Omega \to \mathbb{F}\) (\(\mathbb{F} = \mathbb{R}\) or \(\mathbb{C}\))
be measurable functions. We say \(f\) and \(g\) are
equal almost everywhere, written \(f = g\) a.e., if
\[
\mu\bigl(\{x \in \Omega : f(x) \neq g(x)\}\bigr) = 0.
\]
The relation \(f \sim g \iff f = g\) a.e. is an equivalence relation on
the set of measurable functions.
Verification:
We verify the three axioms of an equivalence relation.
Reflexivity: \(\{x : f(x) \neq f(x)\} = \emptyset\), which has measure zero.
Symmetry: \(\{x : f(x) \neq g(x)\} = \{x : g(x) \neq f(x)\}\),
so \(\mu\) of the two sets is the same.
Transitivity: Suppose \(f = g\) a.e. and \(g = h\) a.e.
Let \(N_1 = \{x : f(x) \neq g(x)\}\) and \(N_2 = \{x : g(x) \neq h(x)\}\).
If \(f(x) \neq h(x)\), then either \(f(x) \neq g(x)\) or \(g(x) \neq h(x)\), so
\(\{x : f(x) \neq h(x)\} \subseteq N_1 \cup N_2\). Since
\(\sigma\)-algebras are closed under
finite unions, \(N_1 \cup N_2\) is measurable, and by the
subadditivity of measures,
\(\mu(N_1 \cup N_2) \leq \mu(N_1) + \mu(N_2) = 0\).
By monotonicity, \(\mu(\{x : f(x) \neq h(x)\}) \leq \mu(N_1 \cup N_2) = 0\), so \(f = h\) a.e.
The Formal Definition of \(L^p\)
Definition: The \(L^p\) Space (Rigorous)
Let \((\Omega, \mathcal{F}, \mu)\) be a measure space and \(1 \leq p < \infty\). Define
\[
\mathscr{L}^p(\Omega, \mathcal{F}, \mu) \;=\;
\bigl\{\, f : \Omega \to \mathbb{F} \;\big|\; f \text{ is measurable and }
\textstyle\int_\Omega |f|^p \, d\mu < \infty \,\bigr\}.
\]
The \(L^p\) space is the quotient
\[
L^p(\Omega, \mathcal{F}, \mu) \;=\; \mathscr{L}^p(\Omega, \mathcal{F}, \mu) \,\big/\!\sim
\]
where \(f \sim g \iff f = g\) a.e. Each element of \(L^p\) is an
equivalence class \([f]\) of functions that agree almost everywhere.
Following universal convention, we write \(f \in L^p\) rather than \([f] \in L^p\),
understanding that "\(f\)" refers to the equivalence class and not to any particular
representative. This notational abuse is harmless because all quantities we care
about — the norm \(\|f\|_p\), integrals \(\int fg \, d\mu\), and convergence statements —
are invariant under modification on sets of measure zero.
When we write \(f = 0\) in \(L^p\), we mean \(f(x) = 0\) for \(\mu\)-almost every \(x\).
We similarly define \(L^\infty(\Omega, \mathcal{F}, \mu)\) as the space of equivalence classes
of essentially bounded measurable functions, equipped with the
essential supremum norm
\(\|f\|_\infty = \inf\{C \geq 0 : |f(x)| \leq C \text{ a.e.}\}\).
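The effect of the quotient construction can be seen on a toy measure space. The sketch below assumes a finite set of points with weights \(\mu(\{x_i\})\), one of which is zero; the helper names `lp_seminorm` and `ess_sup` and the sample data are hypothetical, chosen only for illustration.

```python
def lp_seminorm(values, weights, p):
    """(sum_i |values[i]|^p * weights[i])^(1/p) on a finite weighted measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1 / p)

def ess_sup(values, weights):
    """Essential supremum on a finite measure space: points of weight 0 are ignored."""
    return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)

# Hypothetical measure space with three points; the first one is a null set.
weights = [0.0, 1.0, 2.0]
f = [100.0, 2.0, -3.0]
g = [0.0, 2.0, -3.0]           # agrees with f except on the null point
null_bump = [100.0, 0.0, 0.0]  # a nonzero function that vanishes a.e.

# f and g lie in the same equivalence class, so every L^p quantity agrees.
assert lp_seminorm(f, weights, 2) == lp_seminorm(g, weights, 2)
assert ess_sup(f, weights) == ess_sup(g, weights) == 3.0

# The seminorm cannot see the null bump, although it is not the zero function.
assert lp_seminorm(null_bump, weights, 2) == 0.0
```

On this toy space, `null_bump` plays the role of the Dirichlet function: it is a representative of the zero class.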
Connection to Sequence Spaces
The sequence spaces \(\ell^p\) introduced in
Intro to Functional Analysis
are a special case of \(L^p\). If we take \(\Omega = \mathbb{N}\),
\(\mathcal{F} = 2^{\mathbb{N}}\) (all subsets), and \(\mu\) = counting measure
(\(\mu(\{n\}) = 1\) for each \(n\)), then \(L^p(\mathbb{N}, \mu)\) is exactly \(\ell^p\).
In this setting, the equivalence class issue is trivial — since every singleton
\(\{n\}\) has positive measure, two sequences are equal a.e. if and only if they
are identical. Every theorem we prove for \(L^p\) in this chapter therefore
specializes to \(\ell^p\) automatically.
With the quotient construction in hand, the seminorm \(\|\cdot\|_p\) on \(\mathscr{L}^p\)
descends to a genuine norm on \(L^p\): if \(\|[f]\|_p = 0\), then \(f = 0\) a.e.,
which means \([f]\) is the zero element of the quotient space.
The remaining norm axiom — the triangle inequality \(\|f + g\|_p \leq \|f\|_p + \|g\|_p\) —
is the content of Minkowski's inequality, which requires
Hölder's inequality as an intermediate step.
We turn to these now.
Young's & Hölder's Inequality
The proof chain begins with an elementary inequality about real numbers,
which we then "integrate" to obtain the central inequality of \(L^p\) theory.
Hölder Conjugates
Throughout this section, we fix an exponent \(1 < p < \infty\) and define its
Hölder conjugate \(q\) by the relation
\[
\frac{1}{p} + \frac{1}{q} = 1,
\qquad \text{equivalently} \quad q = \frac{p}{p - 1}.
\]
The pair \((p, q)\) is symmetric: the conjugate of \(q\) is \(p\).
The extreme cases are \(p = 1, q = \infty\) and \(p = \infty, q = 1\), which we
treat separately where needed. Note also that \(p = q = 2\) is the unique self-conjugate case —
the setting of Hilbert spaces
and the Cauchy-Schwarz inequality.
The algebraic identity \((p - 1)q = p\) (equivalently, \(p/q = p - 1\)) will appear
repeatedly in the proofs below. It is worth verifying once: from \(1/q = 1 - 1/p = (p-1)/p\),
we get \(q = p/(p-1)\), so \((p-1)q = p\).
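As a quick sanity check of these identities, here is a minimal sketch. The function name `holder_conjugate` is an illustrative choice, and the treatment of the endpoints \(p = 1\) and \(p = \infty\) follows the convention stated above.

```python
import math

def holder_conjugate(p: float) -> float:
    """Return q with 1/p + 1/q = 1, using q = inf for p = 1 and q = 1 for p = inf."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1)

for p in (1.5, 2.0, 3.0, 4.0):
    q = holder_conjugate(p)
    # The defining relation and the identity (p - 1) q = p.
    assert math.isclose(1 / p + 1 / q, 1.0)
    assert math.isclose((p - 1) * q, p)
    # Conjugation is symmetric: the conjugate of q is p again.
    assert math.isclose(holder_conjugate(q), p)
```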
Young's Inequality
Theorem: Young's Inequality
Let \(1 < p < \infty\) and let \(q\) be its Hölder conjugate. For all
\(a, b \geq 0\),
\[
ab \;\leq\; \frac{a^p}{p} + \frac{b^q}{q}.
\]
Equality holds if and only if \(a^p = b^q\).
Proof:
If \(a = 0\) or \(b = 0\), the left side is zero and the right side is nonnegative, so the inequality is trivial.
Assume \(a, b > 0\).
The key observation is that the exponential function \(t \mapsto e^t\) is convex.
Equivalently, the logarithm \(t \mapsto \log t\) is concave on \((0, \infty)\).
By the definition of concavity, for any \(\lambda \in (0, 1)\) and \(u, v > 0\),
\[
\log\bigl(\lambda u + (1 - \lambda) v\bigr) \;\geq\; \lambda \log u + (1 - \lambda) \log v
\;=\; \log\bigl(u^\lambda v^{1-\lambda}\bigr).
\]
Since \(\log\) is increasing, this gives the
weighted AM-GM inequality:
\[
u^\lambda \, v^{1 - \lambda} \;\leq\; \lambda u + (1 - \lambda) v.
\]
Now set \(\lambda = 1/p\), \(1 - \lambda = 1/q\), \(u = a^p\), and \(v = b^q\):
\[
(a^p)^{1/p} (b^q)^{1/q} \;\leq\; \frac{a^p}{p} + \frac{b^q}{q},
\]
which simplifies to \(ab \leq \frac{a^p}{p} + \frac{b^q}{q}\).
Since \(\log\) is strictly concave, equality holds if and only if \(u = v\), i.e., \(a^p = b^q\).
Young's inequality is a pointwise statement about real numbers.
Its power emerges when we "integrate both sides" — which is exactly what happens
in the proof of Hölder's inequality.
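Young's inequality and its equality case are easy to probe numerically. The following sketch evaluates the gap \(a^p/p + b^q/q - ab\) on a small grid; the helper `young_gap`, the exponent \(p = 3\), and the grid values are arbitrary illustrative choices.

```python
def young_gap(a: float, b: float, p: float) -> float:
    """Return a^p/p + b^q/q - ab, which Young's inequality says is >= 0."""
    q = p / (p - 1)
    return a ** p / p + b ** q / q - a * b

p = 3.0
grid = [0.0, 0.25, 0.5, 1.0, 2.0, 5.0]
gaps = [young_gap(a, b, p) for a in grid for b in grid]

# The gap is nonnegative everywhere on the grid (up to rounding error).
assert min(gaps) >= -1e-12

# Equality case: pick b with b^q = a^p, that is b = a^(p/q) = a^(p - 1).
a = 1.7
b = a ** (p - 1)
assert abs(young_gap(a, b, p)) < 1e-9
```

A finite grid only illustrates the statement; the proof above is what covers all \(a, b \geq 0\).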
Hölder's Inequality
Theorem: Hölder's Inequality
Let \(1 \leq p \leq \infty\) and let \(q\) be its Hölder conjugate.
If \(f \in L^p\) and \(g \in L^q\), then \(fg \in L^1\) and
\[
\|fg\|_1 \;=\; \int_\Omega |f g| \, d\mu
\;\leq\; \|f\|_p \, \|g\|_q.
\]
Proof:
Case \(p = 1, q = \infty\):
By the definition of the essential supremum, \(|g(x)| \leq \|g\|_\infty\)
for a.e. \(x\). Therefore \(|f(x) g(x)| \leq |f(x)| \cdot \|g\|_\infty\) a.e.
Integrating both sides gives \(\int |fg| \, d\mu \leq \|g\|_\infty \int |f| \, d\mu
= \|f\|_1 \|g\|_\infty\).
The case \(p = \infty, q = 1\) follows by exchanging the roles of \(f\) and \(g\).
Case \(1 < p < \infty\):
If \(\|f\|_p = 0\) or \(\|g\|_q = 0\), then \(f = 0\) a.e. or \(g = 0\) a.e.,
so \(fg = 0\) a.e. and both sides are zero. Assume \(\|f\|_p > 0\) and \(\|g\|_q > 0\).
Normalization. Define
\[
\tilde{f} = \frac{|f|}{\|f\|_p}, \qquad
\tilde{g} = \frac{|g|}{\|g\|_q}.
\]
Then \(\|\tilde{f}\|_p = 1\) and \(\|\tilde{g}\|_q = 1\).
Applying Young's inequality pointwise:
\[
\tilde{f}(x) \, \tilde{g}(x)
\;\leq\; \frac{\tilde{f}(x)^p}{p} + \frac{\tilde{g}(x)^q}{q}
\quad \text{for all } x.
\]
Integrating over \(\Omega\):
\[
\int_\Omega \tilde{f} \, \tilde{g} \, d\mu
\;\leq\; \frac{1}{p} \int_\Omega \tilde{f}^p \, d\mu
+ \frac{1}{q} \int_\Omega \tilde{g}^q \, d\mu
\;=\; \frac{1}{p} + \frac{1}{q} \;=\; 1.
\]
Substituting back \(\tilde{f} = |f|/\|f\|_p\) and \(\tilde{g} = |g|/\|g\|_q\):
\[
\frac{1}{\|f\|_p \, \|g\|_q} \int_\Omega |f g| \, d\mu \;\leq\; 1,
\]
which gives \(\int |fg| \, d\mu \leq \|f\|_p \, \|g\|_q\) as claimed.
The normalization trick is the heart of the proof: by scaling \(f\) and \(g\) to have
unit norm, we reduce a statement about integrals to a pointwise application of Young's
inequality, and the "integration" step becomes trivial.
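The normalization step can be traced on a finite measure space, where integrals are weighted sums. This is a sketch under that assumption; the weights, the sample functions, and the helper `lp_norm` are hypothetical.

```python
def lp_norm(values, weights, p):
    """(sum_i |values[i]|^p * weights[i])^(1/p) on a finite weighted measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1 / p)

# Hypothetical data: four points with positive weights mu({x_i}).
weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.5, -4.0, 2.0]
p = 3.0
q = p / (p - 1)  # Hölder conjugate of p, here 1.5

norm_f = lp_norm(f, weights, p)
norm_g = lp_norm(g, weights, q)

# Left side of Hölder's inequality: the integral of |fg|.
integral_fg = sum(abs(a * b) * w for a, b, w in zip(f, g, weights))
assert integral_fg <= norm_f * norm_g

# Normalized form used in the proof: the integral of f~ g~ is at most 1.
f_tilde = [abs(a) / norm_f for a in f]
g_tilde = [abs(b) / norm_g for b in g]
normalized_integral = sum(a * b * w for a, b, w in zip(f_tilde, g_tilde, weights))
assert normalized_integral <= 1.0
```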
Equality Conditions and Special Cases
Equality in Hölder's inequality holds if and only if
\(|f(x)|^p / \|f\|_p^p = |g(x)|^q / \|g\|_q^q\) for a.e. \(x\),
i.e., \(|f|^p\) and \(|g|^q\) are proportional as functions. In other words,
there exist constants \(\alpha, \beta \geq 0\) (not both zero) such that
\(\alpha |f(x)|^p = \beta |g(x)|^q\) a.e.
When \(p = q = 2\), Hölder's inequality reduces to:
\[
\int_\Omega |fg| \, d\mu \;\leq\; \|f\|_2 \, \|g\|_2.
\]
This is precisely the Cauchy-Schwarz inequality for \(L^2\),
the infinite-dimensional generalization of the bound \(|\mathbf{u} \cdot \mathbf{v}|
\leq \|\mathbf{u}\| \, \|\mathbf{v}\|\) from
Hilbert spaces.
Hölder's inequality thus provides a single, unified framework that encompasses
Cauchy-Schwarz as its most symmetric special case.
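The equality condition can also be checked on a finite measure space. The sketch below picks \(|g| = |f|^{p-1}\), so that \(|g|^q = |f|^p\) by the identity \((p-1)q = p\), and then perturbs \(g\) to break proportionality. All data and the helper names `lp_norm` and `integral_abs_product` are illustrative assumptions.

```python
import math

def lp_norm(values, weights, p):
    """(sum_i |values[i]|^p * weights[i])^(1/p) on a finite weighted measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1 / p)

def integral_abs_product(u, v, weights):
    """Integral of |u v| with respect to the weighted counting measure."""
    return sum(abs(a * b) * w for a, b, w in zip(u, v, weights))

weights = [0.5, 1.0, 2.0]
f = [1.0, -2.0, 0.5]
p = 4.0
q = p / (p - 1)

# |g|^q = |f|^p pointwise, so |f|^p and |g|^q are proportional.
g = [abs(a) ** (p - 1) for a in f]
equal_lhs = integral_abs_product(f, g, weights)
equal_rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)
assert math.isclose(equal_lhs, equal_rhs)

# Changing g at one point breaks proportionality; the inequality becomes strict.
g_perturbed = [g[0] + 1.0] + g[1:]
strict_lhs = integral_abs_product(f, g_perturbed, weights)
strict_rhs = lp_norm(f, weights, p) * lp_norm(g_perturbed, weights, q)
assert strict_lhs < strict_rhs - 1e-6
```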
One Direction of \(L^p\) Duality
Hölder's inequality immediately settles one half of the duality claim from
Dual Spaces.
For any fixed \(g \in L^q\), the map
\[
\varphi_g : L^p \to \mathbb{F}, \qquad \varphi_g(f) = \int_\Omega f \, g \, d\mu
\]
is a well-defined, linear, and bounded functional on \(L^p\), with
\(\|\varphi_g\|_{(L^p)^*} \leq \|g\|_q\).
(For \(1 < p < \infty\) one can show that equality holds, so the embedding \(L^q \hookrightarrow (L^p)^*\)
is isometric; for \(p = 1\) this requires a mild hypothesis on \(\mu\), such as \(\sigma\)-finiteness.) This establishes the "easy direction" of the isomorphism
\((L^p)^* \cong L^q\) for \(1 \leq p < \infty\):
every element of \(L^q\) gives rise to a continuous functional on \(L^p\).
The converse — that every continuous functional on \(L^p\) arises from
some \(g \in L^q\) — is substantially harder and requires the
Radon-Nikodym theorem, a tool from measure theory that we will
develop after the measure-theoretic probability foundations are in place.
Minkowski's Inequality & the \(L^p\) Norm
We now use Hölder's inequality to prove the triangle inequality for \(\|\cdot\|_p\),
completing the verification that \(L^p\) is a normed space.
Theorem: Minkowski's Inequality
Let \(1 \leq p \leq \infty\). If \(f, g \in L^p\), then \(f + g \in L^p\) and
\[
\|f + g\|_p \;\leq\; \|f\|_p + \|g\|_p.
\]
Proof:
Case \(p = 1\):
By the pointwise triangle inequality, \(|f(x) + g(x)| \leq |f(x)| + |g(x)|\).
Integrating directly gives
\(\|f + g\|_1 \leq \|f\|_1 + \|g\|_1\).
Case \(p = \infty\):
We have \(|f(x) + g(x)| \leq |f(x)| + |g(x)| \leq \|f\|_\infty + \|g\|_\infty\)
for a.e. \(x\). Taking the essential supremum on the left gives the result.
Case \(1 < p < \infty\):
First, observe that \(f + g \in L^p\): since
\(|f(x) + g(x)| \leq |f(x)| + |g(x)| \leq 2\max(|f(x)|, |g(x)|)\),
we have
\[
|f + g|^p \;\leq\; 2^p \max(|f|^p,\, |g|^p) \;\leq\; 2^p(|f|^p + |g|^p).
\]
Integrating gives
\(\int |f+g|^p \, d\mu \leq 2^p(\|f\|_p^p + \|g\|_p^p) < \infty\),
so \(\|f+g\|_p\) is finite.
Now assume \(\|f + g\|_p > 0\) (otherwise the inequality is trivial).
We begin by splitting \(|f + g|^p\):
\[
|f + g|^p \;=\; |f + g|^{p-1} \cdot |f + g|
\;\leq\; |f + g|^{p-1} |f| \;+\; |f + g|^{p-1} |g|.
\]
We now apply Hölder's inequality to each term on the right.
The key observation is that \(|f + g|^{p-1} \in L^q\), because
\[
\int_\Omega \bigl(|f + g|^{p-1}\bigr)^q \, d\mu
\;=\; \int_\Omega |f + g|^{(p-1)q} \, d\mu
\;=\; \int_\Omega |f + g|^{p} \, d\mu
\;=\; \|f + g\|_p^p,
\]
where we used the identity \((p - 1)q = p\). Therefore
\(\bigl\| |f+g|^{p-1} \bigr\|_q = \|f+g\|_p^{p/q}\).
Applying Hölder's inequality to each term:
\[
\int |f + g|^{p-1} |f| \, d\mu
\;\leq\; \bigl\||f+g|^{p-1}\bigr\|_q \cdot \|f\|_p
\;=\; \|f+g\|_p^{p/q} \cdot \|f\|_p,
\]
and similarly for the term with \(|g|\). Adding:
\[
\|f + g\|_p^p
\;=\; \int |f+g|^p \, d\mu
\;\leq\; \|f+g\|_p^{p/q} \bigl(\|f\|_p + \|g\|_p\bigr).
\]
Dividing both sides by \(\|f+g\|_p^{p/q} > 0\):
\[
\|f + g\|_p^{p - p/q} \;\leq\; \|f\|_p + \|g\|_p.
\]
Since \(p - p/q = p(1 - 1/q) = p \cdot (1/p) = 1\), the left side is simply
\(\|f + g\|_p\), completing the proof.
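Two ingredients of this proof lend themselves to a numerical check on a finite measure space: the triangle inequality itself and the identity \(\bigl\| |f+g|^{p-1} \bigr\|_q = \|f+g\|_p^{p/q}\). The sketch below uses hypothetical weights and sample functions.

```python
import math

def lp_norm(values, weights, p):
    """(sum_i |values[i]|^p * weights[i])^(1/p) on a finite weighted measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1 / p)

weights = [1.0, 0.5, 2.0]
f = [1.0, -3.0, 0.5]
g = [2.0, 1.0, -1.5]
s = [a + b for a, b in zip(f, g)]  # pointwise sum f + g

for p in (1.5, 2.0, 4.0):
    q = p / (p - 1)
    # Minkowski's inequality.
    assert lp_norm(s, weights, p) <= lp_norm(f, weights, p) + lp_norm(g, weights, p)
    # Key identity from the proof, a consequence of (p - 1) q = p.
    h = [abs(v) ** (p - 1) for v in s]
    assert math.isclose(lp_norm(h, weights, q), lp_norm(s, weights, p) ** (p / q))
```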
Equality condition (\(1 < p < \infty\)):
Equality in Minkowski's inequality holds if and only if \(f\) and \(g\) are
positively linearly dependent a.e. — that is, there exist
constants \(\alpha, \beta \geq 0\) (not both zero) such that
\(\alpha f(x) = \beta g(x)\) for a.e. \(x\). This follows from the
equality condition of Hölder's inequality applied in the proof above.
In the language of geometry, Minkowski's inequality
is strict whenever \(f\) and \(g\) point in genuinely different "directions"
in \(L^p\) — a manifestation of the strict convexity of the
\(L^p\) norm for \(1 < p < \infty\).
\(L^p\) Is a Normed Space
Let us now summarize the complete verification of the norm axioms
for \(\|\cdot\|_p\) on \(L^p(\Omega, \mathcal{F}, \mu)\):
- Positive definiteness: \(\|f\|_p \geq 0\), and
\(\|f\|_p = 0 \iff f = 0\) in \(L^p\) (i.e., \(f = 0\) a.e.).
This is where the equivalence class construction of
§1 is essential — without it,
\(\|\cdot\|_p\) would only be a seminorm.
- Absolute homogeneity:
\(\|\alpha f\|_p = |\alpha| \cdot \|f\|_p\) for all \(\alpha \in \mathbb{F}\).
This follows immediately from \(\int |\alpha f|^p = |\alpha|^p \int |f|^p\).
- Triangle inequality:
\(\|f + g\|_p \leq \|f\|_p + \|g\|_p\).
This is Minkowski's inequality, proven above.
Therefore \(\bigl(L^p(\Omega, \mathcal{F}, \mu),\, \|\cdot\|_p\bigr)\) is a
normed vector space for every \(1 \leq p \leq \infty\).
The remaining question — the deepest one — is whether this normed space is
complete: does every
Cauchy sequence in \(L^p\) converge to a limit that is itself in \(L^p\)?
This is the content of the Riesz-Fischer theorem, to which
we turn next.
The Riesz-Fischer Theorem
We have established that \(L^p\) is a normed space. Completeness — the property
that every Cauchy sequence converges within the space — is what elevates a normed
space to a Banach space.
In Completeness, we studied
this property for general metric spaces. Now we prove it concretely for \(L^p\).
The proof relies on three fundamental convergence theorems from
Lebesgue integration.
We state them precisely here for reference, as they are the essential
tools of the argument.
Toolkit from Lebesgue Integration
The following three theorems govern the interchange of limits and integrals.
They were introduced conceptually in
Lebesgue Integration;
we now state them in the precise form needed for the completeness proof.
All functions below are measurable on a measure space
\((\Omega, \mathcal{F}, \mu)\).
Theorem: Monotone Convergence Theorem (MCT)
Let \((g_n)\) be a sequence of measurable functions satisfying
\(0 \leq g_1(x) \leq g_2(x) \leq \cdots\) for a.e. \(x \in \Omega\).
Define \(g(x) = \lim_{n \to \infty} g_n(x)\) (which exists in \([0, \infty]\)
by monotonicity). Then
\[
\int_\Omega g \, d\mu \;=\; \lim_{n \to \infty} \int_\Omega g_n \, d\mu.
\]
In words: for nonnegative increasing sequences, the integral of the limit
equals the limit of the integrals.
Theorem: Fatou's Lemma
Let \((g_n)\) be a sequence of measurable functions satisfying
\(g_n(x) \geq 0\) for a.e. \(x\). Then
\[
\int_\Omega \liminf_{n \to \infty} g_n \, d\mu
\;\leq\; \liminf_{n \to \infty} \int_\Omega g_n \, d\mu.
\]
In words: the integral of the \(\liminf\) is bounded above by the
\(\liminf\) of the integrals. The inequality can be strict — passing limits
through integrals can "lose mass."
Theorem: Dominated Convergence Theorem (DCT)
Let \((f_n)\) be a sequence of measurable functions such that
\(f_n(x) \to f(x)\) for a.e. \(x\). Suppose there exists a
dominating function \(h \in L^1(\mu)\) with
\(|f_n(x)| \leq h(x)\) for a.e. \(x\) and all \(n\). Then
\(f \in L^1(\mu)\) and
\[
\lim_{n \to \infty} \int_\Omega f_n \, d\mu \;=\; \int_\Omega f \, d\mu.
\]
In words: under pointwise convergence with a uniform integrable bound,
limits and integrals commute.
The MCT requires monotonicity but imposes no integrability bound — it even allows
the limit to be infinite. Fatou's lemma relaxes monotonicity to mere nonnegativity,
at the cost of an inequality rather than equality. The DCT trades the nonnegativity
assumption for a dominating function, recovering full equality. Together, these
three tools form the backbone of measure-theoretic analysis.
The Theorem
Theorem: Riesz-Fischer
For \(1 \leq p \leq \infty\), the space
\(L^p(\Omega, \mathcal{F}, \mu)\) is complete.
That is, \(L^p\) is a Banach space.
The full proof for \(1 \leq p < \infty\) is a beautiful application of the
convergence theorems above. Rather than present every detail, we expose the
architecture of the proof (the strategy and the key steps) so that
the logical structure is transparent. Completeness proofs for many other function
spaces in analysis (Sobolev spaces, Besov spaces, Hardy spaces) either reuse this
architecture or reduce to the \(L^p\) result, which makes it an important proof pattern to internalize.
Proof Architecture (\(1 \leq p < \infty\))
The challenge: We are given a Cauchy sequence \((f_n)\) in \(L^p\) and must
produce a limit function \(f \in L^p\) with \(\|f_n - f\|_p \to 0\). The difficulty is that
\(L^p\) convergence is an integral condition — it says nothing directly about pointwise
behavior. We need to bridge from integral estimates to pointwise convergence and back.
Key Lemma (Absolute Summability Criterion).
A normed space is complete if and only if every absolutely summable series
converges. That is, \(\sum_{k=1}^{\infty} \|h_k\| < \infty\) implies that the
partial sums \(\sum_{k=1}^{N} h_k\) converge in norm. This equivalent formulation
is often easier to work with than the Cauchy sequence definition directly, because
summability conditions mesh naturally with the MCT.
Proof Sketch:
Step 1 — Extract a fast subsequence.
Since \((f_n)\) is Cauchy, for each \(k \geq 1\) there exists \(n_k\) such that
\(\|f_m - f_n\|_p < 2^{-k}\) for all \(m, n \geq n_k\).
By choosing \(n_1 < n_2 < n_3 < \cdots\), we obtain a subsequence \((f_{n_k})\) with
\[
\|f_{n_{k+1}} - f_{n_k}\|_p \;<\; 2^{-k} \quad \text{for all } k \geq 1.
\]
In particular, the "differences" \(h_k = f_{n_{k+1}} - f_{n_k}\) satisfy
\(\sum_{k=1}^{\infty} \|h_k\|_p < \sum_{k=1}^{\infty} 2^{-k} = 1 < \infty\).
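The index selection in Step 1 is a simple greedy search. The sketch below assumes we are handed a function `tail_bound(n)` that bounds \(\|f_m - f_l\|_p\) for all \(m, l \geq n\) and tends to zero; both that function and the name `fast_subsequence` are hypothetical, introduced only to make the selection rule concrete.

```python
def fast_subsequence(tail_bound, count):
    """Return indices n_1 < n_2 < ... < n_count with tail_bound(n_k) < 2**-k.

    tail_bound(n) is assumed to be an upper bound for ||f_m - f_l||_p over all
    m, l >= n, and to tend to 0 as n grows (this encodes the Cauchy property).
    """
    indices = []
    n = 1
    for k in range(1, count + 1):
        # Advance until the tail is smaller than 2^-k.
        while tail_bound(n) >= 2.0 ** -k:
            n += 1
        indices.append(n)
        n += 1  # force n_{k+1} > n_k
    return indices

# Hypothetical Cauchy modulus: ||f_m - f_l||_p <= 1/n whenever m, l >= n.
indices = fast_subsequence(lambda n: 1.0 / n, 6)
print(indices)  # [3, 5, 9, 17, 33, 65]
```

Because \(n_{k+1} > n_k\) and the bound at \(n_k\) covers every later pair of indices, consecutive terms of the subsequence satisfy \(\|f_{n_{k+1}} - f_{n_k}\|_p < 2^{-k}\).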
Step 2 — Construct a dominating function via MCT.
Define the partial sums of absolute values:
\[
G_N(x) \;=\; |f_{n_1}(x)| + \sum_{k=1}^{N} |f_{n_{k+1}}(x) - f_{n_k}(x)|.
\]
The sequence \((G_N)\) is nonnegative and increasing, so \((G_N^p)\) is
also nonnegative and increasing. Applying the
Monotone Convergence Theorem to \(G_N^p\):
\[
\int_\Omega G^p \, d\mu \;=\; \lim_{N \to \infty} \int_\Omega G_N^p \, d\mu,
\]
where \(G(x) = \lim_{N \to \infty} G_N(x)\).
Taking \(p\)-th roots, \(\|G\|_p = \lim_{N \to \infty} \|G_N\|_p\).
By Minkowski's inequality (applied finitely many times):
\[
\|G_N\|_p \;\leq\; \|f_{n_1}\|_p + \sum_{k=1}^{N} \|h_k\|_p
\;\leq\; \|f_{n_1}\|_p + 1.
\]
Therefore \(\|G\|_p \leq \|f_{n_1}\|_p + 1 < \infty\), which means
\(G \in L^p\). In particular, \(G(x) < \infty\) for a.e. \(x\).
Step 3 — Obtain a pointwise limit.
At every point where \(G(x) < \infty\) (which is a.e.), the telescoping series
\[
f_{n_1}(x) + \sum_{k=1}^{\infty} \bigl(f_{n_{k+1}}(x) - f_{n_k}(x)\bigr)
\]
converges absolutely (the partial sums of its absolute values are bounded by \(G(x)\)).
The partial sums of this series are exactly \(f_{n_K}(x)\), so we may define
\[
f(x) \;=\; \lim_{K \to \infty} f_{n_K}(x) \quad \text{for a.e. } x.
\]
Since \(|f(x)| \leq G(x)\) a.e. and \(G \in L^p\), we have \(f \in L^p\).
Step 4 — Prove \(L^p\) convergence of the subsequence.
We have \(f_{n_K}(x) \to f(x)\) a.e., so \(|f_{n_K} - f|^p \to 0\) a.e.
Furthermore,
\[
|f_{n_K}(x) - f(x)|^p \;\leq\; \bigl(|f_{n_K}(x)| + |f(x)|\bigr)^p
\;\leq\; (2G(x))^p,
\]
and \((2G)^p \in L^1\) because \(G \in L^p\).
By the Dominated Convergence Theorem:
\[
\|f_{n_K} - f\|_p^p \;=\; \int |f_{n_K} - f|^p \, d\mu \;\to\; 0
\quad \text{as } K \to \infty.
\]
Step 5 — Lift from the subsequence to the full sequence.
We now know \(f_{n_K} \to f\) in \(L^p\). To show \(f_n \to f\) in \(L^p\),
we use the fact that \((f_n)\) is Cauchy. For any \(\epsilon > 0\), choose
\(N_0\) such that \(\|f_m - f_n\|_p < \epsilon/2\) for all \(m, n \geq N_0\),
and choose \(K\) such that \(n_K \geq N_0\) and
\(\|f_{n_K} - f\|_p < \epsilon/2\). Then for all \(n \geq N_0\):
\[
\|f_n - f\|_p \;\leq\; \|f_n - f_{n_K}\|_p + \|f_{n_K} - f\|_p
\;<\; \frac{\epsilon}{2} + \frac{\epsilon}{2} \;=\; \epsilon.
\]
This is a standard argument: if a Cauchy sequence has a convergent
subsequence, then the full sequence converges to the same limit.
The Case \(p = \infty\)
For \(L^\infty\), the argument is simpler and does not require the MCT.
By the definition of the essential supremum, for each pair of indices
\(m, n\), the inequality
\(|f_m(x) - f_n(x)| \leq \|f_m - f_n\|_\infty\) holds for a.e. \(x\),
outside an exceptional null set \(E_{m,n}\). Taking the countable union
\(E = \bigcup_{m,n \in \mathbb{N}} E_{m,n}\) (still a null set, since it is
a countable union of null sets), we have
\[
|f_m(x) - f_n(x)| \;\leq\; \|f_m - f_n\|_\infty
\quad \text{for all } x \notin E \text{ and all } m, n.
\]
If \((f_n)\) is Cauchy in \(L^\infty\), the right side tends to zero as
\(m, n \to \infty\), so \((f_n(x))\) is a Cauchy sequence in \(\mathbb{F}\)
for every \(x \notin E\). Since \(\mathbb{F}\) is complete,
\(f_n(x) \to f(x)\) pointwise on \(\Omega \setminus E\).
Moreover, the convergence is uniform outside \(E\): for any \(\epsilon > 0\),
choose \(N\) such that \(\|f_m - f_n\|_\infty < \epsilon\) for \(m, n \geq N\),
then let \(n \to \infty\) to get \(|f_m(x) - f(x)| \leq \epsilon\) for all
\(m \geq N\) and all \(x \notin E\). This gives \(f \in L^\infty\) and
\(\|f_m - f\|_\infty \leq \epsilon\) for all \(m \geq N\), i.e., \(\|f_n - f\|_\infty \to 0\).
Why the Proof Architecture Matters
The five-step pattern above (extract a fast subsequence, build a dominating
function, obtain pointwise convergence, apply DCT, lift to the full
sequence) is not specific to \(L^p\). It is a standard template for
proving completeness of function spaces throughout analysis. Sobolev spaces
\(W^{k,p}\), which arise in the study of partial differential equations and
physics-informed neural networks, are proven complete by building on it:
the problem reduces to \(L^p\) completeness of the function and its weak derivatives.
Understanding the architecture once equips you to recognize and deploy it
wherever function space completeness is needed.
An Important Corollary
The Riesz-Fischer proof yields more than just completeness. Steps 3 and 4 produced a
subsequence \((f_{n_K})\) that converges to \(f\) both pointwise a.e. and in \(L^p\).
This is worth recording as an independent result:
Corollary: Subsequence with Pointwise Convergence
If \(f_n \to f\) in \(L^p\) (\(1 \leq p \leq \infty\)), then there exists
a subsequence \((f_{n_k})\) such that \(f_{n_k}(x) \to f(x)\) for a.e. \(x\).
This corollary connects \(L^p\) convergence (an integral condition) back to
pointwise behavior (a condition on individual points). As we will see in the
next section, the converse does not hold: pointwise a.e. convergence
alone does not imply \(L^p\) convergence, and \(L^p\) convergence does not
imply full pointwise a.e. convergence (only a subsequence is guaranteed).
Convergence in \(L^p\) — A Hierarchy of Modes
With \(L^p\) established as a Banach space, we can study convergence within it.
But \(L^p\) convergence is only one of several natural notions of convergence
for sequences of measurable functions. Understanding how these notions relate
to one another is essential for working effectively with function spaces —
and for bridging to probability theory, where the same hierarchy reappears
under different names.
Four Notions of Convergence
Let \((f_n)\) be a sequence of measurable functions on \((\Omega, \mathcal{F}, \mu)\)
and let \(f\) be a measurable function. We consider four modes of convergence:
Definition: Modes of Convergence
(i) \(L^p\) convergence:
\(f_n \to f\) in \(L^p\) if \(\|f_n - f\|_p \to 0\).
(ii) Pointwise a.e. convergence:
\(f_n \to f\) a.e. if \(f_n(x) \to f(x)\) for all \(x\) outside a
set of measure zero.
(iii) Convergence in measure:
\(f_n \to f\) in measure if, for every \(\epsilon > 0\),
\[
\mu\bigl(\{x \in \Omega : |f_n(x) - f(x)| > \epsilon\}\bigr) \;\to\; 0
\quad \text{as } n \to \infty.
\]
(iv) Uniform convergence a.e.:
There exists a null set \(E\) such that \(f_n \to f\)
uniformly on \(\Omega \setminus E\), i.e.,
\(\sup_{x \notin E} |f_n(x) - f(x)| \to 0\).
These modes are not listed in order of strength. Uniform a.e. convergence
(which is equivalent to \(L^\infty\) convergence) implies both pointwise a.e.
convergence and convergence in measure; convergence in measure is the weakest
of the four on finite measure spaces.
The precise logical implications are as follows.
The Implication Map
Theorem: Relations Between Modes of Convergence
The following implications hold:
- \(L^p\) convergence \(\Rightarrow\) convergence in measure.
- \(L^p\) convergence \(\Rightarrow\) some subsequence converges a.e.
(the Riesz-Fischer corollary above).
- Pointwise a.e. convergence + domination by \(h \in L^p\)
\(\Rightarrow\) \(L^p\) convergence (the DCT).
- Convergence in measure \(\Rightarrow\) some subsequence converges a.e.
Among modes (i), (ii), and (iii), no other implications hold in general. In particular:
- \(L^p\) convergence does not imply pointwise a.e. convergence.
- Pointwise a.e. convergence does not imply \(L^p\) convergence
(without domination).
- Pointwise a.e. convergence does not imply convergence in measure
(on infinite measure spaces).
On finite measure spaces (such as probability spaces), the
picture tightens: pointwise a.e. convergence does imply convergence
in measure. Moreover, Egorov's theorem guarantees that if \(f_n \to f\) a.e.,
then for every \(\epsilon > 0\) there exists a measurable set \(E\) with
\(\mu(E) < \epsilon\) such that \(f_n \to f\) uniformly on
\(\Omega \setminus E\). This "almost uniform convergence" will
reappear naturally when we study convergence of random variables in the
setting of measure-theoretic probability.
Proof of (1): \(L^p\) convergence \(\Rightarrow\) convergence in measure.
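For \(1 \leq p < \infty\), this follows from the Chebyshev-Markov inequality (for \(p = \infty\), the set \(\{|f_n - f| > \epsilon\}\) is null as soon as \(\|f_n - f\|_\infty \leq \epsilon\)).
For any \(\epsilon > 0\):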
\[
\mu\bigl(\{|f_n - f| > \epsilon\}\bigr)
\;=\; \mu\bigl(\{|f_n - f|^p > \epsilon^p\}\bigr)
\;\leq\; \frac{1}{\epsilon^p} \int_\Omega |f_n - f|^p \, d\mu
\;=\; \frac{\|f_n - f\|_p^p}{\epsilon^p}.
\]
If \(\|f_n - f\|_p \to 0\), the right side tends to zero, so
\(f_n \to f\) in measure.
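The Chebyshev-Markov bound used here can be checked directly on a finite measure space. In the sketch below, the list `h` stands in for the difference \(f_n - f\); the weights, the values, and both helper names are hypothetical.

```python
def measure_of_exceedance(values, weights, eps):
    """mu({|h| > eps}) on a finite weighted measure space."""
    return sum(w for v, w in zip(values, weights) if abs(v) > eps)

def integral_abs_p(values, weights, p):
    """Integral of |h|^p, which equals ||h||_p^p."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights))

weights = [0.1, 0.2, 0.3, 0.4]
h = [2.0, -0.5, 0.05, 1.0]  # plays the role of f_n - f

for p in (1.0, 2.0, 3.0):
    for eps in (0.01, 0.1, 0.75, 1.5):
        lhs = measure_of_exceedance(h, weights, eps)
        rhs = integral_abs_p(h, weights, p) / eps ** p
        # mu({|h| > eps}) <= ||h||_p^p / eps^p, up to rounding error.
        assert lhs <= rhs + 1e-12
```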
The Traveling Bump: Why \(L^p\) Does Not Imply A.E.
Counterexample:
Consider the interval \([0, 1]\) with Lebesgue measure.
We construct a sequence of indicator functions that converges to zero in
\(L^p\) for every \(1 \leq p < \infty\) but does not converge at any point.
Enumerate the intervals
\([0, 1],\; [0, 1/2],\; [1/2, 1],\; [0, 1/3],\; [1/3, 2/3],\; [2/3, 1],\;
[0, 1/4],\; \ldots\)
and let \(f_n = \chi_{I_n}\) where \(I_n\) is the \(n\)-th interval in this
enumeration. Then \(\|f_n\|_p = |I_n|^{1/p} \to 0\) as \(n \to \infty\),
so \(f_n \to 0\) in \(L^p\).
However, for every \(x \in [0, 1]\), the sequence \((f_n(x))\) takes the value
\(1\) infinitely often and the value \(0\) infinitely often: at each level \(m\),
the point \(x\) lies in at least one of the \(m\) intervals of width \(1/m\), and
for \(m \geq 3\) it lies outside at least one of them.
Therefore \(f_n(x)\) does not converge for any \(x\), so in particular
\(f_n\) does not converge a.e.
This counterexample shows that \(L^p\) convergence is fundamentally an
average condition: it says the integrated \(p\)-th power of the
difference is small, but it does not control where the function is large at any
given index \(n\). The Riesz-Fischer corollary guarantees only that a
subsequence converges pointwise a.e.; the full sequence may oscillate
at each point.
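The traveling bump is concrete enough to enumerate. The sketch below builds the first 50 levels of intervals with exact rational endpoints and tracks the values \(f_n(x)\) at one sample point; the function name `typewriter_intervals`, the cutoff of 50 levels, and the point \(x = 1/3\) are arbitrary illustrative choices.

```python
from fractions import Fraction

def typewriter_intervals(levels):
    """List the intervals [j/m, (j+1)/m] for m = 1..levels and j = 0..m-1, in order."""
    return [(Fraction(j, m), Fraction(j + 1, m))
            for m in range(1, levels + 1) for j in range(m)]

LEVELS = 50
intervals = typewriter_intervals(LEVELS)
lengths = [b - a for a, b in intervals]

# ||f_n||_p = |I_n|^(1/p); at the end of level 50 the length is 1/50.
p = 2
last_norm = float(lengths[-1]) ** (1 / p)

# Values f_n(x) at a fixed sample point.
x = Fraction(1, 3)
values = [1 if a <= x <= b else 0 for a, b in intervals]

# Within every level the bump hits x, and from level 3 on it also misses x.
start = 0
for m in range(1, LEVELS + 1):
    level_values = values[start:start + m]
    start += m
    assert 1 in level_values
    if m >= 3:
        assert 0 in level_values
```

So the norms shrink with the interval lengths while the values at \(x\) keep switching between 0 and 1 at every level, which is the non-convergence described above.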
The \(L^p\) Dominated Convergence Theorem
The standard DCT gives conditions under which pointwise convergence implies
\(L^1\) convergence. We now state the natural \(L^p\) generalization, which
follows immediately by applying the standard DCT to \(|f_n - f|^p\).
Theorem: \(L^p\)-Dominated Convergence
Let \(1 \leq p < \infty\). Suppose \(f_n \to f\) a.e., and there exists
\(h \in L^p\) such that \(|f_n(x)| \leq h(x)\) for a.e. \(x\) and all \(n\).
Then \(f \in L^p\) and
\[
\|f_n - f\|_p \;\to\; 0 \quad \text{as } n \to \infty.
\]
Proof:
Since \(|f_n| \leq h\) a.e. and \(f_n \to f\) a.e., we also have
\(|f| \leq h\) a.e., so \(f \in L^p\). Now consider
\(|f_n - f|^p \leq (|f_n| + |f|)^p \leq (2h)^p\) a.e.
Since \(h \in L^p\), we have \((2h)^p \in L^1\). Also
\(|f_n - f|^p \to 0\) a.e. Applying the standard DCT to the sequence
\(|f_n - f|^p\) with dominating function \((2h)^p\), we conclude
\(\int |f_n - f|^p \, d\mu \to 0\).
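Two standard examples on \([0, 1]\) with Lebesgue measure show what the domination hypothesis buys. The sequence \(f_n(x) = x^n\) is dominated by the constant \(1 \in L^p\) and converges to zero a.e.; the spikes \(g_n = n \chi_{(0, 1/n)}\) also converge to zero at every point, but their norms do not. The sketch uses the closed-form integrals \(\int_0^1 x^{np}\,dx = 1/(np+1)\) and \(\int_0^1 |g_n|^p\,dx = n^{p-1}\) with integer \(p\); the function names are hypothetical.

```python
from fractions import Fraction

def dominated_norm_p(n, p):
    """Exact integral of |x^n|^p over [0, 1], namely 1 / (n p + 1)."""
    return Fraction(1, n * p + 1)

def spike(n, x):
    """g_n = n * indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def spike_norm_p(n, p):
    """Exact integral of |g_n|^p over [0, 1], namely n^p * (1/n)."""
    return Fraction(n) ** p * Fraction(1, n)

# Dominated case: ||f_n||_p^p decreases to 0, as the theorem predicts.
assert dominated_norm_p(1000, 2) == Fraction(1, 2001)

# Spike case: pointwise limit 0 at a sample point x, since g_n(x) = 0 once 1/n <= x.
x = Fraction(1, 7)
assert all(spike(n, x) == 0 for n in range(7, 100))

# Yet the L^1 norms stay at 1 and the L^2 norms grow.
assert all(spike_norm_p(n, 1) == 1 for n in range(1, 100))
assert spike_norm_p(10, 2) == 10
```

Since the conclusion of the theorem fails for \((g_n)\), no dominating function \(h \in L^p\) can exist for it.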
Looking Ahead: Probability and Convergence
The hierarchy of convergence modes we have just developed has a direct parallel
in probability theory. When the measure space is a probability space
\((\Omega, \mathcal{F}, \mathbb{P})\) and the functions are random variables:
- A.e. convergence becomes
almost sure (a.s.) convergence.
- Convergence in measure becomes
convergence in probability.
- \(L^p\) convergence becomes
\(L^p\) convergence of random variables, i.e.,
\(\mathbb{E}[|X_n - X|^p] \to 0\).
A fourth mode — convergence in distribution — has no direct
analogue in the function-space setting and is inherently probabilistic.
The full picture, including the relationships between these probabilistic
modes of convergence and their role in the law of large numbers and central
limit theorem, is the subject of a future chapter on
measure-theoretic probability.
Why Complete Function Spaces Are Essential
We have now proven that \(L^p\) is a Banach space: a normed vector space in which
every Cauchy sequence converges. In
Completeness, we motivated
this property for general metric spaces as the absence of "holes." But for function
spaces, completeness carries a far more concrete significance: it guarantees that
the result of a limiting operation is still a legitimate object in the space —
a function with finite energy, a probability distribution with finite moments,
or a physically meaningful quantum state.
We close this chapter by examining three domains where completeness of \(L^p\) is not
a mathematical luxury but an absolute necessity.
Probability Theory: Finite Moments and Estimation
In probability, a random variable \(X\) on a probability space
\((\Omega, \mathcal{F}, \mathbb{P})\) is simply a measurable function
\(X : \Omega \to \mathbb{R}\). Saying \(X \in L^p(\Omega, \mathbb{P})\)
means precisely that the \(p\)-th moment is finite:
\[
\mathbb{E}\bigl[|X|^p\bigr] \;=\; \int_\Omega |X|^p \, d\mathbb{P} \;<\; \infty.
\]
The case \(p = 2\) is especially important: \(X \in L^2\) means that both the
mean and the variance are finite, and \(L^2(\Omega, \mathbb{P})\) is a
Hilbert space
with inner product \(\langle X, Y \rangle = \mathbb{E}[XY]\).
Completeness of \(L^2\) guarantees that the
orthogonal projection onto any closed subspace exists.
This is the mathematical foundation of least-squares estimation:
for \(X \in L^2\), the conditional expectation \(\mathbb{E}[X \mid \mathcal{G}]\) is the
\(L^2\)-projection of \(X\) onto the closed subspace of square-integrable
\(\mathcal{G}\)-measurable random variables. Without completeness, the minimum
of the mean-squared error over that subspace might not be attained: the
"best estimate" might not exist as a random variable with finite variance.
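The projection picture can be checked by hand on a finite probability space, where every quantity is a finite sum. The sketch below is a toy model under stated assumptions: six outcomes with made-up probabilities, a sub-\(\sigma\)-algebra \(\mathcal{G}\) generated by a hypothetical three-block partition, and the helper names `inner`, `conditional_expectation`, and `mse` invented for illustration. It computes \(\mathbb{E}[X \mid \mathcal{G}]\) as a probability-weighted block average, then compares it with another \(\mathcal{G}\)-measurable candidate.

```python
# Toy probability space: six outcomes with illustrative probabilities.
probs = [0.1, 0.2, 0.15, 0.25, 0.1, 0.2]
X = [2.0, -1.0, 4.0, 0.5, 3.0, 1.5]

# G is generated by this partition; G-measurable means constant on each block.
blocks = [[0, 1], [2, 3, 4], [5]]

def inner(U, V):
    """L^2 inner product <U, V> = E[U V] on the finite space."""
    return sum(w * u * v for w, u, v in zip(probs, U, V))

def conditional_expectation(values):
    """E[values | G]: probability-weighted average over each block."""
    result = [0.0] * len(values)
    for block in blocks:
        mass = sum(probs[i] for i in block)
        average = sum(probs[i] * values[i] for i in block) / mass
        for i in block:
            result[i] = average
    return result

def mse(candidate):
    """Mean-squared error E[(X - candidate)^2]."""
    diff = [x - c for x, c in zip(X, candidate)]
    return inner(diff, diff)

Y = conditional_expectation(X)
residual = [x - y for x, y in zip(X, Y)]

# Block indicators span the G-measurable subspace.
indicators = [[1.0 if i in block else 0.0 for i in range(len(X))]
              for block in blocks]
orthogonality = [inner(residual, ind) for ind in indicators]

# Any other G-measurable candidate: shift Y by a constant on each block.
shift = [0.3, 0.3, -0.2, -0.2, -0.2, 0.1]
Z = [y + s for y, s in zip(Y, shift)]

print("residual inner products:", orthogonality)
print("mse(E[X|G]) =", mse(Y), " mse(other candidate) =", mse(Z))
```

The residual \(X - \mathbb{E}[X \mid \mathcal{G}]\) is orthogonal to every block indicator, so by the Pythagorean identity any other \(\mathcal{G}\)-measurable candidate has strictly larger mean-squared error.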
Hölder's inequality also takes on a probabilistic reading: for random
variables \(X \in L^p\) and \(Y \in L^q\),
\[
\mathbb{E}[|XY|] \;\leq\;
\bigl(\mathbb{E}[|X|^p]\bigr)^{1/p} \, \bigl(\mathbb{E}[|Y|^q]\bigr)^{1/q}.
\]
This bounds the expectation of a product in terms of individual moment
conditions — a tool used constantly in proving concentration inequalities,
convergence theorems, and the convergence rates of estimators.
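The inequality can be verified numerically on an empirical probability space. In the sketch below, the sample size, the Gaussian samples, the seed, and the helper names `moment_norm` and `holder_sides` are all illustrative assumptions. It also checks the equality case \(|Y| = |X|^{p-1}\), for which \(|Y|^q = |X|^p\) and both sides reduce to \(\mathbb{E}[|X|^p]\).

```python
import random

def moment_norm(values, probs, p):
    """(E[|V|^p])^(1/p) on a finite probability space."""
    return sum(w * abs(v) ** p for v, w in zip(values, probs)) ** (1.0 / p)

def holder_sides(X, Y, probs, p):
    """Return (E|XY|, ||X||_p * ||Y||_q) with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    lhs = sum(w * abs(x * y) for x, y, w in zip(X, Y, probs))
    rhs = moment_norm(X, probs, p) * moment_norm(Y, probs, q)
    return lhs, rhs

rng = random.Random(0)      # fixed seed so the run is reproducible
n = 1000
probs = [1.0 / n] * n       # uniform weights: an empirical measure
X = [rng.gauss(0, 1) for _ in range(n)]
Y = [rng.gauss(0, 1) for _ in range(n)]

results = {p: holder_sides(X, Y, probs, p) for p in (1.5, 2.0, 4.0)}
for p, (lhs, rhs) in results.items():
    print(f"p={p}: E|XY|={lhs:.4f} <= {rhs:.4f}")

# Equality case: |Y| = |X|^(p-1), so that |Y|^q = |X|^p.
p_eq = 3.0
Y_eq = [abs(x) ** (p_eq - 1.0) for x in X]
lhs_eq, rhs_eq = holder_sides(X, Y_eq, probs, p_eq)
print(f"equality case: {lhs_eq:.6f} vs {rhs_eq:.6f}")
```

The case \(p = 2\) is the Cauchy-Schwarz inequality \(\mathbb{E}[|XY|] \leq \sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]}\).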
Signal Processing: Finite Energy and Fourier Reconstruction
In signal processing, a signal \(f : \mathbb{R} \to \mathbb{C}\) has
finite energy if
\[
\|f\|_2^2 \;=\; \int_{-\infty}^{\infty} |f(t)|^2 \, dt \;<\; \infty.
\]
The space of finite-energy signals is exactly \(L^2(\mathbb{R})\).
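Plancherel's theorem
states that the Fourier transform preserves this energy. With the
angular-frequency convention
\(\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t} \, dt\),
\[
\|f\|_{L^2}^2 \;=\; \frac{1}{2\pi}\|\hat{f}\|_{L^2}^2.
\]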
In other words, the (normalized) Fourier transform is a
unitary operator on \(L^2(\mathbb{R})\) — it is an isometry
that maps \(L^2\) onto itself.
Completeness is what makes this statement possible. The integral defining
\(\hat{f}\) converges absolutely only for \(f \in L^1\), so for a general
\(f \in L^2\) the transform is defined as the \(L^2\)-limit of the transforms
of approximations \(f_n \in L^1 \cap L^2\). By Plancherel's identity the
sequence \(\hat{f}_n\) is Cauchy, and completeness guarantees that it has a
limit in \(L^2\). The same argument applied to the inverse transform shows
that the Fourier transform is a bijection on \(L^2\), so every
finite-energy spectrum reconstructs a finite-energy signal, and
Parseval's identity
holds with exact equality. In this sense, the mathematical framework of spectral
analysis rests on the Riesz-Fischer theorem.
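A finite-dimensional analogue of energy preservation can be checked directly. For the discrete Fourier transform \(X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}\), the identity \(\sum_n |x_n|^2 = \frac{1}{N} \sum_k |X_k|^2\) holds, with \(1/N\) playing the role of the factor \(1/(2\pi)\). The sketch below is a toy illustration, not the continuous theorem: the naive \(O(N^2)\) transform, the test signal, and the names `dft` and `idft` are illustrative choices.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: X_k = sum_n x_n exp(-2 pi i k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse transform: x_n = (1/N) sum_k X_k exp(+2 pi i k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 64
# Illustrative signal: a Gaussian envelope times a complex oscillation.
signal = [math.exp(-((n - N / 2) / 6.0) ** 2) * cmath.exp(0.9j * n)
          for n in range(N)]

spectrum = dft(signal)
time_energy = sum(abs(v) ** 2 for v in signal)
freq_energy = sum(abs(v) ** 2 for v in spectrum) / N

# The spectrum reconstructs the original signal up to rounding error.
reconstructed = idft(spectrum)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))

print("time-domain energy:     ", time_energy)
print("frequency-domain energy:", freq_energy)
print("max reconstruction error:", max_error)
```

In finite dimensions completeness is automatic, so this only illustrates the isometry and the reconstruction; the role of the Riesz-Fischer theorem appears when passing to infinite-dimensional \(L^2\).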
Quantum Mechanics: Wave Functions and Unitary Evolution
In quantum mechanics, the state of a particle is described by a
wave function \(\psi \in L^2(\mathbb{R}^3)\) satisfying the
normalization condition \(\|\psi\|_2 = 1\). The physical interpretation is
Born's rule: \(|\psi(x)|^2\) is the probability density for finding the
particle at position \(x\). The \(L^2\) norm being \(1\) ensures that
the total probability \(\int |\psi(x)|^2 \, dx\) equals \(1\).
Time evolution is governed by the Schrödinger equation, whose solution is
a one-parameter family of unitary operators
\(U(t) = e^{-iHt/\hbar}\) acting on \(L^2(\mathbb{R}^3)\).
Unitarity means \(\|U(t)\psi\|_2 = \|\psi\|_2 = 1\) for all \(t\) —
probability is conserved under time evolution.
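Norm conservation under unitary evolution can be seen in the simplest possible model: a two-level system, with state space \(\mathbb{C}^2\) standing in for \(L^2(\mathbb{R}^3)\). The sketch below assumes units with \(\hbar = 1\) and takes the hypothetical Hamiltonian \(H = \sigma_x\) (the Pauli \(x\) matrix). Since \(\sigma_x^2 = I\), the exponential has the closed form \(e^{-it\sigma_x} = \cos(t)\, I - i \sin(t)\, \sigma_x\). The function names and the initial state are illustrative.

```python
import math

def evolve(psi, t):
    """Apply U(t) = exp(-i t sigma_x) = cos(t) I - i sin(t) sigma_x to psi = (a, b)."""
    a, b = psi
    c, s = math.cos(t), math.sin(t)
    # sigma_x swaps the two components of the state.
    return (c * a - 1j * s * b, c * b - 1j * s * a)

def norm(psi):
    """Euclidean norm on C^2, the finite-dimensional stand-in for the L^2 norm."""
    return math.sqrt(sum(abs(z) ** 2 for z in psi))

psi0 = (0.6, 0.8j)   # normalized: 0.36 + 0.64 = 1
times = [0.0, 0.5, 1.0, 2.5, 10.0]
norms = [norm(evolve(psi0, t)) for t in times]

for t, value in zip(times, norms):
    print(f"t={t:5.1f}  ||U(t) psi0|| = {value:.12f}")

# Group property U(t) U(s) = U(t + s), checked on the initial state.
two_step = evolve(evolve(psi0, 0.7), 1.8)
one_step = evolve(psi0, 2.5)
group_error = max(abs(u - v) for u, v in zip(two_step, one_step))
print("group property error:", group_error)
```

Every printed norm equals \(1\) up to rounding, which is the finite-dimensional version of \(\|U(t)\psi\|_2 = \|\psi\|_2\).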
If \(L^2\) were not complete, a Cauchy sequence of valid wave functions,
such as successive approximations to a solution of the Schrödinger equation,
could fail to converge to any square-integrable function, and the probability
interpretation would have nothing to apply to. Completeness guarantees that
unitary evolution stays within the space of physical states, and that
the spectral decomposition of observables (the
spectral theorem)
produces well-defined measurement outcomes. In this sense, the Riesz-Fischer
theorem is not merely a mathematical convenience — it is a precondition
for the logical consistency of quantum theory.
The Common Thread
Across all three domains, the pattern is the same. Each field relies on
limiting operations — expectations of infinite sums, inverse
Fourier transforms, time evolution of differential equations — and completeness
is the guarantee that these limits remain within the space of objects that
have physical or mathematical meaning. An estimator with finite variance.
A signal with finite energy. A quantum state with total probability one.
In Completeness, we described
a complete metric space as one "without holes." Here we see what that metaphor
means concretely for function spaces: a "hole" in \(L^p\) would be a sequence of
perfectly legitimate functions — each with finite \(p\)-th integral — whose limit
escapes to something infinite, undefined, or physically meaningless. The
Riesz-Fischer theorem seals every such hole.
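To see what such a hole looks like in a smaller space, consider the continuous functions on \([0,1]\) with the \(L^p\) norm. The ramps \(f_n(x) = \min(1, \max(0, n(x - \tfrac12)))\) are continuous and Cauchy in \(L^p\), but their \(L^p\) limit is the step function \(\mathbf{1}_{(1/2,\,1]}\), which is not continuous: the limit exists in \(L^p\) but falls outside the subspace. The sketch below checks this numerically against the closed form \(\|f_n - \mathbf{1}_{(1/2,1]}\|_p = \bigl(n(p+1)\bigr)^{-1/p}\), valid for \(n \geq 2\). The helper names, the midpoint rule, and the choice \(p = 2\) are illustrative assumptions.

```python
def ramp(n):
    """Continuous ramp rising from 0 to 1 on [1/2, 1/2 + 1/n]."""
    return lambda x: min(1.0, max(0.0, n * (x - 0.5)))

def step(x):
    """Indicator of (1/2, 1]: the discontinuous L^p limit of the ramps."""
    return 1.0 if x > 0.5 else 0.0

def lp_distance(f, g, p, num_points=100_000):
    """Midpoint-rule approximation of ||f - g||_p on [0, 1]."""
    dx = 1.0 / num_points
    total = sum(abs(f((k + 0.5) * dx) - g((k + 0.5) * dx)) ** p
                for k in range(num_points))
    return (total * dx) ** (1.0 / p)

p = 2  # illustrative exponent
ns = [2, 8, 32, 128]

to_step = [lp_distance(ramp(n), step, p) for n in ns]
closed_form = [(1.0 / (n * (p + 1))) ** (1.0 / p) for n in ns]

# Distances between consecutive ramps shrink too: the sequence is Cauchy.
consecutive = [lp_distance(ramp(a), ramp(b), p) for a, b in zip(ns, ns[1:])]

for n, d, exact in zip(ns, to_step, closed_form):
    print(f"n={n:4d}  ||f_n - step||_p = {d:.6f}  closed form = {exact:.6f}")
print("consecutive distances:", consecutive)
```

In the space of continuous functions this Cauchy sequence has no limit, while in \(L^p\) the step function is a perfectly valid element, so the sequence converges there.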
Looking Ahead
This chapter has established \(L^p\) as a Banach space and settled the proof
debts deferred from
Intro to Functional Analysis
and Dual Spaces.
Historically, the theorem that bears the names of Frigyes Riesz and Ernst Fischer
was originally proven (independently, in 1907) for the case \(p = 2\) alone:
they showed that \(L^2\) is complete, and that the map sending a function
to its Fourier coefficients is an isometric isomorphism between \(L^2\) and
\(\ell^2\). The generalization to all \(1 \leq p \leq \infty\) came later, but
the original \(L^2\) result remains the most consequential: it is what makes
Fourier analysis in Hilbert spaces possible.
The road ahead branches in two complementary directions:
- Fourier analysis in Hilbert spaces will take the special
case \(p = 2\) and develop its Hilbert space structure in full — the inner
product, Plancherel's theorem as a unitary equivalence, and the
Heisenberg uncertainty principle as a theorem about noncommuting operators
on \(L^2\).
- Measure-theoretic probability will reinterpret the
convergence theorems (MCT, DCT, Fatou) and the \(L^p\) hierarchy in
the language of random variables and expectations, closing the gap between
the measure-theoretic foundations of
Measure Theory /
Lebesgue Integration and the
probabilistic reasoning used throughout Section III.
Both paths build directly on the completeness of \(L^p\) proven here — the first
by specializing to the richest structure (\(L^2\) as a Hilbert space), the second
by specializing to the richest interpretation (\(L^p\) of random variables on a
probability space).