\(L^p\) Spaces & Riesz-Fischer

  • From Functions to Equivalence Classes
  • Young's & Hölder's Inequality
  • Minkowski's Inequality & the \(L^p\) Norm
  • The Riesz-Fischer Theorem
  • Convergence in \(L^p\)
  • Why Complete Function Spaces Are Essential

In Intro to Functional Analysis, we introduced the \(L^p\) spaces as the most important examples of Banach spaces in analysis and machine learning. We stated that \(\|f\|_p = \bigl(\int |f|^p \, d\mu\bigr)^{1/p}\) defines a norm, that the resulting space is complete, and that the dual of \(L^p\) is \(L^q\) (where \(1/p + 1/q = 1\)). In the Dual Spaces chapter, we relied on Hölder's inequality — without proof — to justify the pairing \(\varphi_g(f) = \int fg \, d\mu\) in the duality table.

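These claims were left unproven. This chapter settles most of the debt: the norm and completeness claims are proved in full, and Hölder's inequality yields one direction of the duality. Specifically, we show: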

  1. \(\|\cdot\|_p\) is genuinely a norm — not merely a seminorm — once we pass to equivalence classes of functions equal almost everywhere.
  2. The triangle inequality for this norm (Minkowski's inequality) follows from Hölder's inequality, which in turn follows from the elementary Young's inequality.
  3. The resulting normed space is complete: the Riesz-Fischer theorem guarantees that every Cauchy sequence in \(L^p\) converges in \(L^p\).

The chain of implications is: Young → Hölder → Minkowski → Norm → Riesz-Fischer → Banach space. Each link depends logically on the preceding one. We follow this chain from beginning to end.

From Functions to Equivalence Classes

Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. In Intro to Functional Analysis, we defined the \(L^p\) space as the collection of measurable functions \(f\) with \(\int |f|^p \, d\mu < \infty\), equipped with the quantity \[ \|f\|_p \;=\; \left( \int_\Omega |f|^p \, d\mu \right)^{1/p}. \] For this to be a valid norm, three axioms must hold: (i) \(\|f\|_p \geq 0\) with equality if and only if \(f = 0\), (ii) \(\|\alpha f\|_p = |\alpha| \cdot \|f\|_p\), and (iii) \(\|f + g\|_p \leq \|f\|_p + \|g\|_p\). Axiom (ii) is straightforward from the linearity of the integral. But axiom (i) immediately presents a problem.

The Seminorm Problem

Suppose \(\|f\|_p = 0\). Then \(\int |f|^p \, d\mu = 0\), and since \(|f|^p \geq 0\), this forces \(|f(x)|^p = 0\) for almost every \(x \in \Omega\) — that is, \(f(x) = 0\) except on a set of measure zero. But this does not mean \(f\) is the zero function. It means only that \(f = 0\) almost everywhere (a.e.).

For a concrete example, recall the Dirichlet function from Lebesgue Integration: \[ f(x) = \chi_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases} \] Since \(\mathbb{Q}\) is countable, it has Lebesgue measure zero, \(\mu(\mathbb{Q}) = 0\), so \(\|f\|_p = 0\) for every \(1 \leq p \leq \infty\). Yet \(f\) is not the zero function: it equals \(1\) at every rational point. The quantity \(\|\cdot\|_p\) therefore fails to distinguish \(f\) from the zero function. In the language of normed space theory, \(\|\cdot\|_p\) is a seminorm, not a norm: \(\|f\|_p = 0\) can hold for a function \(f\) that is not identically zero.
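The same phenomenon can be seen on a toy measure space. The sketch below is an illustration under stated assumptions: the three-point space, its weights, and the helper name `p_seminorm` are hypothetical, chosen so that one point is a null set.

```python
def p_seminorm(f, weights, p):
    """(sum_x |f(x)|^p * mu({x}))^(1/p) for a discrete measure given by `weights`."""
    return sum(abs(f[x]) ** p * w for x, w in weights.items()) ** (1.0 / p)

# Hypothetical three-point measure space; the point "c" carries no mass.
weights = {"a": 0.5, "b": 0.5, "c": 0.0}

f = {"a": 0.0, "b": 0.0, "c": 1.0}      # nonzero, but only on the null set {"c"}
zero = {"a": 0.0, "b": 0.0, "c": 0.0}   # the zero function

seminorm_f = p_seminorm(f, weights, p=2)
is_zero_function = (f == zero)
```

Here `seminorm_f` vanishes even though `is_zero_function` is false, which is the discrete counterpart of the Dirichlet function example.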

The Equivalence Relation

The resolution is a standard algebraic maneuver: we quotient out the ambiguity. We declare two measurable functions to be "the same" if they differ only on a negligible set.

Definition: Equality Almost Everywhere

Let \(f, g : \Omega \to \mathbb{F}\) (\(\mathbb{F} = \mathbb{R}\) or \(\mathbb{C}\)) be measurable functions. We say \(f\) and \(g\) are equal almost everywhere, written \(f = g\) a.e., if \[ \mu\bigl(\{x \in \Omega : f(x) \neq g(x)\}\bigr) = 0. \] The relation \(f \sim g \iff f = g\) a.e. is an equivalence relation on the set of measurable functions.

Verification:

We verify the three axioms of an equivalence relation.

Reflexivity: \(\{x : f(x) \neq f(x)\} = \emptyset\), which has measure zero.

Symmetry: \(\{x : f(x) \neq g(x)\} = \{x : g(x) \neq f(x)\}\), so \(\mu\) of the two sets is the same.

Transitivity: Suppose \(f = g\) a.e. and \(g = h\) a.e. Let \(N_1 = \{x : f(x) \neq g(x)\}\) and \(N_2 = \{x : g(x) \neq h(x)\}\). If \(f(x) \neq h(x)\), then either \(f(x) \neq g(x)\) or \(g(x) \neq h(x)\), so \(\{x : f(x) \neq h(x)\} \subseteq N_1 \cup N_2\). Since \(\sigma\)-algebras are closed under finite unions, \(N_1 \cup N_2\) is measurable, and by the subadditivity of measures, \(\mu(N_1 \cup N_2) \leq \mu(N_1) + \mu(N_2) = 0\). By monotonicity, \(\mu\bigl(\{x : f(x) \neq h(x)\}\bigr) \leq \mu(N_1 \cup N_2) = 0\), so \(f = h\) a.e.

The Formal Definition of \(L^p\)

Definition: The \(L^p\) Space (Rigorous)

Let \((\Omega, \mathcal{F}, \mu)\) be a measure space and \(1 \leq p < \infty\). Define \[ \mathscr{L}^p(\Omega, \mathcal{F}, \mu) \;=\; \bigl\{\, f : \Omega \to \mathbb{F} \;\big|\; f \text{ is measurable and } \textstyle\int_\Omega |f|^p \, d\mu < \infty \,\bigr\}. \] The \(L^p\) space is the quotient \[ L^p(\Omega, \mathcal{F}, \mu) \;=\; \mathscr{L}^p(\Omega, \mathcal{F}, \mu) \,\big/\!\sim \] where \(f \sim g \iff f = g\) a.e. Each element of \(L^p\) is an equivalence class \([f]\) of functions that agree almost everywhere.

Following universal convention, we write \(f \in L^p\) rather than \([f] \in L^p\), understanding that "\(f\)" refers to the equivalence class and not to any particular representative. This notational abuse is harmless because all quantities we care about — the norm \(\|f\|_p\), integrals \(\int fg \, d\mu\), and convergence statements — are invariant under modification on sets of measure zero. When we write \(f = 0\) in \(L^p\), we mean \(f(x) = 0\) for \(\mu\)-almost every \(x\).

We similarly define \(L^\infty(\Omega, \mathcal{F}, \mu)\) as the space of equivalence classes of essentially bounded measurable functions, equipped with the essential supremum norm \(\|f\|_\infty = \inf\{C \geq 0 : |f(x)| \leq C \text{ a.e.}\}\).

Connection to Sequence Spaces

The sequence spaces \(\ell^p\) introduced in Intro to Functional Analysis are a special case of \(L^p\). If we take \(\Omega = \mathbb{N}\), \(\mathcal{F} = 2^{\mathbb{N}}\) (all subsets), and \(\mu\) = counting measure (\(\mu(\{n\}) = 1\) for each \(n\)), then \(L^p(\mathbb{N}, \mu)\) is exactly \(\ell^p\). In this setting, the equivalence class issue is trivial — since every singleton \(\{n\}\) has positive measure, two sequences are equal a.e. if and only if they are identical. Every theorem we prove for \(L^p\) in this chapter therefore specializes to \(\ell^p\) automatically.
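For finitely supported sequences, the identification of \(\ell^p\) with \(L^p\) of counting measure can be sketched in a few lines. This is a minimal illustration, not a library API: the helper names `integral_counting` and `lp_norm` are hypothetical, and only \(1 \leq p < \infty\) is handled.

```python
import math

def integral_counting(values):
    """Integral of a nonnegative function on a finite set w.r.t. counting measure:
    every point has mass 1, so the integral is just the sum of the values."""
    return math.fsum(values)

def lp_norm(x, p):
    """ell^p norm of a finitely supported sequence x, for 1 <= p < infinity."""
    return integral_counting(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0]
norm_1 = lp_norm(x, 1)   # |3| + |-4|
norm_2 = lp_norm(x, 2)   # sqrt(9 + 16)
```

Because every point has mass one, changing a single entry of `x` always changes the function on a set of positive measure, which is why the equivalence classes are singletons in this setting.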

With the quotient construction in hand, the seminorm \(\|\cdot\|_p\) on \(\mathscr{L}^p\) descends to a genuine norm on \(L^p\): if \(\|[f]\|_p = 0\), then \(f = 0\) a.e., which means \([f]\) is the zero element of the quotient space. The remaining norm axiom — the triangle inequality \(\|f + g\|_p \leq \|f\|_p + \|g\|_p\) — is the content of Minkowski's inequality, which requires Hölder's inequality as an intermediate step. We turn to these now.

Young's & Hölder's Inequality

The proof chain begins with an elementary inequality about real numbers, which we then "integrate" to obtain the central inequality of \(L^p\) theory.

Hölder Conjugates

Throughout this section, we fix an exponent \(1 < p < \infty\) and define its Hölder conjugate \(q\) by the relation \[ \frac{1}{p} + \frac{1}{q} = 1, \qquad \text{equivalently} \quad q = \frac{p}{p - 1}. \] The pair \((p, q)\) is symmetric: the conjugate of \(q\) is \(p\). The extreme cases are \(p = 1, q = \infty\) and \(p = \infty, q = 1\), which we treat separately where needed. Note also that \(p = q = 2\) is the unique self-conjugate case — the setting of Hilbert spaces and the Cauchy-Schwarz inequality.

The algebraic identity \((p - 1)q = p\) (equivalently, \(p/q = p - 1\)) will appear repeatedly in the proofs below. It is worth verifying once: from \(1/q = 1 - 1/p = (p-1)/p\), we get \(q = p/(p-1)\), so \((p-1)q = p\).
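The conjugate relation and the identity \((p-1)q = p\) can be checked in exact rational arithmetic. The following is a small sketch; the function name `holder_conjugate` is an assumption introduced for illustration, and the extreme cases \(p = 1\) and \(p = \infty\) are deliberately excluded.

```python
from fractions import Fraction

def holder_conjugate(p):
    """Return q with 1/p + 1/q = 1, for an exponent p with 1 < p < infinity."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("expected 1 < p < infinity")
    return p / (p - 1)

for p in (Fraction(3, 2), Fraction(2), Fraction(3), Fraction(10)):
    q = holder_conjugate(p)
    assert 1 / p + 1 / q == 1          # defining relation
    assert (p - 1) * q == p            # identity used in the proofs
    assert holder_conjugate(q) == p    # the pair (p, q) is symmetric
```

Using `Fraction` avoids floating-point rounding, so the three assertions test the algebra itself.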

Young's Inequality

Theorem: Young's Inequality

Let \(1 < p < \infty\) and let \(q\) be its Hölder conjugate. For all \(a, b \geq 0\), \[ ab \;\leq\; \frac{a^p}{p} + \frac{b^q}{q}. \] Equality holds if and only if \(a^p = b^q\).

Proof:

If \(a = 0\) or \(b = 0\), the left side is zero while the right side is nonnegative, so the inequality holds trivially. Assume \(a, b > 0\).

The key observation is that the exponential function \(t \mapsto e^t\) is convex. Equivalently, the logarithm \(t \mapsto \log t\) is concave on \((0, \infty)\). By the definition of concavity, for any \(\lambda \in (0, 1)\) and \(u, v > 0\), \[ \log\bigl(\lambda u + (1 - \lambda) v\bigr) \;\geq\; \lambda \log u + (1 - \lambda) \log v \;=\; \log\bigl(u^\lambda v^{1-\lambda}\bigr). \] Since \(\log\) is increasing, this gives the weighted AM-GM inequality: \[ u^\lambda \, v^{1 - \lambda} \;\leq\; \lambda u + (1 - \lambda) v. \]

Now set \(\lambda = 1/p\), \(1 - \lambda = 1/q\), \(u = a^p\), and \(v = b^q\): \[ (a^p)^{1/p} (b^q)^{1/q} \;\leq\; \frac{a^p}{p} + \frac{b^q}{q}, \] which simplifies to \(ab \leq \frac{a^p}{p} + \frac{b^q}{q}\).

Since \(\log\) is strictly concave, equality holds if and only if \(u = v\), i.e., \(a^p = b^q\).

Young's inequality is a pointwise statement about real numbers. Its power emerges when we "integrate both sides" — which is exactly what happens in the proof of Hölder's inequality.
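Young's inequality and its equality case can be spot-checked numerically. The sketch below evaluates the gap \(a^p/p + b^q/q - ab\) on a small grid; the grid values and the helper name `young_gap` are arbitrary choices made for illustration, and floating-point rounding means the gap is only expected to be nonnegative up to a tiny tolerance.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b, which Young's inequality says is >= 0."""
    q = p / (p - 1.0)          # Hölder conjugate of p
    return a ** p / p + b ** q / q - a * b

grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
exponents = [1.5, 2.0, 3.0]
min_gap = min(young_gap(a, b, p) for p in exponents for a in grid for b in grid)

# Equality case a^p = b^q: with p = 3 and q = 1.5, a = 4 and b = 16 give
# a^p = 64 = b^q, so the gap should vanish up to rounding.
equality_gap = young_gap(4.0, 16.0, 3.0)
```

A grid check is of course no substitute for the convexity proof above; it only confirms that the statement and its equality condition behave as claimed on sample inputs.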

Hölder's Inequality

Theorem: Hölder's Inequality

Let \(1 \leq p \leq \infty\) and let \(q\) be its Hölder conjugate. If \(f \in L^p\) and \(g \in L^q\), then \(fg \in L^1\) and \[ \|fg\|_1 \;=\; \int_\Omega |f g| \, d\mu \;\leq\; \|f\|_p \, \|g\|_q. \]

Proof:

Case \(p = 1, q = \infty\): By the definition of the essential supremum, \(|g(x)| \leq \|g\|_\infty\) for a.e. \(x\). Therefore \(|f(x) g(x)| \leq |f(x)| \cdot \|g\|_\infty\) a.e. Integrating both sides gives \(\int |fg| \, d\mu \leq \|g\|_\infty \int |f| \, d\mu = \|f\|_1 \|g\|_\infty\).

Case \(1 < p < \infty\): If \(\|f\|_p = 0\) or \(\|g\|_q = 0\), then \(f = 0\) a.e. or \(g = 0\) a.e., so \(fg = 0\) a.e. and both sides are zero. Assume \(\|f\|_p > 0\) and \(\|g\|_q > 0\).

Normalization. Define \[ \tilde{f} = \frac{|f|}{\|f\|_p}, \qquad \tilde{g} = \frac{|g|}{\|g\|_q}. \] Then \(\|\tilde{f}\|_p = 1\) and \(\|\tilde{g}\|_q = 1\). Applying Young's inequality pointwise: \[ \tilde{f}(x) \, \tilde{g}(x) \;\leq\; \frac{\tilde{f}(x)^p}{p} + \frac{\tilde{g}(x)^q}{q} \quad \text{for all } x. \]

Integrating over \(\Omega\): \[ \int_\Omega \tilde{f} \, \tilde{g} \, d\mu \;\leq\; \frac{1}{p} \int_\Omega \tilde{f}^p \, d\mu + \frac{1}{q} \int_\Omega \tilde{g}^q \, d\mu \;=\; \frac{1}{p} + \frac{1}{q} \;=\; 1. \]

Substituting back \(\tilde{f} = |f|/\|f\|_p\) and \(\tilde{g} = |g|/\|g\|_q\): \[ \frac{1}{\|f\|_p \, \|g\|_q} \int_\Omega |f g| \, d\mu \;\leq\; 1, \] which gives \(\int |fg| \, d\mu \leq \|f\|_p \, \|g\|_q\) as claimed.

The normalization trick is the heart of the proof: by scaling \(f\) and \(g\) to have unit norm, we reduce a statement about integrals to a pointwise application of Young's inequality, and the "integration" step becomes trivial.
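The normalization step can be traced numerically on finite sequences, viewed as functions on a finite set with counting measure. This is a sketch under that assumption; the sample vectors and the helper names `lp_norm` and `holder_sides` are hypothetical.

```python
import math

def lp_norm(x, p):
    """ell^p norm of a finite sequence, 1 <= p < infinity."""
    return math.fsum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_sides(f, g, p):
    """Return (integral of |f g|, ||f||_p * ||g||_q) under counting measure."""
    q = p / (p - 1.0)
    lhs = math.fsum(abs(s * t) for s, t in zip(f, g))
    return lhs, lp_norm(f, p) * lp_norm(g, q)

f = [1.0, -2.0, 0.5, 3.0]
g = [0.5, 1.0, -4.0, 0.25]
lhs, rhs = holder_sides(f, g, p=3.0)

# Normalization step of the proof: after scaling to unit norm, the bound is 1.
q = 3.0 / 2.0
f_tilde = [abs(s) / lp_norm(f, 3.0) for s in f]
g_tilde = [abs(t) / lp_norm(g, q) for t in g]
normalized = math.fsum(s * t for s, t in zip(f_tilde, g_tilde))
```

The quantity `normalized` equals `lhs / rhs` up to rounding, which mirrors the substitution step at the end of the proof.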

Equality Conditions and Special Cases

Equality in Hölder's inequality holds if and only if \(|f(x)|^p / \|f\|_p^p = |g(x)|^q / \|g\|_q^q\) for a.e. \(x\), i.e., \(|f|^p\) and \(|g|^q\) are proportional as functions. In other words, there exist constants \(\alpha, \beta \geq 0\) (not both zero) such that \(\alpha |f(x)|^p = \beta |g(x)|^q\) a.e.

When \(p = q = 2\), Hölder's inequality reduces to: \[ \int_\Omega |fg| \, d\mu \;\leq\; \|f\|_2 \, \|g\|_2. \] This is precisely the Cauchy-Schwarz inequality for \(L^2\), the infinite-dimensional generalization of the bound \(|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\| \, \|\mathbf{v}\|\) from Hilbert spaces. Hölder's inequality thus provides a single, unified framework that encompasses Cauchy-Schwarz as its most symmetric special case.

The Easy Direction of \(L^p\) Duality

Hölder's inequality immediately settles one half of the duality claim from Dual Spaces. For any fixed \(g \in L^q\), the map \[ \varphi_g : L^p \to \mathbb{F}, \qquad \varphi_g(f) = \int_\Omega f \, g \, d\mu \] is a well-defined, linear, and bounded functional on \(L^p\), with \(\|\varphi_g\|_{(L^p)^*} \leq \|g\|_q\): linearity comes from linearity of the integral, and \(|\varphi_g(f)| \leq \int_\Omega |fg| \, d\mu \leq \|f\|_p \, \|g\|_q\) gives the bound. (One can show that equality holds, so the embedding \(L^q \hookrightarrow (L^p)^*\) is isometric; for \(p = 1\) this uses a mild hypothesis on \(\mu\), such as \(\sigma\)-finiteness.) This establishes the "easy direction" of the isomorphism \((L^p)^* \cong L^q\) for \(1 \leq p < \infty\): every element of \(L^q\) gives rise to a continuous functional on \(L^p\).

The converse — that every continuous functional on \(L^p\) arises from some \(g \in L^q\) — is substantially harder and requires the Radon-Nikodym theorem, a tool from measure theory that we will develop after the measure-theoretic probability foundations are in place.

Minkowski's Inequality & the \(L^p\) Norm

We now use Hölder's inequality to prove the triangle inequality for \(\|\cdot\|_p\), completing the verification that \(L^p\) is a normed space.

Theorem: Minkowski's Inequality

Let \(1 \leq p \leq \infty\). If \(f, g \in L^p\), then \(f + g \in L^p\) and \[ \|f + g\|_p \;\leq\; \|f\|_p + \|g\|_p. \]

Proof:

Case \(p = 1\): By the pointwise triangle inequality, \(|f(x) + g(x)| \leq |f(x)| + |g(x)|\). Integrating directly gives \(\|f + g\|_1 \leq \|f\|_1 + \|g\|_1\).

Case \(p = \infty\): We have \(|f(x) + g(x)| \leq |f(x)| + |g(x)| \leq \|f\|_\infty + \|g\|_\infty\) for a.e. \(x\). Taking the essential supremum on the left gives the result.

Case \(1 < p < \infty\): First, observe that \(f + g \in L^p\): since \(|f(x) + g(x)| \leq |f(x)| + |g(x)| \leq 2\max(|f(x)|, |g(x)|)\), we have \[ |f + g|^p \;\leq\; 2^p \max(|f|^p,\, |g|^p) \;\leq\; 2^p(|f|^p + |g|^p). \] Integrating gives \(\int |f+g|^p \, d\mu \leq 2^p(\|f\|_p^p + \|g\|_p^p) < \infty\), so \(\|f+g\|_p\) is finite.

Now assume \(\|f + g\|_p > 0\) (otherwise the inequality is trivial). We begin by splitting \(|f + g|^p\): \[ |f + g|^p \;=\; |f + g|^{p-1} \cdot |f + g| \;\leq\; |f + g|^{p-1} |f| \;+\; |f + g|^{p-1} |g|. \]

We now apply Hölder's inequality to each term on the right. The key observation is that \(|f + g|^{p-1} \in L^q\), because \[ \int_\Omega \bigl(|f + g|^{p-1}\bigr)^q \, d\mu \;=\; \int_\Omega |f + g|^{(p-1)q} \, d\mu \;=\; \int_\Omega |f + g|^{p} \, d\mu \;=\; \|f + g\|_p^p, \] where we used the identity \((p - 1)q = p\). Therefore \(\bigl\| |f+g|^{p-1} \bigr\|_q = \|f+g\|_p^{p/q}\).

Applying Hölder's inequality to each term: \[ \int |f + g|^{p-1} |f| \, d\mu \;\leq\; \bigl\||f+g|^{p-1}\bigr\|_q \cdot \|f\|_p \;=\; \|f+g\|_p^{p/q} \cdot \|f\|_p, \] and similarly for the term with \(|g|\). Adding: \[ \|f + g\|_p^p \;=\; \int |f+g|^p \, d\mu \;\leq\; \|f+g\|_p^{p/q} \bigl(\|f\|_p + \|g\|_p\bigr). \]

Dividing both sides by \(\|f+g\|_p^{p/q} > 0\): \[ \|f + g\|_p^{p - p/q} \;\leq\; \|f\|_p + \|g\|_p. \] Since \(p - p/q = p(1 - 1/q) = p \cdot (1/p) = 1\), the left side is simply \(\|f + g\|_p\), completing the proof.

Equality condition (\(1 < p < \infty\)): Equality in Minkowski's inequality holds if and only if \(f\) and \(g\) are positively linearly dependent a.e. — that is, there exist constants \(\alpha, \beta \geq 0\) (not both zero) such that \(\alpha f(x) = \beta g(x)\) for a.e. \(x\). This follows from the equality condition of Hölder's inequality applied in the proof above. In the language of geometry, Minkowski's inequality is strict whenever \(f\) and \(g\) point in genuinely different "directions" in \(L^p\) — a manifestation of the strict convexity of the \(L^p\) norm for \(1 < p < \infty\).
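Minkowski's inequality and its equality condition can be checked on finite sequences (counting measure on a finite set). The sketch below is illustrative only; the vectors and the helper names `lp_norm` and `minkowski_gap` are assumptions.

```python
import math

def lp_norm(x, p):
    """ell^p norm of a finite sequence, 1 <= p < infinity."""
    return math.fsum(abs(t) ** p for t in x) ** (1.0 / p)

def minkowski_gap(f, g, p):
    """||f||_p + ||g||_p - ||f + g||_p, nonnegative by Minkowski's inequality."""
    total = [s + t for s, t in zip(f, g)]
    return lp_norm(f, p) + lp_norm(g, p) - lp_norm(total, p)

f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
generic_gaps = [minkowski_gap(f, g, p) for p in (1.0, 1.5, 2.0, 4.0)]

# g2 = 2 f is a nonnegative multiple of f, so equality is expected (up to rounding).
g2 = [2.0 * s for s in f]
equality_gap = minkowski_gap(f, g2, 3.0)
```

For the generic pair the gap is strictly positive at each tested exponent, while for the proportional pair it vanishes up to floating-point error, matching the equality condition stated above.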

\(L^p\) Is a Normed Space

Let us now summarize the complete verification of the norm axioms for \(\|\cdot\|_p\) on \(L^p(\Omega, \mathcal{F}, \mu)\):

  1. Positive definiteness: \(\|f\|_p \geq 0\), and \(\|f\|_p = 0 \iff f = 0\) in \(L^p\) (i.e., \(f = 0\) a.e.). This is where the equivalence class construction of §1 is essential — without it, \(\|\cdot\|_p\) would only be a seminorm.
  2. Absolute homogeneity: \(\|\alpha f\|_p = |\alpha| \cdot \|f\|_p\) for all \(\alpha \in \mathbb{F}\). This follows immediately from \(\int |\alpha f|^p = |\alpha|^p \int |f|^p\).
  3. Triangle inequality: \(\|f + g\|_p \leq \|f\|_p + \|g\|_p\). This is Minkowski's inequality, proven above.

Therefore \(\bigl(L^p(\Omega, \mathcal{F}, \mu),\, \|\cdot\|_p\bigr)\) is a normed vector space for every \(1 \leq p \leq \infty\). The remaining question — the deepest one — is whether this normed space is complete: does every Cauchy sequence in \(L^p\) converge to a limit that is itself in \(L^p\)? This is the content of the Riesz-Fischer theorem, to which we turn next.

The Riesz-Fischer Theorem

We have established that \(L^p\) is a normed space. Completeness — the property that every Cauchy sequence converges within the space — is what elevates a normed space to a Banach space. In Completeness, we studied this property for general metric spaces. Now we prove it concretely for \(L^p\).

The proof relies on three fundamental convergence theorems from Lebesgue integration. We state them precisely here for reference, as they are the essential tools of the argument.

Toolkit from Lebesgue Integration

The following three theorems govern the interchange of limits and integrals. They were introduced conceptually in Lebesgue Integration; we now state them in the precise form needed for the completeness proof. All functions below are measurable on a measure space \((\Omega, \mathcal{F}, \mu)\).

Theorem: Monotone Convergence Theorem (MCT)

Let \((g_n)\) be a sequence of measurable functions satisfying \(0 \leq g_1(x) \leq g_2(x) \leq \cdots\) for a.e. \(x \in \Omega\). Define \(g(x) = \lim_{n \to \infty} g_n(x)\) (which exists in \([0, \infty]\) by monotonicity). Then \[ \int_\Omega g \, d\mu \;=\; \lim_{n \to \infty} \int_\Omega g_n \, d\mu. \] In words: for nonnegative increasing sequences, the integral of the limit equals the limit of the integrals.

Theorem: Fatou's Lemma

Let \((g_n)\) be a sequence of measurable functions satisfying \(g_n(x) \geq 0\) for a.e. \(x\). Then \[ \int_\Omega \liminf_{n \to \infty} g_n \, d\mu \;\leq\; \liminf_{n \to \infty} \int_\Omega g_n \, d\mu. \] In words: the integral of the \(\liminf\) is bounded above by the \(\liminf\) of the integrals. The inequality can be strict — passing limits through integrals can "lose mass."

Theorem: Dominated Convergence Theorem (DCT)

Let \((f_n)\) be a sequence of measurable functions such that \(f_n(x) \to f(x)\) for a.e. \(x\). Suppose there exists a dominating function \(h \in L^1(\mu)\) with \(|f_n(x)| \leq h(x)\) for a.e. \(x\) and all \(n\). Then \(f \in L^1(\mu)\) and \[ \lim_{n \to \infty} \int_\Omega f_n \, d\mu \;=\; \int_\Omega f \, d\mu. \] In words: under pointwise convergence with a uniform integrable bound, limits and integrals commute.

The MCT requires monotonicity but imposes no integrability bound — it even allows the limit to be infinite. Fatou's lemma relaxes monotonicity to mere nonnegativity, at the cost of an inequality rather than equality. The DCT trades the nonnegativity assumption for a dominating function, recovering full equality. Together, these three tools form the backbone of measure-theoretic analysis.

The Theorem

Theorem: Riesz-Fischer

For \(1 \leq p \leq \infty\), the space \(L^p(\Omega, \mathcal{F}, \mu)\) is complete. That is, \(L^p\) is a Banach space.

The full proof for \(1 \leq p < \infty\) is a beautiful application of the convergence theorems above. Rather than present every detail, we expose the architecture of the proof — the strategy and key steps — so that the logical structure is transparent. This architecture reappears in virtually every completeness proof in modern analysis (Sobolev spaces, Besov spaces, Hardy spaces), making it one of the most important proof patterns to internalize.

Proof Architecture (\(1 \leq p < \infty\))

The challenge: We are given a Cauchy sequence \((f_n)\) in \(L^p\) and must produce a limit function \(f \in L^p\) with \(\|f_n - f\|_p \to 0\). The difficulty is that \(L^p\) convergence is an integral condition — it says nothing directly about pointwise behavior. We need to bridge from integral estimates to pointwise convergence and back.

Key Lemma (Absolute Summability Criterion). A normed space is complete if and only if every absolutely summable series converges. That is, \(\sum_{k=1}^{\infty} \|h_k\| < \infty\) implies that the partial sums \(\sum_{k=1}^{N} h_k\) converge in norm. This equivalent formulation is often easier to work with than the Cauchy sequence definition directly, because summability conditions mesh naturally with the MCT.

Proof Sketch:

Step 1 — Extract a fast subsequence. Since \((f_n)\) is Cauchy, for each \(k \geq 1\) there exists \(n_k\) such that \(\|f_m - f_n\|_p < 2^{-k}\) for all \(m, n \geq n_k\). By choosing \(n_1 < n_2 < n_3 < \cdots\), we obtain a subsequence \((f_{n_k})\) with \[ \|f_{n_{k+1}} - f_{n_k}\|_p \;<\; 2^{-k} \quad \text{for all } k \geq 1. \] In particular, the "differences" \(h_k = f_{n_{k+1}} - f_{n_k}\) satisfy \(\sum_{k=1}^{\infty} \|h_k\|_p < \sum_{k=1}^{\infty} 2^{-k} = 1 < \infty\).

Step 2 — Construct a dominating function via MCT. Define the partial sums of absolute values: \[ G_N(x) \;=\; |f_{n_1}(x)| + \sum_{k=1}^{N} |f_{n_{k+1}}(x) - f_{n_k}(x)|. \] The sequence \((G_N)\) is nonnegative and increasing, so \((G_N^p)\) is also nonnegative and increasing. Applying the Monotone Convergence Theorem to \(G_N^p\): \[ \int_\Omega G^p \, d\mu \;=\; \lim_{N \to \infty} \int_\Omega G_N^p \, d\mu, \] where \(G(x) = \lim_{N \to \infty} G_N(x)\). Taking \(p\)-th roots, \(\|G\|_p = \lim_{N \to \infty} \|G_N\|_p\). By Minkowski's inequality (applied finitely many times): \[ \|G_N\|_p \;\leq\; \|f_{n_1}\|_p + \sum_{k=1}^{N} \|h_k\|_p \;\leq\; \|f_{n_1}\|_p + 1. \] Therefore \(\|G\|_p \leq \|f_{n_1}\|_p + 1 < \infty\), which means \(G \in L^p\). In particular, \(G(x) < \infty\) for a.e. \(x\).

Step 3 — Obtain a pointwise limit. At every point where \(G(x) < \infty\) (which is a.e.), the telescoping series \[ f_{n_1}(x) + \sum_{k=1}^{\infty} \bigl(f_{n_{k+1}}(x) - f_{n_k}(x)\bigr) \] converges absolutely (its partial sums are bounded by \(G(x)\)). The partial sums of this series are exactly \(f_{n_K}(x)\), so we may define \[ f(x) \;=\; \lim_{K \to \infty} f_{n_K}(x) \quad \text{for a.e. } x. \] Since \(|f(x)| \leq G(x)\) a.e. and \(G \in L^p\), we have \(f \in L^p\).

Step 4 — Prove \(L^p\) convergence of the subsequence. We have \(f_{n_K}(x) \to f(x)\) a.e., so \(|f_{n_K} - f|^p \to 0\) a.e. Furthermore, \[ |f_{n_K}(x) - f(x)|^p \;\leq\; \bigl(|f_{n_K}(x)| + |f(x)|\bigr)^p \;\leq\; (2G(x))^p, \] and \((2G)^p \in L^1\) because \(G \in L^p\). By the Dominated Convergence Theorem: \[ \|f_{n_K} - f\|_p^p \;=\; \int |f_{n_K} - f|^p \, d\mu \;\to\; 0 \quad \text{as } K \to \infty. \]

Step 5 — Lift from the subsequence to the full sequence. We now know \(f_{n_K} \to f\) in \(L^p\). To show \(f_n \to f\) in \(L^p\), we use the fact that \((f_n)\) is Cauchy. For any \(\epsilon > 0\), choose \(N_0\) such that \(\|f_m - f_n\|_p < \epsilon/2\) for all \(m, n \geq N_0\), and choose \(K\) such that \(n_K \geq N_0\) and \(\|f_{n_K} - f\|_p < \epsilon/2\). Then for all \(n \geq N_0\): \[ \|f_n - f\|_p \;\leq\; \|f_n - f_{n_K}\|_p + \|f_{n_K} - f\|_p \;<\; \frac{\epsilon}{2} + \frac{\epsilon}{2} \;=\; \epsilon. \] This is a standard argument: if a Cauchy sequence has a convergent subsequence, then the full sequence converges to the same limit.
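Step 1 of the sketch can be made concrete for one explicit Cauchy sequence. As an illustrative assumption, take \(f_n = (1, 1/2, \ldots, 1/n, 0, 0, \ldots)\) in \(\ell^2\); the tail bound \(\sum_{j > n} 1/j^2 < 1/n\) gives \(\|f_m - f_n\|_2 < n^{-1/2}\) for \(m > n\), so the indices \(n_k = 4^k\) form a fast subsequence. The helper names `diff_norm` and `fast_index` are hypothetical.

```python
import math

def diff_norm(m, n):
    """ell^2 norm of f_m - f_n, where f_n = (1, 1/2, ..., 1/n, 0, 0, ...)."""
    lo, hi = sorted((m, n))
    return math.sqrt(math.fsum(1.0 / (j * j) for j in range(lo + 1, hi + 1)))

def fast_index(k):
    """Index n_k with ||f_m - f_n||_2 < 2^-k for all m, n >= n_k.

    Uses the tail bound sum_{j > n} 1/j^2 < 1/n, so n_k = 4^k suffices.
    """
    return 4 ** k

indices = [fast_index(k) for k in range(1, 7)]          # n_1 < n_2 < ... < n_6
gaps = [diff_norm(indices[i + 1], indices[i]) for i in range(5)]
bounds = [2.0 ** -(i + 1) for i in range(5)]            # 2^-k for k = 1..5
```

Each entry of `gaps` is the norm of a difference \(h_k\) and stays below the matching entry of `bounds`, so the partial sums of \(\sum_k \|h_k\|_2\) stay below \(1\), as in the proof.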

The Case \(p = \infty\)

For \(L^\infty\), the argument is simpler and does not require the MCT. By the definition of the essential supremum, for each pair of indices \(m, n\), the inequality \(|f_m(x) - f_n(x)| \leq \|f_m - f_n\|_\infty\) holds for a.e. \(x\), outside an exceptional null set \(E_{m,n}\). Taking the countable union \(E = \bigcup_{m,n \in \mathbb{N}} E_{m,n}\) (still a null set, since it is a countable union of null sets), we have \[ |f_m(x) - f_n(x)| \;\leq\; \|f_m - f_n\|_\infty \quad \text{for all } x \notin E \text{ and all } m, n. \] If \((f_n)\) is Cauchy in \(L^\infty\), the right side tends to zero as \(m, n \to \infty\), so \((f_n(x))\) is a Cauchy sequence in \(\mathbb{F}\) for every \(x \notin E\). Since \(\mathbb{F}\) is complete, \(f_n(x) \to f(x)\) pointwise on \(\Omega \setminus E\) (and we set \(f = 0\) on \(E\)). Moreover, the convergence is uniform outside \(E\): for any \(\epsilon > 0\), choose \(N\) such that \(\|f_m - f_n\|_\infty < \epsilon\) for \(m, n \geq N\), then let \(n \to \infty\) in \(|f_m(x) - f_n(x)| < \epsilon\) to get \(|f_m(x) - f(x)| \leq \epsilon\) for all \(m \geq N\) and all \(x \notin E\). In particular \(|f| \leq |f_N| + \epsilon\) off \(E\), so \(f \in L^\infty\), and \(\|f_m - f\|_\infty \leq \epsilon\) for all \(m \geq N\), that is, \(\|f_n - f\|_\infty \to 0\).

Why the Proof Architecture Matters

The five-step pattern above — extract a fast subsequence, build a dominating function, obtain pointwise convergence, apply DCT, lift to the full sequence — is not specific to \(L^p\). It is the standard template for proving completeness of function spaces throughout analysis. Sobolev spaces \(W^{k,p}\), which arise in the study of partial differential equations and physics-informed neural networks, are proven complete by the same strategy: reduce the problem to \(L^p\) completeness of the function and its derivatives. Understanding the architecture once equips you to recognize and deploy it wherever function space completeness is needed.

An Important Corollary

The Riesz-Fischer proof yields more than just completeness. Steps 3 and 4 produced a subsequence \((f_{n_K})\) that converges to \(f\) both pointwise a.e. and in \(L^p\). This is worth recording as an independent result:

Corollary: Subsequence with Pointwise Convergence

If \(f_n \to f\) in \(L^p\) (\(1 \leq p \leq \infty\)), then there exists a subsequence \((f_{n_k})\) such that \(f_{n_k}(x) \to f(x)\) for a.e. \(x\).

This corollary connects \(L^p\) convergence (an integral condition) back to pointwise behavior (a condition on individual points). As we will see in the next section, the converse does not hold: pointwise a.e. convergence alone does not imply \(L^p\) convergence, and \(L^p\) convergence does not imply full pointwise a.e. convergence (only a subsequence is guaranteed).

Convergence in \(L^p\) — A Hierarchy of Modes

With \(L^p\) established as a Banach space, we can study convergence within it. But \(L^p\) convergence is only one of several natural notions of convergence for sequences of measurable functions. Understanding how these notions relate to one another is essential for working effectively with function spaces — and for bridging to probability theory, where the same hierarchy reappears under different names.

Four Notions of Convergence

Let \((f_n)\) be a sequence of measurable functions on \((\Omega, \mathcal{F}, \mu)\) and let \(f\) be a measurable function. We consider four modes of convergence:

Definition: Modes of Convergence

(i) \(L^p\) convergence: \(f_n \to f\) in \(L^p\) if \(\|f_n - f\|_p \to 0\).

(ii) Pointwise a.e. convergence: \(f_n \to f\) a.e. if \(f_n(x) \to f(x)\) for all \(x\) outside a set of measure zero.

(iii) Convergence in measure: \(f_n \to f\) in measure if, for every \(\epsilon > 0\), \[ \mu\bigl(\{x \in \Omega : |f_n(x) - f(x)| > \epsilon\}\bigr) \;\to\; 0 \quad \text{as } n \to \infty. \]

(iv) Uniform convergence a.e.: There exists a null set \(E\) such that \(f_n \to f\) uniformly on \(\Omega \setminus E\), i.e., \(\sup_{x \notin E} |f_n(x) - f(x)| \to 0\).

These modes are not listed in order of strength. Uniform a.e. convergence is the strongest (it is equivalent to \(L^\infty\) convergence) and implies both pointwise a.e. convergence and convergence in measure. Convergence in measure is implied by \(L^p\) convergence and, on finite measure spaces, by pointwise a.e. convergence. The precise logical implications are as follows.

The Implication Map

Theorem: Relations Between Modes of Convergence

The following implications hold:

  1. \(L^p\) convergence \(\Rightarrow\) convergence in measure.
  2. \(L^p\) convergence \(\Rightarrow\) some subsequence converges a.e. (the Riesz-Fischer corollary above).
  3. Pointwise a.e. convergence + domination by \(h \in L^p\) \(\Rightarrow\) \(L^p\) convergence (the DCT).
  4. Convergence in measure \(\Rightarrow\) some subsequence converges a.e.

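Apart from the elementary fact that uniform a.e. convergence implies both pointwise a.e. convergence and convergence in measure, no other implications among pointwise a.e. convergence, convergence in measure, and \(L^p\) convergence (\(1 \leq p < \infty\)) hold in general. In particular: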

  • \(L^p\) convergence does not imply pointwise a.e. convergence.
  • Pointwise a.e. convergence does not imply \(L^p\) convergence (without domination).
  • Pointwise a.e. convergence does not imply convergence in measure (on infinite measure spaces).

On finite measure spaces (such as probability spaces), the picture tightens: pointwise a.e. convergence does imply convergence in measure, and Egorov's theorem guarantees that for every \(\epsilon > 0\), there exists a measurable set \(E\) with \(\mu(E) < \epsilon\) such that \(f_n \to f\) uniformly on \(\Omega \setminus E\). This "almost uniform convergence" will reappear naturally when we study convergence of random variables in the setting of measure-theoretic probability.

Proof of (1): \(L^p\) convergence \(\Rightarrow\) convergence in measure.

This follows from the Chebyshev-Markov inequality. For any \(\epsilon > 0\): \[ \mu\bigl(\{|f_n - f| > \epsilon\}\bigr) \;=\; \mu\bigl(\{|f_n - f|^p > \epsilon^p\}\bigr) \;\leq\; \frac{1}{\epsilon^p} \int_\Omega |f_n - f|^p \, d\mu \;=\; \frac{\|f_n - f\|_p^p}{\epsilon^p}. \] If \(\|f_n - f\|_p \to 0\), the right side tends to zero, so \(f_n \to f\) in measure.
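The Chebyshev-Markov bound can be checked on a finite sample. The sketch assumes the uniform probability measure on a finite index set, with a deterministic list `h` standing in for \(f_n - f\); the helper name `markov_sides` and the sample values are hypothetical.

```python
def markov_sides(values, eps, p):
    """Return (mu{|h| > eps}, ||h||_p^p / eps^p) for the uniform probability
    measure on the index set of `values`."""
    n = len(values)
    measure = sum(1 for v in values if abs(v) > eps) / n
    bound = sum(abs(v) ** p for v in values) / n / eps ** p
    return measure, bound

# Stand-in for f_n - f: a deterministic sample with alternating signs.
h = [0.05 * k * (-1) ** k for k in range(40)]
checks = [markov_sides(h, eps, p) for eps in (0.1, 0.5, 1.0) for p in (1, 2, 3)]
```

In every pair in `checks` the measure of the exceedance set is at most the moment bound, which is the inequality used in the proof.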

The Traveling Bump: Why \(L^p\) Does Not Imply A.E.

Counterexample:

Consider the interval \([0, 1]\) with Lebesgue measure and fix \(1 \leq p < \infty\). We construct a sequence of indicator functions that converges to zero in \(L^p\) but does not converge at any point.

Enumerate the intervals \([0, 1],\; [0, 1/2],\; [1/2, 1],\; [0, 1/3],\; [1/3, 2/3],\; [2/3, 1],\; [0, 1/4],\; \ldots\) and let \(f_n = \chi_{I_n}\) where \(I_n\) is the \(n\)-th interval in this enumeration. Then \(\|f_n\|_p = |I_n|^{1/p} \to 0\) as \(n \to \infty\), so \(f_n \to 0\) in \(L^p\).

However, for every \(x \in [0, 1]\), the sequence \((f_n(x))\) takes the value \(1\) infinitely often and the value \(0\) infinitely often (since the intervals sweep across \([0,1]\) repeatedly with decreasing widths). Therefore \((f_n(x))\) fails to converge at every single \(x\), so the sequence certainly does not converge almost everywhere.

This counterexample shows that \(L^p\) convergence is fundamentally an average condition: it says the integrated \(p\)-th power of the difference is small, but it does not control where the function is large at any given index \(n\). The Riesz-Fischer corollary guarantees only that a subsequence converges pointwise a.e.; the full sequence may oscillate at each point.
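The traveling bump is easy to generate explicitly. The sketch below builds the first 30 levels of the enumeration in exact rational arithmetic and evaluates the indicators at one sample point; the function names and the choice \(x = 1/3\) are illustrative assumptions.

```python
from fractions import Fraction

def typewriter_intervals(max_level):
    """Yield the intervals [j/m, (j+1)/m] for m = 1..max_level, j = 0..m-1,
    in the order used in the counterexample."""
    for m in range(1, max_level + 1):
        for j in range(m):
            yield Fraction(j, m), Fraction(j + 1, m)

def indicator(interval, x):
    """Value at x of the indicator function of a closed interval."""
    left, right = interval
    return 1 if left <= x <= right else 0

intervals = list(typewriter_intervals(30))
lengths = [right - left for left, right in intervals]   # equals ||f_n||_p^p

x = Fraction(1, 3)                                       # sample point in [0, 1]
values = [indicator(iv, x) for iv in intervals]
ones, zeros = values.count(1), values.count(0)
```

The interval lengths shrink to \(1/30\) by the last level, while `values` keeps returning to both \(0\) and \(1\): every level contributes at least one \(1\), and every level past the second contributes at least one \(0\).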

The \(L^p\) Dominated Convergence Theorem

The standard DCT gives conditions under which pointwise convergence implies \(L^1\) convergence. We now state the natural \(L^p\) generalization, which follows immediately by applying the standard DCT to \(|f_n - f|^p\).

Theorem: \(L^p\)-Dominated Convergence

Let \(1 \leq p < \infty\). Suppose \(f_n \to f\) a.e., and there exists \(h \in L^p\) such that \(|f_n(x)| \leq h(x)\) for a.e. \(x\) and all \(n\). Then \(f \in L^p\) and \[ \|f_n - f\|_p \;\to\; 0 \quad \text{as } n \to \infty. \]

Proof:

Since \(|f_n| \leq h\) a.e. and \(f_n \to f\) a.e., we also have \(|f| \leq h\) a.e., so \(f \in L^p\). Now consider \(|f_n - f|^p \leq (|f_n| + |f|)^p \leq (2h)^p\) a.e. Since \(h \in L^p\), we have \((2h)^p \in L^1\). Also \(|f_n - f|^p \to 0\) a.e. Applying the standard DCT to the sequence \(|f_n - f|^p\) with dominating function \((2h)^p\), we conclude \(\int |f_n - f|^p \, d\mu \to 0\).
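The role of the dominating function can be illustrated numerically on \([0, 1]\). The sketch uses a midpoint Riemann sum as a stand-in for the Lebesgue integral (an assumption that is adequate for these piecewise-smooth examples) and compares a dominated sequence with one that has no \(L^p\) dominating function. All function names here are hypothetical.

```python
def midpoint_integral(func, cells=20000):
    """Midpoint Riemann sum of func over [0, 1]."""
    width = 1.0 / cells
    return sum(func((i + 0.5) * width) for i in range(cells)) * width

def lp_norm_p(func, p):
    """Approximate ||func||_p^p on [0, 1]."""
    return midpoint_integral(lambda x: abs(func(x)) ** p)

p = 2
# Dominated: f_n(x) = x^n -> 0 a.e., with |f_n| <= h = 1 and h in L^p([0, 1]).
dominated = [lp_norm_p(lambda x, n=n: x ** n, p) for n in (1, 10, 100)]
# No L^p dominating function: g_n = n on (0, 1/n) and 0 elsewhere; g_n -> 0 a.e.
escaping = [lp_norm_p(lambda x, n=n: n if x < 1.0 / n else 0.0, p) for n in (1, 10, 100)]
```

The entries of `dominated` approximate \(1/(np + 1)\) and tend to zero, as the theorem predicts. The entries of `escaping` approximate \(n^{p-1}\) and grow, even though \(g_n \to 0\) a.e., so the domination hypothesis cannot be dropped.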

Looking Ahead: Probability and Convergence

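The hierarchy of convergence modes we have just developed has a direct parallel in probability theory. When the measure space is a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) and the functions are random variables, three of the measure-theoretic modes go by their standard probabilistic names:

  • \(L^p\) convergence is convergence in \(p\)-th mean: \(\mathbb{E}\bigl[|X_n - X|^p\bigr] \to 0\).
  • Pointwise a.e. convergence is almost sure convergence: \(\mathbb{P}(X_n \to X) = 1\).
  • Convergence in measure is convergence in probability: \(\mathbb{P}(|X_n - X| > \epsilon) \to 0\) for every \(\epsilon > 0\).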

A fourth mode — convergence in distribution — has no direct analogue in the function-space setting and is inherently probabilistic. The full picture, including the relationships between these probabilistic modes of convergence and their role in the law of large numbers and central limit theorem, is the subject of a future chapter on measure-theoretic probability.

Why Complete Function Spaces Are Essential

We have now proven that \(L^p\) is a Banach space: a normed vector space in which every Cauchy sequence converges. In Completeness, we motivated this property for general metric spaces as the absence of "holes." But for function spaces, completeness carries a far more concrete significance: it guarantees that the result of a limiting operation is still a legitimate object in the space — a function with finite energy, a probability distribution with finite moments, or a physically meaningful quantum state.

We close this chapter by examining three domains where completeness of \(L^p\) is not a mathematical luxury but an absolute necessity.

Probability Theory: Finite Moments and Estimation

In probability, a random variable \(X\) on a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) is simply a measurable function \(X : \Omega \to \mathbb{R}\). Saying \(X \in L^p(\Omega, \mathbb{P})\) means precisely that the \(p\)-th moment is finite: \[ \mathbb{E}\bigl[|X|^p\bigr] \;=\; \int_\Omega |X|^p \, d\mathbb{P} \;<\; \infty. \] The case \(p = 2\) is especially important: \(X \in L^2\) means that both the mean and the variance are finite, and \(L^2(\Omega, \mathbb{P})\) is a Hilbert space with inner product \(\langle X, Y \rangle = \mathbb{E}[XY]\).

Completeness of \(L^2\) guarantees that the orthogonal projection onto any closed subspace exists. This is the mathematical foundation of least-squares estimation: for \(X \in L^2\), the conditional expectation \(\mathbb{E}[X \mid \mathcal{G}]\) is the \(L^2\)-projection of \(X\) onto the closed subspace \(L^2(\Omega, \mathcal{G}, \mathbb{P})\) of square-integrable \(\mathcal{G}\)-measurable random variables. Without completeness, a minimizing sequence of candidate estimates could fail to converge within the space: the "best estimate" might not exist as a random variable with finite variance.
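On a finite probability space the projection picture can be checked by hand. The following sketch assumes a six-point sample space with uniform probabilities and a sub-\(\sigma\)-algebra \(\mathcal{G}\) generated by a partition into three blocks. The names `cond_exp`, `blocks`, `mse`, and the sample values are hypothetical, chosen only for illustration.

```python
# Finite probability space: outcomes 0..5, each with probability 1/6.
prob = [1 / 6] * 6
X = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]

# G is generated by this partition; G-measurable = constant on each block.
blocks = [[0, 1], [2, 3, 4], [5]]

def expect(values):
    """E[values] under the probabilities in `prob`."""
    return sum(v * q for v, q in zip(values, prob))

def cond_exp(values):
    """E[values | G]: the probability-weighted average on each block."""
    out = [0.0] * len(values)
    for block in blocks:
        mass = sum(prob[w] for w in block)
        avg = sum(values[w] * prob[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

def mse(candidate):
    """Squared L^2 distance E[(X - candidate)^2]."""
    return expect([(x - c) ** 2 for x, c in zip(X, candidate)])

P = cond_exp(X)
residual = [x - q for x, q in zip(X, P)]

# Orthogonality: the residual is orthogonal to every block indicator,
# hence to every G-measurable random variable.
for block in blocks:
    indicator = [1.0 if w in block else 0.0 for w in range(6)]
    print(expect([r * i for r, i in zip(residual, indicator)]))  # ~0

# Another G-measurable candidate has a larger mean-squared error.
other = [3.0, 3.0, 4.0, 4.0, 4.0, 6.0]
print(mse(P), mse(other))
```

In this toy setting the subspace is finite-dimensional, so existence of the projection is automatic. Completeness is what carries the same argument over to infinite-dimensional \(L^2\).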

Hölder's inequality also takes on a probabilistic reading: for conjugate exponents \(1 < p, q < \infty\) with \(1/p + 1/q = 1\) and random variables \(X \in L^p\) and \(Y \in L^q\), \[ \mathbb{E}[|XY|] \;\leq\; \bigl(\mathbb{E}[|X|^p]\bigr)^{1/p} \, \bigl(\mathbb{E}[|Y|^q]\bigr)^{1/q}. \] This bounds the expectation of a product in terms of individual moment conditions. It is a tool used constantly in proving concentration inequalities, convergence theorems, and the convergence rates of estimators.
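The inequality can be sanity-checked on simulated data. The sketch below treats the empirical distribution of a sample as the probability measure, so Hölder's inequality holds exactly for the sample averages (up to floating-point rounding). The distributions, seed, and the helper name `moment_norm` are arbitrary illustrative choices.

```python
import random

def moment_norm(sample, p):
    """(E|X|^p)^(1/p) under the empirical measure of `sample`."""
    return (sum(abs(x) ** p for x in sample) / len(sample)) ** (1.0 / p)

rng = random.Random(0)          # fixed seed, arbitrary choice
n = 10_000
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
Y = [rng.expovariate(1.0) for _ in range(n)]

lhs = sum(abs(x * y) for x, y in zip(X, Y)) / n   # E|XY|

for p in (1.5, 2.0, 3.0, 10.0):
    q = p / (p - 1.0)                             # conjugate exponent
    rhs = moment_norm(X, p) * moment_norm(Y, q)
    print(f"p={p:5.1f}  q={q:6.3f}  E|XY|={lhs:.4f}  bound={rhs:.4f}")
```

The case \(p = q = 2\) is the Cauchy-Schwarz inequality. Comparing the printed bounds across the different values of \(p\) shows how the choice of exponent trades a moment condition on \(X\) against one on \(Y\).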

Signal Processing: Finite Energy and Fourier Reconstruction

In signal processing, a signal \(f : \mathbb{R} \to \mathbb{C}\) has finite energy if \[ \|f\|_2^2 \;=\; \int_{-\infty}^{\infty} |f(t)|^2 \, dt \;<\; \infty. \] The space of finite-energy signals is exactly \(L^2(\mathbb{R})\). With the convention \(\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t} \, dt\), Plancherel's theorem states that the Fourier transform preserves this energy up to a constant: \[ \|f\|_{L^2}^2 \;=\; \frac{1}{2\pi}\|\hat{f}\|_{L^2}^2. \] In other words, the normalized Fourier transform \(f \mapsto \hat{f}/\sqrt{2\pi}\) is a unitary operator on \(L^2(\mathbb{R})\): an isometry that maps \(L^2\) onto itself.
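A finite-dimensional analogue of this energy identity is easy to verify. The sketch below uses the discrete Fourier transform on \(\mathbb{C}^N\) as a stand-in for the transform on \(L^2(\mathbb{R})\). It is not the continuous transform itself, and the test signal and function names are hypothetical. For the unnormalized DFT the identity reads \(\sum_n |x_n|^2 = \frac{1}{N}\sum_k |X_k|^2\), with \(1/N\) playing the role of \(1/(2\pi)\).

```python
import cmath
import math

def dft(x):
    """Unnormalized DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def energy(v):
    """Sum of squared magnitudes."""
    return sum(abs(c) ** 2 for c in v)

N = 64
# Sampled Gaussian pulse with a complex carrier (an arbitrary test signal).
signal = [
    math.exp(-((n - N / 2) ** 2) / 50.0) * cmath.exp(0.7j * n)
    for n in range(N)
]

spectrum = dft(signal)
time_energy = energy(signal)
freq_energy = energy(spectrum) / N   # the 1/N mirrors the 1/(2*pi) above
print(time_energy, freq_energy)
```

The two printed energies agree up to rounding. In \(\mathbb{C}^N\) completeness is automatic, which is why the discrete identity needs no analogue of the Riesz-Fischer theorem.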

Completeness is what makes this statement possible in the first place. The defining integral converges only for \(f \in L^1\), so the Fourier transform on \(L^2\) is constructed by extension: one defines it on a dense subspace such as \(L^1 \cap L^2\), where the energy identity above already holds, and extends it to all of \(L^2\) by taking limits of Cauchy sequences. If \(L^2\) had "holes," those limits need not exist, and there would be frequency representations that correspond to no legitimate time-domain signal, or vice versa. Completeness ensures that the Fourier transform is a bijection on \(L^2\), that every finite-energy spectrum reconstructs a finite-energy signal, and that Parseval's identity holds with exact equality. The entire mathematical framework of spectral analysis rests on the Riesz-Fischer theorem.

Quantum Mechanics: Wave Functions and Unitary Evolution

In quantum mechanics, the state of a particle is described by a wave function \(\psi \in L^2(\mathbb{R}^3)\) satisfying the normalization condition \(\|\psi\|_2 = 1\). The physical interpretation is Born's rule: \(|\psi(x)|^2\) is the probability density for finding the particle at position \(x\). The \(L^2\) norm being \(1\) ensures that the total probability integrates to \(1\).

Time evolution is governed by the Schrödinger equation. For a self-adjoint Hamiltonian \(H\), its solution is given by a one-parameter family of unitary operators \(U(t) = e^{-iHt/\hbar}\) acting on \(L^2(\mathbb{R}^3)\), so that \(\psi(t) = U(t)\psi(0)\). Unitarity means \(\|U(t)\psi\|_2 = \|\psi\|_2 = 1\) for all \(t\): probability is conserved under time evolution.
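Norm conservation can be seen explicitly in the smallest nontrivial case. The sketch below replaces \(L^2(\mathbb{R}^3)\) with the two-dimensional state space \(\mathbb{C}^2\) and assumes the hypothetical Hamiltonian \(H = \tfrac{\hbar\omega}{2}\sigma_x\), for which \(e^{-iHt/\hbar}\) has a closed form. The state `psi0`, the frequency `omega`, and the function names are illustrative assumptions.

```python
import math

def evolve(psi, omega, t):
    """Apply U(t) = exp(-i H t / hbar) for H = (hbar * omega / 2) * sigma_x.

    Because sigma_x squared is the identity, the exponential series sums to
    cos(omega t / 2) * I - i * sin(omega t / 2) * sigma_x.
    """
    c = math.cos(omega * t / 2.0)
    s = math.sin(omega * t / 2.0)
    a, b = psi
    return (c * a - 1j * s * b, -1j * s * a + c * b)

def norm(psi):
    """Euclidean norm on C^2, the analogue of the L^2 norm."""
    return math.sqrt(sum(abs(z) ** 2 for z in psi))

psi0 = (0.6, 0.8j)                 # normalized: 0.36 + 0.64 = 1
omega = 2.0                        # hypothetical angular frequency

for t in (0.0, 0.5, 1.0, 5.0):
    psi_t = evolve(psi0, omega, t)
    print(t, norm(psi_t))          # stays at 1 up to rounding

# Group property U(s) U(t) = U(s + t), checked on psi0.
two_steps = evolve(evolve(psi0, omega, 0.3), omega, 0.9)
one_step = evolve(psi0, omega, 1.2)
print(two_steps, one_step)
```

The components of the state change with \(t\), but the norm does not, which is the finite-dimensional shadow of probability conservation.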

If \(L^2\) were not complete, the time evolution of a quantum state could converge to a limit that is not a valid wave function — it would have infinite energy or fail to be square-integrable, making the probability interpretation collapse. Completeness guarantees that unitary evolution stays within the space of physical states, and that the spectral decomposition of observables (the spectral theorem) produces well-defined measurement outcomes. In this sense, the Riesz-Fischer theorem is not merely a mathematical convenience — it is a precondition for the logical consistency of quantum theory.

The Common Thread

Across all three domains, the pattern is the same. Each field relies on limiting operations — expectations of infinite sums, inverse Fourier transforms, time evolution of differential equations — and completeness is the guarantee that these limits remain within the space of objects that have physical or mathematical meaning. An estimator with finite variance. A signal with finite energy. A quantum state with total probability one.

In Completeness, we described a complete metric space as one "without holes." Here we see what that metaphor means concretely for function spaces: a "hole" in \(L^p\) would be a sequence of perfectly legitimate functions — each with finite \(p\)-th integral — whose limit escapes to something infinite, undefined, or physically meaningless. The Riesz-Fischer theorem seals every such hole.

Looking Ahead

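This chapter has established \(L^p\) as a Banach space and settled the proof debts deferred from Intro to Functional Analysis and Dual Spaces. The road ahead branches in two complementary directions: toward Hilbert space theory, where \(L^2\) and its inner product take center stage, and toward measure-theoretic probability, where \(L^p\) spaces of random variables carry the theory of moments and convergence.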

Historically, the theorem that bears the names of Frigyes Riesz and Ernst Fischer was originally proven (independently, in 1907) for the case \(p = 2\) alone: they showed that \(L^2\) is complete, and that the map sending a function to its Fourier coefficients is an isometric isomorphism between \(L^2\) of a bounded interval and \(\ell^2\). The generalization to all \(1 \leq p \leq \infty\) came later, but the original \(L^2\) result remains the most consequential: it is what makes Fourier analysis in Hilbert spaces possible.

Both paths build directly on the completeness of \(L^p\) proven here — the first by specializing to the richest structure (\(L^2\) as a Hilbert space), the second by specializing to the richest interpretation (\(L^p\) of random variables on a probability space).