Introduction
We briefly introduced completeness in earlier chapters as a property ensuring metric spaces have
"no holes." Now, equipped with the theory of convergence and Cauchy sequences, we can develop
completeness fully and prove one of the most important theorems in analysis:
Banach's Fixed-Point Theorem.
This theorem states that in a complete metric space, every contraction mapping has
a unique fixed point, and iterative application of the mapping converges to that fixed point. This is not an
abstract curiosity. It is the theoretical backbone of countless algorithms: Newton's method, value iteration in reinforcement learning, and many optimization schemes.
Understanding why these algorithms converge requires understanding completeness at a
deeper level than our initial preview allowed.
Complete Metric Spaces
Recall that we first defined a complete metric space via the universal criterion for
completeness. (See Part 16.)
Universal Criterion for Completeness
A metric space \(X\) is said to be complete if and only if
\(X\) is closed in every metric superspace of \(X\).
After that, we characterized completeness in terms of Cauchy sequences.
(See Part 17.)
Theorem: Cauchy Criterion for Completeness
A metric space \(X\) is said to be complete if and only if
every Cauchy sequence in \(X\) converges in \(X\).
A third equivalent way to define completeness is given by virtual points.
Definition: Virtual Points
Suppose \((X, d)\) is a metric space and \(u: X \to \mathbb{R}\). Then \(u\) is called a
virtual point
of \(X\) if and only if \(u\) satisfies the following conditions:
- \(\forall a, b \in X, \, |u(a) - u(b)| \leq d(a, b) \leq u(a) + u(b)\).
- \(\inf u(X) = 0\).
- \(0 \notin u(X)\).
Theorem: Virtual Point Criterion for Completeness
A metric space \(X\) is said to be complete if and only if \(X\) has no virtual points.
A virtual point is a function that "measures distance to a missing point." If \(X\) has a hole
(like \(\mathbb{Q}\) missing \(\sqrt{2}\)), we can define \(u: \mathbb{Q} \to \mathbb{R}\) by
\(u(q) = |q - \sqrt{2}|\). This function satisfies all three conditions: it respects the triangle
inequality, its infimum is 0 (rationals get arbitrarily close to \(\sqrt{2}\)), yet no rational
achieves distance 0.
The virtual point criterion is equivalent to the Cauchy criterion: every non-convergent
Cauchy sequence \((x_n)\) in an incomplete space induces a virtual point via
\(u(a) = \lim_{n \to \infty} d(a, x_n)\).
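To make the \(\sqrt{2}\) example concrete, the three conditions can be checked numerically. The Python sketch below is illustrative only; the sample rationals are continued-fraction convergents of \(\sqrt{2}\), chosen so that \(u\) gets close to its infimum:

```python
import math
from fractions import Fraction

# Candidate virtual point on Q: u(q) = |q - sqrt(2)|.
def u(q):
    return abs(float(q) - math.sqrt(2))

# Continued-fraction convergents of sqrt(2) drive u toward 0.
samples = [Fraction(1), Fraction(3, 2), Fraction(7, 5), Fraction(17, 12),
           Fraction(41, 29), Fraction(99, 70), Fraction(239, 169)]

# Condition 1: |u(a) - u(b)| <= d(a, b) <= u(a) + u(b) for all pairs.
for a in samples:
    for b in samples:
        d = abs(float(a - b))
        assert abs(u(a) - u(b)) <= d + 1e-12
        assert d <= u(a) + u(b) + 1e-12

# Condition 2: inf u(Q) = 0 -- better convergents make u arbitrarily small.
print(min(u(q) for q in samples))  # ≈ 1.2e-5 for 239/169

# Condition 3: u never attains 0, since sqrt(2) is irrational.
assert all(u(q) > 0 for q in samples)
```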
Insight: Why Completeness Matters in Machine Learning
Parameter spaces for neural networks live in \(\mathbb{R}^n\), which is complete. Thus,
gradient-based optimization schemes have a valid "target" to converge to (provided contraction conditions hold).
However, function spaces require careful treatment:
- \(C([0,1])\) with the uniform metric \(d_\infty\) is complete: uniform limits of
continuous functions are continuous.
- \(C([0,1])\) with the integral metric \(d_1(f,g) = \int_0^1 |f(x) - g(x)| \, dx\) is
not complete. Its completion is the Lebesgue space \(L^1([0,1])\).
- In Reinforcement Learning, the space of value functions is often modeled using
\(L^\infty\) (supremum norm) to ensure completeness. This guarantees that the Bellman operator,
which is a contraction, actually has a fixed point (the optimal value function) within the space.
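The incompleteness of \((C([0,1]), d_1)\) can be observed numerically. In the sketch below (an illustrative setup, not a proof), the steepening continuous ramps \(f_n(x) = \min(1, \max(0, n(x - 1/2)))\) form a Cauchy sequence in \(d_1\), yet their pointwise limit is the discontinuous step function at \(1/2\), which lies outside \(C([0,1])\):

```python
import numpy as np

# Continuous ramps f_n(x) = clip(n(x - 1/2), 0, 1): one checks by hand that
# d_1(f_n, f_m) = |1/(2n) - 1/(2m)|, so the sequence is Cauchy in d_1, but
# its pointwise limit is a step function -- not continuous.
def ramp(n, x):
    return np.clip(n * (x - 0.5), 0.0, 1.0)

x = np.linspace(0.0, 1.0, 200001)

def d1(f, g):
    # Approximates d_1(f, g) = ∫_0^1 |f - g| dx by a Riemann sum on [0, 1].
    return np.abs(f - g).mean()

print(d1(ramp(10, x), ramp(20, x)))    # ≈ 1/20 - 1/40 = 0.025
print(d1(ramp(100, x), ramp(200, x)))  # ≈ 0.0025: distances shrink (Cauchy)
```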
Completion of a Metric Space
What if our space isn't complete? Can we "fill the holes"? In fact, every metric space has a minimal
complete superspace.
Definition: Completion of a Metric Space
Suppose \((X, d)\) is a metric space. A metric space \((Y, e)\) is called a completion
of \((X, d)\) if and only if \((Y, e)\) is complete and \((X, d)\) is isometric to a dense subspace of
\((Y, e)\).
From a mathematical perspective, the existence of a completion is only half the story. If we could
"complete" a space in multiple ways that result in fundamentally different structures, the concept
would lose its power. The Uniqueness Theorem ensures that the completion of a metric
space is unique "up to isometry" - meaning any two completions are essentially identical in their
distance-preserving structure.
In Computer Science, this is about the canonicity of representation.
Consider the challenge of representing real numbers using different underlying logic or data structures.
The Uniqueness Theorem provides the formal proof that if we design a system to handle "incomplete" data (like rational numbers)
and extend it to handle all limit points, the resulting system's behavioral properties are invariant regardless
of the underlying encoding (e.g., Cauchy sequences vs. Dedekind cuts). This ensures that algorithms relying on the "structure"
of the space remain robust across different implementations.
Theorem: Uniqueness of Completion
Suppose that \((X, d)\) is a metric space and \((X', m)\) and \((X'', s)\) are completions of \(X\),
where \(\psi: X \to X'\) and \(\phi: X \to X''\) are isometries onto dense subspaces of \(X'\) and \(X''\), respectively.
Then there exists an isometry from \(X'\) to \(X''\) that maps \(\psi(X)\) onto \(\phi(X)\).
To prove this theorem, we require the following results:
- Continuous Extension Theorem:
Suppose \((X, d)\) and \((Y, e)\) are metric spaces and \(Y\) is complete. Suppose \(S\) is a
dense subset of \(X\) and \(f: S \to Y\) is a uniformly continuous function. Then there exists a unique
continuous function \(\tilde{f}:X \to Y\) such that \(\tilde{f} \mid_S = f\). Moreover, this extension
\(\tilde{f}\) is also uniformly continuous.
- Isometry Extension Theorem:
Suppose \((X, d)\) and \((Y, e)\) are metric spaces and \(Y\) is complete. Suppose \(S\) is a
dense subset of \(X\) and \(f: S \to Y\) is an isometric map. Then the unique continuous extension
of \(f\) to \(X\) is also an isometry.
Note that the extension of functions is defined as follows:
Definition: Extension and Restriction
Suppose \(f\) and \(g\) are functions. We say that \(f\) is a restriction of \(g\) and that
\(g\) is an extension of \(f\) if and only if \(\text{dom}(f) \subseteq \text{dom}(g) \) and
\(f(x) = g(x)\) for all \(x \in \text{dom}(f)\). In this case, if \(A = \text{dom}(f)\), we say that \(f\) is
the restriction of \(g\) to \(A\) and write \(f = g \mid_A\).
Below is the outline of the proof for the Uniqueness of Completion.
Proof Sketch:
The map \(\phi \circ \psi^{-1}\) is an isometry from the dense subspace \(\psi(X)\) of \(X'\) onto
the dense subspace \(\phi(X)\) of \(X''\).
Since \(X''\) is complete, \(\phi \circ \psi^{-1}\) has a unique continuous extension \(f\) to \(X'\) by
the Continuous Extension Theorem, and this extension is isometric by the Isometry Extension Theorem.
Because \(f\) is an isometry, its range is complete and therefore closed in \(X''\).
Since this range includes \(\phi(X)\), it must also include the closure \(\overline{\phi(X)}\), which is \(X''\).
Thus, \(f\) is an isometry from \(X'\) onto \(X''\).
Example: \(\mathbb{Q}\) Completes to \(\mathbb{R}\)
The completion of the rational numbers \((\mathbb{Q}, |\cdot|)\) is the real numbers \((\mathbb{R}, |\cdot|)\).
While we typically construct \(\mathbb{R}\) as equivalence classes of Cauchy sequences, there
are other historically significant methods, such as Dedekind cuts.
- Cauchy Sequences:
Constructing \(\mathbb{R}\) via the limits of sequences that "should" converge
(e.g., \((1, 1.4, 1.41, \dots) \to \sqrt{2}\)).
- Dedekind Cuts:
Constructing \(\mathbb{R}\) by partitioning \(\mathbb{Q}\) into two sets
(those less than \(\sqrt{2}\) and those greater).
The Uniqueness Theorem provides the vital bridge: it proves that
these two entirely different mathematical "implementations" result in the same structural reality.
Without this theorem, we couldn't confidently speak of "The Real Numbers"; we would be stuck
with "the real numbers according to Cauchy" or "the real numbers according to Dedekind." Uniqueness
allows us to treat \(\mathbb{R}\) as a single, canonical abstract data type.
Cantor's Intersection Theorem
Cantor's Intersection Theorem provides an alternative characterization of completeness through nested closed sets.
This perspective is particularly useful in existence proofs, where we construct a solution by progressively
narrowing the search region.
Definition: Nest
A non-empty collection \(\mathcal{N}\) of sets is called a nest if and only if for
each \(A, B \in \mathcal{N}\), either \(A \subseteq B\) or \(B \subseteq A\).
Cantor's Intersection Theorem
Suppose \((X, d)\) is a metric space and \(\mathcal{F}\) is a nest of non-empty subsets of
\(X\) for which \(\inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0\).
Suppose \(\bigcap \left\{\text{Cl}_{X}(A) \mid A \in \mathcal{F} \right\} = \emptyset\). Then, given \(z \notin X\),
\(d\) can be extended to be a metric on \(X' = X \cup \{z\}\) in such a way that
\[
\forall A \in \mathcal{F}, \, \text{Cl}_{X'} (A) = \text{Cl}_{X}(A) \cup \{z\}.
\]
Thus,
\[
\bigcap \left\{\text{Cl}_{X'} (A) \mid A \in \mathcal{F} \right\} = \{z\}.
\]
This leads to the following fundamental criteria for completeness, framing it as the guarantee that
"shrinking containers" always trap a point.
Theorem: Nest & Nested Sequence Criteria for Completeness
A metric space \(X\) is said to be complete if and only if
every nest \(\mathcal{F}\) of non-empty closed subsets of \(X\) for which
\[
\inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0
\]
has a singleton intersection.
The following statement is equivalent:
A metric space \(X\) is said to be complete if and only if
every sequence \(\{F_n\}\) of non-empty closed subsets of \(X\) for which \(F_{n+1} \subseteq F_n\)
for each \(n \in \mathbb{N}\) and \(\text{diam }(F_n) \to 0\) has a singleton (hence non-empty) intersection.
This nest criterion also yields an important characterization of complete subsets.
Theorem: Complete Subset
Suppose \(X\) is a complete metric space and \(S \subseteq X\). Then \(S\) is complete if and only if
\(S\) is closed in \(X\).
CS Insight: The Bridge Between Ideal Convergence and Algorithmic Reality
While computer algorithms are inherently finite, Cantor's Intersection Theorem provides the
logical justification for their convergence: it guarantees that the ideal limit object an algorithm approximates actually exists.
- Bisection Method (Termination Logic):
Numerical solvers rely on Cantor's theorem to guarantee that a "true" root exists within the
shrinking interval. When we stop at a tolerance \(\epsilon\), the theorem ensures that our
result is a valid approximation of a unique point in \(\mathbb{R}\), rather than an empty search
in a space with "holes."
- Spatial Indexing (Numerical Stability):
Data structures like k-d trees partition space into nested regions. The theorem justifies the stability
of point-location queries: as the diameter of the partition shrinks, the identity of the trapped point
becomes mathematically certain.
- Floating-Point as Limit Logic:
Floating-point numbers are discrete and finite, but they are designed to shadow the complete
structure of \(\mathbb{R}\). Cantor's theorem is the blueprint for this design, allowing us to treat
increasing precision as a process of converging toward a canonical mathematical object.
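The bisection logic in the first bullet can be sketched directly: each iteration halves a closed interval known to contain a root, producing exactly the kind of shrinking nest of closed sets that Cantor's theorem says must trap a unique point. A minimal Python sketch:

```python
# Bisection as a concrete instance of Cantor's theorem: the nested closed
# intervals [a, b] have diameters -> 0, so they trap exactly one root.
def bisect(g, a, b, tol=1e-12):
    assert g(a) * g(b) < 0  # a sign change guarantees a root in [a, b]
    while b - a > tol:
        m = (a + b) / 2
        if g(a) * g(m) <= 0:
            b = m  # root trapped in the left half
        else:
            a = m  # root trapped in the right half
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 0.0, 2.0)
print(root)  # ≈ sqrt(2) = 1.41421356...
```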
The Banach Fixed-Point Theorem
We now formally explain why many optimization methods converge.
Definition: Fixed Point
Suppose \(X\) is a non-empty set and \(f: X \to X\). A point \(x \in X\) is called a fixed point
for \(f\) if and only if \(f(x) = x\).
Banach's Fixed-Point Theorem
Suppose \((X, d)\) is a complete metric space and \(f: X \to X\) is a
(strong) contraction on
\(X\) with Lipschitz constant \(k \in (0, 1)\). Then \(f\) has a unique fixed point in \(X\) and, for
each \(w \in X\), the sequence \(\{f^n(w)\}\) converges to this point.
Note that for \(n \in \mathbb{N}\), \(f^n\) is the composition of \(n\) copies of \(f\).
We shall use the following theorem in the proof.
Theorem:
Suppose \((X, d)\) is a metric space and \(f\) is a (strong) contraction on \(X\). Then,
- for each \(a, b \in X\), the real sequence \(\{d(f^n(a), f^n(b))\}\) converges to \(0\).
- for each \(x \in X\), the sequence \(\{f^n(x)\}\) is a Cauchy sequence in \(X\).
Proof:
Suppose that \(w \in X\). By the above theorem, \(\{f^n(w)\}\) is Cauchy in \(X\), and since \(X\) is
complete, this sequence converges in \(X\).
Let \(z = \lim f^n(w)\). Since the contraction \(f\) is uniformly continuous, and hence continuous,
applying \(f\) to the convergent sequence gives
\[
f(z) = \lim f^{n+1}(w)
\]
and since \(\{f^{n+1}(w)\}\) is a subsequence of \(\{f^n(w)\}\),
\[
\lim_{n \to \infty} f^{n+1}(w) = \lim_{n \to \infty} f^n(w) = z,
\]
so that \(f(z) = z\) as required.
The contraction \(f\) has no other fixed point: if \(a \in X\) were such a point, we would have
\[
d(a, z) = d(f(a), f(z)) \leq k \, d(a, z),
\]
which, because \(k < 1\), forces \(d(a, z) = 0\) and thus \(a = z\).
Corollary: Convergence Rate
Under the hypotheses of Banach's theorem, let \(z\) be the unique
fixed point. For any starting point \(w \in X\):
\[
d(f^n(w), z) \leq \frac{k^n}{1-k} \cdot d(w, f(w)).
\]
In particular, convergence is geometric (linear on a logarithmic scale) with rate \(k\).
Proof:
For \(m > n\), the triangle inequality and a geometric series bound give:
\[
d(f^n(w), f^m(w)) \leq \sum_{i=n}^{m-1} d(f^i(w), f^{i+1}(w))
\leq \sum_{i=n}^{m-1} k^i \cdot d(w, f(w))
\leq \frac{k^n}{1-k} \cdot d(w, f(w)).
\]
Taking \(m \to \infty\) yields the result.
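Putting the theorem and its corollary to work: the sketch below iterates the contraction \(f = \cos\) on \([0, 1]\), where \(|f'(x)| = |\sin x| \leq \sin 1\), so \(k = \sin 1 \approx 0.84\), and checks the a-priori error bound along the way. The choice of \(f\) and starting point are illustrative:

```python
import math

# Fixed-point iteration for f = cos on [0, 1]: a contraction with
# k = sin(1) ≈ 0.8415, converging to the unique fixed point
# (the "Dottie number", ≈ 0.739085).
k = math.sin(1.0)
w = 1.0
xs = [w]
for _ in range(100):
    xs.append(math.cos(xs[-1]))
z = xs[-1]
print(z)  # ≈ 0.739085

# A-priori bound from the corollary:
# d(f^n(w), z) <= k^n / (1 - k) * d(w, f(w)).
d0 = abs(w - math.cos(w))
for n in (5, 10, 20):
    assert abs(xs[n] - z) <= k ** n / (1 - k) * d0
```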
Insight: Fixed-Point Iterations in Optimization and Learning
Banach's theorem transforms existence and convergence questions into
verification of two conditions: (1) the space is complete, and
(2) the iteration mapping is a contraction.
Newton's Method
For solving \(g(x) = 0\), Newton's iteration is
\[
f(x) = x - \frac{g(x)}{g'(x)}.
\]
Under sufficient smoothness and starting close enough to a simple root \(x^*\), one can show
\(f\) is a contraction on a neighborhood of \(x^*\) (in fact,
convergence is quadratic, not merely linear).
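A minimal sketch of this iteration for \(g(x) = x^2 - 2\), with the error to the root \(\sqrt{2}\) tracked at each step (the starting point 1.5 is an arbitrary choice near the root):

```python
import math

# Newton's method for g(x) = x^2 - 2, recast as fixed-point iteration of
# f(x) = x - g(x)/g'(x). Near sqrt(2) the map contracts, and convergence
# is quadratic: the error roughly squares at each step.
def f(x):
    return x - (x * x - 2.0) / (2.0 * x)

x = 1.5
errors = []
for _ in range(5):
    x = f(x)
    errors.append(abs(x - math.sqrt(2)))

print(x)       # converges to sqrt(2) ≈ 1.41421356...
print(errors)  # roughly ~2e-3, ~2e-6, ~2e-12, then machine precision
```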
Gradient Descent (Convex Case)
For minimizing \(h: \mathbb{R}^n \to \mathbb{R}\) with
\(L\)-Lipschitz gradient and strong convexity parameter \(\mu > 0\):
\[
f(x) = x - \alpha \nabla h(x)
\]
is a contraction with \(k = \max(|1 - \alpha L|, |1 - \alpha \mu|)\).
Choosing \(\alpha = \frac{2}{L + \mu}\) minimizes \(k\) to
\(\frac{L - \mu}{L + \mu} < 1\).
Note: In Deep Learning, loss functions are rarely convex, so global contraction is not guaranteed,
but local behavior around minima often approximates this dynamics.
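In the strongly convex case, the contraction factor can be observed directly. The quadratic below, with \(A = \mathrm{diag}(1, 10)\) so that \(\mu = 1\) and \(L = 10\), is an illustrative choice:

```python
import numpy as np

# Gradient descent on h(x) = 1/2 x^T A x with A = diag(1, 10): mu = 1,
# L = 10. With alpha = 2/(L + mu), each step contracts the distance to the
# minimizer (the origin) by exactly k = (L - mu)/(L + mu) = 9/11.
A = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
alpha = 2.0 / (L + mu)
k = (L - mu) / (L + mu)

x = np.array([1.0, 1.0])
for _ in range(10):
    x_new = x - alpha * (A @ x)    # grad h(x) = A x
    ratio = np.linalg.norm(x_new) / np.linalg.norm(x)
    assert ratio <= k + 1e-12      # the contraction bound holds each step
    x = x_new
print(np.linalg.norm(x))  # ≈ sqrt(2) * k^10
```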
Value Iteration (Reinforcement Learning)
The Bellman backup
\[
V_{k+1}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p_T(s' \mid s, a) V_k (s') \right]
\]
is a \(\gamma\)-contraction in the supremum norm \(\|\cdot\|_\infty\), where \(\gamma \in (0,1)\) is the
discount factor. The space of bounded value functions is complete under \(\|\cdot\|_\infty\), so value
iteration converges to the unique optimal value function \(V_*\) as \(k \to \infty\).
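A toy demonstration of both the contraction property and the convergence (the MDP's transition probabilities and rewards below are made up for illustration):

```python
import numpy as np

# Tiny 2-state, 2-action MDP with made-up numbers: P[a, s, s'] is the
# probability of moving s -> s' under action a; R[s, a] is the reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.5],
              [0.0, 2.0]])
gamma = 0.9

def bellman(V):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] V[s'];
    # (P @ V) has shape (actions, states), so transpose to (states, actions).
    return (R + gamma * (P @ V).T).max(axis=1)

# The backup is a gamma-contraction in the sup norm ...
rng = np.random.default_rng(0)
V1, V2 = rng.normal(size=2), rng.normal(size=2)
assert np.max(np.abs(bellman(V1) - bellman(V2))) <= \
    gamma * np.max(np.abs(V1 - V2)) + 1e-12

# ... so iterating from any start converges to the unique fixed point V*.
V = np.zeros(2)
for _ in range(500):
    V = bellman(V)
print(V)  # the optimal value function V* of this toy MDP
```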
Neumann Series and Iterative Methods:
In Policy Evaluation, we solve the linear system \(\mathbf{v} = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\), which formally yields
\(\mathbf{v} = (\mathbf{I} - \gamma \mathbf{T})^{-1} \mathbf{r}\). Calculating this inverse directly is computationally expensive
(\(O(|\mathcal{S}|^3)\)).
However, because the operator \(T_{\gamma}(\mathbf{v}) = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\) is a
contraction, we can apply the Neumann Series expansion:
\[
(\mathbf{I} - \gamma \mathbf{T})^{-1} = \sum_{k=0}^{\infty} (\gamma \mathbf{T})^k = \mathbf{I} + \gamma \mathbf{T} + \gamma^2 \mathbf{T}^2 + \dots
\]
This power series converges because the spectral radius \(\rho(\gamma \mathbf{T}) < 1\). This provides the mathematical
guarantee that the iterative updates in Reinforcement Learning (CS approach) will perfectly converge to the
analytical solution of the Bellman equation (Math approach).
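The agreement of the two approaches can be checked on a small example. In the sketch below, the policy's transition matrix and rewards are illustrative assumptions:

```python
import numpy as np

# Policy evaluation two ways: direct solve of (I - gamma T) v = r versus the
# truncated Neumann series v ≈ sum_{k<K} (gamma T)^k r. T is a row-stochastic
# transition matrix for a fixed policy (illustrative numbers).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
r = np.array([1.0, 2.0])
gamma = 0.9

v_direct = np.linalg.solve(np.eye(2) - gamma * T, r)

v_neumann = np.zeros(2)
term = r.copy()
for _ in range(500):            # gamma^500 is negligible: 500 terms suffice
    v_neumann += term
    term = gamma * (T @ term)   # next term: (gamma T)^{k+1} r

print(np.max(np.abs(v_direct - v_neumann)))  # tiny: the series matches the inverse
```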