Introduction
We briefly introduced completeness in earlier chapters as a property ensuring metric spaces have
"no holes." Now, equipped with the theory of convergence and Cauchy sequences, we can develop
completeness fully and prove one of the most important theorems in analysis:
Banach's Fixed-Point Theorem.
This theorem states that in a complete metric space, every contraction mapping has
a unique fixed point, and iterative application of the mapping converges to that fixed point. This is not an
abstract curiosity. It is the theoretical backbone of countless algorithms: Newton's method, value iteration in reinforcement learning, and many optimization schemes.
Understanding why these algorithms converge requires understanding completeness at a
deeper level than our initial preview allowed.
Complete Metric Spaces
Recall that we first defined a complete metric space via the universal criterion for
completeness. (See Part 16.)
Universal Criterion for Completeness
A metric space \(X\) is said to be complete if and only if
\(X\) is closed in every metric superspace of \(X\).
After that, we characterized completeness in terms of Cauchy sequences.
(See Part 17.)
Theorem: Cauchy Criterion for Completeness
A metric space \(X\) is said to be complete if and only if
every Cauchy sequence in \(X\) converges in \(X\).
A third equivalent way to define completeness is given by virtual points.
Definition: Virtual Points
Suppose \((X, d)\) is a metric space and \(u: X \to \mathbb{R}\). Then \(u\) is called a
virtual point
of \(X\) if and only if \(u\) satisfies the following conditions:
- \(\forall a, b \in X, \, |u(a) - u(b)| \leq d(a, b) \leq u(a) + u(b)\).
- \(\inf u(X) = 0\).
- \(0 \notin u(X)\).
Theorem: Virtual Point Criterion for Completeness
A metric space \(X\) is said to be complete if and only if \(X\) has no virtual points.
A virtual point is a function that "measures distance to a missing point." If \(X\) has a hole
(like \(\mathbb{Q}\) missing \(\sqrt{2}\)), we can define \(u: \mathbb{Q} \to \mathbb{R}\) by
\(u(q) = |q - \sqrt{2}|\). This function satisfies all three conditions: it respects the triangle
inequality, its infimum is 0 (rationals get arbitrarily close to \(\sqrt{2}\)), yet no rational
achieves distance 0.
The virtual point criterion is equivalent to the Cauchy criterion: every non-convergent
Cauchy sequence \((x_n)\) in an incomplete space induces a virtual point via
\(u(a) = \lim_{n \to \infty} d(a, x_n)\).
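To make the \(\sqrt{2}\) example concrete, the three conditions can be checked numerically. The Python sketch below is illustrative only; the sample rationals are continued-fraction convergents of \(\sqrt{2}\), chosen so that \(u\) gets close to its infimum:

```python
import math
from fractions import Fraction

# Candidate virtual point on Q: u(q) = |q - sqrt(2)|.
def u(q):
    return abs(float(q) - math.sqrt(2))

# Continued-fraction convergents of sqrt(2) drive u toward 0.
samples = [Fraction(1), Fraction(3, 2), Fraction(7, 5), Fraction(17, 12),
           Fraction(41, 29), Fraction(99, 70), Fraction(239, 169)]

# Condition 1: |u(a) - u(b)| <= d(a, b) <= u(a) + u(b) for all pairs.
for a in samples:
    for b in samples:
        d = abs(float(a - b))
        assert abs(u(a) - u(b)) <= d + 1e-12
        assert d <= u(a) + u(b) + 1e-12

# Condition 2: inf u(Q) = 0 -- better convergents make u arbitrarily small.
print(min(u(q) for q in samples))  # ≈ 1.2e-5 for 239/169

# Condition 3: u never attains 0, since sqrt(2) is irrational.
assert all(u(q) > 0 for q in samples)
```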
Insight: Why Completeness Matters in Machine Learning
Parameter spaces for neural networks live in \(\mathbb{R}^n\), which is complete. Thus,
gradient-based optimization schemes have a valid "target" to converge to (provided contraction conditions hold).
However, function spaces require careful treatment:
- \(C([0,1])\) with the uniform metric \(d_\infty\) is complete: uniform limits of
continuous functions are continuous.
- \(C([0,1])\) with the integral metric \(d_1(f,g) = \int_0^1 |f(x) - g(x)| \, dx\) is
not complete. Its completion is the Lebesgue space \(L^1([0,1])\).
- In Reinforcement Learning, the space of value functions is often modeled using
\(L^\infty\) (supremum norm) to ensure completeness. This guarantees that the Bellman operator,
which is a contraction, actually has a fixed point (the optimal value function) within the space.
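The incompleteness of \((C([0,1]), d_1)\) can be observed numerically. In the sketch below (an illustrative setup, not a proof), the steepening continuous ramps \(f_n(x) = \min(1, \max(0, n(x - 1/2)))\) form a Cauchy sequence in \(d_1\), yet their pointwise limit is the discontinuous step function at \(1/2\), which lies outside \(C([0,1])\):

```python
import numpy as np

# Continuous ramps f_n(x) = clip(n(x - 1/2), 0, 1): one checks by hand that
# d_1(f_n, f_m) = |1/(2n) - 1/(2m)|, so the sequence is Cauchy in d_1, but
# its pointwise limit is a step function -- not continuous.
def ramp(n, x):
    return np.clip(n * (x - 0.5), 0.0, 1.0)

x = np.linspace(0.0, 1.0, 200001)

def d1(f, g):
    # Approximates d_1(f, g) = ∫_0^1 |f - g| dx by a Riemann sum on [0, 1].
    return np.abs(f - g).mean()

print(d1(ramp(10, x), ramp(20, x)))    # ≈ 1/20 - 1/40 = 0.025
print(d1(ramp(100, x), ramp(200, x)))  # ≈ 0.0025: distances shrink (Cauchy)
```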
Completion of a Metric Space
What if our space isn't complete? Can we "fill the holes"? In fact, every metric space has a minimal
complete superspace.
Definition: Completion of a Metric Space
Suppose \((X, d)\) is a metric space. A metric space \((Y, e)\) is called a completion
of \((X, d)\) if and only if \((Y, e)\) is complete and \((X, d)\) is isometric to a dense subspace of
\((Y, e)\).
From a mathematical perspective, the existence of a completion is only half the story. If we could
"complete" a space in multiple ways that result in fundamentally different structures, the concept
would lose its power. The Uniqueness Theorem ensures that the completion of a metric
space is unique "up to isometry" - meaning any two completions are essentially identical in their
distance-preserving structure.
In Computer Science, this is about the canonicity of representation.
Consider the challenge of representing real numbers using different underlying logic or data structures.
The Uniqueness Theorem provides the formal proof that if we design a system to handle "incomplete" data (like rational numbers)
and extend it to handle all limit points, the resulting system's behavioral properties are invariant regardless
of the underlying encoding (e.g., Cauchy sequences vs. Dedekind cuts). This ensures that algorithms relying on the "structure"
of the space remain robust across different implementations.
Theorem: Uniqueness of Completion
Suppose that \((X, d)\) is a metric space and \((X', m)\) and \((X'', s)\) are completions of \(X\),
where \(\psi: X \to X'\) and \(\phi: X \to X''\) are isometries onto dense subspaces of \(X'\) and \(X''\), respectively.
Then there exists an isometry from \(X'\) to \(X''\) that maps \(\psi(X)\) onto \(\phi(X)\).
To prove this theorem, we require the following results:
- Continuous Extension Theorem:
Suppose \((X, d)\) and \((Y, e)\) are metric spaces and \(Y\) is complete. Suppose \(S\) is a
dense subset of \(X\) and \(f: S \to Y\) is a uniformly continuous function. Then there exists a unique
continuous function \(\tilde{f}:X \to Y\) such that \(\tilde{f} \mid_S = f\). Moreover, this extension
\(\tilde{f}\) is also uniformly continuous.
- Isometry Extension Theorem:
Suppose \((X, d)\) and \((Y, e)\) are metric spaces and \(Y\) is complete. Suppose \(S\) is a
dense subset of \(X\) and \(f: S \to Y\) is an isometric map. Then the unique continuous extension
of \(f\) to \(X\) is also an isometry.
Note that the extension of functions is defined as follows:
Definition: Extension and Restriction
Suppose \(f\) and \(g\) are functions. We say that \(f\) is a restriction of \(g\) and that
\(g\) is an extension of \(f\) if and only if \(\text{dom}(f) \subseteq \text{dom}(g) \) and
\(f(x) = g(x)\) for all \(x \in \text{dom}(f)\). In this case, if \(A = \text{dom}(f)\), we say that \(f\) is
the restriction of \(g\) to \(A\) and write \(f = g \mid_A\).
Below is the outline of the proof for the Uniqueness of Completion.
Proof Sketch:
The map \(\phi \circ \psi^{-1}\) is an isometry from the dense subspace \(\psi(X)\) of \(X'\) onto
the dense subspace \(\phi(X)\) of \(X''\).
Since \(X''\) is complete, \(\phi \circ \psi^{-1}\) has a unique continuous extension \(f\) to \(X'\) by
the Continuous Extension Theorem, and this extension is isometric by the Isometry Extension Theorem.
Because \(f\) is an isometry, its range is complete and therefore closed in \(X''\).
Since this range includes \(\phi(X)\), it must also include the closure \(\overline{\phi(X)}\), which is \(X''\).
Thus, \(f\) is an isometry from \(X'\) onto \(X''\).
Example: \(\mathbb{Q}\) Completes to \(\mathbb{R}\)
The completion of the rational numbers \((\mathbb{Q}, |\cdot|)\) is the real numbers \((\mathbb{R}, |\cdot|)\).
While we typically construct \(\mathbb{R}\) as equivalence classes of Cauchy sequences, there
are other historically significant methods, such as Dedekind cuts.
- Cauchy Sequences:
Constructing \(\mathbb{R}\) via the limits of sequences that "should" converge
(e.g., \((1, 1.4, 1.41, \dots) \to \sqrt{2}\)).
- Dedekind Cuts:
Constructing \(\mathbb{R}\) by partitioning \(\mathbb{Q}\) into two sets
(those less than \(\sqrt{2}\) and those greater).
The Uniqueness Theorem provides the vital bridge: it proves that
these two entirely different mathematical "implementations" result in the same structural reality.
Without this theorem, we couldn't confidently speak of "The Real Numbers"; we would be stuck
with "the real numbers according to Cauchy" or "the real numbers according to Dedekind." Uniqueness
allows us to treat \(\mathbb{R}\) as a single, canonical abstract data type.
Cantor's Intersection Theorem
Cantor's Intersection Theorem provides an alternative characterization of completeness through nested closed sets.
This perspective is particularly useful in existence proofs, where we construct a solution by progressively
narrowing the search region.
Definition: Nest
A non-empty collection \(\mathcal{N}\) of sets is called a nest if and only if for
each \(A, B \in \mathcal{N}\), either \(A \subseteq B\) or \(B \subseteq A\).
Cantor's Intersection Theorem
Suppose \((X, d)\) is a metric space and \(\mathcal{F}\) is a nest of non-empty subsets of
\(X\) for which \(\inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0\).
Suppose \(\bigcap \left\{\text{Cl}_{X}(A) \mid A \in \mathcal{F} \right\} = \emptyset\). Then, given \(z \notin X\),
\(d\) can be extended to be a metric on \(X' = X \cup \{z\}\) in such a way that
\[
\forall A \in \mathcal{F}, \, \text{Cl}_{X'} (A) = \text{Cl}_{X}(A) \cup \{z\}.
\]
Thus,
\[
\bigcap \left\{\text{Cl}_{X'} (A) \mid A \in \mathcal{F} \right\} = \{z\}.
\]
This leads to the following fundamental criteria for completeness, framing it as the guarantee that
"shrinking containers" always trap a point.
Theorem: Nest & Nested Sequence Criteria for Completeness
A metric space \(X\) is said to be complete if and only if
every nest \(\mathcal{F}\) of non-empty closed subsets of \(X\) for which
\[
\inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0
\]
has a singleton intersection.
The following statement is equivalent:
A metric space \(X\) is said to be complete if and only if
every sequence \(\{F_n\}\) of non-empty closed subsets of \(X\) for which \(F_{n+1} \subseteq F_n\)
for each \(n \in \mathbb{N}\) and \(\text{diam }(F_n) \to 0\) has a singleton (hence non-empty) intersection.
This nest criterion also yields an important characterization of complete subsets.
Theorem: Complete Subset
Suppose \(X\) is a complete metric space and \(S \subseteq X\). Then \(S\) is complete if and only if
\(S\) is closed in \(X\).
CS Insight: The Bridge Between Ideal Convergence and Algorithmic Reality
While computer algorithms are inherently finite, Cantor's Intersection Theorem provides the
logical justification for their convergence: it guarantees that the ideal limit object an algorithm approximates actually exists.
- Bisection Method (Termination Logic):
Numerical solvers rely on Cantor's theorem to guarantee that a "true" root exists within the
shrinking interval. When we stop at a tolerance \(\epsilon\), the theorem ensures that our
result is a valid approximation of a unique point in \(\mathbb{R}\), rather than an empty search
in a space with "holes."
- Spatial Indexing (Numerical Stability):
Data structures like k-d trees partition space into nested regions. The theorem justifies the stability
of point-location queries: as the diameter of the partition shrinks, the identity of the trapped point
becomes mathematically certain.
- Floating-Point as Limit Logic:
Floating-point numbers are discrete and finite, but they are designed to shadow the complete
structure of \(\mathbb{R}\). Cantor's theorem is the blueprint for this design, allowing us to treat
increasing precision as a process of converging toward a canonical mathematical object.
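The bisection logic in the first bullet can be sketched directly: each iteration halves a closed interval known to contain a root, producing exactly the kind of shrinking nest of closed sets that Cantor's theorem says must trap a unique point. A minimal Python sketch:

```python
# Bisection as a concrete instance of Cantor's theorem: the nested closed
# intervals [a, b] have diameters -> 0, so they trap exactly one root.
def bisect(g, a, b, tol=1e-12):
    assert g(a) * g(b) < 0  # a sign change guarantees a root in [a, b]
    while b - a > tol:
        m = (a + b) / 2
        if g(a) * g(m) <= 0:
            b = m  # root trapped in the left half
        else:
            a = m  # root trapped in the right half
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 0.0, 2.0)
print(root)  # ≈ sqrt(2) = 1.41421356...
```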
The Banach Fixed-Point Theorem
We now formally explain why many optimization methods converge.
Definition: Fixed Point
Suppose \(X\) is a non-empty set and \(f: X \to X\). A point \(x \in X\) is called a fixed point
for \(f\) if and only if \(f(x) = x\).
Banach's Fixed-Point Theorem
Suppose \((X, d)\) is a complete metric space and \(f: X \to X\) is a
(strong) contraction on
\(X\) with Lipschitz constant \(k \in (0, 1)\). Then \(f\) has a unique fixed point in \(X\) and, for
each \(w \in X\), the sequence \(\{f^n(w)\}\) converges to this point.
Note that for \(n \in \mathbb{N}\), \(f^n\) is the composition of \(n\) copies of \(f\).
We shall use the following theorem in the proof.
Theorem:
Suppose \((X, d)\) is a metric space and \(f\) is a (strong) contraction on \(X\). Then,
- for each \(a, b \in X\), the real sequence \(\{d(f^n(a), f^n(b))\}\) converges to \(0\).
- for each \(x \in X\), the sequence \(\{f^n(x)\}\) is a Cauchy sequence in \(X\).
Proof:
Suppose that \(w \in X\). By the above theorem, \(\{f^n(w)\}\) is Cauchy in \(X\), and since \(X\) is
complete, this sequence converges in \(X\).
Let \(z = \lim f^n(w)\). Since the contraction \(f\) is uniformly continuous, and hence continuous,
applying \(f\) to the convergent sequence gives
\[
f(z) = \lim f^{n+1}(w)
\]
and since \(\{f^{n+1}(w)\}\) is a subsequence of \(\{f^n(w)\}\),
\[
\lim_{n \to \infty} f^{n+1}(w) = \lim_{n \to \infty} f^n(w) = z,
\]
so that \(f(z) = z\) as required.
The contraction \(f\) has no other fixed point: if \(a \in X\) were such a point, we would have
\[
d(a, z) = d(f(a), f(z)) \leq k \, d(a, z),
\]
which, because \(k < 1\), forces \(d(a, z) = 0\) and thus \(a = z\).
Corollary: Convergence Rate
Under the hypotheses of Banach's theorem, let \(z\) be the unique
fixed point. For any starting point \(w \in X\):
\[
d(f^n(w), z) \leq \frac{k^n}{1-k} \cdot d(w, f(w)).
\]
In particular, convergence is geometric (linear on a logarithmic scale) with rate \(k\).
Proof:
For \(m > n\), the triangle inequality and a geometric series bound give:
\[
d(f^n(w), f^m(w)) \leq \sum_{i=n}^{m-1} d(f^i(w), f^{i+1}(w))
\leq \sum_{i=n}^{m-1} k^i \cdot d(w, f(w))
\leq \frac{k^n}{1-k} \cdot d(w, f(w)).
\]
Taking \(m \to \infty\) yields the result.
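Putting the theorem and its corollary to work: the sketch below iterates the contraction \(f = \cos\) on \([0, 1]\), where \(|f'(x)| = |\sin x| \leq \sin 1\), so \(k = \sin 1 \approx 0.84\), and checks the a-priori error bound along the way. The choice of \(f\) and starting point are illustrative:

```python
import math

# Fixed-point iteration for f = cos on [0, 1]: a contraction with
# k = sin(1) ≈ 0.8415, converging to the unique fixed point
# (the "Dottie number", ≈ 0.739085).
k = math.sin(1.0)
w = 1.0
xs = [w]
for _ in range(100):
    xs.append(math.cos(xs[-1]))
z = xs[-1]
print(z)  # ≈ 0.739085

# A-priori bound from the corollary:
# d(f^n(w), z) <= k^n / (1 - k) * d(w, f(w)).
d0 = abs(w - math.cos(w))
for n in (5, 10, 20):
    assert abs(xs[n] - z) <= k ** n / (1 - k) * d0
```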
Insight: Fixed-Point Iterations in Optimization and Learning
Banach's theorem transforms existence and convergence questions into
verification of two conditions: (1) the space is complete, and
(2) the iteration mapping is a contraction.
Newton's Method
For solving \(g(x) = 0\), Newton's iteration is
\[
f(x) = x - \frac{g(x)}{g'(x)}.
\]
Under sufficient smoothness and starting close enough to a simple root \(x^*\), one can show
\(f\) is a contraction on a neighborhood of \(x^*\) (in fact,
convergence is quadratic, not merely linear).
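A minimal sketch of this iteration for \(g(x) = x^2 - 2\), with the error to the root \(\sqrt{2}\) tracked at each step (the starting point 1.5 is an arbitrary choice near the root):

```python
import math

# Newton's method for g(x) = x^2 - 2, recast as fixed-point iteration of
# f(x) = x - g(x)/g'(x). Near sqrt(2) the map contracts, and convergence
# is quadratic: the error roughly squares at each step.
def f(x):
    return x - (x * x - 2.0) / (2.0 * x)

x = 1.5
errors = []
for _ in range(5):
    x = f(x)
    errors.append(abs(x - math.sqrt(2)))

print(x)       # converges to sqrt(2) ≈ 1.41421356...
print(errors)  # roughly ~2e-3, ~2e-6, ~2e-12, then machine precision
```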
Gradient Descent (Convex Case)
For minimizing \(h: \mathbb{R}^n \to \mathbb{R}\) with
\(L\)-Lipschitz gradient and strong convexity parameter \(\mu > 0\):
\[
f(x) = x - \alpha \nabla h(x)
\]
is a contraction with \(k = \max(|1 - \alpha L|, |1 - \alpha \mu|)\).
Choosing \(\alpha = \frac{2}{L + \mu}\) minimizes \(k\) to
\(\frac{L - \mu}{L + \mu} < 1\).
Note: In Deep Learning, loss functions are rarely convex, so global contraction is not guaranteed,
but local behavior around minima often approximates this dynamics.
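In the strongly convex case, the contraction factor can be observed directly. The quadratic below, with \(A = \mathrm{diag}(1, 10)\) so that \(\mu = 1\) and \(L = 10\), is an illustrative choice:

```python
import numpy as np

# Gradient descent on h(x) = 1/2 x^T A x with A = diag(1, 10): mu = 1,
# L = 10. With alpha = 2/(L + mu), each step contracts the distance to the
# minimizer (the origin) by exactly k = (L - mu)/(L + mu) = 9/11.
A = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
alpha = 2.0 / (L + mu)
k = (L - mu) / (L + mu)

x = np.array([1.0, 1.0])
for _ in range(10):
    x_new = x - alpha * (A @ x)    # grad h(x) = A x
    ratio = np.linalg.norm(x_new) / np.linalg.norm(x)
    assert ratio <= k + 1e-12      # the contraction bound holds each step
    x = x_new
print(np.linalg.norm(x))  # ≈ sqrt(2) * k^10
```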
Value Iteration (Reinforcement Learning)
The Bellman backup
\[
V_{k+1}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p_T(s' \mid s, a) V_k (s') \right]
\]
is a \(\gamma\)-contraction in the supremum norm \(\|\cdot\|_\infty\), where \(\gamma \in (0,1)\) is the
discount factor. The space of bounded value functions is complete under \(\|\cdot\|_\infty\), so value
iteration converges to the unique optimal value function \(V_*\) as \(k \to \infty\).
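A toy demonstration of both the contraction property and the convergence (the MDP's transition probabilities and rewards below are made up for illustration):

```python
import numpy as np

# Tiny 2-state, 2-action MDP with made-up numbers: P[a, s, s'] is the
# probability of moving s -> s' under action a; R[s, a] is the reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.5],
              [0.0, 2.0]])
gamma = 0.9

def bellman(V):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] V[s'];
    # (P @ V) has shape (actions, states), so transpose to (states, actions).
    return (R + gamma * (P @ V).T).max(axis=1)

# The backup is a gamma-contraction in the sup norm ...
rng = np.random.default_rng(0)
V1, V2 = rng.normal(size=2), rng.normal(size=2)
assert np.max(np.abs(bellman(V1) - bellman(V2))) <= \
    gamma * np.max(np.abs(V1 - V2)) + 1e-12

# ... so iterating from any start converges to the unique fixed point V*.
V = np.zeros(2)
for _ in range(500):
    V = bellman(V)
print(V)  # the optimal value function V* of this toy MDP
```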
Neumann Series and Iterative Methods:
In Policy Evaluation, we solve the linear system \(\mathbf{v} = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\), which formally yields
\(\mathbf{v} = (\mathbf{I} - \gamma \mathbf{T})^{-1} \mathbf{r}\). Calculating this inverse directly is computationally expensive
(\(O(|\mathcal{S}|^3)\)).
However, because the operator \(T_{\gamma}(\mathbf{v}) = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\) is a
contraction, we can apply the Neumann Series expansion:
\[
(\mathbf{I} - \gamma \mathbf{T})^{-1} = \sum_{k=0}^{\infty} (\gamma \mathbf{T})^k = \mathbf{I} + \gamma \mathbf{T} + \gamma^2 \mathbf{T}^2 + \dots
\]
This power series converges because the spectral radius \(\rho(\gamma \mathbf{T}) < 1\). This provides the mathematical
guarantee that the iterative updates in Reinforcement Learning (CS approach) will perfectly converge to the
analytical solution of the Bellman equation (Math approach).
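The agreement of the two approaches can be checked on a small example. In the sketch below, the policy's transition matrix and rewards are illustrative assumptions:

```python
import numpy as np

# Policy evaluation two ways: direct solve of (I - gamma T) v = r versus the
# truncated Neumann series v ≈ sum_{k<K} (gamma T)^k r. T is a row-stochastic
# transition matrix for a fixed policy (illustrative numbers).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
r = np.array([1.0, 2.0])
gamma = 0.9

v_direct = np.linalg.solve(np.eye(2) - gamma * T, r)

v_neumann = np.zeros(2)
term = r.copy()
for _ in range(500):            # gamma^500 is negligible: 500 terms suffice
    v_neumann += term
    term = gamma * (T @ term)   # next term: (gamma T)^{k+1} r

print(np.max(np.abs(v_direct - v_neumann)))  # tiny: the series matches the inverse
```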