Completeness

Introduction Complete Metric Spaces Completion of a Metric Space Cantor's Intersection Theorem Banach's Fixed-Point Theorem

Introduction

We briefly introduced completeness in earlier chapters as a property ensuring metric spaces have "no holes." Now, equipped with the theory of convergence and Cauchy sequences, we can develop completeness fully and prove one of the most important theorems in analysis: Banach's Fixed-Point Theorem.

This theorem states that in a complete metric space, every contraction mapping has a unique fixed point, and iterative application of the mapping converges to that fixed point. This is not an abstract curiosity. It is the theoretical backbone of countless algorithms: Newton's method, value iteration in reinforcement learning, and many optimization schemes.

Understanding why these algorithms converge requires understanding completeness at a deeper level than our initial preview allowed.

Complete Metric Spaces

Recall that in the chapter on metric spaces, we introduced the universal criterion for completeness: a metric space \(X\) is complete if and only if \(X\) is closed in every metric superspace of \(X\). In the chapter on convergence, we introduced Cauchy sequences and stated without proof an equivalent intrinsic characterization of completeness. We now establish that equivalence formally.

Theorem: Cauchy Criterion for Completeness A metric space \(X\) is complete if and only if every Cauchy sequence in \(X\) converges in \(X\).
Proof Sketch:

(⇒) Suppose \(X\) is complete in the universal sense and let \(\{x_n\}\) be a Cauchy sequence in \(X\). Embed \(X\) in any metric superspace \(Y\). In \(Y\), the sequence remains Cauchy, and general arguments show that a Cauchy sequence has at most one accumulation point. Let this candidate limit be \(z \in Y\); since every neighborhood of \(z\) meets \(\{x_n\} \subseteq X\), we have \(z \in \overline{X}^Y\). By universal completeness, \(X\) is closed in \(Y\), so \(z \in X\), and \(x_n \to z\) in \(X\).

(⇐) Suppose every Cauchy sequence in \(X\) converges in \(X\). Let \(Y\) be any metric superspace and \(z \in \overline{X}^Y\). Pick \(x_n \in X\) with \(d_Y(x_n, z) < 1/n\). Then \(\{x_n\}\) is Cauchy in \(Y\) and, because distances agree on \(X\), also Cauchy in \(X\). Its limit \(x \in X\) must equal \(z\) by uniqueness of limits in \(Y\). Hence \(z \in X\), so \(X\) is closed in \(Y\).

Insight: Why Completeness Matters in Machine Learning

Parameter spaces for neural networks live in \(\mathbb{R}^n\), which is complete. Thus, gradient-based optimizations have a valid "target" to converge to (provided contraction conditions hold).

However, function spaces require careful treatment:

  • \(C([0,1])\) with the uniform metric \(d_\infty\) is complete - uniform limits of continuous functions are continuous.
  • \(C([0,1])\) with the integral metric \(d_1(f,g) = \int_0^1 |f(x) - g(x)| \, dx\) is not complete. Its completion is the Lebesgue space \(L^1([0,1])\).
  • In Reinforcement Learning, the space of value functions is often modeled using \(L^\infty\) (supremum norm) to ensure completeness. This guarantees that the Bellman operator, which is a contraction, actually has a fixed point (the optimal value function) within the space.

Completion of a Metric Space

What if our space isn't complete? Can we "fill the holes"? In fact, every metric space has a minimal complete superspace.

Definition: Completion of a Metric Space Suppose \((X, d)\) is a metric space. A metric space \((Y, e)\) is called a completion of \((X, d)\) if and only if \((Y, e)\) is complete and \((X, d)\) is isometric to a dense subspace of \((Y, e)\).

From a mathematical perspective, the existence of a completion is only half the story. If we could "complete" a space in multiple ways that result in fundamentally different structures, the concept would lose its power. The Uniqueness Theorem ensures that the completion of a metric space is unique "up to isometry" - meaning any two completions are essentially identical in their distance-preserving structure.

In Computer Science, this is about the canonicity of representation. Consider the challenge of representing real numbers using different underlying logic or data structures.

The Uniqueness Theorem provides the formal proof that if we design a system to handle "incomplete" data (like rational numbers) and extend it to handle all limit points, the resulting system's behavioral properties are invariant regardless of the underlying encoding (e.g., Cauchy sequences vs. Dedekind cuts). This ensures that algorithms relying on the "structure" of the space remain robust across different implementations.

Theorem: Uniqueness of Completion

Suppose that \((X, d)\) is a metric space and \((X', m)\) and \((X'', s)\) are completions of \(X\), where \(\psi: X \to X'\) and \(\phi: X \to X''\) are isometries onto dense subspaces of \(X'\) and \(X''\), respectively. Then there exists an isometry from \(X'\) to \(X''\) that maps \(\psi(X)\) onto \(\phi(X)\).

The proof relies on a general principle about extending functions from a dense subset. If \(f: S \to Y\) is uniformly continuous on a dense subset \(S \subseteq X\) and \(Y\) is complete, then \(f\) admits a unique continuous extension to all of \(X\). The idea is natural: for any \(x \in X\), density provides a sequence \(s_n \in S\) with \(s_n \to x\); uniform continuity transfers the Cauchy property of \(\{s_n\}\) to \(\{f(s_n)\}\) in \(Y\); and completeness of \(Y\) produces a limit, which serves as the extended value \(\tilde f(x)\). Well-definedness (independence of the approximating sequence) again comes from uniform continuity. Furthermore, if \(f\) happens to be an isometry, the extension is an isometry too — distance preservation passes through the limit by continuity.

We formalize the notion of extension before proceeding.

Definition: Extension and Restriction Suppose \(f\) and \(g\) are functions. We say that \(f\) is a restriction of \(g\) and that \(g\) is an extension of \(f\) if and only if \(\text{dom}(f) \subseteq \text{dom}(g) \) and \(f(x) = g(x)\) for all \(x \in \text{dom}(f)\). In this case, if \(A = \text{dom}(f)\), we say that \(f\) is the restriction of \(g\) to \(A\) and write \(f = g \mid_A\).

Applying this extension principle, the outline of the proof for the Uniqueness of Completion is as follows.

Proof Sketch:

The map \(\phi \circ \psi^{-1}\) is an isometry from the dense subspace \(\psi(X)\) of \(X'\) onto the dense subspace \(\phi(X)\) of \(X''\). Since \(X''\) is complete, \(\phi \circ \psi^{-1}\) has a unique continuous extension \(f\) to \(X'\) (by the extension principle above), and this extension is isometric because the isometry equation passes through limits. Because \(f\) is an isometry and \(X'\) is complete, the range \(f(X')\) inherits completeness (Cauchy sequences transfer across isometries), and by the complete subset theorem applied in \(X''\), \(f(X')\) is closed in \(X''\). Since \(f(X') \supseteq \phi(X)\), it contains the closure \(\overline{\phi(X)} = X''\). Thus, \(f\) is an isometry from \(X'\) onto \(X''\).

Example: \(\mathbb{Q}\) Completes to \(\mathbb{R}\)

The completion of the rational numbers \((\mathbb{Q}, |\cdot|)\) is the real numbers \((\mathbb{R}, |\cdot|)\). While we typically construct \(\mathbb{R}\) as equivalence classes of Cauchy sequences, there are other historically significant methods, such as Dedekind cuts.

  • Cauchy Sequences:
    Constructing \(\mathbb{R}\) via the limits of sequences that "should" converge (e.g., \((1, 1.4, 1.41, \dots) \to \sqrt{2}\)).
  • Dedekind Cuts:
    Constructing \(\mathbb{R}\) by partitioning \(\mathbb{Q}\) into two sets (those less than \(\sqrt{2}\) and those greater).

The Uniqueness Theorem provides the vital bridge: it proves that these two entirely different mathematical "implementations" result in the same structural reality. Without this theorem, we couldn't confidently speak of "The Real Numbers"; we would be stuck with "the real numbers according to Cauchy" or "the real numbers according to Dedekind." Uniqueness allows us to treat \(\mathbb{R}\) as a single, canonical abstract data type.

Cantor's Intersection Theorem

Cantor's Intersection Theorem provides an alternative characterization of completeness through nested closed sets. This perspective is particularly useful in existence proofs, where we construct a solution by progressively narrowing the search region.

Definition: Nest A non-empty collection \(\mathcal{N}\) of sets is called a nest if and only if for each \(A, B \in \mathcal{N}\), either \(A \subseteq B\) or \(B \subseteq A\).

The key technical input is an adjoining principle: if a nest of closed sets in \(X\) has shrinking diameters but empty intersection, then the metric \(d\) can be extended to a superspace \(X' = X \cup \{z\}\) so that \(z\) serves as the "missing limit point" common to all closures. Intuitively, an incomplete space has sequences that ought to converge, and this construction manifests their would-be limit as a new point outside \(X\). This is the tool that allows us to move from non-existence of an intersection inside \(X\) to the existence of a Cauchy sequence with no limit in \(X\) — i.e., a witness of incompleteness.

This leads to the following fundamental criteria for completeness, framing it as the guarantee that "shrinking containers" always trap a point.

Theorem: Cantor's Intersection Theorem (Nest & Nested Sequence Criteria for Completeness) A metric space \(X\) is complete if and only if every nest \(\mathcal{F}\) of non-empty closed subsets of \(X\) for which \[ \inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0 \] has a singleton intersection.

The following statement is equivalent:

A metric space \(X\) is complete if and only if every sequence \(\{F_n\}\) of non-empty closed subsets of \(X\) for which \(F_{n+1} \subseteq F_n\) for each \(n \in \mathbb{N}\) and \(\text{diam }(F_n) \to 0\) has singleton (hence non-empty) intersection.

Proof Sketch. (\(\Rightarrow\)) Pick \(x_n \in F_n\); since \(\text{diam}(F_n) \to 0\), \(\{x_n\}\) is Cauchy, and by completeness converges to some \(z \in X\). Each \(F_n\) is closed and eventually contains the tail of \(\{x_n\}\), so \(z \in F_n\) for every \(n\); hence \(z \in \bigcap F_n\), and the singleton property follows from \(\text{diam}(F_n) \to 0\). (\(\Leftarrow\)) We argue by contrapositive. If \(X\) is not complete, the Cauchy Criterion yields a non-convergent Cauchy sequence \(\{x_n\}\). Let \(F_n\) be the closure of the tail \(\{x_k : k \geq n\}\) in \(X\): each \(F_n\) is non-empty and closed, the sequence is nested since later tails are subsets, and \(\text{diam}(F_n) \to 0\) follows from the Cauchy property. If some \(z \in \bigcap F_n\), then \(x_n, z \in F_n\) gives \(d(x_n, z) \leq \text{diam}(F_n) \to 0\), forcing \(x_n \to z\) — contradicting non-convergence. Hence \(\bigcap F_n = \emptyset\), and the nest \(\{F_n\}\) violates the hypothesis. For full details — including the rigorous adjoining construction that makes the contrapositive strictly constructive.

This nest criterion also yields an important characterization of complete subsets.

Theorem: Complete Subset Suppose \(X\) is a complete metric space and \(S \subseteq X\). Then \(S\) is complete if and only if \(S\) is closed in \(X\).
Proof:

(\(\Rightarrow\)) \(S\) complete \(\Rightarrow\) \(S\) closed. Suppose, for contradiction, that \(S\) is not closed in \(X\): there exists \(x \in \partial S\) with \(x \notin S\). By the definition of a boundary point, \(\text{dist}(x, S) = 0\), so for each \(n \in \mathbb{N}\) we can pick \(s_n \in S\) with \(d(s_n, x) < 1/n\). Then \(s_n \to x\) in \(X\) (ball form of convergence), hence \(\{s_n\}\) is Cauchy in \(X\). To transfer this to \(S\): given \(r > 0\), some ball \(\mathcal{B}^X[c; r/2)\) of \(X\) includes a tail \(\{s_n : n \geq N\}\); re-centering at \(s_N \in S\), the triangle inequality gives \(d(s_n, s_N) \leq d(s_n, c) + d(c, s_N) < r/2 + r/2 = r\) for all \(n \geq N\), so the tail lies in \(\mathcal{B}^S[s_N; r)\). Thus \(\{s_n\}\) is also Cauchy in \(S\). By completeness of \(S\), \(s_n \to s\) for some \(s \in S\). Since every ball of \(S\) is included in the corresponding ball of \(X\), convergence in \(S\) implies convergence in \(X\) to the same point, so \(s_n \to s\) in \(X\) as well. Uniqueness of limits in \(X\) then forces \(s = x\), so \(x \in S\), contradicting \(x \notin S\). Hence \(S\) is closed.

(\(\Leftarrow\)) \(S\) closed \(\Rightarrow\) \(S\) complete. Let \(\{s_n\}\) be Cauchy in \(S\). Since any ball of \(S\) is included in the corresponding ball of \(X\), \(\{s_n\}\) is also Cauchy in \(X\); by completeness of \(X\) and the Cauchy Criterion, \(s_n \to x\) for some \(x \in X\). We claim \(x \in S\). Any open ball \(\mathcal{B}^X[x; r)\) includes a tail of \(\{s_n\} \subseteq S\), so \(\mathcal{B}^X[x; r) \cap S \neq \emptyset\) for every \(r > 0\); hence \(\text{dist}(x, S) = 0\). If \(x \notin S\), then \(x \in S^c\) gives \(\text{dist}(x, S^c) = 0\) as well, so \(x \in \partial S\); but \(S\) is closed, forcing \(x \in S\) — a contradiction. Thus \(x \in S\), and by the subspace convergence theorem, \(\{s_n\}\) converges to \(x\) in \(S\). Hence \(S\) is complete.

CS Insight: The Bridge Between Ideal Convergence and Algorithmic Reality

While computer algorithms are inherently finite, Cantor's Intersection Theorem provides the theoretical upper bound and the logical justification for their convergence.

  • Bisection Method (Termination Logic):
    Numerical solvers rely on Cantor's theorem to guarantee that a "true" root exists within the shrinking interval. When we stop at a tolerance \(\epsilon\), the theorem ensures that our result is a valid approximation of a unique point in \(\mathbb{R}\), rather than an empty search in a space with "holes."
  • Spatial Indexing (Numerical Stability):
    Data structures like Kd-trees partition space into nested regions. The theorem justifies the stability of point-location queries: as the diameter of the partition shrinks, the identity of the trapped point becomes mathematically certain.
  • Floating-Point as a Limit-Logic:
    Floating-point numbers are discrete and finite, but they are designed to shadow the complete structure of \(\mathbb{R}\). Cantor's theorem is the blueprint for this design, allowing us to treat increasing precision as a process of converging toward a canonical mathematical object.

The Banach Fixed-Point Theorem

We now formally explain why many optimization methods converge.

Definition: Fixed Point Suppose \(X\) is a non-empty set and \(f: X \to X\). A point \(x \in X\) is called a fixed point for \(f\) if and only if \(f(x) = x\).
Theorem: Banach's Fixed-Point Theorem Suppose \((X, d)\) is a complete metric space and \(f: X \to X\) is a (strong) contraction on \(X\) with contraction factor \(k \in [0, 1)\). Then \(f\) has a unique fixed point in \(X\) and, for each \(w \in X\), the sequence \(\{f^n(w)\}\) converges to this point.

Note that for \(n \in \mathbb{N}\), \(f^n\) is the composition of \(n\) copies of \(f\). We shall use the following theorem in the proof.

Theorem: Iterates of a Contraction Form a Cauchy Sequence Suppose \((X, d)\) is a metric space and \(f\) is a (strong) contraction on \(X\) with contraction factor \(k \in [0, 1)\). Then,
  1. for each \(a, b \in X\), the real sequence \(\{d(f^n(a), f^n(b))\}\) converges to \(0\);
  2. for each \(x \in X\), the sequence \(\{f^n(x)\}\) is a Cauchy sequence in \(X\).
Proof:

(1) By iterating the contraction inequality, \[ d(f^n(a), f^n(b)) \leq k \cdot d(f^{n-1}(a), f^{n-1}(b)) \leq \cdots \leq k^n \cdot d(a, b). \] Since \(k \in [0, 1)\), we have \(k^n \to 0\), so \(d(f^n(a), f^n(b)) \to 0\).

(2) Fix \(x \in X\) and set \(c := d(x, f(x))\). By (1) applied with \(a = x, b = f(x)\), we obtain \(d(f^n(x), f^{n+1}(x)) \leq k^n c\). For \(m > n\), the triangle inequality and the geometric series yield \[ d(f^n(x), f^m(x)) \leq \sum_{i=n}^{m-1} d(f^i(x), f^{i+1}(x)) \leq \sum_{i=n}^{m-1} k^i c \leq \frac{k^n}{1 - k} \cdot c. \] Since \(k^n \to 0\), the right-hand side can be made arbitrarily small by choosing \(n\) large, which establishes the Cauchy property.

Proof: Suppose that \(w \in X\). By the preceding theorem, \(\{f^n(w)\}\) is Cauchy in \(X\), and since \(X\) is complete, this sequence converges in \(X\).
Let \(z = \lim f^n(w)\). Since \(f\) is a contraction, it is uniformly continuous (Lipschitz with constant \(k\)), hence continuous, so \(f(x_n) \to f(z)\) whenever \(x_n \to z\). In particular, \[ f(z) = \lim_{n \to \infty} f(f^n(w)) = \lim_{n \to \infty} f^{n+1}(w). \] Since \(\{f^{n+1}(w)\}\) is a subsequence of \(\{f^n(w)\}\), it has the same limit: \[ \lim_{n \to \infty} f^{n+1}(w) = \lim_{n \to \infty} f^n(w) = z, \] so that \(f(z) = z\) as required.
Contraction \(f\) has no other fixed point: if \(a \in X\) were also fixed, then \[ d(a, z) = d(f(a), f(z)) \leq k \, d(a, z), \] which forces \(d(a, z) = 0\) and thus \(a = z\) because \(k < 1\).
Corollary: Convergence Rate

Under the hypotheses of Banach's theorem, let \(z\) be the unique fixed point. For any starting point \(w \in X\): \[ d(f^n(w), z) \leq \frac{k^n}{1-k} \cdot d(w, f(w)). \] In particular, convergence is geometric (linear in the logarithmic scale) with rate \(k\).

Proof:

For \(m > n\), triangle inequality and geometric series give: \[ d(f^n(w), f^m(w)) \leq \sum_{i=n}^{m-1} d(f^i(w), f^{i+1}(w)) \leq \sum_{i=n}^{m-1} k^i \cdot d(w, f(w)) \leq \frac{k^n}{1-k} \cdot d(w, f(w)). \] Taking \(m \to \infty\) yields the result.

Insight: Fixed-Point Iterations in Optimization and Learning

Banach's theorem transforms existence and convergence questions into verification of two conditions: (1) the space is complete, and (2) the iteration mapping is a contraction.

Newton's Method
For solving \(g(x) = 0\), Newton's iteration is \[f(x) = x - \frac{g(x)}{g'(x)}. \] Under sufficient smoothness and starting close enough to a simple root \(x^*\), one can show \(f\) is a contraction on a neighborhood of \(x^*\) (in fact, convergence is quadratic, not merely linear).

Gradient Descent (Convex Case)
For minimizing \(h: \mathbb{R}^n \to \mathbb{R}\) with \(L\)-Lipschitz gradient and strong convexity parameter \(\mu > 0\): \[ f(x) = x - \alpha \nabla h(x) \] is a contraction with \(k = \max(|1 - \alpha L|, |1 - \alpha \mu|)\). Choosing \(\alpha = \frac{2}{L + \mu}\) minimizes \(k\) to \(\frac{L - \mu}{L + \mu} < 1\).
Note: In Deep Learning, loss functions are rarely convex, so global contraction is not guaranteed, but local behavior around minima often approximates this dynamics.

Value Iteration (Reinforcement Learning)
The Bellman backup: \[ V_{k+1}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p_T(s' \mid s, a) V_k (s') \right]. \] is a \(\gamma\)-contraction with the supremum norm \(\|\cdot\|_\infty\), where \(\gamma \in (0,1)\) is the discount factor. The space of bounded value functions is complete under \(\|\cdot\|_\infty\), so value iteration converges to the unique optimal value function \(V_*\) as \(k \to \infty\).

Neumann Series and Iterative Methods:
In Policy Evaluation, we solve the linear system \(\mathbf{v} = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\), which formally yields \(\mathbf{v} = (\mathbf{I} - \gamma \mathbf{T})^{-1} \mathbf{r}\). Calculating this inverse directly is computationally expensive (\(O(|\mathcal{S}|^3)\)).

However, because the operator \(T_{\gamma}(\mathbf{v}) = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\) is a contraction, we can apply the Neumann Series expansion: \[ (\mathbf{I} - \gamma \mathbf{T})^{-1} = \sum_{k=0}^{\infty} (\gamma \mathbf{T})^k = \mathbf{I} + \gamma \mathbf{T} + \gamma^2 \mathbf{T}^2 + \dots \] This power series converges because the spectral radius \(\rho(\gamma \mathbf{T}) < 1\). This provides the mathematical guarantee that the iterative updates in Reinforcement Learning (CS approach) will perfectly converge to the analytical solution of the Bellman equation (Math approach).