Completeness

  • Introduction
  • Complete Metric Spaces
  • Completion of a Metric Space
  • Cantor's Intersection Theorem
  • Banach's Fixed-Point Theorem

Introduction

We briefly introduced completeness in earlier chapters as a property ensuring metric spaces have "no holes." Now, equipped with the theory of convergence and Cauchy sequences, we can develop completeness fully and prove one of the most important theorems in analysis: Banach's Fixed-Point Theorem.

This theorem states that in a complete metric space, every contraction mapping has a unique fixed point, and iterative application of the mapping converges to that fixed point. This is not an abstract curiosity. It is the theoretical backbone of countless algorithms: Newton's method, value iteration in reinforcement learning, and many optimization schemes.

Understanding why these algorithms converge requires understanding completeness at a deeper level than our initial preview allowed.

Complete Metric Spaces

Recall that we introduced the universal criterion for completeness as our first definition of a complete metric space. (See Part 16.)

Universal Criterion for Completeness

A metric space \(X\) is said to be complete if and only if \(X\) is closed in every metric superspace of \(X\).

We then characterized completeness in terms of Cauchy sequences. (See Part 17.)

Theorem: Cauchy Criterion for Completeness A metric space \(X\) is said to be complete if and only if every Cauchy sequence in \(X\) converges in \(X\).

A third, equivalent characterization of completeness uses the notion of a virtual point.

Definition: Virtual Points Suppose \((X, d)\) is a metric space and \(u: X \to \mathbb{R}\). Then \(u\) is called a virtual point of \(X\) if and only if \(u\) satisfies the following conditions:
  • \(\forall a, b \in X, \, |u(a) - u(b)| \leq d(a, b) \leq u(a) + u(b)\).
  • \(\inf u(X) = 0\).
  • \(0 \notin u(X)\).
Theorem: Virtual Point Criterion for Completeness A metric space \(X\) is said to be complete if and only if \(X\) has no virtual points.

A virtual point is a function that "measures distance to a missing point." If \(X\) has a hole (like \(\mathbb{Q}\) missing \(\sqrt{2}\)), we can define \(u: \mathbb{Q} \to \mathbb{R}\) by \(u(q) = |q - \sqrt{2}|\). This function satisfies all three conditions: it respects the triangle inequality, its infimum is 0 (rationals get arbitrarily close to \(\sqrt{2}\)), yet no rational achieves distance 0.

The virtual point criterion is equivalent to the Cauchy criterion: every non-convergent Cauchy sequence \((x_n)\) in an incomplete space induces a virtual point via \(u(a) = \lim_{n \to \infty} d(a, x_n)\).
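To make the \(\mathbb{Q}\)-and-\(\sqrt{2}\) example concrete, here is a small numerical sketch (our own illustration; the function name `u` and the sample points are arbitrary choices) checking the three virtual-point conditions:

```python
import math
from fractions import Fraction

# u(q) = |q - sqrt(2)| measures "distance to the missing point" of Q.
def u(q: Fraction) -> float:
    return abs(float(q) - math.sqrt(2))

# Condition 1: |u(a) - u(b)| <= d(a, b) <= u(a) + u(b) on sample rationals.
samples = [Fraction(1), Fraction(3, 2), Fraction(7, 5), Fraction(141, 100)]
for a in samples:
    for b in samples:
        d = abs(float(a - b))
        assert abs(u(a) - u(b)) <= d + 1e-12
        assert d <= u(a) + u(b) + 1e-12

# Condition 2: inf u(Q) = 0 -- truncated decimal expansions of sqrt(2)
# drive u arbitrarily close to 0 ...
approx = [Fraction(int(math.sqrt(2) * 10**n), 10**n) for n in range(1, 8)]
values = [u(q) for q in approx]
assert all(values[i + 1] < values[i] for i in range(len(values) - 1))

# Condition 3: ... yet no rational attains 0, since sqrt(2) is irrational.
assert all(v > 0 for v in values)
```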

Insight: Why Completeness Matters in Machine Learning

Parameter spaces for neural networks live in \(\mathbb{R}^n\), which is complete. Thus, gradient-based optimization schemes have a valid "target" to converge to (provided contraction conditions hold).

However, function spaces require careful treatment:

  • \(C([0,1])\) with the uniform metric \(d_\infty\) is complete: uniform limits of continuous functions are continuous.
  • \(C([0,1])\) with the integral metric \(d_1(f,g) = \int_0^1 |f(x) - g(x)| \, dx\) is not complete. Its completion is the Lebesgue space \(L^1([0,1])\).
  • In Reinforcement Learning, the space of value functions is often modeled using \(L^\infty\) (supremum norm) to ensure completeness. This guarantees that the Bellman operator, which is a contraction, actually has a fixed point (the optimal value function) within the space.
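The incompleteness of \((C([0,1]), d_1)\) can be seen numerically. The sketch below (our own illustration) uses the standard example of continuous ramp functions: they are \(d_1\)-Cauchy, but their \(d_1\)-limit is a discontinuous step function, i.e., a point outside \(C([0,1])\).

```python
# Continuous ramp: 0 up to 1/2, then rises linearly to 1 over width 1/n.
def f(n, x):
    return min(1.0, max(0.0, n * (x - 0.5)))

# The d_1-limit of the ramps: a step function, which is NOT continuous.
def step(x):
    return 1.0 if x >= 0.5 else 0.0

def d1(g, h, grid=100_000):
    # Midpoint Riemann-sum approximation of the integral metric on [0, 1].
    xs = [(i + 0.5) / grid for i in range(grid)]
    return sum(abs(g(x) - h(x)) for x in xs) / grid

# d_1(f_n, step) = 1/(2n) -> 0: the Cauchy sequence "escapes" C([0,1]).
for n in (10, 100, 1000):
    assert abs(d1(lambda x: f(n, x), step) - 1 / (2 * n)) < 1e-3
```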

Completion of a Metric Space

What if our space isn't complete? Can we "fill the holes"? In fact, every metric space has a minimal complete superspace.

Definition: Completion of a Metric Space Suppose \((X, d)\) is a metric space. A metric space \((Y, e)\) is called a completion of \((X, d)\) if and only if \((Y, e)\) is complete and \((X, d)\) is isometric to a dense subspace of \((Y, e)\).

From a mathematical perspective, the existence of a completion is only half the story. If we could "complete" a space in multiple ways that result in fundamentally different structures, the concept would lose its power. The Uniqueness Theorem ensures that the completion of a metric space is unique "up to isometry", meaning any two completions are essentially identical in their distance-preserving structure.

In Computer Science, this is about the canonicity of representation. Consider the challenge of representing real numbers using different underlying logic or data structures.

The Uniqueness Theorem provides the formal proof that if we design a system to handle "incomplete" data (like rational numbers) and extend it to handle all limit points, the resulting system's behavioral properties are invariant regardless of the underlying encoding (e.g., Cauchy sequences vs. Dedekind cuts). This ensures that algorithms relying on the "structure" of the space remain robust across different implementations.

Theorem: Uniqueness of Completion

Suppose that \((X, d)\) is a metric space and \((X', m)\) and \((X'', s)\) are completions of \(X\), where \(\psi: X \to X'\) and \(\phi: X \to X''\) are isometries onto dense subspaces of \(X'\) and \(X''\), respectively. Then there exists an isometry from \(X'\) to \(X''\) that maps \(\psi(X)\) onto \(\phi(X)\).

To prove this theorem, we will use the Continuous Extension Theorem and the Isometry Extension Theorem (both invoked in the sketch below), together with the following definition of function extension:

Definition: Extension and Restriction Suppose \(f\) and \(g\) are functions. We say that \(f\) is a restriction of \(g\) and that \(g\) is an extension of \(f\) if and only if \(\text{dom}(f) \subseteq \text{dom}(g) \) and \(f(x) = g(x)\) for all \(x \in \text{dom}(f)\). In this case, if \(A = \text{dom}(f)\), we say that \(f\) is the restriction of \(g\) to \(A\) and write \(f = g \mid_A\).

Below is the outline of the proof for the Uniqueness of Completion.

Proof Sketch:

The map \(\phi \circ \psi^{-1}\) is an isometry from the dense subspace \(\psi(X)\) of \(X'\) onto the dense subspace \(\phi(X)\) of \(X''\). Since \(X''\) is complete, \(\phi \circ \psi^{-1}\) has a unique continuous extension \(f\) to \(X'\) by the Continuous Extension Theorem, and this extension is isometric by the Isometry Extension Theorem. Because \(f\) is an isometry, its range is complete and therefore closed in \(X''\). Since this range includes \(\phi(X)\), it must also include the closure \(\overline{\phi(X)}\), which is \(X''\). Thus, \(f\) is an isometry from \(X'\) onto \(X''\).

Example: \(\mathbb{Q}\) Completes to \(\mathbb{R}\)

The completion of the rational numbers \((\mathbb{Q}, |\cdot|)\) is the real numbers \((\mathbb{R}, |\cdot|)\). While we typically construct \(\mathbb{R}\) as equivalence classes of Cauchy sequences, there are other historically significant methods, such as Dedekind cuts.

  • Cauchy Sequences:
    Constructing \(\mathbb{R}\) via the limits of sequences that "should" converge (e.g., \((1, 1.4, 1.41, \dots) \to \sqrt{2}\)).
  • Dedekind Cuts:
    Constructing \(\mathbb{R}\) by partitioning \(\mathbb{Q}\) into two sets (those less than \(\sqrt{2}\) and those greater).

The Uniqueness Theorem provides the vital bridge: it proves that these two entirely different mathematical "implementations" result in the same structural reality. Without this theorem, we couldn't confidently speak of "The Real Numbers"; we would be stuck with "the real numbers according to Cauchy" or "the real numbers according to Dedekind." Uniqueness allows us to treat \(\mathbb{R}\) as a single, canonical abstract data type.
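The Cauchy-sequence construction can be sketched in code. Below (our own illustration), exact rational arithmetic via Python's `Fraction` generates a Cauchy sequence in \(\mathbb{Q}\) using the Babylonian iteration \(x \mapsto (x + 2/x)/2\): every term is a bona fide rational, yet the sequence has no rational limit.

```python
from fractions import Fraction

# Babylonian/Newton iteration for sqrt(2), carried out entirely in Q.
x = Fraction(2)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    seq.append(x)

# Consecutive terms get arbitrarily close (the sequence is Cauchy in Q) ...
gaps = [abs(float(seq[i + 1] - seq[i])) for i in range(len(seq) - 1)]
assert all(gaps[i + 1] < gaps[i] for i in range(len(gaps) - 1))

# ... and the squares approach 2, yet no term is an exact square root of 2.
assert abs(float(seq[-1] ** 2) - 2) < 1e-20
assert all(t ** 2 != 2 for t in seq)
```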

Cantor's Intersection Theorem

Cantor's Intersection Theorem provides an alternative characterization of completeness through nested closed sets. This perspective is particularly useful in existence proofs, where we construct a solution by progressively narrowing the search region.

Definition: Nest A non-empty collection \(\mathcal{N}\) of sets is called a nest if and only if for each \(A, B \in \mathcal{N}\), either \(A \subseteq B\) or \(B \subseteq A\).
Cantor's Intersection Theorem Suppose \((X, d)\) is a metric space and \(\mathcal{F}\) is a nest of non-empty subsets of \(X\) for which \(\inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0\).
Suppose \(\bigcap \left\{\text{Cl}_{X}(A) \mid A \in \mathcal{F} \right\} = \emptyset\). Then, given \(z \notin X\), \(d\) can be extended to be a metric on \(X' = X \cup \{z\}\) in such a way that \[ \forall A \in \mathcal{F}, \, \text{Cl}_{X'} (A) = \text{Cl}_{X}(A) \cup \{z\}. \] Thus, \[ \bigcap \left\{\text{Cl}_{X'} (A) \mid A \in \mathcal{F} \right\} = \{z\}. \]

This leads to the following fundamental criteria for completeness, framing it as the guarantee that "shrinking containers" always trap a point.

Theorem: Nest & Nested Sequence Criteria for Completeness A metric space \(X\) is said to be complete if and only if every nest \(\mathcal{F}\) of non-empty closed subsets of \(X\) for which \[ \inf \{\text{diam } (A) \mid A \in \mathcal{F}\} = 0 \] has a singleton intersection.

The following statement is equivalent:

A metric space \(X\) is said to be complete if and only if every sequence \(\{F_n\}\) of non-empty closed subsets of \(X\) for which \(F_{n+1} \subseteq F_n\) for each \(n \in \mathbb{N}\) and \(\text{diam }(F_n) \to 0\) has a singleton (hence non-empty) intersection.

This nest criterion also yields an important characterization of complete subsets.

Theorem: Complete Subset Suppose \(X\) is a complete metric space and \(S \subseteq X\). Then \(S\) is complete if and only if \(S\) is closed in \(X\).

CS Insight: The Bridge Between Ideal Convergence and Algorithmic Reality

While computer algorithms are inherently finite, Cantor's Intersection Theorem provides the theoretical foundation and the logical justification for their convergence.

  • Bisection Method (Termination Logic):
    Numerical solvers rely on Cantor's theorem to guarantee that a "true" root exists within the shrinking interval. When we stop at a tolerance \(\epsilon\), the theorem ensures that our result is a valid approximation of a unique point in \(\mathbb{R}\), rather than an empty search in a space with "holes."
  • Spatial Indexing (Numerical Stability):
    Data structures like Kd-trees partition space into nested regions. The theorem justifies the stability of point-location queries: as the diameter of the partition shrinks, the identity of the trapped point becomes mathematically certain.
  • Floating-Point as a Limit-Logic:
    Floating-point numbers are discrete and finite, but they are designed to shadow the complete structure of \(\mathbb{R}\). Cantor's theorem is the blueprint for this design, allowing us to treat increasing precision as a process of converging toward a canonical mathematical object.
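The bisection logic from the first bullet can be sketched as follows (a minimal illustration; the function `bisect_root` and its tolerance are our own choices). Each pass produces a smaller nested closed interval that still brackets a sign change, and Cantor's theorem guarantees the shrinking intervals trap exactly one point.

```python
# Nested closed intervals [a, b] with diameters -> 0; by Cantor's theorem
# their intersection is a single point, which bisection approximates.
def bisect_root(g, a, b, tol=1e-12):
    assert g(a) * g(b) < 0, "root must be bracketed"
    while b - a > tol:
        mid = (a + b) / 2
        # Keep the half-interval that still brackets a sign change.
        if g(a) * g(mid) <= 0:
            b = mid
        else:
            a = mid
    return (a + b) / 2

root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
assert abs(root - 2 ** 0.5) < 1e-9
```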

Banach's Fixed-Point Theorem

We now formally explain why many optimization methods converge.

Definition: Fixed Point Suppose \(X\) is a non-empty set and \(f: X \to X\). A point \(x \in X\) is called a fixed point for \(f\) if and only if \(f(x) = x\).
Banach's Fixed-Point Theorem Suppose \((X, d)\) is a complete metric space and \(f: X \to X\) is a (strong) contraction on \(X\) with Lipschitz constant \(k \in (0, 1)\). Then \(f\) has a unique fixed point in \(X\) and, for each \(w \in X\), the sequence \(\{f^n(w)\}\) converges to this point.

Note that for \(n \in \mathbb{N}\), \(f^n\) is the composition of \(n\) copies of \(f\). We shall use the following theorem in the proof.

Theorem: Suppose \((X, d)\) is a metric space and \(f\) is a (strong) contraction on \(X\). Then,
  1. for each \(a, b \in X\), the real sequence \(\{d(f^n(a), f^n(b))\}\) converges to \(0\).
  2. for each \(x \in X\), the sequence \(\{f^n(x)\}\) is a Cauchy sequence in \(X\).
Proof: Suppose that \(w \in X\). By the above theorem, \(\{f^n(w)\}\) is Cauchy in \(X\), and since \(X\) is complete, this sequence converges in \(X\).
Let \(z = \lim f^n(w)\). Since the contraction \(f\) is continuous (indeed uniformly continuous), \[ f(z) = f\left(\lim_{n \to \infty} f^n(w)\right) = \lim_{n \to \infty} f^{n+1}(w), \] and since \(\{f^{n+1}(w)\}\) is a subsequence of \(\{f^n(w)\}\), \[ \lim_{n \to \infty} f^{n+1}(w) = \lim_{n \to \infty} f^n(w) = z, \] so that \(f(z) = z\) as required.
Contraction \(f\) has no other fixed point, for if \(a \in X\) were such a point, then we would have \[ d(a, z) = d(f(a), f(z)) \leq k \, d(a, z), \] which, because \(k < 1\), forces \(d(a, z) = 0\) and thus \(a = z\).
Corollary: Convergence Rate

Under the hypotheses of Banach's theorem, let \(z\) be the unique fixed point. For any starting point \(w \in X\): \[ d(f^n(w), z) \leq \frac{k^n}{1-k} \cdot d(w, f(w)). \] In particular, convergence is geometric with rate \(k\) (the error decreases linearly on a logarithmic scale).

Proof:

For \(m > n\), the triangle inequality and the geometric series give: \[ d(f^n(w), f^m(w)) \leq \sum_{i=n}^{m-1} d(f^i(w), f^{i+1}(w)) \leq \sum_{i=n}^{m-1} k^i \cdot d(w, f(w)) \leq \frac{k^n}{1-k} \cdot d(w, f(w)). \] Letting \(m \to \infty\), so that \(f^m(w) \to z\), yields the result.
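The a priori bound can be verified numerically. The sketch below (our own example) uses the contraction \(f(x) = \cos(x)/2\) on \(\mathbb{R}\), which has Lipschitz constant \(k = 1/2\) since \(|f'(x)| = |\sin(x)|/2 \leq 1/2\):

```python
import math

# Contraction f(x) = cos(x)/2 with Lipschitz constant k = 1/2.
f = lambda x: math.cos(x) / 2
k = 0.5
w = 2.0

# Locate the fixed point by iterating far past the tolerance we test.
z = w
for _ in range(100):
    z = f(z)

# Check d(f^n(w), z) <= k^n / (1 - k) * d(w, f(w)) for the first 30 steps.
x, c = w, abs(w - f(w)) / (1 - k)
for n in range(30):
    assert abs(x - z) <= k ** n * c + 1e-15
    x = f(x)
```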

Insight: Fixed-Point Iterations in Optimization and Learning

Banach's theorem transforms existence and convergence questions into verification of two conditions: (1) the space is complete, and (2) the iteration mapping is a contraction.

Newton's Method
For solving \(g(x) = 0\), Newton's iteration is \[f(x) = x - \frac{g(x)}{g'(x)}. \] Under sufficient smoothness and starting close enough to a simple root \(x^*\), one can show \(f\) is a contraction on a neighborhood of \(x^*\) (in fact, convergence is quadratic, not merely linear).
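A minimal sketch (our own example, with \(g(x) = x^2 - 2\)) makes the quadratic behavior visible: the error roughly squares at each step, far faster than a fixed geometric rate.

```python
# Newton's iteration x -> x - g(x)/g'(x) for g(x) = x^2 - 2.
g = lambda x: x * x - 2
dg = lambda x: 2 * x

x = 1.5
errors = []
for _ in range(5):
    x = x - g(x) / dg(x)
    errors.append(abs(x - 2 ** 0.5))

# Quadratic convergence: each error is roughly the square of the previous.
assert errors[1] < errors[0] ** 2 * 10
assert errors[2] < errors[1] ** 2 * 10
```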

Gradient Descent (Convex Case)
For minimizing \(h: \mathbb{R}^n \to \mathbb{R}\) with \(L\)-Lipschitz gradient and strong convexity parameter \(\mu > 0\): \[ f(x) = x - \alpha \nabla h(x) \] is a contraction with \(k = \max(|1 - \alpha L|, |1 - \alpha \mu|)\). Choosing \(\alpha = \frac{2}{L + \mu}\) minimizes \(k\) to \(\frac{L - \mu}{L + \mu} < 1\).
Note: In Deep Learning, loss functions are rarely convex, so global contraction is not guaranteed, but local behavior around minima often approximates this dynamics.
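A quick numerical check of this contraction factor (our own example; we use a diagonal quadratic so the gradient acts componentwise, and the eigenvalue bounds \(\mu, L\) are chosen arbitrarily):

```python
# Gradient descent on h(x) = 0.5 * sum(d_i * x_i^2), with eigenvalues
# in [mu, L]; the optimal step alpha = 2/(L + mu) gives contraction
# factor k = (L - mu)/(L + mu) in the sup norm.
mu, L = 1.0, 9.0
alpha = 2 / (L + mu)
k = (L - mu) / (L + mu)          # predicted factor = 0.8

diag = [mu, 4.0, L]              # grad h(x)_i = d_i * x_i
x = [1.0, 1.0, 1.0]
for _ in range(50):
    norm_before = max(abs(v) for v in x)
    x = [xi - alpha * di * xi for xi, di in zip(x, diag)]
    norm_after = max(abs(v) for v in x)
    # Each step contracts the distance to the minimizer (the origin).
    assert norm_after <= k * norm_before + 1e-15

# After 50 steps we are within k^50 of the minimizer.
assert max(abs(v) for v in x) <= k ** 50 + 1e-12
```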

Value Iteration (Reinforcement Learning)
The Bellman backup \[ V_{k+1}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p_T(s' \mid s, a) V_k (s') \right] \] is a \(\gamma\)-contraction with respect to the supremum norm \(\|\cdot\|_\infty\), where \(\gamma \in (0,1)\) is the discount factor. The space of bounded value functions is complete under \(\|\cdot\|_\infty\), so value iteration converges to the unique optimal value function \(V_*\) as \(k \to \infty\).
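The contraction property can be checked on a toy MDP (the two-state, two-action example below is our own; the rewards and transition probabilities are arbitrary):

```python
# Toy MDP: states s0, s1; actions a0, a1; discount gamma.
gamma = 0.9
R = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0,
     ("s1", "a0"): 2.0, ("s1", "a1"): 0.0}
P = {("s0", "a0"): {"s0": 1.0}, ("s0", "a1"): {"s1": 1.0},
     ("s1", "a0"): {"s0": 0.5, "s1": 0.5}, ("s1", "a1"): {"s1": 1.0}}
states, actions = ["s0", "s1"], ["a0", "a1"]

def bellman(V):
    # One Bellman backup applied to the value function V.
    return {s: max(R[s, a] + gamma * sum(p * V[t] for t, p in P[s, a].items())
                   for a in actions) for s in states}

def sup_dist(U, V):
    return max(abs(U[s] - V[s]) for s in states)

# Two arbitrary starting value functions: each backup shrinks their
# sup-norm distance by at least the factor gamma.
U = {"s0": 0.0, "s1": 0.0}
V = {"s0": 10.0, "s1": -5.0}
for _ in range(200):
    new_U, new_V = bellman(U), bellman(V)
    assert sup_dist(new_U, new_V) <= gamma * sup_dist(U, V) + 1e-12
    U, V = new_U, new_V

# Both iterates converge to the same (unique) optimal value function.
assert sup_dist(U, V) < 1e-6
```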

Neumann Series and Iterative Methods:
In Policy Evaluation, we solve the linear system \(\mathbf{v} = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\), which formally yields \(\mathbf{v} = (\mathbf{I} - \gamma \mathbf{T})^{-1} \mathbf{r}\). Calculating this inverse directly is computationally expensive (\(O(|\mathcal{S}|^3)\)).

However, because the operator \(T_{\gamma}(\mathbf{v}) = \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\) is a contraction, we can apply the Neumann Series expansion: \[ (\mathbf{I} - \gamma \mathbf{T})^{-1} = \sum_{k=0}^{\infty} (\gamma \mathbf{T})^k = \mathbf{I} + \gamma \mathbf{T} + \gamma^2 \mathbf{T}^2 + \dots \] This power series converges because the spectral radius \(\rho(\gamma \mathbf{T}) < 1\). This provides the mathematical guarantee that the iterative updates in Reinforcement Learning (CS approach) will perfectly converge to the analytical solution of the Bellman equation (Math approach).
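This equivalence can be sketched on a small policy-evaluation problem (our own two-state example, written with plain Python lists to avoid any library assumptions): iterating \(\mathbf{v} \leftarrow \mathbf{r} + \gamma \mathbf{T}\mathbf{v}\) sums the Neumann series term by term, and the limit satisfies the Bellman equation exactly.

```python
# Policy evaluation: solve v = r + gamma * T v by fixed-point iteration.
gamma = 0.9
T = [[0.8, 0.2],   # row-stochastic transition matrix under a fixed policy
     [0.3, 0.7]]
r = [1.0, 2.0]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Each update adds the next term of the Neumann series
# (I + gamma*T + gamma^2*T^2 + ...) applied to r.
v = [0.0, 0.0]
for _ in range(500):
    Tv = matvec(T, v)
    v = [ri + gamma * tvi for ri, tvi in zip(r, Tv)]

# The limit satisfies the Bellman equation v = r + gamma * T v.
residual = max(abs(vi - (ri + gamma * tvi))
               for vi, ri, tvi in zip(v, r, matvec(T, v)))
assert residual < 1e-10
```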