Foundations of Analysis: Metric Spaces

Introduction

Up until this chapter, we relied on calculus and linear algebra to solve optimization problems. For many practical machine learning applications, these tools are sufficient. However, as we move toward advanced topics like manifold learning or information geometry, intuition alone is no longer enough.

Consider a fundamental question: when we run gradient descent, we assume each step brings us closer to a solution, and that iterating long enough will reach (or approach) a minimum. But what guarantees this? The sequence of iterates \(\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\) may get arbitrarily close together without ever settling on an actual point unless the underlying space has the right structure.

Similarly, when we work with function spaces in kernel methods or neural networks, we treat entire functions as single "points." But what does "distance" mean between two functions? What does it mean for a sequence of functions to converge? The familiar Euclidean intuition breaks down, and we need a more general framework.

This chapter introduces that framework: metric spaces and the notion of completeness. These concepts formalize exactly when iterative algorithms converge to valid solutions. Rather than learning new computational techniques, we are gaining the language to state precisely what we have been assuming all along and to understand when those assumptions hold.

Metric Space

Remember, In Section I we defined the measure space and metric as follows.

Definition: Metric Space \((M, d)\)

A metric space is an ordered pair \((M, d)\)consisting of a nonempty set \(M\) and a metric \(d\) on \(M\). The metric \(d\) is a function \(d: M \times M \to \mathbb{R}\) that defines the distance between any two elements of \(M\). For all \(a, b, c \in M\), the following axioms must hold:

Positivity: \(d(a,b) \geq 0\) with equality if and only if \(a = b\).
Symmetry: \(d(a,b) = d(b,a)\).
The triangle inequality \(d(a,b) \leq d(a,c) + d(c, b)\).

Like the set theory, for two metric spaces \((X, d)\) and \((Y, e)\), we can define that \(X\) is a metric subspace of \(Y\) and \(Y\) is a metric superspace of \(X\) if and only if \(X\) is a subset of \(Y\) and \(d\) is a restriction of \(e\).

Insight: Why the Axioms Matter in Practice

You have already used non-Euclidean distances throughout this curriculum. In k-nearest neighbors, the choice between Euclidean distance, Manhattan distance, or cosine similarity fundamentally changes which points are considered "close." In kernel methods, the kernel function implicitly defines a distance in a high-dimensional feature space. These are all metrics on different spaces.

The axioms are not arbitrary mathematical restrictions. They encode the minimal properties required for algorithms to behave sensibly. For instance, if the triangle inequality fails, then a nearest-neighbor search might return a point \(y\) as "closest" to \(x\), even when a third point \(z\) satisfies \(d(x, z) < d(x, y)\) via an indirect path. Efficient search structures like KD-trees and ball trees rely on the triangle inequality for their pruning guarantees.

Distance

With a metric defined, we can extend the notion of distance from pairs of points to more general situations. In optimization, we rarely care about the distance between two specific points; instead, we ask questions like: "How far is this point from the feasible region?" or "How close is our current iterate to the set of optimal solutions?" These questions require measuring distance from a point to a set.

Definition: Distances Suppose \((X, d)\) is a metric space and \(A\) and \(B\) are subsets of \(X\).
The distance from \(x \in X\) to \(A\) to be \[ \text{dist }(x, A) = \inf \{d(x, a) \mid a \in A\} \quad (\text{Points to Sets}) \] and also the distance from \(A\) to \(B\) to be \[ \text{dist }(A, B) = \inf \{d(a, b) \mid a \in A, b \in B\}. \quad (\text{Sets to Sets}) \] The both distances depend on what the metric \(d\) is.

Definition: Isolated Points Suppose \(X\) is a metric space, \(S\) is a subset of \(X\), and \(z \in S\). Then \(z\) is said to be an isolated point of \(S\) if and only if \[ \text{dist }(z, S \ \{z\}) \neq 0. \] The collection of isolated points of \(S\) is denoted by \(\text{iso } (S)\).

The complement of isolated points leads to a more important concept for our purposes: accumulation points. While isolated points stand alone with "room around them," accumulation points are surrounded by infinitely many other points from the set no matter how small a neighborhood we consider.

Definition: Accumulation Points Suppose \(X\) is a metric space, \(S\) is a subset of \(X\), and \(z \in X\). Then \(z\) is said to be an accumulation point or a limit point of \(S\) in \(X\) if and only if \[ \text{dist }(z, S \ \{z\}) = 0. \] The collection of accumulation points of \(S\) in \(X\) is denoted by \(\text{acc } (S)\).

Definition: Nearest Points Suppose \((X, d)\) is a metric space, \(S\) is a subset of \(X\) and \(z \in X\). A member \(s\) of \(S\) is called a nearest point of \(S\) to \(z\) in \(X\) if and only if \[ d(z, s) = \text{dist }(z, S). \]

The concept of distance to a set is fundamental to margin-based classifiers (like SVMs), where the goal is to maximize the distance between a decision boundary and the nearest data points. Furthermore, the nearest point definition is the mathematical foundation of projection. In constrained optimization, projecting a gradient update back onto the feasible set \(S\) is exactly the act of finding a nearest point \(s \in S\) to the updated vector \(z\).

Boundary

Having established how to measure distance to sets, we can now classify points by their relationship to a set and its complement. This classification is fundamental to constrained optimization. When we solve problems like "minimize \(f(x)\) subject to \(g(x) \leq 0\)," the feasible region is a set \(S\), and the nature of optimal solutions depends heavily on whether they lie in the interior or on the boundary of \(S\).

Definition: Boundary Points Suppose \(X\) is a metric space, \(S\) is a subset of \(X\) and \(a \in X\). Then \(a\) is called a boundary point of \(S\) in \(X\) if and only if \[ \text{dist }(a, S) = \text{dist }(a, S^c) = 0. \] The collection of boundary points of \(S\) in \(X\) is called the boundary of \(S\) in \(X\) and is denoted by \(\partial S\).

Definition: Closure, Interior, and Exterior Suppose \(X\) is a metric space, and \(S\) is a subset of \(X\).

The closure of \(S\) in \(X\) to be the union: \[ \overline{S} = \text{cl}(S) = S \cup \partial S. \]
The interior of \(S\) in \(X\) to be the difference: \[ S^{\circ} = \text{int}(S) = S \ \partial S. \]
The exterior of \(S\) in \(X\) to be the complement of the closure of \(S\) in \(X\): \[ \text{ext}(S) = \left(\overline{S}\right)^c. \] Note that the exterior of \(S\) in \(X\) is the interior of its complement in \(X\): \[ \left(\overline{S}\right)^c = (S^c)^{\circ}. \]

Open & Closed

The concepts of interior and boundary lead naturally to a classification of sets themselves. A set is open if it consists entirely of interior points. Intuitively, every point has "breathing room" and you can move slightly in any direction without leaving the set. A set is closed if it contains all its boundary points—the set includes its own "edge."

Definition: Open and Closed Subsets Suppose \(X\) is a metric space and \(S\) is a subset of \(X\). Then \(S\) is said to be

an open subset of \(X\), or open in \(X\) iff \[S \cap \partial S = \emptyset\]
a closed subset of \(X\), or closed in \(X\) iff \[\partial S \subseteq S\]

Note that most subsets of metric spaces are neither open nor closed - a set can contain some but not all of its boundary points. However, the sets that are open form a structure worth naming.

Definition: Topology Suppose \((X, d)\) is a metric space. The collection of open subsets of \(X\) is called the topology determined by the metric \(d\).

While we will not develop general topology here, this connection is valuable: many properties of metric spaces (continuity, convergence, compactness) depend only on which sets are open, not on the specific numerical values of the metric.

Why do these classifications matter for optimization? If our search space is strictly open (e.g., \(0 < x < 1\)), an algorithm might iterate forever, approaching a boundary value without ever reaching it - the limit point is excluded from the feasible set. A closed set contains its boundary, ensuring that edge solutions are valid. More fundamentally, the Extreme Value Theorem guarantees that a continuous function attains its maximum and minimum only when the domain is compact - which, in \(\mathbb{R}^n\), means both closed and bounded. We will formalize compactness in later chapters.

Now we arrive at one of the most important concepts for algorithmic convergence: completeness.

Definition: Complete Metric Space

A metric space \(X\) is said to be complete if and only if \(X\) is closed in every metric superspace of \(X\).

Intuitively, completeness means the space has "no holes." The real numbers \(\mathbb{R}\) with the standard metric are complete - this is why calculus works. The rational numbers \(\mathbb{Q}\) are not complete: consider the sequence defined by \(x_1 = 1\) and \(x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right)\). Each term is rational, but the sequence converges to \(\sqrt{2}\), which is irrational. The limit exists in \(\mathbb{R}\) but not in \(\mathbb{Q}\). There is a "hole" where \(\sqrt{2}\) should be.

For optimization, completeness ensures that limit points of iterative algorithms actually belong to the space we're working in. In future chapters, we will develop the theory of convergence and see an equivalent characterization: a metric space is complete if and only if every Cauchy sequence converges to a limit within the space.

Balls

Open and closed sets can be complicated objects, but we can build them from simpler pieces. The open ball centered at a point \(x\) with radius \(r\) is the set of all points within distance \(r\) of \(x\). These balls are the "building blocks" of the topology: every open set can be expressed as a union of open balls.

More importantly for algorithms, balls give us a coordinate-free way to talk about neighborhoods. When we say "there exists a step size \(\alpha\) such that gradient descent stays in the feasible region," we are implicitly saying the current point lies in an open ball contained within that region.

Definition: Open Balls & Closed Balls Suppose \((X, d)\) is a metric space and \(x \in X\). For each \(r \in \mathbb{R}^+\), we define:

The open ball in \(X\) centered at the point \(x\) and with radius \(r\) to be the set \[ \mathcal{B}[x ; r) = \{y \in X \mid d(x, y) < r\}. \]
The closed ball in \(X\) centered at the point \(x\) and with radius \(r\) to be the set \[ \mathcal{B}[x ; r] = \{y \in X \mid d(x, y) \leq r\}. \]

Intuitively, we want to say that the open ball in \(X\) is the open subset of \(X\), and similary, the closed ball in \(X\) is the closed subset of \(X\). Indeed, these are true. We shall show that only the open ball case here.

Proof: The Open Ball is Open Suppose \(X, d\) is a metric space, \(x \in X\), and \(r \in \mathbb{R}^+\). Consider the open ball \[ \mathcal{B}[x ; r) = \{y \in X \mid d(x, y) < r\} \subseteq X. \] If \(z \in \mathcal{B}\), then \[ \exists s \in (0, r) \text{ such that } d(x, z) < r - s. \] Now, consider any point \(w\) outside the ball, i.e., \(w \in X \setminus \mathcal{B}\). By definition, \(d(w, x) \geq r\). Using the triangle inequality, we have: \[ d(z, w) \geq d(w, x) - d(x, z) > s. \] Since \(w\) is an arbitrary point in the complement \(X \setminus \mathcal{B}\), this inequality holds for all points outside the ball. This implies that the "shortest distance" (infimum) from \(z\) to the outside the bal is at least \(s\): \[ \text{dist }(z, X \setminus \mathcal{B}) = \inf \{d(z, w) \mid w \notin \mathcal{B}\} \geq s. \] Recall the definition of a boundary point requires the distance to the complement to be zero. Since the distance here is strictly positive (\(\geq s\)), \(z\) is not a boundary point. Therefore, being in \(\mathcal{B}\) but not on its boundary, \(z\) must be in the interior: \[ z \in \mathcal{B}^{\circ}. \] Since \(z\) was chosen arbitrarily, every point in \(\mathcal{B}\) is an interior point: \[ \mathcal{B} = \mathcal{B}^{\circ}. \] Therefore, \(\mathcal{B}\) is open.

The open ball also provides the foundation for discussing convexity in a coordinate-free manner. In a normed linear space, all open balls of the same radius have identical shape - each is simply a translation of the others. Moreover, these balls are themselves convex sets. We can now state the definition of convexity that applies to general normed spaces, not just \(\mathbb{R}^n\):

Definition: Convex Suppose \(V\) is a normed linear space and \(C \subseteq V\). Then \(C\) is said to be convex if and only if for each \(a, b \in C\), the line segment \(\{ (1 - t)a + tb \mid t \in [0, 1] \}\) jointing \(a\) and \(b\) is included in \(C\).