Intro to Functional Analysis: Banach & Hilbert Spaces

  • What We Have Seen So Far...
  • Normed Spaces & Banach Spaces
  • Inner Product Spaces & Hilbert Spaces
  • Finite vs Infinite Dimensions
  • Geometry of Unit Balls & ML Applications

What We Have Seen So Far...

We have encountered various "spaces" throughout our study. In linear algebra, we started with vector spaces. To discuss "length," we introduced normed vector spaces, and to formalize "orthogonality," we considered inner product spaces. Strictly speaking, our familiar space \(\mathbb{R}^n\) is categorized as a Hilbert space, which is an inner product space that guarantees completeness.

Completeness is critical in calculus and analysis because, without it, we cannot guarantee the convergence of limits "within" the space. Moreover, we frequently encounter the \(L^2\) space in machine learning contexts. What exactly is it? The \(L^2\) space is also a Hilbert space, but unlike \(\mathbb{R}^n\), whose elements are finite-dimensional vectors, it is a function space whose elements are functions.

Furthermore, we have explored \(L^1\) and \(L^\infty\). These are categorized as Banach spaces - complete normed vector spaces that do not necessarily arise from an inner product. Thus, a Banach space is a more general concept than a Hilbert space, yet it is essential in both analysis and machine learning. For instance, in deep learning, a neural network can be viewed as a point (a function) within a vast function space, and training is a search through that space. While it is commonly stated in engineering that training works as long as the functions are differentiable, differentiability only supplies a local descent direction. For a sequence of optimized models to actually converge to a valid limit without "falling through a hole," the underlying space must first be complete - an absolute prerequisite before we can even rely on properties like convexity or compactness to guarantee the existence of an optimum.

Fundamentally, these function spaces are defined by their properties regarding integration. Specific rules of integrability dictate which functions qualify to be part of spaces like \(L^1, L^2,\) and \(L^\infty\). This is precisely why we studied Lebesgue integration: it is the mathematical framework that ensures these function spaces possess completeness, using measures as their foundation. It is crucial not to confuse a vector space with a measure space. A measure space is not a vector space; rather, it provides the underlying domain upon which functions are defined. In other words, a measure space itself does not possess algebraic operations like vector addition or scalar multiplication.

Viewed through this lens, core concepts in statistics gain clear mathematical structure. We can now rigorously understand that random variables are actually measurable functions that map outcomes from a sample space to real numbers, and a probability space is simply a specific type of measure space (with a total measure of 1) used to describe data distributions. Consequently, the expected value \(\mathbb{E}[X]\) is precisely a Lebesgue integral over this probability space, and the variance \(\mathbb{V}[X]\) is the squared \(L^2\) norm of the random variable centered around its mean.
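To make this concrete, here is a minimal numerical sketch of the expectation as a (discrete) Lebesgue integral and the variance as a squared \(L^2\) norm. The probability space below - a hypothetical loaded four-sided die - is an illustrative choice, not from the text:

```python
import numpy as np

# A finite probability space: four outcomes with total measure 1
# (hypothetical example: a loaded four-sided die).
outcomes = np.array([1.0, 2.0, 3.0, 4.0])   # values of the random variable X
probs    = np.array([0.1, 0.2, 0.3, 0.4])   # probability measure of each outcome

# Expected value E[X] as a discrete Lebesgue integral: integral of X dP.
expectation = np.sum(outcomes * probs)

# Variance V[X] as the squared L^2 norm of the centered random variable:
# V[X] = ||X - E[X]||_2^2 = integral of (X - E[X])^2 dP.
variance = np.sum((outcomes - expectation) ** 2 * probs)

print(expectation)  # 3.0
print(variance)     # 1.0
```

On a finite sample space the Lebesgue integral reduces to a probability-weighted sum, which is why the familiar formulas drop out directly.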

Going back to our algebraic foundations, we know that \(\mathbb{R}\) is a field in abstract algebra. Simultaneously, from the perspective of functional analysis, \(\mathbb{R}\) itself acts as a one-dimensional Hilbert space. Thus, \(\mathbb{R}\) serves both as the foundational "rule" for arithmetic operations and as the underlying "stage" (codomain) for our functions. A field equipped with a topology that makes its algebraic operations continuous is known as a topological field. In particular, \(\mathbb{R}\) and \(\mathbb{C}\) are indispensable because they are complete topological fields - a highly special property in mathematics, even though we have implicitly relied on them since elementary school (for comparison, the field of rational numbers \(\mathbb{Q}\) is a topological field, but it is not complete).

As a final note on notation: in applied mathematics and machine learning contexts, you will frequently see \(L_1, L_2,\) and \(L_\infty\) written with subscripts instead of superscripts (\(L^1, L^2, L^\infty\)). This variation stems from differing conventional focuses across disciplines, but both notations refer to exactly the same mathematical concepts and spaces.

Normed Spaces & Banach Spaces

To perform calculus or optimization on a vector space, we need a notion of "distance" to define limits and convergence. A norm provides this by measuring the "length" of a vector. Crucially, any norm naturally induces a metric (distance function) defined by \(d(x, y) = \|x - y\|\). Once we have a metric, we can define Cauchy sequences and ask whether the space is complete.
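The chain "norm → metric → Cauchy sequence" can be sketched numerically. The particular sequence below, \(x_n = (1 - 2^{-n}, 0)\), is an arbitrary illustrative choice:

```python
import numpy as np

def norm2(x):
    """Euclidean norm on R^n."""
    return np.sqrt(np.dot(x, x))

def dist(x, y):
    """Metric induced by the norm: d(x, y) = ||x - y||."""
    return norm2(x - y)

# A Cauchy sequence in R^2: x_n = (1 - 2^-n, 0), converging to (1, 0).
xs = [np.array([1.0 - 2.0 ** -n, 0.0]) for n in range(1, 20)]

# Successive distances shrink geometrically (each gap is 2^-(n+1)) ...
gaps = [dist(xs[k + 1], xs[k]) for k in range(len(xs) - 1)]
assert all(g2 < g1 for g1, g2 in zip(gaps, gaps[1:]))

# ... and because R^2 is complete, the limit (1, 0) lies inside the space.
limit = np.array([1.0, 0.0])
print(dist(xs[-1], limit))  # ~ 2^-19, i.e. essentially 0
```

In \(\mathbb{Q}^2\) an analogous sequence could be Cauchy yet have no limit in the space; completeness is exactly what rules that out.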

Definition: Normed Space (Normed Vector Space)

A normed space is a pair \((\mathcal{X}, \| \cdot\|)\), where \(\mathcal{X}\) is a vector space and \(\| \cdot\|\) is a norm on \(\mathcal{X}\).

Definition: Banach Space

A Banach space is a normed space that is complete with respect to the metric defined by the norm.

Note: Every Cauchy sequence in the space converges to a limit that is also within the space.

The most important examples of Banach spaces in analysis and machine learning are the \(L^p\) spaces. Thanks to the rigorous foundation of Lebesgue integration, these spaces of functions are guaranteed to be complete.

Definition: \(L^p\) Spaces

For \(1 \leq p < \infty\), the space \(L^p\) consists of all measurable functions \(f\) for which the Lebesgue integral of \(|f|^p\) is finite. The norm is defined as: \[ \|f\|_p = \left( \int |f(x)|^p d\mu \right)^{1/p}. \]

Definition: \(L^\infty\) Space

The space \(L^\infty\) consists of all measurable functions \(f\) that are essentially bounded. The norm is defined as the essential supremum: \[ \|f\|_\infty = \text{ess sup}_{x} |f(x)| = \inf \{ C \geq 0 : |f(x)| \leq C \text{ a.e.} \}. \]

Note: "Almost everywhere" (a.e.) means the condition holds except on a set of measure zero. The essential supremum ignores the function's behavior on negligible sets, making it robust to isolated singularities. Rigorously, to ensure \(\|f\|_\infty = 0 \implies f = 0\) (a requirement for a valid norm), the elements of \(L^\infty\) (and all \(L^p\) spaces) are actually equivalence classes of functions that are equal almost everywhere.
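As a sanity check, the norms above can be approximated numerically. This sketch uses the hypothetical test function \(f(x) = x\) on \([0, 1]\) with Lebesgue measure, and a simple rectangle-rule approximation of the integral:

```python
import numpy as np

# Approximate the L^p norms of f(x) = x on [0, 1].
# Exact values: ||f||_1 = 1/2, ||f||_2 = 1/sqrt(3), ||f||_inf = 1.
x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]
f = x

def lp_norm(f_vals, dx, p):
    """Rectangle-rule approximation of (integral |f|^p dmu)^(1/p)."""
    return (np.sum(np.abs(f_vals) ** p) * dx) ** (1.0 / p)

print(lp_norm(f, dx, 1))   # ~ 0.5
print(lp_norm(f, dx, 2))   # ~ 0.5774 = 1/sqrt(3)
print(np.max(np.abs(f)))   # sup norm on the grid: 1.0
```

For a continuous \(f\), the essential supremum agrees with the ordinary supremum; the distinction only matters for functions modified on measure-zero sets.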

Inner Product Spaces & Hilbert Spaces

While Banach spaces allow us to measure lengths and distances, they lack the geometric concepts of "angles" and "orthogonality." To recover the rich geometry of Euclidean space in abstract settings, we need an inner product. Every inner product naturally induces a norm via \(\|x\| = \sqrt{\langle x, x \rangle}\), meaning every inner product space is automatically a normed space.

Definition: Inner Product Space

An inner product space is a vector space over a field \(\mathbb{F}\) (in practice, \(\mathbb{R}\) or \(\mathbb{C}\)) equipped with an inner product \(\langle \cdot, \cdot \rangle\).

Just as a normed space requires completeness to become a Banach space, an inner product space requires completeness to become the ultimate setting for functional analysis: a Hilbert space.

Definition: Hilbert Space

A Hilbert space \(\mathcal{H}\) is an inner product space that is complete with respect to the metric \(d(x, y) = \|x - y\|\) induced by its inner product.

Hilbert spaces are the "nicest" of all infinite-dimensional spaces because they behave almost exactly like \(\mathbb{R}^n\). The foundational inequality that connects the inner product to the norm is the Cauchy-Schwarz inequality, which is vital for proving bounds in optimization.

Theorem: Cauchy-Schwarz Inequality

For all \(x, y\) in an inner product space, \( |\langle x, y \rangle| \leq \|x\| \|y\| \).
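A quick numerical check of the inequality; random vectors in \(\mathbb{R}^{10}\) are an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Check |<x, y>| <= ||x|| ||y|| on 1000 random pairs in R^10.
for _ in range(1000):
    x = rng.normal(size=10)
    y = rng.normal(size=10)
    lhs = abs(np.dot(x, y))
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12  # small slack for floating-point error

# Equality holds exactly when x and y are linearly dependent.
x = rng.normal(size=10)
y = 2.5 * x
assert np.isclose(abs(np.dot(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
```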

But how do we know if a given Banach space is actually a Hilbert space? It turns out that a norm is induced by an inner product if and only if it satisfies the Parallelogram Law. This profound connection between algebra and geometry was proven by Pascual Jordan and John von Neumann. Among the \(L^p\) spaces, only \(L^2\) satisfies this law, making \(L^2\) the unique and profoundly important Hilbert space of functions.

Theorem: Jordan-von Neumann Characterization

A Banach space is a Hilbert space (i.e., its norm is induced by an inner product) if and only if its norm satisfies the Parallelogram Law: \[ \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \].
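We can test the Parallelogram Law directly on the \(p\)-norms of \(\mathbb{R}^2\) (the vectors below are an illustrative choice); as the theorem predicts, the defect vanishes only for \(p = 2\):

```python
import numpy as np

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - (2||x||^2 + 2||y||^2) for the p-norm."""
    n = lambda v: np.linalg.norm(v, ord=p)
    return n(x + y) ** 2 + n(x - y) ** 2 - 2 * n(x) ** 2 - 2 * n(y) ** 2

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(parallelogram_defect(x, y, 1))       #  4.0 : law fails for p = 1
print(parallelogram_defect(x, y, 2))       #  0.0 : law holds for p = 2
print(parallelogram_defect(x, y, np.inf))  # -2.0 : law fails for p = inf
```

A single counterexample pair like this is enough to show that the \(L^1\) and \(L^\infty\) norms cannot come from any inner product.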

Finite vs Infinite Dimensions

Because \(\mathbb{R}^n\) is a Hilbert space, it is easy to assume that all complete spaces behave like \(\mathbb{R}^n\). However, the transition from finite-dimensional vectors to infinite-dimensional function spaces introduces profound topological differences.

Theorem: Equivalence of Norms in Finite Dimensions

In a finite-dimensional vector space, all norms are topologically equivalent. This means that if a sequence converges under the \(L^1\) norm, it is guaranteed to converge under the \(L^2\) and \(L^\infty\) norms as well.

In infinite-dimensional spaces, this is false. A sequence of functions might converge in \(L^1\) but diverge in \(L^\infty\).
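A classic illustration (the spike family below is a standard textbook example, not taken from the text above): the functions \(f_n = n \cdot \mathbf{1}_{[0,\, 1/n^2]}\) on \([0, 1]\) satisfy \(\|f_n\|_1 = 1/n \to 0\) while \(\|f_n\|_\infty = n \to \infty\). The norms can be computed exactly:

```python
import numpy as np

# Tall, thin spikes f_n = n * 1_{[0, 1/n^2]} on [0, 1]:
#   ||f_n||_1   = n * (1/n^2) = 1/n  -> 0   (L^1 convergence to zero)
#   ||f_n||_inf = n                  -> inf (L^inf divergence)
ns = np.arange(1, 11)
l1_norms  = ns / ns.astype(float) ** 2   # exact L^1 norms: 1, 1/2, 1/3, ...
sup_norms = ns.astype(float)             # exact L^inf norms: 1, 2, 3, ...

print(l1_norms)   # decreasing toward 0
print(sup_norms)  # increasing without bound
```

So the very same sequence converges in one norm and diverges in another - impossible in finite dimensions.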

The most drastic difference, however, lies in compactness. As we established in the Heine-Borel Theorem, in \(\mathbb{R}^n\), every closed and bounded set is compact. This guarantees that any continuous optimization problem on a bounded region has a minimum. But as we hinted earlier, this comfort vanishes in infinite dimensions.

Theorem: Compactness of the Unit Ball

The closed unit ball \(\{x : \|x\| \leq 1\}\) in a normed space is compact if and only if the space is finite-dimensional.

Corollary: In any infinite-dimensional normed space, the closed unit ball (which is closed and bounded) is never compact.

Why does this matter? Because a closed unit ball is the prototypical "closed and bounded" set. The theorem tells us that in an infinite-dimensional function space, simply bounding your parameters does not guarantee that your optimization algorithm will converge to a solution inside that boundary. A sequence can wander endlessly inside a bounded ball without ever converging to a point.
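A sketch of why the ball fails to be compact: the standard orthonormal basis \(e_1, e_2, \dots\) of \(\ell^2\) lies entirely in the closed unit ball, yet any two members stay a fixed distance \(\sqrt{2}\) apart, so no subsequence can be Cauchy. The finite truncation to \(\mathbb{R}^{50}\) below is only for illustration:

```python
import numpy as np

# Truncate l^2 to R^50 for illustration; rows of the identity matrix
# are the basis vectors e_1, ..., e_50.
dim = 50
basis = np.eye(dim)

# Every e_n lies on the unit sphere (hence in the closed unit ball) ...
norms = np.linalg.norm(basis, axis=1)
assert np.allclose(norms, 1.0)

# ... but ||e_i - e_j|| = sqrt(2) for all i != j, so the sequence has
# no Cauchy subsequence: bounded, closed, yet not compact in l^2.
d = np.linalg.norm(basis[0] - basis[1])
print(d)  # 1.4142... = sqrt(2)
```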

Geometry of Unit Balls & ML Applications

The choice of norm in a Banach space dictates the geometry of its open and closed balls. In machine learning, we frequently use norms as regularization terms to constrain our model parameters (\(\|w\|_p \leq C\)). The geometric shape of these unit balls perfectly explains why different regularizers produce different types of AI models.

Geometry of \(L^p\) Unit Balls in \(\mathbb{R}^2\)
  • \(L^1\) norm (Lasso):
    The unit ball is a diamond (a square rotated by 45 degrees). It has sharp corners precisely on the axes.
  • \(L^2\) norm (Ridge):
    The unit ball is a perfect circle. It is strictly convex and rotationally invariant.
  • \(L^\infty\) norm:
    The unit ball is a square aligned with the axes.

When we optimize a loss function subject to an \(L^1\) penalty (Lasso regression), the expanding contour of the loss function is highly likely to hit the "sharp corners" of the \(L^1\) diamond first. Because these corners lie exactly on the axes, many parameter weights become exactly zero. This geometric property of the \(L^1\) Banach space is the mathematical mechanism behind sparsity and feature selection in ML.

Conversely, the \(L^2\) Hilbert space penalty (Ridge regression) has a perfectly round, strictly convex unit ball. The loss contour will touch it at a tangent, smoothly shrinking all parameters but rarely setting them to exactly zero. The strict convexity of \(L^2\) guarantees that the optimization problem has a unique global solution, making it mathematically highly stable.
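The contrast can be sketched with the closed-form updates each penalty induces on a simple separable quadratic problem: soft-thresholding for the \(L^1\) penalty versus uniform multiplicative shrinkage for the \(L^2\) penalty. The weight vector and penalty strength below are arbitrary illustrative values:

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of the L^1 penalty: weights with |w_i| <= lam
    are set to exactly zero (the 'corner' effect -> sparsity)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ridge_shrink(w, lam):
    """Effect of an L^2 penalty on a separable quadratic problem:
    uniform shrinkage toward zero, but never exactly zero."""
    return w / (1.0 + lam)

w = np.array([3.0, 0.4, -0.2, 1.5, -0.05])
lam = 0.5

print(soft_threshold(w, lam))  # [ 2.5  0.  -0.   1.  -0. ]  -> sparse
print(ridge_shrink(w, lam))    # all entries shrunk, none exactly zero
```

Three of the five weights are killed outright by soft-thresholding, while ridge shrinkage keeps every coordinate nonzero - exactly the diamond-versus-circle geometry described above.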

Finally, when we leverage the inner product structure of Hilbert spaces, we can project data into infinite-dimensional spaces to find linear boundaries for non-linear data. This is the foundation of the Kernel Trick and Reproducing Kernel Hilbert Spaces (RKHS), which power Support Vector Machines (SVMs) and Gaussian processes.