The Hahn-Banach Theorem

The Extension Problem

Throughout the study of dual spaces we relied on a single, deceptively powerful guarantee: that a normed space carries enough continuous linear functionals to detect every vector. Concretely, for each vector \(x\) we used the existence of a norming functional, that is, a functional \(\varphi\) of norm one with \(\varphi(x) = \|x\|\), to prove that the canonical embedding \(J : \mathcal{X} \to \mathcal{X}^{**}\) is isometric and that \(\mathcal{X}^*\) separates points. That guarantee was taken on faith. We now repay the debt.

The mechanism behind it is an extension result. One starts with a functional defined only on a subspace — for instance, the one-dimensional span of a single vector, where a functional is trivial to write down — and asks whether it extends to the whole space without increasing its size. That the answer is always yes is the content of the Hahn-Banach Theorem, the first of the cornerstones of functional analysis. Its reach extends well beyond norms: the natural setting is extension under domination by a sublinear functional, a class of size-measuring maps more permissive than norms and seminorms, defined below. We state the theorem in this general form because the geometric, separation-type consequences we will need later — separating a point from a convex set, characterizing weak closures — require exactly this generality, not merely the norm-preserving special case.

Seminorms and Sublinear Functionals

Two classes of size-measuring functions organize the entire discussion. The first weakens a norm by dropping the requirement that only the zero vector have zero size.

Definition: Seminorm

Let \(\mathcal{X}\) be a vector space over \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)). A seminorm on \(\mathcal{X}\) is a function \(p : \mathcal{X} \to [0, \infty)\) such that, for all \(x, y \in \mathcal{X}\) and all \(\alpha \in \mathbb{F}\), \[ p(x + y) \;\leq\; p(x) + p(y) \qquad \text{(subadditivity)}, \] \[ p(\alpha x) \;=\; |\alpha|\, p(x) \qquad \text{(absolute homogeneity)}. \] Taking \(\alpha = 0\) in absolute homogeneity gives \(p(0) = 0\). A seminorm that additionally satisfies \(p(x) = 0 \Rightarrow x = 0\) is precisely a norm.

The second class abstracts the two structural features a norm uses when it bounds a functional, namely the triangle inequality and compatibility with scaling by non-negative factors, but discards far more than a seminorm does. A function in this class may take negative values, and it is homogeneous only under non-negative scalars.

Definition: Sublinear Functional

Let \(\mathcal{X}\) be a vector space over \(\mathbb{R}\). A sublinear functional on \(\mathcal{X}\) is a map \(q : \mathcal{X} \to \mathbb{R}\) satisfying, for all \(x, y \in \mathcal{X}\) and all \(\alpha \geq 0\), \[ q(x + y) \;\leq\; q(x) + q(y) \qquad \text{(subadditivity)}, \] \[ q(\alpha x) \;=\; \alpha\, q(x) \qquad \text{(non-negative homogeneity)}. \]

On a real vector space, every seminorm is a sublinear functional: absolute homogeneity \(p(\alpha x) = |\alpha| p(x)\) restricts, for \(\alpha \geq 0\), to \(p(\alpha x) = \alpha\, p(x)\), which is exactly non-negative homogeneity. The converse fails. A seminorm is symmetric (\(p(-x) = p(x)\)) and non-negative, whereas a sublinear functional need be neither. A concrete example on \(\mathbb{R}\) is \(q(x) = x\) itself, which is linear, hence sublinear, yet takes negative values and satisfies \(q(-1) = -1 \neq 1 = q(1)\). This asymmetry is precisely what will let a sublinear functional encode one-sided geometric constraints, such as the separation of convex sets, in later developments. On a complex space the one-sided domination \(f \leq q\) does not directly apply, because a complex-valued functional cannot be compared with \(q\) in the order of \(\mathbb{R}\); this is why the complex form of the theorem, treated below, is phrased through seminorms and the two-sided bound \(|f| \leq p\) rather than through sublinear functionals.
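As a quick numerical sanity check of the distinction just drawn, the following Python sketch tests the two axioms on random samples for two illustrative maps on \(\mathbb{R}^2\): \(p(x) = |x_1|\), a seminorm that is not a norm, and \(q(x) = \max(x_1, x_2)\), a sublinear functional that is not a seminorm. Both maps and all helper names are assumptions chosen for the example. Random sampling can only fail to find a counterexample; it proves nothing.

```python
import random

def p(x):
    """Illustrative seminorm on R^2 that is not a norm: it ignores x[1]."""
    return abs(x[0])

def q(x):
    """Illustrative sublinear functional on R^2 that is not a seminorm."""
    return max(x[0], x[1])

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(a, x):
    return (a * x[0], a * x[1])

def looks_sublinear(h, trials=2000, tol=1e-9, seed=0):
    """Sample-based check of subadditivity and non-negative homogeneity."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a = rng.uniform(0, 5)  # non-negative scalar only
        if h(add(x, y)) > h(x) + h(y) + tol:
            return False
        if abs(h(scale(a, x)) - a * h(x)) > tol:
            return False
    return True

print(looks_sublinear(p), looks_sublinear(q))
print(p((0.0, 1.0)))                    # a nonzero vector of size zero
print(q((-1.0, -2.0)))                  # a negative value
print(q((1.0, 0.0)), q((-1.0, 0.0)))    # q(x) and q(-x) differ
```

The last three lines exhibit the failures named in the text: \(p\) vanishes on a nonzero vector, while \(q\) is negative somewhere and is not symmetric.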

The extension problem can now be posed sharply. Suppose \(f\) is a linear functional defined on a subspace \(\mathcal{M} \subseteq \mathcal{X}\), and suppose it is dominated by a sublinear functional \(q\) on its domain, meaning \(f(x) \leq q(x)\) for all \(x \in \mathcal{M}\). Does \(f\) extend to a linear functional \(F\) on all of \(\mathcal{X}\), still dominated by the same \(q\)? Finding some linear extension is easy: extend a basis of \(\mathcal{M}\) to a basis of \(\mathcal{X}\) and assign arbitrary values, for instance zero, to the new basis vectors. The difficulty, and the entire substance of the theorem, is to find an extension that remains dominated by \(q\). We turn to this next.

The Hahn-Banach Theorem: Real Case

We state the theorem in its primary form: extension of a real linear functional under domination by a sublinear functional. The norm-preserving statement that powers the dual-space theory is a special case, recovered at the end of this development.

Theorem: Hahn-Banach (Real, Sublinear-Dominated Form)

Let \(\mathcal{X}\) be a vector space over \(\mathbb{R}\), let \(q\) be a sublinear functional on \(\mathcal{X}\), and let \(\mathcal{M} \subseteq \mathcal{X}\) be a linear subspace. If \(f : \mathcal{M} \to \mathbb{R}\) is linear and satisfies \[ f(x) \;\leq\; q(x) \qquad \text{for all } x \in \mathcal{M}, \] then there exists a linear functional \(F : \mathcal{X} \to \mathbb{R}\) extending \(f\) — that is, \(F|_{\mathcal{M}} = f\) — with \[ F(x) \;\leq\; q(x) \qquad \text{for all } x \in \mathcal{X}. \]

The proof has two ingredients. The first is a single, concrete step: given the functional on a subspace, extend it to a subspace one dimension larger, preserving domination. The second is a transfinite mechanism that iterates this step until the domain exhausts \(\mathcal{X}\). The mechanism is Zorn's Lemma, which we now state, since the rest of functional analysis — and this proof in particular — uses it openly.

Zorn's Lemma

Recall the language of order. A partial order \(\preceq\) on a set \(\mathcal{P}\) is a reflexive, antisymmetric, transitive relation. A subset \(\mathcal{C} \subseteq \mathcal{P}\) is a chain if any two of its elements are comparable. An upper bound for \(\mathcal{C}\) is an element \(u \in \mathcal{P}\) with \(c \preceq u\) for every \(c \in \mathcal{C}\); a maximal element of \(\mathcal{P}\) is an \(m \in \mathcal{P}\) admitting no strictly larger element.

Theorem (Axiom): Zorn's Lemma

Let \((\mathcal{P}, \preceq)\) be a non-empty partially ordered set in which every chain has an upper bound in \(\mathcal{P}\). Then \(\mathcal{P}\) has a maximal element.

Zorn's Lemma is logically equivalent to the Axiom of Choice over the Zermelo–Fraenkel axioms: each can be derived from the other, and neither is provable from the remaining axioms of set theory. We therefore adopt it as an axiom. It is not a theorem awaiting proof here or elsewhere; stating it is a declaration of which set-theoretic foundation we build on, made explicit so that every later existence result resting on it — bases of arbitrary vector spaces, maximal ideals, and the extension theorem below — carries a visible dependency rather than a hidden one.

Extension by One Dimension

The heart of the matter is the following lemma. Once a dominated functional can be pushed across a single new dimension, Zorn's Lemma supplies the rest.

Lemma: One-Dimensional Extension

Under the hypotheses of the theorem, suppose additionally that \(\dim(\mathcal{X} / \mathcal{M}) = 1\); that is, \(\mathcal{X} = \mathcal{M} + \mathbb{R}\,x_0\) for some \(x_0 \notin \mathcal{M}\). Then \(f\) admits an extension \(F : \mathcal{X} \to \mathbb{R}\), linear and dominated by \(q\).

Proof

Every element of \(\mathcal{X}\) is uniquely \(t x_0 + y\) with \(t \in \mathbb{R}\), \(y \in \mathcal{M}\) (uniqueness because \(x_0 \notin \mathcal{M}\)). A linear extension of \(f\) is therefore determined by a single real number \(\alpha_0 := F(x_0)\), via \(F(t x_0 + y) = t\alpha_0 + f(y)\). The task is to choose \(\alpha_0\) so that \(F \leq q\) everywhere.

Consider \(t > 0\). Writing \(t x_0 + y = t\bigl(x_0 + y/t\bigr)\) and using non-negative homogeneity of \(q\), the requirement \(F(t x_0 + y) \leq q(t x_0 + y)\) becomes \(t\alpha_0 + f(y) \leq t\, q(x_0 + y/t)\); dividing by \(t > 0\) and renaming \(y_1 := y/t \in \mathcal{M}\), this is \[ \alpha_0 \;\leq\; -f(y_1) + q(x_0 + y_1) \qquad \text{for all } y_1 \in \mathcal{M}. \] For \(t < 0\), write \(t = -s\) with \(s > 0\); the requirement becomes \(-s\alpha_0 + f(y) \leq s\, q(-x_0 + y/s)\), and dividing by \(s > 0\) and renaming \(y_2 := y/s \in \mathcal{M}\) gives \[ \alpha_0 \;\geq\; f(y_2) - q(-x_0 + y_2) \qquad \text{for all } y_2 \in \mathcal{M}. \] The case \(t = 0\) is the original hypothesis \(f \leq q\) on \(\mathcal{M}\). Thus a valid \(\alpha_0\) exists if and only if the supremum below does not exceed the infimum: \[ \sup_{y_2 \in \mathcal{M}} \bigl[\, f(y_2) - q(-x_0 + y_2) \,\bigr] \;\leq\; \inf_{y_1 \in \mathcal{M}} \bigl[\, -f(y_1) + q(x_0 + y_1) \,\bigr]. \]

This inequality holds. For arbitrary \(y_1, y_2 \in \mathcal{M}\), linearity of \(f\), the domination \(f \leq q\) on \(\mathcal{M}\), and subadditivity of \(q\) give \[ \begin{aligned} f(y_2) + f(y_1) &= f(y_1 + y_2) \\ &\leq q(y_1 + y_2) \\ &= q\bigl((x_0 + y_1) + (-x_0 + y_2)\bigr) \\ &\leq q(x_0 + y_1) + q(-x_0 + y_2). \end{aligned} \] Rearranging, \(f(y_2) - q(-x_0 + y_2) \leq -f(y_1) + q(x_0 + y_1)\) for every pair \(y_1, y_2\). Taking the supremum over \(y_2\) on the left and the infimum over \(y_1\) on the right yields the displayed inequality. Any \(\alpha_0\) in the (non-empty) interval between the two sides defines, via \(F(t x_0 + y) = t\alpha_0 + f(y)\), a linear extension of \(f\) with \(F \leq q\) on all of \(\mathcal{X}\).
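To make the admissible interval for \(\alpha_0\) concrete, here is a minimal Python sketch on \(\mathcal{X} = \mathbb{R}^2\) with \(\mathcal{M} = \mathbb{R}\,e_1\), \(f(t e_1) = t\), \(x_0 = e_2\), and the sublinear functional \(q(x) = |x_1| + \max(2x_2, -x_2)\). All of these choices, and the function names, are assumptions made for illustration. The supremum and infimum are approximated over a finite grid of points of \(\mathcal{M}\); that is exact here only because both extrema are attained at \(y = 0\), and in general a grid gives no more than an estimate.

```python
def q(x):
    """Illustrative sublinear functional on R^2 (asymmetric in x[1])."""
    return abs(x[0]) + max(2 * x[1], -x[1])

def f(t):
    """f on M = R*e1, identified with R: f(t*e1) = t, so f <= q on M."""
    return t

# Grid of points y = t*e1 in M; multiples of 0.25 are exact in binary floats.
GRID = [i * 0.25 for i in range(-40, 41)]

def alpha_bounds():
    """Grid approximation of both sides of the sup <= inf inequality, x0 = e2."""
    lower = max(f(t) - q((t, -1.0)) for t in GRID)   # sup of f(y2) - q(-x0 + y2)
    upper = min(-f(t) + q((t, 1.0)) for t in GRID)   # inf of -f(y1) + q(x0 + y1)
    return lower, upper

def extension(alpha0):
    """F(t*x0 + y) = t*alpha0 + f(y), written in coordinates x = (x1, x2)."""
    return lambda x: x[0] + alpha0 * x[1]

def dominated(F, n=40):
    """Check F <= q on a finite grid of R^2 (a spot check, not a proof)."""
    pts = [(i * 0.25, j * 0.25) for i in range(-n, n + 1) for j in range(-n, n + 1)]
    return all(F(x) <= q(x) + 1e-12 for x in pts)

lower, upper = alpha_bounds()
print(lower, upper)                       # -1.0 2.0
print(dominated(extension(0.5)))          # alpha0 inside the interval: True
print(dominated(extension(2.5)))          # alpha0 outside the interval: False
```

For this \(q\) the interval is \([-1, 2]\), which is not symmetric about zero. That is the kind of one-sided constraint a seminorm could not express.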

From One Dimension to All of \(\mathcal{X}\)

The one-dimensional lemma extends \(f\) across any single new direction. To reach all of \(\mathcal{X}\) — possibly requiring uncountably many such steps — we order the partial extensions and apply Zorn's Lemma.

Proof of the Theorem

Let \(\mathcal{S}\) be the set of all pairs \((\mathcal{N}, g)\) where \(\mathcal{N}\) is a linear subspace with \(\mathcal{M} \subseteq \mathcal{N} \subseteq \mathcal{X}\) and \(g : \mathcal{N} \to \mathbb{R}\) is linear, extends \(f\), and satisfies \(g \leq q\) on \(\mathcal{N}\). The pair \((\mathcal{M}, f)\) lies in \(\mathcal{S}\), so \(\mathcal{S} \neq \varnothing\). Order \(\mathcal{S}\) by declaring \((\mathcal{N}_1, g_1) \preceq (\mathcal{N}_2, g_2)\) to mean \[ \mathcal{N}_1 \subseteq \mathcal{N}_2 \quad \text{and} \quad g_2|_{\mathcal{N}_1} = g_1. \] This is reflexive, antisymmetric, and transitive, so \((\mathcal{S}, \preceq)\) is a partially ordered set.

Every chain has an upper bound.
Let \(\mathcal{C} = \{(\mathcal{N}_i, g_i) : i \in I\}\) be a chain in \(\mathcal{S}\). Put \(\mathcal{N} := \bigcup_{i \in I} \mathcal{N}_i\). This union is a linear subspace: given \(u, v \in \mathcal{N}\), say \(u \in \mathcal{N}_i\) and \(v \in \mathcal{N}_j\), comparability of the chain places both in the larger of \(\mathcal{N}_i, \mathcal{N}_j\), which is a subspace and so contains every linear combination \(\lambda u + v\). Define \(g : \mathcal{N} \to \mathbb{R}\) by \(g(u) := g_i(u)\) whenever \(u \in \mathcal{N}_i\). This is well defined: if \(u \in \mathcal{N}_i \cap \mathcal{N}_j\), then the chain makes one pair dominate the other, say \((\mathcal{N}_i, g_i) \preceq (\mathcal{N}_j, g_j)\), whence \(g_j|_{\mathcal{N}_i} = g_i\) and \(g_i(u) = g_j(u)\). Linearity of \(g\) follows by evaluating any two arguments inside a common \(\mathcal{N}_i\) (again using comparability), and \(g \leq q\) holds because it holds for each \(g_i\). Thus \((\mathcal{N}, g) \in \mathcal{S}\), and by construction it is an upper bound for \(\mathcal{C}\).

Maximal element is the full space.
By Zorn's Lemma, \(\mathcal{S}\) has a maximal element \((\mathcal{Y}, F)\). We claim \(\mathcal{Y} = \mathcal{X}\). If not, choose \(x_0 \in \mathcal{X} \setminus \mathcal{Y}\). Applying the one-dimensional extension lemma to \(F\) on \(\mathcal{Y}\) and the direction \(x_0\) produces a dominated linear extension \(F'\) on \(\mathcal{Y} + \mathbb{R}\,x_0\), a strictly larger subspace. Then \((\mathcal{Y}, F) \prec (\mathcal{Y} + \mathbb{R}\,x_0, F')\), contradicting maximality. Hence \(\mathcal{Y} = \mathcal{X}\), and \(F\) is the required extension: linear on all of \(\mathcal{X}\), restricting to \(f\) on \(\mathcal{M}\), and dominated by \(q\).
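In finite dimensions no transfinite machinery is needed: the one-dimensional lemma can simply be applied once per missing basis vector. The Python sketch below illustrates this on \(\mathcal{X} = \mathbb{R}^3\) with \(\mathcal{M} = \mathbb{R}\,e_1\) and \(f(t e_1) = t\). The sublinear functional \(q\), the function names, and the rule of always taking the midpoint of the admissible interval are assumptions for the example. As a further simplification, the supremum and infimum are taken over a finite grid, which is adequate for this particular \(q\) (both are attained at the origin) but is not a general-purpose method.

```python
from itertools import product

def q(x):
    """Illustrative sublinear functional on R^3."""
    return abs(x[0]) + max(2 * x[1], -x[1]) + max(3 * x[2], -x[2])

STEPS = [i * 0.5 for i in range(-8, 9)]  # grid values, exact in binary floats

def extend_once(coeffs, dim=3):
    """Extend F(y) = sum(coeffs[i] * y[i]) from span(e_1..e_k) across e_{k+1}."""
    k = len(coeffs)
    pad = [0.0] * (dim - k - 1)          # coordinates not reached yet
    lower, upper = float("-inf"), float("inf")
    for y in product(STEPS, repeat=k):
        Fy = sum(c * t for c, t in zip(coeffs, y))
        lower = max(lower, Fy - q(list(y) + [-1.0] + pad))   # F(y) - q(-x0 + y)
        upper = min(upper, -Fy + q(list(y) + [1.0] + pad))   # -F(y) + q(x0 + y)
    assert lower <= upper                # guaranteed by the lemma
    return coeffs + [(lower + upper) / 2]  # midpoint: one admissible alpha_0

def extend_to_full_space(coeffs, dim=3):
    """Apply the one-dimensional step until the domain is all of R^dim."""
    while len(coeffs) < dim:
        coeffs = extend_once(coeffs, dim)
    return coeffs

F = extend_to_full_space([1.0])          # start from f(t*e1) = t on M = R*e1
print(F)                                 # [1.0, 0.5, 1.0]
```

Each pass through the loop plays the role of one application of the lemma. Zorn's Lemma is what replaces this loop when no finite, or even countable, number of passes can exhaust the space.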

The Complex Case

The real theorem dominates a functional by a sublinear functional, a notion tied to the order structure of \(\mathbb{R}\). For a complex-valued functional the one-sided inequality \(f \leq q\) has no meaning, so the natural object of domination becomes a seminorm \(p\), used through the two-sided bound \(|f| \leq p\); its absolute homogeneity \(p(\alpha x) = |\alpha| p(x)\) is meaningful for complex \(\alpha\). The passage from real to complex rests on a single observation: a complex-linear functional is completely determined by its real part.

Real and Imaginary Parts of a Functional

A complex vector space \(\mathcal{X}\) is also a real vector space, simply by restricting scalar multiplication to real scalars. If \(g : \mathcal{X} \to \mathbb{C}\) is complex-linear, then \(\operatorname{Re} g : \mathcal{X} \to \mathbb{R}\) is real-linear. The following lemma shows this loses no information.

Lemma: Recovery from the Real Part

Let \(\mathcal{X}\) be a vector space over \(\mathbb{C}\).

(a) If \(f : \mathcal{X} \to \mathbb{R}\) is real-linear, then \[ \tilde{f}(x) \;:=\; f(x) - i\, f(i x) \] is complex-linear, and \(f = \operatorname{Re} \tilde{f}\).

(b) If \(g : \mathcal{X} \to \mathbb{C}\) is complex-linear and \(f = \operatorname{Re} g\), then \(\tilde{f} = g\).

(c) If \(p\) is a seminorm on \(\mathcal{X}\) and \(f, \tilde{f}\) are as in (a), then \(|f(x)| \leq p(x)\) for all \(x\) if and only if \(|\tilde{f}(x)| \leq p(x)\) for all \(x\).

(d) If \(\mathcal{X}\) is a normed space and \(f, \tilde{f}\) are as in (a), then \(\|f\| = \|\tilde{f}\|\).

Proof

(a)
Additivity and real homogeneity of \(\tilde{f}\) follow from those of \(f\). For complex homogeneity it suffices to check multiplication by \(i\), since every complex scalar has the form \(a + bi\) with \(a, b \in \mathbb{R}\). Compute \[ \begin{aligned} \tilde{f}(i x) &= f(i x) - i\, f(i^2 x) \\ &= f(i x) - i\, f(-x) \\ &= f(i x) + i\, f(x) \\ &= i\bigl(f(x) - i\, f(i x)\bigr) \\ &= i\, \tilde{f}(x), \end{aligned} \] using real-linearity of \(f\) (so \(f(-x) = -f(x)\)). Hence \(\tilde{f}\) is complex-linear. Its real part is \(\operatorname{Re}\tilde{f}(x) = f(x)\), since \(f\) is real-valued.

(b)
With \(f = \operatorname{Re} g\), write \(g(x) = \operatorname{Re} g(x) + i \operatorname{Im} g(x)\). For any complex number \(g(x)\), its imaginary part equals \(-\operatorname{Re}\bigl(i\, g(x)\bigr)\). Complex-linearity gives \(i\, g(x) = g(i x)\), so \(\operatorname{Im} g(x) = -\operatorname{Re} g(i x) = -f(i x)\). Therefore \(g(x) = f(x) - i f(i x) = \tilde{f}(x)\).

(c)
Since \(f = \operatorname{Re}\tilde{f}\), we have \(|f(x)| = |\operatorname{Re}\tilde{f}(x)| \leq |\tilde{f}(x)|\), so the bound on \(\tilde{f}\) immediately implies the bound on \(f\). Conversely, suppose \(|f(x)| \leq p(x)\) for all \(x\). Fix \(x\) and write \(\tilde{f}(x) = r e^{i\theta}\) with \(r = |\tilde{f}(x)| \geq 0\). Then \[ |\tilde{f}(x)| \;=\; r \;=\; e^{-i\theta}\,\tilde{f}(x) \;=\; \tilde{f}\bigl(e^{-i\theta} x\bigr), \] using complex-linearity. The left side is real, so the right side equals its own real part: \(|\tilde{f}(x)| = \operatorname{Re}\tilde{f}(e^{-i\theta} x) = f(e^{-i\theta} x)\). Applying the hypothesis and absolute homogeneity of \(p\), \[ |\tilde{f}(x)| \;=\; f\bigl(e^{-i\theta} x\bigr) \;\leq\; p\bigl(e^{-i\theta} x\bigr) \;=\; |e^{-i\theta}|\, p(x) \;=\; p(x). \]

(d)
A norm is in particular a seminorm, so applying (c) with \(p(x) = \|\tilde{f}\|\,\|x\|\) and with \(p(x) = \|f\|\,\|x\|\) shows each of \(\|f\|, \|\tilde{f}\|\) bounds the other; hence they are equal.
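Parts (a) and (b) are easy to check numerically. The Python sketch below takes an illustrative complex-linear functional \(g(z) = c_1 z_1 + c_2 z_2\) on \(\mathbb{C}^2\), forms \(f = \operatorname{Re} g\), rebuilds \(\tilde{f}(x) = f(x) - i f(ix)\), and compares \(\tilde{f}\) with \(g\) on a sample vector. The coefficients, the sample vector, and the function names are assumptions for the example. A single sample is a spot check, not a proof.

```python
C = (2 - 1j, 0.5 + 3j)  # hypothetical coefficients of g

def g(z):
    """An illustrative complex-linear functional on C^2."""
    return C[0] * z[0] + C[1] * z[1]

def f(z):
    """Real part of g: real-valued and real-linear."""
    return g(z).real

def times(a, z):
    """Scalar multiplication in C^2."""
    return (a * z[0], a * z[1])

def f_tilde(z):
    """Recovery formula from the lemma: f~(x) = f(x) - i f(ix)."""
    return f(z) - 1j * f(times(1j, z))

z = (1 + 2j, -3 + 0.5j)
print(abs(f_tilde(z) - g(z)) < 1e-12)                        # part (b): f~ = g
print(abs(f_tilde(times(1j, z)) - 1j * f_tilde(z)) < 1e-12)  # part (a): f~(ix) = i f~(x)
print(abs(f(z)) <= abs(f_tilde(z)))                          # |Re w| <= |w|, used in (c)
```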

The Seminorm Version

With recovery from the real part in hand, the complex extension theorem follows by applying the real theorem to the real part, then complexifying.

Theorem: Hahn-Banach (Seminorm-Dominated Form)

Let \(\mathcal{X}\) be a vector space over \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)), let \(p\) be a seminorm on \(\mathcal{X}\), and let \(\mathcal{M} \subseteq \mathcal{X}\) be a linear subspace. If \(f : \mathcal{M} \to \mathbb{F}\) is linear with \(|f(x)| \leq p(x)\) for all \(x \in \mathcal{M}\), then there exists a linear functional \(F : \mathcal{X} \to \mathbb{F}\) extending \(f\) with \[ |F(x)| \;\leq\; p(x) \qquad \text{for all } x \in \mathcal{X}. \]

Proof

Case \(\mathbb{F} = \mathbb{R}\).
A seminorm is a sublinear functional, and \(f(x) \leq |f(x)| \leq p(x)\) on \(\mathcal{M}\). The real theorem yields a linear extension \(F\) with \(F(x) \leq p(x)\) for all \(x\). Applying this to \(-x\) and using \(p(-x) = p(x)\) gives \(-F(x) = F(-x) \leq p(-x) = p(x)\), so \(|F(x)| \leq p(x)\).

Case \(\mathbb{F} = \mathbb{C}\).
Let \(f_1 := \operatorname{Re} f\), a real-linear functional on \(\mathcal{M}\) with \(|f_1(x)| \leq |f(x)| \leq p(x)\). By the real case there is a real-linear \(F_1 : \mathcal{X} \to \mathbb{R}\) extending \(f_1\) with \(|F_1(x)| \leq p(x)\) for all \(x\). Define \(F(x) := F_1(x) - i F_1(i x)\). By part (a) of the recovery lemma \(F\) is complex-linear, and by part (c) the bound \(|F_1| \leq p\) gives \(|F(x)| \leq p(x)\) for all \(x\). Finally \(F\) extends \(f\): on \(\mathcal{M}\), part (b) applied to \(g = f\) gives \(f = \widetilde{\operatorname{Re} f} = \widetilde{f_1}\), and since \(F_1|_{\mathcal{M}} = f_1\) we get \(F|_{\mathcal{M}} = \widetilde{f_1} = f\).

Norm-Preserving Extension & Consequences

We now specialize to normed spaces and recover the statement that the dual-space theory took on faith: a bounded functional on a subspace extends to the whole space without enlarging its norm. This is the seminorm theorem applied to the most natural seminorm a bounded functional carries with it.

The Norm-Preserving Extension

Let \(\mathcal{X}\) be a normed space, \(\mathcal{M} \subseteq \mathcal{X}\) a subspace, and \(f \in \mathcal{M}^*\) a bounded linear functional, with operator norm \(\|f\|_{\mathcal{M}^*}\). Set \[ p(x) \;:=\; \|f\|_{\mathcal{M}^*}\, \|x\|_{\mathcal{X}}. \] This \(p\) is a seminorm on \(\mathcal{X}\) — in fact a scalar multiple of the norm — and on \(\mathcal{M}\) the boundedness of \(f\) reads exactly \(|f(x)| \leq \|f\|_{\mathcal{M}^*}\,\|x\| = p(x)\). The seminorm theorem furnishes a linear extension \(F : \mathcal{X} \to \mathbb{F}\) of \(f\) with \(|F(x)| \leq p(x) = \|f\|_{\mathcal{M}^*}\,\|x\|\) for all \(x\). The last inequality says \(\|F\|_{\mathcal{X}^*} \leq \|f\|_{\mathcal{M}^*}\); and since \(F\) extends \(f\), the supremum defining \(\|F\|_{\mathcal{X}^*}\) ranges over a superset of that defining \(\|f\|_{\mathcal{M}^*}\), giving the reverse inequality. Hence \(\|F\|_{\mathcal{X}^*} = \|f\|_{\mathcal{M}^*}\): the extension is norm-preserving.
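The extension need not be unique. A small worked example, with the space chosen purely for illustration: let \(\mathcal{X} = \mathbb{R}^2\) with \(\|x\|_\infty = \max(|x_1|, |x_2|)\), let \(\mathcal{M} = \{(t, t) : t \in \mathbb{R}\}\), and let \(f(t, t) := t\), so that \(\|f\|_{\mathcal{M}^*} = 1\). Every linear extension of \(f\) has the form \[ F_a(x) \;=\; a\,x_1 + (1 - a)\,x_2, \qquad a \in \mathbb{R}, \] and its norm with respect to \(\|\cdot\|_\infty\) is \[ \|F_a\| \;=\; |a| + |1 - a|. \] This equals \(1\) exactly when \(0 \leq a \leq 1\), so there is a whole segment of norm-preserving extensions, while every \(a\) outside \([0, 1]\) gives an extension of strictly larger norm.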

This is precisely the Hahn-Banach extension theorem invoked in the study of dual spaces. The debt incurred there, where the statement was used to construct norming functionals and to prove the canonical embedding isometric, is now discharged: it is the special case of the seminorm theorem in which the dominating seminorm is a constant multiple of the norm.

Why the Dual Space Is Rich

The norm-preserving extension has an immediate consequence that the dual-space theory used repeatedly: a normed space carries enough functionals to detect every vector by its norm. Given a nonzero \(x \in \mathcal{X}\), apply the extension to the one-dimensional subspace \(\mathcal{M} = \mathbb{F}\,x\) with the functional \(f(\lambda x) := \lambda \|x\|\), which has \(\|f\|_{\mathcal{M}^*} = 1\) because \(|f(\lambda x)| = |\lambda|\,\|x\| = \|\lambda x\|\). The resulting \(F \in \mathcal{X}^*\) satisfies \(\|F\|_{\mathcal{X}^*} = 1\) and \(F(x) = \|x\|\). This is the norming functional whose existence underlies both the isometry of the canonical embedding \(J : \mathcal{X} \to \mathcal{X}^{**}\) and the fact that \(\mathcal{X}^*\) separates points: if \(x \neq y\), the norming functional \(F\) of \(x - y\) gives \(F(x) - F(y) = \|x - y\| > 0\). Two vectors with the same image under every functional must therefore be equal; the dual space sees everything.
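In an inner-product setting the norming functional can be written down without any appeal to the extension theorem, which makes a useful sanity check. As an illustration (the choice of space is an assumption for the example), take \(\mathcal{X} = \mathbb{R}^n\) with the Euclidean norm and a nonzero \(x\), and set \[ F(y) \;:=\; \frac{\langle y, x \rangle}{\|x\|_2}. \] Then \(F(x) = \|x\|_2\), and the Cauchy-Schwarz inequality gives \[ |F(y)| \;\leq\; \frac{\|y\|_2\,\|x\|_2}{\|x\|_2} \;=\; \|y\|_2, \] so \(\|F\| = 1\). The theorem guarantees a functional with the same two properties in every normed space, including those with no inner product to supply a formula.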