The Extension Problem
Throughout the study of dual spaces
we relied on a single, deceptively powerful guarantee: that a normed space carries
enough continuous linear functionals to detect every vector. Concretely, we used the existence of a
norming functional (a functional \(\varphi\) of norm one with \(\varphi(x) = \|x\|\)) to prove
that the canonical embedding \(J : \mathcal{X} \to \mathcal{X}^{**}\) is isometric and that
\(\mathcal{X}^*\) separates points. That guarantee was taken on faith. We now repay the debt.
The mechanism behind it is an extension result. One starts with a functional defined only on a
subspace — for instance, the one-dimensional span of a single vector, where a functional is trivial to write
down — and asks whether it extends to the whole space without increasing its size. That the answer is
always yes is the content of the Hahn-Banach Theorem, the first of the cornerstones of functional
analysis. Its reach extends well beyond norms: the natural setting is extension under domination by a
sublinear functional, a class of size-measuring maps more permissive than norms and seminorms,
defined below. We state the theorem in this general form because the
geometric, separation-type consequences we will need later — separating a point from a convex set, characterizing
weak closures — require exactly this generality, not merely the norm-preserving special case.
Seminorms and Sublinear Functionals
Two classes of size-measuring functions organize the entire discussion. The first weakens a norm by dropping the
requirement that only the zero vector have zero size.
Definition: Seminorm
Let \(\mathcal{X}\) be a vector space over \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)). A
seminorm on \(\mathcal{X}\) is a function \(p : \mathcal{X} \to [0, \infty)\) such that, for all
\(x, y \in \mathcal{X}\) and all \(\alpha \in \mathbb{F}\),
\[
p(x + y) \;\leq\; p(x) + p(y) \qquad \text{(subadditivity)},
\]
\[
p(\alpha x) \;=\; |\alpha|\, p(x) \qquad \text{(absolute homogeneity)}.
\]
Taking \(\alpha = 0\) in absolute homogeneity gives \(p(0) = 0\). A seminorm that additionally satisfies
\(p(x) = 0 \Rightarrow x = 0\) is precisely a norm.
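As a numerical sanity check of the definition, the following sketch tests both axioms on random samples for the map \(p(x_1, x_2) = |x_1|\) on \(\mathbb{R}^2\). The example map, the helper name `check_seminorm`, the sample count, and the tolerance are illustrative assumptions, not part of the text. Random sampling can only fail to find a counterexample; it proves nothing.

```python
import random

def p(x):
    """Illustrative seminorm on R^2: absolute value of the first coordinate."""
    return abs(x[0])

def check_seminorm(p, trials=1000, tol=1e-9, seed=0):
    """Test subadditivity and absolute homogeneity of p on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a = rng.uniform(-5, 5)  # scalars of either sign
        x_plus_y = (x[0] + y[0], x[1] + y[1])
        a_times_x = (a * x[0], a * x[1])
        if p(x_plus_y) > p(x) + p(y) + tol:
            return False
        if abs(p(a_times_x) - abs(a) * p(x)) > tol:
            return False
    return True

axioms_hold = check_seminorm(p)
# A nonzero vector of size zero: p is a seminorm but not a norm.
kernel_vector = (0.0, 1.0)
print(axioms_hold, p(kernel_vector))
```

The vector \((0, 1)\) is nonzero yet has size zero, which is exactly the condition a norm forbids and a seminorm allows.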
The second class keeps the two structural features a norm uses when it bounds a functional, namely the triangle
inequality and compatibility with scaling, but discards far more than a seminorm does: a sublinear functional may take
negative values, and it is homogeneous only under non-negative scalars.
Definition: Sublinear Functional
Let \(\mathcal{X}\) be a vector space over \(\mathbb{R}\). A sublinear functional on
\(\mathcal{X}\) is a map \(q : \mathcal{X} \to \mathbb{R}\) satisfying, for all \(x, y \in \mathcal{X}\) and all
\(\alpha \geq 0\),
\[
q(x + y) \;\leq\; q(x) + q(y) \qquad \text{(subadditivity)},
\]
\[
q(\alpha x) \;=\; \alpha\, q(x) \qquad \text{(non-negative homogeneity)}.
\]
On a real vector space, every seminorm is a sublinear functional: absolute homogeneity
\(p(\alpha x) = |\alpha| p(x)\) restricts, for \(\alpha \geq 0\), to \(p(\alpha x) = \alpha\, p(x)\), which is exactly
non-negative homogeneity. The converse fails. A seminorm is symmetric (\(p(-x) = p(x)\)) and non-negative,
whereas a sublinear functional need be neither. A concrete example on \(\mathbb{R}\) is \(q(x) = x\) itself,
which is linear — hence sublinear — yet takes negative values and satisfies
\(q(-1) = -1 \neq 1 = q(1)\). This genuine asymmetry is precisely what will let a sublinear functional encode
one-sided geometric constraints, such as the separation of convex sets, in later developments. On a
complex space the notion does not transfer directly: a complex-valued functional cannot be compared with \(q\)
through an inequality \(f \leq q\), and non-negative homogeneity says nothing about multiplication by non-real
scalars. This is why the complex form of the theorem,
treated below, is phrased through seminorms rather than sublinear functionals.
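A second example may help fix the contrast with seminorms. The sketch below uses the largest-coordinate map \(q(x) = \max_i x_i\) on \(\mathbb{R}^3\), which is our own illustrative choice rather than an example from the text, and checks the two sublinearity axioms on random samples with non-negative scalars only. The helper name `check_sublinear` and the sampling parameters are likewise assumptions.

```python
import random

def q(x):
    """Illustrative sublinear functional on R^n: the largest coordinate."""
    return max(x)

def check_sublinear(q, dim=3, trials=1000, tol=1e-9, seed=0):
    """Test subadditivity and non-negative homogeneity of q on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        y = [rng.uniform(-5, 5) for _ in range(dim)]
        a = rng.uniform(0, 5)  # non-negative scalars only
        if q([s + t for s, t in zip(x, y)]) > q(x) + q(y) + tol:
            return False
        if abs(q([a * s for s in x]) - a * q(x)) > tol:
            return False
    return True

is_sublinear = check_sublinear(q)
x = [-1.0, -2.0, -3.0]
neg_x = [-s for s in x]
print(is_sublinear)
print(q(x), q(neg_x))  # -1.0 and 3.0
```

Here \(q(x) = -1\) is negative and \(q(-x) = 3 \neq q(x)\), so this \(q\) is neither non-negative nor symmetric and cannot be a seminorm.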
The extension problem can now be posed sharply. Suppose \(f\) is a linear functional defined on a subspace
\(\mathcal{M} \subseteq \mathcal{X}\), and suppose it is dominated by a sublinear functional \(q\) on
its domain, meaning \(f(x) \leq q(x)\) for all \(x \in \mathcal{M}\). Does \(f\) extend to a linear functional \(F\) on
all of \(\mathcal{X}\), still dominated by the same \(q\)? Finding some linear extension is easy: extend a
basis of \(\mathcal{M}\) to a basis of \(\mathcal{X}\) and assign arbitrary values to the new basis vectors. The
difficulty, and the entire substance of the
theorem, is to find an extension that remains dominated by \(q\). We turn to this next.
The Hahn-Banach Theorem: Real Case
We state the theorem in its primary form: extension of a real linear functional under domination by a sublinear
functional. The norm-preserving statement that powers the dual-space theory is a special case, recovered at the
end of this development.
Theorem: Hahn-Banach (Real, Sublinear-Dominated Form)
Let \(\mathcal{X}\) be a vector space over \(\mathbb{R}\), let \(q\) be a
sublinear functional on \(\mathcal{X}\),
and let \(\mathcal{M} \subseteq \mathcal{X}\) be a linear subspace. If \(f : \mathcal{M} \to \mathbb{R}\) is
linear and satisfies
\[
f(x) \;\leq\; q(x) \qquad \text{for all } x \in \mathcal{M},
\]
then there exists a linear functional \(F : \mathcal{X} \to \mathbb{R}\) extending \(f\) — that is,
\(F|_{\mathcal{M}} = f\) — with
\[
F(x) \;\leq\; q(x) \qquad \text{for all } x \in \mathcal{X}.
\]
The proof has two ingredients. The first is a single, concrete step: given the functional on a subspace, extend
it to a subspace one dimension larger, preserving domination. The second is a transfinite mechanism that iterates
this step until the domain exhausts \(\mathcal{X}\). The mechanism is Zorn's Lemma, which we now
state, since the rest of functional analysis — and this proof in particular — uses it openly.
Zorn's Lemma
Recall the language of order. A partial order \(\preceq\) on a set \(\mathcal{P}\) is a reflexive,
antisymmetric, transitive relation. A subset \(\mathcal{C} \subseteq \mathcal{P}\) is a chain if any
two of its elements are comparable. An upper bound for \(\mathcal{C}\) is an element \(u \in \mathcal{P}\)
with \(c \preceq u\) for every \(c \in \mathcal{C}\); a maximal element of \(\mathcal{P}\) is an
\(m \in \mathcal{P}\) admitting no strictly larger element.
Theorem (Axiom): Zorn's Lemma
Let \((\mathcal{P}, \preceq)\) be a non-empty partially ordered set in which every chain has an upper bound in
\(\mathcal{P}\). Then \(\mathcal{P}\) has a maximal element.
Zorn's Lemma is logically equivalent to the Axiom of Choice over the
Zermelo–Fraenkel axioms: each can be derived from the other, and neither is provable from the remaining
axioms of set theory. We therefore adopt it as an axiom. It is not a theorem awaiting proof
here or elsewhere; stating it is a declaration of which set-theoretic foundation we build on, made explicit so
that every later existence result resting on it — bases of arbitrary vector spaces, maximal ideals, and
the extension theorem below — carries a visible dependency rather than a hidden one.
Extension by One Dimension
The heart of the matter is the following lemma. Once a dominated functional can be pushed across a single new
dimension, Zorn's Lemma supplies the rest.
Lemma: One-Dimensional Extension
Under the hypotheses of the theorem, suppose additionally that \(\dim(\mathcal{X} / \mathcal{M}) = 1\); that is,
\(\mathcal{X} = \mathcal{M} + \mathbb{R}\,x_0\) for some \(x_0 \notin \mathcal{M}\). Then \(f\) admits an extension
\(F : \mathcal{X} \to \mathbb{R}\), linear and dominated by \(q\).
Proof
Every element of \(\mathcal{X}\) is uniquely \(t x_0 + y\) with \(t \in \mathbb{R}\), \(y \in \mathcal{M}\)
(uniqueness because \(x_0 \notin \mathcal{M}\)). A linear extension of \(f\) is therefore determined by a single
real number \(\alpha_0 := F(x_0)\), via \(F(t x_0 + y) = t\alpha_0 + f(y)\). The task is to choose \(\alpha_0\) so
that \(F \leq q\) everywhere.
Consider \(t > 0\). Writing \(t x_0 + y = t\bigl(x_0 + y/t\bigr)\) and using non-negative homogeneity of \(q\), the
requirement \(F(t x_0 + y) \leq q(t x_0 + y)\) becomes \(t\alpha_0 + f(y) \leq t\, q(x_0 + y/t)\); dividing by
\(t > 0\) and renaming \(y_1 := y/t \in \mathcal{M}\), this is
\[
\alpha_0 \;\leq\; -f(y_1) + q(x_0 + y_1) \qquad \text{for all } y_1 \in \mathcal{M}.
\]
For \(t < 0\), write \(t = -s\) with \(s > 0\); the requirement becomes \(-s\alpha_0 + f(y) \leq s\, q(-x_0 + y/s)\),
and dividing by \(s > 0\) and renaming \(y_2 := y/s\) gives
\[
\alpha_0 \;\geq\; f(y_2) - q(-x_0 + y_2) \qquad \text{for all } y_2 \in \mathcal{M}.
\]
The case \(t = 0\) is the original hypothesis \(f \leq q\) on \(\mathcal{M}\). Thus a valid \(\alpha_0\) exists if
and only if the supremum below does not exceed the infimum:
\[
\sup_{y_2 \in \mathcal{M}} \bigl[\, f(y_2) - q(-x_0 + y_2) \,\bigr]
\;\leq\; \inf_{y_1 \in \mathcal{M}} \bigl[\, {-f(y_1)} + q(x_0 + y_1) \,\bigr].
\]
This inequality holds. For arbitrary \(y_1, y_2 \in \mathcal{M}\), linearity of \(f\), the domination
\(f \leq q\) on \(\mathcal{M}\), and subadditivity of \(q\) give
\[
\begin{aligned}
f(y_2) + f(y_1) &= f(y_1 + y_2) \\
&\leq q(y_1 + y_2) \\
&= q\bigl((x_0 + y_1) + (-x_0 + y_2)\bigr) \\
&\leq q(x_0 + y_1) + q(-x_0 + y_2).
\end{aligned}
\]
Rearranging, \(f(y_2) - q(-x_0 + y_2) \leq -f(y_1) + q(x_0 + y_1)\) for every pair \(y_1, y_2\). Taking the
supremum over \(y_2\) on the left and the infimum over \(y_1\) on the right yields the displayed inequality.
Any \(\alpha_0\) in the (non-empty) interval between the two sides defines, via \(F(t x_0 + y) = t\alpha_0 + f(y)\),
a linear extension of \(f\) with \(F \leq q\) on all of \(\mathcal{X}\).
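The admissible interval for \(\alpha_0\) can be computed in a small case. The sketch below assumes \(\mathcal{X} = \mathbb{R}^2\), \(\mathcal{M} = \operatorname{span}\{(1, 0)\}\), \(f(s, 0) = s\), \(x_0 = (0, 1)\), and \(q(x) = |x_1| + |x_2|\); all of these are illustrative choices, not taken from the text. The supremum and infimum are approximated on a finite grid of \(\mathcal{M}\), which suffices here only because both are attained at \(s = 0\). By hand, \(s - |s| - 1 \leq -1\) and \(-s + |s| + 1 \geq 1\), so the exact interval is \([-1, 1]\).

```python
def q(x):
    """Illustrative sublinear functional on R^2: the l^1 norm |x1| + |x2|."""
    return abs(x[0]) + abs(x[1])

def f(s):
    """f on M = span{(1, 0)}: the point (s, 0) is sent to s, so f <= q on M."""
    return s

x0 = (0.0, 1.0)
# Sample M on a grid; each s stands for the point (s, 0) of M.
grid = [k / 100.0 for k in range(-500, 501)]

# Lower bound for alpha_0: sup over y2 of f(y2) - q(-x0 + y2).
lower = max(f(s) - q((s - x0[0], -x0[1])) for s in grid)
# Upper bound for alpha_0: inf over y1 of -f(y1) + q(x0 + y1).
upper = min(-f(s) + q((s + x0[0], x0[1])) for s in grid)

def F(x, alpha0):
    """Extension F(t*x0 + y) = t*alpha0 + f(y), in coordinates x = (s, t)."""
    return x[1] * alpha0 + f(x[0])

# Both endpoints and the midpoint of [lower, upper] give dominated extensions.
candidates = (lower, 0.5 * (lower + upper), upper)
dominated = all(
    F((a, b), alpha0) <= q((a, b)) + 1e-12
    for alpha0 in candidates
    for a in grid[::50]
    for b in grid[::50]
)
# A value outside the interval breaks domination already at x0.
violates = F(x0, upper + 1.0) > q(x0)
print(lower, upper, dominated, violates)
```

The computed bounds agree with \(-1\) and \(1\) up to rounding, and the last check shows that stepping outside the interval violates \(F \leq q\) at \(x_0\) itself.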
From One Dimension to All of \(\mathcal{X}\)
The one-dimensional lemma extends \(f\) across any single new direction. To reach all of \(\mathcal{X}\) — possibly
requiring uncountably many such steps — we order the partial extensions and apply Zorn's Lemma.
Proof of the Theorem
Let \(\mathcal{S}\) be the set of all pairs \((\mathcal{N}, g)\) where \(\mathcal{N}\) is a linear subspace with
\(\mathcal{M} \subseteq \mathcal{N} \subseteq \mathcal{X}\) and \(g : \mathcal{N} \to \mathbb{R}\) is linear,
extends \(f\), and satisfies \(g \leq q\) on \(\mathcal{N}\). The pair \((\mathcal{M}, f)\) lies in \(\mathcal{S}\),
so \(\mathcal{S} \neq \varnothing\). Order \(\mathcal{S}\) by declaring
\((\mathcal{N}_1, g_1) \preceq (\mathcal{N}_2, g_2)\) to mean
\[
\mathcal{N}_1 \subseteq \mathcal{N}_2 \quad \text{and} \quad g_2|_{\mathcal{N}_1} = g_1.
\]
This is reflexive, antisymmetric, and transitive, so \((\mathcal{S}, \preceq)\) is a partially ordered set.
Every chain has an upper bound.
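Let \(\mathcal{C} = \{(\mathcal{N}_i, g_i) : i \in I\}\) be a
chain in \(\mathcal{S}\). If \(\mathcal{C}\) is empty, \((\mathcal{M}, f)\) is an upper bound, so assume \(I \neq \varnothing\). Put \(\mathcal{N} := \bigcup_{i \in I} \mathcal{N}_i\). This union is a linear subspace: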
given \(u, v \in \mathcal{N}\), say \(u \in \mathcal{N}_i\) and \(v \in \mathcal{N}_j\), comparability of the chain
places both in the larger of \(\mathcal{N}_i, \mathcal{N}_j\), which is a subspace and so contains every linear
combination \(\lambda u + v\). Define \(g : \mathcal{N} \to \mathbb{R}\) by \(g(u) := g_i(u)\) whenever
\(u \in \mathcal{N}_i\). This is well defined: if \(u \in \mathcal{N}_i \cap \mathcal{N}_j\), then
the chain makes one pair dominate the other, say \((\mathcal{N}_i, g_i) \preceq (\mathcal{N}_j, g_j)\), whence
\(g_j|_{\mathcal{N}_i} = g_i\) and \(g_i(u) = g_j(u)\). Linearity of \(g\) follows by evaluating any two arguments
inside a common \(\mathcal{N}_i\) (again using comparability); \(g\) extends \(f\) and satisfies \(g \leq q\) because each
\(g_i\) does. Thus \((\mathcal{N}, g) \in \mathcal{S}\), and by construction it is an upper bound for \(\mathcal{C}\).
A maximal element is defined on the full space.
By
Zorn's Lemma, \(\mathcal{S}\) has a maximal element
\((\mathcal{Y}, F)\). We claim \(\mathcal{Y} = \mathcal{X}\). If not, choose \(x_0 \in \mathcal{X} \setminus \mathcal{Y}\).
Applying the
one-dimensional extension lemma to \(F\) on
\(\mathcal{Y}\) and the direction \(x_0\) produces a dominated linear extension \(F'\) on
\(\mathcal{Y} + \mathbb{R}\,x_0\), a strictly larger subspace. Then \((\mathcal{Y}, F) \prec (\mathcal{Y} + \mathbb{R}\,x_0, F')\),
contradicting maximality. Hence \(\mathcal{Y} = \mathcal{X}\), and \(F\) is the required extension:
linear on all of \(\mathcal{X}\), restricting to \(f\) on \(\mathcal{M}\), and dominated by \(q\).
The Complex Case
The real theorem dominates a functional by a sublinear functional, a notion tied to the order structure of
\(\mathbb{R}\). A complex-valued functional admits no one-sided bound of the form \(f \leq q\), and non-negative
homogeneity says nothing about non-real scalars, so the natural object
of domination becomes a seminorm, whose absolute
homogeneity \(p(\alpha x) = |\alpha| p(x)\) is meaningful for complex \(\alpha\). The passage from real to complex
rests on a single observation: a complex-linear functional is completely determined by its real part.
Real and Complex Parts of a Functional
A complex vector space \(\mathcal{X}\) is also a real vector space, simply by restricting scalar multiplication to
real scalars. If \(g : \mathcal{X} \to \mathbb{C}\) is complex-linear, then \(\operatorname{Re} g : \mathcal{X} \to \mathbb{R}\)
is real-linear. The following lemma shows this loses no information.
Lemma: Recovery from the Real Part
Let \(\mathcal{X}\) be a vector space over \(\mathbb{C}\).
(a) If \(f : \mathcal{X} \to \mathbb{R}\) is real-linear, then
\[
\tilde{f}(x) \;:=\; f(x) - i\, f(i x)
\]
is complex-linear, and \(f = \operatorname{Re} \tilde{f}\).
(b) If \(g : \mathcal{X} \to \mathbb{C}\) is complex-linear and \(f = \operatorname{Re} g\), then \(\tilde{f} = g\).
(c) If \(p\) is a seminorm on \(\mathcal{X}\), then \(|f(x)| \leq p(x)\) for all \(x\) if and only if
\(|\tilde{f}(x)| \leq p(x)\) for all \(x\).
(d) If \(\mathcal{X}\) is a normed space, \(f\) is bounded, and \(\tilde{f}\) is as in (a), then \(\tilde{f}\) is bounded and
\(\|f\| = \|\tilde{f}\|\).
Proof
(a)
Additivity and real homogeneity of \(\tilde{f}\) follow from those of \(f\). For complex
homogeneity it then suffices to check multiplication by \(i\), since every complex scalar has the form
\(a + b i\) with \(a, b \in \mathbb{R}\). Compute
\[
\begin{aligned}
\tilde{f}(i x) &= f(i x) - i\, f(i^2 x) \\
&= f(i x) - i\, f(-x) \\
&= f(i x) + i\, f(x) \\
&= i\bigl(f(x) - i\, f(i x)\bigr) \\
&= i\, \tilde{f}(x),
\end{aligned}
\]
using real-linearity of \(f\) (so \(f(-x) = -f(x)\)). Hence \(\tilde{f}\) is complex-linear. Its real part is
\(\operatorname{Re}\tilde{f}(x) = f(x)\), since \(f\) is real-valued.
(b)
With \(f = \operatorname{Re} g\), write \(g(x) = \operatorname{Re} g(x) + i \operatorname{Im} g(x)\).
For any complex number \(g(x)\), its imaginary part equals \(-\operatorname{Re}\bigl(i\, g(x)\bigr)\). Complex-linearity
gives \(i\, g(x) = g(i x)\), so \(\operatorname{Im} g(x) = -\operatorname{Re} g(i x) = -f(i x)\). Therefore
\(g(x) = f(x) - i f(i x) = \tilde{f}(x)\).
(c)
Since \(f = \operatorname{Re}\tilde{f}\), we have \(|f(x)| = |\operatorname{Re}\tilde{f}(x)| \leq |\tilde{f}(x)|\),
so the bound on \(\tilde{f}\) immediately implies the bound on \(f\). Conversely, suppose \(|f(x)| \leq p(x)\) for
all \(x\). Fix \(x\) and write \(\tilde{f}(x) = r e^{i\theta}\) with \(r = |\tilde{f}(x)| \geq 0\). Then
\[
|\tilde{f}(x)| \;=\; r \;=\; e^{-i\theta}\,\tilde{f}(x) \;=\; \tilde{f}\bigl(e^{-i\theta} x\bigr),
\]
using complex-linearity. The left side is real, so the right side equals its own real part:
\(|\tilde{f}(x)| = \operatorname{Re}\tilde{f}(e^{-i\theta} x) = f(e^{-i\theta} x)\). Applying the hypothesis and
absolute homogeneity of \(p\),
\[
|\tilde{f}(x)| \;=\; f\bigl(e^{-i\theta} x\bigr) \;\leq\; p\bigl(e^{-i\theta} x\bigr) \;=\; |e^{-i\theta}|\, p(x) \;=\; p(x).
\]
(d)
A norm is in particular a seminorm, so applying (c) with \(p(x) = \|\tilde{f}\|\,\|x\|\) and
with \(p(x) = \|f\|\,\|x\|\) shows each of \(\|f\|, \|\tilde{f}\|\) bounds the other; hence they are equal.
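Parts (b) and (c) can be checked numerically on \(\mathbb{C}^2\). In the sketch below the coefficients `c`, the test vector `x_fixed`, and all function names are hypothetical choices made for illustration. It starts from a complex-linear \(g\), keeps only \(f = \operatorname{Re} g\), rebuilds \(\tilde{f}\) from the recovery formula, and then applies the rotation by \(e^{-i\theta}\) used in the proof of (c).

```python
import cmath
import random

c = (2 - 1j, 0.5 + 3j)  # hypothetical coefficients of a complex-linear g on C^2

def g(x):
    """Complex-linear functional g(x) = c1*x1 + c2*x2."""
    return c[0] * x[0] + c[1] * x[1]

def f(x):
    """Real part of g: real-linear and real-valued."""
    return g(x).real

def f_tilde(x):
    """Recovery formula f~(x) = f(x) - i f(i x)."""
    ix = (1j * x[0], 1j * x[1])
    return f(x) - 1j * f(ix)

# Part (b): f~ should coincide with g up to rounding error.
rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    x = (complex(rng.uniform(-1, 1), rng.uniform(-1, 1)),
         complex(rng.uniform(-1, 1), rng.uniform(-1, 1)))
    max_gap = max(max_gap, abs(f_tilde(x) - g(x)))

# Part (c): rotating x by e^{-i theta} makes f~ real, so f sees |f~(x)|.
x_fixed = (1 + 2j, -1j)
theta = cmath.phase(f_tilde(x_fixed))
rot = cmath.exp(-1j * theta)
rotated = (rot * x_fixed[0], rot * x_fixed[1])
gap_c = abs(f(rotated) - abs(f_tilde(x_fixed)))
print(max_gap, gap_c)
```

Both printed gaps should sit at rounding-error level, consistent with \(\tilde{f} = g\) and \(|\tilde{f}(x)| = f(e^{-i\theta} x)\).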
The Seminorm Version
With recovery from the real part in hand, the complex extension theorem follows by applying the real theorem to
the real part, then complexifying.
Theorem: Hahn-Banach (Seminorm-Dominated Form)
Let \(\mathcal{X}\) be a vector space over \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)), let \(p\) be a
seminorm on \(\mathcal{X}\), and let \(\mathcal{M} \subseteq \mathcal{X}\) be a linear subspace. If
\(f : \mathcal{M} \to \mathbb{F}\) is linear with \(|f(x)| \leq p(x)\) for all \(x \in \mathcal{M}\), then there
exists a linear functional \(F : \mathcal{X} \to \mathbb{F}\) extending \(f\) with
\[
|F(x)| \;\leq\; p(x) \qquad \text{for all } x \in \mathcal{X}.
\]
Proof
Case \(\mathbb{F} = \mathbb{R}\).
A seminorm is a sublinear functional, and \(f(x) \leq |f(x)| \leq p(x)\)
on \(\mathcal{M}\). The
real theorem yields a linear extension \(F\)
with \(F(x) \leq p(x)\) for all \(x\). Applying this to \(-x\) and using \(p(-x) = p(x)\) gives
\(-F(x) = F(-x) \leq p(-x) = p(x)\), so \(|F(x)| \leq p(x)\).
Case \(\mathbb{F} = \mathbb{C}\).
Let \(f_1 := \operatorname{Re} f\), a real-linear functional on
\(\mathcal{M}\) with \(|f_1(x)| \leq |f(x)| \leq p(x)\). By the real case there is a real-linear
\(F_1 : \mathcal{X} \to \mathbb{R}\) extending \(f_1\) with \(|F_1(x)| \leq p(x)\) for all \(x\). Define
\(F(x) := F_1(x) - i F_1(i x)\). By part (a) of the recovery lemma \(F\) is complex-linear, and by part (c)
the bound \(|F_1| \leq p\) gives \(|F(x)| \leq p(x)\) for all \(x\). Finally \(F\) extends \(f\): on
\(\mathcal{M}\), part (b) applied to \(g = f\) gives \(f = \widetilde{\operatorname{Re} f} = \widetilde{f_1}\),
and since \(F_1|_{\mathcal{M}} = f_1\) we get \(F|_{\mathcal{M}} = \widetilde{f_1} = f\).
Norm-Preserving Extension & Consequences
We now specialize to normed spaces and recover the statement that the
dual-space theory took on
faith: a bounded functional on a subspace extends to the whole space without enlarging its norm. This is
the seminorm theorem applied to the most natural seminorm a bounded functional carries with it.
The Norm-Preserving Extension
Let \(\mathcal{X}\) be a normed space, \(\mathcal{M} \subseteq \mathcal{X}\) a subspace, and \(f \in \mathcal{M}^*\) a
bounded linear functional, with operator norm \(\|f\|_{\mathcal{M}^*}\). Set
\[
p(x) \;:=\; \|f\|_{\mathcal{M}^*}\, \|x\|_{\mathcal{X}}.
\]
This \(p\) is a seminorm on \(\mathcal{X}\) — in fact a scalar multiple of the norm — and on \(\mathcal{M}\) the
boundedness of \(f\) reads exactly \(|f(x)| \leq \|f\|_{\mathcal{M}^*}\,\|x\| = p(x)\). The
seminorm theorem furnishes a linear extension
\(F : \mathcal{X} \to \mathbb{F}\) of \(f\) with \(|F(x)| \leq p(x) = \|f\|_{\mathcal{M}^*}\,\|x\|\) for all \(x\). The
last inequality says \(\|F\|_{\mathcal{X}^*} \leq \|f\|_{\mathcal{M}^*}\); and since \(F\) extends \(f\), the supremum
defining \(\|F\|_{\mathcal{X}^*}\) ranges over a superset of that defining \(\|f\|_{\mathcal{M}^*}\), giving the reverse
inequality. Hence \(\|F\|_{\mathcal{X}^*} = \|f\|_{\mathcal{M}^*}\): the extension is norm-preserving.
This is precisely the
Hahn-Banach extension theorem
invoked in the study of dual spaces. The debt incurred there, where the statement was used to construct norming
functionals and to prove the canonical embedding isometric, is now discharged: it is the special case
\(p = \|f\|_{\mathcal{M}^*}\,\|\cdot\|_{\mathcal{X}}\) of the seminorm theorem proved above.
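A small example shows that norm-preserving extensions need not be unique and that a careless extension can enlarge the norm. The sketch below assumes \(\mathcal{X} = \mathbb{R}^2\) with the norm \(|x_1| + |x_2|\), \(\mathcal{M} = \operatorname{span}\{(1, 0)\}\), and \(f(s, 0) = s\), so that \(\|f\|_{\mathcal{M}^*} = 1\); the space, the functional, and the helper name `op_norm_l1` are our own illustrative choices. Every linear extension has the form \(F_\alpha(x) = x_1 + \alpha x_2\).

```python
def op_norm_l1(a):
    """Operator norm of x -> a[0]*x[0] + a[1]*x[1] on (R^2, l^1).

    A linear functional attains its maximum over the l^1 unit ball at one of
    the vertices (+-1, 0), (0, +-1), so it suffices to evaluate it there.
    """
    vertices = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    return max(abs(a[0] * v[0] + a[1] * v[1]) for v in vertices)

norm_f = 1.0  # norm of f(s, 0) = s on M = span{(1, 0)}

# Norms of the extensions F_alpha(x) = x1 + alpha * x2 for a few alphas.
norms = {alpha: op_norm_l1((1.0, alpha)) for alpha in (-1.0, 0.0, 0.5, 1.0, 2.0)}
preserving = [alpha for alpha, n in norms.items() if n == norm_f]
print(norms)
print(preserving)  # [-1.0, 0.0, 0.5, 1.0]
```

In this example \(\|F_\alpha\| = \max(1, |\alpha|)\), so every \(|\alpha| \leq 1\) gives a norm-preserving extension, while \(\alpha = 2\) doubles the norm.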
Why the Dual Space Is Rich
The norm-preserving extension has an immediate consequence that the dual-space theory used repeatedly: a normed
space carries enough functionals to detect every vector by its norm. Given a nonzero \(x \in \mathcal{X}\), apply the
extension to the one-dimensional subspace \(\mathcal{M} = \mathbb{F}\,x\) with the functional
\(f(\lambda x) := \lambda \|x\|\), which has \(\|f\|_{\mathcal{M}^*} = 1\). The resulting \(F \in \mathcal{X}^*\) satisfies
\(\|F\|_{\mathcal{X}^*} = 1\) and \(F(x) = \|x\|\). This is the
norming functional whose
existence underlies both the isometry of the canonical embedding
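\(J : \mathcal{X} \to \mathcal{X}^{**}\) and the fact that \(\mathcal{X}^*\) separates points: if \(x \neq y\), the
norming functional \(F\) of \(x - y\) gives \(F(x) - F(y) = \|x - y\| > 0\). Two vectors with the same
image under every functional must therefore be equal. The dual space sees everything.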
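In a Euclidean space the norming functional can be written down explicitly, with no appeal to the extension theorem. The sketch below assumes \(\mathbb{R}^3\) with the Euclidean norm and takes \(\varphi(y) = \langle y, x\rangle / \|x\|\); the vectors and the helper names `norm2` and `norming_functional` are illustrative assumptions. It also checks the separation statement on one pair of vectors.

```python
import math
import random

def norm2(x):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in x))

def norming_functional(x):
    """Return phi(y) = <y, x> / ||x||, which satisfies phi(x) = ||x||."""
    n = norm2(x)
    if n == 0.0:
        raise ValueError("x must be nonzero")
    return lambda y: sum(s * t for s, t in zip(y, x)) / n

x = [3.0, -4.0, 12.0]
phi = norming_functional(x)

rng = random.Random(0)
samples = [[rng.uniform(-1, 1) for _ in x] for _ in range(1000)]
# Cauchy-Schwarz gives |phi(y)| <= ||y||, so phi has norm at most one.
bounded = all(abs(phi(y)) <= norm2(y) + 1e-12 for y in samples)

# Separation: the norming functional of u - v takes different values at u and v.
u, v = [1.0, 2.0, 2.0], [1.0, 0.0, 2.0]
psi = norming_functional([a - b for a, b in zip(u, v)])
print(phi(x), norm2(x), bounded, psi(u) - psi(v))
```

Here \(\varphi(x) = \|x\| = 13\), and the gap \(\psi(u) - \psi(v) = 2\) equals \(\|u - v\|\). In a general normed space no such formula is available, which is where the extension theorem does the work.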