The Theorem and the Setting
The classical theorem of Weierstrass asserts that every continuous real function on a closed interval is a
uniform limit of polynomials. Read inside the space \(C([a, b])\) of continuous functions with the supremum
norm, this says that the polynomials form a dense subset. Stone's generalization replaces the
interval by an arbitrary compact Hausdorff space \(X\) and the polynomials by any subalgebra of \(C(X)\) large
enough to tell points apart. The result is one of the central density theorems of analysis: it converts an
algebraic hypothesis — closure under addition, multiplication, and scalar multiplication, together with
a separation property — into the analytic conclusion that the subalgebra exhausts all of \(C(X)\).
Throughout, \(X\) is a compact Hausdorff space and \(C(X)\) is the algebra of continuous functions
\(X \to \mathbb{C}\), normed by \(\|f\| = \sup_{x \in X} |f(x)|\). A subalgebra of \(C(X)\)
is a linear subspace \(\mathcal{A}\) that is closed under pointwise multiplication: if \(f, g \in \mathcal{A}\)
then \(fg \in \mathcal{A}\). We say \(\mathcal{A}\) is closed when it is closed in the norm
topology, i.e. a uniformly closed subspace. The proof we give routes the entire argument through the geometry of
the dual ball; it rests on the Krein-Milman Theorem and on the identification of the dual of \(C(X)\) with a space
of measures.
Separation and Conjugation
Two structural properties single out the subalgebras that can be dense. The first is a separation condition:
the functions must be able to distinguish any two points of \(X\). Without it, a subalgebra cannot even
approximate a function that takes different values at two points it fails to separate.
Definition: Separation of Points
A collection \(\mathcal{A}\) of functions on \(X\) separates the points of \(X\) if for
every pair of distinct points \(x, y \in X\) there is a function \(f \in \mathcal{A}\) with
\(f(x) \neq f(y)\).
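A finite model makes the separation condition concrete. If \(X = \{0, \dots, n-1\}\) carries the discrete topology, then \(C(X)\) is \(\mathbb{C}^n\) with pointwise multiplication, every subspace is closed, and the unital algebra generated by a family of functions can be computed by closing a span under products. The NumPy sketch below does this. The function name `unital_algebra_basis` and the sample generators are our own, not taken from any library. A family that fails to separate two points generates only functions that agree at those points, while a separating family generates everything. In this finite setting self-adjointness plays no role, so the example illustrates the separation hypothesis alone.

```python
import numpy as np

def unital_algebra_basis(generators, tol=1e-9):
    """Orthonormal basis (as rows) of the unital algebra generated by `generators`.

    Functions on X = {0, ..., n-1} are length-n vectors and multiplication is
    pointwise. Starting from span{1, generators}, keep adjoining pairwise products
    of basis vectors until the span stops growing; a span closed under products
    of its basis vectors is closed under all products, by bilinearity.
    """
    gens = [np.asarray(g, dtype=complex) for g in generators]
    n = gens[0].size
    rows = np.array([np.ones(n, dtype=complex)] + gens)
    while True:
        # Right singular vectors with nonzero singular values span the row space.
        _, s, vh = np.linalg.svd(rows, full_matrices=False)
        r = int(np.sum(s > tol))
        basis = vh[:r]
        products = np.array([a * b for a in basis for b in basis])
        enlarged = np.vstack([basis, products])
        if np.linalg.matrix_rank(enlarged, tol=tol) == r:
            return basis
        rows = enlarged

separating = unital_algebra_basis([[0, 1, 2, 3]])      # f(x) = x separates all points
non_separating = unital_algebra_basis([[0, 1, 1, 3]])  # cannot tell point 1 from point 2
```

Here `separating` has four rows, so the generated algebra is all of \(\mathbb{C}^4\). By contrast, `non_separating` has three rows, each taking equal values at the points \(1\) and \(2\).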
The second property is needed only in the complex setting. A subalgebra of complex-valued functions can be
closed under all the algebraic operations and still fail to be dense, because holomorphy survives sums, products,
and uniform limits but not conjugation. The closed subalgebra of \(C(\overline{\mathbb{D}})\) generated by the
constants and the coordinate function \(z\) on the closed unit disc consists of uniform limits of polynomials in
\(z\). These are holomorphic on the interior, so the non-holomorphic function \(\bar{z}\) lies outside it, even though the
algebra separates points and contains the constants. The remedy is to require closure under conjugation.
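The failure of \(\bar{z}\) to be approximable can be certified by an annihilating measure, which is also the method of the proof below. On the unit circle \(|z| = 1\), the measure \(\mu = z \, d\theta / 2\pi\) has norm \(1\). It kills every polynomial in \(z\), because \(\int z^{m+1} \, d\theta = 0\) for \(m \geq 0\), and it gives \(\int \bar{z} \, d\mu = 1\). Hence \(1 = |\int (\bar{z} - p) \, d\mu| \leq \|\bar{z} - p\|\) for every polynomial \(p\), and the same bound holds for the supremum over the whole disc. The NumPy sketch below checks a discretized version of this on the \(N\)-th roots of unity. The choice \(N = 256\), the discretization, and the random test polynomials are our assumptions. On these sample points the same identities hold up to rounding for polynomials of degree below \(N - 1\).

```python
import numpy as np

N = 256
z = np.exp(2j * np.pi * np.arange(N) / N)   # N-th roots of unity on |z| = 1
mu = z / N                                   # discretized z dθ/2π, total variation 1

def integrate(values, measure):
    """Pair a function (sampled at the points) with a discrete measure."""
    return np.sum(values * measure)

# mu annihilates every monomial z^m with 0 <= m < N - 1 ...
worst_monomial = max(abs(integrate(z**m, mu)) for m in range(20))
# ... but pairs with conj(z) to exactly 1.
pairing = integrate(np.conj(z), mu)

def sup_distance_to_conj(coeffs):
    """max |conj(z) - p(z)| over the sample points; coeffs in np.polyval order."""
    return np.max(np.abs(np.conj(z) - np.polyval(coeffs, z)))

rng = np.random.default_rng(0)
distances = [
    sup_distance_to_conj(rng.standard_normal(d + 1) + 1j * rng.standard_normal(d + 1))
    for d in range(10)
]
```

Each entry of `distances` comes out at least \(1\) up to rounding, as the annihilating measure predicts.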
Definition: Self-Adjoint Subalgebra
For a function \(f : X \to \mathbb{C}\), let \(\bar{f}\) denote its pointwise complex conjugate,
\(\bar{f}(x) = \overline{f(x)}\). A subalgebra \(\mathcal{A} \subseteq C(X)\) is self-adjoint
(or closed under conjugation) if \(\bar{f} \in \mathcal{A}\) whenever \(f \in \mathcal{A}\).
The Theorem
We can now state the theorem in the form proved below. The hypotheses are the two properties just isolated,
together with the presence of the constants: the subalgebra contains the constants, separates points, and is self-adjoint.
Theorem: Stone-Weierstrass
Let \(X\) be a compact Hausdorff space and let \(\mathcal{A}\) be a closed subalgebra of \(C(X)\) such that
(a) the constant function \(1\) belongs to \(\mathcal{A}\);
(b) \(\mathcal{A}\) separates the points of \(X\);
(c) \(\mathcal{A}\) is self-adjoint.
Then \(\mathcal{A} = C(X)\).
When \(C(X)\) is taken to be the algebra of real-valued continuous functions, condition (c) is automatic
— every real function equals its own conjugate — and the theorem reads: a closed subalgebra of
real \(C(X)\) that contains the constants and separates points is all of \(C(X)\). This is the form that
recovers the Weierstrass approximation theorem, taking \(X = [a, b]\) and \(\mathcal{A}\) the uniform closure
of the polynomials, which separates points because the single function \(x \mapsto x\) already does.
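The theorem is an existence statement and produces no approximating polynomials. For \(X = [0, 1]\), a classical constructive route uses the Bernstein polynomials \(B_n f(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1 - x)^{n-k}\). This route is independent of the dual-ball method used here. The plain-Python sketch below estimates the uniform error on a grid and shows it shrinking as \(n\) grows. The test function \(|x - 1/2|\), the degrees, and the grid are our choices.

```python
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))

def sup_error(f, n, grid_size=201):
    """Largest |f - B_n f| over an evenly spaced grid (an estimate of the sup norm)."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)

def kink(x):
    # Continuous on [0, 1] but not differentiable at 1/2.
    return abs(x - 0.5)

errors = {n: sup_error(kink, n) for n in (10, 100, 400)}
```

The error is largest near the kink and decays roughly like \(1/\sqrt{n}\), which is slow but uniform, as the theorem requires.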
Reduction to the Annihilator
The proof does not approximate functions directly. It works in the dual space, through the principle that a
closed subspace of a Banach space is the whole space precisely when no nonzero continuous functional annihilates
it. The functionals vanishing on \(\mathcal{A}\) form its annihilator
\[
\mathcal{A}^\perp \;=\; \{\, \varphi \in C(X)^* : \varphi(f) = 0 \text{ for all } f \in \mathcal{A} \,\},
\]
so by the annihilator density criterion the conclusion \(\mathcal{A} = C(X)\) is equivalent to
\(\mathcal{A}^\perp = (0)\). To prove the theorem it therefore suffices to show that no nonzero functional
annihilates a subalgebra satisfying (a), (b), and (c).
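In finite dimensions the annihilator density criterion is plain linear algebra. On a finite set \(X = \{0, \dots, n-1\}\), a measure is a vector of point masses and \(\int f \, d\mu = \sum_x f(x)\mu(\{x\})\). The annihilator \(\mathcal{A}^\perp\) is therefore the null space of the matrix whose rows span \(\mathcal{A}\). The NumPy sketch below computes it from a singular value decomposition; the subspace is everything exactly when the returned basis is empty. The helper name `annihilator` is hypothetical, and so is the example subspace of functions taking equal values at the points \(1\) and \(2\).

```python
import numpy as np

def annihilator(spanning_functions, tol=1e-9):
    """Rows form a basis of {mu : sum_x f(x) * mu[x] == 0 for every spanning f}."""
    M = np.atleast_2d(np.asarray(spanning_functions, dtype=complex))
    _, s, vh = np.linalg.svd(M)          # full_matrices=True, so vh is n x n
    rank = int(np.sum(s > tol))
    # Rows of vh beyond the rank span the null space of conj(M); conjugate them
    # so that M @ mu == 0 (the pairing here is bilinear, not Hermitian).
    return vh[rank:].conj()

# Functions on X = {0, 1, 2, 3} with f(1) == f(2): a proper subspace.
A_basis = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
ann = annihilator(A_basis)          # spanned by delta_1 - delta_2
ann_full = annihilator(np.eye(4))   # the whole space: empty basis
```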
This is where the measures enter. By the Riesz Representation Theorem, every continuous linear functional on
\(C(X)\) is integration against a complex regular Borel measure, so an element of \(\mathcal{A}^\perp\) is a
measure \(\mu\) with \(\int f \, d\mu = 0\) for all \(f \in \mathcal{A}\). The strategy is to assume that some nonzero
such \(\mu\) exists, pass to an extreme point of the unit ball of these measures, and show that its support must
collapse to a single point, whereupon condition (a) produces a contradiction. Carrying this out requires knowing what
the extreme points of the measure ball look like, which is the subject of the sections that follow.
A Tool: Multiplying a Measure by a Function
The argument repeatedly modifies a measure by reweighting it with a bounded continuous function. We record the
construction once. Given a complex measure \(\mu\) on \(X\) and a bounded Borel function \(h\), the
product \(h\mu\) is the measure obtained by integrating \(h\) over each set,
\[
(h\mu)(\Delta) \;=\; \int_\Delta h \, d\mu,
\]
for every Borel set \(\Delta\). It is again a complex measure, and its total variation measure is
\(|h\mu| = |h|\,|\mu|\), meaning \(|h\mu|(\Delta) = \int_\Delta |h| \, d|\mu|\): reweighting a measure by \(h\) scales
its variation pointwise by \(|h|\). Its norm is therefore \(\|h\mu\| = |h\mu|(X) = \int |h| \, d|\mu|\); in
particular \(\|h\mu\| \leq \|h\|_\infty \, \|\mu\|\). Integration against \(h\mu\) is integration against \(\mu\)
with the integrand reweighted: \(\int g \, d(h\mu) = \int gh \, d\mu\) for every bounded Borel \(g\). The use we make of this is the
case \(h \in \mathcal{A}\): when \(\mu\) annihilates \(\mathcal{A}\), the closure of \(\mathcal{A}\) under
multiplication makes every product \(gh\) with \(g \in \mathcal{A}\) again a member of \(\mathcal{A}\), so the
reweighted measure \(h\mu\) annihilates \(\mathcal{A}\) as well. This is the mechanism by which the algebra
hypothesis enters an argument phrased entirely in terms of measures.
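On a finite set, the reweighting construction reduces to pointwise multiplication of mass vectors. This makes its three properties easy to check numerically: the norm formula \(\|h\mu\| = \int |h| \, d|\mu|\), the integration rule \(\int g \, d(h\mu) = \int gh \, d\mu\), and the preservation of annihilators when \(h\) lies in an algebra. The NumPy sketch below works on \(X = \{0, 1, 2, 3\}\) with measures and functions made up for illustration. The algebra \(\mathcal{A} = \{f : f(1) = f(2)\}\) is a hypothetical example.

```python
import numpy as np

def tv_norm(mu):
    """Total-variation norm of a measure on a finite set: the sum of |point masses|."""
    return np.sum(np.abs(mu))

def integrate(g, mu):
    """Integral of g against mu: sum over x of g(x) * mu({x})."""
    return np.sum(g * mu)

mu = np.array([0.5, -0.25j, 0.25, 0.0])
h = np.array([2.0, 1j, -1.0, 3.0])
g = np.array([1.0, 2.0, 3.0 + 1j, -1.0])

h_mu = h * mu   # (h mu)({x}) = h(x) * mu({x})

# A = {f : f(1) == f(2)}; nu = (delta_1 - delta_2) / 2 annihilates A.
A_basis = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]], dtype=complex)
nu = np.array([0, 0.5, -0.5, 0], dtype=complex)
h_in_A = np.array([2.0, 1j, 1j, 3.0])   # equal values at points 1 and 2, so h_in_A is in A
reweighted = h_in_A * nu                # still annihilates A
```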
Extreme Points of the Ball of M(X)
The argument of the next section will extract an extreme point of the unit ball of the measure space \(M(X)\) and
exploit its rigidity. Before running that argument we identify these extreme points completely: they are the
measures concentrated at a single point, namely the point masses multiplied by a unimodular constant. A measure spread over
more than one point can always be split into a proper average, so it cannot be extreme; a unimodular point mass
admits no such splitting. This is the precise statement that the closing remarks on the dual ball of \(C(X)\)
anticipated. The technique used to prove it is to reweight a measure by a separating continuous function and read off a
contradiction from extremality, and this is exactly the technique that will drive the main proof.
Theorem: Extreme Points of the Unit Ball of M(X)
Let \(X\) be a compact Hausdorff space. The extreme points of the closed unit ball of \(M(X)\) are exactly the
unimodular point masses,
\[
\operatorname{ext}\bigl(\operatorname{ball} M(X)\bigr) \;=\; \{\, \alpha \delta_x : \alpha \in \mathbb{C},\;
|\alpha| = 1,\; x \in X \,\}.
\]
Proof Sketch
Point masses are extreme.
Fix \(x \in X\) and \(|\alpha| = 1\); then \(\|\alpha\delta_x\| = |\alpha| = 1\), so \(\alpha\delta_x\) lies in
the unit ball. Suppose \(\alpha\delta_x = \tfrac{1}{2}(\nu_1 + \nu_2)\) with \(\nu_1, \nu_2\) in the ball.
Taking norms, \(1 = \|\alpha\delta_x\| \leq \tfrac{1}{2}(\|\nu_1\| + \|\nu_2\|) \leq 1\), so
\(\|\nu_1\| + \|\nu_2\| = \|\nu_1 + \nu_2\| = 2\). Off the point \(x\) the sum vanishes, so \(\nu_2 = -\nu_1\) on
\(X \setminus \{x\}\), and the two restrictions have the same variation \(t = |\nu_1|(X \setminus \{x\})\). Then
\[
\|\nu_1\| + \|\nu_2\| \;=\; |\nu_1(\{x\})| + |\nu_2(\{x\})| + 2t \;\geq\; |\nu_1(\{x\}) + \nu_2(\{x\})| + 2t
\;=\; \|\nu_1 + \nu_2\| + 2t,
\]
which forces \(t = 0\). Hence \(\nu_i = c_i\delta_x\) with \(|c_i| \leq 1\) and \(\tfrac{1}{2}(c_1 + c_2) = \alpha\).
Since a unimodular \(\alpha\) is an extreme point of the closed unit disc, \(c_1 = c_2 = \alpha\), that is,
\(\nu_1 = \nu_2 = \alpha\delta_x\). Hence \(\alpha\delta_x\) is extreme.
Extreme points have one-point support.
Conversely, let \(\mu\) be an extreme point of the ball. Then \(\|\mu\| = 1\). The origin is the average of \(\nu\) and
\(-\nu\) for any nonzero \(\nu\) in the ball, and a measure with \(0 < \|\mu\| < 1\) is the average of \((1 + \varepsilon)\mu\)
and \((1 - \varepsilon)\mu\), both of which lie in the ball for small \(\varepsilon > 0\). Let \(K = \operatorname{supp}\mu\) be the support
of \(\mu\), which is nonempty because \(\mu \neq 0\). We claim \(K\) is a single point. Suppose instead that \(K\)
contains two distinct points \(x_0 \neq x\). Choose disjoint open sets \(U \ni x_0\) and \(V \ni x\). Because \(X\)
is compact Hausdorff, hence normal, there are open neighborhoods \(W_0\) of \(x_0\) and \(W_1\) of \(x\) whose
closures lie in \(U\) and \(V\) respectively. Urysohn's lemma, applied to the disjoint closed sets \(\overline{W_0}\) and \(\overline{W_1}\),
gives a continuous function \(f : X \to [0, 1]\) with \(f \equiv 1\) on \(W_0\) and \(f \equiv 0\) on \(W_1\).
Form the reweighted measure \(f\mu\), whose total variation, by the product-measure formula established
above, is
\[
\alpha \;=\; \|f\mu\| \;=\; \int |f| \, d|\mu| \;=\; \int f \, d|\mu|,
\]
the last equality because \(0 \leq f \leq 1\) is real. Let \(W\) be an open neighborhood of \(x_0\) on which
\(f \equiv 1\). Since \(x_0 \in K\), the set \(W\) has positive \(|\mu|\)-measure, so \(\alpha = \int f \, d|\mu| \geq |\mu|(W) > 0\).
Symmetrically, let \(W'\) be an open neighborhood of \(x\) on which \(f \equiv 0\). Since \(x \in K\) and
\(|\mu|(X) = \|\mu\| = 1\),
\[
1 - \alpha \;=\; \int (1 - f) \, d|\mu| \;\geq\; |\mu|(W') \;>\; 0,
\]
so \(\alpha < 1\). Therefore \(0 < \alpha < 1\), and we may write
\[
\mu \;=\; \alpha \cdot \frac{f\mu}{\alpha} \;+\; (1 - \alpha) \cdot \frac{(1 - f)\mu}{1 - \alpha},
\]
a convex combination of two measures each of total-variation norm \(1\): the first factor has norm
\(\|f\mu\|/\alpha = 1\), and the second has norm \(\|(1 - f)\mu\|/(1 - \alpha) = 1\) since
\(\|(1 - f)\mu\| = \int (1 - f) \, d|\mu| = 1 - \alpha\).
Because \(\mu\) is an extreme point, the two factors must coincide with \(\mu\). In particular
\(\mu = f\mu / \alpha\), i.e. \((f - \alpha)\mu = 0\). Then \(\int |f - \alpha| \, d|\mu| = \|(f - \alpha)\mu\| = 0\),
so \(f = \alpha\) almost everywhere with respect to \(|\mu|\). The set \(\{f \neq \alpha\}\) is open by continuity
and \(|\mu|\)-null, so it misses the support, and \(f \equiv \alpha\) on \(K\). But \(x_0 \in K\) and \(f(x_0) = 1\),
which forces \(\alpha = 1\) and contradicts \(\alpha < 1\). The contradiction shows that \(K\) is a single point, \(K = \{x_0\}\). A measure of norm
\(1\) supported at one point is \(\mu = \gamma\delta_{x_0}\) with \(|\gamma| = \|\mu\| = 1\), which is the claimed form.
The probability measures — the positive measures \(\mu\) with \(\mu(X) = 1\) — form a convex subset of
the ball, and the same reweighting argument identifies their extreme points as the unscaled point masses
\(\{\delta_x : x \in X\}\): the positivity and unit-mass constraints remove the freedom to choose a phase
\(\alpha\), leaving exactly the evaluations. Either way, the extreme structure collapses to the points of \(X\). In
the language of functionals, reading these measures back through the Riesz identification, the extreme points of the
dual ball of \(C(X)\) are the unimodular multiples of evaluations, \(f \mapsto \alpha f(x)\) with \(|\alpha| = 1\).
The proof of the next section converts this collapse to single points into the statement that an
annihilating measure, once pushed to an extreme point, must be supported at one point. A nonzero measure supported
at one point cannot annihilate an algebra that contains the constants.
The Proof via the Dual Ball
We now prove the theorem. By the reduction of the first section it suffices to show \(\mathcal{A}^\perp = (0)\).
Suppose, for contradiction, that \(\mathcal{A}^\perp \neq (0)\). The plan is to manufacture an extreme point of the
unit ball of \(\mathcal{A}^\perp\), show by the reweighting technique of the previous section that it must be a
point mass, and then collide that point mass with the hypothesis \(1 \in \mathcal{A}\).
An Extreme Annihilating Measure
The unit ball of \(\mathcal{A}^\perp\) is the intersection of \(\mathcal{A}^\perp\) with the unit ball of
\(C(X)^*\). It is convex, and it is closed in the weak-\(*\) topology. For each \(f \in \mathcal{A}\) the map
\(\varphi \mapsto \varphi(f)\) is weak-\(*\) continuous, so its kernel is a weak-\(*\) closed hyperplane, and
\(\mathcal{A}^\perp\) is the intersection of these hyperplanes; intersecting with the weak-\(*\) closed ball keeps
it weak-\(*\) closed. By the Banach-Alaoglu Theorem the ball of \(C(X)^*\) is weak-\(*\) compact, and a closed
subset of a compact set is compact. Hence the unit ball of \(\mathcal{A}^\perp\) is weak-\(*\) compact and convex.
By the Krein-Milman Theorem it is the weak-\(*\) closed convex hull of its extreme points. Since
\(\mathcal{A}^\perp \neq (0)\), the ball is not reduced to the origin, so it has a nonzero extreme point \(\mu\).
Such a point has \(\|\mu\| = 1\) by the averaging argument of the previous section: if \(0 < \|\mu\| < 1\), then
\(\mu\) is the average of \((1 + \varepsilon)\mu\) and \((1 - \varepsilon)\mu\), which for small \(\varepsilon > 0\) still lie in the ball of the subspace
\(\mathcal{A}^\perp\). By the Riesz Representation Theorem we treat \(\mu\) as a complex regular Borel measure on
\(X\) with \(\int f \, d\mu = 0\) for every \(f \in \mathcal{A}\).
Building a Separating Function in the Algebra
Let \(K = \operatorname{supp}\mu\). Since \(\|\mu\| = 1 \neq 0\), the support is nonempty. We show \(K\) is a single
point. Suppose instead that \(K\) contains two distinct points \(x_0 \neq x\). Here the hypotheses on
\(\mathcal{A}\) enter for the first time: we use them to produce a function in \(\mathcal{A}\) that is positive at
\(x_0\), vanishes at \(x\), and is squeezed into \([0, 1)\).
By separation, hypothesis (b), there is \(f_1 \in \mathcal{A}\) with \(f_1(x_0) \neq f_1(x)\). Let
\(\beta = f_1(x)\); by hypothesis (a) the constant \(\beta\) lies in \(\mathcal{A}\), so
\(f_2 = f_1 - \beta \in \mathcal{A}\) satisfies \(f_2(x) = 0\) and \(f_2(x_0) \neq 0\). By self-adjointness,
hypothesis (c), \(\bar{f_2} \in \mathcal{A}\), and since \(\mathcal{A}\) is an algebra the product
\(f_3 = f_2 \bar{f_2} = |f_2|^2 \in \mathcal{A}\). This \(f_3\) is real and nonnegative, with \(f_3(x) = 0\) and
\(f_3(x_0) = |f_2(x_0)|^2 > 0\). Finally set
\[
f \;=\; \frac{f_3}{\,\|f_3\|_\infty + 1\,} \;\in\; \mathcal{A},
\]
which inherits \(f(x) = 0\) and \(f(x_0) > 0\), and satisfies \(0 \leq f < 1\) everywhere because dividing the
nonnegative \(f_3\) by a constant strictly larger than its maximum keeps it below \(1\). The role of the three
hypotheses is now visible: (b) gives a function distinguishing the two points, (a) lets us recenter it to vanish at
one of them, and (c) lets us pass to the modulus squared so that the result is real, nonnegative, and still in the
algebra.
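The three-step construction of \(f\) is mechanical enough to run on sampled values. The NumPy sketch below works on a finite set, where the sup norm is a maximum, and each line corresponds to one hypothesis. It does not itself check membership in \(\mathcal{A}\); in the argument, that is what (a), (b), and (c) guarantee. The name `separating_bump` and the sample values are hypothetical.

```python
import numpy as np

def separating_bump(f1, x0, x):
    """Return f with f(x) = 0 < f(x0) and 0 <= f < 1, following (b), (a), (c).

    f1 holds the values of a function on a finite set and must satisfy f1[x0] != f1[x].
    """
    f1 = np.asarray(f1, dtype=complex)
    if f1[x0] == f1[x]:
        raise ValueError("f1 must separate x0 and x (hypothesis (b))")
    f2 = f1 - f1[x]                  # subtract the constant beta = f1(x): hypothesis (a)
    f3 = (f2 * np.conj(f2)).real     # |f2|^2 = f2 * conj(f2): hypothesis (c)
    return f3 / (np.max(f3) + 1.0)   # divide by sup + 1 to land in [0, 1)

f = separating_bump([1 + 1j, 2.0, -1j, 0.5], x0=0, x=2)
```

Here \(|f_2|^2\) takes the values \(5, 5, 0, 1.25\), so \(f(x_0) = 5/6\) and \(f(x) = 0\).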
Splitting the Measure
Because \(\mathcal{A}\) is an algebra and \(f \in \mathcal{A}\), the products \(gf\) and \(g(1 - f)\) lie in
\(\mathcal{A}\) for every \(g \in \mathcal{A}\). Testing \(\mu \in \mathcal{A}^\perp\) against these,
\[
\int g \, d(f\mu) \;=\; \int gf \, d\mu \;=\; 0, \qquad \int g \, d\bigl((1 - f)\mu\bigr) \;=\; \int g(1 - f) \,
d\mu \;=\; 0,
\]
for all \(g \in \mathcal{A}\). Thus both reweighted measures \(f\mu\) and \((1 - f)\mu\) again annihilate
\(\mathcal{A}\): they lie in \(\mathcal{A}^\perp\). This is the step that the algebra structure exists to provide
— reweighting by a member of \(\mathcal{A}\) preserves membership in the annihilator.
Set \(\alpha = \|f\mu\| = \int f \, d|\mu|\), the total variation of \(f\mu\) computed as in the previous section.
Since \(f(x_0) > 0\) and \(f\) is continuous, there is an open neighborhood \(U\) of \(x_0\) and an
\(\varepsilon > 0\) with \(f > \varepsilon\) on \(U\); and because \(x_0 \in K = \operatorname{supp}\mu\), the set
\(U\) has positive \(|\mu|\)-measure. Hence
\[
\alpha \;=\; \int f \, d|\mu| \;\geq\; \varepsilon\, |\mu|(U) \;>\; 0.
\]
Symmetrically, \(f(x) = 0\) with \(x \in K\) forces \(1 - \alpha = \int (1 - f) \, d|\mu| > 0\) by the same
reasoning applied to \(1 - f\) near \(x\); so \(\alpha < 1\). Therefore \(0 < \alpha < 1\), and using
\(\|f\mu\| = \alpha\) and \(\|(1 - f)\mu\| = \int(1 - f)\,d|\mu| = 1 - \alpha\) we may write \(\mu\) as a proper
convex combination of unit vectors of \(\mathcal{A}^\perp\):
\[
\mu \;=\; \alpha \cdot \frac{f\mu}{\alpha} \;+\; (1 - \alpha) \cdot \frac{(1 - f)\mu}{1 - \alpha}.
\]
Collapsing the Support and the Final Contradiction
The measure \(\mu\) is an extreme point of the unit ball of \(\mathcal{A}^\perp\), and the display exhibits it as a
convex combination of two members of that ball. Extremality forces the two factors to equal \(\mu\); in particular
\(\mu = f\mu / \alpha\), that is \(f\mu = \alpha\mu\). As an identity of measures this says \(f = \alpha\) almost
everywhere with respect to \(|\mu|\), hence \(f \equiv \alpha\) on the support \(K\). But \(x_0\) and \(x\) both lie
in \(K\), so \(f(x_0) = \alpha = f(x)\); yet \(f(x_0) > 0 = f(x)\) by construction. This contradiction shows that
\(K\) cannot contain two distinct points: \(K = \{x_0\}\) is a single point.
A measure of total variation \(1\) supported at the single point \(x_0\) is a unimodular point mass,
\(\mu = \gamma\,\delta_{x_0}\) with \(|\gamma| = 1\) — exactly the extreme form found in the previous section.
Now the last hypothesis closes the argument. The constant \(1\) belongs to \(\mathcal{A}\) by (a), and
\(\mu \in \mathcal{A}^\perp\), so
\[
0 \;=\; \int 1 \, d\mu \;=\; \mu(X) \;=\; \gamma\,\delta_{x_0}(X) \;=\; \gamma,
\]
forcing \(\gamma = 0\). This contradicts \(|\gamma| = 1\). The assumption \(\mathcal{A}^\perp \neq (0)\) is therefore
untenable, so \(\mathcal{A}^\perp = (0)\) and, by the density criterion, \(\mathcal{A} = C(X)\). \(\blacksquare\)
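A finite model shows why the separation hypothesis cannot be dropped from this argument. On \(X = \{0, 1, 2, 3\}\), take the hypothetical algebra \(\mathcal{A} = \{f : f(1) = f(2)\}\). It contains the constants and is self-adjoint, but it does not separate \(1\) from \(2\). Its annihilator is spanned by \(\mu = \tfrac{1}{2}(\delta_1 - \delta_2)\), a norm-one extreme point of the ball of \(\mathcal{A}^\perp\) whose support has two points. The NumPy sketch below checks that every \(f \in \mathcal{A}\) with values in \((0, 1)\) is constant on that support. As a result, the splitting returns \(\mu\) twice and no contradiction arises. Since \(\int 1 \, d\mu = 0\), hypothesis (a) is not violated either. The helper names and the random choice of \(f\) are our own.

```python
import numpy as np

# X = {0, 1, 2, 3}; A = {f : f(1) == f(2)} is spanned by these rows (hypothetical example).
A_basis = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]], dtype=float)
mu = np.array([0, 0.5, -0.5, 0], dtype=complex)   # (delta_1 - delta_2) / 2, norm 1

def annihilates(measure, basis, tol=1e-12):
    """True if sum_x g(x) * measure[x] vanishes for every basis function g."""
    return bool(np.all(np.abs(basis @ measure) < tol))

def split_pieces(mu, f):
    """The two unit-norm pieces used in the proof, assuming 0 < alpha < 1."""
    alpha = np.sum(f * np.abs(mu))
    return alpha, f * mu / alpha, (1 - f) * mu / (1 - alpha)

rng = np.random.default_rng(1)
f = rng.uniform(0.05, 0.95, size=3) @ A_basis   # a member of A with values in (0, 1)
alpha, nu1, nu2 = split_pieces(mu, f)           # both pieces come back equal to mu
```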
Conjugate-Closed and Locally Compact Forms
Two refinements extend the reach of the theorem. The first asks what happens when the constants are dropped from
the hypotheses; the second moves from compact \(X\) to the locally compact setting that the Riesz representation
already accommodated.
When the Constants Are Absent
Suppose \(\mathcal{A}\) is a closed self-adjoint subalgebra of \(C(X)\) that separates points but is not assumed to
contain \(1\). One borderline possibility now appears: \(\mathcal{A}\) might consist precisely of the functions
vanishing at a single fixed point. This is the only way the conclusion can fail.
Corollary: The Conjugate-Closed Case Without Constants
Let \(X\) be compact Hausdorff and \(\mathcal{A}\) a closed subalgebra of \(C(X)\) that separates points and is
self-adjoint. Then either \(\mathcal{A} = C(X)\), or there is a point \(x_0 \in X\) such that
\[
\mathcal{A} \;=\; \{\, f \in C(X) : f(x_0) = 0 \,\}.
\]
The mechanism is to restore the constants and then measure how much was lost. Adjoining the constant functions
produces \(\mathcal{A} + \mathbb{F}\), where \(\mathbb{F}\) (that is, \(\mathbb{C}\), or \(\mathbb{R}\) in the real case) denotes the constants.
This is still a subalgebra, and it is closed because it is the sum of a closed subspace and a finite-dimensional one. It now
satisfies all three hypotheses of the theorem, so \(\mathcal{A} + \mathbb{F} = C(X)\). Hence \(\mathcal{A}\) is
either all of \(C(X)\) or has codimension one. In the codimension-one case the space of measures annihilating
\(\mathcal{A}\) is correspondingly one-dimensional, spanned by a single \(\mu\) of norm \(1\). Closure of
\(\mathcal{A}\) under multiplication makes \(f\mu\) annihilate \(\mathcal{A}\) for every \(f \in \mathcal{A}\), so
\(f\mu\) is a scalar multiple of \(\mu\). Reading this as before, every \(f \in \mathcal{A}\) is constant on the
support of \(\mu\). That support is nonempty because \(\mu \neq 0\), and because \(\mathcal{A}\) separates the points of \(X\) it
cannot contain two points, so it is a single point \(x_0\). The annihilator is then spanned by \(\delta_{x_0}\), and since
\(\mathcal{A}\) is closed it is exactly the set of functions killed by evaluation at \(x_0\). The argument reuses the
support-collapsing technique of the main proof; the only change is that the contradiction with the constants is
replaced by the bookkeeping of a one-dimensional annihilator.
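The dichotomy of the corollary can be seen in the same finite model. Take \(\mathcal{A} = \{f : f(x_0) = 0\}\) on \(X = \{0, \dots, n-1\}\), with \(n = 5\) and \(x_0 = 2\) chosen only for illustration. Adjoining the constants gives all of \(\mathbb{C}^n\), and the annihilator is one-dimensional and spanned by \(\delta_{x_0}\). The NumPy sketch below checks both facts with a singular value decomposition. The helper name `annihilator` is hypothetical.

```python
import numpy as np

def annihilator(spanning_functions, tol=1e-9):
    """Rows form a basis of {mu : sum_x f(x) * mu[x] == 0 for every spanning f}."""
    M = np.atleast_2d(np.asarray(spanning_functions, dtype=complex))
    _, s, vh = np.linalg.svd(M)          # full_matrices=True, so vh is n x n
    rank = int(np.sum(s > tol))
    return vh[rank:].conj()              # conjugate so that M @ mu == 0

n, x0 = 5, 2
# A = {f : f(x0) = 0} is spanned by the indicators of the other points.
A_basis = np.array([np.eye(n)[i] for i in range(n) if i != x0])

# Adjoining the constant function 1 restores the whole space.
with_constants = np.vstack([A_basis, np.ones(n)])
rank_with_constants = np.linalg.matrix_rank(with_constants)

# The annihilator of A is one-dimensional, spanned by the point mass at x0.
ann = annihilator(A_basis)
```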
The Locally Compact Setting
When \(X\) is merely locally compact, the natural algebra is \(C_0(X)\), the continuous functions vanishing at
infinity, whose dual is again the space of measures. The constant \(1\) is no longer available, since it does not
vanish at infinity unless \(X\) is compact, so the role of hypothesis (a) is taken by the weaker requirement that
the functions not all vanish at any single point.
Corollary: The Locally Compact Case
Let \(X\) be locally compact Hausdorff and \(\mathcal{A}\) a closed subalgebra of \(C_0(X)\) such that
(a) for each \(x \in X\) there is \(f \in \mathcal{A}\) with \(f(x) \neq 0\);
(b) \(\mathcal{A}\) separates the points of \(X\);
(c) \(\mathcal{A}\) is self-adjoint.
Then \(\mathcal{A} = C_0(X)\).
The reduction is to compactify. Adjoining a single point \(\infty\) turns \(X\) into the compact Hausdorff space
\(X_\infty\), its one-point compactification, and \(C_0(X)\) becomes the functions on \(X_\infty\) that vanish at
\(\infty\). The algebra \(\mathcal{A}\), viewed inside \(C(X_\infty)\), is closed and self-adjoint, and it separates
the points of \(X_\infty\). Condition (b) separates any two points of \(X\), and condition (a) separates each
\(x \in X\) from \(\infty\), where every member of \(\mathcal{A}\) vanishes. The conjugate-closed case without
constants then applies on \(X_\infty\). The alternative \(\mathcal{A} = C(X_\infty)\) is impossible because
\(1 \notin \mathcal{A}\), so \(\mathcal{A}\) is the set of functions vanishing at a single point \(x_0\). Condition (a)
rules out \(x_0 \in X\), so \(x_0 = \infty\), which returns exactly \(\mathcal{A} = C_0(X)\).
Function Algebras
The three hypotheses are not idle: a subalgebra that separates points and contains the constants but fails to be
self-adjoint can be a proper closed subalgebra of \(C(X)\). The functions holomorphic on a disc and continuous up
to its boundary form such an algebra: it is the closed subalgebra generated by the constants and the coordinate function,
it separates points and contains the constants, and yet the conjugate of the coordinate function escapes it. Closed
subalgebras of \(C(X)\) that separate points and contain the constants, with self-adjointness dropped, are the
uniform algebras, and they are objects of independent study rather than degenerate cases. The
Stone-Weierstrass theorem marks the boundary precisely: with conjugation, the only point-separating constant-containing
closed subalgebra is everything; without it, an entire theory of proper function algebras opens up. The same dual-ball
method that proved the theorem — extract an extreme annihilating measure, force its support to a point —
is the prototype for analyzing those algebras through the measures that annihilate them.