The Stone-Weierstrass Theorem


The Theorem and the Setting

The classical theorem of Weierstrass asserts that every continuous real function on a closed interval is a uniform limit of polynomials. Read inside the space \(C([a, b])\) of continuous functions with the supremum norm, this says that the polynomials form a dense subset. Stone's generalization replaces the interval by an arbitrary compact Hausdorff space \(X\) and the polynomials by any subalgebra of \(C(X)\) large enough to tell points apart. The result is one of the central density theorems of analysis: it converts an algebraic hypothesis — closure under addition, multiplication, and scalar multiplication, together with a separation property — into the analytic conclusion that the subalgebra exhausts all of \(C(X)\).

Throughout, \(X\) is a compact Hausdorff space and \(C(X)\) is the algebra of continuous functions \(X \to \mathbb{C}\), normed by \(\|f\| = \sup_{x \in X} |f(x)|\). A subalgebra of \(C(X)\) is a linear subspace \(\mathcal{A}\) that is closed under pointwise multiplication: if \(f, g \in \mathcal{A}\) then \(fg \in \mathcal{A}\). We say \(\mathcal{A}\) is closed when it is closed in the norm topology, i.e. a uniformly closed subspace. The proof we give routes the entire argument through the geometry of the dual ball; it rests on the Krein-Milman Theorem and on the identification of the dual of \(C(X)\) with a space of measures.

Separation and Conjugation

Two structural properties single out the subalgebras that can be dense. The first is a separation condition: the functions must be able to distinguish any two points of \(X\). Without it, a subalgebra cannot even approximate a function that takes different values at two points it fails to separate.

Definition: Separation of Points

A collection \(\mathcal{A}\) of functions on \(X\) separates the points of \(X\) if for every pair of distinct points \(x, y \in X\) there is a function \(f \in \mathcal{A}\) with \(f(x) \neq f(y)\).

The second property is needed only in the complex setting. A subalgebra of complex-valued functions can be closed under all the algebraic operations and still fail to be dense, because those operations never produce conjugates. The closed subalgebra of \(C(\overline{\mathbb{D}})\) generated by the constants and the coordinate function \(z\) on the closed unit disc consists of the uniform limits of polynomials in \(z\); these are holomorphic on the interior, so the non-holomorphic function \(\bar{z}\) lies outside it even though the algebra separates points and contains the constants. The remedy is to require closure under conjugation.

Definition: Self-Adjoint Subalgebra

For a function \(f : X \to \mathbb{C}\), let \(\bar{f}\) denote its pointwise complex conjugate, \(\bar{f}(x) = \overline{f(x)}\). A subalgebra \(\mathcal{A} \subseteq C(X)\) is self-adjoint (or closed under conjugation) if \(\bar{f} \in \mathcal{A}\) whenever \(f \in \mathcal{A}\).

The Theorem

We can now state the theorem in the form proved below. The hypotheses are exactly the three properties just isolated: the subalgebra contains the constants, separates points, and is self-adjoint.

Theorem: Stone-Weierstrass

Let \(X\) be a compact Hausdorff space and let \(\mathcal{A}\) be a closed subalgebra of \(C(X)\) such that

(a) the constant function \(1\) belongs to \(\mathcal{A}\);
(b) \(\mathcal{A}\) separates the points of \(X\);
(c) \(\mathcal{A}\) is self-adjoint.

Then \(\mathcal{A} = C(X)\).
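A finite sanity check may help fix ideas. When \(X\) is a finite set with the discrete topology, \(C(X)\) is the space of all functions on \(X\), and a single function \(g\) taking distinct values already separates points. The sketch below is an illustration only: the name `indicator_via_algebra` and the sample values of `g` are assumptions, not part of the text. It builds the indicator of each point from the constants and \(g\) using only sums, products, and scalar multiples, so every function on \(X\) lies in the subalgebra they generate. In this finite situation conjugation is never needed; the example shows only how separation, constants, and multiplication combine.

```python
from math import prod  # available from Python 3.8


def indicator_via_algebra(g, j):
    """Return the indicator of point j as a polynomial in g, evaluated on X.

    g is a list of distinct complex values g[i] = g(x_i) on a finite set X.
    The Lagrange product  prod_{k != j} (g - g[k]) / (g[j] - g[k])
    uses only constants, sums, and products of g, so it lies in the
    subalgebra generated by 1 and g.
    """
    n = len(g)
    return [
        prod((g[i] - g[k]) / (g[j] - g[k]) for k in range(n) if k != j)
        for i in range(n)
    ]


# Hypothetical separating function on a four-point space.
g = [0.0, 1.0, 2.5, 1j]
indicators = [indicator_via_algebra(g, j) for j in range(len(g))]
```

Since the indicators of the points span all functions on a finite set, the generated algebra is everything, in agreement with the theorem.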

When \(C(X)\) is taken to be the algebra of real-valued continuous functions, condition (c) is automatic — every real function equals its own conjugate — and the theorem reads: a closed subalgebra of real \(C(X)\) that contains the constants and separates points is all of \(C(X)\). This is the form that recovers the Weierstrass approximation theorem, taking \(X = [a, b]\) and \(\mathcal{A}\) the uniform closure of the polynomials, which separates points because the single function \(x \mapsto x\) already does.
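The Weierstrass case can be observed numerically. The sketch below uses Bernstein polynomials, a classical constructive route to polynomial approximation on \([0, 1]\); this is a swapped-in technique for illustration and is not the method of proof used in this text. The function names, the test function `kink`, and the grid size are assumptions. The sup-norm error is only estimated by sampling a grid.

```python
from math import comb


def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f at x in [0, 1]."""
    return sum(
        f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f, n, grid_size=201):
    """Estimate sup |f - B_n f| on [0, 1] by sampling a uniform grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)


def kink(x):
    # Continuous but not differentiable at 1/2.
    return abs(x - 0.5)


# Sampled uniform errors for increasing degree; they should shrink.
errors = {n: sup_error(kink, n) for n in (10, 50, 200)}
```

The slow decay for a function with a corner is typical of this particular scheme; the theorem itself asserts only that some sequence of polynomials converges uniformly.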

Reduction to the Annihilator

The proof does not approximate functions directly. It works in the dual space, through the principle that a closed subspace of a Banach space is the whole space precisely when no nonzero continuous functional annihilates it. The functionals vanishing on \(\mathcal{A}\) form its annihilator \[ \mathcal{A}^\perp \;=\; \{\, \varphi \in C(X)^* : \varphi(f) = 0 \text{ for all } f \in \mathcal{A} \,\}, \] so by the annihilator density criterion the conclusion \(\mathcal{A} = C(X)\) is equivalent to \(\mathcal{A}^\perp = (0)\). To prove the theorem it therefore suffices to show that no nonzero functional annihilates a subalgebra satisfying (a), (b), (c).

This is where the measures enter. By the Riesz Representation Theorem, every bounded linear functional on \(C(X)\) is integration against a complex regular Borel measure, so an element of \(\mathcal{A}^\perp\) is a measure \(\mu\) with \(\int f \, d\mu = 0\) for all \(f \in \mathcal{A}\). The strategy is to assume some nonzero such \(\mu\) exists, pass to an extreme point of the unit ball of these measures, and show that its support must collapse to a single point, whereupon condition (a) produces a contradiction. Realizing this requires knowing what the extreme points of the measure ball look like, which is the subject of the next section.

A Tool: Multiplying a Measure by a Function

The argument repeatedly modifies a measure by reweighting it with a bounded continuous function. We record the construction once. Given a complex measure \(\mu\) on \(X\) and a bounded Borel function \(h\), the reweighted measure \(h\mu\) is defined by integrating \(h\) over each set, \[ (h\mu)(\Delta) \;=\; \int_\Delta h \, d\mu, \] for every Borel set \(\Delta\). (It is the product of \(h\) and \(\mu\), not to be confused with a product measure on a product space.) It is again a complex measure, and its total variation is \(|h\mu| = |h|\,|\mu|\) as set functions: reweighting a measure by \(h\) scales its variation pointwise by \(|h|\). Its norm is therefore \(\|h\mu\| = |h\mu|(X) = \int |h| \, d|\mu|\); in particular \(\|h\mu\| \leq \|h\|_\infty \, \|\mu\|\). Integration against \(h\mu\) is integration against \(\mu\) with the integrand reweighted: \(\int g \, d(h\mu) = \int gh \, d\mu\) for every bounded Borel \(g\). The use we make of this is the case \(h \in \mathcal{A}\): when \(\mu\) annihilates \(\mathcal{A}\), the closure of \(\mathcal{A}\) under multiplication makes every product \(gh\) with \(g \in \mathcal{A}\) again a member of \(\mathcal{A}\), so the reweighted measure \(h\mu\) annihilates \(\mathcal{A}\) as well. This is the mechanism by which the algebra hypothesis enters an argument phrased entirely in terms of measures.
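On a finite space these identities reduce to arithmetic on lists of point weights, which makes them easy to check. The following is a minimal sketch under that assumption: a measure is represented by its complex weights \(\mu(\{x\})\), so that \(|\mu|(\{x\}) = |\mu(\{x\})|\), and the helper names and the sample data are hypothetical.

```python
def integrate(g, mu):
    """Integral of g against mu on a finite space: sum of g(x) * mu({x})."""
    return sum(gx * mx for gx, mx in zip(g, mu))


def reweight(h, mu):
    """The measure h*mu, given by its point weights h(x) * mu({x})."""
    return [hx * mx for hx, mx in zip(h, mu)]


def tv_norm(mu):
    """Total-variation norm: the sum of the moduli of the point weights."""
    return sum(abs(mx) for mx in mu)


# Hypothetical data on a four-point space.
mu = [0.4, -0.3j, 0.2 + 0.1j, -0.1]
h = [1j, 0.5, -2.0, 0.0]
g = [1.0, 2.0, 3.0 - 1j, 4.0]

h_mu = reweight(h, mu)
lhs = integrate(g, h_mu)                                # integral of g d(h mu)
rhs = integrate([gx * hx for gx, hx in zip(g, h)], mu)  # integral of g h d(mu)
norm_h_mu = tv_norm(h_mu)                               # ||h mu||
bound = max(abs(hx) for hx in h) * tv_norm(mu)          # ||h||_inf * ||mu||
```

Here `lhs` and `rhs` agree up to rounding, and `norm_h_mu` equals the weighted sum \(\sum |h(x)|\,|\mu(\{x\})|\) and does not exceed `bound`.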

Extreme Points of the Ball of M(X)

The argument of the next section will extract an extreme point of the unit ball of the measure space \(M(X)\) and exploit its rigidity. Before running that argument we identify these extreme points completely: they are the smallest measures there are, the point masses scaled to unit size. A measure spread over more than one point can always be split into a proper average, so it cannot be extreme; a unit point mass admits no such splitting. This is the precise statement that the closing remarks on the dual ball of \(C(X)\) anticipated, and the technique used to prove it — reweighting a measure by a separating continuous function and reading off a contradiction from extremality — is exactly the technique that will drive the main proof.

Theorem: Extreme Points of the Unit Ball of M(X)

Let \(X\) be a compact Hausdorff space. The extreme points of the closed unit ball of \(M(X)\) are exactly the unimodular point masses, \[ \operatorname{ext}\bigl(\operatorname{ball} M(X)\bigr) \;=\; \{\, \alpha \delta_x : \alpha \in \mathbb{C},\; |\alpha| = 1,\; x \in X \,\}. \]

Proof Sketch

Point masses are extreme.
Fix \(x \in X\) and \(|\alpha| = 1\); then \(\|\alpha\delta_x\| = |\alpha| = 1\), so \(\alpha\delta_x\) lies in the unit ball. Suppose \(\alpha\delta_x = \tfrac{1}{2}(\nu_1 + \nu_2)\) with \(\nu_1, \nu_2\) in the ball. Evaluating total variations, \(1 = |\alpha| = \|\alpha\delta_x\| \leq \tfrac{1}{2}(\|\nu_1\| + \|\nu_2\|) \leq 1\), forcing \(\|\nu_1\| = \|\nu_2\| = 1\) and equality throughout. Equality in the triangle inequality for the total-variation norm, combined with both measures being forced to concentrate at \(x\) (any mass either \(\nu_i\) placed off \(x\) would have to be cancelled by the other, contradicting \(\|\nu_1\| + \|\nu_2\| = \|\nu_1 + \nu_2\|\)), gives \(\nu_1 = \nu_2 = \alpha\delta_x\). Hence \(\alpha\delta_x\) is extreme.

Extreme points have one-point support.
Conversely, let \(\mu\) be an extreme point of the ball; then \(\|\mu\| = 1\), since a measure of norm less than \(1\) is a proper convex combination of other points of the ball: \(0 = \tfrac{1}{2}\nu + \tfrac{1}{2}(-\nu)\) for any unit \(\nu\), and a nonzero \(\mu\) with \(\|\mu\| < 1\) equals \(\|\mu\| \cdot (\mu/\|\mu\|) + (1 - \|\mu\|) \cdot 0\). Let \(K = \operatorname{supp}\mu\) be the support of \(\mu\), which is nonempty because \(\mu \neq 0\). We claim \(K\) is a single point. Suppose instead that \(K\) contains two distinct points \(x_0 \neq x\). Choose disjoint open sets \(U \ni x_0\) and \(V \ni x\). Because \(X\) is compact Hausdorff, hence normal, Urysohn's lemma provides a continuous function \(f : X \to [0, 1]\) with \(f \equiv 1\) on a neighborhood of \(x_0\) contained in \(U\) and \(f \equiv 0\) on a neighborhood of \(x\) contained in \(V\).

Form the reweighted measure \(f\mu\), whose total-variation norm, by the formula \(\|h\mu\| = \int |h| \, d|\mu|\) established above, is \[ \alpha \;=\; \|f\mu\| \;=\; \int |f| \, d|\mu| \;=\; \int f \, d|\mu|, \] the last equality because \(0 \leq f \leq 1\). Let \(W\) be an open neighborhood of \(x_0\) on which \(f \equiv 1\). Since \(x_0 \in K\), every neighborhood of \(x_0\) has positive \(|\mu|\)-measure, so \(\alpha = \int f \, d|\mu| \geq |\mu|(W) > 0\). Symmetrically, let \(W'\) be an open neighborhood of \(x\) on which \(f \equiv 0\); since \(x \in K\) and \(|\mu|(X) = \|\mu\| = 1\), \[ 1 - \alpha \;=\; \int (1 - f) \, d|\mu| \;\geq\; |\mu|(W') > 0, \] so \(\alpha < 1\). Therefore \(0 < \alpha < 1\), and we may write \[ \mu \;=\; \alpha \cdot \frac{f\mu}{\alpha} \;+\; (1 - \alpha) \cdot \frac{(1 - f)\mu}{1 - \alpha}, \] a convex combination of two measures each of total-variation norm \(1\): the first, \(f\mu/\alpha\), has norm \(\|f\mu\|/\alpha = 1\), and the second, \((1 - f)\mu/(1 - \alpha)\), has norm \(\|(1 - f)\mu\|/(1 - \alpha) = 1\) since \(\|(1 - f)\mu\| = \int (1 - f) \, d|\mu| = 1 - \alpha\).

Because \(\mu\) is an extreme point, both measures in this combination must coincide with \(\mu\); in particular \(\mu = f\mu / \alpha\), i.e. \(f\mu = \alpha\mu\). Reading this as an identity of measures, \(f = \alpha\) almost everywhere with respect to \(|\mu|\), hence, by continuity of \(f\), \(f \equiv \alpha\) on the support \(K\). But \(f \equiv 1\) on a neighborhood of \(x_0 \in K\) carrying positive \(|\mu|\)-mass, forcing \(\alpha = 1\) and contradicting \(\alpha < 1\). The contradiction shows that \(K\) is a single point, say \(K = \{x_0\}\). A measure of norm \(1\) supported at one point is \(\mu = \gamma\delta_{x_0}\) with \(|\gamma| = \|\mu\| = 1\), where we write \(\gamma\) for the unimodular scalar called \(\alpha\) in the statement, since \(\alpha\) already denotes \(\|f\mu\|\) in this proof. This is the claimed form.
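The splitting step can be carried out explicitly on a finite space. The sketch below assumes a measure is stored as a list of complex point weights and that the weight function \(f\) is a list of values in \([0, 1]\); the names `split` and `tv_norm` and the sample data are hypothetical. It exhibits a unit-norm measure charging more than one point as a proper convex combination of two other unit-norm measures, so that it is not extreme.

```python
def tv_norm(mu):
    """Total-variation norm of a measure given by its point weights."""
    return sum(abs(w) for w in mu)


def split(mu, f):
    """Split a unit-norm measure mu along a weight function 0 <= f <= 1.

    Returns (alpha, nu1, nu2) with alpha = ||f mu||,
    nu1 = f mu / alpha and nu2 = (1 - f) mu / (1 - alpha).
    Assumes 0 < alpha < 1, which holds unless f is |mu|-a.e. equal to 0 or 1.
    """
    f_mu = [fx * w for fx, w in zip(f, mu)]
    rest = [(1 - fx) * w for fx, w in zip(f, mu)]
    alpha = tv_norm(f_mu)
    nu1 = [w / alpha for w in f_mu]
    nu2 = [w / (1 - alpha) for w in rest]
    return alpha, nu1, nu2


# Hypothetical unit-norm measure charging three points, and a weight f
# equal to 1 at the first point and 0 at the last.
mu = [0.5, 0.3j, -0.2]
f = [1.0, 0.5, 0.0]
alpha, nu1, nu2 = split(mu, f)
recombined = [alpha * a + (1 - alpha) * b for a, b in zip(nu1, nu2)]
```

Both `nu1` and `nu2` have norm \(1\), `recombined` reproduces `mu`, and `nu1` differs from `mu` because it carries no mass at the last point.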

The probability measures — the positive measures \(\mu\) with \(\mu(X) = 1\) — form a convex subset of the ball, and the same reweighting argument identifies their extreme points as the unscaled point masses \(\{\delta_x : x \in X\}\): the positivity and unit-mass constraints remove the freedom to choose a phase \(\alpha\), leaving exactly the evaluations. Either way, the extreme structure collapses to the points of \(X\). In the language of functionals, reading these measures back through the Riesz identification, the extreme points of the dual ball of \(C(X)\) are the scaled evaluations \(f \mapsto \alpha f(x)\). It is this collapse to single points that the proof of the next section converts into the statement that an annihilating measure, once pushed to an extreme point, must be supported at one point — and a measure supported at one point cannot annihilate an algebra that contains the constants.

The Proof via the Dual Ball

We now prove the theorem. By the reduction of the first section it suffices to show \(\mathcal{A}^\perp = (0)\). Suppose, for contradiction, that \(\mathcal{A}^\perp \neq (0)\). The plan is to manufacture an extreme point of the unit ball of \(\mathcal{A}^\perp\), show by the reweighting technique of the previous section that it must be a point mass, and then collide that point mass with the hypothesis \(1 \in \mathcal{A}\).

An Extreme Annihilating Measure

The unit ball of \(\mathcal{A}^\perp\) is a subset of the unit ball of \(C(X)^*\). It is convex, and it is closed in the weak-\(*\) topology: each condition \(\varphi(f) = 0\) defining \(\mathcal{A}^\perp\) is a weak-\(*\) continuous constraint, so \(\mathcal{A}^\perp\) is an intersection of weak-\(*\) closed subspaces, and intersecting with the ball keeps it weak-\(*\) closed. By the Banach-Alaoglu Theorem the ball of \(C(X)^*\) is weak-\(*\) compact, and a closed subset of a compact set is compact; hence the unit ball of \(\mathcal{A}^\perp\) is weak-\(*\) compact and convex. By the Krein-Milman Theorem it has an extreme point \(\mu\). Since \(\mathcal{A}^\perp \neq (0)\), the ball is not reduced to the origin, and then no point of norm less than \(1\) can be extreme: \(0\) is the midpoint of \(\nu\) and \(-\nu\) for any unit \(\nu \in \mathcal{A}^\perp\), and a nonzero \(\varphi\) with \(\|\varphi\| < 1\) is a proper convex combination of \(\varphi/\|\varphi\|\) and \(0\). Hence \(\|\mu\| = 1\) automatically, with no normalization needed. By the Riesz Representation Theorem we treat \(\mu\) as a complex regular Borel measure on \(X\) with \(\int f \, d\mu = 0\) for every \(f \in \mathcal{A}\).

Building a Separating Function in the Algebra

Let \(K = \operatorname{supp}\mu\). Since \(\|\mu\| = 1 \neq 0\), the support is nonempty. We show \(K\) is a single point. Suppose instead that \(K\) contains two distinct points \(x_0 \neq x\). Here the hypotheses on \(\mathcal{A}\) enter for the first time: we use them to produce a function in \(\mathcal{A}\) that is positive at \(x_0\), vanishes at \(x\), and is squeezed into \([0, 1)\).

By separation, hypothesis (b), there is \(f_1 \in \mathcal{A}\) with \(f_1(x_0) \neq f_1(x)\). Let \(\beta = f_1(x)\); by hypothesis (a) the constant \(\beta\) lies in \(\mathcal{A}\), so \(f_2 = f_1 - \beta \in \mathcal{A}\) satisfies \(f_2(x) = 0\) and \(f_2(x_0) \neq 0\). By self-adjointness, hypothesis (c), \(\bar{f_2} \in \mathcal{A}\), and since \(\mathcal{A}\) is an algebra the product \(f_3 = f_2 \bar{f_2} = |f_2|^2 \in \mathcal{A}\). This \(f_3\) is real and nonnegative, with \(f_3(x) = 0\) and \(f_3(x_0) = |f_2(x_0)|^2 > 0\). Finally set \[ f \;=\; \frac{f_3}{\,\|f_3\|_\infty + 1\,} \;\in\; \mathcal{A}, \] which inherits \(f(x) = 0\) and \(f(x_0) > 0\), and satisfies \(0 \leq f < 1\) everywhere because dividing the nonnegative \(f_3\) by a constant strictly larger than its maximum keeps it below \(1\). The role of the three hypotheses is now visible: (b) gives a function distinguishing the two points, (a) lets us recenter it to vanish at one of them, and (c) lets us pass to the modulus squared so that the result is real, nonnegative, and still in the algebra.
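The chain \(f_1 \to f_2 \to f_3 \to f\) is mechanical enough to trace on sample values. The sketch below is an illustration under stated assumptions: a function is represented by its values at finitely many sample points, the maximum over the sample stands in for the supremum norm, and the name `squeeze_separator` and the data are hypothetical.

```python
def squeeze_separator(f1, x):
    """Follow the construction f1 -> f2 -> f3 -> f on a finite sample.

    f1 is a list of complex values of a function at sample points and
    x is the index of the point where the result should vanish.
    """
    beta = f1[x]
    f2 = [v - beta for v in f1]      # recentre so that f2(x) = 0
    f3 = [abs(v) ** 2 for v in f2]   # f2 * conj(f2) = |f2|^2, real and >= 0
    scale = max(f3) + 1              # sample maximum of f3, plus one
    return [v / scale for v in f3]   # values in [0, 1)


# Hypothetical values of a separating function at five sample points;
# index 1 plays the role of x and index 0 the role of x_0.
f1 = [2 + 1j, -1j, 3.0, 2.5 + 1j, -4 + 2j]
f = squeeze_separator(f1, x=1)
```

The result vanishes at index 1, is strictly positive at index 0, and stays in \([0, 1)\) at every sample point, matching the three properties listed above.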

Splitting the Measure

Because \(\mathcal{A}\) is an algebra and \(f \in \mathcal{A}\), the products \(gf\) and \(g(1 - f)\) lie in \(\mathcal{A}\) for every \(g \in \mathcal{A}\). Testing \(\mu \in \mathcal{A}^\perp\) against these, \[ \int g \, d(f\mu) \;=\; \int gf \, d\mu \;=\; 0, \qquad \int g \, d\bigl((1 - f)\mu\bigr) \;=\; \int g(1 - f) \, d\mu \;=\; 0, \] for all \(g \in \mathcal{A}\). Thus both reweighted measures \(f\mu\) and \((1 - f)\mu\) again annihilate \(\mathcal{A}\): they lie in \(\mathcal{A}^\perp\). This is the step that the algebra structure exists to provide — reweighting by a member of \(\mathcal{A}\) preserves membership in the annihilator.

Set \(\alpha = \|f\mu\| = \int f \, d|\mu|\), the total variation of \(f\mu\) computed as in the previous section. Since \(f(x_0) > 0\) and \(f\) is continuous, there is an open neighborhood \(U\) of \(x_0\) and an \(\varepsilon > 0\) with \(f > \varepsilon\) on \(U\); and because \(x_0 \in K = \operatorname{supp}\mu\), the set \(U\) has positive \(|\mu|\)-measure. Hence \[ \alpha \;=\; \int f \, d|\mu| \;\geq\; \varepsilon\, |\mu|(U) \;>\; 0. \] Symmetrically, \(f(x) = 0\) with \(x \in K\) forces \(1 - \alpha = \int (1 - f) \, d|\mu| > 0\) by the same reasoning applied to \(1 - f\) near \(x\); so \(\alpha < 1\). Therefore \(0 < \alpha < 1\), and using \(\|f\mu\| = \alpha\) and \(\|(1 - f)\mu\| = \int(1 - f)\,d|\mu| = 1 - \alpha\) we may write \(\mu\) as a proper convex combination of unit vectors of \(\mathcal{A}^\perp\): \[ \mu \;=\; \alpha \cdot \frac{f\mu}{\alpha} \;+\; (1 - \alpha) \cdot \frac{(1 - f)\mu}{1 - \alpha}. \]

Collapsing the Support and the Final Contradiction

The measure \(\mu\) is an extreme point of the unit ball of \(\mathcal{A}^\perp\), and the display exhibits it as a convex combination of two members of that ball. Extremality forces both of these members to equal \(\mu\); in particular \(\mu = f\mu / \alpha\), that is \(f\mu = \alpha\mu\). As an identity of measures this says \(f = \alpha\) almost everywhere with respect to \(|\mu|\), hence, by continuity of \(f\), \(f \equiv \alpha\) on the support \(K\). But \(x_0\) and \(x\) both lie in \(K\), so \(f(x_0) = \alpha = f(x)\); yet \(f(x_0) > 0 = f(x)\) by construction. This contradiction shows that \(K\) cannot contain two distinct points; being nonempty, \(K\) is a single point, which we denote \(x_0\).

A measure of total variation \(1\) supported at the single point \(x_0\) is a unimodular point mass, \(\mu = \gamma\,\delta_{x_0}\) with \(|\gamma| = 1\) — exactly the extreme form found in the previous section. Now the last hypothesis closes the argument. The constant \(1\) belongs to \(\mathcal{A}\) by (a), and \(\mu \in \mathcal{A}^\perp\), so \[ 0 \;=\; \int 1 \, d\mu \;=\; \mu(X) \;=\; \gamma\,\delta_{x_0}(X) \;=\; \gamma, \] forcing \(\gamma = 0\). This contradicts \(|\gamma| = 1\). The assumption \(\mathcal{A}^\perp \neq (0)\) is therefore untenable, so \(\mathcal{A}^\perp = (0)\) and, by the density criterion, \(\mathcal{A} = C(X)\). \(\blacksquare\)

Conjugate-Closed and Locally Compact Forms

Two refinements extend the reach of the theorem. The first asks what happens when the constants are dropped from the hypotheses; the second moves from compact \(X\) to the locally compact setting that the Riesz representation already accommodated.

When the Constants Are Absent

Suppose \(\mathcal{A}\) is a closed self-adjoint subalgebra of \(C(X)\) that separates points but is not assumed to contain \(1\). One borderline possibility now appears: \(\mathcal{A}\) might consist precisely of the functions vanishing at a single fixed point. This is the only way the conclusion can fail.

Corollary: The Conjugate-Closed Case Without Constants

Let \(X\) be compact Hausdorff and \(\mathcal{A}\) a closed subalgebra of \(C(X)\) that separates points and is self-adjoint. Then either \(\mathcal{A} = C(X)\), or there is a point \(x_0 \in X\) such that \[ \mathcal{A} \;=\; \{\, f \in C(X) : f(x_0) = 0 \,\}. \]

The mechanism is to restore the constants and then measure how much was lost. Adjoining the constant functions produces \(\mathcal{A} + \mathbb{C}\), still a closed subalgebra (the sum of a closed subspace and a finite-dimensional one is closed), and now satisfying all three hypotheses of the theorem; so \(\mathcal{A} + \mathbb{C} = C(X)\). Hence \(\mathcal{A}\) is either all of \(C(X)\) or has codimension one. In the codimension-one case the space of measures annihilating \(\mathcal{A}\) is correspondingly one-dimensional, spanned by a single \(\mu\) of norm \(1\). Closure of \(\mathcal{A}\) under multiplication makes \(f\mu\) annihilate \(\mathcal{A}\) for every \(f \in \mathcal{A}\), so \(f\mu\) is a scalar multiple of \(\mu\); reading this as before, every \(f \in \mathcal{A}\) is constant on the support of \(\mu\). Because \(\mathcal{A}\) separates the points of \(X\), that support cannot contain two points, so it is a single \(x_0\). The annihilator is then spanned by \(\delta_{x_0}\), and \(\mathcal{A}\) is exactly the functions killed by evaluation at \(x_0\). The argument reuses the support-collapsing technique of the main proof; what changes is only that the contradiction with the constants is replaced by the bookkeeping of a one-dimensional annihilator.

The Locally Compact Setting

When \(X\) is merely locally compact, the natural algebra is \(C_0(X)\), the continuous functions vanishing at infinity, whose dual is again the space of measures. The constant \(1\) is no longer available — it does not vanish at infinity unless \(X\) is compact — so the role of hypothesis (a) is taken by the weaker requirement that the functions not all vanish at any single point.

Corollary: The Locally Compact Case

Let \(X\) be locally compact Hausdorff and \(\mathcal{A}\) a closed subalgebra of \(C_0(X)\) such that

(a) for each \(x \in X\) there is \(f \in \mathcal{A}\) with \(f(x) \neq 0\);
(b) \(\mathcal{A}\) separates the points of \(X\);
(c) \(\mathcal{A}\) is self-adjoint.

Then \(\mathcal{A} = C_0(X)\).

The reduction is to compactify. Adjoining a single point at infinity turns \(X\) into a compact Hausdorff space \(X_\infty\), and \(C_0(X)\) becomes the algebra of functions in \(C(X_\infty)\) that vanish at the added point. The algebra \(\mathcal{A}\), viewed inside \(C(X_\infty)\), is closed and self-adjoint, and it separates the points of \(X_\infty\): condition (b) handles pairs of points of \(X\), and condition (a) separates each \(x \in X\) from the point at infinity, since some \(f \in \mathcal{A}\) has \(f(x) \neq 0 = f(\infty)\). The conjugate-closed case without constants then applies on \(X_\infty\). The alternative \(\mathcal{A} = C(X_\infty)\) is excluded because every member of \(\mathcal{A}\) vanishes at infinity, so \(\mathcal{A}\) is the set of functions vanishing at one point \(x_0\) of \(X_\infty\); by condition (a) that point cannot lie in \(X\), so it is the point at infinity, which returns exactly \(\mathcal{A} = C_0(X)\).

Function Algebras

The three hypotheses are not idle: a subalgebra that separates points and contains the constants but fails to be self-adjoint can be a proper closed subalgebra of \(C(X)\). The functions holomorphic on a disc and continuous up to its boundary form such an algebra, generated by the coordinate function alone; it separates points and contains the constants, yet the conjugate of the coordinate function escapes it. Closed subalgebras of \(C(X)\) that separate points and contain the constants — with self-adjointness dropped — are the uniform algebras, and they are objects of independent study rather than degenerate cases. The Stone-Weierstrass theorem marks the boundary precisely: with conjugation, the only point-separating constant-containing closed subalgebra is everything; without it, an entire theory of proper function algebras opens up. The same dual-ball method that proved the theorem — extract an extreme annihilating measure, force its support to a point — is the prototype for analyzing those algebras through the measures that annihilate them.
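The failure without conjugation can be seen through an explicit annihilating measure. As an illustration (the choice of measure is ours, not the text's), take the measure \(z \, d\theta / 2\pi\) carried by the unit circle: it integrates every polynomial in \(z\) to zero, and hence every uniform limit of such polynomials, yet it pairs with \(\bar{z}\) to give \(1\). The sketch below checks this numerically, approximating the integral by an average over equally spaced points; the sample size `N` and the helper name are assumptions.

```python
import cmath


def circle_average(values):
    """Average of sampled values: a Riemann sum for d(theta) / (2 pi)."""
    return sum(values) / len(values)


N = 64  # number of equally spaced sample points on the unit circle
circle = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]

# Integrals of z^n against the measure z d(theta)/(2 pi), for n = 0..10.
against_polynomials = [
    circle_average([z**n * z for z in circle]) for n in range(11)
]

# Integral of conj(z) against the same measure.
against_conjugate = circle_average([z.conjugate() * z for z in circle])
```

A nonzero functional that vanishes on the algebra but not on \(\bar{z}\) is exactly what the annihilator criterion requires to keep \(\bar{z}\) outside the closed algebra.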