Reducing the Codimension by Generic Projection
We have developed manifolds as abstract objects, glued from charts with no reference to any
surrounding space. Yet our pictures of them have always been of surfaces sitting in
\(\mathbb{R}^3\), or curves in the plane — subsets of a Euclidean space. The first application of
Sard's theorem justifies that habit completely: every smooth manifold can be realized as a
submanifold of some \(\mathbb{R}^N\), and in fact of one whose dimension is controlled by the
manifold's own. This is the Whitney embedding theorem, and the work of this page is to prove it.
The strategy has two movements. It is comparatively easy to embed a manifold into some
Euclidean space of possibly enormous dimension, by patching together finitely or countably many
charts. The substance of the theorem lies in driving that dimension back down to a controlled
value. The engine for the descent is Sard's theorem in the form proved on the previous page: the
image of a smooth map from a lower-dimensional domain is negligible, so almost every direction is
safe to project along. We begin with the single step that removes one dimension, then iterate it.
Fix a manifold already sitting inside \(\mathbb{R}^N\) and a vector \(v\) whose last coordinate is
nonzero; projecting along the line \(\mathbb{R}v\) onto the coordinate hyperplane
\(\mathbb{R}^{N-1}\) of vectors with vanishing last coordinate lowers the ambient dimension by one.
This projection can fail to be injective — if two points of the manifold differ by a multiple of
\(v\), they collapse onto each other — and it can fail to be an immersion — if some tangent vector
points along \(v\), it is killed. The next lemma shows that both failures are rare: as long as the
ambient dimension is large enough relative to the manifold's dimension, almost every direction
\(v\) avoids them.
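As a concrete illustration of the projection and its two failure modes, here is a minimal Python sketch. The helper name `project_along` and the sample vectors are illustrative choices, not part of the text; the formula \(\pi_v(x) = x - (x_N / v_N)\,v\) is simply the unique point of the line \(x + \mathbb{R}v\) with vanishing last coordinate.

```python
def project_along(x, v):
    """Project x in R^N along the line R*v onto the hyperplane {x_N = 0}.

    Returns the first N-1 coordinates of x - (x_N / v_N) * v.
    """
    if v[-1] == 0:
        raise ValueError("v must have nonzero last coordinate")
    c = x[-1] / v[-1]
    return tuple(xi - c * vi for xi, vi in zip(x[:-1], v[:-1]))


v = (1.0, 2.0, 4.0)

# Injectivity fails when two points differ by a multiple of v.
p = (0.5, -1.0, 3.0)
q = tuple(pi + 2.5 * vi for pi, vi in zip(p, v))
collapsed = project_along(p, v) == project_along(q, v)

# The immersion property fails when a tangent vector points along v.
# The curve t -> (t^2, t^3, t) has velocity (0, 0, 1) at t = 0, so projecting
# along e3 = (0, 0, 1) yields the cusp t -> (t^2, t^3), whose velocity vanishes at 0.
# Since the projection is linear, its differential is the projection itself.
e3 = (0.0, 0.0, 1.0)
velocity_at_0 = (0.0, 0.0, 1.0)
killed = project_along(velocity_at_0, e3)
```

Here `collapsed` records that the two points have the same image, and `killed` is the image of the tangent vector, which is the zero vector of the plane.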
Lemma (Generic Projections Reduce Codimension)
Suppose \(M \subseteq \mathbb{R}^N\) is a smooth \(n\)-dimensional submanifold without
boundary. For a vector \(v \in \mathbb{R}^N \setminus \mathbb{R}^{N-1}\) — that is, with
nonzero last coordinate — let \(\pi_v : \mathbb{R}^N \to \mathbb{R}^{N-1}\) denote the
projection with kernel the line \(\mathbb{R}v\), identifying \(\mathbb{R}^{N-1}\) with the
subspace of vectors whose last coordinate is zero. If \(N > 2n + 1\), then the set of vectors
\(v\) for which \(\pi_v|_M\) is an injective immersion of \(M\) into \(\mathbb{R}^{N-1}\) is
dense in \(\mathbb{R}^N \setminus \mathbb{R}^{N-1}\).
Proof.
We translate the two requirements on \(\pi_v\) into conditions on the direction \([v]\), and
then show that the bad directions form a negligible set. For \(\pi_v|_M\) to be
injective, it is necessary and sufficient that no two distinct points
\(p, q \in M\) have \(p - q\) parallel to \(v\): if they did, they would project to the same
point, and conversely. For \(\pi_v|_M\) to be an immersion, it is necessary
and sufficient that no nonzero tangent vector of \(M\) be parallel to \(v\). Indeed, \(\pi_v\)
is linear, so its differential at any point is \(\pi_v\) itself under the usual identification
of the tangent space of \(\mathbb{R}^N\) with \(\mathbb{R}^N\); its kernel is \(\mathbb{R}v\),
and the restriction to \(T_pM\) is injective exactly when \(T_pM\) meets \(\mathbb{R}v\) only
at the origin.
Both conditions say that \([v]\), the direction of \(v\) regarded as a point of the real
projective space \(\mathbb{RP}^{N-1}\), avoids certain forbidden directions. Make this precise
with two smooth maps. Let \(\Delta_M = \{(p, p) : p \in M\}\) be the diagonal of \(M \times M\),
and let \(M_0 = \{(p, 0) : p \in M\}\) be the zero section of the tangent bundle \(TM\). Define
\[
\kappa : (M \times M) \setminus \Delta_M \to \mathbb{RP}^{N-1},
\qquad \kappa(p, q) = [\,p - q\,],
\]
\[
\tau : TM \setminus M_0 \to \mathbb{RP}^{N-1},
\qquad \tau(p, w) = [\,w\,],
\]
where the brackets denote the direction in \(\mathbb{RP}^{N-1}\) of a nonzero vector of
\(\mathbb{R}^N\). Both maps are smooth, being the projection
\(\mathbb{R}^N \setminus \{0\} \to \mathbb{RP}^{N-1}\) composed with smooth maps. By the two
characterizations above, \(\pi_v|_M\) is an injective immersion precisely when \([v]\) lies in
the image of neither \(\kappa\) nor \(\tau\): a point in the image of \(\kappa\) is a
secant direction joining two points of \(M\), and a point in the image of \(\tau\) is a
tangent direction.
Now count dimensions. The domain of \(\kappa\) is an open subset of \(M \times M\), of
dimension \(2n\); the domain of \(\tau\) is an open subset of \(TM\), also of dimension \(2n\).
The target \(\mathbb{RP}^{N-1}\) has dimension \(N - 1\). The hypothesis \(N > 2n + 1\) is
exactly the statement that
\[
2n < N - 1 = \dim \mathbb{RP}^{N-1},
\]
so both \(\kappa\) and \(\tau\) are smooth maps from a manifold of dimension strictly less than
that of their target. By the corollary to Sard's theorem on lower-dimensional images, each image
has measure zero in \(\mathbb{RP}^{N-1}\), and the union of the two images has measure zero as
well. A set of measure zero contains no nonempty open set, so its complement, the set of
admissible directions, is dense in
\(\mathbb{RP}^{N-1}\). The vectors \(v\) with nonzero last coordinate are exactly those whose
direction lies in the dense open set \(\mathbb{RP}^{N-1} \setminus \mathbb{RP}^{N-2}\); since a
dense set meets every nonempty open set, the admissible directions remain dense within this
open set, and taking directions to vectors shows the admissible \(v\) are dense in
\(\mathbb{R}^N \setminus \mathbb{R}^{N-1}\). \(\blacksquare\)
Where Sard does the work
The condition \(N > 2n + 1\), equivalently \(2n < N - 1\), is the entire mechanism. A
manifold of dimension \(n\) carries an \(n\)-dimensional family of points and, at each, an
\(n\)-dimensional family of tangent directions; the secant and tangent directions it generates
thus sweep out at most a \(2n\)-dimensional family inside the \((N-1)\)-dimensional space of all
directions. Whenever \(2n\) falls short of \(N - 1\), this family is a lower-dimensional image,
and Sard's theorem declares it negligible — leaving a dense set of safe directions to project
along. The space-filling curve we met alongside Sard's theorem is the reminder that
smoothness, not mere continuity, is what keeps the bad directions confined to a negligible set.
With one dimension removed, we simply repeat. The proof of the lemma uses only that \(M\) sits in
\(\mathbb{R}^N\) by an injective immersion, so it applies equally to the projected image. As long
as the ambient dimension still exceeds \(2n + 1\), the lemma supplies a direction whose
projection keeps the manifold injectively immersed in one fewer dimension; applying it again and
again, we descend until the ambient dimension is exactly \(2n + 1\), where the hypothesis
\(N > 2n + 1\) first fails and the descent halts. This produces an injective immersion of \(M\)
into \(\mathbb{R}^{2n+1}\).
An injective immersion is not yet an embedding, however, and the gap matters. When \(M\) is
compact, there is no gap at all: a continuous injection from a compact space into a Hausdorff
space is automatically a topological embedding, so an injective immersion of a compact manifold
is already a smooth embedding.
When \(M\) is noncompact, an injective immersion can wind through space in a way that fails to be a
homeomorphism onto its image — approaching itself without ever meeting, as the dense line on a
torus does. Closing this gap is the task of the next section.
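The bookkeeping of the descent described above is pure arithmetic, and a short sketch may help fix it. The function names `lemma_applies` and `descent` are hypothetical; the code only tracks ambient dimensions and says nothing about the choice of directions.

```python
def lemma_applies(n, N):
    """Hypothesis of the generic projection lemma: 2n < N - 1."""
    return 2 * n < N - 1


def descent(n, N):
    """Ambient dimensions visited when projecting one dimension at a time."""
    dims = [N]
    while lemma_applies(n, dims[-1]):
        dims.append(dims[-1] - 1)
    return dims


print(descent(2, 9))  # [9, 8, 7, 6, 5]
```

For a surface (\(n = 2\)) starting in \(\mathbb{R}^9\), the descent stops at \(\mathbb{R}^5 = \mathbb{R}^{2n+1}\); for a curve already in \(\mathbb{R}^3\) the hypothesis fails at once and no projection is taken.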
From Immersion to Proper Embedding
The compact case is settled, so suppose \(M\) is noncompact. We have an injective immersion into
some \(\mathbb{R}^N\), but to call it an embedding we must control how the image escapes to
infinity. The right notion is properness: a continuous map is proper when the preimage of every
compact set is compact, and a proper injective immersion is automatically an embedding.
The goal of this section is to upgrade an arbitrary smooth embedding of \(M\) in some
\(\mathbb{R}^N\) to a proper smooth embedding into \(\mathbb{R}^{2n+1}\). The geometric device
that makes this possible is a tube.
Definition: Tube
Given a one-dimensional linear subspace \(S \subseteq \mathbb{R}^N\) and a positive number
\(R\), the tube with axis \(S\) and radius \(R\) is the open set of points
whose distance from \(S\) is less than \(R\):
\[
T_R(S) = \{\, x \in \mathbb{R}^N : \lvert x - y \rvert < R \text{ for some } y \in S \,\}.
\]
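Membership in a tube is easy to test numerically, since the nearest point of the axis to \(x\) is its orthogonal projection onto \(S\). The following is a minimal sketch; the names `distance_to_line` and `in_tube` are illustrative, and the axis is given by any nonzero spanning vector `s`.

```python
import math


def distance_to_line(x, s):
    """Distance from x to the line R*s through the origin (s nonzero)."""
    ss = sum(si * si for si in s)
    t = sum(xi * si for xi, si in zip(x, s)) / ss  # nearest point of the line is t*s
    return math.sqrt(sum((xi - t * si) ** 2 for xi, si in zip(x, s)))


def in_tube(x, s, R):
    """True when x lies in the open tube of radius R about the axis R*s."""
    return distance_to_line(x, s) < R


axis = (0.0, 0.0, 1.0)
far_along_axis = in_tube((0.5, 0.0, 100.0), axis, 1.0)  # inside: the tube is unbounded
on_the_surface = in_tube((1.0, 0.0, 5.0), axis, 1.0)    # outside: the tube is open
```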
Lemma (Upgrading to a Proper Embedding)
Let \(M\) be a smooth \(n\)-manifold with or without boundary. If \(M\) admits a smooth
embedding into \(\mathbb{R}^N\) for some \(N\), then it admits a proper smooth embedding into
\(\mathbb{R}^{2n+1}\).
Proof.
The argument has two parts: first arrange that the embedding is proper and confined to a tube,
without regard to dimension; then lower the dimension to \(2n+1\) while preserving properness.
Part 1: a proper embedding inside a tube. Let \(F : M \to \mathbb{R}^N\) be
the given smooth embedding. Choose a diffeomorphism \(G : \mathbb{R}^N \to \mathbb{B}^N\) onto
the open unit ball, and a smooth exhaustion function \(f : M \to \mathbb{R}\), that is, a smooth
function whose sublevel sets \(f^{-1}((-\infty, c])\) are all compact. Define
\[
\Psi : M \to \mathbb{R}^N \times \mathbb{R},
\qquad \Psi(p) = \big(G \circ F(p),\, f(p)\big).
\]
Because \(G \circ F\) is an embedding, \(\Psi\) is an injective immersion. It is moreover
proper: if \(K\) is compact, then \(\Psi^{-1}(K)\) is a closed subset of
\(f^{-1}((-\infty, c])\) for any \(c\) bounding the last coordinate on \(K\) from above, and that
sublevel set is compact, so \(\Psi^{-1}(K)\) is compact. A proper injective immersion is an
embedding, and by construction its image lies in the tube \(\mathbb{B}^N \times \mathbb{R}\) of
radius \(1\), whose axis is
the last coordinate line. Renaming \(N + 1\) as \(N\), we may now assume that \(M\) admits a
proper smooth embedding into \(\mathbb{R}^N\) whose image lies in some tube \(T_R(S)\).
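Part 1 can be tried out on the simplest noncompact example, \(M = \mathbb{R}\) with \(N = 1\). The specific choices below (the non-proper embedding \(\tanh\), the diffeomorphism \(x \mapsto x/\sqrt{1 + x^2}\), and the exhaustion function \(t^2\)) are assumptions made for illustration only; the proof works with any such \(F\), \(G\), and \(f\).

```python
import math


def F(t):
    """A smooth embedding of R in R with bounded image (-1, 1); it is not proper."""
    return math.tanh(t)


def G(x):
    """A diffeomorphism of R onto the open unit ball (-1, 1)."""
    return x / math.sqrt(1.0 + x * x)


def f(t):
    """An exhaustion function on R: each sublevel set {t^2 <= c} is compact."""
    return t * t


def Psi(t):
    """The map p -> (G(F(p)), f(p)) into R x R."""
    return (G(F(t)), f(t))


samples = [k / 4.0 for k in range(-40, 41)]
images = [Psi(t) for t in samples]

# Sample points whose image has last coordinate at most 4: they fill out [-2, 2].
sublevel = [t for t in samples if Psi(t)[1] <= 4.0]
```

Every image point has first coordinate of absolute value less than \(1\), so the image lies in the tube of radius \(1\) about the last coordinate axis, while the last coordinate \(t^2\) forces points to leave every compact set as \(\lvert t \rvert \to \infty\).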
Part 2: lowering the dimension while staying proper. Identify \(M\) with its
image, a properly embedded submanifold contained in the tube \(T_R(S)\). Suppose
\(N > 2n + 1\). By the generic projection lemma, the directions \(v\) for which \(\pi_v|_M\) is
an injective immersion are dense, so we may choose such a \(v\) that additionally does not lie
in the axis \(S\), since a dense set cannot be contained in the line \(S\). The image
\(\pi_v(S)\) is then a one-dimensional subspace of \(\mathbb{R}^{N-1}\), and because \(\pi_v\)
is a bounded linear map, it carries the tube \(T_R(S)\) into a tube around \(\pi_v(S)\); thus
\(\pi_v(M)\) again lies in a tube.
It remains to verify that \(\pi_v|_M\) is proper. Let \(K \subseteq \mathbb{R}^{N-1}\) be
compact, hence contained in the ball of some radius \(R_1\) about the origin. For any
\(x \in \pi_v^{-1}(K) \cap M\), writing \(\pi_v(x) = x - cv\) for the appropriate scalar
\(c\), the bound \(\lvert \pi_v(x) \rvert < R_1\) places \(x\) in the tube of radius \(R_1\)
about the line \(\mathbb{R}v\). At the same time \(x \in M\) lies in the tube \(T_R(S)\) about
\(S\). So \(M \cap \pi_v^{-1}(K)\) is contained in the intersection of two tubes, one with axis
\(S = \mathbb{R}s\) and one with axis \(\mathbb{R}v\), where \(s\) and \(v\) are not parallel
because \(v \notin S\). Such an intersection is bounded. Indeed, a point \(x\) in both tubes
satisfies \(\lvert x - a s \rvert < R\) and \(\lvert x - b v \rvert < R_1\) for some
scalars \(a, b\), so by the triangle inequality \(\lvert a s - b v \rvert < R + R_1\). Since
\(s\) and \(v\) are linearly independent, the norm \(\lvert a s - b v \rvert\) is bounded below
by a positive multiple of \(\lvert (a, b) \rvert\) — two norms on the two-dimensional space
\(\operatorname{span}\{s, v\}\) being equivalent — so \(\lvert a \rvert\) and \(\lvert b
\rvert\) are bounded, and then \(\lvert x \rvert \le \lvert a s \rvert + R\) is bounded as well.
Since \(M\) is properly embedded, it is closed in \(\mathbb{R}^N\), so \(M \cap \pi_v^{-1}(K)\)
is closed and bounded in \(\mathbb{R}^N\), hence compact. Thus \(\pi_v|_M\) is proper, hence an
embedding. The image is a properly embedded submanifold of \(\mathbb{R}^{N-1}\) contained in a
tube. Iterating this step lowers the ambient dimension one at a time until it reaches \(2n+1\).
\(\blacksquare\)
Why two nonparallel tubes meet in a bounded set
The properness of the lowered projection rests on a fact one can see in three dimensions. A
tube about a line is an infinite cylinder; two infinite cylinders whose axes are parallel
intersect in another infinite region, but two whose axes cross at an angle can only overlap
near the crossing, because far from it each cylinder has drifted away from the other's axis by
more than its radius. The angle between the axes \(S\) and \(\mathbb{R}v\) is positive
precisely because \(v\) was chosen outside \(S\), and that positive angle is what confines the
intersection to a bounded neighborhood of the origin, turning a statement about preimages of
compact sets into elementary Euclidean geometry.
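The bound can be made explicit and checked numerically. For unit vectors \(s, v\) with \(c = \lvert s \cdot v \rvert < 1\), one has \(\lvert a s - b v \rvert^2 = a^2 + b^2 - 2ab\,(s \cdot v) \ge (1 - c)(a^2 + b^2)\), so any point of both tubes satisfies \(\lvert x \rvert < (R + R_1)/\sqrt{1 - c} + R\). The sketch below samples random points and compares them against this bound; the particular vectors, radii, and sample size are arbitrary illustrative choices.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def dist_to_line(x, u):
    """Distance from x to the line R*u, for a unit vector u."""
    t = dot(x, u)
    return norm([xi - t * ui for xi, ui in zip(x, u)])


def intersection_bound(s, v, R, R1):
    """Upper bound for |x| on the intersection of the two tubes (s, v unit, nonparallel)."""
    c = abs(dot(s, v))  # |cos| of the angle between the axes, strictly less than 1
    return (R + R1) / math.sqrt(1.0 - c) + R


s = (0.0, 0.0, 1.0)
v = (0.6, 0.0, 0.8)
R, R1 = 1.0, 2.0
bound = intersection_bound(s, v, R, R1)

random.seed(0)
hits, worst = 0, 0.0
for _ in range(50000):
    x = [random.uniform(-10.0, 10.0) for _ in range(3)]
    if dist_to_line(x, s) < R and dist_to_line(x, v) < R1:
        hits += 1
        worst = max(worst, norm(x))
```

Every sampled point lying in both tubes has norm below `bound`, which is roughly \(7.7\) for these choices; the estimate is not sharp, only finite.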
The Whitney Embedding Theorem
The two lemmas reduce the theorem to a single remaining task: produce a smooth embedding of
\(M\) into some Euclidean space at all. Once we have that, the codimension-reduction
lemma carries it down to \(\mathbb{R}^{2n+1}\) — directly when \(M\) is compact, and through the
proper-embedding upgrade when it is not. The construction of an initial embedding is a patching
argument: cover \(M\) by charts, and assemble the local coordinate maps into a single global map
using bump functions to blend them. We treat the compact and noncompact cases in turn, since the
bookkeeping of the patching differs.
Theorem (Whitney Embedding Theorem)
Every smooth \(n\)-manifold with or without boundary admits a proper smooth embedding into
\(\mathbb{R}^{2n+1}\).
Proof.
By the two preceding lemmas it suffices to embed \(M\) smoothly into some Euclidean space;
codimension reduction and the proper-embedding upgrade then deliver a proper embedding into
\(\mathbb{R}^{2n+1}\).
The compact case. Cover \(M\) by finitely many regular coordinate balls (or regular coordinate
half-balls at boundary points) \(B_1, \dots, B_m\), where each \(B_i\) sits inside a slightly
larger coordinate domain \(B_i'\) carrying a chart \(\varphi_i : B_i' \to \mathbb{R}^n\). For
each \(i\) choose a smooth bump function \(\rho_i : M \to \mathbb{R}\) equal to \(1\) on
\(\overline{B_i}\) and supported in \(B_i'\). Define
\[
F = \big(\rho_1 \varphi_1, \dots, \rho_m \varphi_m,\, \rho_1, \dots, \rho_m\big)
: M \to \mathbb{R}^{nm + m},
\]
where each product \(\rho_i \varphi_i\), defined on \(B_i'\), is extended by zero outside the
support of \(\rho_i\) to a smooth map on all of \(M\).
This \(F\) is injective. Suppose \(F(p) = F(q)\). Since the \(B_i\) cover \(M\), we have
\(p \in B_i\) for some \(i\), whence \(\rho_i(p) = 1\); matching the last \(m\) coordinates gives
\(\rho_i(q) = 1\), so \(q \in \operatorname{supp} \rho_i \subseteq B_i'\). Then matching the
block \(\rho_i \varphi_i\) gives \(\varphi_i(p) = \varphi_i(q)\), and since \(\varphi_i\) is
injective on \(B_i'\), we conclude \(p = q\). It is also an immersion: near any \(p\), choosing
\(i\) with \(p \in B_i\), the function \(\rho_i\) is identically \(1\) on a neighborhood, so
there \(d(\rho_i \varphi_i)_p = d(\varphi_i)_p\), which is injective. An injective immersion of
a compact manifold is an embedding, and the compact case is done.
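The patching construction can be carried out by hand for the circle with \(m = 2\) charts, giving a map into \(\mathbb{R}^{nm + m} = \mathbb{R}^4\). The following Python sketch is one possible instance, with assumed choices throughout: angle charts centred at \(0\) and \(\pi\) on the domains \(\lvert u \rvert < 3\), coordinate balls \(\lvert u \rvert < 2\), and a bump function built from \(e^{-1/t}\). It checks injectivity only on a finite grid, which illustrates the argument rather than proving it.

```python
import math


def wrap(theta):
    """Representative of an angle in [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def h(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0


def smooth_step(t):
    """Smooth function equal to 0 for t <= 0 and to 1 for t >= 1."""
    return h(t) / (h(t) + h(1.0 - t))


def rho(u):
    """Bump function of a chart coordinate: 1 for |u| <= 2, 0 for |u| >= 2.5."""
    return smooth_step((2.5 - abs(u)) / 0.5)


def F(theta):
    """The map (rho_1 phi_1, rho_2 phi_2, rho_1, rho_2) on the circle."""
    u1 = wrap(theta)            # chart 1, centred at angle 0
    u2 = wrap(theta - math.pi)  # chart 2, centred at angle pi
    r1, r2 = rho(u1), rho(u2)
    # Each product is zero wherever its chart coordinate jumps, so F is well defined.
    return (r1 * u1, r2 * u2, r1, r2)


K = 180
images = [F(2.0 * math.pi * k / K) for k in range(K)]
min_gap = min(
    math.dist(images[i], images[j]) for i in range(K) for j in range(i + 1, K)
)
```

The quantity `min_gap` is the smallest distance between images of distinct grid points; it stays well away from zero, as the injectivity argument predicts.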
The noncompact case. Here we cannot use finitely many charts, and a naive
countable sum need not even be smooth, since infinitely many terms could be nonzero at a point.
The fix is to organize the charts into slabs that overlap only with their immediate neighbors,
and then to separate the slabs into two interleaved families so that within each family the
supports are disjoint.
Let \(f : M \to \mathbb{R}\) be a smooth exhaustion function. By Sard's theorem, for each
nonnegative integer \(i\) we may pick regular values \(a_i, b_i\) of \(f\) with
\(i < a_i < b_i < i + 1\). Define the compact slabs and a fattened version of them,
\[
D_i = f^{-1}\big([i, i+1]\big),
\qquad
E_i = f^{-1}\big([b_{i-1}, a_{i+1}]\big)
\quad (i \ge 1),
\]
with \(D_0 = f^{-1}((-\infty, 1])\) and \(E_0 = f^{-1}((-\infty, a_1])\). Because \(a_{i+1}\)
and \(b_{i-1}\) are regular values of \(f\), the differential of \(f\) is nonvanishing on a
neighborhood of \(\partial E_i = f^{-1}(b_{i-1}) \cup f^{-1}(a_{i+1})\). The open slab
\(f^{-1}((b_{i-1}, a_{i+1}))\) is an open submanifold, and along each boundary level set the
regular-value condition lets us straighten \(f\) into a coordinate, exhibiting a neighborhood of
the boundary as a half-space chart; the two pieces fit together to make \(E_i\) a smooth
manifold with boundary, compact because \(f\) is an exhaustion. Its boundary level sets are
regular level sets, smooth hypersurfaces in \(M\). The slabs satisfy \(M = \bigcup_i D_i\), each
\(D_i \subseteq \operatorname{Int} E_i\), and \(E_i \cap E_j = \varnothing\) unless
\(j \in \{i-1, i, i+1\}\): the fattened slabs meet only their immediate neighbors.
Each \(E_i\) is compact, so by the compact case together with the proper-embedding upgrade it
embeds smoothly in \(\mathbb{R}^{2n+1}\); call the embedding \(\varphi_i\), and let
\(\rho_i\) be a smooth bump function equal to \(1\) on a neighborhood of \(D_i\) and supported in
\(\operatorname{Int} E_i\). Now define
\[
F = \Big( \textstyle\sum_{i \text{ even}} \rho_i \varphi_i,\;
\sum_{i \text{ odd}} \rho_i \varphi_i,\; f \Big)
: M \to \mathbb{R}^{2n+1} \times \mathbb{R}^{2n+1} \times \mathbb{R}.
\]
Within each parity the supports are disjoint, since even-indexed \(E_i\) overlap only
odd-indexed ones, so each of the first two sums has at most one nonzero term near any point and
is therefore smooth. The map \(F\) is proper because its last component is the exhaustion
function \(f\), whose sublevel sets are compact. It is injective: if \(F(p) = F(q)\), then
\(f(p) = f(q)\), so \(p\) and \(q\) lie in a common slab \(D_i\), where \(\rho_i \equiv 1\) and
the parity block of \(i\) reduces to \(\varphi_i\); thus \(\varphi_i(p) = \varphi_i(q)\) and
\(p = q\). It is an immersion because near any \(p \in D_i\) the same block agrees with the
embedding \(\varphi_i\), whose differential is injective. A proper injective immersion is an
embedding, so \(F\) embeds \(M\) in \(\mathbb{R}^{4n+3}\), completing the noncompact case.
Codimension reduction, in the form of the proper-embedding upgrade, then brings the image down
to \(\mathbb{R}^{2n+1}\). \(\blacksquare\)
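The combinatorics of the slabs depends only on the intervals of values of \(f\), so it can be checked mechanically. The sketch below assumes, purely for illustration, that \(a_i = i + \tfrac13\) and \(b_i = i + \tfrac23\) happen to be regular values; in general Sard's theorem only guarantees that suitable values exist in each interval \((i, i+1)\).

```python
from fractions import Fraction


def a(i):
    """Assumed regular value in (i, i + 1); the choice i + 1/3 is hypothetical."""
    return i + Fraction(1, 3)


def b(i):
    """Assumed regular value with a(i) < b(i) < i + 1; also hypothetical."""
    return i + Fraction(2, 3)


def D(i):
    """Range of f on the slab D_i, for i >= 1, as a closed interval."""
    return (Fraction(i), Fraction(i + 1))


def E(i):
    """Range of f on the fattened slab E_i, for i >= 1."""
    return (b(i - 1), a(i + 1))


def disjoint(I, J):
    return I[1] < J[0] or J[1] < I[0]


def strictly_inside(I, J):
    return J[0] < I[0] and I[1] < J[1]


indices = range(1, 21)
nested = all(strictly_inside(D(i), E(i)) for i in indices)
neighbors_only = all(
    disjoint(E(i), E(j)) == (abs(i - j) >= 2) for i in indices for j in indices
)
```

Both checks succeed: each \(D_i\) sits strictly inside \(E_i\), and two fattened intervals are disjoint exactly when their indices differ by at least \(2\), which is why splitting by parity leaves disjoint supports within each sum.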
Two consequences are worth stating explicitly. The first simply rephrases the theorem in the
language of submanifolds; the second records that, in high enough codimension, embeddings are not
merely possible but typical.
Corollary: Every Manifold Is a Euclidean Submanifold
Every smooth \(n\)-manifold with or without boundary is diffeomorphic to a properly embedded
submanifold (with or without boundary) of \(\mathbb{R}^{2n+1}\).
This is the theorem's headline reading: the abstract manifolds we built from charts, with no
ambient space in sight, are no more general than the concrete submanifolds of Euclidean space.
There is no smooth manifold that cannot be realized as a surface sitting inside some
\(\mathbb{R}^N\) — and not just some \(N\), but one no larger than \(2n + 1\).
Corollary: Approximation by Embeddings
Suppose \(M\) is a compact smooth \(n\)-manifold with or without boundary. If
\(N \ge 2n + 1\), then every smooth map \(M \to \mathbb{R}^N\) can be uniformly approximated by
embeddings.
Proof Sketch.
Let \(f : M \to \mathbb{R}^N\) be smooth and let \(F : M \to \mathbb{R}^{2n+1}\) be a Whitney
embedding. The product \(G = f \times F : M \to \mathbb{R}^N \times \mathbb{R}^{2n+1}\) is an
embedding, since \(F\) alone already separates points and tangent vectors, and \(f\) is
recovered as \(\pi \circ G\) for the projection \(\pi\) onto the first factor. Applying the
codimension-reduction lemma to \(G\) produces projections arbitrarily close to \(\pi\) that
remain embeddings; composing, one obtains embeddings \(M \to \mathbb{R}^N\) arbitrarily close
to \(f\). \(\blacksquare\)
If only an immersion is required, rather than an embedding, the target dimension can be lowered by
one. The proof is a variant of the projection argument — one tracks tangent directions alone,
without the secant directions that injectivity demands — and we state the result without repeating
that analysis here, since it is not needed for what follows in the manifold series.
Theorem (Whitney Immersion Theorem)
Every smooth \(n\)-manifold with or without boundary admits a smooth immersion into
\(\mathbb{R}^{2n}\).
The manifold hypothesis and the cost of a low-dimensional representation
Much of modern machine learning is motivated by the manifold hypothesis: the empirical belief
that high-dimensional data (images, sound, sensor streams) does not fill
its ambient space but clusters near a low-dimensional manifold, because comparatively few
latent factors generate it. A \(64 \times 64\) grayscale face lives in \(\mathbb{R}^{4096}\),
yet the realistic faces are thought to form a far thinner set, parametrized by a handful of
factors such as pose, lighting, and identity. Whether real data actually lies near such a
manifold is an empirical question the embedding theorem cannot settle. What the theorem does
settle is the converse half of the picture: if a phenomenon is governed by \(d\)
intrinsic degrees of freedom and so carries the structure of a smooth \(d\)-manifold, then that
structure can always be realized concretely as a submanifold of a Euclidean space — and of one
whose dimension is at most \(2d + 1\), no matter how the data was originally presented. The
habit of drawing data as a low-dimensional surface sitting in a high-dimensional space is, to
that extent, not merely a metaphor: an abstract \(d\)-manifold genuinely embeds in a Euclidean
space of controlled dimension, and the negligibility of lower-dimensional submanifolds is why
such a set occupies no volume in whatever higher-dimensional space contains it. That the
smoothness in this
story is essential — that mere continuity would let a one-dimensional signal fill a square — is
the lesson of the space-filling curve attached to Sard's theorem. The same genericity that
drives the dimension count here reappears, in a transverse guise, when embedded submanifolds
are deformed to intersect cleanly; that is a development for another page.