The Whitney Embedding Theorem

Reducing the Codimension by Generic Projection

We have developed manifolds as abstract objects, glued from charts with no reference to any surrounding space. Yet our pictures of them have always been of surfaces sitting in \(\mathbb{R}^3\), or curves in the plane — subsets of a Euclidean space. The first application of Sard's theorem justifies that habit completely: every smooth manifold can be realized as a submanifold of some \(\mathbb{R}^N\), and in fact of one whose dimension is controlled by the manifold's own. This is the Whitney embedding theorem, and the work of this page is to prove it.

The strategy has two movements. It is comparatively easy to embed a manifold into some Euclidean space of possibly enormous dimension, by patching together finitely or countably many charts. The substance of the theorem lies in driving that dimension back down to a controlled value. The engine for the descent is Sard's theorem in the form proved on the previous page: the image of a smooth map from a lower-dimensional domain is negligible, so almost every direction is safe to project along. We begin with the single step that removes one dimension, then iterate it.

Fix a manifold already sitting inside \(\mathbb{R}^N\) and a vector \(v\) whose last coordinate is nonzero; projecting along the line \(\mathbb{R}v\) onto the coordinate hyperplane \(\mathbb{R}^{N-1}\) of vectors with vanishing last coordinate lowers the ambient dimension by one. This projection can fail to be injective — if two points of the manifold differ by a multiple of \(v\), they collapse onto each other — and it can fail to be an immersion — if some tangent vector points along \(v\), it is killed. The next lemma shows that both failures are rare: as long as the ambient dimension is large enough relative to the manifold's dimension, almost every direction \(v\) avoids them.
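To make the projection concrete, here is one explicit formula consistent with the description above. The coordinate notation \(x = (x_1, \dots, x_N)\), \(v = (v_1, \dots, v_N)\) with \(v_N \neq 0\) is introduced only for this illustration: \[ \pi_v(x) = x - \frac{x_N}{v_N}\, v, \] read as a point of \(\mathbb{R}^{N-1}\) by discarding its last coordinate, which vanishes. The map is linear, it restricts to the identity on the hyperplane \(x_N = 0\), and \(\pi_v(x) = 0\) exactly when \(x = (x_N / v_N)\, v\), so its kernel is \(\mathbb{R}v\). As a small check with \(N = 3\), \(v = (1, 0, 1)\) and \(x = (3, 4, 2)\): \[ \pi_v(x) = (3, 4, 2) - \tfrac{2}{1}(1, 0, 1) = (1, 4, 0). \]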

Lemma (Generic Projections Reduce Codimension)

Suppose \(M \subseteq \mathbb{R}^N\) is a smooth \(n\)-dimensional submanifold without boundary. For a vector \(v \in \mathbb{R}^N \setminus \mathbb{R}^{N-1}\), that is, one with nonzero last coordinate, let \(\pi_v : \mathbb{R}^N \to \mathbb{R}^{N-1}\) denote the projection with kernel the line \(\mathbb{R}v\), identifying \(\mathbb{R}^{N-1}\) with the subspace of vectors whose last coordinate is zero. If \(N \gt 2n + 1\), then the set of vectors \(v\) for which \(\pi_v|_M\) is an injective immersion of \(M\) into \(\mathbb{R}^{N-1}\) is dense in \(\mathbb{R}^N \setminus \mathbb{R}^{N-1}\).

Proof.

We translate the two requirements on \(\pi_v\) into conditions on the direction \([v]\), and then show that the bad directions form a negligible set. For \(\pi_v|_M\) to be injective, it is necessary and sufficient that no two distinct points \(p, q \in M\) have \(p - q\) parallel to \(v\): if they did, they would project to the same point, and conversely. For \(\pi_v|_M\) to be an immersion, it is necessary and sufficient that no nonzero tangent vector of \(M\) be parallel to \(v\). Indeed, \(\pi_v\) is linear, so its differential at any point is \(\pi_v\) itself under the usual identification of the tangent space of \(\mathbb{R}^N\) with \(\mathbb{R}^N\); its kernel is \(\mathbb{R}v\), and the restriction to \(T_pM\) is injective exactly when \(T_pM\) meets \(\mathbb{R}v\) only at the origin.

Both conditions say that \([v]\), the direction of \(v\) regarded as a point of the real projective space \(\mathbb{RP}^{N-1}\), avoids certain forbidden directions. Make this precise with two smooth maps. Let \(\Delta_M = \{(p, p) : p \in M\}\) be the diagonal of \(M \times M\), and let \(M_0 = \{(p, 0) : p \in M\}\) be the zero section of the tangent bundle \(TM\). Define \[ \begin{aligned} \kappa : (M \times M) \setminus \Delta_M &\to \mathbb{RP}^{N-1}, \quad \kappa(p, q) = [\,p - q\,], \\ \tau : TM \setminus M_0 &\to \mathbb{RP}^{N-1}, \quad \tau(p, w) = [\,w\,], \end{aligned} \] where the brackets denote the direction in \(\mathbb{RP}^{N-1}\) of a nonzero vector of \(\mathbb{R}^N\). Both maps are smooth, being the projection \(\mathbb{R}^N \setminus \{0\} \to \mathbb{RP}^{N-1}\) composed with smooth maps. By the two characterizations above, \(\pi_v|_M\) is an injective immersion precisely when \([v]\) lies in the image of neither \(\kappa\) nor \(\tau\): a point in the image of \(\kappa\) is a secant direction joining two points of \(M\), and a point in the image of \(\tau\) is a tangent direction.

Now count dimensions. The domain of \(\kappa\) is an open subset of \(M \times M\), of dimension \(2n\); the domain of \(\tau\) is an open subset of \(TM\), also of dimension \(2n\). The target \(\mathbb{RP}^{N-1}\) has dimension \(N - 1\). The hypothesis \(N \gt 2n + 1\) is exactly the statement that \[ 2n \lt N - 1 = \dim \mathbb{RP}^{N-1}, \] so both \(\kappa\) and \(\tau\) are smooth maps from a manifold of dimension strictly less than that of their target. By the corollary to Sard's theorem on lower-dimensional images, each image has measure zero in \(\mathbb{RP}^{N-1}\), and the union of the two images has measure zero as well. Its complement — the admissible directions — is therefore dense in \(\mathbb{RP}^{N-1}\). The vectors \(v\) with nonzero last coordinate are exactly those whose direction lies in the dense open set \(\mathbb{RP}^{N-1} \setminus \mathbb{RP}^{N-2}\); since a dense set meets every nonempty open set, the admissible directions remain dense within this open set, and taking directions to vectors shows the admissible \(v\) are dense in \(\mathbb{R}^N \setminus \mathbb{R}^{N-1}\).
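The two failure modes in the proof can be observed numerically. The following is a minimal sketch, not part of the proof: the curve, the function names, and the sample sizes are all choices made here for illustration. It takes a closed curve in \(\mathbb{R}^4\) (so \(n = 1\) and \(N = 4 \gt 2n + 1\)), projects along a secant direction to exhibit a collision, and projects along a direction that is neither secant nor tangent to see sample points stay distinct.

```python
import numpy as np

def project_along(v, x):
    """pi_v: project x in R^N along the line R v onto {last coordinate = 0}.

    Works on a single point of shape (N,) or a batch of shape (m, N).
    """
    v = np.asarray(v, dtype=float)
    x = np.asarray(x, dtype=float)
    if v[-1] == 0.0:
        raise ValueError("v must have nonzero last coordinate")
    c = x[..., -1:] / v[-1]          # scalar c making the last entry of x - c v vanish
    return (x - c * v)[..., :-1]     # drop the (now zero) last coordinate

def curve(t):
    """Illustrative closed curve in R^4 (an assumption of this sketch)."""
    t = np.asarray(t, dtype=float)
    return np.stack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)], axis=-1)

def min_pairwise_distance(points):
    """Smallest distance between two distinct rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)
    return dist.min()

# A secant direction is forbidden: the two endpoints collapse.
p, q = curve(0.3), curve(1.1)
v_bad = p - q
collision = np.allclose(project_along(v_bad, p), project_along(v_bad, q))

# The last coordinate axis is admissible for this particular curve:
# the first three coordinates already separate its points.
samples = curve(np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False))
separation = min_pairwise_distance(project_along([0.0, 0.0, 0.0, 1.0], samples))
```

A finite sample can only suggest injectivity, of course; the lemma itself is what guarantees that admissible directions are dense.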

Where Sard does the work

The condition \(N \gt 2n + 1\), equivalently \(2n \lt N - 1\), is the entire mechanism. A manifold of dimension \(n\) carries an \(n\)-dimensional family of points and, at each, an \(n\)-dimensional family of tangent directions; the secant and tangent directions it generates thus sweep out at most a \(2n\)-dimensional family inside the \((N-1)\)-dimensional space of all directions. Whenever \(2n\) falls short of \(N - 1\), this family is a lower-dimensional image, and Sard's theorem declares it negligible — leaving a dense set of safe directions to project along. The space-filling curve we met alongside Sard's theorem is the reminder that smoothness, not mere continuity, is what keeps the bad directions confined to a negligible set.

With one dimension removed, we simply repeat. As long as the ambient dimension still exceeds \(2n + 1\), the lemma supplies a direction whose projection keeps the manifold an injective immersion in one fewer dimension; applying it again and again, we descend until the ambient dimension is exactly \(2n + 1\), where the hypothesis \(N \gt 2n + 1\) first fails and the descent halts. This produces an injective immersion of \(M\) into \(\mathbb{R}^{2n+1}\).
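As a worked count, under the assumption (made here purely for illustration) that a surface, \(n = 2\), is given as an injectively immersed submanifold of \(\mathbb{R}^9\): the threshold is \(2n + 1 = 5\), and the lemma applies at ambient dimensions \(9, 8, 7, 6\), each of which exceeds \(5\). The descent is \[ \mathbb{R}^9 \to \mathbb{R}^8 \to \mathbb{R}^7 \to \mathbb{R}^6 \to \mathbb{R}^5, \] a total of \(N - (2n + 1) = 9 - 5 = 4\) projections. At \(N = 5\) the inequality \(2n \lt N - 1\) would read \(4 \lt 4\), so the secant and tangent directions are no longer guaranteed to form a measure-zero set and the lemma gives no further step.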

An injective immersion is not yet an embedding, however, and the gap matters. When \(M\) is compact, there is no gap at all: a continuous injection from a compact space into a Hausdorff space is automatically a topological embedding, so an injective immersion of a compact manifold is already a smooth embedding. When \(M\) is noncompact, an injective immersion can wind through space in a way that fails to be a homeomorphism onto its image, approaching itself without ever meeting, as the dense line on a torus does. Closing this gap is the task of the next section.

From Immersion to Proper Embedding

The compact case is settled, so suppose \(M\) is noncompact. We have an injective immersion into some \(\mathbb{R}^N\), but to call it an embedding we must control how the image escapes to infinity. The right notion is properness: a continuous map is proper when the preimage of every compact set is compact, and a proper injective immersion is automatically an embedding. The goal of this section is to upgrade a bare smooth embedding into some \(\mathbb{R}^N\) into a proper smooth embedding into \(\mathbb{R}^{2n+1}\). The geometric device that makes this possible is a tube.

Definition: Tube

Given a one-dimensional linear subspace \(S \subseteq \mathbb{R}^N\) and a positive number \(R\), the tube with axis \(S\) and radius \(R\) is the open set of points whose distance from \(S\) is less than \(R\): \[ T_R(S) = \{\, x \in \mathbb{R}^N : \lvert x - y \rvert \lt R \text{ for some } y \in S \,\}. \]

Lemma (Upgrading to a Proper Embedding)

Let \(M\) be a smooth \(n\)-manifold with or without boundary. If \(M\) admits a smooth embedding into \(\mathbb{R}^N\) for some \(N\), then it admits a proper smooth embedding into \(\mathbb{R}^{2n+1}\).

Proof.

The argument has two parts: first arrange that the embedding is proper and confined to a tube, without regard to dimension; then lower the dimension to \(2n+1\) while preserving properness.

Part 1: a proper embedding inside a tube.
Let \(F : M \to \mathbb{R}^N\) be the given smooth embedding. Choose a diffeomorphism \(G : \mathbb{R}^N \to \mathbb{B}^N\) onto the open unit ball, and a smooth exhaustion function \(f : M \to \mathbb{R}\) — a smooth function whose sublevel sets \(f^{-1}((-\infty, c])\) are all compact. Define \[ \Psi : M \to \mathbb{R}^N \times \mathbb{R}, \quad \Psi(p) = \big(G \circ F(p),\, f(p)\big). \] Because \(G \circ F\) is an embedding, \(\Psi\) is an injective immersion. It is moreover proper: if \(K\) is compact, then \(\Psi^{-1}(K)\) is a closed subset of \(f^{-1}((-\infty, c])\) for any \(c\) bounding the last coordinate on \(K\), and that sublevel set is compact, so \(\Psi^{-1}(K)\) is compact. A proper injective immersion is an embedding, and by construction its image lies in the tube \(\mathbb{B}^N \times \mathbb{R}\), whose axis is the last coordinate line. Renaming \(N + 1\) as \(N\), we may now assume that \(M\) admits a proper smooth embedding into \(\mathbb{R}^N\) whose image lies in some tube \(T_R(S)\).
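The proof only needs some diffeomorphism \(G\) onto the open unit ball. One standard choice, assumed here for illustration and not specified in the text, is \(G(x) = x / \sqrt{1 + \lvert x \rvert^2}\), with inverse \(y \mapsto y / \sqrt{1 - \lvert y \rvert^2}\). The sketch below implements it and builds \(\Psi\) for a hypothetical example: the real line embedded in \(\mathbb{R}^2\) by \(F(t) = (t, 0)\), with exhaustion function \(f(t) = t^2\).

```python
import numpy as np

def ball_map(x):
    """G(x) = x / sqrt(1 + |x|^2): maps R^N into the open unit ball."""
    x = np.asarray(x, dtype=float)
    norm_sq = np.sum(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(1.0 + norm_sq)

def ball_map_inverse(y):
    """Inverse of ball_map: y / sqrt(1 - |y|^2), defined for |y| < 1."""
    y = np.asarray(y, dtype=float)
    norm_sq = np.sum(y * y, axis=-1, keepdims=True)
    return y / np.sqrt(1.0 - norm_sq)

def psi(t):
    """Psi(t) = (G(F(t)), f(t)) for the illustrative F(t) = (t, 0), f(t) = t^2."""
    t = np.asarray(t, dtype=float)
    embedded = np.stack([t, np.zeros_like(t)], axis=-1)   # F(t) in R^2
    squeezed = ball_map(embedded)                          # now inside the unit disk
    return np.concatenate([squeezed, (t ** 2)[..., None]], axis=-1)

ts = np.linspace(-50.0, 50.0, 11)
image = psi(ts)
radial = np.linalg.norm(image[:, :2], axis=1)   # distance from the last coordinate axis
```

Every value in `radial` is below 1, which is the statement that the image lies in the tube of radius 1 about the last coordinate axis, while the last column grows without bound as \(\lvert t \rvert\) does.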

Part 2: lowering the dimension while staying proper.
Identify \(M\) with its image, a properly embedded submanifold contained in the tube \(T_R(S)\). Suppose \(N \gt 2n + 1\). By the generic projection lemma, the directions \(v\) for which \(\pi_v|_M\) is an injective immersion are dense, so we may choose such a \(v\) that additionally does not lie in the axis \(S\). The image \(\pi_v(S)\) is then a one-dimensional subspace of \(\mathbb{R}^{N-1}\), and because \(\pi_v\) is a bounded linear map, it carries the tube \(T_R(S)\) into a tube around \(\pi_v(S)\); thus \(\pi_v(M)\) again lies in a tube.

It remains to verify that \(\pi_v|_M\) is proper. Let \(K \subseteq \mathbb{R}^{N-1}\) be compact, hence contained in the open ball of some radius \(R_1\) about the origin. For any \(x \in \pi_v^{-1}(K) \cap M\), writing \(\pi_v(x) = x - cv\) for the appropriate scalar \(c\), the bound \(\lvert \pi_v(x) \rvert \lt R_1\) places \(x\) in the tube of radius \(R_1\) about the line \(\mathbb{R}v\). At the same time \(x \in M\) lies in the tube \(T_R(S)\) about \(S\). So \(M \cap \pi_v^{-1}(K)\) is contained in the intersection of two tubes, one with axis \(S = \mathbb{R}s\) and one with axis \(\mathbb{R}v\), where \(s\) and \(v\) are not parallel because \(v \notin S\). Such an intersection is bounded. Indeed, a point \(x\) in both tubes satisfies \(\lvert x - a s \rvert \lt R\) and \(\lvert x - b v \rvert \lt R_1\) for some scalars \(a, b\), so by the triangle inequality \(\lvert a s - b v \rvert \lt R + R_1\). Since \(s\) and \(v\) are linearly independent, the norm \(\lvert a s - b v \rvert\) is bounded below by a positive multiple of \(\lvert (a, b) \rvert\) (any two norms on the two-dimensional space \(\operatorname{span}\{s, v\}\) are equivalent), so \(\lvert a \rvert\) and \(\lvert b \rvert\) are bounded, and then \(\lvert x \rvert \le \lvert a s \rvert + R\) is bounded as well. The set \(M \cap \pi_v^{-1}(K)\) is also closed in \(\mathbb{R}^N\): \(M\) is closed because it is properly embedded, and \(\pi_v^{-1}(K)\) is closed because \(\pi_v\) is continuous. A closed and bounded subset of \(\mathbb{R}^N\) is compact, so \(\pi_v|_M\) is proper. A proper injective immersion is an embedding, so the image is a properly embedded submanifold of \(\mathbb{R}^{N-1}\) contained in a tube. Iterating this step lowers the ambient dimension one at a time until it reaches \(2n+1\).

Why two nonparallel tubes meet in a bounded set

The properness of the lowered projection rests on a fact one can see in three dimensions. A tube about a line is an infinite cylinder; two infinite cylinders whose axes are parallel either miss each other or overlap in an unbounded region, but two whose axes cross at an angle can only overlap near the crossing, because far from it each cylinder has drifted away from the other's axis by more than its radius. The angle between the axes \(S\) and \(\mathbb{R}v\) is positive precisely because \(v\) was chosen outside \(S\), and that positive angle is what confines the intersection to a bounded neighborhood of the origin. A statement about preimages of compact sets thereby becomes elementary Euclidean geometry.
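The boundedness argument can be made quantitative. In the sketch below, the constant from the equivalence of norms is taken to be the smallest singular value \(\sigma\) of the matrix with columns \(s\) and \(v\), so that \(\lvert a s - b v \rvert \ge \sigma \lvert (a, b) \rvert\); this gives the explicit bound \(\lvert x \rvert \lt (R + R_1)\lvert s \rvert / \sigma + R\) on the intersection. The function names and the sample vectors are assumptions of this illustration, and the bound is one convenient choice rather than a sharp one.

```python
import numpy as np

def dist_to_line(x, d):
    """Distance from each row of x to the line R d through the origin."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    u = np.asarray(d, dtype=float)
    u = u / np.linalg.norm(u)
    foot = (x @ u)[:, None] * u          # orthogonal projection onto the line
    return np.linalg.norm(x - foot, axis=1)

def intersection_bound(s, v, R, R1):
    """Radius of a ball containing T_R(R s) intersected with T_{R1}(R v)."""
    A = np.column_stack([s, v]).astype(float)
    sigma_min = np.linalg.svd(A, compute_uv=False)[-1]   # values come sorted descending
    if sigma_min < 1e-12:
        raise ValueError("s and v must be linearly independent")
    a_max = (R + R1) / sigma_min         # bounds |a| (and |b|) in |a s - b v| < R + R1
    return a_max * np.linalg.norm(s) + R

s, v = [0.0, 0.0, 1.0], [1.0, 0.0, 1.0]
R, R1 = 1.0, 2.0
bound = intersection_bound(s, v, R, R1)

# Sample a large box and keep the points lying in both tubes.
rng = np.random.default_rng(0)
pts = rng.uniform(-20.0, 20.0, size=(5000, 3))
inside = (dist_to_line(pts, s) < R) & (dist_to_line(pts, v) < R1)
norms_inside = np.linalg.norm(pts[inside], axis=1)
```

As \(v\) approaches the axis \(S\), the singular value \(\sigma\) tends to zero and the bound blows up, which matches the picture of two nearly parallel cylinders overlapping over a very long stretch.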

The Whitney Embedding Theorem

The two lemmas reduce the theorem to a single remaining task: produce a smooth embedding of \(M\) into some Euclidean space at all. Once we have that, the codimension-reduction lemma carries it down to \(\mathbb{R}^{2n+1}\) — directly when \(M\) is compact, and through the proper-embedding upgrade when it is not. The construction of an initial embedding is a patching argument: cover \(M\) by charts, and assemble the local coordinate maps into a single global map using bump functions to blend them. We treat the compact and noncompact cases in turn, since the bookkeeping of the patching differs.

Theorem (Whitney Embedding Theorem)

Every smooth \(n\)-manifold with or without boundary admits a proper smooth embedding into \(\mathbb{R}^{2n+1}\).

Proof.

By the two preceding lemmas it suffices to embed \(M\) smoothly into some Euclidean space; codimension reduction and the proper-embedding upgrade then deliver a proper embedding into \(\mathbb{R}^{2n+1}\).

The compact case. Cover \(M\) by finitely many regular coordinate balls (or regular coordinate half-balls at boundary points) \(B_1, \dots, B_m\), where each \(B_i\) sits inside a slightly larger coordinate domain \(B_i'\) carrying a chart \(\varphi_i : B_i' \to \mathbb{R}^n\). For each \(i\) choose a smooth bump function \(\rho_i : M \to \mathbb{R}\) equal to \(1\) on \(\overline{B_i}\) and supported in \(B_i'\). Define \[ F = \big(\rho_1 \varphi_1, \dots, \rho_m \varphi_m,\, \rho_1, \dots, \rho_m\big) : M \to \mathbb{R}^{nm + m}, \] where each product \(\rho_i \varphi_i\), defined on \(B_i'\), is extended by zero outside the support of \(\rho_i\) to a smooth map on all of \(M\).

This \(F\) is injective. Suppose \(F(p) = F(q)\). Since the \(B_i\) cover \(M\), we have \(p \in B_i\) for some \(i\), whence \(\rho_i(p) = 1\); matching the last \(m\) coordinates gives \(\rho_i(q) = 1\), so \(q \in \operatorname{supp} \rho_i \subseteq B_i'\). Then matching the block \(\rho_i \varphi_i\) gives \(\varphi_i(p) = \varphi_i(q)\), and since \(\varphi_i\) is injective on \(B_i'\), we conclude \(p = q\). It is also an immersion: given any \(p\), choose \(i\) with \(p \in B_i\); the function \(\rho_i\) is identically \(1\) on the open set \(B_i\), so \(d(\rho_i \varphi_i)_p = d(\varphi_i)_p\), which is injective. An injective immersion of a compact manifold is an embedding, and the compact case is done.
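The construction can be carried out by hand for the circle. The sketch below is an illustration under assumptions chosen here, not taken from the text: points of \(S^1\) are described by an angle \(\theta\), there are \(m = 2\) charts given by the signed angle from the centers \(0\) and \(\pi\), the balls \(B_i\) are the arcs of angular radius \(2\), and the bump functions are built from \(e^{-1/t}\). The resulting map lands in \(\mathbb{R}^{nm + m} = \mathbb{R}^4\).

```python
import numpy as np

def smooth_step(t):
    """h(t) = exp(-1/t) for t > 0 and 0 otherwise; smooth on all of R."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)               # placeholder avoids dividing by zero
    return np.where(t > 0, np.exp(-1.0 / safe), 0.0)

def bump(r, inner=2.0, outer=2.4):
    """Smooth cutoff in r >= 0: equal to 1 for r <= inner, 0 for r >= outer."""
    num = smooth_step(outer - r)
    return num / (num + smooth_step(r - inner))  # the two terms never vanish together

def angle_chart(theta, center):
    """Signed angle from `center`, with values in [-pi, pi)."""
    return (theta - center + np.pi) % (2.0 * np.pi) - np.pi

def patched_map(theta):
    """F = (rho_1 phi_1, rho_2 phi_2, rho_1, rho_2) : S^1 -> R^4."""
    theta = np.asarray(theta, dtype=float)
    phis = [angle_chart(theta, c) for c in (0.0, np.pi)]
    rhos = [bump(np.abs(phi)) for phi in phis]
    # Each chart jumps at the point opposite its center, but rho_i is zero there,
    # so the product rho_i * phi_i is the extension by zero used in the proof.
    return np.stack([rhos[0] * phis[0], rhos[1] * phis[1], rhos[0], rhos[1]], axis=-1)

thetas = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
image = patched_map(thetas)
```

Every sample point has at least one of its last two coordinates equal to 1, reflecting that the arcs \(B_1, B_2\) cover the circle, and distinct sample points have distinct images.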

The noncompact case. Here we cannot use finitely many charts, and a naive countable sum need not even be smooth, since infinitely many terms could be nonzero at a point. The fix is to organize the charts into slabs that overlap only with their immediate neighbors, and then to separate the slabs into two interleaved families so that within each family the supports are disjoint.

Let \(f : M \to \mathbb{R}\) be a smooth exhaustion function. By Sard's theorem, for each nonnegative integer \(i\) we may pick regular values \(a_i, b_i\) of \(f\) with \(i \lt a_i \lt b_i \lt i + 1\). Define the compact slabs and a fattened version of them, \[ D_i = f^{-1}\big([i, i+1]\big), \quad E_i = f^{-1}\big([b_{i-1}, a_{i+1}]\big) \quad (i \ge 1), \] with \(D_0 = f^{-1}((-\infty, 1])\) and \(E_0 = f^{-1}((-\infty, a_1])\). Because \(a_{i+1}\) and \(b_{i-1}\) are regular values of \(f\), the differential of \(f\) is nonvanishing along the level sets \(f^{-1}(b_{i-1})\) and \(f^{-1}(a_{i+1})\) that make up \(\partial E_i\), and hence on a neighborhood of them. The open slab \(f^{-1}((b_{i-1}, a_{i+1}))\) is an open submanifold, and along each boundary level set the regular-value condition lets us straighten \(f\) into a coordinate, exhibiting a neighborhood of the boundary as a half-space chart; the two kinds of chart fit together to make \(E_i\) a smooth manifold with boundary, compact because it is a closed subset of a sublevel set of the exhaustion function \(f\). Its boundary components are regular level sets, hence smooth hypersurfaces in \(M\). (For \(E_0\) only the level set \(f^{-1}(a_1)\) occurs.) The slabs satisfy \(M = \bigcup_i D_i\), each \(D_i \subseteq \operatorname{Int} E_i\), and \(E_i \cap E_j = \varnothing\) unless \(j \in \{i-1, i, i+1\}\): the fattened slabs meet only their immediate neighbors.
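The combinatorics of the slabs depends only on the intervals of \(f\)-values, so it can be checked mechanically. The sketch below uses the hypothetical values \(a_i = i + \tfrac13\) and \(b_i = i + \tfrac23\); in the proof these numbers are regular values supplied by Sard's theorem, not fixed fractions, and the function names are likewise introduced here.

```python
from fractions import Fraction

def slab_intervals(count):
    """Return the f-value intervals of D_i and E_i for i = 0, ..., count - 1.

    Assumes the illustrative choice a_i = i + 1/3, b_i = i + 2/3.
    Each interval is a pair (lo, hi) standing for the closed interval [lo, hi].
    """
    def a(i):
        return i + Fraction(1, 3)

    def b(i):
        return i + Fraction(2, 3)

    neg_inf = float("-inf")
    D = [(neg_inf, Fraction(1))]
    E = [(neg_inf, a(1))]
    for i in range(1, count):
        D.append((Fraction(i), Fraction(i + 1)))
        E.append((b(i - 1), a(i + 1)))
    return D, E

def intervals_meet(I, J):
    """Two closed intervals intersect iff each one starts before the other ends."""
    return I[0] <= J[1] and J[0] <= I[1]

D, E = slab_intervals(8)
# Fattened slabs meet exactly when their indices differ by at most one.
neighbors_only = all(
    intervals_meet(E[i], E[j]) == (abs(i - j) <= 1)
    for i in range(8) for j in range(8)
)
```

In particular two fattened slabs whose indices have the same parity and are distinct never meet, which is the property the two interleaved sums in the next step rely on.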

Each \(E_i\) is compact, so by the compact case together with the proper-embedding upgrade it embeds smoothly in \(\mathbb{R}^{2n+1}\); call the embedding \(\varphi_i\), and let \(\rho_i\) be a bump function equal to \(1\) on a neighborhood of \(D_i\) and supported in \(\operatorname{Int} E_i\). Now define \[ F = \Big( \textstyle\sum_{i \text{ even}} \rho_i \varphi_i,\; \sum_{i \text{ odd}} \rho_i \varphi_i,\; f \Big) : M \to \mathbb{R}^{2n+1} \times \mathbb{R}^{2n+1} \times \mathbb{R}, \] where each product \(\rho_i \varphi_i\) is extended by zero outside \(E_i\). Within each parity the supports are disjoint, because an even-indexed \(E_i\) can meet only odd-indexed slabs and vice versa, so each of the first two sums has at most one nonzero term near any point and is therefore smooth. The map \(F\) is proper because its last component is the exhaustion function \(f\): the preimage of a compact set is a closed subset of a sublevel set of \(f\), and those are compact. To see that \(F\) is injective, suppose \(F(p) = F(q)\). Then \(f(p) = f(q)\), so \(p\) and \(q\) lie in a common slab \(D_i\). There \(\rho_i \equiv 1\) and every other term of the same parity vanishes, so the parity block containing \(i\) takes the values \(\varphi_i(p)\) and \(\varphi_i(q)\), and injectivity of \(\varphi_i\) gives \(p = q\). To see that \(F\) is an immersion, fix \(p \in M\) and choose \(i\) with \(p \in D_i\); since \(\rho_i \equiv 1\) on a neighborhood of \(p\), the same parity block agrees with \(\varphi_i\) near \(p\), and \(d(\varphi_i)_p\) is injective. A proper injective immersion is an embedding, completing the noncompact case. The proper-embedding lemma, whose second part is codimension reduction that preserves properness, then brings the ambient dimension from \(4n + 3\) down to \(2n + 1\).

Two consequences are worth stating explicitly. The first simply rephrases the theorem in the language of submanifolds; the second records that, in high enough codimension, embeddings are not merely possible but typical.

Corollary: Every Manifold Is a Euclidean Submanifold

Every smooth \(n\)-manifold with or without boundary is diffeomorphic to a properly embedded submanifold (with or without boundary) of \(\mathbb{R}^{2n+1}\).

This is the theorem's headline reading: the abstract manifolds we built from charts, with no ambient space in sight, are no more general than the concrete submanifolds of Euclidean space. There is no smooth manifold that cannot be realized as a surface sitting inside some \(\mathbb{R}^N\) — and not just some \(N\), but one no larger than \(2n + 1\).

Corollary: Approximation by Embeddings

Suppose \(M\) is a compact smooth \(n\)-manifold with or without boundary. If \(N \ge 2n + 1\), then every smooth map \(M \to \mathbb{R}^N\) can be uniformly approximated by embeddings.

Proof Sketch.

Let \(f : M \to \mathbb{R}^N\) be smooth and let \(F : M \to \mathbb{R}^{2n+1}\) be a Whitney embedding. The product \(G = f \times F : M \to \mathbb{R}^N \times \mathbb{R}^{2n+1}\) is an embedding, since \(F\) alone already separates points and tangent vectors, and \(f\) is recovered as \(\pi \circ G\) for the projection \(\pi\) onto the first factor. Applying the codimension-reduction lemma to \(G\) produces projections arbitrarily close to \(\pi\) that remain embeddings; composing, one obtains embeddings \(M \to \mathbb{R}^N\) arbitrarily close to \(f\).

If only an immersion is required, rather than an embedding, the target dimension can be lowered by one. The proof is a variant of the projection argument — one tracks tangent directions alone, without the secant directions that injectivity demands — and we state the result without repeating that analysis here, since it is not needed for what follows in the manifold series.

Theorem (Whitney Immersion Theorem)

Every smooth \(n\)-manifold with or without boundary admits a smooth immersion into \(\mathbb{R}^{2n}\).

The manifold hypothesis and the cost of a low-dimensional representation

Much of modern machine learning is motivated by the manifold hypothesis: the empirical belief that high-dimensional data — images, sound, sensor streams — does not fill its ambient space but clusters near a low-dimensional manifold, because comparatively few latent factors generate it. A \(64 \times 64\) grayscale face lives in \(\mathbb{R}^{4096}\), yet the realistic faces are thought to form a far thinner set, parametrized by a handful of factors such as pose, lighting, and identity. Whether real data actually lies near such a manifold is an empirical question the embedding theorem cannot settle. What the theorem does settle is the converse half of the picture: if a phenomenon is governed by \(d\) intrinsic degrees of freedom and so carries the structure of a smooth \(d\)-manifold, then that structure can always be realized concretely as a submanifold of a Euclidean space — and of one whose dimension is at most \(2d + 1\), no matter how the data was originally presented. The habit of drawing data as a low-dimensional surface sitting in a high-dimensional space is, to that extent, not merely a metaphor: an abstract \(d\)-manifold genuinely embeds in a Euclidean space of controlled dimension, and the negligibility of lower-dimensional submanifolds is why such a set occupies no volume in whatever space contains it. That the smoothness in this story is essential — that mere continuity would let a one-dimensional signal fill a square — is the lesson of the space-filling curve attached to Sard's theorem. The same genericity that drives the dimension count here reappears, in a transverse guise, when embedded submanifolds are deformed to intersect cleanly; that is a development for another page.
