Sets of Measure Zero in Euclidean Space
A recurring theme of the manifold series has been that the interesting features of a smooth map
live where its differential behaves well. At a point where the differential is surjective, the
map looks locally like a projection and its level sets are submanifolds; at a point where it is
injective, the map looks locally like an inclusion. The points where the differential fails to
be surjective, the critical points, are where this orderly picture breaks
down. One might fear that such points are common enough to spoil the theory. The central result
of this page, Sard's theorem, says the opposite: the values a map takes at its critical
points form a vanishingly small set. To make "vanishingly small" precise on a manifold, where we
have no notion of length or volume to call upon, we first develop the right notion in Euclidean
space and then transport it across charts. That notion is a set of measure zero.
The idea is the most economical possible formalization of "negligible size." Recall the
elementary definition from the analysis of several variables. An open rectangle
in \(\mathbb{R}^n\) is a product \((a^1, b^1) \times \cdots \times (a^n, b^n)\) of open intervals,
and its volume is the product \(\prod_i (b^i - a^i)\) of its side lengths. A
subset is negligible when it can be hidden inside a countable family of such rectangles whose
volumes add up to as little as we please.
Definition: Set of Measure Zero in \(\mathbb{R}^n\)
A subset \(A \subseteq \mathbb{R}^n\) has measure zero if for every
\(\delta > 0\) there is a countable collection of open rectangles \(\{R_i\}\) covering
\(A\) whose total volume satisfies \(\sum_i \operatorname{Vol}(R_i) < \delta\).
Two features of this definition deserve emphasis, because the entire chapter turns on them. The
first is that it makes no use of measure theory: there is no \(\sigma\)-algebra, no measure to be
constructed, only the bare combinatorics of covering a set by boxes. The notion agrees on the nose
with the Lebesgue-null sets one meets in integration theory — the two pick out exactly the same
subsets of \(\mathbb{R}^n\) — but it is logically prior to them, asking only that arbitrarily
economical covers exist. This is what will let us define measure zero on a manifold long before
we have any way to integrate there.
The second feature is the freedom in the shape of the covering sets. Whether we cover by open
rectangles, open balls, or open cubes makes no difference to which sets come out negligible: each
family can be exchanged for another at the cost of a bounded factor in total volume. An open
rectangle is covered by finitely many open cubes of total volume at most \(2^n\) times its own;
a ball of radius \(r\) sits inside a cube of side \(2r\) and contains a cube of side \(2r/\sqrt{n}\),
so balls and cubes trap one another up to a dimensional constant. Because the definition asks only
that the total volume be made smaller than an arbitrary \(\delta\), multiplying the total volume of
every cover by a fixed constant changes nothing. We will use rectangles, balls, and cubes
interchangeably from here on, choosing whichever is most convenient.
A few first consequences of the definition are worth recording, as we will lean on them repeatedly. Any
subset of a set of measure zero again has measure zero, since a cover of the larger set already
covers the smaller one. A countable union of sets of measure zero also has measure zero: given
\(\delta\), cover the \(k\)th set with rectangles of total volume less than \(\delta / 2^k\), and
the combined cover has total volume less than \(\delta\). A single point has measure zero, so
every countable subset of \(\mathbb{R}^n\) does as well.
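The \(\delta / 2^k\) bookkeeping can be made concrete. The sketch below is an illustration only, not part of the development: it covers a finite initial segment of an enumeration of the rationals in \([0, 1]\) by open intervals, the \(k\)th of length \(\delta / 2^{k+1}\), and computes the total length exactly. The function names and the particular enumeration are assumptions chosen for the example.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_denominator):
    """List the rationals p/q in [0, 1] with q <= max_denominator, without repeats."""
    seen = []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.append(x)
    return seen


def cover_countable(points, delta):
    """Return one open interval per point; the k-th (k = 1, 2, ...) has length delta / 2**(k+1)."""
    intervals = []
    for k, p in enumerate(points, start=1):
        half = delta / 2 ** (k + 2)  # half-length, so the full length is delta / 2**(k+1)
        intervals.append((p - half, p + half))
    return intervals


def total_length(intervals):
    """Sum of the lengths of the intervals (exact, since endpoints are Fractions)."""
    return sum(b - a for a, b in intervals)


delta = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover_countable(points, delta)
```

However long the initial segment, the total length stays below \(\delta / 2\), which is why the full countable set is covered with total length less than \(\delta\).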
A Slicing Criterion
The covering definition is simple to state but awkward to verify directly for sets that are not
already presented as small. The following lemma is the workhorse that converts a statement about
an \(n\)-dimensional set into statements about its lower-dimensional slices, which can be handled
by induction. A reader who knows Lebesgue integration will recognize it as a special case of
Fubini's theorem; the point of the elementary proof below is that it needs none of that machinery,
keeping the whole development independent of measure theory.
Lemma (Slicing Criterion for Measure Zero)
Let \(A \subseteq \mathbb{R}^n\) be a compact set, and for \(c \in \mathbb{R}\) write the
slice \(A_c = \{x \in \mathbb{R}^{n-1} : (c, x) \in A\}\). If \(A_c\) has \((n-1)\)-dimensional
measure zero for every \(c \in \mathbb{R}\), then \(A\) has \(n\)-dimensional measure zero.
Proof.
Since \(A\) is compact it is bounded, so we may fix a closed interval \([a, b] \subseteq
\mathbb{R}\) with \(A \subseteq [a, b] \times \mathbb{R}^{n-1}\). Let \(\delta > 0\) be given.
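Fix \(c \in [a, b]\). By hypothesis the slice \(A_c\) has \((n-1)\)-dimensional measure zero,
so it is covered by countably many \((n-1)\)-dimensional open cubes of total \((n-1)\)-volume
less than \(\delta\). The slice is compact, being homeomorphic to the closed subset
\(A \cap (\{c\} \times \mathbb{R}^{n-1})\) of the compact set \(A\), so finitely many of these
cubes, say \(C_1, \dots, C_k\), already cover it. Set \(U_c = C_1 \cup \cdots \cup C_k \subseteq
\mathbb{R}^{n-1}\), an open set containing \(A_c\). We claim there is an open interval \(J_c\)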
about \(c\) such that the part of \(A\) lying over \(J_c\) is trapped in \(J_c \times U_c\):
\[
A \cap (J_c \times \mathbb{R}^{n-1}) \subseteq J_c \times U_c.
\]
If no such interval existed, then for every \(m\) the interval \((c - \tfrac1m, c + \tfrac1m)\)
would contain a point \(c_m\) with some \((c_m, x_m) \in A\) and \(x_m \notin U_c\). The points
\((c_m, x_m)\) lie in the compact set \(A\), so a subsequence converges to a limit
\((c, x) \in A\) with \(x = \lim x_m\). Then \(x \in A_c \subseteq U_c\); but each \(x_m\) lies
in the closed complement \(\mathbb{R}^{n-1} \setminus U_c\), which therefore contains the limit
\(x\) — a contradiction. This proves the claim.
The intervals \(\{J_c : c \in [a, b]\}\) form an open cover of the compact set \([a, b]\), so
finitely many of them, say \(J_{c_1}, \dots, J_{c_N}\), already cover \([a, b]\). By shrinking
each interval where it overlaps a neighbor, which preserves the trapping property because that
property passes to subintervals, we may arrange that no point of \([a, b]\) lies in
more than two of them; their lengths then add up to no more than twice the length of
\([a, b]\), that is, at most \(2|b - a|\). Then \(A\) is contained in the union of the sets
\(J_{c_1} \times U_{c_1}, \dots, J_{c_N} \times U_{c_N}\), each itself a finite union of open
rectangles. The total \(n\)-volume of this cover is
\[
\sum_{j} \operatorname{length}(J_{c_j}) \cdot \operatorname{Vol}_{n-1}(U_{c_j})
< \Big(\sum_j \operatorname{length}(J_{c_j})\Big) \cdot \delta \le 2|b - a|\,\delta .
\]
Since \(\delta\) was arbitrary, this can be made as small as we wish, and so \(A\) has
\(n\)-dimensional measure zero. \(\blacksquare\)
Graphs, Subspaces, and the Image of a Set
The slicing criterion immediately delivers the most important examples. A graph is the cleanest
case: it is one dimension thin by construction, and slicing it along the graphed variable reduces
its dimension by one at a time.
Proposition: Graphs of Continuous Functions Have Measure Zero
Let \(A\) be an open or closed subset of \(\mathbb{R}^{n-1}\) or of the half-space
\(\mathbb{H}^{n-1}\), and let \(f : A \to \mathbb{R}\) be continuous. Then the graph
\(\{(x, f(x)) : x \in A\} \subseteq \mathbb{R}^n\) has measure zero.
Proof.
Assume first that \(A\) is compact, and induct on \(n\). When \(n = 1\) the domain \(A\) is a
compact subset of \(\mathbb{R}^0 = \{0\}\), so the graph is at most a single point and has
measure zero. For the inductive step, intersect the graph with a hyperplane
\(\{x^1 = c\}\): the slice is the graph of \(f\) restricted to \(\{x \in A : x^1 = c\}\), a
continuous function of the remaining \(n - 2\) variables, and so has \((n-1)\)-dimensional
measure zero by the inductive hypothesis. The slicing criterion then gives the graph
\(n\)-dimensional measure zero.
If \(A\) is not compact, it is a countable union of compact subsets, so its graph is a
countable union of sets of measure zero and hence has measure zero. \(\blacksquare\)
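For a single explicit function the conclusion can also be checked by hand. The sketch below is an illustration under stated assumptions, not the slicing argument of the proof: it takes the hypothetical example \(f(x) = x^2\) on \([0, 1]\), covers its graph by \(N\) closed boxes of width \(1/N\), and computes the total area exactly. The function name `graph_cover_area` is introduced here for the example.

```python
from fractions import Fraction


def graph_cover_area(N):
    """Total area of N closed boxes covering the graph of f(x) = x**2 on [0, 1].

    The i-th box is [i/N, (i+1)/N] x [f(i/N), f((i+1)/N)]. Because f is
    increasing on [0, 1], this box contains the part of the graph over its base.
    """
    area = Fraction(0)
    for i in range(N):
        left, right = Fraction(i, N), Fraction(i + 1, N)
        width = right - left
        height = right ** 2 - left ** 2  # oscillation of f over [left, right]
        area += width * height
    return area
```

The heights telescope to \(f(1) - f(0) = 1\), so the total area is exactly \(1/N\), which tends to zero. Enlarging each closed box slightly turns this into a cover by open rectangles of total area as small as desired.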
The flat case follows at once: a proper affine subspace is, after a choice of coordinates, the
graph of an affine function.
Corollary: Proper Affine Subspaces Have Measure Zero
Every proper affine subspace of \(\mathbb{R}^n\) has measure zero.
Proof.
Let \(S \subsetneq \mathbb{R}^n\) be a proper affine subspace. If \(\dim S = n - 1\), then some
coordinate axis is not parallel to \(S\); taking that coordinate as the dependent one exhibits
\(S\) as the graph of an affine function of the other \(n - 1\) coordinates, so the previous
proposition applies. If \(\dim S < n - 1\), then \(S\) is contained in some affine subspace
of dimension \(n - 1\), and a subset of a set of measure zero has measure zero. \(\blacksquare\)
The Invariance That Makes the Theory Possible
We now reach the proposition on which everything else rests. Our goal is to define measure zero on
a manifold, but a manifold carries no volumes: there is nothing to integrate, no preferred way to
measure a box. The natural attempt is to declare a subset of a manifold negligible when its image
in every chart is negligible in \(\mathbb{R}^n\). For this to be a sound definition it must not
depend on which chart we choose, and passing between charts means composing with a smooth
transition map. So the definition will make sense precisely if a smooth map cannot carry a set of
measure zero onto a set that is not of measure zero. The following proposition supplies exactly
this, and it is the quiet engine of the whole chapter.
Proposition: Smooth Maps Preserve Sets of Measure Zero
Suppose \(A \subseteq \mathbb{R}^n\) has measure zero and \(F : A \to \mathbb{R}^n\) is a
smooth map. Then \(F(A)\) has measure zero.
Proof.
Smoothness of \(F\) on the (possibly non-open) set \(A\) means, by definition, that each point
of \(A\) has a neighborhood on which \(F\) extends to a smooth map; shrinking, we may take that
neighborhood to be an open ball \(U\) on whose closure \(\overline{U}\) the extension is still
smooth. Countably many such balls cover \(A\), so \(F(A)\) is a countable union of sets of the
form \(F(A \cap \overline{U})\). Since a countable union of measure-zero sets has measure zero,
it suffices to show each \(F(A \cap \overline{U})\) has measure zero.
Fix one such ball. The closure \(\overline{U}\) is compact, so the derivative of the extended
map is bounded there, say \(\lvert DF(x) \rvert \le C\) for all \(x \in \overline{U}\). Since
\(\overline{U}\) is convex, the mean value inequality then gives
\[
\lvert F(x) - F(x') \rvert \le C \, \lvert x - x' \rvert
\qquad \text{for all } x, x' \in \overline{U},
\]
which is precisely the statement that \(F\) is Lipschitz continuous on \(\overline{U}\) with
constant \(C\).
Now let \(\delta > 0\) be given. Because \(A \cap \overline{U}\) has measure zero, it is
covered by countably many open balls \(\{B_j\}\), each of radius \(r_j\), with
\(\sum_j \operatorname{Vol}(B_j) < \delta\); discarding any \(B_j\) disjoint from
\(\overline{U}\), we may assume each meets \(\overline{U}\). Two points of
\(\overline{U} \cap B_j\) differ by less than the diameter \(2 r_j\) of \(B_j\), so by the
Lipschitz bound their images differ by less than \(2 C r_j\); fixing any one image point as a
center, \(F(\overline{U} \cap B_j)\) is therefore contained in a ball of radius \(2 C r_j\).
Hence \(F(A \cap \overline{U})\) is covered by balls of total volume
\[
\sum_j \operatorname{Vol}\big(\text{ball of radius } 2 C r_j\big)
= (2C)^n \sum_j \operatorname{Vol}(B_j) < (2C)^n \delta .
\]
As \(\delta\) was arbitrary, \(F(A \cap \overline{U})\) has measure zero, and the proof is
complete. \(\blacksquare\)
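The key estimate, that the image of \(\overline{U} \cap B_j\) has diameter less than \(2 C r_j\), can be spot-checked numerically. The sketch below is a sanity check under assumptions, not a proof: it uses the hypothetical example \(F(x, y) = (x^2 - y^2, 2xy)\), written as \(z \mapsto z^2\) on complex numbers, with \(\overline{U}\) the closed unit disk, where the operator norm of \(DF\) is \(2\lvert z \rvert \le 2\), so \(C = 2\). The ball center, radius, and sample size are arbitrary choices.

```python
import cmath
import math
import random


def F(z: complex) -> complex:
    """The smooth map (x, y) -> (x^2 - y^2, 2xy), written as z -> z^2."""
    return z * z


C = 2.0  # bound on the operator norm of DF, which is 2|z|, over the closed unit disk


def sample_ball_in_disk(center, r, count, rng):
    """Random points of the open ball B(center, r) that also lie in the closed unit disk."""
    points = []
    while len(points) < count:
        rho = r * math.sqrt(rng.random())  # rng.random() < 1, so rho < r
        theta = 2 * math.pi * rng.random()
        z = center + cmath.rect(rho, theta)
        if abs(z) <= 1.0:
            points.append(z)
    return points


def max_image_spread(points):
    """Largest distance between the images of two sampled points."""
    return max(abs(F(p) - F(q)) for p in points for q in points)


rng = random.Random(0)
r = 0.05
pts = sample_ball_in_disk(0.95 + 0.3j, r, 200, rng)  # this ball straddles the unit circle
spread = max_image_spread(pts)
```

Here `spread` never exceeds `2 * C * r`, in line with the Lipschitz bound, so the sampled image fits inside a ball of radius \(2 C r\) about any one image point.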
It is worth pausing on why the hypothesis that the domain and codomain have the same
dimension \(n\) is indispensable here, since this constraint is the hinge on which the later
theory swings. The argument balances two scalings against each other. Covering \(A\) by balls of
total volume below \(\delta\) controls \(n\)-dimensional volume; the Lipschitz estimate inflates
each radius by a bounded factor, and so each \(n\)-dimensional volume by a bounded factor raised
to the \(n\)th power. The total volume of the image cover stays controlled only because the
source volume and the target volume are computed in the same dimension, so that a
radius scaling by a constant becomes a volume scaling by that same constant to a fixed power.
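Were the codomain of lower dimension, the bookkeeping would collapse and a measure-zero set
could acquire an image that is not negligible: the projection \((x, y) \mapsto x\) of
\(\mathbb{R}^2\) onto \(\mathbb{R}\) carries the \(x\)-axis, a set of measure zero in the plane,
onto the whole line. (A higher-dimensional codomain causes no such failure for smooth maps; there
the image is always negligible, as we will see.) The fact that smoothness, together with matching
dimensions, keeps the estimate in force is exactly what we will exploit twice over: positively, to
make measure zero a property of subsets of manifolds, and later, in the guise of lowering the
dimension of the domain to force images to be negligible.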
Measure Zero on Manifolds
With invariance in hand, the passage to manifolds is now forced upon us. A manifold has no
volumes, but it has charts, and the previous proposition guarantees that the question "is this
image negligible?" receives the same answer no matter which chart we ask it in. We make the
natural definition and then verify that it is well posed.
Definition: Measure Zero in a Manifold
Let \(M\) be a smooth \(n\)-manifold with or without boundary. A subset \(A \subseteq M\) has
measure zero in \(M\) if for every smooth chart \((U, \varphi)\) of \(M\), the
image \(\varphi(A \cap U) \subseteq \mathbb{R}^n\) has \(n\)-dimensional measure zero.
Stated this way, the definition seems to demand that we inspect every chart — an
impossible task in practice. The next lemma removes that burden: it is enough to check the
condition on a single family of charts that happens to cover \(A\). The proof is the first place
where the invariance of the previous section pays off, and it pays off in exactly the form one
would hope — the transition between two charts is a smooth map between open subsets of
\(\mathbb{R}^n\), so it cannot turn a negligible set into a substantial one.
Lemma: Checking Measure Zero on One Atlas Suffices
Let \(M\) be a smooth \(n\)-manifold with or without boundary and let \(A \subseteq M\).
Suppose there is a collection \(\{(U_\alpha, \varphi_\alpha)\}\) of smooth charts whose domains
cover \(A\), such that \(\varphi_\alpha(A \cap U_\alpha)\) has measure zero in \(\mathbb{R}^n\)
for every \(\alpha\). Then \(A\) has measure zero in \(M\).
Proof.
Let \((V, \psi)\) be an arbitrary smooth chart of \(M\); we must show
\(\psi(A \cap V) \subseteq \mathbb{R}^n\) has measure zero. Because \(M\) is second countable,
some countable subcollection of the \(U_\alpha\) already covers \(A \cap V\), so it is enough
to treat one chart \(U_\alpha\) at a time and then take the countable union.
On the overlap, the points of \(\psi(A \cap V \cap U_\alpha)\) are obtained from the points of
\(\varphi_\alpha(A \cap V \cap U_\alpha)\) by applying the transition map. Precisely,
\[
\psi(A \cap V \cap U_\alpha)
= \big(\psi \circ \varphi_\alpha^{-1}\big)\big(\varphi_\alpha(A \cap V \cap U_\alpha)\big).
\]
The set \(\varphi_\alpha(A \cap V \cap U_\alpha)\) is a subset of
\(\varphi_\alpha(A \cap U_\alpha)\), which has measure zero by hypothesis, so it too has
measure zero. The transition map \(\psi \circ \varphi_\alpha^{-1}\) is a smooth map between
open subsets of \(\mathbb{R}^n\) — a smooth map from \(\mathbb{R}^n\) to itself in the sense
required — so by the
invariance of measure zero under smooth maps
its image \(\psi(A \cap V \cap U_\alpha)\) has measure zero. Taking the union over the
countably many chosen \(U_\alpha\), the set \(\psi(A \cap V)\) is a countable union of
measure-zero sets and so has measure zero. \(\blacksquare\)
This is precisely where the equal-dimension hypothesis of the invariance proposition earns its
keep: transition maps go from \(\mathbb{R}^n\) to \(\mathbb{R}^n\), never changing dimension, so
the one situation in which smoothness preserves measure zero is exactly the situation that arises.
The definition of measure zero on a manifold is, in this sense, custom-built around the invariance
that makes it consistent.
Two basic properties follow with little effort. A countable union of measure-zero sets in
\(M\) again has measure zero, since in any chart the images form a countable union of
measure-zero sets in \(\mathbb{R}^n\). And measure zero in \(M\) captures the right qualitative
idea of "negligibly small": such a set can never contain a nonempty open set, so its complement is
everywhere dense.
Proposition: The Complement of a Measure-Zero Set Is Dense
Let \(M\) be a smooth manifold with or without boundary and let \(A \subseteq M\) have measure
zero in \(M\). Then \(M \setminus A\) is dense in \(M\).
Proof.
If \(M \setminus A\) were not dense, then \(A\) would contain a nonempty open subset of \(M\).
Choosing a smooth chart \((V, \psi)\) meeting that open set, the image \(\psi(A \cap V)\) would
contain a nonempty open subset of \(\mathbb{R}^n\), and hence a rectangle \(R\) of some
positive volume \(v\). No countable cover of \(R\) can have total volume less than \(v\): the
volumes of any family of rectangles covering \(R\) must sum to at least \(\operatorname{Vol}(R)
= v\). So \(R\), and therefore \(\psi(A \cap V)\), cannot have measure zero, contradicting the
assumption that \(A\) has measure zero in \(M\). \(\blacksquare\)
Finally, the invariance proposition lifts verbatim from Euclidean space to manifolds, provided we
keep the dimensions equal. This is the form in which we will invoke it inside the proof of Sard's
theorem, where charts reduce a statement about manifolds to a statement about open subsets of
\(\mathbb{R}^n\) and back again.
Theorem: Equidimensional Smooth Maps Preserve Measure Zero
Let \(M\) and \(N\) be smooth manifolds with or without boundary of the same
dimension \(n\), let \(F : M \to N\) be smooth, and let \(A \subseteq M\) have measure zero in
\(M\). Then \(F(A)\) has measure zero in \(N\).
Proof.
By the lemma above, it suffices to show that \(F(A)\) has measure-zero image in each chart of
a covering family of \(N\). Cover \(M\) by countably many smooth charts
\(\{(U_i, \varphi_i)\}\), and let \((V, \psi)\) be a smooth chart of \(N\). Then
\(F(A) \cap V\) is the countable union, over \(i\), of the sets \(F(A \cap U_i) \cap V\), and
in the chart \(\psi\) each of these is the image of the measure-zero set
\(\varphi_i(A \cap U_i \cap F^{-1}(V))\) under the smooth map
\(\psi \circ F \circ \varphi_i^{-1}\) between open subsets of \(\mathbb{R}^n\). Since \(M\) and
\(N\) have the same dimension, this map goes from \(\mathbb{R}^n\) to \(\mathbb{R}^n\), so the
invariance proposition
applies and the image has measure zero. A countable union of such images has measure zero, and
the lemma concludes that \(F(A)\) has measure zero in \(N\). \(\blacksquare\)
The qualifier "of the same dimension" is not a technicality to be apologized for; it is the
entire content. Sard's theorem will exploit what happens when the dimensions are allowed to
differ, but to state and prove it we first need the vocabulary of critical points. We therefore
turn to the theorem itself.
Sard's Theorem
We can now state and prove the theorem that organizes everything to follow. Recall that a point
\(p\) of the domain is a critical point of a smooth map \(F : M \to N\) when the
differential \(dF_p\) fails to be surjective, and a point of the codomain is a critical
value when it is the image of some critical point. The complementary notions are
regular points and regular values; in particular, a value with
empty preimage is regular by default. Sard's theorem asserts that the critical values, however
plentiful the critical points may be, occupy only a negligible part of the codomain.
Theorem (Sard's Theorem)
Let \(M\) and \(N\) be smooth manifolds with or without boundary and let \(F : M \to N\) be a
smooth map. Then the set of critical values of \(F\) has measure zero in \(N\).
The proof is an induction on the dimension of the domain. Before the formal argument, it helps to
see the shape of the whole. Charts reduce the statement to a map \(F\) from an open subset of
\(\mathbb{R}^m\) to \(\mathbb{R}^n\); we then sort the critical points by how degenerate they are,
measuring degeneracy by how many derivatives of \(F\) vanish. The points where some first
derivative survives are handled by changing coordinates so that \(F\) becomes a graph in one
variable and slicing, which drops the dimension of the domain and invites the inductive
hypothesis. The points where derivatives vanish to higher and higher order are handled by Taylor's
theorem: the more derivatives vanish, the more sharply \(F\) is pinned near its value, and once
enough vanish the image is squeezed into something of negligible volume by a direct count. The
induction and the three cases interlock to cover every critical point.
Proof.
Let \(m = \dim M\) and \(n = \dim N\); we induct on \(m\). For \(m = 0\) the claim is
immediate: if \(n = 0\) there are no critical points at all, while if \(n > 0\) the entire
image of \(F\) is countable, hence of measure zero.
Suppose now that \(m \ge 1\) and that the theorem holds for every smooth map whose domain has
dimension less than \(m\). Covering \(M\) and \(N\) by countably many smooth charts reduces the
statement, by the closure properties already established, to the case of a smooth map \(F\)
from an open subset \(U \subseteq \mathbb{R}^m\) (or \(\mathbb{H}^m\)) into \(\mathbb{R}^n\).
Write the domain coordinates as \((x^1, \dots, x^m)\) and the codomain coordinates as
\((y^1, \dots, y^n)\). Let \(C \subseteq U\) be the set of critical points of \(F\), and
define a decreasing sequence of subsets
\[
C \supseteq C_1 \supseteq C_2 \supseteq \cdots,
\qquad
C_k = \{x \in C : \text{all partial derivatives of } F \text{ of order} \le k \text{ vanish at } x\}.
\]
More precisely, \(C_k\) is the set of \(x \in C\) at which every partial derivative of every
component \(F^i\), of orders \(1\) through \(k\), is zero. By continuity, \(C\) and all the
\(C_k\) are closed in \(U\). We prove that \(F(C)\) has measure zero in three steps.
Step 1: \(F(C \setminus C_1)\) has measure zero.
The set \(C_1\) is closed, so we may discard it: replacing \(U\) by \(U \setminus C_1\), we may
assume \(C_1 = \varnothing\), which means that at every point of \(C\) some first partial
derivative of \(F\) is nonzero. Fix a point \(a \in C\), and by relabeling assume
\(\partial F^1 / \partial x^1 (a) \ne 0\). The map sending \(x\) to
\((F^1(x), x^2, \dots, x^m)\) then has nonsingular Jacobian at \(a\), so it defines new smooth
coordinates \((u, v^2, \dots, v^m)\) on a neighborhood \(V_a\) of \(a\), with \(u = F^1\) and
\(v^j = x^j\). Shrinking \(V_a\) so that its closure is a compact subset of \(U\) on which the
change of coordinates extends smoothly, we find that in these coordinates \(F\) takes the form
\[
F(u, v^2, \dots, v^m) = \big(u,\, F^2(u, v), \dots, F^n(u, v)\big),
\]
because the first component is now the coordinate \(u\) itself. Its Jacobian is correspondingly
block lower-triangular,
\[
DF(u, v) =
\begin{pmatrix}
1 & 0 \\
* & \dfrac{\partial F^i}{\partial v^j}
\end{pmatrix},
\]
so the rank of \(DF\) equals \(1\) plus the rank of the lower-right
\((n-1) \times (m-1)\) block. A point is critical for \(F\) exactly when \(DF\) has rank less
than \(n\), which happens precisely when that block has rank less than \(n - 1\). Thus
\(C \cap V_a\) consists of exactly the points where the matrix
\((\partial F^i / \partial v^j)\) has rank below \(n - 1\).
Because the first coordinate of \(F\) is preserved, \(F\) maps each hyperplane
\(\{u = c\}\) into the hyperplane \(\{y^1 = c\}\). Write \(F\) restricted to the slice
\(\{u = c\}\) as \(F_c(v) = (F^2(c, v), \dots, F^n(c, v))\), a smooth map of the \(m - 1\)
remaining variables into \(\mathbb{R}^{n-1}\). The full map sends \((c, v)\) to
\((c, F_c(v))\), so a critical point of \(F\) lying in the slice \(\{u = c\}\) is exactly a
point \((c, v)\) at which \(F_c\) is critical, and its image is \((c, w)\) with \(w\) a
critical value of \(F_c\). Since the domain of \(F_c\) has dimension \(m - 1 < m\), the
inductive hypothesis applies: the critical values of each \(F_c\) form a set of
\((n-1)\)-dimensional measure zero. Consider now the compact set \(F(C \cap \overline{V_a})\).
Its slice in the hyperplane \(\{y^1 = c\}\) consists of points \((c, w)\) with \(w\) a critical
value of \(F_c\), so that slice has \((n-1)\)-dimensional measure zero. The
slicing criterion
therefore gives \(F(C \cap \overline{V_a})\) measure zero, and so does its subset
\(F(C \cap V_a)\). Countably many such neighborhoods \(V_a\) cover \(C\), and a countable union
of measure-zero sets has measure zero, completing Step 1.
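A small worked instance may help fix the picture in Step 1. The example below is hypothetical and chosen for simplicity: a map of the plane that is already in the normal form \(F(u, v) = (u, F^2(u, v))\), so no change of coordinates is needed.

```latex
\[
F(x, y) = (x,\, y^2), \qquad
DF(x, y) = \begin{pmatrix} 1 & 0 \\ 0 & 2y \end{pmatrix}, \qquad
C = \{(x, 0) : x \in \mathbb{R}\}, \qquad C_1 = \varnothing .
\]
\[
F_c(y) = y^2, \qquad F_c'(y) = 2y = 0 \iff y = 0, \qquad
\{\text{critical values of } F_c\} = \{0\} \subseteq \mathbb{R} .
\]
\[
F(C) \cap \{y^1 = c\} = \{(c, 0)\} \quad \text{for every } c \in \mathbb{R} .
\]
```

Here \(C_1\) is empty because \(\partial F^1 / \partial x = 1\) never vanishes. Each slice of \(F(C)\) is a single point, which has measure zero in \(\mathbb{R}\), so the slicing criterion applied to the compact pieces \(F(C \cap ([-N, N] \times \{0\}))\) shows that \(F(C)\), the \(x\)-axis, has measure zero in \(\mathbb{R}^2\). This agrees with the earlier corollary on proper affine subspaces.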
Step 2: for each \(k \ge 1\), \(F(C_k \setminus C_{k+1})\) has measure zero.
As before, the closed set \(C_{k+1}\) may be discarded, so we assume that at every point of
\(C_k\) some partial derivative of \(F\) of order exactly \(k + 1\) is nonzero. Fix
\(a \in C_k\), and let \(y : U \to \mathbb{R}\) be a \(k\)th-order partial derivative of some
component \(F^i\) such that one of its first partial derivatives — equivalently, some
\((k+1)\)st-order partial derivative of \(F^i\) — is nonzero at \(a\); such a \(y\) exists
because \(a \notin C_{k+1}\). Then \(dy_a \ne 0\), so \(a\) is a regular point of the smooth
function \(y\), and there is a neighborhood \(V_a\) of \(a\) consisting entirely of regular
points of \(y\). The zero set \(Y\) of \(y\) within \(V_a\) is, by the
regular level set theorem,
a smooth hypersurface — an embedded submanifold of \(V_a\) of dimension \(m - 1\).
On \(C_k\), all partial derivatives of \(F\) of order up to \(k\) vanish; in particular the
\(k\)th derivative \(y\) vanishes on \(C_k\), so \(C_k \cap V_a \subseteq Y\). Now consider the
restriction \(F|_Y : Y \to \mathbb{R}^n\), which factors as \(F\) composed with the inclusion
\(\iota : Y \hookrightarrow V_a\). By the chain rule its differential at a point \(p \in Y\) is
\(d(F|_Y)_p = dF_p \circ d\iota_p\); since \(d\iota_p\) identifies
the tangent space \(T_pY\)
with a subspace of \(T_pV_a\), this is just the restriction \((dF_p)|_{T_pY}\). At any point
\(p \in C_k \cap V_a\) the differential \(dF_p\) is not surjective, and restricting its domain
to the subspace \(T_pY\) can only shrink its image, so \(d(F|_Y)_p\) is not surjective either.
Hence \(p\) is a critical point of \(F|_Y\), and \(F(C_k \cap V_a)\) is contained in the set of
critical values of \(F|_Y\). Since \(Y\) has dimension \(m - 1 < m\), the inductive
hypothesis says these critical values have measure zero. Covering \(C_k\) by countably many
such \(V_a\) completes Step 2.
Step 3: for \(k > m/n - 1\), \(F(C_k)\) has measure zero.
Steps 1 and 2 between them account for every critical point that lies in \(C \setminus C_1\) or in
some \(C_j \setminus C_{j+1}\), but they say nothing about what is left over: for each fixed \(k\),
the points of \(C_k\) itself, at which all partial derivatives of \(F\) of orders \(1\) through
\(k\) vanish. The final step disposes of \(C_k\) for large \(k\) by a direct volume count, using
that the high vanishing order pins the image tightly.
Cover \(U\) by countably many closed cubes contained in \(U\); it suffices to show that
\(F(C_k \cap E)\) has measure zero for one such cube \(E\), of side length \(R\). Let \(A\)
bound the absolute values of all \((k+1)\)st-order derivatives of \(F\) on the compact cube
\(E\), and let \(K\) be a large integer to be chosen. Subdivide \(E\) into \(K^m\) subcubes of
side length \(R/K\). If a subcube \(E_i\) contains a point \(a_i \in C_k\), then all
derivatives of \(F\) up to order \(k\) vanish at \(a_i\), so for each component \(F^j\) the
\(k\)th Taylor polynomial at \(a_i\) is the constant \(F^j(a_i)\). Applying the
Taylor error bound
to each component and combining them controls \(F\) on \(E_i\) by its \((k+1)\)st-order
behavior alone: for all \(x \in E_i\),
\[
\lvert F(x) - F(a_i) \rvert \le A' \lvert x - a_i \rvert^{k+1},
\]
where \(A'\) depends only on the bound \(A\) and on \(k\), \(m\), and \(n\). Since every point
of \(E_i\) is within distance \(\sqrt{m}\,(R/K)\) of \(a_i\), the image \(F(E_i)\) lies in a
ball of radius \(A''(R/K)^{k+1}\), where \(A'' = A'(\sqrt{m})^{k+1}\) is again independent of
\(K\).
Therefore \(F(C_k \cap E)\) is covered by at most \(K^m\) balls each of radius
\(A''(R/K)^{k+1}\), whose total \(n\)-dimensional volume is bounded by a constant multiple of
\[
K^m \cdot \big((R/K)^{k+1}\big)^n = (\text{const}) \cdot K^{\,m - n(k+1)}.
\]
The exponent \(m - n(k+1)\) is negative precisely when \(k > m/n - 1\), which is the
hypothesis of this step; for such \(k\), letting \(K \to \infty\) drives the total volume to
zero. Hence \(F(C_k \cap E)\) has measure zero, and summing over the countably many cubes,
so does \(F(C_k)\).
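The exponent count in Step 3 is simple arithmetic and can be tabulated. The sketch below is illustrative only; the function names are introduced here and are not part of the proof. It finds the smallest \(k \ge 1\) for which \(m - n(k+1) < 0\) and evaluates the factor \(K^{\,m - n(k+1)}\) exactly, leaving aside the constant that multiplies it.

```python
from fractions import Fraction


def cover_volume_exponent(m: int, n: int, k: int) -> int:
    """Exponent of K in the bound K**m * ((R/K)**(k+1))**n, namely m - n*(k+1)."""
    return m - n * (k + 1)


def min_vanishing_order(m: int, n: int) -> int:
    """Smallest integer k >= 1 with k > m/n - 1, that is, with m - n*(k+1) < 0."""
    k = 1
    while cover_volume_exponent(m, n, k) >= 0:
        k += 1
    return k


def relative_cover_volume(m: int, n: int, k: int, K: int) -> Fraction:
    """The factor K**(m - n*(k+1)), as an exact fraction (constant factor omitted)."""
    return Fraction(K) ** cover_volume_exponent(m, n, k)
```

For instance, with \(m = 2\) and \(n = 1\) the order \(k = 1\) gives exponent \(0\) and no decay, while \(k = 2\) gives exponent \(-1\), so the bound shrinks like \(1/K\).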
Finally, choosing any integer \(k\) with \(k > m/n - 1\), the set \(C\) is the union of
\(C \setminus C_1\), the sets \(C_j \setminus C_{j+1}\) for \(1 \le j < k\), and \(C_k\)
itself. Steps 1, 2, and 3 show each piece has measure-zero image, so \(F(C)\) has measure zero.
Undoing the chart reductions, the critical values of the original map \(F : M \to N\) form a
countable union of measure-zero sets, hence a set of measure zero in \(N\). \(\blacksquare\)
Critical Images and Negligible Submanifolds
Sard's theorem is most often used not for the critical values of an interesting map, but for a
deceptively simple consequence: when the domain has strictly smaller dimension than the codomain,
the map can have no regular points at all, and so its entire image is critical. The theorem then
says the whole image is negligible. This is the use of lowering dimension promised
earlier — the mirror image of the equidimensional invariance that made measure zero well defined.
Corollary: Images of Lower-Dimensional Domains Have Measure Zero
Let \(M\) and \(N\) be smooth manifolds with or without boundary, and let \(F : M \to N\) be a
smooth map. If \(\dim M < \dim N\), then \(F(M)\) has measure zero in \(N\).
Proof.
At every point \(p \in M\) the differential \(dF_p : T_pM \to T_{F(p)}N\) is a linear map from
a space of dimension \(\dim M\) to one of strictly larger dimension \(\dim N\), so it cannot be
surjective. Every point of \(M\) is thus a critical point, and \(F(M)\) is precisely the set of
critical values. By
Sard's theorem,
it has measure zero in \(N\). \(\blacksquare\)
The smoothness hypothesis is not a decoration that could be relaxed to continuity; it is the whole
point. A celebrated construction produces a space-filling curve: a continuous map
from the unit interval \([0, 1]\) onto the entire unit square \([0, 1] \times [0, 1]\). Here the
domain has dimension one and the codomain dimension two, yet the image is all of a two-dimensional
region — a set that is emphatically not of measure zero, since it contains rectangles of positive
area. Mere continuity allows a one-dimensional object to smear out and fill a two-dimensional one. What forbids this, and makes
the corollary true, is exactly the rigidity of smooth maps that we isolated in the invariance
proposition: a smooth map satisfies a local Lipschitz bound, and a Lipschitz map cannot increase
dimension in this way. Differentiability, not just continuity, is what keeps low-dimensional sets
small.
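The contrast with the space-filling curve can be seen in a direct count for a Lipschitz curve. The sketch below is an illustration under assumptions: it takes the hypothetical example \(\gamma(t) = (\cos 2\pi t, \sin 2\pi t)\) on \([0, 1]\), which has Lipschitz constant \(2\pi\), cuts the interval into \(N\) pieces, and covers the image of each piece by a disc of radius \(2\pi / (2N)\) centered at the image of its midpoint. The function names are introduced here for the example.

```python
import math

LIPSCHITZ = 2 * math.pi  # the speed of gamma, hence a Lipschitz constant for it


def gamma(t):
    """A smooth closed curve in the plane: the unit circle traversed once."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))


def disc_cover(N):
    """N discs (center, radius); the i-th is meant to cover gamma([i/N, (i+1)/N])."""
    radius = LIPSCHITZ / (2 * N)
    return [(gamma((i + 0.5) / N), radius) for i in range(N)]


def total_area(N):
    """Total area of the N covering discs."""
    return sum(math.pi * r * r for _, r in disc_cover(N))


def covers_samples(N, samples_per_piece=20):
    """Check on sample parameters that each piece of the curve lies in its disc."""
    for i, (center, radius) in enumerate(disc_cover(N)):
        for s in range(samples_per_piece + 1):
            t = (i + s / samples_per_piece) / N
            if math.dist(gamma(t), center) > radius:
                return False
    return True
```

The total area is \(N \cdot \pi (\pi / N)^2 = \pi^3 / N\), which tends to zero as \(N\) grows. No such count is available for a merely continuous curve, since without a Lipschitz bound the image of a short parameter interval need not fit inside a small disc.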
The corollary specializes immediately to the situation that recurs throughout differential
geometry: a submanifold of less than full dimension is negligible in its ambient manifold.
Corollary: Lower-Dimensional Submanifolds Have Measure Zero
Let \(M\) be a smooth manifold with or without boundary, and let \(S \subseteq M\) be an
immersed submanifold with or without boundary. If \(\dim S < \dim M\), then \(S\) has
measure zero in \(M\).
Proof.
The inclusion \(\iota : S \hookrightarrow M\) is a smooth map whose domain \(S\) has dimension
strictly less than that of \(M\), and its image is \(S\) itself. The previous corollary
applies. \(\blacksquare\)
This last statement makes precise an intuition we have appealed to repeatedly: a curve in a
surface, a surface in a three-dimensional space, or more generally any submanifold cut out by even
a single independent constraint, takes up no volume in the space that contains it. A point chosen
at random — in any sense that respects the ambient dimension — will miss it with certainty. The
same principle underwrites the working assumption of high-dimensional data analysis, that data
governed by a few degrees of freedom occupies a
low-dimensional set within its ambient space:
if that set is a smooth submanifold of positive codimension, it is genuinely negligible, and the
space-filling curve shows that the smoothness in that assumption is doing essential work.
With Sard's theorem proved and these consequences in hand, the tools are in place for the chapter's
larger purpose. The negligibility of low-dimensional images is the engine behind a dimension-count
that will let us take a manifold presented abstractly and place it, by a generic choice of
projection, inside a Euclidean space of controlled dimension — turning the habit of picturing a
manifold as a subset of \(\mathbb{R}^n\) into a theorem rather than a convenience. That is the
embedding theory the manifold series develops next.