Sard's Theorem

Sets of Measure Zero in Euclidean Space

A recurring theme of the manifold series has been that the interesting features of a smooth map live where its differential behaves well. At a point where the differential is surjective, the map looks locally like a projection and its level sets are submanifolds; at a point where it is injective, the map looks locally like an inclusion. The points where the differential fails to have maximal rank — the critical points — are where this orderly picture breaks down. One might fear that such points are common enough to spoil the theory. The central result of this page, Sard's theorem, says the opposite: the values a map takes at its critical points form a vanishingly small set. To make "vanishingly small" precise on a manifold, where we have no notion of length or volume to call upon, we first develop the right notion in Euclidean space and then transport it across charts. That notion is a set of measure zero.

The idea is the most economical possible formalization of "negligible size." Recall the elementary definition from the analysis of several variables. An open rectangle in \(\mathbb{R}^n\) is a product \((a^1, b^1) \times \cdots \times (a^n, b^n)\) of open intervals, and its volume is the product \(\prod_i (b^i - a^i)\) of its side lengths. A subset is negligible when it can be hidden inside a countable family of such rectangles whose volumes add up to as little as we please.

Definition: Set of Measure Zero in \(\mathbb{R}^n\)

A subset \(A \subseteq \mathbb{R}^n\) has measure zero if for every \(\delta > 0\) there is a countable collection of open rectangles \(\{R_i\}\) covering \(A\) whose total volume satisfies \(\sum_i \operatorname{Vol}(R_i) < \delta\).

Two features of this definition deserve emphasis, because the entire chapter turns on them. The first is that it makes no use of measure theory: there is no \(\sigma\)-algebra, no measure to be constructed, only the bare combinatorics of covering a set by boxes. The notion agrees on the nose with the Lebesgue-null sets one meets in integration theory — the two pick out exactly the same subsets of \(\mathbb{R}^n\) — but it is logically prior to them, asking only that arbitrarily economical covers exist. This is what will let us define measure zero on a manifold long before we have any way to integrate there.

The second feature is the freedom in the shape of the covering sets. Whether we cover by open rectangles, open balls, or open cubes makes no difference to which sets come out negligible: each family can be exchanged for another at the cost of a bounded factor in total volume. An open rectangle is covered by finitely many open cubes of total volume at most \(2^n\) times its own; a ball of radius \(r\) sits inside a cube of side \(2r\) and contains a cube of side \(2r/\sqrt{n}\), so balls and cubes trap one another up to a dimensional constant. Because the definition asks only that the total volume be made smaller than an arbitrary \(\delta\), multiplying every cover by a fixed constant changes nothing. We will use rectangles, balls, and cubes interchangeably from here on, choosing whichever is most convenient.

A first consequence of the definition is worth recording, as we will lean on it repeatedly. Any subset of a set of measure zero again has measure zero, since a cover of the larger set already covers the smaller one. A countable union of sets of measure zero also has measure zero: given \(\delta\), cover the \(k\)th set with rectangles of total volume less than \(\delta / 2^k\), and the combined cover has total volume less than \(\delta\). A single point has measure zero, so every countable subset of \(\mathbb{R}^n\) does as well.
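
The \(\delta / 2^k\) bookkeeping can be made concrete with a small sketch. The code below is only an illustration under assumptions of its own: the enumeration of rationals by increasing denominator, the helper names, and the truncation to finitely many points are choices made here, not part of the argument above. It gives the \(k\)th point an open interval of length \(\delta / 2^k\) and uses exact rational arithmetic, so the total length is visibly below \(\delta\).

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """Return the first `count` distinct rationals in [0, 1], by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, delta):
    """Give the k-th point (k = 1, 2, ...) an open interval of length delta / 2**k."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half = Fraction(delta) / 2 ** (k + 1)  # half of the length delta / 2**k
        intervals.append((x - half, x + half))
    return intervals


delta = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, delta)
total_length = sum(b - a for a, b in intervals)
print(total_length < delta)
```

Because the lengths form a geometric series, adding more points never pushes the total past \(\delta\); that is the whole content of the countable-union argument.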

A Slicing Criterion

The covering definition is simple to state but awkward to verify directly for sets that are not already presented as small. The following lemma is the workhorse that converts a statement about an \(n\)-dimensional set into statements about its lower-dimensional slices, which can be handled by induction. A reader who knows Lebesgue integration will recognize it as a special case of Fubini's theorem; the point of the elementary proof below is that it needs none of that machinery, keeping the whole development independent of measure theory.

Lemma (Slicing Criterion for Measure Zero)

Let \(A \subseteq \mathbb{R}^n\) be a compact set, and for \(c \in \mathbb{R}\) write the slice \(A_c = \{x \in \mathbb{R}^{n-1} : (c, x) \in A\}\). If \(A_c\) has \((n-1)\)-dimensional measure zero for every \(c \in \mathbb{R}\), then \(A\) has \(n\)-dimensional measure zero.

Proof.

Since \(A\) is compact it is bounded, so we may fix a closed interval \([a, b] \subseteq \mathbb{R}\) with \(A \subseteq [a, b] \times \mathbb{R}^{n-1}\). Let \(\delta > 0\) be given.

Fix \(c \in [a, b]\). By hypothesis the slice \(A_c\) has \((n-1)\)-dimensional measure zero, so it is covered by countably many \((n-1)\)-dimensional open cubes of total \((n-1)\)-volume less than \(\delta\). The slice \(A_c\) is compact, being homeomorphic to the closed subset \(A \cap (\{c\} \times \mathbb{R}^{n-1})\) of the compact set \(A\), so finitely many of these cubes, say \(C_1, \dots, C_k\), already cover it; set \(U_c = C_1 \cup \cdots \cup C_k \subseteq \mathbb{R}^{n-1}\), an open set containing \(A_c\). We claim there is an open interval \(J_c\) about \(c\) such that the part of \(A\) lying over \(J_c\) is trapped in \(J_c \times U_c\): \[ A \cap (J_c \times \mathbb{R}^{n-1}) \subseteq J_c \times U_c. \] If no such interval existed, then for every \(m\) the interval \((c - \tfrac1m, c + \tfrac1m)\) would contain a point \(c_m\) with some \((c_m, x_m) \in A\) and \(x_m \notin U_c\). The points \((c_m, x_m)\) lie in the compact set \(A\), so a subsequence converges to a limit \((c, x) \in A\) with \(x = \lim x_m\). Then \(x \in A_c \subseteq U_c\); but each \(x_m\) lies in the closed complement \(\mathbb{R}^{n-1} \setminus U_c\), which therefore contains the limit \(x\), a contradiction. This proves the claim.

The intervals \(\{J_c : c \in [a, b]\}\) form an open cover of the compact set \([a, b]\), so finitely many of them, say \(J_{c_1}, \dots, J_{c_m}\), already cover \([a, b]\). By shrinking each interval where it overlaps a neighbor, we may arrange that no point of \([a, b]\) lies in more than two of them; their lengths then add up to no more than twice the length of \([a, b]\), that is, at most \(2|b - a|\). Then \(A\) is contained in the union of the boxes \(J_{c_1} \times U_{c_1}, \dots, J_{c_m} \times U_{c_m}\), each itself a finite union of open rectangles. The total \(n\)-volume of this cover is \[ \sum_{j} \operatorname{length}(J_{c_j}) \cdot \operatorname{Vol}_{n-1}(U_{c_j}) < \Big(\sum_j \operatorname{length}(J_{c_j})\Big) \cdot \delta \le 2|b - a|\,\delta . \] Since \(\delta\) was arbitrary, this can be made as small as we wish, and so \(A\) has \(n\)-dimensional measure zero. \(\blacksquare\)

Graphs, Subspaces, and the Image of a Set

The slicing criterion immediately delivers the most important examples. A graph is the cleanest case: it is one dimension thin by construction, and slicing it along the graphed variable reduces its dimension by one at a time.

Proposition: Graphs of Continuous Functions Have Measure Zero

Let \(A\) be an open or closed subset of \(\mathbb{R}^{n-1}\) or of the half-space \(\mathbb{H}^{n-1}\), and let \(f : A \to \mathbb{R}\) be continuous. Then the graph \(\{(x, f(x)) : x \in A\} \subseteq \mathbb{R}^n\) has measure zero.

Proof.

Assume first that \(A\) is compact, and induct on \(n\). When \(n = 1\) the domain \(A\) is a compact subset of \(\mathbb{R}^0 = \{0\}\), so the graph is at most a single point and has measure zero. For the inductive step, intersect the graph with a hyperplane \(\{x^1 = c\}\): the slice is the graph of \(f\) restricted to \(\{x \in A : x^1 = c\}\), a continuous function of the remaining \(n - 2\) variables, and so has \((n-1)\)-dimensional measure zero by the inductive hypothesis. The slicing criterion then gives the graph \(n\)-dimensional measure zero.

If \(A\) is not compact, it is a countable union of compact subsets, so its graph is a countable union of sets of measure zero and hence has measure zero. \(\blacksquare\)
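
For a single concrete graph, the covering definition can also be checked directly, without the slicing induction. The sketch below is an illustration under its own assumptions: it takes \(f(x) = x^2\) on \([0, 1]\), uses the fact that \(f\) is increasing there so that each vertical strip of the graph fits in one rectangle, and uses closed rectangles (which can be enlarged slightly to open ones at negligible cost). The function name `graph_cover_area` is hypothetical.

```python
def graph_cover_area(f, n_strips):
    """Total area of n_strips closed rectangles covering the graph of an
    increasing function f on [0, 1].

    Strip i is [i/N, (i+1)/N] x [f(i/N), f((i+1)/N)]; monotonicity of f
    guarantees the graph over that strip stays inside the rectangle.
    """
    width = 1.0 / n_strips
    total = 0.0
    for i in range(n_strips):
        lo, hi = f(i * width), f((i + 1) * width)
        total += width * (hi - lo)
    return total


def square(x):
    return x * x


areas = {n: graph_cover_area(square, n) for n in (10, 100, 1000)}
for n, area in areas.items():
    print(n, area)
```

The heights telescope, so the total area is \((f(1) - f(0))/N = 1/N\), which can be made smaller than any \(\delta\). For a general continuous \(f\) the same idea works with the oscillation of \(f\) on each strip in place of \(f(x_{i+1}) - f(x_i)\), by uniform continuity on compact sets.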

The flat case follows at once: a proper affine subspace is, after a choice of coordinates, the graph of an affine function.

Corollary: Proper Affine Subspaces Have Measure Zero

Every proper affine subspace of \(\mathbb{R}^n\) has measure zero.

Proof.

Let \(S \subsetneq \mathbb{R}^n\) be a proper affine subspace. If \(\dim S = n - 1\), then some coordinate axis is not parallel to \(S\); taking that coordinate as the dependent one exhibits \(S\) as the graph of an affine function of the other \(n - 1\) coordinates, so the previous proposition applies. If \(\dim S < n - 1\), then \(S\) is contained in some affine subspace of dimension \(n - 1\), and a subset of a set of measure zero has measure zero. \(\blacksquare\)

The Invariance That Makes the Theory Possible

We now reach the proposition on which everything else rests. Our goal is to define measure zero on a manifold, but a manifold carries no volumes: there is nothing to integrate, no preferred way to measure a box. The natural attempt is to declare a subset of a manifold negligible when its image in every chart is negligible in \(\mathbb{R}^n\). For this to be a sound definition it must not depend on which chart we choose, and passing between charts means composing with a smooth transition map. So the definition will make sense precisely if smooth maps cannot enlarge a set of measure zero into something larger. The following proposition supplies exactly this, and it is the quiet engine of the whole chapter.

Proposition: Smooth Maps Preserve Sets of Measure Zero

Suppose \(A \subseteq \mathbb{R}^n\) has measure zero and \(F : A \to \mathbb{R}^n\) is a smooth map. Then \(F(A)\) has measure zero.

Proof.

Smoothness of \(F\) on the (possibly non-open) set \(A\) means, by definition, that each point of \(A\) has a neighborhood on which \(F\) extends to a smooth map; shrinking, we may take that neighborhood to be an open ball \(U\) on whose closure \(\overline{U}\) the extension is still smooth. Countably many such balls cover \(A\), so \(F(A)\) is a countable union of sets of the form \(F(A \cap \overline{U})\). Since a countable union of measure-zero sets has measure zero, it suffices to show each \(F(A \cap \overline{U})\) has measure zero.

Fix one such ball. The closure \(\overline{U}\) is compact, so the derivative of the extended map is bounded there, say \(\lvert DF(x) \rvert \le C\) for all \(x \in \overline{U}\). Since \(\overline{U}\) is convex, the mean value inequality gives \[ \lvert F(x) - F(x') \rvert \le C \, \lvert x - x' \rvert \qquad \text{for all } x, x' \in \overline{U}, \] that is, \(F\) is Lipschitz continuous on \(\overline{U}\) with constant \(C\).

Now let \(\delta > 0\) be given. Because \(A \cap \overline{U}\) has measure zero, it is covered by countably many open balls \(\{B_j\}\), each of radius \(r_j\), with \(\sum_j \operatorname{Vol}(B_j) < \delta\); discarding any \(B_j\) disjoint from \(\overline{U}\), we may assume each meets \(\overline{U}\). Two points of \(\overline{U} \cap B_j\) differ by less than the diameter \(2 r_j\) of \(B_j\), so by the Lipschitz bound their images differ by less than \(2 C r_j\); fixing any one image point as a center, \(F(\overline{U} \cap B_j)\) is therefore contained in a ball of radius \(2 C r_j\). Hence \(F(A \cap \overline{U})\) is covered by balls of total volume \[ \sum_j \operatorname{Vol}\big(\text{ball of radius } 2 C r_j\big) = (2C)^n \sum_j \operatorname{Vol}(B_j) < (2C)^n \delta . \] As \(\delta\) was arbitrary, \(F(A \cap \overline{U})\) has measure zero, and the proof is complete. \(\blacksquare\)
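
The scaling identity at the end of the proof can be checked numerically. The following sketch is only a sanity check under assumed values: the dimension, the Lipschitz constant, and the list of radii are arbitrary choices made here, and the helper names are hypothetical. It uses the standard formula \(\pi^{n/2} r^n / \Gamma(n/2 + 1)\) for the volume of an \(n\)-dimensional ball.

```python
import math


def ball_volume(n, r):
    """Volume of an n-dimensional Euclidean ball of radius r."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n


def image_cover_volume(n, radii, lipschitz_c):
    """Total volume of balls of radius 2*C*r_j, one per covering ball of radius r_j."""
    return sum(ball_volume(n, 2 * lipschitz_c * r) for r in radii)


n, C = 3, 5.0                                     # assumed dimension and Lipschitz constant
radii = [0.1 / 2 ** j for j in range(1, 30)]      # assumed radii of the covering balls
source_total = sum(ball_volume(n, r) for r in radii)
image_total = image_cover_volume(n, radii, C)
print(image_total / source_total, (2 * C) ** n)
```

The ratio is \((2C)^n\) regardless of the radii, which is why shrinking the source cover shrinks the image cover by the same proportion.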

It is worth pausing on why the hypothesis that the domain and codomain have the same dimension \(n\) cannot simply be dropped, since this constraint is the hinge on which the later theory swings. The argument balances two scalings against each other. Covering \(A\) by balls of total volume below \(\delta\) controls \(n\)-dimensional volume; the Lipschitz estimate inflates each radius by a bounded factor, and so each \(n\)-dimensional volume by that factor raised to the \(n\)th power. The total volume of the image cover stays controlled because the source volume and the target volume are computed in the same dimension. Were the codomain lower-dimensional, the bookkeeping would collapse: the covering balls have small \(n\)-dimensional volume, but nothing bounds the sum of lower powers of their radii, and a measure-zero set can indeed map onto a set of positive measure. The projection \((x, y) \mapsto x\) from \(\mathbb{R}^2\) to \(\mathbb{R}\), for instance, carries the measure-zero line \(\mathbb{R} \times \{0\}\) onto all of \(\mathbb{R}\). A higher-dimensional codomain, by contrast, causes no trouble for smooth maps, as the consequences of Sard's theorem will show; it takes a merely continuous map to make a curve fill a square. The fact that smoothness, together with equal dimension, keeps measure zero intact is exactly what we will exploit twice over: first, to make measure zero a well-defined property of subsets of manifolds, and later, with the domain of lower dimension than the codomain, to force entire images to be negligible.

Measure Zero on Manifolds

With invariance in hand, the passage to manifolds is now forced upon us. A manifold has no volumes, but it has charts, and the previous proposition guarantees that the question "is this image negligible?" receives the same answer no matter which chart we ask it in. We make the natural definition and then verify that it is well posed.

Definition: Measure Zero in a Manifold

Let \(M\) be a smooth \(n\)-manifold with or without boundary. A subset \(A \subseteq M\) has measure zero in \(M\) if for every smooth chart \((U, \varphi)\) of \(M\), the image \(\varphi(A \cap U) \subseteq \mathbb{R}^n\) has \(n\)-dimensional measure zero.

Stated this way, the definition seems to demand that we inspect every chart — an impossible task in practice. The next lemma removes that burden: it is enough to check the condition on a single family of charts that happens to cover \(A\). The proof is the first place where the invariance of the previous section pays off, and it pays off in exactly the form one would hope — the transition between two charts is a smooth map between open subsets of \(\mathbb{R}^n\), so it cannot turn a negligible set into a substantial one.

Lemma: Checking Measure Zero on One Atlas Suffices

Let \(M\) be a smooth \(n\)-manifold with or without boundary and let \(A \subseteq M\). Suppose there is a collection \(\{(U_\alpha, \varphi_\alpha)\}\) of smooth charts whose domains cover \(A\), such that \(\varphi_\alpha(A \cap U_\alpha)\) has measure zero in \(\mathbb{R}^n\) for every \(\alpha\). Then \(A\) has measure zero in \(M\).

Proof.

Let \((V, \psi)\) be an arbitrary smooth chart of \(M\); we must show \(\psi(A \cap V) \subseteq \mathbb{R}^n\) has measure zero. Because \(M\) is second countable, some countable subcollection of the \(U_\alpha\) already covers \(A \cap V\), so it is enough to treat one chart \(U_\alpha\) at a time and then take the countable union.

On the overlap, the points of \(\psi(A \cap V \cap U_\alpha)\) are obtained from the points of \(\varphi_\alpha(A \cap V \cap U_\alpha)\) by applying the transition map. Precisely, \[ \psi(A \cap V \cap U_\alpha) = \big(\psi \circ \varphi_\alpha^{-1}\big)\big(\varphi_\alpha(A \cap V \cap U_\alpha)\big). \] The set \(\varphi_\alpha(A \cap V \cap U_\alpha)\) is a subset of \(\varphi_\alpha(A \cap U_\alpha)\), which has measure zero by hypothesis, so it too has measure zero. The transition map \(\psi \circ \varphi_\alpha^{-1}\) is a smooth map between open subsets of \(\mathbb{R}^n\) — a smooth map from \(\mathbb{R}^n\) to itself in the sense required — so by the invariance of measure zero under smooth maps its image \(\psi(A \cap V \cap U_\alpha)\) has measure zero. Taking the union over the countably many chosen \(U_\alpha\), the set \(\psi(A \cap V)\) is a countable union of measure-zero sets and so has measure zero. \(\blacksquare\)

This is precisely where the equal-dimension hypothesis of the invariance proposition earns its keep: transition maps go from \(\mathbb{R}^n\) to \(\mathbb{R}^n\), never changing dimension, so the one situation in which smoothness preserves measure zero is exactly the situation that arises. The definition of measure zero on a manifold is, in this sense, custom-built around the invariance that makes it consistent.

Two basic properties follow with little effort. A countable union of measure-zero sets in \(M\) again has measure zero, since in any chart the images form a countable union of measure-zero sets in \(\mathbb{R}^n\). And measure zero in \(M\) captures the right qualitative idea of "negligibly small": such a set can never contain a nonempty open set, so its complement is dense.

Proposition: The Complement of a Measure-Zero Set Is Dense

Let \(M\) be a smooth manifold with or without boundary and let \(A \subseteq M\) have measure zero in \(M\). Then \(M \setminus A\) is dense in \(M\).

Proof.

If \(M \setminus A\) were not dense, then \(A\) would contain a nonempty open subset of \(M\). Choosing a smooth chart \((V, \psi)\) meeting that open set, the image \(\psi(A \cap V)\) would contain a nonempty open subset of \(\mathbb{R}^n\), and hence a rectangle \(R\) of some positive volume \(v\). No countable cover of \(R\) can have total volume less than \(v\): the volumes of any family of rectangles covering \(R\) must sum to at least \(\operatorname{Vol}(R) = v\). So \(R\), and therefore \(\psi(A \cap V)\), cannot have measure zero, contradicting the assumption that \(A\) has measure zero in \(M\). \(\blacksquare\)

Finally, the invariance proposition lifts verbatim from Euclidean space to manifolds, provided we keep the dimensions equal. This is the form in which we will invoke it inside the proof of Sard's theorem, where charts reduce a statement about manifolds to a statement about open subsets of \(\mathbb{R}^n\) and back again.

Theorem: Equidimensional Smooth Maps Preserve Measure Zero

Let \(M\) and \(N\) be smooth manifolds with or without boundary of the same dimension \(n\), let \(F : M \to N\) be smooth, and let \(A \subseteq M\) have measure zero in \(M\). Then \(F(A)\) has measure zero in \(N\).

Proof.

By the lemma above, it suffices to show that \(F(A)\) has measure-zero image in each chart of a covering family of \(N\). Cover \(M\) by countably many smooth charts \(\{(U_i, \varphi_i)\}\), and let \((V, \psi)\) be a smooth chart of \(N\). Then \(F(A) \cap V\) is the countable union, over \(i\), of the sets \(F(A \cap U_i) \cap V\), and in the chart \(\psi\) each of these is the image of the measure-zero set \(\varphi_i(A \cap U_i \cap F^{-1}(V))\) under the smooth map \(\psi \circ F \circ \varphi_i^{-1}\) between open subsets of \(\mathbb{R}^n\). Since \(M\) and \(N\) have the same dimension, this map goes from \(\mathbb{R}^n\) to \(\mathbb{R}^n\), so the invariance proposition applies and the image has measure zero. A countable union of such images has measure zero, and the lemma concludes that \(F(A)\) has measure zero in \(N\). \(\blacksquare\)

The qualifier "of the same dimension" is not a technicality to be apologized for; it is the entire content. Sard's theorem will exploit what happens when the dimensions are allowed to differ, but to state and prove it we first need the vocabulary of critical points. We therefore turn to the theorem itself.

Sard's Theorem

We can now state and prove the theorem that organizes everything to follow. Recall that a point \(p\) of the domain is a critical point of a smooth map \(F : M \to N\) when the differential \(dF_p\) fails to be surjective, and a point of the codomain is a critical value when it is the image of some critical point. The complementary notions are regular points and regular values; in particular, a value with empty preimage is regular by default. Sard's theorem asserts that the critical values, however plentiful the critical points may be, occupy only a negligible part of the codomain.

Theorem (Sard's Theorem)

Let \(M\) and \(N\) be smooth manifolds with or without boundary and let \(F : M \to N\) be a smooth map. Then the set of critical values of \(F\) has measure zero in \(N\).
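
For a concrete instance of the statement (an illustrative example chosen here), take \(F : \mathbb{R} \to \mathbb{R}\) with \(F(x) = x^3 - 3x\). Then \[ F'(x) = 3x^2 - 3 = 3(x - 1)(x + 1), \] so the critical points are \(x = \pm 1\) and the critical values are \[ F(1) = -2, \qquad F(-1) = 2. \] The set \(\{-2, 2\}\) is finite and therefore has measure zero in \(\mathbb{R}\). At the other extreme, a constant map \(\mathbb{R}^m \to \mathbb{R}^n\) with \(n \ge 1\) has every point of its domain as a critical point, yet only a single critical value: plentiful critical points do not make for plentiful critical values.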

The proof is an induction on the dimension of the domain. Before the formal argument, it helps to see the shape of the whole. Charts reduce the statement to a map \(F\) from an open subset of \(\mathbb{R}^m\) to \(\mathbb{R}^n\); we then sort the critical points by how degenerate they are, measuring degeneracy by how many derivatives of \(F\) vanish. The points where some first derivative survives are handled by changing coordinates so that \(F\) becomes a graph in one variable and slicing, which drops the dimension of the domain and invites the inductive hypothesis. The points where derivatives vanish to higher and higher order are handled by Taylor's theorem: the more derivatives vanish, the more sharply \(F\) is pinned near its value, and once enough vanish the image is squeezed into something of negligible volume by a direct count. The induction and the three cases interlock to cover every critical point.

Proof.

Let \(m = \dim M\) and \(n = \dim N\); we induct on \(m\). For \(m = 0\) the claim is immediate: if \(n = 0\) there are no critical points at all, while if \(n > 0\) the entire image of \(F\) is countable, hence of measure zero.

Suppose now that \(m \ge 1\) and that the theorem holds for every smooth map whose domain has dimension less than \(m\). Covering \(M\) and \(N\) by countably many smooth charts reduces the statement, by the closure properties already established, to the case of a smooth map \(F\) from an open subset \(U \subseteq \mathbb{R}^m\) (or \(\mathbb{H}^m\)) into \(\mathbb{R}^n\). Write the domain coordinates as \((x^1, \dots, x^m)\) and the codomain coordinates as \((y^1, \dots, y^n)\). Let \(C \subseteq U\) be the set of critical points of \(F\), and define a decreasing sequence of subsets \[ C \supseteq C_1 \supseteq C_2 \supseteq \cdots, \qquad C_k = \{x \in C : \text{all partial derivatives of } F \text{ of order} \le k \text{ vanish at } x\}. \] More precisely, \(C_k\) is the set of \(x \in C\) at which every partial derivative of every component \(F^i\), of orders \(1\) through \(k\), is zero. By continuity, \(C\) and all the \(C_k\) are closed in \(U\). We prove that \(F(C)\) has measure zero in three steps.

Step 1: \(F(C \setminus C_1)\) has measure zero.

The set \(C_1\) is closed, so we may discard it: replacing \(U\) by \(U \setminus C_1\), we may assume \(C_1 = \varnothing\), which means that at every point of \(C\) some first partial derivative of \(F\) is nonzero. Fix a point \(a \in C\), and by relabeling assume \(\partial F^1 / \partial x^1 (a) \ne 0\). The map sending \(x\) to \((F^1(x), x^2, \dots, x^m)\) then has nonsingular Jacobian at \(a\), so it defines new smooth coordinates \((u, v^2, \dots, v^m)\) on a neighborhood \(V_a\) of \(a\), with \(u = F^1\) and \(v^j = x^j\). Shrinking \(V_a\) so that its closure is a compact subset of \(U\) on which the change of coordinates extends smoothly, we find that in these coordinates \(F\) takes the form \[ F(u, v^2, \dots, v^m) = \big(u,\, F^2(u, v), \dots, F^n(u, v)\big), \] because the first component is now the coordinate \(u\) itself. Its Jacobian is correspondingly block lower-triangular, \[ DF(u, v) = \begin{pmatrix} 1 & 0 \\ * & \dfrac{\partial F^i}{\partial v^j} \end{pmatrix}, \] so the rank of \(DF\) equals \(1\) plus the rank of the lower-right \((n-1) \times (m-1)\) block. A point is critical for \(F\) exactly when \(DF\) has rank less than \(n\), which happens precisely when that block has rank less than \(n - 1\). Thus \(C \cap V_a\) consists of exactly the points where the matrix \((\partial F^i / \partial v^j)\) has rank below \(n - 1\).

Because the first coordinate of \(F\) is preserved, \(F\) maps each hyperplane \(\{u = c\}\) into the hyperplane \(\{y^1 = c\}\). Write \(F\) restricted to the slice \(\{u = c\}\) as \(F_c(v) = (F^2(c, v), \dots, F^n(c, v))\), a smooth map of the \(m - 1\) remaining variables into \(\mathbb{R}^{n-1}\). The full map sends \((c, v)\) to \((c, F_c(v))\), so a critical point of \(F\) lying in the slice \(\{u = c\}\) is exactly a point \((c, v)\) at which \(F_c\) is critical, and its image is \((c, w)\) with \(w\) a critical value of \(F_c\). Since the domain of \(F_c\) has dimension \(m - 1 < m\), the inductive hypothesis applies: the critical values of each \(F_c\) form a set of \((n-1)\)-dimensional measure zero. Consider now the compact set \(F(C \cap \overline{V_a})\). Its slice in the hyperplane \(\{y^1 = c\}\) consists of points \((c, w)\) with \(w\) a critical value of \(F_c\), so that slice has \((n-1)\)-dimensional measure zero. The slicing criterion therefore gives \(F(C \cap \overline{V_a})\) measure zero, and so does its subset \(F(C \cap V_a)\). Countably many such neighborhoods \(V_a\) cover \(C\), and a countable union of measure-zero sets has measure zero, completing Step 1.

Step 2: for each \(k \ge 1\), \(F(C_k \setminus C_{k+1})\) has measure zero.

As before, the closed set \(C_{k+1}\) may be discarded, so we assume that at every point of \(C_k\) some partial derivative of \(F\) of order exactly \(k + 1\) is nonzero. Fix \(a \in C_k\), and let \(y : U \to \mathbb{R}\) be a \(k\)th-order partial derivative of some component \(F^i\) such that one of its first partial derivatives — equivalently, some \((k+1)\)st-order partial derivative of \(F^i\) — is nonzero at \(a\); such a \(y\) exists because \(a \notin C_{k+1}\). Then \(dy_a \ne 0\), so \(a\) is a regular point of the smooth function \(y\), and there is a neighborhood \(V_a\) of \(a\) consisting entirely of regular points of \(y\). The zero set \(Y\) of \(y\) within \(V_a\) is, by the regular level set theorem, a smooth hypersurface — an embedded submanifold of \(V_a\) of dimension \(m - 1\).

On \(C_k\), all partial derivatives of \(F\) of order up to \(k\) vanish; in particular the \(k\)th derivative \(y\) vanishes on \(C_k\), so \(C_k \cap V_a \subseteq Y\). Now consider the restriction \(F|_Y : Y \to \mathbb{R}^n\), which factors as \(F\) composed with the inclusion \(\iota : Y \hookrightarrow V_a\). By the chain rule its differential at a point \(p \in Y\) is \(d(F|_Y)_p = dF_p \circ d\iota_p\); since \(d\iota_p\) identifies the tangent space \(T_pY\) with a subspace of \(T_pV_a\), this is just the restriction \((dF_p)|_{T_pY}\). At any point \(p \in C_k \cap V_a\) the differential \(dF_p\) is not surjective, and restricting its domain to the subspace \(T_pY\) can only shrink its image, so \(d(F|_Y)_p\) is not surjective either. Hence \(p\) is a critical point of \(F|_Y\), and \(F(C_k \cap V_a)\) is contained in the set of critical values of \(F|_Y\). Since \(Y\) has dimension \(m - 1 < m\), the inductive hypothesis says these critical values have measure zero. Covering \(C_k\) by countably many such \(V_a\) completes Step 2.

Step 3: for \(k > m/n - 1\), \(F(C_k)\) has measure zero.

Steps 1 and 2 between them account for every point that lies in some \(C_k \setminus C_{k+1}\), but there may remain points of \(C\) belonging to every \(C_k\) — points at which all partial derivatives of \(F\) vanish to all orders considered. The final step disposes of these by a direct volume count, using that the high vanishing order pins the image tightly.

Cover \(U\) by countably many closed cubes contained in \(U\); it suffices to show that \(F(C_k \cap E)\) has measure zero for one such cube \(E\), of side length \(R\). Let \(A\) bound the absolute values of all \((k+1)\)st-order derivatives of \(F\) on the compact cube \(E\), and let \(K\) be a large integer to be chosen. Subdivide \(E\) into \(K^m\) subcubes of side length \(R/K\). If a subcube \(E_i\) contains a point \(a_i \in C_k\), then all derivatives of \(F\) up to order \(k\) vanish at \(a_i\), so for each component \(F^j\) the \(k\)th Taylor polynomial at \(a_i\) is the constant \(F^j(a_i)\). Applying the Taylor error bound to each component and combining them controls \(F\) on \(E_i\) by its \((k+1)\)st-order behavior alone: for all \(x \in E_i\), \[ \lvert F(x) - F(a_i) \rvert \le A' \lvert x - a_i \rvert^{k+1}, \] where \(A'\) depends only on the bound \(A\) and on \(k\), \(m\), and \(n\). Since every point of \(E_i\) is within distance \(\sqrt{m}\,(R/K)\) of \(a_i\), the image \(F(E_i)\) lies in a ball of radius \(A''(R/K)^{k+1}\), where \(A'' = A'(\sqrt{m})^{k+1}\) is again independent of \(K\).

Therefore \(F(C_k \cap E)\) is covered by at most \(K^m\) balls each of radius \(A''(R/K)^{k+1}\), whose total \(n\)-dimensional volume is bounded by a constant multiple of \[ K^m \cdot \big((R/K)^{k+1}\big)^n = (\text{const}) \cdot K^{\,m - n(k+1)}. \] The exponent \(m - n(k+1)\) is negative precisely when \(k > m/n - 1\), which is the hypothesis of this step; for such \(k\), letting \(K \to \infty\) drives the total volume to zero. Hence \(F(C_k \cap E)\) has measure zero, and summing over the countably many cubes, so does \(F(C_k)\).
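
The exponent count in this step is easy to tabulate. The sketch below is a small illustration with hypothetical helper names: one function finds the smallest \(k \ge 1\) for which \(m - n(k+1)\) is negative, and the other evaluates the \(K\)-dependent factor \(K^{\,m - n(k+1)}\) of the volume bound, ignoring the constant in front.

```python
def smallest_sufficient_order(m, n):
    """Smallest integer k >= 1 with m - n*(k + 1) < 0, that is, k > m/n - 1."""
    k = 1
    while m - n * (k + 1) >= 0:
        k += 1
    return k


def cover_volume_scale(m, n, k, K):
    """The K-dependent factor K**(m - n*(k + 1)) in the Step 3 volume bound."""
    return float(K) ** (m - n * (k + 1))


# For a map from R^3 to R, vanishing to order k = 3 is the first that suffices.
k = smallest_sufficient_order(3, 1)
scales = [cover_volume_scale(3, 1, k, K) for K in (10, 100, 1000)]
print(k, scales)
```

With \(m = 3\), \(n = 1\), and \(k = 2\) the exponent is exactly zero, so refining the subdivision would not shrink the bound at all; this is why the strict inequality \(k > m/n - 1\) is needed.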

Finally, choosing any integer \(k\) with \(k > m/n - 1\), the set \(C\) is the union of \(C \setminus C_1\), the sets \(C_j \setminus C_{j+1}\) for \(1 \le j < k\), and \(C_k\) itself. Steps 1, 2, and 3 show each piece has measure-zero image, so \(F(C)\) has measure zero. Undoing the chart reductions, the critical values of the original map \(F : M \to N\) form a countable union of measure-zero sets, hence a set of measure zero in \(N\). \(\blacksquare\)
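
To see how the pieces of the decomposition divide the work, consider the simplest degenerate example (chosen here for illustration; it is not part of the proof): \(F : \mathbb{R} \to \mathbb{R}\), \(F(x) = x^3\), so that \(m = n = 1\). Here \[ F'(x) = 3x^2, \qquad F''(x) = 6x, \qquad F'''(x) = 6, \] and therefore \[ C = C_1 = C_2 = \{0\}, \qquad C_3 = \varnothing. \] Because \(n = 1\), a critical point is exactly a point where \(F'\) vanishes, so \(C \setminus C_1\) is empty and Step 1 has nothing to do. The threshold in Step 3 is \(k > m/n - 1 = 0\), so \(k = 1\) already suffices and the decomposition reads \(C = (C \setminus C_1) \cup C_1 = \varnothing \cup \{0\}\). The volume count of Step 3 with \(k = 1\) covers \(F(C_1 \cap E)\) by at most \(K\) intervals of radius proportional to \((R/K)^2\), of total length at most a constant multiple of \[ K \cdot (R/K)^2 = \frac{R^2}{K} \longrightarrow 0 \quad \text{as } K \to \infty, \] in agreement with the fact that \(F(C) = \{0\}\) is a single point.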

Critical Images and Negligible Submanifolds

Sard's theorem is most often used not for the critical values of an interesting map, but for a deceptively simple consequence: when the domain has strictly smaller dimension than the codomain, the map can have no regular points at all, and so its entire image is critical. The theorem then says the whole image is negligible. This is the use of lowering dimension promised earlier — the mirror image of the equidimensional invariance that made measure zero well defined.

Corollary: Images of Lower-Dimensional Domains Have Measure Zero

Let \(M\) and \(N\) be smooth manifolds with or without boundary, and let \(F : M \to N\) be a smooth map. If \(\dim M < \dim N\), then \(F(M)\) has measure zero in \(N\).

Proof.

At every point \(p \in M\) the differential \(dF_p : T_pM \to T_{F(p)}N\) is a linear map from a space of dimension \(\dim M\) to one of strictly larger dimension \(\dim N\), so it cannot be surjective. Every point of \(M\) is thus a critical point, and \(F(M)\) is precisely the set of critical values. By Sard's theorem, it has measure zero in \(N\). \(\blacksquare\)
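
A rough numerical illustration of the corollary, not a proof: the sketch below samples a smooth closed curve in the unit square and counts the grid cells of side \(1/K\) that the samples land in. The particular curve (a circle), the sample count, and the function names are assumptions made for this example, and because the curve is only sampled, the count is an estimate of the cells the curve actually crosses.

```python
import math


def covered_area(curve, K, samples=20000):
    """Area of the grid cells of side 1/K in the unit square hit by sampled curve points."""
    cells = set()
    for s in range(samples):
        x, y = curve(s / samples)
        i = min(K - 1, int(x * K))  # clamp so the edge x = 1 falls in the last cell
        j = min(K - 1, int(y * K))
        cells.add((i, j))
    return len(cells) / K ** 2


def circle(t):
    """A smooth closed curve: the circle of radius 0.4 centred at (0.5, 0.5)."""
    angle = 2 * math.pi * t
    return 0.5 + 0.4 * math.cos(angle), 0.5 + 0.4 * math.sin(angle)


areas = [covered_area(circle, K) for K in (10, 50, 200)]
print(areas)
```

The number of cells met grows roughly in proportion to \(K\), while each cell has area \(1/K^2\), so the covered area falls off roughly like \(1/K\). A space-filling curve would behave differently: it would keep meeting every cell at every resolution.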

The smoothness hypothesis is not a decoration that could be relaxed to continuity; it is the whole point. A celebrated construction produces a space-filling curve: a continuous map from the unit interval \([0, 1]\) onto the entire unit square \([0, 1] \times [0, 1]\). Here the domain has dimension one and the codomain dimension two, yet the image is all of a two-dimensional region — a set that is emphatically not of measure zero, since it contains rectangles of positive area. Mere continuity allows a one-dimensional object to smear out and fill a two-dimensional one. What forbids this, and makes the corollary true, is exactly the rigidity of smooth maps that we isolated in the invariance proposition: a smooth map satisfies a local Lipschitz bound, and a Lipschitz map cannot increase dimension in this way. Differentiability, not just continuity, is what keeps low-dimensional sets small.

The corollary specializes immediately to the situation that recurs throughout differential geometry: a submanifold of less than full dimension is negligible in its ambient manifold.

Corollary: Lower-Dimensional Submanifolds Have Measure Zero

Let \(M\) be a smooth manifold with or without boundary, and let \(S \subseteq M\) be an immersed submanifold with or without boundary. If \(\dim S < \dim M\), then \(S\) has measure zero in \(M\).

Proof.

The inclusion \(\iota : S \hookrightarrow M\) is a smooth map whose domain \(S\) has dimension strictly less than that of \(M\), and its image is \(S\) itself. The previous corollary applies. \(\blacksquare\)

This last statement makes precise an intuition we have appealed to repeatedly: a curve in a surface, a surface in a three-dimensional space, or more generally any submanifold cut out by even a single independent constraint, takes up no volume in the space that contains it. A point chosen at random — in any sense that respects the ambient dimension — will miss it with certainty. The same principle underwrites the working assumption of high-dimensional data analysis, that data governed by a few degrees of freedom occupies a low-dimensional set within its ambient space: if that set is a smooth submanifold of positive codimension, it is genuinely negligible, and the space-filling curve shows that the smoothness in that assumption is doing essential work.

With Sard's theorem proved and these consequences in hand, the tools are in place for the chapter's larger purpose. The negligibility of low-dimensional images is the engine behind a dimension-count that will let us take a manifold presented abstractly and place it, by a generic choice of projection, inside a Euclidean space of controlled dimension — turning the habit of picturing a manifold as a subset of \(\mathbb{R}^n\) into a theorem rather than a convenience. That is the embedding theory the manifold series develops next.