The Inverse Function Theorem
The two theorems of this page are the central analytic results about smooth maps between open subsets of Euclidean space. They make precise a single principle: under a nonsingularity hypothesis on the derivative, the local behavior of a smooth map is faithfully modeled by the behavior of its total derivative. The total derivative is the best linear approximation to \(F\) at a point; the inverse function theorem promotes a statement about that linear approximation — invertibility of \(DF(a)\) — into a statement about \(F\) itself, namely that \(F\) is a diffeomorphism near \(a\). The implicit function theorem, its close companion, then tells us when an equation \(\Phi(x, y) = c\) can be solved locally for \(y\) as a smooth function of \(x\). Together they are the engine behind the local theory of smooth manifolds, regular level sets, and the constraint geometry underlying optimization.
Let \(U \subseteq \mathbb{R}^n\) and \(V \subseteq \mathbb{R}^m\) be open subsets. A map \(F : U \to V\) is called smooth (or \(C^\infty\), or infinitely differentiable) if each of its component functions \(F^1, \ldots, F^m : U \to \mathbb{R}\) has continuous partial derivatives of all orders.
Throughout, we use smooth as synonymous with \(C^\infty\). Conventions in the literature vary — some authors use smooth to mean merely continuously differentiable, and some use differentiable to mean what we call smooth — but the choice of \(C^\infty\) is the most convenient here, since it is preserved under all the operations we use (compositions, restrictions, and matrix inversion), so we never need to track how many derivatives survive an argument.
Let \(U, V \subseteq \mathbb{R}^n\) be open subsets of the same Euclidean space. A map \(F : U \to V\) is called a diffeomorphism if \(F\) is a smooth bijection and its inverse \(F^{-1} : V \to U\) is also smooth.
The restriction to a common ambient \(\mathbb{R}^n\) is not an extra constraint imposed by the definition; it is forced by the existence of any diffeomorphism. If \(F : U \to V\) were a diffeomorphism with \(U \subseteq \mathbb{R}^n\) and \(V \subseteq \mathbb{R}^m\) nonempty, then at every point \(x \in U\) the chain rule applied to \(F^{-1} \circ F = \mathrm{id}_U\) would give \(D(F^{-1})(F(x))\, DF(x) = I_n\), so the \(m \times n\) Jacobian \(DF(x)\) would be left-invertible and hence of rank \(n \leq m\); the same argument applied to \(F \circ F^{-1} = \mathrm{id}_V\) would make it right-invertible and hence of rank \(m \leq n\). Both conditions can hold simultaneously only when the matrix is square, hence \(m = n\). The two-sided smoothness condition therefore forces the two ambient dimensions to agree, so we lose nothing by building this agreement into the definition itself. (This is the first hint of the inverse function theorem's central mechanism: the invertibility of a map is constrained by the invertibility of its derivative.)
Every diffeomorphism is, in particular, a homeomorphism: a smooth map is continuous, and the same is true of its inverse. The converse fails, and the standard cautionary example is worth keeping in mind. The map \(\psi(x, y) = (x^3, y^3)\) is a homeomorphism of \(\mathbb{R}^2\) onto itself, and \(\psi\) itself is smooth (its components are polynomials); but the inverse \(\psi^{-1}(u, v) = (u^{1/3}, v^{1/3})\) fails to be differentiable at the origin, and indeed at every point of the coordinate axes \(uv = 0\), since \(\partial_u u^{1/3} = \tfrac{1}{3}u^{-2/3}\) is unbounded as \(u \to 0\), and likewise in \(v\). So \(\psi\) is a smooth homeomorphism that is not a diffeomorphism. Notably, its Jacobian determinant \(\det D\psi(x, y) = 9x^2y^2\) vanishes at the origin and along the coordinate axes \(xy = 0\), which is exactly where the inverse function theorem's hypothesis fails.
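As a quick numerical illustration of this example (a sketch, not part of the argument), the following Python snippet evaluates difference quotients of \(u \mapsto u^{1/3}\) at \(0\) and the determinant \(\det D\psi(x, y) = 9x^2y^2\), which comes from \(D\psi = \mathrm{diag}(3x^2, 3y^2)\). The helper names `cube_root_quotient` and `jacobian_det_psi` are ours, introduced only for illustration.

```python
def cube_root_quotient(h):
    """Difference quotient (h**(1/3) - 0) / h of u -> u**(1/3) at u = 0, for h > 0."""
    return h ** (1.0 / 3.0) / h


def jacobian_det_psi(x, y):
    """det D(psi)(x, y) for psi(x, y) = (x**3, y**3), where D(psi) = diag(3x^2, 3y^2)."""
    return 9.0 * x * x * y * y


# The quotients equal h**(-2/3), so they grow without bound as h -> 0+.
quotients = [cube_root_quotient(10.0 ** (-k)) for k in (3, 6, 9)]

# The determinant vanishes at the origin and at a point of a coordinate axis,
# but not at the off-axis point (1, 1).
dets = [jacobian_det_psi(0.0, 0.0), jacobian_det_psi(0.0, 1.0), jacobian_det_psi(1.0, 1.0)]
```

The growing quotients reflect the failure of differentiability of \(\psi^{-1}\), while the vanishing determinants mark the points where \(D\psi\) is singular.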
We write \(DF(a)\) for the total derivative of \(F\) at \(a\), identified with its Jacobian matrix.
The analytic engine: a contraction with a fixed point
The proof of the inverse function theorem rests on one elementary fact about complete metric spaces. Recall that a map \(G : X \to X\) is a contraction if there is a constant \(\lambda \in (0, 1)\) with \(d(G(x), G(y)) \leq \lambda\, d(x, y)\) for all \(x, y \in X\), and that a fixed point of \(G\) is a point \(x\) with \(G(x) = x\). The Banach fixed-point theorem guarantees that every contraction of a nonempty complete metric space has a unique fixed point. This is the only fact about completeness the argument needs; we apply it to a closed ball in \(\mathbb{R}^n\), which is complete because it is a closed subset of a complete space.
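The fixed point in the Banach theorem is obtained as the limit of the iterates \(x_{k+1} = G(x_k)\) from any starting point. A minimal Python sketch of that iteration follows, using \(G = \cos\) on the complete space \([0, 1]\) as an assumed toy example; the function names `banach_iterate` and `a_priori_bound` are ours. The a priori estimate \(d(x_k, x^*) \leq \tfrac{\lambda^k}{1 - \lambda}\, d(x_1, x_0)\) is the standard one that falls out of the usual proof.

```python
import math


def banach_iterate(G, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = G(x_k) from x0; return (approximate fixed point, steps used)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = G(x)
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")


# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], and |cos'(x)| = |sin x| <= sin 1 < 1
# there, so cos is a contraction of [0, 1] with constant lam = sin(1).
lam = math.sin(1.0)
fixed_point, steps = banach_iterate(math.cos, 0.0)


def a_priori_bound(k, x0=0.0):
    """Standard bound lam**k / (1 - lam) * |G(x0) - x0| on the error after k steps."""
    return lam ** k / (1.0 - lam) * abs(math.cos(x0) - x0)
```

The bound is usually pessimistic: near the fixed point the effective contraction rate is \(|\sin x^*|\), which is smaller than \(\sin 1\).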
Suppose \(U\) and \(V\) are open subsets of \(\mathbb{R}^n\), and \(F : U \to V\) is a smooth map. If \(DF(a)\) is invertible at some point \(a \in U\), then there exist connected neighborhoods \(U_0 \subseteq U\) of \(a\) and \(V_0 \subseteq V\) of \(F(a)\) such that \(F|_{U_0} : U_0 \to V_0\) is a diffeomorphism.
Reduction to a normalized map. We first make two simplifications that cost nothing. Define \(F_1(x) = F(x + a) - F(a)\); this is smooth on a neighborhood of \(0\), satisfies \(F_1(0) = 0\) and \(DF_1(0) = DF(a)\), and \(F\) is a diffeomorphism on a connected neighborhood of \(a\) if and only if \(F_1\) is one on a connected neighborhood of \(0\). Next set \(F_2 = DF_1(0)^{-1} \circ F_1\); this is smooth on the same neighborhood, satisfies \(F_2(0) = 0\) and \(DF_2(0) = I_n\), and is a diffeomorphism near \(0\) precisely when \(F_1\) is. Replacing \(F\) by \(F_2\), we may therefore assume that \(F\) is defined on a neighborhood \(U\) of \(0\) with \(F(0) = 0\) and \(DF(0) = I_n\). Because \(\det DF(x)\) is a continuous function of \(x\), by shrinking \(U\) we may further assume that \(DF(x)\) is invertible for every \(x \in U\).
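The two normalizations can be carried out concretely. The sketch below assumes a hypothetical map \(F(x, y) = (e^x \cos y, e^x \sin y)\) and base point \(a = (0.5, 1.0)\), neither of which comes from the text, builds \(F_2 = DF(a)^{-1} \circ (F(\cdot + a) - F(a))\), and checks by central differences that \(F_2(0) = 0\) and \(DF_2(0) \approx I_2\).

```python
import math


def F(p):
    """Hypothetical example map; det DF = exp(2x) is never zero."""
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def DF(p):
    """Analytic Jacobian of F as a tuple of rows."""
    x, y = p
    ex = math.exp(x)
    return ((ex * math.cos(y), -ex * math.sin(y)),
            (ex * math.sin(y), ex * math.cos(y)))


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def apply(m, v):
    """Matrix-vector product for a 2x2 matrix."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


a = (0.5, 1.0)
A_inv = inv2(DF(a))
Fa = F(a)


def F2(p):
    """Normalized map: translate a to 0, F(a) to 0, then apply DF(a)^{-1}."""
    shifted = F((p[0] + a[0], p[1] + a[1]))
    return apply(A_inv, (shifted[0] - Fa[0], shifted[1] - Fa[1]))


def jacobian_fd(f, p, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian of f at p."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(tuple(plus)), f(tuple(minus))
        for i in range(2):
            rows[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return rows


J = jacobian_fd(F2, (0.0, 0.0))  # expected to be close to the identity matrix
```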
A Lipschitz estimate for the displacement. Let \(H(x) = x - F(x)\) for \(x \in U\). Then \(DH(0) = I_n - I_n = 0\). Since the entries of \(DH(x)\) depend continuously on \(x\), there is a number \(\delta > 0\) with \(\overline{B}_\delta(0) \subseteq U\) and \(\|DH(x)\| \leq \tfrac{1}{2}\) for all \(x \in \overline{B}_\delta(0)\), where \(\|\cdot\|\) denotes the operator norm. Because \(\overline{B}_\delta(0)\) is convex, the Mean Value Inequality shows that \(H\) is Lipschitz on \(\overline{B}_\delta(0)\) with constant \(\tfrac{1}{2}\): \[ |H(x') - H(x)| \;\leq\; \tfrac{1}{2}\,|x' - x|, \qquad x, x' \in \overline{B}_\delta(0). \tag{1} \] Taking \(x' = 0\) and using \(H(0) = 0\) gives \(|H(x)| \leq \tfrac{1}{2}|x|\) for all \(x \in \overline{B}_\delta(0)\). Moreover, since \(x' - x = F(x') - F(x) + H(x') - H(x)\), the triangle inequality and \((1)\) yield \[ |x' - x| \;\leq\; |F(x') - F(x)| + \tfrac{1}{2}|x' - x|, \qquad\text{hence}\qquad |x' - x| \;\leq\; 2\,|F(x') - F(x)|. \tag{2} \] In particular \(F\) is injective on \(\overline{B}_\delta(0)\).
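Estimates \((1)\) and \((2)\) can be spot-checked numerically. The sketch below assumes a hypothetical normalized map \(F(x, y) = (x + \tfrac{1}{4}y^2,\; y + \tfrac{1}{4}x^2)\), for which \(DH(x, y)\) has rows \((0, -\tfrac{1}{2}y)\) and \((-\tfrac{1}{2}x, 0)\), so that \(\|DH\| = \tfrac{1}{2}\max(|x|, |y|) \leq \tfrac{1}{2}\) on the closed unit ball and \(\delta = 1\) is admissible. Random sampling is only a sanity check, not a proof.

```python
import math
import random


def F(p):
    """Hypothetical normalized map with F(0) = 0 and DF(0) = I."""
    x, y = p
    return (x + 0.25 * y * y, y + 0.25 * x * x)


def H(p):
    """Displacement H(p) = p - F(p)."""
    fx = F(p)
    return (p[0] - fx[0], p[1] - fx[1])


def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def sample_closed_unit_ball(rng):
    """Rejection-sample a point of the closed unit ball."""
    while True:
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if math.hypot(p[0], p[1]) <= 1.0:
            return p


rng = random.Random(0)
eps = 1e-12  # slack for floating-point rounding
estimate_1_holds = True
estimate_2_holds = True
for _ in range(10_000):
    p, q = sample_closed_unit_ball(rng), sample_closed_unit_ball(rng)
    # (1): |H(p) - H(q)| <= (1/2) |p - q|
    estimate_1_holds = estimate_1_holds and dist(H(p), H(q)) <= 0.5 * dist(p, q) + eps
    # (2): |p - q| <= 2 |F(p) - F(q)|
    estimate_2_holds = estimate_2_holds and dist(p, q) <= 2.0 * dist(F(p), F(q)) + eps
```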
Local surjectivity via a fixed point. Let \(y \in B_{\delta/2}(0)\) be arbitrary; we show there is a unique \(x \in \overline{B}_\delta(0)\) with \(F(x) = y\). Define \(G(x) = y + H(x) = y + x - F(x)\), so that \(G(x) = x\) if and only if \(F(x) = y\). For \(|x| \leq \delta\), \[ |G(x)| \;\leq\; |y| + |H(x)| \;<\; \tfrac{\delta}{2} + \tfrac{1}{2}|x| \;\leq\; \delta, \] so \(G\) maps \(\overline{B}_\delta(0)\) into itself; and by \((1)\), \(|G(x) - G(x')| = |H(x) - H(x')| \leq \tfrac{1}{2}|x - x'|\), so \(G\) is a contraction. Since \(\overline{B}_\delta(0)\) is a nonempty complete metric space, the Banach fixed-point theorem gives a unique fixed point \(x \in \overline{B}_\delta(0)\); the strict bound above forces \(|x| < \delta\), so in fact \(x \in B_\delta(0)\). This proves that each \(y \in B_{\delta/2}(0)\) has exactly one preimage in \(B_\delta(0)\).
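The fixed-point argument is constructive: iterating \(G\) from \(x_0 = 0\) converges to the preimage of \(y\). A minimal sketch follows, assuming the hypothetical normalized map \(F(x, y) = (x + \tfrac{1}{4}y^2,\; y + \tfrac{1}{4}x^2)\), for which \(\delta = 1\) satisfies the bound \(\|DH\| \leq \tfrac{1}{2}\) on the closed unit ball. The name `solve_by_contraction` and the target point are ours.

```python
import math


def F(p):
    """Hypothetical normalized map with F(0) = 0 and DF(0) = I."""
    x, y = p
    return (x + 0.25 * y * y, y + 0.25 * x * x)


def solve_by_contraction(target, tol=1e-12, max_iter=200):
    """Iterate G(p) = target + p - F(p) from p = 0 until the step is below tol."""
    p = (0.0, 0.0)
    for _ in range(max_iter):
        fx = F(p)
        q = (target[0] + p[0] - fx[0], target[1] + p[1] - fx[1])
        if math.hypot(q[0] - p[0], q[1] - p[1]) < tol:
            return q
        p = q
    raise RuntimeError("iteration did not converge")


target = (0.3, -0.2)  # |target| is about 0.36, which is below delta / 2 = 0.5
root = solve_by_contraction(target)
residual = math.hypot(F(root)[0] - target[0], F(root)[1] - target[1])
```

Because the contraction constant is \(\tfrac{1}{2}\), each step at least halves the distance to the fixed point, so a tolerance of \(10^{-12}\) is reached in a few dozen iterations at most.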
The restricted map is a homeomorphism. Set \(V_0 = B_{\delta/2}(0)\) and \(U_0 = B_\delta(0) \cap F^{-1}(V_0)\). Then \(U_0\) is open in \(\mathbb{R}^n\), and the argument above shows that \(F : U_0 \to V_0\) is a bijection, so an inverse \(F^{-1} : V_0 \to U_0\) exists. Substituting \(x = F^{-1}(y)\) and \(x' = F^{-1}(y')\) into \((2)\) gives \(|F^{-1}(y') - F^{-1}(y)| \leq 2|y' - y|\), so \(F^{-1}\) is (Lipschitz) continuous. Thus \(F : U_0 \to V_0\) is a homeomorphism; since \(V_0\) is connected and \(F^{-1}\) is continuous, \(U_0\) is connected as well.
The inverse is differentiable. Fix \(y \in V_0\), set \(x = F^{-1}(y)\) and \(L = DF(x)\) (invertible, because we shrank \(U\) so that \(DF\) is invertible at every point). We claim \(F^{-1}\) is differentiable at \(y\) with derivative \(L^{-1}\). For \(y' \in V_0 \setminus \{y\}\), write \(x' = F^{-1}(y') \neq x\). Since \(y' - y = F(x') - F(x)\) and \(x' - x = L^{-1}L(x' - x)\), we have \(F^{-1}(y') - F^{-1}(y) - L^{-1}(y' - y) = -L^{-1}\bigl(F(x') - F(x) - L(x' - x)\bigr)\), and dividing by \(|y' - y|\) gives \[ \frac{F^{-1}(y') - F^{-1}(y) - L^{-1}(y' - y)}{|y' - y|} \;=\; \frac{|x' - x|}{|y' - y|}\, L^{-1}\!\left( -\,\frac{F(x') - F(x) - L(x' - x)}{|x' - x|} \right). \] The factor \(|x' - x| / |y' - y|\) is bounded by \(2\) thanks to \((2)\); the linear map \(L^{-1}\) is bounded; and as \(y' \to y\) we have \(x' \to x\) by continuity of \(F^{-1}\), so the rightmost factor tends to \(0\) because \(F\) is differentiable at \(x\) with derivative \(L\). Hence the whole expression tends to \(0\), proving that \(F^{-1}\) is differentiable at \(y\) with \(D(F^{-1})(y) = DF(x)^{-1}\).
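The formula \(D(F^{-1})(y) = DF(F^{-1}(y))^{-1}\) can be checked numerically. The sketch below again assumes the hypothetical normalized map \(F(x, y) = (x + \tfrac{1}{4}y^2,\; y + \tfrac{1}{4}x^2)\), evaluates \(F^{-1}\) by the contraction iteration \(x \mapsto y + x - F(x)\), and compares a central-difference Jacobian of \(F^{-1}\) with the inverse of the analytic Jacobian of \(F\). All function names are ours.

```python
import math


def F(p):
    """Hypothetical normalized map with F(0) = 0 and DF(0) = I."""
    x, y = p
    return (x + 0.25 * y * y, y + 0.25 * x * x)


def DF(p):
    """Analytic Jacobian of F as a tuple of rows."""
    x, y = p
    return ((1.0, 0.5 * y), (0.5 * x, 1.0))


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def F_inv(target, tol=1e-13, max_iter=500):
    """Local inverse of F near 0, computed by iterating p -> target + p - F(p)."""
    p = (0.0, 0.0)
    for _ in range(max_iter):
        fx = F(p)
        q = (target[0] + p[0] - fx[0], target[1] + p[1] - fx[1])
        if math.hypot(q[0] - p[0], q[1] - p[1]) < tol:
            return q
        p = q
    raise RuntimeError("iteration did not converge")


def jacobian_fd(f, p, h=1e-4):
    """Central-difference approximation of the 2x2 Jacobian of f at p."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(tuple(plus)), f(tuple(minus))
        for i in range(2):
            rows[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return rows


y = (0.3, -0.2)
x = F_inv(y)
predicted = inv2(DF(x))            # DF(F^{-1}(y))^{-1}
measured = jacobian_fd(F_inv, y)   # finite-difference D(F^{-1})(y)
max_gap = max(abs(predicted[i][j] - measured[i][j]) for i in range(2) for j in range(2))
```

The two matrices should agree up to the finite-difference error, which is far below the size of their entries.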
The inverse is smooth. The partial derivatives of \(F^{-1}\) are therefore defined at every point of \(V_0\), and the formula \(D(F^{-1})(y) = DF(F^{-1}(y))^{-1}\) exhibits the matrix-valued map \(y \mapsto D(F^{-1})(y)\) as the composition \[ y \;\xrightarrow{\;F^{-1}\;}\; F^{-1}(y) \;\xrightarrow{\;DF\;}\; DF(F^{-1}(y)) \;\xrightarrow{\;\text{inv}\;}\; DF(F^{-1}(y))^{-1}, \] where \(\text{inv}\) denotes matrix inversion. In this composition \(F^{-1}\) is continuous; \(DF\) is smooth, its component functions being the partial derivatives of \(F\); and matrix inversion is smooth, since Cramer's rule expresses the entries of an inverse as rational functions of the entries with nonvanishing denominator \(\det\). The composition of continuous maps is continuous, so the partial derivatives of \(F^{-1}\) are continuous and \(F^{-1}\) is of class \(C^1\). Assuming inductively that \(F^{-1}\) is of class \(C^k\), each map in the composition is then \(C^k\) (with \(DF\) smooth and inversion smooth), so \(D(F^{-1})\) is \(C^k\); this makes the partial derivatives of \(F^{-1}\) of class \(C^k\), hence \(F^{-1}\) itself is \(C^{k+1}\). By induction \(F^{-1}\) is smooth, and \(F|_{U_0} : U_0 \to V_0\) is a diffeomorphism. \(\blacksquare\)
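The smoothness of matrix inversion rests on the cofactor (Cramer's rule) formula \((A^{-1})_{ij} = (-1)^{i+j} \det(M_{ji}) / \det A\), where \(M_{ji}\) is \(A\) with row \(j\) and column \(i\) deleted. The Python sketch below implements that formula with exact rational arithmetic, which makes visible that each entry of the inverse is a polynomial in the entries of \(A\) divided by \(\det A\). The function names and the sample matrix are ours; the recursive determinant is meant for illustration, not for efficiency.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if not m:
        return Fraction(1)  # determinant of the empty matrix
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse_by_cofactors(m):
    """Inverse via (A^{-1})_{ij} = (-1)^{i+j} det(minor(A, j, i)) / det(A)."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[(-1) ** (i + j) * det(minor(m, j, i)) / d for j in range(n)]
            for i in range(n)]


A = [[Fraction(2), Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(3), Fraction(1)],
     [Fraction(0), Fraction(1), Fraction(4)]]
A_inv = inverse_by_cofactors(A)
```

Every entry of `A_inv` is a `Fraction` whose denominator divides \(\det A\), in line with the rational-function description used in the proof.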
Why a fixed point, and why completeness
The heart of the argument is the reformulation of "solve \(F(x) = y\)" as "find a fixed point of \(G(x) = y + x - F(x)\)." The nonsingularity of \(DF(a)\) is what permits the normalization \(DF(0) = I_n\); after it, the displacement \(H = \mathrm{id} - F\) has \(DH(0) = 0\), so by continuity its derivative is small near \(0\), and the Mean Value Inequality converts that derivative bound into the contraction constant \(\tfrac{1}{2}\). Completeness of the closed ball is then exactly the hypothesis the Banach fixed-point theorem requires, and it is the same property that guarantees convergence of fixed-point iterations in numerical analysis. The inverse function theorem is, in this sense, the analytic shadow of the fact that a well-conditioned nonlinear system can be solved by iteration.