The Matrix Exponential

Definition and Properties
The Exponential Maps into the Group
Rotations and Rodrigues' Formula
One-Parameter Subgroups
Connections: Robotics, Information Geometry, and Beyond

Definition and Properties

In the previous page, we defined the classical matrix Lie groups — \(GL\), \(SL\), \(O\), \(SO\), \(U\), \(SU\), and \(SE(3)\) — as closed subgroups of the general linear group. Each is carved out of \(GL(n)\) by nonlinear algebraic equations: \(A^\top A = I\), \(\det(A) = 1\), and so on. We now introduce the tool that bridges the nonlinear world of groups and the linear world of matrices: the matrix exponential.

The guiding analogy is the scalar ODE. The equation \(y'(t) = ay(t)\) with initial condition \(y(0) = 1\) has the unique solution \(y(t) = e^{at}\). For a system of linear ODEs, \(\mathbf{Y}'(t) = A\mathbf{Y}(t)\) with \(\mathbf{Y}(0) = I\), the solution should be \(\mathbf{Y}(t) = \exp(tA)\). This requires defining the exponential of a matrix.

Definition: Matrix Exponential

For any \(A \in M_n(\mathbb{C})\), the matrix exponential of \(A\) is \[ \exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \] where \(A^0 = I\) by convention.

Before we can use this definition, we must verify that the series converges.

Theorem: Convergence of the Matrix Exponential

The series \(\displaystyle\sum_{k=0}^{\infty} \frac{A^k}{k!}\) converges absolutely for every \(A \in M_n(\mathbb{C})\) with respect to any submultiplicative matrix norm \(\|\cdot\|\) (i.e., any norm satisfying \(\|AB\| \leq \|A\|\,\|B\|\)).

Proof:

Let \(\|\cdot\|\) be a submultiplicative norm on \(M_n(\mathbb{C})\). By induction, \(\|A^k\| \leq \|A\|^k\) for all \(k \geq 0\). Therefore: \[ \sum_{k=0}^{\infty} \left\|\frac{A^k}{k!}\right\| \leq \sum_{k=0}^{\infty} \frac{\|A\|^k}{k!} = e^{\|A\|} < \infty. \] The right-hand side is the scalar exponential of \(\|A\|\), which converges for all real numbers. By the comparison test, the matrix series converges absolutely. Since \(M_n(\mathbb{C})\) is a finite-dimensional (hence complete) normed vector space, absolute convergence implies convergence. \(\square\)
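The convergence argument translates directly into code. Below is a minimal NumPy sketch (the helper name `expm_series` is our own) that sums the truncated series; for a diagonal matrix the limit is known exactly, so we can see the partial sums doing their job.

```python
import numpy as np

def expm_series(A, terms=40):
    """Approximate exp(A) by the truncated series I + A + A^2/2! + ..."""
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ A / k       # term now holds A^k / k!
        result = result + term
    return result

# Diagonal test case: exp(diag(1, 2)) = diag(e, e^2), computed entrywise.
E = expm_series(np.diag([1.0, 2.0]))

# Property (c) in action: exp(A) exp(-A) = I for any A.
A = np.array([[0.1, 0.5], [-0.3, 0.2]])
product = expm_series(A) @ expm_series(-A)
```

Because the partial sums are dominated by the scalar series for \(e^{\|A\|}\), increasing `terms` quickly drives the truncation error below machine precision for matrices of moderate norm.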

Fundamental Properties

The matrix exponential shares many properties with the scalar exponential — but the failure of one key property reveals the essential role of non-commutativity.

Theorem: Properties of the Matrix Exponential

Let \(A, B \in M_n(\mathbb{C})\) and \(P \in GL(n, \mathbb{C})\). Then:

(a) \(\exp(0) = I\).

(b) If \(AB = BA\), then \(\exp(A + B) = \exp(A)\exp(B)\).

(c) \(\exp(A)\) is always invertible, with \(\exp(A)^{-1} = \exp(-A)\).

(d) \(\det(\exp(A)) = \exp(\mathrm{tr}(A))\).

(e) \(\exp(PAP^{-1}) = P\exp(A)P^{-1}\).

Proofs:

(a) Immediate: \(\exp(0) = I + 0 + 0 + \cdots = I\).

(b) Assume \(AB = BA\). Then \(A\) and \(B\) generate a commutative subalgebra of \(M_n(\mathbb{C})\), so the binomial theorem applies: \[ (A + B)^k = \sum_{j=0}^{k} \binom{k}{j} A^j B^{k-j}. \] Therefore: \[ \begin{align*} \exp(A + B) &= \sum_{k=0}^{\infty} \frac{(A+B)^k}{k!} = \sum_{k=0}^{\infty} \sum_{j=0}^{k} \frac{A^j}{j!} \cdot \frac{B^{k-j}}{(k-j)!} \\ &= \left(\sum_{j=0}^{\infty} \frac{A^j}{j!}\right)\left(\sum_{m=0}^{\infty} \frac{B^m}{m!}\right) = \exp(A)\exp(B). \end{align*} \] The interchange of summation order is justified by absolute convergence.

(c) Since \(A\) and \(-A\) commute, property (b) gives \(\exp(A)\exp(-A) = \exp(A + (-A)) = \exp(0) = I\). Similarly, \(\exp(-A)\exp(A) = I\). Therefore \(\exp(A)\) is invertible with inverse \(\exp(-A)\).

(d) When \(A\) is diagonalizable, say \(A = P \,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\, P^{-1}\), property (e) gives \(\exp(A) = P\,\mathrm{diag}(e^{\lambda_1}, \dots, e^{\lambda_n})\,P^{-1}\), so \[ \det(\exp(A)) = \prod_{i=1}^{n} e^{\lambda_i} = e^{\sum_i \lambda_i} = e^{\mathrm{tr}(A)}. \] The general case follows by a density argument: diagonalizable matrices are dense in \(M_n(\mathbb{C})\), and both sides are continuous functions of \(A\).

(e) By induction, \((PAP^{-1})^k = PA^kP^{-1}\) for all \(k \geq 0\). Therefore: \[ \exp(PAP^{-1}) = \sum_{k=0}^{\infty} \frac{(PAP^{-1})^k}{k!} = P\left(\sum_{k=0}^{\infty} \frac{A^k}{k!}\right)P^{-1} = P\exp(A)P^{-1}. \qquad \square \]

Warning: The Commutativity Hypothesis in Property (b)

Property (b) states that \(\exp(A+B) = \exp(A)\exp(B)\) when \(AB = BA\). This commutativity hypothesis is essential — the identity fails in general for non-commuting matrices. For example, if \[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \] then \(AB \neq BA\), and one can verify that \(\exp(A+B) \neq \exp(A)\exp(B)\).

This failure is not a defect — it is the defining feature of non-commutative groups, expressed at the level of the exponential map. The precise correction is given by the Baker-Campbell-Hausdorff formula, which expresses \(\log(\exp(A)\exp(B))\) as a series involving nested commutators \([A, B]\), \([A, [A, B]]\), etc. The first correction term is: \[ \exp(A)\exp(B) = \exp\!\left(A + B + \tfrac{1}{2}[A, B] + \cdots\right) \] where \([A, B] = AB - BA\) is the matrix commutator. This commutator — the Lie bracket — will be formalized in Lie Algebras and the Lie Bracket.
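The failure is easy to witness numerically. Here is a short NumPy sketch (reusing a hand-rolled truncated series, which is ample for these small matrices) for exactly the pair \(A, B\) above:

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated exponential series; accurate enough for small matrices."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: exp(A) = I + A exactly
B = np.array([[0.0, 0.0], [1.0, 0.0]])

lhs = expm_series(A + B)                  # exp(A + B)
rhs = expm_series(A) @ expm_series(B)     # exp(A) exp(B)
# The two disagree because AB != BA.
```

Since \((A+B)^2 = I\), one can even check by hand that \(\exp(A+B) = \cosh(1)\,I + \sinh(1)\,(A+B)\), while \(\exp(A)\exp(B) = (I+A)(I+B)\) has top-left entry 2.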

The Exponential Maps into the Group

A remarkable property of the matrix exponential is that it automatically respects the defining equations of each classical group. Exponentiating a matrix satisfying certain linear conditions produces a group element satisfying the corresponding nonlinear conditions.

Theorem: The Exponential Map Respects Group Structure

Let \(A \in M_n(\mathbb{C})\). Then:

(a) If \(\mathrm{tr}(A) = 0\), then \(\exp(A) \in SL(n)\).

(b) If \(A^\top = -A\) (skew-symmetric), then \(\exp(A) \in SO(n)\).

(c) If \(A^* = -A\) (skew-Hermitian), then \(\exp(A) \in U(n)\).

(d) If \(A^* = -A\) and \(\mathrm{tr}(A) = 0\), then \(\exp(A) \in SU(n)\).

Proofs:

(a) By property (d) of the matrix exponential: \(\det(\exp(A)) = \exp(\mathrm{tr}(A)) = \exp(0) = 1\).

(b) If \(A^\top = -A\), then: \[ \exp(A)^\top = \exp(A^\top) = \exp(-A) = \exp(A)^{-1}. \] Hence \(\exp(A)^\top \exp(A) = I\), so \(\exp(A) \in O(n)\). Moreover, \(\det(\exp(A)) = \exp(\mathrm{tr}(A)) = \exp(0) = 1\) (since \(\mathrm{tr}(A) = 0\) for every skew-symmetric matrix: the diagonal entries of a skew-symmetric matrix are all zero), so \(\exp(A) \in SO(n)\).

(c) If \(A^* = -A\), then \(\exp(A)^* = \exp(A^*) = \exp(-A) = \exp(A)^{-1}\), so \(\exp(A) \in U(n)\).

(d) Combine (c) and (a). \(\qquad \square\)
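The theorem lends itself to a direct numerical sanity check: exponentiating a random skew-symmetric matrix should land in \(SO(n)\). A minimal NumPy sketch (the truncated-series helper is our own):

```python
import numpy as np

def expm_series(M, terms=60):
    """Truncated exponential series I + M + M^2/2! + ..."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = X - X.T                # skew-symmetric, hence traceless
R = expm_series(A)
# R should satisfy R^T R = I and det(R) = 1, i.e. R lies in SO(4).
```

The determinant check also illustrates property (d): \(\det(\exp(A)) = e^{\mathrm{tr}(A)} = e^0 = 1\).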

This theorem reveals a beautiful pattern. The following table places it alongside the group definitions from the previous page:

| Linear condition on \(A\) | \(\xrightarrow{\;\exp\;}\) Nonlinear group condition | \(\exp(A) \in\) |
| --- | --- | --- |
| (no constraint) | \(\det \neq 0\) (automatic) | \(GL(n)\) |
| \(\mathrm{tr}(A) = 0\) | \(\det = 1\) | \(SL(n)\) |
| \(A^\top = -A\) | \(A^\top A = I,\; \det = 1\) | \(SO(n)\) |
| \(A^* = -A\) | \(A^* A = I\) | \(U(n)\) |
| \(A^* = -A,\; \mathrm{tr}(A) = 0\) | \(A^* A = I,\; \det = 1\) | \(SU(n)\) |

The left column consists of linear subspaces of \(M_n(\mathbb{C})\); these will be identified as Lie algebras in Lie Algebras and the Lie Bracket. The right column consists of the nonlinear matrix Lie groups. The exponential map is the bridge between them: it converts infinitesimal (linear) symmetry into finite (nonlinear) symmetry.

Rotations and Rodrigues' Formula

We now put the matrix exponential to work by computing it explicitly for the rotation groups. These calculations are not merely exercises — they produce formulas that are used daily in robotics, computer graphics, and aerospace engineering.

Rotations in \(SO(2)\)

The simplest non-trivial rotation group is \(SO(2)\), the group of rotations of the plane. The skew-symmetry condition \(A^\top = -A\) for a \(2 \times 2\) matrix gives the general form: \[ A = \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad \theta \in \mathbb{R}. \] Let us write \(J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\). Since \(J^2 = -I\), we have \(J^3 = -J\), \(J^4 = I\), and the pattern repeats with period 4. Therefore: \[ \begin{align*} \exp(\theta J) &= I + \theta J + \frac{\theta^2 J^2}{2!} + \frac{\theta^3 J^3}{3!} + \cdots \\ &= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) I + \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) J \\ &= \cos\theta \cdot I + \sin\theta \cdot J = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \end{align*} \] This is precisely the 2D rotation matrix \(R_\theta\) encountered at the very beginning of our study of orthogonal matrices — now derived from first principles via the exponential map. The exponential map establishes a surjective homomorphism \((\mathbb{R}, +) \to SO(2)\), \(\theta \mapsto \exp(\theta J)\), with kernel \(2\pi\mathbb{Z}\). By the First Isomorphism Theorem: \[ SO(2) \cong \mathbb{R} / 2\pi\mathbb{Z} \cong S^1. \]
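Both the closed form and the kernel \(2\pi\mathbb{Z}\) can be confirmed numerically (the series helper is our own):

```python
import numpy as np

def expm_series(M, terms=60):
    """Truncated exponential series; 60 terms is ample for |theta| up to 2*pi."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

J = np.array([[0.0, -1.0], [1.0, 0.0]])

theta = 0.7
R = expm_series(theta * J)               # should equal the rotation matrix R_theta
full_turn = expm_series(2 * np.pi * J)   # kernel 2*pi*Z: a full turn is the identity
```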

Rotations in \(SO(3)\): The Hat Map and Infinitesimal Generators

For \(SO(3)\), the skew-symmetric matrices form a 3-dimensional vector space. A standard basis consists of the infinitesimal generators of rotation about each coordinate axis: \[ E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad E_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \] Here \(E_i\) generates rotations about the \(i\)-th coordinate axis: \(\exp(\theta E_1)\) rotates about the \(x\)-axis by angle \(\theta\), and likewise for \(E_2\) (\(y\)-axis) and \(E_3\) (\(z\)-axis).

An arbitrary unit vector \(\hat{\mathbf{n}} = (n_1, n_2, n_3)^\top\) with \(\|\hat{\mathbf{n}}\| = 1\) defines a skew-symmetric matrix via the hat map: \[ \hat{\mathbf{n}}_\times = n_1 E_1 + n_2 E_2 + n_3 E_3 = \begin{pmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{pmatrix}. \] This matrix encodes the cross product: \(\hat{\mathbf{n}}_\times \mathbf{v} = \hat{\mathbf{n}} \times \mathbf{v}\) for all \(\mathbf{v} \in \mathbb{R}^3\). The exponential \(\exp(\theta\, \hat{\mathbf{n}}_\times)\) gives a rotation by angle \(\theta\) about the axis \(\hat{\mathbf{n}}\), and it admits a closed-form expression.

Theorem: Rodrigues' Rotation Formula

Let \(\hat{\mathbf{n}} \in \mathbb{R}^3\) be a unit vector and \(\theta \in \mathbb{R}\). Then: \[ \exp(\theta\, \hat{\mathbf{n}}_\times) = I + \sin\theta\;\hat{\mathbf{n}}_\times + (1 - \cos\theta)\;\hat{\mathbf{n}}_\times^{\,2}. \]

Proof:

The key observation is that \(\hat{\mathbf{n}}_\times\) satisfies the identity \(\hat{\mathbf{n}}_\times^{\,3} = -\hat{\mathbf{n}}_\times\), which can be verified by direct computation using \(\|\hat{\mathbf{n}}\| = 1\). This gives a cyclic pattern: \[ \hat{\mathbf{n}}_\times^{\,3} = -\hat{\mathbf{n}}_\times, \quad \hat{\mathbf{n}}_\times^{\,4} = -\hat{\mathbf{n}}_\times^{\,2}, \quad \hat{\mathbf{n}}_\times^{\,5} = \hat{\mathbf{n}}_\times, \quad \dots \] Substituting into the exponential series and grouping by powers of \(\hat{\mathbf{n}}_\times\): \[ \begin{align*} \exp(\theta\,\hat{\mathbf{n}}_\times) &= I + \theta\,\hat{\mathbf{n}}_\times + \frac{\theta^2}{2!}\hat{\mathbf{n}}_\times^{\,2} + \frac{\theta^3}{3!}\hat{\mathbf{n}}_\times^{\,3} + \frac{\theta^4}{4!}\hat{\mathbf{n}}_\times^{\,4} + \cdots \\ &= I + \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right)\hat{\mathbf{n}}_\times + \left(\frac{\theta^2}{2!} - \frac{\theta^4}{4!} + \frac{\theta^6}{6!} - \cdots\right)\hat{\mathbf{n}}_\times^{\,2} \\ &= I + \sin\theta\;\hat{\mathbf{n}}_\times + (1 - \cos\theta)\;\hat{\mathbf{n}}_\times^{\,2}. \qquad \square \end{align*} \]

Rodrigues' formula converts the infinite series \(\exp(\theta\,\hat{\mathbf{n}}_\times)\) into a finite expression involving only \(I\), \(\hat{\mathbf{n}}_\times\), and \(\hat{\mathbf{n}}_\times^{\,2}\). Rotation by any angle about any axis can be computed with a handful of matrix operations rather than a series truncation. This formula is the backbone of rotation representations in robotics, computer graphics, and attitude estimation.
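A hedged implementation (the function names `hat` and `rodrigues` are our own) that also checks the cross-product identity and the fact that the rotation fixes its own axis:

```python
import numpy as np

def hat(n):
    """Hat map: 3-vector -> skew-symmetric matrix, so hat(n) @ v == cross(n, v)."""
    return np.array([[0.0, -n[2], n[1]],
                     [n[2], 0.0, -n[0]],
                     [-n[1], n[0], 0.0]])

def rodrigues(n, theta):
    """Rotation by angle theta about the unit axis n, via Rodrigues' formula."""
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * (N @ N)

n = np.array([1.0, 2.0, 2.0]) / 3.0    # unit axis
R = rodrigues(n, 0.9)
# R lies in SO(3), and R @ n == n: the axis is fixed.
```

Note that only two matrix products are needed (for \(\hat{\mathbf{n}}_\times^{\,2}\) and the final combination), which is exactly the computational advantage over truncating the series.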

One-Parameter Subgroups

The examples above — \(\theta \mapsto \exp(\theta J)\) in \(SO(2)\), \(t \mapsto \exp(t E_i)\) in \(SO(3)\) — are all instances of a single pattern: a continuous homomorphism from \((\mathbb{R}, +)\) into a matrix Lie group. We now formalize this pattern.

Definition: One-Parameter Subgroup

A one-parameter subgroup of a matrix Lie group \(G\) is a continuous group homomorphism \(\gamma : (\mathbb{R}, +) \to G\). That is, \(\gamma\) is a continuous map satisfying: \[ \gamma(s + t) = \gamma(s)\,\gamma(t) \quad \text{for all } s, t \in \mathbb{R}, \qquad \gamma(0) = I. \]

The fundamental theorem of this section shows that every one-parameter subgroup arises from the matrix exponential, and conversely.

Theorem: One-Parameter Subgroups are Matrix Exponentials

Every one-parameter subgroup of a matrix Lie group \(G\) has the form \[ \gamma(t) = \exp(tA) \] for a unique matrix \(A = \gamma'(0)\).

Proof sketch:

Let \(\gamma : \mathbb{R} \to G\) be a one-parameter subgroup. By Cartan's Closed Subgroup Theorem, \(G\) is a smooth submanifold of \(M_n(\mathbb{C})\), so \(\gamma\) is smooth (a continuous homomorphism between Lie groups is automatically smooth). Differentiating the identity \(\gamma(t + s) = \gamma(t)\gamma(s)\) with respect to \(s\) at \(s = 0\) gives: \[ \gamma'(t) = \gamma(t)\,\gamma'(0) = \gamma(t)\,A \] where \(A = \gamma'(0)\). This is the matrix ODE \(\gamma'(t) = \gamma(t)\,A\) with initial condition \(\gamma(0) = I\). Since \(\exp(tA)\) also satisfies this ODE (as \(\frac{d}{dt}\exp(tA) = \exp(tA)\,A\)), uniqueness of solutions to linear ODEs gives \(\gamma(t) = \exp(tA)\). \(\square\)

Geometric Interpretation

Each one-parameter subgroup \(\gamma(t) = \exp(tA)\) traces a curve through the identity in the group \(G\). The matrix \(A = \gamma'(0)\) is the velocity vector of this curve at \(t = 0\). The collection of all such velocity vectors forms a vector space — this will be formalized as the Lie algebra \(\mathfrak{g}\) of \(G\) in Lie Algebras and the Lie Bracket.

Geometrically, one-parameter subgroups are the "straight lines through the identity" in the group. Just as a line in \(\mathbb{R}^n\) through the origin is determined by its direction vector, a one-parameter subgroup of \(G\) is determined by its tangent vector \(A\) at the identity. The exponential map \(\exp : A \mapsto \exp(A)\) sends tangent vectors to group elements — it is the map from "infinitesimal motions" to "finite transformations."

Example: One-Parameter Subgroups of \(SO(3)\)

The three curves \[ \gamma_1(t) = \exp(t\,E_1), \qquad \gamma_2(t) = \exp(t\,E_2), \qquad \gamma_3(t) = \exp(t\,E_3) \] are one-parameter subgroups of \(SO(3)\), where \(E_1, E_2, E_3\) are the infinitesimal generators defined in the previous section. Each \(\gamma_i\) traces out a great circle in \(SO(3)\): it is the family of all rotations about a single coordinate axis. An arbitrary one-parameter subgroup \(\gamma(t) = \exp(t(a_1 E_1 + a_2 E_2 + a_3 E_3))\) corresponds to rotation about the axis \((a_1, a_2, a_3)\) at rate \(\|(a_1, a_2, a_3)\|\).
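Both halves of the theorem can be checked numerically for \(\gamma_1\): the homomorphism property \(\gamma(s+t) = \gamma(s)\gamma(t)\), and the recovery of \(A = \gamma'(0)\) by a finite difference (the series helper and step size are our own choices):

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated exponential series I + M + M^2/2! + ..."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

E1 = np.array([[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]])   # infinitesimal generator of x-axis rotations

def gamma(t):
    return expm_series(t * E1)

# gamma(s + t) == gamma(s) gamma(t), and a finite difference at t = 0 recovers E1.
velocity = (gamma(1e-6) - gamma(0.0)) / 1e-6
```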

Non-Example: Finite Groups Have No Non-Trivial One-Parameter Subgroups

If \(G\) is a finite group (such as the dihedral group \(D_n\) embedded as matrices in \(O(2)\)), then any continuous homomorphism \(\gamma : \mathbb{R} \to G\) must be constant. Indeed, \(\gamma(\mathbb{R})\) is a connected subset of \(G\) (as the continuous image of the connected space \(\mathbb{R}\)), but a finite group with the discrete topology has no connected subsets other than singletons. Since \(\gamma(0) = I\), we conclude \(\gamma(t) = I\) for all \(t\).

This is the rigorous version of an observation from Geometry of Symmetry: the dihedral group \(D_n\) has "no in-between states" — one cannot continuously interpolate between a rotation by \(0°\) and a rotation by \(360°/n\) while staying inside \(D_n\). The group \(SO(2)\), by contrast, has a one-parameter subgroup for every real number \(\theta\), reflecting the fact that rotation by any angle is a valid symmetry.

Connections: Robotics, Information Geometry, and Beyond

SE(3) and Screw Motions in Robotics

The special Euclidean group \(SE(3)\) models rigid body motion: every element is a rotation composed with a translation. The Lie algebra of \(SE(3)\) consists of \(4 \times 4\) matrices of the form \[ \hat{\xi} = \begin{pmatrix} \hat{\boldsymbol{\omega}}_\times & \mathbf{v} \\ \mathbf{0}^\top & 0 \end{pmatrix} \] where \(\hat{\boldsymbol{\omega}}_\times \in \mathfrak{so}(3)\) is a skew-symmetric matrix (encoding angular velocity) and \(\mathbf{v} \in \mathbb{R}^3\) encodes translational velocity. The 6-dimensional vector \(\xi = (\boldsymbol{\omega}, \mathbf{v}) \in \mathbb{R}^6\) is called a twist in robotics terminology.

Exponentiating a twist produces a rigid body transformation: \(\exp(\hat{\xi}) \in SE(3)\). Remarkably, the resulting motion is always a screw motion — a simultaneous rotation about and translation along a fixed axis (Chasles' theorem). The exponential map for \(SE(3)\) is therefore the mathematical backbone of robot forward kinematics: given the joint velocities of a robotic arm, the exponential map produces the resulting motion of the end effector.
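A minimal sketch of the twist exponential (variable names are ours; production robotics libraries use a closed form for \(\exp\) on \(SE(3)\) rather than the raw series). Rotating about the \(z\)-axis with a translational velocity component along \(z\) produces a visible screw: the \(z\)-translation accumulates independently of the rotation.

```python
import numpy as np

def expm_series(M, terms=60):
    """Truncated exponential series I + M + M^2/2! + ..."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

omega = np.array([0.0, 0.0, 1.0])      # angular velocity: rotate about z
v = np.array([1.0, 0.0, 0.5])          # translational velocity

xi_hat = np.zeros((4, 4))              # twist matrix in the Lie algebra of SE(3)
xi_hat[:3, :3] = np.array([[0.0, -omega[2], omega[1]],
                           [omega[2], 0.0, -omega[0]],
                           [-omega[1], omega[0], 0.0]])
xi_hat[:3, 3] = v

T = expm_series(xi_hat)                # a rigid body transformation in SE(3)
R, p = T[:3, :3], T[:3, 3]
# R is a rotation, the bottom row stays (0, 0, 0, 1), and the z-translation
# equals v_z * 1 = 0.5: rotation about z and translation along z, simultaneously.
```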

Statistical Manifolds and Information Geometry

In Natural Gradient Descent, we saw that the Fisher Information Matrix (FIM) serves as a Riemannian metric on spaces of probability distributions. We can now sharpen this observation using Lie group language.

The space of Gaussian distributions \(\mathcal{N}(\mu, \sigma^2)\), parameterized by \((\mu, \sigma)\) with \(\sigma > 0\), can be identified with the affine group \(\mathrm{Aff}^+(1)\), a 2-dimensional Lie group, and the FIM is a left-invariant metric on this group: it respects the group structure in the sense that left-translation by a group element does not change the metric. The natural gradient is the gradient with respect to this invariant metric; it follows the group's intrinsic geometry rather than the ambient Euclidean geometry. The exponential map on this statistical Lie group connects to the exponential family of distributions, giving a geometric explanation for why exponential families are "natural" in statistics.

This connection will be fully formalized once we have smooth manifolds and Riemannian metrics at our disposal, but the essential point is already clear: the matrix exponential is not just a tool for rotations — it is the universal mechanism for moving along group-invariant directions, whether the group describes physical symmetries or statistical structure.

Looking Ahead

In this page, we have seen that the matrix exponential converts linear data (matrices satisfying conditions like skew-symmetry or tracelessness) into nonlinear group elements (orthogonal matrices, volume-preserving transformations). The one-parameter subgroup theorem showed that this passage is canonical: every continuous path through the identity in a matrix Lie group is an exponential.

The collection of all velocity vectors \(A = \gamma'(0)\) of one-parameter subgroups forms a vector space, and this vector space carries additional algebraic structure: the Lie bracket \([A, B] = AB - BA\), which measures the failure of the exponential to be a homomorphism. In the next page, we formalize this as the Lie algebra of a matrix Lie group, compute the Lie algebras of all the classical groups, and discover that the bracket encodes the group's non-commutativity at the infinitesimal level.