Definition and Properties
On the previous page, we defined the
classical matrix Lie groups — \(GL\), \(SL\), \(O\), \(SO\), \(U\), \(SU\), and
\(SE(3)\) — as closed subgroups of the general linear group. Each is carved out of
\(GL(n)\) by nonlinear algebraic equations: \(A^\top A = I\), \(\det(A) = 1\),
and so on. We now introduce the tool that bridges the nonlinear world of groups and the
linear world of matrices: the matrix exponential.
The guiding analogy is the scalar ODE. The equation \(y'(t) = ay(t)\) with initial
condition \(y(0) = 1\) has the unique solution \(y(t) = e^{at}\). For a system of
linear ODEs, \(\mathbf{Y}'(t) = A\mathbf{Y}(t)\) with \(\mathbf{Y}(0) = I\), the
solution should be \(\mathbf{Y}(t) = \exp(tA)\). This requires defining the exponential
of a matrix.
Definition: Matrix Exponential
For any \(A \in M_n(\mathbb{C})\), the matrix exponential of \(A\) is
\[
\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots
\]
where \(A^0 = I\) by convention.
Before we can use this definition, we must verify that the series converges.
Theorem: Convergence of the Matrix Exponential
The series \(\displaystyle\sum_{k=0}^{\infty} \frac{A^k}{k!}\) converges absolutely
for every \(A \in M_n(\mathbb{C})\) with respect to any submultiplicative matrix norm
\(\|\cdot\|\) (i.e., any norm satisfying \(\|AB\| \leq \|A\|\,\|B\|\)).
Proof:
Let \(\|\cdot\|\) be a submultiplicative norm on \(M_n(\mathbb{C})\). By induction,
\(\|A^k\| \leq \|A\|^k\) for all \(k \geq 0\). Therefore:
\[
\sum_{k=0}^{\infty} \left\|\frac{A^k}{k!}\right\|
\leq \sum_{k=0}^{\infty} \frac{\|A\|^k}{k!} = e^{\|A\|} < \infty.
\]
The right-hand side is the scalar exponential series evaluated at \(\|A\|\), which
converges for every real argument. By the comparison test, the matrix series converges absolutely. Since
\(M_n(\mathbb{C})\) is a finite-dimensional (hence complete) normed vector space,
absolute convergence implies convergence. \(\square\)
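As a quick numerical sanity check on the definition, the truncated series can be compared against SciPy's built-in `expm` (a Padé-based implementation). This is a minimal sketch; the test matrix and the truncation order of 30 are arbitrary choices:

```python
import numpy as np
from scipy.linalg import expm

def exp_series(A, terms=30):
    """Matrix exponential via the truncated power series sum_k A^k / k!."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # builds A^k / k! incrementally
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-2.0, 3.0]])
series_val = exp_series(A)
reference = expm(A)                   # SciPy's matrix exponential
max_err = np.max(np.abs(series_val - reference))
```

For a matrix of modest norm, 30 terms already agree with the reference to near machine precision, exactly as the factorial decay in the convergence proof predicts.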
Fundamental Properties
The matrix exponential shares many properties with the scalar exponential — but the failure
of one key property reveals the essential role of non-commutativity.
Theorem: Properties of the Matrix Exponential
Let \(A, B \in M_n(\mathbb{C})\) and \(P \in GL(n, \mathbb{C})\). Then:
(a) \(\exp(0) = I\).
(b) If \(AB = BA\), then
\(\exp(A + B) = \exp(A)\exp(B)\).
(c) \(\exp(A)\) is always invertible, with
\(\exp(A)^{-1} = \exp(-A)\).
(d) \(\det(\exp(A)) = \exp(\mathrm{tr}(A))\).
(e) \(\exp(PAP^{-1}) = P\exp(A)P^{-1}\).
Proofs:
(a) Immediate: \(\exp(0) = I + 0 + 0 + \cdots = I\).
(b) Assume \(AB = BA\). Then \(A\) and \(B\) generate a commutative
subalgebra of \(M_n(\mathbb{C})\), so the binomial theorem applies:
\[
(A + B)^k = \sum_{j=0}^{k} \binom{k}{j} A^j B^{k-j}.
\]
Therefore:
\[
\begin{align*}
\exp(A + B) &= \sum_{k=0}^{\infty} \frac{(A+B)^k}{k!}
= \sum_{k=0}^{\infty} \sum_{j=0}^{k} \frac{A^j}{j!} \cdot \frac{B^{k-j}}{(k-j)!} \\
&= \left(\sum_{j=0}^{\infty} \frac{A^j}{j!}\right)\left(\sum_{m=0}^{\infty} \frac{B^m}{m!}\right)
= \exp(A)\exp(B).
\end{align*}
\]
The interchange of summation order is justified by absolute convergence.
(c) Since \(A\) and \(-A\) commute, property (b) gives
\(\exp(A)\exp(-A) = \exp(A + (-A)) = \exp(0) = I\). Similarly,
\(\exp(-A)\exp(A) = I\). Therefore \(\exp(A)\) is invertible with inverse
\(\exp(-A)\).
(d) When \(A\) is diagonalizable, say
\(A = P \,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\, P^{-1}\),
property (e) gives
\(\exp(A) = P\,\mathrm{diag}(e^{\lambda_1}, \dots, e^{\lambda_n})\,P^{-1}\), so
\[
\det(\exp(A)) = \prod_{i=1}^{n} e^{\lambda_i}
= e^{\sum_i \lambda_i} = e^{\mathrm{tr}(A)}.
\]
The general case follows by a density argument: diagonalizable matrices are dense in
\(M_n(\mathbb{C})\), and both sides are continuous functions of \(A\).
(e) By induction, \((PAP^{-1})^k = PA^kP^{-1}\) for all
\(k \geq 0\). Therefore:
\[
\exp(PAP^{-1}) = \sum_{k=0}^{\infty} \frac{(PAP^{-1})^k}{k!}
= P\left(\sum_{k=0}^{\infty} \frac{A^k}{k!}\right)P^{-1}
= P\exp(A)P^{-1}. \qquad \square
\]
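Properties (d) and (e) are easy to verify numerically. The following sketch uses a random test matrix; the shift `4 * np.eye(4)` is only there to keep the conjugating matrix \(P\) safely invertible:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # shifted to be well-conditioned

# Property (d): det(exp(A)) = exp(tr(A))
det_lhs = np.linalg.det(expm(A))
det_rhs = np.exp(np.trace(A))

# Property (e): exp(P A P^{-1}) = P exp(A) P^{-1}
Pinv = np.linalg.inv(P)
conj_lhs = expm(P @ A @ Pinv)
conj_rhs = P @ expm(A) @ Pinv
conj_err = np.max(np.abs(conj_lhs - conj_rhs))
```

Both identities hold to floating-point precision for any test matrix, not just diagonalizable ones.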
Warning: The Commutativity Hypothesis in Property (b)
Property (b) states that \(\exp(A+B) = \exp(A)\exp(B)\) when
\(AB = BA\). This commutativity hypothesis is essential —
the identity fails in general for non-commuting matrices. For example, if
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\]
then \(AB \neq BA\), and one can verify that
\(\exp(A+B) \neq \exp(A)\exp(B)\).
This failure is not a defect — it is the defining feature of non-commutative
groups, expressed at the level of the exponential map. The precise correction is given
by the Baker-Campbell-Hausdorff formula, which expresses
\(\log(\exp(A)\exp(B))\) as a series involving nested commutators \([A, B]\),
\([A, [A, B]]\), etc. The first correction term is:
\[
\exp(A)\exp(B) = \exp\!\left(A + B + \tfrac{1}{2}[A, B] + \cdots\right)
\]
where \([A, B] = AB - BA\) is the matrix commutator. This
commutator — the Lie bracket — will be formalized in
Lie Algebras and the Lie Bracket.
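The \(2 \times 2\) counterexample above can be checked by hand or by machine: since \(A\) and \(B\) are nilpotent (\(A^2 = B^2 = 0\)), their exponentials are simply \(I + A\) and \(I + B\), while \(A + B\) has eigenvalues \(\pm 1\). A short numerical sketch:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

# A is nilpotent (A @ A = 0), so exp(A) = I + A exactly; likewise for B.
expA_expB = expm(A) @ expm(B)        # = [[2, 1], [1, 1]]
exp_sum = expm(A + B)                # = cosh(1) I + sinh(1) (A + B)
gap = np.max(np.abs(exp_sum - expA_expB))
```

The two matrices differ in every entry, and the size of the discrepancy is exactly what the commutator terms in the Baker-Campbell-Hausdorff series account for.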
The Exponential Maps into the Group
A remarkable property of the matrix exponential is that it automatically respects the
defining equations of each classical group. Exponentiating a matrix satisfying certain
linear conditions produces a group element satisfying the corresponding
nonlinear conditions.
Theorem: The Exponential Map Respects Group Structure
Let \(A \in M_n(\mathbb{C})\). Then:
(a) If \(\mathrm{tr}(A) = 0\), then
\(\exp(A) \in SL(n)\).
(b) If \(A^\top = -A\) (skew-symmetric), then
\(\exp(A) \in SO(n)\).
(c) If \(A^* = -A\) (skew-Hermitian), then
\(\exp(A) \in U(n)\).
(d) If \(A^* = -A\) and \(\mathrm{tr}(A) = 0\), then
\(\exp(A) \in SU(n)\).
Proofs:
(a) By property (d) of the matrix exponential:
\(\det(\exp(A)) = \exp(\mathrm{tr}(A)) = \exp(0) = 1\).
(b) If \(A^\top = -A\), then:
\[
\exp(A)^\top = \exp(A^\top) = \exp(-A) = \exp(A)^{-1}.
\]
Hence \(\exp(A)^\top \exp(A) = I\), so \(\exp(A) \in O(n)\). Moreover,
\(\det(\exp(A)) = \exp(\mathrm{tr}(A)) = \exp(0) = 1\) (since
\(\mathrm{tr}(A) = 0\) for every skew-symmetric matrix: the diagonal entries
of a skew-symmetric matrix are all zero), so \(\exp(A) \in SO(n)\).
(c) If \(A^* = -A\), then
\(\exp(A)^* = \exp(A^*) = \exp(-A) = \exp(A)^{-1}\), so
\(\exp(A) \in U(n)\).
(d) Combine (c) and (a). \(\qquad \square\)
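Part (b) of the theorem can be tested directly: exponentiate a random skew-symmetric matrix and check the nonlinear \(SO(n)\) conditions. A sketch (the seed and dimension are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M - M.T                          # skew-symmetric: A.T == -A by construction

R = expm(A)
orth_err = np.max(np.abs(R.T @ R - np.eye(3)))   # orthogonality: R^T R = I
detR = np.linalg.det(R)                          # should be exactly +1
```

The linear condition \(A^\top = -A\) is trivial to impose; the exponential then delivers the nonlinear conditions \(R^\top R = I\) and \(\det R = 1\) for free.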
This theorem reveals a beautiful pattern. The following table places it alongside the
group definitions from the
previous page:
| Linear Condition on \(A\) | \(\xrightarrow{\;\exp\;}\) | Nonlinear Group Condition | \(\exp(A) \in\) |
|---|---|---|---|
| (no constraint) | \(\to\) | \(\det \neq 0\) (automatic) | \(GL(n)\) |
| \(\mathrm{tr}(A) = 0\) | \(\to\) | \(\det = 1\) | \(SL(n)\) |
| \(A^\top = -A\) | \(\to\) | \(A^\top A = I,\; \det = 1\) | \(SO(n)\) |
| \(A^* = -A\) | \(\to\) | \(A^* A = I\) | \(U(n)\) |
| \(A^* = -A,\; \mathrm{tr}(A) = 0\) | \(\to\) | \(A^* A = I,\; \det = 1\) | \(SU(n)\) |
The left column consists of linear subspaces of \(M_n(\mathbb{C})\); these
will be identified as Lie algebras in
Lie Algebras and the Lie Bracket.
The right column consists of the nonlinear matrix Lie groups. The exponential
map is the bridge between them: it converts infinitesimal (linear) symmetry into finite
(nonlinear) symmetry.
Rotations and Rodrigues' Formula
We now put the matrix exponential to work by computing it explicitly for the rotation
groups. These calculations are not merely exercises — they produce formulas that are used
daily in robotics, computer graphics, and aerospace engineering.
Rotations in \(SO(2)\)
The simplest non-trivial rotation group is \(SO(2)\), the group of rotations of the
plane. The skew-symmetry condition \(A^\top = -A\) for a \(2 \times 2\) matrix gives the
general form:
\[
A = \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad \theta \in \mathbb{R}.
\]
Let us write \(J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\). Since
\(J^2 = -I\), we have \(J^3 = -J\), \(J^4 = I\), and the pattern repeats with period 4.
Therefore:
\[
\begin{align*}
\exp(\theta J) &= I + \theta J + \frac{\theta^2 J^2}{2!} + \frac{\theta^3 J^3}{3!} + \cdots \\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) I
+ \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) J \\
&= \cos\theta \cdot I + \sin\theta \cdot J
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\end{align*}
\]
This is precisely the 2D rotation matrix \(R_\theta\) encountered
at the very beginning of our study of orthogonal matrices — now derived
from first principles via the exponential map.
The exponential map establishes a surjective homomorphism \((\mathbb{R}, +) \to SO(2)\),
\(\theta \mapsto \exp(\theta J)\), with kernel \(2\pi\mathbb{Z}\). By the
First Isomorphism Theorem:
\[
SO(2) \cong \mathbb{R} / 2\pi\mathbb{Z} \cong S^1.
\]
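The series computation above can be confirmed numerically: \(\exp(\theta J)\) agrees with the closed-form rotation matrix for any angle. A minimal sketch (the test angle is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

theta = 0.7                            # arbitrary test angle
J = np.array([[0.0, -1.0], [1.0, 0.0]])

R_exp = expm(theta * J)
R_closed = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
err = np.max(np.abs(R_exp - R_closed))
```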
Rotations in \(SO(3)\): The Hat Map and Infinitesimal Generators
For \(SO(3)\), the skew-symmetric matrices form a 3-dimensional vector space. A
standard basis consists of the infinitesimal generators of rotation
about each coordinate axis:
\[
E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
E_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad
E_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Here \(E_i\) generates rotations about the \(i\)-th coordinate axis:
\(\exp(\theta E_1)\) rotates about the \(x\)-axis by angle \(\theta\), and likewise
for \(E_2\) (\(y\)-axis) and \(E_3\) (\(z\)-axis).
An arbitrary unit vector \(\hat{\mathbf{n}} = (n_1, n_2, n_3)^\top\) with
\(\|\hat{\mathbf{n}}\| = 1\) defines a skew-symmetric matrix via the
hat map:
\[
\hat{\mathbf{n}}_\times = n_1 E_1 + n_2 E_2 + n_3 E_3
= \begin{pmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{pmatrix}.
\]
This matrix encodes the cross product:
\(\hat{\mathbf{n}}_\times \mathbf{v} = \hat{\mathbf{n}} \times \mathbf{v}\) for all
\(\mathbf{v} \in \mathbb{R}^3\). The exponential
\(\exp(\theta\, \hat{\mathbf{n}}_\times)\) gives a rotation by angle \(\theta\) about
the axis \(\hat{\mathbf{n}}\), and it admits a closed-form expression,
Rodrigues' formula:
\[
\exp(\theta\,\hat{\mathbf{n}}_\times)
= I + \sin\theta\;\hat{\mathbf{n}}_\times
+ (1 - \cos\theta)\;\hat{\mathbf{n}}_\times^{\,2}.
\]
Proof:
The key observation is that \(\hat{\mathbf{n}}_\times\) satisfies the identity
\(\hat{\mathbf{n}}_\times^{\,3} = -\hat{\mathbf{n}}_\times\), which can be verified
by direct computation using \(\|\hat{\mathbf{n}}\| = 1\). This gives a cyclic
pattern:
\[
\hat{\mathbf{n}}_\times^{\,3} = -\hat{\mathbf{n}}_\times, \quad
\hat{\mathbf{n}}_\times^{\,4} = -\hat{\mathbf{n}}_\times^{\,2}, \quad
\hat{\mathbf{n}}_\times^{\,5} = \hat{\mathbf{n}}_\times, \quad \dots
\]
Substituting into the exponential series and grouping by powers of
\(\hat{\mathbf{n}}_\times\):
\[
\begin{align*}
\exp(\theta\,\hat{\mathbf{n}}_\times)
&= I + \theta\,\hat{\mathbf{n}}_\times
+ \frac{\theta^2}{2!}\hat{\mathbf{n}}_\times^{\,2}
+ \frac{\theta^3}{3!}\hat{\mathbf{n}}_\times^{\,3}
+ \frac{\theta^4}{4!}\hat{\mathbf{n}}_\times^{\,4} + \cdots \\
&= I + \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right)\hat{\mathbf{n}}_\times
+ \left(\frac{\theta^2}{2!} - \frac{\theta^4}{4!} + \frac{\theta^6}{6!} - \cdots\right)\hat{\mathbf{n}}_\times^{\,2} \\
&= I + \sin\theta\;\hat{\mathbf{n}}_\times
+ (1 - \cos\theta)\;\hat{\mathbf{n}}_\times^{\,2}. \qquad \square
\end{align*}
\]
Rodrigues' formula converts the infinite series
\(\exp(\theta\,\hat{\mathbf{n}}_\times)\) into a finite expression involving only
\(I\), \(\hat{\mathbf{n}}_\times\), and \(\hat{\mathbf{n}}_\times^{\,2}\). Rotation
by any angle about any axis can be computed with a handful of matrix operations rather
than a series truncation. This formula is the backbone of rotation representations in
robotics, computer graphics, and attitude estimation.
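In code, Rodrigues' formula is just a few matrix operations, and it can be checked against the full exponential series. A sketch; the helper names `hat` and `rodrigues` and the chosen axis and angle are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def hat(n):
    """Hat map: unit vector -> skew-symmetric cross-product matrix."""
    return np.array([[0.0, -n[2], n[1]],
                     [n[2], 0.0, -n[0]],
                     [-n[1], n[0], 0.0]])

def rodrigues(theta, n):
    """Rotation by angle theta about the unit axis n, via Rodrigues' formula."""
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * (N @ N)

n = np.array([1.0, 2.0, 2.0]) / 3.0    # unit axis: ||(1,2,2)/3|| = 1
theta = 1.2
err = np.max(np.abs(rodrigues(theta, n) - expm(theta * hat(n))))
```

The two agree to machine precision, which is why production code uses the closed form rather than a series truncation.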
One-Parameter Subgroups
The examples above — \(\theta \mapsto \exp(\theta J)\) in \(SO(2)\),
\(t \mapsto \exp(t E_i)\) in \(SO(3)\) — are all instances of a single pattern:
a continuous homomorphism from \((\mathbb{R}, +)\) into a matrix Lie group. We now
formalize this pattern.
Definition: One-Parameter Subgroup
A one-parameter subgroup of a matrix Lie group \(G\) is a
continuous group homomorphism \(\gamma : (\mathbb{R}, +) \to G\). That is,
\(\gamma\) is a continuous map satisfying:
\[
\gamma(s + t) = \gamma(s)\,\gamma(t) \quad
\text{for all } s, t \in \mathbb{R}, \qquad \gamma(0) = I.
\]
The fundamental theorem of this section shows that every one-parameter subgroup arises
from the matrix exponential, and conversely.
Theorem: One-Parameter Subgroups are Matrix Exponentials
Every one-parameter subgroup of a matrix Lie group \(G\) has the form
\[
\gamma(t) = \exp(tA)
\]
for a unique matrix \(A = \gamma'(0)\).
Proof sketch:
Let \(\gamma : \mathbb{R} \to G\) be a one-parameter subgroup. By Cartan's Closed
Subgroup Theorem,
\(G\) is a smooth submanifold of \(M_n(\mathbb{C})\), so \(\gamma\) is smooth
(a continuous homomorphism between Lie groups is automatically smooth).
Differentiating the identity \(\gamma(t + s) = \gamma(t)\gamma(s)\) with respect to
\(s\) at \(s = 0\) gives:
\[
\gamma'(t) = \gamma(t)\,\gamma'(0) = \gamma(t)\,A
\]
where \(A = \gamma'(0)\). This is the matrix ODE
\(\gamma'(t) = \gamma(t)\,A\) with initial condition \(\gamma(0) = I\). Since
\(\exp(tA)\) also satisfies this ODE (as
\(\frac{d}{dt}\exp(tA) = \exp(tA)\,A\)), uniqueness of solutions gives
\(\gamma(t) = \exp(tA)\). \(\square\)
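Both halves of the theorem can be probed numerically: the curve \(t \mapsto \exp(tA)\) satisfies the homomorphism law (because \(sA\) and \(tA\) commute), and its velocity at the identity recovers \(A\). A sketch with an arbitrary random generator matrix:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = 0.5 * rng.standard_normal((3, 3))

def gamma(t):
    return expm(t * A)

# Homomorphism law: gamma(s + t) = gamma(s) gamma(t)
s, t = 0.4, 1.1
hom_err = np.max(np.abs(gamma(s + t) - gamma(s) @ gamma(t)))

# Velocity at the identity: gamma'(0) = A, via a forward difference
h = 1e-6
deriv_err = np.max(np.abs((gamma(h) - np.eye(3)) / h - A))
```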
Geometric Interpretation
Each one-parameter subgroup \(\gamma(t) = \exp(tA)\) traces a curve through
the identity in the group \(G\). The matrix \(A = \gamma'(0)\) is the
velocity vector of this curve at \(t = 0\). The collection of all
such velocity vectors forms a vector space — this will be formalized as the
Lie algebra \(\mathfrak{g}\) of \(G\) in
Lie Algebras and the Lie Bracket.
Geometrically, one-parameter subgroups are the "straight lines through the identity"
in the group. Just as a line in \(\mathbb{R}^n\) through the origin is determined by
its direction vector, a one-parameter subgroup of \(G\) is determined by its tangent
vector \(A\) at the identity. The exponential map \(\exp : A \mapsto \exp(A)\) sends
tangent vectors to group elements — it is the map from "infinitesimal motions" to
"finite transformations."
Example: One-Parameter Subgroups of \(SO(3)\)
The three curves
\[
\gamma_1(t) = \exp(t\,E_1), \qquad
\gamma_2(t) = \exp(t\,E_2), \qquad
\gamma_3(t) = \exp(t\,E_3)
\]
are one-parameter subgroups of \(SO(3)\), where \(E_1, E_2, E_3\) are the
infinitesimal generators defined in the previous section. Each \(\gamma_i\)
traces out a great circle in \(SO(3)\): it is the family of all rotations about
a single coordinate axis. An arbitrary one-parameter subgroup
\(\gamma(t) = \exp(t(a_1 E_1 + a_2 E_2 + a_3 E_3))\) corresponds to rotation
about the axis \((a_1, a_2, a_3)\) at rate \(\|(a_1, a_2, a_3)\|\).
Non-Example: Finite Groups Have No Non-Trivial One-Parameter Subgroups
If \(G\) is a finite group (such as the
dihedral group
\(D_n\) embedded as matrices in \(O(2)\)), then any continuous homomorphism
\(\gamma : \mathbb{R} \to G\) must be constant. Indeed, \(\gamma(\mathbb{R})\) is
a connected subset of \(G\) (as the continuous image of the connected space
\(\mathbb{R}\)), but a finite group with the discrete topology has no connected
subsets other than singletons. Since \(\gamma(0) = I\), we conclude
\(\gamma(t) = I\) for all \(t\).
This is the rigorous version of an observation from
Geometry of Symmetry: the
dihedral group \(D_n\) has "no in-between states" — one cannot continuously
interpolate between a rotation by \(0°\) and a rotation by \(360°/n\) while
staying inside \(D_n\). The group \(SO(2)\), by contrast, has a one-parameter
subgroup for every real number \(\theta\), reflecting the fact that rotation by
any angle is a valid symmetry.
Connections: Robotics, Information Geometry, and Beyond
SE(3) and Screw Motions in Robotics
The
special Euclidean group
\(SE(3)\) models rigid body motion: every element is a rotation composed with a
translation. The Lie algebra of \(SE(3)\) consists of \(4 \times 4\) matrices of
the form
\[
\hat{\xi} = \begin{pmatrix} \hat{\boldsymbol{\omega}}_\times & \mathbf{v} \\
\mathbf{0}^\top & 0 \end{pmatrix}
\]
where \(\hat{\boldsymbol{\omega}}_\times \in \mathfrak{so}(3)\) is a
skew-symmetric matrix (encoding angular velocity) and
\(\mathbf{v} \in \mathbb{R}^3\) encodes translational velocity. The 6-dimensional
vector \(\xi = (\boldsymbol{\omega}, \mathbf{v}) \in \mathbb{R}^6\) is called a
twist in robotics terminology.
Exponentiating a twist produces a rigid body transformation:
\(\exp(\hat{\xi}) \in SE(3)\). Remarkably, the resulting motion is always a
screw motion — a simultaneous rotation about and translation
along a fixed axis (Chasles' theorem). The exponential map for \(SE(3)\) is
therefore the mathematical backbone of robot forward kinematics:
given the joint velocities of a robotic arm, the exponential map produces the
resulting motion of the end effector.
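The twist-to-transformation pipeline can be sketched in a few lines. The helper name `twist_matrix` and the specific twist values are illustrative choices, not a fixed API:

```python
import numpy as np
from scipy.linalg import expm

def twist_matrix(omega, v):
    """4x4 Lie-algebra element of SE(3) from angular velocity omega and linear velocity v."""
    W = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    xi = np.zeros((4, 4))
    xi[:3, :3] = W
    xi[:3, 3] = v
    return xi

# A screw motion about the z-axis: rotate while translating
xi = twist_matrix([0.0, 0.0, 1.0], [1.0, 0.0, 0.5])
T = expm(xi)

R = T[:3, :3]                                      # rotation block of the rigid transform
orth_err = np.max(np.abs(R.T @ R - np.eye(3)))
bottom_row_err = np.max(np.abs(T[3] - np.array([0.0, 0.0, 0.0, 1.0])))
```

The result is a valid element of \(SE(3)\): the rotation block is orthogonal with determinant 1, and the bottom row stays \((0, 0, 0, 1)\).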
Statistical Manifolds and Information Geometry
In Natural Gradient Descent, we saw that the
Fisher Information Matrix (FIM)
serves as a Riemannian metric on spaces of probability distributions. We can now sharpen this observation using Lie group language.
The space of Gaussian distributions \(\mathcal{N}(\mu, \sigma^2)\), parameterized
by \((\mu, \sigma)\) with \(\sigma > 0\), can be identified with the affine group
\(\mathrm{Aff}^+(1)\) — a 2-dimensional Lie group — and the FIM is a left-invariant
metric on this group: left translation by any group element preserves the metric.
The natural gradient is the
gradient with respect to this invariant metric; it follows the group's intrinsic
geometry rather than the ambient Euclidean geometry. The exponential map on this
statistical Lie group connects to the exponential family of distributions, giving a
geometric explanation for why exponential families are "natural" in statistics.
This connection will be fully formalized once we have smooth manifolds and
Riemannian metrics at our disposal, but the essential point is already clear:
the matrix exponential is not just a tool for rotations — it is the universal
mechanism for moving along group-invariant directions, whether the group describes
physical symmetries or statistical structure.
Looking Ahead
On this page, we have seen that the matrix exponential converts linear data
(matrices satisfying conditions like skew-symmetry or tracelessness) into
nonlinear group elements (orthogonal matrices, volume-preserving
transformations). The one-parameter subgroup theorem showed that this passage is
canonical: every continuous path through the identity in a matrix Lie group is an
exponential.
The collection of all velocity vectors \(A = \gamma'(0)\) of one-parameter subgroups
forms a vector space, and this vector space carries additional algebraic
structure: the Lie bracket \([A, B] = AB - BA\), which measures the
failure of the exponential to be a homomorphism. In the
next page, we formalize this as the
Lie algebra of a matrix Lie group, compute the Lie algebras of all the
classical groups, and discover that the bracket encodes the group's non-commutativity at
the infinitesimal level.