Eigenvectors and Eigenvalues
Eigenvectors reveal directions in which a linear transformation acts by pure scaling. When \(A\mathbf{x} = \lambda \mathbf{x}\),
the transformation \(\mathbf{x} \to A\mathbf{x}\) stretches or shrinks \(\mathbf{x}\) by the factor \(\lambda\) without changing its
direction (or reversing it if \(\lambda < 0\)). This geometric interpretation makes eigenvalues and eigenvectors
fundamental to understanding matrix behavior, with applications ranging from stability analysis of dynamical systems
to dimensionality reduction in data science.
Definition: Eigenvectors & Eigenvalues
An eigenvector of an \(n \times n\) matrix \(A\) is a "nonzero" vector \(\mathbf{x}\) such that
\[
A\mathbf{x} = \lambda \mathbf{x} \tag{1}
\]
for some scalar \(\lambda\). A scalar \(\lambda\) is called an eigenvalue of \(A\) if equation (1) has a nontrivial
solution \(\mathbf{x}\); such an \(\mathbf{x}\) is referred to as an eigenvector corresponding to \(\lambda\).
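The definition can be checked directly on a small case. The sketch below is only an illustration: the matrix, the candidate vectors, and the helper names `mat_vec` and `is_eigenpair` are assumptions chosen for this example, not part of the text above.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def is_eigenpair(A, x, lam):
    """Return True if x is nonzero and A x equals lam * x exactly."""
    if all(entry == 0 for entry in x):
        return False  # the zero vector is excluded by definition
    return mat_vec(A, x) == [lam * entry for entry in x]


# Illustrative 2x2 integer matrix (an assumption, not taken from the text).
A = [[1, 6],
     [5, 2]]

print(is_eigenpair(A, [6, -5], -4))  # A x = (-24, 20) = -4 * (6, -5)
print(is_eigenpair(A, [3, -2], -4))  # A x = (-9, 11) is not a multiple of x
print(is_eigenpair(A, [0, 0], -4))   # zero vector never counts
```

Integer entries keep the comparison exact; with floating-point data a tolerance would be needed instead of `==`.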
Equation (1) can be written as:
\[
(A - \lambda I)\mathbf{x} = \mathbf{0}. \tag{2}
\]
The scalar \(\lambda\) is an eigenvalue of \(A\) if and only if (2) has a nontrivial solution. The set of all solutions
of (2) is the null space \(\operatorname{Nul}(A - \lambda I) \subseteq \mathbb{R}^n\), which we now name.
Definition: Eigenspace
Let \(\lambda\) be an eigenvalue of an \(n \times n\) matrix \(A\). The eigenspace of \(A\) corresponding to \(\lambda\)
is the set of all solutions \(\mathbf{x} \in \mathbb{R}^n\) of \((A - \lambda I)\mathbf{x} = \mathbf{0}\), namely the null space
\(\operatorname{Nul}(A - \lambda I)\). Equivalently, it is the set of all eigenvectors of \(A\) corresponding to \(\lambda\),
together with the zero vector.
Theorem: Eigenvalues of Triangular Matrices
The eigenvalues of a triangular matrix are its main diagonal entries.
Proof:
Suppose \(A \in \mathbb{R}^{3 \times 3}\) is a lower triangular matrix. Then for any scalar \(\lambda\),
\[A - \lambda I = \begin{bmatrix}
a_{11} - \lambda & 0 & 0 \\
a_{21} & a_{22} - \lambda & 0 \\
a_{31} & a_{32} & a_{33} - \lambda \\
\end{bmatrix}
\]
is itself lower triangular. Since the determinant of a triangular matrix is the product of its diagonal entries,
\[
\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda)(a_{33} - \lambda).
\]
By the Invertible Matrix Theorem,
\(\lambda\) is an eigenvalue of \(A\) if and only if \(A - \lambda I\) is singular, i.e., \(\det(A - \lambda I) = 0\).
This product vanishes if and only if at least one factor vanishes, that is, \(\lambda = a_{ii}\) for some \(i \in \{1, 2, 3\}\).
The same argument applies to upper triangular matrices and to higher-dimensional cases.
Example:
Consider
\[
A = \begin{bmatrix} 0 & 1 & 8 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \\ \end{bmatrix}.
\]
Since \(A\) is upper triangular, its eigenvalues are its diagonal entries: \(0\), \(2\), and \(3\). Because \(0\) is an eigenvalue,
equation (1) with \(\lambda = 0\) becomes the homogeneous equation \(A\mathbf{x} = \mathbf{0}\), which therefore has a nontrivial solution.
This happens if and only if \(A\) is a singular (not invertible) matrix.
We can verify this by computing its determinant:
\[
\det A = 0(6-0) - 1(0-0) + 8(0-0) = 0.
\]
Since \(\det A=0\), \(A\) is indeed not invertible.
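The same check can be done mechanically. The following minimal sketch evaluates \(\det(A - \lambda I)\) at each diagonal entry of the example matrix; the helper names `det3` and `shifted` are hypothetical and introduced only for this illustration.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def shifted(M, lam):
    """Return M - lam * I for a square matrix M."""
    n = len(M)
    return [[M[r][c] - (lam if r == c else 0) for c in range(n)]
            for r in range(n)]


# The upper triangular matrix from the example above.
A = [[0, 1, 8],
     [0, 2, 7],
     [0, 0, 3]]

diagonal = [A[k][k] for k in range(3)]

# det(A - lam I) evaluated at each diagonal entry; each value should vanish.
char_values = {lam: det3(shifted(A, lam)) for lam in diagonal}
print(char_values)

# lam = 0 is among the eigenvalues, so det(A) itself is zero.
print(det3(A))
```

Evaluating at a value that is not on the diagonal, such as `det3(shifted(A, 1))`, gives a nonzero result, consistent with the theorem.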
Theorem: Eigenvectors of Distinct Eigenvalues are Independent
If \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) are eigenvectors corresponding to distinct eigenvalues \(\lambda_1, \ldots, \lambda_n\)
of a square matrix \(A\), then the set \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\) is linearly independent.
Proof:
Suppose, for contradiction, that \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\) is linearly dependent. Since each eigenvector is nonzero by definition,
the singleton \(\{\mathbf{v}_1\}\) is independent; hence there is some smallest index \(k \geq 2\) such that \(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}\) is dependent.
Writing \(k = i + 1\) with \(i \geq 1\), minimality implies that \(\{\mathbf{v}_1, \ldots, \mathbf{v}_i\}\) is independent, and \(\mathbf{v}_{i+1}\) is
a linear combination of \(\mathbf{v}_1, \ldots, \mathbf{v}_i\): there exist scalars \(c_1, \ldots, c_i \in \mathbb{R}\) such that
\[
\mathbf{v}_{i+1} = c_1 \mathbf{v}_1 + \cdots + c_i \mathbf{v}_i. \tag{3}
\]
Multiplying both sides of (3) by \(A\) and using \(A \mathbf{v}_j = \lambda_j \mathbf{v}_j\) for each \(j\),
\[
\lambda_{i+1} \mathbf{v}_{i+1} = A \mathbf{v}_{i+1} = c_1 \lambda_1 \mathbf{v}_1 + \cdots + c_i \lambda_i \mathbf{v}_i. \tag{4}
\]
Subtracting \(\lambda_{i+1}\) times (3) from (4) yields
\[
c_1(\lambda_1 - \lambda_{i+1}) \mathbf{v}_1 + \cdots + c_i(\lambda_i - \lambda_{i+1}) \mathbf{v}_i = \mathbf{0}. \tag{5}
\]
Since \(\{\mathbf{v}_1, \ldots, \mathbf{v}_i\}\) is independent, every coefficient in (5) must be zero. The eigenvalues are distinct, so
\(\lambda_j - \lambda_{i+1} \neq 0\) for \(j = 1, \ldots, i\); therefore \(c_1 = \cdots = c_i = 0\). But then (3) gives \(\mathbf{v}_{i+1} = \mathbf{0}\),
contradicting the fact that eigenvectors are nonzero. Hence \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\) is linearly independent.
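As a concrete illustration of the theorem, the sketch below reuses the triangular matrix with eigenvalues \(0, 2, 3\) from the earlier example. The three eigenvectors were worked out by hand for this sketch (they are not given in the text), so the code first confirms each pair and then tests independence through a nonzero determinant.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[0, 1, 8],
     [0, 2, 7],
     [0, 0, 3]]

# Hand-computed (eigenvalue, eigenvector) pairs; treat them as assumptions
# that the next line verifies rather than as given facts.
pairs = [(0, [1, 0, 0]), (2, [1, 2, 0]), (3, [5, 7, 1])]
pairs_ok = all(mat_vec(A, v) == [lam * x for x in v] for lam, v in pairs)

# Place the eigenvectors as the columns of V.
V = [[v[r] for _, v in pairs] for r in range(3)]

# Three vectors in R^3 are linearly independent iff det(V) != 0.
independent = det3(V) != 0
print(pairs_ok, det3(V), independent)
```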
Characteristic Equations
While the previous theorem tells us that eigenvectors corresponding to distinct eigenvalues are linearly independent,
we still need a systematic method to find eigenvalues in the first place. The characteristic equation
provides this method by reformulating the eigenvalue problem as a polynomial equation.
Definition: Characteristic Equation
For an \(n \times n\) matrix \(A\), the characteristic equation of \(A\) is
\[
\det (A - \lambda I) = 0.
\]
The polynomial \(p(\lambda) = \det(A - \lambda I)\) in the variable \(\lambda\) is called the
characteristic polynomial of \(A\).
A scalar \(\lambda\) is an eigenvalue of \(A\) if and only if \(\lambda\) satisfies the characteristic equation.
This follows from equation (2): \((A - \lambda I)\mathbf{x} = \mathbf{0}\) has a nontrivial solution if and only if
\(A - \lambda I\) is singular, which occurs precisely when its determinant is zero
(see the Invertible Matrix Theorem).
Example:
Consider the matrix
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \\ \end{bmatrix}
\]
Then,
\[
\det(A- \lambda I) = (1-\lambda)(\lambda^2 -\lambda -24) -0 +5(5 +3\lambda) = 0
\]
Expanding, the characteristic equation becomes:
\[
-\lambda^3 +2\lambda^2 + 38\lambda +1 = 0
\]
Solving this equation for \(\lambda\) gives the eigenvalues of \(A\): \(\lambda \approx -5.230, -0.026, 7.256\).
Note: As in this example, eigenvalues are typically approximated by numerical methods in practice.
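One simple way to approximate these roots is bisection on the characteristic polynomial. This is a sketch under stated assumptions, not the method used to obtain the values above: the bracketing intervals were chosen by hand after checking that the polynomial changes sign on each, and the names `p` and `bisect` are illustrative.

```python
def p(lam):
    """Characteristic polynomial of the example matrix."""
    return -lam**3 + 2 * lam**2 + 38 * lam + 1


def bisect(f, lo, hi, tol=1e-12):
    """Approximate a root of f in [lo, hi], assuming a sign change there."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid              # the root lies in the left half
        else:
            lo, f_lo = mid, f_mid  # the root lies in the right half
    return (lo + hi) / 2


# Hand-picked brackets (assumptions): p changes sign on each interval.
brackets = [(-6, -5), (-1, 0), (7, 8)]
roots = [bisect(p, lo, hi) for lo, hi in brackets]

for root in roots:
    print(round(root, 3))
```

A useful sanity check is that the roots sum to the trace of \(A\), which is \(1 + 1 + 0 = 2\). Production code would normally call a dedicated eigenvalue routine instead of finding polynomial roots, since root-finding on the characteristic polynomial can be numerically fragile for larger matrices.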
Definition: Algebraic Multiplicity
The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the
characteristic equation.
For example, if the characteristic equation of a matrix is \((\lambda -1)^2 (\lambda -2) = 0\), then
the eigenvalue 1 has algebraic multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1.
To understand when matrices share eigenvalue properties, we need the concept of similarity.
Similar matrices represent the same linear transformation expressed in different coordinate systems,
which explains why they must have identical eigenvalues.
Definition: Similarity
Suppose \(A\) and \(B\) are
\(n \times n\) matrices. Then, \(A\) is said to be similar to \(B\) if there exists an
invertible matrix \(P\) such that
\[
P^{-1}AP = B, \quad \text{or equivalently,} \quad A = PBP^{-1}.
\]
Theorem: Similar Matrices Share Characteristic Polynomial
If \(n \times n\) matrices \(A\) and \(B\) are similar, then they have the same characteristic
polynomial and thus the same eigenvalues with the same multiplicities.
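Note: the converse of this theorem is not true. Having the same characteristic polynomial does not imply
that two matrices are similar. For example, \(I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\) and
\(\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\) both have characteristic polynomial \((1 - \lambda)^2\),
but \(P^{-1} I P = I\) for every invertible \(P\), so the only matrix similar to \(I\) is \(I\) itself.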
Proof:
If \(B = P^{-1}AP\), then
\[
\begin{align*}
B - \lambda I &= P^{-1}AP - \lambda P^{-1}P \\
&= P^{-1}(A - \lambda I)P
\end{align*}
\]
By the multiplicative property of determinants, we have
\[
\begin{align*}
\det (B-\lambda I) &= \det (P^{-1}) \det (A-\lambda I) \det (P)\\
&= \det (A-\lambda I).
\end{align*}
\]
Note: the last step uses \(\det (P^{-1}) \det (P) = \det (P^{-1}P) = \det (I) = 1\).
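A quick numerical illustration of the theorem for the \(2 \times 2\) case, where \(\det(M - \lambda I) = \lambda^2 - \operatorname{tr}(M)\lambda + \det(M)\). The matrices `A`, `P`, and `P_inv` below are assumptions picked so that all entries stay integers; the helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of lambda^2 - tr(M) lambda + det(M)."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (1, -trace, det)


# Illustrative choices (not from the text); det(P) = 1, so P_inv is integral.
A = [[2, 1],
     [0, 3]]
P = [[1, 1],
     [1, 2]]
P_inv = [[2, -1],
         [-1, 1]]

B = mat_mul(mat_mul(P_inv, A), P)  # B = P^{-1} A P is similar to A

print(B)
print(char_poly_2x2(A), char_poly_2x2(B))
```

Although `A` and `B` have different entries, both calls to `char_poly_2x2` return the same coefficients, as the proof predicts.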
Diagonalization
Similarity becomes especially powerful when we can find a matrix \(P\) that transforms \(A\) into a
diagonal matrix \(D\). Diagonal matrices are computationally simple: their powers, exponentials, and other
functions become trivial to compute. The diagonalization \(A = PDP^{-1}\) allows us to transfer these
computational advantages back to \(A\), since \(A^k = PD^kP^{-1}\) and computing \(D^k\) only requires
raising each diagonal entry to the \(k\)-th power.
Definition: Diagonalizable Matrix
A square matrix \(A\) is said to be diagonalizable if it is similar to a diagonal matrix, that is,
if there exist an invertible matrix \(P\) and a diagonal matrix \(D\) such that
\[
A = PDP^{-1}.
\]
Theorem: Diagonalization Criterion
An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors.
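In that case, \(A = PDP^{-1}\) precisely when the columns of \(P\) are \(n\) linearly independent eigenvectors of \(A\) and the diagonal entries of \(D\)
are the eigenvalues of \(A\) corresponding, in the same order, to the eigenvectors in \(P\).
Note: \(P\) and \(D\) are not unique: the diagonal entries of \(D\) can be reordered (with the columns of \(P\) permuted to match), and each column of \(P\) can be replaced by a nonzero scalar multiple of itself.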
Proof:
Let \(P\) be any \(n \times n\) matrix with columns \(\mathbf{v}_1, \ldots, \mathbf{v}_n\), and let \(D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\).
Direct column-wise multiplication gives
\[
AP = \begin{bmatrix} A\mathbf{v}_1 & \cdots & A\mathbf{v}_n \end{bmatrix}, \qquad
PD = \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}.
\]
Therefore \(AP = PD\) if and only if \(A\mathbf{v}_j = \lambda_j \mathbf{v}_j\) for each \(j = 1, \ldots, n\). We use this identity below.
(\(\Rightarrow\)) Suppose \(A\) is diagonalizable: \(A = P D P^{-1}\) for some invertible \(P\) and diagonal \(D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\).
Right-multiplying by \(P\) gives \(AP = PD\), and by the identity above, \(A\mathbf{v}_j = \lambda_j \mathbf{v}_j\) for each \(j\). Since \(P\) is invertible,
its columns \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) are linearly independent, and in particular nonzero (by the Invertible Matrix Theorem).
Hence the \(\mathbf{v}_j\) are \(n\) linearly independent eigenvectors of \(A\) with eigenvalues \(\lambda_j\).
(\(\Leftarrow\)) Conversely, suppose \(A\) has \(n\) linearly independent eigenvectors \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) with corresponding eigenvalues \(\lambda_1, \ldots, \lambda_n\).
Form \(P = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix}\) and \(D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\).
By the eigenvalue equations and the identity above, \(AP = PD\). The columns of \(P\) are linearly independent, so \(P\) is invertible
(again by the Invertible Matrix Theorem).
Right-multiplying \(AP = PD\) by \(P^{-1}\) yields \(A = P D P^{-1}\), so \(A\) is diagonalizable.
Example:
Given
\[
A = \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix},
\]
we first compute the characteristic equation by cofactor expansion along the first row:
\[
\det(A- \lambda I) = (4-\lambda)((4-\lambda)^2 - 1) -((4-\lambda )-1)+(1-(4 -\lambda)) = 0.
\]
Simplifying this, we get:
\[
-\lambda^3 +12\lambda^2 -45\lambda +54 = 0,
\]
or equivalently (multiplying both sides by \(-1\)),
\[
\lambda^3 -12\lambda^2 +45\lambda -54 = 0,
\]
which factors as
\[
(\lambda - 3)^2(\lambda -6) = 0.
\]
Thus, the eigenvalues are \(\lambda_1 = 3\), \(\lambda_2 = 3\), and \(\lambda_3 = 6\); that is, the eigenvalue 3 has algebraic multiplicity 2.
Next, we need three linearly independent eigenvectors in total: two for the eigenvalue 3 and one for the eigenvalue 6.
For \(\lambda_1 = \lambda_2 = 3\):
\[
A-3I = \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\\end{bmatrix}.
\]
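The eigenspace is the plane \(x_1 + x_2 + x_3 = 0\), which has dimension 2. We choose the linearly independent eigenvectors
\(\mathbf{v}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ \end{bmatrix}\) and \(\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ \end{bmatrix}\), both corresponding to the eigenvalue 3.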
For \(\lambda_3 = 6\):
\[
A-6I = \begin{bmatrix} -2 & 1 & 1\\ 1 & -2 & 1 \\ 1 & 1 & -2 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\\end{bmatrix}
\]
We choose the eigenvector \(\mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ \end{bmatrix}\).
Therefore, \(A = PDP^{-1}\) with
\[
\begin{align*}
&D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix}, \\\\
&P = \begin{bmatrix} -1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix}, \\\\
&P^{-1} =
\begin{bmatrix} \frac{-1}{3} & \frac{-1}{3} & \frac{2}{3} \\
\frac{-1}{3} & \frac{2}{3} & \frac{-1}{3}\\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \end{bmatrix}.
\end{align*}
\]
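The factorization can be verified in exact arithmetic. The sketch below uses Python's `fractions.Fraction` to check \(AP = PD\) and \(A = PDP^{-1}\) for the matrices above, and then compares \(A^5\) computed by repeated multiplication with \(PD^5P^{-1}\). The helper names `mat_mul` and `mat_pow`, and the choice \(k = 5\), are assumptions made for this illustration.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(M, k):
    """M raised to a nonnegative integer power by repeated multiplication."""
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, M)
    return result


A = [[4, 1, 1], [1, 4, 1], [1, 1, 4]]
P = [[-1, -1, 1], [0, 1, 1], [1, 0, 1]]
D = [[3, 0, 0], [0, 3, 0], [0, 0, 6]]
t = Fraction(1, 3)
P_inv = [[-t, -t, 2 * t], [-t, 2 * t, -t], [t, t, t]]

# Column-wise eigenvalue equations and the factorization itself.
ap_equals_pd = mat_mul(A, P) == mat_mul(P, D)
factorization_ok = mat_mul(mat_mul(P, D), P_inv) == A

# Powers: only the diagonal entries of D need to be raised to the k-th power.
k = 5
D_k = [[D[i][j] ** k if i == j else 0 for j in range(3)] for i in range(3)]
A_k_via_D = mat_mul(mat_mul(P, D_k), P_inv)

print(ap_equals_pd, factorization_ok, A_k_via_D == mat_pow(A, k))
```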
This example demonstrates an important point: an \(n \times n\) matrix can be diagonalizable even without
\(n\) distinct eigenvalues. By the Diagonalization Criterion, diagonalizability requires \(n\) linearly independent eigenvectors,
not \(n\) distinct eigenvalues. When an eigenvalue \(\lambda\) has algebraic multiplicity greater than 1,
diagonalizability depends on whether its eigenspace \(\operatorname{Nul}(A - \lambda I)\) has dimension equal to
this multiplicity. Here, the eigenvalue 3 has multiplicity 2, and we found 2 linearly independent eigenvectors
in its eigenspace, satisfying the requirement for diagonalizability.
Note: If a matrix has \(n\) distinct eigenvalues, then by the theorem on distinct eigenvalues above it automatically has \(n\) linearly
independent eigenvectors and must be diagonalizable. The converse is false, as this example shows.
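The dimension of an eigenspace can be computed as \(n - \operatorname{rank}(A - \lambda I)\). The sketch below does this with exact Gaussian elimination for the example matrix and, for contrast, for a \(2 \times 2\) matrix `J` whose repeated eigenvalue has a one-dimensional eigenspace. The matrix `J` and the helper names are assumptions introduced for illustration only.

```python
from fractions import Fraction


def rank(M):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def eigenspace_dim(A, lam):
    """Dimension of Nul(A - lam I), computed as n - rank(A - lam I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


A = [[4, 1, 1], [1, 4, 1], [1, 1, 4]]  # the example above
J = [[1, 1], [0, 1]]                   # illustrative: eigenvalue 1, multiplicity 2

print(eigenspace_dim(A, 3), eigenspace_dim(A, 6))  # dimensions sum to 3
print(eigenspace_dim(J, 1))                        # smaller than multiplicity 2
```

For `A` the eigenspace dimensions add up to \(n = 3\), so it is diagonalizable; for `J` the single eigenspace has dimension 1, so there are not enough independent eigenvectors.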
Complex Eigenvalues and Eigenvectors
Scope of This Section
Up to this point, we have worked with real matrices \(A \in \mathbb{R}^{n \times n}\) and real eigenvalues/eigenvectors.
In this section we extend the scalar field from \(\mathbb{R}\) to \(\mathbb{C}\): the matrix entries remain real,
but we now allow eigenvalues \(\lambda \in \mathbb{C}\) and eigenvectors \(\mathbf{x} \in \mathbb{C}^n\).
Every definition and theorem already proven transfers verbatim with \(\mathbb{R}\) replaced by \(\mathbb{C}\) and arithmetic carried out in \(\mathbb{C}\).
The characteristic polynomial of a real matrix may have complex roots. For instance,
rotation matrices in \(\mathbb{R}^2\) have no real eigenvalues (except for rotations by integer multiples of \(\pi\))
because a rotation turns every nonzero vector off the line it spans, so no direction is merely stretched or shrunk.
However, when we allow complex eigenvalues and eigenvectors, these matrices become diagonalizable over
\(\mathbb{C}\).
A complex scalar \(\lambda\) satisfies the characteristic equation \(\det(A -\lambda I) = 0\) if and only
if there is a nonzero vector \(\mathbf{x} \in \mathbb{C}^n\) such that \(A\mathbf{x} = \lambda \mathbf{x}\). In this case, \(\lambda\)
is called a complex eigenvalue and \(\mathbf{x}\) is its corresponding complex eigenvector.
Rotation matrices demonstrate the importance of complex eigenvalues in understanding geometric transformations.
While a rotation in \(\mathbb{R}^2\) (excluding rotations by multiples of \(\pi\)) has no real eigenvectors,
it is always diagonalizable over \(\mathbb{C}\). The complex eigenvalues encode both the rotation angle and
any scaling, revealing the structure of the transformation.
(For orthogonal and symmetric matrices, see Orthogonality
and Symmetry.)
Example:
Consider the matrix \(R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\), where
\(a\) and \(b\) are real and not both zero. (This is a rotation only when \(a^2 + b^2 = 1\); in general it combines rotation and scaling,
as we will see via the polar decomposition below.) The "complex" eigenvalues of \(R\) can be found by
solving the characteristic equation:
\[
\begin{align*}
\det (R - \lambda I)=0
&\Longrightarrow
\lambda^2 -2a\lambda +(a^2 + b^2) = 0 \\\\
&\Longrightarrow
(\lambda -(a+bi))(\lambda -(a-bi)) = 0
\end{align*}
\]
Thus, we get complex eigenvalues \(\lambda = a \pm bi\), and we can find complex eigenvectors:
\[
R \mathbf{v}_1 = \begin{bmatrix} a & -b \\ b & a \\ \end{bmatrix} \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}
= \begin{bmatrix} a + bi \\ b - ai\\ \end{bmatrix}
= (a+bi)\begin{bmatrix} 1 \\ -i \\ \end{bmatrix}.
\]
Hence, \(\mathbf{v}_1 = \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}\) is an eigenvector corresponding to
the eigenvalue \(\lambda = a + bi\).
Moreover, the complex conjugate of \(\lambda\), denoted \(\bar{\lambda} = a - bi\), is
an eigenvalue with its corresponding eigenvector \(\mathbf{v}_2 = \begin{bmatrix} 1 \\ i \\ \end{bmatrix}\).
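Since \(b \neq 0\) gives two distinct eigenvalues, \(\mathbf{v}_1\) and \(\mathbf{v}_2\) are linearly independent (and they remain so when \(b = 0\)), so we can diagonalize \(R\) over \(\mathbb{C}\):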
\[
\begin{align*}
R &= PDP^{-1} \\
&= \begin{bmatrix} 1 & 1 \\ -i & i\end{bmatrix}
\begin{bmatrix} a+bi & 0 \\ 0 & a-bi \end{bmatrix}
\frac{1}{2}\begin{bmatrix} 1 & i \\ 1 & -i\end{bmatrix}.
\end{align*}
\]
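Python's built-in complex numbers make it easy to sanity-check this factorization. The sketch below uses the sample values \(a = 3\), \(b = 4\), which are an assumption for illustration; any real pair that is not both zero would do.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


a, b = 3.0, 4.0  # sample values (assumption)
R = [[a, -b],
     [b, a]]

lam = complex(a, b)  # eigenvalue a + bi
v1 = [1, -1j]        # claimed eigenvector for a + bi

# Check R v1 = lam v1 componentwise.
Rv1 = [sum(R[i][j] * v1[j] for j in range(2)) for i in range(2)]
residual = max(abs(Rv1[i] - lam * v1[i]) for i in range(2))

# Rebuild R from P D P^{-1}.
P = [[1, 1],
     [-1j, 1j]]
D = [[lam, 0],
     [0, lam.conjugate()]]
P_inv = [[0.5, 0.5j],
         [0.5, -0.5j]]
product = mat_mul(mat_mul(P, D), P_inv)

# Largest entrywise difference between P D P^{-1} and R.
max_error = max(abs(product[i][j] - R[i][j])
                for i in range(2) for j in range(2))
print(residual, max_error)
```

Both printed values should be zero up to floating-point rounding; the imaginary parts of `product` cancel, leaving the real matrix \(R\).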
Returning to \(\mathbb{R}^2\), the eigenvalue \(\lambda = a + bi\) lets us read off the geometry of \(R\): the polar decomposition
of \(R\) separates it into a scaling by a factor \(r\) and a rotation by an angle \(\varphi\).
Let \(r = |\lambda| = \sqrt{a^2 + b^2}\) be the modulus of \(\lambda = a + bi\), and let \(\varphi = \arg(\lambda)\)
be the argument of \(\lambda\), so that
\[
\cos \varphi = \frac{a}{r}, \quad \sin \varphi = \frac{b}{r}.
\]
Then \(R\) can be written in polar form as:
\[
\begin{align*}
R &= r \begin{bmatrix} \frac{a}{r} & \frac{-b}{r} \\ \frac{b}{r} & \frac{a}{r} \\ \end{bmatrix} \\\\
&= \begin{bmatrix} r & 0 \\ 0 & r \\ \end{bmatrix} \begin{bmatrix} \cos \varphi & -\sin \varphi \\ \sin \varphi & \cos \varphi \\ \end{bmatrix}.
\end{align*}
\]
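The scaling factor and rotation angle can be recovered numerically with `math.hypot` and `math.atan2`. This is a minimal sketch with the sample values \(a = 3\), \(b = 4\) (an assumption); it rebuilds \(R\) from \(r\) and \(\varphi\) to confirm the polar form.

```python
import math

a, b = 3.0, 4.0  # sample values (assumption)
R = [[a, -b],
     [b, a]]

r = math.hypot(a, b)    # scaling factor sqrt(a^2 + b^2)
phi = math.atan2(b, a)  # rotation angle with cos(phi) = a/r, sin(phi) = b/r

# Scaling by r composed with rotation by phi.
rebuilt = [[r * math.cos(phi), -r * math.sin(phi)],
           [r * math.sin(phi), r * math.cos(phi)]]

max_error = max(abs(rebuilt[i][j] - R[i][j])
                for i in range(2) for j in range(2))
print(r, math.degrees(phi), max_error)
```

Using `atan2(b, a)` rather than `atan(b / a)` keeps the angle in the correct quadrant and avoids division by zero when \(a = 0\).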
Note:
In general, a nonzero complex number \(z = a + bi\) corresponds to the point \((a, b)\) in the complex plane, with
\[
a = |z| \cos \varphi , \quad b = |z| \sin \varphi
\]
and so
\[
z = a + bi = |z| (\cos \varphi + i\sin \varphi ) = |z|e^{i\varphi}
\]
where \(\varphi = \arg z\) and \(|z| = \sqrt{a^2 + b^2}\), since \(|z|^2 = z\bar{z} = a^2 + b^2\).
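The standard library's `cmath` module provides this conversion directly. The short sketch below round-trips a sample value \(z = 3 + 4i\) (an assumption) through its modulus and argument, using both the trigonometric and the exponential form.

```python
import cmath
import math

z = complex(3, 4)  # sample value (assumption)

modulus, phi = cmath.polar(z)  # (|z|, arg z)

# Trigonometric form |z| (cos(phi) + i sin(phi)).
from_trig = modulus * complex(math.cos(phi), math.sin(phi))

# Exponential form |z| e^{i phi}.
from_exp = modulus * cmath.exp(1j * phi)

# |z|^2 agrees with z times its conjugate.
z_times_conj = (z * z.conjugate()).real

print(modulus, phi, from_trig, from_exp, z_times_conj)
```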