Eigenvectors and Eigenvalues
Eigenvectors reveal directions in which a linear transformation acts by pure scaling. When \(Ax = \lambda x\),
the transformation \(x \to Ax\) stretches or shrinks \(x\) by the factor \(\lambda\) without changing its
direction (or reversing it if \(\lambda < 0\)). This geometric interpretation makes eigenvalues and eigenvectors
fundamental to understanding matrix behavior, with applications ranging from stability analysis of dynamical systems
to dimensionality reduction in data science.
Definition: Eigenvectors & Eigenvalues
An eigenvector of an \(n \times n\) matrix \(A\) is a "nonzero" vector \(x\) such that
\[
Ax = \lambda x \tag{1}
\]
for some scalar \(\lambda\). A scalar \(\lambda\) is called an eigenvalue of \(A\) if there is a nontrivial
solution \(x\) of equation (1); such an \(x\) is called an eigenvector corresponding to \(\lambda\).
Equation (1) can be written as:
\[
(A - \lambda I)x = 0. \tag{2}
\]
The scalar \(\lambda\) is an eigenvalue of \(A\) if and only if (2) has a nontrivial solution. The set of all solutions
of (2) is the null space \(\text{Nul}(A - \lambda I) \subseteq \mathbb{R}^n\) and is called the eigenspace of \(A\)
corresponding to the eigenvalue \(\lambda\).
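As a quick numerical check of this definition, the following sketch (assuming NumPy; the matrix and vector are hypothetical choices for illustration) verifies that \(Ax = \lambda x\) and that \(x\) lies in \(\text{Nul}(A - \lambda I)\):

```python
import numpy as np

# Hypothetical 2x2 example: x = (1, 1) is an eigenvector of A with eigenvalue 5.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
x = np.array([1.0, 1.0])

print(A @ x)                      # [5. 5.]  -> Ax = 5x
print((A - 5 * np.eye(2)) @ x)    # [0. 0.]  -> x is in Nul(A - 5I)
```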
Theorem 1:
The eigenvalues of a triangular matrix are its main diagonal entries.
Proof:
Suppose \(A \in \mathbb{R}^{3 \times 3} \) is a lower triangular matrix, and \(\lambda\) is an
eigenvalue of \(A\). Then,
\[A - \lambda I = \begin{bmatrix}
a_{11} - \lambda & 0 & 0 \\
a_{21} & a_{22} - \lambda & 0 \\
a_{31} & a_{32} & a_{33} - \lambda \\
\end{bmatrix}.
\]
Since \(\lambda\) is an eigenvalue of \(A\), \((A-\lambda I)x =0\) has a nontrivial solution.
In other words, the equation has a free variable. This occurs if and only if at least one of the main
diagonal entries of \(A - \lambda I\) is zero, which happens precisely when \(\lambda\) equals one of the
main diagonal entries of \(A\).
The same reasoning applies to upper triangular matrices and higher-dimensional cases.
Example:
Consider
\[
A = \begin{bmatrix} 0 & 1 & 8 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \\ \end{bmatrix}.
\]
\(A\) has eigenvalues \(0\), \(2\), and \(3\). Since \(A\) has a zero eigenvalue, equation (1) becomes
the homogeneous equation \(Ax = 0\), which must have a nontrivial solution. This happens if and only if
\(A\) is a singular matrix (NOT invertible).
We can verify this by computing its determinant:
\[
\det A = 0(6-0) - 1(0-0) + 8(0-0) = 0.
\]
Since \(\det A = 0\), \(A\) is indeed not invertible.
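These facts can also be checked numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[0.0, 1.0, 8.0],
              [0.0, 2.0, 7.0],
              [0.0, 0.0, 3.0]])

# The eigenvalues of a triangular matrix are its diagonal entries (Theorem 1).
print(np.linalg.eigvals(A))   # 0, 2, 3 (in some order)

# A zero eigenvalue means A is singular, so its determinant is zero.
print(np.linalg.det(A))       # 0.0 (up to floating-point error)
```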
Theorem 2:
If \(v_1, \cdots, v_n\) are eigenvectors corresponding to distinct eigenvalues \(\lambda_1, \cdots, \lambda_n\)
of a square matrix \(A\), then the set \(\{v_1, \cdots, v_n\}\) is linearly independent.
Proof:
Assume, for contradiction, that the set of eigenvectors \(\{v_1, \cdots, v_n\}\) is linearly "dependent." By definition, each
eigenvector \(v_j\) is nonzero, so \(\{v_1\}\) is linearly independent. Let \(i\) be the least index such that
\(v_{i+1}\) is a linear combination of the preceding (linearly independent) vectors \(v_1, \cdots, v_i\).
Then there exist scalars \(c_1, \cdots, c_i\)
such that
\[
v_{i+1} = c_1v_1 + \cdots + c_iv_i, \quad c_1, \cdots, c_i \in \mathbb{R}. \tag{3}
\]
Multiplying both sides of (3) on the left by \(A\), we get:
\[
c_1Av_1 + \cdots + c_iAv_i = Av_{i+1}
\]
Since \(Av_j = \lambda_j v_j\) for each \(j\), this becomes
\[
c_1 \lambda_1 v_1 + \cdots + c_i \lambda_i v_i = \lambda_{i+1} v_{i+1}. \tag{4}
\]
Subtracting \(\lambda_{i+1}\) times (3) from (4), we get:
\[
c_1(\lambda_1 - \lambda_{i+1}) v_1 + \cdots + c_i(\lambda_i - \lambda_{i+1} )v_i = 0. \tag{5}
\]
All coefficients in (5) must be zero since \(\{ v_1, \cdots, v_i\}\) is linearly independent.
Moreover, \((\lambda_1 - \lambda_{i+1}), \cdots, (\lambda_i - \lambda_{i+1})\) are nonzero because
the eigenvalues are distinct. Hence, \(c_1 = c_2 = \cdots = c_i = 0\), and equation (3) then gives \(v_{i+1} = 0\),
contradicting the fact that eigenvectors are nonzero.
Therefore, the set \(\{v_1, \cdots, v_n\}\) must be linearly independent.
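Numerically, Theorem 2 can be illustrated by checking that a matrix whose columns are eigenvectors for distinct eigenvalues has full rank. A minimal sketch assuming NumPy (the matrix is a hypothetical example with three distinct eigenvalues):

```python
import numpy as np

# Upper triangular, so its eigenvalues 2, 3, 5 are distinct (Theorem 1).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are eigenvectors of A
print(eigenvalues)                             # [2. 3. 5.]
print(np.linalg.matrix_rank(eigenvectors))     # 3 -> the eigenvectors are linearly independent
```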
Characteristic Equations
While Theorem 2 tells us that eigenvectors corresponding to distinct eigenvalues are linearly independent,
we still need a systematic method to find eigenvalues in the first place. The characteristic equation
provides this method by reformulating the eigenvalue problem as a polynomial equation.
A scalar \(\lambda\) is an eigenvalue of an \(n \times n\) matrix \(A\) if and only if \(\lambda\) satisfies
the characteristic equation
\[
\det (A - \lambda I) = 0.
\]
This follows from equation (2): \((A - \lambda I)x = 0\) has a nontrivial solution if and only if
\(A - \lambda I\) is singular, which occurs precisely when its determinant is zero.
Example:
Consider the matrix
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \\ \end{bmatrix}
\]
Then,
\[
\det(A - \lambda I) = (1-\lambda)(\lambda^2 -\lambda -24) - 0 + 5(5 +3\lambda) = 0
\]
Expanding this, we obtain the characteristic polynomial:
\[
-\lambda^3 +2\lambda^2 + 38\lambda +1 = 0
\]
Solving this equation for \(\lambda\), we obtain the eigenvalues of \(A\): \(\lambda \approx -5.230, -0.026, 7.256\).
Note: As in this example, eigenvalues in practice are typically approximated using numerical methods.
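For instance, a standard numerical routine recovers these approximations directly from \(A\); a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

# Numerical eigenvalues: approximately -5.230, -0.026, 7.256 (order may vary).
print(np.linalg.eigvals(A))

# Cross-check against the characteristic polynomial -λ^3 + 2λ^2 + 38λ + 1 = 0.
print(np.roots([-1, 2, 38, 1]))
```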
Definition: Algebraic Multiplicity
The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the
characteristic equation.
For example, if the characteristic equation of a matrix is \((\lambda -1)^2 (\lambda -2) = 0\), then
the eigenvalue \(1\) has algebraic multiplicity \(2\) and the eigenvalue \(2\) has algebraic multiplicity \(1\).
To understand when matrices share eigenvalue properties, we need the concept of similarity.
Similar matrices represent the same linear transformation expressed in different coordinate systems,
which explains why they must have identical eigenvalues.
Definition: Similarity
Suppose \(A\) and \(B\) are
\(n \times n\) matrices. Then, \(A\) is said to be similar to \(B\) if there exists an
invertible matrix \(P\) such that
\[
P^{-1}AP = B, \quad \text{or equivalently,} \quad A = PBP^{-1}.
\]
Theorem 3:
If \(n \times n\) matrices \(A\) and \(B\) are similar, then they have the same characteristic
polynomial and thus the same eigenvalues with the same multiplicities.
Note: the converse of this theorem is not true. Having the same eigenvalues does not imply
the matrices are similar.
Proof:
If \(B = P^{-1}AP\), then
\[
\begin{align*}
B - \lambda I &= P^{-1}AP - \lambda P^{-1}P \\\\
&= P^{-1}(A - \lambda I)P
\end{align*}
\]
By the multiplicative property of determinants, we have
\[
\begin{align*}
\det (B-\lambda I) &= \det (P^{-1}) \det (A-\lambda I) \det (P)\\\\
&= \det (A-\lambda I).
\end{align*}
\]
Note: \(\det (P^{-1}) \det (P) = \det (P^{-1}P) = \det (I) = 1\).
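Theorem 3 is easy to observe numerically: conjugating \(A\) by any invertible \(P\) leaves the eigenvalues unchanged. A minimal sketch assuming NumPy (\(A\) and \(P\) are hypothetical choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])            # invertible, since det(P) = 1

B = np.linalg.inv(P) @ A @ P          # B = P^{-1} A P is similar to A
print(np.linalg.eigvals(A))           # approximately [2., 3.]
print(np.linalg.eigvals(B))           # same eigenvalues, possibly in a different order
```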
Diagonalization
Similarity becomes especially powerful when we can find a matrix \(P\) that transforms \(A\) into a
diagonal matrix \(D\). Diagonal matrices are computationally simple: their powers, exponentials, and other
functions become trivial to compute. The diagonalization \(A = PDP^{-1}\) allows us to transfer these
computational advantages back to \(A\), since \(A^k = PD^kP^{-1}\) and computing \(D^k\) only requires
raising each diagonal entry to the \(k\)-th power.
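As a quick illustration of \(A^k = PD^kP^{-1}\), here is a minimal sketch assuming NumPy (the matrix is a hypothetical diagonalizable example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # hypothetical diagonalizable matrix
k = 5

eigenvalues, P = np.linalg.eig(A)     # columns of P are eigenvectors of A
D_k = np.diag(eigenvalues ** k)       # D^k: raise each diagonal entry to the k-th power

A_k = P @ D_k @ np.linalg.inv(P)      # A^k = P D^k P^{-1}
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))   # True
```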
Definition: Diagonalizable Matrix
A square matrix \(A\) is said to be diagonalizable if it is similar to a diagonal matrix \(D\),
that is, if for some invertible matrix \(P\),
\[
A = PDP^{-1}.
\]
Theorem 4: Diagonalization
An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors.
In that case, the columns of \(P\) are linearly independent eigenvectors of \(A\), and the diagonal entries of \(D\)
are the eigenvalues of \(A\) corresponding to the eigenvectors in \(P\), in the same order.
Note: \(P\) and \(D\) are not unique because the order of the diagonal entries in \(D\) can be changed.
Proof:
Let \(P\) be a square matrix with columns \(v_1, \cdots, v_n\) and \(D\) be a diagonal matrix with
diagonal entries \(\lambda_1, \cdots, \lambda_n \). Then,
\[
AP = \begin{bmatrix}Av_1 & \cdots & Av_n \\\end{bmatrix}
\]
\[
PD = \begin{bmatrix}\lambda_1 v_1 & \cdots & \lambda_n v_n \\\end{bmatrix}
\]
Assume \(A\) is diagonalizable, so \(A = PDP^{-1}\).
By multiplying both sides of this equation on the right by \(P\), we get:
\[
AP = PD.
\]
Thus, for each column of \(AP\), we have
\[
Av_1 = \lambda_1 v_1, \cdots, Av_n = \lambda_n v_n. \tag{6}
\]
Since \(P\) is invertible, its columns are linearly independent and must be nonzero.
From equation (6), \(\lambda_1, \cdots, \lambda_n \) are eigenvalues, and \(v_1, \cdots, v_n\) are
corresponding eigenvectors.
Conversely, consider any \(n\) linearly independent eigenvectors \(v_1, \cdots, v_n\). Construct \(P\) with these
eigenvectors as columns and \(D\) with the corresponding eigenvalues \(\lambda_1, \dots, \lambda_n\) on its diagonal.
Then \(AP = PD\) by the same column-by-column computation, and since the columns of \(P\) are linearly independent,
\(P\) is invertible, so we obtain \(A = P D P^{-1}\).
Example:
Given
\[
A = \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix},
\]
we compute the characteristic equation:
\[
\det(A- \lambda I) = (4-\lambda)((4-\lambda)^2 - 1) -((4-\lambda )-1)+(1-(4 -\lambda)) = 0.
\]
Simplifying this, we get:
\[
-\lambda^3 +12\lambda^2 -45\lambda +54 = 0,
\]
which is equivalent to
\[
(\lambda - 3)^2(\lambda -6) = 0.
\]
Thus, the eigenvalues are \(\lambda_1 = 3\), \(\lambda_2 = 3\), and \(\lambda_3 = 6\).
Next, we need to find three linearly independent eigenvectors corresponding to these eigenvalues.
For \(\lambda_1 = 3\) and \(\lambda_2 = 3\):
\[
A-3I = \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\\end{bmatrix}.
\]
From this eigenspace, we choose the eigenvectors \(v_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ \end{bmatrix}\)
and \(v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ \end{bmatrix}\) for the eigenvalue \(3\).
For \(\lambda_3 = 6\):
\[
A-6I = \begin{bmatrix} -2 & 1 & 1\\ 1 & -2 & 1 \\ 1 & 1 & -2 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\\end{bmatrix}
\]
We choose the eigenvector \(v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ \end{bmatrix}\).
Therefore,
\[
\begin{align*}
&D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix}, \\\\
&P = \begin{bmatrix} -1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix}, \\\\
&P^{-1} =
\begin{bmatrix} \frac{-1}{3} & \frac{-1}{3} & \frac{2}{3} \\
\frac{-1}{3} & \frac{2}{3} & \frac{-1}{3}\\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \end{bmatrix}.
\end{align*}
\]
This example demonstrates an important point: an \(n \times n\) matrix can be diagonalizable even without
\(n\) distinct eigenvalues. By Theorem 4, diagonalizability requires \(n\) linearly independent eigenvectors,
NOT \(n\) "distinct" eigenvalues. When an eigenvalue \(\lambda\) has algebraic multiplicity greater than 1,
diagonalizability depends on whether its eigenspace \(\text{Nul}(A - \lambda I)\) has dimension equal to
this multiplicity. Here, the eigenvalue 3 has multiplicity 2, and we found 2 linearly independent eigenvectors
in its eigenspace, satisfying the requirement for diagonalizability.
Note: If a matrix has \(n\) distinct eigenvalues, then by Theorem 2 it automatically has \(n\) linearly
independent eigenvectors and must be diagonalizable. The converse is false, as this example shows.
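The diagonalization found in this example can be verified numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
P = np.array([[-1.0, -1.0, 1.0],
              [ 0.0,  1.0, 1.0],
              [ 1.0,  0.0, 1.0]])
D = np.diag([3.0, 3.0, 6.0])

# Check that A = P D P^{-1} for the eigenvectors and eigenvalues found above.
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True
```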
Complex Eigenvalues and Eigenvectors
The characteristic polynomial of a real matrix may have complex roots. For instance,
rotation matrices in \(\mathbb{R}^2\) have no real eigenvalues (except for rotations by \(0\) or \(\pi\))
because rotations preserve all directions rather than stretching along any particular direction.
However, when we allow complex eigenvalues and eigenvectors, these matrices become diagonalizable over
\(\mathbb{C}\).
A complex scalar \(\lambda\) satisfies the characteristic equation \(\det(A -\lambda I) = 0\) if and only
if there is a nonzero vector \(x \in \mathbb{C}^n\) such that \(Ax = \lambda x\). In this case, \(\lambda\)
is called a complex eigenvalue and \(x\) is its corresponding complex eigenvector.
Rotation matrices demonstrate the importance of complex eigenvalues in understanding geometric transformations.
While a rotation in \(\mathbb{R}^2\) (excluding rotations by multiples of \(\pi\)) has no real eigenvectors,
it is always diagonalizable over \(\mathbb{C}\). The complex eigenvalues encode both the rotation angle and
any scaling, revealing the structure of the transformation.
(For orthogonal and symmetric matrices, see Orthogonality
and Symmetry.)
Example:
Consider the \(2 \times 2\) rotation-scaling matrix \(R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\), where
\(a\) and \(b\) are real and not both zero. Then the "complex" eigenvalues of \(R\) can be found by
solving the characteristic equation:
\[
\begin{align*}
\det (R - \lambda I)=0
&\Longrightarrow
\lambda^2 -2a\lambda +(a^2 + b^2) = 0 \\\\
&\Longrightarrow
(\lambda -(a+bi))(\lambda -(a-bi)) = 0
\end{align*}
\]
Thus, we get complex eigenvalues \(\lambda = a \pm bi\), and we can find complex eigenvectors:
\[
Rv_1 = \begin{bmatrix} a & -b \\ b & a \\ \end{bmatrix} \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}
= \begin{bmatrix} a + bi \\ b - ai\\ \end{bmatrix}
= (a+bi)\begin{bmatrix} 1 \\ -i \\ \end{bmatrix}.
\]
Hence, \(v_1 = \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}\) is an eigenvector corresponding to
the eigenvalue \(\lambda = a + bi\).
Moreover, the complex conjugate of \(\lambda\), denoted \(\bar{\lambda} = a - bi\), is
an eigenvalue with its corresponding eigenvector \(v_2 = \begin{bmatrix} 1 \\ i \\ \end{bmatrix}\).
Therefore, we can diagonalize \(R\) in \(\mathbb{C}^2\):
\[
\begin{align*}
R &= PDP^{-1} \\\\
&= \begin{bmatrix} 1 & 1 \\ -i & i\end{bmatrix}
\begin{bmatrix} a+bi & 0 \\ 0 & a-bi \end{bmatrix}
\frac{1}{2}\begin{bmatrix} 1 & i \\ 1 & -i\end{bmatrix}.
\end{align*}
\]
Now, we can map this representation back to \(\mathbb{R}^2\) as the polar decomposition of \(R\) into
its rotation angle \(\varphi\) and scaling factor \(r\).
Let \(r = \sqrt{a^2 + b^2}\) be the modulus of \(\lambda\), so \(|\lambda| = r\), and let \(\varphi\) be the
argument of \(\lambda\), that is, the angle such that
\[
\cos \varphi = \frac{a}{r}, \quad \sin \varphi = \frac{b}{r}.
\]
Then \(R\) can be written in polar form as:
\[
\begin{align*}
R &= r \begin{bmatrix} \frac{a}{r} & \frac{-b}{r} \\ \frac{b}{r} & \frac{a}{r} \\ \end{bmatrix} \\\\
&= \begin{bmatrix} r & 0 \\ 0 & r \\ \end{bmatrix} \begin{bmatrix} \cos \varphi & -\sin \varphi \\ \sin \varphi & \cos \varphi \\ \end{bmatrix}.
\end{align*}
\]
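Numerically, the complex eigenvalues of \(R\) recover both the scaling factor \(r\) and the rotation angle \(\varphi\); a minimal sketch assuming NumPy (the values of \(a\) and \(b\) are hypothetical):

```python
import numpy as np

a, b = 1.0, np.sqrt(3.0)              # hypothetical values: r = 2 and φ = π/3
R = np.array([[a, -b],
              [b,  a]])

lam = np.linalg.eigvals(R)[0]         # one of the complex eigenvalues a ± bi

print(abs(lam))                       # r = |λ| = 2.0, the scaling factor
print(abs(np.angle(lam)))             # φ = |arg λ| ≈ 1.047 (π/3), the rotation angle
```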
Note:
In general, if \(z\) is a nonzero complex number corresponding to the point \((a, b)\) in the complex plane, then
\[
a = |z| \cos \varphi , \quad b = |z| \sin \varphi
\]
and so
\[
z = a + bi = |z| (\cos \varphi + i\sin \varphi ) = |z|e^{i\varphi}
\]
where \(\varphi = \arg z\) and \(|z| = \sqrt{a^2 + b^2}\), since \(|z|^2 = z\cdot \bar{z} = a^2 + b^2\).