Eigenvectors and Eigenvalues
Eigenvectors reveal directions in which a linear transformation acts by pure scaling. When \(Ax = \lambda x\),
the transformation \(x \to Ax\) stretches or shrinks \(x\) by the factor \(\lambda\) without changing its
direction (or reversing it if \(\lambda < 0\)). This geometric interpretation makes eigenvalues and eigenvectors
fundamental to understanding matrix behavior, with applications ranging from stability analysis of dynamical systems
to dimensionality reduction in data science.
Definition: Eigenvectors & Eigenvalues
An eigenvector of an \(n \times n\) matrix \(A\) is a "nonzero" vector \(x\) such that
\[
Ax = \lambda x \tag{1}
\]
for some scalar \(\lambda\). A scalar \(\lambda\) is called an eigenvalue of \(A\) if there is a nontrivial
solution \(x\) of equation (1); such an \(x\) is called an eigenvector corresponding to \(\lambda\).
Equation (1) can be written as:
\[
(A - \lambda I)x = 0. \tag{2}
\]
The scalar \(\lambda\) is an eigenvalue of \(A\) if and only if (2) has a nontrivial solution. The set of all solutions
of (2) is the null space \(\text{Nul}(A - \lambda I) \subseteq \mathbb{R}^n\) and is called the eigenspace of \(A\)
corresponding to the eigenvalue \(\lambda\).
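As a quick numerical check of this definition, the following sketch (assuming NumPy; the matrix and vector are hypothetical choices for illustration) verifies that \(Ax = \lambda x\) and that \(x\) lies in \(\text{Nul}(A - \lambda I)\):

```python
import numpy as np

# Hypothetical 2x2 example: x = (1, 1) is an eigenvector of A with eigenvalue 5.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
x = np.array([1.0, 1.0])

print(A @ x)                      # [5. 5.]  -> Ax = 5x
print((A - 5 * np.eye(2)) @ x)    # [0. 0.]  -> x is in Nul(A - 5I)
```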
Theorem 1:
The eigenvalues of a triangular matrix are its main diagonal entries.
Proof:
Suppose \(A \in \mathbb{R}^{3 \times 3} \) is a lower triangular matrix, and \(\lambda\) is an
eigenvalue of \(A\). Then,
\[A - \lambda I = \begin{bmatrix}
a_{11} - \lambda & 0 & 0 \\
a_{21} & a_{22} - \lambda & 0 \\
a_{31} & a_{32} & a_{33} - \lambda \\
\end{bmatrix}.
\]
Since \(\lambda\) is an eigenvalue of \(A\), \((A-\lambda I)x =0\) has a nontrivial solution.
In other words, the equation has a free variable. This occurs if and only if at least one of the main
diagonal entries of \(A - \lambda I\) is zero, which happens precisely when \(\lambda\) equals one of the
main diagonal entries of \(A\).
The same reasoning applies to upper triangular matrices and higher-dimensional cases.
Example:
Consider
\[
A = \begin{bmatrix} 0 & 1 & 8 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \\ \end{bmatrix}.
\]
\(A\) has eigenvalues \(0\), \(2\), and \(3\). Since \(A\) has a zero eigenvalue, equation (1) becomes
the homogeneous equation \(Ax = 0\), which must have a nontrivial solution. This happens if and only if
\(A\) is a singular matrix (NOT invertible).
We can verify this by computing its determinant:
\[
\det A = 0(6-0) - 1(0-0) + 8(0-0) = 0.
\]
Since \(\det A = 0\), \(A\) is indeed not invertible.
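These facts can also be checked numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[0.0, 1.0, 8.0],
              [0.0, 2.0, 7.0],
              [0.0, 0.0, 3.0]])

# The eigenvalues of a triangular matrix are its diagonal entries (Theorem 1).
print(np.linalg.eigvals(A))   # 0, 2, 3 (in some order)

# A zero eigenvalue means A is singular, so its determinant is zero.
print(np.linalg.det(A))       # 0.0 (up to floating-point error)
```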
Theorem 2:
If \(v_1, \cdots, v_n\) are eigenvectors corresponding to distinct eigenvalues \(\lambda_1, \cdots, \lambda_n\)
of a square matrix \(A\), then the set \(\{v_1, \cdots, v_n\}\) is linearly independent.
Proof:
Assume, for contradiction, that the set of eigenvectors \(\{v_1, \cdots, v_n\}\) is linearly "dependent." By definition, each
eigenvector \(v_j\) is nonzero, so \(\{v_1\}\) is linearly independent. Let \(i\) be the least index such that
\(v_{i+1}\) is a linear combination of the preceding (linearly independent) vectors \(v_1, \cdots, v_i\).
Then there exist scalars \(c_1, \cdots, c_i\)
such that
\[
v_{i+1} = c_1v_1 + \cdots + c_iv_i, \quad c_1, \cdots, c_i \in \mathbb{R}. \tag{3}
\]
Multiplying both sides of (3) on the left by \(A\), we get:
\[
c_1Av_1 + \cdots + c_iAv_i = Av_{i+1}
\]
Since \(Av_j = \lambda_j v_j\) for each \(j\), this becomes
\[
c_1 \lambda_1 v_1 + \cdots + c_i \lambda_i v_i = \lambda_{i+1} v_{i+1}. \tag{4}
\]
Subtracting \(\lambda_{i+1}\) times (3) from (4), we get:
\[
c_1(\lambda_1 - \lambda_{i+1}) v_1 + \cdots + c_i(\lambda_i - \lambda_{i+1} )v_i = 0. \tag{5}
\]
All coefficients in (5) must be zero since \(\{ v_1, \cdots, v_i\}\) is linearly independent.
Moreover, \((\lambda_1 - \lambda_{i+1}), \cdots, (\lambda_i - \lambda_{i+1})\) are nonzero because
the eigenvalues are distinct. Hence, \(c_1 = c_2 = \cdots = c_i = 0\), and equation (3) then gives \(v_{i+1} = 0\),
contradicting the fact that eigenvectors are nonzero.
Therefore, the set \(\{v_1, \cdots, v_n\}\) must be linearly independent.
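Numerically, Theorem 2 can be illustrated by checking that a matrix whose columns are eigenvectors for distinct eigenvalues has full rank. A minimal sketch assuming NumPy (the matrix is a hypothetical example with three distinct eigenvalues):

```python
import numpy as np

# Upper triangular, so its eigenvalues 2, 3, 5 are distinct (Theorem 1).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are eigenvectors of A
print(eigenvalues)                             # [2. 3. 5.]
print(np.linalg.matrix_rank(eigenvectors))     # 3 -> the eigenvectors are linearly independent
```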
Characteristic Equations
While Theorem 2 tells us that eigenvectors corresponding to distinct eigenvalues are linearly independent,
we still need a systematic method to find eigenvalues in the first place. The characteristic equation
provides this method by reformulating the eigenvalue problem as a polynomial equation.
A scalar \(\lambda\) is an eigenvalue of an \(n \times n\) matrix \(A\) if and only if \(\lambda\) satisfies
the characteristic equation
\[
\det (A - \lambda I) = 0.
\]
This follows from equation (2): \((A - \lambda I)x = 0\) has a nontrivial solution if and only if
\(A - \lambda I\) is singular, which occurs precisely when its determinant is zero.
Example:
Consider the matrix
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \\ \end{bmatrix}
\]
Then,
\[
\det(A - \lambda I) = (1-\lambda)(\lambda^2 -\lambda -24) - 0 + 5(5 +3\lambda) = 0
\]
Expanding this, we obtain the characteristic polynomial:
\[
-\lambda^3 +2\lambda^2 + 38\lambda +1 = 0
\]
Solving this equation for \(\lambda\), we obtain the eigenvalues of \(A\): \(\lambda \approx -5.230, -0.026, 7.256\).
Note: As in this example, eigenvalues in practice are typically approximated using numerical methods.
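For instance, a standard numerical routine recovers these approximations directly from \(A\); a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

# Numerical eigenvalues: approximately -5.230, -0.026, 7.256 (order may vary).
print(np.linalg.eigvals(A))

# Cross-check against the characteristic polynomial -λ^3 + 2λ^2 + 38λ + 1 = 0.
print(np.roots([-1, 2, 38, 1]))
```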
Definition: Algebraic Multiplicity
The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the
characteristic equation.
For example, if the characteristic equation of a matrix is \((\lambda -1)^2 (\lambda -2) = 0\), then
the eigenvalue \(1\) has algebraic multiplicity \(2\) and the eigenvalue \(2\) has algebraic multiplicity \(1\).
To understand when matrices share eigenvalue properties, we need the concept of similarity.
Similar matrices represent the same linear transformation expressed in different coordinate systems,
which explains why they must have identical eigenvalues.
Definition: Similarity
Suppose \(A\) and \(B\) are
\(n \times n\) matrices. Then, \(A\) is said to be similar to \(B\) if there exists an
invertible matrix \(P\) such that
\[
P^{-1}AP = B, \quad \text{or equivalently,} \quad A = PBP^{-1}.
\]
Theorem 3:
If \(n \times n\) matrices \(A\) and \(B\) are similar, then they have the same characteristic
polynomial and thus the same eigenvalues with the same multiplicities.
Note: the converse of this theorem is not true. Having the same eigenvalues does not imply
the matrices are similar.
Proof:
If \(B = P^{-1}AP\), then
\[
\begin{align*}
B - \lambda I &= P^{-1}AP - \lambda P^{-1}P \\\\
&= P^{-1}(A - \lambda I)P
\end{align*}
\]
By the multiplicative property of determinants, we have
\[
\begin{align*}
\det (B-\lambda I) &= \det (P^{-1}) \det (A-\lambda I) \det (P)\\\\
&= \det (A-\lambda I).
\end{align*}
\]
Note: \(\det (P^{-1}) \det (P) = \det (P^{-1}P) = \det (I) = 1\).
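Theorem 3 is easy to observe numerically: conjugating \(A\) by any invertible \(P\) leaves the eigenvalues unchanged. A minimal sketch assuming NumPy (\(A\) and \(P\) are hypothetical choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])            # invertible, since det(P) = 1

B = np.linalg.inv(P) @ A @ P          # B = P^{-1} A P is similar to A
print(np.linalg.eigvals(A))           # approximately [2., 3.]
print(np.linalg.eigvals(B))           # same eigenvalues, possibly in a different order
```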
Diagonalization
Similarity becomes especially powerful when we can find a matrix \(P\) that transforms \(A\) into a
diagonal matrix \(D\). Diagonal matrices are computationally simple: their powers, exponentials, and other
functions become trivial to compute. The diagonalization \(A = PDP^{-1}\) allows us to transfer these
computational advantages back to \(A\), since \(A^k = PD^kP^{-1}\) and computing \(D^k\) only requires
raising each diagonal entry to the \(k\)-th power.
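As a quick illustration of \(A^k = PD^kP^{-1}\), here is a minimal sketch assuming NumPy (the matrix is a hypothetical diagonalizable example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # hypothetical diagonalizable matrix
k = 5

eigenvalues, P = np.linalg.eig(A)     # columns of P are eigenvectors of A
D_k = np.diag(eigenvalues ** k)       # D^k: raise each diagonal entry to the k-th power

A_k = P @ D_k @ np.linalg.inv(P)      # A^k = P D^k P^{-1}
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))   # True
```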
Definition: Diagonalizable Matrix
A square matrix \(A\) is said to be diagonalizable if it is similar to a diagonal matrix \(D\),
that is, if for some invertible matrix \(P\),
\[
A = PDP^{-1}.
\]
Theorem 4: Diagonalization
An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors.
In that case, the columns of \(P\) are linearly independent eigenvectors of \(A\), and the diagonal entries of \(D\)
are the eigenvalues of \(A\) corresponding to the eigenvectors in \(P\), in the same order.
Note: \(P\) and \(D\) are not unique because the order of the diagonal entries in \(D\) can be changed.
Proof:
Let \(P\) be a square matrix with columns \(v_1, \cdots, v_n\) and \(D\) be a diagonal matrix with
diagonal entries \(\lambda_1, \cdots, \lambda_n \). Then,
\[
AP = \begin{bmatrix}Av_1 & \cdots & Av_n \\\end{bmatrix}
\]
\[
PD = \begin{bmatrix}\lambda_1 v_1 & \cdots & \lambda_n v_n \\\end{bmatrix}
\]
Assume \(A\) is diagonalizable, so \(A = PDP^{-1}\).
By multiplying both sides of this equation on the right by \(P\), we get:
\[
AP = PD.
\]
Thus, for each column of \(AP\), we have
\[
Av_1 = \lambda_1 v_1, \cdots, Av_n = \lambda_n v_n. \tag{6}
\]
Since \(P\) is invertible, its columns are linearly independent and must be nonzero.
From equation (6), \(\lambda_1, \cdots, \lambda_n \) are eigenvalues, and \(v_1, \cdots, v_n\) are
corresponding eigenvectors.
Conversely, consider any \(n\) linearly independent eigenvectors \(v_1, \cdots, v_n\). Construct \(P\) with these
eigenvectors as columns and \(D\) with the corresponding eigenvalues \(\lambda_1, \dots, \lambda_n\) on its diagonal.
Then \(AP = PD\) by the same column-by-column computation, and since the columns of \(P\) are linearly independent,
\(P\) is invertible, so we obtain \(A = P D P^{-1}\).
Example:
Given
\[
A = \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix},
\]
we compute the characteristic equation:
\[
\det(A- \lambda I) = (4-\lambda)((4-\lambda)^2 - 1) -((4-\lambda )-1)+(1-(4 -\lambda)) = 0.
\]
Simplifying this, we get:
\[
-\lambda^3 +12\lambda^2 -45\lambda +54 = 0,
\]
which is equivalent to
\[
(\lambda - 3)^2(\lambda -6) = 0.
\]
Thus, the eigenvalues are \(\lambda_1 = 3\), \(\lambda_2 = 3\), and \(\lambda_3 = 6\).
Next, we need to find three linearly independent eigenvectors corresponding to these eigenvalues.
For \(\lambda_1 = 3\) and \(\lambda_2 = 3\):
\[
A-3I = \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\\end{bmatrix}.
\]
From this eigenspace, we choose the eigenvectors \(v_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ \end{bmatrix}\)
and \(v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ \end{bmatrix}\) for the eigenvalue \(3\).
For \(\lambda_3 = 6\):
\[
A-6I = \begin{bmatrix} -2 & 1 & 1\\ 1 & -2 & 1 \\ 1 & 1 & -2 \\ \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\\end{bmatrix}
\]
We choose the eigenvector \(v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ \end{bmatrix}\).
Therefore,
\[
\begin{align*}
&D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix}, \\\\
&P = \begin{bmatrix} -1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix}, \\\\
&P^{-1} =
\begin{bmatrix} \frac{-1}{3} & \frac{-1}{3} & \frac{2}{3} \\
\frac{-1}{3} & \frac{2}{3} & \frac{-1}{3}\\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \end{bmatrix}.
\end{align*}
\]
This example demonstrates an important point: an \(n \times n\) matrix can be diagonalizable even without
\(n\) distinct eigenvalues. By Theorem 4, diagonalizability requires \(n\) linearly independent eigenvectors,
NOT \(n\) "distinct" eigenvalues. When an eigenvalue \(\lambda\) has algebraic multiplicity greater than 1,
diagonalizability depends on whether its eigenspace \(\text{Nul}(A - \lambda I)\) has dimension equal to
this multiplicity. Here, the eigenvalue 3 has multiplicity 2, and we found 2 linearly independent eigenvectors
in its eigenspace, satisfying the requirement for diagonalizability.
Note: If a matrix has \(n\) distinct eigenvalues, then by Theorem 2 it automatically has \(n\) linearly
independent eigenvectors and must be diagonalizable. The converse is false, as this example shows.
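The diagonalization found in this example can be verified numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
P = np.array([[-1.0, -1.0, 1.0],
              [ 0.0,  1.0, 1.0],
              [ 1.0,  0.0, 1.0]])
D = np.diag([3.0, 3.0, 6.0])

# Check that A = P D P^{-1} for the eigenvectors and eigenvalues found above.
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True
```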
Complex Eigenvalues and Eigenvectors
The characteristic polynomial of a real matrix may have complex roots. For instance,
rotation matrices in \(\mathbb{R}^2\) have no real eigenvalues (except for rotations by \(0\) or \(\pi\))
because rotations preserve all directions rather than stretching along any particular direction.
However, when we allow complex eigenvalues and eigenvectors, these matrices become diagonalizable over
\(\mathbb{C}\).
A complex scalar \(\lambda\) satisfies the characteristic equation \(\det(A -\lambda I) = 0\) if and only
if there is a nonzero vector \(x \in \mathbb{C}^n\) such that \(Ax = \lambda x\). In this case, \(\lambda\)
is called a complex eigenvalue and \(x\) is its corresponding complex eigenvector.
Rotation matrices demonstrate the importance of complex eigenvalues in understanding geometric transformations.
While a rotation in \(\mathbb{R}^2\) (excluding rotations by multiples of \(\pi\)) has no real eigenvectors,
it is always diagonalizable over \(\mathbb{C}\). The complex eigenvalues encode both the rotation angle and
any scaling, revealing the structure of the transformation.
(For orthogonal and symmetric matrices, see Orthogonality
and Symmetry.)
Example:
Consider the \(2 \times 2\) rotation-scaling matrix \(R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\), where
\(a\) and \(b\) are real and not both zero. Then the "complex" eigenvalues of \(R\) can be found by
solving the characteristic equation:
\[
\begin{align*}
\det (R - \lambda I)=0
&\Longrightarrow
\lambda^2 -2a\lambda +(a^2 + b^2) = 0 \\\\
&\Longrightarrow
(\lambda -(a+bi))(\lambda -(a-bi)) = 0
\end{align*}
\]
Thus, we get complex eigenvalues \(\lambda = a \pm bi\), and we can find complex eigenvectors:
\[
Rv_1 = \begin{bmatrix} a & -b \\ b & a \\ \end{bmatrix} \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}
= \begin{bmatrix} a + bi \\ b - ai\\ \end{bmatrix}
= (a+bi)\begin{bmatrix} 1 \\ -i \\ \end{bmatrix}.
\]
Hence, \(v_1 = \begin{bmatrix} 1 \\ -i \\ \end{bmatrix}\) is an eigenvector corresponding to
the eigenvalue \(\lambda = a + bi\).
Moreover, the complex conjugate of \(\lambda\), denoted \(\bar{\lambda} = a - bi\), is
an eigenvalue with its corresponding eigenvector \(v_2 = \begin{bmatrix} 1 \\ i \\ \end{bmatrix}\).
Therefore, we can diagonalize \(R\) in \(\mathbb{C}^2\):
\[
\begin{align*}
R &= PDP^{-1} \\\\
&= \begin{bmatrix} 1 & 1 \\ -i & i\end{bmatrix}
\begin{bmatrix} a+bi & 0 \\ 0 & a-bi \end{bmatrix}
\frac{1}{2}\begin{bmatrix} 1 & i \\ 1 & -i\end{bmatrix}.
\end{align*}
\]
Now, we can map this representation back to \(\mathbb{R}^2\) as the polar decomposition of \(R\) into
its rotation angle \(\varphi\) and scaling factor \(r\).
Let \(r = \sqrt{a^2 + b^2}\) be the modulus of \(\lambda\), so \(|\lambda| = r\), and let \(\varphi\) be the
argument of \(\lambda\), that is, the angle such that
\[
\cos \varphi = \frac{a}{r}, \quad \sin \varphi = \frac{b}{r}.
\]
Then \(R\) can be written in polar form as:
\[
\begin{align*}
R &= r \begin{bmatrix} \frac{a}{r} & \frac{-b}{r} \\ \frac{b}{r} & \frac{a}{r} \\ \end{bmatrix} \\\\
&= \begin{bmatrix} r & 0 \\ 0 & r \\ \end{bmatrix} \begin{bmatrix} \cos \varphi & -\sin \varphi \\ \sin \varphi & \cos \varphi \\ \end{bmatrix}.
\end{align*}
\]
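Numerically, the complex eigenvalues of \(R\) recover both the scaling factor \(r\) and the rotation angle \(\varphi\); a minimal sketch assuming NumPy (the values of \(a\) and \(b\) are hypothetical):

```python
import numpy as np

a, b = 1.0, np.sqrt(3.0)              # hypothetical values: r = 2 and φ = π/3
R = np.array([[a, -b],
              [b,  a]])

lam = np.linalg.eigvals(R)[0]         # one of the complex eigenvalues a ± bi

print(abs(lam))                       # r = |λ| = 2.0, the scaling factor
print(abs(np.angle(lam)))             # φ = |arg λ| ≈ 1.047 (π/3), the rotation angle
```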
Note:
In general, if \(z\) is a nonzero complex number corresponding to the point \((a, b)\) in the complex plane, then
\[
a = |z| \cos \varphi , \quad b = |z| \sin \varphi
\]
and so
\[
z = a + bi = |z| (\cos \varphi + i\sin \varphi ) = |z|e^{i\varphi}
\]
where \(\varphi = \arg z\) and \(|z| = \sqrt{a^2 + b^2}\), since \(|z|^2 = z\cdot \bar{z} = a^2 + b^2\).