Symmetry

Symmetric Matrices Quadratic Forms Singular Value Decomposition(SVD) Pseudo-inverse via SVD SVD with Fundamental Subspaces Interactive SVD Visualization

Symmetric Matrices

Symmetric matrices arise naturally throughout applied mathematics from least-squares problems where \(X^\top X\) is always symmetric, to quadratic forms in optimization, to covariance matrices in statistics. They have special properties that make them particularly tractable: real eigenvalues, orthogonal eigenvectors, and guaranteed diagonalizability.

Definition: Symmetric Matrix

A symmetric matrix is a square matrix \(A\) such that \[ A^\top = A. \]

For example, in the context of least-squares solutions, \(\hat{\boldsymbol{\beta}}\) satisfies the normal equations \(X^\top X\boldsymbol{\beta} = X^\top \mathbf{Y}\). In this expression, \(X^\top X\) is always symmetric. To confirm, let \(X\) be an \(m \times n\) matrix. Then \[ (X^\top X)^\top = X^\top (X^\top)^\top = X^\top X. \]

Definition: Orthogonal Diagonalization

An \(n \times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exist an orthogonal matrix \(P\) (with \(P^{-1} = P^\top \)) and a diagonal matrix \(D\) such that: \[ A = PDP^\top = PDP^{-1}. \]

For this diagonalization, \(A\) must have \(n\) linearly independent and orthonormal eigenvectors.

Theorem: The Spectral Theorem for Symmetric Matrices

An \(n \times n\) matrix \(A\) is orthogonally diagonalizable if and only if \(A\) is a symmetric matrix.

Proof (forward direction):

Suppose \(A\) is orthogonally diagonalizable, so \(A = PDP^\top\) for some orthogonal \(P\) and diagonal \(D\). Then \[ \begin{align*} A^\top = (PDP^\top)^\top &= (P^\top)^\top D^\top P^\top \\ &= PDP^\top \\ &= A, \end{align*} \] using \((P^\top)^\top = P\) and \(D^\top = D\) (diagonal matrices are symmetric). Hence \(A\) is symmetric.

The converse — every symmetric matrix is orthogonally diagonalizable — is the substantive direction of the Spectral Theorem. Its proof (by induction on dimension, using the existence of a real eigenvalue for symmetric matrices) is omitted here for space.

Example:

On Eigenvalues & Eigenvectors, we considered the example: \[ \begin{align*} A &= \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix} \\ &= PDP^{-1} \\ &= \begin{bmatrix} -1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix} \begin{bmatrix} \frac{-1}{3} & \frac{-1}{3} & \frac{2}{3} \\ \frac{-1}{3} & \frac{2}{3} & \frac{-1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}. \end{align*} \]

Since, \(A\) is symmetric, it must be orthogonally diagonalizable. To ensure \(P\) is orthonormal, we need orthonormal eigenvectors. This can be achieved using the Gram-Schmidt algorithm on the columns of \(P\). Since only \(\mathbf{v}_1 \cdot \mathbf{v}_2 \neq 0\) needs adjustment, \[ \begin{align*} \mathbf{v}'_2 &= \mathbf{v}_2 - \frac{\mathbf{v}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 \\\\ &= \begin{bmatrix} -1\\ 1\\ 0\\ \end{bmatrix} - \frac{1}{2}\begin{bmatrix} -1\\ 0\\ 1\\ \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{-1}{2}\\ 1 \\ \frac{-1}{2} \end{bmatrix}. \end{align*} \] After normalization, the orthonormal eigenvectors are: \[ \mathbf{u}_1 = \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ 0\\ \frac{1}{\sqrt{2}} \\ \end{bmatrix} ,\, \mathbf{u}_2 = \begin{bmatrix} \frac{-1}{\sqrt{6}}\\ \frac{2}{\sqrt{6}}\\ \frac{-1}{\sqrt{6}}\\ \end{bmatrix}, \, \text{and } \mathbf{u}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \end{bmatrix} \] Therefore, \[ \begin{align*} A &= PDP^\top \\\\ &= \begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix} \begin{bmatrix} \frac{-1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{-1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \\\\ &= PDP^{-1} \end{align*} \]

Quadratic Forms

The connection between symmetric matrices and their eigenvalues becomes especially clear through quadratic forms. These scalar-valued functions \(Q(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x}\) appear throughout optimization, physics, and machine learning. Understanding how to eliminate cross-product terms reveals the geometric meaning of eigenvalues.

Definition: Quadratic Form

A quadratic form on \(\mathbb{R}^n\) is a function \[ Q(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x} \in \mathbb{R} \] where \(A\) is an \(n \times n\) symmetric matrix.

Example:

Let's use our symmetric matrix again: \[ A = \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix} \] Then: \[ \begin{align*} Q(\mathbf{x}) &= \mathbf{x}^\top A\mathbf{x} \\ &= 4x_1^2 +4 x_2^2 + 4x_3^2 + 2x_1x_2 + 2x_2x_3 + 2x_3x_1. \end{align*} \]

We can transform this quadratic form into one with no cross-product terms.

Let \(\mathbf{x} = P\mathbf{y}\) be a change of variable where \(P\) is an invertible matrix and \(\mathbf{y}\) is a new variable in \(\mathbb{R}^n\). Since \(A\) is symmetric, by the Spectral Theorem, \(A\) is orthogonally diagonalizable. Thus \[ \begin{align*} \mathbf{x}^\top A\mathbf{x} &= (P\mathbf{y})^\top A(P\mathbf{y}) \\\\ &= \mathbf{y}^\top P^\top AP\mathbf{y} \\\\ &= \mathbf{y}^\top (P^\top AP)\mathbf{y} \\\\ &= \mathbf{y}^\top(P^\top PDP^\top P)\mathbf{y} \\\\ &= \mathbf{y}^\top D\mathbf{y} \end{align*} \] where \(\mathbf{y} = P^{-1}\mathbf{x} = P^\top \mathbf{x}\).

Since \(P^\top A P = D\), where \(D\) is a diagonal matrix, we get: \[ \begin{align*} \mathbf{x}^\top A \mathbf{x} &= \mathbf{y}^\top D \mathbf{y} \\\\ &= \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \end{align*} \] where \(\lambda_1, \lambda_2, \ldots, \lambda_n\) are the eigenvalues of \(A\).

Example:

For our example, \(Q(\mathbf{x})\) becomes: \[ Q(\mathbf{x}) = 3y_1^2 + 3y_2^2 + 6 y_3^2. \]

Thus, the eigenvalues of \(A\) are directly connected to its quadratic form. First, we define the classification of quadratic forms.

Definition: Classification of Quadratic Forms

A quadratic form \(Q(\mathbf{x})\) is classified as:

  1. Positive definite:
    \(\qquad \qquad A \succ 0\) if \(\, \forall \mathbf{x} \neq \mathbf{0}, \, Q(\mathbf{x}) > 0\).
  2. Positive semi-definite:
    \(\qquad \qquad A \succeq 0\) if \(\, \forall \mathbf{x}, \, Q(\mathbf{x}) \geq 0\).
  3. Negative definite:
    \(\qquad \qquad A \prec 0\) if \(\, \forall \mathbf{x} \neq \mathbf{0}, \, Q(\mathbf{x}) < 0\).
  4. Negative semi-definite:
    \(\qquad \qquad A \preceq 0\) if \(\, \forall \mathbf{x}, \, Q(\mathbf{x}) \leq 0\).
  5. Indefinite:
    \(\qquad \qquad\) otherwise.
Theorem: Definiteness via Eigenvalue Signs

Let \(A\) be an \(n \times n\) symmetric matrix with eigenvalues \(\lambda_1, \ldots, \lambda_n\). Then the quadratic form \(\mathbf{x}^\top A\mathbf{x}\) is

  1. positive definite if and only if all \(\lambda_i > 0\);
  2. positive semi-definite if and only if all \(\lambda_i \geq 0\);
  3. negative definite if and only if all \(\lambda_i < 0\);
  4. negative semi-definite if and only if all \(\lambda_i \leq 0\);
  5. indefinite if and only if \(A\) has both a positive and a negative eigenvalue.
Proof:

Recall from the derivation above that the orthogonal change of variable \(\mathbf{x} = P\mathbf{y}\) (where \(\mathbf{y} = P^\top \mathbf{x}\)) transforms \[ Q(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x} = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2. \] Since \(P\) is invertible, the correspondence \(\mathbf{x} \leftrightarrow \mathbf{y}\) is a bijection on \(\mathbb{R}^n\) that carries nonzero vectors to nonzero vectors.

Positive definite case. If all \(\lambda_i > 0\), then for any \(\mathbf{x} \neq \mathbf{0}\) the corresponding \(\mathbf{y} \neq \mathbf{0}\) has some \(y_k \neq 0\), and \[ Q(\mathbf{x}) = \sum_{i=1}^n \lambda_i y_i^2 \geq \lambda_k y_k^2 > 0 \] (the inequality uses \(\lambda_i > 0\) for every \(i\) to drop nonnegative terms). Conversely, if some \(\lambda_k \leq 0\), set \(\mathbf{x} = P\mathbf{e}_k\) (the \(k\)-th column of \(P\), which is nonzero since \(P\) is invertible). Then \(\mathbf{y} = P^\top P \mathbf{e}_k = \mathbf{e}_k\), giving \(Q(\mathbf{x}) = \lambda_k \leq 0\) — so \(A\) is not positive definite.

Semi-definite and negative cases. The same argument with the obvious sign changes establishes the characterizations of positive semi-definite (\(\lambda_i \geq 0\) for all \(i\); the strict inequality \(\lambda_k y_k^2 > 0\) becomes \(\geq 0\)), negative definite, and negative semi-definite.

Indefinite case. If \(A\) has both a positive eigenvalue \(\lambda_j\) and a negative eigenvalue \(\lambda_k\), then \(\mathbf{x}_+ := P\mathbf{e}_j\) gives \(Q(\mathbf{x}_+) = \lambda_j > 0\) while \(\mathbf{x}_- := P\mathbf{e}_k\) gives \(Q(\mathbf{x}_-) = \lambda_k < 0\), so \(Q\) is indefinite. Conversely, if \(A\) is indefinite, then it fails to be positive semi-definite (so some \(\lambda_j < 0\)) and fails to be negative semi-definite (so some \(\lambda_k > 0\)).

The columns of \(P\) provide the coordinate system in which the quadratic form becomes diagonal. These columns are called the principal axes of the quadratic form \(\mathbf{x}^\top A\mathbf{x}\), and they align with the eigenvectors of \(A\).

Note:
To check whether a matrix \(A\) is positive definite, it is effective to verify the existence of a Cholesky factorization \(A = R^\top R\) where \(R\) is an upper triangular matrix with positive diagonal entries. This factorization is a modified version of the LU factorization.

Insight:

Positive definiteness of \(A\) means the quadratic form \(\mathbf{x}^\top A\mathbf{x}\) defines a "bowl" shape (all eigenvalues positive). This guarantees a unique minimum for optimization problems, making positive definite matrices central to convex optimization and machine learning.

Singular Value Decomposition(SVD)

While the spectral theorem applies only to symmetric matrices, the Singular Value Decomposition extends orthogonal diagonalization to any \(m \times n\) matrix. SVD is arguably the most important matrix factorization in applied mathematics, with applications ranging from data compression to solving ill-conditioned linear systems.

Let \(A\) be an \(m \times n\) matrix. The symmetric matrix \(A^\top A\) is orthogonally diagonalizable (by the Spectral Theorem). Let \(\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}\) be an orthonormal basis for \(\mathbb{R}^n\) consisting of eigenvectors of \(A^\top A\) with corresponding eigenvalues \(\lambda_1, \cdots, \lambda_n\).

For \(1 \leq i \leq n\), the following holds: \[ \begin{align*} \| A\mathbf{v}_i \|^2 &= (A\mathbf{v}_i)^\top A\mathbf{v}_i = \mathbf{v}_i^\top (A^\top A\mathbf{v}_i) \\\\ &= \mathbf{v}_i^\top (\lambda_i\mathbf{v}_i) \\\\ &= \lambda_i. \end{align*} \] Since \(\| A\mathbf{v}_i \|^2 \geq 0\), we conclude that \(\lambda_i \geq 0 \).

Definition: Singular Values

Let \(A\) be an \(m \times n\) matrix, and let \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\) be the eigenvalues of \(A^\top A\) arranged in descending order. The singular values of \(A\) are defined as: \[ \sigma_i = \sqrt{\lambda_i} = \|A\mathbf{v}_i\| \geq 0 \quad (1 \leq i \leq n) \] where \(\mathbf{v}_i\) are the corresponding orthonormal eigenvectors of \(A^\top A\).

If \(A\) has \(r\) nonzero singular values, then \(\{A\mathbf{v}_1, \cdots, A\mathbf{v}_r\}\) forms an orthogonal basis for \(\operatorname{Col} A\) and \[\operatorname{rank} A = \dim \operatorname{Col} A = r. \]

Justification. For \(i \neq j\), \((A\mathbf{v}_i) \cdot (A\mathbf{v}_j) = \mathbf{v}_i^\top A^\top A \mathbf{v}_j = \lambda_j (\mathbf{v}_i \cdot \mathbf{v}_j) = 0\), so \(\{A\mathbf{v}_1, \ldots, A\mathbf{v}_n\}\) is an orthogonal set; the zero vectors (\(i > r\), where \(\lambda_i = 0\)) are discarded, leaving \(\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}\) orthogonal with \(\|A\mathbf{v}_i\| = \sigma_i > 0\). Since \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\) spans \(\mathbb{R}^n\), the images \(\{A\mathbf{v}_1, \ldots, A\mathbf{v}_n\}\) span \(\operatorname{Col} A\); the \(n - r\) zero images add nothing, so \(\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}\) spans \(\operatorname{Col} A\) and is therefore a basis.

Theorem: Singular Value Decomposition

Let \(A\) be an \(m \times n\) matrix with rank \(r\). Then there exist an \(m \times n\) matrix \(\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \\ \end{bmatrix}\) where \(D\) is a \(r \times r\) diagonal matrix with the first \(r\) nonzero singular values of \(A\), \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\) and there exist an \(m \times m\) orthogonal matrix \(U\), and an \(n \times n\) orthogonal matrix \(V\) such that \[ A = U\Sigma V^\top. \]

Proof:

Normalize orthogonal basis for \(\operatorname{Col} A\) \(\{A\mathbf{v}_1, \cdots, A\mathbf{v}_r\}\), then we obtain an orthonormal basis \(\{\mathbf{u}_1, \cdots, \mathbf{u}_r\}\), where \[ \mathbf{u}_i = \frac{1}{\| A\mathbf{v}_i \|}A\mathbf{v}_i = \frac{1}{\sigma_i}A\mathbf{v}_i \quad 1 \leq i \leq r \] or, \[ A\mathbf{v}_i = \sigma_i \mathbf{u}_i. \tag{1} \]

We extend \(\{\mathbf{u}_1, \cdots, \mathbf{u}_r\}\) to an orthonormal basis for \(\mathbb{R}^m\), \(\{\mathbf{u}_1, \cdots, \mathbf{u}_r, \mathbf{u}_{r+1}, \cdots, \mathbf{u}_m\}\). Let \(U = \begin{bmatrix}\mathbf{u}_1 & \cdots & \mathbf{u}_m \end{bmatrix}\) and \(V = \begin{bmatrix}\mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix}\). Clearly, by construction, \(U\) and \(V\) are orthogonal matrices. Also, by Equation (1), \[ AV = \begin{bmatrix} \sigma_1\mathbf{u}_1 & \cdots & \sigma_r\mathbf{u}_r & \mathbf{0} & \cdots & \mathbf{0} \end{bmatrix}. \]

Let \(D\) be a diagonal matrix with diagonal entries \(\sigma_1, \cdots, \sigma_r\) and let \(\Sigma =\begin{bmatrix} D & 0 \\ 0 & 0 \\ \end{bmatrix}\). Then \[ \begin{align*} U\Sigma &= \begin{bmatrix}\sigma_1\mathbf{u}_1 & \cdots & \sigma_r\mathbf{u}_r & \mathbf{0} & \cdots & \mathbf{0}\end{bmatrix} \\ &= AV. \end{align*} \] Finally, since \(V\) is an orthogonal matrix, \[ (U\Sigma ) V^\top = (AV)V^\top = AI = A. \]

Algorithm: Computing SVD
  1. Compute eigendecomposition of \(A^\top A\):
    Compute the eigenvalues and eigenvectors of \(A^\top A\).
    Note: In practice, avoid directly computing \(A^\top A\) to prevent numerical errors.
  2. Construct \(V\) and \(\Sigma\):
    Arrange eigenvalues in descending order.
    Use the corresponding unit eigenvectors as the columns of \(V\).
  3. Construct \(U\):
    For nonzero singular values, compute \(\mathbf{u}_k = \frac{1}{\sigma_k}A\mathbf{v}_k\).
    If there are not enough nonzero singular values, extend to an orthonormal basis.

Beyond its use in matrix factorization, the SVD reveals important numerical properties of a matrix through its singular values. The condition number measures how sensitive the output \(A\mathbf{x}\) is to small changes (like floating-point errors) in the input \(\mathbf{x}\).

Definition: Condition Number

For a matrix \(A \in \mathbb{R}^{m \times n}\) with rank \(r\), the condition number \(\kappa(A)\) is defined as the ratio of the largest to the smallest nonzero singular value: \[ \kappa(A) = \frac{\sigma_1}{\sigma_r} \] where \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\) are the nonzero singular values.

If \(A\) is a square, invertible matrix, this simplifies to \(\kappa(A) = \sigma_1 / \sigma_n\).

Often, a matrix with \(\kappa(A) \gg 1\) is called ill-conditioned, meaning it is numerically unstable for solving linear systems because it is "almost singular." From a CS perspective, this is a critical numerical red flag. Solving \(A\mathbf{x}=\mathbf{b}\) for an ill-conditioned \(A\) can lead to highly inaccurate results, as small rounding errors in \(\mathbf{b}\) or \(A\) get magnified into large errors in the solution \(\mathbf{x}\).

Pseudo-inverse via SVD

The SVD provides the most numerically stable method for computing the Moore-Penrose pseudo-inverse. Unlike the normal equations approach \((A^\top A)^{-1}A^\top\), the SVD-based computation avoids squaring the condition number and naturally handles rank-deficient matrices.

The Moore-Penrose Pseudo-inverse (\(A^\dagger\)) is most stably defined and computed using the Singular Value Decomposition (SVD).

Given the SVD of \(A = U\Sigma V^\top\), where \(U\) is \(m \times m\), \(\Sigma\) is \(m \times n\), and \(V\) is \(n \times n\), the pseudo-inverse is: \[ A^\dagger = V\Sigma^\dagger U^\top \] Here, \(\Sigma^\dagger\) is an \(n \times m\) matrix formed by:

  1. Transposing \(\Sigma\) to get \(\Sigma^\top\) (which is \(n \times m\)).
  2. Inverting every nonzero singular value \(\sigma_i\) on the diagonal to get \(1/\sigma_i\).

If \(\Sigma\) has diagonal entries \(\sigma_1, \dots, \sigma_r > 0\), then \(\Sigma^\dagger\) will have diagonal entries \(1/\sigma_1, \dots, 1/\sigma_r\). All other entries are zero. This definition has a beautiful geometric interpretation. Recall that \(\hat{\mathbf{x}} = A^\dagger \mathbf{b}\) is the minimum-norm least-squares solution.


The SVD beautifully handles this. It decomposes \(A\) into its fundamental subspaces. The pseudo-inverse essentially "inverts" the mapping from \(\operatorname{Row}(A)\) to \(\operatorname{Col}(A)\) and ignores the \(\operatorname{Nul}(A)\).

You can also see that \[ A^\dagger A = V\Sigma^\dagger U^\top U\Sigma V^\top = V\Sigma^\dagger\Sigma V^\top. \] The matrix \(\Sigma^\dagger\Sigma\) is an \(n \times n\) diagonal matrix with 1s for the first \(r\) entries and 0s after. This makes \(A^\dagger A = V \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} V^\top\), which is the orthogonal projection matrix onto the Row Space of \(A\). Similarly, \(A A^\dagger\) is the orthogonal projection matrix onto the Column Space of \(A\).

Practical Note: Truncation and Regularization

In real-world applications, we rarely have singular values that are exactly zero. Instead, we have a spectrum of values, some of which are very small. \[ \sigma_1 \geq \sigma_2 \geq \dots \gg \dots \geq \sigma_n \approx 0. \] Inverting these tiny singular values (\(1/\sigma_i\)) would introduce massive instability, as we'd be amplifying components related to noise.

The practical computation of the pseudo-inverse involves setting a tolerance \(\epsilon\). We treat any \(\sigma_i < \epsilon\) as zero. \[ (\sigma_i^\dagger) = \begin{cases} 1/\sigma_i & \text{if } \sigma_i \geq \epsilon \\ 0 & \text{if } \sigma_i < \epsilon \end{cases} \] This truncated SVD is the basis for Principal Component Regression (PCR) and is a powerful form of regularization. It explicitly discards the "directions" (singular vectors) that contribute least to the transformation and are most likely to be noise, building a more robust, lower-rank model.

Interactive SVD Visualization

This interactive demo visualizes the Singular Value Decomposition process for \(2 \times 2\) matrices. SVD expresses any matrix A as a product \(A = U \Sigma V^\top \), which can be understood geometrically as a composition of three transformations:

  1. \(U\): Rotation/reflection using left singular vectors (orthogonal)
  2. \(\Sigma\): Scaling along the axes (singular values)
  3. \(V^\top\): Rotation/reflection using right singular vectors (orthogonal)

Use the visualization to see how each component of SVD contributes to the overall transformation:

Try different matrices to observe how SVD works with various transformations:

SVD with Fundamental Subspaces

The SVD has a beautiful geometric interpretation: it provides orthonormal bases for all four fundamental subspaces of a matrix simultaneously. This reveals the deep connection between the singular value decomposition and the structure of linear transformations.

Theorem: SVD and Fundamental Subspaces

Let \(A\) be an \(m \times n\) matrix with SVD \(A = U \Sigma V^\top\), where \(A\) has rank \(r\). Let \(\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}\) be the columns of \(U\) (left singular vectors) and \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\) be the columns of \(V\) (right singular vectors). Then:

  1. \(\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}\) is an orthonormal basis for \(\operatorname{Col} A\)
  2. \(\{\mathbf{u}_{r+1}, \ldots, \mathbf{u}_m\}\) is an orthonormal basis for \(\operatorname{Nul} A^\top\)
  3. \(\{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) is an orthonormal basis for \(\operatorname{Nul} A\)
  4. \(\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}\) is an orthonormal basis for \(\operatorname{Row} A\)
Proof:

Column Space:
From the construction of the SVD, \(\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\} = \{\sigma_1 \mathbf{u}_1, \ldots, \sigma_r \mathbf{u}_r\}\) is an orthogonal basis for \(\operatorname{Col} A\). Since \(\sigma_i > 0\) for \(i \leq r\), \(\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}\) is an orthonormal basis for \(\operatorname{Col} A\).

Left Null Space:
Since \(U\) is orthogonal, the columns \(\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}\) form an orthonormal basis for \(\mathbb{R}^m\). By the previous step, \(\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}\) is a basis for \(\operatorname{Col} A\); hence \(\{\mathbf{u}_{r+1}, \ldots, \mathbf{u}_m\}\) is orthogonal to every element of \(\operatorname{Col} A\), i.e., \(\{\mathbf{u}_{r+1}, \ldots, \mathbf{u}_m\} \subseteq (\operatorname{Col} A)^\perp\). By the orthogonality property \((\operatorname{Col} A)^\perp = \operatorname{Nul} A^\top\), this set lies in \(\operatorname{Nul} A^\top\). Since it is orthonormal with \(m - r\) elements, and rank-nullity applied to \(A^\top\) (which has rank \(r\), the same as \(A\)) gives \(\dim \operatorname{Nul} A^\top = m - r\), this set is a basis for \(\operatorname{Nul} A^\top\).

Null Space:
For \(i > r\), \(\sigma_i = 0\) implies \(\|A\mathbf{v}_i\| = \sigma_i = 0\), so \(A\mathbf{v}_i = \mathbf{0}\) and \(\mathbf{v}_i \in \operatorname{Nul} A\). The set \(\{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) is orthonormal (columns of the orthogonal matrix \(V\)) and has \(n - r\) elements. Combined with \(\dim(\operatorname{Nul} A) = n - r\) — obtained by rank-nullity since \(\operatorname{rank} A = r\) — this set is a basis for the null space \(\operatorname{Nul} A\).

Row Space:
By part 3, \(\{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) is a basis for \(\operatorname{Nul} A\). Since \(V\) is orthogonal, the remaining columns \(\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}\) are orthogonal to every \(\mathbf{v}_i\) with \(i > r\), hence orthogonal to \(\operatorname{Nul} A\); that is, they lie in \((\operatorname{Nul} A)^\perp = \operatorname{Col} A^\top = \operatorname{Row} A\) (by the orthogonality property). Conversely, any \(\mathbf{x} \in \operatorname{Row} A = (\operatorname{Nul} A)^\perp\) expanded in the \(V\) basis as \(\mathbf{x} = \sum_{i=1}^n c_i \mathbf{v}_i\) satisfies \(c_i = \mathbf{x} \cdot \mathbf{v}_i = 0\) for \(i > r\) (since \(\mathbf{x} \perp \mathbf{v}_i \in \operatorname{Nul} A\)), so \(\mathbf{x} \in \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}\). The orthonormal set \(\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}\) is therefore a basis for \(\operatorname{Row} A\).

Geometric Insight:

The SVD reveals that any linear transformation can be understood as acting on orthogonal subspaces: the row space maps via scaling by singular values to the column space, while the null space maps to zero. The transformation is entirely characterized by how it acts on these complementary subspaces.

Connection to the Invertible Matrix Theorem

When \(A\) is an \(n \times n\) invertible matrix, the SVD characterization of fundamental subspaces provides additional equivalent conditions for invertibility.

Additional Invertible Matrix Conditions

Let \(A\) be an \(n \times n\) matrix. Then \(A\) is invertible if and only if any of the following equivalent conditions hold:

  1. \((\operatorname{Col} A)^\perp = \{\mathbf{0}\}\)
  2. \((\operatorname{Nul} A)^\perp = \mathbb{R}^n\)
  3. \(\operatorname{Row} A = \mathbb{R}^n\)
  4. \(A\) has \(n\) nonzero singular values

These conditions extend the Invertible Matrix Theorem from earlier sections.

Proof:

Forward direction. If \(A\) is invertible, then \(\operatorname{rank}(A) = n\) (by the IMT), which means \(r = n\) in the SVD. Therefore:

  • \(\operatorname{Nul} A = \{\mathbf{0}\}\) (no vectors \(\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\) remain), so \((\operatorname{Nul} A)^\perp = \mathbb{R}^n\) — condition (2).
  • \(\operatorname{Row} A = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_n\} = \mathbb{R}^n\) — condition (3).
  • \(\operatorname{Col} A = \operatorname{Span}\{\mathbf{u}_1, \ldots, \mathbf{u}_n\} = \mathbb{R}^n\), so \((\operatorname{Col} A)^\perp = \{\mathbf{0}\}\) — condition (1).
  • All \(n\) singular values \(\sigma_1, \ldots, \sigma_n\) are nonzero — condition (4).

Reverse direction. Each condition forces \(r = n\), hence \(\operatorname{rank}(A) = n\), which by the IMT means \(A\) is invertible:

  • (1) \((\operatorname{Col} A)^\perp = \{\mathbf{0}\}\) with \(\operatorname{Col} A \subseteq \mathbb{R}^n\) and the orthogonal decomposition \(\mathbb{R}^n = \operatorname{Col} A \oplus (\operatorname{Col} A)^\perp\) force \(\operatorname{Col} A = \mathbb{R}^n\), so \(\operatorname{rank} A = n\).
  • (2) \((\operatorname{Nul} A)^\perp = \mathbb{R}^n\) forces \(\operatorname{Nul} A = \{\mathbf{0}\}\), and by rank-nullity \(\operatorname{rank} A = n - \dim \operatorname{Nul} A = n\).
  • (3) \(\operatorname{Row} A = \mathbb{R}^n\) gives \(\dim \operatorname{Row} A = n = r\) by the fundamental subspaces theorem above, hence \(\operatorname{rank} A = n\).
  • (4) \(A\) having \(n\) nonzero singular values means \(r = n\) directly.