Introduction
The Woodbury Matrix Identity is a powerful tool in matrix algebra that allows us to compute the
inverse of a matrix after it has been modified by a low-rank update. In many real-world applications,
we don't just solve a system once; we update it as new information arrives.
The "Efficiency" Perspective:
Directly inverting an \(n \times n\) matrix from scratch typically costs \(O(n^3)\) operations.
However, the Woodbury identity is designed for incremental updates.
If we already possess the inverse \(A^{-1}\) and perform a "low-rank" update (where \(k \ll n\)),
the cost to find the new inverse is reduced to approximately \(O(n^2 k)\) operations.
For a small, fixed \(k\) (such as a rank-1 update), this effectively shifts the computational complexity from cubic to
quadratic. This efficiency is the mathematical backbone of online learning, Kalman filters,
and recursive least squares (RLS).
Woodbury Matrix Identity
The Woodbury Matrix Identity is a fundamental result in matrix algebra that allows us to compute the
inverse of a matrix after a rank-k update. In the fields of Machine Learning, Optimization, and
Signal Processing, we often encounter situations where a large matrix \(A\) is modified by adding a product of
smaller matrices \(UBV\).
Theorem 1: Woodbury matrix identity
\[
(A + UBV)^{-1} = A^{-1} - A^{-1} U (B^{-1} + VA^{-1}U)^{-1} VA^{-1}
\]
where \(A \in \mathbb{R}^{n \times n}\) and \(B \in \mathbb{R}^{k \times k}\) are invertible, \(U \in \mathbb{R}^{n \times k}\),
\(V \in \mathbb{R}^{k \times n}\), and the \(k \times k\) matrix \(B^{-1} + VA^{-1}U\) is also invertible.
Proof:
\[
\begin{align*}
&(A + UBV)\{A^{-1} - A^{-1} U (B^{-1} + VA^{-1}U)^{-1} VA^{-1}\} \\\\
&= I + UBVA^{-1} -U(B^{-1} + VA^{-1}U)^{-1}VA^{-1} -UBVA^{-1}U(B^{-1} + VA^{-1}U)^{-1}VA^{-1}\\\\
&= I + UBVA^{-1} -U \{(B^{-1} + VA^{-1}U)^{-1} + BVA^{-1}U(B^{-1} + VA^{-1}U)^{-1}\}VA^{-1} \\\\
&= I + UBVA^{-1} -U \{(I + BVA^{-1}U)(B^{-1} + VA^{-1}U)^{-1}\}VA^{-1} \\\\
&= I + UBVA^{-1} -U \{(BB^{-1} + BVA^{-1}U)(B^{-1} + VA^{-1}U)^{-1}\}VA^{-1} \\\\
&= I + UBVA^{-1} -UB \{(B^{-1} + VA^{-1}U)(B^{-1} + VA^{-1}U)^{-1}\}VA^{-1} \\\\
&= I + UBVA^{-1} -UBVA^{-1} \\\\
&= I
\end{align*}
\]
(The computation above verifies a right inverse; since all matrices involved are square, a right inverse is automatically a two-sided inverse.)
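The identity is easy to check numerically. Below is a minimal NumPy sketch (the helper name `woodbury_inverse` and the test matrices are illustrative): given \(A^{-1}\), only a \(k \times k\) system is solved, so the update costs \(O(n^2 k)\) rather than \(O(n^3)\).

```python
import numpy as np

def woodbury_inverse(A_inv, U, B, V):
    """(A + U B V)^{-1} from a known A^{-1} via the Woodbury identity.

    Only the k x k "capacitance" matrix B^{-1} + V A^{-1} U is
    factored; everything else is O(n^2 k) matrix products.
    """
    S = np.linalg.inv(B) + V @ A_inv @ U                     # k x k
    return A_inv - A_inv @ U @ np.linalg.solve(S, V @ A_inv)

rng = np.random.default_rng(0)
n, k = 50, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps A well-conditioned
U = rng.standard_normal((n, k))
B = rng.standard_normal((k, k)) + k * np.eye(k)
V = rng.standard_normal((k, n))

A_inv = np.linalg.inv(A)
updated = woodbury_inverse(A_inv, U, B, V)
direct = np.linalg.inv(A + U @ B @ V)    # O(n^3) reference answer
assert np.allclose(updated, direct)
```

Solving the small \(k \times k\) system with `np.linalg.solve` avoids forming its explicit inverse, which is both cheaper and more numerically stable.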
When \(k = 1\), the matrices \(U\) and \(V\) reduce to vectors \(u\) and \(v\), and we obtain the Sherman-Morrison formula.
This special case is widely used for rank-1 updates.
Theorem 2: Sherman-Morrison formula
Suppose \(A \in \mathbb{R}^{n \times n}\) is invertible, and \(u, v \in \mathbb{R}^n\).
The matrix \((A + uv^T)\) is invertible if and only if \(1 + v^T A^{-1} u \neq 0\). In this case:
\[
(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} uv^T A^{-1}}{1 + v^T A^{-1} u}.
\]
Similarly, for a rank-1 reduction (valid when \(1 - v^T A^{-1} u \neq 0\)):
\[
(A - uv^T)^{-1} = A^{-1} + \frac{A^{-1} uv^T A^{-1}}{1 - v^T A^{-1} u}.
\]
Proof:
To derive the Sherman-Morrison formula, we treat the rank-1 update as a specific case of the Woodbury
identity with the following parameters:
- \(k = 1\) (The update space is 1-dimensional)
- \(U = u \in \mathbb{R}^{n \times 1}\) (Column vector)
- \(V = v^T \in \mathbb{R}^{1 \times n}\) (Row vector)
- \(B = 1 \in \mathbb{R}^{1 \times 1}\) (Scalar)
With these substitutions:
\[
\begin{align*}
UBV &= u \cdot 1 \cdot v^T = uv^T \\\\
B^{-1} &= 1^{-1} = 1 \\\\
VA^{-1}U &= v^T A^{-1} u \quad \text{(scalar)}
\end{align*}
\]
Substituting into the Woodbury identity:
\[
\begin{align*}
(A + UBV)^{-1} &= A^{-1} - A^{-1} U (B^{-1} + VA^{-1}U)^{-1} VA^{-1} \\\\
(A + uv^T)^{-1} &= A^{-1} - A^{-1} u (1 + v^T A^{-1} u)^{-1} v^T A^{-1} \\\\
&= A^{-1} - A^{-1} u \cdot \frac{1}{1 + v^T A^{-1} u} \cdot v^T A^{-1} \\\\
&= A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}
\end{align*}
\]
This confirms that the rank-1 update requires only matrix-vector products and outer products,
maintaining a computational complexity of \(O(n^2)\).
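This \(O(n^2)\) update can be sketched in a few lines of NumPy (the function name and the singularity tolerance are illustrative choices):

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Rank-1 update of a known inverse: (A + u v^T)^{-1} from A^{-1}.

    Costs two matrix-vector products and one outer product: O(n^2).
    """
    Au = A_inv @ u            # A^{-1} u, an n-vector
    vA = v @ A_inv            # v^T A^{-1}, an n-vector
    denom = 1.0 + v @ Au      # scalar 1 + v^T A^{-1} u
    if abs(denom) < 1e-12:    # illustrative tolerance
        raise np.linalg.LinAlgError("1 + v^T A^{-1} u is (numerically) zero")
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
u, v = rng.standard_normal(n), rng.standard_normal(n)

A_inv = np.linalg.inv(A)
assert np.allclose(sherman_morrison(A_inv, u, v),
                   np.linalg.inv(A + np.outer(u, v)))
```

The guard on the denominator mirrors the invertibility condition in Theorem 2: the update exists exactly when \(1 + v^T A^{-1} u \neq 0\).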
Matrix Determinant Lemma
The Matrix Determinant Lemma provides a formula for computing the determinant of a matrix after a low-rank update. Just like the Woodbury identity for inverses, this lemma is essential for maintaining computational efficiency, particularly when evaluating likelihoods in statistical models.
Theorem 3: Matrix Determinant Lemma
Suppose \(A \in \mathbb{R}^{n \times n}\) and \(B \in \mathbb{R}^{k \times k}\) are invertible matrices, and \(U \in \mathbb{R}^{n \times k}, V \in \mathbb{R}^{k \times n}\).
The determinant of the updated matrix is given by:
\[
\det(A + UBV) = \det(B^{-1} + VA^{-1}U) \det(B) \det(A)
\]
In the special case of a rank-1 update (\(k=1\)) where \(u, v \in \mathbb{R}^n\):
\[
\det(A + uv^\top) = (1 + v^\top A^{-1} u) \det(A)
\]
Computational & Numerical Insight:
- Efficiency: For \(k=1\), the term \((1 + v^\top A^{-1} u)\) is a scalar. If \(A^{-1}\) and \(\det(A)\) are known, the update requires only \(O(n^2)\) operations (matrix-vector multiplication), avoiding the \(O(n^3)\) cost of a full determinant calculation.
- Numerical Stability: In practical machine learning applications (e.g., Gaussian Processes), we often work with the log-determinant to prevent numerical overflow/underflow:
\[ \ln |\det(A + uv^\top)| = \ln |\det(A)| + \ln |1 + v^\top A^{-1} u| \]
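A hypothetical NumPy helper for this log-determinant update might look as follows; `np.linalg.slogdet` is used for the \(O(n^3)\) baseline precisely because it avoids the overflow that a raw determinant would cause:

```python
import numpy as np

def logdet_rank1_update(logdet_A, A_inv, u, v):
    """ln|det(A + u v^T)| from ln|det(A)| and A^{-1}, in O(n^2)."""
    return logdet_A + np.log(abs(1.0 + v @ A_inv @ u))

rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n)) + n * np.eye(n)
u, v = rng.standard_normal(n), rng.standard_normal(n)

_, logdet_A = np.linalg.slogdet(A)                 # baseline, O(n^3)
updated = logdet_rank1_update(logdet_A, np.linalg.inv(A), u, v)
_, direct = np.linalg.slogdet(A + np.outer(u, v))  # recompute from scratch
assert np.isclose(updated, direct)
```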
Case Study: BFGS and Rank-2 Updates
The BFGS algorithm (the most widely used quasi-Newton method) provides a classic example of the Woodbury
identity in action. It maintains the inverse Hessian approximation \(C_{t} \approx B_{t}^{-1}\) while the
Hessian approximation \(B_t\) itself receives a symmetric rank-2 update.
Mathematical Rigor: Why Sherman-Morrison?
The BFGS update formula is defined as:
\[
B_{t+1} = B_t + \frac{y_t y_t^\top}{y_t^\top s_t} - \frac{B_t s_t s_t^\top B_t}{s_t^\top B_t s_t}
\]
By applying the Sherman-Morrison formula recursively (first to the rank-1 addition, then to the rank-1 subtraction),
we derive the update for the inverse \(C_{t+1}\).
- Preservation of Structure:
This process ensures that if \(C_t\) is symmetric and positive definite, \(C_{t+1}\) remains so (provided \(y_t^\top s_t > 0\)).
- Efficiency:
It transforms an \(O(n^3)\) matrix inversion into a sequence of \(O(n^2)\) outer products and matrix-vector multiplications.
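Assuming both \(B_t\) and \(C_t = B_t^{-1}\) are at hand, the two-step Sherman-Morrison derivation can be sketched as below (helper names are illustrative; production BFGS codes typically use the equivalent closed form \(C_{t+1} = (I - \rho s y^\top) C_t (I - \rho y s^\top) + \rho s s^\top\) with \(\rho = 1/(y^\top s)\)):

```python
import numpy as np

def sherman_morrison(C, u, v):
    """(M + u v^T)^{-1} given C = M^{-1}; one O(n^2) rank-1 step."""
    Cu = C @ u
    return C - np.outer(Cu, v @ C) / (1.0 + v @ Cu)

def bfgs_inverse_update(B, C, s, y):
    """C_{t+1} = B_{t+1}^{-1} via two successive Sherman-Morrison steps."""
    # Step 1: handle the addition  + y y^T / (y^T s)
    C = sherman_morrison(C, y / (y @ s), y)
    # Step 2: handle the subtraction  - (B s)(B s)^T / (s^T B s)
    Bs = B @ s
    return sherman_morrison(C, -Bs / (s @ Bs), Bs)

rng = np.random.default_rng(3)
n = 10
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)          # SPD Hessian approximation B_t
C = np.linalg.inv(B)                 # its tracked inverse C_t
s = rng.standard_normal(n)
H = rng.standard_normal((n, n))
H = H @ H.T + np.eye(n)              # SPD "true" Hessian
y = H @ s                            # guarantees the curvature condition y^T s > 0

B_next = (B + np.outer(y, y) / (y @ s)
            - np.outer(B @ s, B @ s) / (s @ (B @ s)))
assert np.allclose(bfgs_inverse_update(B, C, s, y), np.linalg.inv(B_next))
```

The curvature condition \(y^\top s > 0\) keeps both Sherman-Morrison denominators nonzero, which is the same condition that preserves positive definiteness of \(C_{t+1}\).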
View Full Derivation: From Rank-1 to BFGS Rank-2 → Section II Part 8