Derivatives of square matrix functions
So far, we have discussed gradients (scalar-by-vector) and
Jacobians (vector-by-vector). Now,
we elevate our scope to matrix-valued functions with respect to matrix inputs.
The fundamental rule of matrix calculus is that order matters. Unlike scalar calculus, where
\(x\,dx = dx\,x\), matrix multiplication is non-commutative. When we take the differential of a matrix function \(f(X)\),
we must therefore preserve the relative positions of the differentials \(dX\).
\[
df = \frac{\partial f}{\partial X}dX
\]
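The importance of order can be checked numerically. The sketch below (assuming NumPy is available; the matrices are arbitrary test data) compares the exact change in \(f(X) = X^2\) against the order-preserving product rule \((dX)X + X(dX)\) and against the naive scalar-calculus guess \(2X\,dX\):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))

df = (X + dX) @ (X + dX) - X @ X   # exact change in f(X) = X^2
correct = dX @ X + X @ dX          # order-preserving product rule
naive = 2 * X @ dX                 # scalar-calculus habit: ignores order

print(np.allclose(df, correct, atol=1e-9))   # agrees to first order
print(np.allclose(df, naive, atol=1e-9))     # does not agree
```

The discrepancy of the naive formula is the commutator \((dX)X - X(dX)\), which vanishes only when \(X\) and \(dX\) happen to commute.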
Example 1: Power of a Matrix \(f(X) = X^3\)
Consider \(f(X) = X^3\), where \(X \in \mathbb{R}^{n \times n}\). We want to find the most general
symbolic expression for the derivative of this function using differential notation.
By the product rule,
\[
\begin{align*}
df &= (dX) X X + X(dX)X +X X (dX) \\\\
&= (dX)X^2 + X(dX)X + X^2(dX) \tag{1}
\end{align*}
\]
Let's verify this expression using the definition of the total differential \(df\):
\[
\begin{align*}
df &= f(X + dX) - f(X) \\\\
&= (X+dX)^3 - X^3 \\\\
&= (X+dX)(X+dX)(X+dX) - X^3 \\\\
&= X^3 + X^2(dX) + X(dX)X + X(dX)^2 + (dX)X^2 \\\\
&\quad + (dX)X(dX) + (dX)^2X + (dX)^3 - X^3
\end{align*}
\]
Since the higher-order terms (those containing two or more factors of \(dX\)) are negligible as \(dX \to 0\), we recover expression (1).
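Expression (1) can also be checked with a finite-difference sketch (assuming NumPy; the test matrix and perturbation are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))

# Exact change in f(X) = X^3
df_numeric = np.linalg.matrix_power(X + dX, 3) - np.linalg.matrix_power(X, 3)
# First-order differential from expression (1)
df_formula = dX @ X @ X + X @ dX @ X + X @ X @ dX

print(np.allclose(df_numeric, df_formula, atol=1e-9))
```

The residual is dominated by the second-order terms such as \(X(dX)^2\), which scale like \(\|dX\|^2 \approx 10^{-12}\) here, far below the tolerance.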
Example 2: Matrix Inverse \(f(X) = X^{-1}\)
Consider \(f(X) = X^{-1}\), where \(X\) is an invertible matrix.
Since \(X^{-1}X = I\),
\[
d(X^{-1}X) = d(I) = 0 \, \in \mathbb{R}^{n \times n}.
\]
Then by the product rule,
\[
\begin{align*}
&d(X^{-1})X + X^{-1}(dX) = 0 \\\\
&\Longrightarrow d(X^{-1})X = - X^{-1}(dX).
\end{align*}
\]
Multiplying on the right by \(X^{-1}\),
\[
df = d(X^{-1}) = - X^{-1}(dX)X^{-1}.
\]
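This formula admits the same kind of numerical check (assuming NumPy; the diagonal shift below is just a hypothetical way to keep the test matrix well conditioned):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Shift the diagonal so X is safely invertible and well conditioned
X = rng.standard_normal((n, n)) + n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))

Xinv = np.linalg.inv(X)
df_numeric = np.linalg.inv(X + dX) - Xinv   # exact change in f(X) = X^{-1}
df_formula = -Xinv @ dX @ Xinv              # first-order differential

print(np.allclose(df_numeric, df_formula, atol=1e-10))
```

Note that the two factors of \(X^{-1}\) sandwich \(dX\); collapsing them into \(-X^{-2}\,dX\) would again violate the order-preservation rule.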
Derivative of the LU decomposition
LU decomposition is a pillar of numerical linear algebra,
factoring a square matrix \(X\) into a lower triangular matrix \(L\) (with unit diagonal) and an upper triangular matrix \(U\).
Understanding the sensitivity of this decomposition to changes in \(X\) is vital for stability analysis in numerical solvers.
Starting from the definition \(X = LU\), we apply the product rule to find the relationship between the differentials:
\[
dX = d(LU) = (dL)U + L(dU)
\]
Here, \(dL\) and \(dU\) must preserve the structural constraints: \(dL\) is strictly lower triangular (diagonal is zero because \(L_{ii}=1\)),
and \(dU\) is upper triangular.
A \(2 \times 2\) Analytical Example
Consider
\[
L = \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}
\]
and
\[
U = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}.
\]
Their differentials are constrained as:
\[
dL = \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}
\]
and
\[
dU = \begin{bmatrix} d(U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}.
\]
The total differential \(dX\) is then computed as:
\[
\begin{align*}
dX &= \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}
+ \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}\begin{bmatrix} d(U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}\\\\
&= \begin{bmatrix} d(U_{11}) & d(U_{12}) \\ U_{11}\,d(L_{21}) + L_{21}\,d(U_{11}) & U_{12}\,d(L_{21}) + L_{21}\,d(U_{12}) + d(U_{22}) \end{bmatrix}
\end{align*}
\]
Matching this entrywise against a given perturbation \(dX\) determines the four unknowns \(d(U_{11})\), \(d(U_{12})\), \(d(L_{21})\), and \(d(U_{22})\) uniquely.
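The relation \(dX = (dL)U + L(dU)\) can be verified numerically for the \(2 \times 2\) case. The sketch below (assuming NumPy; `lu2` is a hypothetical helper implementing the unpivoted Doolittle factorization, and the matrices are arbitrary test data) factors \(X\) and \(X + dX\), then compares the change in the factors against the product-rule prediction:

```python
import numpy as np

def lu2(X):
    # Doolittle LU of a 2x2 matrix (no pivoting): X = L @ U, unit-diagonal L
    a, b = X[0, 0], X[0, 1]
    c, d = X[1, 0], X[1, 1]
    L = np.array([[1.0, 0.0], [c / a, 1.0]])
    U = np.array([[a, b], [0.0, d - (c / a) * b]])
    return L, U

X = np.array([[2.0, 1.0], [1.0, 3.0]])
dX = 1e-7 * np.array([[0.3, -0.2], [0.5, 0.4]])

L0, U0 = lu2(X)
L1, U1 = lu2(X + dX)
dL, dU = L1 - L0, U1 - U0   # dL is strictly lower, dU upper, by construction

# Product-rule prediction: dX = (dL)U + L(dU), to first order in dX
print(np.allclose(dX, dL @ U0 + L0 @ dU, atol=1e-12))
```

Because `lu2` does not pivot, both factorizations use the same elimination order, so the differences `dL` and `dU` are directly comparable; with partial pivoting one would first have to check that the perturbation does not change the permutation.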
Application: Sensitivity in Structural Engineering
This matrix-level differential allows us to calculate how small perturbations in the input matrix \(X\)
(representing, for example, the stiffness of a bridge or the weights of a financial model) propagate through
to the triangular factors. This is the mathematical basis for algorithmic differentiation
applied to matrix factorizations, enabling the optimization of systems defined by linear equations.