Example: \(df = X\,dX + dX\,X\) for \(f(X) = X^2\)
In matrix calculus, complex symbolic derivations are prone to human error, so a practical first step is to validate any derived rule against a finite-difference approximation.
In this example, we numerically check the derivative of the matrix function \(f(X) = X^2\). Defining differentiation rules, such as the general chain rule and the vector-Jacobian product (VJP), can significantly enhance computational performance.
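As a brief aside illustrating what such a rule looks like here (a sketch; the adjoint notation \(\bar{Y} = \partial L/\partial Y\) for a scalar loss \(L\) and \(Y = f(X)\) is introduced only for this example), the vector-Jacobian product of \(f(X) = X^2\) follows directly from its differential and the Frobenius inner product: \[ dL = \langle \bar{Y},\, X\,dX + dX\,X \rangle_F = \langle X^{T}\bar{Y} + \bar{Y}\,X^{T},\, dX \rangle_F \quad\Longrightarrow\quad \bar{X} = X^{T}\bar{Y} + \bar{Y}\,X^{T}. \]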
To validate our differentiation rules, a practical approach is to compare them with a finite-difference approximation. Two common finite-difference approximations of \(df\) are: \[ f(x+dx) - f(x) \quad \text{(forward difference)} \] \[ f(x) - f(x-dx) \quad \text{(backward difference)} \]
When evaluating the accuracy of your results, it is essential to use the relative error as the metric: a small absolute difference between two quantities does not always mean the error is negligible. The relative error is \[ \text{relative error} = \frac{\| \text{true} - \text{approximation} \|}{\| \text{true} \|}. \]
Theoretically, the approximation of \(df\) becomes more accurate as \(dx \to 0\). In practice this is not always true because of the limitations of floating-point arithmetic: when \(dx\) becomes too small, the computed difference \(f(x+dx) - f(x)\) loses significant digits and may even be rounded to zero, so the error increases again.
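To see this trade-off concretely, here is a small sketch (the sweep values and the fixed random direction are chosen purely for illustration) that varies the scale of \(dX\) and reports the relative error of the forward difference against the exact differential \(X\,dX + dX\,X\):
import numpy as np
np.random.seed(0)  # fixed seed so the sweep is reproducible
X = np.random.randn(2, 2)
D = np.random.randn(2, 2)  # fixed random direction; dX = D * scale
for scale in [1e-2, 1e-4, 1e-6, 1e-8, 1e-12, 1e-16]:
    dX = D * scale
    approx = (X + dX) @ (X + dX) - X @ X  # forward difference f(X + dX) - f(X)
    exact = X @ dX + dX @ X               # exact differential X dX + dX X
    rel_err = np.linalg.norm(approx - exact, 'fro') / np.linalg.norm(exact, 'fro')
    print(f"scale = {scale:.0e}, relative error = {rel_err:.2e}")
As the scale shrinks, the relative error first decreases (the dropped \(dX\,dX\) term becomes negligible) and then grows again once cancellation in \(f(X+dX) - f(X)\) dominates.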
In the code below, we set \(dX = \text{np.random.randn}(2, 2) \times 10^{-8}\). This scale is small enough to approximate the derivative well, but not so small that numerical rounding errors dominate the computation.
Finally, to ensure the reliability of your code, avoid choosing special "nice" numbers; instead, generate random non-integer values to test the robustness of your implementation and minimize bias in your results.
import numpy as np
# Matrix X
X = np.random.randn(2, 2)
# Differential matrix dX
dX = np.random.randn(2, 2) * 1e-8
# Define f(X) = X^2
f_X = X @ X
# Forward Difference Approximation: f(X + dX) - f(X)
f_approx = (X + dX) @ (X + dX) - f_X
# Backward Difference Approximation: f(X) - f(X - dX)
b_approx = f_X - (X - dX) @ (X - dX)
# Exact: X dX + dX X
exact = X @ dX + dX @ X
# Relative errors (by Frobenius norm)
f_relative_error = np.linalg.norm(f_approx - exact, 'fro') / np.linalg.norm(exact, 'fro')
b_relative_error = np.linalg.norm(b_approx - exact, 'fro') / np.linalg.norm(exact, 'fro')
# Print the results
print(f"Input Matrix X:\n {X} \n")
print(f"Differential dX:\n {dX}\n")
print(f"Foward Difference (f(X + dX) - f(X)):\n {f_approx}\n")
print(f"Backward Difference (f(X) - f(X - dX)):\n {b_approx}\n")
print(f"Exact (X dX + dX X):\n {exact}\n")
print(f"\nRelative Error(Forward): {f_relative_error:.2e}")
print(f"\nRelative Error(Backward): {b_relative_error:.2e}")
Why Numerical Validation Matters
Even if you are using automatic differentiation, understanding how to manually verify a gradient via finite differences is essential when implementing custom layers or complex physical constraints whose analytical form is non-trivial.
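As a minimal sketch of such a check (the scalar function \(L(X)\), its hand-derived gradient, and all names here are hypothetical, chosen only for illustration), one can compare an analytically derived gradient with a forward-difference estimate of the directional derivative:
import numpy as np
np.random.seed(1)
X = np.random.randn(3, 3)
def loss(X):
    # hypothetical "custom layer": scalar loss L(X) = sum of all entries of X @ X
    return np.sum(X @ X)
# Analytical gradient, derived from dL = <ones, X dX + dX X>_F
J = np.ones_like(X)
grad = X.T @ J + J @ X.T
# Forward-difference check of the directional derivative along a random direction
dX = np.random.randn(3, 3) * 1e-7
dL_fd = loss(X + dX) - loss(X)  # forward difference of the scalar loss
dL_exact = np.sum(grad * dX)    # <grad, dX>_F, the exact directional derivative
rel_err = abs(dL_fd - dL_exact) / abs(dL_exact)
print(f"Relative error of the gradient check: {rel_err:.2e}")
A small relative error (here it should be roughly on the order of the perturbation scale) indicates that the analytical gradient is consistent with the finite-difference estimate.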