Matrix Algebra

Diagonal Matrix Transpose Inverse of a Matrix Elementary Matrices Partitions LU Factorization

Diagonal Matrix

Definition: Diagonal Matrix

A diagonal matrix is an \(n \times n\) square matrix whose entries outside the main diagonal are all zero.

Consider the following matrix multiplications \(AB\) and \(BA\): \[ \begin{align*} AB &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 6 & 12 \\ 8 & 15 & 24 \\ 14 & 24 & 36 \end{bmatrix} \end{align*} \] \[ \begin{align*} BA &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 4 & 6 \\ 12 & 15 & 18 \\ 28 & 32 & 36 \end{bmatrix} \end{align*} \]

Multiplying on the right by the diagonal matrix \(B\) scales each column of \(A\) by the corresponding diagonal entry of \(B\); multiplying on the left by \(B\) scales each row. This simple "scaling" action makes diagonal matrices the easiest matrices to analyze, and it is precisely what eigenvalue decomposition aims to achieve in disguise: change basis so that the matrix becomes diagonal.

Definition: Identity Matrix

The identity matrix \(I_n\) is the diagonal matrix with \(1\)'s on the main diagonal: \[ I_n = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}. \]

Although matrix multiplication is generally non-commutative (\(AB \neq BA\)), the identity always commutes: \(AI = IA = A\). For instance, \[ \begin{align*} AI &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}. \end{align*} \]

Diagonal matrices encode the simplest possible scaling action. We now turn to a different fundamental operation — one that reorganizes a matrix's rows and columns rather than scales them.

Transpose

Definition: Transpose

The transpose of an \(m \times n\) matrix \(A\), denoted \(A^top\), is the \(n \times m\) matrix obtained by interchanging the rows and columns of \(A\). Equivalently, \((A^top)_{ij} = A_{ji}\).

For example, \[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad A^top = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}. \] The diagonal entries are unchanged. The transpose satisfies several useful properties:

  1. \((A^top)^top = A\) — applying the swap twice returns the original.
  2. \((A + B)^top = A^top + B^top\) — transpose distributes over sums entrywise.
  3. \((AB)^top = B^top A^top\) — note the order reversal.

Properties (1) and (2) follow immediately from the entrywise definition. The third — order reversal in the transpose of a product — is the substantive identity, and we record it as a theorem.

Theorem: Transpose of a Product

For matrices \(A\) and \(B\) of compatible sizes, \((AB)^top = B^top A^top\).

Proof:

Let \(n\) be the number of columns of \(A\) (equivalently, the number of rows of \(B\)). Then \[ \bigl((AB)^top\bigr)_{ij} = (AB)_{ji} = \sum_{k=1}^n a_{jk} b_{ki}, \] and on the other hand \[ (B^top A^top)_{ij} = \sum_{k=1}^n (B^top)_{ik} (A^top)_{kj} = \sum_{k=1}^n b_{ki} a_{jk} = \sum_{k=1}^n a_{jk} b_{ki}. \] The two expressions agree entry by entry, so \((AB)^top = B^top A^top\). The identity extends inductively to products of more than two matrices: \((A_1 A_2 \cdots A_k)^top = A_k^top \cdots A_2^top A_1^top\).

Inverse of a Matrix

Definition: Invertible Matrix

An \(n \times n\) matrix \(A\) is said to be invertible if there exists an \(n \times n\) matrix \(B\) such that \[ AB = BA = I. \] The matrix \(B\) is called the inverse of \(A\) and is denoted \(A^{-1}\). A matrix that is not invertible is called singular.

Two immediate consequences follow from the definition. First, \(A\) is itself the inverse of \(A^{-1}\), so \((A^{-1})^{-1} = A\). Second, the inverse of a product reverses the order: \[ \begin{align*} (AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} \\\\ &= AIA^{-1} \\\\ &= AA^{-1} \\\\ &= I, \end{align*} \] and analogously \((B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I\). Both products equal \(I\), so \((AB)^{-1} = B^{-1}A^{-1}\). Combining this order reversal with the analogous identity for the transpose gives the next result.

Theorem: Transpose of an Inverse

If \(A\) is invertible, then so is \(A^top\), and \((A^top)^{-1} = (A^{-1})^top\).

Proof:

By the transpose-of-product theorem above, \[ (A^{-1})^top A^top = (A A^{-1})^top = I^top = I, \] and similarly \[ A^top (A^{-1})^top = (A^{-1} A)^top = I^top = I. \] Hence \(A^top\) is invertible with inverse \((A^{-1})^top\).

For \(2 \times 2\) matrices there is a closed-form inverse expressed in terms of the entries.

Theorem: Inverse of a 2×2 Matrix

Let \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\). Then \(A\) is invertible if and only if \(ad - bc \neq 0\), in which case \[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \] The quantity \(ad - bc\) is called the determinant of \(A\), studied systematically in the next section.

Proof:

(Sufficiency) If \(ad - bc \neq 0\), direct computation gives \[ \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix} = I, \] and similarly for the product in the reverse order.

(Necessity) Suppose \(ad - bc = 0\). We show \(A\) is not invertible by exhibiting a nonzero vector \(\mathbf{v}\) with \(A\mathbf{v} = \mathbf{0}\) — for then no inverse can exist, since an inverse would force \(\mathbf{v} = A^{-1}\mathbf{0} = \mathbf{0}\), a contradiction. Direct computation shows \[ A \begin{bmatrix} d \\ -c \end{bmatrix} = \begin{bmatrix} ad - bc \\ cd - cd \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad A \begin{bmatrix} -b \\ a \end{bmatrix} = \begin{bmatrix} -ab + ab \\ -bc + ad \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \] If \(A\) has any nonzero entry, then at least one of the two vectors \(\begin{bmatrix} d \\ -c \end{bmatrix}\) and \(\begin{bmatrix} -b \\ a \end{bmatrix}\) is nonzero — so we have produced the required \(\mathbf{v}\). The remaining case is \(A = 0\), the zero matrix, which is trivially not invertible (any candidate inverse \(B\) satisfies \(AB = 0 \neq I\)).

Example:

For \(A = \begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}\), we have \(ad - bc = 2 - 72 = -70\), so \[ A^{-1} = \frac{1}{-70}\begin{bmatrix} 2 & -9 \\ -8 & 1 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{35} & \tfrac{9}{70} \\[2pt] \tfrac{4}{35} & -\tfrac{1}{70} \end{bmatrix}. \] One can check directly that \(A A^{-1} = I\).

The inverse gives a clean way to solve the matrix equation \(A\mathbf{x} = \mathbf{b}\): multiply both sides on the left by \(A^{-1}\), \[ A\mathbf{x} = \mathbf{b} \;\Longrightarrow\; A^{-1} A \mathbf{x} = A^{-1} \mathbf{b} \;\Longrightarrow\; \mathbf{x} = A^{-1} \mathbf{b}. \]

CS / Numerical Insight: Don't Compute the Inverse

Although the formula \(\mathbf{x} = A^{-1}\mathbf{b}\) is conceptually clean, in practice it is rarely how one solves a linear system. Computing \(A^{-1}\) explicitly is more expensive and numerically less stable than directly solving \(A\mathbf{x} = \mathbf{b}\) by row reduction (or by the LU factorization introduced later in this page). The inverse formula is invaluable for theoretical reasoning, but a working rule in numerical linear algebra is: solve, don't invert.

Finally, we connect matrix invertibility to the linear transformation viewpoint of the previous page.

Definition: Invertible Linear Transformation

A linear transformation \(T : \mathbb{R}^n \to \mathbb{R}^n\) is called invertible if there exists a map \(S : \mathbb{R}^n \to \mathbb{R}^n\) such that \[ S(T(\mathbf{x})) = \mathbf{x} \quad \text{and} \quad T(S(\mathbf{x})) = \mathbf{x} \quad \text{for all } \mathbf{x} \in \mathbb{R}^n. \]

Since linear transformations on \(\mathbb{R}^n\) correspond bijectively to \(n \times n\) matrices via \(T(\mathbf{x}) = A\mathbf{x}\), the transformation \(T\) is invertible if and only if its matrix \(A\) is invertible, in which case the inverse transformation is \(S(\mathbf{x}) = A^{-1}\mathbf{x}\) — which is itself linear. (One can show directly: any \(S\) satisfying \(S(T(\mathbf{x})) = \mathbf{x}\) for all \(\mathbf{x}\), with \(T\) linear and bijective, must itself be linear.) This bridges the algebraic notion of inverse with its geometric meaning: an invertible transformation is one that can be undone.

Elementary Matrices

Definition: Elementary Matrix

An elementary matrix is the result of applying a single elementary row operation to an identity matrix.

There are three types of elementary matrices, one for each elementary row operation: interchange (swap two rows), scale (multiply a row by a nonzero scalar), and replacement (add a multiple of one row to another). The defining property is that left-multiplication by an elementary matrix performs the corresponding row operation on any matrix it multiplies.

Example: Three Types of Elementary Matrices

Take \(A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\) and the elementary matrices \[ \begin{align*} &E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\\\ &E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}. \end{align*} \] Multiplying on the left: \[ \begin{align*} &E_1 A = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix} \quad &\text{(interchange Row 2 and Row 3)} \\\\ &E_2 A = \begin{bmatrix} 2 & 4 & 6 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \quad &\text{(scale Row 1 by 2)} \\\\ &E_3 A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ -5 & -7 & -9 \end{bmatrix} \quad &\text{(Row 3} \to \text{Row 3} - 3\,\text{Row 2)}. \end{align*} \]

Example: Composing Operations

A sequence of row operations is encoded as a single matrix by multiplying the elementary matrices together (in the same order as the operations are applied — read right-to-left because matrix products act on the right): \[ \begin{align*} E_3 E_2 E_1 A &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\ -17 & -19 & -21 \end{bmatrix} = B. \end{align*} \] Each elementary row operation is reversible (interchange undoes itself; scaling reverses by reciprocal; replacement reverses by negation), so each elementary matrix is invertible. Multiplying both sides by \((E_3 E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} E_3^{-1}\) recovers the original \(A\): \[ A = E_1^{-1} E_2^{-1} E_3^{-1} B = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\ -17 & -19 & -21 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}. \] (The matrix \(A\) used here is in fact singular — its rows satisfy \(\text{Row 3} = 2\,\text{Row 2} - \text{Row 1}\) — so this example illustrates only the operations of elementary matrices, not their use for inversion. We turn to that next.)

The reversibility of row operations gives a powerful technique for computing inverses. If a square matrix \(A\) is invertible, its reduced row echelon form must be \(I_n\) — every column has a pivot. Reading the row reduction as a product of elementary matrices then gives an explicit formula for \(A^{-1}\).

Theorem: Inverse via Row Reduction

An \(n \times n\) matrix \(A\) is invertible if and only if its reduced row echelon form is \(I_n\). When this is the case, the same sequence of elementary row operations that reduces \(A\) to \(I_n\) transforms \(I_n\) into \(A^{-1}\). In practice, performing row reduction on the augmented matrix \([A \mid I]\) yields \([I \mid A^{-1}]\).

Proof:

(\(\Leftarrow\)) Suppose the RREF of \(A\) is \(I_n\). Then there is a sequence of elementary row operations transforming \(A\) into \(I_n\); each operation is performed by left-multiplication by an elementary matrix, so \[ (E_k \cdots E_2 E_1)\, A = I_n. \tag{1} \] The matrix \(M := E_k \cdots E_2 E_1\) is a product of invertible matrices and therefore invertible, with explicit inverse \(M^{-1} = E_1^{-1} E_2^{-1} \cdots E_k^{-1}\). Equation (1) reads \(MA = I_n\); multiplying on the left by \(M^{-1}\) gives \(A = M^{-1}\), which exhibits \(A\) as invertible. Moreover \(A^{-1} = M = E_k \cdots E_1\), and since \(M = M \cdot I_n\) we have \[ A^{-1} = (E_k \cdots E_2 E_1)\, I_n. \tag{2} \] Equations (1) and (2) say: the same sequence of elementary matrices, applied to \(A\), produces \(I_n\); applied to \(I_n\), produces \(A^{-1}\). Performing the operations side by side on the augmented matrix \([A \mid I]\) therefore yields \([I \mid A^{-1}]\) in a single sweep.

(\(\Rightarrow\)) Suppose \(A\) is invertible. Reduce \(A\) to its RREF, call it \(R\); as in the previous direction, this gives \(NA = R\) for some invertible \(N\) (a product of elementary matrices). Hence \(R = NA\) is invertible, being the product of two invertible matrices.

We claim that an invertible \(n \times n\) matrix in RREF must equal \(I_n\). Indeed, if \(R\) had any zero row — say its \(i\)-th row — then for every \(\mathbf{x}\) the \(i\)-th coordinate of \(R\mathbf{x}\) would be zero, so the equation \(R\mathbf{x} = \mathbf{e}_i\) would be unsolvable, contradicting invertibility. Hence every row of \(R\) contains a leading \(1\) — giving \(n\) leading \(1\)'s in \(n\) distinct columns. Combined with the RREF staircase condition (each leading \(1\) lies strictly to the right of the previous one), the \(n\) leading \(1\)'s must occupy positions \((1,1), (2,2), \ldots, (n,n)\). The remaining RREF condition (each pivot column has zeros above and below its leading \(1\)) then forces all off-diagonal entries to vanish: \(R = I_n\).

Partitions

Definition: Block Matrix

A block matrix (or partitioned matrix) is a matrix that has been divided into submatrices by horizontal and vertical lines. The resulting submatrices are called blocks.

Partitioning is more than notational convenience: it lets us treat substructure as a single object, which is essential when blocks have special form (zero, identity, sparse) or when a problem naturally splits into known and unknown parts (as in the Schur complement, treated later in the curriculum). The key operational fact is that, with compatible block sizes, matrix multiplication "passes through" the partition — we treat blocks like scalars.

Theorem: Block Multiplication

Suppose \(A\) and \(B\) are partitioned so that the column partitions of \(A\) match the row partitions of \(B\): that is, for each \(k\), the number of columns in the block \(A_{ik}\) equals the number of rows in \(B_{kj}\). Then the product \(AB\) is the block matrix \[ AB = \left[\begin{array}{ccc} A_{11} & \cdots & A_{1p} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mp} \end{array}\right] \left[\begin{array}{ccc} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{p1} & \cdots & B_{pn} \end{array}\right] = \left[\begin{array}{ccc} C_{11} & \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mn} \end{array}\right], \] where \(C_{ij} = \sum_{k=1}^p A_{ik} B_{kj}\). The compatibility condition ensures each block product \(A_{ik} B_{kj}\) is well-defined, and the corresponding sums share matching dimensions.

Proof (sketch):

The result follows by tracking which entries of \(A\) and \(B\) contribute to a given entry of \(AB\). Pick any scalar entry \((AB)_{rs}\): by the entrywise definition of matrix multiplication, \[ (AB)_{rs} = \sum_{\ell} a_{r\ell} b_{\ell s}. \] Now suppose the global indices \(r, \ell, s\) lie in row-block \(i\), column-block-of-\(A\) (= row-block-of-\(B\)) \(k\), and column-block \(j\) respectively. The compatibility condition guarantees this assignment is well-defined: every middle index \(\ell\) belongs to exactly one matching pair \((A\)-column-block-\(k\), \(B\)-row-block-\(k\)). Splitting the sum over \(\ell\) according to which block it belongs to, \[ (AB)_{rs} = \sum_{k} \, \sum_{\ell \in \text{block } k} a_{r\ell} b_{\ell s} = \sum_k (A_{ik} B_{kj})_{r' s'}, \] where \(r', s'\) are the local indices of \(r, s\) within their respective blocks. The right-hand side is exactly the \((r', s')\)-entry of the block sum \(C_{ij} = \sum_k A_{ik} B_{kj}\). Hence \(AB\), reassembled from its blocks, has \((i, j)\)-block equal to \(C_{ij}\), as claimed.

Example:

Consider \[ A = \left[\begin{array}{ccc|cc} 1 & 2 & -1 & 3 & 4 \\ 5 & 3 & 0 & -2 & 1 \\ \hline -1 & 3 & 1 & 7 & 1 \end{array}\right] = \left[\begin{array}{c|c} A_{11} & A_{12} \\ \hline A_{21} & A_{22} \end{array}\right], \qquad B = \left[\begin{array}{cc} 3 & 2 \\ 3 & -5 \\ -1 & 6 \\ \hline 4 & 7\\ 1 & 1 \end{array}\right] = \left[\begin{array}{c} B_1 \\ \hline B_2 \end{array}\right], \] where \(A_{11}\) is \(2 \times 3\), \(A_{12}\) is \(2 \times 2\), \(A_{21}\) is \(1 \times 3\), \(A_{22}\) is \(1 \times 2\), \(B_1\) is \(3 \times 2\), and \(B_2\) is \(2 \times 2\). The column partition of \(A\) (3 + 2) matches the row partition of \(B\) (3 + 2), so block multiplication applies: \[ AB = \left[\begin{array}{c} A_{11}B_1 + A_{12}B_2 \\ \hline A_{21}B_1 + A_{22}B_2 \end{array}\right]. \] Computing each block: \[ \begin{align*} A_{11}B_1 &= \begin{bmatrix} 1 & 2 & -1 \\ 5 & 3 & 0 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 3 & -5 \\ -1 & 6 \end{bmatrix} = \begin{bmatrix} 10 & -14 \\ 24 & -5 \end{bmatrix}, \\ A_{12}B_2 &= \begin{bmatrix} 3 & 4 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 16 & 25 \\ -7 & -13 \end{bmatrix}, \\ A_{21}B_1 &= \begin{bmatrix} -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 3 & -5 \\ -1 & 6 \end{bmatrix} = \begin{bmatrix} 5 & -11 \end{bmatrix}, \\ A_{22}B_2 &= \begin{bmatrix} 7 & 1 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 29 & 50 \end{bmatrix}. \end{align*} \] Assembling, \[ AB = \left[\begin{array}{cc} 26 & 11 \\ 17 & -18\\ \hline 34 & 39 \end{array}\right]. \]

Block multiplication is foundational in numerical linear algebra: by partitioning a large matrix into blocks that fit in memory, algorithms can process submatrices independently and exploit parallelism. Block structure also reveals analytical properties — block diagonal and block triangular matrices simplify the computation of determinants, inverses, and eigenvalues. Block factorizations lead naturally to a broader topic: matrix factorization, where we express a single matrix as a product of structured factors. The most fundamental such factorization, treated next, is the LU decomposition.

LU Factorization

A matrix factorization expresses a matrix as a product of two or more matrices with special structure. Because matrix multiplication corresponds to the composition of linear transformations, factorization decomposes a complicated transformation into simpler steps — and is fundamental for analyzing the original matrix (or observed data) and for accelerating numerical computation. Many factorizations function as a preprocessing step: pay an upfront cost to factor \(A\), then reuse the factors many times.

Definition: LU Factorization

An LU factorization of an \(m \times n\) matrix \(A\) is an expression of the form \[ A = LU, \] where \(L\) is an \(m \times m\) lower triangular matrix with \(1\)'s on the main diagonal, and \(U\) is an \(m \times n\) matrix in row echelon form.

Existence. Such a factorization exists whenever \(A\) can be reduced to row echelon form without row interchanges; in that case \(U\) is the resulting echelon form and \(L\) records the multipliers used in the elimination (made concrete in the example below). When row interchanges are required, one obtains the more general PLU factorization \(PA = LU\), where \(P\) is a permutation matrix; numerical implementations almost always use the PLU form for stability.

Example:

Let \[ A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2 \\ -3 & 1 & 1 \end{bmatrix}. \] Reducing \(A\) to row echelon form (no row interchanges needed) gives \[ U = \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix}. \] We track the elementary matrices that perform each step of the reduction: \[ \begin{align*} &E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}, \\\\ &E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \tfrac{7}{3} & 1 \end{bmatrix}. \end{align*} \]

Since \(E_3 E_2 E_1 A = U\), we have \(A = (E_3 E_2 E_1)^{-1} U = E_1^{-1} E_2^{-1} E_3^{-1}\, U\). Each \(E_i^{-1}\) is obtained by negating the off-diagonal entry: e.g., \(E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\). Multiplying these three out gives \[ L = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -\tfrac{7}{3} & 1 \end{bmatrix}. \] Verifying: \[ LU = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -\tfrac{7}{3} & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix} = A. \checkmark \]

Once we have \(A = LU\), the matrix equation \(A\mathbf{x} = \mathbf{b}\) becomes \(LU\mathbf{x} = \mathbf{b}\), which we solve in two triangular steps. Setting \(\mathbf{y} = U\mathbf{x}\):

  1. Forward substitution: solve \(L\mathbf{y} = \mathbf{b}\) for \(\mathbf{y}\).
  2. Back substitution: solve \(U\mathbf{x} = \mathbf{y}\) for \(\mathbf{x}\).

CS / Numerical Insight: Factor Once, Solve Many Times

Computing the LU factorization of a dense \(n \times n\) matrix takes \(O(n^3)\) operations — the same cost as a single Gaussian elimination. But once \(L\) and \(U\) are known, each subsequent solve costs only \(O(n^2)\) (two triangular substitutions). When the same \(A\) appears with many right-hand sides — common in time-stepped simulations, control loops, and iterative methods — this represents a dramatic saving.

LU also avoids the numerical pitfalls of forming \(A^{-1}\) explicitly: computing \(A^{-1}\) and then multiplying \(A^{-1} \mathbf{b}\) involves more arithmetic operations and propagates more rounding error than solving directly via factorization. This is the practical reason behind the rule "solve, don't invert" stated earlier.