Matrix Algebra

Topics: Diagonal Matrix, Transpose, Inverse of a Matrix, Elementary Matrices, Partitions, LU Factorization

Diagonal Matrix

Definition: Diagonal Matrix

A diagonal matrix is an \(n \times n\) square matrix whose entries outside the main diagonal are all zero.

Consider the following matrix multiplications \(AB\) and \(BA\): \[ \begin{align*} AB &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 6 & 12 \\ 8 & 15 & 24 \\ 14 & 24 & 36 \end{bmatrix} \end{align*} \] \[ \begin{align*} BA &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 4 & 6 \\ 12 & 15 & 18 \\ 28 & 32 & 36 \end{bmatrix} \end{align*} \]

As you can see, multiplying on the right by the diagonal matrix \(B\) results in each "column" of \(A\) being scaled by the corresponding diagonal entry of \(B\). Conversely, multiplying on the left by \(B\) scales each "row" of \(A\).
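This scaling behavior is easy to check numerically. Below is a minimal NumPy sketch using the matrices from the example above (NumPy is just an assumed tool here, not part of the original discussion):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.diag([2, 3, 4])  # the diagonal matrix from the example

# Multiplying on the right scales the columns of A by 2, 3, 4.
print(np.allclose(A @ B, A * np.array([2, 3, 4])))        # True
# Multiplying on the left scales the rows of A by 2, 3, 4.
print(np.allclose(B @ A, A * np.array([[2], [3], [4]])))  # True
```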

Definition: Identity Matrix

A diagonal matrix with 1's on the main diagonal is called an identity matrix denoted by \(I_n\).

As the example above shows, \(AB \neq BA\) in general, but clearly \(AI = IA\), and the resulting matrix is just the original matrix \(A\). For instance, \[ \begin{align*} AI &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}. \end{align*} \]

Transpose

Definition: Transposed Matrix

The transpose of an \(m \times n\) matrix \(A\) is the \(n \times m\) matrix obtained by interchanging its rows and columns, and is denoted by \(A^T\).

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \] Note that the transpose operation does not change the main diagonal entries. There are many properties related to the transpose operation:

  1. \((A^T)^T = A\)
  2. \((A+B)^T = A^T + B^T\)
  3. \((AB)^T = B^TA^T\)

For the third property, we can state the following:

Theorem 1:

The transpose of the product of matrices is the product of their transposes in reverse order.

Proof: Let \(n\) be the number of columns of \(A\) (which equals the number of rows of \(B\)). Then, \[ ((AB)^T)_{ij} = (AB)_{ji} =\sum_{k=1}^n a_{jk}b_{ki}. \] Also, \[ (B^TA^T)_{ij} = \sum_{k=1}^n b_{ki}a_{jk} = \sum_{k=1}^n a_{jk}b_{ki}. \] Since the \((i, j)\) entries agree for all \(i\) and \(j\), the two matrices are equal. This property extends to the product of more than two matrices.
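The identity is also easy to verify numerically. A small NumPy sketch with arbitrarily chosen shapes (the random matrices are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# (AB)^T equals B^T A^T; note that A^T B^T would not even have compatible shapes here.
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```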

Inverse of a Matrix

Definition: Invertible Matrix

An \(n \times n\) matrix \(A\) is said to be invertible if there exists an \(n \times n\) matrix \(B\) such that \[ AB = BA = I. \] The matrix \(B\) is called the inverse of \(A\) and is denoted by \(A^{-1}\). A matrix that is not invertible is called a singular matrix.

By definition, we can say that \((A^{-1})^{-1} = A \, \) and also \((AB)^{-1} = B^{-1}A^{-1}\) because \[ \begin{align*} (AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} \\\\ &= AIA^{-1} \\\\ &= AA^{-1} \\\\ &= I. \end{align*} \] Using Theorem 1, we obtain the following result:

Theorem 2:

If \(A\) is invertible, then so is \(A^T\) and \((A^T)^{-1} = (A^{-1})^T \).

Proof:

Suppose a matrix \(A\) is an invertible matrix. By Theorem 1, \[ (A^{-1})^TA^T = (AA^{-1})^T = I^T = I \] and similarly, \[ A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I. \] Thus, \(A^T\) is invertible and its inverse is \((A^{-1})^T\).
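Both reversal rules, \((AB)^{-1} = B^{-1}A^{-1}\) and \((A^T)^{-1} = (A^{-1})^T\), can be checked numerically. A minimal NumPy sketch, assuming the random matrices drawn below are invertible (which holds with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))  # True
# (A^T)^{-1} = (A^{-1})^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))                     # True
```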

There is a simple formula for the inverse of a \(2 \times 2\) matrix. Let \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\). If the determinant of \(A\), \(\det (A) = ad - bc\), is nonzero, then \(A\) is invertible and \[ A^{-1} = \frac{1}{\det (A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \] Conversely, if \(\det (A) = 0\), then \(A\) is not invertible.

Example:

Verify that \(AA^{-1} = I\) for \(A = \begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}\): \[ \begin{align*} &\begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix} \frac{1}{(2 - 72)} \begin{bmatrix} 2 & -9 \\ -8 & 1 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix} \begin{bmatrix} -\frac{1}{35} & \frac{9}{70} \\ \frac{4}{35} & -\frac{1}{70} \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{align*} \]
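The \(2 \times 2\) formula is simple enough to implement directly. A minimal NumPy sketch; the helper name inv2x2 is just for illustration:

```python
import numpy as np

def inv2x2(M):
    """Inverse of a 2x2 matrix via the (ad - bc) formula; raises if singular."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1, 9],
              [8, 2]])
print(inv2x2(A))                              # matches the inverse in the example above
print(np.allclose(A @ inv2x2(A), np.eye(2)))  # True
```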

Now we can solve the matrix equation \(Ax = b\) for the vector \(x\) by using the inverse: \[ \begin{align*} Ax=b &\Rightarrow A^{-1}Ax= A^{-1}b \\\\ &\Rightarrow Ix = A^{-1}b \\\\ &\Rightarrow x = A^{-1}b \end{align*} \] Note: In practice, solving by row reduction is often faster than finding \(A^{-1}\) and can be more accurate.
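In code, the note above amounts to preferring a linear solver over an explicit inverse. A minimal NumPy sketch, reusing the matrix from the previous example:

```python
import numpy as np

A = np.array([[1., 9.],
              [8., 2.]])
b = np.array([10., 12.])

x_inv   = np.linalg.inv(A) @ b      # forms A^{-1} explicitly, then multiplies
x_solve = np.linalg.solve(A, b)     # LU-based LAPACK solve; usually faster and more accurate
print(np.allclose(x_inv, x_solve))  # True
print(np.allclose(A @ x_solve, b))  # True
```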

Finally, consider a linear transformation \(T:\mathbb{R}^n \to \mathbb{R}^n\). \(T\) is invertible if \[ \begin{align*} &\exists S: \mathbb{R}^n \to \mathbb{R}^n \text{ such that } \\\\ &\forall x \in \mathbb{R}^n, S(T(x))=x \text{ and } T(S(x))=x. \end{align*} \]

Elementary Matrices

Definition: Elementary Matrix

An elementary matrix is a matrix obtained by performing a single elementary row operation on an identity matrix.

Example:

Consider the matrix \[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \] and the elementary matrices \[ \begin{align*} &E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\\\ &\text{ and } E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}. \end{align*} \]

Then \[ \begin{align*} &E_1A = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix} \\\\ &E_2A = \begin{bmatrix} 2 & 4 & 6 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &E_3A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ -5 & -7 & -9 \end{bmatrix}. \end{align*} \]

\(E_1\) interchanges Row 2 and Row 3, \(E_2\) scales Row 1 by 2, and \(E_3\) adds \(-3\) times Row 2 to Row 3. Moreover, we can combine a sequence of row operations into a single matrix. For example: \[ \begin{align*} E_3E_2E_1A &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\ &= \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\-17 & -19 & -21 \end{bmatrix}\\\\ &= B \end{align*} \]

Since every elementary row operation is reversible, every elementary matrix has an inverse. Thus, using the inverses of the elementary matrices, we can recover the original matrix \(A\) from \(B\): \[ \begin{align*} &(E_1^{-1}E_2^{-1}E_3^{-1})E_3E_2E_1A \\\\ &= (E_1^{-1}E_2^{-1}E_3^{-1})B. \end{align*} \]

Thus, \[ \begin{align*} A &= \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\ -17 & -19 & -21 \end{bmatrix}\\\\ &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \end{align*} \]
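The whole round trip, applying \(E_1\), \(E_2\), \(E_3\) and then undoing them, can be reproduced numerically. A minimal NumPy sketch of the example above:

```python
import numpy as np

A  = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
E1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])   # interchange Row 2 and Row 3
E2 = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 1]])   # scale Row 1 by 2
E3 = np.array([[1, 0, 0], [0, 1, 0], [0, -3, 1]])  # add -3 times Row 2 to Row 3

B = E3 @ E2 @ E1 @ A
# Undo the operations with the inverse elementary matrices, in reverse order.
A_recovered = np.linalg.inv(E1) @ np.linalg.inv(E2) @ np.linalg.inv(E3) @ B
print(np.allclose(A_recovered, A))  # True
```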

Even though \(A\) in the above example is not invertible, the process of applying elementary row operations gives insight into how to compute the inverse \(A^{-1}\) of a matrix when it exists.

If an \(n \times n\) matrix \(A\) is invertible, the reduced row echelon form of the matrix must be \(I_n\). Each step of the row reduction can be represented by an elementary matrix. \[ (E_k \cdots E_1)A = I_n \tag{1} \] From equation (1), we can multiply both sides by \((E_k \cdots E_1)^{-1}\) on the left: \[ \begin{align*} A &= (E_k \cdots E_1)^{-1}I_n \\\\ &= (E_k \cdots E_1)^{-1} \end{align*} \] Taking the inverse of both sides: \[ \begin{align*} A^{-1} &= ((E_k \cdots E_1)^{-1})^{-1} \\\\ &= E_k \cdots E_1 \\\\ &= (E_k \cdots E_1 )I_n \tag{2} \end{align*} \]

Equations (1) and (2) show that the same sequence of elementary matrices transforms \(A\) into \(I_n\) and \(I_n\) into \(A^{-1}\). Therefore, row reduction on the augmented matrix \(\begin{bmatrix} A & I \end{bmatrix}\) gives us \(\begin{bmatrix} I & A^{-1} \end{bmatrix}\) if \(A\) is invertible.
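This augmented-matrix procedure is straightforward to implement with naive Gauss-Jordan elimination. A minimal sketch, assuming nonzero pivots (a robust implementation would also swap rows); the helper name inverse_via_row_reduction is just for illustration:

```python
import numpy as np

def inverse_via_row_reduction(A):
    """Row-reduce [A | I] to [I | A^{-1}] (no pivoting; assumes nonzero pivots)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # the augmented matrix [A | I]
    for i in range(n):
        M[i] = M[i] / M[i, i]                    # scale the pivot row so the pivot is 1
        for j in range(n):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i]     # clear the other entries in column i
    return M[:, n:]                              # the right half is now A^{-1}

A = np.array([[1., 9.],
              [8., 2.]])
print(np.allclose(inverse_via_row_reduction(A), np.linalg.inv(A)))  # True
```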

Partitions

Definition: Partitioned Matrix

A partitioned matrix (or block matrix) is a matrix that has been divided into submatrices by horizontal and vertical lines. The resulting submatrices are called blocks.

For example, we can partition a \(3 \times 5\) matrix \(A\) as follows: \[ A = \left[\begin{array}{ccc|cc} 1 & 2 & -1 & 3 & 4 \\ 5 & 3 & 0 & -2 & 1 \\ \hline -1 & 3 & 1 & 7 & 1 \\ \end{array}\right] = \left[\begin{array}{c|c} A_{11}& A_{12} \\ \hline A_{21} & A_{22} \\ \end{array}\right] \] where \(A_{11}\) is \(2 \times 3\), \(A_{12}\) is \(2 \times 2\), \(A_{21}\) is \(1 \times 3\), and \(A_{22}\) is \(1 \times 2\).

Theorem 4: Block Multiplication

If matrices \(A\) and \(B\) are partitioned so that the column partitions of \(A\) match the row partitions of \(B\), then the product \(AB\) can be computed by treating the blocks as scalars: \[ AB = \left[\begin{array}{ccc} A_{11}& \cdots & A_{1p} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mp} \\ \end{array}\right] \left[\begin{array}{ccc} B_{11}& \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{p1} & \cdots & B_{pn} \\ \end{array}\right] = \left[\begin{array}{ccc} C_{11}& \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mn} \\ \end{array}\right] \] where \(C_{ij} = \sum_{k=1}^p A_{ik}B_{kj}\), provided that each product \(A_{ik}B_{kj}\) is defined.

The condition "column partitions of \(A\) match row partitions of \(B\)" means that for each \(k\), the number of columns in \(A_{ik}\) must equal the number of rows in \(B_{kj}\). This ensures that each block product \(A_{ik}B_{kj}\) is well-defined, and the resulting blocks can be summed because they have compatible dimensions.

Example:

Consider the following matrices \(A\) and \(B\): \[ A = \left[\begin{array}{ccc|cc} 1 & 2 & -1 & 3 & 4 \\ 5 & 3 & 0 & -2 & 1 \\ \hline -1 & 3 & 1 & 7 & 1 \\ \end{array}\right] = \left[\begin{array}{c|c} A_{11}& A_{12} \\ \hline A_{21} & A_{22} \\ \end{array}\right] \] \[ B = \left[\begin{array}{cc} 3 & 2 \\ 3 & -5 \\ -1 & 6 \\ \hline 4 & 7\\ 1 & 1\\ \end{array}\right] = \left[\begin{array}{c} B_1 \\ \hline B_2 \\ \end{array}\right] \] where \(A_{11}\) is \(2 \times 3\), \(A_{12}\) is \(2 \times 2\), \(A_{21}\) is \(1 \times 3\), \(A_{22}\) is \(1 \times 2\), \(B_1\) is \(3 \times 2\), and \(B_2\) is \(2 \times 2\).

The column partition of \(A\) divides it into 3 columns and 2 columns, while the row partition of \(B\) divides it into 3 rows and 2 rows. Since these match, we can perform block multiplication: \[ AB = \left[\begin{array}{c} A_{11}B_1 + A_{12}B_2 \\ \hline A_{21}B_1 + A_{22}B_2 \\ \end{array}\right] \] Computing each block: \[ \begin{align*} A_{11}B_1 &= \begin{bmatrix} 1 & 2 & -1 \\ 5 & 3 & 0 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 3 & -5 \\ -1 & 6 \end{bmatrix} = \begin{bmatrix} 10 & -14 \\ 24 & -5 \end{bmatrix} \\\\ A_{12}B_2 &= \begin{bmatrix} 3 & 4 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 16 & 25 \\ -7 & -13 \end{bmatrix} \\\\ A_{21}B_1 &= \begin{bmatrix} -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 3 & -5 \\ -1 & 6 \end{bmatrix} = \begin{bmatrix} 5 & -11 \end{bmatrix} \\\\ A_{22}B_2 &= \begin{bmatrix} 7 & 1 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 29 & 50 \end{bmatrix} \end{align*} \] Therefore, \[ AB = \left[\begin{array}{cc} 10 + 16 & -14 + 25 \\ 24 - 7 & -5 - 13 \\ \hline 5 + 29 & -11 + 50 \\ \end{array}\right] = \left[\begin{array}{cc} 26 & 11 \\ 17 & -18\\ \hline 34 & 39 \\ \end{array}\right] \]
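The block computation agrees with the ordinary product, which is easy to confirm. A minimal NumPy sketch using the matrices from the example:

```python
import numpy as np

A = np.array([[ 1, 2, -1,  3, 4],
              [ 5, 3,  0, -2, 1],
              [-1, 3,  1,  7, 1]])
B = np.array([[ 3,  2],
              [ 3, -5],
              [-1,  6],
              [ 4,  7],
              [ 1,  1]])

# Blocks that follow the partition lines drawn above.
A11, A12 = A[:2, :3], A[:2, 3:]
A21, A22 = A[2:, :3], A[2:, 3:]
B1, B2 = B[:3], B[3:]

top    = A11 @ B1 + A12 @ B2
bottom = A21 @ B1 + A22 @ B2
print(np.vstack([top, bottom]))                      # [[26 11] [17 -18] [34 39]]
print(np.allclose(np.vstack([top, bottom]), A @ B))  # True
```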

Block multiplication is particularly useful in numerical linear algebra when matrices are very large. By partitioning a matrix into smaller blocks that fit into memory, algorithms can process submatrices individually, reducing memory requirements and enabling parallel computation. Additionally, block structures often reveal mathematical properties—for example, block diagonal matrices and block triangular matrices simplify the computation of determinants and inverses.

LU Factorization

A matrix factorization expresses a matrix as a product of two or more matrices. Since matrix multiplication corresponds to composing linear transformations, a factorization is fundamental for analyzing the properties of the original matrix (or the observed data). In applied mathematics, matrix factorization is often used as a crucial preprocessing step that enables more efficient computation.

Definition: LU Factorization

An LU factorization of an \(m \times n\) matrix \(A\) is an expression of the form \[ A = LU \] where \(L\) is an \(m \times m\) lower triangular matrix with 1's on its main diagonal, and \(U\) is an \(m \times n\) upper triangular matrix (an echelon form of \(A\)).

Example:

For example, let \[ A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}. \] The echelon form of \(A\) is \[ U = \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix} \] We can track the row operations used to reduce \(A\) into \(U\) as elementary matrices: \[ \begin{align*} &E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}, \qquad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 3 & 0 & 1 \end{bmatrix}, \\\\ \qquad &\text{and } \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & \frac{7}{3} & 1 \end{bmatrix} \end{align*} \]

Since \[ E_3E_2E_1A = U \Longrightarrow A = (E_1^{-1}E_2^{-1}E_3^{-1})U, \] it follows that: \[ L = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix} \] Thus, \[ \begin{align*} A = LU &= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix} \\\\ &= \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}. \end{align*} \]
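The same elimination can be carried out in a short loop. A minimal sketch of LU factorization without row exchanges, assuming nonzero pivots; note that library routines such as scipy.linalg.lu use partial pivoting, so their \(L\) and \(U\) may differ from the hand computation above:

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorization without row exchanges (assumes nonzero pivots)."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # the multiplier that eliminates entry (i, k)
            U[i] = U[i] - L[i, k] * U[k]
    return L, U

A = np.array([[ 1, 2, -1],
              [ 2, 1, -2],
              [-3, 1,  1]])
L, U = lu_no_pivot(A)
print(L)                      # matches the L computed above
print(U)                      # matches the echelon form U above
print(np.allclose(L @ U, A))  # True
```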

With the \(LU\) factorization, solving \(Ax = b\) becomes more efficient: \[ Ax = b \Longrightarrow LUx = b \] Let \(Ux = y\) and solve the pair of equations sequentially:

  1. Solve \(Ly = b\) for \(y\)
  2. Solve \(Ux = y\) for \(x\)

If \(A\) is an \(n \times n\) dense matrix with large \(n\), using the LU factorization is faster than computing \(A^{-1}\) to solve \(Ax = b\). This is especially useful when solving the equation repeatedly with different \(b\) vectors. Furthermore, LU factorization reduces the risk of numerical errors that can arise from computing both \(A^{-1}\) and \(A^{-1}b\).
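A minimal SciPy sketch of this workflow: factor \(A\) once, then reuse the factorization for several right-hand sides (scipy.linalg.lu_factor applies partial pivoting internally):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 1., 2., -1.],
              [ 2., 1., -2.],
              [-3., 1.,  1.]])

lu, piv = lu_factor(A)            # factor A once
for b in (np.array([1., 0., 0.]),
          np.array([0., 1., 0.]),
          np.array([2., -1., 3.])):
    x = lu_solve((lu, piv), b)    # two triangular solves per right-hand side
    print(np.allclose(A @ x, b))  # True for each b
```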