System of Linear Equations
The study of linear algebra begins with the systematic solution of linear equations.
Whether analyzing complex networks, or optimizing financial portfolios, the underlying structure
often reduces to solving systems of the form \(A\mathbf{x} = \mathbf{b}\).
This section establishes the foundational language and algorithms used to navigate these systems.
Definition: Linear Equation
A linear equation in the variables \(x_1, \dots, x_n\) is one that can be expressed in the form
\[
a_1x_1 + a_2x_2 + \cdots + a_nx_n = b,
\]
where the coefficients \(a_1, a_2, \dots, a_n\) and the constant \(b\) are real or complex numbers.
A collection of linear equations involving the same variables is called a system of linear equations.
The fundamental goal is to determine the solution set of the system. In practice, \(n\) would be more than thousands or more.
A system is called consistent if it has either exactly one solution, or
infinitely many solutions. Also, if a system of linear equations has no solution,
it is said to be inconsistent.
We want to store the essential information of a system such as the values of each coefficient, the number of equations, and
the number of variables into a rectangular array called a matrix.
Example:
Consider a linear system:
\[
\begin{align}
x_1\phantom{+a_20x_2}-3x_3 + \phantom{0}x_4 &= 10 \\\\
4x_1 -5x_2 + 6x_3 -2x_4 &= -8 \\\\
\phantom{a_3x_1+} 2x_2 + 2x_3 +5x_4 &=6
\end{align}
\]
This system can be represented as the coefficient matrix:
\[
\begin{bmatrix} 1 & 0 & -3 & 1\\ 4 & -5 & 6 & -2\\ 0 & 2 & 2 & 5 \end{bmatrix}
\]
This is a \(3 \times 4\) (read "3 by 4") matrix. Note: The number of rows always comes first.
Alternatively, we can represent it as the augmented matrix:
\[
\begin{bmatrix} 1 & 0 & -3 & 1 & 10\\ 4 & -5 & 6 & -2 & -8\\ 0 & 2 & 2 & 5 & 6 \end{bmatrix}
\]
This is a \(3 \times 5\) matrix which includes the constants from the right-hand side of the equations.
Here, the number of rows is 3, and the number of columns is 5.
How can we solve the linear system? The key idea is to replace the system with an equivalent system
that is easier to solve. What does "equivalent" mean here?
Definition: Row Equivalence
Two matrices are considered row equivalent
if there is a sequence of elementary row operations that transforms one matrix into the other.
The three elementary row operations are as follows.
- Replacing one row by the sum of itself and a multiple of another row.
- Swapping two rows.
- Multiplying all entries in a row by a nonzero scalar.
First, we want to transform our augmented matrix into row echelon form, which satisfies
following conditions:
- All nonzero rows are above any rows of all zeros.
- The first nonzero entry in each row appears further to the right than the first nonzero
entry in the row directly above it.
- All entries in a column below a leading entry are zero.
Example:
We apply elementary row operations to the matrix:
\[
\begin{align*}
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
4 & -5 & 6 & -2 & -8 \\
0 & 2 & 2 & 5 & 6
\end{bmatrix} \\\\
&(\text{Row 2} \to \text{Row 2} - 4\,\text{Row 1})\\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
0 & -5 & 18 & -6 & -48 \\
0 & 2 & 2 & 5 & 6
\end{bmatrix} \\\\
&(\text{Row 3} \to \text{Row 3} + \tfrac{2}{5}\,\text{Row 2})\\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
0 & -5 & 18 & -6 & -48 \\
0 & 0 & \tfrac{46}{5} & \tfrac{13}{5} & -\tfrac{66}{5}
\end{bmatrix}
\end{align*}
\]
The matrix is now in row echelon form. Next, we transform it into reduced row echelon form (RREF),
which adds two additional conditions.
Definition: Reduced Row Echelon Form
A matrix is in reduced row echelon form (RREF) if it is in row echelon form and additionally satisfies:
- The leading entry in each nonzero row is \(1\).
- Each leading \(1\) is the only nonzero entry in its column.
Unlike row echelon form, the RREF of a matrix is unique.
Example:
Continuing from the previous example,
\[
\begin{align*}
&(\text{Row 2} \to \text{Row 2}/(-5)) \\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10\\
0 & 1 & -\tfrac{18}{5} & \tfrac{6}{5} & \tfrac{48}{5}\\
0 & 0 & \tfrac{46}{5} & \tfrac{13}{5} & -\tfrac{66}{5}
\end{bmatrix} \\\\
&(\text{Row 3} \to \text{Row 3} \cdot \tfrac{5}{46}) \\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10\\
0 & 1 & -\tfrac{18}{5} & \tfrac{6}{5} & \tfrac{48}{5}\\
0 & 0 & 1 & \tfrac{13}{46} & -\tfrac{33}{23}
\end{bmatrix} \\\\
&(\text{Row 1} \to \text{Row 1} + 3\,\text{Row 3}) \\
&\begin{bmatrix}
1 & 0 & 0 & \tfrac{85}{46} & \tfrac{131}{23}\\
0 & 1 & -\tfrac{18}{5} & \tfrac{6}{5} & \tfrac{48}{5}\\
0 & 0 & 1 & \tfrac{13}{46} & -\tfrac{33}{23}
\end{bmatrix} \\\\
&(\text{Row 2} \to \text{Row 2} + \tfrac{18}{5}\,\text{Row 3}) \\
&\begin{bmatrix}
1 & 0 & 0 & \tfrac{85}{46} & \tfrac{131}{23}\\
0 & 1 & 0 & \tfrac{51}{23} & \tfrac{102}{23}\\
0 & 0 & 1 & \tfrac{13}{46} & -\tfrac{33}{23}
\end{bmatrix}
\end{align*}
\]
From the RREF, we read off the solution of the linear system:
\[
\begin{align*}
x_1 &= -\tfrac{85}{46}x_4 + \tfrac{131}{23}\\\\
x_2 &= -\tfrac{51}{23}x_4 + \tfrac{102}{23}\\\\
x_3 &= -\tfrac{13}{46}x_4 - \tfrac{33}{23}\\\\
x_4 & \text{ is a free variable.}
\end{align*}
\]
Any solution is determined by a choice of the free variable \(x_4\).
The Matrix Equation
To get a new perspective on the system of linear equations, we describe it using the language of vectors.
Each column of the coefficient matrix is itself a vector in \(\mathbb{R}^m\), and the right-hand side \(\mathbf{b}\) is another vector in the same space.
Solving \(A\mathbf{x} = \mathbf{b}\) then becomes the question: can we combine the columns of \(A\) using suitable weights to produce \(\mathbf{b}\)?
Definition: Linear Combination
Given vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in \mathbb{R}^n\) and scalars \(c_1, c_2, \ldots, c_p\), the vector
\[
\mathbf{y} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_p \mathbf{v}_p
\]
is called a linear combination of \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p\) with weights \(c_1, c_2, \ldots, c_p\).
Definition: Span
The span of a set of vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in \mathbb{R}^n\) is the set of all linear combinations
of those vectors:
\[
\operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\} = \{\, c_1 \mathbf{v}_1 + \cdots + c_p \mathbf{v}_p \mid c_1, \ldots, c_p \in \mathbb{R} \,\}.
\]
Geometrically, the span is the smallest "flat" region through the origin containing all of \(\mathbf{v}_1, \ldots, \mathbf{v}_p\) — a line, a plane,
or a higher-dimensional analog.
Example:
Let's reconsider the augmented matrix
\[
\begin{bmatrix} 1 & 0 & -3 & 1 & 10 \\
4 & -5 & 6 & -2 & -8 \\
0 & 2 & 2 & 5 & 6
\end{bmatrix}.
\]
Define its column vectors:
\[
\begin{align*}
&\mathbf{a}_1 = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 0 \\ -5 \\ 2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} -3 \\ 6 \\ 2 \end{bmatrix}, \\
&\mathbf{a}_4 = \begin{bmatrix} 1 \\ -2 \\ 5 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 10 \\ -8 \\ 6 \end{bmatrix} \in \mathbb{R}^3.
\end{align*}
\]
Then \(\mathbf{b}\) is a linear combination of \(\mathbf{a}_1, \ldots, \mathbf{a}_4\) because there exist weights \(x_1, \ldots, x_4\) such that
\[
x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + x_3 \mathbf{a}_3 + x_4 \mathbf{a}_4 = \mathbf{b}.
\]
This is precisely the solution set we found earlier using the augmented matrix. Equivalently, \(\mathbf{b}\) lies in the span of \(\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3, \mathbf{a}_4\}\):
\[
\mathbf{b} \in \operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3, \mathbf{a}_4\}.
\]
We reinterpret the linear combination as the product of a matrix \(A \in \mathbb{R}^{3 \times 4}\) and a vector \(\mathbf{x} \in \mathbb{R}^4\):
\[
\begin{align*}
A\mathbf{x} &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \\
&= x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + x_3 \mathbf{a}_3 + x_4 \mathbf{a}_4 \\
&= \mathbf{b}.
\end{align*}
\]
Hence, the matrix equation \(A\mathbf{x} = \mathbf{b}\) has a solution if and only if \(\mathbf{b}\) is a linear combination of the columns of
\(A\) — equivalently, if and only if \(\mathbf{b} \in \operatorname{Span}\{\mathbf{a}_1, \ldots, \mathbf{a}_4\}\).
Linear Independence
We have seen that the matrix equation \(A\mathbf{x} = \mathbf{b}\) is solvable when \(\mathbf{b}\) lies in the span of the columns of \(A\).
A natural follow-up question is: given a set of vectors, do they all contribute new directions to that span, or are some of them redundant?
If a vector in the set can already be written as a combination of the others, removing it leaves the span unchanged — it is "dependent" on the rest.
The following definition formalizes this idea.
Definition: Linear Independence
A set of vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in \mathbb{R}^n\)
(or equivalently, the subset \(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\} \subseteq \mathbb{R}^n\)) is called linearly independent if the equation
\[
x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \cdots + x_p \mathbf{v}_p = \mathbf{0}
\]
has only the trivial solution \(x_1 = x_2 = \cdots = x_p = 0\). Otherwise, the set is called linearly dependent.
Consider a matrix \(A\) with columns \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p\).
The matrix equation \(A\mathbf{x} = \mathbf{0}\) is called the homogeneous equation, and it always has at least one solution:
the trivial solution \(\mathbf{x} = \mathbf{0}\). By the definition above, the columns of \(A\) are linearly independent if and only if
\(A\mathbf{x} = \mathbf{0}\) has only the trivial solution.
Conversely, the homogeneous equation \(A\mathbf{x} = \mathbf{0}\) has a nontrivial solution if and only if
the columns of \(A\) are linearly dependent.
When are the columns of \(A\) linearly dependent? Recall from the row-reduction examples earlier: after reducing \([A \mid \mathbf{0}]\) to RREF,
each column either contains a pivot (its variable is a basic variable) or it does not (its variable is a free variable).
A free variable can be assigned any value, with the basic variables then determined accordingly — so the existence of any free variable produces a non-trivial solution.
Hence the homogeneous equation \(A\mathbf{x} = \mathbf{0}\) has a nontrivial solution if and only if its RREF has at least one non-pivot column,
which is exactly the condition that the columns of \(A\) are linearly dependent.
Warning:
Linear dependence does not mean that every vector in the set is a combination of the others. Consider
\[
\mathbf{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad
\mathbf{v} = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}, \qquad
\mathbf{w} = \begin{bmatrix} 7 \\ -9 \\ 7 \end{bmatrix}.
\]
The set \(\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}\) is linearly dependent because \(\mathbf{v} = 2\mathbf{u}\),
yet \(\mathbf{w}\) is not a linear combination of \(\mathbf{u}\) and
\(\mathbf{v}\): \(\mathbf{w} \notin \operatorname{Span}\{\mathbf{u}, \mathbf{v}\}\). Linear dependence only requires that at least one vector be redundant
— not all of them.
This subtlety motivates the more refined notion of a basis, where every vector contributes a genuinely new direction.
We meet this concept in the next section on Vector Spaces.
Nonhomogeneous System vs Homogeneous System
What is the relationship between the homogeneous equation \(A\mathbf{x} = \mathbf{0}\) and the nonhomogeneous equation \(A\mathbf{x} = \mathbf{b}\)
for a nonzero \(\mathbf{b}\)? There is a clean structural connection between their solution sets, which we now uncover.
Let's revisit the solution of our nonhomogeneous system:
\[
\begin{align*}
x_1 &= -\tfrac{85}{46}x_4 + \tfrac{131}{23}\\\\
x_2 &= -\tfrac{51}{23}x_4 + \tfrac{102}{23}\\\\
x_3 &= -\tfrac{13}{46}x_4 - \tfrac{33}{23}\\\\
x_4 &\text{ is a free variable.}
\end{align*}
\]
Viewed as a vector, the general solution of \(A\mathbf{x} = \mathbf{b}\) can be written in parametric vector form:
\[
\begin{align*}
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
&= \begin{bmatrix} \tfrac{131}{23}\\ \tfrac{102}{23}\\ -\tfrac{33}{23}\\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -\tfrac{85}{46}\\ -\tfrac{51}{23} \\ -\tfrac{13}{46}\\ 1 \end{bmatrix}\\
&= \mathbf{p} + x_4 \mathbf{v},
\end{align*}
\]
where \(\mathbf{p}\) is a particular solution to \(A\mathbf{x} = \mathbf{b}\)
(obtained by setting \(x_4 = 0\)), and \(x_4 \mathbf{v}\) parametrizes the solutions of the corresponding homogeneous system \(A\mathbf{x} = \mathbf{0}\).
Example:
Let's solve the homogeneous system \(A\mathbf{x} = \mathbf{0}\) using the same matrix \(A\):
\[
\begin{align*}
&\begin{bmatrix} 1 & 0 & -3 & 1 & 0\\ 4 & -5 & 6 & -2 & 0\\ 0 & 2 & 2 & 5 & 0 \end{bmatrix} \\\\
&\xrightarrow{\operatorname{rref}}
\begin{bmatrix} 1 & 0 & 0 & \tfrac{85}{46} & 0\\ 0 & 1 & 0 & \tfrac{51}{23} & 0\\ 0 & 0 & 1 & \tfrac{13}{46} & 0 \end{bmatrix}.
\end{align*}
\]
The solution is
\[
\begin{align*}
x_1 &= -\tfrac{85}{46}x_4, \quad x_2 = -\tfrac{51}{23}x_4, \\\\
x_3 &= -\tfrac{13}{46}x_4, \quad x_4 \text{ is a free variable.}
\end{align*}
\]
In parametric vector form:
\[
\mathbf{x} = x_4 \begin{bmatrix} -\tfrac{85}{46}\\ -\tfrac{51}{23} \\ -\tfrac{13}{46}\\ 1 \end{bmatrix} = x_4 \mathbf{v}.
\]
This is the general solution of \(A\mathbf{x} = \mathbf{0}\), corresponding to taking the particular solution \(\mathbf{p} = \mathbf{0}\).
Notice that the general solution of \(A\mathbf{x} = \mathbf{b}\) is exactly the homogeneous solution shifted by the particular solution \(\mathbf{p}\).
Geometrically, the solution sets of \(A\mathbf{x} = \mathbf{0}\) and \(A\mathbf{x} = \mathbf{b}\) are parallel:
the homogeneous solution set passes through the origin, and the nonhomogeneous one is its translate by any particular solution.
This observation crystallizes into the following result.
Theorem: Solution Set Structure
Suppose the system \(A\mathbf{x} = \mathbf{b}\) is consistent, and let \(\mathbf{p}\) be a particular solution.
Then the solution set of \(A\mathbf{x} = \mathbf{b}\) is exactly the set of all vectors of the form
\[
\mathbf{w} = \mathbf{p} + \mathbf{v}_h,
\]
where \(\mathbf{v}_h\) ranges over all solutions of the homogeneous system \(A\mathbf{x} = \mathbf{0}\).
Proof:
(\(\supseteq\)) Let \(\mathbf{p}\) be a particular solution and \(\mathbf{v}_h\) any homogeneous solution.
Set \(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\). Then
\[
A\mathbf{w} = A(\mathbf{p} + \mathbf{v}_h) = A\mathbf{p} + A\mathbf{v}_h = \mathbf{b} + \mathbf{0} = \mathbf{b},
\]
so \(\mathbf{w}\) is a solution of \(A\mathbf{x} = \mathbf{b}\).
(\(\subseteq\)) Conversely, let \(\mathbf{w}\) be any solution of \(A\mathbf{x} = \mathbf{b}\), and
define \(\mathbf{v}_h = \mathbf{w} - \mathbf{p}\). Then
\[
A\mathbf{v}_h = A(\mathbf{w} - \mathbf{p}) = A\mathbf{w} - A\mathbf{p} = \mathbf{b} - \mathbf{b} = \mathbf{0},
\]
so \(\mathbf{v}_h\) is a homogeneous solution. Hence \(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\) has the claimed form.
This theorem captures a structural pattern that recurs throughout mathematics: the solution set of a linear nonhomogeneous problem
is always an affine translate of the corresponding homogeneous solution space.
The same principle governs solutions of linear differential equations,
where the general solution is a particular solution plus an arbitrary element of the kernel of the differential operator.