Linear Transformation
In the previous section, we viewed the matrix equation \(A\mathbf{x} = \mathbf{b}\) as a problem to solve for an unknown vector \(\mathbf{x}\). Now we shift perspective and view the matrix \(A\) as an object that acts on vectors. This dynamic view, in which a matrix maps one space into another, is the core idea of a linear transformation, a concept that underpins computer graphics, signal processing, and the layers of deep neural networks.
A transformation (or mapping) \[ T: \mathbb{R}^n \to \mathbb{R}^m \] is linear if for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\) and all scalars \(c\), the following two properties hold: \[ \begin{align*} &T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), \\ &T(c\mathbf{u}) = c\, T(\mathbf{u}). \end{align*} \]
Consider a matrix \(A \in \mathbb{R}^{m \times 3}\) and a vector \(\mathbf{x} \in \mathbb{R}^3\). The matrix-vector product defines the transformation \(T: \mathbf{x} \mapsto A\mathbf{x}\); we verify it is linear. Let \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^3\) and let \(c\) be a scalar.
Vector addition: \[ \begin{align*} A(\mathbf{u}+\mathbf{v}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\\\ &= (u_1 + v_1)\mathbf{a}_1 + (u_2 + v_2)\mathbf{a}_2 + (u_3 + v_3)\mathbf{a}_3 \\\\ &= (u_1 \mathbf{a}_1 + u_2 \mathbf{a}_2 + u_3 \mathbf{a}_3) + (v_1 \mathbf{a}_1 + v_2 \mathbf{a}_2 + v_3 \mathbf{a}_3) \\\\ &= A\mathbf{u} + A\mathbf{v}. \end{align*} \]
Scalar multiplication: \[ \begin{align*} A(c\mathbf{u}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} c u_1 \\ c u_2 \\ c u_3 \end{bmatrix} \\\\ &= (c u_1)\mathbf{a}_1 + (c u_2)\mathbf{a}_2 + (c u_3)\mathbf{a}_3 \\\\ &= c (u_1 \mathbf{a}_1) + c (u_2 \mathbf{a}_2) + c (u_3 \mathbf{a}_3) \\\\ &= c (u_1 \mathbf{a}_1 + u_2 \mathbf{a}_2 + u_3 \mathbf{a}_3) \\\\ &= c (A\mathbf{u}). \end{align*} \]
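The two identities just derived can also be spot-checked numerically. The following is a minimal sketch in plain Python, not a proof: the matrix `A`, the vectors `u` and `v`, the scalar `c`, and the helper names `matvec`, `add`, and `scale` are all hypothetical choices made for illustration. Integer entries are used so that the comparisons are exact.

```python
def matvec(A, x):
    """Return the product A x, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def add(u, v):
    """Entrywise sum of two vectors of equal length."""
    return [u_i + v_i for u_i, v_i in zip(u, v)]

def scale(c, u):
    """Scalar multiple c * u."""
    return [c * u_i for u_i in u]

# Hypothetical example data: a 2x3 integer matrix and two vectors in R^3.
A = [[1, 2, 0],
     [-1, 3, 4]]
u = [2, -1, 5]
v = [0, 4, -3]
c = 7

# Property 1: A(u + v) == Au + Av
additive = matvec(A, add(u, v)) == add(matvec(A, u), matvec(A, v))
# Property 2: A(cu) == c(Au)
homogeneous = matvec(A, scale(c, u)) == scale(c, matvec(A, u))

print(additive, homogeneous)  # True True
```

A check on one choice of vectors cannot establish linearity, but a single failing pair would be enough to rule it out.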
More generally, vector addition and scalar multiplication are preserved under any linear transformation. Two consequences follow immediately from the definition. First, setting \(c = 0\) in the second property gives \[ T(\mathbf{0}) = T(0 \cdot \mathbf{u}) = 0 \cdot T(\mathbf{u}) = \mathbf{0}. \] Second, combining both properties yields linearity over arbitrary linear combinations: for any scalars \(a, b\) and vectors \(\mathbf{u}, \mathbf{v}\) in the domain, \[ T(a\mathbf{u} + b\mathbf{v}) = a\, T(\mathbf{u}) + b\, T(\mathbf{v}). \]
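As a quick illustration of the first consequence (the map \(S\) below is a hypothetical example, not one used elsewhere in this section): for a fixed nonzero vector \(\mathbf{b} \in \mathbb{R}^n\), the translation \(S(\mathbf{x}) = \mathbf{x} + \mathbf{b}\) is not linear, because \[ S(\mathbf{0}) = \mathbf{0} + \mathbf{b} = \mathbf{b} \neq \mathbf{0}. \] Checking whether the zero vector maps to the zero vector is therefore a cheap necessary test for linearity, although passing it is not sufficient on its own.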
You may have encountered surjectivity and injectivity in the context of single-variable functions \(f: \mathbb{R} \to \mathbb{R}\). The same definitions apply to linear transformations between higher-dimensional spaces — and they reveal deep structural information about the underlying matrix.
A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be onto (or surjective) if for every \(\mathbf{b} \in \mathbb{R}^m\), there exists \(\mathbf{x} \in \mathbb{R}^n\) such that \(T(\mathbf{x}) = \mathbf{b}\).
Equivalently, the range of \(T\) (the set of all outputs) is equal to the codomain \(\mathbb{R}^m\).
Intuitively, surjectivity means \(T\) reaches every point of the target space \(\mathbb{R}^m\). For a linear transformation \(T(\mathbf{x}) = A\mathbf{x}\), the range of \(T\) is exactly the set of all linear combinations of the columns of \(A\), that is, the span of the columns. Surjectivity therefore reduces to a question about the columns.
Let \(T: \mathbb{R}^n \to \mathbb{R}^m\) be the linear transformation \(T(\mathbf{x}) = A\mathbf{x}\). Then \(T\) maps \(\mathbb{R}^n\) onto \(\mathbb{R}^m\) if and only if the columns of \(A\) span \(\mathbb{R}^m\). Equivalently, \(A\mathbf{x} = \mathbf{b}\) has at least one solution for every \(\mathbf{b} \in \mathbb{R}^m\).
Writing \(A = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{bmatrix}\), we have \[ T(\mathbf{x}) = A\mathbf{x} = x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n, \] so the range of \(T\) is exactly \(\operatorname{Span}\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}\). Hence \(T\) is onto \(\mathbb{R}^m\) if and only if this span equals \(\mathbb{R}^m\).
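The span condition can be tested computationally. The sketch below relies on a fact from row reduction that this section does not state and that is assumed here: the columns of an \(m \times n\) matrix span \(\mathbb{R}^m\) exactly when an echelon form of the matrix has a pivot in every row. The function names `pivot_count` and `is_onto` and the matrices `A` and `B` are hypothetical, and exact `Fraction` arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def pivot_count(A):
    """Count the pivots of A (a list of rows) by forward elimination."""
    M = [[Fraction(entry) for entry in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = 0
    for col in range(cols):
        # Look for a nonzero entry in this column at or below the next pivot row.
        pivot_row = next((r for r in range(pivots, rows) if M[r][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot in this column
        M[pivots], M[pivot_row] = M[pivot_row], M[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, rows):
            factor = M[r][col] / M[pivots][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
        if pivots == rows:
            break  # every row already has a pivot
    return pivots

def is_onto(A):
    """Assumed criterion: x -> Ax is onto R^m iff each of the m rows has a pivot."""
    return pivot_count(A) == len(A)

A = [[1, 0, 1],
     [0, 1, 1]]   # 2x3: the columns span R^2
B = [[1, 2],
     [2, 4],
     [0, 1]]      # 3x2: two columns cannot span R^3

print(is_onto(A), is_onto(B))  # True False
```

The matrix `B` shows a general pattern: with fewer columns than rows there can be at most \(n < m\) pivots, so the map cannot be onto.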
A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be one-to-one (or injective) if for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\), \[ T(\mathbf{u}) = T(\mathbf{v}) \implies \mathbf{u} = \mathbf{v}. \] Equivalently, for each \(\mathbf{b} \in \mathbb{R}^m\), the equation \(T(\mathbf{x}) = \mathbf{b}\) has either a unique solution or no solution.
Intuitively, injectivity means \(T\) loses no information: distinct inputs produce distinct outputs. For a linear transformation \(T(\mathbf{x}) = A\mathbf{x}\), this property is equivalent to the linear independence of the columns of \(A\), and also to the homogeneous equation \(A\mathbf{x} = \mathbf{0}\) admitting only the trivial solution.
Let \(T: \mathbb{R}^n \to \mathbb{R}^m\) be the linear transformation \(T(\mathbf{x}) = A\mathbf{x}\). The following are equivalent:
1. \(T\) is one-to-one.
2. The homogeneous equation \(A\mathbf{x} = \mathbf{0}\) has only the trivial solution \(\mathbf{x} = \mathbf{0}\).
3. The columns of \(A\) are linearly independent.
(2) \(\Leftrightarrow\) (3) is the definition of linear independence applied to the columns of \(A\) (treated on the previous page).
(1) \(\Rightarrow\) (2): Since \(T\) is linear, \(T(\mathbf{0}) = \mathbf{0}\). If \(T\) is one-to-one and \(T(\mathbf{x}) = \mathbf{0}\), then \(T(\mathbf{x}) = T(\mathbf{0})\) forces \(\mathbf{x} = \mathbf{0}\). So \(\mathbf{0}\) is the only solution of \(A\mathbf{x} = \mathbf{0}\).
(2) \(\Rightarrow\) (1): Suppose \(T(\mathbf{u}) = T(\mathbf{v})\) for some \(\mathbf{u}, \mathbf{v}\). By linearity, \(T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{0}\). Assumption (2) then forces \(\mathbf{u} - \mathbf{v} = \mathbf{0}\), i.e. \(\mathbf{u} = \mathbf{v}\). Hence \(T\) is one-to-one.
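The link between one-to-one maps and the homogeneous equation can be seen concretely when the equation does have a nontrivial solution. In the sketch below, the matrix `A` and the vectors `x` and `u` are hypothetical example data: `x` is a nonzero solution of \(A\mathbf{x} = \mathbf{0}\), so `u` and `v = u + x` are distinct inputs with the same output, and the map is not one-to-one.

```python
def matvec(A, x):
    """Return the product A x, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Hypothetical 2x3 example: three columns in R^2 cannot be linearly independent.
A = [[1, 2, 3],
     [4, 5, 6]]

x = [1, -2, 1]                          # nontrivial solution of A x = 0
u = [1, 1, 1]
v = [u_i + x_i for u_i, x_i in zip(u, x)]  # v = u + x, so v differs from u

print(matvec(A, x))                 # [0, 0]
print(matvec(A, u), matvec(A, v))   # [6, 15] [6, 15]
print(u != v)                       # True
```

This mirrors the argument in the proof: \(T(\mathbf{v}) - T(\mathbf{u}) = T(\mathbf{v} - \mathbf{u}) = T(\mathbf{x}) = \mathbf{0}\).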