Linear Transformation

In the previous section, we viewed the matrix equation \(A\mathbf{x} = \mathbf{b}\) as a way to find a vector. Now, we shift our perspective to view the matrix \(A\) as an object that acts on vectors. This dynamic view where a matrix transforms one space into another is the core of "Linear Transformation," a concept that underpins computer graphics, signal processing, and the layers of deep neural networks.

Definition: Linear Transformation

A transformation (or mapping) \[ T: \mathbb{R}^n \to \mathbb{R}^m \] is linear if for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\) and all scalars \(c\), the following two properties hold: \[ \begin{align*} &T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), \\ &T(c\mathbf{u}) = c\, T(\mathbf{u}). \end{align*} \]

Example:

Consider a matrix \(A \in \mathbb{R}^{m \times 3}\) and a vector \(\mathbf{x} \in \mathbb{R}^3\). The matrix-vector product defines the transformation \(T: \mathbf{x} \mapsto A\mathbf{x}\); we verify it is linear. Let \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^3\) and let \(c\) be a scalar.

Vector addition: \[ \begin{align*} A(\mathbf{u}+\mathbf{v}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\\\ &= (u_1 + v_1)\mathbf{a}_1 + (u_2 + v_2)\mathbf{a}_2 + (u_3 + v_3)\mathbf{a}_3 \\\\ &= (u_1 \mathbf{a}_1 + u_2 \mathbf{a}_2 + u_3 \mathbf{a}_3) + (v_1 \mathbf{a}_1 + v_2 \mathbf{a}_2 + v_3 \mathbf{a}_3) \\\\ &= A\mathbf{u} + A\mathbf{v}. \end{align*} \]

Scalar multiplication: \[ \begin{align*} A(c\mathbf{u}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} c u_1\\ c u_2 \\ c u_3 \end{bmatrix} \\\\ &= c (u_1 \mathbf{a}_1) + c (u_2 \mathbf{a}_2) + c (u_3 \mathbf{a}_3) \\\\ &= c (u_1 \mathbf{a}_1 + u_2 \mathbf{a}_2 + u_3 \mathbf{a}_3) \\\\ &= c (A\mathbf{u}). \end{align*} \]

More generally, vector addition and scalar multiplication are preserved under any linear transformation. Two consequences follow immediately from the definition. First, setting \(c = 0\) in the second property gives \[ T(\mathbf{0}) = T(0 \cdot \mathbf{u}) = 0 \cdot T(\mathbf{u}) = \mathbf{0}. \] Second, combining both properties yields linearity over arbitrary linear combinations: for any scalars \(a, b\) and vectors \(\mathbf{u}, \mathbf{v}\) in the domain, \[ T(a\mathbf{u} + b\mathbf{v}) = a\, T(\mathbf{u}) + b\, T(\mathbf{v}). \]

You may have encountered surjectivity and injectivity in the context of single-variable functions \(f: \mathbb{R} \to \mathbb{R}\). The same definitions apply to linear transformations between higher-dimensional spaces — and they reveal deep structural information about the underlying matrix.

Definition: Surjective (Onto) Mapping

A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be onto (or surjective) if for every \(\mathbf{b} \in \mathbb{R}^m\), there exists \(\mathbf{x} \in \mathbb{R}^n\) such that \(T(\mathbf{x}) = \mathbf{b}\).

Equivalently, the range of \(T\) (the set of all outputs) is equal to the codomain \(\mathbb{R}^m\).

Intuitively, surjectivity means \(T\) reaches every point of the target space \(\mathbb{R}^m\). For a linear transformation \(T(\mathbf{x}) = A\mathbf{x}\), the image of \(T\) is exactly the set of all linear combinations of the columns of \(A\) — that is, the span of the columns. Surjectivity therefore reduces to a question about the columns.

Theorem: Onto Characterization

Let \(T: \mathbb{R}^n \to \mathbb{R}^m\) be the linear transformation \(T(\mathbf{x}) = A\mathbf{x}\). Then \(T\) maps \(\mathbb{R}^n\) onto \(\mathbb{R}^m\) if and only if the columns of \(A\) span \(\mathbb{R}^m\). Equivalently, \(A\mathbf{x} = \mathbf{b}\) has at least one solution for every \(\mathbf{b} \in \mathbb{R}^m\).

Proof:

Writing \(A = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{bmatrix}\), we have \[ T(\mathbf{x}) = A\mathbf{x} = x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n, \] so the range of \(T\) is exactly \(\operatorname{Span}\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}\). Hence \(T\) is onto \(\mathbb{R}^m\) if and only if this span equals \(\mathbb{R}^m\).

Definition: Injective (One-to-One) Mapping

A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be one-to-one (or injective) if for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\), \[ T(\mathbf{u}) = T(\mathbf{v}) \implies \mathbf{u} = \mathbf{v}. \] Equivalently, for each \(\mathbf{b} \in \mathbb{R}^m\), the equation \(T(\mathbf{x}) = \mathbf{b}\) has either a unique solution or no solution.

Intuitively, injectivity means \(T\) loses no information: distinct inputs produce distinct outputs. For linear transformations, this property is rigidly tied to the linear independence of the columns of \(A\) — and equivalently, to the homogeneous equation \(A\mathbf{x} = \mathbf{0}\) admitting only the trivial solution.

Theorem: One-to-One Characterization

Let \(T: \mathbb{R}^n \to \mathbb{R}^m\) be the linear transformation \(T(\mathbf{x}) = A\mathbf{x}\). The following are equivalent:

\(T\) is one-to-one.
The homogeneous equation \(A\mathbf{x} = \mathbf{0}\) has only the trivial solution \(\mathbf{x} = \mathbf{0}\).
The columns of \(A\) are linearly independent.

Proof:

(2) \(\Leftrightarrow\) (3) is the definition of linear independence applied to the columns of \(A\) (treated on the previous page).

(1) \(\Rightarrow\) (2): Since \(T\) is linear, \(T(\mathbf{0}) = \mathbf{0}\). If \(T\) is one-to-one and \(T(\mathbf{x}) = \mathbf{0}\), then \(T(\mathbf{x}) = T(\mathbf{0})\) forces \(\mathbf{x} = \mathbf{0}\). So \(\mathbf{0}\) is the only solution of \(A\mathbf{x} = \mathbf{0}\).

(2) \(\Rightarrow\) (1): Suppose \(T(\mathbf{u}) = T(\mathbf{v})\) for some \(\mathbf{u}, \mathbf{v}\). By linearity, \(T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{0}\). Assumption (2) then forces \(\mathbf{u} - \mathbf{v} = \mathbf{0}\), i.e. \(\mathbf{u} = \mathbf{v}\). Hence \(T\) is one-to-one.

Linearity in Mathematics

Linearity is not just about vectors and matrices — it is one of the most pervasive structural ideas in mathematics. You have likely encountered the same pattern in several other contexts; let's collect a few familiar examples.

A real-valued function \(f(x) = m x\) (with no constant term) is a linear transformation \(f: \mathbb{R} \to \mathbb{R}\): \[ \begin{align*} f(ax + by) &= m(ax + by) \\\\ &= a(mx) + b(my) \\\\ &= a\, f(x) + b\, f(y). \end{align*} \] Note: a general line \(g(x) = mx + c\) with nonzero intercept is not a linear transformation in this sense — such maps are called affine. The graph of every linear transformation must pass through the origin, since \(T(\mathbf{0}) = \mathbf{0}\).

If you have started calculus, you may know that differentiation distributes over sums and constants: \[ \frac{d}{dx}\bigl(a\, X(x) + b\, Y(x)\bigr) = a\, \frac{d}{dx} X(x) + b\, \frac{d}{dx} Y(x). \] For example, \[ \frac{d}{dx}(5x^3 + 4x^2) = 15x^2 + 8x = 5\, \frac{d}{dx}(x^3) + 4\, \frac{d}{dx}(x^2), \] so differentiation is a linear operator on functions. (Integration is too.)

From statistics: the expected value is linear. For any random variables \(X, Y\) and constants \(a, b\), \[ \mathbb{E}[a X + b Y] = a\, \mathbb{E}[X] + b\, \mathbb{E}[Y]. \]

Linearity appears everywhere in mathematics — and this ubiquity is precisely why "linear" algebra is so powerful: any time we recognize a linear structure, the entire toolkit developed here transfers immediately.

Matrix Multiplication

Definition: Matrix Multiplication

Suppose \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times p\) matrix. The product \(AB\) is the \(m \times p\) matrix \[ AB = \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_p \end{bmatrix}, \] where \(\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_p\) are the columns of \(B\).

Each column of \(AB\) is therefore a linear combination of the columns of \(A\) with weights from the corresponding column of \(B\). The entrywise formula \[ (AB)_{ij} = \sum_{k=1}^n a_{ik} b_{kj} \] follows by reading off the \(i\)-th coordinate.

Example:

\[ \begin{align*} AB &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} -1 & -2 \\ 0 & -3 \\ -4 & 0 \end{bmatrix} \\\\ &= \begin{bmatrix} (-1+0-12) & (-2-6+0) \\ (-4+0-24) & (-8-15+0) \\ (-7+0-36) & (-14-24+0) \end{bmatrix} \\\\ &= \begin{bmatrix} -13 & -8 \\ -28 & -23 \\ -43 & -38 \end{bmatrix}. \end{align*} \]

Note that \(BA\) is not defined here, because the number of columns of \(B\) (namely \(2\)) does not match the number of rows of \(A\) (namely \(3\)). Even when both \(AB\) and \(BA\) are defined, in general \(AB \neq BA\) — matrix multiplication is not commutative. Two further pitfalls of intuition borrowed from scalars: \(AB = AC\) does not guarantee \(B = C\) (no cancellation in general), and \(AB = 0\) does not imply that \(A = 0\) or \(B = 0\).

You may wonder why matrix multiplication is not simply entrywise — after all, addition and scalar multiplication are. The reason is structural: matrix multiplication is composition of linear transformations.

Concretely, multiplying a vector \(\mathbf{x}\) by \(B\) yields \(B\mathbf{x}\); subsequently multiplying by \(A\) yields \(A(B\mathbf{x})\). A direct computation from the column-wise definition of \(AB\) confirms that \[ (AB)\mathbf{x} = A(B\mathbf{x}). \] Writing \(\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_p \mathbf{e}_p\), both sides expand to \(x_1 (A\mathbf{b}_1) + \cdots + x_p (A\mathbf{b}_p)\). Thus the column-wise product \(AB\) is precisely the matrix of the composition \(\mathbf{x} \mapsto A(B\mathbf{x})\). This identity has a powerful computational consequence: instead of forming the matrix product \(AB\) first, we may compute the cheaper matrix-vector product \(B\mathbf{x}\) and then multiply by \(A\).

CS / Numerical Insight: Order of Operations Matters

Suppose \(A, B, C, D, E\) are dense \(n \times n\) matrices and \(\mathbf{x} \in \mathbb{R}^n\). To compute \(ABCDE\,\mathbf{x}\), evaluating left-to-right would require four matrix-matrix products at \(O(n^3)\) each, costing \(O(n^4)\) operations and \(O(n^2)\) intermediate storage.

Evaluating right-to-left uses associativity: compute \(E\mathbf{x}\), then \(D(E\mathbf{x})\), and so on. Each step is a matrix-vector product at \(O(n^2)\), for a total of \(O(n^2)\) operations and \(O(n)\) storage — orders of magnitude faster. This trick is foundational in numerical linear algebra and the implementation of deep neural networks, where chained matrix-vector products dominate runtime.

Interactive Linear Transformation Visualizer

Linear transformations are often best understood through visual intuition. In two dimensions, every linear transformation can be represented by a matrix multiplication that reshapes, rotates, reflects, or scales the plane.

The following interactive visualizer allows you to explore how different 2x2 matrices affect a 2D grid and various shapes. Try modifying the entries of the matrix and observe how the transformation changes the geometry.

For example, the above 90° clockwise rotation for the square shape is given by \[ \begin{align*} &\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 5 & 5 & 1 \\ 5 & 5 & 1 & 1 \end{bmatrix} \\\\ &= \begin{bmatrix} 5 & 5 & 1 & 1 \\ -1 & -5 & -5 & -1 \end{bmatrix}. \end{align*} \]

Linear Transformation

Loading...