Universal Constructions: Products, Equalizers, Pullbacks

The Universal Property of a Product

Many constructions in mathematics take a collection of objects and assemble a new object out of them. The cartesian product of two sets, the direct sum of two vector spaces, the kernel of a group homomorphism, the intersection of two subsets, the greatest common divisor of two integers — these look like unrelated operations from unrelated fields. The work of this stage is to show that they are instances of a single categorical pattern. We begin with the most familiar case, the product, and extract from it the form that the others will share.

The guiding example is the cartesian product of sets. For sets \(X\) and \(Y\), an element of \(X \times Y\) is an element of \(X\) together with an element of \(Y\). Phrased in terms of maps out of a one-element set — recalling that an element of a set is the same as a map into it from the terminal set \(1\) — this says that a map \(1 \to X \times Y\) amounts to a map \(1 \to X\) together with a map \(1 \to Y\). The same correspondence holds with \(1\) replaced by an arbitrary set \(A\): a map \(A \to X \times Y\) is exactly a pair of maps \(A \to X\) and \(A \to Y\), the bijection being given by composition with the two projection maps

\[ X \xleftarrow{\;p_1\;} X \times Y \xrightarrow{\;p_2\;} Y, \qquad (x, y) \mapsto x, \quad (x, y) \mapsto y . \]

The content of the construction is therefore not the formula for \(X \times Y\) but this bijection: maps into the product are pairs of maps into the factors. This is a property expressible purely in the language of objects and maps, with no reference to elements, and it is the property we take as the definition.

Definition: Product

Let \(\mathscr{A}\) be a category, \(I\) a set, and \((X_i)_{i \in I}\) a family of objects of \(\mathscr{A}\). A product of \((X_i)_{i \in I}\) consists of an object \(P\) together with a family of maps \[ \big(P \xrightarrow{\;p_i\;} X_i\big)_{i \in I}, \] called the projections, with the following universal property: for every object \(A\) and every family of maps \(\big(A \xrightarrow{\;f_i\;} X_i\big)_{i \in I}\), there exists a unique map \(\bar{f} : A \to P\) such that \(p_i \circ \bar{f} = f_i\) for all \(i \in I\).

When the product exists, the object \(P\) is written \(\prod_{i \in I} X_i\), and the unique induced map \(\bar{f}\) is written \((f_i)_{i \in I}\); the maps \(f_i\) are its components. Strictly the product is the object together with its projections, but it is customary to refer to \(P\) alone as the product and to leave the projections understood.

The case that recovers the opening example is \(I = \{1, 2\}\). Here the data are two objects \(X\) and \(Y\), the product carries two projections \(p_1 : P \to X\) and \(p_2 : P \to Y\), and the universal property says that for any object \(A\) equipped with maps \(f_1 : A \to X\) and \(f_2 : A \to Y\) there is a unique map \(\bar{f} : A \to P\) reproducing them through the projections: \[ p_1 \circ \bar{f} = f_1, \qquad p_2 \circ \bar{f} = f_2 . \] This binary product is written \(P = X \times Y\), and the induced map is written \(\bar{f} = (f_1, f_2)\). It is the case to keep in mind; the general definition differs from it only in allowing an arbitrary indexing family in place of the pair.

A definition by mapping-in

The definition does not say what the elements of \(P\) are; it says what the maps into \(P\) are. An object is determined not by an internal description but by how every other object maps to it, and the product is the object whose incoming maps are exactly families of maps to the factors. This is the same style of definition met earlier when an object was characterized by the presheaf it represents: the product represents the presheaf sending an object \(A\) to the set \(\prod_{i \in I} \mathscr{A}(A, X_i)\) of families of maps out of \(A\), contravariantly in \(A\). Reading the definition this way lets the uniqueness of products follow from the general fact that objects representing the same presheaf are isomorphic, rather than being checked case by case, as the next remarks record.

Two remarks attach to the definition and apply equally to every construction that follows. First, products need not exist: in the category with two objects and no maps other than identities, the two objects have no product, since a product \(P\) would need projections to both objects, whereas each object of this category maps only to itself. Second, when a product does exist it is unique up to isomorphism. This can be proved directly, in the manner of the uniqueness of initial objects, and it is also an instance of the general principle that an object representing a given functor is determined up to canonical isomorphism. Either route justifies the definite article.

Products across categories

The definition acquires its weight from the range of constructions it captures. We collect the principal examples.

Example: products in Set

Any two sets \(X\) and \(Y\) have a product, namely the cartesian product \(X \times Y\) with its usual projections. To see that it satisfies the universal property, take any set \(A\) with functions \(f_1 : A \to X\) and \(f_2 : A \to Y\), and define \(\bar{f} : A \to X \times Y\) by \(\bar{f}(a) = (f_1(a), f_2(a))\). Then \(p_i \circ \bar{f} = f_i\) for \(i = 1, 2\), so \(\bar{f}\) makes the triangles commute. It is the only such map: if \(\hat{f} : A \to X \times Y\) also satisfies \(p_i \circ \hat{f} = f_i\) and we write \(\hat{f}(a) = (x, y)\), then \(x = p_1(\hat{f}(a)) = f_1(a)\) and likewise \(y = f_2(a)\), so \(\hat{f}(a) = (f_1(a), f_2(a)) = \bar{f}(a)\) for every \(a\), giving \(\hat{f} = \bar{f}\). The categorical product in \(\mathbf{Set}\) is thus the cartesian product whose elementwise universal property motivated the definition.
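
The verification above can be replayed by brute force on small finite sets. The following Python sketch is only an illustration: the sets X, Y, A and the helper names all_functions, compose, and mediating_maps are choices made here, not notation from the text. Functions are encoded as dictionaries, and the check confirms that each pair \((f_1, f_2)\) admits exactly one map into the cartesian product compatible with the projections.

```python
from itertools import product

# Small finite sets chosen only for illustration (hypothetical data).
X = {"x0", "x1"}
Y = {0, 1, 2}
A = {"a", "b"}

P = set(product(X, Y))                 # the cartesian product X x Y
p1 = {pair: pair[0] for pair in P}     # first projection
p2 = {pair: pair[1] for pair in P}     # second projection

def all_functions(dom, cod):
    """Every function dom -> cod, each encoded as a dict."""
    dom, cod = sorted(dom, key=repr), sorted(cod, key=repr)
    return [dict(zip(dom, values)) for values in product(cod, repeat=len(dom))]

def compose(g, f):
    """The composite g after f, for dict-encoded functions."""
    return {a: g[f[a]] for a in f}

def mediating_maps(f1, f2):
    """All maps h: A -> P with p1 after h equal to f1 and p2 after h equal to f2."""
    return [h for h in all_functions(A, P)
            if compose(p1, h) == f1 and compose(p2, h) == f2]

# For every pair (f1, f2) there is exactly one mediating map: a |-> (f1(a), f2(a)).
unique_everywhere = all(
    mediating_maps(f1, f2) == [{a: (f1[a], f2[a]) for a in A}]
    for f1 in all_functions(A, X)
    for f2 in all_functions(A, Y)
)
print(unique_everywhere)  # True
```

An exhaustive search of this kind proves nothing beyond the chosen sets, but it shows concretely what "exists and is unique" quantifies over.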

Example: products in Top

Any two topological spaces \(X\) and \(Y\) have a product: the set \(X \times Y\) carrying the product topology, with the standard projections. The product topology is designed precisely so that a function \(A \to X \times Y\), \(t \mapsto (x(t), y(t))\), is continuous if and only if both coordinate functions \(t \mapsto x(t)\) and \(t \mapsto y(t)\) are continuous. That biconditional is the universal property: a continuous map into the product is the same data as a pair of continuous maps into the factors. Equivalently, the product topology is the coarsest topology on \(X \times Y\) making both projections continuous, which is the same requirement read from the side of the projections rather than the side of the induced map.

Example: products in Vect

For vector spaces \(X\) and \(Y\) over a field \(k\), the product is the direct sum \(X \oplus Y\), whose elements are pairs \((x, y)\) with \(x \in X\) and \(y \in Y\), with the two linear projection maps. A linear map \(A \to X \oplus Y\) is the same data as a pair of linear maps \(A \to X\) and \(A \to Y\), which is the universal property; the verification runs as in \(\mathbf{Set}\), with the added observation that the induced map \((f_1, f_2)\) is linear because \(f_1\) and \(f_2\) are. The coincidence of product and direct sum is special to additive settings; in \(\mathbf{Set}\) the product and the disjoint union are different constructions.

A further family of examples comes not from spaces but from orders. A partially ordered set \((A, \leq)\) becomes a category with one map \(x \to y\) precisely when \(x \leq y\). In such a category there is at most one map between any two objects, so the commutativity demanded by a universal property is automatic, and the universal property reduces to an inequality. What a product becomes in this setting is worth a name.

Definition: Lower Bound and Meet

Let \((A, \leq)\) be a poset and \(x, y \in A\). A lower bound for \(x\) and \(y\) is an element \(a \in A\) with \(a \leq x\) and \(a \leq y\). A greatest lower bound, or meet, of \(x\) and \(y\) is a lower bound \(z\) with the property that every lower bound \(a\) for \(x\) and \(y\) satisfies \(a \leq z\). When it exists the meet is unique, and is written \(x \wedge y\).

When the poset \((A, \leq)\) is regarded as a category, the meet \(x \wedge y\) is exactly the product of \(x\) and \(y\): the conditions \(z \leq x\) and \(z \leq y\) are the projections, and the requirement that any lower bound \(a\) factor through \(z\) is the universal property, here amounting to \(a \leq z\). Three standard orders make the meaning concrete.

Example: meets as products in three orders

(a) In \((\mathbb{R}, \leq)\) the meet of \(x\) and \(y\) is the minimum \(\min\{x, y\}\): it satisfies \(\min\{x, y\} \leq x\) and \(\min\{x, y\} \leq y\), and any \(a\) with \(a \leq x\) and \(a \leq y\) satisfies \(a \leq \min\{x, y\}\).

(b) Fix a set \(S\). In the poset \((\mathscr{P}(S), \subseteq)\) of subsets ordered by inclusion, the meet of \(X\) and \(Y\) is the intersection \(X \cap Y\): it satisfies \(X \cap Y \subseteq X\) and \(X \cap Y \subseteq Y\), and any \(A\) with \(A \subseteq X\) and \(A \subseteq Y\) satisfies \(A \subseteq X \cap Y\). This example is the origin of the notation \(\wedge\).

(c) In \((\mathbb{N}, \mid)\), the positive integers ordered by divisibility, the meet of \(x\) and \(y\) is the greatest common divisor \(\gcd(x, y)\): it divides both \(x\) and \(y\), and any \(a\) dividing both \(x\) and \(y\) divides \(\gcd(x, y)\). Thus in the three orders \[ x \wedge y = \min\{x, y\}, \qquad X \wedge Y = X \cap Y, \qquad x \wedge y = \gcd(x, y), \] and each is a product in the corresponding category.
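
The three meets can be checked mechanically on finite samples. The sketch below is an illustration under stated assumptions: the helper is_meet and the sample sets are hypothetical, and the real numbers and positive integers are replaced by small finite subsets that contain every lower bound relevant to the check.

```python
from math import gcd
from itertools import combinations

def leq(a, b):
    """The usual order on numbers; on frozensets, <= is inclusion."""
    return a <= b

def divides(a, b):
    """The divisibility order on positive integers."""
    return b % a == 0

def is_meet(z, x, y, order, elements):
    """z is a lower bound of x and y, and every lower bound a satisfies a <= z."""
    lower = order(z, x) and order(z, y)
    greatest = all(order(a, z) for a in elements if order(a, x) and order(a, y))
    return lower and greatest

# (a) a finite sample of real numbers with the usual order
reals = [-2.5, 0.0, 1.0, 3.0]
ok_min = all(is_meet(min(x, y), x, y, leq, reals) for x in reals for y in reals)

# (b) all subsets of S = {1, 2, 3}, ordered by inclusion
S = [1, 2, 3]
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
ok_cap = all(is_meet(U & V, U, V, leq, subsets) for U in subsets for V in subsets)

# (c) the integers 1..30, ordered by divisibility
nums = range(1, 31)
ok_gcd = all(is_meet(gcd(x, y), x, y, divides, nums) for x in nums for y in nums)

print(ok_min, ok_cap, ok_gcd)  # True True True
```

The same function is_meet serves all three orders; only the order relation passed to it changes, mirroring the way one universal property is read in three categories.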

The examples discussed so far are binary, but the definition was stated for an arbitrary indexing family, and nothing forces the family to have two members. For a family \((X_i)_{i \in I}\) the product \(\prod_{i \in I} X_i\) carries one projection \(p_i\) for each index, and the universal property induces from any family of maps \(\big(A \xrightarrow{f_i} X_i\big)_{i \in I}\) a unique map with those components. In an ordered set this is the greatest lower bound of the whole family, written \(\bigwedge_{i \in I} x_i\); in \((\mathbb{R}, \leq)\) it is \(\inf\{x_i : i \in I\}\), which exists exactly when the family is nonempty and bounded below. Two limiting cases of the indexing family are instructive.

Example: the empty product is a terminal object

Take \(I = \varnothing\). A family \((X_i)_{i \in \varnothing}\) is empty, and so is every family of maps \(\big(A \xrightarrow{f_i} X_i\big)_{i \in \varnothing}\); the condition \(p_i \circ \bar{f} = f_i\) for all \(i \in \varnothing\) holds vacuously. The universal property therefore reduces to the bare requirement that for each object \(A\) there exist a unique map \(\bar{f} : A \to P\). An object \(P\) with this property — exactly one map into it from every object — is a terminal object. A product of the empty family is thus precisely a terminal object. This is one reason for writing \(1\) for a terminal object: in categories such as \(\mathbf{Set}\), \(\mathbf{Top}\), \(\mathbf{Ring}\), and \(\mathbf{Grp}\) the terminal object has a single element, and a product of no factors is, in the arithmetic of objects, the empty product — the number \(1\).

Example: powers

Take a single object \(X\) and a set \(I\), and form the constant family \((X)_{i \in I}\) in which every member is \(X\). Its product, if it exists, is written \(X^I\) and called a power of \(X\). A map \(A \to X^I\) is then a family of maps \(A \to X\) indexed by \(I\). For \(X\) a set this recovers the set of functions from \(I\) to \(X\): an element of \(X^I\) is a function \(I \to X\), and the power notation agrees with the exponential already met for sets.
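
Both limiting cases can be seen in \(\mathbf{Set}\) with a few lines of Python. This is a minimal sketch; the sets X and I are hypothetical examples, and itertools.product stands in for the cartesian product of finite sets.

```python
from itertools import product

# Empty product: the cartesian product of no sets has exactly one element,
# the empty tuple, so it is a one-element set (a terminal object of Set).
empty_product = list(product())
print(empty_product)  # [()]

# Power: for X = {0, 1} and I = {"i", "j", "k"}, an element of X^I
# is a function I -> X, encoded here as a dict.
X = [0, 1]
I = ["i", "j", "k"]
power = [dict(zip(I, values)) for values in product(X, repeat=len(I))]
print(len(power))     # 8, which is len(X) ** len(I)
```

The count \(|X|^{|I|}\) is what makes the exponent notation \(X^I\) natural for finite sets.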

The product, in one definition, has absorbed the cartesian product of sets, the product topology, the direct sum of vector spaces, the minimum, the intersection, the greatest common divisor, the terminal object, and the power. Each is the same universal property read in a different category. We now seek a second construction, of a different shape, to set beside it.

Equalizers

A product assembles an object from several objects with no maps presupposed among them. The second construction starts instead from a pair of parallel maps and selects the part of the source on which they agree. Where a product is built from objects, an equalizer is built from an equation. The construction needs one preliminary configuration.

Definition: Fork

In a category, a fork on a parallel pair \(s, t : X \to Y\) consists of an object \(A\) and a map \(f : A \to X\) satisfying \[ s \circ f = t \circ f . \]

A fork is a map into \(X\) that the two parallel maps cannot tell apart: post-composing with \(s\) or with \(t\) gives the same result. Among all forks on a given pair there is, in good cases, a universal one, and it is the equalizer.

Definition: Equalizer

Let \(\mathscr{A}\) be a category and \(s, t : X \to Y\) a parallel pair of maps. An equalizer of \(s\) and \(t\) is an object \(E\) together with a map \(i : E \to X\) such that \(s \circ i = t \circ i\) — that is, \((E, i)\) is itself a fork — and with the following universal property: for every fork \((A, f)\) on \(s\) and \(t\), there exists a unique map \(\bar{f} : A \to E\) such that \[ i \circ \bar{f} = f . \]

The equalizer is the most efficient fork: every other fork factors through it in exactly one way. As with products, an equalizer need not exist; when it does, it is determined up to a unique isomorphism commuting with the inclusion, so one speaks of the equalizer. The general definition abstracts a construction first met for sets, where the universal fork is the literal solution set of an equation.

Example: equalizers in Set

For functions \(s, t : X \to Y\) between sets, take \[ E = \{x \in X \mid s(x) = t(x)\} \] with the inclusion \(i : E \hookrightarrow X\). Then \(s \circ i = t \circ i\), since the two maps agree on every element of \(E\) by the definition of \(E\), so \((E, i)\) is a fork. It is the universal one: if \((A, f)\) is any fork, then \(s(f(a)) = t(f(a))\) for every \(a \in A\), so \(f(a) \in E\), and \(f\) factors through the inclusion by the corestriction \(\bar{f} : A \to E\), \(a \mapsto f(a)\). This \(\bar{f}\) is the unique map with \(i \circ \bar{f} = f\), because \(i\) is injective and so determines \(\bar{f}\) on each element. The equalizer in \(\mathbf{Set}\) is therefore the solution set of the single equation \(s(x) = t(x)\), and the universal property is the statement that any other map whose image solves the equation passes through that solution set.
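
A concrete instance makes the factorization visible. In the sketch below the set X, the maps s and t, and the forks f and g are all hypothetical choices made for illustration; the equalizer is computed as the solution set of \(x^2 = x + 2\) inside \(\{-3, \dots, 3\}\).

```python
X = range(-3, 4)                  # the source set {-3, ..., 3}

def s(x):
    return x * x

def t(x):
    return x + 2

# The equalizer in Set: the solution set of s(x) = t(x).
E = [x for x in X if s(x) == t(x)]
print(E)                          # [-1, 2]

# A fork (A, f): a map f: A -> X with s after f equal to t after f.
A = ["a", "b", "c"]
f = {"a": 2, "b": -1, "c": 2}
assert all(s(f[a]) == t(f[a]) for a in A)

# The induced map f_bar: A -> E is f with its codomain cut down to E.
f_bar = {a: f[a] for a in A}
assert all(f_bar[a] in E for a in A)

# A map into X that is not a fork does not factor through E.
g = {"a": 0, "b": 2, "c": 2}
print(all(g[a] in E for a in A))  # False
```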

A single equalizer captures one equation. Combined with products, equalizers capture systems of simultaneous equations. Given a set \(\Lambda\) and a family of parallel pairs \(\big(s_\lambda, t_\lambda : X \to Y_\lambda\big)_{\lambda \in \Lambda}\) of maps in \(\mathbf{Set}\), the simultaneous solution set \[ \{x \in X \mid s_\lambda(x) = t_\lambda(x) \text{ for all } \lambda \in \Lambda\} \] is the equalizer of the two induced maps into the product, \[ (s_\lambda)_{\lambda \in \Lambda}, \quad (t_\lambda)_{\lambda \in \Lambda} : X \longrightarrow \prod_{\lambda \in \Lambda} Y_\lambda . \] For \(x \in X\) the equation \((s_\lambda)_\lambda(x) = (t_\lambda)_\lambda(x)\) of maps into the product means \((s_\lambda(x))_\lambda = (t_\lambda(x))_\lambda\) as families, and that holds exactly when \(s_\lambda(x) = t_\lambda(x)\) for every \(\lambda\). One equalizer of maps into a product thus encodes a whole system, the product gathering the separate equations into one.
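
The passage from a system of equations to a single equalizer can be sketched as follows. The two parallel pairs and their labels are hypothetical; the maps S and T play the role of the induced maps \((s_\lambda)_\lambda\) and \((t_\lambda)_\lambda\) into the product, with tuples standing for elements of \(\prod_\lambda Y_\lambda\).

```python
X = range(-6, 7)

# A family of parallel pairs (s_lambda, t_lambda), indexed by illustrative labels.
pairs = {
    "even":          (lambda x: x % 2, lambda x: 0),
    "multiple_of_3": (lambda x: x % 3, lambda x: 0),
}

# The simultaneous solution set, one equation per index.
simultaneous = [x for x in X if all(s(x) == t(x) for s, t in pairs.values())]

# The two induced maps into the product: each returns a tuple of components.
def S(x):
    return tuple(s(x) for s, _ in pairs.values())

def T(x):
    return tuple(t(x) for _, t in pairs.values())

# One equalizer of the induced maps.
equalizer = [x for x in X if S(x) == T(x)]

print(simultaneous)  # [-6, 0, 6]
print(equalizer)     # [-6, 0, 6]
```

Equality of tuples is componentwise, which is exactly the statement that an equation between maps into a product holds when it holds in every coordinate.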

Example: equalizers in Top

For continuous maps \(s, t : X \to Y\) between topological spaces, form the equalizer \(E = \{x \in X \mid s(x) = t(x)\}\) in sets, with inclusion \(i : E \to X\). Give \(E\) the subspace topology inherited from \(X\); then \(i\) is continuous. For any fork \((A, f)\) in \(\mathbf{Top}\), the induced map \(\bar{f} : A \to E\) of the underlying sets is continuous: the subspace topology is the coarsest topology on \(E\) making \(i\) continuous, so every open set of \(E\) has the form \(i^{-1}(U)\) with \(U\) open in \(X\), and \(\bar{f}^{-1}(i^{-1}(U)) = (i \circ \bar{f})^{-1}(U) = f^{-1}(U)\) is open because \(f\) is continuous. So \((E, i)\) with the subspace topology is the equalizer in \(\mathbf{Top}\). The situation parallels the product topology: in both, the universal property in spaces follows from the universal property in sets together with the minimality built into the chosen topology.

Example: kernels are equalizers

Let \(\theta : G \to H\) be a homomorphism of groups. Alongside \(\theta\) consider the trivial homomorphism \(\varepsilon : G \to H\) sending every element to the identity \(e\) of \(H\). The equalizer of \(\theta\) and \(\varepsilon\) is the set of elements on which they agree, namely \(\{g \in G \mid \theta(g) = e\}\), which is the kernel of \(\theta\), together with its inclusion \(\ker\theta \hookrightarrow G\). That this is an equalizer in \(\mathbf{Grp}\), and not merely in \(\mathbf{Set}\), is the statement that the induced map \(\bar{f}\) into the kernel is a homomorphism whenever the fork \(f\) is; this holds because \(\bar{f}\) agrees with \(f\) as a function and \(f\) is a homomorphism. Kernels are therefore a special case of equalizers: the kernel of \(\theta\) is the equalizer of \(\theta\) against the trivial map.
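
A small cyclic example shows the kernel arising as an equalizer. The sketch assumes the hypothetical homomorphism \(\theta : \mathbb{Z}/12 \to \mathbb{Z}/4\) given by reduction modulo 4, which is well defined because 4 divides 12; groups are written additively and encoded by their residues.

```python
n, m = 12, 4
G = range(n)                      # Z/12, addition modulo 12

def theta(g):
    return g % m                  # reduction modulo 4

def epsilon(g):
    return 0                      # the trivial homomorphism to Z/4

# theta respects addition (checked exhaustively on this finite group).
assert all(theta((a + b) % n) == (theta(a) + theta(b)) % m for a in G for b in G)

# The equalizer of theta and epsilon in Set is the kernel of theta.
kernel = [g for g in G if theta(g) == epsilon(g)]
print(kernel)                     # [0, 4, 8]

# The equalizer is a subgroup: closed under addition and under negation.
closed = all((a + b) % n in kernel for a in kernel for b in kernel)
inverses = all((-a) % n in kernel for a in kernel)
print(closed, inverses)           # True True
```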

Example: equalizers in Vect

For linear maps \(s, t : V \to W\) between vector spaces, the difference \(t - s : V \to W\) is again linear, and the two maps agree exactly where their difference vanishes. The equalizer of \(s\) and \(t\) is therefore the subspace \(\ker(t - s) = \{v \in V \mid s(v) = t(v)\}\), with its inclusion \(\ker(t - s) \hookrightarrow V\). As in the group case, the kernel of a single linear map is recovered by taking \(s\) to be the zero map, so that \(\ker(t - s) = \ker t\). The pattern is uniform across additive categories: the equalizer of a parallel pair is the kernel of their difference.
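
The identity between the equalizer and \(\ker(t - s)\) can be checked exhaustively when the field is finite. The sketch below makes several assumptions for the sake of a finite search: the field is taken to be the two-element field, the spaces are \(k^3\) and \(k^2\), and the matrices S and T are hypothetical.

```python
from itertools import product

p = 2                                   # assumption: the field with two elements
V = list(product(range(p), repeat=3))   # all vectors of k^3

# Two linear maps k^3 -> k^2, given by illustrative 2x3 matrices.
S = [[1, 0, 1],
     [0, 1, 1]]
T = [[1, 1, 0],
     [0, 1, 1]]

def apply(M, v):
    """Matrix-vector product with entries reduced modulo p."""
    return tuple(sum(m * x for m, x in zip(row, v)) % p for row in M)

# The matrix of the difference t - s.
D = [[(t - s) % p for s, t in zip(row_s, row_t)] for row_s, row_t in zip(S, T)]

equalizer = [v for v in V if apply(S, v) == apply(T, v)]
kernel = [v for v in V if apply(D, v) == (0, 0)]

print(equalizer == kernel)  # True
print(equalizer)            # [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
```

The four solutions form a two-dimensional subspace, as expected for the kernel of a rank-one map out of a three-dimensional space.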

From equations to objects

The equalizer turns the act of imposing an equation into the construction of an object. In sets it is a solution set; in spaces a solution set carrying the inherited topology; in groups and vector spaces a kernel. The translation runs in one direction throughout: a condition of the form "the two maps agree" becomes a universal map into the source, and the object so produced remembers nothing of the equation except the part of the source that satisfies it. Read this way, the language of universal maps applies wherever a construction is specified by the equations it must satisfy rather than by an explicit formula — the equalizer is the categorical form of "the subobject cut out by an equation."

Pullbacks

The third construction takes a pair of maps with a common target and forms the object of compatible pairs over it. A product pairs elements freely; a pullback pairs only those elements of the two sources that are sent to the same point of the target. The starting data are two maps into a shared object, \[ X \xrightarrow{\;s\;} Z \xleftarrow{\;t\;} Y , \] a configuration called a cospan, and the construction selects from \(X\) and \(Y\) the part that agrees over \(Z\).

Definition: Pullback

Let \(\mathscr{A}\) be a category and \(s : X \to Z\), \(t : Y \to Z\) maps with common target. A pullback of this cospan is an object \(P\) together with maps \(p_1 : P \to X\) and \(p_2 : P \to Y\) such that the square \[ \begin{array}{ccc} P & \xrightarrow{\;p_2\;} & Y \\[4pt] {\scriptstyle p_1}\big\downarrow & & \big\downarrow{\scriptstyle t} \\[4pt] X & \xrightarrow{\;s\;} & Z \end{array} \] commutes, with the following universal property: for every object \(A\) and maps \(f_1 : A \to X\), \(f_2 : A \to Y\) such that \(s \circ f_1 = t \circ f_2\), there exists a unique map \(\bar{f} : A \to P\) with \[ p_1 \circ \bar{f} = f_1, \qquad p_2 \circ \bar{f} = f_2 . \]

The commuting square exhibited by \((P, p_1, p_2)\) is called a pullback square, and the object \(P\) a fibred product. A commuting outer square \(s \circ f_1 = t \circ f_2\) is precisely a fork-like compatibility condition on the pair \((f_1, f_2)\); the universal property asserts that any such compatible pair factors uniquely through the pullback. Since the commutativity of the inner square is given, requiring \(\bar{f}\) to "make the diagram commute" means only the two equations \(p_1 \circ \bar{f} = f_1\) and \(p_2 \circ \bar{f} = f_2\). As with the earlier constructions, a pullback need not exist, and when it does it is determined up to a unique compatible isomorphism.

The name fibred product is explained by a limiting case. When \(Z\) is a terminal object \(1\), the maps \(s\) and \(t\) are the unique maps to \(1\) and impose no condition, so the compatibility \(s \circ f_1 = t \circ f_2\) holds automatically. The pullback then reduces to an object \(P\) with maps to \(X\) and \(Y\) through which every pair of maps factors uniquely — that is, to the product \(X \times Y\). A pullback is thus a product taken not over a point but over a base object \(Z\), pairing elements fibre by fibre.

Example: pullbacks in Set

For functions \(s : X \to Z\) and \(t : Y \to Z\), the pullback is the set of compatible pairs \[ P = \{(x, y) \in X \times Y \mid s(x) = t(y)\} \] with projections \(p_1(x, y) = x\) and \(p_2(x, y) = y\). The square commutes by construction. For any set \(A\) with maps \(f_1 : A \to X\), \(f_2 : A \to Y\) satisfying \(s(f_1(a)) = t(f_2(a))\) for all \(a\), the pair \((f_1(a), f_2(a))\) lies in \(P\), so \(\bar{f}(a) = (f_1(a), f_2(a))\) defines a map \(A \to P\) with \(p_i \circ \bar{f} = f_i\), and it is the only such map since its two components are forced. This \(P\) is the fibred product of \(X\) and \(Y\) over \(Z\): over each point \(z \in Z\) it places the product of the fibres \(s^{-1}(z)\) and \(t^{-1}(z)\).
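
The fibrewise description can be confirmed on a small example. In this sketch the sets of words and the choice of word length for both \(s\) and \(t\) are hypothetical; the pullback is computed once from the defining condition and once fibre by fibre.

```python
from itertools import product

X = ["fig", "kiwi", "plum"]
Y = ["pear", "date", "yam"]
s = len                           # s: X -> Z, word length
t = len                           # t: Y -> Z, word length

# The pullback: compatible pairs (x, y) with s(x) = t(y).
P = [(x, y) for x, y in product(X, Y) if s(x) == t(y)]
print(len(P))                     # 5

# The same set assembled fibre by fibre over each point z of Z.
Z = sorted({s(x) for x in X} | {t(y) for y in Y})
fibrewise = [(x, y) for z in Z
             for x in X if s(x) == z
             for y in Y if t(y) == z]
print(sorted(P) == sorted(fibrewise))  # True

# A compatible pair of maps out of A induces a map into P.
A = [0, 1]
f1 = {0: "kiwi", 1: "fig"}
f2 = {0: "date", 1: "yam"}
assert all(s(f1[a]) == t(f2[a]) for a in A)
f_bar = {a: (f1[a], f2[a]) for a in A}
assert all(f_bar[a] in P for a in A)
```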

Two constructions already familiar from set theory are pullbacks in disguise.

Example: inverse images are pullbacks

Given a function \(f : X \to Y\) and a subset \(Y' \subseteq Y\), the inverse image \[ f^{-1}Y' = \{x \in X \mid f(x) \in Y'\} \subseteq X \] fits into a square with the inclusions \(j : Y' \hookrightarrow Y\) and \(i : f^{-1}Y' \hookrightarrow X\) and the corestriction \(f' : f^{-1}Y' \to Y'\), \(x \mapsto f(x)\): \[ \begin{array}{ccc} f^{-1}Y' & \xrightarrow{\;f'\;} & Y' \\[4pt] {\scriptstyle i}\big\downarrow & & \big\downarrow{\scriptstyle j} \\[4pt] X & \xrightarrow{\;f\;} & Y \end{array} \] The starting data are the lower-right corner — the objects \(X, Y, Y'\) with the maps \(f\) and \(j\) — and the remaining corner \(f^{-1}Y'\) is what the construction produces. To see that this is a pullback square, take any commuting square with apex \(A\), \[ \begin{array}{ccc} A & \xrightarrow{\;h\;} & Y' \\[4pt] {\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle j} \\[4pt] X & \xrightarrow{\;f\;} & Y \end{array} \] so that \(f \circ g = j \circ h\). For each \(a \in A\) we have \(f(g(a)) = j(h(a)) = h(a) \in Y'\), hence \(g(a) \in f^{-1}Y'\); define \(k : A \to f^{-1}Y'\) by \(k(a) = g(a)\). Then \(i \circ k = g\), and \(f'(k(a)) = f(g(a)) = h(a)\) gives \(f' \circ k = h\), so \(k\) makes the diagram commute. It is unique: any \(k\) with \(i \circ k = g\) satisfies \(k(a) = g(a)\) for all \(a\), since \(i\) is the inclusion. Thus \(f^{-1}Y'\) with \(i\) and \(f'\) is the pullback of \(f\) along \(j\). This is the source of the name: one says \(Y'\) is "pulled back" along \(f\) to the subset \(f^{-1}Y'\) of \(X\).
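
The inverse image can be compared with the pullback computed by the general formula for \(\mathbf{Set}\). The sketch uses a hypothetical function \(f(x) = x^2\) on \(\{-4, \dots, 4\}\) and a hypothetical subset \(Y' = \{1, 4, 7\}\); the names inverse_image and pullback are illustrative.

```python
X = range(-4, 5)

def f(x):
    return x * x

Y_sub = [1, 4, 7]                 # the subset Y' of the codomain

# The inverse image of Y' under f.
inverse_image = [x for x in X if f(x) in Y_sub]
print(inverse_image)              # [-2, -1, 1, 2]

# The pullback of f along the inclusion j: Y' -> Y, as compatible pairs.
pullback = [(x, y) for x in X for y in Y_sub if f(x) == y]
print(pullback)                   # [(-2, 4), (-1, 1), (1, 1), (2, 4)]

# The first projection identifies the pullback with the inverse image,
# and the second projection is the corestriction f'.
print([x for x, _ in pullback] == inverse_image)  # True
print(all(y == f(x) for x, y in pullback))        # True
```

The element 7 of \(Y'\) contributes nothing, since no \(x\) maps to it; the pullback records only the part of \(Y'\) that \(f\) actually reaches.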

Example: intersections are pullbacks

Let \(X\) and \(Y\) be subsets of a set \(Z\). Their intersection sits in a square of inclusions \[ \begin{array}{ccc} X \cap Y & \hookrightarrow & Y \\[4pt] \big\downarrow & & \big\downarrow \\[4pt] X & \hookrightarrow & Z \end{array} \] and this is a pullback square. It is the special case of the inverse-image example in which \(f : X \hookrightarrow Z\) is the inclusion and the subset is \(Y \subseteq Z\): the inverse image \(f^{-1}Y = \{x \in X \mid x \in Y\}\) is exactly \(X \cap Y\). The intersection is thus the fibred product of \(X\) and \(Y\) over their common ambient set, recovering the earlier description of \(X \cap Y\) as a meet from the wider vantage of pullbacks over \(Z\) rather than products in the subset order.

Pairing over a base

The shift from a point to an arbitrary base object is what turns a cartesian product into a fibred product, and it is the categorical content shared by inverse images and intersections. Wherever two structures are to be combined into the pairs that agree over shared data (records matched on a common key, two sections of a bundle agreeing over the base, two local pieces agreeing on their overlap), the construction is a pullback, and its defining property is that a compatible pair of maps determines a single map into the combined object.

Monics

Pullbacks supply the setting for a notion that will recur throughout the theory. For functions between sets, injectivity — that distinct inputs have distinct outputs — is a basic and useful property. In an arbitrary category there are no elements to compare, so injectivity in its literal form does not make sense; but there is a property phrased entirely in terms of maps that plays the same role, and it is characterized by a pullback square.

Definition: Monic

Let \(\mathscr{A}\) be a category. A map \(f : X \to Y\) is monic (or a monomorphism) if for every object \(A\) and every pair of maps \(x, x' : A \to X\), \[ f \circ x = f \circ x' \;\implies\; x = x' . \]

The condition is that \(f\) can be cancelled on the left: if two maps into \(X\) become equal after composing with \(f\), they were already equal. Viewing a map \(A \to X\) as a generalized element of \(X\) — a probe of \(X\) by the object \(A\) — the definition reads as the statement that \(f\) carries distinct generalized elements of \(X\) to distinct generalized elements of \(Y\). Being monic is thus the generalized-element form of injectivity, with the arbitrary probe \(A\) in place of a one-element set. The two examples below confirm that in the familiar categories this recovers injectivity exactly.

Example: monics in Set

In \(\mathbf{Set}\) a map is monic if and only if it is injective. If \(f : X \to Y\) is injective and \(f \circ x = f \circ x'\) for maps \(x, x' : A \to X\), then \(f(x(a)) = f(x'(a))\) for every \(a \in A\), so \(x(a) = x'(a)\) by injectivity, giving \(x = x'\); thus \(f\) is monic. Conversely, suppose \(f\) is monic and take any \(u, v \in X\) with \(f(u) = f(v)\). Let \(A = 1\) be a one-element set and let \(x, x' : 1 \to X\) be the maps picking out \(u\) and \(v\). Then \(f \circ x = f \circ x'\), since both pick out the common value \(f(u) = f(v)\), so \(x = x'\) by the monic property, which says \(u = v\). Hence \(f\) is injective.
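
The equivalence can be tested exhaustively on small sets. The following sketch is illustrative only: the sets X and Y, the helper names, and the restriction to probes \(A\) with one or two elements are assumptions made here (a genuine monic is tested against every object \(A\), which no finite search can do).

```python
from itertools import product

def all_functions(dom, cod):
    """Every function dom -> cod, each encoded as a dict."""
    return [dict(zip(dom, values)) for values in product(cod, repeat=len(dom))]

def compose(g, f):
    """The composite g after f, for dict-encoded functions."""
    return {a: g[f[a]] for a in f}

def is_injective(f):
    return len(set(f.values())) == len(f)

def is_left_cancellable(f, X, probes):
    """f after x == f after x2 implies x == x2, for x, x2: A -> X and A among the probes."""
    return all(x == x2
               for A in probes
               for x in all_functions(A, X)
               for x2 in all_functions(A, X)
               if compose(f, x) == compose(f, x2))

X = [0, 1, 2]
Y = ["u", "v", "w"]
probes = [["*"], ["*", "**"]]     # a one-element and a two-element probe

agree = all(is_injective(f) == is_left_cancellable(f, X, probes)
            for f in all_functions(X, Y))
print(agree)  # True
```

As the argument in the example predicts, the one-element probe alone already detects every failure of injectivity; the two-element probe is included only to show that larger probes add no further constraint here.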

Example: monics in Grp and Vect

The same equivalence holds in categories of algebraic structures. A group homomorphism, or a linear map of vector spaces, is monic exactly when it is injective. That an injective homomorphism is monic follows as in \(\mathbf{Set}\), since the underlying map is injective and the cancelled maps \(x, x'\) are homomorphisms. For the converse one again tests against a single element, but a one-element group or vector space carries no information; instead one uses the free structure on one generator (the group \(\mathbb{Z}\), or the one-dimensional space \(k\)), out of which the homomorphisms correspond to single elements of the target. A map \(A \to X\) out of the free structure on one generator selects an element of \(X\), and running the argument of the previous example with this \(A\) in place of \(1\) shows that a monic homomorphism is injective.

The reason monics belong beside pullbacks is the following characterization, which expresses left-cancellability as a single universal square and so makes monics available wherever pullbacks are understood.

Lemma: Monics as Pullbacks

A map \(f : X \to Y\) is monic if and only if the square \[ \begin{array}{ccc} X & \xrightarrow{\;1\;} & X \\[4pt] {\scriptstyle 1}\big\downarrow & & \big\downarrow{\scriptstyle f} \\[4pt] X & \xrightarrow{\;f\;} & Y \end{array} \] with both maps from the top-left copy of \(X\) the identity, is a pullback.

Proof

The square commutes, since \(f \circ 1 = f = f \circ 1\). The question is whether it has the universal property of a pullback of the cospan \(X \xrightarrow{\;f\;} Y \xleftarrow{\;f\;} X\). A compatible pair on that cospan is an object \(A\) with maps \(a, b : A \to X\) satisfying \(f \circ a = f \circ b\). Both projections of the square are the identity on \(X\), so the universal property demands, for each such pair, a unique \(\bar{f} : A \to X\) with \(1 \circ \bar{f} = a\) and \(1 \circ \bar{f} = b\) — that is, a unique \(\bar{f}\) with \(\bar{f} = a\) and \(\bar{f} = b\).

Suppose first that \(f\) is monic. Given a compatible pair \(a, b : A \to X\) with \(f \circ a = f \circ b\), the monic property gives \(a = b\). Setting \(\bar{f} = a\) then satisfies \(\bar{f} = a\) and \(\bar{f} = b\), and any map with both properties equals \(a\), so \(\bar{f}\) is unique. The square is therefore a pullback.

Conversely, suppose the square is a pullback, and let \(a, b : A \to X\) satisfy \(f \circ a = f \circ b\). Then \((a, b)\) is a compatible pair on the cospan, so by the universal property there is a unique \(\bar{f} : A \to X\) with \(\bar{f} = a\) and \(\bar{f} = b\). The existence of such a \(\bar{f}\) forces \(a = \bar{f} = b\). Hence \(f \circ a = f \circ b\) implies \(a = b\), which is the monic property.
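
In \(\mathbf{Set}\) the lemma has a concrete reading. The pullback of \(f\) along itself can be computed by the general formula as \(\{(a, b) \in X \times X \mid f(a) = f(b)\}\), and the identity square is a pullback exactly when the map \(x \mapsto (x, x)\) from \(X\) onto the diagonal exhausts this set. The sketch below checks this on two hypothetical functions; the names kernel_pair and diagonal are illustrative.

```python
def kernel_pair(f, X):
    """The pullback of f along itself in Set: pairs with equal image."""
    return {(a, b) for a in X for b in X if f[a] == f[b]}

def diagonal(X):
    """The image of the comparison map x |-> (x, x)."""
    return {(x, x) for x in X}

X = [0, 1, 2]
injective = {0: "u", 1: "v", 2: "w"}
collapsing = {0: "u", 1: "u", 2: "w"}

print(kernel_pair(injective, X) == diagonal(X))   # True
print(kernel_pair(collapsing, X) == diagonal(X))  # False
print(kernel_pair(collapsing, X) - diagonal(X))   # the pairs (0, 1) and (1, 0)
```

The extra pairs for the collapsing map are exactly the compatible pairs \((a, b)\) with \(a \neq b\) that the proof rules out when \(f\) is monic.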

The value of the characterization is that it transfers facts about pullbacks to facts about monics. A construction or functor known to respect pullbacks will respect this particular square, and so carry monics to monics; read in this direction, results established for limits yield results for monics with no further work. This is the first appearance of monics, and the link to limits is the reason they are introduced here rather than in isolation; their further properties, and the dual notion obtained by reversing every arrow, are taken up once limits have been developed in full.

Three Constructions, One Shape

Products, equalizers, and pullbacks have been treated as three constructions. Set side by side, they are visibly variations on a single theme. Each starts with a configuration of objects and maps; each produces a new object equipped with maps to the objects of the configuration; and in each the new object is characterized by a universal property, that every other candidate factors through it in exactly one way. The constructions differ only in the shape of the starting configuration. Isolating that shape is the step that unifies them.

Consider the starting data of each construction, stripped of the universal object it generates. For a binary product it is a pair of objects with no maps between them, \[ X \qquad Y . \] For an equalizer it is a parallel pair of maps, \[ X \rightrightarrows Y , \] carrying the labels \(s\) and \(t\). For a pullback it is a cospan, \[ X \xrightarrow{\;s\;} Z \xleftarrow{\;t\;} Y . \] Each of these is a small diagram of objects and maps, and the construction in each case extracts a universal object lying over that diagram. The difference between the three lies entirely in which diagram one begins with.

A diagram of a given shape, drawn inside a category \(\mathscr{A}\), is exactly a functor into \(\mathscr{A}\) from a category that encodes the shape. The simplest shapes make this concrete. An object of \(\mathscr{A}\) amounts to a functor \(\mathbf{1} \to \mathscr{A}\) from the one-object category, which labels the single object with the name of an object of \(\mathscr{A}\). A map in \(\mathscr{A}\) amounts to a functor \(\mathbf{2} \to \mathscr{A}\), where \(\mathbf{2}\) is the category with two objects \(0\) and \(1\), one non-identity map \(0 \to 1\), and nothing else: such a functor selects a map of \(\mathscr{A}\) together with its source and target. The same idea names the three shapes above. Let

\[ \mathbf{T} = \{\bullet \quad \bullet\}, \qquad \mathbf{E} = \big\{\bullet \rightrightarrows \bullet\big\}, \qquad \mathbf{P} = \big\{\bullet \rightarrow \bullet \leftarrow \bullet\big\} \]

be, respectively, the category with two objects and no non-identity maps; the category with two objects and two parallel non-identity maps; and the category with three objects and two non-identity maps sharing a target. A functor out of each into \(\mathscr{A}\) then reproduces exactly the starting data examined above — a pair of objects, a parallel pair of maps, a cospan — so the three configurations are revealed as one kind of object, a functor into \(\mathscr{A}\), differing only in their domain.
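
The idea that a diagram is a functor out of a shape can be sketched for finite sets. Everything in the block is an illustrative encoding chosen here: each shape is recorded by its objects and its non-identity maps, and a diagram assigns a set to each object and a dict-encoded function to each map. Because none of the three shapes contains a composable pair of non-identity maps, there is no composition condition to check, and a well-typed assignment is already a functor.

```python
# Index categories, given by objects and non-identity maps (name: (source, target)).
T = {"objects": ["0", "1"], "maps": {}}
E = {"objects": ["0", "1"], "maps": {"s": ("0", "1"), "t": ("0", "1")}}
P = {"objects": ["x", "y", "z"], "maps": {"s": ("x", "z"), "t": ("y", "z")}}

def is_diagram(shape, sets, functions):
    """Check that `sets` and `functions` give a well-typed diagram of the shape."""
    if set(sets) != set(shape["objects"]) or set(functions) != set(shape["maps"]):
        return False
    return all(
        set(functions[name]) == set(sets[src])               # defined on the whole source
        and set(functions[name].values()) <= set(sets[tgt])  # lands in the target
        for name, (src, tgt) in shape["maps"].items()
    )

# A pair of sets, a parallel pair of functions, and a cospan.
pair_ok = is_diagram(T, {"0": {1, 2}, "1": {"a"}}, {})
parallel_ok = is_diagram(
    E, {"0": {0, 1, 2}, "1": {0, 1}},
    {"s": {0: 0, 1: 1, 2: 0}, "t": {0: 0, 1: 0, 2: 0}},
)
cospan_ok = is_diagram(
    P, {"x": {"fig"}, "y": {"yam", "pear"}, "z": {3, 4}},
    {"s": {"fig": 3}, "t": {"yam": 3, "pear": 4}},
)
print(pair_ok, parallel_ok, cospan_ok)  # True True True
```

Only the first argument changes between the three calls, which is the point of the paragraph above: the data differ in their domain and in nothing else.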

The shape is an index category

The category encoding the shape is called an index category, and a functor from it into \(\mathscr{A}\) is a diagram of that shape in \(\mathscr{A}\). Products, equalizers, and pullbacks correspond to the index categories \(\mathbf{T}\), \(\mathbf{E}\), and \(\mathbf{P}\); changing the index category changes the construction while leaving the form of the universal property intact. What remains constant across all three is the act of forming, from a diagram, a single universal object that maps to the diagram and through which every compatible family of maps factors uniquely. That constant act is the notion of a limit, and the three constructions of this stage are its first instances, one index category each.

The universal object produced in each case has been described informally as lying "over" the diagram, mapping to its objects compatibly with its maps. Making that description precise requires naming the structure of maps from a fixed object to a whole diagram — a structure that specializes to the projections of a product, the single map of an equalizer, and the compatible pair of a pullback. With the common shape now identified, the next development isolates this structure under the name of a cone and defines the limit of an arbitrary diagram as the universal cone over it, recovering products, equalizers, and pullbacks as the limits indexed by \(\mathbf{T}\), \(\mathbf{E}\), and \(\mathbf{P}\), and opening the same construction to every other index category.