The Universal Property of a Product
Many constructions in mathematics take a collection of objects and assemble a new object out of
them. The cartesian product of two sets, the direct sum of two vector spaces, the kernel of a group
homomorphism, the intersection of two subsets, the greatest common divisor of two integers — these
look like unrelated operations from unrelated fields. The work of this stage is to show that they are
instances of a single categorical pattern. We begin with the most familiar case, the product, and
extract from it the form that the others will share.
The guiding example is the cartesian product of sets. For sets \(X\) and \(Y\), an element of
\(X \times Y\) is an element of \(X\) together with an element of \(Y\). Phrased in terms of maps out
of a one-element set — recalling that an element of a set is the same as a map into it from the
terminal set \(1\) — this says that a map \(1 \to X \times Y\) amounts to a map \(1 \to X\) together
with a map \(1 \to Y\). The same correspondence holds with \(1\) replaced by an arbitrary set \(A\):
a map \(A \to X \times Y\) is exactly a pair of maps \(A \to X\) and \(A \to Y\), the bijection being
given by composition with the two projection maps
\[
X \xleftarrow{\;p_1\;} X \times Y \xrightarrow{\;p_2\;} Y, \qquad
(x, y) \mapsto x, \quad (x, y) \mapsto y .
\]
The content of the construction is therefore not the formula for \(X \times Y\) but this bijection:
maps into the product are pairs of maps into the factors. This is a property expressible purely in
the language of objects and maps, with no reference to elements, and it is the property we take as the
definition.
Definition: Product
Let \(\mathscr{A}\) be a category, \(I\) a set, and \((X_i)_{i \in I}\) a family of objects of
\(\mathscr{A}\). A product of \((X_i)_{i \in I}\) consists of an object \(P\)
together with a family of maps
\[
\big(P \xrightarrow{\;p_i\;} X_i\big)_{i \in I},
\]
called the projections, with the following universal property: for every object
\(A\) and every family of maps \(\big(A \xrightarrow{\;f_i\;} X_i\big)_{i \in I}\), there exists a
unique map \(\bar{f} : A \to P\) such that \(p_i \circ \bar{f} = f_i\) for all \(i \in I\).
When the product exists, the object \(P\) is written \(\prod_{i \in I} X_i\), and the unique induced
map \(\bar{f}\) is written \((f_i)_{i \in I}\); the maps \(f_i\) are its components.
Strictly the product is the object together with its projections, but it is customary to refer to
\(P\) alone as the product and to leave the projections understood.
The case that recovers the opening example is \(I = \{1, 2\}\). Here the data are two objects \(X\)
and \(Y\), the product carries two projections \(p_1 : P \to X\) and \(p_2 : P \to Y\), and the
universal property says that for any object \(A\) equipped with maps \(f_1 : A \to X\) and
\(f_2 : A \to Y\) there is a unique map \(\bar{f} : A \to P\) reproducing them through the
projections:
\[
p_1 \circ \bar{f} = f_1, \qquad p_2 \circ \bar{f} = f_2 .
\]
This binary product is written \(P = X \times Y\), and the induced map is written
\(\bar{f} = (f_1, f_2)\). It is the case to keep in mind; the general definition differs from it only
in allowing an arbitrary indexing family in place of the pair.
A definition by mapping-in
The definition does not say what the elements of \(P\) are; it says what the maps into \(P\) are.
An object is determined not by an internal description but by how every other object maps to it,
and the product is the object whose incoming maps are exactly families of maps to the factors.
This is the same style of definition met earlier when an object was characterized by the
presheaf it represents: the product represents the presheaf sending an object \(A\) to the set
\(\prod_{i \in I} \mathscr{A}(A, X_i)\) of families of maps out of \(A\), contravariantly in \(A\).
Reading the definition this way lets the uniqueness of products follow from the general fact that
objects representing the same presheaf are isomorphic, rather than being checked case by case, as
the next remarks record.
Two remarks attach to the definition and apply equally to every construction that follows. First,
products need not exist: in the category with two objects and no maps other than identities, the
two objects have no product, since a product would need a projection to each of them and no object
of the category has maps to both. Second, when a product does exist it is unique up to
isomorphism. This can be proved directly, in the manner of the uniqueness of initial objects, and it
is also an instance of the general principle that an object representing a given functor is determined
up to canonical isomorphism. Either route justifies the definite article.
Products across categories
The definition acquires its weight from the range of constructions it captures. We collect the
principal examples.
Example: products in Set
Any two sets \(X\) and \(Y\) have a product, namely the cartesian product
\(X \times Y\) with its usual projections. To see that it satisfies the universal property, take
any set \(A\) with functions \(f_1 : A \to X\) and \(f_2 : A \to Y\), and define
\(\bar{f} : A \to X \times Y\) by \(\bar{f}(a) = (f_1(a), f_2(a))\). Then
\(p_i \circ \bar{f} = f_i\) for \(i = 1, 2\), so \(\bar{f}\) makes the triangles commute. It is
the only such map: if \(\hat{f} : A \to X \times Y\) also satisfies \(p_i \circ \hat{f} = f_i\)
and we write \(\hat{f}(a) = (x, y)\), then \(x = p_1(\hat{f}(a)) = f_1(a)\) and likewise
\(y = f_2(a)\), so \(\hat{f}(a) = (f_1(a), f_2(a)) = \bar{f}(a)\) for every \(a\), giving
\(\hat{f} = \bar{f}\). The categorical product in \(\mathbf{Set}\) is thus the cartesian product
whose elementwise universal property motivated the definition.
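The verification can also be run by brute force on small finite sets. The following Python sketch is illustrative only: the sets `X`, `Y`, `A` and the helper names `all_functions` and `mediating_maps` are hypothetical choices, not part of the text. It enumerates every function \(A \to X \times Y\) and checks that exactly one of them has the prescribed components.

```python
from itertools import product

# Hypothetical small sets standing in for X, Y and the test object A.
X = [0, 1, 2]
Y = ["a", "b"]
A = ["u", "v"]

P = list(product(X, Y))  # the cartesian product X x Y as a list of pairs


def p1(pair):
    """Projection X x Y -> X."""
    return pair[0]


def p2(pair):
    """Projection X x Y -> Y."""
    return pair[1]


def all_functions(domain, codomain):
    """Every function domain -> codomain, each represented as a dict."""
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]


def mediating_maps(f1, f2):
    """All g : A -> X x Y with p1 . g = f1 and p2 . g = f2."""
    return [g for g in all_functions(A, P)
            if all(p1(g[a]) == f1[a] and p2(g[a]) == f2[a] for a in A)]


# For every pair (f1, f2) exactly one map qualifies: the pairing a |-> (f1(a), f2(a)).
for f1 in all_functions(A, X):
    for f2 in all_functions(A, Y):
        assert mediating_maps(f1, f2) == [{a: (f1[a], f2[a]) for a in A}]
```

A finite check of this kind proves nothing beyond the sets tested, but it makes both halves of the universal property, existence and uniqueness, visible as a single list of length one.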
Example: products in Top
Any two topological spaces \(X\) and \(Y\) have a product: the set \(X \times Y\) carrying the
product topology,
with the standard projections. The product topology is designed precisely so that a function
\(A \to X \times Y\), \(t \mapsto (x(t), y(t))\), is continuous if and only if both coordinate
functions \(t \mapsto x(t)\) and \(t \mapsto y(t)\) are continuous. That biconditional is the
universal property: a continuous map into the product is the same data as a pair of continuous
maps into the factors. Equivalently, the product topology is the coarsest topology on
\(X \times Y\) making both projections continuous, which is the same requirement read from the
side of the projections rather than the side of the induced map.
Example: products in Vect
For vector spaces \(X\) and \(Y\) over a field \(k\), the product is the direct sum
\(X \oplus Y\), whose elements are pairs \((x, y)\) with \(x \in X\) and \(y \in Y\), with the
two linear projection maps. A linear map \(A \to X \oplus Y\) is the same data as a pair of
linear maps \(A \to X\) and \(A \to Y\), which is the universal property; the verification runs
as in \(\mathbf{Set}\), with the added observation that the induced map \((f_1, f_2)\) is linear
because \(f_1\) and \(f_2\) are. The direct sum is also the coproduct in \(\mathbf{Vect}\), and this
coincidence of product and coproduct is special to additive settings; in \(\mathbf{Set}\) the
product and the coproduct, which is the disjoint union, are different constructions.
A further family of examples comes not from spaces but from orders. A partially ordered set
\((A, \leq)\) becomes a category with one map \(x \to y\) precisely when \(x \leq y\). In such a
category there is at most one map between any two objects, so the commutativity demanded by a
universal property is automatic, and the universal property reduces to an inequality. What a product
becomes in this setting is worth a name.
Definition: Lower Bound and Meet
Let \((A, \leq)\) be a poset and \(x, y \in A\). A lower bound for \(x\) and
\(y\) is an element \(a \in A\) with \(a \leq x\) and \(a \leq y\). A greatest lower
bound, or meet, of \(x\) and \(y\) is a lower bound \(z\) with the
property that every lower bound \(a\) for \(x\) and \(y\) satisfies \(a \leq z\). When it exists
the meet is unique, and is written \(x \wedge y\).
When the poset \((A, \leq)\) is regarded as a category, the meet \(x \wedge y\) is exactly the
product of \(x\) and \(y\): the conditions \(z \leq x\) and \(z \leq y\) are the projections, and the
requirement that any lower bound \(a\) factor through \(z\) is the universal property, here amounting
to \(a \leq z\). Three standard orders make the meaning concrete.
Example: meets as products in three orders
(a) In \((\mathbb{R}, \leq)\) the meet of \(x\) and \(y\) is the minimum
\(\min\{x, y\}\): it satisfies \(\min\{x, y\} \leq x\) and \(\min\{x, y\} \leq y\), and any
\(a\) with \(a \leq x\) and \(a \leq y\) satisfies \(a \leq \min\{x, y\}\).
(b) Fix a set \(S\). In the poset \((\mathscr{P}(S), \subseteq)\) of subsets
ordered by inclusion, the meet of \(X\) and \(Y\) is the intersection \(X \cap Y\): it satisfies
\(X \cap Y \subseteq X\) and \(X \cap Y \subseteq Y\), and any \(A\) with \(A \subseteq X\) and
\(A \subseteq Y\) satisfies \(A \subseteq X \cap Y\). This example is the origin of the notation
\(\wedge\).
(c) In \((\mathbb{N}, \mid)\), the positive integers ordered by divisibility, the
meet of \(x\) and \(y\) is the greatest common divisor \(\gcd(x, y)\): it divides both \(x\) and
\(y\), and any \(a\) dividing both \(x\) and \(y\) divides \(\gcd(x, y)\). Thus in the three orders
\[
x \wedge y = \min\{x, y\}, \qquad X \wedge Y = X \cap Y, \qquad x \wedge y = \gcd(x, y),
\]
and each is a product in the corresponding category.
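The three identifications can be tested mechanically on finite samples of each order. The sketch below is a hedged illustration: the helper `is_meet` and the sample posets are hypothetical, and each sample is chosen so that it contains the claimed meet and every common lower bound of the elements tested.

```python
from itertools import combinations
from math import gcd


def is_meet(z, x, y, elements, leq):
    """True if z is a greatest lower bound of x and y in the poset (elements, leq)."""
    lower_bound = leq(z, x) and leq(z, y)
    greatest = all(leq(a, z) for a in elements if leq(a, x) and leq(a, y))
    return lower_bound and greatest


def at_most(a, b):
    """The order a <= b; on frozensets Python reads <= as inclusion."""
    return a <= b


def divides(a, b):
    """The divisibility order on positive integers."""
    return b % a == 0


# (a) A finite sample of real numbers under <=.
reals = [-1.5, 0.0, 2.0, 3.25]
assert all(is_meet(min(x, y), x, y, reals, at_most) for x in reals for y in reals)

# (b) All subsets of S = {1, 2, 3} under inclusion.
subsets = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)]
assert all(is_meet(X & Y, X, Y, subsets, at_most) for X in subsets for Y in subsets)

# (c) The integers 1..30 under divisibility.
integers = list(range(1, 31))
assert all(is_meet(gcd(x, y), x, y, integers, divides) for x in integers for y in integers)
```

The same function `is_meet` serves all three orders; only the relation passed as `leq` changes, which mirrors the way one universal property is read in three categories.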
The examples discussed so far are binary, but the definition was stated for an arbitrary indexing
family, and nothing forces the family to have two members. For a family \((X_i)_{i \in I}\) the
product \(\prod_{i \in I} X_i\) carries one projection \(p_i\) for each index, and the universal
property induces from any family of maps \(\big(A \xrightarrow{f_i} X_i\big)_{i \in I}\) a unique map
with those components. In an ordered set this is the greatest lower bound of the whole family, written
\(\bigwedge_{i \in I} x_i\); in \((\mathbb{R}, \leq)\) it is \(\inf\{x_i : i \in I\}\), which exists
exactly when the family is nonempty and bounded below. Two limiting cases of the indexing family are instructive.
Example: the empty product is a terminal object
Take \(I = \varnothing\). A family \((X_i)_{i \in \varnothing}\) is empty, and so is every family
of maps \(\big(A \xrightarrow{f_i} X_i\big)_{i \in \varnothing}\); the condition
\(p_i \circ \bar{f} = f_i\) for all \(i \in \varnothing\) holds vacuously. The universal property
therefore reduces to the bare requirement that for each object \(A\) there exist a unique map
\(\bar{f} : A \to P\). An object \(P\) with this property, exactly one map into it from every
object, is a terminal object.
A product of the empty family is thus precisely a terminal object. This is one reason for writing
\(1\) for a terminal object: in categories such as \(\mathbf{Set}\), \(\mathbf{Top}\),
\(\mathbf{Ring}\), and \(\mathbf{Grp}\) the terminal object has a single element, and a product of
no factors is, in the arithmetic of objects, the empty product — the number \(1\).
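The arithmetic can be observed directly for finite sets. In the following Python sketch the helper names `cartesian` and `unique_map` are hypothetical; the calls `itertools.product` and `math.prod` are standard library functions, and both follow the empty-product convention described above.

```python
from itertools import product
from math import prod


def cartesian(family):
    """The cartesian product of a finite family of finite sets, as a list of tuples."""
    return list(product(*family))


# A product of two factors has |X| * |Y| elements.
assert len(cartesian([[0, 1, 2], ["a", "b"]])) == prod([3, 2])

# The product of the empty family has exactly one element, the empty tuple.
terminal = cartesian([])
assert terminal == [()]
assert len(terminal) == prod([]) == 1


def unique_map(A):
    """The only function from A to the one-element set: constant at ()."""
    return {a: () for a in A}
```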
Example: powers
Take a single object \(X\) and a set \(I\), and form the constant family \((X)_{i \in I}\) in
which every member is \(X\). Its product, if it exists, is written \(X^I\) and called a
power of \(X\). A map \(A \to X^I\) is then a family of maps \(A \to X\) indexed
by \(I\). For \(X\) a set this recovers the set of functions
from \(I\) to \(X\): an element of \(X^I\) is a function \(I \to X\), and the power notation
agrees with the exponential already met for sets.
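For finite sets both readings of the power can be made concrete. The sketch below is a small illustration under assumed names (`power`, `transpose`, and the sample sets are hypothetical): it lists the elements of \(X^I\) as functions \(I \to X\) and converts a family of maps \(A \to X\) into the single induced map \(A \to X^I\).

```python
from itertools import product


def power(X, I):
    """The elements of X^I: every function I -> X, each represented as a dict."""
    return [dict(zip(I, values)) for values in product(X, repeat=len(I))]


def transpose(family, A):
    """Turn a family of maps (A -> X), indexed by I, into one map A -> X^I."""
    return {a: {i: family[i][a] for i in family} for a in A}


X = [0, 1]
I = ["i", "j", "k"]
assert len(power(X, I)) == len(X) ** len(I)  # 2^3 functions

A = ["u", "v"]
family = {"i": {"u": 0, "v": 1}, "j": {"u": 1, "v": 1}, "k": {"u": 0, "v": 0}}
induced = transpose(family, A)
assert all(induced[a] in power(X, I) for a in A)
```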
The product, in one definition, has absorbed the cartesian product of sets, the product topology, the
direct sum of vector spaces, the minimum, the intersection, the greatest common divisor, the terminal
object, and the power. Each is the same universal property read in a different category. We now seek a
second construction, of a different shape, to set beside it.
Equalizers
A product assembles an object from several objects with no maps presupposed among them. The second
construction starts instead from a pair of parallel maps and selects the part of the source on which
they agree. Where a product is built from objects, an equalizer is built from an equation. The
construction needs one preliminary configuration.
Definition: Fork
In a category, a fork on a parallel pair \(s, t : X \to Y\) consists of an
object \(A\) and a map \(f : A \to X\) satisfying
\[
s \circ f = t \circ f .
\]
A fork is a map into \(X\) that the two parallel maps cannot tell apart: post-composing with \(s\)
or with \(t\) gives the same result. Among all forks on a given pair there is, in good cases, a
universal one, and it is the equalizer.
Definition: Equalizer
Let \(\mathscr{A}\) be a category and \(s, t : X \to Y\) a parallel pair of maps. An
equalizer of \(s\) and \(t\) is an object \(E\) together with a map
\(i : E \to X\) such that \(s \circ i = t \circ i\) — that is, \((E, i)\) is itself a fork — and
with the following universal property: for every fork \((A, f)\) on \(s\) and \(t\), there exists
a unique map \(\bar{f} : A \to E\) such that
\[
i \circ \bar{f} = f .
\]
The equalizer is the universal fork: every other fork factors through it in exactly one way.
As with products, an equalizer need not exist; when it does, it is determined up to a unique
isomorphism commuting with the maps into \(X\), so one speaks of the equalizer. The general
definition abstracts a construction first met for sets, where the universal fork is the
literal solution set of an equation.
Example: equalizers in Set
For functions \(s, t : X \to Y\) between sets, take
\[
E = \{x \in X \mid s(x) = t(x)\}
\]
with the inclusion
\(i : E \hookrightarrow X\). Then \(s \circ i = t \circ i\), since the two maps agree on every
element of \(E\) by the definition of \(E\), so \((E, i)\) is a fork. It is the universal one: if
\((A, f)\) is any fork, then \(s(f(a)) = t(f(a))\) for every \(a \in A\), so \(f(a) \in E\), and
\(f\) factors through the inclusion by the corestriction \(\bar{f} : A \to E\), \(a \mapsto f(a)\).
This \(\bar{f}\) is the unique map with \(i \circ \bar{f} = f\), because \(i\) is injective and so
determines \(\bar{f}\) on each element. The equalizer in \(\mathbf{Set}\) is therefore the
solution set of the single equation \(s(x) = t(x)\), and the universal property is the statement
that any other map whose image solves the equation passes through that solution set.
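A small computation shows the solution set and the factorization together. The Python sketch below is illustrative: the functions `s`, `t`, the range chosen for `X`, and the fork `f` are hypothetical examples.

```python
def equalizer(X, s, t):
    """The subset of X on which s and t agree, in the order of X."""
    return [x for x in X if s(x) == t(x)]


def s(x):
    return x * x


def t(x):
    return x + 2


X = range(-5, 6)
E = equalizer(X, s, t)  # solutions of x^2 = x + 2 among -5..5
assert E == [-1, 2]

# A fork (A, f): a map into X on which s and t agree.
A = ["p", "q", "r"]
f = {"p": 2, "q": -1, "r": 2}
assert all(s(f[a]) == t(f[a]) for a in A)

# The induced map is f with its codomain cut down to E; it is forced, since
# composing with the inclusion of E in X must give back f.
f_bar = {a: f[a] for a in A}
assert all(f_bar[a] in E for a in A)
```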
A single equalizer captures one equation. Combined with products, equalizers capture systems of
simultaneous equations. Given a set \(\Lambda\) and a family of parallel pairs
\(\big(s_\lambda, t_\lambda : X \to Y_\lambda\big)_{\lambda \in \Lambda}\) of maps in \(\mathbf{Set}\),
the simultaneous solution set
\[
\{x \in X \mid s_\lambda(x) = t_\lambda(x) \text{ for all } \lambda \in \Lambda\}
\]
is the equalizer of the two induced maps into the product,
\[
(s_\lambda)_{\lambda \in \Lambda}, \quad (t_\lambda)_{\lambda \in \Lambda} : X \longrightarrow
\prod_{\lambda \in \Lambda} Y_\lambda .
\]
For \(x \in X\) the equation \((s_\lambda)_\lambda(x) = (t_\lambda)_\lambda(x)\) of maps into the
product means \((s_\lambda(x))_\lambda = (t_\lambda(x))_\lambda\) as families, and that holds exactly
when \(s_\lambda(x) = t_\lambda(x)\) for every \(\lambda\). One equalizer of maps into a product thus
encodes a whole system, the product gathering the separate equations into one.
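The reduction of a system to one equalizer can be checked on a concrete case. In this hedged Python sketch the system \(x + y = 1\), \(x - y = 3\) and all names (`pairs`, `S`, `T`) are hypothetical; elements of the product are represented as tuples in a fixed order of the index set.

```python
# A hypothetical system on X = pairs of small integers:
#   x + y = 1   and   x - y = 3.
X = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
pairs = {
    "sum": (lambda v: v[0] + v[1], lambda v: 1),
    "difference": (lambda v: v[0] - v[1], lambda v: 3),
}
index = sorted(pairs)  # a fixed ordering of the index set


def S(v):
    """The induced map (s_lambda) : X -> product of the Y_lambda."""
    return tuple(pairs[k][0](v) for k in index)


def T(v):
    """The induced map (t_lambda) : X -> product of the Y_lambda."""
    return tuple(pairs[k][1](v) for k in index)


simultaneous = [v for v in X if all(s(v) == t(v) for s, t in pairs.values())]
single_equalizer = [v for v in X if S(v) == T(v)]
assert simultaneous == single_equalizer == [(2, -1)]
```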
Example: equalizers in Top
For continuous maps \(s, t : X \to Y\) between topological spaces, form the equalizer
\(E = \{x \in X \mid s(x) = t(x)\}\) in sets, with inclusion \(i : E \to X\). Give \(E\) the
subspace topology inherited from \(X\); then \(i\) is continuous. For any fork \((A, f)\) in
\(\mathbf{Top}\), the induced map \(\bar{f} : A \to E\) of the underlying sets is continuous: the
subspace topology is the coarsest topology on \(E\) making \(i\) continuous, so every open set of
\(E\) has the form \(i^{-1}(U)\) with \(U\) open in \(X\), and
\(\bar{f}^{-1}(i^{-1}(U)) = f^{-1}(U)\) is open because \(i \circ \bar{f} = f\)
is continuous. So \((E, i)\) with the subspace topology is the equalizer in \(\mathbf{Top}\). The
situation parallels the product topology: in both, the universal property in spaces follows from
the universal property in sets together with the minimality built into the chosen topology.
Example: kernels are equalizers
Let \(\theta : G \to H\) be a homomorphism of groups.
Alongside \(\theta\) consider the trivial homomorphism \(\varepsilon : G \to H\) sending every
element to the identity \(e\) of \(H\). The equalizer of \(\theta\) and \(\varepsilon\) is the set
of elements on which they agree, namely \(\{g \in G \mid \theta(g) = e\}\), which is the
kernel of \(\theta\), together with its inclusion \(\ker\theta \hookrightarrow G\). That this is an
equalizer in \(\mathbf{Grp}\), and not merely in \(\mathbf{Set}\), rests on two facts. The kernel
is a subgroup, so its inclusion is a homomorphism; and the induced map \(\bar{f}\) into the kernel
is a homomorphism whenever the fork \(f\) is, because \(\bar{f}\) agrees with \(f\) as a function
and \(f\) is a homomorphism. Kernels are
therefore a special case of equalizers: the kernel of \(\theta\) is the equalizer of \(\theta\)
against the trivial map.
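A finite cyclic example makes the statement checkable. The sketch below assumes the hypothetical choice \(G = \mathbb{Z}/12\), \(H = \mathbb{Z}/4\) with \(\theta\) the reduction mod 4, and a fork from \(\mathbb{Z}/3\); none of these appear in the text.

```python
N, M = 12, 4  # G = Z/12 and H = Z/4 under addition (hypothetical choices)
G = range(N)


def theta(g):
    """Reduction mod 4, a homomorphism Z/12 -> Z/4 because 4 divides 12."""
    return g % M


def epsilon(g):
    """The trivial homomorphism, sending everything to the identity 0."""
    return 0


kernel = [g for g in G if theta(g) == epsilon(g)]
assert kernel == [0, 4, 8]

# The kernel is closed under the group operation, so its inclusion is a homomorphism.
assert all((a + b) % N in kernel for a in kernel for b in kernel)

# A fork: the homomorphism f : Z/3 -> Z/12, a |-> 4a, satisfies theta . f = epsilon . f.
A = range(3)


def f(a):
    return (4 * a) % N


assert all(theta(f(a)) == epsilon(f(a)) for a in A)
assert all(f((a + b) % 3) == (f(a) + f(b)) % N for a in A for b in A)
assert all(f(a) in kernel for a in A)  # so f factors through the kernel
```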
Example: equalizers in Vect
For linear maps \(s, t : V \to W\) between vector spaces, the difference \(t - s : V \to W\) is
again linear, and the two maps agree exactly where their difference vanishes. The equalizer of
\(s\) and \(t\) is therefore the subspace
\(\ker(t - s) = \{v \in V \mid s(v) = t(v)\}\), with its inclusion \(\ker(t - s) \hookrightarrow V\).
As in the group case, the kernel of a single linear map is recovered by taking \(s\) to be the
zero map, so that \(\ker(t - s) = \ker t\). The pattern is uniform across additive categories:
the equalizer of a parallel pair is the kernel of their difference.
From equations to objects
The equalizer turns the act of imposing an equation into the construction of an object. In sets
it is a solution set; in spaces a solution set carrying the inherited topology; in groups and
vector spaces a kernel. The translation runs in one direction throughout: a condition of the form
"the two maps agree" becomes a universal map into the source, and the object so produced
remembers nothing of the equation except the part of the source that satisfies it. Read this way,
the language of universal maps applies wherever a construction is specified by the equations it
must satisfy rather than by an explicit formula — the equalizer is the categorical form of
"the subobject cut out by an equation."
Pullbacks
The third construction takes a pair of maps with a common target and forms the object of compatible
pairs over it. A product pairs elements freely; a pullback pairs only those elements of the two
sources that are sent to the same point of the target. The starting data are two maps into a shared
object,
\[
X \xrightarrow{\;s\;} Z \xleftarrow{\;t\;} Y ,
\]
a configuration called a cospan, and the construction selects from \(X\) and \(Y\)
the part that agrees over \(Z\).
Definition: Pullback
Let \(\mathscr{A}\) be a category and \(s : X \to Z\), \(t : Y \to Z\) maps with common target.
A pullback of this cospan is an object \(P\) together with maps \(p_1 : P \to X\)
and \(p_2 : P \to Y\) such that the square
\[
\begin{array}{ccc}
P & \xrightarrow{\;p_2\;} & Y \\[4pt]
{\scriptstyle p_1}\big\downarrow & & \big\downarrow{\scriptstyle t} \\[4pt]
X & \xrightarrow{\;s\;} & Z
\end{array}
\]
commutes, with the following universal property: for every object \(A\) and maps
\(f_1 : A \to X\), \(f_2 : A \to Y\) such that \(s \circ f_1 = t \circ f_2\), there exists a
unique map \(\bar{f} : A \to P\) with
\[
p_1 \circ \bar{f} = f_1, \qquad p_2 \circ \bar{f} = f_2 .
\]
The commuting square exhibited by \((P, p_1, p_2)\) is called a pullback square, and
the object \(P\) a fibred product. A commuting outer square \(s \circ f_1 = t \circ f_2\)
is precisely a fork-like compatibility condition on the pair \((f_1, f_2)\); the universal property
asserts that any such compatible pair factors uniquely through the pullback. Since the commutativity
of the inner square is given, requiring \(\bar{f}\) to "make the diagram commute" means only the two
equations \(p_1 \circ \bar{f} = f_1\) and \(p_2 \circ \bar{f} = f_2\). As with the earlier
constructions, a pullback need not exist, and when it does it is determined up to a unique compatible
isomorphism.
The name fibred product is explained by a limiting case. When \(Z\) is a terminal object \(1\), the
maps \(s\) and \(t\) are the unique maps to \(1\) and impose no condition, so the compatibility
\(s \circ f_1 = t \circ f_2\) holds automatically. The pullback then reduces to an object \(P\) with
maps to \(X\) and \(Y\) through which every pair of maps factors uniquely — that is, to the product
\(X \times Y\). A pullback is thus a product taken not over a point but over a base object \(Z\),
pairing elements fibre by fibre.
Example: pullbacks in Set
For functions \(s : X \to Z\) and \(t : Y \to Z\), the pullback is the set of compatible pairs
\[
P = \{(x, y) \in X \times Y \mid s(x) = t(y)\}
\]
with projections \(p_1(x, y) = x\) and \(p_2(x, y) = y\). The square commutes by construction. For
any set \(A\) with maps \(f_1 : A \to X\), \(f_2 : A \to Y\) satisfying \(s(f_1(a)) = t(f_2(a))\)
for all \(a\), the pair \((f_1(a), f_2(a))\) lies in \(P\), so \(\bar{f}(a) = (f_1(a), f_2(a))\)
defines a map \(A \to P\) with \(p_i \circ \bar{f} = f_i\), and it is the only such map since its
two components are forced. This \(P\) is the fibred product of \(X\) and \(Y\) over \(Z\): over
each point \(z \in Z\) it places the product of the fibres \(s^{-1}(z)\) and \(t^{-1}(z)\).
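The fibrewise description can be confirmed by counting. In the following Python sketch the sets and the maps `s`, `t` are hypothetical examples, chosen so that one fibre of `t` is empty.

```python
X = range(6)
Y = range(6)


def s(x):
    return x % 3  # s : X -> Z = {0, 1, 2}


def t(y):
    return y % 2  # t : Y -> Z, never taking the value 2


P = [(x, y) for x in X for y in Y if s(x) == t(y)]


def fibre_product_size(z):
    """|s^{-1}(z)| * |t^{-1}(z)|, the size of the part of P over z."""
    return sum(s(x) == z for x in X) * sum(t(y) == z for y in Y)


assert [fibre_product_size(z) for z in range(3)] == [6, 6, 0]
assert len(P) == 6 + 6 + 0

# A compatible pair (f1, f2) out of A induces the map a |-> (f1(a), f2(a)) into P.
A = ["a", "b"]
f1 = {"a": 3, "b": 1}
f2 = {"a": 4, "b": 5}
assert all(s(f1[a]) == t(f2[a]) for a in A)
f_bar = {a: (f1[a], f2[a]) for a in A}
assert all(f_bar[a] in P for a in A)
```

The pullback here has 12 elements, against 36 for the full product \(X \times Y\): the compatibility condition removes every pair lying over two different points of \(Z\).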
Two constructions already familiar from set theory are pullbacks in disguise.
Example: inverse images are pullbacks
Given a function \(f : X \to Y\) and a subset \(Y' \subseteq Y\), the inverse image
\[
f^{-1}Y' = \{x \in X \mid f(x) \in Y'\} \subseteq X
\]
fits into a square with the inclusions \(j : Y' \hookrightarrow Y\) and
\(i : f^{-1}Y' \hookrightarrow X\) and the corestriction \(f' : f^{-1}Y' \to Y'\),
\(x \mapsto f(x)\):
\[
\begin{array}{ccc}
f^{-1}Y' & \xrightarrow{\;f'\;} & Y' \\[4pt]
{\scriptstyle i}\big\downarrow & & \big\downarrow{\scriptstyle j} \\[4pt]
X & \xrightarrow{\;f\;} & Y
\end{array}
\]
The starting data are the lower-right corner — the objects \(X, Y, Y'\) with the maps \(f\) and
\(j\) — and the remaining corner \(f^{-1}Y'\) is what the construction produces. To see that this
is a pullback square, take any commuting square with apex \(A\),
\[
\begin{array}{ccc}
A & \xrightarrow{\;h\;} & Y' \\[4pt]
{\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle j} \\[4pt]
X & \xrightarrow{\;f\;} & Y
\end{array}
\]
so that \(f \circ g = j \circ h\). For each \(a \in A\) we have \(f(g(a)) = j(h(a)) = h(a) \in Y'\),
hence \(g(a) \in f^{-1}Y'\); define \(k : A \to f^{-1}Y'\) by \(k(a) = g(a)\). Then
\(i \circ k = g\), and \(f'(k(a)) = f(g(a)) = h(a)\) gives \(f' \circ k = h\), so \(k\) makes the
diagram commute. It is unique: any \(k\) with \(i \circ k = g\) satisfies \(k(a) = g(a)\) for all
\(a\), since \(i\) is the inclusion. Thus \(f^{-1}Y'\) with \(i\) and \(f'\) is the pullback of
\(f\) along \(j\). This is the source of the name: one says \(Y'\) is "pulled back" along \(f\) to
the subset \(f^{-1}Y'\) of \(X\).
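The agreement between the inverse image and the general pullback formula in \(\mathbf{Set}\) can be seen on a small case. This Python sketch is illustrative; the function `f` and the subset `Y_sub` (standing for \(Y'\)) are hypothetical.

```python
X = range(10)
Y_sub = [1, 2]  # the subset Y' of Y = {0, ..., 6}


def f(x):
    return (x * x) % 7  # a function X -> Y


inverse_image = [x for x in X if f(x) in Y_sub]

# The pullback of f and the inclusion j, computed from the general formula
# {(x, y') : f(x) = j(y')}, where j(y') = y'.
P = [(x, y) for x in X for y in Y_sub if f(x) == y]

# Projecting P to X is a bijection onto the inverse image, and the second
# coordinate of each pair is the value f'(x) = f(x).
assert [x for x, _ in P] == inverse_image == [1, 3, 4, 6, 8]
assert all(y == f(x) for x, y in P)
```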
Example: intersections are pullbacks
Let \(X\) and \(Y\) be subsets of a set \(Z\). Their intersection sits in a square of inclusions
\[
\begin{array}{ccc}
X \cap Y & \hookrightarrow & Y \\[4pt]
\big\downarrow & & \big\downarrow \\[4pt]
X & \hookrightarrow & Z
\end{array}
\]
and this is a pullback square. It is the special case of the inverse-image example in which
\(f : X \hookrightarrow Z\) is the inclusion and the subset is \(Y \subseteq Z\): the inverse
image \(f^{-1}Y = \{x \in X \mid x \in Y\}\) is exactly \(X \cap Y\). The intersection is thus the
fibred product of \(X\) and \(Y\) over their common ambient set, recovering the earlier
description of \(X \cap Y\) as a meet from the wider vantage of pullbacks over \(Z\) rather than
products in the subset order.
Pairing over a base
The shift from a point to an arbitrary base object is what turns a cartesian product into a fibred
product, and it is the categorical content shared by inverse images and intersections. Wherever
two structures are to be combined into the pairs that agree over shared data — records matched on
a common key, two sections of a bundle agreeing over the base, two local pieces agreeing on their
overlap — the construction is a pullback, and its defining property is that a compatible pair of
maps determines a single map into the combined object.
Monics
Pullbacks supply the setting for a notion that will recur throughout the theory. For functions
between sets, injectivity — that distinct inputs have distinct outputs — is a basic and useful
property. In an arbitrary category there are no elements to compare, so injectivity in its literal
form does not make sense; but there is a property phrased entirely in terms of maps that plays the
same role, and it is characterized by a pullback square.
Definition: Monic
Let \(\mathscr{A}\) be a category. A map \(f : X \to Y\) is monic (or a
monomorphism) if for every object \(A\) and every pair of maps
\(x, x' : A \to X\),
\[
f \circ x = f \circ x' \;\implies\; x = x' .
\]
The condition is that \(f\) can be cancelled on the left: if two maps into \(X\) become equal after
composing with \(f\), they were already equal. Viewing a map \(A \to X\) as a generalized element of
\(X\) — a probe of \(X\) by the object \(A\) — the definition reads as the statement that \(f\)
carries distinct generalized elements of \(X\) to distinct generalized elements of \(Y\). Being
monic is thus the generalized-element form of injectivity, with the arbitrary probe \(A\) in place of
a one-element set. The two examples below confirm that in the familiar categories this recovers
injectivity exactly.
Example: monics in Set
In \(\mathbf{Set}\) a map is monic if and only if it is injective. If \(f : X \to Y\) is
injective and \(f \circ x = f \circ x'\) for maps \(x, x' : A \to X\), then
\(f(x(a)) = f(x'(a))\) for every \(a \in A\), so \(x(a) = x'(a)\) by injectivity, giving
\(x = x'\); thus \(f\) is monic. Conversely, suppose \(f\) is monic and take any
\(u, v \in X\) with \(f(u) = f(v)\). Let \(A = 1\) be a one-element set and let
\(x, x' : 1 \to X\) be the maps picking out \(u\) and \(v\). Then \(f \circ x = f \circ x'\),
since both pick out the common value \(f(u) = f(v)\), so \(x = x'\) by the monic property, which
says \(u = v\). Hence \(f\) is injective.
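The equivalence can be tested exhaustively on small sets. The Python sketch below uses hypothetical helpers (`all_functions`, `compose`, `is_monic`) and tests left-cancellability only against a one-element and a two-element probe; by the argument above the one-element probe already suffices in \(\mathbf{Set}\).

```python
from itertools import product


def all_functions(domain, codomain):
    """Every function domain -> codomain, each represented as a dict."""
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]


def compose(f, x):
    """The composite f . x of dict-functions x : A -> X and f : X -> Y."""
    return {a: f[x[a]] for a in x}


def is_injective(f):
    return len(set(f.values())) == len(f)


def is_monic(f, X, probes):
    """Left-cancellability of f, tested against each probe set A in probes."""
    for A in probes:
        maps = all_functions(A, X)
        for x in maps:
            for x_prime in maps:
                if compose(f, x) == compose(f, x_prime) and x != x_prime:
                    return False
    return True


X = [0, 1, 2]
Y = ["a", "b", "c"]
probes = [["*"], ["*", "#"]]  # a one-element and a two-element probe

# Among all 27 functions X -> Y, the monic ones are exactly the injective ones.
for f in all_functions(X, Y):
    assert is_monic(f, X, probes) == is_injective(f)
```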
Example: monics in Grp and Vect
The same equivalence holds in categories of algebraic structures. A group homomorphism,
or a linear map of vector spaces, is monic exactly when it is injective. That an injective
homomorphism is monic follows as in \(\mathbf{Set}\), since the underlying map is injective and
the cancelled maps \(x, x'\) are homomorphisms. For the converse one again tests against a
single element, but a one-element group or vector space carries no information; instead one uses
the free structure on one generator, namely the group \(\mathbb{Z}\) or the one-dimensional space
\(k\), out of which homomorphisms correspond to single elements of the target. A map
\(A \to X\) out of the free structure on one generator selects an element of \(X\), and running
the argument of the previous example with this \(A\) in place of \(1\) shows that a monic
homomorphism is injective.
The reason monics belong beside pullbacks is the following characterization, which expresses
left-cancellability as a single universal square and so makes monics available wherever pullbacks
are understood.
Lemma: Monics as Pullbacks
A map \(f : X \to Y\) is monic if and only if the square
\[
\begin{array}{ccc}
X & \xrightarrow{\;1\;} & X \\[4pt]
{\scriptstyle 1}\big\downarrow & & \big\downarrow{\scriptstyle f} \\[4pt]
X & \xrightarrow{\;f\;} & Y
\end{array}
\]
with both maps from the top-left copy of \(X\) the identity, is a pullback.
Proof
The square commutes, since \(f \circ 1 = f = f \circ 1\). The question is whether it has the
universal property of a pullback of the cospan \(X \xrightarrow{\;f\;} Y \xleftarrow{\;f\;} X\).
A compatible pair on that cospan is an object \(A\) with maps \(a, b : A \to X\) satisfying
\(f \circ a = f \circ b\). Both projections of the square are the identity on \(X\), so the
universal property demands, for each such pair, a unique \(\bar{f} : A \to X\) with
\(1 \circ \bar{f} = a\) and \(1 \circ \bar{f} = b\) — that is, a unique \(\bar{f}\) with
\(\bar{f} = a\) and \(\bar{f} = b\).
Suppose first that \(f\) is monic. Given a compatible pair \(a, b : A \to X\) with
\(f \circ a = f \circ b\), the monic property gives \(a = b\). Setting \(\bar{f} = a\) then
satisfies \(\bar{f} = a\) and \(\bar{f} = b\), and any map with both properties equals \(a\),
so \(\bar{f}\) is unique. The square is therefore a pullback.
Conversely, suppose the square is a pullback, and let \(a, b : A \to X\) satisfy
\(f \circ a = f \circ b\). Then \((a, b)\) is a compatible pair on the cospan, so by the
universal property there is a unique \(\bar{f} : A \to X\) with \(\bar{f} = a\) and
\(\bar{f} = b\). The existence of such a \(\bar{f}\) forces \(a = \bar{f} = b\). Hence
\(f \circ a = f \circ b\) implies \(a = b\), which is the monic property.
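In \(\mathbf{Set}\) the lemma can be read off from the explicit pullback. The pullback of \(f\) along itself is the set of pairs \((a, b)\) with \(f(a) = f(b)\), and since pullbacks are unique up to isomorphism, the identity square is a pullback exactly when the diagonal map \(x \mapsto (x, x)\) is a bijection onto that set. The Python sketch below checks this against injectivity; the names `kernel_pair` and `identity_square_is_pullback` and the two sample maps are hypothetical.

```python
def kernel_pair(X, f):
    """The pullback of f along itself in Set: all pairs (a, b) with f(a) = f(b)."""
    return {(a, b) for a in X for b in X if f(a) == f(b)}


def identity_square_is_pullback(X, f):
    """True when x |-> (x, x) is a bijection from X onto the kernel pair."""
    return kernel_pair(X, f) == {(x, x) for x in X}


def is_injective(X, f):
    return len({f(x) for x in X}) == len(set(X))


def square(x):
    return x * x  # not injective on X below


def affine(x):
    return 2 * x + 1  # injective


X = range(-3, 4)
for f in (square, affine):
    assert identity_square_is_pullback(X, f) == is_injective(X, f)

assert not identity_square_is_pullback(X, square)  # (-1, 1) is an extra pair
assert identity_square_is_pullback(X, affine)
```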
The value of the characterization is that it transfers facts about pullbacks to facts about monics.
A construction or functor known to respect pullbacks will respect this particular square, and so
carry monics to monics; read in this direction, results established for limits yield results for
monics with no further work. This is the first appearance of monics, and the link to limits is the
reason they are introduced here rather than in isolation; their further properties, and the dual
notion obtained by reversing every arrow, are taken up once limits have been developed in full.
Three Constructions, One Shape
Products, equalizers, and pullbacks have been treated as three constructions. Set side by side, they
are visibly variations on a single theme. Each starts with a configuration of objects and maps; each
produces a new object equipped with maps to the objects of the configuration; and in each the new
object is characterized by a universal property, that every other candidate factors through it in
exactly one way. The constructions differ only in the shape of the starting configuration. Isolating
that shape is the step that unifies them.
Consider the starting data of each construction, stripped of the universal object it generates. For a
binary product it is a pair of objects with no maps between them,
\[
X \qquad Y .
\]
For an equalizer it is a parallel pair of maps,
\[
X \rightrightarrows Y ,
\]
carrying the labels \(s\) and \(t\). For a pullback it is a cospan,
\[
X \xrightarrow{\;s\;} Z \xleftarrow{\;t\;} Y .
\]
Each of these is a small diagram of objects and maps, and the construction in each case extracts a
universal object lying over that diagram. The difference between the three lies entirely in which
diagram one begins with.
A diagram of a given shape, drawn inside a category \(\mathscr{A}\), is exactly a functor
into \(\mathscr{A}\) from a category that encodes the shape. The simplest shapes make this concrete. An
object of \(\mathscr{A}\) amounts to a functor \(\mathbf{1} \to \mathscr{A}\) from the one-object
category, which labels the single object with the name of an object of \(\mathscr{A}\). A map in
\(\mathscr{A}\) amounts to a functor \(\mathbf{2} \to \mathscr{A}\), where \(\mathbf{2}\) is the
category with two objects \(0\) and \(1\), one non-identity map \(0 \to 1\), and nothing else: such a
functor selects a map of \(\mathscr{A}\) together with its source and target. The same idea names the
three shapes above. Let
\[
\mathbf{T} = \{\bullet \quad \bullet\}, \qquad
\mathbf{E} = \big\{\bullet \rightrightarrows \bullet\big\}, \qquad
\mathbf{P} = \big\{\bullet \rightarrow \bullet \leftarrow \bullet\big\}
\]
be, respectively, the category with two objects and no non-identity maps; the category with two
objects and two parallel non-identity maps; and the category with three objects and two non-identity
maps sharing a target. A functor out of each into \(\mathscr{A}\) then reproduces exactly the starting
data examined above — a pair of objects, a parallel pair of maps, a cospan — so the three
configurations are revealed as one kind of object, a functor into \(\mathscr{A}\), differing only in
their domain.
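The identification of diagrams with functors can be sketched for finite sets. In the Python fragment below each shape is recorded by its objects and its non-identity arrows, and a diagram assigns a finite set to each object and a function to each arrow. The representation and the helper `is_diagram` are hypothetical conveniences; because none of the three shapes contains a composable pair of non-identity arrows, a functor out of them involves no composition law beyond these assignments.

```python
# Each shape lists its objects and its non-identity arrows as name -> (source, target).
T = {"objects": ["0", "1"], "arrows": {}}
E = {"objects": ["0", "1"], "arrows": {"s": ("0", "1"), "t": ("0", "1")}}
P = {"objects": ["x", "y", "z"], "arrows": {"s": ("x", "z"), "t": ("y", "z")}}


def is_diagram(shape, on_objects, on_arrows):
    """Check that the data define a diagram of the given shape in finite sets.

    on_objects maps each object of the shape to a list (a finite set);
    on_arrows maps each arrow name to a dict (a function between those sets).
    """
    if set(on_objects) != set(shape["objects"]):
        return False
    if set(on_arrows) != set(shape["arrows"]):
        return False
    for name, (source, target) in shape["arrows"].items():
        func = on_arrows[name]
        if set(func) != set(on_objects[source]):
            return False
        if not set(func.values()) <= set(on_objects[target]):
            return False
    return True


# A pair of sets, a parallel pair of functions, and a cospan of functions.
pair = ({"0": [1, 2], "1": ["a"]}, {})
parallel = ({"0": [1, 2], "1": [0, 1]}, {"s": {1: 0, 2: 1}, "t": {1: 1, 2: 1}})
cospan = ({"x": [1, 2], "y": ["a"], "z": [0, 1]}, {"s": {1: 0, 2: 0}, "t": {"a": 1}})

assert is_diagram(T, *pair)
assert is_diagram(E, *parallel)
assert is_diagram(P, *cospan)
assert not is_diagram(E, *pair)  # shape E also requires the two arrows
```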
The shape is an index category
The category encoding the shape is called an index category, and a functor from
it into \(\mathscr{A}\) is a diagram of that shape in \(\mathscr{A}\). Products,
equalizers, and pullbacks correspond to the index categories \(\mathbf{T}\), \(\mathbf{E}\), and
\(\mathbf{P}\); changing the index category changes the construction while leaving the form of the
universal property intact. What remains constant across all three is the act of forming, from a
diagram, a single universal object that maps to the diagram and through which every compatible
family of maps factors uniquely. That constant act is the notion of a limit, and the three
constructions of this stage are its first instances, one index category each.
The universal object produced in each case has been described informally as lying "over" the diagram,
mapping to its objects compatibly with its maps. Making that description precise requires naming the
structure of maps from a fixed object to a whole diagram — a structure that specializes to the
projections of a product, the single map of an equalizer, and the compatible pair of a pullback. With
the common shape now identified, the next development isolates this structure under the name of a
cone and defines the limit of an arbitrary diagram as the universal cone over it, recovering products,
equalizers, and pullbacks as the limits indexed by \(\mathbf{T}\), \(\mathbf{E}\), and \(\mathbf{P}\),
and opening the same construction to every other index category.