The Category of Sets

Constructions in Set Equalizers and Quotients Small, Large, and Locally Small Foundations as an Interface

Constructions in Set

Almost every category met so far has been a category of sets-with-structure: groups, vector spaces, topological spaces, each a set carrying extra data, with structure-preserving maps between them. Beneath all of them sits the plainest category of all, the one whose objects are sets with no structure whatsoever and whose morphisms are arbitrary functions. We write it \(\mathbf{Set}\). A set, in this view, is a bag of points carrying no order, no algebra, and no notion of nearness; a function is simply an assignment of one point of the target to each point of the source. Composition of functions is associative and every set has an identity function, so the data form a category in the now-familiar sense.

Most of the universal constructions that the abstract theory treats in full generality can first be met in \(\mathbf{Set}\) in concrete form, where they reduce to transparent operations on bags of points — the surest way to recognize them later in their abstract dress.

Empty and one-element sets

There is a set \(\varnothing\) with no elements, and a set \(1\) with exactly one element. Each has a defining mapping property. Suppose we are asked to specify a function from a set \(A\) to a set \(B\): the task is to choose, for each element of \(A\), an element of \(B\). The larger \(A\) is, the more choices the task demands; the smaller \(A\) is, the fewer. When \(A\) is empty the task is vacuous — there are no elements to send anywhere — and there is exactly one way to perform a task with no steps. Hence for every set \(B\) there is exactly one function \(\varnothing \to B\). Dually, a function \(A \to 1\) must send every element of \(A\) to the sole element of \(1\), so there is exactly one such function for each \(A\). These are precisely the conditions met earlier in the abstract: \(\varnothing\) is an initial object of \(\mathbf{Set}\), and \(1\) is a terminal object.

The terminal set repays a second look. A function \(1 \to B\) is no more and no less than a choice of one element of \(B\): to give the function is to say where the single point of \(1\) lands. Functions \(1 \to B\) therefore correspond exactly to elements of \(B\). The notion of element, seemingly more primitive than the notion of function, is thus recovered as a special case of it — a point of \(B\) is a morphism out of the terminal object. This is the first sign of a recurring theme: structure that looks set-theoretically basic can be re-described arrow-theoretically.

Products and coproducts

Any two sets \(A\) and \(B\) have a product, the set of ordered pairs \[ A \times B = \{ (a, b) \mid a \in A,\ b \in B \}. \] More generally, for any family \((A_i)_{i \in I}\) indexed by a set \(I\) there is a product set \(\prod_{i \in I} A_i\) whose elements are the families \((a_i)_{i \in I}\) with \(a_i \in A_i\) for each \(i\), two such families being equal exactly when they agree coordinatewise.

Definition: Product of Sets

The product of sets \(A\) and \(B\) is the set \(A \times B\) of ordered pairs, equipped with the two projections \(\pi_A : A \times B \to A\) and \(\pi_B : A \times B \to B\) sending \((a,b)\) to \(a\) and to \(b\) respectively. It has the property that for any set \(X\) and any pair of functions \(f : X \to A\), \(g : X \to B\) there is a unique function \(\langle f, g \rangle : X \to A \times B\) with \(\pi_A \circ \langle f,g\rangle = f\) and \(\pi_B \circ \langle f,g\rangle = g\), namely \(x \mapsto (f(x), g(x))\).

Dually, any two sets have a coproduct, formed by placing all the points of both into one bag while keeping a record of which set each came from: \[ A + B = \{ (a, 0) \mid a \in A \} \cup \{ (b, 1) \mid b \in B \}. \] The tags \(0\) and \(1\) keep the two summands disjoint even when \(A\) and \(B\) overlap, which is why the coproduct is also called the disjoint union and written \(A \amalg B\). If \(A\) and \(B\) are finite with \(m\) and \(n\) elements, then \(A + B\) has \(m + n\) elements regardless of any overlap between the original sets — a feature that distinguishes it sharply from the ordinary union \(A \cup B\), which only makes sense once \(A\) and \(B\) sit inside a common set and which may identify elements the coproduct keeps apart.

Definition: Coproduct of Sets

The coproduct (or disjoint union) of sets \(A\) and \(B\) is the set \(A + B\), equipped with the two inclusions \(i : A \to A + B\) and \(j : B \to A + B\) recording each element together with its origin. It has the dual property: for any set \(X\) and any pair of functions \(f : A \to X\), \(g : B \to X\) there is a unique function \([f, g] : A + B \to X\) with \([f,g] \circ i = f\) and \([f,g] \circ j = g\) — the function that applies \(f\) to the points tagged from \(A\) and \(g\) to those tagged from \(B\).

Both constructions are instances of the adjoint pattern already isolated in the abstract. Recall that the diagonal functor \(\Delta : \mathbf{Set} \to \mathbf{Set} \times \mathbf{Set}\) sends a set \(A\) to the pair \((A, A)\), using the product category as its target. The product and the coproduct are exactly the right and left adjoints of \(\Delta\): the universal property of \(A \times B\) displayed above is precisely the bijection \[ (\mathbf{Set} \times \mathbf{Set})\bigl(\Delta X, (A,B)\bigr) \;=\; \mathbf{Set}(X, A) \times \mathbf{Set}(X, B) \;\cong\; \mathbf{Set}\bigl(X, A \times B\bigr), \] natural in \(X\): the first equality is the definition of morphisms in the product category — a map \(\Delta X \to (A,B)\) is a pair of functions \(X \to A\) and \(X \to B\) — and the second is the universal property pairing them into a single map \(X \to A \times B\). This exhibits \({\times}\) as right adjoint to \(\Delta\); the coproduct property is the mirror bijection exhibiting \({+}\) as left adjoint. The bare set-level operations of pairing and disjoint union are thus not ad hoc: each is forced, up to unique isomorphism, by an adjunction to the most elementary functor there is.

Function sets and the exponential

For any two sets \(A\) and \(B\) we may form the set \(A^B\) of all functions from \(B\) to \(A\). This is a product in disguise: \(A^B\) is the product \(\prod_{b \in B} A\) of the constant family that returns \(A\) at every index, since an element of that product is a choice of one element of \(A\) for each \(b \in B\) — that is, a function \(B \to A\). The notation is chosen to echo arithmetic, and the analogy is exact for finite sets: if \(A\) has \(m\) elements and \(B\) has \(n\), then \(A^B\) has \(m^n\) elements, while \(A \times B\) has \(mn\) and \(A + B\) has \(m + n\). The set-level isomorphisms \[ A^{B + C} \;\cong\; A^B \times A^C, \qquad (A^B)^C \;\cong\; A^{B \times C} \] hold for all sets, finite or not, and are the categorical counterparts of the laws of exponents.

The second of these displays the structure already encountered as currying: a function of two arguments \(C \times B \to A\) is the same thing as a function \(C \to A^B\) of one argument returning a function. In \(\mathbf{Set}\) this is the bijection \[ \mathbf{Set}(C \times B,\, A) \;\cong\; \mathbf{Set}(C,\, A^B), \] natural in \(C\), which says that \(-{} \times B\) is left adjoint to \((-)^B\). The function set \(A^B\) is the concrete realization of the object whose existence the abstract currying adjunction only posited.

Definition: Exponential Object in Set

For sets \(A\) and \(B\), the exponential \(A^B\) is the set of all functions \(B \to A\), equipped with the evaluation function \(\mathrm{ev} : A^B \times B \to A\), \((h, b) \mapsto h(b)\). It has the universal property that for every set \(C\) and function \(g : C \times B \to A\) there is a unique function \(\hat{g} : C \to A^B\) with \(\mathrm{ev} \circ (\hat{g} \times 1_B) = g\); explicitly, \(\hat{g}(c)\) is the function \(b \mapsto g(c, b)\). A category with finite products in which every such exponential exists is called cartesian closed, and \(\mathbf{Set}\) is the prototype.

Power sets and characteristic functions

Let \(2\) denote the two-element set \(1 + 1\), whose elements we name \(\mathtt{true}\) and \(\mathtt{false}\). Given a set \(A\) and a subset \(S \subseteq A\), there is a function \(\chi_S : A \to 2\), the characteristic function of \(S\), defined by \[ \chi_S(a) = \begin{cases} \mathtt{true} & \text{if } a \in S, \\ \mathtt{false} & \text{if } a \notin S. \end{cases} \] Conversely, a function \(f : A \to 2\) determines the subset \(f^{-1}\{\mathtt{true}\} = \{a \in A \mid f(a) = \mathtt{true}\}\). These two passages are mutually inverse: each subset yields a unique characteristic function and each function to \(2\) yields a unique subset. Subsets of \(A\) therefore correspond exactly to functions \(A \to 2\).

The functions from \(A\) to \(2\) form, by the previous construction, the set \(2^A\). Read through the correspondence just established, \(2^A\) is the set of all subsets of \(A\). We call it the power set of \(A\) and write it \(\mathcal{P}(A)\). The element \(2\) thus plays a double role — a two-element set, and the receptacle that classifies subsets — and the identification \(\mathcal{P}(A) \cong 2^A\) is the bridge between them. This small fact carries surprising weight: it is the seed of the general phenomenon by which a single object can classify the subobjects of every other, and it will return when the size of \(\mathcal{P}(A)\) is measured against the size of \(A\) later on this page.

Definition: Power Set

The power set of a set \(A\) is the set \(\mathcal{P}(A)\) of all subsets of \(A\). Via characteristic functions it is naturally identified with the function set \(2^A\), where \(2 = \{\mathtt{true}, \mathtt{false}\}\): a subset \(S \subseteq A\) corresponds to its characteristic function \(\chi_S : A \to 2\), and this correspondence is a bijection \(\mathcal{P}(A) \cong 2^A\).

Equalizers and Quotients

The constructions of the previous section build new sets out of old by pairing, tagging, or forming functions. Two further constructions go the other way: they carve out a part of a set, or collapse a set down, according to a condition. Both are governed by mapping properties of exactly the kind already seen, and both will reappear in the abstract theory — the first among the limits, the second among the colimits — so it is worth meeting them concretely while the meaning is still plain.

Equalizers

It would be convenient to be able to specify a subset of a set \(A\) by writing down a property its elements are to satisfy, \(S = \{ a \in A \mid \text{some property of } a \text{ holds} \}\). Giving a general account of what a property is turns out to be delicate, but there is one kind of property that is entirely transparent: the equality of two functions. Given a parallel pair of functions \(f, g : A \rightrightarrows B\) with the same source and target, the elements of \(A\) on which they agree form a subset \[ E = \{ a \in A \mid f(a) = g(a) \}. \] This subset is the equalizer of \(f\) and \(g\): the part of \(A\) on which the two functions are equalized. It comes with an inclusion \(e : E \hookrightarrow A\) satisfying \(f \circ e = g \circ e\), and it is the universal such object.

Definition: Equalizer in Set

The equalizer of a parallel pair \(f, g : A \to B\) is the set \(E = \{ a \in A \mid f(a) = g(a) \}\) together with the inclusion \(e : E \to A\). It has the universal property that \(f \circ e = g \circ e\), and that for any set \(X\) and function \(h : X \to A\) with \(f \circ h = g \circ h\) there is a unique function \(\tilde{h} : X \to E\) with \(e \circ \tilde{h} = h\). Concretely, the condition \(f \circ h = g \circ h\) says that \(h\) lands inside \(E\), and \(\tilde{h}\) is just \(h\) with its codomain restricted.

The equalizer turns a defining equation into a subobject: the kernel of a linear map, for instance, is the equalizer of that map and the zero map. The uniqueness clause is what makes the construction canonical rather than merely possible.

Quotients

Where the equalizer selects a part, the quotient collapses a whole. Quotient constructions are pervasive — quotient groups and rings in algebra, identification spaces in topology — but the most basic context is that of sets. Let \(A\) be a set and \(\sim\) an equivalence relation on \(A\). There is a set \(A/{\sim}\), the quotient of \(A\) by \(\sim\), whose elements are the equivalence classes. The passage from group theory is a special case: given a group \(G\) and a normal subgroup \(N\), the relation \(g \sim h \iff gh^{-1} \in N\) is an equivalence relation, and \(G/{\sim}\) is exactly the quotient group \(G/N\).

The quotient comes with a canonical map \[ p : A \longrightarrow A/{\sim}, \] sending each element to its equivalence class. It is surjective, and it has the defining feature \(p(a) = p(a') \iff a \sim a'\): the map identifies precisely the elements that \(\sim\) declares equivalent, and no others. This map is universal among all functions out of \(A\) that respect the relation.

Definition: Quotient Set and Its Universal Property

Let \(\sim\) be an equivalence relation on a set \(A\). The quotient set \(A/{\sim}\) is the set of equivalence classes, with canonical surjection \(p : A \to A/{\sim}\), \(a \mapsto [a]\). It has the universal property that for any function \(f : A \to B\) satisfying \[ a \sim a' \;\Longrightarrow\; f(a) = f(a') \qquad (\forall\, a, a' \in A), \] there is a unique function \(\bar{f} : A/{\sim} \to B\) with \(\bar{f} \circ p = f\), namely \([a] \mapsto f(a)\). The hypothesis is exactly what makes this assignment well defined: if \([a] = [a']\) then \(a \sim a'\), so \(f(a) = f(a')\), and the value does not depend on the chosen representative.

Thus, for any set \(B\), the functions \(A/{\sim} \to B\) correspond one-to-one with the functions \(A \to B\) that are constant on equivalence classes. This single fact is the engine behind the isomorphism theorems of algebra: a homomorphism that kills a normal subgroup factors uniquely through the quotient, because killing the subgroup is precisely the relation-respecting condition, and the induced map on the quotient is the content of the first isomorphism theorem.

Equalizer and quotient are dual in spirit. The equalizer is built from an inclusion into \(A\) and tests functions into \(A\); the quotient is built from a surjection out of \(A\) and tests functions out of \(A\). Indeed the quotient is a coequalizer: viewing the equivalence relation as a pair of projections \(R \rightrightarrows A\) from the set \(R \subseteq A \times A\) of related pairs, the canonical map \(p\) coequalizes them universally, the exact dual of the way the inclusion \(e\) equalizes \(f\) and \(g\). The one is a limit, assembled from a diagram by a universal cone into it; the other is a colimit, assembled by a universal cone out of it. The general definitions that subsume both — and that recover products, coproducts, and the empty and one-element sets as further special cases — belong to the abstract theory of limits and colimits, which takes these very examples in \(\mathbf{Set}\) as its point of departure.

Small, Large, and Locally Small

The definition of a category speaks of a collection of objects and, for each pair of objects, a collection of morphisms. So far the word collection has been left deliberately vague. The reason is that in many of the categories already met, these collections are too big to be sets. The collection of all groups, of all vector spaces, of all topological spaces — even the collection of all sets — cannot itself be a set, on pain of contradiction. To say this precisely, and to see why it matters, requires a controlled vocabulary for size.

We use the word class informally for any collection of mathematical objects. Every set is a class, but some classes are too big to be sets. A class is called small if it is a set, and large otherwise. The distinction is not a technicality invented for its own sake; it is forced on us by a theorem.

There is no largest set

The fact that drives everything in this section is that sets can always be enlarged. The power-set construction of the previous section does exactly that, and it does so strictly.

Theorem: Power Sets Are Strictly Larger

For every set \(X\) there is an injection \(X \to \mathcal{P}(X)\) but no bijection. Writing \(|X| \leq |Y|\) to mean that an injection \(X \to Y\) exists, this says \[ |X| \;<\; |\mathcal{P}(X)|, \] where \(<\) denotes that an injection exists but no bijection does.

Proof.

The injection is immediate: the map \(x \mapsto \{x\}\) sends \(X\) into \(\mathcal{P}(X)\), and it is injective because \(\{x\} = \{x'\}\) forces \(x = x'\). So \(|X| \leq |\mathcal{P}(X)|\). That there is no bijection is Cantor's theorem, established earlier by the diagonal argument: were \(f : X \to \mathcal{P}(X)\) a bijection, the set \(\{ x \in X \mid x \notin f(x) \}\) would be a subset hit by no \(x\), contradicting surjectivity. Having an injection but no bijection is exactly \(|X| < |\mathcal{P}(X)|\).

A short induction on this inequality shows there can be no largest set: from any set \(X\), the power set \(\mathcal{P}(X)\) is strictly larger, and the power set of that is larger still. More is true, and it is the form we need. Given any family of sets at all, there is a set outside it.

Proposition: The Isomorphism Classes of Sets Form a Large Class

Let \(I\) be a set and \((A_i)_{i \in I}\) a family of sets. Then there exists a set isomorphic to none of the \(A_i\).

Proof sketch.

Form the coproduct \(\sum_{i \in I} A_i\) of the whole family and take its power set, \(A = \mathcal{P}\bigl(\sum_{i \in I} A_i\bigr)\). For each index \(j\) the summand \(A_j\) injects into the coproduct, so \(|A_j| \leq \bigl|\sum_i A_i\bigr|\); and the coproduct injects into its power set strictly, so \(\bigl|\sum_i A_i\bigr| < |A|\) by the theorem above. Chaining the two, \(|A_j| < |A|\) for every \(j\), and in particular \(A_j \not\cong A\). Thus \(A\) is isomorphic to none of the family. The crux is the single strict step contributed by the power set; everything else is the bookkeeping of injections.

The consequence is the slogan to remember: any individual set is small, but the class of all sets is large. If the sets formed a set, that set would index a family containing every set, and the proposition just proved would manufacture a set outside it — a contradiction. The collection of all sets, even the collection of isomorphism classes of sets, is genuinely too big to be a set.

Locally small categories and hom-sets

With small and large in hand we can grade categories by size. A category \(\mathscr{A}\) is small if its class of all morphisms is small — a set — and large otherwise. (If the morphisms form a set then so do the objects, since each object is named by its identity morphism.) But the most useful size condition is weaker and almost always met: it asks only that the morphisms between any fixed pair of objects form a set.

Definition: Small and Large Categories

A category \(\mathscr{A}\) is small if the class of all its morphisms is a set, and large otherwise. A category \(\mathscr{A}\) is locally small if for every pair of objects \(A, B\) the class \(\mathscr{A}(A, B)\) of morphisms from \(A\) to \(B\) is a set. When \(\mathscr{A}\) is locally small, the set \(\mathscr{A}(A, B)\) is called the hom-set from \(A\) to \(B\).

Small implies locally small, but not conversely: \(\mathbf{Set}\) is locally small, because for any two sets \(A\) and \(B\) the functions from \(A\) to \(B\) form a set — the function set \(A^B\) constructed earlier — yet \(\mathbf{Set}\) is large, since its objects already form a large class. The same pattern holds for \(\mathbf{Grp}\), \(\mathbf{Vect}_k\), \(\mathbf{Ring}\), \(\mathbf{Top}\): a morphism between two structured sets is in particular a function between the underlying sets, so the morphisms between any fixed pair form a set, and each of these categories is locally small while being large overall. Local smallness is the hypothesis under which the hom-set notation \(\mathscr{A}(A, B)\) — used freely up to now — is literally a set, and it is exactly what is needed to regard \(\mathscr{A}(A, -)\) as a functor into \(\mathbf{Set}\), the construction on which the representable functors to come will rest.

Essentially small categories

There is a coarser notion that ignores the difference between a category and any equivalent one. A category is essentially small if it is equivalent to some small category. The category of finite sets is the standard example: it is large as it stands, since there are proper-class-many finite sets, but it is equivalent to the small category with one object per natural number, and so is essentially small. The question for \(\mathbf{Set}\) is whether the same rescue is possible — and it is not.

Definition: Essentially Small Category

A category is essentially small if it is equivalent to a small category. Equivalently, it is locally small and its class of isomorphism classes of objects is small.

Were \(\mathbf{Set}\) essentially small, its isomorphism classes of objects would form a set. But two sets are isomorphic in \(\mathbf{Set}\) exactly when they are in bijection, so the isomorphism classes are the cardinalities, and the proposition above shows these form a large class. Hence \(\mathbf{Set}\) is not essentially small: no small category is equivalent to it. The obstruction is precisely the strict growth of the power set — the same fact that, transported along an encoding of strings, separates the countably many Turing machines from the uncountably many languages. One diagonal argument, two faces.

The category of small categories

Smallness is what makes it safe to gather categories into a category. The collection of all categories and functors is itself a sprawling thing; restricting to the small ones yields a well-behaved object.

Definition: The Category Cat

\(\mathbf{Cat}\) is the category whose objects are the small categories and whose morphisms are the functors between them, with composition and identities inherited from the composition and identity functors.

The restriction to small categories is what keeps \(\mathbf{Cat}\) honest: a functor \(\mathscr{A} \to \mathscr{B}\) between small categories is itself a piece of set-sized data, so the morphisms of \(\mathbf{Cat}\) between any two objects form a set, and \(\mathbf{Cat}\) is locally small. This is the home in which the monoids of the adjunction page acquire their final description. A monoid is by definition a set equipped with an associative multiplication and a unit, and the one-object category it determines — one object, the monoid elements as endomorphisms, multiplication as composition — has a set of morphisms and is therefore small. The one-object categories form a full subcategory of \(\mathbf{Cat}\), and assigning to each monoid its one-object category is an equivalence between the category of monoids and that subcategory. A monoid and its one-object category are, categorically, the same thing seen twice; this is the doorway through which group actions and representations will later be recognized as functors out of a one-object category.

Foundations as an Interface

Step back from the individual constructions and look at what was actually used. Nowhere in this page was a set defined. We never said what a set is, nor what a function is. Instead we listed things one can do with them: form products and coproducts, take function sets and power sets, cut out equalizers, collapse quotients, recognize an empty set and a one-element set by their mapping properties. Every construction was specified by how it behaves under functions — by a universal property — and never by an internal account of what its elements really are.

This is a deliberate stance, and it is worth making explicit because it inverts the usual order of explanation. One might expect to define a set, then derive its properties. Definitions in mathematics normally depend on earlier definitions: a vector space is an abelian group with scalar multiplication, an abelian group is a group with a commutativity law, a group is a set with operations — and a set is a... The chain has to stop somewhere, for otherwise the words would rest on nothing. Set and function are where it stops. They are taken as primitive, not defined in terms of anything prior.

But if they are undefined, how can we reason about them rigorously? The resolution is to specify them not by what they are but by what they do. We agree on a list of properties — there is an initial object, there is a terminal object, products exist, function sets exist, power sets are strictly larger, and so on — and we treat those properties as the rules of the game. Disputes about the nature of sets are then settled not by appeal to intuition, which differs from person to person, but by consulting the list. Whether \(\varnothing\) is initial is not a question about the secret inner constitution of the empty set; it is a question about whether the agreed rules make it so.

The category as a specification

A working programmer will recognize the move. To use a data structure one does not need its source code; one needs its interface — the operations it supports and the laws those operations obey. A stack is anything with push, pop, and the law that popping returns the last pushed value; two implementations that honor the interface are interchangeable, however different their internals. The categorical treatment of \(\mathbf{Set}\) is an interface in exactly this sense. The objects and morphisms are opaque; what is public is the catalog of constructions and their universal properties. Any two structures satisfying the catalog are equivalent for all mathematical purposes, just as any two correct stack implementations are. To found mathematics on \(\mathbf{Set}\) is to program against an interface rather than against a representation.

The analogy is not loose. The discipline of specifying a structure by its mapping properties is the same discipline that lets a type system reason about programs it cannot see inside. A function type \(A^B\) is the type of functions from \(B\) to \(A\); the currying isomorphism \((A^B)^C \cong A^{B \times C}\) is, read on the other side of the correspondence between proofs and programs, the rule that a function of two arguments may be supplied one at a time; the classifier \(2\) that distinguishes subsets is the type of booleans. The constructions of this page are, simultaneously, the constructions of a programming language's type former — product types, sum types, function types, the type of propositions — and the coincidence is not an accident but a theorem of the deepest kind, the structural identity between a cartesian closed category and a typed calculus.

The size theory plays its part in this picture too. Local smallness is the guarantee that the morphisms between two objects can themselves be packaged as an object of \(\mathbf{Set}\) — that the interface can talk about its own collections of maps without overflowing. It is the condition that will let each object \(A\) of a locally small category be probed by the functor \(\mathscr{A}(A, -)\) that records how everything maps out of it, turning objects into the data of their relationships. That \(\mathbf{Set}\) is itself large, never to be tamed into a small category, is not a defect but the source of its expressive power: an interface rich enough to classify all sets cannot be one of the things it classifies. The constructions assembled here are the vocabulary; the probing functor is the grammar; and the reading of an object through the maps into and out of it is the sentence that the next stage of the theory will learn to speak.