Trees & Enumeration

  • Rooted Trees & Binary Trees
  • BSTs & the Comparison Model
  • Counting Trees
  • Labeled Trees: Cayley's Formula

Rooted Trees and Binary Trees

In Introduction to Graph Theory, we defined a tree as a connected acyclic graph and established that a tree on \(n\) vertices has exactly \(n - 1\) edges, with a unique path between any pair of vertices. We now impose additional structure by designating a special vertex as the root, transforming the tree from an undirected combinatorial object into a hierarchical, recursive data structure — one of the most pervasive abstractions in computer science.

Rooted Trees

Definition: Rooted Tree

A rooted tree is a pair \((T, r)\), where \(T = (V, E)\) is a tree and \(r \in V\) is a distinguished vertex called the root. For any vertex \(v \in V\), the unique path from \(r\) to \(v\) in \(T\) induces a natural hierarchy:

  • The depth of \(v\) is the length (number of edges) of the unique \(r\)-\(v\) path.
  • If \(v \neq r\) and \(u\) is the vertex immediately preceding \(v\) on the \(r\)-\(v\) path, then \(u\) is the parent of \(v\) and \(v\) is a child of \(u\).
  • A vertex with no children is a leaf; a non-leaf vertex is an internal vertex.

For any vertex \(v \in V\), the subtree rooted at \(v\), denoted \(T_v\), is the rooted tree consisting of \(v\) together with all its descendants — the vertices \(w\) whose unique \(r\)-\(w\) path passes through \(v\) — with \(v\) as the new root. In particular, the subtrees rooted at the children of any vertex \(v\) partition \(V(T_v) \setminus \{v\}\), making rooted trees naturally recursive objects.

Every non-root vertex has exactly one parent, and the root has no parent. The height of a rooted tree \((T, r)\) is the maximum depth over all vertices: \[ h(T) = \max_{v \in V} \operatorname{depth}(v). \] A rooted tree of height \(h\) has at least \(h + 1\) vertices (a path from root to deepest leaf) and the root sits at depth \(0\).
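Depth and height can be computed directly from these definitions. Below is a minimal Python sketch; the dictionary-of-children representation and the function names are illustrative, not part of the text:

```python
from collections import deque

def depths(children, root):
    """Breadth-first traversal from the root; depth(v) = edges on the root-v path."""
    d = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for c in children.get(u, []):
            d[c] = d[u] + 1
            queue.append(c)
    return d

def height(children, root):
    """Height h(T) = maximum depth over all vertices."""
    return max(depths(children, root).values())

# Root r with children a, b; a has one child c. The deepest vertex c sits at depth 2.
tree = {"r": ["a", "b"], "a": ["c"]}
assert depths(tree, "r") == {"r": 0, "a": 1, "b": 1, "c": 2}
assert height(tree, "r") == 2
```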

Binary Trees

Definition: Binary Tree

A binary tree is a rooted tree in which every vertex has at most two children, with each child distinguished as either the left child or the right child (so that a vertex with exactly one child still records whether that child sits in the left slot or the right slot). A binary tree is full (or proper) if every internal vertex has exactly two children.

The distinction between left and right matters: a single left child produces a different binary tree from a single right child, even though the underlying unrooted graph is the same. This ordered structure is what makes binary trees the natural setting for divide-and-conquer algorithms, where a problem splits into two subproblems at every step.

Proposition: Leaves of a Full Binary Tree

A full binary tree with \(k\) internal vertices has exactly \(k + 1\) leaves.

Proof:

Let \(T\) be a full binary tree with \(k\) internal vertices and \(\ell\) leaves. The total number of vertices is \(n = k + \ell\). Since \(T\) is a tree, it has \(n - 1\) edges. Every edge connects a vertex to its parent, and every non-root vertex has exactly one parent, so the number of edges equals \(n - 1\) (one edge per non-root vertex). Alternatively, orient each edge from parent to child (well-defined by the rooted structure) and count edges by their sources: each internal vertex is the source of exactly 2 outgoing edges (to its two children), and each leaf is the source of 0. Thus the total number of edges is \(2k\). Equating: \[ 2k = n - 1 = (k + \ell) - 1, \] which gives \(\ell = k + 1\). The boundary case \(k = 0\) (a single-vertex tree) is consistent: the condition "every internal vertex has exactly two children" is vacuously satisfied, and \(\ell = 1 = k + 1\).

A binary tree is called height-balanced (or AVL-balanced) if for every vertex, the heights of its left and right subtrees differ by at most one. A height-balanced binary tree on \(n\) vertices has height \(\Theta(\log n)\): the lower bound \(h \geq \lceil \log_2(n+1) \rceil - 1\) follows because at depth \(d\) there are at most \(2^d\) vertices (giving \(n \leq 2^{h+1} - 1\)), and the upper bound \(h = O(\log n)\) follows from the balance condition, which forces the number of vertices to grow at least exponentially with height. More precisely, the minimum number of vertices in a height-balanced tree of height \(h\) satisfies the Fibonacci-like recurrence \(N(h) = N(h-1) + N(h-2) + 1\) with \(N(0) = 1\) and \(N(1) = 2\), yielding \(N(h) \geq \varphi^h - 1\) where \(\varphi = (1 + \sqrt{5})/2\); hence \(h \leq \log_\varphi(n + 1) = O(\log n)\).
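The Fibonacci-like recurrence is easy to check numerically. A small sketch (the function name `min_avl_size` is ours) computes \(N(h)\) and verifies the stated bound \(N(h) \geq \varphi^h - 1\):

```python
def min_avl_size(h):
    """Minimum vertex count N(h) of a height-balanced tree of height h,
    via the recurrence N(h) = N(h-1) + N(h-2) + 1, N(0) = 1, N(1) = 2."""
    a, b = 1, 2
    if h == 0:
        return a
    for _ in range(h - 1):
        a, b = b, a + b + 1
    return b

phi = (1 + 5 ** 0.5) / 2
# N(h) grows at least like phi^h, which is what forces h = O(log n).
for h in range(20):
    assert min_avl_size(h) >= phi ** h - 1
```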

Spanning Trees

So far we have studied trees as hierarchical structures — rooted, with a clear parent-child directionality. But trees serve a second, equally fundamental role: as the minimal connected skeleton of an arbitrary graph. A communication network, for instance, may contain many redundant links; stripping away edges until no cycles remain — while preserving connectivity — yields a tree that spans the entire network. This shift in perspective, from rooted hierarchy to unrooted backbone, leads to the concept of a spanning tree and ultimately to the enumeration questions in the final section of this page.

Recall from Introduction to Graph Theory that a spanning subgraph of \(G = (V, E)\) is a subgraph that contains all vertices of \(G\).

Definition: Spanning Tree

A spanning tree of a connected graph \(G = (V, E)\) is a spanning subgraph \(T = (V, E')\) that is a tree.

Proposition: Existence of Spanning Trees

Every connected graph has a spanning tree.

Proof:

Let \(G = (V, E)\) be connected. If \(G\) is acyclic, it is already a tree (and a spanning tree of itself). Otherwise, \(G\) contains a cycle \(C\). Remove any edge \(e \in E(C)\); the resulting graph \(G - e\) remains connected: for any pair \(u, v\) of vertices, a \(u\)-\(v\) path in \(G\) that traverses \(e\) can be rerouted through the remaining edges of the cycle \(C\), yielding a \(u\)-\(v\) walk and hence a \(u\)-\(v\) path in \(G - e\). Repeat until no cycles remain. The process terminates because each step removes one edge, and the graph stays connected at every step. The result is a connected acyclic spanning subgraph — a spanning tree.
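The proof is constructive and can be mirrored in code: repeatedly delete an edge whose removal preserves connectivity (such an edge necessarily lies on a cycle) until none remains. A minimal sketch with illustrative names:

```python
from collections import deque

def is_connected(vertices, edges):
    """BFS reachability check from an arbitrary start vertex."""
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    start = next(iter(vertices))
    seen = {start}; queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); queue.append(w)
    return len(seen) == len(vertices)

def spanning_tree(vertices, edges):
    """Delete any edge whose removal keeps the graph connected; repeat."""
    edges = list(edges)
    removed = True
    while removed:
        removed = False
        for e in edges:
            rest = [f for f in edges if f != e]
            if is_connected(vertices, rest):
                edges = rest
                removed = True
                break
    return edges

V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
T = spanning_tree(V, E)
assert len(T) == len(V) - 1 and is_connected(V, T)  # connected with n-1 edges: a tree
```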

The number of spanning trees of a graph is a fundamental invariant. We will return to this question at the end of this page with Cayley's formula for complete graphs, and later with Kirchhoff's matrix tree theorem in the study of graph Laplacians.

Recursive Structure: From Parse Trees to Merkle Trees

The recursive definition of binary trees — a tree is either empty or a root with a left subtree and a right subtree — is the prototype for divide-and-conquer algorithms: a problem at a node is solved by combining solutions to subproblems at its children. This pattern manifests across computer science. Parse trees (abstract syntax trees) decompose source code into hierarchical grammatical structures, enabling compilers to analyze and transform programs. Expression trees represent arithmetic expressions with operators at internal nodes and operands at leaves, so that tree traversals (inorder, postorder) directly yield infix and postfix notations. In cryptography and distributed systems, Merkle trees — binary trees where each internal node stores the hash of its children — enable efficient and tamper-proof verification of large datasets: modifying a single leaf forces recomputation along the entire root-to-leaf path, but verification requires only \(O(\log n)\) hash comparisons.

Binary Search Trees and the Comparison Model

We now turn from the structure of binary trees to their use as a data structure for maintaining ordered data. The binary search tree imposes an ordering invariant that enables efficient search, insertion, and deletion — and the analysis of its worst case leads us naturally to the comparison model and a fundamental information-theoretic limit on sorting.

Binary Search Trees

Definition: Binary Search Tree (BST)

A binary search tree is a binary tree in which each vertex \(v\) stores a key \(k(v)\) drawn from a totally ordered set, with all stored keys distinct, satisfying the BST property: for every vertex \(v\),

  • every key in the left subtree of \(v\) is strictly less than \(k(v)\), and
  • every key in the right subtree of \(v\) is strictly greater than \(k(v)\).

The BST property is an invariant maintained across all operations. An inorder traversal of a BST visits the keys in sorted order — the tree is, in effect, a spatial encoding of a sorted sequence. Search proceeds by comparing the target key with the current node and recursing into the appropriate subtree.

Algorithm: BST-Search
Input: BST root \(v\), target key \(x\)
Output: node with key \(x\), or \(\texttt{null}\)

begin
  if \(v = \texttt{null}\): return \(\texttt{null}\)
  if \(k(v) = x\): return \(v\)
  if \(x < k(v)\): return BST-Search\((v.L,\, x)\)
  else: return BST-Search\((v.R,\, x)\)
end

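The search routine, together with the insertion procedure analysed in the proposition that follows, can be sketched in Python (class and function names are ours):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(v, x):
    """Descend one edge per comparison; O(h) on a tree of height h."""
    if v is None:
        return None
    if v.key == x:
        return v
    return bst_search(v.left, x) if x < v.key else bst_search(v.right, x)

def bst_insert(v, x):
    """Search to a null slot and attach a new leaf there (keys distinct)."""
    if v is None:
        return Node(x)
    if x < v.key:
        v.left = bst_insert(v.left, x)
    elif x > v.key:
        v.right = bst_insert(v.right, x)
    return v

root = None
for k in [5, 2, 8, 1, 3]:
    root = bst_insert(root, k)
assert bst_search(root, 3) is not None
assert bst_search(root, 7) is None
```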
Proposition: BST Operations

Search, insertion, and deletion in a binary search tree of height \(h\) each run in \(O(h)\) time.

Proof:

Search. BST-Search descends one edge per step from the root, comparing the target with \(k(v)\) at each visited vertex. The path from root to any vertex has length at most \(h\), so search visits at most \(h + 1\) vertices and runs in \(O(h)\).

Insertion. Run BST-Search for the key being inserted (\(O(h)\)). If the tree is empty, the new vertex becomes the root. Otherwise, since the key is absent, the search terminates at a null position — the empty left or right slot of some vertex \(p\). Attach a new vertex carrying the key in that slot. The BST property is preserved: at each vertex \(u\) visited during the descent, the search took the left branch iff the new key is less than \(k(u)\), placing the new vertex on the correct side of every ancestor.

Deletion. Locate the target vertex \(v\) by search (\(O(h)\)). Three cases:

  • (i) \(v\) is a leaf. Remove \(v\) (if \(v\) is the root, the tree becomes empty). The BST property is preserved trivially: no other vertex's relations are touched.
  • (ii) \(v\) has exactly one child \(c\). If \(v\) has a parent \(p\), splice \(v\) out by attaching \(c\) (and its subtree) to \(p\) in the slot \(v\) occupied; if \(v\) is the root, promote \(c\) to be the new root. Every key formerly in \(c\)'s subtree was already a key in \(v\)'s subtree, hence already on the correct side of \(k(p)\) (when \(p\) exists) and of every ancestor of \(p\); the relations within \(c\)'s subtree are unchanged. So the BST property is preserved.
  • (iii) \(v\) has two children. Let \(w\) be the vertex reached by descending from \(v\)'s right child along left children until none remain. We claim \(w\) holds the minimum key in \(v\)'s right subtree. We prove this by induction on the size of any non-empty BST subtree \(S\): the leftmost descendant of \(S\) holds the minimum key in \(S\). If the root \(r\) of \(S\) has no left child, then \(r\) is itself the leftmost descendant; every other key in \(S\) lies in \(r\)'s right subtree and exceeds \(k(r)\) by the BST property. Otherwise, by induction the leftmost descendant of \(r\)'s left subtree holds the minimum there; that key is less than \(k(r)\), and \(k(r)\) in turn is less than every key in \(r\)'s right subtree, so the leftmost descendant remains minimal in all of \(S\). Applying the claim with \(S = v\)'s right subtree gives \(w\) as desired.

Now replace \(k(v)\) by \(k(w)\); since \(w\) has no left child, deleting \(w\) itself falls into case (i) or (ii). It remains to verify that the modified tree satisfies the BST property:

  • Every key in \(v\)'s old left subtree was less than \(k(v)\), and \(k(v) < k(w)\) (since \(w\) was in \(v\)'s right subtree); by transitivity, every such key is less than \(k(w)\).
  • Every remaining key in \(v\)'s right subtree exceeds \(k(w)\) by the minimality of \(k(w)\) (\(w\) is removed; the rest are strictly greater since keys are distinct).
  • For any proper ancestor \(a\) of \(v\) (if \(v\) is the root, there is none): if \(v\) lay in \(a\)'s left subtree, every key in \(v\)'s subtree (including \(k(w)\)) was already less than \(k(a)\); if \(v\) lay in \(a\)'s right subtree, every such key exceeded \(k(a)\). In either case, replacing \(k(v)\) by \(k(w)\) preserves the relation with \(a\).

The descent to \(w\) is \(O(h)\), and the subsequent splice (case (i) or (ii) on \(w\)) is \(O(h)\). All three deletion cases run in \(O(h)\) total.

The crux is the height problem: the height of a BST on \(n\) keys ranges from \(\Theta(\log n)\) (balanced) to \(\Theta(n)\) (degenerate — a single chain). Inserting keys \(1, 2, 3, \ldots, n\) in order produces the degenerate case where every node has only a right child, and search degrades to linear scan. Self-balancing BSTs — AVL trees (which enforce a height-balance condition at every node) and Red-Black trees (which enforce coloring invariants that guarantee \(h \leq 2 \log_2(n+1)\)) — solve this problem by performing rotations during insertion and deletion to maintain \(O(\log n)\) height.
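The gap between the degenerate and typical cases is easy to observe empirically. A sketch (dictionary-based nodes; names and conventions are ours) compares sorted-order insertion against a random insertion order:

```python
import random

def insert(tree, x):
    """tree is None or a dict {'key': ..., 'L': ..., 'R': ...}."""
    if tree is None:
        return {"key": x, "L": None, "R": None}
    side = "L" if x < tree["key"] else "R"
    tree[side] = insert(tree[side], x)
    return tree

def height(tree):
    if tree is None:
        return -1  # convention: empty tree has height -1, a single node height 0
    return 1 + max(height(tree["L"]), height(tree["R"]))

n = 200
sorted_tree = None
for k in range(n):
    sorted_tree = insert(sorted_tree, k)   # degenerate: every node has only a right child

random.seed(0)
keys = list(range(n)); random.shuffle(keys)
random_tree = None
for k in keys:
    random_tree = insert(random_tree, k)

assert height(sorted_tree) == n - 1        # Theta(n): a single chain
assert height(random_tree) < height(sorted_tree)  # random order is far shallower
```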

The Decision Tree Model

We now shift perspective: rather than analyzing a specific data structure, we ask what any algorithm can achieve under a natural computational constraint.

Definition: Comparison-Based Algorithm

A comparison-based algorithm is one in which all decisions about the input are made solely on the basis of pairwise comparisons of the form "\(a_i \leq a_j\)?". The algorithm may not inspect the internal representation of keys (e.g., individual bits or digits).

Any comparison-based sorting algorithm on \(n\) elements can be modelled by a decision tree: a binary tree where each internal node represents a comparison "\(a_i \leq a_j\)?", the left child corresponds to "yes" and the right child to "no", and each leaf represents a specific permutation of the input — the algorithm's output on that execution path. Since the algorithm must produce the correct sorted output for every one of the \(n!\) input permutations, distinct input permutations must reach distinct leaves; hence the decision tree has at least \(n!\) leaves.

Theorem: Comparison-Based Sorting Lower Bound

Any comparison-based sorting algorithm on \(n\) elements requires \(\Omega(n \log n)\) comparisons in the worst case.

Proof:

The decision tree is a binary tree with at least \(n!\) leaves. A binary tree of height \(h\) has at most \(2^h\) leaves: by induction on \(h\), the root contributes a single leaf when \(h = 0\); for \(h \geq 1\), the left and right subtrees each have height at most \(h - 1\), hence each at most \(2^{h-1}\) leaves, giving \(2 \cdot 2^{h-1} = 2^h\) in total. Therefore the worst-case number of comparisons satisfies \[ 2^h \geq n! \quad \implies \quad h \geq \log_2(n!). \] Since \(\log_2 x\) is monotone increasing, for each \(k \geq 2\) we have \(\int_{k-1}^{k} \log_2 x \, dx \leq \log_2 k\). Summing over \(k = 2, \ldots, n\): \[ \log_2(n!) = \sum_{k=2}^{n} \log_2 k \;\geq\; \int_{1}^{n} \log_2 x \, dx \;=\; \frac{n \ln n - n + 1}{\ln 2} \;=\; \Omega(n \log n). \] Therefore any comparison-based sorting algorithm requires \(\Omega(n \log n)\) comparisons in the worst case.

This lower bound is tight: merge sort achieves \(O(n \log n)\) comparisons in every case, and is therefore optimal in the comparison model. Quicksort achieves \(O(n \log n)\) in expectation (over random pivot choices) but \(O(n^2)\) in the worst case. Heapsort also achieves \(O(n \log n)\) worst-case while sorting in place, by exploiting a tree-based data structure (the binary heap, defined next) in which each parent stores a key no greater than those of its children.
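The tightness claim can be checked numerically. A sketch (a comparison-counting merge sort; the mutable-counter convention is ours) compares the information-theoretic bound \(\lceil \log_2 n! \rceil\) with merge sort's worst-case comparison count \(n \log_2 n - n + 1\) when \(n\) is a power of two:

```python
import math, random

def merge_sort(a, counter):
    """Merge sort that counts key comparisons in counter[0]."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid], counter)
    right = merge_sort(a[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

random.seed(0)
n = 64
data = random.sample(range(10 ** 4), n)
c = [0]
assert merge_sort(data, c) == sorted(data)

lower = math.ceil(math.log2(math.factorial(n)))  # information-theoretic lower bound
worst = n * round(math.log2(n)) - n + 1          # merge sort worst case, n a power of 2
assert lower <= worst <= 2 * lower               # tight up to a small constant factor
assert c[0] <= worst
```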

Definition: Binary Heap

A (binary min-) heap is a complete binary tree — a binary tree in which every level is filled, except possibly the last, which is filled from left to right — in which every vertex stores a key, and each vertex's key is less than or equal to those of its children. This heap property ensures that the minimum key resides at the root.

The completeness condition forces the height of an \(n\)-vertex heap to be \(\lfloor \log_2 n \rfloor\), so insertion and extract-min — implemented by sift-up and sift-down along a single root-to-leaf path — both run in \(O(\log n)\) time.
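Sift-up and sift-down each move along a single root-to-leaf path of length \(O(\log n)\). A minimal array-based min-heap sketch (0-based indexing, so the children of index \(i\) are \(2i+1\) and \(2i+2\); function names are ours):

```python
def sift_up(h, i):
    """Swap upward while the parent's key exceeds the child's."""
    while i > 0 and h[(i - 1) // 2] > h[i]:
        h[(i - 1) // 2], h[i] = h[i], h[(i - 1) // 2]
        i = (i - 1) // 2

def sift_down(h, i):
    """Swap downward with the smaller child until the heap property holds."""
    n = len(h)
    while True:
        small = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and h[c] < h[small]:
                small = c
        if small == i:
            return
        h[i], h[small] = h[small], h[i]
        i = small

def push(h, x):
    h.append(x); sift_up(h, len(h) - 1)

def pop_min(h):
    h[0], h[-1] = h[-1], h[0]
    m = h.pop()
    if h:
        sift_down(h, 0)
    return m

heap = []
for x in [5, 1, 4, 2, 3]:
    push(heap, x)
assert [pop_min(heap) for _ in range(5)] == [1, 2, 3, 4, 5]
```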

The heap is the standard implementation of the priority queue abstract data type — the same structure that powers Dijkstra's algorithm in Network Flow & Walks on Graphs, where it enables efficient extraction of the vertex with minimum tentative distance.

Beyond Comparisons

The \(\Omega(n \log n)\) barrier is a statement about the comparison model, not about sorting in general. Algorithms that exploit the internal structure of keys can escape this bound entirely. Counting sort sorts \(n\) integers in the range \(\{0, 1, \ldots, k\}\) in \(O(n + k)\) time by using direct addressing: it allocates an array of size \(k + 1\), counts occurrences, and reads off the sorted output. Radix sort extends this to \(d\)-digit keys by applying counting sort to each digit position, achieving \(O(d(n + k))\). These algorithms do not compare pairs of elements — they bypass the decision tree model entirely.
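Both algorithms are short enough to state in full. A sketch assuming non-negative integer keys (function names are ours; the per-digit buckets play the role of counting sort's count array):

```python
def counting_sort(a, k):
    """Sort integers in {0, ..., k} in O(n + k) time with no comparisons."""
    counts = [0] * (k + 1)
    for x in a:
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)
    return out

def radix_sort(a, base=10):
    """LSD radix sort: one stable counting pass per digit, O(d(n + base))."""
    if not a:
        return a
    digits = len(str(max(a)))
    for d in range(digits):
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** d) % base].append(x)
        a = [x for b in buckets for x in b]
    return a

assert counting_sort([3, 0, 2, 3, 1], 3) == [0, 1, 2, 3, 3]
data = [170, 45, 75, 90, 802, 24, 2, 66]
assert radix_sort(data) == sorted(data)
```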

The Decision Tree as an Information-Theoretic Argument

The \(\Omega(n \log n)\) lower bound is fundamentally an information-theoretic argument. If the \(n!\) input permutations are equally likely, the initial uncertainty is the Shannon entropy \(\mathbb{H} = \log_2(n!)\) bits. Each binary comparison yields at most one bit of information — it partitions the remaining possibilities into two groups — so at least \(\log_2(n!) = \Omega(n \log n)\) comparisons are needed to identify the permutation. This style of reasoning — bounding the depth of any decision procedure by the entropy of the input — recurs throughout complexity theory. The same idea shows that any comparison-based algorithm for finding an element in a sorted array requires \(\Omega(\log n)\) comparisons, which binary search matches exactly.

Counting Trees: Catalan and Schröder Numbers

We have seen that binary trees, BSTs, and decision trees are ubiquitous computational structures. A natural question arises: how many distinct binary trees are there on \(n\) nodes? The answer leads us to one of the most celebrated sequences in combinatorics — the Catalan numbers — and reveals deep connections between tree enumeration, lattice paths, and generating functions. The tools here build directly on the counting principles and binomial coefficients developed in Introduction to Combinatorics.

The Catalan Numbers

Definition: Catalan Numbers

The \(n\)-th Catalan number \(C_n\) is the number of structurally distinct binary trees on \(n\) nodes (with \(C_0 = 1\) for the empty tree). The first several values are \[ C_0 = 1,\; C_1 = 1,\; C_2 = 2,\; C_3 = 5,\; C_4 = 14,\; C_5 = 42. \]

The recursive structure of binary trees yields a recurrence that determines the entire sequence from the base case \(C_0 = 1\).

Theorem: Catalan Recurrence

The Catalan numbers satisfy \[ C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1. \]

Proof (via root-splitting):

Consider a binary tree on \(n + 1\) nodes. Removing the root leaves a left subtree with \(i\) nodes (for some \(0 \leq i \leq n\)) and a right subtree with \(n - i\) nodes. By the Rule of Product, the number of trees with \(i\) nodes in the left subtree and \(n - i\) in the right is \(C_i \cdot C_{n-i}\). Summing over all possible values of \(i\): \[ C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}. \] The base case \(C_0 = 1\) accounts for the unique empty tree.
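The recurrence determines the whole sequence from \(C_0 = 1\); a few lines of code reproduce the values listed in the definition:

```python
def catalan(n_max):
    """Catalan numbers C_0, ..., C_{n_max} from the root-splitting recurrence
    C_{n+1} = sum_{i=0}^{n} C_i * C_{n-i}, with C_0 = 1."""
    C = [1]
    for n in range(n_max):
        C.append(sum(C[i] * C[n - i] for i in range(n + 1)))
    return C

assert catalan(5) == [1, 1, 2, 5, 14, 42]
```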

The Generating Function

Define the ordinary generating function \[ C(u) = \sum_{n=0}^{\infty} C_n \, u^n. \] The Catalan recurrence \(C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}\) is a convolution: by the definition of formal power series multiplication, the right-hand side is the coefficient of \(u^n\) in the product \(C(u) \cdot C(u) = C(u)^2\). Hence \(\sum_{n=0}^{\infty} C_{n+1} u^n = C(u)^2\). The left-hand side equals \(\bigl(C(u) - C_0\bigr)/u = \bigl(C(u) - 1\bigr)/u\). Therefore: \[ \frac{C(u) - 1}{u} = C(u)^2 \quad \implies \quad u \, C(u)^2 - C(u) + 1 = 0. \] This is a quadratic equation in \(C(u)\). By the quadratic formula: \[ C(u) = \frac{1 \pm \sqrt{1 - 4u}}{2u}. \] Since \(C(0) = C_0 = 1\), we need the root that remains finite as \(u \to 0\). The "\(+\)" root gives \((1 + \sqrt{1 - 4u})/(2u) \to \infty\) as \(u \to 0\), so we take the "\(-\)" root:

Theorem: Catalan Generating Function

The ordinary generating function of the Catalan numbers is \[ C(u) = \sum_{n=0}^{\infty} C_n \, u^n = \frac{1 - \sqrt{1 - 4u}}{2u}. \]

The first few coefficients of the right-hand side, expanded as a power series in \(u\), are \[ \frac{1 - \sqrt{1 - 4u}}{2u} = 1 + u + 2u^2 + 5u^3 + 14u^4 + 42u^5 + \cdots, \] recovering the Catalan sequence.

The Closed Form via the Reflection Principle

We now derive the closed-form expression \(C_n = \frac{1}{n+1}\binom{2n}{n}\) using a bijection between binary trees and lattice paths, combined with the reflection principle.

Definition: Dyck Path

A Dyck path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\) consisting of \(n\) up-steps \((1, +1)\) and \(n\) down-steps \((1, -1)\) that never passes below the \(x\)-axis.

Bijection: binary trees \(\leftrightarrow\) Dyck paths. We define a recursive encoding. Given a binary tree \(T\) on \(n\) nodes with root \(r\), left subtree \(L\), and right subtree \(R\), encode \(T\) as \[ \texttt{encode}(T) = \text{"("} \;\cdot\; \texttt{encode}(L) \;\cdot\; \text{")"} \;\cdot\; \texttt{encode}(R), \] with the empty tree mapping to the empty string. Each node contributes exactly one "(" and one ")", so the result is a string of length \(2n\). Interpreting "(" as an up-step and ")" as a down-step yields a Dyck path of semilength \(n\): the path never dips below the \(x\)-axis because at every prefix, the recursive nesting ensures that the number of opening parentheses is at least the number of closing parentheses. The encoding is invertible: scanning the string from left to right, the first "(" corresponds to the root, the balanced substring immediately following it encodes \(L\), the matching ")" closes the root, and the remaining substring encodes \(R\).

Equivalently, the encoding above produces a sequence of \(n\) pairs of balanced parentheses — a string in which every opening parenthesis has a matching closing one and, at every prefix, the number of opening parentheses is at least the number of closing ones. The Dyck path / balanced parenthesization correspondence (up = "(", down = ")") thus identifies binary trees, Dyck paths, and balanced parenthesizations as three views of the same object.
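The encoding can be verified exhaustively for small \(n\): enumerate all binary trees, encode each, and check that the codes are distinct balanced strings. A sketch (the nested-tuple tree representation is ours):

```python
def all_trees(n):
    """All binary trees on n nodes as nested pairs (left, right); None = empty tree."""
    if n == 0:
        return [None]
    trees = []
    for i in range(n):                 # i nodes on the left, n-1-i on the right
        for L in all_trees(i):
            for R in all_trees(n - 1 - i):
                trees.append((L, R))
    return trees

def encode(t):
    """encode(T) = '(' encode(L) ')' encode(R); the empty tree maps to ''."""
    if t is None:
        return ""
    L, R = t
    return "(" + encode(L) + ")" + encode(R)

def is_dyck(s):
    """Check the prefix condition: '(' = up-step, ')' = down-step, never below 0."""
    h = 0
    for ch in s:
        h += 1 if ch == "(" else -1
        if h < 0:
            return False
    return h == 0

codes = {encode(t) for t in all_trees(4)}
assert len(codes) == 14                        # C_4 = 14: the encoding is injective
assert all(is_dyck(s) and len(s) == 8 for s in codes)
```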

Theorem: Closed Form of Catalan Numbers

\[ C_n = \frac{1}{n+1}\binom{2n}{n}. \]

Proof (Reflection Principle):

By the bijection above, \(C_n\) equals the number of Dyck paths of length \(2n\). We count these by subtracting bad paths from the unrestricted total. The total number of lattice paths from \((0,0)\) to \((2n,0)\) with \(n\) up-steps and \(n\) down-steps (without any restriction) is \(\binom{2n}{n}\) — we simply choose which \(n\) of the \(2n\) steps are up-steps.

We must subtract the bad paths: those that touch or cross below the \(x\)-axis, i.e., paths that at some point reach height \(-1\). Consider such a bad path. Let \(P\) be the first point where the path reaches height \(-1\). Reflect the portion of the path from \(P\) onward in the line \(y = -1\): every up-step becomes a down-step and vice versa. The reflected path goes from \((0,0)\) to \((2n, -2)\), which consists of \((n-1)\) up-steps and \((n+1)\) down-steps.

This reflection is a bijection between bad paths from \((0,0)\) to \((2n,0)\) and all paths from \((0,0)\) to \((2n, -2)\). Indeed, every path ending at \((2n, -2)\) must touch the line \(y = -1\) (since each step changes \(y\) by exactly \(1\) and the endpoint is at \(y = -2\)); reflecting the portion from its first such point onward inverts the operation and recovers a bad path ending at \((2n, 0)\). The number of paths from \((0,0)\) to \((2n, -2)\) is \(\binom{2n}{n-1}\) (choosing which \(n - 1\) of \(2n\) steps are up-steps). Using the identity \(\binom{2n}{n-1} = \frac{n}{n+1}\binom{2n}{n}\) (immediate from the factorial form), \[ C_n = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}. \]
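The identity \(C_n = \binom{2n}{n} - \binom{2n}{n-1}\) can be confirmed by brute force over all \(\binom{2n}{n}\) choices of up-step positions:

```python
from itertools import combinations
from math import comb

def dyck_count(n):
    """Count paths with n up-steps and n down-steps that never dip below 0."""
    total = 0
    for ups in combinations(range(2 * n), n):
        up_set = set(ups)
        h, ok = 0, True
        for step in range(2 * n):
            h += 1 if step in up_set else -1
            if h < 0:
                ok = False
                break
        total += ok
    return total

# Brute force agrees with the reflection-principle formula.
for n in range(1, 7):
    assert dyck_count(n) == comb(2 * n, n) - comb(2 * n, n - 1)
```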

The reflection principle exemplifies a powerful pattern in enumerative combinatorics: counting hard-to-describe objects (here, the bad paths) by establishing a bijection with objects of a different, easily counted type, then applying the rule of sum to the partition of all paths into good and bad.

Schröder Numbers and Lattice Path Generalizations

The Catalan numbers count lattice paths with two types of steps: up \((1, +1)\) and down \((1, -1)\). A natural generalization is to allow a third type of step.

Definition: Schröder Path

A Schröder path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\) using up-steps \((1, +1)\), down-steps \((1, -1)\), and flat steps (or diagonal steps) \((2, 0)\), that never passes below the \(x\)-axis. A Schröder path with no flat steps on the \(x\)-axis is called a small Schröder path.

The number of Schröder paths of semilength \(n\) (i.e., from \((0,0)\) to \((2n,0)\)) is the large Schröder number \(S_n\), and the number of small Schröder paths is the little Schröder number (or super Catalan number) \(s_n\). The first several values are: \[ S_n \colon \; 1,\, 2,\, 6,\, 22,\, 90,\, 394,\, \ldots \qquad s_n \colon \; 1,\, 1,\, 3,\, 11,\, 45,\, 197,\, \ldots \] One observes that \(S_n = 2\, s_n\) for \(n \geq 1\). The distinction is that large Schröder paths permit flat steps on the \(x\)-axis while small Schröder paths forbid them; the precise factor of 2 can be established by a decomposition argument at the first return to the \(x\)-axis.
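Both sequences, and the factor of 2, can be verified by a dynamic program over lattice points that follows the path definition directly (the parameter name `allow_flat_on_axis` is ours):

```python
def schroder(n, allow_flat_on_axis=True):
    """Count Schröder paths from (0,0) to (2n,0) staying on or above the x-axis;
    set allow_flat_on_axis=False to count small Schröder paths."""
    W = 2 * n
    f = [[0] * (W + 2) for _ in range(W + 3)]  # f[x][y] = paths reaching (x, y)
    f[0][0] = 1
    for x in range(W):
        for y in range(W + 1):
            ways = f[x][y]
            if not ways:
                continue
            f[x + 1][y + 1] += ways                          # up-step (1, +1)
            if y > 0:
                f[x + 1][y - 1] += ways                      # down-step (1, -1)
            if x + 2 <= W and (y > 0 or allow_flat_on_axis):
                f[x + 2][y] += ways                          # flat step (2, 0)
    return f[W][0]

assert [schroder(n) for n in range(6)] == [1, 2, 6, 22, 90, 394]
assert [schroder(n, False) for n in range(6)] == [1, 1, 3, 11, 45, 197]
assert all(schroder(n) == 2 * schroder(n, False) for n in range(1, 6))
```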

In terms of tree enumeration, the little Schröder numbers \(s_n\) count plane trees (ordered rooted trees where each internal vertex has \(\geq 2\) children) on \(n + 1\) leaves. For instance, \(s_2 = 3\) counts the three plane trees with 3 leaves: the root with 3 children (all leaves), and the two trees where the root has 2 children with one being an internal node bearing 2 leaves. Where the Catalan numbers count binary trees on \(n\) nodes (each internal node having exactly 2 child slots, possibly empty), the little Schröder numbers count the more general case where branching can vary — each internal node may have 2, 3, or more children.

The large Schröder numbers satisfy the recurrence \[ (n+2)\, S_{n+1} = 3(2n+1)\, S_n - (n-1)\, S_{n-1}, \qquad S_0 = 1,\; S_1 = 2, \] as well as a convolution-type recurrence that mirrors the Catalan case: \[ S_{n+1} = S_{n} + \sum_{k=0}^{n} S_k \, S_{n-k}, \qquad S_0 = 1. \] The additional \(S_n\) term (compared to the pure convolution \(C_{n+1} = \sum C_k C_{n-k}\)) reflects the extra flat step available in Schröder paths. Both recurrences can be derived from the generating function \(S(x) = (1 - x - \sqrt{1 - 6x + x^2})/(2x)\); we state them without proof here, as the Schröder numbers play a supporting role in this page rather than a central one.

The Motzkin Numbers

Between Dyck paths and Schröder paths lies a third family. A Motzkin path of length \(n\) is a lattice path from \((0,0)\) to \((n, 0)\) using up-steps \((1,+1)\), down-steps \((1,-1)\), and horizontal steps \((1, 0)\), that never passes below the \(x\)-axis. The number of Motzkin paths of length \(n\) is the Motzkin number \(M_n\): \[ M_n \colon \; 1,\, 1,\, 2,\, 4,\, 9,\, 21,\, 51,\, \ldots \] In terms of trees, \(M_n\) counts the number of ordered trees on \(n+1\) vertices where each internal vertex has either one or two children (unary-binary trees). This positions the three families in a natural hierarchy of ordered trees: binary trees (Catalan), unary-binary trees (Motzkin), and plane trees with unrestricted branching of at least two children per internal vertex (little Schröder).
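A one-dimensional dynamic program over the current height counts Motzkin paths directly from the definition (the array `f[y]` holds the number of length-\(i\) prefixes ending at height \(y\)):

```python
def motzkin(n):
    """Count lattice paths (0,0) -> (n,0) with steps (1,1), (1,-1), (1,0)
    that never go below the x-axis."""
    f = [1] + [0] * (n + 1)          # before any step: one empty prefix at height 0
    for _ in range(n):
        g = [0] * (n + 2)
        for y, ways in enumerate(f):
            if not ways:
                continue
            g[y] += ways             # horizontal step (1, 0)
            g[y + 1] += ways         # up-step (1, +1)
            if y > 0:
                g[y - 1] += ways     # down-step (1, -1)
        f = g
    return f[0]

assert [motzkin(n) for n in range(7)] == [1, 1, 2, 4, 9, 21, 51]
```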

Catalan Numbers: A Universal Sequence in Discrete Mathematics

The Catalan numbers are one of the most frequently occurring sequences in all of combinatorics, appearing in at least 200 known combinatorial interpretations. Beyond binary trees, Dyck paths, and balanced parenthesizations, they count: triangulations of a convex \((n+2)\)-gon, non-crossing partitions of \(\{1, 2, \ldots, n\}\), and monotonic lattice paths that stay below the diagonal — the classical ballot problem. This universality is reflected algebraically in the generating function \(C(u) = (1 - \sqrt{1 - 4u})/(2u)\), which arises naturally whenever a combinatorial structure decomposes into two independent substructures of the same kind — the hallmark of the root-splitting recursion in binary trees.

Labeled Trees: Cayley's Formula

The Catalan and Schröder numbers count unlabeled trees — structures considered up to isomorphism. We now ask a different question: given a fixed vertex set \(\{1, 2, \ldots, n\}\), how many distinct labeled trees can be formed? Equivalently, how many spanning trees does the complete graph \(K_n\) have? The answer, due to Cayley, is one of the most elegant results in combinatorics.

Theorem: Cayley's Formula

The number of labeled trees on the vertex set \([n] = \{1, 2, \ldots, n\}\) is \(n^{n-2}\) for \(n \geq 2\).

For small values: with \(n = 2\) there is \(2^0 = 1\) tree (the single edge \(12\)); with \(n = 3\) there are \(3^1 = 3\) trees (star graphs centered at each vertex); with \(n = 4\) there are \(4^2 = 16\) labeled trees.

We prove Cayley's formula by exhibiting a bijection between labeled trees on \([n]\) and sequences in \([n]^{n-2}\), known as Prüfer sequences.

Prüfer Sequences

Definition: Prüfer Sequence

A Prüfer sequence of length \(n - 2\) is a sequence \((a_1, a_2, \ldots, a_{n-2})\) where each \(a_i \in [n] = \{1, 2, \ldots, n\}\).

Encoding: labeled tree \(\to\) Prüfer sequence. Given a labeled tree \(T\) on \([n]\):

  1. Find the leaf with the smallest label. Call it \(\ell\).
  2. Record the label of the unique neighbor of \(\ell\). This is the first entry \(a_1\) of the Prüfer sequence.
  3. Remove \(\ell\) and its incident edge from \(T\).
  4. Repeat steps 1-3 on the resulting tree with \(n - 1\) vertices, recording \(a_2\), then \(a_3\), and so on.
  5. Stop after \(n - 2\) iterations. At this point, exactly 2 vertices remain (connected by a single edge).

Example:

Consider the labeled tree on \(\{1, 2, 3, 4, 5, 6\}\) with edges \(\{12,\, 13,\, 24,\, 25,\, 36\}\).

Step 1. Leaves are \(\{4, 5, 6\}\). Smallest leaf: \(4\). Neighbor of \(4\) is \(2\). Record \(a_1 = 2\). Remove vertex \(4\).

Step 2. Leaves: \(\{5, 6\}\). Smallest leaf: \(5\). Neighbor of \(5\) is \(2\). Record \(a_2 = 2\). Remove vertex \(5\).

Step 3. Leaves: \(\{2, 6\}\). Smallest leaf: \(2\). Neighbor of \(2\) is \(1\). Record \(a_3 = 1\). Remove vertex \(2\).

Step 4. Leaves: \(\{1, 6\}\). Smallest leaf: \(1\). Neighbor of \(1\) is \(3\). Record \(a_4 = 3\). Remove vertex \(1\).

Remaining vertices: \(\{3, 6\}\). The Prüfer sequence is \((2, 2, 1, 3)\).
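The encoding loop is mechanical; a short sketch (the adjacency-set representation is ours) reproduces the worked example:

```python
def prufer_encode(n, edges):
    """Repeatedly delete the smallest-labelled leaf, recording its neighbour."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)
        neighbour = next(iter(adj[leaf]))
        seq.append(neighbour)
        adj[neighbour].discard(leaf)
        del adj[leaf]
    return seq

# The worked example: tree on {1, ..., 6} with edges {12, 13, 24, 25, 36}.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
assert prufer_encode(6, edges) == [2, 2, 1, 3]
```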

Decoding: Prüfer sequence \(\to\) labeled tree. Given a Prüfer sequence \((a_1, a_2, \ldots, a_{n-2})\):

  1. Initialize a degree counter: set \(d(v) = 1 + (\text{number of times } v \text{ appears in the sequence})\) for each \(v \in [n]\).
  2. For \(i = 1, 2, \ldots, n-2\): find the smallest \(v \in [n]\) with \(d(v) = 1\). Add edge \(\{v, a_i\}\) to the tree. Decrement \(d(v)\) and \(d(a_i)\) by 1.
  3. After processing all entries, exactly two vertices have \(d(v) = 1\). Connect them with the final edge.
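Decoding follows the degree-counter recipe verbatim. As a sanity check, the sequence \((2, 2, 1, 3)\) from the example recovers the original tree, and for \(n = 5\) all \(5^3 = 125\) sequences decode to 125 distinct trees, as Cayley's formula (stated above) predicts. The function name is ours:

```python
from itertools import product

def prufer_decode(seq):
    """Rebuild the tree from d(v) = 1 + (multiplicity of v in the sequence)."""
    n = len(seq) + 2
    degree = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        degree[a] += 1
    edges = []
    for a in seq:
        v = min(u for u in degree if degree[u] == 1)  # smallest current leaf
        edges.append((v, a))
        degree[v] -= 1   # v drops to degree 0 and is never chosen again
        degree[a] -= 1
    u, v = sorted(u for u in degree if degree[u] == 1)
    edges.append((u, v))  # the final edge between the two remaining vertices
    return edges

decoded = prufer_decode([2, 2, 1, 3])
assert sorted(tuple(sorted(e)) for e in decoded) == [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]

# Cayley check for n = 5: 125 sequences, 125 distinct trees.
trees = {frozenset(frozenset(e) for e in prufer_decode(list(s)))
         for s in product(range(1, 6), repeat=3)}
assert len(trees) == 125
```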

The encoding and decoding procedures are mutual inverses, establishing a bijection between labeled trees on \([n]\) and sequences in \([n]^{n-2}\). Verifying this rests on a few elementary observations.

A useful consequence: a vertex \(v\) appears in the Prüfer sequence of \(T\) exactly \(\deg_T(v) - 1\) times. Indeed, \(v\) is recorded as the neighbor of a removed leaf precisely when one of its incident edges is deleted; each such deletion reduces \(\deg(v)\) by 1, and the process terminates with \(v\) either still present (when \(\deg_T(v) - 1\) of its edges have been removed) or having become itself a leaf and been removed (recording its own neighbor, not \(v\) itself). In both cases \(v\) is recorded \(\deg_T(v) - 1\) times. In particular, leaves never appear in the sequence, and the two vertices remaining at the end are those whose degree counter has been decremented to 1 after all entries are processed.

The decoding implementation runs in \(O(n \log n)\) using a priority queue (or \(O(n)\) with a more careful implementation that scans the sequence once to compute final degree counts).

The vertex-frequency observation has a powerful consequence: it reduces the counting of trees with a prescribed degree sequence to a problem about arrangements of symbols in a sequence.

Theorem: Degree-Constrained Tree Counting

The number of labeled trees on \([n]\) in which vertex \(v\) has degree \(d_v\) (where \(d_v \geq 1\) and \(\sum_{v=1}^{n} d_v = 2(n-1)\)) is the multinomial coefficient \[ \binom{n-2}{d_1 - 1,\; d_2 - 1,\; \ldots,\; d_n - 1} = \frac{(n-2)!}{(d_1 - 1)!\,(d_2 - 1)! \cdots (d_n - 1)!}. \]

Proof:

Under the Prüfer bijection, vertex \(v\) appears exactly \(d_v - 1\) times in the sequence of length \(n - 2\). The number of sequences with this prescribed multiplicity is precisely the multinomial coefficient above — it counts the ways to arrange \(d_1 - 1\) copies of symbol \(1\), \(d_2 - 1\) copies of symbol \(2\), etc., in a sequence of length \(\sum (d_v - 1) = 2(n-1) - n = n - 2\).
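The multinomial count can be checked against a brute-force scan of all sequences for a small case, say \(n = 5\) with degree sequence \((3, 2, 1, 1, 1)\), whose entries sum to \(2(n-1) = 8\):

```python
from itertools import product
from math import factorial

def multinomial(parts):
    """(sum parts)! / (parts[0]! * parts[1]! * ...)."""
    out = factorial(sum(parts))
    for p in parts:
        out //= factorial(p)
    return out

# Count Prüfer sequences in [5]^3 in which vertex v appears d_v - 1 times.
d = [3, 2, 1, 1, 1]
count = sum(1 for s in product(range(1, 6), repeat=3)
            if all(s.count(v) == d[v - 1] - 1 for v in range(1, 6)))
assert count == multinomial([dv - 1 for dv in d])   # 3! / (2! 1!) = 3
```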

Proof of Cayley's Formula:

The Prüfer bijection establishes \(\{\text{labeled trees on } [n]\} \longleftrightarrow [n]^{n-2}\). The set \([n]^{n-2}\) has \(n^{n-2}\) elements (each of the \(n-2\) positions has \(n\) choices), giving the result immediately.


Cayley's formula counts the spanning trees of the complete graph \(K_n\). Recall from Introduction to Graph Theory that \(K_n\) is the graph on \(n\) vertices where every pair is connected by an edge — it has \(\binom{n}{2}\) edges, and Cayley's formula tells us that \(n^{n-2}\) of its \(\binom{\binom{n}{2}}{n-1}\) possible spanning subgraphs with \(n - 1\) edges are trees.

Algorithm: Prüfer Encoding
Input: labeled tree \(T\) on vertex set \([n]\)
Output: Prüfer sequence \((a_1, \ldots, a_{n-2})\)

begin
  for \(i = 1\) to \(n - 2\):
    \(\ell \leftarrow\) smallest leaf in \(T\)
    \(a_i \leftarrow\) neighbor of \(\ell\) in \(T\)
    remove \(\ell\) and its edge from \(T\)
  end for
  return \((a_1, a_2, \ldots, a_{n-2})\)
end

Kirchhoff's Matrix Tree Theorem and the Graph Laplacian

Cayley's formula counts spanning trees of \(K_n\). For a general graph \(G\), the number of spanning trees is given by Kirchhoff's matrix tree theorem: it equals any cofactor of the graph Laplacian \(L = D - A\), where \(D\) is the degree matrix and \(A\) is the adjacency matrix. Equivalently, if \(\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\) are the eigenvalues of \(L\) (with \(\lambda_1 = 0\)), then the number of spanning trees is \[ \tau(G) = \frac{1}{n}\prod_{i=2}^{n} \lambda_i. \] For \(K_n\), the Laplacian has eigenvalue \(n\) with multiplicity \(n-1\), giving \(\tau(K_n) = n^{n-1}/n = n^{n-2}\) — an independent algebraic proof of Cayley's formula. The matrix tree theorem connects combinatorial enumeration to spectral theory, a theme developed in the study of graph Laplacians and spectral methods. In probabilistic combinatorics, random spanning trees of a graph are sampled via Wilson's algorithm (which uses random walks) and have deep connections to the determinantal point processes that arise in statistical physics.