Trees & Enumeration

  • Rooted Trees & Binary Trees
  • BSTs & the Comparison Model
  • Counting Trees
  • Labeled Trees: Cayley's Formula

Rooted Trees and Binary Trees

In Introduction to Graph Theory, we defined a tree as a connected acyclic graph and established that a tree on \(n\) vertices has exactly \(n - 1\) edges, with a unique path between any pair of vertices. We now impose additional structure by designating a special vertex as the root, transforming the tree from an undirected combinatorial object into a hierarchical, recursive data structure — the single most pervasive abstraction in computer science.

Rooted Trees

Definition: Rooted Tree

A rooted tree is a pair \((T, r)\), where \(T = (V, E)\) is a tree and \(r \in V\) is a distinguished vertex called the root. For any vertex \(v \in V\), the unique path from \(r\) to \(v\) in \(T\) induces a natural hierarchy:

  • The depth of \(v\) is the length (number of edges) of the unique \(r\)-\(v\) path.
  • If \(v \neq r\) and \(u\) is the vertex immediately preceding \(v\) on the \(r\)-\(v\) path, then \(u\) is the parent of \(v\) and \(v\) is a child of \(u\).
  • A vertex with no children is a leaf; a non-leaf vertex is an internal vertex.

Every non-root vertex has exactly one parent, and the root has no parent. The height of a rooted tree \((T, r)\) is the maximum depth over all vertices: \[ h(T) = \max_{v \in V} \operatorname{depth}(v). \] A rooted tree of height \(h\) has at least \(h + 1\) vertices (a path from root to deepest leaf) and the root sits at depth \(0\).
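Depths (and hence the height) of a rooted tree can be computed with a single breadth-first traversal from the root. A minimal sketch, assuming the tree is stored as an undirected adjacency-list dict; the names `adj` and `depths_from_root` are illustrative:

```python
from collections import deque

def depths_from_root(adj, root):
    """BFS from the root of a tree (adjacency-list dict) to compute each
    vertex's depth: the number of edges on the unique root-to-vertex path."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:          # unvisited neighbor => child of u
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth

# A small rooted tree: root 0 with children 1 and 2; 3 is a child of 1.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
d = depths_from_root(adj, 0)
height = max(d.values())   # height = maximum depth over all vertices
```

Since a tree has \(n - 1\) edges, this traversal runs in \(O(n)\) time.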

Binary Trees

Definition: Binary Tree

A binary tree is a rooted tree in which every vertex has at most two children, designated as the left child and the right child. A binary tree is full (or proper) if every internal vertex has exactly two children.

The distinction between left and right matters: a single left child produces a different binary tree from a single right child, even though the underlying unrooted graph is the same. This ordered structure is what makes binary trees the natural setting for divide-and-conquer algorithms, where a problem splits into two subproblems at every step.

Proposition: Leaves of a Full Binary Tree

A full binary tree with \(k\) internal vertices has exactly \(k + 1\) leaves.

Proof:

Let \(T\) be a full binary tree with \(k\) internal vertices and \(\ell\) leaves, so the total number of vertices is \(n = k + \ell\). Count the edges of \(T\) in two ways. Since \(T\) is a tree, it has \(n - 1\) edges (equivalently, each non-root vertex contributes exactly one edge, the edge to its parent). Counting instead by sources: each internal vertex contributes exactly 2 edges (to its two children), and leaves contribute none, for a total of \(2k\) edges. Equating the two counts: \[ 2k = n - 1 = (k + \ell) - 1, \] which gives \(\ell = k + 1\). \(\blacksquare\)
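The identity \(\ell = k + 1\) is easy to check directly on small trees. A minimal sketch, using nested pairs `(left, right)` with `None` for the empty tree (an illustrative encoding, not one fixed by the text):

```python
def count_internal_and_leaves(t):
    """t is None (empty) or a pair (left, right); in a full binary tree
    every node has either zero or exactly two children."""
    if t is None:
        return (0, 0)
    left, right = t
    if left is None and right is None:
        return (0, 1)                      # a leaf
    ki, kl = count_internal_and_leaves(left)
    ri, rl = count_internal_and_leaves(right)
    return (1 + ki + ri, kl + rl)          # this node is internal

# A full binary tree with k = 3 internal vertices and 4 leaves:
leaf = (None, None)
t = ((leaf, leaf), (leaf, leaf))
k, l = count_internal_and_leaves(t)
```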

A binary tree is called height-balanced (or AVL-balanced) if for every vertex, the heights of its left and right subtrees differ by at most one. A height-balanced binary tree on \(n\) vertices has height \(\Theta(\log n)\): the lower bound \(h \geq \lceil \log_2(n+1) \rceil - 1\) follows because at depth \(d\) there are at most \(2^d\) vertices (giving \(n \leq 2^{h+1} - 1\)), and the upper bound \(h = O(\log n)\) follows from the balance condition, which forces the number of vertices to grow at least exponentially with height.

Spanning Trees

So far we have studied trees as hierarchical structures — rooted, ordered, with a clear parent-child directionality. But trees serve a second, equally fundamental role: as the minimal connected skeleton of an arbitrary graph. A communication network, for instance, may contain many redundant links; stripping away edges until no cycles remain — while preserving connectivity — yields a tree that spans the entire network. This shift in perspective, from rooted hierarchy to unrooted backbone, leads to the concept of a spanning tree and ultimately to the enumeration questions in the final section of this page.

Recall from Introduction to Graph Theory that a spanning subgraph of \(G = (V, E)\) is a subgraph that contains all vertices of \(G\).

Definition: Spanning Tree

A spanning tree of a connected graph \(G = (V, E)\) is a spanning subgraph \(T = (V, E')\) that is a tree.

Proposition: Existence of Spanning Trees

Every connected graph has a spanning tree.

Proof:

Let \(G = (V, E)\) be connected. If \(G\) is acyclic, it is already a tree. Otherwise, \(G\) contains a cycle \(C\). Remove any edge \(e \in E(C)\); the resulting graph \(G - e\) remains connected (since the two endpoints of \(e\) are still connected via the rest of the cycle). Repeat until no cycles remain. The process terminates because each step removes one edge, and the graph stays connected at every step. The result is a connected acyclic spanning subgraph — a spanning tree. \(\blacksquare\)
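The proof deletes cycle edges one at a time; in practice a spanning tree is usually grown directly, for example by BFS, which keeps only the first edge that reaches each new vertex and therefore never creates a cycle. A sketch under that substitution (the function name is illustrative):

```python
from collections import deque

def spanning_tree_edges(adj, start):
    """Grow a spanning tree of a connected graph by BFS: keep the first
    edge that discovers each vertex, so no cycle is ever formed."""
    seen = {start}
    tree = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree

# A 4-cycle with a chord: 5 edges, so any spanning tree drops 2 of them.
adj = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
edges = spanning_tree_edges(adj, 0)
```

The result always has exactly \(n - 1\) edges and covers every vertex, as a spanning tree must.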

The number of spanning trees of a graph is a fundamental invariant. We will return to this question at the end of this page with Cayley's formula for complete graphs, and later with Kirchhoff's matrix tree theorem in the study of graph Laplacians.

Recursive Structure: From Parse Trees to Merkle Trees

The recursive definition of binary trees — a tree is either empty or a root with a left subtree and a right subtree — is the prototype for divide-and-conquer algorithms: a problem at a node is solved by combining solutions to subproblems at its children. This pattern manifests across computer science. Parse trees (abstract syntax trees) decompose source code into hierarchical grammatical structures, enabling compilers to analyze and transform programs. Expression trees represent arithmetic expressions with operators at internal nodes and operands at leaves, so that tree traversals (inorder, postorder) directly yield infix and postfix notations. In cryptography and distributed systems, Merkle trees — binary trees where each internal node stores the hash of its children — enable efficient and tamper-proof verification of large datasets: modifying a single leaf forces recomputation along the entire root-to-leaf path, but verification requires only \(O(\log n)\) hash comparisons.
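The Merkle-tree idea can be sketched in a few lines with a standard hash function. This is a simplified illustration (bottom-up pairing with the last node duplicated on odd levels is one common convention; real systems differ in padding and domain separation):

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root of a list of byte strings: hash each leaf,
    then repeatedly hash sibling pairs until one digest remains."""
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:            # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(blocks)
# Tampering with any single leaf changes the root:
tampered = merkle_root([b"tx1", b"tx2", b"txX", b"tx4"])
```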

Binary Search Trees and the Comparison Model

We now turn from the structure of binary trees to their use as a data structure for maintaining ordered data. The binary search tree imposes an ordering invariant that enables efficient search, insertion, and deletion — and the analysis of its worst case leads us naturally to the comparison model and a fundamental information-theoretic limit on sorting.

Binary Search Trees

Definition: Binary Search Tree (BST)

A binary search tree is a binary tree in which each vertex \(v\) stores a key \(k(v)\) from a totally ordered set, satisfying the BST property: for every vertex \(v\),

  • every key in the left subtree of \(v\) is strictly less than \(k(v)\), and
  • every key in the right subtree of \(v\) is strictly greater than \(k(v)\).

The BST property is an invariant maintained across all operations. An inorder traversal of a BST visits the keys in sorted order — the tree is, in effect, a spatial encoding of a sorted sequence. Search proceeds by comparing the target key with the current node and recursing into the appropriate subtree.

Algorithm: BST-Search

Input: BST root \(v\); target key \(x\).
Output: the node with key \(x\), or \(\texttt{null}\).

  1. If \(v = \texttt{null}\): return \(\texttt{null}\).
  2. If \(k(v) = x\): return \(v\).
  3. If \(x < k(v)\): return BST-Search\((v.L,\, x)\); otherwise return BST-Search\((v.R,\, x)\).

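The pseudocode translates directly into a runnable sketch (the `Node` class and its attribute names are illustrative):

```python
class Node:
    """A BST node storing a key and optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(v, x):
    """Return the node with key x, or None, guided by the BST property."""
    if v is None:
        return None
    if v.key == x:
        return v
    # The BST property tells us which subtree can contain x.
    return bst_search(v.left if x < v.key else v.right, x)

#        8
#       / \
#      3   10
#     / \
#    1   6
root = Node(8, Node(3, Node(1), Node(6)), Node(10))
```

Each recursive call descends one level, so the running time is \(O(h)\).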
Proposition: BST Operations

Search, insertion, and deletion in a binary search tree of height \(h\) each run in \(O(h)\) time.

The crux is the height problem: the height of a BST on \(n\) keys ranges from \(\Theta(\log n)\) (balanced) to \(\Theta(n)\) (degenerate — a single chain). Inserting keys \(1, 2, 3, \ldots, n\) in order produces the degenerate case where every node has only a right child, and search degrades to linear scan. Self-balancing BSTs — AVL trees (which enforce a height-balance condition at every node) and Red-Black trees (which enforce coloring invariants that guarantee \(h \leq 2 \log_2(n+1)\)) — solve this problem by performing rotations during insertion and deletion to maintain \(O(\log n)\) height.

The Decision Tree Model

We now shift perspective: rather than analyzing a specific data structure, we ask what any algorithm can achieve under a natural computational constraint.

Definition: Comparison-Based Algorithm

A comparison-based algorithm is one in which all decisions about the input are made solely on the basis of pairwise comparisons of the form "\(a_i \leq a_j\)?". The algorithm may not inspect the internal representation of keys (e.g., individual bits or digits).

Any comparison-based sorting algorithm on \(n\) elements can be modelled by a decision tree: a binary tree where each internal node represents a comparison "\(a_i \leq a_j\)?", the left child corresponds to "yes" and the right child to "no", and each leaf represents a specific permutation of the input — the algorithm's output on that execution path. Since the algorithm must be able to produce every one of the \(n!\) possible permutations as output, the decision tree must have at least \(n!\) leaves.

Theorem: Comparison-Based Sorting Lower Bound

Any comparison-based sorting algorithm on \(n\) elements requires \(\Omega(n \log n)\) comparisons in the worst case.

Proof:

The decision tree is a binary tree with at least \(n!\) leaves. A binary tree of height \(h\) has at most \(2^h\) leaves, so the worst-case number of comparisons satisfies \[ 2^h \geq n! \quad \implies \quad h \geq \log_2(n!). \] By Stirling's approximation, \(n! \geq (n/e)^n\), so \[ \log_2(n!) \geq n \log_2(n/e) = n \log_2 n - n \log_2 e = \Omega(n \log n). \] Therefore any comparison-based sorting algorithm requires \(\Omega(n \log n)\) comparisons in the worst case. \(\blacksquare\)

This lower bound is tight: merge sort achieves \(O(n \log n)\) comparisons in every case, and is therefore optimal in the comparison model. Quicksort achieves \(O(n \log n)\) in expectation (over random pivot choices) but \(O(n^2)\) in the worst case. Heapsort also achieves \(O(n \log n)\) worst-case while sorting in place, by exploiting a tree-based data structure that maintains a partial order on elements.
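The bound \(\lceil \log_2 n! \rceil\) and an actual comparison count can be placed side by side numerically. A sketch with an instrumented merge sort (note that the lower bound constrains the worst case over all inputs; any single run may use fewer comparisons):

```python
import math

def merge_sort_comparisons(a):
    """Merge sort that also counts the key comparisons it performs."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, cl = merge_sort_comparisons(a[:mid])
    right, cr = merge_sort_comparisons(a[mid:])
    merged, i, j, c = [], 0, 0, 0
    while i < len(left) and j < len(right):
        c += 1                              # one comparison per loop step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, cl + cr + c

n = 10
lower_bound = math.ceil(math.log2(math.factorial(n)))   # ceil(log2 n!)
out, used = merge_sort_comparisons(list(range(n, 0, -1)))
```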

Definition: Binary Heap

A (binary min-) heap is a complete binary tree in which every vertex stores a key, and each vertex's key is less than or equal to those of its children. This heap property ensures that the minimum key resides at the root. Insertion and extract-min both run in \(O(\log n)\) time.

The heap is the standard implementation of the priority queue abstract data type — the same structure that powers Dijkstra's algorithm in Network Flow & Walks on Graphs, where it enables efficient extraction of the vertex with minimum tentative distance.
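Python's standard library exposes exactly this structure: the heapq module maintains a binary min-heap inside a plain list. A quick sketch:

```python
import heapq

# heapq implements a binary min-heap on an ordinary Python list.
h = []
for key in [5, 1, 4, 2, 3]:
    heapq.heappush(h, key)      # O(log n): sift the new key up

# Repeated extract-min drains the heap in sorted order -- heapsort.
mins = [heapq.heappop(h) for _ in range(len(h))]
```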

Beyond Comparisons

The \(\Omega(n \log n)\) barrier is a statement about the comparison model, not about sorting in general. Algorithms that exploit the internal structure of keys can escape this bound entirely. Counting sort sorts \(n\) integers in the range \(\{0, 1, \ldots, k\}\) in \(O(n + k)\) time by using direct addressing: it allocates an array of size \(k + 1\), counts occurrences, and reads off the sorted output. Radix sort extends this to \(d\)-digit keys by applying counting sort to each digit position, achieving \(O(d(n + k))\). These algorithms do not compare pairs of elements — they bypass the decision tree model entirely.
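Counting sort as described can be sketched directly; note that the code never compares two input elements:

```python
def counting_sort(a, k):
    """Sort integers in {0, ..., k} in O(n + k) time by direct addressing:
    count occurrences of each value, then read the counts back in order."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)
    return out

data = [3, 0, 2, 3, 1, 0, 2]
```

The array `count` plays the role of the direct-addressing table; the total work is one pass over the input plus one pass over the range.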

The Decision Tree as an Information-Theoretic Argument

The \(\Omega(n \log n)\) lower bound is fundamentally an information-theoretic argument. If the \(n!\) input permutations are equally likely, the initial uncertainty is the Shannon entropy \(\mathbb{H} = \log_2(n!)\) bits. Each binary comparison yields at most one bit of information — it partitions the remaining possibilities into two groups — so at least \(\log_2(n!) = \Omega(n \log n)\) comparisons are needed to identify the permutation. This style of reasoning — bounding the depth of any decision procedure by the entropy of the input — recurs throughout complexity theory. The same idea shows that any comparison-based algorithm for finding an element in a sorted array requires \(\Omega(\log n)\) comparisons, which binary search matches exactly.

Counting Trees: Catalan and Schröder Numbers

We have seen that binary trees, BSTs, and decision trees are ubiquitous computational structures. A natural question arises: how many distinct binary trees are there on \(n\) nodes? The answer leads us to one of the most celebrated sequences in combinatorics — the Catalan numbers — and reveals deep connections between tree enumeration, lattice paths, and generating functions. The tools here build directly on the counting principles and binomial coefficients developed in Introduction to Combinatorics.

The Catalan Numbers

Definition: Catalan Numbers

The \(n\)-th Catalan number is \[ C_n = \frac{1}{n+1}\binom{2n}{n}, \qquad n = 0, 1, 2, \ldots \] The first several values are \(C_0 = 1,\; C_1 = 1,\; C_2 = 2,\; C_3 = 5,\; C_4 = 14,\; C_5 = 42\).

The Catalan numbers satisfy a fundamental recurrence that arises from the recursive structure of binary trees.

Theorem: Catalan Recurrence

The Catalan numbers satisfy \[ C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1. \]

Proof (via root-splitting):

Consider a binary tree on \(n + 1\) nodes. The root has a left subtree with \(i\) nodes (for some \(0 \leq i \leq n\)) and a right subtree with \(n - i\) nodes. If we denote by \(C_n\) the number of structurally distinct binary trees on \(n\) nodes, then the number of trees with \(i\) nodes in the left subtree and \(n - i\) in the right is \(C_i \cdot C_{n-i}\) (by the Rule of Product). Summing over all possible values of \(i\): \[ C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}. \] The base case \(C_0 = 1\) accounts for the unique empty tree. \(\blacksquare\)
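The root-splitting recurrence and the closed form can be checked against each other numerically. A sketch:

```python
from math import comb

def catalan_by_recurrence(n_max):
    """C_0 = 1 and C_{n+1} = sum_{i=0}^{n} C_i * C_{n-i} (root-splitting)."""
    C = [1]
    for n in range(n_max):
        C.append(sum(C[i] * C[n - i] for i in range(n + 1)))
    return C

C = catalan_by_recurrence(10)
# Closed form: C_n = binom(2n, n) / (n + 1); the division is always exact.
closed = [comb(2 * n, n) // (n + 1) for n in range(11)]
```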

Theorem: Binary Tree Enumeration

The number of structurally distinct binary trees on \(n\) nodes is the Catalan number \(C_n\).

The Generating Function

The recurrence \(C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}\) is a convolution: the right-hand side is the coefficient of \(u^n\) in the product of the generating function with itself. Let us exploit this algebraic structure.

Define the ordinary generating function \(C(u) = \sum_{n=0}^{\infty} C_n \, u^n\). The recurrence tells us that \(C_{n+1}\) is the \(n\)-th coefficient of \(C(u)^2\), so \(\sum_{n=0}^{\infty} C_{n+1} u^n = C(u)^2\). The left-hand side equals \(\bigl(C(u) - C_0\bigr)/u = \bigl(C(u) - 1\bigr)/u\). Therefore: \[ \frac{C(u) - 1}{u} = C(u)^2 \quad \implies \quad u \, C(u)^2 - C(u) + 1 = 0. \] This is a quadratic equation in \(C(u)\). By the quadratic formula: \[ C(u) = \frac{1 \pm \sqrt{1 - 4u}}{2u}. \] Since \(C(0) = C_0 = 1\), we need the root that remains finite as \(u \to 0\). The "\(+\)" root gives \((1 + \sqrt{1 - 4u})/(2u) \to \infty\) as \(u \to 0\), so we take the "\(-\)" root:

Theorem: Catalan Generating Function

The ordinary generating function of the Catalan numbers is \[ C(u) = \sum_{n=0}^{\infty} C_n \, u^n = \frac{1 - \sqrt{1 - 4u}}{2u}. \]

This expression can be verified by expanding \(\sqrt{1 - 4u}\) via the generalized binomial series \((1 + z)^{1/2}\) with \(z = -4u\), yielding \[ \sqrt{1 - 4u} = \sum_{k=0}^{\infty} \binom{1/2}{k}(-4u)^k = 1 - 2u - 2u^2 - 4u^3 - 10u^4 - 28u^5 - \cdots \] so that \[ \frac{1 - \sqrt{1 - 4u}}{2u} = 1 + u + 2u^2 + 5u^3 + 14u^4 + 42u^5 + \cdots \] recovering the Catalan sequence.

The Closed Form via the Reflection Principle

We now derive the closed-form expression \(C_n = \frac{1}{n+1}\binom{2n}{n}\) using a bijection between binary trees and lattice paths, combined with the reflection principle.

Definition: Dyck Path

A Dyck path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\) consisting of \(n\) up-steps \((1, +1)\) and \(n\) down-steps \((1, -1)\) that never passes below the \(x\)-axis.

Bijection: binary trees \(\leftrightarrow\) Dyck paths. We define a recursive encoding. Given a binary tree \(T\) on \(n\) nodes with root \(r\), left subtree \(L\), and right subtree \(R\), encode \(T\) as \[ \texttt{encode}(T) = \text{"("} \;\cdot\; \texttt{encode}(L) \;\cdot\; \text{")"} \;\cdot\; \texttt{encode}(R), \] with the empty tree mapping to the empty string. Each node contributes exactly one "(" and one ")", so the result is a string of length \(2n\). Interpreting "(" as an up-step and ")" as a down-step yields a Dyck path of semilength \(n\): the path never dips below the \(x\)-axis because at every prefix, the recursive nesting ensures that the number of opening parentheses is at least the number of closing parentheses. The encoding is invertible: scanning the string from left to right, the first "(" corresponds to the root, the balanced substring immediately following it encodes \(L\), the matching ")" closes the root, and the remaining substring encodes \(R\).

Bijection: Dyck paths \(\leftrightarrow\) balanced parenthesizations. Replace each up-step by "(" and each down-step by ")". A Dyck path of length \(2n\) corresponds to a sequence of \(n\) pairs of balanced parentheses. The condition "never passes below the \(x\)-axis" translates to "at every prefix, the number of opening parentheses is at least the number of closing parentheses" — exactly the definition of a balanced parenthesization.

Theorem: Closed Form of Catalan Numbers

The number of Dyck paths of length \(2n\) is \(C_n = \frac{1}{n+1}\binom{2n}{n}\).

Proof (Reflection Principle):

The total number of lattice paths from \((0,0)\) to \((2n,0)\) with \(n\) up-steps and \(n\) down-steps (without any restriction) is \(\binom{2n}{n}\) — we simply choose which \(n\) of the \(2n\) steps are up-steps.

We must subtract the bad paths: those that touch or cross below the \(x\)-axis, i.e., paths that at some point reach height \(-1\). Consider such a bad path. Let \(P\) be the first point where the path reaches height \(-1\). Reflect the portion of the path after \(P\) in the line \(y = -1\): every up-step becomes a down-step and vice versa. The reflected path goes from \((0,0)\) to \((2n, -2)\), which consists of \((n-1)\) up-steps and \((n+1)\) down-steps.

This reflection is a bijection between bad paths from \((0,0)\) to \((2n,0)\) and all paths from \((0,0)\) to \((2n, -2)\). The number of the latter is \(\binom{2n}{n-1}\) (choosing which \(n-1\) of \(2n\) steps are up-steps). Therefore: \[ C_n = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}. \quad \blacksquare \]
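For small \(n\) the count of Dyck paths can be verified by exhaustive enumeration of step sequences, agreeing with the reflection-principle formula. A brute-force sketch:

```python
from math import comb
from itertools import product

def count_dyck(n):
    """Count {+1, -1} sequences of length 2n whose prefix sums never go
    negative and whose total is 0 -- i.e., Dyck paths of semilength n."""
    total = 0
    for steps in product((1, -1), repeat=2 * n):
        height, ok = 0, True
        for s in steps:
            height += s
            if height < 0:       # path dipped below the x-axis: bad path
                ok = False
                break
        if ok and height == 0:   # path ends back on the x-axis
            total += 1
    return total
```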

The reflection principle is a powerful technique: the central idea is to establish a bijection between "bad" objects and objects of a different, easily counted type. We used the same combinatorial thinking — counting via bijections and the subtraction principle — in Introduction to Combinatorics.

Schröder Numbers and Lattice Path Generalizations

The Catalan numbers count lattice paths with two types of steps: up \((1, +1)\) and down \((1, -1)\). A natural generalization is to allow a third type of step.

Definition: Schröder Path

A Schröder path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\) using up-steps \((1, +1)\), down-steps \((1, -1)\), and flat steps (or diagonal steps) \((2, 0)\), that never passes below the \(x\)-axis. A Schröder path with no flat steps on the \(x\)-axis is called a small Schröder path.

The number of Schröder paths of semilength \(n\) (i.e., from \((0,0)\) to \((2n,0)\)) is the large Schröder number \(S_n\), and the number of small Schröder paths is the little Schröder number (or super Catalan number) \(s_n\). The first several values are: \[ S_n \colon \; 1,\, 2,\, 6,\, 22,\, 90,\, 394,\, \ldots \qquad s_n \colon \; 1,\, 1,\, 3,\, 11,\, 45,\, 197,\, \ldots \] One observes that \(S_n = 2\, s_n\) for \(n \geq 1\). The distinction is that large Schröder paths permit flat steps on the \(x\)-axis while small Schröder paths forbid them; the precise factor of 2 can be established by a decomposition argument at the first return to the \(x\)-axis.

In terms of tree enumeration, the little Schröder numbers \(s_n\) count plane trees (ordered rooted trees where each internal vertex has \(\geq 2\) children) on \(n + 1\) leaves. For instance, \(s_2 = 3\) counts the three plane trees with 3 leaves: the root with 3 children (all leaves), and the two trees where the root has 2 children with one being an internal node bearing 2 leaves. Where the Catalan numbers count full binary trees (exactly 2 children per internal node), the little Schröder numbers count the more general case where branching can vary — each internal node may have 2, 3, or more children, but at least 2.

The large Schröder numbers satisfy the recurrence \[ (n+2)\, S_{n+1} = 3(2n+1)\, S_n - (n-1)\, S_{n-1}, \qquad S_0 = 1,\; S_1 = 2. \] There is also a convolution-type recurrence that mirrors the Catalan case: \[ S_{n+1} = S_{n} + \sum_{k=0}^{n} S_k \, S_{n-k}, \qquad S_0 = 1. \] The additional \(S_n\) term (compared to the pure convolution \(C_{n+1} = \sum C_k C_{n-k}\)) reflects the extra flat step available in Schröder paths.
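The two recurrences for the large Schröder numbers can be cross-checked numerically. A sketch building the sequence from the convolution form and testing it against the three-term recurrence:

```python
def schroder_convolution(n_max):
    """S_0 = 1 and S_{n+1} = S_n + sum_{k=0}^{n} S_k * S_{n-k}."""
    S = [1]
    for n in range(n_max):
        S.append(S[n] + sum(S[k] * S[n - k] for k in range(n + 1)))
    return S

S = schroder_convolution(6)
# Cross-check against (n+2) S_{n+1} = 3(2n+1) S_n - (n-1) S_{n-1}:
three_term_ok = all(
    (n + 2) * S[n + 1] == 3 * (2 * n + 1) * S[n] - (n - 1) * S[n - 1]
    for n in range(1, 6)
)
```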

The Motzkin Numbers

Between Dyck paths and Schröder paths lies a third family. A Motzkin path of length \(n\) is a lattice path from \((0,0)\) to \((n, 0)\) using up-steps \((1,+1)\), down-steps \((1,-1)\), and horizontal steps \((1, 0)\), that never passes below the \(x\)-axis. The number of Motzkin paths of length \(n\) is the Motzkin number \(M_n\): \[ M_n \colon \; 1,\, 1,\, 2,\, 4,\, 9,\, 21,\, 51,\, \ldots \] In terms of trees, \(M_n\) counts the number of ordered trees on \(n+1\) vertices where each internal vertex has either one or two children (unary-binary trees). This positions the three families in a natural hierarchy of step sets: Dyck paths use only the steps \((1,+1)\) and \((1,-1)\), Motzkin paths add the unit flat step \((1,0)\), and Schröder paths add the double-width flat step \((2,0)\).
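The Motzkin numbers satisfy a Catalan-style recurrence obtained by conditioning on the first step (the recurrence itself is standard, though not derived above): the first step is either flat, leaving a Motzkin path of length \(n\), or an up-step, which splits the path at its matching down-step. A sketch:

```python
def motzkin(n_max):
    """M_0 = 1; M_{n+1} = M_n + sum_{k=0}^{n-1} M_k * M_{n-1-k}.
    First step flat: M_n ways for the rest.  First step up: the path
    between it and its matching down-step is a Motzkin path of length k,
    and the remainder one of length n-1-k."""
    M = [1]
    for n in range(n_max):
        M.append(M[n] + sum(M[k] * M[n - 1 - k] for k in range(n)))
    return M
```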

Catalan Numbers: A Universal Sequence in Discrete Mathematics

The Catalan numbers are one of the most frequently occurring sequences in all of combinatorics, appearing in at least 200 known combinatorial interpretations. Beyond binary trees, Dyck paths, and balanced parenthesizations, they count: triangulations of a convex \((n+2)\)-gon, non-crossing partitions of \(\{1, 2, \ldots, n\}\), and monotonic lattice paths that stay below the diagonal — the classical ballot problem. This universality is reflected algebraically in the generating function \(C(u) = (1 - \sqrt{1 - 4u})/(2u)\), which arises naturally whenever a combinatorial structure decomposes into two independent substructures of the same kind — the hallmark of the root-splitting recursion in binary trees.

Labeled Trees: Cayley's Formula

The Catalan and Schröder numbers count unlabeled trees — structures considered up to isomorphism. We now ask a different question: given a fixed vertex set \(\{1, 2, \ldots, n\}\), how many distinct labeled trees can be formed? Equivalently, how many spanning trees does the complete graph \(K_n\) have? The answer, due to Cayley, is one of the most elegant results in combinatorics.

Theorem: Cayley's Formula

The number of labeled trees on the vertex set \([n] = \{1, 2, \ldots, n\}\) is \(n^{n-2}\) for \(n \geq 2\).

For small values: with \(n = 2\) there is \(2^0 = 1\) tree (the single edge \(12\)); with \(n = 3\) there are \(3^1 = 3\) trees (star graphs centered at each vertex); with \(n = 4\) there are \(4^2 = 16\) labeled trees.

We prove Cayley's formula by exhibiting a bijection between labeled trees on \([n]\) and sequences in \([n]^{n-2}\), known as Prüfer sequences.

Prüfer Sequences

Definition: Prüfer Sequence

A Prüfer sequence of length \(n - 2\) is a sequence \((a_1, a_2, \ldots, a_{n-2})\) where each \(a_i \in [n] = \{1, 2, \ldots, n\}\).

Encoding: labeled tree \(\to\) Prüfer sequence. Given a labeled tree \(T\) on \([n]\):

  1. Find the leaf with the smallest label. Call it \(\ell\).
  2. Record the label of the unique neighbor of \(\ell\). This is the first entry \(a_1\) of the Prüfer sequence.
  3. Remove \(\ell\) and its incident edge from \(T\).
  4. Repeat steps 1-3 on the resulting tree with \(n - 1\) vertices, recording \(a_2\), then \(a_3\), and so on.
  5. Stop after \(n - 2\) iterations. At this point, exactly 2 vertices remain (connected by a single edge).

Example:

Consider the labeled tree on \(\{1, 2, 3, 4, 5, 6\}\) with edges \(\{12,\, 13,\, 24,\, 25,\, 36\}\).

Step 1. Leaves are \(\{4, 5, 6\}\). Smallest leaf: \(4\). Neighbor of \(4\) is \(2\). Record \(a_1 = 2\). Remove vertex \(4\).

Step 2. Leaves: \(\{5, 6\}\). Smallest leaf: \(5\). Neighbor of \(5\) is \(2\). Record \(a_2 = 2\). Remove vertex \(5\).

Step 3. Leaves: \(\{2, 6\}\). Smallest leaf: \(2\). Neighbor of \(2\) is \(1\). Record \(a_3 = 1\). Remove vertex \(2\).

Step 4. Leaves: \(\{1, 6\}\). Smallest leaf: \(1\). Neighbor of \(1\) is \(3\). Record \(a_4 = 3\). Remove vertex \(1\).

Remaining vertices: \(\{3, 6\}\). The Prüfer sequence is \((2, 2, 1, 3)\).

Decoding: Prüfer sequence \(\to\) labeled tree. Given a Prüfer sequence \((a_1, a_2, \ldots, a_{n-2})\):

  1. Initialize a degree counter: set \(d(v) = 1 + (\text{number of times } v \text{ appears in the sequence})\) for each \(v \in [n]\).
  2. For \(i = 1, 2, \ldots, n-2\): find the smallest \(v \in [n]\) with \(d(v) = 1\). Add edge \(\{v, a_i\}\) to the tree. Decrement \(d(v)\) and \(d(a_i)\) by 1.
  3. After processing all entries, exactly two vertices have \(d(v) = 1\). Connect them with the final edge.

Note that Step 2 requires no inspection of the remaining sequence entries. The degree counter maintains the invariant that at step \(i\), \(d(v) = 1 + (\text{number of occurrences of } v \text{ among } a_i, a_{i+1}, \ldots, a_{n-2})\). Thus \(d(v) = 1\) guarantees that \(v\) does not appear in the unprocessed portion of the sequence, ensuring it is safe to remove \(v\) as a leaf. This invariant also shows that decoding runs in \(O(n \log n)\) time using a priority queue (or \(O(n)\) with a more careful implementation).

Key observation. A vertex \(v\) appears in the Prüfer sequence exactly \(\deg_T(v) - 1\) times, where \(\deg_T(v)\) is the degree of \(v\) in the tree. In particular, leaves (degree 1) never appear in the sequence, and the two vertices remaining at the end are those whose degree has been decremented to 1 after all entries are processed.
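Both procedures can be implemented in \(O(n \log n)\) time with a heap of current leaves, as noted above. A sketch (function names are illustrative), checked against the worked example:

```python
import heapq

def prufer_encode(n, edges):
    """Encode a labeled tree on {1,...,n}, given as an edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)                  # always remove the smallest leaf
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = adj[leaf].pop()
        adj[parent].discard(leaf)
        seq.append(parent)
        if len(adj[parent]) == 1:          # parent just became a leaf
            heapq.heappush(leaves, parent)
    return seq

def prufer_decode(n, seq):
    """Decode a Prüfer sequence back into the tree's edge list."""
    deg = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        deg[a] += 1                        # deg = 1 + (occurrences in seq)
    leaves = [v for v in deg if deg[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for a in seq:
        v = heapq.heappop(leaves)
        edges.append((v, a))
        deg[v] = 0                         # v has been removed from the tree
        deg[a] -= 1
        if deg[a] == 1:                    # a no longer occurs later in seq
            heapq.heappush(leaves, a)
    u, w = [v for v in deg if deg[v] == 1] # the final two vertices
    edges.append((u, w))
    return edges

# The worked example: tree on {1,...,6} with edges 12, 13, 24, 25, 36.
tree = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
seq = prufer_encode(6, tree)
```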

This observation has a powerful consequence: it reduces the counting of trees with a prescribed degree sequence to a problem about arrangements of symbols in a sequence.

Theorem: Degree-Constrained Tree Counting

The number of labeled trees on \([n]\) in which vertex \(v\) has degree \(d_v\) (where \(d_v \geq 1\) and \(\sum_{v=1}^{n} d_v = 2(n-1)\)) is the multinomial coefficient \[ \binom{n-2}{d_1 - 1,\; d_2 - 1,\; \ldots,\; d_n - 1} = \frac{(n-2)!}{(d_1 - 1)!\,(d_2 - 1)! \cdots (d_n - 1)!}. \]

Proof:

Under the Prüfer bijection, vertex \(v\) appears exactly \(d_v - 1\) times in the sequence of length \(n - 2\). The number of sequences with this prescribed multiplicity is precisely the multinomial coefficient above — it counts the ways to arrange \(d_1 - 1\) copies of symbol \(1\), \(d_2 - 1\) copies of symbol \(2\), etc., in a sequence of length \(\sum (d_v - 1) = 2(n-1) - n = n - 2\). \(\blacksquare\)

Proof of Cayley's Formula (two approaches):

Direct counting. The Prüfer bijection establishes \(\{\text{labeled trees on } [n]\} \longleftrightarrow [n]^{n-2}\). The set \([n]^{n-2}\) has \(n^{n-2}\) elements (each of the \(n-2\) positions has \(n\) choices), giving the result immediately.

Via the multinomial theorem. Alternatively, we sum the degree-constrained formula over all valid degree sequences. Setting \(e_v = d_v - 1\), the sum becomes \[ \sum_{\substack{e_1, \ldots, e_n \geq 0 \\ e_1 + \cdots + e_n = n-2}} \frac{(n-2)!}{e_1! \cdots e_n!} = (\underbrace{1 + 1 + \cdots + 1}_{n})^{n-2} = n^{n-2} \] by the multinomial theorem. This second approach is more powerful: it not only proves Cayley's formula but reveals how the \(n^{n-2}\) trees are distributed across degree sequences. \(\blacksquare\)
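The multinomial-theorem argument can be verified numerically for small \(n\) by summing the degree-constrained counts over all valid degree sequences. A brute-force sketch (illustrative only):

```python
from math import factorial
from itertools import product

def trees_with_degrees(degrees):
    """Multinomial count of labeled trees with prescribed degrees d_v:
    (n-2)! / prod (d_v - 1)!  -- an exact integer."""
    n = len(degrees)
    count = factorial(n - 2)
    for d in degrees:
        count //= factorial(d - 1)
    return count

n = 5
# Sum over all degree sequences with 1 <= d_v <= n-1 and sum d_v = 2(n-1).
total = sum(
    trees_with_degrees(d)
    for d in product(range(1, n), repeat=n)
    if sum(d) == 2 * (n - 1)
)
```

The sum recovers Cayley's count \(n^{n-2}\), distributed across degree sequences exactly as the theorem predicts.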

Cayley's formula counts the spanning trees of the complete graph \(K_n\). Recall from Introduction to Graph Theory that \(K_n\) is the graph on \(n\) vertices where every pair is connected by an edge — it has \(\binom{n}{2}\) edges, and Cayley's formula tells us that \(n^{n-2}\) of its \(\binom{\binom{n}{2}}{n-1}\) possible spanning subgraphs with \(n - 1\) edges are trees.

Algorithm: Prüfer Encoding

Input: labeled tree \(T\) on vertex set \([n]\).
Output: Prüfer sequence \((a_1, \ldots, a_{n-2})\).

  1. For \(i = 1\) to \(n - 2\):
     a. \(\ell \leftarrow\) smallest leaf in \(T\).
     b. \(a_i \leftarrow\) the neighbor of \(\ell\) in \(T\).
     c. Remove \(\ell\) and its incident edge from \(T\).
  2. Return \((a_1, a_2, \ldots, a_{n-2})\).

Kirchhoff's Matrix Tree Theorem and the Graph Laplacian

Cayley's formula counts spanning trees of \(K_n\). For a general graph \(G\), the number of spanning trees is given by Kirchhoff's matrix tree theorem: it equals any cofactor of the graph Laplacian \(L = D - A\), where \(D\) is the degree matrix and \(A\) is the adjacency matrix. Equivalently, if \(\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\) are the eigenvalues of \(L\) (with \(\lambda_1 = 0\)), then the number of spanning trees is \[ \tau(G) = \frac{1}{n}\prod_{i=2}^{n} \lambda_i. \] For \(K_n\), the Laplacian has eigenvalue \(n\) with multiplicity \(n-1\), giving \(\tau(K_n) = n^{n-1}/n = n^{n-2}\) — an independent algebraic proof of Cayley's formula. The matrix tree theorem connects combinatorial enumeration to spectral theory, a theme developed in the study of graph Laplacians and spectral methods. In probabilistic combinatorics, random spanning trees of a graph are sampled via Wilson's algorithm (which uses random walks) and have deep connections to the determinantal point processes that arise in statistical physics.