Rooted Trees and Binary Trees
In Introduction to Graph Theory, we defined a
tree as a connected acyclic graph and established that a tree on \(n\) vertices has exactly
\(n - 1\) edges, with a unique path between any pair of vertices. We now impose additional structure by
designating a special vertex as the root, transforming the tree from an undirected
combinatorial object into a hierarchical, recursive data structure — the single most pervasive
abstraction in computer science.
Rooted Trees
Definition: Rooted Tree
A rooted tree is a pair \((T, r)\), where \(T = (V, E)\) is a tree and \(r \in V\) is a
distinguished vertex called the root. For any vertex \(v \in V\), the unique path from
\(r\) to \(v\) in \(T\) induces a natural hierarchy:
- The depth of \(v\) is the length (number of edges) of the unique \(r\)-\(v\) path.
- If \(v \neq r\) and \(u\) is the vertex immediately preceding \(v\) on the \(r\)-\(v\) path, then
\(u\) is the parent of \(v\) and \(v\) is a child of \(u\).
- A vertex with no children is a leaf; a non-leaf vertex is an internal vertex.
Every non-root vertex has exactly one parent, and the root has no parent. The
height of a rooted tree \((T, r)\) is the maximum depth over all vertices:
\[
h(T) = \max_{v \in V} \operatorname{depth}(v).
\]
A rooted tree of height \(h\) has at least \(h + 1\) vertices (a path from root to deepest leaf)
and the root sits at depth \(0\).
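These notions translate directly into code. The following minimal sketch (the vertex names and children mapping are illustrative) computes the depth of every vertex by walking down from the root, from which the height is the maximum depth:

```python
def depths(children, root):
    """Depth of every vertex: the length of the unique root-to-v path."""
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            depth[v] = depth[u] + 1  # a child sits one edge deeper than its parent
            stack.append(v)
    return depth

# Hypothetical rooted tree: r has children a and b; a has child c.
children = {"r": ["a", "b"], "a": ["c"]}
d = depths(children, "r")
height = max(d.values())  # h(T) = max depth
```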
Binary Trees
Definition: Binary Tree
A binary tree is a rooted tree in which every vertex has at most two children,
designated as the left child and the right child. A binary tree is
full (or proper) if every internal vertex has exactly two children.
The distinction between left and right matters: a single left child produces a different binary tree
from a single right child, even though the underlying unrooted graph is the same. This ordered
structure is what makes binary trees the natural setting for divide-and-conquer algorithms, where
a problem splits into two subproblems at every step.
Proposition: Leaves of a Full Binary Tree
A full binary tree with \(k\) internal vertices has exactly \(k + 1\) leaves.
Proof:
Let \(T\) be a full binary tree with \(k\) internal vertices and \(\ell\) leaves, so the total
number of vertices is \(n = k + \ell\). We count the edges of \(T\) in two ways. First, every
non-root vertex has exactly one parent, so there is exactly one edge per non-root vertex:
\(n - 1\) edges in total (consistent with \(T\) being a tree). Second, count edges by their upper
endpoints: each internal vertex contributes exactly 2 edges (to its two children), and leaves
contribute none, for a total of \(2k\) edges. Equating:
\[
2k = n - 1 = (k + \ell) - 1,
\]
which gives \(\ell = k + 1\). \(\blacksquare\)
A binary tree is called height-balanced (or AVL-balanced) if for
every vertex, the heights of its left and right subtrees differ by at most one. A height-balanced
binary tree on \(n\) vertices has height \(\Theta(\log n)\): the lower bound
\(h \geq \lceil \log_2(n+1) \rceil - 1\) follows because at depth \(d\) there are at most
\(2^d\) vertices (giving \(n \leq 2^{h+1} - 1\)), and the upper bound \(h = O(\log n)\)
follows from the balance condition, which forces the number of vertices to grow at least
exponentially with height.
Spanning Trees
So far we have studied trees as hierarchical structures — rooted, ordered, with a
clear parent-child directionality. But trees serve a second, equally fundamental role: as the
minimal connected skeleton of an arbitrary graph. A communication network, for
instance, may contain many redundant links; stripping away edges until no cycles remain — while
preserving connectivity — yields a tree that spans the entire network. This shift in perspective,
from rooted hierarchy to unrooted backbone, leads to the concept of a spanning tree and
ultimately to the enumeration questions in the final section of this page.
Recall from Introduction to Graph Theory that a
spanning subgraph of \(G = (V, E)\) is a subgraph that contains all vertices of \(G\).
Definition: Spanning Tree
A spanning tree of a connected graph \(G = (V, E)\) is a spanning subgraph
\(T = (V, E')\) that is a tree.
Proposition: Existence of Spanning Trees
Every connected graph has a spanning tree.
Proof:
Let \(G = (V, E)\) be connected. If \(G\) is acyclic, it is already a tree. Otherwise,
\(G\) contains a cycle \(C\). Remove any edge \(e \in E(C)\); the resulting graph \(G - e\) remains
connected (since the two endpoints of \(e\) are still connected via the rest of the cycle). Repeat
until no cycles remain. The process terminates because each step removes one edge, and the graph
stays connected at every step. The result is a connected acyclic spanning subgraph — a spanning tree.
\(\blacksquare\)
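The proof deletes cycle edges until none remain; in practice a spanning tree is usually built directly. As a complementary sketch (the example graph is illustrative), a breadth-first search keeps exactly the edges used to discover new vertices, which can never form a cycle:

```python
from collections import deque

def bfs_spanning_tree(adj, start):
    """Edge set of a spanning tree of a connected graph (adjacency lists).

    Instead of removing cycle edges as in the proof, BFS grows the tree:
    an edge is kept only when it discovers a new vertex, so no cycle can form,
    and connectivity guarantees every vertex is eventually reached.
    """
    seen = {start}
    tree_edges = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges

# A 4-cycle with a chord: 5 edges, so 2 redundant edges get dropped.
adj = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
tree = bfs_spanning_tree(adj, 0)
```

The resulting tree has exactly \(n - 1 = 3\) edges and spans all four vertices.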
The number of spanning trees of a graph is a fundamental invariant. We will return to this
question at the end of this page with Cayley's formula for complete graphs, and later with
Kirchhoff's matrix tree theorem in the study of
graph Laplacians.
Recursive Structure: From Parse Trees to Merkle Trees
The recursive definition of binary trees — a tree is either empty or a root with a left subtree
and a right subtree — is the prototype for divide-and-conquer algorithms: a problem at a node is
solved by combining solutions to subproblems at its children. This pattern manifests across computer
science. Parse trees (abstract syntax trees) decompose source code into hierarchical
grammatical structures, enabling compilers to analyze and transform programs.
Expression trees represent arithmetic expressions with operators at internal nodes
and operands at leaves, so that tree traversals (inorder, postorder) directly yield infix and
postfix notations. In cryptography and distributed systems, Merkle trees — binary
trees where each internal node stores the hash of its children — enable efficient and tamper-proof
verification of large datasets: modifying a single leaf forces recomputation along the entire
root-to-leaf path, but verification requires only \(O(\log n)\) hash comparisons.
Binary Search Trees and the Comparison Model
We now turn from the structure of binary trees to their use as a data structure for
maintaining ordered data. The binary search tree imposes an ordering invariant that enables efficient
search, insertion, and deletion — and the analysis of its worst case leads us naturally to the
comparison model and a fundamental information-theoretic limit on sorting.
Binary Search Trees
Definition: Binary Search Tree (BST)
A binary search tree is a binary tree in which each vertex \(v\) stores a
key \(k(v)\) from a totally ordered set, satisfying the BST property: for
every vertex \(v\),
- every key in the left subtree of \(v\) is strictly less than \(k(v)\), and
- every key in the right subtree of \(v\) is strictly greater than \(k(v)\).
The BST property is an invariant maintained across all operations. An inorder traversal
of a BST visits the keys in sorted order — the tree is, in effect, a spatial encoding of a sorted
sequence. Search proceeds by comparing the target key with the current node and recursing into
the appropriate subtree.
Algorithm: BST-Search
Input: BST root \(v\), target key \(x\);
Output: node with key \(x\), or \(\texttt{null}\);
begin
if \(v = \texttt{null}\):
return \(\texttt{null}\);
if \(k(v) = x\):
return \(v\);
if \(x < k(v)\):
return BST-Search\((v.L,\, x)\);
else:
return BST-Search\((v.R,\, x)\);
end
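The pseudocode above maps directly to Python. A minimal sketch (the Node class and example keys are illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.L, self.R = key, left, right

def bst_search(v, x):
    """Mirror of BST-Search: follow a single root-to-leaf path, O(h) time."""
    if v is None:
        return None
    if v.key == x:
        return v
    # The BST property tells us which subtree could contain x.
    return bst_search(v.L, x) if x < v.key else bst_search(v.R, x)

# Hypothetical BST holding the keys {2, 3, 5, 7}:
root = Node(5, Node(3, Node(2)), Node(7))
```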
Proposition: BST Operations
Search, insertion, and deletion in a binary search tree of height \(h\) each run in
\(O(h)\) time.
The crux is the height problem: the height of a BST on \(n\) keys
ranges from \(\Theta(\log n)\) (balanced) to \(\Theta(n)\) (degenerate — a single chain).
Inserting keys \(1, 2, 3, \ldots, n\) in order produces the degenerate case where every node has only
a right child, and search degrades to linear scan. Self-balancing BSTs —
AVL trees (which enforce a height-balance condition at every node) and Red-Black trees (which
enforce coloring invariants that guarantee \(h \leq 2 \log_2(n+1)\)) — solve this problem by
performing rotations during insertion and deletion to maintain \(O(\log n)\) height.
The Decision Tree Model
We now shift perspective: rather than analyzing a specific data structure, we ask what
any algorithm can achieve under a natural computational constraint.
Definition: Comparison-Based Algorithm
A comparison-based algorithm is one in which all decisions about the input
are made solely on the basis of pairwise comparisons of the form "\(a_i \leq a_j\)?".
The algorithm may not inspect the internal representation of keys (e.g., individual bits or digits).
Any comparison-based sorting algorithm on \(n\) elements can be modelled by a
decision tree: a binary tree where each internal node represents a comparison
"\(a_i \leq a_j\)?", the left child corresponds to "yes" and the right child to "no", and each
leaf represents a specific permutation of the input — the algorithm's output on that execution path.
Since the algorithm must be able to produce every one of the \(n!\) possible permutations as output,
the decision tree must have at least \(n!\) leaves.
Theorem: Comparison-Based Sorting Lower Bound
Any comparison-based sorting algorithm on \(n\) elements requires
\(\Omega(n \log n)\) comparisons in the worst case.
Proof:
The decision tree is a binary tree with at least \(n!\) leaves. A binary tree of height
\(h\) has at most \(2^h\) leaves, so the worst-case number of comparisons satisfies
\[
2^h \geq n! \quad \implies \quad h \geq \log_2(n!).
\]
By Stirling's approximation, \(n! \geq (n/e)^n\), so
\[
\log_2(n!) \geq n \log_2(n/e) = n \log_2 n - n \log_2 e = \Omega(n \log n).
\]
Therefore any comparison-based sorting algorithm requires \(\Omega(n \log n)\) comparisons
in the worst case. \(\blacksquare\)
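The bound \(h \geq \log_2(n!)\) can be evaluated exactly for small inputs. A quick sketch (the function name is illustrative):

```python
import math

def sort_lower_bound(n):
    """Minimum worst-case comparisons to sort n elements: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# For n = 5: 2**6 = 64 < 5! = 120 <= 2**7 = 128, so at least 7 comparisons.
bound_5 = sort_lower_bound(5)
```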
This lower bound is tight: merge sort achieves \(O(n \log n)\) comparisons in every
case, and is therefore optimal in the comparison model. Quicksort achieves \(O(n \log n)\) in
expectation (over random pivot choices) but degrades to \(\Theta(n^2)\) in the worst case. Heapsort also achieves
\(O(n \log n)\) worst-case while sorting in place, by exploiting a tree-based data
structure that maintains a partial order on elements.
Definition: Binary Heap
A (binary min-) heap is a complete binary tree in which every vertex stores a key,
and each vertex's key is less than or equal to those of its children. This heap property
ensures that the minimum key resides at the root. Insertion and extract-min both run in
\(O(\log n)\) time.
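Python's standard library provides the binary min-heap as the heapq module, backed by an array encoding of a complete binary tree. A quick sketch:

```python
import heapq

heap = []
for key in [7, 2, 9, 1, 5]:
    heapq.heappush(heap, key)   # sift up to restore the heap property: O(log n)

smallest = heapq.heappop(heap)  # the root always holds the minimum: O(log n)
```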
The heap is the standard implementation of the priority queue abstract data type — the
same structure that powers Dijkstra's algorithm in
Network Flow & Walks on Graphs,
where it enables efficient extraction of the vertex with minimum tentative distance.
Beyond Comparisons
The \(\Omega(n \log n)\) barrier is a statement about the comparison model, not about sorting
in general. Algorithms that exploit the internal structure of keys can escape this bound entirely.
Counting sort sorts \(n\) integers in the range \(\{0, 1, \ldots, k\}\) in
\(O(n + k)\) time by using direct addressing: it allocates an array of size \(k + 1\), counts
occurrences, and reads off the sorted output. Radix sort extends this to
\(d\)-digit keys by applying counting sort to each digit position, achieving \(O(d(n + k))\).
These algorithms do not compare pairs of elements — they bypass the decision tree model entirely.
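Counting sort is short enough to state in full. A minimal sketch (the function name is illustrative):

```python
def counting_sort(a, k):
    """Sort n integers drawn from {0, ..., k} in O(n + k) time, no comparisons."""
    count = [0] * (k + 1)
    for x in a:                  # tally occurrences by direct addressing
        count[x] += 1
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)  # read keys off in increasing order
    return out

sorted_a = counting_sort([4, 1, 3, 4, 0, 1], k=4)
```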
The Decision Tree as an Information-Theoretic Argument
The \(\Omega(n \log n)\) lower bound is fundamentally an information-theoretic
argument. If the \(n!\) input permutations are equally likely, the initial uncertainty is
the Shannon entropy
\(\mathbb{H} = \log_2(n!)\) bits. Each binary comparison yields at most one bit of
information — it partitions the remaining possibilities into two groups — so at least
\(\log_2(n!) = \Omega(n \log n)\) comparisons are needed to identify the permutation.
This style of reasoning — bounding the depth of any decision procedure by the entropy of
the input — recurs throughout
complexity theory. The same idea shows that
any comparison-based algorithm for finding an element in a sorted array requires
\(\Omega(\log n)\) comparisons, which binary search matches exactly.
Counting Trees: Catalan and Schröder Numbers
We have seen that binary trees, BSTs, and decision trees are ubiquitous computational structures.
A natural question arises: how many distinct binary trees are there on \(n\) nodes?
The answer leads us to one of the most celebrated sequences in combinatorics — the
Catalan numbers — and reveals deep connections between tree enumeration,
lattice paths, and generating functions. The tools here build directly on the counting
principles and binomial coefficients developed in
Introduction to Combinatorics.
The Catalan Numbers
Definition: Catalan Numbers
The \(n\)-th Catalan number is
\[
C_n = \frac{1}{n+1}\binom{2n}{n}, \qquad n = 0, 1, 2, \ldots
\]
The first several values are \(C_0 = 1,\; C_1 = 1,\; C_2 = 2,\; C_3 = 5,\; C_4 = 14,\; C_5 = 42\).
The Catalan numbers satisfy a fundamental recurrence that arises from the recursive structure
of binary trees.
Theorem: Catalan Recurrence
The Catalan numbers satisfy
\[
C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1.
\]
Proof (via root-splitting):
Consider a binary tree on \(n + 1\) nodes. The root has a left subtree with \(i\) nodes
(for some \(0 \leq i \leq n\)) and a right subtree with \(n - i\) nodes. If we denote by
\(C_n\) the number of structurally distinct binary trees on \(n\) nodes, then the number
of trees with \(i\) nodes in the left subtree and \(n - i\) in the right is \(C_i \cdot C_{n-i}\)
(by the
Rule of Product).
Summing over all possible values of \(i\):
\[
C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}.
\]
The base case \(C_0 = 1\) accounts for the unique empty tree. \(\blacksquare\)
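The recurrence and the closed form \(C_n = \frac{1}{n+1}\binom{2n}{n}\) (proved later on this page) can be checked against each other numerically. A short sketch:

```python
from math import comb

def catalan_recurrence(n_max):
    """Build C_0, ..., C_{n_max} from C_{n+1} = sum_i C_i * C_{n-i}."""
    C = [1]  # C_0 = 1: the unique empty tree
    for n in range(n_max):
        C.append(sum(C[i] * C[n - i] for i in range(n + 1)))
    return C

C = catalan_recurrence(10)
closed = [comb(2 * n, n) // (n + 1) for n in range(11)]  # closed form
```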
Theorem: Binary Tree Enumeration
The number of structurally distinct binary trees on \(n\) nodes is the Catalan number \(C_n\).
The Generating Function
The recurrence \(C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}\) is a convolution: the
right-hand side is the coefficient of \(u^n\) in the product of the generating function with itself.
Let us exploit this algebraic structure.
Define the ordinary generating function
\(C(u) = \sum_{n=0}^{\infty} C_n \, u^n\). The recurrence tells us that
\(C_{n+1}\) is the \(n\)-th coefficient of \(C(u)^2\), so
\(\sum_{n=0}^{\infty} C_{n+1} u^n = C(u)^2\). The left-hand side equals
\(\bigl(C(u) - C_0\bigr)/u = \bigl(C(u) - 1\bigr)/u\). Therefore:
\[
\frac{C(u) - 1}{u} = C(u)^2 \quad \implies \quad u \, C(u)^2 - C(u) + 1 = 0.
\]
This is a quadratic equation in \(C(u)\). By the quadratic formula:
\[
C(u) = \frac{1 \pm \sqrt{1 - 4u}}{2u}.
\]
Since \(C(0) = C_0 = 1\), we need the root that remains finite as \(u \to 0\). The "\(+\)" root
gives \((1 + \sqrt{1 - 4u})/(2u) \to \infty\) as \(u \to 0\), so we take the "\(-\)" root:
Theorem: Catalan Generating Function
The ordinary generating function of the Catalan numbers is
\[
C(u) = \sum_{n=0}^{\infty} C_n \, u^n = \frac{1 - \sqrt{1 - 4u}}{2u}.
\]
This expression can be verified by expanding \(\sqrt{1 - 4u}\) via the generalized binomial
series \((1 + z)^{1/2}\) with \(z = -4u\), yielding
\[
\sqrt{1 - 4u} = \sum_{k=0}^{\infty} \binom{1/2}{k}(-4u)^k = 1 - 2u - 2u^2 - 4u^3 - 10u^4 - 28u^5 - \cdots
\]
so that
\[
\frac{1 - \sqrt{1 - 4u}}{2u} = 1 + u + 2u^2 + 5u^3 + 14u^4 + 42u^5 + \cdots
\]
recovering the Catalan sequence.
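This verification can be carried out mechanically with exact rational arithmetic: compute the generalized binomial coefficients \(\binom{1/2}{k}\), substitute \(z = -4u\), and read off \((1 - \sqrt{1-4u})/(2u)\) coefficient by coefficient. A sketch using Python's fractions module:

```python
from fractions import Fraction
from math import factorial

def sqrt_series_coeff(k):
    """Coefficient of z^k in (1 + z)^{1/2}: the generalized binomial C(1/2, k)."""
    num = Fraction(1)
    for j in range(k):
        num *= Fraction(1, 2) - j
    return num / factorial(k)

N = 8
# Coefficients of sqrt(1 - 4u): substitute z = -4u into the series.
s = [sqrt_series_coeff(k) * (-4) ** k for k in range(N + 1)]
# (1 - sqrt(1 - 4u)) / (2u): drop the constant term, shift down by one, halve.
catalan = [-s[k + 1] / 2 for k in range(N)]
```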
The Closed Form via the Reflection Principle
We now derive the closed-form expression \(C_n = \frac{1}{n+1}\binom{2n}{n}\) using a
bijection between binary trees and lattice paths, combined with the reflection principle.
Definition: Dyck Path
A Dyck path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\)
consisting of \(n\) up-steps \((1, +1)\) and \(n\) down-steps \((1, -1)\) that never passes
below the \(x\)-axis.
Bijection: binary trees \(\leftrightarrow\) Dyck paths. We define a recursive
encoding. Given a binary tree \(T\) on \(n\) nodes with root \(r\), left subtree \(L\), and
right subtree \(R\), encode \(T\) as
\[
\texttt{encode}(T) = \text{"("} \;\cdot\; \texttt{encode}(L) \;\cdot\; \text{")"} \;\cdot\; \texttt{encode}(R),
\]
with the empty tree mapping to the empty string. Each node contributes exactly one "(" and one ")",
so the result is a string of length \(2n\). Interpreting "(" as an up-step and ")" as a down-step
yields a Dyck path of semilength \(n\): the path never dips below the \(x\)-axis because at
every prefix, the recursive nesting ensures that the number of opening parentheses is at least
the number of closing parentheses. The encoding is invertible: scanning the string from left to
right, the first "(" corresponds to the root, the balanced substring immediately following it
encodes \(L\), the matching ")" closes the root, and the remaining substring encodes \(R\).
Bijection: Dyck paths \(\leftrightarrow\) balanced parenthesizations.
Replace each up-step by "(" and each down-step by ")". A Dyck path of length \(2n\)
corresponds to a sequence of \(n\) pairs of balanced parentheses. The condition "never passes
below the \(x\)-axis" translates to "at every prefix, the number of opening parentheses is at
least the number of closing parentheses" — exactly the definition of a balanced parenthesization.
Theorem: Closed Form of Catalan Numbers
The number of Dyck paths of length \(2n\) is \(C_n = \frac{1}{n+1}\binom{2n}{n}\).
Proof (Reflection Principle):
The total number of lattice paths from \((0,0)\) to \((2n,0)\) with \(n\) up-steps and \(n\)
down-steps (without any restriction) is \(\binom{2n}{n}\) — we simply choose which \(n\) of the
\(2n\) steps are up-steps.
We must subtract the bad paths: those that pass below the \(x\)-axis,
i.e., paths that at some point reach height \(-1\) (touching the axis itself is allowed).
Consider such a bad path. Let \(P\) be the
first point where the path reaches height \(-1\). Reflect the portion of the
path after \(P\) in the line \(y = -1\): every up-step becomes a down-step and vice versa. The
reflected path goes from \((0,0)\) to \((2n, -2)\), which consists of \((n-1)\) up-steps and
\((n+1)\) down-steps.
This reflection is a bijection between bad paths from \((0,0)\) to \((2n,0)\) and
all paths from \((0,0)\) to \((2n, -2)\). The number of the latter is
\(\binom{2n}{n-1}\) (choosing which \(n-1\) of \(2n\) steps are up-steps). Therefore:
\[
C_n = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n}
= \frac{1}{n+1}\binom{2n}{n}. \quad \blacksquare
\]
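For small \(n\) the closed form can be confirmed by brute-force enumeration of all step sequences. A sketch:

```python
from itertools import product
from math import comb

def count_dyck_paths(n):
    """Brute force: count {+1, -1} sequences of length 2n that end at height 0
    and never dip below the x-axis."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        height, ok = 0, True
        for s in steps:
            height += s
            if height < 0:   # the path reached height -1: not a Dyck path
                ok = False
                break
        if ok and height == 0:
            count += 1
    return count

checks = [(count_dyck_paths(n), comb(2 * n, n) // (n + 1)) for n in range(7)]
```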
The reflection principle is a powerful technique: the central idea is to establish a bijection
between "bad" objects and objects of a different, easily counted type. We used the same
combinatorial thinking — counting via bijections and the subtraction principle — in
Introduction to Combinatorics.
Schröder Numbers and Lattice Path Generalizations
The Catalan numbers count lattice paths with two types of steps: up \((1, +1)\) and
down \((1, -1)\). A natural generalization is to allow a third type of step.
Definition: Schröder Path
A Schröder path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\)
using up-steps \((1, +1)\), down-steps \((1, -1)\), and flat steps (or
diagonal steps) \((2, 0)\), that never passes below the \(x\)-axis. A
Schröder path with no flat steps on the \(x\)-axis is called a small Schröder path.
The number of Schröder paths of semilength \(n\) (i.e., from \((0,0)\) to \((2n,0)\)) is the
large Schröder number \(S_n\), and the number of small Schröder paths is the
little Schröder number (or super Catalan number) \(s_n\).
The first several values are:
\[
S_n \colon \; 1,\, 2,\, 6,\, 22,\, 90,\, 394,\, \ldots \qquad
s_n \colon \; 1,\, 1,\, 3,\, 11,\, 45,\, 197,\, \ldots
\]
One observes that \(S_n = 2\, s_n\) for \(n \geq 1\). The distinction is that large
Schröder paths permit flat steps on the \(x\)-axis while small Schröder paths forbid them;
the precise factor of 2 can be established by a decomposition argument at the first return
to the \(x\)-axis.
In terms of tree enumeration, the little Schröder numbers \(s_n\) count plane trees
(ordered rooted trees where each internal vertex has \(\geq 2\) children) on \(n + 1\) leaves.
For instance, \(s_2 = 3\) counts the three plane trees with 3 leaves: the root with 3
children (all leaves), and the two trees where the root has 2 children with one being an
internal node bearing 2 leaves. Where the Catalan numbers count full binary trees
(exactly 2 children per internal node), the little Schröder numbers count the more general
case where branching can vary — each internal node may have 2, 3, or more children,
but at least 2.
The large Schröder numbers satisfy the recurrence
\[
(n+2)\, S_{n+1} = 3(2n+1)\, S_n - (n-1)\, S_{n-1}, \qquad S_0 = 1,\; S_1 = 2.
\]
There is also a convolution-type recurrence that mirrors the Catalan case:
\[
S_{n+1} = S_{n} + \sum_{k=0}^{n} S_k \, S_{n-k}, \qquad S_0 = 1.
\]
The additional \(S_n\) term (compared to the pure convolution \(C_{n+1} = \sum C_k C_{n-k}\))
reflects the extra flat step available in Schröder paths.
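Both recurrences are easy to check against the listed values. A sketch using the convolution form to generate the sequence, with the three-term recurrence as a cross-check:

```python
def schroder_numbers(n_max):
    """Large Schroder numbers from S_{n+1} = S_n + sum_k S_k * S_{n-k}."""
    S = [1]  # S_0 = 1
    for n in range(n_max):
        S.append(S[n] + sum(S[k] * S[n - k] for k in range(n + 1)))
    return S

S = schroder_numbers(5)  # expected: 1, 2, 6, 22, 90, 394
```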
The Motzkin Numbers
Between Dyck paths and Schröder paths lies a third family. A Motzkin path of
length \(n\) is a lattice path from \((0,0)\) to \((n, 0)\) using up-steps \((1,+1)\),
down-steps \((1,-1)\), and horizontal steps \((1, 0)\), that never passes below
the \(x\)-axis. The number of Motzkin paths of length \(n\) is the Motzkin number
\(M_n\):
\[
M_n \colon \; 1,\, 1,\, 2,\, 4,\, 9,\, 21,\, 51,\, \ldots
\]
In terms of trees, \(M_n\) counts the number of ordered trees on \(n+1\) vertices where each
internal vertex has either one or two children (unary-binary trees). This positions the three
families in a natural hierarchy:
- Catalan \(C_n\): full binary trees — each internal node has exactly 2 children
(\(n\) internal nodes, \(n+1\) leaves).
- Motzkin \(M_n\): unary-binary trees — each internal node has 1 or 2 children
(\(n\) edges).
- Little Schröder \(s_n\): plane trees — each internal node has \(\geq 2\) children
(\(n+1\) leaves).
Catalan Numbers: A Universal Sequence in Discrete Mathematics
The Catalan numbers are one of the most frequently occurring sequences in all of
combinatorics, appearing in at least 200 known combinatorial interpretations. Beyond binary
trees, Dyck paths, and balanced parenthesizations, they count: triangulations of a convex
\((n+2)\)-gon, non-crossing partitions of \(\{1, 2, \ldots, n\}\), and monotonic lattice paths
that stay below the diagonal — the classical ballot problem. This
universality is reflected algebraically in the generating function
\(C(u) = (1 - \sqrt{1 - 4u})/(2u)\), which arises naturally whenever a combinatorial
structure decomposes into two independent substructures of the same kind — the hallmark
of the root-splitting recursion in binary trees.
Labeled Trees: Cayley's Formula
The Catalan and Schröder numbers count unlabeled trees — structures considered up to
isomorphism. We now ask a different question: given a fixed vertex set \(\{1, 2, \ldots, n\}\),
how many distinct labeled trees can be formed? Equivalently, how many spanning trees
does the complete graph \(K_n\) have? The answer, due to Cayley, is one of the most elegant
results in combinatorics.
Theorem: Cayley's Formula
The number of labeled trees on the vertex set \([n] = \{1, 2, \ldots, n\}\) is \(n^{n-2}\)
for \(n \geq 2\).
For small values: with \(n = 2\) there is \(2^0 = 1\) tree (the single edge \(12\)); with \(n = 3\)
there are \(3^1 = 3\) trees (star graphs centered at each vertex); with \(n = 4\) there are
\(4^2 = 16\) labeled trees.
We prove Cayley's formula by exhibiting a bijection between labeled trees on \([n]\) and
sequences in \([n]^{n-2}\), known as Prüfer sequences.
Prüfer Sequences
Definition: Prüfer Sequence
A Prüfer sequence of length \(n - 2\) is a sequence
\((a_1, a_2, \ldots, a_{n-2})\) where each \(a_i \in [n] = \{1, 2, \ldots, n\}\).
Encoding: labeled tree \(\to\) Prüfer sequence. Given a labeled tree \(T\) on \([n]\):
- Find the leaf with the smallest label. Call it \(\ell\).
- Record the label of the unique neighbor of \(\ell\).
This is the first entry \(a_1\) of the Prüfer sequence.
- Remove \(\ell\) and its incident edge from \(T\).
- Repeat the preceding three steps on the resulting tree with \(n - 1\) vertices, recording \(a_2\), then
\(a_3\), and so on.
- Stop after \(n - 2\) iterations. At this point, exactly 2 vertices remain (connected by
a single edge).
Example:
Consider the labeled tree on \(\{1, 2, 3, 4, 5, 6\}\) with edges
\(\{12,\, 13,\, 24,\, 25,\, 36\}\).
Step 1. Leaves are \(\{4, 5, 6\}\). Smallest leaf: \(4\).
Neighbor of \(4\) is \(2\). Record \(a_1 = 2\). Remove vertex \(4\).
Step 2. Leaves: \(\{5, 6\}\). Smallest leaf: \(5\).
Neighbor of \(5\) is \(2\). Record \(a_2 = 2\). Remove vertex \(5\).
Step 3. Leaves: \(\{2, 6\}\). Smallest leaf: \(2\).
Neighbor of \(2\) is \(1\). Record \(a_3 = 1\). Remove vertex \(2\).
Step 4. Leaves: \(\{1, 6\}\). Smallest leaf: \(1\).
Neighbor of \(1\) is \(3\). Record \(a_4 = 3\). Remove vertex \(1\).
Remaining vertices: \(\{3, 6\}\). The Prüfer sequence is \((2, 2, 1, 3)\).
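The encoding procedure is short to implement. A direct (quadratic-time) sketch, run on the worked example above:

```python
def prufer_encode(n, edges):
    """Prufer sequence of a labeled tree on {1, ..., n} given as an edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)  # smallest leaf
        parent = next(iter(adj[leaf]))                  # its unique neighbor
        seq.append(parent)
        adj[parent].discard(leaf)                       # remove the leaf and its edge
        del adj[leaf]
    return seq

# The worked example: edges {12, 13, 24, 25, 36}.
seq = prufer_encode(6, [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)])
```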
Decoding: Prüfer sequence \(\to\) labeled tree. Given a Prüfer sequence
\((a_1, a_2, \ldots, a_{n-2})\):
- Initialize a degree counter: set \(d(v) = 1 + (\text{number of times } v \text{ appears in the sequence})\)
for each \(v \in [n]\).
- For \(i = 1, 2, \ldots, n-2\): find the smallest \(v \in [n]\) with \(d(v) = 1\).
Add edge \(\{v, a_i\}\) to the tree. Decrement \(d(v)\) and \(d(a_i)\) by 1.
- After processing all entries, exactly two vertices have \(d(v) = 1\). Connect them with
the final edge.
Note that selecting the leaf \(v\) requires no inspection of the remaining sequence entries. The degree
counter maintains the invariant that at step \(i\),
\(d(v) = 1 + (\text{number of occurrences of } v \text{ among } a_i, a_{i+1}, \ldots, a_{n-2})\).
Thus \(d(v) = 1\) guarantees that \(v\) does not appear in the unprocessed portion of the
sequence, ensuring it is safe to remove \(v\) as a leaf. This invariant also shows that
decoding runs in \(O(n \log n)\) time using a priority queue (or \(O(n)\) with a more
careful implementation).
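The decoding procedure, with the degree-counter invariant, looks like this in a direct (quadratic-time) sketch; run on the sequence \((2, 2, 1, 3)\), it recovers the worked example's tree:

```python
def prufer_decode(seq):
    """Rebuild the labeled tree on {1, ..., n} from a Prufer sequence of length n-2."""
    n = len(seq) + 2
    # Invariant: d(v) = 1 + number of occurrences of v in the unprocessed sequence.
    d = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        d[a] += 1
    edges = []
    for a in seq:
        v = min(u for u in d if d[u] == 1)  # smallest current leaf
        edges.append((v, a))
        d[a] -= 1
        del d[v]                            # v never reappears: remove it
    edges.append(tuple(sorted(d)))          # join the last two remaining vertices
    return edges

tree_edges = prufer_decode([2, 2, 1, 3])
```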
Key observation. A vertex \(v\) appears in the Prüfer sequence exactly
\(\deg_T(v) - 1\) times, where \(\deg_T(v)\) is the degree of \(v\) in the tree. In particular,
leaves (degree 1) never appear in the sequence, and the two vertices remaining at the end are
those whose degree has been decremented to 1 after all entries are processed.
This observation has a powerful consequence: it reduces the counting of trees with a
prescribed degree sequence to a problem about arrangements of symbols in a sequence.
Theorem: Degree-Constrained Tree Counting
The number of labeled trees on \([n]\) in which vertex \(v\) has degree \(d_v\)
(where \(d_v \geq 1\) and \(\sum_{v=1}^{n} d_v = 2(n-1)\)) is the
multinomial coefficient
\[
\binom{n-2}{d_1 - 1,\; d_2 - 1,\; \ldots,\; d_n - 1}
= \frac{(n-2)!}{(d_1 - 1)!\,(d_2 - 1)! \cdots (d_n - 1)!}.
\]
Proof:
Under the Prüfer bijection, vertex \(v\) appears exactly \(d_v - 1\) times in the
sequence of length \(n - 2\). The number of sequences with this prescribed multiplicity
is precisely the multinomial coefficient above — it counts the ways to arrange
\(d_1 - 1\) copies of symbol \(1\), \(d_2 - 1\) copies of symbol \(2\), etc.,
in a sequence of length \(\sum (d_v - 1) = 2(n-1) - n = n - 2\). \(\blacksquare\)
Proof of Cayley's Formula (two approaches):
Direct counting. The Prüfer bijection establishes
\(\{\text{labeled trees on } [n]\} \longleftrightarrow [n]^{n-2}\).
The set \([n]^{n-2}\) has \(n^{n-2}\) elements (each of the \(n-2\) positions
has \(n\) choices), giving the result immediately.
Via the multinomial theorem. Alternatively, we sum the
degree-constrained formula over all valid degree sequences. Setting
\(e_v = d_v - 1\), the sum becomes
\[
\sum_{\substack{e_1, \ldots, e_n \geq 0 \\ e_1 + \cdots + e_n = n-2}}
\frac{(n-2)!}{e_1! \cdots e_n!}
= (\underbrace{1 + 1 + \cdots + 1}_{n})^{n-2} = n^{n-2}
\]
by the multinomial theorem. This second approach is more powerful: it not only
proves Cayley's formula but reveals how the \(n^{n-2}\) trees are distributed
across degree sequences. \(\blacksquare\)
Cayley's formula counts the spanning trees of the complete graph \(K_n\). Recall from
Introduction to Graph Theory that \(K_n\) is
the graph on \(n\) vertices where every pair is connected by an edge — it has \(\binom{n}{2}\) edges,
and Cayley's formula tells us that \(n^{n-2}\) of its \(\binom{\binom{n}{2}}{n-1}\) possible spanning
subgraphs with \(n - 1\) edges are trees.
Algorithm: Prüfer Encoding
Input: labeled tree \(T\) on vertex set \([n]\);
Output: Prüfer sequence \((a_1, \ldots, a_{n-2})\);
begin
for \(i = 1\) to \(n - 2\):
\(\ell \leftarrow\) smallest leaf in \(T\);
\(a_i \leftarrow\) neighbor of \(\ell\) in \(T\);
remove \(\ell\) and its edge from \(T\);
end for
return \((a_1, a_2, \ldots, a_{n-2})\);
end
Kirchhoff's Matrix Tree Theorem and the Graph Laplacian
Cayley's formula counts spanning trees of \(K_n\). For a general graph \(G\), the number of
spanning trees is given by Kirchhoff's matrix tree theorem: it equals any
cofactor of the
graph Laplacian \(L = D - A\), where \(D\) is the degree matrix and \(A\) is the
adjacency matrix. Equivalently, if \(\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\) are
the eigenvalues of \(L\) (with \(\lambda_1 = 0\)), then the number of spanning trees is
\[
\tau(G) = \frac{1}{n}\prod_{i=2}^{n} \lambda_i.
\]
For \(K_n\), the Laplacian has eigenvalue \(n\) with multiplicity \(n-1\), giving
\(\tau(K_n) = n^{n-1}/n = n^{n-2}\) — an independent algebraic proof of Cayley's formula.
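The cofactor computation is concrete enough to sketch with exact rational arithmetic (the function names are illustrative): build \(L = D - A\), delete the first row and column, and take a determinant. For \(K_4\) this recovers \(4^2 = 16\).

```python
from fractions import Fraction

def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            result = -result          # a row swap flips the sign
        result *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return result

def spanning_tree_count(adj_matrix):
    """tau(G) via the matrix tree theorem: a cofactor of L = D - A."""
    n = len(adj_matrix)
    L = [[(sum(adj_matrix[i]) if i == j else 0) - adj_matrix[i][j]
          for j in range(n)] for i in range(n)]
    minor = [row[1:] for row in L[1:]]  # delete row 0 and column 0
    return int(det(minor))

# K_4: every off-diagonal entry of A is 1; Cayley predicts 4**2 = 16.
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
tau = spanning_tree_count(A)
```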
The matrix tree theorem connects combinatorial enumeration to spectral theory, a theme developed
in the study of
graph Laplacians and spectral methods.
In probabilistic combinatorics, random spanning trees of a graph are sampled via
Wilson's algorithm (which uses random walks) and have deep connections to the determinantal
point processes that arise in statistical physics.