Rooted Trees and Binary Trees
In Introduction to Graph Theory, we defined a
tree as a connected acyclic graph and established that a tree on \(n\) vertices has exactly
\(n - 1\) edges, with a unique path between any pair of vertices. We now impose additional structure by
designating a special vertex as the root, transforming the tree from an undirected
combinatorial object into a hierarchical, recursive data structure — one of the most pervasive
abstractions in computer science.
Rooted Trees
Definition: Rooted Tree
A rooted tree is a pair \((T, r)\), where \(T = (V, E)\) is a tree and \(r \in V\) is a
distinguished vertex called the root. For any vertex \(v \in V\), the unique path from
\(r\) to \(v\) in \(T\) induces a natural hierarchy:
- The depth of \(v\) is the length (number of edges) of the unique \(r\)-\(v\) path.
- If \(v \neq r\) and \(u\) is the vertex immediately preceding \(v\) on the \(r\)-\(v\) path, then
\(u\) is the parent of \(v\) and \(v\) is a child of \(u\).
- A vertex with no children is a leaf; a non-leaf vertex is an internal vertex.
For any vertex \(v \in V\), the subtree rooted at \(v\), denoted \(T_v\), is the
rooted tree consisting of \(v\) together with all its descendants — the vertices
\(w\) whose unique \(r\)-\(w\) path passes through \(v\) — with \(v\) as the new root. In particular,
the subtrees rooted at the children of any vertex \(v\) partition \(V(T_v) \setminus \{v\}\),
making rooted trees naturally recursive objects.
Every non-root vertex has exactly one parent, and the root has no parent. The
height of a rooted tree \((T, r)\) is the maximum depth over all vertices:
\[
h(T) = \max_{v \in V} \operatorname{depth}(v).
\]
The root sits at depth \(0\), and a rooted tree of height \(h\) has at least \(h + 1\) vertices,
since the path from the root to a vertex of maximum depth already contains \(h + 1\) of them.
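The depth and height definitions translate directly into code. A minimal sketch, assuming a rooted tree stored as a child-list dictionary (a hypothetical representation chosen for illustration):

```python
from collections import deque

def depths_from_root(children, root):
    """Compute depth(v) for every vertex: the edge count of the unique
    root-to-v path, found here by breadth-first descent from the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for c in children.get(u, []):
            depth[c] = depth[u] + 1   # a child sits one level below its parent
            queue.append(c)
    return depth

# A small rooted tree: r has children a and b; a has child c.
children = {"r": ["a", "b"], "a": ["c"]}
depth = depths_from_root(children, "r")
height = max(depth.values())          # h(T) = max depth over all vertices
assert height == 2
```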
Binary Trees
Definition: Binary Tree
A binary tree is a rooted tree in which every vertex has at most two children, with
each child distinguished as either the left child or the right child
(so that a vertex with exactly one child still records whether that child sits in the left slot or the
right slot). A binary tree is full (or proper) if every internal
vertex has exactly two children.
The distinction between left and right matters: a single left child produces a different binary tree
from a single right child, even though the underlying unrooted graph is the same. This ordered
structure is what makes binary trees the natural setting for divide-and-conquer algorithms, where
a problem splits into two subproblems at every step.
Proposition: Leaves of a Full Binary Tree
A full binary tree with \(k\) internal vertices has exactly \(k + 1\) leaves.
Proof:
Let \(T\) be a full binary tree with \(k\) internal vertices and \(\ell\) leaves. The total number
of vertices is \(n = k + \ell\). Since \(T\) is a tree, it has \(n - 1\) edges; equivalently, every
non-root vertex contributes exactly one edge, the edge to its parent. Now count the same edges a
second way: orient each edge from parent to child (well-defined by the rooted structure) and tally
edges by their sources. Each internal vertex is the source of exactly 2 outgoing edges (to its two
children), and each leaf is the source of 0, so the total number of edges is \(2k\). Equating the
two counts:
\[
2k = n - 1 = (k + \ell) - 1,
\]
which gives \(\ell = k + 1\). The boundary case \(k = 0\) (a single-vertex tree) is consistent: the
condition "every internal vertex has exactly two children" is vacuously satisfied, and \(\ell = 1 = k + 1\).
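The count can be checked empirically: grow a full binary tree by repeatedly replacing a random leaf with an internal vertex bearing two new leaves, and observe that leaves always outnumber internal vertices by one. A small sketch (the dictionary representation is ours):

```python
import random

def random_full_tree(k, rng):
    """Build a random full binary tree with k internal vertices.

    Each growth step turns one leaf into an internal vertex with two fresh
    leaf children, so internal count and leaf count both rise by exactly 1.
    Returns the root together with the list of current leaves.
    """
    root = {"left": None, "right": None}
    leaves = [root]
    for _ in range(k):
        node = leaves.pop(rng.randrange(len(leaves)))
        node["left"] = {"left": None, "right": None}
        node["right"] = {"left": None, "right": None}
        leaves.extend([node["left"], node["right"]])
    return root, leaves

rng = random.Random(0)
for k in range(8):
    _, leaves = random_full_tree(k, rng)
    assert len(leaves) == k + 1   # ell = k + 1, as the proposition asserts
```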
A binary tree is called height-balanced (or AVL-balanced) if for
every vertex, the heights of its left and right subtrees differ by at most one. A height-balanced
binary tree on \(n\) vertices has height \(\Theta(\log n)\): the lower bound
\(h \geq \lceil \log_2(n+1) \rceil - 1\) follows because at depth \(d\) there are at most
\(2^d\) vertices (giving \(n \leq 2^{h+1} - 1\)), and the upper bound \(h = O(\log n)\)
follows from the balance condition, which forces the number of vertices to grow at least
exponentially with height. More precisely, the minimum number of vertices in a height-balanced tree
of height \(h\) satisfies the Fibonacci-like recurrence
\(N(h) = N(h-1) + N(h-2) + 1\) with \(N(0) = 1\) and \(N(1) = 2\), yielding
\(N(h) \geq \varphi^h - 1\) where \(\varphi = (1 + \sqrt{5})/2\); hence
\(h \leq \log_\varphi(n + 1) = O(\log n)\).
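The recurrence for the minimum size of a height-balanced tree is easy to tabulate, and the bound \(N(h) \geq \varphi^h - 1\) can be spot-checked numerically (a sketch; the function name is ours):

```python
import math

def avl_min_vertices(h):
    """N(h): minimum vertex count of a height-balanced tree of height h,
    via N(h) = N(h-1) + N(h-2) + 1 with N(0) = 1 and N(1) = 2."""
    N = [1, 2]
    for i in range(2, h + 1):
        N.append(N[i - 1] + N[i - 2] + 1)
    return N[h]

phi = (1 + math.sqrt(5)) / 2
for h in range(25):
    # Fibonacci-like growth: N(h) >= phi^h - 1
    assert avl_min_vertices(h) >= phi**h - 1
```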
Spanning Trees
So far we have studied trees as hierarchical structures — rooted, with a
clear parent-child directionality. But trees serve a second, equally fundamental role: as the
minimal connected skeleton of an arbitrary graph. A communication network, for
instance, may contain many redundant links; stripping away edges until no cycles remain — while
preserving connectivity — yields a tree that spans the entire network. This shift in perspective,
from rooted hierarchy to unrooted backbone, leads to the concept of a spanning tree and
ultimately to the enumeration questions in the final section of this page.
Recall from Introduction to Graph Theory that a
spanning subgraph of \(G = (V, E)\) is a subgraph that contains all vertices of \(G\).
Definition: Spanning Tree
A spanning tree of a connected graph \(G = (V, E)\) is a spanning subgraph
\(T = (V, E')\) that is a tree.
Proposition: Existence of Spanning Trees
Every connected graph has a spanning tree.
Proof:
Let \(G = (V, E)\) be connected. If \(G\) is acyclic, it is already a tree (and a spanning tree of itself).
Otherwise, \(G\) contains a cycle \(C\). Remove any edge \(e \in E(C)\); the resulting graph \(G - e\)
remains connected: for any pair \(u, v\) of vertices, a \(u\)-\(v\) path in \(G\) that traverses \(e\)
can be rerouted through the remaining edges of the cycle \(C\), yielding a \(u\)-\(v\) walk and hence a
\(u\)-\(v\) path in \(G - e\). Repeat until no cycles remain. The process terminates because each step
removes one edge, and the graph stays connected at every step. The result is a connected acyclic spanning
subgraph — a spanning tree.
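The edge-deletion argument can be animated directly: repeatedly locate an edge lying on a cycle and delete it. A sketch, assuming a simple graph on vertices \(0, \ldots, n-1\) and using our own helper names; the cycle edge is found as a non-tree edge during depth-first search:

```python
def find_cycle_edge(n, edges):
    """Return an edge lying on some cycle of the graph, or None if acyclic.

    Depth-first search on the undirected graph; an edge to an already
    visited vertex other than the DFS parent closes a cycle.
    """
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    visited = set()
    for start in range(n):
        if start in visited:
            continue
        stack = [(start, None)]
        while stack:
            u, parent = stack.pop()
            if u in visited:
                continue
            visited.add(u)
            for w in adj[u]:
                if w == parent:
                    continue
                if w in visited:
                    return (u, w)     # non-tree edge: lies on a cycle
                stack.append((w, u))
    return None

def spanning_tree(n, edges):
    """Mirror the existence proof: delete cycle edges until none remain."""
    edges = list(edges)
    while True:
        e = find_cycle_edge(n, edges)
        if e is None:
            return edges              # connected and acyclic: a spanning tree
        u, w = e
        edges.remove((u, w) if (u, w) in edges else (w, u))

K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree = spanning_tree(4, K4)
assert len(tree) == 3                 # n - 1 edges survive
```

Running it on \(K_4\) strips three of the six edges, leaving a connected acyclic subgraph exactly as the proof predicts.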
The number of spanning trees of a graph is a fundamental invariant. We will return to this
question at the end of this page with Cayley's formula for complete graphs, and later with
Kirchhoff's matrix tree theorem in the study of
graph Laplacians.
Recursive Structure: From Parse Trees to Merkle Trees
The recursive definition of binary trees — a tree is either empty or a root with a left subtree
and a right subtree — is the prototype for divide-and-conquer algorithms: a problem at a node is
solved by combining solutions to subproblems at its children. This pattern manifests across computer
science. Parse trees (abstract syntax trees) decompose source code into hierarchical
grammatical structures, enabling compilers to analyze and transform programs.
Expression trees represent arithmetic expressions with operators at internal nodes
and operands at leaves, so that tree traversals (inorder, postorder) directly yield infix and
postfix notations. In cryptography and distributed systems, Merkle trees — binary
trees where each internal node stores the hash of its children — enable efficient and tamper-proof
verification of large datasets: modifying a single leaf forces recomputation along the entire
root-to-leaf path, but verification requires only \(O(\log n)\) hash comparisons.
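A Merkle root computation fits in a few lines. A minimal sketch using SHA-256 (pairing conventions vary between real systems; here an unpaired digest is promoted to the next level unchanged):

```python
import hashlib

def merkle_root(leaves):
    """Root hash of a Merkle tree over the given leaf byte-strings.

    Leaves are hashed, then adjacent pairs of digests are hashed together
    level by level until a single root digest remains.
    """
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:            # odd digest promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

data = [b"block0", b"block1", b"block2", b"block3"]
root = merkle_root(data)
# Tampering with a single leaf changes the root hash
assert merkle_root([b"block0", b"blockX", b"block2", b"block3"]) != root
```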
Binary Search Trees and the Comparison Model
We now turn from the structure of binary trees to their use as a data structure for
maintaining ordered data. The binary search tree imposes an ordering invariant that enables efficient
search, insertion, and deletion — and the analysis of its worst case leads us naturally to the
comparison model and a fundamental information-theoretic limit on sorting.
Binary Search Trees
Definition: Binary Search Tree (BST)
A binary search tree is a binary tree in which each vertex \(v\) stores a
key \(k(v)\) drawn from a totally ordered set, with all stored keys distinct, satisfying the
BST property: for every vertex \(v\),
- every key in the left subtree of \(v\) is strictly less than \(k(v)\), and
- every key in the right subtree of \(v\) is strictly greater than \(k(v)\).
The BST property is an invariant maintained across all operations. An inorder traversal
of a BST visits the keys in sorted order — the tree is, in effect, a spatial encoding of a sorted
sequence. Search proceeds by comparing the target key with the current node and recursing into
the appropriate subtree.
Algorithm: BST-Search
Input: BST root \(v\), target key \(x\);
Output: node with key \(x\), or \(\texttt{null}\);
begin
if \(v = \texttt{null}\):
return \(\texttt{null}\);
if \(k(v) = x\):
return \(v\);
if \(x < k(v)\):
return BST-Search\((v.L,\, x)\);
else:
return BST-Search\((v.R,\, x)\);
end
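A direct Python transcription of BST-Search, together with an insertion routine that attaches a new leaf at the null position the search reaches (a sketch; duplicate keys are silently ignored):

```python
class Node:
    """BST vertex; `key` is k(v), and `L`/`R` mirror the pseudocode's v.L, v.R."""
    def __init__(self, key):
        self.key, self.L, self.R = key, None, None

def bst_search(v, x):
    """Transcription of BST-Search: descend one edge per comparison."""
    if v is None:
        return None
    if v.key == x:
        return v
    return bst_search(v.L, x) if x < v.key else bst_search(v.R, x)

def bst_insert(v, x):
    """Descend as in search; attach a new vertex at the null slot reached."""
    if v is None:
        return Node(x)
    if x < v.key:
        v.L = bst_insert(v.L, x)
    elif x > v.key:
        v.R = bst_insert(v.R, x)
    return v

def inorder(v):
    """Inorder traversal visits the keys in sorted order (BST property)."""
    return inorder(v.L) + [v.key] + inorder(v.R) if v else []

root = None
for k in [5, 2, 8, 1, 3]:
    root = bst_insert(root, k)
assert inorder(root) == [1, 2, 3, 5, 8]
assert bst_search(root, 3) is not None and bst_search(root, 7) is None
```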
Proposition: BST Operations
Search, insertion, and deletion in a binary search tree of height \(h\) each run in
\(O(h)\) time.
Proof:
Search. BST-Search descends one edge per step from the root, comparing the
target with \(k(v)\) at each visited vertex. The path from root to any vertex has length at
most \(h\), so search visits at most \(h + 1\) vertices and runs in \(O(h)\).
Insertion. Run BST-Search for the key being inserted (\(O(h)\)). If the tree
is empty, the new vertex becomes the root. Otherwise, since the key is absent, the search
terminates at a null position — the empty left or right slot of some vertex \(p\). Attach a
new vertex carrying the key in that slot. The BST property is preserved: at each vertex \(u\)
visited during the descent, the search took the left branch iff the new key is less than
\(k(u)\), placing the new vertex on the correct side of every ancestor.
Deletion. Locate the target vertex \(v\) by search (\(O(h)\)). Three cases:
- (i) \(v\) is a leaf. Remove \(v\) (if \(v\) is the root, the tree becomes
empty). The BST property is preserved trivially: no other vertex's relations are touched.
- (ii) \(v\) has exactly one child \(c\). If \(v\) has a parent \(p\), splice
\(v\) out by attaching \(c\) (and its subtree) to \(p\) in the slot \(v\) occupied; if \(v\)
is the root, promote \(c\) to be the new root. Every key formerly in \(c\)'s subtree was
already a key in \(v\)'s subtree, hence already on the correct side of \(k(p)\) (when \(p\)
exists) and of every ancestor of \(p\); the relations within \(c\)'s subtree are unchanged.
So the BST property is preserved.
- (iii) \(v\) has two children. Let \(w\) be the vertex reached by descending
from \(v\)'s right child along left children until none remain. We claim \(w\) holds the
minimum key in \(v\)'s right subtree. We prove this by induction on the size of any
non-empty BST subtree \(S\): the leftmost descendant of \(S\) holds the minimum key in
\(S\). If the root \(r\) of \(S\) has no left child, then \(r\) is itself the leftmost
descendant; every other key in \(S\) lies in \(r\)'s right subtree and exceeds \(k(r)\)
by the BST property. Otherwise, by induction the leftmost descendant of \(r\)'s left
subtree holds the minimum there; that key is less than \(k(r)\), and \(k(r)\) in turn
is less than every key in \(r\)'s right subtree, so the leftmost descendant remains
minimal in all of \(S\). Applying the claim with \(S = v\)'s right subtree gives \(w\)
as desired.
Now replace \(k(v)\) by \(k(w)\); since \(w\) has no left child, deleting \(w\) itself falls
into case (i) or (ii). It remains to verify that the modified tree satisfies the BST property:
- Every key in \(v\)'s old left subtree was less than \(k(v)\), and \(k(v) < k(w)\) (since
\(w\) was in \(v\)'s right subtree); by transitivity, every such key is less than \(k(w)\).
- Every remaining key in \(v\)'s right subtree exceeds \(k(w)\) by the minimality of \(k(w)\)
(\(w\) is removed; the rest are strictly greater since keys are distinct).
- For any proper ancestor \(a\) of \(v\) (if \(v\) is the root, there is none): if \(v\) lay
in \(a\)'s left subtree, every key in \(v\)'s subtree (including \(k(w)\)) was already less
than \(k(a)\); if \(v\) lay in \(a\)'s right subtree, every such key exceeded \(k(a)\).
In either case, replacing \(k(v)\) by \(k(w)\) preserves the relation with \(a\).
The descent to \(w\) is \(O(h)\), and the subsequent splice (case (i) or (ii) on \(w\)) is
\(O(h)\). All three deletion cases run in \(O(h)\) total.
The crux is the height problem: the height of a BST on \(n\) keys
ranges from \(\Theta(\log n)\) (balanced) to \(\Theta(n)\) (degenerate — a single chain).
Inserting keys \(1, 2, 3, \ldots, n\) in order produces the degenerate case where every node has only
a right child, and search degrades to linear scan. Self-balancing BSTs —
AVL trees (which enforce a height-balance condition at every node) and Red-Black trees (which
enforce coloring invariants that guarantee \(h \leq 2 \log_2(n+1)\)) — solve this problem by
performing rotations during insertion and deletion to maintain \(O(\log n)\) height.
The Decision Tree Model
We now shift perspective: rather than analyzing a specific data structure, we ask what
any algorithm can achieve under a natural computational constraint.
Definition: Comparison-Based Algorithm
A comparison-based algorithm is one in which all decisions about the input
are made solely on the basis of pairwise comparisons of the form "\(a_i \leq a_j\)?".
The algorithm may not inspect the internal representation of keys (e.g., individual bits or digits).
Any comparison-based sorting algorithm on \(n\) elements can be modelled by a
decision tree: a binary tree where each internal node represents a comparison
"\(a_i \leq a_j\)?", the left child corresponds to "yes" and the right child to "no", and each
leaf represents a specific permutation of the input — the algorithm's output on that execution path.
Since the algorithm must produce the correct sorted output for every one of the \(n!\) input
permutations, distinct input permutations must reach distinct leaves; hence the decision tree
has at least \(n!\) leaves.
Theorem: Comparison-Based Sorting Lower Bound
Any comparison-based sorting algorithm on \(n\) elements requires
\(\Omega(n \log n)\) comparisons in the worst case.
Proof:
The decision tree is a binary tree with at least \(n!\) leaves. A binary tree of height \(h\)
has at most \(2^h\) leaves: by induction on \(h\), the root contributes a single leaf when
\(h = 0\); for \(h \geq 1\), the left and right subtrees each have height at most \(h - 1\),
hence each at most \(2^{h-1}\) leaves, giving \(2 \cdot 2^{h-1} = 2^h\) in total. Therefore
the worst-case number of comparisons satisfies
\[
2^h \geq n! \quad \implies \quad h \geq \log_2(n!).
\]
Since \(\log_2 x\) is monotone increasing, for each \(k \geq 2\) we have
\(\int_{k-1}^{k} \log_2 x \, dx \leq \log_2 k\). Summing over \(k = 2, \ldots, n\):
\[
\log_2(n!) = \sum_{k=2}^{n} \log_2 k \;\geq\; \int_{1}^{n} \log_2 x \, dx
\;=\; \frac{n \ln n - n + 1}{\ln 2} \;=\; \Omega(n \log n).
\]
Therefore any comparison-based sorting algorithm requires \(\Omega(n \log n)\) comparisons
in the worst case.
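The exact bound \(\lceil \log_2(n!) \rceil\) is easy to compute and to compare with the \(\Theta(n \log n)\) envelope (a numeric sketch; the helper name is ours, and Python's math.log2 accepts arbitrarily large integers, so the factorial never overflows a float):

```python
import math

def comparison_lower_bound(n):
    """ceil(log2(n!)): every comparison sort needs at least this many
    comparisons on some input of size n."""
    return math.ceil(math.log2(math.factorial(n)))

assert comparison_lower_bound(3) == 3   # log2(6) ~ 2.58, so 3 comparisons

# The bound grows as Theta(n log n): n! >= (n/2)^(n/2), since the largest
# n/2 factors are each at least n/2, and n! <= n^n trivially.
for n in [4, 16, 64, 256]:
    lb = comparison_lower_bound(n)
    assert (n / 2) * math.log2(n / 2) <= lb <= n * math.log2(n)
```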
This lower bound is tight: merge sort achieves \(O(n \log n)\) comparisons in every
case, and is therefore optimal in the comparison model. Quicksort achieves \(O(n \log n)\) in
expectation (over random pivot choices) but \(O(n^2)\) in the worst case. Heapsort also achieves
\(O(n \log n)\) worst-case while sorting in place, by exploiting a tree-based data
structure (the binary heap, defined next) in which each parent stores a key
no greater than those of its children.
Definition: Binary Heap
A (binary min-) heap is a complete binary tree — a binary
tree in which every level is filled, except possibly the last, which is filled from left to
right — in which every vertex stores a key, and each vertex's key is less than or equal to
those of its children. This heap property ensures that the minimum key
resides at the root.
The completeness condition forces the height of an \(n\)-vertex heap to be \(\lfloor \log_2 n \rfloor\),
so insertion and extract-min — implemented by sift-up and sift-down along a single root-to-leaf
path — both run in \(O(\log n)\) time.
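Python's heapq module implements exactly this structure, storing the complete binary tree in a flat list where the children of index \(i\) sit at indices \(2i+1\) and \(2i+2\):

```python
import heapq

h = []
for key in [7, 2, 9, 4, 1]:
    heapq.heappush(h, key)        # sift-up along one root-to-leaf path
assert h[0] == 1                  # the minimum key resides at the root

# n pushes followed by n pops is heapsort: O(n log n) comparisons in total
out = [heapq.heappop(h) for _ in range(len(h))]
assert out == [1, 2, 4, 7, 9]
```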
The heap is the standard implementation of the priority queue abstract data type — the
same structure that powers Dijkstra's algorithm in
Network Flow & Walks on Graphs,
where it enables efficient extraction of the vertex with minimum tentative distance.
Beyond Comparisons
The \(\Omega(n \log n)\) barrier is a statement about the comparison model, not about sorting
in general. Algorithms that exploit the internal structure of keys can escape this bound entirely.
Counting sort sorts \(n\) integers in the range \(\{0, 1, \ldots, k\}\) in
\(O(n + k)\) time by using direct addressing: it allocates an array of size \(k + 1\), counts
occurrences, and reads off the sorted output. Radix sort extends this to
\(d\)-digit keys by applying counting sort to each digit position, achieving \(O(d(n + k))\).
These algorithms do not compare pairs of elements — they bypass the decision tree model entirely.
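A counting sort sketch makes the \(O(n + k)\) claim concrete; note that no two elements are ever compared with each other anywhere in it:

```python
def counting_sort(a, k):
    """Sort integers drawn from {0, ..., k} in O(n + k) time."""
    count = [0] * (k + 1)
    for x in a:                    # direct addressing: tally each key
        count[x] += 1
    out = []
    for key in range(k + 1):       # read off the sorted output
        out.extend([key] * count[key])
    return out

assert counting_sort([4, 1, 3, 4, 3], k=4) == [1, 3, 3, 4, 4]
```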
The Decision Tree as an Information-Theoretic Argument
The \(\Omega(n \log n)\) lower bound is fundamentally an information-theoretic
argument. If the \(n!\) input permutations are equally likely, the initial uncertainty is
the Shannon entropy
\(\mathbb{H} = \log_2(n!)\) bits. Each binary comparison yields at most one bit of
information — it partitions the remaining possibilities into two groups — so at least
\(\log_2(n!) = \Omega(n \log n)\) comparisons are needed to identify the permutation.
This style of reasoning — bounding the depth of any decision procedure by the entropy of
the input — recurs throughout
complexity theory. The same idea shows that
any comparison-based algorithm for finding an element in a sorted array requires
\(\Omega(\log n)\) comparisons, which binary search matches exactly.
Counting Trees: Catalan and Schröder Numbers
We have seen that binary trees, BSTs, and decision trees are ubiquitous computational structures.
A natural question arises: how many distinct binary trees are there on \(n\) nodes?
The answer leads us to one of the most celebrated sequences in combinatorics — the
Catalan numbers — and reveals deep connections between tree enumeration,
lattice paths, and generating functions. The tools here build directly on the counting
principles and binomial coefficients developed in
Introduction to Combinatorics.
The Catalan Numbers
Definition: Catalan Numbers
The \(n\)-th Catalan number \(C_n\) is the number of structurally distinct
binary trees on \(n\) nodes (with \(C_0 = 1\) for the empty tree). The first several values are
\[
C_0 = 1,\; C_1 = 1,\; C_2 = 2,\; C_3 = 5,\; C_4 = 14,\; C_5 = 42.
\]
The recursive structure of binary trees yields a recurrence that determines the entire sequence
from the base case \(C_0 = 1\).
Theorem: Catalan Recurrence
The Catalan numbers satisfy
\[
C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1.
\]
Proof (via root-splitting):
Consider a binary tree on \(n + 1\) nodes. Removing the root leaves a left subtree with \(i\)
nodes (for some \(0 \leq i \leq n\)) and a right subtree with \(n - i\) nodes. By the
Rule of Product,
the number of trees with \(i\) nodes in the left subtree and \(n - i\) in the right is
\(C_i \cdot C_{n-i}\). Summing over all possible values of \(i\):
\[
C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}.
\]
The base case \(C_0 = 1\) accounts for the unique empty tree.
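The recurrence determines the whole sequence from \(C_0 = 1\), and a short computation confirms the values quoted above as well as agreement with the closed form \(C_n = \frac{1}{n+1}\binom{2n}{n}\) established later in this section (a sketch; the function name is ours):

```python
from math import comb

def catalan_by_recurrence(n_max):
    """C_{n+1} = sum_{i=0}^{n} C_i * C_{n-i}, with C_0 = 1."""
    C = [1]
    for n in range(n_max):
        C.append(sum(C[i] * C[n - i] for i in range(n + 1)))
    return C

C = catalan_by_recurrence(10)
assert C[:6] == [1, 1, 2, 5, 14, 42]
# agreement with the closed form C_n = binom(2n, n) / (n + 1)
assert all(C[n] == comb(2 * n, n) // (n + 1) for n in range(11))
```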
The Generating Function
Define the ordinary generating function
\[
C(u) = \sum_{n=0}^{\infty} C_n \, u^n.
\]
The Catalan recurrence \(C_{n+1} = \sum_{i=0}^{n} C_i \, C_{n-i}\) is a convolution:
by the definition of formal power series multiplication, the right-hand side is the coefficient
of \(u^n\) in the product \(C(u) \cdot C(u) = C(u)^2\). Hence
\(\sum_{n=0}^{\infty} C_{n+1} u^n = C(u)^2\). The left-hand side equals
\(\bigl(C(u) - C_0\bigr)/u = \bigl(C(u) - 1\bigr)/u\). Therefore:
\[
\frac{C(u) - 1}{u} = C(u)^2 \quad \implies \quad u \, C(u)^2 - C(u) + 1 = 0.
\]
This is a quadratic equation in \(C(u)\). By the quadratic formula:
\[
C(u) = \frac{1 \pm \sqrt{1 - 4u}}{2u}.
\]
Since \(C(0) = C_0 = 1\), we need the root that remains finite as \(u \to 0\). The "\(+\)" root
gives \((1 + \sqrt{1 - 4u})/(2u) \to \infty\) as \(u \to 0\), so we take the "\(-\)" root:
Theorem: Catalan Generating Function
The ordinary generating function of the Catalan numbers is
\[
C(u) = \sum_{n=0}^{\infty} C_n \, u^n = \frac{1 - \sqrt{1 - 4u}}{2u}.
\]
The first few coefficients of the right-hand side, expanded as a power series in \(u\), are
\[
\frac{1 - \sqrt{1 - 4u}}{2u} = 1 + u + 2u^2 + 5u^3 + 14u^4 + 42u^5 + \cdots,
\]
recovering the Catalan sequence.
The Closed Form via the Reflection Principle
We now derive the closed-form expression \(C_n = \frac{1}{n+1}\binom{2n}{n}\) using a
bijection between binary trees and lattice paths, combined with the reflection principle.
Definition: Dyck Path
A Dyck path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\)
consisting of \(n\) up-steps \((1, +1)\) and \(n\) down-steps \((1, -1)\) that never passes
below the \(x\)-axis.
Bijection: binary trees \(\leftrightarrow\) Dyck paths. We define a recursive
encoding. Given a binary tree \(T\) on \(n\) nodes with root \(r\), left subtree \(L\), and
right subtree \(R\), encode \(T\) as
\[
\texttt{encode}(T) = \text{"("} \;\cdot\; \texttt{encode}(L) \;\cdot\; \text{")"} \;\cdot\; \texttt{encode}(R),
\]
with the empty tree mapping to the empty string. Each node contributes exactly one "(" and one ")",
so the result is a string of length \(2n\). Interpreting "(" as an up-step and ")" as a down-step
yields a Dyck path of semilength \(n\): the path never dips below the \(x\)-axis because at
every prefix, the recursive nesting ensures that the number of opening parentheses is at least
the number of closing parentheses. The encoding is invertible: scanning the string from left to
right, the first "(" corresponds to the root, the balanced substring immediately following it
encodes \(L\), the matching ")" closes the root, and the remaining substring encodes \(R\).
Equivalently, the encoding above produces a sequence of \(n\) pairs of balanced parentheses —
a string in which every opening parenthesis has a matching closing one and, at every prefix, the number
of opening parentheses is at least the number of closing ones. The Dyck path / balanced parenthesization
correspondence (up = "(", down = ")") thus identifies binary trees, Dyck paths, and balanced parenthesizations
as three views of the same object.
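The encoding can be exercised exhaustively for small sizes: it is injective, every output is a balanced string, and the number of distinct strings matches \(C_n\) (a brute-force sketch with our own helper names):

```python
from math import comb

def all_trees(n):
    """All binary trees on n nodes as nested pairs (L, R); () is the empty tree."""
    if n == 0:
        return [()]
    trees = []
    for i in range(n):                   # i nodes in the left subtree
        for L in all_trees(i):
            for R in all_trees(n - 1 - i):
                trees.append((L, R))
    return trees

def encode(T):
    """encode(T) = "(" + encode(L) + ")" + encode(R); empty tree -> empty string."""
    if T == ():
        return ""
    L, R = T
    return "(" + encode(L) + ")" + encode(R)

for n in range(6):
    trees = all_trees(n)
    strings = {encode(T) for T in trees}
    # injective, and the count is the Catalan number C_n
    assert len(strings) == len(trees) == comb(2 * n, n) // (n + 1)
    for s in strings:                    # every encoding is balanced
        bal = 0
        for ch in s:
            bal += 1 if ch == "(" else -1
            assert bal >= 0              # prefixes never go negative
        assert bal == 0
```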
Proof (Reflection Principle):
By the bijection above, \(C_n\) equals the number of Dyck paths of length \(2n\). We count
these by subtracting bad paths from the unrestricted total. The total number of lattice paths
from \((0,0)\) to \((2n,0)\) with \(n\) up-steps and \(n\) down-steps (without any restriction)
is \(\binom{2n}{n}\) — we simply choose which \(n\) of the \(2n\) steps are up-steps.
We must subtract the bad paths: those that touch or cross below the \(x\)-axis,
i.e., paths that at some point reach height \(-1\). Consider such a bad path. Let \(P\) be the
first point where the path reaches height \(-1\). Reflect the portion of the
path from \(P\) onward in the line \(y = -1\): every up-step becomes a down-step and vice versa.
The reflected path goes from \((0,0)\) to \((2n, -2)\), which consists of \((n-1)\) up-steps and
\((n+1)\) down-steps.
This reflection is a bijection between bad paths from \((0,0)\) to \((2n,0)\) and
all paths from \((0,0)\) to \((2n, -2)\). Indeed, every path ending at \((2n, -2)\)
must touch the line \(y = -1\) (since each step changes \(y\) by exactly \(1\) and the endpoint
is at \(y = -2\)); reflecting the portion from its first such point onward inverts the operation
and recovers a bad path ending at \((2n, 0)\). The number of paths from \((0,0)\) to \((2n, -2)\)
is \(\binom{2n}{n-1}\) (choosing which \(n - 1\) of \(2n\) steps are up-steps). Using the
identity \(\binom{2n}{n-1} = \frac{n}{n+1}\binom{2n}{n}\) (immediate from the factorial form),
\[
C_n = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n}
= \frac{1}{n+1}\binom{2n}{n}.
\]
The reflection principle exemplifies a powerful pattern in enumerative combinatorics: when the
objects of interest are hard to count directly, partition a larger, easily counted family (here,
all \(\binom{2n}{n}\) paths) into good and bad pieces, biject the bad piece with another easily
counted family, and subtract — the rule of sum applied to the resulting partition.
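The count can be verified by brute force for small \(n\): enumerate all \(\binom{2n}{n}\) placements of the up-steps, discard paths that dip below the axis, and compare with \(\binom{2n}{n} - \binom{2n}{n-1}\):

```python
from math import comb
from itertools import combinations

def dyck_count_brute(n):
    """Count +-1-step paths of length 2n ending at height 0 that never dip below 0."""
    total = 0
    for ups in combinations(range(2 * n), n):   # positions of the n up-steps
        up_set = set(ups)
        height, ok = 0, True
        for t in range(2 * n):
            height += 1 if t in up_set else -1
            if height < 0:                      # bad path: reached height -1
                ok = False
                break
        total += ok
    return total

for n in range(1, 7):
    assert dyck_count_brute(n) == comb(2 * n, n) - comb(2 * n, n - 1)
```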
Schröder Numbers and Lattice Path Generalizations
The Catalan numbers count lattice paths with two types of steps: up \((1, +1)\) and
down \((1, -1)\). A natural generalization is to allow a third type of step.
Definition: Schröder Path
A Schröder path of length \(2n\) is a lattice path from \((0,0)\) to \((2n, 0)\)
using up-steps \((1, +1)\), down-steps \((1, -1)\), and flat steps (or
diagonal steps) \((2, 0)\), that never passes below the \(x\)-axis. A
Schröder path with no flat steps on the \(x\)-axis is called a small Schröder path.
The number of Schröder paths of semilength \(n\) (i.e., from \((0,0)\) to \((2n,0)\)) is the
large Schröder number \(S_n\), and the number of small Schröder paths is the
little Schröder number (or super Catalan number) \(s_n\).
The first several values are:
\[
S_n \colon \; 1,\, 2,\, 6,\, 22,\, 90,\, 394,\, \ldots \qquad
s_n \colon \; 1,\, 1,\, 3,\, 11,\, 45,\, 197,\, \ldots
\]
One observes that \(S_n = 2\, s_n\) for \(n \geq 1\). The distinction is that large
Schröder paths permit flat steps on the \(x\)-axis while small Schröder paths forbid them;
the precise factor of 2 can be established by a decomposition argument at the first return
to the \(x\)-axis.
In terms of tree enumeration, the little Schröder numbers \(s_n\) count plane trees
(ordered rooted trees where each internal vertex has \(\geq 2\) children) on \(n + 1\) leaves.
For instance, \(s_2 = 3\) counts the three plane trees with 3 leaves: the root with 3
children (all leaves), and the two trees where the root has 2 children with one being an
internal node bearing 2 leaves. Where the Catalan numbers count binary trees on \(n\) nodes
(each internal node having exactly 2 child slots, possibly empty), the little Schröder numbers
count the more general case where branching can vary — each internal node may have 2, 3, or
more children.
The large Schröder numbers satisfy the recurrence
\[
(n+2)\, S_{n+1} = 3(2n+1)\, S_n - (n-1)\, S_{n-1}, \qquad S_0 = 1,\; S_1 = 2,
\]
as well as a convolution-type recurrence that mirrors the Catalan case:
\[
S_{n+1} = S_{n} + \sum_{k=0}^{n} S_k \, S_{n-k}, \qquad S_0 = 1.
\]
The additional \(S_n\) term (compared to the pure convolution \(C_{n+1} = \sum C_k C_{n-k}\))
reflects the extra flat step available in Schröder paths. Both recurrences can be derived from
the generating function \(S(x) = (1 - x - \sqrt{1 - 6x + x^2})/(2x)\); we state them without
proof here, as the Schröder numbers play a supporting role in this page rather than a central one.
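Both recurrences are nevertheless easy to run, and they agree (a sketch; that the integer division in the second recurrence is exact is a known property of the Schröder numbers):

```python
def schroder_convolution(n_max):
    """S_{n+1} = S_n + sum_{k=0}^{n} S_k * S_{n-k}, with S_0 = 1."""
    S = [1]
    for n in range(n_max):
        S.append(S[n] + sum(S[k] * S[n - k] for k in range(n + 1)))
    return S

def schroder_holonomic(n_max):
    """(n+2) S_{n+1} = 3(2n+1) S_n - (n-1) S_{n-1}, with S_0 = 1, S_1 = 2."""
    S = [1, 2]
    for n in range(1, n_max):
        S.append((3 * (2 * n + 1) * S[n] - (n - 1) * S[n - 1]) // (n + 2))
    return S

A = schroder_convolution(10)
B = schroder_holonomic(10)
assert A[:6] == [1, 2, 6, 22, 90, 394]   # the values listed above
assert A == B                            # the two recurrences agree
```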
The Motzkin Numbers
Between Dyck paths and Schröder paths lies a third family. A Motzkin path of
length \(n\) is a lattice path from \((0,0)\) to \((n, 0)\) using up-steps \((1,+1)\),
down-steps \((1,-1)\), and horizontal steps \((1, 0)\), that never passes below
the \(x\)-axis. The number of Motzkin paths of length \(n\) is the Motzkin number
\(M_n\):
\[
M_n \colon \; 1,\, 1,\, 2,\, 4,\, 9,\, 21,\, 51,\, \ldots
\]
In terms of trees, \(M_n\) counts the number of ordered trees on \(n+1\) vertices where each
internal vertex has either one or two children (unary-binary trees). This positions the three
families in a natural hierarchy:
- Catalan \(C_n\): binary trees on \(n\) nodes — each internal node has at most
2 children, distinguished as left and right.
- Motzkin \(M_n\): unary-binary trees — each internal node has 1 or 2 children
(\(n\) edges).
- Little Schröder \(s_n\): plane trees — each internal node has \(\geq 2\) children
(\(n+1\) leaves).
Catalan Numbers: A Universal Sequence in Discrete Mathematics
The Catalan numbers are one of the most frequently occurring sequences in all of
combinatorics, appearing in at least 200 known combinatorial interpretations. Beyond binary
trees, Dyck paths, and balanced parenthesizations, they count: triangulations of a convex
\((n+2)\)-gon, non-crossing partitions of \(\{1, 2, \ldots, n\}\), and monotonic lattice paths
from \((0,0)\) to \((n,n)\) that never rise above the diagonal (the setting of the classical
ballot problem). This
universality is reflected algebraically in the generating function
\(C(u) = (1 - \sqrt{1 - 4u})/(2u)\), which arises naturally whenever a combinatorial
structure decomposes into two independent substructures of the same kind — the hallmark
of the root-splitting recursion in binary trees.
Labeled Trees: Cayley's Formula
The Catalan and Schröder numbers count unlabeled trees — structures considered up to
isomorphism. We now ask a different question: given a fixed vertex set \(\{1, 2, \ldots, n\}\),
how many distinct labeled trees can be formed? Equivalently, how many spanning trees
does the complete graph \(K_n\) have? The answer, due to Cayley, is one of the most elegant
results in combinatorics: there are exactly \(n^{n-2}\) labeled trees on \(n\) vertices.
For small values: with \(n = 2\) there is \(2^0 = 1\) tree (the single edge \(12\)); with \(n = 3\)
there are \(3^1 = 3\) trees (star graphs centered at each vertex); with \(n = 4\) there are
\(4^2 = 16\) labeled trees.
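These small cases can be confirmed by brute force: enumerate every \((n-1)\)-edge subset of \(K_n\) and count those that are connected (with \(n - 1\) edges, connected is equivalent to being a tree). A sketch, feasible only for small \(n\):

```python
from itertools import combinations

def count_labeled_trees(n):
    """Count spanning trees of K_n by exhaustive search over edge subsets."""
    vertices = range(1, n + 1)
    all_edges = list(combinations(vertices, 2))
    count = 0
    for subset in combinations(all_edges, n - 1):
        adj = {v: [] for v in vertices}
        for u, v in subset:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {1}, [1]        # check connectivity from vertex 1
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        count += len(seen) == n
    return count

# matches n^(n-2) for n = 2, 3, 4, 5
assert [count_labeled_trees(n) for n in (2, 3, 4, 5)] == [1, 3, 16, 125]
```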
We prove Cayley's formula by exhibiting a bijection between labeled trees on \([n]\) and
sequences in \([n]^{n-2}\), known as Prüfer sequences.
Prüfer Sequences
Definition: Prüfer Sequence
A Prüfer sequence of length \(n - 2\) is a sequence
\((a_1, a_2, \ldots, a_{n-2})\) where each \(a_i \in [n] = \{1, 2, \ldots, n\}\).
Encoding: labeled tree \(\to\) Prüfer sequence. Given a labeled tree \(T\) on \([n]\):
- Find the leaf with the smallest label. Call it \(\ell\).
- Record the label of the unique neighbor of \(\ell\).
This is the first entry \(a_1\) of the Prüfer sequence.
- Remove \(\ell\) and its incident edge from \(T\).
- Repeat steps 1-3 on the resulting tree with \(n - 1\) vertices, recording \(a_2\), then
\(a_3\), and so on.
- Stop after \(n - 2\) iterations. At this point, exactly 2 vertices remain (connected by
a single edge).
Example:
Consider the labeled tree on \(\{1, 2, 3, 4, 5, 6\}\) with edges
\(\{12,\, 13,\, 24,\, 25,\, 36\}\).
Step 1. Leaves are \(\{4, 5, 6\}\). Smallest leaf: \(4\).
Neighbor of \(4\) is \(2\). Record \(a_1 = 2\). Remove vertex \(4\).
Step 2. Leaves: \(\{5, 6\}\). Smallest leaf: \(5\).
Neighbor of \(5\) is \(2\). Record \(a_2 = 2\). Remove vertex \(5\).
Step 3. Leaves: \(\{2, 6\}\). Smallest leaf: \(2\).
Neighbor of \(2\) is \(1\). Record \(a_3 = 1\). Remove vertex \(2\).
Step 4. Leaves: \(\{1, 6\}\). Smallest leaf: \(1\).
Neighbor of \(1\) is \(3\). Record \(a_4 = 3\). Remove vertex \(1\).
Remaining vertices: \(\{3, 6\}\). The Prüfer sequence is \((2, 2, 1, 3)\).
Decoding: Prüfer sequence \(\to\) labeled tree. Given a Prüfer sequence
\((a_1, a_2, \ldots, a_{n-2})\):
- Initialize a degree counter: set \(d(v) = 1 + (\text{number of times } v \text{ appears in the sequence})\)
for each \(v \in [n]\).
- For \(i = 1, 2, \ldots, n-2\): find the smallest \(v \in [n]\) with \(d(v) = 1\).
Add edge \(\{v, a_i\}\) to the tree. Decrement \(d(v)\) and \(d(a_i)\) by 1.
- After processing all entries, exactly two vertices have \(d(v) = 1\). Connect them with
the final edge.
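Both procedures fit in a few lines, and a round trip on the worked example above recovers the sequence \((2, 2, 1, 3)\) and the original tree (a sketch with our own function names; decrementing a processed vertex's counter to zero is realized here by deleting it):

```python
def prufer_encode(n, edges):
    """Tree on [n] -> Prüfer sequence, by pruning the smallest leaf n-2 times."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)
        (neighbor,) = adj[leaf]        # the unique neighbor of the leaf
        seq.append(neighbor)
        adj[neighbor].remove(leaf)
        del adj[leaf]
    return seq

def prufer_decode(n, seq):
    """Prüfer sequence -> tree on [n], via the degree-counter procedure."""
    d = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        d[a] += 1                      # d(v) = 1 + occurrences of v
    edges = []
    for a in seq:
        v = min(u for u in d if d[u] == 1)
        edges.append((v, a))
        d[a] -= 1
        del d[v]                       # v is processed and never reappears
    u, w = sorted(d)                   # exactly two survivors remain
    edges.append((u, w))
    return edges

tree = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
seq = prufer_encode(6, tree)
assert seq == [2, 2, 1, 3]             # matches the worked example
decoded = prufer_decode(6, seq)
assert sorted(tuple(sorted(e)) for e in decoded) == sorted(tree)
```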
The encoding and decoding procedures are mutual inverses, establishing a bijection between
labeled trees on \([n]\) and sequences in \([n]^{n-2}\). Verifying this rests on a few
elementary observations.
- Encoding is well-defined. At each iteration, the current tree has at least
2 leaves, so the smallest leaf is unambiguously chosen. (Any tree on \(\geq 2\) vertices has
at least 2 leaves: a tree on \(n\) vertices has \(n - 1\) edges, so by counting each edge
from both endpoints, \(\sum_v \deg(v) = 2(n-1)\); if at most one vertex had degree 1, the
sum would be at least \(1 + 2(n-1) = 2n - 1 > 2(n-1)\), a contradiction.) Removing a leaf
preserves the tree property, so the procedure runs for exactly \(n - 2\) steps.
- Decoding is well-defined. The degree counter \(d(v)\) maintains the invariant
that at the start of step \(i\), for each unprocessed vertex \(v\),
\[
d(v) = 1 + \#\{j \geq i : a_j = v\},
\]
verifiable by induction on \(i\). Summing over the \(n - i + 1\) unprocessed vertices gives
\(\sum d(v) = (n - i + 1) + (n - 1 - i) = 2(n - i)\). Since this is strictly less than
\(2(n - i + 1)\), some unprocessed vertex must have \(d(v) = 1\); the smallest such is
unambiguously chosen. The resulting graph collects \(n - 1\) edges on \(n\) vertices and
is acyclic, by the following component invariant: at the start of each step, the
connected component of any unprocessed vertex \(v\) contains only \(v\) among the unprocessed
vertices (its other members are vertices processed and deactivated in earlier steps, by
induction on the step index). Hence the vertex \(v_i\) chosen at step \(i\) and \(a_i\) lie in
different components when the edge \(\{v_i, a_i\}\) is added, so every edge of the constructed
graph, including the final edge between the two survivors, is a bridge, and the graph is a
forest. With \(n - 1\) edges on \(n\) vertices, it is in fact a tree.
- The two procedures are inverse. Both are driven by the same recursive
principle — at each step, the smallest leaf is matched with its unique neighbor — so
applying decoding to an encoded sequence reconstructs the original tree, and conversely.
A formal verification proceeds by induction on \(n\).
A useful consequence: a vertex \(v\) appears in the Prüfer sequence of \(T\) exactly
\(\deg_T(v) - 1\) times. Indeed, \(v\) is recorded as the neighbor of a removed leaf precisely
when one of its incident edges is deleted; each such deletion reduces \(\deg(v)\) by 1, and the
process terminates with \(v\) either still present (when \(\deg_T(v) - 1\) of its edges have been
removed) or having become itself a leaf and been removed (recording its own neighbor, not \(v\)
itself). In both cases \(v\) is recorded \(\deg_T(v) - 1\) times. In particular, leaves never
appear in the sequence, and the two vertices remaining at the end are those whose degree counter
has been decremented to 1 after all entries are processed.
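This multiplicity fact is easy to spot-check on the worked example from earlier in the section:

```python
from collections import Counter

# tree with edges {12, 13, 24, 25, 36}; its Prüfer sequence was (2, 2, 1, 3)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
seq = [2, 2, 1, 3]

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

for v in range(1, 7):
    # each vertex appears deg(v) - 1 times; leaves (deg 1) never appear
    assert seq.count(v) == deg[v] - 1
print("multiplicities match")
```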
A direct implementation of decoding runs in \(O(n \log n)\) time, maintaining the current
degree-1 vertices in a priority queue. A more careful implementation runs in \(O(n)\): one scan
of the sequence computes all degree counts up front, and a single moving pointer then tracks the
smallest available leaf.
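One way to realize the linear-time bound is the moving-pointer decoder sketched below (a standard trick, not taken from the text). It also uses the fact that the largest label \(n\) is never removed during encoding, since the smallest leaf is never the only leaf, so \(n\) is always one of the two survivors:

```python
def prufer_decode_linear(seq):
    """O(n) Prüfer decoding: a moving pointer replaces the priority queue."""
    n = len(seq) + 2
    deg = [1] * (n + 1)            # deg[v] = 1 + multiplicity of v in seq
    for a in seq:
        deg[a] += 1
    edges = []
    ptr = 1
    while deg[ptr] != 1:           # advance to the smallest current leaf
        ptr += 1
    leaf = ptr
    for a in seq:
        edges.append((leaf, a))
        deg[a] -= 1
        if deg[a] == 1 and a < ptr:
            leaf = a               # a just became the smallest leaf
        else:
            ptr += 1
            while deg[ptr] != 1:
                ptr += 1
            leaf = ptr
    edges.append((leaf, n))        # vertex n always survives to the end
    return edges

print(prufer_decode_linear([2, 2, 1, 3]))
# → [(4, 2), (5, 2), (2, 1), (1, 3), (3, 6)]
```

Each vertex is visited by `ptr` at most once over the whole run, so the total work is \(O(n)\) despite the nested-looking `while`.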
The vertex-frequency observation has a powerful consequence: it reduces the counting of trees
with a prescribed degree sequence to a problem about arrangements of symbols in a sequence.
Theorem: Degree-Constrained Tree Counting
The number of labeled trees on \([n]\) in which vertex \(v\) has degree \(d_v\)
(where \(d_v \geq 1\) and \(\sum_{v=1}^{n} d_v = 2(n-1)\)) is the
multinomial coefficient
\[
\binom{n-2}{d_1 - 1,\; d_2 - 1,\; \ldots,\; d_n - 1}
= \frac{(n-2)!}{(d_1 - 1)!\,(d_2 - 1)! \cdots (d_n - 1)!}.
\]
Proof:
Under the Prüfer bijection, vertex \(v\) appears exactly \(d_v - 1\) times in the
sequence of length \(n - 2\). The number of sequences with this prescribed multiplicity
is precisely the multinomial coefficient above — it counts the ways to arrange
\(d_1 - 1\) copies of symbol \(1\), \(d_2 - 1\) copies of symbol \(2\), etc.,
in a sequence of length \(\sum (d_v - 1) = 2(n-1) - n = n - 2\).
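The theorem can be sanity-checked on a small instance by comparing the multinomial value against a brute-force count over Prüfer sequences (function names below are illustrative):

```python
from itertools import product
from math import factorial

def count_trees_with_degrees(degs):
    """Multinomial formula: (n-2)! / prod (d_v - 1)!, with degs[v-1] = d_v."""
    n = len(degs)
    assert all(d >= 1 for d in degs) and sum(degs) == 2 * (n - 1)
    out = factorial(n - 2)
    for d in degs:
        out //= factorial(d - 1)
    return out

def brute_force(degs):
    """Count sequences in [n]^(n-2) where v appears exactly d_v - 1 times."""
    n = len(degs)
    return sum(
        1
        for s in product(range(1, n + 1), repeat=n - 2)
        if all(s.count(v) == degs[v - 1] - 1 for v in range(1, n + 1))
    )

degs = (1, 2, 1, 3, 1)   # n = 5; degrees sum to 8 = 2(n - 1)
print(count_trees_with_degrees(degs), brute_force(degs))  # → 3 3
```

Here the three trees correspond to the three arrangements of the multiset \(\{2, 4, 4\}\) as a Prüfer sequence.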
Proof of Cayley's Formula:
The Prüfer bijection establishes
\(\{\text{labeled trees on } [n]\} \longleftrightarrow [n]^{n-2}\).
The set \([n]^{n-2}\) has \(n^{n-2}\) elements (each of the \(n-2\) positions
has \(n\) choices), giving the result immediately.
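For small \(n\), the count can be confirmed exhaustively: decoding every sequence in \([n]^{n-2}\) must yield \(n^{n-2}\) pairwise distinct trees. A sketch for \(n = 4\) (the helper name is illustrative):

```python
from itertools import product

def decode_to_edge_set(seq):
    """Decode a Prüfer sequence into a labeled tree, as a set of edges."""
    n = len(seq) + 2
    d = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        d[a] += 1
    edges = set()
    for a in seq:
        v = min(u for u in d if d[u] == 1)  # smallest vertex with d(v) = 1
        edges.add(frozenset((v, a)))
        d[a] -= 1
        del d[v]
    edges.add(frozenset(d))   # final edge between the two survivors
    return frozenset(edges)

n = 4
trees = {decode_to_edge_set(s) for s in product(range(1, n + 1), repeat=n - 2)}
print(len(trees))  # → 16, which is 4^{4-2}
```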
Cayley's formula counts the spanning trees of the complete graph \(K_n\). Recall from
Introduction to Graph Theory that \(K_n\) is
the graph on \(n\) vertices where every pair is connected by an edge — it has \(\binom{n}{2}\) edges,
and Cayley's formula tells us that \(n^{n-2}\) of its \(\binom{\binom{n}{2}}{n-1}\) possible spanning
subgraphs with \(n - 1\) edges are trees.
Algorithm: Prüfer Encoding
Input: labeled tree \(T\) on vertex set \([n]\);
Output: Prüfer sequence \((a_1, \ldots, a_{n-2})\);
begin
for \(i = 1\) to \(n - 2\):
\(\ell \leftarrow\) smallest leaf in \(T\);
\(a_i \leftarrow\) neighbor of \(\ell\) in \(T\);
remove \(\ell\) and its incident edge from \(T\);
end for
return \((a_1, a_2, \ldots, a_{n-2})\);
end
Kirchhoff's Matrix Tree Theorem and the Graph Laplacian
Cayley's formula counts spanning trees of \(K_n\). For a general graph \(G\), the number of
spanning trees is given by Kirchhoff's matrix tree theorem: it equals any
cofactor of the
graph Laplacian \(L = D - A\), where \(D\) is the degree matrix and \(A\) is the
adjacency matrix. Equivalently, if \(\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\) are
the eigenvalues of \(L\) (with \(\lambda_1 = 0\)), then the number of spanning trees is
\[
\tau(G) = \frac{1}{n}\prod_{i=2}^{n} \lambda_i.
\]
For \(K_n\), the Laplacian has eigenvalue \(n\) with multiplicity \(n-1\), giving
\(\tau(K_n) = n^{n-1}/n = n^{n-2}\) — an independent algebraic proof of Cayley's formula.
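The cofactor computation is straightforward to sketch with exact rational arithmetic; the adjacency-matrix representation and function name below are illustrative choices:

```python
from fractions import Fraction

def spanning_trees(adj):
    """Kirchhoff: tau(G) = det of the Laplacian with row/column 0 deleted."""
    n = len(adj)
    # reduced Laplacian L = D - A with row/col 0 removed, exact arithmetic
    M = [[Fraction(sum(adj[i]) if i == j else -adj[i][j])
          for j in range(1, n)] for i in range(1, n)]
    m = n - 1
    det = Fraction(1)
    for col in range(m):             # Gaussian elimination, tracking det
        piv = next((r for r in range(col, m) if M[r][col] != 0), None)
        if piv is None:
            return 0                 # singular reduced Laplacian: no spanning tree
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det               # row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m):
                M[r][c] -= f * M[col][c]
    return int(det)

# K_4: Cayley's formula predicts 4^{4-2} = 16 spanning trees
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(spanning_trees(K4))  # → 16
```

A path on 4 vertices, which is its own unique spanning tree, gives 1 with the same routine.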
The matrix tree theorem connects combinatorial enumeration to spectral theory, a theme developed
in the study of
graph Laplacians and spectral methods.
In probabilistic combinatorics, random spanning trees of a graph are sampled via
Wilson's algorithm (which uses random walks) and have deep connections to the determinantal
point processes that arise in statistical physics.