Counting
We begin by revisiting the two most fundamental counting principles, which form the basis for all combinatorial reasoning.
Definition: Fundamental Counting Principle
- Rule of Sum:
If a set can be expressed as the disjoint union of two (or more) subsets, then the number of elements in the set is the sum
of the numbers of elements in the subsets.
- Rule of Product:
Consider a (finite) sequence of decisions. Suppose the number of choices for each individual decision is independent of
decisions made previously in the sequence. Then the number of ways to make the whole sequence of decisions is the product
of these numbers of choices.
There are two complementary ways to view the combination
\[
{}_n C_r = C(n, r) = \binom{n}{r}.
\]
The combinatorial definition is operational: "\(n\) choose \(r\)" is the number of ways to
choose \(r\) elements from a collection of \(n\) elements without regard to order.
The algebraic formula is
\[
C(n, r) = \frac{n!}{r!\,(n-r)!}.
\]
The two definitions agree: this equivalence is established below as an immediate consequence of the
rule of product applied to ordered selections (see the discussion preceding the definition of
permutation). The quantity \(C(n, r)\) is called a binomial coefficient because of
its role in the binomial theorem. It is a special case of the more general
multinomial coefficient, defined below.
Definition: Multinomial Coefficient
For non-negative integers \(r_1, r_2, \ldots, r_k\) with \(r_1 + r_2 + \cdots + r_k = n\),
the multinomial coefficient is
\[
\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}.
\]
It counts the number of ways to partition an \(n\)-element set into ordered blocks of sizes
\(r_1, r_2, \ldots, r_k\). Equivalently, it admits the iterated-binomial decomposition
\[
\binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1}\binom{n-r_1}{r_2}\binom{n-r_1-r_2}{r_3} \cdots \binom{n-r_1-r_2 - \cdots - r_{k-1}}{r_k},
\]
which expresses the same count by choosing the blocks one at a time. The binomial coefficient
\(\binom{n}{r}\) is the special case \(k = 2\) with \(r_1 = r\), \(r_2 = n - r\).
Selections with replacement. The number of ways to choose \(r\) items from \(n\) types
with replacement, where order does not matter, is \(C(r + n - 1, r)\); if order does matter,
it is \(n^r\). The first count is a stars and bars identity: each unordered selection
corresponds bijectively to an arrangement of \(r\) stars (the chosen items) and \(n - 1\) bars (separators
between the \(n\) item types), giving \(\binom{r + n - 1}{r}\) arrangements. For example, with
\(n = 4\) types \(\{A, B, C, D\}\) and \(r = 3\) picks: ordered selections give \(4^3 = 64\),
unordered selections give \(C(6, 3) = 20\).
Let's observe a couple of combination-related theorems:
Theorem: Pascal's Relation
If \(1 \leq r \leq n\), then
\[
C(n+1, r) = C(n, r-1) + C(n, r).
\]
Proof:
Consider an \((n+1)\)-element set \(\{x_1, x_2, \ldots, x_n, y\}\). Its \(r\)-element subsets
partition into two disjoint families: those that contain \(y\) and those that do not. The \(r\)-element
subsets that do not contain \(y\) are precisely the \(r\)-element subsets of
\(\{x_1, x_2, \ldots, x_n\}\), of which there are \(\binom{n}{r}\). Each \(r\)-element subset that
does contain \(y\) is determined by choosing the remaining \(r - 1\) elements from
\(\{x_1, x_2, \ldots, x_n\}\), giving \(\binom{n}{r-1}\) such subsets. By the
rule of sum,
\[
\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}.
\]
Theorem: Chu's Theorem
If \(n \geq r \geq 0\), then
\[
\begin{align*}
\sum_{k=0}^n C(k,r) &= C(r, r) + C(r+1, r) + C(r+2, r) + \cdots + C(n, r) \\\\
&= C(n+1, r+1).
\end{align*}
\]
Proof:
We first reduce the summation index. Since \(C(k, r) = 0\) for \(0 \leq k < r\) (there are no
\(r\)-element subsets of a smaller set), we have
\[
\sum_{k=0}^{n} C(k, r) = \sum_{k=r}^{n} C(k, r).
\]
It thus suffices to show \(\sum_{k=r}^{n} C(k, r) = C(n+1, r+1)\). The boundary case \(n = r\)
reduces both sides to \(C(r, r) = C(r+1, r+1) = 1\).
For \(n > r\), we apply Pascal's Relation
iteratively. Starting from the identity
\(C(r, r) = 1 = C(r+1, r+1)\), we have
\[
C(r, r) + C(r+1, r) \;=\; C(r+1, r+1) + C(r+1, r) \;=\; C(r+2, r+1).
\]
Adding \(C(r+2, r)\) to both sides and applying Pascal's Relation again:
\[
C(r, r) + C(r+1, r) + C(r+2, r) \;=\; C(r+2, r+1) + C(r+2, r) \;=\; C(r+3, r+1).
\]
Continuing in this fashion, after adding \(C(k, r)\) for \(k = r, r+1, \ldots, n\) we obtain
\[
\sum_{k=r}^{n} C(k, r) = C(n+1, r+1),
\]
as required.
Chu's Theorem immediately yields familiar summation formulas. The sum of the first \(n\) positive integers,
\[
1 + 2 + \cdots + n = \frac{n(n+1)}{2},
\]
is Chu's Theorem with \(r = 1\):
\[
C(1, 1) + C(2, 1) + \cdots + C(n, 1) = C(n+1, 2).
\]
This perspective also handles higher powers without induction. For the sum of squares
\(1^2 + 2^2 + \cdots + n^2\), we first rewrite \(k^2\) in the binomial-coefficient basis. Since
\(k(k-1) = 2\, C(k, 2)\),
\[
k^2 = k + k(k-1) = C(k, 1) + 2\, C(k, 2),
\]
and summing both sides over \(k = 1, \ldots, n\):
\[
\sum_{k=1}^n k^2 = \sum_{k=1}^n C(k, 1) + 2\sum_{k=1}^n C(k, 2).
\]
Applying Chu's Theorem to each sum,
\[
\begin{align*}
1^2 + 2^2 + \cdots + n^2 &= C(n+1, 2) + 2\, C(n+1, 3) \\\\
&= \frac{n(n+1)}{2} + 2 \cdot \frac{(n+1)n(n-1)}{6}\\\\
&= n(n+1) \cdot \frac{3 + 2(n-1)}{6} \\\\
&= \frac{n(n+1)(2n+1)}{6}.
\end{align*}
\]
The same strategy works in general:
Theorem: Sum of Powers of Positive Integers
For every integer \(m \geq 1\), there exist rational coefficients \(a_{1, m}, a_{2, m}, \ldots, a_{m, m}\) —
independent of \(k\) — such that
\[
k^m = \sum_{r=1}^{m} a_{r, m}\, C(k, r) \qquad \text{for all } k \in \mathbb{Z}_{\geq 0}.
\]
Consequently, summing over \(k = 1, \ldots, n\) and applying
Chu's Theorem yields
\[
\sum_{k=1}^{n} k^m = \sum_{r=1}^{m} a_{r, m}\, C(n+1, r+1),
\]
a polynomial in \(n\) of degree \(m + 1\).
Proof:
Existence of \(a_{r, m}\). Consider the polynomials \(C(k, 0), C(k, 1), \ldots, C(k, m)\)
in the variable \(k\). The polynomial \(C(k, r) = k(k-1)\cdots(k-r+1)/r!\) has degree exactly \(r\), so
these \(m + 1\) polynomials have pairwise distinct degrees. Polynomials of distinct degrees are linearly
independent (in any non-trivial linear relation, the term of highest degree cannot be cancelled), so
\(\{C(k, 0), C(k, 1), \ldots, C(k, m)\}\) is a basis of the space of polynomials in \(k\) of degree at most \(m\).
Since \(k^m\) lies in this space, it admits a unique expansion
\(k^m = \sum_{r=0}^{m} a_{r, m}\, C(k, r)\). Evaluating at \(k = 0\): for \(m \geq 1\) the left side is
\(0\), and on the right side \(C(0, r) = 0\) for \(r \geq 1\) while \(C(0, 0) = 1\), forcing \(a_{0, m} = 0\).
Thus only \(C(k, 1), \ldots, C(k, m)\) carry non-zero coefficients, and the expansion takes the stated form.
Sum formula. Summing the expansion over \(k = 1, \ldots, n\) and exchanging the order of summation,
\[
\begin{align*}
\sum_{k=1}^{n} k^m &= \sum_{k=1}^{n} \sum_{r=1}^{m} a_{r, m}\, C(k, r) \\\\
&= \sum_{r=1}^{m} a_{r, m} \sum_{k=1}^{n} C(k, r) \\\\
&= \sum_{r=1}^{m} a_{r, m}\, C(n+1, r+1),
\end{align*}
\]
where the last equality uses Chu's Theorem;
the sum starting at \(k = 1\) instead of \(k = 0\) is harmless
because \(C(0, r) = 0\) for \(r \geq 1\).
The coefficients \(a_{r, m}\) are in fact \(r! \cdot S(m, r)\), where \(S(m, r)\) is the
Stirling number of the second kind (counting partitions of an \(m\)-element set
into \(r\) non-empty blocks). We do not develop Stirling numbers here; the basis argument above
suffices to compute the \(a_{r, m}\) for any specific \(m\), as we now illustrate.
For \(m = 3\), we seek coefficients \(x_1, x_2, x_3\) such that
\[
\begin{align*}
k^3 &= x_1 C(k, 1) + x_2 C(k, 2) + x_3 C(k, 3) \\\\
&= x_1 k + \frac{x_2\, k(k-1)}{2} + \frac{x_3\, k(k-1)(k-2)}{6}.
\end{align*}
\]
Clearing denominators,
\[
6 k^3 = (6 x_1 - 3 x_2 + 2 x_3)\, k + (3 x_2 - 3 x_3)\, k^2 + x_3\, k^3.
\]
Comparing coefficients of \(k\), \(k^2\), and \(k^3\), we obtain
\(a_{1, 3} = x_1 = 1\), \(a_{2, 3} = x_2 = 6\), and \(a_{3, 3} = x_3 = 6\).
Hence,
\[
k^3 = C(k,1) + 6C(k,2) + 6C(k,3)
\]
and
\[
1^3 + 2^3 + \cdots + n^3 = C(n+1, 2) + 6C(n+1, 3) + 6C(n+1, 4) = \frac{n^2(n+1)^2}{4}.
\]
Rather than solving this system of equations by hand each time, we can organize the computation
using linear algebra.
Let \(\mathcal{P}_n \in \mathbb{R}^{n \times n}\) be the Pascal matrix whose \((i,j)\)-entry is the binomial
coefficient \(C(i,j)\) for \(1 \leq i, j \leq n\) (with the convention \(C(i, j) = 0\) when \(j > i\),
making \(\mathcal{P}_n\) lower triangular).
The matrix product \(\mathcal{P}_n^{-1} M\) extracts the desired coefficients. Indeed, define
the power matrix \(M \in \mathbb{R}^{n \times n}\) by \(M_{ij} = i^j\), and the coefficient
array \(A \in \mathbb{R}^{n \times n}\) by \(A_{rj} = a_{r, j}\) for \(1 \leq r \leq j\) and
\(A_{rj} = 0\) for \(r > j\); thus \(A\) is upper triangular. With these conventions, the
column identity
\[
M_{ij} = i^j = \sum_{r=1}^{j} a_{r, j}\, C(i, r) = \sum_{r=1}^{n} A_{rj}\, (\mathcal{P}_n)_{i, r}
\]
holds for each \(i, j\) (the upper limit extends from \(j\) to \(n\) because \(A_{rj} = 0\) for \(r > j\)),
and reads as the matrix equation \(M = \mathcal{P}_n A\), so \(A = \mathcal{P}_n^{-1} M\).
For example, when \(n = 4\),
\[
\mathcal{P}_4 = \begin{bmatrix}1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 \\ 4 & 6 & 4 & 1 \end{bmatrix}
\]
and its inverse is
\[
\mathcal{P}_4^{-1} = \begin{bmatrix}1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 3 & -3 & 1 & 0 \\ -4 & 6 & -4 & 1 \end{bmatrix}.
\]
(You can see the portion of Pascal's triangle.)
Compute
\[
\begin{align*}
&\begin{bmatrix}1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 3 & -3 & 1 & 0 \\ -4 & 6 & -4 & 1 \end{bmatrix}
\begin{bmatrix}1^1 & 1^2 & 1^3 & 1^4 \\ 2^1 & 2^2 & 2^3 & 2^4 \\ 3^1 & 3^2 & 3^3 & 3^4 \\ 4^1 & 4^2 & 4^3 & 4^4 \end{bmatrix} \\\\
&= \begin{bmatrix}1 & 1 & 1 & 1 \\ 0 & 2 & 6 & 14 \\ 0 & 0 & 6 & 36 \\ 0 & 0 & 0 & 24 \end{bmatrix}
\end{align*}
\]
Thus, the 4th column gives us
\[
k^4 = C(k,1) + 14C(k,2) + 36C(k,3) + 24C(k,4).
\]
In combinations, the order in which elements are chosen does not matter. Permutations,
by contrast, count ordered arrangements — and provide the missing piece that ties the
combinatorial and algebraic views of \(C(n, r)\) together.
Definition: Permutation
A permutation of \(r\) elements chosen from an \(n\)-element set is an
ordered arrangement of \(r\) distinct elements drawn from the set. By the
rule of product, the number of such
permutations is
\[
\begin{align*}
P(n, r) &= n(n-1)(n-2)\cdots(n-r+1) \\\\
&= \frac{n!}{(n-r)!} \\\\
&= r!\, C(n, r).
\end{align*}
\]
The third equality is what justifies the algebraic formula \(C(n, r) = n!/[r!\,(n-r)!]\) from the
combinatorial definition: an ordered selection of \(r\) elements is the same as choosing an
unordered \(r\)-subset and then ordering it, so \(P(n, r) = r! \cdot C(n, r)\).
With the fundamentals of counting, combinations, and permutations in place, we have the essential toolkit
for enumerating discrete structures. These techniques reappear throughout the curriculum — in
probability theory, where counting outcomes is the first step in computing probabilities; in the
analysis of algorithms, where combinatorial arguments bound running times and data-structure sizes;
and in the enumeration of trees, paths, and graphs, taken up in later pages on tree counting and
graph enumeration.