Intro to Combinatorics

Introduction Counting

Introduction

Combinatorics is the branch of discrete mathematics concerned with counting, arranging, and analyzing finite structures. Its central techniques — permutations, combinations, generating functions, and the principle of inclusion–exclusion — are the basic vocabulary of discrete probability, the analysis of algorithms, error-correcting codes, and cryptographic constructions. The same counting arguments that underlie a Bayesian belief update or a randomized hashing scheme begin here, with the rules of sum and product.

In this page, we review the fundamental counting principles — the rules of sum and product — and then develop the theory of combinations and permutations. Along the way, we encounter Pascal's Relation and Chu's Theorem, which reveal a connection between combinatorial identities and linear algebra through the Pascal matrix.

Counting

We begin by revisiting the two most fundamental counting principles, which form the basis for all combinatorial reasoning.

Definition: Fundamental Counting Principle
  • Rule of Sum:
    If a set can be expressed as the disjoint union of two (or more) subsets, then the number of elements in the set is the sum of the numbers of elements in the subsets.
  • Rule of Product:
    Consider a (finite) sequence of decisions. Suppose the number of choices for each individual decision is independent of decisions made previously in the sequence. Then the number of ways to make the whole sequence of decisions is the product of these numbers of choices.

There are two complementary ways to view the combination \[ {}_n C_r = C(n, r) = \binom{n}{r}. \] The combinatorial definition is operational: "\(n\) choose \(r\)" is the number of ways to choose \(r\) elements from a collection of \(n\) elements without regard to order.

The algebraic formula is \[ C(n, r) = \frac{n!}{r!\,(n-r)!}. \] The two definitions agree: this equivalence is established below as an immediate consequence of the rule of product applied to ordered selections (see the discussion preceding the definition of permutation). The quantity \(C(n, r)\) is called a binomial coefficient because of its role in the binomial theorem. It is a special case of the more general multinomial coefficient, defined below.

Definition: Multinomial Coefficient

For non-negative integers \(r_1, r_2, \ldots, r_k\) with \(r_1 + r_2 + \cdots + r_k = n\), the multinomial coefficient is \[ \binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}. \] It counts the number of ways to partition an \(n\)-element set into ordered blocks of sizes \(r_1, r_2, \ldots, r_k\). Equivalently, it admits the iterated-binomial decomposition \[ \binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1}\binom{n-r_1}{r_2}\binom{n-r_1-r_2}{r_3} \cdots \binom{n-r_1-r_2 - \cdots - r_{k-1}}{r_k}, \] which expresses the same count by choosing the blocks one at a time. The binomial coefficient \(\binom{n}{r}\) is the special case \(k = 2\) with \(r_1 = r\), \(r_2 = n - r\).

Selections with replacement. The number of ways to choose \(r\) items from \(n\) types with replacement, where order does not matter, is \(C(r + n - 1, r)\); if order does matter, it is \(n^r\). The first count is a stars and bars identity: each unordered selection corresponds bijectively to an arrangement of \(r\) stars (the chosen items) and \(n - 1\) bars (separators between the \(n\) item types), giving \(\binom{r + n - 1}{r}\) arrangements. For example, with \(n = 4\) types \(\{A, B, C, D\}\) and \(r = 3\) picks: ordered selections give \(4^3 = 64\), unordered selections give \(C(6, 3) = 20\).

Let's observe a couple of combination-related theorems:

Theorem: Pascal's Relation

If \(1 \leq r \leq n\), then \[ C(n+1, r) = C(n, r-1) + C(n, r). \]

Proof:

Consider an \((n+1)\)-element set \(\{x_1, x_2, \ldots, x_n, y\}\). Its \(r\)-element subsets partition into two disjoint families: those that contain \(y\) and those that do not. The \(r\)-element subsets that do not contain \(y\) are precisely the \(r\)-element subsets of \(\{x_1, x_2, \ldots, x_n\}\), of which there are \(\binom{n}{r}\). Each \(r\)-element subset that does contain \(y\) is determined by choosing the remaining \(r - 1\) elements from \(\{x_1, x_2, \ldots, x_n\}\), giving \(\binom{n}{r-1}\) such subsets. By the rule of sum, \[ \binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}. \]

Theorem: Chu's Theorem

If \(n \geq r \geq 0\), then \[ \begin{align*} \sum_{k=0}^n C(k,r) &= C(r, r) + C(r+1, r) + C(r+2, r) + \cdots + C(n, r) \\\\ &= C(n+1, r+1). \end{align*} \]

Proof:

We first reduce the summation index. Since \(C(k, r) = 0\) for \(0 \leq k < r\) (there are no \(r\)-element subsets of a smaller set), we have \[ \sum_{k=0}^{n} C(k, r) = \sum_{k=r}^{n} C(k, r). \] It thus suffices to show \(\sum_{k=r}^{n} C(k, r) = C(n+1, r+1)\). The boundary case \(n = r\) reduces both sides to \(C(r, r) = C(r+1, r+1) = 1\).

For \(n > r\), we apply Pascal's Relation iteratively. Starting from the identity \(C(r, r) = 1 = C(r+1, r+1)\), we have \[ C(r, r) + C(r+1, r) \;=\; C(r+1, r+1) + C(r+1, r) \;=\; C(r+2, r+1). \] Adding \(C(r+2, r)\) to both sides and applying Pascal's Relation again: \[ C(r, r) + C(r+1, r) + C(r+2, r) \;=\; C(r+2, r+1) + C(r+2, r) \;=\; C(r+3, r+1). \] Continuing in this fashion, after adding \(C(k, r)\) for \(k = r, r+1, \ldots, n\) we obtain \[ \sum_{k=r}^{n} C(k, r) = C(n+1, r+1), \] as required.

Chu's Theorem immediately yields familiar summation formulas. The sum of the first \(n\) positive integers, \[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}, \] is Chu's Theorem with \(r = 1\): \[ C(1, 1) + C(2, 1) + \cdots + C(n, 1) = C(n+1, 2). \] This perspective also handles higher powers without induction. For the sum of squares \(1^2 + 2^2 + \cdots + n^2\), we first rewrite \(k^2\) in the binomial-coefficient basis. Since \(k(k-1) = 2\, C(k, 2)\), \[ k^2 = k + k(k-1) = C(k, 1) + 2\, C(k, 2), \] and summing both sides over \(k = 1, \ldots, n\): \[ \sum_{k=1}^n k^2 = \sum_{k=1}^n C(k, 1) + 2\sum_{k=1}^n C(k, 2). \] Applying Chu's Theorem to each sum, \[ \begin{align*} 1^2 + 2^2 + \cdots + n^2 &= C(n+1, 2) + 2\, C(n+1, 3) \\\\ &= \frac{n(n+1)}{2} + 2 \cdot \frac{(n+1)n(n-1)}{6}\\\\ &= n(n+1) \cdot \frac{3 + 2(n-1)}{6} \\\\ &= \frac{n(n+1)(2n+1)}{6}. \end{align*} \] The same strategy works in general:

Theorem: Sum of Powers of Positive Integers

For every integer \(m \geq 1\), there exist rational coefficients \(a_{1, m}, a_{2, m}, \ldots, a_{m, m}\) — independent of \(k\) — such that \[ k^m = \sum_{r=1}^{m} a_{r, m}\, C(k, r) \qquad \text{for all } k \in \mathbb{Z}_{\geq 0}. \] Consequently, summing over \(k = 1, \ldots, n\) and applying Chu's Theorem yields \[ \sum_{k=1}^{n} k^m = \sum_{r=1}^{m} a_{r, m}\, C(n+1, r+1), \] a polynomial in \(n\) of degree \(m + 1\).

Proof:

Existence of \(a_{r, m}\). Consider the polynomials \(C(k, 0), C(k, 1), \ldots, C(k, m)\) in the variable \(k\). The polynomial \(C(k, r) = k(k-1)\cdots(k-r+1)/r!\) has degree exactly \(r\), so these \(m + 1\) polynomials have pairwise distinct degrees. Polynomials of distinct degrees are linearly independent (in any non-trivial linear relation, the term of highest degree cannot be cancelled), so \(\{C(k, 0), C(k, 1), \ldots, C(k, m)\}\) is a basis of the space of polynomials in \(k\) of degree at most \(m\).

Since \(k^m\) lies in this space, it admits a unique expansion \(k^m = \sum_{r=0}^{m} a_{r, m}\, C(k, r)\). Evaluating at \(k = 0\): for \(m \geq 1\) the left side is \(0\), and on the right side \(C(0, r) = 0\) for \(r \geq 1\) while \(C(0, 0) = 1\), forcing \(a_{0, m} = 0\). Thus only \(C(k, 1), \ldots, C(k, m)\) carry non-zero coefficients, and the expansion takes the stated form.

Sum formula. Summing the expansion over \(k = 1, \ldots, n\) and exchanging the order of summation, \[ \begin{align*} \sum_{k=1}^{n} k^m &= \sum_{k=1}^{n} \sum_{r=1}^{m} a_{r, m}\, C(k, r) \\\\ &= \sum_{r=1}^{m} a_{r, m} \sum_{k=1}^{n} C(k, r) \\\\ &= \sum_{r=1}^{m} a_{r, m}\, C(n+1, r+1), \end{align*} \] where the last equality uses Chu's Theorem; the sum starting at \(k = 1\) instead of \(k = 0\) is harmless because \(C(0, r) = 0\) for \(r \geq 1\).

The coefficients \(a_{r, m}\) are in fact \(r! \cdot S(m, r)\), where \(S(m, r)\) is the Stirling number of the second kind (counting partitions of an \(m\)-element set into \(r\) non-empty blocks). We do not develop Stirling numbers here; the basis argument above suffices to compute the \(a_{r, m}\) for any specific \(m\), as we now illustrate.

For \(m = 3\), we seek coefficients \(x_1, x_2, x_3\) such that \[ \begin{align*} k^3 &= x_1 C(k, 1) + x_2 C(k, 2) + x_3 C(k, 3) \\\\ &= x_1 k + \frac{x_2\, k(k-1)}{2} + \frac{x_3\, k(k-1)(k-2)}{6}. \end{align*} \] Clearing denominators, \[ 6 k^3 = (6 x_1 - 3 x_2 + 2 x_3)\, k + (3 x_2 - 3 x_3)\, k^2 + x_3\, k^3. \] Comparing coefficients of \(k\), \(k^2\), and \(k^3\), we obtain \(a_{1, 3} = x_1 = 1\), \(a_{2, 3} = x_2 = 6\), and \(a_{3, 3} = x_3 = 6\).

Hence, \[ k^3 = C(k,1) + 6C(k,2) + 6C(k,3) \] and \[ 1^3 + 2^3 + \cdots + n^3 = C(n+1, 2) + 6C(n+1, 3) + 6C(n+1, 4) = \frac{n^2(n+1)^2}{4}. \] Rather than solving this system of equations by hand each time, we can organize the computation using linear algebra.

Let \(\mathcal{P}_n \in \mathbb{R}^{n \times n}\) be the Pascal matrix whose \((i,j)\)-entry is the binomial coefficient \(C(i,j)\) for \(1 \leq i, j \leq n\) (with the convention \(C(i, j) = 0\) when \(j > i\), making \(\mathcal{P}_n\) lower triangular).

The matrix product \(\mathcal{P}_n^{-1} M\) extracts the desired coefficients. Indeed, define the power matrix \(M \in \mathbb{R}^{n \times n}\) by \(M_{ij} = i^j\), and the coefficient array \(A \in \mathbb{R}^{n \times n}\) by \(A_{rj} = a_{r, j}\) for \(1 \leq r \leq j\) and \(A_{rj} = 0\) for \(r > j\); thus \(A\) is upper triangular. With these conventions, the column identity \[ M_{ij} = i^j = \sum_{r=1}^{j} a_{r, j}\, C(i, r) = \sum_{r=1}^{n} A_{rj}\, (\mathcal{P}_n)_{i, r} \] holds for each \(i, j\) (the upper limit extends from \(j\) to \(n\) because \(A_{rj} = 0\) for \(r > j\)), and reads as the matrix equation \(M = \mathcal{P}_n A\), so \(A = \mathcal{P}_n^{-1} M\).

For example, when \(n = 4\), \[ \mathcal{P}_4 = \begin{bmatrix}1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 \\ 4 & 6 & 4 & 1 \end{bmatrix} \] and its inverse is \[ \mathcal{P}_4^{-1} = \begin{bmatrix}1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 3 & -3 & 1 & 0 \\ -4 & 6 & -4 & 1 \end{bmatrix}. \] (You can see the portion of Pascal's triangle.) Compute \[ \begin{align*} &\begin{bmatrix}1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 3 & -3 & 1 & 0 \\ -4 & 6 & -4 & 1 \end{bmatrix} \begin{bmatrix}1^1 & 1^2 & 1^3 & 1^4 \\ 2^1 & 2^2 & 2^3 & 2^4 \\ 3^1 & 3^2 & 3^3 & 3^4 \\ 4^1 & 4^2 & 4^3 & 4^4 \end{bmatrix} \\\\ &= \begin{bmatrix}1 & 1 & 1 & 1 \\ 0 & 2 & 6 & 14 \\ 0 & 0 & 6 & 36 \\ 0 & 0 & 0 & 24 \end{bmatrix} \end{align*} \] Thus, the 4th column gives us \[ k^4 = C(k,1) + 14C(k,2) + 36C(k,3) + 24C(k,4). \]

In combinations, the order in which elements are chosen does not matter. Permutations, by contrast, count ordered arrangements — and provide the missing piece that ties the combinatorial and algebraic views of \(C(n, r)\) together.

Definition: Permutation

A permutation of \(r\) elements chosen from an \(n\)-element set is an ordered arrangement of \(r\) distinct elements drawn from the set. By the rule of product, the number of such permutations is \[ \begin{align*} P(n, r) &= n(n-1)(n-2)\cdots(n-r+1) \\\\ &= \frac{n!}{(n-r)!} \\\\ &= r!\, C(n, r). \end{align*} \] The third equality is what justifies the algebraic formula \(C(n, r) = n!/[r!\,(n-r)!]\) from the combinatorial definition: an ordered selection of \(r\) elements is the same as choosing an unordered \(r\)-subset and then ordering it, so \(P(n, r) = r! \cdot C(n, r)\).

With the fundamentals of counting, combinations, and permutations in place, we have the essential toolkit for enumerating discrete structures. These techniques reappear throughout the curriculum — in probability theory, where counting outcomes is the first step in computing probabilities; in the analysis of algorithms, where combinatorial arguments bound running times and data-structure sizes; and in the enumeration of trees, paths, and graphs, taken up in later pages on tree counting and graph enumeration.