Network Flow & Walks on Graphs

Shortest Paths · Maximum Flow · Random Walks on Graphs

Shortest Paths

In Introduction to Graph Theory, we defined paths and studied their combinatorial structure. We now equip our graphs with weights and ask a quantitative question: among all paths connecting two vertices, which one minimizes total weight? This is the shortest path problem, and its study reveals a fundamental dichotomy in algorithm design — greedy selection versus dynamic programming — that pervades discrete optimization.

Weighted Graphs and Distance

Definition: Weighted Directed Graph

A weighted directed graph (or weighted digraph) is a triple \(G = (V, E, w)\), where \((V, E)\) is a directed graph and \(w \colon E \to \mathbb{R}\) is a weight function assigning a real number to each edge.

Definition: Walk and Trail

Let \(G = (V, E)\) be a graph (directed or undirected). A walk in \(G\) of length \(k\) is a sequence \((v_0, v_1, \ldots, v_k)\) of vertices such that each consecutive pair \((v_{i-1}, v_i)\) is an edge of \(G\); vertices and edges may repeat. A walk is closed if \(v_0 = v_k\). A trail is a walk in which no edge is repeated (vertices may still repeat).

Together with the notion of path (a walk in which no vertex is repeated), these form a strict hierarchy: every path is a trail, and every trail is a walk. The boundary case \(k = 0\) is the trivial walk consisting of the single vertex \(v_0\) with no edges; it is simultaneously a closed walk, a trail, and a path of length zero.

For a directed path \(P = (v_0, v_1, \ldots, v_k)\), we define its weight as \[ w(P) = \sum_{i=0}^{k-1} w(v_i, v_{i+1}). \] (In the boundary case \(k = 0\), the sum is empty and \(w(P) = 0\).) The shortest-path distance from \(s\) to \(t\) is \[ \delta(s, t) = \begin{cases} \displaystyle\min_{P \,:\, s \leadsto t} w(P) & \text{if a path from } s \text{ to } t \text{ exists,} \\ +\infty & \text{otherwise.} \end{cases} \] When all weights are non-negative, \(\delta\) satisfies the triangle inequality \(\delta(s,t) \leq \delta(s,u) + \delta(u,t)\) and non-negativity \(\delta(s,t) \geq 0\), but on a directed graph it is generally not symmetric: \(\delta(s,t) \neq \delta(t,s)\). Thus \(\delta\) is a quasi-metric (or asymmetric metric) on the vertex set, not a metric in the strict sense — recall that a metric requires symmetry by definition. When negative weights are present, \(\delta(s,t)\) may be \(-\infty\) if a negative-weight cycle is reachable from \(s\) on a path to \(t\).

Dijkstra's Algorithm

When all edge weights are non-negative, the shortest-path problem admits a greedy solution. Dijkstra's algorithm maintains a set \(S\) of vertices whose shortest-path distances from a source \(s\) have been finalized, and repeatedly extracts the vertex with the smallest tentative distance.

Theorem: Correctness of Dijkstra's Algorithm

Let \(G = (V, E, w)\) be a weighted digraph with \(w(e) \geq 0\) for all \(e \in E\). Upon termination, Dijkstra's algorithm computes \(\delta(s, v)\) for every \(v \in V\).

Proof Sketch:

We prove by induction on \(|S|\) that for every vertex \(v \in S\), the recorded distance \(d[v] = \delta(s,v)\).

Base case. Initially \(S = \{s\}\) with \(d[s] = 0 = \delta(s,s)\).

Inductive step. Suppose the invariant holds for \(|S| = k\). The algorithm selects \(u \notin S\) minimizing \(d[u]\). Suppose for contradiction that \(\delta(s,u) < d[u]\). Then there exists a shortest path \(P\) from \(s\) to \(u\) with \(w(P) < d[u]\). Let \((x, y)\) be the first edge on \(P\) with \(x \in S\) and \(y \notin S\). By the inductive hypothesis, \(d[x] = \delta(s,x)\), and the relaxation step performed when \(x\) was added to \(S\) ensured \(d[y] \leq d[x] + w(x,y)\). Since the prefix of \(P\) ending at \(y\) is itself a shortest path (subpath optimality), \(d[x] + w(x,y) = \delta(s,y)\). Non-negativity of weights gives \(\delta(s,y) \leq w(P)\), hence \(d[y] \leq w(P) < d[u]\). But then \(u\) was not the minimum of \(d[\cdot]\) over \(V \setminus S\), a contradiction.

Algorithm: Dijkstra's Algorithm

Input: weighted digraph \(G = (V, E, w)\) with \(w(e) \geq 0\), source \(s\)
Output: shortest-path distances \(d[v] = \delta(s,v)\) for all \(v \in V\)

begin
  \(d[v] \leftarrow +\infty\) for all \(v \in V\); \(d[s] \leftarrow 0\); \(S \leftarrow \emptyset\)
  insert all vertices into priority queue \(Q\) keyed by \(d[v]\)
  while \(Q \neq \emptyset\):
    \(u \leftarrow\) Extract-Min\((Q)\)
    \(S \leftarrow S \cup \{u\}\)
    for each edge \((u,v) \in E\) with \(v \notin S\):
      if \(d[u] + w(u,v) < d[v]\):
        \(d[v] \leftarrow d[u] + w(u,v)\)   // relaxation
        Decrease-Key\((Q, v, d[v])\)
  return \(d\)
end

With a binary min-heap, each extract-min costs \(O(\log |V|)\) worst-case and each decrease-key costs \(O(\log |V|)\) worst-case. Since we perform \(|V|\) extractions and at most \(|E|\) relaxations, the total running time is \[ O\bigl((|V| + |E|) \log |V|\bigr). \] With a Fibonacci heap, decrease-key is amortized \(O(1)\), yielding \(O(|V| \log |V| + |E|)\).
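As a concrete sketch in Python (the vertex names and edge weights below are an invented toy example), the heap-based version can be written with the standard library's heapq. Since heapq offers no Decrease-Key, the usual workaround is lazy deletion: push a fresh entry on every relaxation and skip stale entries on extraction, which preserves the \(O((|V|+|E|)\log|V|)\) bound.

```python
import heapq

def dijkstra(adj, s):
    """adj: dict mapping vertex -> list of (neighbor, weight); weights >= 0."""
    d = {v: float("inf") for v in adj}
    d[s] = 0
    done = set()                        # the finalized set S
    pq = [(0, s)]                       # entries are (tentative distance, vertex)
    while pq:
        du, u = heapq.heappop(pq)       # Extract-Min
        if u in done:                   # stale entry left by lazy deletion
            continue
        done.add(u)
        for v, w in adj[u]:
            if du + w < d[v]:           # relaxation
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

# Illustrative toy digraph
G = {"s": [("a", 1), ("b", 4)],
     "a": [("b", 2), ("t", 6)],
     "b": [("t", 3)],
     "t": []}
print(dijkstra(G, "s"))                 # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```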

The greedy strategy works precisely because non-negative weights guarantee that extending a path can never decrease its total weight — once a vertex is finalized, no future path can improve upon it. This monotonicity breaks when negative edges are present.

Bellman-Ford Algorithm and Dynamic Programming

The Bellman-Ford algorithm handles arbitrary real-valued weights (including negative edges) by adopting a fundamentally different strategy: rather than greedily committing to final distances, it iteratively relaxes all edges, allowing tentative distances to improve over \(|V|-1\) rounds.

Define the subproblems: let \(d_i[v]\) be the minimum weight of any path from \(s\) to \(v\) using at most \(i\) edges. The recurrence is \[ d_i[v] = \min\Bigl(\, d_{i-1}[v], \;\min_{(u,v) \in E} \bigl(d_{i-1}[u] + w(u,v)\bigr)\Bigr), \] with base case \(d_0[s] = 0\) and \(d_0[v] = +\infty\) for \(v \neq s\). Since any shortest path in a graph without negative cycles has at most \(|V| - 1\) edges, we have \(\delta(s,v) = d_{|V|-1}[v]\).

Definition: Dynamic Programming (Bellman's Principle of Optimality)

A problem exhibits optimal substructure if an optimal solution contains within it optimal solutions to subproblems. A problem exhibits overlapping subproblems if the same subproblems are encountered repeatedly during a recursive decomposition.

Dynamic programming (DP) solves optimization problems satisfying both properties by computing and storing solutions to subproblems in a systematic order — either bottom-up (tabulation) or top-down with memoization — ensuring each subproblem is solved exactly once.

The Bellman-Ford recurrence is a canonical instance of DP: the optimal substructure is that a shortest path from \(s\) to \(v\) with at most \(i\) edges is either a shortest path with at most \(i-1\) edges, or the extension of some shortest path to a predecessor \(u\) by the edge \((u,v)\). The overlapping subproblems arise because multiple vertices may share the same predecessors.

Algorithm: Bellman-Ford

Input: weighted digraph \(G = (V, E, w)\), source \(s\)
Output: distances \(d[v] = \delta(s,v)\), or detection of a negative cycle

begin
  \(d[v] \leftarrow +\infty\) for all \(v \in V\); \(d[s] \leftarrow 0\)
  for \(i = 1\) to \(|V| - 1\):   // relaxation passes
    for each edge \((u,v) \in E\):
      if \(d[u] + w(u,v) < d[v]\):
        \(d[v] \leftarrow d[u] + w(u,v)\)
  // negative cycle check
  for each edge \((u,v) \in E\):
    if \(d[u] + w(u,v) < d[v]\):
      return "negative cycle detected"
  return \(d\)
end
Theorem: Bellman-Ford Correctness and Negative Cycle Detection

Let \(G = (V, E, w)\) be a weighted digraph with source \(s\). After \(|V| - 1\) iterations of relaxing all edges, the Bellman-Ford algorithm computes \(\delta(s, v)\) for every vertex \(v\) reachable from \(s\) without traversing a negative-weight cycle. Moreover, a negative-weight cycle reachable from \(s\) exists if and only if some edge \((u,v)\) still satisfies \(d[v] > d[u] + w(u,v)\) after the \((|V|-1)\)-th iteration.

Proof Sketch:

Correctness. We show by induction on \(i\) that after \(i\) outer iterations, \(d[v] \leq d_i[v]\) for all \(v \in V\). The base case \(i = 0\) holds by initialization. In iteration \(i\), every edge \((u,v) \in E\) is relaxed: \[ d[v] \leftarrow \min\bigl(d[v],\; d[u] + w(u,v)\bigr). \] At the moment of relaxation, \(d[u] \leq d_{i-1}[u]\) — by the inductive hypothesis after iteration \(i-1\), and because \(d\) only decreases. Combined with the previous value of \(d[v] \leq d_{i-1}[v]\), the recurrence for \(d_i[v]\) gives \(d[v] \leq d_i[v]\).

Since each finite \(d[v]\) is, by construction, the weight of some \(s\)-to-\(v\) walk, we also have \(\delta(s,v) \leq d[v]\). When no negative-weight cycle is reachable from \(s\), every shortest path has at most \(|V|-1\) edges (a longer walk would repeat a vertex, and excising the resulting cycle, which has non-negative weight, yields a walk that is no longer and no heavier). Thus \(\delta(s,v) = d_{|V|-1}[v]\), and after \(|V|-1\) iterations \(\delta(s,v) \leq d[v] \leq d_{|V|-1}[v] = \delta(s,v)\).

Negative cycle detection. (⟸) If no negative cycle is reachable from \(s\), then \(d[v] = \delta(s,v)\) after \(|V|-1\) iterations, which satisfies the triangle inequality \(d[v] \leq d[u] + w(u,v)\) for every edge — so no edge admits relaxation.

(⟹) Conversely, suppose every edge satisfies \(d[v] \leq d[u] + w(u,v)\) after \(|V|-1\) iterations, yet a negative-weight cycle \(C = (v_0, v_1, \ldots, v_k = v_0)\) is reachable from \(s\). Reachability and edge-by-edge relaxation propagate finiteness, so each \(d[v_i]\) is finite. Summing the triangle inequalities around the cycle, \[ \sum_{i=1}^{k} d[v_i] \;\leq\; \sum_{i=1}^{k} d[v_{i-1}] + \sum_{i=1}^{k} w(v_{i-1}, v_i). \] The two \(d\)-sums cancel (since \(v_0 = v_k\) and all values are finite), yielding \(0 \leq w(C)\) — contradicting \(w(C) < 0\). Hence some edge must still admit relaxation.

The total running time is \(O(|V| \cdot |E|)\): we perform \(|V| - 1\) passes over all \(|E|\) edges.
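The tabulated recurrence transcribes almost verbatim into Python; this is a sketch with invented edge data, where the single array d is updated in place across the \(|V|-1\) passes (so it may run ahead of \(d_i[v]\), which is harmless since \(d[v]\) never drops below \(\delta(s,v)\)).

```python
def bellman_ford(vertices, edges, s):
    """edges: list of (u, v, w) triples. Returns the distance dict,
    or None if a negative-weight cycle is reachable from s."""
    d = {v: float("inf") for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):     # |V| - 1 relaxation passes
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    for u, v, w in edges:                  # negative-cycle check
        if d[u] + w < d[v]:
            return None
    return d

V = ["s", "a", "b", "t"]
E = [("s", "a", 4), ("s", "b", 5), ("a", "t", 6), ("b", "a", -3), ("b", "t", 2)]
print(bellman_ford(V, E, "s"))             # {'s': 0, 'a': 2, 'b': 5, 't': 7}
```

Note that the negative edge \((b, a, -3)\) forces the greedy assumption to fail: the cheapest route to \(a\) passes through the seemingly more expensive vertex \(b\).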

Negative Cycles and Currency Arbitrage

Bellman-Ford's ability to detect negative-weight cycles has a striking application in finance. Consider a network of currencies where each directed edge \((u,v)\) carries the exchange rate \(r(u,v)\). Setting \(w(u,v) = -\log r(u,v)\) transforms multiplicative gains into additive weights: a cycle \(C\) with \(\prod_{e \in C} r(e) > 1\) (a profitable round-trip) becomes a negative-weight cycle with \(\sum_{e \in C} w(e) < 0\). Thus, Bellman-Ford on the log-transformed graph detects arbitrage opportunities — situations where a sequence of currency exchanges yields a net profit. The same principle applies to decentralized exchanges (DEXs) in DeFi, where token swap rates across liquidity pools form exactly such a weighted digraph, and arbitrage bots run variants of Bellman-Ford in real time.
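A minimal sketch of the log-transform trick, using hypothetical exchange rates (the numbers are invented for illustration). Initializing every tentative distance to zero is equivalent to adding a zero-weight virtual source, so a negative cycle is detected wherever it sits in the graph:

```python
import math

# Hypothetical exchange rates (illustrative, not real market data)
rates = {("USD", "EUR"): 0.9, ("EUR", "GBP"): 0.9, ("GBP", "USD"): 1.3}
currencies = {c for pair in rates for c in pair}
edges = [(u, v, -math.log(r)) for (u, v), r in rates.items()]

# Bellman-Ford relaxation passes with an implicit zero-weight virtual source
d = {c: 0.0 for c in currencies}
for _ in range(len(currencies) - 1):
    for u, v, w in edges:
        if d[u] + w < d[v]:
            d[v] = d[u] + w

# An edge that still relaxes witnesses a negative cycle, i.e. an arbitrage
arbitrage = any(d[u] + w < d[v] - 1e-12 for u, v, w in edges)
print(arbitrage)        # 0.9 * 0.9 * 1.3 = 1.053 > 1, so True
```

The small tolerance guards against floating-point noise in the log-weights; a production system would also track the cycle itself via predecessor pointers.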

Greedy vs. Dynamic Programming

Dijkstra and Bellman-Ford embody the two great paradigms of discrete optimization. Dijkstra's greedy approach commits irrevocably to each vertex's distance, exploiting the monotonicity of non-negative weights for superior runtime. Bellman-Ford's DP approach defers commitment, exploring all possible relaxations — slower, but correct under weaker assumptions. This trade-off between exploiting structural assumptions for speed and maintaining generality recurs throughout algorithm design and optimization theory.

A natural extension of the greedy paradigm is the A* algorithm, which augments Dijkstra's strategy with a heuristic function \(h(v)\) estimating the remaining distance from \(v\) to the goal. Instead of extracting the vertex minimizing \(d[v]\), A* extracts the vertex minimizing \(d[v] + h(v)\), directing the search toward the target. When \(h\) is admissible (never overestimates true distance) and consistent (satisfies a triangle inequality), A* retains optimality while dramatically reducing the number of vertices explored. This makes A* the algorithm of choice for robotic path planning, autonomous navigation, and game AI — settings where spatial heuristics provide powerful guidance that pure Dijkstra cannot exploit.
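A compact sketch of A* on a 4-connected grid, where Manhattan distance to the goal is an admissible and consistent heuristic (each step moves one cell, so the true remaining distance is at least the Manhattan distance). The maze is an invented example.

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected grid of 0 (free) / 1 (blocked) cells."""
    def h(p):                            # Manhattan distance: admissible here
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    rows, cols = len(grid), len(grid[0])
    d = {start: 0}
    pq = [(h(start), start)]             # queue keyed by d[v] + h(v)
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            return d[u]
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols and grid[v[0]][v[1]] == 0:
                if d[u] + 1 < d.get(v, float("inf")):
                    d[v] = d[u] + 1      # relaxation, unit edge weights
                    heapq.heappush(pq, (d[v] + h(v), v))
    return float("inf")                  # goal unreachable

maze = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(a_star(maze, (0, 0), (0, 2)))      # around the wall: 6 steps
```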

Shortest-Path Distances in Graph Learning

The shortest-path distance \(\delta(u,v)\) is a fundamental feature in graph-based machine learning. In Graph Neural Networks (GNNs), Shortest Path Networks encode \(\delta(u,v)\) as positional features to overcome the limited expressiveness of message-passing architectures — whose discriminative power is fundamentally bounded by the 1-Weisfeiler-Lehman (1-WL) test, meaning standard GNNs cannot distinguish non-isomorphic graphs (or nodes) that 1-WL fails to separate. Shortest-path distances inject global structural information that breaks through this barrier. In knowledge graph embeddings, translational models like TransE learn vector representations where \(\mathbf{h} + \mathbf{r} \approx \mathbf{t}\) for a triple \((h, r, t)\), implicitly encoding a notion of relational distance. The graph Laplacian's commute time — which we will develop in the random walks section below — provides a spectral relaxation of shortest-path distance that is continuous and differentiable, making it amenable to gradient-based optimization.

Maximum Flow

We now pass from finding an optimal single path to optimizing the aggregate throughput of an entire network. Where shortest paths ask "what is the cheapest route?", maximum flow asks "how much can we push through simultaneously?" This shift — from a path problem to a network problem — opens the door to deep connections with linear programming duality.

Flow Networks

Definition: Flow Network

A flow network is a tuple \((G, c, s, t)\), where \(G = (V, E)\) is a directed graph, \(c \colon E \to \mathbb{R}_{\geq 0}\) is a capacity function, \(s \in V\) is the source, and \(t \in V\) is the sink, with \(s \neq t\).

Definition: Flow

A flow in a flow network \((G, c, s, t)\) is a function \(f \colon E \to \mathbb{R}_{\geq 0}\) satisfying:

Capacity constraint: For every edge \(e \in E\), \[ f(e) \leq c(e). \]

Conservation (Kirchhoff's law): For every vertex \(v \in V \setminus \{s, t\}\), \[ \sum_{(u,v) \in E} f(u,v) = \sum_{(v,w) \in E} f(v,w). \]

The value of a flow is \[ |f| = \sum_{(s,v) \in E} f(s,v) - \sum_{(v,s) \in E} f(v,s). \]

The maximum flow problem asks: find a flow \(f^*\) of maximum value \(|f^*|\). This is a linear program — the objective \(|f|\) and all constraints are linear in the flow variables — and we shall see that its dual yields the min-cut theorem.

Residual Graphs and Augmenting Paths

Definition: Residual Graph

We assume that \(G\) contains no antiparallel edges (i.e., if \((u,v) \in E\) then \((v,u) \notin E\)). This is without loss of generality: any antiparallel pair can be eliminated by inserting a dummy vertex on one of the two edges.

Given a flow \(f\) in a network \((G, c, s, t)\), the residual graph \(G_f = (V, E_f)\) has edge set \[ E_f = \bigl\{(u,v) : (u,v) \in E,\; f(u,v) < c(u,v)\bigr\} \;\cup\; \bigl\{(v,u) : (u,v) \in E,\; f(u,v) > 0\bigr\}. \] The residual capacity is \[ c_f(u,v) = \begin{cases} c(u,v) - f(u,v) & \text{if } (u,v) \in E, \\ f(v,u) & \text{if } (v,u) \in E. \end{cases} \]

An augmenting path is a path from \(s\) to \(t\) in the residual graph \(G_f\). If such a path \(P\) exists, we can increase the flow by \(\Delta = \min_{e \in P} c_f(e) > 0\) by pushing \(\Delta\) units along \(P\). When the augmenting path traverses a backward edge \((v,u)\) — corresponding to an original edge \((u,v)\) carrying positive flow — pushing flow along it effectively cancels previously assigned flow on \((u,v)\), redirecting that capacity to a more profitable route. This ability to "undo" earlier decisions is precisely what allows the method to escape locally suboptimal flow assignments and converge to a global maximum.

Algorithm: Ford-Fulkerson Method

Input: flow network \((G, c, s, t)\)
Output: maximum flow \(f\)

begin
  \(f(e) \leftarrow 0\) for all \(e \in E\)
  while \(\exists\) augmenting path \(P\) from \(s\) to \(t\) in \(G_f\):
    \(\Delta \leftarrow \min\{c_f(e) : e \in P\}\)
    for each edge \((u,v) \in P\):
      if \((u,v) \in E\): \(f(u,v) \leftarrow f(u,v) + \Delta\)   // forward edge
      else: \(f(v,u) \leftarrow f(v,u) - \Delta\)   // backward edge
  return \(f\)
end

The method terminates when no augmenting path exists in \(G_f\). The choice of augmenting path determines both the algorithm's name and its complexity. When augmenting paths are found via breadth-first search (choosing a shortest path in terms of number of edges), the method is called the Edmonds-Karp algorithm, with running time \(O(|V| \cdot |E|^2)\).

The Ford-Fulkerson method's termination depends on the capacity values: with integer capacities, each augmentation increases \(|f|\) by at least 1, so the procedure terminates in finitely many steps (rational capacities reduce to this case by scaling); with irrational capacities, however, examples exist where the method runs forever without converging to the maximum flow. The Edmonds-Karp variant resolves this entirely — the BFS choice forces termination in \(O(|V| \cdot |E|)\) augmentations regardless of capacity values.
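The Edmonds-Karp variant can be sketched as follows (the capacities are an invented example). The residual graph, including the zero-capacity backward edges, lives in a single nested dictionary, and each BFS finds a shortest augmenting path:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """cap: dict of dicts, cap[u][v] = capacity. Returns the max-flow value."""
    # Residual capacities: copy cap, add reverse edges of capacity 0
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u in cap:
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:              # no augmenting path: done
            return flow
        path = []                        # reconstruct s -> t path
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(res[u][v] for u, v in path)   # bottleneck capacity
        for u, v in path:                # push delta, update residuals
            res[u][v] -= delta
            res[v][u] += delta
        flow += delta

C = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(edmonds_karp(C, "s", "t"))         # max flow = 5
```

Note that the value 5 equals the capacity of the cut \(S = \{s\}\), an instance of max-flow min-cut.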

The Max-Flow Min-Cut Theorem

Definition: Cut

An \(s\)-\(t\) cut in a flow network \((G, c, s, t)\) is a partition \((S, T)\) of \(V\) with \(s \in S\) and \(t \in T\). Its capacity is \[ c(S, T) = \sum_{\substack{(u,v) \in E \\ u \in S,\; v \in T}} c(u,v). \]

For any flow \(f\) and any \(s\)-\(t\) cut \((S,T)\), the value \(|f|\) equals the net flow across the cut. To see this, define the net out-flow at each vertex, \[ b_f(v) \;=\; \sum_{(v,w) \in E} f(v,w) \;-\; \sum_{(u,v) \in E} f(u,v). \] Conservation gives \(b_f(v) = 0\) for all \(v \in V \setminus \{s, t\}\), and \(b_f(s) = |f|\) by definition. Since \(t \notin S\), summing \(b_f\) over \(v \in S\) yields \[ \sum_{v \in S} b_f(v) \;=\; b_f(s) \;+\; \sum_{v \in S \setminus \{s\}} b_f(v) \;=\; |f|. \] On the other hand, expanding \(\sum_{v \in S} b_f(v)\) edge-by-edge: every edge \((a,b)\) with both endpoints in \(S\) contributes \(+f(a,b)\) (from \(a\)'s out-sum) and \(-f(a,b)\) (from \(b\)'s in-sum), cancelling; every edge with both endpoints in \(T\) contributes nothing; only edges crossing the cut survive: \[ |f| = \sum_{\substack{(u,v) \in E \\ u \in S,\; v \in T}} f(u,v) \;-\; \sum_{\substack{(v,u) \in E \\ v \in T,\; u \in S}} f(v,u). \] Combining \(f(u,v) \leq c(u,v)\) on the first sum and \(f(v,u) \geq 0\) on the second, \[ |f| \;\leq\; \sum_{\substack{(u,v) \in E \\ u \in S,\; v \in T}} c(u,v) \;=\; c(S,T). \] Every flow value is bounded above by every cut capacity. The remarkable fact is that equality is achieved.

Theorem: Max-Flow Min-Cut (Ford & Fulkerson, 1956)

In any flow network, the following three conditions are equivalent:

(i) \(f\) is a maximum flow.
(ii) The residual graph \(G_f\) contains no augmenting path.
(iii) There exists an \(s\)-\(t\) cut \((S, T)\) with \(|f| = c(S, T)\).

In particular, \(\max_f |f| = \min_{(S,T)} c(S,T)\).

Proof:

(i) ⇒ (ii): If an augmenting path \(P\) existed in \(G_f\), then \(\Delta = \min_{e \in P} c_f(e) > 0\) (every residual edge has strictly positive capacity), and pushing \(\Delta\) units along \(P\) would increase \(|f|\), contradicting maximality.

(ii) ⇒ (iii): Suppose \(G_f\) has no \(s\)-\(t\) path. Define \(S = \{v \in V : v \text{ is reachable from } s \text{ in } G_f\}\) and \(T = V \setminus S\). Then \(s \in S\) and \(t \in T\) (since no augmenting path exists), so \((S,T)\) is an \(s\)-\(t\) cut. For every edge \((u,v) \in E\) with \(u \in S, v \in T\): since \(v\) is not reachable from \(s\) in \(G_f\), the forward residual edge \((u,v)\) is absent, meaning \(f(u,v) = c(u,v)\). For every edge \((v,u) \in E\) with \(v \in T, u \in S\): since \(u \in S\) is reachable but \(v \in T\) is not, the backward residual edge \((u,v)\) must be absent, meaning \(f(v,u) = 0\). Therefore, \[ |f| = \sum_{\substack{u \in S,\, v \in T \\ (u,v) \in E}} c(u,v) - 0 = c(S,T). \]

(iii) ⇒ (i): Since \(|f| \leq c(S',T')\) for every cut \((S',T')\), and \(|f| = c(S,T)\), the flow \(f\) achieves the upper bound and must be maximal.

Duality and the Combinatorial-Continuous Bridge

The max-flow min-cut theorem is an instance of strong duality. The maximum flow problem can be formulated as a linear program: maximize \(|f|\) subject to capacity and conservation constraints. Its dual problem turns out to be the minimum cut problem, and strong duality guarantees that their optima coincide. This is not merely a theoretical curiosity — it reveals that combinatorial optimization on graphs and continuous optimization in \(\mathbb{R}^n\) are often two views of the same mathematical structure. The same duality principle underlies the min-cost flow problem, assignment problems, and the general theory of totally unimodular matrices, where linear programming relaxations automatically yield integer solutions.

Graph Cuts in Computer Vision

In image segmentation, the max-flow min-cut framework provides an exact solution to a natural energy minimization problem. We construct a flow network where each pixel is a vertex, edges between adjacent pixels carry weights encoding similarity (encouraging neighboring pixels to share a label), and edges to the source and sink encode per-pixel costs for foreground/background assignment. The minimum cut then partitions pixels into foreground and background, minimizing the total cost. This formulation — known as graph cuts — was among the first methods to give globally optimal solutions for binary labeling problems in vision, and its extensions to multi-label problems via \(\alpha\)-expansion remain widely used.

Random Walks on Graphs

So far, we have studied deterministic optimization on graphs: finding shortest paths and maximum flows. We now introduce stochastic dynamics by considering a particle that traverses a graph randomly. This probabilistic perspective connects graph theory to Markov chains and spectral graph theory, revealing that combinatorial and algebraic properties of a graph are two faces of the same coin.

Throughout this section we work with undirected graphs, in contrast to the shortest-path and maximum-flow sections above. This is not merely a presentational choice: the closed-form stationary distribution \(\pi_v = \deg(v)/(2m)\) and the symmetry of the graph Laplacian both rely on the undirected structure, where adjacency is reciprocal.

The Transition Matrix

Definition: Random Walk on a Graph

Let \(G = (V, E)\) be a connected, undirected graph with \(n = |V|\) vertices. A simple random walk on \(G\) is the Markov chain \((X_0, X_1, X_2, \ldots)\) with state space \(V\) and transition probabilities \[ \Pr(X_{t+1} = v \mid X_t = u) = \begin{cases} \dfrac{1}{\deg(u)} & \text{if } (u,v) \in E, \\[6pt] 0 & \text{otherwise.} \end{cases} \]

Let \(A\) be the adjacency matrix and \(D = \operatorname{diag}(\deg(v_1), \ldots, \deg(v_n))\) the degree matrix. The transition matrix of the random walk is \[ P = D^{-1}A. \] The entry \(P_{uv} = A_{uv}/\deg(u)\) is exactly the one-step transition probability from \(u\) to \(v\). Note that \(P\) is a row-stochastic matrix: each row sums to 1. After \(k\) steps, the distribution of the walker's position evolves as \[ \boldsymbol{\pi}^{(k)} = \boldsymbol{\pi}^{(0)} P^k, \] where \(\boldsymbol{\pi}^{(0)}\) is the initial distribution (a row vector).
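These definitions translate directly into a few lines of NumPy; the graph below (a 4-cycle on vertices 0-1-2-3 plus the chord \(\{0,2\}\), hence non-bipartite) is an invented example:

```python
import numpy as np

# 4-cycle 0-1-2-3 plus the chord {0, 2}; degrees are [3, 2, 3, 2], 2m = 10
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]                    # P = D^{-1} A, row-stochastic

# Evolve a point mass at vertex 0 for 50 steps: pi^(k) = pi^(0) P^k
pi0 = np.array([1.0, 0.0, 0.0, 0.0])
pik = pi0 @ np.linalg.matrix_power(P, 50)
print(pik)                              # close to deg / (2m) = [0.3, 0.2, 0.3, 0.2]
```

The limit \(\deg(v)/(2m)\) visible in the output is exactly the stationary distribution established below.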

Stationary Distribution

Theorem: Stationary Distribution of a Simple Random Walk

Let \(G\) be a connected undirected graph with \(m = |E|\) edges. The distribution \[ \pi_v = \frac{\deg(v)}{2m}, \quad v \in V \] is the unique stationary distribution of the simple random walk on \(G\) (i.e., the unique distribution satisfying \(\boldsymbol{\pi} P = \boldsymbol{\pi}\)). If, in addition, \(G\) is non-bipartite, then the walk is aperiodic, and the distribution \(\boldsymbol{\pi}^{(k)}\) converges to \(\boldsymbol{\pi}\) from any initial distribution.

Proof:

Stationarity. First, \(\boldsymbol{\pi}\) is indeed a probability distribution: by the Handshaking Lemma \(\sum_{v \in V} \deg(v) = 2m\), so \(\sum_{v} \pi_v = \sum_{v} \deg(v)/(2m) = 1\). We verify \(\boldsymbol{\pi} P = \boldsymbol{\pi}\). For each \(v \in V\), \[ \begin{align*} (\boldsymbol{\pi} P)_v &= \sum_{u \in V} \pi_u \cdot P_{uv} \\ &= \sum_{u \,:\, (u,v) \in E} \frac{\deg(u)}{2m} \cdot \frac{1}{\deg(u)} \\ &= \sum_{u \,:\, (u,v) \in E} \frac{1}{2m} \\ &= \frac{\deg(v)}{2m} = \pi_v. \end{align*} \]

Uniqueness. Since \(G\) is connected, the transition matrix \(P\) is irreducible: any vertex \(u\) reaches any other vertex \(v\) along a finite path in \(G\), and along this path each transition has positive probability \(1/\deg(\cdot)\), so \((P^k)_{uv} > 0\) for some \(k \geq 0\). The standard theory of irreducible stochastic matrices — developed in our treatment of stochastic matrices and applied to Markov chains — guarantees that an irreducible stochastic matrix has a unique stationary distribution (this is the classical Perron-Frobenius statement applied to nonnegative matrices). Since we have already exhibited \(\boldsymbol{\pi}\) as one stationary distribution, it must be the unique one.

Convergence. If \(G\) is non-bipartite, then by the bipartite/odd-cycle equivalence the graph contains an odd cycle of some length \(2\ell + 1\). Combined with the existence of length-\(2\) closed walks (round trips on any edge — which exist since \(G\) is connected and has at least one edge), this forces \(\gcd\{k : P^k_{vv} > 0\} = 1\) for every vertex \(v\), so the chain is aperiodic. The convergence \(\boldsymbol{\pi}^{(k)} \to \boldsymbol{\pi}\) from any initial distribution is then the convergence theorem for irreducible aperiodic Markov chains, again from the standard Perron-Frobenius theory referenced above.

The result is elegant in its simplicity: in the long run, the random walker visits each vertex in proportion to its degree. High-degree "hub" vertices are visited more frequently — a fact that underpins the intuition behind centrality measures in network science.

The Random Walk Laplacian

The transition matrix is intimately connected to the graph Laplacian. Recall that the (combinatorial) graph Laplacian is \(L = D - A\). The random walk Laplacian is defined as \[ L_{\text{rw}} = D^{-1}L = I - D^{-1}A = I - P. \] This operator satisfies \(L_{\text{rw}} \mathbf{f} = \mathbf{f} - P\mathbf{f}\), so its action on a function \(\mathbf{f} \colon V \to \mathbb{R}\) measures the difference between \(\mathbf{f}\) and its local average under the random walk: \[ (L_{\text{rw}} \mathbf{f})(v) = f(v) - \frac{1}{\deg(v)} \sum_{u \,:\, (u,v) \in E} f(u). \] A function is harmonic at \(v\) (i.e., \(L_{\text{rw}} \mathbf{f}(v) = 0\)) precisely when its value equals the average over neighbors — the discrete analogue of the mean value property that characterizes harmonic functions in continuous analysis.

The eigenvalues of \(L_{\text{rw}}\) are \(0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n\), where \(\mu_i = 1 - \lambda_i\) and \(1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n > -1\) are the eigenvalues of \(P\) (the lower bound \(\lambda_n > -1\) holds when the graph is non-bipartite). On a bipartite graph, by contrast, \(\lambda_n = -1\): the walker oscillates between the two vertex classes with period 2, so the distribution never settles but instead alternates indefinitely — this is the spectral signature of periodicity. The rate of convergence to stationarity is controlled by \[ \bar{\lambda} = \max\bigl(|\lambda_2|,\, |\lambda_n|\bigr), \] which captures contributions from both the second-largest and the most negative eigenvalue. The standard mixing time bound is \[ \|\boldsymbol{\pi}^{(k)} - \boldsymbol{\pi}\|_{\text{TV}} \leq \frac{1}{2}\sqrt{\frac{1}{\pi_{\min}}}\;\bar{\lambda}^{\,k}, \] where \(\pi_{\min} = \min_v \pi_v\). The quantity \(1 - \bar{\lambda}\) is the absolute spectral gap: a large gap means rapid mixing, while a small gap means the walk remains trapped in local clusters for long periods. The spectral facts of this paragraph belong to the standard theory of stochastic matrices and Markov chains; detailed proofs are given in the dedicated treatments of stochastic matrices and Markov chains.
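These spectral quantities are conveniently computed through the symmetric normalization \(D^{-1/2} A D^{-1/2}\), which is similar to \(P\) and hence shares its eigenvalues; the graph below (a triangle with a pendant vertex, hence non-bipartite) is an invented example:

```python
import numpy as np

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)

# D^{-1/2} A D^{-1/2} is symmetric and similar to P = D^{-1} A
M = A / np.sqrt(np.outer(deg, deg))
lam = np.sort(np.linalg.eigvalsh(M))[::-1]    # descending: lam[0] = 1

lam_bar = max(abs(lam[1]), abs(lam[-1]))      # second-largest vs most negative
gap = 1 - lam_bar                             # absolute spectral gap
print(lam, lam_bar, gap)
```

Since the graph is connected and non-bipartite, \(\lambda_1 = 1\) is simple and \(\bar{\lambda} < 1\), so the walk mixes at a geometric rate \(\bar{\lambda}^{\,k}\).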

Hitting Time and Commute Time

Definition: Hitting Time and Commute Time

The hitting time \(H(u, v)\) is the expected number of steps for a random walk starting at \(u\) to first reach \(v\): \[ H(u,v) = \mathbb{E}\bigl[\min\{t \geq 1 : X_t = v\} \mid X_0 = u\bigr]. \] The commute time is the expected number of steps for a round trip: \[ C(u,v) = H(u,v) + H(v,u). \]

These quantities admit a spectral characterization. Since \(L_{\text{rw}} = D^{-1}L\) is generally not symmetric, we work with the similar symmetric matrix \(L_{\text{sym}} = D^{-1/2} L\, D^{-1/2} = I - D^{-1/2} A\, D^{-1/2}\), which shares the same eigenvalues \(\mu_1 = 0 < \mu_2 \leq \cdots \leq \mu_n\). Let \(\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_n\) be an orthonormal eigenbasis of \(L_{\text{sym}}\) (in the standard Euclidean inner product), and define \(\boldsymbol{\phi}_k = D^{-1/2}\boldsymbol{\psi}_k\) — these are the right eigenvectors of \(L_{\text{rw}}\), orthonormal with respect to the \(D\)-weighted inner product \(\langle \boldsymbol{\phi}_j, \boldsymbol{\phi}_k \rangle_D = \boldsymbol{\phi}_j^\top D\, \boldsymbol{\phi}_k = \delta_{jk}\). Under this normalization, \[ C(u,v) = 2m \sum_{k=2}^{n} \frac{1}{\mu_k}\bigl(\phi_k(u) - \phi_k(v)\bigr)^2, \] where \(m = |E|\). This is precisely the squared Euclidean distance in a spectral embedding weighted by \(1/\mu_k\) — a direct bridge between the combinatorial notion of "distance by random traversal" and the algebraic notion of "distance in eigenspace." This identity is the commute-time formula (Lovász's 1993 survey on random walks); we state it without proof, as the derivation requires the electrical-network interpretation of random walks beyond the scope of this page.
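Although we omit the proof, the formula is easy to check numerically. Hitting times to a fixed target \(v\) satisfy the standard first-step equations \((I - Q)\mathbf{h} = \mathbf{1}\), where \(Q\) is \(P\) restricted to \(V \setminus \{v\}\) (a classical Markov-chain fact, not derived here). The sketch below verifies the identity on the path graph with three vertices, where \(C(0,2) = 8\):

```python
import numpy as np

# Path graph on three vertices: 0 - 1 - 2, two unit edges, so 2m = 4
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
n = len(A)
m = A.sum() / 2

# Spectral side: eigenpairs of L_sym, transported to phi_k = D^{-1/2} psi_k
Lsym = np.eye(n) - A / np.sqrt(np.outer(deg, deg))
mu, psi = np.linalg.eigh(Lsym)               # mu ascending, mu[0] = 0
phi = psi / np.sqrt(deg)[:, None]            # D-orthonormal eigenvectors

def commute_spectral(u, v):
    return 2 * m * sum((phi[u, k] - phi[v, k]) ** 2 / mu[k]
                       for k in range(1, n))

# Direct side: first-step equations (I - Q) h = 1 on V \ {v}
P = A / deg[:, None]

def hitting(u, v):
    idx = [w for w in range(n) if w != v]
    Q = P[np.ix_(idx, idx)]
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h[idx.index(u)]

print(commute_spectral(0, 2), hitting(0, 2) + hitting(2, 0))   # both equal 8
```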

Proposition: Cover Time Bound

The cover time \(\text{Cov}(G)\) — the expected time for a random walk to visit every vertex — satisfies \[ \text{Cov}(G) \leq 2m(n-1) \] for any connected graph with \(n\) vertices and \(m\) edges.

Proof Sketch:

Fix any spanning tree \(T\) of \(G\) and a depth-first traversal of \(T\) that visits all \(n\) vertices and returns to the start, traversing each tree edge exactly twice — a closed walk on \(T\) of length \(2(n-1)\). The cover time of \(G\) is bounded by the expected time the random walker takes to mimic this traversal in \(G\) itself: at each step of the closed walk, the walker must hit the next prescribed vertex along a tree edge \((u,v)\), contributing an expected \(H(u,v)\) steps. Summing over the \(2(n-1)\) directed tree-edge traversals pairs each undirected tree edge with both directions, contributing \(C(u,v) = H(u,v) + H(v,u)\) per tree edge. A separate bound (Aleliunas–Karp–Lipton–Lovász–Rackoff, 1979) establishes \(C(u,v) \leq 2m\) for any edge \((u,v) \in E\), so summing over the \(n - 1\) tree edges gives \(\text{Cov}(G) \leq (n-1) \cdot 2m = 2m(n-1)\).

Random Walks, PageRank, and Graph Representation Learning

PageRank modifies the simple random walk with a teleportation mechanism: at each step, with probability \(\alpha\) the walker follows a random edge, and with probability \(1 - \alpha\) it jumps to a uniformly random vertex. The transition matrix becomes \(P_{\text{PR}} = \alpha P + (1 - \alpha)\frac{1}{n}\mathbf{1}\mathbf{1}^\top\), where \(\mathbf{1} \in \mathbb{R}^n\) is the all-ones column vector, and its stationary distribution ranks vertices by a blend of local connectivity and global accessibility.
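The teleporting walk can be sketched via power iteration; the graph is an invented 4-vertex example, and \(\alpha = 0.85\) is the customary damping value:

```python
import numpy as np

# Illustrative undirected graph on 4 vertices; vertex 2 is the hub (degree 3)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = len(A)
P = A / A.sum(axis=1)[:, None]               # simple random walk

alpha = 0.85                                 # customary damping value
P_pr = alpha * P + (1 - alpha) / n           # adds (1 - alpha)/n to every entry

pi = np.full(n, 1 / n)                       # power iteration from uniform
for _ in range(100):
    pi = pi @ P_pr
print(pi)                                    # the hub, vertex 2, ranks highest
```

Because \(P_{\text{PR}}\) has all entries positive, the chain is irreducible and aperiodic for any graph, so the iteration converges regardless of the underlying structure — which is precisely the point of teleportation.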

In graph representation learning, random walks are the engine behind embedding algorithms. DeepWalk samples random walk sequences from a graph and feeds them to a Word2Vec-style skip-gram model, learning vertex embeddings that capture multi-hop neighborhood structure. Node2Vec generalizes this with biased random walks controlled by return parameter \(p\) and in-out parameter \(q\), interpolating between breadth-first (local) and depth-first (global) exploration. The connection to spectral methods is deep: it can be shown that the DeepWalk objective implicitly factorizes a matrix closely related to the transition matrix \(P\), linking these seemingly heuristic methods back to the spectral theory of graph Laplacians.