Network Flow & Walks on Graphs

Shortest Paths · Maximum Flow · Random Walks on Graphs

Shortest Paths

In Introduction to Graph Theory, we defined paths and studied their combinatorial structure. We now equip our graphs with weights and ask a quantitative question: among all paths connecting two vertices, which one minimizes total weight? This is the shortest path problem, and its study reveals a fundamental dichotomy in algorithm design — greedy selection versus dynamic programming — that pervades discrete optimization.

Weighted Graphs and Distance

Definition: Weighted Directed Graph

A weighted directed graph (or weighted digraph) is a triple \(G = (V, E, w)\), where \((V, E)\) is a directed graph and \(w \colon E \to \mathbb{R}\) is a weight function assigning a real number to each edge.

For a directed path \(P = (v_0, v_1, \ldots, v_k)\), we define its weight as \[ w(P) = \sum_{i=0}^{k-1} w(v_i, v_{i+1}). \] The shortest-path distance from \(s\) to \(t\) is \[ \delta(s, t) = \begin{cases} \displaystyle\min_{P \,:\, s \leadsto t} w(P) & \text{if a path from } s \text{ to } t \text{ exists,} \\ +\infty & \text{otherwise.} \end{cases} \] When all weights are non-negative, \(\delta\) satisfies the triangle inequality \(\delta(s,t) \leq \delta(s,u) + \delta(u,t)\) and non-negativity \(\delta(s,t) \geq 0\), but on a directed graph it is generally not symmetric: \(\delta(s,t) \neq \delta(t,s)\). Thus \(\delta\) is a quasi-metric (or asymmetric metric) on the vertex set, not a metric in the strict sense — recall that a metric requires symmetry by definition. When negative weights are present, \(\delta(s,t)\) may be \(-\infty\) if a negative-weight cycle is reachable from \(s\) on a path to \(t\).
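The definitions above translate directly into code. The following minimal Python sketch represents a weighted digraph as a dictionary of edge weights and computes \(w(P)\) for a given path; the toy graph and vertex names are assumed for illustration, not taken from the text.

```python
# A weighted digraph as an adjacency dict: w[(u, v)] is the edge weight.
# Toy instance (assumed for illustration).
w = {("s", "a"): 2.0, ("a", "t"): 3.0, ("s", "t"): 7.0}

def path_weight(w, path):
    """Sum of edge weights along path = (v_0, v_1, ..., v_k)."""
    return sum(w[(path[i], path[i + 1])] for i in range(len(path) - 1))

print(path_weight(w, ("s", "a", "t")))  # 5.0, beating the direct edge s->t (7.0)
```

Here \(\delta(s,t) = 5\): the two-edge path through \(a\) is shorter than the direct edge, illustrating why distance must minimize over all paths rather than single edges.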

Dijkstra's Algorithm

When all edge weights are non-negative, the shortest-path problem admits a greedy solution. Dijkstra's algorithm maintains a set \(S\) of vertices whose shortest-path distances from a source \(s\) have been finalized, and repeatedly extracts the vertex with the smallest tentative distance.

Theorem: Correctness of Dijkstra's Algorithm

Let \(G = (V, E, w)\) be a weighted digraph with \(w(e) \geq 0\) for all \(e \in E\). Upon termination, Dijkstra's algorithm computes \(\delta(s, v)\) for every \(v \in V\).

Proof Sketch:

We prove by induction on \(|S|\) that for every vertex \(v \in S\), the recorded distance \(d[v] = \delta(s,v)\).

Base case. After the first extraction (necessarily of \(s\), the only vertex with finite key), \(S = \{s\}\) with \(d[s] = 0 = \delta(s,s)\).

Inductive step. Suppose the invariant holds for \(|S| = k\). The algorithm selects \(u \notin S\) minimizing \(d[u]\). Suppose for contradiction that \(\delta(s,u) < d[u]\). Then there exists a shortest path \(P\) from \(s\) to \(u\) with \(w(P) < d[u]\). Let \((x, y)\) be the first edge on \(P\) with \(x \in S\) and \(y \notin S\). By the inductive hypothesis, \(d[x] = \delta(s,x)\), and by the relaxation step, \(d[y] \leq d[x] + w(x,y) = \delta(s,y)\). Since all weights are non-negative, the subpath from \(s\) to \(y\) has weight at most \(w(P)\), so \(d[y] \leq w(P) < d[u]\). But then \(u\) was not the minimum of \(d[\cdot]\) over \(V \setminus S\), a contradiction.

Algorithm: Dijkstra's Algorithm
Input: weighted digraph \(G = (V, E, w)\) with \(w(e) \geq 0\), source \(s\)
Output: shortest-path distances \(d[v] = \delta(s,v)\) for all \(v \in V\)
begin
  \(d[v] \leftarrow +\infty\) for all \(v \in V\); \(d[s] \leftarrow 0\)
  \(S \leftarrow \emptyset\)
  insert all vertices into priority queue \(Q\) keyed by \(d[v]\)
  while \(Q \neq \emptyset\):
    \(u \leftarrow\) Extract-Min\((Q)\)
    \(S \leftarrow S \cup \{u\}\)
    for each edge \((u,v) \in E\) with \(v \notin S\):
      if \(d[u] + w(u,v) < d[v]\):
        \(d[v] \leftarrow d[u] + w(u,v)\)   // relaxation
        Decrease-Key\((Q, v, d[v])\)
  return \(d\)
end

With a binary min-heap, each extract-min costs \(O(\log |V|)\) worst-case and each decrease-key costs \(O(\log |V|)\) worst-case. Since we perform \(|V|\) extractions and at most \(|E|\) relaxations, the total running time is \[ O\bigl((|V| + |E|) \log |V|\bigr). \] With a Fibonacci heap, decrease-key is amortized \(O(1)\), yielding \(O(|V| \log |V| + |E|)\).
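A compact Python sketch of the algorithm, assuming an adjacency-dict representation. Since the standard library's `heapq` has no Decrease-Key, this version uses the common lazy-deletion workaround: it pushes a fresh entry on every relaxation and skips stale entries when popped, which preserves the same asymptotic bound. The toy instance is assumed for illustration.

```python
import heapq
from math import inf

def dijkstra(adj, s):
    """Shortest-path distances from s; adj maps u -> list of (v, weight).

    All weights must be non-negative.  Lazy deletion replaces Decrease-Key:
    stale heap entries are detected and skipped when popped.
    """
    d = {v: inf for v in adj}
    d[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:           # stale entry; u was already finalized
            continue
        for v, wt in adj[u]:
            if du + wt < d[v]:  # relaxation
                d[v] = du + wt
                heapq.heappush(pq, (d[v], v))
    return d

# Toy instance (assumed for illustration).
adj = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)],
       "b": [("t", 3.0)], "t": []}
print(dijkstra(adj, "s"))  # {'s': 0.0, 'a': 1.0, 'b': 3.0, 't': 6.0}
```

Note the route \(s \to a \to b \to t\) of weight \(6\) beats the direct \(a \to t\) edge, exactly the kind of improvement the relaxation step discovers.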

The greedy strategy works precisely because non-negative weights guarantee that extending a path can never decrease its total weight — once a vertex is finalized, no future path can improve upon it. This monotonicity breaks when negative edges are present.

Bellman-Ford Algorithm and Dynamic Programming

The Bellman-Ford algorithm handles arbitrary real-valued weights (including negative edges) by adopting a fundamentally different strategy: rather than greedily committing to final distances, it iteratively relaxes all edges, allowing tentative distances to improve over \(|V|-1\) rounds.

Define the subproblems: let \(d_i[v]\) be the minimum weight of any path from \(s\) to \(v\) using at most \(i\) edges. The recurrence is \[ d_i[v] = \min\Bigl(\, d_{i-1}[v], \;\min_{(u,v) \in E} \bigl(d_{i-1}[u] + w(u,v)\bigr)\Bigr), \] with base case \(d_0[s] = 0\) and \(d_0[v] = +\infty\) for \(v \neq s\). Since any shortest path in a graph without negative cycles has at most \(|V| - 1\) edges, we have \(\delta(s,v) = d_{|V|-1}[v]\).

Definition: Dynamic Programming (Bellman's Principle of Optimality)

A problem exhibits optimal substructure if an optimal solution contains within it optimal solutions to subproblems. A problem exhibits overlapping subproblems if the same subproblems are encountered repeatedly during a recursive decomposition.

Dynamic programming (DP) solves optimization problems satisfying both properties by computing and storing solutions to subproblems in a systematic order — either bottom-up (tabulation) or top-down with memoization — ensuring each subproblem is solved exactly once.

The Bellman-Ford recurrence is a canonical instance of DP: the optimal substructure is that a shortest path from \(s\) to \(v\) with at most \(i\) edges is either a shortest path with at most \(i-1\) edges, or the extension of some shortest path to a predecessor \(u\) by the edge \((u,v)\). The overlapping subproblems arise because multiple vertices may share the same predecessors.

Algorithm: Bellman-Ford
Input: weighted digraph \(G = (V, E, w)\), source \(s\)
Output: distances \(d[v] = \delta(s,v)\), or detection of a negative cycle
begin
  \(d[v] \leftarrow +\infty\) for all \(v \in V\); \(d[s] \leftarrow 0\)
  for \(i = 1\) to \(|V| - 1\):   // relaxation passes
    for each edge \((u,v) \in E\):
      if \(d[u] + w(u,v) < d[v]\):
        \(d[v] \leftarrow d[u] + w(u,v)\)
  // negative cycle check
  for each edge \((u,v) \in E\):
    if \(d[u] + w(u,v) < d[v]\):
      return "negative cycle detected"
  return \(d\)
end
Theorem: Bellman-Ford Correctness and Negative Cycle Detection

Let \(G = (V, E, w)\) be a weighted digraph with source \(s\). After \(|V| - 1\) iterations of relaxing all edges, the Bellman-Ford algorithm computes \(\delta(s, v)\) for every vertex \(v\) reachable from \(s\) without traversing a negative-weight cycle. Moreover, a negative-weight cycle reachable from \(s\) exists if and only if some edge \((u,v)\) still satisfies \(d[v] > d[u] + w(u,v)\) after the \((|V|-1)\)-th iteration.

Proof:

Correctness. We show that after \(i\) iterations, \(d[v] \leq d_i[v]\) for all \(v \in V\), while \(d[v] \geq \delta(s,v)\) holds throughout: every value the algorithm assigns to \(d[v]\) is the weight of some walk from \(s\) to \(v\), and in the absence of reachable negative cycles no walk weighs less than \(\delta(s,v)\). We prove the upper bound by induction on \(i\). The base case \(i = 0\) holds by initialization. In iteration \(i\), every edge \((u,v)\) is relaxed: \[ d[v] \leftarrow \min\bigl(d[v],\; d[u] + w(u,v)\bigr). \] At the start of the pass the inductive hypothesis gives \(d[u] \leq d_{i-1}[u]\), and \(d[u]\) never increases, so the relaxation ensures \(d[v] \leq d_{i-1}[u] + w(u,v)\) for every predecessor \(u\). Taking the minimum over all predecessors and the previous value gives \(d[v] \leq d_i[v]\). After \(|V|-1\) iterations, \(\delta(s,v) \leq d[v] \leq d_{|V|-1}[v] = \delta(s,v)\), hence \(d[v] = \delta(s,v)\).

Negative cycle detection. If no negative cycle is reachable from \(s\), then all shortest paths have at most \(|V|-1\) edges, so \(d[v] = \delta(s,v)\) satisfies the triangle inequality \(d[v] \leq d[u] + w(u,v)\) for every edge. Conversely, suppose after \(|V|-1\) iterations some edge still admits relaxation. Tracing predecessors back from that edge must eventually revisit a vertex, yielding a cycle \(C = (v_0, v_1, \ldots, v_k = v_0)\). Each predecessor edge satisfies \(d[v_i] \geq d[v_{i-1}] + w(v_{i-1}, v_i)\): equality held when \(d[v_i]\) was last assigned, and \(d[v_{i-1}]\) can only have decreased since. The still-relaxable edge contributes a strict inequality, so summing around the cycle gives \[ \sum_{i=1}^{k} d[v_i] > \sum_{i=1}^{k} d[v_{i-1}] + \sum_{i=1}^{k} w(v_{i-1}, v_i). \] The left and right \(d\)-sums cancel (since \(v_0 = v_k\)), yielding \(0 > w(C)\).

The total running time is \(O(|V| \cdot |E|)\): we perform \(|V| - 1\) passes over all \(|E|\) edges.
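The passes-over-the-edge-list structure makes Bellman-Ford especially short to implement. Below is a minimal Python sketch over an explicit edge list, with the negative-cycle check from the algorithm; the toy instance (which contains a negative edge but no negative cycle) is assumed for illustration.

```python
from math import inf

def bellman_ford(vertices, edges, s):
    """Distances from s on an edge list [(u, v, w), ...] with arbitrary
    real weights; raises ValueError if a reachable negative cycle exists."""
    d = {v: inf for v in vertices}
    d[s] = 0.0
    for _ in range(len(vertices) - 1):    # |V| - 1 relaxation passes
        for u, v, wt in edges:
            if d[u] + wt < d[v]:
                d[v] = d[u] + wt
    for u, v, wt in edges:                # negative cycle check
        if d[u] + wt < d[v]:
            raise ValueError("negative cycle detected")
    return d

# Toy instance with a negative edge but no negative cycle (assumed).
V = ["s", "a", "b", "t"]
E = [("s", "a", 4.0), ("s", "b", 5.0), ("b", "a", -3.0), ("a", "t", 2.0)]
print(bellman_ford(V, E, "s"))  # {'s': 0.0, 'a': 2.0, 'b': 5.0, 't': 4.0}
```

The detour \(s \to b \to a\) of weight \(2\) undercuts the direct edge \(s \to a\) of weight \(4\) — exactly the situation where Dijkstra's greedy commitment would fail.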

Greedy vs. Dynamic Programming

Dijkstra and Bellman-Ford embody the two great paradigms of discrete optimization. Dijkstra's greedy approach commits irrevocably to each vertex's distance, exploiting the monotonicity of non-negative weights for superior runtime. Bellman-Ford's DP approach defers commitment, exploring all possible relaxations — slower, but correct under weaker assumptions. This trade-off between exploiting structural assumptions for speed and maintaining generality recurs throughout algorithm design and optimization theory.

Shortest-Path Distances in Graph Learning

The shortest-path distance \(\delta(u,v)\) is a fundamental feature in graph-based machine learning. In Graph Neural Networks (GNNs), Shortest Path Networks encode \(\delta(u,v)\) as positional features to overcome the limited expressiveness of message-passing architectures — whose discriminative power is fundamentally bounded by the 1-Weisfeiler-Lehman (1-WL) test, meaning standard GNNs cannot distinguish non-isomorphic graphs (or nodes) that 1-WL fails to separate. Shortest-path distances inject global structural information that breaks through this barrier. In knowledge graph embeddings, translational models like TransE learn vector representations where \(\mathbf{h} + \mathbf{r} \approx \mathbf{t}\) for a triple \((h, r, t)\), implicitly encoding a notion of relational distance. The graph Laplacian's commute time — which we will develop in the random walks section below — provides a spectral relaxation of shortest-path distance that is continuous and differentiable, making it amenable to gradient-based optimization.

Maximum Flow

We now pass from finding an optimal single path to optimizing the aggregate throughput of an entire network. Where shortest paths ask "what is the cheapest route?", maximum flow asks "how much can we push through simultaneously?" This shift — from a path problem to a network problem — opens the door to deep connections with linear programming duality.

Flow Networks

Definition: Flow Network

A flow network is a tuple \((G, c, s, t)\), where \(G = (V, E)\) is a directed graph, \(c \colon E \to \mathbb{R}_{\geq 0}\) is a capacity function, \(s \in V\) is the source, and \(t \in V\) is the sink, with \(s \neq t\).

Definition: Flow

A flow in a flow network \((G, c, s, t)\) is a function \(f \colon E \to \mathbb{R}_{\geq 0}\) satisfying:

Capacity constraint: For every edge \(e \in E\), \[ 0 \leq f(e) \leq c(e). \]

Conservation (Kirchhoff's law): For every vertex \(v \in V \setminus \{s, t\}\), \[ \sum_{(u,v) \in E} f(u,v) = \sum_{(v,w) \in E} f(v,w). \]

The value of a flow is \[ |f| = \sum_{(s,v) \in E} f(s,v) - \sum_{(v,s) \in E} f(v,s). \]

The maximum flow problem asks: find a flow \(f^*\) of maximum value \(|f^*|\). This is a linear program — the objective \(|f|\) and all constraints are linear in the flow variables — and we shall see that its dual yields the min-cut theorem.

Residual Graphs and Augmenting Paths

Definition: Residual Graph

We assume that \(G\) contains no antiparallel edges (i.e., if \((u,v) \in E\) then \((v,u) \notin E\)). This is without loss of generality: any antiparallel pair can be eliminated by inserting a dummy vertex on one of the two edges.

Given a flow \(f\) in a network \((G, c, s, t)\), the residual graph \(G_f = (V, E_f)\) has edge set \[ E_f = \bigl\{(u,v) : (u,v) \in E,\; f(u,v) < c(u,v)\bigr\} \;\cup\; \bigl\{(v,u) : (u,v) \in E,\; f(u,v) > 0\bigr\}. \] The residual capacity is \[ c_f(u,v) = \begin{cases} c(u,v) - f(u,v) & \text{if } (u,v) \in E, \\ f(v,u) & \text{if } (v,u) \in E. \end{cases} \]

An augmenting path is a path from \(s\) to \(t\) in the residual graph \(G_f\). If such a path \(P\) exists, we can increase the flow by \(\Delta = \min_{e \in P} c_f(e) > 0\) by pushing \(\Delta\) units along \(P\). This is the foundation of the Ford-Fulkerson method.

Algorithm: Ford-Fulkerson Method
Input: flow network \((G, c, s, t)\)
Output: maximum flow \(f\)
begin
  \(f(e) \leftarrow 0\) for all \(e \in E\)
  while \(\exists\) augmenting path \(P\) from \(s\) to \(t\) in \(G_f\):
    \(\Delta \leftarrow \min\{c_f(e) : e \in P\}\)
    for each edge \((u,v) \in P\):
      if \((u,v) \in E\): \(f(u,v) \leftarrow f(u,v) + \Delta\)   // forward edge
      else: \(f(v,u) \leftarrow f(v,u) - \Delta\)   // backward edge
  return \(f\)
end

The method terminates when no augmenting path exists in \(G_f\). The choice of augmenting path determines both the algorithm's name and its complexity. When augmenting paths are found via breadth-first search (choosing a shortest path in terms of number of edges), the method is called the Edmonds-Karp algorithm, with running time \(O(|V| \cdot |E|^2)\).
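A Python sketch of Edmonds-Karp, assuming capacities given as a dict of dicts with no antiparallel edges. It maintains residual capacities directly, finds shortest augmenting paths by BFS, and — following the max-flow min-cut proof below — reads off a minimum cut as the set of vertices still reachable from \(s\) in the final residual graph. The toy network is assumed for illustration.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow via BFS augmenting paths on the residual graph.

    cap: dict of dicts, cap[u][v] = capacity (no antiparallel edges assumed).
    Returns (flow_value, S) where S is the source side of a minimum cut.
    """
    # Residual capacities; add reverse edges with capacity 0.
    r = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u in cap:
        for v in cap[u]:
            r.setdefault(v, {}).setdefault(u, 0.0)

    def bfs():
        """BFS tree of a shortest augmenting path s -> t, or None."""
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c in r[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None  # no augmenting path: flow is maximum

    value = 0.0
    while (parent := bfs()) is not None:
        delta, v = float("inf"), t          # bottleneck residual capacity
        while parent[v] is not None:
            delta = min(delta, r[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:        # push delta along the path
            u = parent[v]
            r[u][v] -= delta
            r[v][u] += delta
            v = u
        value += delta

    S, stack = {s}, [s]                     # min cut: residual-reachable set
    while stack:
        u = stack.pop()
        for v, c in r[u].items():
            if c > 0 and v not in S:
                S.add(v)
                stack.append(v)
    return value, S

# Toy network (assumed); its max flow / min cut value is 5.
cap = {"s": {"a": 3.0, "b": 2.0}, "a": {"b": 1.0, "t": 2.0},
       "b": {"t": 3.0}, "t": {}}
print(edmonds_karp(cap, "s", "t"))  # (5.0, {'s'})
```

In this instance both edges out of \(s\) are saturated, so the cut \((\{s\}, V \setminus \{s\})\) of capacity \(5\) certifies optimality of the flow.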

The Max-Flow Min-Cut Theorem

Definition: Cut

An \(s\)-\(t\) cut in a flow network \((G, c, s, t)\) is a partition \((S, T)\) of \(V\) with \(s \in S\) and \(t \in T\). Its capacity is \[ c(S, T) = \sum_{\substack{(u,v) \in E \\ u \in S,\; v \in T}} c(u,v). \]

For any flow \(f\) and any \(s\)-\(t\) cut \((S,T)\), a telescoping argument using the conservation law yields \[ |f| = \sum_{\substack{(u,v) \in E \\ u \in S,\; v \in T}} f(u,v) \;-\; \sum_{\substack{(v,u) \in E \\ v \in T,\; u \in S}} f(v,u) \;\leq\; c(S,T). \] That is, every flow value is bounded above by every cut capacity. The remarkable fact is that equality is achieved.

Theorem: Max-Flow Min-Cut (Ford & Fulkerson, 1956)

In any flow network, the following three conditions are equivalent:

(i) \(f\) is a maximum flow.
(ii) The residual graph \(G_f\) contains no augmenting path.
(iii) There exists an \(s\)-\(t\) cut \((S, T)\) with \(|f| = c(S, T)\).

In particular, \(\max_f |f| = \min_{(S,T)} c(S,T)\).

Proof:

(i) ⇒ (ii): If an augmenting path existed in \(G_f\), we could increase \(|f|\), contradicting maximality.

(ii) ⇒ (iii): Suppose \(G_f\) has no \(s\)-\(t\) path. Define \(S = \{v \in V : v \text{ is reachable from } s \text{ in } G_f\}\) and \(T = V \setminus S\). Then \(s \in S\) and \(t \in T\) (since no augmenting path exists), so \((S,T)\) is an \(s\)-\(t\) cut. For every edge \((u,v) \in E\) with \(u \in S, v \in T\): since \(v\) is not reachable from \(s\) in \(G_f\), the forward residual edge \((u,v)\) is absent, meaning \(f(u,v) = c(u,v)\). For every edge \((v,u) \in E\) with \(v \in T, u \in S\): since \(u \in S\) is reachable but \(v \in T\) is not, the backward residual edge \((v,u)\) must be absent, meaning \(f(v,u) = 0\). Therefore, \[ |f| = \sum_{\substack{u \in S,\, v \in T \\ (u,v) \in E}} c(u,v) - 0 = c(S,T). \]

(iii) ⇒ (i): Since \(|f| \leq c(S',T')\) for every cut \((S',T')\), and \(|f| = c(S,T)\), the flow \(f\) achieves the upper bound and must be maximal.

Duality and the Combinatorial-Continuous Bridge

The max-flow min-cut theorem is an instance of strong duality. The maximum flow problem can be formulated as a linear program: maximize \(|f|\) subject to capacity and conservation constraints. Its dual problem turns out to be the minimum cut problem, and strong duality guarantees that their optima coincide. This is not merely a theoretical curiosity — it reveals that combinatorial optimization on graphs and continuous optimization in \(\mathbb{R}^n\) are often two views of the same mathematical structure. The same duality principle underlies the min-cost flow problem, assignment problems, and the general theory of totally unimodular matrices, where linear programming relaxations automatically yield integer solutions.

Graph Cuts in Computer Vision

In image segmentation, the max-flow min-cut framework provides an exact solution to a natural energy minimization problem. We construct a flow network where each pixel is a vertex, edges between adjacent pixels carry weights encoding similarity (encouraging neighboring pixels to share a label), and edges to the source and sink encode per-pixel costs for foreground/background assignment. The minimum cut then partitions pixels into foreground and background, minimizing the total cost. This formulation — known as graph cuts — was among the first methods to give globally optimal solutions for binary labeling problems in vision, and its extensions to multi-label problems via \(\alpha\)-expansion remain widely used.

Random Walks on Graphs

So far, we have studied deterministic optimization on graphs: finding shortest paths and maximum flows. We now introduce stochastic dynamics by considering a particle that traverses a graph randomly. This probabilistic perspective connects graph theory to Markov chains and spectral graph theory, revealing that combinatorial and algebraic properties of a graph are two faces of the same coin.

The Transition Matrix

Definition: Random Walk on a Graph

Let \(G = (V, E)\) be a connected, undirected graph with \(n = |V|\) vertices. A simple random walk on \(G\) is the Markov chain \((X_0, X_1, X_2, \ldots)\) with state space \(V\) and transition probabilities \[ \Pr(X_{t+1} = v \mid X_t = u) = \begin{cases} \dfrac{1}{\deg(u)} & \text{if } (u,v) \in E, \\[6pt] 0 & \text{otherwise.} \end{cases} \]

Let \(A\) be the adjacency matrix and \(D = \operatorname{diag}(\deg(v_1), \ldots, \deg(v_n))\) the degree matrix. The transition matrix of the random walk is \[ P = D^{-1}A. \] The entry \(P_{uv} = A_{uv}/\deg(u)\) is exactly the one-step transition probability from \(u\) to \(v\). Note that \(P\) is a row-stochastic matrix: each row sums to 1. After \(k\) steps, the distribution of the walker's position evolves as \[ \boldsymbol{\pi}^{(k)} = \boldsymbol{\pi}^{(0)} P^k, \] where \(\boldsymbol{\pi}^{(0)}\) is the initial distribution (a row vector).
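A short numpy sketch of these definitions: build \(P = D^{-1}A\) for a small connected graph (a triangle with a pendant vertex, assumed for illustration), confirm row-stochasticity, and evolve a point-mass distribution for three steps.

```python
import numpy as np

# Small connected graph: a triangle 0-1-2 with a pendant vertex 3 on vertex 2
# (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A                                # P = D^{-1} A

assert np.allclose(P.sum(axis=1), 1.0)       # row-stochastic: rows sum to 1

# Three steps of the walk started at vertex 0.
pi0 = np.array([1.0, 0.0, 0.0, 0.0])
pi3 = pi0 @ np.linalg.matrix_power(P, 3)     # pi^{(3)} = pi^{(0)} P^3
print(np.round(pi3, 4))
```

Note the row vector convention: the distribution multiplies \(P\) from the left, matching \(\boldsymbol{\pi}^{(k)} = \boldsymbol{\pi}^{(0)} P^k\).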

Stationary Distribution

Theorem: Stationary Distribution of a Simple Random Walk

Let \(G\) be a connected undirected graph with \(m = |E|\) edges. The distribution \[ \pi_v = \frac{\deg(v)}{2m}, \quad v \in V \] is the unique stationary distribution of the simple random walk on \(G\) (i.e., the unique distribution satisfying \(\boldsymbol{\pi} P = \boldsymbol{\pi}\)). If, in addition, \(G\) is non-bipartite, then the walk is aperiodic, and the distribution \(\boldsymbol{\pi}^{(k)}\) converges to \(\boldsymbol{\pi}\) from any initial distribution.

Proof:

Stationarity. We verify \(\boldsymbol{\pi} P = \boldsymbol{\pi}\). For each \(v \in V\), \[ \begin{align*} (\boldsymbol{\pi} P)_v &= \sum_{u \in V} \pi_u \cdot P_{uv} \\ &= \sum_{u \,:\, (u,v) \in E} \frac{\deg(u)}{2m} \cdot \frac{1}{\deg(u)} \\ &= \sum_{u \,:\, (u,v) \in E} \frac{1}{2m} \\ &= \frac{\deg(v)}{2m} = \pi_v. \end{align*} \]

Uniqueness. Since \(G\) is connected, the transition matrix \(P\) is irreducible (every state communicates with every other). By the Perron-Frobenius theorem for irreducible stochastic matrices, the stationary distribution is unique.

Convergence. If \(G\) is non-bipartite, the chain is additionally aperiodic (a non-bipartite graph contains an odd cycle, which implies \(\gcd\{k : P^k_{vv} > 0\} = 1\) for every vertex \(v\)). Irreducibility and aperiodicity together give ergodicity, so \(\boldsymbol{\pi}^{(k)} \to \boldsymbol{\pi}\) as \(k \to \infty\) for any initial distribution.

The result is elegant in its simplicity: in the long run, the random walker visits each vertex in proportion to its degree. High-degree "hub" vertices are visited more frequently — a fact that underpins the intuition behind centrality measures in network science.
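Both claims of the theorem are easy to check numerically. The sketch below, on the same assumed toy graph (a triangle with a pendant vertex, which is connected and non-bipartite), verifies \(\boldsymbol{\pi} P = \boldsymbol{\pi}\) for \(\pi_v = \deg(v)/2m\) and watches an arbitrary initial distribution converge to it.

```python
import numpy as np

# Triangle 0-1-2 with a pendant vertex 3 (assumed); connected, non-bipartite.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]
pi = deg / deg.sum()            # pi_v = deg(v) / 2m, since deg.sum() = 2m

assert np.allclose(pi @ P, pi)  # stationarity: pi P = pi

# Non-bipartite, hence aperiodic: any start converges to pi.
mu = np.zeros(4); mu[0] = 1.0
for _ in range(200):
    mu = mu @ P
assert np.allclose(mu, pi)
print(pi)  # [0.25  0.25  0.375 0.125]
```

The hub vertex 2 (degree 3) is visited three times as often as the pendant vertex 3 (degree 1), exactly in proportion to degree.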

The Random Walk Laplacian

The transition matrix is intimately connected to the graph Laplacian. Recall that the (combinatorial) graph Laplacian is \(L = D - A\). The random walk Laplacian is defined as \[ L_{\text{rw}} = D^{-1}L = I - D^{-1}A = I - P. \] This operator satisfies \(L_{\text{rw}} \mathbf{f} = \mathbf{f} - P\mathbf{f}\), so its action on a function \(\mathbf{f} \colon V \to \mathbb{R}\) measures the difference between \(\mathbf{f}\) and its local average under the random walk: \[ (L_{\text{rw}} \mathbf{f})(v) = f(v) - \frac{1}{\deg(v)} \sum_{u \,:\, (u,v) \in E} f(u). \] A function is harmonic at \(v\) (i.e., \(L_{\text{rw}} \mathbf{f}(v) = 0\)) precisely when its value equals the average over neighbors — the discrete analogue of the mean value property that characterizes harmonic functions in continuous analysis.
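The "difference from the local average" reading of \(L_{\text{rw}}\) can be confirmed directly. A small numpy sketch on the same kind of assumed toy graph checks \(L_{\text{rw}}\mathbf{f} = \mathbf{f} - P\mathbf{f}\) and that constant functions are harmonic everywhere.

```python
import numpy as np

# Triangle 0-1-2 with a pendant vertex 3 (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A
L_rw = np.eye(4) - P            # L_rw = I - P = D^{-1}(D - A)

# (L_rw f)(v) = f(v) - average of f over the neighbors of v.
f = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(L_rw @ f, f - P @ f)

# A constant function equals every local average, so it is harmonic: L_rw 1 = 0.
assert np.allclose(L_rw @ np.ones(4), 0.0)
print("harmonicity checks passed")
```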

The eigenvalues of \(L_{\text{rw}}\) are \(0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n\), where \(\mu_i = 1 - \lambda_i\) and \(1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n > -1\) are the eigenvalues of \(P\) (the lower bound \(\lambda_n > -1\) holds because the graph is non-bipartite). The rate of convergence to stationarity is controlled by \[ \bar{\lambda} = \max\bigl(|\lambda_2|,\, |\lambda_n|\bigr), \] which captures contributions from both the second-largest and the most negative eigenvalue. The standard mixing time bound is \[ \|\boldsymbol{\pi}^{(k)} - \boldsymbol{\pi}\|_{\text{TV}} \leq \frac{1}{2}\sqrt{\frac{1}{\pi_{\min}}}\;\bar{\lambda}^{\,k}, \] where \(\pi_{\min} = \min_v \pi_v\). The quantity \(1 - \bar{\lambda}\) is the absolute spectral gap: a large gap means rapid mixing, while a small gap means the walk remains trapped in local clusters for long periods.
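The absolute spectral gap is a one-liner to compute. Since \(P\) is similar to the symmetric matrix \(D^{-1/2} A D^{-1/2}\) (same spectrum), we can use a symmetric eigensolver; the toy graph is the assumed triangle-plus-pendant from above.

```python
import numpy as np

# Spectrum of P via the similar symmetric matrix D^{-1/2} A D^{-1/2}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
N = A / np.sqrt(np.outer(d, d))              # D^{-1/2} A D^{-1/2}
lam = np.sort(np.linalg.eigvalsh(N))[::-1]   # 1 = lam[0] >= ... >= lam[-1] > -1

assert np.isclose(lam[0], 1.0)
lam_bar = max(abs(lam[1]), abs(lam[-1]))     # controls the mixing rate
print(round(1.0 - lam_bar, 4))               # absolute spectral gap: 0.2713
```

For this graph \(\bar{\lambda} = |\lambda_4| \approx 0.729\): the most negative eigenvalue, not \(\lambda_2\), dominates, which is why the definition of \(\bar{\lambda}\) takes the maximum of both magnitudes.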

Hitting Time and Commute Time

Definition: Hitting Time and Commute Time

The hitting time \(H(u, v)\) is the expected number of steps for a random walk starting at \(u\) to first reach \(v\): \[ H(u,v) = \mathbb{E}\bigl[\min\{t \geq 1 : X_t = v\} \mid X_0 = u\bigr]. \] The commute time is the expected number of steps for a round trip: \[ C(u,v) = H(u,v) + H(v,u). \]

These quantities admit a spectral characterization. Since \(L_{\text{rw}} = D^{-1}L\) is generally not symmetric, we work with the similar symmetric matrix \(L_{\text{sym}} = D^{-1/2} L\, D^{-1/2} = I - D^{-1/2} A\, D^{-1/2}\), which shares the same eigenvalues \(\mu_1 = 0 < \mu_2 \leq \cdots \leq \mu_n\). Let \(\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_n\) be an orthonormal eigenbasis of \(L_{\text{sym}}\) (in the standard Euclidean inner product), and define \(\boldsymbol{\phi}_k = D^{-1/2}\boldsymbol{\psi}_k\) — these are the right eigenvectors of \(L_{\text{rw}}\), orthonormal with respect to the \(D\)-weighted inner product \(\langle \boldsymbol{\phi}_j, \boldsymbol{\phi}_k \rangle_D = \boldsymbol{\phi}_j^\top D\, \boldsymbol{\phi}_k = \delta_{jk}\). Under this normalization, \[ C(u,v) = 2m \sum_{k=2}^{n} \frac{1}{\mu_k}\bigl(\phi_k(u) - \phi_k(v)\bigr)^2, \] where \(m = |E|\). This is precisely the squared Euclidean distance in a spectral embedding weighted by \(1/\mu_k\) — a direct bridge between the combinatorial notion of "distance by random traversal" and the algebraic notion of "distance in eigenspace."
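The spectral formula can be checked on a case where commute times are known in closed form. For the path graph \(0\!-\!1\!-\!2\) (an assumed example), the known values are \(C(0,1) = 4\) and \(C(0,2) = 8\); the sketch below evaluates the sum over the nontrivial eigenpairs of \(L_{\text{sym}}\) with the \(D\)-weighted normalization described above.

```python
import numpy as np

# Path graph 0-1-2 (assumed example with known commute times).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d = A.sum(axis=1)
m = A.sum() / 2.0                                  # number of edges
L_sym = np.eye(3) - A / np.sqrt(np.outer(d, d))    # D^{-1/2} L D^{-1/2}
mu, psi = np.linalg.eigh(L_sym)                    # ascending: mu[0] = 0
phi = psi / np.sqrt(d)[:, None]                    # phi_k = D^{-1/2} psi_k

def commute(u, v):
    """C(u,v) = 2m * sum_{k>=2} (phi_k(u) - phi_k(v))^2 / mu_k."""
    return 2 * m * sum((phi[u, k] - phi[v, k]) ** 2 / mu[k]
                       for k in range(1, len(mu)))

print(round(commute(0, 1), 6), round(commute(0, 2), 6))  # 4.0 8.0
```

Note that \(C(0,2) = 8 = 2m \cdot 2\) while the shortest-path distance is \(2\): commute time scales with effective resistance, a smoother proxy for distance than \(\delta\) itself.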

Proposition: Cover Time Bound

The cover time \(\text{Cov}(G)\) — the expected time for a random walk to visit every vertex — satisfies \[ \text{Cov}(G) \leq 2m(n-1) \] for any connected graph with \(n\) vertices and \(m\) edges.

Random Walks, PageRank, and Graph Representation Learning

PageRank modifies the simple random walk with a teleportation mechanism: at each step, with probability \(\alpha\) the walker follows a random edge, and with probability \(1 - \alpha\) it jumps to a uniformly random vertex. The transition matrix becomes \(P_{\text{PR}} = \alpha P + (1 - \alpha)\frac{1}{n}\mathbf{1}\mathbf{1}^\top\), and its stationary distribution ranks vertices by a blend of local connectivity and global accessibility.
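The paragraph above translates into a few lines of numpy: form \(P_{\text{PR}}\) and find its stationary distribution by power iteration. The graph (the assumed triangle-plus-pendant from earlier) and the damping value \(\alpha = 0.85\) are illustrative choices.

```python
import numpy as np

# PageRank as a teleporting walk; triangle 0-1-2 with pendant vertex 3
# (assumed), damping alpha = 0.85 (an assumed, conventional choice).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1)[:, None]
n, alpha = 4, 0.85
P_pr = alpha * P + (1 - alpha) * np.ones((n, n)) / n   # teleportation blend

pi = np.full(n, 1.0 / n)
for _ in range(200):                 # power iteration: pi <- pi P_PR
    pi = pi @ P_pr
assert np.isclose(pi.sum(), 1.0)
print(np.round(pi, 4))               # highest score at the hub vertex 2
```

Teleportation makes \(P_{\text{PR}}\) irreducible and aperiodic regardless of the underlying graph, so the power iteration converges from any start — the practical reason for the \((1-\alpha)\) uniform-jump term.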

In graph representation learning, random walks are the engine behind embedding algorithms. DeepWalk samples random walk sequences from a graph and feeds them to a Word2Vec-style skip-gram model, learning vertex embeddings that capture multi-hop neighborhood structure. Node2Vec generalizes this with biased random walks controlled by return parameter \(p\) and in-out parameter \(q\), interpolating between breadth-first (local) and depth-first (global) exploration. The connection to spectral methods is deep: it can be shown that the DeepWalk objective implicitly factorizes a matrix closely related to the transition matrix \(P\), linking these seemingly heuristic methods back to the spectral theory of graph Laplacians.