From Eigenvalues to the Spectrum
In Eigenvalues & Eigenvectors,
we learned that every square matrix \(A \in \mathbb{C}^{n \times n}\) has eigenvalues —
scalars \(\lambda\) for which \(A\mathbf{v} = \lambda\mathbf{v}\) has a nonzero solution.
Equivalently, \(\lambda\) is an eigenvalue if and only if the operator \(A - \lambda I\)
is not invertible, i.e., \(\det(A - \lambda I) = 0\).
In finite dimensions, every linear operator has at least one eigenvalue (over \(\mathbb{C}\)),
and the Spectral Theorem for Symmetric Matrices
guarantees that symmetric matrices possess a complete orthonormal eigenbasis.
In infinite dimensions, the situation is far more subtle. The characteristic polynomial
\(\det(T - \lambda I) = 0\) is no longer available — determinants are not defined for
operators on infinite-dimensional spaces. We must replace this algebraic criterion with
a topological one: invertibility of the operator \(T - \lambda I\).
Convention. Throughout this chapter we take \(\mathcal{X}\) and \(\mathcal{H}\)
to be complex Banach and Hilbert spaces (i.e., \(\mathbb{F} = \mathbb{C}\)).
Spectral theory over \(\mathbb{R}\) is strictly poorer: a rotation of \(\mathbb{R}^2\) by \(90^\circ\) has
empty spectrum when viewed as a real operator, since its characteristic polynomial has no
real roots. The passage to \(\mathbb{C}\) restores the fundamental theorem of algebra and,
with it, the guarantee that \(\sigma(T) \neq \emptyset\) for any \(T \in \mathcal{B}(\mathcal{X})\)
on a nonzero space.
The Resolvent and Spectrum
Definition: Resolvent Set and Spectrum
Let \(\mathcal{X}\) be a Banach space and \(T \in \mathcal{B}(\mathcal{X})\) a bounded
linear operator. The resolvent set of \(T\) is
\[
\rho(T) = \bigl\{\lambda \in \mathbb{C} : T - \lambda I \text{ is bijective with inverse in } \mathcal{B}(\mathcal{X})\bigr\}.
\]
The spectrum of \(T\) is its complement:
\[
\sigma(T) = \mathbb{C} \setminus \rho(T).
\]
In finite dimensions, the spectrum equals the set of eigenvalues. In infinite dimensions,
the operator \(T - \lambda I\) can fail to be invertible in fundamentally different ways,
giving rise to three distinct parts of the spectrum.
Definition: Decomposition of the Spectrum
The spectrum \(\sigma(T)\) decomposes into three disjoint parts:
(a) Point Spectrum \(\sigma_p(T)\): The set of \(\lambda\) for which
\(T - \lambda I\) is not injective. These are the true eigenvalues —
there exists a nonzero \(x\) with \(Tx = \lambda x\).
(b) Continuous Spectrum \(\sigma_c(T)\): The set of \(\lambda\) for which
\(T - \lambda I\) is injective, has dense range, but is not surjective.
(c) Residual Spectrum \(\sigma_r(T)\): The set of \(\lambda\) for which
\(T - \lambda I\) is injective but does not have dense range.
These three cases are disjoint and, via the open mapping theorem, exhaust
\(\sigma(T)\): any \(\lambda\) at which \(T - \lambda I\) is injective with dense range
and a bounded inverse on its range would extend to a bounded inverse on all of
\(\mathcal{X}\), placing \(\lambda\) in \(\rho(T)\). The "inverse" in case (b) is
therefore necessarily unbounded. Among the three, the continuous spectrum is the
genuinely new phenomenon of infinite dimensions. Let us see it in action.
Example: The Right Shift Operator on \(\ell^2\)
Define the right shift \(S : \ell^2 \to \ell^2\) by
\(S(x_1, x_2, x_3, \ldots) = (0, x_1, x_2, x_3, \ldots)\).
This operator has no eigenvalues: if \(S\mathbf{x} = \lambda\mathbf{x}\),
then \(0 = \lambda x_1\), \(x_1 = \lambda x_2\), \(x_2 = \lambda x_3\), etc.
For \(\lambda = 0\), we get \(\mathbf{x} = 0\). For \(\lambda \neq 0\),
we get \(x_n = x_1 / \lambda^{n-1}\), which is not square-summable unless \(x_1 = 0\).
So \(\sigma_p(S) = \emptyset\).
Yet the spectrum is nonempty: one can show that \(\sigma(S) = \{\lambda \in \mathbb{C} : |\lambda| \leq 1\}\),
the entire closed unit disk. Specifically, the open disk \(\{|\lambda| < 1\}\) forms the
residual spectrum, and the boundary \(\{|\lambda| = 1\}\) forms the continuous spectrum —
a phenomenon with no finite-dimensional counterpart.
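A small numerical sketch makes this vivid (the truncation size \(N = 400\) and the test point \(\lambda = 0.5\) are arbitrary illustrative choices). The \(N \times N\) truncation of \(S\) is nilpotent, so all of its eigenvalues are exactly \(0\); yet the geometric vector \((1, \lambda, \lambda^2, \ldots)\) is a genuine eigenvector of the adjoint left shift, which is precisely what places \(\lambda\) in the residual spectrum of \(S\):

```python
import numpy as np

N = 400  # truncation dimension (an arbitrary illustrative choice)

# Right shift on the first N coordinates: S x = (0, x_1, ..., x_{N-1})
S = np.diag(np.ones(N - 1), k=-1)

# The finite truncation is nilpotent (S^N = 0), so every eigenvalue is 0:
# finite sections completely miss the unit-disk spectrum of the infinite operator.
assert not np.any(np.linalg.matrix_power(S, N))

# For |lam| < 1, the adjoint (the left shift S^T) has the genuine eigenvector
# (1, lam, lam^2, ...); this is what puts lam in the residual spectrum of S.
lam = 0.5
v = lam ** np.arange(N)
v /= np.linalg.norm(v)
residual = np.linalg.norm(S.T @ v - lam * v)
assert residual < 1e-10  # only the truncation tail contributes
```

The contrast between the truncation (spectrum \(\{0\}\)) and the operator itself (spectrum the whole closed disk) is a first warning that finite sections can badly misrepresent infinite-dimensional spectra.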
This example reveals why infinite-dimensional spectral theory requires new tools.
A general bounded operator on a Hilbert space may have spectrum that is a continuous
"smear" rather than a discrete set of points. To recover the clean, discrete
eigenvalue structure of finite-dimensional linear algebra, we need to impose
additional structure on our operators.
Connection to the Graph Laplacian
In Graph Laplacians & Spectral Methods
and Spectral Clustering,
we exploited the eigenvalues and eigenvectors of the (finite) graph Laplacian \(\mathbf{L} = \mathbf{D} - \mathbf{W}\)
to partition data. The Spectral Theorem for symmetric matrices guaranteed that \(\mathbf{L}\) is
orthogonally diagonalizable. This chapter generalizes that machinery to
continuous operators on infinite-dimensional spaces — replacing the
discrete graph Laplacian with integral operators, and finite eigenvectors with
eigenfunctions.
The Adjoint & Self-Adjoint Operators
In finite-dimensional linear algebra, the transpose (or conjugate transpose) of a matrix
plays a central role — symmetric matrices (\(A = A^T\)) have real eigenvalues and
orthogonal eigenvectors. We now generalize this to Hilbert spaces using the inner product.
The Hilbert Space Adjoint
Theorem: Existence of the Adjoint
Let \(\mathcal{H}\) be a Hilbert space and \(T \in \mathcal{B}(\mathcal{H})\).
There exists a unique operator \(T^* \in \mathcal{B}(\mathcal{H})\),
called the adjoint of \(T\), satisfying
\[
\langle Tx, y \rangle = \langle x, T^* y \rangle \quad \text{for all } x, y \in \mathcal{H}.
\]
Moreover, \(\|T^*\| = \|T\|\).
Proof:
The existence of \(T^*\) follows from the
Riesz Representation Theorem:
for each fixed \(y\), the map \(x \mapsto \langle Tx, y \rangle\) is a continuous linear functional on \(\mathcal{H}\).
By Riesz, there exists a unique vector — which we call \(T^* y\) — such that \(\langle Tx, y \rangle = \langle x, T^* y \rangle\).
The map \(y \mapsto T^* y\) is linear (it is the composition of two conjugate-linear maps: \(y \mapsto \varphi_y\)
where \(\varphi_y(x) = \langle Tx, y\rangle\), and the
Riesz sharp map \(\varphi \mapsto \varphi^\sharp\), which
is conjugate-linear).
For the norm equality: \(\|T^*y\| = \|\varphi_y\|_{\mathcal{H}^*} \leq \|T\|\|y\|\) gives \(\|T^*\| \leq \|T\|\).
The reverse inequality follows from \((T^*)^* = T\)
(which holds because \(\langle T^*y, x\rangle = \overline{\langle x, T^*y\rangle} = \overline{\langle Tx, y\rangle} = \langle y, Tx\rangle\),
so \(T\) satisfies the same defining relation that characterizes the adjoint of \(T^*\); by the uniqueness asserted in the theorem
(applied to \(T^*\) in place of \(T\)), we conclude \((T^*)^* = T\)) combined with the same inequality applied to \(T^*\).
Example: Adjoints of Familiar Operators
Matrices: For \(A \in \mathbb{C}^{n \times n}\) acting on
\(\mathbb{C}^n\), the adjoint is the conjugate transpose: \(A^* = \overline{A}^T\).
Over \(\mathbb{R}\), this is simply \(A^T\).
Integral operators: If \(T : L^2([a,b]) \to L^2([a,b])\) is defined by
\[
(Tf)(x) = \int_a^b K(x, t)\, f(t)\, dt,
\]
then the adjoint is
\[
(T^* g)(x) = \int_a^b \overline{K(t, x)}\, g(t)\, dt.
\]
That is, the kernel is "transposed and conjugated": \(K^*(x, t) = \overline{K(t, x)}\).
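On a grid, this relation can be checked directly: a Riemann-sum discretization turns \(T\) into a matrix, and the adjoint becomes the conjugate transpose of the kernel matrix. The complex kernel below is a hypothetical illustrative choice (not one used elsewhere in the text), and the grid size is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
h = 1.0 / (n - 1)

# Hypothetical complex kernel on [0,1]^2, chosen only for illustration
K = np.exp(1j * np.outer(x, x)) / (1.0 + np.add.outer(x, x))

def apply_op(M, f):
    # Riemann-sum discretization of (Tf)(x) = \int K(x,t) f(t) dt
    return M @ f * h

def inner(f, g):
    # discrete L^2 inner product <f, g> = \int f(t) conj(g(t)) dt
    return np.sum(f * np.conj(g)) * h

f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)

K_star = np.conj(K.T)                # the kernel, "transposed and conjugated"
lhs = inner(apply_op(K, f), g)       # <Tf, g>
rhs = inner(f, apply_op(K_star, g))  # <f, T*g>
assert abs(lhs - rhs) < 1e-10
```

The equality \(\langle Tf, g\rangle = \langle f, T^*g\rangle\) holds exactly in the discretization, because at the matrix level it is just \(\langle Af, g\rangle = \langle f, A^*g\rangle\) with \(A^* = \overline{A}^T\).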
Self-Adjoint Operators
Definition: Self-Adjoint (Hermitian) Operator
A bounded operator \(T \in \mathcal{B}(\mathcal{H})\) is self-adjoint
if \(T = T^*\), i.e.,
\[
\langle Tx, y \rangle = \langle x, Ty \rangle \quad \text{for all } x, y \in \mathcal{H}.
\]
Self-adjoint operators are the infinite-dimensional generalization of real symmetric
(or Hermitian) matrices. They inherit the crucial spectral properties:
Theorem: Spectral Properties of Self-Adjoint Operators
Let \(T = T^* \in \mathcal{B}(\mathcal{H})\). Then:
(a) Every eigenvalue of \(T\) is real.
(b) Eigenvectors corresponding to distinct eigenvalues are orthogonal.
(c) The entire spectrum \(\sigma(T) \subseteq \mathbb{R}\).
(d) \(\|T\| = \sup_{\|x\| = 1} |\langle Tx, x \rangle|\).
Proof:
(a) If \(Tx = \lambda x\) with \(x \neq 0\), then
\(\lambda \langle x, x \rangle = \langle Tx, x \rangle = \langle x, Tx \rangle
= \overline{\lambda} \langle x, x \rangle\). Since \(\langle x, x \rangle \neq 0\),
we get \(\lambda = \overline{\lambda}\), so \(\lambda \in \mathbb{R}\).
(b) If \(Tx = \lambda x\) and \(Ty = \mu y\) with \(\lambda \neq \mu\),
then
\[
\lambda \langle x, y \rangle = \langle Tx, y \rangle = \langle x, Ty \rangle = \overline{\mu} \langle x, y \rangle = \mu \langle x, y \rangle,
\]
where the last step uses \(\mu \in \mathbb{R}\) from part (a).
Since \(\lambda \neq \mu\), we conclude \(\langle x, y \rangle = 0\).
(c) (sketch) For \(\lambda = \alpha + i\beta \in \mathbb{C}\) with \(\beta \neq 0\), self-adjointness gives the identity
\[
\|(T - \lambda I)x\|^2 \;=\; \|(T - \alpha I)x\|^2 + \beta^2 \|x\|^2 \;\geq\; \beta^2 \|x\|^2,
\]
so \(T - \lambda I\) is bounded below, hence injective with closed range. Applying the same estimate to \((T - \overline{\lambda}I)\) shows
it is injective, so \(\ker(T - \overline{\lambda}I) = \{0\}\).
Combined with the standard Hilbert-space identity \(\overline{\operatorname{Range}(S)} = \ker(S^*)^\perp\)
(applied to \(S = T - \lambda I\), so \(S^* = T - \overline{\lambda}I\)) and the closedness of the range established above, we get
\[
\operatorname{Range}(T - \lambda I) = \{0\}^\perp = \mathcal{H}.
\]
Thus \(T - \lambda I\) is bijective; the bound \(\|(T-\lambda I)x\| \geq |\beta|\|x\|\) from the initial inequality forces
\(\|(T-\lambda I)^{-1}\| \leq 1/|\beta|\), so \(\lambda \notin \sigma(T)\), giving \(\sigma(T) \subseteq \mathbb{R}\).
(d) Let \(M = \sup_{\|x\|=1} |\langle Tx, x\rangle|\). The bound \(M \leq \|T\|\) follows from Cauchy-Schwarz.
For the reverse, take unit vectors \(h, g \in \mathcal{H}\) and expand
\(\langle T(h \pm g), h \pm g\rangle\) using self-adjointness: the cross terms satisfy
\[
\begin{align*}
\langle Th, g\rangle + \langle Tg, h\rangle
&= \langle Th, g\rangle + \overline{\langle Th, g\rangle} \\\\
&= 2\operatorname{Re}\langle Th, g\rangle
\end{align*}
\]
(using self-adjointness \(\langle Tg, h\rangle = \langle g, Th\rangle\) followed by the Hermitian symmetry of the inner product), so
\[
\langle T(h \pm g), h \pm g\rangle \;=\; \langle Th, h\rangle \pm 2\operatorname{Re}\langle Th, g\rangle + \langle Tg, g\rangle.
\]
Subtracting the two equations gives
\[
4\operatorname{Re}\langle Th, g\rangle = \langle T(h+g), h+g\rangle - \langle T(h-g), h-g\rangle.
\]
Applying \(|\langle Tf, f\rangle| \leq M\|f\|^2\) and the parallelogram law,
\[
4\operatorname{Re}\langle Th, g\rangle \;\leq\; M\bigl(\|h+g\|^2 + \|h-g\|^2\bigr) \;=\; 2M\bigl(\|h\|^2 + \|g\|^2\bigr) \;=\; 4M.
\]
Replacing \(h\) with \(e^{-i\theta}h\) for \(\theta = \arg\langle Th, g\rangle\) converts the real part into the modulus:
\(|\langle Th, g\rangle| \leq M\) for all unit \(h, g\). Taking the supremum over unit \(g\) yields \(\|Th\| \leq M\)
(using the standard Hilbert-space identity \(\|y\| = \sup_{\|g\|=1}|\langle y, g\rangle|\)); taking it over unit \(h\) yields \(\|T\| \leq M\).
Property (d) is particularly important — it says the operator norm of a self-adjoint
operator is determined by its quadratic form \(\langle Tx, x \rangle\),
the infinite-dimensional analogue of the Rayleigh quotient from
quadratic forms.
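Both property (d) and the key inequality behind (c) can be sanity-checked on a random Hermitian matrix, a finite-dimensional stand-in for a self-adjoint operator (the matrix size and the test value \(\lambda = 0.7 + 0.3i\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = (B + B.conj().T) / 2          # random Hermitian matrix: T = T*
I = np.eye(n)

# Property (d): ||T|| = sup |<Tx, x>|, attained at an extremal eigenvector
w = np.linalg.eigvalsh(T)
quad_sup = np.max(np.abs(w))
assert abs(np.linalg.norm(T, 2) - quad_sup) < 1e-10

# random unit vectors never exceed the quadratic-form bound
for _ in range(200):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    x /= np.linalg.norm(x)
    assert abs(np.vdot(x, T @ x)) <= quad_sup + 1e-10

# The identity from the proof of (c), for lam = alpha + i*beta with beta != 0:
# ||(T - lam I)x||^2 = ||(T - alpha I)x||^2 + beta^2 ||x||^2
alpha, beta = 0.7, 0.3
lam = alpha + 1j * beta
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
lhs = np.linalg.norm((T - lam * I) @ x) ** 2
rhs = np.linalg.norm((T - alpha * I) @ x) ** 2 + beta**2 * np.linalg.norm(x) ** 2
assert abs(lhs - rhs) < 1e-6
```

The cross term vanishes in the identity exactly because \(\langle (T - \alpha I)x, x\rangle\) is real for self-adjoint \(T\) and real \(\alpha\), which is what forces non-real \(\lambda\) out of the spectrum.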
Connection to Quantum Mechanics
In quantum mechanics, observables (position, momentum, energy) are modeled as
self-adjoint operators on a Hilbert space. The requirement that eigenvalues be real
(property (a)) ensures that measurement outcomes are real numbers.
The orthogonality of eigenvectors (property (b)) ensures that distinct measurement
outcomes correspond to mutually exclusive states — the mathematical basis of the
Bra-Ket formalism.
Compact Operators: "Almost Finite-Dimensional"
The right shift example in From Eigenvalues to the Spectrum showed that a general
bounded operator can have a continuous spectrum with no eigenvalues at all. To recover the discrete,
diagonalizable structure of finite-dimensional linear algebra, we restrict our attention to a special
class of operators that are, in a precise sense, "close to finite-dimensional."
Definition: Compact Operator
A bounded linear operator \(T : \mathcal{X} \to \mathcal{Y}\) between normed spaces
is called compact if it maps every bounded set in \(\mathcal{X}\)
to a relatively compact (precompact) set in \(\mathcal{Y}\).
Since \(\mathcal{Y}\) is a metric space, this is
equivalent to:
for every bounded sequence \(\{x_n\}\) in \(\mathcal{X}\),
the image sequence \(\{Tx_n\}\) has a convergent subsequence in \(\mathcal{Y}\).
Recall from Intro to Functional Analysis
that in infinite dimensions, bounded sequences need not have convergent subsequences —
the closed unit ball is not compact. A compact operator "repairs" this: it
compresses the infinite-dimensional unit ball into a set that is
(relatively) compact. In other words, it "squashes away" the infinite-dimensional
directions, leaving behind something that behaves like a finite-dimensional object.
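A minimal sketch of this "squashing," using the first \(n\) standard basis vectors of \(\ell^2\) and a hypothetical diagonal compact operator \(Te_k = e_k/k\) (both choices purely illustrative):

```python
import numpy as np

n = 500
E = np.eye(n)   # columns: the first n standard basis vectors of l^2

# ||e_j - e_k|| = sqrt(2) for j != k: no subsequence of {e_n} converges,
# which is exactly why the closed unit ball of l^2 is not compact.
assert abs(np.linalg.norm(E[:, 0] - E[:, 1]) - np.sqrt(2)) < 1e-12

# A diagonal compact operator T e_k = e_k / k compresses the sequence:
# ||T e_k|| = 1/k -> 0, so the images {T e_k} converge (to 0).
weights = 1.0 / np.arange(1, n + 1)
image_norms = np.linalg.norm(np.diag(weights) @ E, axis=0)
assert np.allclose(image_norms, weights)
assert image_norms[-1] < 3e-3
```

The orthonormal directions stay a fixed distance \(\sqrt{2}\) apart, but their images collapse toward \(0\): the hallmark of compactness.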
Finite-Rank Operators
The simplest compact operators are those with finite-dimensional range.
Definition: Finite-Rank Operator
An operator \(T \in \mathcal{B}(\mathcal{X}, \mathcal{Y})\) is finite-rank
if \(\dim(\operatorname{Range}(T)) < \infty\). Every finite-rank operator is compact
(its image lies in a finite-dimensional subspace, where Heine-Borel applies).
The deep connection between compact and finite-rank operators is captured by the following
approximation property:
Theorem: Compact Operators as Limits of Finite-Rank Operators
Let \(\mathcal{H}\) be a Hilbert space. An operator \(T \in \mathcal{B}(\mathcal{H})\)
is compact if and only if there exists a sequence of finite-rank operators \(\{T_n\}\)
such that \(\|T - T_n\| \to 0\).
Proof sketch (Hilbert space case):
(\(\Leftarrow\)) Suppose \(T_n \to T\) in operator norm with each \(T_n\) finite-rank (hence compact).
Given a bounded sequence \(\{x_k\}\) with \(\|x_k\| \leq B\), iteratively extract subsequences: \(\{T_1 x_{k}\}\) has
a convergent subsequence indexed by \(k_1(j)\); within that, \(\{T_2 x_{k_1(j)}\}\) has a convergent subsequence indexed by \(k_2(j)\);
and so on. The diagonal \(y_j := x_{k_j(j)}\) satisfies: for each fixed \(n\), \(\{T_n y_j\}_{j \geq n}\) converges.
For any \(\varepsilon > 0\), choose \(n\) with \(\|T - T_n\| < \varepsilon/(3B)\); then for \(j, j'\) large enough that
\(\|T_n y_j - T_n y_{j'}\| < \varepsilon/3\),
\[
\|T y_j - T y_{j'}\| \;\leq\; \|(T - T_n)y_j\| + \|T_n y_j - T_n y_{j'}\| + \|(T_n - T)y_{j'}\| \;<\; \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} = \varepsilon.
\]
Hence \(\{Ty_j\}\) is Cauchy in \(\mathcal{H}\), and converges by completeness. Thus \(T\) is compact.
(\(\Rightarrow\)) Separability of \(\overline{\operatorname{Range}(T)}\)
(which follows from compactness, since \(\overline{\operatorname{Range}(T)} = \overline{\bigcup_n T(\overline{B}_n)}\)
is the closure of a countable union of compact — hence separable — sets) lets us pick an orthonormal basis \(\{e_n\}\) for it.
Let \(P_n\) be orthogonal projection onto \(\operatorname{span}\{e_1, \ldots, e_n\}\), and set \(T_n = P_n T\).
Each \(T_n\) is finite-rank. Since \(T(\overline{B}_\mathcal{H})\) is relatively compact, for any \(\varepsilon > 0\)
it is covered by finitely many \(\varepsilon/2\)-balls centered at points \(y_1, \ldots, y_k\).
Choose \(N\) so that \(\|y_j - P_n y_j\| < \varepsilon/2\) for all \(j\) and \(n \geq N\)
(possible because \(P_n y_j \to y_j\) for each fixed \(y_j\) in the closed span of \(\{e_n\}\)).
For \(x \in \overline{B}_\mathcal{H}\), pick \(j\) with \(\|Tx - y_j\| < \varepsilon/2\); then
\[
\|(T - T_n) x\| \;=\; \|(I - P_n) Tx\| \;\leq\; \|(I - P_n)(Tx - y_j)\| + \|(I - P_n) y_j\| \;\leq\; \|Tx - y_j\| + \|y_j - P_n y_j\| \;<\; \varepsilon,
\]
using \(\|I - P_n\| \leq 1\). Taking the supremum over \(x \in \overline{B}_\mathcal{H}\) gives \(\|T - T_n\| \leq \varepsilon\) for \(n \geq N\).
Remark. This characterization relies on the approximation property, which every Hilbert space enjoys (the proof above uses only the orthogonal projections \(P_n\)).
In general Banach spaces the analogous statement — the approximation property — can fail (Enflo, 1973).
This is the functional-analytic analogue of the truncated SVD:
a compact operator can be approximated arbitrarily well by operators with
finite-dimensional range, just as any matrix can be approximated by low-rank matrices
obtained by discarding small singular values.
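The matrix half of this analogy is easy to verify numerically: truncating an SVD at rank \(k\) leaves spectral-norm error exactly \(\sigma_{k+1}\) (the Eckart-Young theorem). A sketch with a random \(80 \times 80\) matrix as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 80))     # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)           # s is sorted in decreasing order

# Rank-k truncation: keep the top k singular triples, discard the rest
for k in [1, 5, 20]:
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]
    err = np.linalg.norm(A - A_k, 2)
    # the spectral-norm error is exactly the first discarded singular value
    assert abs(err - s[k]) < 1e-9
```

For a compact operator the same statement holds with \(\sigma_{k+1} \to 0\), which is why the finite-rank approximants \(T_k\) converge in operator norm.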
The Canonical Example: Integral Operators
Definition: Hilbert-Schmidt Integral Operators
Let \(K : [a,b] \times [a,b] \to \mathbb{C}\) be a square-integrable kernel, meaning
\[
\int_a^b \int_a^b |K(x,t)|^2 \, dx\, dt < \infty.
\]
The integral operator defined by
\[
(Tf)(x) = \int_a^b K(x, t)\, f(t)\, dt
\]
is a compact operator on \(L^2([a,b])\).
Such operators are called Hilbert-Schmidt operators, and they form
a two-sided ideal in \(\mathcal{B}(L^2)\).
The Hilbert-Schmidt norm is
\[
\|T\|_{\text{HS}} = \left(\int_a^b \int_a^b |K(x,t)|^2 \, dx\, dt\right)^{1/2},
\]
and it satisfies \(\|T\| \leq \|T\|_{\text{HS}}\)
(the operator norm is bounded by the Hilbert-Schmidt norm).
Proof sketch (compactness and norm bound):
Norm bound. For \(f \in L^2([a,b])\), Cauchy-Schwarz applied pointwise in \(x\) gives
\[
|(Tf)(x)|^2 \;=\; \left|\int_a^b K(x,t) f(t)\, dt\right|^2 \;\leq\; \left(\int_a^b |K(x,t)|^2\, dt\right) \|f\|_2^2.
\]
Integrating in \(x\) yields \(\|Tf\|_2^2 \leq \|T\|_{\text{HS}}^2 \|f\|_2^2\), i.e., \(\|T\| \leq \|T\|_{\text{HS}}\).
Compactness. Fix an orthonormal basis \(\{\varphi_i\}\) of \(L^2([a,b])\); then \(\{\varphi_i(x)\varphi_j(t)\}\)
is an orthonormal basis of \(L^2([a,b]^2)\).
Expand \(K = \sum_{i,j} c_{ij}\, \varphi_i(x)\varphi_j(t)\) and let \(K_N\) be the truncation to \(i, j \leq N\).
The corresponding operator \(T_N\) is finite-rank, and \(\|T - T_N\| \leq \|T - T_N\|_{\text{HS}} \to 0\) by Parseval.
By the finite-rank approximation theorem above, \(T\) is compact.
These integral operators are the "continuous matrices" of functional analysis.
The kernel function \(K(x,t)\) plays the role of the matrix entries \(a_{ij}\),
and the integral replaces the finite sum. When the kernel is Hermitian
(\(K(x,t) = \overline{K(t,x)}\); for a real kernel, simply \(K(x,t) = K(t,x)\)), the integral operator is self-adjoint — the
continuous analogue of a Hermitian (symmetric) matrix.
Key Properties of Compact Operators
Theorem: Properties of Compact Operators
(a) The set of compact operators \(\mathcal{K}(\mathcal{X}, \mathcal{Y})\)
is a closed subspace of \(\mathcal{B}(\mathcal{X}, \mathcal{Y})\).
(b) If \(T\) is compact and \(S\) is bounded, then both \(ST\) and \(TS\)
are compact (compact operators form a two-sided ideal in \(\mathcal{B}(\mathcal{H})\)).
(c) If \(T\) is compact, then \(T^*\) is compact (Schauder's theorem).
(d) The identity operator \(I\) on an infinite-dimensional space is
never compact (since the closed unit ball is not compact).
Proof:
(a) Linearity (closure under addition and scalar multiplication) is straightforward from the subsequence-extraction characterization.
Closedness in operator norm was proved as the (\(\Leftarrow\)) direction of
Compact Operators as Limits of Finite-Rank Operators for the Hilbert case;
the same diagonal argument (no use of finite-rank approximability) works for general normed spaces, replacing "finite-rank" by "compact" throughout:
a sequence \(\{T_n\}\) of compact operators with \(T_n \to T\) in norm and a bounded \(\{x_k\}\) admit, by iterated subsequence extraction,
a diagonal \(\{y_j\}\) on which every \(T_n\) converges; the \(3\varepsilon\) argument then gives \(\{Ty_j\}\) Cauchy, hence convergent.
(b) Let \(\{x_n\}\) be a bounded sequence in \(\mathcal{X}\).
For \(TS\): \(\{Sx_n\}\) is bounded (\(\|Sx_n\| \leq \|S\|\|x_n\|\)), so by compactness of \(T\), \((TS)x_n = T(Sx_n)\) has a convergent subsequence.
For \(ST\): by compactness of \(T\), \(\{Tx_n\}\) has a convergent subsequence \(Tx_{n_k} \to y\); by continuity of \(S\), \(S(Tx_{n_k}) \to Sy\),
so \((ST)x_n\) has a convergent subsequence. Hence both \(ST\) and \(TS\) are compact.
(c) The compactness of \(T^*\) given that of \(T\) — Schauder's theorem
— is a deeper result requiring the Arzelà-Ascoli theorem applied to the family \(\{x \mapsto \langle Tx, y\rangle\}_{\|y\|\leq 1}\)
of equicontinuous functions on a totally bounded set.
(d) If \(I\) were compact, the image of the unit ball under \(I\) — which is the unit ball itself — would be relatively compact.
But in infinite dimensions, the closed unit ball is not compact: this is exactly
the Riesz characterization of finite-dimensionality.
Property (d) highlights the key distinction: compact operators are the "well-behaved"
operators that do not include the identity — they necessarily "lose dimensions"
in the sense that their image cannot contain an open set.
Connection to Machine Learning: Kernel Matrices as Finite Approximations
In Kernel PCA,
we compute the Gram matrix \(K_{ij} = k(x_i, x_j)\) for \(N\) data points
and extract its eigenvectors. This \(N \times N\) matrix is a
finite-dimensional discretization of the underlying integral operator
\((Tf)(x) = \int k(x, t) f(t)\, d\mu(t)\) on the function space, obtained by replacing the measure \(\mu\)
with the empirical measure \(\mu_N = \tfrac{1}{N}\sum_i \delta_{x_i}\).
Mercer-type results guarantee that as \(N \to \infty\) and \(\mu_N \to \mu\)
(e.g., for i.i.d. samples by
the Glivenko-Cantelli theorem),
the eigenvalues and eigenfunctions of the
finite Gram matrix converge to those of the underlying compact integral operator — justifying why kernel methods
extract meaningful spectral structure from finite data.
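A sketch of this convergence, replacing i.i.d. samples with a quasi-uniform midpoint grid for reproducibility. The kernel \(k(s,t) = \min(s,t)\) is an illustrative choice whose integral operator has known top eigenvalue \(4/\pi^2\):

```python
import numpy as np

def top_gram_eigenvalue(N):
    # quasi-uniform "sample" on [0,1] (midpoint grid, an idealization of
    # i.i.d. uniform draws, chosen so the experiment is deterministic)
    x = (np.arange(N) + 0.5) / N
    K = np.minimum.outer(x, x)          # Gram matrix K_ij = k(x_i, x_j)
    # eigenvalues of (1/N) * Gram matrix approximate those of the integral operator
    return np.linalg.eigvalsh(K / N)[-1]

lam1_exact = 4 / np.pi**2   # top eigenvalue of (Tf)(x) = \int_0^1 min(x,t) f(t) dt
err_small = abs(top_gram_eigenvalue(100) - lam1_exact)
err_large = abs(top_gram_eigenvalue(800) - lam1_exact)
assert err_large < err_small   # refining the sample improves the approximation
assert err_large < 1e-2
```

The \(1/N\) scaling of the Gram matrix is the empirical-measure normalization \(\mu_N = \tfrac{1}{N}\sum_i \delta_{x_i}\) described above; without it, the eigenvalues grow with \(N\) instead of converging.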
The Spectral Theorem for Compact Self-Adjoint Operators
We now arrive at the central result of this chapter — the infinite-dimensional
generalization of the
Spectral Theorem for Symmetric Matrices.
Theorem: Spectral Theorem (Compact Self-Adjoint Operators)
Let \(\mathcal{H}\) be a separable Hilbert space and
\(T = T^* \in \mathcal{B}(\mathcal{H})\) a compact, self-adjoint operator. Then:
(a) There exists a finite or countably infinite orthonormal sequence
of eigenvectors \(\{e_n\}_{n=1}^N\) (\(N \leq \infty\)) with corresponding
real eigenvalues \(\{\lambda_n\}\), such that
\[
Tx = \sum_{n=1}^{N} \lambda_n \langle x, e_n \rangle \, e_n \quad \text{for all } x \in \mathcal{H}.
\]
(b) If \(N = \infty\), then \(\lambda_n \to 0\) as \(n \to \infty\).
(c) Each nonzero eigenvalue has finite multiplicity
(its eigenspace is finite-dimensional).
(d) The only possible accumulation point of \(\{\lambda_n\}\) is \(0\).
(e) The eigenvectors \(\{e_n\}\) together with an orthonormal basis for
\(\ker(T)\) form a complete orthonormal basis for \(\mathcal{H}\).
The Spectral Theorem resurrects the full power of finite-dimensional diagonalization:
a compact self-adjoint operator is completely determined by its eigenvalues and eigenvectors,
just as a symmetric matrix is determined by \(A = Q\Lambda Q^T\).
The only new feature is that the eigenvalues must decay to zero — the
"price" of being compact in infinite dimensions.
Comparison: Finite vs. Infinite Dimensions
| Property | Symmetric Matrix \(A \in \mathbb{R}^{n \times n}\) | Compact Self-Adjoint \(T \in \mathcal{B}(\mathcal{H})\) |
| --- | --- | --- |
| Eigenvalues | Finitely many (\(n\)), all real | Countably many, all real, \(\lambda_n \to 0\) |
| Eigenvectors | Orthonormal basis for \(\mathbb{R}^n\) | Span \(\overline{\operatorname{Range}(T)}\); together with a basis for \(\ker(T)\), form an orthonormal basis for \(\mathcal{H}\) |
| Decomposition | \(A = Q\Lambda Q^T = \sum_{i=1}^{n} \lambda_i \mathbf{v}_i \mathbf{v}_i^T\) | \(Tx = \sum_{n=1}^{N} \lambda_n \langle x, e_n \rangle \, e_n\) (\(N \leq \infty\)) |
| Truncation | Low-rank approximation (truncated SVD) | Finite-rank approximation (\(\lVert T - T_k\rVert \leq \lvert\lambda_{k+1}\rvert\)) |
| Zero eigenvalue | Indicates rank deficiency | Always in \(\sigma(T)\) if \(\dim(\mathcal{H}) = \infty\) |
Proof Sketch
The proof proceeds by an iterative maximization argument that parallels the
Rayleigh quotient construction from finite-dimensional linear algebra.
Proof Sketch:
Step 1: Find the first eigenvector.
Since \(T\) is self-adjoint, \(\|T\| = \sup_{\|x\|=1} |\langle Tx, x \rangle|\). By
weak compactness of the unit ball
(every Hilbert space is reflexive) and compactness of \(T\), there exists \(x_* \in \overline{B}_\mathcal{H}\) achieving
\(|\langle Tx_*, x_* \rangle| = \|T\|\).
(Since \(T\) is compact, \(x_n \rightharpoonup x\) implies \(Tx_n \to Tx\) strongly;
combined with weak convergence of \(x_n\), this gives \(\langle Tx_n, x_n \rangle \to \langle Tx, x \rangle\), so
the map \(x \mapsto \langle Tx, x \rangle\) is weakly continuous and attains its extrema on the weakly
compact unit ball.) Since \(|\langle Tx, x\rangle|\) scales quadratically with \(\|x\|\), the supremum over \(\overline{B}\)
equals the supremum over the unit sphere (assuming \(T \neq 0\)), so
we may assume \(\|x_*\| = 1\); set \(e_1 = x_*\) and \(\lambda_1 = \langle Te_1, e_1 \rangle\).
Variational argument for \(Te_1 = \lambda_1 e_1\).
For any \(v \perp e_1\) and \(t \in \mathbb{R}\), consider
\(\phi(t) = \langle T(e_1 + tv), e_1 + tv \rangle / \|e_1 + tv\|^2\).
Since \(t = 0\) is an extremum of \(|\phi|\) and \(T\) is self-adjoint, differentiating at \(t = 0\) gives
\(\phi'(0) = 2\operatorname{Re} \langle Te_1 - \lambda_1 e_1, v \rangle = 0\);
replacing \(v\) with \(iv\) over \(\mathbb{C}\) gives the imaginary part, so
\(\langle Te_1 - \lambda_1 e_1, v \rangle = 0\) for all \(v \perp e_1\).
Thus \(Te_1 - \lambda_1 e_1 \in \{e_1\}^{\perp\perp} = \operatorname{span}\{e_1\}\)
(since \(\operatorname{span}\{e_1\}\) is one-dimensional, hence closed,
and the double-orthogonal-complement of a closed subspace equals itself by the Projection Theorem); combined with
\(\langle Te_1 - \lambda_1 e_1, e_1\rangle = 0\) by construction, we conclude \(Te_1 = \lambda_1 e_1\).
Step 2: Restrict and repeat.
The subspace \(\{e_1\}^\perp\) is invariant under \(T\) (because \(T\) is self-adjoint).
The restriction \(T|_{\{e_1\}^\perp}\) is still compact and self-adjoint,
so we repeat Step 1 to find \(e_2\) with \(|\lambda_2| = \|T|_{\{e_1\}^\perp}\| \leq |\lambda_1|\).
Step 3: Convergence to zero.
The sequence \(\{|\lambda_n|\}\) is nonincreasing and bounded below by 0, so it converges.
If it converged to some \(\epsilon > 0\), the orthonormal sequence \(\{e_n\}\) would satisfy
\(\|Te_m - Te_n\|^2 = \lambda_m^2 + \lambda_n^2 \geq 2\epsilon^2\), contradicting
the compactness of \(T\) (which requires \(\{Te_n\}\) to have a convergent subsequence).
Hence \(\lambda_n \to 0\).
Closing the remaining claims.
Steps 1–3 yield (b) directly. Statement (d) follows because the construction enumerates the eigenvalues in nonincreasing order of \(|\lambda_n|\), so
any accumulation point must equal \(\lim |\lambda_n| = 0\).
For (c), if some nonzero eigenvalue \(\mu\) had infinite multiplicity, an orthonormal basis \(\{f_k\}\) of its eigenspace would give
\(\|Tf_k - Tf_l\|^2 = 2\mu^2\) for \(k \neq l\), again contradicting compactness.
For (e), the constructed \(\{e_n\}\) span \(\overline{\operatorname{Range}(T)}\): any vector \(v\) orthogonal
to all \(e_n\) satisfies
\(\|Tv\| \leq \|T|_{\{e_1,\ldots,e_n\}^\perp}\| \cdot \|v\| = |\lambda_{n+1}| \|v\| \to 0\), forcing \(Tv = 0\),
hence \(v \in \ker(T)\). Since \(\mathcal{H}\) is separable, the closed subspace \(\ker(T)\) is also separable and admits a countable orthonormal basis.
Concatenating \(\{e_n\}\) with such a basis of \(\ker(T)\) gives a complete orthonormal basis
of \(\mathcal{H} = \overline{\operatorname{Range}(T)} \oplus \ker(T)\).
The expansion in (a) then follows: writing \(x = x_R + x_K\) with \(x_R \in \overline{\operatorname{Range}(T)}\)
and \(x_K \in \ker(T)\), Parseval gives \(x_R = \sum_n \langle x, e_n\rangle e_n\),
so \(Tx = Tx_R = \sum_n \lambda_n \langle x, e_n\rangle e_n\) by linearity and continuity.
Notice that Step 1 uses the Rayleigh quotient characterization, Step 2 uses the
orthogonal complement (Hilbert space structure), and Step 3 uses compactness
to force the eigenvalues to decay. Each ingredient is essential.
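The three steps can be replayed in finite dimensions, where a random symmetric matrix stands in for \(T\) and `numpy.linalg.eigh` replaces the variational argument (an illustrative sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                  # random symmetric matrix, stand-in for T

w, V = np.linalg.eigh(A)
order = np.argsort(-np.abs(w))     # enumerate eigenvalues by decreasing |lambda|,
lam, E = w[order], V[:, order]     # as the iterative construction does

# Step 1: ||A|| is attained by the Rayleigh quotient at the top eigenvector
assert abs(abs(lam[0]) - np.linalg.norm(A, 2)) < 1e-10
assert abs(abs(E[:, 0] @ A @ E[:, 0]) - np.linalg.norm(A, 2)) < 1e-10

# Step 2: {e_1}^perp is invariant under A, and the restriction has norm |lambda_2|
P = np.eye(n) - np.outer(E[:, 0], E[:, 0])   # projection onto {e_1}^perp
assert np.linalg.norm((np.eye(n) - P) @ A @ P, 2) < 1e-10  # A maps e_1^perp into itself
assert abs(np.linalg.norm(P @ A @ P, 2) - abs(lam[1])) < 1e-10

# Step 3 analogue: the deflated norms |lambda_1| >= |lambda_2| >= ... are nonincreasing
assert np.all(np.diff(np.abs(lam)) <= 1e-12)
```

In finite dimensions the sequence \(|\lambda_n|\) simply terminates; only in infinite dimensions does Step 3 need compactness to force decay to zero.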
The Singular Value Decomposition in Infinite Dimensions
Just as the finite-dimensional SVD extends the Spectral Theorem from symmetric to general matrices,
there is an infinite-dimensional SVD for compact operators.
Theorem: Singular Value Decomposition (Compact Operators)
Let \(T \in \mathcal{B}(\mathcal{H})\) be a compact operator (not necessarily self-adjoint).
Then there exist orthonormal sequences \(\{e_n\}_{n=1}^N\) and \(\{f_n\}_{n=1}^N\) (\(N \leq \infty\)) and a nonincreasing
sequence of strictly positive reals \(\sigma_1 \geq \sigma_2 \geq \cdots > 0\)
(the singular values) such that
\[
Tx = \sum_{n=1}^{N} \sigma_n \langle x, e_n \rangle \, f_n.
\]
The singular values are the positive square roots of the nonzero eigenvalues of \(T^*T\), and \(\sigma_n \to 0\) when \(N = \infty\).
Proof sketch:
By Schauder's theorem,
\(T^*\) is compact; hence \(T^*T\) is compact, self-adjoint, and positive (\(\langle T^*Tx, x\rangle = \|Tx\|^2 \geq 0\)).
The Spectral Theorem above applied to \(T^*T\) yields its nonzero eigenvalues \(\{\sigma_n^2\}_{n=1}^N\) (\(\sigma_n^2 > 0\),
nonincreasing, \(\sigma_n^2 \to 0\) when \(N = \infty\)) with corresponding orthonormal eigenvectors \(\{e_n\}\)
spanning \(\overline{\operatorname{Range}(T^*T)}\).
Define \(\sigma_n = \sqrt{\sigma_n^2} > 0\) and \(f_n = Te_n / \sigma_n\). Direct computation gives
\[
\begin{align*}
\langle f_n, f_m\rangle &= \langle Te_n, Te_m\rangle/(\sigma_n \sigma_m) \\\\
&= \langle T^*Te_n, e_m\rangle/(\sigma_n\sigma_m) \\\\
&= \sigma_n^2\delta_{nm}/(\sigma_n\sigma_m) \\\\
&= \delta_{nm},
\end{align*}
\]
so \(\{f_n\}\) is orthonormal.
For the expansion: note that \(\ker(T^*T) = \ker(T)\), since \(T^*Tx = 0\) implies
\(\|Tx\|^2 = \langle T^*Tx, x\rangle = 0\), hence \(Tx = 0\), and conversely.
Decompose any \(x \in \mathcal{H}\) as \(x = x_R + x_K\) with
\(x_R \in \overline{\operatorname{Range}(T^*T)} = \ker(T^*T)^\perp = \ker(T)^\perp\) and \(x_K \in \ker(T)\).
By Parseval on the ON basis \(\{e_n\}\) of \(\overline{\operatorname{Range}(T^*T)}\), \(x_R = \sum_n \langle x, e_n\rangle e_n\).
Since \(Tx_K = 0\) and \(T\) is continuous,
\[
Tx \;=\; Tx_R \;=\; \sum_n \langle x, e_n\rangle Te_n \;=\; \sum_n \sigma_n \langle x, e_n\rangle f_n.
\]
This is the precise infinite-dimensional analogue of \(A = U\Sigma V^T\).
The \(\{e_n\}\) are the "right singular vectors," the \(\{f_n\}\) are the
"left singular vectors," and the \(\sigma_n\) decay to zero because \(T\) is compact.
Functional PCA & The Path to Kernels
The Spectral Theorem is not merely an abstract result — it is the mathematical engine
behind several of the most important algorithms in machine learning and statistics.
In this final section, we connect the theory to three concrete applications.
Application 1: The Karhunen-Loève Expansion (Functional PCA)
In PCA, we performed
eigendecomposition on the covariance matrix \(\Sigma \in \mathbb{R}^{n \times n}\)
of a finite-dimensional random vector. But what if our data consists of
random functions — e.g., time series, curves, or images viewed as functions
on a domain?
Let \(X(t)\) be a centered, square-integrable stochastic process on \([a,b]\), with covariance function
\[
C(s, t) = \mathbb{E}[X(s)\, \overline{X(t)}].
\]
The kernel \(C\) is Hermitian symmetric (\(C(s,t) = \overline{C(t,s)}\)) and positive semi-definite.
The most common ML setting takes \(X\) real-valued, in which case the conjugate is a no-op and \(C\) reduces to
a real symmetric kernel \(C(s,t) = \mathbb{E}[X(s) X(t)]\);
the formulation here covers both this classical case and complex-valued processes
(e.g., complex baseband signals in communications, quantum stochastic processes).
The covariance operator
\(\mathcal{C} : L^2([a,b]) \to L^2([a,b])\) is defined by
\[
(\mathcal{C}f)(s) = \int_a^b C(s, t)\, f(t)\, dt.
\]
This is a Hilbert-Schmidt integral operator with a Hermitian, positive semi-definite kernel,
so it is compact and self-adjoint. The Spectral Theorem applies directly.
Theorem: Karhunen-Loève Expansion
Let \(X(t)\) be a centered, square-integrable stochastic process on \([a,b]\) with
continuous covariance function \(C(s,t)\). Let \(\{\lambda_n, e_n\}\) be the eigenvalues
and eigenfunctions of the covariance operator \(\mathcal{C}\). Then
\[
X(t) = \sum_{n=1}^{\infty} Z_n \, e_n(t),
\]
where \(Z_n = \langle X, e_n \rangle = \int_a^b X(t)\, \overline{e_n(t)}\, dt\) are uncorrelated
random variables with \(\mathbb{E}[Z_n] = 0\) and \(\text{Var}(Z_n) = \lambda_n\).
The series converges in mean square (in \(L^2\) over the underlying probability space), uniformly in \(t \in [a,b]\).
Proof sketch:
Apply the Spectral Theorem to the compact self-adjoint covariance operator \(\mathcal{C}\) to obtain
orthonormal eigenfunctions \(\{e_n\}\) with eigenvalues \(\lambda_n \geq 0\)
(positivity from \(\langle \mathcal{C}f, f\rangle = \mathbb{E}[|\langle X, f\rangle|^2] \geq 0\),
real-valuedness of \(\lambda_n\) from self-adjointness).
Define \(Z_n = \langle X, e_n\rangle\). Centering of \(X\) gives \(\mathbb{E}[Z_n] = 0\).
By Fubini's theorem and the eigenvalue equation \(\mathcal{C}e_m = \lambda_m e_m\),
\[
\mathbb{E}[Z_n\, \overline{Z_m}] \;=\; \int_a^b \!\!\int_a^b C(s,t)\, \overline{e_n(s)}\, e_m(t)\, ds\, dt \;=\; \langle \mathcal{C}e_m, e_n\rangle \;=\; \lambda_m \langle e_m, e_n\rangle \;=\; \lambda_m\, \delta_{nm},
\]
giving uncorrelatedness and \(\text{Var}(Z_n) = \mathbb{E}[|Z_n|^2] = \lambda_n\).
For the convergence, a direct computation using \(\mathbb{E}[Z_n\, \overline{X(t)}] = \lambda_n\, \overline{e_n(t)}\) gives
\[
\mathbb{E}\Bigl[\Bigl|X(t) - \sum_{n=1}^{N} Z_n\, e_n(t)\Bigr|^2\Bigr] \;=\; C(t,t) \;-\; \sum_{n=1}^{N} \lambda_n\, |e_n(t)|^2,
\]
which tends to zero as \(N \to \infty\) because the continuous kernel \(C\) admits the uniformly convergent expansion \(C(t,t) = \sum_n \lambda_n |e_n(t)|^2\) (Mercer's expansion, previewed in Application 3 below).
For the full argument, see Hsing & Eubank, Theoretical Foundations of Functional Data Analysis, Ch. 2.
This is literally infinite-dimensional PCA. The eigenfunctions
\(\{e_n\}\) are the principal components (the directions of maximum variance),
the eigenvalues \(\lambda_n\) give the variance in each direction,
and the random coefficients \(Z_n\) are the "scores."
The decay \(\lambda_n \to 0\) guarantees that truncating after \(k\) terms
captures most of the variance — the continuous analogue of the scree plot.
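As a hedged numerical sketch (an illustrative example, not part of the text's development): standard Brownian motion on \([0,1]\) has covariance \(C(s,t) = \min(s,t)\), with known Karhunen-Loève eigenvalues \(\lambda_n = \bigl((n - \tfrac12)\pi\bigr)^{-2}\). Discretizing the covariance operator on a grid recovers them:

```python
import numpy as np

# Discretized KL for Brownian motion on [0, 1]: C(s, t) = min(s, t).
# Grid size N is an arbitrary choice; accuracy improves as N grows.
N = 500
t = (np.arange(N) + 0.5) / N                 # midpoint grid
dt = 1.0 / N
C = np.minimum.outer(t, t)                   # covariance matrix C(s_i, t_j)

# Discretized covariance operator: (Cf)(s) ~ sum_j C(s, t_j) f(t_j) dt.
evals = np.linalg.eigh(C * dt)[0][::-1]      # eigenvalues, descending

# Compare the top 5 to the analytic lambda_n = 1 / ((n - 1/2) pi)^2.
analytic = 1.0 / (((np.arange(1, 6) - 0.5) * np.pi) ** 2)
assert np.allclose(evals[:5], analytic, rtol=1e-2)
```

The corresponding eigenvectors approximate the eigenfunctions \(\sqrt{2}\,\sin\bigl((n - \tfrac12)\pi t\bigr)\); truncating the expansion after a few modes is exactly the functional-PCA compression described above.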
Application 2: Spectral Clustering via the Continuous Laplacian
In Spectral Clustering,
we used eigenvectors of the discrete graph Laplacian \(\mathbf{L} = \mathbf{D} - \mathbf{W}\)
to embed data into a low-dimensional space and then applied K-means. The continuous
analogue replaces the graph with a smooth domain and the discrete Laplacian
with the Laplace-Beltrami operator on a manifold.
On a compact Riemannian manifold (or on a bounded domain with appropriate boundary conditions),
the negative Laplace-Beltrami operator \(-\Delta\)
is a positive self-adjoint unbounded operator with compact resolvent \((-\Delta + I)^{-1}\).
Although \(-\Delta\) itself is unbounded — outside the scope of the present chapter's spectral theorem — applying the Spectral Theorem
to its compact self-adjoint resolvent yields an orthonormal basis of eigenfunctions \(\{e_n\}\) for \(L^2\) on the manifold; these are
simultaneously eigenfunctions of \(-\Delta\), with eigenvalues
\(0 = \lambda_0 \leq \lambda_1 \leq \cdots \to \infty\).
The first few eigenfunctions — the "low-frequency modes" — capture the large-scale
geometry of the manifold, precisely as the Fiedler vector captures the main partition
of a graph in the discrete case.
From Graphs to Manifolds: The Convergence
As the number of data points \(N \to \infty\) and the graph becomes denser,
the discrete graph Laplacian converges (in a suitable sense) to the
Laplace-Beltrami operator on the underlying data manifold.
The eigenvectors of the graph Laplacian converge to the eigenfunctions of
the continuous operator. This is the rigorous justification for spectral clustering:
the discrete algorithm is a finite-sample approximation of a well-defined
continuous operation, and the Spectral Theorem guarantees that the continuous
object has the orthogonal eigenfunction structure we need.
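This convergence can be glimpsed in the simplest possible example: the Laplacian of a path graph on \(N\) nodes is a discretization of \(-d^2/dx^2\) on \([0,1]\) with Neumann boundary conditions, and its eigenvectors are exactly sampled cosine modes. A minimal NumPy sketch (grid size and number of modes checked are arbitrary choices):

```python
import numpy as np

# Path graph on N nodes: adjacency W has ones on the off-diagonals,
# and L = D - W discretizes -d^2/dx^2 with Neumann boundary conditions.
N = 200
W = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
L = np.diag(W.sum(axis=1)) - W

evals, evecs = np.linalg.eigh(L)

# Exact eigenvalues: 4 sin^2(k pi / (2N)) ~ (k pi / N)^2 for small k,
# mirroring the continuous spectrum lambda_k = (k pi)^2 after rescaling.
k = np.arange(4)
assert np.allclose(evals[:4], 4 * np.sin(k * np.pi / (2 * N)) ** 2)

# The k-th eigenvector is (up to sign) the sampled cosine mode cos(k pi x).
x = (np.arange(N) + 0.5) / N
mode1 = np.cos(np.pi * x)
mode1 /= np.linalg.norm(mode1)
assert min(np.linalg.norm(evecs[:, 1] - mode1),
           np.linalg.norm(evecs[:, 1] + mode1)) < 1e-6
```

The second eigenvector (the Fiedler vector) is the sampled first cosine mode, which changes sign once across the domain: the one-dimensional shadow of the partition structure used in spectral clustering.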
Application 3: The Bridge to Mercer's Theorem and RKHS
Consider a continuous, Hermitian (\(K(x,t) = \overline{K(t,x)}\)), positive semi-definite kernel
\(K : [a,b] \times [a,b] \to \mathbb{C}\). The integral operator
\[
(T_K f)(x) = \int_a^b K(x, t)\, f(t)\, dt
\]
is compact, self-adjoint, and positive (all eigenvalues \(\lambda_n \geq 0\)).
The Spectral Theorem yields the eigenfunction expansion
\[
K(x, t) = \sum_{n=1}^{\infty} \lambda_n \, e_n(x)\, \overline{e_n(t)},
\]
convergent in \(L^2\); upgrading this to absolute and uniform convergence requires the positivity of the kernel and is exactly what Mercer's Theorem provides. The most common ML setting uses real symmetric kernels
(Gaussian, polynomial, Laplacian, Matérn, etc.);
in that case the conjugate is a no-op and the formula reduces to \(K(x,t) = \sum_n \lambda_n e_n(x) e_n(t)\).
This eigenfunction expansion has a profound consequence: the kernel \(K\) defines
an implicit feature map
\[
\Phi(x) = \bigl(\sqrt{\lambda_1}\, e_1(x),\; \sqrt{\lambda_2}\, e_2(x),\; \ldots\bigr) \in \ell^2,
\]
such that \(K(x, t) = \langle \Phi(x), \Phi(t) \rangle_{\ell^2}\). (With the inner product linear in the first argument and conjugate-linear in the second, the Hermitian conjugate \(\overline{e_n(t)}\) appears automatically; in the real case both sides reduce to \(K(x,t) = \sum_n \lambda_n e_n(x) e_n(t)\).)
The kernel "trick" — computing inner products in a high-dimensional feature space
without ever forming \(\Phi\) explicitly — is mathematically grounded in this
spectral decomposition.
This is the content of Mercer's Theorem, which we will state and prove
rigorously in the next chapter (RKHS & Kernel Methods).
The Spectral Theorem of the present chapter provides the engine; Mercer's Theorem adds the fuel — positivity of the kernel —
to build the Reproducing Kernel Hilbert Space.
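As a finite-sample sanity check (illustrative only; the Gaussian kernel width is an arbitrary choice), the eigendecomposition of a kernel Gram matrix furnishes an explicit feature map whose inner products reproduce the kernel, a discrete mirror of \(\Phi\) above:

```python
import numpy as np

# Gram matrix of a real symmetric Gaussian kernel on a grid of points.
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

# Eigendecompose: K = sum_n lambda_n v_n v_n^T, all lambda_n >= 0 (PSD).
evals, evecs = np.linalg.eigh(K)
evals = np.clip(evals, 0.0, None)            # clip tiny negative round-off

# Feature map: row i of Phi is Phi(x_i) = (sqrt(lambda_n) v_n[i])_n.
Phi = evecs * np.sqrt(evals)

# Inner products in feature space reproduce the kernel exactly.
assert np.allclose(Phi @ Phi.T, K, atol=1e-8)
```

The eigenvalues of such a Gram matrix decay rapidly, so a few leading features already reconstruct \(K\) accurately; this is the finite-sample face of the \(\lambda_n \to 0\) decay that makes the kernel trick computationally viable.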
Summary: The Spectral Thread
Let us trace the spectral idea through our entire curriculum:
- Finite-dimensional
(Symmetry & SVD):
\(A = Q\Lambda Q^T\). Finitely many eigenvalues, an orthonormal eigenbasis of \(\mathbb{R}^n\). Powers PCA, SVD.
- Discrete infinite
(Graph Laplacians):
\(\mathbf{L} = \mathbf{D} - \mathbf{W}\). Discrete eigenvalues of growing matrices.
Powers Spectral Clustering, Graph Fourier Transform.
- Continuous infinite (this chapter):
\(Tx = \sum \lambda_n \langle x, e_n \rangle e_n\). Countable eigenvalues decaying to 0.
Powers Functional PCA, continuous Laplacians.
- Next:
Mercer's Theorem converts the spectral decomposition of a positive kernel
into an explicit feature space — the mathematical foundation of
SVMs, Gaussian Processes, and Kernel PCA.