The Heat Equation

Introduction · Heat Equation on a Bounded Interval · Heat Equation on the Real Line · Properties of the Heat Kernel

Introduction

With the Fourier series and the Fourier transform in hand, the toolkit of Fourier analysis can appear, from a computer science perspective, essentially complete. Orthogonal decomposition, Plancherel's identity, convolution as pointwise multiplication, and the FFT cover what most CS curricula treat as the entirety of the subject — Fourier as the engine of signal processing, image compression, and spectral algorithms. Yet a question implicit in our earlier development of Fourier series remains unanswered: what was Fourier analysis invented for? The present page returns to that original purpose. We use the Fourier machinery already built to solve a partial differential equation.

Until recently, partial differential equations sat outside the CS mainstream. PDEs were the province of physics and applied mathematics — the heat equation belonged to thermodynamics, the wave equation to acoustics and electromagnetism, the Laplace equation to electrostatics and fluid flow. A CS student might encounter a discretized PDE in a numerical methods elective, but the continuous theory itself rarely entered the curriculum. This perception has been overturned by developments of the past several years. The forward process of a diffusion generative model — the mechanism underlying contemporary image and video synthesis — is in continuous time a stochastic diffusion whose density evolution is governed by the heat equation and its drift-augmented relatives. Neural ODEs, Fourier neural operators, flow matching, and score-based generative modelling have placed continuous-time differential equations at the centre of contemporary machine learning. The continuous domain, long dismissed as a physicist's concern, has become impossible for the ML researcher to ignore.

The heat equation \[ \partial_t u(x, t) = k\, \partial_{xx} u(x, t) \] is the canonical example of a parabolic partial differential equation, where \(u(x, t)\) represents temperature (or, abstractly, the density of a diffusing quantity) at position \(x\) and time \(t\), and \(k > 0\) is a constant called the diffusivity. Two distinct motivations drive its study here. Mathematically, the heat equation is the parabolic representative of the classical trichotomy — parabolic, hyperbolic, elliptic — whose other two representatives, the wave equation and the Laplace equation, will be developed in subsequent pages. In application, it underlies the Gaussian forward corruption of diffusion models, the graph heat kernel of spectral graph neural networks, and the Gaussian scale-space of classical image processing. The treatment is one-dimensional and classical: separation of variables on a bounded interval, the Fourier transform on the real line, the heat kernel and its qualitative properties. Higher dimensions, weak solutions in Sobolev spaces, semigroup theory, and the stochastic interpretation via Brownian motion belong to more advanced treatments.

Heat Equation on a Bounded Interval

Heat conduction was the very problem that motivated Joseph Fourier to introduce trigonometric series in his 1822 Théorie analytique de la chaleur. We begin with the prototype on which the entire theory rests: temperature evolution in a one-dimensional rod of length \(L\) whose two ends are held at zero temperature. The mathematical formulation is the initial-boundary value problem \[ \begin{cases} \partial_t u = k\, \partial_{xx} u, & 0 < x < L,\ t > 0, \\ u(0, t) = u(L, t) = 0, & t \geq 0, \\ u(x, 0) = f(x), & 0 \leq x \leq L, \end{cases} \] where \(f\) is a prescribed initial temperature profile. The boundary conditions \(u(0,t) = u(L,t) = 0\) are the homogeneous Dirichlet conditions, corresponding physically to the rod ends being held in contact with a thermal reservoir at temperature zero. Our goal is an explicit solution formula that exposes both how the temperature evolves and how that evolution reflects the spectral structure of the spatial differential operator \(\partial_{xx}\) on \([0, L]\).

The strategy proceeds in three stages: separation of variables reduces the PDE to two ordinary differential equations linked by a separation constant; the spatial ODE together with the boundary conditions becomes an eigenvalue problem whose solutions are the standing waves of the rod; superposition of these standing waves, weighted by the Fourier sine coefficients of \(f\), gives the complete solution. The method originates with Daniel Bernoulli in the eighteenth century and was systematized by Fourier; its scope extends far beyond the heat equation, underpinning much of the linear theory of partial differential equations on bounded domains.

Separation of Variables

We seek solutions in the product form \[ u(x, t) = \phi(x)\, G(t), \] setting aside the initial condition for the moment and demanding only that \(u\) satisfy the PDE and the homogeneous boundary conditions. Substituting into the heat equation and computing the relevant partial derivatives gives \(\phi(x) G'(t) = k\, \phi''(x) G(t)\). Dividing both sides by \(k\, \phi(x)\, G(t)\) — legitimate wherever neither factor vanishes — separates the variables: \[ \frac{G'(t)}{k\, G(t)} = \frac{\phi''(x)}{\phi(x)}. \] The left-hand side depends only on \(t\), the right-hand side only on \(x\). Two functions of independent variables can agree everywhere only if both are equal to a common constant, which we denote \(-\lambda\): \[ \frac{G'(t)}{k\, G(t)} = \frac{\phi''(x)}{\phi(x)} = -\lambda. \] The minus sign is a convention chosen with foresight: we shall find that the physically relevant values of \(\lambda\) are positive, in which case \(G(t)\) decays rather than grows. The separation equation splits into two ordinary differential equations, \[ \begin{align*} \phi''(x) + \lambda\, \phi(x) &\;=\; 0, \\\\ G'(t) + \lambda k\, G(t) &\;=\; 0, \end{align*} \] coupled only through the shared constant \(\lambda\). The homogeneous boundary conditions \(u(0, t) = u(L, t) = 0\) translate into conditions on \(\phi\) alone: requiring \(\phi(0)\, G(t) = \phi(L)\, G(t) = 0\) for all \(t\) and rejecting the trivial possibility \(G \equiv 0\) (which gives \(u \equiv 0\), of no interest), we obtain \[ \phi(0) = \phi(L) = 0. \]

The Eigenvalue Problem

The spatial problem \[ \begin{align*} \phi''(x) + \lambda\, \phi(x) &\;=\; 0, \\\\ \phi(0) = \phi(L) &\;=\; 0, \end{align*} \] is a boundary value problem: two conditions are imposed at two distinct points of the interval, not at a single point, which makes it fundamentally different in character from the initial value problems familiar from a first course in ordinary differential equations. The trivial function \(\phi \equiv 0\) satisfies the ODE and both boundary conditions regardless of \(\lambda\); the question is whether nontrivial solutions exist, and if so for which values of \(\lambda\).

Definition: Eigenvalues and Eigenfunctions of the Dirichlet Laplacian on \([0, L]\)

A real number \(\lambda\) is an eigenvalue of the boundary value problem \[ \begin{align*} \phi''(x) + \lambda\, \phi(x) &\;=\; 0, \\\\ \phi(0) = \phi(L) &\;=\; 0 \end{align*} \] if there exists a nontrivial solution \(\phi \not\equiv 0\). Any such \(\phi\) is called an eigenfunction corresponding to \(\lambda\). The terminology arises because the problem can be rewritten as \(-\phi'' = \lambda\, \phi\), expressing \(\phi\) as an eigenvector of the differential operator \(-d^2/dx^2\) acting on twice-differentiable functions on \([0, L]\) that vanish at both endpoints.

To find the eigenvalues, we solve the ODE directly. The characteristic equation \(r^2 + \lambda = 0\) has roots \(r = \pm\sqrt{-\lambda}\), and the structure of these roots — real, zero, or purely imaginary — depends on the sign of \(\lambda\). A complete analysis must consider three cases.

Case 1: \(\lambda < 0\). Writing \(\lambda = -s\) with \(s > 0\), the characteristic roots are \(r = \pm\sqrt{s}\), and the general solution is \(\phi(x) = c_1 \cosh\sqrt{s}\, x + c_2 \sinh\sqrt{s}\, x\). The condition \(\phi(0) = 0\) forces \(c_1 = 0\), leaving \(\phi(x) = c_2 \sinh\sqrt{s}\, x\). The condition \(\phi(L) = 0\) requires \(c_2 \sinh\sqrt{s}\, L = 0\). Since \(\sinh\) vanishes only at zero and \(\sqrt{s}\, L > 0\), we must have \(c_2 = 0\), giving only the trivial solution. No negative eigenvalue exists.

Case 2: \(\lambda = 0\). The ODE reduces to \(\phi'' = 0\) with general solution \(\phi(x) = c_1 + c_2 x\). The boundary conditions \(\phi(0) = c_1 = 0\) and \(\phi(L) = c_2 L = 0\) (with \(L > 0\)) force both constants to vanish. \(\lambda = 0\) is not an eigenvalue.

Case 3: \(\lambda > 0\). The characteristic roots are purely imaginary, \(r = \pm i\sqrt{\lambda}\), and the general solution in real form is \[ \phi(x) = c_1 \cos\sqrt{\lambda}\, x + c_2 \sin\sqrt{\lambda}\, x. \] The condition \(\phi(0) = 0\) again forces \(c_1 = 0\). The remaining condition \(\phi(L) = c_2 \sin\sqrt{\lambda}\, L = 0\) admits nontrivial solutions \((c_2 \neq 0)\) precisely when \(\sin\sqrt{\lambda}\, L = 0\), that is, when \(\sqrt{\lambda}\, L = n\pi\) for some positive integer \(n\). (The value \(n = 0\) is excluded because it gives \(\lambda = 0\), already handled; a negative integer \(n = -m\) with \(m \geq 1\) yields the same eigenvalue \(\lambda_m\) and, since \(\sin(-m\pi x/L) = -\sin(m\pi x/L)\), the same eigenfunction up to sign.)

Theorem: Eigenvalues and Eigenfunctions of the Dirichlet Laplacian on \([0, L]\)

The eigenvalues and corresponding eigenfunctions of the boundary value problem \[ \begin{align*} \phi''(x) + \lambda\, \phi(x) &\;=\; 0, \\\\ \phi(0) = \phi(L) &\;=\; 0 \end{align*} \] are \[ \begin{align*} \lambda_n &\;=\; \left(\frac{n\pi}{L}\right)^{\!2}, \\\\ \phi_n(x) &\;=\; \sin\frac{n\pi x}{L}, \end{align*} \] for \(n = 1, 2, 3, \ldots\). The eigenvalues form a strictly increasing sequence \(0 < \lambda_1 < \lambda_2 < \cdots\) tending to infinity, with \(\lambda_n = O(n^2)\). Each eigenvalue is simple: its eigenspace is one-dimensional, spanned by \(\phi_n\) up to scalar multiplication.

Proof:

The case analysis above establishes that nontrivial real-valued solutions exist only when \(\lambda > 0\) and \(\sqrt{\lambda}\, L = n\pi\) for some positive integer \(n\), giving \(\lambda_n = (n\pi/L)^2\). Each eigenspace is one-dimensional, spanned by \(\phi_n(x) = \sin(n\pi x/L)\): any solution of the ODE with eigenvalue \(\lambda_n\) is a linear combination of \(\sin\sqrt{\lambda_n}\, x\) and \(\cos\sqrt{\lambda_n}\, x\), and the boundary condition at \(x = 0\) eliminates the cosine component, while \(\phi(L) = 0\) is automatically satisfied by \(\sin\sqrt{\lambda_n}\, x\). The sequence is strictly increasing because \(\lambda_{n+1} - \lambda_n = (2n + 1)\pi^2/L^2 > 0\), and the asymptotic \(\lambda_n = O(n^2)\) is immediate from \(\lambda_n/n^2 = \pi^2/L^2\). Restricting attention to real \(\lambda\) is harmless: the operator \(-d^2/dx^2\) with these boundary conditions is symmetric on the appropriate function space, so all of its eigenvalues are real — a fact whose general formulation belongs to the spectral theory of self-adjoint operators.
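The spectrum can also be checked numerically: the standard second-difference discretization of \(-d^2/dx^2\) on \([0, L]\) with Dirichlet conditions is a tridiagonal matrix whose lowest eigenvalues approximate \((n\pi/L)^2\). A minimal sketch (the grid size \(N = 500\) is an arbitrary choice):

```python
import numpy as np

# Discretize -d^2/dx^2 on [0, L] with Dirichlet boundary conditions:
# N interior grid points, spacing h, tridiagonal matrix (2, -1, -1)/h^2.
L = 1.0
N = 500
h = L / (N + 1)
A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2

evals = np.sort(np.linalg.eigvalsh(A))
exact = (np.arange(1, 6) * np.pi / L) ** 2

# The lowest discrete eigenvalues approximate (n*pi/L)^2.
print(evals[:5])
print(exact)
```

The discretization error for the \(n\)th eigenvalue scales like \((n\pi h)^2\), so the low modes are recovered to several digits already on a modest grid.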

Insight: Spectral Geometry of an Interval

The eigenfunctions \(\phi_n(x) = \sin(n\pi x/L)\) are the standing waves of a rod fixed at both ends — the same modes that govern a vibrating string of length \(L\) with fixed endpoints. The \(n\)th mode has \(n - 1\) internal nodes (zeros) and completes \(n/2\) full periods along the rod. The eigenvalue \(\lambda_n = (n\pi/L)^2\) measures the "spatial frequency squared" of the mode: higher modes oscillate more rapidly and, as we shall see, decay more rapidly in time. The asymptotic \(\lambda_n \sim n^2\) is no accident; it reflects the fact that the Laplacian on a one-dimensional interval is a second-order operator, and Weyl's law in higher dimensions generalizes this scaling: on a \(d\)-dimensional domain of volume \(V\), the \(n\)th Dirichlet eigenvalue scales like \(\lambda_n \sim (n/V)^{2/d}\). The spectral structure of a domain encodes deep geometric information — a theme that connects to the spectral analysis of graph Laplacians in spectral graph theory and to the manifold theory that lies ahead in subsequent pages.

Time Evolution and Product Solutions

With the eigenvalues in hand, the time-dependent ODE \(G'(t) + \lambda_n k\, G(t) = 0\) associated with the \(n\)th mode is a first-order linear equation with constant coefficient \(\lambda_n k\), whose general solution is the exponential \[ \begin{align*} G_n(t) &\;=\; e^{-\lambda_n k\, t} \\\\ &\;=\; \exp\!\left(-k\!\left(\tfrac{n\pi}{L}\right)^{\!2} t\right), \end{align*} \] chosen with the normalisation \(G_n(0) = 1\) so that the multiplicative constant is absorbed into the spatial part at the time of superposition. Multiplying the spatial eigenfunction by its time evolution gives the \(n\)th product solution or fundamental mode: \[ u_n(x, t) = \sin\frac{n\pi x}{L}\, \exp\!\left(-k\!\left(\tfrac{n\pi}{L}\right)^{\!2} t\right). \]

Each \(u_n\) is a solution of the PDE \(\partial_t u = k\, \partial_{xx} u\) satisfying the homogeneous Dirichlet boundary conditions. Three features stand out. First, every mode decays exponentially in time: this is the hallmark of parabolic behaviour and distinguishes the heat equation sharply from the wave equation, whose modes oscillate without decay. Second, higher modes decay faster: the decay rate \(k\lambda_n = k(n\pi/L)^2\) grows quadratically in \(n\), so spatial features of high frequency are damped on a much shorter timescale than coarse features. The thermal diffusion time of the \(n\)th mode is \(\tau_n = 1/(k\lambda_n) = L^2/(k n^2 \pi^2)\). Third, the spatial profile is preserved: each \(u_n\) factors as a pure sine wave times a time-dependent amplitude, so the shape \(\sin(n\pi x/L)\) is fixed and only its amplitude shrinks.
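That each product mode solves the PDE and the boundary conditions can be confirmed symbolically; a sketch using SymPy:

```python
import sympy as sp

# Verify that u_n(x, t) = sin(n*pi*x/L) * exp(-k (n*pi/L)^2 t) satisfies
# the heat equation and the homogeneous Dirichlet boundary conditions.
x, t, k, L = sp.symbols("x t k L", positive=True)
n = sp.symbols("n", integer=True, positive=True)

u = sp.sin(n * sp.pi * x / L) * sp.exp(-k * (n * sp.pi / L) ** 2 * t)
residual = sp.simplify(sp.diff(u, t) - k * sp.diff(u, x, 2))

print(residual)      # 0: the PDE holds identically
print(u.subs(x, 0))  # 0: boundary condition at x = 0
print(u.subs(x, L))  # 0: boundary condition at x = L, since sin(n*pi) = 0
```

The check works for symbolic \(n\), \(k\), and \(L\) at once, which is exactly the content of the statement: every mode, for every diffusivity and rod length, is a solution.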

Superposition and the Series Solution

The heat equation is linear and homogeneous, and the boundary conditions are also homogeneous. Consequently any finite linear combination of product solutions is again a solution: \[ \sum_{n=1}^{N} B_n\, u_n(x, t) = \sum_{n=1}^{N} B_n \sin\frac{n\pi x}{L}\, \exp\!\left(-k\!\left(\tfrac{n\pi}{L}\right)^{\!2} t\right) \] satisfies the PDE and the Dirichlet boundary conditions for any choice of coefficients \(B_1, \ldots, B_N\). The principle of superposition extends this to infinite series, provided the series converges in an appropriate sense: termwise differentiation in both \(x\) and \(t\) is justified for \(t > 0\) because the exponential factors \(\exp(-k(n\pi/L)^2 t)\) decay so rapidly in \(n\) that the differentiated series converges uniformly on every closed half-strip \(0 \leq x \leq L\), \(t \geq t_0 > 0\).

To match the initial condition \(u(x, 0) = f(x)\), evaluate the formal infinite series at \(t = 0\): the exponential factors collapse to unity, leaving \[ f(x) = \sum_{n=1}^{\infty} B_n \sin\frac{n\pi x}{L}, \qquad 0 \leq x \leq L. \] This is precisely a Fourier sine series expansion of \(f\) on \([0, L]\). The orthogonality of the trigonometric system gives the coefficient formula immediately: multiplying both sides by \(\sin(m\pi x/L)\) and integrating over \([0, L]\) selects the term with \(n = m\), using \[ \int_0^L \sin\frac{n\pi x}{L} \sin\frac{m\pi x}{L}\, dx = \begin{cases} L/2, & m = n, \\ 0, & m \neq n. \end{cases} \] The resulting coefficients are the Fourier sine coefficients of \(f\). We collect these observations into a single statement.

Theorem: Series Solution of the Heat Equation on a Bounded Interval

Let \(f \in L^2([0, L])\) be the initial temperature profile. The initial-boundary value problem \[ \begin{cases} \partial_t u = k\, \partial_{xx} u, & 0 < x < L,\ t > 0, \\ u(0, t) = u(L, t) = 0, & t \geq 0, \\ u(x, 0) = f(x), & 0 \leq x \leq L \end{cases} \] admits the solution \[ u(x, t) = \sum_{n=1}^{\infty} B_n \sin\frac{n\pi x}{L}\, \exp\!\left(-k\!\left(\tfrac{n\pi}{L}\right)^{\!2} t\right), \] where the coefficients are the Fourier sine coefficients of the initial datum, \[ B_n = \frac{2}{L}\int_0^L f(x) \sin\frac{n\pi x}{L}\, dx. \] The series converges in \(L^2([0, L])\) for every \(t \geq 0\) and, for \(t > 0\), converges absolutely and uniformly on \([0, L]\) together with all of its \(x\)-derivatives and \(t\)-derivatives.

Proof:

Termwise the series satisfies the PDE and the Dirichlet boundary conditions, as each product solution \(u_n\) does. The convergence claims are a consequence of the rapid decay of the exponential factors. For \(t \geq t_0 > 0\), the bound \(|B_n| \leq \|f\|_{L^2}\sqrt{2/L}\) (which follows from Bessel's inequality) combined with \(\exp(-k(n\pi/L)^2 t_0)\) decaying faster than any inverse polynomial in \(n\) gives absolute and uniform convergence of the series and of all its termwise derivatives; consequently \(u\) is \(C^\infty\) in \((x, t)\) for \(t > 0\), differentiation can be carried out under the summation, and \(u\) satisfies the PDE classically. At \(t = 0\), the series reduces to the Fourier sine series of \(f\), which converges to \(f\) in \(L^2([0, L])\) because \(\{\sqrt{2/L}\sin(n\pi x/L)\}_{n \geq 1}\) is a complete orthonormal system on \(L^2([0, L])\), as established in our treatment of Fourier series. The \(L^2\)-continuity \(u(\cdot, t) \to f\) as \(t \to 0^+\) follows from Parseval's identity: \[ \|u(\cdot, t) - f\|_{L^2}^2 \;=\; \frac{L}{2}\sum_{n=1}^{\infty} B_n^2 \left(e^{-k\lambda_n t} - 1\right)^2, \] which tends to \(0\) as \(t \to 0^+\) by dominated convergence — each term vanishes pointwise, and is bounded uniformly in \(t \geq 0\) by \(B_n^2\) (since \(0 \leq e^{-k\lambda_n t} \leq 1\)) with \((L/2)\sum_n B_n^2 = \|f\|_{L^2}^2 < \infty\).

The theorem asserts existence. Uniqueness within the appropriate function class holds as well, but it is genuinely a separate result whose proof requires machinery beyond the present scope (a standard tool is the energy estimate \(\frac{d}{dt}\!\int_0^L u(x,t)^2\, dx \leq 0\), valid for any sufficiently regular solution with the given boundary conditions). We use uniqueness as a fact when it is needed.

The solution formula has a clean structure: each Fourier mode of the initial datum evolves independently, weighted by an exponential decay whose rate is the eigenvalue times the diffusivity. The Laplacian \(\partial_{xx}\) on \([0, L]\) with Dirichlet boundary conditions is diagonalized by the orthonormal basis \(\{\sqrt{2/L}\sin(n\pi x/L)\}\), and the heat semigroup \(e^{tk\partial_{xx}}\) acts on this basis by multiplication by \(e^{-tk\lambda_n}\). This is the prototypical example of functional calculus for self-adjoint operators: an abstract framework we touched on in Fourier analysis in Hilbert spaces and one that generalizes to elliptic operators on manifolds, graph Laplacians in spectral graph theory, and ultimately to the spectral theorem itself.
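The semigroup structure is easy to exhibit in coefficient space: because \(e^{-k\lambda_n t}\, e^{-k\lambda_n s} = e^{-k\lambda_n (t+s)}\), evolving the sine coefficients for time \(t\) and then \(s\) must coincide with evolving for \(t + s\). A small numerical sketch (the diffusivity, mode count, and coefficients are arbitrary choices):

```python
import numpy as np

# Spectral heat evolution on [0, 1]: multiply the n-th sine coefficient
# by exp(-k * lambda_n * t), with lambda_n = (n*pi)^2.
k = 0.5
lam = (np.arange(1, 51) * np.pi) ** 2

def evolve(B, t):
    return B * np.exp(-k * lam * t)

rng = np.random.default_rng(0)
B = rng.normal(size=50)                       # arbitrary sine coefficients

two_step = evolve(evolve(B, 0.1), 0.25)       # evolve by t, then by s
one_step = evolve(B, 0.35)                    # evolve by t + s directly
print(np.max(np.abs(two_step - one_step)))    # agrees to machine precision
```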

A Worked Example

Example: Parabolic Initial Profile

Take \(L = 1\) and the initial datum \(f(x) = x(1 - x)\), a parabolic bump vanishing at both endpoints and reaching its maximum \(1/4\) at the midpoint \(x = 1/2\). The Fourier sine coefficients are computed by direct integration: \[ B_n = 2\int_0^1 x(1 - x) \sin(n\pi x)\, dx. \] Two integrations by parts, or the standard tabulated identities, give \[ \begin{align*} \int_0^1 x \sin(n\pi x)\, dx &\;=\; \frac{(-1)^{n+1}}{n\pi}, \\\\ \int_0^1 x^2 \sin(n\pi x)\, dx &\;=\; \frac{(-1)^{n+1}}{n\pi} - \frac{2[1 - (-1)^n]}{(n\pi)^3}. \end{align*} \] Subtracting yields \[ \begin{align*} B_n &\;=\; 2\!\left(\frac{(-1)^{n+1}}{n\pi} - \left[\frac{(-1)^{n+1}}{n\pi} - \frac{2[1-(-1)^n]}{(n\pi)^3}\right]\right) \\\\ &\;=\; \frac{4[1 - (-1)^n]}{(n\pi)^3}. \end{align*} \] Since \(1 - (-1)^n\) equals \(2\) when \(n\) is odd and \(0\) when \(n\) is even, only odd modes contribute, and writing \(n = 2j + 1\), \[ \begin{align*} B_{2j+1} &\;=\; \frac{8}{(2j+1)^3 \pi^3}, \\\\ B_{2j} &\;=\; 0. \end{align*} \] The solution is therefore \[ u(x, t) = \frac{8}{\pi^3} \sum_{j=0}^{\infty} \frac{1}{(2j+1)^3}\, \sin((2j+1)\pi x)\, \exp\!\left(-k (2j+1)^2 \pi^2 t\right). \] Three qualitative features illustrate the general structure. The vanishing of even-mode coefficients reflects the spatial symmetry of \(f\) about \(x = 1/2\): the initial profile is even with respect to that midpoint, and the even-indexed eigenfunctions \(\sin(2j\pi x)\) are odd about it, hence orthogonal. The coefficients decay as \(1/(2j+1)^3\), an algebraic rate set by the smoothness of \(f\) and its compatibility with the boundary conditions; smoother initial data produce more rapidly decaying coefficients. Finally, the time exponential \(\exp(-k(2j+1)^2 \pi^2 t)\) damps high modes far faster than the fundamental \((j = 0)\) mode \(\sin(\pi x) e^{-k\pi^2 t}\): for any \(t > 0\) of moderate size, the solution is well approximated by its first nonzero term and resembles a sine arch whose amplitude shrinks exponentially. 
The parabolic initial profile relaxes, on the timescale \(1/(k\pi^2)\), to a sine arch and thence to the zero equilibrium.
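The closed-form coefficients can be cross-checked by numerical quadrature; a short sketch using a trapezoid rule on a fine grid:

```python
import numpy as np

# Fourier sine coefficients of f(x) = x(1 - x) on [0, 1]:
# expected B_n = 8 / (n*pi)^3 for odd n, and 0 for even n.
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
f = x * (1.0 - x)

def sine_coeff(n):
    # B_n = 2 * integral_0^1 f(x) sin(n*pi*x) dx, via the trapezoid rule
    g = f * np.sin(n * np.pi * x)
    return 2.0 * (np.sum(g) - 0.5 * (g[0] + g[-1])) * dx

for n in range(1, 7):
    exact = 8.0 / (n * np.pi) ** 3 if n % 2 == 1 else 0.0
    print(n, sine_coeff(n), exact)
```

The even coefficients vanish to quadrature accuracy, reflecting the symmetry argument in the example.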

Insight: From Bounded Interval to Real Line

The series formula above is the complete solution of the heat equation on \([0, L]\) with Dirichlet boundary conditions — every initial profile decomposes into Dirichlet eigenmodes, each evolving independently by exponential damping with rate \(k(n\pi/L)^2\). The structure is rigid: a countable basis, a discrete spectrum, a sum. The next section removes the bounded interval and asks what happens on the entire real line, where the boundary is sent to infinity. The Dirichlet eigenmodes \(\sin(n\pi x/L)\) lose their normalisation as \(L \to \infty\), and the discrete spectrum thickens into a continuum. The replacement of the Fourier series by the Fourier transform — already developed in our preceding page — is precisely what is needed, and the heat equation on \(\mathbb{R}\) will be solved by a single integral kernel.

Heat Equation on the Real Line

Replacing the rod \([0, L]\) by the full line \(\mathbb{R}\) changes the analytic situation in a structurally important way. On a bounded interval, the Dirichlet eigenfunctions \(\sin(n\pi x/L)\) provide a countable orthonormal basis and the solution is a discrete sum over normal modes. On the real line, no such basis of eigenfunctions of \(-\partial_{xx}\) exists in \(L^2(\mathbb{R})\); the formal eigenfunctions \(e^{i\xi x}\) are bounded but not square-integrable, and the "expansion in eigenmodes" must be replaced by an integral over a continuous frequency variable. This is precisely the passage from Fourier series to the Fourier transform developed in the preceding page; here we put that machinery to its first serious use, solving an initial-value problem on an unbounded domain.

The problem we now consider is: \[ \begin{cases} \partial_t u = k\, \partial_{xx} u, & x \in \mathbb{R},\ t > 0, \\ u(x, 0) = f(x), & x \in \mathbb{R}, \end{cases} \] with \(k > 0\) and \(f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})\) sufficiently regular and decaying at infinity. We additionally require that \(u(\cdot, t)\) decays as \(|x| \to \infty\) for each \(t > 0\); this restriction picks out the natural function class for the problem and rules out, for instance, the addition of a constant temperature or a polynomially growing background field. The uniqueness of the solution within this class is treated alongside the main theorem below.

Fourier-Transforming the Equation

Define the spatial Fourier transform of \(u(\cdot, t)\) for each fixed \(t\): \[ \hat{u}(\xi, t) = \int_{-\infty}^{\infty} u(x, t)\, e^{ix\xi}\, dx. \] Applying the transform to both sides of the heat equation and using the differentiation property of the Fourier transform — which on \(\partial_{xx}\) produces the factor \((-i\xi)^2 = -\xi^2\) — converts the partial differential equation in \((x, t)\) into an ordinary differential equation in \(t\) alone, with \(\xi\) entering only as a parameter:

Theorem: Fourier-Transformed Heat Equation

Let \(u(x, t)\) solve the heat equation on \(\mathbb{R}\) with \(u(\cdot, t)\) decaying at infinity for each \(t \geq 0\), and let \(\hat{u}(\xi, t)\) denote its spatial Fourier transform. Then, for each \(\xi \in \mathbb{R}\), the function \(t \mapsto \hat{u}(\xi, t)\) satisfies \[ \partial_t \hat{u}(\xi, t) = -k \xi^2\, \hat{u}(\xi, t), \qquad \hat{u}(\xi, 0) = \hat{f}(\xi). \] The unique solution is \[ \hat{u}(\xi, t) = \hat{f}(\xi)\, e^{-k \xi^2 t}. \]

Proof:

Differentiation under the integral sign is justified by the regularity and decay hypotheses on \(u\) stated at the start of this section. Then \[ \begin{align*} \partial_t \hat{u}(\xi, t) &= \int_{-\infty}^{\infty} \partial_t u(x, t)\, e^{ix\xi}\, dx \\\\ &= \int_{-\infty}^{\infty} k\, \partial_{xx} u(x, t)\, e^{ix\xi}\, dx \\\\ &= k \cdot (-i\xi)^2\, \hat{u}(\xi, t) \\\\ &= -k\xi^2\, \hat{u}(\xi, t), \end{align*} \] where the third equality applies the differentiation property of the Fourier transform twice; the boundary terms that arise from the two integrations by parts, namely \([u\, e^{ix\xi}]_{-\infty}^{\infty}\) and \([\partial_x u \cdot e^{ix\xi}]_{-\infty}^{\infty}\), vanish by the assumed decay of \(u\) and \(\partial_x u\) at infinity.

For each fixed \(\xi\), the equation \(\partial_t \hat{u}(\xi, t) = -k\xi^2\, \hat{u}(\xi, t)\) is a scalar linear ODE in \(t\) with constant coefficient \(-k\xi^2\) (\(\xi\) is held fixed). Its unique solution with initial condition \(\hat{u}(\xi, 0) = \hat{f}(\xi)\) is \(\hat{u}(\xi, t) = \hat{f}(\xi) e^{-k\xi^2 t}\).

This is the central simplification: the Fourier transform diagonalizes the spatial derivative operator, turning a partial differential equation into a family of decoupled scalar ODEs indexed by frequency. Each frequency \(\xi\) decays independently at rate \(k\xi^2\) — proportional to the square of the frequency, so that high-frequency components are suppressed exponentially faster than low-frequency ones. The smoothing phenomenon is, in the frequency domain, simply multiplication by a rapidly decaying Gaussian envelope.
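The decoupled-ODE structure translates directly into a numerical scheme: transform the initial datum, multiply each frequency by \(e^{-k\xi^2 t}\), and transform back. A sketch using the FFT on a large periodic box as a stand-in for the line (the box size, grid, and Gaussian initial datum are illustrative choices; the periodic approximation is adequate while the solution stays far from the box edges):

```python
import numpy as np

# Evolve u-hat(xi, t) = f-hat(xi) * exp(-k xi^2 t) via the FFT on [-20, 20).
k, t = 1.0, 0.2
N, box = 4096, 40.0
x = -box / 2 + box * np.arange(N) / N
xi = 2 * np.pi * np.fft.fftfreq(N, d=box / N)    # angular frequencies

f = np.exp(-x**2)                                 # Gaussian initial datum
u = np.fft.ifft(np.fft.fft(f) * np.exp(-k * xi**2 * t)).real

# A Gaussian stays Gaussian: exp(-x^2) has variance 1/2, which grows to
# 1/2 + 2kt while the total mass is preserved.
var = 0.5 + 2 * k * t
exact = np.sqrt(0.5 / var) * np.exp(-x**2 / (2 * var))
print(np.max(np.abs(u - exact)))                  # near machine precision
```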

The Heat Kernel

To recover \(u(x, t)\) we must invert the Fourier transform of the right-hand side, that is, identify the function whose Fourier transform is \(e^{-k\xi^2 t}\). This is the moment at which the Gaussian calculation done once in the preceding page pays for itself across all of parabolic theory. There it was established (as a worked example) that \[ \widehat{e^{-x^2/(2\sigma^2)}}(\xi) = \sigma \sqrt{2\pi}\, e^{-\sigma^2 \xi^2 / 2} \] for any \(\sigma > 0\). Choosing \(\sigma^2 = 2kt\) makes the right-hand side equal \(\sigma\sqrt{2\pi}\, e^{-k\xi^2 t}\), which differs from the target \(e^{-k\xi^2 t}\) only by the normalisation factor \(\sigma\sqrt{2\pi} = \sqrt{4\pi kt}\). Dividing through gives the function whose Fourier transform is \(e^{-k\xi^2 t}\). This function is fundamental enough to deserve a name.

Definition: Heat Kernel on \(\mathbb{R}\)

The heat kernel (or fundamental solution of the heat equation) on \(\mathbb{R}\) with diffusivity \(k > 0\) is \[ K_t(x) \;=\; \frac{1}{\sqrt{4\pi k t}}\, \exp\!\left(-\frac{x^2}{4kt}\right), \qquad t > 0,\ x \in \mathbb{R}. \] Its spatial Fourier transform is \[ \widehat{K_t}(\xi) \;=\; e^{-k \xi^2 t}. \]

The Fourier identity in the definition is the substitution \(\sigma^2 = 2kt\) applied to the boxed Gaussian Fourier result of the preceding page, divided by \(\sqrt{4\pi kt}\); no fresh integration is required. In the spatial variable, \(K_t\) is a centred Gaussian bell whose width \(\sqrt{2kt}\) grows like \(\sqrt{t}\) — the characteristic length scale of diffusive spreading.

Solution by Convolution

With \(\hat{u}(\xi, t) = \hat{f}(\xi)\, \widehat{K_t}(\xi)\), the convolution theorem inverts the right-hand side at once: a pointwise product of Fourier transforms is the Fourier transform of a convolution. The solution in the spatial variable is therefore the convolution of the initial datum with the heat kernel.

Theorem: Heat Kernel Solution on \(\mathbb{R}\)

Let \(f \in L^1(\mathbb{R})\) (or more generally \(f\) bounded and continuous with sufficient decay). The unique decaying solution of the heat equation \(\partial_t u = k\, \partial_{xx} u\) on \(\mathbb{R} \times (0, \infty)\) with \(u(x, 0) = f(x)\) is the convolution of \(f\) with the heat kernel: \[ u(x, t) \;=\; (f * K_t)(x) \;=\; \int_{-\infty}^{\infty} K_t(x - y)\, f(y)\, dy. \] Explicitly, \[ u(x, t) \;=\; \frac{1}{\sqrt{4\pi kt}} \int_{-\infty}^{\infty} \exp\!\left(-\frac{(x - y)^2}{4kt}\right) f(y)\, dy. \]

Proof:

From the Fourier-transformed heat equation, \(\hat{u}(\xi, t) = \hat{f}(\xi)\, e^{-k\xi^2 t} = \hat{f}(\xi)\, \widehat{K_t}(\xi)\). By the convolution theorem applied in reverse, \(\hat{f}(\xi)\, \widehat{K_t}(\xi) = \widehat{f * K_t}(\xi)\). By the injectivity of the Fourier transform on the relevant function class, \(u(\cdot, t) = f * K_t\). The integral form is the definition of convolution, and the symmetry \(K_t(x - y) = K_t(y - x)\) (since \(K_t\) is even) makes the two integrals interchangeable.

Uniqueness within the class of decaying solutions follows from a standard argument we do not detail here; we note only that the decay condition is essential — without it, non-trivial solutions of the same initial-value problem exist (a classical counterexample due to Tychonoff), so the "decaying" qualifier in the theorem is doing genuine work.

The convolution form admits a transparent physical reading: the temperature at \(x\) at time \(t\) is a weighted average of the initial temperature, with weights given by the heat kernel centred at \(x\). The kernel is sharply peaked at \(y = x\) for small \(t\) and broadens with width \(\sqrt{2kt}\), so that nearby initial values contribute heavily at first and progressively more distant values contribute as time advances. In the limit \(t \to 0^+\), \(K_t\) concentrates into a Dirac mass at \(0\), and the convolution recovers \(f(x)\) — confirming that the initial condition is attained in a distributional sense.

Insight: The Heat Kernel as a Gaussian Density

The expression for \(K_t\) is not a new special function: setting \(\sigma^2 = 2kt\), \[ K_t(x) \;=\; \frac{1}{\sqrt{2\pi \sigma^2}}\, \exp\!\left(-\frac{x^2}{2\sigma^2}\right), \] which is the probability density function of the normal distribution with mean \(0\) and variance \(2kt\). The identity \(\int K_t(x)\, dx = 1\) — which we will exploit in the next section as conservation of total heat — is, viewed in this light, the Gaussian integral.

The agreement is not coincidence. The heat equation governs the time evolution of the probability density of Brownian motion: if \(B_t\) is a Brownian motion with variance \(2kt\) at time \(t\) (a constant rescaling of standard Brownian motion), then its density satisfies \(\partial_t p = k\, \partial_{xx} p\) with initial mass concentrated at the origin, and the unique solution is \(p(x, t) = K_t(x)\). The convolution \(u(x, t) = (f * K_t)(x)\) admits the probabilistic reading \(u(x, t) = \mathbb{E}[f(x + B_t)]\): temperature at \((x, t)\) is the expected value of the initial profile sampled along a random walk emanating from \(x\). The width \(\sqrt{2kt}\) is then nothing but the standard deviation of the random walk's displacement, growing as \(\sqrt{t}\) — the hallmark of diffusive scaling, in contrast to the linear-in-\(t\) growth of ballistic motion. The full development of Brownian motion, the Itô calculus that formalizes this connection, and the Fokker-Planck equation that generalizes it to drift-diffusion processes belong to more advanced treatments; what is established here is the bridge from a purely analytic computation to a probabilistic object, set up so that the bridge can be crossed in either direction when the time comes.
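The probabilistic reading invites a direct test: sample \(B_t \sim \mathcal{N}(0, 2kt)\), average \(f(x + B_t)\), and compare against the closed-form convolution, which for a Gaussian \(f\) is again a Gaussian. A Monte Carlo sketch (the sample count and parameters are arbitrary choices):

```python
import numpy as np

# u(x, t) = E[f(x + B_t)] with B_t ~ N(0, 2kt), estimated by Monte Carlo.
rng = np.random.default_rng(1)
k, t, x0 = 1.0, 0.2, 0.3
f = lambda y: np.exp(-y**2)

samples = x0 + rng.normal(0.0, np.sqrt(2 * k * t), size=2_000_000)
mc = f(samples).mean()

# Closed form: exp(-y^2) is sqrt(pi) times a N(0, 1/2) density, so its
# convolution with K_t is sqrt(pi) times a N(0, 1/2 + 2kt) density.
var = 0.5 + 2 * k * t
exact = np.sqrt(0.5 / var) * np.exp(-x0**2 / (2 * var))
print(mc, exact)   # agree to roughly three decimal places
```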

Properties of the Heat Kernel

The convolution solution \(u(x, t) = (f * K_t)(x)\) places the qualitative behaviour of every initial-value problem on the same Gaussian footing: all features of the solution can be read off from a single object, the heat kernel \(K_t\). Four properties — mass conservation, instantaneous smoothing, the maximum principle, and the ill-posedness of the backward problem — capture the essential character of parabolic evolution and, taken together, distinguish the heat equation from the other two members of the parabolic-hyperbolic-elliptic trichotomy.

Conservation of Total Heat

The total integral of the temperature is preserved in time. For the heat kernel itself, this is the statement that \(K_t\) is a probability density: the Gaussian integral identity gives \[ \int_{-\infty}^{\infty} K_t(x)\, dx \;=\; \frac{1}{\sqrt{4\pi kt}} \int_{-\infty}^{\infty} e^{-x^2/(4kt)}\, dx \;=\; \frac{1}{\sqrt{4\pi kt}} \cdot \sqrt{4\pi kt} \;=\; 1 \] for every \(t > 0\). For a general initial datum \(f \in L^1(\mathbb{R})\), Fubini's theorem applied to the convolution \(u = f * K_t\) gives \[ \begin{align*} \int_{-\infty}^{\infty} u(x, t)\, dx &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} K_t(x - y)\, f(y)\, dy\, dx \\\\ &= \int_{-\infty}^{\infty} f(y) \left( \int_{-\infty}^{\infty} K_t(x - y)\, dx \right) dy \\\\ &= \int_{-\infty}^{\infty} f(y)\, dy, \end{align*} \] so \(\int u(\cdot, t) = \int f\) for all \(t > 0\). Physically: heat is neither created nor destroyed; it merely redistributes itself across the line. The same identity, viewed probabilistically, says that the diffusing density remains a probability density at every time, a fact that lies at the foundation of the Fokker-Planck equation.
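The identity survives discretization: a discrete convolution with a sampled heat kernel leaves the Riemann sum of the solution equal to that of the initial datum, up to boundary truncation. A NumPy sketch (the initial profile is any convenient integrable function; parameters are arbitrary):

```python
import numpy as np

k, t = 1.0, 0.05
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

f = np.exp(-np.abs(x)) * (1 + np.sin(3 * x))          # integrable initial datum
K = np.exp(-x**2 / (4 * k * t)) / np.sqrt(4 * np.pi * k * t)

# Discrete convolution approximating u(., t) = f * K_t
u = np.convolve(f, K, mode="same") * dx

print(f.sum() * dx, u.sum() * dx)   # the two totals agree closely
```

The tiny discrepancy comes from kernel mass leaking past the edges of the finite grid, negligible here because \(f\) is essentially zero near the boundary.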

Instantaneous Smoothing

The most characteristic feature of parabolic evolution is that the solution becomes smooth instantly, no matter how rough the initial datum is. The mechanism is most transparent in the frequency domain. From \(\hat{u}(\xi, t) = \hat{f}(\xi)\, e^{-k\xi^2 t}\), the high-frequency content of \(u(\cdot, t)\) is suppressed by a Gaussian factor that decays super-exponentially in \(|\xi|\). Concretely, for any \(t > 0\) and any \(n \in \mathbb{N}\), \[ |\xi|^n\, |\hat{u}(\xi, t)| \;\leq\; \|f\|_{L^1} \cdot |\xi|^n\, e^{-k\xi^2 t} \] (using the elementary bound \(|\hat{f}(\xi)| \leq \|f\|_{L^1}\) valid for any \(f \in L^1\)). The right-hand side is integrable in \(\xi\) for every \(n\), since polynomial growth is beaten by Gaussian decay.

To conclude smoothness from this integrability, recall the Fourier inversion formula \[ u(x, t) \;=\; \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{u}(\xi, t)\, e^{-ix\xi}\, d\xi. \] Differentiating \(n\) times in \(x\) under the integral sign brings down a factor \((-i\xi)^n\): \[ \partial_x^n u(x, t) \;=\; \frac{1}{2\pi}\int_{-\infty}^{\infty} (-i\xi)^n\, \hat{u}(\xi, t)\, e^{-ix\xi}\, d\xi. \] Differentiation under the integral is justified precisely because the bound above makes the integrand absolutely integrable in \(\xi\), uniformly in \(x\); the same bound also makes the right-hand side a continuous function of \(x\) by the dominated convergence theorem. Hence \(\partial_x^n u(\cdot, t)\) exists and is continuous for every \(n\), so \(u(\cdot, t)\) is of class \(C^n\) for every \(n\), and therefore of class \(C^\infty\) — even if \(f\) is merely \(L^1\) with jump discontinuities or worse. The smoothing acts in an arbitrarily short time: any \(t > 0\), however small, produces an infinitely differentiable function.

The same conclusion can be reached from the convolution side: \(K_t \in C^\infty\) and decays together with all its derivatives, and convolution with a smooth rapidly decaying kernel inherits the kernel's smoothness regardless of the regularity of the other factor. The frequency-domain argument is preferred here because it identifies the precise quantitative mechanism — Gaussian damping of high frequencies — and makes the rate of smoothing explicit.
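The quantitative mechanism can be exhibited in a few lines with the FFT. A sketch (the step profile, grid, and time are arbitrary illustrative choices; the discrete transform imposes periodicity, which is harmless here because the profile vanishes near the boundary):

```python
import numpy as np

N, L = 1024, 20.0
dx = L / N
x = np.arange(N) * dx - L / 2
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)      # angular frequencies

f = (np.abs(x) < 1).astype(float)             # step with jumps at x = -1, +1
k, t = 1.0, 0.01

f_hat = np.fft.fft(f)
u_hat = f_hat * np.exp(-k * xi**2 * t)        # exact evolution in frequency space
u = np.fft.ifft(u_hat).real

# Gaussian damping crushes the high-frequency tail ...
tail = np.abs(xi) > 50
print(np.abs(u_hat[tail]).max() / np.abs(f_hat[tail]).max())   # <= exp(-25) ~ 1e-11

# ... so the unit jumps of f become gentle slopes after t = 0.01
print(np.abs(np.diff(f)).max(), np.abs(np.diff(u)).max())
```

The tail ratio is bounded by \(e^{-k \xi^2 t}\) evaluated at the edge of the tail, exactly the Gaussian damping factor of the text.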

The Maximum Principle

For this subsection we strengthen our hypotheses and assume that the initial datum \(f\) is bounded on \(\mathbb{R}\), so that \(\inf_{y \in \mathbb{R}} f(y)\) and \(\sup_{y \in \mathbb{R}} f(y)\) are both finite real numbers. Boundedness is essential for the statement that follows: the argument interprets \(u(x, t)\) as a weighted average of values of \(f\), and a weighted average is bounded by the input range only when that range is finite.

Because \(K_t\) is non-negative and integrates to one, the convolution \(u(x, t) = \int K_t(x - y)\, f(y)\, dy\) is a weighted average of the values of \(f\), with weights given by the heat kernel. A weighted average cannot exceed the supremum of its inputs nor fall below their infimum: \[ \inf_{y \in \mathbb{R}} f(y) \;\leq\; u(x, t) \;\leq\; \sup_{y \in \mathbb{R}} f(y) \qquad \text{for all } x \in \mathbb{R},\ t > 0. \] This is the maximum principle on the real line. It states that the temperature at any point at any time is bounded above by the maximum initial temperature and below by the minimum — heat does not spontaneously concentrate to produce hot spots hotter than what was initially present, nor cold spots colder. The result is genuinely a statement about parabolic evolution: the wave equation has no analogous pointwise estimate — its conserved quantity is the global mechanical energy rather than a pointwise bound — and the Laplace equation has a related but distinct boundary-data form.
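As a numerical sanity check (a NumPy sketch with arbitrary parameters): convolving a rough but bounded profile — here random values — with a sampled heat kernel keeps the output inside the initial range:

```python
import numpy as np

rng = np.random.default_rng(1)
k, t = 1.0, 0.2
x = np.linspace(-15, 15, 3001)
dx = x[1] - x[0]

# Rough but bounded initial datum: random values in [-2, 5] on [-5, 5], zero outside
f = np.where(np.abs(x) < 5, rng.uniform(-2.0, 5.0, size=x.size), 0.0)

K = np.exp(-x**2 / (4 * k * t)) / np.sqrt(4 * np.pi * k * t)
u = np.convolve(f, K, mode="same") * dx       # weighted average of the values of f

print(f.min() <= u.min() and u.max() <= f.max())  # True
```

Near the grid boundary the truncated kernel weights sum to slightly less than one, which only pulls the averages toward zero — still within the initial range here, since the datum straddles zero.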

The Backward Problem is Ill-Posed

Smoothing is irreversible. The same Gaussian factor \(e^{-k\xi^2 t}\) that damps high frequencies in forward time amplifies them exponentially when run backwards. To attempt to recover \(u(\cdot, 0) = f\) from knowledge of \(u(\cdot, T)\) at some later time \(T > 0\), one would need to multiply \(\hat{u}(\xi, T)\) by \(e^{+k\xi^2 T}\). At frequency \(\xi = 10\) with \(kT = 0.14\), the amplification factor is \(e^{14} \approx 1.2 \times 10^6\); at \(\xi = 20\) it exceeds \(10^{24}\). Any high-frequency noise present in \(u(\cdot, T)\) — whether from measurement error, numerical round-off, or any other source — is amplified beyond all bounds. The map \(f \mapsto u(\cdot, T)\) is well-defined and even smoothing, but its inverse fails to be continuous on any reasonable function space; this is what is meant by saying the backward heat equation is ill-posed in the sense of Hadamard.
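The instability is easy to exhibit. A sketch with \(kT = 0.14\) to match the amplification factors quoted above; the noise level \(10^{-10}\) is an arbitrary stand-in for measurement error or round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 256, 20.0
dx = L / N
x = np.arange(N) * dx - L / 2
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)

k, T = 1.0, 0.14
print(np.exp(k * 10**2 * T))                 # amplification at xi = 10: e^14 ~ 1.2e6

f = np.exp(-x**2)                            # smooth initial datum

# Forward (well-posed): damp every mode by exp(-k xi^2 T)
u_T = np.fft.ifft(np.fft.fft(f) * np.exp(-k * xi**2 * T)).real

# Corrupt the observed state with tiny noise, then run time backwards
noisy = u_T + 1e-10 * rng.standard_normal(N)
recovered = np.fft.ifft(np.fft.fft(noisy) * np.exp(k * xi**2 * T)).real

print(np.abs(recovered - f).max())           # astronomically large, not ~1e-10
```

The forward step is harmless; the backward multiplier \(e^{+k\xi^2 T}\) blows the injected noise up by dozens of orders of magnitude at the highest resolved frequencies, swamping the true initial datum entirely.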

The asymmetry of forward and backward evolution — the arrow of time built into parabolic dynamics — is mathematically the same phenomenon as the smoothing discussed above, seen from the opposite direction. Forward in time, information about fine spatial structure is irretrievably destroyed by exponential damping; backward in time, recovering it would require an exponentially unstable amplification. This stands in sharp contrast to the time-reversibility of the wave equation, where the backward problem is as well-posed as the forward one and no information is lost. In the context of diffusion generative models, the forward (noising) process is precisely the well-posed direction; the difficulty of reversing it — of denoising a corrupted sample back to its origin — is the heat equation's ill-posed backward problem, which a neural network learns to approximate by exploiting strong prior assumptions on the data distribution.