Riemann Integration


Riemann Integration

The Riemann Integral is the standard "grid-based" approach to integration taught in introductory calculus. It relies on partitioning the domain (\(x\)-axis) into increasingly fine intervals to approximate the area under a curve. While intuitive, this method is fundamentally tied to the geometry of intervals, which limits its ability to handle functions defined on more complex sets or those with highly irregular structures. Understanding these limitations is the primary motivation for moving toward Measure Theory.

To begin our formalization, we must first define how we "slice" the input space. This is achieved through the concept of a partition, which divides a continuous interval into a finite set of discrete sub-segments.

Definition: Partition and Subintervals

Let \(f: [a, b] \to \mathbb{R} \) be a bounded function on a closed interval \([a, b]\). A partition \(\mathcal{P}\) of \([a, b]\) is a finite sequence of points: \[ \mathcal{P} = \{x_0, x_1, x_2, \cdots, x_n\} \] where \(a = x_0 < x_1 < x_2 < \cdots < x_n = b\). The resulting subintervals are \([x_{i-1}, x_i]\) for \(i = 1, \cdots, n\).

Once we have partitioned the domain, we need a way to approximate the function's behavior within each sub-segment. By taking the extreme values (the supremum and infimum) of the function on each interval, we can construct two bounding rectangles that "sandwich" the true area.

Definition: Lower & Upper Sums

For each subinterval, let \(m_i\) be the infimum (greatest lower bound) and \(M_i\) be the supremum (least upper bound) of \(f(x)\): \[ \begin{align*} m_i &= \inf_{x\in [x_{i-1}, x_i]} f(x), \\\\ M_i &= \sup_{x\in [x_{i-1}, x_i]} f(x). \end{align*} \] Moreover, we define the Lower Sum \(L(f, \mathcal{P})\) and Upper Sum \(U(f, \mathcal{P})\) as: \[ \begin{align*} L(f, \mathcal{P}) &= \sum_{i=1}^n m_i (x_i - x_{i-1}), \\ \\ U(f, \mathcal{P}) &= \sum_{i=1}^n M_i (x_i - x_{i-1}). \end{align*} \]
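To make the two sums concrete, here is a minimal Python sketch. It assumes a non-decreasing integrand, so that the infimum and supremum on each subinterval are attained at the left and right endpoints and no numerical search for extreme values is needed. The function names (`darboux_sums_monotone`, `uniform_partition`, `square`) and the choice \(f(x) = x^2\) on \([0, 1]\) are illustrative and do not come from the text.

```python
def darboux_sums_monotone(f, points):
    """Lower and upper sums of a non-decreasing f over the partition `points`.

    For non-decreasing f, the infimum on [x_{i-1}, x_i] is f(x_{i-1}) and the
    supremum is f(x_i), so both sums follow from endpoint values alone.
    """
    lower = 0.0
    upper = 0.0
    for left, right in zip(points, points[1:]):
        width = right - left
        lower += f(left) * width   # m_i * (x_i - x_{i-1})
        upper += f(right) * width  # M_i * (x_i - x_{i-1})
    return lower, upper


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal width."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def square(x):
    return x * x


for n in (4, 16, 64, 256):
    lower, upper = darboux_sums_monotone(square, uniform_partition(0.0, 1.0, n))
    print(f"n={n:4d}  L={lower:.6f}  U={upper:.6f}  U-L={upper - lower:.6f}")
```

For this particular integrand the gap \(U - L\) equals \(1/n\), so the two sums close in on the area \(1/3\) from below and above as the partition is refined.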

Integrability is achieved when these upper and lower bounds converge to the same value as we refine our partition. This convergence implies that the "gap" between our overestimation and underestimation vanishes, leaving us with a unique value for the area.

Convention. We define Riemann integrability via the Darboux formulation — equality of the supremum of lower sums and the infimum of upper sums. The equivalent formulation via tagged Riemann sums \(\sum_i f(\xi_i)(x_i - x_{i-1})\) with arbitrary tags \(\xi_i \in [x_{i-1}, x_i]\) is a theorem in this setting, established below. For the basic calculus-level topics of this page, the two formulations are interchangeable.

Definition: Riemann Integrability

The lower Riemann integral and the upper Riemann integral of \(f\) over \([a, b]\) are as follows respectively: \[ \begin{align*} &\underline{\int_a^b} f(x)dx = \sup_{\mathcal{P}} L(f, \mathcal{P}), \\\\ &\overline{\int_a^b} f(x)dx = \inf_{\mathcal{P}} U(f, \mathcal{P}). \end{align*} \] Then, a function \(f\) is said to be Riemann integrable on \([a, b]\) if \[ \underline{\int_a^b} f(x)dx = \overline{\int_a^b} f(x)dx = \alpha. \] When this holds, the common value \(\alpha\) is called the Riemann integral of \(f\) over \([a, b]\): \[ \int_a^b f(x)dx = \alpha. \]

In practice, we often describe this refinement process using the norm of the partition, denoted as \(\| \mathcal{P} \|\), which represents the width of the largest sub-segment. As this width approaches zero, the Riemann sums converge to the integral.

Theorem: Darboux's Criterion (Partition-Norm Formulation)

Let \( \| \mathcal{P} \| = \max_{1 \leq i \leq n} (x_i - x_{i-1}) \). A bounded function \(f\) is Riemann integrable on \([a, b]\) if and only if \[ \lim_{ \| \mathcal{P} \| \to 0} L(f, \mathcal{P}) = \lim_{ \| \mathcal{P} \| \to 0} U(f, \mathcal{P}) = \alpha. \]

Proof (sketch):

("If" direction.) Suppose the two limits exist and coincide at \(\alpha\). For every partition \(\mathcal{P}\) we have \(L(f, \mathcal{P}) \leq \underline{\int} \leq \overline{\int} \leq U(f, \mathcal{P})\). Taking \(\|\mathcal{P}\| \to 0\) on both sides squeezes the two Darboux integrals to the common value \(\alpha\). Hence \(\underline{\int} = \overline{\int} = \alpha\), so \(f\) is Riemann integrable in the sense of the definition above.

("Only if" direction.) Suppose \(f\) is Riemann integrable with common value \(\alpha\). Given \(\varepsilon > 0\), by definition of the Darboux integrals as sup/inf we may choose a partition \(\mathcal{P}_\varepsilon = \{y_0, \ldots, y_N\}\) such that \[ U(f, \mathcal{P}_\varepsilon) - L(f, \mathcal{P}_\varepsilon) < \varepsilon / 2. \] Refinement shrinks (or preserves) this gap: if \(\mathcal{Q}\) refines \(\mathcal{P}\), then \(L(f, \mathcal{P}) \leq L(f, \mathcal{Q})\) and \(U(f, \mathcal{Q}) \leq U(f, \mathcal{P})\).

Now let \(\mathcal{P}\) be any partition and set \(\mathcal{Q} = \mathcal{P} \cup \mathcal{P}_\varepsilon\). The subintervals of \(\mathcal{P}\) that contain no interior point of \(\mathcal{P}_\varepsilon\) (i.e., none of \(y_1, \ldots, y_{N-1}\)) are not split by \(\mathcal{Q}\) and contribute identically to \(U(f, \mathcal{P})\) and \(U(f, \mathcal{Q})\). At most \(N - 1\) subintervals of \(\mathcal{P}\) are split. On each split subinterval (length \(\leq \|\mathcal{P}\|\)), the discrepancy between \(U(f, \mathcal{P})\) and \(U(f, \mathcal{Q})\) is at most \(\mathrm{osc}(f) \cdot \|\mathcal{P}\|\), where \(\mathrm{osc}(f) = \sup_{[a,b]} f - \inf_{[a,b]} f\). Hence \[ U(f, \mathcal{P}) - U(f, \mathcal{Q}) \leq (N - 1) \cdot \|\mathcal{P}\| \cdot \mathrm{osc}(f), \] and by a symmetric argument, \[ L(f, \mathcal{Q}) - L(f, \mathcal{P}) \leq (N - 1) \cdot \|\mathcal{P}\| \cdot \mathrm{osc}(f). \]

Combining these with \(U(f, \mathcal{Q}) - L(f, \mathcal{Q}) \leq U(f, \mathcal{P}_\varepsilon) - L(f, \mathcal{P}_\varepsilon) < \varepsilon / 2\), \[ U(f, \mathcal{P}) - L(f, \mathcal{P}) \;\leq\; 2(N-1) \cdot \|\mathcal{P}\| \cdot \mathrm{osc}(f) + \varepsilon / 2. \] Choosing \(\|\mathcal{P}\|\) small enough that the first term is below \(\varepsilon / 2\) gives \(U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon\). Because \(L(f, \mathcal{P}) \leq \alpha \leq U(f, \mathcal{P})\) for every partition, both sums then lie within \(\varepsilon\) of \(\alpha\). Since \(\varepsilon\) was arbitrary, both sums converge to the common Darboux value \(\alpha\) as \(\|\mathcal{P}\| \to 0\).

Remark: Equivalence with Tagged Riemann Sums

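Classically, the Riemann integral is defined via tagged sums \(S(f, \mathcal{P}, \xi) = \sum_i f(\xi_i)(x_i - x_{i-1})\) with arbitrary tags \(\xi_i \in [x_{i-1}, x_i]\), requiring convergence to a common value as \(\|\mathcal{P}\| \to 0\) regardless of the tag choice. Since \(L(f, \mathcal{P}) \leq S(f, \mathcal{P}, \xi) \leq U(f, \mathcal{P})\) for every tag choice, the theorem above shows that Darboux integrability forces every tagged sum to converge to \(\alpha\). Conversely, on a fixed partition the tags can be chosen so that \(S(f, \mathcal{P}, \xi)\) is arbitrarily close to \(U(f, \mathcal{P})\) or to \(L(f, \mathcal{P})\), so convergence of all tagged sums forces the upper and lower sums to the same limit. The Darboux formulation (adopted on this page) and the tagged-sum formulation are therefore equivalent. We continue with Darboux as it is cleaner to manipulate.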
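The sandwich \(L \leq S \leq U\) can be checked numerically. The sketch below is only an illustration under stated assumptions: the integrand \(f(x) = x^2\) on \([0, 1]\), the uniform partition with 50 cells, and the four tag rules are all choices made here, not part of the text. Because this \(f\) is non-decreasing, the lower and upper sums are computed from endpoint values.

```python
import random


def f(x):
    return x * x  # non-decreasing on [0, 1]


n = 50
points = [i / n for i in range(n + 1)]
cells = list(zip(points, points[1:]))

# For a non-decreasing f, inf and sup on each cell sit at the endpoints.
lower = sum(f(left) * (right - left) for left, right in cells)
upper = sum(f(right) * (right - left) for left, right in cells)


def tagged_sum(tags):
    """Riemann sum with one tag per cell."""
    return sum(f(t) * (right - left) for (left, right), t in zip(cells, tags))


rng = random.Random(0)  # fixed seed so the run is repeatable
tag_choices = {
    "left": [left for left, right in cells],
    "midpoint": [(left + right) / 2 for left, right in cells],
    "right": [right for left, right in cells],
    "random": [rng.uniform(left, right) for left, right in cells],
}
sums = {name: tagged_sum(tags) for name, tags in tag_choices.items()}

print(f"L = {lower:.6f}, U = {upper:.6f}")
for name, value in sums.items():
    print(f"{name:>8}: S = {value:.6f}")
```

Every tagged sum lands between the lower and upper sums, which is the only property the equivalence argument needs.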

Understanding when a function is integrable is key. While some properties are strict requirements (necessary conditions), others provide a guarantee that the integral exists (sufficient conditions).

Prerequisite and Sufficient Conditions
  • Prerequisite (Boundedness):
    The Darboux definition above presumes \(f\) to be bounded — otherwise the suprema \(M_i\) or infima \(m_i\) on some subinterval would be \(\pm\infty\), making the Darboux sums ill-defined as finite numbers. Any sensible extension to unbounded integrands must therefore be handled through improper integrals (next section) or through the more flexible Lebesgue framework (developed in the measure-theoretic pages).
  • Sufficient Condition (Continuity):
    If \(f\) is continuous on \([a, b]\), it is guaranteed to be Riemann integrable.
  • Sufficient Condition (Monotonicity):
    If \(f\) is monotonic on \([a, b]\), it is guaranteed to be Riemann integrable, even if it has infinitely many points of discontinuity (a monotone function can have at most countably many).
Justifications:

Monotonicity is sufficient. Assume \(f\) is non-decreasing (the non-increasing case is symmetric). For any partition \(\mathcal{P}\), monotonicity gives \(m_i = f(x_{i-1})\) and \(M_i = f(x_i)\). Hence \[ U(f, \mathcal{P}) - L(f, \mathcal{P}) = \sum_{i=1}^n \bigl(f(x_i) - f(x_{i-1})\bigr)(x_i - x_{i-1}) \leq \|\mathcal{P}\| \sum_{i=1}^n \bigl(f(x_i) - f(x_{i-1})\bigr) = \|\mathcal{P}\| \bigl(f(b) - f(a)\bigr), \] which tends to \(0\) as \(\|\mathcal{P}\| \to 0\). This forces the upper and lower integrals to coincide.
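The bound \(U - L \leq \|\mathcal{P}\| \bigl(f(b) - f(a)\bigr)\) holds for any partition, not only uniform ones. The following sketch checks it on randomly generated partitions. The step-like integrand \(f(x) = x + \lfloor 3x \rfloor\) on \([0, 1]\) and the helper names are assumptions made for illustration. The function has jump discontinuities, yet it is non-decreasing, so the argument above applies.

```python
import math
import random


def f(x):
    """A non-decreasing function on [0, 1] with jump discontinuities."""
    return x + math.floor(3 * x)


def random_partition(n_interior, rng):
    """Sorted partition of [0, 1] with randomly placed interior points."""
    interior = {rng.random() for _ in range(n_interior)}
    return sorted(interior | {0.0, 1.0})


def gap_and_bound(points):
    """Return (U - L, ||P|| * (f(b) - f(a))) using endpoint values of f."""
    cells = list(zip(points, points[1:]))
    gap = sum((f(right) - f(left)) * (right - left) for left, right in cells)
    norm = max(right - left for left, right in cells)
    return gap, norm * (f(points[-1]) - f(points[0]))


rng = random.Random(1)  # fixed seed so the run is repeatable
for n_interior in (5, 50, 500, 5000):
    gap, bound = gap_and_bound(random_partition(n_interior, rng))
    print(f"{n_interior:5d} interior points: U-L={gap:.6f}  bound={bound:.6f}")
```

As more points are added the norm of the partition shrinks, and the gap is pushed toward zero along with the bound.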

Continuity is sufficient. The proof uses uniform continuity of \(f\) on the compact interval \([a, b]\), which lets us choose \(\|\mathcal{P}\|\) small enough that \(M_i - m_i < \varepsilon\) uniformly on every subinterval. Granting this, we conclude \[ U(f, \mathcal{P}) - L(f, \mathcal{P}) = \sum_{i=1}^n (M_i - m_i)(x_i - x_{i-1}) \;\leq\; \varepsilon \, (b - a), \] which can be made arbitrarily small. Both uniform continuity and the role of compactness are developed later in the foundations-of-analysis track.
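For a continuous integrand that is not monotone, the same shrinking gap can be observed. The sketch below uses \(f(x) = \sin x\) on \([0, \pi]\), an example chosen here because its supremum and infimum on a subinterval can be written down exactly: the infimum sits at an endpoint, and the supremum is \(1\) when the subinterval contains \(\pi/2\) and an endpoint value otherwise. Since \(|\sin u - \sin v| \leq |u - v|\), one may take \(\varepsilon = \|\mathcal{P}\|\) in the estimate above, giving \(U - L \leq \|\mathcal{P}\| \cdot \pi\). The helper names are hypothetical.

```python
import math


def sup_inf_sin(left, right):
    """Sup and inf of sin on [left, right], assuming 0 <= left < right <= pi."""
    values = (math.sin(left), math.sin(right))
    sup = 1.0 if left <= math.pi / 2 <= right else max(values)
    return sup, min(values)


def darboux_sums_sin(n):
    """Lower and upper sums of sin over a uniform partition of [0, pi]."""
    points = [math.pi * i / n for i in range(n + 1)]
    lower = 0.0
    upper = 0.0
    for left, right in zip(points, points[1:]):
        sup, inf = sup_inf_sin(left, right)
        lower += inf * (right - left)
        upper += sup * (right - left)
    return lower, upper


for n in (10, 100, 1000):
    lower, upper = darboux_sums_sin(n)
    bound = (math.pi / n) * math.pi  # epsilon * (b - a) with epsilon = ||P||
    print(f"n={n:5d}  L={lower:.6f}  U={upper:.6f}  U-L={upper - lower:.6f}  bound={bound:.6f}")
```

Both sums bracket the value \(2\), and the gap stays below the bound for each partition.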

However, the Riemann integral has its limits, specifically when the interval or the integrand is unbounded, or when the function is "too" discontinuous. Improper integrals address the first two cases; the third is beyond their reach and is taken up in the final section.

Improper Riemann Integration

In many cases, especially in statistics, a function may fail to be Riemann integrable due to unbounded intervals or singularities. However, such functions can often still be integrated using the improper Riemann integral, which extends the classical Riemann integral by incorporating limits to handle these issues. Fortunately, the improper Riemann integral covers a wide range of integrals commonly encountered in statistics and machine learning.

Case 1: One side of the interval is unbounded

Let \(f: [a, \infty) \to \mathbb{R} \) be Riemann integrable on \([a, b]\) for every \(b > a\). The improper integral of \(f\) over \([a, \infty)\) is defined as \[ \int_a^{\infty} f(x)dx = \lim_{b \to \infty} \int_a^b f(x)dx, \] provided the limit exists and is finite; in that case \(f\) is said to be improperly integrable. Similarly, for \(f: (-\infty, b] \to \mathbb{R} \), \[ \int_{-\infty}^b f(x)dx = \lim_{a \to -\infty} \int_a^b f(x)dx. \]

Example: \[ \begin{align*} \int_0^{\infty} e^{-x} dx &= \lim_{b \to \infty} \int_0^b e^{-x} dx \\\\ &= \lim_{b \to \infty} \left[-e^{-x}\right]_0^b \\\\ &= \lim_{b \to \infty} \left(-e^{-b} - (-e^{0})\right) \\\\ &= \lim_{b \to \infty} \left(-e^{-b} + 1\right) \\\\ &= 0 + 1 = 1. \end{align*} \]
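The limiting process in this example can be watched numerically. The sketch below approximates each proper integral \(\int_0^b e^{-x}dx\) with a simple midpoint rule and compares it with the closed form \(1 - e^{-b}\) derived above. The midpoint rule, the number of cells, and the chosen values of \(b\) are assumptions made for illustration, not a recommended quadrature method.

```python
import math


def midpoint_rule(f, a, b, n=20_000):
    """Approximate the proper integral of f over [a, b] with n midpoint cells."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


def neg_exp(x):
    return math.exp(-x)


for b in (1.0, 5.0, 10.0, 20.0, 40.0):
    approx = midpoint_rule(neg_exp, 0.0, b)
    exact = 1.0 - math.exp(-b)
    print(f"b={b:5.1f}  midpoint={approx:.8f}  1-exp(-b)={exact:.8f}")
```

The values increase toward \(1\) as \(b\) grows, matching the limit computed by hand.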

Case 2: Both sides of the interval are unbounded

For a function \(f: \mathbb{R} \to \mathbb{R}\), we define the integral over the entire real line by splitting it at any real number \(c\) (commonly \(c = 0\)): \[ \int_{-\infty}^{\infty} f(x)dx = \int_{-\infty}^c f(x)dx + \int_c^{\infty} f(x)dx \] More formally, this is defined using two independent limits: \[ \int_{-\infty}^{\infty} f(x)dx = \lim_{a \to -\infty} \int_a^c f(x)dx + \lim_{b \to \infty} \int_c^b f(x)dx \] The integral exists (converges) if and only if both limits exist and are finite independently.

Example: Divergence and Cauchy Principal Value

As a convergent baseline, consider the Gaussian integral: \[ \int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}. \] (The value \(\sqrt{\pi}\) is computed in the Gaussian distribution page via a polar-coordinate change of variables; here we only use the fact that the integral converges.)

Important Caution:
The improper integral is not defined as \(\lim_{R \to \infty} \int_{-R}^{R} f(x) dx\). That symmetric limit is known as the Cauchy Principal Value, which can exist even when the improper integral does not. Consider \(f(x) = \frac{x}{1+x^2}\): \[ \begin{align*} \int_{-\infty}^{\infty} \frac{x}{1+x^2}dx &= \lim_{a \to -\infty} \int_a^0 \frac{x}{1+x^2}dx + \lim_{b \to \infty} \int_0^b \frac{x}{1+x^2}dx \\\\ &= \lim_{a \to -\infty} \left[ \frac{1}{2}\ln(1+x^2) \right]_a^0 + \lim_{b \to \infty} \left[ \frac{1}{2}\ln(1+x^2) \right]_0^b \\\\ &= \lim_{a \to -\infty} \left( -\frac{1}{2}\ln(1+a^2) \right) + \lim_{b \to \infty} \left( \frac{1}{2}\ln(1+b^2) \right) \end{align*} \] As \(a \to -\infty\), the first term diverges to \(-\infty\); independently, as \(b \to \infty\), the second term diverges to \(+\infty\). Since the definition of the improper integral requires both limits to exist and be finite independently, the integral does not converge. This explains why the Cauchy distribution has no defined mean: its density is \(\frac{1}{\pi(1+x^2)}\), so the integrand of its mean is \(\frac{x}{\pi(1+x^2)}\), a constant multiple of the function above.

Note that the Cauchy principal value \(\lim_{R \to \infty} \int_{-R}^{R} \frac{x}{1+x^2}dx = 0\) by symmetry, which can be misleading if the independent nature of the limits is ignored.
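The dependence on how the two endpoints go to infinity can be seen directly from the antiderivative \(\frac{1}{2}\ln(1+x^2)\) used above. In the sketch below, the function name and the particular "lopsided" choice of endpoints \(a = -R\), \(b = 2R\) are illustrative assumptions. The lopsided choice is just one of infinitely many ways to send both endpoints to infinity.

```python
import math


def integral_x_over_1_plus_x2(a, b):
    """Exact proper integral of x / (1 + x^2) over [a, b] via the antiderivative."""
    return 0.5 * (math.log1p(b * b) - math.log1p(a * a))


for R in (10.0, 1e3, 1e6):
    symmetric = integral_x_over_1_plus_x2(-R, R)        # Cauchy principal value
    lopsided = integral_x_over_1_plus_x2(-R, 2.0 * R)   # endpoints grow at different rates
    one_sided = integral_x_over_1_plus_x2(0.0, R)       # one of the two required limits
    print(f"R={R:9.1f}  symmetric={symmetric:.6f}  lopsided={lopsided:.6f}  one-sided={one_sided:.6f}")
```

The symmetric values are zero, the lopsided values settle near \(\ln 2\), and the one-sided values keep growing. Different ways of taking the limit give different answers, which is why the improper integral is declared divergent.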

Case 3: Function has a singularity (becomes unbounded) on a finite interval

The improper integral also extends to functions that are not defined or become unbounded at certain points within a finite interval. It is important to note that this does not require the function to be continuous; it only requires the function to be Riemann integrable on every closed subinterval that excludes the singularity.

If \(f\) has a singularity at a point \(c \in (a, b)\), the improper integral is defined by approaching \(c\) from both sides with two independent limits: \[ \int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_a^{c-\epsilon} f(x)dx + \lim_{\delta \to 0^+} \int_{c+\delta}^b f(x)dx \] The integral converges only if both limits exist and are finite independently. The symmetric choice \(\epsilon = \delta\), which would produce a Cauchy-principal-value-style cancellation, is not the definition of the improper integral — same caution as in Case 2 above.

If the singularity occurs at an endpoint, we define: \[ \int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_{a+\epsilon}^b f(x)dx \] if \(f\) has a singularity at \(a\), or \[ \int_a^b f(x)dx = \lim_{\epsilon \to 0^+} \int_a^{b-\epsilon} f(x)dx \] if \(f\) has a singularity at \(b\).

Example: Singularity at an Endpoint

Consider the function \(f(x) = \frac{1}{\sqrt{x}}\) on the interval \((0, 1]\). The function is unbounded as \(x \to 0^+\): \[ \begin{align*} \int_0^1 \frac{1}{\sqrt{x}}dx &= \lim_{\epsilon \to 0^+} \int_{\epsilon}^1 x^{-1/2} dx \\\\ &= \lim_{\epsilon \to 0^+} \left[2\sqrt{x}\right]_{\epsilon}^1 \\\\ &= \lim_{\epsilon \to 0^+} \left(2\sqrt{1} - 2\sqrt{\epsilon}\right) \\\\ &= 2 - 0 = 2. \end{align*} \] Even though the function "shoots up" to infinity at the origin, the area under the curve remains finite.
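A short numerical sketch of this limit follows. It tabulates the truncated integrals \(2 - 2\sqrt{\epsilon}\) from the computation above and, as a sanity check at a single moderate value of \(\epsilon\), compares the closed form with a midpoint-rule approximation. The helper names, the midpoint rule, and the sample values of \(\epsilon\) are assumptions made here for illustration.

```python
import math


def inv_sqrt(x):
    return 1.0 / math.sqrt(x)


def truncated_integral(eps):
    """Closed form of the proper integral of x^(-1/2) over [eps, 1]."""
    return 2.0 - 2.0 * math.sqrt(eps)


def midpoint_rule(f, a, b, n=100_000):
    """Approximate the proper integral of f over [a, b] with n midpoint cells."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    print(f"eps={eps:.0e}  integral over [eps, 1] = {truncated_integral(eps):.6f}")

# Independent numerical check of the closed form at one value of eps.
eps = 1e-2
numeric = midpoint_rule(inv_sqrt, eps, 1.0)
print(f"midpoint rule at eps={eps}: {numeric:.6f}")
```

The truncated integrals increase toward \(2\) as \(\epsilon\) shrinks, in line with the hand computation.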

Limitation of the (Improper) Riemann integration

Consider the Dirichlet function on \([0, 1]\):

Definition: Dirichlet function

For \(x \in [0, 1]\), define \[ f(x)= \begin{cases} 1 &\text{if \(x \in \mathbb{Q}\)} \\ 0 &\text{if \(x \in \mathbb{R} \setminus \mathbb{Q}\)} \end{cases} \]

Since both the rational numbers \(\mathbb{Q}\) and the irrational numbers \(\mathbb{R} \setminus \mathbb{Q}\) are dense in the interval \([0, 1]\), every subinterval \([x_{i-1}, x_i]\) of a partition \(\mathcal{P}\), no matter how small, contains both. Hence \(M_i = 1\) and \(m_i = 0\) on every subinterval, so the upper and lower Darboux sums never approach a common value.

Consequently, for every partition \(\mathcal{P}\), \[ U(f, \mathcal{P}) = \sum_i M_i (x_i - x_{i-1}) = 1, \qquad L(f, \mathcal{P}) = 0. \] Taking the infimum (resp. supremum) over all partitions, the upper Darboux integral equals \(1\) while the lower equals \(0\). These disagree, so the Dirichlet function is not Riemann integrable.
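Floating-point numbers cannot tell rationals from irrationals, so the sketch below takes a different route, which is an assumption of this illustration rather than anything in the text: points are represented exactly as \(p + q\sqrt{2}\) with rational \(p, q\), and such a point is rational exactly when \(q = 0\). On a uniform partition of \([0, 1]\), it compares tagged sums with rational tags (left endpoints) against tagged sums with irrational tags placed inside each subinterval.

```python
from fractions import Fraction


def dirichlet(p, q):
    """Dirichlet function at the point p + q*sqrt(2), with p and q rational.

    Such a point is rational exactly when q == 0.
    """
    return 1 if q == 0 else 0


def tagged_sums(n):
    """Tagged sums on the uniform partition of [0, 1] into n cells.

    Returns (sum with rational tags, sum with irrational tags).
    """
    width = Fraction(1, n)
    rational_sum = Fraction(0)
    irrational_sum = Fraction(0)
    for i in range(1, n + 1):
        # Rational tag: the left endpoint (i - 1)/n.
        rational_sum += dirichlet(Fraction(i - 1, n), Fraction(0)) * width
        # Irrational tag: (i - 1)/n + (sqrt(2) - 1)/n = (i - 2)/n + (1/n)*sqrt(2),
        # which lies inside the cell because 0 < sqrt(2) - 1 < 1.
        irrational_sum += dirichlet(Fraction(i - 2, n), Fraction(1, n)) * width
    return rational_sum, irrational_sum


for n in (10, 100, 1000):
    rational_sum, irrational_sum = tagged_sums(n)
    print(f"n={n:5d}  rational tags: {rational_sum}  irrational tags: {irrational_sum}")
```

However fine the partition, the rational tags reproduce the upper sum and the irrational tags reproduce the lower sum, so refining the partition never brings the two together.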

Moreover, the Dirichlet function is discontinuous everywhere in \([0, 1]\). While improper Riemann integrals can handle isolated singularities, they offer no help with a function that is discontinuous at every point, since no closed subinterval remains on which it is Riemann integrable.

The failure here stems from the Riemann integral's inability to distinguish between the "size" of \(\mathbb{Q}\) and \(\mathbb{R} \setminus \mathbb{Q}\). While both are dense, Measure Theory reveals that \(\mathbb{Q}\) is a set of measure zero.

The Lebesgue integral will allow us to "ignore" these points of measure zero, yielding an integral of 0 for the Dirichlet function, a result that matches our probabilistic intuition that the "weight" of rational points on the real line is negligible.

Looking Ahead: Lebesgue's Criterion for Riemann Integrability

The structural reason why "too many" discontinuities break Riemann integrability is captured by the following theorem: a bounded function \(f\) on \([a, b]\) is Riemann integrable if and only if the set of points where \(f\) is discontinuous has Lebesgue measure zero.

The Dirichlet function is discontinuous at every point of \([0, 1]\) — a set of Lebesgue measure one — so it fails this criterion. By contrast, a monotone function can only be discontinuous at a countable set of points (which has measure zero), explaining why monotonicity suffices.

Definition: Dense Set (Side Note)

A set \(A \subset \mathbb{R}\) is said to be dense in \(\mathbb{R}\) if for any \(x, y \in \mathbb{R}\) with \(x < y\), there exists \(a \in A\) such that \(x < a < y\).

Equivalently, every non-empty open interval in \(\mathbb{R}\) contains at least one point from \(A\).