Gamma & Beta Distribution


Gamma Function

Before introducing the Gamma and Beta distributions, we need the special functions that serve as their building blocks. The gamma function extends the factorial to non-integer (and even complex) arguments. This generalization is essential: the normalizing constants of many continuous distributions involve factorials of non-integer parameters, and the gamma function provides the mathematically rigorous way to handle them.

Definition: Gamma Function

The gamma function is defined as: \[ \Gamma(z) = \int_{0}^{\infty} t^{z-1}e^{-t}\, dt, \quad z > 0. \] We work with real positive \(z\) throughout this page; the integral in fact converges for all complex \(z\) with \(\operatorname{Re}(z) > 0\), and the gamma function extends to a meromorphic function on the entire complex plane via analytic continuation. We do not use the complex extension here.

Theorem: Gamma Function Extends the Factorial

For any positive integer \(n\), \[ \Gamma (n+1) = n! \]

Proof:

We first establish the recurrence \(\Gamma(n+1) = n\,\Gamma(n)\) for any \(n > 0\) by integration by parts on \[ \Gamma(n+1) = \int_0^\infty t^{n} e^{-t}\, dt. \] Take \(u = t^n\) (so \(du = n\,t^{n-1}\,dt\)) and \(dv = e^{-t}\,dt\) (so \(v = -e^{-t}\)). Then \[ \begin{align*} \Gamma(n+1) &= \bigl[-t^n e^{-t}\bigr]_0^\infty + \int_0^\infty n\,t^{n-1} e^{-t}\,dt \\\\ &= n\int_0^\infty t^{n-1} e^{-t}\,dt \\\\ &= n\,\Gamma(n), \end{align*} \] using that the boundary term vanishes: \(t^n e^{-t} \to 0\) as \(t \to \infty\) (exponential decay dominates polynomial growth) and \(t^n e^{-t} \to 0\) as \(t \to 0^+\) for any \(n > 0\).

We now prove \(\Gamma(n+1) = n!\) for positive integers \(n\) by induction on \(n\).

Base case (\(n = 1\)): \[ \Gamma(2) = \int_0^\infty t\,e^{-t}\,dt = \bigl[-t\,e^{-t}\bigr]_0^\infty + \int_0^\infty e^{-t}\,dt = 0 + 1 = 1 = 1!, \] again by integration by parts with \(u = t\), \(dv = e^{-t}\,dt\).

Inductive step: Assume \(\Gamma(k+1) = k!\) for some integer \(k \geq 1\). By the recurrence, \[ \Gamma(k+2) = (k+1)\,\Gamma(k+1) = (k+1)\,k! = (k+1)!. \]

By induction, \(\Gamma(n+1) = n!\) for all positive integers \(n\); equivalently, \(\Gamma(n) = (n-1)!\).
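As a quick numerical sanity check of the theorem, the sketch below compares Python's standard-library `math.gamma` with `math.factorial` for small \(n\). The helper name `gamma_matches_factorial` is illustrative and not part of the original text.

```python
import math

def gamma_matches_factorial(n: int) -> bool:
    """Return True if Gamma(n + 1) agrees with n! up to floating-point error."""
    return math.isclose(math.gamma(n + 1), math.factorial(n), rel_tol=1e-12)

# Print Gamma(n + 1) next to n! for n = 1, ..., 10.
for n in range(1, 11):
    print(n, math.gamma(n + 1), math.factorial(n))

all_match = all(gamma_matches_factorial(n) for n in range(1, 11))
```

A numerical comparison of this kind does not replace the induction proof; it only confirms that the statement is consistent with a standard implementation.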

Let's compute the most iconic value of the gamma function. \[ \Gamma (\frac{1}{2}) = \int_{0} ^\infty t^{\frac{-1}{2}}e^{-t} dt \] Let \(t = u^2\), so \(dt =2udu\). Then \[ \begin{align*} \Gamma (\frac{1}{2}) &= \int_{0} ^\infty u^{2\cdot \frac{-1}{2}}e^{-u^2} 2udu \\\\ &= 2\int_{0} ^\infty e^{-u^2} du \\\\ &= \int_{-\infty} ^\infty e^{-u^2} du \tag{1} \\\\ &= \sqrt{\pi} \end{align*} \] In line (1) we used the symmetry of the even function \(e^{-u^2}\) to extend the integral to the full real line, and the final equality is the famous Gaussian integral: \[ \int_{-\infty} ^\infty e^{-x^2} dx = \sqrt{\pi}. \]

This Gaussian integral is proved independently (via polar coordinates) on the normal distribution page; we use it here as an external fact.

Although the factorial \(n!\) is defined only for non-negative integers, the gamma function extends it to all positive arguments via \[ (z)! \;:=\; \Gamma(z+1). \] The recurrence \(\Gamma(z+1) = z\,\Gamma(z)\), which the integration-by-parts argument above establishes for every real \(z > 0\) and not only for integers, translates into the familiar factorial-style recursion \((z)! = z\,(z-1)!\). Combined with \(\Gamma(1/2) = \sqrt{\pi}\), this yields: \[ \begin{align*} \Gamma(3/2) &= \tfrac{1}{2}\,\Gamma(1/2) = \tfrac{1}{2}\sqrt{\pi}, \\\\ \Gamma(5/2) &= \tfrac{3}{2}\,\Gamma(3/2) = \tfrac{3}{4}\sqrt{\pi}, \\\\ \Gamma(7/2) &= \tfrac{5}{2}\,\Gamma(5/2) = \tfrac{15}{8}\sqrt{\pi}. \end{align*} \]
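The half-integer values can be reproduced by iterating the recurrence from \(\Gamma(1/2) = \sqrt{\pi}\). The following is a minimal sketch; the function name `gamma_half_integer` is an assumption made for illustration, and `math.gamma` is used only as an independent cross-check.

```python
import math

def gamma_half_integer(k: int) -> float:
    """Gamma(k + 1/2) for integer k >= 0, built from Gamma(1/2) = sqrt(pi)
    by applying Gamma(z + 1) = z * Gamma(z) a total of k times."""
    value = math.sqrt(math.pi)  # Gamma(1/2)
    z = 0.5
    for _ in range(k):
        value *= z   # Gamma(z + 1) = z * Gamma(z)
        z += 1.0
    return value

# Compare the recurrence with the library value for Gamma(1/2), ..., Gamma(7/2).
for k in range(4):
    print(k + 0.5, gamma_half_integer(k), math.gamma(k + 0.5))
```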

BONUS: Volume of the \(n\)-dimensional ball of radius \(r\): \[ V_n(r) = \frac{\pi^{n/2}}{\Gamma\!\left(\tfrac{n}{2} + 1\right)}\,r^n. \] (Recall that the sphere is the surface and the ball is the solid region it bounds.)
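The volume formula is easy to evaluate directly. The sketch below (the function name `ball_volume` is illustrative) recovers the familiar low-dimensional cases: length \(2r\) for \(n = 1\), area \(\pi r^2\) for \(n = 2\), and volume \(\tfrac{4}{3}\pi r^3\) for \(n = 3\).

```python
import math

def ball_volume(n: int, r: float = 1.0) -> float:
    """Volume of the n-dimensional ball of radius r via the gamma function."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

# Unit-ball volumes for the first few dimensions.
for n in range(1, 7):
    print(n, ball_volume(n))
```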

Gamma Distribution

With the gamma function in hand, we can now define a flexible family of continuous distributions on \([0, \infty )\). The gamma distribution arises naturally as the distribution of waiting times: if events occur independently at a constant average rate, the time until the \(\alpha\)-th event, for a positive integer \(\alpha\), follows a gamma distribution.

Definition: Gamma Distribution

A random variable \(X\) has the gamma distribution with shape parameter \(\alpha > 0\) and rate parameter \(\beta > 0\) if its p.d.f. is: \[ f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, \quad x \geq 0, \quad f(x) = 0 \text{ for } x < 0. \tag{2} \] We write \(X \sim \text{Gamma}(\alpha, \beta)\). The mean and variance are: \[ \mathbb{E}[X] = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha}{\beta^2}. \]

That \(f\) integrates to \(1\) follows from the substitution \(t = \beta x\): \[ \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\,dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{1}{\beta^\alpha} \int_0^\infty t^{\alpha-1} e^{-t}\,dt = \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1. \]

The mean and variance of the gamma distribution:

Using the definition of the mean, \[ \begin{align*} \mathbb{E}[X] &= \int_{0}^\infty x f(x)\,dx \\\\ &= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty x^{\alpha}e^{-\beta x}\, dx \end{align*} \] Let \(t = \beta x\), so that \(x = \frac{t}{\beta}\) and \(dx = \frac{1}{\beta}dt\). Substituting these into the integral: \[ \begin{align*} \mathbb{E}[X] &= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty \left(\frac{t}{\beta}\right)^{\alpha}e^{-t}\, \frac{1}{\beta}\,dt \\\\ &= \frac{\beta^\alpha}{\Gamma(\alpha) \beta^{\alpha +1}} \int_{0}^\infty t^{\alpha}e^{-t}\, dt \\\\ &= \frac{\Gamma (\alpha +1)}{\Gamma(\alpha) \beta} \end{align*} \] Since \(\Gamma(\alpha +1) = \alpha \Gamma(\alpha)\), \[ \begin{align*} \mathbb{E}[X] &= \frac{\alpha \Gamma (\alpha)}{\Gamma(\alpha) \beta} \\\\ &= \frac{\alpha}{\beta} \end{align*} \] For the variance, we use the identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X] )^2\).

Compute \(\mathbb{E}[X^2]\) by applying the same substitution \(t = \beta x\): \[ \begin{align*} \mathbb{E}[X^2] &= \int_{0}^\infty x^2 f(x)\,dx \\\\ &= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty x^{\alpha+1}e^{-\beta x}\, dx \\\\ &= \frac{\beta^\alpha}{\Gamma(\alpha) \beta^{\alpha +2}} \int_{0}^\infty t^{\alpha+1}e^{-t}\, dt \\\\ &= \frac{\Gamma (\alpha +2)}{\Gamma(\alpha) \beta^2}. \end{align*} \] Since \(\Gamma(\alpha +2) = (\alpha +1)\Gamma (\alpha +1) = (\alpha +1 )\alpha \Gamma(\alpha)\), \[ \begin{align*} \mathbb{E}[X^2] &= \frac{(\alpha +1 )\alpha \Gamma(\alpha)}{\Gamma(\alpha) \beta^2} \\\\ &= \frac{\alpha(\alpha +1 )}{\beta^2} \end{align*} \] Substituting the results: \[ \begin{align*} \operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\ &= \frac{\alpha(\alpha +1 )}{\beta^2} - \left(\frac{\alpha}{\beta}\right)^2 \\\\ &= \frac{\alpha}{\beta^2} \end{align*} \]
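The normalization, mean, and variance derived above can be checked numerically. The following is a rough sketch using a midpoint rule on a truncated interval; the helper names, the truncation point `upper`, and the example parameters \(\alpha = 3\), \(\beta = 2\) are all assumptions chosen for illustration (the sketch assumes \(\alpha \geq 1\), so the density stays bounded near zero).

```python
import math

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Density (2) in the rate parametrization; returns 0 for x < 0."""
    if x < 0:
        return 0.0
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

def gamma_moment(k: int, alpha: float, beta: float,
                 upper: float = 40.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of E[X^k], truncating the integral at `upper`."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoints never land exactly on x = 0
        total += x ** k * gamma_pdf(x, alpha, beta)
    return h * total

alpha, beta = 3.0, 2.0
total_mass = gamma_moment(0, alpha, beta)              # should be close to 1
mean = gamma_moment(1, alpha, beta)                    # should be close to alpha / beta
variance = gamma_moment(2, alpha, beta) - mean ** 2    # should be close to alpha / beta**2
print(total_mass, mean, variance)
```

For these parameters the closed forms give \(\mathbb{E}[X] = 3/2\) and \(\operatorname{Var}(X) = 3/4\), which the numerical values should approximate.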

Many references use the shape–scale parametrization \(X \sim \text{Gamma}(k, \theta)\) with \(k = \alpha\) and \(\theta = 1/\beta\), in which the density becomes \[ f(x) = \frac{1}{\Gamma(k)\,\theta^k}\,x^{k-1}\,e^{-x/\theta}, \quad x \geq 0. \] We adopt the rate parametrization \((\alpha, \beta)\) throughout this page; both conventions are equivalent and are used interchangeably in the literature.
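The choice of parametrization matters in practice because libraries differ. Python's standard-library `random.gammavariate(alpha, beta)`, for instance, documents its second argument as a scale, so a rate \(\beta\) in this page's convention has to be passed as \(1/\beta\). The sketch below illustrates this; the wrapper name `sample_gamma_rate`, the seed, and the sample size are illustrative choices.

```python
import random

def sample_gamma_rate(alpha: float, rate: float, rng: random.Random) -> float:
    """Draw from Gamma(alpha, rate) in the rate parametrization.

    random.gammavariate expects (shape, scale), so we pass scale = 1 / rate.
    """
    return rng.gammavariate(alpha, 1.0 / rate)

rng = random.Random(0)  # fixed seed so the run is reproducible
alpha, rate = 3.0, 2.0
samples = [sample_gamma_rate(alpha, rate, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, alpha / rate)  # sample mean vs. theoretical mean alpha / rate
```

Passing the rate directly as the second argument would instead produce samples with mean \(\alpha\beta\) rather than \(\alpha/\beta\).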

The exponential distribution is a special case of the gamma distribution: \[ \text{Exp}(\lambda) \;\equiv\; \text{Gamma}(1, \beta = \lambda). \] Setting \(\alpha = 1\) and \(\beta = \lambda\), p.d.f. (2) becomes \[ f(x) = \lambda e^{-\lambda x}, \qquad x \geq 0. \] The exponential distribution models the waiting time between events in a process where events occur continuously and independently at a constant average rate \(\lambda\). For example, if a machine fails on average once every 20 years, then the time to failure (in years) is modeled by the exponential distribution with \(\lambda = \frac{1}{20}\).

The mean and variance of an exponential distribution are given by \[ \mathbb{E}[X] = \frac{1}{\lambda}, \qquad \operatorname{Var}(X) = \frac{1}{\lambda^2}. \] In particular, the mean equals the standard deviation. Its c.d.f. is given by \[ F(x) = \lambda \int_{0} ^x e^{-\lambda u}\, du = 1 - e^{-\lambda x}, \quad x \geq 0, \] and \(F(x) = 0\) for \(x < 0\).
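To make the c.d.f. concrete, the sketch below evaluates it for the machine-failure example with \(\lambda = 1/20\) per year. The function name `exp_cdf` and the chosen time horizons are illustrative.

```python
import math

def exp_cdf(x: float, lam: float) -> float:
    """C.d.f. of Exp(lam): 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    return -math.expm1(-lam * x)  # numerically stable form of 1 - exp(-lam * x)

lam = 1 / 20                      # one failure every 20 years on average
p_within_10 = exp_cdf(10, lam)    # P(failure within 10 years) = 1 - e^{-1/2}
p_within_20 = exp_cdf(20, lam)    # P(failure within 20 years) = 1 - e^{-1}
print(p_within_10, p_within_20)
```

Note that the probability of failure within the mean lifetime of 20 years is \(1 - e^{-1}\), not \(1/2\): the exponential distribution is skewed, so its mean and median differ.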

More generally, when \(\alpha = n\) is a positive integer, \(\text{Gamma}(n, \beta)\) is known as the Erlang distribution — the distribution of the sum of \(n\) independent \(\text{Exp}(\beta)\) random variables. Making this statement rigorous requires the language of independent random variables and product measures, which we develop in Limit Theorems & Product Measures.
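Although the rigorous statement is deferred, the claim can be explored by simulation. The sketch below sums \(n\) independent exponential draws and compares the sample mean and variance with the \(\text{Gamma}(n, \beta)\) values \(n/\beta\) and \(n/\beta^2\). The helper name `erlang_sample`, the seed, and the parameters are illustrative assumptions; `random.expovariate` takes the rate as its argument.

```python
import random

def erlang_sample(n: int, rate: float, rng: random.Random) -> float:
    """Sum of n independent Exp(rate) draws."""
    return sum(rng.expovariate(rate) for _ in range(n))

rng = random.Random(1)  # fixed seed for reproducibility
n, rate = 4, 2.0
draws = [erlang_sample(n, rate, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, n / rate)        # compare with n / beta
print(sample_var, n / rate ** 2)    # compare with n / beta^2
```

Agreement of the first two moments is only suggestive evidence, not a proof that the two distributions coincide.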

Beta Function

Just as the gamma function underpins the gamma distribution, the beta function provides the normalizing constant for distributions supported on bounded intervals. Its intimate connection to the gamma function — expressed through the Beta–Gamma identity below — is what makes the beta distribution analytically tractable.

Definition: Beta Function

The beta function is defined as: \[ B(a, b) = \int_{0}^1 t^{a-1}(1-t)^{b-1}\, dt, \quad a > 0,\; b > 0. \] As with the gamma function, the integral converges for all complex \(a, b\) with \(\operatorname{Re}(a) > 0\) and \(\operatorname{Re}(b) > 0\); we restrict to the real positive case throughout this page.

Theorem: Beta–Gamma Identity

The beta function can be represented by the gamma function: \[ B(a, b) = \frac{\Gamma (a) \Gamma (b)}{\Gamma (a+b)} \]

Proof:

Consider the product of the gamma function evaluated at two arguments \(a, b > 0\). \[ \begin{align*} \Gamma (a) \Gamma(b) &= \int_{0}^\infty u^{a-1}e^{-u}du \cdot \int_{0}^\infty v^{b-1}e^{-v}dv \\\\ &= \int_{0}^\infty \int_{0}^\infty u^{a-1} v^{b-1} e^{-(u+v)} dudv \end{align*} \] We perform the change of variables \((u, v) \mapsto (s, t)\) given by \[ s = u + v, \qquad t = \frac{u}{u+v}, \] with inverse \(u = st\), \(v = (1-t)s\). The region \((u, v) \in (0,\infty)^2\) maps bijectively onto \((s, t) \in (0, \infty) \times (0, 1)\). The Jacobian is \[ \frac{\partial(u, v)}{\partial(s, t)} = \det\!\begin{pmatrix} \partial u/\partial s & \partial u/\partial t \\ \partial v/\partial s & \partial v/\partial t \end{pmatrix} = \det\!\begin{pmatrix} t & s \\ 1-t & -s \end{pmatrix} = -st - s(1-t) = -s, \] so \(du\,dv = |{-s}|\,ds\,dt = s\,ds\,dt\). Substituting: \[ \begin{align*} \Gamma (a) \Gamma(b) &= \int_{0}^\infty \int_{0}^1 (st)^{a-1} ((1-t)s)^{b-1} e^{-s} \, s\,dt\,ds \\\\ &= \int_{0}^\infty s^{(a+b)-1} e^{-s} ds \cdot \int_{0}^1 t^{a-1} (1-t)^{b-1} dt \\\\ &= \Gamma (a+b) \cdot B(a, b), \end{align*} \] where the second line uses \(s \cdot s^{a-1} \cdot s^{b-1} = s^{(a+b)-1}\) and Fubini's theorem to separate the integrals. Therefore, \[ B(a, b) = \frac{\Gamma (a) \Gamma (b)}{\Gamma (a+b)}. \]
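The identity can be checked numerically by comparing a direct quadrature of the defining integral with the gamma-function ratio. The sketch below uses a simple midpoint rule and assumes \(a, b \geq 1\) so that the integrand is bounded on \((0, 1)\); the function names and the test parameters are illustrative.

```python
import math

def beta_integral(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of B(a, b) = int_0^1 t^(a-1) (1-t)^(b-1) dt.

    Intended for a, b >= 1, where the integrand has no endpoint singularity.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return h * total

def beta_via_gamma(a: float, b: float) -> float:
    """B(a, b) computed from the Beta-Gamma identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

a, b = 2.5, 3.5
print(beta_integral(a, b), beta_via_gamma(a, b))
```

For integer arguments the identity reduces to factorials; for example \(B(2, 3) = \frac{1!\,2!}{4!} = \frac{1}{12}\).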

Beta Distribution

The beta function leads directly to a distribution on the unit interval \([0, 1]\). This makes the beta distribution the natural choice for modeling quantities that represent probabilities or proportions, and it is the reason the beta distribution plays a central role as a conjugate prior in Bayesian inference.

Definition: Beta Distribution

A random variable \(X\) has a beta distribution on the unit interval with parameters \(a > 0\) and \(b > 0\) if its p.d.f. is: \[ f(x) = \frac{1}{B(a, b)}x^{a-1}(1-x)^{b-1}, \quad x \in (0, 1). \tag{3} \] We write \(X \sim \text{Beta}(a, b)\). The mean and variance are: \[ \mathbb{E}[X] = \frac{a}{a+b}, \qquad \operatorname{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}. \]

We use the open interval \((0, 1)\) because the density diverges at \(x = 0\) when \(a < 1\) (since \(x^{a-1} \to \infty\)) and at \(x = 1\) when \(b < 1\); the boundary points \(\{0, 1\}\) have Lebesgue measure zero and contribute nothing to integrals, so probabilities such as \(P(0 \leq X \leq 1) = 1\) remain unchanged.

Mean and Variance of the Beta Distribution:

We can rewrite (3) using the Beta–Gamma identity: \[ f(x) = \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)}x^{a-1} (1-x)^{b-1}, \quad x \in (0, 1). \] Using the definition of the mean, \[ \begin{align*} \mathbb{E}[X] &= \int_{0}^1 x f(x)\,dx \\\\ &= \int_{0}^1 x \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)}x^{a-1} (1-x)^{b-1}\, dx \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \int_{0}^1 x^{a} (1-x)^{b-1}\, dx \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} B(a+1, b) \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{\Gamma (a+1) \Gamma (b)}{\Gamma (a+b+1)} \end{align*} \] Since \(\Gamma (a+1) = a \Gamma (a)\) and, similarly, \(\Gamma (a+b+1) = (a+b)\Gamma (a+b)\), \[ \begin{align*} \mathbb{E}[X] &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{a\Gamma (a) \Gamma (b)}{(a+b)\Gamma (a+b)} \\\\ &= \frac{a}{a+b} \end{align*} \] For the variance, we use the identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X] )^2\).

Compute \(\mathbb{E}[X^2]\): \[ \begin{align*} \mathbb{E}[X^2] &= \int_{0}^1 x^2 f(x)\,dx \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \int_{0}^1 x^{a+1} (1-x)^{b-1}\, dx \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} B(a+2, b) \\\\ &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{\Gamma (a+2) \Gamma (b)}{\Gamma (a+b+2)} \end{align*} \] Since \(\Gamma (a+2) = (a+1) \Gamma (a+1) = (a+1) a \Gamma (a)\) and, similarly, \(\Gamma (a+b+2) = (a+b+1)(a+b)\Gamma (a+b)\), \[ \begin{align*} \mathbb{E}[X^2] &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{(a+1)a\Gamma (a) \Gamma (b)}{(a+b+1)(a+b)\Gamma (a+b)} \\\\ &= \frac{a(a+1)}{(a+b)(a+b+1)}. \end{align*} \] Substituting the results: \[ \begin{align*} \operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\ &= \frac{a(a+1)}{(a+b)(a+b+1)} - \left(\frac{a}{a+b}\right)^2 \\\\ &= \frac{a(a+1)(a+b) - a^2 (a+b+1) }{(a+b)^2(a+b+1)} \\\\ &= \frac{ab}{(a+b)^2(a+b+1)} \end{align*} \]
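The closed-form mean and variance can be compared against simulated draws. The sketch below relies on the standard-library `random.betavariate`; the helper names, seed, sample size, and the example parameters \(a = 2\), \(b = 5\) are illustrative assumptions.

```python
import random

def beta_mean(a: float, b: float) -> float:
    """Closed-form mean a / (a + b)."""
    return a / (a + b)

def beta_variance(a: float, b: float) -> float:
    """Closed-form variance ab / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

rng = random.Random(42)  # fixed seed for reproducibility
a, b = 2.0, 5.0
samples = [rng.betavariate(a, b) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
print(sample_mean, beta_mean(a, b))
print(sample_var, beta_variance(a, b))
```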

The beta family includes several important special cases. The simplest is the uniform distribution, which arises when both parameters equal one.

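The uniform distribution on the unit interval is a special case of the beta distribution: \[ U[0, 1] \;\equiv\; \text{Beta}(1, 1), \quad \text{so } f(x) = 1 \text{ on } [0, 1]. \] In general, the p.d.f. of the uniform distribution \(X \sim \text{Uniform}[a, b]\) with \(a < b\) is given by \[ f(x) = \frac{1}{b - a}, \qquad x \in [a, b], \] and \(f(x) = 0\) otherwise. Note that here \(a\) and \(b\) denote the endpoints of the interval, not the beta parameters: only \(\text{Uniform}[0, 1]\) coincides with \(\text{Beta}(1, 1)\), while a general \(\text{Uniform}[a, b]\) is a shifted and rescaled copy of it.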

The uniform distribution models situations in which every value in the interval \([a, b]\) is equally likely. For example, it is the standard building block for random number generation in programming languages.

The mean and variance of the uniform distribution are given by \[ \mathbb{E}[X] = \frac{a+b}{2} \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12} \] and its c.d.f. is given by \[ F(x) = \frac{x-a}{b-a} \qquad x \in [a, b]. \] Note: if \(x < a\), \(\, F(x) = 0\) and if \(x > b\), \(\, F(x) = 1\).

Mean and Variance of the Uniform Distribution:

Using the definition of the mean, \[ \begin{align*} \mathbb{E}[X] &= \int_{a}^b x \frac{1}{b-a}\,dx \\\\ &= \frac{1}{b-a} \left(\frac{b^2 - a^2}{2}\right) \\\\ &= \frac{a+b}{2} \end{align*} \] Also, \[ \begin{align*} \mathbb{E}[X^2] &= \int_{a}^b x^2 \frac{1}{b-a}\,dx \\\\ &= \frac{1}{b-a} \left(\frac{b^3 - a^3}{3}\right) \\\\ &= \frac{a^2 +ab + b^2}{3} \end{align*} \] and then \[ \begin{align*} \operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\ &= \frac{a^2 +ab + b^2}{3} - \left(\frac{a+b}{2}\right)^2 \\\\ &= \frac{4(a^2 + ab +b^2)-3(a^2 + 2ab + b^2)}{12} \\\\ &= \frac{(b-a)^2}{12}. \end{align*} \]
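The algebra in the last step can be verified with exact rational arithmetic. The sketch below follows the derivation term by term using `fractions.Fraction`; the function name `uniform_moments` and the example endpoints are illustrative.

```python
from fractions import Fraction

def uniform_moments(a, b):
    """Exact mean and variance of Uniform[a, b], following the derivation:
    E[X] = (a + b) / 2, E[X^2] = (a^2 + ab + b^2) / 3, Var = E[X^2] - E[X]^2."""
    a, b = Fraction(a), Fraction(b)
    mean = (a + b) / 2
    second_moment = (a * a + a * b + b * b) / 3
    variance = second_moment - mean ** 2
    return mean, variance

mean, variance = uniform_moments(2, 5)
# The variance should agree exactly with (b - a)^2 / 12.
print(mean, variance, Fraction((5 - 2) ** 2, 12))
```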

Insight: Why Beta and Gamma? (Conjugate Priors)

In Bayesian inference, a conjugate prior is a prior distribution that, when multiplied by the likelihood, results in a posterior distribution of the same functional form (family) as the prior.

  • Beta distribution is the conjugate prior for Bernoulli and Binomial likelihoods. This makes it the standard choice for modeling uncertainty about a probability (e.g., click-through rates).
  • Gamma distribution is the conjugate prior for the rate of a Poisson distribution and for the precision (inverse variance) of a Gaussian with known mean.

Using these distributions allows us to update our beliefs with simple algebraic additions to the parameters, avoiding the need for complex numerical integration.
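The "simple algebraic additions" can be written out explicitly. The sketch below uses the standard conjugate update rules: a \(\text{Beta}(a, b)\) prior with \(k\) successes in \(n\) Bernoulli trials gives a \(\text{Beta}(a + k,\, b + n - k)\) posterior, and a \(\text{Gamma}(\alpha, \beta)\) prior (rate parametrization) on a Poisson rate with observed counts \(x_1, \dots, x_n\) gives \(\text{Gamma}(\alpha + \sum_i x_i,\, \beta + n)\). The function names and the example data are hypothetical.

```python
def beta_binomial_update(a: float, b: float, successes: int, trials: int):
    """Posterior Beta parameters after observing `successes` out of `trials`."""
    return a + successes, b + (trials - successes)

def gamma_poisson_update(alpha: float, beta: float, counts):
    """Posterior Gamma (shape, rate) parameters after observing Poisson counts."""
    return alpha + sum(counts), beta + len(counts)

# Hypothetical click-through data: uniform Beta(1, 1) prior, 7 clicks in 20 views.
post_a, post_b = beta_binomial_update(1.0, 1.0, successes=7, trials=20)
posterior_mean = post_a / (post_a + post_b)  # mean of Beta(post_a, post_b)

# Hypothetical event counts over four periods with a Gamma(2, 1) prior on the rate.
post_alpha, post_beta = gamma_poisson_update(2.0, 1.0, counts=[3, 0, 2, 4])
print(post_a, post_b, posterior_mean, post_alpha, post_beta)
```

No integration is performed anywhere: the posterior is obtained purely by adding observed counts to the prior parameters.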

Most common distributions in machine learning belong to the exponential family, and every exponential-family likelihood admits a conjugate prior, so closed-form updates of this kind are widely available.

Interactive Visualization

Below is an interactive visualization to help you understand the Gamma and Beta distributions. You can adjust the parameters using the sliders and observe how the probability density function changes. Try the special cases to see important variants of these distributions.

With the Gamma and Beta distributions established, we now turn to the single most important distribution in all of statistics: the normal (Gaussian) distribution.