Gamma Function
Before introducing the Gamma and Beta distributions, we need the special functions that serve as their building blocks.
The gamma function extends the factorial to non-integer (and even complex) arguments. This generalization
is essential: the normalizing constants of many continuous distributions involve factorials of non-integer parameters,
and the gamma function provides the mathematically rigorous way to handle them.
Definition: Gamma Function
The gamma function is defined as:
\[
\Gamma(z) = \int_{0}^{\infty} t^{z-1}e^{-t}\, dt, \quad z > 0.
\]
We work with real positive \(z\) throughout this page; the integral in fact converges for
all complex \(z\) with \(\operatorname{Re}(z) > 0\), and the gamma function extends to
a meromorphic function on the entire complex plane via analytic continuation. We do not
use the complex extension here.
Theorem: Gamma Function Extends the Factorial
For any positive integer \(n\),
\[
\Gamma (n+1) = n!
\]
Proof:
We first establish the recurrence \(\Gamma(n+1) = n\,\Gamma(n)\) for any \(n > 0\) by
integration by parts on
\[
\Gamma(n+1) = \int_0^\infty t^{n} e^{-t}\, dt.
\]
Take \(u = t^n\) (so \(du = n\,t^{n-1}\,dt\)) and \(dv = e^{-t}\,dt\) (so \(v = -e^{-t}\)). Then
\[
\begin{align*}
\Gamma(n+1) &= \bigl[-t^n e^{-t}\bigr]_0^\infty + \int_0^\infty n\,t^{n-1} e^{-t}\,dt \\\\
&= n\int_0^\infty t^{n-1} e^{-t}\,dt \\\\
&= n\,\Gamma(n),
\end{align*}
\]
using that the boundary term vanishes: \(t^n e^{-t} \to 0\) as \(t \to \infty\) (exponential
decay dominates polynomial growth) and \(t^n e^{-t} \to 0\) as \(t \to 0^+\) for any \(n > 0\).
We now prove \(\Gamma(n+1) = n!\) for positive integers \(n\) by induction on \(n\).
Base case (\(n = 1\)):
\[
\Gamma(2) = \int_0^\infty t\,e^{-t}\,dt
= \bigl[-t\,e^{-t}\bigr]_0^\infty + \int_0^\infty e^{-t}\,dt
= 0 + 1 = 1 = 1!,
\]
again by integration by parts with \(u = t\), \(dv = e^{-t}\,dt\).
Inductive step: Assume \(\Gamma(k+1) = k!\) for some integer \(k \geq 1\).
By the recurrence,
\[
\Gamma(k+2) = (k+1)\,\Gamma(k+1) = (k+1)\,k! = (k+1)!.
\]
By induction, \(\Gamma(n+1) = n!\) for all positive integers \(n\); equivalently,
\(\Gamma(n) = (n-1)!\).
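As a quick numerical sanity check of the theorem (not a substitute for the proof), the following minimal sketch compares the standard-library function math.gamma with math.factorial for small \(n\). The helper name gamma_matches_factorial is introduced here purely for illustration.

```python
import math

def gamma_matches_factorial(n_max: int = 15) -> bool:
    """Check Gamma(n + 1) == n! for n = 1, ..., n_max, up to floating-point rounding."""
    for n in range(1, n_max + 1):
        if not math.isclose(math.gamma(n + 1), math.factorial(n), rel_tol=1e-12):
            return False
    return True

print(gamma_matches_factorial())
```

The comparison uses a relative tolerance because math.gamma returns a floating-point number, while math.factorial returns an exact integer.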
Let's compute the most iconic value of the gamma function.
\[
\Gamma (\frac{1}{2}) = \int_{0} ^\infty t^{\frac{-1}{2}}e^{-t} dt
\]
Let \(t = u^2\), so \(dt = 2u\,du\). Then
\[
\begin{align*}
\Gamma (\frac{1}{2}) &= \int_{0} ^\infty u^{2\cdot \frac{-1}{2}}e^{-u^2} 2udu \\\\
&= 2\int_{0} ^\infty e^{-u^2} du \\\\
&= \int_{-\infty} ^\infty e^{-u^2} du \tag{1} \\\\
&= \sqrt{\pi}
\end{align*}
\]
In line (1) we used the symmetry of the even function \(e^{-u^2}\) to extend the integral
to the full real line, and the final equality is the famous Gaussian integral:
\[
\int_{-\infty} ^\infty e^{-x^2} dx = \sqrt{\pi}.
\]
This Gaussian integral is proved independently
(via polar coordinates) on the normal distribution page; we use it here as an external fact.
Although the factorial \(n!\) is defined only for non-negative integers, the gamma function
extends it to all positive arguments via
\[
(z)! \;:=\; \Gamma(z+1).
\]
The recurrence \(\Gamma(z+1) = z\,\Gamma(z)\), which the integration-by-parts argument above establishes for every real
\(z > 0\) and not only for integers, translates into the familiar factorial-style
recursion \((z)! = z\,(z-1)!\). Combined with \(\Gamma(1/2) = \sqrt{\pi}\), this yields:
\[
\begin{align*}
\Gamma(3/2) &= \tfrac{1}{2}\,\Gamma(1/2) = \tfrac{1}{2}\sqrt{\pi}, \\\\
\Gamma(5/2) &= \tfrac{3}{2}\,\Gamma(3/2) = \tfrac{3}{4}\sqrt{\pi}, \\\\
\Gamma(7/2) &= \tfrac{5}{2}\,\Gamma(5/2) = \tfrac{15}{8}\sqrt{\pi}.
\end{align*}
\]
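The half-integer values can be checked numerically. The sketch below, which assumes nothing beyond Python's standard math module, compares math.gamma against the closed forms built from \(\Gamma(1/2) = \sqrt{\pi}\) and the recurrence; the dictionary name closed_forms is illustrative.

```python
import math

sqrt_pi = math.sqrt(math.pi)

# Closed forms obtained from Gamma(1/2) = sqrt(pi) and Gamma(z + 1) = z * Gamma(z).
closed_forms = {
    0.5: sqrt_pi,
    1.5: sqrt_pi / 2,
    2.5: 3 * sqrt_pi / 4,
    3.5: 15 * sqrt_pi / 8,
}

for z, expected in closed_forms.items():
    print(f"Gamma({z}) = {math.gamma(z):.12f}   closed form = {expected:.12f}")
```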
BONUS: Volume of the \(n\)-dimensional ball of radius \(r\):
\[
V_n(r) = \frac{\pi^{n/2}}{\Gamma\!\left(\tfrac{n}{2} + 1\right)}\,r^n.
\]
(Recall that the sphere is the surface and the ball is the solid region it bounds.)
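To see the formula in action, here is a minimal sketch that evaluates \(V_n(r)\) with math.gamma and can be compared against the familiar low-dimensional cases (an interval of length \(2r\), a disc of area \(\pi r^2\), and a ball of volume \(\tfrac{4}{3}\pi r^3\)). The function name ball_volume is an assumption made for this example.

```python
import math

def ball_volume(n: int, r: float = 1.0) -> float:
    """Volume of the n-dimensional ball of radius r: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

r = 2.0
print(ball_volume(1, r), 2 * r)                      # length of the interval [-r, r]
print(ball_volume(2, r), math.pi * r ** 2)           # area of the disc
print(ball_volume(3, r), 4 / 3 * math.pi * r ** 3)   # volume of the 3-dimensional ball
```

For odd \(n\) the argument \(n/2 + 1\) is a half-integer, which is exactly where the non-integer values of the gamma function are needed.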
Gamma Distribution
With the gamma function in hand, we can now define a flexible family of continuous distributions
on \([0, \infty )\). The gamma distribution arises naturally as the distribution
of waiting times: if events occur independently at a constant average rate, then for a positive integer \(\alpha\) the time until the
\(\alpha\)-th event follows a gamma distribution with shape \(\alpha\).
Definition: Gamma Distribution
A random variable \(X\) has the gamma distribution with shape parameter
\(\alpha > 0\) and rate parameter \(\beta > 0\) if its p.d.f. is:
\[
f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, \quad x \geq 0,
\quad f(x) = 0 \text{ for } x < 0. \tag{2}
\]
We write \(X \sim \text{Gamma}(\alpha, \beta)\). The mean and variance are:
\[
\mathbb{E}[X] = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha}{\beta^2}.
\]
That \(f\) integrates to \(1\) follows from the substitution \(t = \beta x\):
\[
\int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\,dx
= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{1}{\beta^\alpha} \int_0^\infty t^{\alpha-1} e^{-t}\,dt
= \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1.
\]
The mean and variance of the gamma distribution:
Using the definition of the mean,
\[
\begin{align*}
\mathbb{E}[X] &= \int_{0}^\infty x f(x)dx \\\\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty x^{\alpha}e^{-\beta x} dx
\end{align*}
\]
Let \(t = \beta x\) and so \(x = \frac{t}{\beta}\) and \(dx = \frac{1}{\beta}dt\).
Substituting these into the equation:
\[
\begin{align*}
\mathbb{E}[X] &= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty (\frac{t}{\beta})^{\alpha}e^{-t} \frac{1}{\beta}dt \\\\
&= \frac{\beta^\alpha}{\Gamma(\alpha) \beta^{\alpha +1}} \int_{0}^\infty t^{\alpha}e^{-t} dt \\\\
&= \frac{\Gamma (\alpha +1)}{\Gamma(\alpha) \beta}
\end{align*}
\]
Since \(\Gamma(\alpha +1) = \alpha \Gamma(\alpha)\),
\[
\begin{align*}
\mathbb{E}[X] &= \frac{\alpha \Gamma (\alpha)}{\Gamma(\alpha) \beta} \\\\
&= \frac{\alpha}{\beta}
\end{align*}
\]
For the variance, we use the identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X] )^2\).
We compute \(\mathbb{E}[X^2]\) by applying the same substitution \(t = \beta x\) as above:
\[
\begin{align*}
\mathbb{E}[X^2] &= \int_{0}^\infty x^2 f(x)dx \\\\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^\infty x^{\alpha+1}e^{-\beta x} dx \\\\
&= \frac{\beta^\alpha}{\Gamma(\alpha) \beta^{\alpha +2}} \int_{0}^\infty t^{\alpha+1}e^{-t} dt \\\\
&= \frac{\Gamma (\alpha +2)}{\Gamma(\alpha) \beta^2}.
\end{align*}
\]
Since \(\Gamma(\alpha +2) = (\alpha +1)\Gamma (\alpha +1) = (\alpha +1 )\alpha \Gamma(\alpha)\),
\[
\begin{align*}
\mathbb{E}[X^2] &= \frac{(\alpha +1 )\alpha \Gamma(\alpha)}{\Gamma(\alpha) \beta^2} \\\\
&= \frac{\alpha(\alpha +1 )}{\beta^2}
\end{align*}
\]
Substitute the results:
\[
\begin{align*}
\operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\
&= \frac{\alpha(\alpha +1 )}{\beta^2} - (\frac{\alpha}{\beta})^2 \\\\
&= \frac{\alpha}{\beta^2}
\end{align*}
\]
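The normalization, mean, and variance can also be checked numerically. The following is a rough sketch under stated assumptions: it uses a plain midpoint rule, truncates the integral at a finite upper limit, and picks illustrative parameter values with \(\alpha \geq 1\) so that the density stays bounded near zero. The names gamma_pdf and midpoint_integral are hypothetical helpers, not part of any library.

```python
import math

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Density (2) in the rate parametrization; zero for x < 0."""
    if x < 0:
        return 0.0
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

def midpoint_integral(g, lower: float, upper: float, steps: int = 100_000) -> float:
    """Approximate the integral of g over [lower, upper] with the midpoint rule."""
    h = (upper - lower) / steps
    return h * sum(g(lower + (i + 0.5) * h) for i in range(steps))

alpha, beta = 3.0, 2.0   # illustrative values (assumption), alpha >= 1
upper = 40.0             # truncation point; the tail beyond it is negligible here

total = midpoint_integral(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = midpoint_integral(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
second_moment = midpoint_integral(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = second_moment - mean ** 2

print(total)                          # should be close to 1
print(mean, alpha / beta)             # should be close to alpha / beta
print(variance, alpha / beta ** 2)    # should be close to alpha / beta^2
```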
Many references use the shape–scale parametrization
\(X \sim \text{Gamma}(k, \theta)\) with \(k = \alpha\) and \(\theta = 1/\beta\),
in which the density becomes
\[
f(x) = \frac{1}{\Gamma(k)\,\theta^k}\,x^{k-1}\,e^{-x/\theta}, \quad x \geq 0.
\]
We adopt the rate parametrization \((\alpha, \beta)\) throughout this page; both
conventions are equivalent and are used interchangeably in the literature.
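The choice of convention matters when calling library code. As one example, Python's standard-library random.gammavariate(alpha, beta) documents its second argument as a scale, so a rate \(\beta\) in this page's notation has to be passed as \(1/\beta\). The sketch below wraps that conversion; the wrapper name sample_gamma_rate, the seed, and the sample size are assumptions chosen for illustration.

```python
import random

def sample_gamma_rate(alpha: float, rate: float, rng: random.Random) -> float:
    """Draw from Gamma(alpha, rate) by converting the rate to a scale theta = 1 / rate."""
    return rng.gammavariate(alpha, 1.0 / rate)

rng = random.Random(0)        # fixed seed so the run is reproducible
alpha, rate = 3.0, 2.0        # illustrative values
draws = [sample_gamma_rate(alpha, rate, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)

print(sample_mean, alpha / rate)   # the sample mean should be near alpha / rate
```

Passing the rate directly instead of its reciprocal would give a sample mean near \(\alpha\beta\) rather than \(\alpha/\beta\), which is a common source of silent errors.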
The Exponential distribution is a special case of the gamma distribution:
\[
\text{Exp}(\lambda) \;\equiv\; \text{Gamma}(1, \beta = \lambda).
\]
So, p.d.f. (2) becomes
\[
f(x) = \lambda e^{-\lambda x} \qquad x \geq 0.
\]
The exponential distribution models the waiting time between events in a process in which events occur continuously and independently at
an average rate \(\lambda\). For example, if a machine fails on average once every 20 years, then its time to
failure (in years) can be modeled by the exponential distribution with \(\lambda = \frac{1}{20}\) per year.
The mean and variance of an exponential distribution are given by
\[
\mathbb{E}[X] = \frac{1}{\lambda} \qquad \operatorname{Var}(X) = \frac{1}{\lambda^2}.
\]
In particular, the mean equals the standard deviation: both are \(1/\lambda\).
Also, its c.d.f. is given by
\[
F(x) = \lambda \int_{0} ^x e^{-\lambda u} du = 1 - e^{-\lambda x}, \quad x \geq 0,
\]
and \(F(x) = 0\) for \(x < 0\).
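The c.d.f. makes concrete probability statements easy. As a small sketch, the code below evaluates \(F\) for the machine example with \(\lambda = 1/20\) per year; the function name exponential_cdf and the 10-year horizon are chosen here for illustration.

```python
import math

def exponential_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

lam = 1 / 20                                    # one failure per 20 years on average
p_fail_within_10 = exponential_cdf(10.0, lam)   # equals 1 - exp(-0.5), about 0.393
p_fail_within_20 = exponential_cdf(20.0, lam)   # equals 1 - exp(-1), about 0.632

print(p_fail_within_10, p_fail_within_20)
```

Note that the probability of failing before the mean lifetime of 20 years is about 63%, not 50%, because the exponential density is skewed to the right.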
More generally, when \(\alpha = n\) is a positive integer, \(\text{Gamma}(n, \beta)\) is known as the
Erlang distribution — the distribution of the sum of \(n\) independent
\(\text{Exp}(\beta)\) random variables. Making this statement rigorous requires the language of
independent random variables and product measures, which we develop in
Limit Theorems & Product Measures.
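Although the rigorous statement is deferred, the claim can be illustrated empirically. The simulation sketch below sums \(n\) draws from random.expovariate (whose argument is the rate) and compares the sample mean and variance with the gamma values \(n/\beta\) and \(n/\beta^2\). The seed, sample size, and parameter values are arbitrary assumptions, and agreement of two moments is only suggestive, not a proof.

```python
import random
import statistics

rng = random.Random(1)    # fixed seed for reproducibility
n, beta = 4, 2.0          # illustrative: sum of 4 independent Exp(2) waiting times

# Each entry is the total waiting time until the n-th event.
sums = [sum(rng.expovariate(beta) for _ in range(n)) for _ in range(100_000)]

sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)

print(sample_mean, n / beta)        # Gamma(n, beta) mean
print(sample_var, n / beta ** 2)    # Gamma(n, beta) variance
```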
Beta Function
Just as the gamma function underpins the gamma distribution, the beta function
provides the normalizing constant for distributions supported on bounded intervals. Its intimate
connection to the gamma function — expressed through the Beta–Gamma identity below — is what makes the beta
distribution analytically tractable.
Definition: Beta Function
The beta function is defined as:
\[
B(a, b) = \int_{0}^1 t^{a-1}(1-t)^{b-1}\, dt, \quad a > 0,\; b > 0.
\]
As with the gamma function, the integral converges for all complex \(a, b\) with
\(\operatorname{Re}(a) > 0\) and \(\operatorname{Re}(b) > 0\); we restrict to the
real positive case throughout this page.
Theorem: Beta–Gamma Identity
The beta function can be represented by the gamma function:
\[
B(a, b) = \frac{\Gamma (a) \Gamma (b)}{\Gamma (a+b)}
\]
Proof:
Consider the product of the gamma function evaluated at two arguments \(a, b > 0\), written with separate integration variables \(u\) and \(v\):
\[
\begin{align*}
\Gamma (a) \Gamma(b) &= \int_{0}^\infty u^{a-1}e^{-u}du \cdot \int_{0}^\infty v^{b-1}e^{-v}dv \\\\
&= \int_{0}^\infty \int_{0}^\infty u^{a-1} v^{b-1} e^{-(u+v)} dudv
\end{align*}
\]
We perform the change of variables \((u, v) \mapsto (s, t)\) given by
\[
s = u + v, \qquad t = \frac{u}{u+v},
\]
with inverse \(u = st\), \(v = (1-t)s\). The region \((u, v) \in (0,\infty)^2\) maps bijectively
onto \((s, t) \in (0, \infty) \times (0, 1)\). The Jacobian is
\[
\frac{\partial(u, v)}{\partial(s, t)}
= \det\!\begin{pmatrix} \partial u/\partial s & \partial u/\partial t \\ \partial v/\partial s & \partial v/\partial t \end{pmatrix}
= \det\!\begin{pmatrix} t & s \\ 1-t & -s \end{pmatrix}
= -st - s(1-t) = -s,
\]
so \(du\,dv = |{-s}|\,ds\,dt = s\,ds\,dt\). Substituting:
\[
\begin{align*}
\Gamma (a) \Gamma(b) &= \int_{0}^\infty \int_{0}^1 (st)^{a-1} ((1-t)s)^{b-1} e^{-s} \, s\,dt\,ds \\\\
&= \int_{0}^\infty s^{(a+b)-1} e^{-s} ds \cdot \int_{0}^1 t^{a-1} (1-t)^{b-1} dt \\\\
&= \Gamma (a+b) \cdot B(a, b),
\end{align*}
\]
where the second line uses \(s \cdot s^{a-1} \cdot s^{b-1} = s^{(a+b)-1}\) and Fubini's theorem
to separate the integrals. Therefore,
\[
B(a, b) = \frac{\Gamma (a) \Gamma (b)}{\Gamma (a+b)}.
\]
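The identity can be checked numerically by comparing a direct approximation of the defining integral with the gamma-function expression. This is a minimal sketch assuming a midpoint rule and parameter values \(a, b \geq 1\), so that the integrand has no endpoint singularities; the function names are hypothetical.

```python
import math

def beta_via_gamma(a: float, b: float) -> float:
    """B(a, b) computed from the Beta-Gamma identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_via_integral(a: float, b: float, steps: int = 200_000) -> float:
    """B(a, b) approximated by the midpoint rule on (0, 1); intended for a, b >= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return h * total

a, b = 2.5, 3.5   # illustrative non-integer values
print(beta_via_integral(a, b), beta_via_gamma(a, b))
```

For integer arguments the identity reduces to factorials; for instance \(B(2, 3) = \frac{1!\,2!}{4!} = \frac{1}{12}\).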
Beta Distribution
The beta function leads directly to a distribution on the unit interval \([0, 1]\).
This makes the beta distribution the natural choice for modeling quantities that
represent probabilities or proportions, and it is the reason the beta distribution plays a central
role as a conjugate prior in Bayesian inference.
Definition: Beta Distribution
A random variable \(X\) has a beta distribution on the unit interval
with parameters \(a > 0\) and \(b > 0\) if its p.d.f. is:
\[
f(x) = \frac{1}{B(a, b)}x^{a-1}(1-x)^{b-1}, \quad x \in (0, 1). \tag{3}
\]
We write \(X \sim \text{Beta}(a, b)\). The mean and variance are:
\[
\mathbb{E}[X] = \frac{a}{a+b}, \qquad \operatorname{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}.
\]
We use the open interval \((0, 1)\) because the density diverges at \(x = 0\) when \(a < 1\)
(since \(x^{a-1} \to \infty\)) and at \(x = 1\) when \(b < 1\); the boundary points \(\{0, 1\}\)
have Lebesgue measure zero and contribute nothing to integrals, so probabilities such as
\(P(0 \leq X \leq 1) = 1\) remain unchanged.
Mean and Variance of the Beta Distribution:
We can rewrite (3) using the Beta–Gamma identity:
\[
f(x) = \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)}x^{a-1} (1-x)^{b-1} \quad x \in (0, 1).
\]
Using the definition of the mean,
\[
\begin{align*}
\mathbb{E}[X] &= \int_{0}^1 x f(x)dx \\\\
&= \int_{0}^1 x \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)}x^{a-1} (1-x)^{b-1} dx \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \int_{0}^1 x^{a} (1-x)^{b-1} dx \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} B(a+1, b) \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{\Gamma (a+1) \Gamma (b)}{\Gamma (a+b+1)}
\end{align*}
\]
Since \(\Gamma (a+1) = a \Gamma (a)\) and similarly, \(\Gamma (a+b+1) = (a+b)\Gamma (a+b)\),
\[
\begin{align*}
\mathbb{E}[X] &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{a\Gamma (a) \Gamma (b)}{(a+b)\Gamma (a+b)} \\\\
&= \frac{a}{a+b}
\end{align*}
\]
For the variance, we use the identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X] )^2\).
We first compute \(\mathbb{E}[X^2]\):
\[
\begin{align*}
\mathbb{E}[X^2] &= \int_{0}^1 x^2 f(x)dx \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \int_{0}^1 x^{a+1} (1-x)^{b-1} dx \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} B(a+2, b) \\\\
&= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{\Gamma (a+2) \Gamma (b)}{\Gamma (a+b+2)}
\end{align*}
\]
Since \(\Gamma (a+2) = (a+1) \Gamma (a+1) = (a+1) a \Gamma (a)\), and similarly, \(\Gamma (a+b+2) = (a+b+1)(a+b)\Gamma (a+b)\),
\[
\begin{align*}
\mathbb{E}[X^2] &= \frac{\Gamma (a+b)}{\Gamma (a) \Gamma (b)} \frac{(a+1)a\Gamma (a) \Gamma (b)}{(a+b+1)(a+b)\Gamma (a+b)} \\\\
&= \frac{a(a+1)}{(a+b)(a+b+1)}.
\end{align*}
\]
Substitute the results:
\[
\begin{align*}
\operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\
&= \frac{a(a+1)}{(a+b)(a+b+1)} - (\frac{a}{a+b})^2 \\\\
&= \frac{a(a+1)(a+b) - a^2 (a+b+1) }{(a+b)^2(a+b+1)} \\\\
&= \frac{ab}{(a+b)^2(a+b+1)}
\end{align*}
\]
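For integer parameters the derivation can be replayed in exact rational arithmetic, since \(\Gamma(n) = (n-1)!\) turns the beta function into a ratio of factorials. The sketch below computes \(\mathbb{E}[X^k] = B(a+k, b)/B(a, b)\) with Python's fractions module and compares the result with the closed forms; the helper names beta_int and beta_moments and the choice \(a = 2\), \(b = 5\) are assumptions for illustration.

```python
from fractions import Fraction
from math import factorial

def beta_int(a: int, b: int) -> Fraction:
    """B(a, b) for positive integers, using Gamma(n) = (n - 1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def beta_moments(a: int, b: int) -> tuple[Fraction, Fraction]:
    """Return (mean, variance) of Beta(a, b) from the first two raw moments."""
    m1 = beta_int(a + 1, b) / beta_int(a, b)   # E[X]
    m2 = beta_int(a + 2, b) / beta_int(a, b)   # E[X^2]
    return m1, m2 - m1 ** 2

a, b = 2, 5
mean, variance = beta_moments(a, b)
print(mean, Fraction(a, a + b))
print(variance, Fraction(a * b, (a + b) ** 2 * (a + b + 1)))
```

Because everything is an exact fraction, the two printed values on each line agree exactly rather than only up to rounding.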
The beta family includes several important special cases. The simplest is the uniform distribution on the unit interval, which
arises when both parameters equal one.
The uniform distribution over \([0, 1]\) is a special case of the beta distribution:
\[
U[0, 1] \;\equiv\; \text{Beta}(1, 1), \quad \text{so } f(x) = 1 \text{ on } [0, 1].
\]
More generally, the p.d.f. of the uniform distribution \(X \sim \text{Uniform}[a, b]\), where \(a < b\) now denote the endpoints of the interval rather than beta parameters, is given by
\[
f(x) = \frac{1}{b - a} \qquad x \in [a, b].
\]
Note: outside \([a, b]\), \(f(x) = 0\).
The uniform distribution models situations in which every value in
the interval \([a, b]\) is equally likely. For example, the basic random-number routines of many
programming languages return values that are approximately uniform on the unit interval.
The mean and variance of the uniform distribution are given by
\[
\mathbb{E}[X] = \frac{a+b}{2} \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12}
\]
and its c.d.f. is given by
\[
F(x) = \frac{x-a}{b-a} \qquad x \in [a, b].
\]
Note: if \(x < a\), \(\, F(x) = 0\) and if \(x > b\), \(\, F(x) = 1\).
Mean and Variance of the Uniform Distribution:
Using the definition of the mean,
\[
\begin{align*}
\mathbb{E}[X] &= \int_{a}^b x \frac{1}{b-a}dx \\\\
&= \frac{1}{b-a} (\frac{b^2 - a^2}{2}) \\\\
&= \frac{a+b}{2}
\end{align*}
\]
Also,
\[
\begin{align*}
\mathbb{E}[X^2] &= \int_{a}^b x^2 \frac{1}{b-a}dx \\\\
&= \frac{1}{b-a} (\frac{b^3 - a^3}{3}) \\\\
&= \frac{a^2 +ab + b^2}{3}
\end{align*}
\]
and then
\[
\begin{align*}
\operatorname{Var}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X] )^2 \\\\
&= \frac{a^2 +ab + b^2}{3} - (\frac{a+b}{2})^2 \\\\
&= \frac{4(a^2 + ab +b^2)-3(a^2 + 2ab + b^2)}{12} \\\\
&= \frac{(b-a)^2}{12}.
\end{align*}
\]
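A short sketch makes these formulas concrete and ties them back to the beta family. It evaluates the mean, variance, and c.d.f. for an illustrative interval \([2, 8]\), and then checks that the unit-interval case agrees with the \(\text{Beta}(1, 1)\) formulas. The function names and the argument names lo and hi (used for the endpoints to avoid clashing with the beta parameters) are assumptions of this example.

```python
def uniform_mean(lo: float, hi: float) -> float:
    """Mean of Uniform[lo, hi]."""
    return (lo + hi) / 2

def uniform_variance(lo: float, hi: float) -> float:
    """Variance of Uniform[lo, hi]."""
    return (hi - lo) ** 2 / 12

def uniform_cdf(x: float, lo: float, hi: float) -> float:
    """C.d.f. of Uniform[lo, hi], including the constant pieces outside the interval."""
    if x < lo:
        return 0.0
    if x > hi:
        return 1.0
    return (x - lo) / (hi - lo)

print(uniform_mean(2.0, 8.0), uniform_variance(2.0, 8.0), uniform_cdf(3.5, 2.0, 8.0))

# Uniform[0, 1] versus Beta(1, 1): mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)).
beta_a, beta_b = 1, 1
beta_mean = beta_a / (beta_a + beta_b)
beta_var = beta_a * beta_b / ((beta_a + beta_b) ** 2 * (beta_a + beta_b + 1))
print(uniform_mean(0.0, 1.0), beta_mean)
print(uniform_variance(0.0, 1.0), beta_var)
```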
Insight: Why Beta and Gamma? (Conjugate Priors)
In Bayesian inference, a conjugate prior is a prior distribution that, when
multiplied by the likelihood, results in a posterior distribution of the same functional form (family) as the prior.
- The beta distribution is the conjugate prior for Bernoulli and Binomial likelihoods.
This makes it the standard choice for modeling uncertainty about a probability (e.g., click-through rates).
- The gamma distribution is the conjugate prior for the rate of a Poisson distribution and for the
precision (inverse variance) of a Gaussian with known mean.
Using these distributions allows us to update our beliefs with simple algebraic additions to the parameters, avoiding the
need for complex numerical integration.
Most common distributions in machine learning belong to the exponential family,
every member of which admits a conjugate prior; this is why such closed-form updates
are so widely available.
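To make "simple algebraic additions to the parameters" concrete, here is a minimal sketch of the two standard conjugate updates: a \(\text{Beta}(a, b)\) prior with Bernoulli observations becomes \(\text{Beta}(a + \text{successes},\, b + \text{failures})\), and a \(\text{Gamma}(\alpha, \beta)\) prior (rate parametrization) on a Poisson rate becomes \(\text{Gamma}(\alpha + \sum_i x_i,\, \beta + n)\) after \(n\) observed counts. The function names and the click-through numbers are hypothetical, made up for this example.

```python
def update_beta(a: float, b: float, successes: int, failures: int) -> tuple[float, float]:
    """Posterior Beta parameters after observing Bernoulli successes and failures."""
    return a + successes, b + failures

def update_gamma_poisson(alpha: float, beta: float, counts: list[int]) -> tuple[float, float]:
    """Posterior Gamma (shape, rate) for a Poisson rate after observing the given counts."""
    return alpha + sum(counts), beta + len(counts)

# Hypothetical click-through data: uniform Beta(1, 1) prior, 7 clicks in 50 impressions.
post_a, post_b = update_beta(1, 1, successes=7, failures=43)
posterior_mean = post_a / (post_a + post_b)      # mean of Beta(a, b) is a / (a + b)
print(post_a, post_b, posterior_mean)

# Hypothetical event counts over three periods with a Gamma(2, 1) prior on the rate.
post_alpha, post_beta = update_gamma_poisson(2.0, 1.0, [3, 0, 2])
print(post_alpha, post_beta, post_alpha / post_beta)   # posterior mean alpha / beta
```

No integration appears anywhere in the update: the posterior mean follows directly from the mean formulas derived earlier on this page.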
With the Gamma and Beta distributions established, we now turn to the single most important distribution in all of statistics:
the normal (Gaussian) distribution.