The Central Limit Theorem is one of the most important results in statistics. It is the result that makes it possible to use samples to draw accurate conclusions about population means.
Let $X_i$ be a collection of $n$ independent and identically distributed random variables, having mean $\mu$ and standard deviation $\sigma$. Define the random variables $\bar{X} = \dfrac{\sum\limits_{i=1}^n X_i}{n}$ and $Y= \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$. If the function $g(y)$ is the PDF of the random variable $Y$, then $\lim\limits_{n \to \infty} g(y) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac12 y^2}$.
In words, given any population of data having any distribution, the theorem says that the distribution of the standard scores (or z-scores) of the sample means will approach the standard normal distribution as the sample size increases without bound. The random variable $Y$ in the above statement is the standardized sample mean (the z-score of $\bar{X}$), while the right hand side of the conclusion is the PDF of the standard normal distribution.
As an immediate consequence, for any population of data, if the sample size is sufficiently large, the distribution of sample means will be approximately normal. Furthermore, from the sections on the sums of random variables, we know that the mean of the sample means will equal the population mean, and that the standard deviation of the sample means will be the standard error of the mean. That is, $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}$.
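As a quick worked example with hypothetical numbers: if a population has mean $\mu = 100$ and standard deviation $\sigma = 12$, and samples of size $n = 36$ are drawn, then the sample means are centered at the population mean with a much smaller spread:

$\mu_{\bar{x}} = 100, \qquad \sigma_{\bar{x}} = \dfrac{12}{\sqrt{36}} = \dfrac{12}{6} = 2$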
Caution: Sometimes the statement of the theorem is oversimplified, saying that the distribution of the sample means themselves will approach the normal distribution as the sample size increases without bound. However, this statement is not true. As the sample size approaches $\infty$, the standard deviation of the sample means approaches zero, so the limiting distribution is concentrated at the single point $\mu$, not spread out like a normal distribution. It is the standardized sample means (the z-scores) that approach the standard normal distribution.
So what is a sufficiently large sample? Generally, a sample size of at least 30 is considered large enough for the distribution of sample means to be approximately normal. If, however, the original population of data was normally distributed, then the mean of a sample of any size will be normally distributed.
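This rule of thumb can be checked empirically. The sketch below is only an informal illustration, not part of the development here; the exponential population, the sample sizes, and the number of repetitions are arbitrary choices. It draws many samples from a strongly skewed population and checks how closely the standardized sample means behave like a standard normal variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# A strongly skewed population: the exponential distribution with scale 1,
# which has mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0
reps = 100_000  # number of simulated samples per sample size

for n in (2, 5, 30):
    # Draw `reps` samples of size n and compute each sample mean.
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

    # Standardize: Y = (xbar - mu) / (sigma / sqrt(n)).
    y = (xbar - mu) / (sigma / np.sqrt(n))

    # If Y were exactly standard normal, P(|Y| <= 1) would be about 0.6827.
    print(f"n = {n:2d}:  P(|Y| <= 1) ~ {np.mean(np.abs(y) <= 1):.4f}   (standard normal: 0.6827)")
```

For this kind of skewed population, the reported proportion should move toward the normal value 0.6827 as $n$ grows, in line with the rule of thumb.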
We now outline a derivation of the theorem using moment generating functions (assuming that the moment generating function of the underlying distribution exists). As before, let $X_i$ be a collection of $n$ independent and identically distributed random variables, having mean $\mu$ and standard deviation $\sigma$, and define the random variables $\bar{X} = \dfrac{\sum\limits_{i=1}^n X_i}{n}$ and $Y= \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.
With a little bit of algebra, we can rewrite $Y$.
$Y = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{ \dfrac{\sum\limits_{i=1}^n X_i}{n} - \mu}{\sigma / \sqrt{n}} = \dfrac{ \sum\limits_{i=1}^n X_i - n \mu}{\sigma \sqrt{n}} = \dfrac{1}{\sqrt{n}} \sum\limits_{i=1}^n \dfrac{X_i - \mu}{\sigma}$
We recognize the summand to be the definition of a standard score (or z-score), so we now define $Z_i = \dfrac{X_i - \mu}{\sigma}$. Therefore, we have:
$Y = \dfrac{1}{\sqrt{n}} \sum\limits_{i=1}^n Z_i$
Now let us consider the moment generating function of $Y$. When two random variables are independent, their moment generating functions satisfy $M_{X+Y}(t) = M_X(t) M_Y(t)$. We also have $M_{aX+b}(t)= e^{tb} M_X (at)$. With these results, we will be able to simplify $M_Y(t)$. Also, since the random variables $X_i$ are identically distributed, the random variables $Z_i$ are also identically distributed (to one another, though not with the same distribution as the original $X_i$). We shall let $Z$ denote a random variable having that common distribution.
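Both properties follow directly from the definition $M_X(t) = E(e^{tX})$: for independent random variables the expectation of the product factors, and constants pull out of the exponent.

$M_{X+Y}(t) = E\left(e^{t(X+Y)}\right) = E\left(e^{tX}\right) E\left(e^{tY}\right) = M_X(t)\, M_Y(t), \qquad M_{aX+b}(t) = E\left(e^{t(aX+b)}\right) = e^{tb}\, E\left(e^{(at)X}\right) = e^{tb}\, M_X(at)$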
$M_Y(t) = M_{\frac{1}{\sqrt{n}} \sum Z_i} (t) = M_{\sum Z_i} \left( \dfrac{t}{\sqrt{n}} \right) = \prod\limits_{i=1}^n M_{Z_i} \left( \dfrac{t}{\sqrt{n}} \right) = \left[ M_Z \left( \dfrac{t}{\sqrt{n}} \right) \right]^n$
Having written $M_Y(t)$ in terms of $M_Z(t)$, we want to understand that quantity further. Recall that the Taylor polynomial of degree $k$, with remainder, of a function $f(t)$, expanded about the value $t=0$, is given by
$f(t) = f(0) + f'(0) t + \cdots + \dfrac{f^{(k)}(0)}{k!} t^k + \dfrac{f^{(k+1)}(c)}{(k+1)!} t^{k+1}$
with $c$ between $0$ and $t$. We write the degree as $k$ to avoid confusion with the sample size $n$. Therefore, for the function $M_Z \left( \dfrac{t}{\sqrt{n}} \right)$, using $k=1$, we have the Taylor polynomial
$M_Z \left( \dfrac{t}{\sqrt{n}} \right) = M_Z(0) + M'_Z(0) \dfrac{t}{\sqrt{n}} + M''_Z(c) \dfrac{t^2}{2n}$
with $c$ between $0$ and $\dfrac{t}{\sqrt{n}}$. The right hand side of the expression contains $M_Z$ and two of its derivatives, each of which is easy to evaluate. From the definition of a moment generating function (written here for a discrete $Z$; for a continuous $Z$ the sum becomes an integral and the conclusion is unchanged), we have
$M_Z(0) = \left. E(e^{tZ}) \right|_{t=0} = \left. \sum\limits_z e^{tz} P(z) \right|_{t=0} = \sum\limits_z P(z) = 1$
Since $M'_Z(0) = E(Z)$, and the mean of a set of z-scores is always zero, we have $M'_Z (0) = 0$.
Although the Taylor polynomial involves the term $M''_Z(c)$, we shall find it useful to have the value of $M''_Z(0)$. Since $M''_Z (0) = E(Z^2) = Var(Z) + [E(Z)]^2$, and a set of z-scores always has standard deviation 1 and mean 0, we have $M''_Z (0) = 1 + 0^2 = 1$.
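The identities $M'_Z(0) = E(Z)$ and $M''_Z(0) = E(Z^2)$ used above can be read off by differentiating the definition of the moment generating function (assuming, as usual, that differentiation may be carried out under the expectation):

$M'_Z(t) = \dfrac{d}{dt} E\left(e^{tZ}\right) = E\left(Z e^{tZ}\right), \qquad M''_Z(t) = E\left(Z^2 e^{tZ}\right)$

so that setting $t = 0$ gives $M'_Z(0) = E(Z)$ and $M''_Z(0) = E(Z^2)$.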
Substituting each of these values into the Taylor polynomial, we have
$M_Z \left( \dfrac{t}{\sqrt{n}} \right) = 1 + 0 + M''_Z(c) \dfrac{t^2}{2n} = 1 + \dfrac{t^2}{2n} + (M''_Z(c) - 1) \dfrac{t^2}{2n}$
The Central Limit Theorem is concerned with the behavior of the distribution as $n$ increases without bound, so we want to take a limit as $n$ approaches $\infty$. Doing this, we note that $\dfrac{t}{\sqrt{n}}$ approaches 0, so $c$ approaches 0, and $M''_Z(c)$ approaches $M''_Z(0) = 1$. This means $M_Z \left(\dfrac{t}{\sqrt{n}} \right)$ approaches 1, so $M_Y(t)$ has the form $1^{\infty}$, which is an indeterminate form. Therefore, we take the logarithm of $M_Y(t)$, rewrite the resulting $\infty \cdot 0$ product as a quotient of the form $\frac{0}{0}$, and apply L'Hôpital's Rule, differentiating with respect to $n$.
\begin{align} \lim\limits_{n \to \infty} \ln M_Y(t) &= \lim\limits_{n \to \infty} \ln \left[ M_Z \left( \dfrac{t}{\sqrt{n}} \right) \right]^n \\ &= \lim\limits_{n \to \infty} n \ln M_Z \left( \dfrac{t}{\sqrt{n}} \right) \\ &= \lim\limits_{n \to \infty} n \ln \left[ 1 + \dfrac{t^2}{2n} + (M''_Z(c) - 1) \dfrac{t^2}{2n} \right] \\ &= \lim\limits_{n \to \infty} \dfrac{\ln \left[ 1 + \dfrac{t^2}{2n} + (M''_Z(c) - 1) \dfrac{t^2}{2n} \right]}{\dfrac1n} \\ &= \lim\limits_{n \to \infty} \dfrac{\dfrac{t^2}{2} + (M''_Z(c) - 1) \dfrac{t^2}{2}}{1 + \dfrac{t^2}{2n} + (M''_Z(c) - 1) \dfrac{t^2}{2n}} \\ &= \dfrac{ \dfrac{t^2}{2} + (1-1)\dfrac{t^2}{2}}{1+0+(1-1)(0)} \\ &= \dfrac12 t^2 \end{align}

In the L'Hôpital step, the numerator and denominator are differentiated with respect to $n$, with the bounded factor $M''_Z(c)$ treated as constant; since $c$ approaches 0, this factor approaches $M''_Z(0) = 1$ in the final limit.

Exponentiating both sides, we then obtain $\lim\limits_{n \to \infty} M_Y(t) = e^{\frac12 t^2}$. Recognizing this result as the moment generating function of the standard normal distribution, and knowing that a moment generating function determines its distribution uniquely, we have therefore found that, for any original distribution, the distribution of the z-scores of the sample means will approach the standard normal distribution.
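As an informal numerical check of this limit (not part of the proof; the uniform population, the sample size, the number of repetitions, and the values of $t$ are arbitrary choices), a Monte Carlo estimate of $E(e^{tY})$ for a moderately large $n$ can be compared with the limiting value $e^{\frac12 t^2}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population: uniform on [0, 1], so mu = 1/2 and sigma = 1/sqrt(12).
mu, sigma = 0.5, 1.0 / np.sqrt(12.0)
n = 100          # sample size
reps = 50_000    # number of simulated samples

# Simulate Y = (xbar - mu) / (sigma / sqrt(n)) many times.
xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
y = (xbar - mu) / (sigma / np.sqrt(n))

# Compare the empirical moment generating function E(e^(tY))
# with the limiting value e^(t^2 / 2).
for t in (0.5, 1.0, 1.5):
    empirical = np.exp(t * y).mean()
    limiting = np.exp(0.5 * t ** 2)
    print(f"t = {t}:  estimate of E(e^(tY)) = {empirical:.4f},  e^(t^2/2) = {limiting:.4f}")
```

For these values of $t$, the two quantities should agree to within Monte Carlo error.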