The population of sample means was found to be related to the mean of the population from which they arise. Sample proportions are similarly related.
Although we often think of a mathematical proportion as an equality of two ratios, in statistics a proportion is the fraction (often expressed as a percentage) of a total in which a certain characteristic is observed. If a population has size $N$, and the characteristic occurs $x$ times in that population, then the population proportion is given by $p = \dfrac{x}{N}$. If a sample of size $n$ is obtained, and the characteristic occurs $x$ times in the sample, then the proportion in that sample is given by $\hat{p} = \dfrac{x}{n}$.
There is a connection between these formulas for the proportion and a binomial distribution. In fact, the formula $p = \dfrac{x}{N}$, if solved for $x$, gives the expected value of the number of successes for a binomial distribution, $x = Np$. Looking further, we see that if the observed characteristic is considered as a success, then not observing it is a failure. The probability of a success is $p$. If individuals are randomly selected from a very large population, then we can assume that the selections are independent, and that the probabilities will be constant. Therefore, all of the conditions of the binomial distribution are met for the variable $x$.
So what is the expected value of a sample proportion, $E(\hat{p})$? The binomial result leads us to the answer.
$E(\hat{p}) = E \left( \dfrac{X}{n} \right) = \dfrac1n E(X) = \dfrac1n (np) = p$
Similarly, we can find the variance in a population of sample proportions.
$\operatorname{Var}(\hat{p}) = \operatorname{Var} \left( \dfrac{X}{n} \right) = \dfrac{1}{n^2} \operatorname{Var}(X) = \dfrac{1}{n^2} np(1-p) = \dfrac{p(1-p)}{n}$
And from this result, we can easily obtain the standard deviation. Therefore, we have the following parameters for a distribution of sample proportions.
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{ \dfrac{p(1-p)}{n} }$
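The two parameters above can be checked empirically. The following is a minimal simulation sketch, not part of the original derivation: the function name `simulate_sample_proportions`, the values $p = 0.3$ and $n = 200$, the number of trials, and the seed are all illustrative assumptions. It draws many samples, records each sample proportion, and compares the observed mean and standard deviation with $p$ and $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, trials, seed=0):
    """Draw `trials` samples of size `n` and return each sample proportion.

    Every individual is an independent success with probability `p`,
    which matches the binomial conditions described in the text.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _trial in range(trials):
        successes = sum(1 for _person in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Illustrative values (assumptions, not taken from the text).
p, n = 0.3, 200
p_hats = simulate_sample_proportions(p, n, trials=2000)

observed_mean = statistics.fmean(p_hats)
observed_sd = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {observed_mean:.4f} (theory: {p})")
print(f"sd of sample proportions:   {observed_sd:.4f} (theory: {theoretical_sd:.4f})")
```

With enough trials, the observed values should land close to the theoretical ones, though they will not match exactly because the simulation is itself random.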
If the values $np$ and $n(1-p)$ are both at least 5, then the binomial distribution of $X$ will be approximately normal, and it will follow that the sampling distribution of the proportions will also be approximately normal, and can be standardized with the formula $z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}$.
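The rule of thumb and the standardizing formula can be sketched as two small helpers. The names `normal_approx_ok` and `z_score` are hypothetical, chosen only for this illustration, and the threshold of 5 is the one stated in the text.

```python
import math


def normal_approx_ok(p, n, threshold=5):
    """Return True when both np and n(1-p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


def z_score(p_hat, p, n):
    """Standardize a sample proportion using sigma = sqrt(p(1-p)/n)."""
    sigma = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / sigma


# Example: a population proportion of 0.56 and samples of 1200 people.
print(normal_approx_ok(0.56, 1200))   # both np and n(1-p) are far above 5
print(round(z_score(0.53, 0.56, 1200), 2))
```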
Suppose the true value of the president's approval rating is 56%. Find the probability that a sample of 1200 people would find a proportion between 53% and 58%.
Because $np = 672$ and $n(1-p) = 528$ are both at least 5, the normal approximation applies. The standard deviation of the sample proportions is
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{(0.56)(0.44)}{1200}} \approx 0.0143$
The z-scores are $z = \dfrac{0.53-0.56}{0.0143} \approx -2.10$ and $z = \dfrac{0.58-0.56}{0.0143} \approx 1.40$. Computing the probability using the standard normal distribution, we have
\begin{align} P(0.53 < \hat{p} < 0.58) &= P(-2.10 < z < 1.40) = \Phi(1.40) - \Phi(-2.10) \\ &= \operatorname{normalcdf}(-2.10,1.40) \approx 0.9014 \end{align}
Therefore, there is about a 90% probability that the sample proportion will fall between 53% and 58%.
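The same calculation can be reproduced in Python. This is a sketch that uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the calculator's `normalcdf`; the variable names are illustrative. It computes the probability twice: once from the rounded z-scores used in the hand calculation, and once directly from the unrounded sampling distribution.

```python
import math
from statistics import NormalDist

p, n = 0.56, 1200
sigma = math.sqrt(p * (1 - p) / n)  # standard deviation of the sample proportions

# Hand-calculation route: rounded z-scores on the standard normal curve.
standard_normal = NormalDist()  # mean 0, standard deviation 1
prob_rounded = standard_normal.cdf(1.40) - standard_normal.cdf(-2.10)

# Direct route: the sampling distribution itself, with no rounding of z.
sampling_dist = NormalDist(mu=p, sigma=sigma)
prob_direct = sampling_dist.cdf(0.58) - sampling_dist.cdf(0.53)

print(f"sigma        = {sigma:.4f}")
print(f"rounded z    = {prob_rounded:.4f}")
print(f"unrounded z  = {prob_direct:.4f}")
```

The two results should agree to about two decimal places. Any small difference comes from rounding the z-scores to $-2.10$ and $1.40$ in the hand calculation.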