Confidence Intervals for Proportions

Since we were able to develop the theory of the distribution of sample proportions, we can use those results to obtain confidence intervals for proportions.

Binomial and Normal Connections

Recall that a sample proportion is the ratio of the number of successes to the number of trials. That connection allowed us to use the characteristics of a binomial distribution to develop the theory for sample proportions. We found that $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{ \dfrac{p(1-p)}{n}}$. Moreover, when the conditions $np \ge 5$ and $n(1-p) \ge 5$ are both met, the distribution of the sample proportions will be approximately normal.

However, the computation of $\sigma_{\hat{p}}$ would appear to require knowledge of the population proportion $p$. Realistically, that quantity is unknown, and must be estimated with the sample proportion $\hat{p}$. In theory, that would necessitate a distribution different from the normal to accommodate the extra randomness due to the variation in the sample proportions, but when $n$ is large, the distribution will still be approximately normal. Therefore, the confidence interval for estimating a population proportion is given by the following formula.

$\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p} (1- \hat{p})}{n}}$

Example for a Confidence Interval

A sample of 500 people finds that 225 have blood type O. Find a 90% confidence interval for the percentage of people with blood type O.

The sample proportion is $\hat{p} = \dfrac{225}{500} = 0.450$.
The standard error is $\sigma_{\hat{p}} = \sqrt{\dfrac{\hat{p} (1- \hat{p})}{n}} = \sqrt{\dfrac{(0.45)(0.55)}{500}} = 0.0222$.
We note that $n \hat{p} = 225$, and $n (1-\hat{p}) = 275$, so the normal distribution can be used as an approximation of the binomial distribution.
For a 90% confidence interval, we have $\alpha = 0.10$, so $z_{\alpha/2} = z_{0.05} = 1.645$.
Then $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p} (1- \hat{p})}{n}} = 0.45 \pm (1.645)(0.0222) = 0.450 \pm 0.037$, so the interval is $[0.413, 0.487]$.

In other words, we can be 90% confident that the proportion of people who have blood type O is between 41.3% and 48.7%.
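The computation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proportion_confidence_interval` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than a printed table, so intermediate digits may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Conditions from the text for approximating the binomial with the normal.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation conditions are not met")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Blood type example: 225 successes out of 500, 90% confidence.
low, high = proportion_confidence_interval(225, 500, 0.90)
print(f"[{low:.3f}, {high:.3f}]")  # [0.413, 0.487]
```

The function raises an error instead of returning an interval when the conditions fail, since the formula is only justified when the normal approximation applies.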

Sample Size

The margin of error obtained in the confidence interval can be expressed by the formula $E = z_{\alpha/2} \sqrt{\dfrac{\hat{p} (1- \hat{p})}{n}}$. When solved for the sample size $n$, we obtain $n = \dfrac{z_{\alpha/2}^2 \hat{p} (1-\hat{p})}{E^2}$. However, this result depended on our ability to use the normal distribution as an approximation for the binomial distribution, so we also have the requirements that $n \hat{p} \ge 5$ and $n (1-\hat{p}) \ge 5$. Solving these last two results for $n$, and combining them with the former result, we obtain the following sample size requirement.

$n = \max \left\{ \dfrac{z_{\alpha/2}^2 \hat{p} (1-\hat{p})}{E^2}, \dfrac{5}{\hat{p}}, \dfrac{5}{1-\hat{p}} \right\}$

If we do not have an estimate for $p$, and obtaining a preliminary value of $\hat{p}$ is either too costly or too time-consuming, we can use $p=0.5$ as our preliminary estimate. This value gives a conservative sample size: the product $p(1-p)$ is a parabola in $p$ that opens down with its vertex at $p = 0.5$, so this choice produces the maximum possible standard error and therefore the largest required $n$. Substituting $\hat{p} = 0.5$ into the formula for $n$, we obtain the following.

$n = \left( \dfrac{z_{\alpha/2}}{2E} \right)^2$
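
The step from the general formula to this one can be written out as a short derivation. Completing the square is one way (not necessarily the route intended here) to see why $p = 0.5$ is the worst case.

$p(1-p) = \dfrac14 - \left( p - \dfrac12 \right)^2 \le \dfrac14$, with equality only when $p = \dfrac12$.

$n = \dfrac{z_{\alpha/2}^2 \cdot \frac12 \cdot \frac12}{E^2} = \dfrac{z_{\alpha/2}^2}{4E^2} = \left( \dfrac{z_{\alpha/2}}{2E} \right)^2$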

The conditions $n\hat{p} \ge 5$ and $n(1-\hat{p}) \ge 5$ are still needed to approximate the binomial as a normal distribution, and whether they are met still depends on the unknown proportion. But once $n$ has been determined, we can solve these inequalities for $\hat{p}$ to see how robust the process is. We find that as long as the sample proportion falls in the interval $\left[ \dfrac5n, 1-\dfrac5n \right]$, the sample size will have been sufficient.

Example for a Sample Size

In October 2010, a Gallup poll found that 29% of Americans favored a handgun ban. If Gallup wants its next poll on this question to be accurate to within 3 percentage points, with 96% confidence, how many people will need to be sampled?

The margin of error will be $E = 0.03$, and the significance level will be $\alpha = 0.04$, giving a z-score of $z_{\alpha/2} = z_{0.02} = 2.054$.

If we use the previous poll's results as our estimate for $p$, we have $\hat{p} = 0.29$. Using the formula, we obtain $n = \dfrac{z_{\alpha/2}^2 \hat{p} (1-\hat{p})}{E^2} = \dfrac{2.054^2 (0.29)(0.71)}{0.03^2} = 965.19$. We briefly note that $\dfrac{5}{\hat{p}}$ and $\dfrac{5}{1-\hat{p}}$ are approximately 17 and 7, respectively, so the conditions to approximate the binomial with the normal are clearly met. Rounding our result up, we find that a sample size of 966 people would be sufficient to give both the precision and the confidence desired.
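
As a rough check of this arithmetic, the sample size requirement can be sketched in Python. The function name `sample_size` is hypothetical, and the critical value is passed in explicitly so that the table value $2.054$ used above can be reproduced. Because the raw value sits close to 965, a critical value carried to more decimal places may change the rounded answer by one.

```python
from math import ceil


def sample_size(z, p_hat, margin):
    """Smallest whole n meeting the margin-of-error formula and both
    normal-approximation conditions (n * p_hat >= 5, n * (1 - p_hat) >= 5)."""
    n_margin = z**2 * p_hat * (1 - p_hat) / margin**2
    return ceil(max(n_margin, 5 / p_hat, 5 / (1 - p_hat)))


# Gallup example: prior estimate 0.29, margin 0.03, table value z = 2.054.
n = sample_size(z=2.054, p_hat=0.29, margin=0.03)
print(n)  # 966
```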

If, however, we do not want to assume that the previous poll's results will be a good estimate of the current proportion, then we can use an initial estimate of $p=0.5$. Then we obtain $n = \left( \dfrac{z_{\alpha/2}}{2E} \right)^2 = \left( \dfrac{2.054}{2(0.03)} \right)^2 = 1171.92$. In other words, 1172 people should be sampled. This sample size will be sufficient as long as the sample proportion eventually obtained falls in the interval $\left[ \dfrac5n, 1-\dfrac5n \right] = \left[ \dfrac{5}{1172}, \dfrac{1167}{1172} \right] = [0.0043, 0.9957]$. From a practical viewpoint, we would be very surprised to see a proportion outside this range, since that would be a dramatic change from the previous poll. If it did happen, we would report whatever confidence level and margin of error the data could give us, even if those levels were not up to the standard we originally set for ourselves.
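
The conservative calculation and the resulting range of acceptable sample proportions can be sketched the same way. The function names `conservative_sample_size` and `valid_proportion_range` are hypothetical, and the table value $z = 2.054$ is again supplied by hand as an assumption.

```python
from math import ceil


def conservative_sample_size(z, margin):
    """Sample size when p = 0.5 is used as the preliminary estimate."""
    return ceil((z / (2 * margin)) ** 2)


def valid_proportion_range(n):
    """Sample proportions for which n * p_hat >= 5 and n * (1 - p_hat) >= 5."""
    return 5 / n, 1 - 5 / n


n = conservative_sample_size(z=2.054, margin=0.03)
low, high = valid_proportion_range(n)
print(n, f"[{low:.4f}, {high:.4f}]")  # 1172 [0.0043, 0.9957]
```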