Percentages can produce many unintuitive results. Simpson's Paradox is the observation that one conclusion may be reached when data is analyzed in the aggregate, and the opposite conclusion may be reached when the same data is analyzed in smaller groups.
A civic orchestra is being formed in your community, and is holding auditions. Among the wind musicians, 26 of the 40 men are invited to join, and 4 of the 6 women. Among the string musicians, 30 of the 80 men are invited, and 19 of the 49 women. Which gender was favored in the selection of musicians?
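This looks quite easy. We can obtain totals for the men and women, and then convert the data into percents. We find that $26 + 30 = 56$ of the $120$ men were invited, which is about $46.67\%$, while $4 + 19 = 23$ of the $55$ women were invited, which is about $41.82\%$.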
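It looks like there was a small bias against the women. So where did this discrimination come from, the strings or the winds? Again, we compute some percentages. Among the winds, $\dfrac{26}{40} = 65\%$ of the men and $\dfrac{4}{6} \approx 66.67\%$ of the women were invited. Among the strings, $\dfrac{30}{80} = 37.5\%$ of the men and $\dfrac{19}{49} \approx 38.78\%$ of the women were invited.
In each class of instruments, we find that the percentage of women invited was (very slightly) greater than the percentage of men invited. What happened to the bias against the women? This is quite contrary to our original conclusion.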
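The reversal can be checked mechanically. The following is a minimal Python sketch, not part of the original example; the names `auditions`, `section_rates`, and `aggregate_rates` are hypothetical, and the counts are the ones given in the audition problem.

```python
from fractions import Fraction

# (invited, auditioned) counts taken from the orchestra example.
auditions = {
    "winds":   {"men": (26, 40), "women": (4, 6)},
    "strings": {"men": (30, 80), "women": (19, 49)},
}

def section_rates(data):
    """Acceptance rate for each gender within each section."""
    return {
        section: {g: Fraction(inv, aud) for g, (inv, aud) in groups.items()}
        for section, groups in data.items()
    }

def aggregate_rates(data):
    """Acceptance rate for each gender after pooling all sections."""
    totals = {}
    for groups in data.values():
        for g, (inv, aud) in groups.items():
            prev_inv, prev_aud = totals.get(g, (0, 0))
            totals[g] = (prev_inv + inv, prev_aud + aud)
    return {g: Fraction(inv, aud) for g, (inv, aud) in totals.items()}

by_section = section_rates(auditions)
overall = aggregate_rates(auditions)

for section, rates in by_section.items():
    for g, r in rates.items():
        print(f"{section:8s} {g:6s} {float(r):.2%}")
for g, r in overall.items():
    print(f"{'overall':8s} {g:6s} {float(r):.2%}")
```

Using exact fractions avoids any rounding question when comparing rates that differ only slightly, as the wind and string rates do here.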
The two groups (winds and strings) did not accept musicians at the same rate. Among the strings, we see $\dfrac{49}{129} \approx 0.3798 = 37.98\%$ of the applicants were accepted, and both the men and the women were accepted at about that rate. Among the winds, we see $\dfrac{30}{46} \approx 0.6522 = 65.22\%$ of the applicants were accepted, and again, both the men and the women were accepted at about that rate. But we notice that the acceptance rate for the strings was far lower than for the wind instruments.
So, if you were to join this orchestra and could play every instrument, what position would you choose to maximize your chance of being accepted? Obviously, you would audition to play a wind instrument with the orchestra. In this example, almost all of the applicants for the wind positions were male (40 of 46), so a third of the men (40 of 120) auditioned in the section with the high acceptance rate, and they essentially pulled up the overall acceptance rate for the men. Since so few women auditioned on wind instruments (6 of 55), their overall acceptance rate was far closer to the acceptance rate of the strings in general.
In other words, Simpson's Paradox arises when we try to average percentages (or probabilities). The percentages must be weighted according to the underlying sample sizes. Therefore, we could state the paradox as follows.
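To illustrate the difference between averaging percentages directly and weighting them by sample size, here is a small Python sketch using the audition counts. The function names `simple_mean` and `weighted_mean` are hypothetical and chosen only for this illustration.

```python
# Each entry is (acceptance rate, number of applicants) for one section,
# listed as winds first, then strings. Counts come from the example.
men = [(26 / 40, 40), (30 / 80, 80)]
women = [(4 / 6, 6), (19 / 49, 49)]

def simple_mean(groups):
    """Average the rates as if every section had the same size."""
    return sum(rate for rate, _ in groups) / len(groups)

def weighted_mean(groups):
    """Average the rates, weighting each by its number of applicants."""
    total = sum(n for _, n in groups)
    return sum(rate * n for rate, n in groups) / total

for label, groups in (("men", men), ("women", women)):
    print(f"{label:6s} simple: {simple_mean(groups):.4f}  "
          f"weighted: {weighted_mean(groups):.4f}")
```

The unweighted averages put the women ahead, as the section-by-section comparison does, while the weighted averages reproduce the aggregate rates and put the men ahead.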
Even though $P(A_1) < P(B_1)$ and $P(A_2) < P(B_2)$, there will still be suitable values of $m_1$, $m_2$, $n_1$, and $n_2$ such that $\dfrac{m_1 P(A_1) + m_2 P(A_2)}{m_1+m_2} > \dfrac{n_1 P(B_1) + n_2 P(B_2)}{n_1+n_2}$.
In our original example, $A_i$ would be the event that a male applicant was selected, and $B_i$ the event that a female applicant was selected, with the subscript denoting the particular class of instrument played. The quantities $m_i$ and $n_i$ are then the numbers of male and female applicants, respectively, for each class of instrument.
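As a check, we can substitute the audition counts into the inequality. Taking subscript 1 for the winds and subscript 2 for the strings (a labeling assumed here for illustration), we have $P(A_1) = \dfrac{26}{40}$, $P(A_2) = \dfrac{30}{80}$, $P(B_1) = \dfrac{4}{6}$, $P(B_2) = \dfrac{19}{49}$, with $m_1 = 40$, $m_2 = 80$, $n_1 = 6$, and $n_2 = 49$. Then

$$\dfrac{m_1 P(A_1) + m_2 P(A_2)}{m_1+m_2} = \dfrac{40 \cdot \frac{26}{40} + 80 \cdot \frac{30}{80}}{40+80} = \dfrac{56}{120} \approx 0.4667$$

$$\dfrac{n_1 P(B_1) + n_2 P(B_2)}{n_1+n_2} = \dfrac{6 \cdot \frac{4}{6} + 49 \cdot \frac{19}{49}}{6+49} = \dfrac{23}{55} \approx 0.4182$$

So the weighted average for the men exceeds that for the women, even though $P(A_1) < P(B_1)$ and $P(A_2) < P(B_2)$.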
The original question was whether one gender was favored over another in being accepted into the orchestra. In this question, there are two variables, gender and acceptance. When we examined the data by section of the orchestra, we were actually introducing a third variable, often called a lurking variable. The lurking variable had a great deal to do with the final results, although it was not apparent when the data was analyzed in the aggregate.
When Simpson's Paradox occurs in the analysis of data from two groups, a comparison in the aggregate will find one group excelling, while a comparison by partitioning the data will find the other group excelling. Some situations where the paradox actually arose include:
Through an internet search, you can find more about any of these situations, and many other situations as well.