# Examples of Simpson's Paradox

I’m looking for fresh examples of Simpson’s paradox for use in my statistics courses. The examples I’ve been using are fine, but I’d like to have some new ones, and I’m hoping folks here might know a few.

I’m familiar with the ones on the Wikipedia page: the gender bias lawsuit at Berkeley, batting averages, mortality rates among low-birth-weight babies, party voting on the Civil Rights Act, and success rates of kidney stone treatments. I’m also familiar with the example of survival rates on the Titanic and the one about delayed flights at America West vs. Alaska Airlines. These are all good ones, but, like I said, I’m looking for more. Does anybody know any others?

For those who aren’t familiar with it, Simpson’s paradox is essentially a property of unreduced fractions: It’s possible to have, simultaneously, $$\frac{a_1}{b_1} < \frac{c_1}{d_1} \text{ and } \frac{a_2}{b_2} < \frac{c_2}{d_2}, \text{ but } \frac{a_1 + a_2}{b_1+b_2} > \frac{c_1+c_2}{d_1+d_2}.$$

For example, $\frac{1}{3} < \frac{34}{100}$ and $\frac{66}{100} < \frac{2}{3}$, but $\frac{67}{103} > \frac{36}{103}$. It can be hard to spot this happening in a given real-world scenario.
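The arithmetic in this small example can be checked with exact rational numbers. The following is a minimal sketch; the helper name `pooled` and the variable names are only illustrative, and "pooling" here means adding numerators and denominators separately, as in the inequality above.

```python
from fractions import Fraction

def pooled(*pairs):
    """Pool (numerator, denominator) pairs by adding numerators and denominators."""
    num = sum(n for n, _ in pairs)
    den = sum(d for _, d in pairs)
    return Fraction(num, den)

# (a_i, b_i) and (c_i, d_i) for the two strata in the example above.
a = [(1, 3), (66, 100)]
c = [(34, 100), (2, 3)]

# Within each stratum, the a/b rate is below the c/d rate.
within = [Fraction(*x) < Fraction(*y) for x, y in zip(a, c)]

# After pooling, the inequality points the other way.
reversed_when_pooled = pooled(*a) > pooled(*c)

print(within, pooled(*a), pooled(*c), reversed_when_pooled)
```

The reversal is possible because the two groups put very different weights on the two strata: one group has almost all of its cases in the low-rate stratum, the other in the high-rate stratum.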

#### Solutions Collected From the Web for "Examples of Simpson's Paradox"

Example: Effect of race on death-penalty sentences in Florida murder cases.

NB: This is adapted from Subsection 2.3.2 of A. Agresti (2002), Categorical Data Analysis, 2nd ed., Wiley, pp. 48-51.

A 1991 study by Radelet and Pierce of the effect of race on death-penalty sentences produced the following table, which tabulates the death-penalty sentences ($\text{Death}$) and non-death-penalty sentences ($\text{No death}$) among murder convictions in the state of Florida.
$$\begin{array}{lrrr} \text{Defendant's race} & \text{Death} & \text{No death} & \text{Percent death} \\ \hline \text{Caucasian} & 53 & 430 & 11.0 \\ \text{African-American} & 15 & 176 & 7.9 \end{array}$$

From this table, we see that Caucasian defendants received the death penalty at a higher rate than African-American defendants (11.0% vs. 7.9%).

Now, we consider the very same data, except that we stratify according to the race of the victim of the murder. Below is the table.

$$\begin{array}{llrrr} \text{Victim's race} & \text{Defendant's race} & \text{Death} & \text{No death} & \text{Percent death} \\ \hline \text{Caucasian} & \text{Caucasian} & 53 & 414 & 11.3 \\ \text{Caucasian} & \text{African-American} & 11 & 37 & 22.9 \\ \text{African-American} & \text{Caucasian} & 0 & 16 & 0.0 \\ \text{African-American} & \text{African-American} & 4 & 139 & 2.8 \end{array}$$

Here we see that, when the cases involving Caucasian victims are considered separately from the cases involving African-American victims, African-American defendants are more likely than Caucasian ones to receive the death penalty in both groups (22.9% vs. 11.3% in the first group and 2.8% vs. 0.0% in the second).

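Thus, this is a clear instance of Simpson’s paradox. The reversal arises because death sentences were far more common when the victim was Caucasian (64 of 515 cases, versus 4 of 159 for African-American victims), and Caucasian defendants overwhelmingly had Caucasian victims (467 of 483 cases).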

(A similar previous study in 1981 by Radelet observed the same effect.)
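The percentages in the two tables above can be rechecked, and the marginal table recovered from the stratified one, with a few lines of Python. This is only a sketch: the counts are copied from the stratified table, while the dictionary layout and the helper name `death_rate` are my own choices.

```python
# (death, no death) counts keyed by (victim's race, defendant's race),
# copied from the stratified table above.
counts = {
    ("Caucasian", "Caucasian"): (53, 414),
    ("Caucasian", "African-American"): (11, 37),
    ("African-American", "Caucasian"): (0, 16),
    ("African-American", "African-American"): (4, 139),
}

def death_rate(death, no_death):
    """Percentage of convictions that ended in a death sentence."""
    return 100 * death / (death + no_death)

# Rates within each victim-race stratum, rounded as in the table.
stratified = {key: round(death_rate(*cell), 1) for key, cell in counts.items()}

# Collapse over victim's race to get the marginal counts by defendant's race.
marginal = {}
for (_, defendant), (death, no_death) in counts.items():
    d, n = marginal.get(defendant, (0, 0))
    marginal[defendant] = (d + death, n + no_death)

# Overall rates by defendant's race, ignoring the victim's race.
overall = {race: round(death_rate(*cell), 1) for race, cell in marginal.items()}

print(stratified)
print(marginal)
print(overall)
```

Summing the stratified counts reproduces the first table exactly, so the two tables describe the same 674 convictions; only the grouping differs.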

Sailors in the U.S. Navy who went overboard at sea were found to be more likely to be rescued if they were not wearing life jackets than if they were. The explanation was that they wore life jackets in bad weather but not in good weather. In either good weather or bad, they were more likely to be rescued while wearing life jackets, but overall, they were more likely to be rescued while not wearing life jackets. The data are in an introductory text by Danny Kaplan, which I don’t have before me.

Here’s an artificial example. Imagine two major-league baseball players, Puckett and Smith. Puckett has 600 at-bats during the season and gets 200 hits, for a .333 season average. Smith gets called up to the majors in time for the last game of the season, has three at-bats, and gets three hits, for a season average of 1.000. Thus Smith’s batting average for the season is higher than Puckett’s. The next year, Smith has 500 at-bats and gets 125 hits, for an average of .250. Puckett plays in the first game and the next morning gets hit by a truck while crossing the street, and can’t play for the rest of the season. He gets no hits in the first game. So once again, Smith’s average for the season is higher than Puckett’s. Two years in a row, Smith’s average was higher than Puckett’s. But Puckett’s average for the two seasons combined is higher than Smith’s: Smith’s combined average is 128/503, about .254, while the handful of hitless at-bats from one game barely moves Puckett’s average from .333.
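The baseball story can be checked the same way. Note one assumption in this sketch: the story does not say how many at-bats Puckett had in his single second-season game, so the value 4 below is purely illustrative (any small positive number gives the same conclusion). The function names are likewise my own.

```python
from fractions import Fraction

# (hits, at-bats) per season. Puckett's at-bats in his one second-season
# game are not given in the story; 4 is an assumed, illustrative value.
puckett = [(200, 600), (0, 4)]
smith = [(3, 3), (125, 500)]

def average(hits, at_bats):
    """Batting average for a single season."""
    return Fraction(hits, at_bats)

def combined(seasons):
    """Batting average over all seasons pooled together."""
    hits = sum(h for h, _ in seasons)
    at_bats = sum(ab for _, ab in seasons)
    return Fraction(hits, at_bats)

# Smith has the higher average in each season taken separately...
smith_wins_each = [average(*s) > average(*p) for s, p in zip(smith, puckett)]

# ...but Puckett has the higher average when the two seasons are pooled.
puckett_wins_combined = combined(puckett) > combined(smith)

print(smith_wins_each, float(combined(puckett)), float(combined(smith)))
```

Each player's combined average is dominated by the season in which he had hundreds of at-bats, which is why the season-by-season comparison and the pooled comparison disagree.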