Frank Anscombe's Regression Examples
The intimate relationship between correlation and regression raises the
question of whether a regression analysis can be misleading in the same sense
as the earlier set of scatterplots, all of which had a correlation coefficient
of 0.70. In 1973, Frank Anscombe published a set of examples showing the
answer is a definite yes (Anscombe FJ (1973), "Graphs in Statistical
Analysis," The American Statistician, 27, 17-21). Anscombe's four datasets
share not only the same correlation coefficient, but also the same values for
the other summary statistics that are usually calculated alongside a simple
linear regression.
| Statistic | Value |
| --- | --- |
| n | 11 |
| x̄ (mean of x) | 9.0 |
| ȳ (mean of y) | 7.5 |
| Regression equation of y on x | y = 3 + 0.5x |
| Σ(x − x̄)² | 110.0 |
| Regression SS | 27.5 |
| Residual SS | 13.75 (9 df) |
| Estimated SE of b1 | 0.118 |
| r | 0.816 |
| R² | 0.667 |
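To see where these numbers come from, here is a minimal sketch computing each statistic from the first dataset (its values appear in the table at the end of this section). Python with NumPy is an assumption on my part; Anscombe's article shows no code.

```python
import numpy as np

# Anscombe's first dataset (the full quartet is tabulated at the end of this section).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

n = len(x)                                  # 11
sxx = np.sum((x - x.mean()) ** 2)           # 110.0
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()               # fitted line: y = 3 + 0.5 x
yhat = b0 + b1 * x
reg_ss = np.sum((yhat - y.mean()) ** 2)     # ~27.5
res_ss = np.sum((y - yhat) ** 2)            # ~13.75 on n - 2 = 9 df
se_b1 = np.sqrt(res_ss / (n - 2) / sxx)     # ~0.118
r = np.corrcoef(x, y)[0, 1]                 # ~0.816; r**2 ~ 0.667
print(x.mean(), y.mean(), b0, b1, reg_ss, res_ss, se_b1, r, r ** 2)
```

Running the same calculation on any of the other three datasets reproduces the same table of statistics to the precision reported.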
Figure 1 is the picture drawn
by the mind's eye when a simple linear regression equation is reported. Yet, the
same summary statistics apply to figure 2, which shows a perfect curvilinear
relation, and to figure 3, which shows a perfect linear relation except for a
single outlier.
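Both claims are easy to verify numerically. A sketch under the same assumptions as above: a quadratic reproduces the second dataset to within the rounding of the published values, and removing the single outlier from the third dataset leaves an almost exactly linear relation.

```python
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

# Figure 2: a quadratic in x reproduces y2 almost exactly.
coef = np.polyfit(x, y2, 2)
print(np.max(np.abs(np.polyval(coef, x) - y2)))  # ~0.003, i.e. rounding error only

# Figure 3: drop the outlier at x = 13 and y3 is almost exactly linear in x.
keep = x != 13
b1, b0 = np.polyfit(x[keep], y3[keep], 1)
print(b0, b1)  # roughly y = 4 + 0.345 x, with near-zero residuals
```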
The summary statistics also apply to figure 4, which is the most troublesome case. Figures 2 and 3 clearly call the straight-line relation into question; figure 4 does not. A straight line may well be appropriate in the fourth case. However, the regression equation is determined entirely by the single observation at x = 19. Paraphrasing Anscombe, we need to know both the relation between y and x and the special contribution of the observation at x = 19 to that relation.
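Anscombe's point about the fourth dataset can be made precise with the leverage (hat) values, which for a simple linear regression are h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)². A sketch under the same assumptions: the observation at x = 19 has leverage exactly 1, so the fitted line must pass through it, and with that point removed the slope is undefined because every remaining x equals 8.

```python
import numpy as np

# x values of Anscombe's fourth dataset, in the order of the table below.
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
n = len(x4)
sxx = np.sum((x4 - x4.mean()) ** 2)        # 110.0, same as the other datasets
h = 1 / n + (x4 - x4.mean()) ** 2 / sxx    # leverage of each observation
print(h)  # 0.1 at every x = 8; exactly 1.0 at x = 19
```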
Anscombe's four datasets are tabulated below. The first column of x values is shared by y1, y2, and y3; the fourth dataset has its own x values (x4).

| x | y1 | y2 | y3 | x4 | y4 |
| --- | --- | --- | --- | --- | --- |
| 10 | 8.04 | 9.14 | 7.46 | 8 | 6.58 |
| 8 | 6.95 | 8.14 | 6.77 | 8 | 5.76 |
| 13 | 7.58 | 8.74 | 12.74 | 8 | 7.71 |
| 9 | 8.81 | 8.77 | 7.11 | 8 | 8.84 |
| 11 | 8.33 | 9.26 | 7.81 | 8 | 8.47 |
| 14 | 9.96 | 8.10 | 8.84 | 8 | 7.04 |
| 6 | 7.24 | 6.13 | 6.08 | 8 | 5.25 |
| 4 | 4.26 | 3.10 | 5.39 | 19 | 12.50 |
| 12 | 10.84 | 9.13 | 8.15 | 8 | 5.56 |
| 7 | 4.82 | 7.26 | 6.42 | 8 | 7.91 |
| 5 | 5.68 | 4.74 | 5.73 | 8 | 6.89 |
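As a check on the table, a sketch (same assumptions as above) fitting all four datasets; each produces the same line and correlation to the precision reported.

```python
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
datasets = {
    "y1": (x, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "y2": (x, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "y3": (x, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "y4": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}
for name, (xs, y) in datasets.items():
    b1, b0 = np.polyfit(xs, y, 1)          # least-squares slope and intercept
    r = np.corrcoef(xs, y)[0, 1]
    print(f"{name}: y = {b0:.2f} + {b1:.3f} x, r = {r:.3f}")
# Each line prints y = 3.00 + 0.500 x, r = 0.816 (to rounding),
# which is the whole point: the plots differ, the statistics do not.
```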