each with eleven observations of two variables we'll call X and Y. Look at the following Minitab session:
MTB > read 'anscombe.1' into c1-c2
Entering data from file: anscombe.1
11 rows read.
MTB > name c1 'X' c2 'Y'
MTB > regress c2 1 c1;
SUBC> fits = c3;
SUBC> resid = c4.
The regression equation is
Y = 3.00 + 0.500 X

Predictor       Coef     Stdev   t-ratio       p
Constant       3.000     1.125      2.67   0.026
X             0.5001    0.1179      4.24   0.002

s = 1.237     R-sq = 66.7%     R-sq(adj) = 62.9%

Analysis of Variance

SOURCE       DF        SS        MS       F       p
Regression    1    27.510    27.510   17.99   0.002
Error         9    13.763     1.529
Total        10    41.273

MTB > name c3 'Fit' c4 'Res'
MTB > plot c4 c3
Res     -                                             *
        -
        -
     1.2+               *              *
        -
        -
        -
        -          *
     0.0+                         *         *                   *
        -                                        *
        -
        -     *
        -
    -1.2+
        -
        -                    *
        -                                                  *
          ----+---------+---------+---------+---------+---------+--Fit
             5.0       6.0       7.0       8.0       9.0      10.0
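
Before moving on, it can help to see how the summary figures in the output relate to one another. The short Python sketch below is not part of the Minitab session; it simply copies the printed values for anscombe.1 and recomputes MS, F, s, R-sq, R-sq(adj), and the t-ratio for X from them. The variable names are made up for illustration, and the results should agree with the printout only up to rounding.

```python
import math

# Figures copied from the Minitab output for anscombe.1 (already rounded).
ss_regression, ss_error, ss_total = 27.510, 13.763, 41.273
df_regression, df_error = 1, 9
coef_x, stdev_x = 0.5001, 0.1179

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_ratio = ms_regression / ms_error      # the F column
s = math.sqrt(ms_error)                 # the "s =" line
r_sq = ss_regression / ss_total         # R-sq, as a fraction
ms_total = ss_total / (df_regression + df_error)
r_sq_adj = 1 - ms_error / ms_total      # R-sq(adj), as a fraction
t_x = coef_x / stdev_x                  # t-ratio for X

print(f"MS(Error) = {ms_error:.3f}")
print(f"F         = {f_ratio:.2f}")
print(f"s         = {s:.3f}")
print(f"R-sq      = {100 * r_sq:.1f}%")
print(f"R-sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"t for X   = {t_x:.2f}   (t squared = {t_x ** 2:.2f})")
```

With a single predictor, the square of the t-ratio for X should come out close to F, which is why both carry the same p-value in the printout.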
Now try it yourself with each of the four data sets: anscombe.1,
anscombe.2, anscombe.3, and anscombe.4.
The remarkable thing is that the regression equation, Stdev values and
t-ratios, Analysis of Variance table, etc. are IDENTICAL, or very nearly
so, for all four data sets, yet the residual plots reveal that the four
data sets are quite different. Why?
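
If Minitab is not at hand, the same comparison can be sketched in plain Python. The sketch below is only an illustration, not the course's method. The data values are Anscombe's published quartet typed in by hand; that the files anscombe.1 through anscombe.4 hold exactly these values, in this order, is an assumption (the printed output for anscombe.1 is consistent with the first set). The helper name fit_line is hypothetical.

```python
# Anscombe's published quartet, typed in by hand. ASSUMPTION: the course
# files anscombe.1 ... anscombe.4 contain these values in this order.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "anscombe.1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68]),
    "anscombe.2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                          6.13, 3.10, 9.13, 7.26, 4.74]),
    "anscombe.3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                          6.08, 5.39, 8.15, 6.42, 5.73]),
    "anscombe.4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                    5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = ybar - b1 * xbar               # intercept
    fits = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fits)]
    sse = sum(r ** 2 for r in resid)    # SS for Error
    sst = sum((yi - ybar) ** 2 for yi in y)  # SS Total
    return {
        "b0": b0,
        "b1": b1,
        "s": (sse / (n - 2)) ** 0.5,    # residual standard deviation
        "rsq": 1 - sse / sst,
        "fits": fits,
        "resid": resid,
    }


results = {name: fit_line(x, y) for name, (x, y) in DATA.items()}

for name, r in results.items():
    print(f"{name}: Y = {r['b0']:.2f} + {r['b1']:.3f} X"
          f"   s = {r['s']:.3f}   R-sq = {100 * r['rsq']:.1f}%")
    # Residuals are listed in the same order as the observations.
    print("   residuals:", " ".join(f"{e:6.2f}" for e in r["resid"]))
```

The one-line summaries should look almost the same for every file, while the residual lists do not. Plotting each residual list against the matching fits gives the Python counterpart of the Minitab plot command.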