each with eleven observations of two variables we'll call X and Y. Look at the following Minitab session:
MTB > read 'anscombe.1' into c1-c2
Entering data from file: anscombe.1
     11 rows read.
MTB > name c1 'X' c2 'Y'
MTB > regress c2 1 c1;
SUBC> fits = c3;
SUBC> resid = c4.

The regression equation is
Y = 3.00 + 0.500 X

Predictor       Coef      Stdev    t-ratio        p
Constant       3.000      1.125       2.67    0.026
X             0.5001     0.1179       4.24    0.002

s = 1.237       R-sq = 66.7%     R-sq(adj) = 62.9%

Analysis of Variance

SOURCE        DF        SS        MS        F        p
Regression     1    27.510    27.510    17.99    0.002
Error          9    13.763     1.529
Total         10    41.273

MTB > name c3 'Fit' c4 'Res'
MTB > plot c4 c3

[Character plot: Res (vertical axis, ticks at -1.2, 0.0, 1.2) versus Fit (horizontal axis, 5.0 to 10.0); eleven points plotted.]

Now you try it, with each of the four data sets anscombe.1, anscombe.2, anscombe.3, and anscombe.4. The remarkable thing is that the regression equation, Stdev values and t-ratios, Analysis of Variance table, etc. are virtually IDENTICAL for all four data sets, but the residual analysis reveals that they are quite different. Why?
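
If you want to check the arithmetic behind the Minitab output by hand, here is a minimal Python sketch of the same least-squares fit. It assumes that anscombe.1 holds the first of Anscombe's four published data sets (the X and Y values typed in below); if your copy of the file differs, substitute its values. The function name simple_regression and the dictionary keys are made up for this illustration and are not part of Minitab.

```python
import math

# ASSUMPTION: these are the eleven (X, Y) pairs of Anscombe's first data set,
# taken here to be the contents of the file anscombe.1.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def simple_regression(x, y):
    """Least-squares fit of y on x (hypothetical helper for this handout)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Fitted values and residuals, like the columns c3 ('Fit') and c4 ('Res').
    fits = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fits)]

    sse = sum(r ** 2 for r in resid)           # Error SS
    sst = sum((yi - ybar) ** 2 for yi in y)    # Total SS
    return {
        "slope": slope,
        "intercept": intercept,
        "fits": fits,
        "resid": resid,
        "sse": sse,
        "sst": sst,
        "ssr": sst - sse,                      # Regression SS
        "r_sq": 1 - sse / sst,
        "s": math.sqrt(sse / (n - 2)),         # residual standard deviation
    }


result = simple_regression(X, Y)
print(f"Y = {result['intercept']:.2f} + {result['slope']:.3f} X")
print(f"s = {result['s']:.3f}   R-sq = {100 * result['r_sq']:.1f}%")

# List the residuals in order of fitted value, as a stand-in for the plot.
for fit, res in sorted(zip(result["fits"], result["resid"])):
    print(f"Fit = {fit:6.2f}   Res = {res:6.2f}")
```

To repeat the exercise for the other three data sets, replace X and Y with the values from the corresponding file and compare the residual listings, not just the summary numbers.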