1.3. Regression-Correlation Data

Two different measurements, x and y, are taken on each of n subjects: (x1,y1), (x2,y2), ..., (xn,yn). The objective is to quantify the relationship between x and y.

This is an exercise in using the correlation and regression facilities in Minitab. We will use actual data from a study of achievement, aptitude, and study habits. Copy the dataset into your account:

	  cp ~peter/achieve.data achieve.data  
Look at the data set by typing:
	  more achieve.data  
The dataset has three columns and 30 rows. Each row corresponds to one subject and reports three variables: ACHSCOR (column 1), the score on a standard achievement test; INTEL (column 2), a measure of innate intellectual ability; and STUDHAB (column 3), a measure of the quality of study habits:
     47      20      55
     46      19      40
     44      10      56
     46      23      81
     36      20      49
     59      22      76
     50      23      49
     58      21      63
     52      17      69
     57      17      88
     31      14      46
     41      24      48
     49      20      31
     66      30      63
     49      17      57
     34      14      50
     57      21      61
     44      14      26
     53      21      51
     40      18      25
     53      16      64
     51      17      24
     57      24      69
     54      26      66
     46      21      52
     38      12      40
     70      25      107
     53      19      41
     43      15      35
     56      23      46
If your xterm window is too short to display all rows, more pauses after each screenful; press the space bar to continue. Then start Minitab:
	  minitab  
First, we need to read the data into Minitab columns:
	  read 'achieve.data' c1 c2 c3  
This reads the three columns of the data file into the Minitab columns c1, c2, and c3. To assign names to the columns, type:
	  names c1='ACHSCOR'  c2='INTEL'  c3='STUDHAB'  
Don't forget the single quotes around the names!
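If you want to cross-check Minitab's results outside Minitab, here is a minimal Python sketch of what read and names do together: it splits each line of a whitespace-separated file such as achieve.data into fields and stores them in named columns. The function name read_columns is hypothetical; it is not part of Minitab or of any library.

```python
# Hypothetical helper: parse whitespace-separated numeric text (such as the
# contents of achieve.data) into named columns, similar in spirit to
# Minitab's "read ... c1 c2 c3" followed by "names".

def read_columns(text, names):
    """Return a dict mapping each column name to a list of floats, one per row."""
    columns = {name: [] for name in names}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(names):
            raise ValueError(
                f"line {line_no}: expected {len(names)} values, got {len(fields)}"
            )
        for name, field in zip(names, fields):
            columns[name].append(float(field))
    return columns


# In practice you would pass open("achieve.data").read(); here we use the
# first three rows of the listing above.
sample = "47 20 55\n46 19 40\n44 10 56\n"
data = read_columns(sample, ["ACHSCOR", "INTEL", "STUDHAB"])
print(data["INTEL"])  # [20.0, 19.0, 10.0]
```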

1.3.1 Correlation Coefficient

Use the Minitab command corr to compute correlation coefficients:
	  corr 'ACHSCOR' 'INTEL' 'STUDHAB'  
Again, don't forget the quotes around the variable names. The output of corr shows, in one table, the correlation coefficient for every pair of variables. Type help corr for more details.
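The coefficient corr reports for each pair is the sample (Pearson) correlation r = Sxy / sqrt(Sxx * Syy), where Sxy is the sum of (xi - xbar)(yi - ybar), Sxx is the sum of (xi - xbar)^2, and Syy is the sum of (yi - ybar)^2. The Python sketch below computes r for every pair of named columns. The helper names pearson and corr_table are hypothetical, and the sketch mirrors only the arithmetic, not the layout of Minitab's table.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross-products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # spread of x
    syy = sum((yi - my) ** 2 for yi in y)                     # spread of y
    return sxy / math.sqrt(sxx * syy)


def corr_table(columns):
    """Correlation for every pair of named columns, keyed by (name1, name2)."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
```

Calling corr_table with the three named columns of achieve.data gives one r per pair, the same three pairs that corr 'ACHSCOR' 'INTEL' 'STUDHAB' reports.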
Question 4: Discuss the correlation coefficients. What do they tell you about the relationships between the variables?
Question 5: What pattern do you expect in a scatterplot of ACHSCOR versus INTEL? To check your answer, type:
	  plot 'ACHSCOR' 'INTEL'  
If you want to save the plot for later printing, use the outfile command and redo the plot:
	  outfile 'plot-score-intel.lis'  
	  plot 'ACHSCOR' 'INTEL'  
	  nooutfile  
The outfile command tells Minitab to save all subsequent output, including the plot, in the file plot-score-intel.lis until you give the nooutfile command.
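The outfile ... nooutfile pair acts like a switch: while it is on, output is also written to a file. As a rough Python analogue (an illustration of the idea only, not how Minitab implements it), a context manager can copy everything printed inside a with block to a file while still showing it on the screen. The names Tee and outfile below are hypothetical.

```python
import contextlib
import sys


class Tee:
    """File-like object that writes every string to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


@contextlib.contextmanager
def outfile(path):
    """Copy everything printed inside the with block to `path` and the screen."""
    with open(path, "w") as f:
        # sys.stdout is read here, before redirection, so it is the real screen.
        with contextlib.redirect_stdout(Tee(sys.stdout, f)):
            yield
    # Leaving the block restores stdout and closes the file (like nooutfile).
```

Wrapping the code that prints a plot in `with outfile("plot-score-intel.lis"):` plays the role of the three Minitab commands above.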

1.3.2 Regression Line

To fit a regression line (i.e., the "least squares line"), use the command:
	  regress 'ACHSCOR' 1 'INTEL'  
The "1" tells Minitab that we are using only one "explanatory variable" (we will do regression with more than one explanatory variable later in the semester). For the moment, ignore most of the output except for the first two lines, which state the fitted line in the form y = a + b x (a is the intercept, b the slope), and the number labeled "R-sq", which is simply the squared correlation coefficient, reported as a percentage.
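With one explanatory variable, the least squares line y = a + b x has slope b = Sxy / Sxx and intercept a = ybar - b * xbar, where Sxx, Syy, and Sxy are the centered sums of squares and cross-products; R-sq equals r^2 = Sxy^2 / (Sxx * Syy). A minimal Python sketch of these formulas follows. least_squares and predict are hypothetical helpers, and this sketch returns R-sq as a fraction between 0 and 1.

```python
def least_squares(x, y):
    """Fit y = a + b*x by least squares and return (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept: the line passes through (xbar, ybar)
    # R-sq as a fraction; for one explanatory variable it equals r**2.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return a, b, r_squared


def predict(a, b, x):
    """Value of the fitted line y = a + b*x at the given x."""
    return a + b * x


# Usage on made-up numbers (not achieve.data): y = 1 + 2x exactly.
a, b, r2 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r2)           # 1.0 2.0 1.0
print(predict(a, b, 30))  # 61.0
```

For Question 6 the same arithmetic applies: evaluate a + b * 30 using the intercept and slope from your Minitab output.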
Question 6: For a student with intelligence score INTEL = 30, estimate his/her achievement score ACHSCOR using the estimated regression line.

1.3.3 Influential Points

What do you think will happen if we add an extra row to the data set for an individual with low ACHSCOR but high INTEL? Try:
	  insert 0 1 'ACHSCOR' 'INTEL'   
This command tells Minitab to insert an extra row of data into the columns ACHSCOR and INTEL between rows 0 and 1 (i.e., before the first row of the data file). Minitab will prompt you for the new data. Type the following numbers (ACHSCOR = 20, INTEL = 30), then end:
	  20  30  
	  end  
Then repeat the corr command:
	  corr 'ACHSCOR' 'INTEL' 

Question 7: Why did the correlation coefficient decrease?
To find the new slope and intercept of the least-squares line, type:
	  regress 'ACHSCOR' 1 'INTEL' 

Question 8: In what direction did the line change?
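To see the mechanism behind Questions 7 and 8 on numbers small enough to check by hand, the sketch below starts from four made-up points that lie exactly on a line (they are NOT the achieve.data values), then prepends one point with high x and low y, just as insert 0 1 prepends a row. It recomputes r and the least squares line before and after. The helper fit and the toy data are hypothetical.

```python
import math


def fit(x, y):
    """Return (r, slope, intercept) for the least squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, my - slope * mx


# Toy data (hypothetical): y rises exactly 1 point per unit of x.
intel = [10, 15, 20, 25]
achscor = [40, 45, 50, 55]
before = fit(intel, achscor)

# Prepend one influential point, like "insert 0 1" followed by "20 30":
# low ACHSCOR (20) at a high INTEL (30).
after = fit([30] + intel, [20] + achscor)

print("before: r=%.3f slope=%.3f intercept=%.3f" % before)
print("after:  r=%.3f slope=%.3f intercept=%.3f" % after)
# before: r=1.000 slope=1.000 intercept=30.000
# after:  r=-0.351 slope=-0.600 intercept=54.000
```

With only four other points, the single added point has a dramatic effect. In the real data set of 30 subjects the change will be smaller, but it acts in the same way: a point far from the trend at an extreme x value pulls the fitted line toward itself.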