2.4 Regression-Correlation Data

Two different measurements, x and y, are taken on each of n subjects, giving the pairs (x1,y1), (x2,y2), ..., (xn,yn). The objective is to quantify the relationship between x and y.

This is an exercise in using the correlation and regression facilities in Minitab. We will use actual data from a study of achievement, aptitude, and study habits. Copy the dataset into your account:

	  cp /afs/acpub/project/sta215/achieve.data achieve.data  
Look at the data set by typing:
	  more achieve.data  
The dataset has three columns and 30 rows. Each row corresponds to one subject and reports measurements for the variables ACHSCOR (column 1), the score on a standard achievement test; INTEL (column 2), a measure of innate intellectual ability; and STUDHAB (column 3), a measure of the quality of study habits:
     ACHSC   INTEL   STUDHAB
     47      20      55
     46      19      40
     44      10      56
     46      23      81
     36      20      49
     59      22      76
     50      23      49
     58      21      63
     52      17      69
     57      17      88
     31      14      46
     41      24      48
     49      20      31
     66      30      63
     49      17      57
     34      14      50
     57      21      61
     44      14      26
     53      21      51
     40      18      25
     53      16      64
     51      17      24
     57      24      69
     54      26      66
     46      21      52
     38      12      40
     70      25      107
     53      19      41
     43      15      35
     56      23      46
If your xterm window is too short to display all rows, more will pause after filling the screen; hit the space bar to continue. Then start Minitab:
	  minitab  
First, we need to read the data into Minitab columns:
	  read 'achieve.data' c1 c2 c3  
This reads the three columns of the data set into the Minitab columns c1, c2 and c3. To assign names to the columns, type:
	  names c1='ACHSCOR'  c2='INTEL'  c3='STUDHAB'  
Don't forget the single quotes around the names!
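For comparison outside Minitab, the following Python sketch splits whitespace-separated rows into one list per column, loosely analogous to read 'achieve.data' c1 c2 c3. It is an illustration only; the function name read_columns is our own, and it assumes every non-blank line holds exactly three numbers.

```python
def read_columns(text, ncols=3):
    """Split whitespace-separated numeric rows into ncols lists, one per column.

    Hypothetical helper, loosely analogous to Minitab's `read 'file' c1 c2 c3`.
    Assumes every non-blank line holds exactly ncols numbers.
    """
    columns = [[] for _ in range(ncols)]
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != ncols:
            raise ValueError(f"line {line_no}: expected {ncols} values, got {len(fields)}")
        for column, field in zip(columns, fields):
            column.append(float(field))
    return columns


# First three rows of achieve.data, as listed above.
sample = """47 20 55
46 19 40
44 10 56
"""
achscor, intel, studhab = read_columns(sample)
print(achscor, intel, studhab)
```

To read the real file you would pass its contents instead, e.g. read_columns(open("achieve.data").read()).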

2.4.1 Correlation Coefficient

Use the Minitab command corr to compute correlation coefficients:
	  corr 'ACHSCOR' 'INTEL' 'STUDHAB'  
Again, don't forget the quotes around the variable names. The output of corr shows the correlation coefficient for every possible pair of variables in one table. Type help corr for more details.
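To see what a correlation coefficient measures, here is a minimal Python sketch of the textbook Pearson formula r = Sxy / sqrt(Sxx * Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx, Syy are the sums of squared deviations. It is not Minitab code; the helper name pearson_r is our own, and the data are the 30 rows listed above.

```python
from math import sqrt

# Columns copied from the listing of achieve.data (30 subjects, same row order).
ACHSCOR = [47, 46, 44, 46, 36, 59, 50, 58, 52, 57, 31, 41, 49, 66, 49,
           34, 57, 44, 53, 40, 53, 51, 57, 54, 46, 38, 70, 53, 43, 56]
INTEL = [20, 19, 10, 23, 20, 22, 23, 21, 17, 17, 14, 24, 20, 30, 17,
         14, 21, 14, 21, 18, 16, 17, 24, 26, 21, 12, 25, 19, 15, 23]
STUDHAB = [55, 40, 56, 81, 49, 76, 49, 63, 69, 88, 46, 48, 31, 63, 57,
           50, 61, 26, 51, 25, 64, 24, 69, 66, 52, 40, 107, 41, 35, 46]


def pearson_r(x, y):
    """Pearson correlation: Sxy divided by sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


columns = {"ACHSCOR": ACHSCOR, "INTEL": INTEL, "STUDHAB": STUDHAB}
names = list(columns)
# One line per distinct pair of variables.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson_r(columns[names[i]], columns[names[j]])
        print(f"r({names[i]}, {names[j]}) = {r:.3f}")
```

The numbers printed should agree with the corr table up to rounding.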
Question 4: Discuss the correlation coefficients. What do they tell you about the relationships between the variables?




Question 5: What pattern do you expect for a scatterplot of ACHSCOR versus INTEL?



To verify your answer to the last question, type:
	  plot 'ACHSCOR' 'INTEL'  
If you want to save the plot for later printing, use the outfile command and redo the plot:
	  outfile 'plot-score-intel.lis'  
	  plot 'ACHSCOR' 'INTEL'  
	  nooutfile  
The outfile command tells Minitab to save all subsequent output, including the plot, in the file plot-score-intel.lis until the nooutfile command is given.

2.4.2 Regression Line

To fit a regression line (i.e. ``least squares line''), use the command:
	  regress 'ACHSCOR' 1 'INTEL'  
The ``1'' tells Minitab that we are using only one ``explanatory variable'' (we will do regression with more than one explanatory variable later in the semester). For the moment, ignore most of the output except the first two lines, which state the fitted line (in the form y = a + b x, with intercept a and slope b), and the number labeled ``R-sq'' (the squared correlation coefficient, reported as a percentage).
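The least-squares line can also be computed by hand: the slope is b = Sxy / Sxx and the intercept is a = ybar - b * xbar, using the same deviation sums as the correlation coefficient. The Python sketch below illustrates these formulas (it is not Minitab's implementation; the name least_squares is our own) and checks that, with one explanatory variable, 1 - SSE/SST equals the squared correlation coefficient.

```python
from math import sqrt

# Columns copied from the listing of achieve.data (30 subjects).
ACHSCOR = [47, 46, 44, 46, 36, 59, 50, 58, 52, 57, 31, 41, 49, 66, 49,
           34, 57, 44, 53, 40, 53, 51, 57, 54, 46, 38, 70, 53, 43, 56]
INTEL = [20, 19, 10, 23, 20, 22, 23, 21, 17, 17, 14, 24, 20, 30, 17,
         14, 21, 14, 21, 18, 16, 17, 24, 26, 21, 12, 25, 19, 15, 23]


def least_squares(x, y):
    """Return (intercept, slope, r, r_squared) for the line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    # Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    return intercept, slope, r, r_squared


a, b, r, r2 = least_squares(INTEL, ACHSCOR)
print(f"ACHSCOR = {a:.2f} + {b:.3f} INTEL    R-sq = {100 * r2:.1f}%    r^2 = {r ** 2:.3f}")
```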
Question 6: For a student with intelligence score INTEL = 30, use the fitted regression line to estimate his or her achievement score ACHSCOR.



2.4.3 Influential Points

What do you think will happen if we add a row to the data set for an individual with low ACHSCOR but high INTEL? Try:
	  insert 0 1 'ACHSCOR' 'INTEL'   
The command tells Minitab to insert an extra row of data between rows 0 and 1 (i.e. before the first row of the data file). Minitab will respond with a prompt for the new data. Type the following numbers (ACHSCOR = 20, INTEL = 30):
	  20  30  
	  end  
Then repeat the corr command:
	  corr 'ACHSCOR' 'INTEL' 
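The effect of the extra subject on r can be reproduced outside Minitab. This Python sketch (an illustration, not Minitab code; pearson_r is our own helper name) computes the Pearson correlation for the original 30 subjects and again after prepending the point ACHSCOR = 20, INTEL = 30, mirroring the insert step above.

```python
from math import sqrt

# Columns copied from the listing of achieve.data (30 subjects).
ACHSCOR = [47, 46, 44, 46, 36, 59, 50, 58, 52, 57, 31, 41, 49, 66, 49,
           34, 57, 44, 53, 40, 53, 51, 57, 54, 46, 38, 70, 53, 43, 56]
INTEL = [20, 19, 10, 23, 20, 22, 23, 21, 17, 17, 14, 24, 20, 30, 17,
         14, 21, 14, 21, 18, 16, 17, 24, 26, 21, 12, 25, 19, 15, 23]


def pearson_r(x, y):
    """Pearson correlation: Sxy divided by sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_before = pearson_r(ACHSCOR, INTEL)
# New subject placed first, as with `insert 0 1`: low ACHSCOR, high INTEL.
r_after = pearson_r([20] + ACHSCOR, [30] + INTEL)
print(f"r before: {r_before:.3f}   r after: {r_after:.3f}")
```

Because r does not depend on row order, inserting the point at the top or at the bottom gives the same result.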

Question 7: Why did the correlation coefficient decrease?



To find the new values for slope and intercept of the least-squares line:
	  regress 'ACHSCOR' 1 'INTEL' 
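A short Python sketch (our own illustration; the helper fit_line is hypothetical, not part of Minitab) fits the least-squares line with and without the added subject, so the two slopes and intercepts can be compared side by side with the regress output.

```python
# Columns copied from the listing of achieve.data (30 subjects).
ACHSCOR = [47, 46, 44, 46, 36, 59, 50, 58, 52, 57, 31, 41, 49, 66, 49,
           34, 57, 44, 53, 40, 53, 51, 57, 54, 46, 38, 70, 53, 43, 56]
INTEL = [20, 19, 10, 23, 20, 22, 23, 21, 17, 17, 14, 24, 20, 30, 17,
         14, 21, 14, 21, 18, 16, 17, 24, 26, 21, 12, 25, 19, 15, 23]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


a0, b0 = fit_line(INTEL, ACHSCOR)
a1, b1 = fit_line([30] + INTEL, [20] + ACHSCOR)  # with the added subject
print(f"original 30 rows: ACHSCOR = {a0:.2f} + {b0:.3f} INTEL")
print(f"with extra row:   ACHSCOR = {a1:.2f} + {b1:.3f} INTEL")
```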

Question 8: In which direction did the line change?
