In this lab, we will use MATLAB to illustrate simple linear regression.
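Items that we will investigate using MATLAB include the least squares regression line, the hypothesis test for the slope, confidence intervals for the coefficients, the coefficient of determination, the coefficient of correlation, and the residuals.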
Use the data from problem 11.51 in the book to investigate the relationship between the AUTOMARK and INSTRUCTOR scores given to the same computer programming assignments. We are interested in whether there is a linear relationship between them and, if so, how well the best-fitting straight line predicts the INSTRUCTOR score from the AUTOMARK score (see problem 11.51, p. 580, for more details).
Read in the data and call the first column AUTOMARK, and the second column INSTRUCTOR, or some variation thereof.
score = load('lab8.dat');
AUTOMARK = [ones(size(score,1),1) score(:,1)];  % X needs to be a matrix
                                                % with a leading column of ones.
INSTRUCTOR = score(:,2);
Let's investigate the items above:
% The predictor variable (X) is going to be AUTOMARK
% and the response variable (Y) is INSTRUCTOR.
% First make a scatter plot of AUTOMARK vs. INSTRUCTOR
plot(AUTOMARK(:,2), INSTRUCTOR, '.')
% add the appropriate labels etc.
% Now find the least squares regression line
[b,bint,r,rint,stats] = regress(INSTRUCTOR,AUTOMARK,0.05);
b    % = estimates for the regression coefficients
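To see what the least squares fit computes, the coefficient estimates can be reproduced without regress. The following is a minimal sketch, assuming a small made-up data set (the vectors x and y are stand-ins, not the lab8.dat values). It compares the backslash solution with the textbook closed-form formulas for the slope and intercept.

```matlab
% Made-up stand-in data (NOT the lab8.dat values), for illustration only.
x = [10 12 14 16 18 20 22]';      % predictor
y = [11 12 15 15 19 20 22]';      % response
X = [ones(size(x,1),1) x];        % design matrix with a leading column of ones

% Least squares solution via the backslash operator.
b_backslash = X \ y;

% Closed-form estimates for simple linear regression.
Sxy = sum((x - mean(x)) .* (y - mean(y)));
Sxx = sum((x - mean(x)).^2);
b1 = Sxy / Sxx;                   % slope
b0 = mean(y) - b1 * mean(x);      % intercept
b_formula = [b0; b1];

disp([b_backslash b_formula])     % the two columns should agree
```

With the lab data, replacing x and y by AUTOMARK(:,2) and INSTRUCTOR should give the same values as the b returned by regress.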
Now add the regression line to the scatter plot.
x = 10:22;
y = b(1) + b(2).*x;
hold on;
plot(x, y);
% OR, in a single call:
% plot(AUTOMARK(:,2), INSTRUCTOR, '.', x, y);
The hypothesis H0: b1 = 0 vs. Ha: b1 != 0 can be tested by specifying the alpha level of the test as the third argument to regress (default is 0.05). The test statistic is built from the estimated slope b1, which MATLAB stores in b(2), so look at that estimate first. What other important information is in the coefficients? What do the coefficients mean? What is the significance of testing whether the slope is zero, i.e., in words, what are you testing?
Check out the test statistic and p-value in stats:
stats % = R-squared, F-statistic, p-value
The F-statistic is the second number in stats. In simple linear regression it equals the square of the t-statistic for the slope, so its square root gives the magnitude of the t-statistic; the sign is the same as the sign of b1. The p-value is the same for the F-test and the two-sided t-test, so you don't have to transform it. What does it signify, i.e., what test is it for?
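The relationship between the F-statistic and the t-statistic can be checked numerically. The sketch below assumes made-up stand-in data (not the lab8.dat values) and uses tcdf from the Statistics Toolbox, the same toolbox that provides regress. It recovers t from F and recomputes the two-sided p-value from the t distribution with n-2 degrees of freedom.

```matlab
% Made-up stand-in data (NOT the lab8.dat values), for illustration only.
x = [10 12 14 16 18 20 22]';
y = [11 12 15 15 19 20 22]';
X = [ones(size(x,1),1) x];

[b,~,~,~,stats] = regress(y, X, 0.05);
n = length(y);

F = stats(2);
t = sign(b(2)) * sqrt(F);             % t-statistic for the slope, signed like b1
p_from_t = 2 * tcdf(-abs(t), n - 2);  % two-sided p-value, n-2 degrees of freedom

fprintf('t = %.4f, F = %.4f\n', t, F);
fprintf('p from t = %.6g, p from stats = %.6g\n', p_from_t, stats(3));
```

The two p-values printed on the last line should match up to rounding error.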
Confidence intervals at the 1-alpha level are given by
bint    % 100*(1-alpha)% confidence intervals: row 1 for b0, row 2 for b1
What can you say about the above hypothesis test by looking at the confidence interval? What sticks out to you about the confidence interval for b0?
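One way to explore the link between the confidence interval and the hypothesis test is to compare the two decisions directly. This is a sketch under stated assumptions: the data are made-up stand-ins (not the lab8.dat values), and the variable names half_width, se, slope_ci_excludes_zero and reject_H0 are illustrative only. It also backs out the standard errors implied by the intervals, using tinv from the Statistics Toolbox.

```matlab
% Made-up stand-in data (NOT the lab8.dat values), for illustration only.
x = [10 12 14 16 18 20 22]';
y = [11 12 15 15 19 20 22]';
X = [ones(size(x,1),1) x];

alpha = 0.05;
[b,bint,~,~,stats] = regress(y, X, alpha);
n = length(y);

% Each interval is centered on the estimate: b +/- tcrit * (standard error).
tcrit = tinv(1 - alpha/2, n - 2);
half_width = (bint(:,2) - bint(:,1)) / 2;
se = half_width / tcrit;              % standard errors implied by the intervals

% Does the interval for the slope exclude zero? Does the test reject H0?
slope_ci_excludes_zero = bint(2,1) > 0 || bint(2,2) < 0;
reject_H0 = stats(3) < alpha;

fprintf('CI for b1: [%.4f, %.4f]\n', bint(2,1), bint(2,2));
fprintf('CI excludes zero: %d, test rejects H0: %d\n', ...
        slope_ci_excludes_zero, reject_H0);
```

Changing alpha changes both the interval width and the rejection decision, and the two flags should stay in agreement.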
R-squared is given in stats, and is the coefficient of determination.
stats % = R-squared, F-statistic, p-value
To find the coefficient of correlation, take the square root of R-squared and give it the sign of b1.
Rsd = stats(1);                      % R-squared
corr_xy = sign(b(2)) * sqrt(Rsd)     % named corr_xy to avoid shadowing the toolbox function corr
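The sign matters whenever the slope is negative. The sketch below uses made-up stand-in data with a decreasing trend (not the lab8.dat values) and checks the value obtained from R-squared against corrcoef, which is part of base MATLAB. The names r_from_fit and r_direct are illustrative.

```matlab
% Made-up stand-in data with a negative trend (NOT the lab8.dat values).
x = [10 12 14 16 18 20 22]';
y = [22 20 19 15 15 12 11]';
X = [ones(size(x,1),1) x];

[b,~,~,~,stats] = regress(y, X);

% Correlation recovered from R-squared and the sign of the slope.
r_from_fit = sign(b(2)) * sqrt(stats(1));

% Correlation computed directly from the data.
C = corrcoef(x, y);
r_direct = C(1,2);

fprintf('from fit: %.6f, direct: %.6f\n', r_from_fit, r_direct);
```

Without the sign(b(2)) factor, the value from the fit would come out positive here even though the two scores move in opposite directions.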
What do you think r and rint are? Can you use r to find an estimate for sigma?
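As a hint for the last question, one common estimate of sigma divides the sum of squared residuals by n-2, since two coefficients were estimated. The following is a sketch on made-up stand-in data (not the lab8.dat values). The comparison with stats(4) assumes a MATLAB release in which regress returns an error variance estimate as a fourth entry of stats, so it is guarded.

```matlab
% Made-up stand-in data (NOT the lab8.dat values), for illustration only.
x = [10 12 14 16 18 20 22]';
y = [11 12 15 15 19 20 22]';
X = [ones(size(x,1),1) x];

[b,~,r,~,stats] = regress(y, X);
n = length(y);

% r holds the residuals: observed response minus fitted value.
fitted = X * b;
resid_check = y - fitted;

% Estimate of sigma with n-2 degrees of freedom.
s = sqrt(sum(r.^2) / (n - 2));
fprintf('estimate of sigma: %.4f\n', s);

% Newer releases also report the error variance in stats(4) (assumption).
if numel(stats) >= 4
    fprintf('sqrt(stats(4)):    %.4f\n', sqrt(stats(4)));
end
```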
Now you know how to use MATLAB to do a basic linear regression analysis. In the coming weeks you'll learn how to build on this to consider more than one predictor variable at a time.