In this lab, we will use S-Plus to carry out simple and multiple regressions using the Low Birth Weight data. Read in the data lowbwt (since you have used it before, you may only need to restore your previous workspace; if not, download it and read it in again).
Simple Linear Regression (Ex 18:9)
Construct a two-way scatter plot of systolic blood pressure (sbp) versus gestational age (gestage). What does this suggest about the use of linear regression?
Using sbp as the response and gestage as the explanatory variable, find the least squares line. To fit the OLS regression in S-Plus, go to the Statistics menu and select Regression > Linear. In the dialog box, select sbp for the Dependent (response) variable and gestage for the Independent variable. The formula window describes the model that we are fitting, i.e. the mean of sbp is modelled as a linear function of gestage.
The other tabs in the dialog box control the output and plots.
To create a residual plot, select the "Plot" tab, and check off the boxes for the plots that you would like to create. Select Residuals vs Fit and Residuals Normal QQ.
For the Output tab, select Long Output and ANOVA table.
To create fitted values, confidence intervals, and SEs of the fitted means, select the "Predict" tab. We will find the fitted values for all cases. Give a new name to save the output, e.g. ex8, and select all three boxes. This output will include the 95% confidence interval directly. Calculating the prediction interval, as in part (g), will require some hand calculations: SE(predict) = sqrt(S(Y|X)^2 + SE(fit)^2), where S(Y|X) is the residual standard error and SE(fit) is the SE for the fitted mean at x.
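The hand calculation for the prediction interval can be sketched in a few lines of Python. This is only an illustration of the formula above, not part of the S-Plus workflow: the function names are hypothetical, the numbers are made up (they are not taken from lowbwt), and t_crit stands for the t quantile you would look up for the residual degrees of freedom.

```python
import math

def prediction_se(resid_se, se_fit):
    """SE(predict) = sqrt(S(Y|X)^2 + SE(fit)^2)."""
    return math.sqrt(resid_se ** 2 + se_fit ** 2)

def prediction_interval(fit, resid_se, se_fit, t_crit):
    """Return (lower, upper) limits: fit -/+ t_crit * SE(predict)."""
    se_pred = prediction_se(resid_se, se_fit)
    return fit - t_crit * se_pred, fit + t_crit * se_pred

# Made-up values for illustration only (NOT from the lowbwt data):
# fitted mean 50, residual standard error 3, SE of fitted mean 4,
# and a placeholder t quantile of 2.
lower, upper = prediction_interval(fit=50.0, resid_se=3.0, se_fit=4.0, t_crit=2.0)
print(lower, upper)
```

In practice, fit and SE(fit) come from the saved Predict output, and S(Y|X) is the residual standard error reported in the regression summary.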
Answer the questions in problem 18:9
Multiple Regression (exercises 19:8-9)
There are other variables that may be important in predicting sbp. Create a scatter plot of sbp versus apgar5. Is there a linear relationship?
To include apgar5 in the regression model, go to the Statistics menu and select Regression > Linear. Again select sbp as the Dependent variable, and select both gestage and apgar5 as the Independent variables (you may need to use a Control-click to select both). Repeat the previous steps to produce residual plots, ANOVA tables, and predictions. Click on OK.
Repeat the above steps to fit a model with independent variables gestage and sex.
To fit a model that allows an interaction between sex and gestational age, go to the linear regression dialog. Rather than selecting terms for the model, this time click on Create Formula. Choose sbp as the Response. To fit the model with gestage, sex and the interaction between sex and gestage, Control-click to select both the sex and gestage variables. Click on the "Main + Interact: (*)" button; your model formula should look like sbp ~ sex*gestage, which fits the "main" effects of sex and gestage plus their interaction. Note: a * in a formula does not mean multiplication! Click on OK to return to the linear regression dialog, select the plots that you would like to create, and then click on OK to run the regression.
Creating plots with the regression lines is a little tricky, but feasible with patience!
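To see what the formula sbp ~ sex*gestage expands to, the following Python sketch builds the columns the model uses (intercept, sex, gestage, and the sex-by-gestage interaction) and shows that the fit amounts to a separate line for each sex. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from lowbwt.

```python
def design_row(sex, gestage):
    """Columns implied by sbp ~ sex*gestage:
    intercept, sex, gestage, and the sex:gestage interaction."""
    return [1.0, float(sex), float(gestage), float(sex) * float(gestage)]

def predict(coefs, sex, gestage):
    """Fitted mean: sum of coefficient times column."""
    return sum(b * x for b, x in zip(coefs, design_row(sex, gestage)))

def line_for_sex(coefs, sex):
    """Intercept and slope (in gestage) of the fitted line for one sex."""
    b0, b_sex, b_gestage, b_interaction = coefs
    return b0 + b_sex * sex, b_gestage + b_interaction * sex

# Hypothetical coefficients (NOT estimates from the lowbwt data).
coefs = [10.0, 2.0, 1.0, 0.5]
print(line_for_sex(coefs, 0))  # line for sex = 0
print(line_for_sex(coefs, 1))  # line for sex = 1
```

The sex coefficient shifts the intercept and the interaction coefficient shifts the slope, which is why the * in the formula means "main effects plus interaction" rather than a single product term.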
First, save your predictions in your lowbwt data frame.
Create a scatterplot of sbp versus gestage. Relabel the axes nicely, and add a title.
To add the two regression lines, go to the Insert menu and select Plot. Select Line Plot as the type to insert. In the dialog, select gestage as the x axis, and fit as the y column. In the "Subset Rows with" field, enter sex==0. (If your lines look like they were created by a Spirograph, select Presort data XY on X in the Smooth/Sort tab.)
This will plot the line only for sex = 0. Click OK. Repeat, but use sex==1. You can change the color and line type in the other tabs.
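The subsetting and presorting used in these steps can be sketched in Python as follows. A line plot joins points in the order they appear, so unsorted rows give the zigzag "Spirograph" effect; sorting on x first removes it. The function name and the data values are made up for illustration and are not from lowbwt.

```python
def line_points(gestage, fit, sex, group):
    """Keep the rows with sex == group, then sort the (x, y) pairs on x
    so that a line drawn through them runs left to right."""
    points = [(g, f) for g, f, s in zip(gestage, fit, sex) if s == group]
    return sorted(points)

# Made-up rows in arbitrary order (NOT from the lowbwt data).
gestage = [30, 25, 28, 27]
fit = [50.0, 40.0, 46.0, 45.0]
sex = [0, 0, 1, 0]

print(line_points(gestage, fit, sex, 0))
print(line_points(gestage, fit, sex, 1))
```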
You may also want to create and add the confidence intervals.