Lab 12 Objectives

In this lab, we will use S-Plus to explore the relationships among (the ranks of) 5 different variables, as described in Exercise 9, Chapter 17 (page 414, POB). For this purpose we will construct a scatter plot matrix and a correlation matrix. We then use Exercise 9, Chapter 18 to illustrate simple linear regression.

Scatter Plots and Correlation

Start by importing the actions data set for Exercise 9, Chapter 17. You should see 5 columns consisting of (the ranks of) the number of disciplinary actions per 1000 doctors in each state (and DC) for each of 5 years. So, our scatter plots will show the ranks, not the raw data, and the correlations we calculate will be Spearman's rank correlations (Spearman's correlation is just the Pearson correlation formula applied to the ranks).

Instead of constructing a scatter plot for each pair of ranks one at a time (5 choose 2 = 10), we construct all the plots at once in a scatter plot matrix, often used for just this purpose of exploring the relationship among several variables (here ranks). From the pull-down menus: Graph>2D.... In the Insert Graph dialog box set Axes Type: Matrix. There is only one selection available for Plot Type:; select it and press OK. In the Scatter Plot Matrix dialog box enter the following settings: Data Set: actions, x Columns: rank91, rank92, rank93, rank94, rank95, and y Columns: rank91, rank92, rank93, rank94, rank95 (same as the x Columns:). It's nice if you select the variables for the x and y columns in this order, but this is not necessary. Press OK. You should see a scatter plot matrix of the ranks. Cool, eh? Notice how the strength of the correlation tends to die off as the years get further apart. You should be able to answer all of the questions in Exercise 9. No need to do (g) for this lab. We'll do part (d) next.
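To check the "5 choose 2 = 10" count, here is a small sketch in Python (not S-Plus) that lists every pair of rank columns. The column names are the ones entered in the dialog box above. The count of off-diagonal panels assumes a full 5-by-5 matrix, in which each pair shows up twice with the axes swapped.

```python
from itertools import combinations

# Column names as entered in the Scatter Plot Matrix dialog box.
columns = ["rank91", "rank92", "rank93", "rank94", "rank95"]

# Every unordered pair of distinct columns is one distinct scatter plot.
pairs = list(combinations(columns, 2))

for x_col, y_col in pairs:
    print(f"{y_col} vs {x_col}")

print(len(pairs), "distinct pairs")

# Assumption: a full 5-by-5 matrix draws each pair twice (once per axis order),
# so the number of off-diagonal panels is twice the number of pairs.
off_diagonal_panels = len(columns) ** 2 - len(columns)
print(off_diagonal_panels, "off-diagonal panels")
```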

Now we calculate the (Spearman's rank) correlation matrix that corresponds to the scatter plot matrix that you just created. From the pull-down menus: Statistics>Data Summaries>Correlations.... In the Correlations and Covariances dialog box, select Data Set: actions and Variables: rank91, rank92, rank93, rank94, rank95. Make sure the Statistic Type: button is set to Correlations. Press OK. You should see a correlation matrix in the Report Window. Notice how the correlation matrix confirms what we saw in the scatter plot matrix: the correlations die off with time. Remember, we just used the Pearson correlation equation to calculate correlations among ranks. So what we really have is the Spearman correlation matrix, which measures the linear association among the ranks, not necessarily the linear association among the raw data (which we don't have). Now, on to regression.
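If you want to convince yourself that Spearman's correlation is just Pearson's formula applied to ranks, here is a minimal sketch in Python (not S-Plus). The numbers are made up for illustration and are NOT the actions data; the function and column names (average_ranks, pearson, spearman, year_a, ...) are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's formula applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up raw rates for three hypothetical years.
raw = {
    "year_a": [2.1, 5.4, 3.3, 8.0, 4.7],
    "year_b": [1.9, 6.2, 2.5, 30.0, 5.1],
    "year_c": [4.0, 3.1, 2.2, 9.5, 6.3],
}
# Convert each column to ranks, then correlate the ranks.
ranked = {name: average_ranks(vals) for name, vals in raw.items()}
rank_corr = correlation_matrix(ranked)

print("Spearman a,b:", spearman(raw["year_a"], raw["year_b"]))
print("Pearson  a,b:", pearson(raw["year_a"], raw["year_b"]))
```

In this toy example year_a and year_b put the cases in exactly the same order, so the rank correlation is 1 even though the raw values do not fall on a straight line.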

Simple Linear Regression

First, import the lowbwt data set for Exercise 9, Chapter 18. You may already have this data set from a previous lab. We'll work with the systolic blood pressure sbp and gestational age gestage variables.

We have not covered all of Chapter 18, so some of the things we do here may be unfamiliar. But, if you just go through the motions, perhaps the concepts will be more familiar to you when we discuss them in lecture. If you need help, ask your TA. If it gets to be too confusing, you can come back next week and finish it up after we've discussed more linear regression.

Construct a 2-way scatter plot of systolic blood pressure (sbp) versus gestational age (gestage). (See the end of lab 1 if you forget how to do this, make your TA earn her/his salary, or just follow your nose.) What does this suggest about fitting a line to the data?

Next, we determine the regression line using sbp as the response and gestage as the explanatory variable. To fit the OLS (ordinary least squares) regression line in S-Plus, use the menus: Statistics>Regression>Linear.... In the dialog box select Data Set: lowbwt; Dependent: sbp (response variable) and Independent: gestage (explanatory variable). The Formula: window describes the model that we are fitting, i.e. the mean of sbp is modeled as a linear function of gestage (the intercept is implied). (DON'T press OK yet.)
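For the curious, here is a minimal sketch in Python (not S-Plus) of what an ordinary least squares fit computes for one explanatory variable, using the textbook formulas slope = Sxy/Sxx and intercept = ybar - slope * xbar. The data are made up for illustration and are NOT the lowbwt data; ols_fit is a hypothetical helper name.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up values standing in for gestage (weeks) and sbp (mm Hg).
gestage = [25, 27, 29, 31, 33]
sbp = [38, 45, 47, 53, 57]

intercept, slope = ols_fit(gestage, sbp)
print(f"estimated mean sbp = {intercept:.2f} + {slope:.2f} * gestage")
```

The fitted line always passes through the point (mean of x, mean of y), which is why the intercept can be computed from the slope and the two means.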

The other tabs in the dialog box control the output and plots. To create a residual plot, select the Plot tab, and check off the boxes for the plots that you would like to create. Select Residuals vs Fit and Residuals Normal QQ. On the Results tab, select Long Output and ANOVA table.
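To see what goes into those plots and the ANOVA table, here is a rough sketch in Python (not S-Plus). It computes residuals and fitted values, the sums of squares that an ANOVA table is built from, and the points of a normal QQ plot. The data are made up (NOT lowbwt), and the (i - 0.5)/n plotting positions are one common convention assumed here; S-Plus may use a slightly different one.

```python
from statistics import NormalDist

# Made-up values standing in for gestage and sbp.
x = [25, 27, 29, 31, 33]
y = [38, 45, 47, 53, 57]
n = len(x)

# Least squares fit (slope = Sxy / Sxx).
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Points for the Residuals vs Fit plot.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# ANOVA decomposition: total = regression + residual sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum(r ** 2 for r in residuals)
f_stat = (ssr / 1) / (sse / (n - 2))  # 1 and n-2 degrees of freedom

# Points for the normal QQ plot: sorted residuals against normal quantiles.
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted(residuals)))

print("SST, SSR, SSE:", sst, ssr, sse)
print("F statistic:", f_stat)
```

If the model fits, the residuals vs fit points should scatter around zero with no pattern, and the QQ points should fall roughly on a straight line.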

To create fitted values (y-i-hat), confidence intervals for the mean, and se of fitted means, select the Predict tab; this will give the fitted values (y-i-hat) for all cases. We'll save stuff to the lowbwt data set: Save in: lowbwt; and select all three boxes. Now press OK.
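Here is a hedged sketch in Python (not S-Plus) of how the saved quantities can be computed by hand: the fitted value at each x, the standard error of the fitted mean, s * sqrt(1/n + (x - xbar)^2 / Sxx), and the 95% confidence limits, fit -/+ t * se. The data are made up (NOT lowbwt). The t critical value is passed in from a t table rather than computed, and the key "se_fit" is a hypothetical name; "fit", "LCL95" and "UCL95" mirror the column names used later in this lab.

```python
from math import sqrt

def mean_confidence_bands(x, y, t_crit):
    """Fitted values, se of the fitted mean, and confidence limits per case.

    t_crit is the t table value for n - 2 degrees of freedom
    (assumption: the caller looks it up; it is not computed here).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))

    rows = []
    for xi in x:
        fit = intercept + slope * xi
        se_fit = s * sqrt(1 / n + (xi - x_bar) ** 2 / sxx)
        rows.append({
            "x": xi,
            "fit": fit,
            "se_fit": se_fit,  # hypothetical column name
            "LCL95": fit - t_crit * se_fit,
            "UCL95": fit + t_crit * se_fit,
        })
    return rows

# Made-up values standing in for gestage and sbp.
gestage = [25, 27, 29, 31, 33]
sbp = [38, 45, 47, 53, 57]

# 3.182 is the two-sided 95% t table value for 5 - 2 = 3 degrees of freedom.
rows = mean_confidence_bands(gestage, sbp, t_crit=3.182)
for row in rows:
    print(row)
```

Notice that the standard error is smallest at the mean of x and grows as x moves away from it, which is why confidence bands for the mean bow outward at the ends.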

You should note several things that happen. You'll see a Report Window with the estimated regression line and some ANOVA results. Discuss with your TA. Print and bring to class on Tuesday; we'll repeat the analysis. You'll also see a Graph Sheet with 2 tabs, one for each of the plots you asked for. Discuss with your TA. Print these and bring to class. Finally, notice that 4 columns were added to the lowbwt data sheet (the fitted values, their standard errors, and the lower and upper confidence limits). We'll use these for plotting next.

Browse the questions in Exercise 9, Chapter 18; I don't expect you to be able to answer them until next week.

Plots

Creating plots with the estimated regression lines is a little tricky, but feasible with patience!

Assuming you've saved the fitted values, etc., to the lowbwt data sheet as described above, now create (again, if you don't still have the first one) a scatterplot of sbp versus gestage. You might relabel the axes nicely, and add a title.

To add the previously estimated regression line to the scatter plot, go to the menu: Insert>Plot.... Select Line Plot as the type to insert. In the dialog box, select x Column: gestage, and y Column: fit. Press OK. If your lines look like a Spirograph drawing of spaghetti, or just plain wrong, repeat, but this time select Pre-Sort Data: XY on X on the Smooth/Sort tab when inserting the line into the scatter plot.
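Why does sorting matter? A line plot connects the points in the order they appear in the data sheet, so unsorted x values make the pen jump back and forth. Here is a small sketch in Python (not S-Plus) with made-up numbers (NOT lowbwt); horizontal_travel is a hypothetical helper that measures how far the pen moves left and right.

```python
# Made-up (gestage, fit) values stored in data-sheet order, not x order.
gestage = [31, 25, 33, 27, 29]
fit = [52.6, 38.8, 57.2, 43.4, 48.0]

def horizontal_travel(xs):
    """Total left-right distance covered when joining points in this order."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))

# Sort the (x, y) pairs together on x, so each y stays with its own x.
pairs_sorted = sorted(zip(gestage, fit), key=lambda pair: pair[0])
x_sorted = [x for x, _ in pairs_sorted]
y_sorted = [y for _, y in pairs_sorted]

print("unsorted travel:", horizontal_travel(gestage))
print("sorted travel:  ", horizontal_travel(x_sorted))
```

After sorting, the pen moves left to right exactly once, so the fitted values trace a single clean line.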

To add the confidence interval lines to the plot (assuming you saved them in lowbwt), repeat (twice) the steps above for adding the regression line, but change the y Column: to LCL95 for the lower bound line and to UCL95 for the upper bound line. Cool. I expect you will have questions when you come to class next. Print out whatever you think will be helpful.