In this lab we will examine the assumptions behind the two-sample t-test and the paired t-test in further examples, in preparation for the material in Chapter 13. The data sets below are used in exercises 13, 14, 15, and 16 of Chapter 13.

Exploratory tools such as boxplots are useful for checking these assumptions.

Let's proceed using the data for exercise 15. Read in the data set lowbwt. (Since you have used it before, you may only need to restore your previous workspace; if not, download it and read it in again.)

By default, S-Plus will read in the data as double precision; in this case we want to treat sex as a categorical factor. Go to the Data menu and select Change Data Type. Select the column for sex, and then under the New Type field select Factor. Click on OK.

Create a side-by-side boxplot of the Apgar score (apgar5), using sex as the x column.
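If you want to see the numbers behind a side-by-side boxplot, the following Python sketch groups scores by sex and computes a five-number summary for each group. The records and the 0/1 sex codes are made up for illustration and are not the lowbwt data; the quartile rule used here may also differ slightly from the one S-Plus uses to draw its boxes.

```python
from statistics import quantiles

# Hypothetical (sex code, Apgar score) records; NOT the lowbwt data.
records = [(0, 8), (0, 9), (0, 7), (0, 5), (0, 9),
           (1, 9), (1, 6), (1, 8), (1, 10), (1, 7)]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot displays."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


# Treat sex as a grouping factor: collect the scores for each level.
groups = {}
for sex, score in records:
    groups.setdefault(sex, []).append(score)

summaries = {sex: five_number_summary(scores) for sex, scores in groups.items()}
for sex, summary in sorted(summaries.items()):
    print(sex, summary)
```

Comparing the two summaries side by side is the numerical counterpart of comparing the two boxes: look at the centers, the spreads, and whether either group is noticeably skewed.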

Using the boxplot, discuss the appropriateness of the assumptions behind using a t-test. First, are the data paired or unpaired?

The apgar5 score is an ordinal variable that takes on integer values between 0 and 10. What does this imply about the normality assumption? Would the CLT be useful if our sample size were much larger?
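One way to explore the CLT question is by simulation. The Python sketch below draws repeated samples from a skewed distribution on the integers 0 to 10 and measures the skewness of the resulting sample means for a small and a larger sample size. The weights are an assumption chosen only to give a left-skewed shape; they are not estimated from the lowbwt data.

```python
import random
from statistics import mean, pstdev

# Hypothetical left-skewed distribution on 0..10 (illustrative weights only).
scores = list(range(11))
weights = [1, 1, 1, 1, 2, 3, 5, 10, 25, 35, 16]


def skewness(xs):
    """Moment-based skewness; close to 0 for a symmetric, normal-like shape."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def sample_mean_skewness(n, reps=2000, seed=1):
    """Skewness of `reps` simulated sample means, each from a sample of size n."""
    rng = random.Random(seed)
    means = [sum(rng.choices(scores, weights=weights, k=n)) / n
             for _ in range(reps)]
    return skewness(means)


skew_small = sample_mean_skewness(n=2)
skew_large = sample_mean_skewness(n=100)
print(f"skewness of sample means, n=2:   {skew_small:.2f}")
print(f"skewness of sample means, n=100: {skew_large:.2f}")
```

The individual scores stay discrete and skewed no matter how many you collect; what becomes more nearly normal as n grows is the distribution of the sample mean, which is the quantity the t-test relies on.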


Let's look at the data in exercise 8. These have been entered in the Excel spreadsheet sar.xls. Download the file and read it into S-Plus. Create a side-by-side boxplot with the two variables Air and SO2. Since there is no grouping variable, specify both as Y variables. (You will need to hold down the Ctrl key and click to select a second Y variable.) Is this view useful? What does it imply about assumptions? (Are the data paired or independent? Does this matter?)

To create a variable equal to the difference in increase in SAR, use the Data menu and select Transform. Enter the name for the new target column, say diff, and then enter the expression Air - SO2 in the field for the expression. Click on Apply or OK. Create a boxplot of the differences; what does this say about assumptions?
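The Transform step is just a row-by-row subtraction. As a rough Python sketch, with made-up Air and SO2 values that are not the sar.xls data:

```python
from statistics import median

# Hypothetical paired measurements, one (Air, SO2) pair per subject.
# These values are invented for illustration; they are NOT the sar.xls data.
air = [0.8, 2.1, 1.3, 4.0, 0.2, 1.6]
so2 = [1.5, 4.3, 1.1, 6.8, 0.9, 2.0]

# Same operation as the Transform expression Air - SO2, one row at a time.
# Rounding to one decimal only tidies floating-point noise in the display.
diff = [round(a - s, 1) for a, s in zip(air, so2)]

n_negative = sum(d < 0 for d in diff)
print("differences:", diff)
print("median difference:", median(diff))
print("negative differences:", n_negative, "of", len(diff))
```

Because each difference comes from a single subject, the paired analysis works with this one column of differences rather than with the two original columns.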

Which set of boxplots is relevant here?


Repeat these steps and questions for exercises 13 (data set bed), 14 (data set program), and 16 (data set insure). Be careful to check whether the variable is ordinal! For other problems, enter the data and investigate the assumptions.


In class Thursday we'll discuss nonparametric alternatives to the t-tests. If you want to work ahead, carry out the Wilcoxon Rank Sum test in S-Plus for the data sets in exercises 13-16, and the Wilcoxon Signed Rank test for the differences in exercise 8. We'll discuss the output and the tests in class.

To run the Wilcoxon test, go to the Statistics menu and select Compare samples. For two independent samples, select 2 samples, then Wilcoxon Rank test. Specify the outcome variable and the grouping variable.
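For a rough idea of what the rank sum test computes, here is a minimal Python sketch. It ranks the pooled observations (tied values share the average rank), sums the ranks in the first group, and compares that sum with its null expectation using a plain normal approximation. The function names and the data are hypothetical, and S-Plus may use an exact calculation or apply continuity and tie corrections, so its p-values need not match this sketch.

```python
import math


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) from an uncorrected normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                              # rank sum of first group
    mu = n1 * (n1 + n2 + 1) / 2                      # expected W under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # SD of W under H0, no ties
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal tail area
    return w, z, p


# Hypothetical scores for two independent groups (not from the course data).
group_a = [8, 9, 7, 5, 9]
group_b = [9, 6, 8, 10, 7]
w, z, p = rank_sum_test(group_a, group_b)
print(f"W = {w}, z = {z:.3f}, two-sided p = {p:.3f}")
```

Note that only the ordering of the observations enters the calculation, which is why the test remains reasonable for ordinal outcomes.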

For paired data, repeat the steps above, but enter the two outcomes as Variable 1 and Variable 2 and do not check the grouping variable box. Select Signed Rank instead of Rank Sum. If you entered the differences as a single column, you can also reach the Signed Rank test through the One Sample option.
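The idea behind the signed rank test can be sketched in a few lines of Python. The sketch ranks the absolute differences, adds up the ranks belonging to positive differences, and then asks how often a sum at least that far from its expected value would occur if every difference were equally likely to be positive or negative. The function name and the differences are hypothetical; dropping zero differences is a common convention assumed here, and the full enumeration is only practical for small samples. S-Plus may use a different method, such as a normal approximation.

```python
from itertools import product


def signed_rank_exact(diffs):
    """Return (T+, exact two-sided p) by enumerating all sign patterns."""
    d = [x for x in diffs if x != 0]  # zero differences dropped (assumption)
    abs_d = [abs(x) for x in d]
    # Midrank of each |difference|: count of smaller values plus the
    # average position within its group of ties.
    ranks = [sum(u < v for u in abs_d) + (sum(u == v for u in abs_d) + 1) / 2
             for v in abs_d]
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    expected = sum(ranks) / 2         # E[T+] when each sign is a coin flip
    observed_dev = abs(t_plus - expected)

    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        t = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(t - expected) >= observed_dev - 1e-9:
            extreme += 1
    return t_plus, extreme / total


# Hypothetical paired differences (not the exercise 8 data).
diffs = [-0.7, -2.2, 0.2, -2.8, -0.7, -0.4]
t_plus, p_value = signed_rank_exact(diffs)
print(f"T+ = {t_plus}, exact two-sided p = {p_value}")
```

Here only 4 of the 64 possible sign patterns give a rank sum as extreme as the observed one, so the p-value is 4/64.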

Without knowing how the test actually works, how do you think the p-values are interpreted?