Lab 8 Background:

When performing t-tests (paired or 2-sample), we must make some assumptions in order for the testing results to be valid. When these assumptions are not met, we can use what are called "nonparametric" tests (see Chapter 13) which allow us to relax some assumptions. This is seen as an advantage of nonparametric tests. However, when the assumptions of the paired t-test and 2-sample t-tests are met, these "parametric" tests are preferred because they generally have better power.


Lab 8 Objectives:

In this lab we will investigate the assumptions behind the two-sample t-test and the paired t-test using data from exercises in Chapter 13. The objective is to determine whether we are dealing with paired data or independent samples, and whether we can use the parametric methods of Chapter 11 or, if the assumptions appear to be violated, whether we should use the analogous nonparametric procedures in Chapter 13. Because we have not yet performed a paired t-test in lab, we will use the data from exercise 8 in Chapter 13 to do so, while also asking whether we should really be doing a nonparametric test instead. We will also use the data from exercises 13, 14, 15, and 16 in Chapter 13.

The tools for exploratory analysis (mostly graphics) from the early labs will be useful here.


Exercise 8

First, let's look at the data in exercise 8. These have been entered in the Excel spreadsheet sar.xls. Download the file and read it into S-Plus. Create a side-by-side box plot of the two variables Air and SO2 (review the early labs if you need a reminder). Since there is no grouping variable, specify both as Y variables (hold down the Ctrl key and click to select the second Y variable). Is this view useful? What does it imply about assumptions? Are you able to see that the data are paired rather than independent? Why does this matter?

You should convince yourself that the data are paired and that you should be investigating assumptions using differences. To create a variable equal to the difference in increase in SAR, use the Data menu and select Transform. Enter the name for the new target column, say diff, and then enter Air - SO2 in the expression field. Click on Apply or OK. Create a box plot of the differences. What does this say about assumptions? Which set of box plots is relevant for these data?
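For readers who want to see the Transform step outside the S-Plus menus, here is a minimal Python sketch of the same idea: form one difference per pair, then compute the five numbers a box plot displays. The values below are made up for illustration and are NOT the data in sar.xls; the function names paired_differences and five_number_summary are hypothetical, and the quartile rule used here may differ slightly from the one S-Plus uses.

```python
from statistics import median, quantiles

# Hypothetical paired measurements (illustration only, not the sar.xls data).
# Position i in each list belongs to the same subject.
air = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1]
so2 = [2.0, 1.5, 2.4, 3.0, 1.1, 4.2]


def paired_differences(x, y):
    """Return x[i] - y[i] for each pair; the lists must line up."""
    if len(x) != len(y):
        raise ValueError("paired data need the same number of observations")
    return [a - b for a, b in zip(x, y)]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


diff = paired_differences(air, so2)
print("differences:", diff)
print("five-number summary:", five_number_summary(diff))
```

The paired t-test works only with the single column diff, so the shape of its distribution is what the normality assumption refers to.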


Exercise 15

Let's proceed using the data for exercise 15. Read in the data lowbwt (since you have used it before, you may only need to restore your previous workspace; if not, download it and read it in again). By default, S-Plus will read in the data as double precision; in this case we want to treat sex as a categorical factor. Go to the Data menu and select Change Data Type. Select the column for sex, and then under the New Type field select Factor. Click on OK.

Create a side-by-side box plot of the apgar score, using sex as the x column. Using the box plot, discuss the appropriateness of the assumptions behind using a t-test. Are the data paired or unpaired?

The apgar5 score is an ordinal variable that takes integer values from 0 to 10. What does this imply about the normality assumption? Would the CLT be useful if our sample size were much larger?
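One way to explore the CLT question is by simulation. The sketch below draws repeated samples from a made-up, left-skewed distribution on the integers 0 to 10 and compares the skewness of the sample means for a small and a larger sample size. The probabilities in WEIGHTS are an assumption chosen only for illustration; they are not estimated from lowbwt, and the function names are hypothetical.

```python
import random
from statistics import fmean

# Hypothetical score distribution on 0..10 (illustration only).
SCORES = list(range(11))
WEIGHTS = [1, 1, 1, 1, 2, 3, 5, 10, 30, 36, 10]


def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def sample_means(n, reps, rng):
    """Means of `reps` random samples of size `n` from the score distribution."""
    return [fmean(rng.choices(SCORES, weights=WEIGHTS, k=n)) for _ in range(reps)]


rng = random.Random(8)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(2, 2000, rng))
skew_large = skewness(sample_means(50, 2000, rng))
print("skewness of sample means, n = 2: ", round(skew_small, 2))
print("skewness of sample means, n = 50:", round(skew_large, 2))
```

A skewness closer to zero for the larger sample size suggests that the sample mean is closer to normal even though the individual scores are not. Whether a mean is a sensible summary of an ordinal score is a separate question.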


Other Exercises

Perform similar steps to those above to investigate assumptions for data in exercise 13 (data bed), 14 (data program), and 16 (data insure). Be careful to check whether the variable is ordinal! For other problems, enter the data and investigate assumptions.


Looking Ahead

In class, we will discuss nonparametric alternatives to the t-tests. If you want to work ahead, carry out the Wilcoxon rank sum test (or the signed-rank test, if appropriate) in S-Plus for the data sets in exercises 8 and 13-16. When should you use the rank sum test rather than the signed-rank test? We'll look at some S-Plus output in class.

To run the Wilcoxon test, go to the Statistics menu and select Compare samples. For two independent samples, select 2 samples, then Wilcoxon Rank test. Specify the outcome variable and the grouping variable.
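To get a feel for what the rank sum test computes, here is a minimal Python sketch under stated assumptions. It pools the two samples, ranks them (ties get the average rank), and sums the ranks of the first sample. The p-value uses a plain normal approximation with no tie correction and no continuity correction, so it will not necessarily match S-Plus output. The names midranks and rank_sum_test and the data are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) where W is the rank sum of sample x."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2        # expected W if the groups do not differ
    var_w = n1 * n2 * (n1 + n2 + 1) / 12   # assumes no ties
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return w, z, p


# Hypothetical independent samples (illustration only).
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(rank_sum_test(group_a, group_b))
```

Because only the ranks enter the calculation, replacing 6 with 600 in group_b leaves W unchanged, which is why the test is less sensitive to outliers than the t-test.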

For paired data, repeat the above and enter the two outcomes as Variable 1 and Variable 2, but do not check the grouping variable box. Select Signed Rank instead of Rank Sum. If you created a column of differences, you can also reach the Signed Rank test through the One Sample option.
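The signed-rank statistic can be sketched in the same spirit. The Python below takes the paired differences, drops zero differences (a common convention, assumed here; software may handle zeros differently), ranks the absolute values, and sums the ranks separately for positive and negative differences. The function names and data are hypothetical, and no p-value is computed.

```python
def midranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (T_plus, T_minus, n) for paired samples x and y.

    n is the number of nonzero differences actually ranked.
    """
    if len(x) != len(y):
        raise ValueError("paired data need the same number of observations")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    ranks = midranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical paired measurements (illustration only).
before = [5, 7, 3, 9]
after = [6, 4, 3, 5]
print(signed_rank_sums(before, after))
```

If the two conditions do not differ, positive and negative differences should be about equally common and equally large, so T_plus and T_minus should each be near n(n+1)/4.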

Without knowing how the test actually works, how do you think the p-values are interpreted?