STA102 Lab10

Lab 10 Objectives

In the first part of this lab we will use S-Plus to do inference for proportions as discussed in Section 14.7, Further Applications, POB pp. 335-338. Follow the book as you have S-Plus repeat the analysis; compare S-Plus output to that in the book. You may find some differences, but we will not worry about the details of how S-Plus does its calculations (check out the help file(s) if you're curious).

The second part of the lab will use S-Plus to analyze r x c contingency tables in order to investigate relationships between factor variables. We'll use exercises from Chapter 15 of POB.

Proportions

Turn to Section 14.7, page 335 of POB. The data on infant cognitive ability show 8 of 33 infants born with perinatal growth failure have IQs less than 70 ("cognitive deficiency"). We will use S-Plus to construct a 99% CI (as on page 335) and to test the hypothesis that the proportion of infants with perinatal growth failure having IQs less than 70 is p=0.032 (that of the "normal" infant population). There's no need to use a data sheet in S-Plus.

In S-Plus, navigate the menus: Statistics>Compare Samples>Counts and Proportions>Proportions Parameters.... The Proportions Test dialog box will appear. We don't need to specify a Data Set, so make sure nothing is entered in this field. In the Success Variable field, put 8, and put 33 in the Trials Variable field. Put 0.032 into the Proportions Variable field. Set the Alternative Hypothesis to two.sided and the Confidence Level to 0.99. Finally, clear the Apply Yates' Continuity Correction box (hmmmm, we talked about this correction when we discussed contingency tables). Press OK.

You should be able to decipher the output. Notice that S-Plus uses a chi-square random variable (1 df) for the hypothesis test. This is equivalent to using the normal approximation: z squared is a chi-square with 1 df, so the square root of the reported chi-square statistic should be close to the z statistic in the book. When I did this, the CI was not symmetric about the estimated proportion, unlike the interval in your book. Again, S-Plus is calculating things a bit differently than the book does, although its help files do not indicate that there should be a difference (?!).
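To check the S-Plus output against a hand calculation, the normal-approximation test and the 99% interval can be sketched in a few lines of Python. This is only a sketch: the counts 8 and 33 and the null value 0.032 come from the text, while the variable names are made up for illustration. The Wilson score interval is included purely as an example of an interval that is not symmetric about the estimate; whether S-Plus actually uses it is an assumption, not something the help files confirm.

```python
from math import sqrt
from statistics import NormalDist

x, n = 8, 33        # infants with IQ < 70 and total infants (from the text)
p0 = 0.032          # hypothesized proportion (from the text)
conf = 0.99         # confidence level

p_hat = x / n       # estimated proportion

# z test: the standard error is computed under the null hypothesis
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
chi_sq = z ** 2     # the same test expressed as a chi-square with 1 df
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

# Critical value for a two-sided interval at the chosen confidence level
z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Textbook-style interval: symmetric about p_hat
half = z_crit * sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - half, p_hat + half)

# Wilson score interval (ASSUMPTION: shown only as one example of an
# asymmetric interval; not confirmed to be what S-Plus computes)
denom = 1 + z_crit ** 2 / n
center = (p_hat + z_crit ** 2 / (2 * n)) / denom
half_w = z_crit * sqrt(p_hat * (1 - p_hat) / n + z_crit ** 2 / (4 * n ** 2)) / denom
wilson = (center - half_w, center + half_w)

print("z =", round(z, 3), " chi-square =", round(chi_sq, 3))
print("symmetric CI:", wald)
print("Wilson CI:   ", wilson)
```

If the interval S-Plus reports matches neither of these, the help file for the proportions test is the place to look.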

Next we will compare proportions between two samples and construct a confidence interval for the difference in proportions (see pp. 336-338). Without explaining the data (see the text for that), we go immediately to S-Plus.

Follow the same steps as above to get to the Proportions Test dialog box. We don't need to specify a Data Set, so make sure nothing is entered in this field. In the Success Variable field, put c(292, 397), and put c(658, 1580) in the Trials Variable field. Leave the Proportions Variable field blank (it defaults to a difference in proportions of zero). Set the Alternative Hypothesis to two.sided and the Confidence Level to 0.95. Finally, clear the Apply Yates' Continuity Correction box. Press OK.

Compare your answers to those in the book. Again, they may differ slightly. Note once more the relationship between the chi-square statistic and the z statistic.
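The two-sample calculation can be checked by hand in the same way. The sketch below uses the counts entered in the dialog box (292 of 658 and 397 of 1580); the variable names are hypothetical. It assumes the usual textbook approach: a pooled standard error for the test and an unpooled standard error for the confidence interval.

```python
from math import sqrt
from statistics import NormalDist

successes = (292, 397)    # counts entered in the Success Variable field
trials = (658, 1580)      # counts entered in the Trials Variable field

p1 = successes[0] / trials[0]
p2 = successes[1] / trials[1]
diff = p1 - p2

# Test of equal proportions: pool the two samples to estimate the common p
pooled = sum(successes) / sum(trials)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / trials[0] + 1 / trials[1]))
z = diff / se_pooled
chi_sq = z ** 2           # chi-square with 1 df, no continuity correction

# 95% CI for the difference: each sample keeps its own variance estimate
se_unpooled = sqrt(p1 * (1 - p1) / trials[0] + p2 * (1 - p2) / trials[1])
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print("z =", round(z, 3), " chi-square =", round(chi_sq, 3))
print("95% CI for p1 - p2:", ci)
```

The square root of the chi-square statistic that S-Plus reports should be close to the z computed here.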

Contingency Tables

Next, we use S-Plus to analyze data from r x c contingency tables. We can either enter the counts as a contingency table directly into a data set, or enter the raw data as two columns and have the program tabulate the counts automatically. Once the data are entered, we have to decide whether they are independent samples (in which case we should use the chi-square test) or whether the observations are "paired" or matched (in which case we should use McNemar's test; we'll discuss this next week in lecture).

Look at the descriptions of exercises 20 and 21 in Chapter 15. Which tests should be used? How many levels do the two "factors" have?

Let's start with exercise 20. Download and read in the data for exercise 20, angio. Each row corresponds to one site. Look at row one; what type of geographic area and appropriate use does this site have?

To construct a contingency table from the raw data, navigate the menus: Statistics>Data Summaries>Crosstabulations.... Select <all> for the variables, then click OK. The table with various summaries will appear in the report window (we've seen something similar in our notes). There is a chi-square statistic in this output, which is not appropriate if the data are paired. Conclusions?
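What the crosstabulation does with two columns of raw data can be sketched as follows. The observations below are HYPOTHETICAL and are not the angio data; the level names are invented for illustration. Each pair is one row of the raw data sheet.

```python
from collections import Counter

# HYPOTHETICAL raw data: one (site, appropro) pair per row of the data sheet.
# These values and level names are made up; they are not the angio data.
raw = (
    [("rural", "appropriate")] * 3
    + [("rural", "inappropriate")] * 1
    + [("urban", "appropriate")] * 2
    + [("urban", "inappropriate")] * 4
)

counts = Counter(raw)                       # count each (row level, column level) pair
rows = sorted({r for r, _ in raw})          # levels of the first factor
cols = sorted({c for _, c in raw})          # levels of the second factor

# r x c table of counts, one inner list per level of the first factor
table = [[counts[(r, c)] for c in cols] for r in rows]

for r, row in zip(rows, table):
    print(r, row)
```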

To carry out either a chi-square test or McNemar's test, navigate the menus: Statistics>Compare Samples>Counts and Proportions. Then select either Chi-square Test... or McNemar's Test..., depending on which is appropriate.
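For paired data, McNemar's statistic depends only on the discordant pairs. A minimal sketch, assuming a 2 x 2 table of matched pairs with HYPOTHETICAL counts (the function name and the numbers are invented for illustration):

```python
from math import erfc, sqrt

def mcnemar(b, c, correct=True):
    """McNemar's chi-square statistic (1 df) and its p-value.

    b and c are the two discordant cell counts of a 2 x 2 table of pairs.
    With correct=True a continuity correction of 1 is applied.
    """
    num = abs(b - c) - 1 if correct else abs(b - c)
    stat = num ** 2 / (b + c)
    p_value = erfc(sqrt(stat / 2))   # upper tail of a chi-square with 1 df
    return stat, p_value

# HYPOTHETICAL discordant counts, not taken from any exercise
stat_corr, p_corr = mcnemar(15, 5, correct=True)
stat_raw, p_raw = mcnemar(15, 5, correct=False)

print("with correction:   ", stat_corr, p_corr)
print("without correction:", stat_raw, p_raw)
```

The concordant pairs (both members of a pair in the same category) do not enter the statistic at all, which is why only b and c are passed in.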

In either case, because we have the raw data, specify that site is Variable 1 and appropro is Variable 2 (or in the other order). Do not click the box indicating data is a contingency table. Click OK; the output will be in the Reports Window.

Repeat these steps for the alcohol data in exercise 21.

What conclusions would you make? You should be able to identify the hypotheses being tested, know which procedure to use, and be able to interpret the output from a test.

For practice: given the summary contingency tables from the crosstabulation output, go back and verify the test statistics by hand.
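The hand verification amounts to computing expected counts from the row and column totals and summing (observed - expected)^2 / expected over all cells. A sketch with a HYPOTHETICAL 2 x 3 table (the function name and counts are invented; substitute the counts from your crosstabulation output):

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two factors
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# HYPOTHETICAL counts, not taken from any exercise
observed = [[10, 20, 30],
            [20, 20, 20]]

stat, df = pearson_chi_square(observed)
print("chi-square =", round(stat, 4), "on", df, "df")
```

Compare the statistic with the chi-square table in the book using the degrees of freedom returned, then with the value S-Plus reports.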


For checking other homework exercises, you may enter the contingency table as a new data set. Go to the File menu and select New..., then Data Set. Enter the data for the r rows and c columns as a table. To carry out a chi-square or McNemar's test, navigate the menus: Statistics>Compare Samples>Counts and Proportions, and then choose the test that you would like to do. Rather than specifying Variable 1 and Variable 2, click the box that indicates that the data are a contingency table.