Statistics 101
Data Analysis and Statistical Inference
 

Instructions for lab 10


Lab Objective

The purpose of this lab is to perform chi-squared goodness of fit and independence tests using JMP.

Lab Procedures


Unit 1:  Goodness of fit tests

In the U.S., you are supposed to be tried by a jury of your peers.  Does this really happen in practice?  A study in the UCLA Law Review (1973) of grand juries in Alameda County, California, compared the demographic characteristics of a random sample of jurors with the general population.  Below are the data for age and educational level.  Only persons 21 and over are considered; the population data are known from the Public Health Department.

Age           County-wide %     # of jurors
----------------------------------------------
21-40              42                 5
41-50              23                 9
51-60              16                19
> 61               19                33
----------------------------------------------
Total             100                66


Educational Level     County-wide %     # of jurors
-------------------------------------------------------
Elementary                 28.4               1
Secondary                  48.5              10
Some college               11.9              16
College degree             11.2              35
-------------------------------------------------------
Total                     100.0              62

Questions:

1)  Test whether the juries appear to be randomly selected with respect to the distribution of age.  Report the sample and population percentages in each age group, the value of the chi-squared test statistic and its degrees of freedom, the p-value, and your conclusion.  

To perform a chi-squared goodness of fit test in JMP, enter the data in two columns, the first containing the age categories and the second containing the counts in those categories (not the percentages).  Make the column of labels a character variable and the column of counts a continuous variable.  Select Analyze-Distribution.  Enter the column of labels as the Y variable and the column of counts as the Freq variable; this tells JMP that each count is the frequency of its category.  Hit OK to get the sample percentages in each age category.  From the red arrow next to the variable name, select Test Probabilities.  Enter the probabilities from the null hypothesis where indicated (the county-wide percentages written as proportions, for example 0.42 for 42%), and select Done.  The output for the chi-squared goodness of fit test is in the row labeled "Pearson."  The first entry is the value of the chi-squared test statistic; the second entry is the degrees of freedom (number of categories - 1); and the last entry is the p-value from the appropriate chi-squared distribution.
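
For readers who want to check the JMP output outside JMP, the same Pearson calculation can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not part of the lab procedure: the function names chi2_sf and goodness_of_fit are made up for this sketch, and it uses only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def goodness_of_fit(counts, null_probs):
    """Return (statistic, df, p_value) for a Pearson goodness of fit test."""
    n = sum(counts)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = len(counts) - 1
    return statistic, df, chi2_sf(statistic, df)

# Age data from the table above: juror counts and county-wide proportions.
juror_counts = [5, 9, 19, 33]
county_props = [0.42, 0.23, 0.16, 0.19]

statistic, df, p_value = goodness_of_fit(juror_counts, county_props)
print(f"chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The three printed numbers should agree with the three entries in JMP's "Pearson" row, up to rounding.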

2)  Perform the chi-squared goodness of fit test for the education data by hand.  That is, show in your report the null hypothesis, the four pieces of the chi-squared test statistic (all four values of (observed - expected)^2/expected), the degrees of freedom, the p-value, and your conclusions.  You can use JMP to check your answer, but all of the by-hand work must appear in your report to get full credit.
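
Written out, the quantity to compute by hand is shown below.  The symbols are notation chosen for this sketch, since the handout itself only uses the words observed and expected: O_i is the observed juror count in category i, p_i is the county-wide proportion for that category, n is the sample size, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, p_i,
\qquad \text{df} = k - 1
```

For the education data, k = 4 and n = 62, so the sum has four pieces.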

3)  To get more familiar with how chi-squared goodness of fit tests work, let's play with history and change the education data.  You're allowed to move any of the 35 jurors with college degrees into the other three education levels.  You must keep the total at 62 people, and you are only allowed to add people to the other levels; you can't subtract from the other three counts.  Spread some of the 35 college-degree jurors around so that you get a p-value much closer to 0.01 than you do from the actual data.
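
One way to experiment is to script the statistic and the rules of the exercise, then try different reallocations.  The sketch below is a hypothetical helper, not part of the lab: the names gof_statistic and is_allowed are invented here, and the trial counts are an arbitrary example rather than the intended answer.  As a guide, with 3 degrees of freedom a p-value of 0.01 corresponds to a chi-squared statistic of about 11.34.

```python
def gof_statistic(counts, null_probs):
    """Pearson chi-squared statistic for observed counts against null proportions."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, null_probs))

def is_allowed(original, proposed):
    """Check the rules of the exercise: same total, and only the last
    category (college degree) may lose jurors."""
    same_total = sum(proposed) == sum(original)
    others_not_reduced = all(p >= o for o, p in zip(original[:-1], proposed[:-1]))
    college_valid = 0 <= proposed[-1] <= original[-1]
    return same_total and others_not_reduced and college_valid

county_props = [0.284, 0.485, 0.119, 0.112]
actual = [1, 10, 16, 35]   # elementary, secondary, some college, college degree
trial = [10, 25, 16, 11]   # an arbitrary reallocation, not the intended answer

for label, counts in [("actual", actual), ("trial", trial)]:
    stat = gof_statistic(counts, county_props)
    print(label, counts, "allowed:", is_allowed(actual, counts), "statistic:", round(stat, 2))
```

Moving jurors toward the counts expected under the county-wide percentages shrinks the statistic, which in turn raises the p-value.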

Unit 2: Independence tests

Do people's opinions of their appearance change with age?   In a survey reported in Newsweek magazine (Spring/Summer 1999), 747 randomly selected women were asked, "How satisfied are you with your overall appearance?"  The numbers of women who chose each of four answers are shown in the table below.

Age           Very     Somewhat     Not Too     Not At All
-----------------------------------------------------------
Under 30        45           82          10              4
30 - 49         73          168          47              6
Over 50        106          153          41             12
-----------------------------------------------------------

Questions:

1.   Test the null hypothesis that women's satisfaction with their appearance is not associated with age.  Report the sample percentages in each age group, the value of the chi-squared test statistic and its degrees of freedom, the p-value, and your conclusion.  

To perform a chi-squared independence test in JMP, enter the data in three columns, the first containing the age labels, the second containing the satisfaction labels, and the third containing the counts in those categories (not the percentages).  You should have 12 rows total in the dataset, one for each combination of age group and answer.  Make the columns of labels character variables and the column of counts a continuous variable.  Select Analyze-Fit Y by X.  Enter the satisfaction column (the column labels of the table above) as the Y variable, the age column (the row labels) as the X variable, and the column of counts as the Freq variable.  Hit OK to get a contingency table of percentages in each category.  The output from the chi-squared test of independence is in the row labeled "Pearson."  The first entry is the value of the chi-squared test statistic; the second entry is the degrees of freedom, (number of rows - 1) x (number of columns - 1); and the last entry is the p-value from the appropriate chi-squared distribution.
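
As a cross-check on the JMP output, the Pearson test of independence can also be sketched in plain Python.  The names independence_test and chi2_sf_even are invented for this illustration.  The p-value helper is a shortcut that assumes the degrees of freedom are even, which holds for this 3 x 4 table, and it raises an error otherwise.

```python
import math

def chi2_sf_even(x, df):
    """P(X >= x) for a chi-squared variable with even df, using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles even, positive df")
    if x <= 0:
        return 1.0
    y = x / 2.0
    return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))

def independence_test(table):
    """Return (statistic, df, p_value, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf_even(statistic, df), expected

# Rows: Under 30, 30 - 49, Over 50.  Columns: Very, Somewhat, Not Too, Not At All.
survey = [
    [45, 82, 10, 4],
    [73, 168, 47, 6],
    [106, 153, 41, 12],
]

statistic, df, p_value, expected = independence_test(survey)
print(f"chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The expected list returned here should match the values JMP shows when Expected is turned on in the contingency table.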

In the contingency table, there are three percentages below each count.  The top one in each cell (Total %) is the percentage of units in the entire data set that fall in that cell of the table.  The middle one (Col %) is the percentage of the units in that column that fall in the cell's row.  The bottom one (Row %) is the percentage of the units in that row that fall in the cell's column.  You can see the expected count in each cell by clicking on the red arrow next to Contingency Table and selecting Expected.
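
The three percentages differ only in what the cell count is divided by.  A minimal Python sketch makes this explicit; the function name cell_percentages is hypothetical and the code is an illustration rather than anything JMP runs.

```python
def cell_percentages(table, i, j):
    """Return (total %, column %, row %) for the cell in row i, column j."""
    n = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    count = table[i][j]
    return (100 * count / n, 100 * count / col_total, 100 * count / row_total)

# Columns: Very, Somewhat, Not Too, Not At All.
survey = [
    [45, 82, 10, 4],     # Under 30
    [73, 168, 47, 6],    # 30 - 49
    [106, 153, 41, 12],  # Over 50
]

total_pct, col_pct, row_pct = cell_percentages(survey, 0, 0)
print(f"Under 30 / Very: total {total_pct:.2f}%, column {col_pct:.2f}%, row {row_pct:.2f}%")
```

For the Under 30 / Very cell, the count of 45 is divided by 747 (all women), by 224 (all very satisfied women), and by 141 (all women under 30) in turn.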

2.  Assuming the null hypothesis is true, obtain by hand the expected number of women under age 30 in a random sample of 747 women who would be very satisfied with their appearance.  Show exactly what you multiplied together to obtain the expected count.
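
The multiplication behind an expected count follows from what independence means: if age and satisfaction are unrelated, the chance of landing in a given cell is the product of the row proportion and the column proportion.  In the notation below (chosen for this sketch), R_i is the total for row i, C_j is the total for column j, and n is the overall sample size.

```latex
E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n}
```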

3.  Suppose you mixed up the data by accident, and really there were 153 very satisfied women over age 50 and 106 somewhat satisfied women over age 50.  Would the p-value for the chi-squared test get smaller or larger?  Explain briefly.   You should be able to answer this without redoing the chi-squared test.  Your answer should be phrased in terms someone who knows nothing about chi-squared tests can understand.