Statistics 101
Data Analysis and Statistical Inference

Instructions for Lab 10

Lab Objective: The purpose of this lab is to perform chi-squared goodness of fit and independence tests using JMP.

Lab Procedures

To open JMP, click on the Start Menu, then select Programs – Mathematics & Statistics – JMP IN 5.1 – JMP IN 5.1.  You can click on the title box to make it disappear.

Unit 1:  Goodness of Fit Tests

In the United States, you are supposed to be tried by a jury of your peers.  Does this really happen in practice?  A study in the UCLA Law Review (1973) of grand juries in Alameda County, California, compared the demographic characteristics of a random sample of jurors with the general population.  Below are the data for age and educational level.  Only persons 21 and over are considered; the population data are known from the Public Health Department.

| Age     | County-wide % | # of Jurors |
|---------|---------------|-------------|
| 21 – 40 | 42            | 5           |
| 41 – 50 | 23            | 9           |
| 51 – 60 | 16            | 19          |
| > 61    | 19            | 33          |
| Total   | 100           | 66          |

| Educational Level | County-wide % | # of Jurors |
|-------------------|---------------|-------------|
| Elementary        | 28.4          | 1           |
| Secondary         | 48.5          | 10          |
| Some College      | 11.9          | 16          |
| College Degree    | 11.2          | 35          |
| Total             | 100           | 62          |

1.  Test whether the juries appear to be randomly selected with respect to the distribution of age.  Use a chi-squared goodness of fit test.  Report your hypotheses, the sample percentages in each age group, the value of the chi-squared test statistic and its degrees of freedom, the p-value, and your conclusion.  Use a chi-squared table here.

To perform a chi-squared goodness of fit test in JMP:

Create a new data table in JMP (File – New – Data Table).  Enter the data in two columns, the first containing the age categories and the second containing the counts of those categories (not the percentages).  Edit each variable by right clicking on the variable name at the top of the column and selecting Column Info.  Make the age categories variable a character variable (under “Data Type”), and make the counts variable continuous (under “Modeling Type”).  Don’t forget to give each variable a descriptive name as well.

Select Analyze – Distribution.  Enter the variable of age categories as the Y variable, and enter the variable of counts in the Freq box.  This tells JMP that the variable of counts is the frequency of each category.  Click OK to get the sample percentages in each age category.

Click on the red arrow next to the variable name, and select Test Probabilities.  For each age category, enter the probability from the null hypothesis (use the population percentages; JMP will automatically convert percentages to proportions), and select Done.  The output for the chi-squared goodness of fit test is in the row labeled "Pearson."  The first entry is the value of the chi-squared test statistic; the second entry is the degrees of freedom (number of categories – 1); and the last entry is the p-value from the appropriate chi-squared distribution.
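As a cross-check on the JMP output, the same goodness of fit calculation can be sketched in Python using only the standard library.  This is a minimal sketch, not part of the JMP procedure: the function names `chi2_sf` and `goodness_of_fit` are hypothetical, and the p-value relies on the closed-form upper tail of the chi-squared distribution for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum times exp(-x/2).
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: complementary error function plus a finite sum.
    tail = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return tail

def goodness_of_fit(observed, null_percents):
    """Return expected counts, the chi-squared statistic, df, and the p-value."""
    n = sum(observed)
    total = sum(null_percents)  # rescale so the null values sum to 1
    expected = [n * p / total for p in null_percents]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return expected, stat, df, chi2_sf(stat, df)

# Age data from the table in this lab.
ages = ["21-40", "41-50", "51-60", "> 61"]
jurors = [5, 9, 19, 33]
county_percents = [42, 23, 16, 19]

expected, stat, df, p_value = goodness_of_fit(jurors, county_percents)
for label, o, e in zip(ages, jurors, expected):
    print(f"{label:>6}: observed {o:2d} ({100 * o / sum(jurors):.1f}%), expected {e:.2f}")
print(f"chi-squared = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The printed statistic and degrees of freedom should agree with the "Pearson" row in JMP; if they do not, check that counts (not percentages) were entered in the Freq column.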

2.  Perform the chi-squared goodness of fit test for the education data by hand and report your conclusion.  Please show all your work on the lab report, but clearly mark your final answer for: the value of the chi-squared test statistic, its degrees of freedom, and the p-value.  Please round to two decimal places.  You can check your work with JMP.

The equation for the chi-squared test statistic is: $\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$

For a goodness of fit test, the degrees of freedom is calculated as: number of categories – 1.  (For an independence test, it is (number of rows – 1) * (number of columns – 1).)
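As an illustration of how a single term of the sum is assembled, here is the Elementary category worked out.  The expected count is the sample size times the null proportion, and the symbols $E$ and $O$ are shorthand introduced here for the expected and observed counts.

$$E_{\text{Elementary}} = 62 \times 0.284 = 17.608$$

$$\frac{(O - E)^2}{E} = \frac{(1 - 17.608)^2}{17.608} = \frac{275.826}{17.608} \approx 15.66$$

The other three categories are handled the same way, and the four terms are added to give the test statistic.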

3.  Now, alter the education data by moving 25 jurors from the college degree category to the secondary educational level (so now there are 10 jurors with college degrees and 35 jurors at the secondary educational level).  Re-run the JMP chi-squared goodness of fit test and report the test statistic and the p-value.  In one sentence, state how (if at all) this changes your conclusion.

Unit 2: Independence Tests

Do people's opinions of their appearance change with age?  In a survey reported in Newsweek magazine (Spring/Summer 1999), 747 randomly selected women were asked, "How satisfied are you with your overall appearance?"  The numbers of women who chose each of four answers are shown in the table below.

| Age      | Very | Somewhat | Not too | Not at all |
|----------|------|----------|---------|------------|
| Under 30 | 45   | 82       | 10      | 4          |
| 30 – 49  | 73   | 168      | 47      | 6          |
| Over 50  | 106  | 153      | 41      | 12         |

4.  Test whether women’s satisfaction with their appearance is associated with age.  Use a chi-squared independence test.  Report your hypotheses, the sample percentages, the value of the chi-squared test statistic and its degrees of freedom, the p-value, and your conclusion.

To perform a chi-squared independence test in JMP, enter the data in three columns, the first containing the age categories, the second containing the satisfaction levels, and the third containing the counts in those categories.   You should have 12 rows total in the dataset.  Make the variables for age categories and satisfaction levels character variables, and the variable for counts a continuous variable (you can edit a variable by right clicking on the variable name at the top of the column and selecting Column Info).

Select Analyze – Fit Y by X.  Enter the variable of satisfaction levels as the Y variable, the variable of age categories as the X variable, and the variable of counts as the Freq variable.  Click OK to get a contingency table of percentages in each category.  In the contingency table, there are four values in each cell.  The top value is the count of units that fall in that row and that column.  The second value is the percentage of units in the entire data set that fall in that cell of the table.  The third value is the column percentage: of the units in that column, the percentage that fall in that row.  The last value is the row percentage: of the units in that row, the percentage that fall in that column.  You can see the expected count in each cell by clicking on the red arrow next to “Contingency Table” and selecting Expected.

The output from the chi-squared test of independence is in the row labeled "Pearson."  The first entry is the value of the chi-squared test statistic; the second entry is the degrees of freedom; and the last entry is the p-value from the appropriate chi-squared distribution.
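The independence test can also be sketched in Python as a cross-check on JMP.  This is a minimal sketch under stated assumptions: it uses only the standard library, the helper name `chi2_sf` is hypothetical, and the data are entered in the same three-column layout (age, satisfaction, count) described above.  Each expected count is the row total times the column total divided by the overall total.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum times exp(-x/2).
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: complementary error function plus a finite sum.
    tail = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return tail

# One row per (age, satisfaction) pair: 12 rows, as in the JMP data table.
rows = [
    ("Under 30", "Very", 45), ("Under 30", "Somewhat", 82),
    ("Under 30", "Not too", 10), ("Under 30", "Not at all", 4),
    ("30-49", "Very", 73), ("30-49", "Somewhat", 168),
    ("30-49", "Not too", 47), ("30-49", "Not at all", 6),
    ("Over 50", "Very", 106), ("Over 50", "Somewhat", 153),
    ("Over 50", "Not too", 41), ("Over 50", "Not at all", 12),
]
ages = ["Under 30", "30-49", "Over 50"]
levels = ["Very", "Somewhat", "Not too", "Not at all"]

# Rebuild the contingency table (ages as rows, satisfaction levels as columns).
counts = {(a, s): c for a, s, c in rows}
table = [[counts[(a, s)] for s in levels] for a in ages]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

stat = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(ages))
    for j in range(len(levels))
)
df = (len(ages) - 1) * (len(levels) - 1)
p_value = chi2_sf(stat, df)

print(f"chi-squared = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The `expected` list corresponds to what JMP shows after selecting Expected from the “Contingency Table” menu, so the two can be compared cell by cell.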

5.  Assuming the null hypothesis is true, obtain by hand the expected number of women under age 30 in a random sample of 747 women who would be very satisfied with their appearance.  Show exactly what you multiplied together to obtain the expected count.

6.  Suppose you mixed up the data by accident, and really there were 153 very satisfied women over age 50 and 106 somewhat satisfied women over age 50.  Would the chi-squared test statistic get smaller or larger?  Would the p-value for the chi-squared test get smaller or larger?  (You should be able to answer this without redoing the chi-squared test.)  Briefly explain your answer.

PLEASE DON’T FORGET TO LOG OFF YOUR COMPUTER.