Statistics 101
Data Analysis and Statistical Inference
Instructions for lab 10
Lab Objective
The purpose of this lab is to perform chi-squared goodness of fit and
independence tests using JMP.
Lab Procedures
Unit 1: Goodness of fit tests
In the U.S., you are supposed to be tried by a jury of your peers. Does this really
happen in practice? A study in the UCLA Law Review (1973) of grand
juries in Alameda County, California, compared the demographic
characteristics of a random sample of jurors with the general
population. Below are the data for age and educational level.
Only persons 21 and over are considered; the population data are
known from the Public Health Department.
Age        County-wide %    # of jurors
----------------------------------------------
21-40           42                5
41-50           23                9
51-60           16               19
> 61            19               33
----------------------------------------------
Total          100               66
Educational Level    County-wide %    # of jurors
-------------------------------------------------------
Elementary               28.4               1
Secondary                48.5              10
Some college             11.9              16
College degree           11.2              35
-------------------------------------------------------
Total                   100.0              62
Questions:
1) Test whether the juries appear to be randomly selected with
respect to the distribution of age. Report the sample and
population percentages in each age group, the value of the chi-squared
test statistic and its degrees of freedom, the p-value, and your
conclusion.
To perform a chi-squared goodness of fit test in JMP, enter the data in
two columns, the first containing the age categories and the second
containing the counts in those categories (not the percentages).
Make the column of labels a character variable and the column of
counts a continuous variable. Select Analyze-Distribution. Enter
the name of the column with labels as the Y variable, and enter the column
of counts as the Freq variable.
This tells JMP that the column of counts is the frequency of each
category. Hit OK to get the sample percentages in each age category.
From the red arrow next to the variable name, select Test
Probabilities. Enter the probabilities from the null hypothesis
where indicated, and select Done.
The output for the chi-squared goodness of fit test is in
the row labeled "Pearson." The first entry is the value of the
chi-squared test statistic; the second entry is the degrees of freedom
(number of categories - 1); and the last entry is the p-value from the
appropriate chi-squared distribution.
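
If you want to check the JMP output outside JMP, the same calculation can be sketched in Python. This is only an illustration, not part of the required lab procedure. The helper name chi2_sf and the variable names are made up for this example; the helper uses the closed-form upper tail of the chi-squared distribution for whole-number degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum in closed form.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite sum.
    tail = math.erfc(math.sqrt(half))
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-half) * extra


age_groups = ["21-40", "41-50", "51-60", "> 61"]
county_pct = [42, 23, 16, 19]   # null-hypothesis percentages (county-wide)
juror_counts = [5, 9, 19, 33]   # observed counts (jurors)

n = sum(juror_counts)
expected = [n * p / 100 for p in county_pct]
chi2 = sum((o - e) ** 2 / e for o, e in zip(juror_counts, expected))
df = len(juror_counts) - 1      # number of categories - 1
p_value = chi2_sf(chi2, df)

for g, o, e in zip(age_groups, juror_counts, expected):
    print(f"{g:>6}: sample {100 * o / n:5.1f}%, expected count {e:6.2f}")
print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The printed statistic, degrees of freedom, and p-value should agree with the "Pearson" row in JMP up to rounding.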
2) Perform the chi-squared goodness of fit test for the education
data by hand.
That is, show in your report the null hypothesis, the four pieces
of the chi-squared test statistic including all values of (observed -
expected)^2/expected, the degrees of freedom, the p-value,
and your conclusions. You can use JMP to check your answer, but
all of the by-hand work must appear in your report to get full credit.
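
One way to check your hand arithmetic is to tabulate the four pieces with a few lines of Python. The sketch below is an illustration only (the variable names are invented for it) and does not replace the by-hand work. Each expected count is the total number of jurors times the county-wide proportion.

```python
levels = ["Elementary", "Secondary", "Some college", "College degree"]
county_pct = [28.4, 48.5, 11.9, 11.2]   # null-hypothesis percentages
juror_counts = [1, 10, 16, 35]          # observed counts

n = sum(juror_counts)
expected = [n * p / 100 for p in county_pct]

# One piece per category: (observed - expected)^2 / expected
pieces = [(o - e) ** 2 / e for o, e in zip(juror_counts, expected)]
chi2 = sum(pieces)
df = len(levels) - 1

for name, o, e, piece in zip(levels, juror_counts, expected, pieces):
    print(f"{name:<15} observed {o:3d}  expected {e:7.3f}  piece {piece:8.3f}")
print(f"chi-squared = {chi2:.2f} on {df} degrees of freedom")
```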
3) To get more familiar with how chi-squared goodness of
fit tests work, let's play with history and change the education data.
You're allowed to move any of the 35 jurors with college degrees
to the other education levels. You must keep the total at 62 people, and you
are only allowed to add people to the other levels. You can't
subtract from the other three counts. Spread some of the
35 college people so that you get a p-value much closer to 0.01 than
you do from the actual data.
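
If you would rather experiment outside JMP, a small Python sketch like the one below lets you try different reallocations quickly. It is an illustration under stated assumptions: the function names are invented here, the p-value uses the closed-form upper tail of the chi-squared distribution for 3 degrees of freedom, and the trial shown (moving 10 people to Secondary) is an arbitrary starting point, not a suggested answer.

```python
import math

county_pct = [28.4, 48.5, 11.9, 11.2]   # Elementary, Secondary, Some college, College degree
original = [1, 10, 16, 35]


def p_value_df3(x):
    """Upper-tail chi-squared probability for exactly 3 degrees of freedom."""
    half = x / 2.0
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half)


def gof_p_value(counts):
    """Goodness of fit p-value for four education counts against county_pct."""
    n = sum(counts)
    expected = [n * p / 100 for p in county_pct]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return p_value_df3(chi2)


def reallocate(moved):
    """Move people from College degree to the first three levels.

    moved lists how many people are added to Elementary, Secondary, and
    Some college. Negative moves are rejected, so the other counts never shrink.
    """
    if len(moved) != 3 or any(m < 0 for m in moved):
        raise ValueError("moved must hold three non-negative numbers")
    if sum(moved) > original[3]:
        raise ValueError("cannot move more people than have college degrees")
    return [c + m for c, m in zip(original[:3], moved)] + [original[3] - sum(moved)]


trial = reallocate([0, 10, 0])   # arbitrary trial, chosen only for illustration
print(trial, sum(trial), gof_p_value(trial))
```

Change the three numbers passed to reallocate and watch how the p-value responds.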
Unit 2: Independence tests
Do people's opinions of their appearance change with age? In a
survey reported in Newsweek magazine
(Spring/Summer 1999), 747 randomly selected women were asked, "How
satisfied are you with your overall appearance?" The numbers of
women who chose each of four answers are shown in the table below.
Age         Very    Somewhat    Not Too    Not At All
-----------------------------------------------------------
Under 30     45        82          10           4
30 - 49      73       168          47           6
Over 50     106       153          41          12
-----------------------------------------------------------
Questions:
1. Test the null
hypothesis that women's satisfaction with their appearance is not
associated with age. Report the sample percentages in each age
group, the value of the chi-squared test statistic and its degrees of
freedom, the p-value, and your conclusion.
To perform a chi-squared independence test in JMP, enter the data in
three columns, the first containing the age labels, the second
containing the satisfaction labels, and the third containing the counts
in those categories (not the percentages). You should have 12
rows total in the dataset. Make the columns of labels character
variables and the column of counts a continuous variable. Select
Analyze-Fit Y by X. Enter the column of satisfaction labels (the column
labels in the table above) as the Y variable, the column of age labels
(the row labels) as the X variable, and the column of counts as the
Freq variable.
Hit OK to get a contingency
table of percentages in each category. The output from the
chi-squared test of independence is in the row labeled "Pearson."
The first entry is the value of the chi-squared test statistic;
the second entry is the degrees of freedom; and the last entry is the
p-value from the appropriate chi-squared distribution.
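
As a cross-check on the JMP output, the independence test can also be sketched in Python. This is an illustration only, with variable names invented for the example. Under the null hypothesis, each expected count is (row total x column total) / grand total, and the degrees of freedom are (rows - 1) x (columns - 1). Because the degrees of freedom are even here, the upper-tail probability has a simple closed form, which the sketch uses in place of a statistics library.

```python
import math

ages = ["Under 30", "30 - 49", "Over 50"]
answers = ["Very", "Somewhat", "Not Too", "Not At All"]
counts = [
    [45, 82, 10, 4],
    [73, 168, 47, 6],
    [106, 153, 41, 12],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (counts[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(ages))
    for j in range(len(answers))
)
df = (len(ages) - 1) * (len(answers) - 1)

# Closed-form upper tail for an even number of degrees of freedom.
half = chi2 / 2.0
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```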
In the contingency table, there are three percentages below each
count. The top one in each cell is the percentage of units in the
entire data set that fall in the cell of the table. The middle one
in each cell is the percentage of units in the row, given that they are
in the column. The bottom one in each cell is the percentage of
units in the column, given that they are in the row. You can see
the expected count in each cell by clicking on the red arrow next to
Contingency Table and selecting Expected.
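
To see where the three percentages in a cell come from, here is a short Python sketch for the survey table. It is illustrative only; the function name cell_percentages is made up for this example, and the order of the three values follows the top, middle, bottom description above.

```python
ages = ["Under 30", "30 - 49", "Over 50"]
answers = ["Very", "Somewhat", "Not Too", "Not At All"]
counts = [
    [45, 82, 10, 4],
    [73, 168, 47, 6],
    [106, 153, 41, 12],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)


def cell_percentages(i, j):
    """Return the (top, middle, bottom) percentages for row i, column j."""
    count = counts[i][j]
    top = 100 * count / grand_total       # share of the entire data set
    middle = 100 * count / col_totals[j]  # share of the column that is in this row
    bottom = 100 * count / row_totals[i]  # share of the row that is in this column
    return top, middle, bottom


top, middle, bottom = cell_percentages(0, 0)   # Under 30, Very
print(f"Under 30 / Very: {top:.2f}%, {middle:.2f}%, {bottom:.2f}%")
```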
2. Assuming the null hypothesis is true, obtain by hand the
expected number of women under age 30 in a random sample of 747 women
who would be very satisfied with their appearance. Show exactly
what you multiplied together to obtain the expected count.
3. Suppose you mixed up the data by accident, and really there
were 153 very satisfied women over age 50 and 106 somewhat satisfied
women over age 50. Would the p-value for the chi-squared test get
smaller or larger? Explain briefly. You should be able to
answer this without redoing the chi-squared test. Your answer
should be phrased in terms someone who knows nothing about chi-squared
tests can understand.