Lab: Confidence intervals and t-tests using S-Plus
Purpose and summary of procedures
The purposes of today's lab are to
- learn how to use S-Plus to conduct t-tests
- increase our understanding of hypothesis testing and type II error
Description of the data
This dataset contains information about the salaries, education, and
experience for a sample of 93 bank employees, who began their time at
the bank in skilled, entry-level clerical positions. (The bank was
sued for alleged gender discrimination in salaries, and this sort of
data came under intense scrutiny.) The data was released in the
report "Harris Trust and Savings Bank: An Analysis of Employee
Compensation", Report 7946, Center for Mathematical Studies in
Business and Economics, University of Chicago Graduate School of
Business, 1979. We have the following variables in the data set (in
this order):
- salhire: Annual salary at the time of hiring (in U.S. dollars)
- sal1977: Salary as of March 1977 (in U.S. dollars)
- educatn: Educational level (in years)
- expernce: Work experience before coming to work at the bank (in months)
Obtaining the data
You can obtain this data by clicking here. Use the Netscape menus (File, then Save As) to save the data
to a file on your desktop (say, "bankdata.txt").
Starting S-Plus and getting the data
To start S-Plus, click Start, then Programs, then
Statistics & Mathematics, then S-PLUS 2000. To read
the data into S-Plus, choose from the S-Plus menus File, then
Import Data, then From file. Choose the file that
you have just saved. You should have a spreadsheet open with data in
four columns. Note: By default, S-Plus will name the dataset
according to the name of the file from which the data was imported.
Using S-Plus for t-tests
1. Let's say we're interested in conducting a t-test to determine
whether the mean salary in 1977 ("sal1977") differed significantly from
$10250. We'd like to test at significance level 5%. To run the test,
go to Statistics, Compare samples, One
sample, and finally t Test. (S-Plus can still use a
t-test for sample sizes over 30 since it can give better numerical
approximations of the quantiles of the t distribution than we have
available in the book's t table.)
- How can you change the significance level of your t test? For
instance, can you perform a test with alpha=0.10 or alpha=0.01?
- How can you change the value for the mean salary in 1977 hypothesized in the null hypothesis?
- For what values of mu in the null hypothesis would the null hypothesis be rejected, with alpha=0.10? Hint: Generate a 100(1-alpha)% = 90% confidence interval for mu.
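For anyone who wants to check the menu output by hand, the calculation behind the one-sample t-test and its matching confidence interval can be sketched in Python. This is only an illustration under stated assumptions: it is not what S-Plus runs internally, the function names are made up for this sketch, the t quantile is found by simple numerical integration, and the ten salary values are invented placeholders rather than rows from the bank data.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Inverse of t_cdf, found by bisection on [-100, 100]."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(data, mu0, alpha):
    """Two-sided one-sample t-test of H0: mu = mu0, plus the 100(1-alpha)% CI."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / math.sqrt(n)   # standard error of the sample mean
    df = n - 1
    t_stat = (xbar - mu0) / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))
    crit = t_quantile(1 - alpha / 2, df)
    ci = (xbar - crit * se, xbar + crit * se)
    return {"mean": xbar, "t": t_stat, "df": df, "p": p_value,
            "ci": ci, "reject": abs(t_stat) > crit}

# Hypothetical placeholder salaries, NOT values from the bank data set.
example_salaries = [9800, 10200, 10500, 11000, 9900,
                    10700, 10300, 10100, 10900, 10600]

result = one_sample_t(example_salaries, mu0=10250, alpha=0.10)
print(result)
```

Changing `alpha` or `mu0` in the call corresponds to changing the confidence level or the hypothesized mean in the S-Plus dialog. Comparing `result["ci"]` with `mu0` across a few calls is one way to explore the hint above.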
2. Let's say that we want to know beta, the probability of type
II error, for the test above (with alpha=0.10), if the mean salary is
really $10500. We use the fact that power=1-beta, and use the menu
Statistics, Power and sample size, Normal
mean. We want to find "power" (rather than sample size or minimum difference), so we have to make sure to choose the right button. We also have to make sure that we have the right information listed in the other fields, such as the alpha value, the fact that this is a one-sample test (only one mean involved), and the sample standard deviation of our data. (You can find this using the summary statistics option that we've used in several other labs.)
- What is the power of the test? What is beta, the probability of type II error?
- Let's say that we want the test to be more powerful (lower beta). If we want the test to have probability of type II error of 0.20 or lower, what is the minimum sample size we would need? Hint: Choose the "sample size" button rather than the "power" button.
- How is the power of the test affected by changes in the specific alternative hypothesis value we're looking at? How is it affected by changes in the probability of type I error (alpha) that we're willing to accept with our test?
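The relationship between power, sample size, and alpha can also be explored with a short Python sketch. It assumes a two-sided test and a normal approximation for the sample mean, which may not match the S-Plus dialog's calculation exactly. The function names are made up for this sketch, and `SIGMA_GUESS` is a placeholder: replace it with the sample standard deviation you found from the summary statistics.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sided(mu0, mu_alt, sigma, n, alpha):
    """Approximate power of a two-sided level-alpha test of H0: mu = mu0
    when the true mean is mu_alt (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Distance between the two means, in standard errors of the sample mean.
    shift = (mu_alt - mu0) / (sigma / sqrt(n))
    # Probability the test statistic lands in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def min_sample_size(mu0, mu_alt, sigma, alpha, target_power):
    """Smallest n whose approximate power reaches target_power."""
    n = 2
    while power_two_sided(mu0, mu_alt, sigma, n, alpha) < target_power:
        n += 1
    return n

# Placeholder standard deviation, NOT computed from the bank data.
SIGMA_GUESS = 1000.0

power = power_two_sided(10250, 10500, SIGMA_GUESS, n=93, alpha=0.10)
beta = 1 - power  # probability of type II error
n_needed = min_sample_size(10250, 10500, SIGMA_GUESS,
                           alpha=0.10, target_power=0.80)
print("power:", power, "beta:", beta, "n for beta <= 0.20:", n_needed)
```

Rerunning with a different `mu_alt` or `alpha` shows how the power responds to the choice of alternative and to the type I error rate, which is the comparison the last question asks for.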