STA103: Confidence intervals and t-tests using S-Plus

Lab: Confidence intervals and t-tests using S-Plus

Purpose and summary of procedures

The purposes of today's lab are to

learn how to use S-Plus to conduct t-tests
increase our understanding of hypothesis testing and type II error

Description of the data

This dataset contains information about the salaries, education, and experience of a sample of 93 bank employees who began their time at the bank in skilled, entry-level clerical positions. (The bank was sued for alleged gender discrimination in salaries, and this sort of data came under intense scrutiny.) The data was released in the report "Harris Trust and Savings Bank: An Analysis of Employee Compensation", Report 7946, Center for Mathematical Studies in Business and Economics, University of Chicago Graduate School of Business, 1979. The dataset contains the following variables (in this order):

salhire: Annual salary at the time of hiring (in U.S. dollars)
sal1977: Salary as of March 1977 (in U.S. dollars)
educatn: Educational level (in years)
expernce: Work experience before coming to work at the bank (in months)

Obtaining the data

You can obtain this data by clicking here. Use the Netscape menus (File, then Save As) to save the data to a file on your desktop (say, "bankdata.txt").

Starting S-Plus and getting the data

To start S-Plus, click Start, then Programs, then Statistics & Mathematics, then S-PLUS 2000. To read the data into S-Plus, choose from the S-Plus menus File, then Import Data, then From file. Choose the file that you have just saved. You should have a spreadsheet open with data in four columns. Note: By default, S-Plus will name the dataset according to the name of the file from which the data was imported.

Using S-Plus for t-tests

1. Let's say we're interested in conducting a t-test to determine whether the mean salary in 1977 ("sal1977") differed significantly from $10250. We'd like to test at the 5% significance level. To run the test, go to Statistics, Compare samples, One sample, and finally t Test. (S-Plus can still use a t-test for sample sizes over 30, since it can give better numerical approximations of the quantiles of the t distribution than we have available in the book's t table.)

How can you change the significance level of your t-test? For instance, can you perform a test with alpha=0.10 or alpha=0.01?
How can you change the value for the mean salary in 1977 hypothesized in the null hypothesis?
For what values of mu in the null hypothesis would the null hypothesis be rejected, with alpha=0.10? Hint: Generate a 100(1-alpha)%=90% confidence interval for mu.
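For those who want to see the arithmetic behind the menu output, here is a minimal Python sketch of a two-sided one-sample t-test and the matching confidence interval. It is not S-Plus code and is not meant to reproduce the S-Plus output exactly. The salary values below are made up for illustration (they are not the bank data), and the helper names t_pdf, t_cdf, t_quantile and one_sample_t are our own. The t quantiles are computed numerically instead of being read from a table.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection; assumes the answer lies in [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(data, mu0, alpha=0.05):
    """Two-sided one-sample t-test of H0: mu = mu0, with a 100(1-alpha)% CI."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / math.sqrt(n)      # standard error of the sample mean
    df = n - 1
    t_stat = (xbar - mu0) / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))
    crit = t_quantile(1 - alpha / 2, df)
    ci = (xbar - crit * se, xbar + crit * se)
    return t_stat, p_value, ci


# Hypothetical salaries, for illustration only (NOT the bank data).
salaries = [9800, 10200, 10650, 9900, 11000, 10400, 10100, 10900]

t_stat, p_value, ci = one_sample_t(salaries, mu0=10250, alpha=0.10)
print("t statistic:", round(t_stat, 3))
print("two-sided p-value:", round(p_value, 3))
print("90% confidence interval:", tuple(round(v, 1) for v in ci))
```

A two-sided test at level alpha rejects a hypothesized value of mu exactly when that value falls outside the 100(1-alpha)% confidence interval, which is the idea behind the hint above.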

2. Let's say that we want to know beta, the probability of type II error, for the test above (with alpha=0.10), if the mean salary is really $10500. We use the fact that power=1-beta, and use the menu Statistics, Power and sample size, Normal mean. We want to find "power" (rather than sample size or minimum difference), so we have to make sure to choose the right button. We also have to make sure that the other fields contain the right information: the alpha value, the fact that this is a one-sample test (only one mean involved), and the sample standard deviation of our data. (You can find the standard deviation using the summary statistics option that we've used in several other labs.)

What is the power of the test? What is beta, the probability of type II error?
Let's say that we want the test to be more powerful (lower beta). If we want the test to have probability of type II error of 0.20 or lower, what is the minimum sample size we would need? Hint: Choose the "sample size" button rather than the "power" button.
How is the power of the test affected by changes in the specific alternative hypothesis value we're looking at? How is it affected by changes in the probability of type I error (alpha) that we're willing to accept with our test?
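The power and sample size questions above can also be explored with a short Python sketch. It uses the normal approximation for a two-sided one-sample test, which is our assumption about the kind of calculation involved and not a description of what S-Plus does internally. The standard deviation of 1800 is a placeholder, not the value from the bank data; substitute the sample standard deviation from your summary statistics. The function names are our own.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_sided(mu0, mu1, sigma, n, alpha):
    """Approximate power of a two-sided one-sample test of H0: mu = mu0
    when the true mean is mu1 (normal approximation, sigma treated as known)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(mu1 - mu0) * math.sqrt(n) / sigma
    # Probability that the test statistic lands in either rejection tail.
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)


def min_sample_size(mu0, mu1, sigma, alpha, target_power):
    """Smallest n whose approximate power reaches target_power."""
    if mu1 == mu0:
        raise ValueError("mu1 must differ from mu0, or power never exceeds alpha")
    n = 2
    while power_two_sided(mu0, mu1, sigma, n, alpha) < target_power:
        n += 1
    return n


SIGMA = 1800.0  # HYPOTHETICAL placeholder; use the sample SD from the data

power = power_two_sided(mu0=10250, mu1=10500, sigma=SIGMA, n=93, alpha=0.10)
beta = 1 - power  # probability of type II error
n_needed = min_sample_size(10250, 10500, SIGMA, alpha=0.10, target_power=0.80)

print("power:", round(power, 3))
print("beta:", round(beta, 3))
print("minimum n for beta <= 0.20:", n_needed)
```

Changing mu1 or alpha in the calls above shows how power responds to the specific alternative value and to the type I error probability.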