Variable | Type | Definition |
---|---|---|
EDUCATN | ordinal | number of years of education |
SOUTH | nominal | indicator for Southern region |
GENDER | nominal | indicator variable for gender |
EXPERNCE | ordinal | number of years of work experience |
UNION | nominal | indicator variable for union membership |
WAGES | continuous | wage in dollars per hour |
AGE | ordinal | age in years |
RACE | nominal | race category |
OCCUPATN | nominal | occupational category |
SECTOR | nominal | sector of economy |
MARRIED | nominal | indicator variable for married |
Your report should be about 5 typed pages in length (excluding figures and appendix material) and include:
Presume your reader is comfortable with basic statistical methods but is not an expert in them, and assume that your reader is not familiar with the data. When you use a statistical method like regression or ANOVA, explain why you chose it and carefully interpret your results. Present only important summaries, plots, and their interpretations; don't burden your reader with unnecessary facts and analysis.
Your project report is due in class on December 9th. Late reports will not be accepted.
To get started, open SAS by clicking "Start>Programs>Statistics and Mathematics>SAS System v6.11". Once the SAS environment appears, click on "File>Open". In the "Open" window, type "D:\project.sas" in the "File name:" field and then click "Open". Next, find the button on the menu bar with a picture of a running person. Click on this button and the data set will appear. Some information on printing, especially output containing text, can be found here.
2) Explore relationships between wages and other variables in the data set. Look at scatter plots of wages by continuous covariates, and box plots or histograms of wages grouped by categorical covariates. What do you see? Do any of the covariates show promise as predictors of hourly wages?
3) Use linear regression to explore the relationship between hourly wages and factors that may influence wages. While data for many of the variables that determine wages are not available to us in this data set, we can use the variables we have to correlate type of job (occupation and sector), qualifications (education and experience), personal characteristics (age, gender, married, and race), union membership (union), and region of country (south) with hourly wages.
Start by fitting a linear regression model for wages that includes (a) promising covariates identified in "Question 2" and (b) those that you have strong prior beliefs should be included in the model as predictor variables; do not include redundant variables. Plot residuals against fitted values for this regression; verify that this plot looks like the right panel of Figure 14-10 on page 463 of the book. Create a new response variable logwages = log(wages). Repeat the previous regression using logwages as the response variable. Look at plots of residuals against all predictor variables to verify that each residual relationship is flat. For example, is the relationship between age and wages linear? In the residual plots of continuous and ordinal variables, look for patterns like the one pictured in Figure 14-7(b) on page 462 of the book. Enter appropriate quadratic terms into the model, if necessary. Work with your model until its residual plots look OK.
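The log transform and the residual check for a quadratic term can be sketched outside SAS as follows. This is a minimal illustration in Python, not the SAS procedure used in class: the data are simulated (they are not the project data), and the helper names `fit_ols`, `residuals`, and `band_means`, the age cut points, and the true coefficients used to simulate wages are all assumptions made for the example.

```python
import math
import random

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

def residuals(X, y, beta):
    """Observed minus fitted values."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

def band_means(ages, res, cuts=(34, 49)):
    """Mean residual for younger, middle, and older workers (hypothetical cuts)."""
    bands = [[], [], []]
    for age, r in zip(ages, res):
        bands[(age >= cuts[0]) + (age >= cuts[1])].append(r)
    return [sum(b) / len(b) for b in bands]

# Simulated data (NOT the project data): wages rise and then fall with age.
rng = random.Random(0)
ages = [age for age in range(18, 65) for _ in range(5)]
mean_age = sum(ages) / len(ages)
a = [(age - mean_age) / 10 for age in ages]  # centered age, in decades
wages = [math.exp(2.0 + 0.15 * ai - 0.10 * ai ** 2 + rng.gauss(0, 0.1)) for ai in a]

# New response variable, as in the assignment: logwages = log(wages).
logwages = [math.log(w) for w in wages]

# Model 1: logwages on age only.
X_lin = [[1.0, ai] for ai in a]
beta_lin = fit_ols(X_lin, logwages)
bands_lin = band_means(ages, residuals(X_lin, logwages, beta_lin))

# Model 2: add a quadratic term in age.
X_quad = [[1.0, ai, ai ** 2] for ai in a]
beta_quad = fit_ols(X_quad, logwages)
bands_quad = band_means(ages, residuals(X_quad, logwages, beta_quad))

print("mean residual by age band, linear fit:   ", bands_lin)
print("mean residual by age band, quadratic fit:", bands_quad)
```

In the age-only fit, the mean residual is positive for the middle age band and negative for the two outer bands, which is the curved pattern a residual plot would show. After the quadratic term is added, the band means are all close to zero, so the residual relationship is roughly flat. Centering age before squaring it keeps the linear and quadratic columns from being strongly correlated.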
Summarize and interpret your regression results. Do you find evidence of a gender gap in wages? On average, controlling for other factors, how much higher or lower do you predict wages to be for a 40-year-old worker than for a 50-year-old worker? For a worker in the South than for a non-Southern worker? For a female worker than for a male worker? Ask the same question of the other predictors in your model. Remember that your response variable is the log of wages.
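Because the response is on the log scale, differences in the fitted model translate into ratios of predicted wages rather than dollar differences. The worked equations below sketch that conversion. The model form, the coefficient symbols $\beta_j$, and the presence of an AGE$^2$ term are assumptions for illustration, and your own model may contain different terms. The log is assumed to be the natural log.

$$
\begin{aligned}
\log(\text{WAGES}) &= \beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{AGE}^2 + \beta_3\,\text{SOUTH} + \cdots + \varepsilon \\
\frac{\widehat{\text{WAGES}}(\text{AGE}=40)}{\widehat{\text{WAGES}}(\text{AGE}=50)} &= \exp\{\beta_1(40-50) + \beta_2(40^2-50^2)\} = \exp\{-10\beta_1 - 900\beta_2\} \\
\frac{\widehat{\text{WAGES}}(\text{SOUTH}=1)}{\widehat{\text{WAGES}}(\text{SOUTH}=0)} &= \exp(\beta_3)
\end{aligned}
$$

A ratio $r$ corresponds to a $100(r-1)\%$ difference in predicted wages, with the other predictors held fixed. When a coefficient is small, $\exp(\beta) - 1 \approx \beta$, so the coefficient itself is roughly the proportional difference.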