Lab 8 (Week of 3/30/98)

Download this week's SAS/Insight program

Click here and this week's program will appear in your browser window. Click on "File>Save As..." in Netscape and choose "Format for Saved Document: Text" then click "OK". The program is now saved in your account (in your home directory, by default). The file's name is "lab8.sas". Return to this page by choosing "GO>Back" from the Netscape menu bar. To get started type "sas lab8 &" in one of the terminals open on your screen.

Background:

Firms that pursue discriminatory hiring practices place themselves at a competitive disadvantage. For typical businesses this economic theory is hard to verify. Baseball is not your typical business: its hiring practices and a measure of competitive advantage are readily available to the public. The early years of player integration (1947 to 1956) are of particular interest.

In a 1974 article ("Employer Costs and Discrimination: The Case of Baseball," Journal of Political Economy, vol. 82, no. 4, 1974, pp. 873-882), James Gwartney and Charles Haworth tested this theory by studying the economic costs incurred by baseball franchises with more racially discriminatory hiring practices. The reasoning is that teams willing to hire black players had access to a larger pool of talented players, and hence an advantage over teams that were slow to integrate. If the theory is true, teams with a higher representation of black players should have performed better, on average, than those with lower representation.

Consider the regression equation:

WON = a + b*BLACK47

where WON (the response variable) denotes the percentage of games won by a given team over the period 1947 to 1956, and BLACK47 is the number of "player years" for the team's black players during the same period. The slope coefficient b measures the effect (in the language of economics: the marginal product) on a typical team's winning percentage of adding another black player to its roster for one year during this period.
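To show the arithmetic behind the fitted line, here is a minimal Python sketch of the ordinary least squares estimates of a and b. It is not part of the SAS/Insight workflow used in this lab. The five (BLACK47, WON) pairs are invented for illustration and are not the lab data, and the function name fit_line is hypothetical.

```python
# Hypothetical data, NOT the lab's data set:
# player years (BLACK47) and winning percentage (WON) for five made-up teams.
black47 = [0, 5, 10, 20, 30]
won = [45.0, 47.0, 50.0, 53.0, 55.0]


def fit_line(x, y):
    """Return the least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope: change in WON per additional player year
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = fit_line(black47, won)
print(f"WON = {a:.3f} + {b:.3f}*BLACK47")
```

The slope is the ratio of the cross-product sum to the sum of squares of BLACK47, so it is read in units of winning percentage points per player year.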

Questions:

1) Produce a scatter plot of winning percentage (on the y-axis) against player years (on the x-axis). In a sentence or two, describe the relationship between the two variables.

2) Use the Analyze>Fit command to fit the linear regression of winning percentage on black player years. What is the equation of the estimated regression line? What is the estimate of error variance (hint: see the Analysis of Variance table)?

3) Use the regression line to predict winning percentages for teams with 10, 20 and 30 black player years.

4) Is the regression line distinguishable from a horizontal line (one where b equals zero) at the 5% error level? In regression parlance: "is the regression model significant at the 5% error level?" Here we are testing the hypothesis H0: b = 0 against Ha: b not equal to 0. In simple linear regression (what we are doing here) there are two ways to answer this question. The first is to look at the p-value associated with the slope estimate (see the table "Parameter Estimates") and compare it to the error level. The second is to look at the p-value for the regression given in the Analysis of Variance table and compare it to the error level. Verify that both methods yield the same p-value.

5) What can you conclude regarding Gwartney and Haworth's theory?
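The questions above rest on a handful of quantities that can be computed by hand. The following Python sketch, assuming an ordinary least squares fit, computes the error variance estimate (the mean squared error), the t statistic for the slope, and the F statistic from the Analysis of Variance table, and shows that F equals the square of t when there is a single predictor. That identity is why the two p-values in question 4 agree. The data are invented for illustration and are not the lab data; the function name and dictionary keys are hypothetical.

```python
import math

# Hypothetical data, NOT the lab's data set.
black47 = [0, 5, 10, 20, 30]
won = [45.0, 47.0, 50.0, 53.0, 55.0]


def simple_regression_summary(x, y):
    """Least squares fit of y = a + b*x with ANOVA-style summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    fitted = [a + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # model sum of squares

    mse = sse / (n - 2)            # estimate of error variance (n - 2 error df)
    se_b = math.sqrt(mse / sxx)    # standard error of the slope
    t_stat = b / se_b              # tests H0: b = 0 using the slope estimate
    f_stat = ssr / mse             # tests H0: b = 0 using the ANOVA table (1 model df)
    return {"a": a, "b": b, "mse": mse, "t": t_stat, "f": f_stat}


result = simple_regression_summary(black47, won)

# Predicted winning percentages at 10, 20 and 30 player years.
predictions = {x0: result["a"] + result["b"] * x0 for x0 in (10, 20, 30)}

print(result)
print(predictions)
print("t squared equals F:", math.isclose(result["t"] ** 2, result["f"]))
```

Because the F statistic has 1 and n - 2 degrees of freedom and equals the squared t statistic with n - 2 degrees of freedom, the two-sided p-value for the slope and the p-value for the regression coincide.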

Return to the Stat 110B lab page.


iversen@stat.duke.edu
last updated 26 March 1998