HOMEWORK 3

6.1) False: each measurement is thrown off by chance error, and this changes from measurement to measurement. (It is a good idea to replicate measurements, so as to judge the size of the chance error.)

6.3)
a. False: chance errors are sometimes positive and sometimes negative, but bias pushes one way.
b. False, same reason.
c. True

6.4) 0.03 inches or so

6.5)
a. No. Persons 2 and 10 copied from each other: they got exactly the same answers, with the decimal point in the wrong place. The other students seem to have worked independently.
b. First, nobody got the same answer both times. Second, there is a lot of person-to-person variation.

S6.1) False, see pg. 89

S6.3) We can get the SD by looking at the first two scores, in original units and in standard units: 79-64=15 points, while 1.8-0.8=1.0. So, 1.0 SD = 15 points, and the SD is 15 points. Next, we get the average from the fact that 64 is 0.8 in standard units: 0.8 SD = 12 points, so the average must be 64-12=52. Finally, we complete the table:

Original units:    52     72     31
Standard units:     0   1.33   -1.4

S6.4)
a. Approximately equal to the area under the curve between -1.5 and 1.5, or 87%
b. 560. The range of 400-600 corresponds to +/- 1 in standard units. About 68% of the students at the university had scores in this range. There must have been about 1000/0.68 = (approx) 1470 students at the university. The range 450-550 corresponds to +/- 0.5 in standard units; about 38% of the students at the university had scores in this range: 38% of 1470 = 0.38 * 1470 = (approx) 560. Moral: the normal curve is not a rectangle.

S6.8) You would be too low, because the curve is lower than the histogram in that range.

S6.9) Average = 10 - 6.4 = 3.6 and SD = 2.0. Reason: (# wrong) = 10 - (# right)

S6.12) The probable explanation is digit preference; people round their incomes to the nearest $1,000 or $10,000. The class interval $10,000-$12,500 includes the left endpoint (a beautiful round number) as well as $11,000 and $12,000.
The next class interval, $12,500-$15,000, includes $13,000 and $14,000 -- but not $15,000, because class intervals include left endpoints but not right endpoints. So this second interval has only two round numbers, rather than three; and neither is as round as $10,000. The other pairs of intervals are similar to these two.

S6.13)
a. Heart disease rates go up with age; the drivers could be older, accounting for the difference in rates. That explanation has been eliminated.
b. You want the difference in rates to be due to exercise on the job, rather than pre-existing factors; it might take some time for exercise to have its beneficial effect.
c. Drivers and conductors are going to be similar with respect to age, education, income, and so forth.
d. This is an observational study, so there may be some confounding: drivers may be more at risk than conductors to start with. For example, drivers may be heavier.
e. You could look to see if the drivers and conductors had similar body sizes when they were hired; that would suggest the two groups were comparable at time of hire, and strengthen the argument that exercise matters. If drivers were heavier than conductors at time of hire, then the argument for exercise is weaker.

S6.14) Moving high-risk women from the control group to the treatment group lowers the death rate in the control group and increases the death rate in the treatment group. That biases the study against screening. The assignment does increase the number of lives saved: that number is estimated by comparing the death rates in the treatment and control groups, and the bias runs against treatment.

8.1) The answer is (d). With (a), the averages are too low. With (b), the SD's are too small. With (c), the SD's are too big and the correlation is too high.

8.2)
a. Negative: older cars get fewer miles per gallon.
b. Richer people own newer cars, and maintain them better.
(Some rich people own Ferrari gas guzzlers, but not many; and 10-year-old Chevrolets in poor repair might guzzle even more.)

8.3) The correlation would be 1.00: all the points on the scatter diagram (for height of wife vs. height of husband) would lie on a straight line which slopes up. The slope of the line is 0.92, but correlation and slope are two different things.

8.4) 0.3. Taller men do marry taller women, on average; but there is lots of variation around the line.

8.8)
a. 0.42 inches
b. 2.5 inches
c. 0.80
d. solid

8.10) Smaller when the score on form L is 75. If you take narrow vertical chimneys over 75 and 125 in the scatter diagram, there is less vertical spread (and therefore less uncertainty in the predictions) in the chimney over 75.

Chapter 9.

9.2.
a) False. With negative correlations, below-average values of one variable are associated with above-average values of the other.
b) False. That y is less than x does not tell you anything about the correlation between the two variables. For instance, age is almost always less than income, but the correlation between the two is positive. Likewise, in married couples, the wife's height is usually less than the husband's height, but the correlation is also positive.

9.3.
a) The correlation between height at age 16 and height at age 18 is higher; people do not change much in height between these ages, so someone who is tall at age 16 will likely be tall at 18.
b) Height is more correlated. Other factors (such as diet) affect weight more than height, so someone's height at 18 may be predicted by their height at 4, but their weight at 18 is much less predictable.
c) Height and weight at age 4 are more correlated. There is a wider range of body types present at age 18 than at age 4. Confounding variables such as diet and exercise are more likely to affect 18-year-olds (more years to eat well/poorly, etc.) than 4-year-olds.

9.4 The correlation would be higher (cf. the computer assignment, and exercise B, p. 145).
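As a rough numerical check of 8.3 and 9.2 b) above, the sketch below computes r for a set of made-up couples. The husband heights are illustrative values, not data from the book, and the rule "wife = 0.92 times husband" is an assumption chosen to match the slope quoted in 8.3. The helper name correlation is also just for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r: average product of the values in standard units."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return products / (n * sd_x * sd_y)

# Hypothetical husband heights in inches (illustrative, not from the text).
husbands = [64.0, 66.5, 68.0, 70.0, 71.5, 74.0]
# Assumption: every wife is exactly 0.92 times as tall as her husband.
wives = [0.92 * h for h in husbands]

r = correlation(husbands, wives)
all_wives_shorter = all(w < h for h, w in zip(husbands, wives))
print(round(r, 6), all_wives_shorter)
```

Under these assumptions every wife is shorter than her husband and the slope is 0.92, yet the points lie exactly on a line, so r comes out at 1 (up to rounding error). The slope and the fact that y is less than x say nothing about how tightly the points cluster around the line.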
9.7 No (see pg. 148-149). Grouping the data by county eliminates much of the spread in the data.

9.8 False. The negative correlation is due to social factors that have changed over the course of older women's lifetimes. For instance, a much smaller proportion of women went to college 40 years ago than attend today. These data are _cross-sectional_, not _longitudinal_.

9.10
a) True. State averages in SAT scores are generated by taking the mean score of all students who take the test. In some states, only the best students take the SAT and the average will be high. In others, a much broader group of students take SATs, so the mean will be lower in those states.
b) False. These data do not address whether another (confounding) variable, such as % of students who take the test, could explain the results. If a much higher % of students are taking the test in NY, then perhaps the NY schools are doing as good a job as WY schools. [Note: this does not mean that the conclusion is wrong, but just that _these_ data are not sufficient to justify it]

9.11 Much less than 0.97 (again, pg. 148-149). By taking averages, individuals with high scores on one test and low scores on the other are canceled out (reducing the spread).

9.12
a) The points are on stripes because educational level is a discrete (as opposed to continuous) measure. You have 1, 2, 3, etc., years of education, not 1.2, 2.5, or 3.14.
b) At some points, there may be more than one couple (e.g., at 12 yrs for the husband and 12 yrs for the wife).
c) Area A: iv. Area B: iii. Area C: i.
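The point made in 9.7 and 9.11 above, that correlations based on group averages overstate the correlation for individuals, can be illustrated with a small simulation. This is only a sketch on invented data: the 10 groups, the 200 individuals per group, the noise SD of 5, and the helper name correlation are all assumptions made for the example, not anything taken from the exercises.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r: average product of the values in standard units."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return products / (n * sd_x * sd_y)

random.seed(0)  # fixed seed so the sketch is repeatable

individual_x, individual_y = [], []
average_x, average_y = [], []
for group_level in range(10):  # 10 hypothetical groups (e.g. counties or states)
    # Each individual = shared group level + a lot of individual-level noise.
    xs = [group_level + random.gauss(0, 5) for _ in range(200)]
    ys = [group_level + random.gauss(0, 5) for _ in range(200)]
    individual_x += xs
    individual_y += ys
    average_x.append(sum(xs) / len(xs))
    average_y.append(sum(ys) / len(ys))

r_individuals = correlation(individual_x, individual_y)
r_averages = correlation(average_x, average_y)
print(f"individuals:    r = {r_individuals:.2f}")
print(f"group averages: r = {r_averages:.2f}")
```

Averaging within each group cancels most of the individual-level spread, so the averages hug a line much more closely than the individuals do, and r for the averages is far higher than r for the individuals.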