Sta 110a Practice Midterm Solutions

1) Method: Draw a 4x4 grid with all the possible first draws on the vertical axis (G, O, O, B) and all the possible second draws on the horizontal axis (G, O, O, B). Then just count up the cells to find the answers.
   a) 10/16
   b) 12/16

2) a) Graph 1 has means that are too high, Graph 2 has too small an SD, and Graph 4 has too strong a correlation. The answer is Graph 3.
   b) Slope = r*SDy/SDx = .6
      Intercept = 100 - 100*.6 = 40
      y = .6x + 40

3a) a) GPA Freshman year vs. GPA Sophomore year = .6
    b) GPA Freshman year vs. GPA Senior year = .3
    c) Length vs. Weight of a two by four board = .95

3b) a) True. The outlier is below the regression line and would pull that end of the line down.
    b) .99. It's a tight, positive correlation.

3c) To answer this problem, you don't have to calculate anything, but you do need to know how the RMS error works. When comparing someone's actual score at age 35 with their predicted score, you use the RMS error as the SD of the scores around the regression line. The RMS error is always less than the SD for the population (as long as the correlation is not 0), so we know that it's less than 10. That means 10 points is more than one RMS error, so "within 10 points of the predicted score" covers more than one SD on either side of the regression line. From the normal distribution, we know that more than 68% of the population will be within 10 points of their predicted score.

3d) False. Run the numbers through the binomial formula and you'll find that (i) has a probability of .25, and (ii) has a probability of .08.

4a) This is a regression problem, only we have to think in standard units. In 1995 the 95th percentile is 1.65 standard units above average. For this guy we would predict his 1996 average to be r*Z = 0.6 * 1.65 = 0.99, or about 1 standard unit above average. Hence his predicted percentile is about the 84th.

4b) From the normal distribution, we know that 84 percent of the population will be above one SD below the mean. If 90% of the class scored above 50 and the mean was 60, then 50 must be more than one SD below the mean, so we know that the SD was less than 10 points.
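The normal-curve arithmetic in 4a and 4b can be double-checked with a short sketch. This is only an illustration, not part of the original solutions: it assumes Python 3.8 or later (for `statistics.NormalDist`), it assumes the scores follow the normal curve as the solutions do, and the variable names are made up for this example.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, SD 1); scores are assumed to follow it.
std_normal = NormalDist()

# 4a) Start at the 95th percentile, convert to standard units,
# then regress toward the mean using r = 0.6.
r = 0.6
z_1995 = std_normal.inv_cdf(0.95)       # about 1.645 standard units
z_1996 = r * z_1995                     # about 0.99 standard units
predicted_percentile = 100 * std_normal.cdf(z_1996)   # about 84

# 4b) If 90% scored above 50 and the mean is 60, then 50 sits at the
# 10th percentile. Solve (50 - 60) / SD = z for the SD.
z_10th = std_normal.inv_cdf(0.10)       # about -1.28 standard units
implied_sd = (50 - 60) / z_10th         # about 7.8, which is less than 10

print(round(z_1996, 2), round(predicted_percentile), round(implied_sd, 1))
```

The exact SD in 4b is not needed for the answer; the sketch only confirms that it comes out below 10 points.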

4c) The population of male students at Duke is large enough that we can treat this as a drawing with replacement problem. If the SD is 3 inches and the average is 69 inches, then 6 feet (72 inches) is one SD above average, so 16% of the population will be above 6 feet tall. To calculate the chance of randomly picking three people above 6 feet, use .16 x .16 x .16, which is about .004.

5) a) 1 in 256 (1 over 2 to the 8th)
   b) i) False. Lily pads 2 and 6 are equally extreme outcomes, so they're equally likely.
      ii) True. Each path is equally likely. However, each OUTCOME is not.
      iii) False. To reach lily pad 3 requires 5 jumps to the left and 3 jumps to the right. The number of such paths is the binomial coefficient with N = 8, k = 5. Out of the 256 possible paths, 56 lead to pad 3.
   c) Eight random draws, with replacement, out of a box containing the tickets L and R. The pad he ends up on is the number of rights.

6a) a) False.
    b) False. The correlation isn't the same as the slope of the regression line.
    c) True.
    d) True. The correlation works both ways.

6b) Positive. Both vocabulary and height are functions of age, so older children will both be taller and have larger vocabularies.

6c) The blocks alternate because people are rounding their incomes off to the nearest $1,000. Lots of people at 10,000, few at 10,500, lots at 11,000, etc.
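
The counting in problem 5 and the product in 4c can be double-checked with a few lines of code. This is a rough sketch, not part of the original solutions: it assumes Python 3.8 or later (for `math.comb`), it takes the pad number to be the number of rights in 8 jumps as in 5c, and the helper `paths_to_pad` is a made-up name for this example.

```python
from math import comb

N_JUMPS = 8                      # the frog makes 8 left/right jumps
total_paths = 2 ** N_JUMPS       # every L/R sequence is equally likely


def paths_to_pad(pad, n_jumps=N_JUMPS):
    """Count the L/R sequences with exactly `pad` rights (hypothetical helper)."""
    return comb(n_jumps, pad)


# 5a) chance of any one particular path
single_path_chance = 1 / total_paths

# 5b) pads 2 and 6 are reached by the same number of paths
paths_pad_2 = paths_to_pad(2)
paths_pad_6 = paths_to_pad(6)

# 5b iii) pad 3 needs 3 rights and 5 lefts
paths_pad_3 = paths_to_pad(3)

# 4c) three independent draws, each with a 16% chance of being over 6 feet
chance_three_tall = 0.16 ** 3

print(total_paths, paths_pad_2, paths_pad_6, paths_pad_3)
print(round(chance_three_tall, 4))
```

Choosing 3 rights out of 8 jumps gives the same count as choosing 5 lefts out of 8, which is why the solution's N = 8, k = 5 also gives 56.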