Practice Problems

Practice Problems for Prerequisite topics:
One-way ANOVA
Simple Linear Regression, Chapter 7
Assumptions of simple linear regression, ANOVA tables for regression, lack-of-fit test, logarithmic transformations, examination of residuals, Chapter 8
Multiple linear regression and inference, the use of specially designed explanatory variables, Chapters 9 and 10.
Chapter 12 Practice Problems
Chapter 18 Practice Problems
Chapter 20 Practice Problems

Practice Problems for Prerequisite topics:

All below are from Moore and McCabe, 3rd Edition. TAs have solutions to even-numbered problems.

Normal distribution: You should read section 1.3 and do review exercises on pages 87-90 in the Moore and McCabe text. Suggested: 1.75, 1.77, 1.85, 1.87, 1.89, 1.93.
Understanding p-values is crucial for introductory statistics. You should be able to interpret a p-value (as in exercise 6.33) and calculate a p-value (exc. 6.35, 6.37). You should also know the difference between one-sided and two-sided p-values (exc. 6.40 vs. 6.41). Also exc. 6.43.
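As a rough illustration of the one-sided versus two-sided distinction, the sketch below computes both p-values for a z statistic using only Python's standard library. The value z = 1.7 is a made-up example and is not taken from any Moore and McCabe exercise.

```python
from statistics import NormalDist

# Hypothetical z test statistic (illustrative value, not from the text).
z = 1.7

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# One-sided p-value for Ha: mu > mu0 is the upper-tail area beyond z.
p_one_sided = 1 - std_normal.cdf(z)

# Two-sided p-value for Ha: mu != mu0 doubles the tail area beyond |z|.
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
```

With a symmetric reference distribution the two-sided p-value is simply twice the one-sided one, which is why a result can be "significant at 5%" one-sided but not two-sided.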
Confidence intervals: Exc 6.6, 6.13, 6.19, 6.21, 6.23
It is essential that you understand the sampling distribution of the sample mean and the Central Limit Theorem. Review section 5.2 of Moore and McCabe and do the following exercises: 5.29, 5.33, 5.35, 5.37. Also section 6.1: Exc. 6.1, 6.3, 6.7, 6.9, 6.11.
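One way to convince yourself of the sampling-distribution result is a small simulation. The sketch below draws repeated samples from a Uniform(0, 1) population and compares the standard deviation of the sample means with sigma/sqrt(n). The population, sample size, and number of repetitions are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 25        # sample size (hypothetical)
reps = 2000   # number of simulated samples (hypothetical)

# Population: Uniform(0, 1), which has mean 0.5 and sd sqrt(1/12).
pop_sd = math.sqrt(1 / 12)

# Draw `reps` samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(reps)
]

observed_sd = statistics.stdev(sample_means)   # spread of the sample means
theoretical_se = pop_sd / math.sqrt(n)         # sigma / sqrt(n)

print(f"sd of simulated sample means: {observed_sd:.4f}")
print(f"sigma / sqrt(n):              {theoretical_se:.4f}")
```

A histogram of `sample_means` would look roughly normal even though the population is flat, which is the Central Limit Theorem at work.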
You should review one-sided tests in Section 6.2 of Moore and McCabe, particularly pages 455-456. Exercises 6.27 and 6.29 will be of help. You should also know how to do 6.35 and 6.37.
You should read section 7.1 of Moore and McCabe to make sure you understand the form of the standard error and how it differs from the standard deviation.
Review the components of a confidence interval -- the standard error, the choice of z or t quantiles and the degrees of freedom if you choose a t. You should know when to use a t versus a z distribution. Review pages 503-517 of Moore and McCabe. Suggested Exercises: 7.1, 7.7, 7.9, 7.11, 7.15.
The natural logarithm of a positive number k (for instance, k = x + y) is the exponent to which the constant e must be raised to obtain k. It is critical that you know how to do basic calculations with natural logs. Please review this topic in your calculus book because you will see it numerous times in your NSEES career.
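A few of the basic natural-log facts can be checked numerically. The sketch below uses made-up values x = 2 and y = 5; it shows that exp undoes log, that the log of a product is the sum of the logs, and that the log of a sum is not the sum of the logs.

```python
import math

# Hypothetical positive numbers chosen only for illustration.
x, y = 2.0, 5.0
k = x + y

# ln(k) is the exponent to which e must be raised to obtain k.
ln_k = math.log(k)        # math.log is the natural log
recovered = math.exp(ln_k)

# Product rule: ln(x * y) = ln(x) + ln(y).
ln_product = math.log(x * y)
sum_of_logs = math.log(x) + math.log(y)

# There is no such rule for sums: ln(x + y) differs from ln(x) + ln(y).
ln_sum = math.log(x + y)

print(f"ln({k}) = {ln_k:.4f}, exp of that = {recovered:.4f}")
print(f"ln(x*y) = {ln_product:.4f}, ln(x) + ln(y) = {sum_of_logs:.4f}")
print(f"ln(x+y) = {ln_sum:.4f}")
```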
Summary statistics: You should read section 1.2 and 1.3 of Moore and McCabe. Suggested review exercises: 1.95, 1.75, 1.77, 1.85, 1.87, 1.89, 1.93.
Correlation coefficient: A negative correlation coefficient means that as variable A increases, variable B tends to decrease; a value close to -1 indicates a strong inverse linear relationship. A value close to zero indicates a weak linear relationship. You should read section 2.2. Suggested exercises: 2.22, 2.25, 2.32.
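To make the sign and size of the correlation coefficient concrete, here is a minimal sketch that computes Pearson's r by hand for two small made-up data sets: one with a strong inverse relationship and one with essentially no linear relationship. The function name `pearson_r` and all data values are illustrative, not from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: B falls steadily as A rises.
a = [1, 2, 3, 4, 5]
b_inverse = [10.0, 8.1, 5.9, 4.2, 2.0]

# Hypothetical data: B shows no linear trend in A.
b_no_trend = [3, 5, 1, 5, 3]

r_inverse = pearson_r(a, b_inverse)
r_no_trend = pearson_r(a, b_no_trend)

print(f"strong inverse relationship: r = {r_inverse:.3f}")
print(f"no linear trend:             r = {r_no_trend:.3f}")
```

Keep in mind that r measures only linear association; a value near zero does not rule out a curved relationship.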
Analysis of Variance:
- You should read chapter 12 through p.775. Suggested exercises: 12.9, 12.21, 12.23, 12.25.

Review: Quiz on Prerequisite Topics from Fall 2002

Answers: 1(a) 3 +/- (1/sqrt(100))*z-star, where z-star is the quantile that cuts off a probability equal to 0.95 (to the left on the table). Look up in table. (b) False. This is the correct definition of a CI, but the interval in (a) does not guarantee a Type I error rate of 5%. (c) False (d) Ho: mu>3.2 Ha: mu<3.2 (e) P(Z>2) is approx 2.5%. You should not need a table for this. (f) False. 2. True 3. False 4. True
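If no normal table is at hand, answers 1(a) and 1(e) can be checked numerically. The sketch below reads the center (3) and standard error (1/sqrt(100)) directly from the formula in 1(a) and uses the 0.95 quantile described there; note that a quantile with 0.95 to its left leaves 0.05 in each tail, so swap in 0.975 if a 95% two-sided interval is what you want.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Ingredients read from answer 1(a): center 3, standard error 1/sqrt(100).
center = 3.0
standard_error = 1 / math.sqrt(100)

# z-star: the quantile with probability 0.95 to its left, as in the answer.
z_star = std_normal.inv_cdf(0.95)

lower = center - z_star * standard_error
upper = center + z_star * standard_error

# Answer 1(e): upper-tail probability beyond z = 2.
p_z_above_2 = 1 - std_normal.cdf(2)

print(f"z-star = {z_star:.3f}")
print(f"interval = ({lower:.4f}, {upper:.4f})")
print(f"P(Z > 2) = {p_z_above_2:.4f}")
```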

Review: A Practice Final Exam

This exam (in MSWord) was given in PPS222/STA101, which is the prerequisite course at Duke for NSEES masters-level statistics courses. It is a closed book test. You'll need a normal table from the back of any introductory statistics book, as well as a calculator.
Try as best you can to simulate exam conditions so that you can get the most information on your level of preparation and necessary additional review.
Solutions

One-way ANOVA

In Moore and McCabe, read chapter 12 through p.775. Suggested exercises: 12.9, 12.21, 12.23, 12.25.
Read Ch 5.2, 5.3, 5.5, 5.6.1 of Sleuth. Do all Ch. 5 conceptual exercises. Computational exercise # 20. Make sure you understand the difference between ANOVA and simple linear regression models.

Chapter 7 Simple Linear Regression

Sleuth, Chapter 7, all conceptual exercises.
Computational exercises: #12,13.
Ch 7 Sleuth Exercises #19-22 concern meat processing data. Columns are "time" and "ph". Note that in order to do this problem you need to LOG TRANSFORM the "time" variable. To do this take a look at this Splus help topic. Confirm #21 by hand. Answers here.
Sample problem from past final exam
Spring 2001 midterm (skip problems 9 and 10 for now). Solutions
For additional review, read Chapter 10 of Moore and McCabe. Suggested exercises:
1. 10.7 (use Splus),
2. 10.6 (use Splus). Follow-on questions: 10.11, 10.12, 10.13
3. 10.14, 10.20, 10.21, 10.23 (use calculations on pages 686+).
TAs will post solutions to even-numbered exercises to the newsgroup if requested.
2003 Quiz on Ch. 7 material

Chapter 8 practice problems

All conceptual exercises.
You can now do all problems in the Spring 2001 midterm. Solutions
8.22, Ecosystem Decay data. Assume that you have done the model exploration for this case and have chosen the model log(species)~log(area).
1. Provide the fitted regression line.
2. Give a one sentence interpretation of the slope on the original scale of measurement.
3. Give a CI for this slope.
4. Give an estimate and CI for the median number of species as a function of area when area=1.
5. Although the residuals may not indicate significant lack of fit, you decide to perform a lack-of-fit test to test the claim that the simple linear fit of log(species) on log(area) is inadequate.

Answers to selected Ch7 problems

7.19: (a) Intercept est.: 6.9836 with SE = 0.0485. Slope est.: -0.7257 with SE = 0.0344. Estimate of residual standard error (sigma) = 0.0823.

(b) Est.=5.8157, SE(Est.)=0.0297 Get the ingredients for this by using the summary statistics command in Splus as well as your regression output.

7.20: SE{PRED} = 0.0875, prediction interval for pH: (5.6139, 6.0175)
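The 7.20 numbers can be reproduced from the 7.19 output. The sketch below combines the residual standard error with the standard error of the estimated mean to get the standard error of prediction, then forms estimate +/- t x SE{PRED}. The t quantile 2.306 (the 0.975 quantile on 8 degrees of freedom) is an assumption here, chosen because it reproduces the reported interval; confirm the degrees of freedom against your own regression output.

```python
import math

# Values reported in the answers to 7.19.
sigma_hat = 0.0823     # residual standard error
estimate = 5.8157      # estimated mean pH from 7.19(b)
se_estimate = 0.0297   # SE of that estimated mean

# ASSUMPTION: 0.975 quantile of t on 8 df (not stated in the answers).
t_quantile = 2.306

# SE of prediction adds the variation of a single new observation
# to the uncertainty in the estimated mean.
se_pred = math.sqrt(sigma_hat**2 + se_estimate**2)

lower = estimate - t_quantile * se_pred
upper = estimate + t_quantile * se_pred

print(f"SE(pred) = {se_pred:.4f}")
print(f"interval = ({lower:.4f}, {upper:.4f})")
```

Because SE{PRED} is always larger than the SE of the estimated mean, an interval for a single future observation is wider than one for the mean at the same X.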

7.21: If zero were not a lower limit on time, this would be impossible. However, the predicted time should be between 0 and 1.3 (from Display 7.4)

7.22: About 109

Solutions to Ch 8 Ecosystem Decay problem:

Fitted regression line: Estimated mean of log(species) = 3.60 + 0.18 log(area). On the log scale, a one unit increase in log(area) is associated with a 0.18 unit (additive) increase in the estimated mean of log(species).
A 10-fold increase in the area is associated with an estimated 51% increase in the median number of species [10^(0.18) = 1.51], i.e., a 1.51-fold increase.
CI for beta1 on log scale is 0.18 +/- qt(.975,16-2)*(0.05). Let's say this interval is (e1,e2). To find the CI for the increase, you need to take the endpoints (e1,e2) and calculate: (10^(e1),10^(e2)) to find the interval.
The F-statistic for lack of fit is compared to an F on 2, 12 df. The calculated F-statistic is 0.1053, with p-value .9009. We do not have convincing evidence to reject the hypothesis that the linear regression model is adequate.
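The arithmetic in these solutions can be checked with a short script. The sketch below back-transforms the slope and its confidence limits, and recomputes the lack-of-fit p-value. Two things are assumptions rather than part of the solutions: the hard-coded value 2.1448 standing in for qt(.975, 14), and the closed-form upper-tail formula for an F distribution with 2 numerator degrees of freedom, P(F > f) = (d2 / (d2 + 2f))^(d2/2), which avoids needing a statistics library.

```python
# Values taken from the solutions above.
slope = 0.18        # estimated coefficient of log(area)
se_slope = 0.05     # its standard error
f_stat = 0.1053     # lack-of-fit F-statistic
df_denominator = 12 # denominator df (numerator df is 2)

# ASSUMPTION: stands in for qt(.975, 16 - 2) from Splus.
t_quantile = 2.1448

# Multiplicative change in median species for a 10-fold increase in area.
fold_change = 10 ** slope

# CI for the slope on the log scale, then back-transformed endpoints.
e1 = slope - t_quantile * se_slope
e2 = slope + t_quantile * se_slope
fold_lower = 10 ** e1
fold_upper = 10 ** e2

# Upper-tail probability for F on (2, df_denominator) df.
# This closed form holds only when the numerator df equals 2.
p_value = (df_denominator / (df_denominator + 2 * f_stat)) ** (df_denominator / 2)

print(f"fold change for 10x area: {fold_change:.2f}")
print(f"slope CI: ({e1:.4f}, {e2:.4f})")
print(f"fold-change CI: ({fold_lower:.2f}, {fold_upper:.2f})")
print(f"lack-of-fit p-value: {p_value:.4f}")
```

Note that the interval is built on the log scale first and only then back-transformed, so it is not symmetric around 1.51 on the original scale.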

Chapter 9:

All conceptual exercises.
Multiple regression with continuous X variables: Pace of life and heart disease. These data are described on page 260 of Sleuth, problem 14.
Data in EX0914.ASC. Variables: "bank", "walk", "talk", "heart".
- The model statement you will enter in Splus is: heart~bank+walk+talk
- Follow the exercises in the book in addition to the supplement below.
- Make a scatterplot matrix for part (a). (Splus won't let you put heart on the vertical axis, I don't think.) Put all plots on a separate page.
- Additional part (e): Give one sentence interpretations for each regression parameter, using careful language. Holding "bank clerk speed (bank)" and "postal clerk talking speed (talk)" constant, what is the effect of a one-unit (what units are they?) increase in pedestrian walking speed on mean death rate due to heart disease?
Moore and McCabe Exercises 11.1, 11.2, 11.9

Chapter 10:

If you haven't already, work through all research questions/results for the bat data in Case Study 2 of Chapter 10. Computational exercise #13 is good practice. In 13b, report the confidence intervals for slopes of each of the 3 species and write 1-sentence interpretations of each (use careful language). Under the parallel regression lines model, for *each* of the three species, how would you use the computer centering trick to calculate a prediction interval for the median energy expenditure for a future observation of median body mass=200g? How would you use the computer centering trick to calculate a confidence interval for the median energy expenditure for a median body mass of 200g? How do these intervals differ? How do they change as median body mass is increased to 400g?
All conceptual exercises
Computational exercises: #9, 10, 11
Sample quiz (long) from 2001 course
Added 3/20: Use the analyses for the pollen data handed out in class for this problem. Assuming the parallel lines model is true, is there evidence that, after accounting for the amount of time on the flower, queens tend to remove a smaller proportion of pollen than workers? Perform a hypothesis test, giving test statistic and p-value. Give a confidence interval for the difference in the logit of the proportion of pollen removed.

Chapter 12

#10,11,12, Sleuth.

Moore & McCabe Ch. 11, page 731. # 11.5, 11.6, 11.7, 11.13, 11.14, 11.15, 11.17-11.23.

The law of total probability and Bayes theorem:

The percentages of voters classified as liberals in 3 election districts are as follows. In District 1, 21%; District 2, 45%; District 3, 75%. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a liberal? (ans: 0.47)
In a certain city, we have 30% conservative, 50% liberal, 20% independent. In a particular election, 65% of conservatives voted, 82% of liberals voted, and 50% of independents voted. If a person is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a liberal? (ans: 18/59)
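Both answers can be verified with exact fractions. The sketch below applies the law of total probability to the first problem and Bayes' theorem to the second; the variable names are illustrative only.

```python
from fractions import Fraction

# Problem 1: law of total probability.
# Each district is equally likely to be selected.
p_district = Fraction(1, 3)
p_liberal_by_district = [Fraction(21, 100), Fraction(45, 100), Fraction(75, 100)]
p_liberal = sum(p_district * p for p in p_liberal_by_district)

# Problem 2: Bayes' theorem.
prior = {
    "conservative": Fraction(30, 100),
    "liberal": Fraction(50, 100),
    "independent": Fraction(20, 100),
}
p_voted = {
    "conservative": Fraction(65, 100),
    "liberal": Fraction(82, 100),
    "independent": Fraction(50, 100),
}

# P(did not vote), summing over the three groups.
p_not_voted = sum(prior[g] * (1 - p_voted[g]) for g in prior)

# P(liberal | did not vote) = P(liberal and did not vote) / P(did not vote).
p_liberal_given_not_voted = prior["liberal"] * (1 - p_voted["liberal"]) / p_not_voted

print(f"P(liberal) = {p_liberal} = {float(p_liberal):.2f}")
print(f"P(liberal | did not vote) = {p_liberal_given_not_voted}")
```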

Chapter 18

#9 (a)(i), (c)(i)(ii); #11, #12, Sleuth.

Chapter 20

#9, Sleuth.

Exercises through 15.17 of Moore & McCabe's chapter on logistic regression.