Note: As a reminder, we use the mean \(\mu\) and variance \(\sigma^2\) as the parameters of the normal distribution. For instance, a random variable X \(\sim\) N(2, 25) would have mean \(\mu\) = 2 and variance \(\sigma^2\) = 25 (such that \(\sigma\), the standard deviation, is 5).
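As a small illustration of the note above: R's normal-distribution functions pnorm() and qnorm() (not named in this handout, but part of base R) take the standard deviation as their `sd` argument, not the variance. A minimal sketch, using the N(2, 25) example from the note and cutoff values chosen only for illustration:

```r
# X ~ N(2, 25): mean 2, variance 25, so the standard deviation is sqrt(25) = 5
mu     <- 2
sigma2 <- 25
sigma  <- sqrt(sigma2)

# P(X < 7): 7 lies exactly one standard deviation above the mean.
# pnorm() expects the standard deviation, so pass sigma, not sigma2.
p_left <- pnorm(7, mean = mu, sd = sigma)
p_left
## [1] 0.8413447

# The 50th percentile (median) of a normal distribution equals its mean
med <- qnorm(0.5, mean = mu, sd = sigma)
med
## [1] 2
```

Passing 25 as `sd` would silently describe a much wider distribution, so it is worth converting the variance to a standard deviation explicitly.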

The HW template may be downloaded by running the following in R:

download.file("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/hw-04-template.Rmd", destfile="hw-04.Rmd")

The t distribution

We can calculate probabilities and quantiles of the t distribution in much the same way as for the normal distribution:

  - Use the pt function to calculate left-tail probabilities (i.e., the area under the curve to the left of the point you choose)
  - Use the qt function to determine the value corresponding to a specific quantile of the t distribution

Note: The t distribution has only one parameter, the degrees of freedom.

Example: Suppose we are interested in the left-tailed probability P(T < 1.3), where T is a random variable that follows a t distribution with 38 degrees of freedom. The syntax for the pt() function is pt(value, df).

pt(1.3, 38)
## [1] 0.8992843
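Because pt() returns a left-tail probability, a right-tail probability such as P(T > 1.3) needs one extra step. The sketch below shows three equivalent routes, reusing the same point and degrees of freedom as the example above; the variable names are only illustrative.

```r
df <- 38

# Left-tail probability P(T < 1.3), as in the example above
left <- pt(1.3, df)

# Route 1: complement rule, P(T > 1.3) = 1 - P(T < 1.3)
right_complement <- 1 - left

# Route 2: ask pt() for the upper tail directly
right_direct <- pt(1.3, df, lower.tail = FALSE)

# Route 3: symmetry of the t distribution about 0, P(T > 1.3) = P(T < -1.3)
right_symmetry <- pt(-1.3, df)

c(right_complement, right_direct, right_symmetry)
```

All three entries of the printed vector should agree, at roughly 0.10.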

Example: Suppose we are interested in the 90th percentile of a t distribution with 64 degrees of freedom. The syntax for the qt() function is qt(value, df), where value is the left-tail probability of the desired quantile (here, 0.9).

qt(0.9, 64)
## [1] 1.29492
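The two functions are inverses of one another: qt() maps a left-tail probability to a point, and pt() maps that point back to the probability. The sketch below checks this for the example above, and also shows how a cutoff for a central (two-sided) region might be found by splitting the leftover probability equally between the two tails. The 90% central region is an arbitrary choice for illustration.

```r
df <- 64

# 90th percentile, as in the example above
q90 <- qt(0.9, df)

# Feeding the quantile back into pt() recovers the original probability
p_back <- pt(q90, df)

# Cutoff for a central 90% region: leave 5% in each tail,
# so the upper cutoff is the 95th percentile
alpha  <- 0.10
t_star <- qt(1 - alpha / 2, df)

# Area between -t_star and t_star
central <- pt(t_star, df) - pt(-t_star, df)

c(p_back = p_back, central = central)
```

Both entries should equal 0.9. Note that t_star is larger than q90, because the two-sided cutoff leaves only 5% (not 10%) in the upper tail.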

Suppose it is known from NHANES (a representative population-level survey) that systolic blood pressure (SBP) among US adults follows a normal distribution with \(\mu\) = 122 mmHg and \(\sigma\) = 23 mmHg (Wright et al., 2011). The CDC considers systolic blood pressure above 130 mmHg to be hypertensive, which may lead to serious cardiovascular morbidities and complications.

  1. (8 points) Calculate the following probabilities.
  2. (6 points) Parts B-D of Exercise 1 ask about the probability that average values from some sample are 8 mmHg higher than the population values. Are these probabilities the same or different? Explain why this is the case.
  3. (12 points) Suppose we took a random sample of 10 Duke students and found that the sample mean SBP was 112 mmHg. You may suppose that SBP among Duke students is normally distributed and that \(\sigma\) is known to be 14 mmHg. Construct and interpret a 99% two-sided confidence interval for the true mean SBP among Duke students.
  4. (12 points) When calculating the confidence interval for a population mean under the following conditions, does the margin of error get wider, get narrower, stay the same width, or can we not know for sure?
  5. (10 points) Suppose we calculate a 95% confidence interval for the mean SBP using a sample of size n = 40. Suppose we also calculate a confidence interval for the mean SBP using a sample of size n = 10. For what confidence levels would the interval using n = 10 be narrower than the one using n = 40?

Researchers calculated a 90% confidence interval for the mean SBP among a random sample of ten adults in North Carolina and found it to be (114.836, 125.164). In calculating this interval, the researchers assumed that the underlying distribution was normal, but did NOT assume that they knew the population standard deviation of SBP.

  6. (5 points) Interpret this confidence interval.
  7. (5 points) What were the sample mean and sample standard deviation in their sample?
  8. (12 points) Calculate and interpret a one-sided 95% confidence interval using their data that bounds the mean from below. That is, your confidence interval should take on the form ([lower limit], \(\infty\)).

Excess fat in the liver (steatosis) is a chronic condition which may lead to complications such as cirrhosis or liver failure. It is common among morbidly obese individuals, and may lead to liver damage and/or poor liver function. A common test for liver damage is of the enzyme AST, which is normally found in the blood at low levels. Impaired liver function may lead to elevated levels of AST.

The file steatosis.csv contains data from 135 morbidly obese individuals and contains information on their sex (Male vs. Female), their AST levels (units per liter), and steatosis status (1 = yes, 0 = no). These individuals are representative of morbidly obese individuals in the state of North Carolina. You may read the file into R by using the code

steatosis <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/steatosis.csv")

  9. (14 points) Construct and interpret a 95% confidence interval for the mean AST level among patients with steatosis.

Hint: This can be done in one pipeline using dplyr functions. First, filter for the observations of interest, then summarize the sample mean mean() and sample standard deviation sd(), and lastly create new variables corresponding to the lower and upper limits of the confidence interval according to the appropriate confidence multiplier and estimated standard error.
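The shape of the pipeline described in the hint above can be sketched on made-up data. Everything in this block is an assumption for illustration: the data frame `toy`, its columns `group` and `y`, and the choice of a t multiplier with n - 1 degrees of freedom are hypothetical and are not taken from steatosis.csv. Choose the filter condition, variable names, and confidence multiplier appropriate to your own setting.

```r
library(dplyr)

# Made-up toy data (NOT the steatosis data); column names are hypothetical
toy <- data.frame(
  group = c("a", "a", "a", "a", "b", "b"),
  y     = c(10, 12, 14, 16, 50, 60)
)

conf_level <- 0.95

toy_ci <- toy %>%
  # Step 1: keep only the observations of interest
  filter(group == "a") %>%
  # Step 2: summarize the sample size, sample mean, and sample sd
  summarize(n = n(), xbar = mean(y), s = sd(y)) %>%
  # Step 3: build the limits from a multiplier and the estimated standard error
  mutate(
    se     = s / sqrt(n),
    t_star = qt(1 - (1 - conf_level) / 2, df = n - 1),  # assumed t multiplier
    lower  = xbar - t_star * se,
    upper  = xbar + t_star * se
  )

toy_ci
```

The result is a one-row data frame in which `lower` and `upper` sit symmetrically around `xbar`.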

  10. (16 points) Create a binary variable high_ast which takes on values of 1 if the patient’s AST is 40 or above, and 0 if the patient’s AST is lower than 40. Construct and interpret a 95% confidence interval for the proportion of steatosis patients who have high AST. In this dataset, 1/38 (or approximately 2.6%) of patients without steatosis have high AST. How does your confidence interval for patients with steatosis compare?

Additional instruction: For Exercise 10, use a normal distribution to calculate the confidence multiplier in your confidence interval.
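For reference, a normal-based confidence multiplier can be obtained with the base R function qnorm() (not named in this handout). This is a minimal sketch for a two-sided 95% level; the comparison with a t multiplier uses 9 degrees of freedom purely as an arbitrary illustration.

```r
conf_level <- 0.95

# Two-sided multiplier from the standard normal: leave 2.5% in each tail
z_star <- qnorm(1 - (1 - conf_level) / 2)
z_star
## [1] 1.959964

# For comparison, the t multiplier at the same level is larger,
# since the t distribution has heavier tails (df = 9 chosen arbitrarily)
t_star <- qt(1 - (1 - conf_level) / 2, df = 9)
t_star > z_star
## [1] TRUE
```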