Note: As a reminder, we use the mean \(\mu\) and variance \(\sigma^2\) as the arguments for the parameters in the normal distribution. For instance, a random variable X \(\sim\) N(2, 25) would have mean \(\mu\) = 2 and variance \(\sigma^2\) = 25 (such that \(\sigma\), the standard deviation, is 5).
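One practical consequence of this notation: R's normal distribution functions (pnorm, qnorm) take the standard deviation through their sd argument, not the variance. The short sketch below uses the N(2, 25) example from the note; the variable names are only illustrative.

```r
# X ~ N(2, 25): mean 2, variance 25, so the standard deviation is sqrt(25) = 5
mu <- 2
sigma <- sqrt(25)

# P(X < 2): the mean of a normal distribution is also its median
p_below_mean <- pnorm(2, mean = mu, sd = sigma)
p_below_mean
## [1] 0.5

# 97.5th percentile of X (note sd = sigma, not the variance)
q_975 <- qnorm(0.975, mean = mu, sd = sigma)
q_975
```

Passing 25 as sd would silently describe a different distribution, N(2, 625), so it is worth converting the variance explicitly.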
The HW template may be found at
download.file("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/hw-04-template.Rmd", destfile="hw-04.Rmd")
We can calculate probabilities and quantiles of the t distribution in a similar fashion to the same calculations for the normal distribution:
- Use the pt function to calculate left-tail probabilities (i.e., the area under the curve, to the left of the point you choose)
- Use the qt function to determine the value corresponding to a specific quantile of the t distribution
Note: The t distribution only has one parameter, the degrees of freedom.
Example: Suppose we are interested in the left-tailed probability P(T < 1.3), where T is a random variable that follows a t distribution with 38 degrees of freedom. The syntax for the pt() function is pt(value, df).
pt(1.3, 38)
## [1] 0.8992843
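Because pt() returns the area to the left, a right-tail probability such as P(T > 1.3) needs one extra step. The sketch below shows two equivalent ways to get it for the same t distribution with 38 degrees of freedom; the variable names are illustrative.

```r
# Left-tail probability P(T < 1.3) with 38 degrees of freedom
left_tail <- pt(1.3, 38)

# Right-tail probability P(T > 1.3), option 1: complement of the left tail
right_tail_a <- 1 - pt(1.3, 38)

# Option 2: ask pt() for the upper tail directly
right_tail_b <- pt(1.3, 38, lower.tail = FALSE)

right_tail_a
right_tail_b
```

Both approaches give the same value; the two tails always sum to 1.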
Example: Suppose we are interested in the 90th percentile of a t distribution with 64 degrees of freedom. The syntax for the qt() function is qt(probability, df), where the probability is the desired quantile written as a decimal (here, 0.9).
qt(0.9, 64)
## [1] 1.29492
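The qt() and pt() functions are inverses of one another, and qt() is also how a confidence multiplier is usually obtained. The sketch below checks the inverse relationship and then computes the multiplier for a two-sided 95% interval with 64 degrees of freedom; the 95% level is chosen only for illustration.

```r
# 90th percentile of a t distribution with 64 degrees of freedom
t_90 <- qt(0.9, 64)

# Feeding that value back into pt() recovers the probability 0.9
p_back <- pt(t_90, 64)
p_back

# A two-sided 95% interval leaves 2.5% in each tail,
# so the multiplier is the 97.5th percentile
t_star <- qt(0.975, 64)
t_star
```

The t multiplier is slightly larger than the corresponding normal multiplier, qnorm(0.975), reflecting the heavier tails of the t distribution.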
Suppose it is known from NHANES (a representative population-level survey) that systolic blood pressure (SBP) among US adults follows a normal distribution with \(\mu\) = 122 mmHg and \(\sigma\) = 23 mmHg (Wright et al., 2011). The CDC considers systolic blood pressure above 130 to be hypertensive, which may lead to serious cardiovascular morbidities and complications.
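As an illustration of how this setup translates into R, the sketch below computes the probability that a randomly chosen adult has SBP above the 130 mmHg threshold, once directly and once by standardizing first. The variable names are illustrative, and this is one possible approach rather than a required one.

```r
mu <- 122     # population mean SBP (mmHg)
sigma <- 23   # population standard deviation (mmHg), not the variance

# P(SBP > 130), using the upper tail directly
p_hyper <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)

# Same probability after converting 130 to a z-score
z <- (130 - mu) / sigma
p_hyper_z <- pnorm(z, lower.tail = FALSE)

p_hyper
p_hyper_z
```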
Researchers calculated a 90% confidence interval for the mean SBP among a random sample of ten adults in North Carolina and found it to be (114.836, 125.164). In calculating this interval, the researchers assumed that SBP in the underlying population was normally distributed, but did NOT assume that they knew the population standard deviation of SBP.
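Since the population standard deviation is not assumed known, the interval has the form sample mean ± t multiplier × (s / √n) with n − 1 degrees of freedom. The sketch below works backwards from the reported endpoints to the quantities that produced them; the variable names are illustrative.

```r
lower <- 114.836
upper <- 125.164
n <- 10

# The sample mean is the midpoint of the interval
xbar <- (lower + upper) / 2

# The margin of error is half the interval width
margin <- (upper - lower) / 2

# A 90% interval leaves 5% in each tail; df = n - 1 = 9
t_star <- qt(0.95, df = n - 1)

# margin = t_star * se, and se = s / sqrt(n)
se <- margin / t_star
s <- se * sqrt(n)

c(xbar = xbar, margin = margin, t_star = t_star, se = se, s = s)
```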
Excess fat in the liver (steatosis) is a chronic condition which may lead to complications such as cirrhosis or liver failure. It is common among morbidly obese individuals, and may lead to liver damage and/or poor liver function. A common test for liver damage is of the enzyme AST, which is normally found in the blood at low levels. Impaired liver function may lead to elevated levels of AST.
The file steatosis.csv contains data from 135 morbidly obese individuals, including their sex (Male vs. Female), their AST levels (units per liter), and steatosis status (1 = yes, 0 = no). These individuals are representative of morbidly obese individuals in the state of North Carolina. You may read the file into R by using the code
steatosis <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/steatosis.csv")
Hint: This can be done in one pipeline using dplyr functions. First, filter for the observations of interest, then summarize the sample mean with mean() and the sample standard deviation with sd(), and lastly create new variables corresponding to the lower and upper limits of the confidence interval according to the appropriate confidence multiplier and estimated standard error.
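The pipeline described in the hint might look roughly like the sketch below. It runs on a small made-up data frame, not the real file. The column names steatosis and ast, the 95% confidence level, and the choice of subgroup are all assumptions for illustration, so check names(steatosis) and the exercise wording before adapting it.

```r
library(dplyr)

# Toy stand-in data; column names and values are hypothetical
toy <- data.frame(
  steatosis = c(1, 1, 1, 1, 1, 0, 0, 0),
  ast       = c(35, 52, 41, 60, 47, 22, 30, 25)
)

ci <- toy %>%
  filter(steatosis == 1) %>%           # keep the observations of interest
  summarize(
    n    = n(),                        # sample size
    xbar = mean(ast),                  # sample mean
    s    = sd(ast)                     # sample standard deviation
  ) %>%
  mutate(
    se     = s / sqrt(n),              # estimated standard error
    t_star = qt(0.975, df = n - 1),    # multiplier for a 95% interval (assumed level)
    lower  = xbar - t_star * se,
    upper  = xbar + t_star * se
  )

ci
```

Keeping n, the mean, and the standard deviation as columns in the summary makes it easy to check each intermediate quantity before trusting the interval limits.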
high_ast which takes on values of 1 if the patient’s AST is 40 or above, and 0 if the patient’s AST is lower than 40. Construct and interpret a 95% confidence interval for the proportion of steatosis patients who have high AST. In this dataset, 1/38 (or approximately 2.6%) of patients without steatosis have high AST. How does your confidence interval for patients with steatosis compare?
Additional instruction: For Exercise 10, use a normal distribution for the confidence multiplier when calculating the confidence interval.
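A minimal sketch of the mechanics follows, using a handful of made-up AST values, not the real data. It creates the 0/1 indicator, then builds a 95% interval for a proportion with a normal multiplier and the standard error sqrt(p̂(1 − p̂)/n). All variable names are illustrative.

```r
# Toy AST values; NOT the real data
ast <- c(35, 52, 41, 60, 47, 38, 44, 29)

# Indicator: 1 if AST is 40 or above, 0 otherwise
high_ast <- ifelse(ast >= 40, 1, 0)

n <- length(high_ast)
p_hat <- mean(high_ast)                 # sample proportion with high AST

z_star <- qnorm(0.975)                  # normal multiplier for a 95% interval
se <- sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error

ci <- c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
ci
```

With only eight toy observations the normal approximation is not reliable; the example is meant to show the steps, which carry over directly to the filtered steatosis data.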