The Quarto template for this assignment may be found in the
repository at the following link: https://classroom.github.com/a/eN5J9VbD
Sleep apnea is a common sleep disorder characterized by repeated
episodes of shallow or paused breathing during sleep. In severe cases,
this disruption may lead to a reduction in blood oxygen saturation,
which can be serious. A quantity that physicians care about is whether
oxygen saturation is below 90%. The “sleeptime” variable measures the
percentage of time spent sleeping during which this was the case.
Turan et al. (Anesthesiology, 2015) conducted a study
examining physiological characteristics among patients who underwent
bariatric (weight-loss) surgery and who had results available from a
comprehensive sleep study at a major medical center.
The variables are as follows:
- age: age in years
- female: whether the patient was female
- race: the self-reported race of the patient
- bmi: BMI (yes, these are accurate)
- sleeptime: the percentage of time spent under 90% arterial oxygen
  saturation
- mino2: the minimum recorded arterial oxygen saturation (in %)
- ahi: apnea/hypopnea index, which counts the number of apnea episodes
  during the night
- smoking: whether the patient smokes
- diabetes: whether the patient has diabetes
- ht: whether the patient has hypertension
- cad: whether the patient has coronary artery disease
- cpap: whether the patient regularly uses a CPAP (continuous positive
  airway pressure) machine
The overall goal is to predict sleeptime (the percentage of
time spent under 90% arterial oxygen saturation) using the other
variables in the dataset.
- Consider all predictors available in the dataset. Use all-subset
selection utilizing \(C_p\) as the
selection criterion to arrive at a “best” model. What variables ended up
being included in your model? Do not consider any transformations or
interaction terms in this analysis.
- Consider all predictors available in the dataset. Use stepwise
selection utilizing \(AIC\) as the
selection criterion to arrive at a “best” model (include results for
forward selection, backward elimination, and stepwise in both
directions). What variables ended up being included in your model? Do
not consider any transformations or interaction terms in this
analysis.
- Consider all predictors available in the dataset. Use LASSO to
arrive at a “best” model, utilizing the lambda that gives the lowest
mean MSE from a 10-fold cross-validation procedure aimed at evaluating
lambda. What variables ended up being included in your model? Do not
consider any transformations or interaction terms in this analysis.
- Choose a categorical predictor and a continuous predictor that were
present in all models above (hint: there should be at least one
of each). Interpret the coefficient estimates for both of them as
provided by your “final” model as determined by backward
elimination and as provided by LASSO. Provide
two reasons why the estimates from the two models are
different.
- Consider the model that uses all predictors (no transformations or
interactions). What is the average root mean square error in
test sets from a 10-fold cross-validation procedure? Compare this to the
same quantity from the model chosen by backward
elimination (using AIC). Which is higher? At the beginning of
the R chunk used to answer this question, set a random seed of 919 as
follows:
set.seed(919)
- Suppose a collaborator was interested in the relationship between
BMI and the expected percentage of time spent under 90% arterial
oxygen saturation and wanted to adjust for potential confounders. After
reading Lin, Davidson, and Ancoli-Israel (2008), which notes that
“[a] number of population based studies have shown that obstructive
sleep apnea is more common in men than in women and this discrepancy is
often evident in the clinical setting” and explores
“pathophysiological differences to suggest why men are more prone to
[sleep apnea] than women,” they particularly want to explore such
relationships adjusting for sex. Explain what implications using any of
your models from Exercises 1 - 3 might have scientifically in light of
this proposed analysis.
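For the all-subset selection exercise above, the following is a minimal sketch of how Mallows' \(C_p\) can be computed by brute force in base R. It runs on simulated stand-in data, not the assignment dataset: the data frame `dat` and the names `y`, `x1` through `x4` are hypothetical. In practice a dedicated tool such as `regsubsets()` from the `leaps` package is commonly used instead of an explicit loop.

```r
# Simulated stand-in data; the names y and x1..x4 are hypothetical
# and are not the variables from the assignment dataset.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)

predictors <- setdiff(names(dat), "y")

# Error variance estimated from the full model, as used in Mallows' C_p.
full_fit <- lm(y ~ ., data = dat)
sigma2_full <- sum(resid(full_fit)^2) / df.residual(full_fit)

# Enumerate every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# C_p = RSS / sigma2_full - n + 2p, where p counts all coefficients,
# including the intercept.
cp_values <- sapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = dat)
  p <- length(coef(fit))
  sum(resid(fit)^2) / sigma2_full - n + 2 * p
})

best_subset <- subsets[[which.min(cp_values)]]
print(best_subset)
print(min(cp_values))
```

The enumeration grows as \(2^k\) in the number of predictors, so this explicit loop is only practical for a modest number of candidate variables.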
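For the stepwise exercise above, the three search directions can be sketched with `step()` from base R, which uses AIC by default (`k = 2`). This is an illustration on simulated data under stated assumptions: the data frame `dat`, the names `y`, `x1`, `x2`, `x3`, `grp`, and the helper `selected_terms()` are all hypothetical.

```r
# Simulated stand-in data with one categorical predictor; all names
# here are hypothetical and not taken from the assignment dataset.
set.seed(2)
n <- 200
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  grp = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x3 + 1 * (dat$grp == "b") + rnorm(n)

predictors <- setdiff(names(dat), "y")
null_fit <- lm(y ~ 1, data = dat)   # intercept only
full_fit <- lm(y ~ ., data = dat)   # all predictors
upper_scope <- reformulate(predictors)  # one-sided formula: ~ x1 + x2 + ...

# Forward selection starts from the null model and adds terms.
forward_fit <- step(null_fit,
                    scope = list(lower = ~ 1, upper = upper_scope),
                    direction = "forward", trace = 0)

# Backward elimination starts from the full model and drops terms.
backward_fit <- step(full_fit, direction = "backward", trace = 0)

# Stepwise in both directions may add or drop a term at each step.
both_fit <- step(null_fit,
                 scope = list(lower = ~ 1, upper = upper_scope),
                 direction = "both", trace = 0)

# Hypothetical helper: list the terms retained by a fitted model.
selected_terms <- function(fit) attr(terms(fit), "term.labels")

print(selected_terms(forward_fit))
print(selected_terms(backward_fit))
print(selected_terms(both_fit))
```

The three searches need not agree, since each follows a different greedy path through the model space.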
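For the LASSO exercise above, one possible approach is `cv.glmnet()` from the `glmnet` package, which is assumed to be installed. The sketch below uses simulated data, and the data frame `dat` and its column names are hypothetical. `alpha = 1` requests the LASSO penalty, and `lambda.min` is the lambda with the lowest mean cross-validated error.

```r
library(glmnet)  # assumed to be installed; not part of base R

# Simulated stand-in data; all names are hypothetical.
set.seed(3)
n <- 200
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n),
  grp = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)

# glmnet needs a numeric matrix; model.matrix() dummy-codes factors.
# The first column (the intercept) is dropped because glmnet adds its own.
x <- model.matrix(y ~ ., data = dat)[, -1]
y <- dat$y

# 10-fold cross-validation over a grid of lambda values.
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# Coefficients at the lambda with the lowest mean cross-validated MSE.
lasso_coefs <- as.matrix(coef(cv_fit, s = "lambda.min"))[, 1]
selected <- setdiff(names(lasso_coefs)[lasso_coefs != 0], "(Intercept)")

print(cv_fit$lambda.min)
print(selected)
```

Because the fold assignment is random, the chosen lambda and the selected set can change between runs unless a seed is fixed before calling `cv.glmnet()`.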
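For the cross-validation exercise above, the average test-set RMSE over 10 folds can be sketched in base R as follows. The data are simulated, and the data frame `dat`, its column names, the helper `cv_rmse()`, and the reduced formula are hypothetical; the reduced formula merely stands in for whatever model backward elimination selects.

```r
set.seed(919)

# Simulated stand-in data; all names are hypothetical.
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)

# Randomly assign each row to one of 10 folds of equal size.
k <- 10
fold_id <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: fit on k - 1 folds, compute RMSE on the held-out
# fold, and average the k test-set RMSE values.
cv_rmse <- function(model_formula, data, fold_id) {
  response <- all.vars(model_formula)[1]
  rmse_by_fold <- sapply(sort(unique(fold_id)), function(f) {
    train <- data[fold_id != f, ]
    test <- data[fold_id == f, ]
    fit <- lm(model_formula, data = train)
    sqrt(mean((test[[response]] - predict(fit, newdata = test))^2))
  })
  mean(rmse_by_fold)
}

full_rmse <- cv_rmse(y ~ x1 + x2 + x3 + x4, dat, fold_id)
reduced_rmse <- cv_rmse(y ~ x1 + x3, dat, fold_id)

print(c(full = full_rmse, reduced = reduced_rmse))
```

Using the same `fold_id` for both models keeps the comparison paired, so any difference in RMSE reflects the models and not the fold assignment.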