The goal of this exercise is to walk through a logistic regression analysis. It will give you a basic idea of the analysis steps and thought-process; however, due to class time constraints, this analysis is not exhaustive.
library(tidyverse)
library(broom)
library(rms)
## add any other packages as needed
This data is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The goal is to predict whether a patient has a 10-year risk of future coronary heart disease. The dataset includes the following:
male
: 0 = Female; 1 = Maleage
: Age at exam time.education
: 1 = Some High School; 2 = High School or GED; 3 = Some College or Vocational School; 4 = CollegecurrentSmoker
: 0 = nonsmoker; 1 = smokercigsPerDay
: number of cigarettes smoked per day (estimated average)BPMeds
: 0 = Not on Blood Pressure medications; 1 = Is on Blood Pressure medicationsprevalentStroke
prevalentHyp
diabetes
: 0 = No; 1 = YestotChol
: total cholesterol (mg/dL)sysBP
: systolic blood pressure (mmHg)diaBP
: diastolic blood pressure (mmHg)BMI
: BodyMass Index calculated as: Weight (kg) / Height(meter-squared)heartRate
Beats/Min (Ventricular)glucose
: total glucose mg/dLTenYearCHD
: 0 = Patient doesnโt have 10-year risk of future coronary heart disease; 1 = Patient has 10-year risk of future coronary heart disease;fram_data <- read_csv("data/framingham.csv") %>%
drop_na() %>%
mutate(education = case_when(
education == 1 ~ "Some HS",
education == 2 ~ "HS or GED",
education == 3 ~ "Some College",
education == 4 ~ "College"
),
currentSmoker = if_else(currentSmoker == 0, "nonsmoker", "smoker"),
diabetes = if_else(diabetes == 0,"No", "Yes"),
male = factor(male)
)
Fit a full model (main effects only) with TenYearCHD
as the response. Display the model output.
Based on the goal of the analysis, should the full model be the final model? Why or why not?
step
function to conduct backward model selection. What is selection criteria used by the step
function?
male
.
Use the results from model selection and the drop-in-deviance test to select a final model. Display the model below.
Plot and analyze the binned residuals for the final model. Include all appropriate plots. What is your assessment on the model fit based on these plots?
Plot and analyze the ROC curve. Based on the ROC curve, does the model fit the data well?
A doctor plans to use the results from your model to help select patients for a new heart disease prevention program. She asks you which threshold would be best to select patients for this program. What threshold would you recommend to the doctor? Why?