Homework 05: Logistic regression

Due: Wednesday, April 21 at 11:59pm ET

Goals

General guidelines

Getting started

Packages

library(tidyverse)

Exercises

In this assignment, you will work with a dataset containing information on individuals from the Donner party. The Donner party was a group of pioneers traveling by wagon train from Missouri to California on the Oregon Trail. They were trapped in the Sierra Nevada mountains by extremely heavy snowfall during the winter of 1846-1847 and eventually ran out of food supplies. Of the 90 members of the party, only 48 survived. We will use logistic regression to model the probability of survival based on age and sex. The relevant data is contained in donner.csv.
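As a starting point, the data can be read in and inspected before beginning the exercises. The following is a minimal sketch, assuming donner.csv is in the working directory of your project; the object name `donner` is an arbitrary choice, and the column names are not stated in this assignment, so check them before using them in later code.

```r
library(tidyverse)

# Read the data; assumes donner.csv sits in the project's working directory.
donner <- read_csv("donner.csv")

# List each column with its type and first few values, so you can confirm
# how survival, age, and sex are named and coded before modeling.
glimpse(donner)
```

Checking how the survival and sex variables are coded (for example, as text labels or as 0/1 indicators) matters for the interpretations in the later exercises.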

  1. What is the relationship between sex and survival? Effectively visualize the relationship and summarize what you observe in a brief sentence.

  2. What is the relationship between age and survival? Effectively visualize the relationship and summarize what you observe in a brief sentence.

  3. Fit a logistic regression model to predict survival based on sex and age. You do not need to include an interaction. Report the model output in tidy format.

  4. Write out the logistic regression model.

  5. Provide an interpretation of \(e^{\hat{\beta}_0}\) in the context of the problem.

  6. Provide an interpretation of \(e^{\hat{\beta}_\text{age}}\) in the context of the problem.

  7. Provide an interpretation of \(e^{\hat{\beta}_\text{sex}}\) in the context of the problem.

  8. What is the predicted probability of survival for a 60-year-old man? For a 20-year-old man? For a female newborn?

  9. Create a predicted probability plot showing the effect of age and sex on survival. Comment on what you observe.

  10. How young or old must a female member of the Donner party be in order to have a predicted probability of survival greater than 0.75 based on your logistic regression model? Use algebra (not code) to answer.

  11. What are some limitations of your model given the data? Answer in a brief paragraph.
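For reference when working through Exercises 4 through 10, the general form of a logistic regression model with these two predictors can be written as below. This is a sketch under assumptions, not a solution: the symbol \(\hat{p}\) for the predicted probability of survival and the treatment of sex as a 0/1 indicator are not specified in the assignment, and no estimates are filled in.

```latex
% Assumptions: \hat{p} is the predicted probability of survival, and
% "sex" is a 0/1 indicator whose reference level depends on your data.
\begin{align*}
\log\left(\frac{\hat{p}}{1 - \hat{p}}\right)
  &= \hat{\beta}_0 + \hat{\beta}_\text{age}\,\text{age} + \hat{\beta}_\text{sex}\,\text{sex} \\[4pt]
\hat{p}
  &= \frac{e^{\hat{\beta}_0 + \hat{\beta}_\text{age}\,\text{age} + \hat{\beta}_\text{sex}\,\text{sex}}}
          {1 + e^{\hat{\beta}_0 + \hat{\beta}_\text{age}\,\text{age} + \hat{\beta}_\text{sex}\,\text{sex}}} \\[4pt]
\hat{p} > 0.75
  &\iff \log\left(\frac{\hat{p}}{1 - \hat{p}}\right) > \log\left(\frac{0.75}{0.25}\right) = \log 3
\end{align*}
```

The first line is the model on the log-odds scale, the second converts log-odds to a probability, and the third restates a probability threshold as a threshold on the log-odds. When solving the last inequality for age, remember that dividing both sides by \(\hat{\beta}_\text{age}\) reverses the inequality if that estimate is negative.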

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark the pages where the answer to each exercise is located. If an answer spans multiple pages, mark all of them. Associate the “Overall” section with the first page.