Lab 09: Logistic Regression

Due: Thu, Apr 15 at 11:59pm ET

Goals

Getting started

This is an individual lab. Go to the course GitHub organization and locate your lab_09 repository, which should be named lab_09-<github name>. Copy the URL of the repository and clone the remote repo in RStudio.

You may work with others in your lab team, but you must submit your own work.

Packages

library(tidyverse)
library(broom)

Data

The 2018 election saw a near-record number of retirements from Congress. In this lab, you will be working with data from that election.

The variables in this dataset are:

Exercises

  1. Fit a logistic regression model with retirement as the response variable. Use three explanatory variables: the percentage of the vote received by the presidential candidate of the party that holds the seat in Congress, the party of the representative currently holding the seat, and the House incumbent’s age.

    Before fitting your model, do some EDA on the side and confirm that none of these variables contain NA values.

  2. What does each of these coefficients mean? Interpret each one in terms of odds ratios.

  3. Create a predicted probability plot showing the effect of presidential vote for both Democrats and Republicans. Hold age at its mean value when creating this plot, and choose a reasonable range of values for the other variables. Comment on what you observe.

  4. The current representative for North Carolina’s 4th district is Democrat David Price, who was a professor of public policy and political science at Duke before being elected to Congress. In 2018, Price was 78 years old, and Clinton had received 70.75% of the vote in his district two years earlier. According to the model, what was the predicted probability that Price would retire in 2018?

  5. Based on your model, did any member of Congress have a greater than 50% chance of retiring in 2018? Which district’s representative was given the highest predicted probability of retirement, and what was that probability? Did they actually retire?
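
The general workflow behind these exercises can be sketched on simulated data. This is only an illustration under stated assumptions: the data frame toy, its column names (pres_vote, party, age, retired), and the coefficients used to simulate it are all hypothetical and are not taken from the lab dataset. The sketch shows a missing-value check, a fit with glm() using the binomial family, odds ratios from tidy(), and predicted probabilities from predict() and augment().

```r
library(tidyverse)
library(broom)

# Simulated stand-in data. Every column name and every simulation
# coefficient below is hypothetical, not from the lab dataset.
set.seed(9)
n <- 400
toy <- tibble(
  pres_vote = runif(n, 30, 80),                        # % presidential vote for the seat's party
  party     = sample(c("D", "R"), n, replace = TRUE),  # party holding the seat
  age       = round(runif(n, 30, 85))                  # incumbent age in years
) %>%
  mutate(
    retired = rbinom(n, 1, plogis(-4 + 0.04 * age - 0.01 * pres_vote + 0.5 * (party == "R")))
  )

# Count missing values in each column before modeling
toy %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Logistic regression: binomial family with its default logit link
fit <- glm(retired ~ pres_vote + party + age, data = toy, family = binomial)

# Coefficients on the log-odds scale, then exponentiated to odds ratios
tidy(fit)
tidy(fit, exponentiate = TRUE)

# Grid for a predicted probability plot: vary pres_vote by party,
# holding age at its sample mean
grid <- expand_grid(
  pres_vote = seq(30, 80, by = 1),
  party     = c("D", "R"),
  age       = mean(toy$age)
)
grid$prob <- predict(fit, newdata = grid, type = "response")

ggplot(grid, aes(x = pres_vote, y = prob, color = party)) +
  geom_line() +
  labs(x = "Presidential vote (%)", y = "Predicted probability of retirement")

# Predicted probability for a single hypothetical incumbent
new_obs <- tibble(pres_vote = 65, party = "D", age = 70)
predict(fit, newdata = new_obs, type = "response")

# Row with the highest fitted probability
augment(fit, type.predict = "response") %>% slice_max(.fitted, n = 1)
```

With exponentiate = TRUE, each estimate is the multiplicative change in the odds of the response for a one-unit increase in that predictor, holding the other predictors fixed. Setting type = "response" in predict() returns probabilities rather than log-odds.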

Sources

The variables in this dataset are a subset of the data compiled by Jacob Smith as research for the book Minority Party Misery. The variables come from a combination of public sources, Daily Kos Elections, Voteview, and the Jacobson-Carson congressional elections dataset.