Fit and interpret logistic regression models using a tidy framework
Compute log odds and predicted probabilities
This is an individual lab. Go to the course GitHub organization and locate your lab_09 repository, which should be named lab_09-<github name>. Copy the URL of the repository and clone the remote repo in RStudio.
You may work with others in your lab team, but you must submit your own work.
The 2018 election saw a near-record number of retirements from Congress. In this lab, you will be working with data from that election.
The variables in this dataset are:
stcd: an ID number for the congressional district. Districts are listed in alphabetical order (e.g., 101 is Alabama’s first district).
dpres: the two-party percentage of the vote received by Hillary Clinton in the district in 2016. Third-party votes are excluded, so this is Democratic vote / (Democratic vote + Republican vote).
retiring: a dummy variable indicating whether the representative retired in 2018. This includes both representatives who left politics altogether and those who sought higher office, as well as some who resigned early.
gopseat: a dummy variable indicating whether the seat was held by a Republican.
age: the age of the representative at the end of 2018.
Fit a logistic regression model with retirement as the response variable. The explanatory variables are the percentage of the vote received in 2016 by the presidential candidate of the party that holds the seat, the party of the representative currently holding the seat, and the House incumbent’s age. The first of these is not in the dataset as given; you will need to construct it from dpres and gopseat.
Before fitting your model, be sure to do some EDA on the side and make sure there are no NA values present.
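As a rough illustration of the steps above, the sketch below checks for NA values, builds the seat-holding party's presidential vote share, and fits the model. It rests on several assumptions: the data frame is simulated (it is not the lab data), the column name pres_share is hypothetical, gopseat is assumed to be coded 1 for Republican-held seats, and dpres is assumed to be on a 0 to 100 scale. It uses base R only so that it runs without extra packages; the same steps can be written with dplyr and broom.

```r
# Simulated stand-in for the lab data (NOT the real 2018 values).
set.seed(2018)
n <- 200
house <- data.frame(
  stcd    = 100 + seq_len(n),
  dpres   = runif(n, min = 20, max = 90),       # Clinton two-party share, in percent
  gopseat = rbinom(n, size = 1, prob = 0.55),   # assumed coding: 1 = Republican-held seat
  age     = round(runif(n, min = 30, max = 85))
)
house$retiring <- rbinom(n, size = 1, prob = 0.15)

# Count missing values in every column before modelling.
na_counts <- colSums(is.na(house))
print(na_counts)

# Vote share of the presidential candidate from the party holding the seat:
# Clinton's share for Democratic seats, 100 minus Clinton's share for Republican seats.
# "pres_share" is a hypothetical name, not a column in the lab data.
house$pres_share <- ifelse(house$gopseat == 1, 100 - house$dpres, house$dpres)

# Logistic regression: binomial family with the default logit link.
fit <- glm(retiring ~ pres_share + gopseat + age,
           data = house, family = binomial)
print(coef(summary(fit)))
```

The coefficients printed here come from random toy data and say nothing about the real election; only the workflow carries over.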
What does each of these coefficients mean? Interpret each one in terms of odds ratios.
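For reference, the model and the link between a coefficient and an odds ratio can be written as follows. The symbols are notation chosen for this sketch: p_i is the probability that representative i retires, and share_i stands for the constructed vote-share variable.

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0 + \beta_1\,\text{share}_i + \beta_2\,\text{gopseat}_i + \beta_3\,\text{age}_i
\qquad\Longrightarrow\qquad
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\beta_j}
```

In words, a one-unit increase in a predictor multiplies the odds of retiring by e^{β_j}, holding the other predictors constant. For the dummy variable gopseat, e^{β_2} compares the odds for a Republican-held seat with those for a Democratic-held seat, assuming it is coded 1 for Republican.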
Create a predicted probability plot showing the effect of presidential vote for both Democrats and Republicans. Set age at its mean value when creating this plot. Choose a reasonable range of values for other variables. Comment on what you observe.
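One possible way to build such a plot is to predict over a grid of vote-share values for each party while holding age at its mean. The sketch below uses simulated data and a hypothetical column name pres_share (the constructed vote-share variable), assumes gopseat is coded 1 for Republican-held seats, and uses base R graphics; the range 30 to 90 is an arbitrary choice for the toy data, not a recommendation for the real data.

```r
# Simulated stand-in data (NOT the lab data); "pres_share" is a hypothetical name.
set.seed(1)
n <- 300
toy <- data.frame(
  pres_share = runif(n, min = 30, max = 90),
  gopseat    = rbinom(n, size = 1, prob = 0.5),
  age        = round(runif(n, min = 30, max = 85))
)
toy$retiring <- rbinom(n, size = 1,
                       prob = plogis(-1 - 0.03 * (toy$pres_share - 50) + 0.5 * toy$gopseat))

fit <- glm(retiring ~ pres_share + gopseat + age, data = toy, family = binomial)

# Grid: every vote-share value for both parties, with age fixed at its mean.
grid <- expand.grid(
  pres_share = seq(30, 90, by = 1),
  gopseat    = c(0, 1),
  age        = mean(toy$age)
)

# type = "response" returns probabilities rather than log odds.
grid$prob <- predict(fit, newdata = grid, type = "response")

dem <- grid[grid$gopseat == 0, ]
gop <- grid[grid$gopseat == 1, ]

plot(dem$pres_share, dem$prob, type = "l", col = "blue",
     ylim = c(0, max(grid$prob)),
     xlab = "Presidential vote share for seat-holding party (%)",
     ylab = "Predicted probability of retiring")
lines(gop$pres_share, gop$prob, col = "red")
legend("topright", legend = c("Democratic seat", "Republican seat"),
       col = c("blue", "red"), lty = 1)
```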
The current representative for North Carolina’s 4th district is Democrat David Price, who was a professor of public policy and political science at Duke before being elected to Congress. In 2018, Price was 78 years old, and Clinton had received 70.75% of the vote in his district two years earlier. According to the model, what was the predicted probability that Price would retire in 2018?
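The arithmetic behind a single predicted probability can be sketched by hand: compute the log odds from the coefficients, then apply the inverse logit. The coefficient values below are made up purely for illustration and are not estimates from the lab data; substitute the estimates from your own fitted model. The sketch also assumes gopseat is coded 0 for a Democratic-held seat and that the vote-share variable is on a 0 to 100 scale.

```r
# HYPOTHETICAL coefficients for illustration only (not estimates from the lab data).
b <- c(intercept = -2, pres_share = -0.02, gopseat = 0.4, age = 0.03)

# Predictor values for Price: intercept term, Clinton share, Democratic seat, age.
x <- c(1, 70.75, 0, 78)

# Log odds: the linear predictor.
log_odds <- sum(b * x)

# Inverse logit: plogis(z) equals exp(z) / (1 + exp(z)).
prob <- plogis(log_odds)

print(log_odds)
print(prob)
```

With a fitted glm object, predict() with a one-row newdata data frame and type = "response" performs the same calculation.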
Based on your model, did any member of Congress have a predicted probability of retiring above 50% in 2018? Which district’s representative had the highest predicted probability of retiring, and what was that probability? Did they actually retire?
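A minimal sketch of how these questions could be answered from a fitted model follows: attach the predicted probabilities to the data, check whether any exceed 0.5, and pull out the row with the largest value. The data are simulated and pres_share is a hypothetical column name, so the output has no bearing on the real answer.

```r
# Simulated stand-in data (NOT the lab data); "pres_share" is a hypothetical name.
set.seed(42)
n <- 250
toy <- data.frame(
  stcd       = 100 + seq_len(n),
  pres_share = runif(n, min = 30, max = 90),
  gopseat    = rbinom(n, size = 1, prob = 0.5),
  age        = round(runif(n, min = 30, max = 85))
)
toy$retiring <- rbinom(n, size = 1, prob = 0.15)

fit <- glm(retiring ~ pres_share + gopseat + age, data = toy, family = binomial)

# fitted() returns the predicted probability for each row used in fitting.
toy$prob <- fitted(fit)

# Did any representative have a predicted probability above 50%?
print(any(toy$prob > 0.5))

# District with the highest predicted probability, and whether that member retired.
top <- toy[which.max(toy$prob), c("stcd", "prob", "retiring")]
print(top)
```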
The variables for this dataset are a subset of data compiled by Jacob Smith for research for the book Minority Party Misery. Variables come from a combination of public sources, Daily Kos Elections, Vote View, and the Jacobson-Carson congressional elections dataset.