Lab 08: Linear Regression

Due: Thu, Apr 1 at 11:59pm ET

Goals

Getting started

Every team member should go to the course GitHub organization and locate their Lab 08 repository, which should be named lab_08-<team name>. Copy the URL of the repository and clone the remote repo in RStudio.

As you work on this lab, merge conflicts may arise. Refer back to Lab 05 for how to resolve them. You and your team are free to divide up the work however you think is best. However, everyone should understand all of the code in the lab’s final submission.

Packages

library(tidyverse)
library(broom)

Data

In this lab, you will work with county-level demographic data from the Midwest. The data are in the midwest dataset, which ships with the ggplot2 package and is available once the tidyverse is loaded. You can learn more about this dataset by typing ?midwest into the console.
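Before starting the exercises, it can help to take a quick look at the data. The lines below are a minimal sketch, not a required step; the column names are the ones listed in ?midwest, and selecting these four is only an assumption about which variables are relevant.

```r
library(tidyverse)

# midwest ships with ggplot2, which library(tidyverse) attaches
glimpse(midwest)

# Columns that appear relevant to the exercises (names as listed in ?midwest)
midwest %>%
  select(county, state, percollege, percbelowpoverty) %>%
  slice_head(n = 5)
```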

Exercises

  1. Do Midwestern counties with a higher percentage of people with a college degree have a lower poverty rate? Using ggplot, make a scatterplot with the percentage of people with a college degree as the explanatory variable and the percentage of the total population below the poverty line as the response variable. Make sure to label your axes and give the plot a title. Discuss what your scatterplot shows and whether a linear relationship is a reasonable assumption.

  2. Fit a linear model with percentage with a college degree as the explanatory variable and poverty rate as the response variable. Please write out the model and interpret both the intercept and slope coefficient for percentage with a college degree.

  3. Assess the model fit by obtaining the \(R^2\). What does this value mean? Is this a high or low value?

  4. Construct and interpret a 95% confidence interval for the coefficient on the percentage with a college degree in the model you fit above.

  5. In Summit County, Ohio, 24.7% of the population has a college degree. What does the model predict the poverty rate will be there? What is the actual poverty rate there? What is the difference between these two values called, and what is its value?

  6. Does the state a county is located in matter in terms of predicting the poverty rate? Fit a model with the poverty rate as the response variable. Carefully consider how you will include state in this model. Interpret your results, discuss statistical significance at the \(\alpha = 0.05\) significance level, and assess the model’s fit.
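The exercises above rely on a common workflow: plot the data, fit a model with lm(), then summarize it with the broom package. The sketch below illustrates that workflow on the built-in mtcars dataset, used here only as a stand-in so that it does not answer the exercises; the variable choices (mpg, wt, cyl) are assumptions made for illustration and have nothing to do with the midwest data.

```r
library(tidyverse)
library(broom)

# Scatterplot with labeled axes and a title (stand-in variables)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel efficiency vs. weight"
  )
p

# Simple linear regression: response ~ explanatory
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table with 95% confidence intervals
tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Model-level summaries, including r.squared
glance(fit)

# Fitted values (.fitted) and residuals (.resid) for each observation
augment(fit) %>%
  select(mpg, wt, .fitted, .resid) %>%
  slice_head(n = 5)

# Prediction for a new value of the explanatory variable
predict(fit, newdata = tibble(wt = 3))

# A categorical predictor can be included as a factor; R then creates
# indicator variables, with the first level serving as the baseline
fit_cat <- lm(mpg ~ factor(cyl), data = mtcars)
tidy(fit_cat)
```

The same function calls should carry over to the midwest data once the response and explanatory variables are swapped in. Note that state is stored as a character column in midwest, and lm() treats character predictors as categorical automatically.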

Submission

Upload your team’s PDF to Gradescope. Include every team member’s name in the Gradescope submission, and mark the pages on which each exercise appears. Associate the “Overall” section with the first page of your PDF.

Include all team members’ names with the team name in the author portion of the YAML header.

There should only be one submission per team on Gradescope.

References

Midwest Demographics. Dataset in ggplot2. https://ggplot2.tidyverse.org/reference/midwest.html