Lab 09: Putting it All Together

Due Wed, Apr 10 at 11:59 PM

In this lab, you will put together everything you’ve learned thus far. Unlike previous lab assignments, your lab write-up will take the form of a short report rather than numbered exercises. Though this analysis will not be as in-depth as the one in your final project, this assignment will give your group practice organizing the results of a statistical analysis into a complete narrative.

You will also practice imputing missing data and using k-fold cross-validation to assess your model’s performance on test data.
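As a rough illustration of these two ideas, the sketch below uses simulated data rather than the lab data. The data frame `toy`, its columns, the choice of mean imputation, and k = 5 are all assumptions made for this example; they are not the required approach for the lab.

```r
# Simulated data (hypothetical; not the lab data)
set.seed(123)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 + 3 * toy$x1 - 1 * toy$x2 + rnorm(n)

# Introduce some missing values in x2
toy$x2[sample(n, 10)] <- NA

# Mean imputation: replace each missing x2 with the mean of the observed values
toy$x2_imp <- ifelse(is.na(toy$x2), mean(toy$x2, na.rm = TRUE), toy$x2)

# Randomly assign each row to one of k folds of equal size
k <- 5
toy$fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k - 1 folds, predict on the held-out fold
rmse <- numeric(k)
for (i in 1:k) {
  train <- toy[toy$fold != i, ]
  test  <- toy[toy$fold == i, ]
  fit   <- lm(y ~ x1 + x2_imp, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$y - pred)^2))
}

# Average test error across the k folds
mean(rmse)
```

Here the imputed value is computed once from all rows for simplicity. A more careful version would compute it from the training folds only, so that no information from the held-out fold leaks into the fit.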

Getting Started

When configuring Git, be sure to use the email address that is associated with your GitHub account.

library(usethis)
use_git_config(user.name="your name", user.email="your email")

Password caching

If you would like your git password cached for a week for this project, type the following in the Terminal:

git config --global credential.helper 'cache --timeout 604800'

You will need to enter your GitHub username and password one more time after caching the password. After that you won’t need to enter your credentials for 604800 seconds = 7 days.

Packages

You will need the following packages for today’s lab:

library(tidyverse)
library(dslabs)
# Fill in other packages as needed

Project name:

Currently your project is called Untitled Project. Update the name of your project to the title of today’s lab.

Warm up

Before we introduce the data, let’s warm up with a simple exercise.

YAML:

Data

The data for this lab is the gapminder dataset in the dslabs package. This dataset contains health and income data for 184 countries during the years 1960 to 2016. After loading the dslabs package, you can type ?gapminder in the console to see the variables in the dataset.

You will only use data from 2011 in this lab.
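One possible way to restrict the data to 2011 and check where values are missing is sketched below. The object name `gapminder_2011` is chosen here for illustration; the `year` column comes from the gapminder dataset in dslabs.

```r
library(tidyverse)
library(dslabs)

# Keep only the 2011 observations (gapminder_2011 is a name chosen for this sketch)
gapminder_2011 <- gapminder %>%
  filter(year == 2011)

# Count missing values in each column to see where imputation may be needed
gapminder_2011 %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
```

The missing-value counts give a starting point for deciding which variables, if any, need to be imputed before modeling.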

Exercises

The goal of this analysis is to build a regression model that could be used to predict a country’s gross domestic product (gdp) using the other characteristics included in the data.
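A minimal sketch of fitting such a model is shown below. The predictors used here (`life_expectancy`, `fertility`, `population`) are an arbitrary choice for illustration only, and the object names `gapminder_2011` and `gdp_model` are assumptions of this example; your own model should come out of your exploratory analysis.

```r
library(tidyverse)
library(dslabs)

# Restrict to 2011, as required for this lab
gapminder_2011 <- gapminder %>%
  filter(year == 2011)

# Illustrative model only: the predictors are an assumption of this sketch.
# By default, lm() drops rows with missing values in any variable it uses.
gdp_model <- lm(gdp ~ life_expectancy + fertility + population,
                data = gapminder_2011)

# Coefficient estimates, standard errors, and overall fit statistics
summary(gdp_model)
```

Because `lm()` silently drops incomplete rows, comparing `nobs(gdp_model)` with `nrow(gapminder_2011)` shows how many countries were excluded for missing data.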

Introduction

Brief introduction of the data and the research question

Exploratory Data Analysis

At a minimum, your exploratory data analysis should include the following:

Regression Model

At a minimum, the discussion for the final regression model should include the following:

Assumptions

At a minimum, the discussion of model assumptions should include the following:

Model Validation

At a minimum, the discussion of the model validation should include the following:

Conclusion

Brief summary of the conclusions drawn from the analysis.