Getting started

Go to the course organization on GitHub: https://github.com/Sta199-S18.
Find the repo whose name starts with lab-05 and ends with your team name (this should be the only lab-05 repo available to you).
In the repo, click on the green Clone or download button and select Use HTTPS (this might already be selected by default, in which case you’ll see the text Clone with HTTPS). Click on the clipboard icon to copy the repo URL.
Go to RStudio Cloud and into the course workspace. Create a New Project from Git Repo. You will need to click on the down arrow next to the New Project button to see this option.
Copy and paste the URL of your assignment repo into the dialog box.
Hit OK, and you’re good to go!

Packages

In this lab we will work with the tidyverse and mosaicData packages, so we need to install and load them:

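# Install the packages (only needed once per project)
install.packages("tidyverse")
install.packages("mosaicData")
# Load the packages (needed in every new R session)
library(tidyverse)
library(mosaicData)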

Note that these packages are also loaded in your R Markdown document.

Housekeeping

Git configuration

Your email address is the address tied to your GitHub account, and your name should be your first and last name.

Go to the Terminal pane.
Type the following two lines of code, replacing the information in the quotation marks with your info.

git config --global user.email "your email"
git config --global user.name "your name"

To confirm that the changes have been implemented, run the following:

git config --global user.email
git config --global user.name

Password caching

If you would like your git password cached for a week for this project, type the following in the Terminal:

git config --global credential.helper 'cache --timeout 604800'
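
The timeout value is given in seconds, and 604800 is the number of seconds in one week. A quick calculation in the R console confirms this (the variable name seconds_per_week is just for illustration):

```r
# Seconds in one week: 7 days x 24 hours x 60 minutes x 60 seconds
seconds_per_week <- 7 * 24 * 60 * 60
seconds_per_week
```

To cache your password for a different length of time, change the number of days in the calculation and use the result as the timeout.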

Project name:

Currently your project is called Untitled Project. Update the name of your project to be “Lab 05 - Simpson’s paradox”.

Warm up

Pick one team member to complete the steps in this section while the others contribute to the discussion but do not actually touch the files on their computer.

Before we introduce the data, let’s warm up with some simple exercises.

YAML:

Open the R Markdown (Rmd) file in your project, change the author name to your team name, and knit the document.

Committing and pushing changes:

Go to the Git pane in RStudio.
View the Diff and confirm that you are happy with the changes.
Add a commit message like “Update team name” in the Commit message box and hit Commit.
Click on Push. This opens a dialog box where you first enter your user name and then your password.

Pulling changes:

Now, the remaining team members, who have not been making these changes in their own projects, should click on the Pull button in their Git pane and confirm that the changes now appear in their projects as well.

The data

The data is in the mosaicData package. You can load it with:

data(Whickham)

Take a peek at the codebook with

?Whickham

or at https://www.rdocumentation.org/packages/mosaicData/versions/0.14.0/topics/Whickham.
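
Alongside the codebook, a quick look at the data frame itself can be helpful. The following is a minimal sketch, assuming the tidyverse and mosaicData packages are installed as described above; glimpse() comes from dplyr, which is part of the tidyverse.

```r
library(dplyr)
library(mosaicData)

data(Whickham)

# Lists each variable with its type and the first few values
glimpse(Whickham)

# Number of rows (observations) and columns (variables)
dim(Whickham)
```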

Exercises

What type of study do you think these data come from: an observational study or an experiment? Why?
How many observations are in this dataset? What does each observation represent?
How many variables are in this dataset? What type of variable is each? Display each variable using an appropriate visualization.
What would you expect the relationship between smoking status and health outcome to be?
Create a visualization depicting the relationship between smoking status and health outcome. Briefly describe the relationship, and evaluate whether this meets your expectations. Additionally, calculate the relevant conditional probabilities to help your narrative. Here is some code to get you started:

Whickham %>%
  count(smoker, outcome)
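
One possible way to turn these counts into conditional probabilities is to group by smoking status and divide each count by its group total. This is a sketch rather than the required approach; the names outcome_by_smoker and prop are our own choices for illustration.

```r
library(dplyr)
library(mosaicData)

data(Whickham)

outcome_by_smoker <- Whickham %>%
  count(smoker, outcome) %>%
  group_by(smoker) %>%
  # P(outcome | smoker): share of each outcome within a smoking group
  mutate(prop = n / sum(n)) %>%
  ungroup()

outcome_by_smoker
```

Grouping by outcome instead of smoker would give the probabilities conditioned the other way around, so it is worth deciding which one your narrative needs.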

Create a new variable called age_cat using the following scheme:

age <= 44 ~ "18-44"
age > 44 & age <= 64 ~ "45-64"
age > 64 ~ "65+"
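
In dplyr, a scheme like this maps directly onto case_when() inside mutate(). The following is a minimal sketch, assuming you want to overwrite Whickham so that the new column is available in later steps:

```r
library(dplyr)
library(mosaicData)

data(Whickham)

Whickham <- Whickham %>%
  mutate(
    # Conditions are checked in order; the first one that is TRUE sets the value
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

# Tabulate the new variable to see how many observations fall in each category
Whickham %>%
  count(age_cat)
```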

Re-create the visualization depicting the relationship between smoking status and health outcome, faceted by age_cat. What changed? What might explain this change? Extend the contingency table from earlier by breaking it down by age category and use it to help your narrative.

Whickham %>%
  count(smoker, age_cat, outcome)
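
One possible version of the faceted plot is a bar chart of proportions with one panel per age category. This is a sketch, not the only reasonable visualization. It recreates age_cat so that it runs on its own, and the object names whickham_age and p and the axis labels are our own choices.

```r
library(dplyr)
library(ggplot2)
library(mosaicData)

data(Whickham)

# Add the age categories using the scheme given in the exercise
whickham_age <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

p <- ggplot(whickham_age, aes(x = smoker, fill = outcome)) +
  # position = "fill" rescales each bar so its height shows proportions
  geom_bar(position = "fill") +
  # One panel per age category
  facet_wrap(~ age_cat) +
  labs(x = "Smoker", y = "Proportion", fill = "Outcome")

p
```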

Lab 05 - Simpson’s paradox

2018-02-15

Introduction