Due: 2018-02-22 at noon
A study conducted in Whickham, England recorded participants’ age and smoking status at baseline, and then recorded their health outcome 20 years later.
Go to the course organization on GitHub: https://github.com/Sta199-S18.
Find the repo whose name starts with lab-05 and ends with your team name (this should be the only lab-05 repo available to you).
In the repo, click on the green Clone or download button and select Use HTTPS (this might already be selected by default, in which case you’ll see the text Clone with HTTPS). Click on the clipboard icon to copy the repo URL.
Go to RStudio Cloud and into the course workspace. Create a New Project from Git Repo. You will need to click on the down arrow next to the New Project button to see this option.
Copy and paste the URL of your assignment repo into the dialog box.
Hit OK, and you’re good to go!
In this lab we will work with the tidyverse and mosaicData packages, so we need to install and load them:
install.packages("tidyverse")
install.packages("mosaicData")
library(tidyverse)
library(mosaicData)
Note that these packages are also loaded in your R Markdown document.
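Configure Git by typing the following two lines in the Terminal, replacing the text in quotation marks with your own information. Your email address should be the one tied to your GitHub account, and your name should be your first and last name.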
git config --global user.email "your email"
git config --global user.name "your name"
To confirm that the changes have been implemented, run the following:
git config --global user.email
git config --global user.name
If you would like your git password cached for a week for this project, type the following in the Terminal:
git config --global credential.helper 'cache --timeout 604800'
Currently your project is called Untitled Project. Update the name of your project to be “Lab 05 - Simpson’s paradox”.
Pick one team member to complete the steps in this section while the others contribute to the discussion but do not actually touch the files on their computer.
Before we introduce the data, let’s warm up with some simple exercises.
Open the R Markdown (Rmd) file in your project, change the author name to your team name, and knit the document.
Now, the remaining team members who have not been concurrently making these changes on their projects should click on the Pull button in their Git pane and observe that the changes are now reflected on their projects as well.
The data is in the mosaicData package. You can load it with
data(Whickham)
Take a peek at the codebook with
?Whickham
or at https://www.rdocumentation.org/packages/mosaicData/versions/0.14.0/topics/Whickham.
What type of study do you think these data come from: an observational study or an experiment? Why?
How many observations are in this dataset? What does each observation represent?
How many variables are in this dataset? What type of variable is each? Display each variable using an appropriate visualization.
What would you expect the relationship between smoking status and health outcome to be?
Create a visualization depicting the relationship between smoking status and health outcome. Briefly describe the relationship, and evaluate whether this meets your expectations. Additionally, calculate the relevant conditional probabilities to help your narrative. Here is some code to get you started:
Whickham %>%
count(smoker, outcome)
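One possible way to turn these counts into conditional probabilities is sketched below. It assumes you want the distribution of outcome within each smoking status, that is P(outcome | smoker); the object name outcome_by_smoker and the column name prop are illustrative choices, not part of the lab.

```r
library(tidyverse)
library(mosaicData)

# Start from the contingency table of counts, then divide each count
# by the total for its smoking-status group to get P(outcome | smoker).
# `outcome_by_smoker` and `prop` are hypothetical names chosen for this sketch.
outcome_by_smoker <- Whickham %>%
  count(smoker, outcome) %>%
  group_by(smoker) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

outcome_by_smoker
```

Within each level of smoker, the prop column should sum to 1. Grouping by outcome instead would give P(smoker | outcome), which answers a different question.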
Create a new variable called age_cat using the following scheme:
age <= 44 ~ "18-44"
age > 44 & age <= 64 ~ "45-64"
age > 64 ~ "65+"
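The scheme above is written in the condition ~ value form used by dplyr's case_when(), so one way to build age_cat is to place those three lines inside mutate(). This is a minimal sketch; it overwrites a copy of Whickham in your workspace so that later code referring to age_cat keeps working, which is an assumption about how you want to organize your script.

```r
library(tidyverse)
library(mosaicData)

# Add age_cat to a workspace copy of Whickham.
# case_when() checks the conditions in order and returns the value
# to the right of ~ for the first condition that is TRUE.
Whickham <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

# Quick check: how many participants fall in each category?
Whickham %>%
  count(age_cat)
```

Because the three conditions cover every possible age, no observation should end up with a missing age_cat.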
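Re-create the visualization of the relationship between smoking status and health outcome, this time broken down by age_cat. What changed? What might explain this change? Extend the contingency table from earlier by breaking it down by age category and use it to help your narrative.
Whickham %>%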
count(smoker, age_cat, outcome)
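To compare groups of different sizes, it may help to convert the three-way counts into proportions within each combination of smoking status and age category. The sketch below is one way to do that; it rebuilds age_cat so that it runs on its own, and the names outcome_by_smoker_age and prop are illustrative.

```r
library(tidyverse)
library(mosaicData)

# P(outcome | smoker, age_cat): divide each count by the total for its
# smoker-by-age-category group. Names here are hypothetical choices.
outcome_by_smoker_age <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  ) %>%
  count(smoker, age_cat, outcome) %>%
  group_by(smoker, age_cat) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

outcome_by_smoker_age
```

Comparing smokers and non-smokers within the same age category, rather than across the whole sample, is what lets you see whether age is driving the overall pattern.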