A study conducted in Whickham, England recorded participants’ age and smoking status at baseline, and then recorded their health outcome 20 years later.

Accessing the data

The data can be found in the mosaicData package.

In your console, run the following to install this package:

install.packages("mosaicData")

Then load the package with

library(mosaicData)

and load the data with

data(Whickham)

Take a peek at the codebook with

?Whickham

or at https://www.rdocumentation.org/packages/mosaicData/versions/0.14.0/topics/Whickham.
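Once the data are loaded, a quick structural check confirms that they arrived as expected. The following is a minimal sketch using only base R functions, assuming the mosaicData package has been installed as described above:

```r
library(mosaicData)
data(Whickham)

# Compact overview: dimensions, column names, and column types
str(Whickham)

# First few rows of the data
head(Whickham)
```

The output of str() is a useful companion to the codebook, since it shows how each variable is actually stored in R.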

Accessing the assignment repo

Go to the #assignment-links channel on Slack, click on the link for mini-hw-06, and accept the assignment. This will automatically put you in the team you created previously. You can confirm this by looking at the name of your repo (it will have your team name on it).

Then, each team member can follow the usual steps to clone the repo and get started with the analysis.

Assignment

  1. What type of study do you think these data come from: an observational study or an experiment? Why?

  2. How many observations are in this dataset? What does each observation represent?

  3. How many variables are in this dataset? What type of variable is each? Display each variable using an appropriate visualization.

  4. What would you expect the relationship between smoking status and health outcome to be?

  5. Create a visualization depicting the relationship between smoking status and health outcome. Briefly describe the relationship, and evaluate whether this meets your expectations. You can also create a contingency table and use the values on the table to calculate conditional probabilities to help your narrative:

Whickham %>%
  count(smoker, outcome)

  6. Create a new variable called age_cat using the following scheme:
     • age <= 44 ~ "18-44"
     • age > 44 & age <= 64 ~ "45-64"
     • age > 64 ~ "65+"

  7. Re-create the visualization depicting the relationship between smoking status and health outcome, faceted by age_cat. What changed? What might explain this change? You can extend the contingency table from earlier by breaking it down by age category and use it to help your narrative.

Whickham %>%
  count(smoker, age_cat, outcome)
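
The age_cat scheme in question 6 maps directly onto dplyr's case_when(), and the counts in a contingency table can be turned into conditional proportions with group_by() and mutate(). The following is a minimal sketch rather than a required approach. It assumes the dplyr and mosaicData packages are installed; the object names whickham_cat and outcome_props and the column name prop are illustrative choices, not part of the assignment.

```r
library(dplyr)
library(mosaicData)

data(Whickham)

# Add age_cat following the scheme given in question 6
# (whickham_cat is an illustrative object name)
whickham_cat <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

# Proportion of each outcome within every smoker-by-age_cat group
# (prop is an illustrative column name)
outcome_props <- whickham_cat %>%
  count(smoker, age_cat, outcome) %>%
  group_by(smoker, age_cat) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

outcome_props
```

Within each combination of smoker and age_cat, the prop values sum to 1, so each one can be read as the proportion with that outcome, given smoking status and age category.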

Grading

Total 15 pts
Questions 1-3 1 pt / question - 3 pts
Questions 4-7 2 pts / question - 8 pts
Code style 1 pt
Informatively named code chunks 1 pt
Commit frequency and informative messages 1 pt