The data set consists of 271 homes, each sampled at three designated time points, giving three water lead contaminant values per home. Lead content is measured in parts per billion (ppb). Some location data are also given for each home.
To get started, read in the flint.csv file using the function read_csv().
library(tidyverse)
flint <- read_csv("flint.csv")
Let’s preview the data with the function glimpse().
glimpse(flint)
Observations: 813
Variables: 5
$ id <dbl> 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 15, 16, 17, 18, 19, 20, 21,…
$ zip <dbl> 48504, 48507, 48504, 48507, 48505, 48507, 48507, 48503, 485…
$ ward <dbl> 6, 9, 1, 8, 3, 9, 9, 5, 9, 3, 9, 5, 2, 7, 9, 9, 5, 6, 2, 6,…
$ draw <chr> "first", "first", "first", "first", "first", "first", "firs…
$ lead <dbl> 0.344, 8.133, 1.111, 8.007, 1.951, 7.200, 40.630, 1.100, 10…
Let’s see how many samples were taken from each zip code.
flint %>%              # data
  group_by(zip) %>%    # perform a grouping by zip code
  count()              # count occurrences
Which zip code had the most samples drawn?
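If scanning the table by eye is tedious, the counts can also be sorted. The following is a minimal sketch on a small made-up tibble (the name toy and its values are hypothetical and are not taken from flint.csv). It relies on count(zip, sort = TRUE), which gives the same counts as group_by(zip) followed by count(), ordered from largest to smallest.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from flint.csv
toy <- tibble(zip = c(48503, 48503, 48504, 48505, 48505, 48505))

# Count rows per zip code and put the largest group first
toy_counts <- toy %>%
  count(zip, sort = TRUE)

toy_counts
```

The same call on flint would place the most frequently sampled zip code in the first row.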
Next, let’s look at the mean and median lead contaminant values for each zip code and draw combination. We have eight zip codes and samples taken at three time points. How many combinations do we have?
flint %>%
  group_by(zip, draw) %>%
  summarise(mean_pb = mean(lead))

flint %>%
  group_by(zip, draw) %>%
  summarise(median_pb = median(lead))
How many rows are in each of the two data frames above?
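To see how grouping by two variables determines the shape of the result, here is a small sketch on made-up data (toy and its values are hypothetical). It assumes dplyr 1.0.0 or later, where summarise() accepts .groups = "drop" to return an ungrouped result and suppress the regrouping message. Both statistics are computed in a single call, and n() records how many rows fed each summary.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from flint.csv
toy <- tibble(
  zip  = c(48503, 48503, 48503, 48503, 48504, 48504),
  draw = c("first", "first", "first", "second", "first", "first"),
  lead = c(1, 3, 20, 2, 4, 6)
)

# One output row per (zip, draw) combination that occurs in the data
toy_summary <- toy %>%
  group_by(zip, draw) %>%
  summarise(mean_pb   = mean(lead),
            median_pb = median(lead),
            n         = n(),
            .groups   = "drop")

toy_summary
```

Only combinations that actually appear in the data get a row, so the toy result has three rows rather than two zip codes times two draws. Note also how the single large value of 20 pulls the mean well above the median in its group.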
Modify the code below to compute the mean and median lead contaminant values for zip code 48503 at the first draw. What should you put in place of the dashes in draw == "-----"? Don’t forget to uncomment the second line of code.
flint %>%
  # filter(zip == 48503, draw == "-----") %>%
  summarise(mean_pb = mean(lead),
            median_pb = median(lead))
Let’s make some plots, where we will focus on zip codes 48503, 48504, 48505, 48506, and 48507. We will restrict our attention to samples with lead values less than 1,000 ppb.
flint_focus <- flint %>%
  filter(zip %in% 48503:48507, lead < 1000)
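The filter above combines two conditions: comma-separated conditions in filter() must all hold, and %in% tests membership in the set 48503, 48504, ..., 48507. A minimal sketch on made-up data (toy and its values are hypothetical) shows which rows survive.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from flint.csv
toy <- tibble(
  zip  = c(48502, 48503, 48505, 48507, 48532),
  lead = c(2.5, 1200, 7.1, 0.9, 3.3)
)

# Keep rows whose zip is one of 48503..48507 AND whose lead is below 1000
toy_focus <- toy %>%
  filter(zip %in% 48503:48507, lead < 1000)

toy_focus
```

Here zip codes 48502 and 48532 are dropped by the first condition, and the 48503 row is dropped by the second because its lead value is 1200.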
Below are side-by-side box plots for the three flushing times in each of the five zip codes considered. Add x and y labels; add a title by inserting title = "title_name" inside the labs() function.
ggplot(data = flint_focus, aes(x = factor(zip), y = lead)) +
  geom_boxplot(aes(fill = factor(draw))) +
  labs(x = "--------", y = "--------", fill = "Flushing time") +
  scale_fill_discrete(breaks = c("first", "second", "third"),
                      labels = c("0 (sec)", "45 (sec)", "120 (sec)")) +
  coord_flip() +
  theme_bw()
Add labels for x, y, a title, and a subtitle to the code below to update the corresponding plot.
ggplot(data = flint_focus, aes(x = factor(zip), y = lead)) +
  geom_boxplot(aes(fill = factor(draw))) +
  labs(x = "--------", y = "--------", fill = "Flushing time",
       subtitle = "--------") +
  scale_fill_discrete(breaks = c("first", "second", "third"),
                      labels = c("0 (sec)", "45 (sec)", "120 (sec)")) +
  coord_flip(ylim = c(0, 50)) +
  theme_bw()
What is the difference between the two plots?
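As a hint for thinking about this, it may help to know that limits set through a coord function behave differently from limits set on a scale. The sketch below uses made-up data (toy and its values are hypothetical) and compares coord_flip(ylim = ...) with scale_y_continuous(limits = ...), an alternative that is not used in this lab and is shown only for contrast. The helper layer_data() from ggplot2 returns the statistics a layer computed, so the box plot medians can be inspected directly.

```r
library(ggplot2)

# Toy data: hypothetical values, group A contains one extreme value
toy <- data.frame(
  zip  = factor(rep(c("A", "B"), each = 5)),
  lead = c(1, 2, 3, 4, 200, 2, 4, 6, 8, 10)
)

p <- ggplot(toy, aes(x = zip, y = lead)) +
  geom_boxplot()

# Coordinate limits: the view is zoomed, but all data still feed the box statistics
p_zoom <- p + coord_flip(ylim = c(0, 50))

# Scale limits: values outside the range are dropped before statistics are computed
p_clip <- p + scale_y_continuous(limits = c(0, 50))

zoom_stats <- layer_data(p_zoom)
clip_stats <- suppressWarnings(layer_data(p_clip))  # ggplot2 warns about the removed row

zoom_stats$middle  # medians with the extreme value included
clip_stats$middle  # medians after the extreme value was dropped
```

In this toy example the median of group A changes when scale limits are used, because the value 200 is removed, while the coordinate zoom leaves the statistics untouched and only changes what is visible.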