ggplot2
library(tidyverse)
teams <- read_csv("http://www2.stat.duke.edu/~sms185/data/mlb/teams.csv")
Object teams is a data frame that contains yearly statistics and standings for MLB teams from 2009 to 2018. The data has 300 rows and 56 variables.
energy <- read_csv("http://www2.stat.duke.edu/~sms185/data/energy/energy.csv")
The data record the amount of energy each power source generates per day, measured in MWh.
MWhperDay: MWh of energy generated per day
name: energy source name
type: type of energy source
location: country of energy source
note: more details on energy source
boe: barrel of oil equivalent
flint <- read_csv("http://www2.stat.duke.edu/~sms185/data/health/flint.csv")
Each row represents a home in Flint, Michigan. Water lead contaminant values were recorded at three times, represented by draw1, draw2, and draw3.
Use tibble teams to re-create the plot below.
ggplot(data = teams, mapping = aes(x = SO, y = R, color = factor(DivWin))) +
geom_point(size = 3, alpha = .8) +
facet_wrap(~yearID, nrow = 2) +
labs(x = "Strike outs", y = "Runs", color = "Division winner")
Try to improve the visualization in Exercise 1 by drawing attention to the division winners and their relationship between runs and strikeouts.
ggplot(data = teams, mapping = aes(x = SO, y = R, color = factor(DivWin))) +
geom_point(size = 2, alpha = .8) +
geom_hline(yintercept = 750, lty = 2, alpha = .5, color = "blue") +
geom_vline(xintercept = 1250, lty = 2, alpha = .5, color = "blue") +
facet_wrap(~yearID, nrow = 2) +
labs(x = "Strike outs", y = "Runs", color = "Division winner",
title = "Division winners generally score more runs",
subtitle = "and have fewer strike outs") +
scale_color_manual(values = c("grey", "red")) +
scale_x_continuous(limits = c(750, 1750), breaks = seq(900, 1700, 350),
labels = seq(900, 1700, 350)) +
scale_y_continuous(limits = c(500, 1000), breaks = seq(500, 1000, 100),
labels = seq(500, 1000, 100)) +
theme_bw(base_size = 16) +
theme(legend.position = "bottom")
Re-create the plot below using energy.
A few notes:
base font size is 18
hex colors: c("#9d8b7e", "#315a70", "#66344c", "#678b93", "#b5cfe1", "#ffcccc")
use function order() to help get the top 30
Starter code:
energy_top_30 <- energy[order(energy$MWhperDay, decreasing = TRUE)[1:30], ]
ggplot(energy_top_30, mapping = aes(x = reorder(name, MWhperDay),
y = MWhperDay / 1000,
fill = type)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("#9d8b7e", "#315a70", "#66344c",
"#678b93", "#b5cfe1", "#ffcccc")) +
theme_bw(base_size = 18) +
labs(y = "Daily MWh (in thousands)", x = "Power Source",
title = "Top 30 power source energy generators",
fill = "Power Source",
caption = "1 MWh is, on average, enough power for 28 people in the USA") +
coord_flip()
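To see how order() picks out the top rows in the starter code, here is a minimal sketch on a toy data frame. The data frame toy and its values are made up for illustration and are not part of the energy data.

```r
# Toy data; names and values are hypothetical, chosen only for illustration.
toy <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  MWhperDay = c(120, 950, 430, 15, 780),
  stringsAsFactors = FALSE
)

# order() returns the row indices that would sort the column;
# decreasing = TRUE puts the largest values first.
idx <- order(toy$MWhperDay, decreasing = TRUE)

# Keep the first three indices to get the top 3 rows.
toy_top_3 <- toy[idx[1:3], ]
toy_top_3$name
```

The same pattern with [1:30] gives the 30 largest sources, as in the starter code.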
Re-create the plot below using flint.
Some details to help you replicate the plot:
theme_bw()
flint %>%
filter(!(zip %in% c(48529, 48502))) %>%
ggplot(mapping = aes(x = reorder(factor(zip), draw1, quantile, .75), y = draw1)) +
geom_boxplot(fill = "#256d7b", alpha = 0.7) +
scale_y_continuous(breaks = seq(0, 165, 15), labels = seq(0, 165, 15)) +
coord_flip() +
# linewidth replaces size for line widths in ggplot2 >= 3.4.0
geom_hline(yintercept = 15, color = "red", alpha = 0.7,
linetype = 2, linewidth = 1.25) +
annotate("text", y = 45, x = .75, label = "EPA action level, 15 ppb",
color = "red", alpha = 0.7, size = 6) +
labs(x = "Zip code", y = "Lead content (ppb)",
caption = "Action level for lead is when 15 ppb is in more than 10% of customer taps sampled",
title = "First draw of lead samples in Flint, MI homes") +
theme_bw(base_size = 16)
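The call reorder(factor(zip), draw1, quantile, .75) sorts the zip code levels by the third quartile of draw1 within each zip code. A minimal sketch of that behavior, using a made-up toy data frame (the zip labels and values are hypothetical):

```r
# Toy data; zip labels and lead values are hypothetical.
toy <- data.frame(
  zip = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  draw1 = c(1, 2, 3, 10, 20, 30, 4, 5, 6)
)

# reorder() applies quantile(values, .75) to draw1 within each zip level
# and sorts the factor levels by the resulting scores, smallest first.
zip_ordered <- reorder(factor(toy$zip), toy$draw1, quantile, .75)

levels(zip_ordered)          # levels sorted by each group's 75th percentile
attr(zip_ordered, "scores")  # the per-level scores used for sorting
```

Because the levels are sorted in increasing order of the score, the zip code with the largest third quartile ends up last on the axis, which coord_flip() then draws at the top.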