ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts.
It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
Statistical: Visualize \(Y \sim X | Z\) with ease
Technical: Human readable syntax
Bonus: Aesthetically pleasing
The data set comprises 651 randomly sampled movies produced and released before 2016. The data come from IMDB and Rotten Tomatoes. A codebook is available here.
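library(dplyr)    # provides %>% and as_tibble()
library(ggplot2)

# tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
movies = read.csv("", stringsAsFactors = FALSE) %>% as_tibble()
movies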
## # A tibble: 651 x 32
##    title                  title_type   genre       runtime mpaa_rating studio
##    <chr>                  <chr>        <chr>         <int> <chr>       <chr>
##  1 Filly Brown            Feature Film Drama            80 R           Indomina Media Inc.
##  2 The Dish               Feature Film Drama           101 PG-13       Warner Bros. Pictures
##  3 Waiting for Guffman    Feature Film Comedy           84 R           Sony Pictures Classics
##  4 The Age of Innocence   Feature Film Drama           139 PG          Columbia Pictures
##  5 Malevolence            Feature Film Horror           90 R           Anchor Bay Entertainment
##  6 Old Partner            Documentary  Documentary      78 Unrated     Shcalo Media Group
##  7 Lady Jane              Feature Film Drama           142 PG-13       Paramount Home Video
##  8 Mad Dog Time           Feature Film Drama            93 R           MGM/United Artists
##  9 Beauty Is Embarrassing Documentary  Documentary      88 Unrated     Independent Pictures
## 10 The Snowtown Murders   Feature Film Drama           119 Unrated     IFC Films
## # ... with 641 more rows, and 26 more variables: thtr_rel_year <int>, thtr_rel_month <int>,
## #   thtr_rel_day <int>, dvd_rel_year <int>, dvd_rel_month <int>, dvd_rel_day <int>,
## #   imdb_rating <dbl>, imdb_num_votes <int>, critics_rating <chr>, critics_score <int>,
## #   audience_rating <chr>, audience_score <int>, best_pic_nom <chr>, best_pic_win <chr>,
## #   best_actor_win <chr>, best_actress_win <chr>, best_dir_win <chr>, top200_box <chr>,
## #   director <chr>, actor1 <chr>, actor2 <chr>, actor3 <chr>, actor4 <chr>, actor5 <chr>,
## #   imdb_url <chr>, rt_url <chr>
ggplot(data = movies, aes(x = critics_score, y = audience_score, color = title_type)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)

plot(y = movies$audience_score, x = movies$critics_score,
     col = adjustcolor(as.integer(factor(movies$title_type)), alpha.f = 0.5),
     pch = 16)

doc = movies[movies$title_type == "Documentary", ]
ff  = movies[movies$title_type == "Feature Film", ]
tv  = movies[movies$title_type == "TV Movie", ]

m_doc = lm(audience_score ~ critics_score, data = doc)
m_ff  = lm(audience_score ~ critics_score, data = ff)
m_tv  = lm(audience_score ~ critics_score, data = tv)

abline(m_doc, col = 1, lwd = 2)
abline(m_ff,  col = 2, lwd = 2)
abline(m_tv,  col = 3, lwd = 2)

legend("topleft", levels(factor(movies$title_type)), col = c(1, 2, 3), lty = 1)
A statistical graphic is a…
ggplot(data = movies, aes(x = audience_score, y = critics_score)) +
  geom_point()

ggplot(data = movies, aes(x = audience_score, y = critics_score)) +
  geom_point(alpha = 0.5, color = "blue")

ggplot(data = movies, aes(x = audience_score, y = critics_score, color = genre)) +
  geom_point(alpha = 0.5) +
  facet_grid(. ~ title_type)

ggplot(data = movies, aes(x = audience_score, y = critics_score, color = genre)) +
  geom_point(alpha = 0.5) +
  facet_grid(audience_rating ~ title_type)

ggplot(data = movies, aes(x = audience_score, y = critics_score, color = title_type)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ genre)
ggplot(data = [dataframe],
       aes(x = [var_x], y = [var_y],
           color = [var_for_color],
           fill = [var_for_fill],
           shape = [var_for_shape])) +
  geom_[some_geom]([geom_arguments]) +
  ... # other geometries
  scale_[some_axis]_[some_scale]() +
  facet_[some_facet]([formula]) +
  ... # other options
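To make the template concrete, here is a minimal sketch that fills in each placeholder. It uses the built-in `mtcars` data rather than the movies data, and the variable choices, the axis label, and the object names `cars` and `p` are illustrative assumptions, not part of the original workshop.

```r
library(ggplot2)

# mtcars ships with R; it stands in for [dataframe] in the template.
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat cylinder count as a category

p <- ggplot(data = cars,
            aes(x = wt, y = mpg, color = cyl)) +    # aesthetic mappings
  geom_point(alpha = 0.7) +                         # geom_[some_geom]
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale_[some_axis]_[some_scale]
  facet_wrap(~ am)                                  # facet_[some_facet]([formula])

p  # printing the object draws the plot
```

Each `+` adds one component, so any line after the first can be dropped or swapped without touching the others.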
ggplot(data = movies, aes(x = audience_score)) +
  geom_histogram(binwidth = 5)

ggplot(data = movies, aes(y = audience_score, x = genre)) +
  geom_boxplot()

ggplot(data = movies, aes(y = audience_score, x = genre)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data = movies, aes(x = runtime, color = audience_rating)) +
  geom_density()

ggplot(data = movies, aes(x = runtime, fill = audience_rating)) +
  geom_density()

ggplot(data = movies, aes(x = runtime, fill = audience_rating)) +
  geom_density(alpha = 0.5)

ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) +
  geom_point(alpha = 0.5)

ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm")

ggplot(data = movies, aes(x = genre)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data = movies, aes(x = genre, fill = audience_rating)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data = movies, aes(x = genre, fill = audience_rating)) +
  geom_bar(position = "fill") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data = movies, aes(x = genre, fill = audience_rating)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = movies, aes(x = genre, fill = audience_rating)) +
  geom_bar(position = "dodge") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Resources
The ggplot2 documentation site covers the current version of the package. It's full of examples!
Refer to the ggplot2 Themes vignette.
Recreate the following plot. Hint: Add a labs()
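As a reminder of how `labs()` works, the sketch below sets a title, axis labels, and a legend title on an unrelated plot. It uses the built-in `mtcars` data; the label text and the object name `p_labs` are made up for illustration and are not the labels of the exercise plot.

```r
library(ggplot2)

p_labs <- ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Fuel economy by weight",  # plot title
       x = "Weight (1000 lbs)",           # x-axis label
       y = "Miles per gallon",            # y-axis label
       color = "Cylinders")               # legend title for the color aesthetic

p_labs
```

Arguments to `labs()` are named after the aesthetic they label, so a `fill` or `shape` legend is renamed the same way.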
Recreate the following plot. Hint: the black lines are a linear model (lm) fit to all of the movies within a rating category and the grey line is the 0-1 line (intercept 0, slope 1).
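The two ingredients in the hint can be sketched on toy data: `geom_abline()` draws a line from an intercept and a slope, and `geom_smooth(method = "lm")` inside a faceted plot fits one linear model per panel. The data frame `toy`, its columns, and the object name `p_fit` are invented for this example and are not the movies data.

```r
library(ggplot2)

set.seed(1)
# Made-up scores on a common 0-100 scale, split into three groups.
toy <- data.frame(score_a = runif(120, 0, 100),
                  group = rep(c("g1", "g2", "g3"), each = 40))
toy$score_b <- pmin(100, pmax(0, 20 + 0.7 * toy$score_a + rnorm(120, sd = 10)))

p_fit <- ggplot(data = toy, aes(x = score_a, y = score_b)) +
  geom_abline(intercept = 0, slope = 1, color = "grey") +    # the 0-1 reference line
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +  # one lm fit per panel
  facet_wrap(~ group)

p_fit
```

Layers are drawn in the order they are added, so placing `geom_abline()` first keeps the reference line underneath the points and the fitted lines.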
Above materials are derived in part from the following sources:
Mine Cetinkaya-Rundel's DataFest 2016 Visualization Workshop
Tim Winkle's ggplot2 Workshop