class: center, middle, inverse, title-slide # ggplot2 ### Colin Rundel ### 2018-02-13 --- exclude: true --- class: middle count: false # ggplot2 --- ## ggplot2 .pull-left[ <img src="imgs/hex-ggplot2.png" width="300px" style="display: block; margin: auto;" /> ] - ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. - It takes care of many of the fiddly details that make plotting a hassle (like drawing legends or faceting) as well as providing a powerful model of graphical representation of data that makes it easy to produce complex multi-layered visualizations. --- ## New data - movies The data set is comprised of 651 randomly sampled movies produced and released before 2016. Data come from [IMDB](http://www.imdb.com/) and [Rotten Tomatoes](http://www.rottentomatoes.com/). The codebook is available [here](data/movies/movies.html). .small[ ```r movies = readr::read_csv("data/movies/movies.csv") movies ``` ``` ## # A tibble: 651 x 32 ## title title_type genre runtime mpaa_rating studio thtr_rel_year ## <chr> <chr> <chr> <int> <chr> <chr> <int> ## 1 Filly Brown Feature Fi… Drama 80 R Indomina M… 2013 ## 2 The Dish Feature Fi… Drama 101 PG-13 Warner Bro… 2001 ## 3 Waiting fo… Feature Fi… Comedy 84 R Sony Pictu… 1996 ## 4 The Age of… Feature Fi… Drama 139 PG Columbia P… 1993 ## 5 Malevolence Feature Fi… Horror 90 R Anchor Bay… 2004 ## 6 Old Partner Documentary Docum… 78 Unrated Shcalo Med… 2009 ## 7 Lady Jane Feature Fi… Drama 142 PG-13 Paramount … 1986 ## 8 Mad Dog Ti… Feature Fi… Drama 93 R MGM/United… 1996 ## 9 Beauty Is … Documentary Docum… 88 Unrated Independen… 2012 ## 10 The Snowto… Feature Fi… Drama 119 Unrated IFC Films 2012 ## # ... with 641 more rows, and 25 more variables: thtr_rel_month <int>, ## # thtr_rel_day <int>, dvd_rel_year <int>, dvd_rel_month <int>, ## # dvd_rel_day <int>, imdb_rating <dbl>, imdb_num_votes <int>, ## # critics_rating <chr>, critics_score <int>, audience_rating <chr>, ## # audience_score <int>, best_pic_nom <chr>, best_pic_win <chr>, ## # best_actor_win <chr>, best_actress_win <chr>, best_dir_win <chr>, ## # top200_box <chr>, director <chr>, actor1 <chr>, actor2 <chr>, actor3 <chr>, ## # actor4 <chr>, actor5 <chr>, imdb_url <chr>, rt_url <chr> ``` ] --- ## A simple task Lets create a visualization with the following features: * Scatter plot of critics score vs audience score * Color points based on `title_type` (i.e. are they a Documentary, Feature Film, or TV Movie). * Include a least squares line for each title type. --- ## With base R ```r plot(y = movies$audience_score, x = movies$critics_score, col = adjustcolor(as.integer(factor(movies$title_type)),alpha=0.5), pch = 16) doc = movies[movies$title_type == "Documentary", ] ff = movies[movies$title_type == "Feature Film", ] tv = movies[movies$title_type == "TV Movie", ] m_doc = lm(audience_score ~ critics_score, data = doc) m_ff = lm(audience_score ~ critics_score, data = ff) m_tv = lm(audience_score ~ critics_score, data = tv) abline(m_doc, col = 1, lwd=2) abline(m_ff, col = 2, lwd=2) abline(m_tv, col = 3, lwd=2) legend("topleft", levels(factor(movies$title_type)), col = c(1,2,3), lty = 1) ``` --- ## With base R (output) <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## With ggplot2 ```r ggplot(data = movies, aes(x = critics_score, y = audience_score, color = title_type)) + geom_point(alpha=0.5) + geom_smooth(method = "lm", se = FALSE, fullrange=TRUE) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- class: middle count: false # Grammar of Graphics --- ## The Grammar of Graphics - Visualisation concept created by Leland Wilkinson (1999) - to define the basic elements of a statistical graphic - Adapted for R by Wickham (2009) - consistent and compact syntax to describe statistical graphics - highly modular as it breaks up graphs into semantic components - It is not meant as a guide to which graph to use and how to best convey your data (more on that later). --- ## Terminology A statistical graphic is a... - mapping of **data** - which may be **statistically transformed** (summarised, log-transformed, etc.) - to **aesthetic attributes** (color, size, xy-position, etc.) - using **geometric objects** (points, lines, bars, etc.) - and mapped onto a specific **facet** and **coordinate system** --- ## Audience score vs. critics score <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> - Which data is used as an input? - Are the variables statistically transformed before plotting? - What geometric objects are used to represent the data? - What variables are mapped onto which aesthetic attributes? - What type of scales are used to map data to aesthetics? --- ## Code ```r ggplot(data = movies, aes(x = audience_score, y = critics_score)) + geom_point() ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- ## Altering features <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> - How did the plot change? - Are these changes based on data or are the changes based on stylistic choices for the geometric objects? --- ## Code ```r ggplot(data = movies, aes(x = audience_score, y = critics_score)) + geom_point(alpha = 0.5, color = "blue") ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- ## Faceting <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> - How did the plot change? - Are these changes based on data (i.e. can be mapped to variables in the dataset) or are the changes based on stylistic choices for the geometric objects? --- ## Faceting - code ```r ggplot(data = movies, aes(x = audience_score, y = critics_score, color = genre)) + geom_point(alpha = 0.5) + facet_grid(. ~ title_type) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> --- ## More faceting ```r ggplot(data = movies, aes(x = audience_score, y = critics_score, color = genre)) + geom_point(alpha = 0.5) + facet_grid(audience_rating ~ title_type) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- ## Even More faceting ```r ggplot(data = movies, aes(x = audience_score, y = critics_score, color = title_type)) + geom_point(alpha = 0.5) + facet_wrap(~genre) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## Anatomy of a ggplot ``` ggplot( data = [dataframe], aes( x = [var_x], y = [var_y], color = [var_for_color], fill = [var_for_fill], shape = [var_for_shape] ) ) + geom_[some_geom]([geom_arguments]) + ... # other geometries scale_[some_axis]_[some_scale]() + facet_[some_facet]([formula]) + ... # other options ``` --- class: middle count: false # Various plots --- ## Histograms ```r ggplot(data = movies, aes(x = audience_score)) + geom_histogram(binwidth = 5) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Boxplots ```r ggplot(data = movies, aes(y = audience_score, x = genre)) + geom_boxplot() ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> --- ## Boxplots - axis formatting ```r ggplot(data = movies, aes(y = audience_score, x = genre)) + geom_boxplot() + theme(axis.text.x=element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- ## Boxplots - flipping axes ```r ggplot(data = movies, aes(y = audience_score, x = genre)) + geom_boxplot() + coord_flip() ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> --- ## Density plots ```r ggplot(data = movies, aes(x = runtime, color = audience_rating)) + geom_density() ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- ## Density plots - fill vs color ```r ggplot(data = movies, aes(x = runtime, fill = audience_rating)) + geom_density() ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> --- ## Density plots - fill vs color + alpha ```r ggplot(data = movies, aes(x = runtime, fill = audience_rating)) + geom_density(alpha = 0.5) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ## Scatter plots ```r ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) + geom_point(alpha = 0.5) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- ## Smoothing ```r ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) + geom_point(alpha = 0.5) + geom_smooth() ``` ``` ## `geom_smooth()` using method = 'loess' and formula 'y ~ x' ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> --- ## Smoothing - lm ```r ggplot(data = movies, aes(x = imdb_rating, y = audience_score)) + geom_point(alpha = 0.5) + geom_smooth(method = "lm") ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> --- ## Barplots ```r ggplot(data = movies, aes(x = genre)) + geom_bar() + theme(axis.text.x=element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> --- ## Segmented barplots ```r ggplot(data = movies, aes(x = genre, fill = audience_rating)) + geom_bar() + theme(axis.text.x=element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> --- ## Segmented barplots - proportions ```r ggplot(data = movies, aes(x = genre, fill = audience_rating)) + geom_bar(position = "fill") + theme(axis.text.x=element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> --- ## Dodged barplots ```r ggplot(data = movies, aes(x = genre, fill = audience_rating)) + geom_bar(position = "dodge") + theme(axis.text.x=element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> --- ## Not on team `theme_grey()`? ```r ggplot(data = movies, aes(x = genre, fill = audience_rating)) + geom_bar(position = "dodge") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> --- ## More `ggplot2` resources - Visit http://docs.ggplot2.org/current/ for documentation on the current version of the `ggplot2` package. Lots of examples for every geometry and layer type. - Refer to the `ggplot2` cheatsheet: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf - Themes vignette: http://docs.ggplot2.org/dev/vignettes/themes.html --- ## Exercise 1 Recreate the following plot. Hint: Add a `labs()` layer. <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" /> --- ## Exercise 2 Recreate the following plot. Hint: the black lines are a linear model (lm) fit to *all* of the movies within a rating category and the grey line is the 0-1 line (intercept 0, slope 1). ``` ## Warning: Removed 23 rows containing missing values (geom_smooth). ``` <img src="Lec08_ggplot2_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" /> --- class: middle count: false # Acknowledgments --- ## Acknowledgments Above materials are derived in part from the following sources: * Mine Cetinkaya-Rundel's [DataFest 2016 Visualization Workshop](https://github.com/mine-cetinkaya-rundel/df2016_workshops/tree/master/viz_ggplot2_shiny) * [RStudio Data Visualization Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) * Package Documentation - [ggplot2](http://docs.ggplot2.org/current/) * Tim Winkle's [ggplot2 Workshop](https://rpubs.com/timwinke/ggplot2workshop)