class: center, middle, inverse, title-slide # Implementing principles for effective visualizations in R
👩🎨 ### Dr. Çetinkaya-Rundel --- layout: true <div class="my-footer"> <span> Dr. Mine Çetinkaya-Rundel - <a href="http://www2.stat.duke.edu/courses/Fall18/sta112.01/schedule" target="_blank">stat.duke.edu/courses/Fall18/sta112.01 </a> </span> </div> --- ## Which is the best? <iframe width="840" height="473" src="https://www.youtube.com/embed/AuJFuEq-qD8" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe> --- class: center, middle # Principles for effective visualizations --- ## Principles for effective visualizations - Order matters - Put long categories on the y-axis - Keep scales consistent - Select meaningful colors - Use meaningful and nonredundant labels --- ## Data A SurveyUSA poll<sup>✦</sup> asked 722 Los Angeles residents who described themselves as paying attention to the news about Brett Kavanaugh's confirmation to the Supreme Court of the United States the following question: >Which best describes you: >I strongly support Kavanaugh's confirmation. >I support Kavanaugh's confirmation. >I oppose Kavanaugh's confirmation. >I strongly oppose Kavanaugh's confirmation. .footnote[ <sup>✦</sup> Results of [SurveyUSA News Poll #24330](http://www.surveyusa.com/client/PollReport.aspx?g=b2c0e27f-02cc-4fde-8e6f-a026fe2f055f), retrieved Sep 18, 2018. ] --- class: center, middle # Order matters --- ## Alphabetical order rarely ideal ```r ggplot(data = bk, aes(x = opinion)) + geom_bar() ``` <!-- --> --- ## Use inherent level order ```r bk <- bk %>% mutate( opinion = fct_relevel(opinion, "Strongly support", "Support", "Oppose", "Strongly oppose", "Not sure") ) ggplot(data = bk, aes(x = opinion)) + geom_bar() ``` <!-- --> `fct_relevel`: Reorder factor levels using a custom order --- ## Alphabetical order rarely ideal ```r ggplot(data = bk, aes(x = race)) + geom_bar() ``` <!-- --> --- ## Order by frequency ```r ggplot(data = bk, aes(x = fct_infreq(race))) + geom_bar() ``` <!-- --> `fct_infreq`: Reorder factors levels by frequency --- ## Clean up labels ```r ggplot(data = bk, aes(x = fct_infreq(race))) + geom_bar() + labs(x = "Race", y = "Count") ``` <!-- --> --- class: center, middle # Put long categories on # the y-axis --- ## Long categories can be hard to read ```r ggplot(data = bk, aes(x = opinion)) + geom_bar() ``` <!-- --> --- ## Move them to the y-axis ```r ggplot(data = bk, aes(x = opinion)) + geom_bar() + coord_flip() ``` <!-- --> --- class: center, middle # Pick a purpose --- ## Segmented bar plots can be hard to read ```r ggplot(data = bk, aes(x = opinion, fill = race)) + geom_bar() + coord_flip() ``` <!-- --> --- ## Use facets ```r ggplot(data = bk, aes(x = opinion, fill = race)) + geom_bar() + coord_flip() + facet_grid(. ~ race) ``` <!-- --> --- ## Avoid redundancy ```r ggplot(data = bk, aes(x = opinion)) + geom_bar() + coord_flip() + facet_grid(. ~ race) ``` <!-- --> --- ## Clean up labels ```r ggplot(data = bk, aes(x = opinion)) + geom_bar() + coord_flip() + facet_grid(. ~ race) + labs(title = "Opinion on Brett Kavanaugh's confirmation", x = "", y = "") ``` <!-- --> --- class: center, middle # Select meaningful colors --- ## Ordinal data aren't well represented by rainbow colors ```r ggplot(data = bk, aes(x = race, fill = opinion)) + geom_bar(position = "fill") + coord_flip() ``` <!-- --> --- ## Viridis scale works well with ordinal data ```r ggplot(data = bk, aes(x = race, fill = opinion)) + geom_bar(position = "fill") + coord_flip() + scale_fill_viridis_d() ``` <!-- --> --- ## Clean up labels ```r ggplot(data = bk, aes(x = race, fill = opinion)) + geom_bar(position = "fill") + coord_flip() + scale_fill_viridis_d() + labs(title = "Opinion on Brett Kavanaugh's confirmation", subtitle = "by race", x = "", y = "", fill = "") ``` <!-- --> --- ## Acknowledgements These slides are based on earlier work by [Angela Zoss, Ph.D.](https://library.duke.edu/about/directory/staff/6881), Assessment & Data Visualization Analyst at Duke University Libraries.