Lab 02: Data visualization

Due: Thu, Feb 04 at 11:59pm ET

Goals

Clone assignment repo and start new project

Cache your GitHub credentials

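Your GitHub credentials can be cached so that you are not prompted for them on every push. The timeout below is given in seconds, so 600000 lasts roughly one week; use a larger value if you want the cache to cover the rest of the semester. Enter the following in the Terminal tab.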

git config --global credential.helper 'cache --timeout=600000'

The next time you enter your username and password, they will be saved until the timeout expires.

Car talk

Introduction

The dataset we will examine, mpg, is loaded automatically with the tidyverse. It contains fuel economy figures and other characteristics of cars, as published by the Environmental Protection Agency (EPA) at http://fueleconomy.gov.

To begin, familiarize yourself with the dataset by reading the documentation. Remember, you can pull up the documentation by running ?mpg in the console.
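For a quick first look alongside the documentation, the sketch below loads the tidyverse and prints the structure of mpg. It assumes the tidyverse package is already installed; glimpse() and count() come from dplyr, which the tidyverse attaches.

```r
# Load the tidyverse; this attaches ggplot2, which provides the mpg data.
library(tidyverse)

# Show each column with its type and its first few values.
glimpse(mpg)

# Count the number of cars in each class, largest group first.
mpg %>%
  count(class, sort = TRUE)
```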

All plots should follow the best visualization practices we have discussed in lecture. Hence, plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

In addition, lines of code and narrative should not exceed the 80-character limit. To help enforce this, add a vertical line at 80 characters by clicking “Tools” \(\rightarrow\) “Global Options” \(\rightarrow\) “Code” \(\rightarrow\) “Display”, then set “Margin Column” to 80 and click “Apply”.
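If you would like to check the limit programmatically as well, a minimal sketch follows. The helper long_lines() is hypothetical (it is not part of any package) and simply reports which lines of a file are wider than the limit; the temporary file exists only to demonstrate it.

```r
# Return the line numbers in a file that are wider than `limit` characters.
# `long_lines` is a hypothetical helper, not part of any package.
long_lines <- function(path, limit = 80) {
  lines <- readLines(path, warn = FALSE)
  which(nchar(lines, type = "chars") > limit)
}

# Demonstration: a temporary file with lines of 10, 81, and 80 characters.
tmp <- tempfile(fileext = ".Rmd")
writeLines(c("short line", strrep("x", 81), strrep("y", 80)), tmp)

# Only the 81-character line is reported.
long_lines(tmp)
```

To check your own work, pass the path to your file instead, for example long_lines("lab-02.Rmd"), where the file name is a placeholder.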

Your assignment should have at least three meaningful commits.

Exercises

  1. Generate a scatterplot of city miles per gallon (cty) versus highway miles per gallon (hwy) with points colored by class.

  2. Note that highway and city miles per gallon take only a limited number of distinct values, so some of the points lie on top of each other. Using geom_jitter() or a position = argument in geom_point(), add a small amount of random variation to each point. Briefly comment on the differences between the plots you constructed in Exercises 1 and 2. What are the advantages and disadvantages of each?

  3. Examine the relationship between city and highway miles per gallon, with a separate plot for each type of drive train (drv).

  4. Create side-by-side boxplots of city miles per gallon for each class. Briefly comment on what you notice.

  5. Create a segmented bar chart with one bar per class, each bar spanning 0 to 1, with the fill determined by the type of drive train (drv). What do you notice?

  6. Recreate the plot below. The functions theme_bw() and labs() will be helpful. The size of the points is 0.50. Also, set the figure dimensions with appropriate code chunk options, where the width is 9 and the height is 6. Note: the figure dimensions below are distorted due to the formatting of this HTML file.
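The layering techniques named in the exercises can be tried out on other variables first. The sketch below deliberately uses engine displacement (displ) and number of cylinders (cyl) rather than the variables the exercises ask for, so it illustrates the syntax and is not a solution. The titles, labels, and jitter amounts are arbitrary choices made for illustration.

```r
library(tidyverse)

# Jittered scatterplot: width and height bound how far each point may move.
p_jitter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.1, height = 0.1, size = 0.5) +
  labs(
    title = "Highway mileage by engine displacement",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon"
  ) +
  theme_bw()

# Segmented bar chart: position = "fill" rescales every bar to span 0 to 1.
p_fill <- ggplot(mpg, aes(x = factor(cyl), fill = class)) +
  geom_bar(position = "fill") +
  labs(
    title = "Vehicle class by number of cylinders",
    x = "Number of cylinders",
    y = "Proportion",
    fill = "Class"
  )

# Print the plots.
p_jitter
p_fill
```

In an R Markdown document, figure dimensions are set in the chunk header with the knitr options fig.width and fig.height.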

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark the page where the answer to each exercise appears. If an answer spans multiple pages, mark all of them. Associate the “Overall” section with the first page.