September 1, 2015

Housekeeping

Announcements

  • HW 1 is due next Tuesday

From last time…

Task

Visualize relationship between life expectancy and GDP per capita in 2007 in countries.

Step 0: Load necessary packages

  • In the following exercises we'll use dplyr (for data wrangling) and ggplot2 (for visualization) packages.

  • Make sure the packages are installed.

  • Load these packages in your markdown file:

library(dplyr)
library(ggplot2)

Step 1: Load data

gapminder <- read.csv("https://stat.duke.edu/~mc301/data/gapminder.csv")

Step 2: Subset data

  • Start with the gapminder dataset

  • Filter for cases (rows) where year is equal to 2007

  • Save this new subsetted dataset as gap07

gap07 <- gapminder %>%
  filter(year == 2007)

Step 2: Explore and visualize

Task: Visualize the relationship between gdpPercap and lifeExp.

qplot(x = gdpPercap, y = lifeExp, data = gap07)

Step 3: Dig deeper

Task: Color the points by continent.

qplot(x = gdpPercap, y = lifeExp, color = continent, data = gap07)

Step 3: Commit and push all changes

  • Stage

  • Commit (with a message)

  • Push

A note on piping and layering

  • The %>% operator in dplyr functions is called the pipe operator. This means you "pipe" the output of the previous line of code as the first input of the next line of code.

  • The + operator in ggplot2 functions is used for "layering". This means you create the plot in layers, separated by +.

What's next?

Update your analysis

What if you wanted to now change your analysis

  • to subset for 1952

  • plot life expectancy (lifeExp) vs. population (pop)

  • and size the points by GPD (gpdPercap)
    • hint: add argument size = gpdPercap to your plotting code

Once you're done, commit and push all your changes with a meaningful message.