Packages

library(tidyverse)
library(janitor)

Data

Parcel boundaries with address and revenue-related information for properties in Wake County, NC. http://data-wake.opendata.arcgis.com/datasets/parcels

wake <- read_csv("https://www2.stat.duke.edu/~sms185/data/econ/parcels.csv")

wake <- clean_names(wake)

Exercise 1

Problem

Which city has the fewest land parcels in the dataset?

Solution

wake %>% 
  count(city_decode) %>% 
  arrange(n) %>% 
  slice(1)

Exercise 2

Problem

Create a tibble that shows the year a parcel was built and the total value, where all parcels are located in Apex and are more than one acre in area. Sort the result in ascending order by year built.

Solution

wake %>% 
  filter(city_decode == "APEX", calc_area > 1) %>% 
  select(year_built, total_value_assd) %>% 
  arrange(year_built)

Exercise 3

Problem

Choose a subset of five variables and 10 random rows from wake and save it as an object named wake_mini. Experiment renaming variables with select() and rename() on wake_mini. What is the difference between the two functions?

Solution

Function select() only keeps the variables you mention.
Function rename() keeps all variables.

Exercise 4

Problem

Compute the mean area for each design style.

Solution

wake %>% 
  group_by(design_style_decode) %>% 
  summarise(mean_area = mean(calc_area))

Exercise 5

Problem

Compute the median sale price for each year. Hint: lubridate::year()

Solution

wake %>% 
  filter(!is.na(sale_date)) %>% 
  mutate(year = lubridate::year(sale_date)) %>% 
  group_by(year) %>% 
  summarise(median_sale_price = median(totsalprice, na.rm = TRUE))

Exercise 6

Problem

Which city with at least 1,000 parcels classified as a “Townhouse” had the highest proportion of parcels as “Townhouse”?

Solution

wake %>% 
  count(city_decode, design_style_decode) %>% 
  group_by(city_decode) %>% 
  mutate(prop = n / sum(n)) %>% 
  ungroup() %>% 
  filter(design_style_decode == "Townhouse", n >= 1000) %>% 
  arrange(desc(prop)) %>% 
  slice(1)

Exercises: Data manipulation with dplyr

Shawn Santo

Packages

Data

Exercise 1

Problem

Solution

Exercise 2

Problem

Solution

Exercise 3

Problem

Solution

Exercise 4

Problem

Solution

Exercise 5

Problem

Solution

Exercise 6

Problem

Solution