Packages

library(tidyverse)
library(rvest)

Exercise 1

Problem

Scrape all the QuikTrip stores within 25 miles of Tulsa, OK, and tidy the result into a data frame. Hint: html_children()

Solution

Using Chrome’s developer tools, we can find the XHR URL that returns the store information. We’ll use Bitly to shorten the link. Even though the response is XML, we can still use read_html() to read it into R.

xhr_url <- "https://bit.ly/2AoTZZD"
qt_xml <- read_html(xhr_url)
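The sketch below illustrates what read_html() does with XML. It parses a small made-up XML string twice, once with read_html() and once with xml2::read_xml(). The tags Collection, Poi and Name are hypothetical and are not taken from the QuikTrip response.

read_html() runs an HTML parser, which lowercases tag names, so lowercase CSS selectors match. read_xml() is an alternative swapped in here for comparison. It keeps the original case, so its XPath queries have to match the tags exactly.

```r
library(rvest)  # read_html(), html_nodes()
library(xml2)   # read_xml(), xml_find_all()

# Hypothetical XML snippet; the real response may use different tags
toy_xml <- "<Collection><Poi><Name>Store A</Name></Poi><Poi><Name>Store B</Name></Poi></Collection>"

# HTML parser: tag names are lowercased, so the selector "poi" matches
as_html <- read_html(toy_xml)
html_nodes(as_html, "poi")

# XML parser: tag names keep their case, so the XPath must say "Poi"
as_xml <- read_xml(toy_xml)
xml_find_all(as_xml, "//Poi")
```

Because of the lowercasing, the variable names pulled from tags in the next steps will be lowercase when the document is read with read_html().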

Get the variable names; they are stored as the XML tag names of each poi node's children.

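variable_names <- qt_xml %>% 
  html_nodes("poi") %>%     # one node per store
  map(html_children) %>%    # each child node holds one field
  map(html_name) %>%        # the tag names become the variable names
  .[[1]]                    # assumes every store has the same fields, so keep the first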
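Here is the same pipeline run on a small made-up XML snippet, so you can see what it returns. The poi, name and city tags are hypothetical stand-ins for the real response.

The sketch also shows a shorter equivalent that selects only the first poi node with html_node(). Both versions assume that every store lists the same fields in the same order.

```r
library(tidyverse)
library(rvest)

# Made-up data with two stores and two fields each
toy_xml <- "<collection>
  <poi><name>Store A</name><city>City 1</city></poi>
  <poi><name>Store B</name><city>City 2</city></poi>
</collection>"
toy_doc <- read_html(toy_xml)

# Pipeline from the text: tag names of every store's children, keep the first set
names_via_map <- toy_doc %>%
  html_nodes("poi") %>%
  map(html_children) %>%
  map(html_name) %>%
  .[[1]]

# Shortcut: go straight to the first poi node
names_direct <- toy_doc %>%
  html_node("poi") %>%
  html_children() %>%
  html_name()

names_via_map  # "name" "city"
```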

Extract the text of each field, then convert everything to a data frame with one row per store.

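qt_stores <- qt_xml %>% 
  html_nodes("poi") %>% 
  map(html_children) %>% 
  map(html_text) %>%        # one character vector of field values per store
  # turn each vector into a one-row data frame, then stack the rows
  map_dfr(~as.data.frame(t(as.matrix(.x)), stringsAsFactors = FALSE)) %>% 
  as_tibble()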
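The transpose step matches values to columns by position. A store with a missing or reordered field would therefore shift its values into the wrong columns.

A possible alternative, sketched here on made-up data, builds one named list per store. The names come from html_name() and the values from html_text(). map_dfr() then binds the rows by name and fills any gaps with NA.

```r
library(tidyverse)
library(rvest)

# Made-up XML: the second store has no zip field
toy_xml <- "<collection>
  <poi><name>Store A</name><city>City 1</city><zip>00001</zip></poi>
  <poi><name>Store B</name><city>City 2</city></poi>
</collection>"

toy_stores <- read_html(toy_xml) %>%
  html_nodes("poi") %>%
  map_dfr(function(poi) {
    fields <- html_children(poi)
    # Named list: tag names become column names, node text becomes values
    as.list(set_names(html_text(fields), html_name(fields)))
  })

toy_stores
```

With this approach the column names come straight from the tags, so only the janitor::clean_names() step would still be needed.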

Set and clean up the data frame's variable names.

names(qt_stores) <- variable_names
qt_stores <- janitor::clean_names(qt_stores)
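To show what janitor::clean_names() does, this sketch applies it to a tibble with hypothetical messy column names. Spaces and punctuation become underscores, and the names are converted to lowercase snake_case.

```r
library(tibble)

# Hypothetical messy column names, as they might come out of a scrape
messy <- tibble(`Store Number` = 1, `ZIP-code` = "00001")

clean <- janitor::clean_names(messy)
names(clean)  # "store_number" "zip_code"
```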

Preview the data frame.

qt_stores

Exercise 2

Problem

Navigate to https://coronavirus.jhu.edu/us-map. Identify the XHR for the data that corresponds to the “Confirmed Cases by County”.