class: center, middle, inverse, title-slide # Functions and automation ### Dr. Maria Tackett ### 04.10.19 --- layout: true <div class="my-footer"> <span> <a href="http://datasciencebox.org" target="_blank">datasciencebox.org</a> </span> </div> --- ## Announcements - Extra credit due Mon, Apr 15 at 11:59p - HW 6 due Wed, Apr 17 at 11:59p - Thursday: - My office hours: 10:30 - 11:30 - Regular lab --- ## Visualize .question[ How would you go about creating this visualization: Visualize the average yearly score for movies that made it on the top 250 list over time. ] -- .small[ <!-- --> ] --- ## Potential challenges - Unreliable formatting at the source - Data broken into many pages - ... .question[ Compare the display of information at [raleigh.craigslist.org/search/apa](https://raleigh.craigslist.org/search/apa) to the list on the IMDB top 250 list. What challenges can you foresee in scraping a list of the available apartments? ] --- class: center, middle # Application exercise --- ## Popular TV shows Go to RStudio Cloud `\(\rightarrow\)` Web scraping 1. Scrape the names, scores, and years of the most popular TV shows on IMDB: http://www.imdb.com/chart/tvmeter 2. Create a data frame called `tvshows` with four variables (rank, name, score, year) 3. Examine each of the first three TV shows to also obtain the genre and runtime. Time permitting, also try to get the following: - How many episodes so far (time permitting) - First five plot keywords (time permitting) Add this information to the `tvshows` data frame you created earlier. --- class: center, middle # Functions --- ## Setup ```r library(tidyverse) library(rvest) st <- read_html("http://www.imdb.com/title/tt4574334/") twd <- read_html("http://www.imdb.com/title/tt1520211/") got <- read_html("http://www.imdb.com/title/tt0944947/") ``` --- ## Why functions? - Automate common tasks in a power powerful and general way than copy-and-pasting: - You can give a function an evocative name that makes your code easier to understand. - As requirements change, you only need to update code in one place, instead of many. - You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another). -- - Down the line: Improve your reach as a data scientist by writing functions (and packages!) that others use --- ### When should you write a function? Whenever you’ve copied and pasted a block of code more than twice. .question[ Do you see any problems in the code below? ] .small[ ```r st_episode <- st %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() got_episode <- got %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() twd_episode <- twd %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() ``` ] --- ## Inputs .question[ How many inputs does the following code have? ] ```r st_episode <- st %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() ``` --- ## Turn your code into a function 1. Pick a short but informative <font class="vocab">name</font>, preferably a verb. <br> <br> <br> <br> ```r scrape_episode <- ``` --- ## Turn your code into a function 1. Pick a short but informative **name**, preferably a verb. 2. List inputs, or <font class="vocab">arguments</font>, to the function inside `function`. If we had more the call would look like `function(x, y, z)`. <br> ```r scrape_episode <- function(x){ } ``` --- ## Turn your code into a function 1. Pick a short but informative **name**, preferably a verb. 2. List inputs, or **arguments**, to the function inside `function`. If we had more the call would look like `function(x, y, z)`. 3. Place the <font class="vocab">code</font> you have developed in body of the function, a `{` block that immediately follows `function(...)`. ```r scrape_episode <- function(x){ x %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() } ``` -- ```r scrape_episode(st) ``` ``` ## [1] 25 ``` --- ### Check your function [The Walking Dead](https://www.imdb.com/title/tt1520211/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=AYY90SEXAACVY3XX1A2E&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_2) ```r scrape_episode(twd) ``` ``` ## [1] 147 ``` [Game of Thrones](https://www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=AYY90SEXAACVY3XX1A2E&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_1) ```r scrape_episode(got) ``` ``` ## [1] 73 ``` --- ## Naming functions > "There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton -- - Names should be short but clearly evoke what the function does -- - Names should be verbs, not nouns -- - Multi-word names should be separated by underscores (`snake_case` as opposed to `camelCase`) -- - A family of functions should be named similarly (`scrape_title`, `scrape_episode`, `scrape_genre`, etc.) -- - Avoid overwriting existing (especially widely used) functions --- ## Scraping show info .small[ ```r scrape_show_info <- function(x){ title <- x %>% html_node("h1") %>% html_text(trim = TRUE) runtime <- x %>% html_node("time") %>% html_text() %>% # could use trim = TRUE instead of str_ functions str_replace("\\n", "") %>% str_trim() genres <- x %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_c(collapse = ", ") tibble(title = title, runtime = runtime, genres = genres) } ``` ] --- .small[ ```r scrape_show_info(st) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Stranger Thin… 51min " Drama, Fantasy, Horror, Mystery, Sci-Fi, T… ``` ```r scrape_show_info(twd) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min " Drama, Horror, Thriller" ``` ```r scrape_show_info(got) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min " Action, Adventure, Drama, Fantasy, Romance" ``` ] --- .question[ How would you update the following function to use the URL of the page as an argument? ] .small[ ```r scrape_show_info <- function(x){ title <- x %>% html_node("h1") %>% html_text() %>% str_trim() runtime <- x %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- x %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") tibble(title = title, runtime = runtime, genres = genres) } ``` ] --- .small[ ```r scrape_show_info <- function(x){ * y <- read_html(x) title <- y %>% html_node("h1") %>% html_text() %>% str_trim() runtime <- y %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- y %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") tibble(title = title, runtime = runtime, genres = genres) } ``` ] --- ## Let's check .small[ ```r st_url <- "http://www.imdb.com/title/tt4574334/" twd_url <- "http://www.imdb.com/title/tt1520211/" got_url <- "http://www.imdb.com/title/tt0944947/" ``` ] -- .small[ ```r scrape_show_info(st_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Stranger Things 51min Drama, Fantasy, Horror, Mystery, Sci-Fi, Thriller ``` ```r scrape_show_info(twd_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ```r scrape_show_info(got_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Action, Adventure, Drama, Fantasy, Romance ``` ] --- class: center, middle # Automation --- .question[ You now have a function that will scrape the relevant info on shows given its URL. Where can we get a list of URLs of top 100 most popular TV shows on IMDB? Write the code for doing this in your teams. ] --- ```r urls <- read_html("http://www.imdb.com/chart/tvmeter") %>% html_nodes(".titleColumn a") %>% html_attr("href") %>% paste("http://www.imdb.com", ., sep = "") ``` ``` ## [1] "http://www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_1" ## [2] "http://www.imdb.com/title/tt1520211/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_2" ## [3] "http://www.imdb.com/title/tt6932244/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_3" ## [4] "http://www.imdb.com/title/tt5580540/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_4" ## [5] "http://www.imdb.com/title/tt4635282/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_5" ## [6] "http://www.imdb.com/title/tt2583620/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_6" ## [7] "http://www.imdb.com/title/tt8682948/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_7" ## [8] "http://www.imdb.com/title/tt9561862/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_8" ## [9] "http://www.imdb.com/title/tt1312171/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_9" ## [10] "http://www.imdb.com/title/tt7569592/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_10" ## [11] "http://www.imdb.com/title/tt0413573/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_11" ## [12] "http://www.imdb.com/title/tt2303687/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_12" ## [13] "http://www.imdb.com/title/tt2741602/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_13" ## [14] "http://www.imdb.com/title/tt0460681/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_14" ## [15] "http://www.imdb.com/title/tt7879820/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_15" ## [16] "http://www.imdb.com/title/tt5420376/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_16" ## [17] "http://www.imdb.com/title/tt3749900/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_17" ## [18] "http://www.imdb.com/title/tt5555260/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_18" ## [19] "http://www.imdb.com/title/tt5171438/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_19" ## [20] "http://www.imdb.com/title/tt1898069/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_20" ## [21] "http://www.imdb.com/title/tt2467372/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_21" ## [22] "http://www.imdb.com/title/tt0898266/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_22" ## [23] "http://www.imdb.com/title/tt4574334/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_23" ## [24] "http://www.imdb.com/title/tt0386676/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_24" ## [25] "http://www.imdb.com/title/tt2306299/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_25" ## [26] "http://www.imdb.com/title/tt7908628/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_26" ## [27] "http://www.imdb.com/title/tt2442560/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_27" ## [28] "http://www.imdb.com/title/tt2193021/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_28" ## [29] "http://www.imdb.com/title/tt3107288/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_29" ## [30] "http://www.imdb.com/title/tt2085059/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_30" ## [31] "http://www.imdb.com/title/tt8295472/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_31" ## [32] "http://www.imdb.com/title/tt3865236/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_32" ## [33] "http://www.imdb.com/title/tt8416494/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_33" ## [34] "http://www.imdb.com/title/tt7016936/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_34" ## [35] "http://www.imdb.com/title/tt8398600/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_35" ## [36] "http://www.imdb.com/title/tt0108778/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_36" ## [37] "http://www.imdb.com/title/tt2356777/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_37" ## [38] "http://www.imdb.com/title/tt4270492/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_38" ## [39] "http://www.imdb.com/title/tt4052886/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_39" ## [40] "http://www.imdb.com/title/tt0203259/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_40" ## [41] "http://www.imdb.com/title/tt7043380/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_41" ## [42] "http://www.imdb.com/title/tt2661044/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_42" ## [43] "http://www.imdb.com/title/tt5691552/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_43" ## [44] "http://www.imdb.com/title/tt7371896/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_44" ## [45] "http://www.imdb.com/title/tt1586680/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_45" ## [46] "http://www.imdb.com/title/tt0364845/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_46" ## [47] "http://www.imdb.com/title/tt5348176/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_47" ## [48] "http://www.imdb.com/title/tt0452046/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_48" ## [49] "http://www.imdb.com/title/tt0903747/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_49" ## [50] "http://www.imdb.com/title/tt3743822/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_50" ## [51] "http://www.imdb.com/title/tt7767422/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_51" ## [52] "http://www.imdb.com/title/tt1632701/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_52" ## [53] "http://www.imdb.com/title/tt7587890/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_53" ## [54] "http://www.imdb.com/title/tt7599942/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_54" ## [55] "http://www.imdb.com/title/tt1442437/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_55" ## [56] "http://www.imdb.com/title/tt8103070/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_56" ## [57] "http://www.imdb.com/title/tt7235466/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_57" ## [58] "http://www.imdb.com/title/tt0367279/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_58" ## [59] "http://www.imdb.com/title/tt3526078/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_59" ## [60] "http://www.imdb.com/title/tt1759761/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_60" ## [61] "http://www.imdb.com/title/tt4016454/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_61" ## [62] "http://www.imdb.com/title/tt5834204/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_62" ## [63] "http://www.imdb.com/title/tt4236770/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_63" ## [64] "http://www.imdb.com/title/tt6468322/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_64" ## [65] "http://www.imdb.com/title/tt7491982/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_65" ## [66] "http://www.imdb.com/title/tt4532368/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_66" ## [67] "http://www.imdb.com/title/tt1378167/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_67" ## [68] "http://www.imdb.com/title/tt7772602/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_68" ## [69] "http://www.imdb.com/title/tt6859806/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_69" ## [70] "http://www.imdb.com/title/tt2364582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_70" ## [71] "http://www.imdb.com/title/tt3566726/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_71" ## [72] "http://www.imdb.com/title/tt0141842/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_72" ## [73] "http://www.imdb.com/title/tt7335184/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_73" ## [74] "http://www.imdb.com/title/tt4555364/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_74" ## [75] "http://www.imdb.com/title/tt6470478/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_75" ## [76] "http://www.imdb.com/title/tt7413448/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_76" ## [77] "http://www.imdb.com/title/tt4254242/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_77" ## [78] "http://www.imdb.com/title/tt0475784/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_78" ## [79] "http://www.imdb.com/title/tt1796960/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_79" ## [80] "http://www.imdb.com/title/tt6474378/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_80" ## [81] "http://www.imdb.com/title/tt0118401/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_81" ## [82] "http://www.imdb.com/title/tt1266020/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_82" ## [83] "http://www.imdb.com/title/tt1844624/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_83" ## [84] "http://www.imdb.com/title/tt7866098/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_84" ## [85] "http://www.imdb.com/title/tt5675620/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_85" ## [86] "http://www.imdb.com/title/tt8699270/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_86" ## [87] "http://www.imdb.com/title/tt3006802/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_87" ## [88] "http://www.imdb.com/title/tt0460649/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_88" ## [89] "http://www.imdb.com/title/tt0052520/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_89" ## [90] "http://www.imdb.com/title/tt7493974/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_90" ## [91] "http://www.imdb.com/title/tt0436992/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_91" ## [92] "http://www.imdb.com/title/tt1492179/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_92" ## [93] "http://www.imdb.com/title/tt0804503/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_93" ## [94] "http://www.imdb.com/title/tt3230854/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_94" ## [95] "http://www.imdb.com/title/tt5687612/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_95" ## [96] "http://www.imdb.com/title/tt9817298/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_96" ## [97] "http://www.imdb.com/title/tt4179452/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_97" ## [98] "http://www.imdb.com/title/tt3205802/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_98" ## [99] "http://www.imdb.com/title/tt1043813/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_99" ## [100] "http://www.imdb.com/title/tt6483832/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=332cb927-0342-42b3-815c-f9124e84021d&pf_rd_r=52RH1KM4861VKWH9RA0Z&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_100" ``` --- ### Go to each page, scrape show info Now we need a way to programatically direct R to each page on the `urls` list and run the `scrape_show_info` function on that page. .small[ ```r scrape_show_info(urls[1]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Action, Adventure, Drama, Fantasy, Romance ``` ```r scrape_show_info(urls[2]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ```r scrape_show_info(urls[3]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Hanna 1h "" ``` ] --- class: middle ## Oh no! ### 1. We're not scraping genre for every TV show! ### 2. We're repeating our code again! --- ## Update genre code .small[ ```r scrape_show_info <- function(x){ y <- read_html(x) title <- y %>% html_node("h1") %>% html_text() %>% str_trim() runtime <- y %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- y %>% * html_nodes(".see-more.canwrap~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") tibble(title = title, runtime = runtime, genres = genres) } ``` ] --- ### Try again: go to each page, scrape show info ```r scrape_show_info(urls[1]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Action, Adventure, Drama, Fantasy, Romance ``` ```r scrape_show_info(urls[2]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ```r scrape_show_info(urls[3]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Hanna 1h Action, Drama ``` -- **... but we're still repeating our code!** --- ## Automation - We need a way to programatically repeat the code - There are two ways to do this: - use a **`for`** loop - **`map`**ping with functional programming --- ## `for` loops - **`for`** loops are the simplest and most common type of loop in R - Iterate through the elements of a vector and evaluate the code block for each **Goal**: Scrape info from individual pages of TV shows using iteration with for loops. We'll use only 5 shows for now to keep things simple. --- ## `for` loop ### 1) Set up a tibble to store the results ```r n <- 5 top_n_shows <- tibble( title = rep(NA, n), runtime = rep(NA, n), genres = rep(NA, n), epsiodes = rep(NA, n), keywords = rep(NA, n) ) top_n_shows ``` ``` ## # A tibble: 5 x 5 ## title runtime genres epsiodes keywords ## <lgl> <lgl> <lgl> <lgl> <lgl> ## 1 NA NA NA NA NA ## 2 NA NA NA NA NA ## 3 NA NA NA NA NA ## 4 NA NA NA NA NA ## 5 NA NA NA NA NA ``` --- ## `for` loop ### 2) Iterate through urls to scrape data and save results ```r for(i in 1:n){ top_n_shows[i, ] = scrape_show_info(urls[i]) } top_n_shows ``` ``` ## # A tibble: 5 x 5 ## title runtime genres epsiodes keywords ## <chr> <chr> <chr> <chr> <chr> ## 1 Game of Thro… 57min Action, Adventure, Drama, F… Game of Thro… 57min ## 2 The Walking … 44min Drama, Horror, Thriller The Walking … 44min ## 3 Hanna 1h Action, Drama Hanna 1h ## 4 Santa Clarit… 30min Comedy, Horror Santa Clarit… 30min ## 5 The OA 1h Drama, Fantasy, Mystery, Sc… The OA 1h ``` --- ## `map`ping - **`map`** functions transform the input by applying a function to each element and returning an object the same length as the input - There are various map functions (e.g. `map_lgl()`, `map_chr()`, `map_dbl()`, `map_df()`) - each of which return a different type of object (logical, character, double, and data frame, respectively) -- - We will `map` the `scrape_show_info` function to each element of `urls` - This will go to each url at a time and get the info **Goal:** Scrape info from individual pages of TV shows using functional programming with mapping. We'll use only 5 shows for now to keep things simple. --- ### `map`: Go to each page, scrape show info ```r top_n_shows <- map_df(urls[1:n], scrape_show_info) top_n_shows ``` ``` ## # A tibble: 5 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Action, Adventure, Drama, Fantasy, Romance ## 2 The Walking Dead 44min Drama, Horror, Thriller ## 3 Hanna 1h Action, Drama ## 4 Santa Clarita Diet 30min Comedy, Horror ## 5 The OA 1h Drama, Fantasy, Mystery, Sci-Fi ``` --- ## Slow down the function - If you get `HTTP Error 429 (Too man requests)` you might want to slow down your hits. - You can add a `Sys.sleep()` call to slow down your function: ```r scrape_show_info <- function(x){ #suspend execution between 0 to 1 seconds * Sys.sleep(runif(1)) ... } ```