class: center, middle, inverse, title-slide # Functions and automation ### Dr. Çetinkaya-Rundel ### November 21, 2017 --- class: center, middle # Getting started --- ## Getting started - Final project assignment and the rest of the semester - Will catch up on any remaining grading over the break - Any questions from last time? --- class: center, middle # Functions --- ## ```r library(tidyverse) library(rvest) st <- read_html("http://www.imdb.com/title/tt4574334/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_1") twd <- read_html("http://www.imdb.com/title/tt1520211/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_2") got <- read_html("http://www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_3") ``` --- ## Why functions? - Automate common tasks in a power powerful and general way than copy-and-pasting: - You can give a function an evocative name that makes your code easier to understand. - As requirements change, you only need to update code in one place, instead of many. - You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another). - Down the line: Improve your reach as a data scientist by writing functions (and packages!) that others use is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has three big advantages over using copy-and-paste: --- ## When should you write a function? Whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). <div class="question"> Do you see any problems in the code below? </div> .small[ ```r st_episode <- st %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() got_episode <- got %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() twd_episode <- got %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() ``` ] --- ## Inputs <div class="question"> How many inputs does the following code have? </div> ```r st_episode <- st %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() ``` --- ## Turn your code into a function 1. Pick a **name**: Short but informative 2. List inputs, or **arguments**, to the function inside `function`. If we had more the call would look like `function(x, y, z)`. 3. You place the **code** you have developed in body of the function, a `{` block that immediately follows `function(...)`. ```r scrape_episode <- function(x){ x %>% html_nodes(".np_right_arrow .bp_sub_heading") %>% html_text() %>% str_replace(" episodes", "") %>% as.numeric() } scrape_episode(st) ``` ``` ## [1] 20 ``` --- ## Check your function ![Number of episodes in The Walking Dead](img/22/episode_twd.png) ```r scrape_episode(twd) ``` ``` ## [1] 115 ``` ![Number of episodes in Game of Thrones](img/22/episode_twd.png) ```r scrape_episode(got) ``` ``` ## [1] 73 ``` --- ## Naming functions > "There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton - Names should be short but clearly evoke what the function does - Names should be verbs, not nouns - Multi-word names should be separated by underscores (`snake_case` as opposed to `camelCase`) - A family of functions should be named similarly (`scrape_title`, `scrape_episode`, `scrape_genre`, etc.) - Avoid overwriting existing (especially widely used) functions --- ## Scraping show info .small[ ```r scrape_show_info <- function(x){ title <- x %>% html_node("#title-overview-widget h1") %>% html_text() %>% str_trim() runtime <- x %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- x %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") show_info <- data_frame(title = title, runtime = runtime, genres = genres) return(show_info) } ``` ] --- .small[ ```r scrape_show_info(st) ``` ``` ## # A tibble: 1 x 3 ## title runtime ## <chr> <chr> ## 1 Stranger Things 51min ## # ... with 1 more variables: genres <chr> ``` ```r scrape_show_info(twd) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ```r scrape_show_info(got) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Adventure, Drama, Fantasy, Romance ``` ] --- ## Let's take things one step back ```r st_url <- "http://www.imdb.com/title/tt4574334/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_1" twd_url <- "http://www.imdb.com/title/tt1520211/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_2" got_url <- "http://www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=1KS3NCRE9SZJ5NTFZAFD&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_3" ``` --- <div class="question"> How would you update the following function to use the URL of the page as an argument? </div> .small[ ```r scrape_show_info <- function(x){ title <- x %>% html_node("#title-overview-widget h1") %>% html_text() %>% str_trim() runtime <- x %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- x %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") show_info <- data_frame(title = title, runtime = runtime, genres = genres) return(show_info) } ``` ] --- .small[ ```r scrape_show_info <- function(x){ y <- read_html(x) title <- y %>% html_node("#title-overview-widget h1") %>% html_text() %>% str_trim() runtime <- y %>% html_node("time") %>% html_text() %>% str_replace("\\n", "") %>% str_trim() genres <- y %>% html_nodes(".txt-block~ .canwrap a") %>% html_text() %>% str_trim() %>% paste(collapse = ", ") show_info <- data_frame(title = title, runtime = runtime, genres = genres) return(show_info) } ``` ] --- ## Let's check .small[ ```r scrape_show_info(st_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime ## <chr> <chr> ## 1 Stranger Things 51min ## # ... with 1 more variables: genres <chr> ``` ```r scrape_show_info(twd_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ```r scrape_show_info(got_url) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 Game of Thrones 57min Adventure, Drama, Fantasy, Romance ``` ] --- class: center, middle # Automation --- <div class="question"> You now have a function that will scrape the relevant info on shows given its URL. Where can we get a list of URLs of top 100 most popular TV shows on IMDB? Write the code for doing this in your teams. </div> --- ```r show_urls <- read_html("http://www.imdb.com/chart/tvmeter") %>% html_nodes(".titleColumn a") %>% html_attr("href") %>% paste("http://www.imdb.com", ., sep = "") # dot = outout of previous line ``` ``` ## [1] "http://www.imdb.com/title/tt5675620/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_1" ## [2] "http://www.imdb.com/title/tt4574334/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_2" ## [3] "http://www.imdb.com/title/tt1520211/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_3" ## [4] "http://www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_4" ## [5] "http://www.imdb.com/title/tt1586680/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_5" ## [6] "http://www.imdb.com/title/tt5516154/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_6" ## [7] "http://www.imdb.com/title/tt3107288/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_7" ## [8] "http://www.imdb.com/title/tt2442560/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_8" ## [9] "http://www.imdb.com/title/tt2193021/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_9" ## [10] "http://www.imdb.com/title/tt0413573/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_10" ## [11] "http://www.imdb.com/title/tt1236246/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_11" ## [12] "http://www.imdb.com/title/tt5555260/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_12" ## [13] "http://www.imdb.com/title/tt5420376/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_13" ## [14] "http://www.imdb.com/title/tt4975856/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_14" ## [15] "http://www.imdb.com/title/tt6470478/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_15" ## [16] "http://www.imdb.com/title/tt1844624/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_16" ## [17] "http://www.imdb.com/title/tt5290382/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_17" ## [18] "http://www.imdb.com/title/tt5691552/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_18" ## [19] "http://www.imdb.com/title/tt0460681/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_19" ## [20] "http://www.imdb.com/title/tt2306299/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_20" ## [21] "http://www.imdb.com/title/tt5171438/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_21" ## [22] "http://www.imdb.com/title/tt4052886/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_22" ## [23] "http://www.imdb.com/title/tt3006802/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_23" ## [24] "http://www.imdb.com/title/tt1836037/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_24" ## [25] "http://www.imdb.com/title/tt4016454/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_25" ## [26] "http://www.imdb.com/title/tt4396630/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_26" ## [27] "http://www.imdb.com/title/tt3749900/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_27" ## [28] "http://www.imdb.com/title/tt4158110/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_28" ## [29] "http://www.imdb.com/title/tt2741602/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_29" ## [30] "http://www.imdb.com/title/tt6048596/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_30" ## [31] "http://www.imdb.com/title/tt4532368/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_31" ## [32] "http://www.imdb.com/title/tt1442437/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_32" ## [33] "http://www.imdb.com/title/tt1843230/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_33" ## [34] "http://www.imdb.com/title/tt0898266/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_34" ## [35] "http://www.imdb.com/title/tt2861424/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_35" ## [36] "http://www.imdb.com/title/tt0452046/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_36" ## [37] "http://www.imdb.com/title/tt0108778/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_37" ## [38] "http://www.imdb.com/title/tt1051220/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_38" ## [39] "http://www.imdb.com/title/tt0475784/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_39" ## [40] "http://www.imdb.com/title/tt3205802/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_40" ## [41] "http://www.imdb.com/title/tt2085059/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_41" ## [42] "http://www.imdb.com/title/tt2467372/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_42" ## [43] "http://www.imdb.com/title/tt4474344/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_43" ## [44] "http://www.imdb.com/title/tt0364845/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_44" ## [45] "http://www.imdb.com/title/tt3322312/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_45" ## [46] "http://www.imdb.com/title/tt1632701/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_46" ## [47] "http://www.imdb.com/title/tt1034007/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_47" ## [48] "http://www.imdb.com/title/tt6111130/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_48" ## [49] "http://www.imdb.com/title/tt2707408/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_49" ## [50] "http://www.imdb.com/title/tt5296406/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_50" ## [51] "http://www.imdb.com/title/tt0386676/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_51" ## [52] "http://www.imdb.com/title/tt0903747/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_52" ## [53] "http://www.imdb.com/title/tt6473344/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_53" ## [54] "http://www.imdb.com/title/tt6524350/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_54" ## [55] "http://www.imdb.com/title/tt0203259/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_55" ## [56] "http://www.imdb.com/title/tt2661044/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_56" ## [57] "http://www.imdb.com/title/tt2364582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_57" ## [58] "http://www.imdb.com/title/tt0264235/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_58" ## [59] "http://www.imdb.com/title/tt2249007/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_59" ## [60] "http://www.imdb.com/title/tt5834204/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_60" ## [61] "http://www.imdb.com/title/tt1600194/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_61" ## [62] "http://www.imdb.com/title/tt1837576/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_62" ## [63] "http://www.imdb.com/title/tt3514324/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_63" ## [64] "http://www.imdb.com/title/tt6461736/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_64" ## [65] "http://www.imdb.com/title/tt0460649/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_65" ## [66] "http://www.imdb.com/title/tt2372162/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_66" ## [67] "http://www.imdb.com/title/tt1405406/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_67" ## [68] "http://www.imdb.com/title/tt5071412/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_68" ## [69] "http://www.imdb.com/title/tt1475582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_69" ## [70] "http://www.imdb.com/title/tt4154858/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_70" ## [71] "http://www.imdb.com/title/tt1378167/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_71" ## [72] "http://www.imdb.com/title/tt2805096/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_72" ## [73] "http://www.imdb.com/title/tt0472954/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_73" ## [74] "http://www.imdb.com/title/tt1837492/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_74" ## [75] "http://www.imdb.com/title/tt4955642/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_75" ## [76] "http://www.imdb.com/title/tt1796960/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_76" ## [77] "http://www.imdb.com/title/tt2660806/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_77" ## [78] "http://www.imdb.com/title/tt0436992/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_78" ## [79] "http://www.imdb.com/title/tt1595859/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_79" ## [80] "http://www.imdb.com/title/tt2802850/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_80" ## [81] "http://www.imdb.com/title/tt4998350/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_81" ## [82] "http://www.imdb.com/title/tt4686698/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_82" ## [83] "http://www.imdb.com/title/tt0121955/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_83" ## [84] "http://www.imdb.com/title/tt6274614/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_84" ## [85] "http://www.imdb.com/title/tt3566726/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_85" ## [86] "http://www.imdb.com/title/tt6226232/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_86" ## [87] "http://www.imdb.com/title/tt1210820/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_87" ## [88] "http://www.imdb.com/title/tt1124373/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_88" ## [89] "http://www.imdb.com/title/tt0182576/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_89" ## [90] "http://www.imdb.com/title/tt1492179/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_90" ## [91] "http://www.imdb.com/title/tt2356777/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_91" ## [92] "http://www.imdb.com/title/tt4786824/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_92" ## [93] "http://www.imdb.com/title/tt1826940/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_93" ## [94] "http://www.imdb.com/title/tt0157246/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_94" ## [95] "http://www.imdb.com/title/tt1856010/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_95" ## [96] "http://www.imdb.com/title/tt0065333/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_96" ## [97] "http://www.imdb.com/title/tt1561755/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_97" ## [98] "http://www.imdb.com/title/tt0165598/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_98" ## [99] "http://www.imdb.com/title/tt3846642/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_99" ## [100] "http://www.imdb.com/title/tt5368542/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084122&pf_rd_r=0PASEFYRMDQNPC0PND20&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=tvmeter&ref_=chttvm_tt_100" ``` --- ## Go to each page, scrape show info Now we need a way of programatically directing R to each page on the `show_urls` list and running the `scrape_show_info` function on that page. .small[ ```r scrape_show_info(show_urls[1]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Punisher 44min ``` ```r scrape_show_info(show_urls[2]) ``` ``` ## # A tibble: 1 x 3 ## title runtime ## <chr> <chr> ## 1 Stranger Things 51min ## # ... with 1 more variables: genres <chr> ``` ```r scrape_show_info(show_urls[3]) ``` ``` ## # A tibble: 1 x 3 ## title runtime genres ## <chr> <chr> <chr> ## 1 The Walking Dead 44min Drama, Horror, Thriller ``` ] --- ## Go to each page, scrape show info In other words, we want to **map** the `scrape_show_info` function to each element of `show_urls`: ```r top_100_shows <- map_df(show_urls, scrape_show_info) ``` ---