class: center, middle, inverse, title-slide

# Functionals
## Statistical Computing & Programming
### Shawn Santo

---

## Supplementary materials

Full video lecture available in Zoom Cloud Recordings

Additional resources

- [Sections 9.1 - 9.4](https://adv-r.hadley.nz/functionals.html), Advanced R
- `purrr` [tutorial](https://jennybc.github.io/purrr-tutorial/index.html)
- `purrr` [cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf)

---

## Announcements

- Homework 02 due today at 11:59pm ET

- Homework 03 out later this afternoon (same teams)
    - Focus: `dplyr`, `purrr`, and branching

- Team evaluation to go out during Homework 03

---

## What is a functional?

A functional is a function that takes a function as an input and returns a
vector as output.

Example 1:

```r
random <- function(fcn, ...) fcn(n = 5, ...)
```

--

Example 2:

```r
fixed_point <- function(f, x0, tol = .0001, ...) {
  y <- f(x0, ...)
  x_new <- x0

  # iterate until successive values differ by at most tol
  while (abs(y - x_new) > tol) {
    x_new <- y
    y <- f(x_new, ...)
  }

  return(x_new)
}
```

---

```r
set.seed(23545)
random(rnorm)
```

```
#> [1] -0.4142173  0.9755688  1.4615454  1.2557218 -0.3839461
```

```r
random(rbinom, size = 40, prob = 0.5)
```

```
#> [1] 24 24 23 22 17
```

```r
random(rexp)
```

```
#> [1] 0.4426210 2.6295953 0.5252317 0.6627861 0.3856905
```

--

```r
fixed_point(cos, 1)
```

```
#> [1] 0.7391302
```

```r
fixed_point(sin, 0)
```

```
#> [1] 0
```

```r
fixed_point(f = sqrt, x0 = .01, tol = .000000001)
```

```
#> [1] 1
```

---

## Functional programming

Functionals rely on first-class functions, one of the features that make a
language a functional programming language.

<center>
<img src="images/functional_programming.png">
</center>

---

## Why use functionals?

Bjarne Stroustrup provides a concise response to this question.

> To become significantly more reliable, code must become more transparent. In particular, nested conditions and loops must be viewed with great suspicion. Complicated control flows confuse programmers.
Messy code often hides bugs.

<br/><br/>

We'll be focusing on functionals that aid in automation. With enough care and
effort, functionals will soon serve as a good alternative to your `for` loops.

---

class: inverse, center, middle

# Apply functions

---

## `[a-z]pply()` functions

The apply functions are a collection of tools for functional programming in R.
They are variations of the `map` function found in many other languages.

- `lapply()`
- `sapply()`
- `apply()`
- `vapply()`
- `mapply()`
- `rapply()`
- `eapply()`

<i>
In many of the examples that follow, a functional is not required to accomplish
the task. R's vectorization capabilities can do the job. The examples are for
demonstration purposes only.
</i>

---

## `lapply()`

Usage: `lapply(X, FUN, ...)`

`lapply()` **returns a list** of the same length as `X`, each element of which
is the result of applying `FUN` to the corresponding element of `X`.

<br/>

.pull-left[
```r
lapply(1:8, sqrt) %>% str()
```

```
#> List of 8
#>  $ : num 1
#>  $ : num 1.41
#>  $ : num 1.73
#>  $ : num 2
#>  $ : num 2.24
#>  $ : num 2.45
#>  $ : num 2.65
#>  $ : num 2.83
```
]

.pull-right[
```r
lapply(1:8, function(x) (x+1)^2) %>% str()
```

```
#> List of 8
#>  $ : num 4
#>  $ : num 9
#>  $ : num 16
#>  $ : num 25
#>  $ : num 36
#>  $ : num 49
#>  $ : num 64
#>  $ : num 81
```
]

---

Another perspective:

.pull-left[
```r
lapply(1:8, sqrt) %>% str()
```

```
#> List of 8
#>  $ : num 1
#>  $ : num 1.41
#>  $ : num 1.73
#>  $ : num 2
#>  $ : num 2.24
#>  $ : num 2.45
#>  $ : num 2.65
#>  $ : num 2.83
```
]

.pull-right[
```r
list(
  sqrt(1), sqrt(2), sqrt(3), sqrt(4),
  sqrt(5), sqrt(6), sqrt(7), sqrt(8)
) %>% str()
```

```
#> List of 8
#>  $ : num 1
#>  $ : num 1.41
#>  $ : num 1.73
#>  $ : num 2
#>  $ : num 2.24
#>  $ : num 2.45
#>  $ : num 2.65
#>  $ : num 2.83
```
]

---

```r
lapply(1:8, function(x, pow) x ^ pow, 3) %>% str()
```

```
#> List of 8
#>  $ : num 1
#>  $ : num 8
#>  $ : num 27
#>  $ : num 64
#>  $ : num 125
#>  $ : num 216
#>  $ : num 343
#>  $ : num 512
```

--

```r
pow <- function(x, pow) x ^ pow
lapply(1:8, pow, x = 2) %>% str()
```

```
#> List of 8
#>  $ : num 2
#>  $ : num 4
#>  $ : num 8
#>  $ : num 16
#>  $ : num 32
#>  $ : num 64
#>  $ : num 128
#>  $ : num 256
```

---

## `sapply()`

Usage: `sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)`

`sapply()` is a *user-friendly* wrapper around `lapply()`; it is a
*simplifying* version of `lapply()`. Whenever possible it will return a vector,
matrix, or an array.

<br/>

```r
sapply(1:8, sqrt) %>% round(2)
```

```
#> [1] 1.00 1.41 1.73 2.00 2.24 2.45 2.65 2.83
```

```r
sapply(1:8, function(x) (x + 1)^2)
```

```
#> [1]  4  9 16 25 36 49 64 81
```

---

```r
sapply(1:8, function(x) c(x, x^2, x^3, x^4))
```

```
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    2    3    4    5    6    7    8
#> [2,]    1    4    9   16   25   36   49   64
#> [3,]    1    8   27   64  125  216  343  512
#> [4,]    1   16   81  256  625 1296 2401 4096
```

```r
sapply(1:8, function(x) list(x, x^2, x^3, x^4))
```

```
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1    2    3    4    5    6    7    8
#> [2,] 1    4    9    16   25   36   49   64
#> [3,] 1    8    27   64   125  216  343  512
#> [4,] 1    16   81   256  625  1296 2401 4096
```

---

```r
sapply(2:6, seq)
```

```
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 1 2 3
#>
#> [[3]]
#> [1] 1 2 3 4
#>
#> [[4]]
#> [1] 1 2 3 4 5
#>
#> [[5]]
#> [1] 1 2 3 4 5 6
```

**Why do we have a list?**

<br/>

--

```r
sapply(2:6, seq, from = 1, length.out = 4)
```

```
#>          [,1]     [,2] [,3]     [,4]     [,5]
#> [1,] 1.000000 1.000000    1 1.000000 1.000000
#> [2,] 1.333333 1.666667    2 2.333333 2.666667
#> [3,] 1.666667 2.333333    3 3.666667 4.333333
#> [4,] 2.000000 3.000000    4 5.000000 6.000000
```

---

## `[ls]apply()` and data frames

We can use these functions with data frames; the key is to remember that a
data frame is just a fancy list.
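
The "fancy list" claim can be checked directly. A minimal sketch, assuming a small hypothetical data frame `toy` that is not part of the lecture data:

```r
# Hypothetical data frame, used only for illustration
toy <- data.frame(a = 1:3, b = c("x", "y", "z"))

is_a_list   <- is.list(toy)                    # a data frame is a list of columns
n_elements  <- length(toy)                     # one list element per column
same_column <- identical(toy[["a"]], toy$a)    # columns are extracted like list elements

# lapply() visits one list element at a time, so FUN receives a whole column
col_lengths <- lapply(toy, length)
```

Because each list element is a column, `lapply()` and `sapply()` applied to a data frame work column by column.
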

```r
df <- data.frame(a = 1:6, b = letters[1:6], c = c(TRUE,FALSE))
lapply(df, class) %>% str()
```

```
#> List of 3
#>  $ a: chr "integer"
#>  $ b: chr "character"
#>  $ c: chr "logical"
```

```r
sapply(df, class)
```

```
#>           a           b           c
#>   "integer" "character"   "logical"
```

---

## More in the family

- `apply(X, MARGIN, FUN, ...)`
    - applies a function over the rows or columns of a data frame, matrix, or array

- `vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)`
    - is similar to `sapply()`, but has an enforced return type and size

- `mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)`
    - like `sapply()`, but will iterate over multiple vectors at the same time

- `rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)`
    - a recursive version of `lapply()`; behavior depends largely on the `how` argument

- `eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)`
    - applies a function over an environment

---

## Exercise

Using `sw_people` in package `repurrrsive`, extract the name of all characters using:

- a for loop,

- an apply function.

.tiny[
```r
library(repurrrsive)
str(sw_people[[1]])
```

```
#> List of 16
#>  $ name      : chr "Luke Skywalker"
#>  $ height    : chr "172"
#>  $ mass      : chr "77"
#>  $ hair_color: chr "blond"
#>  $ skin_color: chr "fair"
#>  $ eye_color : chr "blue"
#>  $ birth_year: chr "19BBY"
#>  $ gender    : chr "male"
#>  $ homeworld : chr "http://swapi.co/api/planets/1/"
#>  $ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
#>  $ species   : chr "http://swapi.co/api/species/1/"
#>  $ vehicles  : chr [1:2] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
#>  $ starships : chr [1:2] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
#>  $ created   : chr "2014-12-09T13:50:51.644000Z"
#>  $ edited    : chr "2014-12-20T21:17:56.891000Z"
#>  $ url       : chr "http://swapi.co/api/people/1/"
```
]

*Hint:* `[` and `[[` are functions.

???

## Solutions

```r
out <- character(length = length(sw_people))

for (i in seq_along(sw_people)) {
  out[i] <- sw_people[[i]]$name
}
```

```r
s_out <- sapply(sw_people, `[[`, "name")
```

---

class: inverse, center, middle

# Package `purrr`

---

## Why `purrr`?

- Member of the `tidyverse` package

- Improves the functional programming tools in R

- The `map()` family of functions can be used to replace loops and `[a-z]pply()`

- The first argument is always the data, so purrr works naturally with the pipe.

- All purrr functions are type-stable. They always return the advertised output type, or they throw an error.

- All `map()` functions accept a function, a formula (used for succinctly generating anonymous functions), a character vector (used to extract components by name), or a numeric vector (used to extract by position).

<br/>

Load `tidyverse`.

```r
library(tidyverse)
```

---

## Map functions

Basic functions for looping over an object and returning a value
(of a specific type).

| Map variant              | Description                            |
|--------------------------|----------------------------------------|
| `map()`                  | returns a list                         |
| `map_lgl()`              | returns a logical vector               |
| `map_int()`              | returns an integer vector              |
| `map_dbl()`              | returns a double vector                |
| `map_chr()`              | returns a character vector             |
| `map_df()` / `map_dfr()` | returns a data frame by row binding    |
| `map_dfc()`              | returns a data frame by column binding |

<br/>

All have leading arguments `.x` and `.f`, where `.x` is a list or atomic vector,
and `.f` is a function, formula, or vector.
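
As a minimal sketch of the table above, the suffix fixes the return type, and `.f` can be a function, a one-sided formula, or a position. The list `scores` is hypothetical and used only for illustration.

```r
library(purrr)

# Hypothetical input, used only for illustration
scores <- list(a = c(1, 2, 3), b = c(10, 20))

n_list  <- map(scores, length)           # always a list
n_int   <- map_int(scores, length)       # integer vector
avg_fun <- map_dbl(scores, mean)         # .f given as a function
avg_fml <- map_dbl(scores, ~ mean(.x))   # .f given as a one-sided formula
first   <- map_dbl(scores, 1)            # .f given as a position: extracts element 1
```

Names on `.x` carry over to the result, so each output here is named `a` and `b`.
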

---

## `map_*()` is strict

```r
x <- list(1L:5L, c(-2, .2, -20), c(pi, sqrt(2), 7))
```

```r
map_dbl(x, mean)
```

```
#> [1]  3.000000 -7.266667  3.851935
```

```r
map_chr(x, mean)
```

```
#> [1] "3.000000"  "-7.266667" "3.851935"
```

```r
map_lgl(x, mean)
```

```
#> Error: Can't coerce element 1 from a double to a logical
```

```r
map_int(x, mean)
```

```
#> Error: Can't coerce element 1 from a double to a integer
```

---

```r
x <- list(1L:5L, c(-2, .2, -20), c(pi, sqrt(2), 7))
```

--

```r
map_dbl(x, `[`, 1)
```

```
#> [1]  1.000000 -2.000000  3.141593
```

```r
map_chr(x, `[`, 3)
```

```
#> [1] "3"          "-20.000000" "7.000000"
```

```r
map_lgl(x, `[`, 1)
```

```
#> Error: Can't coerce element 1 from a integer to a logical
```

```r
map_int(x, `[`, 1)
```

```
#> Error: Can't coerce element 2 from a double to a integer
```

---

## Flexibility in `.f`

Argument `.f` in `map()` and `map_*()` can be a

- function name,

- formula (one-sided) / anonymous function, or a

- vector:
    - character vector
    - numeric vector
    - list

If it is a formula, it is converted to a function. Arguments can be referenced
in the following ways.

1. For a single argument function, use `.`
2. For a two argument function, use `.x` and `.y`
3. For more arguments, use `..1`, `..2`, `..3` etc.

---

## Examples

.pull-left[

Using `purrr`

```r
map_dbl(1:5, ~ . ^ .)
```

```
#> [1]    1    4   27  256 3125
```

```r
map_dbl(1:5, ~ .x ^ .x)
```

```
#> [1]    1    4   27  256 3125
```

```r
map2_dbl(1:5, -1:-5, ~ .y ^ .x)
```

```
#> [1]    -1     4   -27   256 -3125
```

```r
pmap_dbl(data.frame(1:5, 1:5, 1:5), ~ ..1 + ..2 + ..3)
```

```
#> [1]  3  6  9 12 15
```

]

.pull-right[

Using Base R

```r
sapply(1:5, function(x) x ^ x)
```

```
#> [1]    1    4   27  256 3125
```

```r
sapply(1:5, function(x) x ^ x)
```

```
#> [1]    1    4   27  256 3125
```

```r
mapply(function(x, y) y ^ x, x = 1:5, y = 1:5)
```

```
#> [1]    1    4   27  256 3125
```

```r
mapply(function(x, y, z) x + y + z, x = 1:5, y = 1:5, z = 1:5)
```

```
#> [1]  3  6  9 12 15
```

]

---

## More examples

Consider `gh_users` from package `repurrrsive`.

```r
library(repurrrsive)
str(gh_users, max.level = 1)
```

```
#> List of 6
#>  $ :List of 30
#>  $ :List of 30
#>  $ :List of 30
#>  $ :List of 30
#>  $ :List of 30
#>  $ :List of 30
```

---

.tiny[
```r
str(gh_users[[1]], max.level = 1)
```

```
#> List of 30
#>  $ login              : chr "gaborcsardi"
#>  $ id                 : int 660288
#>  $ avatar_url         : chr "https://avatars.githubusercontent.com/u/660288?v=3"
#>  $ gravatar_id        : chr ""
#>  $ url                : chr "https://api.github.com/users/gaborcsardi"
#>  $ html_url           : chr "https://github.com/gaborcsardi"
#>  $ followers_url      : chr "https://api.github.com/users/gaborcsardi/followers"
#>  $ following_url      : chr "https://api.github.com/users/gaborcsardi/following{/other_user}"
#>  $ gists_url          : chr "https://api.github.com/users/gaborcsardi/gists{/gist_id}"
#>  $ starred_url        : chr "https://api.github.com/users/gaborcsardi/starred{/owner}{/repo}"
#>  $ subscriptions_url  : chr "https://api.github.com/users/gaborcsardi/subscriptions"
#>  $ organizations_url  : chr "https://api.github.com/users/gaborcsardi/orgs"
#>  $ repos_url          : chr "https://api.github.com/users/gaborcsardi/repos"
#>  $ events_url         : chr "https://api.github.com/users/gaborcsardi/events{/privacy}"
#>  $ received_events_url: chr "https://api.github.com/users/gaborcsardi/received_events"
#>  $ type               : chr "User"
#>  $ site_admin         : logi FALSE
#>  $ name               : chr "Gábor Csárdi"
#>  $ company            : chr "Mango Solutions, @MangoTheCat "
#>  $ blog               : chr "http://gaborcsardi.org"
#>  $ location           : chr "Chippenham, UK"
#>  $ email              : chr "csardi.gabor@gmail.com"
#>  $ hireable           : NULL
#>  $ bio                : NULL
#>  $ public_repos       : int 52
#>  $ public_gists       : int 6
#>  $ followers          : int 303
#>  $ following          : int 22
#>  $ created_at         : chr "2011-03-09T17:29:25Z"
#>  $ updated_at         : chr "2016-10-11T11:05:06Z"
```
]

---

What's happening here?

```r
map_chr(gh_users, "login")
```

```
#> [1] "gaborcsardi" "jennybc"     "jtleek"      "juliasilge"  "leeper"
#> [6] "masalmon"
```

```r
map_chr(gh_users, 1)
```

```
#> [1] "gaborcsardi" "jennybc"     "jtleek"      "juliasilge"  "leeper"
#> [6] "masalmon"
```

```r
map_chr(gh_users, 2)
```

```
#> [1] "660288"   "599454"   "1571674"  "12505835" "3505428"  "8360597"
```

--

<br/>

What if we want the `login` and `id`? Can we pass in `c(1, 2)`?

--

```r
map_chr(gh_users, c(1, 2))
```

```
#> Error: Result 1 must be a single string, not NULL of length 0
```

---

```r
map(gh_users, `[`, c(1, 2)) %>% str()
```

```
#> List of 6
#>  $ :List of 2
#>   ..$ login: chr "gaborcsardi"
#>   ..$ id   : int 660288
#>  $ :List of 2
#>   ..$ login: chr "jennybc"
#>   ..$ id   : int 599454
#>  $ :List of 2
#>   ..$ login: chr "jtleek"
#>   ..$ id   : int 1571674
#>  $ :List of 2
#>   ..$ login: chr "juliasilge"
#>   ..$ id   : int 12505835
#>  $ :List of 2
#>   ..$ login: chr "leeper"
#>   ..$ id   : int 3505428
#>  $ :List of 2
#>   ..$ login: chr "masalmon"
#>   ..$ id   : int 8360597
```

```r
map(gh_users, `[[`, c(1, 2))
```

```
#> Error in .x[[...]]: subscript out of bounds
```

---

```r
map_dbl(gh_users, list(28, 1))
```

```
#> [1]  22  34   6  10 230  38
```

```r
map_dbl(gh_users, list("following", 1))
```

```
#> [1]  22  34   6  10 230  38
```

--

<br/>

To make the above clearer:

```r
my_list <- list(
  list(x = 1:10, y = 6, z = c(9, 0)),
  list(x = 1:10, y = 6, z = c(-3, 2))
)
map_chr(my_list, list("z", 2))
```

```
#> [1] "0.000000" "2.000000"
```

```r
map_chr(my_list, list(3, 1))
```

```
#> [1] "9.000000"  "-3.000000"
```

---

```r
map_df(gh_users, `[`, c(1, 2))
```

```
#> # A tibble: 6 x 2
#>   login             id
#>   <chr>          <int>
#> 1 gaborcsardi   660288
#> 2 jennybc       599454
#> 3 jtleek       1571674
#> 4 juliasilge  12505835
#> 5 leeper       3505428
#> 6 masalmon     8360597
```

```r
map_df(gh_users, `[`, c("name", "type", "location"))
```

```
#> # A tibble: 6 x 3
#>   name                   type  location
#>   <chr>                  <chr> <chr>
#> 1 Gábor Csárdi           User  Chippenham, UK
#> 2 Jennifer (Jenny) Bryan User  Vancouver, BC, Canada
#> 3 Jeff L.                User  Baltimore,MD
#> 4 Julia Silge            User  Salt Lake City, UT
#> 5 Thomas J. Leeper       User  London, United Kingdom
#> 6 Maëlle Salmon          User  Barcelona, Spain
```

---

## More `map()` variants

- `walk()` - returns its input invisibly; calls the function exclusively for its side effects

```r
datasets <- list(mtcars, faithful, longley, cars)
file_names <- c("mtcars.csv", "faithful.csv", "longley.csv", "cars.csv")
# `file` replaces the deprecated `path` argument of write_csv()
walk2(datasets, file_names, ~ write_csv(x = .x, file = .y))
```

- `modify()` - returns the same type as the input object, useful for data frames

```r
# tibble() replaces the deprecated data_frame()
df <- tibble(x = 1:3, y = -1:-3)
modify(df, ~ .x ^ 3)
```

```
#> # A tibble: 3 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    -1
#> 2     8    -8
#> 3    27   -27
```

- `map2()` and `pmap()` to vary two and n inputs, respectively

- `imap()` iterates over values and their indices

---

## Exercises

Use `mtcars` and a single map or map variant to

- get the type of each variable,

- get the fourth row such that the result is a character vector,

- compute the mean of each variable, and

- compute the mean and median for each variable such that the result is a
  data frame with the mean values in row 1 and the median values in row 2.

<br/>

Use a map function and your `mh_distance()` function from Homework 01 to iterate
over both vectors `s` and `w` below.

```r
s <- c(26, 50123, 456.12, 8, 0)
w <- c(22, 50000, 451.00, 88, 0)
```

???

## Solutions

```r
map_chr(mtcars, typeof)
```

```
#>      mpg      cyl     disp       hp     drat       wt     qsec       vs
#> "double" "double" "double" "double" "double" "double" "double" "double"
#>       am     gear     carb
#> "double" "double" "double"
```

```r
map_chr(mtcars, 4)
```

```
#>          mpg          cyl         disp           hp         drat           wt
#>  "21.400000"   "6.000000" "258.000000" "110.000000"   "3.080000"   "3.215000"
#>         qsec           vs           am         gear         carb
#>  "19.440000"   "1.000000"   "0.000000"   "3.000000"   "1.000000"
```

```r
map_dbl(mtcars, mean)
```

```
#>        mpg        cyl       disp         hp       drat         wt       qsec
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750
#>         vs         am       gear       carb
#>   0.437500   0.406250   3.687500   2.812500
```

```r
map_df(mtcars, ~ c(mean(.), median(.)))
```

```
#> # A tibble: 2 x 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  20.1  6.19  231.  147.  3.60  3.22  17.8 0.438 0.406  3.69  2.81
#> 2  19.2  6     196.  123   3.70  3.32  17.7 0     0      4     2
```

---

## References

1. Grolemund, G., & Wickham, H. (2021). R for Data Science.
   https://r4ds.had.co.nz/

2. Wickham, H. (2021). Advanced R.
   https://adv-r.hadley.nz/