Functional Programming

First-class functions
Pure functions
Anonymous functions
Vectorized functions
Closures
Recursion
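
As a rough illustration of the terms above, the following base R sketch gives one small example of each idea. The names used here (`square`, `fns`, `add`, `make_counter`, `fact`) are made up for this illustration and are not part of the original material.

```r
# Functions are ordinary objects: they can be stored in a list and passed around
square <- function(x) x^2
fns <- list(sq = square, root = sqrt)
fns$sq(3)

# Pure function: the result depends only on the inputs, with no side effects
add <- function(x, y) x + y

# Anonymous function: defined inline and never bound to a name
(function(x) x + 1)(2)

# Vectorized function: operates on a whole vector at once
sqrt(c(1, 4, 9))

# Closure: a function that remembers the environment it was created in
make_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
counter <- make_counter()
counter()
counter()

# Recursion: a function that calls itself
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
fact(5)
```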

Apply functions

The apply functions are a collection of tools for functional programming in R. They are all variations of the map function: each applies a function to every element of some collection and gathers the results.
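
To make the "map" idea concrete, here is a minimal sketch of what such a function does, written with an explicit loop. `my_map` is a hypothetical name used only for illustration; the real `lapply` is implemented in C and covers more cases than this.

```r
# Hypothetical, simplified map: apply f to each element of x, collect in a list
my_map <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    # out[i] <- list(...) keeps the slot even if f returns NULL
    out[i] <- list(f(x[[i]], ...))
  }
  names(out) <- names(x)
  out
}

# For this simple input the sketch agrees with lapply
identical(my_map(1:3, sqrt), lapply(1:3, sqrt))
```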

??apply
## 
## Help files with alias or concept or title matching ‘apply’ using fuzzy
## matching:
## 
## base::apply             Apply Functions Over Array Margins
## base::.subset           Internal Objects in Package 'base'
## base::by                Apply a Function to a Data Frame Split by Factors
## base::eapply            Apply a Function Over Values in an Environment
## base::lapply            Apply a Function over a List or Vector
## base::mapply            Apply a Function to Multiple List or Vector Arguments
## base::rapply            Recursively Apply a Function to a List
## base::tapply            Apply a Function Over a Ragged Array

lapply

Usage: lapply(X, FUN, ...)

lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

library(magrittr)  # provides the %>% pipe used throughout
lapply(1:8, sqrt) %>% str()

## List of 8
##  $ : num 1
##  $ : num 1.41
##  $ : num 1.73
##  $ : num 2
##  $ : num 2.24
##  $ : num 2.45
##  $ : num 2.65
##  $ : num 2.83

lapply(1:8, function(x) (x+1)^2) %>% str()

## List of 8
##  $ : num 4
##  $ : num 9
##  $ : num 16
##  $ : num 25
##  $ : num 36
##  $ : num 49
##  $ : num 64
##  $ : num 81

lapply(1:8, function(x, pow) x^pow, pow=3) %>% str()

## List of 8
##  $ : num 1
##  $ : num 8
##  $ : num 27
##  $ : num 64
##  $ : num 125
##  $ : num 216
##  $ : num 343
##  $ : num 512

lapply(1:8, function(x, pow) x^pow, x=2) %>% str()

## List of 8
##  $ : num 2
##  $ : num 4
##  $ : num 8
##  $ : num 16
##  $ : num 32
##  $ : num 64
##  $ : num 128
##  $ : num 256
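
The last call can look surprising. One way to read it, based on R's usual argument matching: because `x = 2` is supplied by name, each element of `1:8` is matched to the remaining parameter, `pow`, so the call computes 2^1, 2^2, and so on. The sketch below spells this out; `f` is just a name given here to the anonymous function above.

```r
f <- function(x, pow) x^pow

# lapply calls f(element, ...); with x = 2 passed by name, the element fills `pow`
r1 <- lapply(1:8, f, x = 2)

# The same computation with the matching written out explicitly
r2 <- lapply(1:8, function(i) f(x = 2, pow = i))

identical(r1, r2)
```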

d = list(n = rnorm(100), e = rexp(100), ln = rlnorm(100))
lapply(d, quantile) %>% str()

## List of 3
##  $ n : Named num [1:5] -2.831 -0.76 -0.179 0.611 2.394
##   ..- attr(*, "names")= chr [1:5] "0%" "25%" "50%" "75%" ...
##  $ e : Named num [1:5] 0.0103 0.2121 0.5628 1.2516 4.6757
##   ..- attr(*, "names")= chr [1:5] "0%" "25%" "50%" "75%" ...
##  $ ln: Named num [1:5] 0.0745 0.4207 0.7146 1.3067 12.9503
##   ..- attr(*, "names")= chr [1:5] "0%" "25%" "50%" "75%" ...

sapply

Usage: sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

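sapply is a user-friendly wrapper around lapply that, by default, simplifies the result to a vector or a matrix when appropriate (or to an array when simplify = "array"). If the results cannot be simplified, a list is returned.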

sapply(1:8, sqrt)

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427

sapply(1:8, function(x) (x+1)^2)

## [1]  4  9 16 25 36 49 64 81

sapply(1:8, function(x) c(x, x^2, x^3, x^4))

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    3    4    5    6    7    8
## [2,]    1    4    9   16   25   36   49   64
## [3,]    1    8   27   64  125  216  343  512
## [4,]    1   16   81  256  625 1296 2401 4096

sapply(1:8, function(x) list(x, x^2, x^3, x^4))

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1    2    3    4    5    6    7    8   
## [2,] 1    4    9    16   25   36   49   64  
## [3,] 1    8    27   64   125  216  343  512 
## [4,] 1    16   81   256  625  1296 2401 4096

d = list(norm = rnorm(100), exp = rexp(100), log_norm = rlnorm(100))
sapply(d, quantile)

##             norm         exp   log_norm
## 0%   -3.62023428 0.005119925  0.1554211
## 25%  -0.56898037 0.295745680  0.5194694
## 50%   0.03760506 0.767683150  1.0565506
## 75%   0.58875274 1.141757924  2.1052268
## 100%  2.17344045 5.754529196 16.9475848

sapply(2:6, seq)

## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] 1 2 3
## 
## [[3]]
## [1] 1 2 3 4
## 
## [[4]]
## [1] 1 2 3 4 5
## 
## [[5]]
## [1] 1 2 3 4 5 6
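
The examples above suggest that the shape of the `sapply` result depends on the lengths of the individual results. A small sketch of the three cases, using made-up variable names:

```r
# All results have length 2, so they are bound into a matrix (one column per input)
res_same <- sapply(1:3, function(x) c(x, x^2))

# Results have different lengths, so sapply falls back to a list
res_diff <- sapply(1:3, seq_len)

# Simplification can also be switched off explicitly
res_list <- sapply(1:3, sqrt, simplify = FALSE)

is.matrix(res_same)
is.list(res_diff)
is.list(res_list)
```

Because the return type depends on the data, `sapply` is convenient interactively but can be awkward inside functions.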

vapply

Usage: vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

d = list(1:3, 1:7, c(1,1,2,3,4))

sapply(d, function(x) x[x==2])

## [1] 2 2 2

sapply(d, function(x) x[x==1]) %>% str()

## List of 3
##  $ : int 1
##  $ : int 1
##  $ : num [1:2] 1 1

vapply(d, function(x) x[x==2], 1)

## [1] 2 2 2

vapply(d, function(x) x[x==1], 1)

## Error in vapply(d, function(x) x[x == 1], 1): values must be length 1,
##  but FUN(X[[3]]) result is length 2

vapply(1:3, function(x) c(x,letters[x]), c(1,1))

## Error in vapply(1:3, function(x) c(x, letters[x]), c(1, 1)): values must be type 'double',
##  but FUN(X[[1]]) result is type 'character'

vapply(1:3, function(x) c(x,letters[x]), c("",""))

##      [,1] [,2] [,3]
## [1,] "1"  "2"  "3" 
## [2,] "a"  "b"  "c"
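
Another place where the difference shows up is zero-length input. The sketch below compares the two functions on an empty vector; `numeric(1)` is simply a more explicit way of writing the template `1` used above.

```r
empty <- integer(0)

# sapply has nothing to simplify and returns an empty list
s <- sapply(empty, sqrt)

# vapply still returns the declared type, here an empty double vector
v <- vapply(empty, sqrt, numeric(1))

class(s)
class(v)
```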

[ls]apply and data frames

We can easily use these functions with data frames. The key is to remember that a data frame is just a fancy list whose columns are vectors of the same length, so lapply and sapply iterate over its columns.

# Output below is from R < 4.0.0; from R 4.0.0 on, b is "character" unless stringsAsFactors = TRUE
df = data.frame(a = 1:6, b = letters[1:6], c = c(TRUE,FALSE))
lapply(df, class) %>% str()

## List of 3
##  $ a: chr "integer"
##  $ b: chr "factor"
##  $ c: chr "logical"

sapply(df, class)

##         a         b         c 
## "integer"  "factor" "logical"
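
A short sketch of how the "data frame as list of columns" view can be used in practice. The variable names (`num_cols`, `col_means`, `df_chr`) are chosen for this example only.

```r
df <- data.frame(a = 1:6, b = letters[1:6], c = c(TRUE, FALSE))

is.list(df)   # a data frame is a list of columns
length(df)    # so its length is the number of columns

# Find the numeric columns, then summarise each of them
num_cols <- vapply(df, is.numeric, logical(1))
col_means <- vapply(df[num_cols], mean, numeric(1))

# Assigning into df[] replaces every column but keeps the data frame structure
df_chr <- df
df_chr[] <- lapply(df, as.character)
str(df_chr)
```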

lapply and do.call

When sapply simplifies to a matrix, the result of each function call becomes a column of that matrix. If we would rather have each result form a row, for example when constructing a data frame, a useful approach is to combine lapply with do.call.

l = lapply(1:8, function(x) list(LETTERS[x], x, x^2, x^3, x^4))
str(l)

## List of 8
##  $ :List of 5
##   ..$ : chr "A"
##   ..$ : int 1
##   ..$ : num 1
##   ..$ : num 1
##   ..$ : num 1
##  $ :List of 5
##   ..$ : chr "B"
##   ..$ : int 2
##   ..$ : num 4
##   ..$ : num 8
##   ..$ : num 16
##  $ :List of 5
##   ..$ : chr "C"
##   ..$ : int 3
##   ..$ : num 9
##   ..$ : num 27
##   ..$ : num 81
##  $ :List of 5
##   ..$ : chr "D"
##   ..$ : int 4
##   ..$ : num 16
##   ..$ : num 64
##   ..$ : num 256
##  $ :List of 5
##   ..$ : chr "E"
##   ..$ : int 5
##   ..$ : num 25
##   ..$ : num 125
##   ..$ : num 625
##  $ :List of 5
##   ..$ : chr "F"
##   ..$ : int 6
##   ..$ : num 36
##   ..$ : num 216
##   ..$ : num 1296
##  $ :List of 5
##   ..$ : chr "G"
##   ..$ : int 7
##   ..$ : num 49
##   ..$ : num 343
##   ..$ : num 2401
##  $ :List of 5
##   ..$ : chr "H"
##   ..$ : int 8
##   ..$ : num 64
##   ..$ : num 512
##   ..$ : num 4096

do.call(rbind, l)

##      [,1] [,2] [,3] [,4] [,5]
## [1,] "A"  1    1    1    1   
## [2,] "B"  2    4    8    16  
## [3,] "C"  3    9    27   81  
## [4,] "D"  4    16   64   256 
## [5,] "E"  5    25   125  625 
## [6,] "F"  6    36   216  1296
## [7,] "G"  7    49   343  2401
## [8,] "H"  8    64   512  4096

do.call(rbind, l) is the equivalent of passing all the elements of l as arguments to rbind, e.g.

rbind(l[[1]], l[[2]], l[[3]], l[[4]],
      l[[5]], l[[6]], l[[7]], l[[8]])

##      [,1] [,2] [,3] [,4] [,5]
## [1,] "A"  1    1    1    1   
## [2,] "B"  2    4    8    16  
## [3,] "C"  3    9    27   81  
## [4,] "D"  4    16   64   256 
## [5,] "E"  5    25   125  625 
## [6,] "F"  6    36   216  1296
## [7,] "G"  7    49   343  2401
## [8,] "H"  8    64   512  4096

l2 = lapply(1:8, function(x) data.frame(x, x^2, x^3, x^4))
do.call(rbind, l2)

##   x x.2 x.3  x.4
## 1 1   1   1    1
## 2 2   4   8   16
## 3 3   9  27   81
## 4 4  16  64  256
## 5 5  25 125  625
## 6 6  36 216 1296
## 7 7  49 343 2401
## 8 8  64 512 4096
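
`do.call` is not specific to `rbind`: it calls any function with its arguments taken from a list, and named list elements become named arguments. A minimal sketch with made-up inputs:

```r
# Named elements of the list are passed as named arguments
args <- list(1:3, c("a", "b", "c"), sep = "-")
pasted <- do.call(paste, args)   # same as paste(1:3, c("a", "b", "c"), sep = "-")

# Row-binding a list of named numeric vectors gives an ordinary numeric matrix
rows <- lapply(1:3, function(x) c(x = x, sq = x^2))
m <- do.call(rbind, rows)
m
```

Note that when the elements being bound are themselves lists, as with `l` above, `rbind` produces a matrix of mode list rather than a numeric or character matrix.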

apply

Usage: apply(X, MARGIN, FUN, ...)

Apply a function over the margins of an array, matrix, or data frame (MARGIN = 1 for rows, 2 for columns). A data frame is first coerced to a matrix.

(m = matrix(1:12, nrow=4, ncol=3))

##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

apply(m, 1, mean)

## [1] 5 6 7 8

apply(m, 2, mean)

## [1]  2.5  6.5 10.5

apply(m, 1:2, mean)

##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

(df = data.frame(a=1:3, b=4:6, c=7:9))

##   a b c
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

apply(df, 1, mean)

## [1] 4 5 6

apply(df, 1, mean) %>% str()

##  num [1:3] 4 5 6

apply(df, 2, mean)

## a b c 
## 2 5 8

apply(df, 2, mean) %>% str()

##  Named num [1:3] 2 5 8
##  - attr(*, "names")= chr [1:3] "a" "b" "c"
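
Since `apply` works on a matrix, a data frame with mixed column types is converted to a single type first. The sketch below uses a hypothetical data frame `mixed` to show the effect, and a column-wise alternative that keeps the original types.

```r
mixed <- data.frame(id = 1:2, label = c("x", "y"))

# The matrix version of a mixed data frame is a character matrix
as.matrix(mixed)

# So apply sees every column as character
apply(mixed, 2, class)

# sapply iterates over the columns of the data frame itself
sapply(mixed, class)
```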

(a = array(1:27,c(3,3,3)))

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   10   13   16
## [2,]   11   14   17
## [3,]   12   15   18
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   19   22   25
## [2,]   20   23   26
## [3,]   21   24   27

apply(a, 1, sum)

## [1] 117 126 135

apply(a, 2, sum)

## [1]  99 126 153

apply(a, 3, sum)

## [1]  45 126 207

apply(a, 1:2, sum)

##      [,1] [,2] [,3]
## [1,]   30   39   48
## [2,]   33   42   51
## [3,]   36   45   54

tapply

Usage: tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Apply a function to each (non-empty) group of values from X as specified by a unique combination of the levels of INDEX.

(df = data.frame(data = 3:11, cat1 = rep(1:3,3), 
                 cat2=rep(1:2,c(4,5))))

##   data cat1 cat2
## 1    3    1    1
## 2    4    2    1
## 3    5    3    1
## 4    6    1    1
## 5    7    2    2
## 6    8    3    2
## 7    9    1    2
## 8   10    2    2
## 9   11    3    2

tapply(df$data, df$cat1, sum)

##  1  2  3 
## 18 21 24

tapply(df$data, df[,2:3], sum)

##     cat2
## cat1 1  2
##    1 9  9
##    2 4 17
##    3 5 19
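
For a single grouping variable, the same sums can be obtained by splitting the data into groups and mapping over them. This is only a sketch of the idea, not a description of how `tapply` is implemented; `groups`, `by_split` and `by_tapply` are names chosen here.

```r
df <- data.frame(data = 3:11, cat1 = rep(1:3, 3), cat2 = rep(1:2, c(4, 5)))

# split() returns a list with one vector of values per level of cat1
groups <- split(df$data, df$cat1)

# Map sum over the groups
by_split <- sapply(groups, sum)
by_tapply <- tapply(df$data, df$cat1, sum)

by_split
all(by_split == by_tapply)
```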

purrr

A package by Hadley Wickham that improves functional programming in R, with a focus on pure and type-stable functions.

Map functions

Basic functions for looping over an object and returning a value (of a specific type) - replacement for lapply/sapply/vapply.

map() - returns a list.
map_lgl() - returns a logical vector.
map_int() - returns an integer vector.
map_dbl() - returns a double vector.
map_chr() - returns a character vector.
map_df() - returns a data frame.
walk() - returns its input invisibly; calls the function exclusively for its side effects.
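
A brief sketch of the variants listed above on a toy input, assuming the purrr package is installed. The anonymous functions are arbitrary examples.

```r
library(purrr)

map(1:3, function(x) x * 2)            # list
map_lgl(1:3, function(x) x > 1)        # logical vector
map_int(1:3, function(x) x * 2L)       # integer vector
map_dbl(1:3, function(x) x / 2)        # double vector
map_chr(1:3, function(x) letters[x])   # character vector
walk(1:3, print)                       # prints each element, returns the input invisibly
```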

Type Consistency

R is a weakly / dynamically typed language, which means there is no built-in way to declare the argument or return types of a function; any such checks have to be written by hand.

This flexibility can be useful at times, but often it makes it hard to reason about your code and requires more verbose code to handle edge cases.
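
One common workaround, sketched here with base R only, is to check types by hand inside the function. `safe_mean` is a hypothetical helper written for this example, not something from the original material.

```r
# Hypothetical helper: checks its argument and its result explicitly
safe_mean <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)      # argument check
  res <- mean(x)
  stopifnot(is.double(res), length(res) == 1)  # return value check
  res
}

safe_mean(c(1, 2, 3))

# A non-numeric argument is rejected; try() captures the error for inspection
bad <- try(safe_mean("a"), silent = TRUE)
inherits(bad, "try-error")
```

This is the kind of extra code that the typed map functions below let us avoid.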

library(purrr)
map_dbl(list(rnorm(1e3),rnorm(1e3),rnorm(1e3)), mean)

## [1] -0.02980877 -0.02168100  0.04525821

map_chr(list(rnorm(1e3),rnorm(1e3),rnorm(1e3)), mean)

## [1] "0.051568" "0.012061" "0.010361"

map_int(list(rnorm(1e3),rnorm(1e3),rnorm(1e3)), mean)

## Error: Can't coerce element 1 from a double to a integer

Purrr shortcut - Anonymous Functions

An anonymous function is one that is never given a name (i.e. never assigned to a variable).

sapply(1:10, function(x) x^(x+1))

##  [1]            1            8           81         1024        15625       279936      5764801    134217728
##  [9]   3486784401 100000000000

purrr lets us write anonymous functions using one-sided formulas, where the first argument is referred to as . (or .x).

map_dbl(1:10, ~ .^(.+1))

##  [1]            1            8           81         1024        15625       279936      5764801    134217728
##  [9]   3486784401 100000000000
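
The formula shorthand also covers functions of two arguments, where the inputs are referred to as `.x` and `.y`. A small sketch, assuming purrr is installed; the inputs are arbitrary.

```r
library(purrr)

# .x is an alternative spelling of . for the first argument
sq_formula <- map_dbl(1:3, ~ .x^2)

# The equivalent explicit anonymous function
sq_function <- map_dbl(1:3, function(x) x^2)

# map2_* functions iterate over two inputs in parallel: .x and .y
sums <- map2_dbl(c(1, 2, 3), c(4, 5, 6), ~ .x + .y)

identical(sq_formula, sq_function)
sums
```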

Purrr shortcut - Lookups

Very often we want to extract only certain (named) values from a list. purrr provides a shortcut for this: supply a character or numeric vector in place of the function, and it is used to index into each element (a vector of length greater than one indexes into nested levels).

x = list(list(a=1L,b=2L,c=list(d=3L,e=4L)),
         list(a=5L,b=6L,c=list(d=7L,e=8L)))

map_int(x, "a")

## [1] 1 5

map_dbl(x, c("c","e"))

## [1] 4 8

map_df(x, 3)

## # A tibble: 2 × 2
##       d     e
##   <int> <int>
## 1     3     4
## 2     7     8

map_chr(x, c(3,1))

## [1] "3" "7"
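
To show what the lookup shortcut stands for, the sketch below writes the same extractions out as explicit functions and adds a base R counterpart. It assumes purrr is installed and reuses the list `x` defined above.

```r
library(purrr)

x <- list(list(a = 1L, b = 2L, c = list(d = 3L, e = 4L)),
          list(a = 5L, b = 6L, c = list(d = 7L, e = 8L)))

# Single-level lookup, as a shortcut and written out
a_short <- map_int(x, "a")
a_long  <- map_int(x, function(el) el[["a"]])

# Nested lookup, as a shortcut and written out
e_short <- map_int(x, c("c", "e"))
e_long  <- map_int(x, function(el) el[["c"]][["e"]])

# Base R counterpart using vapply
a_base <- vapply(x, function(el) el[["a"]], integer(1))

identical(a_short, a_long)
identical(e_short, e_long)
```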

Acknowledgments

Above materials are derived in part from the following sources:

Hadley Wickham - Adv-R Functionals
Hadley Wickham - R for Data Science
Neil Saunders - A brief introduction to “apply” in R
Jenny Bryan - Purrr Tutorial
R Language Definition