Packages

library(tidyverse)
library(future)
library(furrr)
library(microbenchmark)

Exercise 1

Problem

Try the following examples. What do you notice?

plan(sequential)
ls() # show objects in current environment
Sys.getpid()

ex_1 %<-% {
  cat("The system PID is", Sys.getpid(), "\n")
  b <- sapply(mtcars, is.na)
  sum(b)
}
ls()

plan(multisession, workers = 2)
ls()
Sys.getpid()
X <- matrix(rnorm(1000 * 1000), nrow = 1000, ncol = 1000)

ex_2 <- future({
  cat("The system PID is", Sys.getpid(), "\n")
  solve(X)
})
X_inverse <- value(ex_2)

plan(multisession, workers = 2)
library(ggplot2)
library(plotly)

ex_3_a %<-% {
  Sys.sleep(5)
  plot_ly(data = diamonds, 
          x = ~price, y = ~carat, z = ~table, 
          type = "scatter3d", mode = "markers", color = ~cut)
}

ex_3_b %<-% {
  Sys.sleep(5)
  ggplot(diamonds, aes(x = carat, y = sqrt(price), color = cut)) +
    geom_point(alpha = 0.2) +
    geom_smooth() +
    theme_minimal()
}

ex_3_c %<-% {
  Sys.sleep(5)
  ggplot(diamonds, aes(x = price)) +
    geom_histogram() +
    theme_minimal()
}
plan(sequential)

Solution

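In the first example, b is created inside the future's own local environment, so it does not exist in our global environment after the future is resolved; only ex_1 is added. Because the plan is set to sequential, the future is evaluated in the current R session, so the PID printed inside it matches the one returned by Sys.getpid().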
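The locality of a future's assignments can be checked directly. The following is a minimal sketch, not part of the original exercise; the names f_local and tmp_inside are hypothetical and chosen only for illustration.

```r
library(future)

plan(sequential)
main_pid <- Sys.getpid()

# f_local and tmp_inside are hypothetical names used only for this sketch
f_local <- future({
  tmp_inside <- 42  # assigned in the future's local environment
  list(pid = Sys.getpid(), value = tmp_inside)
})
res <- value(f_local)

# Under plan(sequential) the future runs in the current R process
pid_matches <- identical(res$pid, main_pid)

# The assignment made inside the future is not visible here
leaked <- exists("tmp_inside")
```

Here pid_matches should be TRUE and leaked should be FALSE, mirroring what ls() shows in the exercise.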

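In the second example, X is identified as a global variable and automatically exported to the worker. The matrix inverse is computed in a separate background R session, which is why the PID printed by the future differs from the one returned by Sys.getpid() in the main session.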
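A smaller version of the same idea makes both effects easy to verify. This is a sketch under stated assumptions: the 2 x 2 matrix A and the names f_inv and res are hypothetical, and it assumes two background R sessions can be started on the machine.

```r
library(future)

plan(multisession, workers = 2)
main_pid <- Sys.getpid()

# A is a small hypothetical matrix; the future refers to it as a global
A <- matrix(c(2, 1, 1, 3), nrow = 2)

f_inv <- future({
  list(pid = Sys.getpid(), inverse = solve(A))
})
res <- value(f_inv)

# The worker is a different R process from the main session
different_process <- res$pid != main_pid

# A was exported to the worker, so the inverse is correct
is_inverse <- isTRUE(all.equal(res$inverse %*% A, diag(2)))

plan(sequential)  # shut down the background workers
```

Both different_process and is_inverse should be TRUE if the workers started as expected.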

In the third example, the packages the futures need (ggplot2 and plotly) are detected and attached on the workers, so functions and data such as diamonds are available there. However, since there are only two workers, creating ex_3_c blocks the main session until one of them becomes free.
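The blocking behaviour can be made visible by timing how long it takes to create the futures. The sketch below is illustrative only: it uses shorter sleeps than the exercise, and the names f1, f2, f3, t_two, and t_three are hypothetical.

```r
library(future)

plan(multisession, workers = 2)

t0 <- Sys.time()

# The first two futures each occupy one of the two workers
f1 <- future({ Sys.sleep(2); 1 })
f2 <- future({ Sys.sleep(2); 2 })
t_two <- as.numeric(difftime(Sys.time(), t0, units = "secs"))

# No worker is free, so creating the third future waits for one to finish
f3 <- future({ Sys.sleep(2); 3 })
t_three <- as.numeric(difftime(Sys.time(), t0, units = "secs"))

results <- c(value(f1), value(f2), value(f3))

plan(sequential)  # shut down the background workers
```

Creating f1 and f2 should return almost immediately, while t_three should be at least roughly the length of one sleep, because the call that creates f3 cannot return until a worker is available.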

Exercise 2

Problem

Suppose you have decided on the below as your final model.

lm(mpg ~ wt + hp, data = mtcars)

Use target shuffling and extract the \(R^2\) metric for 1000 shuffles to assess the relationship. Compare the performance of doing this with map_dbl() and future_map_dbl(). Use the microbenchmark package.

Solution

First, we’ll create some helper functions.

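# Return a copy of df with the response column mpg randomly permuted
shuffle_mpg <- function(df) {
  df$mpg <- df$mpg[sample(nrow(df))]
  df
}

# Fit the model on one shuffled copy of mtcars and return its R^2;
# the argument i is only an iteration index and is otherwise unused
lm_r2 <- function(i) {
  summary(lm(mpg ~ wt + hp, data = shuffle_mpg(mtcars)))$r.squared
}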

Microbenchmarking:

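plan(multisession, workers = 4)

microbenchmark(
  map_dbl(1:1000, lm_r2),
  # lm_r2() calls sample(), so request parallel-safe random number streams
  future_map_dbl(1:1000, lm_r2, .options = furrr_options(seed = TRUE))
)

plan(sequential)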
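The benchmark compares run times but does not itself assess the relationship. One common way to do that with target shuffling is to compare the R^2 of the real model with the distribution of R^2 values from the shuffled fits. The following is a minimal base R sketch of that comparison; the seed value and the names observed_r2, shuffled_r2, and p_value are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# R^2 of the model fitted to the real, unshuffled data
observed_r2 <- summary(lm(mpg ~ wt + hp, data = mtcars))$r.squared

# R^2 values from 1000 fits in which mpg has been randomly permuted
shuffled_r2 <- vapply(1:1000, function(i) {
  shuffled <- mtcars
  shuffled$mpg <- sample(shuffled$mpg)
  summary(lm(mpg ~ wt + hp, data = shuffled))$r.squared
}, numeric(1))

# Share of shuffles that fit at least as well as the real data
p_value <- mean(shuffled_r2 >= observed_r2)
```

A small p_value suggests that the observed fit is unlikely to arise when the link between mpg and the predictors is broken by shuffling.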

Exercises: Futures and furrr

Shawn Santo
