library(tidyverse)
library(future)
library(furrr)
library(microbenchmark)
Try the following examples. What do you notice?
plan(sequential)
ls() # show objects in current environment
Sys.getpid()
ex_1 %<-% {
  cat("The system PID is", Sys.getpid(), "\n")
  b <- sapply(mtcars, is.na)
  sum(b)
}
ls()
plan(multisession, workers = 2)
ls()
Sys.getpid()
X <- matrix(rnorm(1000 * 1000), nrow = 1000, ncol = 1000)
ex_2 <- future({
  cat("The system PID is", Sys.getpid(), "\n")
  solve(X)
})
X_inverse <- value(ex_2)
plan(multisession, workers = 2)
library(ggplot2)
library(plotly)
ex_3_a %<-% {
  Sys.sleep(5)
  plot_ly(data = diamonds,
          x = ~price, y = ~carat, z = ~table,
          type = "scatter3d", mode = "markers", color = ~cut)
}
ex_3_b %<-% {
  Sys.sleep(5)
  ggplot(diamonds, aes(x = carat, y = sqrt(price), color = cut)) +
    geom_point(alpha = 0.2) +
    geom_smooth() +
    theme_minimal()
}
ex_3_c %<-% {
  Sys.sleep(5)
  ggplot(diamonds, aes(x = price)) +
    geom_histogram() +
    theme_minimal()
}
plan(sequential)
In the first example, b does not exist in our global environment after the future is resolved, because the future's expression is evaluated in a local environment. The PID printed by the future matches that of the main session, since the plan is set to sequential.
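The scoping behaviour described above can be checked directly. The following is a minimal sketch, assuming the future package is installed; the names na_flags, res, and main_pid are hypothetical and used only for illustration.

```r
library(future)

plan(sequential)

main_pid <- Sys.getpid()

# The braces are evaluated in a local environment, so `na_flags`
# (a hypothetical name) stays private to the future.
res %<-% {
  na_flags <- sapply(mtcars, is.na)
  list(total = sum(na_flags), pid = Sys.getpid())
}

res$total                              # value of the last expression
exists("na_flags", inherits = FALSE)   # FALSE: the local variable did not leak
res$pid == main_pid                    # TRUE: a sequential plan reuses this process
```

Only the value of the last expression is returned, so anything needed later has to be part of that value.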
In the second example, X is identified as a global variable and automatically copied to the worker. The matrix inverse is computed in a separate R session, which is why the PID printed by the future differs from that of the main session.
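A smaller version of the same idea makes both points easy to verify. This is a sketch under the assumption that two background sessions can be launched on the machine; A is a hypothetical 3 x 3 matrix standing in for X.

```r
library(future)

plan(multisession, workers = 2)

main_pid <- Sys.getpid()
A <- diag(2, nrow = 3)   # hypothetical small matrix standing in for X

# `A` is not passed explicitly; it is detected as a global and exported.
f <- future({
  list(pid = Sys.getpid(), inverse = solve(A))
})
out <- value(f)

out$pid != main_pid                           # TRUE: evaluated in another R session
all.equal(out$inverse, diag(0.5, nrow = 3))   # TRUE: the worker saw the same A

plan(sequential)   # shut the workers down again
```

Because globals are copied, large objects such as the 1000 x 1000 matrix add transfer time to every future that uses them.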
In the third example, the packages that the expressions need are identified automatically and attached in the worker sessions. However, since we are only using two workers, creating ex_3_c temporarily blocks the main session until a worker becomes available.
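The blocking can be made visible by timing how long it takes to create a third future while both workers are busy. The sketch below assumes two background sessions are available; the sleep durations and the names f1, f2, f3, and wait_secs are illustrative only.

```r
library(future)

plan(multisession, workers = 2)

f1 <- future({ Sys.sleep(2); "a" })
f2 <- future({ Sys.sleep(2); "b" })

# Both workers are busy, so creating a third future waits for a free one.
t0 <- Sys.time()
f3 <- future({ Sys.sleep(2); "c" })
wait_secs <- as.numeric(difftime(Sys.time(), t0, units = "secs"))

# resolved() checks a future's state without blocking;
# value() blocks until the result is ready.
resolved(f3)
vals <- c(value(f1), value(f2), value(f3))

wait_secs   # roughly the time left on the first sleep

plan(sequential)
```

Raising the number of workers to three would let all three futures start without waiting, provided the machine has the cores to spare.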
Suppose you have settled on the following as your final model.
lm(mpg ~ wt + hp, data = mtcars)
Use target shuffling and extract the \(R^2\) metric for 1000 shuffles to assess the relationship. Compare the performance of doing this with map_dbl() and future_map_dbl(). Use the microbenchmark package.
First, we’ll create some helper functions.
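# Permute the response so that any link between mpg and the predictors is broken
shuffle_mpg <- function(df) {
  df$mpg <- df$mpg[sample(nrow(df))]
  df
}
# `i` is unused; it only lets map_dbl() call the function once per shuffle
lm_r2 <- function(i) {
  summary(lm(mpg ~ wt + hp, data = shuffle_mpg(mtcars)))$r.squared
}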
Microbenchmarking:
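plan(multisession, workers = 4)
microbenchmark(
  map_dbl(1:1000, lm_r2),
  # seed = TRUE declares the use of sample() and gives parallel-safe random numbers
  future_map_dbl(1:1000, lm_r2, .options = furrr_options(seed = TRUE))
)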
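The benchmark only compares run times. To assess the relationship itself, the shuffled \(R^2\) values can be compared with the \(R^2\) of the model fitted to the unshuffled data. Below is a minimal base R sketch of that comparison; the seed and the names observed_r2, shuffled_r2, and p_value are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible

# R^2 of the model on the original, unshuffled data
observed_r2 <- summary(lm(mpg ~ wt + hp, data = mtcars))$r.squared

# R^2 for 1000 fits in which mpg has been randomly permuted
shuffled_r2 <- vapply(seq_len(1000), function(i) {
  shuffled <- mtcars
  shuffled$mpg <- sample(shuffled$mpg)
  summary(lm(mpg ~ wt + hp, data = shuffled))$r.squared
}, numeric(1))

# Share of shuffles that fit at least as well as the real data
p_value <- mean(shuffled_r2 >= observed_r2)

observed_r2
summary(shuffled_r2)
p_value
```

If almost no shuffle reaches the observed \(R^2\), the fit is unlikely to be a product of chance alone.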