Bootstrapping

Bootstrap CI for a proportion

one_cat_boot <- function(data, success = NULL, nsim = 10000, seed = NULL, plot = TRUE){
  # sample size
  data = data[!is.na(data)]
  n = length(data)
  # function for calculating proportion of successes
  f = function(x){ sum(x == success) / n }
  stat = f(data)
  # simulate the bootstrap distribution
  if(!is.null(seed)) { set.seed(seed) }
  bootdist = data.frame(stat = rep(NA, nsim))
  for(i in 1:nsim){
    boot_sample = sample(data, size = n, replace = TRUE)
    bootdist$stat[i] = f(boot_sample)
  }
  # calculate the interval
  boot_se = sd(bootdist$stat)
  ci = stat + c(-1,1) * 2 * boot_se 
  # plot
  if(plot == TRUE){
    boot_plot = qplot(stat, data = bootdist, geom = "histogram")
    boot_plot = boot_plot + geom_vline(aes(xintercept = ci))
    suppressWarnings( suppressMessages( print( boot_plot ) ) )
  }
  # return
  return(list(ci = round(ci, 4), est = stat, SE_boot = boot_se))
}

Bootstrap CI for a proportion - in action

one_cat_boot(kissing$side, success = "right")

plot of chunk unnamed-chunk-3

## $ci
## [1] 0.5593 0.7310
## 
## $est
## [1] 0.6452
## 
## $SE_boot
## [1] 0.04293

Bootstrap CI for a numerical variable

one_num_boot <- function(data, est = NULL, nsim = 10000, seed = NULL, plot = TRUE){
  # sample size
  data = data[!is.na(data)]
  n = length(data)
  # function for calculating the desired point estimate
  f = function(x, est){ est(x) }
  stat = f(data, est)
  # simulate the bootstrap distribution
  if(!is.null(seed)) { set.seed(seed) }
  bootdist = data.frame(stat = rep(NA, nsim))
  for(i in 1:nsim){
    boot_sample = sample(data, size = n, replace = TRUE)
    bootdist$stat[i] = f(boot_sample, est)
  }
  # calculate the interval
  boot_se = sd(bootdist$stat)
  ci = stat + c(-1,1) * 2 * boot_se 
  # plot
  if(plot == TRUE){
    boot_plot = qplot(stat, data = bootdist, geom = "histogram")
    boot_plot = boot_plot + geom_vline(aes(xintercept = ci))
    suppressWarnings( suppressMessages( print( boot_plot ) ) )
  }
  # return
  return(list(ci = round(ci, 4), est = stat, SE_boot = boot_se))
}

Rent in Durham

Data from a random sample of 20 apartments in Durham in 2012.

qplot(rent, data = durham_apts, geom = "histogram")

plot of chunk unnamed-chunk-6

median(durham_apts$rent)
## [1] 887

Bootstrap CI for a numerical variable - in action

one_num_boot(durham_apts$rent, median)

plot of chunk unnamed-chunk-7

## $ci
## [1]  721.6 1052.4
## 
## $est
## [1] 887
## 
## $SE_boot
## [1] 82.68

Application exercise 9:

Put the two functions together (one_cat_boot and one_num_boot) into one function (one_var_boot) which first checks for what type of variable is input, and constructs a bootstrap interval for (you can use prop as the name of the estimate for a sample proportion). You can download my functions at and simply call those in your function.

library(devtools)
source_url("https://stat.duke.edu/courses/Fall14/sta112.01/code/one_cat_boot.R")
source_url("https://stat.duke.edu/courses/Fall14/sta112.01/code/one_num_boot.R")
download.file("https://stat.duke.edu/courses/Fall14/sta112.01/data/durham_apts.csv", 
    destfile = "durham_apts.csv", method = "curl")
durham_apts = read.csv("durham_apts.csv")

Randomization testing

Is yawning contagious?

Do you think yawning is contagious?
empirical