# Bootstrapping

## Bootstrap CI for a proportion

``````one_cat_boot <- function(data, success = NULL, nsim = 10000, seed = NULL, plot = TRUE){
# sample size
data = data[!is.na(data)]
n = length(data)
# function for calculating proportion of successes
f = function(x){ sum(x == success) / n }
stat = f(data)
# simulate the bootstrap distribution
if(!is.null(seed)) { set.seed(seed) }
bootdist = data.frame(stat = rep(NA, nsim))
for(i in 1:nsim){
boot_sample = sample(data, size = n, replace = TRUE)
bootdist\$stat[i] = f(boot_sample)
}
# calculate the interval
boot_se = sd(bootdist\$stat)
ci = stat + c(-1,1) * 2 * boot_se
# plot
if(plot == TRUE){
boot_plot = qplot(stat, data = bootdist, geom = "histogram")
boot_plot = boot_plot + geom_vline(aes(xintercept = ci))
suppressWarnings( suppressMessages( print( boot_plot ) ) )
}
# return
return(list(ci = round(ci, 4), est = stat, SE_boot = boot_se))
}``````

## Bootstrap CI for a proportion - in action

``one_cat_boot(kissing\$side, success = "right")``

``````## \$ci
## [1] 0.5593 0.7310
##
## \$est
## [1] 0.6452
##
## \$SE_boot
## [1] 0.04293``````

## Bootstrap CI for a numerical variable

``````one_num_boot <- function(data, est = NULL, nsim = 10000, seed = NULL, plot = TRUE){
# sample size
data = data[!is.na(data)]
n = length(data)
# function for calculating the desired point estimate
f = function(x, est){ est(x) }
stat = f(data, est)
# simulate the bootstrap distribution
if(!is.null(seed)) { set.seed(seed) }
bootdist = data.frame(stat = rep(NA, nsim))
for(i in 1:nsim){
boot_sample = sample(data, size = n, replace = TRUE)
bootdist\$stat[i] = f(boot_sample, est)
}
# calculate the interval
boot_se = sd(bootdist\$stat)
ci = stat + c(-1,1) * 2 * boot_se
# plot
if(plot == TRUE){
boot_plot = qplot(stat, data = bootdist, geom = "histogram")
boot_plot = boot_plot + geom_vline(aes(xintercept = ci))
suppressWarnings( suppressMessages( print( boot_plot ) ) )
}
# return
return(list(ci = round(ci, 4), est = stat, SE_boot = boot_se))
}``````

## Rent in Durham

Data from a random sample of 20 apartments in Durham in 2012.

``qplot(rent, data = durham_apts, geom = "histogram")``

``median(durham_apts\$rent)``
``## [1] 887``

## Bootstrap CI for a numerical variable - in action

``one_num_boot(durham_apts\$rent, median)``

``````## \$ci
## [1]  721.6 1052.4
##
## \$est
## [1] 887
##
## \$SE_boot
## [1] 82.68``````

Application exercise 9:

Put the two functions together (`one_cat_boot` and `one_num_boot`) into one function (`one_var_boot`) which first checks for what type of variable is input, and constructs a bootstrap interval for (you can use `prop` as the name of the estimate for a sample proportion). You can download my functions at and simply call those in your function.

``library(devtools)``
``````source_url("https://stat.duke.edu/courses/Fall14/sta112.01/code/one_cat_boot.R")
source_url("https://stat.duke.edu/courses/Fall14/sta112.01/code/one_num_boot.R")``````
``````download.file("https://stat.duke.edu/courses/Fall14/sta112.01/data/durham_apts.csv",
destfile = "durham_apts.csv", method = "curl")