class: center, middle, inverse, title-slide # Simulation-Based Inference II ## Intro to Data Science ### Shawn Santo ### 03-31-20 --- ## Video table of contents 1. Introduction and overview: [slides 3 - 7](https://warpwire.duke.edu/w/TYsDAA/) <br/><br/> 2. Hypothesis testing for a single numeric variable: [slides 8 - 15](https://warpwire.duke.edu/w/U4sDAA/) <br/><br/> 3. Hypothesis testing for a single numeric variable (p-value): [slides 16 - 18](https://warpwire.duke.edu/w/VYsDAA/) <br/><br/> 4. Permutation tests for independence: [slides 19 - 38](https://warpwire.duke.edu/w/V4sDAA/) --- ## Announcements - Project proposal due 04-06-20 at 11:59pm EST - Homework 4 (final homework) to be assigned 04-02-20 --- ## Packages and seed ```r library(tidyverse) library(infer) ``` ```r set.seed(200331) ``` --- ## The hypothesis testing framework 1. Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis. 2. Choose a (representative) sample, collect data, and analyze the data. 3. Figure out how likely it is to see data like what we observed, IF the null hypothesis were in fact true. 4. If our data would have been extremely unlikely if the null claim were true, then we reject it and deem the alternative claim worthy of further study. Otherwise, we cannot reject the null claim. --- ## What can go wrong? Recall the following four scenarios: | Decision | `\(H_0\)` is true | `\(H_0\)` is false | |----------------------|---------------|----------------| | Fail to reject `\(H_0\)` | Correct decision | *Type II Error* | | Reject `\(H_0\)` | *Type I Error* | Correct decision | - The probability of making a type 1 error is `\(\alpha\)` (and is specified by the researcher prior to conducting the test) - The probability of making a type 2 error is `\(\beta\)`, and is equal to 1 - the **power** of the test .question[ The power of the test is the probability of rejecting the null hypothesis when it is in fact truly false. ] -- There is a trade-off between power and type 1 error rate, but the exact details are more advanced topics best saved for future statistics courses. --- ## Equivalency of confidence and significance levels - Two sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - \alpha\)` - One sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - (2 \times \alpha)\)` <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- class: middle, center, inverse # Hypothesis testing for a single numeric variable --- ## Rent in Manhattan Let's revisit the price to rent a one-bedroom apartment in Manhattan. ```r library(tidyverse) manhattan <- read_csv("data/manhattan.csv") ``` .pull-left[ ```r manhattan %>% slice(1:10) ``` ``` #> # A tibble: 10 x 1 #> rent #> <dbl> #> 1 3850 #> 2 3800 #> 3 2350 #> 4 3200 #> 5 2150 #> 6 3267 #> 7 2495 #> 8 2349 #> 9 3950 #> 10 1795 ``` ] .pull-right[ ```r manhattan %>% slice(11:20) ``` ``` #> # A tibble: 10 x 1 #> rent #> <dbl> #> 1 2145 #> 2 2300 #> 3 1775 #> 4 2000 #> 5 2175 #> 6 2350 #> 7 2550 #> 8 4195 #> 9 1470 #> 10 2350 ``` ] --- ## Rent in Manhattan ```r rent_mean <- manhattan %>% summarise(mean_rent = mean(rent)) %>% pull(mean_rent) rent_mean ``` ``` #> [1] 2625.8 ``` According to the site Rent Jungle, the average price to rent an apartment in LA is around $2400 / month. Is the average rent for a one-bedroom in Manhattan significantly different from $2400 / month? 
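Before setting up a test, it can help to glance at the sample size and spread of the observed rents (a quick check, assuming `manhattan` is loaded as above; output not shown):

```r
manhattan %>%
  summarise(n = n(),            # number of one-bedroom listings in the sample
            sd_rent = sd(rent)) # spread of the observed rents
```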
-- Let's set up our hypothesis test and conduct our test at the `\(\alpha = 0.05\)` significance level. `$$H_0: \mu = 2400$$` `$$H_1: \mu \neq 2400$$` --- ## Simulation process We will use bootstrapping to generate a sampling distribution under the assumption of the null hypothesis being true. Then, calculate the p-value to make a decision on the hypotheses. -- 1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample. -- 2. Calculate the mean of the bootstrap sample. -- 3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap means. -- 4. **Shift the bootstrap distribution to be centered at the null value by subtracting/adding the difference between the center of the bootstrap distribution and the null value to each bootstrap mean.** -- 5. Calculate the p-value as the proportion of simulations that yield a sample mean at least as extreme as the observed sample mean. --- ## Take a bootstrap sample Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample ```r rent_bootstrap <- manhattan %>% specify(response = rent) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") ``` --- ## Take a bootstrap sample <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> -- Where is the center of the distribution? What should it be under the null hypothesis? --- ## Shift the distribution Shift the bootstrap distribution to be centered at the null value by subtracting/adding the difference between the center of the bootstrap distribution and the null value to each bootstrap mean. What is the mean of all our bootstrap sample means? ```r rent_boot_mean <- rent_bootstrap %>% summarise(mean = mean(stat)) %>% pull() rent_boot_mean ``` ``` #> [1] 2625.491 ``` -- Now, shift the bootstrap distribution to be centered at the null value. ```r rent_bootstrap <- rent_bootstrap %>% mutate(null_dist_stat = stat - (rent_boot_mean - 2400)) ``` --- ## Shift the distribution <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## Calculate the p-value .question[ How should we calculate the p-value? ] <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Calculate the p-value How can we use the table below to calculate the p-value? What would be a better way to use `dplyr` functions to return the p-value directly? ```r rent_bootstrap %>% count(null_dist_stat > rent_mean) ``` ``` #> # A tibble: 2 x 2 #> `null_dist_stat > rent_mean` n #> <lgl> <int> #> 1 FALSE 904 #> 2 TRUE 96 ``` --- ## Using `hypothesize()` To avoid having to shift the distribution, we can use `hypothesize()`. ```r manhattan %>% specify(response = rent) %>% * hypothesize(null = "point", mu = 2400) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") %>% * get_p_value(obs_stat = rent_mean, direction = "two_sided") ``` ``` #> # A tibble: 1 x 1 #> p_value #> <dbl> #> 1 0.212 ``` -- What is your conclusion? --- class: center, middle, inverse # Permutation tests for independence --- ## Is yawning contagious? .question[ Do you think yawning is contagious? ] .pull-left[ ![empirical](img/09a/yawn1.png) ] .pull-right[ ![empirical](img/09a/yawn2.png) ] --- ## Is yawning contagious? 
An experiment conducted by the MythBusters tested whether a person can be subconsciously influenced into yawning if another person near them yawns.

[https://www.discovery.com/tv-shows/mythbusters/videos/is-yawning-contagious-2](https://www.discovery.com/tv-shows/mythbusters/videos/is-yawning-contagious-2)

--

*"Given how large our sample was, I'd say it's confirmed."*

*"There's little doubt, it does seem to be contagious."*

**Do we buy their conclusion?**

---

## Study description

In their study, 50 people were randomly assigned to two groups: 34 to a treatment group, where a person near them yawned, and 16 to a control group, where no one near them yawned.

```r
mb_yawn <- read_csv("data/mb-yawn.csv")
```

```r
mb_yawn %>%
  count(group, outcome)
```

```
#> # A tibble: 4 x 3
#>   group     outcome      n
#>   <chr>     <chr>    <int>
#> 1 control   not yawn    12
#> 2 control   yawn         4
#> 3 treatment not yawn    24
#> 4 treatment yawn        10
```

---

## Proportion of yawners

```r
mb_yawn %>%
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(p_hat = n / sum(n))
```

```
#> # A tibble: 4 x 4
#> # Groups:   group [2]
#>   group     outcome      n p_hat
#>   <chr>     <chr>    <int> <dbl>
#> 1 control   not yawn    12 0.75 
#> 2 control   yawn         4 0.25 
#> 3 treatment not yawn    24 0.706
#> 4 treatment yawn        10 0.294
```

- Proportion of yawners in the treatment group: `\(\frac{10}{34} = 0.2941\)`
- Proportion of yawners in the control group: `\(\frac{4}{16} = 0.25\)`
- Difference: `\(0.2941 - 0.25 = 0.0441\)`
- (These are the results calculated on the MythBusters episode)

---

## Independence?

Based on the proportions we calculated, do you think yawning is really contagious, i.e. are seeing someone yawn and yawning dependent?

```
#> # A tibble: 4 x 4
#> # Groups:   group [2]
#>   group     outcome      n p_hat
#>   <chr>     <chr>    <int> <dbl>
#> 1 control   not yawn    12 0.75 
#> 2 control   yawn         4 0.25 
#> 3 treatment not yawn    24 0.706
#> 4 treatment yawn        10 0.294
```

---

## Dependence, or another possible explanation?

- The observed differences might suggest that yawning is contagious, i.e. seeing someone yawn and yawning are dependent.
- But the differences are small enough that we might wonder whether they are simply **due to chance**.
- Perhaps if we were to repeat the experiment, we would see slightly different results.
- Instead of actually conducting the experiment many times, we will **simulate** our results using a **permutation test**.

**Careful**: even if two events are not independent, it doesn't mean that one *causes* the other.

.small-text[
...technically, we will perform an *approximate* permutation test, but don't worry about that for now!
]

---

## Two competing claims

`\(H_0\)`: "There is nothing going on."
Yawning (outcome) and seeing someone yawn (treatment vs. control) are **independent**:
`$$p_{treatment} = p_{control}$$`

`\(H_1\)`: "There is something going on."
Yawning and seeing someone yawn are <font class="vocab">not independent</font>; in fact, we will specify that the proportion of yawners is greater in the treatment group than in the control group:
`$$p_{treatment} > p_{control},$$`
where `\(p_{treatment}\)` is the true proportion (the parameter) of yawners among those who saw someone else yawn, and similarly for `\(p_{control}\)`.

---

## Permuting the treatment assignments

We know from our observed data that 14 people yawned and 36 people didn't yawn. We also know that there were 16 people in the control group (didn't see someone yawn) and 34 people in the treatment group (saw someone yawn), resulting in a test statistic of 0.0441.
.tiny[ ```r p_hat_diff <- mb_yawn %>% count(group, outcome) %>% group_by(group) %>% mutate(p_hat = n / sum(n)) %>% filter(outcome == "yawn") %>% pull(p_hat) %>% diff() p_hat_diff ``` ``` #> [1] 0.04411765 ``` ] -- While *keeping the same responses* (yawn vs. no yawn) as we observed, we will **permute** (shuffle) the treatment assignments of the observations and recalculate the difference in proportions. That is, we will recalculate the difference in proportions *as if* some of the yawners and some (or even all) of the non-yawners perhaps might have been in a different treatment group. .question[ This seems a bit odd. Why do we do this? ] --- ## The null hypothesis Remember, under `\(H_0\)`, there is no association between seeing someone yawn and yawning. If there truly is no association, then shuffling whether someone was assigned to watch somebody yawn or not yawn wouldn't matter -- we would expect similar proportions of people who yawn in each treatment group. -- In hypothesis testing, we assume the null hypothesis is true, and then compute the sampling distribution of the test statistic under this assumption. How might we obtain the sampling distribution? --- ## Repeated permutations We will do this treatment-shuffling again and again, calculate the difference in proportion for each simulation, and use this as an approximation to the null distribution. -- This distribution approximates all the possible differences in proportion we could have seen *if `\(H_0\)` were in fact true*. We then use this distribution to obtain the probability that we see our observed data (or more extreme) -- the *p-value*. This is similar in spirit to bootstrap estimation, but here we sample **without** replacement; we merely permute the treatment labels of each of our outcomes. 
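To make the shuffling concrete, below is a minimal sketch of a single permutation replicate done by hand with `dplyr`. It assumes `mb_yawn` is loaded as above; the next slide uses `infer` to repeat this many times for us.

```r
mb_yawn %>%
  mutate(group = sample(group)) %>%   # shuffle the treatment labels
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(p_hat = n / sum(n)) %>%      # proportions within each shuffled group
  filter(outcome == "yawn") %>%
  pull(p_hat) %>%
  diff()                              # one simulated treatment - control difference
```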
---

## Let's do this in R with `infer`

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
* hypothesize(null = "independence") %>%
* generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props",
            order = c("treatment", "control"))
```

---

## Simulation by computation - 1

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"

```r
mb_null_dist <- mb_yawn %>%
* specify(response = outcome, explanatory = group,
*         success = "yawn")
```

---

## Simulation by computation - 2

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
* hypothesize(null = "independence")
```

---

## Simulation by computation - 3

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)
- Generate simulated differences via permutation

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
  hypothesize(null = "independence") %>%
* generate(reps = 1000, type = "permute")
```

---

## Simulation by computation - 4

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)
- Generate simulated differences via permutation
- Calculate the sample statistic of interest (difference in proportions)
- Since the explanatory variable is categorical, specify the order in which the subtraction should occur for the calculation of the sample statistic, `\((\hat{p}_{treatment} - \hat{p}_{control})\)`.

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
* calculate(stat = "diff in props",
*           order = c("treatment", "control"))
```

---

## Visualizing the null distribution

What would you expect the center of the null distribution to be?

--

```r
visualize(mb_null_dist) +
  theme_minimal(base_size = 16)
```

<img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

---

## Calculating the p-value visually

What is the p-value (the probability of observing our results *or more extreme* if the null hypothesis were in fact true)?

```r
visualize(mb_null_dist) +
  shade_p_value(obs_stat = p_hat_diff, direction = "greater") +
  theme_minimal(base_size = 16)
```

<img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

---

## Calculating the p-value directly

```r
mb_null_dist %>%
  filter(stat >= p_hat_diff) %>%
  summarise(p_value = n() / nrow(mb_null_dist))
```

```
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1   0.495
```

--

```r
get_p_value(mb_null_dist, obs_stat = p_hat_diff, direction = "greater")
```

```
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1   0.495
```

--

What is the conclusion of the hypothesis test?
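As a complement, and connecting back to the earlier slide on confidence and significance levels, a bootstrap confidence interval for `\(p_{treatment} - p_{control}\)` could be computed along these lines (a sketch only; recall that a one-sided test at `\(\alpha = 0.05\)` corresponds to a 90% confidence level):

```r
mb_yawn %>%
  specify(response = outcome, explanatory = group, success = "yawn") %>%
  generate(reps = 1000, type = "bootstrap") %>%     # resample the observed data
  calculate(stat = "diff in props",
            order = c("treatment", "control")) %>%
  get_confidence_interval(level = 0.90)             # percentile interval by default
```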
--- ## Application exercise To get started, accept the application exercise assignment: https://classroom.github.com/a/MLJaWtqX --- ## References 1. Tidy Statistical Inference. (2020). Infer.netlify.com. Retrieved 29 February 2020, from https://infer.netlify.com/index.html