class: center, middle, inverse, title-slide # Simulation-Based Inference II ## Intro to Data Science ### Shawn Santo ### 03-31-20 --- ## Video table of contents 1. Introduction and overview: [slides 3 - 7](https://warpwire.duke.edu/w/TYsDAA/) <br/><br/> 2. Hypothesis testing for a single numeric variable: [slides 8 - 15](https://warpwire.duke.edu/w/U4sDAA/) <br/><br/> 3. Hypothesis testing for a single numeric variable (p-value): [slides 16 - 18](https://warpwire.duke.edu/w/VYsDAA/) <br/><br/> 4. Permutation tests for independence: [slides 19 - 38](https://warpwire.duke.edu/w/V4sDAA/) --- ## Announcements - Project proposal due 04-06-20 at 11:59pm EST - Homework 4 (final homework) to be assigned 04-02-20 --- ## Packages and seed ```r library(tidyverse) library(infer) ``` ```r set.seed(200331) ``` --- ## The hypothesis testing framework 1. Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis. 2. Choose a (representative) sample, collect data, and analyze the data. 3. Figure out how likely it is to see data like what we observed, IF the null hypothesis were in fact true. 4. If our data would have been extremely unlikely if the null claim were true, then we reject it and deem the alternative claim worthy of further study. Otherwise, we cannot reject the null claim. --- ## What can go wrong? Recall the following four scenarios: | Decision | `\(H_0\)` is true | `\(H_0\)` is false | |----------------------|---------------|----------------| | Fail to reject `\(H_0\)` | Correct decision | *Type II Error* | | Reject `\(H_0\)` | *Type I Error* | Correct decision | - The probability of making a type 1 error is `\(\alpha\)` (and is specified by the researcher prior to conducting the test) - The probability of making a type 2 error is `\(\beta\)`, and is equal to 1 - the **power** of the test .question[ The power of the test is the probability of rejecting the null hypothesis when it is in fact truly false. ] -- There is a trade-off between power and type 1 error rate, but the exact details are more advanced topics best saved for future statistics courses. --- ## Equivalency of confidence and significance levels - Two sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - \alpha\)` - One sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - (2 \times \alpha)\)` <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- class: middle, center, inverse # Hypothesis testing for a single numeric variable --- ## Rent in Manhattan Let's revisit the price to rent a one-bedroom apartment in Manhattan. ```r library(tidyverse) manhattan <- read_csv("data/manhattan.csv") ``` .pull-left[ ```r manhattan %>% slice(1:10) ``` ``` #> # A tibble: 10 x 1 #> rent #> <dbl> #> 1 3850 #> 2 3800 #> 3 2350 #> 4 3200 #> 5 2150 #> 6 3267 #> 7 2495 #> 8 2349 #> 9 3950 #> 10 1795 ``` ] .pull-right[ ```r manhattan %>% slice(11:20) ``` ``` #> # A tibble: 10 x 1 #> rent #> <dbl> #> 1 2145 #> 2 2300 #> 3 1775 #> 4 2000 #> 5 2175 #> 6 2350 #> 7 2550 #> 8 4195 #> 9 1470 #> 10 2350 ``` ] --- ## Rent in Manhattan ```r rent_mean <- manhattan %>% summarise(mean_rent = mean(rent)) %>% pull(mean_rent) rent_mean ``` ``` #> [1] 2625.8 ``` According to the site Rent Jungle, the average price to rent an apartment in LA is around $2400 / month. Is the average rent for a one-bedroom in Manhattan significantly different from $2400 / month? 
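Before setting up a test, it can help to glance at the sample size and spread of the observed rents (a quick check, assuming `manhattan` is loaded as above; output not shown):

```r
manhattan %>%
  summarise(n = n(),            # number of one-bedroom listings in the sample
            sd_rent = sd(rent)) # spread of the observed rents
```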
-- Let's set up our hypothesis test and conduct our test at the `\(\alpha = 0.05\)` significance level. `$$H_0: \mu = 2400$$` `$$H_1: \mu \neq 2400$$` --- ## Simulation process We will use bootstrapping to generate a sampling distribution under the assumption of the null hypothesis being true. Then, calculate the p-value to make a decision on the hypotheses. -- 1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample. -- 2. Calculate the mean of the bootstrap sample. -- 3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap means. -- 4. **Shift the bootstrap distribution to be centered at the null value by subtracting/adding the difference between the center of the bootstrap distribution and the null value to each bootstrap mean.** -- 5. Calculate the p-value as the proportion of simulations that yield a sample mean at least as extreme as the observed sample mean. --- ## Take a bootstrap sample Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample ```r rent_bootstrap <- manhattan %>% specify(response = rent) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") ``` --- ## Take a bootstrap sample <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> -- Where is the center of the distribution? What should it be under the null hypothesis? --- ## Shift the distribution Shift the bootstrap distribution to be centered at the null value by subtracting/adding the difference between the center of the bootstrap distribution and the null value to each bootstrap mean. What is the mean of all our bootstrap sample means? ```r rent_boot_mean <- rent_bootstrap %>% summarise(mean = mean(stat)) %>% pull() rent_boot_mean ``` ``` #> [1] 2625.491 ``` -- Now, shift the bootstrap distribution to be centered at the null value. ```r rent_bootstrap <- rent_bootstrap %>% mutate(null_dist_stat = stat - (rent_boot_mean - 2400)) ``` --- ## Shift the distribution <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## Calculate the p-value .question[ How should we calculate the p-value? ] <img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Calculate the p-value How can we use the table below to calculate the p-value? What would be a better way to use `dplyr` functions to return the p-value directly? ```r rent_bootstrap %>% count(null_dist_stat > rent_mean) ``` ``` #> # A tibble: 2 x 2 #> `null_dist_stat > rent_mean` n #> <lgl> <int> #> 1 FALSE 904 #> 2 TRUE 96 ``` --- ## Using `hypothesize()` To avoid having to shift the distribution, we can use `hypothesize()`. ```r manhattan %>% specify(response = rent) %>% * hypothesize(null = "point", mu = 2400) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") %>% * get_p_value(obs_stat = rent_mean, direction = "two_sided") ``` ``` #> # A tibble: 1 x 1 #> p_value #> <dbl> #> 1 0.212 ``` -- What is your conclusion? --- class: center, middle, inverse # Permutation tests for independence --- ## Is yawning contagious? .question[ Do you think yawning is contagious? ] .pull-left[ ![empirical](img/09a/yawn1.png) ] .pull-right[ ![empirical](img/09a/yawn2.png) ] --- ## Is yawning contagious? 
An experiment conducted by the MythBusters tested whether a person can be subconsciously influenced into yawning if another person near them yawns.

[https://www.discovery.com/tv-shows/mythbusters/videos/is-yawning-contagious-2](https://www.discovery.com/tv-shows/mythbusters/videos/is-yawning-contagious-2)

--

*"Given how large our sample was, I'd say it's confirmed."*

*"There's little doubt, it does seem to be contagious."*

**Do we buy their conclusion?**

---

## Study description

In their study, 50 people were randomly assigned to two groups: 34 to a treatment group, where a person near them yawned, and 16 to a control group, where no one near them yawned.

```r
mb_yawn <- read_csv("data/mb-yawn.csv")
```

```r
mb_yawn %>%
  count(group, outcome)
```

```
#> # A tibble: 4 x 3
#>   group     outcome      n
#>   <chr>     <chr>    <int>
#> 1 control   not yawn    12
#> 2 control   yawn         4
#> 3 treatment not yawn    24
#> 4 treatment yawn        10
```

---

## Proportion of yawners

```r
mb_yawn %>%
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(p_hat = n / sum(n))
```

```
#> # A tibble: 4 x 4
#> # Groups:   group [2]
#>   group     outcome      n p_hat
#>   <chr>     <chr>    <int> <dbl>
#> 1 control   not yawn    12 0.75 
#> 2 control   yawn         4 0.25 
#> 3 treatment not yawn    24 0.706
#> 4 treatment yawn        10 0.294
```

- Proportion of yawners in the treatment group: `\(\frac{10}{34} = 0.2941\)`
- Proportion of yawners in the control group: `\(\frac{4}{16} = 0.25\)`
- Difference: `\(0.2941 - 0.25 = 0.0441\)`
- (These are the results calculated on the MythBusters episode)

---

## Independence?

Based on the proportions we calculated, do you think yawning is really contagious, i.e. are seeing someone yawn and yawning dependent?

```
#> # A tibble: 4 x 4
#> # Groups:   group [2]
#>   group     outcome      n p_hat
#>   <chr>     <chr>    <int> <dbl>
#> 1 control   not yawn    12 0.75 
#> 2 control   yawn         4 0.25 
#> 3 treatment not yawn    24 0.706
#> 4 treatment yawn        10 0.294
```

---

## Dependence, or another possible explanation?

- The observed differences might suggest that yawning is contagious, i.e. seeing someone yawn and yawning are dependent.
- But the differences are small enough that we might wonder whether they are simply **due to chance**.
- Perhaps if we were to repeat the experiment, we would see slightly different results.
- Instead of actually conducting the experiment many times, we will **simulate** our results using a **permutation test**.

**Careful**: even if two events are not independent, it doesn't mean that one *causes* the other.

.small-text[
...technically, we will perform an *approximate* permutation test, but don't worry about that for now!
]

---

## Two competing claims

`\(H_0\)`: "There is nothing going on."
Yawning (outcome) and seeing someone yawn (treatment vs. control) are **independent**:
`$$p_{treatment} = p_{control}$$`

`\(H_1\)`: "There is something going on."
Yawning and seeing someone yawn are <font class="vocab">not independent</font>; in fact, we will specify that the proportion of yawners is greater in the treatment group than in the control group:
`$$p_{treatment} > p_{control},$$`
where `\(p_{treatment}\)` is the true proportion (the parameter) of yawners among those who saw someone else yawn, and similarly for `\(p_{control}\)`.

---

## Permuting the treatment assignments

We know from our observed data that 14 people yawned and 36 people didn't yawn. We also know that there were 16 people in the control group (didn't see someone yawn) and 34 people in the treatment group (saw someone yawn), resulting in a test statistic of 0.0441.
.tiny[ ```r p_hat_diff <- mb_yawn %>% count(group, outcome) %>% group_by(group) %>% mutate(p_hat = n / sum(n)) %>% filter(outcome == "yawn") %>% pull(p_hat) %>% diff() p_hat_diff ``` ``` #> [1] 0.04411765 ``` ] -- While *keeping the same responses* (yawn vs. no yawn) as we observed, we will **permute** (shuffle) the treatment assignments of the observations and recalculate the difference in proportions. That is, we will recalculate the difference in proportions *as if* some of the yawners and some (or even all) of the non-yawners perhaps might have been in a different treatment group. .question[ This seems a bit odd. Why do we do this? ] --- ## The null hypothesis Remember, under `\(H_0\)`, there is no association between seeing someone yawn and yawning. If there truly is no association, then shuffling whether someone was assigned to watch somebody yawn or not yawn wouldn't matter -- we would expect similar proportions of people who yawn in each treatment group. -- In hypothesis testing, we assume the null hypothesis is true, and then compute the sampling distribution of the test statistic under this assumption. How might we obtain the sampling distribution? --- ## Repeated permutations We will do this treatment-shuffling again and again, calculate the difference in proportion for each simulation, and use this as an approximation to the null distribution. -- This distribution approximates all the possible differences in proportion we could have seen *if `\(H_0\)` were in fact true*. We then use this distribution to obtain the probability that we see our observed data (or more extreme) -- the *p-value*. This is similar in spirit to bootstrap estimation, but here we sample **without** replacement; we merely permute the treatment labels of each of our outcomes. 
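To make the shuffling concrete, below is a minimal sketch of a single permutation replicate done by hand with `dplyr`. It assumes `mb_yawn` is loaded as above; the next slide uses `infer` to repeat this many times for us.

```r
mb_yawn %>%
  mutate(group = sample(group)) %>%   # shuffle the treatment labels
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(p_hat = n / sum(n)) %>%      # proportions within each shuffled group
  filter(outcome == "yawn") %>%
  pull(p_hat) %>%
  diff()                              # one simulated treatment - control difference
```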
---

## Let's do this in R with `infer`

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
* hypothesize(null = "independence") %>%
* generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props",
            order = c("treatment", "control"))
```

---

## Simulation by computation - 1

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"

```r
mb_null_dist <- mb_yawn %>%
* specify(response = outcome, explanatory = group,
*         success = "yawn")
```

---

## Simulation by computation - 2

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
* hypothesize(null = "independence")
```

---

## Simulation by computation - 3

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)
- Generate simulated differences via permutation

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
  hypothesize(null = "independence") %>%
* generate(reps = 1000, type = "permute")
```

---

## Simulation by computation - 4

- Start with the data frame
- Specify the variables
- Since the response variable is categorical, specify the level which should be considered as "success"
- State the null hypothesis (yawning and whether or not you see someone yawn are independent)
- Generate simulated differences via permutation
- Calculate the sample statistic of interest (difference in proportions)
- Since the explanatory variable is categorical, specify the order in which the subtraction should occur for the calculation of the sample statistic, `\((\hat{p}_{treatment} - \hat{p}_{control})\)`.

```r
mb_null_dist <- mb_yawn %>%
  specify(response = outcome, explanatory = group,
          success = "yawn") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
* calculate(stat = "diff in props",
*           order = c("treatment", "control"))
```

---

## Visualizing the null distribution

What would you expect the center of the null distribution to be?

--

```r
visualize(mb_null_dist) +
  theme_minimal(base_size = 16)
```

<img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

---

## Calculating the p-value visually

What is the p-value (the probability of observing our results *or more extreme* if the null hypothesis were in fact true)?

```r
visualize(mb_null_dist) +
  shade_p_value(obs_stat = p_hat_diff, direction = "greater") +
  theme_minimal(base_size = 16)
```

<img src="lec-13a-simulation-inference-2_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

---

## Calculating the p-value directly

```r
mb_null_dist %>%
  filter(stat >= p_hat_diff) %>%
  summarise(p_value = n() / nrow(mb_null_dist))
```

```
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1   0.495
```

--

```r
get_p_value(mb_null_dist, obs_stat = p_hat_diff, direction = "greater")
```

```
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1   0.495
```

--

What is the conclusion of the hypothesis test?
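As a complement, and connecting back to the earlier slide on confidence and significance levels, a bootstrap confidence interval for `\(p_{treatment} - p_{control}\)` could be computed along these lines (a sketch only; recall that a one-sided test at `\(\alpha = 0.05\)` corresponds to a 90% confidence level):

```r
mb_yawn %>%
  specify(response = outcome, explanatory = group, success = "yawn") %>%
  generate(reps = 1000, type = "bootstrap") %>%     # resample the observed data
  calculate(stat = "diff in props",
            order = c("treatment", "control")) %>%
  get_confidence_interval(level = 0.90)             # percentile interval by default
```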
--- ## Application exercise To get started, accept the application exercise assignment: https://classroom.github.com/a/MLJaWtqX --- ## References 1. Tidy Statistical Inference. (2020). Infer.netlify.com. Retrieved 29 February 2020, from https://infer.netlify.com/index.html