Two-sample inference (1)

class: center, middle, inverse, title-slide

# Two-sample inference (1)
### Yue Jiang
### Duke University

---

## Recap

So far, we've talked about performing interval estimation and hypothesis 
testing for means using

- simulation-based methods, such as bootstrap or direct simulation, and
- the Central Limit Theorem

In all cases so far, we've only compared one sample against a hypothesized 
value.

.question[
But what if we wanted to compare two samples against *each other*?
]

---

## Two-sample inference for means

We can use the same statistical inference tools to perform *two sample* 
inference on two population means. Suppose we have two (representative)
samples, and wanted to either estimate the difference in means in the two
populations using a confidence interval (that is, a confidence interval for
`\(\mu_1 - \mu_2\)`), or test the hypotheses

`\begin{align*}
H_0: \mu_1 = \mu_2\\
H_1: \mu_1 \neq \mu_2,
\end{align*}`

where `\(\mu_1\)` and `\(\mu_2\)` are the population means in groups 1 and 2.

.question[
How might you perform interval estimation and address the above hypothesis
test using simulation-based methods? How about the CLT?
]

---

class: center, middle

## Hypothesis testing

---

## Today's data

Adapted from Erdogdu Sakar, B., et al. *Collection and Analysis of a Parkinson*
*Speech Dataset with Multiple Types of Sound Recordings*, IEEE Journal of 
Biomedical and Health Informatics, vol. 17(4), pp. 828-834, 2013

(image from [Wikipedia](https://en.wikipedia.org/wiki/Spectrogram))

---

## Some voice analysis terminology

- Jitter: frequency variation from cycle to cycle
- Shimmer: amplitude variation of the sound wave

Jitter and shimmer are affected by lack of control of vocal cord vibration, and
pathological differences from average values may be indicative of Parkinson's
Disease (PD).

(from Teixeira, Oliveira, and Lopes, 2013)

---

## Question of interest

Is there a difference in average voice jitter between PD patients and healthy
controls?

`parkinsons.csv` contains repeated voice recordings from a number of patients,
some with PD and some serving as non-PD controls (Erdogdu Sakar et al., 2013).
For now, **assume that all samples were taken independently of each other**
(this is not actually the case, but we'll make this assumption).

Jitter is given in milliseconds (ms), and shimmer is given in decibels (dB).

---

## Bootstrap estimation

Let's construct the bootstrap distribution for the *difference* in means.

```r
library(tidyverse) # read_csv(), dplyr verbs, ggplot()

set.seed(2020)
parkinsons <- read_csv("data/parkinsons.csv")

healthy <- parkinsons %>% 
  filter(status == "Healthy")
pd <- parkinsons %>% 
  filter(status == "PD")

n_sims <- 2000
boot_diffs <- numeric(n_sims)
```

---

## Bootstrap estimation

Let's construct the bootstrap distribution for the *difference* in means.

```r
for (i in seq_len(n_sims)) {
  # row indices for each bootstrap sample, drawn with replacement within group
  indices_h <- sample(seq_len(nrow(healthy)), replace = TRUE)
  indices_p <- sample(seq_len(nrow(pd)), replace = TRUE)
  
  # bootstrap estimates of the group means
  temp_h <- healthy %>% 
    slice(indices_h) %>% 
    summarize(mean_jitter = mean(jitter)) %>% 
    pull(mean_jitter)
  
  temp_p <- pd %>% 
    slice(indices_p) %>% 
    summarize(mean_jitter = mean(jitter)) %>% 
    pull(mean_jitter)
  
  # difference in means in this bootstrap sample
  #                Healthy    PD
  boot_diffs[i] <- temp_h - temp_p
}
```
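
---

## Bootstrap estimation

The same resampling idea can be sketched more compactly in base R. This is only
an illustration: the two jitter vectors below are simulated stand-ins (not the
values in `parkinsons.csv`), and `boot_one()` and `sketch_diffs` are 
hypothetical names introduced for this sketch.

```r
# Hypothetical stand-in data: simulated, NOT the real parkinsons.csv values
set.seed(1)
healthy_jitter <- rnorm(60, mean = 0.004, sd = 0.002)
pd_jitter <- rnorm(140, mean = 0.007, sd = 0.005)

# One bootstrap replicate: resample each group separately (with replacement),
# then take the difference in resampled means (Healthy - PD)
boot_one <- function(x, y) {
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
}

n_sims <- 2000
sketch_diffs <- replicate(n_sims, boot_one(healthy_jitter, pd_jitter))

# 95% percentile interval for the difference in means
quantile(sketch_diffs, c(0.025, 0.975))
```

Each group is resampled on its own so that the bootstrap samples keep the
original group sizes.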

---

## Bootstrap estimation

Let's plot the bootstrap distribution for the *difference* in means.

```r
boot_diffs <- tibble(diffs = boot_diffs)

ggplot(boot_diffs, aes(x = diffs)) +
  geom_histogram(binwidth = 0.0001,
                 fill = "skyblue", 
                 color = "darkblue") +
  labs(x = "", y = "")
```

![](15-two-samp-inference_files/figure-html/unnamed-chunk-5-1.png)

---

## CI for difference in means

Let's use the bootstrap distribution to construct a 95% confidence interval for
the *difference* in means.

```r
boot_diffs %>% 
  summarize(lower = quantile(diffs, 0.025),
            upper = quantile(diffs, 0.975))
```

```
## # A tibble: 1 x 2
##      lower    upper
##      <dbl>    <dbl>
## 1 -0.00412 -0.00215
```

.question[
Interpret this interval (be careful about the order in which we subtracted the
two groups!).

Is there evidence that there is a difference in mean voice jitter between PD 
patients and healthy controls?
]

---

## Hypothesis testing

Let `\(\mu_P\)` be the mean voice jitter among PD patients, and `\(\mu_H\)` be the mean
voice jitter among healthy controls. Let's test

`\begin{align*}
H_0: \mu_P = \mu_H\\
H_1: \mu_P \neq \mu_H
\end{align*}`

If the two means are truly equal (i.e., if `\(H_0\)` is true), then the difference
in means, `\(\mu_H - \mu_P\)`, should be zero.

---

## Hypothesis testing

Let's construct the simulated **null** distribution for the difference in means.
If the two means are truly equal (i.e., if `\(H_0\)` is true), then this difference
should be zero, so we shift the bootstrap distribution to be centered at zero.

```r
offset <- boot_diffs %>% 
  summarize(offset = 0 - mean(diffs)) %>% 
  pull()

null_dist <- boot_diffs %>% 
  mutate(centered_diffs = diffs + offset) %>% 
  select(centered_diffs)
```

---

## Hypothesis testing

```r
ggplot(null_dist, aes(x = centered_diffs)) +
  geom_histogram(binwidth = 0.0001,
                 fill = "skyblue", 
                 color = "darkblue") +
  labs(x = "", y = "")
```

![](15-two-samp-inference_files/figure-html/unnamed-chunk-8-1.png)

---

## Hypothesis testing

```r
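# observed difference in sample means (Healthy - PD)
obs_diff <- mean(healthy$jitter) - mean(pd$jitter)

# two-sided p-value: proportion of null differences at least as extreme
# (in either direction) as the observed difference
null_dist %>% 
  mutate(extreme = ifelse(abs(centered_diffs) >= abs(obs_diff), 1, 0)) %>% 
  summarize(p_val = mean(extreme))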
```

```
## # A tibble: 1 x 1
##   p_val
##   <dbl>
## 1     0
```

.question[
Is there evidence that there is a difference in mean voice jitter between PD 
patients and healthy controls? 
]
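
---

## Hypothesis testing

The shift-then-compare steps can be wrapped into one small helper. This is a
sketch under stated assumptions: `boot_p_value()` is a hypothetical function
(not part of any package), and `toy_boot` is simulated stand-in data rather
than the bootstrap differences from `parkinsons.csv`.

```r
# Two-sided p-value from a bootstrap distribution shifted to the null value
boot_p_value <- function(boot_stats, observed, null_value = 0) {
  # shift the bootstrap distribution so it is centered at the null value
  centered <- boot_stats - mean(boot_stats) + null_value
  # proportion of null statistics at least as far from the null value
  # (in either direction) as the observed statistic
  mean(abs(centered - null_value) >= abs(observed - null_value))
}

# Hypothetical stand-in for a vector of bootstrap differences in means
set.seed(1)
toy_boot <- rnorm(2000, mean = -0.003, sd = 0.0005)

p_far <- boot_p_value(toy_boot, observed = -0.003)   # far from zero
p_near <- boot_p_value(toy_boot, observed = -0.0005) # close to zero
c(p_far = p_far, p_near = p_near)
```

Taking absolute values on both sides is what makes the p-value two-sided,
matching the alternative `\(H_1: \mu_P \neq \mu_H\)`.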

---

## Built-in CLT-based commands

CLT-based inference for a difference in means relies on the 
.vocab[two-sample t-test for independent samples]. Like the t-test we've seen
before, the .vocab[test statistic] takes the following form:

`\begin{align*}
T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2) }{\widehat{SE}_{diff}}
\end{align*}`

The test statistic depends on whether we can assume that the two groups have the 
same underlying variability in their observations. It's safest to assume that 
they don't - that they have *unequal variances*.

The exact distribution of the test statistic under the null hypothesis, 
including its degrees of freedom, involves a complicated fraction that no one
calculates by hand. Best to let R handle it!
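
---

## Built-in CLT-based commands

For reference only, here is the usual unequal-variance (Welch) form of the
standard error and the approximate degrees of freedom. The symbols 
`\(s_1^2, s_2^2\)` (sample variances) and `\(n_1, n_2\)` (sample sizes) are notation
assumed here; they do not appear elsewhere in these slides.

`\begin{align*}
\widehat{SE}_{diff} &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\\
df &\approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\end{align*}`

Under `\(H_0\)`, `\(T\)` approximately follows a t distribution with these degrees of
freedom, which is why `df` is generally not a whole number.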

---

## Built-in CLT-based commands

```r
t.test(jitter ~ status, 
       data = parkinsons,
       mu = 0,
       var.equal = FALSE,
       alternative = "two.sided",
       conf.level = 0.95)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  jitter by status
## t = -5.9588, df = 187.04, p-value = 1.239e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.004157188 -0.002089232
## sample estimates:
## mean in group Healthy      mean in group PD 
##           0.003866042           0.006989252
```

.question[
Comprehensively evaluate the research question by specifying the hypotheses, 
the test statistic and its distribution under the null, the p-value,
and decision at the `\(\alpha = 0.05\)` significance level. Interpret the 
conclusions from your hypothesis test in context of the original research 
question.
]
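
---

## Built-in CLT-based commands

The pieces reported by `t.test()` can also be pulled out of the returned object
and checked against a by-hand Welch calculation. This is a sketch on simulated
stand-in data (`toy` is hypothetical and does not contain the values in
`parkinsons.csv`); the `_hand` names are introduced only for this example.

```r
# Hypothetical stand-in data: simulated, NOT the real parkinsons.csv values
set.seed(1)
toy <- data.frame(
  status = rep(c("Healthy", "PD"), times = c(60, 140)),
  jitter = c(rnorm(60, 0.004, 0.002), rnorm(140, 0.007, 0.005))
)

fit <- t.test(jitter ~ status, data = toy, var.equal = FALSE)

# By-hand Welch calculation (Healthy - PD, matching the group order above)
x <- toy$jitter[toy$status == "Healthy"]
y <- toy$jitter[toy$status == "PD"]
vx <- var(x) / length(x) # squared standard error of the Healthy mean
vy <- var(y) / length(y) # squared standard error of the PD mean

t_hand <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_hand <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_hand <- 2 * pt(-abs(t_hand), df = df_hand)

# Compare the built-in results with the by-hand values
c(t = unname(fit$statistic), df = unname(fit$parameter), p = fit$p.value)
c(t = t_hand, df = df_hand, p = p_hand)
```

The confidence interval is available in the same way, as `fit$conf.int`.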

---

## Your turn!

[https://classroom.github.com/a/YwK6GNXy](https://classroom.github.com/a/YwK6GNXy)