Today's agenda

Inference

Organ donors

People providing an organ for donation sometimes seek the help of a special "medical consultant". These consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery. Patients might choose a consultant based in part on the historical complication rate of the consultant's clients.

One consultant tried to attract patients by noting that the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated. She claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).

Parameter vs. statistic

In the context of inference, a parameter is the "true" (but unknown) value of interest.

We estimate the parameter of interest using a sample statistic as a point estimate; however, these estimates are imperfect and uncertain.

Our goal is to use this imperfect point estimate to say something useful about the parameter.


\(p\) - true rate of complication for this consultant

\(\hat{p}\) - rate of complication in the sample (\(\frac{3}{62} \approx 0.048\))

Correlation vs. causation

Is it possible to assess the consultant's claim using the data?

No - The claim is that there is a causal connection, but the data are observational.

For example, maybe patients who can afford a medical consultant can afford better medical care, which can also lead to a lower complication rate.

While it is not possible to assess the causal claim, it is still possible to test for an association using these data.

Instead, we will settle for asking: could this low complication rate of \(\hat{p} = 0.048\) be due to chance?

Two claims

  • Null hypothesis - Status quo, there is nothing going on

  • Alternative hypothesis - Some effect, there is something going on


In context,

  • Null hypothesis - The complication rate for this consultant is no different than the US average of 10%

  • Alternative hypothesis - The complication rate for this consultant is lower than the US average of 10%

Hypothesis testing as a court trial

  • Null hypothesis, \(H_0\) - Defendant is innocent
  • Alternative hypothesis, \(H_A\) - Defendant is guilty
  • Judge / Jury - decide if the defendant is guilty beyond a reasonable doubt
    - Guilty
    
    - Not guilty (note this isn't the same as innocent)

P-Value

A p-value is the probability of the observed data, or a more extreme outcome in favor of \(H_A\), given that \(H_0\) is true.


A p-value is not:

  • Probability \(H_0\) is true
  • Probability \(H_A\) is false

Hypothesis testing framework

Want to decide: "Could these data plausibly have happened by chance if the null hypothesis were true?"

  • Start with a null hypothesis (\(H_0\)) that represents the status quo
  • Set an alternative hypothesis (\(H_A\)) that represents the research question, i.e. what we're testing for. Should be the complement of \(H_0\).
  • We conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (the probability of the observed data, or a more extreme outcome in favor of \(H_A\), given that \(H_0\) is true)
    • if the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, fail to reject null hypothesis (not guilty)
    • if they do, then reject the null hypothesis in favor of the alternative (guilty)

Setting the hypotheses

Which of the following is the correct set of hypotheses?

  1. \(H_0: p = 0.10\) \(H_A: p \ne 0.10\)
  2. \(H_0: p = 0.10\) \(H_A: p > 0.10\)
  3. \(H_0: p = 0.10\) \(H_A: p < 0.10\)
  4. \(H_0: \hat{p} = 0.10\) \(H_A: \hat{p} \ne 0.10\)
  5. \(H_0: \hat{p} = 0.10\) \(H_A: \hat{p} > 0.10\)
  6. \(H_0: \hat{p} = 0.10\) \(H_A: \hat{p} < 0.10\)

Simulating the null distribution

  • Since \(H_0: p = 0.10\), we will simulate a null distribution where the probability of success (complication) for each trial (patient) is 0.10.

How would you simulate the null distribution for this study using a bag of chips?

How many chips? What colors? What do the colors indicate? How many draws? With replacement or without replacement?


The null distribution is the distribution of \(\hat{p}\) when we assume that \(H_0\) is true (\(p=0.1\)).

What do we expect?

When sampling from the null distribution, what is the expected proportion of successes (complications)?
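As a quick sanity check (a sketch, not part of the original slides), the expected count under \(H_0\) follows directly from \(n \times p\):

```r
# under H0 each of the 62 patients independently has complication probability 0.10
n = 62
p = 0.10
n * p  # expected number of complications: 6.2
       # expected proportion of successes: p = 0.10
```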

Set-up

set.seed(20160925)
library(ggplot2)
library(dplyr)

Simulation #1

# create sample space
outcomes = c("complication", "no complication")

# draw the first sample of size 62 from the null distribution
sim1 = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)

# view the sample
table(sim1)
## sim1
##    complication no complication 
##               4              58
# calculate the simulated sample proportion of complications (red chips)
(p_hat_sim1 = sum(sim1 == "complication") / length(sim1))
## [1] 0.06451613

Recording and plotting

# create an empty data frame
sim_dist = data.frame(p_hat_sim = rep(NA, 100))

# record the simulated p-hat as the first observation
sim_dist$p_hat_sim[1] = p_hat_sim1

# plot
ggplot(sim_dist, aes(x = p_hat_sim)) + 
  geom_dotplot() + 
  xlim(0, 0.26) + ylim(0, 10)

Simulation #2

sim2 = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)

sim_dist$p_hat_sim[2] = sum(sim2 == "complication") / length(sim2)

ggplot(sim_dist, aes(x = p_hat_sim)) + 
  geom_dotplot() + 
  xlim(0,0.26) + ylim(0,10)

Simulation #3

sim3 = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)

sim_dist$p_hat_sim[3] = sum(sim3 == "complication") / length(sim3)

ggplot(sim_dist, aes(x = p_hat_sim)) + 
  geom_dotplot() + 
  xlim(0,0.26) + ylim(0,10)

This is getting boring…

We need a way to automate this process!

Simple Looping

for loops

The simplest and most common type of loop in R: iterate through the elements of a vector and evaluate the code block for each.

for(x in 1:10)
{
  cat(x^2," ", sep="") # cat: concatenate and print
}
## 1 4 9 16 25 36 49 64 81 100
for(y in list(1:3, LETTERS[1:7], c(TRUE,FALSE)))
{
  cat(length(y)," ",sep="")
}
## 3 7 2

Storing results

It is almost always better to pre-allocate an object to store your results, rather than growing the object as you go.

# Good
res = rep(NA,10)
for(x in 1:10)
{
  res[x] = x^2
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100
# Bad
res = c()
for (x in 1:10)
{
  res = c(res,x^2)
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100
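For a simple transformation like this, R's vectorized arithmetic avoids the loop entirely (a sketch; it produces the same result as both loops above):

```r
# vectorized: square every element of the vector at once
res = (1:10)^2
res
# 1 4 9 16 25 36 49 64 81 100
```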

Back to inference

Using loops to create the null distribution

  • Earlier we simulated each iteration one-by-one and showed how we would fill in each element of sim_dist$p_hat_sim.

  • Using for loops we can automate this process

Simulating the null distribution with a for loop

sim_dist = data.frame(p_hat_sim = rep(NA, 100))
for (i in 1:100){
  sim = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)
  sim_dist$p_hat_sim[i] = sum(sim == "complication") / length(sim)
}

ggplot(sim_dist, aes(x = p_hat_sim)) + 
  geom_dotplot()

Calculating the p-value

Remember: the p-value is the probability of the observed data, or a more extreme outcome in favor of \(H_A\), given that \(H_0\) is true.

What is the p-value, i.e. in what % of the simulations was the simulated \(\hat{p}\) at least as extreme as the observed \(\hat{p}\) of 0.048?

Calculating the p-value

sim_dist %>% arrange(p_hat_sim) %>% head(7)
##    p_hat_sim
## 1 0.03225806
## 2 0.03225806
## 3 0.03225806
## 4 0.03225806
## 5 0.03225806
## 6 0.03225806
## 7 0.04838710
sim_dist %>% 
  filter(p_hat_sim < 0.048) %>% 
  summarize(p_value = n()/nrow(sim_dist))
##   p_value
## 1    0.06
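Equivalently, the p-value can be computed in base R as the mean of a logical vector. This sketch reruns the simulation so the chunk is self-contained, and uses the same cutoff as the dplyr code above (the exact result depends on the random draws):

```r
set.seed(20160925)
outcomes = c("complication", "no complication")

# rebuild the null distribution of p-hat
sim_dist = data.frame(p_hat_sim = rep(NA, 100))
for (i in 1:100) {
  sim = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)
  sim_dist$p_hat_sim[i] = sum(sim == "complication") / length(sim)
}

# p-value: proportion of simulated p-hats below the observed 0.048
(p_value = mean(sim_dist$p_hat_sim < 0.048))
```

`mean()` of a logical vector gives the proportion of `TRUE`s, which is exactly the `n()/nrow()` calculation in the dplyr version.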

Significance level

We most often use 5% as the cutoff for whether a p-value is low enough to enable us to decide that the data are unlikely to have come from the null model. This cutoff value is called the significance level (\(\alpha\)) and its choice relates to our allowable error rate.

  • If p-value < \(\alpha\), reject \(H_0\) in favor of \(H_A\): The data provide convincing evidence for the alternative hypothesis.

  • If p-value > \(\alpha\), fail to reject \(H_0\): The data do not provide convincing evidence for the alternative hypothesis.

Conclusion

What is the conclusion of the hypothesis test?

Since the p-value is greater than the significance level, we fail to reject the null hypothesis. These data do not provide convincing evidence that this consultant incurs a lower complication rate than 10% (overall US complication rate).

Let's get real

  • 100 simulations is not sufficient

  • We usually simulate around 10,000 to 1,000,000 times to get an accurate distribution
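As a sketch of scaling up (assuming the same outcomes and null model as above), the only change is the number of iterations:

```r
set.seed(20160925)
outcomes = c("complication", "no complication")

# same simulation as before, but with 10,000 iterations
n_sim = 10000
p_hat_sim = rep(NA, n_sim)
for (i in 1:n_sim) {
  sim = sample(outcomes, size = 62, prob = c(0.1, 0.9), replace = TRUE)
  p_hat_sim[i] = sum(sim == "complication") / length(sim)
}

# with more simulations the estimated p-value stabilizes
mean(p_hat_sim < 0.048)
```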

Application Exercise

App Ex - Inference for a proportion

See course website for details.