# Inference

## Organ donors

People providing an organ for donation sometimes seek the help of a special “medical consultant”. These consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery. Patients might choose a consultant based in part on the historical complication rate of the consultant’s clients.

One consultant tried to attract patients by noting that the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated. She claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).

## Parameter vs. statistic

A parameter for a hypothesis test is the “true” value of interest. We typically estimate the parameter using a sample statistic as a point estimate.

$$p$$: true rate of complication

$$\hat{p}$$: rate of complication in the sample = $$\frac{3}{62}$$ = 0.048

## Correlation vs. causation

Is it possible to assess the consultant’s claim using the data?

No. The claim is that there is a causal connection, but the data are observational. For example, maybe patients who can afford a medical consultant can afford better medical care, which can also lead to a lower complication rate.

While it is not possible to assess the causal claim, it is still possible to test for an association using these data. For this question we ask, could the low complication rate of $$\hat{p}$$ = 0.048 be due to chance?

## Two claims

• Null hypothesis: “There is nothing going on”

Complication rate for this consultant is no different than the US average of 10%

• Alternative hypothesis: “There is something going on”

Complication rate for this consultant is lower than the US average of 10%

## Hypothesis testing as a court trial

• Null hypothesis, $$H_0$$: Defendant is innocent

• Alternative hypothesis, $$H_A$$: Defendant is guilty

• Present the evidence: Collect data

• Judge the evidence: “Could these data plausibly have happened by chance if the null hypothesis were true?”
• Yes: Fail to reject $$H_0$$
• No: Reject $$H_0$$

## Hypothesis testing framework

• Start with a null hypothesis ($$H_0$$) that represents the status quo

• Set an alternative hypothesis ($$H_A$$) that represents the research question, i.e. what we’re testing for

• Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (probability of observed or more extreme outcome given that the null hypothesis is true)
• if the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
• if they do, then reject the null hypothesis in favor of the alternative

## Setting the hypotheses

Which of the following is the correct set of hypotheses?
1. $$H_0: p = 0.10$$; $$H_A: p \ne 0.10$$
2. $$H_0: p = 0.10$$; $$H_A: p > 0.10$$
3. $$H_0: p = 0.10$$; $$H_A: p < 0.10$$
4. $$H_0: \hat{p} = 0.10$$; $$H_A: \hat{p} \ne 0.10$$
5. $$H_0: \hat{p} = 0.10$$; $$H_A: \hat{p} > 0.10$$
6. $$H_0: \hat{p} = 0.10$$; $$H_A: \hat{p} < 0.10$$

# Theoretical approach

## Bernoulli random variables

• each person in the study can be thought of as a trial

• a person is labeled a success if there are complications, and a failure if there are no complications
• assuming that $$H_0$$ is true, $$P(success) = p = 0.10$$
• when an individual trial has only two possible outcomes, it is called a Bernoulli random variable

## Quick example

What is the probability that exactly 1 out of 4 patients have complications during liver donor surgery?

4 possible scenarios:

[C] - [NC] - [NC] - [NC] = $$0.1 \times 0.9 \times 0.9 \times 0.9$$ = 0.073

[NC] - [C] - [NC] - [NC] = $$0.9 \times 0.1 \times 0.9 \times 0.9$$ = 0.073

[NC] - [NC] - [C] - [NC] = $$0.9 \times 0.9 \times 0.1 \times 0.9$$ = 0.073

[NC] - [NC] - [NC] - [C] = $$0.9 \times 0.9 \times 0.9 \times 0.1$$ = 0.073

Total: 4 $$\times$$ 0.073 = 0.292
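The count of scenarios and the total can be checked by enumerating all 16 possible outcome sequences for 4 patients. This is only a sketch: the names `outcomes`, `n_comp`, and `seq_prob` are introduced here for illustration, and patients are assumed to be independent with a complication probability of 0.1.

```r
# enumerate every sequence of 4 patients: 1 = complication (C), 0 = no complication (NC)
p = 0.1
outcomes = expand.grid(rep(list(c(0, 1)), 4))
# number of complications in each sequence
n_comp = rowSums(outcomes)
# probability of each sequence, assuming independent patients
seq_prob = p^n_comp * (1 - p)^(4 - n_comp)
# how many sequences have exactly one complication
sum(n_comp == 1)
# total probability of those sequences
sum(seq_prob[n_comp == 1])
```

The exact total is 0.2916; the 0.292 above comes from rounding each scenario to 0.073 before multiplying by 4.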

## Binomial distribution

The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.

P(k successes in n trials) = (# of scenarios) $$\times$$ P(one scenario)

$$= {n \choose k} p^k (1-p)^{n - k}$$

where

$${n \choose k} = \frac{n!}{k!(n-k)!}$$
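As a rough check, the formula above can be written directly as an R function and compared with R's built-in `dbinom()`. The helper name `binom_prob` is hypothetical and used only for this illustration.

```r
# hypothetical helper: P(k successes in n trials), written straight from the formula
binom_prob = function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}
# exactly 1 complication among 4 patients when p = 0.1
binom_prob(k = 1, n = 4, p = 0.1)
# built-in binomial probability for the same question
dbinom(x = 1, size = 4, prob = 0.1)
```

Both calls should return the same value, since `choose(n, k)` counts the scenarios and the remaining factors give the probability of one scenario.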

## Binomial conditions

1. the trials must be independent
2. the number of trials, n, must be fixed
3. each trial outcome must be classified as a success or a failure
4. the probability of success, p, must be the same for each trial

## Finding binomial probabilities using R

What is the probability that exactly 1 out of 4 patients have complications during liver donor surgery?
dbinom(x = 1, size = 4, prob = 0.1)
## [1] 0.2916

## Finding binomial probabilities using R

What is the probability that at least 1 out of 4 patients have complications during liver donor surgery?
dbinom(x = 1:4, size = 4, prob = 0.1)
## [1] 0.2916 0.0486 0.0036 0.0001
sum(dbinom(x = 1:4, size = 4, prob = 0.1))
## [1] 0.3439
1 - pbinom(q = 0, size = 4, prob = 0.1)
## [1] 0.3439
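The last line uses the complement rule: "at least 1" is everything except "exactly 0". A short sketch of two further equivalent ways to get the same number, using only base R:

```r
# P(X >= 1) = 1 - P(X = 0) = 1 - 0.9^4
1 - (1 - 0.1)^4
# upper tail directly: lower.tail = FALSE returns P(X > q), here P(X > 0)
pbinom(q = 0, size = 4, prob = 0.1, lower.tail = FALSE)
```

Both should agree with the 0.3439 shown above.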

## Using the binomial to calculate the p-value

p-value = P(observed or more extreme outcome | $$H_0$$ true)

= P(3 or fewer complications | $$p = 0.10$$)

(p_val = sum(dbinom(x = 0:3, size = 62, prob = 0.1)))
## [1] 0.121
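The same tail probability can also be obtained without summing by hand. The sketch below uses `pbinom()` and the exact binomial test `binom.test()` from base R's stats package; neither call appears in the original slides, so treat this as a cross-check rather than the course's method.

```r
# P(X <= 3) under H0, using the cumulative distribution function
pbinom(q = 3, size = 62, prob = 0.1)
# exact binomial test; alternative = "less" corresponds to H_A: p < 0.10
binom.test(x = 3, n = 62, p = 0.1, alternative = "less")$p.value
```

Both values should match the p-value of about 0.121 computed above.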

## Significance level

We often use 5% as the cutoff for whether the p-value is low enough that the data are unlikely to have come from the null model. This cutoff value is called the significance level ($$\alpha$$).

• If p-value < $$\alpha$$, reject $$H_0$$ in favor of $$H_A$$: The data provide convincing evidence for the alternative hypothesis.

• If p-value > $$\alpha$$, fail to reject $$H_0$$: The data do not provide convincing evidence for the alternative hypothesis.

## Conclusion

What is the conclusion of the hypothesis test?

Since the p-value is greater than the significance level (0.121 > 0.05), we fail to reject the null hypothesis. These data do not provide convincing evidence that the complication rate for this consultant's clients is lower than 10%, the overall US complication rate.
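The decision rule can be sketched in a few lines of R. The variable names `alpha` and `decision` are introduced here for illustration; the p-value is recomputed so the block runs on its own.

```r
alpha = 0.05  # significance level
# p-value: P(3 or fewer complications | p = 0.10)
p_val = sum(dbinom(x = 0:3, size = 62, prob = 0.1))
# compare the p-value to the significance level
if (p_val < alpha) {
  decision = "reject H0"
} else {
  decision = "fail to reject H0"
}
decision
```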

# Simulation approach

## Simulating the null distribution

• Instead of constructing the exact null distribution using the binomial, we can also simulate it.

• Remember that $$H_0: p = 0.10$$, so we need to simulate a null distribution where the probability of success (complication) for each trial (patient) is 0.10.

Describe how you would simulate the null distribution for this study using a bag of chips. How many chips? What colors? What do the colors indicate? How many draws? With replacement or without replacement?

## What do we expect?

When sampling from the null distribution, what is the expected proportion of success (complications)?

## Set-up

set.seed(9)
library(ggplot2)

## Simulation #1

# create sample space
chips = c("red", "blue")
# draw the first sample of size 62 from the null distribution
sim1 = sample(chips, size = 62, prob = c(0.1, 0.9), replace = TRUE)
# view the sample
table(sim1)
## sim1
## blue  red
##   51   11
# calculate the simulated sample proportion of complications (red chips)
(p_hat_sim1 = sum(sim1 == "red") / length(sim1))
## [1] 0.1774
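Drawing 62 chips with replacement is the same as taking one draw from a binomial distribution with 62 trials, so a single simulated sample could also be sketched with `rbinom()`. This is an alternative to the chip-based approach above, not what the slides use, and the names `n_comp_sim` and `p_hat_alt` are illustrative. It will not reproduce the exact numbers shown here, and because it consumes random numbers it is best run in a fresh session so the results on the following slides are unaffected.

```r
# one simulated count of complications among 62 patients under H0: p = 0.10
n_comp_sim = rbinom(n = 1, size = 62, prob = 0.1)
# simulated sample proportion of complications
(p_hat_alt = n_comp_sim / 62)
```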

## Recording and plotting

sim_dist = data.frame(p_hat_sim = rep(NA, 100))

sim_dist$p_hat_sim[1] = p_hat_sim1

ggplot(sim_dist, aes(x = p_hat_sim)) +
geom_dotplot() +
xlim(0,0.26) + ylim(0,10)

## Simulation #2

sim2 = sample(chips, size = 62, prob = c(0.1, 0.9), replace = TRUE)

(p_hat_sim2 = sum(sim2 == "red") / length(sim2))
## [1] 0.08065
sim_dist$p_hat_sim[2] = p_hat_sim2

ggplot(sim_dist, aes(x = p_hat_sim)) +
geom_dotplot() +
xlim(0,0.26) + ylim(0,10)

## Simulation #3

sim3 = sample(chips, size = 62,
prob = c(0.1, 0.9), replace = TRUE)

(p_hat_sim3 = sum(sim3 == "red") / length(sim3))
## [1] 0.2097
sim_dist$p_hat_sim[3] = p_hat_sim3

ggplot(sim_dist, aes(x = p_hat_sim)) +
geom_dotplot() +
xlim(0,0.26) + ylim(0,10)

Application exercise 6:

Automate the process of constructing the simulated null distribution using 100 simulations. Plot the distribution using a stacked dot plot, and calculate the p-value two ways: first by counting the dots on the plot, and then by subsetting in R.

Challenge: Your code should have as few hard coded arguments as possible. The ultimate goal is to be able to re-use the code with little modification for another dataset/hypothesis test.
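One possible starting point, offered as a sketch rather than a reference solution: a `for` loop that repeats the chip draw and stores each simulated proportion. The variable names (`n_sim`, `n_obs`, `p_null`, `p_hat_obs`, `p_val_sim`), the seed, and the `binwidth` choice are assumptions made for this illustration.

```r
library(ggplot2)
set.seed(9)  # arbitrary seed, for reproducibility only

# inputs for this study; change these to reuse the code elsewhere
n_sim = 100         # number of simulations
n_obs = 62          # sample size
p_null = 0.10       # complication rate under H0
p_hat_obs = 3 / 62  # observed sample proportion

chips = c("red", "blue")  # red = complication, blue = no complication
sim_dist = data.frame(p_hat_sim = rep(NA, n_sim))

# repeat the draw n_sim times, recording the proportion of red chips each time
for (i in 1:n_sim) {
  sim = sample(chips, size = n_obs, prob = c(p_null, 1 - p_null), replace = TRUE)
  sim_dist$p_hat_sim[i] = sum(sim == "red") / n_obs
}

# stacked dot plot of the simulated null distribution
ggplot(sim_dist, aes(x = p_hat_sim)) +
  geom_dotplot(binwidth = 1 / n_obs)

# simulated p-value: share of simulations at or below the observed proportion
(p_val_sim = sum(sim_dist$p_hat_sim <= p_hat_obs) / n_sim)
```

With only 100 simulations the simulated p-value will vary from run to run, so it should be close to, but not exactly equal to, the binomial p-value of 0.121.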