Bayesian vs. frequentist inference

Morning after

A study addressed the question of whether the controversial abortion drug RU486 could be an effective “morning after” contraceptive. The study participants were women who came to a health clinic asking for emergency contraception after having had sex within the previous 72 hours. Investigators randomly assigned the women to receive either RU486 or standard therapy consisting of high doses of the sex hormones estrogen and a synthetic version of progesterone. Of the women assigned to RU486 (T for Treatment), 4 became pregnant. Of the women who received standard therapy (C for Control), 16 became pregnant. How strongly does this information indicate that T is more effective than C?

Example modified from Don A. Berry's Statistics: A Bayesian Perspective (1995), Ch. 6, p. 15.

Framework

  • To simplify matters, let's turn this two-proportion comparison into a one-proportion problem: consider only the 20 total pregnancies, and ask how likely it is that 4 of them occurred in the T group.

  • If T and C are equally effective, and the sample sizes for the two groups are the same, then the probability that any given pregnancy comes from the T group is simply \(p = 0.5\).

Frequentist approach

Hypotheses

In the frequentist framework we can set up the hypotheses as follows:

  • \(H_0: p = 0.5\) - No difference, the pregnancy is equally likely to come from the T or C group
  • \(H_A: p < 0.5\) - T more effective, the pregnancy is less likely to come from the T group

Note that here \(p\) is the probability that a given pregnancy comes from the T group.

p-value

  • \(n = 20\), since there are 20 total pregnancies
  • \(p = 0.5\), since in the frequentist approach we compute the p-value assuming \(H_0\) is true
  • p-value = \(P(k \le 4 \mid n = 20, p = 0.5)\)
  • Using the Binomial distribution:
sum(dbinom(0:4, size = 20, prob = 0.5))  # P(k <= 4) when p = 0.5
## [1] 0.005909
  • Thus, the chance of observing 4 or fewer pregnancies in the T group, given that a pregnancy was equally likely to come from either group, is approximately 0.0059. (The same tail probability can be obtained directly; see the check below.)
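As a quick check, the same lower-tail probability can be obtained with pbinom, or packaged as a one-sided test with binom.test (both base R):

pbinom(4, size = 20, prob = 0.5)                           # P(k <= 4) when p = 0.5
## [1] 0.005909
binom.test(4, 20, p = 0.5, alternative = "less")$p.value   # same one-sided p-value
## [1] 0.005909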

Bayesian approach

Hypotheses, i.e. models

  • Begin by delineating each of the models we consider plausible:
    • Assume that it is plausible that the chances that a pregnancy comes from the T group (\(p\)) could be
      10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90%
  • Hence we are considering 9 models, not 1 as in the classical (frequentist) paradigm
  • The model for 20% says that given a pregnancy occurs, there are 2:8 (or 1:4) odds that it will occur in the T group. Similarly, the model for 80% says that given a pregnancy occurs, there are 4:1 odds that it will occur in the T group. (The odds for all nine models are computed in the sketch below.)
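For concreteness, here is a small sketch (base R) that converts each candidate value of \(p\) into the corresponding odds that a pregnancy comes from the T group:

p = seq(0.1, 0.9, 0.1)   # the nine candidate models
round(p / (1 - p), 2)    # odds in favor of the T group for each model
## [1] 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00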

Specifying the prior

  • Prior probabilities should reflect our state of belief prior to the current experiment.
  • They should incorporate the information learned from all relevant research up to the current point in time, but not information from the current experiment.
  • Suppose my prior probability for each of the 9 models is as presented below:
Model (p)   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    Total
Prior       0.06   0.06   0.06   0.06   0.52   0.06   0.06   0.06   0.06   1


  • This prior says the treatment is equally likely to be better or worse than the standard therapy, because the prior probabilities are symmetric about \(p = 0.5\).
  • This prior also specifies that there is a 52% chance that there is no difference between the treatments, as 52% of the prior is on the value \(p = 0.5\).

Likelihood

  • Now we are ready to calculate the P(data | model) for each model considered.
  • This probability is called the likelihood. In this example, this is simply
    P(data | model) = P(k = 4 | n = 20, p)

Calculating the likelihood

p = seq(0.1, 0.9, 0.1)                        # the nine candidate values of p
prior = c(rep(0.06, 4), 0.52, rep(0.06, 4))   # prior probabilities (sum to 1)
likelihood = dbinom(4, size = 20, prob = p)   # P(k = 4 | n = 20, p) for each model
Model (p)                   0.1     0.2     0.3     0.4     0.5     0.6     0.7   0.8   0.9   Total
prior, P(model)             0.06    0.06    0.06    0.06    0.52    0.06    0.06  0.06  0.06  1
likelihood, P(data|model)   0.0898  0.2182  0.1304  0.0350  0.0046  0.0003  0     0     0

Posterior

Calculating the posterior

numerator = prior * likelihood        # P(data | model) * P(model) for each model
denominator = sum(numerator)          # P(data), summing over all models
posterior = numerator / denominator   # P(model | data), by Bayes' rule
sum(posterior)                        # posterior probabilities sum to 1
## [1] 1
Model (p)                   0.1     0.2     0.3     0.4     0.5     0.6     0.7   0.8   0.9   Total
prior, P(model)             0.06    0.06    0.06    0.06    0.52    0.06    0.06  0.06  0.06  1
likelihood, P(data|model)   0.0898  0.2182  0.1304  0.0350  0.0046  0.0003  0     0     0
P(data|model) * P(model)    0.0054  0.0131  0.0078  0.0021  0.0024  0       0     0     0     0.0308
posterior, P(model|data)    0.1748  0.4248  0.2539  0.0681  0.0780  0.0005  0     0     0     1
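The same update can be wrapped in a small helper so it can be re-run with different data. This is a sketch only; the function name bayes_update is not part of the original notes:

bayes_update = function(k, n, p, prior) {
  likelihood = dbinom(k, size = n, prob = p)  # P(data | model) for each model
  numerator = prior * likelihood              # P(data | model) * P(model)
  numerator / sum(numerator)                  # normalize: P(model | data)
}
round(bayes_update(k = 4, n = 20, p = p, prior = prior), 4)
## [1] 0.1748 0.4248 0.2539 0.0681 0.0780 0.0005 0.0000 0.0000 0.0000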

Decision making in the Bayesian paradigm

Prior, likelihood, and posterior visualized

par(mfrow = c(1,3))
barplot(prior, names.arg = p, las = 2, main = "Prior")
barplot(likelihood, names.arg = p, las = 2, main = "Likelihood")
barplot(posterior, names.arg = p, las = 2, main = "Posterior")

[Figure: barplots of the prior, likelihood, and posterior across the nine models]

More synthesis of the Bayesian approach

What if we had more data?

likelihood = dbinom(4*2, size = 20*2, prob = p)   # same proportion of pregnancies (0.2), twice as much data
numerator = prior * likelihood
denominator = sum(numerator)
posterior = numerator / denominator
par(mfrow = c(1,3))
barplot(prior, names.arg = p, las = 2, main = "Prior")
barplot(likelihood, names.arg = p, las = 2, main = "Likelihood")
barplot(posterior, names.arg = p, las = 2, main = "Posterior")

[Figure: barplots of the prior, likelihood, and posterior with twice the data (k = 8, n = 40)]

What if we had even more data?

likelihood = dbinom(4*10, size = 20*10, prob = p)   # same proportion (0.2), ten times as much data
numerator = prior * likelihood
denominator = sum(numerator)
posterior = numerator / denominator
par(mfrow = c(1,3))
barplot(prior, names.arg = p, las = 2, main = "Prior")
barplot(likelihood, names.arg = p, las = 2, main = "Likelihood")
barplot(posterior, names.arg = p, las = 2, main = "Posterior")

[Figure: barplots of the prior, likelihood, and posterior with ten times the data (k = 40, n = 200)]

Prediction

Prediction

  • Often probabilities for parameters are not the quantities of most interest to us.
  • What we really want to know is the chance of a particular outcome for the next observation.
  • What are the chances the next pregnancy will come from the RU486 (T) group?

Morning after, continued

  • Next we will calculate the predictive probability for the next pregnancy.
  • By weighting each candidate value of \(p\) by its posterior probability, we calculate a weighted average of \(p\) (written out below).
  • This weighted average is interpreted as the probability that the next pregnancy comes from group T, and it is called a predictive probability.
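Written out, with the nine candidate values denoted \(p_1, \dots, p_9\):

\[ P(\text{next pregnancy from T} \,|\, \text{data}) = \sum_{i = 1}^{9} p_i \, P(p_i \,|\, \text{data}) \]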

Predictive probabilities

(predictive = round(p * posterior, 4))   # weight each model's p by its posterior probability
## [1] 0.0175 0.0850 0.0762 0.0272 0.0390 0.0003 0.0000 0.0000 0.0000
sum(predictive)
## [1] 0.2452
Model (p)                       0.1     0.2     0.3     0.4     0.5     0.6     0.7   0.8   0.9   Total
prior, P(model)                 0.06    0.06    0.06    0.06    0.52    0.06    0.06  0.06  0.06  1
posterior, P(model|data)        0.1748  0.4248  0.2539  0.0681  0.0780  0.0005  0     0     0     1
predictive, p * P(model|data)   0.0175  0.0850  0.0762  0.0272  0.0390  0.0003  0     0     0     0.2452

Prediction and utility

  • We predict that the next pregnancy we observe will come from the T group with ~24.5% chance.
  • Using these predictive probabilities, one could perform a decision analysis on whether RU486 or the standard treatment would yield the higher utility.

Continuous parameter space

Discrete to continuous

  • In the previous example we considered only 9 possible values for \(p\).
  • Using calculus, we can consider any value of \(p\) in the interval \(0 \le p \le 1\).
  • This simply requires us to use integration rather than summation in the calculations presented above (see the sketch below).
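A minimal sketch of the continuous-\(p\) calculation in base R, assuming a flat prior \(f(p) = 1\) on \([0, 1]\): the normalizing constant is the integral of the likelihood over \(p\), and the posterior density is the likelihood divided by that constant.

n = 20; k = 4
lik = function(p) dbinom(k, size = n, prob = p)           # likelihood as a function of p
norm_const = integrate(lik, lower = 0, upper = 1)$value   # equals 1 / (n + 1) here
posterior_density = function(p) lik(p) / norm_const       # integrates to 1 over [0, 1]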

The uniform distribution as the prior

\[ f(x)=\begin{cases} \frac{1}{b - a} & \mathrm{for}\ a \le x \le b, \\[8pt] 0 & \mathrm{for}\ x<a\ \mathrm{or}\ x>b \end{cases} \]

Prior, likelihood, and posterior

  • Prior: \(f(p) = 1\)
  • Likelihood: \(f(k | p) = {n \choose k} p^k (1-p)^{n-k}\)
  • Posterior \(\propto\) Prior \(\times\) Likelihood \[ \begin{align*} f(p | k) &\propto f(p) \times f(k | p) \\ &= 1 \times {n \choose k} p^k (1-p)^{n-k} \\ &\propto p^k (1-p)^{n-k} \end{align*} \]

Prior, likelihood, and posterior visualized

When \(n = 20\) and \(k = 4\)

[Figure: prior, likelihood, and posterior densities for n = 20, k = 4]
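The plotting chunk is not shown in these notes; a possible way to reproduce the three panels (a sketch, assuming base R graphics) uses the fact that a flat prior combined with a binomial likelihood gives a Beta(\(k + 1\), \(n - k + 1\)) posterior:

n = 20; k = 4
p_grid = seq(0, 1, length.out = 500)
par(mfrow = c(1, 3))
plot(p_grid, dunif(p_grid), type = "l", main = "Prior", xlab = "p", ylab = "")
plot(p_grid, dbinom(k, size = n, prob = p_grid), type = "l", main = "Likelihood", xlab = "p", ylab = "")
plot(p_grid, dbeta(p_grid, k + 1, n - k + 1), type = "l", main = "Posterior", xlab = "p", ylab = "")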

What if we had more data?

When \(n = 40\) and \(k = 8\)

[Figure: prior, likelihood, and posterior densities for n = 40, k = 8]

What if we had even more data?

When \(n = 200\) and \(k = 40\)

[Figure: prior, likelihood, and posterior densities for n = 200, k = 40]

All together

[Figure: the prior, likelihood, and posterior plots for all three sample sizes shown together]