$Id: 250wk11.txt,v 1.1 2015/01/27 23:04:25 rlw Exp rlw $
Bayesian Testing [Loosely based on DG+S 9.9]
------------------------------------------------------------------
Suppose we observe X.1 ... X.n and we know that they are iid from one of
two possible pdf's, f.0(x) (the "null hypothesis") or f.1(x) (the "alternative
hypothesis"). We have already seen that the best sampling-theory-based
test of H.0 against H.1 will "Reject H.0 at level alpha" if the
LIKELIHOOD RATIO (against H.0)
              f.1(x.1) ... f.1(x.n)
    /\(x) = -------------------------
              f.0(x.1) ... f.0(x.n)
is big--- specifically, if the "P-value"
P = Pr_0 [ /\(X) >= t ] <= alpha
where "t" is the observed value of /\(x) and where the probability
is computed under the "null" distribution f.0 .
What would be a Bayesian solution to the problem of choosing one of
two simple hypotheses, H.0 or H.1???
Let's begin with a prior probability pi.0 for H.0 and pi.1 for H.1;
then the posterior probability of H.0 is:
                        pi.0 f.0(x.1) ... f.0(x.n)
    P[ H.0 | x ] = ---------------------------------------------------------
                    pi.0 f.0(x.1) ... f.0(x.n) + pi.1 f.1(x.1) ... f.1(x.n)
Divide top and bottom by the numerator:
                              1
                   = --------------------------
                      1 + (pi.1/pi.0) * /\(x)
where /\(x) is our old friend the likelihood ratio, just as above.
It's equivalent and maybe simpler to think about the ODDS against H.0:
    P[ H.1 | x ]     pi.1
    ------------  =  ------  /\(x)
    P[ H.0 | x ]     pi.0
i.e. the posterior odds against H.0 are the PRODUCT of the PRIOR ODDS
against H.0 times the LIKELIHOOD RATIO against H.0.
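The prior-odds-times-likelihood-ratio update above can be sketched in a few
lines of Python (the function names here are ours, for illustration only):

```python
from math import prod

def likelihood_ratio(xs, f0, f1):
    """Lambda(x) = [f.1(x.1)...f.1(x.n)] / [f.0(x.1)...f.0(x.n)] for iid data xs."""
    return prod(f1(x) for x in xs) / prod(f0(x) for x in xs)

def posterior_prob_H0(lr, prior0=0.5):
    """P[H.0 | x] from the likelihood ratio lr = Lambda(x) and prior pi.0 = prior0."""
    prior_odds = (1 - prior0) / prior0    # pi.1 / pi.0, the prior odds against H.0
    return 1 / (1 + prior_odds * lr)      # posterior odds against H.0 = prior odds * lr
```

With prior0 = 1/2 the prior odds are 1, and the posterior probability of H.0
is just 1/(1 + Lambda), as in the display above.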
NUMERICAL EXAMPLE:
For H.0: X.i ~ Po(2) vs H.1: X.i ~ Po(3)
the LHR is /\(x) = (3/2)^S.n exp(-n) [ where S.n is the sum ], so for a
prior giving equal probability pi.0 = 1/2 = pi.1 to the two hypotheses,
1) With n= 1 and X = 3, /\ = 1.241593 P[H.0|x] = 0.4461
2) With n= 10 and X-bar = 2.9, /\ = 5.803656 P[H.0|x] = 0.1470
3) With n=100 and X-bar = 2.9, /\ = 43352778 P[H.0|x] = 2.3067e-08
Again we find the evidence compelling with n=100, and not at all with n=1;
in the more interesting n=10 case, starting from equal prior probabilities,
the Bayesian approach doesn't find the evidence very compelling.
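The three cases can be checked with a short Python sketch of the formula
/\(x) = (3/2)^S.n exp(-n) (passing the pairs (n, S.n) directly):

```python
from math import exp

def poisson_lr(s_n, n):
    """Lambda(x) = (3/2)^S.n * exp(-n) for H.0: X.i ~ Po(2) vs H.1: X.i ~ Po(3)."""
    return 1.5 ** s_n * exp(-n)

# the three cases from the notes, as (n, S.n) pairs
for n, s_n in [(1, 3), (10, 29), (100, 290)]:
    lam = poisson_lr(s_n, n)
    post = 1 / (1 + lam)   # P[H.0 | x] with equal priors pi.0 = pi.1 = 1/2
    print(n, lam, post)
```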
---------------------------------------------------------------------------
Note that
BOTH the - Sampling-theory "P-value"
AND the - Bayesian "Posterior Probability of H.0"
are numbers between zero and one, with
SMALL values suggesting the Hypothesis H.0 is probably FALSE and
LARGE values suggesting the Hypothesis H.0 is probably TRUE,
but they are NOT THE SAME THING--- pay close attention to the
conditioning:
P-value = Prob [ T(x) >= t | H.0 true ]
Posterior Probability = Prob [ H.0 true | T(x) = t ]
As always, the sampling-theory quantity is computed
CONDITIONALLY ON THE PARAMETER VALUE (which we pretend we know)
while, as always, the Bayesian quantity is computed
CONDITIONALLY ON THE DATA (which we have observed)
The numerical values of these different evidence summaries are often
wildly different. For a striking example, consider a binomial experiment in
which we observe
x = 473 successes in n = 1000 independent trials
Let's test the point-null hypothesis
H.0: X ~ Bi(n=1000, p = 0.5)
against the composite alternative hypothesis
H.1: X ~ Bi(n=1000, p < 0.5)
The P-value for H.0 is
P = P[ X <= 473 | p = 0.50 ] = pbinom(473, 1000, 0.5) = 0.04684365,
so H.0 would be rejected at level alpha = 0.05.
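The pbinom() value can be reproduced without R using only the Python standard
library (binom_cdf is our helper, mirroring R's pbinom):

```python
from math import comb

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Bi(n, p); same value R reports as pbinom(k, n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_value = binom_cdf(473, 1000, 0.5)
print(p_value)   # just under the 0.05 cutoff
```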
With equal prior probabilities pi.0 = pi.1 = 1/2, the Bayesian
posterior probability of H.0 against the specific alternative
H.1: X ~ Bi(n=1000, p)
for each fixed 0 < p < 1/2 is
                               1
    P[ H.0 | X=473 ] = -----------------
                            1 + /\
for
            p^473 (1-p)^527
    /\ = --------------------- = (2p)^473 (2q)^527,   q = (1-p)
          (1/2)^473 (1/2)^527
It turns out this posterior probability is NEVER BELOW 0.189 for ANY
CHOICE of 0 < p < 1/2 (the minimum comes at p = 0.473, where /\ is max).
Thus the evidence of "473 successes in 1000 trials" is:
strong enough for a sampling-based approach to reject at alpha=0.05;
weak enough that the posterior probability of H.0 is at least 19%
for any possible prior distribution with pi.0 = 1/2.
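The claimed lower bound can be verified by a brute-force grid search over p
(the grid resolution here is our choice; any fine grid gives the same picture):

```python
def post_H0(p):
    """P[H.0 | X=473] against the point alternative Bi(1000, p), equal priors."""
    lam = (2 * p) ** 473 * (2 * (1 - p)) ** 527   # /\ = (2p)^473 (2q)^527
    return 1 / (1 + lam)

# search 0 < p < 1/2 on a fine grid; the minimum sits at the MLE p = 0.473,
# where /\ is maximized
grid = [i / 100000 for i in range(1, 50000)]
p_min = min(grid, key=post_H0)
print(p_min, post_H0(p_min))
```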
In some sense what is going wrong for the Sampling Theory approach is
that the P-value includes the probability of observing T >= t, i.e. of
evidence as strong *OR STRONGER* than what we observed
conditional on H.0 being true. To compute this we must look at the pdf for
the data, f(x | th), NOT ONLY at the actual data we observed but also at
those MORE EXTREME than we observed. For this problem (and many others),
the evidence of those "more extreme" points is much stronger than the
evidence we actually observed.
----------------------------------------------------------------------------
BAYES for COMPOSITE HYPOTHESES
Testing a simple hypothesis like
H.0: th = th.0
against a one-sided compound alternative like
H.1: th > th.0
or a two-sided compound alternative like
H.1: th != th.0
is more complex--- we must specify not only the prior probability
of the point th.0, but also how the rest of the prior probability
is spread out over the other values of th.
It is NOT POSSIBLE to do this in a "vague" or "non-informative" way, i.e.,
we cannot use an improper prior distribution in testing. We could get
away with that in ESTIMATION problems because the arbitrary constant
"c" in formulas like
pi(th) = c, -oo < th < oo
for the improper uniform density on the whole real line would CANCEL
when computing the posterior distribution (it's in both the numerator
and denominator); for TESTING problems it doesn't cancel.
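This cancellation (and its failure for testing) can be seen numerically. In
the sketch below, the N(th, 1) likelihood, the single observation x = 1.3,
and the integration grid are all our illustrative choices, not from the
notes: rescaling the improper-prior constant c leaves the posterior mean
unchanged, but changes the "posterior probability of H.0: th = 0" built from
that same constant.

```python
from math import exp, sqrt, pi as PI

x = 1.3                                  # one hypothetical observation

def lik(th):                             # f(x | th) for X ~ N(th, 1)
    return exp(-0.5 * (x - th) ** 2) / sqrt(2 * PI)

ths = [i / 1000 for i in range(-10000, 10001)]   # grid on [-10, 10]
dth = 0.001

results = {}
for c in (1.0, 100.0):
    # ESTIMATION: posterior mean under pi(th) = c; the constant c cancels
    num = sum(th * c * lik(th) for th in ths) * dth
    den = sum(c * lik(th) for th in ths) * dth
    post_mean = num / den
    # TESTING: pi.0 = 1/2 on th = 0, remaining mass spread as c over th;
    # here c does NOT cancel, so the answer is arbitrary
    marg1 = sum(c * lik(th) for th in ths) * dth
    post_H0 = lik(0) / (lik(0) + marg1)
    results[c] = (post_mean, post_H0)
    print(c, post_mean, post_H0)
```

Both runs report the same posterior mean (about 1.3), while the "posterior
probability of H.0" shrinks by roughly the factor 100 that was slipped into c.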