$Id: 250wk11.txt,v 1.1 2015/01/27 23:04:25 rlw Exp rlw $

                          Bayesian Testing
                     [Loosely based on DG+S 9.9]
------------------------------------------------------------------

Suppose we observe X.1 ... X.n and we know that they are iid from one
of two possible pdf's, f.0(x) (the "null hypothesis") or f.1(x) (the
"alternative hypothesis").  We have already seen that the best
sampling-theory based test of H.0 against H.1 will "Reject H.0 at
level alpha" if the LIKELIHOOD RATIO (against H.0)

               f.1(x.1) ... f.1(x.n)
    /\(x)  =  -----------------------
               f.0(x.1) ... f.0(x.n)

is big--- specifically, if the "P-value"

    P = Pr_0 [ /\(X) >= t ]  <=  alpha,

where "t" is the observed value of /\(x) and where the probability is
computed under the "null" distribution f.0 .

What would be a Bayesian solution to the problem of choosing one of
two simple hypotheses, H.0 or H.1???  Let's begin with a prior
probability pi.0 for H.0 and pi.1 for H.1; then the posterior
probability of H.0 is:

                           pi.0 f.0(x.1) ... f.0(x.n)
  P[ H.0 | x ] = ---------------------------------------------------------
                  pi.0 f.0(x.1) ... f.0(x.n) + pi.1 f.1(x.1) ... f.1(x.n)

Divide top and bottom by the numerator:

                              1
               = --------------------------
                  1 + (pi.1/pi.0) * /\(x)

where /\(x) is our old friend the likelihood ratio, just as above.
It's equivalent and maybe simpler to think about the ODDS against H.0:

    P[ H.1 | x ]      pi.1
    ------------  =  ------  /\(x)
    P[ H.0 | x ]      pi.0

i.e. the posterior odds against H.0 are the PRODUCT of the PRIOR ODDS
against H.0 times the LIKELIHOOD RATIO against H.0.

NUMERICAL EXAMPLE:  For  H.0: X.i ~ Po(2)  vs  H.1: X.i ~ Po(3)  the
LHR is  /\(x) = (3/2)^S.n exp(-n)  [ where S.n is the sum ], so for a
prior giving equal probability pi.0 = 1/2 = pi.1 to the two hypotheses,

  1) With n=  1 and X     = 3,    /\ = 1.241593    P[H.0|x] = 0.4461
  2) With n= 10 and X-bar = 2.9,  /\ = 5.803656    P[H.0|x] = 0.1470
  3) With n=100 and X-bar = 2.9,  /\ = 43352778    P[H.0|x] = 2.3067e-08

Again we find the evidence compelling with n=100, and not at all with
n=1; in the more interesting n=10 case, beginning with equal prior
probabilities, the Bayesian approach doesn't find the evidence very
compelling.

---------------------------------------------------------------------------

Note that BOTH the

  - Sampling-theory "P-value"                    AND the
  - Bayesian "Posterior Probability of H.0"

are numbers between zero and one, with SMALL values suggesting the
Hypothesis H.0 is probably FALSE and LARGE values suggesting the
Hypothesis H.0 is probably TRUE, but they are NOT THE SAME THING---
pay close attention to the conditioning:

    P-value                =  Prob [ T(x) >= t | H.0 true ]

    Posterior Probability  =  Prob [ H.0 true  | T(x) = t  ]

As always, the sampling-theory quantity is computed CONDITIONALLY ON
THE PARAMETER VALUE (which we pretend we know) while, as always, the
Bayesian quantity is computed CONDITIONALLY ON THE DATA (which we
have observed).

The numerical values of these different evidence summaries are often
wildly different.  For a great example, consider a binomial problem
in which we observe

    x = 473 successes in n = 1000 independent trials.

Let's test the point-null hypothesis            H.0: X ~ Bi(n=1000, p = 0.5)
against the composite alternative hypothesis    H.1: X ~ Bi(n=1000, p < 0.5)

The P-value for H.0 is

    P = P[ X <= 473 | p = 0.50 ] = pbinom(473, 1000, 0.5) = 0.04684365,

so H.0 would be rejected at level alpha = 0.05.
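
As a quick numerical check, here is a short R sketch (not part of the
original notes; the function name post.H0 and its arguments are just
made up for this illustration).  It reproduces the three Poisson
posterior probabilities above and the binomial P-value just computed:

    post.H0 <- function(S, n, pi0 = 0.5) {
      lhr <- (3/2)^S * exp(-n)          # likelihood ratio /\(x) against H.0
      1 / (1 + (1 - pi0)/pi0 * lhr)     # posterior probability of H.0
    }
    post.H0(S =   3, n =   1)    # ~0.4461   (case 1 above)
    post.H0(S =  29, n =  10)    # ~0.1470   (case 2)
    post.H0(S = 290, n = 100)    # ~2.31e-08 (case 3)

    pbinom(473, 1000, 0.5)       # ~0.0468, the one-sided P-value for H.0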
With equal prior probabilities pi.0 = pi.1 = 1/2, the Bayesian
posterior probability of H.0 against the specific alternative
H.1: X ~ Bi(n=1000, p), for each 0 < p < 1/2, is

                              1
       P[ H.0 | X=473 ]  =  -------
                             1 + /\

for

              p^473 (1-p)^527
       /\ = ---------------------  =  (2p)^473 (2q)^527,     q = (1-p).
             (1/2)^473 (1/2)^527

It turns out this posterior probability is NEVER BELOW 0.189 for ANY
CHOICE of 0 < p < 1/2 (the minimum comes at p = 0.473, where /\ is
maximal; see the numerical check at the end of these notes).  Thus
the evidence of "473 successes in 1000 trials" is:

    strong enough for a sampling-based approach to reject at alpha = 0.05;

    weak enough that the posterior probability of H.0 is at least 18.9%
    for any possible prior distribution with pi.0 = 1/2.

In some sense what is going wrong for the Sampling Theory approach is
that the P-value includes the probability of observing T >= t, i.e. of
evidence as strong *OR STRONGER* than what we observed, conditional on
H.0 being true.  To compute this we must look at the pdf for the data,
f(x | th), NOT ONLY at the actual data we observed but also at those
MORE EXTREME than we observed.  For this problem (and many others),
the evidence of those "more extreme" points is much stronger than the
evidence we actually observed.

----------------------------------------------------------------------------

                    BAYES for COMPOSITE HYPOTHESES

Testing a simple hypothesis like                  H.0: th  = th.0
against a one-sided compound alternative like     H.1: th  > th.0
or a two-sided compound alternative like          H.1: th != th.0
is more complex--- we must specify not only the prior probability of
the point th.0, but also how the rest of the prior probability is
spread out over the other values of th.

It is NOT POSSIBLE to do this in a "vague" or "non-informative" way,
i.e., we cannot use an improper prior distribution in testing.  We
could get away with that in ESTIMATION problems because the arbitrary
constant "c" in formulas like

    pi(x) = c,    -oo < x < oo

for the improper uniform density on the whole real line would CANCEL
when computing the posterior distribution (it's in both the numerator
and denominator); for TESTING problems it doesn't cancel.
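
Returning to the binomial example above: here is a short R sketch
(again illustrative only, not part of the original notes; the name
post.H0.binom is made up) that checks the claim that P[ H.0 | X=473 ]
is never below about 0.189 for any simple alternative 0 < p < 1/2:

    post.H0.binom <- function(p, x = 473, n = 1000, pi0 = 0.5) {
      ## likelihood ratio /\ = (2p)^x * (2(1-p))^(n-x), computed on the
      ## log scale to avoid underflow in p^473 * (1-p)^527
      log.lhr <- x * log(2*p) + (n - x) * log(2*(1 - p))
      1 / (1 + (1 - pi0)/pi0 * exp(log.lhr))
    }
    opt <- optimize(post.H0.binom, interval = c(1e-6, 0.5 - 1e-6))
    opt$minimum      # ~0.473, the alternative p that maximizes /\
    opt$objective    # ~0.189, smallest possible posterior probability of H.0

Working on the log scale is just a numerical precaution; the minimum
found this way agrees with the value quoted above.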