Quantifying Surprise in the Data and Model Verification

M. J. Bayarri and James Berger

P-values are often perceived as measurements of the degree of surprise in the data, relative to a hypothesized model. They are also commonly used in model (or hypothesis) verification, i.e., to provide a basis for rejection of a model or hypothesis. We first make a distinction between these two goals: quantifying surprise can be important in deciding whether or not to search for alternative models, but is questionable as the basis for rejection of a model. For measuring surprise, we propose a simple calibration of the p-value which roughly converts a tail area into a Bayes factor or 'odds' measure. Many Bayesians have suggested certain modifications of p-values for use in measuring surprise, including the predictive p-value and the posterior predictive p-value. We propose two alternatives, the conditional predictive p-value and the partial posterior predictive p-value, which we argue are more acceptable from the perspective of Bayesian (or conditional) reasoning.
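The abstract does not spell out the calibration itself. One widely cited calibration of this type, associated with the same authors' line of work, is the lower bound B(p) = -e p log p on the Bayes factor in favor of the null, valid for p < 1/e; its use here is an illustrative assumption, not necessarily the exact formula of the paper. A minimal sketch:

```python
import math

def calibrate_p_value(p):
    """Lower bound on the Bayes factor in favor of the null hypothesis
    implied by a p-value, using the calibration B(p) = -e * p * ln(p),
    which is valid (informative) only for p < 1/e. Smaller B means
    stronger evidence against the null."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1.0 / math.e:
        return 1.0  # the bound is uninformative for p >= 1/e
    return -math.e * p * math.log(p)

def odds_against_null(p):
    """Convert the Bayes-factor bound into (maximal) odds against the
    null, assuming prior odds of 1 (equal prior probabilities)."""
    return 1.0 / calibrate_p_value(p)
```

For example, a p-value of 0.05 calibrates to a Bayes-factor bound of about 0.41, i.e., at most roughly 2.5-to-1 odds against the null: far weaker evidence than the "1 in 20" reading of the raw tail area suggests.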