COURSE: STA 395 - Readings in Statistical Science
TIME: TTh 3:50-5:05
PLACE: Room 25, Old Chemistry Building
As in previous semesters, this course will primarily
consist of discussion by faculty members (or advanced graduate
students) of their current research interests. An effort will be
made, however, to organize the lectures by subject area, with
introductory lectures in an area being given if the area is
likely to be unfamiliar to many of the students in the course.
The course will thus also serve to fill "gaps" in formal course
offerings, and/or provide concentrated review in a subject area.
Student feedback will be sought to ensure that enough introductory
material is presented for the research talks to be understandable.
Students who have registered to audit the course will,
of course, not have official coursework. Students registered for
credit will be responsible for writing reports on lectures given
in the course. The number of reports required equals the number of
credits (1 to 3) for which one has registered. These reports should
not be a review of the material presented, but should rather involve
a creative addition to the material. Ways in which this could be done
include:
- using proposed new methodology on a data set, with discussion
of the results;
- developing the proposed methodology in a special case that had
not been considered in the lectures;
- comparing the new methodology with existing methodologies.
Reports need not be extensive.
The initial concentration area of the course will be
Bayesian Analysis of Mixture Models. Jim Berger will begin
with two introductory lectures in the area, covering
- Basic definition of mixture models;
- The statistical uses of mixture models;
- Difficulties with classical and Bayesian analysis of mixture models;
- Standard Gibbs sampling in mixture models.
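As a concrete illustration of the last topic, here is a minimal sketch of standard Gibbs sampling in a two-component normal mixture. This is a toy illustration only, not taken from the lectures: the simulated data, the conjugate priors (Beta on the weight, normal on the means, known unit variances), and all variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from a two-component normal mixture (known variance 1)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 40)])
n = len(x)

# Illustrative priors: w ~ Beta(1, 1), mu_i ~ N(0, 10^2)
mu = np.array([-1.0, 1.0])
w = 0.5
n_iter = 2000
draws = np.empty((n_iter, 3))  # store (w, mu_1, mu_2)

for t in range(n_iter):
    # 1. Sample component allocations z_j given (w, mu)
    p1 = w * np.exp(-0.5 * (x - mu[0]) ** 2)
    p2 = (1 - w) * np.exp(-0.5 * (x - mu[1]) ** 2)
    z = rng.random(n) < p2 / (p1 + p2)  # True -> component 2

    # 2. Sample the weight w given the allocations (Beta posterior)
    n2 = z.sum()
    w = rng.beta(1 + n - n2, 1 + n2)

    # 3. Sample each component mean given its allocated data
    #    (conjugate normal update with prior variance 100)
    for i, idx in enumerate([~z, z]):
        ni, si = idx.sum(), x[idx].sum()
        var = 1.0 / (ni + 1 / 100)  # posterior variance
        mu[i] = rng.normal(var * si, np.sqrt(var))

    draws[t] = (w, mu[0], mu[1])

post = draws[500:].mean(axis=0)  # posterior means after burn-in
```

The three update steps (allocations, weight, component parameters) are the "standard Gibbs sampler" structure the introductory lectures refer to; real analyses would also update component variances and attend to label switching.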
Subsequent talks are expected to cover various applications of
mixture models, reversible jump MCMC computation, identifiability,
and default Bayesian analysis of mixture models. The schedule of
later talks will be forthcoming.
Fall 1997 Speaker Schedule
On Thursday, September 11, 1997: Peter Mueller will speak on
Issues in Bayesian Analysis of
Neural Network Models
Stemming from work by Buntine and Weigend (1991), MacKay (1992) and
Neal (1996), there is a growing interest in Bayesian analysis of Neural
Network Models. We study computational approaches to the problem,
suggesting an efficient Markov chain Monte Carlo scheme for inference
and prediction with fixed-architecture feed-forward neural networks.
The scheme is then extended to the variable-architecture case,
providing a procedure for automatic, data-driven choice of
architecture.
On Tuesday, September 16, 1997: Susan Paddock will speak on
Mixture Models in Exploration of Chemical Activity and Binding
in Drug Designs
On Thursday, September 18, 1997: Giovanni Parmigiani will
speak on
A Primer on Model Mixing
In this talk I will introduce you to the basics of Bayesian model
averaging, or model mixing. The idea is this. It sometimes happens
that several models will fit your data comparably well, but each will
lead to rather different predictions, conclusions or decisions. It may
then be more sensible to incorporate the uncertainty about which model
to use, and draw your conclusions based on mixing over models, rather
than just choosing one. I will use the example of selecting variables
for a linear model to illustrate some of the modelling and
computational issues. I'll show you some pictures from an application
to a designed experiment in the pharmaceutical field, and also some
simulation results.
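The variable-selection example can be sketched numerically. The following toy code is my own illustration, not the talk's method: it uses BIC-based approximate posterior model probabilities (a common shortcut) rather than fully specified priors, and mixes predictions over all subsets of two candidate predictors.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends on x1 only; x2 is a spurious candidate
n = 50
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] + rng.normal(scale=1.0, size=n)

def fit(cols):
    """OLS on an intercept plus the given columns of X."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ beta) ** 2).sum())
    return A, beta, rss

# Enumerate all 4 subsets of {x1, x2}; weight each by exp(-BIC/2)
models, bics, preds = [], [], []
for r in range(3):
    for cols in itertools.combinations(range(2), r):
        A, beta, rss = fit(list(cols))
        k = len(cols) + 1
        bic = n * np.log(rss / n) + k * np.log(n)
        models.append(cols)
        bics.append(bic)
        preds.append(A @ beta)  # in-sample predictions

w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()  # approximate posterior model probabilities
y_mix = sum(wi * p for wi, p in zip(w, preds))  # model-averaged prediction
```

The mixed prediction y_mix incorporates the uncertainty about which subset of predictors to use, rather than committing to a single selected model.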
On Tuesday, September 23, 1997: Merlise Clyde will speak on
Does Particulate Matter Particularly Matter?
The effect of particulate matter on mortality is currently an important
issue, as the U.S. Environmental Protection Agency proposes new
regulations. Many of the epidemiological models are based on
Poisson regression models for the mortality counts, with independent
variables including various daily and lagged meteorological variables and
measures of particulate matter. The high correlations among many of the
explanatory variables make traditional model selection difficult. The
statistical significance of PM10 and PM2.5 (particulate matter of size
10 and 2.5 micrometers or smaller, respectively), however, depends
heavily on which meteorological variables are selected.
Issues of variable choice and model uncertainty due to variable
selection thus play an important role in making decisions. We use
Bayesian model averaging to assess the effect of PM10 on mortality,
taking into account model uncertainty about which meteorological
variables should be included. Because there are a large number of
possible models with over two hundred variables, we introduce an
approximation to the posterior distribution that allows for efficient
importance sampling or Markov Chain Monte Carlo sampling of models with
high posterior probability from high dimensional model spaces. We
present posterior distributions under model averaging of the effect of
PM10 on mortality and of the relative risk and contrast these results
with previous analyses. The methods are applicable to a wide range of
generalized linear models where prediction and model uncertainty are
important issues.
On Thursday, September 25, 1997: There will be no seminar
On Tuesday, September 30, 1997: Mike West will speak on
Mixtures in (a corner of) Neurophysiology
I'll talk about a collection of mixture modelling ideas, methods
and issues generated in the course of a collaborative research project
with neurophysiologists 1991-1995. This will include selected aspects of
the scientific problem area and experimental contexts, aspects of our
early work with "standard" Bayesian mixture models, some details of the
later development of more "customised" hierarchical mixture models, various
modelling issues and inferential difficulties, illustrative examples with
some of our (many) data sets, and commentary on outstanding/open problems.
nb. 1: Some of the background and more recent (1994-5) work appears in
a 1997 JASA paper available on the Web site, DP 94-23, and there
are several other papers on the Web site related to this project.
nb. 2: Several current PhD students will have seen some or all of this
a couple of years ago in STA 395, and pieces in STA 214 last year.
On Thursday, October 2, 1997: Mike West will speak on
Mixtures in (a corner of) Neurophysiology
(continued)
On Tuesday, October 7, 1997: Jose Miguel Perez will speak on
Default Analysis of Mixture Models using Improper
Priors
Consider the mixture model with k components, X ~ sum_{i=1}^k w_i
f(.|theta_i), where component i has parameters theta_i. It is well
known that when improper priors are assigned to the theta_i, the
marginal of X cannot be calculated, and hence the posteriors do not
exist. Diebolt and Robert (1994), and later Berger and Shui (1996), have
suggested sampling algorithms in which enough observations are assigned
to each component to obtain proper posteriors. The problem is then that
the allocations of observations to components are not independent, which
makes the final inference much harder. As a result, default analysis of
mixture models is usually carried out using vague priors.
We propose an extension of the Richardson and Green (1996) reversible jump
MCMC algorithm for mixture models, in which the improper component priors
pi^N(theta_i) are modified in the following way:
pi^*(theta_1,...,theta_k | k) = integral{ product_i{ pi^N_i(theta_i|x^*) }
m^*(x^*) dx^* }
Here we take m^*(.) to be the marginal of a minimal set of observations
X^* for one component. Intuitively, X^* is an imaginary common training set
for all component improper priors. This training set is used to produce
posteriors pi^N(theta_i|X^*). There are several advantages to this
approach, including independence from the arbitrary constants in the
improper priors, and posterior independence of the allocation of
observations to each component.
On Thursday, October 9, 1997: There will be no seminar - Duke
Workshops begin!
On Tuesday, October 14, 1997: There will be no seminar due
to October Break
On Thursday, October 16, 1997: Jon Stroud will speak on
A New Method for Estimating Nonlinear State-Space
Models
We'll begin by reviewing the Bayesian literature on state-space
models. We'll then describe our simulation-based method, which
involves approximating conditional densities with locally-weighted
mixtures of normals. Finally, we'll present results from simulation
studies that illustrate the effectiveness of our method. This work
is joint with Peter Mueller.
On Tuesday, October 21, 1997: Susie Bayarri will speak on
Weighted Distributions, Selection Models and Selection
Mechanisms
Weighted distributions and selection models arise when a random
sample from the entire population of interest cannot be obtained, or is
deliberately not sought. Instead, the probability or density that a
particular value enters the sample is multiplied by a nonnegative
weight function (weighted distributions), which may, in particular, take
the form of the indicator function of some selection set (selection
models). Bayesian and related methods for the analysis of such models
are discussed, with special emphasis on selection, or truncation, models.
Selection mechanisms that could lead to selected data of this type are
studied and represented as binomial sampling plans. Examples are given
in which the selection mechanism can be ignored, in the sense that the
data can be analyzed as a random sample of fixed size from a selection
model.
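A tiny sketch of the mechanism just described may help fix ideas. This is my own toy illustration, not from the talk: size-biased sampling from an Exponential(1) population, in which each candidate draw enters the sample with probability proportional to its weight w(x) = x (truncated at 10 so the acceptance probability is well defined).

```python
import numpy as np

rng = np.random.default_rng(4)

def weighted_sample(n, w, w_max, draw):
    """Sample n values whose density is proportional to w(x) * f(x),
    by accepting each candidate x ~ f with probability w(x) / w_max."""
    out = []
    while len(out) < n:
        x = draw()
        if rng.random() < w(x) / w_max:
            out.append(x)
    return np.array(out)

# Size-biased sampling (w(x) = x) from an Exponential(1) population.
# The weighted distribution is Gamma(2, 1), so the mean shifts from 1 to 2.
xs = weighted_sample(5000, w=lambda x: min(x, 10.0), w_max=10.0,
                     draw=lambda: rng.exponential())
```

With the indicator of a set as the weight function, the same loop would instead produce a selection (truncation) model sample.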
On Thursday, October 23, 1997: Susie Bayarri will speak on
Issues on Information and Robustness in Weighted
Distributions
We study whether a "weighted" or "selected" sample is more or less
informative than a random sample from the whole population. We compare the
standard statistical experiment in which a random sample is drawn from
some distribution with samples obtained from different weighted versions
of that distribution. This comparison is carried out by means of the
concepts of sufficiency and pairwise sufficiency as developed in
Blackwell's theory of comparison of statistical experiments; Fisher
information is also used. Examples are presented which illustrate the wide
variety of effects that weight functions can have on the information
provided by the experiments. Not only the information but, more generally,
the inferences usually depend heavily on the weight function used, and often
there is considerable uncertainty concerning this weight function; for
instance, it may only be known that it lies between two specified weight
functions. We consider robust Bayesian analysis for this situation,
finding the range of posterior quantities of interest as the weight
function ranges over a class of weight functions.
On Tuesday, October 28, 1997: Mike West will speak on
Inference in Successive Sampling Discovery Models
I'll talk about models and inference in finite population contexts with
size-biased sampling -- the specific problem area being that of so-called
"successive sampling discovery modelling". Here units of a finite
population are sampled ("discovered") sequentially in time. At any sampling
stage, each of the unsampled units has a chance of selection that is a
function of its characteristics (e.g., its "size"). I'll tell you a story
about some work from 1991/2 on Bayesian inference in the area, with background.
In addition to the biased sampling aspects, there are some interesting issues
in the mathematics, in the simulation-based computation (surprise!), and in an
applied study involving discovery of oil reserves. There's also a bit of fun
to be had comparing Bayesian results with non-Bayesian efforts.
Paper listed as 19 on the "Papers" link on my homepage covers the story
(it later appeared in Econometrica). Paper 25 there is some earlier work,
which also talks about other kinds of selection/biased sampling issues.
On Thursday, October 30, 1997: Jaeyong Lee will speak on
Semiparametric Bayesian Analysis of Selection Models
Selection models are appropriate when a datum, x, enters the sample
with weight w(x), where w is typically a monotone function. In many cases,
statisticians may have only a vague idea about the form of the weight
function. In this paper, a Dirichlet process prior around a parametric
form of the weight function is used as a prior on the weight function. An
MCMC computational scheme, using latent variables and reversible jumps,
is described. The difficulties and dangers of nonparametric analysis of
the weight function will also be highlighted.
On Tuesday, November 4, 1997: Luis R. Pericchi, Univ. Simon Bolivar, will speak on
Accurate and Stable Bayesian Model Selection: the Median Intrinsic Bayes Factor
For Hypothesis Testing and Model Selection, the Bayesian approach
is attracting considerable attention. The reasons for this include:
i) It yields the posterior probabilities of the models (and not only
the reject/non-reject rules of frequentist statistics); ii) It is
a predictive approach; and iii) It automatically embodies the principle
of scientific parsimony.
Until recently, obtaining such benefits through the Bayesian approach
required elicitation of proper subjective prior distributions, or the use
of approximations of questionable generality.
In Berger and Pericchi (JASA, 1996), the Intrinsic Bayes Factor Strategy
is introduced, and is shown to be an automatic default method corresponding
to an actual (and sensible) Bayesian analysis. In particular, it is shown
that the Intrinsic Bayes Factor yields an answer which is asymptotically
equivalent to the use of a specific (and reasonable) prior distribution,
called the Intrinsic Prior. In this sense, the IBF method is correct
to second order and might be thought of as a method for constructing
default proper priors appropriate for model comparisons.
For practical implementation of the IBF strategy, particularly in complex
situations and with small or moderate sample sizes, we suggest the use of
the Median IBF. This appears to be quite stable with respect to small
changes in the data and improper priors, and can be used to handle
both separate and nested model comparisons.
We also introduce a broad classification of Bayes factors into what are
called "sampling" and "resampling" Bayes factors. Resampling Bayes factors,
of which the Median IBF is one case, have fascinating properties
suggesting that they are more robust than sampling Bayes factors
when the sample is generated from outside the candidate set of models.
On Thursday, November 6, 1997: Julia Mortera, Univ. Rome III, will speak on
Default Bayes Factors for One-sided Hypothesis Testing
Bayesian hypothesis testing for non-nested hypotheses is studied, using
various default Bayes factors, such as the fractional Bayes factor, the
median intrinsic Bayes factor, and the encompassing and expected intrinsic
Bayes factors.
The different default methods are first compared with each other and with
the p-value in normal one-sided testing, to illustrate the basic issues.
General results for one-sided testing in location and scale models are then
presented. The default Bayes factors are also studied for specific models
involving multiple hypotheses. In all the examples presented we also derive
the intrinsic prior, when it exists; this is the prior distribution which,
if used directly, would yield answers (asymptotically) equivalent to those
for the given default Bayes factor.
On Tuesday, November 11, 1997: Dave Higdon will speak on
Markov Chains, Markov Random Fields, Simulation and Specification
There are strong connections between simulating from a multivariate
distribution and specifying a multivariate distribution. Their
relationship brings out a number of points of particular importance
for researchers using simulation-based techniques for exploring
posterior distributions. In this talk, I'll cover the following topics:
- Markov chains (splitting them, sampling directly from their
stationary distribution)
- Markov random field graphs
- Markov chain Monte Carlo
- specifying a multivariate distribution through the full conditional
distributions (Hammersley-Clifford Theorem)
- inducing independence through auxiliary (or latent) random
variables
If there's time, we'll look at some applications in
spatial statistics and image analysis.
Both of these talks will dwell on ideas, examples and intuitive
understanding rather than theoretical rigor.
On Thursday, November 13, 1997: Dave Higdon will speak on
Evolution of MCMC Building blocks
This talk will focus on the basic building blocks of
MCMC: Gibbs updates; Metropolis updates; and Hastings updates.
In what was perhaps the original application of Markov chain
Monte Carlo, Metropolis et al. used what is now called
Metropolis updating to construct a Markov chain with prescribed
stationary distribution. As a statistician, I tend to think
of Gibbs updates (sampling from the full conditionals) as the
most natural starting point for constructing an MCMC algorithm.
However, the so-called Gibbs sampler didn't arrive on the scene
until later. In this talk I'll look at the evolution from
Metropolis updating, to Gibbs updating, to Hastings updating.
I'll also show how the generalized Swendsen-Wang algorithm
(a rather general auxiliary variable updating scheme)
could be thought of as an evolutionary cousin to Metropolis
updating as well.
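The Metropolis update the talk starts from can be sketched in a few lines. This is my own toy illustration, not material from the talk: a random-walk chain targeting a standard normal density.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    """Unnormalized log density of the target (here, standard normal)."""
    return -0.5 * x * x

# Random-walk Metropolis: propose symmetrically around the current state
# and accept with probability min(1, pi(x') / pi(x)).  The symmetry of
# the proposal is what distinguishes Metropolis updating from the more
# general Hastings form, which corrects for asymmetric proposals.
x = 0.0
samples = np.empty(20000)
for t in range(len(samples)):
    prop = x + rng.normal(scale=1.0)
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop  # accept; otherwise the chain stays at x
    samples[t] = x
```

A Gibbs update would replace the accept/reject step by an exact draw from a full conditional; a Hastings update would add the proposal-density ratio to the acceptance log-probability.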
On Tuesday, November 18, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models
The problem of predictive inference is essentially that of making
probabilistic statements about some future random variable, when
the distribution of that random variable depends on unknown
parameters. This may be regarded as one of the fundamental problems
of statistics, and there exist a variety of both frequentist and
Bayesian approaches. In these talks I shall argue in favor of a
Bayesian approach but evaluated by a variety of criteria including
some from classical decision theory. The evaluations themselves use
some new asymptotic expansions. This leads to some very general
results but also some unexpected contradictions: for instance, the
Bayesian estimators may perform worse than a crude maximum-likelihood
"plug-in" approach when assessed by certain loss criteria in the tails
of the predictive distributions. This kind of phenomenon forces us to
think more carefully about the loss function and how to select a
Bayesian procedure so that it has good properties when assessed by
a particular loss function. The results are of particular relevance
in hierarchical models where they establish a link between the
classical decision theory approach related to the Stein effect, and
modern approaches to inference in hierarchical models.
A provisional plan for the three talks is:
1. Introduction and motivation; results for the exponential distribution;
discussion of the role of loss functions in comparing different
estimators.
2. Outline of the asymptotic theory; extensions to the simplest hierarchical
models based on normal means.
3. More extensive discussion of hierarchical models; choosing the prior
parameters, relations to empirical Bayes methodology.
A preliminary version of the paper is available from the UNC web page
(http://www.stat.unc.edu/preprints.html; click on the title of the
paper).
On Thursday, November 20, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models II
Hierarchical modelling is wonderful and here to stay, but we
usually "cheat" in choosing the prior distributions for hyperparameters.
By "cheating" I mean that we usually choose hyperparameter priors in a casual
fashion, often feeling that the choice is not too important. Unfortunately,
as the number of hyperparameters grows, the effects of casual choices can
multiply, leading to considerably inferior performance. As an extreme but
not uncommon example, use of the wrong hyperparameter priors can even lead
to impropriety of the posterior.
Finding a solution to this problem is, unfortunately, difficult;
indeed, it is not even clear how to attack the problem. In this talk we
simply give some illustrations of the problem, and some "solutions" in
special cases. Among the topics to be discussed along the way are reference
priors for covariance matrices, and propriety and admissibility of priors in
exchangeable hierarchical normal models.
On Tuesday, November 25, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models III
On Tuesday, December 2, 1997: Jim Berger will speak on
On the Choice of Hyperpriors in Normal Hierarchical Models
In his lectures, Richard Smith raised a number of interesting issues
relating to the interaction between loss functions and priors. Certain of these
issues will be considered from the more traditional `default' Bayesian view,
wherein one views the loss as given (and fixed) and seeks a default
prior distribution which is `good' for that loss.
The hierarchical prior discussed by Richard in the normal means
problem will also be examined, from the viewpoint of some surprising
hidden features (good ones) that it possesses.
On Thursday, December 4, 1997: Michael Lavine will speak on
The `Bayesics' of Ranked Set Sampling
Ranked set sampling can be useful when measurements are expensive
but units from the population can be easily ranked. In this
situation one may draw k units from the population, rank them,
select one on which to make the expensive measurement, draw another
k units, rank them, select one, and so on. The method was
originally suggested by McIntyre in connection with
pasture yields and is obviously applicable in other situations as
well. Dell and Clutter and Patil et al. explain
the basics from a classical point of view. Our aim is to examine
the procedure from a Bayesian point of view, determine whether
ranked set sampling provides advantages over simple random sampling
and explore some optimality questions. (see Duke Statistics discussion paper 96-08)
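The sampling scheme described above is easy to simulate. The following toy code is my own sketch, not from the paper: it assumes perfect ranking and a standard normal population, and compares the variance of the ranked-set-sample mean with that of a simple random sample using the same number of measurements.

```python
import numpy as np

rng = np.random.default_rng(3)

def rss_sample(k):
    """One balanced ranked-set cycle: for i = 1..k, draw k units,
    rank them (here, perfectly and at no cost), and measure only
    the i-th smallest."""
    return np.array([np.sort(rng.normal(size=k))[i] for i in range(k)])

k, reps = 5, 4000
rss_means = np.array([rss_sample(k).mean() for _ in range(reps)])
srs_means = np.array([rng.normal(size=k).mean() for _ in range(reps)])

# Under perfect ranking, the ranked-set-sample mean is unbiased and
# less variable than the simple-random-sample mean based on the same
# number of expensive measurements.
var_rss, var_srs = rss_means.var(), srs_means.var()
```

The variance reduction is the classical motivation for the method; how such comparisons look from a Bayesian point of view is the subject of the talk.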