COURSE: STA 395 - Readings in Statistical Science
TIME: TTh 3:50-5:05
PLACE: Room 25, Old Chemistry Building
As in previous semesters, this course will primarily
consist of discussion by faculty members (or advanced graduate
students) of their current research interests. An effort will be
made, however, to organize the lectures by subject area, with
introductory lectures in an area being given if the area is
likely to be unfamiliar to many of the students in the course.
The course will thus also serve to fill "gaps" in formal course
offerings, and/or provide concentrated review in a subject area.
Student feedback will be sought to ensure that enough introductory
material is presented for the research talks to be understandable.
Students who have registered to audit the course will,
of course, not have official coursework. Students registered for
credit will be responsible for writing reports on lectures given
in the course. The number of reports required equals the number of
credits (1 to 3) for which one has registered. These reports should
not be a review of the material presented, but should rather involve
a creative addition to the material. Ways in which this could be done
include:
- using proposed new methodology on a data set, with discussion
of the results;
- developing the proposed methodology in a special case that had
not been considered in the lectures;
- comparing the new methodology with existing methodologies.
Reports need not be extensive.
The initial concentration area of the course will be
Bayesian Analysis of Mixture Models. Jim Berger will begin
with two introductory lectures in the area, covering
- Basic definition of mixture models;
- The statistical uses of mixture models;
- Difficulties with classical and Bayesian analysis of mixture models;
- Standard Gibbs sampling in mixture models.
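As a concrete illustration of the last topic, here is a minimal sketch of standard Gibbs sampling in a two-component normal mixture. This is a toy illustration only, not taken from the lectures: the simulated data, the conjugate priors (Beta on the weight, normal on the means, known unit variances), and all variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from a two-component normal mixture (known variance 1)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 40)])
n = len(x)

# Illustrative priors: w ~ Beta(1, 1), mu_i ~ N(0, 10^2)
mu = np.array([-1.0, 1.0])
w = 0.5
n_iter = 2000
draws = np.empty((n_iter, 3))  # store (w, mu_1, mu_2)

for t in range(n_iter):
    # 1. Sample component allocations z_j given (w, mu)
    p1 = w * np.exp(-0.5 * (x - mu[0]) ** 2)
    p2 = (1 - w) * np.exp(-0.5 * (x - mu[1]) ** 2)
    z = rng.random(n) < p2 / (p1 + p2)  # True -> component 2

    # 2. Sample the weight w given the allocations (Beta posterior)
    n2 = z.sum()
    w = rng.beta(1 + n - n2, 1 + n2)

    # 3. Sample each component mean given its allocated data
    #    (conjugate normal update with prior variance 100)
    for i, idx in enumerate([~z, z]):
        ni, si = idx.sum(), x[idx].sum()
        var = 1.0 / (ni + 1 / 100)  # posterior variance
        mu[i] = rng.normal(var * si, np.sqrt(var))

    draws[t] = (w, mu[0], mu[1])

post = draws[500:].mean(axis=0)  # posterior means after burn-in
```

The three update steps (allocations, weight, component parameters) are the "standard Gibbs sampler" structure the introductory lectures refer to; real analyses would also update component variances and attend to label switching.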
Subsequent talks are expected to cover various applications of
mixture models, reversible jump MCMC computation, identifiability,
and default Bayesian analysis of mixture models. The schedule of
later talks will be forthcoming.
Fall 1997 Speaker Schedule
On Thursday, September 11, 1997: Peter Mueller will speak on
Issues in Bayesian Analysis of
Neural Network Models
Stemming from work by Buntine and Weigend (1991), MacKay (1992) and
Neal (1996), there is a growing interest in Bayesian analysis of Neural
Network Models. We study computational approaches to the problem,
suggesting an efficient Markov chain Monte Carlo scheme for inference
and prediction with fixed-architecture feed-forward neural networks.
The scheme is then extended to the variable-architecture case,
providing a procedure for automatic, data-driven choice of
architecture.
On Tuesday, September 16, 1997: Susan Paddock will speak on
Mixture Models in Exploration of Chemical Activity and Binding
in Drug Designs
On Thursday, September 18, 1997: Giovanni Parmigiani will
speak on
A Primer on Model Mixing
In this talk I will introduce you to the basics of Bayesian model
averaging, or model mixing. The idea is this. It sometimes happens
that several models will fit your data comparably well, but each will
lead to rather different predictions, conclusions or decisions. It may
then be more sensible to incorporate the uncertainty about which model
to use, and draw your conclusions based on mixing over models, rather
than just choosing one. I will use the example of selecting variables
for a linear model to illustrate some of the modelling and
computational issues. I'll show you some pictures from an application
to a designed experiment in the pharmaceutical field, and also some
simulation results.
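The variable-selection example can be sketched numerically. The following toy code is my own illustration, not the talk's method: it uses BIC-based approximate posterior model probabilities (a common shortcut) rather than fully specified priors, and mixes predictions over all subsets of two candidate predictors.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends on x1 only; x2 is a spurious candidate
n = 50
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] + rng.normal(scale=1.0, size=n)

def fit(cols):
    """OLS on an intercept plus the given columns of X."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ beta) ** 2).sum())
    return A, beta, rss

# Enumerate all 4 subsets of {x1, x2}; weight each by exp(-BIC/2)
models, bics, preds = [], [], []
for r in range(3):
    for cols in itertools.combinations(range(2), r):
        A, beta, rss = fit(list(cols))
        k = len(cols) + 1
        bic = n * np.log(rss / n) + k * np.log(n)
        models.append(cols)
        bics.append(bic)
        preds.append(A @ beta)  # in-sample predictions

w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()  # approximate posterior model probabilities
y_mix = sum(wi * p for wi, p in zip(w, preds))  # model-averaged prediction
```

The mixed prediction y_mix incorporates the uncertainty about which subset of predictors to use, rather than committing to a single selected model.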
On Tuesday, September 23, 1997: Merlise Clyde will speak on
Does Particulate Matter Particularly Matter?
The effect of particulate matter on mortality is currently an important
issue, as the U.S. Environmental Protection Agency proposes new
regulations. Many of the epidemiological models are based on
Poisson regression models for the mortality counts, with independent
variables including various daily and lagged meteorological variables and
measures of particulate matter. The high correlations among many of the
explanatory variables make traditional model selection difficult. The
statistical significance of PM10 and PM2.5 (particulate matter of size
10 and 2.5 micrometers or smaller, respectively), however, depends
heavily on which meteorological variables are selected.
Issues of variable choice and model uncertainty due to variable
selection thus play an important role in making decisions. We use
Bayesian model averaging to assess the effect of PM10 on mortality,
taking into account model uncertainty about which meteorological
variables should be included. Because there are a large number of
possible models with over two hundred variables, we introduce an
approximation to the posterior distribution that allows for efficient
importance sampling or Markov Chain Monte Carlo sampling of models with
high posterior probability from high dimensional model spaces. We
present posterior distributions under model averaging of the effect of
PM10 on mortality and of the relative risk and contrast these results
with previous analyses. The methods are applicable to a wide range of
generalized linear models where prediction and model uncertainty are
important issues.
On Thursday, September 25, 1997: There will be no seminar
On Tuesday, September 30, 1997: Mike West will speak on
Mixtures in (a corner of) Neurophysiology
I'll talk about a collection of mixture modelling ideas, methods
and issues generated in the course of a collaborative research project
with neurophysiologists 1991-1995. This will include selected aspects of
the scientific problem area and experimental contexts, aspects of our
early work with "standard" Bayesian mixture models, some details of the
later development of more "customised" hierarchical mixture models, various
modelling issues and inferential difficulties, illustrative examples with
some of our (many) data sets, and commentary on outstanding/open problems.
nb. 1: Some of the background and more recent (1994-5) work appears in
a 1997 JASA paper available on the Web site, DP 94-23, and there
are several other papers on the Web site related to this project.
nb. 2: Several current PhD students will have seen some or all of this
a couple of years ago in STA 395, and pieces in STA 214 last year.
On Thursday, October 2, 1997: Mike West will speak on
Mixtures in (a corner of) Neurophysiology
(continued)
On Tuesday, October 7, 1997: Jose Miguel Perez will speak on
Default Analysis of Mixture Models using Improper
Priors
Consider the mixture model with k components, X ~ sum_{i=1}^k w_i
f(.|theta_i), where component i has parameters theta_i. It is well
known that when improper priors are assigned to the theta_i, the
marginal of X cannot be calculated, and hence the posteriors do not
exist. Diebolt and Robert (1994), and later Berger and Shui (1996), have
suggested sampling algorithms in which enough observations are assigned
to each component to obtain proper posteriors. The problem is then that
the allocations of observations to components are not independent, which
makes the final inference much harder. As a result, default analysis of
mixture models is usually carried out using vague priors.
We propose an extension of the Richardson and Green (1996) reversible jump
MCMC algorithm for mixture models, in which the improper component priors
pi^N(theta_i) are modified in the following way:
pi^*(theta_1,...,theta_k | k) = integral{ product_i{ pi^N_i(theta_i|x^*) }
m^*(x^*) dx^* }
Here we take m^*(.) to be the marginal of a minimal set of observations
X^* for one component. Intuitively, X^* is an imaginary common training set
for all component improper priors. This training set is used to produce
posteriors pi^N(theta_i|X^*). There are several advantages to this
approach, including independence from the arbitrary constants in the
improper priors, and posterior independence of the allocation of
observations to each component.
On Thursday, October 9, 1997: There will be no seminar - Duke
Workshops begin!
On Tuesday, October 14, 1997: There will be no seminar due
to October Break
On Thursday, October 16, 1997: Jon Stroud will speak on
A New Method for Estimating Nonlinear State-Space
Models
We'll begin by reviewing the Bayesian literature on state-space
models. We'll then describe our simulation-based method, which
involves approximating conditional densities with locally-weighted
mixtures of normals. Finally, we'll present results from simulation
studies that illustrate the effectiveness of our method. This work
is joint with Peter Mueller.
On Tuesday, October 21, 1997: Susie Bayarri will speak on
Weighted Distributions, Selection Models and Selection
Mechanisms
Weighted distributions and selection models arise when a random
sample from the entire population of interest cannot be obtained, or is
deliberately not sought. Instead, the probability or density that a
particular value enters the sample is multiplied by a nonnegative
weight function (weighted distributions), which may, in particular, take
the form of the indicator function of some selection set (selection
models). Bayesian and related methods for the analysis of such models
are discussed, with special emphasis on selection, or truncation, models.
Selection mechanisms that could lead to selected data of this type are
studied and represented as binomial sampling plans. Examples are given
in which the selection mechanism can be ignored, in the sense that the
data can be analyzed as a random sample of fixed size from a selection
model.
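A tiny sketch of the mechanism just described may help fix ideas. This is my own toy illustration, not from the talk: size-biased sampling from an Exponential(1) population, in which each candidate draw enters the sample with probability proportional to its weight w(x) = x (truncated at 10 so the acceptance probability is well defined).

```python
import numpy as np

rng = np.random.default_rng(4)

def weighted_sample(n, w, w_max, draw):
    """Sample n values whose density is proportional to w(x) * f(x),
    by accepting each candidate x ~ f with probability w(x) / w_max."""
    out = []
    while len(out) < n:
        x = draw()
        if rng.random() < w(x) / w_max:
            out.append(x)
    return np.array(out)

# Size-biased sampling (w(x) = x) from an Exponential(1) population.
# The weighted distribution is Gamma(2, 1), so the mean shifts from 1 to 2.
xs = weighted_sample(5000, w=lambda x: min(x, 10.0), w_max=10.0,
                     draw=lambda: rng.exponential())
```

With the indicator of a set as the weight function, the same loop would instead produce a selection (truncation) model sample.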
On Thursday, October 23, 1997: Susie Bayarri will speak on
Issues on Information and Robustness in Weighted
Distributions
We study whether a "weighted" or "selected" sample is more or less
informative than a random sample from the whole population. We compare the
standard statistical experiment in which a random sample is drawn from
some distribution with samples obtained from different weighted versions
of that distribution. This comparison is carried out by means of the
concepts of sufficiency and pairwise sufficiency as developed in
Blackwell's theory of comparison of statistical experiments; Fisher
information is also used. Examples are presented which illustrate the wide
variety of effects that weight functions can have on the information
provided by the experiments. Not only the information but, more generally,
the inferences usually depend heavily on the weight function used, and often
there is considerable uncertainty concerning this weight function; for
instance, it may only be known that it lies between two specified weight
functions. We consider robust Bayesian analysis for this situation,
finding the range of posterior quantities of interest as the weight
function ranges over a class of weight functions.
On Tuesday, October 28, 1997: Mike West will speak on
Inference in Successive Sampling Discovery Models
I'll talk about models and inference in finite population contexts with
size-biased sampling -- the specific problem area being that of so-called
"successive sampling discovery modelling". Here units of a finite
population are sampled ("discovered") sequentially in time. At any sampling
stage, each of the unsampled units has a chance of selection that is a
function of its characteristics (e.g., its "size"). I'll tell you a story
about some work from 1991/2 on Bayesian inference in the area, with background.
In addition to the biased sampling aspects, there are some interesting issues
in the mathematics, in the simulation-based computation (surprise!), and in an
applied study involving discovery of oil reserves. There's also a bit of fun
to be had comparing Bayesian results with non-Bayesian efforts.
Paper listed as 19 on the "Papers" link on my homepage covers the story
(it later appeared in Econometrica). Paper 25 there is some earlier work,
which also talks about other kinds of selection/biased sampling issues.
On Thursday, October 30, 1997: Jaeyong Lee will speak on
Semiparametric Bayesian Analysis of Selection Models
Selection models are appropriate when a datum, x, enters the sample
with weight w(x), where w is typically a monotone function. In many cases,
statisticians may have only a vague idea about the form of the weight
function. In this paper, a Dirichlet process prior around a parametric
form of the weight function is used as a prior on the weight function. An
MCMC computational scheme, using latent variables and reversible jumps,
is described. The difficulties and dangers of nonparametric analysis of
the weight function will also be highlighted.
On Tuesday, November 4, 1997: Luis R. Pericchi, Univ. Simon Bolivar, will speak on
Accurate and Stable Bayesian Model Selection: the Median Intrinsic Bayes Factor
For Hypothesis Testing and Model Selection, the Bayesian approach
is attracting considerable attention. The reasons for this include:
i) It yields the posterior probabilities of the models (and not only
the reject/non-reject rules of frequentist statistics); ii) It is
a predictive approach; and iii) It automatically embodies the principle
of scientific parsimony.
Until recently, obtaining such benefits through the Bayesian approach
required elicitation of proper subjective prior distributions, or the use
of approximations of questionable generality.
In Berger and Pericchi (JASA, 1996), the Intrinsic Bayes Factor Strategy
is introduced, and is shown to be an automatic default method corresponding
to an actual (and sensible) Bayesian analysis. In particular, it is shown
that the Intrinsic Bayes Factor yields an answer which is asymptotically
equivalent to the use of a specific (and reasonable) prior distribution,
called the Intrinsic Prior. In this sense, the IBF method is correct
to second order and might be thought of as a method for constructing
default proper priors appropriate for model comparisons.
For practical implementation of the IBF strategy, particularly in complex
situations and with small or moderate sample sizes, we suggest the use of
the Median IBF. This appears to be quite stable with respect to small
changes in the data and improper priors, and can be used to handle
both separate and nested model comparisons.
We also introduce a broad classification of Bayes factors into what are
called "sampling" and "resampling" Bayes factors. Resampling Bayes factors,
of which the Median IBF is one case, have fascinating properties
suggesting that they are more robust than sampling Bayes factors
when the sample is generated from outside the candidate set of models.
On Thursday, November 6, 1997: Julia Mortera, Univ. Rome III, will speak on
Default Bayes Factors for One-sided Hypothesis Testing
Bayesian hypothesis testing for non-nested hypotheses is studied, using
various default Bayes factors, such as the fractional Bayes factor, the
median intrinsic Bayes factor, and the encompassing and expected intrinsic
Bayes factors.
The different default methods are first compared with each other and with
the p-value in normal one-sided testing, to illustrate the basic issues.
General results for one-sided testing in location and scale models are then
presented. The default Bayes factors are also studied for specific models
involving multiple hypotheses. In all the examples presented we also derive
the intrinsic prior, when it exists; this is the prior distribution which,
if used directly, would yield answers (asymptotically) equivalent to those
for the given default Bayes factor.
On Tuesday, November 11, 1997: Dave Higdon will speak on
Markov Chains, Markov Random Fields, Simulation and Specification
There are strong connections between simulating from a multivariate
distribution and specifying a multivariate distribution. Their
relationship brings out a number of points of particular importance
for researchers using simulation-based techniques for exploring
posterior distributions. In this talk, I'll cover the following topics:
- Markov chains (splitting them, sampling directly from their
stationary distribution)
- Markov random field graphs
- Markov chain Monte Carlo
- specifying a multivariate distribution through the full conditional
distributions (Hammersley-Clifford Theorem)
- inducing independence through auxiliary (or latent) random
variables
If there's time, we'll look at some applications in
spatial statistics and image analysis.
Both of these talks will dwell on ideas, examples and intuitive
understanding rather than theoretical rigor.
On Thursday, November 13, 1997: Dave Higdon will speak on
Evolution of MCMC Building blocks
This talk will focus on the basic building blocks of
MCMC: Gibbs updates; Metropolis updates; and Hastings updates.
In what was perhaps the original application of Markov chain
Monte Carlo, Metropolis et al. used what is now called
Metropolis updating to construct a Markov chain with prescribed
stationary distribution. As a statistician, I tend to think
of Gibbs updates (sampling from the full conditionals) as the
most natural starting point for constructing an MCMC algorithm.
However, the so-called Gibbs sampler didn't arrive on the scene
until later. In this talk I'll look at the evolution from
Metropolis updating, to Gibbs updating, to Hastings updating.
I'll also show how the generalized Swendsen-Wang algorithm
(a rather general auxiliary variable updating scheme)
could be thought of as an evolutionary cousin to Metropolis
updating as well.
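The Metropolis update the talk starts from can be sketched in a few lines. This is my own toy illustration, not material from the talk: a random-walk chain targeting a standard normal density.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    """Unnormalized log density of the target (here, standard normal)."""
    return -0.5 * x * x

# Random-walk Metropolis: propose symmetrically around the current state
# and accept with probability min(1, pi(x') / pi(x)).  The symmetry of
# the proposal is what distinguishes Metropolis updating from the more
# general Hastings form, which corrects for asymmetric proposals.
x = 0.0
samples = np.empty(20000)
for t in range(len(samples)):
    prop = x + rng.normal(scale=1.0)
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop  # accept; otherwise the chain stays at x
    samples[t] = x
```

A Gibbs update would replace the accept/reject step by an exact draw from a full conditional; a Hastings update would add the proposal-density ratio to the acceptance log-probability.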
On Tuesday, November 18, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models
The problem of predictive inference is essentially that of making
probabilistic statements about some future random variable, when
the distribution of that random variable depends on unknown
parameters. This may be regarded as one of the fundamental problems
of statistics, and there exist a variety of both frequentist and
Bayesian approaches. In these talks I shall argue in favor of a
Bayesian approach but evaluated by a variety of criteria including
some from classical decision theory. The evaluations themselves use
some new asymptotic expansions. This leads to some very general
results but also some unexpected contradictions: for instance, the
Bayesian estimators may perform worse than a crude maximum-likelihood
"plug-in" approach when assessed by certain loss criteria in the tails
of the predictive distributions. This kind of phenomenon forces us to
think more carefully about the loss function and how to select a
Bayesian procedure so that it has good properties when assessed by
a particular loss function. The results are of particular relevance
in hierarchical models where they establish a link between the
classical decision theory approach related to the Stein effect, and
modern approaches to inference in hierarchical models.
A provisional plan for the three talks is:
1. Introduction and motivation; results for the exponential distribution;
discussion of the role of loss functions in comparing different
estimators.
2. Outline of the asymptotic theory; extensions to the simplest hierarchical
models based on normal means.
3. More extensive discussion of hierarchical models; choosing the prior
parameters, relations to empirical Bayes methodology.
A preliminary version of the paper is available from the UNC web page
(http://www.stat.unc.edu/preprints.html; click on the title of the
paper).
On Thursday, November 20, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models II
Hierarchical modelling is wonderful and here to stay, but we
usually "cheat" in choosing the prior distributions for hyperparameters.
By "cheating" I mean that we usually choose hyperparameter priors in a casual
fashion, often feeling that the choice is not too important. Unfortunately,
as the number of hyperparameters grows, the effects of casual choices can
multiply, leading to considerably inferior performance. As an extreme but
not uncommon example, use of the wrong hyperparameter priors can even lead
to impropriety of the posterior.
Finding a solution to this problem is, unfortunately, difficult;
indeed, it is not even clear how to attack the problem. In this talk we
simply give some illustrations of the problem, and some "solutions" in
special cases. Among the topics to be discussed along the way are reference
priors for covariance matrices, and propriety and admissibility of priors in
exchangeable hierarchical normal models.
On Tuesday, November 25, 1997: Richard L. Smith (UNC and Adjunct Professor at Duke) will speak on
Predictive Inference, Rare Events and Hierarchical Models III
On Tuesday, December 2, 1997: Jim Berger will speak on
On the Choice of Hyperpriors in Normal Hierarchical Models
In his lectures, Richard Smith raised a number of interesting issues
relating to the interaction between loss functions and priors. Certain of these
issues will be considered from the more traditional `default' Bayesian view,
wherein one views the loss as given (and fixed) and seeks a default
prior distribution which is `good' for that loss.
The hierarchical prior discussed by Richard in the normal means
problem will also be examined, from the viewpoint of some surprising
hidden features (good ones) that it possesses.
On Thursday, December 4, 1997: Michael Lavine will speak on
The `Bayesics' of Ranked Set Sampling
Ranked set sampling can be useful when measurements are expensive
but units from the population can be easily ranked. In this
situation one may draw k units from the population, rank them,
select one on which to make the expensive measurement, draw another
k units, rank them, select one, and so on. The method was
originally suggested by McIntyre in connection with
pasture yields and is obviously applicable in other situations as
well. Dell and Clutter and Patil et al. explain
the basics from a classical point of view. Our aim is to examine
the procedure from a Bayesian point of view, determine whether
ranked set sampling provides advantages over simple random sampling
and explore some optimality questions. (see Duke Statistics discussion paper 96-08)
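The sampling scheme described above is easy to simulate. The following toy code is my own sketch, not from the paper: it assumes perfect ranking and a standard normal population, and compares the variance of the ranked-set-sample mean with that of a simple random sample using the same number of measurements.

```python
import numpy as np

rng = np.random.default_rng(3)

def rss_sample(k):
    """One balanced ranked-set cycle: for i = 1..k, draw k units,
    rank them (here, perfectly and at no cost), and measure only
    the i-th smallest."""
    return np.array([np.sort(rng.normal(size=k))[i] for i in range(k)])

k, reps = 5, 4000
rss_means = np.array([rss_sample(k).mean() for _ in range(reps)])
srs_means = np.array([rng.normal(size=k).mean() for _ in range(reps)])

# Under perfect ranking, the ranked-set-sample mean is unbiased and
# less variable than the simple-random-sample mean based on the same
# number of expensive measurements.
var_rss, var_srs = rss_means.var(), srs_means.var()
```

The variance reduction is the classical motivation for the method; how such comparisons look from a Bayesian point of view is the subject of the talk.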