September 1

Speaker: Brani Vidakovic

Title: Introduction to Multiscale Statistical Methods

In this tutorial and non-technical lecture, students will be introduced to the basics of wavelets and wavelet-like function families. A variety of applications in Geophysics, Signal and Image Processing, Mathematics and Statistics will be discussed.


September 03

Speaker: Brunero Liseo, Universita' di Roma - La Sapienza

Title: The Skew-Normal class densities

In the talk I will introduce the class and stress its properties. I will also discuss the pros and cons of its use both as a sampling density to check skewness in the data and as a prior distribution in Bayesian analysis.


September 08

Speaker: Hedibert Lopes, PhD Candidate at Duke Statistics

Title: Predictive Computation in Factor Models

Bayesian inference in factor analytic models has received renewed attention in recent years due partly to computational advances, but also partly to applied focuses generating factor structures, as exemplified by recent work in financial time series modeling. The focus of our current work is on exploring questions of uncertainty about the number of latent factors in a multivariate factor model, combined with methodological and computational issues of model specification and model fitting. We explore reversible jump MCMC methods that build on sets of parallel Gibbs sampling-based analyses to generate suitable "empirical" proposal distributions and that address the challenging problem of finding efficient proposals in high-dimensional models. Various additional computational issues are discussed, and we explore applications in an econometric context.


September 10

Speaker: Jacob Laading, Duke Statistics

Title: Hierarchical deformation modeling for medical images

We present a flexible model for the deformation of images based on a hierarchically defined probability density. The model is based on the idea of "facets": landmarks which are not tied to specific image phenomena. A large number of these facets are used in a hierarchy to capture the deformation on several scales. An atlas structure is deformed onto a new observed realization from an image class via a conditionally defined hierarchical normal distribution intended to capture shape. In addition, we define a distribution on deviations in local intensity profile or other image-derived quantities for a given facet. Several alternative methods for defining the model structure will be presented, as well as two example applications. This work was carried out with Drs. Colin McCulloch and Valen Johnson.


September 15

Speaker: Jaeyong Lee, Purdue University and Duke Statistics

Title: Acceleration of Metropolis-Hastings Algorithms

It is common wisdom among those who use Markov chain Monte Carlo that large rejection rates result in slow mixing of the chain. Indeed, two theoretical results (Peskun, 1973; Tierney, 1997) confirm this. We will take another look at the Metropolis-Hastings algorithm and suggest possible improvements, including the splitting rejection algorithm, as termed by Mira and Tierney.
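
The link between rejection rate and mixing can be seen in a small experiment. The sketch below is only an illustration, not the method of the talk: it assumes a standard normal target and a plain random-walk Metropolis sampler, and the function names, step sizes, and seed are all hypothetical choices. An oversized proposal step is rejected most of the time, and the lag-1 autocorrelation of the chain rises accordingly.

```python
import math
import random


def random_walk_metropolis(log_target, x0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler; returns (samples, acceptance_rate)."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_target(proposal)
        # Symmetric proposal, so accept with probability min(1, pi(y) / pi(x)).
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)  # a rejection repeats the current state
    return samples, accepted / n_iter


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant (assumed target)."""
    return -0.5 * x * x


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation, a crude indicator of slow mixing."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


# Hypothetical step sizes: a moderate one and a deliberately oversized one.
results = {}
for step in (1.0, 50.0):
    draws, rate = random_walk_metropolis(log_std_normal, 0.0, step, 20000)
    results[step] = (rate, lag1_autocorr(draws))
    print(f"step={step}: acceptance rate={rate:.3f}, "
          f"lag-1 autocorrelation={results[step][1]:.3f}")
```

The comparison only shows the direction of the effect for this toy target; it says nothing about the splitting rejection algorithm itself.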


September 17

Speaker: Merlise Clyde, Duke Statistics

Title: Model Uncertainty and Bayesian Model Averaging

In this talk I will provide an overview of Bayesian model averaging and model selection. For example, in regression models there is often substantial prior uncertainty about which covariates one should include. Variable selection typically results in the selection of a single model for estimating quantities of interest and, as a result, ignores model uncertainty in statistical inferences. In Bayesian model averaging one estimates quantities of interest by a weighted average of model-specific quantities, where the weights are determined by how much support each model receives from the data. I will discuss the use of improper prior distributions and connections to model selection criteria, such as AIC, BIC, and RIC. In high-dimensional problems, one must approximate Bayesian model averaging based on a sample of models. I will present relationships among some common algorithms for sampling models in linear regression, such as reversible jump Markov chain Monte Carlo, Stochastic Search Variable Selection, Markov chain Monte Carlo Model Composition, and random sampling, and discuss various methods for estimation based on the sampled output.
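
The weighted average described above can be written in a few lines. This is a minimal sketch, assuming the log marginal likelihood of each model is already available; the function names and all numbers are hypothetical and are not taken from the talk. The weights are the posterior model probabilities, proportional to prior probability times marginal likelihood.

```python
import math


def posterior_model_probs(log_marginal_liks, prior_probs):
    """Posterior model probabilities: prior times marginal likelihood, normalized."""
    log_w = [lm + math.log(p) for lm, p in zip(log_marginal_liks, prior_probs)]
    m = max(log_w)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - m) for v in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def model_average(estimates, weights):
    """Weighted average of model-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical inputs for three candidate regression models.
log_ml = [-104.2, -101.9, -103.0]  # assumed log marginal likelihoods
prior = [1 / 3, 1 / 3, 1 / 3]      # assumed uniform prior over models
estimates = [0.0, 1.8, 2.4]        # assumed per-model estimates of one quantity

weights = posterior_model_probs(log_ml, prior)
bma_estimate = model_average(estimates, weights)
print([round(w, 3) for w in weights], round(bma_estimate, 3))
```

When the model space is too large to enumerate, the same average would be taken over a sample of models, which is where the sampling algorithms listed in the abstract come in.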


September 22

Speaker: Dongchu Sun, University of Missouri-Columbia, NISS and Duke University

Title: Random Effects in Generalized Linear Mixed Models

We examine the use of special forms of correlated random effects in the generalized linear mixed model (GLMM) setting. A special feature of our GLMM is the inclusion of random residual effects to account for lack of fit due to extra variation, outliers, and other unexplained sources of variation. For random effects, we consider, in particular, the correlation structure and improper priors associated with the autoregressive (AR) model of Ord (1975) and the conditional auto-regressive (CAR) model of Besag (1974). We give conditions for the propriety of the posterior distribution of the GLMM when the fixed effects have a constant improper prior and the random effects have a possibly improper conditional autoregressive prior. Several examples of exponential families as well as computational details for Markov chain Monte Carlo simulation are also presented.


September 24

Speaker: Susie Bayarri, University of Valencia and Duke Statistics

Title: Conditional Measures of Surprise

Measures of surprise refer to studying the compatibility of data with an assumed hypothesis without a careful formulation of alternative hypotheses. This seems to be in contradiction with Bayesian reasoning, but we argue that these measures do have a useful role to play even in the Bayesian world. As a matter of fact, many (Bayesian) authors have tried to develop such measures. We first give a brief summary of these measures. Then we elaborate on the appropriate distribution in which 'surprise' should be measured. We show that we have to narrow down the prior predictive distribution by appropriately conditioning. Several possibilities are discussed and an optimal conditioning is proposed.


September 29

Speaker: Mike West, Duke Statistics

Title: Multivariate Non-Gaussian Time Series: Bayesian Analysis of Longitudinal Data in a Case Study in the VA Hospital System, I

Discussions of the developments on the VA project as reported in Duke Statistics Discussion Papers 97-22a,b. A slow and easy discussion of the basics: VA interests, policy questions, background. Data structure, exploration, characteristics. Modelling ideas and first models. Model fitting and model assessment. Some summary findings.


October 1

Speaker: Mike West, Duke Statistics

Title: Multivariate Non-Gaussian Time Series: Bayesian Analysis of Longitudinal Data in a Case Study in the VA Hospital System, II

Discussions of the developments on the VA project as reported in Duke Statistics Discussion Papers 97-22a,b. Continuation: Don't miss Talk 1 if you want to understand Talk 2. More advanced modelling and inferential questions in the VA study: multivariate/longitudinal/time series/hierarchical random effects models. Issues of institutional comparisons, and other matters as time permits.


October 06

Speaker: Jim Berger

Title: Default Bayesian Hypothesis Testing and Model Selection

This will be a review of the motivation for, and difficulties with, default Bayesian hypothesis testing and model selection. Included will be a discussion of some of the general automatic model selection and testing procedures, including BIC, the "intrinsic Bayes factor" and the "fractional Bayes factor".


October 08

Speaker: Jim Berger

Title: Bayesian Model Selection via the Expected Posterior Prior

Recently developed automatic Bayesian methods of model selection, such as the "intrinsic Bayes factor" and the "fractional Bayes factor," have proven to be highly effective but are often difficult to work with. In particular, intrinsic Bayes factors can be challenging to compute, while fractional Bayes factors require considerable care in definition and use. A highly promising new approach to the problem is based on developing explicit default priors for the models under consideration, called "expected posterior priors." These are strongly related to "intrinsic priors" arising from the intrinsic Bayes factor approach, but have the advantages of being explicitly given and being relatively easy to use in MCMC computational schemes. A variety of examples of use of expected posterior priors will be given, including an application to analysis of a mixture model arising in an astrophysical problem.


October 15

Speaker: Gabriel Katul

Title: Modeling Turbulent Transport within Forested Canopies: Why statistics?

In this lecture, the equations of motion that describe air flow inside vegetation and the "closure" problem in turbulence are briefly introduced. The motivation for using a statistical description of turbulence is then presented. Preliminary "closure" model results and comparisons with field measurements performed at Duke Forest are also shown. Model limitations are then discussed and are used to motivate detailed analysis of specific types of eddy motion commonly observed in detailed high-frequency turbulence measurements. The lecture concludes with the identification of such eddy motion using newly developed wavelet thresholding methods, with the hope of refining closure models and better understanding turbulent transport.


October 20

Speaker: Giovanni Parmigiani

Title: Statistical issues in understanding disease genes, I

The investigation of how we inherit susceptibility to a disease from our parents is one of the current frontiers of medicine. Earlier work focussed on inheritance of features (phenotypes) that are very closely determined by a single gene. Our recently increased ability to "measure" the human genome is giving us the option to investigate more complex and also more common situations, involving many genes at once, and concerning genetic effects that are weaker and subtler. In these situations, linkage analysis (the search for the gene(s) that are responsible for a disease) needs to be complemented by subsequent analyses that investigate the nature and magnitude of the genetic effect. These are the analyses that carry most of the clinical and public health implications and that can lead the way, many years down the line, to preventive treatments. A key issue is penetrance, which, in the simplest case, is the probability of developing disease when one carries a "defective" gene. On Tuesday, I will review the fundamentals of disease inheritance and describe some of the standard study designs. On Thursday, I will discuss statistical methodologies, presenting examples from problems I have worked on and highlighting promising research and modeling approaches.


October 22

Speaker: Giovanni Parmigiani

Title: Statistical issues in understanding disease genes, II

See the abstract for: Statistical issues in understanding disease genes, I.


October 27

Speaker: Francesca Dominici

Title: National Mortality, Morbidity and Air Pollution Study: Statistical Challenges

Time series studies have shown associations between air pollution concentrations and morbidity and mortality. These studies have largely been conducted within single cities, and with varying methods. Key questions remain unaddressed concerning the findings, including 1) the extent and sources of heterogeneity of air pollution effects across locations; 2) the public health significance of the short-term associations ("harvesting"); and 3) the effect of error in the measurement of the exposure variable on the estimated effect of air pollution. The NMMAPS study comprises the development of statistical methods to address these questions and the application of these methods to national data sets on mortality and hospitalization among persons 65 years of age and older. The latter serves as an index of morbidity. In this talk I will review some of the statistical challenges that arise in addressing the questions of the NMMAPS study. For analyzing data from multiple locations, we develop semiparametric Poisson regression analyses of daily time-series data from the largest 20 U.S. cities, and we introduce hierarchical models for combining estimates of the pollution-mortality relationship. For addressing "harvesting" in air pollution studies, we propose a novel statistical strategy based on frequency domain log-linear regression methods. Finally, for evaluating the effects of measurement error, we introduce a semiparametric Poisson-normal model to estimate the bias in the relative rate of mortality due to using ambient concentrations instead of personal exposures of PM10. The model is applied to the combined analysis of five studies with personal and outdoor sampling of particulate matter. Databases have been assembled on mortality in the 100 largest U.S. cities. The next phase of the NMMAPS study will complete the morbidity and mortality analyses and carry out a combined analysis of both morbidity and mortality. The methods of NMMAPS should prove useful for future surveillance of the health effects of air pollution. Joint work with Jonathan Samet and Scott L. Zeger.


October 29

Speaker: Peter Mueller and Don Berry, Duke University

Title: Simulation Based Sequential Design: Optimal Stopping in a Clinical Trial

We discuss simulation based methods for exploration and maximization of expected utility in sequential decision problems. We consider problems which require backward induction with analytically intractable expected utility integrals at each stage. We propose to use forward simulation to approximate the integral expressions, and a reduction of the allowable action space to avoid problems related to an exponentially exploding number of possible trajectories in the backward induction. The artificially reduced action space allows strategies to depend on the full history of earlier observations and decisions only indirectly through a low dimensional summary statistic. We illustrate the proposed approach with an application to an optimal stopping problem in a clinical trial.

Key words: Backward induction, Forward simulation, Monte Carlo simulation, Optimal design, Sequential decision.


November 03

Speaker: Lurdes Inoue, Peter Mueller, Gary Rosner, and Mark Dewhirst, Duke University and Duke University Medical Center

Title: A Bayesian Model for Detecting Changes in Nonlinear Profiles

We propose a model for longitudinal data with random effects which includes a flexible nonparametric regression for the profile of responses over time for individual subjects. This research is motivated by experiments evaluating the hemodynamic effects of various agents in tumor-bearing rats. In one set of experiments, the animals breathed room air, followed by carbogen (a mixture of pure oxygen and carbon dioxide), with different groups of animals receiving different concentrations of the two gases. Interest focuses on changes in hemodynamic profiles, e.g., longitudinal measurements of oxygen pressure, heart rate, tumor blood flow, tumor arteriolar diameter, etc. For example: Do individual profiles change once the breathing mixture changes? How does changing the concentration of carbon dioxide alter the effect of carbogen on hemodynamics? The nature of the recorded responses does not allow any meaningful parametric form for a regression of these profiles on time. Additionally, response patterns differ widely across individuals. Therefore, we propose a nonparametric regression model of the profile data on time, with a hierarchical structure to account for subject-to-subject variability. We explore several alternative implementations of the nonparametric regression, including a dynamic state space model.


November 05

Speaker: Richard De Veaux, Department of Mathematics, Williams College

Title: Hybrid Neural Networks for Environmental Process Control

In many environmental processes there is a great deal of scientific and engineering knowledge about the system. This domain knowledge may range from a simple energy balance taking the form of a constraint to complex first-principles models containing unmeasurable reaction rates. The challenge is to incorporate this knowledge into the data analysis and eventual control of the process. A feed-forward neural network provides a particularly convenient form for folding such prior knowledge into the estimation, prediction and control of the system. The resulting model is known as a hybrid neural network model. We will show how the neural network is trained and compare its performance to more traditional techniques.


November 10

Speaker: David Higdon

Title: Markov Random Fields and Applications in Image Analysis

We apply Bayesian image analysis techniques to a problem in a newly developed scanned probe technology which uses commercial magnetoresistive (MR) record/playback heads as probes to sense magnetic fields. This technology can be used both for magnetic imaging, and for evaluating playback and record processes in magnetic recording. In MR microscopy, an MR head is raster-scanned while in physical contact with a magnetic sample (e.g., hard disk media, tape, or fine magnetic particles). By plotting the MR resistance as a function of position, a very high resolution (on the order of $.1 \times 1.0$ $\mu$m) magnetic image of the sample is constructed. This case study focuses on characterizing the head sensitivity function (HSF), which depends on the physical dimensions and the magnetic properties of the MR head. These sensitivity functions are of great practical interest since they ultimately relate to the head's performance in a high density data storage environment. We use a Bayesian approach to model and estimate the HSF, while accounting for noise and other nuisance effects such as thermal drift. Besides yielding a point estimate, which is a fairly difficult task here, this approach also quantifies uncertainty so we can assess whether certain features of the estimated head sensitivity function appear to be genuine.


November 12

Speaker: David Conesa

Title: Bayesian Analysis of Bulk Arrival Queues

Statistical analysis of bulk arrival queues from a Bayesian point of view is presented. We briefly review some basics of Queueing Theory and bulk arrival models. We deal with the most basic type of bulk arrival queue, namely $M^{X}/M/1$. The focus is on prediction of the usual measures of performance of the system in equilibrium. We present a way to compute the posterior predictive distribution of the number of customers in the system through the inversion of its probability generating function. The posterior distribution of the waiting time, in the queue and in the system, of the first customer of an arriving group is also computed, but now in terms of their Laplace and Laplace-Stieltjes transforms. Finally, a numerical example is presented.


November 17

Speaker: Jamie Robins, Harvard School of Public Health

Title: Marginal Structural Models and Causal Inference in Epidemiology

Standard approaches for adjustment of confounding are biased when there exist time-dependent confounders that are also intermediate variables. This paper introduces marginal structural models (MSMs), a new class of causal models that allow for appropriate adjustment of confounding in those situations. The parameters of an MSM can be consistently estimated using a new class of estimators: the inverse-probability-of-treatment weighted estimators.


November 24

Speaker: Peter Müller and Brani Vidakovic

Title: Bayesian Inference with Wavelets: Density Estimation

We propose a prior probability model in the wavelet coefficient space. The proposed model implements wavelet coefficient thresholding by full posterior inference in a coherent probability model. We introduce a prior probability model with mixture priors for the wavelet coefficients. The prior includes a positive prior probability mass at zero which leads to a posteriori thresholding and generally to a posteriori shrinkage on the coefficients. We discuss an efficient posterior simulation scheme to implement inference in the proposed model. The discussion is focused on the density estimation problem. However, the introduced prior probability model on the wavelet coefficient space and the Markov chain Monte Carlo scheme are general.


November 24

Speaker: Jane Liu, Duke Statistics

Title: Particle Filtering in Dynamic Models

We discuss the issues of simulation-based sequential analysis -- or particle filtering -- in dynamic models. Our focus is sequential Bayesian learning about time-varying state vectors and fixed model parameters simultaneously. We discuss a general approach that combines old ideas of smoothing using kernel methods with newer ideas of auxiliary particle filtering of Shephard and Pitt (1997). We show that specific smoothing approaches can interpret and suggest modifications to techniques that add "artificial evolution noise" to fixed model parameters at each time point, an idea introduced by Gordon, Salmond and Smith (1993) to address the problems of sample attrition and prior-data conflict arising in simulation-based sequential analysis using SIR and other standard methods for parameter learning. Unlike the Gordon et al. method, our new approach permits smoothing and regeneration of sample points of model parameters without the "loss of historical information" inherent in the Gordon et al. approach. This is achieved using shrinkage modifications of kernel smoothing, as introduced by West (1992). An example where analytical forms of posterior distributions are available provides assessment of the method, and further illustration and comparisons with auxiliary particle filtering are given in a stochastic volatility model.
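
The "loss of historical information" can be made concrete with a variance calculation. The sketch below assumes one common form of shrinkage kernel, with kernel locations a*theta_i + (1 - a)*mean, kernel variance h^2 * V, and a = sqrt(1 - h^2); whether this matches the exact construction of the talk is an assumption, and the particle values, weights, and bandwidth h are hypothetical. Centering the kernels on the particles themselves inflates the variance to (1 + h^2) * V, while shrinking the locations toward the mean keeps it at V.

```python
import math


def weighted_mean_var(xs, ws):
    """Weighted mean and variance; weights are assumed to sum to one."""
    mean = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    return mean, var


def shrunk_locations(particles, weights, h):
    """Kernel locations shrunk toward the weighted mean, with a = sqrt(1 - h^2)."""
    a = math.sqrt(1.0 - h * h)
    mean, _ = weighted_mean_var(particles, weights)
    return [a * x + (1.0 - a) * mean for x in particles]


def mixture_mean_var(locations, weights, kernel_var):
    """Mean and variance of a kernel mixture with a common kernel variance.

    By the law of total variance, the mixture variance is the variance of
    the kernel locations plus the kernel variance.
    """
    mean, var = weighted_mean_var(locations, weights)
    return mean, var + kernel_var


# Hypothetical particle approximation to the posterior of a fixed parameter.
particles = [-1.0, 0.0, 0.5, 2.0, 3.5]
weights = [0.2] * 5
h = 0.3  # assumed kernel bandwidth

target_mean, target_var = weighted_mean_var(particles, weights)
kernel_var = h * h * target_var

# Kernels centered on the particles: the variance is inflated.
_, naive_var = mixture_mean_var(particles, weights, kernel_var)
# Kernels centered on shrunk locations: mean and variance are preserved.
shrunk_mean, shrunk_var = mixture_mean_var(
    shrunk_locations(particles, weights, h), weights, kernel_var)

print(target_var, naive_var, shrunk_var)
```

Repeating the unshrunk step at every time point compounds the inflation, which is one way to read the information loss attributed to artificial evolution noise.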


December 01

Speaker: Ed Iversen, Duke Statistics, Duke University

Title: A Model for Risk of Breast Cancer

A statistical model for predicting risk of breast cancer conditional on a fixed set of covariates is described. The model assumes that the joint distribution of genotype, race, age, and age at diagnosis of breast cancer in the general (U.S.) population is known; the conditional distribution of a fixed set of covariates given these variables is, however, unknown. Data are available from both retrospective (covariates are collected for a sample of individuals conditional on their disease status) and prospective (disease status is observed for a sample of individuals conditional on their risk factors) studies, and an approach is outlined for combining data from the two types of studies (Mueller et al., 1996). The choice of retrospective versus prospective modeling is discussed in the situation where one of the margins (essentially the response here) is assumed known.


December 03

Speaker: James D. Lynch, University of South Carolina and NISS

Title: On the Ergodicity of General State Space Markov Chains

The variation norm ergodic theorem for general state space Markov chains is shown to be equivalent to the a.s. convergence to one of the likelihood ratio of the transition density and the equilibrium density for samples from the chain. The results are derived using martingale arguments and are for completely general state spaces. For the ergodic case, Doob's inequality can be used to show how the variation norm regulates how far, in some sense, a Markov simulation is from the desired equilibrium distribution. This talk is based on work with J. Sethuraman.


B. S.
Fri Apr 17