Speaker: Michael Lavine
Title: Proper Inference from Improper Posteriors - I.
We will look at an example of how an improper posterior is used to speed up MCMC convergence for the proper posterior of a parameter of interest. We will also look at other interesting examples of Markov chains on improper posteriors.
Speaker: Michael Lavine
Title: Proper Inference from Improper Posteriors - II.
We will prove a Marginal Ergodic Theorem which gives conditions under which MCMC chains on improper posteriors yield proper inference for parameters of interest.
Speaker: Robert Brown
Title: Introduction to the Beowulf Design - I.
In recent years, small- to medium-scale university parallel supercomputing has largely moved from big, expensive supercomputing centers and computers such as the Cray, SP2, or SP3 to small, cheap "Beowulf"-style compute clusters. What is a Beowulf? Why is it such an effective way to aggregate compute resources, even for large projects? And how does parallel computing work, anyway? These are some of the questions that will be addressed in this "beowulfery for beginners" talk.
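As a concrete orientation to the "how does parallel computing work" question, here is a minimal sketch (mine, not from the talk) of the "embarrassingly parallel" pattern that Beowulf clusters exploit, using Python's standard multiprocessing module on a single multi-core machine; on a real cluster the same idea is usually expressed with message passing (e.g., mpi4py), one process per node.

    # Embarrassingly parallel Monte Carlo: each worker runs independently,
    # with no communication until the final reduction step.
    from multiprocessing import Pool
    import random

    def monte_carlo_pi(n_samples: int) -> int:
        """Count random points in the unit square falling in the quarter circle."""
        rng = random.Random()
        hits = 0
        for _ in range(n_samples):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                hits += 1
        return hits

    if __name__ == "__main__":
        n_workers, n_per_worker = 4, 1_000_000
        with Pool(n_workers) as pool:
            counts = pool.map(monte_carlo_pi, [n_per_worker] * n_workers)
        pi_hat = 4.0 * sum(counts) / (n_workers * n_per_worker)
        print(f"pi ~ {pi_hat:.4f}")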
Speaker: Robert Brown
Title: Introduction to the Beowulf Design - II.
In recent years, small- to medium-scale university parallel supercomputing has largely moved from big, expensive supercomputing centers and computers such as the Cray, SP2, or SP3 to small, cheap "Beowulf"-style compute clusters. What is a Beowulf? Why is it such an effective way to aggregate compute resources, even for large projects? And how does parallel computing work, anyway? These are some of the questions that will be addressed in this "beowulfery for beginners" talk.
Speaker: Christopher Holloman
Title: Parallel Virtual Machines -- Statistical Analysis and Distributed Computing
One disadvantage of using Markov chain Monte Carlo to make inferences about posterior distributions is the amount of time the chain must be run to adequately explore the parameter space. In some problems, parallel computing can be used to make sampling more efficient. First, we present a simple example that demonstrates the sort of problems in which parallel computing can be applied effectively. We also present an application in hydrology involving the modeling of permeabilities given information about fluid flow in an aquifer. Parallel computing allows us to take advantage of the multi-scale nature of this problem by allowing multiple processors to explore different parts of the posterior simultaneously. This talk will focus on the details of implementing parallel processing rather than on the theoretical aspects of statistical inference in these types of problems.
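As a hedged illustration of this pattern (a toy of my own, not the hydrology application): several processes run independent Metropolis chains from dispersed starting points, so that different parts of a multimodal posterior are explored simultaneously.

    from multiprocessing import Pool
    import math, random

    def log_post(x: float) -> float:
        # Toy bimodal target: mixture of two well-separated Gaussian modes.
        return math.log(math.exp(-0.5 * (x + 3) ** 2) + math.exp(-0.5 * (x - 3) ** 2))

    def run_chain(args):
        start, n_iter, seed = args
        rng = random.Random(seed)
        x, lp = start, log_post(start)
        samples = []
        for _ in range(n_iter):
            prop = x + rng.gauss(0.0, 1.0)          # random-walk proposal
            lp_prop = log_post(prop)
            if math.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop                # accept
            samples.append(x)
        return samples

    if __name__ == "__main__":
        starts = [-6.0, -2.0, 2.0, 6.0]              # dispersed initial values
        with Pool(len(starts)) as pool:
            chains = pool.map(run_chain, [(s, 5000, i) for i, s in enumerate(starts)])
        # Pool the chains (after discarding burn-in) for posterior summaries.
        pooled = [x for c in chains for x in c[1000:]]
        print(f"posterior mean ~ {sum(pooled) / len(pooled):.3f}")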
Speaker: Yuguo Chen
Title: Sequential Importance Sampling and Its Applications
I will introduce a general framework for sequential importance sampling and discuss how to incorporate resampling steps to improve efficiency. Some important aspects of sequential importance sampling will be illustrated through several computationally challenging problems, including conditional inference on contingency tables, statistical inference in population genetics and filtering and smoothing in change-point problems.
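For orientation, a minimal generic SIS sketch (illustrative only; the target and proposal below are simple stand-ins, not the talk's applications): particles are extended one component at a time, importance weights are updated multiplicatively, and multinomial resampling is triggered when the effective sample size drops.

    import numpy as np

    rng = np.random.default_rng(0)

    def sis(n_particles: int, n_steps: int):
        x = np.zeros((n_particles, n_steps))
        logw = np.zeros(n_particles)
        for t in range(n_steps):
            # Proposal: random-walk extension of each particle.
            step = rng.normal(size=n_particles)
            x[:, t] = (x[:, t - 1] if t > 0 else 0.0) + step
            # Weight update: log target increment minus log proposal increment
            # (the target increment here is a simple stand-in).
            logw += -0.5 * x[:, t] ** 2 - (-0.5 * step ** 2)
            # Resample when the effective sample size falls below n/2.
            w = np.exp(logw - logw.max()); w /= w.sum()
            if 1.0 / np.sum(w ** 2) < n_particles / 2:
                idx = rng.choice(n_particles, size=n_particles, p=w)
                x, logw = x[idx], np.zeros(n_particles)   # equal weights after resampling
        return x, logw

    x, logw = sis(n_particles=500, n_steps=50)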
Speaker: Yuguo Chen
Title: Sequential Importance Sampling with Resampling in Phylogenetic Inference
Several recent Monte Carlo algorithms for full likelihood based inference on modern population genetics data (Griffiths and Tavare, 1994; Stephens and Donnelly, 2000) make use of sequential importance sampling (SIS). We propose a new resampling schedule to increase the efficiency of the SIS approach. Our scheme generalizes the usual resampling idea. Through some examples we will show that our resampling scheme can improve the efficiency by several orders of magnitude. We offer insights into the new resampling schedule, and discuss some possible generalizations of the idea.
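For reference, one standard low-variance implementation of the "usual resampling idea" that the talk generalizes is systematic resampling (the proposed schedule itself is not reproduced here); a minimal version:

    import numpy as np

    def systematic_resample(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Return ancestor indices; weights must sum to one."""
        n = len(weights)
        # One uniform draw placed on a stratified grid of n points.
        positions = (rng.random() + np.arange(n)) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        return np.minimum(idx, n - 1)    # guard against roundoff at the top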
Speaker: Yuguo Chen
Title: Testing the Rasch Model
The measurement of the degree to which a person possesses an ability or trait, such as intelligence or honesty, is an important problem in education and psychology. The Rasch model is probably the most widely recognized statistical measurement model and is often used in applied settings. But the strong assumptions of the Rasch model can lead to misleading conclusions if they are not met. This has led to considerable focus on testing the goodness of fit of the model to the observed data. All of the test statistics proposed in the literature involve estimating parameters of the Rasch model and evaluating the goodness of fit of the model based on these estimates. However, Rasch claimed that tests of the model should be based on the conditional distribution of the observed data given both the item totals and the person scores. This involves evaluating the uniform distribution of zero-one tables conditional on the row and column sums, which is very complicated; no such testing procedure is available in the literature. In this talk, we approach this problem using the sequential importance sampling method we developed for sampling zero-one tables. The simulation results show that our test statistic is substantially more powerful than those proposed in the literature.
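To make the conditional reference distribution concrete, here is a brute-force toy version (assumptions mine): for a tiny 0-1 table one can enumerate every table with the observed row and column sums, i.e., the uniform conditional distribution, and compute an exact conditional p-value; SIS replaces the enumeration when the tables are large.

    from itertools import product
    import numpy as np

    def tables_with_margins(row_sums, col_sums):
        """Yield every 0-1 table with the given row and column sums."""
        n_cols = len(col_sums)
        rows_by_sum = {r: [v for v in product([0, 1], repeat=n_cols) if sum(v) == r]
                       for r in set(row_sums)}
        for rows in product(*[rows_by_sum[r] for r in row_sums]):
            t = np.array(rows)
            if (t.sum(axis=0) == col_sums).all():
                yield t

    def exact_p_value(observed: np.ndarray, stat) -> float:
        row_sums, col_sums = observed.sum(axis=1), observed.sum(axis=0)
        vals = [stat(t) for t in tables_with_margins(list(row_sums), col_sums)]
        return float(np.mean([v >= stat(observed) for v in vals]))

    obs = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1]])
    # Toy statistic: co-occurrences of ones in adjacent columns.
    print(exact_p_value(obs, lambda t: float((t[:, :-1] * t[:, 1:]).sum())))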
Speaker: Merrill Liechty
Title: Estimating Correlation Matrices Using Mixture Priors
Variables can be clustered according to many different classifications. One intuitive way to classify them is by their correlations. Writing the covariance matrix in terms of the standard deviation and correlation matrices allows for incorporation of prior information of this kind. Individual correlations and variables can be put into clusters or groups within the framework implied by using mixture prior distributions. MCMC methods allow for efficient sampling of these models, and posterior inference regarding which group a correlation or variable belongs to is straightforward.
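For reference, the decomposition referred to above is the standard separation $\Sigma = S R S$, with $S = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ holding the standard deviations and $R$ the correlation matrix; the mixture priors are then placed on the elements of $R$ (and on the $\sigma_i$) to induce the groupings described.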
Speaker: Gabriel Huerta
Title: A Spatio-temporal Model for Mexico City Ozone Levels
We consider hourly readings of ozone concentrations over Mexico City and propose a model for spatial as well as temporal interpolation and prediction. The model is based on regressing the observed readings on a set of meteorological variables, such as temperature and humidity. A few harmonic components are added to account for the main periodicities that ozone presents during a given day. The model incorporates spatial covariance structure for the observations and the parameters that define the harmonic components. Using the Dynamic Linear model framework, we show how to compute smoothed means and predictive values. The methodology is illustrated with observations corresponding to September of 1997. (joint work with Bruno Sanso and Jonathan Stroud)
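As a small aside on the harmonic components (an illustrative sketch, not the talk's actual design matrix): sine/cosine pairs at the daily period enter the regression alongside the meteorological covariates, and the DLM framework then lets their coefficients evolve over time.

    import numpy as np

    t = np.arange(24 * 30)                       # hourly index over 30 days
    harmonics = [1, 2]                           # first two daily harmonics
    X = np.column_stack(
        [f(2 * np.pi * k * t / 24) for k in harmonics for f in (np.sin, np.cos)]
    )
    # X (sin/cos columns at each harmonic) is appended to temperature,
    # humidity, etc.; the DLM allows the coefficients to evolve in time.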
Speaker: Jim Berger
Title: Nonparametric Bayes via Parametric Model Selection or Averaging - I.
There is an increasing interest in approaching nonparametric Bayesian problems through selection of parametric models of an arbitrary size, or performing model averaging over a (near infinite) set of parametric models. These talks will focus on estimation of a nonparametric regression function, modeled as an infinite-order polynomial. (A wavelet example will also be given.) Nonparametric regression and model selection techniques will be explained as they are introduced. Various Bayesian and empirical Bayesian implementations will be discussed.
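A minimal caricature of the order-selection idea (assumptions mine; the talks' priors and model-averaging weights are more refined than this): fit polynomials of increasing order and score them with an approximate Bayesian criterion such as BIC.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-1, 1, 100)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)   # toy truth

    def bic(order: int) -> float:
        X = np.vander(x, order + 1)                           # polynomial basis
        beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(res[0]) if res.size else float(((y - X @ beta) ** 2).sum())
        n, k = x.size, order + 1
        return n * np.log(rss / n) + k * np.log(n)            # smaller is better

    best = min(range(1, 15), key=bic)
    print("selected polynomial order:", best)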
Speaker: Jim Berger
Title: Nonparametric Bayes via Parametric Model Selection or Averaging - II.
There is an increasing interest in approaching nonparametric Bayesian problems through selection of parametric models of an arbitrary size, or performing model averaging over a (near infinite) set of parametric models. These talks will focus on estimation of a nonparametric regression function, modeled as an infinite-order polynomial. (A wavelet example will also be given.) Nonparametric regression and model selection techniques will be explained as they are introduced. Various Bayesian and empirical Bayesian implementations will be discussed.
Speaker: Susie Bayarri
Title: Incorporating Uncertainties into CORSIM
CORSIM is a frequently used stochastic simulator of highway and street traffic. It has a number of uncertain inputs and produces as outputs measures of the congestion of the system. Unknown inputs are usually replaced by fixed values (estimates, tuned values, guesses, defaults, etc.), and hence uncertainty is not taken into account. A Bayesian solution consists of feeding CORSIM with simulated values from the (joint) posterior distribution of the inputs, thus producing distributions on the outputs that reflect the inherent uncertainty in the problem. Moreover, the Bayesian model can also handle in a natural way the substantial error in the observations as well as missing counts. CORSIM runs typically take 2 or 3 minutes, precluding direct MCMC computations; we use a simplified model (a fast simulator) to carry out the required MCMC runs.
This is joint work with Jim Berger and German Molina.
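In schematic form, the propagation step described above looks as follows (all names here are hypothetical stand-ins, not the real CORSIM interface): draw inputs from their joint posterior and run the simulator once per draw, so that the spread of the outputs reflects the input uncertainty.

    import numpy as np

    rng = np.random.default_rng(2)

    def run_simulator(demand, turn_prob):      # stand-in for a CORSIM run
        return demand * turn_prob + rng.normal(scale=0.1)

    def sample_posterior(n):                   # stand-in for stored MCMC output
        return rng.normal(100, 5, n), rng.beta(2, 5, n)

    demands, turns = sample_posterior(200)
    congestion = [run_simulator(d, p) for d, p in zip(demands, turns)]
    print(np.percentile(congestion, [5, 50, 95]))   # output uncertainty bands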
Speaker: Hedibert Lopes
Title: Simulation-based Smoothing and Filtering in Factor Stochastic Volatility Models: Two Econometric Applications
In this talk I will present some recent applications of simulation-based smoothing and filtering techniques to financial econometric problems, which appear in Lopes, Aguilar and West (2000) and Lopes and Migon (2001). Both reports can be downloaded from http://eagle.ufrj.br/~hedibert/papers.html.
In the first study we combine Pitt and Shephard's (1999) auxiliary particle filter with Liu and West's (2001) fixed-parameter updating algorithm. Our main practical interest is to investigate whether or not time-varying factor loadings improve the performance of existing exchange rate portfolios (Aguilar and West, 2000).
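For orientation only, here is a much-simplified bootstrap particle filter for a basic univariate stochastic volatility model with fixed parameters (the talk's method, the auxiliary particle filter with Liu-West parameter learning, adds look-ahead weighting and kernel-smoothed parameter updates on top of this).

    import numpy as np

    rng = np.random.default_rng(3)
    mu, phi, sigma = -1.0, 0.95, 0.2          # fixed parameters for the sketch

    def particle_filter(y: np.ndarray, n_part: int = 1000) -> np.ndarray:
        # Initialize log-volatility particles from the stationary distribution.
        h = rng.normal(mu, sigma / np.sqrt(1 - phi ** 2), n_part)
        means = np.empty(y.size)
        for t, yt in enumerate(y):
            h = mu + phi * (h - mu) + rng.normal(0, sigma, n_part)  # propagate
            logw = -0.5 * (h + yt ** 2 * np.exp(-h))   # y_t ~ N(0, exp(h_t))
            w = np.exp(logw - logw.max()); w /= w.sum()
            means[t] = np.sum(w * h)                    # filtered E[h_t | y_1:t]
            h = h[rng.choice(n_part, n_part, p=w)]      # resample
        return means

    y = 0.3 * rng.standard_normal(200)                  # toy return series
    print(particle_filter(y)[:5])                       # filtered log-volatility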
The second application investigates possible comovements in emerging-economy stock markets, such as the Brazilian and Mexican ones. We measure the transmission of shocks by cross-market correlation coefficients, following Forbes and Rigobon's (2000) notion of "shift-contagion". We show empirically that the time-varying covariance structure of the four most important Latin American markets (Brazil, Mexico, Argentina and Chile) and the US exhibits strong codependence that can be characterized by two major common factors. We also argue that some sort of contagion is present during periods of economic instability, or "crisis".
Speaker: Laura Gunn
Title: A Bayesian Approach to Modeling the Proportion of AIDS Patients Prescribed Appropriate Care
In past years, AIDS patients were often treated not only with less respect but also with inappropriate medical care (United States Food and Drug Administration, Office of Special Health Issues, 1996). This talk sets out to model who receives appropriate care. Expanding on a previous, simpler analysis conducted by the Center for Health Policy, Law, and Management here at Duke, we offer a hierarchical Bayesian logistic random effects model to describe the treatment of AIDS patients across 55 physicians within 6 academic clinics in the Southeast. The Gibbs sampler is used to fit this model.
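A hedged sketch of the kind of model described (notation mine): for patient $i$ seen by physician $j$ in clinic $k$, let $y_{ijk} \sim \mbox{Bernoulli}(p_{ijk})$ with $\mbox{logit}(p_{ijk}) = x_{ijk}'\beta + b_j + c_k$, where $b_j \sim N(0,\tau_b^2)$ and $c_k \sim N(0,\tau_c^2)$ are physician and clinic random effects; the Gibbs sampler then cycles through draws of $\beta$, the random effects, and the variance components.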
Speaker: Mark Huber
Title: Introduction to Perfect Sampling
Perfect sampling algorithms generate random variates from distributions for which the normalizing constant is unknown. This is a building block of Monte Carlo algorithms and an important tool in many fields of endeavor. The classical approach of building a Markov chain with the desired distribution suffers from questions concerning the mixing time (a.k.a. burn-in time, initialization time, settling time) of the chain. Perfect samplers do not need to know the mixing time of a Markov chain in order to work. We'll begin with the oldest (about six years old) general method, called coupling from the past (CFTP). This is a type of complete coupling method, and we'll briefly examine the pitfalls and advantages of CFTP and other complete couplers. Concepts such as bounding chains and monotonicity, and their relationship to complete coupling algorithms, will be explored.
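Because CFTP is easiest to grasp in a toy case, here is a hedged, minimal implementation for a lazy random walk on {0, ..., N} (my example, not from the talk): the same random updates are applied to chains started from the minimal and maximal states at times further and further in the past, and once they coalesce by time 0 the common value is an exact draw from the stationary (here uniform) distribution.

    import random

    def update(x: int, u: float, n_max: int) -> int:
        """Monotone update: the same u moves every state in the same direction."""
        return min(x + 1, n_max) if u < 0.5 else max(x - 1, 0)

    def cftp(n_max: int = 10, seed: int = 0) -> int:
        rng = random.Random(seed)
        us = []                      # randomness indexed by time, earliest first
        t = 1
        while True:
            # Extend further into the past, REUSING the randomness already drawn.
            us = [rng.random() for _ in range(t - len(us))] + us
            lo, hi = 0, n_max        # chains from the minimal and maximal states
            for u in us:             # run from time -len(us) up to time 0
                lo, hi = update(lo, u, n_max), update(hi, u, n_max)
            if lo == hi:
                return lo            # coalesced: an exact stationary draw
            t *= 2                   # otherwise go twice as far back

    print([cftp(seed=s) for s in range(5)])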
Speaker: Mark Huber
Title: The Randomness Recycler
The Randomness Recycler is a new type of perfect sampling algorithm that solves some of the issues that arise with coupling from the past. A hybrid of Markov chain and acceptance/rejection ideas, this approach does not use classical Gibbs samplers or Metropolis-Hastings Markov chains at all, and through this approach gives the first linear-time algorithms for generating random variates from many distributions of interest. We'll give an introduction to how it works, and construct RR-type algorithms for several problems of interest.
Speaker: Marco Ferreira
Title: Multi-scale Modeling of 1-D Permeability Fields
Permeability plays an important role in subsurface fluid flow studies, being one of the most important quantities for the prediction of fluid flow patterns. The estimation of permeability fields is therefore critical for the prediction of the behavior of contaminant plumes in aquifers and the production of petroleum from oil fields. In the particular case of petroleum production, part of the data available for the estimation of permeability fields is a "production curve". In a formal statistical analysis incorporating such information, the corresponding likelihood functions for the high-dimensional random field parameters representing the permeability field can be computed with the help of a fluid flow simulator (FFS). In addition, there usually exists information about the permeability field relevant at different scales of resolution, arising from studies of the geological characteristics of the oil field, well tests, and laboratory measurements. Our work uses a recently developed multi-scale model as a prior for 1-D permeability fields in order to incorporate the information available at the different scales of resolution. Estimation of the permeability field is then performed using an MCMC algorithm with an embedded FFS to incorporate the information given by the observed production curve. The performance of the proposed approach with respect to recovery of the original permeability field is studied with simulated data.
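Schematically, the embedded-simulator MCMC has the following shape (all names hypothetical; the stand-in "simulator" and the simple lognormal prior replace the real FFS and the multi-scale prior):

    import numpy as np

    rng = np.random.default_rng(4)

    def fluid_flow_simulator(perm):            # stand-in FFS, not a real solver
        return np.cumsum(1.0 / perm)

    def log_post(perm, observed):
        curve = fluid_flow_simulator(perm)     # the expensive embedded step
        # Lognormal prior on each cell (an assumption for the sketch) plus
        # a Gaussian misfit between simulated and observed production curves.
        return -0.5 * np.sum(np.log(perm) ** 2) - 0.5 * np.sum((curve - observed) ** 2)

    true_perm = np.exp(rng.normal(size=16))    # toy 1-D field on 16 cells
    observed = fluid_flow_simulator(true_perm) + rng.normal(scale=0.05, size=16)

    perm = np.ones(16)
    for _ in range(2000):                      # random-walk Metropolis on log-permeability
        prop = perm * np.exp(0.1 * rng.normal(size=16))
        if np.log(rng.random()) < log_post(prop, observed) - log_post(perm, observed):
            perm = prop
    print("mean abs log error:", np.mean(np.abs(np.log(perm) - np.log(true_perm))))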
Speaker: Ed Iversen
Title: Assessing Evidence for Gene-Environment Interactions Given High Risk Family Data
A primary function of genetic counseling is to provide individuals carrying disease-predisposing genetic mutations with an accurate estimate of their risk of developing disease and advice on potential risk-modifying behavior. A key ingredient is identification of environmental exposures that modify risk through interaction with the disease genotype. We describe methods for testing for the presence of gene-environment interactions using family history, genetic testing, and epidemiological data on a sample of high-risk individuals. These methods address several important complexities in this type of data, including potential ascertainment bias, genetic testing errors, missing data, and study-to-study heterogeneity. We illustrate these methods by applying them to a multi-center sample of individuals tested for the breast cancer susceptibility genes BRCA1/2, assessing the evidence for oral contraceptive use and alcohol consumption as modifiers of BRCA1/2 penetrance.
Speaker: Jennifer Pittman
Title: Adaptive Splines and Genetic Algorithms with an Application to fMRI Data
In many statistical applications, a modeling technique is needed that can capture a relationship between two variables x and y that is more complex than a simple linear one. One approach is to replace the noisy and/or complex relationship the data represent with something simple yet reasonable that captures the nature of the dependence in the data. When little is known about the underlying function f relating x and y, the modeling technique should be flexible or adaptive, i.e., able to handle a wide variety of functional shapes and behaviors. Nonparametric modeling is one such technique, and it has been successful in characterizing features of datasets that could not be obtained by other means (Hansen and Kooperberg 1999).
Due in part to the increased availability of computational power, spatially adaptive smoothing methods involving regression splines have become a popular and rapidly developing class of nonparametric modeling techniques. Most existing algorithms for fitting adaptive splines are based on non-linear optimization and/or stepwise selection; a possible alternative is to use a more intensive numerical optimization technique such as a genetic algorithm to perform knot selection.
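As a toy version of that alternative (details mine, not the AGS algorithm itself): a genetic algorithm searches over binary knot-inclusion vectors, with fitness given by the BIC of a truncated-power-basis spline fit.

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.linspace(0, 1, 200)
    y = np.sin(8 * x) + rng.normal(scale=0.15, size=x.size)
    candidates = np.linspace(0.05, 0.95, 19)             # candidate knot sites

    def fitness(mask: np.ndarray) -> float:
        knots = candidates[mask.astype(bool)]
        # Degree-1 truncated power basis: intercept, slope, and hinge terms.
        X = np.column_stack([np.ones_like(x), x] +
                            [np.maximum(x - k, 0.0) for k in knots])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(((y - X @ beta) ** 2).sum())
        n, k = x.size, X.shape[1]
        return -(n * np.log(rss / n) + k * np.log(n))    # higher is better

    pop = rng.integers(0, 2, (30, candidates.size))      # random initial population
    for _ in range(40):                                   # generations
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]           # truncation selection
        children = []
        while len(children) < len(pop):
            a, b = parents[rng.integers(0, 10, 2)]
            cut = rng.integers(1, candidates.size)
            child = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
            flip = rng.random(candidates.size) < 0.05     # mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected knots:", candidates[best.astype(bool)])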
A spatially adaptive modeling technique referred to as adaptive genetic splines (AGS) will be introduced which combines the optimization power of a genetic algorithm with the flexibility of polynomial splines for adaptive spline modeling in low dimensional settings. The basics of genetic algorithms will be reviewed and preliminary simulation results comparing the performance of the genetic algorithm method to other current methods, such as HAS (Luo and Wahba 1997), SUREshrink (Donoho and Johnstone 1995), MARS (Friedman 1991), and a Bayesian spline method (Denison, Mallick and Smith 1998), will be discussed, as well as a current application of AGS to fMRI time series data.
Speaker: Athanasios Kottas
Title: A Nonparametric Bayesian Modeling Approach for Cytogenetic Dosimetry
In cytogenetic dosimetry, samples of cell cultures are exposed to a range of doses of a given agent. In each sample, at each dose level, some measure of cell disability is recorded. The objective is to develop models which explain cell response to dose. Such models can be used to predict response at unobserved doses. More importantly, such models can provide inference for unknown exposure doses given the observed responses. Typically, cell disability is viewed as a Poisson count, but in the present work a more appropriate response is a categorical classification. In the literature, modeling in this case is very limited, and what exists is purely parametric. We propose a fully Bayesian nonparametric approach to this problem, offering comparison with a parametric model through a simulation study. We also examine a dataset from blood cultures exposed to radiation, where classification is by the number of micronuclei per cell.
(Joint work with Marcia Branco and Alan Gelfand)
Speaker: Chuanshu Ji
Title: Markov Chain Monte Carlo Calibration of Stochastic Volatility Models
We propose a Bayesian computational scheme for calibration of stochastic volatility models using data from underlying asset returns and derivative prices. A special feature of this work is the combination of historical volatility and implied volatility. MCMC methods play a pivotal role in the simulation of historical time series for the real-world dynamics and pricing formulas driven by the risk-neutral dynamics. The paper can be downloaded from http://www.stat.unc.edu/faculty/cji/research.html .
(Joint work with Xin Ge)
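For reference, a canonical discrete-time stochastic volatility model of the kind being calibrated (standard notation, not necessarily the paper's exact specification) is $y_t = e^{h_t/2}\epsilon_t$ and $h_{t+1} = \mu + \phi(h_t - \mu) + \sigma \eta_t$, with $\epsilon_t, \eta_t \sim N(0,1)$; historical returns inform the real-world dynamics, while derivative prices constrain the risk-neutral counterpart.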
Speaker: Mike West
Title: Musings on Factor Analysis, Understanding Multivariate Structure, and Factor Regression Models
I will talk about relationships between empirical data decompositions and formal factor models for exploratory data analysis in problems with large-scale data matrices, and the use of factor models in regression with many predictor variables. Our work in molecular phenotyping in functional genomics has been a key motivating context, and I will discuss analyses of breast cancer gene expression data for illustration. Key topics will be latent factor models, their connections with data decomposition methods (SVD, PCA), the notion of sparse factor models, prior specification, and Bayesian analysis. This will be an informal discussion of topics of current research interest in which there are currently more questions than answers.
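As a small illustration of the SVD/factor connection (synthetic data, assumptions mine): a rank-k truncated SVD of a centered data matrix yields empirical factor scores and loadings, whereas a Bayesian latent factor model instead places priors, possibly sparse, on the loadings.

    import numpy as np

    rng = np.random.default_rng(6)
    n_samples, n_genes, k = 50, 200, 3
    F = rng.normal(size=(n_samples, k))                 # latent factors
    L = rng.normal(size=(k, n_genes))                   # loadings
    X = F @ L + rng.normal(scale=0.5, size=(n_samples, n_genes))

    Xc = X - X.mean(axis=0)                             # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    factors_hat = U[:, :k] * s[:k]                      # empirical factor scores
    loadings_hat = Vt[:k]                               # empirical loadings
    print("variance explained:", (s[:k] ** 2).sum() / (s ** 2).sum())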
Speaker: Elias Moreno
Title: Intrinsic Priors in Problems with a Change-Point
The Bayesian formulation of the change-point problem involves priors for both discrete and continuous parameters. When the prior information is vague, a default Bayesian analysis may be useful, but it presents some difficulties that can be resolved with intrinsic priors.
In this talk a default Bayesian model selection approach is taken to the problem of making inference on the point in a sequence of random variables at which the underlying distribution changes.
Inferences are based on the posterior probabilities of the possible change-points. However, these probabilities depend on Bayes factors in which improper default priors for the parameters leave the Bayes factors defined only up to a multiplicative constant. To overcome this difficulty, intrinsic priors arising from the conventional priors are considered.
With intrinsic priors, the posterior distribution of the change-point and of the size of the change can be computed. The results are applied to some common sampling distributions, and illustrations with some much-studied datasets are given.
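In outline (standard notation, not specific to the talk): for data $x_1, \ldots, x_n$ and a change at $k$, the posterior probabilities are $p(k \mid x) \propto p(k)\, m_k(x)$, where $m_k(x) = \int f(x_1, \ldots, x_k \mid \theta_1) f(x_{k+1}, \ldots, x_n \mid \theta_2)\, \pi(\theta_1, \theta_2)\, d\theta_1 d\theta_2$; with improper $\pi$ each $m_k$ is defined only up to an arbitrary constant, and this is precisely the indeterminacy that intrinsic priors resolve.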
Speaker: Jeffrey Krolik
Title: Statistical Signal Processing for Radar and Sonar in Complex Multipath Propagation Conditions
The processing of signals carried by propagating waves has traditionally been developed assuming plane-wave propagation models, due to their analytic and computational simplicity. This is despite the fact that in many problems, reception of the signal via multiple propagation paths between the source and receiver is a dominant feature. Although difficulties with plane-wave approximations in multipath environments have been dealt with by a variety of mitigation techniques, the performance of such methods is inevitably upper-bounded by the case where multipath is absent. The notion that instead of trying to undo the effects of coherent multipath, one could actually exploit them to achieve dramatically improved performance with the assistance of a numerical propagation model is the essence of what is now known as matched-field processing (MFP). The availability of inexpensive, high-powered computing to rapidly calculate numerical solutions of the wave equation is what has really driven the development of matched-field methods over the last two decades. This talk will describe how computational acoustic and electromagnetic models can be coupled with statistical signal processing techniques in order to deal with the inevitable uncertainties in the assumed propagation environment. Systems in which MFP techniques are being developed range from passive and active sonars at frequencies below 500 Hz, to skywave over-the-horizon radar operating between 3 and 30 MHz, to shipboard microwave radar in the 3 GHz band. Application areas to be discussed include counter-drug operations, undersea surveillance, and remote sensing of the marine boundary layer. Although the requirements of each application are quite different, a common framework for integrating statistical and computational physical models will be presented.
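As a compact sketch of the matched-field idea (the replica model below is a free-space stand-in, not a real acoustic or electromagnetic code): the Bartlett processor scans candidate source locations, computes a replica field for each from the propagation model, and correlates it with the measured array covariance.

    import numpy as np

    def replica(candidate_pos, sensor_pos, wavenumber):
        """Stand-in propagation model: spherical spreading plus phase delay."""
        r = np.linalg.norm(sensor_pos - candidate_pos, axis=1)
        v = np.exp(-1j * wavenumber * r) / r
        return v / np.linalg.norm(v)

    def bartlett_surface(data_cov, grid, sensor_pos, wavenumber):
        """Ambiguity surface: B(theta) = w(theta)^H K w(theta)."""
        return np.array([
            np.real(np.conj(w) @ data_cov @ w)
            for w in (replica(g, sensor_pos, wavenumber) for g in grid)
        ])

    sensors = np.column_stack([np.linspace(0, 50, 8), np.zeros(8)])  # line array
    grid = [np.array([x, 100.0]) for x in np.linspace(-200, 200, 81)]
    src = np.array([40.0, 100.0])
    d = replica(src, sensors, wavenumber=0.5)       # noise-free "measured" field
    K = np.outer(d, np.conj(d))                     # rank-one sample covariance
    surface = bartlett_surface(K, grid, sensors, 0.5)
    print(grid[int(np.argmax(surface))])            # peaks near the true source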
Speaker: Brian Neelon
Title: Bayesian Order Restricted Inference Using Transformation Priors
Researchers often have prior knowledge of the ordering in a location or scale parameter across groups or categories of a covariate. We propose a class of prior distributions based on transforming draws from an underlying density in $R^k$ to a restricted space. The transformation function is chosen to follow a minimax form, motivated by classical order-restricted estimators. Unlike commonly used truncated conjugate priors, which only allow for strict inequalities, our proposed prior can accommodate inequalities of the form $\mu_j \ge \mu_{j'}$, with $\mu_j = \mu_{j'}$ assigned positive prior probability. An efficient Markov chain Monte Carlo sampling algorithm, in which the restricted parameters are updated simultaneously, is provided for posterior computation. Methods are described for incorporating ordering constraints in linear and generalized linear models. The approach is illustrated through application to data from a time to pregnancy study.
Speaker: Ming Liao
Title: Bayesian Estimation of Gene Expression Index
In applications of oligonucleotide expression array technology, reliable estimation of the gene expression index is critical for "high-level analyses" such as classification, clustering, and regulatory network inference. Recently, a statistical model has been proposed that models the probe effect explicitly and has shown advantages over the classical Average Difference. Here we develop a Bayesian version of the method, essentially a Bayesian factor analysis model fit by Gibbs sampling. After introducing appropriate constraints, a robust noise function, mixture components, and mixture priors, our model is shown to be much more reliable and accurate than the original one. In this talk, I will first give a brief introduction to the background of our research, then describe our method in detail. Finally, some results based on both artificial data and real microarray data will be given.
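For orientation, a simplified form of the probe-level model referred to above (my summary): with $y_{ij}$ the response of probe $j$ on array $i$, $y_{ij} = \theta_i \phi_j + \epsilon_{ij}$, where $\theta_i$ is the expression index for array $i$, $\phi_j$ is the probe affinity, and a constraint such as $\sum_j \phi_j^2 = J$ is needed for identifiability; the Bayesian version places priors on $\theta$, $\phi$, and the noise, and samples them by Gibbs.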