Speaker: James Berger
Title: Statistical Validation of Computer Models I
One of the major activities in science and engineering today is the development of math-based computer models of scientific and engineering processes. The most basic question in the evaluation of such computer models is "Does the computer model adequately represent reality?"
A six-step computer model validation methodology will be discussed. The methodology is based on using spatial and Bayesian statistical tools. The latter are particularly suited to treating the major issues associated with the validation process: quantifying multiple sources of error and uncertainty in computer models; combining multiple sources of information; and updating validation assessments as new information is acquired. Moreover, hierarchical Bayesian techniques allow inferential statements to be made about predictive error associated with model predictions in untested situations.
The framework has been implemented for two test bed computer models (a vehicle crash model and a resistance spot welding model) that will be used to illustrate the proposed validation process.
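The updating step can be pictured with a toy calculation. Below is a minimal sketch, not the six-step methodology itself: field measurements are compared to a simulator prediction, and a conjugate normal prior on the model bias is updated as field data accumulate. All numbers, the prior, and the assumed measurement error are illustrative.

```python
import numpy as np

# Minimal sketch of one validation ingredient: Bayesian updating of the
# discrepancy ("bias") between a simulator and field data under a
# normal-normal conjugate model.  All numbers are illustrative.

rng = np.random.default_rng(0)

sim_output = 10.0                                             # simulator prediction at a tested input
field_data = sim_output + 0.8 + rng.normal(0, 0.5, size=20)   # synthetic field runs

mu0, tau0 = 0.0, 2.0                  # prior on bias: N(mu0, tau0^2)
sigma = 0.5                           # assumed field-measurement sd

# Posterior for the bias delta = E[field] - sim_output
resid = field_data - sim_output
n = len(resid)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + resid.sum() / sigma**2)

print(f"posterior bias: {post_mean:.3f} +/- {np.sqrt(post_var):.3f}")
# A bias credibly different from zero flags model inadequacy at this input.
```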
Speaker: Merrill Liechty
Title: Portfolio Selection with Higher Moments
We build on the Markowitz portfolio selection process by incorporating higher order moments of the assets, as well as utility functions based on predictive asset returns. We propose the use of the skew normal distribution as a characterization of the asset returns. We show that this distribution has many attractive features when it comes to modeling multivariate returns. Preference over portfolios is framed in terms of expected utility maximization. We discuss estimation and optimal portfolio selection using Bayesian methods. These methods allow for a comparison to other optimization approaches where parameter uncertainty is either ignored or accommodated in a non-traditional manner. Our results suggest that it is important to incorporate higher order moments in portfolio selection. Further, we show that our approach leads to higher expected utility than the resampling methods common in the practice of finance.
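As a rough illustration of the optimization step only, the sketch below draws returns from a simple Azzalini-style skew construction (a shared truncated-normal factor added to correlated Gaussian noise) and maximizes Monte Carlo expected CARA utility over long-only weights. It is not the authors' Bayesian estimation procedure; the scale matrix, skewness loadings, and risk aversion are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: expected-utility portfolio choice under skewed returns.

rng = np.random.default_rng(1)
n_assets, n_draws = 3, 50_000

mu = np.array([0.05, 0.07, 0.06])
L = np.array([[0.10, 0.00, 0.00],
              [0.03, 0.12, 0.00],
              [0.02, 0.04, 0.15]])       # Cholesky factor of the scale matrix
delta = np.array([-0.5, 0.3, 0.0])       # per-asset skewness loadings

z = rng.normal(size=(n_draws, n_assets)) @ L.T
t = np.abs(rng.normal(size=(n_draws, 1)))     # shared truncated-normal factor
returns = mu + z + t * delta                  # skewed return draws

def neg_expected_utility(w, risk_aversion=3.0):
    port = returns @ w
    return np.mean(np.exp(-risk_aversion * port))   # minimize E[-U] for CARA utility

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
bounds = [(0.0, 1.0)] * n_assets              # long-only for simplicity
w0 = np.full(n_assets, 1.0 / n_assets)
res = minimize(neg_expected_utility, w0, bounds=bounds, constraints=cons)
print("optimal weights:", np.round(res.x, 3))
```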
Speaker: Susie Bayarri
Title: Bayesian Checking of Hierarchical Models
With the availability of MC and MCMC methods, hierarchical models are used extremely often in applications. There are many goodness-of-fit techniques for traditional models, but basically none for hierarchical models. Thus, these models are widely used and rarely criticized. Model checking can also be approached from a Bayesian point of view, and applied to the checking of hierarchical models. The crucial issue is how to deal with the unknown (nuisance) parameters when improper objective priors are used. In this talk we review and compare several methods.
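One member of this family of checks can be sketched concretely: a posterior predictive p-value for a normal hierarchical model, computed from a simple Gibbs sampler with the between-group variance as the discrepancy. This is the plain posterior predictive check; the methods compared in the talk differ precisely in how they handle the nuisance parameters, so treat this as a baseline sketch with invented data.

```python
import numpy as np

# Posterior predictive p-value for  y_ij ~ N(theta_i, 1),  theta_i ~ N(mu, tau^2),
# with flat prior on mu and p(tau^2) proportional to 1/tau^2 (burn-in omitted).

rng = np.random.default_rng(2)
I, J = 8, 5
theta_true = rng.normal(0, 2, I)
y = theta_true[:, None] + rng.normal(size=(I, J))

n_iter, ppp = 2000, 0
mu, tau2 = 0.0, 1.0
ybar = y.mean(axis=1)
for it in range(n_iter):
    # theta_i | rest  (normal-normal conjugacy, sigma^2 = 1 known)
    v = 1.0 / (J + 1.0 / tau2)
    theta = rng.normal(v * (J * ybar + mu / tau2), np.sqrt(v))
    # mu | rest (flat prior), then tau2 | rest (scaled inverse chi-square draw)
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / I))
    tau2 = np.sum((theta - mu) ** 2) / rng.chisquare(I)
    # replicate data and compare discrepancies
    y_rep = theta[:, None] + rng.normal(size=(I, J))
    ppp += np.var(y_rep.mean(axis=1)) >= np.var(ybar)

print("posterior predictive p-value:", ppp / n_iter)
# Values near 0 or 1 indicate that the hierarchical model fits poorly.
```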
Speaker: David Dunson
Title: Selecting Between Additive and Proportional Hazards
Although Cox proportional hazards regression is the default analysis for time-to-event data, there is typically uncertainty about whether the effects of a predictor are more appropriately characterized by a multiplicative or additive model. To accommodate this uncertainty, we place a model selection prior on the coefficients in an additive-multiplicative hazards model. This prior assigns positive probability, not only to the model that has both additive and multiplicative effects for each predictor, but also to sub-models corresponding to no association, to only additive effects, and to only proportional effects. Special cases permit variable selection and inferences in the Cox model and in the additive hazards model. Constraints are incorporated on the coefficients in the additive component of the model to ensure non-negative hazards, a condition often violated by current frequentist methods. Formulating the hazards model within a counting process framework and augmenting the data with Poisson latent variables, the prior is conditionally conjugate, and posterior computation can proceed via a Gibbs sampling algorithm. Results are presented from a simulation study and from an analysis of data from the Framingham Heart Study.
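The hazard form under discussion can be made concrete with a small simulation. The sketch below evaluates an additive-multiplicative hazard with a constant baseline, under a simple non-negativity constraint on the additive coefficients; the talk's constraints and the Poisson data augmentation are more general than this.

```python
import numpy as np

# Sketch of the additive-multiplicative hazard:
#   lambda(t | x) = x' beta  +  lambda0(t) * exp(x' gamma),
# here with a constant baseline lambda0, so event times are exponential.
# Requiring beta >= 0 (with non-negative covariates) is one simple way to
# keep the hazard non-negative; the talk's constraints are more general.

rng = np.random.default_rng(3)
n, lambda0 = 1000, 0.1
beta = np.array([0.05, 0.0])          # additive effects (constrained >= 0)
gamma = np.array([0.0, 0.7])          # multiplicative (log-hazard) effects

x = rng.binomial(1, 0.5, size=(n, 2)).astype(float)
rate = x @ beta + lambda0 * np.exp(x @ gamma)
assert np.all(rate > 0), "constraints must keep the hazard non-negative"

event_time = rng.exponential(1.0 / rate)
censor = rng.exponential(20.0, size=n)
time = np.minimum(event_time, censor)
status = (event_time <= censor).astype(int)
print("events observed:", status.sum(), "of", n)
```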
Speaker: Alan Gelfand
Title: Dirichlet Processes and their Use in Nonparametric Bayesian Inference
Dirichlet processes have a relatively long history in the literature, dating at least to Ferguson (1973, 1974). However, they existed primarily as probabilistic constructs until MCMC/Gibbs sampling appeared on the statistical scene. Such simulation-based model fitting opened up their practical usage, and there has been a subsequent proliferation of papers developing this implementation. In this talk, I will present an introduction to Dirichlet processes. Then, I will move to Dirichlet process mixing, which offers both practical and computational advantages; in fact, I will work through the computational details at some length. Finally, I will discuss how to perform full Bayesian inference in this context (introducing further computational issues) and illustrate with examples.
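The stick-breaking construction gives a quick concrete picture. The sketch below draws a truncated stick-breaking realization of DP(alpha, G0) and samples from the induced Dirichlet process mixture of normals; the truncation level and base measure are illustrative.

```python
import numpy as np

# Sketch: a (truncated) stick-breaking draw from DP(alpha, G0) and a sample
# from the induced Dirichlet process mixture of normals.  Truncation at K
# components is the usual computational device; G0 here is N(0, 4).

rng = np.random.default_rng(4)
alpha, K, n = 1.0, 50, 500

v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))   # stick-breaking weights
atoms = rng.normal(0.0, 2.0, size=K)                        # draws from G0

# A realization G of the DP is the discrete measure sum_k w_k * delta_{atoms[k]}.
# DP mixture: each observation picks an atom as its mean, then adds noise.
labels = rng.choice(K, size=n, p=w / w.sum())
y = atoms[labels] + rng.normal(0.0, 0.3, size=n)

print("number of occupied components:", len(np.unique(labels)))
```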
Speaker: Ming Liao
Title: Bayesian Nonlinear Factor Regression for Gene Expression Profiles
Microarray technology allows monitoring of gene expression for thousands of genes in parallel. With gene expression profiles, we address the tissue classification problem, where the number of predictor variables is huge and the sample size is substantially smaller. First, we discuss the latent factor model, which includes classical factor analysis (FA), Probabilistic Principal Component Analysis (PPCA) and Independent Component Analysis (ICA). Applying a new class of structured priors for the factor loading matrix, we introduce the sparse factor model and the empirical block factor model, which make factor models feasible for gene expression data in thousands of dimensions. Then we discuss the regression model on low-dimensional latent factors and related computational methods. By introducing a kernel model, we extend linear regression to nonlinear regression. Finally, we develop a novel Dirichlet Process sparse kernel model for problems with large sample sizes.
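The large-p, small-n workflow can be caricatured in a few lines: reduce the expression matrix to a handful of latent factors, then classify on the factor scores. In the sketch below a plain truncated SVD stands in for the sparse Bayesian factor model, and a ridge-stabilized Newton logistic fit stands in for the Bayesian regression step; the data are synthetic.

```python
import numpy as np

# Sketch of factor regression for large p, small n with synthetic data.

rng = np.random.default_rng(5)
n, p, k = 40, 2000, 3                      # samples, genes, latent factors

scores_true = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p)) * (rng.random((k, p)) < 0.05)  # sparse loadings
X = scores_true @ loadings + 0.5 * rng.normal(size=(n, p))
y = (scores_true[:, 0] > 0).astype(float)  # class depends on factor 1

# Estimate factor scores by truncated SVD, then fit a logistic regression on
# the k-dimensional scores by a few ridge-stabilized Newton steps.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
F = U[:, :k] * s[:k]                       # estimated factor scores (n x k)

Z = np.hstack([np.ones((n, 1)), F])
b = np.zeros(k + 1)
for _ in range(15):
    eta = np.clip(Z @ b, -30, 30)
    pr = 1.0 / (1.0 + np.exp(-eta))
    W = pr * (1.0 - pr)
    H = Z.T @ (W[:, None] * Z) + 1e-2 * np.eye(k + 1)   # small ridge for stability
    b += np.linalg.solve(H, Z.T @ (y - pr))

print("training accuracy:", np.mean((Z @ b > 0) == (y == 1)))
```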
Speaker: Enrique ter Horst
Title: A Levy Generalization of Compound Poisson Processes in Finance-Theory & Applications
Since Black & Scholes, mathematical finance has grown into a branch of mathematics in its own right. Benoit Mandelbrot's rejection of normality for asset returns led to the consideration of Levy-stable stochastic processes as an interesting alternative. The first part sets up a stochastic volatility model for asset returns, composed of a diffusion and a general Levy pure jump process. The pricing of European options is then considered, leading to conditions that allow us to use the transform analysis of Duffie & Pan (2000), where a change of measure to a risk-neutral one is performed. The equivalence between risk-neutrality and no-arbitrage is guaranteed by Delbaen and Schachermayer (1994). In a third part, we perform a Bayesian analysis of a stochastic volatility model (see Barndorff-Nielsen and Shephard (2001)), where we treat jump times and sizes as parameters and are interested in their posteriors. We also obtain the posteriors of the parameters governing the law of the subordinator (driving the stochastic volatility of the Ornstein-Uhlenbeck type). The whole analysis is done via reversible jump MCMC as in Green (1995), since we deal with a Levy pure jump process that has infinitely many jumps.
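The simplest member of the jump family in question is a compound Poisson process, which has finitely many jumps per period. The sketch below simulates returns from a diffusion plus compound Poisson jumps to show the induced skewness and heavy tails; the talk's pure jump processes allow infinite activity, which this finite-activity toy does not capture, and all parameters are illustrative.

```python
import numpy as np

# Sketch: daily log returns from a diffusion plus compound Poisson jumps.

rng = np.random.default_rng(6)
T, dt = 1000, 1.0 / 252
mu, sigma = 0.05, 0.2                            # annualized drift and diffusion vol
jump_rate, jump_mu, jump_sd = 5.0, -0.02, 0.05   # jumps per year, jump-size law

n_jumps = rng.poisson(jump_rate * dt, size=T)
jump_part = np.array([rng.normal(jump_mu, jump_sd, k).sum() for k in n_jumps])
diff_part = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.normal(size=T)
log_returns = diff_part + jump_part

print("sample skewness:",
      float(np.mean((log_returns - log_returns.mean())**3) / log_returns.std()**3))
# Jumps with negative mean produce the left skew and heavy tails that the
# Gaussian model cannot capture.
```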
Speaker: Jun Duan
Title: Nonstationary Spatial Process Modeling through Discrete Mixing
It has been widely recognized that a stationary spatial process model for collected spatial data will usually be inappropriate. In this presentation we propose a discrete mixture of distributions of Gaussian random fields to model a nonstationary spatial process. We begin with a spatial process on a discrete support, which is generally nonstationary. The undesirable "sparseness" of its realizations can be overcome by mixing its distribution with that of a white noise, resulting in a finite or countable mixture of distributions of Gaussian random fields. The sample surfaces of the initial spatial process are themselves realizations of a stationary Gaussian random field. Thus a modeling hierarchy of spatial processes is formulated. Fitting and inference for such models, particularly spatial prediction, are carried out within a Bayesian framework using Gibbs sampling.
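A loose sketch of the ingredients, under my own simplifying reading of the construction: draw a stationary Gaussian random field on a set of discrete support points, then mix its distribution with white noise. The covariance, mixing weight, and noise scale are all illustrative.

```python
import numpy as np

# Loose sketch: stationary Gaussian random field on a discrete support,
# with its distribution mixed with white noise.

rng = np.random.default_rng(7)
support = rng.random((30, 2))                       # discrete support locations

d = np.linalg.norm(support[:, None, :] - support[None, :, :], axis=-1)
cov = np.exp(-d / 0.3)                              # stationary exponential covariance
field = rng.multivariate_normal(np.zeros(len(support)), cov)

# Mixture: with weight w the process follows the random field's distribution,
# with weight 1 - w it is pure white noise, giving a nonstationary mixture law.
w, sigma_noise = 0.8, 1.0
pick = rng.random(len(support)) < w
z = np.where(pick, field, rng.normal(0, sigma_noise, len(support)))
print("realization at first 5 support points:", np.round(z[:5], 2))
```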
Speaker: Scott Schmidler
Title: Stochastic Grammars and Statistical Shape Analysis in Bioinformatics
This talk will survey some ongoing research projects in the area of biomolecular sequence and structure modeling. I will give an introduction to models for random sequences known as 'stochastic grammars', which have received little attention from statisticians. I will then introduce some ideas from the statistical theory of shape and describe their use in analysis of molecular structures. Throughout I will attempt to point out potential areas for graduate student research in statistics.
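A stochastic grammar is easy to demonstrate: the sketch below samples strings from a toy probabilistic context-free grammar whose paired terminals mimic, very loosely, the base pairing that such grammars model in RNA secondary structure. The grammar is invented for illustration.

```python
import random

# Sketch: sampling strings from a tiny stochastic (probabilistic context-free)
# grammar.  Each nonterminal has a list of (probability, right-hand side) rules.

rules = {
    "S": [(0.5, ["a", "S", "t"]),        # paired symbols, as in an RNA stem
          (0.3, ["g", "S", "c"]),
          (0.2, ["L"])],
    "L": [(0.6, ["a"]), (0.4, ["g"])],   # unpaired loop
}

def sample(symbol="S"):
    if symbol not in rules:              # terminal symbol: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in rules[symbol]:      # pick a rule by its probability
        acc += prob
        if r < acc:
            break
    return [s for part in rhs for s in sample(part)]

random.seed(8)
for _ in range(3):
    print("".join(sample()))
```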
Speaker: Ana Grohovac Rappold
Title: Modeling Yearly Cycle and Spatial Dependence of the Ocean's Mixed Layer Depth by Process Convolution
Work presented in this talk is motivated by my involvement in an interdisciplinary project for modeling ocean properties. Oceanographers are interested in the vertical temperature profile of the ocean; specifically, in determining the depth M of a wind-mixed surface layer. The exact depth M of the mixed layer is uncertain, since measurements of water temperature are taken at approximately 10 meter intervals. Moreover, among other less interesting sources of variability, M depends on the yearly cycle of the surface temperature, on the location in the ocean, on proximity to ice and/or shore, and on modes of global climate. Currently, instead of modeling thermoclines as functions, we resort to an ad hoc method for extracting M, and we account for uncertainty in M explicitly through a two-component mixture: the assumed distribution $F_1$ and the estimation-error distribution $F_2$. The expectation $E_{F_1}[M]$ is modeled as a Gaussian process with spatial and temporal domain, as proposed by Higdon (1998). The annual cyclic pattern of M is accounted for by restricting the temporal domain of the Gaussian process to a circle whose circumference corresponds to one annual cycle. This work is still at the beginning stages, and I would like to share some ideas for future work. The ideas involve reduction of the parameter space through orthogonalization of the kernel function, assimilation of data from mathematical models, and the feasibility of parallelizing computations in order to extend the inference to the entire North Atlantic Ocean.
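The circular construction can be sketched directly from the process-convolution idea of Higdon (1998): smooth independent normals at knot angles with a kernel measured in chordal distance, so that the end of the year is automatically close to its beginning. The knot count and bandwidth below are illustrative.

```python
import numpy as np

# Sketch: a Gaussian process on a circle (one annual cycle) built by
# convolving white noise at knot angles with a Gaussian kernel.

rng = np.random.default_rng(9)
n_knots = 12
knots = np.linspace(0, 2 * np.pi, n_knots, endpoint=False)
x = rng.normal(size=n_knots)                  # white-noise process at the knots

def kernel(t, u, bw=0.6):
    # chordal distance on the unit circle keeps day 365 close to day 1
    d = 2 * np.sin(np.abs(t - u) / 2)
    return np.exp(-0.5 * (d / bw) ** 2)

days = np.arange(365)
angles = 2 * np.pi * days / 365
z = np.array([np.sum(kernel(a, knots) * x) for a in angles])
print("z(day 1) vs z(day 365):", round(z[0], 3), round(z[-1], 3))
# The two values are close by construction: the annual cycle closes smoothly.
```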
Speaker: Laura Gunn
Title: Bayesian Inferences on Shape Constrained Hormone Trajectories in the Menstrual Cycle
In studies of hormone patterns in the menstrual cycle, it is often reasonable to assume that the mean trajectory increases monotonically to an unknown peak and decreases thereafter. To account for the dependency in hormone measurements, one can apply a hierarchical model having woman-specific random effects and autocorrelated errors. In the unconstrained case, Bayesian computation can proceed via a Gibbs sampling algorithm. Unfortunately, standard Gibbs sampling approaches for incorporating parameter constraints cannot be used when the constraints are on higher level parameters in the hierarchy. To solve this problem, this article proposes a transformation approach in which samples from the unconstrained posterior density for the cycle-specific trajectories are transformed to the restricted space using a minimal distance mapping. This approach is shown to result in substantially improved efficiency relative to unconstrained analyses and analyses that place constraints on the population parameters. The methods are illustrated through an application to progesterone data from the literature.
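The minimal distance mapping can be illustrated for a single trajectory: project a noisy curve onto the set of curves that rise to a peak and fall thereafter, by combining an isotonic fit before each candidate split with an antitonic fit after it and keeping the least-squares winner. This is a sketch of the constraint-projection idea only, not the full hierarchical computation; the data are synthetic.

```python
import numpy as np

# L2 projection of a trajectory onto the "increase to a peak, then decrease"
# shape class, via pool-adjacent-violators fits on each side of a split.

def pava(y):
    """Least-squares nondecreasing fit (pool adjacent violators)."""
    vals, wts = list(np.asarray(y, dtype=float)), [1.0] * len(y)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:                 # violation: merge the blocks
            w = wts[i] + wts[i + 1]
            vals[i] = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / w
            wts[i] = w
            del vals[i + 1], wts[i + 1]
            i = max(i - 1, 0)                     # recheck the previous pair
        else:
            i += 1
    return np.repeat(vals, [int(w) for w in wts])

def unimodal_projection(y):
    """Minimize over split points: isotonic prefix + antitonic suffix."""
    best, best_fit = np.inf, None
    for p in range(len(y) + 1):
        up = pava(y[:p])                          # nondecreasing before the split
        down = pava(y[p:][::-1])[::-1]            # nonincreasing after the split
        fit = np.concatenate([up, down])
        d = np.sum((fit - y) ** 2)
        if d < best:
            best, best_fit = d, fit
    return best_fit

rng = np.random.default_rng(10)
t = np.linspace(0, 1, 28)
y = np.sin(np.pi * t) + rng.normal(0, 0.2, len(t))   # noisy unimodal trajectory
print(np.round(unimodal_projection(y), 2))
```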
Speaker: Ming Liao
Title: Bayesian Nonlinear Factor Regression for Gene Expression Profiles
Microarray technology allows monitoring of gene expression for thousands of genes in parallel. With gene expression profiles, we address the tissue classification problem, where the number of predictor variables is huge and the sample size is substantially smaller. This talk includes two parts. In the first part, we discuss the factor regression model for large p, small n problems. Applying a new class of structured priors for the factor loading matrix, we introduce the sparse factor model and the empirical block factor model, which make factor models feasible for gene expression data in thousands of dimensions. Then we discuss the regression model on low-dimensional latent factors and related computational methods. In the second part, we discuss the nonlinear model obtained by applying a kernel expansion. Motivated by ideas from Gaussian processes and radial basis functions, we introduce a novel Dirichlet Process sparse kernel model, which is especially useful for problems with large sample sizes. Using our kernel model, we extend the linear factor regression to a nonlinear factor regression and apply it to the tissue classification problem.
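The kernel-expansion step itself is generic and easy to sketch: replace linear features with radial basis functions centered at the training points and fit a ridge-penalized linear model on the expanded features. This is the standard RBF device the talk builds on, not the Dirichlet process sparse kernel model itself; the bandwidth and penalty are illustrative.

```python
import numpy as np

# Sketch: nonlinear regression via radial basis function (kernel) expansion.

rng = np.random.default_rng(11)
n = 80
x = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=n)       # nonlinear target

def rbf_features(x, centers, bw=1.0):
    d2 = (x - centers.T) ** 2                        # (n, n_centers) squared distances
    return np.exp(-0.5 * d2 / bw**2)

K = rbf_features(x, x)                               # centers at the training points
alpha = np.linalg.solve(K.T @ K + 1e-2 * np.eye(n), K.T @ y)   # ridge fit

x_new = np.array([[0.5]])
pred = rbf_features(x_new, x) @ alpha
print("f(0.5) ~", float(pred), "truth:", np.sin(0.5))
```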