Speaker: Mark Vangel
Title: Spot Test Kits for Lead in Paint: A Laboratory Evaluation
The use of lead in paint has been banned in the US since the 1970s. However, much old leaded paint is still around, usually covered by many coats of nonleaded paint. Since the ingestion of lead is a health hazard, particularly to children, various methods have been developed for testing for its presence. Of these methods, spot-test kits are by far the most likely to be used by consumers. Several commercial kits were compared under laboratory conditions at the National Institute of Standards and Technology (NIST).
This presentation will consist of a discussion of statistical design and analysis related to this project. Data were collected from a large factorial experiment, including such factors as lead level, lead type, operator, and substrate type. Logistic regression was the main statistical tool used. The performance of the kits was evaluated by determining a 95% upper bound on the concentration corresponding to a 95% probability of a positive response. These quantities, analogous to one-sided tolerance limits, were calculated using Bayesian hierarchical models.
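The core calculation can be sketched in a simplified, non-hierarchical form: fit a logistic regression of the positive/negative kit response on log lead concentration, then solve for the concentration at which the fitted probability of a positive reaches 95%. All data and parameter values below are simulated for illustration only; the actual analysis used Bayesian hierarchical models to obtain the 95% upper bound on this quantity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulated spot-test data: probability of a positive rises with log lead level.
logc = rng.uniform(-1, 3, 400)        # log10 lead concentration (hypothetical units)
p_true = expit(-2.0 + 2.5 * logc)     # assumed "true" dose-response curve
y = rng.binomial(1, p_true)           # 1 = positive kit response, 0 = negative

# Negative log-likelihood of a logistic regression on log concentration.
def nll(beta):
    eta = beta[0] + beta[1] * logc
    return np.sum(np.logaddexp(0, eta)) - np.sum(y * eta)

fit = minimize(nll, x0=np.array([0.0, 1.0]), method="BFGS")
b0, b1 = fit.x

# Concentration with a 95% probability of a positive response:
# solve expit(b0 + b1 * logc) = 0.95 for logc.
logc95 = (np.log(0.95 / 0.05) - b0) / b1
print(f"estimated log-concentration for 95% detection: {logc95:.2f}")
```

The hierarchical analysis would additionally propagate uncertainty across operators and substrates into an upper confidence bound on this concentration.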
Speaker: Mark Vangel
Title: Volume Recovery of Polymer Glasses: Re-Examining a Historic Dataset
When a polymer is heated to one temperature, and then allowed to equilibrate at another (higher or lower) temperature, the volume changes smoothly between the equilibrium volumes corresponding to the initial and final temperatures. In a very influential paper in 1964, A.J. Kovacs reported on the results of extensive meticulous experiments on these volume changes for poly(vinyl acetate). He claimed that the rate of change of volume depended on the initial temperature even when the sample was quite close to the new equilibrium. This was very surprising, even paradoxical, to the polymer science community. Some have questioned whether the data (which are still the best available) supported Kovacs' claims. Alternative theories have been developed to describe the statistical mechanics of volume relaxation of polymer glasses, and the validity of these theories depends on whether or not one accepts Kovacs' conclusions.
In this talk, I will present the results of a re-analysis of Kovacs' historic data, taking into account correlation among measurements made at different times on the same sample, and using some techniques from functional data analysis. Unfortunately, the analysis was not Bayesian, but it should still be of interest as an example of statistical consulting on a fundamental problem of pure science. And I'm of course open to suggestions for improved analyses!
Speaker: Jim Berger
Title: Basics and Philosophy of Model Selection
The first two or three sessions in this series will provide background on model selection, while later sessions will delve into current research issues. The first lecture will provide a quick overview of model selection, introducing key concepts and issues in very simple settings. Subjects include:
Speaker: Merlise Clyde
Title: Basics and Philosophy of Model Selection
Many of the concepts discussed in the first lecture will be illustrated in the context of standard linear models. Subjects include:
Speaker: Mark Vangel
Title: Bayesian Approaches to Interlaboratory Proficiency Studies
The evaluation and certification of commercial analytical laboratories is an important function of laboratory accreditation organizations, standards organizations, and national standards laboratories. The main component of such evaluations is usually an interlaboratory proficiency study, in which each laboratory is sent samples to evaluate, and the results are compared.
Sometimes the comparison is made with a known "answer", but often the materials are not well characterized, so the emphasis is instead on identifying which labs perform substantially differently from "most" of the other labs. This presentation will review statistical methods currently in use for these proficiency studies and investigate some simple Bayesian alternatives, including the ranking of laboratory effects, following up on ideas which Spiegelhalter has applied to the ranking of hospitals and schools. Results will be illustrated with real-data examples.
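The Spiegelhalter-style ranking idea can be illustrated with a minimal Monte Carlo sketch: given (entirely hypothetical) posterior summaries for each laboratory's bias effect, draw from the posteriors and summarize the induced distribution of each lab's rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior summaries (mean, sd) for five labs' bias effects,
# e.g. from a hierarchical model fit to proficiency-study data.
post_mean = np.array([0.1, -0.3, 0.0, 0.8, -0.1])
post_sd = np.array([0.2, 0.2, 0.3, 0.2, 0.2])

# Sample each lab's effect and rank labs within each posterior draw.
draws = rng.normal(post_mean, post_sd, size=(10_000, 5))
ranks = draws.argsort(axis=1).argsort(axis=1) + 1   # rank 1 = smallest effect

for lab in range(5):
    print(f"lab {lab}: mean rank {ranks[:, lab].mean():.2f}, "
          f"P(largest effect) = {(ranks[:, lab] == 5).mean():.2f}")
```

The posterior rank distributions convey which labs are credibly outlying, rather than a single point ordering.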
Speaker: Mark Vangel
Title: Indirect Measurement of Temperature and Pressure from Fluorescence Spectra
It would be very useful to be able to measure the temperature and pressure within a polymer while the plastic is being fabricated. This task is difficult for several reasons. The polymer will be in a heated mold undergoing mechanical stresses. Because of these stresses, and because it takes time for plastic to heat and cool, the applied temperature and pressure in the mold will not be the same as the state within the material. Also, penetrating the material with probes is undesirable because it damages the plastic.
One approach which is currently being investigated involves adding a fluorescent dye to the polymer, and observing laser-induced fluorescence spectra during fabrication. For many polymer/dye combinations, these spectra seem to change systematically with temperature and pressure.
Experimental spectra are available, for a grid of temperature and pressure values. From these spectra, one would like to fit calibration functions for temperature and pressure. An approach involving functional ANOVA shows some promise for predicting temperature, and will be discussed in some detail.
Speaker: Brani Vidakovic
Title: Statistical Models in the Wavelet Domain: Some Results, Applications, and Perspectives
In the first part of the talk I will give an overview of my recent research results in Bayesianly induced wavelet shrinkage. An efficient smoothing method, BAMS (Bayesian Adaptive Model Shrinker), developed in collaboration with Fabrizio Ruggeri, is briefly discussed, applied to standard test functions, and compared to the traditional shrinkage methods.
The second part of the talk will cover two ongoing research projects involving wavelet-based functional ANOVA and distributions of local Hurst exponents in self-similar processes. Interesting applications involving turbulence, genome analysis, and analysis of Internet traffic are indicated.
Speaker: Dave Higdon
Title: Markov Random Field Models: from Agriculture to Zoology.
This talk will explore using simple Gaussian Markov random field (MRF) models as spatial priors. These priors have a rather simple form and are quite amenable to MCMC, even for quite large problems.
This talk will give some basic background for these Gaussian MRF models and look at motivating examples in agriculture, very small scale magnetic imaging, and zoology. I'll also talk about extensions which link several of these processes. These extensions are the topic of ongoing research and I would be quite happy to find a student who might be interested in working on this stuff with me.
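As a minimal illustration of the kind of Gaussian MRF prior involved, the sketch below smooths a noisy one-dimensional field with a first-order (random-walk) intrinsic GMRF; the precision and noise parameters are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy observations of a smooth one-dimensional field.
n = 100
grid = np.linspace(0, 1, n)
truth = np.sin(2 * np.pi * grid)
y = truth + rng.normal(0, 0.3, n)

# First-order intrinsic GMRF prior: precision penalizes neighbor differences.
D = np.diff(np.eye(n), axis=0)        # (n-1) x n first-difference matrix
tau, sigma2 = 50.0, 0.3 ** 2          # prior precision, noise variance (illustrative)
Q = tau * D.T @ D

# Posterior mean for theta given y ~ N(theta, sigma2 I) and the GMRF prior:
# (I / sigma2 + Q) theta = y / sigma2
theta = np.linalg.solve(np.eye(n) / sigma2 + Q, y / sigma2)
print("residual sd before/after smoothing:",
      np.std(y - truth).round(3), np.std(theta - truth).round(3))
```

For large lattices one would exploit the sparsity of Q rather than solve a dense system, which is what makes these priors MCMC-friendly.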
Come check it out.
Speaker: Mark Vangel
Title: Univariate Linear Calibration: Frequentist Controversy and Bayesian Resolution
Speaker: Mark Vangel
Title: Lot Acceptance Using a Sample Mean and an Extremum
In several industries, a sample from a batch is regarded as acceptable if both the sample mean exceeds a criterion and the sample minimum exceeds a second criterion. Examples include canned foods, electric motor efficiencies, and aerospace composite materials.
The exact joint distribution of the mean and an extremum is quite complicated for most probability models; hence applied statisticians in industry do not have much theory to guide them when establishing criteria for the mean and extremum of a sample from acceptable material.
The approach taken here is to note that, conditional on an extremum, the remaining observations in a sample can be regarded as iid from a truncated distribution, to which a saddlepoint approximation can be applied. The expression for the joint distribution which results is extremely accurate, at least for a normal model. Calculation of contours and critical values is straightforward, and some tables have been prepared.
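The dependence between the sample mean and minimum, which makes the exact joint distribution awkward, is easy to see by simulation. The sketch below estimates the joint acceptance probability under a normal model with illustrative criteria; the saddlepoint approach replaces this brute-force calculation with an accurate closed-form approximation.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10                       # sample size per batch
mu, sigma = 100.0, 5.0       # assumed in-control batch distribution (illustrative)
c_mean, c_min = 98.0, 90.0   # acceptance criteria for sample mean and minimum

# Estimate the acceptance probabilities by direct simulation.
samples = rng.normal(mu, sigma, size=(200_000, n))
mean_ok = samples.mean(axis=1) > c_mean
min_ok = samples.min(axis=1) > c_min

p_mean, p_min, p_joint = mean_ok.mean(), min_ok.mean(), (mean_ok & min_ok).mean()
print(f"P(mean ok) = {p_mean:.3f}, P(min ok) = {p_min:.3f}, P(both) = {p_joint:.3f}")
# Note that P(both) != P(mean ok) * P(min ok): the two statistics are dependent.
```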
Speaker: Merlise Clyde
Title: Empirical Bayes Prior Distributions and Model Choice
Speaker: Alan Seheult
Title:
Bayesian Forecasting and Calibration for Complex Phenomena Using
Multi-level Computer Codes
Note: This talk will be in Physics 120, at 4PM
We describe a general Bayesian approach for using computer codes for a complex physical system to assist in forecasting actual system outcomes. Our approach is based on expert judgements and experiments on fast versions of the computer code. These are combined to construct models for the relationships between the code's inputs and outputs, respecting the natural space/time features of the physical system. The resulting beliefs are systematically updated as we make evaluations of the code for varying input sets and calibrate the input space against past data on the system. The updated beliefs are then used to construct forecasts for future system outcomes. While the approach is quite general, it has been developed particularly to handle problems with high-dimensional input and output spaces, for which each run of the computer code is expensive. The methodology will be applied to problems in uncertainty analysis for hydrocarbon reservoirs.
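A toy version of the emulation step, building a cheap statistical surrogate for a code from a handful of runs, can be sketched with a simple Gaussian-process interpolator; the "code" and the covariance parameters here are stand-ins, not the multi-level methodology of the talk.

```python
import numpy as np

# Toy Gaussian-process emulator: interpolate a cheap stand-in "code" from a
# small design of runs. The code and covariance parameters are illustrative.
def code(u):
    return np.sin(3 * u) + 0.5 * u

X = np.linspace(0, 2, 8)                 # design points: 8 code evaluations
y = code(X)

def kern(a, b, ell=0.4):
    """Squared-exponential covariance with unit prior variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = kern(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
Xs = np.linspace(0, 2, 100)
Ks = kern(Xs, X)

mean = Ks @ np.linalg.solve(K, y)                              # posterior mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)    # posterior variance
sd = np.sqrt(np.clip(var, 0.0, None))

print("max emulator error on grid:", np.abs(mean - code(Xs)).max().round(3))
```

The posterior standard deviation indicates where further expensive code runs would be most informative, which is the core of the sequential design idea.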
Speaker: Mark Vangel
Title: Bayesian Multiple-Use Linear Calibration
Consider a linear regression model in which some (or all) of the independent variables are functions of a single unknown parameter of interest (e.g., a polynomial model). The data consist of a training dataset, followed by observations of the dependent variable alone. One would like to express uncertainty in the parameter of interest. There are many applications in engineering and medicine; we will use the estimation of gestational age by ultrasound measurement of the length of the femur of a fetus as an example.
One type of calibration problem for which a Bayesian approach appears particularly promising is the multiple-use situation, where one intends to use a fitted calibration curve repeatedly. Frequentist approaches involve the specification of two probabilities in the confidence statement: roughly, one confidence level for getting a "good" calibration curve, and a second confidence level for the long-run behavior of a particular fitted curve. These intervals are hard to understand, and the methodology is sufficiently difficult to substantially restrict the class of models which can be considered. Preliminary work on multiple-use calibration will be presented in the context of the femur/gestational age example.
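A single-use version of the Bayesian calibration idea can be sketched as follows: fit the calibration line to training data, then propagate uncertainty in the fitted coefficients and in a new response into an interval for the unknown quantity. The data are simulated and the normal-approximation posterior is a simplification; the multiple-use machinery of the talk is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical training data for a linear calibration curve, loosely in the
# spirit of the femur-length example: y (mm) linear in x (weeks).
x = np.linspace(10, 40, 30)
y = 2.0 + 1.8 * x + rng.normal(0, 1.5, 30)

# Least-squares fit, residual variance, and coefficient covariance.
A = np.column_stack([np.ones_like(x), x])
beta, rss, *_ = np.linalg.lstsq(A, y, rcond=None)
s2 = rss[0] / (len(x) - 2)
cov = s2 * np.linalg.inv(A.T @ A)

# A new response alone is observed; propagate coefficient and measurement
# uncertainty into the unknown x0 by drawing and inverting the line.
y_new = 50.0
coef = rng.multivariate_normal(beta, cov, size=20_000)
y0 = y_new + rng.normal(0, np.sqrt(s2), size=20_000)
x0 = (y0 - coef[:, 0]) / coef[:, 1]
lo, hi = np.percentile(x0, [2.5, 97.5])
print(f"x0 ~ {np.median(x0):.1f}, 95% interval ({lo:.1f}, {hi:.1f})")
```

The multiple-use problem asks how such intervals behave when the same fitted curve is inverted repeatedly, which is where the Bayesian formulation simplifies matters.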
Speaker: Max Morris, Iowa State University
Title: Design and Analysis for an Inverse Problem Arising From an Advection-Dispersion Process
We consider a process of one-dimensional fluid flow through a soil-packed tube in which a contaminant is initially distributed. The contaminant concentration, as a function of location in the tube and time after flushing begins, is classically modeled as the solution of a linear second-order partial differential equation. Here, we consider the related issues of how contaminant concentration measured at some location-time combinations can be used to approximate concentration at other locations and times, and of which location-time combinations should be measured (i.e., experimental design). The method is demonstrated for the case in which initial concentrations are approximated based on data collected only at the downstream end of the tube. Finally, the effect of misspecifying one of the model parameters is discussed, and alternative designs are developed for instances in which that parameter must be estimated from the data.
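A stripped-down version of this inverse problem can be sketched numerically: build a finite-difference forward map from the initial concentration profile to the downstream breakthrough curve, then recover the initial condition from noisy downstream data by regularized least squares. All physical parameters and the regularization level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Explicit finite-difference forward model for 1-D advection-dispersion,
#   dc/dt = D d2c/dx2 - v dc/dx,   all parameter values illustrative.
nx, nt = 30, 400
grid = np.linspace(0, 1, nx)
dx, dt = grid[1] - grid[0], 2.0 / nt
D, v = 0.01, 0.5

def forward(c0):
    """Propagate an initial concentration profile; return the concentration
    observed at the downstream end of the tube at each time step."""
    c = c0.astype(float).copy()
    out = np.empty(nt)
    for t in range(nt):
        cm = np.concatenate([[c[0]], c[:-1]])   # zero-gradient boundaries
        cp = np.concatenate([c[1:], [c[-1]]])
        c = c + dt * (D * (cp - 2 * c + cm) / dx ** 2 - v * (c - cm) / dx)
        out[t] = c[-1]
    return out

# Linear forward map: column i is the breakthrough curve of a unit pulse at cell i.
G = np.column_stack([forward(np.eye(nx)[:, i]) for i in range(nx)])

c_true = np.exp(-((grid - 0.3) ** 2) / 0.01)    # true initial contaminant pulse
data = G @ c_true + rng.normal(0, 0.01, nt)     # noisy downstream measurements

# Tikhonov-regularized least-squares recovery of the initial condition.
lam = 1e-3
c_hat = np.linalg.solve(G.T @ G + lam * np.eye(nx), G.T @ data)
err = np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true)
print(f"relative recovery error: {err:.2f}")
```

Comparing such recoveries across candidate measurement locations is one way to frame the experimental-design question.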
Speaker: Harry Zuzan
Title: Measuring Gene Expression
Today I will give a brief outline of what gene expression is and how it is measured using the Affymetrix GeneChip. Technical issues which need to be dealt with in order to model raw expression levels of genes will be described. I will briefly introduce the biological relevance, which Rainer will discuss in more depth on Wednesday.
Speaker: Rainer Spang
Title: Taking a Snapshot of Cell Metabolism on a DNA Chip
I will discuss two biological applications of DNA chip technology, and how they relate to statistical concepts. Both applications are based on data that were recently produced by the genetics department here at Duke.
1. Cells can be in a growing or in a non-growing state. The transition between these states involves changes in the cells' metabolism and hence in their gene expression patterns. These changes can be monitored on a DNA chip.
2. Breast cancers can be subdivided into two classes depending on whether the tumor cells contain estrogen receptors or not. This classification of tumors is crucial for treatment. We have data showing that gene expression profiles are different for both cancer types, and that the classification can be done by using expression profiles.
This class will focus on describing the biological problems and the different types of statistical challenges associated with them. Next Wednesday, Mike West will discuss some first ideas for solutions to these problems.
Speaker: Mike West
Title: Data Analysis and Modelling of DNA Microarrays in Genetic Expression Profiling -- some Bayesian Bioinformatics
Last week, Harry and Rainer discussed a wide range of issues -- technological, biological and statistical -- arising in dealing with oligonucleotide DNA microarrays for high-throughput functional genomics. They highlighted some of the questions of data extraction, data quality, and basic exploratory analysis of summary genetic expression profiles in a couple of biological problems.
Today I will pick up the conversation, focusing on the use of array-based data in breast cancer phenotyping: linking measured expression of large numbers of genes to clinical outcomes in breast cancer. The context today will be discrimination of ER+ cancers from ER-, as introduced by Rainer last week. This is a small test data set in a "proof of principle" study. Technically, we face problems due to the dimensionality of the expression data and small numbers of microarrays -- this is a problem in the "large p, small n" paradigm. These are addressed based on the innovative coupling of singular-value decomposition methods with Bayesian binary regression analysis. I'll walk through these ideas and methods, describing model implementations and some analysis summaries in the breast cancer phenotyping project (very preliminary results, of course).
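The dimension-reduction strategy can be illustrated on simulated data: compress an n x p expression matrix (p much larger than n) to a few SVD "supergene" factors, then fit a binary regression in the factor space. The sketch below uses a plain maximum-likelihood logistic fit rather than the Bayesian analysis of the talk, and the data and labels are entirely synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Synthetic "large p, small n" expression data: 40 arrays, 1000 genes, with a
# single latent factor driving both the expression matrix and the class labels.
n, p = 40, 1000
z = rng.normal(size=n)                       # latent biological factor
loadings = rng.normal(size=p)
X = np.outer(z, loadings) + rng.normal(size=(n, p))
y = (z > 0).astype(float)                    # e.g. 1 = ER+, 0 = ER- (labels synthetic)

# Compress to k "supergene" factors via the SVD, then fit a logistic
# regression in the k-dimensional factor space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
F = U[:, :k] * s[:k]

def nll(beta):
    eta = beta[0] + F @ beta[1:]
    return np.sum(np.logaddexp(0, eta)) - np.sum(y * eta)

fit = minimize(nll, np.zeros(k + 1), method="BFGS")
eta = fit.x[0] + F @ fit.x[1:]
acc = ((eta > 0) == (y == 1)).mean()
print(f"in-sample accuracy with {k} SVD factors: {acc:.2f}")
```

With only tens of arrays, in-sample accuracy is optimistic; the Bayesian treatment of the regression coefficients is what provides honest uncertainty in this regime.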
This work is collaborative with bioinformaticians Rainer Spang and Harry Zuzan of Duke Statistics and the National Institute of Statistical Sciences, together with Drs Joseph Nevins, Jeff Marks and Seiichi Ishida of the Duke School of Medicine.