Speaker: Mark Vangel
Title: Spot Test Kits for Lead in Paint: A Laboratory Evaluation
The use of lead in paint has been banned in the US since the 1970s. However, much old leaded paint is still around, usually covered by many coats of nonleaded paint. Since the ingestion of lead is a health hazard, particularly to children, various methods have been developed for testing for its presence. Of these methods, spot-test kits are by far the most likely to be used by consumers. Several commercial kits were compared under laboratory conditions at the National Institute of Standards and Technology (NIST).
This presentation will consist of a discussion of statistical design and analysis related to this project. Data were collected from a large factorial experiment, including such factors as lead level, lead type, operator, and substrate type. Logistic regression was the main statistical tool used. The performance of the kits was evaluated by determining a 95% upper bound on the concentration corresponding to a 95% probability of a positive response. These quantities, analogous to one-sided tolerance limits, were calculated using Bayesian hierarchical models.
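The core calculation can be sketched in a simplified, non-hierarchical form: fit a logistic regression of the positive/negative kit response on log lead concentration, then solve for the concentration at which the fitted probability of a positive reaches 95%. All data and parameter values below are simulated for illustration only; the actual analysis used Bayesian hierarchical models to obtain the 95% upper bound on this quantity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulated spot-test data: probability of a positive rises with log lead level.
logc = rng.uniform(-1, 3, 400)        # log10 lead concentration (hypothetical units)
p_true = expit(-2.0 + 2.5 * logc)     # assumed "true" dose-response curve
y = rng.binomial(1, p_true)           # 1 = positive kit response, 0 = negative

# Negative log-likelihood of a logistic regression on log concentration.
def nll(beta):
    eta = beta[0] + beta[1] * logc
    return np.sum(np.logaddexp(0, eta)) - np.sum(y * eta)

fit = minimize(nll, x0=np.array([0.0, 1.0]), method="BFGS")
b0, b1 = fit.x

# Concentration with a 95% probability of a positive response:
# solve expit(b0 + b1 * logc) = 0.95 for logc.
logc95 = (np.log(0.95 / 0.05) - b0) / b1
print(f"estimated log-concentration for 95% detection: {logc95:.2f}")
```

The hierarchical analysis would additionally propagate uncertainty across operators and substrates into an upper confidence bound on this concentration.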
Speaker: Mark Vangel
Title: Volume Recovery of Polymer Glasses: Re-Examining a Historic Dataset
When a polymer is heated to one temperature, and then allowed to equilibrate at another (higher or lower) temperature, the volume changes smoothly between the equilibrium volumes corresponding to the initial and final temperatures. In a very influential paper in 1964, A.J. Kovacs reported on the results of extensive meticulous experiments on these volume changes for poly(vinyl acetate). He claimed that the rate of change of volume depended on the initial temperature even when the sample was quite close to the new equilibrium. This was very surprising, even paradoxical, to the polymer science community. Some have questioned whether the data (which are still the best available) supported Kovacs' claims. Alternative theories have been developed to describe the statistical mechanics of volume relaxation of polymer glasses, and the validity of these theories depends on whether or not one accepts Kovacs' conclusions.
In this talk, I will present the results of a re-analysis of Kovacs' historic data, taking into account correlation among measurements made at different times on the same sample, and using some techniques from functional data analysis. Unfortunately, the analysis was not Bayesian, but it should still be of interest as an example of statistical consulting on a fundamental problem of pure science. And I'm of course open to suggestions for improved analyses!
Speaker: Jim Berger
Title: Basics and Philosophy of Model Selection
The first two or three sessions in this series will provide background on model selection, while later sessions will delve into current research issues. The first lecture will provide a quick overview of model selection, introducing key concepts and issues in very simple settings. Subjects include:
Speaker: Merlise Clyde
Title: Basics and Philosophy of Model Selection
Many of the concepts discussed in the first lecture will be illustrated in the context of standard linear models. Subjects include:
Speaker: Mark Vangel
Title: Bayesian Approaches to Interlaboratory Proficiency Studies
The evaluation and certification of commercial analytical laboratories is an important function of laboratory accreditation organizations, standards organizations, and national standards laboratories. The main component of such evaluations is usually an interlaboratory proficiency study, in which each laboratory is sent samples to evaluate, and the results are compared.
Sometimes the comparison is made with a known "answer", but often the materials are not well characterized, so the emphasis is instead on identifying which labs perform substantially differently from "most" of the other labs. This presentation will review statistical methods currently in use for these proficiency studies and investigate some simple Bayesian alternatives, including the ranking of laboratory effects, following up on ideas which Spiegelhalter has applied to the ranking of hospitals and schools. Results will be illustrated with real-data examples.
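The Spiegelhalter-style ranking idea can be illustrated with a minimal Monte Carlo sketch: given (entirely hypothetical) posterior summaries for each laboratory's bias effect, draw from the posteriors and summarize the induced distribution of each lab's rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior summaries (mean, sd) for five labs' bias effects,
# e.g. from a hierarchical model fit to proficiency-study data.
post_mean = np.array([0.1, -0.3, 0.0, 0.8, -0.1])
post_sd = np.array([0.2, 0.2, 0.3, 0.2, 0.2])

# Sample each lab's effect and rank labs within each posterior draw.
draws = rng.normal(post_mean, post_sd, size=(10_000, 5))
ranks = draws.argsort(axis=1).argsort(axis=1) + 1   # rank 1 = smallest effect

for lab in range(5):
    print(f"lab {lab}: mean rank {ranks[:, lab].mean():.2f}, "
          f"P(largest effect) = {(ranks[:, lab] == 5).mean():.2f}")
```

The posterior rank distributions convey which labs are credibly outlying, rather than a single point ordering.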
Speaker: Mark Vangel
Title: Indirect Measurement of Temperature and Pressure from Fluorescence Spectra
It would be very useful to be able to measure the temperature and pressure within a polymer while the plastic is being fabricated. This task is difficult for several reasons. The polymer will be in a heated mold undergoing mechanical stresses. Because of these stresses, and because it takes time for plastic to heat and cool, the applied temperature and pressure in the mold will not be the same as the state within the material. Also, penetrating the material with probes is undesirable because it damages the plastic.
One approach which is currently being investigated involves adding a fluorescent dye to the polymer, and observing laser-induced fluorescence spectra during fabrication. For many polymer/dye combinations, these spectra seem to change systematically with temperature and pressure.
Experimental spectra are available, for a grid of temperature and pressure values. From these spectra, one would like to fit calibration functions for temperature and pressure. An approach involving functional ANOVA shows some promise for predicting temperature, and will be discussed in some detail.
Speaker: Brani Vidakovic
Title: Statistical Models in the Wavelet Domain: Some Results, Applications, and Perspectives
In the first part of the talk I will give an overview of my recent research results in Bayesianly induced wavelet shrinkage. An efficient smoothing method, BAMS (Bayesian Adaptive Model Shrinker), developed in collaboration with Fabrizio Ruggeri, is briefly discussed, applied to standard test functions, and compared to the traditional shrinkage methods.
The second part of the talk will cover two ongoing research projects involving wavelet-based functional ANOVA and distributions of local Hurst exponents in self-similar processes. Interesting applications involving turbulence, genome analysis, and analysis of Internet traffic are indicated.
Speaker: Dave Higdon
Title: Markov Random Field Models: from Agriculture to Zoology.
This talk will explore using simple Gaussian Markov random field (MRF) models as spatial priors. These priors have a rather simple form and are quite amenable to MCMC, even for quite large problems.
This talk will give some basic background for these Gaussian MRF models and look at motivating examples in agriculture, very small scale magnetic imaging, and zoology. I'll also talk about extensions which link several of these processes. These extensions are the topic of ongoing research and I would be quite happy to find a student who might be interested in working on this stuff with me.
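As a minimal illustration of the kind of Gaussian MRF prior involved, the sketch below smooths a noisy one-dimensional field with a first-order (random-walk) intrinsic GMRF; the precision and noise parameters are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy observations of a smooth one-dimensional field.
n = 100
grid = np.linspace(0, 1, n)
truth = np.sin(2 * np.pi * grid)
y = truth + rng.normal(0, 0.3, n)

# First-order intrinsic GMRF prior: precision penalizes neighbor differences.
D = np.diff(np.eye(n), axis=0)        # (n-1) x n first-difference matrix
tau, sigma2 = 50.0, 0.3 ** 2          # prior precision, noise variance (illustrative)
Q = tau * D.T @ D

# Posterior mean for theta given y ~ N(theta, sigma2 I) and the GMRF prior:
# (I / sigma2 + Q) theta = y / sigma2
theta = np.linalg.solve(np.eye(n) / sigma2 + Q, y / sigma2)
print("residual sd before/after smoothing:",
      np.std(y - truth).round(3), np.std(theta - truth).round(3))
```

For large lattices one would exploit the sparsity of Q rather than solve a dense system, which is what makes these priors MCMC-friendly.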
Come check it out.
Speaker: Mark Vangel
Title: Univariate Linear Calibration: Frequentist Controversy and Bayesian Resolution
Speaker: Mark Vangel
Title: Lot Acceptance Using a Sample Mean and an Extremum
In several industries, a sample from a batch is regarded as acceptable if both the sample mean exceeds a criterion and the sample minimum exceeds a second criterion. Examples include canned foods, electric motor efficiencies, and aerospace composite materials.
The exact joint distribution of the mean and an extremum is quite complicated for most probability models; hence applied statisticians in industry do not have much theory to guide them when establishing criteria for the mean and extremum of a sample from acceptable material.
The approach taken here is to note that, conditional on an extremum, the remaining observations in a sample can be regarded as iid from a truncated distribution, to which a saddlepoint approximation can be applied. The expression for the joint distribution which results is extremely accurate, at least for a normal model. Calculation of contours and critical values is straightforward, and some tables have been prepared.
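The dependence between the sample mean and minimum, which makes the exact joint distribution awkward, is easy to see by simulation. The sketch below estimates the joint acceptance probability under a normal model with illustrative criteria; the saddlepoint approach replaces this brute-force calculation with an accurate closed-form approximation.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10                       # sample size per batch
mu, sigma = 100.0, 5.0       # assumed in-control batch distribution (illustrative)
c_mean, c_min = 98.0, 90.0   # acceptance criteria for sample mean and minimum

# Estimate the acceptance probabilities by direct simulation.
samples = rng.normal(mu, sigma, size=(200_000, n))
mean_ok = samples.mean(axis=1) > c_mean
min_ok = samples.min(axis=1) > c_min

p_mean, p_min, p_joint = mean_ok.mean(), min_ok.mean(), (mean_ok & min_ok).mean()
print(f"P(mean ok) = {p_mean:.3f}, P(min ok) = {p_min:.3f}, P(both) = {p_joint:.3f}")
# Note that P(both) != P(mean ok) * P(min ok): the two statistics are dependent.
```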
Speaker: Merlise Clyde
Title: Empirical Bayes Prior Distributions and Model Choice
Speaker: Alan Seheult
Title:
Bayesian Forecasting and Calibration for Complex Phenomena Using
Multi-level Computer Codes
Note: This talk will be in Physics 120, at 4PM
We describe a general Bayesian approach for using computer codes for a complex physical system to assist in forecasting actual system outcomes. Our approach is based on expert judgements and experiments on fast versions of the computer code. These are combined to construct models for the relationships between the code's inputs and outputs, respecting the natural space/time features of the physical system. The resulting beliefs are systematically updated as we make evaluations of the code for varying input sets and calibrate the input space against past data on the system. The updated beliefs are then used to construct forecasts for future system outcomes. While the approach is quite general, it has been developed particularly to handle problems with high-dimensional input and output spaces, for which each run of the computer code is expensive. The methodology will be applied to problems in uncertainty analysis for hydrocarbon reservoirs.
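A toy version of the emulation step, building a cheap statistical surrogate for a code from a handful of runs, can be sketched with a simple Gaussian-process interpolator; the "code" and the covariance parameters here are stand-ins, not the multi-level methodology of the talk.

```python
import numpy as np

# Toy Gaussian-process emulator: interpolate a cheap stand-in "code" from a
# small design of runs. The code and covariance parameters are illustrative.
def code(u):
    return np.sin(3 * u) + 0.5 * u

X = np.linspace(0, 2, 8)                 # design points: 8 code evaluations
y = code(X)

def kern(a, b, ell=0.4):
    """Squared-exponential covariance with unit prior variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = kern(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
Xs = np.linspace(0, 2, 100)
Ks = kern(Xs, X)

mean = Ks @ np.linalg.solve(K, y)                              # posterior mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)    # posterior variance
sd = np.sqrt(np.clip(var, 0.0, None))

print("max emulator error on grid:", np.abs(mean - code(Xs)).max().round(3))
```

The posterior standard deviation indicates where further expensive code runs would be most informative, which is the core of the sequential design idea.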
Speaker: Mark Vangel
Title: Bayesian Multiple-Use Linear Calibration
Consider a linear regression model in which some (or all) of the independent variables are functions of a single unknown parameter of interest (e.g., a polynomial model). The data consist of a training dataset, followed by observations of the dependent variable alone. One would like to express uncertainty in the parameter of interest. There are many applications in engineering and medicine; we will use the estimation of gestational age by ultrasound measurement of the length of the femur of a fetus as an example.
One type of calibration problem for which a Bayesian approach appears particularly promising is the multiple-use situation, where one intends to use a fitted calibration curve repeatedly. Frequentist approaches involve the specification of two probabilities in the confidence statement: roughly, one confidence level for getting a "good" calibration curve, and a second confidence level for the long-run behavior of a particular fitted curve. These intervals are hard to understand, and the methodology is sufficiently difficult to substantially restrict the class of models which can be considered. Preliminary work on multiple-use calibration will be presented in the context of the femur/gestational age example.
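A single-use version of the Bayesian calibration idea can be sketched as follows: fit the calibration line to training data, then propagate uncertainty in the fitted coefficients and in a new response into an interval for the unknown quantity. The data are simulated and the normal-approximation posterior is a simplification; the multiple-use machinery of the talk is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical training data for a linear calibration curve, loosely in the
# spirit of the femur-length example: y (mm) linear in x (weeks).
x = np.linspace(10, 40, 30)
y = 2.0 + 1.8 * x + rng.normal(0, 1.5, 30)

# Least-squares fit, residual variance, and coefficient covariance.
A = np.column_stack([np.ones_like(x), x])
beta, rss, *_ = np.linalg.lstsq(A, y, rcond=None)
s2 = rss[0] / (len(x) - 2)
cov = s2 * np.linalg.inv(A.T @ A)

# A new response alone is observed; propagate coefficient and measurement
# uncertainty into the unknown x0 by drawing and inverting the line.
y_new = 50.0
coef = rng.multivariate_normal(beta, cov, size=20_000)
y0 = y_new + rng.normal(0, np.sqrt(s2), size=20_000)
x0 = (y0 - coef[:, 0]) / coef[:, 1]
lo, hi = np.percentile(x0, [2.5, 97.5])
print(f"x0 ~ {np.median(x0):.1f}, 95% interval ({lo:.1f}, {hi:.1f})")
```

The multiple-use problem asks how such intervals behave when the same fitted curve is inverted repeatedly, which is where the Bayesian formulation simplifies matters.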
Speaker: Max Morris, Iowa State University
Title: Design and Analysis for an Inverse Problem Arising From an Advection-Dispersion Process
We consider a process of one-dimensional fluid flow through a soil-packed tube in which a contaminant is initially distributed. The contaminant concentration, as a function of location in the tube and time after flushing begins, is classically modeled as the solution of a linear second-order partial differential equation. Here, we consider the related issues of how contaminant concentration measured at some location-time combinations can be used to approximate concentration at other locations and times, and of which location-time combinations should be measured (i.e., experimental design). The method is demonstrated for the case in which initial concentrations are approximated based on data collected only at the downstream end of the tube. Finally, the effect of misspecifying one of the model parameters is discussed, and alternative designs are developed for instances in which that parameter must be estimated from the data.
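A stripped-down version of this inverse problem can be sketched numerically: build a finite-difference forward map from the initial concentration profile to the downstream breakthrough curve, then recover the initial condition from noisy downstream data by regularized least squares. All physical parameters and the regularization level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Explicit finite-difference forward model for 1-D advection-dispersion,
#   dc/dt = D d2c/dx2 - v dc/dx,   all parameter values illustrative.
nx, nt = 30, 400
grid = np.linspace(0, 1, nx)
dx, dt = grid[1] - grid[0], 2.0 / nt
D, v = 0.01, 0.5

def forward(c0):
    """Propagate an initial concentration profile; return the concentration
    observed at the downstream end of the tube at each time step."""
    c = c0.astype(float).copy()
    out = np.empty(nt)
    for t in range(nt):
        cm = np.concatenate([[c[0]], c[:-1]])   # zero-gradient boundaries
        cp = np.concatenate([c[1:], [c[-1]]])
        c = c + dt * (D * (cp - 2 * c + cm) / dx ** 2 - v * (c - cm) / dx)
        out[t] = c[-1]
    return out

# Linear forward map: column i is the breakthrough curve of a unit pulse at cell i.
G = np.column_stack([forward(np.eye(nx)[:, i]) for i in range(nx)])

c_true = np.exp(-((grid - 0.3) ** 2) / 0.01)    # true initial contaminant pulse
data = G @ c_true + rng.normal(0, 0.01, nt)     # noisy downstream measurements

# Tikhonov-regularized least-squares recovery of the initial condition.
lam = 1e-3
c_hat = np.linalg.solve(G.T @ G + lam * np.eye(nx), G.T @ data)
err = np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true)
print(f"relative recovery error: {err:.2f}")
```

Comparing such recoveries across candidate measurement locations is one way to frame the experimental-design question.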
Speaker: Harry Zuzan
Title: Measuring Gene Expression
Today I will give a brief outline of what gene expression is and how it is measured using the Affymetrix GeneChip. Technical issues which need to be dealt with in order to model raw expression levels of genes will be described. I will briefly introduce the biological relevance, which Rainer will discuss in more depth on Wednesday.
Speaker: Rainer Spang
Title: Taking a Snapshot of Cell Metabolism on a DNA Chip
I will discuss two biological applications of DNA chip technology, and how they relate to statistical concepts. Both applications are based on data that were recently produced by the genetics department here at Duke.
1. Cells can be in a growing or in a non-growing state. The transition between these states involves changes in the cells' metabolism and hence in their gene expression patterns. These changes can be monitored on a DNA chip.
2. Breast cancers can be subdivided into two classes depending on whether the tumor cells contain estrogen receptors or not. This classification of tumors is crucial for treatment. We have data showing that gene expression profiles are different for both cancer types, and that the classification can be done by using expression profiles.
This class will focus on describing the biological problems and the different types of statistical challenges associated with them. Next Wednesday, Mike West will discuss some first ideas for solutions to these problems.
Speaker: Mike West
Title: Data Analysis and Modelling of DNA Microarrays in Genetic Expression Profiling -- some Bayesian Bioinformatics
Last week, Harry and Rainer discussed a wide range of issues -- technological, biological and statistical -- arising in dealing with oligonucleotide DNA microarrays for high-throughput functional genomics. They highlighted some of the questions of data extraction, data quality, and basic exploratory analysis of summary genetic expression profiles in a couple of biological problems.
Today I will pick up the conversation, focusing on the use of array-based data in breast cancer phenotyping: linking measured expression of large numbers of genes to clinical outcomes in breast cancer. The context today will be discrimination of ER+ cancers from ER-, as introduced by Rainer last week. This is a small test data set in a "proof of principle" study. Technically, we face problems due to the dimensionality of the expression data and small numbers of microarrays -- this is a problem in the "large p, small n" paradigm. These are addressed based on the innovative coupling of singular-value decomposition methods with Bayesian binary regression analysis. I'll walk through these ideas and methods, describing model implementations and some analysis summaries in the breast cancer phenotyping project (very preliminary results, of course).
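The dimension-reduction strategy can be illustrated on simulated data: compress an n x p expression matrix (p much larger than n) to a few SVD "supergene" factors, then fit a binary regression in the factor space. The sketch below uses a plain maximum-likelihood logistic fit rather than the Bayesian analysis of the talk, and the data and labels are entirely synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Synthetic "large p, small n" expression data: 40 arrays, 1000 genes, with a
# single latent factor driving both the expression matrix and the class labels.
n, p = 40, 1000
z = rng.normal(size=n)                       # latent biological factor
loadings = rng.normal(size=p)
X = np.outer(z, loadings) + rng.normal(size=(n, p))
y = (z > 0).astype(float)                    # e.g. 1 = ER+, 0 = ER- (labels synthetic)

# Compress to k "supergene" factors via the SVD, then fit a logistic
# regression in the k-dimensional factor space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
F = U[:, :k] * s[:k]

def nll(beta):
    eta = beta[0] + F @ beta[1:]
    return np.sum(np.logaddexp(0, eta)) - np.sum(y * eta)

fit = minimize(nll, np.zeros(k + 1), method="BFGS")
eta = fit.x[0] + F @ fit.x[1:]
acc = ((eta > 0) == (y == 1)).mean()
print(f"in-sample accuracy with {k} SVD factors: {acc:.2f}")
```

With only tens of arrays, in-sample accuracy is optimistic; the Bayesian treatment of the regression coefficients is what provides honest uncertainty in this regime.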
This work is collaborative with bioinformaticians Rainer Spang and Harry Zuzan of Duke Statistics and the National Institute of Statistical Sciences, together with Drs Joseph Nevins, Jeff Marks and Seiichi Ishida of the Duke School of Medicine.