STA 376: Advanced Modelling & Scientific Computing
- Spring 2006 -

STA 356 Schedule Support



  • January-Early February:

    Download and read this paper before Jan 19th: Some modelling and computational challenges in a finite population, biased sampling problem. We'll be discussing this in class, and you'll be developing computational implementations and data analysis in this model class. It introduces a range of numerical issues and methods. You will see the EM algorithm, standard mode hunting, some need for evaluation of direct integrals as part of a much more complicated computational problem, manipulations of multivariate normals and Wishart distributions, and more.

  • Weeks of January 16 and 23:

    Discovery sampling problem:

      Supporting material on numerical methods:

  • Week of January 30:

    • Discussion of progress on discovery sampling model implementations and analysis, including quadrature, delta-method and analytic approximations for univariate integrations. Further MCMC developments. Manipulation of prior-posterior analysis using discrete mixtures of conjugate priors. Discussion of application results.
    • Discussion of multivariate normal models - normal/Wishart distributions, Wishart prior-posteriors in multinormal analysis, and conditional (singular) normals -- all as motivated by discovery sampling applications. Detailed development of prior-posterior updating under reference and conjugate prior for normal parameters. See especially Chapter 15 of the STA 214 notes on Wisharts.
    • Finish overview discussion of discovery sampling paper - with modelling questions and extensions related to "frailty" ideas and uncertainty about sampling weights.

  • Weeks of February 6, 13 & 20:

    • General modelling and computational matters: Mixture models. Multivariate normal mixtures: density estimation, nonlinear regression, classification and discrimination. Theory and structure of mixture models. EM and MCMC computational approaches. Infinite mixtures and Dirichlet process mixture modelling for density estimation and hierarchical regression.

      Supporting papers:

      • STA 214 notes on multivariate normal and also Dirichlet/multinomial
      • A nice short discussion of EM for multivariate normal mixtures in section 6 of Figueiredo's EM notes (this also contains interesting material on EM in other contexts - notably here Bayesian regression with non-Gaussian shrinkage priors and errors)
      • A short, concise and very early (1992 J. Canadian Stats.) paper on Bayesian MCMC for classification and discrimination in mixtures
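
    As a rough companion to the EM reading above, here is a minimal sketch of EM for a k-component multivariate normal mixture. It is not taken from Figueiredo's notes or the course materials: the function names (`log_mvn_pdf`, `em_normal_mixture`), the random-data-point initialisation and the small `ridge` term added to each covariance are all illustrative assumptions.

```python
import numpy as np


def log_mvn_pdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape n x d)."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def em_normal_mixture(x, k, n_iter=100, seed=0, ridge=1e-6):
    """EM for a k-component multivariate normal mixture (illustrative sketch).

    Assumptions (not from the course notes): means start at k random data
    points, covariances start at the overall sample covariance, and a small
    ridge is added to each covariance to keep it positive definite.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    weights = np.full(k, 1.0 / k)
    means = x[rng.choice(n, size=k, replace=False)]
    start_cov = np.atleast_2d(np.cov(x.T)) + ridge * np.eye(d)
    covs = np.array([start_cov.copy() for _ in range(k)])
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed on the log scale.
        log_r = np.column_stack([
            np.log(weights[j]) + log_mvn_pdf(x, means[j], covs[j])
            for j in range(k)
        ])
        log_norm = np.logaddexp.reduce(log_r, axis=1)
        resp = np.exp(log_r - log_norm[:, None])
        log_liks.append(log_norm.sum())  # log-likelihood at current parameters
        # M-step: weighted updates of mixing weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        for j in range(k):
            c = x - means[j]
            covs[j] = (resp[:, j, None] * c).T @ c / nk[j] + ridge * np.eye(d)
    return weights, means, covs, np.array(log_liks)
```

    Calling `em_normal_mixture(x, k=2)` on an n x d data array returns the fitted weights, means and covariances together with the log-likelihood trace, which is a handy check that the iterations are climbing. EM finds a local mode only, so several random starts are advisable in practice.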

  • Week of February 27:

    • (Already covered earlier) - Discussion of Dirichlet processes: models for uncertain CDFs, discrete structure, role of precision parameter and distribution of the number of support points k in a sample of size n
    • Mixtures of Dirichlet processes as models for mixture distributions.
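
    The role of the precision parameter in the distribution of k can be explored numerically. The sketch below uses the standard result that, under a Dirichlet process, the i-th draw takes a new value with probability alpha/(alpha + i - 1), independently across draws, so k is a sum of independent Bernoulli variables. The symbol `alpha` for the precision parameter and both function names are assumptions made for this illustration.

```python
import numpy as np


def dp_num_clusters_pmf(alpha, n):
    """Return pmf with pmf[k] = P(k distinct values | alpha, n) for k = 0..n.

    Built by convolving the n independent Bernoulli "new value" indicators,
    where draw i is new with probability alpha / (alpha + i - 1).
    """
    pmf = np.zeros(n + 1)
    pmf[0] = 1.0  # before any draws there are zero distinct values
    for i in range(1, n + 1):
        p_new = alpha / (alpha + i - 1.0)
        shifted = np.zeros(n + 1)
        shifted[1:] = pmf[:-1]  # mass moved from k - 1 to k when draw i is new
        pmf = (1.0 - p_new) * pmf + p_new * shifted
    return pmf


def dp_expected_num_clusters(alpha, n):
    """E[k | alpha, n] = sum over i = 1..n of alpha / (alpha + i - 1)."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1.0)))
```

    Comparing, say, `dp_num_clusters_pmf(0.5, 100)` with `dp_num_clusters_pmf(5.0, 100)` shows how larger precision values shift mass towards more distinct values, with the mean growing only roughly logarithmically in n.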

  • Week of March 6:

    • Further discussion of Dirichlet process mixtures as models for mixture distributions. Further review of MCMC, configuration sampling, multivariate normal DP mixtures, computational issues and practical examples/experiences; and of general hierarchical modelling applications.
    • Discussion led by Zhi Ouyang: advanced modelling and computation in mass spectrometry for proteomics.
    • Discussion led by Abel Rodriguez: advanced modelling and computation in time-dependent Dirichlet process models for time-spatial modelling.

    Likely of additional interest to some of you, here are three additional papers on mixtures:

    • An early application of MCMC in DP mixtures in a neurological application, by Turner & West 1992
    • A rather different, more scientifically stylised hierarchical mixture modelling framework for the same neurological application area
    • A nice paper on learning about mixtures and general MCMC for mixtures (not DP - but close: Poisson priors on the number of components) by Matthew Stephens, Annals 2000

  • Week of March 13:

    • No classes: Spring Break

  • Week of March 20 (no class on March 27):

    • General modelling and computational matters: regression, hierarchical modelling and shrinkage. Bayesian modelling concepts and numerical and computational methods (EM, but especially MCMC). Linear regression with shrinkage, prior specification, and development of binary regression extension. Discussion of standard normal and mixture-of-normal shrinkage priors, and Bayesian "point mass-mixture" model.

      Some papers and slides:

      • Slides from MW's 2005 SemStat summer school tutorials (at Warwick University, UK) cover basic material on this (and very much more that we can discuss later). We'll look at some of these on regression and shrinkage in SemStat Part 1 (a few slides starting at #13).
      • Old but still excellent (nowadays tutorial) paper on mixture priors in regression variable selection by George & McCulloch, JASA 1993

    • Section 5 of Figueiredo's EM notes covers some aspects of EM in models with Laplacian priors that are finding some popularity as alternatives for variable selection -- we'll discuss why. These are scale mixtures of normals priors too. (This is just one very special case of a broad class of scale mixture models.)
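
    For reference, the scale-mixture representation of the Laplacian prior can be written out as follows. The notation is an assumption for this note rather than that of Figueiredo's notes: beta is a regression coefficient, tau its latent variance, and lambda the Laplace rate parameter.

```latex
% Normal with an exponentially distributed variance (rate \lambda^2/2):
\beta \mid \tau \sim N(0, \tau), \qquad
p(\tau) = \frac{\lambda^2}{2} \exp\!\left(-\frac{\lambda^2 \tau}{2}\right), \quad \tau > 0.

% Integrating out \tau gives the Laplace (double-exponential) density:
p(\beta) = \int_0^\infty \frac{1}{\sqrt{2\pi\tau}}
           \exp\!\left(-\frac{\beta^2}{2\tau}\right)
           \frac{\lambda^2}{2} \exp\!\left(-\frac{\lambda^2 \tau}{2}\right) d\tau
         = \frac{\lambda}{2} \exp\!\left(-\lambda \lvert \beta \rvert\right).
```

    Treating tau as missing data is what makes EM (and Gibbs sampling) convenient here: conditional on tau, the model reduces to a normal-prior regression.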

  • Week of April 3:

    • More on point-mass mixture priors in regression variable uncertainty and model search. Gibbs sampling and local search Metropolis. Discussion of large p problems - regression with many candidate predictors.
    • Principal components regression and introduction to factor models - latent factor models for multivariate data and latent factor regression.

  • Week of April 10:

    • More on latent factor models - above Valencia 7 paper and slides - and large p problems.
    • Latent factor modelling and computational issues, and applications -- Bayesian inference and latent structure in high-dimensional data, sparsity modelling, connections with graphical models. We'll look at some of these from SemStat Part 2 slides.

  • Other topics post-semester:

    • Shotgun stochastic search in regression variable selection/uncertainty/model search with many candidate predictors: comparisons to MCMC, and use of parallel computing on clusters.

      Key papers and slides will be made available.