STA 721: Linear Models

Fall 2019

Lectures

Under construction!

Overview of linear models highlighting what’s to come. Click on links for additional information and supporting material.

Maximum likelihood estimation in linear models via projections.

In this lecture we examine the geometric properties of OLS and the role of projections. In particular we will find expectations of OLS …

In this lecture we will review/present distribution theory related to the multivariate normal distribution, in particular, linear …

In this lecture we will review/present distribution theory related to the sampling distribution of the MLEs, in particular, Student-t …

In this lecture we will cover optimal estimation and prediction, and what quantities can be estimated or predicted.

In this lecture we will cover the Gauss-Markov theorem that establishes that out of the class of all linear unbiased estimators that …

In this lecture we will introduce Bayesian estimation for linear models using the Normal-Gamma conjugate prior.

In this lecture we will go into more details about the Normal-Gamma conjugate prior and limiting cases in linear models.

In this lecture we will go into more details about the Normal-Gamma conjugate prior and limiting cases in linear models, including …

In this lecture we will go into more details about the Normal-Gamma conjugate prior looking at a special case of the g-prior.

In this lecture we will show how Cauchy priors can be derived as a mixture of normal distributions and introduce MCMC sampling for …

In this lecture we will illustrate MCMC sampling with the Cauchy prior as mixtures of g-priors and look at properties of estimators. To …

In this lecture we will look at properties of estimators. To address problems with estimation with nearly singular matrices, we will …

In this lecture we look at ridge regression from a Bayesian perspective and discuss choice of priors and inference via MCMC.

In this lecture we look at model comparison using ANOVA and sequential F tests.

In this lecture we look at conditions on priors for shrinkage estimators to have desirable properties.

In this lecture we look at shrinkage and selection estimators based on LASSO regression from a penalized likelihood approach and a …

In this lecture we look at model selection from a Bayesian perspective.

In this lecture we look at Bayesian model averaging and choice of prior distributions with a focus on g-priors or mixtures of g-priors.

In this lecture we look at desirable features that priors for Bayesian model averaging or variable selection should have, which leads us to mixtures …

In this lecture we show how MCMC can be used for BMA/BVS and the challenges involved. Using the output we discuss various estimators for …

In this lecture we look at residual diagnostics and methods to identify influential points and outliers.

In this lecture we look at robust regression methods to automatically account for potential outliers.

Homework

Project

What do Barbie dolls, food wrap, edamame, and spermicides have in common? And what do they have to do with low sperm counts, precocious puberty, and breast cancer? “Everything” say those who support the notion that hormone mimics are disrupting everything from fish gender to human fertility. “Nothing” counter others who regard the connection as trumped up, alarmist chemophobia. The controversy swirls around the significance of a number of substances that behave like estrogens and appear to be practically everywhere, from plastic toys to topical sunscreens. Read more

Calendar

Tentative outline; please refresh for the latest version. Each Lecture/HW has additional details, including reading assignments, code and data.

Week Date Topic HW
1 08-26-2019 Introduction
08-28-2019 MLE
08-30-2019 Lab 1: Intro to Weaving Latex and R
2 09-02-2019 Projections & Expectations HW1
09-04-2019 Normal Theory
09-06-2019 Lab 2: Introduction to GitHub and RStudio (see invitation sent from Sakai)
3 09-09-2019 Sampling Distributions HW2
09-11-2019 Prediction
09-13-2019 Lab 3:
4 09-16-2019 Gauss-Markov and Prediction HW3
09-18-2019 Bayes Estimation in Linear Models
09-20-2019 Lab 4: Writing functions, coding style, and Q&A
5 09-23-2019 Conjugate Priors in Linear Models HW4
09-25-2019 Non-informative Priors
09-27-2019 Lab 5: Q&A
6 09-30-2019 G-Priors and Prior Choices
10-02-2019 Review
10-04-2019 Midterm
7 10-07-2019 Fall Break
10-09-2019 Cauchy Priors: Mixtures & MCMC
10-11-2019 Lab 6: JAGS HW5
8 10-14-2019 Bayes Estimation
10-16-2019 Ridge Regression
10-18-2019 Bayesian Ridge Regression
9 10-21-2019 Lasso and Bayesian Lasso Regression HW6
10-23-2019 Shrinkage Priors and Selection
10 10-28-2019 Testing and Model Comparison HW7 (Nott & Kohn code)
10-30-2019 Testing and Model Comparison continued
11 11-04-2019 Model Choice
11-06-2019 BMA HW8
12 11-11-2019 Criteria for Priors for use in BMA/BVS
11-13-2019 MCMC in BMA/BVS and inference
13 11-18-2019 Factors and Hierarchical Models
11-20-2019 Residuals and Checking
Transformations & Normality
14 11-25-2019 Robustness TakeHome Data Analysis
11-27-2019 Thanksgiving Break
15 12-02-2019 Graduate Reading Period
12-04-2019 Graduate Reading Period
16 12-09-2019 Graduate Reading Period
12-12-2019 Final Exam, 2-5, Link Classroom 5

Resources

Computing & Other Resources

R resources:


CRAN Comprehensive R Archive Network


R Books


JAGS

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. The name is a misnomer as JAGS implements more than just Gibbs Samplers. JAGS was written with three aims in mind:

  • To have a cross-platform engine for the BUGS language.
  • To be extensible, allowing users to write their own functions, distributions, and samplers.
  • To be a platform for experimentation with ideas in Bayesian modelling.
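To give a flavor of the BUGS language that JAGS implements, here is a minimal sketch of a simple linear regression model, assuming hypothetical data names `y`, `x`, and `n` supplied from R (note that `dnorm` in JAGS is parameterized by mean and precision, not variance):

```
model {
  # Likelihood: normal errors with precision tau
  for (i in 1:n) {
    mu[i] <- beta0 + beta1 * x[i]
    y[i] ~ dnorm(mu[i], tau)
  }
  # Vague priors on the regression coefficients
  beta0 ~ dnorm(0, 1.0E-6)
  beta1 ~ dnorm(0, 1.0E-6)
  # Gamma prior on the precision; sigma is derived for reporting
  tau ~ dgamma(0.001, 0.001)
  sigma <- 1 / sqrt(tau)
}
```

A model file like this would typically be compiled and sampled from R via the rjags package (`jags.model()` followed by `coda.samples()`).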

Resources for JAGS:


Linear/Matrix Algebra

Emacs

Using Emacs as an editor for R, C/C++, and LaTeX provides a great environment for editing, compiling, and debugging; you can even use it as a shell!

  • emacs reference card
  • Emacs Speaks Statistics is an add-on package for GNU Emacs and XEmacs. It is designed to support editing of scripts and interaction with various statistical analysis programs such as R, S-Plus, SAS, Stata and OpenBUGS/JAGS. Although all users of these statistical analysis programs are welcome to use ESS, advanced users or professionals who regularly work with text-based statistical analysis scripts, with various statistical languages/programs, or with different operating systems might benefit from it the most.

Syllabus

Course expectations, outline, grading policy, and more

Course goals & objectives

This course introduces students to linear models and their extensions for model building, including exploratory data analysis techniques, variable transformations and selection, parameter estimation and interpretation, prediction, hierarchical models, model selection and Bayesian model averaging. The concepts of linear models will be covered from Bayesian and classical viewpoints. Topics in Markov chain Monte Carlo simulation will be introduced as required; however, it is expected that students have either taken STA 601 or are co-registered in it.

All students should be extremely comfortable with linear/matrix algebra and mathematical statistics at the level of STA 611 or equivalent; Statistical Inference - Casella and Berger is an excellent resource in case you need to review any mathematical statistics. If you need to review linear algebra, please explore material under Resources and links - Gilbert Strang’s online course is highly recommended.

The course goals are as follows:

  1. Understand the different philosophical approaches to statistical analyses (Bayesian and frequentist)
  2. Build a solid foundation for the probability theory and inference for Gaussian linear models and extensions.
  3. Build appropriate statistical models for data, perform data analysis using appropriate software, and communicate results without use of statistical jargon.
  4. Become familiar with reproducible research using GitHub, RStudio, knitr and $\LaTeX$ to produce technical, literate data analyses.
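As a small sketch of the literate-programming workflow above, a knitr `.Rnw` file interleaves $\LaTeX$ text with R code chunks; the file and chunk names here are hypothetical, and the R code uses the built-in `cars` dataset:

```latex
\documentclass{article}
\begin{document}
We regress stopping distance on speed using the built-in \texttt{cars} data:
% An R code chunk: everything between <<...>>= and @ is run by knitr,
% and both the code and its output are woven into the PDF.
<<ols-fit, echo=TRUE>>=
fit <- lm(dist ~ speed, data = cars)
coef(fit)
@
\end{document}
```

Compiling with `knitr::knit()` followed by `pdflatex` (or the Compile PDF button in RStudio) produces a PDF in which the code and its output appear together, which is what "literate data analysis" refers to here.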

Course Outline

Course topics will be drawn (but subject to change) from

  • Motivation for Studying Linear Models as Foundation
  • Random Vectors and Matrices
  • Multivariate Normal Distribution Theory
  • Conditional Normal Distribution Theory
  • Linear Models via Coordinate free representations (examples)
  • Maximum Likelihood Estimation & Projections
  • Interval Estimation: Distribution of Quadratic Forms
  • Gauss-Markov Theorem & Optimality of OLS
  • Formulation of Bayesian Inference
  • Subjective and Default Priors
  • Related Shrinkage Methods and Penalized Likelihoods (Ridge regression, lasso, horseshoe etc)
  • Model Selection (comparison of classical and Bayesian approaches)
  • Bayes Factors
  • Bayesian Model Averaging
  • Model Checking: Residual Analysis, Added-Variable Plots, Cook’s Distance, Transformations
  • Bayesian Outliers
  • Bayesian Robust Methods for Outliers
  • Generalized Linear Models and Weighted Regression
  • Hierarchical Models

Please check the website for updates, slides and current readings.


Grading:

Homework 20%
Midterm 25%
TakeHome 25%
Final 25%
Participation 5%

Grades may be curved at the end of the semester. Cumulative numerical averages of 90 - 100 are guaranteed at least an A-, 80 - 89 at least a B-, and 70 - 79 at least a C-; however, the exact ranges for letter grades will be determined after the final exam. The more evidence there is that the class has mastered the material, the more generous the curve will be.


Homework:

These will be assigned weekly on the course webpage.

The objective of the problem sets is to help you develop a more in-depth understanding of the material and help you prepare for exams and projects. Grading will be based on completeness as well as accuracy. In order to receive credit you must show all your work.

No late assignments will be accepted; however, the lowest score will be dropped.

You are welcome, and encouraged, to work with each other on the problems, but you must turn in your own work. If you copy someone else’s work, both parties will receive a 0 for the problem set and will be reported to the Office of Student Conduct. Work submitted on Sakai will be checked for plagiarism prior to being graded.

Submission instructions: You will submit your HW on Sakai by uploading a PDF. If the TAs cannot view your work, or read your handwriting, you will lose points accordingly. We will be using R/knitr with $\LaTeX$ for preparing assignments, and GitHub Classroom for data analysis.


Attendance & Participation:

You are expected to be present at class meetings and to actively participate in the discussion. Your attendance and participation during class, as well as your activity on the discussion forum on Sakai, will make up 5% of your grade in this class. While I might sometimes call on you during the class discussion, it is your responsibility to be an active participant without being called on.


Takehome Data Analysis Problem

The objective of the TakeHome is to give you independent applied research experience using real data and statistical methods. You will use all (relevant) techniques learned in this class to analyze a dataset provided by me.

Further details on the TakeHome will be provided as due dates approach.

Note that you must score at least 30% of the points on the TakeHome Exam in order to pass this class.


Exams:

There will be one midterm and one final in this class. See course info for dates and times of the exams. You are allowed to use one sheet of notes (“cheat sheet”) on the midterm and two for the final. This sheet must be no larger than 8 1/2 x 11, and must be prepared by you. You may use both sides of the sheet and can write as small as you wish.

Policies Regarding Homework:

  • No late Homework

  • The lowest HW score will be dropped automatically at the end of the semester

  • Late work policy for TakeHome Data Analysis: 10% off for each day late.

  • The final exam must be taken at the stated time. Please book flights accordingly!

  • There will be no makeup exams; if you miss the midterm for any reason, your predicted grade given the other information from the class will be used to fill in the missing grade.

  • Regrade requests must be made within 3 days of when the assignment is returned, and must be submitted in writing. These will be honored if points were tallied incorrectly, or if you feel your answer is correct but it was marked wrong. No regrade will be made to alter the number of points deducted for a mistake. There will be no grade changes after the final exam.

  • Use of disallowed materials (textbook, class notes, web references, any form of communication with classmates or other persons, etc.) during in-class exams will not be tolerated. For the TakeHome data analysis, students are limited to materials covered in class or course resources; no external queries or use of outside resources. Violations will result in a 0 on the exam for all students involved, possible failure of the course, and a report to the Office of Student Conduct. If you have any questions about whether something is or is not allowed, please ask me beforehand.


Email & Forum (Piazza):

I will regularly send announcements by email through Sakai; please make sure to check your email daily.

Any non-personal questions related to the material covered in class, problem sets, labs, projects, etc. should be posted on the Piazza forum. Before posting a new question, please make sure to check if your question has already been answered. The TAs and I will be answering questions on the forum daily, and all students are expected to answer questions as well. Please use informative titles for your posts.

Note that it is more efficient to answer most statistical questions in person, so make use of Office Hours.


Students with disabilities:

Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.


Academic integrity:

Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a 0 grade for all parties involved as well as a report to the Office of Student Conduct. Additionally, there may be penalties to your final class grade. Please review Duke’s Academic Dishonesty policies.


Posts

Most Announcements will be made through Sakai


Contact