Current Lecture

In this lecture we look at transformations of variables for normality.

Lectures

Assignments

  • HW10

    Mon, Nov 20, 2017

  • HW9

    Thu, Nov 2, 2017

  • HW8

    Thu, Oct 12, 2017

  • HW7

    Mon, Oct 2, 2017

  • HW6

    Fri, Sep 22, 2017

  • HW5

    Thu, Sep 14, 2017

  • HW4

    Tue, Sep 12, 2017

  • HW3

    Thu, Sep 7, 2017

  • HW2

    Tue, Sep 5, 2017

  • HW1

    Thu, Aug 31, 2017

  • HW0

    Tue, Aug 29, 2017

  • template

    Tue, Aug 29, 2017

Announcements

The final exam is scheduled for December 14, 9-12, in the usual classroom (Link Classroom 4).

Data Analysis Project

Due Date: Dec 12 5 pm

Data Analysis Project for STA721.

Syllabus

Course expectations, outline, grading policy, and more

Course goals & objectives:

This course introduces students to linear models and their extensions for model building, including exploratory data analysis techniques, variable transformations and selection, parameter estimation and interpretation, prediction, hierarchical models, model selection, and Bayesian model averaging. The concepts of linear models will be covered from Bayesian and classical viewpoints. Topics in Markov chain Monte Carlo simulation will be introduced as required; however, students are expected to have either taken STA 601 or be co-registered.

All students should be comfortable with linear algebra and mathematical statistics at the level of STA 611.

The course goals are as follows:

  1. Understand the different philosophical approaches to statistical analyses (Bayesian and frequentist).
  2. Build a solid foundation for the probability theory of Gaussian linear models and hierarchical models.
  3. Build appropriate statistical models for data, perform data analysis using appropriate software, and communicate results without use of statistical jargon.

Course Outline

Course topics will be drawn (but subject to change) from

  • Motivation for Studying Linear Models as Foundation
  • Random Vectors and Matrices
  • Multivariate Normal Distribution Theory
  • Conditional Normal Distribution Theory
  • Linear Models via Coordinate free representations (examples)
  • Maximum Likelihood Estimation & Projections
  • Interval Estimation: Distribution of Quadratic Forms
  • Gauss-Markov Theorem & Optimality of OLS
  • Formulation of Bayesian Inference
  • Subjective and Default Priors
  • Related Shrinkage Methods and Penalized Likelihoods (Ridge regression, lasso, horseshoe etc)
  • Model Selection (comparison of classical and Bayesian approaches)
  • Bayes Factors
  • Bayesian Model Averaging
  • Model Checking: Residual Analysis, Added-Variable Plots, Cook's Distance, Transformations
  • Bayesian Outliers
  • Bayesian Robust Methods for Outliers
  • Generalized Linear Model and Weighted Regression
  • Hierarchical Models

Grading:

Homework 20%
Midterm 25%
TakeHome 25%
Final 25%
Participation 5%

Grades may be curved at the end of the semester. Cumulative numerical averages of 90 - 100 are guaranteed at least an A-, 80 - 89 at least a B-, and 70 - 79 at least a C-; however, the exact ranges for letter grades will be determined after the final exam. The more evidence there is that the class has mastered the material, the more generous the curve will be.
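
As a rough illustration of the arithmetic, the sketch below computes a cumulative average from the weights listed above and looks up the guaranteed minimum letter grade. The function names, the 0-100 scale for each component, the example scores, and the handling of averages that fall between the stated ranges (such as 89.5) are assumptions made for illustration; the actual letter-grade ranges are set after the final exam.

```python
# Weights taken from the grading breakdown in the syllabus.
WEIGHTS = {
    "homework": 0.20,
    "midterm": 0.25,
    "takehome": 0.25,
    "final": 0.25,
    "participation": 0.05,
}


def cumulative_average(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def guaranteed_minimum(average):
    """Lowest letter grade guaranteed for a cumulative average.

    Assumption: the stated ranges are treated as lower cutoffs at 90, 80, and 70.
    Returns None when the syllabus states no guarantee (below 70).
    """
    if average >= 90:
        return "A-"
    if average >= 80:
        return "B-"
    if average >= 70:
        return "C-"
    return None


# Hypothetical scores, for illustration only.
example = {
    "homework": 95,
    "midterm": 82,
    "takehome": 88,
    "final": 90,
    "participation": 100,
}
avg = cumulative_average(example)
print(round(avg, 2), guaranteed_minimum(avg))
```

Because the curve can only be more generous than these guarantees, the result is a lower bound on the letter grade, not a prediction of it.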


Homework:

These will be assigned at each class or weekly on the course webpage.

The objective of the problem sets is to help you develop a more in-depth understanding of the material and help you prepare for exams and projects. Grading will be based on completeness as well as accuracy. In order to receive credit you must show all your work.

Lowest score will be dropped.

You are welcome, and encouraged, to work with each other on the problems, but you must turn in your own work. If you copy someone else’s work, both parties will receive a 0 for the problem set grade and will be reported to the Office of Student Conduct. Work submitted on Sakai will be checked for instances of plagiarism prior to being graded.

Submission instructions: You will submit your HW on Sakai by uploading a PDF. If the TAs cannot view your work, or read your handwriting, you will lose points accordingly.


Attendance & Participation:

You are expected to be present at class meetings and to actively participate in the discussion. Your attendance and participation during class, as well as your activity on the discussion forum on Sakai, will make up 5% of your grade in this class. While I might sometimes call on you during the class discussion, it is your responsibility to be an active participant without being called on.


Takehome Data Analysis Problem

The objective of the TakeHome is to give you independent applied research experience using real data and statistical methods. You will use all (relevant) techniques learned in this class to analyze a dataset provided by me.

Further details on the TakeHome will be provided as due dates approach.

Note that you must score at least 30% of the points on the TakeHome Exam in order to pass this class.


Exams:

There will be one midterm and one final in this class. See course info for dates and times of the exams. You are allowed to use one sheet of notes (“cheat sheet”) on the midterm and the final. This sheet must be no larger than 8 1/2 x 11, and must be prepared by you. You may use both sides of the sheet and can write as small as you wish.

Policies Regarding Homework:

  • No late Homework

  • The lowest HW score will be dropped automatically at the end of the semester

  • Late work policy for TakeHome Data Analysis: 10% off for each day late.

  • The final exam must be taken at the stated time. Please book flights accordingly!

  • There will be no makeup exams; if you miss the midterm for any reason, your grade on the final will be used to fill in the missing grade.

  • Regrade requests must be made within 3 days of when the assignment is returned, and must be submitted in writing. These will be honored if points were tallied incorrectly, or if you feel your answer is correct but it was marked wrong. No regrade will be made to alter the number of points deducted for a mistake. There will be no grade changes after the final exam.

  • Use of disallowed materials (textbook, class notes, web references, any form of communication with classmates or other persons, etc.) during in-class exams will not be tolerated. For the TakeHome data analysis, students are limited to materials covered in class or course resources; external queries and use of outside resources are not allowed. Violations will result in a 0 on the exam for all students involved and possible failure of the course, and will be reported to the Office of Student Conduct. If you have any questions about whether something is or is not allowed, please ask me beforehand.
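
The numerical parts of the policies above (dropping the lowest homework score, the TakeHome late penalty, and filling in a missed midterm with the final) can be sketched as follows. This is only an illustration: the function names are hypothetical, and reading "10% off for each day late" as 10% of the raw score per day, floored at zero, is an assumption rather than a stated rule.

```python
def homework_average(hw_scores):
    """Mean homework score after dropping the single lowest score."""
    if len(hw_scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(hw_scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def takehome_after_late_penalty(raw_score, days_late):
    """Apply the late penalty to a TakeHome score.

    Assumption: each day late removes 10% of the raw score,
    and the result cannot go below zero.
    """
    penalty = 0.10 * days_late
    return max(0.0, raw_score * (1.0 - penalty))


def effective_midterm(midterm, final):
    """Use the final exam score when the midterm was missed (midterm is None)."""
    return final if midterm is None else midterm


# Hypothetical scores, for illustration only.
print(homework_average([100, 80, 90, 0]))
print(takehome_after_late_penalty(90, 2))
print(effective_midterm(None, 85))
```

If in doubt about how a penalty is applied in a particular case, ask the instructor instead of relying on this sketch.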


Email & Forum (Piazza):

I will regularly send announcements by email through Sakai; please make sure to check your email daily.

Any non-personal questions related to the material covered in class, problem sets, labs, projects, etc. should be posted on the Piazza forum. Before posting a new question, please check whether your question has already been answered. The TAs and I will be answering questions on the forum daily, and all students are expected to answer questions as well. Please use informative titles for your posts.

Note that it is more efficient to answer most statistical questions “in person”, so make use of Office Hours.


Students with disabilities:

Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.


Academic integrity:

Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a 0 grade for all parties involved and will be reported to the Office of Student Conduct. Additionally, there may be penalties to your final class grade. Please review Duke’s Academic Dishonesty policies.


Calendar

Please refresh for the latest version

Week  Date        Topic                                   HW
1     08-29-2017  Introduction                            HW0
      08-31-2017  MLE                                     HW1
2     09-05-2017  Normal Theory                           HW2
      09-07-2017  Sampling Distributions                  HW3
3     09-12-2017  Prediction                              HW4
      09-14-2017  Gauss-Markov and Prediction             HW5
4     09-19-2017  Bayes Estimation in Linear Models
      09-21-2017  Conjugate Priors in Linear Models       HW6
5     09-26-2017  G-Priors and Mixtures
      09-28-2017  Estimation                              HW7
6     10-03-2017  Ridge Regression
      10-05-2017  Q&A in class
7     10-10-2017  Fall Break
      10-12-2017  Bayesian Ridge Regression               HW8
8     10-17-2017  Lasso and Bayesian Lasso Regression
      10-19-2017  Midterm
9     10-24-2017  Shrinkage Priors and Selection
      10-26-2017  Testing and Model Comparison
10    10-31-2017  Testing and Model Comparison continued
      11-02-2017  Model Choice                            HW9
11    11-07-2017  BMA
      11-09-2017  Criteria for Priors for use in BMA/BVS
12    11-14-2017  MCMC in BMA/BVS and inference
      11-16-2017  Residuals and Checking
13    11-21-2017  Robustness                              HW10
      11-22-2017  Thanksgiving Break
14    11-28-2017  Transformations & Normality
      11-30-2017  Random Effects and Priors
15    12-05-2017  Graduate Reading Period
      12-07-2017  Graduate Reading Period
16    12-12-2017  Graduate Reading Period
      12-14-2017  Final Exam 9-12 Link Classroom 4

Resources

Computing & Other Resources

R resources:


CRAN Comprehensive R Archive Network


R Books


JAGS

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. The name is a misnomer as JAGS implements more than just Gibbs Samplers. JAGS was written with three aims in mind:

  • To have a cross-platform engine for the BUGS language
  • To be extensible, allowing users to write their own functions, distributions and samplers.
  • To be a platform for experimentation with ideas in Bayesian modelling

Resources for JAGS:


Linear/Matrix Algebra

Emacs

Using Emacs as an editor for R, C/C++, and LaTeX provides a great environment for editing, compiling and debugging - you can even use it as a shell!

  • emacs reference card
  • Emacs Speaks Statistics is an add-on package for GNU Emacs and XEmacs. It is designed to support editing of scripts and interaction with various statistical analysis programs such as R, S-Plus, SAS, Stata and OpenBUGS/JAGS. Although all users of these statistical analysis programs are welcome to apply ESS, advanced users or professionals who regularly work with text-based statistical analysis scripts, with various statistical languages/programs, or with different operating systems might benefit from it the most.

Contact

Prof Merlise Clyde