STA532: Theory of Statistical Inference

Prof: Robert L. Wolpert wolpert@stat.duke.edu OH: Mon 2:00-3:00pm, 211c Old Chem
TAs: Jialiang Mao jialiang.mao@duke.edu  OH: Tue 7:00-9:00pm, 211a Old Chem
Xu Chen xu.chen2@duke.edu  OH: Wed 7:00-9:00pm, 211a Old Chem
Class: Tue Thu 1:25-2:40pm, 116 Old Chem
Opt'l: G Young & R Smith, Essentials of Statistical Inference (On-line, Duke only)
G Casella & R Berger, Statistical Inference (2/e)
A Gelman, JB Carlin, et al. Bayesian Data Analysis (3/e)

Syllabus

Week        Topic                                                   Homework  Due
I. Foundations & Estimation Problems
Jan   -14   Models & Inference                                      hw1       Jan 21
Jan 19-21   Estimating CDFs and Statistical Functionals             hw2       Feb 04
Jan 26-28   No class (Volcano workshop at Kilauea)
Feb 02-04   Parametric Inference I: MoM & MLEs & Fisher Info        hw3       Feb 11
Feb 09-11   Parametric Inference II: Properties & Asymptotics       hw4       Feb 18
Feb 16-18   Subjective & Objective Bayesian Estimation              hw5       Feb 25
Feb 23-25   Confidence and Credible Interval Estimates              hw6       Mar 01
Mar 01-03   Review & in-class Midterm Exam I (S15)   Hists: Exam, Course
II. Testing Statistical Hypotheses
Mar 08-10   P-values, Significance, & Hypothesis Tests              hw7       Mar 24
--- Spring Recess (Mar 12-20) ---
Mar 22-24   Likelihood Ratios & Neyman-Pearson Tests                hw8       Mar 31
Mar 29-31   Bayes Factors & Bayesian Testing                        hw9       Apr 12
Apr 05-07   No class (SIAM/ASA Conference in Lausanne)
Apr 12-14   Review & in-class Midterm Exam II (S15)  Hists: Exam, Course
Apr 19      Empirical and Hierarchical Bayes                        hw10      Apr 26
Apr 26      Review for Final Exam
May 06      Fri 7-10pm: In-class Final Examination (S15)   Hists: Exam, Course


Description

This is an MS-level course about statistical inference, concentrating on the two leading contemporary paradigms (Frequentist and Bayesian) and introducing others (fiducial, likelihoodist, etc.). Theories of point and interval estimation and of hypothesis testing are introduced, and estimators' properties (efficiency, consistency, sufficiency, robustness) are studied. Topics include maximum likelihood, method-of-moments, and non-parametric methods based on exact or large-sample distribution theory, together with the associated EM, asymptotic-normality, and bootstrap computational techniques; theoretical aspects of objective Bayesian inference, prediction, and testing; and selected additional topics drawn from, for example, multiparameter testing, contingency tables, and multiplicity studies.

There is no textbook for the course, but lecture notes are available on-line (click on the "Week" column if it's blue or green). If you bring a copy of these notes to lectures you can spend more time understanding and less time writing. These notes are a work-in-progress, and will evolve as I try to improve them by adding material, correcting errors, and clarifying difficult points. If something in the notes looks wrong or confusing, first check whether the website has a more recent version (refresh your browser, and look at the "Last edited" date at the bottom of the last page). If it still looks puzzling, please send me an e-mail with a question or comment so I can fix it if it's wrong, explain it better or add an example if it's confusing, or help you understand if it was just a difficult issue. I'm not aware of any book that covers both theoretical and computational aspects of both Bayesian and Frequentist (or sampling-based) statistics at the level we need. The book cited above by Young and Smith comes as close as any and costs under $45, so I'd recommend it as a companion if you'd like a second perspective on some course topics.

This syllabus is also tentative, last revised , and will almost surely be superseded; reload your browser for the current version.


Note on Background:

Statistical modeling and inference depend on the mathematical theory of probability, and solving practical problems usually requires integration or optimization in several dimensions, either analytically or numerically. Thus this course requires a solid mathematical background: multivariate calculus at the level of Duke's MTH212 or MTH222 and linear algebra at the level of Duke's MTH221 or MTH216. Students must be proficient in calculus-based probability theory at the level of MTH230/STA230, and in particular should be familiar with the most common probability distributions (the course website has a list of most of them, in the notation we'll be using in this course, and a brief discussion of them).

Some questions will be computational, and will require skill in any one of the computing environments commonly used in statistical analysis such as R, Matlab, or Python. Students without strong preparation in these will need to invest significant additional time to fill in the gaps. Don't expect spreadsheets or calculators to be sufficient.

Note on Homework:

This is a demanding course. The homework exercises are difficult, and the problem sets are long. The only way to learn this material is to solve problems, and for most students this will take a substantial amount of time outside class (six to ten hours for many). Be prepared to commit the time it will take to succeed, and don't expect the material to come easily. Working with one or more classmates is fine, but write up your own solutions in your own way; don't copy someone else's. Students who rely too much on others for homework tend to do badly on the exams, which count far more.

Weekly problem sets are assigned on the class website. Homeworks are collected at the start of each Thursday class (so I can answer questions about them in class) and are returned at the following Tuesday class. LaTeX'd homework assignments can also be submitted electronically as pdf attachments to an e-mail sent to sta532@stat.duke.edu. Until solutions are posted, late homeworks are accepted but are penalized 10% per day. The lowest homework score will be dropped. Homeworks due in exam weeks are due on Tuesday, to give you a chance to ask questions about them and get feedback before the test.

Homework problems are awarded points based on your success in communicating a correct solution. For full credit the solution must be clear, concise, and correct; even a correct solution will lose points or be returned ungraded if it is not clear and concise. Neatness counts. Consider using LaTeX and submitting your work in pdf form if necessary (it's good practice anyway).

Note on Exams:

In-class Midterm and Final examinations are closed-book and closed-notes with one 8½"×11" sheet of your own notes permitted. Tests from a recent STA532 offering are available to help you know what to expect and to help you prepare for this year's tests:
Spring 2015: 1st Midterm, 2nd Midterm, Final Exam
Solutions are not made available for these, because many students can't resist looking up the answer when they get stuck and then the exams lose their value for you. Most exams are given in multiple versions that differ in very minor ways that don't affect their difficulty, to help ensure all students are doing their own work (see Academic Integrity section below).

Note on Evaluation:

Course grade is based on homework (10%), in-class midterm exams (25% each), and final exam (40%). Most years grades range from B- to A, with a median grade near the B+/A- boundary. Grades of C+ or lower are possible (best strategy: skip some homeworks, skip several classes, tank an exam or two), as is A+ (given about once every two or three years for exceptional performance).
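The weighting above is simple arithmetic, and a short sketch makes it concrete. This is a hypothetical illustration, not an official grade calculator: it assumes every score is a percentage on a 0-100 scale, applies the "lowest homework score will be dropped" rule from the homework note, and ignores late penalties and letter-grade cutoffs, which the syllabus does not specify. The function names and the sample scores are made up for illustration.

```python
# Hypothetical sketch of the STA532 grade weighting (not an official tool).
# Assumptions: all scores are percentages on a 0-100 scale; late penalties
# and letter-grade cutoffs are ignored because the syllabus doesn't give them.

def homework_average(scores):
    """Average homework percentage after dropping the single lowest score."""
    if len(scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(scores)[1:]  # drop the lowest score
    return sum(kept) / len(kept)


def course_grade(hw_scores, midterm1, midterm2, final):
    """Weighted course percentage: homework 10%, each midterm 25%, final 40%."""
    # The weights sum to 1.00, so the result stays on the 0-100 scale.
    return (0.10 * homework_average(hw_scores)
            + 0.25 * midterm1
            + 0.25 * midterm2
            + 0.40 * final)


if __name__ == "__main__":
    # Made-up scores: ten homeworks, one of them a low outlier that gets dropped.
    hw = [90, 85, 40, 95, 88, 92, 80, 87, 91, 86]
    print(round(course_grade(hw, 82, 78, 85), 2))
```

Note how little the homework moves the total: even a perfect homework record contributes at most 10 points, while the two midterms and the final together account for 90.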

Note on Enrollment:

Spaces in this course are reserved for Statistical Science Masters students. Other well-prepared students are welcome, but space in the course is limited and in most years it is over-subscribed.

Note on Auditing:

Unregistered students are welcome to sit in on or (preferably) audit this course if:
  1. There are enough seats in the room, and
  2. They are willing to commit to active participation.
I expect all students to participate actively. It hurts the class atmosphere and lowers students' expectations when some attendees are just spectators. I try to discourage that by requiring active participation of everyone, including auditors, to make the classes more fun and productive for us all. Past experience suggests that most auditors stop attending midway through the semester, when they have to balance competing demands on their time; if this course material is important to you, it is better to take the class for credit.

Note on Absence:

No excuse is needed for missing class. Class attendance is entirely optional. You remain responsible for turning in homework on time and for material presented in class that is not in the readings. Try not to get sick at scheduled examination times.

Academic Integrity

You may discuss and collaborate in solving homework problems, but you may not copy; each student should write up his or her own solution. Cheating on exams, copying or plagiarizing homeworks or projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, and will not be tolerated. They also violate Duke's Community Standard and will be referred to the Graduate School Judicial Board or the Dean of the Graduate School.