STA 841 - Discrete data (Fall 2016)

Course description

This class covers data-analytic tools for categorical data. Starting from the properties of exponential families, we will investigate the general concepts behind generalized linear models (GLMs) and survey a variety of models in the particular contexts for which they are suited. Examples include models for binary data, polytomous data (ordered and unordered), count data, contingency tables, matrix data, and tree-structured data. These models cover a wide range of applications, from classical to modern. We will cover models involving random/mixed effects and, if time permits, some recent methodological developments for coping with the increasing dimensionality and complexity of modern data sets, including regularized/penalized methods and nonparametric/semiparametric models.

Prerequisites

Instructor

Li Ma
Old Chem 217
Email: mylastname AT stat DOT duke DOT edu (Please replace "mylastname" with "ma".)
Office hours: Mon 9-10am.

TAs

Jialiang Mao
Email: jialiang DOT mao PENGUINE AT duke DOT edu (Please remove the Antarctic bird.)
Office hours: Tu 5-7pm in Old Chem 211A.

Jake Coleman
Email: jacob DOT coleman PANDA AT duke DOT edu (Please remove the Chinese bear.)
Office hours: M 5:30-6:30pm in Old Chem 025.

Classes

WF 11:45am-1pm in Gray 228

Readings

Lecture notes

Textbook

Computing references

Other references (that may be helpful but are not necessary)

Grading

Homework: About 4 to 5 assignments (40%). Answers or reports for the data analysis problems must be typeset in LaTeX. Answers to theoretical problems may be handwritten. You must submit all of your computer code through the Dropbox on Sakai.

Exam: One in-class, closed-book exam (30%). Time: TBD.

Project: A course project (30%).

Late homework policy: Homework turned in after class but on the due date will be counted as one day late; homework turned in the following day will be counted as two days late, and so on. No homework more than three days late will be accepted. Each late day results in a penalty of 15% of the value of that homework. If you are traveling, please email the TA a PDF copy. Up to three late days will be forgiven at the end of the semester to accommodate circumstances such as illness and job interviews.

Collaboration on homework: While discussion and collaboration are extremely important and greatly encouraged in scientific research, the essential skills for data analysis are best acquired through independent effort during the training stage. Therefore some of the homework problems, especially open-ended data analysis problems, will be marked as "to be completed independently". Before the homework is submitted, you may discuss those problems only with the instructor and/or the TAs.