STA 841 - Discrete data (Spring 2025)

Course description

This class covers data analytic tools for discrete data. Starting from the properties of exponential families, we will investigate the general concepts behind generalized linear models (GLMs) and survey a variety of models in the particular contexts for which they are suited. Examples include models for binary data, polytomous data (ordered and unordered), count data, contingency tables, matrix data, and tree-structured data. These models cover a wide range of applications, from classical to modern. We will also cover models involving random/mixed effects and, if time permits, some recent methodological developments for coping with the increasing dimensionality and complexity of modern data sets, in particular generative models, graphical models, and latent variable models.

Prerequisites

Instructional Crew

Li Ma (Instructor), Email: li.maPENGUIN@dukePENGUIN.edu

Joe Mathews (TA), Email: joseph.mathewsPUFFIN@dukePUFFIN.edu

Don't forget to remove the polar birds from the email addresses!

Office hours (tentative; may be adjusted in the first two weeks)

TBD.

Classes

WF 1:25-2:40PM in Old Chem 025

Readings

Lecture notes

Textbook

Computing references

Other references (that may be helpful but are not necessary)

Grading

Homework: 3 to 4 assignments (30%). Answers or reports for the data analysis problems must be typeset in LaTeX. Answers to theoretical problems can be handwritten. Code should be attached in an appendix. Homework assignments will be graded on a 4-point scale (Excellent, Good, Fair, and Poor). Both Excellent and Good give full credit for grading purposes. You must show your work to receive credit. Late homework will be accepted but will incur a one-level grade penalty for each 24-hour period it is late (counting from the first minute past the deadline). The lowest homework grade will be dropped. Homework assignments will be released and submitted on Gradescope.

Review of one designated topic area (30%): an in-class presentation and leading the discussion (20%) + active participation during others’ presentations and discussions (10%).

A course project (40%): a proposal (5%) + a final report (25%) + an in-class presentation (10%).

About collaboration on homework: While discussion and collaboration are extremely important and greatly encouraged in scientific research, the essential skills for data analysis are best acquired through independent effort during the training stage. Therefore, some homework problems, especially open-ended data analysis problems, will be marked “to be completed independently.” For those problems, you may discuss them only with the instructor and/or the TAs until the homework is submitted.