In Spring 2020, I taught STA 322/522: Design of Surveys and Causal Studies.
In this course, we learn about different designs for collecting data and their implications for statistical inference. We cover two main topics: how to design surveys of populations in ways that give reliable estimates, and how to design studies in ways that allow for valid causal claims. With regard to surveys, we investigate the mathematical underpinnings of randomization as a tool for data collection. We focus on the benefits and pitfalls of deviating from simple random samples, including stratification, clustering, and convenience sampling. We learn how to design and analyze the complex surveys typically employed by government agencies. We also discuss special designs for hard-to-reach populations, as well as issues of fairness and generalizability when using big data to train algorithms for predictive analytics. With regard to causal studies, we again discuss the central role of randomization as a tool for ensuring fair comparisons of treatments. We focus on the benefits and pitfalls of variations on randomized designs, including blocking and factorial designs. We discuss designs for observational studies, focusing on methods like propensity score matching. Throughout, we discuss a variety of genuine designs spanning applications in public policy, health, and the social and natural sciences.
In Fall 2019, I taught STA 790, a mini-course (4 weeks, 1 graduate credit) on theory and methods for handling missing data. We discuss types of missing data and how they affect inference, the pros and cons of commonly used approaches to handling missing data, and the Bayesian and frequentist theory underpinning multiple imputation, which applied scientists often use to handle missing data. We review some open research problems that offer opportunities for thesis work. The course is taught at a level appropriate for PhD students and advanced MSS students.