Course Home Page
Course Description
In this course, we learn about different designs for collecting data and their implications for statistical inference.
We cover two main topics: how to design surveys of populations in ways that give reliable estimates, and how to design
studies in ways that allow for valid causal claims. With regard to surveys, we investigate the mathematical
underpinnings of randomization as a tool for data collection. We focus on the benefits and pitfalls of deviating
from purely randomized samples, including stratification, clustering, and convenience sampling. We learn how to
design and analyze complicated surveys typically employed by government agencies. We also discuss special designs
for hard-to-reach populations and issues of fairness and generalizability when using big data to train algorithms
for predictive analytics. With regard to causal studies, we again discuss the central role of randomization as a
tool for ensuring fair comparisons of treatments. We focus on the benefits and pitfalls of variations on randomized
designs, including blocking and factorial designs. We discuss design for observational studies, focusing on methods
such as propensity score matching. Throughout, we discuss a variety of real study designs spanning applications
in public policy, health, and the social and natural sciences.
Course Objectives
Logistics
Prerequisites
Readings
There are no required texts for this course. Instead, we will read book chapters and articles posted on the course website on Sakai. Two useful texts for reference include:
Lohr, S. L. (2010), Sampling: Design and Analysis, 2nd edition, Cengage Learning. ISBN 0495105279.
Imbens, G. W. and Rubin, D. B. (2015), Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge University Press. ISBN 0521885884.
Computing
We will use the statistical software package R for analyzing data.
It can be downloaded for free at
http://www.r-project.org/.
Alternatively, R is available on the public computers on campus.
Calculator
Students don't need a calculator for this course.
Schedule of Topics
We will cover the topics in the table below. We may spend
different amounts of time on each topic than shown, depending on the
interests of the participants in the course.
Introduction to course | 1 lecture |
Basics of surveys of finite populations. Questionnaire design. | 1 lecture |
Design-based estimation in simple random samples | 1 lecture |
Design-based estimation in general samples (Horvitz-Thompson estimator) | 1 lecture |
Stratified samples | 2 lectures |
Cluster and unequal probability samples | 2 lectures |
Multi-stage sampling designs | 2 lectures |
Model-based methods for finite population inference | 1 lecture |
Non-random sampling for finite population inference | 1 lecture |
Data privacy and surveys | 1 lecture |
Capture-recapture methods for population size estimation | 1 lecture |
Basics of causal studies. Potential outcomes. Randomization. | 1 lecture |
Fisher randomization tests | 1 lecture |
Blocked designs | 1 lecture |
Factorial designs | 1 lecture |
Fractional factorial designs | 2 lectures |
Observational study design, including propensity score methods | 4 lectures |
Graded work
Graded work for the course will consist of two midterm exams, homework assignments, and two projects. Students' final grades will be determined as follows:
Assignments | 30% |
Midterm Exam 1 | 30% |
Midterm Exam 2 | 30% |
Projects | 5% each |
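As a rough illustration of how these weights combine, the R sketch below computes a weighted average of component scores. It assumes every component is scored on a 0-100 scale and that the final grade is a simple weighted sum; the function name final_grade and the example scores are hypothetical and are not part of the course policy.

```r
# Weights taken from the grading table above (two projects at 5% each).
weights <- c(assignments = 0.30, midterm1 = 0.30, midterm2 = 0.30,
             project1 = 0.05, project2 = 0.05)

# Sanity check: the weights should account for 100% of the grade.
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical helper: weighted sum of component scores.
# Assumes `scores` is a named numeric vector on a 0-100 scale,
# with exactly one entry per graded component.
final_grade <- function(scores, w = weights) {
  stopifnot(setequal(names(scores), names(w)))
  sum(w[names(scores)] * scores)
}

# Made-up scores, for illustration only.
example_scores <- c(assignments = 90, midterm1 = 80, midterm2 = 85,
                    project1 = 95, project2 = 100)
final_grade(example_scores)  # 27 + 24 + 25.5 + 4.75 + 5 = 86.25
```

How a numeric score maps to a letter grade is not specified here, so the sketch stops at the weighted total.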
There are no make-ups for graded work except for medical or familial emergencies or for reasons approved by the instructor before the due date. See the instructor in advance of relevant due dates to discuss possible alternatives.
Descriptions of graded work
Assignments:
Assignments are posted on the Statistics 322/522 course website on Sakai. Students turn in these assignments at the beginning
of class on the due date. Students are permitted to work with others on the assignments, but each person must write up and
turn in their own answers. The assignments are designed to build students' knowledge of the computational and mathematical
aspects of study design, and to give students practice analyzing survey and causal inference data.
Exams:
The first midterm exam will cover mathematical and conceptual aspects of
survey sampling. The second midterm exam will cover mathematical and conceptual aspects of
causal inference.
Projects:
One project will cover surveys and the other project will cover causal inference. Students work in pairs on the
projects. The projects involve designing or analyzing surveys/causal studies, applying the methods learned in the course.
Students are expected to abide by Duke's Community Standard for all work for this course. Violations of the Standard will
result in a zero grade for the relevant assignment and will be reported to the Dean of Students for adjudication.
Additionally, there may be penalties to the final grade for the course. Ignorance of what constitutes academic dishonesty is
not a justifiable excuse for violations.
For the exams, students are required to work alone. For the assignments, students may work with others, but each student must
submit his or her own answers. For assignments involving computer programming, students can get advice from each other but
are required to write their own code.
For the projects, students are required to work in teams of two individuals. Teams are permitted to talk with others in the course,
but each team must write up their own project report.