Course Home Page
Course Description
In this course, we learn about different designs for collecting data and their implications for statistical inference.
We cover two main topics: how to design surveys of populations in ways that give reliable estimates, and how to design
studies in ways that allow for valid causal claims. With regard to surveys, we investigate the mathematical
underpinnings of randomization as a tool for data collection. We focus on the benefits and pitfalls of deviating
from purely randomized samples, including stratification, clustering, and convenience sampling. We learn how to
design and analyze complicated surveys typically employed by government agencies. With regard to causal studies, we again discuss the central role of randomization as a
tool for ensuring fair comparisons of treatments. We focus on the benefits and pitfalls of variations on randomized
designs, including blocking and factorial designs. We discuss design for observational studies, focusing on methods
like propensity score matching. Throughout, we discuss a variety of real study designs spanning applications
in public policy, health, and the social and natural sciences.
Course Objectives
Logistics
Prerequisites
Readings
There are no required texts for this course. Instead, we will read articles and other materials posted on the course website on Sakai. Two useful reference texts, both available electronically via the Duke Library, are:
Thompson, S. K. (2010), Sampling, 3rd edition, John Wiley & Sons.
Imbens, G. W. and Rubin, D. B. (2015), Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge University Press.
Computing
We will use the statistical software package R for analyzing data. It can be downloaded for free at http://www.r-project.org/.
Calculator
Students don't need a calculator for this course.
Schedule of Topics
We will cover the topics in the table below. We may spend
different amounts of time on each topic than shown, depending on the
interests of the participants in the course.
Introduction to course. | 1 lecture |
Probability warm-up problems. Basics of surveys of finite populations. Questionnaire design. | 1 lecture |
Design-based estimation in simple random samples | 1 lecture |
Design-based estimation in general samples (Horvitz-Thompson estimator) | 1 lecture |
Computer simulations for evaluating estimators in survey sampling | 1 lecture |
Stratified samples | 2 lectures |
Cluster and unequal probability samples | 2 lectures |
Multi-stage sampling designs | 1 lecture |
Regression in complex sampling designs | 1 lecture |
Non-random sampling for finite population inference | 1 lecture |
Basics of causal studies. Potential outcomes. Randomization. | 1 lecture |
Fisher randomization tests and Neyman tests | 1.5 lectures |
Blocked designs | 1.5 lectures |
Factorial designs | 1.5 lectures |
Fractional factorial designs | 2 lectures |
Observational study design, including propensity score methods | 2 lectures |
Graded work
Graded work for the course will consist of two midterm exams, homework assignments, and two projects. Weights on assessments are as follows:
Assignments | 40% |
Midterm Exam 1 | 20% |
Midterm Exam 2 | 20% |
Project 1 | 10% |
Project 2 | 10% |
There are no make-ups for graded work except for medical or familial emergencies or for reasons approved by the instructor. When possible, talk to the instructor in advance of relevant due dates to discuss possible alternatives. Final grades in STA 322 and STA 522 will be assigned separately.
Descriptions of graded work
Assignments:
Assignments are posted on the Statistics 322/522 course web site on Sakai. Students turn in these assignments in person at the beginning of class on the due date. Students are permitted to work with others on the assignments, but each person must write up and turn in their own answers. The assignments are designed to build students' knowledge of the computational and mathematical aspects of study design, and to give practice analyzing survey or causal inference data. Assignments for STA 522 will have additional or different questions.
Exams:
The first midterm exam will cover mathematical and conceptual aspects of survey sampling. The second midterm exam will cover mathematical and conceptual aspects of causal inference. There is no final exam for this course. Exams for STA 522 and STA 322 may differ.
Projects:
One project will cover surveys and the other project will cover causal inference. Students work in teams of two on both projects. The projects involve designing or analyzing surveys or causal studies, applying the methods learned in the course.
Students are expected to abide by Duke's Community Standard for all work for this course. Violations of the Standard will result in a zero grade for the relevant assignment and will be reported to the Dean of Students for adjudication. Additionally, there may be penalties to the final grade for the course. Ignorance of what constitutes academic dishonesty is not a justifiable excuse for violations.
For the exams, students are required to work alone. For the assignments, students may work with others but each student must submit their own answers. For assignments involving computer programming, students can get advice from each other but are required to write their own code.
For the projects, students are required to work in teams of two individuals. Teams are permitted to talk with others in the course, but each team must write up their own project report. Students are not permitted to use ChatGPT or other AI services to complete projects or assignments.