Statistics 322/522
Design of Surveys and Causal Studies

  Fall 2023


Course Description

In this course, we learn about different designs for collecting data and their implications for statistical inference. We cover two main topics: how to design surveys of populations in ways that give reliable estimates, and how to design studies in ways that allow for valid causal claims.

With regard to surveys, we investigate the mathematical underpinnings of randomization as a tool for data collection. We focus on the benefits and pitfalls of deviating from purely randomized samples, including stratification, clustering, and convenience sampling. We learn how to design and analyze complicated surveys typically employed by government agencies.

With regard to causal studies, we again discuss the central role of randomization as a tool for ensuring fair comparisons of treatments. We focus on the benefits and pitfalls of variations on randomized designs, including blocking and factorial designs. We discuss design for observational studies, focusing on methods like propensity score matching.

Throughout, we discuss a variety of genuine designs spanning applications in public policy, health, and the social and natural sciences.

Course Objectives

Logistics

Prerequisites

Students must have passed STA 210, STA 521, or a course in regression analysis. We do a lot of manipulations with discrete random variables, so comfort applying expectation and variance formulas is necessary.
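
As an illustration of the kind of manipulation this refers to (an example chosen for this page, not drawn from the course materials), consider a simple random sample of n units drawn without replacement from a population of N units. Let Z_i equal 1 if unit i is sampled and 0 otherwise. The symbols n, N, and Z_i are notation assumed here for illustration. Because Z_i is a 0/1 random variable with P(Z_i = 1) = n/N, the usual expectation and variance formulas give:

```latex
\begin{align*}
E[Z_i] &= 1 \cdot \frac{n}{N} + 0 \cdot \left(1 - \frac{n}{N}\right) = \frac{n}{N}, \\
\mathrm{Var}(Z_i) &= E[Z_i^2] - \left(E[Z_i]\right)^2
                   = \frac{n}{N} - \left(\frac{n}{N}\right)^2
                   = \frac{n}{N}\left(1 - \frac{n}{N}\right).
\end{align*}
```

The second line uses the fact that Z_i^2 = Z_i for a 0/1 variable, so E[Z_i^2] = E[Z_i].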

Readings

There are no required texts for this course. Instead, we will read articles and other materials posted on the course website on Sakai. Two useful reference texts, both available electronically via the Duke Library, are:

Thompson, S. K. (2010), Sampling, 3rd edition, John Wiley & Sons.

Imbens, G. W. and Rubin, D. B. (2015), Causal Inference for Statistics, Social, and Behavioral Sciences: An Introduction, Cambridge University Press.

Computing

We will use the statistical software package R for analyzing data.  It can be downloaded for free at http://www.r-project.org/.

Calculator

Students don't need a calculator for this course.

Schedule of Topics

We will cover the topics in the table below. We may spend more or less time on a topic than shown, depending on the interests of the participants in the course.

Topic | Lectures
--- | ---
Introduction to course. | 1
Probability warm up problems. Basics of surveys of finite populations. Questionnaire design. | 1
Design-based estimation in simple random samples | 1
Design-based estimation in general samples (Horvitz-Thompson estimator) | 1
Computer simulations for evaluating estimators in survey sampling | 1
Stratified samples | 2
Cluster and unequal probability samples | 2
Multi-stage sampling designs | 1
Regression in complex sampling designs | 1
Non-random sampling for finite population inference | 1
Basics of causal studies. Potential outcomes. Randomization. | 1
Fisher randomization tests and Neyman tests | 1.5
Blocked designs | 1.5
Factorial designs | 1.5
Fractional factorial designs | 2
Observational study design, including propensity score methods | 2


Graded work

Graded work for the course will consist of two midterm exams, homework assignments, and two projects. Weights on assessments are as follows:
 
Assessment | Weight
--- | ---
Assignments | 40%
Midterm Exam 1 | 20%
Midterm Exam 2 | 20%
Project 1 | 10%
Project 2 | 10%

There are no make-ups for graded work except for medical or family emergencies or for reasons approved by the instructor. When possible, talk to the instructor in advance of relevant due dates to discuss possible alternatives. Final grades in STA 322 and STA 522 will be assigned separately.

Descriptions of graded work

Assignments:

Assignments are posted on the Statistics 322/522 course web site on Sakai. Students turn in assignments in person at the beginning of class on the due date. Students are permitted to work with others on the assignments, but each person must write up and turn in their own answers. The assignments are designed to build students' knowledge of the computational and mathematical aspects of study design, and to give practice analyzing survey or causal inference data. Assignments for STA 522 will have additional or different questions.

Exams:

The first midterm exam will cover mathematical and conceptual aspects of survey sampling. The second midterm exam will cover mathematical and conceptual aspects of causal inference. There is no final exam for this course. Exams for STA 522 and STA 322 may differ.

Projects:

One project will cover surveys and the other project will cover causal inference. Students work in teams of two on both projects. The projects involve designing or analyzing surveys or causal studies, applying the methods learned in the course.

Academic honesty

Students are expected to abide by Duke's Community Standard for all work for this course.  Violations of the Standard will result in a zero grade for the relevant assignment and will be reported to the Dean of Students for adjudication. Additionally, there may be penalties to the final grade for the course.   Ignorance of what constitutes academic dishonesty is not a justifiable excuse for violations.

For the exams, students are required to work alone.  For the assignments, students may work with others but each student must submit their own answers. For assignments involving computer programming, students can get advice from each other but are required to write their own code. For the projects, students are required to work in teams of two individuals. Teams are permitted to talk with others in the course, but each team must write up their own project report. Students are not permitted to use ChatGPT or other AI services to complete projects or assignments.