Observational Causal Inference
Estimate cause and effect when you can't run an experiment • 40 papers
Matching & Propensity Scores
Compare similar treated and untreated groups fairly
The Central Role of the Propensity Score in Observational Studies
The foundational paper introducing propensity scores for causal inference in observational studies.
Matching as Nonparametric Preprocessing for Reducing Model Dependence
Argues for matching as a preprocessing step before parametric modeling, with best practices and the MatchIt software implementation.
Double/Debiased Machine Learning for Treatment and Structural Parameters
The double ML framework using cross-fitting to obtain valid inference with ML first-stage estimation.
Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies
Introduces entropy balancing, which achieves exact balance on specified covariate moments through maximum entropy reweighting, removing the need for iterative propensity score model searching.
Doubly Robust Estimation in Missing Data and Causal Inference Models
Accessible exposition of the augmented IPW (AIPW) estimator, which is consistent if either the propensity score model or the outcome model is correctly specified: the 'doubly robust' property.
Semiparametric Efficiency in Multivariate Regression Models with Missing Data
Foundational theoretical paper deriving semiparametric efficiency bounds and developing the AIPW estimator class that underlies many modern doubly robust methods.
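The doubly robust property described in the two entries above can be illustrated with a minimal sketch. It assumes the nuisance estimates (propensity scores and outcome predictions) are already available; the function name aipw_ate and the simulated data are hypothetical and not taken from any of the listed papers.

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented IPW estimate of the average treatment effect.

    y: outcomes, t: 0/1 treatment indicator, e_hat: estimated P(T=1 | X),
    mu1_hat / mu0_hat: predicted outcomes under treatment / control.
    """
    y, t = np.asarray(y, float), np.asarray(t, float)
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    psi1 = mu1_hat + t * (y - mu1_hat) / e_hat
    psi0 = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
    return float(np.mean(psi1 - psi0))

# Simulated data (hypothetical): one confounder x, true ATE = 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))              # true propensity score
t = rng.binomial(1, e)
y = 2 * t + x + rng.normal(size=n)

# Correct propensity score, deliberately wrong outcome model (all zeros).
ate_wrong_outcome = aipw_ate(y, t, e, np.zeros(n), np.zeros(n))
# Wrong propensity score (constant 0.5), correct outcome model.
ate_wrong_pscore = aipw_ate(y, t, np.full(n, 0.5), x + 2, x)
```

In this simulation both estimates should land close to the true effect of 2, because in each case one of the two nuisance models is correct. In practice the nuisance functions are estimated, often with cross-fitting as in the double ML entry.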
Difference-in-Differences
Measure impact of changes that roll out over time
What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature
Comprehensive review of modern DiD methods including staggered adoption and heterogeneous effects.
Difference-in-Differences with Variation in Treatment Timing
Decomposes two-way fixed effects estimators and reveals issues with staggered DiD designs.
Difference-in-Differences with Multiple Time Periods
Group-time average treatment effects and aggregation methods for staggered DiD.
Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects
Demonstrates that TWFE regressions estimate weighted sums of ATEs with potentially negative weights, and proposes the DIDM estimator as a solution.
Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects
Shows TWFE event-study coefficients are contaminated by effects from other periods, proposing an interaction-weighted estimator.
A More Credible Approach to Parallel Trends
Formal sensitivity analysis for violations of parallel trends, implemented in the widely used HonestDiD package.
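The papers in this section all extend the canonical two-group, two-period comparison. As a reference point, that baseline can be sketched as follows, assuming parallel trends holds. The function name did_2x2 and the toy numbers are hypothetical; the staggered-adoption estimators listed above are considerably more involved.

```python
import numpy as np

def did_2x2(y, treated, post):
    """Canonical 2x2 difference-in-differences from unit-period outcomes."""
    y = np.asarray(y, float)
    treated = np.asarray(treated, bool)
    post = np.asarray(post, bool)
    # Before/after change in each group.
    change_treated = y[treated & post].mean() - y[treated & ~post].mean()
    change_control = y[~treated & post].mean() - y[~treated & ~post].mean()
    # The control group's change stands in for the treated group's counterfactual trend.
    return change_treated - change_control

# Hypothetical toy data: two treated and two control observations per period.
y       = [10, 12, 15, 17, 8, 10, 9, 11]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
post    = [0, 0, 1, 1, 0, 0, 1, 1]
effect = did_2x2(y, treated, post)
```

Here the treated group's mean rises by 5 and the control group's by 1, so the estimate is 4.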
Synthetic Control
Create a comparison group when you only have one treated unit
Synthetic Control Methods for Comparative Case Studies
The foundational synthetic control paper with the California tobacco application.
Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects
Practical guidance on when and how to apply synthetic control methods.
Synthetic Difference in Differences
Combines synthetic control and DiD for improved inference in panel data settings.
Matrix Completion Methods for Causal Panel Data Models
Bridges matrix completion/ML with synthetic control using nuclear norm regularization. Handles staggered adoption and compares favorably with traditional SC in the authors' applications.
The Augmented Synthetic Control Method
Extends synthetic control to settings where perfect pre-treatment fit is infeasible, using a ridge regression outcome model to de-bias the estimates.
Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets
Essential theoretical foundation for statistical inference in SC applications, extending permutation tests and constructing proper confidence sets.
Instrumental Variables & LATE
Find causal effects using natural experiments
Identification and Estimation of Local Average Treatment Effects
The LATE framework for interpreting IV estimates as effects on compliers.
Identification of Causal Effects Using Instrumental Variables
Defines the assumptions needed for IV and connects to potential outcomes framework.
Testing for Weak Instruments in Linear IV Regression
Foundational framework for detecting weak instruments. Provides formal critical values that refine the 'first-stage F > 10' rule of thumb (due to Staiger and Stock), a diagnostic now routinely reported in IV applications.
Quasi-Experimental Shift-Share Research Designs
Modern econometric framework for Bartik instruments—identification follows from quasi-random shock assignment rather than exogenous shares.
Judging Judge Fixed Effects
Develops nonparametric tests for exclusion and monotonicity in examiner IV designs—essential for criminal justice, disability, and immigration research.
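For the simplest case covered by the LATE framework at the top of this section, a binary instrument and a binary treatment, the IV estimate is the Wald ratio of the reduced form to the first stage. The sketch below assumes the standard IV conditions (independence, exclusion, monotonicity, relevance) hold. The function name wald_late and the toy data are hypothetical.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald estimator for a binary instrument z: reduced form divided by first stage."""
    y, d = np.asarray(y, float), np.asarray(d, float)
    z = np.asarray(z, bool)
    reduced_form = y[z].mean() - y[~z].mean()   # effect of the instrument on the outcome
    first_stage = d[z].mean() - d[~z].mean()    # effect of the instrument on treatment take-up
    if abs(first_stage) < 1e-12:
        raise ValueError("first stage is zero; the instrument does not move treatment")
    return reduced_form / first_stage

# Hypothetical toy data: the instrument raises take-up from 25% to 75%,
# and the outcome is y = 1 + 4 * d.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5, 5, 5, 1, 5, 1, 1, 1]
late = wald_late(y, d, z)
```

The reduced form is 2 and the first stage is 0.5, so the ratio recovers the effect of 4. A first stage close to zero is the weak-instrument problem addressed by the testing entry above.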
Regression Discontinuity
Exploit eligibility cutoffs to measure program effects
Regression Discontinuity Designs in Economics
Comprehensive guide to RDD identification, estimation, and practical implementation.
A Practical Introduction to Regression Discontinuity Designs
Modern implementation guide with rdrobust software for sharp and fuzzy RDD.
Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test
Introduces the canonical density discontinuity test for detecting manipulation of the running variable at the cutoff, now a standard falsification check in RDD work.
Optimal Bandwidth Choice for the Regression Discontinuity Estimator
Derives the MSE-optimal bandwidth for local linear RD estimation, giving a principled, data-driven approach to bandwidth selection.
Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs
Shows MSE-optimal bandwidths yield invalid conventional CIs and develops bias-corrected robust inference—foundation for the rdrobust package.
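The local linear estimator that these papers study can be sketched in a few lines. This is a minimal illustration for a sharp design with a hand-picked bandwidth and a triangular kernel. It does not implement the MSE-optimal bandwidth or the bias-corrected robust inference described above, and the function name sharp_rd_local_linear is hypothetical (it is not the rdrobust API).

```python
import numpy as np

def sharp_rd_local_linear(y, x, cutoff=0.0, bandwidth=1.0):
    """Sharp RD jump at the cutoff via triangular-kernel local linear regression."""
    y = np.asarray(y, float)
    x = np.asarray(x, float) - cutoff                  # center the running variable
    w = np.clip(1 - np.abs(x) / bandwidth, 0, None)    # triangular kernel weights
    keep = w > 0
    y, x, w = y[keep], x[keep], w[keep]
    above = (x >= 0).astype(float)
    # Separate intercept and slope on each side; the coefficient on `above` is the jump.
    design = np.column_stack([np.ones_like(x), above, x, above * x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return float(coef[1])

# Hypothetical noise-free data with a jump of 3 at the cutoff x = 0.
x = np.linspace(-1, 1, 201)
y = 1 + 2 * x + 3 * (x >= 0)
jump = sharp_rd_local_linear(y, x, bandwidth=0.5)
```

With noise-free linear data the estimate equals the jump of 3 up to floating-point error. With real data the choice of bandwidth drives the bias-variance trade-off, which is the subject of the two bandwidth and inference entries above.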
Double ML & Heterogeneous Effects
Find which customers benefit most from an intervention
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Causal forests for estimating conditional average treatment effects with valid confidence intervals.
Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments
Framework for using any ML method to find heterogeneous effects with valid inference.
Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning
Taxonomy of metalearners (S-learner, T-learner, X-learner) for CATE estimation.
Quasi-Oracle Estimation of Heterogeneous Treatment Effects
Introduces the R-learner framework achieving quasi-oracle efficiency—matching error bounds of an oracle knowing nuisance components.
Towards Optimal Doubly Robust Estimation of Heterogeneous Causal Effects
Establishes model-free oracle inequalities for the DR-learner, showing that its error depends on the product of nuisance estimation errors, so doubly robust CATE estimation can converge faster than plug-in approaches.
Policy Learning With Observational Data
Theoretical foundations for learning optimal treatment policies from observational data using doubly robust scores.
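Of the metalearners named in this section, the T-learner is the simplest to write down: fit one outcome model per treatment arm and take the difference of the predictions. The sketch below uses ordinary least squares as the base learner purely for illustration; any regression method could be substituted. The function name t_learner_cate and the toy data are hypothetical.

```python
import numpy as np

def t_learner_cate(x, t, y, x_new):
    """T-learner with linear base learners for a single covariate x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    t = np.asarray(t, bool)

    def fit(xa, ya):
        # Least-squares fit of y on an intercept and x within one treatment arm.
        design = np.column_stack([np.ones(len(xa)), xa])
        coef, *_ = np.linalg.lstsq(design, ya, rcond=None)
        return coef

    b1, b0 = fit(x[t], y[t]), fit(x[~t], y[~t])
    x_new = np.asarray(x_new, float)
    design_new = np.column_stack([np.ones(len(x_new)), x_new])
    # CATE estimate = treated-arm prediction minus control-arm prediction.
    return design_new @ (b1 - b0)

# Hypothetical noise-free data with true CATE(x) = 2 + 3x.
x = np.linspace(-1, 1, 40)
t = np.arange(40) % 2
y = 1 + x + t * (2 + 3 * x)
cate = t_learner_cate(x, t, y, [0.0, 1.0])
```

On this data the estimated effects at x = 0 and x = 1 are 2 and 5. The X-, R-, and DR-learners listed above modify this recipe to cope with unbalanced arms and confounding.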
Sensitivity & Bounds
Stress-test your causal conclusions
Sensitivity Analysis in Observational Research: Introducing the E-Value
The E-value for quantifying sensitivity to unmeasured confounding.
Making Sense of Sensitivity: Extending Omitted Variable Bias
Modern sensitivity analysis framework with intuitive benchmarking against observed covariates.
Nonparametric Bounds on Treatment Effects
Foundational paper establishing the partial identification paradigm—showing what can be learned under minimal assumptions when point identification fails.
Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects
Develops the 'Lee bounds' trimming procedure for handling attrition under monotonicity, now a standard robustness check for differential selection.
Unobservable Selection and Coefficient Stability: Theory and Evidence
Shows how to jointly use coefficient movements and R-squared changes to bound omitted variable bias, a widely used sensitivity analysis in applied economics.
Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools
Pioneers the idea that the degree of selection on observables can guide assumptions about selection on unobservables, giving a framework for assessing how much confounding would be needed to explain away an effect.
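The E-value from the first entry in this section has a closed form for a risk ratio point estimate: RR + sqrt(RR × (RR − 1)), with the risk ratio inverted first when it is below 1. A minimal sketch follows; the function name e_value is hypothetical, and the version for confidence limits is not shown.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (point estimate only)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr          # protective effects: work with the inverse ratio
    return rr + math.sqrt(rr * (rr - 1))

ev = e_value(2.0)
```

For an observed risk ratio of 2, the E-value is 2 + sqrt(2), about 3.41: an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of at least that size to fully explain away the estimate.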