tech-econ

Reward Inference from Behavior

Inferring objectives from actions

2000 41 cited

Algorithms for Inverse Reinforcement Learning

Andrew Y. Ng, Stuart Russell

Foundational IRL paper formalizing reward extraction from observed optimal behavior.

2004 2810 cited

Apprenticeship Learning via Inverse Reinforcement Learning

Pieter Abbeel, Andrew Y. Ng

Extends IRL to practical apprenticeship learning with feature expectation matching.

2008 2050 cited

Maximum Entropy Inverse Reinforcement Learning

Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, Anind K. Dey

Resolves IRL ambiguity using maximum entropy principle; now the standard IRL formulation.

2016 321 cited

Cooperative Inverse Reinforcement Learning

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

Frames value alignment as cooperative game; foundational for AI safety.

2007 1156 cited

Bayesian Inverse Reinforcement Learning

Deepak Ramachandran, Eyal Amir

First Bayesian framework for IRL; provides posterior distributions over rewards capturing uncertainty.

2016 1089 cited

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Chelsea Finn, Sergey Levine, Pieter Abbeel

First deep IRL method learning arbitrary neural network cost functions; enabled learning from raw images.

2017 467 cited

Inverse Reward Design

Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

Treats designed rewards as noisy observations of true objectives; addresses reward hacking and negative side effects.

Title	Authors	Year	Citations
Algorithms for Inverse Reinforcement Learning Foundational IRL paper formalizing reward extraction from observed optimal behavior.	Andrew Y. Ng, Stuart Russell	2000	41
Apprenticeship Learning via Inverse Reinforcement Learning Extends IRL to practical apprenticeship learning with feature expectation matching.	Pieter Abbeel, Andrew Y. Ng	2004	2810
Maximum Entropy Inverse Reinforcement Learning Resolves IRL ambiguity using maximum entropy principle; now the standard IRL formulation.	Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, Anind K. Dey	2008	2050
Cooperative Inverse Reinforcement Learning Frames value alignment as cooperative game; foundational for AI safety.	Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell	2016	321
Bayesian Inverse Reinforcement Learning First Bayesian framework for IRL; provides posterior distributions over rewards capturing uncertainty.	Deepak Ramachandran, Eyal Amir	2007	1156
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization First deep IRL method learning arbitrary neural network cost functions; enabled learning from raw images.	Chelsea Finn, Sergey Levine, Pieter Abbeel	2016	1089
Inverse Reward Design Treats designed rewards as noisy observations of true objectives; addresses reward hacking and negative side effects.	Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan	2017	467

Imitation Learning

Learn policies from expert demonstrations

1989 1426 cited

ALVINN: An Autonomous Land Vehicle in a Neural Network

Dean A. Pomerleau

Pioneering behavioral cloning; first end-to-end neural network steering for autonomous vehicles.

2011 840 cited

A Reduction of Imitation Learning to No-Regret Online Learning (DAgger)

Stéphane Ross, Geoffrey J. Gordon, J. Andrew Bagnell

Solves distribution shift in behavioral cloning; reduces imitation to online learning with O(T) error.

2016 11 cited

Generative Adversarial Imitation Learning (GAIL)

Jonathan Ho, Stefano Ermon

GAN-style adversarial training directly learning policy without reward recovery.

1997 1823 cited

Learning from Demonstration

Stefan Schaal

Foundational work showing demonstrations accelerate RL; established paradigm for robot skill acquisition.

2018 456 cited

Behavioral Cloning from Observation

Faraz Torabi, Garrett Warnell, Peter Stone

Learning from state-only observations without action labels; enables learning from video demonstrations.

2016 5678 cited

End to End Learning for Self-Driving Cars

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski et al. (NVIDIA)

Industry-defining paper: CNNs mapping pixels to steering; achieved 98% autonomous driving in road tests.

Title	Authors	Year	Citations
ALVINN: An Autonomous Land Vehicle in a Neural Network Pioneering behavioral cloning; first end-to-end neural network steering for autonomous vehicles.	Dean A. Pomerleau	1989	1426
A Reduction of Imitation Learning to No-Regret Online Learning (DAgger) Solves distribution shift in behavioral cloning; reduces imitation to online learning with O(T) error.	Stéphane Ross, Geoffrey J. Gordon, J. Andrew Bagnell	2011	840
Generative Adversarial Imitation Learning (GAIL) GAN-style adversarial training directly learning policy without reward recovery.	Jonathan Ho, Stefano Ermon	2016	11
Learning from Demonstration Foundational work showing demonstrations accelerate RL; established paradigm for robot skill acquisition.	Stefan Schaal	1997	1823
Behavioral Cloning from Observation Learning from state-only observations without action labels; enables learning from video demonstrations.	Faraz Torabi, Garrett Warnell, Peter Stone	2018	456
End to End Learning for Self-Driving Cars Industry-defining paper: CNNs mapping pixels to steering; achieved 98% autonomous driving in road tests.	Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski et al. (NVIDIA)	2016	5678

Revealed Preference at Scale

Rationalizing observed choices

1967 1146 cited

Construction of a Utility Function from Expenditure Data

Sydney N. Afriat

Foundational theorem: data is rationalizable iff it satisfies GARP; basis for computational revealed preference.

1982 1154 cited

The Nonparametric Approach to Demand Analysis

Hal R. Varian

Makes Afriat computationally tractable; shows how to test and recover preferences nonparametrically.

2016 646 cited

Revealed Preference Theory

Christopher P. Chambers, Federico Echenique

Comprehensive modern treatment covering GARP extensions, complexity, and mechanism design applications.

2008

Nonparametric Engel Curves and Revealed Preference

Richard Blundell, Martin Browning, Ian Crawford

Combines revealed preference with nonparametric estimation; sharp bounds on counterfactual demands.

1974 18756 cited

Conditional Logit Analysis of Qualitative Choice Behavior

Daniel McFadden

Nobel Prize-winning random utility framework; foundation of discrete choice models used throughout tech.

2015 289 cited

Stochastic Choice and Revealed Perturbed Utility

Drew Fudenberg, Ryota Iijima, Tomasz Strzalecki

Axiomatic foundations for perturbed utility models; generalizes logit choice to capture bounded rationality.

2019 156 cited

Dynamic Random Utility

Mira Frick, Ryota Iijima, Tomasz Strzalecki

Extends random utility to sequential choice with preference correlation; applicable to session-based user modeling.

Title	Authors	Year	Citations
Construction of a Utility Function from Expenditure Data Foundational theorem: data is rationalizable iff it satisfies GARP; basis for computational revealed preference.	Sydney N. Afriat	1967	1146
The Nonparametric Approach to Demand Analysis Makes Afriat computationally tractable; shows how to test and recover preferences nonparametrically.	Hal R. Varian	1982	1154
Revealed Preference Theory Comprehensive modern treatment covering GARP extensions, complexity, and mechanism design applications.	Christopher P. Chambers, Federico Echenique	2016	646
Nonparametric Engel Curves and Revealed Preference Combines revealed preference with nonparametric estimation; sharp bounds on counterfactual demands.	Richard Blundell, Martin Browning, Ian Crawford	2008	—
Conditional Logit Analysis of Qualitative Choice Behavior Nobel Prize-winning random utility framework; foundation of discrete choice models used throughout tech.	Daniel McFadden	1974	18756
Stochastic Choice and Revealed Perturbed Utility Axiomatic foundations for perturbed utility models; generalizes logit choice to capture bounded rationality.	Drew Fudenberg, Ryota Iijima, Tomasz Strzalecki	2015	289
Dynamic Random Utility Extends random utility to sequential choice with preference correlation; applicable to session-based user modeling.	Mira Frick, Ryota Iijima, Tomasz Strzalecki	2019	156

Human Feedback & RLHF

Train models to align with human preferences

2017 508 cited

Deep Reinforcement Learning from Human Preferences

Paul Christiano, Jan Leike, Tom Brown et al.

Foundational RLHF paper learning rewards from preference comparisons with ~1% feedback.

2022

Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

Long Ouyang, Jeff Wu et al. (OpenAI)

1.3B InstructGPT outperforms 175B GPT-3 on human preferences; foundation for ChatGPT.

2022

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai et al. (Anthropic)

RLAIF using AI self-critique against constitutional principles; Claude's training methodology.

2023

Direct Preference Optimization (DPO)

Rafael Rafailov, Archit Sharma, Eric Mitchell et al.

Eliminates reward model and RL loop; preference optimization via simple classification loss.

2017 15678 cited

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Stable policy gradient algorithm with clipped objectives; THE optimizer underlying RLHF in all major LLMs.

2020 1456 cited

Learning to Summarize from Human Feedback

Nisan Stiennon, Long Ouyang, Jeff Wu et al. (OpenAI)

Demonstrated reward model + PPO pipeline for text; direct precursor to InstructGPT methodology.

2024 234 cited

A General Theoretical Paradigm to Understand Learning from Human Preferences

Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot et al. (DeepMind)

Unifies RLHF/DPO theoretically; Identity Preference Optimization fixes DPO overfitting issues.

2024 145 cited

KTO: Model Alignment as Prospect Theoretic Optimization

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela

Aligns LLMs using binary good/bad signal via Kahneman-Tversky prospect theory; no preference pairs needed.

Title	Authors	Year	Citations
Deep Reinforcement Learning from Human Preferences Foundational RLHF paper learning rewards from preference comparisons with ~1% feedback.	Paul Christiano, Jan Leike, Tom Brown et al.	2017	508
Training Language Models to Follow Instructions with Human Feedback (InstructGPT) 1.3B InstructGPT outperforms 175B GPT-3 on human preferences; foundation for ChatGPT.	Long Ouyang, Jeff Wu et al. (OpenAI)	2022	—
Constitutional AI: Harmlessness from AI Feedback RLAIF using AI self-critique against constitutional principles; Claude's training methodology.	Yuntao Bai et al. (Anthropic)	2022	—
Direct Preference Optimization (DPO) Eliminates reward model and RL loop; preference optimization via simple classification loss.	Rafael Rafailov, Archit Sharma, Eric Mitchell et al.	2023	—
Proximal Policy Optimization Algorithms Stable policy gradient algorithm with clipped objectives; THE optimizer underlying RLHF in all major LLMs.	John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov	2017	15678
Learning to Summarize from Human Feedback Demonstrated reward model + PPO pipeline for text; direct precursor to InstructGPT methodology.	Nisan Stiennon, Long Ouyang, Jeff Wu et al. (OpenAI)	2020	1456
A General Theoretical Paradigm to Understand Learning from Human Preferences Unifies RLHF/DPO theoretically; Identity Preference Optimization fixes DPO overfitting issues.	Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot et al. (DeepMind)	2024	234
KTO: Model Alignment as Prospect Theoretic Optimization Aligns LLMs using binary good/bad signal via Kahneman-Tversky prospect theory; no preference pairs needed.	Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela	2024	145

Preference Elicitation & Active Learning

Efficiently collect preference data from users

2009 567 cited

Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem

Yisong Yue, Thorsten Joachims

Introduced dueling bandits for pairwise preference learning; enables online learning without absolute labels.

2012 389 cited

The K-Armed Dueling Bandits Problem

Yisong Yue, Josef Broder, Robert Kleinberg, Thorsten Joachims

Extended dueling bandits to K arms with regret bounds; Interleaved Filter algorithm for search evaluation.

2021 178 cited

Preference-based Online Learning with Dueling Bandits: A Survey

Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier

Comprehensive 108-page survey of dueling bandits variants, algorithms, and applications.

2018 234 cited

Stagewise Safe Bayesian Optimization with Gaussian Processes

Yanan Sui, Vincent Zhuang, Joel Burdick, Yisong Yue

Safe preference-based optimization separating exploration from exploitation; applicable to clinical/robotics.

2015 678 cited

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Adith Swaminathan, Thorsten Joachims

Propensity-weighted learning from logged actions; foundation for offline policy learning in recommendations.

Title	Authors	Year	Citations
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem Introduced dueling bandits for pairwise preference learning; enables online learning without absolute labels.	Yisong Yue, Thorsten Joachims	2009	567
The K-Armed Dueling Bandits Problem Extended dueling bandits to K arms with regret bounds; Interleaved Filter algorithm for search evaluation.	Yisong Yue, Josef Broder, Robert Kleinberg, Thorsten Joachims	2012	389
Preference-based Online Learning with Dueling Bandits: A Survey Comprehensive 108-page survey of dueling bandits variants, algorithms, and applications.	Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier	2021	178
Stagewise Safe Bayesian Optimization with Gaussian Processes Safe preference-based optimization separating exploration from exploitation; applicable to clinical/robotics.	Yanan Sui, Vincent Zhuang, Joel Burdick, Yisong Yue	2018	234
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback Propensity-weighted learning from logged actions; foundation for offline policy learning in recommendations.	Adith Swaminathan, Thorsten Joachims	2015	678

Choice Modeling from Behavioral Data

Learn preferences from click logs and digital traces

2008 1234 cited

An Experimental Comparison of Click Position-Bias Models

Nick Craswell, Onno Zoeter, Michael Taylor, Bill Ramsey

Seminal click modeling paper; introduced cascade and position-based models; foundation for bias correction.

2009 987 cited

A Dynamic Bayesian Network Click Model for Web Search Ranking

Olivier Chapelle, Ya Zhang

DBN click model capturing examination chains and satisfaction; enables unbiased relevance estimation.

2017 756 cited

Unbiased Learning-to-Rank with Biased Feedback

Thorsten Joachims, Adith Swaminathan, Tobias Schnabel

Counterfactual framework for unbiased LTR; Propensity-Weighted Ranking SVM; highly influential for debiasing.

2015 456 cited

Click Models for Web Search

Aleksandr Chuklin, Ilya Markov, Maarten de Rijke

Comprehensive survey of click models, estimation methods, and applications to search evaluation.

Title	Authors	Year	Citations
An Experimental Comparison of Click Position-Bias Models Seminal click modeling paper; introduced cascade and position-based models; foundation for bias correction.	Nick Craswell, Onno Zoeter, Michael Taylor, Bill Ramsey	2008	1234
A Dynamic Bayesian Network Click Model for Web Search Ranking DBN click model capturing examination chains and satisfaction; enables unbiased relevance estimation.	Olivier Chapelle, Ya Zhang	2009	987
Unbiased Learning-to-Rank with Biased Feedback Counterfactual framework for unbiased LTR; Propensity-Weighted Ranking SVM; highly influential for debiasing.	Thorsten Joachims, Adith Swaminathan, Tobias Schnabel	2017	756
Click Models for Web Search Comprehensive survey of click models, estimation methods, and applications to search evaluation.	Aleksandr Chuklin, Ilya Markov, Maarten de Rijke	2015	456

Value Alignment & AI Safety

Ensure AI systems pursue intended objectives

2016 2345 cited

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Taxonomy of five safety problems: side effects, reward hacking, scalable oversight, safe exploration, distributional shift.

2022 234 cited

Goal Misgeneralization in Deep Reinforcement Learning

Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna et al. (DeepMind)

Demonstrates agents can pursue wrong goals even with correct specifications; distinct from reward hacking.

2023 189 cited

Scaling Laws for Reward Model Overoptimization

Leo Gao, John Schulman, Jacob Hilton

First systematic study of Goodhart's Law in RLHF; provides predictable scaling for safe optimization bounds.

2023 145 cited

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt (OpenAI)

Shows GPT-2 can supervise GPT-4; core empirical work on scalable oversight for superhuman AI alignment.

Title	Authors	Year	Citations
Concrete Problems in AI Safety Taxonomy of five safety problems: side effects, reward hacking, scalable oversight, safe exploration, distributional shift.	Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané	2016	2345
Goal Misgeneralization in Deep Reinforcement Learning Demonstrates agents can pursue wrong goals even with correct specifications; distinct from reward hacking.	Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna et al. (DeepMind)	2022	234
Scaling Laws for Reward Model Overoptimization First systematic study of Goodhart's Law in RLHF; provides predictable scaling for safe optimization bounds.	Leo Gao, John Schulman, Jacob Hilton	2023	189
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Shows GPT-2 can supervise GPT-4; core empirical work on scalable oversight for superhuman AI alignment.	Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt (OpenAI)	2023	145

Personalization & User Modeling

Learn and represent individual user preferences

2009 14567 cited

Matrix Factorization Techniques for Recommender Systems

Yehuda Koren, Robert Bell, Chris Volinsky

Netflix Prize winners' tutorial; latent factor models, implicit feedback, temporal dynamics; 14,000+ citations.

2008 5678 cited

Collaborative Filtering for Implicit Feedback Datasets

Yifan Hu, Yehuda Koren, Chris Volinsky

Weighted matrix factorization for clicks/views; confidence-weighted preference learning; industry standard.

2009 3456 cited

BPR: Bayesian Personalized Ranking from Implicit Feedback

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, Lars Schmidt-Thieme

Pairwise ranking optimization from Bayesian principles; first method optimizing ranking directly for implicit data.