Privacy

Protect user data while still enabling useful analysis • 52 papers

10 subtopics

Differential Privacy

Add mathematical privacy guarantees to data analysis

2006 6726 cited

Calibrating Noise to Sensitivity in Private Data Analysis

Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith

Foundational paper introducing ε-differential privacy, Laplace mechanism, and noise calibration to sensitivity.
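
The calibration rule is: noise scale = sensitivity / ε. Below is a minimal Python sketch of the Laplace mechanism. It assumes the query's global L1 sensitivity is already known. The function name is illustrative and is not taken from the paper.

```python
import random
from typing import Optional


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: Optional[random.Random] = None) -> float:
    """Return true_value plus Laplace(0, sensitivity / epsilon) noise.

    If adding or removing one record changes the query by at most `sensitivity`
    (L1), this release satisfies epsilon-differential privacy.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon  # noise calibrated to sensitivity
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


# A counting query has sensitivity 1; release it with epsilon = 0.5.
noisy_count = laplace_mechanism(1042, sensitivity=1.0, epsilon=0.5, rng=random.Random(7))
```

Halving ε doubles the noise scale. This is the privacy/accuracy trade-off the paper formalizes.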

2014 3827 cited

The Algorithmic Foundations of Differential Privacy

Cynthia Dwork, Aaron Roth

The definitive textbook covering DP techniques, composition, mechanism design, and ML applications.

2014

RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response

Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova

Google's practical local DP system deployed in Chrome with longitudinal privacy guarantees.
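
RAPPOR applies a permanent and an instantaneous randomization to Bloom-filter bits. Both steps build on classic randomized response. The sketch below shows only that single-bit primitive together with an unbiased aggregate estimator. It is not RAPPOR itself, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence


def randomized_response(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Keep the true bit with probability e^eps / (e^eps + 1), else flip it (epsilon-local DP)."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else not bit


def estimate_fraction(reports: Sequence[bool], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from randomized reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p) + f * (2p - 1); solve for the true fraction f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(1)
truth = [rng.random() < 0.3 for _ in range(100_000)]  # about 30% of users hold the attribute
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in truth]
estimate = estimate_fraction(reports, epsilon=1.0)
```

No single report reveals much about its sender. Because the server knows the flip probability, it can still debias the aggregate.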

2016 5249 cited

Deep Learning with Differential Privacy

Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Introduces DP-SGD algorithm and moments accountant for training neural networks with formal DP guarantees.
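
Below is a sketch of one DP-SGD update on plain Python lists. It assumes per-example gradients are already computed. The update clips each gradient to L2 norm C, sums them, adds Gaussian noise with standard deviation σC, and averages. The sketch leaves out the paper's lot sampling and the moments accountant that tracks cumulative privacy loss. All names are illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip(grad: Sequence[float], clip_norm: float) -> Vector:
    """Scale grad down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params: Sequence[float], per_example_grads: Sequence[Sequence[float]],
                lr: float, clip_norm: float, noise_multiplier: float,
                rng: random.Random) -> Vector:
    """Clip each example's gradient, sum, add N(0, (sigma * C)^2) per coordinate, average, step."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            summed[i] += g
    lot_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, noise_multiplier * clip_norm)) / lot_size
                  for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Clipping bounds each example's influence on the sum. This is what lets Gaussian noise of a fixed scale yield a DP guarantee per step.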

2015 34 cited

The Composition Theorem for Differential Privacy

Peter Kairouz, Sewoong Oh, Pramod Viswanath

Proves optimal composition bounds for differential privacy; essential for privacy budget management.
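
The exact optimal bound from Kairouz et al. has a more involved formula. For intuition, the sketch below compares two simpler bounds. The first is basic composition. The second is the advanced composition bound of Dwork, Rothblum and Vadhan, as stated in the Dwork and Roth textbook. The optimal theorem is tighter than both. Function names are hypothetical.

```python
import math
from typing import Tuple


def basic_composition(eps: float, delta: float, k: int) -> Tuple[float, float]:
    """k-fold composition of (eps, delta)-DP mechanisms: the parameters simply add up."""
    return k * eps, k * delta


def advanced_composition(eps: float, delta: float, k: int,
                         delta_slack: float) -> Tuple[float, float]:
    """Advanced composition: (eps', k * delta + delta_slack)-DP with
    eps' = sqrt(2k ln(1/delta_slack)) * eps + k * eps * (e^eps - 1)."""
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps
                 + k * eps * math.expm1(eps))
    return eps_total, k * delta + delta_slack


basic = basic_composition(0.1, 0.0, 100)                # epsilon grows linearly: 10.0
advanced = advanced_composition(0.1, 0.0, 100, 1e-5)    # epsilon about 5.85, roughly sqrt(k) growth
```

In a privacy budget, the choice of composition theorem determines how many queries fit under a fixed total ε.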

2017 1456 cited

Rényi Differential Privacy

Ilya Mironov

Defines RDP using Rényi divergence; cleaner composition analysis used in TensorFlow Privacy.
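
Here is an RDP accounting sketch for repeated Gaussian mechanisms. A Gaussian mechanism with L2 sensitivity Δ and noise σ satisfies RDP of order α with ε(α) = αΔ²/(2σ²). RDP values add under composition. The sum converts to (ε, δ)-DP via ε = ε_RDP(α) + ln(1/δ)/(α − 1), minimized over α. The α grid and the function names are assumptions. The code does not use the TensorFlow Privacy API, whose accountants also handle subsampling.

```python
import math
from typing import Iterable

DEFAULT_ALPHAS = tuple(1 + x / 10 for x in range(1, 1000))  # orders 1.1 .. 100.9


def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP of order alpha for the Gaussian mechanism: alpha * Delta^2 / (2 sigma^2)."""
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)


def rdp_to_dp(steps: int, sensitivity: float, sigma: float, delta: float,
              alphas: Iterable[float] = DEFAULT_ALPHAS) -> float:
    """Compose `steps` Gaussian mechanisms in RDP (values add), then convert to
    (eps, delta)-DP, minimizing eps_rdp(alpha) + ln(1/delta) / (alpha - 1) over alpha."""
    return min(steps * gaussian_rdp(a, sensitivity, sigma) + math.log(1 / delta) / (a - 1)
               for a in alphas)


eps_one = rdp_to_dp(steps=1, sensitivity=1.0, sigma=4.0, delta=1e-5)
eps_many = rdp_to_dp(steps=100, sensitivity=1.0, sigma=4.0, delta=1e-5)
```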

2019 567 cited

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Abhradeep Thakurta

Proves shuffle model bridges local/central DP gap; deployed in Apple and Google systems.

2018 456 cited

Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

Borja Balle, Gilles Barthe, Marco Gaboardi

Unified framework for subsampling privacy amplification; essential for DP-SGD analysis.
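
One well-known special case of subsampling amplification is sketched below. Run an ε-DP mechanism on a Poisson subsample in which each record is included independently with probability q. Under add/remove-one neighboring datasets, the result is ε' = ln(1 + q(e^ε − 1))-DP. The function name is hypothetical, and the paper's framework covers many more sampling schemes and divergences.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Privacy of an epsilon-DP mechanism run on a Poisson subsample with rate q:
    eps' = ln(1 + q * (e^eps - 1)), for add/remove-one neighbors."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    return math.log1p(q * math.expm1(epsilon))


full = amplified_epsilon(1.0, 1.0)    # no subsampling: recovers epsilon = 1
small = amplified_epsilon(1.0, 0.01)  # about 0.017
```

This is why DP-SGD can train on small random batches: each step's privacy cost shrinks roughly in proportion to q.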

2022 234 cited

Gaussian Differential Privacy

Jinshuo Dong, Aaron Roth, Weijie J. Su

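Introduces f-DP and Gaussian DP (GDP): f-DP composes losslessly, and a central limit theorem shows that composing many mechanisms converges to GDP; a foundation for modern privacy accounting.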
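
Below is a sketch of GDP accounting built on three facts from the paper. First, a Gaussian mechanism with sensitivity Δ and noise σ is μ-GDP with μ = Δ/σ. Second, composing μᵢ-GDP mechanisms gives √(Σμᵢ²)-GDP. Third, μ-GDP implies (ε, δ(ε))-DP with δ(ε) = Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2). Function names are hypothetical.

```python
import math
from typing import Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compose_gdp(mus: Sequence[float]) -> float:
    """Composing mu_i-GDP mechanisms yields sqrt(sum of mu_i^2)-GDP."""
    return math.sqrt(sum(m * m for m in mus))


def gdp_delta(mu: float, epsilon: float) -> float:
    """delta(eps) such that mu-GDP implies (eps, delta)-DP."""
    return (std_normal_cdf(-epsilon / mu + mu / 2)
            - math.exp(epsilon) * std_normal_cdf(-epsilon / mu - mu / 2))


# 100 Gaussian mechanisms, each with sensitivity 1 and sigma 10, i.e. 0.1-GDP each.
mu_total = compose_gdp([1.0 / 10.0] * 100)  # about 1.0
delta_at_eps1 = gdp_delta(mu_total, 1.0)     # about 0.127
```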

Federated Learning

Train models without centralizing user data

2017 5171 cited

Communication-Efficient Learning of Deep Networks from Decentralized Data

H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas

Seminal paper introducing Federated Learning and FedAvg algorithm; enables ML without raw data collection.
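
Below is a toy FedAvg round for a one-parameter least-squares model, y ≈ w·x. Clients train locally, and the server averages the resulting models weighted by local dataset size nₖ/n. For simplicity, this sketch assumes full client participation and full-batch gradient descent. The paper samples a fraction of clients each round and runs minibatch SGD. All names and data are illustrative.

```python
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one client


def local_update(w: float, data: Dataset, lr: float, epochs: int) -> float:
    """Client-side gradient descent on mean squared error for y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global: float, clients: Sequence[Dataset],
                 lr: float = 0.1, epochs: int = 5) -> float:
    """Clients train locally; the server averages models weighted by n_k / n.
    Only model parameters reach the server, never the raw (x, y) pairs."""
    updates = [(local_update(w_global, data, lr, epochs), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    return sum(w_k * n / total for w_k, n in updates)


clients = [[(1.0, 2.0), (2.0, 4.1)], [(0.5, 1.0), (1.5, 2.9), (3.0, 6.2)]]
w = 0.0
for _ in range(20):
    w = fedavg_round(w, clients)
# w settles near 2.04, the slope both clients' data roughly share
```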

2017 3002 cited

Practical Secure Aggregation for Privacy-Preserving Machine Learning

Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, et al.

Cryptographic protocol for aggregating model updates without seeing individual contributions.
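
The core idea of the protocol is pairwise additive masking. Each pair of clients shares a random mask. One client adds it and the other subtracts it, so all masks cancel in the server's sum. The real protocol derives masks through key agreement and uses secret sharing to survive client dropouts. This sketch omits both and simply draws masks from one RNG, an assumption made only for illustration.

```python
import random
from typing import Dict, List, Tuple

MODULUS = 2 ** 32  # assumption: updates are already quantized to integers mod 2^32
Masks = Dict[Tuple[int, int], List[int]]


def pairwise_masks(n_clients: int, dim: int, rng: random.Random) -> Masks:
    """One random mask vector per unordered client pair (i < j)."""
    return {(i, j): [rng.randrange(MODULUS) for _ in range(dim)]
            for i in range(n_clients) for j in range(i + 1, n_clients)}


def mask_update(client: int, update: List[int], masks: Masks) -> List[int]:
    """Client i adds m_ij for every j > i and subtracts m_ji for every j < i."""
    masked = list(update)
    for (i, j), m in masks.items():
        if client == i:
            masked = [(x + y) % MODULUS for x, y in zip(masked, m)]
        elif client == j:
            masked = [(x - y) % MODULUS for x, y in zip(masked, m)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """The server sums masked vectors; the pairwise masks cancel, leaving the true sum."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]


updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), dim=3, rng=random.Random(42))
masked = [mask_update(c, u, masks) for c, u in enumerate(updates)]
print(aggregate(masked))  # [111, 222, 333]
```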

2016 3046 cited

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

Introduces structured updates and sketching to reduce communication costs in federated settings.

2020 2345 cited

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Control variates for client drift under heterogeneity; tight convergence guarantees for non-IID data.
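
Below is a scalar sketch of SCAFFOLD's drift correction. Each local step uses g_i(y) − c_i + c in place of the raw gradient. Here c_i is the client's control variate and c is the server's. The new c_i is then computed from the local trajectory. The sketch assumes full participation and a global step size of 1. The quadratic losses and all names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Grad = Callable[[float], float]  # gradient of one client's local loss


def scaffold_client(x: float, c: float, c_i: float, grad: Grad,
                    lr: float, steps: int) -> Tuple[float, float]:
    """Run corrected local steps y <- y - lr * (g_i(y) - c_i + c).

    Returns (delta_y, delta_c_i), where c_i+ = c_i - c + (x - y) / (steps * lr).
    """
    y = x
    for _ in range(steps):
        y -= lr * (grad(y) - c_i + c)
    c_i_new = c_i - c + (x - y) / (steps * lr)
    return y - x, c_i_new - c_i


def scaffold_round(x: float, c: float, c_locals: List[float], grads: Sequence[Grad],
                   lr: float = 0.1, steps: int = 10,
                   global_lr: float = 1.0) -> Tuple[float, float, List[float]]:
    """Full participation: the server averages the model and control-variate deltas."""
    results = [scaffold_client(x, c, c_i, g, lr, steps) for c_i, g in zip(c_locals, grads)]
    new_c_locals = [c_i + dc for c_i, (_, dc) in zip(c_locals, results)]
    x += global_lr * sum(dy for dy, _ in results) / len(results)
    c += sum(dc for _, dc in results) / len(results)
    return x, c, new_c_locals


# Two heterogeneous clients with losses (y - 1)^2 and (y - 5)^2; the global optimum is 3.
grads = [lambda y: 2 * (y - 1), lambda y: 2 * (y - 5)]
x, c, c_locals = 0.0, 0.0, [0.0, 0.0]
for _ in range(50):
    x, c, c_locals = scaffold_round(x, c, c_locals, grads)
```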

2018 789 cited

Learning Differentially Private Recurrent Language Models

H. Brendan McMahan, Daniel Ramage, Kunal Talwar, Li Zhang

User-level DP for federated learning (DP-FedAvg); laid the groundwork for later DP training of Gboard next-word prediction models.

2021 567 cited

Ditto: Fair and Robust Federated Learning Through Personalization

Tian Li, Shengyuan Hu, Ahmad Beirami, Virginia Smith

Personalized FL addressing fairness and robustness via local regularization toward global model.

2021 4567 cited

Advances and Open Problems in Federated Learning

Peter Kairouz, H. Brendan McMahan, et al. (58 authors)

Definitive 210-page survey defining cross-device/cross-silo taxonomy and 50+ open problems.

Privacy-Preserving Measurement

Measure ad effectiveness while protecting privacy

2014

RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response

Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova

Foundation for privacy-preserving telemetry with local DP guarantees for aggregate statistics.

2018 247 cited

Scalable Private Learning with PATE

Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Úlfar Erlingsson

Scaled PATE framework with GNMax aggregation; achieves strong privacy (ε < 1) while maintaining utility.
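
Below is a sketch of the basic GNMax aggregator. It adds Gaussian noise N(0, σ²) to each class's teacher-vote count and releases the noisy argmax. The paper's Confident-GNMax variant adds a noisy threshold check before answering, and that check is omitted here. The vote data and names are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence


def gnmax(teacher_votes: Sequence[int], num_classes: int, sigma: float,
          rng: random.Random) -> int:
    """Gaussian NoisyMax: add N(0, sigma^2) to every class's vote count, return the argmax."""
    counts = Counter(teacher_votes)
    noisy = [counts[k] + rng.gauss(0.0, sigma) for k in range(num_classes)]
    return max(range(num_classes), key=lambda k: noisy[k])


votes = [2] * 180 + [0] * 15 + [1] * 5  # 200 hypothetical teachers labeling one query
label = gnmax(votes, num_classes=3, sigma=40.0, rng=random.Random(3))
```

When teachers agree strongly, the noisy argmax almost always matches the plurality. This is how PATE keeps utility at small ε.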

2017 1 cited

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (PATE)

Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar

Private Aggregation of Teacher Ensembles; enables privacy-preserving ML via knowledge transfer.

2023 23 cited

Interoperable Private Attribution: A Proposal

Benjamin Case, Richa Jain, Alex Koshelev, Andy Leiserson, Daniel Masny, Erik Taubeneck, Martin Thomson, et al.

MPC + DP protocol for cross-site attribution without user tracking; proposed by Meta and Mozilla to the W3C Private Advertising Technology Community Group (PATCG).

2022 34 cited

Ibex: Privacy-Preserving Ad Conversion Tracking and Bidding

Ke Zhong, Yiping Ma, Sebastian Angel

Encrypted conversion measurement and oblivious real-time bidding using MPC and secret sharing.

Synthetic Data

Generate fake data that preserves statistical properties

2019

Modeling Tabular Data using Conditional GAN (CTGAN)

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

State-of-the-art GAN for synthetic tabular data; handles mixed types with mode-specific normalization.

2017 1 cited

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (PATE)

Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar

The student model trains on public unlabeled data labeled by noisy teacher votes, inheriting the teacher ensemble's DP guarantee; the basis for PATE-GAN.

2019 456 cited

PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees

James Jordon, Jinsung Yoon, Mihaela van der Schaar

Combines PATE framework with GANs for DP synthetic data generation.

2017 678 cited

PrivBayes: Private Data Release via Bayesian Networks

Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao

Bayesian network approach to DP synthesis; a widely used baseline for synthetic data.

2021 234 cited

Winning the NIST Contest: A Scalable and General Approach to Differentially Private Synthetic Data

Ryan McKenna, Gerome Miklau, Daniel Sheldon

MST/Private-PGM marginal-based synthesis that won NIST DP synthetic data competition.

Anonymization & De-identification

Remove identifying information from datasets

2002 8284 cited

k-Anonymity: A Model for Protecting Privacy

Latanya Sweeney

Foundational paper introducing k-anonymity: each record is indistinguishable from at least k−1 others on its quasi-identifiers.
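
Below is a sketch of a k-anonymity check. It groups records by their quasi-identifier values and requires every group to contain at least k records. The table and its generalized ZIP and age columns are hypothetical, and producing a k-anonymous table through generalization is not shown.

```python
from collections import Counter
from typing import Dict, Sequence

Record = Dict[str, str]


def is_k_anonymous(records: Sequence[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if each combination of quasi-identifier values is shared by at least k records."""
    group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
print(is_k_anonymous(records, ["zip", "age"], k=3))  # True
print(is_k_anonymous(records, ["zip", "age"], k=4))  # False
```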

2008

Robust De-anonymization of Large Sparse Datasets (Netflix)

Arvind Narayanan, Vitaly Shmatikov

Landmark attack demonstrating re-identification of Netflix users; showed k-anonymity fails on high-dimensional data.

2007 3504 cited

ℓ-Diversity: Privacy Beyond k-Anonymity

Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, Muthuramakrishnan Venkitasubramaniam

Extends k-anonymity by requiring diversity in sensitive attributes; defends against homogeneity attacks.
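
The sketch below checks the simplest variant, distinct ℓ-diversity, which requires at least ℓ distinct sensitive values per quasi-identifier group. The paper's own instantiations, such as entropy ℓ-diversity, are stricter. In the hypothetical 3-anonymous table below, the second group is entirely "cancer". A homogeneity attack therefore reveals that attribute even though k-anonymity holds.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Record = Dict[str, str]


def is_distinct_l_diverse(records: Sequence[Record], quasi_identifiers: Sequence[str],
                          sensitive: str, ell: int) -> bool:
    """True if every quasi-identifier group holds at least `ell` distinct sensitive values."""
    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= ell for values in groups.values())


records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
print(is_distinct_l_diverse(records, ["zip", "age"], "disease", ell=2))  # False
```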

2007

t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity

Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian

Requires sensitive attribute distribution in each equivalence class be close to overall distribution.
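
The paper measures closeness with the Earth Mover's Distance. For a categorical attribute whose values are all equally far apart, EMD reduces to the variational distance ½·Σ|p − q|, and this sketch uses that case. Ordered attributes would need the paper's ordered-distance EMD. The table and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def max_group_distance(records: Sequence[Record], quasi_identifiers: Sequence[str],
                       sensitive: str) -> float:
    """Largest distance between a group's sensitive-value distribution and the overall one."""
    overall = Counter(r[sensitive] for r in records)
    n = len(records)
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    worst = 0.0
    for values in groups.values():
        local = Counter(values)
        m = len(values)
        # EMD with equal ground distance between categories = 0.5 * sum |p - q|.
        distance = 0.5 * sum(abs(local[v] / m - overall[v] / n) for v in overall)
        worst = max(worst, distance)
    return worst


def satisfies_t_closeness(records: Sequence[Record], quasi_identifiers: Sequence[str],
                          sensitive: str, t: float) -> bool:
    return max_group_distance(records, quasi_identifiers, sensitive) <= t


records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
worst = max_group_distance(records, ["zip", "age"], "disease")  # 1/3 for both groups
```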

2018 567 cited

Identity Inference of Genomic Data Using Long-Range Familial Searches

Yaniv Erlich, Tal Shor, Itsik Pe'er, Shai Carmi

Estimates that about 60% of searches for Americans of European descent would find a third cousin or closer relative in genetic genealogy databases such as GEDmatch, enough to re-identify them.

Secure Computation & MPC

Enable computation on private data using cryptographic protocols

1986 4567 cited

How to Generate and Exchange Secrets

Andrew C. Yao

Garbled circuits for secure two-party computation; foundational 2PC technique enabling oblivious computation.

1987 3456 cited

How to Play Any Mental Game (GMW Protocol)

Oded Goldreich, Silvio Micali, Avi Wigderson

GMW protocol proving completeness of secure MPC with honest majority; any function computable securely.

1988 2890 cited

Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation (BGW)

Michael Ben-Or, Shafi Goldwasser, Avi Wigderson

BGW protocol for information-theoretic MPC; winner of the Dijkstra Prize in Distributed Computing.
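
BGW builds on Shamir secret sharing. The parties hold points on a random polynomial whose constant term is the secret. Adding two shared values needs no communication, because each party adds its own shares. Multiplication requires an extra degree-reduction step, which this sketch omits. The field size and function names are assumptions.

```python
import random
from typing import List, Tuple

PRIME = 2 ** 61 - 1  # Mersenne prime field; an arbitrary choice for this sketch
Share = Tuple[int, int]  # (evaluation point x, polynomial value f(x))


def share(secret: int, threshold: int, n: int, rng: random.Random) -> List[Share]:
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    return [(x, sum(a * pow(x, e, PRIME) for e, a in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]


def reconstruct(shares: List[Share]) -> int:
    """Lagrange interpolation of f(0) over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Division via Fermat's little theorem: den^(p-2) is den^-1 mod p.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


rng = random.Random(0)
a_shares = share(20, threshold=3, n=5, rng=rng)
b_shares = share(22, threshold=3, n=5, rng=rng)
# Each party adds its two shares locally; no messages are exchanged for addition.
sum_shares = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
print(reconstruct(sum_shares[:3]))  # 42
```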

2009 8900 cited

Fully Homomorphic Encryption Using Ideal Lattices

Craig Gentry

First FHE construction enabling arbitrary computation on encrypted data; breakthrough cryptographic result.

2017 456 cited

Practical Multi-party Private Set Intersection from Symmetric-Key Techniques

Vladimir Kolesnikov, Naor Matania, Benny Pinkas, Mike Rosulek, Ni Trieu

Practical multi-party PSI using OPPRF; deployed for privacy-preserving ad matching and contact discovery.

Privacy Economics & Regulation

Understand economic effects of privacy regulation and data markets

2011 1234 cited

Privacy Regulation and Online Advertising

Avi Goldfarb, Catherine E. Tucker

EU privacy directive reduced ad effectiveness by 65%; first major empirical study of privacy regulation impact.

2021 345 cited

The Short-Run Effects of GDPR on Technology Venture Investment

Jian Jia, Ginger Zhe Jin, Liad Wagman

GDPR reduced EU tech venture investment by ~26%; rigorous diff-in-diff analysis of regulation effects.

2009 567 cited

Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records

Amalia R. Miller, Catherine Tucker

State privacy laws reduced EMR adoption by 24%; pioneering regulation-innovation tradeoff study.

ML Privacy Attacks

Understand privacy risks in machine learning systems

2017 3456 cited

Membership Inference Attacks Against Machine Learning Models

Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov

Foundational paper introducing shadow model-based membership inference; spawned entire research area.

2015 1890 cited

Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures

Matt Fredrikson, Somesh Jha, Thomas Ristenpart

Demonstrated recovery of training faces from facial recognition confidence scores; motivates output privacy.

2021 1234 cited

Extracting Training Data from Large Language Models

Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, et al.

First demonstration of verbatim training data extraction from GPT-2 including PII; motivated LLM safety research.

2019 1567 cited

Deep Leakage from Gradients

Ligeng Zhu, Zhijian Liu, Song Han

Gradient inversion reconstructs pixel-level training images from shared gradients; strengthened the case for secure aggregation and DP in federated learning.

2019 789 cited

Exploiting Unintended Feature Leakage in Collaborative Learning

Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov

Property inference attacks on federated learning; passive and active variants extract sensitive attributes.

PIR & Anonymous Systems

Access information without revealing query patterns

1998 2345 cited

Private Information Retrieval

Benny Chor, Oded Goldreich, Eyal Kushilevitz, Madhu Sudan

Foundational PIR paper; proves that single-server information-theoretic PIR requires linear communication and introduces multi-server PIR with sublinear communication.
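
The basic two-server idea can be sketched as follows. The client sends a uniformly random subset of indices to one server and the same subset with the target index toggled to the other. Each server returns the XOR of its selected bits, and XOR-ing the two answers yields the target bit. This simple version uses linear-size queries, whereas the paper's schemes reach sublinear communication. Names are illustrative, and the servers are assumed not to collude.

```python
import random
from typing import List, Set


def server_answer(database: List[int], query: Set[int]) -> int:
    """A server returns the XOR of the database bits at the queried positions."""
    answer = 0
    for idx in query:
        answer ^= database[idx]
    return answer


def two_server_pir(replica1: List[int], replica2: List[int], index: int,
                   rng: random.Random) -> int:
    """Retrieve bit `index` from two non-colluding servers holding the same database."""
    n = len(replica1)
    # Query 1 is a uniformly random subset; query 2 is the same subset with `index` toggled.
    s1 = {j for j in range(n) if rng.random() < 0.5}
    s2 = s1 ^ {index}
    # Each query alone is a uniformly random subset, so neither server learns `index`.
    # Every other position appears in both answers or in neither, so it cancels under XOR.
    return server_answer(replica1, s1) ^ server_answer(replica2, s2)


database = [1, 0, 1, 1, 0, 0, 1, 0]
rng = random.Random(5)
print([two_server_pir(database, database, i, rng) for i in range(len(database))])
# [1, 0, 1, 1, 0, 0, 1, 0]
```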

1981 5678 cited

Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms

David L. Chaum

Introduces mix networks; first practical solution to traffic analysis enabling anonymous communication.

2004 6789 cited

Tor: The Second-Generation Onion Router

Roger Dingledine, Nick Mathewson, Paul Syverson

Design of Tor with perfect forward secrecy, directory servers, hidden services; deployed to millions.

2007 456 cited

Improving the Robustness of Private Information Retrieval

Ian Goldberg

Byzantine-robust multi-server PIR; first practical open-source PIR implementation (Percy++).

Privacy-Preserving ML

Train and run ML models on encrypted or private data

2017 1234 cited

SecureML: A System for Scalable Privacy-Preserving Machine Learning

Payman Mohassel, Yupeng Zhang

First practical MPC-based neural network training system; enables ML on private data from multiple parties.

2019 456 cited

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

Florian Tramèr, Dan Boneh

Outsources linear layers from an Intel SGX enclave to an untrusted GPU, verifying results with Freivalds' algorithm and blinding inputs for privacy.
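
The verification step relies on Freivalds' algorithm. To check a claimed product C = A·B, compare A(Br) with Cr for a random vector r. This costs O(n²) per trial instead of the O(n³) needed to recompute the product. Slalom works over a finite field with quantized values, while this sketch uses plain integers for simplicity. Input blinding is not shown, and names are illustrative.

```python
import random
from typing import List

Matrix = List[List[int]]


def matvec(m: Matrix, v: List[int]) -> List[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def freivalds_check(a: Matrix, b: Matrix, c: Matrix, rng: random.Random,
                    trials: int = 2) -> bool:
    """Probabilistically check c == a @ b with matrix-vector products only."""
    for _ in range(trials):
        r = [rng.randrange(2 ** 32) for _ in range(len(b[0]))]
        # A wrong c passes a single trial with probability at most 2^-32.
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
claimed = [[19, 22], [43, 50]]   # correct product, as a GPU might return it
tampered = [[19, 22], [43, 51]]  # one corrupted entry
print(freivalds_check(a, b, claimed, random.Random(0)))   # True
print(freivalds_check(a, b, tampered, random.Random(0)))  # False
```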

2020 567 cited

Delphi: A Cryptographic Inference Service for Neural Networks

Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan, Wenting Zheng, Raluca Ada Popa

Up to 22× lower online latency for secure inference via ML-crypto co-design; hybrid HE + garbled circuits approach.

2022 123 cited

Iron: Private Inference on Transformers

Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, Guowen Xu, Tianwei Zhang

First efficient 2PC framework for BERT/GPT inference; specialized protocols for Softmax and GELU.

Must-read papers for tech economists and applied researchers