Privacy
Protect user data while still enabling useful analysis • 52 papers
Differential Privacy
Add mathematical privacy guarantees to data analysis
Calibrating Noise to Sensitivity in Private Data Analysis
Foundational paper introducing ε-differential privacy, the Laplace mechanism, and the idea of calibrating noise to a query's sensitivity.
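To make the mechanism concrete, here is a minimal sketch of the Laplace mechanism for a counting query, with noise scale sensitivity/ε as in the paper; the function name and the numpy-based implementation are illustrative, not from the paper.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release true_value with epsilon-DP by adding Laplace(sensitivity / epsilon) noise."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query changes by at most 1 when one record is added or removed,
# so its sensitivity is 1.
exact_count = 1234
print(laplace_mechanism(exact_count, sensitivity=1, epsilon=0.5))
```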
The Algorithmic Foundations of Differential Privacy
The definitive textbook covering DP techniques, composition, mechanism design, and ML applications.
RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
Google's practical local DP system deployed in Chrome with longitudinal privacy guarantees.
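The core primitive behind RAPPOR is randomized response applied to each Bloom-filter bit. A simplified one-bit sketch, ignoring RAPPOR's two-stage permanent/instantaneous design and Bloom-filter encoding; the function names are illustrative:

```python
import random

def randomized_response(bit, f=0.5):
    """Report the bit truthfully with probability 1 - f; otherwise report a fair coin."""
    if random.random() < f:
        return random.random() < 0.5
    return bit

def estimate_rate(reports, f=0.5):
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - f / 2) / (1 - f)

reports = [randomized_response(b) for b in [1] * 300 + [0] * 700]
print(estimate_rate(reports))  # close to the true rate of 0.3
```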
Deep Learning with Differential Privacy
Introduces the DP-SGD algorithm and the moments accountant for training neural networks with formal DP guarantees.
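A minimal numpy sketch of one DP-SGD step (per-example clipping plus Gaussian noise); a real implementation would also track the privacy loss with the moments accountant, and all names here are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr,
                rng=np.random.default_rng()):
    """Clip each example's gradient to clip_norm, sum, add Gaussian noise scaled
    to the clipping norm, then take an averaged gradient step."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)

params = np.zeros(3)
grads = [np.array([0.5, 2.0, -1.0]), np.array([4.0, 0.0, 0.0])]
print(dp_sgd_step(params, grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1))
```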
The Composition Theorem for Differential Privacy
Proves optimal composition bounds for differential privacy; essential for privacy budget management.
Rényi Differential Privacy
Defines Rényi differential privacy via Rényi divergence; yields cleaner composition analysis and underlies accounting in TensorFlow Privacy.
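For a sense of how RDP accounting works in practice, a small sketch using the standard Gaussian-mechanism RDP bound ε(α) = αΔ²/(2σ²), additive composition, and the usual conversion back to (ε, δ)-DP; the helper names are illustrative.

```python
import math

def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP of order alpha for the Gaussian mechanism: alpha * Delta^2 / (2 * sigma^2)."""
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)

def rdp_to_dp(rdp_eps, alpha, delta):
    """Convert (alpha, rdp_eps)-RDP to (eps, delta)-DP: eps = rdp_eps + log(1/delta) / (alpha - 1)."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)

# RDP composes by simple addition across releases; in practice one would
# minimize the converted epsilon over a grid of alpha values.
alpha, sigma, releases = 8, 10.0, 100
total_rdp = releases * gaussian_rdp(alpha, sigma)
print(rdp_to_dp(total_rdp, alpha, delta=1e-5))
```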
Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity
Proves shuffle model bridges local/central DP gap; deployed in Apple and Google systems.
Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences
Unified framework for subsampling privacy amplification; essential for DP-SGD analysis.
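The classical pure-DP form of the amplification statement is easy to state in code (this paper's contribution is the tight, general analysis via couplings and divergences, not this simple bound):

```python
import math

def amplified_epsilon(eps, q):
    """Running an eps-DP mechanism on a random q-fraction of the data
    satisfies log(1 + q * (exp(eps) - 1))-DP."""
    return math.log(1 + q * (math.exp(eps) - 1))

# For small sampling rates the effective epsilon shrinks roughly in proportion to q.
print(amplified_epsilon(eps=1.0, q=0.01))  # ~0.017
```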
Gaussian Differential Privacy
Introduces f-DP and Gaussian DP with lossless composition via a central limit theorem; a modern foundation for privacy accounting.
Federated Learning
Train models without centralizing user data
Communication-Efficient Learning of Deep Networks from Decentralized Data
Seminal paper introducing federated learning and the FedAvg algorithm; enables model training without collecting raw data.
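The aggregation step at the heart of FedAvg is an example-count-weighted average of client models; a minimal sketch, with all names illustrative:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists, weighted by local dataset size."""
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [sum(n / total * w[layer] for w, n in zip(client_weights, client_sizes))
            for layer in range(num_layers)]

# Two clients, a one-layer "model": the client with more data dominates the average.
clients = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
print(fedavg(clients, client_sizes=[10, 30]))  # [array([2.5, 2.5])]
```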
Practical Secure Aggregation for Privacy-Preserving Machine Learning
Cryptographic protocol for aggregating model updates without seeing individual contributions.
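The key idea is pairwise masking: each pair of clients shares a mask that one adds and the other subtracts, so the masks cancel in the server's sum. A toy sketch of just that cancellation (the real protocol derives masks from key agreement and secret-shares them to survive dropouts; all names are illustrative):

```python
import random

MOD = 2 ** 32  # masked values live in a finite group so masks look uniform

def masked_update(value, client_id, clients, pairwise_masks):
    """Add the shared mask for each peer; the lower-id client adds, the higher-id subtracts."""
    masked = value
    for peer in clients:
        if peer == client_id:
            continue
        mask = pairwise_masks[frozenset((client_id, peer))]
        masked += mask if client_id < peer else -mask
    return masked % MOD

clients = [0, 1, 2]
values = {0: 5, 1: 7, 2: 9}
pairwise_masks = {frozenset(p): random.randrange(MOD) for p in [(0, 1), (0, 2), (1, 2)]}

masked = [masked_update(values[c], c, clients, pairwise_masks) for c in clients]
# The server sees only masked updates, yet their sum equals the true sum.
print(sum(masked) % MOD, sum(values.values()) % MOD)
```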
Federated Learning: Strategies for Improving Communication Efficiency
Introduces structured updates and sketching to reduce communication costs in federated settings.
SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Control variates for client drift under heterogeneity; tight convergence guarantees for non-IID data.
Learning Differentially Private Recurrent Language Models
User-level DP for federated learning; production deployment in Gboard next-word prediction.
Ditto: Fair and Robust Federated Learning Through Personalization
Personalized FL addressing fairness and robustness via local regularization toward global model.
Advances and Open Problems in Federated Learning
Definitive 210-page survey defining cross-device/cross-silo taxonomy and 50+ open problems.
Privacy-Preserving Measurement
Measure ad effectiveness while protecting privacy
RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
Foundation for privacy-preserving telemetry with local DP guarantees for aggregate statistics.
Scalable Private Learning with PATE
Scaled PATE framework with GNMax aggregation; achieves strong privacy (ε < 1) while maintaining utility.
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (PATE)
Private Aggregation of Teacher Ensembles; enables privacy-preserving ML via knowledge transfer.
Interoperable Private Attribution: A Proposal
MPC + DP protocol for cross-site attribution without user tracking; W3C Privacy CG proposal by Meta/Mozilla.
Ibex: Privacy-Preserving Ad Conversion Tracking and Bidding
Encrypted conversion measurement and oblivious real-time bidding using MPC and secret sharing.
Synthetic Data
Generate fake data that preserves statistical properties
Modeling Tabular Data using Conditional GAN (CTGAN)
State-of-the-art GAN for synthetic tabular data; handles mixed types with mode-specific normalization.
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (PATE)
PATE trains a student on public unlabeled data with labels privately aggregated from a teacher ensemble; the basis for PATE-style DP synthetic data generation.
PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees
Combines PATE framework with GANs for DP synthetic data generation.
PrivBayes: Private Data Release via Bayesian Networks
Bayesian network approach to DP synthesis; widely deployed baseline for synthetic data.
Winning the NIST Contest: A Scalable Approach to DP Synthetic Data
MST/Private-PGM marginal-based synthesis that won NIST DP synthetic data competition.
Anonymization & De-identification
Remove identifying information from datasets
k-Anonymity: A Model for Protecting Privacy
Foundational paper introducing k-anonymity; each record is indistinguishable from at least k-1 others on its quasi-identifiers.
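Checking the property is straightforward: group records by their (generalized) quasi-identifier values and take the smallest group. A small illustrative sketch:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns;
    the table is k-anonymous for any k up to this value."""
    groups = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age": "40-49", "diagnosis": "diabetes"},
]
print(k_anonymity(records, quasi_identifiers=["zip", "age"]))  # 2
```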
Robust De-anonymization of Large Sparse Datasets (Netflix)
Landmark attack demonstrating re-identification of Netflix Prize users; showed that anonymization fails on high-dimensional sparse data.
ℓ-Diversity: Privacy Beyond k-Anonymity
Extends k-anonymity by requiring diversity in sensitive attributes; defends against homogeneity attacks.
t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity
Requires sensitive attribute distribution in each equivalence class be close to overall distribution.
Identity Inference of Genomic Data Using Long-Range Familial Searches
Shows that roughly 60% of Americans of European descent can be re-identified through long-range familial searches in genealogy databases such as GEDmatch.
Secure Computation & MPC
Enable computation on private data using cryptographic protocols
How to Generate and Exchange Secrets
Garbled circuits for secure two-party computation; foundational 2PC technique enabling oblivious computation.
How to Play Any Mental Game (GMW Protocol)
GMW protocol and completeness theorem for secure MPC: any efficiently computable function can be computed securely, with honest-majority security against malicious adversaries.
Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation (BGW)
BGW protocol for information-theoretic MPC; 2023 Dijkstra Prize winner for seminal distributed computing paper.
Fully Homomorphic Encryption Using Ideal Lattices
First FHE construction enabling arbitrary computation on encrypted data; breakthrough cryptographic result.
Practical Multi-party Private Set Intersection from Symmetric-Key Techniques
Practical multi-party PSI using OPPRF; deployed for privacy-preserving ad matching and contact discovery.
Privacy Economics & Regulation
Understand economic effects of privacy regulation and data markets
Privacy Regulation and Online Advertising
Finds the EU's ePrivacy Directive reduced ad effectiveness by roughly 65%; the first major empirical study of privacy regulation's impact on advertising.
The Short-Run Effects of GDPR on Technology Venture Investment
GDPR reduced EU tech venture investment by roughly 26%; rigorous difference-in-differences analysis of the regulation's effects.
Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records
State privacy laws reduced EMR adoption by 24%; pioneering regulation-innovation tradeoff study.
ML Privacy Attacks
Understand privacy risks in machine learning systems
Membership Inference Attacks Against Machine Learning Models
Foundational paper introducing shadow-model-based membership inference; spawned an entire research area.
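The underlying signal is that models behave differently on their training data; the simplest baseline built on that signal is a confidence threshold (the paper's attack instead trains shadow models and an attack classifier over confidence vectors). A toy sketch with made-up scores:

```python
import numpy as np

def membership_guess(confidence_on_true_label, threshold=0.9):
    """Guess 'member' when the target model is unusually confident on the true label."""
    return confidence_on_true_label >= threshold

# Hypothetical confidence scores of a target model on members vs. non-members.
members = np.array([0.99, 0.97, 0.95, 0.92])
non_members = np.array([0.88, 0.60, 0.75, 0.93])
print("TPR", membership_guess(members).mean(), "FPR", membership_guess(non_members).mean())
```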
Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
Demonstrates recovery of recognizable training faces from a facial-recognition model's confidence scores; motivates limiting information in model outputs.
Extracting Training Data from Large Language Models
First demonstration of verbatim training data extraction from GPT-2 including PII; motivated LLM safety research.
Deep Leakage from Gradients
Gradient inversion reconstructs pixel-perfect training images from gradients; motivated secure aggregation.
Exploiting Unintended Feature Leakage in Collaborative Learning
Property inference attacks on federated learning; passive and active variants extract sensitive attributes.
PIR & Anonymous Systems
Access information without revealing query patterns
Private Information Retrieval
Foundational PIR paper; proves single-server information-theoretic PIR requires linear communication and introduces multi-server schemes.
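A toy two-server information-theoretic PIR over a bit database shows the idea: the client sends each server a random subset of indices, the subsets differing only at the target position, and XORs the answers; neither server alone learns anything about the query. All names are illustrative.

```python
import secrets

def xor_pir(database, index):
    """2-server PIR for a database of bits; the two servers' answers XOR to database[index]."""
    n = len(database)
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
    q2 = q1.copy()
    q2[index] ^= 1                                 # differs only at the queried index

    def answer(query):                             # a server XORs the selected bits
        bit = 0
        for d, sel in zip(database, query):
            if sel:
                bit ^= d
        return bit

    return answer(q1) ^ answer(q2)

db = [0, 1, 1, 0, 1, 0, 0, 1]
print(xor_pir(db, index=2), db[2])  # both 1
```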
Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms
Introduces mix networks; first practical solution to traffic analysis enabling anonymous communication.
Tor: The Second-Generation Onion Router
Design of Tor with perfect forward secrecy, directory servers, hidden services; deployed to millions.
Improving the Robustness of Private Information Retrieval
Byzantine-robust multi-server PIR; first practical open-source PIR implementation (Percy++).
Privacy-Preserving ML
Train and run ML models on encrypted or private data
SecureML: A System for Scalable Privacy-Preserving Machine Learning
First practical MPC-based neural network training system; enables ML on private data from multiple parties.
Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
Efficient DNN execution in Intel SGX with cryptographic verification of correct computation.
Delphi: A Cryptographic Inference Service for Neural Networks
22× faster secure inference via ML-crypto co-design; hybrid HE + garbled circuits approach.
Iron: Private Inference on Transformers
First efficient 2PC framework for BERT/GPT inference; specialized protocols for Softmax and GELU.