Search & Ranking
Help users find exactly what they're looking for • 53 papers
Learning to Rank
Train models that order search results optimally
From RankNet to LambdaRank to LambdaMART: An Overview
Definitive reference unifying the RankNet family; LambdaMART remains the industry workhorse for gradient-boosted ranking.
Learning to Rank using Gradient Descent (RankNet)
Foundational pairwise neural ranking using cross-entropy loss; won the ICML Test of Time Award in 2015.
Learning to Rank: From Pairwise to Listwise (ListNet)
First listwise method using probability distributions over permutations; influenced all subsequent listwise methods.
Listwise Approach to Learning to Rank (ListMLE)
Foundational listwise LTR with theoretical analysis of listwise loss functions.
Optimizing Search Engines using Clickthrough Data
Seminal SVM-based pairwise LTR using click data for preference constraints; established click-based learning to rank paradigm.
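The pairwise cross-entropy loss behind RankNet is compact enough to sketch. A minimal illustration, not the paper's full training loop; the function name and the `sigma` shape parameter are assumptions:

```python
import math

def ranknet_pair_loss(s_i, s_j, sigma=1.0):
    """RankNet pairwise cross-entropy loss for a pair where item i
    should rank above item j; s_i and s_j are model scores."""
    # P(i ranked above j) is modeled as a sigmoid of the score difference;
    # with target probability 1, the cross-entropy reduces to a softplus.
    return math.log(1.0 + math.exp(-sigma * (s_i - s_j)))
```

Summing this loss over all labeled preference pairs gives the RankNet objective; the loss shrinks as the model scores the preferred item higher.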
Query Understanding
Figure out what users actually want from their searches
Understanding User Goals in Web Search
Seminal taxonomy of query intent (navigational, informational, transactional); foundational framework still used today.
Building Bridges for Web Query Classification
Foundational work on mapping queries to topic categories using intermediary data sources.
Deep Search Query Intent Understanding
Industrial-scale BERT-based intent classification for typeahead and search blending.
Relevance-Based Language Models
Foundational pseudo-relevance feedback work introducing relevance models, the basis of the widely used RM3 expansion in both traditional and neural retrieval.
Context-Sensitive Information Retrieval Using Implicit Feedback
Pioneered use of session context for query interpretation; demonstrated significant gains from implicit user feedback.
Few-Shot Generative Conversational Query Rewriting
Modern neural query reformulation using GPT-2 for conversational settings; addresses context carryover in multi-turn search.
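Pseudo-relevance feedback of the RM3 flavor can be sketched in a few lines. A minimal illustration assuming pre-tokenized documents and uniform document weights (real RM3 weights each feedback document by its retrieval score):

```python
from collections import Counter

def rm3_expand(query_terms, feedback_docs, lam=0.5, n_terms=5):
    """RM3-flavored pseudo-relevance feedback sketch: estimate a term
    distribution from the top-ranked (feedback) documents and
    interpolate it with the original query's term distribution."""
    # Term distribution over feedback documents, truncated to the
    # n_terms most frequent expansion terms.
    counts = Counter(t for doc in feedback_docs for t in doc)
    total = sum(counts.values())
    feedback_dist = {t: c / total for t, c in counts.most_common(n_terms)}

    # Original query as a uniform distribution over its terms.
    query_dist = {t: 1.0 / len(query_terms) for t in query_terms}

    # Interpolate: lam on the original query, (1 - lam) on feedback terms.
    return {t: lam * query_dist.get(t, 0.0) + (1 - lam) * feedback_dist.get(t, 0.0)
            for t in set(query_dist) | set(feedback_dist)}
```

The interpolation weight `lam` keeps the expanded query anchored to the original intent while admitting related terms from the feedback documents.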
Relevance vs. Engagement
Balance short-term clicks against long-term satisfaction
Deep Neural Networks for YouTube Recommendations
Landmark paper on watch-time optimization vs. clicks; explains freshness handling at scale.
Recommending What Video to Watch Next
Multi-objective ranking balancing engagement with satisfaction using multi-gate mixture-of-experts.
150 Successful ML Models: 6 Lessons at Booking.com
Influential industry paper on balancing business metrics vs. user value in production systems.
Engagement, User Satisfaction, and Divisive Content Amplification
Demonstrates empirically that engagement-based ranking underperforms for user satisfaction.
Modeling Dwell Time to Predict Click-level Satisfaction
Established context-dependent satisfaction thresholds beyond fixed dwell time cutoffs; showed satisfaction prediction requires query-document context.
Beyond Clicks: Query Reformulation as a Predictor of Search Satisfaction
Demonstrated satisfaction signals exist beyond clicks; query reformulation patterns predict user satisfaction better than clicks alone.
Position Bias & Debiasing
Account for the fact that top results get more clicks
An Experimental Comparison of Click Position-Bias Models
Seminal empirical study establishing cascade model as best explanation for user examination behavior.
Unbiased Learning-to-Rank with Biased Feedback
Foundational counterfactual/IPS framework for unbiased LTR; propensity-weighted ranking SVM.
Click Models for Web Search
Comprehensive synthesis covering all major click models (PBM, DCM, DBN, UBM); essential reference.
Unbiased LTR with Unbiased Propensity Estimation (DLA)
Dual learning algorithm jointly training ranking and propensity models; widely used for production debiasing.
Improving Deep Learning for Airbnb Search
Practical industry case on position bias correction: rank position is fed as a training feature with dropout and zeroed out at serving time; shows major production gains.
Addressing Trust Bias for Unbiased Learning-to-Rank
First rigorous treatment of trust bias in ULTR framework; extends counterfactual work to account for users trusting higher-ranked results more.
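The counterfactual/IPS idea running through these papers can be sketched as a propensity-weighted loss. The `1/rank` examination curve below is a stand-in assumption, not a fitted propensity model:

```python
import math

def ips_weighted_loss(clicks, positions, scores, propensity):
    """Inverse-propensity-scored loss sketch for counterfactual LTR:
    each clicked item is reweighted by 1 / P(examined at its position),
    so clicks observed at rarely-examined positions count for more."""
    total = 0.0
    for click, pos, s in zip(clicks, positions, scores):
        if click:
            # Logistic loss on the clicked item, debiased by
            # the examination propensity of its displayed position.
            total += math.log(1.0 + math.exp(-s)) / propensity(pos)
    return total

# Stand-in examination model: propensity decays with rank position.
inverse_rank = lambda pos: 1.0 / pos
```

Upweighting deep-position clicks is what removes the systematic advantage of items that happened to be shown at the top.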
Semantic Search & Embeddings
Match meaning, not just keywords
Dense Passage Retrieval for Open-Domain QA (DPR)
Foundational dual-encoder dense retrieval; outperforms BM25 by 9-19% absolute in top-20 accuracy; established the dense retrieval paradigm.
ColBERT: Efficient Passage Search via Late Interaction
Introduced late interaction achieving near cross-encoder quality with bi-encoder speed.
Passage Re-ranking with BERT
Transformative demonstration of BERT for passage re-ranking; 27% improvement on MS MARCO.
Real-time Personalization using Embeddings at Airbnb
Best Paper Award. Production embedding system powering search ranking and similar-listing recommendations, which drive 99% of Airbnb's booking conversions.
Sentence-BERT: Sentence Embeddings using Siamese Networks
Made BERT practical for dense retrieval with siamese architecture; widely used for semantic similarity.
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
Learned sparse representations via the MLM head with sparsity regularization; bridges the gap between dense and sparse retrieval with interpretable term weights.
RocketQA: An Optimized Training Approach to Dense Passage Retrieval
Critical training strategies for dense retrieval: cross-batch negatives, denoised hard negatives, and knowledge distillation from a cross-encoder; from Baidu.
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Global hard negative mining via ANN index (ANCE); addresses limitation of in-batch negatives by refreshing hard negatives from evolving model.
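The scoring regimes in this section differ mainly in granularity. A toy contrast between DPR-style single-vector scoring and ColBERT-style MaxSim, using plain Python lists as stand-in embeddings:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dense_score(q_vec, d_vec):
    """DPR-style dual-encoder scoring: one embedding per query and
    one per document, compared with a single dot product."""
    return dot(q_vec, d_vec)

def late_interaction_score(q_vecs, d_vecs):
    """ColBERT-style late interaction (MaxSim): each query token
    embedding is matched against its best document token embedding,
    and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in d_vecs) for q in q_vecs)
```

Keeping one vector per token is what lets late interaction preserve term-level matching signals that a single pooled vector averages away, at the cost of a larger index.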
Neural Ranking Models
Use deep learning for search relevance
Multi-Stage Document Ranking with BERT
Established the BERT-based cross-encoder reranking paradigm with monoBERT and duoBERT; basis of the later 'Expando-Mono-Duo' design pattern for neural IR pipelines.
From doc2query to docTTTTTquery
Neural document expansion via T5-generated queries; improves BM25 without runtime overhead by enriching documents at indexing time.
A Deep Relevance Matching Model for Ad-hoc Retrieval
Distinguished 'relevance matching' from 'semantic matching' in neural IR; histogram-based matching with term gating mechanism.
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Pioneered end-to-end neural ranking with interpretable soft-match kernels (K-NRM); spawned kernel-pooling variants used in production.
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Pre-training specifically for dense retrieval via Condenser architecture; coCondenser adds corpus-aware contrastive objective for further gains.
Retrieval-Augmented Generation
Combine search with generative AI
REALM: Retrieval-Augmented Language Model Pre-Training
First end-to-end pre-training of retrieval + LM with backpropagation through retrieval; foundational architecture for knowledge-intensive tasks.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The paper that coined the term 'RAG'; combines a pre-trained retriever with a seq2seq generator for open-domain QA. Foundation for modern RAG systems.
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Fusion-in-Decoder (FiD) architecture enabling efficient scaling to 100+ passages; backbone of Atlas and subsequent RAG systems.
Transformer Memory as a Differentiable Search Index
Generative retrieval paradigm (DSI): model memorizes corpus and generates document IDs directly; alternative to dense/sparse retrieve-then-rank.
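The retrieve-then-generate skeleton shared by these systems reduces to a few lines. The overlap scorer and prompt format are placeholder assumptions standing in for a real retriever and generator:

```python
def retrieve(query, corpus, scorer, k=3):
    """Retrieval half of a RAG pipeline sketch: rank the corpus by a
    pluggable relevance scorer (BM25, dense dot product, ...) and
    keep the top k passages."""
    ranked = sorted(corpus, key=lambda doc: scorer(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble retrieved passages into generator input. FiD-style
    systems encode each passage independently; a simple RAG prompt
    just concatenates them as context."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using the context.\n{context}\nQuestion: {query}"

# Toy lexical-overlap scorer (an assumption, standing in for a retriever).
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
```

Swapping the scorer is the whole design space: BM25 gives classic RAG-with-sparse-retrieval, a dual encoder gives DPR-backed RAG, and generating document IDs directly (DSI) removes the explicit index altogether.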
Evaluation Methods for IR
Measure search quality effectively
Cumulated Gain-Based Evaluation of IR Techniques
Introduced DCG and NDCG; most widely used ranking metric enabling graded relevance evaluation.
Expected Reciprocal Rank for Graded Relevance
Cascade-based metric modeling user stopping behavior; primary TREC Web Track metric accounting for diminishing returns.
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Large-scale passage ranking benchmark with 1M queries; enabled neural retrieval research and remains primary leaderboard.
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
18-dataset benchmark testing out-of-distribution generalization; key finding that BM25 remains robust while dense models struggle on domain shift.
Large-Scale Validation and Analysis of Interleaved Search Evaluation
Definitive validation of interleaving as a gold standard for online evaluation; demonstrated high agreement with editorial judgments at scale.
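DCG and NDCG from the first entry in this section are easy to compute. A sketch using the common exponential-gain variant (the original paper's linear-gain form differs only in the numerator):

```python
import math

def dcg(rels):
    """Discounted cumulative gain with the common exponential gain
    2^rel - 1 and log2 rank discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=None):
    """NDCG: DCG of the observed ranking divided by the DCG of the
    ideal (relevance-sorted) ordering, so 1.0 means a perfect ranking."""
    ideal = sorted(rels, reverse=True)
    if k is not None:
        rels, ideal = rels[:k], ideal[:k]
    idcg = dcg(ideal)
    return dcg(rels) / idcg if idcg > 0 else 0.0
```

The normalization by the ideal ordering is what makes scores comparable across queries with different numbers of relevant documents.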
Personalized & Conversational Search
Adapt search to individual users and context
Personalizing Search via Automated Analysis of Interests and Activities
Foundational personalization paper establishing paradigm for implicit user modeling from desktop and search history.
Modeling the Impact of Short- and Long-Term Behavior on Search Personalization
Established distinction between session-level and historical personalization signals; showed combination outperforms either alone.
Context-Aware Ranking in Web Search
Operationalized session and context signals in production LTR frameworks; showed how to integrate behavioral context into ranking features.
TREC CAsT 2019: The Conversational Assistance Track
Defining benchmark for conversational IR with 80 dialogues over 38M passages; established evaluation methodology for multi-turn search.
Open-Retrieval Conversational Question Answering
Open-domain retrieval for conversational QA; advances beyond simplified settings with ORConvQA dataset and baselines.
Result Diversification
Show varied results that cover different intents
The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries
The seminal diversification paper introducing the Maximal Marginal Relevance (MMR) criterion, which balances relevance with novelty.
Novelty and Diversity in Information Retrieval Evaluation
Standard evaluation framework for diversity introducing α-nDCG; enabled TREC Web diversity track and systematic diversity research.
Diversifying Search Results
Intent-aware diversification with formal coverage guarantees; models query as distribution over intents and optimizes expected coverage.
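The MMR criterion from the first paper in this section greedily trades relevance against redundancy. A minimal sketch; the dictionary-based similarity lookups are an assumed data layout:

```python
def mmr_rerank(query_sim, doc_sim, k, lam=0.7):
    """Greedy MMR re-ranking sketch: at each step select the document
    maximizing lam * relevance - (1 - lam) * (max similarity to any
    already-selected document); lam trades relevance against novelty."""
    candidates = set(query_sim)  # keys of the doc -> relevance mapping
    selected = []
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda d: lam * query_sim[d]
                   - (1 - lam) * max((doc_sim[d][s] for s in selected),
                                     default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam = 1.0` this degenerates to plain relevance ranking; lowering `lam` pushes near-duplicates of already-selected results down the list.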