Clustering & Segmentation

Group similar customers to personalize experiences • 61 papers

10 subtopics

Customer Segmentation

Group customers by behavior for targeting

1994

RFM Analysis for Customer Segmentation

Arthur Hughes

Recency-Frequency-Monetary framework still widely used in retail and e-commerce.

2009 27 cited

Customer Lifetime Value Segmentation

Peter Fader, Bruce Hardie

Probability models (BG/NBD) for CLV estimation enabling value-based segmentation.

2015

Spotify's Discover Weekly: Machine Learning Meets Human Curation

Spotify Engineering

Clustering taste profiles to power personalized playlists at scale.

2004 1987 cited

Customer Lifetime Value: Modeling and Recommendations

Rajkumar Venkatesan, V. Kumar

Framework for predicting individual-level CLV using past transaction data—foundational for value-based marketing.

1987 3456 cited

Counting Your Customers: Who Are They and What Will They Do Next?

David Schmittlein, Donald Morrison, Richard Colombo

Pareto/NBD model for customer counting and transaction prediction—still used at scale in industry.

2003 2345 cited

A Model of Customer Lifetime Value

Sunil Gupta, Donald Lehmann

Linking customer acquisition, retention, and expansion to CLV—connects marketing spend to lifetime value.
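
As a rough illustration of the Recency-Frequency-Monetary framework listed above, the sketch below builds an RFM table from raw transactions and turns each dimension into a rank-based score. The function names (`rfm_table`, `rank_scores`, `rfm_scores`), the three-bin scoring, and the toy transactions are assumptions made for this example and are not taken from any of the listed papers.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        if customer_id not in last_seen or purchase_date > last_seen[customer_id]:
            last_seen[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def rank_scores(values, n_bins=3, higher_is_better=True):
    """Map each key to a score in 1..n_bins by rank (n_bins is best)."""
    # Worst customers come first in the ordering, so they get score 1.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {k: 1 + (i * n_bins) // len(ordered) for i, k in enumerate(ordered)}


def rfm_scores(table, n_bins=3):
    """Return {customer_id: (r_score, f_score, m_score)}."""
    # A small recency (a recent purchase) is good, so the ranking is inverted.
    r = rank_scores({c: v[0] for c, v in table.items()}, n_bins, higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()}, n_bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, n_bins)
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical toy data: (customer_id, purchase_date, amount)
transactions = [
    ("a", date(2024, 1, 5), 20.0),
    ("a", date(2024, 3, 1), 35.0),
    ("b", date(2023, 11, 20), 200.0),
    ("c", date(2024, 2, 10), 15.0),
    ("c", date(2024, 2, 25), 10.0),
    ("c", date(2024, 3, 10), 5.0),
]
table = rfm_table(transactions, as_of=date(2024, 3, 31))
scores = rfm_scores(table)
print(scores)  # a -> (2, 2, 2), b -> (1, 1, 3), c -> (3, 3, 1)
```

Customer "b" illustrates why the three dimensions are kept separate: a single large but old purchase scores high on monetary value and low on recency and frequency.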

Classical Clustering Algorithms

Foundational partitioning, hierarchical, and density-based clustering algorithms

1979 14024 cited

Algorithm AS 136: A K-Means Clustering Algorithm

John Hartigan, Manchek Wong

Efficient k-means implementation that converges to a local optimum; still the default algorithm in R's kmeans().

1963 18547 cited

Hierarchical Grouping to Optimize an Objective Function

Joe Ward

Ward's method for agglomerative clustering minimizing within-cluster variance.

2012 612 cited

Scalable K-Means++

Bahman Bahmani et al.

Parallel k-means++ initialization (k-means||) that needs only a logarithmic number of passes over the data; the default initializer in Spark MLlib.

1996 32456 cited

A Density-Based Algorithm for Discovering Clusters (DBSCAN)

Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu

Density-based clustering finding arbitrarily shaped clusters and handling noise—foundational for spatial clustering.

1996 5678 cited

BIRCH: An Efficient Data Clustering Method for Very Large Databases

Tian Zhang, Raghu Ramakrishnan, Miron Livny

Hierarchical clustering using CF-trees for single-scan scalability—enables clustering of millions of records.

1999 6789 cited

OPTICS: Ordering Points To Identify the Clustering Structure

Mihael Ankerst, Markus Breunig, Hans-Peter Kriegel, Jörg Sander

Density-based ordering producing cluster hierarchy without fixed epsilon—extends DBSCAN for variable densities.

2002 12345 cited

Mean Shift: A Robust Approach Toward Feature Space Analysis

Dorin Comaniciu, Peter Meer

Non-parametric mode-seeking algorithm for clustering without specifying number of clusters.

2010 2345 cited

Web-Scale K-Means Clustering

David Sculley

Mini-batch k-means enabling streaming updates—Google's approach for web-scale clustering.
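
To make the mini-batch idea from the last entry concrete, here is a minimal sketch of the per-center update: each sampled point pulls its nearest center toward it with a learning rate of one over the number of points that center has absorbed so far. The function names, the explicit `init_centers` argument, and the toy grid data are assumptions for this illustration; a production setting would more likely use a library implementation.

```python
import random


def nearest(centers, x):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(centers[j], x)),
    )


def minibatch_kmeans(data, init_centers, batch_size=6, n_iters=50, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)  # points absorbed by each center so far
    for _ in range(n_iters):
        batch = rng.sample(data, batch_size)
        # Cache assignments first, then update, so that every point in the
        # batch is assigned against the same set of centers.
        assigned = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


# Hypothetical toy data: two 3x3 grids of points around (0, 0) and (10, 10).
offsets = (-0.5, 0.0, 0.5)
data = [
    (cx + dx, cy + dy)
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for dx in offsets
    for dy in offsets
]
centers = minibatch_kmeans(data, init_centers=[(1.0, 1.0), (8.0, 8.0)])
print(centers)
```

Because the learning rate is the reciprocal of the count, each center is the running mean of the points assigned to it, which is what lets the method process data in small batches or as a stream.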

Model-Based Clustering

Use probabilistic models for clustering

2000 7411 cited

Finite Mixture Models

Geoffrey McLachlan, David Peel

Comprehensive treatment of Gaussian and non-Gaussian mixture estimation.

1968 1446 cited

Latent Class Analysis

Paul Lazarsfeld, Neil Henry

Foundational discrete mixture model for categorical response patterns.

2006 1439 cited

Variational Inference for Dirichlet Process Mixtures

David Blei, Michael Jordan

Scalable non-parametric Bayesian clustering without specifying K.

2003 52000 cited

Latent Dirichlet Allocation

David Blei, Andrew Ng, Michael Jordan

Generative probabilistic model for topic modeling—foundational for discovering latent topics in text collections.

2006 5678 cited

Hierarchical Dirichlet Processes

Yee Whye Teh, Michael Jordan, Matthew Beal, David Blei

Non-parametric Bayesian approach for sharing clusters across grouped data—automatic topic number selection.

2002 8432 cited

Model-Based Clustering, Discriminant Analysis, and Density Estimation

Chris Fraley, Adrian Raftery

Gaussian mixture model framework with automatic model selection via BIC—implemented in R's mclust package.

Embedding-Based Clustering

Cluster using learned representations

2016 186 cited

Deep Embedded Clustering (DEC)

Junyuan Xie, Ross Girshick, Ali Farhadi

Joint representation learning and clustering via autoencoders.

2011 499 cited

Spectral Clustering and the High-Dimensional Stochastic Blockmodel

Karl Rohe, Sourav Chatterjee, Bin Yu

Theoretical foundation for spectral methods in network/embedding clustering.

2021 559 cited

Contrastive Clustering

Yunfan Li et al.

Self-supervised contrastive objectives for cluster-friendly representations.

2001 15678 cited

On Spectral Clustering: Analysis and an Algorithm

Andrew Ng, Michael Jordan, Yair Weiss

Foundational spectral clustering using Laplacian eigenvectors; presented at NIPS 2001 and widely implemented.

2020 2345 cited

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV)

Mathilde Caron et al.

Self-supervised visual learning via online clustering; state of the art among unsupervised image representations when published.

2020 987 cited

SCAN: Learning to Classify Images without Labels

Wouter Van Gansbeke et al.

Two-step unsupervised classification via representation learning then clustering—strong ImageNet results.

2017 1234 cited

Variational Deep Embedding (VaDE)

Zhuxi Jiang et al.

Clustering with a variational autoencoder whose latent prior is a Gaussian mixture, trained end to end.

2018 2567 cited

Deep Clustering for Unsupervised Learning of Visual Features (DeepCluster)

Mathilde Caron et al.

Iterative clustering and CNN training for unsupervised feature learning—Facebook AI's breakthrough.

Segmentation for Targeting

Create actionable segments for personalization

2021 66 cited

Heterogeneous Treatment Effects and Optimal Targeting

Susan Athey, Stefan Wager

Causal forests for estimating HTEs and deriving optimal targeting policies.

2012 870 cited

Uplift Modeling for Clinical Trial Data

Maciej Jaskowski, Szymon Jaroszewicz

Foundational uplift/treatment-effect modeling enabling segment-specific interventions.

2015

Personalization at Spotify Using Cassandra

Spotify Engineering

Large-scale user segmentation powering real-time recommendations.

2017 78 cited

Netflix Artwork Personalization

Netflix Tech Blog

Segment-based image selection improving engagement through visual personalization.

2010 4567 cited

A Contextual-Bandit Approach to Personalized News Article Recommendation (LinUCB)

Lihong Li et al.

Linear UCB algorithm for personalization at Yahoo—foundational contextual bandit for segment-based recommendations.

2016 234 cited

The Microsoft Decision Service

Alekh Agarwal et al.

Production system for personalization via contextual bandits—deployed across Microsoft products.

2014 345 cited

Online Clustering of Bandits

Claudio Gentile, Shuai Li, Giovanni Zappella

Dynamic user clustering for bandits—learns segment structure while optimizing recommendations.
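
The LinUCB entry above can be sketched as follows, assuming the disjoint variant in which each arm keeps its own ridge-regression estimate and adds a confidence bonus. The class and function names, the `alpha` value, and the toy reward rule are hypothetical. Maintaining the inverse matrix with the Sherman-Morrison identity is an implementation shortcut chosen here to keep the example free of dependencies; it is not something the listing attributes to the paper.

```python
import math


class LinUCBArm:
    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width."""
        theta = self._A_inv_times(self.b)
        A_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * ai for xi, ai in zip(x, A_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1}.
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms, x, alpha=1.0):
    """Index of the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x, alpha))


# Hypothetical toy environment: arm 0 pays off for context [1, 0],
# arm 1 pays off for context [0, 1].
contexts = [[1.0, 0.0], [0.0, 1.0]]
arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
for t in range(20):
    x = contexts[t % 2]
    a = choose(arms, x)
    reward = 1.0 if a == t % 2 else 0.0
    arms[a].update(x, reward)

picks = [choose(arms, x) for x in contexts]
print(picks)  # [0, 1]
```

In a segmentation setting the context vector would typically encode user or segment features, so the learned per-arm weights show which segments respond to which item.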

Music & Audio Clustering

Organize music and audio content for discovery and recommendations

2011 2345 cited

Million Song Dataset

Thierry Bertin-Mahieux et al.

Benchmark dataset for music analysis enabling audio feature clustering research at scale.

2002 1234 cited

Content-Based Music Information Retrieval: Current Directions and Future Challenges

Malcolm Slaney

Survey of audio feature extraction for music similarity and clustering—foundational for MIR.

2016 8765 cited

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord et al.

Deep generative model for audio—enables learned audio embeddings for clustering.

2015 3456 cited

librosa: Audio and Music Signal Analysis in Python

Brian McFee et al.

Standard Python library for audio feature extraction—MFCCs, spectrograms for clustering.

2016 987 cited

Automatic Tagging Using Deep Convolutional Neural Networks

Keunwoo Choi et al.

CNN for music auto-tagging enabling tag-based clustering and organization.

2018

Spotify's Audio Features and Track Analysis

Spotify Engineering

Audio analysis API powering playlist generation and music clustering at scale.

2012 567 cited

Music Genre Classification with the Million Song Dataset

Bob Sturm

Benchmark for genre classification—evaluates clustering approaches on real music data.

Video & Movie Clustering

Organize video content for streaming recommendations

2016 4567 cited

Deep Neural Networks for YouTube Recommendations

Paul Covington et al.

Two-stage architecture (candidate generation followed by ranking) that embeds users and videos for retrieval at YouTube scale.

2015 2345 cited

The Netflix Recommender System: Algorithms, Business Value, and Innovation

Carlos Gomez-Uribe, Neil Hunt

Overview of Netflix's recommendation system including movie clustering and personalization.

2016 1876 cited

YouTube-8M: A Large-Scale Video Classification Benchmark

Sami Abu-El-Haija et al.

8 million videos with labels for video understanding—benchmark for video clustering research.

2019 345 cited

Embarrassingly Shallow Autoencoders for Sparse Data (EASE)

Harald Steck

Simple but effective linear item-item collaborative filtering model with a closed-form solution; evaluated on datasets including the Netflix Prize data.

2009 12345 cited

Matrix Factorization Techniques for Recommender Systems

Yehuda Koren, Robert Bell, Chris Volinsky

Netflix Prize winning approach using latent factors—foundational for content clustering.

2018 876 cited

Variational Autoencoders for Collaborative Filtering

Dawen Liang et al.

VAE-based collaborative filtering—Netflix research on implicit feedback clustering.

Game & UGC Clustering

Organize games, user-generated content, and player experiences

2013 987 cited

Player Modeling in Video Games

Georgios N. Yannakakis et al.

Survey of player modeling techniques including behavior clustering and segmentation.

2015 234 cited

Predicting Player Churn in Video Games Using Survival Analysis and Clustering

Rafet Sifa et al.

Combining survival models with player clusters for churn prediction in games.

2006 1567 cited

Analyzing User Behavior in MMORPGs

Nick Yee

Foundational study of player motivations and behavioral clustering in online games.

2020 345 cited

Deep Learning for Video Game Content Generation

Julian Togelius et al.

Survey of ML for game content—includes UGC clustering and content organization.

2018 456 cited

Game Data Mining

Mohamed Medhat Gaber, Arkady Zaslavsky, Shonali Krishnaswamy

Comprehensive guide to mining game data including player segmentation techniques.

Text & Document Clustering

Organize text content and documents for search and discovery

2014 876 cited

A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering (GSDMM)

Jianhua Yin, Jianyong Wang

Collapsed Gibbs sampling for short text clustering—handles sparse, short documents like tweets.

2019 4567 cited

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Nils Reimers, Iryna Gurevych

Efficient sentence embeddings for semantic similarity—enables fast document clustering.

2021 123 cited

Self-Training with Contrastive Clustering for Short Text Clustering (STC2)

Ting Gu et al.

State-of-the-art short text clustering combining contrastive learning with self-training.

2012 2345 cited

A Survey of Text Clustering Algorithms

Charu Aggarwal, ChengXiang Zhai

Comprehensive survey of document clustering methods, from distance-based approaches on TF-IDF vectors to probabilistic topic models.

Visual Content Clustering

Organize images and visual content for discovery and search

2019 234 cited

Unifying Visual Embeddings for Visual Search at Pinterest

Andrew Zhai et al.

Multi-task visual embeddings for image clustering and similarity search at Pinterest scale.

2016 156789 cited

Deep Residual Learning for Image Recognition (ResNet)

Kaiming He et al.

ResNet architecture enabling powerful visual features for image clustering.

2021 12345 cited

Learning Transferable Visual Models From Natural Language Supervision (CLIP)

Alec Radford et al.

Vision-language model enabling zero-shot image classification from text prompts; its image embeddings are also widely used for clustering.

2020

Pinterest Visual Search: The Evolution and Beyond

Pinterest Engineering

Evolution of visual search and image clustering at Pinterest—practical lessons at scale.

Must-read papers for tech economists and applied researchers