Cold Start in Recommendations

How to bootstrap personalization when you have no data

The Problem

Every recommendation system faces an existential crisis at birth: how do you recommend anything when you know nothing about your users?

This is the cold start problem—one of the most fundamental challenges in personalization. It appears in three forms:

Three Types of Cold Start

  • New User: A user with no interaction history
  • New Item: A product/content with no ratings yet
  • New System: A platform launching from scratch

Netflix, Spotify, and Amazon have each developed their own ways of dealing with it. Andrew Chen, a general partner at Andreessen Horowitz, frames cold start more broadly: it isn’t just a technical problem, it’s a network effects problem.

In his book The Cold Start Problem, Chen describes the “atomic network”: the smallest group of users that makes a product useful on its own. For recommendations, the analogue is the minimum amount of interaction data needed before collaborative patterns become detectable.

Common Approaches

1. Content-Based Bootstrapping

Use item features (genre, tags, description) to make initial recommendations without interaction history. Spotify is a commonly cited example: it analyzes audio features like tempo, energy, and acousticness to find similar songs, which helps most when listening data is thin.

The advantage: you can recommend new items immediately based on their metadata. The downside: you miss the serendipity that comes from collaborative patterns.
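
As a rough illustration of content-based bootstrapping, the sketch below ranks items by cosine similarity over their feature vectors. The feature values and the helper names (cosine_similarity_matrix, similar_items) are made up for this example and do not come from any particular library or company.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    return normalized @ normalized.T

def similar_items(features, seed_item, top_k=3):
    """Rank items by feature similarity to a seed item, excluding the seed itself."""
    sims = cosine_similarity_matrix(features)[seed_item].copy()
    sims[seed_item] = -np.inf
    return np.argsort(-sims)[:top_k]

# Hypothetical features, columns = [tempo, energy, acousticness], scaled to 0..1
item_features = np.array([
    [0.90, 0.80, 0.10],  # item 0: fast, energetic
    [0.85, 0.90, 0.05],  # item 1: fast, energetic
    [0.20, 0.10, 0.90],  # item 2: slow, acoustic
    [0.25, 0.20, 0.95],  # item 3: slow, acoustic, imagine it was added today
])

# Item 3 has no interactions, yet it can still be matched to its neighbours
neighbours = similar_items(item_features, seed_item=3, top_k=2)
print(neighbours)
```

No interaction data is involved at any point, which is why this works on day one and also why it cannot surface anything the features do not already describe.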

2. Hybrid Models

Combine collaborative filtering with content features. LightFM is the go-to library for this approach:

“LightFM can produce good results even for new users or items through its incorporation of side information.” — Maciej Kula, Lyst Engineering

The key innovation: LightFM learns an embedding for every feature and represents each user and item as the sum of its features’ embeddings. When a new item arrives, it can immediately be placed in the embedding space based on its features alone.
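
To make the embedding idea concrete, here is a minimal NumPy sketch of placing a new item in the embedding space from its features alone. It is not LightFM’s code: the random matrices stand in for learned parameters, and the names (feature_embeddings, item_embedding) are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, dim = 5, 4

# Stand-in for learned feature embeddings: one row per feature (e.g. per genre)
feature_embeddings = rng.normal(size=(n_features, dim))

def item_embedding(feature_row, feature_embeddings):
    """Represent an item as the weighted sum of its features' embeddings."""
    return feature_row @ feature_embeddings

# A new item with zero interactions, tagged with features 1 and 3
new_item_features = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
new_item_vec = item_embedding(new_item_features, feature_embeddings)

# Stand-in user embedding; the affinity score is the usual dot product
user_vec = rng.normal(size=dim)
score = float(user_vec @ new_item_vec)
print(f"score for the new item: {score:.3f}")
```

The new item never appears in any interaction, but because its features were seen during training, it gets a usable vector straight away.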

3. Exploration Strategies

Use multi-armed bandits to actively learn preferences through controlled exploration. Netflix’s onboarding for new profiles follows the same spirit: it shows a diverse set of titles and uses the responses to learn your taste quickly.

The explore-exploit tradeoff: show what you think the user likes (exploit) vs. show something new to learn more (explore). Thompson Sampling and UCB are popular algorithms for this balance.
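
The tradeoff can be sketched with Beta-Bernoulli Thompson Sampling. This is a toy simulation under stated assumptions: each “arm” is a candidate item, a reward of 1 means the user engaged, and the true click rates are invented for the demo.

```python
import numpy as np

def thompson_sampling(true_click_rates, n_rounds=5000, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; return how often each arm was shown."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_click_rates)
    successes = np.ones(n_arms)  # Beta(1, 1) prior on every arm
    failures = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(n_rounds):
        # Draw one plausible click rate per arm from its posterior
        sampled = rng.beta(successes, failures)
        arm = int(np.argmax(sampled))  # show the arm that looks best in this draw

        # Simulated user feedback (1.0 = engaged, 0.0 = ignored)
        reward = float(rng.random() < true_click_rates[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1

    return pulls

# Hypothetical click rates for three candidate items
pulls = thompson_sampling([0.05, 0.10, 0.30])
print(pulls)
```

Uncertain arms have wide posteriors, so they occasionally produce a high sample and get shown (explore); as evidence accumulates, the best arm wins most draws (exploit).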

4. Transfer Learning

Use knowledge from related domains. If you know a user’s music preferences, you might infer something about their podcast preferences. Spotify does this across their audio products.
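
One simple way to sketch cross-domain transfer is to learn a linear map between two embedding spaces from users who are active in both. Everything below is synthetic and illustrative: the “music” and “podcast” vectors are random stand-ins, and this is not a description of how Spotify or anyone else does it.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_music, dim_podcast, n_overlap = 8, 4, 200

# Synthetic stand-ins for users who have embeddings in both domains
music_vecs = rng.normal(size=(n_overlap, dim_music))
true_map = rng.normal(size=(dim_music, dim_podcast))
noise = 0.01 * rng.normal(size=(n_overlap, dim_podcast))
podcast_vecs = music_vecs @ true_map + noise

# Fit the music -> podcast map by ordinary least squares
learned_map, *_ = np.linalg.lstsq(music_vecs, podcast_vecs, rcond=None)

# A user with music history only: estimate where they sit in podcast space
new_user_music = rng.normal(size=dim_music)
predicted_podcast_vec = new_user_music @ learned_map
print(predicted_podcast_vec.round(3))
```

The predicted vector can then be scored against podcast item embeddings as if the user had podcast history, which turns a cold start in one domain into a warm start.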

Try It Yourself

Here’s a minimal example using LightFM with the MovieLens dataset. Install with pip install lightfm:

from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

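# Load data with item features (per-item indicators plus genre tags).
# genre_features=True is required: by default only indicator features are returned.
data = fetch_movielens(min_rating=4.0, genre_features=True)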

# Create hybrid model - 'warp' loss works well for implicit feedback
model = LightFM(loss='warp', no_components=64)

# Train on interactions + item features
# The item_features parameter is what enables cold-start!
model.fit(data['train'],
          item_features=data['item_features'],
          epochs=30,
          num_threads=4)

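# Evaluate (the model was trained with item features, so pass them here too)
train_precision = precision_at_k(model, data['train'],
                                 item_features=data['item_features'], k=5).mean()
test_precision = precision_at_k(model, data['test'],
                                item_features=data['item_features'], k=5).mean()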

print(f"Train Precision@5: {train_precision:.3f}")
print(f"Test Precision@5: {test_precision:.3f}")

Key insight: The item_features parameter is what enables cold-start recommendations. New items with known features (genres) can be scored and recommended immediately, even with zero ratings.

New users are a different story. A user who was not in the training data has no learned embedding, so without user features the model can only fall back on its learned item biases, which behave like a popularity ranking:

import numpy as np

def recommend_for_new_user(model, item_features, n_items, top_k=10):
    """
    Generate fallback recommendations for a brand new user.

    model.predict() cannot score a user ID that was absent from training,
    and user ID 0 is an existing user, not a new one. With no user
    features, the usable signal is the learned item biases.
    """
    # Each item's bias is the sum of the biases of its features
    item_biases, _ = model.get_item_representations(features=item_features)
    scores = item_biases[:n_items]

    # Return top-k items
    top_items = np.argsort(-scores)[:top_k]
    return top_items, scores[top_items]

# Get recommendations for a cold-start user
n_items = data['train'].shape[1]
recommendations, scores = recommend_for_new_user(
    model,
    data['item_features'],
    n_items,
    top_k=10
)

print("Top 10 recommendations for new user:")
for i, (item_id, score) in enumerate(zip(recommendations, scores)):
    print(f"  {i+1}. Item {item_id} (score: {score:.3f})")

Real-World Applications

How each company solves cold start:

  • Spotify: Uses audio features (tempo, energy, acousticness, danceability) to recommend songs to new users before learning their taste. Their “taste profiles” are built from 30-second listening segments.
  • Amazon: Leverages product categories, “customers also bought” patterns, and browse history to bootstrap recommendations. New products get initial visibility through category placement.
  • Netflix: Asks new users to rate ~10 titles during onboarding, a classic exploration strategy. They also use content features like cast, director, and genre for new releases.
  • Uber Eats: Uses location, time-of-day, and cuisine popularity to recommend restaurants before knowing individual preferences. Your first order strongly shapes future recommendations.
  • TikTok: Their “interest graph” starts with your country and device. The first few videos are diverse; your engagement (watch time, not just likes) rapidly personalizes the feed.
  • LinkedIn: Uses your job title, company, and connections to recommend jobs and content. Professional signals provide strong cold-start features for B2B recommendations.

Further Reading

Tools & Libraries

  • LightFM — Hybrid matrix factorization with cold-start support
  • Surprise — scikit-learn style API for collaborative filtering
  • RecBole — 94 deep learning recommendation models in PyTorch
  • Microsoft Recommenders — Production-quality examples and benchmarks

Key Researchers

  • Andrew Chen — General Partner at a16z, author of The Cold Start Problem
  • Maciej Kula — Creator of LightFM, previously ML at Lyst and Spotify

Datasets for Practice

  • MovieLens — Classic benchmark with 25M ratings and rich metadata
  • Netflix Prize — 100M+ anonymous movie ratings from 480k users
  • Amazon Reviews — 233M reviews with product metadata across categories