P05: Recommendation Engine with Matrix Factorization

Build a movie/music recommendation system that uses matrix factorization to predict user preferences from sparse ratings data.


Overview

Attribute       Value
Difficulty      Intermediate
Time Estimate   1-2 weeks
Language        Python (recommended) or C
Prerequisites   Basic matrix operations, understanding of SVD (P02)
Primary Book    "Data Science for Business" by Provost & Fawcett

Learning Objectives

By completing this project, you will:

  1. See sparse matrices in action - Most entries unknown, but patterns exist
  2. Master matrix factorization - Decompose user-item interactions
  3. Understand latent factors - Hidden dimensions of taste
  4. Apply optimization - Gradient descent for factorization
  5. Implement similarity measures - Dot products and cosine similarity
  6. Connect math to real products - How Netflix, Spotify, Amazon work

Theoretical Foundation

Part 1: The Recommendation Problem

What We're Trying to Solve

┌────────────────────────────────────────────────────────────────┐
│                       THE RATING MATRIX                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│                    Movies                                      │
│              M₁   M₂   M₃   M₄   M₅   M₆                       │
│         U₁ [ 5    3    ?    1    ?    ? ]                      │
│         U₂ [ 4    ?    ?    1    ?    ? ]                      │
│  Users  U₃ [ 1    1    ?    5    ?    4 ]                      │
│         U₄ [ ?    ?    5    4    ?    ? ]                      │
│         U₅ [ ?    ?    4    ?    ?    5 ]                      │
│                                                                │
│  ? = unknown rating (user hasn't seen the movie)               │
│  Numbers = 1-5 star ratings                                    │
│                                                                │
│  Goal: Predict the ? values                                    │
│        Recommend the highest predicted ratings                 │
│                                                                │
└────────────────────────────────────────────────────────────────┘

The Rating Matrix - User-Item Interaction Matrix

The matrix is sparse: Netflix has millions of users and thousands of movies, but a typical user has rated perhaps 100 of them, so well under 1% of the matrix is filled.

The Core Insight

Users who like similar things probably have similar taste. Items liked by similar users are probably similar.

If U₁ and U₂ both love action movies and hate romances, and U₁ rated M₃ highly, U₂ will probably like M₃ too.


Part 2: Matrix Factorization Intuition

The Latent Factor Model

Assume ratings are generated by hidden (latent) factors:

┌────────────────────────────────────────────────────────────────┐
│                      LATENT FACTOR MODEL                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   Hidden factors might represent:                              │
│   - Factor 1: Action vs Romance                                │
│   - Factor 2: Serious vs Comedy                                │
│   - Factor 3: Old vs New                                       │
│   - Factor 4: Hollywood vs Indie                               │
│   etc. (we don't label them; they're learned!)                 │
│                                                                │
│   Each user has preferences for each factor:                   │
│   User 1: loves action (+0.9), hates romance (-0.7), ...       │
│                                                                │
│   Each movie has loadings on each factor:                      │
│   Movie "Die Hard": high action (+0.95), low romance (-0.8)    │
│                                                                │
│   Predicted rating = user preferences · movie loadings         │
│                    = Σₖ (user_factor_k × movie_factor_k)       │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Latent Factor Model - Hidden Dimensions of User Preferences

The Factorization

We factor the rating matrix R into two smaller matrices:

R ≈ P × Qᵀ

Where:
- R is m×n (m users, n items)
- P is m×k (m users, k factors), the user feature matrix
- Q is n×k (n items, k factors), the item feature matrix
- k ≪ min(m, n), typically 10-200 factors

User i's predicted rating for item j:
r̂ᵢⱼ = pᵢ · qⱼ = Σₖ pᵢₖ × qⱼₖ

This is a dot product of user and item vectors!
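
To make this concrete, here is a tiny numeric sketch with invented 3-factor vectors (no biases yet; values chosen purely for illustration):

import numpy as np

# Hypothetical 3-factor vectors: (action, comedy, recency)
user = np.array([0.9, 0.2, -0.4])     # loves action, mild comedy, prefers older films
movie = np.array([0.95, -0.1, 0.3])   # heavy on action, slightly serious, newer

# Predicted preference score: 0.9*0.95 + 0.2*(-0.1) + (-0.4)*0.3 = 0.715
print(np.dot(user, movie))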

Geometrically:

┌────────────────────────────────────────────────────────────────┐
│              USERS AND ITEMS IN LATENT SPACE                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│    Factor 2 (comedy)                                           │
│        ↑                                                       │
│        │     • Comedy Movie A                                  │
│        │   ╱  • User who loves comedies                        │
│        │ ╱                                                     │
│        │╱  • User who likes action comedies                    │
│    ────●────────────────────→ Factor 1 (action)                │
│       ╱│╲                                                      │
│      ╱ │ ╲                                                     │
│     •  │  • Action Movie B                                     │
│   User who                                                     │
│   loves action                                                 │
│                                                                │
│  High rating when user and item vectors point the same way     │
│  (large positive dot product)                                  │
│                                                                │
│  Low rating when they point in opposite directions             │
│  (large negative dot product)                                  │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Users and Items in Latent Factor Space


Part 3: Learning the Factorization

The Objective Function

Find P and Q that minimize reconstruction error on known ratings:

minimize  Σ_{(i,j)∈known} (rᵢⱼ - pᵢ·qⱼ)²  +  λ(||P||² + ||Q||²)
                 ↑                               ↑
          reconstruction error            regularization
                                      (prevents overfitting)

This is a non-convex optimization problem!

Stochastic Gradient Descent (SGD)

For each known rating rแตขโฑผ:

# Compute error
prediction = dot(P[i], Q[j])
error = r[i,j] - prediction

# Update user and item factors
P[i] += learning_rate * (error * Q[j] - regularization * P[i])
Q[j] += learning_rate * (error * P[i] - regularization * Q[j])

Why this works:

  • If we under-predict (error > 0), move P[i] and Q[j] closer together
  • If we over-predict (error < 0), move them apart
  • Regularization shrinks factors toward zero (prevents huge values)

Alternating Least Squares (ALS)

Alternative approach: fix one matrix, optimize the other analytically:

1. Fix P, solve for Q (least squares problem)
2. Fix Q, solve for P (least squares problem)
3. Repeat until convergence

Each step is a linear regression! More parallelizable than SGD.
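
A minimal sketch of the "fix Q, solve for P" half, assuming ratings arrive as (user, item, rating) triples (the Q-step is symmetric):

import numpy as np

def als_step_users(R_triples, P, Q, reg):
    """One half of ALS: with Q fixed, solve a ridge regression per user."""
    k = Q.shape[1]
    # Group each user's (item, rating) pairs
    by_user = {}
    for u, i, r in R_triples:
        by_user.setdefault(u, []).append((i, r))

    for u, pairs in by_user.items():
        items = np.array([i for i, _ in pairs])
        r_u = np.array([r for _, r in pairs])
        Q_u = Q[items]                        # factors of items this user rated
        A = Q_u.T @ Q_u + reg * np.eye(k)     # (k x k) normal equations
        b = Q_u.T @ r_u
        P[u] = np.linalg.solve(A, b)          # exact least-squares solution
    return P

Because each user's solve touches only that user's own ratings, the per-user updates can run in parallel, which is why ALS scales so well.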


Part 4: Handling Sparsity

The Challenge

Netflix has:
- ~200 million users
- ~15,000 movies
- → 3 trillion possible ratings
- but only ~5 billion actual ratings (0.17% filled!)

We can't store or operate on a dense 200M × 15K matrix.

Sparse Matrix Representation

Store only non-zero (known) values:

# Coordinate format (COO): a list of (user_id, item_id, rating) triples
ratings = [
    (0, 2, 4.5),   # user 0 rated item 2 as 4.5
    (0, 7, 3.0),
    (1, 2, 5.0),
    ...
]

# Compressed Sparse Row (CSR) - efficient for row slicing
# Or Compressed Sparse Column (CSC) - efficient for column slicing
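
A minimal sketch of these formats using scipy.sparse (assuming SciPy is available; the triples reuse the example above):

import numpy as np
from scipy.sparse import coo_matrix

# (user, item, rating) triples -> sparse matrix in COO format
users = np.array([0, 0, 1])
items = np.array([2, 7, 2])
vals = np.array([4.5, 3.0, 5.0])

R = coo_matrix((vals, (users, items)), shape=(5, 10))

R_csr = R.tocsr()  # fast row slicing: all ratings by one user
R_csc = R.tocsc()  # fast column slicing: all ratings of one item

print(R_csr[0].toarray())  # user 0's ratings as a dense row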

Training on Sparse Data

We only compute loss and updates for known ratings:

for (i, j, rating) in training_data:
    pred = dot(P[i], Q[j])
    error = rating - pred
    # Update only P[i] and Q[j]

This is efficient: we never touch the billions of unknown entries.


Part 5: Biases and Extensions

User and Item Biases

Some users rate everything high; some movies are universally loved:

r̂ᵢⱼ = μ + bᵢ + bⱼ + pᵢ · qⱼ

Where:
- μ = global average rating (~3.5 on a 1-5 scale)
- bᵢ = user i's bias (how much they deviate from the average)
- bⱼ = item j's bias (how much it deviates from the average)
- pᵢ · qⱼ = interaction term (the personalized component)

Example:

Global average: 3.5
User "harsh critic": bias = -1.0 (rates 1 star lower than average)
Movie "The Godfather": bias = +1.5 (rated 1.5 stars higher)
Interaction: +0.3 (this user especially likes this genre)

Prediction: 3.5 + (-1.0) + 1.5 + 0.3 = 4.3 stars

Implicit Feedback

Explicit ratings are not the only signal; views, clicks, and purchases also count:

R = explicit ratings (sparse)
C = confidence matrix (dense, based on implicit data)

Higher confidence for items with more interaction:
c_ij = 1 + α × (number of times user i interacted with item j)

This is the basis of ALS for Implicit Feedback (iALS).
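
A minimal sketch of building these confidence weights (α is a tuning knob; 40 is an often-cited starting point, and the interaction counts here are invented):

import numpy as np

# Hypothetical interaction counts (e.g., play counts), users x items
interactions = np.array([[3, 0, 12],
                         [0, 1,  0]])

alpha = 40.0
C = 1.0 + alpha * interactions           # confidence: grows with interaction
Pref = (interactions > 0).astype(float)  # binary preference: any interaction at all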


Part 6: Similarity Measures

Cosine Similarity

How similar are two vectors? Measure the angle between them:

cosine(u, v) = (u · v) / (||u|| × ||v||)

Range: [-1, 1]
- 1: identical direction (perfectly similar)
- 0: perpendicular (unrelated)
- -1: opposite (perfectly dissimilar)

For recommendation:

def find_similar_items(item_id, Q, k=10):
    """Find k most similar items based on latent factors."""
    target = Q[item_id]
    similarities = []
    for other_id in range(len(Q)):
        if other_id != item_id:
            sim = cosine_similarity(target, Q[other_id])
            similarities.append((other_id, sim))
    return sorted(similarities, key=lambda x: -x[1])[:k]

Euclidean Distance

Alternative: measure straight-line distance:

distance(u, v) = ||u - v|| = √Σ(uᵢ - vᵢ)²

Closer = more similar
Can convert to similarity: sim = 1 / (1 + distance)

When to Use Which

Cosine similarity:
- Ignores magnitude (a cautious rater and an enthusiastic rater with the same taste pattern look similar)
- Good when direction matters more than scale

Euclidean distance:
- Sensitive to magnitude
- Good when absolute values matter

Pearson correlation:
- Cosine similarity of mean-centered vectors
- Handles different rating scales
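
A minimal sketch showing the Pearson-as-centered-cosine relationship:

import numpy as np

def pearson_similarity(u, v):
    """Pearson correlation = cosine similarity of mean-centered vectors."""
    u_centered = u - u.mean()
    v_centered = v - v.mean()
    denom = np.linalg.norm(u_centered) * np.linalg.norm(v_centered) + 1e-8
    return np.dot(u_centered, v_centered) / denom

# A harsh critic (2-3 range) and a generous rater (4-5 range) with the
# same taste pattern come out highly correlated:
print(pearson_similarity(np.array([2, 3, 2, 3]), np.array([4, 5, 4, 5])))  # ~1.0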

Part 7: Evaluation Metrics

Root Mean Square Error (RMSE)

RMSE = √[(1/n) × Σ(rᵢⱼ - r̂ᵢⱼ)²]

Lower is better. On 1-5 scale:
- RMSE < 0.8: Excellent
- RMSE 0.8-0.9: Good
- RMSE 0.9-1.0: Acceptable
- RMSE > 1.0: Poor

Mean Absolute Error (MAE)

MAE = (1/n) × Σ|rᵢⱼ - r̂ᵢⱼ|

Easier to interpret: "on average, predictions are off by X stars"

Ranking Metrics

For recommendations, ranking often matters more than exact ratings:

Precision@K: Of top K recommended items, how many are relevant?
Recall@K:    Of all relevant items, how many are in top K?
NDCG:        Normalized Discounted Cumulative Gain (position-aware)
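
A minimal sketch of the two set-based metrics (NDCG is more involved); the example data is invented:

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0

# Hypothetical example: 2 of the top-4 recommendations are relevant
recs = [10, 4, 7, 99]
relevant = {4, 99, 123}
print(precision_at_k(recs, relevant, k=4))  # 0.5
print(recall_at_k(recs, relevant, k=4))     # 2/3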

Train/Test Split

Critical: Don't leak future data!

# Random split
train, test = random_split(ratings, 0.8)

# Temporal split (more realistic)
train = ratings[ratings['timestamp'] < cutoff]
test = ratings[ratings['timestamp'] >= cutoff]

# Leave-one-out: for each user, hold out one rating for testing

Part 8: Cold Start Problem

The Challenge

New users or new items have no ratings โ€” how do we recommend?

New User:
- No ratings โ†’ no user vector P[i]
- Can't compute pᵢ · qⱼ for any item

New Item:
- No ratings โ†’ no item vector Q[j]
- Can't compute pᵢ · qⱼ for any user

Solutions

For cold-start users:

  1. Ask for initial preferences (onboarding)
  2. Use demographic features (age, location)
  3. Show popular items until enough data

For cold-start items:

  1. Use content features (genre, actors, description)
  2. Show to diverse users for exploration
  3. Hybrid: combine collaborative filtering with content-based

Content-augmented factorization:

pᵢ = P[i] + Σ(user features)
qⱼ = Q[j] + Σ(item features)

Now even new items have a representation from their features!
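
A minimal sketch of this idea (the LightFM-style "sum of feature embeddings"); all names and sizes here are illustrative assumptions:

import numpy as np

n_items, n_factors, n_genres = 500, 10, 18
rng = np.random.default_rng(0)

item_vectors = rng.normal(0, 0.1, (n_items, n_factors))     # learned Q
genre_vectors = rng.normal(0, 0.1, (n_genres, n_factors))   # learned genre embeddings

def item_representation(item_id, genre_flags):
    """Item vector = learned factors + sum of its genre embeddings."""
    return item_vectors[item_id] + genre_flags @ genre_vectors

# A brand-new item with no ratings still gets a usable vector from genres alone
new_item_genres = np.zeros(n_genres)
new_item_genres[[2, 5]] = 1.0   # e.g., flagged as action + comedy
cold_vector = new_item_genres @ genre_vectors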

Part 9: Connection to SVD

Relationship

Matrix factorization for recommendations is related to SVD:

SVD:  R = U × Σ × Vᵀ    (exact decomposition)

MF:   R ≈ P × Qᵀ        (approximate, learned from data)

P roughly corresponds to U × √Σ
Q roughly corresponds to V × √Σ

Key differences:

  • SVD requires knowing all entries (or imputation)
  • MF optimizes only on known entries
  • MF can incorporate regularization, biases, etc.
  • MF is directly optimized for prediction, not reconstruction

Truncated SVD for Recommendation

You can use SVD for recommendations:

  1. Fill unknown entries (with zeros, mean, or imputed values)
  2. Compute SVD
  3. Keep top k singular values/vectors
  4. Predict with truncated matrices

But this is less effective than training directly on the sparse loss.
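
For comparison, here is a minimal sketch of the SVD route on a toy matrix, imputing unknowns with the global mean:

import numpy as np

# Toy rating matrix; 0 marks an unknown entry
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

mean = R[R > 0].mean()
R_filled = np.where(R > 0, R, mean)   # step 1: impute unknowns

U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)  # step 2: SVD
k = 2                                                    # step 3: keep top k
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]            # step 4: predict

print(R_hat.round(2))  # predictions for every cell, including former unknowns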


Project Specification

What You're Building

A recommendation system that:

  1. Loads user-item rating data
  2. Learns latent factor representations
  3. Predicts ratings for unseen items
  4. Recommends top-N items for each user
  5. Evaluates predictions with RMSE/MAE

Minimum Requirements

  • Load and parse MovieLens dataset
  • Implement matrix factorization with SGD
  • Include user and item biases
  • Regularization to prevent overfitting
  • Train/test split for evaluation
  • Report RMSE on test set
  • Generate top-N recommendations for users

Stretch Goals

  • Implement ALS as alternative training method
  • Add time-aware recommendations
  • Implement content-based features (genres)
  • Build simple web interface
  • Compare with baseline methods

Solution Architecture

Data Structures

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users: int, n_items: int, n_factors: int):
        # Latent factor matrices
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))

        # Biases
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.global_mean = 0

    def predict(self, user_id: int, item_id: int) -> float:
        """Predict rating for user-item pair."""
        pass

    def fit(self, ratings: list, epochs: int, lr: float, reg: float):
        """Train on list of (user, item, rating) tuples."""
        pass

    def recommend(self, user_id: int, n: int = 10) -> list:
        """Get top N recommendations for user."""
        pass

Module Structure

┌─────────────────────────────────────────────────────────────────┐
│                           main.py                               │
│  - Load data                                                    │
│  - Train model                                                  │
│  - Evaluate and visualize                                       │
└─────────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
┌──────────────────┐   ┌──────────────┐   ┌─────────────────┐
│     data.py      │   │   model.py   │   │   evaluate.py   │
│                  │   │              │   │                 │
│ load_movielens   │   │ MatrixFactor │   │ rmse            │
│ train_test_split │   │ predict      │   │ mae             │
│ user_item_matrix │   │ fit_sgd      │   │ precision_at_k  │
│                  │   │ fit_als      │   │ recall_at_k     │
└──────────────────┘   └──────────────┘   └─────────────────┘
                               │
                               ▼
                      ┌───────────────────┐
                      │     utils.py      │
                      │                   │
                      │ cosine_sim        │
                      │ find_similar      │
                      │ visualize_factors │
                      └───────────────────┘

Recommendation System Module Structure

Training Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│                           TRAINING PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐                                                        │
│  │ Raw Data    │  MovieLens ratings.csv                                 │
│  │ (u, i, r, t)│  user_id, movie_id, rating, timestamp                  │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼  Parse and index                                              │
│  ┌─────────────┐                                                        │
│  │ Indexed Data│  (0-indexed user/item IDs)                             │
│  │ + ID maps   │  user_id_to_idx, item_id_to_idx                        │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼  Split                                                        │
│  ┌───────────────────────────────────────────────────────────────┐      │
│  │ Train (80%)           │         Test (20%)                    │      │
│  └──────┬────────────────┴────────────┬──────────────────────────┘      │
│         │                             │                                 │
│         ▼                             │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Initialize  │                      │                                 │
│  │ P, Q, biases│                      │                                 │
│  └──────┬──────┘                      │                                 │
│         │                             │                                 │
│         ▼  For each epoch             │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Shuffle     │                      │                                 │
│  │ train data  │                      │                                 │
│  └──────┬──────┘                      │                                 │
│         │                             │                                 │
│         ▼  For each (u, i, r)         │                                 │
│  ┌──────────────────────────────────────┐                               │
│  │ 1. Predict: r̂ = μ + bᵤ + bᵢ + pᵤ·qᵢ  │                               │
│  │ 2. Error:   e = r - r̂                │                               │
│  │ 3. Update:  pᵤ += lr(e·qᵢ - λpᵤ)     │                               │
│  │             qᵢ += lr(e·pᵤ - λqᵢ)     │                               │
│  │             bᵤ += lr(e - λbᵤ)        │                               │
│  │             bᵢ += lr(e - λbᵢ)        │                               │
│  └──────┬───────────────────────────────┘                               │
│         │                             │                                 │
│         ▼  After epoch                │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Evaluate on │◀─────────────────────┘                                 │
│  │ test set    │                                                        │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼                                                               │
│  ┌─────────────┐                                                        │
│  │ Log RMSE,   │                                                        │
│  │ check for   │                                                        │
│  │ convergence │                                                        │
│  └─────────────┘                                                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Recommendation System Training Pipeline


Phased Implementation Guide

Phase 1: Data Loading (Days 1-2)

Goal: Load and explore MovieLens dataset

Tasks:

  1. Download MovieLens 100K or 1M dataset
  2. Parse CSV files (ratings, movies, users)
  3. Create user/item ID mappings
  4. Explore data statistics

Implementation:

import pandas as pd

def load_movielens(path):
    """Load MovieLens ratings (ml-latest style ratings.csv with a header row)."""
    ratings = pd.read_csv(
        f"{path}/ratings.csv",
        names=['user_id', 'item_id', 'rating', 'timestamp'],
        header=0,  # skip the file's own header; for ml-100k's u.data use sep='\t', header=None
    )

    # Create mappings to 0-indexed IDs
    user_ids = ratings['user_id'].unique()
    item_ids = ratings['item_id'].unique()

    user_to_idx = {u: idx for idx, u in enumerate(user_ids)}
    item_to_idx = {item: idx for idx, item in enumerate(item_ids)}

    # Convert to (user_idx, item_idx, rating) tuples
    data = [
        (user_to_idx[r.user_id], item_to_idx[r.item_id], r.rating)
        for r in ratings.itertuples()
    ]

    return data, user_to_idx, item_to_idx

Verification:

  • Correct number of users/items
  • Rating distribution looks reasonable (1-5 scale)
  • No missing or invalid data

Phase 2: Basic Matrix Factorization (Days 3-5)

Goal: Implement core prediction and training

Tasks:

  1. Initialize P and Q matrices randomly
  2. Implement prediction function
  3. Implement SGD update step
  4. Train on small subset first

Implementation:

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=10):
        # Initialize with small random values
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))

    def predict(self, u, i):
        return np.dot(self.P[u], self.Q[i])

    def train_step(self, u, i, r, lr=0.01, reg=0.02):
        pred = self.predict(u, i)
        error = r - pred

        # Gradient descent update
        p_update = lr * (error * self.Q[i] - reg * self.P[u])
        q_update = lr * (error * self.P[u] - reg * self.Q[i])

        self.P[u] += p_update
        self.Q[i] += q_update

        return error ** 2

Verification:

  • Loss decreases over epochs
  • Predictions are in reasonable range
  • No NaN or inf values

Phase 3: Biases and Global Mean (Days 6-7)

Goal: Add bias terms for better predictions

Tasks:

  1. Compute global mean rating
  2. Add user bias array
  3. Add item bias array
  4. Update training to include biases

Implementation:

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=10):
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.global_mean = 0

    def predict(self, u, i):
        return (self.global_mean +
                self.user_bias[u] +
                self.item_bias[i] +
                np.dot(self.P[u], self.Q[i]))

    def train_step(self, u, i, r, lr=0.01, reg=0.02):
        pred = self.predict(u, i)
        error = r - pred

        # Update biases
        self.user_bias[u] += lr * (error - reg * self.user_bias[u])
        self.item_bias[i] += lr * (error - reg * self.item_bias[i])

        # Update factors
        p_update = lr * (error * self.Q[i] - reg * self.P[u])
        q_update = lr * (error * self.P[u] - reg * self.Q[i])
        self.P[u] += p_update
        self.Q[i] += q_update

        return error ** 2

Verification:

  • Biases capture user/item tendencies
  • RMSE improves with biases
  • High-rated items have positive item bias

Phase 4: Evaluation (Days 8-9)

Goal: Properly evaluate model performance

Tasks:

  1. Implement train/test split
  2. Implement RMSE and MAE
  3. Track metrics over epochs
  4. Plot learning curves

Implementation:

import numpy as np

def train_test_split(data, test_ratio=0.2):
    """Random split of ratings."""
    np.random.shuffle(data)
    split = int(len(data) * (1 - test_ratio))
    return data[:split], data[split:]

def evaluate(model, test_data):
    """Compute RMSE on test set."""
    squared_errors = []
    for u, i, r in test_data:
        pred = model.predict(u, i)
        squared_errors.append((r - pred) ** 2)
    return np.sqrt(np.mean(squared_errors))

def train(model, train_data, test_data, epochs=20, lr=0.01, reg=0.02):
    """Training loop with evaluation."""
    for epoch in range(epochs):
        np.random.shuffle(train_data)
        total_loss = 0

        for u, i, r in train_data:
            total_loss += model.train_step(u, i, r, lr, reg)

        train_rmse = np.sqrt(total_loss / len(train_data))
        test_rmse = evaluate(model, test_data)
        print(f"Epoch {epoch}: Train RMSE={train_rmse:.4f}, Test RMSE={test_rmse:.4f}")

Verification:

  • Test RMSE is reasonable (< 1.0)
  • Training loss decreases
  • No overfitting (test RMSE doesn't increase much)

Phase 5: Recommendation Generation (Days 10-11)

Goal: Generate personalized recommendations

Tasks:

  1. Find items the user hasn't rated
  2. Predict ratings for all unseen items
  3. Return top-N recommendations
  4. Show movie titles with predictions

Implementation:

def get_recommendations(model, user_id, user_rated_items, all_items, n=10):
    """Get top N recommendations for user."""
    unseen = set(all_items) - set(user_rated_items[user_id])

    predictions = []
    for item_id in unseen:
        pred = model.predict(user_id, item_id)
        predictions.append((item_id, pred))

    # Sort by predicted rating, descending
    predictions.sort(key=lambda x: -x[1])

    return predictions[:n]

# Usage (user_rated_items maps each user to the set of items they rated)
recs = get_recommendations(model, 42, user_rated_items, all_items, n=10)
for item_id, pred_rating in recs:
    print(f"{movie_titles[item_id]}: {pred_rating:.2f} stars (predicted)")

Verification:

  • Recommendations make intuitive sense
  • User's highly-rated genres appear in recommendations
  • No already-rated items in recommendations

Phase 6: Similarity and Exploration (Days 12-14)

Goal: Find similar items and interpret factors

Tasks:

  1. Implement cosine similarity
  2. Find similar items/users
  3. Visualize latent factors
  4. Interpret what factors might mean

Implementation:

import numpy as np
import matplotlib.pyplot as plt

def cosine_similarity(v1, v2):
    """Compute cosine similarity between two vectors."""
    dot = np.dot(v1, v2)
    norm1 = np.linalg.norm(v1)
    norm2 = np.linalg.norm(v2)
    return dot / (norm1 * norm2 + 1e-8)

def find_similar_items(model, item_id, k=10):
    """Find k most similar items based on latent factors."""
    target = model.Q[item_id]
    similarities = []

    for other_id in range(len(model.Q)):
        if other_id != item_id:
            sim = cosine_similarity(target, model.Q[other_id])
            similarities.append((other_id, sim))

    return sorted(similarities, key=lambda x: -x[1])[:k]

def visualize_factors(model, item_ids, item_names, factors=(0, 1)):
    """Plot items in 2D factor space."""
    plt.figure(figsize=(10, 8))
    for item_id in item_ids:
        x = model.Q[item_id, factors[0]]
        y = model.Q[item_id, factors[1]]
        plt.scatter(x, y)
        plt.annotate(item_names[item_id][:20], (x, y))
    plt.xlabel(f"Factor {factors[0]}")
    plt.ylabel(f"Factor {factors[1]}")
    plt.show()

Verification:

  • Items returned as most similar are sensibly related
  • Factors show clustering by genre
  • Visualization reveals interpretable structure

Testing Strategy

Unit Tests

  1. Data loading:
    • Correct number of ratings loaded
    • All ratings in valid range (1-5)
    • ID mappings are bijective
  2. Model components:
    • Prediction is dot product plus biases
    • Updates change P and Q
    • Regularization shrinks factors
  3. Metrics:
    • RMSE of predicting the global mean everywhere equals the ratings' standard deviation (see the sketch after this list)
    • Perfect predictions give RMSE of 0
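
A minimal pytest-style sketch of the two metric checks (test names are illustrative):

import numpy as np

def test_constant_prediction_rmse():
    # Predicting the mean everywhere gives RMSE == population std deviation
    ratings = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    preds = np.full_like(ratings, ratings.mean())
    rmse = np.sqrt(np.mean((ratings - preds) ** 2))
    assert np.isclose(rmse, ratings.std())

def test_perfect_prediction_rmse():
    ratings = np.array([3.0, 4.5, 2.0])
    rmse = np.sqrt(np.mean((ratings - ratings) ** 2))
    assert rmse == 0.0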

Integration Tests

  1. Overfitting test: Can model achieve near-zero loss on small dataset?
  2. Bias test: Does adding biases improve RMSE?
  3. Regularization test: Does reg prevent overfitting?

Sanity Checks

  1. Known item similarity: Do "Star Wars" and "Empire Strikes Back" have high similarity?
  2. User consistency: If user rates action movies high, do action movies appear in recommendations?
  3. Extreme predictions: Are all predictions in reasonable range (1-5)?

Common Pitfalls and Debugging

Learning Rate Issues

Symptom: Loss oscillates or diverges

Cause: Learning rate too high

Fix: Start with lr=0.001, increase gradually

Regularization Issues

Symptom: Test RMSE much higher than train RMSE

Cause: Overfitting (regularization too low)

Fix: Increase λ (try 0.02, 0.05, 0.1)

Cold Start Errors

Symptom: Errors on new users/items

Cause: User/item not in training data

Fix: Use fallback (global mean, popular items)

Numerical Issues

Symptom: NaN in predictions or loss

Cause: Division by zero, overflow

Fix: Add epsilon to denominators, clip values

Slow Training

Symptom: Hours to train on MovieLens 1M

Cause: Pure Python loops

Fix: Vectorize, use NumPy, or implement in C
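
Full SGD is awkward to vectorize, but evaluation and batch prediction vectorize cleanly. A minimal sketch, assuming the model attributes from the phases above:

import numpy as np

def predict_batch(model, users, items):
    """Vectorized predictions for parallel arrays of user/item indices."""
    users, items = np.asarray(users), np.asarray(items)
    return (model.global_mean
            + model.user_bias[users]
            + model.item_bias[items]
            + np.sum(model.P[users] * model.Q[items], axis=1))

def evaluate_fast(model, test_data):
    u, i, r = (np.array(col) for col in zip(*test_data))
    return np.sqrt(np.mean((r - predict_batch(model, u, i)) ** 2))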


Extensions and Challenges

Beginner Extensions

  • Tune hyperparameters (factors, lr, reg)
  • Add early stopping
  • Save/load trained model

Intermediate Extensions

  • Implement ALS instead of SGD
  • Add temporal dynamics (ratings change over time)
  • Incorporate item genres as features
  • Build a simple web demo

Advanced Extensions

  • Implement implicit feedback model
  • Add side information (user demographics)
  • Neural collaborative filtering
  • A/B testing simulation

Real-World Connections

Netflix

The Netflix Prize (2006-2009) popularized matrix factorization:

  • $1 million for 10% improvement in RMSE
  • Winning solution: ensemble of many models, MF as core
  • Showed practical value of latent factor models

Spotify

Uses collaborative filtering for:

  • Discover Weekly playlist
  • Related artists
  • Radio stations

Plus content features (audio analysis).

Amazon

"Customers who bought X also bought Y":

  • Item-item collaborative filtering
  • Fast to compute (precomputed similarities)
  • Core of product recommendation

YouTube

Deep learning + collaborative filtering:

  • Candidate generation (MF-like)
  • Ranking (deep neural network)
  • Billions of users and videos

Self-Assessment Checklist

Conceptual Understanding

  • I can explain what latent factors represent
  • I understand why the dot product measures preference
  • I know how biases improve predictions
  • I can explain the cold start problem
  • I understand the tradeoff between bias and variance

Practical Skills

  • I implemented matrix factorization with SGD
  • My model achieves reasonable RMSE (< 0.95 on MovieLens)
  • I can generate and interpret recommendations
  • I can find similar items using learned factors
  • I can tune hyperparameters effectively

Teaching Test

Can you explain to someone else:

  • Why do we factorize the rating matrix?
  • What do the user and item vectors represent?
  • How does training work?
  • Why is regularization important?

Resources

Primary

  • "Data Science for Business" by Provost & Fawcett - Business context
  • "Recommender Systems Handbook" - Comprehensive reference
  • Netflix Prize papers - Original inspiration

Reference

  • Surprise library - Python recommendation toolkit
  • LightFM - Hybrid recommendation library
  • MovieLens datasets - Standard benchmarks

Online

  • Google's Recommendation ML course - Practical guide
  • Koren et al. "Matrix Factorization Techniques" - Classic paper
  • Coursera: Recommender Systems Specialization - Academic depth

When you complete this project, you'll understand that recommendation systems are applied linear algebra: users and items as vectors, preferences as dot products, learning as optimization. Every time Netflix suggests a movie, this math is running behind the scenes.