P05: Recommendation Engine with Matrix Factorization

Build a movie/music recommendation system that uses matrix factorization to predict user preferences from sparse ratings data.


Overview

Attribute       Value
Difficulty      Intermediate
Time Estimate   1-2 weeks
Language        Python (recommended) or C
Prerequisites   Basic matrix operations, understanding of SVD (P02)
Primary Book    "Data Science for Business" by Provost & Fawcett

Learning Objectives

By completing this project, you will:

  1. See sparse matrices in action - Most entries unknown, but patterns exist
  2. Master matrix factorization - Decompose user-item interactions
  3. Understand latent factors - Hidden dimensions of taste
  4. Apply optimization - Gradient descent for factorization
  5. Implement similarity measures - Dot products and cosine similarity
  6. Connect math to real products - How Netflix, Spotify, Amazon work

Theoretical Foundation

Part 1: The Recommendation Problem

What We're Trying to Solve

┌────────────────────────────────────────────────────────────────┐
│                       THE RATING MATRIX                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│                    Movies                                      │
│              M₁   M₂   M₃   M₄   M₅   M₆                       │
│         U₁ [ 5    3    ?    1    ?    ? ]                      │
│         U₂ [ 4    ?    ?    1    ?    ? ]                      │
│  Users  U₃ [ 1    1    ?    5    ?    4 ]                      │
│         U₄ [ ?    ?    5    4    ?    ? ]                      │
│         U₅ [ ?    ?    4    ?    ?    5 ]                      │
│                                                                │
│  ? = unknown rating (user hasn't seen the movie)               │
│  Numbers = 1-5 star ratings                                    │
│                                                                │
│  Goal: Predict the ? values                                    │
│        Recommend the highest predicted ratings                 │
│                                                                │
└────────────────────────────────────────────────────────────────┘

The Rating Matrix - User-Item Interaction Matrix

The matrix is sparse: Netflix has millions of users and thousands of movies, but a typical user has rated perhaps 100 of them, so well under 1% of the matrix is filled.

The Core Insight

Users who like similar things probably have similar taste. Items liked by similar users are probably similar.

If U₁ and U₂ both love action movies and hate romances, and U₁ rated M₃ highly, U₂ will probably like M₃ too.


Part 2: Matrix Factorization Intuition

The Latent Factor Model

Assume ratings are generated by hidden (latent) factors:

┌────────────────────────────────────────────────────────────────┐
│                      LATENT FACTOR MODEL                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   Hidden factors might represent:                              │
│   - Factor 1: Action vs Romance                                │
│   - Factor 2: Serious vs Comedy                                │
│   - Factor 3: Old vs New                                       │
│   - Factor 4: Hollywood vs Indie                               │
│   etc. (we don't label them; they're learned!)                 │
│                                                                │
│   Each user has preferences for each factor:                   │
│   User 1: loves action (+0.9), hates romance (-0.7), ...       │
│                                                                │
│   Each movie has loadings on each factor:                      │
│   Movie "Die Hard": high action (+0.95), low romance (-0.8)    │
│                                                                │
│   Predicted rating = user preferences · movie loadings         │
│                    = Σₖ (user_factor_k × movie_factor_k)       │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Latent Factor Model - Hidden Dimensions of User Preferences

The Factorization

We factor the rating matrix R into two smaller matrices:

R ≈ P × Qᵀ

Where:
- R is m×n (m users, n items)
- P is m×k (m users, k factors), the user feature matrix
- Q is n×k (n items, k factors), the item feature matrix
- k ≪ min(m, n), typically 10-200 factors

User i's predicted rating for item j:
r̂ᵢⱼ = pᵢ · qⱼ = Σₖ pᵢₖ × qⱼₖ

This is a dot product of user and item vectors!
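
To make this concrete, here is a tiny numeric sketch with invented 3-factor vectors (no biases yet; values chosen purely for illustration):

import numpy as np

# Hypothetical 3-factor vectors: (action, comedy, recency)
user = np.array([0.9, 0.2, -0.4])     # loves action, mild comedy, prefers older films
movie = np.array([0.95, -0.1, 0.3])   # heavy on action, slightly serious, newer

# Predicted preference score: 0.9*0.95 + 0.2*(-0.1) + (-0.4)*0.3 = 0.715
print(np.dot(user, movie))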

Geometrically:

┌────────────────────────────────────────────────────────────────┐
│              USERS AND ITEMS IN LATENT SPACE                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│    Factor 2 (comedy)                                           │
│        ↑                                                       │
│        │     • Comedy Movie A                                  │
│        │   ╱  • User who loves comedies                        │
│        │ ╱                                                     │
│        │╱  • User who likes action comedies                    │
│    ────●────────────────────→ Factor 1 (action)                │
│       ╱│╲                                                      │
│      ╱ │ ╲                                                     │
│     •  │  • Action Movie B                                     │
│   User who                                                     │
│   loves action                                                 │
│                                                                │
│  High rating when user and item vectors point the same way     │
│  (large positive dot product)                                  │
│                                                                │
│  Low rating when they point in opposite directions             │
│  (large negative dot product)                                  │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Users and Items in Latent Factor Space


Part 3: Learning the Factorization

The Objective Function

Find P and Q that minimize reconstruction error on known ratings:

minimize  Σ_{(i,j)∈known} (rᵢⱼ - pᵢ·qⱼ)²  +  λ(||P||² + ||Q||²)
                 ↑                               ↑
          reconstruction error            regularization
                                      (prevents overfitting)

This is a non-convex optimization problem!

Stochastic Gradient Descent (SGD)

For each known rating rแตขโฑผ:

# Compute error
prediction = dot(P[i], Q[j])
error = r[i,j] - prediction

# Update user and item factors
P[i] += learning_rate * (error * Q[j] - regularization * P[i])
Q[j] += learning_rate * (error * P[i] - regularization * Q[j])

Why this works:

  • If we under-predict (error > 0), move P[i] and Q[j] closer together
  • If we over-predict (error < 0), move them apart
  • Regularization shrinks factors toward zero (prevents huge values)

Alternating Least Squares (ALS)

Alternative approach: fix one matrix, optimize the other analytically:

1. Fix P, solve for Q (least squares problem)
2. Fix Q, solve for P (least squares problem)
3. Repeat until convergence

Each step is a linear regression! More parallelizable than SGD.
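
A minimal sketch of the "fix Q, solve for P" half, assuming ratings arrive as (user, item, rating) triples (the Q-step is symmetric):

import numpy as np

def als_step_users(R_triples, P, Q, reg):
    """One half of ALS: with Q fixed, solve a ridge regression per user."""
    k = Q.shape[1]
    # Group each user's (item, rating) pairs
    by_user = {}
    for u, i, r in R_triples:
        by_user.setdefault(u, []).append((i, r))

    for u, pairs in by_user.items():
        items = np.array([i for i, _ in pairs])
        r_u = np.array([r for _, r in pairs])
        Q_u = Q[items]                        # factors of items this user rated
        A = Q_u.T @ Q_u + reg * np.eye(k)     # (k x k) normal equations
        b = Q_u.T @ r_u
        P[u] = np.linalg.solve(A, b)          # exact least-squares solution
    return P

Because each user's solve touches only that user's own ratings, the per-user updates can run in parallel, which is why ALS scales so well.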


Part 4: Handling Sparsity

The Challenge

Netflix has:
- ~200 million users
- ~15,000 movies
- → 3 trillion possible ratings
- but only ~5 billion actual ratings (0.17% filled!)

We can't store or operate on a dense 200M × 15K matrix.

Sparse Matrix Representation

Store only non-zero (known) values:

# Coordinate format (COO): a list of (user_id, item_id, rating) triples
ratings = [
    (0, 2, 4.5),   # user 0 rated item 2 as 4.5
    (0, 7, 3.0),
    (1, 2, 5.0),
    ...
]

# Compressed Sparse Row (CSR) - efficient for row slicing
# Or Compressed Sparse Column (CSC) - efficient for column slicing
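
A minimal sketch of these formats using scipy.sparse (assuming SciPy is available; the triples reuse the example above):

import numpy as np
from scipy.sparse import coo_matrix

# (user, item, rating) triples -> sparse matrix in COO format
users = np.array([0, 0, 1])
items = np.array([2, 7, 2])
vals = np.array([4.5, 3.0, 5.0])

R = coo_matrix((vals, (users, items)), shape=(5, 10))

R_csr = R.tocsr()  # fast row slicing: all ratings by one user
R_csc = R.tocsc()  # fast column slicing: all ratings of one item

print(R_csr[0].toarray())  # user 0's ratings as a dense row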

Training on Sparse Data

We only compute loss and updates for known ratings:

for (i, j, rating) in training_data:
    pred = dot(P[i], Q[j])
    error = rating - pred
    # Update only P[i] and Q[j]

This is efficient: we never touch the billions of unknown entries.


Part 5: Biases and Extensions

User and Item Biases

Some users rate everything high; some movies are universally loved:

r̂ᵢⱼ = μ + bᵢ + bⱼ + pᵢ · qⱼ

Where:
- μ = global average rating (~3.5 on a 1-5 scale)
- bᵢ = user i's bias (how much they deviate from the average)
- bⱼ = item j's bias (how much it deviates from the average)
- pᵢ · qⱼ = interaction term (the personalized component)

Example:

Global average: 3.5
User "harsh critic": bias = -1.0 (rates 1 star lower than average)
Movie "The Godfather": bias = +1.5 (rated 1.5 stars higher)
Interaction: +0.3 (this user especially likes this genre)

Prediction: 3.5 + (-1.0) + 1.5 + 0.3 = 4.3 stars

Implicit Feedback

Explicit ratings are not the only signal; views, clicks, and purchases also count:

R = explicit ratings (sparse)
C = confidence matrix (dense, based on implicit data)

Higher confidence for items with more interaction:
c_ij = 1 + α × (number of times user i interacted with item j)

This is the basis of ALS for Implicit Feedback (iALS).
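
A minimal sketch of building these confidence weights (α is a tuning knob; 40 is an often-cited starting point, and the interaction counts here are invented):

import numpy as np

# Hypothetical interaction counts (e.g., play counts), users x items
interactions = np.array([[3, 0, 12],
                         [0, 1,  0]])

alpha = 40.0
C = 1.0 + alpha * interactions           # confidence: grows with interaction
Pref = (interactions > 0).astype(float)  # binary preference: any interaction at all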


Part 6: Similarity Measures

Cosine Similarity

How similar are two vectors? Measure the angle between them:

cosine(u, v) = (u · v) / (||u|| × ||v||)

Range: [-1, 1]
- 1: identical direction (perfectly similar)
- 0: perpendicular (unrelated)
- -1: opposite (perfectly dissimilar)

For recommendation:

def find_similar_items(item_id, Q, k=10):
    """Find k most similar items based on latent factors."""
    target = Q[item_id]
    similarities = []
    for other_id in range(len(Q)):
        if other_id != item_id:
            sim = cosine_similarity(target, Q[other_id])
            similarities.append((other_id, sim))
    return sorted(similarities, key=lambda x: -x[1])[:k]

Euclidean Distance

Alternative: measure straight-line distance:

distance(u, v) = ||u - v|| = √Σ(uᵢ - vᵢ)²

Closer = more similar
Can convert to similarity: sim = 1 / (1 + distance)

When to Use Which

Cosine similarity:
- Ignores magnitude (a cautious rater and an enthusiastic rater with the same taste pattern look similar)
- Good when direction matters more than scale

Euclidean distance:
- Sensitive to magnitude
- Good when absolute values matter

Pearson correlation:
- Cosine similarity of mean-centered vectors
- Handles different rating scales
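
A minimal sketch showing the Pearson-as-centered-cosine relationship:

import numpy as np

def pearson_similarity(u, v):
    """Pearson correlation = cosine similarity of mean-centered vectors."""
    u_centered = u - u.mean()
    v_centered = v - v.mean()
    denom = np.linalg.norm(u_centered) * np.linalg.norm(v_centered) + 1e-8
    return np.dot(u_centered, v_centered) / denom

# A harsh critic (2-3 range) and a generous rater (4-5 range) with the
# same taste pattern come out highly correlated:
print(pearson_similarity(np.array([2, 3, 2, 3]), np.array([4, 5, 4, 5])))  # ~1.0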

Part 7: Evaluation Metrics

Root Mean Square Error (RMSE)

RMSE = √[(1/n) × Σ(rᵢⱼ - r̂ᵢⱼ)²]

Lower is better. On 1-5 scale:
- RMSE < 0.8: Excellent
- RMSE 0.8-0.9: Good
- RMSE 0.9-1.0: Acceptable
- RMSE > 1.0: Poor

Mean Absolute Error (MAE)

MAE = (1/n) × Σ|rᵢⱼ - r̂ᵢⱼ|

Easier to interpret: "on average, predictions are off by X stars"

Ranking Metrics

For recommendations, ranking often matters more than exact ratings:

Precision@K: Of top K recommended items, how many are relevant?
Recall@K:    Of all relevant items, how many are in top K?
NDCG:        Normalized Discounted Cumulative Gain (position-aware)
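
A minimal sketch of the two set-based metrics (NDCG is more involved); the example data is invented:

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0

# Hypothetical example: 2 of the top-4 recommendations are relevant
recs = [10, 4, 7, 99]
relevant = {4, 99, 123}
print(precision_at_k(recs, relevant, k=4))  # 0.5
print(recall_at_k(recs, relevant, k=4))     # 2/3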

Train/Test Split

Critical: Don't leak future data!

# Random split
train, test = random_split(ratings, 0.8)

# Temporal split (more realistic)
train = ratings[ratings['timestamp'] < cutoff]
test = ratings[ratings['timestamp'] >= cutoff]

# Leave-one-out: for each user, hold out one rating for testing

Part 8: Cold Start Problem

The Challenge

New users or new items have no ratings โ€” how do we recommend?

New User:
- No ratings โ†’ no user vector P[i]
- Can't compute pᵢ · qⱼ for any item

New Item:
- No ratings โ†’ no item vector Q[j]
- Can't compute pᵢ · qⱼ for any user

Solutions

For cold-start users:

  1. Ask for initial preferences (onboarding)
  2. Use demographic features (age, location)
  3. Show popular items until enough data

For cold-start items:

  1. Use content features (genre, actors, description)
  2. Show to diverse users for exploration
  3. Hybrid: combine collaborative filtering with content-based

Content-augmented factorization:

pᵢ = P[i] + Σ(user features)
qⱼ = Q[j] + Σ(item features)

Now even new items have a representation from their features!
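
A minimal sketch of this idea (the LightFM-style "sum of feature embeddings"); all names and sizes here are illustrative assumptions:

import numpy as np

n_items, n_factors, n_genres = 500, 10, 18
rng = np.random.default_rng(0)

item_vectors = rng.normal(0, 0.1, (n_items, n_factors))     # learned Q
genre_vectors = rng.normal(0, 0.1, (n_genres, n_factors))   # learned genre embeddings

def item_representation(item_id, genre_flags):
    """Item vector = learned factors + sum of its genre embeddings."""
    return item_vectors[item_id] + genre_flags @ genre_vectors

# A brand-new item with no ratings still gets a usable vector from genres alone
new_item_genres = np.zeros(n_genres)
new_item_genres[[2, 5]] = 1.0   # e.g., flagged as action + comedy
cold_vector = new_item_genres @ genre_vectors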

Part 9: Connection to SVD

Relationship

Matrix factorization for recommendations is related to SVD:

SVD:  R = U × Σ × Vᵀ    (exact decomposition)

MF:   R ≈ P × Qᵀ        (approximate, learned from data)

P roughly corresponds to U × √Σ
Q roughly corresponds to V × √Σ

Key differences:

  • SVD requires knowing all entries (or imputation)
  • MF optimizes only on known entries
  • MF can incorporate regularization, biases, etc.
  • MF is directly optimized for prediction, not reconstruction

Truncated SVD for Recommendation

You can use SVD for recommendations:

  1. Fill unknown entries (with zeros, mean, or imputed values)
  2. Compute SVD
  3. Keep top k singular values/vectors
  4. Predict with truncated matrices

But this is less effective than training directly on the sparse loss.
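
For comparison, here is a minimal sketch of the SVD route on a toy matrix, imputing unknowns with the global mean:

import numpy as np

# Toy rating matrix; 0 marks an unknown entry
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

mean = R[R > 0].mean()
R_filled = np.where(R > 0, R, mean)   # step 1: impute unknowns

U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)  # step 2: SVD
k = 2                                                    # step 3: keep top k
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]            # step 4: predict

print(R_hat.round(2))  # predictions for every cell, including former unknowns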


Project Specification

What You're Building

A recommendation system that:

  1. Loads user-item rating data
  2. Learns latent factor representations
  3. Predicts ratings for unseen items
  4. Recommends top-N items for each user
  5. Evaluates predictions with RMSE/MAE

Minimum Requirements

  • Load and parse MovieLens dataset
  • Implement matrix factorization with SGD
  • Include user and item biases
  • Regularization to prevent overfitting
  • Train/test split for evaluation
  • Report RMSE on test set
  • Generate top-N recommendations for users

Stretch Goals

  • Implement ALS as alternative training method
  • Add time-aware recommendations
  • Implement content-based features (genres)
  • Build simple web interface
  • Compare with baseline methods

Solution Architecture

Data Structures

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users: int, n_items: int, n_factors: int):
        # Latent factor matrices
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))

        # Biases
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.global_mean = 0

    def predict(self, user_id: int, item_id: int) -> float:
        """Predict rating for user-item pair."""
        pass

    def fit(self, ratings: list, epochs: int, lr: float, reg: float):
        """Train on list of (user, item, rating) tuples."""
        pass

    def recommend(self, user_id: int, n: int = 10) -> list:
        """Get top N recommendations for user."""
        pass

Module Structure

┌─────────────────────────────────────────────────────────────────┐
│                           main.py                               │
│  - Load data                                                    │
│  - Train model                                                  │
│  - Evaluate and visualize                                       │
└─────────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
┌──────────────────┐   ┌──────────────┐   ┌─────────────────┐
│     data.py      │   │   model.py   │   │   evaluate.py   │
│                  │   │              │   │                 │
│ load_movielens   │   │ MatrixFactor │   │ rmse            │
│ train_test_split │   │ predict      │   │ mae             │
│ user_item_matrix │   │ fit_sgd      │   │ precision_at_k  │
│                  │   │ fit_als      │   │ recall_at_k     │
└──────────────────┘   └──────────────┘   └─────────────────┘
                               │
                               ▼
                      ┌───────────────────┐
                      │     utils.py      │
                      │                   │
                      │ cosine_sim        │
                      │ find_similar      │
                      │ visualize_factors │
                      └───────────────────┘

Recommendation System Module Structure

Training Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│                           TRAINING PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐                                                        │
│  │ Raw Data    │  MovieLens ratings.csv                                 │
│  │ (u, i, r, t)│  user_id, movie_id, rating, timestamp                  │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼  Parse and index                                              │
│  ┌─────────────┐                                                        │
│  │ Indexed Data│  (0-indexed user/item IDs)                             │
│  │ + ID maps   │  user_id_to_idx, item_id_to_idx                        │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼  Split                                                        │
│  ┌───────────────────────────────────────────────────────────────┐      │
│  │ Train (80%)           │         Test (20%)                    │      │
│  └──────┬────────────────┴────────────┬──────────────────────────┘      │
│         │                             │                                 │
│         ▼                             │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Initialize  │                      │                                 │
│  │ P, Q, biases│                      │                                 │
│  └──────┬──────┘                      │                                 │
│         │                             │                                 │
│         ▼  For each epoch             │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Shuffle     │                      │                                 │
│  │ train data  │                      │                                 │
│  └──────┬──────┘                      │                                 │
│         │                             │                                 │
│         ▼  For each (u, i, r)         │                                 │
│  ┌──────────────────────────────────────┐                               │
│  │ 1. Predict: r̂ = μ + bᵤ + bᵢ + pᵤ·qᵢ  │                               │
│  │ 2. Error:   e = r - r̂                │                               │
│  │ 3. Update:  pᵤ += lr(e·qᵢ - λpᵤ)     │                               │
│  │             qᵢ += lr(e·pᵤ - λqᵢ)     │                               │
│  │             bᵤ += lr(e - λbᵤ)        │                               │
│  │             bᵢ += lr(e - λbᵢ)        │                               │
│  └──────┬───────────────────────────────┘                               │
│         │                             │                                 │
│         ▼  After epoch                │                                 │
│  ┌─────────────┐                      │                                 │
│  │ Evaluate on │◀─────────────────────┘                                 │
│  │ test set    │                                                        │
│  └──────┬──────┘                                                        │
│         │                                                               │
│         ▼                                                               │
│  ┌─────────────┐                                                        │
│  │ Log RMSE,   │                                                        │
│  │ check for   │                                                        │
│  │ convergence │                                                        │
│  └─────────────┘                                                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Recommendation System Training Pipeline


Phased Implementation Guide

Phase 1: Data Loading (Days 1-2)

Goal: Load and explore MovieLens dataset

Tasks:

  1. Download MovieLens 100K or 1M dataset
  2. Parse CSV files (ratings, movies, users)
  3. Create user/item ID mappings
  4. Explore data statistics

Implementation:

import pandas as pd

def load_movielens(path):
    """Load MovieLens ratings (ml-latest style ratings.csv with a header row)."""
    ratings = pd.read_csv(
        f"{path}/ratings.csv",
        names=['user_id', 'item_id', 'rating', 'timestamp'],
        header=0,  # skip the file's own header; for ml-100k's u.data use sep='\t', header=None
    )

    # Create mappings to 0-indexed IDs
    user_ids = ratings['user_id'].unique()
    item_ids = ratings['item_id'].unique()

    user_to_idx = {u: idx for idx, u in enumerate(user_ids)}
    item_to_idx = {item: idx for idx, item in enumerate(item_ids)}

    # Convert to (user_idx, item_idx, rating) tuples
    data = [
        (user_to_idx[r.user_id], item_to_idx[r.item_id], r.rating)
        for r in ratings.itertuples()
    ]

    return data, user_to_idx, item_to_idx

Verification:

  • Correct number of users/items
  • Rating distribution looks reasonable (1-5 scale)
  • No missing or invalid data

Phase 2: Basic Matrix Factorization (Days 3-5)

Goal: Implement core prediction and training

Tasks:

  1. Initialize P and Q matrices randomly
  2. Implement prediction function
  3. Implement SGD update step
  4. Train on small subset first

Implementation:

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=10):
        # Initialize with small random values
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))

    def predict(self, u, i):
        return np.dot(self.P[u], self.Q[i])

    def train_step(self, u, i, r, lr=0.01, reg=0.02):
        pred = self.predict(u, i)
        error = r - pred

        # Gradient descent update
        p_update = lr * (error * self.Q[i] - reg * self.P[u])
        q_update = lr * (error * self.P[u] - reg * self.Q[i])

        self.P[u] += p_update
        self.Q[i] += q_update

        return error ** 2

Verification:

  • Loss decreases over epochs
  • Predictions are in reasonable range
  • No NaN or inf values

Phase 3: Biases and Global Mean (Days 6-7)

Goal: Add bias terms for better predictions

Tasks:

  1. Compute global mean rating
  2. Add user bias array
  3. Add item bias array
  4. Update training to include biases

Implementation:

import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=10):
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.global_mean = 0

    def predict(self, u, i):
        return (self.global_mean +
                self.user_bias[u] +
                self.item_bias[i] +
                np.dot(self.P[u], self.Q[i]))

    def train_step(self, u, i, r, lr=0.01, reg=0.02):
        pred = self.predict(u, i)
        error = r - pred

        # Update biases
        self.user_bias[u] += lr * (error - reg * self.user_bias[u])
        self.item_bias[i] += lr * (error - reg * self.item_bias[i])

        # Update factors
        p_update = lr * (error * self.Q[i] - reg * self.P[u])
        q_update = lr * (error * self.P[u] - reg * self.Q[i])
        self.P[u] += p_update
        self.Q[i] += q_update

        return error ** 2

Verification:

  • Biases capture user/item tendencies
  • RMSE improves with biases
  • High-rated items have positive item bias

Phase 4: Evaluation (Days 8-9)

Goal: Properly evaluate model performance

Tasks:

  1. Implement train/test split
  2. Implement RMSE and MAE
  3. Track metrics over epochs
  4. Plot learning curves

Implementation:

import numpy as np

def train_test_split(data, test_ratio=0.2):
    """Random split of ratings."""
    np.random.shuffle(data)
    split = int(len(data) * (1 - test_ratio))
    return data[:split], data[split:]

def evaluate(model, test_data):
    """Compute RMSE on test set."""
    squared_errors = []
    for u, i, r in test_data:
        pred = model.predict(u, i)
        squared_errors.append((r - pred) ** 2)
    return np.sqrt(np.mean(squared_errors))

def train(model, train_data, test_data, epochs=20, lr=0.01, reg=0.02):
    """Training loop with evaluation."""
    for epoch in range(epochs):
        np.random.shuffle(train_data)
        total_loss = 0

        for u, i, r in train_data:
            total_loss += model.train_step(u, i, r, lr, reg)

        train_rmse = np.sqrt(total_loss / len(train_data))
        test_rmse = evaluate(model, test_data)
        print(f"Epoch {epoch}: Train RMSE={train_rmse:.4f}, Test RMSE={test_rmse:.4f}")

Verification:

  • Test RMSE is reasonable (< 1.0)
  • Training loss decreases
  • No overfitting (test RMSE doesn't increase much)

Phase 5: Recommendation Generation (Days 10-11)

Goal: Generate personalized recommendations

Tasks:

  1. Find items the user hasn't rated
  2. Predict ratings for all unseen items
  3. Return top-N recommendations
  4. Show movie titles with predictions

Implementation:

def get_recommendations(model, user_id, user_rated_items, all_items, n=10):
    """Get top N recommendations for user."""
    unseen = set(all_items) - set(user_rated_items[user_id])

    predictions = []
    for item_id in unseen:
        pred = model.predict(user_id, item_id)
        predictions.append((item_id, pred))

    # Sort by predicted rating, descending
    predictions.sort(key=lambda x: -x[1])

    return predictions[:n]

# Usage (user_rated_items maps each user to the set of items they rated)
recs = get_recommendations(model, 42, user_rated_items, all_items, n=10)
for item_id, pred_rating in recs:
    print(f"{movie_titles[item_id]}: {pred_rating:.2f} stars (predicted)")

Verification:

  • Recommendations make intuitive sense
  • User's highly-rated genres appear in recommendations
  • No already-rated items in recommendations

Phase 6: Similarity and Exploration (Days 12-14)

Goal: Find similar items and interpret factors

Tasks:

  1. Implement cosine similarity
  2. Find similar items/users
  3. Visualize latent factors
  4. Interpret what factors might mean

Implementation:

import numpy as np
import matplotlib.pyplot as plt

def cosine_similarity(v1, v2):
    """Compute cosine similarity between two vectors."""
    dot = np.dot(v1, v2)
    norm1 = np.linalg.norm(v1)
    norm2 = np.linalg.norm(v2)
    return dot / (norm1 * norm2 + 1e-8)

def find_similar_items(model, item_id, k=10):
    """Find k most similar items based on latent factors."""
    target = model.Q[item_id]
    similarities = []

    for other_id in range(len(model.Q)):
        if other_id != item_id:
            sim = cosine_similarity(target, model.Q[other_id])
            similarities.append((other_id, sim))

    return sorted(similarities, key=lambda x: -x[1])[:k]

def visualize_factors(model, item_ids, item_names, factors=(0, 1)):
    """Plot items in 2D factor space."""
    plt.figure(figsize=(10, 8))
    for item_id in item_ids:
        x = model.Q[item_id, factors[0]]
        y = model.Q[item_id, factors[1]]
        plt.scatter(x, y)
        plt.annotate(item_names[item_id][:20], (x, y))
    plt.xlabel(f"Factor {factors[0]}")
    plt.ylabel(f"Factor {factors[1]}")
    plt.show()

Verification:

  • Items returned as most similar are sensibly related
  • Factors show clustering by genre
  • Visualization reveals interpretable structure

Testing Strategy

Unit Tests

  1. Data loading:
    • Correct number of ratings loaded
    • All ratings in valid range (1-5)
    • ID mappings are bijective
  2. Model components:
    • Prediction is dot product plus biases
    • Updates change P and Q
    • Regularization shrinks factors
  3. Metrics:
    • RMSE of predicting the global mean everywhere equals the ratings' standard deviation (see the sketch after this list)
    • Perfect predictions give RMSE of 0
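
A minimal pytest-style sketch of the two metric checks (test names are illustrative):

import numpy as np

def test_constant_prediction_rmse():
    # Predicting the mean everywhere gives RMSE == population std deviation
    ratings = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    preds = np.full_like(ratings, ratings.mean())
    rmse = np.sqrt(np.mean((ratings - preds) ** 2))
    assert np.isclose(rmse, ratings.std())

def test_perfect_prediction_rmse():
    ratings = np.array([3.0, 4.5, 2.0])
    rmse = np.sqrt(np.mean((ratings - ratings) ** 2))
    assert rmse == 0.0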

Integration Tests

  1. Overfitting test: Can model achieve near-zero loss on small dataset?
  2. Bias test: Does adding biases improve RMSE?
  3. Regularization test: Does reg prevent overfitting?

Sanity Checks

  1. Known item similarity: Do "Star Wars" and "Empire Strikes Back" have high similarity?
  2. User consistency: If user rates action movies high, do action movies appear in recommendations?
  3. Extreme predictions: Are all predictions in reasonable range (1-5)?

Common Pitfalls and Debugging

Learning Rate Issues

Symptom: Loss oscillates or diverges

Cause: Learning rate too high

Fix: Start with lr=0.001, increase gradually

Regularization Issues

Symptom: Test RMSE much higher than train RMSE

Cause: Overfitting (regularization too low)

Fix: Increase λ (try 0.02, 0.05, 0.1)

Cold Start Errors

Symptom: Errors on new users/items

Cause: User/item not in training data

Fix: Use fallback (global mean, popular items)

Numerical Issues

Symptom: NaN in predictions or loss

Cause: Division by zero, overflow

Fix: Add epsilon to denominators, clip values

Slow Training

Symptom: Hours to train on MovieLens 1M

Cause: Pure Python loops

Fix: Vectorize, use NumPy, or implement in C
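
Full SGD is awkward to vectorize, but evaluation and batch prediction vectorize cleanly. A minimal sketch, assuming the model attributes from the phases above:

import numpy as np

def predict_batch(model, users, items):
    """Vectorized predictions for parallel arrays of user/item indices."""
    users, items = np.asarray(users), np.asarray(items)
    return (model.global_mean
            + model.user_bias[users]
            + model.item_bias[items]
            + np.sum(model.P[users] * model.Q[items], axis=1))

def evaluate_fast(model, test_data):
    u, i, r = (np.array(col) for col in zip(*test_data))
    return np.sqrt(np.mean((r - predict_batch(model, u, i)) ** 2))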


Extensions and Challenges

Beginner Extensions

  • Tune hyperparameters (factors, lr, reg)
  • Add early stopping
  • Save/load trained model

Intermediate Extensions

  • Implement ALS instead of SGD
  • Add temporal dynamics (ratings change over time)
  • Incorporate item genres as features
  • Build a simple web demo

Advanced Extensions

  • Implement implicit feedback model
  • Add side information (user demographics)
  • Neural collaborative filtering
  • A/B testing simulation

Real-World Connections

Netflix

The Netflix Prize (2006-2009) popularized matrix factorization:

  • $1 million for 10% improvement in RMSE
  • Winning solution: ensemble of many models, MF as core
  • Showed practical value of latent factor models

Spotify

Uses collaborative filtering for:

  • Discover Weekly playlist
  • Related artists
  • Radio stations

Plus content features (audio analysis).

Amazon

"Customers who bought X also bought Y":

  • Item-item collaborative filtering
  • Fast to compute (precomputed similarities)
  • Core of product recommendation

YouTube

Deep learning + collaborative filtering:

  • Candidate generation (MF-like)
  • Ranking (deep neural network)
  • Billions of users and videos

Self-Assessment Checklist

Conceptual Understanding

  • I can explain what latent factors represent
  • I understand why the dot product measures preference
  • I know how biases improve predictions
  • I can explain the cold start problem
  • I understand the tradeoff between bias and variance

Practical Skills

  • I implemented matrix factorization with SGD
  • My model achieves reasonable RMSE (< 0.95 on MovieLens)
  • I can generate and interpret recommendations
  • I can find similar items using learned factors
  • I can tune hyperparameters effectively

Teaching Test

Can you explain to someone else:

  • Why do we factorize the rating matrix?
  • What do the user and item vectors represent?
  • How does training work?
  • Why is regularization important?

Resources

Primary

  • "Data Science for Business" by Provost & Fawcett - Business context
  • "Recommender Systems Handbook" - Comprehensive reference
  • Netflix Prize papers - Original inspiration

Reference

  • Surprise library - Python recommendation toolkit
  • LightFM - Hybrid recommendation library
  • MovieLens datasets - Standard benchmarks

Online

  • Google's Recommendation ML course - Practical guide
  • Koren et al. "Matrix Factorization Techniques" - Classic paper
  • Coursera: Recommender Systems Specialization - Academic depth

When you complete this project, you'll understand that recommendation systems are applied linear algebra: users and items as vectors, preferences as dot products, learning as optimization. Every time Netflix suggests a movie, this math is running behind the scenes.