Project 22: Matrix Calculus Backpropagation Workbench
Build a matrix-form gradient verification lab that makes backpropagation auditable instead of magical.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert (The Systems Architect) |
| Time Estimate | 2-3 weeks |
| Main Programming Language | Python |
| Alternative Programming Languages | Julia, Rust, MATLAB |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 1. The “Resume Gold” (Educational/Personal Brand) |
| Knowledge Area | Matrix Calculus / Deep Learning Internals |
| Main Book | “The Matrix Calculus You Need for Deep Learning” by Parr and Howard |
1. Learning Objectives
- Derive Jacobian/Hessian components with shape discipline.
- Implement matrix-chain-rule backpropagation end to end.
- Build finite-difference gradient checks for each parameter block.
- Diagnose mismatch signatures (transpose bugs, reduction bugs, saturation).
2. All Theory Needed (Per-Concept Breakdown)
Concept A: Jacobians and Shape Semantics
Fundamentals: A Jacobian records how each output component responds to each input component. Shape errors in Jacobian code are among the most common deep-learning implementation failures.
Deep Dive into the concept: Pick one Jacobian layout convention and apply it globally. Every forward operator should publish a contract: its output shape and the shape of its local derivative. Composed layers then become explicit matrix products of local Jacobians, which reveals exactly where transposes, broadcasts, and reductions must appear.
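A minimal sketch of such a shape contract, assuming the numerator-layout convention (Jacobian of an m-output, n-input map has shape (m, n)); the affine map and all names here are illustrative:

```python
import numpy as np

# Affine map y = W @ x: its local Jacobian dy/dx is W itself.
# Convention (assumed): numerator layout, so J has shape (m, n).
n, m = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def affine(x):
    return W @ x

J = W  # analytic Jacobian: row i holds the gradient of output i w.r.t. x

# Verify the contract column by column with centered differences.
eps = 1e-6
J_num = np.zeros((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J_num[:, j] = (affine(x + e) - affine(x - e)) / (2 * eps)

print("max abs error:", np.abs(J - J_num).max())
```

Because the map is linear, the numerical Jacobian matches the analytic one to floating-point precision; nonlinear operators would show an O(eps^2) truncation error instead.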
Concept B: Matrix Chain Rule
Fundamentals: Backpropagation is the reverse-mode application of the chain rule to composed vector functions.
Deep Dive into the concept: Layerwise gradients flow from the loss back to the parameters through the intermediate activations. The matrix form compresses pages of scalar derivations and matches how modern implementations are written. Correctness depends on getting both the arithmetic and the shape transformations right.
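A minimal sketch of the matrix chain rule in reverse mode, assuming a two-layer MLP with a batch-mean MSE loss; all names (X, W1, W2, the ReLU mask) are illustrative, not prescribed by the guide:

```python
import numpy as np

rng = np.random.default_rng(1)
B, D, H, O = 8, 5, 7, 3
X = rng.standard_normal((B, D))
Y = rng.standard_normal((B, O))
W1 = rng.standard_normal((D, H))
W2 = rng.standard_normal((H, O))

# Forward pass: cache every intermediate the backward pass will reuse.
Z1 = X @ W1               # (B, H)
A1 = np.maximum(Z1, 0.0)  # ReLU, (B, H)
Yhat = A1 @ W2            # (B, O)
loss = ((Yhat - Y) ** 2).mean()

# Backward pass: each line applies one local Jacobian to the upstream gradient.
dYhat = 2.0 * (Yhat - Y) / (B * O)  # dL/dYhat, (B, O); mean runs over B*O entries
dW2 = A1.T @ dYhat                  # (H, O); the transpose is dictated by the chain rule
dA1 = dYhat @ W2.T                  # (B, H)
dZ1 = dA1 * (Z1 > 0)                # ReLU's local derivative is an elementwise mask
dW1 = X.T @ dZ1                     # (D, H)
```

Note how every backward line can be checked against the shape contract of the parameter it feeds: dW1 must match W1's shape, dW2 must match W2's.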
Concept C: Numerical Gradient Checking
Fundamentals: Finite differences serve as a sanity oracle for analytical gradients.
Deep Dive into the concept: Use centered differences, whose truncation error is O(eps^2), and a robust relative-error denominator such as max(|g_numeric| + |g_analytic|, eps). Very small gradients need careful interpretation: when both values are near zero, raw relative error blows up and reports spurious failures.
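A sketch of such a checker; the function name `grad_check` and the cubic test function are illustrative assumptions, not part of the guide:

```python
import numpy as np

def grad_check(f, x, analytic_grad, eps=1e-5):
    """Centered-difference gradient check; returns the worst relative error.

    f: zero-argument callable returning a scalar loss, closing over x.
    x: parameter array, perturbed in place and restored after each probe.
    """
    num = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        old = x[i]
        x[i] = old + eps
        fp = f()
        x[i] = old - eps
        fm = f()
        x[i] = old
        num[i] = (fp - fm) / (2 * eps)
    # Robust denominator: avoids blow-up when both gradients are near zero.
    denom = np.maximum(np.abs(num) + np.abs(analytic_grad), 1e-8)
    return float(np.max(np.abs(num - analytic_grad) / denom))

# Hypothetical usage: f(x) = sum(x**3) has analytic gradient 3 x**2.
rng = np.random.default_rng(0)
x = rng.standard_normal(6)
err = grad_check(lambda: float(np.sum(x ** 3)), x, 3 * x ** 2)
print(f"max relative error: {err:.2e}")
```

A deliberately wrong gradient (say 2x instead of 3x^2) should push the returned error up by many orders of magnitude, which is what makes the checker a usable oracle rather than a formality.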
3. Build Blueprint
- Implement affine-only network and analytic gradients.
- Add nonlinearity and softmax-cross-entropy.
- Add gradient checker per parameter tensor.
- Emit layerwise mismatch diagnostics.
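The blueprint's checker-plus-diagnostics steps might look like the following driver loop; the single-parameter model, the `params` dict, and the report format are illustrative assumptions:

```python
import numpy as np

# Check each parameter block of a tiny affine model against centered
# differences and report a per-block relative error.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
Y = rng.standard_normal((8, 3))
params = {"W1": rng.standard_normal((4, 3))}

def loss():
    return ((X @ params["W1"] - Y) ** 2).mean()

def analytic_grads():
    # dL/dW1 for batch-mean MSE: (2 / (B*O)) * X^T (X W1 - Y)
    B, O = Y.shape
    return {"W1": 2.0 * X.T @ (X @ params["W1"] - Y) / (B * O)}

eps = 1e-6
for name, g in analytic_grads().items():
    W = params[name]
    num = np.zeros_like(W)
    it = np.nditer(W, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        old = W[i]
        W[i] = old + eps
        fp = loss()
        W[i] = old - eps
        fm = loss()
        W[i] = old
        num[i] = (fp - fm) / (2 * eps)
    rel = np.max(np.abs(num - g) / np.maximum(np.abs(num) + np.abs(g), 1e-8))
    status = "PASS" if rel < 1e-5 else "FAIL"
    print(f"Gradient check dL/d{name}: {rel:.1e} [{status}]")
```

Extending `params` with more blocks (W2, biases) reuses the same loop unchanged, which is the point of checking per parameter tensor rather than per layer.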
4. Real-World Outcome (Target)
```
$ python matrix_calculus_lab.py --model mlp2 --batch 8
Forward loss: 1.209341
Gradient check dL/dW1: 4.7e-07
Gradient check dL/dW2: 5.3e-07
Chain-rule consistency: PASS
```
5. Core Design Notes from Main Guide
Core Question
“How does backpropagation work when every derivative is a matrix derivative?”
Common Pitfalls
- Inconsistent batch reduction conventions
- Hidden transpose mistakes
- Misleading gradient-check thresholds near zero
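The second pitfall is what the Definition of Done's shape assertions are for. A sketch of such a helper, catching a transposed gradient; `assert_shape` and the layer names are illustrative:

```python
import numpy as np

def assert_shape(name, arr, expected):
    """Fail fast with a readable message when a tensor violates its contract."""
    if arr.shape != expected:
        raise AssertionError(f"{name}: expected shape {expected}, got {arr.shape}")

B, D, H = 8, 4, 7
rng = np.random.default_rng(0)
X = rng.standard_normal((B, D))
W1 = rng.standard_normal((D, H))
dZ1 = rng.standard_normal((B, H))   # upstream gradient reaching this layer

dW1 = X.T @ dZ1                     # correct: (D, H), matches W1
assert_shape("dL/dW1", dW1, W1.shape)

buggy = dZ1.T @ X                   # hidden transpose bug: shape (H, D)
try:
    assert_shape("dL/dW1", buggy, W1.shape)
except AssertionError as failure:
    print(failure)
```

One caveat: when D == H (square layers), the transposed result has the same shape and the assertion cannot fire; that case is exactly where the finite-difference check earns its keep.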
Definition of Done
- All layer gradients pass finite-difference checks
- Shape assertions exist for every forward/backward step
- Failure logs isolate offending tensor blocks
- Matrix-form derivation is documented
6. Extensions
- Add Hessian-vector product checks.
- Compare reverse-mode and forward-mode cost.
- Add tensor-notation documentation for 3D batches.
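The first extension can be bootstrapped from the same finite-difference machinery: differencing the gradient along a direction v approximates the Hessian-vector product. A sketch using an assumed quadratic test function f(x) = 0.5 x^T A x, whose exact HVP is A @ v:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)          # symmetric matrix, the Hessian of f
x = rng.standard_normal(n)
v = rng.standard_normal(n)

def grad(x):
    return A @ x             # analytic gradient of f(x) = 0.5 x^T A x

# HVP via a centered difference of the gradient along direction v.
eps = 1e-5
hvp_num = (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)
hvp_exact = A @ v
print("max abs error:", np.abs(hvp_num - hvp_exact).max())
```

For this quadratic the gradient is linear, so the difference is exact up to rounding; on a real network the same two extra backward passes give an O(eps^2) HVP estimate without ever materializing the Hessian.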