LINEAR ALGEBRA LEARNING PROJECTS
Learning Linear Algebra Through Building
Goal
After completing these projects, you will deeply understand linear algebra as the language of transformations. You won’t just know the formulas—you’ll see matrices as machines that warp space, eigenvectors as the natural axes that transformations preserve, and decompositions as ways to reveal hidden structure. When someone mentions “projecting onto a subspace” or “change of basis,” you’ll visualize it instantly because you’ve built systems that depend on these operations working correctly.
This isn’t about passing exams. It’s about reaching the point where linear algebra becomes your native language for thinking about space, data, and computation.
Why Linear Algebra Matters
Linear algebra is arguably the most practically important branch of mathematics for programmers, scientists, and engineers. It underlies nearly every computational domain:
| Domain | How Linear Algebra Powers It |
|---|---|
| Computer Graphics | Every 3D object you see in games/movies is positioned, rotated, and projected using matrices |
| Machine Learning / AI | Neural networks are stacks of matrix multiplications with nonlinearities |
| Physics Simulation | Forces, velocities, and constraints are all vectors; solving physics means solving linear systems |
| Computer Vision | Images are matrices; detecting edges, faces, and objects uses eigenanalysis and convolutions |
| Robotics | Robot arm movement uses transformation matrices and inverse kinematics |
| Data Science | PCA, SVD, and regression are all linear algebra at their core |
| Signal Processing | Audio and radio signals are decomposed into frequency components via Fourier (matrix operations) |
| Cryptography | Many encryption schemes rely on matrix operations over finite fields |
The Core Insight: Everything is a Transformation
The key insight that makes linear algebra click is this: matrices are not just grids of numbers—they are transformation machines.
MATRIX AS TRANSFORMATION MACHINE
Input Vector Output Vector
| |
v v
[ 2 ] +-----------------+ [ 5 ]
[ 3 ] ------> | [ 1 1 ] | -----> [ -1 ]
| [ 1 -1 ] |
+-----------------+
Transformation
The matrix takes a 2D vector and outputs a new 2D vector.
This particular matrix rotates and scales the input.
When you multiply a matrix by a vector, you’re not “doing arithmetic”—you’re applying a transformation to a point in space. The matrix encodes the transformation; the vector is what gets transformed.
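To make this concrete, here is a minimal sketch (Python/NumPy assumed, even though some projects below use C) that applies the matrix from the diagram above to the input vector and checks the linearity property:

```python
import numpy as np

# The transformation machine from the diagram above
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])

v = np.array([2.0, 3.0])      # input vector

w = A @ v                     # apply the transformation
print(w)                      # [ 5. -1.]  -- matches the diagram

# Linearity in action: transforming a sum equals summing the transforms
u = np.array([-1.0, 4.0])
print(np.allclose(A @ (v + u), A @ v + A @ u))   # True
```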
Composing Transformations
One of the most powerful aspects of matrices is that multiplying matrices composes transformations:
COMPOSING TRANSFORMATIONS
First rotate 45 deg, then scale by 2:
Rotate Scale Combined
[ cos -sin ] [ 2 0 ] [ 1.41 -1.41 ]
[ sin cos ] x [ 0 2 ] = [ 1.41 1.41 ]
One matrix = both operations!
Apply to a point:
Original After Combined
* *
/|\ / | \
/ | \ / | \
*--+--* * | *
| |
| |
(Square becomes rotated, larger square)
This is why graphics pipelines use matrix multiplication chains: model matrix -> view matrix -> projection matrix. Each matrix transforms space, and multiplying them gives you one matrix that does everything.
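A quick sketch of the same idea in NumPy (assuming the column-vector convention, where matrices apply right to left, so the rightmost transformation happens first):

```python
import numpy as np

theta = np.radians(45)
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
scale = np.array([[2.0, 0.0],
                  [0.0, 2.0]])

# Compose: rotate first, then scale (right-to-left application)
combined = scale @ rotate
print(combined)                     # approx [[1.41 -1.41], [1.41 1.41]]

p = np.array([1.0, 0.0])
print(combined @ p)                 # one matrix does both operations
print(np.allclose(combined @ p, scale @ (rotate @ p)))   # True
```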
The Big Picture: What Linear Algebra Really Is
Linear algebra is the study of vector spaces and linear transformations between them. Let’s build geometric intuition for each core concept.
Vectors: Points and Directions in Space
A vector can be thought of two ways:
- A point: coordinates `[3, 2]` means "the point 3 units right, 2 units up"
- A direction with magnitude: an arrow from the origin to that point
VECTOR AS ARROW IN 2D SPACE
y
^
|
3 + * (3, 2)
| /|
2 + / |
| / |
1 + / |
| / |
-----+---*-----+-----> x
0 1 2 3 4
The vector [3, 2] is an arrow from origin to point (3, 2).
Length = sqrt(3^2 + 2^2) = sqrt(13) ~ 3.6
In higher dimensions, vectors work the same way—just more coordinates. A vector in 100-dimensional space is a list of 100 numbers, representing a point in 100D space.
Matrices: Machines That Transform Space
A matrix is a linear transformation—a function that takes vectors in and outputs transformed vectors. What makes it “linear” is that it preserves:
- Addition: T(v + w) = T(v) + T(w)
- Scalar multiplication: T(c * v) = c * T(v)
Geometrically, this means lines stay lines and the origin stays fixed.
Here are the fundamental 2D transformations and their matrices:
FUNDAMENTAL 2D TRANSFORMATIONS
IDENTITY (no change) SCALE (stretch)
[ 1 0 ] [ 2 0 ]
[ 0 1 ] [ 0 2 ]
*---* *-------*
| | ---> | |
*---* | |
*-------*
ROTATION (90 deg CCW) SHEAR (slant)
[ 0 -1 ] [ 1 1 ]
[ 1 0 ] [ 0 1 ]
*---* *---*
| | ---> / /
*---* *---*
REFLECTION (over x-axis) PROJECTION (onto x-axis)
[ 1 0 ] [ 1 0 ]
[ 0 -1 ] [ 0 0 ]
*---*
| | ---> *---*---- (flattened)
*---*
| | (flipped)
*---*
Each 2x2 matrix defines a unique transformation of the 2D plane. Understanding this visually is the key to internalizing linear algebra.
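If you want to experiment, a short sketch (Python/NumPy assumed) applies each of these matrices to the corners of the unit square so you can see where the corners land:

```python
import numpy as np

square = np.array([[0, 1, 1, 0],    # x-coordinates of the corners
                   [0, 0, 1, 1]])   # y-coordinates of the corners

transforms = {
    "identity":   np.array([[1, 0], [0, 1]]),
    "scale":      np.array([[2, 0], [0, 2]]),
    "rotate90":   np.array([[0, -1], [1, 0]]),
    "shear":      np.array([[1, 1], [0, 1]]),
    "reflect_x":  np.array([[1, 0], [0, -1]]),
    "project_x":  np.array([[1, 0], [0, 0]]),
}

for name, M in transforms.items():
    print(name, "\n", M @ square)   # each column is a transformed corner
```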
Eigenvalues and Eigenvectors: The Natural Directions
When you apply a transformation, most vectors change direction. But some special vectors only get scaled—they point in the same (or opposite) direction after transformation. These are eigenvectors, and the scale factor is the eigenvalue.
EIGENVECTORS: DIRECTIONS THAT DON'T ROTATE
Shear matrix: [ 1 1 ]
[ 0 1 ]
Most vectors change direction:
Original After Shear
^ ^
/ / \
/ / \
* * \
\
But horizontal vectors ONLY get scaled (eigenvalue = 1):
Original After Shear
------> -------> (same direction!)
This horizontal direction is an eigenvector of the shear.
The eigenvalue is 1 (no scaling).
Eigenvectors reveal the intrinsic structure of a transformation:
- 2D rotation matrices (other than 0° and 180° rotations) have no real eigenvectors: no direction is preserved, only the origin stays fixed
- Scaling matrices have coordinate axes as eigenvectors
- Projection matrices have eigenvalues 0 and 1 (things either stay or collapse)
Understanding eigenvectors is crucial for:
- Stability analysis: Will a system explode or converge?
- Dimensionality reduction: Which directions capture the most variance?
- Graph algorithms: PageRank is finding the dominant eigenvector
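A quick check of the claims above, using NumPy (assumed) on the shear matrix from the earlier diagram and on a 45° rotation:

```python
import numpy as np

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
vals, vecs = np.linalg.eig(shear)
print(vals)          # [1. 1.] -- both eigenvalues are 1
print(vecs[:, 0])    # [1. 0.] -- the horizontal direction survives unchanged

theta = np.radians(45)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(np.linalg.eig(rot)[0])   # complex eigenvalues: no real direction is preserved
```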
Matrix Decompositions: Breaking Down Transformations
Just as numbers can be factored (12 = 3 x 4), matrices can be decomposed into simpler pieces:
SVD: A = U * Sigma * V^T
Any matrix can be broken into:
        A        =        U        *      Sigma      *      V^T
   (transform)        (rotate)          (scale)           (rotate)

   [ complex ]     [ orthogonal ]     [ diagonal ]     [ orthogonal ]
   [ messy   ]  =  [ rotation   ]  x  [ stretch  ]  x  [ rotation   ]
   [ matrix  ]
This reveals that ANY linear transformation is just:
1. Rotate
2. Scale along axes
3. Rotate again
Different decompositions reveal different structure:
- LU decomposition: For solving systems of equations efficiently
- QR decomposition: For least squares and stability
- SVD: For understanding data structure and compression
- Eigendecomposition: For understanding transformation behavior
Core Concept Analysis
Linear algebra breaks down into these fundamental building blocks:
| Concept | What It Really Means | Geometric Intuition |
|---|---|---|
| Vectors | Arrows in space (direction + magnitude), or ordered lists of numbers | Points or directions in n-dimensional space |
| Matrices | Transformation machines that stretch, rotate, shear, or project vectors | Functions that warp all of space consistently |
| Linear Transformations | Functions that preserve lines and the origin | Grid lines stay parallel and evenly spaced |
| Matrix Multiplication | Composing transformations; applying transformation to each column | Do transformation B, then transformation A |
| Systems of Equations | Finding where multiple constraints intersect | Where do planes/lines meet in space? |
| Eigenvalues/Eigenvectors | The “natural axes” of a transformation—directions that only scale | Directions where the transformation acts simplest |
| Dot Product | Measuring alignment (how much of one vector goes in another’s direction) | Projection length times length |
| Cross Product | Finding perpendicular direction with magnitude = area | Normal vector to a plane, with area info |
| Determinants | How much a transformation scales area/volume | If det = 2, areas double; if det = 0, everything collapses |
| Matrix Decompositions | Breaking matrices into simpler, meaningful pieces (LU, QR, SVD) | Factor transformations into primitive operations |
| Rank | Dimensionality of the output space | How many dimensions survive after transformation |
| Null Space | Vectors that get mapped to zero | What information does the transformation destroy? |
Concept Summary Table
| Concept Cluster | What You Need to Internalize | Projects That Build This |
|---|---|---|
| Vectors & Operations | Vectors as arrows, addition as tip-to-tail, scaling as stretching, dot product as projection | All projects (foundational) |
| Matrix-Vector Multiplication | Matrix columns are where basis vectors land; multiplication transforms the input vector | 3D Renderer, Neural Network |
| Transformation Matrices | Rotation, scaling, shear, projection—and how they combine via multiplication | 3D Renderer, Ray Tracer |
| Change of Basis | Same vector, different coordinate systems; how cameras and local coordinates work | 3D Renderer, Ray Tracer |
| Systems of Equations | Intersecting planes, constraint satisfaction, Gaussian elimination | Physics Simulation |
| Eigenanalysis | Natural directions, stability, what a transformation “really does” | SVD Compression, Neural Network |
| Matrix Decomposition | SVD as rotate-scale-rotate; revealing hidden structure | SVD Compression, Recommendation Engine |
| Numerical Stability | Why computation order matters; condition numbers; avoiding catastrophic cancellation | Physics Simulation |
Deep Dive Reading by Concept
This section maps each concept to specific chapters in recommended books. Work through these alongside your projects.
Recommended Books
| Book | Strength | Best For |
|---|---|---|
| “Math for Programmers” by Paul Orland | Intuitive, code-first, practical examples | Building geometric intuition, vectors, transformations |
| “Computer Graphics from Scratch” by Gabriel Gambetta | Hands-on 3D graphics from first principles | Understanding graphics pipeline, projections |
| “Linear Algebra Done Right” by Sheldon Axler | Rigorous but accessible, proof-based | Deep theoretical understanding, eigenspaces |
| “Hands-On Machine Learning” by Aurelien Geron | Applied ML with math explanations | Connecting linear algebra to ML applications |
| 3Blue1Brown “Essence of Linear Algebra” (YouTube) | Visual geometric intuition | Building mental models before diving into math |
Concept-to-Chapter Mapping
Vectors and Vector Operations
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 1: “Vectors, what even are they?” | Geometric intuition for vectors |
| Math for Programmers | Chapter 2: Vectors in 2D and 3D | Vectors as arrows, operations, implementation |
| Math for Programmers | Chapter 3: Ascending to 3D | Extending to three dimensions |
| Linear Algebra Done Right | Chapter 1: Vector Spaces | Formal definition, axioms |
Matrix Transformations
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 3: “Linear transformations and matrices” | Matrices as transformations |
| 3Blue1Brown | Episode 4: “Matrix multiplication as composition” | Why matrix multiplication works the way it does |
| Math for Programmers | Chapter 4: Transforming Vectors | Implementing transformations |
| Computer Graphics from Scratch | Chapter 12: Coordinate Systems | Transformation matrices in graphics |
Matrix Multiplication and Composition
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 4: “Matrix multiplication as composition” | Composition as multiplication |
| Math for Programmers | Chapter 4: Transforming Vectors | Matrix-vector and matrix-matrix products |
| Linear Algebra Done Right | Chapter 3: Linear Maps | Abstract view of linear transformations |
Systems of Linear Equations
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 7: “Inverse matrices, column space, and null space” | Geometric interpretation of solving systems |
| Math for Programmers | Chapter 5: Solving Systems | Practical solving methods |
| Linear Algebra Done Right | Chapter 3: Linear Maps | Kernel and image |
Determinants
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 6: “The determinant” | Determinant as scaling factor of area/volume |
| Math for Programmers | Chapter 4: Transforming Vectors | Computing determinants |
| Linear Algebra Done Right | Chapter 10: Trace and Determinant | Rigorous treatment |
Eigenvalues and Eigenvectors
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 14: “Eigenvectors and eigenvalues” | Geometric intuition |
| Math for Programmers | Chapter 6: Generalizing | Eigenvalues in context |
| Linear Algebra Done Right | Chapter 5: Eigenvalues and Eigenvectors | Full theoretical treatment |
| Hands-On Machine Learning | Chapter 8: Dimensionality Reduction | PCA uses eigenanalysis |
Matrix Decompositions (SVD, etc.)
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | “What is the Singular Value Decomposition?” | Visual SVD intuition |
| Math for Programmers | Chapter 6: Generalizing | Matrix decomposition concepts |
| Hands-On Machine Learning | Chapter 8: Dimensionality Reduction | SVD for data analysis |
Projections and Orthogonality
| Resource | Chapter/Episode | What You’ll Learn |
|---|---|---|
| 3Blue1Brown | Episode 9: “Dot products and duality” | Dot product as projection |
| Math for Programmers | Chapter 3: Ascending to 3D | Cross products for normals |
| Linear Algebra Done Right | Chapter 6: Inner Product Spaces | Orthogonal projections |
Recommended Reading Order
For optimal learning, follow this sequence:
PHASE 1: Geometric Foundations (Before/During Project 1)
|
+-- 3Blue1Brown Episodes 1-4 (2 hours)
| Vectors, transformations, matrix multiplication
|
+-- Math for Programmers Chapters 2-4
| Code-level understanding of vector/matrix ops
|
+-- Computer Graphics from Scratch Chapters 1, 12
Graphics-specific transformations
PHASE 2: Deeper Structure (During Projects 2-3)
|
+-- 3Blue1Brown Episodes 5-10 (2 hours)
| Determinants, inverses, dot product, cross product
|
+-- Math for Programmers Chapters 5-6
| Systems of equations, generalizing
|
+-- Hands-On Machine Learning Chapter 8
Dimensionality reduction, PCA, SVD
PHASE 3: Mastery (During Projects 4-5 and Capstone)
|
+-- 3Blue1Brown Episodes 11-15 (1.5 hours)
| Abstract vector spaces, eigenvalues, change of basis
|
+-- Linear Algebra Done Right Chapters 1, 3, 5
| Rigorous foundations for deep understanding
|
+-- Computer Graphics from Scratch Raytracing chapters
Applying everything to light simulation
Project 1: Software 3D Renderer from Scratch
📖 Detailed Implementation Guide →
- File: LINEAR_ALGEBRA_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Computer Graphics / Linear Algebra
- Software or Tool: Graphics Pipeline
- Main Book: “Computer Graphics from Scratch” by Gabriel Gambetta
What you’ll build: A program that renders 3D wireframe models to 2D screen coordinates, with rotation, scaling, and perspective projection—no graphics libraries.
Why it teaches linear algebra: Every single operation in 3D graphics IS linear algebra. You can’t fake it—the math is literally what makes pixels appear in the right place. Rotating a cube? That’s a rotation matrix. Perspective? That’s a projection matrix. Camera movement? Translation and transformation composition.
Core challenges you’ll face:
- Representing 3D points as vectors and transforming them (maps to vector operations)
- Building rotation matrices for X, Y, Z axes (maps to matrix construction and multiplication)
- Composing multiple transformations in the correct order (maps to matrix multiplication non-commutativity)
- Projecting 3D coordinates to 2D screen space (maps to projection matrices)
- Implementing a camera/view matrix (maps to change of basis)
Key Concepts:
- Vectors and coordinates: “Computer Graphics from Scratch” by Gabriel Gambetta - Chapter 1
- Transformation matrices: “3Blue1Brown: Essence of Linear Algebra” - Episode 3 (YouTube)
- Matrix multiplication: “Math for Programmers” by Paul Orland - Chapter 4
- Homogeneous coordinates: “Computer Graphics from Scratch” by Gabriel Gambetta - Chapter 12
- Projection: “Math for Programmers” by Paul Orland - Chapter 5
Difficulty: Intermediate. Time estimate: 2-3 weeks. Prerequisites: Basic programming, trigonometry fundamentals.
Real world outcome: You will see a rotating 3D wireframe cube (or any OBJ model you load) rendered on your screen. You can move a virtual camera around the scene. When you change a single number in your rotation matrix, you’ll SEE the object rotate—making the abstract concrete.
Learning milestones:
- Render a static 2D projection of a 3D cube → You understand vectors as coordinates
- Rotate the cube smoothly with keyboard input → You understand transformation matrices
- Add perspective (things farther away appear smaller) → You understand projection matrices
- Move a camera through the scene → You understand change of basis and inverse transformations
The Core Question You’re Answering
“How do computers turn mathematical numbers into visual 3D worlds on a flat screen?”
This project answers one of the most fundamental questions in computer graphics: how do we represent three-dimensional objects using numbers, and how do we transform those numbers to create the illusion of depth on a 2D display?
The answer lies entirely in linear algebra. A 3D point is just a vector (x, y, z). Moving, rotating, or scaling that point means multiplying it by a matrix. Projecting it onto a 2D screen means applying another matrix that “squashes” the z-dimension while preserving perspective. Every frame of every 3D game, movie, or CAD application performs millions of these operations.
When you build this renderer, you’re not learning graphics programming with some math sprinkled in. You’re learning that graphics programming IS matrix operations. The distinction between “the math” and “the code” completely dissolves.
Concepts You Must Understand First
Before you write your first line of code, make sure you can answer these questions:
1. Vectors as Coordinates
- What does a vector (3, 4, 5) represent in 3D space?
- How do you add two vectors? What does addition mean geometrically?
- What’s the difference between a position vector and a direction vector?
- Read: “Computer Graphics from Scratch” Chapter 1, “Math for Programmers” Chapter 2
2. Transformation Matrices
- What does it mean to “apply” a matrix to a vector?
- Why is a rotation matrix orthogonal (its transpose equals its inverse)?
- How do you construct a rotation matrix for the X, Y, or Z axis?
- Watch: 3Blue1Brown Episode 3 “Linear transformations and matrices”
3. Matrix Multiplication and Composition
- Why does the order of matrix multiplication matter?
- If you want to rotate then translate, which matrix goes on the left?
- What happens geometrically when you multiply two rotation matrices?
- Read: “Math for Programmers” Chapter 4
4. Homogeneous Coordinates
- Why do we use 4D vectors (x, y, z, w) for 3D graphics?
- How does the w-component enable translation as matrix multiplication?
- What’s the difference between a point (w=1) and a direction (w=0)?
- Read: “Computer Graphics from Scratch” Chapter 12
5. Projection Matrices
- What’s the difference between orthographic and perspective projection?
- Why do objects farther away appear smaller in perspective?
- How does the projection matrix transform the view frustum to a cube?
- Read: “Math for Programmers” Chapter 5, “Computer Graphics from Scratch” Chapters 9-11
Questions to Guide Your Design
As you architect your renderer, ask yourself:
- Data structures: How will you represent a 3D point? A transformation matrix? Will you use arrays, structs, or something else? What operations do they need to support?
- Transformation pipeline: In what order will you apply model, view, and projection transformations? Can you compose them into a single matrix for efficiency?
- Coordinate systems: Where is the origin of your world? Which direction is “up”? How does your camera’s coordinate system relate to world coordinates?
- Clipping: What happens to vertices that are behind the camera or outside the view frustum? When should you handle this in the pipeline?
- Screen mapping: After projection, your coordinates are in normalized device coordinates (-1 to 1). How do you map these to actual pixel positions on your screen?
- Numerical precision: Floating-point math has limits. How will you handle very small or very large numbers? What happens when determinants approach zero?
Thinking Exercise: Trace a Point Through the Pipeline
Before coding, trace this example by hand:
Setup: You have a cube centered at world origin. One vertex is at position (1, 1, 1). Your camera is at position (0, 0, 5) looking toward the origin.
Step 1: Model Transform Rotate the cube 45 degrees around the Y-axis. Write out the rotation matrix and compute the new vertex position.
Ry(45) = | cos(45) 0 sin(45) | | 0.707 0 0.707 |
| 0 1 0 | = | 0 1 0 |
| -sin(45) 0 cos(45) | | -0.707 0 0.707 |
Original vertex: (1, 1, 1)
After rotation: (?, ?, ?) <-- Calculate this!
Step 2: View Transform Transform the vertex from world space to camera space. The camera is at (0, 0, 5), so you need to translate by (0, 0, -5).
After view transform: (?, ?, ?) <-- Calculate this!
Step 3: Projection Apply a simple perspective projection where x’ = x/z and y’ = y/z. (In practice you’d use a full projection matrix, but this captures the essence.)
After projection: (?, ?) <-- Calculate this!
Verify your understanding: The projected point should be closer to the center of the screen than the original (1, 1) would suggest, because the point is farther from the camera than z=1.
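Once your hand calculation is done, a few lines of NumPy (assumed; using the simplified divide-by-depth projection from Step 3, with sign conventions left to your camera setup) let you check the arithmetic:

```python
import numpy as np

theta = np.radians(45)
Ry = np.array([[ np.cos(theta), 0, np.sin(theta)],
               [ 0,             1, 0            ],
               [-np.sin(theta), 0, np.cos(theta)]])

v = np.array([1.0, 1.0, 1.0])           # vertex in model space

v_world = Ry @ v                         # Step 1: model transform (rotation)
print("after rotation:", v_world)

camera_pos = np.array([0.0, 0.0, 5.0])
v_view = v_world - camera_pos            # Step 2: view transform (translate by -camera)
print("after view transform:", v_view)

# Step 3: simplified perspective divide (dividing by depth along the view axis;
# whether z is positive or negative here depends on your camera convention)
depth = abs(v_view[2])
print("projected:", v_view[0] / depth, v_view[1] / depth)
```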
The Interview Questions They’ll Ask
Building this project prepares you for these common technical interview questions:
- “Explain how a 3D game renders a frame.”
- You’ll describe the transformation pipeline: model space -> world space -> view space -> clip space -> screen space
- “Why do graphics APIs use 4x4 matrices instead of 3x3?”
- Homogeneous coordinates allow translation to be represented as matrix multiplication, enabling all transforms to be composed
- “What’s the difference between rotating then translating vs. translating then rotating?”
- Matrix multiplication is non-commutative; the order determines whether you rotate around the world origin or the object’s local origin
- “How does perspective projection work mathematically?”
- Division by z-depth creates foreshortening; the projection matrix sets up this division via homogeneous coordinates
- “What is a view matrix and how do you construct one?”
- It’s the inverse of the camera’s world transform; it repositions the entire world so the camera is at origin looking down -Z
- “Why do we need to normalize vectors for lighting calculations?”
- Dot products only give cosines of angles when both vectors have length 1; unnormalized vectors give incorrect shading
- “What causes gimbal lock and how do you avoid it?”
- Sequential Euler rotations can align two axes, losing a degree of freedom; quaternions avoid this
- “How would you implement a camera that orbits around an object?”
- Combine rotation around the object’s center with translation outward along the rotated forward vector
Hints in Layers
When you’re stuck, reveal hints one at a time:
Hint 1: Getting Started
Start with 2D before 3D. Implement a 2D renderer that can draw lines and apply 2D transformations (rotation, scale, translation). Once this works, extending to 3D is mostly about adding one more dimension and one more transformation (projection).
Your first milestone should be: draw a single point on screen at a hardcoded 3D position, projected to 2D. Don’t worry about rotation yet.
Hint 2: Matrix Implementation
Implement your matrix multiplication carefully. The most common bug is getting row-major vs. column-major confused. Pick a convention (most math textbooks use column vectors, so matrices are applied right-to-left: M * v). Stick with it consistently.
Test your matrix code in isolation: multiply identity by any matrix should give that matrix back. Multiply a rotation by its transpose should give identity.
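Concretely, these sanity checks might look like this (sketched in Python/NumPy even though the project is in C; `rotation_z` is a hypothetical stand-in for your own matrix-building routine):

```python
import numpy as np

def rotation_z(theta):
    # Hypothetical helper standing in for your own rotation-matrix function
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

I = np.eye(3)
R = rotation_z(np.radians(30))

assert np.allclose(I @ R, R)             # identity times anything is unchanged
assert np.allclose(R @ R.T, I)           # rotation times its transpose is identity
assert np.isclose(np.linalg.det(R), 1)   # rotations preserve volume (det = 1)
print("matrix sanity checks passed")
```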
Hint 3: Debugging Transforms
When your cube looks wrong, print the vertex positions after each transformation step. Common issues:
- Cube is inside the camera (z is positive when it should be negative)
- Cube is behind the camera (z is negative when your projection expects positive)
- Transforms applied in wrong order (object spins around world origin instead of itself)
Draw the coordinate axes as colored lines (X=red, Y=green, Z=blue) to visualize orientation.
Hint 4: Performance Perspective
Compose all your transformation matrices into a single MVP (Model-View-Projection) matrix before transforming vertices. Composing the matrices is a small fixed cost once per frame, while transforming vertices costs O(n) in the vertex count, so one matrix multiply per vertex is much faster than three.
However, don’t optimize prematurely. Get it working with separate transforms first, then optimize.
Books That Will Help
| Topic | Primary Resource | Chapter/Section |
|---|---|---|
| Vectors and basic operations | “Math for Programmers” by Paul Orland | Chapters 2-3 |
| Transformation matrices | “3Blue1Brown: Essence of Linear Algebra” | Episodes 3-4 (YouTube) |
| Matrix multiplication | “Math for Programmers” by Paul Orland | Chapter 4 |
| Homogeneous coordinates | “Computer Graphics from Scratch” by Gabriel Gambetta | Chapter 12 |
| Projection and viewing | “Computer Graphics from Scratch” by Gabriel Gambetta | Chapters 9-11 |
| Camera and view matrices | “Math for Programmers” by Paul Orland | Chapter 5 |
| Complete graphics pipeline | “Computer Graphics: Principles and Practice” by Hughes et al. | Part II |
| Real-time rendering | “Real-Time Rendering” by Akenine-Moller et al. | Chapters 2, 4 |
Project 2: Image Compression Using SVD
📖 Detailed Implementation Guide →
- File: LINEAR_ALGEBRA_LEARNING_PROJECTS.md
- Programming Language: Python (or C)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Linear Algebra / Image Processing
- Software or Tool: SVD
- Main Book: “Math for Programmers” by Paul Orland
What you’ll build: A tool that compresses images by decomposing them with Singular Value Decomposition (SVD), letting you visually see how much information each “rank” captures.
Why it teaches linear algebra: Images are matrices. SVD reveals their hidden structure—the most important “directions” of variation. You’ll viscerally understand that a matrix can be broken into simpler pieces, and that some pieces matter more than others. This is the foundation for dimensionality reduction, recommendation systems, and data analysis.
Core challenges you’ll face:
- Representing images as matrices of pixel values (maps to matrix representation)
- Implementing or using SVD decomposition (maps to matrix factorization)
- Understanding what U, Σ, V^T actually represent (maps to eigenvectors and singular values)
- Reconstructing images from partial decompositions (maps to low-rank approximation)
- Measuring compression quality vs. file size (maps to understanding rank and information)
Key Concepts:
- Matrix representation of data: “Math for Programmers” by Paul Orland - Chapter 6
- SVD intuition: “3Blue1Brown: Essence of Linear Algebra” - Singular Value Decomposition (YouTube)
- Eigenvalues and eigenvectors: “Grokking Algorithms” by Aditya Bhargava - Appendix on Linear Algebra
- Low-rank approximation: “Hands-On Machine Learning” by Aurélien Géron - Chapter 8
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Basic programming, understanding of what matrices are.
Real world outcome: You’ll have a CLI tool where you input an image and a “rank” parameter. At rank 1, you see a blurry smear. At rank 10, recognizable shapes. At rank 50, nearly perfect quality at 10% the data. You’ll literally SEE linear algebra compressing information.
Learning milestones:
- Load an image as a matrix of numbers → You understand matrices as data
- Apply SVD and reconstruct the original → You understand matrix decomposition
- Reconstruct with only the top k singular values → You understand rank and information content
- Compare compression ratios and quality → You understand the power of eigenvalue decomposition
The Core Question You’re Answering
“How can we represent complex data with far fewer numbers while preserving what matters most?”
This project confronts one of the deepest questions in data science and information theory: What is the essential structure hidden inside high-dimensional data?
An image might have millions of pixels, but most of that information is redundant. The gradients are smooth. The patterns repeat. The colors cluster. SVD reveals this hidden structure by finding the most important “directions” of variation in your data.
When you compress an image with SVD, you’re not just making files smaller. You’re discovering that complex data can be approximated by a surprisingly small number of fundamental patterns. This insight underlies:
- PCA (Principal Component Analysis): The foundation of modern data science
- Latent Semantic Analysis: How search engines understand meaning
- Recommendation systems: How Netflix knows what you want to watch
- Noise reduction: Separating signal from noise in any domain
The deep insight: Most real-world data lives on a low-dimensional manifold embedded in high-dimensional space. SVD finds that manifold.
Concepts You Must Understand First
Before implementing SVD compression, ensure you can answer these questions:
1. Images as Matrices
- How is a grayscale image represented as a matrix of numbers?
- What do the row and column indices represent physically?
- How do color images extend this to three channels (RGB)?
- Book Reference: “Math for Programmers” by Paul Orland - Chapter 6 (Matrix representation of data)
2. Matrix Decomposition (Factorization)
- What does it mean to “factor” a matrix into simpler pieces?
- Why would we want to break a matrix into a product of other matrices?
- How is this analogous to factoring integers (e.g., 12 = 2 x 2 x 3)?
- Book Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 8 (Dimensionality Reduction)
3. Singular Values and Their Meaning
- What is the geometric interpretation of singular values?
- Why are they always non-negative?
- What does it mean that they’re ordered from largest to smallest?
- How do singular values relate to the “importance” of different directions?
- Video Reference: 3Blue1Brown - “Singular Value Decomposition” (YouTube)
4. Rank and Information Content
- What is the rank of a matrix?
- How does rank relate to the number of “independent pieces of information”?
- What is a rank-1 matrix and what does it look like visually?
- Why can any matrix be written as a sum of rank-1 matrices?
- Book Reference: “Math for Programmers” by Paul Orland - Chapter 6
5. Low-Rank Approximation
- What does it mean to approximate a matrix with a lower-rank version?
- Why does keeping only the top k singular values give the “best” rank-k approximation?
- What is the Eckart-Young theorem and why does it matter?
- How do we measure “closeness” between matrices (Frobenius norm)?
- Book Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 8
Questions to Guide Your Design
Before writing code, work through these design questions:
- How will you handle color images? Will you apply SVD to each RGB channel separately, or convert to grayscale first? What are the tradeoffs?
- How will you visualize the compression? What metrics will you display (compression ratio, PSNR, visual diff)?
- What happens at the extremes? What does rank-1 compression look like? What about keeping 90% of singular values?
- How will you measure “quality”? Is there a single number that captures how much information was lost?
- What’s the actual storage savings? If the original matrix is m x n, and you keep k singular values, what’s the memory footprint of U_k, S_k, V_k^T compared to the original?
- Can you visualize the singular vectors themselves? What patterns do the first few left and right singular vectors reveal about image structure?
Thinking Exercise: Manual SVD on a 3x3 Matrix
Before using numpy.linalg.svd, work through this by hand to build intuition:
Consider the matrix:
A = | 3 0 |
| 0 2 |
| 0 0 |
Step 1: Compute A^T * A (the 2x2 Gram matrix)
- What are the eigenvalues of A^T * A?
- These are the squared singular values!
Step 2: Compute A * A^T (the 3x3 matrix)
- What are its eigenvalues? (Same non-zero ones, plus a zero)
Step 3: The singular values are the square roots of these eigenvalues.
- What are sigma_1 and sigma_2?
Step 4: For this diagonal matrix, what are U, Sigma, and V^T?
- Verify that A = U * Sigma * V^T
Now try a non-diagonal matrix:
A = | 1 1 |
| 0 1 |
| 1 0 |
Work through the same steps. This is harder but builds real understanding.
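After working the steps on paper, you can check yourself with a few lines of NumPy (assumed):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# Singular values are the square roots of the eigenvalues of A^T A
print(np.linalg.eigvalsh(A.T @ A))           # [4. 9.] -> singular values 2 and 3

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(S)                                      # [3. 2.], largest first
print(np.allclose(U @ np.diag(S) @ Vt, A))    # True: the factorization rebuilds A
```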
The Interview Questions They’ll Ask
Conceptual Questions:
- “Explain SVD in terms a non-technical person could understand.”
- “What’s the difference between eigenvalue decomposition and SVD?”
- “Why can’t we use eigenvalue decomposition for non-square matrices?”
- “How does SVD relate to PCA?”
Technical Questions:
- “If a matrix is m x n with rank r, what are the dimensions of U, Sigma, and V^T?”
- “What is the computational complexity of computing the full SVD?”
- “How would you compute SVD for a matrix too large to fit in memory?”
- “Why does the Eckart-Young theorem guarantee SVD gives the best low-rank approximation?”
Applied Questions:
- “You have a 1000x1000 image. If you keep 50 singular values, what’s the compression ratio?”
- “How would you use SVD to denoise an image?”
- “Your recommendation system uses SVD. A new user signs up with no ratings. How do you make recommendations?”
- “How does Netflix use matrix factorization at scale?”
Hints in Layers
Layer 1 - Getting Started: Start with grayscale images only. Use PIL/Pillow to load an image, convert to a numpy array, and treat it as a matrix. Apply numpy.linalg.svd() and verify you can reconstruct the original image perfectly using all singular values.
Layer 2 - Implementing Compression: To keep only the top k singular values: U_k = U[:, :k], S_k = S[:k], Vt_k = Vt[:k, :]. The reconstructed image is U_k @ np.diag(S_k) @ Vt_k. Plot the result and the original side by side.
Layer 3 - Analyzing Compression: Plot the singular values on a log scale. You’ll see they decay rapidly for most images. This decay rate tells you how compressible the image is. Also plot cumulative energy: sum(S[:k]^2) / sum(S^2) to see what fraction of “variance” is captured by k components.
Layer 4 - Building the Full Tool: Add a slider or command-line parameter for k. Compute and display: compression ratio, PSNR (Peak Signal-to-Noise Ratio), and the difference image (original - reconstructed). For color images, apply SVD to each channel and recombine.
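Putting the layers together, a compressor might look roughly like this (Pillow and NumPy assumed; the function name and file paths are placeholders, not part of any prescribed API):

```python
import numpy as np
from PIL import Image

def compress_grayscale(path, k):
    """Reconstruct an image from its top-k singular values (rank-k approximation)."""
    img = np.asarray(Image.open(path).convert("L"), dtype=float)
    U, S, Vt = np.linalg.svd(img, full_matrices=False)

    approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    m, n = img.shape
    ratio = (m * n) / (k * (m + n + 1))          # original size vs. stored factors
    mse = np.mean((img - approx) ** 2)
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    print(f"rank {k}: compression ratio ~{ratio:.1f}x, PSNR {psnr:.1f} dB")
    return np.clip(approx, 0, 255).astype(np.uint8)

# Example usage (path is a placeholder):
# Image.fromarray(compress_grayscale("photo.jpg", 50)).save("photo_rank50.png")
```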
Books That Will Help
| Topic | Book | Specific Section |
|---|---|---|
| Matrix representation of data | “Math for Programmers” by Paul Orland | Chapter 6: Matrices as data |
| SVD geometric intuition | 3Blue1Brown | “Singular Value Decomposition” (YouTube) |
| Dimensionality reduction | “Hands-On Machine Learning” by Aurelien Geron | Chapter 8: Dimensionality Reduction |
| Mathematical foundations | “Linear Algebra Done Right” by Sheldon Axler | Chapter 7: Operators on Inner Product Spaces |
| Numerical computing | “Numerical Linear Algebra” by Trefethen & Bau | Lectures 4-5: SVD |
| Applications | “Mining of Massive Datasets” by Leskovec et al. | Chapter 11: Dimensionality Reduction |
Project 3: Simple Neural Network from Scratch
📖 Detailed Implementation Guide →
- File: LINEAR_ALGEBRA_LEARNING_PROJECTS.md
- Programming Language: Python (or C)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Machine Learning / Linear Algebra
- Software or Tool: Matrix Calculus
- Main Book: “Math for Programmers” by Paul Orland
What you’ll build: A neural network that classifies handwritten digits (MNIST), implementing forward propagation, backpropagation, and training—using only matrix operations.
Why it teaches linear algebra: Neural networks ARE linear algebra + nonlinearity. Every layer is a matrix multiplication. Backpropagation is the chain rule applied to matrices. You can’t understand deep learning without understanding that it’s just matrices transforming vectors through high-dimensional space.
Core challenges you’ll face:
- Representing weights as matrices and inputs as vectors (maps to matrix-vector multiplication)
- Computing layer outputs through matrix operations (maps to linear transformations)
- Implementing backpropagation via matrix calculus (maps to gradients and transposes)
- Understanding why weight initialization matters (maps to eigenvalue distribution)
- Batch processing multiple inputs efficiently (maps to matrix-matrix multiplication)
Key Concepts:
- Matrix multiplication as transformation: “Math for Programmers” by Paul Orland - Chapter 4
- Gradient descent fundamentals: “Hands-On Machine Learning” by Aurélien Géron - Chapter 4
- Neural network mathematics: “Grokking Algorithms” by Aditya Bhargava - Chapter on ML
- Backpropagation mechanics: “3Blue1Brown: Neural Networks” - Episode 4 (YouTube)
Difficulty: Intermediate-Advanced. Time estimate: 2-3 weeks. Prerequisites: Calculus basics (derivatives), matrix multiplication.
Real world outcome: You draw a digit on screen (or feed in MNIST test images), and your network correctly classifies it. You can visualize the weight matrices as images—seeing what “features” each neuron learned. When accuracy improves during training, you’re watching gradient descent navigate a high-dimensional landscape.
Learning milestones:
- Implement forward pass with matrix multiplication → You understand linear transformations
- Compute gradients via backpropagation → You understand matrix calculus and transposes
- Train and see accuracy improve → You understand optimization in high-dimensional space
- Visualize learned weights → You understand what matrices “encode”
The Core Question You’re Answering
“What IS a neural network mathematically, and why does stacking matrix multiplications with nonlinearities allow computers to ‘learn’?”
At its heart, a neural network is a composition of affine transformations (matrix multiplication + bias) interleaved with nonlinear activation functions. Without the nonlinearities, stacking layers would collapse into a single matrix multiplication—equivalent to one layer. The activation functions (ReLU, sigmoid, tanh) introduce the “bends” that allow the network to approximate any continuous function.
The deep connection to linear algebra:
- Each layer transforms the input vector into a new representation: `h = activation(Wx + b)`
- The weight matrix `W` learns which input features to combine and how strongly
- Backpropagation is just the chain rule applied to matrix operations—gradients flow backward through transposed matrices
- The network learns by adjusting matrices to minimize a loss function in a high-dimensional weight space
When you build a neural network from scratch, you’re not just coding—you’re watching linear algebra learn to recognize patterns.
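To make that mapping from math to code concrete, here is a minimal two-layer forward pass in NumPy (layer sizes are illustrative, not MNIST's 784 inputs and 10 outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4,))            # a 4-dimensional input vector
W1 = rng.normal(size=(3, 4)) * 0.1   # hidden layer: 4 inputs -> 3 units
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3)) * 0.1   # output layer: 3 units -> 2 classes
b2 = np.zeros(2)

h = np.maximum(0.0, W1 @ x + b1)     # affine transform + ReLU nonlinearity
logits = W2 @ h + b2                 # another affine transform
print(logits)                        # raw scores; a softmax would turn these into probabilities
```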
Concepts You Must Understand First
Before diving into implementation, ensure you can answer these questions:
1. Matrix-Vector Multiplication as Transformation
- What does multiplying a vector by a matrix geometrically represent?
- If `W` is a 100x784 matrix and `x` is a 784-dimensional vector, what is the dimension of `Wx`?
- Why does each row of `W` compute a “weighted sum” of the input features?
- Reference: “Math for Programmers” by Paul Orland - Chapter 4
2. The Chain Rule and Gradients
- How do you compute the derivative of a composition of functions: `f(g(x))`?
- What is the gradient of a scalar function with respect to a vector?
- Why does the chain rule let us propagate errors backward through layers?
- Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 4
3. Transpose and Its Geometric Meaning
- What does the transpose of a matrix represent geometrically?
- Why do gradients flow through `W^T` during backpropagation?
- If `W` transforms from space A to space B, what does `W^T` do?
- Reference: “3Blue1Brown: Essence of Linear Algebra” - Episode on Transpose
4. Weight Matrices as Learned Transformations
- How do random initial weights affect training?
- What happens if all weights start at zero? Why is this a problem?
- Why do “vanishing” and “exploding” gradients relate to eigenvalues?
- Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 11
5. Batch Processing with Matrix-Matrix Multiplication
- If `X` is a batch of N samples (N x 784), what is the shape of `XW^T`?
- Why is batch processing more efficient than processing samples one at a time?
- How does vectorization eliminate Python loops?
- Reference: “Math for Programmers” by Paul Orland - Chapter 4
Questions to Guide Your Design
As you architect your neural network, wrestle with these design decisions:
- How will you represent layers? Should each layer be an object with forward/backward methods? Or a simple collection of weight matrices and functions?
- How will you handle the forward pass mathematically? Write out the equations for a 3-layer network: input -> hidden1 -> hidden2 -> output. What are the dimensions at each step?
- How will you implement backpropagation? Can you derive the gradient of the loss with respect to each weight matrix? Start with the output layer and work backward.
- What activation function will you use, and why? ReLU is simple (max(0, x)) but has “dead neurons.” Sigmoid squashes to [0,1] but saturates. What are the tradeoffs?
- How will you initialize weights? Random uniform? Random normal? Xavier/He initialization? Why does the scale of initial weights matter for gradient flow?
- How will you structure the training loop? Stochastic gradient descent processes one sample at a time. Mini-batch processes groups. Full batch processes everything. What are the tradeoffs in terms of convergence and efficiency?
Thinking Exercise
Before writing code, trace through a tiny network by hand to verify your understanding.
Setup: A 2-layer network with:
- Input: x = [1, 2] (2-dimensional)
- Hidden layer: W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]] (3x2), b1 = [0, 0, 0]
- Output layer: W2 = [[0.1, 0.2, 0.3]] (1x3), b2 = [0]
- Activation: ReLU
- Target: y = 1
- Loss: Mean squared error
Forward pass (trace each step):
- Compute z1 = W1 @ x + b1 = ?
- Compute h1 = ReLU(z1) = ?
- Compute z2 = W2 @ h1 + b2 = ?
- Compute output = z2 (no activation on output for regression)
- Compute loss = (output - y)^2 = ?
Backward pass (derive gradients):
- dL/d(output) = 2 * (output - y) = ?
- dL/dW2 = dL/d(output) * h1^T = ?
- dL/dh1 = W2^T @ dL/d(output) = ?
- dL/dz1 = dL/dh1 * ReLU’(z1) = ? (where ReLU’(x) = 1 if x > 0, else 0)
- dL/dW1 = dL/dz1 @ x^T = ?
Complete this exercise on paper before coding. It will reveal any gaps in your understanding.
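Once you have filled in the question marks on paper, a short NumPy script (assumed) with the same numbers will confirm your forward and backward passes:

```python
import numpy as np

x  = np.array([1.0, 2.0])
W1 = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]); b1 = np.zeros(3)
W2 = np.array([[0.1, 0.2, 0.3]]);                    b2 = np.zeros(1)
y  = 1.0

# Forward pass -- compare each value to your hand calculation
z1 = W1 @ x + b1
h1 = np.maximum(0.0, z1)            # ReLU
out = (W2 @ h1 + b2)[0]
loss = (out - y) ** 2
print("z1:", z1, "output:", out, "loss:", loss)

# Backward pass (chain rule, layer by layer)
d_out = 2 * (out - y)
dW2 = d_out * h1[np.newaxis, :]     # gradient w.r.t. W2
dh1 = W2.flatten() * d_out          # gradient flows back through W2^T
dz1 = dh1 * (z1 > 0)                # ReLU derivative gates the gradient
dW1 = np.outer(dz1, x)              # gradient w.r.t. W1
print("dW2:", dW2)
print("dW1:", dW1)
```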
The Interview Questions They’ll Ask
If you build a neural network from scratch, you should be able to answer these:
- “Explain backpropagation without using the word ‘backpropagation.’” Expected: The chain rule applied to compute gradients of the loss with respect to each parameter, working from output to input, reusing intermediate computations.
- “Why can’t you use only linear layers without activation functions?” Expected: Composition of linear functions is linear. W2 @ W1 = W3 (a single matrix). You’d have a 1-layer network regardless of depth.
- “What’s the relationship between batch size and gradient noise?” Expected: Larger batches give more accurate gradient estimates but less frequent updates. Smaller batches add noise that can help escape local minima but may not converge smoothly.
- “Why does gradient descent work in such high-dimensional spaces?” Expected: The loss landscape, while complex, tends to have many paths to good solutions. Local minima in high dimensions are often saddle points with escape routes.
- “Derive the gradient of softmax cross-entropy loss.” Expected: The gradient simplifies beautifully to (prediction - target), making it computationally efficient and numerically stable.
- “What happens to gradients as networks get deeper?” Expected: Vanishing (gradients shrink exponentially) or exploding (gradients grow exponentially). This relates to the eigenvalues of weight matrices and motivates techniques like batch normalization and residual connections.
- “How would you implement dropout using only matrix operations?” Expected: Create a binary mask matrix and element-wise multiply with activations. Scale by 1/(1-p) to maintain expected values.
- “Why do we use cross-entropy loss for classification instead of MSE?” Expected: Cross-entropy has better gradient properties for softmax outputs—gradients don’t saturate when predictions are confident. MSE punishes confident correct predictions.
Hints in Layers
If you’re stuck, reveal these hints one at a time:
Hint 1 - Structuring Your Code:
Create a Layer class with forward(x) and backward(grad) methods. Each layer stores its weights and caches inputs needed for the backward pass. The network is just a list of layers.
Hint 2 - The Backward Pass Pattern: For each layer, the backward pass receives the gradient of the loss with respect to its output. It must compute:
- Gradient with respect to its weights (to update them)
- Gradient with respect to its input (to pass to the previous layer)
For a linear layer y = Wx + b:
- `dL/dW = dL/dy @ x^T`
- `dL/dx = W^T @ dL/dy`
- `dL/db` = sum of `dL/dy` across the batch dimension
Hint 3 - Numerical Gradient Checking:
Verify your backprop by computing numerical gradients: for each weight w, compute (L(w+eps) - L(w-eps)) / (2*eps). Compare to your analytical gradient. They should match to ~1e-5.
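A minimal version of that check, sketched in NumPy for a single linear layer with a squared-error loss (the model here is illustrative, not your full network):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
x = rng.normal(size=(3,))
y = rng.normal(size=(2,))

def loss(W):
    return 0.5 * np.sum((W @ x - y) ** 2)

# Analytic gradient for this loss: dL/dW = (Wx - y) x^T
grad_analytic = np.outer(W @ x - y, x)

# Numerical gradient via central differences, one weight at a time
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))   # should be tiny (~1e-9)
```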
Hint 4 - Common Bugs:
- Forgetting to transpose when computing weight gradients
- Not caching the pre-activation values needed for activation gradients
- Updating weights before computing all gradients (update all at once at the end)
- Off-by-one errors in batch dimensions
Books That Will Help
| Topic | Book | Specific Chapter |
|---|---|---|
| Matrix multiplication as transformation | “Math for Programmers” by Paul Orland | Chapter 4: Transforming vectors with matrices |
| Gradient descent fundamentals | “Hands-On Machine Learning” by Aurelien Geron | Chapter 4: Training Models |
| Backpropagation intuition | 3Blue1Brown Neural Networks series | Episode 3: What is backpropagation? |
| Matrix calculus for deep learning | “Hands-On Machine Learning” by Aurelien Geron | Chapter 10: Introduction to ANNs |
| Weight initialization | “Hands-On Machine Learning” by Aurelien Geron | Chapter 11: Training Deep Neural Networks |
| Visual understanding of neural networks | 3Blue1Brown Neural Networks series | All 4 episodes (YouTube) |
| Practical implementation patterns | “Deep Learning from Scratch” by Seth Weidman | Chapters 1-4 |
Project 4: Physics Simulation with Constraints
📖 Detailed Implementation Guide →
- File: LINEAR_ALGEBRA_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 4: Expert
- Knowledge Area: Physics Simulation / Linear Algebra
- Software or Tool: Physics Engine
- Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron (Mathematical foundations)
What you’ll build: A 2D physics engine that simulates rigid body dynamics with constraints (springs, joints, collisions)—requiring you to solve systems of linear equations each frame.
Why it teaches linear algebra: Physics simulation requires solving Ax = b every frame. Collision response requires projections. Constraint solving requires understanding null spaces and least squares. You’ll see linear algebra as the engine of physical reality.
Core challenges you’ll face:
- Representing position and velocity as vectors (maps to vectors as state)
- Solving systems of equations for constraint forces (maps to solving Ax = b)
- Detecting and resolving collisions via projections (maps to projections and dot products)
- Implementing stable integration (maps to matrix conditioning)
- Handling over/under-constrained systems (maps to rank, null space, least squares)
Key Concepts:
- Vectors as physical quantities: “Math for Programmers” by Paul Orland - Chapter 2
- Solving linear systems: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Mathematical foundations
- Projections and orthogonality: “3Blue1Brown: Essence of Linear Algebra” - Episode on Dot Products
- Numerical methods: “Numerical Recipes in C” by Press et al. - Chapter 2 (Gaussian elimination)
Difficulty: Advanced. Time estimate: 3-4 weeks. Prerequisites: Basic physics intuition, matrix operations.
Real world outcome: You’ll see balls bouncing, chains swinging, and objects stacking realistically on screen. When you add a constraint, you’re adding an equation. When things explode (they will at first), you’ll debug by examining matrix conditioning—making abstract stability concepts visceral.
Learning milestones:
- Simulate unconstrained motion (gravity, velocity) → You understand vectors as state
- Add collision detection and response → You understand projections and dot products
- Implement spring/joint constraints → You understand solving systems of equations
- Handle stability and stacking → You understand matrix conditioning and numerical stability
The Core Question You’re Answering
“How do we make objects in a computer simulation obey the laws of physics by solving systems of equations?”
This question cuts to the heart of physics simulation. In the real world, physics “just happens”: gravity pulls, objects collide, constraints hold. In a computer, every physical interaction becomes a mathematical equation that must be solved.
The connection to linear algebra is direct and unavoidable:
- Newton’s laws become
F = ma, which in matrix form isM * a = F(mass matrix times acceleration equals force) - Constraints (like “these two points must stay connected”) become linear equations that restrict motion
- Collision response requires solving for impulses that simultaneously satisfy multiple contact conditions
- Each simulation frame requires solving
Ax = bwhere A encodes the physics, x is what you’re solving for (velocities, forces), and b contains external forces and constraints
When you stack boxes, your computer doesn’t “know” they should stack; it solves a system of equations every 1/60th of a second that computes the exact forces keeping them balanced. Linear algebra IS the physics engine.
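As a tiny taste of what “solving physics” means in code, here is a sketch (Python/NumPy used for brevity even though the project targets C; masses and forces are made up) that solves M * a = F for two particles under gravity:

```python
import numpy as np

# Diagonal mass matrix for two particles in 2D (masses 1.0 and 2.0)
M = np.diag([1.0, 1.0, 2.0, 2.0])

# External forces: gravity only, stacked as [fx_A, fy_A, fx_B, fy_B]
g = -9.81
F = np.array([0.0, 1.0 * g, 0.0, 2.0 * g])

# Newton's second law in matrix form: M a = F  -->  a = M^{-1} F
a = np.linalg.solve(M, F)
print(a)    # every particle accelerates at g downward, regardless of mass
```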
Concepts You Must Understand First
Before diving into implementation, ensure you can answer these questions:
1. Vectors as Physical State (Position, Velocity, Acceleration)
- How do we represent a particle’s complete state as a vector?
- Why do we often combine position and velocity into a single “state vector”?
- What’s the relationship between the derivative of position and velocity?
- Reference: “Math for Programmers” by Paul Orland - Chapter 2 (Vectors for Game Development)
2. Solving Systems of Linear Equations (Ax = b)
- What does it mean geometrically to solve Ax = b?
- When does a unique solution exist? When infinite solutions? When none?
- What’s the computational cost of naive Gaussian elimination vs. optimized methods?
- How do you implement LU decomposition, and why is it useful for repeated solves?
- Reference: “Numerical Recipes in C” by Press et al. - Chapter 2 (Solution of Linear Algebraic Equations)
3. Projections and Dot Products
- How do you project one vector onto another? What’s the formula?
- What does a negative dot product tell you about the angle between vectors?
- How do projections help decompose a velocity into “along surface” and “into surface” components?
- Why is the reflection formula
v' = v - 2(v . n)n? - Reference: “3Blue1Brown: Essence of Linear Algebra” - Episode 9 (Dot Products)
4. Matrix Conditioning and Numerical Stability
- What is a condition number, and why does it matter?
- Why do poorly conditioned matrices cause “explosions” in physics simulations?
- What’s the difference between explicit and implicit integration, and why does implicit help stability?
- How do you detect when your matrix is becoming ill-conditioned?
- Reference: “Numerical Recipes in C” by Press et al. - Chapter 2.6 (Singular Value Decomposition)
5. Null Space and Constraint Solving
- What is the null space of a matrix?
- Why do over-constrained systems have no solution?
- What does least-squares give you when the system is inconsistent?
- How do you use the pseudo-inverse to find “best fit” solutions?
- Reference: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Mathematical foundations sections
Questions to Guide Your Design
As you architect your physics engine, keep these questions in mind:
- State Representation: How will you organize your state vector? All positions first, then all velocities? Or interleaved per-object? How does this choice affect matrix sparsity?
- Constraint Formulation: When you say “point A must stay 2 meters from point B”, how do you express this as a linear (or linearized) equation? What’s the Jacobian of this constraint?
- Integration Strategy: Will you use explicit Euler (simple but unstable), semi-implicit Euler (better), or fully implicit (stable but requires solving systems)? What trade-offs does each bring?
- Collision Detection vs. Response: How do you separate “finding when/where things collide” from “computing what forces/impulses result”? Why is this separation architecturally important?
- Stacking and Resting Contact: When objects are resting on each other, how do you prevent them from slowly sinking through the floor? How does this relate to complementarity and inequality constraints?
- Performance vs. Accuracy: At 60 FPS, you have ~16ms per frame. How do you balance the accuracy of your linear solver against real-time requirements? When is an approximate solution acceptable?
Thinking Exercise
Before coding, solve this by hand to build intuition:
Problem: Two particles in 2D:
- Particle A at position (0, 2) with velocity (1, 0) and mass 1
- Particle B at position (2, 2) with velocity (-1, 0) and mass 2
They collide elastically. Compute the post-collision velocities.
Step 1: Write the collision normal vector (the direction connecting the centers at collision)
Step 2: Decompose each particle’s velocity into components parallel and perpendicular to the collision normal
Step 3: Apply conservation of momentum and kinetic energy to find post-collision velocities along the collision normal
Step 4: Combine with unchanged perpendicular components to get final velocities
Expected Answer: Work through the math. You should find that momentum and energy are conserved, and the relative velocity along the normal reverses.
Extension: Now add a constraint: particles A and B are connected by a rigid rod of length 2. Write the constraint equation. What’s the Jacobian? If particle A is pulled downward by gravity, what force does the constraint exert on B?
The Interview Questions They’ll Ask
Physics simulation knowledge is highly valued in games, VFX, robotics, and scientific computing. Be prepared for:
- “Walk me through how you’d implement a basic physics engine from scratch.”
- They want to hear: integration scheme, collision detection broad/narrow phase, constraint solving, stability considerations
- “What’s the difference between position-based and velocity-based constraint solving?”
- Discuss Jacobian-based methods vs. position projection, stability trade-offs, and use cases
- “Your simulation explodes when you stack 10 boxes. How do you debug this?”
- Talk about condition numbers, time step size, penetration depths, solver iterations, and warmstarting
- “How would you handle 10,000 particles efficiently?”
- Discuss spatial partitioning, sparse matrices, parallel constraint solving, and SIMD vectorization
- “Explain the Jacobian matrix in the context of constraint dynamics.”
- It maps from generalized velocities to constraint velocities; its transpose maps constraint forces back to generalized forces
- “What’s the difference between impulse-based and force-based collision response?”
- Impulses change velocity instantly (good for discrete collisions); forces integrate over time (better for continuous contact)
- “How do you handle friction in a physics engine?”
- Discuss Coulomb friction model, friction cone, linearized friction pyramids, and why friction makes the problem non-linear
- “Your simulation has to run in real-time but also be deterministic for network multiplayer. What constraints does this impose?”
- Fixed time steps, deterministic floating point (or fixed point), careful ordering of operations
Hints in Layers
If you’re stuck, reveal these one at a time:
Layer 1 - Architecture Hint: Start with the simplest possible system: one particle, gravity only, Euler integration. Get this working perfectly before adding anything else. Then add a second particle and collision detection. Only then add constraints.
Layer 2 - Integration Hint: Explicit Euler will make your simulation unstable at high speeds or with stiff constraints. Switch to semi-implicit (symplectic) Euler: update velocity first, then use the NEW velocity to update position. This one change dramatically improves stability.
Layer 3 - Constraint Solving Hint: Don’t try to solve all constraints simultaneously at first. Use iterative constraint solving: for each constraint, compute and apply the correction, then move to the next. Repeat multiple times per frame. This is how most real-time physics engines work (Sequential Impulses algorithm).
Layer 4 - Stability Hint: Add “Baumgarte stabilization” to prevent constraint drift: instead of just satisfying the velocity constraint, add a term proportional to the position error. The constraint becomes J * v = -beta * C, where C is the constraint position error and beta is a small constant (try 0.1 - 0.3).
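The sketch below ties Layers 2-4 together in one place: semi-implicit Euler, an iterative (sequential-impulse style) pass over distance constraints, and a Baumgarte bias. It is an illustrative arrangement, not a prescribed engine design; the Particle class, constants, and the (beta / dt) form of the bias (a common velocity-level variant of the J * v = -beta * C rule above) are assumptions made for the example.

```python
import numpy as np

GRAVITY = np.array([0.0, -9.81])
DT, BETA, ITERATIONS = 1.0 / 60.0, 0.2, 8

class Particle:
    def __init__(self, pos, mass):
        self.pos = np.array(pos, dtype=float)
        self.vel = np.zeros(2)
        self.inv_mass = 0.0 if mass == 0 else 1.0 / mass   # mass 0 means "fixed"

def step(particles, constraints):
    # Semi-implicit Euler, part 1 (Layer 2): integrate velocity first.
    for p in particles:
        if p.inv_mass > 0:
            p.vel += GRAVITY * DT
    # Sequential impulses (Layer 3): satisfy each constraint in turn, several passes per frame.
    for _ in range(ITERATIONS):
        for a, b, rest_length in constraints:
            delta = b.pos - a.pos
            dist = np.linalg.norm(delta)
            inv_mass_sum = a.inv_mass + b.inv_mass
            if dist == 0 or inv_mass_sum == 0:
                continue
            n = delta / dist                      # rod direction: this constraint's Jacobian rows
            C = dist - rest_length                # position error
            rel_vel = (b.vel - a.vel) @ n         # J * v
            # Baumgarte bias (Layer 4): drive J * v toward -(beta / dt) * C.
            lam = -(rel_vel + (BETA / DT) * C) / inv_mass_sum
            a.vel -= lam * n * a.inv_mass
            b.vel += lam * n * b.inv_mass
    # Semi-implicit Euler, part 2: integrate position with the NEW velocity.
    for p in particles:
        p.pos += p.vel * DT

# Toy scene: a rigid rod of length 2 between a fixed anchor and a falling particle.
anchor, bob = Particle([0.0, 2.0], mass=0), Particle([2.0, 2.0], mass=1.0)
for _ in range(120):
    step([anchor, bob], [(anchor, bob, 2.0)])
print(np.linalg.norm(bob.pos - anchor.pos))   # stays near the rod length of 2.0
```

Swapping the integration back to explicit Euler, or dropping the Baumgarte term, is a quick way to see the instability and drift the hints warn about.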
Books That Will Help
| Topic | Book | Specific Chapters |
|---|---|---|
| Vector math foundations | “Math for Programmers” by Paul Orland | Ch 2: Vectors, Ch 3: Transformations |
| Solving linear systems | “Numerical Recipes in C” by Press et al. | Ch 2: Linear Algebraic Equations (2.1-2.6) |
| Numerical stability | “Numerical Recipes in C” by Press et al. | Ch 2.6: SVD, Ch 2.9: Condition Number |
| Systems programming | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 2: Data Representation, Ch 5: Optimization |
| Game physics specifically | “Game Physics Engine Development” by Ian Millington | Part III: Rigid Body Physics |
| Advanced constraint dynamics | “Physically Based Modeling” by David Baraff (SIGGRAPH Course Notes) | All sections (free online) |
Project 5: Recommendation Engine with Matrix Factorization
📖 Detailed Implementation Guide →
- File: LINEAR_ALGEBRA_LEARNING_PROJECTS.md
- Programming Language: Python (or C)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Machine Learning / Linear Algebra
- Software or Tool: Matrix Factorization
- Main Book: “Data Science for Business” by Provost & Fawcett
What you’ll build: A movie/music recommendation system that uses matrix factorization to predict user preferences from sparse ratings data.
Why it teaches linear algebra: Recommendation systems treat users and items as vectors in a latent space. Factorizing the ratings matrix reveals hidden structure—what “dimensions” of taste exist. This is applied linear algebra for real business value.
Core challenges you’ll face:
- Representing the user-item ratings as a sparse matrix (maps to matrix representation)
- Factorizing into user and item embedding matrices (maps to matrix factorization)
- Minimizing reconstruction error (maps to optimization and Frobenius norm)
- Handling missing values (maps to sparse matrices and regularization)
- Finding similar users/items via vector similarity (maps to dot products and cosine similarity)
Key Concepts:
- Matrix factorization intuition: “Data Science for Business” by Provost & Fawcett - Chapter on recommendations
- Optimization and loss functions: “Hands-On Machine Learning” by Aurélien Géron - Chapter 4
- Similarity metrics: “Math for Programmers” by Paul Orland - Chapter 6
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic matrix operations, some optimization intuition
Real world outcome: You input your movie ratings, and the system recommends films you’ll likely enjoy. You can inspect the learned “latent factors”—discovering that one dimension might correspond to “artsy vs. blockbuster” and another to “action vs. romance.” Abstract vectors become interpretable preferences.
Learning milestones:
- Build the ratings matrix from data → You understand sparse matrix representation
- Implement matrix factorization → You understand low-rank approximation
- Predict missing ratings → You understand reconstruction from factors
- Find similar items via embeddings → You understand vector similarity in learned spaces
The Core Question You’re Answering
“How can we discover hidden patterns in user preferences by treating ratings as a matrix to be factorized?”
The key insight behind recommendation engines is that user preferences aren’t random—they follow hidden patterns. When you rate movies, you’re not evaluating each film independently. Instead, you’re responding to underlying factors: how much action is there? Is it emotionally complex? Is it visually stunning?
Latent factors are these hidden dimensions that explain why users like what they like. Matrix factorization works by assuming that the ratings matrix R (users x items) can be approximated as the product of two smaller matrices:
R ≈ U × V^T
Where:
- U is the user matrix (users x k latent factors) – each row is a user’s “taste profile”
- V is the item matrix (items x k latent factors) – each row is an item’s “characteristic profile”
- k is the number of latent factors (you choose this)
The magic: you never define what these factors mean. The algorithm discovers them by finding U and V that best reconstruct the known ratings. After training, you might discover that factor 1 corresponds to “action vs. drama” and factor 2 to “mainstream vs. indie”—but the math found this structure automatically.
To predict how user i would rate item j: compute the dot product of user i’s row in U with item j’s row in V. This measures how well the user’s preferences align with the item’s characteristics.
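In code, that prediction really is just a dot product. A tiny sketch with made-up numbers (the values of U, V, and the choice k = 2 are illustrative, not learned from data):

```python
import numpy as np

# R ≈ U @ V.T with k = 2 latent factors and invented taste profiles.
U = np.array([[1.2, 0.3],    # user 0: strong on factor 1, weak on factor 2
              [0.2, 1.1]])   # user 1: the opposite taste profile
V = np.array([[1.0, 0.1],    # item 0 leans toward factor 1
              [0.2, 0.9],    # item 1 leans toward factor 2
              [0.7, 0.7]])   # item 2 sits in between

R_hat = U @ V.T              # every predicted rating at once (users x items)

# Predicting one rating is the dot product of a user row with an item row.
print(R_hat[0, 1], U[0] @ V[1])   # the same number both ways
```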
Concepts You Must Understand First
Before building a recommendation engine, ensure you can answer these foundational questions:
1. Sparse Matrices and Their Representation
- What makes a matrix “sparse” and why does it matter for ratings data?
- How do COO (coordinate), CSR (compressed sparse row), and dictionary representations differ?
- Why can’t you just use a regular 2D array for millions of users and items?
- Book Reference: “Data Science for Business” by Provost & Fawcett - Chapter 3 (Data representation)
2. Matrix Factorization as Finding Hidden Factors
- Why does R ≈ UV^T work? What assumptions does this make about the data?
- What does the rank of the approximation (k) control?
- How is this related to SVD, and how does it differ?
- Book Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 8 (Dimensionality Reduction)
3. Frobenius Norm and Reconstruction Error
- How do you measure how well UV^T approximates R?
- What is the Frobenius norm and why is it the natural choice?
- How do you handle the error calculation when most of R is unknown?
- Book Reference: “Math for Programmers” by Paul Orland - Chapter 6 (Working with matrices)
4. Vector Similarity: Dot Product and Cosine
- What does the dot product of two vectors tell you geometrically?
- When should you use dot product vs. cosine similarity?
- How do you find the most similar items/users given their embeddings?
- Book Reference: “Math for Programmers” by Paul Orland - Chapter 3 (Vectors and dot products)
5. Optimization and Gradient Descent
- How do you find U and V that minimize reconstruction error?
- What is stochastic gradient descent and why use it here?
- What is regularization and why does it prevent overfitting?
- Book Reference: “Hands-On Machine Learning” by Aurelien Geron - Chapter 4 (Training Linear Models)
Questions to Guide Your Design
Before writing code, work through these design questions:
- Data structure: Your ratings matrix might have 100,000 users and 50,000 movies, but only 0.1% of entries are filled. How will you represent this efficiently? What operations need to be fast?
- Initialization: U and V start with random values. Does initialization matter? What happens if you start with all zeros? All ones? Very large values?
- Training strategy: Should you process ratings one at a time (SGD), in mini-batches, or all at once? What are the tradeoffs in terms of speed, memory, and convergence?
- Handling bias: Some users rate everything high. Some movies are universally loved. How do you separate these global biases from the latent factor interactions?
- Cold start: A new user has no ratings. A new movie has no ratings. What can your system recommend? What are the options for handling this?
- Evaluation: How do you measure if your recommendations are good? What metrics make sense? How do you split data for training vs. testing when order matters?
Thinking Exercise
Work through this tiny example by hand to build intuition:
The Setup: You have 3 users rating 4 movies on a 1-5 scale:
| | Movie A | Movie B | Movie C | Movie D |
|---|---|---|---|---|
| User 1 | 5 | 4 | ? | 1 |
| User 2 | ? | 3 | 4 | 2 |
| User 3 | 4 | ? | 5 | ? |
Your Task: Find 2 latent factors (k=2) that explain this pattern.
Step 1: Initialize random user and item matrices:
U (3 users x 2 factors):
- User 1: [0.5, 0.3]
- User 2: [0.4, 0.7]
- User 3: [0.6, 0.4]
V (4 items x 2 factors):
- Movie A: [0.6, 0.2]
- Movie B: [0.5, 0.4]
- Movie C: [0.3, 0.8]
- Movie D: [0.2, 0.5]
Step 2: Predict User 1’s rating for Movie A:
- User 1’s factors: [0.5, 0.3]
- Movie A’s factors: [0.6, 0.2]
- Prediction = 0.5 x 0.6 + 0.3 x 0.2 = 0.30 + 0.06 = 0.36
This is way off from the actual rating of 5! The error is 5 - 0.36 = 4.64.
Step 3: Update using gradient descent (learning rate = 0.1):
- Adjust User 1’s factors to be more like Movie A’s (scaled by error)
- Adjust Movie A’s factors to be more like User 1’s (scaled by error)
New User 1 factors: [0.5 + 0.1 x 4.64 x 0.6, 0.3 + 0.1 x 4.64 x 0.2] = [0.78, 0.39]
New Movie A factors: [0.6 + 0.1 x 4.64 x 0.5, 0.2 + 0.1 x 4.64 x 0.3] = [0.83, 0.34]
Step 4: Repeat for all known ratings, many times (epochs).
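If you want to check the arithmetic in Steps 2-3, this small snippet reproduces the update exactly (no bias terms or regularization yet):

```python
# Reproduce Steps 2-3 of the exercise for User 1 and Movie A.
u1, mA = [0.5, 0.3], [0.6, 0.2]
lr, actual = 0.1, 5.0

pred = sum(u * m for u, m in zip(u1, mA))             # 0.36
err = actual - pred                                   # 4.64

u1_new = [u + lr * err * m for u, m in zip(u1, mA)]   # about [0.78, 0.39]
mA_new = [m + lr * err * u for u, m in zip(u1, mA)]   # about [0.83, 0.34]
print(pred, err, u1_new, mA_new)
```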
Questions to ponder:
- After training, what might the 2 latent factors represent for this data?
- Can you predict User 1’s rating for Movie C using the learned factors?
- Why might User 3 love Movie C (rating 5) while User 1 has left it unrated? What does the difference in their known ratings suggest about their taste profiles?
The Interview Questions They’ll Ask
Prepare for these common interview questions about recommendation systems:
- Explain the cold-start problem and three different approaches to solve it. (Tests: Understanding of practical limitations and creative problem-solving)
- You have 1 billion ratings. Walk me through how you’d train a matrix factorization model at scale. (Tests: Systems thinking, distributed computing awareness, practical ML engineering)
- What’s the difference between collaborative filtering and content-based recommendations? When would you use each? (Tests: Breadth of knowledge about recommendation approaches)
- How would you explain to a product manager why we can’t just recommend the highest-rated items to everyone? (Tests: Communication skills, understanding of personalization value)
- Your recommendation system keeps suggesting the same popular items. How would you increase diversity? (Tests: Understanding of exploration vs. exploitation, practical system design)
- How is matrix factorization related to neural network embeddings? Could you use a neural network instead? (Tests: Connections between classical and modern ML, depth of understanding)
- A user rates a movie 5 stars, then changes it to 1 star. How does your system handle this? (Tests: Practical thinking about real-world data issues)
- What evaluation metrics would you use, and why is offline evaluation different from online A/B testing? (Tests: Understanding of ML evaluation, practical experience)
Hints in Layers
Use these hints progressively—try the next step yourself before revealing the next hint.
Layer 1 - Getting Started: Start with a tiny dataset you can verify by hand. Then use the MovieLens 100K dataset (100,000 ratings from roughly 1,000 users on 1,700 movies). Store ratings as a dictionary: {(user_id, item_id): rating}. Don’t worry about efficiency yet.
Layer 2 - First Implementation: Implement basic SGD: for each known rating, compute the predicted rating (dot product), calculate error, update the user and item vectors. Use a small learning rate (0.01) and watch the training loss decrease. Add bias terms for users and items.
Layer 3 - Making It Work: Add L2 regularization to prevent overfitting: either shrink each factor vector by a factor of (1 - learning_rate x lambda) at every update, or equivalently add lambda x ||U||^2 + lambda x ||V||^2 to your loss. Split the data into train/test sets by time or randomly, and compute RMSE on the test set to measure generalization.
Layer 4 - Production Considerations: For scale: use NumPy vectorization, consider Alternating Least Squares (ALS) which parallelizes better than SGD. For recommendations: precompute top-k similar items for each item. For cold start: fall back to popularity-based recommendations, or use side information (movie genres, user demographics).
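Putting Layers 2-3 together, here is one possible shape for the training loop, with bias terms, L2 regularization, and an RMSE check. The function names, hyperparameters, and the (user_id, item_id, rating) triple format are assumptions made for this sketch, not a required design.

```python
import random
import numpy as np

def train_mf(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=20, seed=0):
    """SGD matrix factorization with user/item biases and L2 regularization (illustrative)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    mu = np.mean([r for (_, _, r) in ratings])          # global mean rating

    data = list(ratings)
    for _ in range(epochs):
        random.shuffle(data)
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + U[u] @ V[i]
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            u_old = U[u].copy()                          # V's update must use the pre-update U[u]
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return mu, bu, bi, U, V

def rmse(model, ratings):
    mu, bu, bi, U, V = model
    errs = [(r - (mu + bu[u] + bi[i] + U[u] @ V[i])) ** 2 for u, i, r in ratings]
    return np.sqrt(np.mean(errs))

# Toy usage: ratings as (user_id, item_id, rating) triples.
train = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 1, 3),
         (1, 2, 4), (1, 3, 2), (2, 0, 4), (2, 2, 5)]
model = train_mf(train, n_users=3, n_items=4, k=2)
print(rmse(model, train))
```

In practice you would compute RMSE on a held-out test split rather than on the training ratings; the training-set number here is only a sanity check that the loss is actually decreasing.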
Books That Will Help
| Topic | Book | Specific Chapter/Section |
|---|---|---|
| Business intuition for recommendations | “Data Science for Business” by Provost & Fawcett | Chapter 9: Similarity, Neighbors, and Clusters |
| Matrix factorization techniques | “Hands-On Machine Learning” by Aurelien Geron | Chapter 8: Dimensionality Reduction (PCA/SVD sections) |
| Vector operations and similarity | “Math for Programmers” by Paul Orland | Chapter 3: Vectors (dot product), Chapter 6: Matrices |
| Optimization fundamentals | “Hands-On Machine Learning” by Aurelien Geron | Chapter 4: Training Models (gradient descent) |
| Practical recommendation systems | “Programming Collective Intelligence” by Toby Segaran | Chapter 2: Making Recommendations |
| Deep learning approaches | “Deep Learning” by Goodfellow, Bengio, Courville | Chapter 15: Representation Learning (embeddings) |
| Collaborative filtering theory | “Mining of Massive Datasets” by Leskovec, Rajaraman, Ullman | Chapter 9: Recommendation Systems (free online) |
Video Resources:
- Andrew Ng’s Machine Learning course (Coursera) - Week 9: Recommender Systems
- Google’s Machine Learning Crash Course - Embeddings module
- 3Blue1Brown: Essence of Linear Algebra - Episode on dot products (geometric intuition)
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor | Primary Concepts |
|---|---|---|---|---|---|
| 3D Renderer | Intermediate | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Transformations, matrices, projections |
| SVD Image Compression | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Decomposition, eigenvalues, rank |
| Neural Network | Intermediate-Advanced | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Matrix multiplication, gradients |
| Physics Simulation | Advanced | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Systems of equations, projections |
| Recommendation Engine | Intermediate | 1-2 weeks | ⭐⭐⭐ | ⭐⭐⭐ | Factorization, similarity, sparse matrices |
Recommendation
Start with the 3D Renderer.
Here’s why:
- Visual feedback loop: Every matrix operation produces a visible result. Mess up a rotation matrix? The cube warps. Get it right? It spins smoothly. This immediate feedback burns the concepts into your brain.
- Forces foundational understanding: You can’t skip ahead. You must understand vectors before transformations, transformations before composition, composition before projection.
- Historical authenticity: This is how linear algebra was developed for computers: to solve graphics problems. You’re learning it the way it was meant to be learned.
- Gateway to everything else: Once you understand transformation matrices, neural networks (just more transformations) and physics (just vectors and constraints) become natural extensions.
Supplementary resource to work through in parallel: Watch 3Blue1Brown’s “Essence of Linear Algebra” series (YouTube, ~3 hours total). It provides the geometric intuition that textbooks often lack. Watch one episode, then implement that concept in your renderer.
Final Capstone Project: Real-Time Ray Tracer with Global Illumination
📖 Detailed Implementation Guide →
What you’ll build: A ray tracer that renders photorealistic 3D scenes by simulating light bouncing through the environment, with reflections, refractions, and soft shadows—all from pure linear algebra.
Why it teaches the complete picture: This project synthesizes EVERYTHING:
- Vectors: Rays are origin points + direction vectors
- Dot products: Determine angles for lighting calculations
- Cross products: Compute surface normals
- Transformations: Object positioning and camera movement
- Systems of equations: Ray-object intersection
- Matrix inverses: Transforming rays into object space
- Projections: Reflection and refraction vectors
- Orthonormal bases: Building coordinate frames for sampling
Core challenges you’ll face:
- Computing ray-sphere and ray-plane intersections (maps to solving quadratic/linear systems)
- Implementing reflection and refraction vectors (maps to projections and vector decomposition)
- Building camera transformation matrices (maps to change of basis)
- Computing surface normals for arbitrary geometry (maps to cross products and normalization)
- Implementing Monte Carlo sampling for soft effects (maps to basis construction and hemispheres)
Key Concepts:
- Ray-geometry intersection: “Computer Graphics from Scratch” by Gabriel Gambetta - Chapters on Raytracing
- Reflection/refraction physics: “Physically Based Rendering” by Pharr & Humphreys - Chapter 8
- Transformation matrices: “Math for Programmers” by Paul Orland - Chapter 5
- Monte Carlo methods: “Physically Based Rendering” by Pharr & Humphreys - Chapter 2
Difficulty: Advanced
Time estimate: 1-2 months
Prerequisites: Complete at least the 3D Renderer project first
Real world outcome: You will generate genuinely photorealistic images that demonstrate mastery of every linear algebra concept in this guide. Here is exactly what you will see:
- Your first render: A simple scene with colored spheres on a gray plane. The spheres appear solid and 3D because your dot product calculations create proper diffuse shading - brighter where surfaces face the light, darker on the opposite side.
- Adding reflections: A chrome sphere now mirrors its environment. You see the checkered floor reflected and curved on the sphere’s surface. The distortion is mathematically correct because your reflection vector formula (r = d - 2(d.n)n) projects the incoming ray onto the surface normal.
- Glass and refraction: A transparent glass ball bends the scene behind it like a magnifying lens. The caustic patterns and inverted image inside the sphere emerge naturally from Snell’s law implemented as vector operations.
- Soft shadows: Area lights produce penumbras - shadows that fade gradually at their edges. Multiple shadow rays sample the light source, and the percentage that reach it determines shadow softness.
- Global illumination: Colors bleed between surfaces. A red wall tints the white floor beside it pink. Light bounces multiple times through the scene, each bounce a recursive ray cast with proper attenuation.
- Final showcase render: A complex scene with multiple material types - brushed metal, clear glass, matte plastic, and textured surfaces. The image is indistinguishable from a photograph. Every pixel required hundreds of linear algebra operations: ray-object intersections (solving quadratic equations), lighting calculations (dot products), reflections (vector projections), and coordinate transformations (matrix multiplications).
You can render animations by interpolating camera matrices between keyframes, watching your mathematical light simulation come alive.
Learning milestones:
- Cast rays and detect sphere intersections → You’ve mastered vector equations
- Add Phong shading with proper lighting → You understand dot products as projection
- Implement reflections and refractions → You understand vector decomposition
- Add transforms for camera and objects → You understand matrix inverses and change of basis
- Implement soft shadows and global illumination → You understand sampling in 3D spaces
The Core Question You’re Answering
“How can we simulate the physics of light using nothing but linear algebra to create photorealistic images?”
This capstone synthesizes every linear algebra concept you have learned. Ray tracing is not just a graphics technique - it is applied linear algebra in its purest form. Every ray is a parametric line equation. Every intersection is a system of equations being solved. Every reflection is a vector projection. Every transformation between coordinate spaces requires matrix inverses. Every sampling of a hemisphere requires constructing orthonormal bases.
When you build a ray tracer, you are proving to yourself that linear algebra is not abstract mathematics - it is the mathematical machinery that simulates physical reality. Light follows the same vector equations you learned in Project 1. The difference is that now you are composing all those concepts together to simulate something that looks indistinguishable from a photograph.
Concepts You Must Understand First
Before attempting this capstone, ensure you have internalized these six foundational concepts:
| Concept | Why It Matters for Ray Tracing | Where You Learned It |
|---|---|---|
| Ray representation | A ray is origin + t * direction. Every pixel requires casting a ray, and every intersection requires finding the t value where the ray hits geometry. | Project 1 (3D Renderer) |
| Ray-geometry intersection | Finding where a ray hits a sphere requires solving a quadratic equation. Finding where it hits a plane requires solving a linear equation. These are systems of equations in disguise. | Project 4 (Physics Simulation) |
| Reflection and refraction vectors | The reflection formula r = d - 2(d.n)n is a vector projection. Refraction uses Snell’s law converted to vector form. These determine where light bounces. | Project 1 (Projections) |
| Transformation matrices | Objects live in their own coordinate space. Cameras have their own space. You must transform rays between spaces using 4x4 matrices and their inverses. | Project 1 (3D Renderer) |
| Orthonormal bases | Sampling a hemisphere for soft shadows or global illumination requires building a local coordinate frame from a single normal vector using cross products. | Project 1 (Cross Products) |
| Matrix inverses | To transform a ray into object space, you need the inverse of the object’s transformation matrix. Singular matrices will crash your renderer. | All previous projects |
Questions to Guide Your Design
Before writing code, work through these design questions on paper:
- How will you represent a ray mathematically? A ray is not just two points - it is an origin and a direction. What data structure captures this? Why is the direction normalized?
- What happens when a ray hits nothing? Your scene is finite but rays extend to infinity. How do you handle the background? What color does “nothing” return?
- How do you find the closest intersection? A ray might hit multiple objects. How do you efficiently find which one is closest to the camera? What data structure tracks candidate intersections?
- Where does recursion enter? Reflections require casting new rays. Refractions require casting new rays. Global illumination requires casting many new rays. How deep should recursion go? What terminates it?
- How do you handle numerical precision? When you compute an intersection point and cast a shadow ray from it, floating-point errors might cause the ray to intersect the same surface it originated from. How do you prevent “shadow acne”?
- What coordinate space should lighting calculations happen in? World space? Camera space? Object space? Each choice has tradeoffs. What happens if you mix them incorrectly?
- How do you sample area lights for soft shadows? A point light creates hard shadows. An area light requires sampling multiple points. How do you distribute those samples? Why does randomization help?
Thinking Exercise
Before coding, trace through this scenario by hand:
Scene: A single reflective sphere with center at (0, 0, 5) and radius 1, with a point light at (-5, 5, 0).
Camera: Located at origin (0, 0, 0), looking down the positive Z axis toward the sphere.
Task: Trace a ray through pixel (320, 240) in a 640x480 image.
Work through these calculations:
- Convert pixel to ray direction: Assuming a 90-degree field of view, what is the direction vector for this ray? (Hint: pixel (320, 240) is the center of the image, so the ray should point straight along the +Z axis toward the sphere.)
- Find the intersection: Substitute the ray equation into the sphere equation (x^2 + y^2 + (z-5)^2 = 1). What quadratic do you get? What are the two t values? Which one is the “front” intersection?
- Compute the intersection point: Using the smaller positive t, what is the 3D point where the ray hits the sphere?
- Compute the surface normal: At the intersection point, the normal is (intersection_point - sphere_center) / radius. Calculate this unit vector.
- Compute the reflected ray: Using r = d - 2(d.n)n, where d is the incoming ray direction and n is the normal, what is the reflected direction?
- Trace the shadow ray: From the intersection point toward the light at (-5, 5, 0), does this ray hit any geometry before reaching the light? If not, the point is illuminated.
This exercise takes 15-20 minutes with pencil and paper, but it will clarify exactly what your code needs to do.
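Once you have done the trace by hand, a short script can confirm your numbers. This sketch assumes the setup above (camera at the origin, center pixel ray pointing straight along +Z, sphere at (0, 0, 5) with radius 1, light at (-5, 5, 0)):

```python
import numpy as np

origin = np.zeros(3)
direction = np.array([0.0, 0.0, 1.0])                   # center-pixel ray, straight along +Z
center, radius = np.array([0.0, 0.0, 5.0]), 1.0

# Ray-sphere intersection: ||O + tD - C||^2 = r^2 expands to a quadratic in t.
oc = origin - center
a = direction @ direction
b = 2.0 * (oc @ direction)
c = oc @ oc - radius ** 2
disc = b * b - 4 * a * c                                # positive here, so the ray hits
t = (-b - np.sqrt(disc)) / (2 * a)                      # smaller positive root = front hit
hit = origin + t * direction                            # (0, 0, 4)

normal = (hit - center) / radius                        # (0, 0, -1)
reflected = direction - 2 * (direction @ normal) * normal   # (0, 0, -1): straight back at the camera

light = np.array([-5.0, 5.0, 0.0])
shadow_dir = (light - hit) / np.linalg.norm(light - hit)
print(t, hit, normal, reflected)
print(shadow_dir @ normal > 0)    # True: the shadow ray leaves the surface, so the point is lit
```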
The Interview Questions They’ll Ask
Ray tracing interviews probe deep linear algebra understanding. Be prepared for:
- “Derive the ray-sphere intersection formula from first principles.” Start with ||P - C||^2 = r^2, substitute P = O + tD, expand, and show it becomes a quadratic in t. Know what the discriminant means geometrically.
- “How do you compute a reflection vector?” Explain the projection interpretation: you decompose the incident vector into components parallel and perpendicular to the normal, then negate the parallel component.
- “Explain Snell’s law in vector form.” Show how n1 * sin(theta1) = n2 * sin(theta2) becomes a vector equation involving dot products and cross products.
- “Why do we use 4x4 matrices for 3D transformations?” Explain homogeneous coordinates and how the fourth row/column enables translation to be represented as matrix multiplication.
- “How would you transform a ray from world space to object space?” Explain that you transform the ray origin as a point by the inverse model matrix, and the ray direction as a vector (translation ignored) by that same inverse; it is surface normals that require the inverse transpose.
- “What causes shadow acne and how do you fix it?” Explain floating-point precision issues at intersection points and the epsilon offset technique.
- “How do you build an orthonormal basis from a single normal vector?” Describe the algorithm: pick a non-parallel vector, use cross products to generate two perpendicular tangent vectors.
- “What is the relationship between ray tracing and the rendering equation?” Explain how ray tracing approximates the integral over the hemisphere by sampling, and how this relates to Monte Carlo integration.
- “Why do reflections on curved surfaces look distorted?” Explain that the normal varies across a curved surface, so reflected rays diverge differently than they would from a flat mirror.
- “How would you accelerate ray tracing for complex scenes?” Discuss spatial data structures (BVH, octrees, k-d trees) and how they reduce intersection tests from O(n) to O(log n).
Hints in Layers
Use these hints progressively - try each level before moving to the next:
Layer 1 - Starting Point: Begin with a single sphere and a single point light. No reflections, no refractions, no shadows. Just get diffuse shading working: color = max(0, dot(normal, light_direction)). If this works, you understand ray-sphere intersection and dot products for lighting.
Layer 2 - Adding Shadows: Before computing lighting, cast a shadow ray from the intersection point toward the light. If it hits any geometry, skip the lighting contribution. Watch for shadow acne - offset your shadow ray origin slightly along the normal.
Layer 3 - Reflections: For reflective materials, compute the reflection direction using r = d - 2(d.n)n, cast a new ray, and blend the result with the surface color. Limit recursion depth to prevent infinite loops between parallel mirrors.
Layer 4 - Refractions: Implement Snell’s law in vector form. Handle total internal reflection when the angle is too steep. Track whether the ray is entering or exiting the material by checking if the normal and ray direction are aligned.
Layer 5 - Transformations: Store objects in their own local coordinates. Build a model matrix that positions/rotates/scales them in world space. Transform rays into object space for intersection tests, then transform normals back to world space for lighting. Remember: normals transform by the inverse transpose.
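As a concrete reference for Layers 2-4, here is a small, self-contained sketch of the two vector formulas (reflection, and Snell refraction with total internal reflection) plus the epsilon offset used against shadow acne. The helper names and the EPSILON constant are illustrative choices, not part of any required API.

```python
import numpy as np

EPSILON = 1e-4

def offset_origin(point, normal):
    # Layer 2: nudge secondary-ray origins off the surface so they cannot re-hit it (shadow acne).
    return point + EPSILON * normal

def reflect(d, n):
    # Layer 3: r = d - 2(d.n)n flips the component of d that lies along the normal.
    return d - 2.0 * (d @ n) * n

def refract(d, n, eta):
    # Layer 4: Snell's law in vector form; eta = n1 / n2, d and n are unit vectors,
    # and n points against the incoming ray. Returns None on total internal reflection.
    cos_i = -(d @ n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

d = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)   # ray hitting the surface at 45 degrees
n = np.array([0.0, 0.0, -1.0])                 # surface normal facing the incoming ray
print(reflect(d, n))                           # mirrored about the surface: [0.707, 0, -0.707]
print(refract(d, n, 1.0 / 1.5))                # entering glass: bends toward the normal
print(refract(d, n, 1.5 / 1.0))                # exiting glass at 45 deg > ~41.8 deg critical angle: None
```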
Books That Will Help
| Topic | Book | Specific Chapters | Why This Resource |
|---|---|---|---|
| Ray tracing fundamentals | “Computer Graphics from Scratch” by Gabriel Gambetta | Part II: Raytracing (Chapters 2-4) | Step-by-step derivation of ray-sphere intersection, lighting models, and shadows with clear mathematical explanations |
| Production ray tracing | “Physically Based Rendering” by Pharr, Jakob & Humphreys | Chapters 2-4, 8-9 | The definitive reference used by film studios. Deep coverage of Monte Carlo methods, material models, and acceleration structures |
| Geometric foundations | “Math for Programmers” by Paul Orland | Chapters 3-5 (Vectors, Transformations) | Builds intuition for vector operations and matrix transformations with code examples |
| Mathematical physics of light | “Real-Time Rendering” by Akenine-Moller et al. | Chapter 9 (Global Illumination) | Bridges ray tracing theory with practical implementation, including the rendering equation |
| Linear algebra review | “3Blue1Brown: Essence of Linear Algebra” (YouTube) | All episodes, especially 3, 7, 9 | Visual intuition for transformations, dot products, and cross products that makes the math click |
| Numerical stability | “Numerical Recipes” by Press et al. | Chapters 2-3 | Understanding when floating-point math fails and how to write robust geometric code |
| Advanced sampling | “Physically Based Rendering” by Pharr, Jakob & Humphreys | Chapters 13-14 (Monte Carlo Integration) | When you are ready for soft shadows and global illumination, this explains the mathematics of sampling |
Getting Started Today
- Set up: Choose a language (C is great for understanding, Python for speed of iteration)
- Watch: 3Blue1Brown Episode 1 - “Vectors, what even are they?”
- Build: Start the 3D Renderer—draw a single projected point on screen
- Iterate: Each day, add one transformation. See it work. Understand why.
The goal isn’t to rush through projects—it’s to reach the point where you see matrices as transformations, feel eigenvectors as natural axes, and think in terms of vector spaces. Building makes this happen in a way that reading never can.