← Back to all projects

LEARN AI RESEARCH DEEP DIVE

Learn AI Research: From First Principles to the Frontier

Goal: To cultivate the mind and skills of an AI researcher. This path goes beyond just using AI tools; it’s about understanding them from first principles, replicating groundbreaking research, and ultimately, creating new knowledge.


Why Pursue AI Research?

Being an AI user is about applying existing models. Being an AI researcher is about asking “why” and “what if.” It’s the difference between driving a car and designing a new engine. The goal of research is not just to solve problems, but to understand the universe of computation, intelligence, and learning. It’s a journey to the very edge of what is known.

After following this path, you will be able to:

  • Read, understand, and critique modern AI research papers.
  • Implement complex models from scratch, not just as a user of a library.
  • Design, run, and analyze experiments with scientific rigor.
  • Formulate novel research questions and hypotheses.
  • Contribute meaningfully to the field, whether in academia or an industrial research lab (like DeepMind, FAIR, or OpenAI).

Core Knowledge Areas

An AI researcher stands on three pillars. This curriculum is designed to build them not in sequence, but in parallel, through hands-on projects.

The Three Pillars of AI Research

┌─────────────────────────────────────────────────────────────────────────┐
│                              AI RESEARCH                                │
│                                                                         │
│     The discipline of creating new knowledge and capabilities in        │
│          artificial intelligence through scientific inquiry.            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   MATHEMATICS    │  │ COMPUTER SCIENCE │  │ SCIENTIFIC METHOD│
│                  │  │                  │  │                  │
│ • Linear Algebra │  │ • Programming    │  │ • Hypothesis     │
│ • Calculus       │  │   (Python, C++)  │  │ • Experiment     │
│ • Probability &  │  │ • Data Structures│  │   Design         │
│   Statistics     │  │   & Algorithms   │  │ • Analysis &     │
│ • Optimization   │  │ • Software Eng.  │  │   Critique       │
└──────────────────┘  └──────────────────┘  └──────────────────┘

The Curriculum: A Phased Approach

This is a long-term learning plan, structured like a self-driven PhD. Each project builds a specific skill, and they are ordered to construct your knowledge from the ground up.


Phase 1: The Foundation - Thinking in Tensors and Gradients

You cannot build a skyscraper on sand. This phase is about building a rock-solid mathematical and programming intuition by implementing the core mechanics of deep learning yourself.


Project 1: Micrograd - An Autograd Engine

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Swift, Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Calculus / Graph Theory / Core Deep Learning
  • Software or Tool: Python, NumPy
  • Main Book: “Deep Learning” by Goodfellow, Bengio, and Courville (Chapter 6)

What you’ll build: A small library that can build a computational graph of mathematical expressions and automatically compute the gradient of any node with respect to any other node using the chain rule (backpropagation).

Why it teaches AI Research: This is the absolute soul of deep learning. By building an autograd engine, you will internalize backpropagation in a way that no textbook alone can teach. You will never again see PyTorch or TensorFlow as “magic.”

Core challenges you’ll face:

  • Representing expressions as a graph → maps to understanding computational graphs
  • Implementing the chain rule for each operation → maps to deeply understanding derivatives and backpropagation
  • Performing a topological sort of the graph → maps to correctly ordering the backward pass
  • Handling gradients of multi-variate functions → maps to vector calculus concepts

Key Concepts:

  • Backpropagation: “Calculus on Computational Graphs: Backpropagation” - Christopher Olah’s Blog
  • Derivatives and Chain Rule: Khan Academy’s Calculus 1 course.
  • Computational Graphs: “Deep Learning” - Goodfellow, Bengio, Courville (Ch 6.5)

Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Solid Python, understanding of derivatives.

Real world outcome: A Python class Value that wraps a number. You can perform math on these Value objects, and then call a .backward() method on the final result to automatically compute the gradients of all inputs. You can then use this to train a simple neural network, proving your engine works. Andrej Karpathy’s “The spelled-out intro to neural networks and backpropagation: building micrograd” is the canonical guide for this project.

Implementation Hints:

  1. Create a Value class that stores a float and tracks the “children” (the Value objects that created it) and the operation.
  2. Overload Python’s magic methods (__add__, __mul__, etc.) to create new Value objects and build the graph structure.
  3. Each magic method should also define a _backward function that knows how to propagate the gradient to its inputs according to the chain rule. (e.g., for c = a + b, a’s gradient gets 1 * c’s gradient, and so does b’s).
  4. The .backward() method on the final Value kicks off the process: it topologically sorts the graph, then calls _backward on each node in reverse topological order (a minimal sketch follows this list).
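
A minimal sketch of what such a Value class might look like, supporting only addition and multiplication; the field names and the usage example are illustrative, not prescriptive:

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, children=(), op=""):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None      # how to push my gradient into my children
        self._children = set(children)
        self._op = op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1, so the gradient passes through
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            # chain rule: each input's gradient is the other input times out.grad
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule in reverse order
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# usage: gradients of (a*b + c) with respect to a, b, and c
a, b, c = Value(2.0), Value(-3.0), Value(10.0)
out = a * b + c
out.backward()
print(a.grad, b.grad, c.grad)   # -3.0, 2.0, 1.0
```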

Learning milestones:

  1. You can compute gradients for a simple expression, e.g. (a*b + c).backward().
  2. Your engine can train a single neuron → You can perform gradient descent.
  3. Your engine can train a Multi-Layer Perceptron (MLP) → You have successfully built a working deep learning framework from scratch.
  4. You see torch.tensor(..., requires_grad=True) and know exactly what it means.

Project 2: The Optimizer Showdown

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: JavaScript (for visualization)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Optimization Theory
  • Software or Tool: NumPy, Matplotlib/Plotly
  • Main Book: “Deep Learning” by Goodfellow, Bengio, and Courville (Chapter 8)

What you’ll build: A visualization tool that shows how different optimization algorithms (like SGD, Momentum, RMSProp, and Adam) navigate a 2D loss surface.

Why it teaches AI Research: Training a model is an optimization problem. This project builds deep intuition about how models learn and why one optimizer might be better than another in a given situation. You’ll see why momentum helps escape local minima and how adaptive methods change their learning rate.

Core challenges you’ll face:

  • Implementing optimization algorithms from scratch → maps to translating mathematical formulas into code
  • Creating and visualizing 2D loss functions → maps to understanding loss landscapes
  • Tracking and plotting the path of each optimizer → maps to visualizing the learning process

Key Concepts:

  • Optimization Algorithms: “An overview of gradient descent optimization algorithms” by Sebastian Ruder
  • Loss Landscapes: “Visualizing the Loss Landscape of Neural Nets” - Li et al., 2018.

Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Python, NumPy, basic calculus.

Real world outcome: An animated plot showing several “balls” rolling down a contoured surface, each representing an optimizer. You’ll visually see Adam converging faster than vanilla SGD.

Implementation Hints:

  1. Define a simple 2D function to be your “loss surface,” like the Beale function or a simple quadratic bowl.
  2. Implement each optimizer as a function that takes a starting point, a learning rate, and the gradient function, and returns the history of points visited.
  3. For example, SGD’s update is just x = x - lr * grad(x); Momentum’s update adds a “velocity” term; Adam’s tracks moving averages of the gradient and its square (a minimal sketch of these update rules follows this list).
  4. Use Matplotlib or Plotly to create a contour plot of the loss surface, then plot the path of each optimizer on top of it.
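
A minimal sketch of the showdown, assuming NumPy and Matplotlib and using a simple quadratic bowl as the loss surface; the hyperparameters, step counts, and starting point are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def loss(x, y):
    # an elongated quadratic bowl: steep in y, shallow in x
    return x**2 + 10 * y**2

def grad(p):
    x, y = p
    return np.array([2 * x, 20 * y])

def sgd(start, lr=0.05, steps=60):
    p, path = np.array(start, float), [np.array(start, float)]
    for _ in range(steps):
        p = p - lr * grad(p)
        path.append(p.copy())
    return np.array(path)

def momentum(start, lr=0.05, beta=0.9, steps=60):
    p, v = np.array(start, float), np.zeros(2)
    path = [p.copy()]
    for _ in range(steps):
        v = beta * v + grad(p)           # accumulate a velocity
        p = p - lr * v
        path.append(p.copy())
    return np.array(path)

def adam(start, lr=0.3, b1=0.9, b2=0.999, eps=1e-8, steps=60):
    p = np.array(start, float)
    m, v, path = np.zeros(2), np.zeros(2), [p.copy()]
    for t in range(1, steps + 1):
        g = grad(p)
        m = b1 * m + (1 - b1) * g        # moving average of the gradient
        v = b2 * v + (1 - b2) * g**2     # moving average of its square
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
        p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
        path.append(p.copy())
    return np.array(path)

# contour plot of the surface with each optimizer's path on top
xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-2, 2, 200))
plt.contour(xs, ys, loss(xs, ys), levels=30)
for name, path in [("SGD", sgd((-4, 1.5))),
                   ("Momentum", momentum((-4, 1.5))),
                   ("Adam", adam((-4, 1.5)))]:
    plt.plot(path[:, 0], path[:, 1], marker=".", label=name)
plt.legend()
plt.show()
```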

Learning milestones:

  1. You can implement and visualize SGD.
  2. You can explain intuitively what momentum does.
  3. You understand why Adam is the default choice for most problems.
  4. You can diagnose training issues related to optimization.

Phase 2: Mastering the Architectures

With a solid foundation, you can now build and understand the key architectures that drive modern AI. The goal here is to re-implement famous papers.


Project 3: “LeNet-5” - Your First CNN

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Computer Vision / Convolutional Neural Networks
  • Software or Tool: PyTorch or TensorFlow
  • Main Book: “Deep Learning with Python” by François Chollet

What you’ll build: A re-implementation of LeNet-5, one of the earliest successful Convolutional Neural Networks, to classify handwritten digits from the MNIST dataset.

Why it teaches AI Research: This project is the “Hello, World!” of computer vision. It teaches the fundamental building blocks of CNNs: convolutions, pooling, and dense layers. You’ll learn how to structure a vision model and train it on a real dataset.

Core challenges you’ll face:

  • Understanding convolution and pooling layers → maps to shared weights, feature maps, and spatial down-sampling
  • Managing tensor shapes → maps to tracking how the data’s dimensions change as it flows through the network
  • Implementing a standard training loop → maps to data loading, forward pass, loss calculation, backward pass, and optimizer step

Key Concepts:

  • Convolutional Networks: “A Guide to Convolutional Neural Networks” by Adit Deshpande
  • LeNet-5 Paper: “Gradient-Based Learning Applied to Document Recognition” by LeCun et al., 1998.
  • PyTorch Workflow: Official “Training a Classifier” tutorial on pytorch.org.

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Phase 1 projects, Python, a deep learning framework.

Real world outcome: A trained model that achieves >98% accuracy on the MNIST test set. You can feed it an image of a handwritten digit, and it will correctly predict the number.

Implementation Hints:

  1. Use a framework like PyTorch. Don’t build the layers from scratch this time; use nn.Conv2d, nn.MaxPool2d, nn.Linear.
  2. Define your network as a Python class inheriting from nn.Module.
  3. The structure is simple: Conv -> Pool -> Conv -> Pool -> Flatten -> Linear -> Linear -> Linear, with the final softmax folded into the cross-entropy loss. Pay close attention to the number of channels and the tensor dimensions after each layer.
  4. Use the built-in MNIST dataset loader, and create a DataLoader to handle batching.
  5. Write a standard training loop: iterate over epochs, and for each epoch, iterate over batches of data (see the sketch after this list).
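
A minimal sketch of the model and training loop, assuming PyTorch and torchvision are installed; the hyperparameters (batch size, learning rate, epoch count) are illustrative, and the layer sizes follow the classic LeNet-5 layout adapted to 28x28 MNIST inputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class LeNet5(nn.Module):
    """Roughly the classic LeNet-5 layout, adapted to 28x28 MNIST images."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)   # 1x28x28 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)             # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 16x5x5
        x = torch.flatten(x, 1)                      # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                           # raw logits; softmax lives in the loss

train_data = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=64, shuffle=True)

model = LeNet5()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)   # applies log-softmax internally
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```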

Learning milestones:

  1. You can build and train a basic CNN.
  2. You understand how convolutions extract features.
  3. You can debug issues related to tensor shapes (a very common problem).
  4. You are ready to tackle more complex architectures like ResNet.

Project 4: “Nano-GPT” - Building a Transformer

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Natural Language Processing / Transformer Architecture
  • Software or Tool: PyTorch, NumPy
  • Main Book: N/A (papers and tutorials are better here)

What you’ll build: A character-level language model based on the GPT-style Transformer architecture. You will train it on a small text corpus (like Shakespeare) and then have it generate new text in that style.

Why it teaches AI Research: The Transformer is the most important architecture of the last decade. It powers every large language model. By building one from scratch, you will gain a true, deep understanding of self-attention, positional encodings, and the mechanics of generative text models. This project is a non-negotiable rite of passage.

Core challenges you’ll face:

  • Implementing scaled dot-product self-attention → maps to the core mechanism of the Transformer
  • Building a multi-head attention block → maps to allowing the model to focus on different things in parallel
  • Creating the decoder-only Transformer block → maps to combining self-attention and feed-forward layers with residual connections and layer normalization
  • Implementing causal masking → maps to preventing the model from “cheating” by looking at future tokens during training

Key Concepts:

  • Canonical Guide: Andrej Karpathy’s “Let’s build GPT: from scratch, in code, spelled out.” YouTube video.
  • The Original Paper: “Attention Is All You Need” by Vaswani et al., 2017.
  • Illustrated Transformer: “The Illustrated Transformer” by Jay Alammar.

Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Solid Python, PyTorch, and a good grasp of deep learning concepts.

Real world outcome: A script that trains a small GPT model. After training, you can give it a prompt like “O, what light through yonder window breaks?” and it will generate coherent, Shakespearean-style text that follows the prompt.

Implementation Hints:

  1. Follow Andrej Karpathy’s tutorial closely. He builds the entire thing from scratch, explaining every line.
  2. Start by implementing just the self-attention mechanism. Test it in isolation to make sure you understand how the query, key, and value matrices interact (a sketch of a single causally masked head follows this list).
  3. Implement the full Transformer block. This is the repeatable unit of the network.
  4. Build the full language model by stacking these blocks. Add token and positional embeddings at the beginning and a linear layer at the end to predict the next character.
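
A sketch of one causally masked attention head in PyTorch, in the spirit of Karpathy’s tutorial; the class name and the dimensions in the usage example are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One head of scaled dot-product self-attention with a causal mask."""
    def __init__(self, embed_dim, head_dim, block_size):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                      # x: (batch, time, embed_dim)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # attention scores, scaled so the softmax stays well-behaved
        scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)   # (B, T, T)
        scores = scores.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        return weights @ v                     # (B, T, head_dim)

# usage: 4 sequences of 8 tokens, 32-dim embeddings, one 16-dim head
head = CausalSelfAttentionHead(embed_dim=32, head_dim=16, block_size=8)
out = head(torch.randn(4, 8, 32))
print(out.shape)   # torch.Size([4, 8, 16])
```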

Learning milestones:

  1. You can explain self-attention to someone without using code.
  2. You understand why residual connections and layer normalization are critical.
  3. Your model generates text that isn’t just gibberish.
  4. You read the “Attention Is All You Need” paper and it now seems simple and obvious.

Phase 3: Becoming a Scientist

You have the tools. Now it’s time to think like a researcher. This phase is about experimental design, rigor, and communication.


Project 5: The Paper Replicator

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Scientific Rigor / Reproducibility
  • Software or Tool: PyTorch, Weights & Biases (or similar)
  • Main Book: The paper you choose to replicate.

What you’ll build: You will choose a foundational AI paper (e.g., ResNet, DCGAN, StyleGAN) that has an open-source implementation. Your task is to read the paper and reproduce its key results from scratch in your own codebase, only looking at the official code when you are hopelessly stuck.

Why it teaches AI Research: This is a core activity in any PhD program or research lab. It teaches you to translate the dense, formal language of a research paper into working code. You will discover that papers often omit small but crucial details, and your job is to figure out those missing pieces.

Core challenges you’ll face:

  • Translating academic prose and math into an implementation → maps to the core skill of a research engineer
  • Debugging “silent failures” → maps to situations where the code runs but the model doesn’t learn, and you have to figure out why
  • Setting up the experiment and dataset correctly → maps to ensuring a fair comparison to the original paper’s results
  • Tracking your experiments and results systematically → maps to using tools like Weights & Biases to log metrics and compare runs

Key Concepts:

  • Finding Papers: Papers with Code is the best resource. Find papers with good community implementations.
  • Experiment Tracking: “A Guide to Weights & Biases” or similar tutorials.
  • Reading Papers: “How to Read a Paper” by S. Keshav.

Difficulty: Expert. Time estimate: 1-2 months. Prerequisites: Phase 1 & 2 projects.

Real world outcome: A GitHub repository containing your implementation. The README.md will be a report showing the key graphs and numbers from the original paper alongside the results produced by your code, demonstrating that you have successfully replicated the research.

Implementation Hints:

  1. Choose a paper that is influential but not impossibly complex. ResNet is a classic choice. DCGAN is another.
  2. Set up your project with experiment tracking from day one. Log everything: hyperparameters, training loss, validation loss, sample outputs, etc. (a minimal logging sketch follows this list).
  3. Try to match the paper’s reported hyperparameters exactly (learning rate, batch size, optimizer).
  4. When your model doesn’t work, don’t immediately look at the official code. Instead, go back to the paper. Did you misinterpret something? Is there a detail you missed? This struggle is where the learning happens.
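
A minimal logging sketch, assuming the wandb package is installed and you are logged in; the project name, config values, and placeholder training functions are illustrative stand-ins for your own code:

```python
import random
import wandb

def train_one_epoch():
    # placeholder for your real training code
    return random.random()

def evaluate():
    # placeholder for your real evaluation code
    return random.random(), random.random()

# log the full configuration up front so every run is reproducible and comparable
config = {"lr": 1e-3, "batch_size": 128, "optimizer": "adam", "epochs": 5}
wandb.init(project="paper-replication", config=config)

for epoch in range(config["epochs"]):
    train_loss = train_one_epoch()
    val_loss, val_acc = evaluate()
    # one log call per epoch keeps the charts for every run on the same x-axis
    wandb.log({"epoch": epoch, "train/loss": train_loss,
               "val/loss": val_loss, "val/accuracy": val_acc})

wandb.finish()
```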

Learning milestones:

  1. You can read a paper and create a mental model of its architecture.
  2. You can implement a non-trivial model from only a paper description.
  3. Your replicated results are within a reasonable margin of error of the original paper’s results.
  4. You have the confidence to tackle almost any paper published in the field.

Project 6: The Novel Experiment

  • File: LEARN_AI_RESEARCH_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: The Scientific Method
  • Software or Tool: Your entire toolkit.
  • Main Book: Your own lab notebook.

What you’ll build: You will formulate a novel research question, design an experiment to test it, run the experiment, and write a 4-page report on your findings in the style of a NeurIPS or ICML paper.

Why it teaches AI Research: This is the final step. This is what AI researchers do. It combines all the previous skills: formulating a hypothesis, implementing a model, designing a rigorous experiment, analyzing the results, and communicating your findings clearly and concisely.

Core challenges you’ll face:

  • Formulating a good research question → maps to finding a question that is interesting, novel, and testable
  • Designing a controlled experiment → maps to isolating variables and creating a fair baseline for comparison
  • Interpreting your results, even if they are negative → maps to understanding that a null result is still a result
  • Writing a compelling academic paper → maps to structuring your argument, creating clear figures, and communicating your contribution

Key Concepts:

  • The Scientific Method: Any introductory university-level text.
  • Paper Templates: Overleaf provides LaTeX templates for all major AI conferences.
  • Finding Ideas: Read a lot of papers. Look for the “Future Work” sections. What haven’t they tried? What are the limitations of their approach?

Difficulty: Master. Time estimate: 3+ months. Prerequisites: All previous projects. This is your “thesis.”

Real world outcome: A PDF file formatted like a real research paper. It will have an Abstract, Introduction, Method, Results, and Conclusion. This document is the ultimate proof of your ability. Whether the experiment “succeeded” or “failed” is less important than the rigor and clarity with which you conducted and reported it.

Implementation Hints:

  1. Start with a small question. Don’t try to invent a new architecture. A good starting point is to take an existing architecture and apply it to a new domain, or to test a small modification. Example: “Can a Transformer model be used to predict stock prices better than an LSTM? I hypothesize it can because…”
  2. Define your baseline. What is the simplest reasonable model for this task? You must compare your proposed model against a baseline.
  3. Be rigorous. Use the same dataset, the same training/validation/test split, and the same evaluation metric for all your experiments (see the seeding and splitting sketch after this list).
  4. Write as you go. Don’t wait until the end. Keep a lab notebook (a simple Markdown file is fine). Document your ideas, your failed attempts, and your results as they happen. This will become the raw material for your final paper.
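
A sketch of the kind of seeding and data-splitting discipline this requires, assuming PyTorch; the split fractions, seeds, and the train_and_eval callable are placeholders for your own experiment code:

```python
import random
import numpy as np
import torch
from torch.utils.data import random_split

def set_seed(seed):
    # pin every source of randomness so a run can be repeated exactly
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def make_splits(dataset, seed=0):
    # one fixed train/val/test split, shared by baseline and proposed model alike
    n_val = n_test = int(0.1 * len(dataset))
    n_train = len(dataset) - n_val - n_test
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)

def run_experiment(train_and_eval, dataset, seeds=(0, 1, 2)):
    # report mean and spread over several seeds rather than a single lucky run
    splits = make_splits(dataset)
    scores = []
    for seed in seeds:
        set_seed(seed)
        scores.append(train_and_eval(*splits))
    return float(np.mean(scores)), float(np.std(scores))
```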

Learning milestones:

  1. You formulate a testable hypothesis.
  2. You design and run a controlled experiment to test it.
  3. You write a clear, concise, and honest report of your findings.
  4. You are no longer just a student of AI; you are a contributor to the field. You are a researcher.

Summary

Project                               Main Programming Language
Micrograd - An Autograd Engine        Python
The Optimizer Showdown                Python
“LeNet-5” - Your First CNN            Python
“Nano-GPT” - Building a Transformer   Python
The Paper Replicator                  Python
The Novel Experiment                  Python

This path is long and demanding, but it is the path to true understanding. Good luck.