# AI Systems Deep Dive: Transformers, Quantization & Inference
A comprehensive project-based learning journey to master AI/ML systems engineering, from understanding transformers at the mathematical level to building production-grade inference engines.
## Learning Path Overview
This learning path is organized into 5 phases, each building on the previous:
| Phase | Focus | Projects | Skills Gained |
|---|---|---|---|
| Phase 1 | Attention & Memory | P01-P02 | Mathematical foundations, memory optimization |
| Phase 2 | Complete Architectures | P03-P04 | End-to-end transformers, advanced architectures |
| Phase 3 | Model Compression | P05-P06 | Quantization, efficient fine-tuning |
| Phase 4 | Inference Optimization | P07-P09 | Caching, batching, speculative decoding |
| Phase 5 | Production Systems | P10 | Full production inference engine |
## Projects
### Phase 1: Attention & Memory Optimization
| # | Project | Difficulty | Time | Key Concepts |
|---|---|---|---|---|
| 01 | Build Attention from Scratch | Intermediate | 1 week | Softmax, QKV, Multi-head attention |
| 02 | Implement Flash Attention | Advanced | 1-2 weeks | Tiling, online softmax, memory efficiency |
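Project 1 centers on the attention formula softmax(QKᵀ/√d_k)·V. As a taste of what you'll build, here is a minimal single-head NumPy sketch (function name, shapes, and the stability trick are illustrative, not taken from the project files):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 query positions, 3 key/value positions, d_k = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
assert np.allclose(w.sum(axis=-1), 1.0)  # each attention row is a distribution
```

Project 2 then reworks this same computation with tiling and an online softmax so the full (seq × seq) score matrix never materializes in memory.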
### Phase 2: Complete Architectures
| # | Project | Difficulty | Time | Key Concepts |
|---|---|---|---|---|
| 03 | Build a Full Transformer | Advanced | 2-3 weeks | Encoder-decoder, training loop, GPT/BERT |
| 04 | Implement Sparse MoE Layer | Expert | 2 weeks | Mixture-of-Experts, gating, load balancing |
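The heart of Project 4 is the router: each token is sent to only a few experts, whose gate weights are renormalized. A hedged NumPy sketch of top-k gating (helper name and shapes are mine; real MoE layers add load-balancing losses on top of this):

```python
import numpy as np

def top_k_gating(logits, k=2):
    """Route each token to its top-k experts and renormalize their weights.

    logits: (num_tokens, num_experts) raw router scores.
    Returns (indices, gates): chosen expert ids and their softmax weights.
    """
    topk = np.argsort(logits, axis=-1)[:, -k:]              # top-k expert ids per token
    topk_logits = np.take_along_axis(logits, topk, axis=-1)
    exp = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates = exp / exp.sum(axis=-1, keepdims=True)           # softmax over the k survivors only
    return topk, gates

rng = np.random.default_rng(0)
idx, gates = top_k_gating(rng.standard_normal((5, 8)), k=2)
assert np.allclose(gates.sum(axis=-1), 1.0)
```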
### Phase 3: Model Compression
| # | Project | Difficulty | Time | Key Concepts |
|---|---|---|---|---|
| 05 | Implement Post-Training Quantization | Advanced | 2 weeks | INT8, calibration, GPTQ |
| 06 | Implement LoRA | Intermediate | 1-2 weeks | Low-rank adaptation, efficient fine-tuning |
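Project 5 starts from the simplest compression scheme: symmetric absmax INT8 quantization, where each weight is approximated as w ≈ scale · q with q an 8-bit integer. A minimal NumPy sketch (illustrative only; calibration and GPTQ go well beyond this):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0            # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
assert err <= s / 2 + 1e-6  # round-to-nearest error is at most half a quantization step
```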
### Phase 4: Inference Optimization
| # | Project | Difficulty | Time | Key Concepts |
|---|---|---|---|---|
| 07 | Build a KV Cache | Advanced | 1 week | Caching, sliding window, streaming |
| 08 | Implement Continuous Batching | Master | 3-4 weeks | PagedAttention, vLLM, scheduling |
| 09 | Implement Speculative Decoding | Expert | 2 weeks | Draft/target models, rejection sampling |
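Project 7's core observation is that during autoregressive decoding, past keys and values never change, so each step only needs to compute the new token's K/V and reuse the rest. A toy single-head cache (class and method names are mine, not from the project files):

```python
import numpy as np

class KVCache:
    """Append-only key/value cache for one attention head."""

    def __init__(self, d_k):
        self.K = np.empty((0, d_k))
        self.V = np.empty((0, d_k))

    def append(self, k, v):
        # Store the new token's key/value; past entries are never recomputed.
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

    def attend(self, q):
        # Attend from a single new query over all cached positions.
        scores = (self.K @ q) / np.sqrt(self.K.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.V

rng = np.random.default_rng(0)
cache = KVCache(d_k=4)
for _ in range(5):  # simulate 5 decode steps
    cache.append(rng.standard_normal((1, 4)), rng.standard_normal((1, 4)))
out = cache.attend(rng.standard_normal(4))
```

Projects 8 and 9 build on this: PagedAttention manages these cache blocks across many concurrent requests, and speculative decoding uses a cheap draft model to propose tokens the target model verifies.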
### Phase 5: Capstone
| # | Project | Difficulty | Time | Key Concepts |
|---|---|---|---|---|
| 10 | Production Inference Engine | Master | 2-3 months | CUDA, Rust, full integration |
## Prerequisites
Before starting this learning path, you should have:
- Python proficiency: Comfortable with NumPy, PyTorch
- Linear algebra: Matrix operations, eigenvalues, SVD basics
- Calculus: Gradients, chain rule for backpropagation
- Deep learning basics: Neural networks, loss functions, optimizers
## Recommended Study Order
- Week 1-2: Project 1 (Attention) - Build mathematical foundation
- Week 3-4: Project 2 (Flash Attention) - Learn memory optimization
- Week 5-7: Projects 3-4 (Transformer, Sparse MoE) - Build complete architectures
- Week 8-10: Projects 5-6 (Quantization, LoRA) - Model compression
- Week 11-12: Project 7 (KV Cache) - Inference basics
- Week 13-16: Projects 8-9 (Batching, Speculative) - Advanced inference
- Week 17+: Project 10 (Production Engine) - Full system integration
## Target Audience
- ML Engineers transitioning to AI infrastructure
- Software engineers building LLM applications
- Researchers wanting systems-level understanding
- Anyone building production AI systems
## Technologies Used
| Category | Technologies |
|---|---|
| Languages | Python, Rust, C++, CUDA |
| Frameworks | PyTorch, Triton |
| Tools | vLLM, TensorRT-LLM, llama.cpp |
| Infrastructure | Docker, Kubernetes, Prometheus, Grafana |
## Learning Outcomes
By completing this learning path, you will be able to:
- Implement transformers from scratch with full mathematical understanding
- Optimize memory usage using techniques like Flash Attention
- Compress models with quantization while maintaining quality
- Build high-performance inference systems using the same core techniques as production engines like vLLM
- Design production AI infrastructure with proper observability
## How to Use These Projects
Each project file contains:
- Learning Objectives - What you’ll learn
- Theoretical Foundation - Deep conceptual coverage
- Project Specification - What to build
- Solution Architecture - How to design it
- Implementation Guide - Phased approach with hints
- Testing Strategy - How to verify correctness
- Common Pitfalls - Issues to avoid
- Extensions - Advanced challenges
- Interview Questions - Real-world preparation
- Resources - Books and papers
## Contributing
This is a personal learning journey repository. Feel free to fork and adapt for your own learning path.
Part of the Learning Journey C project - A comprehensive approach to mastering AI systems engineering through project-based learning.