LLVM LEARNING PROJECTS
Learning LLVM Through Real-World Projects
Core Concept Analysis
LLVM is a compiler infrastructure that has revolutionized how compilers are built. To truly understand it, you need to grapple with these fundamental building blocks:
| Concept | What It Is | Why It Matters |
|---|---|---|
| LLVM IR | A typed, SSA-based intermediate representation | The "universal assembly language" that all LLVM-based languages compile to |
| Front-end | Lexing, parsing, AST generation | How source code becomes structured data |
| Optimization Passes | Modular transformations on IR | Where the "magic" of compiler optimization happens |
| Back-end/Code Generation | IR → machine code | How abstract code becomes executable instructions |
| Clang Tooling | APIs for analyzing/transforming C/C++ | Building developer tools on top of industrial-strength parsing |
Project 1: Calculator Language → LLVM IR Compiler
- File: LLVM_LEARNING_PROJECTS.md
- Programming Language: C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The "Resume Gold"
- Difficulty: Level 3: Advanced
- Knowledge Area: Compilers / LLVM
- Software or Tool: LLVM Core
- Main Book: "Writing a C Compiler" by Nora Sandler
What you'll build: A compiler for a simple expression language that generates LLVM IR and produces a native executable you can run.
Why it teaches LLVM: This is the "hello world" of LLVM. You'll touch every layer: lexing input, building an AST, emitting LLVM IR via the C++ API, and watching llc turn it into machine code. You'll understand why LLVM IR exists and how it serves as the bridge between languages and machines.
Core challenges you'll face:
- Designing a grammar and building a recursive descent parser (maps to front-end architecture)
- Using LLVM's `IRBuilder` to emit typed IR instructions (maps to LLVM IR semantics)
- Understanding SSA form and why variables work differently in IR (maps to SSA fundamentals)
- Linking with LLVM libraries and managing the build (maps to LLVM toolchain integration)
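The recursive descent challenge can be tried out before any LLVM code exists. The following is a minimal sketch, assuming a grammar of integers, `+ - * /`, and parentheses; the class name `ExprParser` is hypothetical, and it evaluates as it parses instead of building an AST, purely to stay short. A real front end would return AST nodes at the points where this code computes values.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper for illustration; not part of LLVM.
class ExprParser {
public:
    explicit ExprParser(std::string text) : src_(std::move(text)) {}

    // Parses the whole input and returns its integer value.
    long parse() {
        long value = parseExpr();
        skipSpaces();
        if (pos_ != src_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // expr := term (('+' | '-') term)*
    long parseExpr() {
        long lhs = parseTerm();
        for (;;) {
            if (match('+')) lhs += parseTerm();
            else if (match('-')) lhs -= parseTerm();
            else return lhs;
        }
    }

    // term := factor (('*' | '/') factor)*
    // Being one level below expr is what gives '*' higher precedence.
    long parseTerm() {
        long lhs = parseFactor();
        for (;;) {
            if (match('*')) {
                lhs *= parseFactor();
            } else if (match('/')) {
                long rhs = parseFactor();
                if (rhs == 0) throw std::runtime_error("division by zero");
                lhs /= rhs;
            } else {
                return lhs;
            }
        }
    }

    // factor := NUMBER | '(' expr ')'
    long parseFactor() {
        if (match('(')) {
            long value = parseExpr();
            if (!match(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skipSpaces();
        if (pos_ >= src_.size() || !isDigit(src_[pos_]))
            throw std::runtime_error("expected number");
        long value = 0;
        while (pos_ < src_.size() && isDigit(src_[pos_]))
            value = value * 10 + (src_[pos_++] - '0');
        return value;
    }

    static bool isDigit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }

    // Consumes `c` if it is the next non-space character.
    bool match(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Calling `ExprParser("5 + 3 * 2").parse()` yields 11, because `parseTerm` consumes `3 * 2` before `parseExpr` applies the addition.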
Resources for key challenges:
- "Writing a C Compiler" by Nora Sandler, Ch. 1-3 - Clear progression from lexer to code generation
- LLVM's Kaleidoscope Tutorial (Ch. 1-3) - The canonical introduction, but better after you've struggled a bit
- "Engineering a Compiler" by Cooper & Torczon, Ch. 4 - Deep dive into intermediate representations
Key Concepts:
- Lexical Analysis: "Writing a C Compiler" Ch. 1 - Nora Sandler
- Recursive Descent Parsing: "Compilers: Principles and Practice" Ch. 4 - Parag H. Dave
- SSA Form: "Engineering a Compiler" Ch. 9 - Cooper & Torczon
- LLVM IRBuilder API: LLVM Programmer's Manual - llvm.org/docs
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Comfortable with C++, basic understanding of what compilers do.
Real world outcome: You'll write programs like:
```
let x = 5 + 3 * 2;
print(x);
```
Compile them with YOUR compiler, and run the resulting binary to see 11 printed to the terminal. You can inspect the generated .ll file to see exactly what IR your compiler produced.
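To get a feel for what that `.ll` file might contain, here is a rough sketch that walks a tiny AST and prints LLVM IR as text. It assumes a hypothetical `Expr` node type and a hypothetical `IrEmitter` class, and it builds strings by hand; the real project would call `IRBuilder` methods such as `CreateAdd` and `CreateMul` instead. Note that `IRBuilder` folds constant operands by default, so its output for a purely constant expression would be shorter than what this sketch prints.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST for illustration: a literal or a binary operation.
struct Expr {
    char op = 0;     // 0 for a literal, otherwise '+', '-', '*', '/'
    int value = 0;   // used only when op == 0
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> num(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l,
                          std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

class IrEmitter {
public:
    // Wraps the expression in a function that returns its value.
    std::string emitFunction(const Expr& e) {
        body_.clear();
        next_ = 0;
        std::string result = emit(e);
        return "define i32 @compute() {\nentry:\n" + body_ +
               "  ret i32 " + result + "\n}\n";
    }

private:
    // Returns the operand that names the value of `e`, appending the
    // instructions needed to compute it. Each temporary is assigned
    // exactly once, which is the SSA property.
    std::string emit(const Expr& e) {
        if (e.op == 0) return std::to_string(e.value);
        std::string l = emit(*e.lhs);
        std::string r = emit(*e.rhs);
        std::string name = "%t" + std::to_string(next_++);
        body_ += "  " + name + " = " + opcodeFor(e.op) + " i32 " + l +
                 ", " + r + "\n";
        return name;
    }

    static const char* opcodeFor(char op) {
        switch (op) {
        case '+': return "add";
        case '-': return "sub";
        case '*': return "mul";
        default:  return "sdiv";  // '/' in this toy language
        }
    }

    std::string body_;
    int next_ = 0;
};
```

For the tree `bin('+', num(5), bin('*', num(3), num(2)))`, the emitter produces a `mul` into `%t0` followed by an `add` into `%t1`, then returns `%t1`. Nothing is ever reassigned; a new name is minted for every intermediate result.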
Learning milestones:
- Lexer + Parser working → You understand how source text becomes structured data
- First IR emitted → You grasp LLVM IR syntax and SSA form
- Native binary runs → You've connected front-end to back-end through LLVM's infrastructure
Project 2: Custom Clang Static Analysis Checker
- File: LLVM_LEARNING_PROJECTS.md
- Programming Language: C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Static Analysis / Compilers
- Software or Tool: Clang LibTooling
- Main Book: "Clean Code" by Robert C. Martin (for understanding what to check)
What you'll build: A custom linter that analyzes C/C++ source code and reports specific code patterns (e.g., "function too long", "potential null dereference", "deprecated API usage").
Why it teaches LLVM/Clang: Clang's static analyzer is built on the same infrastructure as the compiler. You'll navigate the AST (Abstract Syntax Tree) that Clang produces, understand how real compilers represent code internally, and learn the visitor pattern that powers code analysis tools.
Core challenges you'll face:
- Setting up a Clang tool with `LibTooling` (maps to Clang infrastructure)
- Navigating `RecursiveASTVisitor` to find code patterns (maps to AST structure)
- Extracting source location information for diagnostics (maps to source management)
- Handling the complexity of C++ AST nodes (maps to language representation)
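The visitor idea behind these challenges can be practiced on a toy tree first. This is only a sketch under stated assumptions: `FunctionNode`, `ClassNode`, `TranslationUnit`, `AstVisitor`, and `LongFunctionCheck` are hypothetical stand-ins, not Clang classes, and the base class uses virtual functions, whereas Clang's `RecursiveASTVisitor` uses CRTP with hooks such as `VisitFunctionDecl`.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for AST nodes; the names are hypothetical.
struct FunctionNode {
    std::string name;
    int firstLine;
    int lastLine;
};

struct ClassNode {
    std::string name;
    std::vector<FunctionNode> methods;
};

struct TranslationUnit {
    std::vector<FunctionNode> functions;
    std::vector<ClassNode> classes;
};

// Base visitor: owns the traversal and calls a hook for each function.
class AstVisitor {
public:
    virtual ~AstVisitor() = default;

    void traverse(const TranslationUnit& tu) {
        for (const FunctionNode& f : tu.functions) visitFunction(f);
        for (const ClassNode& c : tu.classes)
            for (const FunctionNode& m : c.methods) visitFunction(m);
    }

protected:
    virtual void visitFunction(const FunctionNode&) {}
};

// A check only overrides the hook it cares about.
class LongFunctionCheck : public AstVisitor {
public:
    explicit LongFunctionCheck(int maxLines) : maxLines_(maxLines) {}

    const std::vector<std::string>& offenders() const { return offenders_; }

protected:
    void visitFunction(const FunctionNode& f) override {
        int length = f.lastLine - f.firstLine + 1;
        if (length > maxLines_) offenders_.push_back(f.name);
    }

private:
    int maxLines_;
    std::vector<std::string> offenders_;
};
```

The design point carries over to the real API: traversal lives in one place, and each check is a small class that reacts to the node kinds it is interested in.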
Resources for key challenges:
- Clang documentation: "LibTooling" and "How to write RecursiveASTVisitor" - Official but essential
- "Clang Tidy: How to write your own checks" - LLVM YouTube - Visual walkthrough of the process
Key Concepts:
- AST Traversal: Clang Internals Manual - llvm.org/docs/ClangInternals
- Visitor Pattern: "Design Patterns" Ch. 5 - Gamma et al.
- C/C++ Grammar Complexity: "The C++ Programming Language" Appendix A - Bjarne Stroustrup
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Solid C++ knowledge; familiarity with the visitor pattern is helpful.
Real world outcome: Run your checker on any C/C++ codebase and get output like:
```
src/utils.cpp:47:1: warning: function 'processData' has 150 lines (max recommended: 50)
src/main.cpp:23:5: warning: calling deprecated API 'oldFunction', use 'newFunction' instead
Found 12 issues in 8 files.
```
Production tools like clang-tidy are built on the same Clang AST and report findings in this same file:line:column format.
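The output format itself is easy to prototype. A minimal sketch, assuming a hypothetical `Diagnostic` record that already holds the file, line, and column; in a Clang tool those values would be derived from a source location through the source manager rather than stored by hand.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one finding.
struct Diagnostic {
    std::string file;
    unsigned line;
    unsigned column;
    std::string message;
};

// Formats one finding as file:line:column: warning: message.
std::string formatWarning(const Diagnostic& d) {
    std::ostringstream out;
    out << d.file << ':' << d.line << ':' << d.column
        << ": warning: " << d.message;
    return out.str();
}

// Builds the closing summary line, counting each file once.
std::string formatSummary(const std::vector<Diagnostic>& all) {
    std::set<std::string> files;
    for (const Diagnostic& d : all) files.insert(d.file);
    std::ostringstream out;
    out << "Found " << all.size() << " issues in " << files.size()
        << " files.";
    return out.str();
}
```

Sticking to the `file:line:column:` prefix is worthwhile because many editors and CI systems already know how to turn it into a clickable location.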
Learning milestones:
- Tool compiles and runs on source → You understand Clang's build system and tooling setup
- AST traversal finds patterns → You can navigate compiler data structures
- Actionable warnings emitted → You've built a real developer tool
Project 3: LLVM Optimization Pass
- File: LLVM_LEARNING_PROJECTS.md
- Main Programming Language: C++
- Alternative Programming Languages: Rust, C
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The "Resume Gold" (Educational/Personal Brand)
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Compilers, Optimization
- Software or Tool: LLVM, Clang
- Main Book: "Engineering a Compiler" - Cooper & Torczon
What youâll build: A custom optimization pass that transforms LLVM IR to produce faster/smaller code for a specific pattern (e.g., strength reduction, dead code elimination for a custom pattern, or loop unrolling heuristics).
Why it teaches LLVM: Optimization passes are the heart of LLVM's power. By writing one yourself, you'll understand how compilers reason about code, what information is available at the IR level, and how transformations must preserve program semantics while improving performance.
Core challenges you'll face:
- Registering a pass with LLVM's new PassManager (maps to pass infrastructure)
- Analyzing IR to identify optimization opportunities (maps to IR analysis)
- Safely modifying IR while maintaining correctness (maps to transformation safety)
- Measuring the impact of your optimization (maps to benchmarking)
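The analyze-then-rewrite loop at the core of a pass can be rehearsed on a toy instruction list. This is a sketch under stated assumptions: `Inst` is a hypothetical stand-in for an LLVM instruction, and the transformation shown (multiply by a power of two becomes a left shift) is one that LLVM's own pipeline already performs, so it is useful as practice rather than as a new optimization. The rewrite is safe for wrapping two's-complement integers because `x * 2^k` and `x << k` produce the same bits.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy three-address instruction; not an LLVM type.
struct Inst {
    std::string opcode;  // "mul", "shl", "add", ...
    std::string dest;    // result name, e.g. "%a"
    std::string lhs;     // left operand (a value name)
    std::int64_t rhs;    // right operand (a constant)
};

// Returns log2(n) if n is a power of two greater than 1, otherwise -1.
int exactLog2(std::int64_t n) {
    if (n <= 1 || (n & (n - 1)) != 0) return -1;
    int shift = 0;
    while ((n >>= 1) != 0) ++shift;
    return shift;
}

// Rewrites `x * 2^k` into `x << k` in place.
// Returns how many instructions were rewritten.
int reduceStrength(std::vector<Inst>& block) {
    int changed = 0;
    for (Inst& inst : block) {
        if (inst.opcode != "mul") continue;   // analysis: find candidates
        int shift = exactLog2(inst.rhs);
        if (shift < 0) continue;              // not a power of two: leave it
        inst.opcode = "shl";                  // transformation
        inst.rhs = shift;
        ++changed;
    }
    return changed;
}
```

Returning the number of rewrites mirrors a habit worth keeping in a real pass: know whether you changed anything, because that determines which analyses remain valid afterwards.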
Resources for key challenges:
- "Writing an LLVM Pass" - LLVM official docs (updated for new PassManager)
- "LLVM Code Generation" by Quentin Colombet - Deep dive into LLVM's optimization pipeline
- "Engineering a Compiler" Ch. 8-10 - Theoretical foundation for optimization
Key Concepts:
- Data-flow Analysis: "Engineering a Compiler" Ch. 9 - Cooper & Torczon
- SSA-based Optimization: "Engineering a Compiler" Ch. 9 - Cooper & Torczon
- LLVM Pass Manager: LLVM Programmer's Manual - llvm.org/docs
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Completed Project 1 or equivalent LLVM IR familiarity.
Real world outcome: You'll compile a benchmark program with and without your pass:
```
$ clang -O2 benchmark.c -o baseline
$ clang -O2 -fpass-plugin=./MyPass.so benchmark.c -o optimized
$ time ./baseline # 2.3s
$ time ./optimized # 1.8s (22% faster!)
```
You can prove your optimization works with measurable performance improvements.
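It helps to be precise about what a figure like "22% faster" means. With the timings above, the run time drops by (2.3 - 1.8) / 2.3, about 21.7%, which is the same thing as a speedup factor of 2.3 / 1.8, about 1.28x. The helpers below are a small sketch of that arithmetic; the function names are hypothetical, and taking the median of several runs is a suggestion added here, not something the benchmark above prescribes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of several timing samples in seconds; steadier than one run.
// Returns 0.0 for an empty sample set.
double median(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    std::size_t n = samples.size();
    return n % 2 == 1 ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Fraction of run time removed: (baseline - optimized) / baseline.
double timeReduction(double baseline, double optimized) {
    return (baseline - optimized) / baseline;
}

// Speedup factor: baseline / optimized.
double speedup(double baseline, double optimized) {
    return baseline / optimized;
}
```

Reporting both numbers avoids ambiguity: a 21.7% reduction in time and a 1.28x speedup describe the same measurement.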
Learning milestones:
- Pass registered and runs → You understand LLVM's modular architecture
- IR correctly transformed → You can reason about program semantics at IR level
- Measurable speedup achieved → You've done what professional compiler engineers do
Project 4: JIT-Compiled REPL for a Toy Language
- File: LLVM_LEARNING_PROJECTS.md
- Programming Language: C++
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The "Industry Disruptor"
- Difficulty: Level 4: Expert
- Knowledge Area: Virtual Machines / JIT
- Software or Tool: LLVM ORC JIT
- Main Book: "Engineering a Compiler" by Cooper & Torczon
What you'll build: An interactive Read-Eval-Print Loop where you type expressions, they're JIT-compiled to native code, executed, and results printed, all in milliseconds.
Why it teaches LLVM: LLVM's ORC JIT (Just-In-Time) APIs power Julia's code generation, and LLVM-based JITs are also used for database query compilation (PostgreSQL, for example) and have been used in parts of JavaScript engines. You'll understand dynamic code generation and the difference between AOT (ahead-of-time) and JIT compilation.
Core challenges you'll face:
- Setting up LLVM's ORC JIT engine (maps to JIT infrastructure)
- Managing symbols and linking at runtime (maps to dynamic linking)
- Handling the JIT compilation lifecycle (maps to execution engine)
- Integrating with the host language (calling C functions from JIT code) (maps to FFI)
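The symbol-management challenge is easier to reason about with a toy model in hand. The sketch below assumes a hypothetical `Session` class that maps names to callables; it does not use or mirror any ORC class. What it does show is the behavior a REPL needs: definitions persist across entries, host functions are registered the same way as user functions, and names are resolved when a call happens rather than when the caller was defined.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of a JIT session's symbol table; names are hypothetical.
class Session {
public:
    using Fn = std::function<double(const std::vector<double>&)>;

    // Makes a function visible to every later REPL entry.
    // Defining an existing name replaces the old definition.
    void define(const std::string& name, Fn fn) {
        symbols_[name] = std::move(fn);
    }

    // Resolves the name at call time and invokes the function.
    double call(const std::string& name,
                const std::vector<double>& args) const {
        auto it = symbols_.find(name);
        if (it == symbols_.end())
            throw std::runtime_error("unresolved symbol: " + name);
        return it->second(args);
    }

    bool has(const std::string& name) const {
        return symbols_.count(name) != 0;
    }

private:
    std::map<std::string, Fn> symbols_;
};
```

Because lookup happens inside `call`, a function body that refers to `fib` by name can be defined before `fib` is fully known, which is what makes recursive and mutually dependent definitions work across separate entries.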
Resources for key challenges:
- LLVM Kaleidoscope Tutorial Ch. 4 - "Adding JIT and Optimizer Support" - Essential starting point
- "Building a JIT in LLVM" - LLVM official docs for ORC
- "How Julia Uses LLVM" - Various talks by Jameson Nash - Real-world JIT at scale
Key Concepts:
- JIT vs AOT Compilation: "Engineering a Compiler" Ch. 1 - Cooper & Torczon
- Dynamic Symbol Resolution: "Computer Systems: A Programmer's Perspective" Ch. 7 - Bryant & O'Hallaron
- ORC JIT Architecture: LLVM ORC Design Document - llvm.org/docs
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Completed Project 1, comfortable with C++ and memory management.
Real world outcome:
```
> 2 + 3 * 4
14
> def add(x, y) x + y
defined function 'add'
> add(10, 20)
30
> def fib(n) if n < 2 then n else fib(n-1) + fib(n-2)
defined function 'fib'
> fib(35)
9227465 (computed in 0.8s with native performance!)
```
An interactive programming environment with the speed of compiled code.
Learning milestones:
- JIT engine initialized → You understand LLVM's execution environment
- Code executes on-demand → You've bridged compile-time and runtime
- Functions callable across REPL entries → You've managed symbol resolution dynamically
Project 5: Source-to-Source Refactoring Tool
- File: LLVM_LEARNING_PROJECTS.md
- Main Programming Language: C++
- Alternative Programming Languages: Python, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The "Micro-SaaS / Pro Tool" (Solo-Preneur Potential)
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Compilers, Code Transformation
- Software or Tool: LLVM, Clang, libTooling
- Main Book: "Refactoring: Improving the Design of Existing Code" - Martin Fowler
What you'll build: A tool that automatically transforms C/C++ code: renaming functions, modernizing syntax (e.g., NULL → nullptr), or applying custom transformations across an entire codebase.
Why it teaches LLVM/Clang: This is how real refactoring tools work. You'll use Clang's Rewriter to modify source code while preserving formatting, comments, and correctness. This teaches the difference between AST manipulation and source text manipulation.
Core challenges you'll face:
- Using `ASTMatchers` to find specific code patterns (maps to pattern matching)
- Applying source rewrites without breaking code (maps to source preservation)
- Handling macros and preprocessor complexity (maps to preprocessing)
- Processing multiple files with consistent transformations (maps to tooling at scale)
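One detail of source rewriting is worth seeing in isolation: edits are expressed as offsets into the original text, so applying one edit can shift the positions of the others. The sketch below assumes a hypothetical `Replacement` record and finds targets with a whole-word text search, which is only a stand-in for an AST match; it knows nothing about strings, comments, or macros. Clang's tooling handles the offset bookkeeping for you, so treat this as an explanation of the problem rather than a substitute.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A pending edit: replace `length` bytes at `offset` with `text`.
struct Replacement {
    std::size_t offset;
    std::size_t length;
    std::string text;
};

bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Finds whole-word occurrences of `from` and proposes replacing each
// with `to`. A toy substitute for matching on the AST.
std::vector<Replacement> findWord(const std::string& src,
                                  const std::string& from,
                                  const std::string& to) {
    std::vector<Replacement> edits;
    for (std::size_t pos = src.find(from); pos != std::string::npos;
         pos = src.find(from, pos + 1)) {
        std::size_t end = pos + from.size();
        bool startOk = pos == 0 || !isIdentChar(src[pos - 1]);
        bool endOk = end == src.size() || !isIdentChar(src[end]);
        if (startOk && endOk)
            edits.push_back(Replacement{pos, from.size(), to});
    }
    return edits;
}

// Applies edits from the highest offset down, so that the offsets of
// edits still waiting to be applied remain valid.
std::string applyEdits(std::string src, std::vector<Replacement> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Replacement& a, const Replacement& b) {
                  return a.offset > b.offset;
              });
    for (const Replacement& e : edits)
        src.replace(e.offset, e.length, e.text);
    return src;
}
```

Collecting all edits first and applying them afterwards also makes it easy to report what would change without touching any file, which is a useful dry-run mode for a refactoring tool.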
Resources for key challenges:
- "Clang AST Matchers Tutorial" - LLVM docs - DSL for finding code patterns
- "clang-rename" source code - How LLVM's own tools do it
- "Refactoring: Improving the Design of Existing Code" by Martin Fowler - The "why" behind refactoring
Key Concepts:
- AST Matching: Clang AST Matchers Reference - clang.llvm.org/docs
- Source Rewriting: Clang Rewriter Class Documentation - clang.llvm.org/doxygen
- Safe Refactoring: "Refactoring" Ch. 1-3 - Martin Fowler
Difficulty: Intermediate-Advanced. Time estimate: 2 weeks. Prerequisites: Completed Project 2 or familiar with Clang AST.
Real world outcome:
```
$ ./my-modernizer --transform=nullptr-upgrade ./src/
Processed 47 files:
- Replaced 234 instances of 'NULL' with 'nullptr'
- Replaced 12 instances of '0' (null pointer context) with 'nullptr'
```
Your tool can process real codebases and produce valid, improved code.
Learning milestones:
- Patterns matched in AST → You can express code queries declaratively
- Single file transformed correctly → You understand source location management
- Codebase-wide transformation works → You've built production-quality tooling
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Calculator → LLVM IR | Intermediate | 1-2 weeks | ★★★★ | ★★★★ |
| Custom Static Checker | Intermediate | 1-2 weeks | ★★★ | ★★★★ |
| Optimization Pass | Advanced | 2-3 weeks | ★★★★★ | ★★★ |
| JIT REPL | Advanced | 2-3 weeks | ★★★★★ | ★★★★★ |
| Refactoring Tool | Intermediate-Advanced | 2 weeks | ★★★★ | ★★★★ |
Recommendation
Start with Project 1 (Calculator → LLVM IR). Here's why:
- It's the canonical path - Almost everyone who learns LLVM starts here, which means the most resources and community help exist
- Tangible output immediately - You'll have a working compiler that produces executables within days
- Foundation for everything else - Projects 3 and 4 directly build on the IR knowledge you'll gain
If you're already comfortable with compiler basics and want something immediately practical, Project 2 (Static Checker) is an excellent choice: you'll have a useful tool within a week that you can run on real codebases.
Final Capstone Project: A Complete Programming Language
What you'll build: A full programming language implementation with:
- Custom syntax (your language design)
- Type checking
- Multiple optimization passes
- Both AOT and JIT compilation modes
- Integration with C libraries (FFI)
- A standard library with basic I/O
Why it's the ultimate LLVM project: This synthesizes everything: front-end design, IR generation, optimization, JIT, and tooling. Languages like Rust, Swift, and Julia are built on LLVM, and you'll understand how.
Core challenges you'll face:
- Designing a coherent type system (maps to type theory basics)
- Implementing semantic analysis (maps to compiler middle-end)
- Building a debug info generator for GDB/LLDB support (maps to DWARF format)
- Creating a build system and package format (maps to language ecosystem)
- Writing meaningful error messages (maps to UX of compilers)
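Type checking and error reporting can be prototyped together on a very small scale. The following is a minimal sketch, assuming a toy language with only `Int` and `String`, a hypothetical `Node` AST, and a rule (an assumption made here for illustration) that `+` requires both operands to have the same type. A real type system would need far more, but the shape of the recursion is the same.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

enum class Type { Int, String };

// Hypothetical AST node for illustration.
struct Node {
    enum Kind { IntLit, StrLit, Var, Add } kind;
    std::string name;                // variable name when kind == Var
    std::shared_ptr<Node> lhs, rhs;  // operands when kind == Add
};

using Env = std::map<std::string, Type>;

std::shared_ptr<Node> leaf(Node::Kind kind, std::string name = "") {
    return std::make_shared<Node>(
        Node{kind, std::move(name), nullptr, nullptr});
}

std::shared_ptr<Node> add(std::shared_ptr<Node> l,
                          std::shared_ptr<Node> r) {
    return std::make_shared<Node>(
        Node{Node::Add, "", std::move(l), std::move(r)});
}

// Returns the type of `n`, or throws with a message the user can act on.
Type check(const Node& n, const Env& env) {
    switch (n.kind) {
    case Node::IntLit:
        return Type::Int;
    case Node::StrLit:
        return Type::String;
    case Node::Var: {
        auto it = env.find(n.name);
        if (it == env.end())
            throw std::runtime_error("undefined variable '" + n.name + "'");
        return it->second;
    }
    case Node::Add: {
        Type l = check(*n.lhs, env);
        Type r = check(*n.rhs, env);
        if (l != r)
            throw std::runtime_error(
                "operands of '+' must have the same type");
        return l;
    }
    }
    throw std::logic_error("unknown node kind");
}
```

With `name` bound to `String` in the environment, an expression shaped like `"Hello, " + name + "!"` checks as `String`, while adding an integer literal to a string literal is rejected before any code is generated.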
Resources for key challenges:
- "Crafting Interpreters" by Bob Nystrom - The best book on language implementation, period
- "Writing a C Compiler" by Nora Sandler - Practical C-to-assembly, adaptable to C-to-LLVM
- "Types and Programming Languages" by Benjamin Pierce - If you want rigorous type system knowledge
- "Language Implementation Patterns" by Terence Parr - Patterns you'll use repeatedly
Key Concepts:
- Language Design: "Crafting Interpreters" full book - Bob Nystrom
- Type Systems: "Types and Programming Languages" Ch. 1-11 - Benjamin Pierce
- Error Recovery: "Engineering a Compiler" Ch. 3 - Cooper & Torczon
- Debug Information: DWARF Debugging Standard - dwarfstd.org
Difficulty: Advanced. Time estimate: 1-3 months. Prerequisites: Complete Projects 1 and either 3 or 4.
Real world outcome:
```
$ cat hello.mylang
fn main() {
    let name = "World";
    print("Hello, " + name + "!");
}
$ mylang build hello.mylang -o hello
$ ./hello
Hello, World!
$ mylang run hello.mylang # JIT mode
Hello, World!
$ mylang check hello.mylang # Type checking only
✓ No errors found
```
You'll have created a language that others can actually use to write programs.
Learning milestones:
- Parser and type checker complete → You've built a language front-end
- Compiled programs run correctly → Your code generation works
- JIT mode with acceptable latency → You've mastered LLVM's execution engine
- Someone else writes a program in your language → You've created something real
Getting Started Checklist
Before diving in, ensure you have:
- LLVM/Clang installed (version 15+ recommended): `brew install llvm` or build from source
- CMake familiarity (LLVM uses CMake extensively)
- C++17 comfort (LLVM's codebase uses modern C++)
- `llvm-config` in your PATH (for linking)
- A test C file to examine: run `clang -emit-llvm -S test.c -o test.ll` and read the IR
The LLVM documentation is notoriously dense but comprehensive, so expect to reference it constantly. The Kaleidoscope tutorial is your friend for the first project.