Project 3: Shell Parser (AST Builder)
A recursive descent parser that takes tokens from your lexer and builds an Abstract Syntax Tree representing the command structure—including pipelines, redirections, and command lists.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | Rust, OCaml, Haskell |
| Difficulty | Level 3: Advanced (The Engineer) |
| Time Estimate | 1-2 weeks |
| Knowledge Area | Compilers / Parsing |
| Tooling | Shell Parser |
| Prerequisites | Project 2, understanding of grammars and recursion |
What You Will Build
A recursive descent parser that takes tokens from your lexer and builds an Abstract Syntax Tree representing the command structure—including pipelines, redirections, and command lists.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
-
Operator precedence ( binds tighter than && which binds tighter than ;) → maps to grammar design - Recursive structures (subshells, command groups) → maps to recursive descent
- Multiple redirections (command can have many redirects) → maps to AST node design
- Error recovery (meaningful messages for syntax errors) → maps to parser robustness
-
Associativity (a b c is left-associative) → maps to grammar rules
Key Concepts
- Recursive descent parsing: “Language Implementation Patterns” Chapter 3-4 - Terence Parr
- Shell grammar: POSIX Shell Specification Section 2.10 - The Open Group
- AST design: “Engineering a Compiler” Chapter 5 - Cooper & Torczon
Real-World Outcome
$ echo 'ls -la | grep foo > out.txt' | ./shell_parser
Pipeline:
├── SimpleCommand: ls -la
│ └── Redirections: (none)
└── SimpleCommand: grep foo
└── Redirections:
└── RedirectOut: out.txt
$ echo 'cd /tmp && make || echo failed' | ./shell_parser
OrList:
├── AndList:
│ ├── SimpleCommand: cd /tmp
│ └── SimpleCommand: make
└── SimpleCommand: echo failed
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
SHELL_INTERNALS_DEEP_DIVE_PROJECTS.md - “Language Implementation Patterns” by Terence Parr