Project 3: Shell Parser (AST Builder)

A recursive descent parser that takes tokens from your lexer and builds an Abstract Syntax Tree representing the command structure—including pipelines, redirections, and command lists.

Quick Reference

Attribute Value
Primary Language C
Alternative Languages Rust, OCaml, Haskell
Difficulty Level 3: Advanced (The Engineer)
Time Estimate 1-2 weeks
Knowledge Area Compilers / Parsing
Tooling Shell Parser
Prerequisites Project 2, understanding of grammars and recursion

What You Will Build

A recursive descent parser that takes tokens from your lexer and builds an Abstract Syntax Tree representing the command structure—including pipelines, redirections, and command lists.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Operator precedence ( binds tighter than && which binds tighter than ;) → maps to grammar design
  • Recursive structures (subshells, command groups) → maps to recursive descent
  • Multiple redirections (command can have many redirects) → maps to AST node design
  • Error recovery (meaningful messages for syntax errors) → maps to parser robustness
  • Associativity (a   b   c is left-associative) → maps to grammar rules

Key Concepts

  • Recursive descent parsing: “Language Implementation Patterns” Chapter 3-4 - Terence Parr
  • Shell grammar: POSIX Shell Specification Section 2.10 - The Open Group
  • AST design: “Engineering a Compiler” Chapter 5 - Cooper & Torczon

Real-World Outcome

$ echo 'ls -la | grep foo > out.txt' | ./shell_parser
Pipeline:
  ├── SimpleCommand: ls -la
  │     └── Redirections: (none)
  └── SimpleCommand: grep foo
        └── Redirections:
              └── RedirectOut: out.txt

$ echo 'cd /tmp && make || echo failed' | ./shell_parser
OrList:
  ├── AndList:
  │     ├── SimpleCommand: cd /tmp
  │     └── SimpleCommand: make
  └── SimpleCommand: echo failed

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: SHELL_INTERNALS_DEEP_DIVE_PROJECTS.md
  • “Language Implementation Patterns” by Terence Parr