Project 2: Escape Sequence Parser

A parser that takes raw terminal output (bytes from a program) and decodes it into structured events: “print ‘Hello’”, “set color to red”, “move cursor to (5,10)”, etc.

Quick Reference

Attribute              Value
Primary Language       C
Alternative Languages  Rust, Zig, Go
Difficulty             Level 2: Intermediate (The Developer)
Time Estimate          1-2 weeks
Knowledge Area         Parsing / State Machines / ANSI
Tooling                ANSI Parser
Prerequisites          Understanding of state machines, basic C

What You Will Build

A command-line tool, ansi_parser, that reads raw terminal output (the bytes a program writes to the terminal) and decodes it into a stream of structured events such as “print ‘Hello’”, “set color to red”, and “move cursor to (5,10)”.

Why It Matters

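Terminal emulators, multiplexers, and log viewers all have to separate printable text from control sequences in a byte stream. Building that parser exercises state machine design, incremental parsing of input that arrives in arbitrary chunks, and graceful handling of malformed data, skills that recur across real-world systems and tooling.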

Core Challenges

  • State machine design → Parsing multi-byte sequences correctly
  • Partial sequence handling → What if ESC arrives but [ hasn’t yet?
  • Parameter parsing → “\x1b[5;10;42m” has three numeric parameters
  • Distinguishing sequences → CSI vs OSC vs DCS vs SS3
  • Handling malformed input → Graceful degradation
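
The challenges above can be sketched as a byte-at-a-time state machine. This is a minimal sketch under stated assumptions, not a complete VT parser: the names Parser, Event, parser_init, and parser_feed are hypothetical, only plain bytes and CSI sequences are recognized (OSC, DCS, and SS3 are left out), and private markers and intermediate bytes are silently skipped. Because all progress lives in the Parser struct, an ESC that ends one read and a [ that starts the next are handled exactly as if they had arrived together.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16

typedef enum { ST_GROUND, ST_ESCAPE, ST_CSI } State;
typedef enum { EV_PRINT, EV_CSI } EventType;

/* Hypothetical event record: one printed byte or one complete CSI sequence. */
typedef struct {
    EventType type;
    unsigned char ch;        /* EV_PRINT: the byte; EV_CSI: the final byte */
    int params[MAX_PARAMS];
    int nparams;
} Event;

/* All parsing progress is stored here so input may arrive in any chunking. */
typedef struct {
    State state;
    int params[MAX_PARAMS];
    int nparams;
    int cur;                 /* parameter currently being accumulated */
    int pending;             /* 1 if a parameter is open (possibly empty) */
} Parser;

void parser_init(Parser *p) {
    memset(p, 0, sizeof *p);
    p->state = ST_GROUND;
}

static void push_param(Parser *p) {
    if (p->nparams < MAX_PARAMS)          /* extra parameters are dropped */
        p->params[p->nparams++] = p->cur;
    p->cur = 0;
}

/* Feed one byte. Returns 1 and fills *out when an event is complete, else 0. */
int parser_feed(Parser *p, unsigned char b, Event *out) {
    assert(p != NULL && out != NULL);
    switch (p->state) {
    case ST_GROUND:
        if (b == 0x1b) { p->state = ST_ESCAPE; return 0; }
        out->type = EV_PRINT; out->ch = b; out->nparams = 0;
        return 1;
    case ST_ESCAPE:
        if (b == '[') {
            p->state = ST_CSI;
            p->nparams = 0; p->cur = 0; p->pending = 0;
        } else if (b != 0x1b) {
            p->state = ST_GROUND;         /* other escape types: ignored here */
        }
        return 0;
    case ST_CSI:
        if (b >= '0' && b <= '9') {
            if (p->cur < 100000)          /* clamp to avoid int overflow */
                p->cur = p->cur * 10 + (b - '0');
            p->pending = 1;
        } else if (b == ';') {
            push_param(p);                /* an empty parameter becomes 0 */
            p->pending = 1;
        } else if (b >= 0x40 && b <= 0x7e) {  /* final byte ends the sequence */
            if (p->pending) push_param(p);
            out->type = EV_CSI; out->ch = b; out->nparams = p->nparams;
            memcpy(out->params, p->params, sizeof p->params);
            p->state = ST_GROUND;
            return 1;
        } else if (b == 0x1b) {
            p->state = ST_ESCAPE;         /* malformed: a new sequence begins */
        } else if (b < 0x20 || b > 0x7e) {
            p->state = ST_GROUND;         /* malformed: abandon the sequence */
        }
        /* remaining bytes 0x20..0x3f (markers, intermediates) are skipped */
        return 0;
    }
    return 0;
}
```

Feeding the bytes of “\x1b[5;10;42m” one at a time yields a single EV_CSI event with final byte m and parameters 5, 10, and 42. Keeping the API at one byte per call is a deliberate simplification: the caller can loop over whatever read() returned without the parser ever needing to see a whole sequence at once.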

Key Concepts

Real-World Outcome

$ echo -e "\x1b[31mHello\x1b[0m World" | ./ansi_parser

Parsing input: <ESC>[31mHello<ESC>[0m World

Events:
  [1] CSI Sequence: SGR (Select Graphic Rendition)
      Parameters: [31]
      Action: Set foreground color to RED

  [2] Print: "Hello"

  [3] CSI Sequence: SGR (Select Graphic Rendition)
      Parameters: [0]
      Action: Reset all attributes

  [4] Print: " World"

  [5] Print: "\n"

$ cat /some/program/output | ./ansi_parser --stats
Parsed 45,231 bytes:
  - Printable characters: 42,100
  - CSI sequences: 847
  - OSC sequences: 12
  - Unknown/ignored: 23
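
The “Action” lines in the example output imply a step that turns an SGR parameter into a human-readable description. One way to sketch that step is shown here; describe_sgr is a hypothetical helper, and it covers only reset, bold, and the eight standard foreground and background colors (SGR 30-37 and 40-47).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard 8-color palette, indexed by (param - 30) or (param - 40). */
static const char *const COLOR_NAMES[8] = {
    "BLACK", "RED", "GREEN", "YELLOW", "BLUE", "MAGENTA", "CYAN", "WHITE"
};

/* Hypothetical helper: writes a description of one SGR parameter into buf
 * and returns buf. Unrecognized parameters are reported, not rejected. */
const char *describe_sgr(int param, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    if (param == 0)
        snprintf(buf, size, "Reset all attributes");
    else if (param == 1)
        snprintf(buf, size, "Set bold");
    else if (param >= 30 && param <= 37)
        snprintf(buf, size, "Set foreground color to %s", COLOR_NAMES[param - 30]);
    else if (param >= 40 && param <= 47)
        snprintf(buf, size, "Set background color to %s", COLOR_NAMES[param - 40]);
    else
        snprintf(buf, size, "Unknown SGR parameter %d", param);
    return buf;
}
```

Reporting unknown parameters instead of failing keeps the tool useful on real program output, where unsupported sequences are expected and can be tallied under “Unknown/ignored”.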

Implementation Guide

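  1. Reproduce the simplest happy-path scenario: plain text in, Print events out.
  2. Build the smallest working version of the core feature: recognize ESC [ and decode one CSI sequence such as SGR.
  3. Add input validation and error handling for partial, oversized, and malformed sequences.
  4. Add instrumentation/logging (for example, the --stats counters) to confirm behavior.
  5. Refactor into clean modules with tests.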

Milestones

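  • Milestone 1: Minimal working program that runs end-to-end, reading stdin and printing events.
  • Milestone 2: Correct output for typical inputs such as SGR color and cursor-movement sequences.
  • Milestone 3: Robust handling of edge cases: sequences split across reads, missing parameters, malformed input.
  • Milestone 4: Clean structure and documented usage.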

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
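
One way to check the “repeatable results” item is to feed the same input split at every possible offset and confirm the outcome never changes. The sketch below assumes a tiny stand-in for the real parser: Counter, counter_feed, and split_invariant_holds are hypothetical names, and the counter tracks only ground-state bytes and CSI sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real parser: it only counts. */
typedef struct {
    int state;             /* 0 = ground, 1 = after ESC, 2 = inside CSI */
    size_t ground_bytes;   /* every byte seen in ground state, newline included */
    size_t csi;            /* completed CSI sequences */
} Counter;

/* Consume one chunk; state carries over to the next call. */
void counter_feed(Counter *c, const unsigned char *buf, size_t len) {
    assert(c != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b == 0x1b) { c->state = 1; continue; }  /* ESC (re)starts a sequence */
        switch (c->state) {
        case 0: c->ground_bytes++; break;
        case 1: c->state = (b == '[') ? 2 : 0; break;
        case 2:
            if (b >= 0x40 && b <= 0x7e) { c->csi++; c->state = 0; }
            break;
        }
    }
}

/* Returns 1 if cutting the input in two at every offset gives the same
 * final state and counts as feeding it in a single call, else 0. */
int split_invariant_holds(const unsigned char *input, size_t len) {
    Counter whole = {0};
    counter_feed(&whole, input, len);
    for (size_t cut = 0; cut <= len; cut++) {
        Counter parts = {0};
        counter_feed(&parts, input, cut);
        counter_feed(&parts, input + cut, len - cut);
        if (parts.state != whole.state ||
            parts.ground_bytes != whole.ground_bytes ||
            parts.csi != whole.csi)
            return 0;
    }
    return 1;
}
```

The same harness shape should carry over to the full parser by comparing event lists instead of counters. A failure at a particular cut offset points directly at the state transition that loses information across a read boundary.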

References

  • Main guide: TERMINAL_EMULATOR_DEEP_DIVE_PROJECTS.md
  • “Language Implementation Patterns” by Terence Parr