Project 6: Template Engine with Custom Syntax

Build a mixed-mode template DSL with text/code parsing, control flow, includes, and optional compiled execution.

Quick Reference

Attribute | Value
Difficulty | Level 4: Expert
Time Estimate | 2-3 weeks
Main Programming Language | Rust
Alternative Programming Languages | Go, C, Zig
Coolness Level | Level 4: Hardcore Tech Flex
Business Potential | 4. The “Open Core” Infrastructure
Prerequisites | Projects 2-3, strong parsing foundations, memory/data modeling basics
Key Topics | Lexer modes, template AST, control flow execution, compilation and caching

1. Learning Objectives

By completing this project, you will:

  1. Implement lexer mode switching between literal text and code islands.
  2. Parse template tags and expressions into a typed AST.
  3. Execute templates with deterministic scoping and escaping.
  4. Add optional compilation backend for repeated rendering speed.
  5. Design robust error surfaces for template authors.

2. All Theory Needed (Per-Concept Breakdown)

Mixed-Mode Lexing and Structured Template Parsing

Fundamentals Template languages are dual-language systems: plain text plus embedded control/code constructs. Standard single-mode lexers struggle because tokenization rules differ by context. Mixed-mode lexing solves this by switching scanner behavior when delimiters appear ({{, {%, {#). Correct mode transitions are the foundation for reliable parsing. If mode boundaries drift, parser errors become noisy and output corruption follows. This project teaches stateful lexing and parser architecture for text/code hybrids.

Deep Dive into the concept Define explicit lexer modes: TEXT, EXPR, BLOCK, COMMENT. In TEXT, consume bytes until next opening delimiter. Emit TEXT_CHUNK token for bulk literal segments. On delimiter detection, emit OPEN_EXPR/OPEN_BLOCK/OPEN_COMMENT and switch mode. In non-text modes, tokenize identifiers, literals, operators, and delimiters until closing marker then return to TEXT.

Mode transitions need precedence handling for overlapping delimiters: {{ must match before a bare { when both patterns exist, so implement a longest-match strategy for delimiters.
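A minimal sketch of delimiter-driven mode switching, assuming Jinja-style delimiters; the `Mode` enum and `scan_text` names are illustrative, not a prescribed API:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode { Text, Expr, Block, Comment }

// Scan literal text until the next opening delimiter. Returns the literal
// chunk plus, if a delimiter was found, the mode to enter and the byte
// offset just past the delimiter. All three openers are two bytes, so
// checking them before any bare '{' fallback gives longest-match behavior.
fn scan_text(src: &str) -> (&str, Option<(Mode, usize)>) {
    let delims = [("{{", Mode::Expr), ("{%", Mode::Block), ("{#", Mode::Comment)];
    let mut earliest: Option<(usize, Mode)> = None;
    for (pat, mode) in delims {
        if let Some(i) = src.find(pat) {
            if earliest.map_or(true, |(j, _)| i < j) {
                earliest = Some((i, mode));
            }
        }
    }
    match earliest {
        Some((i, mode)) => (&src[..i], Some((mode, i + 2))),
        None => (src, None),
    }
}
```

The TEXT-mode scanner only needs to find the earliest opener; tokenization inside EXPR/BLOCK/COMMENT is a separate loop that runs until the matching closer.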

Token spans are especially important in templates because parse errors often refer to two coordinate spaces: template file line/column and potentially included file origins. Include file id/source id in token metadata from the start.

Parsing template AST usually combines statement and expression grammars:

  • statement nodes: text, output expression, if/for/include, block definitions.
  • expression nodes: literals, field access, filters, function-like calls.
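The two node families above could be modeled as one unified Rust AST; the variant names here are illustrative assumptions, not a required design:

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Str(String),
    Field(Vec<String>),        // user.name -> Field(["user", "name"])
    Filter(Box<Expr>, String), // value | upper -> Filter(value, "upper")
}

#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Output(Expr), // {{ expr }}, auto-escaped by default
    If { cond: Expr, then_body: Vec<Node>, else_body: Vec<Node> },
    For { var: String, iter: Expr, body: Vec<Node> },
    Include { path: String },
}
```

Keeping statements (`Node`) and expressions (`Expr`) as separate enums lets the expression parser be reused independently of the template grammar.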

Control-flow parsing requires matching open/close tags (if/endif, for/endfor) and optional branches (elif, else). Use stack-based block tracking to catch unbalanced tags with specific diagnostics.
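A sketch of stack-based block matching over already-lexed tag names; the tag strings and error messages are assumptions for illustration:

```rust
// Verify block tags are balanced; on mismatch, report which tag was
// expected so the diagnostic names the unclosed construct.
fn check_balance(tags: &[&str]) -> Result<(), String> {
    let mut stack: Vec<&str> = Vec::new();
    for &tag in tags {
        match tag {
            "if" | "for" => stack.push(tag),
            "endif" | "endfor" => {
                let expected = &tag[3..]; // "endif" -> "if"
                match stack.pop() {
                    Some(open) if open == expected => {}
                    Some(open) => return Err(format!("expected end{open}, found {tag}")),
                    None => return Err(format!("unexpected {tag}")),
                }
            }
            "elif" | "else" => {
                if stack.last() != Some(&"if") {
                    return Err(format!("{tag} outside if"));
                }
            }
            _ => {} // other tags ignored in this sketch
        }
    }
    if let Some(open) = stack.pop() {
        return Err(format!("unclosed {open}"));
    }
    Ok(())
}
```

A real parser would push spans along with tag names so the "unclosed if" error can point at the opening tag.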

Comments ({# ... #}) should be lexed and discarded or preserved for tooling metadata depending on requirements.

One architectural choice: parse to a unified AST first, then either interpret or compile. This allows dual backends without duplicate parsing logic.

Escaping policy must be explicit. Common default: auto-escape output expressions unless marked safe. If auto-escape is disabled, document security implications clearly.
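A minimal auto-escape pass at the output boundary might look like the following; escaping only these five characters is a common HTML baseline, not a complete security policy:

```rust
// Escape the characters that let dynamic values break out of HTML
// text or attribute context.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            other => out.push(other),
        }
    }
    out
}
```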

Testing mixed-mode lexers needs carefully crafted fixtures around delimiter edges, nested constructs, and whitespace trimming variants. Add regression fixtures for off-by-one errors around delimiter boundaries.

This mixed-mode expertise extends naturally to syntax-highlighting engines and embedded query DSLs.

How this fits into the project sequence

  • Core front-end of this project.
  • Extends lexing/parsing from Project 2.
  • Feeds compiled execution concepts used in Project 7.

Definitions & key terms

  • Lexer mode: contextual scanner behavior state.
  • Delimiter island: bounded region parsed with alternate grammar.
  • Template AST: structural tree of text and code nodes.
  • Block stack: parser structure tracking nested control tags.
  • Auto-escape: default escaping of dynamic output.

Mental model diagram

Template source
  |
  v
Mode lexer (TEXT/EXPR/BLOCK/COMMENT)
  |
  v
Template parser + expression parser
  |
  v
Template AST

How it works

  1. Scan literal text until delimiter.
  2. Switch mode and tokenize embedded code.
  3. Parse control tags with block stack.
  4. Parse expressions for output/filter nodes.
  5. Build AST and return diagnostics on imbalance.

Minimal concrete example

Input:
Hello {{ user.name }}{% if user.admin %} (admin){% endif %}

AST (simplified):
Text("Hello ")
Output(Field(user.name))
If(Field(user.admin), [Text(" (admin)")], [])

Common misconceptions

  • “Template parsing is easy string replace.” -> fails on nested flow and scoping.
  • “Mode switching can be regex-only.” -> fragile for nested/malformed input.
  • “Only runtime errors matter.” -> parse-time diagnostics dramatically improve usability.

Check-your-understanding questions

  1. Why keep file/source id in token spans?
  2. What data structure helps detect unmatched endif?
  3. Predict failure if lexer forgets to leave EXPR mode.

Check-your-understanding answers

  1. Needed for include-file diagnostics and traceability.
  2. block stack with expected closing tags.
  3. downstream text interpreted as code, causing cascade errors.

Real-world applications

  • email/template rendering engines.
  • static site generators.
  • infrastructure config templaters.

Where you’ll apply it

  • §3.2 requirements 1-3.
  • §4.1 architecture components.
  • §6.2 critical test cases.

References

  • Nystrom, Crafting Interpreters, scanner/parser foundations.
  • Jinja docs template syntax semantics.

Key insights Template engines are parser projects with deliberate mode boundaries and strong block-structure rules.

Summary Mixed-mode lexing plus structured parsing is the essential base for reliable template DSL execution.

Homework/Exercises to practice the concept

  1. Define mode transition table for delimiters.
  2. Draw parser stack events for nested if and for.
  3. Design three malformed-tag fixtures.

Solutions to the homework/exercises

  1. include current mode, trigger token, next mode.
  2. push on open, pop on matching close, error on mismatch.
  3. unclosed if, unexpected endif, include missing string literal.

Template Execution, Compilation, and Caching

Fundamentals Once parsed, templates can be interpreted node-by-node or compiled into an intermediate instruction format for faster repeated runs. Interpretation is easier to build and debug. Compilation improves throughput when the same template is rendered many times with different data. Caching compiled templates avoids repeated parse/compile overhead. This project teaches when and how to apply both strategies while preserving semantic parity.

Deep Dive into the concept Interpreter design: walk AST with runtime context stack. Text nodes append literal output. Output nodes evaluate expression and escape. If/for nodes control traversal of child nodes. Includes load child template AST and render with inherited or scoped context.

Scoping rules are critical. Loop variables should shadow but not overwrite outer values. Include context policy should be explicit: full inherit, explicit pass-through, or isolated scope with imports. Determinism requires choosing one and documenting it.
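The shadow-but-not-overwrite rule can be sketched as a scope chain; `ScopeChain` is an illustrative name, and string values stand in for a full `Value` type:

```rust
use std::collections::HashMap;

// Layered variable lookup: writes go to the innermost scope, reads search
// innermost to outermost, so loop variables shadow outer bindings without
// mutating them.
struct ScopeChain {
    scopes: Vec<HashMap<String, String>>, // innermost scope is last
}

impl ScopeChain {
    fn new() -> Self { Self { scopes: vec![HashMap::new()] } }
    fn push(&mut self) { self.scopes.push(HashMap::new()); }
    fn pop(&mut self) { self.scopes.pop(); }
    fn set(&mut self, name: &str, value: &str) {
        self.scopes.last_mut().unwrap().insert(name.into(), value.into());
    }
    fn get(&self, name: &str) -> Option<&str> {
        self.scopes.iter().rev().find_map(|s| s.get(name).map(String::as_str))
    }
}
```

A for-loop body would `push()` before binding the loop variable and `pop()` on exit, restoring any shadowed outer value automatically.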

Compilation backend can lower AST to instruction list:

  • EMIT_TEXT
  • EVAL_EXPR
  • JUMP_IF_FALSE
  • LOOP_BEGIN/LOOP_END
  • INCLUDE_TEMPLATE

Execution VM then processes instructions with stack/context. Compilation pays off when templates are reused many times.
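A tiny VM sketch over such an instruction list; the instruction names echo the list above, but the operand encoding (variable names standing in for full expressions, absolute jump targets) is an assumption for illustration:

```rust
use std::collections::HashMap;

enum Inst {
    EmitText(String),
    EmitVar(String),            // evaluate expression, escape, emit
    JumpIfFalsy(String, usize), // jump to target if variable is missing/empty
    Jump(usize),
}

// Dispatch loop: a program counter walks the instruction list, appending
// to the output buffer and branching for control flow.
fn run(insts: &[Inst], vars: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut pc = 0;
    while pc < insts.len() {
        match &insts[pc] {
            Inst::EmitText(t) => { out.push_str(t); pc += 1; }
            Inst::EmitVar(name) => {
                out.push_str(vars.get(name).map(String::as_str).unwrap_or(""));
                pc += 1;
            }
            Inst::JumpIfFalsy(name, target) => {
                let truthy = vars.get(name).map_or(false, |v| !v.is_empty());
                pc = if truthy { pc + 1 } else { *target };
            }
            Inst::Jump(target) => pc = *target,
        }
    }
    out
}
```

An if-block compiles to a `JumpIfFalsy` over its body; a for-loop compiles to `LOOP_BEGIN`/`LOOP_END` or an equivalent backward `Jump`.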

Caching strategy key choices:

  • key by template content hash + engine version + options.
  • invalidate on source change.
  • store parsed AST or compiled instructions (or both).
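The key schema above could be sketched as follows; `DefaultHasher` stands in for a real content hash (e.g. SHA-256) and is stable within one build but not across Rust releases, so a production cache would use a cryptographic hash instead:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic cache key over source content + engine version + options.
// Any change to any component yields a different key, forcing recompile.
fn cache_key(source: &str, engine_version: &str, options: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);
    engine_version.hash(&mut h);
    for (k, v) in options {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}
```

Include files would contribute their own content hashes to the key so that editing an included template invalidates every parent that uses it.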

Semantic parity tests are mandatory: interpreter and compiled backend must produce identical output for same inputs.

Error model should distinguish compile-time and runtime:

  • compile-time: syntax errors, unresolved block structures.
  • runtime: missing variable, type mismatch in loops, include not found.

Security concerns: enforce escaping at output boundary. Consider safe-string marker semantics carefully; unsafe bypasses can create injection vulnerabilities.

Performance measurement should include:

  • cold parse+render latency,
  • warm cached render latency,
  • memory footprint of the cache.

Use deterministic fixtures to compare backend correctness and performance.

This architecture mirrors many production template systems and prepares you for rule compilation ideas in Project 7.

How this fits into the project sequence

  • Core runtime engine of this project.
  • Compiled backend ideas align with optimization in Project 7.

Definitions & key terms

  • Interpreter: executes AST directly.
  • Bytecode/IR: lower-level instruction representation.
  • Cache key: deterministic identifier for reusable compiled artifact.
  • Scope chain: layered variable lookup contexts.
  • Semantic parity: compiled and interpreted outputs match exactly.

Mental model diagram

Template AST --> [Interpreter] ---------> output
           \--> [Compiler -> VM] ------> output
                    |
                 cache store

How it works

  1. Parse template to AST.
  2. Choose backend (interpret or compile).
  3. Render with context and escaping policy.
  4. Cache compiled artifact for repeated runs.
  5. Validate parity across backends.

Minimal concrete example

Template:
{% for item in items %}<p>{{ item }}</p>{% endfor %}
Data: ["A","B"]
Output: <p>A</p><p>B</p>

Common misconceptions

  • “Compilation always faster.” -> compile overhead hurts one-off renders.
  • “Cache by filename only is enough.” -> ignores content/options changes.
  • “Safe strings remove need for escaping policy.” -> they need strict trust boundaries.

Check-your-understanding questions

  1. What should cache key include to avoid stale outputs?
  2. Why run semantic parity tests?
  3. How should missing variable behave in strict mode?

Check-your-understanding answers

  1. source hash, engine version, options, dependency hashes.
  2. To guarantee backend changes do not alter behavior.
  3. Runtime error with source span and variable path.

Real-world applications

  • server-side rendering engines.
  • static site builds at scale.
  • email personalization systems.

Where you’ll apply it

  • §3.2 requirements 4-6.
  • §5.10 phase 2/3.
  • Also used in Project 7 optimization mindset.

References

  • Jinja template internals docs.
  • Nystrom bytecode chapters (design analogy).

Key insights Compilation and caching are optimization layers that must preserve exact semantic behavior.

Summary Template engines become production-ready when parser correctness, runtime semantics, and caching discipline are treated as one system.

Homework/Exercises to practice the concept

  1. Design cache key schema.
  2. Define strict vs permissive missing-variable policy.
  3. Draft semantic parity test matrix.

Solutions to the homework/exercises

  1. combine content hash + config hash + includes hash.
  2. strict: error; permissive: empty string with warning.
  3. run same fixtures through interpreter and compiled VM outputs.

3. Project Specification

3.1 What You Will Build

A template engine supporting comments, interpolation, conditionals, loops, includes, and optional precompiled template artifacts.

Included:

  • mixed-mode scanner.
  • parser + AST.
  • interpreter backend.
  • optional compiled backend + cache.

Excluded:

  • full inheritance/block override framework (extension path).

3.2 Functional Requirements

  1. tokenize text/code regions via mode switching.
  2. parse templates with nested control structures.
  3. evaluate expressions and filters.
  4. render output with escaping policy.
  5. support include loading.
  6. support compiled template mode with parity guarantees.

3.3 Non-Functional Requirements

  • Performance: warm cached render at least 2x faster than cold parse+render baseline.
  • Reliability: deterministic rendering for same template/data.
  • Usability: parser/runtime errors include source span and context.

3.4 Example Usage / Output

$ template_engine render page.tpl data.json
<html><h1>Hello, Alice</h1><p>3 items</p></html>

3.5 Data Formats / Schemas / Protocols

TemplateNode = Text | OutputExpr | IfBlock | ForBlock | Include
Instruction = EmitText | EvalExpr | JumpIfFalse | LoopIter | IncludeInst

3.6 Edge Cases

  • unclosed block tags.
  • missing include file.
  • undefined variable in strict mode.
  • filter applied to unsupported type.

3.7 Real World Outcome

You can ship a practical template runtime used by CLI tools, web rendering pipelines, or static generation workflows.

3.7.1 How to Run (Copy/Paste)

cd project_based_ideas/COMPILERS_RUNTIMES/DOMAIN_SPECIFIC_LANGUAGES_DSL_PROJECTS
make p06-test
./bin/p06-template render fixtures/p06_page.tpl fixtures/p06_data.json
./bin/p06-template compile fixtures/p06_page.tpl --out build/p06_page.tplc

3.7.2 Golden Path Demo (Deterministic)

p06_page.tpl + p06_data.json always yields the same HTML output and output hash.

3.7.3 If CLI: exact terminal transcript

$ ./bin/p06-template render fixtures/p06_page.tpl fixtures/p06_data.json
[ok] backend=interpreter
[ok] output_hash=8e8e1fc4
exit=0

$ ./bin/p06-template compile fixtures/p06_page.tpl --out build/p06_page.tplc
[ok] instructions=47
[ok] artifact_hash=0f18bc77
exit=0

$ ./bin/p06-template render fixtures/p06_bad_unclosed.tpl fixtures/p06_data.json
[error] ParseError 14:1 expected '{% endif %}' before end-of-file
exit=2

4. Solution Architecture

4.1 High-Level Design

Template source -> Mode lexer -> Parser -> Template AST -> [Interpreter | Compiler+VM] -> Output

4.2 Key Components

Component | Responsibility | Key Decisions
Mode lexer | tokenize text/code/comment regions | delimiter precedence and span fidelity
Parser | AST construction | stack-based block matching
Evaluator | runtime semantics | strict/permissive variable policies
Compiler/VM | performance backend | instruction design and parity tests
Cache | reuse compiled artifacts | hash-based invalidation

4.3 Data Structures (No Full Code)

RenderContext { scopes: stack<map<string, Value>> }
CompiledArtifact { version, template_hash, instruction_list }

4.4 Algorithm Overview

Key Algorithm: Block Stack Parser

  1. parse token stream sequentially.
  2. on opening block, push expected closer.
  3. parse nested children recursively.
  4. on closing block, verify top-of-stack match.
  5. emit AST node.

Complexity Analysis

  • parse: O(n tokens).
  • render interpret: O(nodes + expression cost).
  • compile: O(nodes), VM render O(instructions).

5. Implementation Guide

5.1 Development Environment Setup

mkdir -p bin build fixtures tests

5.2 Project Structure

p06-template-engine/
├── src/
│   ├── lexer_modes.*
│   ├── parser.*
│   ├── ast.*
│   ├── evaluator.*
│   ├── compiler.*
│   ├── vm.*
│   └── cache.*
├── fixtures/
└── tests/

5.3 The Core Question You’re Answering

“How do I parse mixed text/code templates and execute them efficiently without sacrificing correctness?”

5.4 Concepts You Must Understand First

  1. lexer modes and delimiter-driven transitions.
  2. block-structured parser design.
  3. runtime scope chains.
  4. compiled backend parity testing.

5.5 Questions to Guide Your Design

  1. What mode table handles all delimiters safely?
  2. Which runtime errors should be strict vs permissive?
  3. How will include files affect source span reporting?
  4. What parity tests prove compiler correctness?

5.6 Thinking Exercise

Tokenize and parse this line manually:

Hello {{ user.name }}{% if user.admin %}*{% endif %}

Then list expected AST nodes in order.

5.7 The Interview Questions They’ll Ask

  1. Why are template engines lexer-mode-heavy systems?
  2. Interpreter vs compiled backend tradeoffs?
  3. How to design cache invalidation for templates?
  4. How to prevent template injection issues?
  5. How to test block matching thoroughly?

5.8 Hints in Layers

Hint 1: build text + interpolation first.

Hint 2: add if block parsing before loops/includes.

Hint 3: keep expression parser reusable from P03.

Hint 4: treat include loading and parsing as separate stage.

5.9 Books That Will Help

Topic | Book | Chapter
Parsing pipeline | Crafting Interpreters | scanner/parser chapters
Expression grammar reuse | Language Implementation Patterns | expression sections
DSL product mindset | Domain Specific Languages | implementation chapters

5.10 Implementation Phases

Phase 1: Foundation (4-6 hours)

  • mode lexer and minimal parser for text + output expressions.

Checkpoint: render simple interpolation template.

Phase 2: Core Functionality (10-16 hours)

  • control flow blocks and includes.
  • evaluator scope semantics.

Checkpoint: nested control fixtures pass.

Phase 3: Polish & Edge Cases (8-12 hours)

  • compiler/VM backend and cache.
  • parity and performance benchmarks.

Checkpoint: compiled and interpreted outputs match across suite.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Backend strategy | interpreter only / dual backend | dual backend | teaches optimization path
Escape policy | manual / auto-escape default | auto-escape default | safer baseline
Include scope | full inherit / isolated / explicit | explicit pass-through defaults | predictable templates

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Lexer tests | mode transitions | delimiter edges, comments
Parser tests | AST shape | nested if/for/includes
Runtime tests | output semantics | escaping, missing vars, loops
Parity tests | backend equivalence | interpret vs compile output

6.2 Critical Test Cases

  1. nested if inside for output correctness.
  2. include file missing error with file span.
  3. strict mode missing variable runtime error.
  4. parity test for 20 fixtures across two backends.

6.3 Test Data

fixtures/p06_page.tpl
fixtures/p06_data.json
fixtures/p06_bad_unclosed.tpl
fixtures/p06_include_missing.tpl

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
incorrect mode switching | garbled tokens | explicit transition table tests
weak block stack checks | vague parser errors | include expected closer in diagnostics
backend divergence | inconsistent outputs | parity snapshot tests

7.2 Debugging Strategies

  • token dump by mode with line/column spans.
  • AST pretty-printer + VM instruction dump.

7.3 Performance Traps

Parsing includes repeatedly during hot render path can dominate; cache parsed/compiled artifacts keyed by content hash.


8. Extensions & Challenges

8.1 Beginner Extensions

  • whitespace trim markers around delimiters.
  • built-in uppercase/lowercase filters.

8.2 Intermediate Extensions

  • template inheritance blocks.
  • streaming renderer for large outputs.

8.3 Advanced Extensions

  • JIT-like hot template optimization.
  • sandboxed function call policy for secure multi-tenant usage.

9. Real-World Connections

9.1 Industry Applications

  • web framework view engines.
  • static site generation pipelines.
  • Jinja2: https://jinja.palletsprojects.com/
  • Handlebars: https://handlebarsjs.com/

9.2 Interview Relevance

  • parser architecture for mixed languages.
  • execution backend tradeoffs and caching.

10. Resources

10.1 Essential Reading

  • Jinja template docs and design notes.
  • Nystrom parser/compiler chapters.

10.2 Video Resources

  • template engine internals talks.
  • compiler vs interpreter architecture sessions.

10.3 Tools & Documentation

  • profiler tools for render hotspots.
  • snapshot diff tooling for output parity.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain mode transitions without notes.
  • I can describe parser stack behavior for nested blocks.
  • I can justify my escape and scope policies.

11.2 Implementation

  • parser and evaluator support required constructs.
  • compile backend parity tests pass.
  • deterministic output hash checks pass.

11.3 Growth

  • I documented one scaling bottleneck and mitigation.
  • I can compare this engine to Jinja/Handlebars.
  • I can explain design tradeoffs in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • mode lexer + parser + interpreter for text/interpolation/if.

Full Completion:

  • includes, loops, strict diagnostics, deterministic fixtures, optional compile backend.

Excellence (Going Above & Beyond):

  • performant caching, inheritance support, and robust parity benchmarks.

13. Additional Content Rules (Applied)

13.1 Determinism

Freeze fixtures and assert output/instruction hashes.

13.2 Outcome Completeness

Provide successful render, compile, and failure demos with explicit exit codes.

13.3 Cross-Linking

Builds on Project 2 and Project 3, and prepares scaling ideas in Project 7.

13.4 No Placeholder Text

All sections are concrete and actionable.