Project 6: Template Engine with Custom Syntax

Build a mixed-mode template DSL with text/code parsing, control flow, includes, and optional compiled execution.

Quick Reference

Attribute | Value
Difficulty | Level 4: Expert
Time Estimate | 2-3 weeks
Main Programming Language | Rust
Alternative Programming Languages | Go, C, Zig
Coolness Level | Level 4: Hardcore Tech Flex
Business Potential | 4. The “Open Core” Infrastructure
Prerequisites | Projects 2-3, strong parsing foundations, memory/data modeling basics
Key Topics | Lexer modes, template AST, control flow execution, compilation and caching

1. Learning Objectives

By completing this project, you will:

  1. Implement lexer mode switching between literal text and code islands.
  2. Parse template tags and expressions into a typed AST.
  3. Execute templates with deterministic scoping and escaping.
  4. Add optional compilation backend for repeated rendering speed.
  5. Design robust error surfaces for template authors.

2. All Theory Needed (Per-Concept Breakdown)

Mixed-Mode Lexing and Structured Template Parsing

Fundamentals Template languages are dual-language systems: plain text plus embedded control/code constructs. Standard single-mode lexers struggle because tokenization rules differ by context. Mixed-mode lexing solves this by switching scanner behavior when delimiters appear ({{, {%, {#). Correct mode transitions are the foundation for reliable parsing. If mode boundaries drift, parser errors become noisy and output corruption follows. This project teaches stateful lexing and parser architecture for text/code hybrids.

Deep Dive into the concept Define explicit lexer modes: TEXT, EXPR, BLOCK, COMMENT. In TEXT, consume bytes until next opening delimiter. Emit TEXT_CHUNK token for bulk literal segments. On delimiter detection, emit OPEN_EXPR/OPEN_BLOCK/OPEN_COMMENT and switch mode. In non-text modes, tokenize identifiers, literals, operators, and delimiters until closing marker then return to TEXT.

Mode transitions need precedence handling for overlapping delimiters: {{ must match before a bare { when both patterns exist, so implement a longest-match strategy for delimiters.
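A minimal sketch of delimiter-driven mode switching, assuming Jinja-style delimiters; the `Mode` enum and `scan_text` names are illustrative, not a prescribed API:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode { Text, Expr, Block, Comment }

// Scan literal text until the next opening delimiter. Returns the literal
// chunk plus, if a delimiter was found, the mode to enter and the byte
// offset just past the delimiter. All three openers are two bytes, so
// checking them before any bare '{' fallback gives longest-match behavior.
fn scan_text(src: &str) -> (&str, Option<(Mode, usize)>) {
    let delims = [("{{", Mode::Expr), ("{%", Mode::Block), ("{#", Mode::Comment)];
    let mut earliest: Option<(usize, Mode)> = None;
    for (pat, mode) in delims {
        if let Some(i) = src.find(pat) {
            if earliest.map_or(true, |(j, _)| i < j) {
                earliest = Some((i, mode));
            }
        }
    }
    match earliest {
        Some((i, mode)) => (&src[..i], Some((mode, i + 2))),
        None => (src, None),
    }
}
```

The TEXT-mode scanner only needs to find the earliest opener; tokenization inside EXPR/BLOCK/COMMENT is a separate loop that runs until the matching closer.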

Token spans are especially important in templates because parse errors often refer to two coordinate spaces: template file line/column and potentially included file origins. Include file id/source id in token metadata from the start.

Parsing template AST usually combines statement and expression grammars:

  • statement nodes: text, output expression, if/for/include, block definitions.
  • expression nodes: literals, field access, filters, function-like calls.
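The two node families above could be modeled as one unified Rust AST; the variant names here are illustrative assumptions, not a required design:

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Str(String),
    Field(Vec<String>),        // user.name -> Field(["user", "name"])
    Filter(Box<Expr>, String), // value | upper -> Filter(value, "upper")
}

#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Output(Expr), // {{ expr }}, auto-escaped by default
    If { cond: Expr, then_body: Vec<Node>, else_body: Vec<Node> },
    For { var: String, iter: Expr, body: Vec<Node> },
    Include { path: String },
}
```

Keeping statements (`Node`) and expressions (`Expr`) as separate enums lets the expression parser be reused independently of the template grammar.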

Control-flow parsing requires matching open/close tags (if/endif, for/endfor) and optional branches (elif, else). Use stack-based block tracking to catch unbalanced tags with specific diagnostics.
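A sketch of stack-based block matching over already-lexed tag names; the tag strings and error messages are assumptions for illustration:

```rust
// Verify block tags are balanced; on mismatch, report which tag was
// expected so the diagnostic names the unclosed construct.
fn check_balance(tags: &[&str]) -> Result<(), String> {
    let mut stack: Vec<&str> = Vec::new();
    for &tag in tags {
        match tag {
            "if" | "for" => stack.push(tag),
            "endif" | "endfor" => {
                let expected = &tag[3..]; // "endif" -> "if"
                match stack.pop() {
                    Some(open) if open == expected => {}
                    Some(open) => return Err(format!("expected end{open}, found {tag}")),
                    None => return Err(format!("unexpected {tag}")),
                }
            }
            "elif" | "else" => {
                if stack.last() != Some(&"if") {
                    return Err(format!("{tag} outside if"));
                }
            }
            _ => {} // other tags ignored in this sketch
        }
    }
    if let Some(open) = stack.pop() {
        return Err(format!("unclosed {open}"));
    }
    Ok(())
}
```

A real parser would push spans along with tag names so the "unclosed if" error can point at the opening tag.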

Comments ({# ... #}) should be lexed and discarded or preserved for tooling metadata depending on requirements.

One architectural choice: parse to a unified AST first, then either interpret or compile. This allows dual backends without duplicate parsing logic.

Escaping policy must be explicit. Common default: auto-escape output expressions unless marked safe. If auto-escape is disabled, document security implications clearly.
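A minimal auto-escape pass at the output boundary might look like the following; escaping only these five characters is a common HTML baseline, not a complete security policy:

```rust
// Escape the characters that let dynamic values break out of HTML
// text or attribute context.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            other => out.push(other),
        }
    }
    out
}
```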

Testing mixed-mode lexers needs carefully crafted fixtures around delimiter edges, nested constructs, and whitespace trimming variants. Add regression fixtures for off-by-one errors around delimiter boundaries.

This mixed-mode expertise extends naturally to syntax-highlighting engines and embedded query DSLs.

How this fits into the project sequence

  • Core front-end of this project.
  • Extends lexing/parsing from Project 2.
  • Feeds compiled execution concepts used in Project 7.

Definitions & key terms

  • Lexer mode: contextual scanner behavior state.
  • Delimiter island: bounded region parsed with alternate grammar.
  • Template AST: structural tree of text and code nodes.
  • Block stack: parser structure tracking nested control tags.
  • Auto-escape: default escaping of dynamic output.

Mental model diagram

Template source
  |
  v
Mode lexer (TEXT/EXPR/BLOCK/COMMENT)
  |
  v
Template parser + expression parser
  |
  v
Template AST

How it works

  1. Scan literal text until delimiter.
  2. Switch mode and tokenize embedded code.
  3. Parse control tags with block stack.
  4. Parse expressions for output/filter nodes.
  5. Build AST and return diagnostics on imbalance.

Minimal concrete example

Input:
Hello {{ user.name }}{% if user.admin %} (admin){% endif %}

AST (simplified):
Text("Hello ")
Output(Field(user.name))
If(Field(user.admin), [Text(" (admin)")], [])

Common misconceptions

  • “Template parsing is easy string replace.” -> fails on nested flow and scoping.
  • “Mode switching can be regex-only.” -> fragile for nested/malformed input.
  • “Only runtime errors matter.” -> parse-time diagnostics dramatically improve usability.

Check-your-understanding questions

  1. Why keep file/source id in token spans?
  2. What data structure helps detect unmatched endif?
  3. Predict failure if lexer forgets to leave EXPR mode.

Check-your-understanding answers

  1. Needed for include-file diagnostics and traceability.
  2. block stack with expected closing tags.
  3. downstream text interpreted as code, causing cascade errors.

Real-world applications

  • email/template rendering engines.
  • static site generators.
  • infrastructure config templaters.

Where you’ll apply it

  • §3.2 requirements 1-3.
  • §4.1 architecture components.
  • §6.2 critical test cases.

References

  • Nystrom, Crafting Interpreters, scanner/parser foundations.
  • Jinja docs template syntax semantics.

Key insights Template engines are parser projects with deliberate mode boundaries and strong block-structure rules.

Summary Mixed-mode lexing plus structured parsing is the essential base for reliable template DSL execution.

Homework/Exercises to practice the concept

  1. Define mode transition table for delimiters.
  2. Draw parser stack events for nested if and for.
  3. Design three malformed-tag fixtures.

Solutions to the homework/exercises

  1. include current mode, trigger token, next mode.
  2. push on open, pop on matching close, error on mismatch.
  3. unclosed if, unexpected endif, include missing string literal.

Template Execution, Compilation, and Caching

Fundamentals Once parsed, templates can be interpreted node-by-node or compiled into an intermediate instruction format for faster repeated runs. Interpretation is easier to build and debug. Compilation improves throughput when the same template is rendered many times with different data. Caching compiled templates avoids repeated parse/compile overhead. This project teaches when and how to apply both strategies while preserving semantic parity.

Deep Dive into the concept Interpreter design: walk AST with runtime context stack. Text nodes append literal output. Output nodes evaluate expression and escape. If/for nodes control traversal of child nodes. Includes load child template AST and render with inherited or scoped context.

Scoping rules are critical. Loop variables should shadow but not overwrite outer values. Include context policy should be explicit: full inherit, explicit pass-through, or isolated scope with imports. Determinism requires choosing one and documenting it.
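The shadow-but-not-overwrite rule can be sketched as a scope chain; `ScopeChain` is an illustrative name, and string values stand in for a full `Value` type:

```rust
use std::collections::HashMap;

// Layered variable lookup: writes go to the innermost scope, reads search
// innermost to outermost, so loop variables shadow outer bindings without
// mutating them.
struct ScopeChain {
    scopes: Vec<HashMap<String, String>>, // innermost scope is last
}

impl ScopeChain {
    fn new() -> Self { Self { scopes: vec![HashMap::new()] } }
    fn push(&mut self) { self.scopes.push(HashMap::new()); }
    fn pop(&mut self) { self.scopes.pop(); }
    fn set(&mut self, name: &str, value: &str) {
        self.scopes.last_mut().unwrap().insert(name.into(), value.into());
    }
    fn get(&self, name: &str) -> Option<&str> {
        self.scopes.iter().rev().find_map(|s| s.get(name).map(String::as_str))
    }
}
```

A for-loop body would `push()` before binding the loop variable and `pop()` on exit, restoring any shadowed outer value automatically.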

Compilation backend can lower AST to instruction list:

  • EMIT_TEXT
  • EVAL_EXPR
  • JUMP_IF_FALSE
  • LOOP_BEGIN/LOOP_END
  • INCLUDE_TEMPLATE

Execution VM then processes instructions with stack/context. Compilation pays off when templates are reused many times.
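A tiny VM sketch over such an instruction list; the instruction names echo the list above, but the operand encoding (variable names standing in for full expressions, absolute jump targets) is an assumption for illustration:

```rust
use std::collections::HashMap;

enum Inst {
    EmitText(String),
    EmitVar(String),            // evaluate expression, escape, emit
    JumpIfFalsy(String, usize), // jump to target if variable is missing/empty
    Jump(usize),
}

// Dispatch loop: a program counter walks the instruction list, appending
// to the output buffer and branching for control flow.
fn run(insts: &[Inst], vars: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut pc = 0;
    while pc < insts.len() {
        match &insts[pc] {
            Inst::EmitText(t) => { out.push_str(t); pc += 1; }
            Inst::EmitVar(name) => {
                out.push_str(vars.get(name).map(String::as_str).unwrap_or(""));
                pc += 1;
            }
            Inst::JumpIfFalsy(name, target) => {
                let truthy = vars.get(name).map_or(false, |v| !v.is_empty());
                pc = if truthy { pc + 1 } else { *target };
            }
            Inst::Jump(target) => pc = *target,
        }
    }
    out
}
```

An if-block compiles to a `JumpIfFalsy` over its body; a for-loop compiles to `LOOP_BEGIN`/`LOOP_END` or an equivalent backward `Jump`.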

Caching strategy key choices:

  • key by template content hash + engine version + options.
  • invalidate on source change.
  • store parsed AST or compiled instructions (or both).
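The key schema above could be sketched as follows; `DefaultHasher` stands in for a real content hash (e.g. SHA-256) and is stable within one build but not across Rust releases, so a production cache would use a cryptographic hash instead:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic cache key over source content + engine version + options.
// Any change to any component yields a different key, forcing recompile.
fn cache_key(source: &str, engine_version: &str, options: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);
    engine_version.hash(&mut h);
    for (k, v) in options {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}
```

Include files would contribute their own content hashes to the key so that editing an included template invalidates every parent that uses it.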

Semantic parity tests are mandatory: interpreter and compiled backend must produce identical output for same inputs.

Error model should distinguish compile-time and runtime:

  • compile-time: syntax errors, unresolved block structures.
  • runtime: missing variable, type mismatch in loops, include not found.

Security concerns: enforce escaping at output boundary. Consider safe-string marker semantics carefully; unsafe bypasses can create injection vulnerabilities.

Performance measurement should include:

  • cold parse+render latency,
  • warm cached render latency,
  • memory footprint of the cache.

Use deterministic fixtures to compare backend correctness and performance.

This architecture mirrors many production template systems and prepares you for rule compilation ideas in Project 7.

How this fits into the project sequence

  • Core runtime engine of this project.
  • Compiled backend ideas align with optimization in Project 7.

Definitions & key terms

  • Interpreter: executes AST directly.
  • Bytecode/IR: lower-level instruction representation.
  • Cache key: deterministic identifier for reusable compiled artifact.
  • Scope chain: layered variable lookup contexts.
  • Semantic parity: compiled and interpreted outputs match exactly.

Mental model diagram

Template AST --> [Interpreter] ---------> output
           \--> [Compiler -> VM] ------> output
                    |
                 cache store

How it works

  1. Parse template to AST.
  2. Choose backend (interpret or compile).
  3. Render with context and escaping policy.
  4. Cache compiled artifact for repeated runs.
  5. Validate parity across backends.

Minimal concrete example

Template:
{% for item in items %}<p>{{ item }}</p>{% endfor %}
Data: ["A","B"]
Output: <p>A</p><p>B</p>

Common misconceptions

  • “Compilation always faster.” -> compile overhead hurts one-off renders.
  • “Cache by filename only is enough.” -> ignores content/options changes.
  • “Safe strings remove need for escaping policy.” -> they need strict trust boundaries.

Check-your-understanding questions

  1. What should cache key include to avoid stale outputs?
  2. Why run semantic parity tests?
  3. How should missing variable behave in strict mode?

Check-your-understanding answers

  1. source hash, engine version, options, dependency hashes.
  2. To guarantee backend changes do not alter behavior.
  3. Runtime error with source span and variable path.

Real-world applications

  • server-side rendering engines.
  • static site builds at scale.
  • email personalization systems.

Where you’ll apply it

  • §3.2 requirements 4-6.
  • §5.10 phase 2/3.
  • Also used in Project 7 optimization mindset.

References

  • Jinja template internals docs.
  • Nystrom bytecode chapters (design analogy).

Key insights Compilation and caching are optimization layers that must preserve exact semantic behavior.

Summary Template engines become production-ready when parser correctness, runtime semantics, and caching discipline are treated as one system.

Homework/Exercises to practice the concept

  1. Design cache key schema.
  2. Define strict vs permissive missing-variable policy.
  3. Draft semantic parity test matrix.

Solutions to the homework/exercises

  1. combine content hash + config hash + includes hash.
  2. strict: error; permissive: empty string with warning.
  3. run same fixtures through interpreter and compiled VM outputs.

3. Project Specification

3.1 What You Will Build

A template engine supporting comments, interpolation, conditionals, loops, includes, and optional precompiled template artifacts.

Included:

  • mixed-mode scanner.
  • parser + AST.
  • interpreter backend.
  • optional compiled backend + cache.

Excluded:

  • full inheritance/block override framework (extension path).

3.2 Functional Requirements

  1. tokenize text/code regions via mode switching.
  2. parse templates with nested control structures.
  3. evaluate expressions and filters.
  4. render output with escaping policy.
  5. support include loading.
  6. support compiled template mode with parity guarantees.

3.3 Non-Functional Requirements

  • Performance: warm cached render at least 2x faster than cold parse+render baseline.
  • Reliability: deterministic rendering for same template/data.
  • Usability: parser/runtime errors include source span and context.

3.4 Example Usage / Output

$ template_engine render page.tpl data.json
<html><h1>Hello, Alice</h1><p>3 items</p></html>

3.5 Data Formats / Schemas / Protocols

TemplateNode = Text | OutputExpr | IfBlock | ForBlock | Include
Instruction = EmitText | EvalExpr | JumpIfFalse | LoopIter | IncludeInst

3.6 Edge Cases

  • unclosed block tags.
  • missing include file.
  • undefined variable in strict mode.
  • filter applied to unsupported type.

3.7 Real World Outcome

You can ship a practical template runtime used by CLI tools, web rendering pipelines, or static generation workflows.

3.7.1 How to Run (Copy/Paste)

cd project_based_ideas/COMPILERS_RUNTIMES/DOMAIN_SPECIFIC_LANGUAGES_DSL_PROJECTS
make p06-test
./bin/p06-template render fixtures/p06_page.tpl fixtures/p06_data.json
./bin/p06-template compile fixtures/p06_page.tpl --out build/p06_page.tplc

3.7.2 Golden Path Demo (Deterministic)

p06_page.tpl + p06_data.json always yields the same HTML output and output hash.

3.7.3 If CLI: exact terminal transcript

$ ./bin/p06-template render fixtures/p06_page.tpl fixtures/p06_data.json
[ok] backend=interpreter
[ok] output_hash=8e8e1fc4
exit=0

$ ./bin/p06-template compile fixtures/p06_page.tpl --out build/p06_page.tplc
[ok] instructions=47
[ok] artifact_hash=0f18bc77
exit=0

$ ./bin/p06-template render fixtures/p06_bad_unclosed.tpl fixtures/p06_data.json
[error] ParseError 14:1 expected '{% endif %}' before end-of-file
exit=2

4. Solution Architecture

4.1 High-Level Design

Template source -> Mode lexer -> Parser -> Template AST -> [Interpreter | Compiler+VM] -> Output

4.2 Key Components

Component | Responsibility | Key Decisions
Mode lexer | tokenize text/code/comment regions | delimiter precedence and span fidelity
Parser | AST construction | stack-based block matching
Evaluator | runtime semantics | strict/permissive variable policies
Compiler/VM | performance backend | instruction design and parity tests
Cache | reuse compiled artifacts | hash-based invalidation

4.3 Data Structures (No Full Code)

RenderContext { scopes: stack<map<string, Value>> }
CompiledArtifact { version, template_hash, instruction_list }

4.4 Algorithm Overview

Key Algorithm: Block Stack Parser

  1. parse token stream sequentially.
  2. on opening block, push expected closer.
  3. parse nested children recursively.
  4. on closing block, verify top-of-stack match.
  5. emit AST node.

Complexity Analysis

  • parse: O(n tokens).
  • render interpret: O(nodes + expression cost).
  • compile: O(nodes), VM render O(instructions).

5. Implementation Guide

5.1 Development Environment Setup

mkdir -p bin build fixtures tests

5.2 Project Structure

p06-template-engine/
├── src/
│   ├── lexer_modes.*
│   ├── parser.*
│   ├── ast.*
│   ├── evaluator.*
│   ├── compiler.*
│   ├── vm.*
│   └── cache.*
├── fixtures/
└── tests/

5.3 The Core Question You’re Answering

“How do I parse mixed text/code templates and execute them efficiently without sacrificing correctness?”

5.4 Concepts You Must Understand First

  1. lexer modes and delimiter-driven transitions.
  2. block-structured parser design.
  3. runtime scope chains.
  4. compiled backend parity testing.

5.5 Questions to Guide Your Design

  1. What mode table handles all delimiters safely?
  2. Which runtime errors should be strict vs permissive?
  3. How will include files affect source span reporting?
  4. What parity tests prove compiler correctness?

5.6 Thinking Exercise

Tokenize and parse this line manually:

Hello {{ user.name }}{% if user.admin %}*{% endif %}

Then list expected AST nodes in order.

5.7 The Interview Questions They’ll Ask

  1. Why are template engines lexer-mode-heavy systems?
  2. Interpreter vs compiled backend tradeoffs?
  3. How to design cache invalidation for templates?
  4. How to prevent template injection issues?
  5. How to test block matching thoroughly?

5.8 Hints in Layers

Hint 1: build text + interpolation first.

Hint 2: add if block parsing before loops/includes.

Hint 3: keep expression parser reusable from P03.

Hint 4: treat include loading and parsing as separate stage.

5.9 Books That Will Help

Topic | Book | Chapter
Parsing pipeline | Crafting Interpreters | scanner/parser chapters
Expression grammar reuse | Language Implementation Patterns | expression sections
DSL product mindset | Domain Specific Languages | implementation chapters

5.10 Implementation Phases

Phase 1: Foundation (4-6 hours)

  • mode lexer and minimal parser for text + output expressions.

Checkpoint: render simple interpolation template.

Phase 2: Core Functionality (10-16 hours)

  • control flow blocks and includes.
  • evaluator scope semantics.

Checkpoint: nested control fixtures pass.

Phase 3: Polish & Edge Cases (8-12 hours)

  • compiler/VM backend and cache.
  • parity and performance benchmarks.

Checkpoint: compiled and interpreted outputs match across suite.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Backend strategy | interpreter only / dual backend | dual backend | teaches optimization path
Escape policy | manual / auto-escape default | auto-escape default | safer baseline
Include scope | full inherit / isolated / explicit | explicit pass-through defaults | predictable templates

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Lexer tests | mode transitions | delimiter edges, comments
Parser tests | AST shape | nested if/for/includes
Runtime tests | output semantics | escaping, missing vars, loops
Parity tests | backend equivalence | interpret vs compile output

6.2 Critical Test Cases

  1. nested if inside for output correctness.
  2. include file missing error with file span.
  3. strict mode missing variable runtime error.
  4. parity test for 20 fixtures across two backends.

6.3 Test Data

fixtures/p06_page.tpl
fixtures/p06_data.json
fixtures/p06_bad_unclosed.tpl
fixtures/p06_include_missing.tpl

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
incorrect mode switching | garbled tokens | explicit transition table tests
weak block stack checks | vague parser errors | include expected closer in diagnostics
backend divergence | inconsistent outputs | parity snapshot tests

7.2 Debugging Strategies

  • token dump by mode with line/column spans.
  • AST pretty-printer + VM instruction dump.

7.3 Performance Traps

Parsing includes repeatedly during hot render path can dominate; cache parsed/compiled artifacts keyed by content hash.


8. Extensions & Challenges

8.1 Beginner Extensions

  • whitespace trim markers around delimiters.
  • built-in uppercase/lowercase filters.

8.2 Intermediate Extensions

  • template inheritance blocks.
  • streaming renderer for large outputs.

8.3 Advanced Extensions

  • JIT-like hot template optimization.
  • sandboxed function call policy for secure multi-tenant usage.

9. Real-World Connections

9.1 Industry Applications

  • web framework view engines.
  • static site generation pipelines.
  • Jinja2: https://jinja.palletsprojects.com/
  • Handlebars: https://handlebarsjs.com/

9.2 Interview Relevance

  • parser architecture for mixed languages.
  • execution backend tradeoffs and caching.

10. Resources

10.1 Essential Reading

  • Jinja template docs and design notes.
  • Nystrom parser/compiler chapters.

10.2 Video Resources

  • template engine internals talks.
  • compiler vs interpreter architecture sessions.

10.3 Tools & Documentation

  • profiler tools for render hotspots.
  • snapshot diff tooling for output parity.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain mode transitions without notes.
  • I can describe parser stack behavior for nested blocks.
  • I can justify my escape and scope policies.

11.2 Implementation

  • parser and evaluator support required constructs.
  • compile backend parity tests pass.
  • deterministic output hash checks pass.

11.3 Growth

  • I documented one scaling bottleneck and mitigation.
  • I can compare this engine to Jinja/Handlebars.
  • I can explain design tradeoffs in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • mode lexer + parser + interpreter for text/interpolation/if.

Full Completion:

  • includes, loops, strict diagnostics, deterministic fixtures, optional compile backend.

Excellence (Going Above & Beyond):

  • performant caching, inheritance support, and robust parity benchmarks.

13. Additional Content Rules (Applied)

13.1 Determinism

Freeze fixtures and assert output/instruction hashes.

13.2 Outcome Completeness

Provide successful render, compile, and failure demos with explicit exit codes.

13.3 Cross-Linking

Builds on Project 2 and Project 3, and prepares scaling ideas in Project 7.

13.4 No Placeholder Text

All sections are concrete and actionable.