Project 5: Macro-Based HTML DSL (Compile-Time DSL)

Build a compile-time HTML DSL with macro expansion, AST transformation, and static validation.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	1-2 weeks
Main Programming Language	Elixir
Alternative Programming Languages	Rust (proc macros), Lisp/Clojure, Nim
Coolness Level	Level 5: Pure Magic
Business Potential	1. The “Resume Gold”
Prerequisites	AST basics, compile-time vs runtime distinction, recursion
Key Topics	Macros, quote/unquote, hygiene, compile-time checks

1. Learning Objectives

By completing this project, you will:

Represent HTML-like DSL syntax as host-language AST.
Expand nested macro forms into deterministic runtime output.
Enforce compile-time validation for tag/attribute misuse.
Preserve hygiene and avoid accidental variable capture.
Understand DSL patterns that move parsing and validation cost out of runtime.

2. All Theory Needed (Per-Concept Breakdown)

Compile-Time Metaprogramming and AST Transformation

Fundamentals

Compile-time DSLs operate before runtime execution. Instead of parsing strings at runtime, macros transform source forms into host-language AST during compilation. This yields stronger validation and lower runtime overhead. The key mental model is “code as data”: syntax is represented as tree structures that can be inspected and rewritten. In Elixir and Lisp-like systems, this is explicit and ergonomic. For DSL builders, compile-time expansion means mistakes can be caught earlier and output can be optimized.
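To make “code as data” concrete, the short sketch below quotes a DSL-like form and inspects the resulting tree. The form `div(class: "note")` is only an illustrative input; nothing is executed as HTML here.

```elixir
# Quote a DSL-like form and look at the data structure the compiler sees.
ast = quote do: div(class: "note")

# Every call is a three-element tuple: {name, metadata, arguments}.
# Metadata varies between Elixir versions, so it is matched with a wildcard.
{:div, _meta, [[class: "note"]]} = ast

IO.inspect(ast, label: "ast")

# Macro.to_string/1 turns the tree back into source text.
IO.puts(Macro.to_string(ast))
```

A macro receives trees of exactly this shape, which is why pattern matching on `{name, meta, args}` is the usual way to recognize DSL forms.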

Deep Dive into the concept

Compilation typically involves parsing source to AST, macro expansion, semantic checks, and bytecode/native output. Your DSL participates in the expansion stage. A macro receives AST nodes, rewrites them, and returns the transformed AST. The transformation must be deterministic and hygienic.

For HTML DSL design, define a canonical AST for tags:

tag name,
attribute list,
child node list,
text/interpolation nodes.

The macro entrypoint (html do ... end) captures nested forms and rewrites them either into calls over this canonical structure or directly into iodata-like output. Expansion can emit string-building AST directly, or build an intermediate representation (IR) that a renderer consumes. The IR route is easier to validate and test.

Compile-time validation can check:

allowed attribute names per tag (optional strict profile),
required attributes (img requires alt in accessibility mode),
invalid nesting policies (if enforced),
malformed macro arguments.
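One of the checks above (a required attribute) can be sketched as a macro that raises during expansion. The module name `StrictImg`, the tuple shape it returns, and the error wording are assumptions made for illustration. The snippets are compiled with `Code.eval_string/1` only so the failure can be observed as a value instead of aborting the script.

```elixir
defmodule StrictImg do
  # Hypothetical accessibility rule: every img must carry an :alt attribute.
  defmacro img(attrs) do
    if not (Keyword.keyword?(attrs) and Keyword.has_key?(attrs, :alt)) do
      raise CompileError,
        file: __CALLER__.file,
        line: __CALLER__.line,
        description: "img requires an alt attribute"
    end

    quote do: {:tag, :img, unquote(attrs), []}
  end
end

# Compile a snippet at runtime and report success or the compile error.
compile = fn source ->
  try do
    {value, _bindings} = Code.eval_string("require StrictImg\n" <> source)
    {:ok, value}
  rescue
    error in CompileError -> {:error, error.description}
  end
end

IO.inspect(compile.(~s|StrictImg.img(src: "a.png", alt: "Logo")|))
IO.inspect(compile.(~s|StrictImg.img(src: "a.png")|))
```

In a normal build the second form would stop compilation of the calling module, which is the early feedback the section describes.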

Because checks run at compile time, feedback arrives before deployment. That is a major product advantage.

Macro expansion must preserve source location where possible for diagnostics. Use metadata from input AST to surface errors at author line numbers.
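As a small illustration of where those locations come from, the sketch below parses a two-line snippet and reads the `:line` entry from each node's metadata. The snippet itself is an assumed example.

```elixir
source = """
div do
  h1 "Hello"
end
"""

# Each AST node carries metadata; the parser records the source line there.
{:div, div_meta, [[do: {:h1, h1_meta, ["Hello"]}]]} = Code.string_to_quoted!(source)

IO.inspect(Keyword.get(div_meta, :line), label: "div line")
IO.inspect(Keyword.get(h1_meta, :line), label: "h1 line")
```

Inside a macro the same metadata is available on every node the macro receives, so a validator can report the line of the offending tag instead of the line of the outer DSL block.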

Deterministic expansion is critical for reproducible builds. Same input module should emit semantically identical output regardless of compile order. Avoid hidden global state in macros.

Security is another benefit: by constraining the DSL output path, you reduce injection risks relative to runtime string concatenation. You can enforce escaping at transformation level.

Limitations: compile-time DSLs are host-language-dependent and can be harder for newcomers. Debugging macro-expanded code requires tooling (AST inspection) and discipline. Therefore include helper debug macros to inspect expanded forms during learning.

This project teaches a distinct DSL strategy compared to parser-based external DSLs. Instead of lexers/parsers, you leverage host compiler phases.

How this fits into the projects

Primary architecture in this project.
Contrasts with runtime parsers in Project 2.
Useful for compile-time validation concepts in Project 7.

Definitions & key terms

Macro: compile-time AST transformation function.
Homoiconicity: code represented in structures manipulable as data.
Expansion: rewriting macro forms to lower-level AST.
Hygiene: preventing accidental variable capture during expansion.
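Iodata: nested lists of binaries and bytes that can be written out without first concatenating them into one string.
IR: intermediate representation that sits between DSL forms and rendered output.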

Mental model diagram

DSL source form --> host parser AST --> macro expansion --> validated IR/AST --> runtime output

How it works

Parse host-language module to AST.
Encounter DSL macro call.
Rewrite nested tag forms into canonical IR.
Validate IR at compile time.
Emit runtime rendering AST.

Minimal concrete example

Source DSL form:
div class: "note" do
  h1 "Hello"
end

Expanded idea:
render_tag("div", [class:"note"], [render_tag("h1", [], ["Hello"])])
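The expansion above can be sketched as a single entrypoint macro that walks the block and emits nested `render_tag` calls. This is a minimal sketch under assumptions: the module name `MiniHtml`, the entrypoint name `markup` (chosen so it does not itself emit an `<html>` element), and the iodata-returning `render_tag/3` are all hypothetical, and only three tag shapes are recognized. Any other form fails at compile time with a FunctionClauseError.

```elixir
defmodule MiniHtml do
  # Hypothetical entrypoint: receives the block as AST, returns AST of render_tag calls.
  defmacro markup(do: block), do: expand(block)

  # Sibling forms arrive wrapped in a :__block__ node.
  defp expand({:__block__, _meta, forms}), do: Enum.map(forms, &expand/1)

  # tag attrs do ... end
  defp expand({tag, _meta, [attrs, [do: body]]}) when is_atom(tag) and is_list(attrs),
    do: call(tag, attrs, body)

  # tag do ... end
  defp expand({tag, _meta, [[do: body]]}) when is_atom(tag), do: call(tag, [], body)

  # tag "text"
  defp expand({tag, _meta, [text]}) when is_atom(tag) and is_binary(text),
    do: call(tag, [], text)

  defp expand(text) when is_binary(text), do: text

  defp call(tag, attrs, body) do
    children = List.wrap(expand(body))

    quote do
      unquote(__MODULE__).render_tag(
        unquote(Atom.to_string(tag)),
        unquote(attrs),
        unquote(children)
      )
    end
  end

  # Runtime side: builds iodata. Nested calls return lists; bare strings are text.
  def render_tag(name, attrs, children) do
    attr_io =
      for {key, value} <- attrs,
          do: [" ", Atom.to_string(key), "=\"", escape(to_string(value)), "\""]

    ["<", name, attr_io, ">", Enum.map(children, &child/1), "</", name, ">"]
  end

  defp child(text) when is_binary(text), do: escape(text)
  defp child(iodata), do: iodata

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

defmodule NoteDemo do
  require MiniHtml

  def note do
    MiniHtml.markup do
      div class: "note" do
        h1 "Hello"
      end
    end
  end
end

IO.puts(IO.iodata_to_binary(NoteDemo.note()))
```

Because the entrypoint rewrites the inner forms before the compiler resolves them, `div` here never reaches `Kernel.div/2`. A design that defines one macro per tag would instead have to exclude that import.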

Common misconceptions

“Macros are just faster functions.” -> functions run at runtime, macros transform syntax pre-runtime.
“Compile-time means no need for runtime tests.” -> runtime data-dependent behavior still needs tests.
“Hygiene is optional.” -> hygiene bugs are hard to diagnose and can be severe.

Check-your-understanding questions

Why can macros provide better error timing than runtime parsers?
What is the risk of non-hygienic macro expansion?
What benefit does an IR offer over direct string generation?

Check-your-understanding answers

They fail during compilation before execution paths are reached.
Variable capture and shadowing causing incorrect behavior.
Easier validation, testing, and alternate renderer backends.

Real-world applications

Phoenix/HEEx-style templating.
compile-time route declarations.
framework-specific declarative UI layers.

Where you’ll apply it

§3.2 compile-time validation requirements.
§4.1 macro pipeline architecture.
§6.2 expansion snapshot tests.

References

Chris McCord, Metaprogramming Elixir.
Elixir School metaprogramming docs.

Key insights

Compile-time DSLs shift correctness and performance benefits left, at the cost of macro complexity.

Summary

Macro DSLs are language engineering inside compiler phases, offering early feedback and efficient output generation.

Homework/Exercises to practice the concept

Sketch canonical tag IR.
Define three compile-time validation rules.
Write one non-hygienic expansion scenario.

Solutions to the homework/exercises

Tag(name, attrs, children) plus Text nodes.
unknown attr, invalid arg shape, forbidden nesting.
expansion introduces local variable named x conflicting with caller x.

Macro Hygiene, Validation, and Host-Language Integration

Fundamentals

Macro hygiene prevents transformed code from accidentally capturing or shadowing variables in caller scope. Integration means DSL blocks can still use host constructs (if, loops, interpolation) without breaking expansion semantics. Validation ensures misuse fails clearly at compile time. Together, these determine whether a macro DSL is practical or fragile.

Deep Dive into the concept

Hygiene issues appear when generated identifiers collide with caller names. Solutions include generating unique names and using the explicit variable-scoping controls the host macro system provides. In Elixir, variables bound inside quote are hygienic by default, and var! is the deliberate opt-out. Quoting and unquoting control which parts of the expansion are literal AST and which are evaluated and injected.
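The difference between a hygienic and a capturing expansion can be seen in a few lines. The module and function names below are hypothetical; the behavior shown relies only on Elixir's default quote hygiene and on `var!`.

```elixir
defmodule HygieneDemo do
  # Hygienic by default: this x lives in the macro's own context.
  defmacro hygienic_set do
    quote do
      x = :from_macro
      x
    end
  end

  # var! deliberately opts out and rebinds the caller's x.
  defmacro leaky_set do
    quote do
      var!(x) = :from_macro
    end
  end
end

defmodule HygieneCaller do
  require HygieneDemo

  def hygienic do
    x = :caller
    HygieneDemo.hygienic_set()
    x
  end

  def leaky do
    x = :caller
    before = x
    HygieneDemo.leaky_set()
    {before, x}
  end
end

IO.inspect(HygieneCaller.hygienic())
IO.inspect(HygieneCaller.leaky())
```

The first call leaves the caller's `x` untouched; the second shows the caller's `x` silently changing after the macro runs, which is the kind of capture a DSL should avoid unless it is documented behavior.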

Host integration is usually the hardest part. Users expect to place conditionals and loops inside DSL blocks naturally. Your macro must recognize those constructs and either preserve them as AST or convert them into IR nodes that evaluate at runtime. Avoid evaluating runtime expressions during compilation unless values are known constants.

Validation strategy should be layered:

Syntax-shape validation in macro entry (argument arity/form).
Structural validation after expansion to IR.
Optional domain validation (allowed tags/attrs) configurable by strictness mode.

Diagnostics should include original source location and suggested fix. Example: unknown tag attribute should list allowed alternatives if strict schema exists.
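A diagnostic with a suggested fix might be assembled as in the sketch below. The schema in `@allowed`, the 0.8 similarity threshold, and the message format are assumptions for illustration; `String.jaro_distance/2` is the standard-library similarity function used to pick the closest allowed name.

```elixir
defmodule AttrHint do
  # Hypothetical strict schema: tag name => allowed attribute names.
  @allowed %{body: [:class, :id, :style], img: [:src, :alt, :class]}

  def check(tag, attr, line) do
    allowed = Map.get(@allowed, tag, [])

    if attr in allowed do
      :ok
    else
      {:error, message(tag, attr, line), hint(attr, allowed)}
    end
  end

  defp message(tag, attr, line),
    do: "line #{line}: unknown attribute '#{attr}' for tag '#{tag}'"

  # Suggest the closest allowed name, but only when it is reasonably similar.
  defp hint(_attr, []), do: nil

  defp hint(attr, allowed) do
    best = Enum.max_by(allowed, &String.jaro_distance(to_string(attr), to_string(&1)))

    if String.jaro_distance(to_string(attr), to_string(best)) > 0.8,
      do: "did you mean '#{best}'?",
      else: nil
  end
end

IO.inspect(AttrHint.check(:body, :clas, 12))
IO.inspect(AttrHint.check(:img, :alt, 3))
```

In a macro, the `line` argument would come from the metadata of the offending AST node, and the error tuple would be turned into a raised compile error.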

Escape handling is essential. Escape in one place, the node rendering stage, rather than ad hoc inside macros, and keep the encoding rules deterministic.

Testing macros requires expansion tests and runtime output tests. Expansion tests assert transformed AST/IR structure for fixed input forms. Runtime tests assert produced HTML string for deterministic input data.
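An expansion test can be as small as expanding one quoted form and comparing its printed shape with a stored string. The sketch below assumes a hypothetical one-tag macro `SnapTag.h1/1`; `Macro.expand_once/2` and `Macro.to_string/1` are the standard-library functions doing the work.

```elixir
defmodule SnapTag do
  # Hypothetical tag macro that expands to a plain IR tuple.
  defmacro h1(text), do: quote(do: {:tag, :h1, [], [unquote(text)]})
end

defmodule SnapTest do
  require SnapTag

  # Expand the quoted call once in this module's environment and print the result.
  def snapshot do
    quote(do: SnapTag.h1("Hello"))
    |> Macro.expand_once(__ENV__)
    |> Macro.to_string()
  end
end

IO.puts(SnapTest.snapshot())
```

Storing the printed form rather than the raw AST keeps snapshots readable in diffs and independent of metadata such as line numbers.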

A practical integration pattern:

macro expands DSL forms to IR-builder calls.
runtime renderer interprets IR with escaped interpolation.

This balances compile-time validation and runtime flexibility.

Be conservative with compile-time execution. Calling arbitrary user functions during expansion can produce non-deterministic builds and side effects. Keep macro phase pure.

As DSL grows, provide lints: warn on deeply nested tags, duplicate attributes, or deprecated constructs. These are excellent compile-time UX improvements.

How this fits into the projects

Core correctness and usability for this project.
hygiene and validation mindset transfers to Project 7 semantic checks.

Definitions & key terms

Hygiene: macro expansion preserving lexical correctness.
Quote/Unquote: mechanisms for constructing/injecting AST fragments.
Strict mode: validation profile with tighter constraints.
Expansion snapshot: stored representation of transformed AST for tests.
Interpolation: injecting runtime values into rendered output safely.

Mental model diagram

Caller scope vars
      |
DSL macro form --> hygienic expansion --> IR --> runtime render with escaped interpolation

How it works

Receive the DSL block as AST (macro arguments arrive already quoted).
Pattern-match DSL nodes.
Build hygienic expanded AST.
Validate nodes/attrs.
Render at runtime with escaping.

Minimal concrete example

Rule:
if user.admin? do
  div class: "admin" do ... end
end

Expansion preserves `if` as runtime host AST and only rewrites tag forms.
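The passthrough described above might look like the following sketch. `FlowHtml`, `markup`, and the `{:tag, name, attrs, children}` tuple shape are hypothetical. The walker rebuilds `if` with rewritten branches and leaves every unrecognized form untouched as host code. It does not descend into `for` or other constructs, and it treats any one-string call as a tag, so a fuller version would check names against an allow-list.

```elixir
defmodule FlowHtml do
  defmacro markup(do: block), do: expand(block)

  defp expand({:__block__, _meta, forms}), do: Enum.map(forms, &expand/1)

  # Host control flow: keep `if` as runtime AST, rewrite only its branches.
  defp expand({:if, meta, [condition, branches]}) do
    {:if, meta, [condition, Enum.map(branches, fn {key, body} -> {key, expand(body)} end)]}
  end

  # tag attrs do ... end
  defp expand({tag, _meta, [attrs, [do: body]]}) when is_atom(tag) and is_list(attrs) do
    quote do: {:tag, unquote(tag), unquote(attrs), unquote(List.wrap(expand(body)))}
  end

  # tag "text"
  defp expand({tag, _meta, [text]}) when is_atom(tag) and is_binary(text) do
    quote do: {:tag, unquote(tag), [], [unquote(text)]}
  end

  # Anything else is host code and is left for runtime.
  defp expand(other), do: other
end

defmodule FlowDemo do
  require FlowHtml

  def panel(user) do
    FlowHtml.markup do
      if user.admin? do
        div class: "admin" do
          h1 "Admin"
        end
      end
    end
  end
end

IO.inspect(FlowDemo.panel(%{admin?: true}))
IO.inspect(FlowDemo.panel(%{admin?: false}))
```

The condition `user.admin?` is never evaluated by the macro. It is copied into the output AST and runs when `panel/1` is called, so a false condition yields `nil`, which a renderer would need to treat as an empty node.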

Common misconceptions

“Macros should evaluate all embedded expressions at compile time.” -> breaks runtime semantics.
“Escaping can be skipped for trusted templates.” -> future input paths often change trust boundaries.
“Hygiene problems are rare.” -> they appear quickly in nested macros.

Check-your-understanding questions

Why preserve host control-flow AST instead of evaluating in macro?
What kind of tests catch hygiene issues?
Where should escaping happen for consistency?

Check-your-understanding answers

Because runtime data may not be known at compile time.
Expansion snapshots with deliberately conflicting variable names.
In renderer layer applied to interpolated dynamic content.

Real-world applications

compile-time UI/template DSLs.
safe email template builders.
routing/query declaration macros.

Where you’ll apply it

§5.10 phase 2 and 3.
§6.2 expansion and runtime tests.

References

Metaprogramming Elixir.
Elixir docs on macros and hygiene.

Key insights

Macro DSL quality depends more on hygiene and validation discipline than on syntax cleverness.

Summary

Reliable macro DSLs preserve host-language semantics, validate early, and avoid scope corruption.

Homework/Exercises to practice the concept

Design one strict-mode validation table for 5 tags.
Create a hygiene failure fixture and expected fix.
Define expansion snapshot format.

Solutions to the homework/exercises

include tag -> allowed attrs mapping.
detect collision and generate unique identifiers.
serialize simplified AST nodes to deterministic text.

3. Project Specification

3.1 What You Will Build

A compile-time HTML DSL package with nested tags, attributes, interpolation, and host control-flow support.

Included:

macro entrypoint for DSL blocks.
compile-time validation and diagnostics.
runtime rendering from expanded IR.

Excluded:

full HTML5 validation matrix.
CSS/JS minification pipeline.

3.2 Functional Requirements

Support nested tag composition in DSL syntax.
Support attribute keyword lists (class: "note") and text children.
Integrate host if/for constructs inside DSL blocks.
Compile-time detect malformed DSL forms.
Deterministically render output for fixed input.

3.3 Non-Functional Requirements

Performance: no parser cost at runtime; render overhead near plain template function.
Reliability: expansion and output deterministic for fixtures.
Usability: compile-time errors include source locations and hints.

3.4 Example Usage / Output

Input DSL:
html do
  body class: "dark" do
    h1 "Welcome"
  end
end

Output:
<html><body class="dark"><h1>Welcome</h1></body></html>

3.5 Data Formats / Schemas / Protocols

IR Node variants:
- Tag(name, attrs, children)
- Text(content)
- Dynamic(expr)
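One way to model these variants is as three structs with a single renderer that owns all escaping. This is a sketch under assumptions: the module name `IR` and the field names are hypothetical, and the Dynamic node is assumed to hold the already-evaluated runtime value of its expression.

```elixir
defmodule IR do
  defmodule Tag do
    defstruct name: nil, attrs: [], children: []
  end

  defmodule Text do
    defstruct content: ""
  end

  defmodule Dynamic do
    # Assumption: holds the runtime result of the original expression.
    defstruct value: nil
  end

  # The renderer is the only place where escaping happens.
  def render(%Tag{name: name, attrs: attrs, children: children}) do
    attr_io =
      for {key, value} <- attrs,
          do: [" ", to_string(key), "=\"", escape(to_string(value)), "\""]

    ["<", name, attr_io, ">", Enum.map(children, &render/1), "</", name, ">"]
  end

  def render(%Text{content: content}), do: escape(content)
  def render(%Dynamic{value: value}), do: escape(to_string(value))

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

tree = %IR.Tag{
  name: "h1",
  children: [%IR.Text{content: "Welcome, "}, %IR.Dynamic{value: "<Alice>"}]
}

IO.puts(IO.iodata_to_binary(IR.render(tree)))
```

Keeping Text and Dynamic as separate variants lets a validator reason about static content at compile time while the renderer treats both the same way at runtime.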

3.6 Edge Cases

duplicate attribute keys.
unsupported attribute value types.
invalid nesting in strict mode.
unescaped interpolation risks.
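The first two edge cases can be checked with small helpers over the attribute keyword list. The module name `AttrCheck` and the choice of supported value types (strings, numbers, booleans) are assumptions for this sketch.

```elixir
defmodule AttrCheck do
  # Keys that appear more than once, in first-seen order.
  def duplicate_keys(attrs) do
    keys = Keyword.keys(attrs)
    # Removing one occurrence of each distinct key leaves only the repeats.
    Enum.uniq(keys -- Enum.uniq(keys))
  end

  # Assumed policy: only strings, numbers, and booleans render as attribute values.
  def supported_value?(value),
    do: is_binary(value) or is_number(value) or is_boolean(value)
end

IO.inspect(AttrCheck.duplicate_keys(class: "a", id: "x", class: "b"))
IO.inspect(AttrCheck.supported_value?("dark"))
IO.inspect(AttrCheck.supported_value?([1, 2]))
```

Inside a macro, `duplicate_keys/1` can run on the literal keyword list at compile time, while value-type checks may have to wait until runtime when the value is a dynamic expression.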

3.7 Real World Outcome

Developers author HTML structure in readable DSL form with compile-time feedback and fast runtime rendering.

3.7.1 How to Run (Copy/Paste)

cd project_based_ideas/COMPILERS_RUNTIMES/DOMAIN_SPECIFIC_LANGUAGES_DSL_PROJECTS
make p05-test
./bin/p05-macro-demo --fixture fixtures/p05_golden.dsl

3.7.2 Golden Path Demo (Deterministic)

Fixture expands to stable IR hash and exact HTML output.
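A stable hash of this kind could be computed as sketched below: strip metadata from the AST, print it, and hash the text. The module name `ExpansionHash`, the use of MD5, and the 8-character truncation are assumptions; the document does not specify how the fixture hash is derived.

```elixir
defmodule ExpansionHash do
  # Drop metadata (line numbers and similar) so layout changes cannot alter the hash.
  def normalize(ast), do: Macro.prewalk(ast, &Macro.update_meta(&1, fn _meta -> [] end))

  def hash(ast) do
    ast
    |> normalize()
    |> Macro.to_string()
    |> :erlang.md5()
    |> Base.encode16(case: :lower)
    |> binary_part(0, 8)
  end
end

# Same structure, different blank lines: the raw ASTs differ in line metadata.
a = Code.string_to_quoted!("div do\n  h1 \"Hello\"\nend")
b = Code.string_to_quoted!("div do\n\n\n  h1 \"Hello\"\nend")

IO.inspect(a == b, label: "raw ASTs equal")
IO.inspect(ExpansionHash.hash(a) == ExpansionHash.hash(b), label: "hashes equal")
```

Hashing the normalized form means the golden fixture only changes when the structure of the expansion changes, not when whitespace in the source does.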

3.7.3 If CLI: exact terminal transcript

$ ./bin/p05-macro-demo --fixture fixtures/p05_golden.dsl
[ok] expansion_hash=31f9a844
[ok] html=<html><body class="dark"><h1>Welcome, Alice!</h1></body></html>
exit=0

$ ./bin/p05-macro-demo --fixture fixtures/p05_bad_attr.dsl
[error] CompileError 12:9 unknown attribute 'clas' for tag 'body'
[hint] did you mean 'class'?
exit=2

4. Solution Architecture

4.1 High-Level Design

DSL macro call -> AST capture -> Hygienic expansion -> IR validation -> runtime render

4.2 Key Components

Component	Responsibility	Key Decisions
Macro entry	capture and dispatch DSL forms	pattern matching by node shape
IR builder	canonical representation	tag/text/dynamic node variants
Validator	compile-time checks	strict mode toggle
Renderer	runtime HTML output	escaping + deterministic ordering

4.3 Data Structures (No Full Code)

ValidationError { span, code, message, hint }
ExpandedTemplate { ir_nodes, metadata }

4.4 Algorithm Overview

Key Algorithm: Nested Tag Expansion

Match tag macro form.
Normalize attrs and children.
Recursively expand child forms.
Emit Tag IR node.
Validate IR tree.
Generate runtime rendering AST.

Complexity Analysis

Expansion: O(n) AST nodes.
Rendering: O(n + text_size).

5. Implementation Guide

5.1 Development Environment Setup

mkdir -p src bin fixtures tests

5.2 Project Structure

p05-macro-html-dsl/
├── src/
│   ├── macro_entry.*
│   ├── expander.*
│   ├── ir.*
│   ├── validator.*
│   └── renderer.*
├── fixtures/
└── tests/

5.3 The Core Question You’re Answering

“How can compile-time macros express a rich DSL while keeping runtime behavior predictable and safe?”

5.4 Concepts You Must Understand First

Quote/unquote AST mechanics.
Hygiene and scope isolation.
Compile-time diagnostics design.
Runtime escaping strategy.

5.5 Questions to Guide Your Design

What AST shape should represent tags and children?
Which validations are compile-time vs runtime?
How do host control-flow nodes pass through expansion?
How will you test expansion determinism?

5.6 Thinking Exercise

Take one nested DSL snippet and hand-write expected IR tree including dynamic interpolations.

5.7 The Interview Questions They’ll Ask

Macro vs function differences in practice?
What is hygiene and how do you enforce it?
Why use intermediate IR for compile-time DSLs?
How does compile-time validation improve developer UX?
What tradeoffs make macro DSLs hard to maintain?

5.8 Hints in Layers

Hint 1: implement one tag and one text child first.

Hint 2: build IR printer for debugging expansions.

Hint 3: add strict validation after expansion, not during parsing.

Hint 4: separate escape logic from expansion logic.

5.9 Books That Will Help

Topic	Book	Chapter
Macro fundamentals	Metaprogramming Elixir	Ch. 1-2
DSL design patterns	Domain Specific Languages	internal DSL sections
compiler pipeline intuition	Crafting Interpreters	front-end architecture

5.10 Implementation Phases

Phase 1: Foundation (3-5 hours)

macro entry and basic tag expansion.
runtime renderer for static nodes.

Checkpoint: one-page fixture renders correctly.

Phase 2: Core Functionality (6-10 hours)

host control-flow passthrough.
compile-time validator and error formatting.

Checkpoint: strict-mode fixtures pass/fail correctly.

Phase 3: Polish & Edge Cases (4-6 hours)

expansion snapshots.
escaping and interpolation hardening.

Checkpoint: deterministic expansion hash tests pass.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Output strategy	direct strings / iodata-like IR	IR then render	easier validation + optimization
Validation timing	runtime only / compile-time + runtime	both	early feedback + runtime safety
Strictness	fixed / configurable	configurable strict profiles	supports learning + production posture

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Expansion tests	AST/IR correctness	nested tags, control-flow
Validation tests	compile-time errors	unknown attrs, malformed nodes
Render tests	output correctness	escaped interpolation, deterministic output

6.2 Critical Test Cases

nested tag expansion with attributes.
if and for integration inside DSL block.
compile-time error on unknown attribute.
escaped interpolation prevents raw HTML injection.

6.3 Test Data

fixtures/p05_golden.dsl
fixtures/p05_bad_attr.dsl
fixtures/p05_control_flow.dsl
fixtures/p05_escape.dsl

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
non-hygienic vars	random runtime misbehavior	generated unique names + scoped expansion
mixing expansion and rendering	hard-to-debug macros	strict IR boundary
weak error spans	confusing compile errors	preserve metadata from source AST

7.2 Debugging Strategies

macro expansion inspect mode for fixture snippets.
IR snapshot diffing in tests.

7.3 Performance Traps

Building large concatenated strings eagerly can increase allocations; structured output buffering is preferable.
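In Elixir, structured output buffering usually means iodata: nested lists of strings that are flattened once at the end, or written directly by IO functions. The list contents below are an assumed example.

```elixir
# Build nested lists instead of concatenating strings with <>.
rows = for n <- 1..3, do: ["<li>", Integer.to_string(n), "</li>"]
page = ["<ul>", rows, "</ul>"]

# The size is known without flattening, and one final conversion builds the binary.
IO.inspect(IO.iodata_length(page), label: "bytes")
html = IO.iodata_to_binary(page)
IO.puts(html)
```

Functions such as `IO.puts/1` and `File.write/2` accept iodata directly, so in many cases the final conversion to a single binary can be skipped altogether.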

8. Extensions & Challenges

8.1 Beginner Extensions

add fragment nodes.
add conditional class helper macro.

8.2 Intermediate Extensions

strict accessibility mode (required alt/aria rules).
precompile template cache for hot paths.

8.3 Advanced Extensions

source-map style error mapping from expanded AST.
compile-time lints for deeply nested structures.

9. Real-World Connections

9.1 Industry Applications

compile-time HTML/template generation.
declarative UI DSLs.

9.2 Related Projects

Phoenix LiveView/HEEx: https://hexdocs.pm/phoenix_live_view/
Rust proc-macro ecosystem: https://doc.rust-lang.org/reference/procedural-macros.html

9.3 Interview Relevance

metaprogramming architecture.
compile-time validation strategies.

10. Resources

10.1 Essential Reading

McCord, Metaprogramming Elixir.
Elixir docs on macros and AST.

10.2 Video Resources

macro internals deep dives.
talks on compile-time DSL ergonomics.

10.3 Tools & Documentation

macro expansion inspection tools.
language AST docs.

11. Self-Assessment Checklist

11.1 Understanding

I can explain compile-time vs runtime responsibilities.
I can describe hygiene with an example.
I can justify my validation split.

11.2 Implementation

nested tag expansion works.
compile-time diagnostics include spans/hints.
deterministic expansion tests pass.

11.3 Growth

I can propose one strict-mode extension.
I can discuss macro DSL tradeoffs in interviews.
I can compare this approach to parser-based DSLs.

12. Submission / Completion Criteria

Minimum Viable Completion:

macro expansion for static nested tags.
one compile-time validation error and one successful render path.

Full Completion:

host control-flow support, strict validation, deterministic snapshots.

Excellence (Going Above & Beyond):

accessibility lint mode and advanced source mapping.

13. Additional Content Rules (Applied)

13.1 Determinism

Use fixed fixtures; verify expansion and render hashes.

13.2 Outcome Completeness

Include both compile success and compile failure demos with exit codes.

13.3 Cross-Linking

Concepts complement the parser-based approach in Project 2 and carry over to the semantic checks in Project 7.

13.4 No Placeholder Text

All sections are concrete and implementable.