Project 1: Fluent Query Builder (Internal DSL)
Build an internal DSL that lets teams compose valid SQL-like queries through fluent, chainable calls with structural constraints.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | Weekend (6-10 focused hours) |
| Main Programming Language | Python |
| Alternative Programming Languages | Ruby, Kotlin, TypeScript |
| Coolness Level | Level 2: Practical but Forgettable |
| Business Potential | 2. The “Micro-SaaS / Pro Tool” |
| Prerequisites | Basic classes/objects, SQL clause order, string formatting safety basics |
| Key Topics | Fluent interfaces, builder state machines, DSL constraints, SQL rendering |
1. Learning Objectives
By completing this project, you will:
- Design a fluent internal DSL that reads like domain language and preserves call intent.
- Enforce a valid query construction sequence through explicit state constraints.
- Render deterministic SQL output from accumulated builder state.
- Detect and explain invalid API usage with actionable errors.
- Compare internal DSL tradeoffs against external DSLs before moving into parser-heavy projects.
2. All Theory Needed (Per-Concept Breakdown)
Fluent Interfaces as Language Design
Fundamentals A fluent interface is not “just method chaining.” It is an API design style where each call narrows intent and preserves context, so code reads as a domain sentence. In an internal DSL, the host language parser is reused, so the language author focuses on semantics and constraints rather than grammar parsing. The key property is continuity: each call should produce an object that represents the next legal stage of expression. If everything returns the same unrestricted object, you get convenient chaining but weak language guarantees. If each stage returns a narrower surface, you get stronger guarantees and better discoverability. This project teaches the practical middle ground: enough freedom for real queries and enough structure to prevent nonsensical chains.
Deep Dive into the concept
Treat fluent DSL design as language engineering in miniature. You are defining a syntax (method names and arguments), semantics (how calls combine), and validity rules (what sequences are legal). The host language gives you lexical and parser infrastructure “for free,” but all semantic rigor remains your responsibility. The first design decision is vocabulary: should the DSL expose SQL terms directly (select, from_table, where) or abstract business verbs (find_users, active_only)? Direct vocabulary shortens onboarding for SQL-savvy developers and maps cleanly to rendering. Domain vocabulary can improve clarity for narrow teams but may hide important details or leak business assumptions.
The second decision is staging. A naive builder allows any method in any order and validates only at the end. This is easy to implement but creates delayed errors: users can write long chains only to fail during rendering. Staged builders encode legal transitions in the API shape. For example, a SelectStage can offer from_table, but not limit; a FromStage can offer where, order_by, limit, and to_sql. This produces earlier feedback and stronger IDE guidance. The tradeoff is implementation complexity and potential verbosity in type-heavy languages.
A third decision is multiplicity. Some clauses may repeat (where conditions), others should be singular (from_table, limit). Repeated clauses require ordered accumulation and well-defined join behavior (AND/OR policy). Singular clauses require idempotency rules: second call overwrites, appends, or errors. Explicit policy is crucial for predictability.
Fourth is rendering determinism. DSL output should be stable so tests and diffs remain meaningful. Determinism requires canonical clause order, canonical spacing, and deterministic ordering of generated placeholders. If you let internal maps drive output order, behavior may drift across runtimes or versions.
Fifth is safety boundary. In this project, you are learning language structure, not building a full SQL sanitizer. Still, your design should separate identifiers from values and leave a clear extension point for parameter binding. The DSL should prevent obvious concatenation abuse by making value interpolation explicit (where_eq("age", 18) style) even if the first milestone supports raw expressions.
Finally, fluent APIs require error UX. Error messages are part of the language surface. “Invalid call” is not enough; messages should state expected stage and valid next methods. This mirrors compiler diagnostics in larger languages. As you progress to external DSLs in P02/P03, this mindset transfers directly: language usability is as important as raw parser correctness.
How this fit on projects
- Directly used in this project for stage-by-stage query construction.
- Reused in Project 3 when shaping expression builders.
- Reused in Project 4 when creating business-friendly rule authoring helpers.
Definitions & key terms
- Fluent interface: API style where chained calls form readable domain expressions.
- Internal DSL: language embedded inside a host language syntax.
- Stage: constrained API surface representing one valid point in a build pipeline.
- Terminal operation: call that materializes output (
to_sql) and ends composition. - Deterministic rendering: stable output for identical logical input.
Mental model diagram
Intent sentence
|
v
[SelectStage] --from_table--> [FromStage] --where/order/limit--> [QueryableStage] --to_sql--> SQL
| | |
invalid calls clause accumulation canonical renderer
How it works (step-by-step, invariants, failure modes)
- Start with a root builder in
SelectStage. select(...)records projection list; invariant: non-empty selection.from_table(...)records source; invariant: exactly one source table.- Optional calls accumulate conditions/order/limit; invariant: each clause validated by type/shape.
to_sql()emits canonical clause order.- Failure mode: calling
wherebeforefrom_tableshould fail with stage-aware guidance. - Failure mode: empty select list or invalid limit should fail fast with parameter-specific errors.
Minimal concrete example
Pseudocode chain:
query = Query.start()
.select("id", "name")
.from_table("users")
.where_eq("status", "active")
.order_by("created_at", "DESC")
.limit(20)
.to_sql()
Expected output:
SELECT id, name FROM users WHERE status = ? ORDER BY created_at DESC LIMIT 20
params = ["active"]
Common misconceptions
- “Method chaining alone is a DSL.” -> Chaining without constraints is mostly syntax sugar.
- “Internal DSLs remove language design complexity.” -> They remove parser work, not semantics work.
- “Validation can wait until execution.” -> Late failures degrade developer feedback loops.
Check-your-understanding questions
- Why is returning different stage objects better than always returning
self? - Which invariant should guard
limitvalues? - Explain why deterministic rendering matters for test reliability.
- Predict behavior: what should happen if
from_tableis called twice?
Check-your-understanding answers
- It encodes legal next operations and prevents invalid sequences earlier.
limitmust be a positive integer within explicit project bounds.- It keeps snapshots, golden tests, and debugging stable across runs.
- Prefer explicit error; silent overwrite hides mistakes.
Real-world applications
- Query builders in ORMs and analytics SDKs.
- Search APIs that compile to SQL, Elasticsearch, or document queries.
- Domain-specific workflow builders in internal tooling.
Where you’ll apply it
- §3.2 Functional requirements (stage constraints).
- §4.1 High-level design (component separation).
- §5.10 Phase 2 (core chain semantics).
- Also used in Project 3 and Project 7.
References
- Martin Fowler, Domain Specific Languages, Ch. 35.
- GoF, Design Patterns, Builder pattern.
- Bill Karwin, SQL Antipatterns, query structure sections.
Key insights An internal DSL succeeds when legal expression paths are easier than illegal ones.
Summary Fluent DSLs are language design through API shape. Strong staging, explicit invariants, and deterministic output turn a chainable API into a trustworthy language interface.
Homework/Exercises to practice the concept
- Draw two API surfaces: unconstrained chain vs staged chain.
- Define singular vs repeatable clauses and policies.
- Write three error messages for invalid call order.
Solutions to the homework/exercises
- Staged chain should hide invalid methods per step.
- Singular:
from,limit; repeatable:where,join. - Example: “
where()requiresfrom_table()first. Next valid methods: from_table().”
Builder State Machines and Validity Constraints
Fundamentals A builder is a state machine disguised as an object API. Each call transitions from one valid configuration state to another until a terminal state can be rendered. Thinking in states makes hidden assumptions explicit: what is required, what is optional, what can repeat, and what is forbidden. This prevents accidental coupling between method order and internal mutation quirks. In DSL work, the state machine model also helps define diagnostics and tests: each state has allowed transitions and expected failures. Instead of testing “does SQL look right” only, you can test “does this illegal transition fail with the right message.”
Deep Dive into the concept State-machine modeling gives you three critical benefits: correctness, explainability, and extensibility. Correctness comes from enumerating legal states and transitions before coding. Explainability comes from mapping user actions to state names in diagnostics. Extensibility comes from adding new clauses as controlled transitions instead of ad-hoc conditional logic.
Begin by identifying mandatory milestones: projection and source are mandatory; conditions/order/limit are optional. This yields a minimal finite-state graph:
S0(new query, no projection)S1(projection chosen)S2(source chosen, query core complete)S3(optional modifiers accumulated)S4(rendered/finalized)
You may merge S2 and S3, but conceptually they differ: S2 means query is valid enough to run; S3 means query has extra constraints. Transitions should include precondition checks and side effects. For instance, select(cols) from S0 -> S1 with precondition cols not empty, side effect state.select = cols. from_table(name) from S1 -> S2 with precondition name valid identifier. where(cond) from S2|S3 -> S3 with precondition cond parseable or typed. to_sql() from S2|S3 -> S4 with invariant checks.
Representing this in code can follow either dynamic or static enforcement. Dynamic enforcement keeps one object with phase enum and runtime guards. Static enforcement returns typed stage objects. Dynamic is simpler in Python; static is stronger in Kotlin/TypeScript. You can mix both: expose staged facade externally and keep internal enum for defense-in-depth.
A subtle but important detail is idempotency. If users call the same optional method repeatedly, what happens? Example policies:
order_by: overwrite previous call (last-write-wins) or append (multi-column order).limit: strict single assignment with error on second call.where: append with implicitAND. Policy choice should match mental model and be documented.
Testing should mirror graph coverage. For each transition, write one happy-path case and one failed-precondition case. For each state, test forbidden method calls. This creates a compact but strong suite and prevents regressions when new clauses are added.
Failure modes include hidden implicit transitions, mutable aliasing, and “finalized but still mutable” objects. Hidden transitions happen when helper methods auto-infer missing clauses; they reduce explicitness and surprise users. Mutable aliasing happens when chain calls share internal mutable arrays across cloned builders; one query can leak into another. Finalized mutability causes to_sql output drift after serialization. Protect against this by freezing state post-render or by immutability by default.
This model transfers directly to parser projects. In external DSLs, parser state and AST validation are just richer versions of builder-state validation. If you internalize state machines here, later grammar and semantic phases feel far less abstract.
How this fit on projects
- Used in this project for staged query construction and deterministic terminal rendering.
- Reused in Project 2 for parser phase states.
- Reused in Project 6 for lexer-mode transitions.
Definitions & key terms
- State machine: finite set of states with allowed transitions.
- Precondition: rule that must be true before transition.
- Invariant: property always true in valid states.
- Terminal state: no further mutating transitions allowed.
- Transition coverage: testing approach that exercises state edges.
Mental model diagram
S0(new)
--select--> S1
S1
--from--> S2
S2
--where/order/limit--> S3 (repeat)
S2/S3
--to_sql--> S4(final)
Any illegal edge -> structured error
How it works
- Define state graph and transition table.
- Attach preconditions and side effects per edge.
- Enforce at API boundaries.
- Freeze or copy-on-write after terminal calls.
- Validate with transition-coverage tests.
Minimal concrete example
Transition table excerpt
- from S0: allow select
- from S1: allow from_table
- from S2: allow where, order_by, limit, to_sql
- from S4: allow none
Common misconceptions
- “State machines are overkill for simple APIs.” -> They prevent ambiguous behavior early.
- “Runtime checks are enough without explicit states.” -> Explicit states make diagnostics and tests clearer.
- “Terminal operations can still mutate safely.” -> Post-finalization mutation breaks determinism.
Check-your-understanding questions
- What invariant should always hold before
to_sql? - Why is transition coverage better than only snapshot tests?
- Predict failure: calling
limitinS1.
Check-your-understanding answers
- Projection and source must both exist and be valid.
- It catches illegal path acceptance, not just output formatting.
- Should fail with stage error:
limitrequiresfrom_tablefirst.
Real-world applications
- Payment API request builders.
- Cloud policy authoring SDKs.
- CI pipeline configuration builders.
Where you’ll apply it
- §4.2 Key components (
QueryState,Renderer,Validator). - §5.10 implementation phases and checkpoints.
- Also used in Project 2.
References
- Fowler, Domain Specific Languages.
- Cooper & Torczon, Engineering a Compiler (semantic invariants mindset).
Key insights If you can draw the state graph, you can implement and test the DSL behavior predictably.
Summary Builder APIs become reliable when modeled as explicit finite-state machines with strict transition rules.
Homework/Exercises to practice the concept
- Add
group_bytransition without breaking existing states. - Define policy for repeated
order_bycalls. - Create one invalid transition and expected diagnostic.
Solutions to the homework/exercises
- Add from
S2|S3 -> S3; keep to_sql legality unchanged. - Recommended: append list for multi-column ordering.
- Example:
to_sqlfromS1-> “missing source table.”
3. Project Specification
3.1 What You Will Build
Build a small library that exposes a fluent API for query composition and renders canonical SQL strings plus ordered parameters. Included scope:
select,from_table,where_eq/where_raw,order_by,limit,to_sql.- stage-aware validation and clear errors.
- deterministic rendering and deterministic test fixtures.
Excluded scope:
- full SQL dialect coverage.
- joins, subqueries, nested expressions.
- database driver integration.
3.2 Functional Requirements
- Fluent API: Query methods chain in readable order.
- Stage Enforcement: Invalid method order fails immediately.
- Clause Accumulation: Multiple where clauses combine predictably.
- Renderer: Output SQL in canonical clause order.
- Parameters: Values separated from SQL text for binding safety.
- Diagnostics: Errors include stage, method, and valid alternatives.
3.3 Non-Functional Requirements
- Performance: render 10k simple queries in under 500 ms on reference machine.
- Reliability: deterministic output for same logical input across runs.
- Usability: errors understandable by junior developers in under 30 seconds.
3.4 Example Usage / Output
Input chain:
select("id", "email") -> from_table("users") -> where_eq("active", true) -> limit(5)
Output:
SQL: SELECT id, email FROM users WHERE active = ? LIMIT 5
Params: [true]
3.5 Data Formats / Schemas / Protocols
Internal state schema:
QueryState {
select_fields: [Identifier],
from_table: Identifier,
where_clauses: [Predicate],
order_clause: Optional<OrderSpec>,
limit_value: Optional<Int>,
phase: Enum(S0,S1,S2,S3,S4)
}
3.6 Edge Cases
- empty projection list -> fail.
from_tablecalled twice -> fail.- negative/zero
limit-> fail. - unsupported sort direction -> fail with allowed values.
- raw where expression empty string -> fail.
3.7 Real World Outcome
A team can author query logic in application code using a readable DSL and get deterministic SQL plus parameter arrays for execution.
3.7.1 How to Run (Copy/Paste)
cd project_based_ideas/COMPILERS_RUNTIMES/DOMAIN_SPECIFIC_LANGUAGES_DSL_PROJECTS
make p01-test
make p01-demo
Required env/config: none.
3.7.2 Golden Path Demo (Deterministic)
- Fixed inputs: table
users, fieldsid,name, filterstatus=active, limit3. - Expected SQL and params exactly match fixture text file.
3.7.3 If CLI: exact terminal transcript
$ ./bin/p01-query-dsl-demo --fixture fixtures/p01_golden.json
[info] Building query from fixture: p01_golden.json
[ok] SQL : SELECT id, name FROM users WHERE status = ? ORDER BY created_at DESC LIMIT 3
[ok] ARGS : ["active"]
[ok] Deterministic hash: 1f7f4c0f
exit=0
$ ./bin/p01-query-dsl-demo --invalid-case where-before-from
[error] Invalid DSL sequence: where_eq() is not allowed in phase SELECT_DEFINED.
[hint] Call from_table() first.
exit=2
4. Solution Architecture
4.1 High-Level Design
Caller API
|
v
Fluent Stages ----> Validator ----> Canonical Renderer ----> SQL + params
| |
+----> QueryState ---+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
QueryStage façade |
Expose chainable methods | staged interfaces over one core state |
QueryState |
Hold accumulated clauses | explicit phase enum + immutable snapshots |
Validator |
Check transitions and arguments | fail fast at call-time |
Renderer |
Emit SQL + params | canonical order and stable spacing |
4.4 Data Structures (No Full Code)
record Predicate {
field_name: string
operator: enum(EQ, RAW)
value: any
}
record QueryState {
phase: enum
fields: list<string>
table: optional<string>
predicates: list<Predicate>
order: optional<(field, direction)>
limit: optional<int>
}
4.4 Algorithm Overview
Key Algorithm: Canonical SQL Rendering
- Validate required state invariants.
- Start with
SELECTand join fields in insertion order. - Append
FROMtable. - Append
WHEREpredicates joined byAND. - Append
ORDER BYandLIMITif present. - Collect parameter list in predicate traversal order.
Complexity Analysis
- Time: O(c) where c = number of clauses and predicates.
- Space: O(c) for output buffer and parameter vector.
5. Implementation Guide
5.1 Development Environment Setup
# Use your preferred language tooling; example workflow:
mkdir -p bin fixtures tests
5.2 Project Structure
p01-fluent-dsl/
├── src/
│ ├── stages.*
│ ├── state.*
│ ├── validator.*
│ └── renderer.*
├── fixtures/
│ ├── p01_golden.json
│ └── p01_invalid_where_before_from.json
├── tests/
│ ├── transition_tests.*
│ └── rendering_tests.*
└── README.md
5.3 The Core Question You’re Answering
“How can a fluent API behave like a language and still prevent invalid grammar states?”
5.4 Concepts You Must Understand First
- Fluent interfaces and staged builders.
- SQL clause order as a grammar-like constraint.
- Deterministic output contracts for golden tests.
- Fail-fast API diagnostics.
5.5 Questions to Guide Your Design
- Should stages be represented by types, runtime flags, or both?
- What policy will you use for repeated optional clauses?
- How will you guarantee stable parameter ordering?
- What makes an error message immediately actionable?
5.6 Thinking Exercise
Draw two call traces:
- legal:
select -> from -> where -> to_sql - illegal:
select -> where
For each, annotate state transitions and expected diagnostics.
5.7 The Interview Questions They’ll Ask
- Internal vs external DSL: what do you gain/lose?
- Why do staged builders improve correctness?
- How do you avoid SQL injection in builder renderers?
- How would you add joins without exploding API complexity?
- Why is deterministic rendering important in CI?
5.8 Hints in Layers
Hint 1: Starting Point Model the builder as a finite-state machine before implementing methods.
Hint 2: Next Level Split rendering from validation so tests isolate failures.
Hint 3: Technical Details Represent predicates as structured records, not concatenated strings.
Hint 4: Tools/Debugging
Add a debug_state() endpoint for tests to inspect intermediate state.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Fluent DSL design | Domain Specific Languages (Fowler) | Ch. 35 |
| Builder constraints | Design Patterns (GoF) | Builder |
| Query correctness | SQL Antipatterns (Karwin) | Intro + query basics |
5.10 Implementation Phases
Phase 1: Foundation (2-3 hours)
Goals: state model + minimum happy path.
Tasks:
- Define
QueryStateand phases. - Implement
select,from_table,to_sqlbaseline.
Checkpoint: deterministic SQL for two fixed fixtures.
Phase 2: Core Functionality (3-4 hours)
Goals: full clause support + validation.
Tasks:
- Add
where_eq,order_by,limit. - Add transition and argument validators.
Checkpoint: invalid sequences produce precise errors.
Phase 3: Polish & Edge Cases (1-3 hours)
Goals: diagnostics + reproducibility.
Tasks:
- Add golden fixtures and deterministic hash check.
- Document extension points (joins, group by).
Checkpoint: all tests pass; docs explain behavior decisions.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Stage enforcement | runtime flag / typed stages | both where possible | balances ergonomics and safety |
| where API | raw strings / structured predicates | structured first | safer and easier to validate |
repeated order_by |
overwrite / append | append | supports multi-key sorting naturally |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit tests | clause/state logic | transition legality, argument checks |
| Integration tests | end-to-end chain -> SQL | multi-clause rendering + params |
| Edge tests | boundary and misuse | empty select, duplicate from, bad limit |
6.2 Critical Test Cases
- Happy path with all clauses -> exact SQL/params match fixture.
wherebeforefrom-> stage error with hint.- Determinism -> same chain repeated 1000 times yields same output hash.
6.3 Test Data
fixtures:
- p01_golden.json
- p01_minimal.json
- p01_invalid_where_before_from.json
- p01_invalid_limit_zero.json
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Late validation only | errors appear at render time | validate at each transition |
| String concatenation for values | unsafe SQL and quote bugs | structured predicates + params |
| Hidden mutability | prior queries change unexpectedly | immutable snapshots or deep copy |
7.2 Debugging Strategies
- Transition tracing: log
from_state -> method -> to_state. - Renderer snapshots: capture token list before joining SQL string.
7.3 Performance Traps
Repeated string concatenation in loops can cause avoidable allocations; prefer buffered assembly.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
offsetwith validation. - Add
where_in(field, values).
8.2 Intermediate Extensions
- Add
group_byandhavingwith invariants. - Add parameter placeholder style switching (
?vs$1).
8.3 Advanced Extensions
- Add join support with alias validation.
- Add dialect adapter layer (PostgreSQL/MySQL).
9. Real-World Connections
9.1 Industry Applications
- ORM query builders: expressive APIs with constraint guards.
- Analytics pipelines: user-authored filter/query configs compiled to SQL.
9.2 Related Open Source Projects
- SQLAlchemy Core: https://www.sqlalchemy.org/ - expression-based query construction.
- jOOQ: https://www.jooq.org/ - typed fluent SQL DSL.
9.3 Interview Relevance
- API design under correctness constraints.
- State-machine modeling in non-parser language tools.
- Deterministic rendering/test strategy.
10. Resources
10.1 Essential Reading
- Domain Specific Languages (Martin Fowler), fluent interface chapters.
- Design Patterns (GoF), builder and object construction patterns.
- SQL Antipatterns (Bill Karwin), safe query composition mindset.
10.2 Video Resources
- Fowler talks on DSL design patterns.
- API design talks focused on fluent builders and ergonomics.
10.3 Tools & Documentation
- Python typing docs (if using Python typing for staged interfaces).
- SQL style guides for deterministic query formatting.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why this is an internal DSL, not just a helper API.
- I can describe state transitions and invariants without notes.
- I can justify each error policy decision.
11.2 Implementation
- All functional requirements are implemented.
- Deterministic golden tests pass.
- Invalid transitions return actionable diagnostics.
11.3 Growth
- I documented one API tradeoff I would change in v2.
- I can discuss this project in interview format.
- I can map this project to external DSL work in P02/P03.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Valid fluent chain for select/from/where/limit.
- Deterministic SQL + params output.
- At least 5 invalid-path tests with clear errors.
Full Completion:
- All minimum criteria plus:
- Stage-aware interfaces and comprehensive transition tests.
- Golden-path and failure-path CLI demos with explicit exit codes.
Excellence (Going Above & Beyond):
- Join/group-by extensions with preserved invariants.
- Dialect abstraction and benchmark notes.
13 Additional Content Rules (Applied)
13.1 Determinism
Use fixed fixture inputs and stable rendering order. Include hash assertion in tests.
13.2 Outcome Completeness
Include one golden demo and one failure demo with explicit exit=0 and exit=2.
13.3 Cross-Linking
This project prepares you for Project 2 and Project 3.
13.4 No Placeholder Text
All sections in this file are intentionally concrete and executable as a learning plan.