### Project 7: “The Executable Spec with mdflow” — Literate Programming
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown / Bash |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Literate Programming |
What you’ll build: A Markdown spec whose code blocks are executed and validated, keeping docs in sync with reality.
Why it teaches Executable Specs: Documentation that executes cannot rot.
Success criteria:
- The spec fails when code changes and passes after repair.
#### Real World Outcome
You’ll create a living specification document where every code example is automatically executed and validated. When your implementation changes, the spec either passes (proving docs are accurate) or fails (alerting you to update them).
Example: API Specification (`api-spec.md`):

````markdown
# User Authentication API

## Creating a User

The `/api/users` endpoint accepts POST requests with email and password:

```bash
curl -X POST http://localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"secure123"}'
```

Expected response:

```json
{
  "id": "usr_abc123",
  "email": "test@example.com",
  "created_at": "2025-01-02T10:00:00Z"
}
```
````
When you run `mdflow execute api-spec.md`:

```
$ mdflow execute api-spec.md
Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Block 1: curl POST /api/users
  Status: 201 Created
  Response matched expected JSON schema
✓ Block 2: Expected response validation
  Field 'id' matches pattern: usr_[a-z0-9]+
  Field 'email' equals: test@example.com
  Field 'created_at' is valid ISO 8601
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All blocks passed ✓ (2/2)
Execution time: 1.2s
```
When the API breaks:

```
$ mdflow execute api-spec.md
Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✗ Block 1: curl POST /api/users
  Status: 500 Internal Server Error
  Expected: 201 Created
  Response:
  {
    "error": "Database connection failed"
  }
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FAILED: 1 of 2 blocks failed
Execution time: 0.8s
```
This forces you either to fix the implementation or to update the spec; the documentation can no longer drift silently away from reality.
#### The Core Question You’re Answering
“How do I ensure my documentation stays synchronized with my actual codebase as it evolves?”
Most documentation becomes outdated within weeks of writing. Code examples break, APIs change, but the docs remain frozen in time. This project addresses the fundamental problem: passive documentation rots, executable documentation validates itself.
By embedding executable tests directly in your specification, you create a contract that must be maintained. When the contract breaks, CI fails, forcing alignment.
#### Concepts You Must Understand First
Stop and research these before coding:
- **Literate Programming**
  - What did Donald Knuth mean by “programs as literature”?
  - How does weaving code with narrative improve understanding?
  - Why is the order of presentation different from the order of execution?
  - Book Reference: “Literate Programming” by Donald E. Knuth
- **Test-Driven Documentation**
  - How do executable examples serve as both docs and tests?
  - What makes a good assertion in documentation?
  - When should examples be simplified vs. realistic?
  - Book Reference: “Growing Object-Oriented Software, Guided by Tests”, Ch. 2
- **Markdown Processing**
  - How do you parse and extract fenced code blocks?
  - What metadata can be attached to code blocks (language, annotations)?
  - How do you preserve line numbers for error reporting?
  - Web Reference: CommonMark Specification, Fenced Code Blocks
#### Questions to Guide Your Design
Before implementing, think through these:
- **Execution Model**
  - How do you isolate each code block’s execution environment?
  - Should blocks share state, or run independently?
  - How do you handle blocks that depend on previous outputs?
  - What happens if block 3 fails: do you still run block 4?
- **Assertion Syntax**
  - How do users specify expected outputs (inline, separate blocks)?
  - Do you support regex matching, JSON schema validation, or both?
  - How do you handle non-deterministic outputs (timestamps, IDs)?
  - Should exit codes alone determine success, or stdout comparison?
- **Language Support**
  - How do you execute different languages (bash, python, curl)?
  - Do you need sandboxing (Docker containers, chroot)?
  - How do you manage dependencies (language runtimes, system packages)?
  - Should you support custom interpreters per project?
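One way to approach the interpreter question is a dispatch table from fence language to interpreter command, checked against the installed binaries before any block runs. A minimal Python sketch; the table contents and function name are illustrative, not part of any real mdflow tool:

```python
import shutil

# Hypothetical mapping from fence language to interpreter argv prefix.
# A real tool might let projects extend this via config.
INTERPRETERS = {
    "bash": ["bash"],
    "sh": ["sh"],
    "python": ["python3"],
    "js": ["node"],
}

def resolve_interpreter(lang):
    """Return the argv prefix for a fence language, or None if the
    language is unsupported or its binary is not installed."""
    argv = INTERPRETERS.get(lang)
    if argv is None or shutil.which(argv[0]) is None:
        return None
    return argv

print(resolve_interpreter("bash"))   # ["bash"] on systems where bash is installed
print(resolve_interpreter("cobol"))  # None (not in the table)
```

Resolving interpreters up front lets the runner report all missing dependencies in one pass instead of failing mid-spec.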
#### Thinking Exercise

**Trace: Multi-Step API Workflow**
Given this specification:

````markdown
## User Workflow

Create a user:

```bash
USER_ID=$(curl -s -X POST /api/users -d '{"email":"test@example.com"}' | jq -r .id)
```

Verify creation:

```bash
curl -s -X GET "/api/users/$USER_ID"
```

Expected: `{"id":"$USER_ID","email":"test@example.com"}`
````
*Questions while designing:*
- How do you propagate `$USER_ID` from block 1 to block 2?
- Should the spec run in a single shell session, or fresh shells per block?
- What if `USER_ID` is empty because block 1 failed—should block 2 run?
- How do you validate that the returned ID matches the captured variable?
**Design Decision Matrix:**
| Approach | Pros | Cons |
|----------|------|------|
| Single shell session | State persists naturally | Pollution between tests |
| Environment variables | Explicit data flow | Manual propagation |
| JSON output files | Language-agnostic | Filesystem clutter |
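The environment-variable row can be sketched by having each block source a shared env file on entry and dump its exported variables on exit, so state flows between otherwise independent subprocesses. A Python sketch driving bash; the `shared.env` name and helper are assumptions for illustration:

```python
import os
import subprocess
import tempfile

def run_block(code, env_file):
    """Run one bash block in a fresh subprocess, restoring exported state
    from env_file first and dumping exported variables back afterwards."""
    script = (
        f"[ -f '{env_file}' ] && . '{env_file}'\n"   # restore prior state, if any
        f"{code}\n"
        f"export -p > '{env_file}'\n"                 # persist state for the next block
    )
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)

env_file = os.path.join(tempfile.mkdtemp(), "shared.env")
run_block('USER_ID="usr_abc123"; export USER_ID', env_file)
out = run_block('echo "fetched $USER_ID"', env_file)
print(out.stdout.strip())  # fetched usr_abc123
```

This keeps each block in its own process (limiting pollution) while still making data flow explicit and inspectable on disk.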
---
#### The Interview Questions They'll Ask
1. "How would you design a system to execute code blocks from Markdown while preserving security boundaries?"
2. "Explain the tradeoffs between making documentation executable versus keeping separate test suites."
3. "How do you handle non-deterministic outputs (timestamps, random IDs) in executable documentation?"
4. "What strategies prevent test pollution when documentation blocks depend on shared state?"
5. "How would you integrate this into CI/CD to fail builds when documentation drifts from implementation?"
6. "Describe how you'd support multiple programming languages in a single specification document."
---
#### Hints in Layers
**Hint 1: Start with a Parser**
Use a Markdown parser (like `markdown-it` in Node.js or `mistune` in Python) to extract fenced code blocks. Store metadata (language, line numbers) for each block.
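A minimal hand-rolled extractor is enough to get started before reaching for a full parser. A Python sketch that records each fence's language tag and the 1-based line number of its opening fence (names illustrative; real fences can also carry attributes this regex ignores):

```python
import re

def extract_blocks(markdown_text):
    """Extract fenced code blocks with their language tag and the
    1-based line number where each fence opens (for error reporting)."""
    blocks, lines = [], markdown_text.splitlines()
    i = 0
    while i < len(lines):
        m = re.match(r"^```(\w+)?\s*$", lines[i])
        if m:
            lang, start, body = m.group(1) or "", i + 1, []
            i += 1
            while i < len(lines) and not lines[i].startswith("```"):
                body.append(lines[i])
                i += 1
            blocks.append({"lang": lang, "line": start, "code": "\n".join(body)})
        i += 1
    return blocks

spec = "# Spec\n\n```bash\necho hello\n```\n"
print(extract_blocks(spec))
# [{'lang': 'bash', 'line': 3, 'code': 'echo hello'}]
```

Storing `line` at extraction time is what later makes `Error in api-spec.md:15` style messages possible.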
**Hint 2: Execution Strategy**
For each code block:
- Write code to a temporary script file
- Execute using the appropriate interpreter (`bash`, `python3`, `node`)
- Capture stdout, stderr, and exit code
- Compare against expected outputs (if specified)
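The steps above can be sketched as a single function: write the block to a temp script, run it under the chosen interpreter, and capture everything the assertions will need. A minimal Python version, with no sandboxing and an assumed 30-second timeout:

```python
import os
import subprocess
import tempfile

def execute_block(code, interpreter="bash"):
    """Write a code block to a temporary script and execute it,
    capturing stdout, stderr, and the exit code."""
    fd, path = tempfile.mkstemp(suffix=".script")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(code)
        result = subprocess.run(
            [interpreter, path], capture_output=True, text=True, timeout=30
        )
        return {"exit": result.returncode,
                "stdout": result.stdout, "stderr": result.stderr}
    finally:
        os.unlink(path)  # always clean up the temp script

print(execute_block("echo ok"))
# {'exit': 0, 'stdout': 'ok\n', 'stderr': ''}
```

Running via a script file rather than `bash -c` keeps multi-line blocks and quoting intact, and makes the interpreter pluggable per fence language.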
**Hint 3: State Management**
Create a temporary directory as a "sandbox workspace":
```
/tmp/mdflow-session-abc123/
├── block-1.sh
├── block-1.stdout
├── block-2.sh
└── shared.env   # Environment variables for state
```
**Hint 4: Assertion Annotations**
Support special comments for assertions:
````markdown
```bash
curl /api/users/123
# expect-status: 200
# expect-json: {"id":"123"}
```
````
Parse these comments to build validation rules.
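A sketch of such a comment parser in Python. The `expect-*` annotation shapes follow the example above; the function name and rule format are illustrative:

```python
import json
import re

def parse_assertions(code):
    """Split `# expect-*:` comment lines out of a block, returning the
    runnable code and a dict of validation rules keyed by assertion kind."""
    rules, kept = {}, []
    for line in code.splitlines():
        m = re.match(r"^\s*#\s*expect-(\w+):\s*(.+)$", line)
        if m:
            rules[m.group(1)] = m.group(2)
        else:
            kept.append(line)
    return "\n".join(kept), rules

code, rules = parse_assertions(
    'curl /api/users/123\n# expect-status: 200\n# expect-json: {"id":"123"}'
)
print(rules)                      # {'status': '200', 'json': '{"id":"123"}'}
print(json.loads(rules["json"]))  # {'id': '123'}
```

Stripping the annotations before execution keeps them invisible to the interpreter, while leaving plain comments untouched.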
#### Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Literate Programming Philosophy | “Literate Programming” by Donald E. Knuth | Introduction & Ch. 1 |
| Test-Driven Development | “Test Driven Development: By Example” by Kent Beck | Part I |
| Markdown Parsing | “Crafting Interpreters” by Robert Nystrom | Ch. 4 (Scanning) |
| Documentation as Code | “Docs for Developers” by Jared Bhatti et al. | Ch. 6 |
#### Common Pitfalls & Debugging
**Problem 1: “Code blocks fail due to missing dependencies”**
- Why: The spec assumes tools are installed (curl, jq, etc.)
- Fix: Add a validation phase that checks for required binaries before execution
- Quick test:
  ```bash
  command -v curl || echo "Missing curl"
  ```
**Problem 2: “Non-deterministic outputs cause false failures”**
- Why: Timestamps, UUIDs, or random data change on every run
- Fix: Support regex patterns or placeholder matching (e.g. `expect-pattern: usr_[a-z0-9]+`)
- Quick test: Replace exact matches with pattern assertions
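One concrete placeholder approach: substitute regex patterns for tokens in the expected output before comparing against the actual output. A Python sketch; the placeholder tokens and their patterns are illustrative:

```python
import re

# Illustrative placeholder patterns for fields that change on every run.
PLACEHOLDERS = {
    "{{USER_ID}}": r"usr_[a-z0-9]+",
    "{{ISO_TS}}": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",
}

def matches(expected, actual):
    """Escape the expected string, swap placeholders for their regex
    patterns, and test the actual output against the result."""
    pattern = re.escape(expected)
    for token, rx in PLACEHOLDERS.items():
        pattern = pattern.replace(re.escape(token), rx)
    return re.fullmatch(pattern, actual) is not None

print(matches('{"id":"{{USER_ID}}"}', '{"id":"usr_abc123"}'))  # True
print(matches('{"id":"{{USER_ID}}"}', '{"id":"12345"}'))       # False
```

Escaping first means the rest of the expected text is matched literally; only the placeholder positions are flexible.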
**Problem 3: “State leaks between blocks”**
- Why: Environment variables, temp files, or database records persist
- Fix: Run each block in a fresh subprocess with an isolated environment
- Quick test: Add `set -u` to bash blocks to catch undefined variables
**Problem 4: “Error messages don’t point to the right line in the spec”**
- Why: You’re losing line-number context during extraction
- Fix: Store original line numbers when parsing and include them in error reports
- Quick test:
  ```
  Error in api-spec.md:15 (block 2)
  ```
#### Definition of Done

- Parser extracts all fenced code blocks with metadata (language, line numbers)
- Executor runs bash and at least one other language (Python or curl)
- Assertions validate exit codes and stdout/stderr content
- Failed blocks produce clear error messages with file/line references
- Spec execution stops on the first failure (or continues with a `--keep-going` flag)
- Environment isolation prevents state leaks between blocks
- README includes an example spec demonstrating success and failure cases
- A CI integration example shows how to fail builds on spec failures