Project 32: “The Reverse Documenter” — Documentation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Documentation |
What you’ll build: Generate docs and tests from an undocumented function.
Why it teaches Verification: If the AI cannot explain it, the code is too complex.
Success criteria:
- A new unit test covers the documented edge cases.
Real World Outcome
You’ll have a Kiro workflow that analyzes undocumented code and generates:
- JSDoc/docstring comments explaining purpose, parameters, return values, and edge cases
- Unit tests covering normal cases, edge cases, and error conditions
- README sections documenting public APIs
- Type definitions (TypeScript interfaces, Python type hints)
Example Session:
$ kiro "Document the calculateDiscount function in src/pricing.ts and generate tests"
🔍 Analyzing undocumented function...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Reading: src/pricing.ts
Function: calculateDiscount(price: number, userTier: string, promoCode?: string)
Static Analysis Results:
├─ Parameters: 3 (2 required, 1 optional)
├─ Return type: number
├─ Branches: 8 (if/else conditions)
├─ External calls: 2 (fetchPromoDetails, getUserHistory)
└─ Edge cases detected: 5 (null checks, tier validation, promo expiration)
📝 Generating Documentation...
/**
* Calculates the final discount percentage for a purchase.
*
* Combines base tier discount with promotional codes. Tier discounts
* stack additively with promo codes. Premium tier users receive
* an additional 5% on top of promo discounts.
*
* @param price - The original price before any discounts (must be > 0)
* @param userTier - User membership tier: 'free', 'pro', or 'premium'
* @param promoCode - Optional promotional code (e.g., 'SAVE20')
* @returns Final discount percentage (0-100)
*
* @throws {Error} If price <= 0 or userTier is invalid
*
* @example
* calculateDiscount(100, 'pro', 'SAVE20') // Returns 25 (5% tier + 20% promo)
* calculateDiscount(100, 'premium', 'SAVE20') // Returns 30 (5% tier + 20% promo + 5% premium bonus)
*/
✅ Documentation added to src/pricing.ts
🧪 Generating Test Cases...
describe('calculateDiscount', () => {
it('should apply tier discount only when no promo code', () => {
expect(calculateDiscount(100, 'pro')).toBe(5);
});
it('should stack tier and promo discounts', () => {
expect(calculateDiscount(100, 'pro', 'SAVE20')).toBe(25);
});
it('should apply premium bonus to promo codes', () => {
expect(calculateDiscount(100, 'premium', 'SAVE20')).toBe(30);
});
it('should throw error for negative price', () => {
expect(() => calculateDiscount(-10, 'pro')).toThrow('Price must be positive');
});
it('should throw error for invalid tier', () => {
expect(() => calculateDiscount(100, 'gold')).toThrow('Invalid user tier');
});
it('should return 0 for expired promo codes', () => {
expect(calculateDiscount(100, 'pro', 'EXPIRED2023')).toBe(5); // Only tier discount
});
});
✅ Test file created: src/__tests__/pricing.test.ts
✅ 6 test cases covering all branches
📊 Coverage Analysis:
├─ Line coverage: 100%
├─ Branch coverage: 100%
└─ Edge cases covered: 5/5
💡 Complexity Signal: calculateDiscount has a cyclomatic complexity of 8.
Consider refactoring into smaller functions if it grows further.
What You See:
- Auto-generated JSDoc comments explaining every parameter, return value, and edge case
- Complete test suite covering happy paths, edge cases, and error conditions
- Coverage report showing 100% branch coverage
- Complexity warnings when functions are too dense
The Core Question You’re Answering
“How can AI reverse-engineer intent from undocumented code, and can it generate tests that prove its understanding is correct?”
This project forces you to confront the verification problem: if Kiro generates documentation that sounds plausible but is wrong, the tests will fail. This feedback loop ensures the AI actually understands the code, not just pattern-matches documentation style.
Concepts You Must Understand First
Stop and research these before coding:
- Static Code Analysis (AST Parsing)
- What is an Abstract Syntax Tree and how do you traverse it?
- How do you extract function signatures, parameter types, and control flow?
- How do you detect edge cases (null checks, boundary conditions)?
- Book Reference: “Compilers: Principles and Practice” by Parag H. Dave - Ch. 2-3
- Test Generation Strategies
- What is the difference between property-based testing and example-based testing? (A short sketch follows this list.)
- How do you identify equivalence classes for input partitioning?
- What is branch coverage vs line coverage vs path coverage?
- Book Reference: “The Art of Software Testing” by Glenford J. Myers - Ch. 4-5
- Documentation Standards
- What are JSDoc, docstring, and XML documentation comment conventions?
- How do you write documentation that survives refactoring?
- What level of detail is appropriate for public vs private APIs?
- Reference: JSDoc specification, PEP 257 (Python Docstring Conventions)
- Cyclomatic Complexity
- How do you measure code complexity (McCabe metric)?
- Why does high complexity correlate with bugs?
- When should you refactor based on complexity scores?
- Book Reference: “Code Complete” by Steve McConnell - Ch. 19
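To make the property-based vs example-based distinction concrete, here is a minimal Jest sketch. It assumes `fast-check` is installed and that the `calculateDiscount` function from the example session is importable; the import path is illustrative:

```typescript
import * as fc from 'fast-check';
import { calculateDiscount } from '../src/pricing'; // hypothetical path

// Example-based: one concrete input, one concrete expectation.
it('applies the pro tier discount to a known price', () => {
  expect(calculateDiscount(100, 'pro')).toBe(5);
});

// Property-based: an invariant that must hold across many generated inputs.
it('always returns a percentage between 0 and 100 for positive prices', () => {
  fc.assert(
    fc.property(fc.integer({ min: 1, max: 1_000_000 }), (price) => {
      const discount = calculateDiscount(price, 'pro');
      return discount >= 0 && discount <= 100;
    })
  );
});
```

Example-based tests pin down specific documented behaviors; property-based tests probe the equivalence classes you identify during input partitioning.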
Questions to Guide Your Design
Before implementing, think through these:
- Code Understanding
- How will you parse the target function (AST parser vs regex vs LLM-based)?
- How will you identify edge cases (static analysis vs symbolic execution)?
- How will you handle external dependencies (mocking vs integration tests)?
- How will you detect the function’s actual behavior vs its intended behavior?
- Documentation Quality
- How will you validate that generated docs match actual behavior?
- How will you avoid hallucinating functionality that doesn’t exist?
- How will you decide which details to include vs omit?
- How will you maintain docs when code changes (watch for drift)?
- Test Coverage
- How will you ensure tests actually validate the documented behavior?
- How will you generate realistic test data (random vs domain-specific)?
- How will you avoid brittle tests that break on refactoring?
- How will you measure test quality (mutation testing)?
Thinking Exercise
Exercise: Analyze This Undocumented Function
Given this undocumented JavaScript function:
function process(data, opts) {
if (!data) return [];
const result = [];
const limit = opts?.max || 100;
for (let i = 0; i < data.length && i < limit; i++) {
if (data[i].status === 'active' || opts?.includeInactive) {
result.push({
...data[i],
processed: true,
timestamp: Date.now()
});
}
}
return opts?.reverse ? result.reverse() : result;
}
Questions while analyzing:
- What are the possible input types for `data` and `opts`?
- What are all the edge cases (null data, empty array, missing opts, etc.)?
- What is the function’s actual purpose based on its behavior?
- What would be a good name for this function?
- What test cases would prove you understand its behavior?
- What happens if `data` is not an array? Should that be documented/tested?
Expected Documentation:
/**
* Filters and processes active records from a dataset, with optional limits and ordering.
*
* @param {Array<{status: string}>} data - Array of objects with at least a `status` field
* @param {Object} [opts] - Optional configuration
* @param {number} [opts.max=100] - Maximum number of records to process
* @param {boolean} [opts.includeInactive=false] - Whether to include non-active records
* @param {boolean} [opts.reverse=false] - Whether to reverse the output order
* @returns {Array<Object>} Processed records with added `processed` and `timestamp` fields
*
* @example
* process([{status: 'active', id: 1}], {max: 50})
* // Returns: [{status: 'active', id: 1, processed: true, timestamp: 1704211234567}]
*/
Expected Test Cases:
- Returns empty array when data is null/undefined
- Filters out inactive records by default
- Includes inactive records when `opts.includeInactive` is true
- Limits output to `opts.max` records
- Reverses output when `opts.reverse` is true
- Adds `processed: true` and the current timestamp to each record
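Here is a minimal Jest sketch of those expected cases. It assumes the exercise function is exported as `process` from a module (the path and export are hypothetical) and that `Date.now` is stubbed for a deterministic timestamp:

```typescript
import { process } from '../src/records'; // hypothetical module path

describe('process', () => {
  beforeEach(() => { jest.spyOn(Date, 'now').mockReturnValue(1704211234567); });
  afterEach(() => { jest.restoreAllMocks(); });

  it('returns an empty array when data is null or undefined', () => {
    expect(process(null)).toEqual([]);
    expect(process(undefined)).toEqual([]);
  });

  it('filters out inactive records by default', () => {
    expect(process([{ status: 'active' }, { status: 'inactive' }])).toHaveLength(1);
  });

  it('includes inactive records when opts.includeInactive is true', () => {
    const data = [{ status: 'active' }, { status: 'inactive' }];
    expect(process(data, { includeInactive: true })).toHaveLength(2);
  });

  it('limits output to opts.max records', () => {
    const data = Array.from({ length: 5 }, () => ({ status: 'active' }));
    expect(process(data, { max: 2 })).toHaveLength(2);
  });

  it('reverses output when opts.reverse is true', () => {
    const data = [{ status: 'active', id: 1 }, { status: 'active', id: 2 }];
    expect(process(data, { reverse: true }).map((r) => r.id)).toEqual([2, 1]);
  });

  it('adds processed: true and the current timestamp to each record', () => {
    expect(process([{ status: 'active' }])).toEqual([
      { status: 'active', processed: true, timestamp: 1704211234567 },
    ]);
  });
});
```

The non-array `data` case is deliberately omitted here; decide from your analysis whether that behavior should be documented and tested.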
The Interview Questions They’ll Ask
- “How would you detect if AI-generated documentation is hallucinating functionality that doesn’t exist in the code?”
- “Explain the difference between documenting what code does vs why it does it. Which should AI focus on?”
- “How would you validate that generated tests actually cover the documented edge cases?”
- “What strategies would you use to keep documentation in sync with code as it evolves?”
- “How would you measure the quality of AI-generated tests (beyond simple code coverage)?”
- “Explain how mutation testing could validate that your tests actually catch bugs, not just execute lines.”
Hints in Layers
Hint 1: AST-Based Analysis
Use a proper parser (TypeScript Compiler API, Babel, tree-sitter) to extract:
- Function signature (name, parameters, return type)
- Control flow branches (if/else, switch, loops)
- External dependencies (function calls, imports)
- Type annotations (if available)
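A minimal sketch using the TypeScript Compiler API (one of the parsers named above); the target file path is illustrative and assumes the `typescript` package is installed:

```typescript
import * as ts from 'typescript';
import { readFileSync } from 'fs';

const fileName = 'src/pricing.ts'; // illustrative target
const source = ts.createSourceFile(
  fileName,
  readFileSync(fileName, 'utf8'),
  ts.ScriptTarget.Latest,
  /* setParentNodes */ true
);

// Walk the tree and print each top-level function's signature facts.
function visit(node: ts.Node): void {
  if (ts.isFunctionDeclaration(node) && node.name) {
    const params = node.parameters.map((p) => ({
      name: p.name.getText(source),
      type: p.type?.getText(source) ?? 'unknown',
      optional: p.questionToken !== undefined,
    }));
    console.log(node.name.text, params, node.type?.getText(source) ?? 'inferred');
  }
  ts.forEachChild(node, visit);
}

visit(source);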
Hint 2: Edge Case Detection
Look for these patterns in the AST:
- `if (!x)` or `if (x == null)` → null check
- `if (arr.length === 0)` → empty array check
- `if (x < 0)` or `if (x > MAX)` → boundary conditions
- `throw new Error(...)` → error cases
- `try/catch` → exception handling
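As a sketch, the first two patterns can be detected with a small extension of the visitor above (it reuses the same `source` SourceFile and is only a starting point, not a complete detector):

```typescript
function findNullChecks(node: ts.Node): void {
  if (ts.isIfStatement(node)) {
    const cond = node.expression;
    // Pattern: if (!x)
    const isNegation =
      ts.isPrefixUnaryExpression(cond) &&
      cond.operator === ts.SyntaxKind.ExclamationToken;
    // Pattern: if (x == null) or if (x === null)
    const isNullCompare =
      ts.isBinaryExpression(cond) &&
      (cond.operatorToken.kind === ts.SyntaxKind.EqualsEqualsToken ||
        cond.operatorToken.kind === ts.SyntaxKind.EqualsEqualsEqualsToken) &&
      cond.right.kind === ts.SyntaxKind.NullKeyword;
    if (isNegation || isNullCompare) {
      console.log('Possible null/undefined guard:', cond.getText(source));
    }
  }
  ts.forEachChild(node, findNullChecks);
}
```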
Hint 3: Test Generation Strategy
For each branch in the code:
- Generate a test that triggers that branch
- Assert the expected output for that branch
- Add a test for the inverse condition (branch not taken)
- Add boundary tests (min, max, just-above, just-below)
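For instance, a boundary sketch around the `opts.max` limit branch of the exercise's `process` function might look like this (it assumes `process` is importable as in the earlier sketch; values are illustrative):

```typescript
// Boundary tests around `limit = opts?.max || 100`: just below, at, and just above.
const active = (n: number) => Array.from({ length: n }, (_, id) => ({ status: 'active', id }));

it('keeps every record when the limit is above the data length', () => {
  expect(process(active(4), { max: 5 })).toHaveLength(4);
});

it('keeps every record when the limit equals the data length', () => {
  expect(process(active(4), { max: 4 })).toHaveLength(4);
});

it('truncates when the limit is below the data length', () => {
  expect(process(active(4), { max: 3 })).toHaveLength(3);
});
```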
Hint 4: Documentation Validation Loop
1. Generate documentation from code analysis
2. Generate tests from documentation
3. Run tests against actual code
4. If tests fail → documentation was wrong → regenerate
5. If tests pass → documentation matches behavior ✓
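A minimal sketch of that loop, with every helper injected and hypothetical; wire them to your own AST analysis, prompt calls, and test runner:

```typescript
interface DocLoop {
  generateDocs(): Promise<string>;                                   // step 1: docs from code analysis
  generateTests(docs: string): Promise<string>;                      // step 2: tests from the docs
  runTests(tests: string): Promise<{ passed: boolean; failures: string[] }>; // step 3: run against real code
  reportFailures(failures: string[]): Promise<void>;                 // step 4: feed failures back
}

async function documentWithVerification(loop: DocLoop, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const docs = await loop.generateDocs();
    const tests = await loop.generateTests(docs);
    const result = await loop.runTests(tests);
    if (result.passed) return docs;            // step 5: docs match observed behavior
    await loop.reportFailures(result.failures); // wrong claims get corrected next attempt
  }
  throw new Error('Documentation did not converge with observed behavior');
}
```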
Hint 5: Complexity Signals
If a function has:
- Cyclomatic complexity > 10 → suggest refactoring before documenting
- More than 5 parameters → suggest object parameter pattern
- Deeply nested logic → suggest extracting helper functions
- No return type annotation → infer and suggest adding it
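A sketch of the first signal: approximate McCabe complexity by counting decision points in a function body with the TypeScript Compiler API (the threshold of 10 mirrors the hint above):

```typescript
import * as ts from 'typescript';

function cyclomaticComplexity(fn: ts.FunctionLikeDeclaration): number {
  let complexity = 1; // one linear path through the function by default
  const walk = (node: ts.Node): void => {
    switch (node.kind) {
      case ts.SyntaxKind.IfStatement:
      case ts.SyntaxKind.ConditionalExpression: // ternary
      case ts.SyntaxKind.CaseClause:
      case ts.SyntaxKind.ForStatement:
      case ts.SyntaxKind.ForOfStatement:
      case ts.SyntaxKind.ForInStatement:
      case ts.SyntaxKind.WhileStatement:
      case ts.SyntaxKind.DoStatement:
      case ts.SyntaxKind.CatchClause:
        complexity++;
        break;
      case ts.SyntaxKind.BinaryExpression: {
        const op = (node as ts.BinaryExpression).operatorToken.kind;
        if (op === ts.SyntaxKind.AmpersandAmpersandToken ||
            op === ts.SyntaxKind.BarBarToken ||
            op === ts.SyntaxKind.QuestionQuestionToken) {
          complexity++; // short-circuit operators add a branch
        }
        break;
      }
    }
    ts.forEachChild(node, walk);
  };
  if (fn.body) walk(fn.body);
  return complexity; // e.g. flag for refactoring when this exceeds 10
}
```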
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| AST parsing and code analysis | “Compilers: Principles and Practice” by Parag H. Dave | Ch. 2-3 (Lexical/Syntax Analysis) |
| Test generation strategies | “The Art of Software Testing” by Glenford J. Myers | Ch. 4-5 (Test Case Design) |
| Code complexity metrics | “Code Complete” by Steve McConnell | Ch. 19 (Complexity Management) |
| Documentation best practices | “Clean Code” by Robert C. Martin | Ch. 4 (Comments) |
| Property-based testing | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
Common Pitfalls & Debugging
Problem 1: “Generated docs claim the function does X, but tests show it does Y”
- Why: LLM hallucinated functionality based on function name, not actual code behavior
- Fix: Always validate docs against actual execution (run tests)
- Quick test: `npm test -- --coverage` and check if tests pass
Problem 2: “Tests are too brittle - they break when code is refactored”
- Why: Tests are coupled to implementation details, not behavior
- Fix: Test public API behavior, not internal implementation
- Example: Test `calculateDiscount(100, 'pro') === 5`, not `expect(tierDiscountMap['pro']).toBe(0.05)`
Problem 3: “AST parser fails on modern JavaScript syntax (optional chaining, nullish coalescing)”
- Why: Using outdated parser or wrong parser configuration
- Fix: Use TypeScript Compiler API or Babel with latest preset
- Quick test: `npx tsc --version` (ensure TypeScript 5.x+)
Problem 4: “Generated tests have 100% line coverage but miss critical bugs”
- Why: Line coverage doesn’t measure test quality, only execution
- Fix: Add mutation testing (Stryker) to validate tests catch bugs
- Quick test: `npx stryker run` and check the mutation score
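To see why coverage alone is not enough, consider this minimal sketch of one mutant; the guard and error message mirror the earlier example, and the exact mutation a tool like Stryker applies may differ:

```typescript
// Original guard in calculateDiscount:
//   if (price <= 0) throw new Error('Price must be positive');
// A typical mutant flips the operator:
//   if (price < 0) throw new Error('Price must be positive');
//
// A suite that only tests negative prices still executes the guard (full line
// coverage), yet the mutant survives. This boundary test kills it:
it('rejects a price of exactly zero', () => {
  expect(() => calculateDiscount(0, 'pro')).toThrow('Price must be positive');
});
```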
Problem 5: “Function is too complex to document clearly (cyclomatic complexity 20+)”
- Why: Function violates Single Responsibility Principle
- Fix: Suggest refactoring before documenting: “This function is too complex. Consider breaking it into smaller functions: extractActiveRecords(), applyLimit(), applyTransform()”
- Signal: If you can’t write clear docs, the code is too complex
Definition of Done
- Generated documentation includes purpose, all parameters, return value, and examples
- All documented edge cases have corresponding test cases
- Tests achieve 100% branch coverage (not just line coverage)
- Tests pass when run against the actual code
- Documentation follows language conventions (JSDoc/docstring/XML doc)
- Complexity warnings are shown for functions with cyclomatic complexity > 10
- Generated tests use realistic test data (not just `foo`, `bar`, `123`)
- Tests are independent (no shared state between tests)
- Error cases are documented and tested (throw conditions, edge cases)
- Public API documentation includes usage examples