Project 32: “The Reverse Documenter” — Documentation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Documentation |
What you’ll build: Generate docs and tests from an undocumented function.
Why it teaches Verification: If the AI cannot explain it, the code is too complex.
Success criteria:
- A new unit test covers the documented edge cases.
Real World Outcome
You’ll have a Kiro workflow that analyzes undocumented code and generates:
- JSDoc/docstring comments explaining purpose, parameters, return values, and edge cases
- Unit tests covering normal cases, edge cases, and error conditions
- README sections documenting public APIs
- Type definitions (TypeScript interfaces, Python type hints)
Example Session:
$ kiro "Document the calculateDiscount function in src/pricing.ts and generate tests"
🔍 Analyzing undocumented function...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Reading: src/pricing.ts
Function: calculateDiscount(price: number, userTier: string, promoCode?: string)
Static Analysis Results:
├─ Parameters: 3 (2 required, 1 optional)
├─ Return type: number
├─ Branches: 8 (if/else conditions)
├─ External calls: 2 (fetchPromoDetails, getUserHistory)
└─ Edge cases detected: 5 (null checks, tier validation, promo expiration)
📝 Generating Documentation...
/**
* Calculates the final discount percentage for a purchase.
*
* Combines base tier discount with promotional codes. Tier discounts
* stack additively with promo codes. Premium tier users receive
* an additional 5% on top of promo discounts.
*
* @param price - The original price before any discounts (must be > 0)
* @param userTier - User membership tier: 'free', 'pro', or 'premium'
* @param promoCode - Optional promotional code (e.g., 'SAVE20')
* @returns Final discount percentage (0-100)
*
* @throws {Error} If price <= 0 or userTier is invalid
*
* @example
* calculateDiscount(100, 'pro', 'SAVE20') // Returns 25 (5% tier + 20% promo)
* calculateDiscount(100, 'premium', 'SAVE20') // Returns 30 (5% tier + 20% promo + 5% premium bonus)
*/
✅ Documentation added to src/pricing.ts
🧪 Generating Test Cases...
describe('calculateDiscount', () => {
it('should apply tier discount only when no promo code', () => {
expect(calculateDiscount(100, 'pro')).toBe(5);
});
it('should stack tier and promo discounts', () => {
expect(calculateDiscount(100, 'pro', 'SAVE20')).toBe(25);
});
it('should apply premium bonus to promo codes', () => {
expect(calculateDiscount(100, 'premium', 'SAVE20')).toBe(30);
});
it('should throw error for negative price', () => {
expect(() => calculateDiscount(-10, 'pro')).toThrow('Price must be positive');
});
it('should throw error for invalid tier', () => {
expect(() => calculateDiscount(100, 'gold')).toThrow('Invalid user tier');
});
it('should return 0 for expired promo codes', () => {
expect(calculateDiscount(100, 'pro', 'EXPIRED2023')).toBe(5); // Only tier discount
});
});
✅ Test file created: src/__tests__/pricing.test.ts
✅ 6 test cases covering all branches
📊 Coverage Analysis:
├─ Line coverage: 100%
├─ Branch coverage: 100%
└─ Edge cases covered: 5/5
💡 Complexity Signal: calculateDiscount has a cyclomatic complexity of 8.
Consider refactoring into smaller functions if it grows further.
What You See:
- Auto-generated JSDoc comments explaining every parameter, return value, and edge case
- Complete test suite covering happy paths, edge cases, and error conditions
- Coverage report showing 100% branch coverage
- Complexity warnings when functions are too dense
The Core Question You’re Answering
“How can AI reverse-engineer intent from undocumented code, and can it generate tests that prove its understanding is correct?”
This project forces you to confront the verification problem: if Kiro generates documentation that sounds plausible but is wrong, the tests will fail. This feedback loop ensures the AI actually understands the code, not just pattern-matches documentation style.
Concepts You Must Understand First
Stop and research these before coding:
- Static Code Analysis (AST Parsing)
- What is an Abstract Syntax Tree and how do you traverse it?
- How do you extract function signatures, parameter types, and control flow?
- How do you detect edge cases (null checks, boundary conditions)?
- Book Reference: “Compilers: Principles and Practice” by Parag H. Dave - Ch. 2-3
- Test Generation Strategies
- What is the difference between property-based testing and example-based testing? (A short sketch follows this list.)
- How do you identify equivalence classes for input partitioning?
- What is branch coverage vs line coverage vs path coverage?
- Book Reference: “The Art of Software Testing” by Glenford J. Myers - Ch. 4-5
- Documentation Standards
- What are JSDoc, docstring, and XML documentation comment conventions?
- How do you write documentation that survives refactoring?
- What level of detail is appropriate for public vs private APIs?
- Reference: JSDoc specification, PEP 257 (Python Docstring Conventions)
- Cyclomatic Complexity
- How do you measure code complexity (McCabe metric)?
- Why does high complexity correlate with bugs?
- When should you refactor based on complexity scores?
- Book Reference: “Code Complete” by Steve McConnell - Ch. 19
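To make the property-based vs example-based distinction concrete, here is a minimal Jest sketch. It assumes `fast-check` is installed and that the `calculateDiscount` function from the example session is importable; the import path is illustrative:

```typescript
import * as fc from 'fast-check';
import { calculateDiscount } from '../src/pricing'; // hypothetical path

// Example-based: one concrete input, one concrete expectation.
it('applies the pro tier discount to a known price', () => {
  expect(calculateDiscount(100, 'pro')).toBe(5);
});

// Property-based: an invariant that must hold across many generated inputs.
it('always returns a percentage between 0 and 100 for positive prices', () => {
  fc.assert(
    fc.property(fc.integer({ min: 1, max: 1_000_000 }), (price) => {
      const discount = calculateDiscount(price, 'pro');
      return discount >= 0 && discount <= 100;
    })
  );
});
```

Example-based tests pin down specific documented behaviors; property-based tests probe the equivalence classes you identify during input partitioning.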
Questions to Guide Your Design
Before implementing, think through these:
- Code Understanding
- How will you parse the target function (AST parser vs regex vs LLM-based)?
- How will you identify edge cases (static analysis vs symbolic execution)?
- How will you handle external dependencies (mocking vs integration tests)?
- How will you detect the function’s actual behavior vs its intended behavior?
- Documentation Quality
- How will you validate that generated docs match actual behavior?
- How will you avoid hallucinating functionality that doesn’t exist?
- How will you decide which details to include vs omit?
- How will you maintain docs when code changes (watch for drift)?
- Test Coverage
- How will you ensure tests actually validate the documented behavior?
- How will you generate realistic test data (random vs domain-specific)?
- How will you avoid brittle tests that break on refactoring?
- How will you measure test quality (mutation testing)?
Thinking Exercise
Exercise: Analyze This Undocumented Function
Given this undocumented JavaScript function:
function process(data, opts) {
if (!data) return [];
const result = [];
const limit = opts?.max || 100;
for (let i = 0; i < data.length && i < limit; i++) {
if (data[i].status === 'active' || opts?.includeInactive) {
result.push({
...data[i],
processed: true,
timestamp: Date.now()
});
}
}
return opts?.reverse ? result.reverse() : result;
}
Questions while analyzing:
- What are the possible input types for `data` and `opts`?
- What are all the edge cases (null data, empty array, missing opts, etc.)?
- What is the function’s actual purpose based on its behavior?
- What would be a good name for this function?
- What test cases would prove you understand its behavior?
- What happens if `data` is not an array? Should that be documented/tested?
Expected Documentation:
/**
* Filters and processes active records from a dataset, with optional limits and ordering.
*
* @param {Array<{status: string}>} data - Array of objects with at least a `status` field
* @param {Object} [opts] - Optional configuration
* @param {number} [opts.max=100] - Maximum number of records to process
* @param {boolean} [opts.includeInactive=false] - Whether to include non-active records
* @param {boolean} [opts.reverse=false] - Whether to reverse the output order
* @returns {Array<Object>} Processed records with added `processed` and `timestamp` fields
*
* @example
* process([{status: 'active', id: 1}], {max: 50})
* // Returns: [{status: 'active', id: 1, processed: true, timestamp: 1704211234567}]
*/
Expected Test Cases:
- Returns empty array when data is null/undefined
- Filters out inactive records by default
- Includes inactive records when `opts.includeInactive` is true
- Limits output to `opts.max` records
- Reverses output when `opts.reverse` is true
- Adds `processed: true` and the current timestamp to each record
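Here is a minimal Jest sketch of those expected cases. It assumes the exercise function is exported as `process` from a module (the path and export are hypothetical) and that `Date.now` is stubbed for a deterministic timestamp:

```typescript
import { process } from '../src/records'; // hypothetical module path

describe('process', () => {
  beforeEach(() => { jest.spyOn(Date, 'now').mockReturnValue(1704211234567); });
  afterEach(() => { jest.restoreAllMocks(); });

  it('returns an empty array when data is null or undefined', () => {
    expect(process(null)).toEqual([]);
    expect(process(undefined)).toEqual([]);
  });

  it('filters out inactive records by default', () => {
    expect(process([{ status: 'active' }, { status: 'inactive' }])).toHaveLength(1);
  });

  it('includes inactive records when opts.includeInactive is true', () => {
    const data = [{ status: 'active' }, { status: 'inactive' }];
    expect(process(data, { includeInactive: true })).toHaveLength(2);
  });

  it('limits output to opts.max records', () => {
    const data = Array.from({ length: 5 }, () => ({ status: 'active' }));
    expect(process(data, { max: 2 })).toHaveLength(2);
  });

  it('reverses output when opts.reverse is true', () => {
    const data = [{ status: 'active', id: 1 }, { status: 'active', id: 2 }];
    expect(process(data, { reverse: true }).map((r) => r.id)).toEqual([2, 1]);
  });

  it('adds processed: true and the current timestamp to each record', () => {
    expect(process([{ status: 'active' }])).toEqual([
      { status: 'active', processed: true, timestamp: 1704211234567 },
    ]);
  });
});
```

The non-array `data` case is deliberately omitted here; decide from your analysis whether that behavior should be documented and tested.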
The Interview Questions They’ll Ask
- “How would you detect if AI-generated documentation is hallucinating functionality that doesn’t exist in the code?”
- “Explain the difference between documenting what code does vs why it does it. Which should AI focus on?”
- “How would you validate that generated tests actually cover the documented edge cases?”
- “What strategies would you use to keep documentation in sync with code as it evolves?”
- “How would you measure the quality of AI-generated tests (beyond simple code coverage)?”
- “Explain how mutation testing could validate that your tests actually catch bugs, not just execute lines.”
Hints in Layers
Hint 1: AST-Based Analysis
Use a proper parser (TypeScript Compiler API, Babel, tree-sitter) to extract:
- Function signature (name, parameters, return type)
- Control flow branches (if/else, switch, loops)
- External dependencies (function calls, imports)
- Type annotations (if available)
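A minimal sketch using the TypeScript Compiler API (one of the parsers named above); the target file path is illustrative and assumes the `typescript` package is installed:

```typescript
import * as ts from 'typescript';
import { readFileSync } from 'fs';

const fileName = 'src/pricing.ts'; // illustrative target
const source = ts.createSourceFile(
  fileName,
  readFileSync(fileName, 'utf8'),
  ts.ScriptTarget.Latest,
  /* setParentNodes */ true
);

// Walk the tree and print each top-level function's signature facts.
function visit(node: ts.Node): void {
  if (ts.isFunctionDeclaration(node) && node.name) {
    const params = node.parameters.map((p) => ({
      name: p.name.getText(source),
      type: p.type?.getText(source) ?? 'unknown',
      optional: p.questionToken !== undefined,
    }));
    console.log(node.name.text, params, node.type?.getText(source) ?? 'inferred');
  }
  ts.forEachChild(node, visit);
}

visit(source);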
Hint 2: Edge Case Detection
Look for these patterns in the AST:
- `if (!x)` or `if (x == null)` → null check
- `if (arr.length === 0)` → empty array check
- `if (x < 0)` or `if (x > MAX)` → boundary conditions
- `throw new Error(...)` → error cases
- `try/catch` → exception handling
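As a sketch, the first two patterns can be detected with a small extension of the visitor above (it reuses the same `source` SourceFile and is only a starting point, not a complete detector):

```typescript
function findNullChecks(node: ts.Node): void {
  if (ts.isIfStatement(node)) {
    const cond = node.expression;
    // Pattern: if (!x)
    const isNegation =
      ts.isPrefixUnaryExpression(cond) &&
      cond.operator === ts.SyntaxKind.ExclamationToken;
    // Pattern: if (x == null) or if (x === null)
    const isNullCompare =
      ts.isBinaryExpression(cond) &&
      (cond.operatorToken.kind === ts.SyntaxKind.EqualsEqualsToken ||
        cond.operatorToken.kind === ts.SyntaxKind.EqualsEqualsEqualsToken) &&
      cond.right.kind === ts.SyntaxKind.NullKeyword;
    if (isNegation || isNullCompare) {
      console.log('Possible null/undefined guard:', cond.getText(source));
    }
  }
  ts.forEachChild(node, findNullChecks);
}
```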
Hint 3: Test Generation Strategy
For each branch in the code:
- Generate a test that triggers that branch
- Assert the expected output for that branch
- Add a test for the inverse condition (branch not taken)
- Add boundary tests (min, max, just-above, just-below)
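For instance, a boundary sketch around the `opts.max` limit branch of the exercise's `process` function might look like this (it assumes `process` is importable as in the earlier sketch; values are illustrative):

```typescript
// Boundary tests around `limit = opts?.max || 100`: just below, at, and just above.
const active = (n: number) => Array.from({ length: n }, (_, id) => ({ status: 'active', id }));

it('keeps every record when the limit is above the data length', () => {
  expect(process(active(4), { max: 5 })).toHaveLength(4);
});

it('keeps every record when the limit equals the data length', () => {
  expect(process(active(4), { max: 4 })).toHaveLength(4);
});

it('truncates when the limit is below the data length', () => {
  expect(process(active(4), { max: 3 })).toHaveLength(3);
});
```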
Hint 4: Documentation Validation Loop
1. Generate documentation from code analysis
2. Generate tests from documentation
3. Run tests against actual code
4. If tests fail → documentation was wrong → regenerate
5. If tests pass → documentation matches behavior ✓
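A minimal sketch of that loop, with every helper injected and hypothetical; wire them to your own AST analysis, prompt calls, and test runner:

```typescript
interface DocLoop {
  generateDocs(): Promise<string>;                                   // step 1: docs from code analysis
  generateTests(docs: string): Promise<string>;                      // step 2: tests from the docs
  runTests(tests: string): Promise<{ passed: boolean; failures: string[] }>; // step 3: run against real code
  reportFailures(failures: string[]): Promise<void>;                 // step 4: feed failures back
}

async function documentWithVerification(loop: DocLoop, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const docs = await loop.generateDocs();
    const tests = await loop.generateTests(docs);
    const result = await loop.runTests(tests);
    if (result.passed) return docs;            // step 5: docs match observed behavior
    await loop.reportFailures(result.failures); // wrong claims get corrected next attempt
  }
  throw new Error('Documentation did not converge with observed behavior');
}
```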
Hint 5: Complexity Signals
If a function has:
- Cyclomatic complexity > 10 → suggest refactoring before documenting
- More than 5 parameters → suggest object parameter pattern
- Deeply nested logic → suggest extracting helper functions
- No return type annotation → infer and suggest adding it
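A sketch of the first signal: approximate McCabe complexity by counting decision points in a function body with the TypeScript Compiler API (the threshold of 10 mirrors the hint above):

```typescript
import * as ts from 'typescript';

function cyclomaticComplexity(fn: ts.FunctionLikeDeclaration): number {
  let complexity = 1; // one linear path through the function by default
  const walk = (node: ts.Node): void => {
    switch (node.kind) {
      case ts.SyntaxKind.IfStatement:
      case ts.SyntaxKind.ConditionalExpression: // ternary
      case ts.SyntaxKind.CaseClause:
      case ts.SyntaxKind.ForStatement:
      case ts.SyntaxKind.ForOfStatement:
      case ts.SyntaxKind.ForInStatement:
      case ts.SyntaxKind.WhileStatement:
      case ts.SyntaxKind.DoStatement:
      case ts.SyntaxKind.CatchClause:
        complexity++;
        break;
      case ts.SyntaxKind.BinaryExpression: {
        const op = (node as ts.BinaryExpression).operatorToken.kind;
        if (op === ts.SyntaxKind.AmpersandAmpersandToken ||
            op === ts.SyntaxKind.BarBarToken ||
            op === ts.SyntaxKind.QuestionQuestionToken) {
          complexity++; // short-circuit operators add a branch
        }
        break;
      }
    }
    ts.forEachChild(node, walk);
  };
  if (fn.body) walk(fn.body);
  return complexity; // e.g. flag for refactoring when this exceeds 10
}
```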
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| AST parsing and code analysis | “Compilers: Principles and Practice” by Parag H. Dave | Ch. 2-3 (Lexical/Syntax Analysis) |
| Test generation strategies | “The Art of Software Testing” by Glenford J. Myers | Ch. 4-5 (Test Case Design) |
| Code complexity metrics | “Code Complete” by Steve McConnell | Ch. 19 (Complexity Management) |
| Documentation best practices | “Clean Code” by Robert C. Martin | Ch. 4 (Comments) |
| Property-based testing | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
Common Pitfalls & Debugging
Problem 1: “Generated docs claim the function does X, but tests show it does Y”
- Why: LLM hallucinated functionality based on function name, not actual code behavior
- Fix: Always validate docs against actual execution (run tests)
- Quick test: `npm test -- --coverage` and check if tests pass
Problem 2: “Tests are too brittle - they break when code is refactored”
- Why: Tests are coupled to implementation details, not behavior
- Fix: Test public API behavior, not internal implementation
- Example: Test `calculateDiscount(100, 'pro') === 5`, not `expect(tierDiscountMap['pro']).toBe(0.05)`
Problem 3: “AST parser fails on modern JavaScript syntax (optional chaining, nullish coalescing)”
- Why: Using outdated parser or wrong parser configuration
- Fix: Use TypeScript Compiler API or Babel with latest preset
- Quick test: `npx tsc --version` (ensure TypeScript 5.x+)
Problem 4: “Generated tests have 100% line coverage but miss critical bugs”
- Why: Line coverage doesn’t measure test quality, only execution
- Fix: Add mutation testing (Stryker) to validate tests catch bugs
- Quick test: `npx stryker run` and check the mutation score
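To see why coverage alone is not enough, consider this minimal sketch of one mutant; the guard and error message mirror the earlier example, and the exact mutation a tool like Stryker applies may differ:

```typescript
// Original guard in calculateDiscount:
//   if (price <= 0) throw new Error('Price must be positive');
// A typical mutant flips the operator:
//   if (price < 0) throw new Error('Price must be positive');
//
// A suite that only tests negative prices still executes the guard (full line
// coverage), yet the mutant survives. This boundary test kills it:
it('rejects a price of exactly zero', () => {
  expect(() => calculateDiscount(0, 'pro')).toThrow('Price must be positive');
});
```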
Problem 5: “Function is too complex to document clearly (cyclomatic complexity 20+)”
- Why: Function violates Single Responsibility Principle
- Fix: Suggest refactoring before documenting: “This function is too complex. Consider breaking it into smaller functions: extractActiveRecords(), applyLimit(), applyTransform()”
- Signal: If you can’t write clear docs, the code is too complex
Definition of Done
- Generated documentation includes purpose, all parameters, return value, and examples
- All documented edge cases have corresponding test cases
- Tests achieve 100% branch coverage (not just line coverage)
- Tests pass when run against the actual code
- Documentation follows language conventions (JSDoc/docstring/XML doc)
- Complexity warnings are shown for functions with cyclomatic complexity > 10
- Generated tests use realistic test data (not just `foo`, `bar`, `123`)
- Tests are independent (no shared state between tests)
- Error cases are documented and tested (throw conditions, edge cases)
- Public API documentation includes usage examples