Project 28: Headless Testing Framework - Automated Test Generation

Build an automated test generation system using headless Claude: analyze code to generate tests, run tests and fix failures iteratively, achieve coverage targets, and integrate with CI.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Expert |
| Time Estimate | 3 weeks |
| Language | Python (Alternatives: TypeScript, Go) |
| Prerequisites | Projects 24-27 completed, testing experience |
| Key Topics | TDD, test generation, coverage analysis, iterative refinement, CI integration |
| Main Book | "Test Driven Development" by Kent Beck |

1. Learning Objectives

By completing this project, you will:

  1. Automate test generation: Use Claude to create comprehensive test suites
  2. Implement the TDD loop: Generate, run, fix, repeat until targets met
  3. Parse test output: Extract failures and feed back to Claude for fixes
  4. Track coverage metrics: Measure and optimize test coverage
  5. Handle iteration limits: Know when to stop and escalate
  6. Integrate with CI: Run automated test generation in pipelines
  7. Design quality prompts: Craft prompts that produce good tests

2. Real World Outcome

When complete, you'll have a system that automatically generates and refines tests:

$ python auto_test.py --source ./src/auth --target-coverage 80

Automated Test Generation

Phase 1: Analyze code
----- Files: 5
----- Functions: 23
----- Current coverage: 45%
----- Gap to 80%: 35%

Phase 2: Generate tests
----- Generating tests for login()
----- Generating tests for logout()
----- Generating tests for refresh_token()
----- Generating tests for validate_session()
...

Phase 3: Run tests
----- Tests run: 42
----- Passed: 38
----- Failed: 4
----- Coverage: 72%

Phase 4: Fix failing tests (iteration 1)
----- Fixing test_login_invalid_password
----- Fixing test_token_expiry
----- Fixing test_session_validation_edge_case
----- Fixing test_concurrent_logout

Phase 5: Run tests (iteration 2)
----- Tests run: 42
----- Passed: 42
----- Failed: 0
----- Coverage: 83%

Target coverage achieved!

Generated files:
- tests/test_login.py
- tests/test_logout.py
- tests/test_token.py
- tests/test_session.py

The TDD Automation Loop

START ──► Analyze ──► Generate ──► Run ──┬──► Coverage met?
                                         │         │
             ▲                           │         ├── YES ──► Done!
             │                           │         └── NO ───┐
             │                           │                   │
             └──── Fix failures ◄────────┴───────────────────┘

3. The Core Question Youโ€™re Answering

"How do I use Claude to automatically generate, run, and iterate on tests until coverage targets are met?"

TDD requires iterative refinement. This project automates the entire cycle: generate tests, run them, fix failures, and repeat until quality targets are achieved. The result is a testing assistant that handles the mechanical work while you focus on design.

Why This Matters

Manual test writing is:

  • Time-consuming (often 50%+ of development time)
  • Tedious (especially for edge cases)
  • Inconsistent (quality varies by developer)
  • Often skipped (when deadlines loom)

Automated test generation provides:

  • Consistent baseline coverage
  • Edge cases you might miss
  • Faster initial test suite creation
  • Learning opportunity (study generated tests)

4. Concepts You Must Understand First

Stop and research these before coding:

4.1 Test Generation Fundamentals

What makes a good test?

# GOOD TEST: Clear, focused, independent
def test_login_with_valid_credentials():
    """User can log in with correct username and password."""
    user = create_test_user(password="secret123")
    result = login(user.email, "secret123")
    assert result.success is True
    assert result.token is not None

# BAD TEST: Unclear, testing too much
def test_login():
    # Multiple assertions, unclear what's being tested
    login("a@b.com", "123")
    login("c@d.com", "456")
    # What exactly is this testing?

Key qualities:

  • Focused: One test = one behavior
  • Named clearly: Describes what's tested
  • Independent: No test-order dependencies
  • Deterministic: Same result every run

Reference: "Test Driven Development" by Kent Beck, Chapters 1-5

4.2 Coverage Analysis

# Run pytest with coverage
pytest --cov=src --cov-report=json tests/

# coverage.json structure (abridged)
{
  "totals": {
    "covered_lines": 450,
    "num_statements": 600,
    "percent_covered": 75.0
  },
  "files": {
    "src/auth/login.py": {
      "executed_lines": [1, 2, 5, 6, 7],
      "missing_lines": [10, 11, 15],
      "summary": {"percent_covered": 70.0}
    }
  }
}
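
A small helper sketch that reads this report and lists the files still below a target percentage (it assumes the JSON layout shown above):

import json

def files_below_target(report_path: str = "coverage.json", target: float = 80.0) -> dict:
    """Return {file: percent_covered} for files under the coverage target."""
    with open(report_path) as f:
        report = json.load(f)
    gaps = {}
    for path, info in report.get("files", {}).items():
        percent = info.get("summary", {}).get("percent_covered", 0.0)
        if percent < target:
            gaps[path] = percent
    return gaps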

Coverage types:

  • Line coverage: Which lines executed
  • Branch coverage: Which if/else branches taken
  • Function coverage: Which functions called
  • Path coverage: Which execution paths taken

Reference: pytest-cov documentation

4.3 Iterative Refinement

Resuming a session maintains context across iterations. In headless mode, --output-format json returns a session_id; pass it back with --resume (plain --continue reattaches to the most recent conversation instead):

# Initial generation
claude -p "Generate tests for login.py" --output-format json
# Returns session_id: "abc123"

# Fix failures with context
claude -p "Fix this failing test: $ERROR" --resume abc123
# Claude remembers the code and previous tests
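
A minimal Python wrapper for the same pattern (a sketch; it assumes the JSON payload carries the result and session_id fields shown above):

import json
import subprocess
from typing import Optional

def ask_claude(prompt: str, session_id: Optional[str] = None) -> dict:
    """Run claude -p headlessly and return the parsed JSON payload."""
    cmd = ["claude", "-p", prompt, "--output-format", "json"]
    if session_id:
        cmd.extend(["--resume", session_id])  # reattach to a specific session
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

# First call creates the session; later calls reuse it for context
first = ask_claude("Generate tests for login.py")
fix = ask_claude("Fix this failing test: ...", session_id=first.get("session_id"))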

Key Questions:

  • How many iterations before giving up?
  • When to start fresh vs continue?
  • How to provide useful error context?

5. Questions to Guide Your Design

5.1 What Tests to Generate?

| Test Type | Complexity | Claude's Strength |
|-----------|------------|-------------------|
| Unit tests | Low | Excellent |
| Integration tests | Medium | Good |
| Edge case tests | Medium | Excellent |
| Error handling tests | Medium | Good |
| Performance tests | High | Limited |
| Security tests | High | Good |

5.2 How to Handle Failures?

| Failure Type | Action |
|--------------|--------|
| Syntax error | Fix immediately (obvious issue) |
| Import error | Add missing imports |
| Assertion failure | Analyze; may be a test or code bug |
| Runtime error | Check mocking, dependencies |
| Timeout | Simplify or skip the test |
| Flaky (intermittent) | Add retries or mark as flaky |
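
A rough classifier sketch for routing failures according to this matrix (the string checks are heuristics, not a full pytest parser):

def classify_failure(failure_text: str) -> str:
    """Map a pytest failure block to one of the actions in the matrix above."""
    text = failure_text.lower()
    if "syntaxerror" in text:
        return "fix_syntax"
    if "importerror" in text or "modulenotfounderror" in text:
        return "add_missing_imports"
    if "assertionerror" in text:
        return "analyze_assertion"
    if "timeout" in text:
        return "simplify_or_skip"
    return "check_mocking_and_dependencies"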

5.3 When to Stop?

Stopping conditions (combined into a single check in the sketch after this list):

  • Coverage target reached
  • All tests pass
  • Maximum iterations exceeded
  • No progress (same failures repeating)
  • Human intervention required
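
A sketch of that combined check (the LoopState fields are illustrative, not part of any library):

from dataclasses import dataclass

@dataclass
class LoopState:
    coverage: float
    target_coverage: float
    failed_tests: int
    iteration: int
    max_iterations: int
    failing_names: frozenset = frozenset()
    previous_failing_names: frozenset = frozenset()
    needs_human: bool = False

def should_stop(state: LoopState) -> bool:
    """True when any terminal condition from the list above holds."""
    if state.failed_tests == 0 and state.coverage >= state.target_coverage:
        return True                                   # target reached, all tests pass
    if state.iteration >= state.max_iterations:
        return True                                   # iteration budget exhausted
    if state.failing_names and state.failing_names == state.previous_failing_names:
        return True                                   # same failures repeating: no progress
    return state.needs_human                          # explicit escalation flag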

6. Thinking Exercise

Design the TDD Automation Loop

Start: source code + coverage target (e.g., 80%)
    │
    ▼
1. ANALYZE CODE ────── parse the AST, list public functions/methods,
    │                  read current coverage, identify gaps
    ▼
2. GENERATE TESTS ──── Claude creates tests for each gap: edge cases,
    │                  error handling; keep the session for context
    ▼
3. WRITE FILES ─────── save generated tests to tests/test_<module>.py
    │
    ▼
4. RUN TESTS ───────── pytest --cov --tb=short, capture the output
    │
    ├── all pass and coverage >= target ──► DONE!
    │
    ▼  (failures or low coverage)
5. PARSE OUTPUT ────── extract failure messages, format them for Claude
    │
    ▼
6. FIX FAILURES ────── Claude fixes the tests in the resumed session
    │                  (iteration++)
    ▼
7. CHECK LIMITS ────── iteration < max?  ── yes ──► loop back to step 4
    │  no (max iterations reached)
    ▼
8. REPORT ──────────── partial results + remaining failures;
                       human intervention needed

Design Questions

  1. How many iterations before giving up?
    • Typically 3-5 for most failures
    • More iterations = diminishing returns
    • Track if the same error repeats (see the fingerprint sketch after this list)
  2. Should you fix one test at a time or all at once?
    • One at a time: Clearer feedback
    • All at once: Faster, but harder to debug
    • Hybrid: Batch similar errors
  3. How do you handle flaky tests?
    • Retry 2-3 times before marking as flaky
    • Add @pytest.mark.flaky if persistent
    • Consider test isolation issues
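
A small sketch of the "same error repeats" check from question 1: fingerprint the failing test names each iteration and stop when the set stops changing (run_and_collect_failures is a hypothetical helper returning pytest failure blocks):

def failure_fingerprint(failures: list) -> frozenset:
    """Reduce pytest failure blocks to the set of failing test names."""
    names = set()
    for block in failures:
        first_line = block.splitlines()[0] if block.strip() else ""
        if "::" in first_line:
            names.add(first_line.split("::")[-1].split()[0])
    return frozenset(names)

previous = None
for iteration in range(5):
    failures = run_and_collect_failures()
    current = failure_fingerprint(failures)
    if current and current == previous:
        print("Same failures as last iteration; escalating to a human")
        break
    previous = current
    # otherwise feed the failures back to Claude and try again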

7. The Interview Questions They'll Ask

7.1 "How would you automate test generation for a codebase?"

Good answer structure:

  1. Analyze code to identify functions/methods
  2. Generate tests with Claude using structured prompts
  3. Run tests and capture output
  4. Parse failures and iterate
  5. Stop when coverage target met or max iterations reached

7.2 "What's the TDD cycle and how would you automate it?"

TDD cycle:

  1. Red: Write failing test
  2. Green: Make it pass
  3. Refactor: Clean up

Automation approach:

  • Generate tests (Red is implicit - tests define expected behavior)
  • Fix failures (Green)
  • Improve tests in next iteration (Refactor)

7.3 "How do you measure test quality beyond coverage?"

Quality metrics:

  • Mutation testing: Do tests catch injected bugs?
  • Assertion density: Tests with more assertions find more bugs (sketched after this list)
  • Test isolation: Independent tests are more reliable
  • Flakiness rate: How often do tests fail randomly?
  • Execution time: Fast tests run more often
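
Assertion density, for example, is easy to approximate with the standard ast module; a sketch that counts assert statements per test function:

import ast
from pathlib import Path

def assertion_density(test_file: str) -> float:
    """Average number of assert statements per test_* function.
    Heuristic only: ignores pytest.raises and unittest-style assertions."""
    tree = ast.parse(Path(test_file).read_text())
    counts = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            counts.append(sum(isinstance(n, ast.Assert) for n in ast.walk(node)))
    return sum(counts) / len(counts) if counts else 0.0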

7.4 "How do you handle test failures in an automated pipeline?"

Strategies:

  • Parse error messages for context
  • Feed errors back to Claude with session context
  • Retry with exponential backoff (sketched after this list)
  • After max retries, mark as needing human review
  • Generate report of remaining failures
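
For the retry strategy, a sketch of exponential backoff around a headless call (run_claude is assumed to raise on failure, as in the hints below):

import time

def run_claude_with_retries(prompt: str, max_retries: int = 3) -> str:
    """Retry a flaky headless call, doubling the wait between attempts."""
    delay = 2.0
    for attempt in range(1, max_retries + 1):
        try:
            return run_claude(prompt)  # assumed helper that raises on failure
        except Exception as error:
            if attempt == max_retries:
                raise  # hand off to human review / failure report
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= 2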

7.5 "What are the limits of AI-generated tests?"

Limitations:

  • May miss domain-specific edge cases
  • Can generate superficial tests (just calling functions)
  • May not understand business logic
  • Can create brittle tests tied to implementation
  • Requires human review for quality

Best used for:

  • Initial test scaffolding
  • Edge case suggestions
  • Boilerplate test code
  • Coverage gap identification

8. Hints in Layers

Hint 1: Start with One File

Generate tests for a single file first:

def generate_tests_for_file(filepath: str) -> str:
    prompt = f"""Generate pytest tests for this file:

{read_file(filepath)}

Include:
- Tests for each public function
- Edge cases (empty input, None, boundaries)
- Error handling tests

Output as valid Python code."""

    return run_claude(prompt)
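
Hint 1 assumes two small helpers, read_file and run_claude; a minimal sketch of both (headless claude prints plain text to stdout by default):

import subprocess
from pathlib import Path

def read_file(filepath: str) -> str:
    return Path(filepath).read_text()

def run_claude(prompt: str) -> str:
    """Invoke headless Claude and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, timeout=300
    )
    if result.returncode != 0:
        raise RuntimeError(f"claude failed: {result.stderr}")
    return result.stdout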

Hint 2: Parse pytest Output

Use --tb=short for concise errors:

import subprocess

def run_tests():
    result = subprocess.run(
        ["pytest", "--tb=short", "-v", "--cov=src", "--cov-report=json"],
        capture_output=True,
        text=True
    )

    return {
        "passed": result.returncode == 0,
        "output": result.stdout,
        "errors": result.stderr
    }

Hint 3: Use --resume for Session Context

Maintain session across iterations:

class TDDAutomator:
    def __init__(self):
        self.session_id = None

    def generate_initial(self, prompt):
        result = run_claude(prompt, output_format="json")
        self.session_id = result["session_id"]
        return result

    def fix_failures(self, errors):
        return run_claude(
            f"Fix these test failures:\n{errors}",
            continue_session=self.session_id
        )

Hint 4: Set Hard Limits

Prevent infinite loops:

MAX_ITERATIONS = 5
MIN_PROGRESS = 1.0  # Require at least one percentage point of improvement per iteration

for i in range(MAX_ITERATIONS):
    result = run_tests()
    current_coverage = result["coverage"]

    if current_coverage >= target:
        break

    if i > 0:
        improvement = current_coverage - last_coverage
        if improvement < MIN_PROGRESS:
            print("No progress, stopping")
            break

    last_coverage = current_coverage
    fix_failures(result["errors"])

9. Books That Will Help

| Topic | Book | Chapter/Section |
|-------|------|-----------------|
| TDD | "Test Driven Development" by Kent Beck | All (foundational) |
| Python testing | "Python Testing with pytest" by Brian Okken | Ch. 2-5: Writing tests |
| Test design | "xUnit Test Patterns" by Gerard Meszaros | Ch. 4-6: Test patterns |
| Refactoring | "Refactoring" by Martin Fowler | Ch. 4: Building tests |
| Clean tests | "Clean Code" by Robert Martin | Ch. 9: Unit tests |

10. Implementation Guide

10.1 Complete TDD Automator

import subprocess
import json
from pathlib import Path
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestRunResult:
    passed: bool
    total_tests: int
    passed_tests: int
    failed_tests: int
    coverage: float
    errors: list
    output: str

@dataclass
class IterationResult:
    iteration: int
    tests_fixed: int
    new_coverage: float
    remaining_failures: list

class TDDAutomator:
    def __init__(
        self,
        source_dir: str,
        test_dir: str = "tests",
        target_coverage: float = 80.0,
        max_iterations: int = 5
    ):
        self.source_dir = Path(source_dir)
        self.test_dir = Path(test_dir)
        self.target_coverage = target_coverage
        self.max_iterations = max_iterations
        self.session_id: Optional[str] = None

    def run(self) -> dict:
        """Run the full TDD automation loop."""
        print("\n" + "=" * 60)
        print("AUTOMATED TEST GENERATION")
        print("=" * 60)

        # Phase 1: Analyze
        print("\nPhase 1: Analyzing code...")
        analysis = self.analyze_code()
        print(f"  Files: {analysis['file_count']}")
        print(f"  Functions: {analysis['function_count']}")
        print(f"  Current coverage: {analysis['coverage']:.1f}%")

        if analysis['coverage'] >= self.target_coverage:
            print(f"\nCoverage target already met!")
            return {"success": True, "coverage": analysis['coverage']}

        # Phase 2: Generate initial tests
        print("\nPhase 2: Generating tests...")
        self.generate_initial_tests(analysis)

        # Phase 3-6: Run and iterate
        for iteration in range(self.max_iterations):
            print(f"\nIteration {iteration + 1}/{self.max_iterations}")

            # Run tests
            result = self.run_tests()
            print(f"  Tests: {result.passed_tests}/{result.total_tests} passed")
            print(f"  Coverage: {result.coverage:.1f}%")

            # Check if done
            if result.passed and result.coverage >= self.target_coverage:
                print(f"\nTarget coverage achieved!")
                return {"success": True, "coverage": result.coverage}

            # Check if no failures to fix
            if result.passed and result.coverage < self.target_coverage:
                print("\nAll tests pass but coverage too low.")
                print("Generating additional tests...")
                self.generate_additional_tests(result.coverage)
                continue

            # Fix failures
            print(f"  Fixing {result.failed_tests} failures...")
            self.fix_failures(result.errors)

        # Max iterations reached
        print(f"\nMax iterations reached.")
        final_result = self.run_tests()
        return {
            "success": False,
            "coverage": final_result.coverage,
            "remaining_failures": final_result.errors
        }

    def analyze_code(self) -> dict:
        """Analyze source code for testing opportunities."""
        files = list(self.source_dir.glob("**/*.py"))
        files = [f for f in files if not f.name.startswith("test_")]

        # Get current coverage
        coverage = self._get_coverage()

        # Count functions (simplified)
        function_count = 0
        for f in files:
            content = f.read_text()
            function_count += content.count("def ")

        return {
            "file_count": len(files),
            "function_count": function_count,
            "coverage": coverage,
            "files": [str(f) for f in files]
        }

    def generate_initial_tests(self, analysis: dict):
        """Generate initial test suite."""
        prompt = f"""Generate a comprehensive pytest test suite for these files:

{chr(10).join(analysis['files'])}

For each file, create tests covering:
1. Normal functionality (happy path)
2. Edge cases (empty input, None, boundaries)
3. Error handling (invalid input, exceptions)
4. Integration between functions

Output as complete, runnable Python test files.
Use pytest conventions.
Include necessary imports and fixtures.
"""

        result = self._run_claude(prompt)
        self.session_id = result.get("session_id")

        # Save generated tests
        self._save_tests(result.get("result", ""))

    def generate_additional_tests(self, current_coverage: float):
        """Generate more tests to increase coverage."""
        # Get coverage report to find gaps
        coverage_data = self._get_coverage_details()

        prompt = f"""Current coverage is {current_coverage:.1f}%, target is {self.target_coverage}%.

These lines are not covered:
{json.dumps(coverage_data.get('missing_lines', {}), indent=2)}

Generate additional tests to cover these specific lines.
Focus on the uncovered code paths."""

        result = self._run_claude(prompt, continue_session=True)
        self._save_tests(result.get("result", ""), append=True)

    def run_tests(self) -> TestRunResult:
        """Run pytest and capture results."""
        result = subprocess.run(
            [
                "pytest",
                str(self.test_dir),
                "-v",
                "--tb=short",
                f"--cov={self.source_dir}",
                "--cov-report=json"
            ],
            capture_output=True,
            text=True
        )

        # Parse results
        coverage = self._get_coverage()
        errors = self._parse_failures(result.stdout + result.stderr)

        # Count tests (parse from output)
        passed = result.stdout.count(" PASSED")
        failed = result.stdout.count(" FAILED")

        return TestRunResult(
            passed=result.returncode == 0,
            total_tests=passed + failed,
            passed_tests=passed,
            failed_tests=failed,
            coverage=coverage,
            errors=errors,
            output=result.stdout
        )

    def fix_failures(self, errors: list):
        """Ask Claude to fix failing tests."""
        error_text = "\n\n".join(errors[:5])  # Limit to 5 errors

        prompt = f"""These tests are failing:

{error_text}

Fix each failing test. The issue might be:
1. Test logic is wrong
2. Missing imports or fixtures
3. Incorrect assertions
4. Need for mocking

Provide the corrected test code."""

        result = self._run_claude(prompt, continue_session=True)
        self._update_tests(result.get("result", ""))

    def _run_claude(self, prompt: str, continue_session: bool = False) -> dict:
        """Run Claude headlessly, optionally resuming the stored session."""
        cmd = ["claude", "-p", prompt, "--output-format", "json"]

        if continue_session and self.session_id:
            # Resuming a specific session id requires --resume; --continue takes no id
            cmd.extend(["--resume", self.session_id])

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

        if result.returncode != 0:
            raise Exception(f"Claude error: {result.stderr}")

        return json.loads(result.stdout)

    def _get_coverage(self) -> float:
        """Get current test coverage percentage."""
        try:
            with open("coverage.json") as f:
                data = json.load(f)
                return data.get("totals", {}).get("percent_covered", 0)
        except FileNotFoundError:
            return 0.0

    def _get_coverage_details(self) -> dict:
        """Get detailed coverage information."""
        try:
            with open("coverage.json") as f:
                data = json.load(f)
                missing = {}
                for file, info in data.get("files", {}).items():
                    if info.get("missing_lines"):
                        missing[file] = info["missing_lines"]
                return {"missing_lines": missing}
        except FileNotFoundError:
            return {}

    def _parse_failures(self, output: str) -> list:
        """Extract failure messages from pytest output."""
        failures = []
        lines = output.split("\n")

        in_failure = False
        current_failure = []

        for line in lines:
            if "FAILED" in line:
                in_failure = True
                current_failure = [line]
            elif in_failure:
                if line.startswith("===") or line.startswith("---"):
                    if current_failure:
                        failures.append("\n".join(current_failure))
                    in_failure = False
                    current_failure = []
                else:
                    current_failure.append(line)

        return failures

    def _save_tests(self, content: str, append: bool = False):
        """Save generated test content to files."""
        self.test_dir.mkdir(exist_ok=True)

        # Extract test code from Claude's response
        test_code = self._extract_code(content)

        # Write to test file
        test_file = self.test_dir / "test_generated.py"
        mode = "a" if append else "w"
        with open(test_file, mode) as f:
            f.write(test_code)
            f.write("\n\n")

    def _update_tests(self, content: str):
        """Update test file with fixes."""
        # In practice, you'd parse which tests to update
        # For simplicity, append fixes
        self._save_tests(content, append=True)

    def _extract_code(self, content: str) -> str:
        """Extract Python code from Claude's response."""
        # Look for code blocks
        if "```python" in content:
            start = content.find("```python") + 9
            end = content.find("```", start)
            return content[start:end].strip()
        elif "```" in content:
            start = content.find("```") + 3
            end = content.find("```", start)
            return content[start:end].strip()
        return content


def main():
    import argparse

    parser = argparse.ArgumentParser(description="Automated TDD with Claude")
    parser.add_argument("--source", required=True, help="Source directory")
    parser.add_argument("--target-coverage", type=float, default=80.0)
    parser.add_argument("--max-iterations", type=int, default=5)
    args = parser.parse_args()

    automator = TDDAutomator(
        source_dir=args.source,
        target_coverage=args.target_coverage,
        max_iterations=args.max_iterations
    )

    result = automator.run()

    print("\n" + "=" * 60)
    print("FINAL RESULT")
    print("=" * 60)
    print(f"Success: {result['success']}")
    print(f"Coverage: {result['coverage']:.1f}%")

    if not result['success']:
        print(f"Remaining failures: {len(result.get('remaining_failures', []))}")


if __name__ == "__main__":
    main()

10.2 GitHub Actions Integration

name: Auto Test Generation

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      target_coverage:
        description: 'Target coverage percentage'
        required: false
        default: '80'

jobs:
  generate-tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install pytest pytest-cov
          npm install -g @anthropic-ai/claude-code

      - name: Run TDD Automator
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python auto_test.py \
            --source ./src \
            --target-coverage ${{ github.event.inputs.target_coverage || '80' }}

      - name: Read coverage percentage
        run: |
          echo "COVERAGE=$(python -c 'import json; print(round(json.load(open("coverage.json"))["totals"]["percent_covered"], 1))')" >> "$GITHUB_ENV"

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          file: coverage.json

      - name: Create PR with generated tests
        if: success()
        uses: peter-evans/create-pull-request@v5
        with:
          title: "Auto-generated tests (coverage: ${{ env.COVERAGE }}%)"
          commit-message: "Add auto-generated tests"
          branch: auto-tests
          body: |
            ## Automated Test Generation

            This PR adds auto-generated tests achieving ${{ env.COVERAGE }}% coverage.

            Please review the generated tests for:
            - Correctness of assertions
            - Appropriate edge cases
            - Test clarity and naming

10.3 Prompt Engineering for Better Tests

GENERATION_PROMPT_TEMPLATE = """You are an expert test engineer. Generate comprehensive pytest tests for this code:

```python
{source_code}
```

Requirements:

1. Test Coverage
   - Test every public function/method
   - Include at least 3 tests per function
   - Cover happy path, edge cases, and error handling

2. Test Quality
   - Use descriptive test names: test_<function>_<scenario>_<expected_result>
   - Prefer one assertion per test
   - Use fixtures for setup/teardown
   - Add docstrings explaining what each test verifies

3. Edge Cases to Consider
   - Empty inputs ([], "", None, 0)
   - Boundary values (min, max, off-by-one)
   - Invalid types
   - Large inputs
   - Concurrent access (if applicable)

4. Mocking
   - Mock external dependencies (APIs, databases)
   - Use @pytest.fixture for reusable mocks
   - Use unittest.mock.patch for patching

Output Format:

import pytest
from unittest.mock import Mock, patch
from {module} import {functions}

# Fixtures
@pytest.fixture
def sample_data():
    return {{...}}

# Tests
def test_function_happy_path():
    \"\"\"Verify function works with valid input.\"\"\"
    result = function(valid_input)
    assert result == expected

def test_function_empty_input():
    \"\"\"Verify function handles empty input gracefully.\"\"\"
    result = function([])
    assert result == {{}}

# ... more tests
"""

FIX_FAILURES_PROMPT = """These tests are failing. Fix them:

{failures}

Analysis:

For each failure:

1. Identify if the issue is in the test or the code under test
2. If test issue: fix the test logic, assertions, or setup
3. If code issue: document it but don't change the source code

Output:

Provide the corrected test code. Explain what was wrong and how you fixed it."""
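
Filling the template is then a plain str.format call; a hypothetical usage sketch (paths and names are illustrative):

source = open("src/auth/login.py").read()
prompt = GENERATION_PROMPT_TEMPLATE.format(
    source_code=source,
    module="src.auth.login",
    functions="login, logout, refresh_token",
)
# pass the filled prompt to the headless Claude call from section 10.1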



11. Learning Milestones

| Milestone | Description | Verification |
|-----------|-------------|--------------|
| 1 | Tests generate and save | test_generated.py exists |
| 2 | Tests can be run | pytest runs without crash |
| 3 | Coverage is measured | coverage.json has data |
| 4 | Failures are parsed | Error list extracted |
| 5 | Fixes are applied | Fewer failures next run |
| 6 | Loop terminates correctly | Stops at target or max iterations |
| 7 | CI integration works | GitHub Action runs successfully |


12. Common Pitfalls

12.1 Infinite Loops

# WRONG: No exit condition
while coverage < target:
    fix_failures()
    coverage = run_tests()  # May never improve

# RIGHT: Limit iterations and require progress
last_coverage = -1.0
for i in range(MAX_ITERATIONS):
    coverage = run_tests()
    if coverage >= target:
        break
    if coverage <= last_coverage:
        print("No progress, stopping")
        break
    last_coverage = coverage
    fix_failures()

12.2 Context Loss

# WRONG: New session each time (loses context)
claude("-p", "Fix these tests")
claude("-p", "Now fix these other tests")

# RIGHT: Resume the same session
result = claude("-p", "Generate tests", "--output-format", "json")
session_id = result["session_id"]
claude("-p", "Fix tests", "--resume", session_id)

12.3 Overwhelming Error Context

# WRONG: Send all 50 errors
prompt = f"Fix all these errors:\n{all_errors}"  # Too much!

# RIGHT: Batch errors into small chunks
def chunk_list(items, size):  # split a list into slices of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

for batch in chunk_list(errors, size=5):
    prompt = f"Fix these errors:\n{batch}"
    fix_batch(prompt)

12.4 Ignoring Test Quality

# WRONG: Only care about coverage
if coverage >= 80:
    print("Done!")

# RIGHT: Also check test quality
if coverage >= 80:
    if all_tests_meaningful():  # Check assertions exist
        if no_flaky_tests():
            print("Done!")

13. Extension Ideas

  1. Mutation testing: Verify tests catch injected bugs
  2. Property-based testing: Generate hypothesis tests (see the Hypothesis sketch after this list)
  3. Visual diff: Show before/after test coverage
  4. Test prioritization: Run most likely to fail first
  5. Multi-language: Support JavaScript (Jest), Go, Rust
  6. Team metrics: Track test generation effectiveness over time
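
For the property-based testing idea, a minimal Hypothesis example (the function under test, normalize_email, is hypothetical):

from hypothesis import given, strategies as st

@given(st.emails())
def test_normalize_email_is_idempotent(address):
    """Normalizing twice should give the same result as normalizing once."""
    assert normalize_email(normalize_email(address)) == normalize_email(address)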

14. Summary

This project teaches you to:

  • Automate the TDD cycle with Claude
  • Generate, run, and iterate on tests
  • Parse test output for feedback
  • Track and improve coverage metrics
  • Integrate with CI/CD pipelines

The key insight is that test generation is an iterative process. The first pass rarely achieves perfect coverage. By maintaining session context and feeding failures back to Claude, you create a learning loop that progressively improves the test suite.

Key takeaway: Automated test generation is a force multiplier, not a replacement for human judgment. Use it to create the initial scaffolding, then review and refine. The goal is faster time-to-coverage, not perfect tests on first try.