Project 24: "The Secret Sanitizer Hook" — Secrets Management

Attribute	Value
File	`KIRO_CLI_LEARNING_PROJECTS.md`
Main Programming Language	Python (TruffleHog / Gitleaks)
Coolness Level	Level 3: Genuinely Clever
Business Potential	3. Service & Support (Security)
Difficulty	Level 2: Intermediate
Knowledge Area	Secrets Management

What you’ll build: A PostToolUse hook that scans modified files for secrets.

Why it teaches Safety: Prevents accidental secret leakage.

Success criteria:

A dummy key is detected and blocked.

Real World Outcome

You’ll create a PostToolUse hook that automatically scans every file written or modified by Kiro for secrets (API keys, passwords, tokens, private keys). When a secret is detected, the hook blocks the operation and alerts you immediately.

Example Hook Execution:

# Kiro writes a file containing a secret
$ kiro "create a .env file with DATABASE_URL=postgres://user:pass@localhost/db"

[Kiro writes .env file]

🚨 SECRET DETECTED in .env (line 1)
   Type: PostgreSQL Connection String
   Pattern: postgres://[user]:[password]@[host]/[db]

   DATABASE_URL=postgres://user:pass@localhost/db
                              ^^^^^^^^

❌ BLOCKED: File write operation prevented.

Recommendations:
  1. Use environment variables instead: DATABASE_URL="${DATABASE_URL}"
  2. Add .env to .gitignore
  3. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault)

[Hook exits with code 1 - operation aborted]

Example scan output:

$ python ~/.kiro/hooks/secret-sanitizer.py

Scanning files modified in last operation...
  ✓ src/app.py - Clean
  ✓ src/utils.py - Clean
  🚨 config.yaml - 2 secrets found
     Line 12: AWS Access Key (AKIA...)
     Line 13: AWS Secret Key (40-char base64 string)
  ✓ README.md - Clean

Summary: 2 secrets detected in 1 file
Action: BLOCK operation (exit code 1)

Integration with Git:

# After blocking the write, show what would have been committed
$ git diff

+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

❌ These credentials would have been committed to git!

You’re implementing the same kind of secret detection that hosting platforms such as GitHub, GitLab, and Bitbucket offer to prevent credential leaks.

The Core Question You’re Answering

“How do you prevent developers (and AI agents) from accidentally committing secrets to version control or writing them to unencrypted config files?”

Before you build any detection, understand this: Secret leakage is one of the most common security incidents. Attackers scan public repositories for AWS keys, database passwords, and API tokens. Once a secret is committed to git history, it’s permanently exposed—even if you delete it in a later commit.

Your hook acts as a safety net that catches secrets before they reach git, config files, or logs.

Concepts You Must Understand First

Stop and research these before coding:

Secret Detection Patterns (Entropy, Regex, Signatures)
- What is Shannon entropy and how is it used to detect random strings (API keys)?
- How do you write regex patterns for AWS keys (AKIA…), GitHub tokens (ghp_…), etc.?
- What are false positives (detecting “password” in code comments) and how do you reduce them?
- Book Reference: “Practical Cryptography” by Niels Ferguson and Bruce Schneier - Ch. 2 (Randomness)
Git Internals and Hooks
- What is the difference between PostToolUse hooks (Kiro) vs. pre-commit hooks (Git)?
- How do you scan only the files modified in the last operation (git diff --name-only)?
- Why doesn’t deleting a secret in a new commit remove it from the repository (it remains in earlier commits and the reflog)?
- Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
Secrets Management Best Practices
- What is the principle of least privilege (why you shouldn’t use root credentials)?
- How do environment variables (os.environ) protect secrets from being committed?
- What are secrets managers (AWS Secrets Manager, Vault, 1Password) and when should you use them?
- Book Reference: “Security Engineering” by Ross Anderson - Ch. 4 (Cryptographic Protocols)
TruffleHog and Gitleaks Internals
- How does TruffleHog scan git history for high-entropy strings?
- What is the difference between regex-based detection and entropy-based detection?
- How do you configure custom patterns (YAML rules for company-specific secrets)?
- Book Reference: “Practical Malware Analysis” by Sikorski - Ch. 13 (Automated Analysis)
False Positive Reduction
- How do you distinguish between real secrets and test/example credentials?
- What is allowlisting (explicitly marking known-safe strings)?
- How do you handle encrypted secrets (ansible-vault, sops) vs. plaintext?
- Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 14 (Security Monitoring)
Incident Response for Leaked Secrets
- What do you do if a secret is detected after commit (rotate immediately)?
- How do you scan the entire git history for secrets (git log -p)?
- How does the MITRE ATT&CK framework classify unsecured credentials (technique T1552, under the Credential Access tactic)?
- Book Reference: “The Art of Memory Forensics” by Ligh - Ch. 8 (Malware Analysis)
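To make the entropy question above concrete, here is a minimal sketch of Shannon entropy scoring. The function names, the 20-character minimum, and the 4.0 bits-per-character threshold are illustrative assumptions, not values taken from TruffleHog or Gitleaks.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical threshold for illustration only; tune it against your own data.
ENTROPY_THRESHOLD = 4.0


def looks_random(token: str, min_length: int = 20) -> bool:
    """Flag long tokens whose characters are spread across many distinct symbols."""
    return len(token) >= min_length and shannon_entropy(token) >= ENTROPY_THRESHOLD
```

A repeated word such as "password_password_password" scores low because it reuses a handful of characters, while a 40-character secret key draws on many more distinct symbols and scores higher. Entropy alone still flags hashes and UUIDs, which is why tools usually combine it with format-specific patterns.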

Questions to Guide Your Design

Before implementing, think through these:

Detection Strategy
- Should you scan all files or only modified files (performance trade-off)?
- Do you run detection on every tool use or only on file writes (Edit, Write)?
- Should you scan file content or git diffs (diffs are faster but may miss secrets)?
- How do you handle binary files (images, PDFs) that might contain secrets?
Pattern Library
- Which secret types are highest priority (AWS, GitHub, Stripe, OpenAI)?
- Should you use a pre-built pattern library (Gitleaks rules) or custom regex?
- How do you detect generic secrets (40+ char random strings) vs. specific formats?
- Should you detect passwords in URLs (https://user:pass@example.com)?
Blocking vs. Warning
- Should the hook block the operation (exit code 1) or just warn and continue?
- Do you block on all secrets or only high-confidence detections?
- Should you allow users to override the block (confirmation prompt)?
- How do you handle secrets in test fixtures (tests/fixtures/dummy-key.txt)?
User Experience
- How do you show which line contains the secret without displaying the secret itself?
- Should you suggest remediation steps (use environment variables, add to .gitignore)?
- Do you integrate with the terminal (red error messages, visual alerts)?
- Should you log detections to a file for security auditing?
Performance Optimization
- How do you avoid scanning the same file multiple times in one session?
- Should you cache detection results (file hash → scan result)?
- Do you run scans in parallel (ThreadPoolExecutor) for large codebases?
- How do you handle large files (> 1MB) that slow down scanning?
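One way to approach the caching question is to key scan results by a hash of the file content, so unchanged content is never scanned twice in a session. This is a sketch under assumptions: `scan_bytes_cached`, `scan_file_cached`, and the in-memory dictionary are hypothetical names, and the `scan` callable stands in for whatever detector you choose.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical in-memory cache: SHA-256 of content -> findings for that content.
_scan_cache: Dict[str, List[str]] = {}


def scan_bytes_cached(data: bytes, scan: Callable[[bytes], List[str]]) -> List[str]:
    """Run `scan` only if this exact content has not been scanned before."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in _scan_cache:
        _scan_cache[digest] = scan(data)
    return _scan_cache[digest]


def scan_file_cached(path: Path, scan: Callable[[bytes], List[str]]) -> List[str]:
    """Convenience wrapper that reads the file and delegates to the cache."""
    return scan_bytes_cached(path.read_bytes(), scan)
```

Because the key is the content hash rather than the path, a file that is rewritten with identical content is served from the cache, and any edit produces a new hash and a fresh scan. The cache lives only for the lifetime of the process; persisting it across hook invocations would need a file or database.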

Thinking Exercise

Analyze a Real Secret Leak Scenario

Before coding, manually trace how a secret might leak through Kiro’s workflow:

Scenario: Developer asks Kiro to deploy to AWS

# User prompt
$ kiro "deploy the app to AWS using my credentials"

# Kiro (without the hook) might write:
# deploy.sh
#!/bin/bash
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws s3 sync ./build s3://my-bucket

Where secrets could leak:

deploy.sh file (hardcoded credentials)
Git history (if deploy.sh is committed)
Shell history (~/.bash_history if script is run manually)
Logs (if AWS CLI logs credentials in error messages)
CI/CD logs (a job that echoes environment variables prints them in plain text unless they are registered as masked secrets)

Your hook’s detection points:

# PostToolUse hook fires after Write tool creates deploy.sh
event = {
    'tool': 'Write',
    'input': {'file_path': 'deploy.sh', 'content': '#!/bin/bash\nexport AWS_ACCESS_KEY_ID=AKIA...'},
    'output': {'status': 'success'}
}

# Hook scans the written file
findings = scan_file('deploy.sh')
# Finding 1: AWS Access Key (line 2, pattern: AKIA[A-Z0-9]{16})
# Finding 2: AWS Secret Key (line 3, high entropy: 40 random chars)

# Hook blocks the operation
exit(1)  # Non-zero exit signals failure; the file is already on disk, so confirm that Kiro reverts it or have the hook clean up
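
The `scan_file` call in the trace above is left undefined, so here is one way it might look. This is a sketch under assumptions: the `Finding` type, the two patterns, and the helper names are illustrative, and the 40-character pattern only marks candidates (it also matches things like SHA-1 hashes).

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Illustrative patterns only; real rule sets such as Gitleaks' are far larger.
PATTERNS = {
    "AWS Access Key": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    # Exactly 40 base64-style characters, not embedded in a longer run.
    "AWS Secret Key (candidate)": re.compile(
        r"(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])"
    ),
}


@dataclass(frozen=True)
class Finding:
    line: int  # 1-based line number
    rule: str  # name of the pattern that matched


def scan_text(text: str) -> List[Finding]:
    """Check every line against every pattern and collect the matches."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(line=number, rule=rule))
    return findings


def scan_file(path: str) -> List[Finding]:
    """Read a file as text (replacing undecodable bytes) and scan it."""
    return scan_text(Path(path).read_text(encoding="utf-8", errors="replace"))
```

Run against the deploy.sh content above, this reports the access key on line 2 and the secret key candidate on line 3. The findings deliberately carry only the line number and rule name, so the secret itself never has to appear in the hook's output.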

Questions while tracing:

At what point in the workflow should the hook scan for secrets?
Should you scan the file content or the tool input parameters?
What happens if the secret was copied from the user’s prompt (user provided it)?
How do you prevent false positives (AKIAIOSFODNN7EXAMPLE is a documented example key)?
Should you automatically suggest export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" as a fix?
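
For the false-positive question, one simple option is an explicit allowlist of known example credentials. The sketch below assumes that the matched secret strings are available to the hook; `KNOWN_EXAMPLE_SECRETS` and `filter_allowlisted` are hypothetical names, and the two values are the documented example keys already used in this project.

```python
from typing import FrozenSet, Iterable, List

# Documented example credentials, safe to ignore; extend per project.
KNOWN_EXAMPLE_SECRETS: FrozenSet[str] = frozenset({
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
})


def filter_allowlisted(
    secrets: Iterable[str],
    allowlist: FrozenSet[str] = KNOWN_EXAMPLE_SECRETS,
) -> List[str]:
    """Drop matches that are exactly equal to an allowlisted example value."""
    return [secret for secret in secrets if secret not in allowlist]
```

Exact-match allowlisting is deliberately narrow: a real key that merely resembles an example still gets reported. Broader rules, such as ignoring anything that ends in "EXAMPLE", reduce noise further but give an attacker or a careless commit an easy way past the hook.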

Manual test:

# 1. Create a test file with a fake secret
echo "API_KEY=sk_test_4eC39HqLyjWDarjtT1zdp7dc" > .env

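# 2. Run Gitleaks on the working tree (--no-git scans files on disk, so the uncommitted .env is included)
gitleaks detect --source . --no-git --verbose
# Leak detected: Generic API Key (line 1)

# 3. Add the finding's fingerprint to the allowlist and re-run
# (.gitleaksignore holds the Fingerprint values from the verbose output, not the secret text)
echo "<fingerprint from step 2>" >> .gitleaksignore
gitleaks detect --source . --no-git --verbose
# No leaks detected (allowlisted)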

The Interview Questions They’ll Ask

Prepare to answer these:

“How do secret detection tools like TruffleHog distinguish between real API keys and random strings in the code?”
“A developer committed an AWS access key to git 50 commits ago. What steps would you take to remediate this incident?”
“What is the difference between entropy-based secret detection and regex-based detection? When would you use each?”
“How would you handle false positives, such as detecting ‘password’ in code comments or test data?”
“Why is deleting a secret from the latest commit insufficient to secure the repository?”
“What are the performance trade-offs between scanning on every file write vs. scanning only on git commit?”

Hints in Layers

Hint 1: Start with Pre-Built Tools

Don’t write regex patterns from scratch. Use TruffleHog or Gitleaks, which have hundreds of pre-built patterns for common secret types (AWS, GitHub, Stripe, etc.).

Hint 2: Hook Event Structure

The PostToolUse hook receives a JSON event on stdin:

{
  "hookType": "PostToolUse",
  "tool": {"name": "Write", "input": {"file_path": "config.yaml", "content": "..."}, "output": {"status": "success"}}
}

Extract file_path and scan it.
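
A minimal sketch of that extraction step follows. It assumes the event has exactly the shape shown above (a `tool` object with `name` and `input.file_path`); check the schema your Kiro version actually emits, since field names may differ. `extract_file_path` and `FILE_WRITING_TOOLS` are hypothetical names.

```python
import json
from typing import Optional

# Tool names taken from this document's examples; adjust to your environment.
FILE_WRITING_TOOLS = ("Write", "Edit")


def extract_file_path(raw_event: str) -> Optional[str]:
    """Return the written file's path, or None if the event is not a file write."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return None
    tool = event.get("tool") if isinstance(event, dict) else None
    if not isinstance(tool, dict) or tool.get("name") not in FILE_WRITING_TOOLS:
        return None
    tool_input = tool.get("input")
    if not isinstance(tool_input, dict):
        return None
    path = tool_input.get("file_path")
    return path if isinstance(path, str) else None
```

In the hook itself you would call `extract_file_path(sys.stdin.read())` and skip scanning when it returns None. Returning None for malformed input keeps the hook from crashing on events it does not understand.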

Hint 3: Integrate Gitleaks for Fast Scanning

# Install Gitleaks
brew install gitleaks  # macOS
# or download binary from https://github.com/gitleaks/gitleaks/releases

# Scan a single file
gitleaks detect --source /path/to/file --verbose --no-git

# Parse JSON output
gitleaks detect --source . --report-format json --report-path results.json
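
The commands above can be driven from Python roughly as follows. This is a sketch, assuming `gitleaks` is on your PATH and that the JSON report is a list of objects with `File`, `StartLine`, and `RuleID` keys, which is my understanding of the Gitleaks v8 report; verify against the version you install. The function names are hypothetical.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List


def build_gitleaks_command(target: str, report_path: str) -> List[str]:
    """Assemble the same flags used in the shell examples above."""
    return ["gitleaks", "detect", "--source", target, "--no-git",
            "--report-format", "json", "--report-path", report_path]


def summarize_report(report_json: str) -> List[Dict[str, object]]:
    """Reduce a JSON report to file, line, and rule, leaving the secret out."""
    findings = json.loads(report_json or "[]") or []
    return [
        {"file": f.get("File"), "line": f.get("StartLine"), "rule": f.get("RuleID")}
        for f in findings
    ]


def run_gitleaks(target: str) -> List[Dict[str, object]]:
    """Scan `target`; raises FileNotFoundError if gitleaks is not installed."""
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.json"
        # check=False: a non-zero exit is also how gitleaks reports leaks.
        subprocess.run(build_gitleaks_command(target, str(report)),
                       capture_output=True, text=True, check=False)
        if not report.exists():
            return []
        return summarize_report(report.read_text(encoding="utf-8"))
```

Writing the report to a temporary directory keeps scan artifacts, which contain the matched secrets, out of the project tree.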

Hint 4: Exit Code Semantics

Exit 0: No secrets found (allow operation)
Exit 1: Secrets found (block operation and revert changes)
Use sys.exit(1) in Python or exit 1 in Bash
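
A small sketch of how the exit code and the user-facing message might fit together. The `mask` and `report` helpers and the `(file, line, secret)` tuple layout are assumptions made for illustration; the exit codes follow the semantics listed above.

```python
import sys
from typing import List, Tuple


def mask(secret: str, keep: int = 4) -> str:
    """Keep the first `keep` characters and replace the rest with asterisks."""
    if len(secret) <= keep:
        return "*" * len(secret)
    return secret[:keep] + "*" * (len(secret) - keep)


def report(findings: List[Tuple[str, int, str]]) -> int:
    """Print masked findings to stderr and return the hook's exit code."""
    for file, line, secret in findings:
        print(f"SECRET DETECTED in {file} (line {line}): {mask(secret)}",
              file=sys.stderr)
    return 1 if findings else 0
```

The hook would end with `sys.exit(report(findings))`. Masking keeps enough of the prefix (for example `AKIA`) to identify the secret type without echoing the full value into terminals or logs.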

Books That Will Help

Topic	Book	Chapter
Secret Detection Theory	“Practical Cryptography” by Ferguson and Schneier	Ch. 2 (Randomness)
Git Internals	“Pro Git” by Scott Chacon	Ch. 10 (Git Internals)
Secrets Management	“Security Engineering” by Ross Anderson	Ch. 4 (Cryptographic Protocols)
Entropy Analysis	“Applied Cryptography” by Bruce Schneier	Ch. 17 (Randomness)
Incident Response	“The Art of Memory Forensics” by Ligh	Ch. 8 (Malware Analysis)
Secure Systems	“Building Secure and Reliable Systems” by Google	Ch. 14 (Security Monitoring)

Common Pitfalls & Debugging

Problem 1: “Too many false positives (detecting ‘password’ in comments)”

Why: Regex patterns are too broad and match non-secrets.
Fix: Use entropy analysis (only flag strings with high randomness) or context-aware patterns (exclude comments).
Quick test: gitleaks detect --source . --verbose | grep "password" (review all matches)

Problem 2: “Secrets in git history not detected by the hook”

Why: The hook only scans new changes, not the entire git history.
Fix: Run a one-time full repo scan: gitleaks detect --source . --verbose
Quick test: git log -p | grep -E 'AKIA[A-Z0-9]{16}' (manual search for AWS keys)

Problem 3: “Hook blocks valid test fixtures (tests/fixtures/dummy-key.txt)”

Why: Test data often includes fake secrets for testing.
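Fix: Exclude test directories with a path allowlist in .gitleaks.toml, or add the individual finding fingerprints to .gitleaksignore (which lists fingerprints, not paths).
Quick test: Add a paths entry for tests/ under [allowlist] in .gitleaks.toml, then re-run the scan.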

Problem 4: “Performance degradation on large repos (> 1000 files)”

Why: Scanning every file on every tool use is too slow.
Fix: Only scan files modified in the last operation (use git diff --name-only).
Quick test: time gitleaks detect --source . (measure scan time before/after optimization)
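
The fix above might be wired up as in the sketch below, which assumes the hook runs inside a git working tree with `git` on the PATH. The function names are hypothetical. Note that `git diff --name-only` lists only tracked files, so a brand-new file such as .env is picked up separately with `git ls-files --others --exclude-standard`.

```python
import subprocess
from typing import List


def parse_name_only(output: str) -> List[str]:
    """Split git's one-path-per-line output, dropping blank lines."""
    return [line for line in output.splitlines() if line.strip()]


def _git_lines(args: List[str], repo_dir: str) -> List[str]:
    """Run a git command in `repo_dir` and return its output as a path list."""
    result = subprocess.run(["git", *args], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return parse_name_only(result.stdout)


def files_to_scan(repo_dir: str = ".") -> List[str]:
    """Tracked files with unstaged changes plus untracked, non-ignored files."""
    modified = _git_lines(["diff", "--name-only"], repo_dir)
    untracked = _git_lines(["ls-files", "--others", "--exclude-standard"], repo_dir)
    return sorted(set(modified + untracked))
```

If the hook event already names the file that was written, scanning just that path is cheaper still; the git-based list is a fallback for tools that touch several files at once.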

Problem 5: “Secrets in environment variables not detected”

Why: The hook scans files, but secrets might be passed via export VAR=secret in the shell.
Fix: Scan shell history (~/.bash_history) or intercept the Bash tool’s input.
Quick test: grep -E 'export.*KEY' ~/.bash_history
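
To intercept the Bash tool's input rather than shell history, the hook could inspect the command string for hardcoded exports. This is a rough sketch: how the command reaches the hook depends on the event schema, and the name heuristic (`KEY`, `SECRET`, `TOKEN`, `PASSWORD`) and function name are assumptions chosen for illustration.

```python
import re
from typing import List

# export NAME=value, with optional matching single or double quotes.
EXPORT_ASSIGNMENT = re.compile(
    r"""\bexport\s+([A-Za-z_][A-Za-z0-9_]*)=(["']?)([^\s"']+)\2"""
)
SENSITIVE_NAME = re.compile(r"KEY|SECRET|TOKEN|PASSWORD", re.IGNORECASE)


def risky_exports(command: str) -> List[str]:
    """Return names of sensitive-looking variables assigned a literal value."""
    names = []
    for match in EXPORT_ASSIGNMENT.finditer(command):
        name, value = match.group(1), match.group(3)
        # Values that start with "$" are references, not hardcoded secrets.
        if SENSITIVE_NAME.search(name) and not value.startswith("$"):
            names.append(name)
    return names
```

A command like `export AWS_SECRET_ACCESS_KEY=abc123 && aws s3 ls` is flagged, while `export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}"` and `export PATH=/usr/bin` are not. A name-based heuristic misses secrets stored under innocuous variable names, so treat it as a complement to content scanning.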

Problem 6: “No notification when secret is blocked”

Why: The hook exits with code 1, but Kiro doesn’t show the hook’s stderr output.
Fix: Write findings to a log file (~/.kiro/secret-findings.log) and show the path in the error.
Quick test: tail -f ~/.kiro/secret-findings.log (monitor detections in real-time)
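
The log file from the fix above could be written as one JSON object per line, which keeps it easy to `tail` and to parse later. This is a sketch, with hypothetical function names and field layout; it records the file, line, and rule but not the secret value.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def format_log_entry(file: str, line: int, rule: str, timestamp: str) -> str:
    """Serialize one finding as a single-line JSON record."""
    return json.dumps(
        {"time": timestamp, "file": file, "line": line, "rule": rule},
        sort_keys=True,
    )


def append_finding(log_path: Path, file: str, line: int, rule: str) -> None:
    """Append a timestamped record, creating the log directory if needed."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(format_log_entry(file, line, rule, timestamp) + "\n")
```

For the path used in this section you would call `append_finding(Path.home() / ".kiro" / "secret-findings.log", "config.yaml", 12, "aws-access-key")`. Leaving the secret out of the record keeps the audit log from becoming a second copy of every leaked credential.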