Project 3: Codebase Refactoring Toolkit
Build a safe, repeatable refactoring toolkit using grep, sed, and find.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | Shell + sed/grep/find |
| Prerequisites | sed and find basics |
| Key Topics | in-place edits, safety, diffing |
1. Learning Objectives
By completing this project, you will:
- Precisely target files for refactoring with find predicates.
- Build safe multi-file edits with sed and backups.
- Validate changes with diff and grep checks.
- Create idempotent refactor scripts.
- Reduce manual refactoring risk and effort.
2. Theoretical Foundation
2.1 Core Concepts
- Targeting: The file set is the boundary of correctness. Mistakes here cause widespread damage.
- In-place Editing: sed -i is powerful but dangerous; use backups and dry runs.
- Idempotence: Refactor scripts should be safe to run multiple times.
- Verification: Always verify output using diffs and counts before committing.
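A minimal sketch of the idempotence concept, assuming a hypothetical rename from getUser to getUserById. Because the new name contains the old one, a naive rule matches again on a second run; the guarded rule also consumes an existing ById suffix, so reruns change nothing. It uses sed -E (extended regex), which GNU and BSD sed both accept.

```shell
work=$(mktemp -d)
printf 'x = getUser(1)\n' > "$work/a.txt"

# Naive rule: the new name contains the old one, so a second run matches again.
naive='s/getUser/getUserById/g'
once=$(sed "$naive" "$work/a.txt")
twice=$(printf '%s\n' "$once" | sed "$naive")

# Guarded rule: also consume an existing "ById" suffix so reruns are no-ops.
guarded='s/getUser(ById)?/getUserById/g'
g_once=$(sed -E "$guarded" "$work/a.txt")
g_twice=$(printf '%s\n' "$g_once" | sed -E "$guarded")

printf 'naive:   %s -> %s\n' "$once" "$twice"
printf 'guarded: %s -> %s\n' "$g_once" "$g_twice"
```

The guard is specific to this pair of names; other renames may need a different guard, or a check that skips files where no old name remains.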
2.2 Why This Matters
Large codebases require automation. The ability to refactor safely across thousands of files is essential in professional environments.
2.3 Historical Context / Background
Before modern IDE refactors, engineers relied on Unix text tools to perform large-scale transformations. These tools remain a fast, scriptable option for bulk changes and automation.
2.4 Common Misconceptions
- “sed is unsafe”: It is safe with backups and validation.
- “find is just search”: It is a logical query engine for file metadata.
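To illustrate find as a query engine rather than a plain search, here is a small sketch that combines predicates with -prune and -o. The directory layout (src/, vendor/, .git/) is hypothetical and is created in a temporary directory.

```shell
# Hypothetical layout created in a temp directory for illustration.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/vendor" "$work/.git"
touch "$work/src/app.py" "$work/src/util.py" "$work/src/notes.md"
touch "$work/vendor/lib.py" "$work/.git/config"

# Prune vendor/ and .git/, then keep only regular *.py files.
# -prune stops descent; -o (OR) lets the right-hand side apply everywhere else.
find "$work" \( -name vendor -o -name .git \) -prune \
     -o -type f -name '*.py' -print | sort > "$work/files.txt"

cat "$work/files.txt"
```

Only the two .py files under src/ end up in files.txt; vendor/lib.py is excluded even though it matches the name pattern.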
3. Project Specification
3.1 What You Will Build
A toolkit of scripts that can:
- Rename functions/variables across files
- Update deprecated API usage
- Normalize formatting patterns
- Generate before/after diff reports
3.2 Functional Requirements
- Target selection: Use find filters for file types and directories.
- Dry-run preview: Show diffs before applying.
- Safe edits: Apply changes with backups.
- Verification: Use grep to confirm replacements.
3.3 Non-Functional Requirements
- Safety: No destructive changes without confirmation.
- Performance: Handle large repos efficiently.
- Transparency: Log all modified files.
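One possible way to meet the Safety requirement is a confirmation gate in front of the apply step. The confirm helper and the prompt text below are hypothetical; the sketch defaults to "no" on any reply other than y or Y, and on empty input.

```shell
# confirm PROMPT: succeeds only when the reply read from stdin is "y" or "Y".
confirm() {
  printf '%s [y/N] ' "$1" >&2
  IFS= read -r reply || return 1
  case $reply in y|Y) return 0 ;; *) return 1 ;; esac
}

# Usage: the reply is piped in here; interactively it would come from the terminal.
if printf 'n\n' | confirm "Apply changes to 42 files?"; then
  echo "applying"
else
  echo "aborted, nothing changed"
fi
```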
3.4 Example Usage / Output
$ ./refactor.sh --rename oldFunc newFunc --path src/
Updated 42 files
See refactor-report.txt for diff summary
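The usage line above implies a small flag parser. The following is a sketch of one, not a required design: the parse_args function, its variable names, and the default path are assumptions, and --dry-run is borrowed from the extension listed in Section 8.1.

```shell
# Hypothetical argument parser for the refactor.sh interface shown above.
parse_args() {
  old= new= path=. dry_run=0
  while [ $# -gt 0 ]; do
    case $1 in
      --rename)
        [ $# -ge 3 ] || { echo "--rename needs OLD NEW" >&2; return 2; }
        old=$2; new=$3; shift 3 ;;
      --path)
        [ $# -ge 2 ] || { echo "--path needs DIR" >&2; return 2; }
        path=$2; shift 2 ;;
      --dry-run) dry_run=1; shift ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  # A rename pair is mandatory; everything else has a default.
  [ -n "$old" ] && [ -n "$new" ] || {
    echo "usage: refactor.sh --rename OLD NEW [--path DIR] [--dry-run]" >&2
    return 2
  }
}

parse_args --rename oldFunc newFunc --path src/
echo "rename $old -> $new under $path (dry_run=$dry_run)"
```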
3.5 Real World Outcome
After a refactor run, you have:
- A list of modified files
- .bak backups for rollback
- A diff report showing all changes
This lets you validate changes before commit and revert safely if needed.
4. Solution Architecture
4.1 High-Level Design
+---------+ +---------+ +---------+ +---------+
| Finder | -> | Preview | -> | Apply | -> | Verify |
+---------+ +---------+ +---------+ +---------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Finder | Build file list | find predicates |
| Preview | Show diffs | diff vs sed -n |
| Apply | Modify files | sed -i with backups |
| Verify | Confirm results | grep counts |
4.3 Data Structures
# file list and change logs
files.txt
refactor-report.txt
4.4 Algorithm Overview
Key Algorithm: Safe refactor
- Generate file list with find.
- Preview changes with sed -n (using the p flag on the substitution) and diff.
- Apply changes with sed -i.bak.
- Verify results with grep counts.
Complexity Analysis:
- Time: O(F * L) where F files, L lines per file
- Space: O(F) for the file list, plus backup copies proportional to the total size of the modified files
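The four steps of the safe refactor algorithm can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the sample files are created in a temporary directory, the names oldFunc and newFunc contain no regex metacharacters or slashes, and the .js filter is arbitrary. files.txt and refactor-report.txt follow the names used in Section 4.3.

```shell
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'oldFunc(1)\noldFunc(2)\n' > "$work/src/a.js"
printf 'noop()\n' > "$work/src/b.js"

old=oldFunc; new=newFunc

# 1. Generate the file list: only files that actually contain the old name.
find "$work/src" -type f -name '*.js' -exec grep -l -- "$old" {} + > "$work/files.txt"

# 2. Preview: diff each file against its would-be result, changing nothing.
while IFS= read -r f; do
  sed "s/$old/$new/g" "$f" | diff -u "$f" - || true   # diff exits 1 on differences
done < "$work/files.txt" > "$work/refactor-report.txt"

# 3. Apply in place, keeping a .bak copy of every edited file.
while IFS= read -r f; do
  sed -i.bak "s/$old/$new/g" "$f"
done < "$work/files.txt"

# 4. Verify: count lines that still mention the old name (backups are excluded
#    because a.js.bak does not match '*.js').
remaining=$(find "$work/src" -type f -name '*.js' -exec cat {} + | grep -c -- "$old")
echo "remaining lines with $old: $remaining"
```

Filtering the list with grep -l in step 1 means untouched files get no backup, which keeps files.txt usable as the log of modified files.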
5. Implementation Guide
5.1 Development Environment Setup
chmod +x refactor.sh
5.2 Project Structure
refactor/
├── refactor.sh
├── rules/
│ └── rename.conf
└── README.md
5.3 The Core Question You Are Answering
“How do I refactor a large codebase safely without relying on IDE automation?”
5.4 Concepts You Must Understand First
- sed addressing and substitution
- find predicates and logical operators
- diff output interpretation
5.5 Questions to Guide Your Design
- How do you ensure only target files are changed?
- How do you roll back if something goes wrong?
- How do you detect partial replacements?
5.6 Thinking Exercise
Pick a small file, manually apply a sed substitution, then confirm with diff. Repeat until it is reliable.
5.7 The Interview Questions They Will Ask
- Why is a dry run important?
- How do you ensure refactor scripts are idempotent?
- What is the role of find in large refactors?
5.8 Hints in Layers
Hint 1: Start with a small directory and one replacement.
Hint 2: Add backup files with -i.bak.
Hint 3: Add verification by counting matches after replacement.
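Hint 3 can be sketched as a before/after occurrence count. The count helper is hypothetical, and the check assumes the new name did not need to be distinguished from other text in the file. grep -o is not in POSIX but is available in GNU and BSD grep.

```shell
work=$(mktemp -d)
printf 'oldFunc(oldFunc(1))\nprint("ok")\n' > "$work/a.py"

# Count occurrences (not lines): grep -o emits one line per match.
count() { grep -o -- "$1" "$2" | wc -l | tr -d ' '; }

old_before=$(count oldFunc "$work/a.py")
new_before=$(count newFunc "$work/a.py")

sed -i.bak 's/oldFunc/newFunc/g' "$work/a.py"

old_after=$(count oldFunc "$work/a.py")
new_after=$(count newFunc "$work/a.py")

# Expect: nothing left over, and every old occurrence became a new one.
if [ "$old_after" -eq 0 ] && [ "$new_after" -eq $((new_before + old_before)) ]; then
  echo "verified: $old_before replacements"
else
  echo "mismatch: $old_after left over" >&2
fi
```

Counting occurrences rather than lines matters here: a missing g flag would leave the second oldFunc on the first line untouched, and a line count alone would not reveal how many were missed.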
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| sed scripts | “Sed & Awk” | Ch. 4-5 |
| pipelines | “The Linux Command Line” | Ch. 6 |
5.10 Implementation Phases
Phase 1: Targeting
- Build find filters
- Confirm file list
Phase 2: Safe edits
- Apply sed with backups
Phase 3: Verification
- Diff and grep checks
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Backups | none vs .bak | use .bak | rollback safety |
| Preview | diff vs sed -n | diff | clarity |
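To make the Preview decision concrete, this sketch runs both options on the same hypothetical file. Neither command modifies the file.

```shell
work=$(mktemp -d)
printf 'keep()\noldFunc(1)\nkeep()\n' > "$work/a.js"

# Option 1: sed -n with the p flag prints only the lines the rule would change.
sed_preview=$(sed -n 's/oldFunc/newFunc/gp' "$work/a.js")

# Option 2: diff shows the same change with surrounding context and line numbers.
diff_preview=$(sed 's/oldFunc/newFunc/g' "$work/a.js" | diff -u "$work/a.js" - || true)

printf 'sed -n preview:\n%s\n\ndiff preview:\n%s\n' "$sed_preview" "$diff_preview"
```

The sed -n output is compact but shows only the result, while the unified diff shows old and new lines side by side, which is the clarity the recommendation refers to.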
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Sed rules | match counts |
| Integration Tests | Full run | dry-run + apply |
| Edge Cases | Unusual inputs | binary files are skipped |
6.2 Critical Test Cases
- Replace only in target files.
- Verify no leftover old symbols.
- Rollback from backups works.
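The rollback test case might look like the sketch below. It assumes backups were written with the .bak suffix next to each edited file; the sample file and expected.txt are created only for the check.

```shell
work=$(mktemp -d)
printf 'oldFunc()\n' > "$work/a.c"
cp "$work/a.c" "$work/expected.txt"

sed -i.bak 's/oldFunc/newFunc/' "$work/a.c"

# Roll back: move each .bak file over the edited file it was copied from.
find "$work" -type f -name '*.bak' | while IFS= read -r bak; do
  mv -- "$bak" "${bak%.bak}"
done

# The restored file should be byte-identical to the original.
cmp -s "$work/a.c" "$work/expected.txt" && echo "rollback OK"
```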
7. Common Pitfalls and Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Overbroad find | Unexpected edits | refine predicates |
| Missing backups | irrecoverable changes | use -i.bak |
| Regex mismatch | partial changes | test on sample |
8. Extensions and Challenges
8.1 Beginner Extensions
- Add --dry-run mode
8.2 Intermediate Extensions
- Add config-driven rule sets
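One way to approach config-driven rules is to compile a rules file into a sed script. The format below (one "OLD NEW" pair per line, # for comments) is an assumption, not a defined format for rules/rename.conf, and it only works for names without slashes or regex metacharacters.

```shell
work=$(mktemp -d)
# Hypothetical rules file: one "OLD NEW" pair per line, # starts a comment.
cat > "$work/rename.conf" <<'EOF'
# old        new
fetchUser    loadUser
parseCfg     parseConfig
EOF
printf 'fetchUser(parseCfg(x))\n' > "$work/a.js"

# Turn each rule into an "s/OLD/NEW/g" line, skipping comments and blank lines,
# then run all rules as one sed script.
awk '!/^[[:space:]]*(#|$)/ { printf "s/%s/%s/g\n", $1, $2 }' \
    "$work/rename.conf" > "$work/rules.sed"

sed -f "$work/rules.sed" "$work/a.js"
```

Compiling to a single script means each file is read once regardless of how many rules the config contains.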
8.3 Advanced Extensions
- Add parallel execution with xargs -P
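A possible starting point for the parallel extension is shown below. The -print0, -0, and -P options are assumed to be available (they are in GNU and BSD find/xargs); the batch size and worker count are arbitrary, and the sample files are generated for the demonstration.

```shell
work=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  printf 'oldFunc(%s)\n' "$i" > "$work/f$i.txt"
done

# -print0 / -0 keep unusual filenames intact; -P 4 runs up to four sed
# processes at once, and -n 2 hands each one at most two files.
# Every file is edited by exactly one process, so edits cannot collide.
find "$work" -type f -name '*.txt' -print0 |
  xargs -0 -n 2 -P 4 sed -i.bak 's/oldFunc/newFunc/g'

left=$(cat "$work"/*.txt | grep -c oldFunc)
echo "lines still containing oldFunc: $left"
```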
9. Real-World Connections
- API migrations in large repos
- Security patch rollouts
10. Resources
- GNU sed manual
- The Grymoire sed tutorial
11. Self-Assessment Checklist
- I can explain why find predicates matter
- I can roll back from backups
12. Submission / Completion Criteria
Minimum Viable Completion:
- Safe rename across multiple files
Full Completion:
- Preview + verification steps
Excellence (Going Above and Beyond):
- Rule sets + parallel execution