Project 3: Codebase Refactoring Toolkit

Build a safe, repeatable refactoring toolkit using grep, sed, and find.

Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   2-3 weeks
Language        Shell + sed/grep/find
Prerequisites   sed and find basics
Key Topics      in-place edits, safety, diffing

1. Learning Objectives

By completing this project, you will:

  1. Precisely target files for refactoring with find predicates.
  2. Build safe multi-file edits with sed and backups.
  3. Validate changes with diff and grep checks.
  4. Create idempotent refactor scripts.
  5. Reduce manual refactoring risk and effort.

2. Theoretical Foundation

2.1 Core Concepts

  • Targeting: The file set is the boundary of correctness. Mistakes here cause widespread damage.
  • In-place Editing: sed -i overwrites files with no undo; pair it with a backup suffix (-i.bak) and a dry run first.
  • Idempotence: Refactor scripts should be safe to run multiple times.
  • Verification: Always verify output using diffs and counts before committing.
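
The idempotence point can be illustrated with a minimal sketch. It assumes a sed that accepts an attached backup suffix (-i.bak, as GNU and BSD sed do). The names oldFunc/newFunc and the helper rename_once are hypothetical. The rename is idempotent because the replacement text does not contain the search pattern; the grep guard additionally keeps a second run from overwriting the first run's .bak file.

```shell
#!/bin/sh
# Hypothetical demo: run the same rename twice and compare the results.
work=$(mktemp -d)
printf 'int oldFunc(void);\nint x = oldFunc();\n' > "$work/a.c"

rename_once() {
    # Skip files with nothing to change, so a rerun does not replace
    # the original .bak with already-edited content.
    if grep -q 'oldFunc' "$1"; then
        sed -i.bak 's/oldFunc/newFunc/g' "$1"
    fi
}

rename_once "$work/a.c"
first=$(cat "$work/a.c")
rename_once "$work/a.c"
second=$(cat "$work/a.c")

if [ "$first" = "$second" ]; then
    echo "idempotent: second run changed nothing"
fi
```

Without the guard, sed -i.bak would write a fresh backup on every run, even when no line changes, and the original content would be lost after the second run.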

2.2 Why This Matters

Manual edits do not scale to large codebases. Refactoring safely across thousands of files requires automation, and the ability to do so is essential in professional environments.

2.3 Historical Context / Background

Before modern IDE refactors, engineers relied on Unix text tools to perform large-scale transformations. These tools remain a fast, scriptable option for bulk changes and automation.

2.4 Common Misconceptions

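  • “sed is unsafe”: The risk comes from untested patterns and missing backups, not from sed itself. With backups and validation it is safe.
  • “find is just search”: It evaluates boolean expressions over file metadata (name, type, size, modification time), which makes it a logical query engine rather than a simple search.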
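
To make the "query engine" idea concrete, here is a small sketch that combines find tests with OR, grouping, and -prune. The directory layout (src, node_modules, .git) is a made-up fixture, not something this project prescribes.

```shell
#!/bin/sh
# Hypothetical fixture: a source tree with vendored and VCS directories.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/node_modules/lib" "$work/.git"
touch "$work/src/main.c" "$work/src/util.h" "$work/src/notes.txt" \
      "$work/node_modules/lib/dep.c" "$work/.git/config"

# Prune the vendored and VCS directories, then keep regular files whose
# names end in .c or .h. The escaped parentheses group the -o (OR) tests.
find "$work" \
    \( -name node_modules -o -name .git \) -prune -o \
    -type f \( -name '*.c' -o -name '*.h' \) -print | sort > "$work/files.txt"

cat "$work/files.txt"
```

Only src/main.c and src/util.h are listed: dep.c is excluded because its parent directory is pruned, and notes.txt fails the name test.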

3. Project Specification

3.1 What You Will Build

A toolkit of scripts that can:

  • Rename functions/variables across files
  • Update deprecated API usage
  • Normalize formatting patterns
  • Generate before/after diff reports

3.2 Functional Requirements

  1. Target selection: Use find filters for file types and directories.
  2. Dry-run preview: Show diffs before applying.
  3. Safe edits: Apply changes with backups.
  4. Verification: Use grep to confirm replacements.
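
One possible shape for the dry-run preview requirement is sketched below. The function name preview and the sample file are hypothetical. The sketch relies on sed writing to standard output when -i is absent, and on diff reading standard input when given "-" as an operand.

```shell
#!/bin/sh
# Hypothetical sample file for the preview.
work=$(mktemp -d)
printf 'call oldFunc\nkeep this line\n' > "$work/a.txt"

preview() {
    # Without -i, sed prints the edited text and leaves the file alone.
    # diff exits 1 when the inputs differ, which is the expected case
    # here, so that status is deliberately ignored.
    sed 's/oldFunc/newFunc/g' "$1" | diff -u "$1" - || true
}

preview "$work/a.txt"
```

The file on disk is unchanged after the preview, so the same command can be rerun while the pattern is being tuned.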

3.3 Non-Functional Requirements

  • Safety: No destructive changes without confirmation.
  • Performance: Handle large repos efficiently.
  • Transparency: Log all modified files.

3.4 Example Usage / Output

$ ./refactor.sh --rename oldFunc newFunc --path src/
Updated 42 files
See refactor-report.txt for diff summary

3.5 Real World Outcome

After a refactor run, you have:

  • A list of modified files
  • .bak backups for rollback
  • A diff report showing all changes

This lets you validate changes before commit and revert safely if needed.
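
A rollback from the .bak files might look like the following sketch. The function name rollback is hypothetical, and the loop assumes file names do not contain newlines.

```shell
#!/bin/sh
# Hypothetical setup: one file edited in place with a .bak backup.
work=$(mktemp -d)
printf 'oldFunc()\n' > "$work/a.c"
sed -i.bak 's/oldFunc/newFunc/g' "$work/a.c"

rollback() {
    # Move every *.bak under the given root back over its edited file.
    # ${bak%.bak} strips the trailing .bak to recover the original name.
    find "$1" -type f -name '*.bak' | while IFS= read -r bak; do
        mv -- "$bak" "${bak%.bak}"
    done
}

rollback "$work"
cat "$work/a.c"
```

Because mv replaces the edited file and removes the backup in one step, a second rollback finds nothing to do.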


4. Solution Architecture

4.1 High-Level Design

+---------+    +---------+    +---------+    +---------+
| Finder  | -> | Preview | -> | Apply   | -> | Verify  |
+---------+    +---------+    +---------+    +---------+

4.2 Key Components

Component   Responsibility    Key Decisions
Finder      Build file list   find predicates
Preview     Show diffs        diff vs sed -n
Apply       Modify files      sed -i with backups
Verify      Confirm results   grep counts

4.3 Data Structures

# file list and change logs
files.txt
refactor-report.txt

4.4 Algorithm Overview

Key Algorithm: Safe refactor

  1. Generate file list with find.
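  2. Preview changes by running sed without -i and comparing its output to the original with diff (or use sed -n with the p flag to print only the lines that would change).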
  3. Apply changes with sed -i.bak.
  4. Verify results with grep counts.
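
The four steps can be sketched end to end as follows. This is a simplified illustration, not the finished refactor.sh. It assumes the old and new names are plain identifiers (no regex metacharacters or slashes), that file names contain no newlines, and that sed accepts -i.bak. The fixture files are made up.

```shell
#!/bin/sh
old=oldFunc
new=newFunc
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'oldFunc();\n' > "$work/src/a.c"
printf 'no match here\n' > "$work/src/b.c"

# 1. Generate the file list: only .c files that contain the old name.
find "$work/src" -type f -name '*.c' -exec grep -l "$old" {} + \
    > "$work/files.txt" || true

# 2. Preview: append unified diffs to the report without touching files.
: > "$work/refactor-report.txt"
while IFS= read -r f; do
    sed "s/$old/$new/g" "$f" | diff -u "$f" - >> "$work/refactor-report.txt" || true
done < "$work/files.txt"

# 3. Apply: edit in place, keeping a .bak copy of each original.
while IFS= read -r f; do
    sed -i.bak "s/$old/$new/g" "$f"
done < "$work/files.txt"

# 4. Verify: count .c files that still contain the old name (expect 0).
# The .bak copies still contain it by design; -name '*.c' excludes them.
remaining=$(find "$work/src" -type f -name '*.c' -exec grep -l "$old" {} + | wc -l | tr -d ' ')
updated=$(wc -l < "$work/files.txt" | tr -d ' ')
echo "Updated $updated files, $remaining still contain $old"
```

Filtering the file list with grep -l in step 1 means only files that actually change get a backup, which keeps the rollback set small.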

Complexity Analysis:

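  • Time: O(F * L), where F is the number of files and L is the average number of lines per file
  • Space: O(F) for the file list, plus up to O(F * L) for the .bak backups, since each modified file is copied in full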

5. Implementation Guide

5.1 Development Environment Setup

chmod +x refactor.sh

5.2 Project Structure

refactor/
├── refactor.sh
├── rules/
│   └── rename.conf
└── README.md

5.3 The Core Question You Are Answering

“How do I refactor a large codebase safely without relying on IDE automation?”

5.4 Concepts You Must Understand First

  1. sed addressing and substitution
  2. find predicates and logical operators
  3. diff output interpretation

5.5 Questions to Guide Your Design

  1. How do you ensure only target files are changed?
  2. How do you roll back if something goes wrong?
  3. How do you detect partial replacements?

5.6 Thinking Exercise

Pick a small file, manually apply a sed substitution, then confirm the result with diff. Repeat until the diff shows exactly the change you intended and nothing else.

5.7 The Interview Questions They Will Ask

  1. Why is a dry run important?
  2. How do you ensure refactor scripts are idempotent?
  3. What is the role of find in large refactors?

5.8 Hints in Layers

Hint 1: Start with a small directory and one replacement.

Hint 2: Add backup files with -i.bak.

Hint 3: Add verification by counting matches after replacement.
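
As a sketch of Hint 3, the snippet below counts occurrences before and after a replacement. The helper count_matches is hypothetical. It uses grep -o, which is available in GNU and BSD grep but is not required by POSIX.

```shell
#!/bin/sh
# Hypothetical sample with three occurrences on two lines.
work=$(mktemp -d)
printf 'oldFunc(); oldFunc();\noldFunc();\n' > "$work/a.c"

count_matches() {
    # grep -o prints each match on its own line, so wc -l counts
    # occurrences rather than matching lines.
    grep -o "$1" "$2" | wc -l | tr -d ' '
}

before=$(count_matches oldFunc "$work/a.c")
sed -i.bak 's/oldFunc/newFunc/g' "$work/a.c"
after_old=$(count_matches oldFunc "$work/a.c")
after_new=$(count_matches newFunc "$work/a.c")

echo "before=$before after_old=$after_old after_new=$after_new"
```

A successful rename shows after_old at zero and after_new equal to before. The second check assumes the new name did not already appear in the file.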

5.9 Books That Will Help

Topic         Book                       Chapter
sed scripts   “Sed & Awk”                Ch. 4-5
pipelines     “The Linux Command Line”   Ch. 6

5.10 Implementation Phases

Phase 1: Targeting

  • Build find filters
  • Confirm file list

Phase 2: Safe edits

  • Apply sed with backups

Phase 3: Verification

  • Diff and grep checks

5.11 Key Implementation Decisions

Decision   Options          Recommendation   Rationale
Backups    none vs .bak     use .bak         rollback safety
Preview    diff vs sed -n   diff             clarity

6. Testing Strategy

6.1 Test Categories

Category            Purpose        Examples
Unit Tests          Sed rules      match counts
Integration Tests   Full run       dry-run + apply
Edge Cases          Binary files   skip rules
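
For the binary-file edge case, one option is to let grep filter the candidate list. The sketch below assumes a grep that supports -I (GNU and BSD grep do); the fixture files are made up.

```shell
#!/bin/sh
# Hypothetical fixture: one text file and one binary file, both
# containing the old name.
work=$(mktemp -d)
printf 'oldFunc()\n' > "$work/a.c"
# The NUL byte makes grep classify this file as binary.
printf 'oldFunc\000\001\002\n' > "$work/blob.bin"

# -r recurses, -l lists names of files with a match, and -I treats
# binary files as if they contained no match.
text_files=$(grep -rIl 'oldFunc' "$work")
echo "$text_files"
```

Only a.c is reported, so blob.bin never reaches sed even though it contains the same bytes.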

6.2 Critical Test Cases

  1. Replace only in target files.
  2. Verify no leftover old symbols.
  3. Rollback from backups works.

7. Common Pitfalls and Debugging

Pitfall           Symptom                 Solution
Overbroad find    Unexpected edits        refine predicates
Missing backups   irrecoverable changes   use -i.bak
Regex mismatch    partial changes         test on sample
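
The regex-mismatch pitfall often shows up as a pattern that matches inside longer identifiers. The sketch below contrasts a naive pattern with a word-anchored one. It assumes GNU sed, where \< and \> match the start and end of a word; other sed implementations may need a different spelling. The variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: "count" appears alone and inside "item_count".
work=$(mktemp -d)
printf 'count = count + item_count\n' > "$work/a.py"

# Naive: also rewrites the "count" inside "item_count".
naive=$(sed 's/count/total/g' "$work/a.py")

# Anchored: \< and \> (GNU sed) restrict matches to whole words.
# The underscore is a word character, so "item_count" is left alone.
anchored=$(sed 's/\<count\>/total/g' "$work/a.py")

echo "naive:    $naive"
echo "anchored: $anchored"
```

Running both forms on a sample file before applying either one is a quick way to see which identifiers a pattern really touches.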

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add --dry-run mode

8.2 Intermediate Extensions

  • Add config-driven rule sets
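
A config-driven rule set could be sketched as below. The format of rules/rename.conf is an assumption made for illustration (one "old new" pair per line, with "#" starting a comment); the project does not define it. The sketch also assumes the names are plain identifiers with no regex metacharacters or slashes.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/rules"
# Hypothetical rule file: one "old new" pair per line.
cat > "$work/rules/rename.conf" <<'EOF'
# old      new
oldFunc    newFunc
tmpVal     tempValue
EOF
printf 'oldFunc(tmpVal);\n' > "$work/a.c"

# Turn each rule into a sed substitution command in a script file.
: > "$work/rules.sed"
while read -r old new; do
    # Skip blank lines and comment lines.
    case $old in ''|'#'*) continue ;; esac
    printf 's/%s/%s/g\n' "$old" "$new" >> "$work/rules.sed"
done < "$work/rules/rename.conf"

# -f applies every substitution in a single pass over the file.
sed -i.bak -f "$work/rules.sed" "$work/a.c"
cat "$work/a.c"
```

Generating a sed script keeps the rules reviewable: the file rules.sed can be inspected or diffed before anything is applied.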

8.3 Advanced Extensions

  • Add parallel execution with xargs -P
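
A minimal sketch of the parallel variant follows. It assumes an xargs that supports -0 and -P (GNU and BSD xargs do; POSIX does not require them). The batch size and the process count are arbitrary choices for the demo.

```shell
#!/bin/sh
# Hypothetical fixture: eight small files that all need the rename.
work=$(mktemp -d)
i=1
while [ "$i" -le 8 ]; do
    printf 'oldFunc()\n' > "$work/f$i.c"
    i=$((i + 1))
done

# -print0 and -0 pass file names NUL-delimited, so spaces are safe.
# -P 4 runs up to four sed processes at once and -n 2 hands each one
# two files. Every file is edited by exactly one process.
find "$work" -type f -name '*.c' -print0 |
    xargs -0 -n 2 -P 4 sed -i.bak 's/oldFunc/newFunc/g'

left=$(cat "$work"/*.c | grep -c 'oldFunc' || true)
echo "lines still containing oldFunc: $left"
```

Parallelism is safe here because no two processes write to the same file. A shared log file would need separate handling.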

9. Real-World Connections

  • API migrations in large repos
  • Security patch rollouts

10. Resources

  • GNU sed manual
  • The Grymoire sed tutorial

11. Self-Assessment Checklist

  • I can explain why find predicates matter
  • I can roll back from backups

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Safe rename across multiple files

Full Completion:

  • Preview + verification steps

Excellence (Going Above and Beyond):

  • Rule sets + parallel execution