Project 1: minigrep-plus
Build a grep-like CLI that behaves correctly in pipelines and terminals.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1 (Beginner) |
| Time Estimate | Weekend |
| Language | Rust (Alternatives: Go, Python) |
| Prerequisites | Basic CLI usage, file I/O, loops, error handling |
| Key Topics | STDIN/STDOUT/STDERR, buffering, TTY detection, exit codes |
1. Learning Objectives
By completing this project, you will:
- Implement a Unix-style CLI that reads from files or STDIN.
- Render terminal-friendly output without breaking pipes.
- Stream large input safely with constant memory usage.
- Apply conventional exit codes for match/no-match/error.
- Design flags and help text that match user expectations.
2. Theoretical Foundation
2.1 Core Concepts
- Standard Streams: Programs read from STDIN and write to STDOUT and STDERR. This lets tools compose via pipes without intermediate files.
- Buffering: Buffered I/O reduces syscalls by reading in chunks, and processing one line at a time avoids loading large files into memory. For line-based tools, a buffered reader is the default.
- TTY Detection: A terminal interprets ANSI escape codes. Pipes and files do not. A tool must detect its output destination and adjust accordingly.
- Exit Codes: Exit codes are the API for automation. grep uses: 0 = match found, 1 = no match, 2 = error.
- Pattern Matching: Even without regex, you must define case handling, match semantics, and how to highlight output.
2.2 Why This Matters
This is the baseline behavior expected of any CLI. If you mishandle streams or exit codes, your tool becomes unreliable in CI, scripts, and pipelines. Mastering this project means you can build tools that feel native to Unix.
2.3 Historical Context / Background
grep started in 1973 at Bell Labs as a line filter over text streams. The principle it established is still true: a CLI is only correct if it composes cleanly in pipelines.
2.4 Common Misconceptions
- “Printing colors is always okay”: It breaks machine parsing when output is not a TTY.
- “Reading entire files is fine”: It fails on logs and large data.
- “Exit codes do not matter”: They are the primary signal a script has without parsing your output.
3. Project Specification
3.1 What You Will Build
A CLI tool named minigrep that:
- Searches one or more files for a pattern
- Falls back to STDIN when no file is provided
- Highlights matches only when output is a terminal
- Returns Unix-style exit codes
3.2 Functional Requirements
- Input Sources: Accept a pattern and optional list of files. If no files are given, read STDIN.
- Flags: -i/--ignore-case, --color=auto|always|never, -n/--line-number.
- Output: Print matching lines in order. Highlight the match if color is enabled.
- Exit Codes: 0 if any match, 1 if no matches, 2 on error.
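The exit-code rule fits in one small function. This is a minimal sketch: the name `exit_code` and its two boolean inputs are hypothetical, and it assumes an error takes precedence over a match, which is how GNU grep behaves by default.

```rust
/// Map the outcome of a run to a grep-style exit code.
/// Assumption: any error wins over a match, so a failed file yields 2
/// even when other files matched.
fn exit_code(any_match: bool, had_error: bool) -> i32 {
    if had_error {
        2
    } else if any_match {
        0
    } else {
        1
    }
}
```

In `main`, the result would typically be passed to `std::process::exit` once all inputs have been processed.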
3.3 Non-Functional Requirements
- Performance: Stream line-by-line, constant memory usage.
- Reliability: Continue through multiple files even if one fails.
- Usability: Help text includes examples and exit code behavior.
3.4 Example Usage / Output
$ minigrep "error" /var/log/app.log
[2025-01-01 10:10:10] error: connection failed
[2025-01-01 10:10:11] error: retrying
3.5 Real World Outcome
Run against a file and then a pipeline to verify behavior and output:
$ minigrep "timeout" app.log --line-number
12: request timeout for user=alice
98: request timeout for user=bob
$ cat app.log | minigrep "timeout" --line-number
12: request timeout for user=alice
98: request timeout for user=bob
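Both commands above write to the terminal, so matching text is colored in each; piping input into minigrep does not change where its output goes. When STDOUT itself is piped or redirected (for example, minigrep "timeout" app.log | wc -l), no ANSI codes appear, so downstream tools receive clean text.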
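The color decision can be kept separate from the TTY check so that it is easy to test. The sketch below uses only the standard library (`std::io::IsTerminal`, available since Rust 1.70). The names `should_color` and `should_color_stdout` are hypothetical, and it assumes that `--color=always` overrides TTY detection.

```rust
use std::io::IsTerminal;
use std::str::FromStr;

/// Hypothetical mirror of the --color=auto|always|never flag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColorMode {
    Auto,
    Always,
    Never,
}

impl FromStr for ColorMode {
    type Err = String;

    /// Parse the value that follows `--color=`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(ColorMode::Auto),
            "always" => Ok(ColorMode::Always),
            "never" => Ok(ColorMode::Never),
            other => Err(format!("invalid --color value: {other}")),
        }
    }
}

/// Decide whether to emit ANSI codes. Only `Auto` consults the TTY state.
fn should_color(mode: ColorMode, stdout_is_tty: bool) -> bool {
    match mode {
        ColorMode::Always => true,
        ColorMode::Never => false,
        ColorMode::Auto => stdout_is_tty,
    }
}

/// Convenience wrapper that checks the real STDOUT.
fn should_color_stdout(mode: ColorMode) -> bool {
    should_color(mode, std::io::stdout().is_terminal())
}
```

Passing the TTY state in as a plain `bool` means the policy can be unit-tested without a real terminal.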
4. Solution Architecture
4.1 High-Level Design
+------------------+     +-------------------+     +-------------------+
| Input Reader     | --> | Matcher           | --> | Output Renderer   |
| (files or STDIN) |     | (pattern logic)   |     | (color/format)    |
+------------------+     +-------------------+     +-------------------+
         |                         |                         |
         +-------------------------+-------------------------+
                             Shared config
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Args Parser | Parse flags and inputs | POSIX-like defaults |
| Reader | Stream lines | Buffered I/O, per file |
| Matcher | Find match and range | Case handling strategy |
| Renderer | Format output | TTY detection and color policy |
4.3 Data Structures
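// How --color decides whether to emit ANSI codes.
enum ColorMode {
    Auto, // color only when STDOUT is a terminal
    Always,
    Never,
}

// Parsed command-line options shared by all components.
struct Config {
    pattern: String,
    ignore_case: bool,
    color: ColorMode,
    line_numbers: bool,
    files: Vec<String>, // empty means read STDIN
}

// One matching line, ready for rendering.
struct MatchResult {
    line_number: usize,
    line_text: String,
    match_range: Option<(usize, usize)>, // (start, end) of the match, used for highlighting
}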
4.4 Algorithm Overview
Key Algorithm: Line-by-line search
- Open input source (file or STDIN).
- For each line, normalize case if needed and test for match.
- If match, compute range (for highlighting) and render.
- Track match count for exit codes.
Complexity Analysis:
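- Time: O(N * M) worst case for a naive substring scan (N = total input bytes, M = pattern length); O(N + M) with a linear-time fixed-string search.
- Space: O(L) for the longest line L, independent of total input size (streaming).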
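The steps above can be sketched as one function that is generic over its input and output, so the same code serves files, STDIN, and tests. The name `search` and its signature are assumptions for illustration. It does case-sensitive substring matching only and prints line numbers in the format used in the examples earlier.

```rust
use std::io::{self, BufRead, Write};

/// Stream `input` line by line, write lines containing `pattern` to `out`,
/// and return how many lines matched.
fn search<R: BufRead, W: Write>(
    pattern: &str,
    input: R,
    out: &mut W,
    line_numbers: bool,
) -> io::Result<usize> {
    let mut matches = 0;
    // `lines()` yields one line at a time, so memory is bounded by the
    // longest line rather than by the size of the input.
    for (index, line) in input.lines().enumerate() {
        let line = line?; // propagate read errors (including invalid UTF-8)
        if line.contains(pattern) {
            matches += 1;
            if line_numbers {
                writeln!(out, "{}: {}", index + 1, line)?;
            } else {
                writeln!(out, "{}", line)?;
            }
        }
    }
    Ok(matches)
}
```

In the real tool, `input` could be `std::io::stdin().lock()` or a `BufReader` around a `File`, and `out` could be `std::io::stdout().lock()`. The returned count feeds the exit-code decision.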
5. Implementation Guide
5.1 Development Environment Setup
# Rust example
rustup default stable
cargo new minigrep
cd minigrep
# atty is unmaintained; std::io::IsTerminal (Rust 1.70+) can replace it
cargo add clap atty colored
5.2 Project Structure
minigrep/
├── src/
│   ├── main.rs
│   ├── config.rs
│   ├── matcher.rs
│   └── output.rs
├── tests/
│   └── basic.rs
└── README.md
5.3 The Core Question You Are Answering
“How do Unix tools behave correctly in pipelines while still providing a good terminal experience?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Standard Streams
- Why are STDOUT and STDERR separate?
- How does piping change output expectations?
- TTY Detection
- What does isatty() check?
- Why do ANSI codes break downstream tools?
- Exit Codes
- Why grep uses 0, 1, 2
- How scripts depend on exit codes
- Buffered I/O
- Why line-by-line processing scales
- What happens when input is huge
5.5 Questions to Guide Your Design
- Should --color=always override TTY detection?
- Should the matcher handle regex later or only substring now?
- Where do errors go if multiple files are searched?
- Do you keep searching after a read error?
5.6 Thinking Exercise
Trace this pipeline by hand:
echo -e "a\nB\nc" | minigrep "b" -i
- Where does input come from?
- Should output be colored?
- What exit code should it return?
5.7 The Interview Questions They Will Ask
- What is the difference between STDOUT and STDERR?
- Why do we need TTY detection for color output?
- How do exit codes influence shell pipelines?
- How would you make matching faster for large files?
5.8 Hints in Layers
Hint 1: Start with file input only, then add STDIN support.
Hint 2: Add --color=auto and test with a pipe.
Hint 3: Track match count; decide exit code at the end.
Hint 4: Use a helper to format highlighted output.
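For Hint 4, a highlighting helper might look like the sketch below. The function name `highlight` and the choice of bold red are assumptions; any ANSI SGR sequence would work the same way.

```rust
/// Wrap the bytes in `start..end` of `line` in ANSI bold-red codes.
/// Assumes the range lies on UTF-8 character boundaries, which holds
/// when it comes from a substring search on the same line.
fn highlight(line: &str, start: usize, end: usize) -> String {
    const RED_BOLD: &str = "\x1b[1;31m";
    const RESET: &str = "\x1b[0m";
    format!(
        "{}{}{}{}{}",
        &line[..start],
        RED_BOLD,
        &line[start..end],
        RESET,
        &line[end..]
    )
}
```

For a case-sensitive match, the range can come from `line.find(pattern)`, with the end computed as `start + pattern.len()`. Call the helper only when color is enabled, so piped output stays free of escape codes.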
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Streams and file I/O | “The Linux Programming Interface” | Ch. 4 |
| Terminal I/O | “Advanced Programming in the UNIX Environment” | Ch. 18 |
| CLI conventions | “The Art of Unix Programming” | Ch. 5 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 hours)
Goals:
- Parse arguments
- Read a file line-by-line
Checkpoint: Search a file and print matching lines.
Phase 2: Core Functionality (3-4 hours)
Goals:
- Add STDIN support
- Add line numbers
Checkpoint: echo "x" | minigrep x -n works.
Phase 3: Polish and Edge Cases (2-3 hours)
Goals:
- Add color mode
- Correct exit codes
Checkpoint: Pipes show no color; exit code matches grep.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Match type | Fixed string vs regex | Fixed string first | Simpler baseline |
| Color policy | always/auto/never | Support all three | Matches common tools |
| Error handling | fail-fast vs continue | Continue per file | Better UX for multi-file |
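The "continue per file" recommendation can be sketched as a loop that records failures without aborting. The function `search_files`, its return shape, and the `minigrep: path: error` message format are all hypothetical. To stay short, the sketch only counts matching lines.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

/// Count matching lines across `paths`, reporting failures to STDERR
/// and moving on to the next file. Returns (match_count, had_error).
fn search_files(pattern: &str, paths: &[&str]) -> (usize, bool) {
    let mut matches = 0;
    let mut had_error = false;
    for path in paths {
        let file = match File::open(path) {
            Ok(f) => f,
            Err(e) => {
                // Diagnostics go to STDERR so STDOUT stays clean for pipes.
                eprintln!("minigrep: {path}: {e}");
                had_error = true;
                continue; // keep going with the remaining files
            }
        };
        for line in BufReader::new(file).lines() {
            match line {
                Ok(text) if text.contains(pattern) => matches += 1,
                Ok(_) => {}
                Err(e) => {
                    eprintln!("minigrep: {path}: {e}");
                    had_error = true;
                    break; // stop reading this file, continue with the next
                }
            }
        }
    }
    (matches, had_error)
}
```

Returning both values lets the caller pick the final exit code after every file has been tried.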
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Matcher logic | case sensitivity |
| Integration Tests | CLI behavior | file + stdin |
| Edge Case Tests | Boundary conditions | empty input, no matches |
6.2 Critical Test Cases
- No matches: exit code 1, no output.
- Invalid file: error to STDERR, exit code 2.
- Pipe output: no ANSI codes when STDOUT is not a TTY.
6.3 Test Data
alpha
beta
gamma
Expected:
- minigrep beta data.txt prints only beta
- minigrep z data.txt prints nothing and returns 1
7. Common Pitfalls and Debugging
7.1 Common Pitfalls
| Pitfall | Symptom | Solution |
|---|---|---|
| Always coloring output | Escape codes in pipes | Check TTY and color mode |
| Reading whole file | Memory spike on big files | Stream with buffers |
| Wrong exit codes | Scripts fail | Track match count |
7.2 Debugging Strategies
- Compare output to grep on the same input.
- Add a verbose mode for debugging line parsing.
7.3 Performance Traps
- Calling to_lowercase() on every line allocates a new string each time and can be expensive. Lowercase the pattern once at startup, and normalize lines only when --ignore-case is set.
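One way to avoid the per-line allocation entirely is to compare bytes with ASCII case folding. This sketch swaps in a different technique from the one described above: it handles ASCII letters only (non-ASCII bytes must match exactly), and the function name `find_ignore_ascii_case` is hypothetical. It is a naive window scan, so the worst case is still O(N * M).

```rust
/// Find `needle` in `line`, ignoring ASCII case, without allocating a
/// lowercased copy of the line. Returns the byte range of the first match.
fn find_ignore_ascii_case(line: &str, needle: &str) -> Option<(usize, usize)> {
    let hay = line.as_bytes();
    let pat = needle.as_bytes();
    if pat.is_empty() {
        return Some((0, 0)); // an empty pattern matches at the start
    }
    // `windows` yields nothing when the pattern is longer than the line.
    hay.windows(pat.len())
        .position(|window| window.eq_ignore_ascii_case(pat))
        .map(|start| (start, start + pat.len()))
}
```

Because ASCII folding never changes byte lengths, the returned range indexes directly into the original line and can be used for highlighting.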
8. Extensions and Challenges
8.1 Beginner Extensions
- Add --count to print number of matches.
- Add --files-with-matches to list filenames only.
8.2 Intermediate Extensions
- Add --context to print surrounding lines.
- Add --fixed-strings mode.
8.3 Advanced Extensions
- Add regex support with streaming engine.
- Add binary file detection.
9. Real-World Connections
9.1 Industry Applications
- Log analysis in production pipelines
- CI output filtering and alerts
- Security auditing for known patterns
9.2 Related Open Source Projects
- ripgrep: High-performance grep replacement
- ack: Perl-based search tool
9.3 Interview Relevance
- Streams, exit codes, and CLI conventions show systems knowledge.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” by Michael Kerrisk - Ch. 4
- “Advanced Programming in the UNIX Environment” by Stevens and Rago - Ch. 18
10.2 Tools and Documentation
- clap: Argument parsing
- atty: TTY detection (the crate is unmaintained; std::io::IsTerminal in Rust 1.70+ covers the same need)
10.3 Related Projects in This Series
- Project 2: task-nexus (subcommands and persistence)
- Project 4: stream-viz (signals and progress)
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the difference between STDOUT and STDERR
- I can describe how TTY detection affects output
- I can explain why exit codes matter
11.2 Implementation
- All functional requirements are met
- Output is correct for files and STDIN
- Exit codes match grep conventions
11.3 Growth
- I can explain this tool to a teammate
- I can extend it with one extra flag
12. Submission / Completion Criteria
Minimum Viable Completion:
- Correctly matches lines from files or STDIN
- Respects exit code conventions
- No color output when piped
Full Completion:
- Includes --color and --line-number
- Handles multiple files with clear errors
Excellence (Going Above and Beyond):
- Supports context lines and fixed-string mode
- Includes tests and usage examples