Project 1: minigrep-plus

Build a grep-like CLI that behaves correctly in pipelines and terminals.

Quick Reference

Attribute       Value
--------------  ----------------------------------------------------------
Difficulty      Level 1 (Beginner)
Time Estimate   Weekend
Language        Rust (alternatives: Go, Python)
Prerequisites   Basic CLI usage, file I/O, loops, error handling
Key Topics      STDIN/STDOUT/STDERR, buffering, TTY detection, exit codes

1. Learning Objectives

By completing this project, you will:

  1. Implement a Unix-style CLI that reads from files or STDIN.
  2. Render terminal-friendly output without breaking pipes.
  3. Stream large input safely with constant memory usage.
  4. Apply conventional exit codes for match/no-match/error.
  5. Design flags and help text that match user expectations.

2. Theoretical Foundation

2.1 Core Concepts

  • Standard Streams: Programs read from STDIN and write to STDOUT and STDERR. This lets tools compose via pipes without intermediate files.
  • Buffering: Buffered I/O reduces syscalls and avoids loading large files into memory. For line-based tools, a buffered reader is the default.
  • TTY Detection: A terminal interprets ANSI escape codes. Pipes and files do not. A tool must detect its output destination and adjust accordingly.
  • Exit Codes: Exit codes are the API for automation. grep uses: 0 = match found, 1 = no match, 2 = error.
  • Pattern Matching: Even without regex, you must define case handling, match semantics, and how to highlight output.
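The TTY rule above, combined with a --color=auto|always|never flag, reduces to a small pure function. The sketch below is one possible shape, assuming a hand-rolled ColorMode enum. ColorMode, should_color, and color_enabled are illustrative names; the only real detection API used is the standard library's std::io::IsTerminal trait (stable since Rust 1.70).

```rust
use std::io::IsTerminal;
use std::str::FromStr;

/// Mirrors a --color=auto|always|never flag (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColorMode {
    Auto,
    Always,
    Never,
}

impl FromStr for ColorMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(ColorMode::Auto),
            "always" => Ok(ColorMode::Always),
            "never" => Ok(ColorMode::Never),
            other => Err(format!(
                "invalid --color value: {other} (expected auto, always, or never)"
            )),
        }
    }
}

/// Pure policy: decides from the mode and whether STDOUT is a terminal.
/// Keeping it separate from the real TTY check makes it unit-testable.
fn should_color(mode: ColorMode, stdout_is_tty: bool) -> bool {
    match mode {
        ColorMode::Always => true,
        ColorMode::Never => false,
        ColorMode::Auto => stdout_is_tty,
    }
}

/// Probes the real STDOUT via std::io::IsTerminal.
fn color_enabled(mode: ColorMode) -> bool {
    should_color(mode, std::io::stdout().is_terminal())
}
```

Separating the policy (should_color) from the probe (is_terminal) is what makes the "no ANSI codes when piped" case testable without a real terminal.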

2.2 Why This Matters

This is the baseline behavior expected of any CLI. If you mishandle streams or exit codes, your tool becomes unreliable in CI, scripts, and pipelines. Mastering this project means you can build tools that feel native to Unix.

2.3 Historical Context / Background

grep started in 1973 at Bell Labs as a line filter over text streams. The principle it established is still true: a CLI is only correct if it composes cleanly in pipelines.

2.4 Common Misconceptions

  • “Printing colors is always okay”: When output goes to a file or another program, ANSI escape codes end up in the data and break machine parsing.
  • “Reading entire files is fine”: It fails on logs and large data.
  • “Exit codes do not matter”: They are how scripts, && / || chains, and CI steps decide whether a command succeeded.

3. Project Specification

3.1 What You Will Build

A CLI tool named minigrep that:

  • Searches one or more files for a pattern
  • Falls back to STDIN when no file is provided
  • Highlights matches only when output is a terminal
  • Returns Unix-style exit codes
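One way to implement the STDIN fallback is to reduce every input source to the same type, so the rest of the program never cares where lines come from. A minimal sketch, assuming a boxed BufRead trait object is acceptable; open_input is a hypothetical name, and treating "-" as STDIN is an added Unix convention that the requirements here do not state.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Returns a buffered line source for one input (hypothetical helper).
/// `None` (no file arguments) means STDIN; "-" is also treated as STDIN,
/// which is an assumption borrowed from common Unix tools.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match path {
        // StdinLock is already buffered and, since Rust 1.61, 'static.
        None | Some("-") => Ok(Box::new(io::stdin().lock())),
        // BufReader batches read() syscalls; the file is never loaded whole.
        Some(p) => Ok(Box::new(BufReader::new(File::open(p)?))),
    }
}
```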

3.2 Functional Requirements

  1. Input Sources: Accept a pattern and optional list of files. If no files are given, read STDIN.
  2. Flags: -i/--ignore-case, --color=auto|always|never, -n/--line-number.
  3. Output: Print matching lines in order. Highlight the match if color is enabled.
  4. Exit Codes: 0 if any match, 1 if no matches, 2 on error.
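The requirements do not say what happens when one file matches and another fails. The sketch below resolves that the way GNU grep documents it: an error yields status 2 even if some lines matched. exit_status and to_exit_code are hypothetical names; std::process::ExitCode is the standard library type that main can return.

```rust
use std::process::ExitCode;

/// Maps the outcome of a whole run to a grep-style exit status.
/// Errors take precedence: if any input failed, the status is 2
/// even when other inputs produced matches.
fn exit_status(match_count: usize, had_error: bool) -> u8 {
    if had_error {
        2
    } else if match_count > 0 {
        0
    } else {
        1
    }
}

/// Usable as the return value of `fn main() -> ExitCode`.
fn to_exit_code(match_count: usize, had_error: bool) -> ExitCode {
    ExitCode::from(exit_status(match_count, had_error))
}
```

Deciding the status once, at the end, from two accumulated values is what Hint 3 below describes.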

3.3 Non-Functional Requirements

  • Performance: Stream line by line; memory use is bounded by the longest line, not the input size.
  • Reliability: Continue through multiple files even if one fails.
  • Usability: Help text includes examples and exit code behavior.
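Continuing through multiple files mostly means catching the error per file instead of propagating it with ?. A sketch under these assumptions: plain substring matching, output and diagnostics passed in as generic writers (normally STDOUT and STDERR), and hypothetical function names search_files and search_one.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};

/// Searches every path, reporting failures to `err` (normally STDERR)
/// instead of aborting, so one bad file does not hide results from the rest.
/// Returns (total matching lines, whether any error occurred).
fn search_files<W: Write, E: Write>(
    paths: &[&str],
    pattern: &str,
    out: &mut W,
    err: &mut E,
) -> (usize, bool) {
    let mut matches = 0;
    let mut had_error = false;
    for path in paths {
        match search_one(path, pattern, out) {
            Ok(n) => matches += n,
            Err(e) => {
                had_error = true;
                // Diagnostics go to the error stream, never to `out`.
                let _ = writeln!(err, "minigrep: {path}: {e}");
            }
        }
    }
    (matches, had_error)
}

/// Streams one file and writes matching lines to `out`.
fn search_one<W: Write>(path: &str, pattern: &str, out: &mut W) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        let line = line?; // a mid-file read error (e.g. invalid UTF-8) also counts
        if line.contains(pattern) {
            writeln!(out, "{line}")?;
            count += 1;
        }
    }
    Ok(count)
}
```

Passing the writers in, rather than calling println! and eprintln! directly, lets tests capture both streams in a Vec<u8>.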

3.4 Example Usage / Output

$ minigrep "error" /var/log/app.log
[2025-01-01 10:10:10] error: connection failed
[2025-01-01 10:10:11] error: retrying

3.5 Real World Outcome

Run against a file and then a pipeline to verify behavior and output:

$ minigrep "timeout" app.log --line-number
12: request timeout for user=alice
98: request timeout for user=bob

$ cat app.log | minigrep "timeout" --line-number
12: request timeout for user=alice
98: request timeout for user=bob

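In the terminal, matching text is colored (with --color=auto). Note that the second example does not exercise the no-color path: piping into minigrep changes only where its input comes from, and STDOUT is still the terminal, so matches are still highlighted. To confirm that downstream tools receive clean text, pipe minigrep's output instead, for example minigrep "timeout" app.log | cat -v; with --color=auto, no ^[ escape sequences should appear.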


4. Solution Architecture

4.1 High-Level Design

+------------------+     +-------------------+     +-------------------+
| Input Reader     | --> | Matcher           | --> | Output Renderer   |
| (files or STDIN) |     | (pattern logic)   |     | (color/format)    |
+------------------+     +-------------------+     +-------------------+
           |                         |                       |
           +-------------------------+-----------------------+
                             Shared config
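The three stages and the shared config can be wired together in one generic function. This is a simplified sketch, not the full design: Config here is a hypothetical two-field subset of the struct in 4.3, matching is a plain case-sensitive substring test, and color is omitted. Reusing one String buffer with read_line keeps memory bounded by the longest line.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical trimmed-down config shared by all three stages.
struct Config {
    pattern: String,
    line_numbers: bool,
}

/// Reader -> Matcher -> Renderer for one input. Generic over the reader and
/// writer so the same code serves files, STDIN, and in-memory tests.
fn run<R: BufRead, W: Write>(config: &Config, mut input: R, out: &mut W) -> io::Result<usize> {
    let mut line = String::new();
    let mut line_no = 0;
    let mut matches = 0;
    loop {
        line.clear(); // reuse one buffer across iterations
        if input.read_line(&mut line)? == 0 {
            break; // EOF
        }
        line_no += 1;
        let text = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        // Matcher stage: plain substring test in this sketch.
        if text.contains(config.pattern.as_str()) {
            matches += 1;
            // Renderer stage: optional line-number prefix, no color here.
            if config.line_numbers {
                writeln!(out, "{line_no}: {text}")?;
            } else {
                writeln!(out, "{text}")?;
            }
        }
    }
    Ok(matches)
}
```

Because &[u8] implements BufRead and Vec<u8> implements Write, the test data from section 6.3 can be fed through run without touching the filesystem.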

4.2 Key Components

Component     Responsibility           Key Decisions
------------  -----------------------  ------------------------------
Args Parser   Parse flags and inputs   POSIX-like defaults
Reader        Stream lines             Buffered I/O, per file
Matcher       Find match and range     Case-handling strategy
Renderer      Format output            TTY detection and color policy

4.3 Data Structures

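#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColorMode {
    Auto,
    Always,
    Never,
}

#[derive(Debug)]
struct Config {
    pattern: String,
    ignore_case: bool,
    color: ColorMode,
    line_numbers: bool,
    files: Vec<String>, // empty means read STDIN
}

#[derive(Debug)]
struct MatchResult {
    line_number: usize,                  // 1-based
    line_text: String,                   // line without its trailing newline
    match_range: Option<(usize, usize)>, // byte offsets [start, end) into line_text
}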

4.4 Algorithm Overview

Key Algorithm: Line-by-line search

  1. Open input source (file or STDIN).
  2. For each line, normalize case if needed and test for match.
  3. If match, compute range (for highlighting) and render.
  4. Track match count for exit codes.
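Steps 2 and 3 interact: the range computed after normalizing case must still be valid in the original line, because that is what gets highlighted. The sketch below sidesteps the problem by folding ASCII letters only, an assumption that keeps byte offsets identical between the folded copy and the original. find_match is a hypothetical name; full Unicode case-insensitive highlighting would need a different approach.

```rust
/// Returns the byte range [start, end) of the first match of `pattern`
/// in `line`. Case-insensitive mode folds ASCII letters only:
/// to_ascii_lowercase never changes byte lengths, so a range found in the
/// folded copy can be used to slice (and highlight) the original line.
fn find_match(line: &str, pattern: &str, ignore_case: bool) -> Option<(usize, usize)> {
    if ignore_case {
        let hay = line.to_ascii_lowercase();
        let needle = pattern.to_ascii_lowercase();
        hay.find(&needle).map(|start| (start, start + needle.len()))
    } else {
        line.find(pattern).map(|start| (start, start + pattern.len()))
    }
}
```

Non-ASCII bytes pass through unchanged, so a line such as "Ünïcode ERROR" still yields a range that slices cleanly to "ERROR".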

Complexity Analysis:

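  • Time: O(N × M) worst case for a naive substring search, where N is the total number of input bytes and M is the pattern length. Linear-time algorithms such as KMP or Two-Way (the algorithm behind Rust's str::find) reduce this to O(N + M).
  • Space: O(L) extra, where L is the length of the longest line; because only one line is buffered at a time, memory does not grow with total input size.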

5. Implementation Guide

5.1 Development Environment Setup

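# Rust example
rustup default stable
cargo new minigrep
cd minigrep
cargo add clap colored
# TTY detection: std::io::IsTerminal (standard library, Rust 1.70+)
# replaces the unmaintained atty crate, so no extra dependency is needed.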

5.2 Project Structure

minigrep/
├── src/
│   ├── main.rs
│   ├── config.rs
│   ├── matcher.rs
│   └── output.rs
├── tests/
│   └── basic.rs
└── README.md

5.3 The Core Question You Are Answering

“How do Unix tools behave correctly in pipelines while still providing a good terminal experience?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Standard Streams
    • Why are STDOUT and STDERR separate?
    • How does piping change output expectations?
  2. TTY Detection
    • What does isatty() check?
    • Why do ANSI codes break downstream tools?
  3. Exit Codes
    • Why grep uses 0, 1, 2
    • How scripts depend on exit codes
  4. Buffered I/O
    • Why line-by-line processing scales
    • What happens when input is huge

5.5 Questions to Guide Your Design

  1. Should --color=always override TTY detection?
  2. Should the matcher handle regex later or only substring now?
  3. Where do errors go if multiple files are searched?
  4. Do you keep searching after a read error?

5.6 Thinking Exercise

Trace this pipeline by hand:

printf 'a\nB\nc\n' | minigrep "b" -i
  • Where does input come from?
  • Should output be colored?
  • What exit code should it return?

5.7 The Interview Questions They Will Ask

  1. What is the difference between STDOUT and STDERR?
  2. Why do we need TTY detection for color output?
  3. How do exit codes influence shell pipelines?
  4. How would you make matching faster for large files?

5.8 Hints in Layers

Hint 1: Start with file input only, then add STDIN support.

Hint 2: Add --color=auto and test with a pipe.

Hint 3: Track match count; decide exit code at the end.

Hint 4: Use a helper to format highlighted output.
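For Hint 4, a formatting helper might look like the sketch below. It assumes byte ranges from the matcher and a bold-red ANSI SGR sequence for highlighting; the escape codes and the render name are illustrative choices, not a standard.

```rust
/// ANSI SGR sequences: bold red for the match, then reset.
/// The exact colors are an arbitrary choice for this sketch.
const MATCH_START: &str = "\x1b[1;31m";
const RESET: &str = "\x1b[0m";

/// Formats one output line. `range` is a byte range from the matcher;
/// escape codes are emitted only when `color` is true.
fn render(line: &str, range: Option<(usize, usize)>, line_number: Option<usize>, color: bool) -> String {
    let mut out = String::new();
    if let Some(n) = line_number {
        out.push_str(&format!("{n}: "));
    }
    match range {
        Some((start, end)) if color => {
            out.push_str(&line[..start]);
            out.push_str(MATCH_START);
            out.push_str(&line[start..end]);
            out.push_str(RESET);
            out.push_str(&line[end..]);
        }
        // No color (or no range): the line is emitted byte-for-byte.
        _ => out.push_str(line),
    }
    out
}
```

With color set to false, the output is plain text in the same "12: ..." shape as the example in 3.5.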

5.9 Books That Will Help

Topic                 Book                                              Chapter
--------------------  ------------------------------------------------  -------
Streams and file I/O  “The Linux Programming Interface”                 Ch. 4
Terminal I/O          “Advanced Programming in the UNIX Environment”    Ch. 18
CLI conventions       “The Art of Unix Programming”                     Ch. 5

5.10 Implementation Phases

Phase 1: Foundation (2-3 hours)

Goals:

  • Parse arguments
  • Read a file line-by-line

Checkpoint: Search a file and print matching lines.

Phase 2: Core Functionality (3-4 hours)

Goals:

  • Add STDIN support
  • Add line numbers

Checkpoint: echo "x" | minigrep x -n works.

Phase 3: Polish and Edge Cases (2-3 hours)

Goals:

  • Add color mode
  • Correct exit codes

Checkpoint: Output piped into another command (for example minigrep x file | cat -v) contains no escape codes; exit codes match grep on the same input.

5.11 Key Implementation Decisions

Decision        Options                  Recommendation      Rationale
--------------  -----------------------  ------------------  ----------------------------
Match type      Fixed string vs. regex   Fixed string first  Simpler baseline
Color policy    always / auto / never    Support all three   Matches common tools
Error handling  Fail-fast vs. continue   Continue per file   Better UX for multiple files

6. Testing Strategy

6.1 Test Categories

Category           Purpose         Examples
-----------------  --------------  ------------------
Unit Tests         Matcher logic   Case sensitivity
Integration Tests  CLI behavior    File + STDIN input
Edge Case Tests    Empty input     No matches

6.2 Critical Test Cases

  1. No matches: exit code 1, no output.
  2. Invalid file: error to STDERR, exit code 2.
  3. Pipe output: no ANSI codes when STDOUT is not a TTY.

6.3 Test Data

alpha
beta
gamma

Expected:

  • minigrep beta data.txt prints only beta
  • minigrep z data.txt prints nothing and returns 1

7. Common Pitfalls and Debugging

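7.1 Common Pitfalls

Pitfall                 Symptom                       Solution
----------------------  ----------------------------  ------------------------
Always coloring output  Escape codes in piped output  Check TTY and color mode
Reading whole file      Memory spike on big files     Stream with buffers
Wrong exit codes        Scripts fail                  Track match count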

7.2 Debugging Strategies

  • Compare output to grep on the same input.
  • Add a verbose mode for debugging line parsing, and send its output to STDERR so it never mixes with matches on STDOUT.

7.3 Performance Traps

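  • Calling to_lowercase() on every line allocates a new String per line. Lowercase the pattern once at startup and reuse a single scratch buffer for the lowercased line. Full Unicode lowercasing can also change a line's byte length, so offsets found in the lowercased copy are not safe to use for highlighting the original line.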
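A sketch of the "lowercase once, reuse a buffer" idea. lower_into and contains_ignore_case are hypothetical helpers; both the pattern and each line are folded with the same per-char char::to_lowercase mapping so the comparison is consistent, and the result is used only as a yes/no test because byte lengths can change.

```rust
/// Lowercases `s` into `buf` char by char, reusing buf's allocation.
fn lower_into(s: &str, buf: &mut String) {
    buf.clear(); // keeps the allocated capacity
    for c in s.chars() {
        buf.extend(c.to_lowercase()); // one char may lowercase to several
    }
}

/// `lower_pattern` must be folded once, before the loop, with `lower_into`,
/// so both sides use the same mapping. Per line, the cost is a refill of
/// `scratch` rather than a fresh String allocation.
fn contains_ignore_case(line: &str, lower_pattern: &str, scratch: &mut String) -> bool {
    lower_into(line, scratch);
    scratch.contains(lower_pattern)
}
```

The length change is real: 'İ' (U+0130) is 2 bytes in UTF-8 but lowercases to "i\u{307}", which is 3 bytes.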

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add --count to print the number of matching lines.
  • Add --files-with-matches to list filenames only.

8.2 Intermediate Extensions

  • Add --context to print surrounding lines.
  • Add --fixed-strings mode.

8.3 Advanced Extensions

  • Add regex support with streaming engine.
  • Add binary file detection.

9. Real-World Connections

9.1 Industry Applications

  • Log analysis in production pipelines
  • CI output filtering and alerts
  • Security auditing for known patterns

9.2 Related Open Source Projects

  • ripgrep: High-performance grep replacement
  • ack: Perl-based search tool

9.3 Interview Relevance

  • Streams, exit codes, and CLI conventions show systems knowledge.

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” by Michael Kerrisk - Ch. 4
  • “Advanced Programming in the UNIX Environment” by Stevens and Rago - Ch. 18

10.2 Tools and Documentation

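  • clap: Argument parsing
  • std::io::IsTerminal: TTY detection (standard library since Rust 1.70; replaces the unmaintained atty crate)

10.3 Related Projects

  • Project 2: task-nexus (subcommands and persistence)
  • Project 4: stream-viz (signals and progress)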

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the difference between STDOUT and STDERR
  • I can describe how TTY detection affects output
  • I can explain why exit codes matter

11.2 Implementation

  • All functional requirements are met
  • Output is correct for files and STDIN
  • Exit codes match grep conventions

11.3 Growth

  • I can explain this tool to a teammate
  • I can extend it with one extra flag

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Correctly matches lines from files or STDIN
  • Respects exit code conventions
  • No color output when piped

Full Completion:

  • Includes --color and --line-number
  • Handles multiple files with clear errors

Excellence (Going Above and Beyond):

  • Supports context lines and fixed-string mode
  • Includes tests and usage examples