Project 1: minigrep-plus (The Foundation)

Build a grep-like CLI that behaves correctly in pipes, respects TTY rules, and returns reliable exit codes.

Quick Reference

Attribute | Value
Difficulty | Level 1 (Beginner)
Time Estimate | 6-10 hours
Main Programming Language | Rust
Alternative Programming Languages | Go, Python
Coolness Level | Level 2: Practical
Business Potential | 1: Foundational utility
Prerequisites | Basic file I/O, strings, CLI usage
Key Topics | Streams, TTY detection, exit codes, parsing

1. Learning Objectives

By completing this project, you will:

  1. Design a CLI grammar that separates flags, positionals, and operands safely.
  2. Implement streaming search that never loads whole files into memory.
  3. Detect TTY vs pipe output and toggle color and formatting correctly.
  4. Return exit codes that match Unix conventions and are script-friendly.
  5. Produce stable output formats for both humans and machines.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Streams, TTYs, and Exit Codes

Fundamentals

Standard streams are the most important contract in Unix CLI design. Every program gets three file descriptors by default: stdin, stdout, and stderr. Stdin is the data input channel, stdout is the data output channel, and stderr is the diagnostics channel. The reason this matters is composability. If your tool prints errors to stdout, you corrupt pipelines. If you print data to stderr, you force users to do extra redirection and surprise them.

A second layer is TTY detection. A TTY is an interactive terminal device. When stdout is a TTY, you can show color, progress, and extra hints. When stdout is not a TTY, you should output plain, deterministic data.

Finally, exit codes are the machine-readable outcome. A zero exit code means success. Non-zero exit codes indicate failure or special states. Tools like grep use exit code 1 to mean “no match”, which is not a crash but still a signal to scripts.

Deep Dive into the concept

Streams are not just conventions; they are explicit contracts between programs. A pipeline is a chain of processes connected by pipes, where stdout of one becomes stdin of the next. Pipes only carry stdout by default, which is why stderr exists as a separate channel for human diagnostics. If you blur the channels, you break the pipeline’s semantics. A classic example is a search tool that prints status lines to stdout. When a user runs search | wc -l, those status lines are counted as data, which is wrong. The tool should instead print status lines to stderr so that stdout remains pure data.
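In Rust, the channel split described above is simply `println!` versus `eprintln!`. A minimal sketch (the `format_match` helper name is illustrative, not part of any library):

```rust
use std::io::Write;

// Format a match record for the data channel (stdout).
fn format_match(file: &str, line_number: usize, text: &str) -> String {
    format!("{}:{}: {}", file, line_number, text)
}

fn main() {
    // Matches are data: they go to stdout so pipes see only matches.
    println!("{}", format_match("src/main.rs", 12, "error: failed to connect"));
    // Status lines are diagnostics: they go to stderr and stay out of pipes.
    eprintln!("minigrep: scanned 1 file");
    std::io::stdout().flush().unwrap();
}
```

With this split, `search | wc -l` counts only the match line, never the status line.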

TTY detection is the gate for interactivity. The shell can redirect stdout to a file or pipe. Your program can call isatty(1) to check if stdout is a terminal. If the answer is false, you should avoid colors, interactive prompts, and animation. This is not just a visual choice; it is about stability. ANSI escape codes are invisible to humans but pollute pipes. They also break JSON output because they insert non-JSON bytes. Respecting NO_COLOR and a --color flag is the standard: default to auto, allow explicit override, and always provide a non-color mode.
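The auto/always/never decision can be isolated in one pure function. This sketch uses the standard library's `IsTerminal` trait (stable since Rust 1.70); whether `always` should override `NO_COLOR` is a design choice, and the policy shown here (explicit flag wins) is one defensible option, not the only one:

```rust
use std::env;
use std::io::{stdout, IsTerminal};

// Resolve whether to emit ANSI color, given a --color value of
// "auto", "always", or "never".
fn use_color(color_flag: &str, stdout_is_tty: bool, no_color_set: bool) -> bool {
    match color_flag {
        "always" => true,  // explicit request overrides everything (chosen policy)
        "never" => false,
        _ => stdout_is_tty && !no_color_set, // "auto": TTY and no NO_COLOR
    }
}

fn main() {
    let tty = stdout().is_terminal();
    let no_color = env::var_os("NO_COLOR").is_some();
    println!("color enabled: {}", use_color("auto", tty, no_color));
}
```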

Exit codes are the final word in an automated pipeline. They are the only universal signal that scripts and CI systems can rely on. In Unix, 0 means success and non-zero means some failure or a special state. For search tools, there is a subtle convention: 0 means match found, 1 means no match, and 2 means error (bad pattern, file not found, permission denied). That separation allows a script to distinguish “found nothing” from “failed to run.” This nuance makes your tool dependable in real automation. It also forces you to define what is an error. If a file is missing, that is an error. If a file is empty, that is not an error. A good CLI must document these semantics and keep them stable.
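The 0/1/2 mapping is small enough to capture in a single function; `exit_code` is an illustrative name, and the sketch assumes the run's outcome (error flag, match flag) has already been computed:

```rust
use std::process::exit;

// Map the search outcome to grep-style exit codes:
// 2 = error, 0 = match found, 1 = no match.
fn exit_code(had_error: bool, found_match: bool) -> i32 {
    if had_error { 2 } else if found_match { 0 } else { 1 }
}

fn main() {
    // Outcome of a hypothetical successful run with at least one match.
    let (had_error, found_match) = (false, true);
    exit(exit_code(had_error, found_match));
}
```

Note that errors take precedence: a run that both matched a line and then hit a read error should still report 2.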

Streams are also tied to buffering. When stdout goes to a terminal, it is typically line-buffered. When stdout goes to a pipe, it is typically block-buffered. That changes perceived latency. If your tool writes progress to stdout, that output may be delayed in pipes. This is another reason to keep progress on stderr, which is typically unbuffered so diagnostics appear immediately. You should understand your language’s defaults and be explicit about flushing in long-running operations.
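In Rust specifically, stdout is line-buffered and stderr is unbuffered; wrapping stdout in a `BufWriter` and flushing once at the end is the usual pattern for high-volume output. A minimal sketch (the `write_matches` helper is illustrative):

```rust
use std::io::{self, BufWriter, Write};

// Write matches through any writer; the caller decides when to flush.
fn write_matches<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // BufWriter batches small writes into fewer syscalls (block buffering).
    let mut out = BufWriter::new(stdout.lock());
    write_matches(&mut out, &["match 1", "match 2"])?;
    eprintln!("progress: done"); // stderr is unbuffered: appears immediately
    out.flush() // one explicit flush before exit
}
```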

A final subtlety is signals. When a user presses Ctrl+C, the process receives SIGINT. If you are mid-write, the terminal can be left in a partial line. A good CLI prints a newline on interrupt and exits with a specific code (often 130). Even though this project is simple, adopting the habit of clear exit paths is crucial.

How this fits into the project

This concept directly controls how minigrep-plus behaves in pipes and scripts. It also informs the design of your output modes, error messages, and exit code table in the specification. You will apply these rules in the scanning loop, the output formatter, and the CLI main function.

Definitions & key terms

  • stdin/stdout/stderr: The three default streams used for input, data output, and diagnostics.
  • TTY: A terminal device indicating interactive output.
  • Exit code: Integer process result used by scripts.
  • NO_COLOR: A convention to disable ANSI color output.
  • Pipeline: A chain of processes connected by pipes.

Mental model diagram (ASCII)

stdin --> [minigrep-plus] --> stdout --> next command
                      |
                      +--> stderr (diagnostics, progress)
                      +--> exit code (0, 1, 2)

How it works (step-by-step)

  1. The shell sets up stdin/stdout/stderr based on redirection and pipes.
  2. Your CLI checks isatty(stdout) and environment variables like NO_COLOR.
  3. The search loop emits match lines to stdout and diagnostics to stderr.
  4. On completion, the CLI maps outcome to exit codes: 0, 1, or 2.
  5. The shell reads the exit code and exposes it to scripts via $?.

Minimal concrete example

$ minigrep "main" src/main.rs
# stdout: matching lines only
# stderr: (empty)
# exit code: 0

$ minigrep "nope" src/main.rs
# stdout: empty
# stderr: empty
# exit code: 1

$ minigrep "[" src/main.rs
# stdout: empty
# stderr: error message
# exit code: 2

Common misconceptions

  • “Colored output is harmless in pipes.” -> Color codes corrupt machine-readable output.
  • “No match is success.” -> For grep-style tools, no match is a distinct state.
  • “stderr is optional.” -> Without stderr, you cannot keep data output clean.

Check-your-understanding questions

  1. Why should progress output go to stderr instead of stdout?
  2. What is the semantic difference between exit code 1 and exit code 2 in grep-like tools?
  3. How does TTY detection influence output formatting?
  4. What does NO_COLOR imply when stdout is a TTY?

Check-your-understanding answers

  1. Because stdout is the data channel used by pipes; diagnostics must not pollute it.
  2. Exit code 1 means no matches were found; exit code 2 means an error occurred.
  3. If stdout is not a TTY, output must be plain and non-interactive.
  4. It disables ANSI color output even when a terminal is present.

Real-world applications

  • grep, ripgrep, and ag use this contract to work in scripts.
  • curl and git separate data and diagnostics for automation.

Where you will apply it

In the scanning loop, the output formatter, and the exit-code mapping in your CLI’s main function (see Sections 4 and 5).

References

  • The Linux Programming Interface, Ch. 4 (I/O and file descriptors)
  • Advanced Programming in the UNIX Environment, Ch. 3 and 10 (I/O, signals)
  • Command Line Interface Guidelines (clig.dev)

Key insights

A CLI that gets streams and exit codes right becomes trustworthy in every pipeline.

Summary

Stream separation, TTY awareness, and exit codes are the three pillars of Unix composability. They determine whether your tool is safe to script and reliable to automate.

Homework/Exercises to practice the concept

  1. Run grep with matches and without matches, and record exit codes.
  2. Pipe a colored tool into wc -l and observe the difference.
  3. Write a tiny script that fails if a command returns non-zero.

Solutions to the homework/exercises

  1. Exit code 0 for matches, 1 for no matches, 2 for errors.
  2. Color codes add hidden bytes that break the count.
  3. Use if minigrep "panic" fixtures/sample.log; then echo ok; else echo fail; fi or check $?.

2.2 Pattern Matching and Streaming File I/O

Fundamentals

Pattern matching is the core of any search tool. At a minimum, you must compare a pattern to each line of input and decide whether it matches. This can be a literal substring search, a case-insensitive match, or a regular expression. Whatever matching mode you choose, the critical property is streaming behavior. Files can be large, and loading them fully into memory makes your tool slow and unreliable. Streaming I/O means reading incrementally, line by line or chunk by chunk, and emitting results as you go. This keeps memory usage constant and enables your tool to work on stdin without knowing the total size in advance. Streaming also aligns with Unix pipelines, where data flows continuously through multiple tools.

Deep Dive into the concept

Efficient file scanning requires a balance between correctness and performance. The simplest approach is line-based scanning: read each line, check for a match, print if matched. This is easy to implement and maps to the mental model of grep. But line-based scanning has edge cases. Lines can be extremely long, and naive line reading can allocate huge buffers or even fail. A robust implementation uses buffered reads with configurable limits or falls back to chunk-based scanning for long lines.

Substring search can be done with a naive algorithm (scan each line and compare character-by-character) or with more advanced algorithms such as Boyer-Moore or Knuth-Morris-Pratt. For this project, you can rely on your language’s standard library string search. The key is to understand that case-insensitive search requires normalization. You can either convert both pattern and line to lowercase or use locale-aware case folding. The former is simpler and deterministic; the latter is more correct for international text but more complex.
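The simpler lowercase-normalization approach described above can be a one-screen function; `matches_line` is an illustrative name, and in a real implementation you would fold the pattern once up front instead of per line:

```rust
// Case-insensitive substring match via simple case folding.
// to_lowercase() applies Unicode simple case mapping; full locale-aware
// folding (e.g. Turkish dotless i) is out of scope, as the text notes.
fn matches_line(line: &str, pattern: &str, ignore_case: bool) -> bool {
    if ignore_case {
        // Note: real code should lowercase the pattern once, not per line.
        line.to_lowercase().contains(&pattern.to_lowercase())
    } else {
        line.contains(pattern)
    }
}

fn main() {
    assert!(matches_line("ERROR: disk full", "error", true));
    assert!(!matches_line("ERROR: disk full", "error", false));
    println!("matcher ok");
}
```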

Regular expressions introduce another layer. If you support regex, you must decide on the engine (Rust’s regex crate, Go’s regexp, Python’s re) and understand their guarantees. Some engines are linear-time (Rust regex), while others can be exponential in pathological cases. This matters because a CLI tool must be safe against worst-case inputs. You should also be clear about whether the pattern is treated as a literal by default or a regex by default, and provide flags like --regex or --fixed-strings for explicit control.

Streaming I/O interacts with output formatting. If you colorize matches, you need to compute the match positions and wrap them with ANSI escape codes. This is straightforward for literal search but more complex for regex, which can return multiple ranges. You must ensure that colorization does not change the underlying text and that it can be disabled for non-TTY output. If you have a --line-number flag, you need to count lines as you stream, not by preloading the file.
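For the literal-search case, colorization can be sketched as a plain string replacement that leaves the text untouched when color is off (`highlight` is an illustrative helper; a regex matcher would instead wrap each match range individually):

```rust
// Wrap every occurrence of a literal pattern in ANSI red, leaving the
// underlying text unchanged when color is disabled.
fn highlight(line: &str, pattern: &str, color: bool) -> String {
    if !color || pattern.is_empty() {
        return line.to_string();
    }
    // \x1b[31m = red on, \x1b[0m = reset.
    line.replace(pattern, &format!("\x1b[31m{}\x1b[0m", pattern))
}

fn main() {
    println!("{}", highlight("panic: unwrap failed", "panic", true));
}
```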

Finally, input can come from stdin or files. If a file argument is provided, you read the file. If not, you read stdin. Some tools support both (multiple files and stdin). This requires a unified input abstraction so the rest of your logic does not care where bytes come from. You should also consider binary files. Many grep-like tools detect binary and either skip it or print a warning. For this project, you can define a policy: treat input as text and ignore invalid UTF-8, or treat it as bytes. The policy must be documented and consistent.
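The unified input abstraction mentioned above is idiomatic in Rust as a boxed `BufRead` trait object, so the scanning loop never cares whether bytes come from a file or stdin (`open_input` is an illustrative name):

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Unified input: downstream code sees only a BufRead, regardless of
// whether the source is a named file or stdin.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match path {
        Some(p) => Ok(Box::new(BufReader::new(File::open(p)?))),
        None => Ok(Box::new(BufReader::new(io::stdin()))),
    }
}

fn main() -> io::Result<()> {
    // Reading from stdin here; pass Some("app.log") to read a file instead.
    let reader = open_input(None)?;
    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}
```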

How this fits into the project

This concept determines the core scanning loop, the match logic, and the performance profile of your CLI. It also drives your decisions about regex support, case folding, and line numbering.

Definitions & key terms

  • Streaming I/O: Processing data incrementally without loading it all into memory.
  • Substring search: Checking if a pattern occurs inside a line.
  • Case folding: Converting text to a common case for comparison.
  • Regex engine: The library that evaluates regular expressions.
  • Buffering: Reading data in chunks to reduce system calls.

Mental model diagram (ASCII)

[Input bytes] -> [Buffered reader] -> [Line parser] -> [Match engine] -> [Output]

How it works (step-by-step)

  1. Create a buffered reader from a file or stdin.
  2. Read one line or chunk at a time.
  3. Apply the match function (literal or regex).
  4. If matched, format the line (line numbers, color) and print.
  5. Track counts and exit status based on whether any match occurred.

Minimal concrete example

use std::io::BufRead; // brings lines() into scope

let reader = std::io::BufReader::new(file); // file: std::fs::File
let mut found = false;
for (idx, line) in reader.lines().enumerate() {
    let line = line?; // propagate I/O errors (exit code 2 territory)
    if line.contains(pattern) { // literal match; swap in a matcher later
        println!("{}:{}", idx + 1, line);
        found = true;
    }
}

Common misconceptions

  • “Reading the whole file is fine.” -> Large files will exhaust memory and slow the tool.
  • “Regex is always better.” -> Regex can be slower and less predictable than substring search.
  • “Case-insensitive means tolower only.” -> Locale rules can make naive folding incorrect.

Check-your-understanding questions

  1. Why is streaming I/O critical for CLI tools that read stdin?
  2. What is the trade-off between substring search and regex search?
  3. How do line numbers affect streaming implementation?

Check-your-understanding answers

  1. stdin may be infinite or very large; streaming keeps memory bounded.
  2. Regex is more expressive but can be slower or complex; substring search is fast and predictable.
  3. You must count lines as you read, not after loading all data.

Real-world applications

  • ripgrep and grep use buffered scanning for speed and memory safety.
  • Log processing tools rely on streaming input to handle gigabyte logs.

Where you will apply it

In the buffered input reader, the core scanning loop, and the match engine (see Sections 4 and 5).

References

  • The Rust Programming Language, Ch. 12 (minigrep)
  • The Linux Programming Interface, Ch. 4 (buffered I/O)

Key insights

A search tool is only as good as its streaming loop; correctness and performance live there.

Summary

Streaming file I/O and pattern matching define the correctness, memory profile, and UX of your grep-style CLI.

Homework/Exercises to practice the concept

  1. Implement a tiny line counter that reads stdin.
  2. Compare substring search vs regex on a large file.
  3. Test behavior on a file with a very long line.

Solutions to the homework/exercises

  1. Use a buffered reader and increment a counter per line.
  2. Measure with time and compare outputs.
  3. Confirm your scanner does not panic or allocate excessive memory.

2.3 CLI Grammar and Argument Parsing

Fundamentals

A CLI grammar is the structured layout of flags, positionals, and subcommands that define how users express intent. For minigrep-plus, the grammar is small but still important because it is the first exposure to conventions users expect. The grammar must define where the pattern goes, where the file path goes, and which flags modify behavior. If the grammar is ambiguous, user errors become common and your help text becomes unreliable. Good argument parsing also gives you a consistent place to validate inputs and to expose defaults. This is the point where a CLI becomes a product instead of a quick script.

Deep Dive into the concept

The shell parses the command line before your program sees it. It expands globs, removes quotes, and splits on whitespace, giving you argv as a list of strings. Your parser must interpret that list. There are two primary styles: POSIX style, which uses single-letter options like -i, and GNU style, which adds long options like --ignore-case and optional option arguments. For this project, follow the CLI Guidelines: short options are single letters, long options are descriptive words, and -- ends option parsing. This allows the user to search for patterns that begin with - without confusing the parser.
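The `--` rule takes only a few lines to honor by hand. A sketch with an illustrative `split_args` helper (a bare `-`, conventionally meaning stdin, is treated as an operand here):

```rust
// Minimal argv scan honoring the `--` separator: everything after it
// is an operand even if it starts with '-'.
fn split_args(argv: &[String]) -> (Vec<String>, Vec<String>) {
    let mut flags = Vec::new();
    let mut operands = Vec::new();
    let mut no_more_flags = false;
    for arg in argv {
        if no_more_flags {
            operands.push(arg.clone());
        } else if arg == "--" {
            no_more_flags = true; // the separator itself is consumed
        } else if arg.starts_with('-') && arg != "-" {
            flags.push(arg.clone()); // "-" alone means stdin: an operand
        } else {
            operands.push(arg.clone());
        }
    }
    (flags, operands)
}

fn main() {
    let argv: Vec<String> = ["-i", "--", "-pattern", "file.log"]
        .iter().map(|s| s.to_string()).collect();
    let (flags, operands) = split_args(&argv);
    println!("flags={:?} operands={:?}", flags, operands);
}
```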

Argument parsing libraries (Clap, Cobra, argparse) can generate usage and help output. But you still need to design the grammar. Should the pattern be a positional argument or a flag like --pattern? The answer depends on ergonomics. grep PATTERN FILE is standard and fast to type, so a positional pattern is appropriate. But if you later add more operands, you must ensure they are not ambiguous. A good rule is “required operands first, optional operands last.” Another rule is to avoid optional positionals because they break error messages: when a positional is missing, the parser cannot distinguish between missing file and missing pattern.

Your grammar also defines defaults and overrides. For example, --ignore-case should default to false, and --color should default to auto. You should allow --color=always|auto|never to match other tools. Also consider aliases: -i for ignore case is standard. Exit codes should be documented in the --help output or the man page. A clear help output is part of the grammar. The CLI Guidelines recommend including a short summary, usage synopsis, and examples. This reduces support overhead and helps users explore the tool without reading external docs.

Parsing errors must be clear and actionable. If a user passes too few arguments, the error should include the expected usage and an example. If a user passes an unknown flag, show the closest match. Some parsers do this automatically. If you build custom parsing, you must implement it manually. Finally, argument parsing interacts with config and environment variables. In this project, you may include MINIGREP_IGNORE_CASE as an environment variable. If you do, define precedence clearly: flags override env, env overrides defaults. That pattern is standard and sets you up for later projects.
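The flag > env > default precedence can be pinned down in one resolver function. This sketch reads the MINIGREP_IGNORE_CASE variable named in the text (the function name and the accepted truthy values are illustrative choices):

```rust
use std::env;

// Resolve ignore_case with the standard precedence:
// explicit flag > environment variable > built-in default.
fn resolve_ignore_case(flag: Option<bool>, env_val: Option<&str>) -> bool {
    if let Some(f) = flag {
        return f; // an explicit flag always wins
    }
    if let Some(v) = env_val {
        // Accept "1" or "true" (any case) as truthy; a design choice.
        return v == "1" || v.eq_ignore_ascii_case("true");
    }
    false // built-in default
}

fn main() {
    let env_val = env::var("MINIGREP_IGNORE_CASE").ok();
    let resolved = resolve_ignore_case(None, env_val.as_deref());
    println!("ignore_case = {}", resolved);
}
```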

How this fits into the project

This concept controls how users invoke the tool and how you present help. It also determines how you validate inputs and produce consistent error messages.

Definitions & key terms

  • Positional argument: An operand determined by its position.
  • Flag / option: A named switch that modifies behavior.
  • -- separator: Ends option parsing.
  • Usage synopsis: The one-line description of how to call the CLI.
  • Default value: A value used when a flag is not provided.

Mental model diagram (ASCII)

argv[] -> [Parser] -> {pattern, file?, flags} -> [Validation] -> [Execution]

How it works (step-by-step)

  1. Receive argv from the shell.
  2. Parse flags, stopping at -- if present.
  3. Assign required positionals (pattern, optional file).
  4. Validate values and set defaults.
  5. Either show help or run the search loop.

Minimal concrete example

$ minigrep --ignore-case --color=auto "error" ./app.log
# pattern=error, file=./app.log, ignore_case=true, color=auto

Common misconceptions

  • “The CLI can see the original quotes.” -> The shell removes quotes before argv.
  • “Optional positionals are harmless.” -> They lead to ambiguous parsing.
  • “Help output is optional.” -> It is part of the user contract.

Check-your-understanding questions

  1. Why should a CLI implement the -- separator?
  2. What is the risk of optional positional arguments?
  3. How should defaults and environment variables interact?

Check-your-understanding answers

  1. It allows operands that start with - to be treated as data.
  2. They create ambiguity and poor error messages.
  3. Defaults < env vars < explicit flags.

Real-world applications

  • git and kubectl rely on consistent grammars to scale.
  • grep and rg use -i and --ignore-case for ergonomics.

Where you will apply it

In the argument parser, the --help output, and input validation at the start of main (see Sections 4 and 5).

References

  • POSIX Utility Syntax Guidelines
  • Command Line Interface Guidelines (clig.dev)

Key insights

A clear grammar is the foundation of discoverability and trust in any CLI.

Summary

Argument parsing is not just about flags; it is the design of user intent. Good grammars reduce ambiguity and improve automation.

Homework/Exercises to practice the concept

  1. Draft a usage synopsis for minigrep-plus with flags and operands.
  2. Design a --color flag that mirrors grep behavior.
  3. Create three example commands that would appear in --help.

Solutions to the homework/exercises

  1. minigrep [OPTIONS] PATTERN [FILE] with flags listed below.
  2. --color=auto|always|never with default auto.
  3. Include a basic search, a case-insensitive search, and a piped example.

3. Project Specification

3.1 What You Will Build

You will build a CLI tool named minigrep that searches for a pattern in either a file or stdin. The tool supports case-insensitive search, line numbering, and optional colorized matches when stdout is a TTY. It prints only matching lines to stdout and diagnostics to stderr. It returns exit code 0 when a match is found, 1 when no match is found, and 2 on errors. The tool does not implement full regex or multiline matching; it is line-based and focused on predictable behavior.

3.2 Functional Requirements

  1. Search from file or stdin: If a file is provided, read it; otherwise read stdin.
  2. Pattern matching: Support literal substring matching; optional regex mode if implemented.
  3. Case-insensitive mode: -i or --ignore-case toggles case folding.
  4. Line numbers: -n or --line-number prepends line numbers.
  5. Color mode: --color=auto|always|never with auto default and NO_COLOR support.
  6. Exit codes: 0 match found, 1 no match, 2 error.
  7. Help output: --help prints usage and examples.
  8. Machine output: --json outputs matches as JSON objects (optional but recommended).

3.3 Non-Functional Requirements

  • Performance: Must handle files larger than memory using streaming I/O.
  • Reliability: Must not crash on empty files or long lines.
  • Usability: Output must be stable and easy to parse; errors must be clear.

3.4 Example Usage / Output

$ minigrep "error" ./app.log --line-number
12: error: failed to connect
58: error: retrying

3.5 Data Formats / Schemas / Protocols

If --json is used, output one JSON object per match (JSONL):

{"file":"./app.log","line":12,"text":"error: failed to connect"}
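One JSONL line per match can be emitted without a JSON library as long as string escaping is handled. A minimal sketch (helper names are illustrative; a real implementation might prefer a crate like serde_json):

```rust
// Escape the characters that must be escaped inside JSON strings.
fn json_escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Render one match as a single JSONL record matching the schema above.
fn match_to_jsonl(file: &str, line: usize, text: &str) -> String {
    format!(
        "{{\"file\":\"{}\",\"line\":{},\"text\":\"{}\"}}",
        json_escape(file), line, json_escape(text)
    )
}

fn main() {
    println!("{}", match_to_jsonl("./app.log", 12, "error: failed to connect"));
}
```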

3.6 Edge Cases

  • Empty input should return exit code 1 (no match) with no output.
  • Pattern contains leading - should be accepted after --.
  • Very long lines should not crash the tool.
  • Binary files with invalid UTF-8 should be handled deterministically (document policy).

3.7 Real World Outcome

This section is a golden reference. A learner can compare their result against it directly.

3.7.1 How to Run (Copy/Paste)

# Build
cargo build --release

# Run on file
./target/release/minigrep "panic" ./fixtures/sample.log

# Run on stdin
cat ./fixtures/sample.log | ./target/release/minigrep "panic"

3.7.2 Golden Path Demo (Deterministic)

$ ./target/release/minigrep "panic" ./fixtures/sample.log --line-number --color=never
3: panic: index out of bounds
9: panic: unwrap failed

$ echo $?
0

3.7.3 Failure Demo (Deterministic)

$ ./target/release/minigrep "not-found" ./fixtures/sample.log
$ echo $?
1

$ ./target/release/minigrep "[" ./fixtures/sample.log
minigrep: invalid pattern: "["
$ echo $?
2

3.7.4 Exit Codes

  • 0: At least one match found.
  • 1: No matches found.
  • 2: Error (invalid pattern, file not found, permission denied).

4. Solution Architecture

4.1 High-Level Design

+------------------+
| CLI Parser       |
+------------------+
          |
          v
+------------------+     +------------------+
| Input Reader     | --> | Match Engine     |
+------------------+     +------------------+
          |                        |
          v                        v
+------------------+     +------------------+
| Output Formatter | --> | Exit Code Mapper |
+------------------+     +------------------+

4.2 Key Components

Component | Responsibility | Key Decisions
Parser | Parse flags/positionals | Follow POSIX conventions, support --.
Reader | Stream input | Buffered reading, line-by-line processing.
Matcher | Determine matches | Literal search first, optional regex later.
Formatter | Output formatting | Respect TTY and NO_COLOR.
Exit mapper | Map outcomes to exit codes | 0/1/2 semantics.

4.3 Data Structures (No Full Code)

struct MatchRecord {
    file: Option<String>,
    line_number: usize,
    text: String,
}

4.4 Algorithm Overview

Key Algorithm: Streaming Scan

  1. Open file or stdin as a buffered reader.
  2. For each line, check if it matches.
  3. If match, format and print the line.
  4. Track whether any matches occur.
  5. Return exit code based on match count and error state.

Complexity Analysis:

  • Time: O(N) where N is total bytes scanned.
  • Space: O(L) where L is max line length.

5. Implementation Guide

5.1 Development Environment Setup

rustup default stable
cargo new minigrep
cd minigrep

5.2 Project Structure

minigrep/
├── src/
│   ├── main.rs
│   ├── args.rs
│   ├── matcher.rs
│   └── output.rs
├── fixtures/
│   └── sample.log
└── Cargo.toml

5.3 The Core Question You’re Answering

“How do I design output that is useful both to humans and to scripts?”

Before you write any code, sit with this question. A CLI is not just an app; it is a component in someone else’s pipeline.

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Streams and exit codes
    • Why stderr is separate from stdout.
    • How exit codes propagate to scripts.
    • Book: The Linux Programming Interface, Ch. 4.
  2. TTY detection
    • How isatty changes output rules.
    • Why NO_COLOR exists.
    • Book: Advanced Programming in the UNIX Environment, Ch. 18.
  3. Streaming I/O
    • Why you cannot load huge files into memory.
    • Book: The Linux Programming Interface, Ch. 4.

5.5 Questions to Guide Your Design

  1. What should the tool do when no file is provided?
  2. Should --color=always override NO_COLOR?
  3. How should errors be reported without breaking pipelines?
  4. Is “no match” a failure or a normal state?

5.6 Thinking Exercise

Trace this pipeline by hand:

cat ./fixtures/sample.log | ./target/release/minigrep "panic" | wc -l

  • Which stream should each output line go to?
  • What exit code should be reported?

5.7 The Interview Questions They’ll Ask

  1. “Why do Unix tools separate stdout and stderr?”
  2. “What is a TTY and why does it change output behavior?”
  3. “Why does grep return exit code 1 when nothing is found?”

5.8 Hints in Layers

Hint 1: Start with a literal match. Write the simplest possible matcher that checks line.contains(pattern).

Hint 2: Add line numbers. Increment a counter per line and format output as N: line.

Hint 3: Add TTY and color. If stdout is a TTY and --color is auto, wrap matches with ANSI codes.

Hint 4: Map exit codes. Return 0 if a match is found, 1 if none, 2 on error.

5.9 Books That Will Help

Topic | Book | Chapter
Streams | The Linux Programming Interface | Ch. 4
Terminal I/O | Advanced Programming in the UNIX Environment | Ch. 18
CLI basics | The Linux Command Line | Ch. 6

5.10 Implementation Phases

Phase 1: Foundation (2-3 hours)

Goals:

  • Parse arguments and flags.
  • Read input from file or stdin.

Tasks:

  1. Implement a basic CLI with pattern and optional file.
  2. Add --help with usage and examples.

Checkpoint: minigrep "x" file prints matching lines.

Phase 2: Core Functionality (2-4 hours)

Goals:

  • Case-insensitive search.
  • Line numbers and exit codes.

Tasks:

  1. Implement --ignore-case.
  2. Track match state and exit codes.

Checkpoint: minigrep "x" file; echo $? matches grep behavior.

Phase 3: Polish and Edge Cases (2-3 hours)

Goals:

  • TTY detection and color rules.
  • Error handling and JSON output.

Tasks:

  1. Add --color=auto|always|never and NO_COLOR handling.
  2. Add --json output mode.

Checkpoint: minigrep "panic" fixtures/sample.log --json | jq works without color pollution.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Matching mode | literal only, regex | literal by default | Predictable performance.
Color behavior | always, auto, never | auto default | Matches Unix conventions.
Input model | line-based, chunk-based | line-based | Simpler and sufficient.

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Validate matcher logic | case-insensitive matches
Integration Tests | Validate CLI output | matches from stdin
Edge Case Tests | Stress boundaries | long lines, empty files

6.2 Critical Test Cases

  1. Match found: Search in file with known matches; exit code 0.
  2. No match: Search with pattern missing; exit code 1.
  3. Invalid pattern: If regex enabled, invalid pattern returns exit code 2.
  4. Pipe behavior: Piped output contains no ANSI codes when TTY is false.
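Test case 4 can be checked at the unit level before any pipe is involved: with color disabled, formatted output must contain no ANSI escape bytes. A sketch (`format_line` is a stand-in for your real formatter):

```rust
// Stand-in formatter: wraps text in ANSI red only when color is on.
fn format_line(text: &str, color: bool) -> String {
    if color {
        format!("\x1b[31m{}\x1b[0m", text)
    } else {
        text.to_string()
    }
}

// ANSI escape sequences always begin with the ESC byte (0x1b).
fn contains_ansi(s: &str) -> bool {
    s.contains('\x1b')
}

fn main() {
    assert!(!contains_ansi(&format_line("panic: unwrap failed", false)));
    assert!(contains_ansi(&format_line("panic: unwrap failed", true)));
    println!("no ANSI leakage when color is off");
}
```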

6.3 Test Data

fixtures/sample.log
- contains two lines with "panic"
- contains one line with "error"

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Output on stdout includes errors | Pipelines break | Print errors to stderr only.
Always color output | JSON/pipes corrupted | Use TTY detection and NO_COLOR.
Exit code always 0 | Scripts misbehave | Track match state and errors.

7.2 Debugging Strategies

  • Reproduce with pipes: minigrep "panic" fixtures/sample.log | cat to detect color leakage.
  • Inspect exit code: Use echo $? after each run.
  • Use small fixtures: Keep deterministic sample files for tests.

7.3 Performance Traps

  • Reading full files into memory causes slowdowns on large logs.
  • Regex engines can be slow on catastrophic patterns; prefer literal search.

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add --count to print the number of matching lines.
  • Add --files-with-matches to list filenames only.

8.2 Intermediate Extensions

  • Add multi-file support with filename prefixes.
  • Add --before-context and --after-context options.

8.3 Advanced Extensions

  • Add regex mode with safe engine and timeouts.
  • Add parallel file scanning for large directories.

9. Real-World Connections

9.1 Industry Applications

  • Log analysis tools use grep-like filtering as a core primitive.
  • CI pipelines rely on exit codes to decide pass/fail stages.

9.2 Open Source Examples

  • ripgrep - fast recursive search tool with streaming design.
  • ag (The Silver Searcher) - grep-like search optimized for code.

9.3 Interview Relevance

  • Streams and exit codes are common OS interview topics.
  • CLI grammar design is tested in systems and tooling interviews.

10. Resources

10.1 Essential Reading

  • The Rust Programming Language, Ch. 12 (minigrep)
  • The Linux Programming Interface, Ch. 4 (I/O)

10.2 Video Resources

  • “Unix Pipes and Filters” lectures (MIT Missing Semester)
  • “Command Line Interface Guidelines” talk

10.3 Tools and Documentation

  • clap documentation for Rust argument parsing
  • atty or is-terminal libraries for TTY detection

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why stdout and stderr must be separate.
  • I can explain the difference between exit code 1 and 2.
  • I can explain when a CLI should disable color output.

11.2 Implementation

  • All functional requirements are met.
  • All test cases pass.
  • Edge cases are handled without crashes.

11.3 Growth

  • I can explain this project in an interview.
  • I can point to at least one design trade-off I made.
  • I documented lessons learned.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parses flags and positionals correctly.
  • Searches stdin or a file using streaming I/O.
  • Returns exit codes 0/1/2 with correct semantics.

Full Completion:

  • Adds color rules, line numbers, and JSON output.
  • Includes robust error messages and help text.

Excellence (Going Above and Beyond):

  • Supports multiple files with filename prefixes.
  • Adds context lines and safe regex mode.