Project 4: stream-viz (Streaming Visualizer)

Build a streaming CLI that reports live throughput and line metrics without breaking pipelines.

Quick Reference

Attribute                          Value
Difficulty Level                   3 (Advanced)
Time Estimate                      1 week
Main Programming Language          Rust
Alternative Programming Languages  Go, Python
Coolness Level                     Level 3: Genuinely clever
Business Potential                 2: Internal tooling
Prerequisites                      Async I/O, signals, streams
Key Topics                         Streaming I/O, TTY updates, SIGINT handling

1. Learning Objectives

By completing this project, you will:

  1. Implement non-blocking streaming input processing.
  2. Render live status updates without corrupting stdout.
  3. Handle SIGINT and SIGTERM with clean shutdown.
  4. Separate data output from progress output reliably.
  5. Provide deterministic summary output for scripts.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Streaming I/O and Buffering

Fundamentals

Streaming I/O means processing data as it arrives rather than loading it all at once. For a CLI that reads stdin, this is mandatory because stdin can be infinite. Buffering is how you balance performance and responsiveness: reading in small chunks reduces latency but increases syscalls, while large buffers improve throughput but delay updates. A streaming visualizer must balance both, providing real-time metrics while still handling large volumes of input efficiently.

Deep Dive into the concept

At the OS level, pipes and stdin are file descriptors that deliver bytes. Reads may block until data arrives. If you read in a tight loop with small buffers, you may consume CPU. If you read with large buffers, you may not emit updates frequently enough. The solution is to decouple reading from rendering. Use a buffered reader to pull chunks of bytes, then parse them into lines or byte counts, and maintain counters in memory. For real-time updates, you can run a timer that fires every N milliseconds and prints the current stats to stderr. This allows reading and rendering to proceed independently.
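The decoupling described above can be sketched as follows. `run_loop` is a hypothetical driver and the 200 ms interval is arbitrary; treat this as a shape, not the implementation:

```rust
use std::io::Write;
use std::time::{Duration, Instant};

// Sketch: decouple rendering from reading by checking an interval inside
// the read loop. A separate timer thread is the alternative mentioned above.
fn run_loop(chunks: u64) -> u64 {
    let interval = Duration::from_millis(200);
    let mut last_render = Instant::now();
    let mut lines = 0u64;
    for _ in 0..chunks {
        lines += 1; // stand-in for parsing one chunk
        if last_render.elapsed() >= interval {
            // live metrics go to stderr, never stdout
            let _ = write!(std::io::stderr(), "\r{} lines", lines);
            last_render = Instant::now();
        }
    }
    lines
}

fn main() {
    println!("{}", run_loop(5)); // final, pipe-safe output on stdout
}
```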

Line counting introduces its own complexity. Lines are determined by newline bytes. If you read in chunks, a line can span multiple chunks. You must carry partial lines across reads. This is a common bug in streaming tools. The safe approach is to keep a small buffer of trailing bytes after each chunk and prepend it to the next chunk before scanning for newlines.

Buffering interacts with output. stdout might be piped, and it is not safe to print progress there. Therefore, you should reserve stdout for final outputs (if any) and use stderr for live metrics. In many designs, the live display uses carriage returns (\r) to update a single line. That works in a TTY but not in pipes. Therefore, you must detect TTY and disable live updates when stdout or stderr is not a terminal.

Finally, performance matters. A high-throughput stream should not allocate per line. Reuse buffers, avoid string conversions where possible, and keep the hot path simple. This is a real systems problem: the faster the input flows, the more unnecessary allocations and logging cost you. The tool should still process tens of MB/s without lag.

How this fits into the project

This concept defines the core read loop and the metrics counters. It also impacts the output architecture and the performance guarantees in the spec.

Definitions & key terms

  • Buffering: Reading in fixed-size chunks to amortize syscall cost.
  • Non-blocking: I/O that returns immediately instead of stalling the read loop while waiting for data.
  • Throughput: Bytes or lines per second.
  • Partial line: A line split across chunks.

Mental model diagram (ASCII)

stdin -> [chunk reader] -> [line parser] -> [counters]
                       -> [timer] -> [live render]

How it works (step-by-step)

  1. Read bytes into a buffer from stdin.
  2. Count bytes and detect newlines.
  3. Update counters and carry over partial lines.
  4. Every interval, compute rates and render to stderr if TTY.
  5. On EOF, print final summary.

Minimal concrete example

Chunk: "ping\npo"
Lines: 1, partial: "po"
Next chunk: "ng\n"
Lines: +1, partial: ""
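
The example above can be traced in code. A minimal sketch, assuming a hypothetical `count_lines` helper that tracks only newline bytes plus a `partial` flag; for counting alone no byte carry is needed, though a real tool also carries the trailing bytes when it must inspect line contents:

```rust
// Newline bytes alone determine complete lines, even across chunk
// boundaries; `partial` remembers whether bytes arrived after the last '\n'.
fn count_lines(chunks: &[&[u8]]) -> (u64, bool) {
    let mut lines = 0u64;
    let mut partial = false;
    for chunk in chunks {
        for &b in *chunk {
            if b == b'\n' {
                lines += 1;
                partial = false; // line completed
            } else {
                partial = true; // carries into the next chunk
            }
        }
    }
    (lines, partial) // partial == true means an unterminated final line
}

fn main() {
    // "ping\npo" then "ng\n" -> 2 complete lines, nothing left over
    let (lines, partial) = count_lines(&[b"ping\npo".as_slice(), b"ng\n".as_slice()]);
    assert_eq!((lines, partial), (2, false));
    println!("{} lines", lines);
}
```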

Common misconceptions

  • “Line counting is trivial.” -> Chunked reads can split lines.
  • “Progress output can go to stdout.” -> It breaks pipelines.
  • “Large buffers always improve performance.” -> They reduce update granularity.

Check-your-understanding questions

  1. Why must partial lines be buffered across reads?
  2. What is the trade-off between buffer size and update frequency?
  3. Why should progress output go to stderr?

Check-your-understanding answers

  1. Because lines can be split across chunks and you must not double count.
  2. Larger buffers increase throughput but delay feedback.
  3. To keep stdout clean for piping and scripts.

Real-world applications

  • Log tailing tools and metrics agents use similar streaming patterns.
  • pv (Pipe Viewer) is a classic throughput visualizer.

Where you will apply it

In the Reader and Counter components (Section 4) and the Phase 1 streaming core (Section 5.10).

References

  • The Linux Programming Interface, Ch. 4 (I/O)
  • pv source code concepts

Key insights

Streaming tools must separate reading from rendering to stay both fast and responsive.

Summary

Chunked reads, partial line buffering, and timer-driven updates are the foundation of streaming CLIs.

Homework/Exercises to practice the concept

  1. Write a program that counts lines from stdin using chunked reads.
  2. Compare performance of 4KB vs 64KB buffers.
  3. Implement a timer that prints throughput every second.

Solutions to the homework/exercises

  1. Keep a carry buffer for partial lines.
  2. Measure using time and record throughput.
  3. Use a ticker to print stats to stderr.

2.2 TTY Rendering and Output Discipline

Fundamentals

Live updating output is only safe in a TTY. When stdout or stderr is redirected, control sequences like \r and ANSI clear commands become noise. TTY-aware rendering is the discipline of only using interactive output when a terminal is present, and falling back to plain, line-based output otherwise. This keeps your tool script-friendly and avoids corrupting pipes.

Deep Dive into the concept

A streaming visualizer often wants to update a single line in place. That is done with carriage return (\r) and optional clear-to-end-of-line sequences. This is a TTY-only technique. If you emit these bytes to a file or pipe, they become literal characters and pollute logs. The correct design is to detect TTYs at runtime and enable live rendering only when both stdout and stderr are terminals (or when a --force-tty flag is provided).

When TTY is present, you still need to consider flicker. The safest approach is to redraw at a fixed rate, such as once per second, rather than on every read. That ensures stable output and predictable CPU usage. You should also avoid color unless the terminal supports it or unless the user requests it. The NO_COLOR convention applies here as well.
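
One way to combine the TTY check with NO_COLOR is a pure decision function. `color_enabled` is a hypothetical name; passing the environment value in keeps the decision testable:

```rust
use std::ffi::OsString;

// Sketch of TTY- and NO_COLOR-aware color gating. Per the NO_COLOR
// convention, any set, non-empty value disables color regardless of TTY.
fn color_enabled(stderr_is_tty: bool, no_color: Option<OsString>) -> bool {
    let disabled = no_color.map_or(false, |v| !v.is_empty());
    stderr_is_tty && !disabled
}

fn main() {
    let enabled = color_enabled(true, std::env::var_os("NO_COLOR"));
    println!("color: {}", enabled);
}
```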

Output discipline also means separating data output from progress output. For this tool, the core data is the final summary, which can go to stdout. Live updates should go to stderr. This keeps stdout clean for scripts. It also means you should not write any diagnostics to stdout. If you want to provide a --json summary, that should always be written to stdout and be deterministic.

Another aspect is resizing. If the terminal resizes, a single-line render still works, but a multi-line display would need to recalculate its layout. This project can remain single-line, but you should still handle SIGWINCH if you add multi-line output later. Documenting this makes your design future-proof.

How this fits into the project

This concept governs when live updates are enabled, how progress is rendered, and where summary output goes. It also defines the behavior of --no-live or --quiet flags.

Definitions & key terms

  • TTY rendering: Output that assumes a terminal.
  • Carriage return: \r used to return to line start.
  • Output discipline: Keeping stdout clean for data.
  • NO_COLOR: Environment variable disabling colors.

Mental model diagram (ASCII)

TTY? -> yes -> live updates on stderr
    -> no  -> only final summary

How it works (step-by-step)

  1. Check TTY status of stderr (and optionally stdout).
  2. If TTY, print live updates with \r and flush.
  3. If not TTY, disable live updates entirely.
  4. Print final summary to stdout on completion.
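
A minimal sketch of steps 1 and 2, using `std::io::IsTerminal` (stable since Rust 1.70); `render` is a hypothetical helper that no-ops when stderr is piped:

```rust
use std::io::{self, IsTerminal, Write};

// Only draw the live line when stderr is an interactive terminal.
fn render(rate: u64, total: u64) -> io::Result<()> {
    let mut err = io::stderr();
    if err.is_terminal() {
        // \r returns to column 0 so the next update overwrites this line
        write!(err, "\rRate: {} lines/s | Total: {}", rate, total)?;
        err.flush()?; // explicit flush; cheap since Rust's stderr is unbuffered
    }
    Ok(())
}

fn main() -> io::Result<()> {
    render(120_000, 7_000_000)
}
```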

Minimal concrete example

$ yes "ping" | stream-viz --rate
Rate: 120000 lines/s | Total: 7000000  # stderr updates

$ yes "ping" | stream-viz --rate | cat
# no live updates, only final summary printed to stdout

Common misconceptions

  • “TTY detection is optional.” -> It prevents corrupted output.
  • “stderr is only for errors.” -> It is also for progress and diagnostics.
  • “Always show color.” -> Color must be optional and TTY-aware.

Check-your-understanding questions

  1. Why should live updates go to stderr?
  2. What happens if you send \r to a file?
  3. How does NO_COLOR affect live output?

Check-your-understanding answers

  1. To keep stdout clean for pipelines and scripts.
  2. The file contains raw control characters and looks corrupted.
  3. It disables color even if TTY is present.

Real-world applications

  • pv and rsync print progress to stderr.
  • Many CLI tools use \r for progress bars only in TTYs.

Where you will apply it

In the Renderer component, the --no-live flag, and the output rules that keep stdout clean (Sections 3 and 4).

References

  • Command Line Interface Guidelines
  • Advanced Programming in the UNIX Environment, Ch. 18

Key insights

TTY-aware rendering preserves both usability and pipeline correctness.

Summary

Live output must be conditional on TTY presence and separated from data output.

Homework/Exercises to practice the concept

  1. Write a progress loop that updates a single line with \r.
  2. Pipe the output to a file and observe the corruption.
  3. Add a flag to disable live updates.

Solutions to the homework/exercises

  1. Use eprint!("\rRate: 1000 lines/s") and flush stderr, so progress goes to stderr rather than stdout.
  2. Observe raw carriage returns in the file.
  3. Skip rendering when --no-live is set.

2.3 Signal Handling and Graceful Shutdown

Fundamentals

Signals are how the OS communicates with your process. For a streaming CLI, the most important signal is SIGINT (Ctrl+C). A good tool should handle SIGINT gracefully: stop reading, flush any pending output, restore terminal state, and exit with a well-defined code. If you ignore signals, the user experience is jarring and can leave the terminal in a weird state. Signal handling is a key part of production CLI behavior.

Deep Dive into the concept

By default, SIGINT terminates a process immediately. If you are in the middle of writing to the terminal or updating a live display, that can leave an unfinished line or a broken layout. Therefore, you should register a signal handler. The handler should set a flag indicating shutdown and allow the main loop to finish cleanly. This is safer than doing heavy work directly in the signal handler, which is constrained by async-signal-safe rules in many languages.

Graceful shutdown is about state consistency. For stream-viz, you need to output a final summary when the user stops the stream. This should happen both on EOF and on SIGINT. The summary should be deterministic and should include total bytes, total lines, and average rate. If you can, include the elapsed time. When output is scripted, this summary should be machine-readable via --json.

Exit codes are part of the signal contract. Many tools use 130 to indicate termination by SIGINT (128 + signal number). This is consistent with shell conventions. You should document this and ensure your CLI exits with code 130 on SIGINT. For other signals like SIGTERM, you can also exit with 143. These details matter for automation, especially in CI or pipeline monitoring.
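
The 128 + signal-number arithmetic is worth pinning down; a trivial sketch:

```rust
// Shell convention: a process killed by signal N reports exit code 128 + N.
fn exit_code_for_signal(signum: i32) -> i32 {
    128 + signum // SIGINT(2) -> 130, SIGTERM(15) -> 143
}

fn main() {
    println!("SIGINT -> {}", exit_code_for_signal(2));
    println!("SIGTERM -> {}", exit_code_for_signal(15));
}
```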

Finally, signal handling interacts with timers and threads. If you use a timer thread for periodic updates, you must stop it on shutdown to avoid writing after you decide to exit. Use a shared atomic flag or a cancellation channel. This is a great place to practice clean concurrency shutdown patterns.
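
The flag-based shutdown can be sketched without a signals dependency. Here a spawned thread stands in for the SIGINT handler (a real build would register the flag via a crate such as signal-hook), and `run` is a hypothetical loop driver:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// The main loop polls a shared atomic flag and exits cleanly when it is set,
// instead of doing work inside the (restricted) signal handler itself.
fn run(shutdown: Arc<AtomicBool>) -> u64 {
    let mut ticks = 0u64;
    while !shutdown.load(Ordering::Relaxed) {
        ticks += 1; // stand-in for one read/render iteration
        thread::sleep(Duration::from_millis(10));
    }
    ticks // loop exits cleanly; the caller prints the summary
}

fn main() {
    let shutdown = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&shutdown);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        flag.store(true, Ordering::Relaxed); // "SIGINT arrived"
    });
    let ticks = run(shutdown);
    println!("summary after {} iterations", ticks);
    // a real CLI would now exit with code 130 on SIGINT
}
```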

How this fits into the project

This concept defines how Ctrl+C behaves, how the final summary is printed, and which exit codes are used. It ensures the tool feels professional in real use.

Definitions & key terms

  • Signal: OS-level notification to a process.
  • SIGINT: Interrupt signal from Ctrl+C.
  • Graceful shutdown: Controlled termination with cleanup.
  • Exit code 130: Common code for SIGINT termination.

Mental model diagram (ASCII)

SIGINT -> set shutdown flag -> stop loop -> print summary -> exit 130

How it works (step-by-step)

  1. Register a signal handler for SIGINT and SIGTERM.
  2. On signal, set a shutdown flag.
  3. Break the read loop and stop timers.
  4. Print final summary to stdout.
  5. Exit with code 130 or 143.

Minimal concrete example

$ yes "ping" | stream-viz
^C
Final summary: 7000000 lines, avg 118000 lines/s
# exit code 130

Common misconceptions

  • “Signals can be ignored.” -> Users expect immediate, clean stop.
  • “Printing inside handler is safe.” -> It can deadlock or corrupt output.
  • “Exit code does not matter.” -> Scripts depend on it.

Check-your-understanding questions

  1. Why should the handler avoid heavy work?
  2. What exit code is typical for SIGINT?
  3. How do timers interact with shutdown?

Check-your-understanding answers

  1. Signal handlers are restricted; heavy work can be unsafe.
  2. 130 (128 + 2).
  3. Timers must be stopped to avoid writing after exit.

Real-world applications

  • curl and rsync handle SIGINT with clean summaries.
  • Streaming agents report totals on shutdown.

Where you will apply it

In the signal handler component and the exit-code contract (Sections 3.7.4 and 4).

References

  • Advanced Programming in the UNIX Environment, Ch. 10 (signals)
  • The Linux Programming Interface, signal handling sections

Key insights

Graceful signal handling turns a brittle streaming tool into a reliable one.

Summary

Signals are part of the CLI contract. Handle them explicitly and document exit codes.

Homework/Exercises to practice the concept

  1. Write a program that traps SIGINT and prints a message.
  2. Make a timer that stops when a shutdown flag is set.
  3. Print a final summary on shutdown.

Solutions to the homework/exercises

  1. Use a signal handler and an atomic flag.
  2. Check the flag inside the timer loop.
  3. Print totals before exiting.

3. Project Specification

3.1 What You Will Build

A CLI named stream-viz that reads stdin and reports throughput and line metrics. It prints live updates to stderr when attached to a TTY, and prints a final deterministic summary to stdout on exit. It handles SIGINT gracefully and returns exit code 130 when interrupted.

3.2 Functional Requirements

  1. Read from stdin and process streaming input.
  2. Compute metrics: total bytes, total lines, lines/sec, bytes/sec.
  3. Live updates when stderr is a TTY (configurable).
  4. Final summary printed to stdout.
  5. Signal handling for SIGINT and SIGTERM.
  6. JSON summary mode for scripts.
  7. Deterministic test mode: optional --max-lines to stop after N lines.

3.3 Non-Functional Requirements

  • Performance: handle 50 MB/s without lag.
  • Reliability: no output corruption in pipes.
  • Usability: clear flags and help output.

3.4 Example Usage / Output

$ yes "ping" | stream-viz --rate --lines
Rate: 120000 lines/s | Total: 7234511

3.5 Data Formats / Schemas / Protocols

{"bytes":123456789,"lines":7234511,"seconds":60.0,"lines_per_sec":120000}
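
A sketch of emitting this schema without a JSON dependency, assuming a hypothetical `summary_json` helper; `format!` keeps field order and layout deterministic, which the golden-path demo relies on:

```rust
// Build the summary JSON by hand; field order matches the schema above.
fn summary_json(bytes: u64, lines: u64, seconds: f64) -> String {
    let lps = if seconds > 0.0 { lines as f64 / seconds } else { 0.0 };
    format!(
        "{{\"bytes\":{},\"lines\":{},\"seconds\":{:.1},\"lines_per_sec\":{:.0}}}",
        bytes, lines, seconds, lps
    )
}

fn main() {
    println!("{}", summary_json(123456789, 7234511, 60.0));
}
```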

3.6 Edge Cases

  • stdin is empty -> summary shows zero counts.
  • stdout is not a TTY -> no live output.
  • SIGINT mid-stream -> summary printed and exit code 130.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

cargo build --release

yes "ping" | ./target/release/stream-viz --rate --lines

3.7.2 Golden Path Demo (Deterministic)

$ STREAM_VIZ_NOW=2026-01-01T00:00:00Z ./target/release/stream-viz --lines --max-lines 1000 --json < ./fixtures/lines_1000.txt
{"bytes":8000,"lines":1000,"seconds":1.0,"lines_per_sec":1000}
$ echo $?
0

3.7.3 Failure Demo (Deterministic)

Redirecting from a nonexistent path would fail in the shell before stream-viz even runs, so use a read that fails inside the tool, such as redirecting from a directory (read() returns EISDIR on Linux):

$ ./target/release/stream-viz --rate --lines < /
stream-viz: failed to read stdin
$ echo $?
2

3.7.4 Exit Codes

  • 0: Success.
  • 2: Input error.
  • 130: Interrupted by SIGINT.

4. Solution Architecture

4.1 High-Level Design

+------------------+
| Reader           |
+------------------+
          |
          v
+------------------+     +------------------+
| Metrics Counter  | --> | Renderer         |
+------------------+     +------------------+
          |                        |
          v                        v
+------------------+     +------------------+
| Summary Builder  | --> | Exit Handler     |
+------------------+     +------------------+

4.2 Key Components

Component       Responsibility        Key Decisions
Reader          stream stdin          chunked reads with carry buffer
Counter         maintain bytes/lines  atomic counters
Renderer        live output           TTY-aware stderr updates
Summary         final report          stdout, JSON or text
Signal handler  graceful shutdown     exit code mapping

4.3 Data Structures (No Full Code)

struct Metrics {
    bytes: u64,
    lines: u64,
    start_time: Instant,
}
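
A possible extension of this struct with rate computation; the `rates` method name and the divide-by-zero guard are assumptions, not part of the spec:

```rust
use std::time::Instant;

// Metrics as sketched above, plus derived rates.
struct Metrics {
    bytes: u64,
    lines: u64,
    start_time: Instant,
}

impl Metrics {
    // Returns (bytes_per_sec, lines_per_sec).
    fn rates(&self) -> (f64, f64) {
        let secs = self.start_time.elapsed().as_secs_f64();
        if secs <= 0.0 {
            return (0.0, 0.0); // avoid divide-by-zero right after startup
        }
        (self.bytes as f64 / secs, self.lines as f64 / secs)
    }
}

fn main() {
    let m = Metrics { bytes: 1024, lines: 10, start_time: Instant::now() };
    let (bps, lps) = m.rates();
    println!("{:.0} B/s, {:.0} lines/s", bps, lps);
}
```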

4.4 Algorithm Overview

Key Algorithm: Streaming Metrics

  1. Read chunks from stdin.
  2. Update byte count and line count.
  3. Periodically render rates.
  4. On EOF or SIGINT, print summary.

Complexity Analysis:

  • Time: O(N) bytes.
  • Space: O(B) buffer size.

5. Implementation Guide

5.1 Development Environment Setup

cargo new stream-viz
cd stream-viz

5.2 Project Structure

src/
  main.rs
  reader.rs
  metrics.rs
  render.rs

5.3 The Core Question You’re Answering

“How do I build a streaming CLI that is real-time without breaking pipelines?”

5.4 Concepts You Must Understand First

  1. Streaming I/O and buffering.
  2. TTY-aware rendering rules.
  3. Signal handling and exit codes.

5.5 Questions to Guide Your Design

  1. Should live output go to stderr or stdout?
  2. How do you compute rates deterministically?
  3. What should happen on SIGINT?

5.6 Thinking Exercise

Sketch the loop that reads data and updates rates once per second.

5.7 The Interview Questions They’ll Ask

  1. “Why do progress updates belong on stderr?”
  2. “How do you avoid blocking reads?”
  3. “What exit code indicates SIGINT?”

5.8 Hints in Layers

Hint 1: Start with a line counter. Use a buffer and count newlines.

Hint 2: Add a timer. Use a ticker to print stats every second.

Hint 3: Add TTY detection. Disable live updates when stderr is not a TTY.

Hint 4: Handle SIGINT. Set a shutdown flag and print the final summary.

5.9 Books That Will Help

Topic          Book                                          Chapter
Streaming I/O  The Linux Programming Interface               Ch. 4
Signals        Advanced Programming in the UNIX Environment  Ch. 10

5.10 Implementation Phases

Phase 1: Streaming Core (2-3 days)

Goals: read stdin and count bytes/lines.
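
A sketch of this phase's goal, assuming a hypothetical `count_stream` helper made generic over any `Read` source so it can be unit-tested on in-memory data:

```rust
use std::io::{self, Read};

// Chunked reads with a fixed 64 KiB buffer; counts bytes and newline-
// terminated lines without allocating per line.
fn count_stream<R: Read>(mut input: R) -> io::Result<(u64, u64)> {
    let mut buf = [0u8; 64 * 1024];
    let (mut bytes, mut lines) = (0u64, 0u64);
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        bytes += n as u64;
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok((bytes, lines))
}

fn main() -> io::Result<()> {
    // demo on an in-memory stream; the real tool passes io::stdin().lock()
    let (bytes, lines) = count_stream(&b"ping\npong\n"[..])?;
    println!("{} bytes, {} lines", bytes, lines);
    Ok(())
}
```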

Phase 2: Live Rendering (2 days)

Goals: TTY-aware updates and rate calculation.

Phase 3: Signal Handling (1-2 days)

Goals: graceful shutdown with summary.

5.11 Key Implementation Decisions

Decision       Options           Recommendation  Rationale
Buffer size    4 KB vs 64 KB     64 KB           throughput with reasonable latency
Render target  stdout vs stderr  stderr          keeps stdout clean
Rate interval  100 ms vs 1 s     1 s             stable output

6. Testing Strategy

6.1 Test Categories

Category           Purpose            Examples
Unit tests         line counting      chunk splits
Integration tests  pipeline behavior  yes | stream-viz
Edge case tests    empty input        zero lines

6.2 Critical Test Cases

  1. Partial line across chunks counts correctly.
  2. Live output disabled when stderr not TTY.
  3. SIGINT prints summary and exits 130.

6.3 Test Data

fixtures/lines.txt with known line count

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall                    Symptom          Solution
Writing updates to stdout  corrupted pipes  use stderr for live output
Incorrect line counts      off-by-one       keep a carry buffer
No SIGINT handling         abrupt exit      add graceful shutdown

7.2 Debugging Strategies

  • Use small fixtures with known counts.
  • Log raw chunk sizes in debug mode.
  • Test with and without TTY using redirection.

7.3 Performance Traps

  • Per-line allocations can destroy throughput.

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add bytes/sec reporting.
  • Add --no-live flag.

8.2 Intermediate Extensions

  • Support multi-line status view.
  • Add histogram of line lengths.

8.3 Advanced Extensions

  • Add sampling mode with reservoir sampling.
  • Add TUI view with curses.

9. Real-World Connections

9.1 Industry Applications

  • Stream processors and log shippers use similar metrics.
  • pv and progress tools.

9.2 Interview Relevance

  • Streaming I/O and signal handling are classic OS topics.

10. Resources

10.1 Essential Reading

  • The Linux Programming Interface, Ch. 4
  • Advanced Programming in the UNIX Environment, Ch. 10

10.2 Video Resources

  • “Signals and I/O” lectures

10.3 Tools and Documentation

  • Rust async I/O docs or Go bufio docs

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain chunked reads and partial lines.
  • I can explain why progress output goes to stderr.
  • I can explain SIGINT exit codes.

11.2 Implementation

  • All functional requirements are met.
  • Output is clean in pipelines.
  • SIGINT produces a final summary.

11.3 Growth

  • I can discuss performance trade-offs.
  • I can add a new metric without breaking output.
  • I can demo both TTY and pipe behavior.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Streaming read loop works.
  • Live updates are TTY-aware.
  • Final summary printed on exit.

Full Completion:

  • JSON summary mode.
  • Proper SIGINT exit code.

Excellence (Going Above and Beyond):

  • Multi-line TUI view.
  • Sampling and histogram extensions.