Project 8: Log Aggregator with Tail -f
A log aggregation system that watches multiple log files in real time (like tail -f), streams new lines to a central server, supports filtering and searching, and stores the logs in compressed archives.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | Go |
| Alternative Languages | Rust, C++, Java |
| Difficulty | Level 3: Advanced |
| Time Estimate | 2 weeks |
| Knowledge Area | File I/O, Streaming, Real-time Processing |
| Tooling | None (from scratch) |
| Prerequisites | Projects 1-6 completed; a solid understanding of file I/O; familiarity with goroutines for background tasks |
What You Will Build
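Three cooperating programs, as shown in the Real-World Outcome section:
- An agent (logagent) that runs on each server, tails the configured log files, watches for new files matching a glob, and streams lines to the aggregator.
- A central aggregator (logaggregator) that accepts connections from agents and stores the incoming logs in compressed archives.
- A command-line client (logcli) that searches stored logs, live-tails logs across servers, and exports compressed logs.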
Why It Matters
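Tailing files, surviving log rotation, and streaming lines over a network are the core problems that log shipping and monitoring tools have to solve, so the skills built here transfer directly to real-world systems and tooling.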
Core Challenges
- Efficient file tailing → maps to file I/O and inotify/fsnotify
- Handling file rotation → maps to detecting file changes and reopening
- Streaming over network → maps to TCP/WebSocket streaming
- Real-time filtering → maps to regex and string processing
Key Concepts
- fsnotify for file watching: fsnotify package documentation
- io.Reader/Writer: “The Go Programming Language” Ch. 7 - Donovan & Kernighan
- Compression (gzip): compress/gzip package documentation
- Streaming protocols: WebSocket or custom TCP
Real-World Outcome
# Agent running on each server:
$ ./logagent --config /etc/logagent.yaml
Watching:
- /var/log/nginx/access.log (streaming)
- /var/log/nginx/error.log (streaming)
- /var/log/app/*.log (watching for new files)
Connected to aggregator at logs.example.com:9000
Streaming...
# Central aggregator:
$ ./logaggregator --port 9000 --storage /var/logs
Listening on :9000
Connected agents: 15
Logs/second: 2,341
Storage used: 12.4 GB
# Query logs:
$ ./logcli search --from "1h ago" --pattern "ERROR" --source "web-*"
[web-01] 2025-01-10 14:32:01 ERROR: Connection refused
[web-03] 2025-01-10 14:32:15 ERROR: Timeout after 30s
[web-01] 2025-01-10 14:33:02 ERROR: Out of memory
# Live tail across all servers:
$ ./logcli tail --source "web-*" --pattern "ERROR"
[web-01] 2025-01-10 14:35:01 ERROR: Connection refused
^C
# Export compressed logs:
$ ./logcli export --date 2025-01-09 --output logs-2025-01-09.gz
Exported 1,234,567 lines (compressed: 45 MB)
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide: LEARN_GO_DEEP_DIVE.md
- “The Linux Programming Interface” by Michael Kerrisk