Project 1: Log Analyzer & Alerting System
Build a streaming log analyzer that extracts patterns, aggregates metrics, and triggers alerts.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | Shell (Bash) |
| Prerequisites | grep/awk/sed basics, pipes |
| Key Topics | regex, streaming, aggregation, alerting |
1. Learning Objectives
By completing this project, you will:
- Parse structured and semi-structured logs with regex.
- Build streaming pipelines that scale to large files.
- Aggregate metrics (counts, rates, top-N) with awk.
- Detect anomalies and trigger alerts.
- Produce human-readable reports and machine-friendly outputs.
2. Theoretical Foundation
2.1 Core Concepts
- Log Formats: Common formats include Apache/Nginx, syslog, and custom app logs. Each has a predictable structure.
- Streaming Model: Unix text tools process input line by line, which enables real-time monitoring and avoids loading entire files into memory.
- Regex Filtering: grep and awk rely on regular expressions to identify patterns such as error codes or endpoints (see the example after this list).
- Aggregation: awk associative arrays turn text into statistics.
- Alert Thresholds: Alerts trigger when rates or counts exceed a threshold within a window.
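For example, the regex-filtering idea above can be tried directly on a combined-format access log, where the status code is the 9th whitespace-separated field (a minimal sketch; field positions differ for custom formats):

```bash
# Keep only error responses; $9 is the status code in the combined format.
awk '$9 ~ /^[45][0-9][0-9]$/' access.log

# Roughly equivalent grep filter keyed on the quote that closes the request line.
grep -E '" [45][0-9]{2} ' access.log
```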
2.2 Why This Matters
Logs are the ground truth for production systems. This project teaches how to turn raw log streams into actionable insights without heavy tooling.
2.3 Historical Context / Background
The Unix pipeline model was being applied to log analysis long before modern observability stacks existed. Many production incidents are still debugged with grep and awk because they are fast and universally available.
2.4 Common Misconceptions
- “Logs are unstructured”: Most logs are highly structured if you look carefully.
- “You must use Python”: awk and grep can often process huge logs faster and with less memory than a general-purpose script.
3. Project Specification
3.1 What You Will Build
A CLI script named logwatch.sh that:
- Reads a log file or STDIN
- Extracts key fields (IP, method, status)
- Produces a summary report
- Triggers alerts when error rates cross thresholds
3.2 Functional Requirements
- Input: Accept file path or STDIN.
- Parsing: Extract IP, method, endpoint, status, and response time (a field-extraction sketch follows this list).
- Aggregation: Count top IPs and error codes.
- Alerting: Trigger alerts when error rate > X%.
- Output: Print report and optional JSON summary.
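A minimal parsing sketch for the requirements above, assuming the Apache/Nginx combined log format; field positions will differ for custom application logs, and response time only appears if your log format records it:

```bash
#!/usr/bin/env bash
# Extract IP, method, endpoint, and status from combined-format lines.
# Reads the files given as arguments, or STDIN if none are given.
awk '{
    ip     = $1
    method = $6; gsub(/"/, "", method)   # $6 looks like "GET -> strip the quote
    path   = $7
    status = $9
    print ip, method, path, status
}' "$@"
```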
3.3 Non-Functional Requirements
- Performance: Stream line-by-line.
- Reliability: Handle malformed lines gracefully.
- Usability: Clear output headers and thresholds.
3.4 Example Usage / Output
$ ./logwatch.sh /var/log/nginx/access.log
3.5 Real World Outcome
Running on a real log produces a dashboard-like report:
$ ./logwatch.sh access.log
=========================================
Log Summary: access.log
=========================================
Total requests: 12450
Error rate (4xx/5xx): 3.2%
Top endpoints:
/api/users 3200
/api/login 2500
/api/orders 1900
Top status codes:
200 12050
404 250
500 150
ALERT: Error rate above 3% threshold
If running on a stream (tail -f), you see periodic summaries and live alerts without stopping the pipeline.
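Because the script accepts STDIN, a live view is just a pipe. A sketch (this only shows rolling output if logwatch.sh emits periodic summaries rather than a single end-of-input report):

```bash
# Start from the end of the file (-n0) and keep following across rotations (-F).
tail -n0 -F /var/log/nginx/access.log | ./logwatch.sh
```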
4. Solution Architecture
4.1 High-Level Design
+--------------+     +----------+     +--------------+     +----------+
|    Input     | --> |  Parser  | --> |  Aggregator  | --> | Reporter |
| (file/STDIN) |     | (regex)  |     | (awk counts) |     | (stdout) |
+--------------+     +----------+     +--------------+     +----------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Reader | Stream lines | tail vs batch |
| Parser | Extract fields | regex vs awk split |
| Aggregator | Count metrics | associative arrays |
| Reporter | Format output | table vs JSON |
4.3 Data Structures
# awk associative arrays (used inside the per-line awk action)
count_status[status]++       # e.g. count_status["404"]++
count_endpoint[path]++       # e.g. count_endpoint["/api/users"]++
4.4 Algorithm Overview
Key Algorithm: Streaming aggregation
- For each line, parse fields with regex or split.
- Increment counters by status and endpoint.
- At end (or interval), compute totals and error rates.
- Print report and alerts.
Complexity Analysis:
- Time: O(N) for N lines
- Space: O(U) for unique endpoints and statuses
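A condensed sketch of this loop in awk, assuming combined-format input with the endpoint in field 7 and the status in field 9; the real logwatch.sh would add option parsing, malformed-line handling, and top-N reporting:

```bash
#!/usr/bin/env bash
# Streaming aggregation core (sketch): one pass, O(N) time, O(U) space.
awk -v threshold="${THRESHOLD:-3}" '
{
    total++
    count_status[$9]++                       # per-status counters
    count_endpoint[$7]++                     # per-endpoint counters
    if ($9 ~ /^[45][0-9][0-9]$/) errors++    # 4xx/5xx responses
}
END {
    if (total == 0) { print "No parsable lines" > "/dev/stderr"; exit 1 }
    rate = 100 * errors / total
    printf "Total requests: %d\n", total
    printf "Error rate (4xx/5xx): %.1f%%\n", rate
    if (rate > threshold)
        printf "ALERT: Error rate above %s%% threshold\n", threshold
    # top-N reporting over count_status/count_endpoint is left to the reporter
}' "$@"
```

Keeping the whole pass inside a single awk process is what preserves the O(N) time / O(U) space profile above.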
5. Implementation Guide
5.1 Development Environment Setup
# No dependencies required; use standard Unix tools
chmod +x logwatch.sh
5.2 Project Structure
logwatch/
├── logwatch.sh
└── README.md
5.3 The Core Question You Are Answering
“How do I transform raw log streams into actionable metrics and alerts using only Unix tools?”
5.4 Concepts You Must Understand First
- Regex extraction with grep/sed
- awk fields and associative arrays
- Pipes and redirection
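If any of these feel shaky, quick one-liners against a combined-format access.log are a good warm-up (sketches; adjust field numbers for your format):

```bash
# Regex extraction with grep: print only lines whose request path starts with /api/.
grep -E '"[A-Z]+ /api/' access.log | head

# awk fields and associative arrays: count requests per HTTP method.
awk '{ m = $6; gsub(/"/, "", m); count[m]++ } END { for (k in count) print k, count[k] }' access.log

# Pipes and redirection: rank which endpoints return 500 and save the result.
awk '$9 == 500 { print $7 }' access.log | sort | uniq -c | sort -rn > paths_500.txt
```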
5.5 Questions to Guide Your Design
- What fields define “error” in your system?
- How often should alerts trigger?
- How do you handle log rotation?
5.6 Thinking Exercise
Take 10 log lines and manually extract IP, status, endpoint. How would awk store each metric?
5.7 The Interview Questions They Will Ask
- Why is streaming better than loading files?
- How do you compute error rate in a stream?
- What are the risks of regex parsing?
5.8 Hints in Layers
Hint 1: Start with counting status codes.
Hint 2: Add endpoint aggregation after parsing works.
Hint 3: Add alert thresholds last.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| awk and sed | “Sed & Awk” | Ch. 7-8 |
| Pipelines | “The Linux Command Line” | Ch. 6 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Parse status codes
- Count total requests
Checkpoint: Report total and status counts.
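One way to verify the Phase 1 checkpoint against an independent pipeline (assuming the status code is field 9):

```bash
# Independent cross-check for Phase 1: total requests and per-status counts.
wc -l < access.log                                          # total requests
awk '{ print $9 }' access.log | sort | uniq -c | sort -rn   # status code counts
```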
Phase 2: Aggregation (3-4 days)
Goals:
- Top endpoints
- Error rate calculation
Checkpoint: Error rate printed.
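The Phase 2 aggregations can each be prototyped as one-liners before folding them into the script (combined format assumed: endpoint in field 7, status in field 9):

```bash
# Top endpoints via an associative array, ranked in the shell.
awk '{ count[$7]++ } END { for (p in count) print count[p], p }' access.log | sort -rn | head -5

# Error rate as a percentage of all requests.
awk '$9 ~ /^[45]/ { e++ } END { if (NR) printf "%.1f%%\n", 100 * e / NR }' access.log
```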
Phase 3: Alerting (2-3 days)
Goals:
- Threshold-based alerts
- Optional JSON output
Checkpoint: Alerts trigger correctly.
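A Phase 3 sketch for the threshold check and a minimal JSON summary; bash arithmetic is integer-only, so the floating-point comparison is delegated to awk (the variable values here are placeholders):

```bash
# Float-safe threshold check plus a minimal JSON summary (placeholder values).
rate=3.2          # computed by the aggregation step
threshold=3
total=12450
if awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r > t) }'; then
    echo "ALERT: Error rate above ${threshold}% threshold"
fi
printf '{"total_requests": %d, "error_rate_percent": %.1f}\n' "$total" "$rate"
```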
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Parsing | regex field extraction |
| Integration Tests | Pipeline | tail -f input |
| Edge Cases | Malformed lines | skip gracefully |
6.2 Critical Test Cases
- Logs with only 200s -> error rate 0.
- Logs with missing fields -> skipped, no crash.
- Error spike triggers alert.
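These cases are easy to drive with synthetic fixtures; a sketch using common-log-style lines (expected outputs assume the 3% default threshold):

```bash
# Synthetic fixtures for the critical test cases.
printf '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /ok HTTP/1.1" 200 42\n' > all_ok.log
./logwatch.sh all_ok.log        # expect: error rate 0.0% and no ALERT line

printf 'not a log line at all\n' > malformed.log
./logwatch.sh malformed.log     # expect: line skipped, no crash

{
  for i in $(seq 1 90); do printf '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /ok HTTP/1.1" 200 42\n'; done
  for i in $(seq 1 10); do printf '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /x HTTP/1.1" 500 42\n'; done
} > spike.log
./logwatch.sh spike.log         # expect: error rate 10.0% and an ALERT line
```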
7. Common Pitfalls and Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Regex mismatch | empty metrics | debug with grep -E |
| Memory blowup | huge endpoint set | limit top-N |
| Alert spam | repeated alerts | add cooldown |
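For the alert-spam row, one possible cooldown mechanism keeps the time of the last alert in a state file; the path and 300-second window below are assumptions:

```bash
# Suppress repeat alerts for COOLDOWN seconds using a last-alert timestamp file.
COOLDOWN=300
STATE=/tmp/logwatch.last_alert
now=$(date +%s)
last=$(cat "$STATE" 2>/dev/null || echo 0)
if (( now - last >= COOLDOWN )); then
    echo "ALERT: Error rate above threshold"
    echo "$now" > "$STATE"
fi
```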
8. Extensions and Challenges
8.1 Beginner Extensions
- Add a --top-n flag
- Add CSV output
8.2 Intermediate Extensions
- Sliding window error rate
- GeoIP lookup for IPs
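A coarse way to approach the sliding-window extension is to bucket requests per minute using the timestamp field; a sketch assuming combined-format timestamps such as [01/Jan/2024:00:05:42 +0000] in field 4:

```bash
# Per-minute error rate (a coarse approximation of a sliding window).
awk '{
    minute = substr($4, 2, 17)                 # "01/Jan/2024:00:05"
    total[minute]++
    if ($9 ~ /^[45]/) errors[minute]++
}
END {
    for (m in total)
        printf "%s  %.1f%%\n", m, 100 * errors[m] / total[m]
}' access.log | sort
```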
8.3 Advanced Extensions
- Export Prometheus metrics
- Alert webhook integration
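For the webhook extension, the alert branch can POST a JSON payload instead of (or in addition to) printing; a sketch with a hypothetical endpoint URL and a Slack-style "text" field:

```bash
# POST the alert as JSON to a webhook (URL and payload shape are assumptions).
WEBHOOK_URL="https://example.com/hooks/logwatch"
curl -fsS -X POST -H 'Content-Type: application/json' \
     -d '{"text": "ALERT: Error rate above 3% threshold"}' \
     "$WEBHOOK_URL"
```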
9. Real-World Connections
- Incident response log triage
- Lightweight monitoring in restricted environments
10. Resources
- GNU awk manual
- Nginx log format docs
11. Self-Assessment Checklist
- I can parse logs into fields reliably
- I can compute error rates in a stream
12. Submission / Completion Criteria
Minimum Viable Completion:
- Parse logs and print totals
Full Completion:
- Aggregations + alerts
Excellence (Going Above and Beyond):
- Sliding window, webhook, or Prometheus output