Project 1: Log Analyzer & Alerting System

Build a streaming log analyzer that extracts patterns, aggregates metrics, and triggers alerts.

Quick Reference

Attribute       Value
Difficulty      Intermediate
Time Estimate   1-2 weeks
Language        Shell (Bash)
Prerequisites   grep/awk/sed basics, pipes
Key Topics      regex, streaming, aggregation, alerting

1. Learning Objectives

By completing this project, you will:

  1. Parse structured and semi-structured logs with regex.
  2. Build streaming pipelines that scale to large files.
  3. Aggregate metrics (counts, rates, top-N) with awk.
  4. Detect anomalies and trigger alerts.
  5. Produce human-readable reports and machine-friendly outputs.

2. Theoretical Foundation

2.1 Core Concepts

  • Log Formats: Common formats include Apache/Nginx, syslog, and custom app logs. Each has a predictable structure.
  • Streaming Model: Text tools process line-by-line. This allows real-time monitoring and avoids loading files into memory.
  • Regex Filtering: grep and awk rely on regex to identify patterns like error codes or endpoints.
  • Aggregation: awk associative arrays turn text into statistics.
  • Alert Thresholds: Alerts trigger when rates or counts exceed a threshold within a window.
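The aggregation idea above can be sketched in one awk pipeline (a sketch, assuming the Nginx/Apache combined log format, where field 9 is the status code):

```shell
# Count requests per status code from a combined-format log on STDIN.
# The two sample lines stand in for a real access log.
printf '%s\n' \
  '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 512' \
  '127.0.0.1 - - [10/Oct/2024:13:55:37 +0000] "GET /b HTTP/1.1" 404 0' |
awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
```

The associative array `count` grows one entry per distinct status code, which is exactly the "text into statistics" step.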

2.2 Why This Matters

Logs are the ground truth for production systems. This project teaches how to turn raw log streams into actionable insights without heavy tooling.

2.3 Historical Context / Background

The Unix pipeline model was designed for log analysis long before modern observability stacks. Many production incidents are still debugged with grep and awk because they are fast and universal.

2.4 Common Misconceptions

  • “Logs are unstructured”: Most logs are highly structured if you look carefully.
  • “You must use Python”: awk and grep often process huge logs faster, and with far less memory, than a general-purpose script.

3. Project Specification

3.1 What You Will Build

A CLI script named logwatch.sh that:

  • Reads a log file or STDIN
  • Extracts key fields (IP, method, status)
  • Produces a summary report
  • Triggers alerts when error rates cross thresholds

3.2 Functional Requirements

  1. Input: Accept file path or STDIN.
  2. Parsing: Extract IP, method, endpoint, status, response time.
  3. Aggregation: Count top IPs and error codes.
  4. Alerting: Trigger alerts when error rate > X%.
  5. Output: Print report and optional JSON summary.

3.3 Non-Functional Requirements

  • Performance: Stream line-by-line.
  • Reliability: Handle malformed lines gracefully.
  • Usability: Clear output headers and thresholds.

3.4 Example Usage / Output

$ ./logwatch.sh /var/log/nginx/access.log

3.5 Real World Outcome

Running on a real log produces a dashboard-like report:

$ ./logwatch.sh access.log
=========================================
Log Summary: access.log
=========================================
Total requests: 12450
Error rate (4xx/5xx): 3.2%

Top endpoints:
  /api/users      3200
  /api/login      2500
  /api/orders     1900

Top status codes:
  200  12050
  404    250
  500    150

ALERT: Error rate above 3% threshold

When reading a live stream (e.g. tail -f piped into the script), you see periodic summaries and live alerts without stopping the pipeline.


4. Solution Architecture

4.1 High-Level Design

+--------------+     +-----------+     +--------------+     +----------+
| Input        | --> | Parser    | --> | Aggregator   | --> | Reporter |
| (file/STDIN) |     | (regex)   |     | (awk counts) |     | (stdout) |
+--------------+     +-----------+     +--------------+     +----------+

4.2 Key Components

Component    Responsibility    Key Decisions
Reader       Stream lines      tail vs batch
Parser       Extract fields    regex vs awk split
Aggregator   Count metrics     associative arrays
Reporter     Format output     table vs JSON

4.3 Data Structures

# awk associative arrays, keyed on parsed fields
count_status[status]++        # e.g. count_status["404"]++
count_endpoint[path]++        # e.g. count_endpoint["/api/users"]++

4.4 Algorithm Overview

Key Algorithm: Streaming aggregation

  1. For each line, parse fields with regex or split.
  2. Increment counters by status and endpoint.
  3. At end (or interval), compute totals and error rates.
  4. Print report and alerts.
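The four steps above can be sketched as a single awk program reading STDIN (a sketch, assuming combined-format fields: $7 is the request path, $9 the status; the report layout is illustrative):

```shell
# One-pass streaming aggregation over a combined-format access log.
awk '
  {
    total++                 # step 1-2: parse via field split, bump counters
    status[$9]++
    endpoint[$7]++
    if ($9 >= 400) errors++ # 4xx/5xx count as errors
  }
  END {                     # steps 3-4: totals, rate, report
    printf "Total requests: %d\n", total
    if (total > 0)
      printf "Error rate (4xx/5xx): %.1f%%\n", 100 * errors / total
    for (s in status) printf "  %s  %d\n", s, status[s]
  }
'
```

Note that `for (s in status)` iterates in arbitrary order; pipe through sort if you want the report stable.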

Complexity Analysis:

  • Time: O(N) for N lines
  • Space: O(U) for unique endpoints and statuses

5. Implementation Guide

5.1 Development Environment Setup

# No dependencies required; use standard Unix tools
chmod +x logwatch.sh

5.2 Project Structure

logwatch/
├── logwatch.sh
└── README.md

5.3 The Core Question You Are Answering

“How do I transform raw log streams into actionable metrics and alerts using only Unix tools?”

5.4 Concepts You Must Understand First

  1. Regex extraction with grep/sed
  2. awk fields and associative arrays
  3. Pipes and redirection
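As a quick check of concept 1, here is one way to pull the status code out of a combined-format line with sed (a sketch; the pattern keys off the closing quote of the request string, which assumes no stray quotes elsewhere):

```shell
# Extract the 3-digit status that follows the quoted request string.
echo '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /missing HTTP/1.1" 404 0' |
sed -E 's/.*" ([0-9]{3}) .*/\1/'
```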

5.5 Questions to Guide Your Design

  1. What fields define “error” in your system?
  2. How often should alerts trigger?
  3. How do you handle log rotation?

5.6 Thinking Exercise

Take 10 log lines and manually extract IP, status, endpoint. How would awk store each metric?

5.7 The Interview Questions They Will Ask

  1. Why is streaming better than loading files?
  2. How do you compute error rate in a stream?
  3. What are the risks of regex parsing?

5.8 Hints in Layers

Hint 1: Start with counting status codes.

Hint 2: Add endpoint aggregation after parsing works.

Hint 3: Add alert thresholds last.

5.9 Books That Will Help

Topic         Book                       Chapter
awk and sed   “Sed & Awk”                Ch. 7-8
Pipelines     “The Linux Command Line”   Ch. 6

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Parse status codes
  • Count total requests

Checkpoint: Report total and status counts.

Phase 2: Aggregation (3-4 days)

Goals:

  • Top endpoints
  • Error rate calculation

Checkpoint: Error rate printed.

Phase 3: Alerting (2-3 days)

Goals:

  • Threshold-based alerts
  • Optional JSON output

Checkpoint: Alerts trigger correctly.
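The threshold check at the heart of Phase 3 can be sketched like this (THRESHOLD, the stand-in counts, and the message format are illustrative choices, not fixed by the spec):

```shell
# Fire an alert when the error rate crosses a percentage threshold.
THRESHOLD=3
total=12450   # stand-in values; in logwatch.sh these come from the aggregator
errors=400
rate=$(awk -v e="$errors" -v t="$total" 'BEGIN { printf "%.1f", 100 * e / t }')
# awk does the floating-point comparison that plain [ ] cannot
if awk -v r="$rate" -v th="$THRESHOLD" 'BEGIN { exit !(r > th) }'; then
  echo "ALERT: Error rate ${rate}% above ${THRESHOLD}% threshold"
fi
```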


6. Testing Strategy

6.1 Test Categories

Category            Purpose            Examples
Unit Tests          Parsing            regex field extraction
Integration Tests   Pipeline           tail -f input
Edge Cases          Malformed lines    skip gracefully

6.2 Critical Test Cases

  1. Logs with only 200s -> error rate 0.
  2. Logs with missing fields -> skipped, no crash.
  3. Error spike triggers alert.
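Test case 1 can be automated with a tiny inline fixture (a sketch; it checks the aggregation logic directly rather than invoking logwatch.sh, and the PASS message is an arbitrary choice):

```shell
# Fixture: a log containing only 200s must yield a 0.0% error rate.
fixture=$(printf '%s\n' \
  '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 512' \
  '127.0.0.1 - - [10/Oct/2024:13:55:37 +0000] "GET /b HTTP/1.1" 200 128')
rate=$(printf '%s\n' "$fixture" |
  awk '{ total++; if ($9 >= 400) errors++ }
       END { printf "%.1f", total ? 100 * errors / total : 0 }')
[ "$rate" = "0.0" ] && echo "PASS: all-200 log has 0.0% error rate"
```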

7. Common Pitfalls and Debugging

Pitfall          Symptom             Solution
Regex mismatch   empty metrics       debug with grep -E
Memory blowup    huge endpoint set   limit top-N
Alert spam       repeated alerts     add cooldown
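The cooldown fix from the table can be sketched in awk: once the error count passes a threshold, fire at most one alert per window (the 100-line window and errors > 5 cutoff here are arbitrary illustrations; a time-based window would use timestamps instead):

```shell
# Suppress repeated alerts: at most one per 100-line cooldown window.
awk '
  $9 >= 500 { errors++ }          # count server errors as they stream in
  {
    lines++
    if (errors > 5 && (last_alert == 0 || lines - last_alert >= 100)) {
      print "ALERT: " errors " server errors so far (line " lines ")"
      last_alert = lines          # start the cooldown window
    }
  }
'
```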

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add --top-n flag
  • Add CSV output

8.2 Intermediate Extensions

  • Sliding window error rate
  • GeoIP lookup for IPs

8.3 Advanced Extensions

  • Export Prometheus metrics
  • Alert webhook integration

9. Real-World Connections

  • Incident response log triage
  • Lightweight monitoring in restricted environments

10. Resources

  • GNU awk manual
  • Nginx log format docs

11. Self-Assessment Checklist

  • I can parse logs into fields reliably
  • I can compute error rates in a stream

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse logs and print totals

Full Completion:

  • Aggregations + alerts

Excellence (Going Above and Beyond):

  • Sliding window, webhook, or Prometheus output