Learning Time Series Databases: A Project-Based Journey
Introduction
Time Series Databases (TSDBs) are specialized storage systems optimized for time-stamped data—metrics, logs, IoT sensor readings, financial ticks, and monitoring telemetry. Unlike traditional databases that optimize for random access and updates, TSDBs are built around a fundamental insight: time is the primary query dimension, and data is almost always appended and rarely updated.
To truly understand TSDBs, you need to grapple with their core challenges:
- Write-Heavy Workloads: Millions of data points per second, append-only
- Time-Range Queries: “Give me all CPU metrics between 2pm and 3pm”
- Compression: Adjacent values are often similar—exploit this
- High Cardinality: Millions of unique metric/tag combinations
- Retention & Downsampling: Old data is less valuable, can be aggregated
- Efficient Aggregations: Sum, avg, percentiles over massive datasets
This guide provides 15 projects that progressively build your understanding from basic concepts to implementing a full-featured TSDB.
Core Concept Analysis
What Makes Time Series Data Special?
Traditional DB Row: | user_id | name | email | created_at |
Time Series Point: | timestamp | metric_name | tags{} | value |
Key characteristics:
- Immutable writes: Data points are rarely updated after insertion
- Sequential access: Queries almost always involve time ranges
- High write throughput: Systems may ingest millions of points/second
- Predictable patterns: Values often change slowly between adjacent timestamps
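To make the comparison concrete, here is a minimal Go sketch of the point model described above; the struct and field names are illustrative, not taken from any particular TSDB.

package main

import "fmt"

// A traditional row is addressed by a primary key; a time series point is
// addressed by its series identity (metric + tags) plus a timestamp.
type Point struct {
    Metric    string            // e.g. "cpu.usage"
    Tags      map[string]string // e.g. {"host": "server1", "region": "us-east"}
    Timestamp int64             // Unix seconds (or nanoseconds)
    Value     float64
}

func main() {
    p := Point{
        Metric:    "cpu.usage",
        Tags:      map[string]string{"host": "server1", "region": "us-east"},
        Timestamp: 1703001600,
        Value:     0.75,
    }
    fmt.Printf("%+v\n", p)
}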
The TSDB Stack
┌─────────────────────────────────────────────┐
│ Query Engine │
│ (PromQL, InfluxQL, SQL) │
├─────────────────────────────────────────────┤
│ Index Layer │
│ (Tag/Label → Series ID mapping) │
├─────────────────────────────────────────────┤
│ Compression Layer │
│ (Gorilla, Delta, Run-Length Encoding) │
├─────────────────────────────────────────────┤
│ Storage Engine │
│ (TSM Tree, Blocks, LSM variants) │
├─────────────────────────────────────────────┤
│ Write-Ahead Log (WAL) │
│ (Durability layer) │
└─────────────────────────────────────────────┘
Project 1: In-Memory Time Series Store
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Zig
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Data Structures / Time Series Fundamentals
- Software or Tool: Custom Time Series Store
- Main Book: “C Interfaces and Implementations” by David R. Hanson
What you’ll build
An in-memory time series store that can ingest data points, store them efficiently by series, and answer basic time-range queries.
Why it teaches Time Series Databases
This is your foundation. You’ll understand the fundamental data model: series identification (metric name + tags), timestamps, and values. You’ll see why organizing data by series (not by insertion order) is crucial for query performance.
Core challenges you’ll face
- Series identification (how do you uniquely identify “cpu.usage{host=server1,region=us-east}”?) → maps to tag indexing
- Efficient time-range lookups (binary search vs. linear scan) → maps to time-based indexing
- Memory layout decisions (array of structs vs. struct of arrays) → maps to columnar storage
- Handling out-of-order writes (sensor data arrives late) → maps to write buffering
Key Concepts
- Time Series Data Model: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Hash Tables for Series Lookup: “Algorithms” Chapter 3.4 - Robert Sedgewick
- Binary Search for Time Ranges: “C Programming: A Modern Approach” Chapter 17 - K.N. King
- Memory Layout: “Computer Systems: A Programmer’s Perspective” Chapter 6 - Bryant & O’Hallaron
Difficulty
Beginner-Intermediate
Time estimate
1 week
Prerequisites
Basic C, understanding of arrays and hash tables
Real world outcome
$ ./tsdb-mem
> INSERT cpu.usage host=server1 region=us-east 1703001600 0.75
OK
> INSERT cpu.usage host=server1 region=us-east 1703001610 0.82
OK
> INSERT cpu.usage host=server2 region=us-west 1703001600 0.45
OK
> QUERY cpu.usage{host=server1} FROM 1703001600 TO 1703001620
[1703001600, 0.75]
[1703001610, 0.82]
2 points returned in 0.001ms
Implementation Hints
The core data structure is a hash map where:
- Key: A “series key” combining metric name and sorted tags (e.g., cpu.usage|host=server1|region=us-east)
- Value: A dynamic array of (timestamp, value) pairs, kept sorted by timestamp
Pseudo-code structure:
STRUCT DataPoint:
timestamp: int64
value: float64
STRUCT Series:
key: string
points: DynamicArray<DataPoint>
STRUCT TimeSeriesStore:
series_map: HashMap<string, Series>
FUNCTION insert(metric, tags, timestamp, value):
series_key = build_series_key(metric, tags)
IF series_key NOT IN series_map:
series_map[series_key] = new Series(series_key)
series = series_map[series_key]
insert_sorted(series.points, DataPoint(timestamp, value))
FUNCTION query(metric, tags, start_time, end_time):
series_key = build_series_key(metric, tags)
series = series_map[series_key]
start_idx = binary_search_lower_bound(series.points, start_time)
end_idx = binary_search_upper_bound(series.points, end_time)
RETURN series.points[start_idx:end_idx]
Critical insight: Building the series key requires sorting the tags so that {host=a,region=b} and {region=b,host=a} produce the same key.
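As a concrete version of that insight, here is a minimal Go sketch of a series-key builder that sorts tag names before joining them (shown in Go for brevity even though this project's main language is C; the | delimiter follows the example above).

package main

import (
    "fmt"
    "sort"
    "strings"
)

// buildSeriesKey produces a canonical key: metric|k1=v1|k2=v2 with tag keys sorted,
// so {host=a,region=b} and {region=b,host=a} map to the same series.
func buildSeriesKey(metric string, tags map[string]string) string {
    keys := make([]string, 0, len(tags))
    for k := range tags {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    var b strings.Builder
    b.WriteString(metric)
    for _, k := range keys {
        b.WriteString("|")
        b.WriteString(k)
        b.WriteString("=")
        b.WriteString(tags[k])
    }
    return b.String()
}

func main() {
    a := buildSeriesKey("cpu.usage", map[string]string{"host": "server1", "region": "us-east"})
    b := buildSeriesKey("cpu.usage", map[string]string{"region": "us-east", "host": "server1"})
    fmt.Println(a, a == b) // cpu.usage|host=server1|region=us-east true
}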
Learning milestones
- Basic insert/query works → You understand the series concept
- Binary search time-range queries are fast → You understand why time ordering matters
- Handle 1M+ points without slowdown → You understand memory layout implications
Project 2: Delta Encoding Compressor
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Compression / Data Encoding
- Software or Tool: Compression Library
- Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron
What you’ll build
A compression library that implements delta encoding for timestamps and values, reducing time series data by 60-80%.
Why it teaches Time Series Databases
Compression is the defining characteristic of TSDBs. Without it, storing billions of data points would be prohibitively expensive. You’ll understand why time series data compresses so well: adjacent values are similar.
Core challenges you’ll face
- Delta encoding for timestamps (store differences, not absolutes) → maps to temporal locality
- Delta-of-delta encoding (when timestamps are evenly spaced) → maps to pattern exploitation
- Variable-length integer encoding (small deltas need fewer bytes) → maps to bit-level efficiency
- Handling edge cases (large jumps, negative deltas) → maps to robustness
Key Concepts
- Delta Encoding Fundamentals: Delta Encoding Tutorial - smarx.com
- Variable-Length Integers: “Computer Systems: A Programmer’s Perspective” Chapter 2 - Bryant & O’Hallaron
- Bit Manipulation: “The Art of Computer Programming Vol 4A” Section 7.1.3 - Knuth
- Compression Theory: Time Series Compression Algorithms Explained - TigerData
Difficulty
Intermediate
Time estimate
1 week
Prerequisites
Bit manipulation, understanding of binary representation
Real world outcome
$ ./delta-compress input.bin output.compressed
Original size: 1,000,000 bytes (125,000 data points)
Compressed size: 234,567 bytes
Compression ratio: 4.26x
Timestamps: avg delta = 10s, avg bits/delta = 4.2
Values: avg delta = 0.003, avg bits/delta = 12.1
$ ./delta-decompress output.compressed recovered.bin
Decompressed 125,000 points
Verification: PASSED (bit-exact match)
Implementation Hints
Timestamp delta encoding:
ORIGINAL TIMESTAMPS: [1000, 1010, 1020, 1030, 1040]
DELTA ENCODED: [1000, 10, 10, 10, 10] # First value + deltas
Delta-of-delta (for regular intervals):
DELTAS: [10, 10, 10, 10]
DELTA-OF-DELTA: [10, 0, 0, 0] # First delta + differences
Variable-length integer encoding (Simple8b or varint):
- If delta fits in 7 bits: use 1 byte
- If delta fits in 14 bits: use 2 bytes
- And so on…
Pseudo-code:
FUNCTION encode_timestamps(timestamps):
output = BitStream()
output.write_int64(timestamps[0]) # First timestamp in full
prev_delta = 0
FOR i FROM 1 TO len(timestamps) - 1:
delta = timestamps[i] - timestamps[i-1]
delta_of_delta = delta - prev_delta
output.write_varint(zigzag_encode(delta_of_delta))
prev_delta = delta
RETURN output
FUNCTION zigzag_encode(n):
# Maps negative numbers to positive: 0,-1,1,-2,2 → 0,1,2,3,4
IF n >= 0: RETURN 2 * n
ELSE: RETURN 2 * abs(n) - 1
Critical insight: Zig-zag encoding is essential for handling negative deltas efficiently in variable-length encoding.
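The following Go sketch wires delta-of-delta, zig-zag, and variable-length integers together using the standard encoding/binary varint helpers instead of a hand-rolled bit stream; it is a simplification of the pseudo-code above, not a reference implementation.

package main

import (
    "encoding/binary"
    "fmt"
)

// zigzag maps signed deltas to unsigned ints so small negative values stay small:
// 0, -1, 1, -2, 2 → 0, 1, 2, 3, 4
func zigzag(n int64) uint64 { return uint64((n << 1) ^ (n >> 63)) }

// encodeTimestamps stores the first timestamp in full, then a varint-encoded
// zigzag(delta-of-delta) for every following timestamp.
func encodeTimestamps(ts []int64) []byte {
    var tmp [binary.MaxVarintLen64]byte
    buf := make([]byte, 0, len(ts)*2)
    n := binary.PutVarint(tmp[:], ts[0])
    buf = append(buf, tmp[:n]...)
    prevDelta := int64(0)
    for i := 1; i < len(ts); i++ {
        delta := ts[i] - ts[i-1]
        buf = append(buf, tmp[:binary.PutUvarint(tmp[:], zigzag(delta-prevDelta))]...)
        prevDelta = delta
    }
    return buf
}

func main() {
    ts := []int64{1000, 1010, 1020, 1030, 1040}
    out := encodeTimestamps(ts)
    // Every delta-of-delta after the first interval is 0, so each costs one byte here.
    fmt.Printf("%d timestamps -> %d bytes\n", len(ts), len(out))
}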
Learning milestones
- Basic delta encoding works → You understand the core concept
- Variable-length integers reduce size further → You understand bit-level optimization
- Round-trip (encode → decode) is bit-exact → You’ve built production-quality code
Project 3: Gorilla Compression (Facebook Paper)
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Java
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Compression / Floating Point / Bit Manipulation
- Software or Tool: Gorilla Compression Library
- Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron
What you’ll build
A complete implementation of Facebook’s Gorilla compression algorithm, achieving ~12x compression on real-world metrics data.
Why it teaches Time Series Databases
This is the industry-standard compression for floating-point time series. By implementing it, you’ll understand why roughly half of all values (about 51% in Facebook’s production data) compress to a single bit and how to exploit the structure of IEEE 754 floating-point numbers.
Core challenges you’ll face
- XOR-based floating point compression (adjacent floats have similar bit patterns) → maps to floating point internals
- Leading/trailing zero counting (efficiently encode the XOR result) → maps to bit manipulation mastery
- Block-based encoding (headers, data layout) → maps to format design
- Streaming decode (decompress without loading entire block) → maps to memory efficiency
Resources for key challenges
- Original Gorilla Paper: Gorilla: A Fast, Scalable, In-Memory Time Series Database - Facebook/VLDB 2015
- Paper Summary: The Morning Paper - Gorilla - Adrian Colyer
- Go Implementation Reference: go-tsz - Damian Gryski
Key Concepts
- IEEE 754 Floating Point: “Computer Systems: A Programmer’s Perspective” Chapter 2.4 - Bryant & O’Hallaron
- XOR Properties: “The Art of Computer Programming Vol 4A” - Knuth
- Bit Stream Implementation: “Practical Binary Analysis” Chapter 2 - Dennis Andriesse
- Compression Block Design: Gorilla Paper Section 4.1
Difficulty
Advanced
Time estimate
2-3 weeks
Prerequisites
Strong bit manipulation, understanding of IEEE 754 floating point
Real world outcome
$ ./gorilla-compress metrics.raw metrics.gorilla
Input: 8,000,000 bytes (500,000 timestamp-value pairs)
Output: 685,000 bytes
Compression ratio: 11.68x
Breakdown:
Timestamps: 1.03 bytes/point (delta-of-delta encoding)
Values: 0.34 bytes/point (XOR encoding)
- 51.2% compressed to 1 bit (identical to previous)
- 29.8% compressed with '10' prefix (avg 26.6 bits)
- 19.0% compressed with '11' prefix (avg 36.2 bits)
$ ./gorilla-decompress metrics.gorilla recovered.raw
Decompressed 500,000 points in 12.3ms
Verification: PASSED
Implementation Hints
The XOR insight: When two floating-point values are similar, their XOR has many leading and trailing zeros.
Value 1: 0x40490FDB (3.14159...)
Value 2: 0x40490FDA (3.14158...)
XOR result: 0x00000001 (only 1 bit differs!)
Encoding the XOR result:
FUNCTION encode_value(current, previous):
xor = current XOR previous
IF xor == 0:
write_bit(0) # Values identical - just 1 bit!
RETURN
write_bit(1) # Values differ
leading_zeros = count_leading_zeros(xor)
trailing_zeros = count_trailing_zeros(xor)
meaningful_bits = 64 - leading_zeros - trailing_zeros
IF leading_zeros >= prev_leading AND trailing_zeros >= prev_trailing:
# Fits in previous window
write_bit(0)
write_bits(xor >> prev_trailing, prev_meaningful_bits)
ELSE:
# New window
write_bit(1)
write_bits(leading_zeros, 5) # 5 bits for leading zeros (0-31)
write_bits(meaningful_bits, 6) # 6 bits for length (0-63)
write_bits(xor >> trailing_zeros, meaningful_bits)
prev_leading = leading_zeros # Remember the new window for the next value
prev_trailing = trailing_zeros
prev_meaningful_bits = meaningful_bits
Block structure:
┌──────────────────────────────────────────┐
│ Header: timestamp_start (64 bits) │
├──────────────────────────────────────────┤
│ First value (64 bits, uncompressed) │
├──────────────────────────────────────────┤
│ Compressed (timestamp, value) pairs... │
│ Each pair: delta-of-delta + XOR encoded │
└──────────────────────────────────────────┘
Critical insight: The algorithm exploits two properties: (1) timestamps arrive at regular intervals, (2) metric values change slowly. This is why it works so well for monitoring data.
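A tiny Go experiment makes the XOR property easy to verify: it XORs the IEEE 754 bit patterns of two similar doubles and counts the leading and trailing zeros with math/bits. The sample values are arbitrary.

package main

import (
    "fmt"
    "math"
    "math/bits"
)

func main() {
    prev := math.Float64bits(100.0) // a "previous" sample
    curr := math.Float64bits(101.0) // a similar "current" sample

    xor := prev ^ curr
    lead := bits.LeadingZeros64(xor)
    trail := bits.TrailingZeros64(xor)
    meaningful := 64 - lead - trail

    // Similar doubles share sign, exponent, and high mantissa bits, so the XOR
    // has long runs of zeros at both ends and only a short "meaningful" window.
    // Prints: xor=0000400000000000 leading=17 trailing=46 meaningful=1
    fmt.Printf("xor=%016x leading=%d trailing=%d meaningful=%d\n",
        xor, lead, trail, meaningful)
}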
Learning milestones
- XOR encoding produces small bit counts → You understand floating point similarity
- Block compression achieves 10x+ ratio → You’ve matched Facebook’s results
- Streaming decompression works → You understand how to read bit-aligned data
Project 4: Write-Ahead Log (WAL) for Time Series
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Zig
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Durability / Storage Systems
- Software or Tool: WAL Implementation
- Main Book: “Operating Systems: Three Easy Pieces” by Remzi H. Arpaci-Dusseau
What you’ll build
A write-ahead log that ensures durability for time series writes, with crash recovery and segment rotation.
Why it teaches Time Series Databases
Every production TSDB uses a WAL. You’ll understand the fundamental trade-off: write to a sequential log first (fast), then organize data later (async). This is why TSDBs can handle millions of writes per second.
Core challenges you’ll face
- Sequential write optimization (why appending is fast) → maps to disk I/O patterns
- Segment rotation (WAL files can’t grow forever) → maps to file management
- Crash recovery (replay WAL to rebuild state) → maps to durability guarantees
- Checkpointing (truncate WAL after data is persisted) → maps to space reclamation
- CRC checksums (detect corruption) → maps to data integrity
Key Concepts
- Write-Ahead Logging: “Operating Systems: Three Easy Pieces” Chapter 42 - Arpaci-Dusseau
- fsync and Durability: “The Linux Programming Interface” Chapter 13 - Michael Kerrisk
- CRC Checksums: “Computer Networks” Chapter 3.2 - Tanenbaum
- Segment Files: InfluxDB WAL Documentation
Difficulty
Advanced
Time estimate
2 weeks
Prerequisites
File I/O, understanding of durability concepts
Real world outcome
$ ./wal-demo
Starting WAL demo...
[Write Phase]
Wrote 100,000 data points to WAL
WAL segments: wal-00001.log (16MB), wal-00002.log (16MB)
Write throughput: 245,000 points/second
[Simulated Crash]
Killing process without clean shutdown...
[Recovery Phase]
$ ./wal-demo --recover
Scanning WAL segments...
wal-00001.log: 65,432 valid entries, 0 corrupt
wal-00002.log: 34,568 valid entries, 0 corrupt
Recovered 100,000 data points in 234ms
In-memory state rebuilt successfully!
[Checkpoint Phase]
Checkpointing WAL after persistent storage write...
Truncated wal-00001.log (data now in TSM files)
Implementation Hints
WAL entry format:
┌─────────────────────────────────────────────┐
│ Length (4 bytes) │
├─────────────────────────────────────────────┤
│ CRC32 checksum (4 bytes) │
├─────────────────────────────────────────────┤
│ Entry type (1 byte): WRITE | DELETE | ... │
├─────────────────────────────────────────────┤
│ Payload (variable length) │
│ - Series key │
│ - Timestamp │
│ - Value │
└─────────────────────────────────────────────┘
Segment rotation pseudo-code:
STRUCT WAL:
current_segment: File
segment_number: int
segment_size: int
max_segment_size: int = 16MB
FUNCTION write_entry(entry):
serialized = serialize(entry)
crc = crc32(serialized)
IF segment_size + len(serialized) > max_segment_size:
rotate_segment()
write(current_segment, len(serialized) || crc || serialized)
fsync(current_segment) # Critical for durability!
segment_size += len(serialized)
FUNCTION rotate_segment():
close(current_segment)
segment_number += 1
current_segment = open("wal-{segment_number}.log", CREATE)
segment_size = 0
Crash recovery:
FUNCTION recover():
state = empty HashMap
FOR EACH segment IN sorted(glob("wal-*.log")):
FOR EACH entry IN read_entries(segment):
IF verify_crc(entry):
apply_to_state(state, entry)
ELSE:
LOG("Corrupt entry detected, stopping replay")
BREAK
RETURN state
Critical insight: The WAL must call fsync() to guarantee durability. Without it, data can be lost on crash even if write() returned successfully.
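Here is a minimal Go sketch of one CRC-framed append followed by fsync, matching the length-then-checksum frame above; the payload encoding and file name are placeholders, and the project's main language is C, so treat this as an illustration of the idea only.

package main

import (
    "encoding/binary"
    "hash/crc32"
    "log"
    "os"
)

// appendEntry writes one WAL frame: length, CRC32 of the payload, then the payload,
// and fsyncs so the entry survives a crash once the call returns.
func appendEntry(f *os.File, payload []byte) error {
    var hdr [8]byte
    binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
    binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
    if _, err := f.Write(hdr[:]); err != nil {
        return err
    }
    if _, err := f.Write(payload); err != nil {
        return err
    }
    return f.Sync() // the durability guarantee lives or dies on this call
}

func main() {
    f, err := os.OpenFile("wal-00001.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    // The payload format is up to you; here it is just an opaque byte slice.
    if err := appendEntry(f, []byte("cpu.usage|host=server1 1703001600 0.75")); err != nil {
        log.Fatal(err)
    }
}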
Learning milestones
- Sequential writes hit 200K+ points/second → You understand why WAL is fast
- Recovery rebuilds state after kill -9 → You understand durability
- Segment rotation prevents unbounded growth → You understand operational concerns
Project 5: Inverted Index for Time Series Tags
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Java
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Indexing / Search
- Software or Tool: Tag Index
- Main Book: “Algorithms” by Robert Sedgewick
What you’ll build
An inverted index that maps tag values to series IDs, enabling fast queries like “find all series where region=us-east”.
Why it teaches Time Series Databases
Without an index, finding series matching a tag filter requires scanning all series—impossibly slow at scale. You’ll understand why TSDBs need two types of indexes: one for time (within a series) and one for tags (across series).
Core challenges you’ll face
- Inverted index construction (tag value → set of series IDs) → maps to search engine fundamentals
- Set intersection (combine multiple tag filters efficiently) → maps to query optimization
- Bitmap vs. sorted array (space/time trade-offs) → maps to data structure selection
- Index persistence (store index to disk) → maps to serialization
- Index updates (new series arrive continuously) → maps to incremental indexing
Key Concepts
- Inverted Indexes: “Algorithms” Chapter 5.1 - Sedgewick
- Roaring Bitmaps: RoaringBitmap Documentation - Daniel Lemire
- Set Intersection Algorithms: “Algorithms” Chapter 5.2 - Sedgewick
- Postings Lists: Prometheus TSDB Internals
Difficulty
Advanced
Time estimate
2 weeks
Prerequisites
Hash tables, set operations, basic understanding of search
Real world outcome
$ ./tag-index-demo
Building index from 1,000,000 series...
Index stats:
Unique tag keys: 15
Unique tag values: 45,230
Index size: 12.4 MB (bitmaps)
Query: region=us-east
Matching series: 125,432
Query time: 0.8ms
Query: region=us-east AND env=production
Matching series: 41,234
Query time: 1.2ms (intersection of 2 postings lists)
Query: region=us-east AND env=production AND service=api
Matching series: 8,432
Query time: 1.5ms (intersection of 3 postings lists)
Throughput: 650 queries/second
Implementation Hints
Index structure:
STRUCT TagIndex:
# Maps "label_name=label_value" → set of series IDs
postings: HashMap<string, RoaringBitmap>
# Maps series ID → series key (for result retrieval)
series_by_id: HashMap<int, string>
next_series_id: int
FUNCTION add_series(series_key, tags):
series_id = next_series_id++
series_by_id[series_id] = series_key
FOR EACH (key, value) IN tags:
posting_key = "{key}={value}"
IF posting_key NOT IN postings:
postings[posting_key] = new RoaringBitmap()
postings[posting_key].add(series_id)
Query execution:
FUNCTION query(filters):
# filters is like [("region", "us-east"), ("env", "production")]
IF len(filters) == 0:
RETURN all_series_ids()
# Start with smallest postings list (optimization)
sorted_filters = sort_by_postings_size(filters)
result = postings["{sorted_filters[0].key}={sorted_filters[0].value}"]
FOR EACH filter IN sorted_filters[1:]:
posting_key = "{filter.key}={filter.value}"
result = result.intersection(postings[posting_key])
IF result.is_empty():
RETURN empty # Early termination
RETURN result
Critical insight: Roaring Bitmaps are essential for production TSDBs. They compress well and support fast set operations. For this project, you can start with sorted arrays and implement intersection via merge-join, then upgrade to Roaring Bitmaps.
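The sorted-array starting point suggested above can be as small as this Go sketch of a merge-join intersection of two postings lists; the series IDs are made up for the example.

package main

import "fmt"

// intersect merge-joins two sorted postings lists of series IDs.
// Both inputs must be sorted ascending and free of duplicates.
func intersect(a, b []uint64) []uint64 {
    out := make([]uint64, 0)
    i, j := 0, 0
    for i < len(a) && j < len(b) {
        switch {
        case a[i] < b[j]:
            i++
        case a[i] > b[j]:
            j++
        default:
            out = append(out, a[i])
            i++
            j++
        }
    }
    return out
}

func main() {
    usEast := []uint64{1, 4, 7, 9, 12}      // series with region=us-east
    production := []uint64{2, 4, 9, 12, 15} // series with env=production
    fmt.Println(intersect(usEast, production)) // [4 9 12]
}

Starting with the smallest postings list, as the query pseudo-code above does, keeps the intermediate results small and lets you terminate early on an empty result.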
Learning milestones
- Simple tag queries work → You understand inverted indexes
- Multi-tag intersection is fast → You understand query optimization
- Index scales to 1M+ series → You understand the cardinality challenge
Project 6: Block-Based Storage Engine (Prometheus-Style)
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: Storage Engines / Database Internals
- Software or Tool: Block Storage Engine
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A Prometheus-style storage engine with a write-ahead head block, immutable on-disk blocks, and compaction.
Why it teaches Time Series Databases
Prometheus’s TSDB is elegantly simple yet production-ready. You’ll understand the hybrid architecture: in-memory head block for recent data, immutable blocks for historical data, with compaction merging blocks over time.
Core challenges you’ll face
- Head block management (in-memory buffer with WAL backing) → maps to write path optimization
- Block creation (flush head to immutable on-disk block) → maps to data lifecycle
- Chunk encoding (compress samples within chunks) → maps to storage efficiency
- Index file format (series → chunk references) → maps to read path optimization
- Compaction (merge adjacent blocks) → maps to storage maintenance
Resources for key challenges
- Prometheus TSDB Design: Prometheus Storage Documentation
- Block Architecture: Prometheus TSDB Explained - Groundcover
- Deep Dive: Prometheus TSDB Internals
Key Concepts
- Block Structure: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Compaction Strategies: Prometheus Storage Architecture - Palark
- Memory-Mapped Files: “The Linux Programming Interface” Chapter 49 - Michael Kerrisk
- Chunk Format: Prometheus source code (pkg/chunkenc)
Difficulty
Expert
Time estimate
3-4 weeks
Prerequisites
WAL implementation (Project 4), compression (Projects 2-3), strong systems programming
Real world outcome
$ ./prometheus-style-tsdb
Starting TSDB with 2h block range...
[Ingestion - 1 hour]
Ingested 5,000,000 samples across 10,000 series
Head block: 180MB in-memory, WAL: 45MB on disk
[Block Creation - at 2h mark]
Flushing head block to disk...
Created block: 01HQKV3X5M2N4P6R8S0T
Time range: 1703001600 - 1703008800
Samples: 5,000,000
Size on disk: 23MB (7.8x compression)
Chunks: 10,000
Index size: 1.2MB
[Query]
> http_requests_total{method="GET"}[5m]
Scanned: 1 block, 45 chunks
Returned: 30 samples
Query time: 4.2ms
[Compaction - later]
Compacting blocks: 01HQKV3X5M..., 01HQKV7Y9N...
Created merged block: 01HQKZ2A8B
Time range: 1703001600 - 1703016000 (4 hours)
Reduced from 2 blocks (46MB) to 1 block (42MB)
Implementation Hints
Block directory structure:
data/
├── wal/
│ ├── 00000001
│ └── 00000002
├── 01HQKV3X5M2N4P6R8S0T/ # Block 1
│ ├── meta.json
│ ├── index
│ ├── chunks/
│ │ ├── 000001
│ │ └── 000002
│ └── tombstones
└── 01HQKV7Y9N3O5Q7R9S1U/ # Block 2
└── ...
meta.json:
{
"ulid": "01HQKV3X5M2N4P6R8S0T",
"minTime": 1703001600000,
"maxTime": 1703008800000,
"stats": {
"numSamples": 5000000,
"numSeries": 10000,
"numChunks": 10000
},
"compaction": {
"level": 1,
"sources": []
}
}
Head block pseudo-code:
STRUCT HeadBlock:
series: HashMap<SeriesID, MemSeries>
wal: WAL
min_time: int64
max_time: int64
STRUCT MemSeries:
labels: LabelSet
chunks: List<MemChunk>
STRUCT MemChunk:
samples: List<(timestamp, value)>
min_time: int64
max_time: int64
FUNCTION append(series_id, timestamp, value):
wal.write(series_id, timestamp, value)
series = get_or_create_series(series_id)
chunk = series.head_chunk()
IF chunk.is_full():
chunk = series.create_new_chunk()
chunk.append(timestamp, value)
update_time_bounds(timestamp)
Block creation (compaction from head):
FUNCTION create_block(head, block_dir):
block_id = generate_ulid()
# Create chunks file
chunks_file = open("{block_dir}/chunks/000001")
chunk_refs = []
FOR EACH series IN head.series:
FOR EACH chunk IN series.chunks:
compressed = gorilla_compress(chunk)
offset = chunks_file.write(compressed)
chunk_refs.append(ChunkRef(series.id, chunk.min_time, offset))
# Create index file
write_index(block_dir, head.series, chunk_refs)
# Write metadata
write_meta_json(block_dir, block_id, time_range, stats)
Critical insight: The block structure separates chunks (actual data) from index (series → chunk mapping). This allows efficient queries: first find relevant chunks via index, then only read those chunks.
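As a small, concrete slice of the block layout, here is a Go sketch that writes a meta.json matching the example above; the ULID is hard-coded rather than generated, and the directory layout is assumed to match the tree shown earlier.

package main

import (
    "encoding/json"
    "log"
    "os"
)

// BlockMeta mirrors the meta.json example above; field names follow that layout.
type BlockMeta struct {
    ULID    string `json:"ulid"`
    MinTime int64  `json:"minTime"`
    MaxTime int64  `json:"maxTime"`
    Stats   struct {
        NumSamples int64 `json:"numSamples"`
        NumSeries  int64 `json:"numSeries"`
        NumChunks  int64 `json:"numChunks"`
    } `json:"stats"`
    Compaction struct {
        Level   int      `json:"level"`
        Sources []string `json:"sources"`
    } `json:"compaction"`
}

func writeMeta(dir string, meta BlockMeta) error {
    data, err := json.MarshalIndent(meta, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(dir+"/meta.json", data, 0o644)
}

func main() {
    dir := "data/01HQKV3X5M2N4P6R8S0T"
    if err := os.MkdirAll(dir, 0o755); err != nil {
        log.Fatal(err)
    }
    meta := BlockMeta{ULID: "01HQKV3X5M2N4P6R8S0T", MinTime: 1703001600000, MaxTime: 1703008800000}
    meta.Stats.NumSamples, meta.Stats.NumSeries, meta.Stats.NumChunks = 5000000, 10000, 10000
    meta.Compaction.Level = 1
    meta.Compaction.Sources = []string{}
    if err := writeMeta(dir, meta); err != nil {
        log.Fatal(err)
    }
}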
Learning milestones
- Head block accepts writes → You understand the write path
- Blocks persist to disk → You understand block creation
- Queries span head + blocks → You understand the read path
- Compaction merges blocks → You understand storage maintenance
Project 7: TSM Tree Storage Engine (InfluxDB-Style)
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: LSM Trees / Storage Engines
- Software or Tool: TSM Storage Engine
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A Time Structured Merge Tree (TSM) storage engine, inspired by InfluxDB’s architecture—a specialized LSM tree for time series.
Why it teaches Time Series Databases
TSM is the production storage engine behind InfluxDB. You’ll understand how LSM trees adapt to time series workloads: write-optimized ingestion, background compaction, and columnar storage for efficient compression.
Core challenges you’ll face
- Cache + WAL architecture (in-memory cache backed by WAL) → maps to write optimization
- TSM file format (series-organized, compressed blocks) → maps to read optimization
- Compaction levels (merge small files into larger ones) → maps to LSM fundamentals
- Index within TSM files (find data by series key + time range) → maps to efficient lookups
- Cache eviction (flush to TSM when memory limit reached) → maps to resource management
Resources for key challenges
- TSM Architecture: InfluxDB Storage Engine Internals - InfluxData
- TSM Documentation: InfluxDB Storage Engine Docs
- LSM Trees: How to Build an LSM Tree from Scratch - FreeCodeCamp
Key Concepts
- LSM Tree Fundamentals: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Compaction Strategies: LSM Compaction Mechanism - Alibaba Cloud
- Write Amplification: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Bloom Filters: “Algorithms” Chapter 5 - Sedgewick
Difficulty
Expert
Time estimate
4-5 weeks
Prerequisites
Project 4 (WAL), Project 3 (Gorilla compression), understanding of LSM trees
Real world outcome
$ ./tsm-engine
TSM Storage Engine initialized
Cache size limit: 256MB
TSM file target size: 256MB
[Write Phase]
Writing 10,000,000 points across 50,000 series...
Cache usage: 245MB
WAL segments: 3
[Flush Phase - cache full]
Flushing cache to TSM file...
Created: data/000001.tsm (98MB compressed)
Series: 50,000
Points: 10,000,000
Compression ratio: 2.5x
[Compaction Phase]
Level 1 compaction: 000001.tsm + 000002.tsm + 000003.tsm + 000004.tsm
→ 000005.tsm (Level 2, 320MB)
Write amplification avoided: 3.2x
[Query Phase]
Query: SELECT * FROM cpu WHERE host='server1' AND time > now() - 1h
Files scanned: 1 TSM (using index)
Points returned: 3,600
Query time: 2.1ms
Implementation Hints
TSM file structure:
┌─────────────────────────────────────────────────────┐
│ TSM File │
├─────────────────────────────────────────────────────┤
│ Block 1: Series A, timestamps 1000-2000 │
│ - Compressed timestamps (delta-of-delta) │
│ - Compressed values (Gorilla XOR) │
├─────────────────────────────────────────────────────┤
│ Block 2: Series A, timestamps 2001-3000 │
├─────────────────────────────────────────────────────┤
│ Block 3: Series B, timestamps 1000-2000 │
├─────────────────────────────────────────────────────┤
│ ... more blocks ... │
├─────────────────────────────────────────────────────┤
│ Index: │
│ Series A → [Block 1 offset, Block 2 offset] │
│ Series B → [Block 3 offset] │
├─────────────────────────────────────────────────────┤
│ Footer: Index offset, magic number │
└─────────────────────────────────────────────────────┘
Cache structure:
STRUCT Cache:
entries: HashMap<SeriesKey, SortedList<(timestamp, value)>>
size_bytes: int
max_size_bytes: int
FUNCTION write(series_key, timestamp, value):
wal.append(series_key, timestamp, value)
IF series_key NOT IN entries:
entries[series_key] = new SortedList()
entries[series_key].insert(timestamp, value)
size_bytes += POINT_SIZE
IF size_bytes >= max_size_bytes:
flush_to_tsm()
Compaction levels:
Level 1: Small files from cache flush (~50MB each)
Level 2: Merged from 4 Level-1 files (~200MB)
Level 3: Merged from 4 Level-2 files (~800MB)
Level 4: Merged from 4 Level-3 files (~3GB)
Compaction trigger: When 4+ files exist at a level
Critical insight: Unlike Prometheus blocks (time-partitioned), TSM files contain data from all time ranges for the series they cover. This means queries must check the index of every TSM file, which is why bloom filters become important at scale.
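A compaction planner for the level scheme above can start as simply as this Go sketch, which proposes a merge whenever four or more files sit at the same level; the file names and struct are illustrative.

package main

import "fmt"

// TSMFile records which compaction level a file belongs to.
type TSMFile struct {
    Name  string
    Level int
}

// planCompactions groups files by level and proposes a merge whenever a level
// has accumulated 4 or more files, matching the trigger described above.
func planCompactions(files []TSMFile) [][]TSMFile {
    byLevel := map[int][]TSMFile{}
    for _, f := range files {
        byLevel[f.Level] = append(byLevel[f.Level], f)
    }
    var plans [][]TSMFile
    for _, group := range byLevel {
        for len(group) >= 4 {
            plans = append(plans, group[:4])
            group = group[4:]
        }
    }
    return plans
}

func main() {
    files := []TSMFile{
        {"000001.tsm", 1}, {"000002.tsm", 1}, {"000003.tsm", 1},
        {"000004.tsm", 1}, {"000005.tsm", 2},
    }
    for _, plan := range planCompactions(files) {
        fmt.Println("compact:", plan)
    }
}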
Learning milestones
- Cache flushes produce valid TSM files → You understand the write path
- Queries use TSM index correctly → You understand the read path
- Compaction reduces file count → You understand LSM maintenance
- Benchmark shows 100K+ writes/sec → You’ve built production-quality code
Project 8: Time-Based Sharding System
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Python, Java
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Distributed Systems / Partitioning
- Software or Tool: Time-Based Shard Manager
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A sharding layer that partitions time series data by time windows (hour, day, week), enabling efficient retention policies and parallel queries.
Why it teaches Time Series Databases
Time-based sharding is fundamental to TSDB scalability. You’ll understand why: (1) old data can be deleted by dropping entire shards, (2) queries touching specific time ranges only need specific shards, (3) compaction happens per-shard independently.
Core challenges you’ll face
- Shard routing (which shard handles a given timestamp?) → maps to partitioning logic
- Cross-shard queries (query spans multiple time windows) → maps to query fan-out
- Shard lifecycle (create, compact, delete) → maps to data lifecycle management
- Retention enforcement (drop shards older than X days) → maps to operational automation
- Hot/cold tiering (recent shards in fast storage, old in slow) → maps to storage economics
Key Concepts
- Time-Based Partitioning: “Designing Data-Intensive Applications” Chapter 6 - Martin Kleppmann
- Retention Policies: InfluxDB Retention Documentation
- Shard Groups: InfluxDB Shard Groups
- Query Planning: “Designing Data-Intensive Applications” Chapter 6 - Martin Kleppmann
Difficulty
Advanced
Time estimate
2 weeks
Prerequisites
Basic understanding of distributed systems, file I/O
Real world outcome
$ ./shard-manager status
Shard Configuration:
Shard duration: 1 day
Retention: 7 days
Active Shards:
shard_2024-01-15 [HOT] size: 2.3GB series: 50,000
shard_2024-01-14 [WARM] size: 4.1GB series: 50,000
shard_2024-01-13 [WARM] size: 4.0GB series: 50,000
shard_2024-01-12 [COLD] size: 3.8GB series: 50,000
...
shard_2024-01-08 [COLD] size: 3.5GB series: 50,000
$ ./shard-manager query "SELECT * FROM cpu WHERE time > '2024-01-14'"
Query plan:
Shards to scan: shard_2024-01-15, shard_2024-01-14
Parallel execution: 2 workers
Estimated points: 8,640,000
Results: 8,523,412 points in 1.2s
$ ./shard-manager enforce-retention
Retention policy: 7 days
Shards to delete: shard_2024-01-07
Deleted 3.2GB, recovered disk space
Implementation Hints
Shard structure:
STRUCT ShardManager:
shard_duration: Duration # e.g., 1 day
retention_period: Duration # e.g., 7 days
shards: HashMap<ShardID, Shard>
STRUCT Shard:
id: string # e.g., "shard_2024-01-15"
start_time: Timestamp
end_time: Timestamp
storage: StorageEngine # Could be TSM, Block-based, etc.
state: HOT | WARM | COLD
FUNCTION get_shard_for_timestamp(timestamp):
shard_start = truncate_to_boundary(timestamp, shard_duration)
shard_id = format_shard_id(shard_start)
IF shard_id NOT IN shards:
shards[shard_id] = create_shard(shard_id, shard_start)
RETURN shards[shard_id]
FUNCTION truncate_to_boundary(timestamp, duration):
# For daily shards: truncate to midnight
# For hourly shards: truncate to hour start
RETURN timestamp - (timestamp % duration)
Cross-shard query:
FUNCTION query(start_time, end_time, filters):
relevant_shards = []
FOR EACH shard IN shards:
IF shard.end_time >= start_time AND shard.start_time <= end_time:
relevant_shards.append(shard)
# Execute in parallel
results = parallel_map(relevant_shards,
lambda shard: shard.query(start_time, end_time, filters))
RETURN merge_sorted_by_time(results)
Retention enforcement:
FUNCTION enforce_retention():
cutoff = now() - retention_period
FOR EACH shard IN shards:
IF shard.end_time < cutoff:
shard.close()
delete_shard_directory(shard.id)
shards.remove(shard.id)
LOG("Deleted shard {shard.id}")
Critical insight: The shard boundary alignment is crucial. If you have daily shards, all data for a given day must go to the same shard. This is why time-based sharding works so well for retention: you never partially delete a shard.
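The boundary alignment can be expressed in a few lines of Go using time.Truncate; the shard ID format mirrors the status output above, and daily UTC boundaries are an assumption of this sketch.

package main

import (
    "fmt"
    "time"
)

// shardIDFor maps a Unix timestamp to the daily shard that owns it, by
// truncating to the UTC day boundary. All points from the same day land
// in the same shard, so retention can drop whole shards at a time.
func shardIDFor(unixSec int64, shardDuration time.Duration) string {
    t := time.Unix(unixSec, 0).UTC().Truncate(shardDuration)
    return "shard_" + t.Format("2006-01-02")
}

func main() {
    fmt.Println(shardIDFor(1705312800, 24*time.Hour)) // shard_2024-01-15
    fmt.Println(shardIDFor(1705315000, 24*time.Hour)) // same day → same shard
}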
Learning milestones
- Writes route to correct shard → You understand partitioning
- Queries span multiple shards correctly → You understand fan-out
- Retention deletes old shards → You understand data lifecycle
- Hot/cold tiering works → You understand storage economics
Project 9: Downsampling and Rollup Engine
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Python, Rust, Java
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Data Aggregation / Stream Processing
- Software or Tool: Downsampling Engine
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
An engine that automatically creates lower-resolution rollups of time series data (e.g., 1-minute raw → 1-hour aggregates → 1-day aggregates).
Why it teaches Time Series Databases
Long-term storage of high-resolution data is expensive and rarely needed. You’ll understand tiered retention: keep raw data for 7 days, hourly rollups for 90 days, daily rollups forever.
Core challenges you’ll face
- Aggregation functions (min, max, sum, count, avg, percentiles) → maps to statistics fundamentals
- Incremental computation (don’t reprocess old data) → maps to stream processing
- Rollup scheduling (when to run, what to process) → maps to job scheduling
- Rollup storage (where to store aggregated data) → maps to data organization
- Query routing (use rollups for long-range queries) → maps to query optimization
Key Concepts
- Continuous Aggregates: TimescaleDB Continuous Aggregates
- Stream Processing: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
- Percentile Estimation: “Algorithms” Chapter 2.5 - Sedgewick (for t-digest/DDSketch)
- Retention Policies: InfluxDB Retention & Downsampling
Difficulty
Advanced
Time estimate
2 weeks
Prerequisites
Basic aggregation understanding, time-based sharding
Real world outcome
$ ./rollup-engine status
Rollup Configuration:
Raw data retention: 7 days (1-second resolution)
1-minute rollups: 30 days
1-hour rollups: 1 year
1-day rollups: forever
Rollup Jobs:
Last 1-minute rollup: 2024-01-15 14:35:00 (processed 60 seconds of data)
Last 1-hour rollup: 2024-01-15 14:00:00 (processed 60 minutes of data)
Last 1-day rollup: 2024-01-15 00:00:00 (processed 24 hours of data)
$ ./rollup-engine query "avg(cpu_usage) WHERE host='server1'" --range="30d"
Query plan:
Last 7 days: raw data (1-second resolution)
Days 8-30: 1-minute rollups
Total points scanned: 604,800 + 33,120 = 637,920
(vs. 2,592,000 without rollups - 4x fewer points!)
Result: avg=0.342, min=0.05, max=0.98, count=2,592,000
$ ./rollup-engine query "avg(cpu_usage)" --range="1y"
Query plan:
Last 7 days: raw data
Days 8-30: 1-minute rollups
Days 31-365: 1-hour rollups
Total points scanned: 604,800 + 33,120 + 8,040 = 645,960
(vs. 31,536,000 without rollups - 48x fewer points!)
Implementation Hints
Rollup point structure:
STRUCT RollupPoint:
timestamp: int64 # Start of aggregation window
count: int64 # Number of raw points
sum: float64 # Sum of values
min: float64 # Minimum value
max: float64 # Maximum value
# Derived: avg = sum / count
# For percentiles, use a sketch data structure:
STRUCT PercentileRollupPoint:
timestamp: int64
t_digest: TDigest # Or DDSketch
Rollup job pseudo-code:
STRUCT RollupJob:
source_resolution: Duration # e.g., 1 second
target_resolution: Duration # e.g., 1 minute
last_processed: Timestamp
FUNCTION run_rollup_job(job):
start = job.last_processed
end = truncate_to_boundary(now(), job.target_resolution)
IF start >= end:
RETURN # Nothing new to process
windows = generate_windows(start, end, job.target_resolution)
FOR EACH window IN windows:
raw_points = query_raw_data(window.start, window.end)
rollup = RollupPoint(
timestamp = window.start,
count = len(raw_points),
sum = sum(p.value for p in raw_points),
min = min(p.value for p in raw_points),
max = max(p.value for p in raw_points)
)
store_rollup(rollup)
job.last_processed = end
Query routing:
FUNCTION smart_query(start_time, end_time, aggregation):
segments = []
current = start_time
# Use highest resolution available for each time segment
WHILE current < end_time:
IF current > now() - raw_retention:
segments.append(QuerySegment("raw", current, min(end_time, now())))
current = now()
ELIF current > now() - minute_rollup_retention:
segment_end = min(end_time, now() - raw_retention)
segments.append(QuerySegment("1min", current, segment_end))
current = segment_end
ELSE:
segment_end = min(end_time, now() - minute_rollup_retention)
segments.append(QuerySegment("1hour", current, segment_end))
current = segment_end
RETURN merge_results(execute_segments(segments))
Critical insight: Rollups must be additive—you can compute sum and count from rollups and derive avg, but you cannot compute median without storing a sketch. This is why production systems use t-digest or DDSketch for percentile rollups.
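This Go sketch shows what “additive” means in practice: two rollups for the same window merge field by field, and avg falls out of sum/count, while a median never could. The struct mirrors RollupPoint above; the sample numbers are invented.

package main

import (
    "fmt"
    "math"
)

// RollupPoint mirrors the aggregate fields above; avg is derived, never stored.
type RollupPoint struct {
    Timestamp int64
    Count     int64
    Sum       float64
    Min       float64
    Max       float64
}

// merge combines two rollups for the same window: every field is either additive
// or a simple min/max, which is what makes re-rolling 1m → 1h → 1d possible.
func merge(a, b RollupPoint) RollupPoint {
    return RollupPoint{
        Timestamp: a.Timestamp,
        Count:     a.Count + b.Count,
        Sum:       a.Sum + b.Sum,
        Min:       math.Min(a.Min, b.Min),
        Max:       math.Max(a.Max, b.Max),
    }
}

func (r RollupPoint) Avg() float64 { return r.Sum / float64(r.Count) }

func main() {
    a := RollupPoint{Timestamp: 0, Count: 60, Sum: 20.4, Min: 0.05, Max: 0.90}
    b := RollupPoint{Timestamp: 0, Count: 60, Sum: 21.0, Min: 0.10, Max: 0.98}
    m := merge(a, b)
    fmt.Printf("count=%d avg=%.3f min=%.2f max=%.2f\n", m.Count, m.Avg(), m.Min, m.Max)
}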
Learning milestones
- Basic aggregations work → You understand rollup fundamentals
- Incremental processing works → You understand stream processing
- Query routing uses appropriate rollups → You understand query optimization
- Percentile rollups work → You understand sketch data structures
Project 10: PromQL-Style Query Engine
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Python, TypeScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: Query Languages / Compilers
- Software or Tool: Query Engine
- Main Book: “Language Implementation Patterns” by Terence Parr
What you’ll build
A query engine that parses and executes PromQL-style queries with selectors, functions, and binary operators.
Why it teaches Time Series Databases
PromQL is the de facto standard for time series queries. You’ll understand how a functional query language works: expression trees, type system (scalars vs. vectors), and streaming execution.
Core challenges you’ll face
- Lexer and parser (tokenize and build AST) → maps to language implementation
- Type system (instant vector, range vector, scalar) → maps to type theory basics
- Selector execution (query storage for matching series) → maps to storage integration
- Function implementation (rate, avg_over_time, histogram_quantile) → maps to aggregation logic
- Binary operator matching (how to combine two vectors) → maps to set operations
Resources for key challenges
- PromQL Internals: Inside PromQL - Grafana Labs
- PromQL Anatomy: Anatomy of a PromQL Query - PromLabs
- PromQL Tutorial: PromQL for Beginners - VictoriaMetrics
Key Concepts
- Lexing and Parsing: “Language Implementation Patterns” Chapters 2-5 - Terence Parr
- Expression Trees: “Compilers: Principles and Practice” Chapter 5 - Dave & Dave
- Type Systems: “Language Implementation Patterns” Chapter 8 - Terence Parr
- Vector Matching: PromQL Documentation
Difficulty
Expert
Time estimate
3-4 weeks
Prerequisites
Basic compiler knowledge (lexer/parser), understanding of time series data model
Real world outcome
$ ./promql-engine
PromQL Engine v0.1
> http_requests_total
TYPE: instant vector
{method="GET", status="200"} 1523
{method="GET", status="404"} 42
{method="POST", status="200"} 891
Query time: 2.3ms
> rate(http_requests_total[5m])
TYPE: instant vector
{method="GET", status="200"} 5.07
{method="GET", status="404"} 0.14
{method="POST", status="200"} 2.97
Query time: 4.1ms
> sum by (method) (rate(http_requests_total[5m]))
TYPE: instant vector
{method="GET"} 5.21
{method="POST"} 2.97
Query time: 4.8ms
> http_requests_total / ignoring(status) http_requests_started
TYPE: instant vector (ratio of completed to started)
{method="GET"} 0.95
{method="POST"} 0.89
Query time: 5.2ms
Implementation Hints
Token types:
TOKENS:
IDENTIFIER # metric names, label names
STRING # "value"
NUMBER # 123, 45.67
DURATION # 5m, 1h, 30s
LBRACE # {
RBRACE # }
LPAREN # (
RPAREN # )
LBRACKET # [
RBRACKET # ]
COMMA # ,
EQ # =
NEQ # !=
REGEX_MATCH # =~
REGEX_NOMATCH # !~
# Binary operators
ADD, SUB, MUL, DIV, MOD, POW
# Keywords
SUM, AVG, MIN, MAX, COUNT, BY, WITHOUT, ON, IGNORING, RATE, ...
AST node types:
ENUM ExprType:
VECTOR_SELECTOR # http_requests_total{method="GET"}
MATRIX_SELECTOR # http_requests_total{method="GET"}[5m]
SCALAR_LITERAL # 42
UNARY_EXPR # -metric
BINARY_EXPR # a + b
CALL # rate(metric[5m])
AGGREGATION # sum by (label) (expr)
STRUCT VectorSelector:
metric_name: string
label_matchers: List<LabelMatcher>
STRUCT MatrixSelector:
vector: VectorSelector
range: Duration
STRUCT BinaryExpr:
left: Expr
right: Expr
op: Operator
matching: VectorMatching # on(), ignoring()
Execution pseudo-code:
FUNCTION evaluate(expr, eval_time):
MATCH expr:
CASE VectorSelector(name, matchers):
series = storage.select(name, matchers)
RETURN instant_vector(series, eval_time)
CASE MatrixSelector(vector, range):
series = storage.select(vector.name, vector.matchers)
RETURN range_vector(series, eval_time - range, eval_time)
CASE Call("rate", [matrix_expr]):
matrix = evaluate(matrix_expr, eval_time)
RETURN apply_rate(matrix)
CASE BinaryExpr(left, right, op, matching):
lhs = evaluate(left, eval_time)
rhs = evaluate(right, eval_time)
RETURN vector_binary_op(lhs, rhs, op, matching)
CASE Aggregation("sum", by_labels, expr):
vector = evaluate(expr, eval_time)
RETURN aggregate_sum(vector, by_labels)
Vector matching (the tricky part):
FUNCTION vector_binary_op(lhs, rhs, op, matching):
result = []
# Build signature map for RHS
rhs_map = {}
FOR EACH sample IN rhs:
sig = signature(sample.labels, matching.on_labels)
rhs_map[sig] = sample
# Match LHS against RHS
FOR EACH lsample IN lhs:
sig = signature(lsample.labels, matching.on_labels)
IF sig IN rhs_map:
rsample = rhs_map[sig]
result_value = apply_op(lsample.value, rsample.value, op)
result.append(Sample(lsample.labels, result_value))
RETURN result
Critical insight: PromQL is a functional language where every subexpression evaluates to a result (scalar, instant vector, or range vector) that feeds into the parent expression. The type rules are strict: rate() requires a range vector, aggregations produce instant vectors, etc.
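To ground the function-implementation challenge, here is a deliberately simplified rate() in Go over one series’ range vector; it ignores counter resets and the window-edge extrapolation that real PromQL performs.

package main

import "fmt"

// Sample is one point of a range vector for a single series.
type Sample struct {
    TimestampMs int64
    Value       float64
}

// rate returns the per-second increase over the window, ignoring counter
// resets and the extrapolation that real PromQL applies at window edges.
func rate(window []Sample) (float64, bool) {
    if len(window) < 2 {
        return 0, false // PromQL also returns nothing for fewer than 2 samples
    }
    first, last := window[0], window[len(window)-1]
    seconds := float64(last.TimestampMs-first.TimestampMs) / 1000.0
    if seconds <= 0 {
        return 0, false
    }
    return (last.Value - first.Value) / seconds, true
}

func main() {
    // A counter scraped every 15s over 1 minute.
    window := []Sample{
        {0, 1400}, {15000, 1430}, {30000, 1462}, {45000, 1490}, {60000, 1523},
    }
    if v, ok := rate(window); ok {
        fmt.Printf("rate = %.2f per second\n", v) // 2.05
    }
}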
Learning milestones
- Simple selectors work → You understand the lexer/parser
- rate() and other functions work → You understand range vectors
- Binary operators with matching work → You understand PromQL’s type system
- Aggregations work → You’ve built a functional query engine
Project 11: High-Throughput Ingestion Pipeline
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, C++, C
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: High Performance / Concurrency
- Software or Tool: Ingestion Pipeline
- Main Book: “Rust Atomics and Locks” by Mara Bos
What you’ll build
A high-performance ingestion pipeline capable of handling 1M+ data points per second with batching, back-pressure, and parallel processing.
Why it teaches Time Series Databases
Production TSDBs must handle enormous write loads from thousands of agents. You’ll understand write-path optimization: batching, lock-free data structures, memory pools, and the trade-offs between latency and throughput.
Core challenges you’ll face
- Batching strategies (when to flush: time-based, size-based, or both) → maps to latency/throughput trade-offs
- Lock-free structures (concurrent writes without blocking) → maps to concurrency patterns
- Memory pooling (avoid allocation overhead) → maps to memory management
- Back-pressure (what happens when writes exceed processing capacity) → maps to flow control
- Wire protocol (efficient parsing of incoming data) → maps to serialization
Key Concepts
- Lock-Free Programming: “Rust Atomics and Locks” Chapters 4-6 - Mara Bos
- Batching and Buffering: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
- Memory Pools: “C Interfaces and Implementations” Chapter 5 - David Hanson
- Back-Pressure: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
Difficulty
Expert
Time estimate
3-4 weeks
Prerequisites
Strong concurrency knowledge, understanding of memory layout
Real world outcome
$ ./ingestion-bench --writers=100 --duration=60s
Starting ingestion benchmark...
Writers: 100 concurrent
Duration: 60 seconds
Results:
Total points ingested: 72,345,678
Throughput: 1,205,761 points/second
Latency (p50): 0.4ms
Latency (p99): 2.1ms
Latency (p99.9): 8.3ms
Batches flushed: 723,456
Avg batch size: 100 points
Memory usage: 512MB (stable)
GC pauses: 0 (using memory pool)
$ ./ingestion-server --port=9090
Ingestion server listening on :9090
Protocol: InfluxDB line protocol compatible
$ echo "cpu,host=server1 usage=0.75 1703001600000000000" | nc localhost 9090
OK
$ ./telegraf --config=telegraf.conf # Send 10K metrics/sec
Connected to ingestion server
Sending metrics...
Implementation Hints
Batching architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Receiver 1 │────▶│ Batch Queue │────▶│ Flusher 1 │
├─────────────────┤ │ (lock-free) │ ├─────────────────┤
│ Receiver 2 │────▶│ │────▶│ Flusher 2 │
├─────────────────┤ │ │ ├─────────────────┤
│ Receiver N │────▶│ │────▶│ Flusher M │
└─────────────────┘ └─────────────────┘ └─────────────────┘
▲
│
Flush trigger:
- Time: every 100ms
- Size: every 10K points
Lock-free batch queue pseudo-code:
STRUCT BatchQueue:
batches: Array<AtomicPtr<Batch>>
write_idx: AtomicUint64
read_idx: AtomicUint64
STRUCT Batch:
points: FixedArray<Point, 10000>
count: AtomicUint32
state: AtomicEnum<FILLING, READY, FLUSHING>
FUNCTION append_point(queue, point):
LOOP:
idx = queue.write_idx.load()
batch = queue.batches[idx % NUM_BATCHES]
IF batch.state.load() == FILLING:
slot = batch.count.fetch_add(1)
IF slot < BATCH_SIZE:
batch.points[slot] = point
RETURN OK
ELSE:
# Batch full, try to seal it
batch.state.compare_swap(FILLING, READY)
# Move to next batch
queue.write_idx.compare_swap(idx, idx + 1)
Memory pool:
STRUCT PointPool:
free_list: LockFreeStack<*Point>
slabs: Vec<*Slab>
FUNCTION allocate():
IF point = free_list.pop():
RETURN point
ELSE:
RETURN allocate_from_new_slab()
FUNCTION release(point):
clear(point)
free_list.push(point)
Back-pressure:
FUNCTION handle_write(points):
IF pending_batches() > MAX_PENDING:
IF blocking_mode:
wait_for_flush()
ELSE:
RETURN ERROR_BACKPRESSURE
FOR EACH point IN points:
append_point(batch_queue, point)
RETURN OK
Critical insight: The key to high throughput is avoiding contention. Each receiver thread should have its own batch being filled, and only contend when batches are swapped. Memory pools eliminate allocation overhead, which is critical at 1M+ points/second.
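Before going lock-free, it helps to see the batching trade-off in isolation. This Go sketch flushes on whichever of the size or time trigger fires first, using a single goroutine and a channel; it illustrates the flush-trigger idea from the diagram above, not the lock-free queue.

package main

import (
    "fmt"
    "time"
)

type Point struct {
    SeriesKey string
    Timestamp int64
    Value     float64
}

// runBatcher drains the input channel and flushes whenever the batch reaches
// maxBatch points or maxWait elapses, whichever comes first.
func runBatcher(in <-chan Point, maxBatch int, maxWait time.Duration, flush func([]Point)) {
    batch := make([]Point, 0, maxBatch)
    timer := time.NewTimer(maxWait)
    defer timer.Stop()
    for {
        select {
        case p, ok := <-in:
            if !ok {
                if len(batch) > 0 {
                    flush(batch) // flush whatever is left on shutdown
                }
                return
            }
            batch = append(batch, p)
            if len(batch) >= maxBatch {
                flush(batch)
                batch = batch[:0]
                timer.Reset(maxWait)
            }
        case <-timer.C:
            if len(batch) > 0 {
                flush(batch) // time-based flush bounds latency for slow producers
                batch = batch[:0]
            }
            timer.Reset(maxWait)
        }
    }
}

func main() {
    in := make(chan Point, 1024)
    go func() {
        for i := 0; i < 250; i++ {
            in <- Point{"cpu,host=server1", int64(i), 0.75}
        }
        close(in)
    }()
    runBatcher(in, 100, 100*time.Millisecond, func(b []Point) {
        fmt.Printf("flushed %d points\n", len(b))
    })
}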
Learning milestones
- Single-threaded hits 200K points/sec → You understand parsing overhead
- Multi-threaded scales linearly → You understand lock-free design
- Latency stays stable under load → You understand batching trade-offs
- Memory stays flat → You understand pooling
Project 12: Cardinality Management System
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Python, Java
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Metrics / Observability
- Software or Tool: Cardinality Analyzer
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A system that tracks, analyzes, and limits the cardinality of time series (unique combinations of metric + tags), preventing the “cardinality explosion” problem.
Why it teaches Time Series Databases
High cardinality is the #1 operational challenge for TSDBs. A single metric with a “user_id” tag can create millions of series. You’ll understand why this breaks TSDBs and how to detect and limit it.
Core challenges you’ll face
- Cardinality counting (how many unique series exist?) → maps to HyperLogLog and sketches
- Top-K tracking (which labels contribute most cardinality?) → maps to streaming algorithms
- Cardinality limiting (reject writes that would create too many series) → maps to admission control
- Label analysis (detect problematic labels like user_id) → maps to data quality
- Cardinality forecasting (predict future growth) → maps to capacity planning
Resources for key challenges
- Cardinality Problem: High Cardinality in TSDBs - TimescaleDB
- Solutions Overview: How Databases Handle High Cardinality - TigerData
- InfluxDB Approach: Time Series Data and Cardinality - InfluxData
Key Concepts
- HyperLogLog: “Algorithms” (probabilistic counting) - or HyperLogLog paper
- Top-K/Heavy Hitters: “Algorithms” - Sedgewick (or Space-Saving algorithm)
- Admission Control: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
- Cardinality Estimation: Greptime: Unveiling High Cardinality
Difficulty
Advanced
Time estimate
2 weeks
Prerequisites
Basic probability/statistics, understanding of time series data model
Real world outcome
$ ./cardinality-analyzer
Cardinality Analysis Report
===========================
Total unique series: 1,234,567
Growth rate: +12,345/hour
Top contributors to cardinality:
1. http_requests{path=*} 523,456 series (42%)
WARNING: "path" label has high cardinality
Unique values: 523,456
Recommendation: Use regex grouping or drop label
2. container_cpu{pod_id=*} 312,789 series (25%)
WARNING: "pod_id" is likely ephemeral
Unique values: 312,789
Recommendation: Consider pod_name instead
3. api_latency{user_id=*} 234,567 series (19%)
CRITICAL: "user_id" creates unbounded cardinality
Unique values: 234,567
Recommendation: Remove label, use exemplars
Cardinality limits:
Metric "http_requests": 10,000 series (current: 523,456 - EXCEEDED)
Action: New series with unique "path" values are being dropped
Dropped in last hour: 45,678 writes
$ ./cardinality-analyzer forecast --days=30
Predicted cardinality in 30 days: 2,456,789
WARNING: Exceeds recommended limit of 2,000,000
Implementation Hints
Cardinality tracking structure:
STRUCT CardinalityTracker:
# Approximate total count
global_hll: HyperLogLog
# Per-metric counts
metric_hlls: HashMap<MetricName, HyperLogLog>
# Per-label cardinality (for analysis)
label_hlls: HashMap<(MetricName, LabelName), HyperLogLog>
# Top-K series by write frequency
heavy_hitters: SpaceSaving<SeriesKey>
# Limits
limits: HashMap<MetricName, uint64>
# Dropped writes counter
dropped: AtomicUint64
FUNCTION track_write(metric, labels, value):
series_key = build_series_key(metric, labels)
# Update HLL
global_hll.add(hash(series_key))
metric_hlls[metric].add(hash(series_key))
FOR EACH (name, value) IN labels:
label_hlls[(metric, name)].add(hash(value))
heavy_hitters.add(series_key)
Cardinality limiting:
FUNCTION should_accept_write(metric, labels):
series_key = build_series_key(metric, labels)
IF series_key IN known_series:
RETURN true # Existing series, always accept
# New series - check limits
current = metric_hlls[metric].count()
limit = limits.get(metric, DEFAULT_LIMIT)
IF current >= limit:
dropped.increment()
LOG("Dropping write: cardinality limit exceeded for {metric}")
RETURN false
known_series.add(series_key)
RETURN true
Label analysis:
FUNCTION analyze_labels(metric):
report = []
metric_cardinality = metric_hlls[metric].count()
FOR EACH (label_name, hll) IN label_hlls WHERE metric == metric:
label_cardinality = hll.count()
contribution = label_cardinality / metric_cardinality
IF contribution > 0.5:
report.append(Warning(
"Label '{label_name}' contributes {contribution*100}% of cardinality",
"Consider removing or bucketing this label"
))
IF label_cardinality > 10000:
report.append(Critical(
"Label '{label_name}' has {label_cardinality} unique values",
"This is likely unbounded (user_id, request_id, etc.)"
))
RETURN report
Critical insight: The cardinality problem is fundamentally about combinatorial explosion. If you have 3 labels with 100, 50, and 20 unique values, you could have up to 100,000 unique series. Adding a fourth label with 1000 values (like user_id) creates up to 100,000,000 series—impossible to handle.
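A first version of admission control can use exact counting, as in this Go sketch; a production tracker would swap the per-metric set for a HyperLogLog to bound memory, but the accept/drop decision is the same. The limits and series keys here are invented for the example.

package main

import "fmt"

// cardinalityLimiter tracks exact per-metric series counts with a map.
type cardinalityLimiter struct {
    known   map[string]map[string]struct{} // metric → set of series keys
    limits  map[string]int
    dropped int
}

func newLimiter() *cardinalityLimiter {
    return &cardinalityLimiter{known: map[string]map[string]struct{}{}, limits: map[string]int{}}
}

// accept returns true for existing series, and for new series only while the
// metric is under its limit; otherwise the write is counted as dropped.
func (c *cardinalityLimiter) accept(metric, seriesKey string) bool {
    set, ok := c.known[metric]
    if !ok {
        set = map[string]struct{}{}
        c.known[metric] = set
    }
    if _, exists := set[seriesKey]; exists {
        return true
    }
    if limit, has := c.limits[metric]; has && len(set) >= limit {
        c.dropped++
        return false
    }
    set[seriesKey] = struct{}{}
    return true
}

func main() {
    lim := newLimiter()
    lim.limits["http_requests"] = 2
    fmt.Println(lim.accept("http_requests", "http_requests|path=/a")) // true
    fmt.Println(lim.accept("http_requests", "http_requests|path=/b")) // true
    fmt.Println(lim.accept("http_requests", "http_requests|path=/c")) // false: limit hit
    fmt.Println(lim.accept("http_requests", "http_requests|path=/a")) // true: existing series
    fmt.Println("dropped:", lim.dropped)
}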
Learning milestones
- HyperLogLog counts work → You understand approximate counting
- Top-K identifies problematic labels → You understand streaming algorithms
- Cardinality limiting works → You understand admission control
- Analysis produces actionable recommendations → You understand operational needs
Project 13: Distributed Time Series Store
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Java
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The “Industry Disruptor” (VC-Backable Platform)
- Difficulty: Level 5: Master (The First-Principles Wizard)
- Knowledge Area: Distributed Systems
- Software or Tool: Distributed TSDB
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A distributed time series store with replication, consistent hashing for series placement, and federated queries across nodes.
Why it teaches Time Series Databases
Single-node TSDBs can’t scale forever. You’ll understand how production systems like Cortex, Mimir, and Thanos distribute data: hash ring for writes, fan-out for queries, replication for durability.
Core challenges you’ll face
- Consistent hashing (assign series to nodes) → maps to partitioning
- Replication (write to N nodes for durability) → maps to fault tolerance
- Query fan-out (query all nodes, merge results) → maps to distributed query execution
- Node failure handling (detect and recover) → maps to failure detection
- Membership protocol (nodes joining/leaving) → maps to cluster management
Key Concepts
- Consistent Hashing: “Designing Data-Intensive Applications” Chapter 6 - Martin Kleppmann
- Replication: “Designing Data-Intensive Applications” Chapter 5 - Martin Kleppmann
- Gossip Protocols: “Designing Data-Intensive Applications” Chapter 8 - Martin Kleppmann
- Distributed Query Execution: Grafana Mimir Architecture
Difficulty
Master
Time estimate
6-8 weeks
Prerequisites
All previous projects, strong distributed systems knowledge
Real world outcome
$ ./tsdb-cluster init --nodes=3 --replication=2
Initializing 3-node cluster with replication factor 2...
Node 1 (localhost:9001): ONLINE
Node 2 (localhost:9002): ONLINE
Node 3 (localhost:9003): ONLINE
Hash ring configured: 256 virtual nodes per physical node
$ ./tsdb-cluster status
Cluster Status: HEALTHY
Nodes: 3/3 online
Total series: 5,234,567
Distribution:
Node 1: 1,756,234 series (33.5%)
Node 2: 1,745,678 series (33.3%)
Node 3: 1,732,655 series (33.1%)
Replication lag: <100ms
$ ./tsdb-cluster write "cpu{host=server1} 0.75"
Written to: Node 1 (primary), Node 2 (replica)
Ack: 2/2 nodes confirmed
$ ./tsdb-cluster query "cpu{host=server1}[5m]"
Query plan:
Fan-out to: Node 1 (primary holder)
Fallback: Node 2 (replica)
Result: 30 samples in 4.2ms
$ ./tsdb-cluster kill node-2
Node 2 marked as FAILED
Rebalancing: 1,745,678 series being re-replicated to Node 3
Progress: 45%... 78%... 100%
Cluster Status: DEGRADED (2/3 nodes)
All data remains available (replication factor satisfied)
Implementation Hints
Hash ring structure:
STRUCT HashRing:
nodes: List<Node>
virtual_nodes: SortedMap<uint64, Node> # hash → node
replication_factor: int
FUNCTION add_node(node):
FOR i FROM 0 TO VIRTUAL_NODES_PER_PHYSICAL:
hash = hash(node.id + "_" + i)
virtual_nodes[hash] = node
FUNCTION get_nodes_for_series(series_key):
hash = hash(series_key)
start_idx = virtual_nodes.ceiling_key(hash) # wrap to the first key if no hash is >= this one
result = []
seen_physical = set()
FOR EACH (_, node) IN virtual_nodes starting from start_idx, wrapping around the ring:
IF node.id NOT IN seen_physical:
result.append(node)
seen_physical.add(node.id)
IF len(result) >= replication_factor:
BREAK
RETURN result
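A compact Go version of the ring above, assuming FNV-1a as the hash function and 256 virtual nodes per physical node (matching the cluster transcript). It walks the sorted hash list with wrap-around until replicationFactor distinct physical nodes are collected; Ring, AddNode, and NodesFor are names chosen for this sketch.

```go
package ring

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const VirtualsPerNode = 256

type Ring struct {
	hashes            []uint64          // sorted virtual-node hashes
	owners            map[uint64]string // virtual-node hash -> physical node ID
	replicationFactor int
}

func New(replicationFactor int) *Ring {
	return &Ring{owners: make(map[uint64]string), replicationFactor: replicationFactor}
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// AddNode inserts VirtualsPerNode points for the node onto the ring.
func (r *Ring) AddNode(nodeID string) {
	for i := 0; i < VirtualsPerNode; i++ {
		h := hash64(fmt.Sprintf("%s_%d", nodeID, i))
		r.owners[h] = nodeID
		r.hashes = append(r.hashes, h)
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
}

// NodesFor returns the replicationFactor distinct nodes responsible for a series.
func (r *Ring) NodesFor(seriesKey string) []string {
	h := hash64(seriesKey)
	start := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	seen := map[string]bool{}
	var result []string
	for i := 0; i < len(r.hashes) && len(result) < r.replicationFactor; i++ {
		owner := r.owners[r.hashes[(start+i)%len(r.hashes)]] // wrap around the ring
		if !seen[owner] {
			seen[owner] = true
			result = append(result, owner)
		}
	}
	return result
}
```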
Write path with replication:
FUNCTION write(series_key, timestamp, value):
nodes = ring.get_nodes_for_series(series_key)
# Write to all replicas
responses = parallel_write(nodes, series_key, timestamp, value)
# Wait for quorum (majority)
successes = count(r for r in responses if r.success)
IF successes >= (replication_factor / 2) + 1:
RETURN OK
ELSE:
RETURN ERROR("Failed to achieve write quorum")
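A hedged Go sketch of the quorum write: fan out to every replica in a goroutine and return success once a majority acknowledges. The Replica interface is a stand-in for whatever RPC client the cluster actually uses.

```go
package write

import (
	"context"
	"errors"
	"sync"
)

// Replica is a placeholder for a node's write RPC.
type Replica interface {
	Append(ctx context.Context, seriesKey string, ts int64, value float64) error
}

// QuorumWrite succeeds once a majority of replicas acknowledge the sample.
func QuorumWrite(ctx context.Context, replicas []Replica, seriesKey string, ts int64, value float64) error {
	quorum := len(replicas)/2 + 1
	results := make(chan error, len(replicas))

	var wg sync.WaitGroup
	for _, r := range replicas {
		wg.Add(1)
		go func(r Replica) {
			defer wg.Done()
			results <- r.Append(ctx, seriesKey, ts, value)
		}(r)
	}
	go func() { wg.Wait(); close(results) }()

	acks := 0
	for err := range results {
		if err == nil {
			acks++
			if acks >= quorum {
				return nil // quorum reached; slower replicas finish in the background
			}
		}
	}
	return errors.New("failed to achieve write quorum")
}
```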
Query fan-out:
FUNCTION query(selectors, time_range):
# For each matching series, find primary node
# (In practice, broadcast to all nodes and merge)
results = []
FOR EACH node IN ring.nodes:
IF node.is_healthy():
partial = node.query(selectors, time_range)
results.extend(partial)
# Merge and deduplicate
RETURN merge_sorted_by_time(deduplicate(results))
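The same fan-out step as a Go sketch: query healthy nodes concurrently, tolerate individual node failures, then sort and deduplicate the partial results. Node, Sample, and the selector string are assumed types for illustration, and the k-way merge is simplified to a sort.

```go
package query

import (
	"context"
	"sort"
	"sync"
)

type Sample struct {
	SeriesKey string
	Timestamp int64
	Value     float64
}

// Node is a placeholder for a cluster member's query RPC.
type Node interface {
	Healthy() bool
	Query(ctx context.Context, selector string, start, end int64) ([]Sample, error)
}

// FanOut queries every healthy node and merges the partial results.
// Failed nodes are skipped; replicas may return duplicate samples,
// which are dropped after sorting.
func FanOut(ctx context.Context, nodes []Node, selector string, start, end int64) []Sample {
	var mu sync.Mutex
	var merged []Sample

	var wg sync.WaitGroup
	for _, n := range nodes {
		if !n.Healthy() {
			continue
		}
		wg.Add(1)
		go func(n Node) {
			defer wg.Done()
			partial, err := n.Query(ctx, selector, start, end)
			if err != nil {
				return // tolerate a failed node; its replicas cover the data
			}
			mu.Lock()
			merged = append(merged, partial...)
			mu.Unlock()
		}(n)
	}
	wg.Wait()

	// Sort by (series, timestamp) and drop exact duplicates from replicas.
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].SeriesKey != merged[j].SeriesKey {
			return merged[i].SeriesKey < merged[j].SeriesKey
		}
		return merged[i].Timestamp < merged[j].Timestamp
	})
	var dedup []Sample
	for _, s := range merged {
		if len(dedup) == 0 || s != dedup[len(dedup)-1] {
			dedup = append(dedup, s)
		}
	}
	return dedup
}
```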
Failure detection and rebalancing:
STRUCT MembershipManager:
ring: HashRing
heartbeat_interval: Duration = 1s
failure_threshold: int = 3 # missed heartbeats
FUNCTION monitor_nodes():
LOOP:
FOR EACH node IN ring.nodes:
IF time_since_last_heartbeat(node) > failure_threshold * heartbeat_interval:
mark_node_failed(node)
trigger_rebalance(node)
sleep(heartbeat_interval)
FUNCTION trigger_rebalance(failed_node):
# Find series that were on failed node
affected_series = get_series_on_node(failed_node)
FOR EACH series IN affected_series:
# Find new replica location
new_nodes = ring.get_nodes_for_series(series, excluding=failed_node)
# Copy data from existing replica to new location
source = find_healthy_replica(series)
target = new_nodes.last() # New replica
copy_series_data(source, target, series)
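A minimal heartbeat monitor in Go along the lines of the pseudocode above, assuming each node records a last-seen time and that a rebalance callback is triggered after failureThreshold missed intervals. The Monitor type and onFailure hook are illustrative.

```go
package membership

import (
	"sync"
	"time"
)

type Monitor struct {
	mu               sync.Mutex
	lastSeen         map[string]time.Time // node ID -> last heartbeat
	failed           map[string]bool
	interval         time.Duration // heartbeat interval, e.g. 1s
	failureThreshold int           // missed heartbeats before declaring failure
	onFailure        func(nodeID string)
}

func NewMonitor(interval time.Duration, threshold int, onFailure func(string)) *Monitor {
	return &Monitor{
		lastSeen:         make(map[string]time.Time),
		failed:           make(map[string]bool),
		interval:         interval,
		failureThreshold: threshold,
		onFailure:        onFailure,
	}
}

// Heartbeat is called whenever a node checks in.
func (m *Monitor) Heartbeat(nodeID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.lastSeen[nodeID] = time.Now()
	m.failed[nodeID] = false
}

// Run checks all nodes once per interval and invokes onFailure for any
// node that has been silent too long. It blocks until stop is closed.
func (m *Monitor) Run(stop <-chan struct{}) {
	ticker := time.NewTicker(m.interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			m.mu.Lock()
			deadline := time.Duration(m.failureThreshold) * m.interval
			for id, seen := range m.lastSeen {
				if !m.failed[id] && time.Since(seen) > deadline {
					m.failed[id] = true
					go m.onFailure(id) // e.g. trigger_rebalance(node)
				}
			}
			m.mu.Unlock()
		}
	}
}
```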
Critical insight: The fundamental trade-off in distributed TSDBs is consistency vs. availability. Most choose eventual consistency for writes (all replicas will converge) and read-your-writes consistency where possible. Queries may return slightly stale data during network partitions.
Learning milestones
- Writes distribute across nodes → You understand consistent hashing
- Reads work with one node down → You understand replication
- Cluster rebalances on node failure → You understand failure recovery
- Query performance scales with nodes → You’ve built a distributed system
Project 14: Real-Time Alerting Engine
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Python, Java
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: Stream Processing / Alerting
- Software or Tool: Alerting Engine
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
An alerting engine that evaluates rules against incoming time series data and fires alerts when thresholds are breached.
Why it teaches Time Series Databases
Alerting is one of the most important workloads a TSDB serves. You’ll understand continuous evaluation: how to efficiently check thousands of rules against millions of incoming data points without overwhelming the system.
Core challenges you’ll face
- Rule evaluation (periodically query and check thresholds) → maps to scheduled execution
- Alert state machine (pending → firing → resolved) → maps to state management
- Deduplication (don’t spam on every evaluation) → maps to event processing
- Grouping and routing (aggregate related alerts) → maps to alert management
- For-duration (alert only if condition persists for X minutes) → maps to temporal logic
Key Concepts
- Alerting Rules: Prometheus Alerting Rules
- Alert State Machine: Prometheus Alertmanager
- Stream Processing: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
- Event Processing: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
Difficulty
Advanced
Time estimate
2-3 weeks
Prerequisites
Query engine (Project 10), basic understanding of state machines
Real world outcome
$ ./alerting-engine --config=alerts.yaml
Loading 150 alerting rules...
Starting evaluation loop (interval: 15s)
$ cat alerts.yaml
groups:
  - name: system
    rules:
      - alert: HighCPU
        expr: avg(cpu_usage) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
[Evaluation at 14:00:00]
Rule "HighCPU": avg(cpu_usage) = 0.72 (below threshold)
State: inactive
[Evaluation at 14:00:15]
Rule "HighCPU": avg(cpu_usage) = 0.85 (above threshold!)
State: pending (0s of 5m elapsed)
[Evaluation at 14:05:15]
Rule "HighCPU": avg(cpu_usage) = 0.87 (above threshold)
State: firing (above threshold for 5m0s >= 5m)
ALERT FIRED: HighCPU
Labels: {severity=warning, alertname=HighCPU}
Annotations: {summary="High CPU usage detected"}
Firing since: 2024-01-15 14:05:15 (active since 14:00:15)
[Sending to notification channels...]
Slack #alerts: ⚠️ HighCPU - High CPU usage detected
PagerDuty: Created incident PD-12345
Implementation Hints
Alert rule structure:
STRUCT AlertRule:
name: string
expression: PromQLExpr # Parsed query
for_duration: Duration # How long before firing
labels: Map<string, string>
annotations: Map<string, string>
STRUCT AlertState:
rule: AlertRule
state: INACTIVE | PENDING | FIRING
active_since: Timestamp # When became pending/firing
last_evaluation: Timestamp
last_sample_value: float64
firing_alerts: Map<LabelSet, FiringAlert>
STRUCT FiringAlert:
labels: LabelSet
annotations: Map<string, string>
started_at: Timestamp
resolved_at: Optional<Timestamp>
Evaluation loop:
FUNCTION evaluation_loop(rules, interval):
LOOP:
eval_time = now()
FOR EACH rule IN rules:
evaluate_rule(rule, eval_time)
sleep_until(eval_time + interval)
FUNCTION evaluate_rule(rule, eval_time):
# Execute the rule expression (e.g. avg(cpu_usage) > 0.8)
result = query_engine.eval(rule.expression, eval_time)
# PromQL comparison operators filter: a series appears in the result
# only while its condition holds
FOR EACH sample IN result:
handle_active(rule, sample.labels, eval_time)
# Series that were active on a previous evaluation but are absent now have recovered
# (previously_active is whatever bookkeeping tracks series seen last time)
FOR EACH labels IN previously_active(rule) NOT IN result:
handle_inactive(rule, labels, eval_time)
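A short Go sketch of the scheduler itself, assuming an evaluateAll callback that stands in for looping over evaluate_rule; time.Ticker keeps each cycle aligned to the interval rather than drifting by the evaluation time.

```go
package alerting

import (
	"context"
	"time"
)

// RunEvaluationLoop invokes evaluateAll once per interval until the context
// is cancelled. evaluateAll stands in for evaluating every loaded rule.
func RunEvaluationLoop(ctx context.Context, interval time.Duration, evaluateAll func(at time.Time)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case t := <-ticker.C:
			evaluateAll(t) // all rules share one evaluation timestamp
		}
	}
}
```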
State machine:
FUNCTION handle_active(rule, labels, eval_time):
state = get_or_create_state(rule, labels)
MATCH state.state:
CASE INACTIVE:
state.state = PENDING
state.active_since = eval_time
LOG("Alert {rule.name} now PENDING")
CASE PENDING:
duration = eval_time - state.active_since
IF duration >= rule.for_duration:
state.state = FIRING
fire_alert(rule, labels, eval_time)
LOG("Alert {rule.name} now FIRING")
CASE FIRING:
# Already firing, no action needed
pass
FUNCTION handle_inactive(rule, labels, eval_time):
state = get_state(rule, labels)
IF state AND state.state IN [PENDING, FIRING]:
IF state.state == FIRING:
resolve_alert(rule, labels, eval_time)
state.state = INACTIVE
LOG("Alert {rule.name} now INACTIVE")
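The same state machine as a Go sketch, keyed by a label fingerprint and using time.Duration for the for-clause. Rule, Evaluator, and the notify callback are names chosen for this sketch, not from Prometheus itself.

```go
package alerting

import "time"

type State int

const (
	Inactive State = iota
	Pending
	Firing
)

type Rule struct {
	Name string
	For  time.Duration // how long the condition must hold before firing
}

type seriesState struct {
	state       State
	activeSince time.Time
}

// Evaluator tracks per-series alert state for a single rule.
type Evaluator struct {
	rule   Rule
	states map[string]*seriesState // label fingerprint -> state
	notify func(rule Rule, fingerprint string, firing bool)
}

func NewEvaluator(rule Rule, notify func(Rule, string, bool)) *Evaluator {
	return &Evaluator{rule: rule, states: make(map[string]*seriesState), notify: notify}
}

// Observe is called once per evaluation for each series, with active=true
// when the rule expression returned a sample for that series.
func (e *Evaluator) Observe(fingerprint string, active bool, now time.Time) {
	s, ok := e.states[fingerprint]
	if !ok {
		s = &seriesState{state: Inactive}
		e.states[fingerprint] = s
	}

	switch {
	case active && s.state == Inactive:
		s.state, s.activeSince = Pending, now // start the for-duration clock
	case active && s.state == Pending:
		if now.Sub(s.activeSince) >= e.rule.For {
			s.state = Firing
			e.notify(e.rule, fingerprint, true) // fire_alert
		}
	case !active && s.state != Inactive:
		if s.state == Firing {
			e.notify(e.rule, fingerprint, false) // resolve_alert
		}
		s.state = Inactive
	}
}
```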
Alert grouping (simplified):
STRUCT AlertGroup:
labels: LabelSet # Common labels (e.g., {severity=critical})
alerts: List<FiringAlert>
last_notification: Timestamp
FUNCTION group_alerts(firing_alerts, group_by_labels):
groups = HashMap<LabelSet, AlertGroup>()
FOR EACH alert IN firing_alerts:
group_key = extract_labels(alert.labels, group_by_labels)
IF group_key NOT IN groups:
groups[group_key] = AlertGroup(group_key)
groups[group_key].alerts.append(alert)
RETURN groups
FUNCTION send_notifications(groups):
FOR EACH group IN groups:
IF should_send(group): # Rate limiting, dedup
notification = render_notification(group)
send_to_channels(notification)
group.last_notification = now()
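A small Go helper showing how a stable group key might be derived from the group-by labels, so that alerts sharing the key land in one notification; GroupKey and the groupBy parameter are illustrative.

```go
package alerting

import (
	"sort"
	"strings"
)

// GroupKey builds a stable key from the subset of labels the route groups by,
// e.g. groupBy = []string{"alertname", "severity"}. Alerts with the same key
// are batched into a single notification.
func GroupKey(labels map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		if v, ok := labels[name]; ok {
			parts = append(parts, name+"="+v)
		}
	}
	sort.Strings(parts) // order-independent key
	return strings.Join(parts, ",")
}
```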
Critical insight: The for-duration is essential to prevent flapping. Without it, a metric oscillating around the threshold would generate constant alert/resolve cycles. The pending state acts as a debounce.
Learning milestones
- Simple threshold alerts work → You understand rule evaluation
- For-duration prevents flapping → You understand state machines
- Grouping reduces noise → You understand alert management
- Notifications integrate with external systems → You’ve built a production tool
Project 15: Full-Featured TSDB from Scratch
- File: LEARNING_TIME_SERIES_DATABASES.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, C++
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The “Industry Disruptor” (VC-Backable Platform)
- Difficulty: Level 5: Master (The First-Principles Wizard)
- Knowledge Area: Complete TSDB Implementation
- Software or Tool: Full TSDB
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build
A complete, production-quality time series database combining all previous projects: storage engine, compression, indexing, query language, ingestion, and clustering.
Why it teaches Time Series Databases
This is the capstone project. You’ll integrate everything you’ve learned into a cohesive system, facing the real challenges of making components work together: buffer management, resource contention, query planning, and operational concerns.
Core challenges you’ll face
- Component integration (storage + index + query + ingestion) → maps to system design
- Resource management (memory limits, file handles, connections) → maps to production engineering
- Query planning (use indexes, rollups, appropriate storage tier) → maps to optimizer design
- Operational API (compaction, retention, cluster management) → maps to admin interface
- Observability (metrics about the database itself) → maps to meta-monitoring
Key Concepts
- System Integration: “Designing Data-Intensive Applications” - Martin Kleppmann (entire book)
- Production Engineering: “Release It!” - Michael Nygard
- Query Optimization: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Observability: “Designing Data-Intensive Applications” Chapter 11 - Martin Kleppmann
Difficulty
Master
Time estimate
3-6 months
Prerequisites
All previous projects completed and understood
Real world outcome
$ ./mytsdb start
MyTSDB v1.0 starting...
Storage: /var/lib/mytsdb
WAL: initialized (256MB limit)
Cache: 1GB
Compaction: background threads started
Query engine: PromQL compatible
HTTP API: listening on :9090
gRPC API: listening on :9091
$ curl -X POST localhost:9090/write -d '
cpu_usage{host="server1",region="us-east"} 0.75 1703001600
cpu_usage{host="server1",region="us-east"} 0.82 1703001610
memory_usage{host="server1"} 0.45 1703001600
'
{"status":"success","samples_written":3}
$ curl 'localhost:9090/query?query=rate(cpu_usage[5m])&time=1703001800'
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{"metric": {"host": "server1", "region": "us-east"}, "value": [1703001800, "0.023"]}
]
}
}
$ curl localhost:9090/admin/status
{
"status": "healthy",
"storage": {
"total_series": 125432,
"total_samples": 45678901,
"disk_usage_bytes": 1234567890,
"compression_ratio": 12.5
},
"ingestion": {
"samples_per_second": 125000,
"active_series": 45000
},
"query": {
"queries_per_second": 150,
"avg_latency_ms": 12.5
}
}
Implementation Hints
System architecture:
┌────────────────────────────────────────────────────────────────┐
│ HTTP/gRPC API │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Ingestion │ │ Query │ │ Admin │ │
│ │ Pipeline │ │ Engine │ │ API │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │ │
│ ┌──────▼─────────────────▼─────────────────────▼───────────┐ │
│ │ Storage Engine │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────────┐ │ │
│ │ │ Head │ │ Blocks │ │ Index │ │ Compaction │ │ │
│ │ │ Block │ │ (TSM) │ │ │ │ Manager │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └──────┬──────┘ │ │
│ │ │ │ │ │ │ │
│ │ ┌────▼────────────▼────────────▼──────────────▼───────┐ │ │
│ │ │ WAL │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Resource Manager │ │
│ │ Memory limits, file handles, connection pools │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
Component interfaces:
TRAIT StorageEngine:
FUNCTION append(series_ref, timestamp, value) → Result
FUNCTION query(matchers, start, end) → Iterator<Sample>
FUNCTION compact() → Result
FUNCTION snapshot() → Snapshot
TRAIT QueryEngine:
FUNCTION eval(expr, time) → Result<Value>
FUNCTION eval_range(expr, start, end, step) → Result<Matrix>
TRAIT IngestionPipeline:
FUNCTION ingest(samples) → Result<WriteResponse>
FUNCTION flush() → Result
TRAIT AdminAPI:
FUNCTION status() → SystemStatus
FUNCTION compact_now() → Result
FUNCTION snapshot() → Result
FUNCTION reload_config() → Result
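The trait-style interfaces above map naturally onto Go interfaces. A sketch under the assumption of simple Sample/Matcher types; all names here are illustrative rather than a fixed API.

```go
package tsdb

import "time"

// Sample is one timestamped value of a series.
type Sample struct {
	Timestamp int64
	Value     float64
}

// Matcher selects series by label, e.g. host="server1" or host=~"server.*".
type Matcher struct {
	Label, Value string
	IsRegex      bool
}

// Value is a query result (instant vector, range matrix, or scalar);
// the concrete result types are omitted in this sketch.
type Value interface{}

type SeriesIterator interface {
	Next() bool
	At() (seriesRef uint64, s Sample)
	Err() error
}

type StorageEngine interface {
	Append(seriesRef uint64, ts int64, value float64) error
	Query(matchers []Matcher, start, end int64) (SeriesIterator, error)
	Compact() error
	Snapshot(dir string) error
}

type QueryEngine interface {
	Eval(expr string, at time.Time) (Value, error)
	EvalRange(expr string, start, end time.Time, step time.Duration) (Value, error)
}

type IngestionPipeline interface {
	Ingest(samples []Sample) error
	Flush() error
}

type AdminAPI interface {
	Status() (SystemStatus, error)
	CompactNow() error
	Snapshot() error
	ReloadConfig() error
}

// SystemStatus mirrors the /admin/status payload shown earlier.
type SystemStatus struct {
	TotalSeries      uint64
	TotalSamples     uint64
	DiskUsageBytes   uint64
	CompressionRatio float64
}
```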
Resource management:
STRUCT ResourceManager:
memory_limit: Bytes
current_memory: AtomicBytes
file_handle_limit: int
open_files: AtomicInt
FUNCTION allocate_memory(size):
new_total = current_memory.fetch_add(size) + size # fetch_add returns the value before the addition
IF new_total > memory_limit:
current_memory.fetch_sub(size)
trigger_memory_pressure()
RETURN ERROR("memory limit exceeded")
RETURN OK
FUNCTION trigger_memory_pressure():
# Flush buffers, drop caches, etc.
head_block.flush_oldest_series()
query_cache.clear()
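A small Go version of the memory accounting above using sync/atomic (atomic.Int64 returns the updated value from Add, which sidesteps the fetch_add ambiguity in the pseudocode). The onPressure callback is a stand-in for flushing head-block buffers and clearing query caches.

```go
package resource

import (
	"errors"
	"sync/atomic"
)

type Manager struct {
	memoryLimit int64
	current     atomic.Int64
	onPressure  func() // e.g. flush oldest head series, clear query caches
}

func NewManager(limitBytes int64, onPressure func()) *Manager {
	return &Manager{memoryLimit: limitBytes, onPressure: onPressure}
}

// Allocate reserves size bytes of budget, or fails and signals memory pressure.
func (m *Manager) Allocate(size int64) error {
	newTotal := m.current.Add(size) // Add returns the updated value
	if newTotal > m.memoryLimit {
		m.current.Add(-size) // roll back the reservation
		if m.onPressure != nil {
			m.onPressure()
		}
		return errors.New("memory limit exceeded")
	}
	return nil
}

// Release returns previously allocated budget.
func (m *Manager) Release(size int64) {
	m.current.Add(-size)
}

// InUse reports currently accounted memory, useful for self-monitoring metrics.
func (m *Manager) InUse() int64 {
	return m.current.Load()
}
```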
Startup sequence:
FUNCTION start():
# 1. Initialize resource manager
resource_manager = ResourceManager(config.memory_limit, ...)
# 2. Open/recover WAL
wal = WAL.open_or_create(config.wal_dir)
# 3. Initialize storage engine
storage = StorageEngine(config.data_dir, wal, resource_manager)
storage.recover() # Replay WAL, load block metadata
# 4. Initialize index
index = InvertedIndex.load(config.data_dir)
# 5. Start background tasks
spawn(compaction_loop, storage)
spawn(retention_loop, storage, config.retention)
spawn(wal_checkpoint_loop, wal, storage)
# 6. Initialize query engine
query_engine = QueryEngine(storage, index)
# 7. Initialize ingestion pipeline
ingestion = IngestionPipeline(storage, index, config.batch_size)
# 8. Start API servers
start_http_server(ingestion, query_engine, admin_api)
start_grpc_server(ingestion, query_engine)
Self-monitoring (meta-metrics):
FUNCTION collect_internal_metrics():
RETURN [
Metric("mytsdb_storage_series_total", storage.series_count()),
Metric("mytsdb_storage_samples_total", storage.sample_count()),
Metric("mytsdb_storage_bytes", storage.disk_usage()),
Metric("mytsdb_ingestion_samples_per_second", ingestion.rate()),
Metric("mytsdb_query_latency_seconds", query_engine.latency_histogram()),
Metric("mytsdb_memory_usage_bytes", resource_manager.current_memory),
Metric("mytsdb_compaction_duration_seconds", compaction.duration_histogram()),
Metric("mytsdb_wal_bytes", wal.size()),
]
Critical insight: Building a full TSDB is an exercise in trade-offs. Every decision (block size, flush interval, compaction strategy) affects multiple dimensions: write throughput, query latency, disk usage, memory usage, and durability. The art is finding the right balance for your use case.
Learning milestones
- All components integrate → You understand system design
- Benchmarks match expectations → You understand performance
- Failure scenarios are handled → You understand production engineering
- Self-monitoring works → You’ve built a production-quality system
- Documentation is complete → You can explain your design decisions
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. In-Memory Store | Beginner-Intermediate | 1 week | ⭐⭐ | ⭐⭐⭐ |
| 2. Delta Encoding | Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐ |
| 3. Gorilla Compression | Advanced | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 4. WAL Implementation | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 5. Inverted Index | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 6. Block Storage (Prometheus) | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 7. TSM Tree (InfluxDB) | Expert | 4-5 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 8. Time-Based Sharding | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 9. Downsampling Engine | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 10. PromQL Query Engine | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 11. High-Throughput Ingestion | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 12. Cardinality Management | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 13. Distributed Store | Master | 6-8 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 14. Alerting Engine | Advanced | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 15. Full TSDB | Master | 3-6 months | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Recommendation
If you’re just starting out:
Start with Project 1 (In-Memory Store) and Project 2 (Delta Encoding). These give you the foundation without overwhelming complexity. You’ll understand the core data model and why compression is essential.
If you want to understand storage engines deeply:
Do Projects 4 (WAL), 6 (Block Storage), and 7 (TSM Tree). These are the heart of any TSDB. Understanding both Prometheus-style and InfluxDB-style approaches gives you perspective on the design space.
If you’re interested in query languages:
Focus on Project 10 (PromQL Engine). This is a self-contained compiler/interpreter project that teaches you how time series queries actually work.
If you want the “wow factor”:
Project 3 (Gorilla Compression) and Project 13 (Distributed Store) are the showstoppers. Implementing Facebook’s compression paper is impressive, and building a distributed system is always a challenge.
If you have 3-6 months:
Go for Project 15 (Full TSDB). This is the ultimate learning experience—you’ll face every challenge in TSDB design and come out understanding the field deeply.
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | In-Memory Time Series Store | C |
| 2 | Delta Encoding Compressor | C |
| 3 | Gorilla Compression (Facebook Paper) | C |
| 4 | Write-Ahead Log (WAL) for Time Series | C |
| 5 | Inverted Index for Time Series Tags | C |
| 6 | Block-Based Storage Engine (Prometheus-Style) | Go |
| 7 | TSM Tree Storage Engine (InfluxDB-Style) | Go |
| 8 | Time-Based Sharding System | Go |
| 9 | Downsampling and Rollup Engine | Go |
| 10 | PromQL-Style Query Engine | Go |
| 11 | High-Throughput Ingestion Pipeline | Rust |
| 12 | Cardinality Management System | Go |
| 13 | Distributed Time Series Store | Go |
| 14 | Real-Time Alerting Engine | Go |
| 15 | Full-Featured TSDB from Scratch | Rust |
Sources
Time Series Database Architecture
- InfluxData - Time Series Database Explained
- ClickHouse - What Is a Time-Series Database?
- Alibaba Cloud - Key Concepts and Features of TSDBs
- Grafana Labs - Mimir 3.0 Architecture
Storage Engines
- InfluxDB Storage Engine Documentation
- InfluxDB TSM Tree
- Prometheus Storage Documentation
- Prometheus TSDB Explained
- Prometheus TSDB Internals
Compression
- Gorilla Paper (VLDB 2015)
- The Morning Paper - Gorilla
- Time Series Compression Algorithms Explained
- Delta Encoding Tutorial
LSM Trees
- FreeCodeCamp - Build an LSM Tree from Scratch
- MiniLSM Tutorial (Rust)
- Alibaba - LSM Compaction Mechanism
High Cardinality
- TimescaleDB - What is High Cardinality?
- TigerData - How Databases Handle High Cardinality
- InfluxData - Time Series Data and Cardinality