Project 1: In-Memory Cache (Mini-Redis LRU)
Build a TCP cache server with O(1) GET/SET, strict LRU eviction, and latency-safe resizing.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 15-25 hours |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Zig |
| Coolness Level | Level 4 |
| Business Potential | Infrastructure component |
| Prerequisites | C pointers, sockets, hash tables, linked lists |
| Key Topics | LRU cache, hash tables, intrusive lists, TCP IO |
1. Learning Objectives
By completing this project, you will:
- Implement a production-style LRU cache with O(1) lookup and O(1) eviction.
- Build a TCP request/response protocol with robust parsing and error handling.
- Design a memory accounting system that enforces strict cache size limits.
- Implement incremental hash table resizing to avoid latency spikes.
- Measure cache performance using hit/miss/eviction counters and tail latency.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement the cache correctly.
Hash Tables with Chaining and Load Factor Control
Fundamentals
A hash table is the primary lookup mechanism for the cache and is what gives you O(1) average GET and SET operations. The table is built from an array of buckets, and each key is mapped to a bucket index by a hash function. In chaining, each bucket stores a linked list of entries, which absorbs collisions without requiring probing. The key idea is that performance depends on the load factor, which is the ratio of stored entries to buckets. If the load factor is too high, chains get long and you lose O(1) behavior. If it is too low, you waste memory. A good system chooses a target load factor and resizes the table when it grows beyond that threshold. This is critical for caches because they are memory-bound and latency-sensitive. You will implement hashing, equality checks, and bucket management, and you will need to reason about how collisions, load factor, and resizing interact under real workloads.
Deep Dive into the concept
A systems-quality hash table is a contract between your access pattern and your memory layout. In this project the table serves two key goals: fast point lookups and stable latency under load. Chaining is attractive because it avoids the cluster effects of open addressing and allows entries to be removed without shifting neighbors. However, the cost is pointer chasing. Each chain node usually allocates a separate entry, which adds both memory overhead and cache misses. This is where bucket sizing and load factor policy matter. A moderate load factor, such as 0.75, balances memory usage with lookup cost, but the optimal value depends on the average key size, entry metadata, and the size of your LRU nodes. A high load factor means more cache misses on lookup because each collision requires another pointer dereference. In a cache server, that translates to tail latency spikes. The load factor is also intertwined with resizing strategy. If you double the bucket array and rehash every entry in one step, you create a large latency spike proportional to the number of entries, which is unacceptable for an interactive service. This is why incremental rehashing is required (see its own concept section).
Hash functions are another systems detail. You must use a stable, fast hash on byte sequences. In C, you must be careful about integer overflow, sign extension, and endianness. A good hash mixes input bits so that similar keys do not map to nearby buckets. Simple XOR or additive hashes are fast but weak and can lead to hot buckets. A typical choice is FNV-1a or a small variant of Murmur-style mixing. In addition, the hash table stores keys as byte arrays, not null-terminated strings, because you must accept arbitrary binary keys in a real cache. This means your entry must store the key length and compare both length and bytes on lookup. The bucket array should be an array of pointers to entry nodes. When you insert, you compute the hash, map to a bucket, and prepend to the chain for O(1) insert. When you delete, you must update the chain links carefully, especially when you also remove the entry from the LRU list. For lookups, you traverse the chain and compare hash (optional) and key bytes. A good optimization is to store the hash in the entry so you can compare hash first, then key bytes only on a match.
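The paragraph above names FNV-1a as a typical choice. A minimal sketch of the 64-bit variant over an arbitrary byte sequence, using the published FNV offset basis and prime (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* FNV-1a over an arbitrary byte sequence (64-bit variant). */
static uint64_t fnv1a_hash(const void *key, size_t len) {
    const unsigned char *p = key;
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                          /* xor before multiply: the "1a" order */
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}
```

Because the multiply-by-odd-prime step is injective modulo 2^64, keys that differ in their final byte are guaranteed to hash differently, which helps with the similar-prefix problem described above.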
Another subtlety is memory accounting. Each entry uses space for the key, value, entry metadata, and LRU pointers. You must include all of that in your cache size accounting, otherwise eviction will be imprecise and you may exceed your memory budget. The hash table does not track total memory by itself; your cache layer must compute entry size and update a global counter. This counter is used to trigger eviction and resizing decisions. If you allow variable-length values, the memory usage per entry can be large and the load factor may need to be lower to reduce collision chains and improve latency. If you use fixed-size values, a higher load factor might be acceptable. You must decide and document this tradeoff.
Finally, consider failure modes. If the hash table is near capacity and a resize is triggered, you must not lose entries. If memory allocation fails during a resize or insertion, you must return an error and keep the cache consistent. The simplest approach is to allocate new entries before inserting and only update the table once allocation succeeds. This design consideration is part of systems thinking: failures happen, and the data structure must remain correct when they do. The hash table is also a shared data structure; if you later add concurrency, you will need to protect buckets or use a global lock. This project can remain single-threaded, but the layout should not preclude future locking.
How this fits into the projects
You use the hash table as the primary index for GET, SET, and DEL. It appears directly in §3.2 Functional Requirements, §4.2 Key Components, §4.4 Data Structures, and §5.10 Phase 2. It is also used in P02-log-structured-kv-store.md for the memtable index and in P04-memory-allocator-debugging.md for leak tracking.
Definitions & key terms
- load factor -> ratio of entries to buckets that predicts average chain length
- bucket -> array slot that stores a chain of entries
- chaining -> collision handling by linked lists per bucket
- rehash -> recompute bucket indices when table size changes
- hash function -> deterministic mapping of bytes to integer hash
Mental model diagram
Keys --> hash() --> bucket index
                         |
                         v
                     [bucket]
                         |-> entry -> entry -> entry
How it works (step-by-step, with invariants and failure modes)
- Compute the hash for the key and map it to a bucket index with `hash % bucket_count`.
- Traverse the bucket chain and compare key length and bytes.
- If found on GET, return value and move entry to LRU head (see LRU concept).
- On SET, if key exists update value and size accounting; else allocate new entry.
- Insert entry at head of bucket chain and LRU head.
- Update cache size and check memory limit; evict from LRU tail if needed.
- If load factor exceeds threshold, start incremental rehash.
Invariants: each key appears in exactly one bucket and exactly one LRU list node; size accounting equals sum of entry sizes; bucket chains are well-formed. Failure modes: allocation failure, inconsistent accounting, or stale pointers after eviction.
Minimal concrete example
// Insert with chaining (sketch: entry_t stores key bytes inline)
uint64_t h = hash(key, key_len);
size_t idx = h % table->bucket_count;
entry_t *e = malloc(sizeof(*e) + key_len);
if (!e)
    return -1;               // fail the SET, leave the table untouched
e->hash = h;                 // stored hash enables cheap compares
e->key_len = key_len;
memcpy(e->key, key, key_len);
e->next = table->buckets[idx];
table->buckets[idx] = e;
Common misconceptions
- “Hash tables are always O(1)” -> only if load factor stays bounded.
- “Resizing is cheap” -> full rehashing causes latency spikes.
- “String keys are fine” -> real caches must support binary keys.
Check-your-understanding questions
- Why does a high load factor increase tail latency even if average O(1) holds?
- How does storing the hash in each entry speed up comparisons?
- What breaks if you forget to update size accounting on SET overwrite?
- Predict the impact of using a poor hash on keys with similar prefixes.
Check-your-understanding answers
- Long chains cause more pointer chasing and cache misses, which increase tail latency.
- You can compare hash first and avoid expensive byte-by-byte comparisons when it mismatches.
- The cache will either over-evict or exceed the memory limit because the size counter is wrong.
- Many keys land in the same bucket, causing hot spots and degraded performance.
Real-world applications
- In-memory caches like Redis and Memcached.
- Session stores for web servers.
- Routing tables and symbol tables in compilers.
Where you’ll apply it
- This project: §3.2, §4.2, §4.4, §5.10 Phase 2, §6.2 Critical Test Cases.
- Also used in: P02-log-structured-kv-store.md, P04-memory-allocator-debugging.md.
References
- Algorithms, Fourth Edition (Sedgewick/Wayne), Ch. 3.4
- Computer Systems: A Programmer’s Perspective (Bryant/O’Hallaron), Ch. 6
- The Linux Programming Interface (Kerrisk), Ch. 7
Key insights
A hash table only delivers O(1) in practice if load factor and memory layout are actively controlled.
Summary
Hash tables are the indexing backbone of the cache. Chaining makes insertion and deletion simple, but you must manage load factor and memory accounting to keep latency stable. The design should assume resizing, allocation failures, and binary keys.
Homework/Exercises to practice the concept
- Implement a small chaining hash table that stores string keys and measure lookup time as load factor grows from 0.5 to 2.0.
- Modify it to store key length and allow binary keys with embedded null bytes.
- Add a stored hash in each entry and measure comparison count improvements.
Solutions to the homework/exercises
- Measure average lookup steps per key; you should observe longer chains after 1.0 load factor.
- Replace `strcmp` with byte comparisons using the stored length and `memcmp`.
- Compare the stored hash first and skip `memcmp` when the hashes differ; average comparisons drop significantly.
Intrusive Doubly Linked Lists for LRU Ordering
Fundamentals
An intrusive doubly linked list stores its prev and next pointers inside each cache entry rather than in separate list nodes. This is common in systems programming because it avoids extra allocations and keeps list operations O(1). For an LRU cache, the list represents access order: the head is the most recently used entry and the tail is the least recently used. When you read or update a key, you move its node to the head. When you need to evict, you remove from the tail. The list must be updated on every access, which makes correctness critical. Intrusive lists also make pointer ownership explicit: the cache entry owns its own list links, and the list owns no memory beyond the entry. This avoids double-free errors and makes the list operations deterministic. Understanding how to splice nodes in and out safely is essential to implementing a correct LRU cache.
Deep Dive into the concept
LRU ordering is a classic example of a data structure serving a performance contract. The LRU list must update on every read, which might feel expensive, but it is O(1) and fast in practice because it touches only a small number of pointers. The key challenge is correctness under all operations: insert, update, delete, eviction, and resize. Because you use an intrusive list, each entry contains lru_prev and lru_next. That means any change to the list structure directly mutates the entry. This is powerful but also dangerous: a single mistaken pointer update can corrupt the list and cause infinite loops or crashes. To avoid this, the list should be built around a sentinel node, often a dummy head that points to itself when the list is empty. This simplifies edge cases because insertion and removal always work the same way regardless of whether the list is empty or has one element.
When a GET occurs, you remove the node from its current position and move it to the head. This requires updating its neighbors’ pointers as well as the head’s pointers. The list operations should be functions: lru_remove(node) and lru_insert_head(node). These functions must be safe even if a node is already at the head or not in the list (which should not happen, but defensive checks help debugging). For eviction, you remove the tail node, which is typically head->prev if you use a circular sentinel list. The tail entry is then removed from the hash table, its memory is freed, and the cache size counter is decremented.
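The `lru_remove` and `lru_insert_head` operations described above can be sketched as follows, assuming a circular sentinel and an embedded link struct (names such as `lru_touch` are illustrative):

```c
#include <assert.h>

typedef struct lru_node {
    struct lru_node *prev, *next;
} lru_node_t;

/* Sentinel-based circular list: an empty list is a head that points
 * to itself, so insert and remove never branch on emptiness. */
static void lru_init(lru_node_t *head) {
    head->prev = head->next = head;
}

static void lru_remove(lru_node_t *node) {
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

static void lru_insert_head(lru_node_t *head, lru_node_t *node) {
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Move-to-front on access: unlink, then re-insert at the head.
 * The eviction victim is always head->prev (the tail). */
static void lru_touch(lru_node_t *head, lru_node_t *node) {
    lru_remove(node);
    lru_insert_head(head, node);
}
```

With this layout, eviction is simply `lru_remove(head->prev)` followed by hash-table removal and freeing of the containing entry.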
An important systems aspect is memory locality. Because the LRU pointers live in the cache entry, touching the LRU list touches the entry data itself. This means an access that hits the cache will already have loaded the entry’s cache line, and updating the LRU pointers is typically in the same line or nearby. This is why intrusive lists can be faster than external list nodes. However, it also means that the list pointers increase entry size, which affects memory accounting. You must include the LRU pointers in the memory usage calculation. Another tradeoff is that the list order is a strict representation of access recency. This can be too sensitive for some workloads. For example, a scan of many keys can evict valuable hot keys. In real systems, LRU is often approximated or segmented, but for this project you will implement the exact variant to learn the mechanics.
Concurrency adds another dimension. A single-threaded server can update the list without locks. If you later add threading, you must protect list operations with a global lock or per-bucket locks. The list is global, so per-bucket locks alone are insufficient. Some systems use a segmented LRU to reduce contention. For this project, keep it single-threaded but ensure your list operations are modular and correct so that locking could be added later. Another subtlety is that an entry can be accessed during resizing. Because entries are rehashed but not reallocated, the LRU list should remain valid. This is another reason to avoid allocating separate list nodes, because it would create a second ownership path that complicates resizing and eviction.
LRU correctness is best tested with trace simulations. You should feed a sequence of GET/SET operations and verify the list order at each step. This also helps you reason about cache hit rate and eviction patterns. By the end, you should be able to explain why LRU produces a specific eviction for any access sequence. This is not just a theoretical exercise; it builds the mental model needed to reason about production caches and their performance impact.
How this fits into the projects
The intrusive list is the eviction policy used in §3.2 Functional Requirements and §3.7 Real World Outcome. You use it in §4.4 Data Structures and §5.10 Phase 2. It is also used in P06-capstone-simple-database-engine.md for the buffer pool LRU.
Definitions & key terms
- intrusive list -> list where nodes store their own pointers
- sentinel node -> dummy head that simplifies edge cases
- LRU -> least recently used eviction policy
- splice -> remove and insert a node without allocating
Mental model diagram
[HEAD] <-> [most recent] <-> [ ... ] <-> [least recent]
   ^                                            |
   +--------------------------------------------+
How it works (step-by-step, with invariants and failure modes)
- Initialize a sentinel head that points to itself.
- On insert, link node right after head.
- On GET, remove node from its current position and insert after head.
- On eviction, remove node before head (tail).
- When an entry is deleted, unlink it before freeing.
Invariants: list is circular; head always exists; each entry is linked at most once. Failure modes: double unlink, forgetting to unlink on delete, or corrupt pointers after eviction.
Minimal concrete example
// Insert node right after head
node->next = head->next;
node->prev = head;
head->next->prev = node;
head->next = node;
Common misconceptions
- “LRU needs extra nodes” -> intrusive lists avoid extra allocations.
- “LRU is just a queue” -> it is a linked list with move-to-front semantics.
- “Removing is O(n)” -> with intrusive lists, removal is O(1).
Check-your-understanding questions
- Why does a sentinel node simplify insert and remove operations?
- What happens if you move a node to head but forget to update its previous neighbor?
- Why does an intrusive list reduce memory allocations?
Check-your-understanding answers
- It removes empty-list and single-node edge cases by making every node have neighbors.
- The list becomes corrupted, leading to loops or crashes during traversal.
- The entry already owns the pointers, so you do not allocate separate list nodes.
Real-world applications
- Cache eviction in Redis and Memcached.
- Buffer pool eviction in databases.
- Page replacement in OS kernels.
Where you’ll apply it
- This project: §3.2, §3.7, §4.4, §5.10 Phase 2, §6.2.
- Also used in: P06-capstone-simple-database-engine.md.
References
- Operating Systems: Three Easy Pieces, Ch. 17
- Linux Kernel Development (Love), linked list chapter
- The Linux Programming Interface, memory management sections
Key insights
Intrusive lists make LRU updates constant time without extra allocations, but correctness depends on flawless pointer updates.
Summary
An intrusive doubly linked list gives you O(1) eviction ordering. It is compact and fast, but pointer errors are catastrophic. Use a sentinel node and test with access traces.
Homework/Exercises to practice the concept
- Implement a circular doubly linked list with a sentinel head and unit tests for insert/remove.
- Simulate an LRU cache for a given trace and output the list after each operation.
Solutions to the homework/exercises
- Use a fixed head node and ensure `head->next` and `head->prev` always exist.
- After each GET or SET, move the node to the head and print the list from head to tail.
Incremental Rehashing and Resize Control
Fundamentals
Resizing a hash table by rehashing all entries in a single operation can cause long pauses. In a cache server, this produces latency spikes and violates the performance contract. Incremental rehashing spreads the work across many operations. The system keeps two tables: the old table and a new, larger table. Each operation moves a small number of buckets from the old table to the new one. Lookups check both tables during migration. Once all buckets are moved, the old table is freed. This strategy makes resizing cost proportional to the number of operations rather than a single pause. It also keeps memory use bounded and predictable. For a cache, this is essential because traffic is continuous and latency-sensitive.
Deep Dive into the concept
Incremental rehashing is a form of amortized work scheduling. The key observation is that you do not have to move all entries at once to start benefiting from a larger table. Instead, you can perform a small fixed amount of rehashing per cache operation, such as moving one or two buckets. This transforms a large O(n) pause into many small O(1) steps that are interleaved with normal work. The design uses two arrays: table_old and table_new. When the load factor of the old table exceeds a threshold, you allocate the new table at a larger size and initialize a rehash_index starting at 0. Each cache operation does its normal work, then migrates one bucket at rehash_index from old to new. You move all entries in that bucket by recomputing their new bucket index using the new table size. After moving, you set the old bucket to empty and increment rehash_index. When rehash_index reaches the end, you free the old table and set table_new as the active table.
Reads and writes during migration require special logic. A lookup must check the new table first, then the old table for buckets that have not yet been migrated. A simple rule is: if rehashing is active, perform lookup in new table first, then in old table. Writes always go to the new table, so that new entries are placed directly in the larger table. Deletes must check both tables and remove from whichever table contains the entry. This means your cache operations need to be aware of rehashing state, but the logic can be cleanly isolated in the hash table abstraction. It is crucial to ensure that each entry exists in exactly one table at any time. If you move an entry from old to new, you must unlink it from the old bucket before inserting into the new bucket, otherwise you can create duplicates and break lookups.
One pitfall is that incremental rehashing doubles memory usage while both tables exist. In a memory-limited cache, this matters. You should account for this by allocating the new table cautiously and ensuring that entry memory remains constant (entries themselves are not duplicated). The bucket arrays are the primary extra cost. Another pitfall is that rehashing may never finish if the system is under constant heavy load and you move too few buckets per operation. To avoid this, make the migration work proportional to the operation cost, for example moving multiple buckets per write or during idle time. You can also use a timer or background thread, but for this project a fixed number of buckets per operation is adequate.
From a correctness standpoint, you must maintain invariants even when migration is interrupted. If the process crashes in the middle of rehashing, the in-memory cache will lose data regardless, but you should still ensure that your in-memory structures are consistent. If you later add persistence, you will need to consider how to serialize rehash state. For now, the key is to ensure that all operations remain correct while rehashing is active. Tests should include sequences where the load factor crosses the threshold and a mix of GET/SET/DEL operations occurs while migration is ongoing. Monitor tail latency to ensure the rehash work does not cause spikes.
Incremental rehashing is used in production caches (Redis uses it) and in many dynamic hash table implementations. The lesson is that algorithmic complexity must be balanced with operational latency. A theoretically correct resize that causes a single 100ms pause is not acceptable for low-latency systems. By spreading the work, you keep p99 stable even under growth. This is the systems approach to data structures: you are not just concerned with average complexity, but with distribution and tail behavior.
How this fits into the projects
Incremental rehashing is required in §3.2 Functional Requirements and is implemented in §5.10 Phase 2. It is also referenced in the Real World Outcome for stable latency. Related ideas appear in P02-log-structured-kv-store.md for compaction scheduling.
Definitions & key terms
- incremental rehash -> migrate buckets gradually during normal operations
- rehash index -> pointer to next bucket to move
- migration -> moving entries from old to new table
- amortized work -> spreading expensive tasks across many operations
Mental model diagram
Old table (shrinking)            New table (growing)
[0][1][2][3][4] ----move----> [0][1][2][3][4][5][6][7]
     ^
     rehash_index
How it works (step-by-step, with invariants and failure modes)
- Detect load factor exceeds threshold.
- Allocate new table with larger bucket count.
- Set `rehash_index = 0` and mark rehashing active.
- For each operation, move one bucket at `rehash_index`.
- Update lookups to check both tables.
- When `rehash_index` reaches the end, free the old table.
Invariants: each entry exists in exactly one table; rehash_index only increases; new writes go to new table. Failure modes: duplicates, lost entries, or use-after-free if old table freed early.
Minimal concrete example
// Move one bucket during rehash
entry_t *e = old->buckets[rehash_index];
old->buckets[rehash_index] = NULL;
while (e) {
entry_t *next = e->next;
size_t idx = e->hash % new->bucket_count;
e->next = new->buckets[idx];
new->buckets[idx] = e;
e = next;
}
rehash_index++;
Common misconceptions
- “Incremental rehash is optional” -> it is required for latency-sensitive servers.
- “Just move buckets on GET” -> you must move during writes too, or rehash can stall.
- “Entries are copied” -> entries are moved, not duplicated.
Check-your-understanding questions
- Why do lookups need to check both tables during rehash?
- What happens if new inserts are placed into the old table during rehash?
- How does incremental rehash affect memory usage?
Check-your-understanding answers
- Some entries are still in the old table until their buckets are moved.
- It breaks the invariant and can lose entries when the old table is freed.
- Bucket arrays are doubled while rehashing; entry memory stays the same.
Real-world applications
- Redis incremental rehashing for dictionary expansion.
- Network device route table resizing without pauses.
Where you’ll apply it
- This project: §3.2, §5.10 Phase 2, §7.3 Performance Traps.
- Also used in: P02-log-structured-kv-store.md for incremental compaction scheduling.
References
- Redis source code (dict.c) conceptual design
- The Linux Programming Interface, data structures chapter
- High Performance Server Architectures, resizing discussions
Key insights
Incremental rehashing trades complexity for predictable latency, which is essential for systems services.
Summary
Full-table rehash pauses are unacceptable in low-latency caches. Incremental rehashing spreads the work across operations while keeping lookups correct.
Homework/Exercises to practice the concept
- Implement a hash table that supports incremental rehash and test with a fixed operation sequence.
- Measure p99 latency with and without incremental rehash while inserting 1 million keys.
Solutions to the homework/exercises
- Keep two tables and a rehash index; always insert into the new table when rehashing.
- Without incremental rehash you should see large spikes; with it, p99 should remain stable.
TCP Request Framing and Partial Reads/Writes
Fundamentals
A TCP socket is a byte stream, not a message queue. This means your cache server must define a protocol that allows the client and server to agree on request boundaries. The simplest approach is a line-based protocol: each command ends with \n. This is easy to parse but requires you to handle partial reads and partial writes because read and write do not guarantee that a full request or response is transferred in a single call. You need a buffer that accumulates incoming bytes until a full line is present. Similarly, you must handle responses that are larger than the socket send buffer. Robust framing is critical for correctness, because malformed or partial requests should not corrupt the server state. Understanding stream semantics is a key part of building networked systems.
Deep Dive into the concept
TCP provides reliable, ordered delivery of bytes, but it does not preserve application-level message boundaries. If a client sends SET key value\nGET key\n, the server might receive these bytes in a single read, or in several fragments. You cannot assume that a single read call corresponds to a single command. This is why you must implement a request parser that can handle arbitrary chunking. The usual pattern is to maintain a per-connection input buffer. Each read appends to the buffer, then you scan for a delimiter (newline) that indicates a complete command. If found, you parse that command, process it, and remove it from the buffer, leaving any remaining bytes to be processed later. For output, you similarly maintain a per-connection output buffer and handle partial writes by tracking how many bytes have been sent.
A line-based protocol is easy to implement and works well for this project, but you must still make design decisions about maximum line length and error handling. If a client sends a line longer than your maximum, you should reject it to avoid unbounded memory usage. This is part of your security posture. You should define an explicit maximum key length and value length, and enforce them during parsing. Another choice is how to represent binary values. If you accept only ASCII values, your protocol is simple but limited. If you accept binary data, you need a length-prefixed protocol instead of line-delimited. For this project, use a line-based protocol with a length limit and treat values as opaque strings without embedded newlines.
The server must also handle multiple clients. You can choose a simple select-based event loop or a blocking single-client server. To make it more realistic, implement a simple select loop that manages multiple sockets, each with its own input/output buffer. The core idea is that sockets are non-blocking and you only read or write when the OS indicates readiness. This prevents a slow client from blocking the entire server. The simplicity of the protocol helps here: each command is a line, and each response is a line. The event loop becomes: wait for readable sockets, read into buffer, parse complete commands, generate responses into output buffer, then write as much as possible when the socket is writable.
Error handling is crucial. If a command is malformed, you should send an explicit error response and keep the connection alive (unless the client continues to send garbage). You should also handle EINTR and EAGAIN properly. A clean design is to abstract a connection object with input buffer, output buffer, and parsing state. This object should be resilient to partial reads and writes. It should also handle connection closures: if read returns 0, the client has closed the connection; you must clean up resources and remove the connection from your loop.
Finally, consider determinism and testing. Because network timing can be non-deterministic, you should design your protocol to be testable with scripted clients (like nc or a small test program). For deterministic golden tests, you can fix inputs and compare exact output lines. Avoid including timestamps in responses unless you can fix them to deterministic values. This aligns with the determinism requirement in §13.1 and ensures your outputs are reproducible.
How this fits into the projects
TCP framing is required in §3.5 Data Formats / Protocols and is central to §3.7 Real World Outcome. It is implemented in §5.10 Phase 2 and tested in §6.2. It also appears in P03-network-packet-buffer.md for protocol simulation.
Definitions & key terms
- stream socket -> byte stream with no message boundaries
- framing -> method to delineate messages within a stream
- partial read -> `read` returns fewer bytes than requested
- partial write -> `write` sends fewer bytes than requested
- non-blocking IO -> `read`/`write` return immediately if not ready
Mental model diagram
Client bytes -> [input buffer] -> parse line -> command
Server reply -> [output buffer] -> write partial -> socket
How it works (step-by-step, with invariants and failure modes)
- Accept connection and allocate per-connection buffers.
- Read available bytes and append to input buffer.
- Search for newline; if found, parse one command.
- Execute command and append response to output buffer.
- Write as much as possible from output buffer.
- On EOF, close connection and free resources.
Invariants: input buffer is append-only until parsing; output buffer only shrinks after successful writes. Failure modes: buffer overflow, incorrect parsing, or blocking on slow clients.
Minimal concrete example
// Parse a single line command from buffer
char *nl = memchr(buf, '\n', buf_len);
if (nl) {
size_t len = (size_t)(nl - buf);
handle_command(buf, len);
consume_bytes(buf, len + 1);
}
Common misconceptions
- “TCP is message-based” -> it is a byte stream only.
- “One read equals one command” -> reads can split or merge commands.
- “Blocking IO is fine” -> a slow client can stall the server.
Check-your-understanding questions
- Why can one `read` return part of a command or multiple commands?
- What happens if you never enforce a maximum line length?
- Why are output buffers needed for partial writes?
Check-your-understanding answers
- TCP is a stream and the OS decides how to packetize data for reads.
- A malicious or buggy client can cause unbounded memory growth.
- `write` may send only part of the response, so you must retry later.
Real-world applications
- Redis simple string protocol (RESP) framing.
- HTTP request parsing (also a line-based protocol at its start).
Where you’ll apply it
- This project: §3.5, §3.7, §5.10 Phase 2, §6.2.
- Also used in: P03-network-packet-buffer.md for stream simulation.
References
- The Linux Programming Interface, Ch. 56-61
- UNIX Network Programming, Stevens, Vol. 1
- Beej’s Guide to Network Programming
Key insights
TCP framing is not optional; it defines the boundary between bytes and commands and determines correctness.
Summary
A cache server must parse a byte stream into commands and respond without blocking. This requires explicit framing, partial read/write handling, and strict input limits.
Homework/Exercises to practice the concept
- Write a small server that echoes lines and handles partial reads correctly.
- Add a maximum line length and test with an oversized input.
Solutions to the homework/exercises
- Use a buffer with `memchr` to find newlines and consume parsed bytes.
- Reject lines above the limit and return an error string before closing.
3. Project Specification
3.1 What You Will Build
A single-process TCP cache server that supports GET, SET, DEL, and STATS over a line-based protocol. The cache enforces a strict memory budget and evicts least-recently-used keys deterministically. It uses a chaining hash table for O(1) lookup and an intrusive doubly linked list for LRU ordering. The server must be able to handle multiple clients, parse partial requests, and maintain stable latency during resizing via incremental rehashing.
Included:
- TCP server with line-based protocol
- O(1) hash table lookup
- LRU eviction ordering
- Memory usage accounting with byte-precise limits
- Incremental rehashing
- Stats reporting (hits, misses, evictions, memory)
Excluded:
- Persistence to disk
- Authentication
- Complex data types beyond string values
3.2 Functional Requirements
- GET key returns the value or `(nil)` if not present.
- SET key value stores a value and updates LRU order.
- DEL key deletes a key if present.
- STATS returns hit/miss/eviction counters and memory usage.
- The cache must evict least-recently-used entries when memory limit is exceeded.
- The cache must support incremental rehashing without full pauses.
- The server must accept multiple client connections and handle partial reads/writes.
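The STATS requirement reduces to formatting the counters into one line. A sketch: the `struct stats` field names are assumptions; only the output format shown in §3.4 is given by the spec.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Illustrative counter struct; field names are assumptions. */
struct stats {
    size_t keys, hits, misses, evictions, mem_bytes;
};

/* Render the one-line STATS response in the format shown in §3.4. */
static int format_stats(char *out, size_t out_sz, const struct stats *s) {
    return snprintf(out, out_sz,
                    "keys=%zu hits=%zu misses=%zu evictions=%zu mem=%zuB\n",
                    s->keys, s->hits, s->misses, s->evictions, s->mem_bytes);
}
```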
3.3 Non-Functional Requirements
- Performance: O(1) average GET/SET; p99 latency stable during resizing.
- Reliability: no memory leaks; no list or table corruption under stress.
- Usability: simple, human-readable CLI protocol.
3.4 Example Usage / Output
$ ./mini_cache --port 6380 --max-mb 8
[cache] listening on 0.0.0.0:6380
[cache] max memory: 8 MB
$ nc localhost 6380
SET user:1 alice
OK
SET user:2 bob
OK
GET user:1
alice
STATS
keys=2 hits=1 misses=0 evictions=0 mem=1320B
3.5 Data Formats / Schemas / Protocols
Line-based protocol, ASCII only. One command per line.
Commands:
- `GET <key>\n`
- `SET <key> <value>\n`
- `DEL <key>\n`
- `STATS\n`
Limits:
- max key length: 256 bytes
- max value length: 1 MB
- max line length: 1 MB + 300 bytes
Responses:
- `OK\n`
- `<value>\n`
- `(nil)\n`
- `ERR <message>\n`
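Splitting one newline-stripped line into a command could look like the sketch below. The `enum cmd` names and the `parse_line` signature are assumptions, as is treating everything after the second space as the value:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command codes for the dispatcher. */
enum cmd { CMD_GET, CMD_SET, CMD_DEL, CMD_STATS, CMD_ERR };

/* Parse one line in place (the trailing '\n' already removed).
 * For SET, everything after the second space is the value. */
enum cmd parse_line(char *line, char **key, char **val) {
    *val = NULL;
    char *sp = strchr(line, ' ');
    if (!sp)
        return strcmp(line, "STATS") == 0 ? CMD_STATS : CMD_ERR;
    *sp = '\0';                      /* terminate the verb */
    *key = sp + 1;
    if (strcmp(line, "GET") == 0) return CMD_GET;
    if (strcmp(line, "DEL") == 0) return CMD_DEL;
    if (strcmp(line, "SET") == 0) {
        char *sp2 = strchr(*key, ' ');
        if (!sp2) return CMD_ERR;    /* SET requires a value */
        *sp2 = '\0';
        *val = sp2 + 1;              /* rest of the line */
        return CMD_SET;
    }
    return CMD_ERR;
}
```

Length limits from above would be enforced before this step, on the raw line.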
3.6 Edge Cases
- GET on a missing key returns `(nil)`.
- SET of an existing key updates the value and LRU order.
- DEL of a missing key returns `(nil)`.
- An oversized key or value returns `ERR too_large` and does not modify state.
- Partial commands across multiple reads are handled correctly.
- Resizing is triggered while handling requests; lookups remain correct.
3.7 Real World Outcome
This section is a golden reference.
3.7.1 How to Run (Copy/Paste)
cc -O2 -Wall -Wextra -o mini_cache src/main.c src/cache.c src/hash.c src/lru.c src/net.c
./mini_cache --port 6380 --max-mb 8 --max-key 256 --max-value 1048576
3.7.2 Golden Path Demo (Deterministic)
Use a fixed sequence of commands and no timestamps in output.
$ ./mini_cache --port 6380 --max-mb 1
[cache] listening on 0.0.0.0:6380
[cache] max memory: 1 MB
$ nc localhost 6380
SET a 1
OK
SET b 2
OK
GET a
1
STATS
keys=2 hits=1 misses=0 evictions=0 mem=512B
3.7.3 CLI Transcript (Success and Failure)
$ nc localhost 6380
SET hello world
OK
GET hello
world
DEL hello
OK
GET hello
(nil)
SET big <too_large>
ERR too_large
BADCOMMAND
ERR unknown_command
Exit codes:
- Server exits 0 on normal shutdown.
- Server exits 2 on fatal bind/listen errors.
4. Solution Architecture
4.1 High-Level Design
+---------+     +----------------+     +-------------------+
| Clients |<--->| TCP event loop |<--->| Cache core        |
+---------+     +----------------+     +-------------------+
                                       | Hash table + LRU  |
                                       | Memory accounting |
                                       +-------------------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| TCP loop | accept, read, parse, write | non-blocking with per-connection buffers |
| Cache core | GET/SET/DEL, LRU, memory | intrusive list + chaining hash table |
| Rehash engine | incremental resizing | dual-table with rehash index |
| Stats | counters | atomic or single-threaded increments |
4.3 Data Structures (No Full Code)
struct cache_entry {
uint64_t hash;
char *key;
size_t key_len;
char *value;
size_t value_len;
struct cache_entry *bucket_next;
struct cache_entry *lru_prev;
struct cache_entry *lru_next;
};
struct hash_table {
struct cache_entry **buckets;
size_t bucket_count;
};
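The LRU half of the structure reduces to two helpers. A sketch, with the entry trimmed to its list fields from the struct above, and a hypothetical `struct lru_list` holding head (MRU) and tail (eviction candidate):

```c
#include <assert.h>
#include <stddef.h>

/* Entry trimmed to the intrusive-list fields for this sketch. */
struct cache_entry {
    struct cache_entry *lru_prev;
    struct cache_entry *lru_next;
};

struct lru_list {
    struct cache_entry *head;   /* most recently used */
    struct cache_entry *tail;   /* next eviction candidate */
};

/* Unlink an entry from wherever it sits in the list. */
static void lru_unlink(struct lru_list *l, struct cache_entry *e) {
    if (e->lru_prev) e->lru_prev->lru_next = e->lru_next;
    else             l->head = e->lru_next;
    if (e->lru_next) e->lru_next->lru_prev = e->lru_prev;
    else             l->tail = e->lru_prev;
    e->lru_prev = e->lru_next = NULL;
}

/* Push to the MRU end; GET and SET both do unlink-then-push. */
static void lru_push_front(struct lru_list *l, struct cache_entry *e) {
    e->lru_prev = NULL;
    e->lru_next = l->head;
    if (l->head) l->head->lru_prev = e;
    l->head = e;
    if (!l->tail) l->tail = e;
}
```

Because the list pointers live inside the entry, a hash-table hit gives you the node directly, with no second lookup.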
4.4 Algorithm Overview
Key Algorithm: GET/SET with LRU update
- Lookup key in hash table.
- If found, move entry to LRU head.
- If not found on SET, allocate entry and insert into hash table + LRU head.
- If memory exceeds limit, evict from LRU tail until under limit.
- If load factor exceeded, run incremental rehash steps.
Complexity Analysis:
- Time: O(1) average GET/SET, O(1) eviction
- Space: O(n) entries + O(buckets)
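The eviction step above can be sketched in isolation with a minimal model; the `struct cache` fields and the `evict_until_fit` name are illustrative, and real code would also remove each victim from the hash table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal model: entries carry only their charged byte cost. */
struct entry {
    size_t mem;                    /* bytes charged to this entry */
    struct entry *prev, *next;
};

struct cache {
    struct entry *lru_head, *lru_tail;
    size_t mem_used, mem_limit;
    size_t evictions;
};

/* Pop from the LRU tail until usage fits the budget. */
static void evict_until_fit(struct cache *c) {
    while (c->mem_used > c->mem_limit && c->lru_tail) {
        struct entry *victim = c->lru_tail;
        c->lru_tail = victim->prev;
        if (c->lru_tail) c->lru_tail->next = NULL;
        else             c->lru_head = NULL;
        c->mem_used -= victim->mem;   /* credit back its full cost */
        c->evictions++;
        free(victim);                 /* real code: also hash-table delete */
    }
}
```

The loop is O(1) amortised: each entry is evicted at most once over its lifetime.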
5. Implementation Guide
5.1 Development Environment Setup
cc --version
make --version
5.2 Project Structure
mini_cache/
├── src/
│ ├── main.c
│ ├── cache.c
│ ├── hash.c
│ ├── lru.c
│ └── net.c
├── tests/
│ └── test_cache.c
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How do you combine O(1) lookup with O(1) eviction ordering under memory pressure?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Hash tables and load factor control
- Intrusive doubly linked lists and pointer safety
- Incremental rehashing and latency control
- TCP stream framing and partial IO handling
5.5 Questions to Guide Your Design
- How will you track memory usage precisely per entry?
- What is your eviction policy when memory exceeds limit?
- How will you parse commands without blocking?
- How many buckets and what resize threshold will you use?
5.6 Thinking Exercise
Trace the access sequence SET a, SET b, SET c, GET a, SET d with max capacity 3. Draw the LRU list after each operation and identify which key is evicted.
5.7 The Interview Questions They’ll Ask
- Why does LRU require a linked list in addition to a hash table?
- How do you avoid latency spikes during resizing?
- What breaks if you forget to move a node on GET?
5.8 Hints in Layers
- Hint 1: Build hash table and LRU list separately, then integrate.
- Hint 2: Start with a single-threaded blocking server, then add select-based IO.
- Hint 3: Add incremental rehash once basic functionality works.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Hash tables | Algorithms, Fourth Edition | Ch. 3.4 |
| LRU and caching | Operating Systems: Three Easy Pieces | Ch. 17 |
| TCP sockets | The Linux Programming Interface | Ch. 56-61 |
5.10 Implementation Phases
Phase 1: Foundation (4-6 hours)
- Goals: hash table, LRU list, unit tests
- Tasks: implement chaining table, intrusive list, tests
- Checkpoint: pass tests for insert, delete, move-to-front
Phase 2: Core Functionality (6-10 hours)
- Goals: TCP server, command parsing, cache integration
- Tasks: build event loop, parse commands, hook cache operations
- Checkpoint: end-to-end GET/SET/DEL working via netcat
Phase 3: Polish & Edge Cases (4-8 hours)
- Goals: incremental rehash, memory accounting, stats
- Tasks: implement dual-table, add stats counters, error handling
- Checkpoint: stable p99 latency in benchmark
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Collision handling | chaining, open addressing | chaining | easier deletes and stable pointers |
| IO model | blocking, select | select | multiple clients without threads |
| Resize policy | full rehash, incremental | incremental | avoids latency spikes |
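The recommended incremental resize can be sketched as a dual-table migration step. The `struct dict` layout, the per-call bucket budget, and plain `hash % count` indexing are assumptions; while `rehash_idx < old_count`, lookups must consult both tables:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal dual-table model; nodes carry only their hash here. */
struct node { uint64_t hash; struct node *next; };

struct dict {
    struct node **old_buckets, **new_buckets;
    size_t old_count, new_count;
    size_t rehash_idx;   /* next old bucket to migrate */
};

/* Migrate at most max_buckets old buckets, amortising the resize
 * across normal operations so no single request pays a full rehash. */
static void rehash_step(struct dict *d, size_t max_buckets) {
    while (max_buckets-- && d->rehash_idx < d->old_count) {
        struct node *n = d->old_buckets[d->rehash_idx];
        while (n) {
            struct node *next = n->next;
            size_t b = (size_t)(n->hash % d->new_count);
            n->next = d->new_buckets[b];   /* re-chain into new table */
            d->new_buckets[b] = n;
            n = next;
        }
        d->old_buckets[d->rehash_idx++] = NULL;
    }
}
```

Calling this with a small budget from every GET/SET spreads the migration work evenly across requests.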
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | hash, LRU correctness | insert, delete, move-to-front |
| Integration Tests | network protocol | GET/SET via client |
| Edge Case Tests | limits and errors | oversized key, invalid command |
6.2 Critical Test Cases
- Insert 1000 keys, verify LRU order after access pattern.
- Trigger resize during mixed GET/SET and verify no missing keys.
- Simulate oversized value and confirm error response.
6.3 Test Data
SET a 1
SET b 2
GET a
SET c 3
SET d 4
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| LRU order wrong | evicts hot keys | move node on every GET/SET |
| Resize pauses | p99 spikes | incremental rehash |
| Memory mismatch | exceeds limit | include metadata in accounting |
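One way to avoid the memory-mismatch pitfall, sketched under assumptions: charge each entry its full footprint, not just the value bytes. The header struct mirrors the §4.3 layout, and the +1 byte for a key NUL terminator is an assumption about how keys are stored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the cache_entry layout in the architecture section. */
struct cache_entry_hdr {
    uint64_t hash;
    char *key;   size_t key_len;
    char *value; size_t value_len;
    void *bucket_next, *lru_prev, *lru_next;
};

/* Full per-entry cost: struct + key buffer (+ NUL) + value buffer. */
static size_t entry_mem_cost(size_t key_len, size_t value_len) {
    return sizeof(struct cache_entry_hdr) + key_len + 1 + value_len;
}
```

Charging and crediting the same function on insert and evict keeps `mem_used` exact by construction.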
7.2 Debugging Strategies
- Use `valgrind` or ASan (`-fsanitize=address`) to catch pointer errors.
- Log the LRU list after each operation in debug builds.
7.3 Performance Traps
- Using a poor hash function leads to hot buckets.
- Resizing too frequently wastes CPU on migration.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add DEL command and verify correct LRU updates.
- Add a `PING` command for health checks.
8.2 Intermediate Extensions
- Implement LFU eviction and compare hit rate.
- Add a stats command with p95/p99 latency.
8.3 Advanced Extensions
- Implement approximate LRU with segmented queues.
- Add persistence using an append-only log.
9. Real-World Connections
9.1 Industry Applications
- Redis/Memcached: in-memory key/value caches with LRU or LFU.
- CDN edge caches: cache eviction policies for HTTP objects.
9.2 Related Open Source Projects
- Redis: LRU and incremental rehashing techniques.
- Memcached: simple protocol and slab allocator design.
9.3 Interview Relevance
- Cache eviction strategies and tradeoffs
- Hash table and linked list integration
10. Resources
10.1 Essential Reading
- The Linux Programming Interface, Ch. 56-61 (network programming)
- Algorithms, Fourth Edition, Ch. 3.4 (hash tables)
10.2 Video Resources
- “Redis Internals” conference talks (search by title)
10.3 Tools & Documentation
- `nc` for simple TCP client testing
- `valgrind` for memory analysis
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why load factor affects latency.
- I can describe how incremental rehashing works.
- I can explain why LRU uses a linked list.
11.2 Implementation
- All functional requirements are met.
- All tests pass.
- Edge cases are handled.
11.3 Growth
- I can describe how I would add concurrency safely.
- I can explain this cache in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- LRU cache with GET/SET/DEL over TCP
- Enforced memory limit and eviction
- Basic stats output
Full Completion:
- Incremental rehashing implemented
- Multiple client support via select
- Stable p99 latency under load
Excellence (Going Above & Beyond):
- Benchmark suite with hit/miss curves
- Alternative eviction policy comparison