Control Flow & State: Systems Programming Sprint 3

Goal: Build the mental discipline to model programs as explicit state machines, design control flow that is correct on every path, and manage resource lifecycles without leaks or corruption. By the end of this sprint you will be able to design APIs that prevent invalid states, implement crash-safe operations, and reason about time-dependent bugs (undo/redo, reentrancy, ordering) using concrete invariants. You will ship real tools that stay correct under stress: parsers that never desynchronize, pools that never deadlock, editors that never corrupt files, and repositories that recover cleanly after interruptions.

Introduction

Control flow and state describe how a program moves (control flow) and what it is right now (state). In systems programming, these are inseparable: every input, failure, timeout, or signal changes state and forces control flow decisions that must preserve invariants. This guide teaches you to make state explicit, define legal transitions, and structure error paths so that resources are always released and the system never ends up half-updated.

What you will build (by the end of this guide):

A modal text editor that models user interaction as a finite state machine
A streaming HTTP/1.1 parser that never loses synchronization
A database connection pool that survives timeouts and concurrent stress
An undo/redo engine that treats history as first-class state
An embedded sensor state machine with safe error recovery
A Git-like version control system with crash-safe state transitions

Scope (what’s included):

Explicit state machines, invariants, and transition design
Control flow discipline with explicit cleanup and error propagation
Resource lifecycle management (memory, files, sockets, locks)
Temporal correctness: ordering, reentrancy, and undo/redo
Parsing and protocol framing as state machines
Crash safety and atomic updates for persistent state

Out of scope (for this guide):

Full OS kernel scheduling or preemption design
Language runtime garbage collectors
Distributed consensus protocols (Raft/Paxos)
GPU or SIMD-specific control flow optimizations

The Big Picture (Mental Model)

           Inputs / Events                State + Invariants                  Outputs
┌────────────────────────────┐      ┌────────────────────────────┐      ┌─────────────────────┐
│ user input, bytes, timers  │ ──▶  │ explicit state machine      │ ──▶  │ effects + responses │
│ signals, errors, timeouts  │      │ - legal transitions         │      │ file writes, IO     │
└────────────────────────────┘      │ - invariants preserved      │      │ screen updates      │
                                    │ - cleanup on all exits      │      │ network replies     │
                                    └─────────────┬──────────────┘      └─────────────────────┘
                                                  │
                                                  ▼
                                     Resource Lifecycle Discipline
                               (acquire → use → release, even on errors)

Key Terms You’ll See Everywhere

State: The complete set of values that determine how the program behaves right now.
Transition: A legal change from one state to another, triggered by an event.
Invariant: A rule that must always be true after every operation.
Precondition/Postcondition: What must be true before/after an operation for correctness.
Idempotent: An operation that can be safely retried without changing the result.
Reentrant: A function that can be safely called again before a previous call finishes.

How to Use This Guide

Read the Theory Primer first. Each chapter gives the mental model you will use across multiple projects.
Build projects in order if you’re new to stateful systems. Each project reuses earlier concepts.
Treat every project as a state machine. Write explicit states, transitions, and invariants before coding.
Use the Definition of Done checklists. They are your correctness tests, not optional extras.
Return to the primer whenever you get stuck on cleanup paths, error handling, or temporal bugs.

Prerequisites & Background Knowledge

Essential Prerequisites (Must Have)

Programming Skills:

Solid C fundamentals (pointers, structs, enums, manual memory management)
Ability to read and write files, and use the command line confidently
Basic debugging with gdb or lldb

Systems Fundamentals:

Process vs. thread basics and the idea of blocking I/O
Familiarity with file descriptors and how read/write work
Basic understanding of heap vs. stack
Recommended Reading: “The Linux Programming Interface” by Michael Kerrisk — Ch. 3, 5, 6

Data Structures Basics:

Arrays, linked lists, stacks, queues, and hash maps
Recommended Reading: “Algorithms in C, Parts 1-4” by Sedgewick — Ch. 1–3

Helpful But Not Required

Networking protocols:

HTTP message format and headers (you can learn in Project 2)

Embedded basics:

GPIO and sensor read loops (you can learn in Project 5)

Git fundamentals:

Familiarity with git status, git add, git commit (Project 6 teaches the internals)

Self-Assessment Questions

✅ Can you explain what must happen on every error path of a function that opens a file and allocates memory?
✅ Can you write a loop invariant for “cursor is always within buffer bounds”?
✅ Can you explain what it means for an operation to be idempotent?
✅ Can you describe the difference between parsing bytes and parsing messages?
✅ Can you use valgrind or asan to find a memory leak?

If you answered “no” to questions 1-3: Spend 1-2 weeks on “Effective C” Ch. 7-8 and “The Linux Programming Interface” Ch. 3-6.

Development Environment Setup

Required Tools:

A Linux or macOS machine (WSL2 is acceptable)
clang or gcc (C11 or newer)
make
git

Recommended Tools:

valgrind or asan for memory debugging
gdb or lldb for step debugging
strace / dtruss for syscalls (Projects 1, 6)
wireshark or tcpdump for HTTP parser testing (Project 2)
Arduino IDE or arduino-cli (Project 5)

Testing Your Setup:

$ cc --version
# Expect gcc or clang version output

$ make --version
# Expect GNU Make or BSD Make output

$ git --version
# Expect git version output

Time Investment

Smaller projects (Undo/Redo, HTTP parser): 1–2 weeks each
Medium projects (Modal editor, Embedded state machine): 2–3 weeks each
Advanced projects (Connection pool, Git-like VCS): 3–6 weeks each
Total sprint: 2–4 months depending on depth and testing rigor

Important Reality Check

Stateful systems are unforgiving. Expect to rework parts of your code after you discover a missing invariant or an unhandled error path. That is the point. Control flow mastery comes from repeated, deliberate corrections, not from getting it right the first time.

Big Picture / Mental Model

                ┌───────────────────────────────────────────┐
                │              State Machine                │
                │  explicit states + legal transitions      │
                └───────────────┬──────────────────────────┘
                                │
                                ▼
         ┌──────────────────────────────────────────────┐
         │ Control Flow Discipline                       │
         │ - every path preserves invariants            │
         │ - cleanup on every exit                      │
         └───────────────┬──────────────────────────────┘
                         │
                         ▼
        ┌───────────────────────────────────────────────┐
        │ Resource Lifecycle                             │
        │ acquire → use → release (exactly once)         │
        └───────────────┬───────────────────────────────┘
                        │
                        ▼
      ┌──────────────────────────────────────────────────┐
      │ Temporal Correctness                              │
      │ ordering, reentrancy, retries, undo/redo          │
      └───────────────┬──────────────────────────────────┘
                      │
                      ▼
      ┌──────────────────────────────────────────────────┐
      │ Durable State & API Design                        │
      │ atomic updates, crash safety, invalid states      │
      └──────────────────────────────────────────────────┘

Theory Primer (Read This Before Coding)

This section is your mini-book. Each chapter is a tool you will use in multiple projects. Do not skip it.

Chapter 1: Explicit State Machines and Legal Transitions

Fundamentals

A state machine is the simplest reliable way to express control flow in systems software. Instead of scattering flags and booleans across your code, you define a finite set of states and enumerate the legal transitions between them. This forces you to answer: “What states can exist?” and “What events move us between them?” The benefit is that state becomes visible, testable, and enforceable. When you encode state in an enum and require every transition to be explicit, invalid situations become unrepresentable. This is the foundation for correctness in projects like modal editors, protocol parsers, and embedded controllers where the same input means different things depending on current context.

Deep Dive

In systems programming, the complexity is rarely in the arithmetic; it is in the sequence of operations over time. A state machine makes time explicit. You decide the set of states, attach invariants to each state, and define the events that cause transitions. This is not just academic. For example, in a modal editor, a keypress in NORMAL mode moves the cursor; the same keypress in INSERT mode changes the buffer. Without explicit state, you would need a forest of conditionals and hidden flags. That approach scales poorly and is almost impossible to test exhaustively. A state machine scales because it constrains behavior to a finite, known graph.

A good state machine is observable (you can inspect current state), deterministic (given the same state and event, the next state is predictable), and guarded (transitions have preconditions). You also define terminal states where cleanup happens and no further transitions are allowed. In C, this is typically implemented with an enum for state, a switch statement for transitions, and a transition table or function pointers for larger machines. The key is to keep state single-sourced. Do not mirror state in multiple flags; do not let multiple parts of the system update it without coordination. Every update to state should be a conscious transition that can be traced and tested.

State machines also protect you from temporal bugs. If you define that state SAVING must be followed by CLEAN, and that QUIT is only legal in CLEAN, you prevent a user from exiting while a save is in progress. In a protocol parser, you prevent accepting a body before headers are complete. In an embedded controller, you prevent actuators from turning on before sensors are initialized. This discipline yields systems that fail predictably rather than chaotically.

How This Fits in Projects

Modal editor: explicit NORMAL/INSERT/COMMAND modes
HTTP parser: parse states for request line, headers, body, chunked
Embedded sensor: initialization, sampling, error, safe shutdown
Git-like VCS: CLEAN → DIRTY → MERGING → CLEAN

Definitions & Key Terms

Finite State Machine (FSM): A model with a finite set of states and transitions.
Transition Guard: A condition that must be true before a transition can occur.
Terminal State: A state with no outgoing transitions.
Transition Table: A mapping of (state, event) → (next state, action).

Mental Model Diagram

            event: keypress 'i'
      ┌──────────────┐           ┌──────────────┐
      │   NORMAL     │ ───────▶  │   INSERT     │
      └──────┬───────┘           └──────┬───────┘
             │ event: ESC               │ event: ESC
             └──────────────────────────┘

Invariant: exactly one state is active at a time

How It Works (Step-by-Step)

Define a finite enum of states.
Enumerate all events your system can receive.
For each (state, event), decide if transition is legal.
If legal, perform the action, update state, and enforce invariants.
If illegal, return an error or ignore (never silently corrupt state).

Minimal Concrete Example

typedef enum { NORMAL, INSERT, COMMAND } EditorState;

void handle_key(EditorState *state, int key) {
    switch (*state) {
        case NORMAL:
            if (key == 'i') *state = INSERT;
            else if (key == ':') *state = COMMAND;
            break;
        case INSERT:
            if (key == 27) *state = NORMAL; // ESC
            break;
        case COMMAND:
            if (key == 27 || key == '\n') *state = NORMAL;
            break;
    }
}

Common Misconceptions

“State machines are only for UI.” → They are for any system where behavior depends on history.
“A few flags are easier.” → Flags multiply into impossible combinations. Enums prevent this.
“State machines are overkill for small programs.” → Small programs become big when bugs appear.

Check-Your-Understanding Questions

What is an example of an invalid transition in a modal editor?
Why is it dangerous to represent state with multiple booleans?
How would you test that every transition preserves invariants?

Check-Your-Understanding Answers

INSERT → COMMAND without returning to NORMAL is invalid.
Multiple booleans can represent impossible states (both true), which leads to ambiguous behavior.
Enumerate transitions, simulate events, and assert invariants after each action.

Real-World Applications

Network protocol parsers and connection lifecycles
Filesystem journaling and recovery logic
Embedded device initialization and fault handling

Where You’ll Apply It

Projects 1, 2, 5, and 6

References

UML state machine notation (for formal modeling)
Kubernetes Pod lifecycle states (real-world production FSM) - https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

Key Insight

A system without explicit state is a system without a map.

Summary

State machines make behavior explicit, transitions legal, and invariants testable. They are the backbone of robust control flow.

Homework / Exercises

Draw the full state machine for a login flow with states: LOGGED_OUT, AUTH_IN_PROGRESS, LOGGED_IN, LOCKED.
Define all legal transitions and mark which require cleanup.

Solutions

LOGGED_OUT → AUTH_IN_PROGRESS → LOGGED_IN, with LOCKED reachable from AUTH_IN_PROGRESS on failure; LOGGED_IN → LOGGED_OUT on logout.
Transition to LOCKED requires clearing partial auth tokens; logout requires clearing session state.

Chapter 2: Control Flow Discipline and Error Propagation

Fundamentals

Control flow discipline means that every code path is correct, not just the happy path. In systems programming you are responsible for all exit paths: early returns, error handling, signals, and partial failures. If you allocate resources and then return early without releasing them, you leak memory or file descriptors. If you return inconsistent error codes, callers cannot recover. This chapter teaches you to design functions where invariants are preserved on every path, often by using a single cleanup block and clear error propagation rules.

Deep Dive

The C language gives you no exceptions and no automatic cleanup. That is not a weakness; it is a forcing function that makes you design correct control flow. The common discipline is: acquire resources in a clear order, do the work, and on any failure, jump to a shared cleanup label that releases everything that has been successfully acquired. This pattern avoids duplicated cleanup logic and ensures that partial initialization does not leak resources.

Control flow discipline also means consistent error vocabulary. Decide whether your functions return 0/-1, or enum error codes, or populate an error struct. Once chosen, make it consistent across the module. This consistency enables higher-level state machines to react predictably. If one function returns -1 and another returns NULL with errno set, you now have a control flow ambiguity. The other half is idempotency: if a cleanup function is called twice, it should be safe. This is especially important in error paths where you may not know how far initialization progressed.

Consider a file save operation. You open a temporary file, write data, flush, fsync, and rename. Every step can fail. If you abort after the temp file is created, you must remove it. If you abort after fsync, you must decide whether to keep the temp file for recovery. Each step represents a potential exit path, and each path must leave the system in a consistent state. That is control flow discipline: the recognition that a function is not a single path, but a tree of possible exits.

How This Fits in Projects

Modal editor: safe exit paths and terminal restoration
HTTP parser: consistent error handling for invalid requests
Connection pool: timeouts and half-open connections
Git-like VCS: crash-safe commit steps

Definitions & Key Terms

Early Return: Exiting a function before reaching the end.
Cleanup Block: A section that releases resources regardless of exit path.
Error Propagation: Passing failure information upward consistently.
Idempotent Cleanup: Cleanup that is safe to call multiple times.

Mental Model Diagram

Acquire A → Acquire B → Work → Success
     │           │         │
     │           │         └─▶ cleanup A+B
     │           └─▶ cleanup A
     └─▶ return error (no cleanup needed)

How It Works (Step-by-Step)

Initialize all resource pointers to NULL.
Acquire resources in a fixed order.
On any failure, jump to a cleanup label.
Cleanup in reverse order of acquisition.
Return a consistent error code.

Minimal Concrete Example

int read_file(const char *path, char **out) {
    FILE *f = NULL;
    char *buf = NULL;
    long len = 0;
    int rc = -1;

    f = fopen(path, "rb");
    if (!f) goto cleanup;

    if (fseek(f, 0, SEEK_END) != 0) goto cleanup;
    len = ftell(f);
    if (len < 0) goto cleanup;
    rewind(f);

    buf = malloc((size_t)len + 1);
    if (!buf) goto cleanup;

    if (fread(buf, 1, (size_t)len, f) != (size_t)len) goto cleanup;
    buf[len] = '\0';

    *out = buf;
    buf = NULL; // ownership transferred
    rc = 0;

cleanup:
    free(buf);
    if (f) fclose(f);
    return rc;
}

Common Misconceptions

“goto is always bad.” → In C, goto cleanup is the safest way to avoid leaks.
“Error paths can be added later.” → They are the implementation; add them first.
“If it works in tests, cleanup is fine.” → Tests rarely explore all early exits.

Check-Your-Understanding Questions

Why do we clean up in reverse order of acquisition?
What does “idempotent cleanup” mean, and why is it useful?
Why is a single cleanup label safer than multiple return paths?

Check-Your-Understanding Answers

Later resources may depend on earlier ones; reverse order prevents use-after-free.
It means cleanup is safe to call multiple times, which avoids fragile error paths.
It centralizes release logic, preventing missed cleanup on new error paths.

Real-World Applications

Linux kernel error handling patterns
Network servers handling partial connection setup
Filesystem operations with multi-step updates

Where You’ll Apply It

Projects 1, 2, 3, and 6

References

rename() atomic replace semantics for safe updates - https://man7.org/linux/man-pages/man2/rename.2.html

Key Insight

Correct control flow is a proof that every exit path preserves invariants.

Summary

Control flow discipline ensures that no failure path can leak resources or leave state inconsistent.

Homework / Exercises

Write a function that opens a socket, allocates a buffer, and then fails halfway. Sketch the cleanup paths.
Refactor a multi-return function into a single-cleanup-block version.

Solutions

If socket opened but buffer allocation fails, close socket. If both succeed but send fails, free buffer then close socket.
Initialize pointers to NULL, use goto cleanup, and cleanup in reverse order.

Chapter 3: Resource Lifecycle and Ownership

Fundamentals

Every resource in C has a lifecycle: acquire → use → release. Memory, file descriptors, locks, sockets, and device handles all obey this rule. The hard part is that errors can occur at any stage, and partial acquisition is common. Ownership means deciding which function is responsible for releasing the resource and when. If ownership is unclear, leaks and double-frees happen. In this sprint you will use explicit ownership rules and cleanup blocks to guarantee each resource is released exactly once.

Deep Dive

Resource lifecycle is not just a memory management issue; it defines the stability of long-running systems. A single leaked file descriptor per request can kill a server in hours. A leaked lock can deadlock an entire process. Lifecycle management therefore becomes a control flow problem: you need to guarantee that every resource that was acquired is released even when errors or timeouts occur.

Ownership rules are the key. Decide at API boundaries whether ownership is transferred or shared. If a function returns a pointer, does the caller own it? If a function accepts a pointer, does it take ownership or just borrow it? Make these rules explicit in names and documentation. For example, buffer_new() returns owned memory; buffer_view() returns a borrowed pointer. A simple rule is: “who allocates frees” unless explicitly documented otherwise.

Resource lifecycle also includes pooling and reuse. A connection pool does not release a connection after each request; it returns it to an idle state. That means the resource does not disappear, but its state changes. Pooling therefore combines lifecycle management with state machines: IDLE → IN_USE → IDLE, with error transitions to DEAD. The pool is correct only if each resource has one owner at a time and every acquire is paired with a release.

Finally, lifecycle management crosses into persistence. Writing a file safely is an acquire/use/release sequence on the file descriptor and on the filesystem’s internal state. If you do not close or fsync correctly, data may not actually persist. That is a lifecycle failure with real-world consequences.

How This Fits in Projects

Connection pool: idle/in-use/dead states, ownership transfer
Modal editor: buffer allocation and file handles
Embedded sensor: I2C handles, timers, GPIO
Git-like VCS: object file creation and cleanup

Definitions & Key Terms

Ownership: Which component is responsible for releasing a resource.
Borrowed vs. Owned: Borrowed resources are not freed by the borrower.
Reference Counting: Shared ownership via a counter.
Leak: Resource acquired and never released.

Mental Model Diagram

Acquire (open) → Use (read/write) → Release (close)
          │             │                 │
          └───── error ─┴──── error ───────┘
    In all cases, release must happen

How It Works (Step-by-Step)

Define acquisition functions that return NULL or error on failure.
Store resource pointers in a struct that tracks ownership.
Release in reverse order in a shared cleanup path.
Make cleanup idempotent (safe to call twice).

Minimal Concrete Example

typedef struct {
    FILE *f;
    char *buf;
} Reader;

int reader_init(Reader *r, const char *path) {
    r->f = fopen(path, "rb");
    if (!r->f) return -1;
    r->buf = malloc(4096);
    if (!r->buf) {
        fclose(r->f);
        r->f = NULL;
        return -1;
    }
    return 0;
}

void reader_close(Reader *r) {
    free(r->buf);
    r->buf = NULL;
    if (r->f) fclose(r->f);
    r->f = NULL;
}

Common Misconceptions

“free(NULL) is unsafe.” → It is safe and should be used for idempotent cleanup.
“Double free is rare.” → It is common when ownership is unclear.
“Pooling eliminates lifecycle issues.” → Pools add state complexity; they do not remove lifecycle rules.

Check-Your-Understanding Questions

What does it mean to transfer ownership across an API boundary?
Why is it important that cleanup functions be idempotent?
What is the difference between returning a resource to a pool and freeing it?

Check-Your-Understanding Answers

The caller becomes responsible for releasing the resource.
It prevents errors when cleanup is called on partial initialization.
Pooled resources remain allocated but change state from IN_USE to IDLE.

Real-World Applications

Database connection pools and thread pools
Filesystems that track open file handles
Embedded systems that keep sensors initialized for long periods

Where You’ll Apply It

Projects 1, 3, 5, and 6

References

PgBouncer pooling modes and lifecycle behaviors - https://www.pgbouncer.org/features.html

Key Insight

Ownership is a contract. If you do not define it, the program defines it for you with crashes.

Summary

Resource lifecycle management is about explicit ownership, predictable cleanup, and safe reuse.

Homework / Exercises

Design a SocketPool struct with acquire() and release() APIs and define ownership rules.
Write a cleanup function that can be called safely even if initialization failed halfway.

Solutions

acquire() returns a socket that the caller borrows and must return; pool retains ownership.
Initialize all pointers to NULL and free/close only if they are non-NULL.

Chapter 4: Temporal Correctness, Reentrancy, and Ordering

Fundamentals

Temporal correctness means that the order of events matters and your code must respect it. In stateful systems, an operation can be correct or incorrect depending on what happened before. Reentrancy adds another dimension: a function may be called again before it completes, either due to callbacks, signals, or concurrency. If you do not design for this, you get bugs that only appear under load or “worked yesterday” scenarios. This chapter teaches you how to reason about ordering, guard against reentrancy hazards, and preserve invariants across time.

Deep Dive

Temporal bugs are the most expensive bugs to debug because they are rarely deterministic. A connection pool may work under light load but deadlock when two threads acquire connections simultaneously. An undo stack may be correct until a command both modifies the buffer and triggers another command indirectly. These issues are not about correctness in a single step; they are about correctness across sequences.

To reason about temporal correctness, you must define happens-before relationships. For example, in a connection pool, release() must happen after acquire() and before any other thread can acquire the same connection. In an undo/redo system, a “redo” operation is only valid immediately after an undo; any new command invalidates the redo stack. These are state machine constraints, but they are also time constraints.

Reentrancy is particularly tricky in C. Suppose your editor has a callback on keypress that triggers a macro that feeds new keys into the system. If your key handler is not reentrant, you might corrupt internal state by modifying the buffer while the previous modification has not finished. The solution is to serialize modifications or explicitly track a “busy” state that blocks reentry. Similarly, signal handlers can interrupt normal flow and run functions that are not safe to call in that context. Recognizing these temporal hazards and designing explicit constraints is part of control flow discipline.

Finally, temporal correctness connects to retry behavior. If an operation fails and you retry it, is it safe? If it is idempotent, yes. If not, you might duplicate effects (e.g., double-inserting data). This is why durable systems define idempotent operations and clear retry rules. You will apply this thinking in the HTTP parser, connection pool, and version control system.

How This Fits in Projects

Undo/redo engine: redo valid only after undo, not after new edits
Connection pool: no two threads own the same connection at once
Git-like VCS: operations are ordered and crash recovery relies on step order

Definitions & Key Terms

Temporal Bug: A bug that appears only with certain event orderings.
Reentrant Function: Safe to call again before previous call completes.
Idempotent Operation: Safe to retry without changing the result.
Happens-Before: An ordering relationship between events.

Mental Model Diagram

Time →

UNDO stack:  [cmd1][cmd2][cmd3]
Redo stack:  []

undo() => move cmd3 to redo
redo() => move cmd3 back to undo (only if no new command)

How It Works (Step-by-Step)

Enumerate the valid sequences of operations.
Define state that records “what just happened”.
Block or reject operations that violate ordering.
Make retry behavior explicit (idempotent or not).

Minimal Concrete Example

typedef enum { CLEAN, DIRTY, BUSY } PoolState;

int pool_acquire(Pool *p) {
    if (p->state == BUSY) return -1; // prevent reentry
    p->state = BUSY;
    // ... acquire connection
    p->state = DIRTY;
    return 0;
}

Common Misconceptions

“If it works once, it is correct.” → Temporal bugs require multiple steps to appear.
“Reentrancy only matters with threads.” → Callbacks and signals can cause reentry too.
“Retrying is always safe.” → Only idempotent operations can be retried safely.

Check-Your-Understanding Questions

When is a redo operation invalid?
Why are signal handlers risky for non-reentrant functions?
What makes an operation idempotent?

Check-Your-Understanding Answers

Redo is invalid after any new command is executed.
Signal handlers can interrupt and call functions that are not safe mid-operation.
Repeating the operation produces the same final state with no duplicate side effects.

Real-World Applications

Transaction retries in databases
UI event loops and command history
Network protocol retransmissions

Where You’ll Apply It

Projects 3, 4, and 6

References

HTTP/1.1 message ordering and semantics (IETF RFCs) - https://www.rfc-editor.org/rfc/rfc9112 and https://www.rfc-editor.org/rfc/rfc9110

Key Insight

Correctness is not just what you do, but when you do it.

Summary

Temporal correctness forces you to reason about ordering, reentrancy, and valid sequences of operations.

Homework / Exercises

Define the valid operation sequence for save, undo, redo in a text editor.
Identify one operation in your code that should be idempotent and make it so.

Solutions

save can occur after any edit; redo is only valid immediately after undo; new edits clear redo.
Make save idempotent by writing to a temp file and renaming atomically.

Chapter 5: Protocol Parsing and Framing

Fundamentals

Parsing is a state machine that transforms a stream of bytes into structured messages. For protocols like HTTP/1.1, you cannot assume that a full message arrives at once; it may arrive in chunks, and you must keep state between reads. A parser therefore tracks which part of the message is being read (request line, headers, body, chunk length) and advances only when the necessary bytes are present. This chapter teaches you to build parsers that never lose synchronization and never misinterpret partial input.

Deep Dive

HTTP/1.1 is a text-based protocol with strict framing rules. A request starts with a request line, followed by headers, a blank line, and optionally a body. The body length is determined by Content-Length or by chunked transfer encoding. Every one of these boundaries is an opportunity for bugs: accepting \n without \r, permitting invalid header names, or reading too many bytes and corrupting the next message. A correct parser therefore tracks its current state and only consumes bytes that are complete and valid for that state.

Streaming parsing means you keep unconsumed bytes in a buffer and attempt to parse as far as possible. When the input is incomplete, you return “need more data” rather than error. When input violates the spec, you return a parsing error and reset state. This is exactly a state machine: START_LINE → HEADERS → BODY → DONE, with optional transitions into CHUNKED_LEN and CHUNKED_DATA. You will implement this in Project 2.

Protocol parsing is also about robustness against malformed input. Attackers often send partial, oversized, or ambiguous messages to trigger buffer overflows or desynchronize parsers. By limiting line length, header count, and body size, you enforce invariants that protect the system. These limits are part of state, and they must be updated deterministically as bytes are consumed.

How This Fits in Projects

HTTP parser: full streaming state machine
Modal editor: command line parsing (:wq, :s/foo/bar/)
Git-like VCS: parsing object headers

Definitions & Key Terms

Framing: Determining message boundaries in a byte stream.
Streaming Parser: A parser that can process partial input.
Chunked Encoding: HTTP transfer encoding where body is sent in chunks.
Desynchronization: When parser loses correct message boundaries.

Mental Model Diagram

START_LINE → HEADERS → BODY → DONE
                     ↘
                      CHUNK_LEN → CHUNK_DATA → HEADERS

How It Works (Step-by-Step)

Accumulate bytes in a buffer.
Parse the request line until CRLF.
Parse headers until CRLF CRLF.
Decide body length (Content-Length or chunked).
Consume exact body bytes and transition to DONE.

Minimal Concrete Example

typedef enum { START_LINE, HEADERS, BODY, DONE } ParseState;

typedef struct {
    ParseState state;
    size_t content_length;
} HttpParser;

int parser_step(HttpParser *p, Buffer *buf) {
    if (p->state == START_LINE && buf_has_crlf(buf)) {
        parse_request_line(buf);
        p->state = HEADERS;
    }
    // ... continue for headers and body
    return 0;
}

Common Misconceptions

“HTTP is just text, so parsing is easy.” → Partial reads and chunked encoding make it tricky.
“If input is invalid, ignore it.” → Invalid input must be rejected deterministically.
“Read until EOF.” → Persistent connections make EOF meaningless.

Check-Your-Understanding Questions

How do you decide when the HTTP body ends?
What is a parser desynchronization bug?
Why must a streaming parser return “need more data” rather than error?

Check-Your-Understanding Answers

Use Content-Length or chunked framing; otherwise, no body.
It means you consumed too many or too few bytes and misaligned the next message.
Partial reads are normal; erroring would break valid connections.

Real-World Applications

Web servers and proxies
Streaming log parsers
Binary protocols (TLS, MQTT)

Where You’ll Apply It

Projects 1, 2, and 6

References

HTTP/1.1 message syntax and parsing requirements - https://www.rfc-editor.org/rfc/rfc9112 and https://www.rfc-editor.org/rfc/rfc9110

Key Insight

Parsing is a state machine over bytes.

Summary

Streaming parsers maintain state across partial inputs, enforce framing, and reject invalid data safely.

Homework / Exercises

Write a state diagram for chunked transfer encoding.
Implement a CRLF line reader that can handle partial lines.

Solutions

CHUNK_LEN → CHUNK_DATA → CHUNK_LEN … until zero-length chunk.
Store bytes in a buffer and only emit a line when CRLF is present.

Chapter 6: Undo/Redo, History, and Versioned State

Fundamentals

Undo/redo systems treat history as state. Every action becomes a command that can be reversed, and the system tracks two stacks: one for executed commands (undo stack) and one for undone commands (redo stack). Temporal correctness is critical: the redo stack is only valid immediately after an undo; any new command clears it. This chapter explains the command pattern, inverse operations, and how to make undo/redo reliable without corrupting your application’s state.

Deep Dive

Undo/redo is the simplest form of version control. You can implement it with full snapshots (save the whole state after each change) or with command logs (store each operation and its inverse). Snapshots are simpler but memory-heavy; command logs are efficient but require careful design of inverse operations. In systems programming, you often choose command logs because memory and performance matter. The challenge is that not every operation has a trivial inverse. For example, “insert text” is easily undone by “delete text” at the same position, but “sort a list” needs the original order or a stable inverse.

A robust undo/redo engine therefore records enough metadata to reconstruct the previous state. For text editing, that means storing the insertion position, the deleted text, or the replaced substring. For drawing applications, it may mean storing the previous geometry or color. The engine must apply commands in a strict order and update stacks atomically. If a command fails halfway, you must roll back or mark the state as corrupted and recover.

Undo/redo also intersects with persistence. Many systems persist the command log so that crashes do not lose history. This is essentially a write-ahead log. When you restart, you replay commands to rebuild state. That is a direct connection to database recovery and to the Git-like VCS project where you reconstruct state from commits. The same invariants apply: every command is either fully applied or fully rolled back; partial effects are unacceptable.

How This Fits in Projects

Drawing app undo/redo engine (Project 4)
Modal editor undo/redo stack (Project 1)
Git-like VCS history (Project 6)

Definitions & Key Terms

Command Pattern: Encapsulate actions as objects with do and undo.
Undo Stack: History of executed commands.
Redo Stack: History of undone commands.
Snapshot: A full copy of state at a point in time.

Mental Model Diagram

Execute cmd1 → cmd2 → cmd3
Undo stack: [cmd1, cmd2, cmd3]
Redo stack: []

undo() => move cmd3 to redo
redo() => move cmd3 back to undo

How It Works (Step-by-Step)

Represent each operation as a command with apply and revert.
Push command on undo stack after successful apply.
On undo, pop from undo stack, revert, push to redo stack.
On new command, clear redo stack.

Minimal Concrete Example

typedef struct {
    void (*apply)(void *ctx);
    void (*revert)(void *ctx);
    void *ctx;
} Command;

void execute(Command *cmd, Stack *undo, Stack *redo) {
    cmd->apply(cmd->ctx);
    stack_push(undo, cmd);
    stack_clear(redo);
}

Common Misconceptions

“Undo is just reversing the last change.” → Only if you logged enough data.
“Redo is just re-applying a command.” → Only valid if no new command occurred.
“Snapshots are always simpler.” → They can be huge and slow for large states.

Check-Your-Understanding Questions

When should the redo stack be cleared?
What extra data is needed to undo a replace operation?
Why might snapshots be too expensive in systems programming?

Check-Your-Understanding Answers

On any new command after an undo.
The original text and its position.
They copy entire state repeatedly, which increases memory and time costs.

Real-World Applications

Text editors and drawing apps
Game state rollback
Database replication logs

Where You’ll Apply It

Projects 1, 4, and 6

References

Command pattern and undo/redo design principles (GoF Design Patterns)

Key Insight

History is state. Treat it as carefully as current data.

Summary

Undo/redo engines are small versions of transaction logs and version control systems.

Homework / Exercises

Implement an undoable “insert” command that records position and text.
Design a snapshot strategy that stores only diffs.

Solutions

Store index and inserted string; undo by deleting that range.
Store a base snapshot plus a log of diffs; periodically compact.

Chapter 7: Crash Safety and Atomicity

Fundamentals

When a program writes persistent state, it must handle power loss and crashes. Crash safety means that after an interruption, the system is either in the old state or the new state, but never a half-written mixture. The usual tools are write-ahead logs, temporary files + atomic rename, and idempotent recovery steps. This chapter shows how to design updates so that recovery is deterministic and integrity checks can detect corruption.

Deep Dive

Crash safety is a state machine across disk writes. Consider saving a file: write to a temp file, flush and fsync, then rename it into place. The atomic rename ensures that the directory entry points to either the old file or the new file, never a partial mix. This is the core of safe file updates and the same technique used by many systems for configuration updates. The POSIX rename() semantics guarantee atomic replacement within a filesystem, which is why it is the standard primitive for crash-safe updates.

Databases and version control systems take this further with write-ahead logs (WAL). SQLite, for example, uses a rollback journal or WAL file so that if a crash happens mid-transaction, it can replay or rollback to a consistent state. Git writes objects (blobs, trees, commits) to an append-only store and only updates branch references last; this means a crash before the ref update leaves orphaned objects but a consistent repository.

Crash safety also depends on ordering. If you update the index before writing data, and you crash, you now have an index pointing to missing data. The correct order is: write data first, verify, then update pointers. This ordering is the same principle as control flow discipline, just applied to persistence. In Project 6 you will implement this order and test recovery by simulating crashes. In Project 1 you will use temp files to avoid corrupting a user’s file if the editor crashes during save.

How This Fits in Projects

Modal editor: safe save via temp file + rename
Git-like VCS: write objects first, update refs last
Connection pool: recovery after process restart

Definitions & Key Terms

Atomic Update: An update that is all-or-nothing.
Write-Ahead Log (WAL): Record intent before applying changes.
Durability: Ensuring data survives crashes.
Crash Recovery: Restoring a consistent state after failure.

Mental Model Diagram

Old file ──write temp──> temp file ──fsync──> rename() ──> New file
   ^                                                    |
   └────────── crash before rename: old file intact ────┘

How It Works (Step-by-Step)

Write new data to a temp file.
fsync the temp file.
Atomically rename temp file over old file.
fsync the directory entry if required.

Minimal Concrete Example

int atomic_write(const char *path, const char *data, size_t len) {
    char tmp[256];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (write(fd, data, len) != (ssize_t)len) { close(fd); unlink(tmp); return -1; }
    fsync(fd);
    close(fd);
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }
    return 0;
}

Common Misconceptions

“rename is just a move.” → It is the atomic primitive for crash-safe replacement.
“fsync is optional.” → Without it, data may never reach disk before power loss.
“A crash just loses the last edit.” → It can corrupt everything if not designed.

Check-Your-Understanding Questions

Why must data be written before updating pointers/refs?
What problem does a WAL solve?
What can happen if you skip fsync on a temp file?

Check-Your-Understanding Answers

Pointers to missing data create unrecoverable inconsistency.
WAL ensures recovery can replay or rollback to a consistent state.
The temp file may never reach disk, so the rename swaps in empty or partial data.

Real-World Applications

SQLite rollback journal and WAL
Git object storage and reference updates
Configuration management tools

Where You’ll Apply It

Projects 1 and 6

References

POSIX atomic rename behavior - https://man7.org/linux/man-pages/man2/rename.2.html
SQLite atomic commit and WAL design - https://www.sqlite.org/atomiccommit.html and https://www.sqlite.org/wal.html
Git object model and references (Pro Git) - https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

Key Insight

Crash safety is control flow applied to persistent state.

Summary

Atomic updates and write-ahead logs prevent half-written state and enable deterministic recovery.

Homework / Exercises

Modify a save routine to use temp file + rename.
Sketch a recovery procedure for a crash during commit creation.

Solutions

Write to file.tmp, fsync, rename to file.
On restart, detect incomplete commit and either discard or finish based on ref update.

Chapter 8: API Design and Invariant Enforcement

Fundamentals

APIs are control flow contracts. A good API makes invalid states hard to represent and easy to detect. In C, you do this with opaque types, explicit init/close functions, and strict input validation. You decide which operations are legal in each state, and you enforce those preconditions at the boundary. This means your API itself becomes a state machine that guides users into correct behavior.

Deep Dive

Systems APIs are often misused because they expose too much internal state. If a user can construct a struct manually, they can violate invariants and crash your program. The solution is encapsulation: hide struct definitions in .c files, expose only pointers and functions. Provide create, init, destroy functions and return errors when preconditions are violated. This shifts correctness from documentation to code enforcement.

A powerful technique is to model states as separate types. In C you cannot express this in the type system directly, but you can simulate it by exposing different function sets for different state values. Another approach is to add runtime checks that guard transitions. For example, in a connection pool, pool_release() should verify the connection belongs to the pool and is in the IN_USE state. In a parser, parser_feed() should reject data if the parser is already in DONE state unless it has been reset.

Good APIs are also defensive: they validate inputs, reject null pointers, and return specific errors. They should be stable under adversarial use because in real systems your API will be used incorrectly. The cost of defensive checks is small; the cost of state corruption is huge.

How This Fits in Projects

Connection pool API design and misuse prevention
Parser API that guards against invalid state reentry
Git-like VCS commands and preconditions

Definitions & Key Terms

Opaque Type: A type whose internal structure is hidden from users.
Precondition Check: A runtime check that validates state before an operation.
Capability: A token that grants permission to perform an operation.
Invariant Enforcement: Ensuring rules always hold at API boundaries.

Mental Model Diagram

User → API call → precondition check → state transition → postcondition

How It Works (Step-by-Step)

Define invariants for each state.
Expose only the operations valid for that state.
Guard every transition with precondition checks.
Return explicit errors for invalid use.

Minimal Concrete Example

// pool.h (opaque)
typedef struct Pool Pool;
Pool *pool_create(size_t n);
int pool_acquire(Pool *p, Conn **out);
int pool_release(Pool *p, Conn *c);

// pool.c (internal)
struct Pool {
    Conn conns[MAX];
    bool in_use[MAX];
};

Common Misconceptions

“Documentation is enough.” → Invariants must be enforced in code.
“Opaque types are overkill.” → They prevent invalid states at compile time.
“Validation is expensive.” → The cost is tiny compared to debugging corruption.

Check-Your-Understanding Questions

Why are opaque types safer than exposed structs?
What is a precondition check example in a connection pool?
How do API boundaries relate to state machines?

Check-Your-Understanding Answers

Users cannot construct invalid states because they cannot access fields directly.
pool_release() checks that the connection is currently marked IN_USE.
Each API function is a transition with guards and postconditions.

Real-World Applications

POSIX file APIs (open/close ordering)
Database connection pools
Embedded device drivers

Where You’ll Apply It

Projects 2, 3, 5, and 6

References

HTTP/1.1 semantics and error handling requirements - https://www.rfc-editor.org/rfc/rfc9110

Key Insight

Your API is a state machine for your users. Design it accordingly.

Summary

API design is the control plane that enforces invariants and prevents invalid states.

Homework / Exercises

Redesign a struct-based API to use opaque types.
Add precondition checks to every public function in a module.

Solutions

Move struct definition to .c, expose only pointer and functions.
Add if (!ptr) return ERR_INVALID; and state checks before transitions.

Glossary

Invariant: A rule that must always hold after any operation.
State: A snapshot of all values that influence program behavior.
Transition: A legal change from one state to another.
Idempotent: Safe to repeat without changing the outcome.
Reentrant: Safe to call again before a previous call completes.
Atomic: All-or-nothing update with no partial visibility.
Rollback: Undoing partial changes to return to a consistent state.
Write-Ahead Log: A log that records intent before applying changes.
Parser: A component that transforms bytes into structured data.
Framing: Determining message boundaries in a byte stream.

Why Control Flow & State Matters

The Modern Problem It Solves

Most catastrophic production failures come from state transitions gone wrong: an update that leaves an API in an invalid state, a resource leak that accumulates over time, or a parser that accepts malformed input and desynchronizes. These issues are disproportionately costly because they are hard to reproduce and often require full system recovery.

Real-world impact (recent data):

Average data breach cost is $4.88 million (2024), which grows rapidly with outage duration and incident response overhead (IBM Cost of a Data Breach 2024) - https://www.ibm.com/reports/data-breach
The July 2024 CrowdStrike outage affected 8.5 million Windows devices, demonstrating how a single state management failure can cascade globally (Microsoft Windows blog, July 2024) - https://blogs.windows.com/windows-insider/2024/07/20/helping-our-customers-restore-access-to-windows-devices/

Why This Matters to Your Career

Senior engineering interviews test state reasoning constantly: connection pool design, undo/redo models, concurrency hazards, and crash safety. If you can explain state transitions and invariant enforcement, you can explain why your systems stay correct under stress.

Old Approach vs. Explicit State Discipline

Implicit State (Old)                 Explicit State (New)
┌────────────────────────┐          ┌─────────────────────────┐
│ Flags scattered        │          │ Explicit enum states     │
│ Hidden transitions     │          │ Transition guards        │
│ Cleanup ad-hoc         │          │ Single cleanup paths     │
│ Bugs appear "later"    │          │ Bugs prevented earlier   │
└────────────────────────┘          └─────────────────────────┘

Context & Evolution (Brief)

C intentionally exposes raw control flow and memory management. That makes it the ideal environment to learn these disciplines without a safety net. Modern languages add features (RAII, exceptions, GC), but the mental models still rely on explicit control flow and state reasoning.

Concept Summary Table

Concept Cluster	What You Need to Internalize
State Machines	Enumerate states, legal transitions, and invariants explicitly.
Control Flow Discipline	Every exit path cleans up, and errors are propagated consistently.
Resource Lifecycle	Acquire/use/release and ownership rules are explicit and enforced.
Temporal Correctness	Valid sequences, reentrancy, and retry rules are defined.
Protocol Parsing	Streaming parsers maintain state and enforce framing.
Undo/Redo & History	History is state; redo is conditional and bounded by order.
Crash Safety & Atomicity	Persistent updates are all-or-nothing with recovery logic.
API Design & Invariants	Public APIs enforce state transitions and prevent misuse.

Project-to-Concept Map

Project	What It Builds	Primer Chapters It Uses
Project 1: Modal Text Editor	UI state machine + invariants	1, 2, 3, 6, 7, 8
Project 2: HTTP/1.1 Parser	Streaming protocol state machine	1, 2, 3, 5, 8
Project 3: Connection Pool	Resource lifecycle + concurrency safety	2, 3, 4, 8
Project 4: Undo/Redo Engine	History as state	4, 6
Project 5: Embedded Sensor FSM	Deterministic states + error recovery	1, 2, 3, 4, 8
Project 6: Git-like VCS	Durable state + complex transitions	1, 2, 3, 4, 6, 7, 8

Deep Dive Reading by Concept

State Machines & Control Flow

Concept	Book & Chapter	Why This Matters
FSM fundamentals	“Code” by Charles Petzold — Ch. 14	Builds intuition for explicit state representation.
Control flow discipline	“Effective C” by Robert Seacord — Ch. 8	Error handling patterns in C.
UNIX APIs and error codes	“The Linux Programming Interface” — Ch. 3-5	Practical error handling and system calls.

Resource Lifecycle & Ownership

Concept	Book & Chapter	Why This Matters
Ownership rules	“C Interfaces and Implementations” — Ch. 4-6	Module boundaries and resource handling.
Memory + resource management	“Computer Systems: A Programmer’s Perspective” — Ch. 9	Memory behavior and failure modes.

Protocol Parsing & Networking

Concept	Book & Chapter	Why This Matters
HTTP parsing patterns	“TCP/IP Illustrated, Vol. 1” — Ch. 28	Practical HTTP behavior and pitfalls.
Socket I/O	“UNIX Network Programming Vol. 1” — Ch. 3-5	Streaming I/O and framing.

Crash Safety & Durable State

Concept	Book & Chapter	Why This Matters
Atomic filesystem operations	“The Linux Programming Interface” — Ch. 5	Atomic rename and fsync behavior.
Crash safety design	“Designing Data-Intensive Applications” — Ch. 7	Transaction semantics and recovery patterns.

Quick Start: Your First 48 Hours

Day 1 (4 hours):

Read Chapters 1 and 2 of the primer.
Build a minimal state machine for the modal editor (Project 1).
Implement only mode switching and a status bar.

Day 2 (4 hours):

Add buffer editing and cursor invariants.
Implement save with temp file + rename.
Run valgrind and fix all leaks.

End of Weekend: You can explain what a state machine is, why cleanup paths matter, and you have a working modal editor skeleton.

Recommended Learning Paths

Path 1: The State Machine First (Recommended)

Best for: Learners new to explicit state modeling.

Project 4: Undo/Redo Engine (small, clear state transitions)
Project 1: Modal Text Editor (full UI state machine)
Project 3: Connection Pool (resource lifecycle + concurrency)
Projects 2, 5, 6 in any order

Path 2: The Protocol & Parsing Path

Best for: Network or parsing enthusiasts.

Project 2: HTTP/1.1 Parser
Project 1: Modal Text Editor (command parsing)
Project 6: Git-like VCS (object parsing)

Path 3: Embedded Reliability Path

Best for: Embedded or hardware-focused engineers.

Project 5: Embedded Sensor FSM
Project 1: Modal Text Editor (state machine discipline)
Project 3: Connection Pool (resource lifecycle)

Path 4: The Completionist

Best for: Those building full mastery.

Phase 1: Projects 4, 1
Phase 2: Projects 2, 3
Phase 3: Project 5
Phase 4: Project 6 (final synthesis)

Success Metrics

You can draw the state machine for each project before coding.
Every project passes its Definition of Done checklist.
You can explain where each project enforces invariants and why.
You can recover cleanly after intentional crash tests.
You can explain why a given error path cannot leak resources.

Project Overview Table

Project	Difficulty	Time	Primary Skill	Best For
Modal Text Editor	Advanced	2-3 weeks	UI state machines + invariants	Visual state transitions
HTTP Parser	Advanced	2-3 weeks	Streaming parser design	Protocol correctness
Connection Pool	Advanced	2-3 weeks	Resource lifecycle + concurrency	API robustness
Undo/Redo Engine	Intermediate	1-2 weeks	Temporal reasoning	Command pattern mastery
Embedded Sensor FSM	Intermediate	2-3 weeks	Deterministic states + recovery	Hardware reliability
Git-like VCS	Expert	4-6 weeks	Durable state + complex transitions	Systems synthesis

Project List

Projects are ordered from fundamental understanding to advanced implementations.

File: SPRINT_3_CONTROL_FLOW_STATE_PROJECTS.md
Programming Language: C
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 3: Advanced
Knowledge Area: State Machines / Text Processing
Software or Tool: Finite State Machine
Main Book: “Code: The Hidden Language” by Charles Petzold

What you’ll build: A terminal-based text editor with distinct modes (NORMAL, INSERT, COMMAND) that edits real files and displays changes in real-time.

Why it teaches Control Flow & State: A modal editor is a textbook state machine. Every keypress triggers different behavior depending on current mode. You’ll face reentrancy when a command triggers another command, must maintain invariants (cursor never outside buffer bounds), and handle cleanup paths (what happens when user quits with unsaved changes?).

Core challenges you’ll face:

State transition validity (can’t go INSERT→COMMAND directly) → maps to state machines
Cursor invariants (cursor.x ≤ line_length, cursor.y ≤ total_lines at ALL times) → maps to loop invariants
Command reentrancy (:s/foo/bar/g triggers multiple buffer modifications) → maps to reentrancy
Dirty buffer cleanup (quit without save, crash recovery) → maps to cleanup paths
Undo/redo stack integrity (can’t undo into invalid state) → maps to temporal bugs

Key Concepts:

Finite State Machines: “Code: The Hidden Language” Chapter 14 - Charles Petzold
Loop Invariants: “Programming: Principles and Practice Using C++” Chapter 4.5 - Bjarne Stroustrup
Resource Cleanup in C: “Effective C” Chapter 8 - Robert C. Seacord
Terminal Raw Mode: “The Linux Programming Interface” Chapter 62 - Michael Kerrisk

Difficulty: Intermediate Time estimate: 2-3 weeks Prerequisites: Basic C, terminal I/O basics, understanding of file operations

Real world outcome:

You will have a working text editor you can actually use to edit code
Open a file: ./myvi README.md
Press i to enter INSERT mode, type text, press ESC for NORMAL mode
Press :w to save, :q to quit
The mode indicator changes visibly in the status bar

Learning milestones:

Mode switching works → You understand explicit state representation
Cursor never goes out of bounds → You’ve internalized invariant maintenance
:wq works even if file doesn’t exist yet → You understand cleanup and error paths
Undo/redo doesn’t corrupt buffer → You’ve mastered temporal state consistency

Real World Outcome

When you complete this project, you will have built a working modal text editor that you can actually use to edit files on your system. Here’s exactly what the user experience looks like:

Starting the editor:

$ ./myvi README.md

The terminal clears and you see:

README.md                                    [NORMAL] Line 1/45 Col 1
--------------------------------------------------------------------------------
# My Project

This is a test file.
TODO: Add more content here.


~
~
~
--------------------------------------------------------------------------------
-- NORMAL MODE --                                              Modified: No

The mode indicator changes dynamically:

Bottom left shows -- NORMAL MODE -- in white
When you press i, it changes to -- INSERT MODE -- in green
When you press :, it shows -- COMMAND MODE -- and a command prompt appears at the bottom
Press ESC from any mode to return to NORMAL

Editing workflow:

# User presses 'i' to enter INSERT mode
-- INSERT MODE --

# User types: "New line here"
# The text appears character-by-character on screen

# User presses ESC to return to NORMAL mode
-- NORMAL MODE --                                              Modified: Yes

# User presses ':w' and ENTER to save
:w
"README.md" 47 lines written

# User presses ':q' to quit
:q
$

Real-time cursor tracking:

README.md                                    [NORMAL] Line 12/47 Col 23
--------------------------------------------------------------------------------
# My Project

This is a test file.
TODO: Add more content here.
New line here█
                  ↑
           cursor position

Handling unsaved changes:

# User edits file, presses ':q' without saving
:q
Error: No write since last change (use :q! to override)

# User presses ':q!'
:q!
$  # File closed without saving

Navigation commands in NORMAL mode:

h,j,k,l → Move cursor left, down, up, right (like Vim)
w → Jump to next word
0 → Jump to start of line
$ → Jump to end of line
gg → Jump to first line
G → Jump to last line

The cursor NEVER goes out of bounds:

Line with text here█
                   ↑
              cursor at position 19

# Press 'l' (move right)
Line with text here█
                   ↑
        cursor stays at end, doesn't move beyond

# Delete characters with 'x' in NORMAL mode
Line with text her█

# Cursor automatically adjusts to new line length

Command mode examples:

:w           → Save file
:q           → Quit (fails if unsaved changes)
:wq          → Save and quit
:q!          → Quit without saving
:12          → Jump to line 12
:s/old/new   → Substitute 'old' with 'new' on current line

Visual feedback for operations:

# After saving:
"README.md" 47 lines, 1234 bytes written      [shows for 2 seconds]

# After invalid command:
Error: Unknown command ':xyz'                  [shows at bottom]

# After trying to quit with unsaved changes:
Error: No write since last change (use :q! to override)

What makes this a “real” editor:

Opens actual files from filesystem: ./myvi test.c
Displays file content accurately with proper line breaks
Changes appear instantly as you type (INSERT mode) or navigate (NORMAL mode)
Status bar shows current mode, line number, column number, and modification status
Saves changes back to the filesystem
Handles files that don’t exist yet (creates them on :w)
Terminal returns to normal state on exit (no corruption of terminal)

The state machine in action:

[NORMAL MODE]
     |
     | press 'i'
     v
[INSERT MODE]
     |
     | press ESC
     v
[NORMAL MODE]
     |
     | press ':'
     v
[COMMAND MODE]
     |
     | press ENTER (execute command)
     | OR press ESC (cancel)
     v
[NORMAL MODE]

Error handling you’ll see:

# Try to open non-existent file:
$ ./myvi /nonexistent/path/file.txt
Error: Cannot open file: No such file or directory
# Editor starts with empty buffer, can still edit and save

# Try to save to read-only file:
:w
Error: Permission denied: Cannot write to file
# Buffer remains open, user can :w! or :w /different/path

# Disk full during save:
:w
Error: Write failed: No space left on device
# Original file remains intact, changes in buffer preserved

This is what separates a toy project from a real tool: every edge case is handled, the user experience is polished, and the state machine guarantees correctness.

The Core Question You’re Answering

“How do you build a system where the same input (keypress) produces different behavior depending on context (mode), while guaranteeing that the system never enters an invalid state?”

This is the fundamental challenge of modal interfaces and state machines:

In a text editor, pressing j in NORMAL mode means “move cursor down”
In INSERT mode, pressing j means “insert the letter ‘j’”
In COMMAND mode, pressing j is part of a command string (:jump)

The deeper questions:

State Representation: How do you represent “which mode am I in” in code?
- Enum? String? Integer constant?
- Where is this state stored?
- Who can modify it?
Transition Validity: Can you go from any state to any other state?
- Can you go INSERT → COMMAND directly? (No, must go through NORMAL)
- What if the user tries an invalid transition?
- How do you enforce valid transitions in code?
Invariant Preservation: What must ALWAYS be true, regardless of state?
- cursor.x must be ≤ current line length
- cursor.y must be < total number of lines
- If buffer is empty, cursor must be at (0, 0)
- How do you maintain these across all operations?
Reentrancy: What if an operation triggers another operation?
- Command :s/foo/bar/g (substitute all) triggers multiple buffer modifications
- Each modification might trigger cursor adjustment
- How do you prevent infinite loops or state corruption?
Cleanup on Exit: What happens when the user quits?
- If buffer is dirty (modified), warn the user
- If buffer is clean, exit immediately
- If writing to disk fails, what state should we be in?
- How do you restore terminal to normal mode on exit (even if editor crashes)?

This project teaches you that “control flow” isn’t just if/else—it’s about designing systems where:

Invalid states are impossible to represent (not just “checked at runtime”)
Transitions have explicit preconditions and postconditions
Cleanup happens regardless of how you exit (normal quit, error, signal)

Real-world parallel: This is exactly how operating systems manage process states (RUNNING, BLOCKED, READY), how network protocols manage connection states (SYN_SENT, ESTABLISHED, FIN_WAIT), and how UI frameworks manage application states (LOADING, READY, ERROR).

If you understand modal editors, you understand the mental model behind every state machine in computing.

Concepts You Must Understand First

Before you write a single line of code, you need to internalize these foundational concepts. A modal editor is where theory meets brutal reality.

1. Finite State Machines (FSMs)

What it is: A system that can be in exactly one state at any time, with well-defined transitions between states.

Why you need it: Your editor has three modes (NORMAL, INSERT, COMMAND). At any moment, you are in exactly one mode. Keypresses trigger transitions.

Pre-study:

“Code: The Hidden Language” by Charles Petzold, Chapter 14: “Feedback and Flip-Flops”
- Pages 192-210: How binary state is represented in hardware
- Pages 211-224: Building stateful circuits
- Key insight: State machines are hardware-level concepts, not just software abstractions

Mental model:

State = current mode (enum: NORMAL, INSERT, COMMAND)
Input = keypress (char or special key)
Transition = function that takes (current_state, input) → new_state

The test: Can you draw the state transition diagram on paper?

        i (insert)
NORMAL ─────────────> INSERT
  ^                     |
  |                     |
  └─────────────────────┘
        ESC (escape)

If you can’t draw it, you don’t understand your system.

2. Loop Invariants

What it is: A condition that is true before and after every iteration of a loop, or more generally, a property that is always true at certain program points.

Why you need it: Your cursor position must satisfy invariants:

cursor.y >= 0 and cursor.y < num_lines
cursor.x >= 0 and cursor.x <= line_length[cursor.y]

Pre-study:

“Programming: Principles and Practice Using C++” by Bjarne Stroustrup, Chapter 4.5: “Logic Errors”
- Pages 120-135: What invariants are and why they matter
- Key insight: “An invariant is something that must be true at a given point in a computation”

Mental model:

// INVARIANT: 0 <= cursor.y < buffer.num_lines
// INVARIANT: 0 <= cursor.x <= buffer.lines[cursor.y].length

void move_cursor_down() {
    // Before: invariant holds

    cursor.y++;

    // After: must restore invariant
    if (cursor.y >= buffer.num_lines) {
        cursor.y = buffer.num_lines - 1;  // clamp
    }

    // After: invariant holds again
}

The test: After every cursor operation, can you assert the invariants hold?

3. Resource Cleanup in C

What it is: Ensuring that resources (file handles, memory, terminal state) are released properly, even when errors occur.

Why you need it: Your editor must:

Open files → close them on exit
Switch terminal to raw mode → restore it on exit (even on crash)
Allocate memory for buffer → free it on exit

Pre-study:

“Effective C” by Robert C. Seacord, Chapter 8: “Input/Output”
- Pages 221-245: File handling patterns
- Pages 246-260: Error handling with goto cleanup
- Key pattern: The goto cleanup idiom for C error handling

Mental model:

int edit_file(const char *filename) {
    FILE *f = NULL;
    char *buffer = NULL;
    int result = -1;

    f = fopen(filename, "r");
    if (!f) goto cleanup;

    buffer = malloc(MAX_SIZE);
    if (!buffer) goto cleanup;

    // ... do work ...

    result = 0;  // success

cleanup:
    if (buffer) free(buffer);
    if (f) fclose(f);
    return result;
}

The test: Can you trace every resource and guarantee it’s freed on all exit paths?

4. Terminal Raw Mode and Control Sequences

What it is: By default, terminals buffer input and interpret control characters (Ctrl+C, Ctrl+Z). Raw mode disables this, giving you direct access to keypresses.

Why you need it: You need to capture ESC, arrow keys, Ctrl+C yourself—not let the shell intercept them.

Pre-study:

“The Linux Programming Interface” by Michael Kerrisk, Chapter 62: “Terminals”
- Pages 1319-1340: Terminal attributes and raw mode
- Pages 1341-1360: ANSI escape sequences for cursor control
- Key functions: tcgetattr(), tcsetattr(), cfmakeraw()

Mental model:

struct termios orig_termios;

void enable_raw_mode() {
    tcgetattr(STDIN_FILENO, &orig_termios);  // save original

    struct termios raw = orig_termios;
    cfmakeraw(&raw);  // modify to raw mode

    tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);  // apply
}

void disable_raw_mode() {
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios);  // restore
}

// CRITICAL: Must call disable_raw_mode() even on crash!
// Use atexit() or signal handlers

The test: Does your terminal still work after your program crashes?

5. Buffer Data Structures

What it is: How you store the text being edited. Common choices:

Array of strings (simple, inefficient for large files)
Gap buffer (what Emacs uses)
Piece table (what Word uses)
Rope (what Xi editor uses)

Why you need it: You’ll be inserting/deleting characters, splitting lines, joining lines—constantly modifying the buffer.

Pre-study:

“Programming Pearls” by Jon Bentley, Chapter 3: “Data Structures”
- Pages 33-50: Choosing the right data structure
- Start simple: Array of strings is fine for this project

Mental model:

typedef struct {
    char **lines;       // array of strings
    size_t num_lines;   // how many lines
    size_t capacity;    // allocated capacity
    bool dirty;         // has unsaved changes
} Buffer;

// Operations:
void buffer_insert_char(Buffer *buf, int y, int x, char c);
void buffer_delete_char(Buffer *buf, int y, int x);
void buffer_insert_line(Buffer *buf, int y);
void buffer_delete_line(Buffer *buf, int y);

The test: Can you insert 1000 characters without corrupting the buffer?

6. Error Propagation Without Exceptions

What it is: C doesn’t have exceptions. You must manually check return values and propagate errors up the call stack.

Why you need it: File I/O can fail (disk full, permissions). You must handle these without crashing.

Pre-study:

“C Interfaces and Implementations” by David Hanson, Chapter 4: “Exceptions and Assertions”
- Pages 41-62: Error handling patterns in C
- Key insight: Return codes are your only tool—use them rigorously

Mental model:

// Pattern 1: Return -1 on error, 0 on success
int save_buffer(Buffer *buf, const char *filename) {
    FILE *f = fopen(filename, "w");
    if (!f) return -1;

    for (size_t i = 0; i < buf->num_lines; i++) {
        if (fprintf(f, "%s\n", buf->lines[i]) < 0) {
            fclose(f);
            return -1;
        }
    }

    fclose(f);
    return 0;
}

// Pattern 2: errno for detailed error
if (save_buffer(buf, filename) < 0) {
    fprintf(stderr, "Save failed: %s\n", strerror(errno));
}

The test: Does your editor tell the user why the save failed?

Prerequisite Knowledge Checklist (be honest with yourself):

I can write a simple FSM (e.g., traffic light simulator) in C
I understand what “invariant” means and can identify invariants in existing code
I’ve used malloc/free and understand memory leaks
I’ve done file I/O in C (fopen, fread, fwrite, fclose)
I know what “raw mode” means for terminals
I’ve used tcgetattr/tcsetattr at least once
I understand ANSI escape sequences (e.g., \033[2J clears screen)
I can handle errors in C without exceptions (return codes, errno)

If you checked all boxes: Proceed. If you missed 1-2: Study those topics first, then proceed. If you missed 3+: Build a simpler project first (e.g., a command-line calculator with state).

Questions to Guide Your Design

Before you start coding, think deeply about these questions. Your answers will shape your architecture.

State Machine Design

How will you represent the current mode?
- An enum? A global variable? Passed as a parameter?
- Where does this state live in your program?
- Who is allowed to change it?
What are the valid state transitions?
- Draw the state diagram on paper
- Can you go INSERT → COMMAND directly, or must you go through NORMAL?
- What happens if the user tries an invalid transition?
How will you handle input differently based on mode?
- One giant switch statement?
- Separate functions per mode (handle_normal_input, handle_insert_input)?
- Function pointers in a dispatch table?
Can the system ever be in an undefined mode?
- What if mode is initialized to 0 but your enum starts at 1?
- How do you guarantee mode is always valid?

Invariants and Correctness

What invariants must hold for the cursor?
- Write them as assertions: assert(cursor.y >= 0 && cursor.y < num_lines);
- After which operations might these invariants be violated?
- How will you restore them?
What invariants must hold for the buffer?
- “Every line is null-terminated”
- “num_lines matches actual number of allocated lines”
- “dirty flag is true iff buffer != file contents”
What happens when the buffer is empty?
- Is num_lines == 0? Or num_lines == 1 with an empty line?
- Where is the cursor? (0, 0)?
- Can you insert text in an empty buffer?
What happens when you delete the last character on a line?
- Line becomes empty string ""?
- Line is deleted and merged with previous line?
- Cursor adjusts to where?

Error Handling

What if the file doesn’t exist?
- Start with empty buffer?
- Show error and exit?
- Create file on first save?
What if you can’t open the file (permissions)?
- Read-only mode?
- Error and exit?
- How do you communicate this to the user?
What if writing to disk fails?
- Disk full?
- Permissions changed during editing?
- Do you lose the buffer? Or keep it in memory?
What if terminal is too small?
- Minimum 80x24?
- Refuse to run?
- Render as much as possible?

Resource Management

How will you manage terminal state?
- Save original termios on startup?
- Restore on exit?
- What about signals (Ctrl+C, SIGTERM)?
How will you clean up on crash?
- Register atexit() handler?
- Signal handler for SIGINT, SIGTERM?
- What if you get SIGKILL? (You can’t catch it—that’s why you need atomic saves!)
How will you prevent memory leaks?
- Every malloc paired with free?
- Who owns the buffer? (Who frees it?)
- What about reallocation when buffer grows?

User Experience

How will you display the mode to the user?
- Status bar at bottom?
- Color coding?
- What if terminal doesn’t support color?
How will you show unsaved changes?
- [Modified] indicator?
- * in status bar?
- How do you detect “buffer is dirty”?
What commands will you support?
- :w (save), :q (quit), :wq (save and quit)?
- :q! (force quit)?
- :12 (jump to line 12)?
- How will you parse command strings?

Reentrancy and Edge Cases

What if the user presses ESC rapidly?
- Does it toggle modes?
- Or always return to NORMAL regardless of current mode?
- What’s the safest behavior?
What if a command triggers another command?
- Example: :s/foo/bar/g (substitute all) loops over buffer
- Each substitution modifies buffer
- How do you avoid infinite loops?
What if the user presses Ctrl+Z (suspend)?
- Terminal state must be restored
- On resume, must reapply raw mode
- How do you handle this?

Testing Strategy

How will you test state transitions?
- Write a test that cycles: NORMAL → INSERT → NORMAL → COMMAND → NORMAL
- Verify mode is correct after each transition
How will you test invariants?
- Sprinkle assert() statements throughout code
- Write a check_invariants() function, call it after every operation
- If an assertion fails, you’ve found a bug
How will you test error paths?
- Create a read-only file, try to save to it
- Fill disk (simulate with ulimit), try to save
- Send SIGINT while in INSERT mode

Your job: Write down answers to these questions before coding. This is your design document.

Thinking Exercise (Before You Code)

Do not touch your keyboard yet. Spend 30-60 minutes on paper answering these:

Exercise 1: State Diagram

Draw the complete state transition diagram for your editor. Include:

All states (NORMAL, INSERT, COMMAND)
All transitions (labeled with triggering input)
Self-loops (e.g., typing ‘a’ in INSERT mode stays in INSERT)

Example:

         i
NORMAL ────> INSERT
  ^  ^        |
  │  │        │ ESC
  │  └────────┘
  │
  │ :
  v
COMMAND
  │
  │ ENTER or ESC
  v
NORMAL

Questions:

Can you reach INSERT from COMMAND without going through NORMAL?
What happens if you press : in INSERT mode? (Is it a transition, or is it inserted as text?)

Exercise 2: Invariant Violations

Given this code:

void delete_current_line(Buffer *buf, Cursor *cursor) {
    if (buf->num_lines > 1) {
        free(buf->lines[cursor->y]);

        // Shift all subsequent lines up
        for (size_t i = cursor->y; i < buf->num_lines - 1; i++) {
            buf->lines[i] = buf->lines[i + 1];
        }

        buf->num_lines--;
    }
}

Questions:

What invariants might be violated after this function runs?
Where should the cursor be after deleting the current line?
What if you delete line 10 but the cursor is at line 12?
What if you delete the last line in the file?

Fix the code to restore all invariants.

Exercise 3: Error Path Tracing

Given this code:

int save_buffer(Buffer *buf, const char *filename) {
    FILE *f = fopen(filename, "w");
    if (!f) return -1;

    for (size_t i = 0; i < buf->num_lines; i++) {
        fprintf(f, "%s\n", buf->lines[i]);
    }

    fclose(f);
    buf->dirty = false;
    return 0;
}

Questions:

What if fprintf fails halfway through (disk full)?
Is the file left corrupted?
Is buf->dirty set correctly?
How would you make this function “atomic” (all-or-nothing)?

Rewrite the function to handle errors correctly.

Exercise 4: Reentrancy Scenario

Imagine you implement a :s/old/new/g command (substitute all occurrences on current line).

Internally, it does:

void substitute_all(Buffer *buf, int line, const char *old, const char *new) {
    char *pos = buf->lines[line];
    while ((pos = strstr(pos, old)) != NULL) {
        // Replace 'old' with 'new'
        buffer_replace(buf, line, pos - buf->lines[line], old, new);
        pos += strlen(new);
    }
}

Questions:

What if buffer_replace() resizes the line (reallocates memory)?
Does pos become a dangling pointer?
How do you fix this?

Rewrite the function to be reentrant-safe.

Exercise 5: Resource Cleanup

Your editor has these resources:

termios orig_termios (saved terminal state)
Buffer *buf (allocated buffer)
FILE *file (open file handle)

Questions:

What’s the correct cleanup order on exit?
How do you ensure cleanup happens even if:
- User presses Ctrl+C?
- Your code hits an assertion and aborts?
- User sends SIGTERM?
Write pseudo-code for cleanup.

Answer:

void cleanup() {
    // Restore terminal FIRST (so user can type again)
    disable_raw_mode();

    // Close file
    if (file) fclose(file);

    // Free buffer
    if (buf) buffer_free(buf);
}

int main() {
    atexit(cleanup);  // Runs on normal exit
    signal(SIGINT, signal_handler);  // Ctrl+C
    signal(SIGTERM, signal_handler);  // kill

    // ... editor logic ...
}

void signal_handler(int sig) {
    cleanup();
    exit(1);
}

After completing these exercises, you’ll have:

A clear mental model of the state machine
An understanding of invariants and how to maintain them
A strategy for error handling
A plan for resource cleanup

Only then should you start coding.

The Interview Questions They’ll Ask

If you complete this project thoroughly, you’ll be able to answer these real interview questions with confidence:

Question 1: State Machine Design

“Design a state machine for a vending machine that accepts coins, dispenses products, and gives change.”

Why this project prepares you:

You’ve built a state machine (NORMAL, INSERT, COMMAND)
You understand state transitions (what inputs trigger what state changes)
You know how to enforce valid transitions (can’t go INSERT → COMMAND directly)

Your answer: “I’d define states: IDLE, ACCEPTING_COINS, DISPENSING, GIVING_CHANGE. Transitions occur on events: insert_coin, select_product, dispense_complete. I’d use an enum for state and a switch statement to handle transitions, similar to how I handled mode switching in my text editor. Each transition would have preconditions (e.g., can’t dispense if insufficient funds) and postconditions (e.g., after dispensing, state is GIVING_CHANGE if overpaid, else IDLE).”

Question 2: Invariant Maintenance

“You’re building a sorted array data structure. How do you ensure it stays sorted after insertions and deletions?”

Why this project prepares you:

You maintained cursor invariants (cursor.y < num_lines, cursor.x ≤ line_length)
You know to check invariants after every operation
You understand “invariant” isn’t just a nice property—it’s a guarantee

Your answer: “I’d define the invariant: ‘For all i, array[i] ≤ array[i+1]’. After every insertion, I’d use binary search to find the correct position, shift elements, and insert. After every deletion, I’d shift elements to fill the gap. I’d assert the invariant holds after each operation during testing, similar to how I checked cursor bounds after every move operation in my editor.”

Question 3: Error Handling Without Exceptions

“In a language without exceptions (like C or Go), how do you handle errors from deeply nested function calls?”

Why this project prepares you:

You propagated file I/O errors (fopen, fprintf, fclose) up the call stack
You used return codes and errno
You implemented cleanup with goto cleanup

Your answer: “I’d use return codes: 0 for success, -1 for error. Each function checks the return value of functions it calls and propagates errors upward. For cleanup, I’d use the goto cleanup pattern in C, or defer in Go. In my text editor, save_buffer() had to handle fopen failure, fprintf failure, and ensure fclose() always ran—I used a cleanup label to guarantee this.”

Question 4: Reentrancy

“What is a reentrant function? Give an example of a non-reentrant function and how to fix it.”

Why this project prepares you:

You dealt with reentrancy in substitute_all (buffer modification during iteration)
You know that global state and pointers can become invalid

Your answer: “A reentrant function can be called again before the previous call completes, without corrupting state. Example of non-reentrant:

char *result;
char* to_upper(const char *str) {
    static char buffer[256];  // GLOBAL STATE!
    strcpy(buffer, str);
    // ... convert to uppercase ...
    return buffer;
}

If to_upper() is called recursively, the static buffer is shared, causing corruption. Fix: allocate buffer on stack or heap per call. In my editor, I hit this when buffer_replace() could reallocate memory while iterating—I had to cache indices instead of pointers.”

Question 5: Cleanup Paths

“Your function acquires three resources: a lock, a file handle, and a memory buffer. Halfway through, an error occurs. How do you ensure all resources are cleaned up?”

Why this project prepares you:

You managed terminal state, file handles, and memory
You used goto cleanup to ensure all resources released

Your answer: “I’d use the goto cleanup pattern:

int process() {
    lock_acquired = false;
    file = NULL;
    buffer = NULL;

    acquire_lock();
    lock_acquired = true;

    file = fopen(...);
    if (!file) goto cleanup;

    buffer = malloc(...);
    if (!buffer) goto cleanup;

    // ... work ...

cleanup:
    if (buffer) free(buffer);
    if (file) fclose(file);
    if (lock_acquired) release_lock();
    return result;
}

This guarantees cleanup happens in reverse order of acquisition, even on error. My editor did this for terminal raw mode, file handles, and buffer memory.”

Question 6: Temporal Bugs

“What is a ‘use-after-free’ bug? How can you prevent it?”

Why this project prepares you:

You worried about dangling pointers after buffer reallocation
You know that “it worked yesterday” means state changed

Your answer: “Use-after-free: accessing memory after it’s been freed. Example:

char *ptr = malloc(100);
free(ptr);
printf("%s", ptr);  // BUG!

Prevention: Set pointers to NULL after freeing. Use tools like Valgrind. In my editor, if I deleted a line and the cursor pointed to it, I had to adjust the cursor immediately—otherwise I’d access freed memory.”

Question 7: API Design

“Design an API for a text buffer that makes it impossible to get the cursor out of bounds.”

Why this project prepares you:

You enforced cursor invariants
You know good APIs prevent misuse, not just detect it

Your answer: “Instead of exposing cursor coordinates directly, provide:

void move_cursor_up(Buffer *buf);
void move_cursor_down(Buffer *buf);
int get_cursor_y(Buffer *buf);  // read-only

Internally, move functions clamp cursor to valid range. Users can’t set cursor to invalid position because they have no direct access. This is ‘making invalid states unrepresentable’. My editor did this by encapsulating cursor in the Buffer struct and providing only movement functions.”

Question 8: State Explosion

“You’re building a UI with 5 boolean flags (is_logged_in, is_admin, is_premium, is_loading, has_error). How many possible states are there? How would you manage this complexity?”

Why this project prepares you:

You know 5 booleans = 2^5 = 32 states, but not all are valid
You learned to use enums for explicit states

Your answer: “5 booleans = 32 combinations, but many are invalid (e.g., is_admin but not is_logged_in). I’d use an enum for high-level states:

enum AppState { LOGGED_OUT, LOGGED_IN_USER, LOGGED_IN_ADMIN, LOADING, ERROR };

This reduces 32 states to 5 valid ones. Orthogonal concerns (is_loading, has_error) might be separate flags, but the core state machine is explicit. My editor had 3 modes (enum), plus separate dirty flag (bool)—the mode was the primary state machine.”

If you can answer these questions, you’ve internalized the core concepts. Interviewers will be impressed because you’re not reciting theory—you’re sharing experience from a real project.

Hints in Layers (When You’re Stuck)

Hint Layer 1: High-Level Architecture

Click if you're stuck on "where do I even start?"

Start with the main loop:

int main(int argc, char **argv) {
    // 1. Initialize
    enable_raw_mode();
    Buffer *buf = buffer_load(argv[1]);  // or empty buffer
    Cursor cursor = {0, 0};
    EditorMode mode = NORMAL;

    // 2. Main loop
    while (running) {
        render_screen(buf, cursor, mode);

        int c = read_key();  // blocking read

        switch (mode) {
            case NORMAL:
                handle_normal_mode(buf, &cursor, &mode, c);
                break;
            case INSERT:
                handle_insert_mode(buf, &cursor, &mode, c);
                break;
            case COMMAND:
                handle_command_mode(buf, &cursor, &mode, c);
                break;
        }
    }

    // 3. Cleanup
    disable_raw_mode();
    buffer_free(buf);
    return 0;
}

Key insight: The main loop is dead simple. All complexity is in the handler functions.

Hint Layer 2: State Representation

Click if you're stuck on "how do I represent mode?"

Use an enum:

typedef enum {
    MODE_NORMAL,
    MODE_INSERT,
    MODE_COMMAND
} EditorMode;

EditorMode current_mode = MODE_NORMAL;

Why enum, not int or string?

Type-safe: compiler catches typos
Explicit: only 3 valid values
Fast: single integer comparison

State transitions:

void transition_to_insert(EditorMode *mode) {
    if (*mode != MODE_NORMAL) {
        return;  // invalid transition, ignore
    }
    *mode = MODE_INSERT;
}

Key insight: Make invalid transitions impossible by checking preconditions.

Hint Layer 3: Cursor Invariants

Click if you're stuck on "how do I keep cursor in bounds?"

Define invariants as assertions:

void assert_cursor_invariants(Buffer *buf, Cursor *cursor) {
    assert(cursor->y >= 0);
    assert(cursor->y < buf->num_lines);
    assert(cursor->x >= 0);
    assert(cursor->x <= strlen(buf->lines[cursor->y]));
}

Call after every operation:

void move_cursor_right(Buffer *buf, Cursor *cursor) {
    cursor->x++;

    // Restore invariant
    size_t line_len = strlen(buf->lines[cursor->y]);
    if (cursor->x > line_len) {
        cursor->x = line_len;
    }

    assert_cursor_invariants(buf, cursor);
}

Key insight: Check invariants obsessively during development. Remove assertions for release build.

Hint Layer 4: Terminal Raw Mode

Click if you're stuck on "how do I capture ESC and arrow keys?"

Enable raw mode:

struct termios orig_termios;

void enable_raw_mode() {
    tcgetattr(STDIN_FILENO, &orig_termios);

    struct termios raw = orig_termios;
    raw.c_lflag &= ~(ECHO | ICANON | ISIG | IEXTEN);
    raw.c_iflag &= ~(IXON | ICRNL | BRKINT | INPCK | ISTRIP);
    raw.c_oflag &= ~(OPOST);
    raw.c_cflag |= (CS8);
    raw.c_cc[VMIN] = 0;
    raw.c_cc[VTIME] = 1;

    tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
}

void disable_raw_mode() {
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios);
}

Read keys:

int read_key() {
    char c;
    while (read(STDIN_FILENO, &c, 1) != 1);  // blocking read

    if (c == '\x1b') {  // ESC sequence
        char seq[3];
        if (read(STDIN_FILENO, &seq[0], 1) != 1) return '\x1b';
        if (read(STDIN_FILENO, &seq[1], 1) != 1) return '\x1b';

        if (seq[0] == '[') {
            switch (seq[1]) {
                case 'A': return ARROW_UP;
                case 'B': return ARROW_DOWN;
                case 'C': return ARROW_RIGHT;
                case 'D': return ARROW_LEFT;
            }
        }
        return '\x1b';
    }

    return c;
}

Key insight: Arrow keys are escape sequences: ESC [ A for up, ESC [ B for down, etc.

Hint Layer 5: Buffer Operations

Click if you're stuck on "how do I insert/delete characters?"

Insert character in the middle of a line:

void buffer_insert_char(Buffer *buf, int y, int x, char c) {
    char *line = buf->lines[y];
    size_t len = strlen(line);

    // Reallocate to make room for 1 more char + null terminator
    line = realloc(line, len + 2);
    buf->lines[y] = line;

    // Shift characters right
    memmove(&line[x + 1], &line[x], len - x + 1);

    // Insert character
    line[x] = c;

    buf->dirty = true;
}

Delete character:

void buffer_delete_char(Buffer *buf, int y, int x) {
    char *line = buf->lines[y];
    size_t len = strlen(line);

    if (x >= len) return;  // nothing to delete

    // Shift characters left
    memmove(&line[x], &line[x + 1], len - x);

    buf->dirty = true;
}

Key insight: Use memmove (not memcpy) because source and destination overlap.

Hint Layer 6: Command Parsing

Click if you're stuck on "how do I parse :w, :q, :wq?"

Accumulate command string:

char command_buf[256] = {0};
int command_len = 0;

void handle_command_mode(Buffer *buf, Cursor *cursor, EditorMode *mode, int c) {
    if (c == '\r' || c == '\n') {  // ENTER
        execute_command(buf, command_buf);
        command_buf[0] = '\0';
        command_len = 0;
        *mode = MODE_NORMAL;
    } else if (c == '\x1b') {  // ESC
        command_buf[0] = '\0';
        command_len = 0;
        *mode = MODE_NORMAL;
    } else if (c == 127 || c == '\b') {  // BACKSPACE
        if (command_len > 0) {
            command_buf[--command_len] = '\0';
        }
    } else {
        command_buf[command_len++] = c;
        command_buf[command_len] = '\0';
    }
}

Execute command:

void execute_command(Buffer *buf, const char *cmd) {
    if (strcmp(cmd, "w") == 0) {
        save_buffer(buf, buf->filename);
    } else if (strcmp(cmd, "q") == 0) {
        if (buf->dirty) {
            show_message("Error: No write since last change (use :q! to override)");
        } else {
            running = false;
        }
    } else if (strcmp(cmd, "wq") == 0) {
        save_buffer(buf, buf->filename);
        running = false;
    } else if (strcmp(cmd, "q!") == 0) {
        running = false;
    } else {
        show_message("Error: Unknown command");
    }
}

Key insight: Command mode is just string accumulation until ENTER.

Hint Layer 7: Rendering

Click if you're stuck on "how do I draw the screen?"

Clear screen and reposition cursor:

void render_screen(Buffer *buf, Cursor cursor, EditorMode mode) {
    // Clear screen
    write(STDOUT_FILENO, "\x1b[2J", 4);

    // Move cursor to top-left
    write(STDOUT_FILENO, "\x1b[H", 3);

    // Draw each line
    for (size_t i = 0; i < buf->num_lines; i++) {
        printf("%s\r\n", buf->lines[i]);
    }

    // Draw status bar
    printf("\r\n-- ");
    switch (mode) {
        case MODE_NORMAL:  printf("NORMAL"); break;
        case MODE_INSERT:  printf("INSERT"); break;
        case MODE_COMMAND: printf("COMMAND"); break;
    }
    printf(" MODE --");

    // Position cursor at correct location
    char buf_cursor[32];
    snprintf(buf_cursor, sizeof(buf_cursor), "\x1b[%d;%dH", cursor.y + 1, cursor.x + 1);
    write(STDOUT_FILENO, buf_cursor, strlen(buf_cursor));
}

ANSI escape sequences:

\x1b[2J → Clear screen
\x1b[H → Move cursor to (1, 1)
\x1b[<row>;<col>H → Move cursor to (row, col)

Key insight: ANSI codes are how you control the terminal. No ncurses needed!

Hint Layer 8: Cleanup

Click if you're stuck on "how do I restore terminal on exit?"

Use atexit() for normal exit:

void cleanup() {
    disable_raw_mode();

    // Clear screen on exit
    write(STDOUT_FILENO, "\x1b[2J", 4);
    write(STDOUT_FILENO, "\x1b[H", 3);
}

int main() {
    atexit(cleanup);
    enable_raw_mode();
    // ... editor logic ...
}

Use signal handlers for Ctrl+C:

void signal_handler(int sig) {
    cleanup();
    exit(0);
}

int main() {
    signal(SIGINT, signal_handler);
    signal(SIGTERM, signal_handler);
    // ...
}

Key insight: atexit() runs on normal exit. Signal handlers catch Ctrl+C and kills. Both call cleanup().

Books That Will Help

This table maps specific challenges to chapters in authoritative books. Read the chapters before or during implementation.

Challenge	Book & Chapter	Pages	What You’ll Learn
Understanding finite state machines	“Code: The Hidden Language” by Charles Petzold, Chapter 14	192-224	How state is represented in hardware; flip-flops and latches; binary state transitions
Loop invariants and assertions	“Programming: Principles and Practice Using C++” by Bjarne Stroustrup, Chapter 4.5	120-135	What invariants are; how to identify them; using `assert()` for debugging
Resource cleanup in C	“Effective C” by Robert C. Seacord, Chapter 8	221-260	File handling patterns; the `goto cleanup` idiom; ensuring resources are freed
Terminal raw mode	“The Linux Programming Interface” by Michael Kerrisk, Chapter 62	1319-1360	Terminal attributes; `tcgetattr`/`tcsetattr`; ANSI escape sequences
Error handling without exceptions	“C Interfaces and Implementations” by David Hanson, Chapter 4	41-62	Return code patterns; propagating errors; using `errno`
State machine design patterns	“Language Implementation Patterns” by Terence Parr, Chapter 2	15-40	Explicit state vs implicit state; state transition tables; dispatch patterns
Buffer data structures	“Programming Pearls” by Jon Bentley, Chapter 3	33-50	Choosing the right data structure; gap buffers vs arrays vs ropes
Signal handling	“The Linux Programming Interface” by Michael Kerrisk, Chapter 20	411-446	Signal handlers; signal safety; cleaning up on signals
Memory management	“Effective C” by Robert C. Seacord, Chapter 4	71-110	Dynamic memory allocation; avoiding leaks; `valgrind` usage
Command pattern	“Design Patterns” by Gang of Four, Command chapter	233-242	Encapsulating actions as objects; undo/redo; command queues

Reading strategy:

Before coding: Read “Code” Chapter 14, “Programming Principles” Chapter 4.5, “Effective C” Chapter 8
When implementing terminal I/O: Read “Linux Programming Interface” Chapter 62
When stuck on state transitions: Read “Language Implementation Patterns” Chapter 2
When implementing undo/redo (optional): Read “Design Patterns” Command chapter

Online resources (free):

ANSI Escape Sequences Cheat Sheet
VT100 Terminal Codes
Build Your Own Text Editor (in C) — excellent step-by-step tutorial

Key insight: Don’t try to read entire books. Read the specific chapters that address your current challenge. Books are tools, not novels.

Common Pitfalls & Debugging

This section addresses the most common bugs you’ll encounter when building a modal text editor. Learn from others’ mistakes!

Problem 1: “Cursor moves beyond end of line when switching modes”

Symptom:

Line with text here█     # Cursor at column 19 (end of line)
# Press 'i' for INSERT mode
Line with text here     █ # Cursor now at column 20 (invalid!)
# Type 'x'
Line with text herex█     # Character inserted at wrong position

Why this happens:

In NORMAL mode, cursor can be at line_length - 1 (last character)
In INSERT mode, cursor can be at line_length (past last character, for appending)
When switching modes, you forgot to adjust cursor position

Fix:

void enter_insert_mode(Editor *ed) {
    ed->mode = MODE_INSERT;

    // If cursor is at end of line in NORMAL mode (last character),
    // move it one position right for INSERT mode (past last character)
    Line *line = get_current_line(ed);
    if (ed->cursor.x == line->length && line->length > 0) {
        ed->cursor.x = line->length;  // Position for appending
    }
}

Quick test:

# Create file with single line "hello"
$ echo "hello" > test.txt
$ ./myvi test.txt
# Move cursor to end (position 4, the 'o')
# Press 'i' → cursor should move to position 5 (past 'o')
# Type 'x' → should append, producing "hellox"

Problem 2: “Segfault when undoing the first change”

Symptom:

$ ./myvi test.txt
# Make a change (insert character)
# Press 'u' to undo
Segmentation fault (core dumped)

Why this happens:

Undo stack is empty before first change
undo() pops from empty stack without checking
Null pointer dereference when accessing stack->top

Fix:

void undo(Editor *ed) {
    if (ed->undo_stack->count == 0) {
        display_message(ed, "Already at oldest change");
        return;  // Don't crash, just notify user
    }

    UndoNode *node = pop_undo(ed->undo_stack);
    // ... apply undo ...
}

Quick test:

$ ./myvi empty.txt
# Press 'u' immediately → should show "Already at oldest change"
# Make a change, press 'u' → should undo successfully
# Press 'u' again → should show "Already at oldest change" again

Problem 3: “File corrupted after crash during save”

Symptom:

$ ./myvi important.txt
# Edit file extensively
# Press ':w'
# Power loss or kill -9 during write
# File now contains partial/corrupted data

Why this happens:

Opening file with "w" mode truncates it immediately
If write fails midway, original data is lost
No atomic write strategy

Fix:

int save_file(Editor *ed, const char *path) {
    // Write to temporary file first
    char temp_path[PATH_MAX];
    snprintf(temp_path, sizeof(temp_path), "%s.tmp", path);

    FILE *f = fopen(temp_path, "w");
    if (!f) return -1;

    // Write all lines
    for (size_t i = 0; i < ed->buffer->line_count; i++) {
        fprintf(f, "%s\n", ed->buffer->lines[i]->data);
    }

    if (fclose(f) != 0) {
        unlink(temp_path);  // Delete temp file on failure
        return -1;
    }

    // Atomic rename (POSIX guarantees atomicity)
    if (rename(temp_path, path) != 0) {
        unlink(temp_path);
        return -1;
    }

    return 0;
}

Quick test:

# Simulate crash during save
$ ./myvi test.txt
# Edit file, press ':w'
# In another terminal: kill -9 $(pgrep myvi)
# Original file should be intact (tmp file may exist)

Problem 4: “Terminal corrupted when editor crashes”

Symptom:

$ ./myvi test.txt
# Segfault occurs
Segmentation fault (core dumped)
$ ls    # Terminal doesn't show input, Ctrl+C doesn't work
        # Terminal appears frozen

Why this happens:

Editor sets terminal to raw mode (disables line buffering, echoing)
Crash happens before restoring terminal to normal mode
Terminal remains in raw mode even after program exits

Fix:

#include <signal.h>

void restore_terminal_on_exit(void) {
    // Restore terminal attributes
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &original_termios);

    // Clear screen and move cursor to bottom
    printf("\033[2J\033[H");
    fflush(stdout);
}

int main(int argc, char **argv) {
    // Register cleanup function
    atexit(restore_terminal_on_exit);

    // Register signal handlers for crashes
    signal(SIGSEGV, handle_signal);
    signal(SIGINT, handle_signal);
    signal(SIGTERM, handle_signal);

    // ... editor code ...
}

void handle_signal(int sig) {
    restore_terminal_on_exit();
    exit(sig);
}

Quick test:

$ ./myvi test.txt
# Press Ctrl+C → terminal should restore correctly
# Force segfault (intentionally corrupt memory)
# Terminal should still restore (may need 'reset' command as backup)

Problem 5: “Memory leak when opening large files repeatedly”

Symptom:

$ valgrind ./myvi largefile.txt
# Open, edit, save, quit multiple times
==12345== 524,288 bytes in 1 blocks are definitely lost
==12345==    at 0x4C2DB8F: malloc
==12345==    by 0x400850: load_file (editor.c:234)

Why this happens:

load_file() allocates buffer for file contents
When quitting or opening new file, buffer isn’t freed
Each file open leaks previous buffer

Fix:

void close_buffer(Buffer *buf) {
    if (!buf) return;

    // Free each line
    for (size_t i = 0; i < buf->line_count; i++) {
        free(buf->lines[i]->data);
        free(buf->lines[i]);
    }

    // Free line array
    free(buf->lines);

    // Free buffer itself
    free(buf);
}

void editor_quit(Editor *ed) {
    close_buffer(ed->buffer);  // Must free buffer!
    restore_terminal_on_exit();
    exit(0);
}

Quick test:

$ valgrind --leak-check=full ./myvi test.txt
# Edit, save, quit normally
# Should show: "All heap blocks were freed -- no leaks are possible"

Problem 6: “Command mode doesn’t handle backspace correctly”

Symptom:

# In COMMAND mode
:wqsomething    # Typed ':wq' then 'something'
# Press backspace to correct
:wqsomething^?^?^?    # Backspace prints '^?' characters instead of deleting

Why this happens:

Terminal is in raw mode
Backspace key sends ASCII 127 (DEL) or ASCII 8 (BS)
Your code doesn’t recognize it as backspace

Fix:

void command_mode_handle_key(Editor *ed, int key) {
    if (key == 127 || key == 8) {  // Backspace or DEL
        if (ed->command_buffer_len > 0) {
            ed->command_buffer[--ed->command_buffer_len] = '\0';

            // Move cursor back, print space, move back again
            printf("\b \b");
            fflush(stdout);
        }
    }
    else if (key == '\n' || key == '\r') {  // Enter
        execute_command(ed, ed->command_buffer);
        ed->command_buffer_len = 0;
        ed->mode = MODE_NORMAL;
    }
    else if (key >= 32 && key < 127) {  // Printable characters
        if (ed->command_buffer_len < MAX_COMMAND_LEN - 1) {
            ed->command_buffer[ed->command_buffer_len++] = key;
            ed->command_buffer[ed->command_buffer_len] = '\0';
            printf("%c", key);
            fflush(stdout);
        }
    }
}

Quick test:

$ ./myvi test.txt
# Press ':' to enter COMMAND mode
# Type 'wq'
# Press backspace twice → should delete 'q' and 'w'
# Type 'w' and ENTER → should save file

Problem 7: “Reentrancy bug: :s/foo/bar/g crashes with infinite recursion”

Symptom:

$ ./myvi test.txt
# File contains: "foo foo foo"
# Execute: :s/foo/bar/g
Segmentation fault (stack overflow)

Why this happens:

Substitution command calls buffer_replace()
buffer_replace() triggers undo stack push
Undo stack push calls buffer_copy() to save state
buffer_copy() inadvertently triggers another substitution (if poorly designed)
Infinite recursion

Fix:

void substitute_command(Editor *ed, const char *pattern, const char *replacement) {
    // Disable undo recording during bulk operations
    ed->undo_recording_disabled = true;

    for (size_t i = 0; i < ed->buffer->line_count; i++) {
        // Perform substitution
        replace_in_line(ed->buffer->lines[i], pattern, replacement);
    }

    // Re-enable undo recording
    ed->undo_recording_disabled = false;

    // Record the ENTIRE substitution as a single undo step
    push_undo(ed, create_undo_node(ed, UNDO_SUBSTITUTE, ...));
}

Quick test:

$ echo -e "foo\nfoo\nfoo" > test.txt
$ ./myvi test.txt
# Execute :s/foo/bar/g
# Should succeed without crash
# Press 'u' → should undo ALL substitutions as a single step

Debugging Checklist

When your editor behaves incorrectly, check these systematically:

State machine invariants:
- Is ed->mode one of the valid enum values?
- Is the current state allowed to transition to the next state?
- Are you checking the mode before handling each keypress?
Cursor invariants:
- Is cursor.x always < line_length (or <= line_length in INSERT mode)?
- Is cursor.y always < total_lines?
- Does cursor adjust when deleting characters or lines?
Buffer invariants:
- Are all lines null-terminated?
- Is line_count always accurate?
- Are there any dangling pointers after deleting lines?
Resource cleanup:
- Is every malloc() paired with a free()?
- Is every fopen() paired with fclose()?
- Does the goto cleanup pattern handle ALL error paths?
Terminal handling:
- Is terminal restored to normal mode on ALL exit paths?
- Are signal handlers registered for SIGINT, SIGTERM, SIGSEGV?
- Is atexit() used to ensure cleanup on normal exit?

Run valgrind after every significant change:

$ valgrind --leak-check=full --track-origins=yes ./myvi test.txt

Use gdb to catch crashes:

$ gdb ./myvi
(gdb) run test.txt
# Crash occurs
(gdb) backtrace    # See where it crashed
(gdb) print ed->mode    # Inspect state
(gdb) print ed->cursor  # Check cursor position

Definition of Done

The editor has explicit NORMAL, INSERT, and COMMAND states, and invalid transitions are rejected.
Cursor invariants hold for every command (never outside buffer bounds).
Save uses temp-file + atomic rename; kill -9 during save never corrupts the original file.
Undo/redo works; redo stack clears after any new edit.
Errors for invalid commands and unsaved quit paths are displayed without crashing.
Terminal is restored on normal exit and on SIGINT/SIGTERM/SIGSEGV.
valgrind --leak-check=full reports zero leaks after open/edit/save/quit loops.
All tests in the debugging checklist pass.

Project 2: HTTP/1.1 Protocol Parser

File: SPRINT_3_CONTROL_FLOW_STATE_PROJECTS.md
Main Programming Language: C
Alternative Programming Languages: Rust, Go, Zig
Coolness Level: Level 3: Genuinely Clever
Business Potential: Level 1: The “Resume Gold”
Difficulty: Level 3: Advanced (The Engineer)
Knowledge Area: Networking, Parsing, State Machines
Software or Tool: HTTP Server
Main Book: TCP/IP Illustrated, Volume 1 by W. Richard Stevens

What you’ll build: A from-scratch HTTP/1.1 request parser that reads bytes from a socket and produces structured request objects, handling chunked encoding, keep-alive, and malformed requests gracefully.

Why it teaches Control Flow & State: HTTP parsing is a state machine with strict transition rules. You’ll implement states like PARSING_METHOD, PARSING_HEADERS, PARSING_BODY, with explicit transitions. Error handling is critical—you must return proper 400/500 errors without crashing or leaking resources.

Core challenges you’ll face:

Incremental parsing (data arrives in chunks, parser must pause/resume) → maps to state transitions
Malformed input handling (return error code, don’t crash, clean up) → maps to error propagation
Buffer management (don’t overflow, don’t leak, reuse properly) → maps to resource patterns
Keep-alive connection reuse (reset parser state correctly for next request) → maps to reentrancy
Content-Length vs Chunked (different parsing modes, must not mix) → maps to control flow beyond if/else

Key Concepts:

State Machine Design: “Language Implementation Patterns” Chapter 2 - Terence Parr
Error Codes vs Exceptions: “C Interfaces and Implementations” Chapter 4 - David Hanson
Network Protocol Parsing: “TCP/IP Illustrated, Volume 1” Chapter 14 - W. Richard Stevens
Defensive Parsing: “Practical Binary Analysis” Chapter 3 - Dennis Andriesse

Difficulty: Intermediate-Advanced Time estimate: 2-3 weeks Prerequisites: Basic C, socket programming basics, familiarity with HTTP

Real world outcome:

Run your parser as a simple server: ./http_parser 8080
Open browser to http://localhost:8080/hello
See your parser correctly identify: method=GET, path=/hello, headers={…}
Send malformed requests with curl and see proper error responses, no crashes

Learning milestones:

Simple GET requests parse correctly → You understand basic state machines
Chunked encoding works → You’ve handled complex control flow
Malformed requests return 400, server keeps running → You’ve mastered error propagation
Keep-alive handles 100 requests without leaks → You understand resource lifecycle

Real World Outcome

When you complete this project, you will have a streaming HTTP/1.1 parser that behaves correctly under partial reads and malformed input.

Start the server:

$ ./http_parser 8080
[LISTEN] 0.0.0.0:8080
[STATE] WAITING_FOR_REQUEST

Valid request (netcat):

$ printf 'GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n' | nc 127.0.0.1 8080
HTTP/1.1 200 OK
Content-Length: 5
Connection: close

hello

Server trace output:

[PARSE] START_LINE: GET /hello HTTP/1.1
[PARSE] HEADER: Host=localhost
[PARSE] END_HEADERS
[PARSE] BODY: 0 bytes
[STATE] REQUEST_COMPLETE

Chunked encoding test:

$ printf 'POST /upload HTTP/1.1\r\nHost: localhost\r\nTransfer-Encoding: chunked\r\n\r\n5\r\nhello\r\n0\r\n\r\n' | nc 127.0.0.1 8080
HTTP/1.1 200 OK
Content-Length: 2
Connection: close

ok

Malformed input (no CRLF) is rejected safely:

$ printf 'GET /bad HTTP/1.1\r\nHost: localhost\r\n\r\n' | nc 127.0.0.1 8080
HTTP/1.1 400 Bad Request
Content-Length: 11
Connection: close

bad request

The Core Question You’re Answering

“How do you parse a byte stream into complete HTTP/1.1 requests without losing synchronization, even when input is partial or malformed?”

You will learn to build a parser that is strict, streaming, and recoverable.

Concepts You Must Understand First

HTTP/1.1 Message Syntax (RFC 9112)
- How request lines, headers, and bodies are framed
- https://www.rfc-editor.org/rfc/rfc9112
HTTP Semantics (RFC 9110)
- Meaning of headers and connection behavior
- https://www.rfc-editor.org/rfc/rfc9110
Streaming Parsing
- Partial reads and buffering
- Book Reference: “UNIX Network Programming Vol. 1” — Ch. 3-5
Buffer Limits & Safety
- Header size limits and body size enforcement
- Book Reference: “TCP/IP Illustrated, Vol. 1” — Ch. 28

Questions to Guide Your Design

Parser State: What are your explicit states, and what resets them?
Buffering: How do you handle partial CRLF across reads?
Framing: How do you decide body length (Content-Length vs chunked)?
Error Policy: When do you return 400 vs. close immediately?
Limits: What happens if headers exceed your size limit?

Thinking Exercise

Trace this input stream on paper:

Read #1: "POST /x HTTP/1.1\r\nHost: a\r\nContent-Length: 5\r\n\r\nhe"
Read #2: "lloGET /next HTTP/1.1\r\nHost: b\r\n\r\n"

Questions:

After Read #1, what state are you in?
How many body bytes are still needed?
After Read #2, what bytes belong to the next request?

The Interview Questions They’ll Ask

“How do you prevent desynchronization in a streaming parser?”
“Explain chunked transfer encoding.”
“Why must CRLF be enforced strictly?”
“How do you handle pipelined requests on keep-alive?”
“What limits do you enforce to prevent DoS?”
“How would you fuzz test this parser?”

Hints in Layers

Hint 1: Start with strict parser states

typedef enum { P_START_LINE, P_HEADERS, P_BODY, P_CHUNK_LEN, P_CHUNK_DATA, P_DONE, P_ERROR } ParseState;

Hint 2: Build a line reader that returns NEED_MORE

ParseResult read_line(Buffer *b, char *out, size_t max) {
    char *crlf = buffer_find_crlf(b);
    if (!crlf) return NEED_MORE;
    // copy line, consume bytes
}

Hint 3: Track body counters explicitly

if (parser->content_length > MAX_BODY) return PARSE_ERROR;
parser->body_read += n;

Hint 4: Use a netcat test harness

$ printf 'GET / HTTP/1.1\r\nHost: a\r\n\r\n' | nc 127.0.0.1 8080

Books That Will Help

Topic	Book	Chapter
HTTP parsing	“TCP/IP Illustrated, Vol. 1” by Stevens	Ch. 28
Stream I/O	“UNIX Network Programming Vol. 1” by Stevens	Ch. 3-5
Error handling	“Effective C” by Seacord	Ch. 8
Parsing patterns	“Language Implementation Patterns” by Parr	Ch. 2
Defensive parsing	“Practical Binary Analysis” by Andriesse	Ch. 3

Standards:

RFC 9112 - https://www.rfc-editor.org/rfc/rfc9112
RFC 9110 - https://www.rfc-editor.org/rfc/rfc9110

Common Pitfalls & Debugging

Problem 1: “Accepting LF-only lines”

Why: You split on \n without enforcing CRLF.
Fix: Require \r\n; reject lone LF.
Quick test: Send \n-only headers and expect 400.

Problem 2: “Chunked body never completes”

Why: You forgot to consume the CRLF after each chunk.
Fix: After chunk data, consume the trailing CRLF.
Quick test: Two-chunk request should reach DONE state.

Problem 3: “Keep-alive mixes requests”

Why: Parser state not reset after DONE.
Fix: Clear per-request fields and reset to START_LINE.
Quick test: Send two requests in one TCP stream.

Problem 4: “Header overflow crash”

Why: No header size limit.
Fix: Enforce a max header length (e.g., 8KB) and return 431/400.
Quick test: Send a 64KB header; expect rejection without crash.

Definition of Done

Strict CRLF framing enforced for request line and headers.
Supports Content-Length and chunked encoding without desync.
Keep-alive parses 100 sequential requests on one connection.
Invalid input returns 400 and resets parser state safely.
Header and body size limits enforced.
Fuzz input never crashes the parser.
valgrind shows no leaks in parse loops.

Project 3: Database Connection Pool

File: SPRINT_3_CONTROL_FLOW_STATE_PROJECTS.md
Main Programming Language: C
Alternative Programming Languages: Rust, Go, Java
Coolness Level: Level 3: Genuinely Clever
Business Potential: Level 3: The “Service & Support” Model
Difficulty: Level 3: Advanced (The Engineer)
Knowledge Area: Concurrency, Resource Management
Software or Tool: PostgreSQL, pthreads
Main Book: C Programming: A Modern Approach by K.N. King

What you’ll build: A thread-safe connection pool manager that pre-allocates database connections, hands them out on request, reclaims them on release, handles timeouts, and recovers from dead connections.

Why it teaches Control Flow & State: A connection pool is pure state management under pressure. Each connection has states (IDLE, IN_USE, DEAD, VALIDATING). You’ll face reentrancy (what if validation triggers a callback that requests another connection?), temporal bugs (use-after-release), and must design an API that prevents misuse.

Core challenges you’ll face:

Connection state tracking (IDLE→IN_USE→IDLE, DEAD→REMOVED) → maps to state transitions
Acquire timeout (can’t wait forever, must propagate failure) → maps to error propagation
Use-after-release bugs (user keeps reference, pool reuses connection) → maps to temporal bugs
Cleanup on shutdown (all connections must close, even busy ones) → maps to cleanup paths
API that prevents misuse (can’t release twice, can’t use released connection) → maps to API design

Key Concepts:

Resource Pools: “Pattern-Oriented Software Architecture Vol. 3” Chapter on Resource Lifecycle Manager - Kircher & Jain
RAII and Cleanup: “Effective C” Chapter 8 - Robert C. Seacord
Thread Safety: “C Programming: A Modern Approach” (online supplement on POSIX threads) - K.N. King
Object Pooling: “Patterns of Enterprise Application Architecture” - Martin Fowler

Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: C with pthreads, basic SQL, understanding of mutex/condition variables

Real world outcome:

Your test program spawns 50 threads, each doing 100 database operations
Pool has only 10 connections, threads wait when exhausted
Console shows: [POOL] Connection 3 acquired by thread 12, [POOL] Connection 3 released
Killing the database and restarting shows: [POOL] Connection 5 marked DEAD, reconnecting...
No leaks, no deadlocks, no use-after-free

Learning milestones:

Basic acquire/release works → You understand resource lifecycle
Timeout returns error, doesn’t hang → You’ve implemented error propagation
50 threads, 10 connections, no deadlock → You understand reentrancy concerns
Double-release is caught, not silently corrupting → You’ve designed a misuse-proof API

Real World Outcome

When you run your connection pool implementation, here’s what you’ll see:

$ ./test_pool
[POOL] Initializing pool with 10 connections to database 'testdb'
[POOL] Connection 0: CONNECTING to localhost:5432... OK (state: IDLE)
[POOL] Connection 1: CONNECTING to localhost:5432... OK (state: IDLE)
[POOL] Connection 2: CONNECTING to localhost:5432... OK (state: IDLE)
...
[POOL] Connection 9: CONNECTING to localhost:5432... OK (state: IDLE)
[POOL] Pool initialized successfully. Available: 10/10

[TEST] Spawning 50 worker threads...
[THREAD 01] Requesting connection... acquired #3 (state: IDLE → IN_USE)
[THREAD 05] Requesting connection... acquired #7 (state: IDLE → IN_USE)
[THREAD 12] Requesting connection... acquired #2 (state: IDLE → IN_USE)
...
[THREAD 15] Requesting connection... WAITING (pool exhausted: 0/10 available)
[THREAD 23] Requesting connection... WAITING (pool exhausted: 0/10 available)

[THREAD 01] Executing query: SELECT * FROM users WHERE id=42
[THREAD 01] Query completed in 23ms
[THREAD 01] Releasing connection #3 (state: IN_USE → IDLE)

[THREAD 15] Wait satisfied, acquired #3 (state: IDLE → IN_USE)
[THREAD 23] Timeout after 5000ms, returning NULL (pool still exhausted)

[VALIDATION] Connection #5 idle for 300s, validating...
[VALIDATION] Test query: SELECT 1... OK (latency: 2ms)

[ERROR] Connection #7 query failed: server closed connection unexpectedly
[ERROR] Marking connection #7 as DEAD
[RECOVERY] Attempting to reconnect #7... FAILED (attempt 1/3)
[RECOVERY] Attempting to reconnect #7... FAILED (attempt 2/3)
[RECOVERY] Attempting to reconnect #7... SUCCESS (state: DEAD → IDLE)

[SHUTDOWN] Received SIGTERM, initiating graceful shutdown...
[SHUTDOWN] Waiting for 3 busy connections to complete...
[THREAD 08] Releasing connection #1 (state: IN_USE → IDLE)
[THREAD 12] Releasing connection #4 (state: IN_USE → IDLE)
[THREAD 31] Releasing connection #9 (state: IN_USE → IDLE)
[SHUTDOWN] All connections released. Closing pool...
[POOL] Connection 0: CLOSING... OK
[POOL] Connection 1: CLOSING... OK
...
[POOL] Connection 9: CLOSING... OK
[POOL] Pool destroyed. No leaks detected.

Summary:
  Total operations: 5000
  Successful: 4987
  Timeouts: 13
  Reconnections: 8
  Peak concurrency: 50 threads on 10 connections
  Average wait time: 127ms
  Memory leaks: 0

What makes this impressive:

Explicit state transitions: Every connection moves through IDLE → IN_USE → IDLE (or DEAD → IDLE) with logging, making the state machine visible
Controlled concurrency: 50 threads competing for 10 connections without deadlock shows proper mutex/condition variable usage
Graceful degradation: Timeouts don’t crash, dead connections auto-recover, showing robust error handling
Clean shutdown: Even with busy connections, shutdown waits then closes everything cleanly (no leaks)
Observable behavior: Rich logging reveals the internal state machine, making debugging and learning possible

The Core Question You’re Answering

“How do you share a limited pool of expensive resources among many concurrent clients without deadlock, starvation, or corruption?”

This question is fundamental to:

Web servers: Database connections are expensive (TCP handshake, authentication, memory). Opening one per request doesn’t scale. Pools reuse connections.
Thread pools: Creating threads is costly. Pools pre-allocate workers and queue tasks.
Memory pools: Frequent malloc/free fragments memory. Pools pre-allocate blocks.
GPU contexts: Creating OpenGL/Vulkan contexts is slow. Pools share contexts across render calls.

The database connection pool is the canonical example because:

Connections have real state (unlike memory blocks): they can be IDLE, IN_USE, DEAD, or VALIDATING
Concurrency is unavoidable: multiple threads genuinely need connections simultaneously
Failure modes are realistic: network partitions, database restarts, timeouts all happen in production
API misuse is easy: releasing twice, using after release, forgetting to release—all common bugs

By building this, you’ll understand:

Resource lifecycle management in concurrent environments
Thread synchronization primitives (mutexes, condition variables) in practice
API design that prevents misuse through types/patterns
Failure handling without exceptions (C-style error codes)
Temporal bugs: use-after-release, double-release, deadlock from incorrect lock ordering

Concepts You Must Understand First

Before writing a single line of code, ensure you grasp these foundational concepts:

1. Multi-threading Fundamentals (CRITICAL)

What it is: Multiple threads executing concurrently within the same process, sharing memory.

Why it matters: Your pool will be accessed by many threads simultaneously. Without understanding races, you’ll create subtle corruption bugs.

Key mental model:

// WITHOUT synchronization (WRONG):
// Thread A reads connections[0]->state == IDLE
// Thread B reads connections[0]->state == IDLE (same!)
// Thread A sets connections[0]->state = IN_USE
// Thread B sets connections[0]->state = IN_USE (corrupted!)
// Both threads think they own the same connection

// WITH synchronization (CORRECT):
pthread_mutex_lock(&pool->mutex);
if (connections[0]->state == IDLE) {
    connections[0]->state = IN_USE;
    // No other thread can execute this block simultaneously
}
pthread_mutex_unlock(&pool->mutex);

What to study:

Race conditions: Two threads modifying shared data without coordination
Atomic operations: Operations that complete indivisibly (can’t be interrupted mid-operation)
Memory visibility: Thread A’s writes may not be immediately visible to Thread B without synchronization

Resources:

“C Programming: A Modern Approach” online supplement on POSIX threads - K.N. King
“Programming with POSIX Threads” Chapters 1-3 - David Butenhof
“The Linux Programming Interface” Chapters 30-31 - Michael Kerrisk

2. Mutexes (Mutual Exclusion Locks) (CRITICAL)

What it is: A lock ensuring only one thread executes a critical section at a time.

Why it matters: Without mutexes, your pool’s state (which connections are free, which are busy) will become corrupted when threads race to acquire connections.

Key mental model:

typedef struct {
    Connection connections[MAX_CONN];
    int available_count;  // Shared state!
    pthread_mutex_t mutex; // Protects all shared state
} ConnectionPool;

// Every function that touches shared state must lock:
Connection* pool_acquire(ConnectionPool* pool) {
    pthread_mutex_lock(&pool->mutex);

    // Critical section: only one thread can be here
    if (pool->available_count > 0) {
        Connection* conn = find_idle_connection(pool);
        conn->state = IN_USE;
        pool->available_count--;
    }

    pthread_mutex_unlock(&pool->mutex);
}

Common mistakes:

Forgetting to unlock: Other threads will hang forever (deadlock)
Unlocking wrong mutex: Corruption continues
Holding locks too long: Performance degrades (threads queue up waiting)

Resources:

“Programming with POSIX Threads” Chapter 3.2 - David Butenhof
“The Linux Programming Interface” Chapter 30 - Michael Kerrisk

3. Condition Variables (CRITICAL)

What it is: A synchronization primitive that lets threads wait for a condition to become true, and notifies them when it does.

Why it matters: When the pool is exhausted, threads must wait (not busy-loop) until a connection becomes available. Condition variables implement this efficiently.

Key mental model:

typedef struct {
    Connection connections[MAX_CONN];
    pthread_mutex_t mutex;
    pthread_cond_t cond;  // Signals when a connection becomes available
} ConnectionPool;

// Thread waiting for a connection:
pthread_mutex_lock(&pool->mutex);
while (pool->available_count == 0) {  // Always check in a loop!
    pthread_cond_wait(&pool->cond, &pool->mutex);
    // This atomically: unlocks mutex, sleeps, re-locks mutex when woken
}
// Connection is now available
Connection* conn = find_idle_connection(pool);
pthread_mutex_unlock(&pool->mutex);

// Thread releasing a connection:
pthread_mutex_lock(&pool->mutex);
conn->state = IDLE;
pool->available_count++;
pthread_cond_signal(&pool->cond);  // Wake one waiting thread
pthread_mutex_unlock(&pool->mutex);

Why the while loop?: Spurious wakeups (thread wakes but condition not true) and broadcasts (multiple threads wake, only one gets resource).

Resources:

“Programming with POSIX Threads” Chapter 3.3 - David Butenhof
“The Linux Programming Interface” Chapter 30.2 - Michael Kerrisk

4. Resource Lifecycle States (CRITICAL)

What it is: A connection isn’t just “available” or “busy”—it has a richer state machine.

Why it matters: Dead connections must be detected and recovered. Long-idle connections may need validation. These states must be tracked explicitly.

State diagram:

[CONNECTING] → [IDLE] ⇄ [IN_USE]
                  ↓         ↓
              [VALIDATING] [DEAD]
                  ↓         ↓
                [IDLE]  [RECONNECTING] → [IDLE]

Implementation sketch:

typedef enum {
    CONN_CONNECTING,    // Initial connection in progress
    CONN_IDLE,          // Available for use
    CONN_IN_USE,        // Currently assigned to a thread
    CONN_VALIDATING,    // Testing if connection is still alive
    CONN_DEAD,          // Connection lost, needs reconnection
    CONN_RECONNECTING   // Recovery in progress
} ConnectionState;

typedef struct {
    PGconn* pg_conn;
    ConnectionState state;
    time_t last_used;    // For detecting idle timeout
    int error_count;     // For exponential backoff on failures
} Connection;

Transitions:

IDLE → IN_USE: Thread acquires connection
IN_USE → IDLE: Thread releases connection
IN_USE → DEAD: Query fails with unrecoverable error
IDLE → VALIDATING: Connection idle > threshold, test it
VALIDATING → IDLE: Test passed
VALIDATING → DEAD: Test failed
DEAD → RECONNECTING: Attempting recovery
RECONNECTING → IDLE: Recovery succeeded

Resources:

“Pattern-Oriented Software Architecture Vol. 3” Resource Lifecycle Manager pattern - Kircher & Jain
“Code Complete” Chapter 8 on state transitions - Steve McConnell

5. PostgreSQL libpq (or equivalent database client library) (IMPORTANT)

What it is: The C library for communicating with PostgreSQL. You’ll use it to open connections, execute queries, detect errors.

Why it matters: You need to know how to:

Open a connection: PGconn* PQconnectdb(const char* conninfo)
Check if connected: PQstatus(conn) == CONNECTION_OK
Execute a query: PGresult* PQexec(conn, "SELECT 1")
Detect connection loss: PQstatus(conn) == CONNECTION_BAD
Close a connection: PQfinish(conn)

Minimal example:

PGconn* conn = PQconnectdb("dbname=testdb user=postgres");
if (PQstatus(conn) != CONNECTION_OK) {
    fprintf(stderr, "Connection failed: %s\n", PQerrorMessage(conn));
    PQfinish(conn);
    return NULL;
}
// Connection is ready
PQfinish(conn);  // Must clean up

Resources:

PostgreSQL documentation: libpq - C Library
“PostgreSQL: Up and Running” Chapter 11 - Regina Obe & Leo Hsu

6. Timeout Handling (IMPORTANT)

What it is: When pool is exhausted, threads can’t wait forever—they must timeout and fail gracefully.

Why it matters: A buggy client that never releases connections would hang your entire application if acquire() never times out.

Implementation with pthread_cond_timedwait:

struct timespec timeout;
clock_gettime(CLOCK_REALTIME, &timeout);
timeout.tv_sec += 5;  // Wait up to 5 seconds

pthread_mutex_lock(&pool->mutex);
while (pool->available_count == 0) {
    int result = pthread_cond_timedwait(&pool->cond, &pool->mutex, &timeout);
    if (result == ETIMEDOUT) {
        pthread_mutex_unlock(&pool->mutex);
        return NULL;  // Timeout, no connection available
    }
}
// Connection available
pthread_mutex_unlock(&pool->mutex);

Resources:

“Programming with POSIX Threads” Chapter 3.3.4 - David Butenhof

Questions to Guide Your Design

Before implementing, answer these design questions. They’ll force you to think through the hard cases:

About the Pool Structure

Q: Should the pool pre-allocate all connections at init time, or create them lazily on first request?

Why it matters: Pre-allocation ensures predictable startup time but wastes resources if pool is rarely full. Lazy allocation is efficient but adds complexity (what if connection creation fails during acquire()?).

Hint: For learning, pre-allocate. Production pools often hybrid (create minimum at init, grow to max on demand).
Q: How do you store connections? Array? Linked list? Queue?

Why it matters: Data structure affects acquire() performance. Array with linear search is O(n). Free-list (linked list of idle connections) is O(1).

Hint: Start with array for simplicity. Optimize to free-list later.
Q: What happens if database is down when pool initializes?

Why it matters: Should pool_init() fail hard (return NULL), or succeed with zero connections and retry later?

Hint: Fail hard initially. Retry logic is an advanced feature.

About Thread Safety

Q: Do you need one global mutex protecting the entire pool, or per-connection mutexes?

Why it matters: Global mutex is simple but creates contention (threads queue even if different connections available). Per-connection mutexes are complex but allow parallelism.

Hint: Start global. Per-connection is premature optimization.
Q: What data is shared and needs protection?

Why it matters: If you forget to protect a shared variable (e.g., available_count), race conditions will corrupt your pool.

Hint: List all struct fields. Ask “Can two threads read/write this simultaneously?” If yes, protect with mutex.
Q: Can a thread hold the pool mutex while executing a slow database query?

Why it matters: If yes, the pool is effectively single-threaded (other threads block waiting for mutex). Disaster for performance.

Hint: Lock must be released before calling PQexec(). Only protect pool metadata (which connection is IN_USE), not the actual query execution.

About Resource Lifecycle

Q: What does “acquire” return? A raw PGconn*? Or a wrapper struct with metadata?

Why it matters: If raw pointer, how do you prevent use-after-release? Wrapper can embed a “generation number” to detect stale references.

Hint: Return wrapper PooledConnection { PGconn* conn; int pool_id; uint64_t generation; }. Check generation on release.
Q: What happens if a thread acquires a connection but crashes before releasing?

Why it matters: Connection is leaked forever, pool slowly exhausts.

Hint: No perfect solution in C. In C++, use RAII (destructor releases). In C, require explicit pool_release() + document carefully. Advanced: Use pthread cleanup handlers.
Q: Should connections self-validate periodically?

Why it matters: A connection idle for hours may have been silently closed by server. Returning it to a thread causes immediate error.

Hint: Add last_used timestamp. Before acquire, if now - last_used > threshold, execute “SELECT 1” to validate.

About Error Handling

Q: When a connection dies mid-query, should the pool auto-reconnect or mark it permanently dead?

Why it matters: Auto-reconnect adds complexity (exponential backoff? max retries?). But marking dead means pool shrinks over time.

Hint: Start simple: mark DEAD, return NULL to caller (they handle retry). Advanced: Background thread attempts reconnection.
Q: What error code do you return when pool is exhausted?

Why it matters: Caller must distinguish “temporary exhaustion” from “pool destroyed” from “internal error”.

Hint: Return NULL on timeout, set errno or pool->last_error.
Q: Can pool_release() fail? What if the connection passed is invalid?

Why it matters: Defensive programming. Release is critical for pool health—corruption here is catastrophic.

Hint: Validate connection ID against pool bounds. Check generation number. Log error, maybe assert() in debug builds.

About API Design

Q: Should release() take the connection pointer, or a handle/ID?

Why it matters: Pointers can be stale. Handles can be validated.

Hint: Pointer is simpler. Add validation (check if pointer is in pool’s array range, check state == IN_USE).
Q: What prevents double-release?

Why it matters: Releasing a connection twice corrupts the pool (two threads might get the same connection).

Hint: On release, check state == IN_USE. If not, it’s a bug. Log error and return EINVAL.
Q: How does shutdown work if threads still hold connections?

Why it matters: Forcibly closing connections while threads use them causes crashes.

Hint: pool_destroy() sets a shutting_down flag. Acquire returns NULL when flag set. Then wait for in_use_count to reach zero (with timeout). Finally close all connections.

Thinking Exercise: Trace Connection States on Paper

This exercise builds intuition for how the state machine behaves under concurrent access.

Setup:

Pool has 3 connections: C0, C1, C2 (all initially IDLE)
5 threads: T1, T2, T3, T4, T5
Timeline shows actions in chronological order

Exercise: For each event, update the state table and answer the questions.

Event Timeline:
T1 calls acquire()
T2 calls acquire()
T3 calls acquire()
T4 calls acquire() → must wait (pool exhausted)
T5 calls acquire() → must wait (pool exhausted)
T1 releases C0
T2 query fails, C1 marked DEAD
T4's wait times out (5 seconds elapsed)
T3 releases C2
Background thread attempts to reconnect C1
Reconnect succeeds, C1 → IDLE
T5's wait satisfied, acquires C1

State Table (fill this in):

Event	C0 State	C1 State	C2 State	T1	T2	T3	T4	T5	available_count	Waiting Threads
Start	IDLE	IDLE	IDLE	-	-	-	-	-	3	[]
After 1	IN_USE	IDLE	IDLE	has C0	-	-	-	-	2	[]
After 2	?	?	?	?	?	?	?	?	?	?
…

Questions to answer:

After event 4: T4 calls pthread_cond_wait(). What happens to the mutex during the wait?

Answer: Mutex is atomically unlocked by pthread_cond_wait(), allowing other threads to lock and release connections. When T4 wakes, mutex is automatically re-locked.
After event 6: T1 releases C0. How does T4 or T5 get notified?

Answer: Release calls pthread_cond_signal() (or broadcast()). One waiting thread wakes, re-checks condition (available_count > 0), acquires C0.
After event 7: C1 is DEAD. Should it count toward available_count?

Answer: No! Only IDLE connections are available. DEAD connections are excluded from the count. This is why tracking state is critical.
After event 8: T4 times out. What does it receive?

Answer: acquire() returns NULL. T4’s application code must handle this (retry? fail request? use fallback?).
After event 10: Reconnection happens. Who tracks this—pool or connection?

Answer: Connection’s state changes DEAD → RECONNECTING → IDLE. Pool’s available_count increments when state becomes IDLE. If reconnect fails, state returns to DEAD.
After event 12: T5 was waiting before reconnect. How did it get C1?

Answer: Reconnect changed C1 to IDLE and called pthread_cond_signal(). T5’s pthread_cond_wait() returned, loop re-checked condition (available_count > 0 now), acquired C1.

Key insights from this exercise:

State transitions and metadata (available_count) must stay in sync
Condition variables decouple “state changed” (signal) from “who gets resource” (first waiter to wake)
Timeouts allow graceful degradation but don’t free resources
Reconnection is asynchronous—waiting threads must re-check state when woken

The Interview Questions They’ll Ask

If you truly understand connection pools, you should be able to answer these questions in detail:

Conceptual Questions

“Why use a connection pool instead of opening a new database connection per request?”

What they’re testing: Do you understand the cost model?

Strong answer: “Database connections are expensive to create (TCP handshake, authentication, session initialization). For a web server handling 1000 req/sec, opening 1000 connections/sec would overwhelm the database. Pools pre-allocate 10-50 connections and reuse them, amortizing the setup cost. Trade-off: requests may wait for an available connection, but throughput is much higher.”
“What happens if all connections are busy when a request arrives?”

What they’re testing: Do you understand queueing vs. rejection?

Strong answer: “Depends on design. Options: (1) Block with timeout—thread waits up to N seconds then fails. (2) Reject immediately—return error, let caller retry. (3) Queue the request—hold it in a queue until connection available. I implemented (1) because it balances responsiveness (timeout prevents infinite hangs) with utilization (gives transient load spikes a chance to resolve).”
“How do you detect a dead connection?”

What they’re testing: Do you understand failure modes?

Strong answer: “Three ways: (1) Error on use—query fails with CONNECTION_BAD, mark connection DEAD immediately. (2) Proactive validation—before returning an idle connection, execute ‘SELECT 1’ to test liveness. (3) Idle timeout—connections unused for >5 minutes might be closed by firewall, validate before reuse. I use (1) always, (2) for long-idle connections.”

Design Questions

“Walk me through what happens when a thread calls pool_acquire() and the pool is exhausted.”

What they’re testing: Do you understand condition variables?

Strong answer:

Thread locks pool mutex
Checks available_count—it's 0
Enters while loop: pthread_cond_timedwait(&pool->cond, &pool->mutex, &timeout)
This atomically unlocks mutex and sleeps
Another thread releases a connection, calls pthread_cond_signal(), wakes us
Our thread re-locks mutex, re-checks condition (available_count > 0 now)
Exits loop, finds IDLE connection, marks IN_USE, decrements available_count
Unlocks mutex, returns connection

“How do you prevent use-after-release bugs?”

What they’re testing: Can you design APIs that prevent misuse?

Strong answer: “Three techniques: (1) Opaque handles—return a handle ID, not a raw pointer. Release invalidates the handle. (2) Generation numbers—each connection has a generation counter. Incremented on release. Release checks caller’s generation matches. (3) State validation—on release, assert connection state is IN_USE. If not, it’s already released or never acquired. I use (3) because it’s simple and catches bugs immediately.”
“What’s your shutdown strategy if threads still hold connections?”

What they’re testing: Do you handle cleanup paths?

Strong answer: “Set a shutting_down flag. New acquires immediately return NULL. Then wait (with timeout) for in_use_count to reach zero. If timeout expires, log which threads are stuck (helps debug). Finally, close all connections. Trade-off: graceful shutdown is slow but safe. Forceful shutdown (close immediately) risks crashes in active threads.”

Concurrency Questions

“Can two threads acquire the same connection simultaneously? How do you prevent it?”

What they’re testing: Do you understand critical sections?

Strong answer: “No, because the entire acquire operation is protected by a mutex. Only one thread can execute the critical section (checking available_count, finding IDLE connection, marking IN_USE) at a time. The mutex ensures mutual exclusion.”
“Why use a while loop with pthread_cond_wait() instead of if?”

What they’re testing: Do you know about spurious wakeups?

Strong answer: “Two reasons: (1) Spurious wakeups—thread can wake even if condition not signaled. (2) Multiple waiters—signal wakes one thread, but by the time it re-locks mutex, another thread might have taken the connection. The while loop ensures we re-check the condition after waking. Never assume the condition is true just because wait returned.”
“What if a thread acquires a connection, then the thread is killed (SIGKILL)?”

What they’re testing: Do you understand unrecoverable failures?

Strong answer: “The connection leaks—it’s marked IN_USE forever, pool shrinks by one. No way to prevent this in C (SIGKILL can’t be caught). Mitigations: (1) Use process-level pools (pool in separate process, connections handed via sockets). (2) Add health checks—if connection unused for >1 hour, mark DEAD and recover. (3) In production, prevent SIGKILL—use SIGTERM and handle gracefully.”

Performance Questions

“Your pool has 100 threads and 10 connections. Where’s the bottleneck?”

What they’re testing: Do you understand contention?

Strong answer: “The pool mutex. All 100 threads serialize on acquire/release, even though only 10 can have connections at once. Optimization: Use thread-local caching (each thread keeps last connection), only return to pool on thread exit. Or partition pool (10 sub-pools of 1 connection each, threads hash to a sub-pool). Trade-off: Complexity vs. reduced contention.”
“How would you measure pool health in production?”

What they’re testing: Do you think about observability?

Strong answer: “Metrics: (1) Average wait time—high means pool too small. (2) Timeout rate—frequent timeouts mean undersized pool or slow queries. (3) Connection churn—high DEAD→RECONNECTING means network issues. (4) Utilization—if always <50%, pool oversized. I’d expose these via Prometheus and alert if wait time >100ms.”

Hints in Layers

These hints are progressive—try to implement without them first, then consult if stuck.

Layer 1: Initial Structure (Start Here)

Click to reveal basic structure hints

Hint 1.1: Your pool struct needs at least:

typedef struct {
    Connection* connections;    // Array of connections
    int capacity;               // Total number of connections
    int available_count;        // How many are IDLE
    pthread_mutex_t mutex;      // Protects all shared state
    pthread_cond_t cond;        // Signals when connection available
    bool shutting_down;         // Flag for graceful shutdown
} ConnectionPool;

Hint 1.2: Your connection struct needs:

typedef struct {
    PGconn* pg_conn;            // PostgreSQL connection handle
    ConnectionState state;      // IDLE, IN_USE, DEAD, etc.
    time_t last_used;           // For validation logic
    int pool_index;             // Position in pool's array (for validation)
} Connection;

Hint 1.3: Initialization skeleton:

ConnectionPool* pool_create(const char* conn_string, int capacity) {
    ConnectionPool* pool = malloc(sizeof(ConnectionPool));
    pool->connections = calloc(capacity, sizeof(Connection));
    pool->capacity = capacity;
    pool->available_count = 0;
    pthread_mutex_init(&pool->mutex, NULL);
    pthread_cond_init(&pool->cond, NULL);

    for (int i = 0; i < capacity; i++) {
        // TODO: Connect to database, set state to IDLE
        // Increment available_count on success
    }

    return pool;
}

Layer 2: Basic Acquire/Release (Core Functionality)

Click to reveal acquire/release hints

Hint 2.1: Acquire without timeout (simpler):

Connection* pool_acquire(ConnectionPool* pool) {
    pthread_mutex_lock(&pool->mutex);

    while (pool->available_count == 0) {
        pthread_cond_wait(&pool->cond, &pool->mutex);
    }

    // Find first IDLE connection
    Connection* conn = NULL;
    for (int i = 0; i < pool->capacity; i++) {
        if (pool->connections[i].state == CONN_IDLE) {
            conn = &pool->connections[i];
            break;
        }
    }

    // Mark as IN_USE
    conn->state = CONN_IN_USE;
    conn->last_used = time(NULL);
    pool->available_count--;

    pthread_mutex_unlock(&pool->mutex);
    return conn;
}

Hint 2.2: Release with validation:

void pool_release(ConnectionPool* pool, Connection* conn) {
    pthread_mutex_lock(&pool->mutex);

    // Validate connection belongs to this pool
    if (conn < pool->connections ||
        conn >= pool->connections + pool->capacity) {
        fprintf(stderr, "Invalid connection released!\n");
        pthread_mutex_unlock(&pool->mutex);
        return;
    }

    // Validate state (catch double-release)
    if (conn->state != CONN_IN_USE) {
        fprintf(stderr, "Connection not in use! (state=%d)\n", conn->state);
        pthread_mutex_unlock(&pool->mutex);
        return;
    }

    conn->state = CONN_IDLE;
    conn->last_used = time(NULL);
    pool->available_count++;

    pthread_cond_signal(&pool->cond);  // Wake one waiter
    pthread_mutex_unlock(&pool->mutex);
}

Hint 2.3: Why signal instead of broadcast?

pthread_cond_signal(): Wakes one thread (efficient, only one can acquire anyway)
pthread_cond_broadcast(): Wakes all threads (wasteful, they’ll just re-sleep)
Use signal unless you have multiple condition variables

Layer 3: Timeout Handling

Click to reveal timeout hints

Hint 3.1: Acquire with timeout:

Connection* pool_acquire_timeout(ConnectionPool* pool, int timeout_sec) {
    struct timespec abs_timeout;
    clock_gettime(CLOCK_REALTIME, &abs_timeout);
    abs_timeout.tv_sec += timeout_sec;

    pthread_mutex_lock(&pool->mutex);

    while (pool->available_count == 0) {
        int result = pthread_cond_timedwait(&pool->cond, &pool->mutex,
                                             &abs_timeout);
        if (result == ETIMEDOUT) {
            pthread_mutex_unlock(&pool->mutex);
            return NULL;  // Timeout
        }
    }

    // Same as before: find IDLE connection, mark IN_USE
    // ...

    pthread_mutex_unlock(&pool->mutex);
    return conn;
}

Hint 3.2: Common mistake:

// WRONG: relative time
timeout.tv_sec = 5;  // 5 seconds from epoch (1970!) → immediate timeout

// CORRECT: absolute time
clock_gettime(CLOCK_REALTIME, &timeout);  // Current time
timeout.tv_sec += 5;  // 5 seconds from now

Layer 4: Connection Validation

Click to reveal validation hints

Hint 4.1: Validate before returning idle connection:

Connection* pool_acquire(ConnectionPool* pool) {
    pthread_mutex_lock(&pool->mutex);

    while (pool->available_count == 0) {
        pthread_cond_wait(&pool->cond, &pool->mutex);
    }

    Connection* conn = find_idle_connection(pool);

    // If idle for >5 minutes, validate
    time_t idle_time = time(NULL) - conn->last_used;
    if (idle_time > 300) {  // 5 minutes
        pthread_mutex_unlock(&pool->mutex);  // Release lock during I/O!

        bool valid = validate_connection(conn);

        pthread_mutex_lock(&pool->mutex);  // Re-lock

        if (!valid) {
            conn->state = CONN_DEAD;
            pool->available_count--;
            // Try to find another connection (or retry validation)
        }
    }

    conn->state = CONN_IN_USE;
    conn->last_used = time(NULL);
    pool->available_count--;

    pthread_mutex_unlock(&pool->mutex);
    return conn;
}

bool validate_connection(Connection* conn) {
    PGresult* res = PQexec(conn->pg_conn, "SELECT 1");
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        PQclear(res);
        return false;
    }
    PQclear(res);
    return true;
}

Hint 4.2: Why unlock during validation?

Validation does I/O (network round-trip to database)
Holding mutex during I/O blocks all other threads
Unlock → validate → re-lock

Hint 4.3: What if validation fails?

Mark connection DEAD
Decrement available_count
Return to top of loop (wait for another connection)
Or attempt reconnection immediately

Layer 5: Graceful Shutdown

Click to reveal shutdown hints

Hint 5.1: Shutdown with waiting:

void pool_destroy(ConnectionPool* pool, int shutdown_timeout_sec) {
    pthread_mutex_lock(&pool->mutex);
    pool->shutting_down = true;
    pthread_cond_broadcast(&pool->cond);  // Wake all waiting threads
    pthread_mutex_unlock(&pool->mutex);

    // Wait for all connections to be released
    time_t start = time(NULL);
    while (1) {
        pthread_mutex_lock(&pool->mutex);
        int in_use = pool->capacity - pool->available_count;
        pthread_mutex_unlock(&pool->mutex);

        if (in_use == 0) break;

        if (time(NULL) - start > shutdown_timeout_sec) {
            fprintf(stderr, "Shutdown timeout: %d connections still in use\n", in_use);
            break;
        }

        sleep(1);
    }

    // Close all connections
    for (int i = 0; i < pool->capacity; i++) {
        if (pool->connections[i].pg_conn) {
            PQfinish(pool->connections[i].pg_conn);
        }
    }

    pthread_mutex_destroy(&pool->mutex);
    pthread_cond_destroy(&pool->cond);
    free(pool->connections);
    free(pool);
}

Hint 5.2: Modify acquire to respect shutdown:

Connection* pool_acquire(ConnectionPool* pool) {
    pthread_mutex_lock(&pool->mutex);

    if (pool->shutting_down) {
        pthread_mutex_unlock(&pool->mutex);
        return NULL;  // Pool is closing
    }

    // Rest of acquire logic...
}

Books That Will Help

This table maps specific topics to relevant book chapters for deep study:

Topic	Book	Chapter	Why It Helps
Thread fundamentals	Programming with POSIX Threads by David Butenhof	Chapters 1-2	Canonical guide to pthreads. Explains thread lifecycle, synchronization primitives from first principles.
Mutexes and condition variables	Programming with POSIX Threads by David Butenhof	Chapter 3	Deep dive into when to use mutexes vs. condition variables. Covers deadlock prevention, spurious wakeups.
Linux threading specifics	The Linux Programming Interface by Michael Kerrisk	Chapters 30-33	Platform-specific details: futex implementation, pthread_cond_timedwait(), clock selection (CLOCK_REALTIME vs MONOTONIC).
Resource pooling patterns	Pattern-Oriented Software Architecture Vol. 3 by Kircher & Jain	Resource Lifecycle Manager pattern	Architectural patterns for resource pools. Covers acquisition, release, validation, health checks.
State machines in C	Patterns in C by Adam Tornhill (online)	State pattern implementation	How to implement clean state machines in C without OOP. Useful for connection state tracking.
PostgreSQL libpq	PostgreSQL Documentation (online)	Chapter 34: libpq - C Library	Official API reference. Covers PQconnectdb, PQstatus, error handling, asynchronous queries.
Error handling in C	C Interfaces and Implementations by David Hanson	Chapter 4: Exceptions and Assertions	Patterns for error propagation in C (return codes, errno, setjmp/longjmp). Critical for robust pool implementation.
Defensive programming	Code Complete by Steve McConnell	Chapter 8: Defensive Programming	How to validate inputs, assert invariants, design APIs that prevent misuse (e.g., double-release).
Memory management	Effective C by Robert C. Seacord	Chapter 8: Memory Management	RAII alternatives in C, cleanup paths, avoiding leaks in error cases.
Concurrency bugs	The Art of Multiprocessor Programming by Herlihy & Shavit	Chapters 2-3	Foundational theory: linearizability, progress guarantees (deadlock-free, starvation-free). Helps reason about correctness.
Timeout design	UNIX Network Programming Vol. 1 by Stevens	Chapter 14: Timeouts	How to implement timeouts correctly (absolute vs. relative time, handling clock adjustments).
Real-world connection pools	High Performance MySQL by Schwartz, Zaitsev & Tkachenko	Chapter 11: Scaling MySQL	Production connection pool sizing, monitoring, failure modes. Practical wisdom from field experience.

Recommended reading order:

Start: Butenhof Chapters 1-3 (threading fundamentals)
Then: Kerrisk Chapters 30-31 (Linux-specific threading)
Next: PostgreSQL libpq docs (database client API)
Deep dive: Kircher & Jain (resource pooling patterns)
Advanced: Seacord Chapter 8 (cleanup/error handling)

Common Pitfalls & Debugging

Problem 1: “Deadlock under load”

Why: Lock order is inconsistent or a mutex is not released on error.
Fix: Enforce a single lock order and use goto cleanup for error paths.
Quick test: Run 50 threads with a 10-connection pool for 5 minutes.

Problem 2: “Threads wait forever even when connections are released”

Why: Using if instead of while around pthread_cond_wait (spurious wakeups).
Fix: Always loop on the condition.
Quick test: Add logging before/after waits to confirm condition re-checks.

Problem 3: “Double release corrupts pool state”

Why: Connection state not validated on release.
Fix: Track state per connection and reject releases unless IN_USE.
Quick test: Release the same connection twice and expect an error.

Problem 4: “Timeouts behave inconsistently”

Why: Using CLOCK_REALTIME allows system clock changes to break timeouts.
Fix: Use CLOCK_MONOTONIC with timed waits.
Quick test: Change system time and verify timeouts still behave.

Definition of Done

Pool supports concurrent acquire/release with correct state transitions (IDLE ↔ IN_USE, DEAD → RECOVERED).
Acquire timeouts work without busy-waiting or deadlock.
Double-release and use-after-release are detected and rejected.
Dead connections are detected and reconnected automatically.
Graceful shutdown closes all connections after waiting for active users.
Stress test: 50 threads, 10 connections, 5k ops with zero deadlocks.
valgrind shows no leaks and no invalid accesses.

Project 4: Undo/Redo Engine for a Drawing Application

File: SPRINT_3_CONTROL_FLOW_STATE_PROJECTS.md
Programming Language: C or C++
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Software Design Patterns / GUI
Software or Tool: Command Pattern
Main Book: “Design Patterns” by Gang of Four

What you’ll build: A command-pattern based undo/redo system for a simple drawing app where you can add shapes, move them, change colors, group them, and undo/redo any operation while maintaining complete state consistency.

Why it teaches Control Flow & State: Undo/redo forces you to think about state as snapshots in time. Each command must know how to execute and reverse itself. You’ll face temporal ordering (undo must be LIFO), cleanup (redo stack clears on new action), and invariant maintenance (undo must never leave canvas in invalid state).

Core challenges you’ll face:

Command inversion (every action needs an inverse) → maps to state transitions
Redo stack invalidation (new action clears redo history) → maps to temporal bugs
Grouped operations (undo “group shapes” must ungroup exactly) → maps to invariants
Memory management (command history can grow unbounded) → maps to resource patterns
API clarity (canvas.do(cmd) vs cmd.execute(canvas)) → maps to API design

Key Concepts:

Command Pattern: “Design Patterns” Chapter on Command - Gang of Four
State Snapshots: “Refactoring” Chapter on temporal coupling - Martin Fowler
Invariant Preservation: “Code Complete” Chapter 8 on defensive programming - Steve McConnell
UI State Machines: “Designing Data-Intensive Applications” Chapter 7 (transactions analogy) - Martin Kleppmann

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Basic C or C++, understanding of data structures (stacks, linked lists)

Real world outcome:

Launch app: ./drawing_app
Draw a rectangle (click-drag), change its color to blue, move it
Press Ctrl+Z three times: rectangle moves back, color reverts to original, rectangle disappears
Press Ctrl+Shift+Z: rectangle reappears exactly as first drawn
Console shows command history: [CMD] AddShape(rect_1) | [CMD] SetColor(rect_1, blue) | ...

Learning milestones:

Single undo/redo works → You understand command inversion
New action clears redo stack → You understand temporal ordering
Group undo is atomic → You’ve mastered compound state transitions
1000 operations, memory stable → You understand resource lifecycle in long-running apps

Real World Outcome

What success looks like:

You launch your drawing application and immediately see a blank canvas with a toolbar containing shape tools (rectangle, circle, line) and color options. The status bar at the bottom shows “Ready

Actions: 0

Undo: 0

Redo: 0”.

First interaction sequence:

You click the rectangle tool and drag on the canvas. A blue rectangle appears. The status updates to “Actions: 1 Undo: 1 Redo: 0”. The console logs: [EXECUTE] AddShapeCommand(Rectangle, x:100, y:100, w:200, h:150, color:blue)

You click the color picker and change the rectangle to red. The rectangle instantly changes color. Status: “Actions: 2 Undo: 2 Redo: 0”. Console: [EXECUTE] SetColorCommand(shape_id:1, old_color:blue, new_color:red)

You drag the rectangle to a new position. Status: “Actions: 3 Undo: 3 Redo: 0”. Console: [EXECUTE] MoveShapeCommand(shape_id:1, from:(100,100), to:(250,200))

You draw a second rectangle. Status: “Actions: 4 Undo: 4 Redo: 0”.

Now the magic of undo/redo:

You press Ctrl+Z (undo). The second rectangle vanishes. Status: “Actions: 4 Undo: 3 Redo: 1”. Console: [UNDO] AddShapeCommand -> executed unexecute(), removed shape_id:2

You press Ctrl+Z again. The first rectangle jumps back to position (100,100). Status: “Actions: 4 Undo: 2 Redo: 2”. Console: [UNDO] MoveShapeCommand -> moved shape_id:1 from (250,200) back to (100,100)

You press Ctrl+Z again. The rectangle turns blue again. Status: “Actions: 4 Undo: 1 Redo: 3”. Console: [UNDO] SetColorCommand -> reverted shape_id:1 from red to blue

You press Ctrl+Shift+Z (redo). The rectangle turns red again. Status: “Actions: 4 Undo: 2 Redo: 2”. Console: [REDO] SetColorCommand -> reapplied color change blue→red

Critical behavior that proves your state management works:

You press Ctrl+Z twice more. Now you’re back to a single blue rectangle at (100,100). Undo stack has 1 command, redo stack has 3 commands.

This is the crucial test: You now draw a new circle. The moment you execute this action, the redo stack completely clears. Status shows: “Actions: 5 Undo: 2 Redo: 0”. Console: [EXECUTE] AddShapeCommand(Circle, ...) | [SYSTEM] Redo stack cleared (3 commands discarded)

If you now press Ctrl+Shift+Z, nothing happens—there’s no redo history anymore. But you can still undo to remove the circle and the original rectangle.

Advanced features that demonstrate mastery:

You select three shapes and click “Group”. The three shapes become a single logical unit with a dotted bounding box. Console: [EXECUTE] GroupShapesCommand(shape_ids: [1,3,5]) -> created group_id:6
You press Ctrl+Z. The group dissolves, and the three shapes are independent again. Console: [UNDO] GroupShapesCommand -> ungrouped 3 shapes
You perform 1000 rapid actions (holding down the mouse and dragging wildly, changing colors rapidly). Memory usage stays stable at ~2MB for command history. The oldest commands start getting pruned when you hit the 500-command limit. Console: [MEMORY] Command history limit reached, pruning oldest command (action #501)

What you’ve actually built:

A visual application where every single user action is reversible
State consistency: undo-redo-undo returns to the exact same visual state
Temporal correctness: the redo stack clears when you branch from history (new action after undo)
Memory discipline: unbounded operations don’t cause memory leaks
Atomic compound operations: grouping/ungrouping is a single undo step

This isn’t a toy. You’ve implemented the exact same pattern used in Photoshop, Microsoft Word, and every professional editing application.

The Core Question You’re Answering

“How do you make every operation in a program reversible without keeping complete snapshots of the entire program state at each step?”

This question strikes at a fundamental tension in software design: users expect infinite undo/redo, but naive implementations (saving the entire canvas after each action) would consume gigabytes of memory for a few hundred operations.

The deeper questions embedded in this:

What is the minimum information needed to reverse an operation?
- If you change a rectangle’s color from blue to red, you only need to store (shape_id, old_color, new_color), not the entire canvas
- This maps to the concept of differential state vs absolute state
How do you handle operations that can’t be trivially inverted?
- Moving a shape is easy: just store old position and new position
- But what about “delete all red shapes”? You need to store which shapes were deleted and their complete state
- This forces you to think about command granularity
What happens to the future when you change the past?
- If you undo 5 steps and then do a new action, the old “future” (those 5 redo steps) must be discarded
- This is the concept of branching timelines in version control—git does the same thing when you commit from a detached HEAD
How do you make grouped operations atomic?
- If “group 3 shapes” is undone, it must not leave 2 shapes grouped and 1 ungrouped
- This requires treating multiple operations as a single transaction

Why this matters beyond drawing apps:

Databases use Write-Ahead Logs (WAL) for crash recovery—same concept as command history
Text editors implement undo/redo for edits—same pattern
Version control systems (git, mercurial) are command-pattern based—commits are commands that can be inverted (revert)
Financial systems maintain audit logs that can roll back transactions—exactly command pattern
Game engines implement replay systems by recording commands and re-executing them

The command pattern is how you build temporal control into systems. You’re not just learning to implement undo—you’re learning how to make programs that can reason about their own history.

Concepts You Must Understand First

Before you start coding, these concepts must be crystal clear. Skim them first, then return when you hit walls.

1. The Command Pattern (Design Patterns, Gang of Four, p. 233)

Core idea: Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

In plain terms: Instead of calling canvas.addRectangle(x, y, w, h, color) directly, you create an AddRectangleCommand object that knows how to add the rectangle. Your canvas then executes the command. The crucial insight: the command object also knows how to undo itself.

// BAD: Direct modification (can't undo)
canvas_add_rectangle(&canvas, 100, 100, 50, 30, BLUE);

// GOOD: Command object (can undo)
Command* cmd = create_add_rectangle_command(100, 100, 50, 30, BLUE);
command_execute(cmd, &canvas);  // Adds rectangle, returns shape_id
command_undo(cmd, &canvas);      // Removes the rectangle using stored shape_id

Why this matters: Without this pattern, you’d have to store complete canvas snapshots. With it, you only store tiny command objects.

Where beginners stumble: They try to make the command store the entire canvas state. The command should only store deltas—what changed, not everything.

Critical reading:

“Design Patterns” Chapter on Command (Gang of Four) - canonical reference
“Head First Design Patterns” Chapter 6 - more accessible explanation with diagrams

2. Stacks for Temporal Ordering (LIFO = Last In, First Out)

Core idea: Undo/redo are inherently stack operations. The most recent action is undone first (LIFO).

In plain terms:

Undo stack: push every executed command here
Redo stack: when you undo, pop from undo stack, push to redo stack
When you execute a new command after undoing, clear the redo stack

Visual:

Initial state: undo=[], redo=[]
Execute Cmd1:  undo=[Cmd1], redo=[]
Execute Cmd2:  undo=[Cmd1, Cmd2], redo=[]
Undo:          undo=[Cmd1], redo=[Cmd2]
Undo:          undo=[], redo=[Cmd2, Cmd1]
Execute Cmd3:  undo=[Cmd3], redo=[]  ← redo stack CLEARED

Why this matters: If you don’t clear the redo stack on new actions, you get timeline paradoxes (what does it mean to redo Cmd2 when you just did Cmd3 instead?).

Where beginners stumble: Forgetting to clear the redo stack, or trying to use a queue (FIFO) instead of a stack.

Critical reading:

“Algorithms, Fourth Edition” Section 1.3 (Bags, Queues, and Stacks) - Sedgewick & Wayne
“Introduction to Algorithms” Section 10.1 (Stacks and Queues) - CLRS

3. The Memento Pattern (for complex state)

Core idea: When a command can’t easily compute its inverse, it can store a memento (snapshot) of the affected object’s state before executing.

In plain terms: For “delete all red shapes,” instead of trying to compute which shapes to restore, just store all the deleted shapes. When undoing, restore them.

Example:

typedef struct {
    Command base;
    Shape* deleted_shapes;  // Array of deleted shapes
    int num_deleted;
} DeleteByColorCommand;

void delete_by_color_execute(Command* cmd, Canvas* canvas) {
    DeleteByColorCommand* dbc = (DeleteByColorCommand*)cmd;
    dbc->deleted_shapes = canvas_remove_shapes_by_color(canvas, dbc->color, &dbc->num_deleted);
}

void delete_by_color_undo(Command* cmd, Canvas* canvas) {
    DeleteByColorCommand* dbc = (DeleteByColorCommand*)cmd;
    canvas_restore_shapes(canvas, dbc->deleted_shapes, dbc->num_deleted);
}

Why this matters: Not all operations have clean mathematical inverses. Sometimes you need to “take a snapshot before destroying.”

Where beginners stumble: Taking snapshots of everything instead of just the affected objects.

Critical reading:

“Design Patterns” Chapter on Memento (Gang of Four)
“Refactoring” Chapter 10 on dealing with temporal coupling - Martin Fowler

4. Invariants in State Management

Core idea: Certain properties must always be true, no matter what sequence of operations you perform.

Invariants for undo/redo:

Consistency: undo(execute(cmd)) must return to the exact prior state
Commutativity of undo/redo: redo(undo(execute(cmd))) = execute(cmd)
Stack integrity: After undo, redo stack top must be the just-undone command
Memory bounds: Command history can’t grow unbounded (must prune old commands)

In plain terms: If your canvas had 5 shapes before executing a command, and the command adds 1 shape, then undoing must leave you with exactly 5 shapes, not 4 or 6.

Where beginners stumble: Commands that partially succeed (add 2 shapes, fail to add 3rd) leave the system in an inconsistent state.

Critical reading:

“Code Complete” Chapter 8.2 on assertions and defensive programming - Steve McConnell
“Programming: Principles and Practice Using C++” Chapter 4.5 on loop invariants - Stroustrup

5. Resource Lifecycle in Long-Running Applications

Core idea: Commands store data (old positions, deleted shapes, etc.). In a real application, users might perform 10,000 actions in one session. You must either prune old commands or implement a circular buffer.

In plain terms: Set a limit (e.g., 500 commands). When you reach 501, drop the oldest command and free its memory.

Why this matters: Without limits, your drawing app will consume 1GB of RAM after a long session.

Where beginners stumble: Implementing a hard cap but not handling edge cases (what if user undoes past the pruning point?).

Critical reading:

“Effective C” Chapter 8 on resource management - Robert C. Seacord
“The Garbage Collection Handbook” Chapter 1 (even if you’re in C, understand the problem) - Jones, Hosking, Moss

Questions to Guide Your Design

Work through these questions before writing code. Write down your answers. When you hit a wall, revisit them.

Beginner Questions (Must answer these)

What goes in a command object?
- For AddShapeCommand: What data is needed to undo the addition? (Hint: the shape’s ID)
- For MoveShapeCommand: What’s the minimum data to reverse a move? (Hint: old position and new position)
- For DeleteShapeCommand: What if the shape had lots of properties? (Hint: Memento pattern)
How do you handle the redo stack clearing?
- When exactly does the redo stack clear? (Hint: on any new execute)
- What happens if the redo stack has 100 commands? Do you free their memory? (Hint: yes, walk the redo stack and free)

What’s the interface for a command?

typedef struct Command {
    void (*execute)(struct Command*, Canvas*);
    void (*undo)(struct Command*, Canvas*);
    void (*destroy)(struct Command*);  // For memory cleanup
} Command;

Why do you need a destroy function pointer? (Hint: different commands store different data, need different cleanup)

How do you test that undo/redo works?
- What’s a simple test case? (Hint: execute, undo, check canvas state)
- What’s a complex test case? (Hint: execute 10 commands, undo 5, redo 3, execute new command, verify redo stack cleared)

Intermediate Questions (Should answer these)

How do you implement grouped operations?
- Is GroupShapesCommand a single command or multiple? (Hint: it’s a composite command containing sub-commands)
- When you undo a group, do you call undo on each sub-command? (Hint: yes, in reverse order)
- What if a sub-command fails during redo? (Hint: you need transactional semantics—all or nothing)
What about commands that affect multiple objects?
- AlignShapesCommand moves 5 shapes to align their centers. How do you undo this?
- Option A: Store 5 separate MoveShapeCommand objects
- Option B: Store all 5 old positions in a single command
- Which is better? (Hint: Option B is more efficient, Option A is more composable)
How do you limit memory usage?
- Strategy 1: Fixed-size circular buffer (oldest commands dropped)
- Strategy 2: Prune commands older than N minutes
- Strategy 3: Compress old commands (store diffs, not full state)
- Which fits your use case? (Hint: Strategy 1 is simplest)
What’s the difference between execute and redo?
- execute: First-time execution (might generate a shape ID)
- redo: Re-execution (shape ID already exists)
- Should these be the same function? (Hint: often yes, but sometimes redo is simpler because IDs are already assigned)

Advanced Questions (Will naturally arise)

How do you handle commands that are no longer valid?
- User does: AddShape(A), AddShape(B), DeleteShape(A), then undoes DeleteShape(A)
- Now redo stack has AddShape(B). But what if undoing DeleteShape(A) changes the canvas in a way that invalidates the redo?
- (This is a timeline consistency problem—advanced apps version their state)
Can commands modify each other?
- What if MoveShapeCommand stores absolute coordinates, but you want to implement “undo 5 moves as one operation”?
- (This requires command coalescing—merging consecutive similar commands)
How do you serialize command history?
- For autosave, you want to save the command history to disk, not just the canvas
- Why? (Hint: you can replay commands to reconstruct state, enables time-travel debugging)
What about non-deterministic commands?
- RandomizeColorsCommand picks random colors. How do you redo this?
- (Hint: store the random seed in the command, or store the chosen colors as a memento)

Thinking Exercise (Do This Before Coding)

Get a piece of paper. Draw three columns: Undo Stack, Redo Stack, Canvas State.

Start with:

Undo: []
Redo: []
Canvas: empty

Trace these operations by hand:

execute(AddRectangle(id:1, color:blue))
- What’s in undo stack? [AddRect(1,blue)]
- What’s in redo stack? []
- What’s on canvas? [Rectangle1(blue)]
execute(SetColor(id:1, blue->red))
- Undo stack? [AddRect(1,blue), SetColor(1,blue->red)]
- Redo stack? []
- Canvas? [Rectangle1(red)]
undo()
- Undo stack? [AddRect(1,blue)]
- Redo stack? [SetColor(1,blue->red)]
- Canvas? [Rectangle1(blue)]
undo()
- Undo stack? []
- Redo stack? [SetColor(1,blue->red), AddRect(1,blue)] ← order matters!
- Canvas? []
redo()
- Undo stack? [AddRect(1,blue)]
- Redo stack? [SetColor(1,blue->red)]
- Canvas? [Rectangle1(blue)]
Critical step: execute(AddCircle(id:2, color:green))
- Undo stack? [AddRect(1,blue), AddCircle(2,green)]
- Redo stack? [] ← CLEARED!
- Canvas? [Rectangle1(blue), Circle2(green)]

If you got all of these right, you understand the core mechanism.

Advanced trace: Now do the same for grouped commands:

execute(AddRect(id:1))
execute(AddCircle(id:2))
execute(GroupShapes([1,2]) -> creates group_id:3)
undo()  // What happens? Group dissolves, but shapes remain

The Interview Questions They’ll Ask

These are real questions from FAANG interviews (source: Leetcode, Glassdoor, personal experience).

Question 1: “Design an undo/redo system for a text editor”

What they’re testing: Do you understand command pattern? Can you handle variable-size edits?

Key insight: Naive approach stores entire document after each edit—terrible for 1MB files. Better approach: store edit deltas.

Minimal viable command:

typedef struct {
    Command base;
    int position;       // Where edit happened
    char* inserted;     // What was inserted (NULL if deletion)
    char* deleted;      // What was deleted (NULL if insertion)
} TextEditCommand;

Follow-up: “What if the user types ‘hello’ one character at a time? Do you store 5 commands?”

Answer: Command coalescing—merge consecutive inserts at the same position into one command.

Follow-up: “What about memory for large pastes?”

Answer: Reference counting—if user pastes 1MB of text then undoes immediately, don’t duplicate the 1MB. Store a reference.

Question 2: “How would you implement Google Docs-style collaborative undo?”

What they’re testing: Do you understand that undo is per-user in collaborative settings?

Key insight: Each user has their own undo stack. When User A undoes, it doesn’t affect User B’s edits.

The hard part: Operational Transform (OT) or CRDTs—if User A’s edit is at position 10, and User B inserts 5 characters at position 5, User A’s undo needs to account for the shift.

Honest answer for interview: “I’d use a CRDT library like Yjs, which handles this automatically. Implementing OT from scratch is a PhD-level problem.”

Question 3: “Implement undo for a painting app where each brush stroke is 10,000 pixels”

What they’re testing: Can you handle large commands efficiently?

Bad approach: Store all 10,000 pixel coordinates in the command.

Good approach: Store a compressed representation (e.g., Bezier curve parameters) or use a dirty-rectangle system (only repaint the affected region).

Follow-up: “What if the user has 1000 brush strokes?”

Answer: Pruning strategy—keep last 100 commands in memory, older commands are irreversible (or persisted to disk).

Question 4: “How do you test an undo/redo system?”

What they’re testing: Do you think about edge cases?

Test cases you must mention:

Execute, undo, redo → should return to post-execute state
Execute 10, undo 10 → should return to initial state
Execute 5, undo 3, execute new → redo stack clears
Undo when undo stack is empty → no-op (doesn’t crash)
Redo when redo stack is empty → no-op
Execute 1000 commands → memory stable (pruning works)
Grouped command: execute group, undo → atomic revert

Advanced: Property-based testing—generate random sequences of execute/undo/redo, verify invariants hold.

Hints in Layers (Progressive Revelation)

If you’re stuck, reveal hints in order. Don’t skip ahead.

Layer 1: Project Structure

Hint 1.1: What files should you create?

drawing_app/
├── src/
│   ├── main.c              // Entry point, event loop
│   ├── canvas.c/.h         // Canvas data structure and rendering
│   ├── command.c/.h        // Base command interface
│   ├── commands/
│   │   ├── add_shape.c/.h
│   │   ├── delete_shape.c/.h
│   │   ├── move_shape.c/.h
│   │   ├── set_color.c/.h
│   │   └── group_shapes.c/.h
│   ├── undo_manager.c/.h   // Manages undo/redo stacks
│   └── shapes.c/.h         // Shape data structures
├── tests/
│   ├── test_command.c
│   └── test_undo_manager.c
└── Makefile

Hint 1.2: What's the minimal canvas structure?

typedef struct {
    Shape* shapes;       // Dynamic array of shapes
    int num_shapes;
    int capacity;
    int next_shape_id;   // Auto-incrementing ID
} Canvas;

Layer 2: Command Interface

Hint 2.1: How do you define the base command?

typedef struct Command Command;

struct Command {
    void (*execute)(Command* self, Canvas* canvas);
    void (*undo)(Command* self, Canvas* canvas);
    void (*destroy)(Command* self);  // Free resources
};

Key insight: Function pointers allow polymorphism in C.

Hint 2.2: How do you implement a concrete command?

typedef struct {
    Command base;          // Inherit from Command
    int shape_id;          // Set during execute
    int x, y, w, h;
    Color color;
} AddShapeCommand;

void add_shape_execute(Command* self, Canvas* canvas) {
    AddShapeCommand* cmd = (AddShapeCommand*)self;
    cmd->shape_id = canvas_add_rectangle(canvas, cmd->x, cmd->y, cmd->w, cmd->h, cmd->color);
}

void add_shape_undo(Command* self, Canvas* canvas) {
    AddShapeCommand* cmd = (AddShapeCommand*)self;
    canvas_remove_shape(canvas, cmd->shape_id);
}

void add_shape_destroy(Command* self) {
    free(self);
}

Command* create_add_shape_command(int x, int y, int w, int h, Color color) {
    AddShapeCommand* cmd = malloc(sizeof(AddShapeCommand));
    cmd->base.execute = add_shape_execute;
    cmd->base.undo = add_shape_undo;
    cmd->base.destroy = add_shape_destroy;
    cmd->x = x;
    cmd->y = y;
    cmd->w = w;
    cmd->h = h;
    cmd->color = color;
    return (Command*)cmd;
}

Layer 3: Undo Manager

Hint 3.1: What data structure for undo/redo stacks?

typedef struct {
    Command** undo_stack;
    int undo_count;
    int undo_capacity;

    Command** redo_stack;
    int redo_count;
    int redo_capacity;

    int max_history;  // Prune if exceeded
} UndoManager;

Hint 3.2: How do you execute a command?

void undo_manager_execute(UndoManager* mgr, Canvas* canvas, Command* cmd) {
    // 1. Execute the command
    cmd->execute(cmd, canvas);

    // 2. Clear redo stack (timeline branch)
    for (int i = 0; i < mgr->redo_count; i++) {
        mgr->redo_stack[i]->destroy(mgr->redo_stack[i]);
    }
    mgr->redo_count = 0;

    // 3. Push to undo stack
    if (mgr->undo_count >= mgr->undo_capacity) {
        // Resize array
    }
    mgr->undo_stack[mgr->undo_count++] = cmd;

    // 4. Prune if over limit
    if (mgr->undo_count > mgr->max_history) {
        Command* oldest = mgr->undo_stack[0];
        oldest->destroy(oldest);
        // Shift array left
        memmove(mgr->undo_stack, mgr->undo_stack + 1, (mgr->undo_count - 1) * sizeof(Command*));
        mgr->undo_count--;
    }
}

Hint 3.3: How do you implement undo?

void undo_manager_undo(UndoManager* mgr, Canvas* canvas) {
    if (mgr->undo_count == 0) return;  // Nothing to undo

    // 1. Pop from undo stack
    Command* cmd = mgr->undo_stack[--mgr->undo_count];

    // 2. Undo the command
    cmd->undo(cmd, canvas);

    // 3. Push to redo stack
    if (mgr->redo_count >= mgr->redo_capacity) {
        // Resize array
    }
    mgr->redo_stack[mgr->redo_count++] = cmd;
}

Critical insight: The command is not destroyed—it moves from undo stack to redo stack.

Layer 4: Testing Your Implementation

Hint 4.1: How do you write a basic test?

void test_execute_undo() {
    Canvas canvas = {0};
    UndoManager mgr = create_undo_manager(10);

    // Execute command
    Command* cmd = create_add_shape_command(100, 100, 50, 50, BLUE);
    undo_manager_execute(&mgr, &canvas, cmd);

    assert(canvas.num_shapes == 1);
    assert(mgr.undo_count == 1);
    assert(mgr.redo_count == 0);

    // Undo command
    undo_manager_undo(&mgr, &canvas);

    assert(canvas.num_shapes == 0);
    assert(mgr.undo_count == 0);
    assert(mgr.redo_count == 1);

    printf("test_execute_undo PASSED\n");
}

Hint 4.2: How do you test redo stack clearing?

void test_redo_stack_clears() {
    Canvas canvas = {0};
    UndoManager mgr = create_undo_manager(10);

    // Execute two commands
    Command* cmd1 = create_add_shape_command(100, 100, 50, 50, BLUE);
    Command* cmd2 = create_add_shape_command(200, 200, 30, 30, RED);
    undo_manager_execute(&mgr, &canvas, cmd1);
    undo_manager_execute(&mgr, &canvas, cmd2);

    // Undo one command
    undo_manager_undo(&mgr, &canvas);
    assert(mgr.redo_count == 1);

    // Execute new command
    Command* cmd3 = create_add_shape_command(150, 150, 40, 40, GREEN);
    undo_manager_execute(&mgr, &canvas, cmd3);

    // Redo stack should be empty
    assert(mgr.redo_count == 0);

    printf("test_redo_stack_clears PASSED\n");
}

Books That Will Help

This table maps specific challenges to book sections. Consult these when you hit walls.

When You’re Struggling With…	Read This	Why It Helps
Defining the command interface	“Design Patterns” pp. 233-242, Command pattern	Shows canonical OOP implementation; translate to C using function pointers
Managing undo/redo stacks	“Algorithms, Fourth Edition” Section 1.3.1, Stack implementation	Master stack operations: push, pop, peek, clear
Clearing redo stack on new action	“Pro Git” Chapter 3 on branching	Git does the exact same thing—new commits from detached HEAD
Memory management for commands	“Effective C” Chapter 8, Resource management	How to avoid leaks when destroying command objects
Grouped/compound commands	“Design Patterns” pp. 163-173, Composite pattern	Treat group of commands as a single command
Storing state snapshots (Memento)	“Design Patterns” pp. 283-291, Memento pattern	When you can’t compute the inverse, save the state
Implementing pruning/limits	“The Garbage Collection Handbook” Chapter 1	Understand why unbounded memory is bad; how to reclaim
Testing state consistency	“Code Complete” Chapter 8.2, Assertions	Use assertions to check invariants after each operation
Handling partial failures	“Release It!” Chapter 5, Stability patterns	What if execute() succeeds but undo() fails? Transactional semantics
Serializing command history	“Designing Data-Intensive Applications” Chapter 7, Transactions	How databases do this—Write-Ahead Logs are command logs
Making canvas operations atomic	“Operating Systems: Three Easy Pieces” Chapter 32, Concurrency bugs	Even single-threaded apps have temporal bugs
Debugging temporal issues	“Debugging” by David Agans, Rule 6: Divide and Conquer	Bisect the command history to find where state diverged

The single most important book for this project: “Design Patterns: Elements of Reusable Object-Oriented Software” by Gang of Four (specifically Command, Memento, and Composite patterns).

Free online resource:

“Undo/Redo with Command Pattern” - Refactoring Guru (https://refactoring.guru/design-patterns/command) - visual diagrams, code examples in multiple languages

Common Pitfalls & Debugging

Problem 1: “Redo works after new edits”

Why: Redo stack is not cleared when a new command executes.
Fix: Clear redo stack on every new execute.
Quick test: Undo twice, execute new command, ensure redo is empty.

Problem 2: “Undo doesn’t restore exact previous state”

Why: Command doesn’t store enough data (missing old position/color).
Fix: Store minimal but complete deltas or a memento.
Quick test: Execute, undo, and compare state with a snapshot.

Problem 3: “Grouped commands partially undo”

Why: Group undo executes sub-commands in the wrong order.
Fix: Undo sub-commands in reverse order, redo in forward order.
Quick test: Group 3 shapes, undo, ensure all are restored atomically.

Problem 4: “Command history leaks memory”

Why: Commands allocate but are never freed when pruned.
Fix: Add destroy() per command and free pruned history.
Quick test: Execute 1000 commands and check valgrind.

Definition of Done

Every command has execute, undo, and destroy behavior.
Undo returns the canvas to the exact prior state (byte-for-byte).
Redo is only available immediately after undo; new command clears redo stack.
Grouped commands undo/redo atomically.
History size is bounded (pruning works without leaks).
1000 random operations + undos maintain invariant checks.
valgrind reports zero leaks.

Project 5: Embedded Sensor State Machine (Arduino/STM32)

File: SPRINT_3_CONTROL_FLOW_STATE_PROJECTS.md
Programming Language: C
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 3: Advanced
Knowledge Area: Embedded Systems / Real-time
Software or Tool: State Machines
Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: A sensor monitoring system that reads temperature/humidity, manages power states (ACTIVE, LOW_POWER, SLEEP), handles sensor errors gracefully, and logs data—all with explicit state machine code, no hidden magic.

Why it teaches Control Flow & State: Embedded systems have no garbage collector, no exception unwinding, no “retry later.” You must handle every error explicitly, clean up every resource, and state transitions must be rock-solid because “undefined behavior” means hardware malfunction.

Core challenges you’ll face:

Power state machine (ACTIVE↔LOW_POWER↔SLEEP with rules) → maps to state transitions
Sensor timeout (what if sensor doesn’t respond?) → maps to error propagation
Interrupt safety (sensor interrupt during state transition) → maps to reentrancy
I2C cleanup (bus stuck? must reset) → maps to cleanup paths
Watchdog feeding (forget to feed → reset) → maps to temporal bugs

Resources for key challenges:

“Making Embedded Systems, 2nd Edition” Chapter 5 - Elecia White - State machine patterns for embedded
“AVR Workshop” Chapters 8-10 - John Boxall - Practical I2C and sensor handling

Key Concepts:

Embedded State Machines: “Making Embedded Systems” Chapter 5 - Elecia White
Error Handling without Exceptions: “Bare Metal C” Chapter 6 - Steve Oualline
Interrupt Safety: “The Art of Debugging with GDB, DDD, and Eclipse” Chapter on embedded - Matloff & Salzman
Resource Cleanup in C: “Effective C” Chapter 8 - Robert C. Seacord

Difficulty: Intermediate Time estimate: 2-3 weeks Prerequisites: Basic C, Arduino or STM32 basics, understanding of I2C/SPI

Real world outcome:

Device on your desk reads temperature every 5 seconds
LED blinks: green=ACTIVE, yellow=LOW_POWER, off=SLEEP
Serial monitor shows: [STATE] ACTIVE → LOW_POWER (idle 30s)
Disconnect sensor wire: [ERROR] I2C timeout, retry 1/3... [ERROR] Sensor DEAD, entering SAFE mode
Watchdog triggers if your state machine hangs: device resets cleanly

Learning milestones:

State LED reflects actual state → You understand explicit state representation
Sensor disconnect doesn’t crash → You’ve mastered error propagation without exceptions
Interrupt during transition is safe → You understand reentrancy in hard real-time
24-hour run, no hangs → You’ve eliminated temporal bugs

Real World Outcome

When you complete this project, you’ll have a physical embedded device sitting on your desk that you can interact with and observe:

Hardware Setup:

Arduino Uno/Nano or STM32 Blue Pill connected via USB
DHT22 temperature/humidity sensor connected via I2C (SDA to A4, SCL to A5 on Arduino)
Three LEDs with resistors: Green (pin 9), Yellow (pin 10), Red (pin 11)
Optional: buzzer on pin 12 for error alerts
Serial monitor at 115200 baud displaying real-time logs

Observable Behavior in ACTIVE State:

[00:00:05] [STATE: ACTIVE] Temp: 23.5°C, Humidity: 45.2%
[00:00:10] [STATE: ACTIVE] Temp: 23.6°C, Humidity: 45.1%
[00:00:15] [STATE: ACTIVE] Temp: 23.5°C, Humidity: 45.3%

Green LED blinks every 5 seconds when sensor reads successfully
Serial output shows timestamp, current state, sensor readings
Watchdog timer is fed every loop iteration (invisible but critical)

Transition to LOW_POWER State (after 30s of stable readings):

[00:00:30] [TRANSITION] ACTIVE → LOW_POWER (trigger: idle_time > 30s)
[00:00:30] [I2C] Sending LOW_POWER command to sensor (0x3C)
[00:00:30] [STATE: LOW_POWER] Reading interval: 5s → 30s

Green LED fades to yellow (PWM brightness 50%)
Sensor reading interval changes from 5s to 30s
CPU enters sleep mode between readings (visible power consumption drop if measuring)
Serial output slows down significantly

Error Handling - Sensor Disconnect:

Physically unplug the I2C sensor wire while running:

[00:01:45] [STATE: ACTIVE] Temp: 23.5°C, Humidity: 45.2%
[00:01:50] [I2C ERROR] No ACK from device 0x40 (timeout: 100ms)
[00:01:50] [RECOVERY] Retry 1/3...
[00:01:51] [I2C ERROR] No ACK from device 0x40 (timeout: 100ms)
[00:01:51] [RECOVERY] Retry 2/3...
[00:01:52] [I2C ERROR] No ACK from device 0x40 (timeout: 100ms)
[00:01:52] [RECOVERY] Retry 3/3... FAILED
[00:01:52] [TRANSITION] ACTIVE → ERROR_SAFE (trigger: sensor_failure)
[00:01:52] [STATE: ERROR_SAFE] Sensor marked DEAD, using last known values

Red LED starts blinking rapidly (200ms on/off)
Buzzer emits three short beeps (if connected)
State machine transitions to ERROR_SAFE mode
Device continues running with last known good values
I2C bus reset sequence executes (visible as brief LED flicker)
Device does not crash, reset, or hang

Reconnecting the Sensor:

[00:02:00] [RECOVERY] Attempting sensor reinitialization...
[00:02:00] [I2C] Bus reset complete
[00:02:00] [I2C] Device 0x40 ACK received
[00:02:01] [SENSOR] Configuration restored: resolution=12-bit, mode=continuous
[00:02:01] [TRANSITION] ERROR_SAFE → ACTIVE (trigger: sensor_recovered)
[00:02:01] [STATE: ACTIVE] Temp: 23.4°C, Humidity: 45.8%

Red LED transitions to green
Single long beep (recovery confirmation)
Normal operation resumes seamlessly

Watchdog Demonstration - Simulated Hang:

Press the “hang simulation” button (or send ‘H’ via serial):

[00:03:15] [DEBUG] Simulating infinite loop (watchdog test)...
[00:03:15] [HANG] Entering deliberate 5s delay without feeding watchdog

All LEDs freeze in current state
Serial output stops

After ~4 seconds (watchdog timeout):

[00:03:19] [WATCHDOG] Reset triggered! Cause: Timeout
[00:03:19] [BOOT] Embedded Sensor State Machine v1.0
[00:03:19] [INIT] Watchdog configured: timeout=4s
[00:03:19] [INIT] I2C initialized: speed=100kHz
[00:03:20] [STATE: INIT → ACTIVE] Sensor ready

Device automatically resets and returns to ACTIVE state
Boot sequence shows “Reset cause: Watchdog”
Previous state is lost, but device recovers cleanly

Interrupt Safety Demonstration:

The sensor uses a DATA_READY interrupt on pin 2:

[00:04:30] [STATE: ACTIVE] Temp: 23.5°C, Humidity: 45.2%
[00:04:31] [ISR] Data ready interrupt triggered
[00:04:31] [ISR] Setting flag: sensor_data_ready = true
[00:04:31] [MAIN] Processing sensor data from ISR flag
[00:04:35] [STATE: ACTIVE → LOW_POWER] Preparing state transition...
[00:04:35] [ISR] Data ready interrupt triggered DURING TRANSITION
[00:04:35] [ISR] Transition in progress, deferring data read
[00:04:35] [MAIN] Transition complete, processing deferred ISR request

Interrupts can fire at any time, including during state transitions
Critical sections use cli() / sei() (disable/enable interrupts) during state changes
ISR never calls blocking I2C functions directly
Demonstrates proper flag-based communication between ISR and main loop

Power Consumption Logging (if measuring with multimeter):

[STATE: ACTIVE]      → Current: 45mA @ 5V
[STATE: LOW_POWER]   → Current: 15mA @ 5V
[STATE: SLEEP]       → Current: 8mA @ 5V (theoretical, requires advanced config)

24-Hour Stability Test Results:

After running overnight:

[23:59:55] [STATS] Uptime: 86395s (23h 59m 55s)
[23:59:55] [STATS] Total readings: 17279
[23:59:55] [STATS] I2C errors: 3 (all recovered)
[23:59:55] [STATS] State transitions: 47
[23:59:55] [STATS] Watchdog feeds: 17279 (100% success)
[23:59:55] [STATS] Memory: heap free=1456/2048 bytes (no leaks detected)

This tangible, observable behavior is what separates embedded state machine design from abstract theory—you can see, touch, and measure the direct consequences of every design decision you make.

The Core Question You’re Answering

“How do you design a state machine for an embedded system that handles errors gracefully, manages power states, and maintains correctness even when hardware misbehaves—all without the safety nets of exceptions, garbage collection, or operating system support?”

This question forces you to confront:

Explicit State Representation: How do you represent states explicitly when there’s no framework to help you? Your state machine must be visible in the code, not hidden in boolean flags scattered across the program.
Error Propagation Without Exceptions: When I2C_read() fails, you can’t throw an exception. How do you propagate errors up the call stack, ensure cleanup happens, and make the state transition to a safe fallback mode?
Interrupt Safety and Reentrancy: Your sensor interrupt can fire at any moment, including mid-state-transition. How do you protect critical sections without blocking interrupts for too long (missing events) or too little (corrupting state)?
Resource Cleanup with No Destructor: In C on bare metal, there’s no RAII, no finally blocks. If I2C initialization fails halfway, how do you ensure the bus is reset, GPIO pins are released, and memory is freed?
Temporal Correctness Under Real-Time Constraints: The watchdog timer doesn’t care about your excuses. If you don’t feed it every 4 seconds, the device resets. How do you structure control flow so that, no matter which path the code takes, the watchdog gets fed?
Power State Transitions with Side Effects: Transitioning from ACTIVE to LOW_POWER means sending I2C commands to the sensor, reconfiguring timers, and changing LED state. What if the I2C command fails halfway? Can you end up in a state where the sensor thinks it’s in LOW_POWER but your code thinks it’s in ACTIVE?

Why This Question Matters:

In desktop/web programming, you can rely on:

The OS catching segfaults
Garbage collectors preventing leaks
Exception handlers unwinding the stack
Debuggers stepping through crashes

In embedded systems:

Undefined behavior means the hardware does something physically wrong (motor spins backward, sensor overheats, LED stays on draining battery)
“Try later” isn’t an option when you’re controlling a pacemaker or antilock brakes
You have 2KB of RAM and 32KB of flash—every byte counts
Your code runs for months without restarting

The Insight You’ll Gain:

After building this project, you’ll understand that state machines aren’t just a pattern—they’re a discipline. Every function call is a potential failure point. Every state transition is a contract. Every interrupt is an adversary trying to catch you with your guard down.

You’ll learn to think in terms of:

State invariants: “If state == LOW_POWER, then timer_interval >= 30s” (always true)
Transition preconditions: “Can only enter SLEEP if I2C bus is idle” (checked before transition)
Postconditions: “After entering ERROR_SAFE, sensor is reset and LED is red” (verified after transition)

This mindset—defending against every possible failure mode, explicitly managing every resource, never assuming anything “just works”—is what separates code that runs in a lab from code that runs in production for 10 years.

Concepts You Must Understand First

Before diving into implementation, you need solid mental models of these foundational concepts. Without them, you’ll be debugging mysterious hangs at 2 AM.

1. I2C Protocol and Bus Management

What you must know:

I2C is a two-wire serial protocol (SDA=data, SCL=clock) where one master communicates with multiple slaves
Each slave has a 7-bit address (e.g., DHT22 at 0x40)
Communication happens in transactions: START → ADDRESS → ACK/NACK → DATA → STOP
The bus can “hang” if a slave holds SDA low (clock stretching gone wrong)

Why it matters here: Your sensor communicates via I2C. If you don’t handle NACK (sensor not responding) or timeout (slave holding bus), your state machine will hang forever waiting for data.

Critical details:

// This code can HANG if sensor doesn't respond:
uint8_t data = I2C_read(SENSOR_ADDR);  // Infinite wait = watchdog reset

// Correct approach: timeout and error handling
I2C_Result result = I2C_read_timeout(SENSOR_ADDR, &data, 100ms);
if (result == I2C_TIMEOUT) {
    // State transition to ERROR_SAFE
}

Where to learn:

“Making Embedded Systems” Chapter 6 (Communicating with Peripherals) explains I2C in practical embedded context
“AVR Workshop” Chapters 8-10 cover I2C/TWI implementation with real code examples
Arduino Wire library documentation (understanding the abstraction helps, but you’ll implement lower-level)

2. Interrupt Service Routines (ISRs) and Atomicity

What you must know:

ISRs are functions that run when hardware triggers an interrupt (sensor data ready, timer overflow)
ISRs can interrupt your main code at any instruction—even mid-state-transition
ISRs should be fast (microseconds) and minimal (set flags, don’t do I2C)
Shared variables between ISR and main loop must be volatile and access must be atomic

Why it matters here: Your sensor asserts an interrupt when data is ready. If your ISR tries to read I2C (blocking call) or modifies state machine variables without protection, you get corruption or deadlock.

Critical details:

// WRONG: I2C in ISR = disaster
ISR(SENSOR_INTERRUPT) {
    float temp = read_sensor();  // I2C is blocking, will hang
    state = ACTIVE;              // Non-atomic write, race condition
}

// RIGHT: Set flag, defer work to main loop
volatile bool sensor_ready = false;
ISR(SENSOR_INTERRUPT) {
    sensor_ready = true;  // Single atomic byte write
}

void main_loop() {
    if (sensor_ready) {
        cli();  // Disable interrupts during critical section
        sensor_ready = false;
        sei();  // Re-enable interrupts

        float temp = read_sensor();  // Safe to block here
        // ... state machine update
    }
}

Where to learn:

“Making Embedded Systems” Chapter 4 (Interrupts) with embedded-specific gotchas
AVR datasheet section on interrupts (dense but authoritative)
“The Art of Designing Embedded Systems” Chapter 3 covers ISR best practices

3. Watchdog Timers and System Recovery

What you must know:

A watchdog timer is independent hardware that resets the MCU if not “fed” within a timeout period
Feeding = writing a specific pattern to a watchdog register (proves code is still running)
If your code hangs, the watchdog fires, resetting the device to a known-good state
Watchdog timeout should be tuned: too short = spurious resets, too long = prolonged malfunction

Why it matters here: Your state machine must guarantee the watchdog gets fed in every control flow path. Miss it once during an error handler, and the device resets.

Critical details:

void state_machine_tick() {
    switch (state) {
        case ACTIVE:
            read_sensor();  // Might take 50ms
            wdt_reset();    // Feed watchdog AFTER potentially slow operation
            break;

        case ERROR_SAFE:
            attempt_recovery();
            wdt_reset();    // MUST feed here too, or we reset during recovery
            break;
    }
}

// Configure watchdog timeout based on worst-case execution time:
// If sensor read takes 50ms, I2C timeout is 100ms, set watchdog to 500ms (safety margin)

Where to learn:

“Making Embedded Systems” Chapter 8 (Watchdogs and fault handling)
AVR/STM32 datasheet sections on WDT configuration
“Patterns in the Machine” discusses watchdog strategy in safety-critical systems

4. Power States and Clock Management

What you must know:

MCUs have multiple power modes: ACTIVE (full speed), IDLE (CPU off, peripherals on), SLEEP (everything off except watchdog)
Entering low-power modes requires disabling peripherals (ADC, timers, UART)
Waking from sleep can be triggered by interrupts (sensor ready, timer overflow)
Current consumption: ACTIVE ~45mA, IDLE ~15mA, SLEEP ~1-5mA (varies by MCU)

Why it matters here: Your state machine’s LOW_POWER and SLEEP states map to actual hardware power modes. Transition logic must configure peripherals correctly or you’ll wake up with non-functional I2C.

Critical details:

void enter_low_power_state() {
    // 1. Tell sensor to reduce sample rate
    I2C_write(SENSOR_ADDR, CMD_LOW_POWER);

    // 2. Reconfigure timer from 5s to 30s
    Timer1_set_period(30000);  // ms

    // 3. Reduce LED brightness (PWM)
    analogWrite(LED_GREEN, 0);
    analogWrite(LED_YELLOW, 128);  // 50% brightness

    // 4. Enter CPU idle mode (wakes on timer or sensor interrupt)
    set_sleep_mode(SLEEP_MODE_IDLE);
    sleep_enable();
    sei();  // Interrupts must be enabled to wake
    sleep_cpu();  // Blocks here until interrupt
    sleep_disable();
}

Where to learn:

“Making Embedded Systems” Chapter 10 (Low-Power Design)
AVR/STM32 app notes on power optimization
Arduino’s <avr/sleep.h> library documentation (abstracts the registers)

5. State Machine Design Patterns

What you must know:

Mealy machines: Output depends on state + input (e.g., LED color depends on state AND sensor status)
Moore machines: Output depends only on state (e.g., LED color = f(state))
Explicit vs. implicit state: enum State state; (good) vs. scattered booleans (bad)
Transition table: Formal representation of (current_state, event) → (next_state, action)

Why it matters here: You’ll implement a Moore machine where each state has well-defined outputs (LED pattern, reading interval). Events (sensor_timeout, idle_time_exceeded) trigger transitions.

Critical details:

typedef enum {
    STATE_ACTIVE,
    STATE_LOW_POWER,
    STATE_SLEEP,
    STATE_ERROR_SAFE
} SystemState;

typedef enum {
    EVENT_SENSOR_OK,
    EVENT_SENSOR_TIMEOUT,
    EVENT_IDLE_TIMEOUT,
    EVENT_USER_WAKEUP
} SystemEvent;

// Transition table (conceptual):
// (ACTIVE, IDLE_TIMEOUT) → LOW_POWER + action: slow_sensor()
// (LOW_POWER, USER_WAKEUP) → ACTIVE + action: wake_sensor()
// (ANY_STATE, SENSOR_TIMEOUT) → ERROR_SAFE + action: reset_i2c()

void handle_event(SystemEvent event) {
    SystemState old_state = state;

    switch (state) {
        case STATE_ACTIVE:
            if (event == EVENT_IDLE_TIMEOUT) {
                state = STATE_LOW_POWER;
                enter_low_power_state();
            } else if (event == EVENT_SENSOR_TIMEOUT) {
                state = STATE_ERROR_SAFE;
                enter_error_safe_state();
            }
            break;

        case STATE_LOW_POWER:
            // ... transitions
            break;
    }

    if (old_state != state) {
        log_transition(old_state, state, event);
    }
}

Where to learn:

“Making Embedded Systems” Chapter 5 (State Machines) with embedded focus
“UML Distilled” Chapter 8 (State Machine Diagrams) for formal notation
“Design Patterns” Chapter on State pattern (Gang of Four) for OOP approach (you’ll use C version)

6. Error Handling without Exceptions

What you must know:

C has no exceptions—functions return error codes (e.g., -1 or enum I2C_Result)
Every function call must be checked: if (I2C_read(...) != I2C_OK) { /* handle */ }
Cleanup must happen in reverse order of acquisition (LIFO): init A, init B → cleanup B, cleanup A
Early returns require goto cleanup; pattern (controversial but practical)

Why it matters here: Your sensor initialization might fail. I2C reads might timeout. You need a disciplined approach to propagate errors up the call stack and clean up partial state.

Critical details:

// WRONG: No error checking, resource leak on failure
void init_sensor() {
    I2C_init();
    GPIO_init();
    sensor_config();
}

// RIGHT: Check every step, clean up on failure
int init_sensor() {
    if (I2C_init() != 0) {
        return -1;  // Nothing to clean up yet
    }

    if (GPIO_init() != 0) {
        I2C_cleanup();  // Clean up I2C before returning
        return -2;
    }

    if (sensor_config() != 0) {
        GPIO_cleanup();
        I2C_cleanup();
        return -3;
    }

    return 0;  // Success
}

// BETTER: goto cleanup pattern (reduces duplication)
int init_sensor() {
    int result = 0;

    if (I2C_init() != 0) {
        result = -1;
        goto cleanup_none;
    }

    if (GPIO_init() != 0) {
        result = -2;
        goto cleanup_i2c;
    }

    if (sensor_config() != 0) {
        result = -3;
        goto cleanup_gpio;
    }

    return 0;  // Success

cleanup_gpio:
    GPIO_cleanup();
cleanup_i2c:
    I2C_cleanup();
cleanup_none:
    return result;
}

Where to learn:

“Effective C” Chapter 8 (Resource Management) with goto cleanup pattern
Linux kernel coding style (uses goto for cleanup extensively)
“C Interfaces and Implementations” Chapter 4 (Error Handling) by David Hanson

Foundational mental model:

Think of your embedded system as a defense contractor where every subsystem (I2C, timer, sensor) has a service-level agreement (SLA):

I2C promises to respond in 100ms or return TIMEOUT
Sensor promises valid data or ERROR state
Watchdog promises to reset if not fed every 4s

Your state machine is the orchestrator that holds everyone accountable. If I2C breaks its SLA, you don’t crash—you transition to ERROR_SAFE and try plan B (reset bus). If sensor dies, you log it and run on last known good values.

This isn’t paranoia—it’s correctness by contract. Master these concepts, and you’ll write embedded code that runs for years without human intervention.

Questions to Guide Your Design

State Transitions: What events trigger ACTIVE → LOW_POWER → SLEEP?
Error Handling: How do you detect sensor timeout and transition to ERROR_SAFE?
ISR Safety: Which data is written by ISRs, and how is it synchronized?
Watchdog: Where is the watchdog fed on every control path?
Power: What peripherals must be disabled before entering sleep?

Thinking Exercise

Trace this scenario on paper:

System boots → INIT
Sensor read succeeds for 5 cycles
Idle timeout triggers LOW_POWER
Sensor NACKs 3 times
User presses button to wake device

Questions:

What state are you in after step 4?
Which resources must be reset?
Does the user wake event override ERROR_SAFE?

The Interview Questions They’ll Ask

“How do you prevent I2C calls inside an interrupt handler?”
“How do you guarantee the watchdog is fed even on errors?”
“How do you debounce a noisy sensor or button input?”
“How would you test your state machine without real hardware?”
“What happens if a sensor fails permanently?”

Hints in Layers

Hint 1: Start with explicit states and events

typedef enum { INIT, ACTIVE, LOW_POWER, SLEEP, ERROR_SAFE } State;
typedef enum { EVT_TICK, EVT_SENSOR_OK, EVT_SENSOR_TIMEOUT, EVT_IDLE, EVT_WAKE } Event;

Hint 2: Keep ISR minimal

volatile bool sensor_ready = false;
ISR(SENSOR_INT) { sensor_ready = true; }

Hint 3: Centralize transitions

void handle_event(Event e) {
    switch (state) {
        case ACTIVE:
            if (e == EVT_IDLE) enter_low_power();
            if (e == EVT_SENSOR_TIMEOUT) enter_error_safe();
            break;
        // ...
    }
}

Hint 4: Add a test harness

$ ./sensor_sim --events "OK,OK,OK,TIMEOUT,TIMEOUT,TIMEOUT"
[STATE] ACTIVE -> ERROR_SAFE

Books That Will Help

Topic	Book	Chapter
Embedded state machines	“Making Embedded Systems” by White	Ch. 5
Interrupts & timing	“Making Embedded Systems” by White	Ch. 4
Error handling in C	“Effective C” by Seacord	Ch. 8
Low-power design	“Making Embedded Systems” by White	Ch. 10
Bare-metal patterns	“Bare Metal C” by Oualline	Ch. 6

Online references:

Arduino Debounce Example - https://www.arduino.cc/en/Tutorial/BuiltInExamples/Debounce
AVR sleep modes (if using AVR) - https://www.microchip.com/wwwproducts/en/ATmega328p

Common Pitfalls & Debugging

Problem 1: “Random hangs after a few minutes”

Why: ISR calls blocking I2C or modifies shared state without guards.
Fix: Only set flags in ISR; handle I2C in main loop.
Quick test: Toggle debug LED in ISR only; no I2C calls.

Problem 2: “Watchdog resets during error handling”

Why: Error paths skip the watchdog feed.
Fix: Feed watchdog in every state handler.
Quick test: Force sensor timeout and observe no resets.

Problem 3: “Button/sensor input is noisy”

Why: No debounce or filtering.
Fix: Add software debounce and require stable readings for N ticks.
Quick test: Log raw vs. filtered input transitions.

Problem 4: “State machine stuck in ERROR_SAFE”

Why: Recovery state never retries initialization.
Fix: Add periodic recovery attempts with backoff.
Quick test: Disconnect sensor, reconnect, and confirm recovery.

Definition of Done

Explicit state machine with ACTIVE/LOW_POWER/SLEEP/ERROR_SAFE transitions.
Sensor disconnect triggers ERROR_SAFE without crashing or hanging.
Watchdog is fed on every control path (including error states).
ISR does minimal work and never performs blocking I2C.
Debounce/filtering prevents false state transitions.
24-hour soak test runs without hangs or leaks.

Project 6: Build a Git-like Version Control System

What you’ll build: A simplified version control system that tracks file changes, manages commits as a directed acyclic graph, handles branches/merges, and maintains repository integrity even on interrupted operations.

Why it synthesizes everything: This project is the ultimate state machine challenge. The repository has states (clean, modified, merging, rebasing). Operations have strict preconditions (“can’t commit with unstaged changes”), postconditions (“after commit, working tree matches HEAD”), and cleanup requirements (“interrupted merge must be recoverable”). You’ll face:

State transitions: repo states (CLEAN, DIRTY, MERGING, REBASING, DETACHED)
Invariants: commit graph must be acyclic, every commit reachable from refs
Error propagation: disk full during commit? must not corrupt repository
Cleanup paths: interrupted operations must leave recoverable state
Temporal bugs: operations depend on previous state
Reentrancy: hooks that trigger other git operations
API design: commands must prevent impossible states

Core challenges you’ll face:

Atomic commits (all-or-nothing, even on power failure) → everything you learned
Merge conflict state machine (MERGING→resolve→CLEAN, or abort→CLEAN) → state transitions
Object database integrity (SHA verification, dangling object cleanup) → resource patterns
Hook reentrancy (post-commit hook does another commit?!) → reentrancy
Index/staging area invariants (index always consistent with something) → loop invariants
Detached HEAD handling (user can get into weird states) → API misuse prevention

Key Concepts:

Content-Addressable Storage: “Pro Git” Chapter 10 (Git Internals) - Scott Chacon (free online)
DAG Structures: “Algorithms, Fourth Edition” Chapter on Directed Graphs - Sedgewick & Wayne
Atomic File Operations: “The Linux Programming Interface” Chapter 5 - Michael Kerrisk
Crash-Safe Design: “Designing Data-Intensive Applications” Chapter 7 - Martin Kleppmann

Difficulty: Advanced Time estimate: 4-6 weeks Prerequisites: All projects above, or equivalent experience

Real world outcome:

$ ./mygit init
Initialized empty repository in .mygit/

$ echo "hello" > file.txt
$ ./mygit add file.txt
$ ./mygit commit -m "Initial commit"
[master (root-commit) a1b2c3d] Initial commit

$ ./mygit branch feature
$ ./mygit checkout feature
Switched to branch 'feature'

$ echo "world" >> file.txt
$ ./mygit commit -am "Add world"
[feature e4f5g6h] Add world

$ ./mygit checkout master
$ ./mygit merge feature
Merge made by the 'recursive' strategy.

$ ./mygit log --graph
* e4f5g6h (HEAD -> master, feature) Add world
* a1b2c3d Initial commit

Learning milestones:

Basic commit/checkout works → You understand state snapshots
Interrupted commit doesn’t corrupt repo → You’ve mastered cleanup paths
Merge conflicts enter MERGING state, abort works → State machine mastery
100 rapid commits, no corruption → Temporal correctness achieved
mygit fsck finds no errors after stress test → Invariants are bulletproof

Real World Outcome

Enhanced example output:

$ ./mygit init
Initialized empty repository in .mygit/
[REPO STATE] CLEAN

$ echo "hello" > file.txt
$ ./mygit status
[REPO STATE] DIRTY (1 untracked file)
Untracked files:
  file.txt

$ ./mygit add file.txt
[INDEX] Added blob sha1:ce013625 (file.txt)
[REPO STATE] DIRTY (1 staged file)

$ ./mygit commit -m "Initial commit"
[COMMIT] Creating commit object...
[COMMIT] Writing tree sha1:9a6f2b1e
[COMMIT] Writing commit sha1:a1b2c3d4
[COMMIT] Updating ref refs/heads/master -> a1b2c3d4
[master (root-commit) a1b2c3d4] Initial commit
[REPO STATE] CLEAN

$ ./mygit branch feature
[REF] Created refs/heads/feature -> a1b2c3d4

$ ./mygit checkout feature
[CHECKOUT] Detaching HEAD from master
[CHECKOUT] Attaching HEAD to feature
[WORKTREE] Updating working directory...
Switched to branch 'feature'
[REPO STATE] CLEAN

$ echo "world" >> file.txt
$ ./mygit commit -am "Add world"
[INDEX] Auto-staging modified file: file.txt
[COMMIT] Creating commit object with parent a1b2c3d4...
[COMMIT] Writing tree sha1:f8e7d6c5
[COMMIT] Writing commit sha1:e4f5g6h7
[COMMIT] Updating ref refs/heads/feature -> e4f5g6h7
[feature e4f5g6h7] Add world
[REPO STATE] CLEAN

$ ./mygit checkout master
[CHECKOUT] Saving current branch state: feature -> e4f5g6h7
[CHECKOUT] Switching to master
[WORKTREE] Rewinding file.txt to previous state
Switched to branch 'master'
[REPO STATE] CLEAN

$ ./mygit merge feature
[MERGE] Analyzing merge base...
[MERGE] Fast-forward possible: master is ancestor of feature
[MERGE] Fast-forwarding master to e4f5g6h7
[WORKTREE] Updating working directory...
Updating a1b2c3d4..e4f5g6h7
Fast-forward
[REPO STATE] CLEAN

# Simulate interrupted operation
$ echo "conflict" > file.txt
$ ./mygit checkout feature
[CHECKOUT] ERROR: uncommitted changes in file.txt
[REPO STATE] DIRTY (1 modified file)

$ ./mygit status
[REPO STATE] DIRTY (1 modified file)
Modified files:
  file.txt

# Create actual merge conflict
$ ./mygit checkout -b conflict-branch
$ echo "branch change" > file.txt
$ ./mygit commit -am "Branch modification"
[conflict-branch b8c9d0e1] Branch modification

$ ./mygit checkout master
$ echo "master change" > file.txt
$ ./mygit commit -am "Master modification"
[master f2g3h4i5] Master modification

$ ./mygit merge conflict-branch
[MERGE] Analyzing merge base...
[MERGE] Three-way merge required
[MERGE] Detecting conflicts...
[MERGE] CONFLICT in file.txt
[REPO STATE] MERGING
Automatic merge failed; fix conflicts and then commit the result.

$ ./mygit status
[REPO STATE] MERGING
On branch master
You have unmerged paths.
  (fix conflicts and run "mygit commit")
  (use "mygit merge --abort" to abort the merge)

Unmerged paths:
  both modified:   file.txt

$ ./mygit merge --abort
[MERGE] Aborting merge...
[MERGE] Restoring working directory to pre-merge state
[MERGE] Removing .mygit/MERGE_HEAD
[REPO STATE] CLEAN
Merge aborted.

# Verify repository integrity
$ ./mygit fsck
[FSCK] Checking object database...
[FSCK] Verifying 12 objects...
[FSCK] Checking commit graph for cycles...
[FSCK] Checking reachability from refs...
[FSCK] Verifying blob SHA-1 checksums...
[FSCK] All checks passed. Repository is healthy.

This shows the observable state transitions, atomic operations, error handling, and recovery mechanisms that make your VCS robust.

The Core Question You’re Answering

Synthesis Question: How does a version control system maintain consistency guarantees across concurrent file modifications, branch operations, and merge conflicts while ensuring that every repository state is either fully valid or recoverable?

This question sits at the intersection of:

State machine design (repository states: CLEAN, DIRTY, MERGING, REBASING)
Atomic operations (commits must be all-or-nothing)
Crash recovery (interrupted operations must be resumable)
Temporal correctness (operations depend on historical state)
Invariant preservation (commit graph must always be a valid DAG)

Why this matters: Git is used by millions of developers daily because it never corrupts repositories. Even power failures during commits leave the repository in a recoverable state. This level of robustness requires understanding state machines, atomic file operations, and careful invariant maintenance—the core of Sprint 3.

Concepts You Must Understand First

Before you can build a robust version control system, you need solid understanding of:

1. Directed Acyclic Graphs (DAGs)

Commits form a DAG where each commit points to parent(s)
DAG properties: no cycles, topological ordering exists
Merge commits have multiple parents
Branch operations are just moving pointers in the DAG

Why it matters: Your commit graph must never have cycles, or you can’t determine linear history. The repository state machine depends on DAG integrity.

2. Content-Addressable Storage

Objects (blobs, trees, commits) are identified by SHA-1 hash of content
Same content → same hash → automatic deduplication
Hash integrity: if hash matches, content is valid
Object database is append-only for safety

Why it matters: Content addressing gives you free integrity checking and prevents accidental corruption. It’s the foundation of Git’s reliability.

3. Atomic File Operations

Write to temporary file, then rename() to final location
rename() is atomic on POSIX systems
Interrupted operation leaves either old file or new file, never corrupted file
References (branch pointers) must be updated atomically

Why it matters: Without atomic operations, a crash during commit could corrupt your repository. You need OS-level guarantees.

4. State Machine Design for Repository States

States: CLEAN, DIRTY, MERGING, REBASING, CHERRY_PICKING, DETACHED

Transitions:
CLEAN --[modify file]--> DIRTY
DIRTY --[commit]--> CLEAN
CLEAN --[merge with conflicts]--> MERGING
MERGING --[resolve + commit]--> CLEAN
MERGING --[abort]--> CLEAN

Why it matters: Invalid state transitions (like committing during a rebase without resolution) must be rejected. The state machine enforces correctness.

5. Index (Staging Area) Semantics

Index is a snapshot of what will be in the next commit
It’s a state between working directory and repository
Index must always be valid: parseable, references existing objects
Partial staging: index can differ from both working dir and HEAD

Why it matters: The index is the most complex state component. It must maintain invariants even when partially updated.

Questions to Guide Your Design

As you build, constantly ask:

Repository Integrity

What if the program crashes mid-commit?
- Which files have been written? Which haven’t?
- How do you detect incomplete commits?
- How do you roll back or complete the operation?
How do you prevent commit graph cycles?
- When creating a merge commit, how do you verify parent relationships?
- Can an attacker manually edit objects to create cycles?
- How does fsck detect this?
What if .mygit/ directory is corrupted?
- Which files are critical vs recoverable?
- Can you rebuild the index from HEAD?
- How do you detect bit-rot in stored objects?

State Transitions

What operations are valid in MERGING state?
- Can you commit normally? (Yes, to complete merge)
- Can you checkout another branch? (No, finish or abort merge first)
- Can you start a rebase? (No, one operation at a time)
How do you represent “detached HEAD” state?
- HEAD points to commit directly, not to a ref
- What warnings should you give the user?
- What happens if they commit in this state?
What if merge conflicts span multiple files?
- Do you enter MERGING state only after all conflicts are resolved?
- Or does each file have individual merge state?
- How do you track partial resolution?

Atomic Operations

How do you make branch creation atomic?
- Write ref file atomically (temp file + rename)
- What if two processes create same branch simultaneously?
- Do you need locking?
How do you ensure “commit” is all-or-nothing?
- Order of operations: write blobs, write tree, write commit, update ref
- If power fails after tree but before ref update, what happens?
- Can you detect and clean up orphaned objects?

Cleanup and Recovery

What happens if merge is interrupted?
- You write .mygit/MERGE_HEAD to mark MERGING state
- User must either complete merge or run merge --abort
- How do you restore pre-merge state on abort?
How do you clean up orphaned objects?
- Objects not reachable from any ref
- Keep recent orphans (might be in progress), delete old ones
- How do you implement garbage collection?

Thinking Exercise: Trace a Merge Operation

Scenario: You have this repository state:

(master)        (feature)
    A             C
     \           /
      B---------D

Commits A and B are on master, C and D are on feature. They both modified file.txt in conflicting ways. You run mygit merge feature from master.

Trace the state machine:

[STATE: CLEAN] HEAD=master=B, working_dir=clean

1. User runs: mygit merge feature
   → Check precondition: working directory must be CLEAN
   → Identify merge base: commit A (common ancestor of B and D)

2. Three-way merge:
   → Read file.txt from commit A (base version)
   → Read file.txt from commit B (ours version)
   → Read file.txt from commit D (theirs version)
   → Attempt automatic merge
   → CONFLICT DETECTED

3. Enter MERGING state:
   → Write .mygit/MERGE_HEAD with commit ID of D
   → Write conflict markers to working directory file.txt:
     <<<<<<< HEAD (B)
     our changes
     =======
     their changes
     >>>>>>> feature (D)
   → Update index to mark file.txt as unmerged

[STATE: MERGING] HEAD=master=B, MERGE_HEAD=D, file.txt has conflicts

4. User edits file.txt to resolve conflicts
   [STATE: MERGING] (no change, still conflicted)

5. User runs: mygit add file.txt
   → Read resolved file.txt
   → Compute blob hash: sha1:xyz...
   → Write blob object to .mygit/objects/xy/z...
   → Update index to mark file.txt as resolved
   [STATE: MERGING] (still in MERGING, but conflicts resolved)

6. User runs: mygit commit -m "Merge feature"
   → Check precondition: state must be MERGING
   → Check that all conflicts are resolved (no unmerged paths in index)
   → Create tree from index: sha1:tree123
   → Create commit with TWO parents: B (HEAD) and D (MERGE_HEAD)
   → Write commit object: sha1:commit456
   → Update master ref: B -> commit456
   → Delete .mygit/MERGE_HEAD
   → Transition to CLEAN

[STATE: CLEAN] HEAD=master=commit456, working_dir=clean

Graph now looks like:
    A     C
     \   /
      B-D
       \|
        M (merge commit456)

What if user ran mygit merge --abort instead?

[STATE: MERGING] after step 3 above

6. User runs: mygit merge --abort
   → Check precondition: state must be MERGING
   → Read .mygit/MERGE_HEAD to find D
   → Read HEAD to find B
   → Restore working directory to state at B:
     - Read tree from commit B
     - Overwrite file.txt with version from B
     - Clear conflict markers
   → Delete .mygit/MERGE_HEAD
   → Transition to CLEAN

[STATE: CLEAN] HEAD=master=B, working_dir=clean (as if merge never happened)

Critical questions from this trace:

What if power fails after writing .mygit/MERGE_HEAD but before writing conflict markers?
- On restart, repository is in MERGING state, but working directory is clean
- mygit status should detect this and either complete or allow abort
What if user modifies file.txt while in MERGING state but doesn’t run add?
- mygit status should show “modified but not staged for commit”
- mygit commit should fail: “conflicts not resolved”
What if .mygit/MERGE_HEAD contains an invalid commit hash?
- Repository is corrupted
- mygit fsck should detect this
- User must manually delete .mygit/MERGE_HEAD to recover

The Interview Questions They’ll Ask

Junior Level (Basic State Machine Understanding)

“Explain what happens when you run git commit“
- Expected answer: Creates blob objects for staged files, creates tree object, creates commit object with parent pointer, updates branch ref
- Follow-up: “What if commit fails halfway through?”
- Expected answer: Objects are written but ref is not updated, so orphaned objects exist but don’t affect repository state
“What’s the difference between git reset --soft, --mixed, and --hard?”
- Expected answer: All move HEAD pointer, but differ in what they do to index and working directory
- Shows understanding of three-state model: repository, index, working directory
“Why does Git use SHA-1 hashes instead of sequential numbers?”
- Expected answer: Content addressing, integrity checking, distributed operation without coordination
- Follow-up: “What happens if two objects have same hash?”
- Expected answer: Hash collision (astronomically unlikely), but Git would detect it

Mid-Level (Operational Consistency)

“Design the data structure for the staging area (index)”
- Expected answer: Map of filepath -> (blob_hash, mode, stage_number)
- Stage numbers: 0=normal, 1=base, 2=ours, 3=theirs (for merge conflicts)
- Must be efficiently serializable and parseable
“How would you implement atomic branch creation?”
- Expected answer: Write to temporary file, compute final path, rename() atomically
- Follow-up: “What if two processes create same branch?”
- Expected answer: Second rename() fails, detect and report error
“Explain the state machine for a merge operation”
- Expected answer: CLEAN -> (detect conflicts) -> MERGING -> (resolve) -> (commit) -> CLEAN
- Alternative path: MERGING -> (abort) -> CLEAN
- Follow-up: “Can you commit during MERGING state?”
- Expected answer: Yes, that’s how you complete the merge. But it’s a special commit (merge commit with two parents)

Senior Level (System Design & Edge Cases)

“Design garbage collection for your VCS”
- Expected answer:
  - Find all reachable objects from refs (mark phase)
  - Find all objects in object database (sweep phase)
  - Delete unreachable objects older than N days (grace period for in-progress operations)
  - Compress/repack remaining objects for efficiency
- Follow-up: “How do you avoid deleting objects from a concurrent commit?”
- Expected answer: Use timestamp-based grace period, or read-write locks on object database
“How do you ensure repository consistency after a crash during rebase?”
- Expected answer:
  - Write .mygit/REBASE_HEAD and .mygit/rebase-apply/ state directory
  - Each rebase step is atomic (apply patch + create commit)
  - On crash, detect REBASE state on next operation
  - Offer to continue (--continue) or abort (--abort)
  - Abort restores original branch pointer from saved state
“How would you detect and prevent commit graph cycles?”
- Expected answer:
  - During commit creation, verify parent commits exist and are reachable
  - Implement cycle detection in fsck: DFS with visited set, detect back edges
  - Reject operations that would create cycles (manual object editing)
  - Use topological sort to verify DAG property
“Design the protocol for safe concurrent access to the repository”
- Expected answer:
  - Object writes are safe (content-addressed, append-only)
  - Ref updates need locking or compare-and-swap
  - Use lock files (.mygit/refs/heads/master.lock)
  - Implement timeout and stale lock detection
  - Alternative: use atomic rename() with retry on conflict

Hints in Layers (Progressive Revelation)

Layer 1: Architecture Hints

Click to reveal high-level architecture hints

Repository Structure:

.mygit/
  objects/          # Content-addressable object database
    xy/
      z123...       # Blob, tree, or commit object
  refs/
    heads/
      master        # File containing commit hash
      feature
    tags/
      v1.0
  HEAD              # Points to current branch or commit
  index             # Staging area (binary file)
  MERGE_HEAD        # Present during merge (commit being merged)
  config            # Repository configuration

Core Data Structures:

Blob: raw file content
Tree: directory listing (filename -> blob hash + mode)
Commit: tree hash + parent hash(es) + author + message
Index: staged changes (filepath -> blob hash + metadata)

Key Operations:

add: Compute blob hash, write blob object, update index
commit: Create tree from index, create commit object, update branch ref
checkout: Update HEAD, update index, update working directory
merge: Find merge base, three-way merge, create merge commit or enter MERGING state

Layer 2: State Machine Hints

Click to reveal state machine implementation hints

Repository States (as enum):

typedef enum {
    REPO_CLEAN,           // No changes, no operations in progress
    REPO_DIRTY,           // Uncommitted changes in working dir or index
    REPO_MERGING,         // Merge in progress (MERGE_HEAD exists)
    REPO_REBASING,        // Rebase in progress (rebase-apply/ exists)
    REPO_CHERRY_PICKING,  // Cherry-pick in progress
    REPO_DETACHED         // HEAD points to commit, not branch
} RepoState;

State Detection Function:

RepoState get_repo_state(void) {
    if (file_exists(".mygit/MERGE_HEAD"))
        return REPO_MERGING;
    if (dir_exists(".mygit/rebase-apply"))
        return REPO_REBASING;
    if (is_head_detached())
        return REPO_DETACHED;
    if (has_uncommitted_changes())
        return REPO_DIRTY;
    return REPO_CLEAN;
}

Precondition Checking:

// Before checkout
void check_checkout_preconditions(void) {
    RepoState state = get_repo_state();
    if (state == REPO_MERGING)
        die("cannot checkout during merge; commit or abort first");
    if (state == REPO_DIRTY && !force_flag)
        die("uncommitted changes; commit or stash first");
}

Layer 3: Atomic Operation Hints

Click to reveal atomic operation implementation hints

Atomic Ref Update (branch pointer):

int update_ref_atomic(const char* refname, const char* new_hash) {
    char path[PATH_MAX];
    char tmppath[PATH_MAX];

    snprintf(path, sizeof(path), ".mygit/refs/heads/%s", refname);
    snprintf(tmppath, sizeof(tmppath), "%s.lock", path);

    // Write to temporary file
    FILE* f = fopen(tmppath, "w");
    if (!f) return -1;
    fprintf(f, "%s\n", new_hash);
    fclose(f);

    // Atomic rename
    if (rename(tmppath, path) != 0) {
        unlink(tmppath);  // Clean up on failure
        return -1;
    }

    return 0;
}

Atomic Commit Creation:

int create_commit_atomic(const char* message, const char* parent_hash) {
    // 1. Write blob objects for all staged files (idempotent, safe)
    write_staged_blobs();

    // 2. Write tree object (idempotent, content-addressed)
    char tree_hash[41];
    write_tree_from_index(tree_hash);

    // 3. Write commit object (idempotent)
    char commit_hash[41];
    write_commit_object(tree_hash, parent_hash, message, commit_hash);

    // 4. Update branch ref (atomic, last step)
    // If this fails, objects are orphaned but repo is consistent
    if (update_ref_atomic(current_branch(), commit_hash) != 0)
        return -1;

    return 0;
}

Why this order matters:

Steps 1-3 write to object database (append-only, no existing data modified)
Step 4 is the “commit point”—if it succeeds, commit is visible; if it fails, repo unchanged
Crash before step 4: orphaned objects (harmless, cleaned by GC later)
Crash during step 4: rename() is atomic, so ref points to old or new commit, never corrupted

Layer 4: Merge State Machine Hints

Click to reveal merge conflict handling hints

Three-Way Merge Algorithm:

typedef enum {
    MERGE_SUCCESS,        // Clean merge
    MERGE_CONFLICT,       // Conflicts detected
    MERGE_FAST_FORWARD    // Can fast-forward
} MergeResult;

MergeResult merge_branch(const char* their_branch) {
    // 1. Check preconditions
    if (get_repo_state() != REPO_CLEAN)
        die("repository not clean");

    // 2. Find merge base
    char* base = find_merge_base(current_commit(), their_commit);

    // 3. Check for fast-forward
    if (strcmp(base, current_commit()) == 0) {
        // Our branch hasn't diverged, just move pointer
        update_ref_atomic(current_branch(), their_commit);
        return MERGE_FAST_FORWARD;
    }

    // 4. Three-way merge
    MergeResult result = three_way_merge(base, current_commit(), their_commit);

    if (result == MERGE_CONFLICT) {
        // 5. Enter MERGING state
        write_merge_head(their_commit);  // Create .mygit/MERGE_HEAD
        write_conflict_markers_to_worktree();
        mark_index_unmerged();
        return MERGE_CONFLICT;
    }

    // 6. Create merge commit (two parents)
    create_merge_commit(current_commit(), their_commit, "Merge ...");
    return MERGE_SUCCESS;
}

Merge Abort:

void merge_abort(void) {
    // 1. Verify we're in MERGING state
    if (!file_exists(".mygit/MERGE_HEAD"))
        die("no merge in progress");

    // 2. Read original HEAD
    char original_head[41];
    read_head(original_head);

    // 3. Restore working directory to HEAD state
    checkout_tree_to_worktree(original_head);

    // 4. Reset index to HEAD
    reset_index_to_commit(original_head);

    // 5. Remove MERGE_HEAD (exit MERGING state)
    unlink(".mygit/MERGE_HEAD");
}

Conflict Markers:

void write_conflict_markers(const char* filepath,
                            const char* ours_content,
                            const char* theirs_content) {
    FILE* f = fopen(filepath, "w");
    fprintf(f, "<<<<<<< HEAD\n");
    fprintf(f, "%s", ours_content);
    fprintf(f, "=======\n");
    fprintf(f, "%s", theirs_content);
    fprintf(f, ">>>>>>> %s\n", their_branch_name);
    fclose(f);
}

Layer 5: Recovery and Fsck Hints

Click to reveal crash recovery and integrity checking hints

Detect Incomplete Operations on Startup:

void check_repo_integrity_on_startup(void) {
    // If MERGE_HEAD exists but no conflict markers, merge was interrupted
    if (file_exists(".mygit/MERGE_HEAD")) {
        fprintf(stderr, "WARNING: Merge in progress. Use 'merge --continue' or 'merge --abort'\n");
    }

    // If rebase-apply/ exists, rebase was interrupted
    if (dir_exists(".mygit/rebase-apply")) {
        fprintf(stderr, "WARNING: Rebase in progress. Use 'rebase --continue' or 'rebase --abort'\n");
    }

    // Check for lock files (stale locks from crashes)
    remove_stale_locks();
}

Fsck: Verify Repository Integrity:

void fsck(void) {
    // 1. Enumerate all objects in object database
    HashSet* db_objects = find_all_objects_in_database();

    // 2. Verify each object's SHA-1 checksum
    for (each object in db_objects) {
        if (!verify_object_hash(object))
            fprintf(stderr, "ERROR: Corrupted object %s\n", object);
    }

    // 3. Find all reachable objects from refs
    HashSet* reachable = new_hashset();
    for (each ref in refs/) {
        mark_reachable_recursive(ref_commit, reachable);
    }

    // 4. Report unreachable objects (candidates for GC)
    for (each object in db_objects) {
        if (!hashset_contains(reachable, object))
            fprintf(stderr, "Unreachable object: %s\n", object);
    }

    // 5. Verify commit graph is acyclic
    if (detect_cycle_in_commit_graph())
        fprintf(stderr, "FATAL: Cycle detected in commit graph!\n");
}

Cycle Detection:

bool detect_cycle_dfs(const char* commit_hash, HashSet* visiting, HashSet* visited) {
    if (hashset_contains(visited, commit_hash))
        return false;  // Already processed, no cycle here

    if (hashset_contains(visiting, commit_hash))
        return true;   // Back edge detected—CYCLE!

    hashset_add(visiting, commit_hash);

    // Recurse on parents
    char** parents = get_commit_parents(commit_hash);
    for (int i = 0; parents[i]; i++) {
        if (detect_cycle_dfs(parents[i], visiting, visited))
            return true;
    }

    hashset_remove(visiting, commit_hash);
    hashset_add(visited, commit_hash);
    return false;
}

Books That Will Help

Topic/Challenge	Primary Resource	Chapter/Section	Why This Specific Section
Content-Addressable Storage	Pro Git by Scott Chacon	Chapter 10: Git Internals	Free online. Shows exact `.git` structure, object types, how Git actually stores data. Read this first.
DAG Algorithms	Algorithms, 4th Edition by Sedgewick & Wayne	Chapter 4.2: Directed Graphs	Topological sort, cycle detection, reachability—exactly what you need for commit graphs
Atomic File Operations	The Linux Programming Interface by Michael Kerrisk	Chapter 5.1: Atomic Operations, Chapter 15.1: File Attributes	Explains why `rename()` is atomic, how to use temp files safely
Crash-Safe Design	Designing Data-Intensive Applications by Martin Kleppmann	Chapter 7: Transactions	Explains atomicity, durability, consistency—same principles Git uses
State Machine Implementation	Language Implementation Patterns by Terence Parr	Chapter 2: Basic Parsing Patterns	State-driven parsing applies to repository state machines
Hash Functions & Integrity	Applied Cryptography by Bruce Schneier	Chapter 18.14: SHA	Understand why Git uses SHA-1, what collision resistance means
Error Handling in C	Effective C by Robert Seacord	Chapter 8: Error Handling	How to propagate errors without exceptions (critical for C implementation)
Merge Algorithms	Version Control with Git by Prem Kumar Ponuthorai & Jon Loeliger	Chapter 9: Merges	Three-way merge, recursive strategy, conflict resolution details
Index/Staging Design	Pro Git by Scott Chacon	Chapter 10.2: Git Objects, 10.4: Packfiles	How index file format works, why it’s separate from working dir and commits
Memory-Mapped Files	The Linux Programming Interface by Michael Kerrisk	Chapter 49: Memory Mappings	Efficient way to read/write index and object files
Reference Counting & GC	The Garbage Collection Handbook by Jones, Hosking, Moss	Chapter 3: Mark-Sweep Garbage Collection	Git’s GC is mark-sweep over object graph

Reading Order for This Project:

Pro Git Chapter 10 (understand Git internals) - START HERE
Algorithms Chapter 4.2 (DAG algorithms)
Linux Programming Interface Chapter 5 (atomic operations)
Effective C Chapter 8 (error handling patterns)
Designing Data-Intensive Applications Chapter 7 (consistency guarantees)

When You Get Stuck:

“How do I make operations atomic?” → Linux Programming Interface Chapter 5
“How do I detect cycles?” → Algorithms Chapter 4.2
“How do merge algorithms work?” → Version Control with Git Chapter 9
“How do I design the index file format?” → Pro Git Chapter 10.2

Common Pitfalls & Debugging

Problem 1: “Crash during commit corrupts refs”

Why: Ref updated before objects are fully written.
Fix: Write objects first, update refs last (atomic rename).
Quick test: Kill process after object write but before ref update; repo should remain consistent.

Problem 2: “Merge state gets stuck”

Why: MERGE_HEAD not cleaned up or conflicts not marked resolved.
Fix: Ensure merge abort removes MERGE_HEAD and resets index.
Quick test: Simulate merge conflict and abort; repo should return to CLEAN.

Problem 3: “Index corruption”

Why: Partial writes to index file or stale lock files.
Fix: Write index to temp file and rename; remove stale locks on startup.
Quick test: Force crash during index write; ensure recovery path detects and repairs.

Problem 4: “Commit graph cycles”

Why: Invalid parent references or manual object edits.
Fix: Validate parents exist and run cycle detection in fsck.
Quick test: Inject a cycle and confirm fsck fails.

Definition of Done

init, add, commit, status, checkout, branch, merge, and log operate correctly.
Commit creation is crash-safe (objects written first, ref updated last).
Merge conflicts enter MERGING state and are abortable.
Repository state detection correctly reports CLEAN/DIRTY/MERGING/DETACHED.
fsck detects corrupted objects and commit graph cycles.
Garbage collection removes unreachable objects after a grace period.
Stress test (100 commits, 20 branches) shows no corruption or leaks.

This final project integrates every concept from Sprint 3 into a single, real-world tool that you’ll actually understand at the deepest level—because you built it from nothing.