Project 3: Mini Async Runtime

Build a minimal async runtime with a reactor, timers, and non-blocking TCP I/O.

Quick Reference

Attribute                          Value
Difficulty                         Level 3: Advanced
Time Estimate                      2 to 4 weeks
Main Programming Language          C (C17)
Alternative Programming Languages  Rust, Zig
Coolness Level                     Level 4: Hardcore Tech Flex
Business Potential                 3 - Runtime Core
Prerequisites                      Sockets, file descriptors, basic concurrency
Key Topics                         epoll/kqueue, event loop, timers, futures, backpressure

1. Learning Objectives

By completing this project, you will:

  1. Implement a working event loop with non-blocking I/O and timers.
  2. Build a task scheduler that drives state machines instead of threads.
  3. Understand the reactor model and readiness vs completion semantics.
  4. Enforce backpressure to avoid unbounded memory growth.
  5. Produce a deterministic demo and benchmark for thousands of connections.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: Non-Blocking I/O and Readiness Models

Fundamentals

Non-blocking I/O lets a single thread manage many connections. When a socket is non-blocking, operations like read and write return immediately. If no data is available, the call returns an error such as EAGAIN or EWOULDBLOCK. To avoid busy loops, the program waits on a poller like epoll or kqueue which tells it which file descriptors are ready. This is the readiness model: the OS tells you when a resource is ready for I/O, not when the operation has completed. The runtime must handle partial reads and writes, because readiness means “you can try” rather than “you will finish”. This is the foundation of async runtimes and the reason a single thread can handle thousands of connections.

Deep Dive into the concept

Readiness-based I/O multiplexing is built around a poller: select, poll, epoll, or kqueue. select and poll scale poorly because they scan all file descriptors each time. epoll (Linux) and kqueue (BSD/macOS) are more scalable because they maintain state in the kernel and return only the ready descriptors. They can operate in level-triggered or edge-triggered mode. Level-triggered means the descriptor remains ready until the condition is cleared; edge-triggered means you only get notified when readiness transitions. Edge-triggered can be more efficient but is easier to get wrong, because you must drain the socket fully or you may never get another event.
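
As a concrete sketch of the draining this requires, the loop below keeps reading until the kernel reports EAGAIN (names and the buffer size are illustrative; a real runtime would append the bytes into per-connection state instead of discarding them):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from a non-blocking fd until EAGAIN or EOF. Returns total
 * bytes consumed, or -1 on a real error. Sketch only: bytes are
 * discarded here rather than stored. */
ssize_t drain_fd(int fd) {
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                 /* EOF: peer closed */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                         /* fully drained */
        if (errno == EINTR) continue;             /* interrupted: retry */
        return -1;                                /* real error */
    }
}
```

In edge-triggered mode, returning before EAGAIN is the classic way to lose events; this loop is what "drain the socket fully" means in practice.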

Non-blocking sockets require state machines. Suppose a write only writes part of the buffer. Your runtime must store the remaining bytes and retry when the socket is writable again. Similarly, a read might return fewer bytes than requested or even zero if the connection is closed. The runtime must interpret these results, update connection state, and reregister interest appropriately. This is why async runtimes track per-connection state and attach it to the poller.
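
A minimal sketch of the partial-write state this paragraph describes, using a hypothetical outbuf_t struct (a real runtime would also re-arm EPOLLOUT when the flush reports it is unfinished):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-connection output buffer: `sent` tracks how far
 * into `data` we have already written. */
typedef struct {
    char   data[4096];
    size_t len;   /* bytes queued */
    size_t sent;  /* bytes already written */
} outbuf_t;

/* Try to flush. Returns 1 when fully flushed, 0 if the socket buffer
 * filled up (wait for writability and retry), -1 on a real error. */
int outbuf_flush(outbuf_t *b, int fd) {
    while (b->sent < b->len) {
        ssize_t n = write(fd, b->data + b->sent, b->len - b->sent);
        if (n > 0) { b->sent += (size_t)n; continue; }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                 /* kernel buffer full: resume later */
        if (n < 0 && errno == EINTR) continue;
        return -1;
    }
    return 1;
}
```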

Ready does not equal complete. A socket can be reported writable, but the write can still return EAGAIN because the socket buffer filled between readiness check and write. That is normal and must be handled without spinning. The runtime should register interest again and wait. This is often the source of high CPU bugs: a runtime that immediately retries instead of waiting will spin at 100 percent CPU.

On Linux, epoll operates with an explicit registration model: you add file descriptors with EPOLLIN or EPOLLOUT events. You can use epoll_ctl to modify interest, for example enabling EPOLLOUT only when you actually have data to send. This avoids being woken up unnecessarily. On BSD, kqueue uses events like EVFILT_READ and EVFILT_WRITE. The semantics are similar but the API differs. An abstraction layer is often needed if you want portability. In your runtime, you can focus on one poller but keep the design extensible.
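
For example, a small helper (the conn parameter and all names are illustrative) can toggle EPOLLOUT with epoll_ctl so the loop is only woken for writability when there is actually data to send:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Toggle write interest for an already-registered fd. `conn` is
 * whatever per-connection struct was stored in data.ptr at
 * EPOLL_CTL_ADD time. */
int set_write_interest(int epfd, int fd, void *conn, int want_write) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | (want_write ? EPOLLOUT : 0);
    ev.data.ptr = conn;              /* re-attach the connection pointer */
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

Storing the connection pointer in data.ptr is also the hook for the fd-lifetime checks described below: when an event fires, the pointer tells you which connection it belongs to.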

A correct runtime must handle file descriptor lifetimes carefully. If a connection is closed, you must remove it from the poller. If you reuse file descriptors, you must ensure that stale registrations do not trigger callbacks for the wrong connection. This is particularly important in a low-level C runtime where there is no automatic lifetime tracking. A common pattern is to assign each connection an ID and store a pointer to a connection struct in the poller data field. When you handle an event, verify that the connection is still alive.

Finally, readiness interacts with backpressure. If you accept connections faster than you can process them, your runtime may run out of memory for buffers or state. You need to implement limits on the number of open connections, the size of per-connection buffers, and the size of the task queue. If these limits are exceeded, you should stop accepting new connections until the queue drains. This is not just a performance feature; it is essential for stability under load.

How This Fits Across Projects

This concept defines the socket handling in Section 3.2 Functional Requirements, the poller abstraction in Section 4.1, and the event loop implementation in Section 5.10 Phase 1. It also connects to syscall wrappers in P04-cross-platform-syscall-abstraction.md and concurrency in P02-work-stealing-thread-pool.md.

Definitions & Key Terms

  • Non-blocking I/O: Operations return immediately with EAGAIN if they would block.
  • Readiness: A signal that an I/O operation is likely to succeed.
  • Poller: OS mechanism to wait for events on file descriptors.
  • Level-triggered: Event stays active until condition is cleared.
  • Edge-triggered: Event fires only on state transitions.

Mental Model Diagram (ASCII)

Socket ready? -> event loop -> try read/write -> update state

How It Works (Step-by-Step)

  1. Set sockets to non-blocking mode.
  2. Register sockets with the poller for read/write events.
  3. Wait for events with epoll_wait or kevent.
  4. On readiness, attempt I/O and update state.
  5. If EAGAIN, re-register interest and continue.

Invariants:

  • All sockets registered with the poller are non-blocking.
  • Events are processed until buffers are drained.

Failure modes:

  • Busy loops due to retrying without waiting.
  • Missed events due to incorrect edge-triggered handling.

Minimal Concrete Example

int flags = fcntl(fd, F_GETFL, 0);
if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
    perror("fcntl");   /* failed to make the socket non-blocking */
}

Common Misconceptions

  • “Ready means the operation will succeed” -> It can still return EAGAIN.
  • “Edge-triggered is always better” -> It is easier to miss events.
  • “Non-blocking means faster” -> It means controlled latency, not raw speed.

Check-Your-Understanding Questions

  1. Why must you handle partial writes in a non-blocking runtime?
  2. What is the difference between readiness and completion?
  3. Why can an edge-triggered loop miss events if you do not drain the socket?

Check-Your-Understanding Answers

  1. The kernel may only accept part of the buffer in a single write.
  2. Readiness signals that you can try; completion means the operation finished.
  3. Edge-triggered events fire only on state changes; if you leave data unread, no new edge occurs.

Real-World Applications

  • Web servers like Nginx use readiness-based event loops.
  • Node.js uses libuv to multiplex I/O with epoll and kqueue.

Where You’ll Apply It

Socket setup in Section 3.2, the poller abstraction in Section 4.1, and the Phase 1 event loop in Section 5.10.

References

  • “Advanced Programming in the UNIX Environment” - I/O multiplexing.
  • “The Linux Programming Interface” - epoll details.

Key Insights

Readiness-based I/O is the reason one thread can handle thousands of connections, but it requires careful state management.

Summary

Non-blocking I/O and pollers provide scalable concurrency, but they demand explicit handling of partial I/O and readiness semantics. A runtime must treat EAGAIN as a normal state, not an error.

Homework/Exercises to Practice the Concept

  1. Write a non-blocking echo server without an event loop and observe busy loops.
  2. Implement a tiny epoll loop that accepts and reads connections.
  3. Compare level-triggered vs edge-triggered behavior.

Solutions to the Homework/Exercises

  1. Without a poller, the loop spins at high CPU when no data is ready.
  2. The epoll loop blocks efficiently and wakes on readiness.
  3. Edge-triggered requires draining; level-triggered re-fires until cleared.

Concept 2: Event Loops, Timers, and the Reactor Pattern

Fundamentals

An event loop waits for events (I/O readiness, timers, signals), processes them, and then repeats. The reactor pattern is a design where the loop receives events and dispatches them to handlers. Timers are part of the same system: the loop must wake up when the next timer expires. This means the loop must calculate the next deadline and pass a timeout to the poller. A common implementation uses a min-heap of timer deadlines. The heap allows you to quickly find the next timer to fire and adjust the poller timeout accordingly. Without timers, you cannot implement timeouts, retries, or scheduled tasks. With timers integrated, the runtime can schedule work without spinning or sleeping arbitrarily.

Deep Dive into the concept

The core of an event loop is a loop that waits on a poller and a timer structure. Each iteration does three things: (1) determine the timeout based on the next timer, (2) block on the poller for I/O events up to that timeout, and (3) process both I/O events and expired timers. This design ensures the loop does not busy-wait and that timers fire as close to their deadlines as possible. The min-heap provides O(log n) insertion and removal, which is fine for thousands of timers.
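
A minimal array-backed timer heap might look like the sketch below. It stores bare deadlines only; a real heap would carry callback pointers alongside each entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary min-heap of deadlines in milliseconds. Fixed capacity for
 * the sketch; a real runtime would grow the array. */
#define HEAP_CAP 64
typedef struct { uint64_t d[HEAP_CAP]; size_t n; } theap_t;

int theap_push(theap_t *h, uint64_t deadline) {
    if (h->n == HEAP_CAP) return -1;
    size_t i = h->n++;
    h->d[i] = deadline;
    while (i > 0 && h->d[(i - 1) / 2] > h->d[i]) {     /* sift up */
        uint64_t t = h->d[i];
        h->d[i] = h->d[(i - 1) / 2];
        h->d[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
    return 0;
}

/* Next deadline in O(1): this is what the poller timeout comes from. */
uint64_t theap_peek(const theap_t *h) { return h->d[0]; }

uint64_t theap_pop(theap_t *h) {
    uint64_t top = h->d[0];
    h->d[0] = h->d[--h->n];
    size_t i = 0;
    for (;;) {                                         /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->n && h->d[l] < h->d[m]) m = l;
        if (r < h->n && h->d[r] < h->d[m]) m = r;
        if (m == i) break;
        uint64_t t = h->d[i]; h->d[i] = h->d[m]; h->d[m] = t;
        i = m;
    }
    return top;
}
```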

Timers are tricky in the presence of drift and clock changes. If you use the wall clock (CLOCK_REALTIME), system time adjustments can shift timer deadlines and cause jumps. A runtime should use a monotonic clock (CLOCK_MONOTONIC) that only moves forward. The event loop should compute timeouts based on the monotonic clock and handle the case where multiple timers expire in the same iteration. If your runtime uses a fixed tick, you may introduce jitter that affects latency. A timer heap allows exact deadlines but requires careful handling of expired timers to avoid O(n) scans.

Another design choice is how to represent tasks and callbacks. In a reactor, each file descriptor has a callback for read or write readiness. The callback can schedule additional tasks, update the interest set, or close the connection. Timers are similar: a timer expiration triggers a callback. The event loop should avoid running unbounded work in a single iteration, or else it can starve I/O. A common strategy is to limit the number of tasks processed per tick or to implement a run queue with a fairness policy.

The loop also needs a wakeup mechanism. If another thread wants to schedule work onto the loop, it must be able to wake it. This is often done with an eventfd on Linux or a pipe that is registered in the poller. When a task is scheduled, the other thread writes to the eventfd, waking the loop. Even in a single-threaded runtime, you may want a wakeup mechanism to handle signals or to integrate with future multi-threaded extensions.
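
On Linux this can be sketched with an eventfd registered in the poller. All names are illustrative, and NULL in data.ptr is used here as an arbitrary tag marking the wakeup fd:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the wakeup channel and register it with the poller. */
int wakeup_create(int epfd) {
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0) return -1;
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = NULL;              /* NULL tags this as the wakeup fd */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
        close(efd);
        return -1;
    }
    return efd;
}

/* Called from any thread: makes the loop's epoll_wait return. */
void wakeup_signal(int efd) {
    uint64_t one = 1;
    ssize_t r = write(efd, &one, sizeof one);
    (void)r;                         /* best-effort: counter just saturates */
}

/* Called by the loop on EPOLLIN for the wakeup fd: reset the counter. */
void wakeup_drain(int efd) {
    uint64_t v;
    ssize_t r = read(efd, &v, sizeof v);
    (void)r;
}
```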

Correct integration of timers with I/O readiness prevents busy loops. Suppose there are no I/O events and the next timer expires in 100ms. The poller timeout should be 100ms, so the loop sleeps until either I/O arrives or the timer fires. If the timeout is set incorrectly (e.g., always 0), the loop spins. If the timeout is too large, timers fire late. The runtime must compute the timeout precisely and update it as timers are added or canceled.

How This Fits Across Projects

This concept defines the event loop architecture in Section 4.1 and the timer implementation in Section 5.10 Phase 2. It also affects the benchmark output in Section 3.4 and the timeouts used in P06-embedded-key-value-database.md.

Definitions & Key Terms

  • Event loop: A loop that waits for events and dispatches handlers.
  • Reactor: Pattern where events are dispatched to callbacks.
  • Timer heap: A min-heap of deadlines for fast next-timer lookup.
  • Monotonic clock: Clock that only moves forward.
  • Wakeup fd: A file descriptor used to wake a poller.

Mental Model Diagram (ASCII)

[events] + [timers] -> event loop -> callbacks -> tasks

How It Works (Step-by-Step)

  1. Compute next timer deadline and timeout.
  2. Call poller with timeout.
  3. Process I/O events and dispatch callbacks.
  4. Pop expired timers and run timer callbacks.
  5. Repeat.

Invariants:

  • Timer deadlines are ordered in a min-heap.
  • Poller timeout equals time until next deadline.

Failure modes:

  • Timer drift due to wrong clock.
  • Busy loop due to zero timeout.

Minimal Concrete Example

/* Sleep until the next timer deadline or an I/O event, whichever comes first. */
int timeout_ms = compute_next_timer_timeout();
int n = epoll_wait(epfd, events, max_events, timeout_ms);

Common Misconceptions

  • “Timers are separate from I/O” -> They must be integrated with poller timeouts.
  • “Wall clock is fine” -> It can jump and break deadlines.
  • “Process all tasks immediately” -> It can starve I/O.

Check-Your-Understanding Questions

  1. Why should the event loop use a monotonic clock?
  2. How does a timer heap reduce overhead?
  3. What happens if the poller timeout is always 0?

Check-Your-Understanding Answers

  1. It avoids time jumps that would break deadlines.
  2. It provides O(log n) insertion and O(1) access to the next deadline.
  3. The loop spins at 100 percent CPU.

Real-World Applications

  • libuv and Tokio use event loops with timer heaps.
  • Game loops integrate timers for fixed update ticks.

Where You’ll Apply It

The event loop architecture in Section 4.1 and the Phase 2 timer work in Section 5.10.

References

  • “Advanced Programming in the UNIX Environment” - event loops and pollers.
  • “Designing Data-Intensive Applications” - background scheduling ideas.

Key Insights

A correct event loop is about precise timeouts and fairness, not just polling.

Summary

Event loops and timers are the heartbeat of an async runtime. A timer heap and monotonic clock prevent drift and busy loops.

Homework/Exercises to Practice the Concept

  1. Implement a timer heap and test with fixed deadlines.
  2. Integrate a timer heap with epoll_wait timeout.
  3. Add a wakeup fd and schedule a timer from another thread.

Solutions to the Homework/Exercises

  1. The min-heap returns the next deadline in O(1).
  2. The loop sleeps until the earliest timer or I/O event.
  3. The wakeup fd interrupts the poller so new timers are recognized immediately.

Concept 3: Futures, Wakers, and Backpressure

Fundamentals

Async runtimes use tasks that represent suspended computations. A task runs until it cannot make progress, then it yields and registers interest in some event. When the event occurs, the runtime wakes the task and it continues. This is usually modeled as a state machine: each await point is a state. A “waker” is a callback or handle that re-enqueues the task. Backpressure is the control mechanism that prevents the runtime from being overwhelmed. If you accept connections faster than you can process, memory usage grows without bound. Backpressure can be implemented by limiting the number of active tasks, by pausing acceptance, or by bounding queues. Without it, an async runtime can fail under load even if it is logically correct.

Deep Dive into the concept

In languages like Rust, futures are explicit types. In C, you implement the same idea by hand: a task is a struct with a state field and a function that advances it. The function returns a status: READY, PENDING, or DONE. If PENDING, it must arrange for a wake-up when progress is possible. This can be done by registering the task with the event loop and storing a pointer to it in the poller data. When the corresponding event fires, the loop pushes the task back into the run queue.

The key is that a task must not block. If a task performs a blocking read on a socket, the entire loop stalls. Every I/O operation must be non-blocking, and any long computation should be broken into chunks and rescheduled to avoid starving other tasks. This leads to the idea of cooperative scheduling: tasks yield explicitly when they cannot make progress. A runtime often limits the number of tasks processed per tick to avoid starvation. This is a policy decision; you should choose a simple fairness policy and document it.

Wakers are the bridge between the event loop and tasks. A waker is typically a function pointer plus a task pointer. When an I/O event occurs, the event loop calls the waker, which enqueues the task for execution. In a single-threaded runtime, this can be a simple push into a queue. In a multi-threaded extension, the waker may need to be thread-safe and may need to wake the event loop itself. For this project, you can keep it single-threaded but structure the code so that a future multi-threaded extension is possible.
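
A single-threaded sketch of a waker feeding a bounded ring-buffer run queue (all names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal task: just enough state to demonstrate waking. */
typedef struct task { int id; int woken; } task_t;

/* Bounded ring buffer of ready tasks. One slot is left unused to
 * distinguish full from empty. */
#define RQ_CAP 16
typedef struct { task_t *items[RQ_CAP]; size_t head, tail; } runq_t;

int runq_push(runq_t *q, task_t *t) {
    size_t next = (q->tail + 1) % RQ_CAP;
    if (next == q->head) return -1;        /* full: a backpressure point */
    q->items[q->tail] = t;
    q->tail = next;
    return 0;
}

task_t *runq_pop(runq_t *q) {
    if (q->head == q->tail) return NULL;   /* empty */
    task_t *t = q->items[q->head];
    q->head = (q->head + 1) % RQ_CAP;
    return t;
}

/* The "waker": what the event loop calls when the task's event fires. */
void wake(runq_t *q, task_t *t) {
    t->woken = 1;
    (void)runq_push(q, t);
}
```

Because the queue is bounded, a full run queue is itself a backpressure signal; a multi-threaded extension would add locking and a call to the loop's wakeup fd here.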

Backpressure is where runtime design meets system stability. Consider an HTTP server that accepts new connections. If the run queue or connection table is full, the runtime should stop accepting new connections. This can be implemented by temporarily removing the listening socket from the poller or by rejecting new connections with a controlled error. Similarly, if per-connection output buffers are full, the runtime should stop reading more input. This prevents memory from growing without bound. Backpressure also interacts with timers: you may want to drop idle connections after a timeout. That requires a timer per connection or a wheel structure for efficient timeouts.
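
One way to pause acceptance, sketched for level-triggered epoll: deregister the listening socket while the connection table is full and re-register it once connections close. Pending accepts are re-reported after re-adding because level-triggered readiness persists. Names are illustrative:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Stop being notified about new connections. */
int pause_accept(int epfd, int listen_fd) {
    return epoll_ctl(epfd, EPOLL_CTL_DEL, listen_fd, NULL);
}

/* Re-register the listener; level-triggered epoll will immediately
 * re-report any connections that queued up while paused. */
int resume_accept(int epfd, int listen_fd) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
}
```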

Another part of backpressure is bounded channels between tasks. If you implement message passing, a bounded queue can force producers to wait or drop messages. This is common in production runtimes. For this project, you can keep it simple: cap the number of active tasks and the size of per-connection buffers. The important part is to define clear limits and enforce them consistently. The runtime should log when backpressure is applied so you can verify it under load.

How This Fits Across Projects

This concept defines the task scheduler in Section 4.1, the run queue in Section 4.4, and the backpressure logic in Section 3.2 and Section 5.10 Phase 3. It also connects to work-stealing decisions in P02-work-stealing-thread-pool.md and data durability backpressure in P06-embedded-key-value-database.md.

Definitions & Key Terms

  • Future: A representation of a computation that may complete later.
  • Waker: A callback that re-schedules a task when it can make progress.
  • Run queue: A queue of tasks ready to execute.
  • Backpressure: Mechanisms that prevent unbounded growth under load.
  • Cooperative scheduling: Tasks yield explicitly rather than being preempted.

Mental Model Diagram (ASCII)

Task -> PENDING -> register waker -> event -> wake -> run queue

How It Works (Step-by-Step)

  1. Task runs and attempts I/O.
  2. If it would block, task registers a waker and returns PENDING.
  3. Event loop waits for readiness.
  4. On readiness, event loop calls waker.
  5. Task is re-enqueued and continues.

Invariants:

  • Tasks never block the event loop.
  • Run queue is bounded to enforce backpressure.

Failure modes:

  • Unbounded queues leading to memory exhaustion.
  • Tasks that never yield, starving others.

Minimal Concrete Example

typedef enum { TASK_READY, TASK_PENDING, TASK_DONE } task_state_t;

/* Advance the task one step; if its fd is not ready, park it behind
 * a waker so the loop re-enqueues it on readiness. */
task_state_t task_poll(task_t* t) {
    if (io_ready(t->fd)) return TASK_READY;
    register_waker(t);
    return TASK_PENDING;
}

Common Misconceptions

  • “Async means parallel” -> Async is about interleaving, not multiple cores.
  • “Backpressure is optional” -> It is required for stability.
  • “Futures are magic” -> They are explicit state machines.

Check-Your-Understanding Questions

  1. Why must tasks never block the event loop?
  2. How does a waker connect I/O readiness to task execution?
  3. What happens if you accept unlimited connections?

Check-Your-Understanding Answers

  1. A blocking task stalls the entire runtime.
  2. The waker schedules the task when readiness occurs.
  3. Memory usage grows until the process crashes or is killed.

Real-World Applications

  • Tokio and libuv use wakers and run queues.
  • High-performance HTTP servers enforce backpressure to survive load spikes.

Where You’ll Apply It

The task scheduler and run queue in Section 4.1 and the Phase 3 backpressure work in Section 5.10.

References

  • “Programming Rust” - Futures and wakers concepts.
  • “Designing Data-Intensive Applications” - backpressure in systems.

Key Insights

Async runtimes succeed by turning blocking waits into explicit state transitions and enforcing backpressure.

Summary

Futures, wakers, and backpressure form the core of async runtimes. Without clear state machines and queue limits, an event loop either spins or collapses under load.

Homework/Exercises to Practice the Concept

  1. Implement a state machine for a simple line-based protocol.
  2. Add a bounded run queue and observe rejection under load.
  3. Add a timeout to close idle connections.

Solutions to the Homework/Exercises

  1. The state machine tracks read buffer and parsing progress.
  2. The bounded queue forces backpressure and prevents unbounded memory growth.
  3. Timers enforce liveness and free resources.

3. Project Specification

3.1 What You Will Build

A minimal async runtime that supports non-blocking TCP servers, timers, and task scheduling. The runtime will include an event loop with a poller (epoll or kqueue), a run queue for tasks, and a timer heap. It will ship with a demo HTTP server and a deterministic benchmark mode that replays a fixed workload.

3.2 Functional Requirements

  1. Event loop: Poller integration with epoll or kqueue.
  2. Task scheduler: Run queue with explicit state machines.
  3. Timers: Min-heap timers and timeouts for connections.
  4. Backpressure: Limit active connections and queue size.
  5. Demo server: HTTP or echo server using the runtime.
  6. Benchmark mode: Deterministic load generator with fixed seed.

3.3 Non-Functional Requirements

  • Performance: Handle 1,000+ concurrent connections on one thread.
  • Reliability: No busy loops at idle; clean shutdown.
  • Usability: Clear CLI flags and documented behavior.

3.4 Example Usage / Output

$ ./minirt --serve 8080 --max-conns 1024
[minirt] poller=epoll timers=heap
[minirt] listening on 0.0.0.0:8080

3.5 Data Formats / Schemas / Protocols

The benchmark output is JSON:

{
  "conns": 1000,
  "seed": 2024,
  "requests": 10000,
  "rps": 32000,
  "p99_ms": 2.4
}

3.6 Edge Cases

  • Accept loop disabled when max-conns reached.
  • An empty timer heap should make the poller block indefinitely.
  • Partial writes must resume correctly.
  • Closed sockets must be removed from poller.

3.7 Real World Outcome

A single-threaded async server that can handle thousands of connections and a deterministic benchmark output.

3.7.1 How to Run (Copy/Paste)

make
./minirt --serve 8080 --max-conns 1024
./minirt --bench --seed 2024 --requests 10000

3.7.2 Golden Path Demo (Deterministic)

  • Run --bench with seed 2024 and fixed request count.
  • Expect stable RPS and latency across runs.

3.7.3 If CLI: Exact Terminal Transcript

$ ./minirt --bench --seed 2024 --requests 5000
[minirt] bench seed=2024 requests=5000
[minirt] rps=31500 p99_ms=2.7
exit_code=0

$ ./minirt --serve
minirt: missing port
usage: minirt --serve PORT
exit_code=2

3.7.4 If Web App

Not applicable.

3.7.5 If API

Not applicable.

3.7.6 If Library

Install/import

#include "minirt.h"

Minimal usage

rt_t* rt = rt_create();
rt_listen(rt, 8080, on_conn);
rt_run(rt);

Error handling

if (!rt_listen(rt, 8080, on_conn)) {
    fprintf(stderr, "listen failed\n");
}

3.7.7 If GUI / Desktop / Mobile

Not applicable.

3.7.8 If TUI

Not applicable.


4. Solution Architecture

4.1 High-Level Design

+------------+    +---------------+    +-----------+
| Poller     | -> | Event Loop    | -> | Task Queue|
+------------+    +---------------+    +-----------+
        |                 |
        v                 v
   I/O Events         Timer Heap

4.2 Key Components

Component    Responsibility            Key Decisions
Poller       Wait for I/O readiness    epoll/kqueue abstraction
Event loop   Dispatch events           Fairness policy
Task queue   Run ready tasks           Bounded queue
Timer heap   Track deadlines           Monotonic clock

4.3 Data Structures (No Full Code)

typedef struct rt_timer {
    uint64_t deadline_ms;      /* monotonic deadline */
    void (*cb)(void*);         /* callback on expiry */
    void* data;
} rt_timer_t;                  /* avoid the name timer_t, which POSIX <time.h> reserves */

typedef struct task {
    int fd;
    int state;
    int (*poll)(struct task*); /* returns READY/PENDING/DONE */
} task_t;

4.4 Algorithm Overview

Key Algorithm: Event Loop

  1. Compute timeout from timer heap.
  2. Wait for I/O events.
  3. Dispatch events to tasks.
  4. Run expired timers.

Complexity Analysis:

  • Time: O(E + k log T) per tick, where E is the number of ready events, k the timers that expired, and T the total timers.
  • Space: O(N) for connections and tasks.

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential
make

5.2 Project Structure

minirt/
|-- src/
|   |-- poller.c
|   |-- loop.c
|   |-- timers.c
|   `-- server.c
|-- include/
|   `-- minirt.h
|-- tests/
|   `-- test_runtime.c
`-- Makefile

5.3 The Core Question You’re Answering

“How can one thread handle thousands of connections without blocking?”

5.4 Concepts You Must Understand First

  1. Non-blocking I/O and EAGAIN
  2. Event loops and timer heaps
  3. State machines and backpressure

5.5 Questions to Guide Your Design

  1. Will you use edge-triggered or level-triggered events?
  2. How will you store per-connection state?
  3. How do you enforce max-conns backpressure?

5.6 Thinking Exercise

A socket is writable but write returns EAGAIN.

  • What state should you store?
  • How do you avoid spinning?

5.7 The Interview Questions They’ll Ask

  1. What is readiness-based I/O?
  2. Why do async runtimes need timers?
  3. How do you implement backpressure?

5.8 Hints in Layers

Hint 1: Start with poller-only

Accept and read connections without tasks.

Hint 2: Add state machines

Represent each connection as a state struct.

Hint 3: Add timers

Use a min-heap and compute poller timeouts.

Hint 4: Add backpressure

Stop accepting when max-conns is reached.

5.9 Books That Will Help

Topic              Book                                             Chapter
I/O multiplexing   “Advanced Programming in the UNIX Environment”   I/O chapter
Async runtime      “Computer Systems: A Programmer’s Perspective”   System I/O

5.10 Implementation Phases

Phase 1: Poller and Non-Blocking I/O (1 week)

Goals:

  • Build poller abstraction and non-blocking server

Tasks:

  1. Implement epoll/kqueue wrapper.
  2. Add non-blocking accept and read.

Checkpoint: Echo server handles multiple clients.

Phase 2: Timers and Scheduling (1 week)

Goals:

  • Add timer heap and task queue

Tasks:

  1. Implement timer heap and monotonic clock.
  2. Add task queue and fairness policy.

Checkpoint: Timers fire accurately in tests.

Phase 3: Backpressure and Benchmarks (1 to 2 weeks)

Goals:

  • Add limits and deterministic benchmarking

Tasks:

  1. Enforce max connections and buffer sizes.
  2. Build benchmark mode with fixed seed.

Checkpoint: Benchmarks are repeatable and CPU stays low at idle.

5.11 Key Implementation Decisions

Decision        Options                  Recommendation   Rationale
Poller          epoll vs kqueue          epoll first      Simpler on Linux
Timer storage   heap vs wheel            heap             Easier to implement accurately
Backpressure    reject vs pause accept   pause accept     Avoids dropping connections

6. Testing Strategy

6.1 Test Categories

Category            Purpose                        Examples
Unit Tests          Poller and timer correctness   Timer ordering
Integration Tests   Server correctness             Echo tests
Edge Case Tests     Backpressure                   max-conns reached

6.2 Critical Test Cases

  1. Timers fire within tolerance of deadline.
  2. Backpressure halts accepting new connections.
  3. No busy loop at idle.

6.3 Test Data

seed: 2024
requests: 5000
conns: 1000

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall           Symptom               Solution
Busy loop         100% CPU at idle      Use poller timeouts
Missed events     Stalled connections   Drain sockets in edge-triggered mode
No backpressure   OOM under load        Cap connections and queues

7.2 Debugging Strategies

  • Trace state transitions: log when tasks move between states.
  • Use strace: verify epoll_wait blocking behavior.

7.3 Performance Traps

  • Excessive wakeups due to enabling EPOLLOUT when not needed.
  • Too small buffers causing frequent syscalls.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a simple HTTP parser.
  • Add configurable timeouts per connection.

8.2 Intermediate Extensions

  • Add a multi-threaded front-end acceptor.
  • Add a bounded channel between tasks.

8.3 Advanced Extensions

  • Integrate io_uring as an alternative backend.
  • Add a work-stealing pool for CPU-bound tasks.

9. Real-World Connections

9.1 Industry Applications

  • Nginx: event loop with readiness-based I/O.
  • libuv: cross-platform event loop used by Node.js.
  • Tokio: Rust async runtime with pollers and timers.

9.3 Interview Relevance

  • Explain readiness vs completion.
  • Describe how to integrate timers with an event loop.

10. Resources

10.1 Essential Reading

  • “Advanced Programming in the UNIX Environment” - I/O multiplexing.
  • “Computer Systems: A Programmer’s Perspective” - network I/O.

10.2 Video Resources

  • Talks on async runtimes and event loops.

10.3 Tools & Documentation

  • man epoll, man kqueue, man fcntl.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain readiness vs completion.
  • I can explain how timers integrate with a poller.
  • I can explain backpressure in an async runtime.

11.2 Implementation

  • Event loop handles 1000+ connections.
  • Timers fire correctly and deterministically.
  • No busy loops at idle.

11.3 Growth

  • I can explain the runtime design in an interview.
  • I can identify how to add multi-threading safely.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Non-blocking server with poller and timer heap.
  • Deterministic benchmark mode.

Full Completion:

  • Backpressure and fairness implemented.
  • Clean shutdown and resource cleanup.

Excellence (Going Above & Beyond):

  • io_uring backend or multi-threaded front-end.
  • Detailed latency profiling.