Project 6: User-space Thread Library (Green Threads)

Build a cooperative threading library with user-level context switches, stacks, and a simple scheduler.

Quick Reference

Attribute Value
Difficulty Level 4: Expert
Time Estimate 2-3 weeks
Main Programming Language C (Alternatives: Rust)
Alternative Programming Languages Rust
Coolness Level Level 5: Magic
Business Potential Level 4: Infrastructure core
Prerequisites C, calling conventions, basic scheduling
Key Topics context switching, stack management, run queues

1. Learning Objectives

By completing this project, you will:

  1. Implement a user-level context switch mechanism.
  2. Allocate, align, and protect per-thread stacks.
  3. Build a scheduler with a run queue and cooperative yields.
  4. Understand limitations of green threads with blocking I/O.
  5. Provide deterministic scheduling for tests.
  6. Explain differences between green threads and kernel threads.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Context Switching and Register Preservation

Fundamentals

A context switch saves the CPU state of one thread and restores another. At minimum, you must save callee-saved registers, stack pointer, and instruction pointer so the thread can resume later.

Deep Dive into the concept

On x86-64 System V ABI, callee-saved registers include rbx, rbp, r12-r15 plus the stack pointer rsp. A green-thread switch can be implemented with setjmp/longjmp or handwritten assembly. The key is to store enough state so the thread can continue as if it never left. If you miss a register, threads will behave nondeterministically. Correct stack alignment (16 bytes) is also required before calling C functions.

How this fits on projects

This is the core of your scheduler and thread switching in Section 4.4 and Section 5.10 Phase 2.

Definitions & key terms

  • context switch -> saving/restoring CPU state
  • callee-saved -> registers preserved across function calls
  • stack pointer -> register pointing to top of stack

Mental model diagram (ASCII)

thread A regs -> save -> switch -> restore regs -> thread B runs

How it works (step-by-step)

  1. Save current registers into thread context.
  2. Store current stack pointer.
  3. Load next thread context and stack pointer.
  4. Jump to saved instruction pointer.

Minimal concrete example

if (setjmp(cur->ctx) == 0) longjmp(next->ctx, 1);

Common misconceptions

  • Misconception: only the stack pointer matters. Correction: registers must be saved too.

Check-your-understanding questions

  1. Which registers must be saved on x86-64 SysV?
  2. Why is stack alignment important?

Check-your-understanding answers

  1. rbx, rbp, r12-r15, rsp, rip.
  2. ABI and SIMD instructions require 16-byte alignment.

Real-world applications

  • User-level schedulers, coroutines, fibers

Where you’ll apply it

References

  • OSTEP concurrency chapters
  • System V ABI docs

Key insights

A context switch is pure state capture and restore.

Summary

If state is incomplete, threads are corrupted.

Homework/Exercises to practice the concept

  1. Save/restore registers in inline assembly and verify values.

Solutions to the homework/exercises

  1. Set known register values, switch, then verify after resume.

2.2 Stack Layout and Guard Pages

Fundamentals

Each green thread needs its own stack. Stacks grow downward and must be aligned. Guard pages prevent stack overflows from corrupting memory.

Deep Dive into the concept

You can allocate stacks using mmap to get page-aligned regions. Reserve an extra page at the end as PROT_NONE to catch overflows. Stack initialization requires placing a fake return address or trampoline so the first switch jumps into the thread entry function. The stack must be aligned to 16 bytes before any function call.

How this fits on projects

Stack allocation is in Section 5.10 Phase 1 and memory safety is in Section 7 pitfalls.

Definitions & key terms

  • guard page -> unmapped page to detect overflow
  • stack frame -> function’s local storage region
  • trampoline -> initial entry wrapper for new thread

Mental model diagram (ASCII)

[guard page][stack memory..............]

How it works (step-by-step)

  1. mmap stack region + guard page.
  2. Protect guard page with mprotect(PROT_NONE).
  3. Initialize stack pointer at top of stack.
  4. Place trampoline to call thread entry.

Minimal concrete example

void *stack = mmap(NULL, stack_size + page, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);

Common misconceptions

  • Misconception: stacks can share memory. Correction: each thread needs independent stack space.

Check-your-understanding questions

  1. Why add a guard page?
  2. What happens if stack alignment is wrong?

Check-your-understanding answers

  1. It catches stack overflow as a fault.
  2. ABI violations can crash or corrupt data.

Real-world applications

  • Fiber libraries and coroutine runtimes

Where you’ll apply it

References

  • CS:APP stack layout sections

Key insights

Stack management is a memory safety problem.

Summary

Correct stacks prevent subtle crashes and UB.

Homework/Exercises to practice the concept

  1. Allocate a stack with guard page and trigger overflow intentionally.

Solutions to the homework/exercises

  1. Write beyond stack and observe SIGSEGV.

2.3 Cooperative Scheduling and Run Queues

Fundamentals

Green threads are cooperative: they run until they yield. The scheduler maintains a run queue and selects the next runnable thread.

Deep Dive into the concept

A simple scheduler uses a circular list (round-robin). Threads yield explicitly, moving themselves to the end of the queue. Blocking I/O is a limitation: if a thread calls a blocking syscall, the entire process blocks. To mitigate, you can use non-blocking I/O or integrate with an event loop. Deterministic scheduling can be achieved by fixed ordering and a seed-based scheduler for tests.

How this fits on projects

This defines core runtime behavior in Section 3.2 and Section 5.10 Phase 3.

Definitions & key terms

  • cooperative -> threads must yield voluntarily
  • run queue -> list of runnable threads
  • yield -> explicit handoff to scheduler

Mental model diagram (ASCII)

[thread1] -> [thread2] -> [thread3] -> back to thread1

How it works (step-by-step)

  1. Scheduler selects head of run queue.
  2. Thread runs until it calls yield.
  3. Scheduler saves context and enqueues thread at tail.
  4. Next thread resumes.

Minimal concrete example

void thread_yield() { sched_next(); }

Common misconceptions

  • Misconception: green threads can use blocking I/O safely. Correction: blocking I/O blocks the whole process.

Check-your-understanding questions

  1. Why is cooperative scheduling deterministic?
  2. How do you avoid blocking the whole runtime?

Check-your-understanding answers

  1. Only explicit yield points cause switches.
  2. Use non-blocking I/O or async event loops.

Real-world applications

  • Game engines, scripting runtimes, embedded systems

Where you’ll apply it

References

  • OSTEP scheduling chapters

Key insights

Cooperative schedulers are simple but require discipline.

Summary

You control scheduling, but you also bear responsibility for fairness.

Homework/Exercises to practice the concept

  1. Add yield points to a tight loop and observe interleaving.

Solutions to the homework/exercises

  1. Insert thread_yield() every N iterations.

3. Project Specification

3.1 What You Will Build

A user-space thread library providing thread_create, thread_yield, thread_exit, and scheduler_run. Includes a demo program showing interleaved output and optional stats.

3.2 Functional Requirements

  1. Create threads with independent stacks.
  2. Switch context between threads.
  3. Round-robin scheduler with run queue.
  4. Thread exit cleans resources.
  5. Provide stats (--stats): switches, ready count.
  6. Deterministic scheduling mode (--seed 42).

3.3 Non-Functional Requirements

  • Reliability: no stack corruption or crashes.
  • Performance: 1M context switches/sec on a modern CPU.
  • Usability: clear API and demo.

3.4 Example Usage / Output

$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1
[thread 1] count=2
[thread 2] count=2

3.5 Data Formats / Schemas / Protocols

  • Stats output: threads=<n> switches=<n> avg_slice=<ms>

3.6 Edge Cases

  • Thread exits without yielding.
  • Stack overflow.
  • Thread yields while already yielding (re-entrant).

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./green-threads-demo --seed 42

3.7.2 Golden Path Demo (Deterministic)

  • Use fixed seed and fixed loop counts for stable output.

3.7.3 CLI Transcript (Success + Failure)

$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1

$ ./green-threads-demo --stack-size 1024
error: stack size too small (min 4096)
exit code: 2

3.7.4 Exit Codes

  • 0 success
  • 2 invalid configuration

4. Solution Architecture

4.1 High-Level Design

API -> Scheduler -> Context switch -> Stacks

4.2 Key Components

| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Thread control block | store ctx + state | array or list | | Scheduler | choose next thread | round-robin | | Stack manager | allocate/free stacks | mmap + guard page |

4.3 Data Structures (No Full Code)

struct thread {
    jmp_buf ctx;
    void *stack;
    enum { READY, RUNNING, DEAD } state;
};

4.4 Algorithm Overview

  1. Create thread with new stack and context.
  2. Scheduler selects next READY thread.
  3. Save current context and restore next.
  4. Cleanup on exit.

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential

5.2 Project Structure

threads/
|-- src/
|   |-- thread.c
|   |-- sched.c
|   `-- demo.c
`-- Makefile

5.3 The Core Question You’re Answering

“What are the minimum mechanisms needed to schedule multiple tasks without kernel threads?”

5.4 Concepts You Must Understand First

  1. Context switching.
  2. Stack allocation and alignment.
  3. Cooperative scheduling.

5.5 Questions to Guide Your Design

  1. How do you create a thread context from scratch?
  2. How do you ensure stack safety?
  3. What happens if a thread blocks on I/O?

5.6 Thinking Exercise

What goes wrong if you forget to save r12-r15 during a switch?

5.7 The Interview Questions They’ll Ask

  1. Why are green threads different from kernel threads?
  2. How does a context switch work?

5.8 Hints in Layers

  • Hint 1: implement with setjmp/longjmp first.
  • Hint 2: add explicit stacks with mmap.
  • Hint 3: add stats and deterministic scheduling.

5.9 Books That Will Help

| Topic | Book | Chapter | |——|——|———| | Concurrency | OSTEP | Ch. 26-28 | | ABI | CS:APP | Ch. 3 |

5.10 Implementation Phases

Phase 1: minimal scheduler. Phase 2: proper stack allocation + guard page. Phase 3: stats + deterministic mode.


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———|———|———-| | Unit | context switch | save/restore registers | | Integration | demo | interleaved output | | Edge | stack overflow | guard page fault |

6.2 Critical Test Cases

  1. Two threads interleave output deterministically.
  2. Stack overflow triggers SIGSEGV at guard page.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Stack misalignment | crashes | align to 16 bytes | | Missing register save | random corruption | save callee-saved regs |


8. Extensions & Challenges

  • Add preemption using signals and timers.
  • Integrate non-blocking I/O and event loop.

9. Real-World Connections

  • Green threads in language runtimes (Go, Erlang, Java fibers)

10. Resources

  • System V ABI

11. Self-Assessment Checklist

  • I can explain how a context switch works.

12. Submission / Completion Criteria

Minimum: create and run two threads. Full: deterministic scheduling + stats. Excellence: preemption or async I/O integration.


13. Determinism Notes

  • Use fixed seeds and fixed loop counts in demos.