Project 6: User-space Thread Library (Green Threads)
Build a cooperative threading library with user-level context switches, stacks, and a simple scheduler.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C (Alternatives: Rust) |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 5: Magic |
| Business Potential | Level 4: Infrastructure core |
| Prerequisites | C, calling conventions, basic scheduling |
| Key Topics | context switching, stack management, run queues |
1. Learning Objectives
By completing this project, you will:
- Implement a user-level context switch mechanism.
- Allocate, align, and protect per-thread stacks.
- Build a scheduler with a run queue and cooperative yields.
- Understand limitations of green threads with blocking I/O.
- Provide deterministic scheduling for tests.
- Explain differences between green threads and kernel threads.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Context Switching and Register Preservation
Fundamentals
A context switch saves the CPU state of one thread and restores another. At minimum, you must save callee-saved registers, stack pointer, and instruction pointer so the thread can resume later.
Deep Dive into the concept
On x86-64 System V ABI, callee-saved registers include rbx, rbp, r12-r15 plus the stack pointer rsp. A green-thread switch can be implemented with setjmp/longjmp or handwritten assembly. The key is to store enough state so the thread can continue as if it never left. If you miss a register, threads will behave nondeterministically. Correct stack alignment (16 bytes) is also required before calling C functions.
How this fits on projects
This is the core of your scheduler and thread switching in Section 4.4 and Section 5.10 Phase 2.
Definitions & key terms
- context switch -> saving/restoring CPU state
- callee-saved -> registers preserved across function calls
- stack pointer -> register pointing to top of stack
Mental model diagram (ASCII)
thread A regs -> save -> switch -> restore regs -> thread B runs
How it works (step-by-step)
- Save current registers into thread context.
- Store current stack pointer.
- Load next thread context and stack pointer.
- Jump to saved instruction pointer.
Minimal concrete example
if (setjmp(cur->ctx) == 0) longjmp(next->ctx, 1);
Common misconceptions
- Misconception: only the stack pointer matters. Correction: registers must be saved too.
Check-your-understanding questions
- Which registers must be saved on x86-64 SysV?
- Why is stack alignment important?
Check-your-understanding answers
rbx,rbp,r12-r15,rsp,rip.- ABI and SIMD instructions require 16-byte alignment.
Real-world applications
- User-level schedulers, coroutines, fibers
Where you’ll apply it
- This project: Section 4.4 Algorithm Overview, Section 5.10 Phase 2.
- Also used in: P03-process-scheduler-visualization-tool for scheduling insight.
References
- OSTEP concurrency chapters
- System V ABI docs
Key insights
A context switch is pure state capture and restore.
Summary
If state is incomplete, threads are corrupted.
Homework/Exercises to practice the concept
- Save/restore registers in inline assembly and verify values.
Solutions to the homework/exercises
- Set known register values, switch, then verify after resume.
2.2 Stack Layout and Guard Pages
Fundamentals
Each green thread needs its own stack. Stacks grow downward and must be aligned. Guard pages prevent stack overflows from corrupting memory.
Deep Dive into the concept
You can allocate stacks using mmap to get page-aligned regions. Reserve an extra page at the end as PROT_NONE to catch overflows. Stack initialization requires placing a fake return address or trampoline so the first switch jumps into the thread entry function. The stack must be aligned to 16 bytes before any function call.
How this fits on projects
Stack allocation is in Section 5.10 Phase 1 and memory safety is in Section 7 pitfalls.
Definitions & key terms
- guard page -> unmapped page to detect overflow
- stack frame -> function’s local storage region
- trampoline -> initial entry wrapper for new thread
Mental model diagram (ASCII)
[guard page][stack memory..............]
How it works (step-by-step)
mmapstack region + guard page.- Protect guard page with
mprotect(PROT_NONE). - Initialize stack pointer at top of stack.
- Place trampoline to call thread entry.
Minimal concrete example
void *stack = mmap(NULL, stack_size + page, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
Common misconceptions
- Misconception: stacks can share memory. Correction: each thread needs independent stack space.
Check-your-understanding questions
- Why add a guard page?
- What happens if stack alignment is wrong?
Check-your-understanding answers
- It catches stack overflow as a fault.
- ABI violations can crash or corrupt data.
Real-world applications
- Fiber libraries and coroutine runtimes
Where you’ll apply it
- This project: Section 5.10 Phase 1, Section 7 pitfalls.
- Also used in: P02-memory-allocator-malloc-free-from-scratch.
References
- CS:APP stack layout sections
Key insights
Stack management is a memory safety problem.
Summary
Correct stacks prevent subtle crashes and UB.
Homework/Exercises to practice the concept
- Allocate a stack with guard page and trigger overflow intentionally.
Solutions to the homework/exercises
- Write beyond stack and observe SIGSEGV.
2.3 Cooperative Scheduling and Run Queues
Fundamentals
Green threads are cooperative: they run until they yield. The scheduler maintains a run queue and selects the next runnable thread.
Deep Dive into the concept
A simple scheduler uses a circular list (round-robin). Threads yield explicitly, moving themselves to the end of the queue. Blocking I/O is a limitation: if a thread calls a blocking syscall, the entire process blocks. To mitigate, you can use non-blocking I/O or integrate with an event loop. Deterministic scheduling can be achieved by fixed ordering and a seed-based scheduler for tests.
How this fits on projects
This defines core runtime behavior in Section 3.2 and Section 5.10 Phase 3.
Definitions & key terms
- cooperative -> threads must yield voluntarily
- run queue -> list of runnable threads
- yield -> explicit handoff to scheduler
Mental model diagram (ASCII)
[thread1] -> [thread2] -> [thread3] -> back to thread1
How it works (step-by-step)
- Scheduler selects head of run queue.
- Thread runs until it calls
yield. - Scheduler saves context and enqueues thread at tail.
- Next thread resumes.
Minimal concrete example
void thread_yield() { sched_next(); }
Common misconceptions
- Misconception: green threads can use blocking I/O safely. Correction: blocking I/O blocks the whole process.
Check-your-understanding questions
- Why is cooperative scheduling deterministic?
- How do you avoid blocking the whole runtime?
Check-your-understanding answers
- Only explicit yield points cause switches.
- Use non-blocking I/O or async event loops.
Real-world applications
- Game engines, scripting runtimes, embedded systems
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 5.10 Phase 3.
- Also used in: P03-process-scheduler-visualization-tool.
References
- OSTEP scheduling chapters
Key insights
Cooperative schedulers are simple but require discipline.
Summary
You control scheduling, but you also bear responsibility for fairness.
Homework/Exercises to practice the concept
- Add yield points to a tight loop and observe interleaving.
Solutions to the homework/exercises
- Insert
thread_yield()every N iterations.
3. Project Specification
3.1 What You Will Build
A user-space thread library providing thread_create, thread_yield, thread_exit, and scheduler_run. Includes a demo program showing interleaved output and optional stats.
3.2 Functional Requirements
- Create threads with independent stacks.
- Switch context between threads.
- Round-robin scheduler with run queue.
- Thread exit cleans resources.
- Provide stats (
--stats): switches, ready count. - Deterministic scheduling mode (
--seed 42).
3.3 Non-Functional Requirements
- Reliability: no stack corruption or crashes.
- Performance: 1M context switches/sec on a modern CPU.
- Usability: clear API and demo.
3.4 Example Usage / Output
$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1
[thread 1] count=2
[thread 2] count=2
3.5 Data Formats / Schemas / Protocols
- Stats output:
threads=<n> switches=<n> avg_slice=<ms>
3.6 Edge Cases
- Thread exits without yielding.
- Stack overflow.
- Thread yields while already yielding (re-entrant).
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./green-threads-demo --seed 42
3.7.2 Golden Path Demo (Deterministic)
- Use fixed seed and fixed loop counts for stable output.
3.7.3 CLI Transcript (Success + Failure)
$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1
$ ./green-threads-demo --stack-size 1024
error: stack size too small (min 4096)
exit code: 2
3.7.4 Exit Codes
0success2invalid configuration
4. Solution Architecture
4.1 High-Level Design
API -> Scheduler -> Context switch -> Stacks
4.2 Key Components
| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Thread control block | store ctx + state | array or list | | Scheduler | choose next thread | round-robin | | Stack manager | allocate/free stacks | mmap + guard page |
4.3 Data Structures (No Full Code)
struct thread {
jmp_buf ctx;
void *stack;
enum { READY, RUNNING, DEAD } state;
};
4.4 Algorithm Overview
- Create thread with new stack and context.
- Scheduler selects next READY thread.
- Save current context and restore next.
- Cleanup on exit.
5. Implementation Guide
5.1 Development Environment Setup
sudo apt install build-essential
5.2 Project Structure
threads/
|-- src/
| |-- thread.c
| |-- sched.c
| `-- demo.c
`-- Makefile
5.3 The Core Question You’re Answering
“What are the minimum mechanisms needed to schedule multiple tasks without kernel threads?”
5.4 Concepts You Must Understand First
- Context switching.
- Stack allocation and alignment.
- Cooperative scheduling.
5.5 Questions to Guide Your Design
- How do you create a thread context from scratch?
- How do you ensure stack safety?
- What happens if a thread blocks on I/O?
5.6 Thinking Exercise
What goes wrong if you forget to save r12-r15 during a switch?
5.7 The Interview Questions They’ll Ask
- Why are green threads different from kernel threads?
- How does a context switch work?
5.8 Hints in Layers
- Hint 1: implement with
setjmp/longjmpfirst. - Hint 2: add explicit stacks with
mmap. - Hint 3: add stats and deterministic scheduling.
5.9 Books That Will Help
| Topic | Book | Chapter | |——|——|———| | Concurrency | OSTEP | Ch. 26-28 | | ABI | CS:APP | Ch. 3 |
5.10 Implementation Phases
Phase 1: minimal scheduler. Phase 2: proper stack allocation + guard page. Phase 3: stats + deterministic mode.
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples | |———|———|———-| | Unit | context switch | save/restore registers | | Integration | demo | interleaved output | | Edge | stack overflow | guard page fault |
6.2 Critical Test Cases
- Two threads interleave output deterministically.
- Stack overflow triggers SIGSEGV at guard page.
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution | |———|———|———-| | Stack misalignment | crashes | align to 16 bytes | | Missing register save | random corruption | save callee-saved regs |
8. Extensions & Challenges
- Add preemption using signals and timers.
- Integrate non-blocking I/O and event loop.
9. Real-World Connections
- Green threads in language runtimes (Go, Erlang, Java fibers)
10. Resources
- System V ABI
11. Self-Assessment Checklist
- I can explain how a context switch works.
12. Submission / Completion Criteria
Minimum: create and run two threads. Full: deterministic scheduling + stats. Excellence: preemption or async I/O integration.
13. Determinism Notes
- Use fixed seeds and fixed loop counts in demos.