Project 6: User-space Thread Library (Green Threads)

Build a cooperative threading library with user-level context switches, stacks, and a simple scheduler.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	2-3 weeks
Main Programming Language	C (Alternatives: Rust)
Alternative Programming Languages	Rust
Coolness Level	Level 5: Magic
Business Potential	Level 4: Infrastructure core
Prerequisites	C, calling conventions, basic scheduling
Key Topics	context switching, stack management, run queues

1. Learning Objectives

By completing this project, you will:

Implement a user-level context switch mechanism.
Allocate, align, and protect per-thread stacks.
Build a scheduler with a run queue and cooperative yields.
Understand limitations of green threads with blocking I/O.
Provide deterministic scheduling for tests.
Explain differences between green threads and kernel threads.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Context Switching and Register Preservation

Fundamentals

A context switch saves the CPU state of one thread and restores another. At minimum, you must save callee-saved registers, stack pointer, and instruction pointer so the thread can resume later.

Deep Dive into the concept

On x86-64 System V ABI, callee-saved registers include rbx, rbp, r12-r15 plus the stack pointer rsp. A green-thread switch can be implemented with setjmp/longjmp or handwritten assembly. The key is to store enough state so the thread can continue as if it never left. If you miss a register, threads will behave nondeterministically. Correct stack alignment (16 bytes) is also required before calling C functions.

How this fits on projects

This is the core of your scheduler and thread switching in Section 4.4 and Section 5.10 Phase 2.

Definitions & key terms

context switch -> saving/restoring CPU state
callee-saved -> registers preserved across function calls
stack pointer -> register pointing to top of stack

Mental model diagram (ASCII)

thread A regs -> save -> switch -> restore regs -> thread B runs

How it works (step-by-step)

Save current registers into thread context.
Store current stack pointer.
Load next thread context and stack pointer.
Jump to saved instruction pointer.

Minimal concrete example

if (setjmp(cur->ctx) == 0) longjmp(next->ctx, 1);

Common misconceptions

Misconception: only the stack pointer matters. Correction: registers must be saved too.

Check-your-understanding questions

Which registers must be saved on x86-64 SysV?
Why is stack alignment important?

Check-your-understanding answers

rbx, rbp, r12-r15, rsp, rip.
ABI and SIMD instructions require 16-byte alignment.

Real-world applications

User-level schedulers, coroutines, fibers

Where you’ll apply it

This project: Section 4.4 Algorithm Overview, Section 5.10 Phase 2.
Also used in: P03-process-scheduler-visualization-tool for scheduling insight.

References

OSTEP concurrency chapters
System V ABI docs

Key insights

A context switch is pure state capture and restore.

Summary

If state is incomplete, threads are corrupted.

Homework/Exercises to practice the concept

Save/restore registers in inline assembly and verify values.

Solutions to the homework/exercises

Set known register values, switch, then verify after resume.

2.2 Stack Layout and Guard Pages

Fundamentals

Each green thread needs its own stack. Stacks grow downward and must be aligned. Guard pages prevent stack overflows from corrupting memory.

Deep Dive into the concept

You can allocate stacks using mmap to get page-aligned regions. Reserve an extra page at the end as PROT_NONE to catch overflows. Stack initialization requires placing a fake return address or trampoline so the first switch jumps into the thread entry function. The stack must be aligned to 16 bytes before any function call.

How this fits on projects

Stack allocation is in Section 5.10 Phase 1 and memory safety is in Section 7 pitfalls.

Definitions & key terms

guard page -> unmapped page to detect overflow
stack frame -> function’s local storage region
trampoline -> initial entry wrapper for new thread

Mental model diagram (ASCII)

[guard page][stack memory..............]

How it works (step-by-step)

mmap stack region + guard page.
Protect guard page with mprotect(PROT_NONE).
Initialize stack pointer at top of stack.
Place trampoline to call thread entry.

Minimal concrete example

void *stack = mmap(NULL, stack_size + page, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);

Common misconceptions

Misconception: stacks can share memory. Correction: each thread needs independent stack space.

Check-your-understanding questions

Why add a guard page?
What happens if stack alignment is wrong?

Check-your-understanding answers

It catches stack overflow as a fault.
ABI violations can crash or corrupt data.

Real-world applications

Fiber libraries and coroutine runtimes

Where you’ll apply it

This project: Section 5.10 Phase 1, Section 7 pitfalls.
Also used in: P02-memory-allocator-malloc-free-from-scratch.

References

CS:APP stack layout sections

Key insights

Stack management is a memory safety problem.

Summary

Correct stacks prevent subtle crashes and UB.

Homework/Exercises to practice the concept

Allocate a stack with guard page and trigger overflow intentionally.

Solutions to the homework/exercises

Write beyond stack and observe SIGSEGV.

2.3 Cooperative Scheduling and Run Queues

Fundamentals

Green threads are cooperative: they run until they yield. The scheduler maintains a run queue and selects the next runnable thread.

Deep Dive into the concept

A simple scheduler uses a circular list (round-robin). Threads yield explicitly, moving themselves to the end of the queue. Blocking I/O is a limitation: if a thread calls a blocking syscall, the entire process blocks. To mitigate, you can use non-blocking I/O or integrate with an event loop. Deterministic scheduling can be achieved by fixed ordering and a seed-based scheduler for tests.

How this fits on projects

This defines core runtime behavior in Section 3.2 and Section 5.10 Phase 3.

Definitions & key terms

cooperative -> threads must yield voluntarily
run queue -> list of runnable threads
yield -> explicit handoff to scheduler

Mental model diagram (ASCII)

[thread1] -> [thread2] -> [thread3] -> back to thread1

How it works (step-by-step)

Scheduler selects head of run queue.
Thread runs until it calls yield.
Scheduler saves context and enqueues thread at tail.
Next thread resumes.

Minimal concrete example

void thread_yield() { sched_next(); }

Common misconceptions

Misconception: green threads can use blocking I/O safely. Correction: blocking I/O blocks the whole process.

Check-your-understanding questions

Why is cooperative scheduling deterministic?
How do you avoid blocking the whole runtime?

Check-your-understanding answers

Only explicit yield points cause switches.
Use non-blocking I/O or async event loops.

Real-world applications

Game engines, scripting runtimes, embedded systems

Where you’ll apply it

This project: Section 3.2 Functional Requirements, Section 5.10 Phase 3.
Also used in: P03-process-scheduler-visualization-tool.

References

OSTEP scheduling chapters

Key insights

Cooperative schedulers are simple but require discipline.

Summary

You control scheduling, but you also bear responsibility for fairness.

Homework/Exercises to practice the concept

Add yield points to a tight loop and observe interleaving.

Solutions to the homework/exercises

Insert thread_yield() every N iterations.

3. Project Specification

3.1 What You Will Build

A user-space thread library providing thread_create, thread_yield, thread_exit, and scheduler_run. Includes a demo program showing interleaved output and optional stats.

3.2 Functional Requirements

Create threads with independent stacks.
Switch context between threads.
Round-robin scheduler with run queue.
Thread exit cleans resources.
Provide stats (--stats): switches, ready count.
Deterministic scheduling mode (--seed 42).

3.3 Non-Functional Requirements

Reliability: no stack corruption or crashes.
Performance: 1M context switches/sec on a modern CPU.
Usability: clear API and demo.

3.4 Example Usage / Output

$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1
[thread 1] count=2
[thread 2] count=2

3.5 Data Formats / Schemas / Protocols

Stats output: threads=<n> switches=<n> avg_slice=<ms>

3.6 Edge Cases

Thread exits without yielding.
Stack overflow.
Thread yields while already yielding (re-entrant).

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./green-threads-demo --seed 42

3.7.2 Golden Path Demo (Deterministic)

Use fixed seed and fixed loop counts for stable output.

3.7.3 CLI Transcript (Success + Failure)

$ ./green-threads-demo --seed 42
[thread 1] count=1
[thread 2] count=1

$ ./green-threads-demo --stack-size 1024
error: stack size too small (min 4096)
exit code: 2

3.7.4 Exit Codes

0 success
2 invalid configuration

4. Solution Architecture

4.1 High-Level Design

API -> Scheduler -> Context switch -> Stacks

4.2 Key Components

4.3 Data Structures (No Full Code)

struct thread {
    jmp_buf ctx;
    void *stack;
    enum { READY, RUNNING, DEAD } state;
};

4.4 Algorithm Overview

Create thread with new stack and context.
Scheduler selects next READY thread.
Save current context and restore next.
Cleanup on exit.

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential

5.2 Project Structure

threads/
|-- src/
|   |-- thread.c
|   |-- sched.c
|   `-- demo.c
`-- Makefile

5.3 The Core Question You’re Answering

“What are the minimum mechanisms needed to schedule multiple tasks without kernel threads?”

5.4 Concepts You Must Understand First

Context switching.
Stack allocation and alignment.
Cooperative scheduling.

5.5 Questions to Guide Your Design

How do you create a thread context from scratch?
How do you ensure stack safety?
What happens if a thread blocks on I/O?

5.6 Thinking Exercise

What goes wrong if you forget to save r12-r15 during a switch?

5.7 The Interview Questions They’ll Ask

Why are green threads different from kernel threads?
How does a context switch work?

5.8 Hints in Layers

Hint 1: implement with setjmp/longjmp first.
Hint 2: add explicit stacks with mmap.
Hint 3: add stats and deterministic scheduling.

5.9 Books That Will Help

| Topic | Book | Chapter | |——|——|———| | Concurrency | OSTEP | Ch. 26-28 | | ABI | CS:APP | Ch. 3 |

5.10 Implementation Phases

Phase 1: minimal scheduler. Phase 2: proper stack allocation + guard page. Phase 3: stats + deterministic mode.

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Two threads interleave output deterministically.
Stack overflow triggers SIGSEGV at guard page.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

8. Extensions & Challenges

Add preemption using signals and timers.
Integrate non-blocking I/O and event loop.

9. Real-World Connections

Green threads in language runtimes (Go, Erlang, Java fibers)

10. Resources

System V ABI

11. Self-Assessment Checklist

I can explain how a context switch works.

12. Submission / Completion Criteria

Minimum: create and run two threads. Full: deterministic scheduling + stats. Excellence: preemption or async I/O integration.

13. Determinism Notes

Use fixed seeds and fixed loop counts in demos.