Learning Concurrency, Parallelism, Threads & Synchronization
Goal: Deeply understand how threads, processes, semaphores, and the other core concurrency primitives work at the operating system level, through hands-on projects.
Core Concept Overview
Before diving into projects, here’s the mental map of what you’re learning:
| Concept | What It Is | Why It’s Confusing |
|---|---|---|
| Process | Independent program with its own memory space | People conflate it with “program” or “thread” |
| Thread | Execution unit within a process, shares memory | There are multiple types of threads |
| Kernel Thread | Thread managed by the OS kernel (1:1 model) | The OS does the scheduling |
| Green Thread/User Thread | Thread managed in user-space (N:1 model) | Your code does the scheduling |
| Concurrency | Dealing with multiple things at once (structure) | Often confused with parallelism |
| Parallelism | Doing multiple things at once (execution) | Requires multiple CPU cores |
| Semaphore | Counter that controls access to resources | It’s not just a mutex with a count |
| Mutex | Lock that ensures only one thread enters critical section | Often mistaken for a semaphore with count=1 (but a mutex has an owner) |
The key insight: Threads share memory within a process. This sharing is both the power (fast communication) and the danger (race conditions, deadlocks).
Learning Path
Phase 1: Foundations (2-3 weeks)
- Project 1: Process Forker & IPC Visualizer
- Project 2: Multi-threaded Task Executor
Phase 2: Deep Dive (3-4 weeks)
- Project 3: Build Your Own Mutex & Semaphore
- Project 5: Dining Philosophers Visualizer
Phase 3: Mastery (4-6 weeks)
- Project 4: Green Thread Library
- Project 6: Thread Pool with Work Stealing
- Project 7: Concurrent Hash Map
Phase 4: Capstone
- Project 8: Mini Operating System Scheduler
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Process Forker & IPC | ⭐⭐ | Weekend | Foundational | ⭐⭐⭐ |
| Multi-threaded Executor | ⭐⭐⭐ | 1 week | Core threading | ⭐⭐⭐⭐ |
| Build Mutex & Semaphore | ⭐⭐⭐⭐ | 1-2 weeks | Deep primitives | ⭐⭐⭐⭐⭐ |
| Green Thread Library | ⭐⭐⭐⭐⭐ | 2-3 weeks | Expert-level | ⭐⭐⭐⭐⭐ |
| Dining Philosophers Viz | ⭐⭐⭐ | 1 week | Deadlock mastery | ⭐⭐⭐⭐⭐ |
| Thread Pool + Work Stealing | ⭐⭐⭐⭐ | 2-3 weeks | Production patterns | ⭐⭐⭐⭐ |
| Concurrent Hash Map | ⭐⭐⭐⭐ | 2-3 weeks | Data structure concurrency | ⭐⭐⭐⭐ |
| Mini OS Scheduler | ⭐⭐⭐⭐⭐ | 1-2 months | Complete mastery | ⭐⭐⭐⭐⭐ |
Project 1: Process Forker & IPC Visualizer
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Operating Systems / Process Management
- Software or Tool: Unix fork/exec syscalls
- Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau
What you’ll build
A visual tool that forks child processes, displays their PIDs, shows memory isolation (same variable address, different values), and demonstrates IPC via pipes and shared memory.
Why it teaches processes
You can’t understand threads until you understand what they’re not. A process has its own memory space—when you fork, the child gets a copy. By printing the same variable’s address and value in parent and child, you’ll see “same address, different value”, which makes virtual memory isolation click.
Core challenges you’ll face
- Fork semantics (child gets a copy of memory; fork() returns 0 in the child and the child’s PID in the parent) → maps to process isolation
- Pipe communication (why can’t the parent just write to the child’s variable?) → maps to IPC necessity
- Shared memory setup (shmget/mmap) → maps to explicit memory sharing
- Zombie processes (forgetting to wait()) → maps to process lifecycle
Key Concepts
| Concept | Resource |
|---|---|
| Process Creation | Operating Systems: Three Easy Pieces Chapter 5 - OSTEP |
| Virtual Memory Basics | Computer Systems: A Programmer’s Perspective Chapter 9 - Bryant & O’Hallaron |
| IPC Mechanisms | The Linux Programming Interface Chapters 44-49 - Michael Kerrisk |
Difficulty & Time
- Difficulty: Beginner-Intermediate
- Time estimate: Weekend
- Prerequisites: Basic C, comfort with command line
Real World Outcome
$ ./process_viz
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0
[Forking...]
[Child PID: 1235] Variable 'counter' at 0x7fff5678 = 0 # Same address!
[Child] Incrementing counter...
[Child PID: 1235] Variable 'counter' at 0x7fff5678 = 100
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0 # Still 0! Isolated!
[Demo: Pipe IPC]
Parent sends: "Hello from parent"
Child receives: "Hello from parent"
[Demo: Shared Memory]
Parent writes 42 to shared memory
Child reads 42 from shared memory
Parent sees child wrote 99 back
Learning Milestones
- Fork works, child has different PID → You understand process creation
- Same address, different values after fork → You understand memory isolation
- Pipes working → You understand why IPC is necessary
- Shared memory without race conditions → You’re ready for threads
Implementation Hints
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>
int main() {
int counter = 0;
printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
pid_t pid = fork();
if (pid == 0) {
// Child process
printf("[Child PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
counter = 100;
printf("[Child] After write: counter = %d\n", counter);
exit(0);
} else {
// Parent process
wait(NULL); // Wait for child to finish
printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
// counter is still 0! Memory is isolated!
}
return 0;
}
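Pipe & Shared Memory (sketch)
The code above covers the fork demo; the pipe and shared-memory demos from the outcome can follow the same shape. A minimal sketch, assuming Linux and omitting error handling:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main() {
    int fds[2];
    pipe(fds);  // fds[0] = read end, fds[1] = write end

    // MAP_SHARED | MAP_ANONYMOUS: pages explicitly shared across fork()
    int* shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (fork() == 0) {  // child
        close(fds[1]);
        char buf[64] = {0};
        read(fds[0], buf, sizeof(buf) - 1);
        printf("Child receives: \"%s\"\n", buf);
        *shared = 99;   // unlike the fork demo, the parent WILL see this
        _exit(0);
    }
    close(fds[0]);      // parent
    const char* msg = "Hello from parent";
    write(fds[1], msg, strlen(msg));
    wait(NULL);
    printf("Parent sees child wrote %d via shared memory\n", *shared);
    return 0;
}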
Key System Calls to Master
| System Call | Purpose |
|---|---|
| `fork()` | Create a child process (copy of parent) |
| `exec*()` | Replace current process with new program |
| `wait()` / `waitpid()` | Wait for child to terminate |
| `pipe()` | Create a unidirectional communication channel |
| `mmap()` | Create shared memory region |
| `shmget()` / `shmat()` | System V shared memory |
Project 2: Multi-threaded Task Executor (pthreads)
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency / Thread Management
- Software or Tool: POSIX Threads (pthreads)
- Main Book: The Linux Programming Interface by Michael Kerrisk
What you’ll build
A command-line task executor that spawns N threads, each pulling tasks from a shared queue, with visible race conditions when you forget to lock, and proper synchronization when you add mutexes.
Why it teaches kernel threads
This is where threads become real. You’ll see that threads share memory by default (unlike processes). You’ll intentionally create race conditions, watch the chaos, then fix them. This visceral experience of “broken → fixed” is the best teacher.
Core challenges you’ll face
- Race conditions (counter increments “lose” updates) → maps to critical sections
- Mutex usage (lock before access, unlock after) → maps to mutual exclusion
- Thread joining (waiting for all threads to finish) → maps to thread lifecycle
- Shared queue access (producer-consumer pattern) → maps to concurrent data structures
Resources for Key Challenges
- Threads, Mutexes and Concurrent Programming in C - Excellent walkthrough of pthread mutex usage
Key Concepts
| Concept | Resource |
|---|---|
| Thread Creation & Joining | The Linux Programming Interface Chapter 29 - Kerrisk |
| Race Conditions | Operating Systems: Three Easy Pieces Chapter 26 - OSTEP |
| Mutexes | C++ Concurrency in Action Chapter 3 - Anthony Williams |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Project 1 completed, basic C
Real World Outcome
# First run: NO LOCKS (intentional race condition)
$ ./task_executor --threads 4 --tasks 10000 --no-lock
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 0 # OOPS! Same task!
Thread 2: processing task 1
...
Expected completed: 10000
Actual completed: 9847 # RACE CONDITION! Lost updates!
# Second run: WITH LOCKS
$ ./task_executor --threads 4 --tasks 10000
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 1
Thread 2: processing task 2
...
Expected completed: 10000
Actual completed: 10000 # CORRECT!
Total time: 0.23s (parallel speedup!)
Learning Milestones
- Threads share memory (see same variable) → You understand thread vs process
- Race condition breaks count → You understand critical sections
- Mutex fixes the race → You understand synchronization
- Parallel speedup measurable → You understand concurrency vs parallelism
Implementation Hints
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#define NUM_THREADS 4
#define NUM_INCREMENTS 1000000
long shared_counter = 0;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
int use_mutex = 1;
void* increment_counter(void* arg) {
int thread_id = *(int*)arg;
for (int i = 0; i < NUM_INCREMENTS; i++) {
if (use_mutex) {
pthread_mutex_lock(&counter_mutex);
shared_counter++;
pthread_mutex_unlock(&counter_mutex);
} else {
shared_counter++; // Race condition!
}
}
printf("Thread %d finished\n", thread_id);
return NULL;
}
int main(int argc, char* argv[]) {
if (argc > 1 && strcmp(argv[1], "--no-lock") == 0) {
use_mutex = 0;
printf("Running WITHOUT mutex (expect race condition)\n");
} else {
printf("Running WITH mutex\n");
}
pthread_t threads[NUM_THREADS];
int thread_ids[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
thread_ids[i] = i;
pthread_create(&threads[i], NULL, increment_counter, &thread_ids[i]);
}
for (int i = 0; i < NUM_THREADS; i++) {
pthread_join(threads[i], NULL);
}
long expected = (long)NUM_THREADS * NUM_INCREMENTS;
printf("Expected: %ld\n", expected);
printf("Actual: %ld\n", shared_counter);
printf("Lost: %ld\n", expected - shared_counter);
return 0;
}
Compile with: gcc -pthread -o task_executor task_executor.c
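A Shared Task Queue (sketch)
The counter demo shows the race itself; the executor also needs the shared queue the workers pull tasks from. A minimal sketch of a mutex-plus-condition-variable queue (a simple precursor to the lock-free deques of Project 6):
#include <pthread.h>

#define QUEUE_CAP 1024

typedef struct {
    int tasks[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
} task_queue_t;

void queue_push(task_queue_t* q, int task) {
    pthread_mutex_lock(&q->mutex);
    q->tasks[q->tail] = task;            // assumes producers never exceed CAP
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);  // wake one waiting worker
    pthread_mutex_unlock(&q->mutex);
}

int queue_pop(task_queue_t* q) {
    pthread_mutex_lock(&q->mutex);
    while (q->count == 0)                // while, not if: spurious wakeups
        pthread_cond_wait(&q->not_empty, &q->mutex);
    int task = q->tasks[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->mutex);
    return task;
}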
Key pthread Functions to Master
| Function | Purpose |
|---|---|
| `pthread_create()` | Create a new thread |
| `pthread_join()` | Wait for thread to terminate |
| `pthread_mutex_init()` | Initialize a mutex |
| `pthread_mutex_lock()` | Acquire the lock (blocks if held) |
| `pthread_mutex_unlock()` | Release the lock |
| `pthread_mutex_destroy()` | Clean up mutex resources |
Project 3: Build Your Own Mutex & Semaphore
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Synchronization Primitives / Atomics
- Software or Tool: Atomic operations, futex
- Main Book: The Linux Programming Interface by Michael Kerrisk
What you’ll build
Your own mutex and semaphore from scratch using atomic compare-and-swap (CAS) operations, spinlocks, and optionally the Linux futex syscall for sleeping.
Why it teaches synchronization deeply
Most people use pthread_mutex_lock() as a black box. By building your own, you’ll understand:
- What is a spinlock?
- Why does spinning waste CPU?
- What is a futex and why does it exist?
- What’s the difference between a mutex and a semaphore at the implementation level?
Core challenges you’ll face
- Atomic operations (why a normal counter++ isn’t thread-safe) → maps to memory model
- Spinlock implementation (busy-wait CAS loop) → maps to lock-free programming
- Futex for sleeping (avoid wasting CPU spinning) → maps to kernel integration
- Semaphore counting (allow N concurrent accessors) → maps to resource pools
Resources for Key Challenges
- Barrgroup: Mutexes and Semaphores Demystified - Historical context and the difference between signaling vs locking
- GeeksforGeeks: Mutex vs Semaphore - Clear comparison with examples
Key Concepts
| Concept | Resource |
|---|---|
| Atomic Operations | C++ Concurrency in Action Chapter 5 - Anthony Williams |
| Futex Deep Dive | The Linux Programming Interface Chapter 30 - Kerrisk |
| Memory Barriers | Rust Atomics and Locks by Mara Bos (excellent even for C programmers) |
Difficulty & Time
- Difficulty: Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Projects 1-2 completed, understanding of assembly basics
Real World Outcome
$ ./my_sync_test
=== Testing MY_MUTEX (spinlock-based) ===
Spawning 8 threads, each incrementing counter 100000 times
No mutex: counter = 634982 (WRONG - race condition)
pthread_mutex: counter = 800000 (correct)
my_mutex: counter = 800000 (correct!) ✓
=== Testing MY_SEMAPHORE ===
Semaphore initialized with count = 3
Thread 0: acquired (active: 1)
Thread 1: acquired (active: 2)
Thread 2: acquired (active: 3)
Thread 3: waiting... (sem count = 0)
Thread 0: released
Thread 3: acquired (active: 3)
...
All threads completed, max concurrent = 3 ✓
=== Spinlock vs Futex comparison ===
Spinlock (high contention): 2.3s, CPU 780%
Futex-based (high contention): 0.4s, CPU 95%
Conclusion: Futex avoids wasting CPU when waiting!
Learning Milestones
- Spinlock works → You understand CAS and atomic operations
- You see 800% CPU usage with spinlock → You understand why sleeping matters
- Futex reduces CPU usage → You understand kernel/user-space cooperation
- Semaphore limits concurrency to N → You understand counting semaphores
Implementation Hints
Spinlock Mutex
#include <stdatomic.h>
#include <stdbool.h>
typedef struct {
atomic_flag locked;
} my_spinlock_t;
void my_spinlock_init(my_spinlock_t* lock) {
atomic_flag_clear(&lock->locked);
}
void my_spinlock_lock(my_spinlock_t* lock) {
// Spin until we acquire the lock
while (atomic_flag_test_and_set(&lock->locked)) {
// Busy-wait (wastes CPU!)
// Could add: __builtin_ia32_pause() for x86
}
}
void my_spinlock_unlock(my_spinlock_t* lock) {
atomic_flag_clear(&lock->locked);
}
Futex-based Mutex (Linux)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
typedef struct {
atomic_int state; // 0 = unlocked, 1 = locked
} my_futex_mutex_t;
void my_futex_lock(my_futex_mutex_t* m) {
int expected = 0;
while (!atomic_compare_exchange_weak(&m->state, &expected, 1)) {
// Failed to acquire - sleep until state changes
syscall(SYS_futex, &m->state, FUTEX_WAIT, 1, NULL, NULL, 0);
expected = 0; // Reset for next try
}
}
void my_futex_unlock(my_futex_mutex_t* m) {
atomic_store(&m->state, 0);
// Wake one waiting thread
syscall(SYS_futex, &m->state, FUTEX_WAKE, 1, NULL, NULL, 0);
}
Counting Semaphore
typedef struct {
atomic_int count; // a plain int would do here: the mutex below already serializes access
pthread_mutex_t mutex;
pthread_cond_t cond;
} my_semaphore_t;
void my_sem_init(my_semaphore_t* sem, int initial) {
atomic_store(&sem->count, initial);
pthread_mutex_init(&sem->mutex, NULL);
pthread_cond_init(&sem->cond, NULL);
}
void my_sem_wait(my_semaphore_t* sem) {
pthread_mutex_lock(&sem->mutex);
while (atomic_load(&sem->count) <= 0) {
pthread_cond_wait(&sem->cond, &sem->mutex);
}
atomic_fetch_sub(&sem->count, 1);
pthread_mutex_unlock(&sem->mutex);
}
void my_sem_post(my_semaphore_t* sem) {
pthread_mutex_lock(&sem->mutex);
atomic_fetch_add(&sem->count, 1);
pthread_cond_signal(&sem->cond);
pthread_mutex_unlock(&sem->mutex);
}
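Testing the Semaphore (sketch)
To reproduce the “max concurrent = 3” check from the outcome, a sketch of a test harness that records the high-water mark of simultaneous holders (all names are local to this test):
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

my_semaphore_t sem;
atomic_int active, max_active;

void* worker(void* arg) {
    (void)arg;
    my_sem_wait(&sem);
    int now = atomic_fetch_add(&active, 1) + 1;  // holders including us
    // Record the high-water mark of concurrent holders
    int max = atomic_load(&max_active);
    while (now > max && !atomic_compare_exchange_weak(&max_active, &max, now)) {}
    usleep(10000);                               // hold the semaphore briefly
    atomic_fetch_sub(&active, 1);
    my_sem_post(&sem);
    return NULL;
}

int main() {
    my_sem_init(&sem, 3);
    pthread_t t[8];
    for (int i = 0; i < 8; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++) pthread_join(t[i], NULL);
    printf("max concurrent = %d\n", atomic_load(&max_active));  // expect 3
    return 0;
}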
Key Atomic Operations to Master
| Operation | Purpose |
|---|---|
| `atomic_flag_test_and_set()` | Atomically set flag and return old value |
| `atomic_compare_exchange_weak()` | CAS - the building block of lock-free code |
| `atomic_fetch_add()` | Atomically increment and return old value |
| `atomic_load()` / `atomic_store()` | Thread-safe read/write |
Project 4: Green Thread Library (User-Space Scheduler)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Runtime Systems / Cooperative Scheduling
- Software or Tool: ucontext.h, setjmp/longjmp, or raw assembly
- Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau
What you’ll build
A complete green thread (user-space thread) library that manages its own stack allocations, implements cooperative scheduling (yield), and multiplexes many green threads onto a single OS thread.
Why it teaches threading deeply
This is where you become the operating system. You’ll allocate stacks, save/restore registers, and implement yield(). After this project, you’ll understand exactly what Go’s goroutines are doing under the hood (and, in spirit, what user-space schedulers like Rust’s async runtimes do).
Core challenges you’ll face
- Stack allocation (each green thread needs its own stack) → maps to memory layout
- Context switching (save registers, swap stack pointer) → maps to CPU state
- Scheduler implementation (round-robin, run queue) → maps to scheduling algorithms
- Cooperative vs preemptive (explicit yield vs timer interrupt) → maps to scheduling models
Resources for Key Challenges
- Green threads explained (c9x.me) - Hands-on implementation guide for x86-64
- Wikipedia: Green thread - Conceptual overview
Key Concepts
| Concept | Resource |
|---|---|
| Context Switching | Operating Systems: Three Easy Pieces Chapter 6 - OSTEP |
| Stack Management | Computer Systems: A Programmer’s Perspective Chapter 3 - Bryant & O’Hallaron |
| ucontext API | man pages + GNU C Library docs |
Difficulty & Time
- Difficulty: Advanced-Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-3 completed, some assembly knowledge
Real World Outcome
$ ./green_threads_demo
=== Green Thread Library Demo ===
Creating 1000 green threads on 1 OS thread...
Green thread 0: Hello! (stack at 0x7f1234560000)
Green thread 0: Yielding...
Green thread 1: Hello! (stack at 0x7f1234570000)
Green thread 1: Yielding...
Green thread 2: Hello! (stack at 0x7f1234580000)
...
Green thread 0: Resumed!
Green thread 0: Done!
...
Stats:
- Total green threads created: 1000
- OS threads used: 1
- Context switches performed: 5000
- Average switch time: 45 nanoseconds
For comparison:
- 1000 OS threads would use ~8GB stack memory
- 1000 green threads used ~8MB stack memory
Learning Milestones
- Single green thread runs to completion → You understand stack allocation
- Two threads yield to each other → You understand context switching
- Round-robin scheduler works → You understand scheduling
- 1000 threads work smoothly → You understand why green threads scale
Implementation Hints
Using ucontext.h (simpler approach)
#include <ucontext.h>
#include <stdlib.h>
#include <stdio.h>
#define STACK_SIZE 16384
#define MAX_THREADS 64
typedef struct {
ucontext_t context;
char stack[STACK_SIZE];
int id;
int finished;
} green_thread_t;
green_thread_t threads[MAX_THREADS];
int current_thread = 0;
int num_threads = 0;
ucontext_t main_context;
void gt_yield() {
int prev = current_thread;
// Find next runnable thread (round-robin)
do {
current_thread = (current_thread + 1) % num_threads;
} while (threads[current_thread].finished && current_thread != prev);
if (current_thread != prev) {
swapcontext(&threads[prev].context, &threads[current_thread].context);
}
}
void gt_create(void (*func)(void*), void* arg) {
green_thread_t* t = &threads[num_threads];
t->id = num_threads;
t->finished = 0;
getcontext(&t->context);
t->context.uc_stack.ss_sp = t->stack;
t->context.uc_stack.ss_size = STACK_SIZE;
t->context.uc_link = &main_context;
makecontext(&t->context, (void (*)())func, 1, arg); // NB: passing a pointer via makecontext's int args is non-portable on 64-bit
num_threads++;
}
void thread_func(void* arg) {
int id = *(int*)arg;
for (int i = 0; i < 3; i++) {
printf("Thread %d: iteration %d\n", id, i);
gt_yield();
}
printf("Thread %d: done!\n", id);
threads[id].finished = 1;
}
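The hints stop short of a driver. A minimal sketch of one, assuming the globals above; uc_link returns control to main_context whenever a green thread’s function returns:
int main() {
    int ids[3] = {0, 1, 2};
    for (int i = 0; i < 3; i++)
        gt_create(thread_func, &ids[i]);

    int runnable = 1;
    while (runnable) {
        runnable = 0;
        for (int i = 0; i < num_threads; i++) {
            if (!threads[i].finished) {
                runnable = 1;
                current_thread = i;
                // Resume the green thread; control returns here via
                // uc_link when a thread function finishes
                swapcontext(&main_context, &threads[i].context);
            }
        }
    }
    printf("All green threads finished\n");
    return 0;
}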
Raw Assembly Context Switch (x86-64)
#include <stdint.h>
// Context: saved callee-saved registers (SysV x86-64) plus the stack pointer
typedef struct {
uint64_t rsp;
uint64_t r15;
uint64_t r14;
uint64_t r13;
uint64_t r12;
uint64_t rbx;
uint64_t rbp;
} context_t;
// Assembly function to switch contexts
// Saves current registers, restores target's registers
extern void context_switch(context_t* old, context_t* new);
; context_switch.s (NASM syntax)
global context_switch
context_switch:
; Save old context (rdi = old)
mov [rdi + 0], rsp
mov [rdi + 8], r15
mov [rdi + 16], r14
mov [rdi + 24], r13
mov [rdi + 32], r12
mov [rdi + 40], rbx
mov [rdi + 48], rbp
; Restore new context (rsi = new)
mov rsp, [rsi + 0]
mov r15, [rsi + 8]
mov r14, [rsi + 16]
mov r13, [rsi + 24]
mov r12, [rsi + 32]
mov rbx, [rsi + 40]
mov rbp, [rsi + 48]
ret
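Seeding a New Thread's Stack
One piece the assembly leaves implicit: a brand-new green thread has never been switched out, so its stack must be pre-seeded so that the final `ret` in context_switch lands in the entry function. A sketch (helper name and guard value are illustrative; alignment follows the SysV x86-64 ABI):
#include <stdint.h>
#include <stddef.h>

void init_context(context_t* ctx, void (*entry)(void),
                  char* stack, size_t size) {
    // Top of stack, 16-byte aligned per the SysV x86-64 ABI
    uint64_t* sp = (uint64_t*)(((uintptr_t)(stack + size)) & ~(uintptr_t)0xF);
    *--sp = 0;                // guard: entry must never plain-return (jumps to 0)
    *--sp = (uint64_t)entry;  // popped by `ret` at the end of context_switch
    ctx->rsp = (uint64_t)sp;  // callee-saved registers may start as zero
}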
Green Thread vs OS Thread Comparison
| Aspect | OS Thread (kernel) | Green Thread (user) |
|---|---|---|
| Created by | `pthread_create()` | Your library |
| Scheduled by | OS kernel | Your scheduler |
| Context switch | ~1-10 μs (syscall) | ~50-100 ns (function call) |
| Stack size | 1-8 MB default | You decide (8KB typical) |
| 10,000 threads | 10-80 GB memory | 80 MB memory |
| Blocking syscall | Only that thread blocks | ALL green threads block! |
Project 5: Dining Philosophers Visualizer
- Main Programming Language: C
- Alternative Programming Languages: Rust, Python, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Deadlock / Synchronization
- Software or Tool: ncurses or terminal graphics
- Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau
What you’ll build
An animated terminal visualization of the Dining Philosophers problem, where you can toggle between naive (deadlock-prone), ordered locking, and semaphore-based solutions, watching deadlocks happen in real-time.
Why it teaches deadlocks
Deadlock is abstract until you see it. Watching 5 philosophers all grab their left fork and freeze forever makes the concept visceral. Then implementing fixes (resource ordering, arbitrator semaphore) teaches how to prevent them.
Core challenges you’ll face
- Deadlock creation (circular wait) → maps to Coffman conditions
- Deadlock prevention (resource ordering) → maps to deadlock avoidance
- Livelock (philosophers keep retrying) → maps to liveness properties
- Starvation (one philosopher never eats) → maps to fairness
Key Concepts
| Concept | Resource |
|---|---|
| Dining Philosophers Problem | Operating Systems: Three Easy Pieces Chapter 31 - OSTEP |
| Deadlock Conditions | Operating System Concepts Chapter 7 - Silberschatz |
| Semaphore Solutions | Day 12: Mutex and Semaphores - Exploring OS |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Project 2-3 completed
Real World Outcome
$ ./dining_philosophers --mode naive
🍝 Dining Philosophers 🍝
P0 P1
🧠 🧠
🍴 🍴 🍴 🍴
\ /
\ /
P4 🧠--🍽️--🧠 P2
/ \
/ \
🍴 🍴
🧠
P3
Status: [Mode: NAIVE - Deadlock prone!]
P0: 🍴 HOLDING left fork, WAITING for right
P1: 🍴 HOLDING left fork, WAITING for right
P2: 🍴 HOLDING left fork, WAITING for right
P3: 🍴 HOLDING left fork, WAITING for right
P4: 🍴 HOLDING left fork, WAITING for right
⚠️ DEADLOCK DETECTED! All philosophers stuck!
Press 's' for semaphore solution, 'o' for ordered locking...
Learning Milestones
- Deadlock happens visually → You understand circular wait
- Resource ordering prevents deadlock → You understand prevention
- Semaphore (max 4 eating) works → You understand limiting concurrency
- You can explain all 4 Coffman conditions → Mastery
The Four Coffman Conditions for Deadlock
All four must be present for deadlock to occur:
- Mutual Exclusion: Resources cannot be shared (fork can only be held by one philosopher)
- Hold and Wait: Process holds one resource while waiting for another
- No Preemption: Resources cannot be forcibly taken away
- Circular Wait: P0 waits for P1, P1 waits for P2, … P4 waits for P0
Implementation Hints
Naive Solution (causes deadlock)
void philosopher_naive(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
while (1) {
think(id);
pthread_mutex_lock(&forks[left]); // Grab left fork
printf("P%d: got left fork\n", id);
usleep(1000); // Small delay makes deadlock more likely
pthread_mutex_lock(&forks[right]); // Grab right fork (DEADLOCK!)
printf("P%d: got right fork\n", id);
eat(id);
pthread_mutex_unlock(&forks[right]);
pthread_mutex_unlock(&forks[left]);
}
}
Solution 1: Resource Ordering
void philosopher_ordered(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
// Always lock lower-numbered fork first
int first = (left < right) ? left : right;
int second = (left < right) ? right : left;
while (1) {
think(id);
pthread_mutex_lock(&forks[first]);
pthread_mutex_lock(&forks[second]);
eat(id);
pthread_mutex_unlock(&forks[second]);
pthread_mutex_unlock(&forks[first]);
}
}
Solution 2: Semaphore (limit to N-1)
sem_t table_seats; // Initialized to NUM_PHILOSOPHERS - 1
void philosopher_semaphore(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
while (1) {
think(id);
sem_wait(&table_seats); // Only N-1 can try to eat at once
pthread_mutex_lock(&forks[left]);
pthread_mutex_lock(&forks[right]);
eat(id);
pthread_mutex_unlock(&forks[right]);
pthread_mutex_unlock(&forks[left]);
sem_post(&table_seats);
}
}
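Detecting the Deadlock (sketch)
For the “DEADLOCK DETECTED” banner, a full wait-for-graph is overkill; a watchdog thread that notices every philosopher stuck in WAITING past a threshold is enough for a visualizer. A sketch (the state and timestamp arrays are assumptions of this design, updated by the philosopher threads; the racy reads are acceptable for a best-effort display):
#include <stdio.h>
#include <time.h>
#include <unistd.h>

typedef enum { THINKING, WAITING, EATING } phil_state_t;

phil_state_t state[NUM_PHILOSOPHERS];    // set by each philosopher thread
time_t waiting_since[NUM_PHILOSOPHERS];  // set on entering WAITING

void* deadlock_watchdog(void* arg) {
    (void)arg;
    while (1) {
        sleep(1);
        int stuck = 0;
        time_t now = time(NULL);
        for (int i = 0; i < NUM_PHILOSOPHERS; i++)
            if (state[i] == WAITING && now - waiting_since[i] >= 2)
                stuck++;
        if (stuck == NUM_PHILOSOPHERS)
            printf("⚠️  DEADLOCK DETECTED! All philosophers stuck!\n");
    }
    return NULL;
}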
Project 6: Thread Pool with Work Stealing
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Parallel Computing / Load Balancing
- Software or Tool: Lock-free queues, work stealing deques
- Main Book: C++ Concurrency in Action by Anthony Williams
What you’ll build
A production-quality thread pool where worker threads can “steal” tasks from each other’s queues when idle, maximizing CPU utilization and reducing latency.
Why it teaches advanced concurrency
Real systems use thread pools, not raw threads. Work stealing is how Go, Tokio (Rust), and Java’s ForkJoin work. Building this teaches you lock-free data structures, cache locality, and real-world parallel performance.
Core challenges you’ll face
- Thread-safe queue (multiple threads push/pop) → maps to concurrent data structures
- Work stealing deque (steal from back, pop from front) → maps to load balancing
- Lock-free implementation (CAS-based operations) → maps to lock-free programming
- Cache-friendly design (avoid false sharing) → maps to performance optimization
Key Concepts
| Concept | Resource |
|---|---|
| Thread Pools | C++ Concurrency in Action Chapter 9 - Anthony Williams |
| Work Stealing | “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson (paper) |
| Lock-Free Queues | Rust Atomics and Locks Chapter 6 - Mara Bos |
Difficulty & Time
- Difficulty: Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-4 completed
Real World Outcome
$ ./thread_pool_bench --workers 8 --tasks 100000
=== Thread Pool Benchmark ===
[Naive approach: spawn thread per task]
Time: 45.2s
Threads created: 100000
Context switches: 2,340,000
[Thread pool with single queue]
Time: 2.1s
Threads created: 8
Context switches: 12,000
Queue contention: HIGH (single lock)
[Thread pool with work stealing]
Time: 0.8s
Threads created: 8
Context switches: 9,000
Load balance: 12.5% per worker (perfect!)
Steals performed: 3,421
Work stealing is 56x faster than naive, 2.6x faster than single queue!
Learning Milestones
- Basic thread pool works → You understand pooling
- Per-worker queues reduce contention → You understand lock contention
- Work stealing balances load → You understand dynamic scheduling
- Benchmark shows real speedup → You understand parallel efficiency
Architecture
┌─────────────────────────────────────────┐
│ Thread Pool │
└─────────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Worker 0 │ │ Worker 1 │ │ Worker 2 │
│ │ │ │ │ │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Deque │ │◄─STEAL───│ │ Deque │ │───STEAL─►│ │ Deque │ │
│ │ │ │ │ │ T T T │ │ │ │ │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└────────────┘ └────────────┘ └────────────┘
│ │ │
│ Pop from front │ Steal from back │
▼ ▼ ▼
[Execute] [Execute] [Execute]
Implementation Hints
Work-Stealing Deque (Chase-Lev)
#include <stdatomic.h>
#define DEQUE_SIZE 1024
typedef struct {
atomic_size_t top; // Thieves steal from here
atomic_size_t bottom; // Owner pushes/pops here
void* buffer[DEQUE_SIZE];
} work_stealing_deque_t;
// Owner: push task (only the owner calls this; no overflow check: assumes
// fewer than DEQUE_SIZE outstanding tasks)
void deque_push(work_stealing_deque_t* d, void* task) {
size_t b = atomic_load(&d->bottom);
d->buffer[b % DEQUE_SIZE] = task;
atomic_thread_fence(memory_order_release);
atomic_store(&d->bottom, b + 1);
}
// Owner: pop task (only owner calls this)
void* deque_pop(work_stealing_deque_t* d) {
size_t b = atomic_load(&d->bottom) - 1;
atomic_store(&d->bottom, b);
atomic_thread_fence(memory_order_seq_cst);
size_t t = atomic_load(&d->top);
if (t <= b) {
void* task = d->buffer[b % DEQUE_SIZE];
if (t == b) {
// Last element - race with stealers
if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
task = NULL; // Lost race
}
atomic_store(&d->bottom, b + 1);
}
return task;
}
atomic_store(&d->bottom, b + 1);
return NULL;
}
// Thief: steal task (other workers call this)
void* deque_steal(work_stealing_deque_t* d) {
size_t t = atomic_load(&d->top);
atomic_thread_fence(memory_order_seq_cst);
size_t b = atomic_load(&d->bottom);
if (t < b) {
void* task = d->buffer[t % DEQUE_SIZE];
if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
return NULL; // Lost race with another stealer
}
return task;
}
return NULL;
}
Thread Pool Structure
typedef struct {
pthread_t* threads;
work_stealing_deque_t* deques; // One per worker
int num_workers;
atomic_bool shutdown;
} thread_pool_t;
typedef struct {
thread_pool_t* pool; // back-pointer to the pool
int id; // this worker's index (and its deque)
} worker_context_t;
void* worker_loop(void* arg) {
worker_context_t* ctx = arg;
while (!atomic_load(&ctx->pool->shutdown)) {
// Try to get task from own deque
void* task = deque_pop(&ctx->pool->deques[ctx->id]);
if (!task) {
// Own deque empty - try to steal from others
for (int i = 0; i < ctx->pool->num_workers; i++) {
if (i != ctx->id) {
task = deque_steal(&ctx->pool->deques[i]);
if (task) break;
}
}
}
if (task) {
execute_task(task);
} else {
// No work anywhere - yield CPU
sched_yield();
}
}
return NULL;
}
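Submitting Tasks from Outside (sketch)
One design constraint the deque imposes: deque_push is owner-only, so threads outside the pool must never push into a worker's deque. A common pattern is a mutex-protected injector queue that idle workers drain before trying to steal. A minimal sketch (names are illustrative):
#include <pthread.h>
#include <stdlib.h>

typedef struct inject_node {
    void* task;
    struct inject_node* next;
} inject_node_t;

typedef struct {
    inject_node_t* head;
    pthread_mutex_t lock;
} inject_queue_t;

// Called by external (non-worker) threads
void pool_submit(inject_queue_t* q, void* task) {
    inject_node_t* n = malloc(sizeof *n);
    n->task = task;
    pthread_mutex_lock(&q->lock);
    n->next = q->head;   // LIFO for simplicity
    q->head = n;
    pthread_mutex_unlock(&q->lock);
}

// Called by a worker whose own deque is empty, before it tries to steal
void* inject_take(inject_queue_t* q) {
    pthread_mutex_lock(&q->lock);
    inject_node_t* n = q->head;
    if (n) q->head = n->next;
    pthread_mutex_unlock(&q->lock);
    void* task = n ? n->task : NULL;
    free(n);  // free(NULL) is a no-op
    return task;
}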
Project 7: Concurrent Hash Map
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Java
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Concurrent Data Structures
- Software or Tool: Lock striping, RCU
- Main Book: C++ Concurrency in Action by Anthony Williams
What you’ll build
A high-performance concurrent hash map that supports concurrent reads and writes from multiple threads, using techniques like lock striping, read-write locks, and optionally RCU (Read-Copy-Update).
Why it teaches real-world concurrency
Every serious system needs concurrent data structures. Building a concurrent hash map teaches you the trade-offs between correctness, performance, and complexity. You’ll understand why Java’s ConcurrentHashMap and Go’s sync.Map exist.
Core challenges you’ll face
- Lock striping (lock per bucket, not whole table) → maps to fine-grained locking
- Read-write locks (many readers, one writer) → maps to reader/writer problem
- Resize under contention (grow table without blocking readers) → maps to online operations
- RCU pattern (readers never block) → maps to lock-free reads
Key Concepts
| Concept | Resource |
|---|---|
| Lock Striping | C++ Concurrency in Action Chapter 6 - Anthony Williams |
| Read-Write Locks | The Linux Programming Interface Chapter 30 - Kerrisk |
| RCU Basics | “What is RCU, Fundamentally?” - Paul McKenney (LWN article) |
Difficulty & Time
- Difficulty: Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-3 and basic hash table implementation
Real World Outcome
$ ./concurrent_map_bench --readers 6 --writers 2 --ops 1000000
=== Concurrent Hash Map Benchmark ===
[Single global lock]
Throughput: 234,000 ops/sec
Reader wait time: 89% (readers blocked by writers)
Writer wait time: 45%
[Lock striping (256 buckets)]
Throughput: 1,890,000 ops/sec (8x faster!)
Reader wait time: 12%
Writer wait time: 8%
[RW-lock per bucket]
Throughput: 2,340,000 ops/sec
Reader wait time: 3%
Writer wait time: 15%
[RCU-style (readers never block)]
Throughput: 4,120,000 ops/sec
Reader wait time: 0% (readers NEVER blocked!)
Writer wait time: 5%
Memory overhead: +15% (deferred reclamation)
Correctness check: All 1,000,000 operations verified ✓
Learning Milestones
- Single lock works but is slow → You understand coarse-grained locking
- Lock striping dramatically improves throughput → You understand fine-grained locking
- Read-write locks help read-heavy workloads → You understand reader/writer trade-offs
- RCU gives maximum read performance → You understand lock-free reads
Implementation Hints
Lock Striping
#define NUM_BUCKETS 256
#define NUM_LOCKS 16 // Fewer locks than buckets
typedef struct node {
char* key;
void* value;
struct node* next;
} node_t;
typedef struct {
node_t* buckets[NUM_BUCKETS];
pthread_mutex_t locks[NUM_LOCKS];
} concurrent_map_t;
static inline int get_lock_index(int bucket) {
return bucket % NUM_LOCKS; // Multiple buckets share a lock
}
void* map_get(concurrent_map_t* map, const char* key) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_mutex_lock(&map->locks[lock_idx]);
node_t* node = map->buckets[bucket];
while (node) {
if (strcmp(node->key, key) == 0) {
void* value = node->value;
pthread_mutex_unlock(&map->locks[lock_idx]);
return value;
}
node = node->next;
}
pthread_mutex_unlock(&map->locks[lock_idx]);
return NULL;
}
void map_put(concurrent_map_t* map, const char* key, void* value) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_mutex_lock(&map->locks[lock_idx]);
// Check if key exists
node_t* node = map->buckets[bucket];
while (node) {
if (strcmp(node->key, key) == 0) {
node->value = value; // Update
pthread_mutex_unlock(&map->locks[lock_idx]);
return;
}
node = node->next;
}
// Insert new node
node_t* new_node = malloc(sizeof(node_t));
new_node->key = strdup(key);
new_node->value = value;
new_node->next = map->buckets[bucket];
map->buckets[bucket] = new_node;
pthread_mutex_unlock(&map->locks[lock_idx]);
}
Read-Write Lock Version
typedef struct {
node_t* buckets[NUM_BUCKETS];
pthread_rwlock_t locks[NUM_LOCKS];
} rw_concurrent_map_t;
void* map_get(rw_concurrent_map_t* map, const char* key) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_rwlock_rdlock(&map->locks[lock_idx]); // Read lock
node_t* node = map->buckets[bucket];
void* value = NULL;
while (node) {
if (strcmp(node->key, key) == 0) {
value = node->value;
break;
}
node = node->next;
}
pthread_rwlock_unlock(&map->locks[lock_idx]);
return value;
}
void map_put(rw_concurrent_map_t* map, const char* key, void* value) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_rwlock_wrlock(&map->locks[lock_idx]); // Write lock
// ... insertion logic ...
pthread_rwlock_unlock(&map->locks[lock_idx]);
}
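RCU-Style Reads (sketch)
The hints cover the locking variants; for the RCU-style numbers, the essence is that readers traverse without any lock while writers publish changes with a release store. A minimal insert-only sketch for one bucket (names are illustrative; deletion would additionally need deferred reclamation such as epochs or hazard pointers, which is where the +15% memory overhead comes from):
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <stddef.h>

typedef struct rcu_node {
    const char* key;
    void* value;
    struct rcu_node* _Atomic next;
} rcu_node_t;

rcu_node_t* _Atomic bucket_head;  // one bucket; a real map has an array of these
pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;  // serializes writers only

void* rcu_get(const char* key) {
    // No lock: the acquire load pairs with the writer's release store
    rcu_node_t* n = atomic_load_explicit(&bucket_head, memory_order_acquire);
    for (; n; n = atomic_load_explicit(&n->next, memory_order_acquire))
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}

void rcu_insert(rcu_node_t* n) {
    pthread_mutex_lock(&writer_lock);
    atomic_store_explicit(&n->next,
                          atomic_load_explicit(&bucket_head, memory_order_relaxed),
                          memory_order_relaxed);
    // Publish: the node is fully initialized before readers can reach it
    atomic_store_explicit(&bucket_head, n, memory_order_release);
    pthread_mutex_unlock(&writer_lock);
}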
Concurrency Trade-offs
| Approach | Read Performance | Write Performance | Complexity | Memory |
|---|---|---|---|---|
| Single lock | Poor | Poor | Low | Low |
| Lock striping | Good | Good | Medium | Low |
| RW-lock per bucket | Excellent | Moderate | Medium | Low |
| RCU | Perfect (never blocks) | Moderate | High | Higher |
Project 8: Mini Operating System Scheduler (Capstone)
- Main Programming Language: C + Assembly
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Operating Systems / Kernel Development
- Software or Tool: QEMU, bare-metal x86
- Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau
What you’ll build
A minimal bare-metal kernel (runs in QEMU) that implements preemptive multitasking with a timer interrupt, process/thread scheduling, and basic synchronization primitives—all running without Linux.
Why this is the ultimate test
Everything you learned culminates here. You ARE the operating system. There’s no pthread_create()—you implement context switching, timer interrupts, and scheduling from scratch on real (emulated) hardware.
Core challenges you’ll face
- Bootloader (get code running without an OS) → maps to bare-metal programming
- Timer interrupt (preemptive scheduling) → maps to interrupt handling
- Task switching (save/restore full CPU state) → maps to context switch
- Priority scheduler (multiple scheduling algorithms) → maps to scheduling policies
- Kernel-mode synchronization (spinlocks in kernel) → maps to OS internals
Key Concepts
| Concept | Resource |
|---|---|
| Bootloaders | Operating Systems: Three Easy Pieces Appendix - OSTEP |
| Interrupts | Computer Systems: A Programmer’s Perspective Chapter 8 - Bryant & O’Hallaron |
| Context Switching | Linux Kernel Development Chapter 4 - Robert Love |
| Scheduling | Operating Systems: Three Easy Pieces Chapters 7-9 - OSTEP |
Difficulty & Time
- Difficulty: Master
- Time estimate: 1-2 months
- Prerequisites: All previous projects completed, assembly knowledge
Real World Outcome
Booting mini-OS in QEMU...
[KERNEL] Timer interrupt handler installed
[KERNEL] Scheduler initialized (Round Robin)
[KERNEL] Creating Task 1: counter
[KERNEL] Creating Task 2: printer
[KERNEL] Creating Task 3: beeper
[Task 1] Counter: 0
[Task 2] Printing: Hello from Task 2!
[SCHEDULER] Timer tick! Switching 1 -> 2
[Task 2] Printing: Hello from Task 2!
[Task 3] BEEP!
[SCHEDULER] Timer tick! Switching 2 -> 3
[Task 3] BEEP!
[Task 1] Counter: 1
[SCHEDULER] Timer tick! Switching 3 -> 1
...
Press 'p' to show process table:
PID STATE PRIORITY TICKS NAME
1 RUNNING 5 42 counter
2 READY 5 38 printer
3 BLOCKED 5 25 beeper (waiting on semaphore)
Press 'd' to trigger deadlock demo...
Learning Milestones
- Kernel boots and prints → You understand bare-metal
- Timer interrupt fires → You understand hardware interrupts
- Two tasks switch preemptively → You ARE the scheduler
- Semaphores block/wake tasks → You understand kernel synchronization
- Priority scheduling works → Operating systems are demystified
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ Your Mini-OS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Task 1 │ │ Task 2 │ │ Task 3 │ │
│ │ (counter) │ │ (printer) │ │ (beeper) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────────┼─────────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ Scheduler │ ◄─── Timer Interrupt │
│ └─────┬─────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │
│ │ Mutex │ │ Semaphore │ │ Spinlock │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────┤
│ Interrupt Handlers │ Memory Management │ I/O (VGA, Keyboard) │
├─────────────────────────────────────────────────────────────────┤
│ Hardware (via QEMU) │
│ x86 CPU │ PIC │ PIT (Timer) │ VGA │ Keyboard │
└─────────────────────────────────────────────────────────────────┘
Implementation Roadmap
Phase 1: Bootloader & Basic Kernel
; boot.asm - Bootloader (loaded at 0x7C00)
[BITS 16]
[ORG 0x7C00]
start:
; Set up segments
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
; Load kernel from disk
mov ah, 0x02 ; BIOS read sectors
mov al, 10 ; 10 sectors
mov ch, 0 ; Cylinder 0
mov cl, 2 ; Sector 2 (1-indexed)
mov dh, 0 ; Head 0
mov bx, 0x1000 ; Load at 0x1000
int 0x13
; Switch to protected mode
cli
lgdt [gdt_descriptor]
mov eax, cr0
or eax, 1
mov cr0, eax
jmp 0x08:protected_mode
[BITS 32]
protected_mode:
; Set up segment registers
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov esp, 0x90000
; Jump to kernel
jmp 0x1000
; GDT
gdt_start:
dq 0 ; Null descriptor
gdt_code:
dw 0xFFFF, 0x0000
db 0x00, 0x9A, 0xCF, 0x00
gdt_data:
dw 0xFFFF, 0x0000
db 0x00, 0x92, 0xCF, 0x00
gdt_end:
gdt_descriptor:
dw gdt_end - gdt_start - 1
dd gdt_start
times 510-($-$$) db 0
dw 0xAA55 ; Boot signature
Phase 2: Timer Interrupt
// interrupts.c
#include <stdint.h>
#define PIC1_COMMAND 0x20
#define PIC1_DATA 0x21
#define PIT_COMMAND 0x43
#define PIT_DATA 0x40
void timer_handler() {
// Acknowledge interrupt
outb(PIC1_COMMAND, 0x20);
// Call scheduler
schedule();
}
void setup_timer(uint32_t frequency) {
uint32_t divisor = 1193180 / frequency;
outb(PIT_COMMAND, 0x36);
outb(PIT_DATA, divisor & 0xFF);
outb(PIT_DATA, (divisor >> 8) & 0xFF);
}
// IDT entry for timer (IRQ 0 -> INT 32)
// NB: in a real kernel this must point at an assembly stub that saves
// registers and ends with iret (see the stub sketch below), not the bare C function
void setup_idt() {
idt[32].handler_low = (uint32_t)timer_handler & 0xFFFF;
idt[32].handler_high = (uint32_t)timer_handler >> 16;
idt[32].selector = 0x08;
idt[32].flags = 0x8E;
lidt(&idt_descriptor);
}
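One caveat on the handler above: the CPU saves only EIP/CS/EFLAGS on interrupt entry and expects an iret on exit, so the IDT should point at a small assembly wrapper around the C handler. A sketch in the same NASM syntax as the bootloader:
; timer_stub.asm - register this stub's address in the IDT, not timer_handler's
global timer_stub
extern timer_handler
timer_stub:
    pusha                 ; save general-purpose registers
    call timer_handler    ; C handler: acknowledge PIC, call schedule()
    popa                  ; restore registers
    iret                  ; pops the EIP/CS/EFLAGS pushed by the CPU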
Phase 3: Task Structure & Context Switch
// task.h
typedef struct {
uint32_t eax, ebx, ecx, edx;
uint32_t esi, edi, ebp, esp;
uint32_t eip, eflags;
uint32_t cs, ds, es, fs, gs, ss;
} registers_t;
typedef enum {
TASK_READY,
TASK_RUNNING,
TASK_BLOCKED,
TASK_TERMINATED
} task_state_t;
typedef struct task {
int pid;
char name[32];
task_state_t state;
int priority;
registers_t regs;
uint32_t* stack;
struct task* next;
} task_t;
// In assembly
extern void context_switch(registers_t* old, registers_t* new);
Phase 4: Scheduler
// scheduler.c
task_t* current_task = NULL;
task_t* ready_queue = NULL;
void schedule() {
if (!current_task) return;
// Save current task state (done in interrupt handler)
// Move current to end of queue if still runnable
if (current_task->state == TASK_RUNNING) {
current_task->state = TASK_READY;
enqueue(&ready_queue, current_task);
}
// Get next task
task_t* next = dequeue(&ready_queue);
if (!next) {
next = idle_task; // Never block completely
}
next->state = TASK_RUNNING;
current_task = next;
// Context switch happens on return from interrupt
}
Phase 5: Kernel Semaphore
// sync.c
typedef struct {
int count;
task_t* wait_queue;
spinlock_t lock;
} ksemaphore_t;
void ksem_wait(ksemaphore_t* sem) {
spinlock_acquire(&sem->lock);
while (sem->count <= 0) {
// Block current task
current_task->state = TASK_BLOCKED;
enqueue(&sem->wait_queue, current_task);
spinlock_release(&sem->lock);
schedule(); // Switch to another task
spinlock_acquire(&sem->lock);
}
sem->count--;
spinlock_release(&sem->lock);
}
void ksem_signal(ksemaphore_t* sem) {
spinlock_acquire(&sem->lock);
sem->count++;
if (sem->wait_queue) {
task_t* task = dequeue(&sem->wait_queue);
task->state = TASK_READY;
enqueue(&ready_queue, task);
}
spinlock_release(&sem->lock);
}
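Kernel Spinlock (sketch)
Phase 5 uses spinlock_t without defining it. In kernel context the lock must also mask interrupts, otherwise the timer interrupt can fire while the lock is held and the scheduler can deadlock on it. A minimal sketch (the cli/sti wrappers are assumptions; a real kernel would save and restore the previous interrupt flag instead of an unconditional sti):
// spinlock.c
typedef struct {
    volatile int locked;
} spinlock_t;

static inline void cli(void) { __asm__ volatile ("cli"); }  // mask interrupts
static inline void sti(void) { __asm__ volatile ("sti"); }  // unmask interrupts

void spinlock_acquire(spinlock_t* l) {
    cli();  // no timer preemption while holding a kernel lock
    while (__sync_lock_test_and_set(&l->locked, 1)) {
        // spin: only ever reached on SMP; with interrupts off, a
        // uniprocessor can never find the lock held here
    }
}

void spinlock_release(spinlock_t* l) {
    __sync_lock_release(&l->locked);
    sti();
}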
Build & Run
# Makefile
CC = i686-elf-gcc
AS = nasm
LD = i686-elf-ld
QEMU = qemu-system-i386
CFLAGS = -ffreestanding -O2 -Wall -Wextra
LDFLAGS = -T linker.ld -nostdlib
all: os.img
boot.bin: boot.asm
$(AS) -f bin $< -o $@
kernel.o: kernel.c
$(CC) $(CFLAGS) -c $< -o $@
kernel.bin: kernel.o
$(LD) $(LDFLAGS) $^ -o $@
os.img: boot.bin kernel.bin
cat $^ > $@
truncate -s 1440k $@
run: os.img
$(QEMU) -fda $< -monitor stdio
clean:
rm -f *.bin *.o *.img
Essential Resources Summary
Books (Priority Order)
- Operating Systems: Three Easy Pieces - FREE online, the best OS book
- Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron - Memory and low-level understanding
- The Linux Programming Interface by Michael Kerrisk - The bible for Linux system programming
- Rust Atomics and Locks by Mara Bos - Excellent even for C programmers on atomics/concurrency
- C++ Concurrency in Action by Anthony Williams - Best book on practical concurrency patterns
- Linux Kernel Development by Robert Love - When you’re ready for real kernel code
Online Tutorials
- Green threads explained (c9x.me) - For Project 4
- Threads, Mutexes and Concurrent Programming in C - For Project 2
- Barrgroup: Mutexes and Semaphores Demystified - Historical context
Papers (Advanced)
- “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson
- “What is RCU, Fundamentally?” - Paul McKenney (LWN article series)
Key Conceptual Distinctions
| Question | Answer |
|---|---|
| Thread vs Process? | Threads share memory, processes are isolated |
| Concurrency vs Parallelism? | Concurrency = structure (dealing with many things), Parallelism = execution (doing many things simultaneously) |
| Mutex vs Semaphore? | Mutex = locking (ownership), Semaphore = signaling (counting) |
| Kernel vs User thread? | Kernel thread = OS schedules, User thread = your code schedules |
| Spinlock vs Mutex? | Spinlock = busy-wait (wastes CPU), Mutex = sleep (efficient) |
| Green thread vs Coroutine? | Green thread = scheduled by runtime, Coroutine = explicit yield by programmer |
| Race condition vs Deadlock? | Race = wrong order causes wrong result, Deadlock = circular wait causes freeze |
| Preemptive vs Cooperative? | Preemptive = OS forces switch (timer), Cooperative = task chooses when to yield |
Appendix: Difficulty & Coolness Scales
Difficulty Levels
| Level | Name | Description |
|---|---|---|
| 1 | Beginner | Basic syntax, straightforward logic, single-purpose tools |
| 2 | Intermediate | State management, async operations, third-party libraries |
| 3 | Advanced | Concurrency, memory management, complex architectures |
| 4 | Expert | Kernel-level, assembly, complex mathematical models |
| 5 | Master | Self-referential systems, cryptography, bootstrapping |
Coolness Levels
| Level | Vibe | Description |
|---|---|---|
| 1 | “This looks like work” | Vital for career but zero flair |
| 2 | “Neat, I guess” | Standard hobbyist projects |
| 3 | “Hey, that’s pretty smart” | Mini versions of famous tools |
| 4 | “Woah, you did that yourself?” | Assembly, kernel modules, bare-metal |
| 5 | “Shut up and take my upvote!” | Visually stunning or technically impossible-looking |
Business Potential
| Level | Model | Description |
|---|---|---|
| 1 | Resume Gold | Gets you a $200k+ job |
| 2 | Micro-SaaS | $5-10/month subscriptions |
| 3 | Service & Support | $500-2000/month retainers |
| 4 | Open Core | Free engine, paid control plane |
| 5 | Industry Disruptor | VC-backable platform play |