Learning Concurrency, Parallelism, Threads & Synchronization
Goal: Deeply understand how threads, processes, semaphores, and all concurrency primitives work at the operating system level through hands-on projects.
Why Concurrency Matters (Big Picture)
Concurrency is how modern software stays responsive, handles load, and uses hardware efficiently. From web servers handling thousands of clients to audio engines processing streams in real time, concurrency is the difference between a system that scales and one that stalls.
The trade-off: shared memory makes communication fast, but it also makes bugs subtle. The same performance features that make concurrency powerful (multiple threads touching the same data) are exactly what cause race conditions, deadlocks, and heisenbugs.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
- Solid C fundamentals (pointers, structs, function pointers)
- Familiarity with the Unix command line
- Basic mental model of how a program is compiled and run
Helpful but Not Required
- Prior exposure to operating systems concepts
- Basic understanding of CPU caches and memory hierarchy
Self-Assessment Questions
Before starting, can you answer these?
- What is the difference between a process and a thread?
- What does it mean for memory to be shared?
- Why can `counter++` be unsafe across threads?
- When should you use a mutex vs a semaphore?
If any of these are unclear, skim OSTEP Ch. 5 and 26 first.
Development Environment Setup
- Compiler: `gcc` or `clang` with `-pthread`
- OS: Linux or macOS (pthreads support)
- Debugging tools: `gdb`/`lldb`, `strace` (Linux), `perf` (optional)
- Race detection: ThreadSanitizer (`-fsanitize=thread`) when available
Core Concept Overview
Before diving into projects, here’s the mental map of what you’re learning:
| Concept | What It Is | Why It’s Confusing |
|---|---|---|
| Process | Independent program with its own memory space | People conflate it with “program” or “thread” |
| Thread | Execution unit within a process, shares memory | There are multiple types of threads |
| Kernel Thread | Thread managed by the OS kernel (1:1 model) | The OS does the scheduling |
| Green Thread/User Thread | Thread managed in user-space (N:1 model) | Your code does the scheduling |
| Concurrency | Dealing with multiple things at once (structure) | Often confused with parallelism |
| Parallelism | Doing multiple things at once (execution) | Requires multiple CPU cores |
| Semaphore | Counter that controls access to resources | It’s not just a mutex with a count |
| Mutex | Lock that ensures only one thread enters a critical section | Often mistaken for a binary semaphore, but a mutex has an owner |
The key insight: Threads share memory within a process. This sharing is both the power (fast communication) and the danger (race conditions, deadlocks).
Core Concept Analysis (Mental Models)
1) Race Conditions = Non-Deterministic Interleavings
Thread A: read counter (0) Thread B: read counter (0)
Thread A: add 1 -> 1 Thread B: add 1 -> 1
Thread A: write counter = 1 Thread B: write counter = 1
Expected: 2, Actual: 1 (lost update)
Mental model: Threads run in tiny slices; the OS can switch between them at any instruction. If two threads touch the same data without coordination, the ordering becomes unpredictable.
2) Mutex vs Semaphore (What They Really Do)
Mutex (binary):
lock() -> if locked, wait
unlock()-> release, wake one waiter
Semaphore (counting):
wait() -> if count == 0, wait
post() -> count++, wake one waiter
Mental model: A mutex protects a critical section. A semaphore controls resource availability.
3) Deadlocks Are Cycles in Resource Graphs
T1 holds A, waits for B
T2 holds B, waits for A
T1 -> B
^ |
| v
A <- T2 (cycle = deadlock)
Mental model: If you can draw a cycle in the “who holds what” graph, you can freeze the system.
4) Condition Variables = “Sleep Until State Changes”
Threads should sleep while waiting for a condition instead of spinning. pthread_cond_wait() atomically releases a mutex and sleeps, then re-locks on wake.
Reference: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_wait.html
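A minimal sketch of the canonical pattern, assuming a `ready` flag guarded by a mutex (the waiter loops on the predicate; the signaler changes state before signaling):
#include <pthread.h>
#include <stdbool.h>
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
bool ready = false;
void wait_until_ready(void) {
    pthread_mutex_lock(&lock);
    while (!ready) {                      // loop: wakeups can be spurious
        pthread_cond_wait(&cond, &lock);  // atomically unlocks, sleeps, re-locks
    }
    pthread_mutex_unlock(&lock);
}
void mark_ready(void) {
    pthread_mutex_lock(&lock);
    ready = true;                         // change state under the lock...
    pthread_cond_signal(&cond);           // ...then wake one waiter
    pthread_mutex_unlock(&lock);
}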
5) Atomics & Memory Ordering Matter
Atomic operations (C11 `stdatomic.h`) guarantee visibility across cores and let you specify the memory ordering.
Reference: https://en.cppreference.com/w/c/atomic
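For example, a race-free counter needs nothing more than an atomic read-modify-write; a minimal sketch:
#include <stdatomic.h>
atomic_long counter = 0;
void safe_increment(void) {
    // One indivisible read-modify-write: no lost updates, and the
    // result is visible to every core that loads the counter.
    atomic_fetch_add(&counter, 1);
}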
Concept Summary Table (What You Must Internalize)
| Concept | Why It Matters | Where It Shows Up |
|---|---|---|
| Race conditions | Explain “impossible” bugs | Projects 2, 7 |
| Mutex vs semaphore | Know when to block vs serialize | Projects 2, 3, 5 |
| Condition variables | Build efficient producer/consumer | Projects 2, 6 |
| Deadlock conditions | Prevent system freeze | Projects 5, 8 |
| Atomics & memory model | Correct lock-free code | Projects 3, 6 |
| Scheduling | Understand fairness/latency | Projects 4, 8 |
Deep Dive Reading by Concept
| Topic | Book & Chapter |
|---|---|
| Threads & scheduling | Operating Systems: Three Easy Pieces Ch. 26 |
| Concurrency in C | The Linux Programming Interface Ch. 29-30 |
| Memory model & atomics | Rust Atomics and Locks (concepts apply to C) |
| Synchronization primitives | C++ Concurrency in Action Ch. 3-5 |
| OS scheduling | Operating Systems: Three Easy Pieces Ch. 7-9 |
Quick Start Guide (First 48 Hours)
Hour 1-4: Build the mental model
- Read OSTEP Ch. 26 (Concurrency).
- Run Project 1 (process fork demo).
Hour 5-10: See a race condition
- Build Project 2 without locks.
- Watch the counter break.
- Add a mutex and see it stabilize.
Hour 11-24: Learn primitives
- Skim pthreads API docs: `pthread_create`, `pthread_mutex_lock`, `pthread_cond_wait`.
- Read about semaphores and `sem_init`.
References:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_create.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_init.html
Hour 25-48: Build a simple thread pool
- Use Project 6’s skeleton.
- Measure throughput with and without work stealing.
Learning Path
Phase 1: Foundations (2-3 weeks)
- Project 1: Process Forker & IPC Visualizer
- Project 2: Multi-threaded Task Executor
Phase 2: Deep Dive (3-4 weeks)
- Project 3: Build Your Own Mutex & Semaphore
- Project 5: Dining Philosophers Visualizer
Phase 3: Mastery (4-6 weeks)
- Project 4: Green Thread Library
- Project 6: Thread Pool with Work Stealing
- Project 7: Concurrent Hash Map
Phase 4: Capstone
- Project 8: Mini Operating System Scheduler
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Process Forker & IPC | ⭐⭐ | Weekend | Foundational | ⭐⭐⭐ |
| Multi-threaded Executor | ⭐⭐⭐ | 1 week | Core threading | ⭐⭐⭐⭐ |
| Build Mutex & Semaphore | ⭐⭐⭐⭐ | 1-2 weeks | Deep primitives | ⭐⭐⭐⭐⭐ |
| Green Thread Library | ⭐⭐⭐⭐⭐ | 2-3 weeks | Expert-level | ⭐⭐⭐⭐⭐ |
| Dining Philosophers Viz | ⭐⭐⭐ | 1 week | Deadlock mastery | ⭐⭐⭐⭐⭐ |
| Thread Pool + Work Stealing | ⭐⭐⭐⭐ | 2-3 weeks | Production patterns | ⭐⭐⭐⭐ |
| Concurrent Hash Map | ⭐⭐⭐⭐ | 2-3 weeks | Data structure concurrency | ⭐⭐⭐⭐ |
| Mini OS Scheduler | ⭐⭐⭐⭐⭐ | 1-2 months | Complete mastery | ⭐⭐⭐⭐⭐ |
Project 1: “Process Forker & IPC Visualizer” — Unix Process Memory Isolation Demo
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go, Python |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Operating Systems / Process Management |
| Software or Tool | Unix fork/exec syscalls |
| Main Book | Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau |
What you’ll build
A visual tool that forks child processes, displays their PIDs, shows memory isolation (same variable address, different values), and demonstrates IPC via pipes and shared memory.
Why it teaches processes
You can’t understand threads until you understand what they’re not. A process has its own memory space—when you fork, the child gets a copy. By printing the same variable’s address and value in parent/child, you’ll see “same address, different value” which clicks the concept of virtual memory isolation.
Core challenges you’ll face
- Fork semantics (child gets copy of memory, returns different PID) → maps to process isolation
- Pipe communication (why can’t parent just write to child’s variable?) → maps to IPC necessity
- Shared memory setup (`shmget`/`mmap`) → maps to explicit memory sharing
- Zombie processes (forgetting to `wait()`) → maps to process lifecycle
Key Concepts
| Concept | Resource |
|---|---|
| Process Creation | Operating Systems: Three Easy Pieces Chapter 5 - OSTEP |
| Virtual Memory Basics | Computer Systems: A Programmer’s Perspective Chapter 9 - Bryant & O’Hallaron |
| IPC Mechanisms | The Linux Programming Interface Chapters 44-49 - Michael Kerrisk |
Difficulty & Time
- Difficulty: Beginner-Intermediate
- Time estimate: Weekend
- Prerequisites: Basic C, comfort with command line
Real World Outcome
$ ./process_viz
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0
[Forking...]
[Child PID: 1235] Variable 'counter' at 0x7fff5678 = 0 # Same address!
[Child] Incrementing counter...
[Child PID: 1235] Variable 'counter' at 0x7fff5678 = 100
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0 # Still 0! Isolated!
[Demo: Pipe IPC]
Parent sends: "Hello from parent"
Child receives: "Hello from parent"
[Demo: Shared Memory]
Parent writes 42 to shared memory
Child reads 42 from shared memory
Parent sees child wrote 99 back
The Core Question You’re Answering
How does a process get a private memory space, and what exactly is shared (or not) after `fork()`?
Concepts You Must Understand First
- `fork()` vs `exec()` vs `wait()` semantics
- Copy-on-write and virtual memory isolation
- Pipes and shared memory (`mmap`)
- File descriptor inheritance
Questions to Guide Your Design
- How will you prove two processes see the same virtual address but different values?
- How will you ensure parent/child output order is deterministic?
- When should you use pipes vs shared memory?
Thinking Exercise
Draw the parent and child memory layout after fork(). Mark which pages are shared (copy-on-write) and which are unique after the child writes to a variable.
The Interview Questions They’ll Ask
- Why does `fork()` return twice?
- What is the difference between `fork()` and `exec()`?
- Why do you need `wait()`? What happens if you omit it?
- How does a pipe differ from shared memory?
Hints in Layers
- Fork once and print PID + variable address/value.
- Increment in child and print again.
- Add a pipe for a parent→child message.
- Add `mmap` shared memory and demonstrate both directions.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Processes & fork | Operating Systems: Three Easy Pieces | Ch. 5 |
| Virtual memory | Computer Systems: A Programmer’s Perspective | Ch. 9 |
| IPC | The Linux Programming Interface | Ch. 44-49 |
Common Pitfalls & Debugging
Problem: “Parent output appears twice”
- Why: The stdout buffer is duplicated across `fork()`.
- Fix: `fflush(stdout)` before forking.
Problem: “Child becomes zombie”
- Why: Parent never calls `wait()`.
- Fix: Add `waitpid()` in parent.
Learning Milestones
- Fork works, child has different PID → You understand process creation
- Same address, different values after fork → You understand memory isolation
- Pipes working → You understand why IPC is necessary
- Shared memory without race conditions → You’re ready for threads
Implementation Hints
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>
int main() {
int counter = 0;
printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
pid_t pid = fork();
if (pid == 0) {
// Child process
printf("[Child PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
counter = 100; // write to the child's copy; the parent's copy is untouched
printf("[Child] After write: counter = %d\n", counter);
exit(0);
} else {
// Parent process
wait(NULL); // Wait for child to finish
printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
getpid(), (void*)&counter, counter);
// counter is still 0! Memory is isolated!
}
return 0;
}
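For the pipe and shared-memory demos in the expected output, a minimal sketch along the same lines (Linux/macOS; error handling omitted):
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    // Map shared memory BEFORE fork so both processes inherit the mapping
    int* shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int fds[2];
    pipe(fds);  // fds[0] = read end, fds[1] = write end
    if (fork() == 0) {  // child
        char buf[64];
        read(fds[0], buf, sizeof(buf));
        printf("Child receives: \"%s\"\n", buf);
        *shared = 99;   // visible to the parent, unlike a plain variable
        _exit(0);
    }
    const char* msg = "Hello from parent";
    write(fds[1], msg, strlen(msg) + 1);
    wait(NULL);
    printf("Parent sees child wrote %d back\n", *shared);
    return 0;
}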
Key System Calls to Master
| System Call | Purpose |
|---|---|
| `fork()` | Create a child process (copy of parent) |
| `exec*()` | Replace the current process image with a new program |
| `wait()` / `waitpid()` | Wait for a child to terminate |
| `pipe()` | Create a unidirectional communication channel |
| `mmap()` | Create a shared memory region |
| `shmget()` / `shmat()` | System V shared memory |
Project 2: “Multi-threaded Task Executor” — pthreads Race Condition Explorer
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++, Go |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 2. The “Micro-SaaS / Pro Tool” |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Concurrency / Thread Management |
| Software or Tool | POSIX Threads (pthreads) |
| Main Book | The Linux Programming Interface by Michael Kerrisk |
What you’ll build
A command-line task executor that spawns N threads, each pulling tasks from a shared queue, with visible race conditions when you forget to lock, and proper synchronization when you add mutexes.
Why it teaches kernel threads
This is where threads become real. You’ll see that threads share memory by default (unlike processes). You’ll intentionally create race conditions, watch the chaos, then fix them. This visceral experience of “broken → fixed” is the best teacher.
Core challenges you’ll face
- Race conditions (counter increments “lose” updates) → maps to critical sections
- Mutex usage (lock before access, unlock after) → maps to mutual exclusion
- Thread joining (waiting for all threads to finish) → maps to thread lifecycle
- Shared queue access (producer-consumer pattern) → maps to concurrent data structures
Resources for Key Challenges
- POSIX thread creation: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_create.html
- POSIX mutex lock/unlock: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html
Key Concepts
| Concept | Resource |
|---|---|
| Thread Creation & Joining | The Linux Programming Interface Chapter 29 - Kerrisk |
| Race Conditions | Operating Systems: Three Easy Pieces Chapter 26 - OSTEP |
| Mutexes | C++ Concurrency in Action Chapter 3 - Anthony Williams |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Project 1 completed, basic C
Real World Outcome
# First run: NO LOCKS (intentional race condition)
$ ./task_executor --threads 4 --tasks 10000 --no-lock
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 0 # OOPS! Same task!
Thread 2: processing task 1
...
Expected completed: 10000
Actual completed: 9847 # RACE CONDITION! Lost updates!
# Second run: WITH LOCKS
$ ./task_executor --threads 4 --tasks 10000
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 1
Thread 2: processing task 2
...
Expected completed: 10000
Actual completed: 10000 # CORRECT!
Total time: 0.23s (parallel speedup!)
The Core Question You’re Answering
How do you coordinate multiple threads safely around shared state without destroying performance?
Concepts You Must Understand First
- POSIX thread lifecycle (`pthread_create`, `pthread_join`)
- Mutex locking discipline (`pthread_mutex_lock`)
- Condition variables for blocking on an empty queue
- False sharing and cache effects
Questions to Guide Your Design
- How will you structure the task queue to avoid races?
- When is it safe to release the lock?
- How will you measure speedup vs overhead?
Thinking Exercise
Assume 4 threads increment a shared counter 1,000,000 times each. What are the best and worst possible outcomes without a mutex?
The Interview Questions They’ll Ask
- What is a data race?
- Why is `counter++` not atomic?
- What is the difference between a mutex and a spinlock?
- When do condition variables outperform busy-waiting?
Hints in Layers
- Start with a shared queue and no locks to observe failure.
- Add a mutex around enqueue/dequeue.
- Add a condition variable to sleep when the queue is empty.
- Measure throughput with different thread counts.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Threads | The Linux Programming Interface | Ch. 29 |
| Concurrency | Operating Systems: Three Easy Pieces | Ch. 26 |
| Mutexes | C++ Concurrency in Action | Ch. 3 |
Common Pitfalls & Debugging
Problem: “Throughput gets worse with more threads”
- Why: Lock contention or false sharing.
- Fix: Reduce critical section size, pad shared counters.
Problem: “Queue is empty but threads spin”
- Why: Missing condition variable or signal.
- Fix: Use `pthread_cond_wait`/`pthread_cond_signal`.
Learning Milestones
- Threads share memory (see same variable) → You understand thread vs process
- Race condition breaks count → You understand critical sections
- Mutex fixes the race → You understand synchronization
- Parallel speedup measurable → You understand concurrency vs parallelism
Implementation Hints
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for strcmp() on argv
#include <pthread.h>
#define NUM_THREADS 4
#define NUM_INCREMENTS 1000000
long shared_counter = 0;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
int use_mutex = 1;
void* increment_counter(void* arg) {
int thread_id = *(int*)arg;
for (int i = 0; i < NUM_INCREMENTS; i++) {
if (use_mutex) {
pthread_mutex_lock(&counter_mutex);
shared_counter++;
pthread_mutex_unlock(&counter_mutex);
} else {
shared_counter++; // Race condition!
}
}
printf("Thread %d finished\n", thread_id);
return NULL;
}
int main(int argc, char* argv[]) {
if (argc > 1 && strcmp(argv[1], "--no-lock") == 0) {
use_mutex = 0;
printf("Running WITHOUT mutex (expect race condition)\n");
} else {
printf("Running WITH mutex\n");
}
pthread_t threads[NUM_THREADS];
int thread_ids[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
thread_ids[i] = i;
pthread_create(&threads[i], NULL, increment_counter, &thread_ids[i]);
}
for (int i = 0; i < NUM_THREADS; i++) {
pthread_join(threads[i], NULL);
}
long expected = (long)NUM_THREADS * NUM_INCREMENTS;
printf("Expected: %ld\n", expected);
printf("Actual: %ld\n", shared_counter);
printf("Lost: %ld\n", expected - shared_counter);
return 0;
}
Compile with: gcc -pthread -o task_executor task_executor.c
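The hints above mention a condition variable for the empty-queue case; here is a minimal sketch of that pattern (fixed-size ring buffer; `task_t` is a hypothetical task type, and the full-queue case is ignored for brevity):
#include <pthread.h>
#define QUEUE_CAP 128
typedef struct { void (*run)(void*); void* arg; } task_t;  // hypothetical
typedef struct {
    task_t items[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} task_queue_t;
void queue_push(task_queue_t* q, task_t t) {
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = t;               // assumes the queue is not full
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);  // wake one sleeping worker
    pthread_mutex_unlock(&q->lock);
}
task_t queue_pop(task_queue_t* q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) {              // loop: wakeups can be spurious
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
    task_t t = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return t;
}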
Key pthread Functions to Master
| Function | Purpose |
|---|---|
| `pthread_create()` | Create a new thread |
| `pthread_join()` | Wait for a thread to terminate |
| `pthread_mutex_init()` | Initialize a mutex |
| `pthread_mutex_lock()` | Acquire the lock (blocks if held) |
| `pthread_mutex_unlock()` | Release the lock |
| `pthread_mutex_destroy()` | Clean up mutex resources |
Project 3: “Build Your Own Mutex & Semaphore” — Atomic CAS and Futex Implementation
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++ |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 4: Expert |
| Knowledge Area | Synchronization Primitives / Atomics |
| Software or Tool | Atomic operations, futex |
| Main Book | The Linux Programming Interface by Michael Kerrisk |
What you’ll build
Your own mutex and semaphore from scratch using atomic compare-and-swap (CAS) operations, spinlocks, and optionally the Linux futex syscall for sleeping.
Why it teaches synchronization deeply
Most people use pthread_mutex_lock() as a black box. By building your own, you’ll understand:
- What is a spinlock?
- Why does spinning waste CPU?
- What is a futex and why does it exist?
- What’s the difference between a mutex and a semaphore at the implementation level?
Core challenges you’ll face
- Atomic operations (why a normal `counter++` isn’t thread-safe) → maps to memory model
- Spinlock implementation (busy-wait CAS loop) → maps to lock-free programming
- Futex for sleeping (avoid wasting CPU spinning) → maps to kernel integration
- Semaphore counting (allow N concurrent accessors) → maps to resource pools
Resources for Key Challenges
- POSIX semaphores (`sem_init`): https://man7.org/linux/man-pages/man3/sem_init.3.html
- Linux futex syscall: https://man7.org/linux/man-pages/man2/futex.2.html
- C11 atomics overview: https://en.cppreference.com/w/c/atomic
Key Concepts
| Concept | Resource |
|---|---|
| Atomic Operations | C++ Concurrency in Action Chapter 5 - Anthony Williams |
| Futex Deep Dive | The Linux Programming Interface Chapter 30 - Kerrisk |
| Memory Barriers | Rust Atomics and Locks by Mara Bos (excellent even for C programmers) |
Difficulty & Time
- Difficulty: Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Projects 1-2 completed, understanding of assembly basics
Real World Outcome
$ ./my_sync_test
=== Testing MY_MUTEX (spinlock-based) ===
Spawning 8 threads, each incrementing counter 100000 times
No mutex: counter = 634982 (WRONG - race condition)
pthread_mutex: counter = 800000 (correct)
my_mutex: counter = 800000 (correct!) ✓
=== Testing MY_SEMAPHORE ===
Semaphore initialized with count = 3
Thread 0: acquired (active: 1)
Thread 1: acquired (active: 2)
Thread 2: acquired (active: 3)
Thread 3: waiting... (sem count = 0)
Thread 0: released
Thread 3: acquired (active: 3)
...
All threads completed, max concurrent = 3 ✓
=== Spinlock vs Futex comparison ===
Spinlock (high contention): 2.3s, CPU 780%
Futex-based (high contention): 0.4s, CPU 95%
Conclusion: Futex avoids wasting CPU when waiting!
The Core Question You’re Answering
How do you build synchronization primitives from atomic instructions and avoid wasting CPU under contention?
Concepts You Must Understand First
- Compare-and-swap (CAS) and atomic memory ordering
- Spinlocks vs blocking locks
- Futex wait/wake model (user space fast path, kernel slow path)
- Counting semaphore semantics
Questions to Guide Your Design
- When should a lock spin vs sleep?
- How do you prevent starvation?
- What invariants must a semaphore maintain?
Thinking Exercise
Estimate how much CPU time a spinlock wastes if 8 threads compete on a 4-core CPU and each holds the lock for 2 ms.
The Interview Questions They’ll Ask
- What is a futex and why does it exist?
- How does a semaphore differ from a mutex?
- What is the ABA problem and how can it appear in lock-free code?
- Why do memory barriers matter for CAS loops?
Hints in Layers
- Implement a basic spinlock with C11 atomics.
- Add a contention benchmark and measure CPU usage.
- Replace spin-wait with futex-based sleeping.
- Implement a counting semaphore and test max concurrency.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Atomics | Rust Atomics and Locks | Ch. 1-4 |
| Concurrency primitives | The Linux Programming Interface | Ch. 30 |
| Memory model | Effective C | Ch. 8 |
Common Pitfalls & Debugging
Problem: “Lock sometimes lets two threads in”
- Why: Missing atomic compare-and-swap.
- Fix: Use `atomic_compare_exchange_*` with the correct memory order.
Problem: “Semaphore count goes negative”
- Why: Non-atomic decrement or missing wake logic.
- Fix: Protect count with atomic ops and proper wait queue.
Learning Milestones
- Spinlock works → You understand CAS and atomic operations
- You see ~800% CPU usage with a spinlock → You understand why sleeping matters
- Futex reduces CPU usage → You understand kernel/user-space cooperation
- Semaphore limits concurrency to N → You understand counting semaphores
Implementation Hints
Spinlock Mutex
#include <stdatomic.h>
#include <stdbool.h>
typedef struct {
atomic_flag locked;
} my_spinlock_t;
void my_spinlock_init(my_spinlock_t* lock) {
atomic_flag_clear(&lock->locked);
}
void my_spinlock_lock(my_spinlock_t* lock) {
// Spin until we acquire the lock
while (atomic_flag_test_and_set(&lock->locked)) {
// Busy-wait (wastes CPU!)
// Could add: __builtin_ia32_pause() for x86
}
}
void my_spinlock_unlock(my_spinlock_t* lock) {
atomic_flag_clear(&lock->locked);
}
Futex-based Mutex (Linux)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h> // for syscall()
#include <stdatomic.h>
typedef struct {
atomic_int state; // 0 = unlocked, 1 = locked
} my_futex_mutex_t;
void my_futex_lock(my_futex_mutex_t* m) {
int expected = 0;
while (!atomic_compare_exchange_weak(&m->state, &expected, 1)) {
// Failed to acquire - sleep until state changes
syscall(SYS_futex, &m->state, FUTEX_WAIT, 1, NULL, NULL, 0);
expected = 0; // Reset for next try
}
}
void my_futex_unlock(my_futex_mutex_t* m) {
atomic_store(&m->state, 0);
// Wake one waiting thread. (A production mutex tracks whether anyone is
// waiting so the uncontended unlock can skip this syscall.)
syscall(SYS_futex, &m->state, FUTEX_WAKE, 1, NULL, NULL, 0);
}
Counting Semaphore
typedef struct {
    int count;               // protected by mutex, so a plain int suffices
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} my_semaphore_t;
void my_sem_init(my_semaphore_t* sem, int initial) {
    sem->count = initial;
    pthread_mutex_init(&sem->mutex, NULL);
    pthread_cond_init(&sem->cond, NULL);
}
void my_sem_wait(my_semaphore_t* sem) {
    pthread_mutex_lock(&sem->mutex);
    while (sem->count <= 0) {
        pthread_cond_wait(&sem->cond, &sem->mutex);
    }
    sem->count--;
    pthread_mutex_unlock(&sem->mutex);
}
void my_sem_post(my_semaphore_t* sem) {
    pthread_mutex_lock(&sem->mutex);
    sem->count++;
    pthread_cond_signal(&sem->cond);
    pthread_mutex_unlock(&sem->mutex);
}
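A quick usage sketch matching the demo output above (N threads, at most 3 inside at once; `do_limited_work` is a hypothetical placeholder):
my_semaphore_t sem;  // in main: my_sem_init(&sem, 3);
void* worker(void* arg) {
    my_sem_wait(&sem);   // blocks once 3 threads are already inside
    do_limited_work();   // at most 3 threads execute this concurrently
    my_sem_post(&sem);   // let the next waiter in
    return NULL;
}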
Key Atomic Operations to Master
| Operation | Purpose |
|---|---|
| `atomic_flag_test_and_set()` | Atomically set the flag and return its old value |
| `atomic_compare_exchange_weak()` | CAS - the building block of lock-free code |
| `atomic_fetch_add()` | Atomically add and return the old value |
| `atomic_load()` / `atomic_store()` | Thread-safe read/write |
Project 4: “Green Thread Library” — User-Space Scheduler with Context Switching
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Zig |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 5: Master |
| Knowledge Area | Runtime Systems / Cooperative Scheduling |
| Software or Tool | ucontext.h, setjmp/longjmp, or raw assembly |
| Main Book | Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau |
What you’ll build
A complete green thread (user-space thread) library that manages its own stack allocations, implements cooperative scheduling (yield), and multiplexes many green threads onto a single OS thread.
Why it teaches threading deeply
This is where you become the operating system. You’ll allocate stacks, save/restore registers, and implement yield(). After this project, you’ll understand exactly what Go’s goroutines and Rust’s async/await are doing under the hood.
Core challenges you’ll face
- Stack allocation (each green thread needs its own stack) → maps to memory layout
- Context switching (save registers, swap stack pointer) → maps to CPU state
- Scheduler implementation (round-robin, run queue) → maps to scheduling algorithms
- Cooperative vs preemptive (explicit yield vs timer interrupt) → maps to scheduling models
Resources for Key Challenges
- Green threads explained (c9x.me) - Hands-on implementation guide for x86-64
- Wikipedia: Green thread - Conceptual overview
Key Concepts
| Concept | Resource |
|---|---|
| Context Switching | Operating Systems: Three Easy Pieces Chapter 6 - OSTEP |
| Stack Management | Computer Systems: A Programmer’s Perspective Chapter 3 - Bryant & O’Hallaron |
| ucontext API | man pages + GNU C Library docs |
Difficulty & Time
- Difficulty: Advanced-Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-3 completed, some assembly knowledge
Real World Outcome
$ ./green_threads_demo
=== Green Thread Library Demo ===
Creating 1000 green threads on 1 OS thread...
Green thread 0: Hello! (stack at 0x7f1234560000)
Green thread 0: Yielding...
Green thread 1: Hello! (stack at 0x7f1234570000)
Green thread 1: Yielding...
Green thread 2: Hello! (stack at 0x7f1234580000)
...
Green thread 0: Resumed!
Green thread 0: Done!
...
Stats:
- Total green threads created: 1000
- OS threads used: 1
- Context switches performed: 5000
- Average switch time: 45 nanoseconds
For comparison:
- 1000 OS threads would use ~8GB stack memory
- 1000 green threads used ~8MB stack memory
The Core Question You’re Answering
How do you schedule thousands of user-space tasks without relying on the kernel for every context switch?
Concepts You Must Understand First
- Stack allocation per green thread
- Context switching (`setjmp`/`longjmp` or `ucontext`)
- Ready/blocked queues
- Cooperative vs preemptive scheduling
Questions to Guide Your Design
- How will a thread yield control?
- What happens when a green thread makes a blocking syscall?
- How will you detect stack overflow?
Thinking Exercise
Design a tiny state machine for a thread: NEW → READY → RUNNING → BLOCKED → READY → DONE. Where do you save registers?
The Interview Questions They’ll Ask
- What is the difference between green threads and kernel threads?
- Why can green threads be faster but less safe?
- What happens if a green thread blocks in a syscall?
- How would you add a timer-based preemption?
Hints in Layers
- Use `ucontext` to switch stacks.
- Implement round-robin scheduling across a run queue.
- Add `yield()` and `sleep()` APIs.
- Add per-thread stacks and guard pages.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Context switching | Computer Systems: A Programmer’s Perspective | Ch. 8 |
| Scheduling | Operating Systems: Three Easy Pieces | Ch. 7-9 |
| Low-level C | Bare Metal C | Ch. 3-4 |
Common Pitfalls & Debugging
Problem: “Threads crash randomly”
- Why: Stack overflow or misaligned stack.
- Fix: Add guard pages and align stacks to 16 bytes.
Problem: “All threads freeze”
- Why: One blocking syscall stalls the whole scheduler.
- Fix: Replace blocking calls with non-blocking I/O or run in a kernel thread.
Learning Milestones
- Single green thread runs to completion → You understand stack allocation
- Two threads yield to each other → You understand context switching
- Round-robin scheduler works → You understand scheduling
- 1000 threads work smoothly → You understand why green threads scale
Implementation Hints
Using ucontext.h (simpler approach)
#include <ucontext.h>
#include <stdio.h>
#define STACK_SIZE 16384
#define MAX_THREADS 64
typedef struct {
    ucontext_t context;
    char stack[STACK_SIZE];
    int id;
    int finished;
} green_thread_t;
green_thread_t threads[MAX_THREADS];
int current_thread = 0;
int num_threads = 0;
ucontext_t main_context;
void gt_yield() {
    int prev = current_thread;
    // Find the next runnable thread (round-robin)
    do {
        current_thread = (current_thread + 1) % num_threads;
    } while (threads[current_thread].finished && current_thread != prev);
    if (current_thread != prev) {
        swapcontext(&threads[prev].context, &threads[current_thread].context);
    }
}
// makecontext passes int-sized arguments, so each thread receives its small
// integer id (passing a pointer through the varargs is not portable).
void gt_create(void (*func)(int)) {
    green_thread_t* t = &threads[num_threads];
    t->id = num_threads;
    t->finished = 0;
    getcontext(&t->context);
    t->context.uc_stack.ss_sp = t->stack;
    t->context.uc_stack.ss_size = STACK_SIZE;
    t->context.uc_link = &main_context;  // return to main when func returns
    makecontext(&t->context, (void (*)(void))func, 1, t->id);
    num_threads++;
}
void thread_func(int id) {
    for (int i = 0; i < 3; i++) {
        printf("Thread %d: iteration %d\n", id, i);
        gt_yield();
    }
    printf("Thread %d: done!\n", id);
    threads[id].finished = 1;
}
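A minimal driver for the skeleton above: create a few threads, then keep resuming unfinished ones from main until all are done.
int main(void) {
    for (int i = 0; i < 3; i++) {
        gt_create(thread_func);
    }
    int remaining = num_threads;
    while (remaining > 0) {
        remaining = 0;
        for (int i = 0; i < num_threads; i++) {
            if (!threads[i].finished) {
                remaining++;
                current_thread = i;
                // Resume thread i. Control returns to main when some thread
                // in the ensuing yield chain finishes (its uc_link points
                // at main_context).
                swapcontext(&main_context, &threads[i].context);
            }
        }
    }
    printf("All green threads finished\n");
    return 0;
}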
Raw Assembly Context Switch (x86-64)
// Context: saved registers
typedef struct {
uint64_t rsp;
uint64_t r15;
uint64_t r14;
uint64_t r13;
uint64_t r12;
uint64_t rbx;
uint64_t rbp;
} context_t;
// Assembly function to switch contexts
// Saves current registers, restores target's registers
extern void context_switch(context_t* old, context_t* new);
; context_switch.s (NASM syntax)
global context_switch
context_switch:
; Save old context (rdi = old)
mov [rdi + 0], rsp
mov [rdi + 8], r15
mov [rdi + 16], r14
mov [rdi + 24], r13
mov [rdi + 32], r12
mov [rdi + 40], rbx
mov [rdi + 48], rbp
; Restore new context (rsi = new)
mov rsp, [rsi + 0]
mov r15, [rsi + 8]
mov r14, [rsi + 16]
mov r13, [rsi + 24]
mov r12, [rsi + 32]
mov rbx, [rsi + 40]
mov rbp, [rsi + 48]
ret
Green Thread vs OS Thread Comparison
| Aspect | OS Thread (kernel) | Green Thread (user) |
|---|---|---|
| Created by | `pthread_create()` | Your library |
| Scheduled by | OS kernel | Your scheduler |
| Context switch | ~1-10 μs (syscall) | ~50-100 ns (function call) |
| Stack size | 1-8 MB default | You decide (8KB typical) |
| 10,000 threads | 10-80 GB memory | 80 MB memory |
| Blocking syscall | Only that thread blocks | ALL green threads block! |
Project 5: “Dining Philosophers Visualizer” — Deadlock Detection and Prevention
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Python, Go |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Deadlock / Synchronization |
| Software or Tool | ncurses or terminal graphics |
| Main Book | Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau |
What you’ll build
An animated terminal visualization of the Dining Philosophers problem, where you can toggle between naive (deadlock-prone), ordered locking, and semaphore-based solutions, watching deadlocks happen in real-time.
Why it teaches deadlocks
Deadlock is abstract until you see it. Watching 5 philosophers all grab their left fork and freeze forever makes the concept visceral. Then implementing fixes (resource ordering, arbitrator semaphore) teaches how to prevent them.
Core challenges you’ll face
- Deadlock creation (circular wait) → maps to Coffman conditions
- Deadlock prevention (resource ordering) → maps to deadlock avoidance
- Livelock (philosophers keep retrying) → maps to liveness properties
- Starvation (one philosopher never eats) → maps to fairness
Key Concepts
| Concept | Resource |
|---|---|
| Dining Philosophers Problem | Operating Systems: Three Easy Pieces Chapter 31 - OSTEP |
| Deadlock Conditions | Operating System Concepts Chapter 7 - Silberschatz |
| Semaphore API | POSIX sem_init — https://man7.org/linux/man-pages/man3/sem_init.3.html |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Project 2-3 completed
Real World Outcome
$ ./dining_philosophers --mode naive
🍝 Dining Philosophers 🍝
P0 P1
🧠 🧠
🍴 🍴 🍴 🍴
\ /
\ /
P4 🧠--🍽️--🧠 P2
/ \
/ \
🍴 🍴
🧠
P3
Status: [Mode: NAIVE - Deadlock prone!]
P0: 🍴 HOLDING left fork, WAITING for right
P1: 🍴 HOLDING left fork, WAITING for right
P2: 🍴 HOLDING left fork, WAITING for right
P3: 🍴 HOLDING left fork, WAITING for right
P4: 🍴 HOLDING left fork, WAITING for right
⚠️ DEADLOCK DETECTED! All philosophers stuck!
Press 's' for semaphore solution, 'o' for ordered locking...
The Core Question You’re Answering
Why do deadlocks happen, and which design rules eliminate them without destroying concurrency?
Concepts You Must Understand First
- Coffman deadlock conditions
- Resource ordering vs arbitrator semaphore
- Starvation vs fairness
- Detecting cycles in a resource graph
Questions to Guide Your Design
- Which condition will you break to prevent deadlock?
- How will you prove that no circular wait exists?
- How will you measure starvation?
Thinking Exercise
If you force philosophers to pick up the lower-numbered fork first, which Coffman condition is removed?
The Interview Questions They’ll Ask
- What are the four Coffman conditions?
- What is the difference between deadlock and livelock?
- How can you detect deadlock at runtime?
- What trade-offs do you make when enforcing ordering?
Hints in Layers
- Implement the naive solution and observe deadlock.
- Add ordered locking (lower fork first).
- Add an arbitrator semaphore (max 4 eaters).
- Add a deadlock detector that spots cycles.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Deadlocks | Operating Systems: Three Easy Pieces | Ch. 31 |
| Synchronization | Operating System Concepts | Ch. 7 |
| Semaphores | The Linux Programming Interface | Ch. 30 |
Common Pitfalls & Debugging
Problem: “No deadlock but starvation exists”
- Why: One philosopher always loses the race.
- Fix: Add fairness (queue, randomized backoff).
Problem: “Visualizer freezes”
- Why: Rendering holds a lock too long.
- Fix: Render from a snapshot instead of holding locks.
Learning Milestones
- Deadlock happens visually → You understand circular wait
- Resource ordering prevents deadlock → You understand prevention
- Semaphore (max 4 eating) works → You understand limiting concurrency
- You can explain all 4 Coffman conditions → Mastery
The Four Coffman Conditions for Deadlock
All four must be present for deadlock to occur:
- Mutual Exclusion: Resources cannot be shared (fork can only be held by one philosopher)
- Hold and Wait: Process holds one resource while waiting for another
- No Preemption: Resources cannot be forcibly taken away
- Circular Wait: P0 waits for P1, P1 waits for P2, … P4 waits for P0
Implementation Hints
Naive Solution (causes deadlock)
void philosopher_naive(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
while (1) {
think(id);
pthread_mutex_lock(&forks[left]); // Grab left fork
printf("P%d: got left fork\n", id);
usleep(1000); // Small delay makes deadlock more likely
pthread_mutex_lock(&forks[right]); // Grab right fork (DEADLOCK!)
printf("P%d: got right fork\n", id);
eat(id);
pthread_mutex_unlock(&forks[right]);
pthread_mutex_unlock(&forks[left]);
}
}
Solution 1: Resource Ordering
void philosopher_ordered(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
// Always lock lower-numbered fork first
int first = (left < right) ? left : right;
int second = (left < right) ? right : left;
while (1) {
think(id);
pthread_mutex_lock(&forks[first]);
pthread_mutex_lock(&forks[second]);
eat(id);
pthread_mutex_unlock(&forks[second]);
pthread_mutex_unlock(&forks[first]);
}
}
Solution 2: Semaphore (limit to N-1)
sem_t table_seats; // Initialized to NUM_PHILOSOPHERS - 1
void philosopher_semaphore(int id) {
int left = id;
int right = (id + 1) % NUM_PHILOSOPHERS;
while (1) {
think(id);
sem_wait(&table_seats); // Only N-1 can try to eat at once
pthread_mutex_lock(&forks[left]);
pthread_mutex_lock(&forks[right]);
eat(id);
pthread_mutex_unlock(&forks[right]);
pthread_mutex_unlock(&forks[left]);
sem_post(&table_seats);
}
}
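A fourth variant worth trying, tied to the livelock challenge above: take the left fork, but only try the right one and back off on failure. A minimal sketch assuming the same `forks` array and helpers (needs `stdlib.h`/`unistd.h` for `rand`/`usleep`):
void philosopher_trylock(int id) {
    int left = id;
    int right = (id + 1) % NUM_PHILOSOPHERS;
    while (1) {
        think(id);
        pthread_mutex_lock(&forks[left]);
        if (pthread_mutex_trylock(&forks[right]) != 0) {
            // Right fork busy: put the left one down and retry later.
            // The randomized sleep breaks the symmetry that causes livelock.
            pthread_mutex_unlock(&forks[left]);
            usleep(rand() % 1000);
            continue;
        }
        eat(id);
        pthread_mutex_unlock(&forks[right]);
        pthread_mutex_unlock(&forks[left]);
    }
}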
Project 6: “Thread Pool with Work Stealing” — Lock-Free Deque Load Balancer
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++, Go |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 3. The “Service & Support” Model |
| Difficulty | Level 4: Expert |
| Knowledge Area | Parallel Computing / Load Balancing |
| Software or Tool | Lock-free queues, work stealing deques |
| Main Book | C++ Concurrency in Action by Anthony Williams |
What you’ll build
A production-quality thread pool where worker threads can “steal” tasks from each other’s queues when idle, maximizing CPU utilization and reducing latency.
Why it teaches advanced concurrency
Real systems use thread pools, not raw threads. Work stealing is how Go, Tokio (Rust), and Java’s ForkJoin work. Building this teaches you lock-free data structures, cache locality, and real-world parallel performance.
Core challenges you’ll face
- Thread-safe queue (multiple threads push/pop) → maps to concurrent data structures
- Work stealing deque (steal from back, pop from front) → maps to load balancing
- Lock-free implementation (CAS-based operations) → maps to lock-free programming
- Cache-friendly design (avoid false sharing) → maps to performance optimization
Key Concepts
| Concept | Resource |
|---|---|
| Thread Pools | C++ Concurrency in Action Chapter 9 - Anthony Williams |
| Work Stealing | “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson (paper) |
| Lock-Free Queues | Rust Atomics and Locks Chapter 6 - Mara Bos |
Difficulty & Time
- Difficulty: Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-4 completed
Real World Outcome
$ ./thread_pool_bench --workers 8 --tasks 100000
=== Thread Pool Benchmark ===
[Naive approach: spawn thread per task]
Time: 45.2s
Threads created: 100000
Context switches: 2,340,000
[Thread pool with single queue]
Time: 2.1s
Threads created: 8
Context switches: 12,000
Queue contention: HIGH (single lock)
[Thread pool with work stealing]
Time: 0.8s
Threads created: 8
Context switches: 9,000
Load balance: 12.5% per worker (perfect!)
Steals performed: 3,421
Work stealing is 56x faster than naive, 2.6x faster than single queue!
The Core Question You’re Answering
How do you build a high-throughput worker system that balances load without global contention?
Concepts You Must Understand First
- Thread pool lifecycle and graceful shutdown
- Condition variables for work notification
- Work-stealing deques and ABA hazards
- Cache locality and false sharing
Questions to Guide Your Design
- When should a worker sleep vs spin?
- How will you prevent idle threads from burning CPU?
- How will you measure imbalance across workers?
Thinking Exercise
You have 8 workers and 1,000 tasks with uneven durations. Sketch how a single queue vs work stealing impacts total runtime.
The Interview Questions They’ll Ask
- Why is a thread pool better than spawning a thread per task?
- What is work stealing and why does it help?
- How do you avoid contention on the global queue?
- What is the cost of false sharing?
Hints in Layers
- Build a single-queue pool with mutex + condition variable.
- Add per-worker queues and local pops.
- Add stealing from the tail of other deques.
- Add metrics: steals, idle time, throughput.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Thread pools | C++ Concurrency in Action | Ch. 9 |
| Atomics | Rust Atomics and Locks | Ch. 6 |
| Performance | Computer Systems: A Programmer’s Perspective | Ch. 5 |
Common Pitfalls & Debugging
Problem: “Workers sleep forever”
- Why: Missed condition variable signal.
- Fix: Signal/broadcast after enqueue.
Problem: “Steals return NULL too often”
- Why: Incorrect memory ordering or ABA race.
- Fix: Add acquire/release fences and version counters.
Learning Milestones
- Basic thread pool works → You understand pooling
- Per-worker queues reduce contention → You understand lock contention
- Work stealing balances load → You understand dynamic scheduling
- Benchmark shows real speedup → You understand parallel efficiency
Architecture
┌─────────────────────────────────────────┐
│ Thread Pool │
└─────────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Worker 0 │ │ Worker 1 │ │ Worker 2 │
│ │ │ │ │ │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Deque │ │◄─STEAL───│ │ Deque │ │───STEAL─►│ │ Deque │ │
│ │ │ │ │ │ T T T │ │ │ │ │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└────────────┘ └────────────┘ └────────────┘
│ │ │
│ Pop from front │ Steal from back │
▼ ▼ ▼
[Execute] [Execute] [Execute]
Implementation Hints
Work-Stealing Deque (Chase-Lev)
#include <stdatomic.h>
#define DEQUE_SIZE 1024
typedef struct {
atomic_size_t top; // Thieves steal from here
atomic_size_t bottom; // Owner pushes/pops here
void* buffer[DEQUE_SIZE];
} work_stealing_deque_t;
// Owner: push task (only owner calls this)
void deque_push(work_stealing_deque_t* d, void* task) {
size_t b = atomic_load(&d->bottom);
d->buffer[b % DEQUE_SIZE] = task;
atomic_thread_fence(memory_order_release);
atomic_store(&d->bottom, b + 1);
}
// Owner: pop task (only owner calls this)
void* deque_pop(work_stealing_deque_t* d) {
size_t b = atomic_load(&d->bottom) - 1;
atomic_store(&d->bottom, b);
atomic_thread_fence(memory_order_seq_cst);
size_t t = atomic_load(&d->top);
if (t <= b) {
void* task = d->buffer[b % DEQUE_SIZE];
if (t == b) {
// Last element - race with stealers
if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
task = NULL; // Lost race
}
atomic_store(&d->bottom, b + 1);
}
return task;
}
atomic_store(&d->bottom, b + 1);
return NULL;
}
// Thief: steal task (other workers call this)
void* deque_steal(work_stealing_deque_t* d) {
size_t t = atomic_load(&d->top);
atomic_thread_fence(memory_order_seq_cst);
size_t b = atomic_load(&d->bottom);
if (t < b) {
void* task = d->buffer[t % DEQUE_SIZE];
if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
return NULL; // Lost race with another stealer
}
return task;
}
return NULL;
}
Thread Pool Structure
typedef struct {
pthread_t* threads;
work_stealing_deque_t* deques; // One per worker
int num_workers;
atomic_bool shutdown;
} thread_pool_t;
// Per-worker argument passed to worker_loop
typedef struct {
thread_pool_t* pool;
int id; // this worker's index into deques[]
} worker_context_t;
void* worker_loop(void* arg) {
worker_context_t* ctx = arg;
while (!atomic_load(&ctx->pool->shutdown)) {
// Try to get task from own deque
void* task = deque_pop(&ctx->pool->deques[ctx->id]);
if (!task) {
// Own deque empty - try to steal from others
for (int i = 0; i < ctx->pool->num_workers; i++) {
if (i != ctx->id) {
task = deque_steal(&ctx->pool->deques[i]);
if (task) break;
}
}
}
if (task) {
execute_task(task);
} else {
// No work anywhere - yield CPU
sched_yield();
}
}
return NULL;
}
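Submission can stay simple at first: a sketch of a hypothetical `pool_submit` that round-robins tasks across worker deques. One caveat: a Chase-Lev deque is single-producer, so pushing from an arbitrary submitter thread is only safe before the workers start (or from the owning worker itself); a production design adds a multi-producer inbox per worker.
static atomic_int next_worker = 0;
void pool_submit(thread_pool_t* pool, void* task) {
    // Round-robin placement spreads the initial load evenly
    int w = atomic_fetch_add(&next_worker, 1) % pool->num_workers;
    deque_push(&pool->deques[w], task);
}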
Project 7: “Concurrent Hash Map” — Lock Striping and RCU Implementation
| Attribute | Value |
|---|---|
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++, Java |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 4. The “Open Core” Infrastructure |
| Difficulty | Level 4: Expert |
| Knowledge Area | Concurrent Data Structures |
| Software or Tool | Lock striping, RCU |
| Main Book | C++ Concurrency in Action by Anthony Williams |
What you’ll build
A high-performance concurrent hash map that supports concurrent reads and writes from multiple threads, using techniques like lock striping, read-write locks, and optionally RCU (Read-Copy-Update).
Why it teaches real-world concurrency
Every serious system needs concurrent data structures. Building a concurrent hash map teaches you the trade-offs between correctness, performance, and complexity. You’ll understand why Java’s ConcurrentHashMap and Go’s sync.Map exist.
Core challenges you’ll face
- Lock striping (lock per bucket, not whole table) → maps to fine-grained locking
- Read-write locks (many readers, one writer) → maps to reader/writer problem
- Resize under contention (grow table without blocking readers) → maps to online operations
- RCU pattern (readers never block) → maps to lock-free reads
Key Concepts
| Concept | Resource |
|---|---|
| Lock Striping | C++ Concurrency in Action Chapter 6 - Anthony Williams |
| Read-Write Locks | The Linux Programming Interface Chapter 30 - Kerrisk |
| RCU Basics | Linux kernel documentation — https://docs.kernel.org/RCU/whatisRCU.html |
Difficulty & Time
- Difficulty: Expert
- Time estimate: 2-3 weeks
- Prerequisites: Projects 1-3 and basic hash table implementation
Real World Outcome
$ ./concurrent_map_bench --readers 6 --writers 2 --ops 1000000
=== Concurrent Hash Map Benchmark ===
[Single global lock]
Throughput: 234,000 ops/sec
Reader wait time: 89% (readers blocked by writers)
Writer wait time: 45%
[Lock striping (256 buckets)]
Throughput: 1,890,000 ops/sec (8x faster!)
Reader wait time: 12%
Writer wait time: 8%
[RW-lock per bucket]
Throughput: 2,340,000 ops/sec
Reader wait time: 3%
Writer wait time: 15%
[RCU-style (readers never block)]
Throughput: 4,120,000 ops/sec
Reader wait time: 0% (readers NEVER blocked!)
Writer wait time: 5%
Memory overhead: +15% (deferred reclamation)
Correctness check: All 1,000,000 operations verified ✓
The Core Question You’re Answering
How do you scale a shared map so reads stay fast as writer count grows?
Concepts You Must Understand First
- Lock striping and contention hotspots
- Read-write locks for read-heavy workloads
- RCU read-side critical sections
- Safe memory reclamation
Questions to Guide Your Design
- How will you resize without blocking readers?
- How will you prevent deadlocks when multiple locks are needed?
- What is your reclamation strategy for old buckets?
Thinking Exercise
If you have 256 buckets and 16 locks, how many buckets share a lock? How does that affect contention?
The Interview Questions They’ll Ask
- Why does lock striping improve throughput?
- What trade-offs do read-write locks introduce?
- What is RCU and why does it help read-heavy systems?
- How do you safely free old nodes in RCU?
Hints in Layers
- Start with a global mutex map.
- Add lock striping per bucket group.
- Add per-bucket RW locks.
- Prototype a simple RCU-style read path.
- Benchmark each version.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Concurrency | C++ Concurrency in Action | Ch. 6 |
| RW locks | The Linux Programming Interface | Ch. 30 |
| Data structures | Mastering Algorithms with C | Ch. 8 |
Common Pitfalls & Debugging
Problem: “Reads crash after resize”
- Why: Freed buckets still referenced by readers.
- Fix: Use epoch-based reclamation or grace periods.
Problem: “Deadlock during resize”
- Why: Locks acquired in inconsistent order.
- Fix: Impose a strict lock ordering.
Learning Milestones
- Single lock works but is slow → You understand coarse-grained locking
- Lock striping dramatically improves throughput → You understand fine-grained locking
- Read-write locks help read-heavy workloads → You understand reader/writer trade-offs
- RCU gives maximum read performance → You understand lock-free reads
Implementation Hints
Lock Striping
#define NUM_BUCKETS 256
#define NUM_LOCKS 16 // Fewer locks than buckets
typedef struct node {
char* key;
void* value;
struct node* next;
} node_t;
typedef struct {
node_t* buckets[NUM_BUCKETS];
pthread_mutex_t locks[NUM_LOCKS];
} concurrent_map_t;
static inline int get_lock_index(int bucket) {
return bucket % NUM_LOCKS; // Multiple buckets share a lock
}
void* map_get(concurrent_map_t* map, const char* key) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_mutex_lock(&map->locks[lock_idx]);
node_t* node = map->buckets[bucket];
while (node) {
if (strcmp(node->key, key) == 0) {
void* value = node->value;
pthread_mutex_unlock(&map->locks[lock_idx]);
return value;
}
node = node->next;
}
pthread_mutex_unlock(&map->locks[lock_idx]);
return NULL;
}
void map_put(concurrent_map_t* map, const char* key, void* value) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_mutex_lock(&map->locks[lock_idx]);
// Check if key exists
node_t* node = map->buckets[bucket];
while (node) {
if (strcmp(node->key, key) == 0) {
node->value = value; // Update
pthread_mutex_unlock(&map->locks[lock_idx]);
return;
}
node = node->next;
}
// Insert new node
node_t* new_node = malloc(sizeof(node_t));
new_node->key = strdup(key);
new_node->value = value;
new_node->next = map->buckets[bucket];
map->buckets[bucket] = new_node;
pthread_mutex_unlock(&map->locks[lock_idx]);
}
Read-Write Lock Version
typedef struct {
node_t* buckets[NUM_BUCKETS];
pthread_rwlock_t locks[NUM_LOCKS];
} rw_concurrent_map_t;
void* map_get(rw_concurrent_map_t* map, const char* key) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_rwlock_rdlock(&map->locks[lock_idx]); // Read lock
node_t* node = map->buckets[bucket];
void* value = NULL;
while (node) {
if (strcmp(node->key, key) == 0) {
value = node->value;
break;
}
node = node->next;
}
pthread_rwlock_unlock(&map->locks[lock_idx]);
return value;
}
void map_put(rw_concurrent_map_t* map, const char* key, void* value) {
int bucket = hash(key) % NUM_BUCKETS;
int lock_idx = get_lock_index(bucket);
pthread_rwlock_wrlock(&map->locks[lock_idx]); // Write lock
// ... insertion logic ...
pthread_rwlock_unlock(&map->locks[lock_idx]);
}
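For the RCU-style step, the core idea fits in a few lines: readers follow an atomic pointer without locking; a writer builds a new version, publishes it with one atomic store, and defers freeing the old version until no reader can still hold it. A minimal sketch assuming a whole-table swap, with `clone_table_with_update`, `synchronize_readers`, and `free_table` as hypothetical helpers:
#include <stdatomic.h>
typedef struct {
    node_t* buckets[NUM_BUCKETS];
} rcu_table_t;
_Atomic(rcu_table_t*) current_table;
// Reader: never blocks; just an acquire-load of the current version
void* rcu_get(const char* key) {
    rcu_table_t* t = atomic_load_explicit(&current_table, memory_order_acquire);
    node_t* node = t->buckets[hash(key) % NUM_BUCKETS];
    while (node) {
        if (strcmp(node->key, key) == 0) return node->value;
        node = node->next;
    }
    return NULL;
}
// Writer: copy, modify, publish, then reclaim after a grace period
void rcu_put(const char* key, void* value) {
    rcu_table_t* old = atomic_load(&current_table);
    rcu_table_t* fresh = clone_table_with_update(old, key, value);  // hypothetical
    atomic_store_explicit(&current_table, fresh, memory_order_release);
    synchronize_readers();  // hypothetical: wait until no reader holds 'old'
    free_table(old);        // hypothetical
}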
Concurrency Trade-offs
| Approach | Read Performance | Write Performance | Complexity | Memory |
|---|---|---|---|---|
| Single lock | Poor | Poor | Low | Low |
| Lock striping | Good | Good | Medium | Low |
| RW-lock per bucket | Excellent | Moderate | Medium | Low |
| RCU | Perfect (never blocks) | Moderate | High | Higher |
Project 8: “Mini Operating System Scheduler” — Capstone Bare-Metal Kernel
| Attribute | Value |
|---|---|
| Main Programming Language | C + Assembly |
| Alternative Programming Languages | Rust, Zig |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 5: Master |
| Knowledge Area | Operating Systems / Kernel Development |
| Software or Tool | QEMU, bare-metal x86 |
| Main Book | Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau |
What you’ll build
A minimal bare-metal kernel (runs in QEMU) that implements preemptive multitasking with a timer interrupt, process/thread scheduling, and basic synchronization primitives—all running without Linux.
Why this is the ultimate test
Everything you learned culminates here. You ARE the operating system. There’s no pthread_create()—you implement context switching, timer interrupts, and scheduling from scratch on real (emulated) hardware.
Core challenges you’ll face
- Bootloader (get code running without an OS) → maps to bare-metal programming
- Timer interrupt (preemptive scheduling) → maps to interrupt handling
- Task switching (save/restore full CPU state) → maps to context switch
- Priority scheduler (multiple scheduling algorithms) → maps to scheduling policies
- Kernel-mode synchronization (spinlocks in kernel) → maps to OS internals
Key Concepts
| Concept | Resource |
|---|---|
| Bootloaders | Operating Systems: Three Easy Pieces Appendix - OSTEP |
| Interrupts | Computer Systems: A Programmer’s Perspective Chapter 8 - Bryant & O’Hallaron |
| Context Switching | Linux Kernel Development Chapter 4 - Robert Love |
| Scheduling | Operating Systems: Three Easy Pieces Chapters 7-9 - OSTEP |
Difficulty & Time
- Difficulty: Master
- Time estimate: 1-2 months
- Prerequisites: All previous projects completed, assembly knowledge
Real World Outcome
Booting mini-OS in QEMU...
[KERNEL] Timer interrupt handler installed
[KERNEL] Scheduler initialized (Round Robin)
[KERNEL] Creating Task 1: counter
[KERNEL] Creating Task 2: printer
[KERNEL] Creating Task 3: beeper
[Task 1] Counter: 0
[Task 2] Printing: Hello from Task 2!
[SCHEDULER] Timer tick! Switching 1 -> 2
[Task 2] Printing: Hello from Task 2!
[Task 3] BEEP!
[SCHEDULER] Timer tick! Switching 2 -> 3
[Task 3] BEEP!
[Task 1] Counter: 1
[SCHEDULER] Timer tick! Switching 3 -> 1
...
Press 'p' to show process table:
PID STATE PRIORITY TICKS NAME
1 RUNNING 5 42 counter
2 READY 5 38 printer
3 BLOCKED 5 25 beeper (waiting on semaphore)
Press 'd' to trigger deadlock demo...
The Core Question You’re Answering
How does a kernel preempt tasks, switch context, and enforce synchronization without help from user space?
Concepts You Must Understand First
- Timer interrupts and IDT/PIC setup
- Saving/restoring registers and stack
- Run queue and scheduler policy
- Spinlocks and interrupt masking
Questions to Guide Your Design
- When exactly do you save CPU state?
- How do you choose the next task to run?
- How do you prevent races inside the kernel?
Thinking Exercise
Assume a timer tick every 10 ms. If each task gets a 20 ms time slice, how many ticks before a context switch? Where does the counter live?
The Interview Questions They’ll Ask
- What is preemption and why is it necessary?
- How does a context switch save/restore state?
- Why must you disable interrupts around certain structures?
- What’s the difference between round-robin and priority scheduling?
Hints in Layers
- Boot into protected mode and print to VGA.
- Install timer interrupt and note tick rate.
- Implement a run queue and round-robin switching.
- Add spinlocks and semaphores in kernel space.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Scheduling | Operating Systems: Three Easy Pieces | Ch. 7-9 |
| Kernel internals | Linux Kernel Development | Ch. 4 |
| Interrupts | Computer Systems: A Programmer’s Perspective | Ch. 8 |
Common Pitfalls & Debugging
Problem: “Kernel crashes during switch”
- Why: Incorrect stack pointer or missing registers.
- Fix: Save/restore full CPU context and validate stack.
Problem: “Race in scheduler”
- Why: Timer interrupt fires mid-queue update.
- Fix: Disable interrupts during critical scheduler operations.
Learning Milestones
- Kernel boots and prints → You understand bare-metal
- Timer interrupt fires → You understand hardware interrupts
- Two tasks switch preemptively → You ARE the scheduler
- Semaphores block/wake tasks → You understand kernel synchronization
- Priority scheduling works → Operating systems are demystified
Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                         Your Mini-OS                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│   │   Task 1     │  │   Task 2     │  │   Task 3     │          │
│   │  (counter)   │  │  (printer)   │  │  (beeper)    │          │
│   └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
│          │                 │                 │                  │
│          └────────────────┼─────────────────┘                   │
│                           │                                     │
│                     ┌─────▼─────┐                               │
│                     │ Scheduler │ ◄─── Timer Interrupt          │
│                     └─────┬─────┘                               │
│                           │                                     │
│          ┌────────────────┼────────────────┐                    │
│          │                │                │                    │
│   ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐             │
│   │    Mutex    │  │  Semaphore  │  │  Spinlock   │             │
│   └─────────────┘  └─────────────┘  └─────────────┘             │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│  Interrupt Handlers │ Memory Management │ I/O (VGA, Keyboard)   │
├─────────────────────────────────────────────────────────────────┤
│                     Hardware (via QEMU)                         │
│        x86 CPU │ PIC │ PIT (Timer) │ VGA │ Keyboard             │
└─────────────────────────────────────────────────────────────────┘
```
Implementation Roadmap
Phase 1: Bootloader & Basic Kernel
```asm
; boot.asm - Bootloader (loaded by the BIOS at 0x7C00)
[BITS 16]
[ORG 0x7C00]

start:
    ; Set up segments
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00

    ; Load kernel from disk (DL still holds the BIOS boot-drive number)
    mov ah, 0x02        ; BIOS read sectors
    mov al, 10          ; 10 sectors
    mov ch, 0           ; Cylinder 0
    mov cl, 2           ; Sector 2 (1-indexed; sector 1 is this bootloader)
    mov dh, 0           ; Head 0
    mov bx, 0x1000      ; Load at 0000:1000
    int 0x13            ; (error handling omitted: CF is set on failure)

    ; Switch to protected mode
    ; NOTE: enabling the A20 line is omitted here; QEMU usually boots with
    ; it usable, but real hardware may require enabling it first.
    cli
    lgdt [gdt_descriptor]
    mov eax, cr0
    or eax, 1
    mov cr0, eax
    jmp 0x08:protected_mode

[BITS 32]
protected_mode:
    ; Set up segment registers with the data selector
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esp, 0x90000

    ; Jump to kernel
    jmp 0x1000

; GDT: null, flat 4 GB code (0x9A), flat 4 GB data (0x92)
gdt_start:
    dq 0                ; Null descriptor
gdt_code:
    dw 0xFFFF, 0x0000
    db 0x00, 0x9A, 0xCF, 0x00
gdt_data:
    dw 0xFFFF, 0x0000
    db 0x00, 0x92, 0xCF, 0x00
gdt_end:

gdt_descriptor:
    dw gdt_end - gdt_start - 1
    dd gdt_start

times 510-($-$$) db 0
dw 0xAA55               ; Boot signature
```
Phase 2: Timer Interrupt
```c
// interrupts.c
#include <stdint.h>

#define PIC1_COMMAND 0x20
#define PIC1_DATA    0x21
#define PIT_COMMAND  0x43
#define PIT_DATA     0x40

// Port I/O helper, assumed defined elsewhere via inline asm (out %al, %dx)
extern void outb(uint16_t port, uint8_t value);
extern void schedule(void);

// Assembly entry point for IRQ 0: it must save all registers into the
// current task's regs, call timer_handler(), restore from the (possibly
// different) current task, and end with iret. A plain C function cannot
// be installed in the IDT directly.
extern void timer_stub(void);

void timer_handler(void) {
    // Acknowledge the interrupt (EOI) so the PIC will deliver the next one
    outb(PIC1_COMMAND, 0x20);
    // Call scheduler; the stub performs the actual switch on return
    schedule();
}

void setup_timer(uint32_t frequency) {
    // The PIT's input clock is ~1.19318 MHz; the divisor sets the tick rate
    uint32_t divisor = 1193180 / frequency;
    outb(PIT_COMMAND, 0x36);            // channel 0, lo/hi byte, square wave
    outb(PIT_DATA, divisor & 0xFF);
    outb(PIT_DATA, (divisor >> 8) & 0xFF);
}

// IDT entry for timer (IRQ 0 -> INT 32, after remapping the PIC)
void setup_idt(void) {
    idt[32].handler_low  = (uint32_t)timer_stub & 0xFFFF;
    idt[32].handler_high = (uint32_t)timer_stub >> 16;
    idt[32].selector     = 0x08;        // kernel code segment
    idt[32].flags        = 0x8E;        // present, ring 0, 32-bit interrupt gate
    lidt(&idt_descriptor);
}
```
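`idt`, `idt_descriptor`, and `lidt` above are assumed to exist elsewhere. A minimal sketch of that plumbing, with field names chosen to match `setup_idt()` (all names illustrative):

```c
// idt.c - minimal IDT plumbing assumed by setup_idt() above.
#include <stdint.h>

typedef struct {
    uint16_t handler_low;   // handler address bits 0-15
    uint16_t selector;      // code segment selector
    uint8_t  zero;          // always 0
    uint8_t  flags;         // gate type and attributes (0x8E = interrupt gate)
    uint16_t handler_high;  // handler address bits 16-31
} __attribute__((packed)) idt_entry_t;

typedef struct {
    uint16_t limit;         // size of the IDT minus 1
    uint32_t base;          // linear address of the IDT
} __attribute__((packed)) idt_descriptor_t;

idt_entry_t idt[256];
idt_descriptor_t idt_descriptor;

static inline void lidt(idt_descriptor_t* d) {
    __asm__ volatile("lidt %0" : : "m"(*d));
}

void idt_init(void) {
    idt_descriptor.limit = sizeof(idt) - 1;
    idt_descriptor.base  = (uint32_t)idt;
}
```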
Phase 3: Task Structure & Context Switch
```c
// task.h
#include <stdint.h>

// Saved CPU context for a task
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    uint32_t esi, edi, ebp, esp;
    uint32_t eip, eflags;
    uint32_t cs, ds, es, fs, gs, ss;
} registers_t;

typedef enum {
    TASK_READY,
    TASK_RUNNING,
    TASK_BLOCKED,
    TASK_TERMINATED
} task_state_t;

typedef struct task {
    int pid;
    char name[32];
    task_state_t state;
    int priority;
    registers_t regs;        // saved context while not running
    uint32_t* stack;         // base of this task's kernel stack
    struct task* next;       // run-queue / wait-queue link
} task_t;

// Implemented in assembly: saves the current context into *old,
// loads *new, and resumes execution there.
extern void context_switch(registers_t* old, registers_t* new);
```
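Nothing above shows how a task gets its first context. One plausible `task_create()`, a sketch in which `kmalloc()` and the 4 KB stack size are assumptions, not part of the guide's spec:

```c
// task_create.c - sketch: build a runnable task_t so the first context
// switch "returns" into its entry function.
#include <stdint.h>
#include "task.h"                   // the header sketched above

#define STACK_WORDS 1024            // 4 KB kernel stack
extern void* kmalloc(uint32_t size);
extern void enqueue(task_t** queue, task_t* task);
extern task_t* ready_queue;

task_t* task_create(int pid, const char* name, void (*entry)(void)) {
    task_t* t   = kmalloc(sizeof(task_t));
    t->pid      = pid;
    t->state    = TASK_READY;
    t->priority = 5;
    t->next     = NULL;
    t->stack    = kmalloc(STACK_WORDS * sizeof(uint32_t));

    // Copy the name by hand (freestanding kernel: no strncpy)
    int i = 0;
    while (i < (int)sizeof(t->name) - 1 && name[i]) {
        t->name[i] = name[i];
        i++;
    }
    t->name[i] = '\0';

    // Arrange the saved context so the first switch lands in entry()
    t->regs        = (registers_t){0};
    t->regs.eip    = (uint32_t)entry;
    t->regs.esp    = (uint32_t)&t->stack[STACK_WORDS];   // stacks grow down
    t->regs.eflags = 0x202;                              // IF=1: interrupts on
    t->regs.cs     = 0x08;                               // kernel code segment
    t->regs.ds = t->regs.es = t->regs.fs = t->regs.gs = t->regs.ss = 0x10;

    enqueue(&ready_queue, t);
    return t;
}
```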
Phase 4: Scheduler
```c
// scheduler.c
#include "task.h"

extern void enqueue(task_t** queue, task_t* task);   // append to tail
extern task_t* dequeue(task_t** queue);              // pop head (NULL if empty)
extern task_t* idle_task;                            // halt-loop, always runnable

task_t* current_task = NULL;
task_t* ready_queue  = NULL;

void schedule(void) {
    if (!current_task) return;

    // Current task's registers were already saved by the interrupt stub.
    // Move it to the end of the queue if it is still runnable.
    if (current_task->state == TASK_RUNNING) {
        current_task->state = TASK_READY;
        enqueue(&ready_queue, current_task);
    }

    // Get next task
    task_t* next = dequeue(&ready_queue);
    if (!next) {
        next = idle_task;   // Never block completely
    }

    next->state  = TASK_RUNNING;
    current_task = next;
    // The actual context switch happens on return from the interrupt
}
```
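The `enqueue`/`dequeue` helpers are a plain singly linked FIFO; a sketch consistent with the `next` pointer in `task_t`:

```c
// queue.c - singly linked FIFO used for the ready and wait queues.
#include <stddef.h>
#include "task.h"                 // the header sketched in Phase 3

void enqueue(task_t** queue, task_t* task) {
    task->next = NULL;
    if (!*queue) {                // empty queue: task becomes the head
        *queue = task;
        return;
    }
    task_t* tail = *queue;        // walk to the tail (fine for small queues;
    while (tail->next)            //  a real kernel would keep a tail pointer)
        tail = tail->next;
    tail->next = task;
}

task_t* dequeue(task_t** queue) {
    task_t* head = *queue;
    if (head)
        *queue = head->next;      // advance the head
    return head;                  // NULL if the queue was empty
}
```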
Phase 5: Kernel Semaphore
```c
// sync.c
#include "task.h"
// spinlock_t / spinlock_acquire / spinlock_release as sketched earlier

extern task_t* current_task;
extern task_t* ready_queue;
extern void enqueue(task_t** queue, task_t* task);
extern task_t* dequeue(task_t** queue);
extern void schedule(void);

typedef struct {
    int count;
    task_t* wait_queue;
    spinlock_t lock;        // protects count and wait_queue
} ksemaphore_t;

void ksem_wait(ksemaphore_t* sem) {
    spinlock_acquire(&sem->lock);
    while (sem->count <= 0) {
        // Block current task
        current_task->state = TASK_BLOCKED;
        enqueue(&sem->wait_queue, current_task);
        // Release before switching, so we never sleep holding the lock
        spinlock_release(&sem->lock);
        schedule();         // Switch away; we resume here when woken
        spinlock_acquire(&sem->lock);
        // Re-check count: another woken task may have taken the resource
    }
    sem->count--;
    spinlock_release(&sem->lock);
}

void ksem_signal(ksemaphore_t* sem) {
    spinlock_acquire(&sem->lock);
    sem->count++;
    if (sem->wait_queue) {
        task_t* task = dequeue(&sem->wait_queue);
        task->state = TASK_READY;
        enqueue(&ready_queue, task);   // woken task will re-check the count
    }
    spinlock_release(&sem->lock);
}
```
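A usage sketch tying this back to the demo output: Task 3 blocks on a semaphore (as the process table shows) until Task 2 signals it. `vga_print` is the earlier sketch; all names are illustrative.

```c
// demo.c - the "beeper" waits on a semaphore; the "printer" signals it.
extern void ksem_wait(ksemaphore_t* sem);      // from sync.c above
extern void ksem_signal(ksemaphore_t* sem);
extern void vga_print(const char* s);          // from the VGA sketch

ksemaphore_t beep_sem = { .count = 0, .wait_queue = NULL, .lock = {0} };

void beeper_task(void) {            // Task 3: BLOCKED until signaled
    for (;;) {
        ksem_wait(&beep_sem);       // sleeps while count == 0
        vga_print("[Task 3] BEEP!\n");
    }
}

void printer_task(void) {           // Task 2: produces one beep per loop
    for (;;) {
        vga_print("[Task 2] Printing: Hello from Task 2!\n");
        ksem_signal(&beep_sem);     // wakes the beeper
    }
}
```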
Build & Run
```make
# Makefile (recipe lines must be indented with tabs)
CC   = i686-elf-gcc
AS   = nasm
LD   = i686-elf-ld
QEMU = qemu-system-i386

CFLAGS  = -ffreestanding -O2 -Wall -Wextra
# linker.ld must place the kernel at 0x1000 and emit a flat binary
# (OUTPUT_FORMAT("binary")), since the bootloader jumps straight to 0x1000
LDFLAGS = -T linker.ld -nostdlib

all: os.img

boot.bin: boot.asm
	$(AS) -f bin $< -o $@

kernel.o: kernel.c
	$(CC) $(CFLAGS) -c $< -o $@

# Add interrupts.o, scheduler.o, sync.o here as you build later phases
kernel.bin: kernel.o
	$(LD) $(LDFLAGS) $^ -o $@

os.img: boot.bin kernel.bin
	cat $^ > $@
	truncate -s 1440k $@   # pad to a 1.44 MB floppy image

run: os.img
	$(QEMU) -fda $< -monitor stdio

clean:
	rm -f *.bin *.o *.img
```
Essential Resources Summary
Books (Priority Order)
- Operating Systems: Three Easy Pieces - FREE online, the best OS book
- Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron - Memory and low-level understanding
- The Linux Programming Interface by Michael Kerrisk - The bible for Linux system programming
- Rust Atomics and Locks by Mara Bos - Excellent even for C programmers on atomics/concurrency
- C++ Concurrency in Action by Anthony Williams - Best book on practical concurrency patterns
- Linux Kernel Development by Robert Love - When you’re ready for real kernel code
Online Tutorials
- Green threads explained (c9x.me) - For Project 4
- Threads, Mutexes and Concurrent Programming in C - For Project 2
- Barr Group: Mutexes and Semaphores Demystified - Historical context
Papers (Advanced)
- “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson
- “What is RCU, Fundamentally?” - Paul McKenney (LWN article series)
Key Conceptual Distinctions
| Question | Answer |
|---|---|
| Thread vs Process? | Threads share memory, processes are isolated |
| Concurrency vs Parallelism? | Concurrency = structure (dealing with many things), Parallelism = execution (doing many things simultaneously) |
| Mutex vs Semaphore? | Mutex = locking (ownership), Semaphore = signaling (counting) |
| Kernel vs User thread? | Kernel thread = OS schedules, User thread = your code schedules |
| Spinlock vs Mutex? | Spinlock = busy-waits (burns CPU, good for very short waits), Mutex = sleeps (no CPU wasted while blocked) |
| Green thread vs Coroutine? | Green thread = scheduled by runtime, Coroutine = explicit yield by programmer |
| Race condition vs Deadlock? | Race = wrong order causes wrong result, Deadlock = circular wait causes freeze |
| Preemptive vs Cooperative? | Preemptive = OS forces switch (timer), Cooperative = task chooses when to yield |
Appendix: Difficulty & Coolness Scales
Difficulty Levels
| Level | Name | Description |
|---|---|---|
| 1 | Beginner | Basic syntax, straightforward logic, single-purpose tools |
| 2 | Intermediate | State management, async operations, third-party libraries |
| 3 | Advanced | Concurrency, memory management, complex architectures |
| 4 | Expert | Kernel-level, assembly, complex mathematical models |
| 5 | Master | Self-referential systems, cryptography, bootstrapping |
Coolness Levels
| Level | Vibe | Description |
|---|---|---|
| 1 | “This looks like work” | Vital for career but zero flair |
| 2 | “Neat, I guess” | Standard hobbyist projects |
| 3 | “Hey, that’s pretty smart” | Mini versions of famous tools |
| 4 | “Woah, you did that yourself?” | Assembly, kernel modules, bare-metal |
| 5 | “Shut up and take my upvote!” | Visually stunning or technically impossible-looking |
Business Potential
| Level | Model | Description |
|---|---|---|
| 1 | Resume Gold | Gets you a $200k+ job |
| 2 | Micro-SaaS | $5-10/month subscriptions |
| 3 | Service & Support | $500-2000/month retainers |
| 4 | Open Core | Free engine, paid control plane |
| 5 | Industry Disruptor | VC-backable platform play |