
Learning Concurrency, Parallelism, Threads & Synchronization

Goal: Deeply understand how threads, processes, semaphores, and other concurrency primitives work at the operating-system level through hands-on projects.

Core Concept Overview

Before diving into projects, here’s the mental map of what you’re learning:

Concept | What It Is | Why It’s Confusing
Process | Independent program with its own memory space | People conflate it with “program” or “thread”
Thread | Execution unit within a process; shares memory | There are multiple kinds of threads
Kernel Thread | Thread managed by the OS kernel (1:1 model) | The OS does the scheduling
Green Thread / User Thread | Thread managed in user space (N:1 model) | Your code does the scheduling
Concurrency | Dealing with multiple things at once (structure) | Often confused with parallelism
Parallelism | Doing multiple things at once (execution) | Requires multiple CPU cores
Semaphore | Counter that controls access to resources | It’s not just a mutex with a count
Mutex | Lock that ensures only one thread enters a critical section | Often called “a semaphore with count=1,” but a mutex has an owner

The key insight: Threads share memory within a process. This sharing is both the power (fast communication) and the danger (race conditions, deadlocks).
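
To make this concrete, here is a minimal sketch using plain pthreads: one thread writes a global variable and the main thread reads the new value, because both live in the same process memory. (Nothing project-specific is assumed; `writer` is just an illustrative name.)

#include <stdio.h>
#include <pthread.h>

int shared = 0;  // One copy, visible to every thread in the process

void* writer(void* arg) {
    (void)arg;
    shared = 42;   // Written by the spawned thread...
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    pthread_join(&t, NULL);   // join() also orders the memory accesses
    printf("main sees shared = %d\n", shared);  // ...and read by main: 42
    return 0;
}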


Learning Path

Phase 1: Foundations (2-3 weeks)

  1. Project 1: Process Forker & IPC Visualizer
  2. Project 2: Multi-threaded Task Executor

Phase 2: Deep Dive (3-4 weeks)

  1. Project 3: Build Your Own Mutex & Semaphore
  2. Project 5: Dining Philosophers Visualizer

Phase 3: Mastery (4-6 weeks)

  1. Project 4: Green Thread Library
  2. Project 6: Thread Pool with Work Stealing
  3. Project 7: Concurrent Hash Map

Phase 4: Capstone

  1. Project 8: Mini Operating System Scheduler

Project Comparison Table

Project | Difficulty | Time | Depth of Understanding | Fun Factor
Process Forker & IPC | ⭐⭐ | Weekend | Foundational | ⭐⭐⭐
Multi-threaded Executor | ⭐⭐⭐ | 1 week | Core threading | ⭐⭐⭐⭐
Build Mutex & Semaphore | ⭐⭐⭐⭐ | 1-2 weeks | Deep primitives | ⭐⭐⭐⭐⭐
Green Thread Library | ⭐⭐⭐⭐⭐ | 2-3 weeks | Expert-level | ⭐⭐⭐⭐⭐
Dining Philosophers Viz | ⭐⭐⭐ | 1 week | Deadlock mastery | ⭐⭐⭐⭐⭐
Thread Pool + Work Stealing | ⭐⭐⭐⭐ | 2-3 weeks | Production patterns | ⭐⭐⭐⭐
Concurrent Hash Map | ⭐⭐⭐⭐ | 2-3 weeks | Data structure concurrency | ⭐⭐⭐⭐
Mini OS Scheduler | ⭐⭐⭐⭐⭐ | 1-2 months | Complete mastery | ⭐⭐⭐⭐⭐

Project 1: Process Forker & IPC Visualizer

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Operating Systems / Process Management
  • Software or Tool: Unix fork/exec syscalls
  • Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau

What you’ll build

A visual tool that forks child processes, displays their PIDs, shows memory isolation (same variable address, different values), and demonstrates IPC via pipes and shared memory.

Why it teaches processes

You can’t understand threads until you understand what they’re not. A process has its own memory space—when you fork, the child gets a copy. By printing the same variable’s address and value in parent and child, you’ll see “same address, different value,” which makes virtual memory isolation click: the addresses match because each process has its own virtual address space mapped to different physical pages.

Core challenges you’ll face

  • Fork semantics (child gets copy of memory, returns different PID) → maps to process isolation
  • Pipe communication (why can’t parent just write to child’s variable?) → maps to IPC necessity
  • Shared memory setup (shmget/mmap) → maps to explicit memory sharing
  • Zombie processes (forgetting to wait()) → maps to process lifecycle

Key Concepts

Concept | Resource
Process Creation | Operating Systems: Three Easy Pieces, Chapter 5 (OSTEP)
Virtual Memory Basics | Computer Systems: A Programmer’s Perspective, Chapter 9 (Bryant & O’Hallaron)
IPC Mechanisms | The Linux Programming Interface, Chapters 44-49 (Michael Kerrisk)

Difficulty & Time

  • Difficulty: Beginner-Intermediate
  • Time estimate: Weekend
  • Prerequisites: Basic C, comfort with command line

Real World Outcome

$ ./process_viz
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0
[Forking...]
[Child  PID: 1235] Variable 'counter' at 0x7fff5678 = 0  # Same address!
[Child] Incrementing counter...
[Child  PID: 1235] Variable 'counter' at 0x7fff5678 = 100
[Parent PID: 1234] Variable 'counter' at 0x7fff5678 = 0   # Still 0! Isolated!

[Demo: Pipe IPC]
Parent sends: "Hello from parent"
Child receives: "Hello from parent"

[Demo: Shared Memory]
Parent writes 42 to shared memory
Child reads 42 from shared memory
Parent sees child wrote 99 back

Learning Milestones

  1. Fork works, child has different PID → You understand process creation
  2. Same address, different values after fork → You understand memory isolation
  3. Pipes working → You understand why IPC is necessary
  4. Shared memory without race conditions → You’re ready for threads

Implementation Hints

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>

int main() {
    int counter = 0;

    printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
           getpid(), (void*)&counter, counter);

    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        printf("[Child  PID: %d] Variable 'counter' at %p = %d\n",
               getpid(), (void*)&counter, counter);
        counter = 100;
        printf("[Child] After increment: counter = %d\n", counter);
        exit(0);
    } else {
        // Parent process
        wait(NULL);  // Wait for child to finish
        printf("[Parent PID: %d] Variable 'counter' at %p = %d\n",
               getpid(), (void*)&counter, counter);
        // counter is still 0! Memory is isolated!
    }

    return 0;
}
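
The pipe and shared-memory demos from the outcome above follow the same pattern. A minimal sketch of both, assuming only the POSIX calls in the table below (error handling omitted):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void) {
    // Pipe IPC: parent writes, child reads
    int fds[2];
    pipe(fds);

    // Shared memory: one int visible to both processes after fork
    int* shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *shared = 42;

    if (fork() == 0) {                      // Child
        char buf[32] = {0};
        close(fds[1]);
        read(fds[0], buf, sizeof(buf) - 1);
        printf("Child receives: \"%s\"\n", buf);
        printf("Child reads %d from shared memory\n", *shared);
        *shared = 99;                       // Visible to parent: MAP_SHARED
        _exit(0);
    }

    close(fds[0]);
    const char* msg = "Hello from parent";
    write(fds[1], msg, strlen(msg));
    wait(NULL);
    printf("Parent sees child wrote %d back\n", *shared);
    return 0;
}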

Key System Calls to Master

System Call | Purpose
fork() | Create a child process (copy of the parent)
exec*() | Replace the current process image with a new program
wait() / waitpid() | Wait for a child to terminate
pipe() | Create a unidirectional communication channel
mmap() | Create a shared memory region (MAP_SHARED)
shmget() / shmat() | System V shared memory

Project 2: Multi-threaded Task Executor (pthreads)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Concurrency / Thread Management
  • Software or Tool: POSIX Threads (pthreads)
  • Main Book: The Linux Programming Interface by Michael Kerrisk

What you’ll build

A command-line task executor that spawns N threads, each pulling tasks from a shared queue, with visible race conditions when you forget to lock, and proper synchronization when you add mutexes.

Why it teaches kernel threads

This is where threads become real. You’ll see that threads share memory by default (unlike processes). You’ll intentionally create race conditions, watch the chaos, then fix them. This visceral experience of “broken → fixed” is the best teacher.

Core challenges you’ll face

  • Race conditions (counter increments “lose” updates) → maps to critical sections
  • Mutex usage (lock before access, unlock after) → maps to mutual exclusion
  • Thread joining (waiting for all threads to finish) → maps to thread lifecycle
  • Shared queue access (producer-consumer pattern) → maps to concurrent data structures

Resources for Key Challenges

Key Concepts

Concept | Resource
Thread Creation & Joining | The Linux Programming Interface, Chapter 29 (Kerrisk)
Race Conditions | Operating Systems: Three Easy Pieces, Chapter 26 (OSTEP)
Mutexes | C++ Concurrency in Action, Chapter 3 (Anthony Williams)

Difficulty & Time

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Project 1 completed, basic C

Real World Outcome

# First run: NO LOCKS (intentional race condition)
$ ./task_executor --threads 4 --tasks 10000 --no-lock
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 0  # OOPS! Same task!
Thread 2: processing task 1
...
Expected completed: 10000
Actual completed: 9847  # RACE CONDITION! Lost updates!

# Second run: WITH LOCKS
$ ./task_executor --threads 4 --tasks 10000
Starting 4 threads to process 10000 tasks...
Thread 0: processing task 0
Thread 1: processing task 1
Thread 2: processing task 2
...
Expected completed: 10000
Actual completed: 10000  # CORRECT!
Total time: 0.23s (parallel speedup!)

Learning Milestones

  1. Threads share memory (see same variable) → You understand thread vs process
  2. Race condition breaks count → You understand critical sections
  3. Mutex fixes the race → You understand synchronization
  4. Parallel speedup measurable → You understand concurrency vs parallelism

Implementation Hints

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // strcmp()
#include <pthread.h>

#define NUM_THREADS 4
#define NUM_INCREMENTS 1000000

long shared_counter = 0;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
int use_mutex = 1;

void* increment_counter(void* arg) {
    int thread_id = *(int*)arg;

    for (int i = 0; i < NUM_INCREMENTS; i++) {
        if (use_mutex) {
            pthread_mutex_lock(&counter_mutex);
            shared_counter++;
            pthread_mutex_unlock(&counter_mutex);
        } else {
            shared_counter++;  // Race condition!
        }
    }

    printf("Thread %d finished\n", thread_id);
    return NULL;
}

int main(int argc, char* argv[]) {
    if (argc > 1 && strcmp(argv[1], "--no-lock") == 0) {
        use_mutex = 0;
        printf("Running WITHOUT mutex (expect race condition)\n");
    } else {
        printf("Running WITH mutex\n");
    }

    pthread_t threads[NUM_THREADS];
    int thread_ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        thread_ids[i] = i;
        pthread_create(&threads[i], NULL, increment_counter, &thread_ids[i]);
    }

    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }

    long expected = (long)NUM_THREADS * NUM_INCREMENTS;
    printf("Expected: %ld\n", expected);
    printf("Actual:   %ld\n", shared_counter);
    printf("Lost:     %ld\n", expected - shared_counter);

    return 0;
}

Compile with: gcc -pthread -o task_executor task_executor.c
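
The counter demo above shows the race; the actual executor pulls task IDs from a shared queue. A minimal sketch of that queue with one mutex guarding the cursor (task_queue_t and queue_pop are hypothetical names, not pthreads API):

#include <pthread.h>

// Shared task queue: a cursor into a fixed range of task IDs,
// protected by a single mutex (the simplest correct design).
typedef struct {
    int next_task;
    int total_tasks;
    pthread_mutex_t lock;
} task_queue_t;

// Returns the next task ID, or -1 when the queue is drained.
int queue_pop(task_queue_t* q) {
    pthread_mutex_lock(&q->lock);
    int task = (q->next_task < q->total_tasks) ? q->next_task++ : -1;
    pthread_mutex_unlock(&q->lock);
    return task;
}

Without the lock, two workers can read the same next_task before either increments it, which is exactly the “Same task!” duplication in the --no-lock run above.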

Key pthread Functions to Master

Function | Purpose
pthread_create() | Create a new thread
pthread_join() | Wait for a thread to terminate
pthread_mutex_init() | Initialize a mutex
pthread_mutex_lock() | Acquire the lock (blocks if held)
pthread_mutex_unlock() | Release the lock
pthread_mutex_destroy() | Clean up mutex resources

Project 3: Build Your Own Mutex & Semaphore

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Synchronization Primitives / Atomics
  • Software or Tool: Atomic operations, futex
  • Main Book: The Linux Programming Interface by Michael Kerrisk

What you’ll build

Your own mutex and semaphore from scratch using atomic compare-and-swap (CAS) operations, spinlocks, and optionally the Linux futex syscall for sleeping.

Why it teaches synchronization deeply

Most people use pthread_mutex_lock() as a black box. By building your own, you’ll understand:

  • What is a spinlock?
  • Why does spinning waste CPU?
  • What is a futex and why does it exist?
  • What’s the difference between a mutex and a semaphore at the implementation level?

Core challenges you’ll face

  • Atomic operations (why normal counter++ isn’t thread-safe) → maps to memory model
  • Spinlock implementation (busy-wait CAS loop) → maps to lock-free programming
  • Futex for sleeping (avoid wasting CPU spinning) → maps to kernel integration
  • Semaphore counting (allow N concurrent accessors) → maps to resource pools

Resources for Key Challenges

Key Concepts

Concept | Resource
Atomic Operations | C++ Concurrency in Action, Chapter 5 (Anthony Williams)
Futex Deep Dive | The Linux Programming Interface, Chapter 30 (Kerrisk)
Memory Barriers | Rust Atomics and Locks by Mara Bos (excellent even for C programmers)

Difficulty & Time

  • Difficulty: Advanced
  • Time estimate: 1-2 weeks
  • Prerequisites: Projects 1-2 completed, understanding of assembly basics

Real World Outcome

$ ./my_sync_test

=== Testing MY_MUTEX (spinlock-based) ===
Spawning 8 threads, each incrementing counter 100000 times
No mutex: counter = 634982 (WRONG - race condition)
pthread_mutex: counter = 800000 (correct)
my_mutex: counter = 800000 (correct!)

=== Testing MY_SEMAPHORE ===
Semaphore initialized with count = 3
Thread 0: acquired (active: 1)
Thread 1: acquired (active: 2)
Thread 2: acquired (active: 3)
Thread 3: waiting... (sem count = 0)
Thread 0: released
Thread 3: acquired (active: 3)
...
All threads completed, max concurrent = 3 ✓

=== Spinlock vs Futex comparison ===
Spinlock (high contention): 2.3s, CPU 780%
Futex-based (high contention): 0.4s, CPU 95%
Conclusion: Futex avoids wasting CPU when waiting!

Learning Milestones

  1. Spinlock works → You understand CAS and atomic operations
  2. You see ~800% CPU usage with the spinlock → You understand why sleeping matters
  3. Futex reduces CPU usage → You understand kernel/user-space cooperation
  4. Semaphore limits concurrency to N → You understand counting semaphores

Implementation Hints

Spinlock Mutex

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;
} my_spinlock_t;

void my_spinlock_init(my_spinlock_t* lock) {
    atomic_flag_clear(&lock->locked);
}

void my_spinlock_lock(my_spinlock_t* lock) {
    // Spin until we acquire the lock
    while (atomic_flag_test_and_set(&lock->locked)) {
        // Busy-wait (wastes CPU!)
        // Could add: __builtin_ia32_pause() for x86
    }
}

void my_spinlock_unlock(my_spinlock_t* lock) {
    atomic_flag_clear(&lock->locked);
}

Futex-based Mutex (Linux)

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>      // syscall()
#include <stdatomic.h>

typedef struct {
    atomic_int state;  // 0 = unlocked, 1 = locked
} my_futex_mutex_t;

void my_futex_lock(my_futex_mutex_t* m) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&m->state, &expected, 1)) {
        // Failed to acquire - sleep until state changes
        syscall(SYS_futex, &m->state, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;  // Reset for next try
    }
}

void my_futex_unlock(my_futex_mutex_t* m) {
    atomic_store(&m->state, 0);
    // Wake one waiting thread. (Production mutexes track whether anyone
    // is actually waiting so they can skip this syscall on the fast path.)
    syscall(SYS_futex, &m->state, FUTEX_WAKE, 1, NULL, NULL, 0);
}

Counting Semaphore

#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int count;   // atomic is belt-and-braces here; the mutex already guards it
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} my_semaphore_t;

void my_sem_init(my_semaphore_t* sem, int initial) {
    atomic_store(&sem->count, initial);
    pthread_mutex_init(&sem->mutex, NULL);
    pthread_cond_init(&sem->cond, NULL);
}

void my_sem_wait(my_semaphore_t* sem) {
    pthread_mutex_lock(&sem->mutex);
    while (atomic_load(&sem->count) <= 0) {
        pthread_cond_wait(&sem->cond, &sem->mutex);
    }
    atomic_fetch_sub(&sem->count, 1);
    pthread_mutex_unlock(&sem->mutex);
}

void my_sem_post(my_semaphore_t* sem) {
    pthread_mutex_lock(&sem->mutex);
    atomic_fetch_add(&sem->count, 1);
    pthread_cond_signal(&sem->cond);
    pthread_mutex_unlock(&sem->mutex);
}
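
A quick way to check the semaphore behaves like the demo output: push 8 threads through a count-3 semaphore and record the high-water mark of concurrent holders. A sketch assuming the my_sem_* functions above (sem_worker, active, and max_active are hypothetical names):

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdatomic.h>

my_semaphore_t sem;
atomic_int active = 0, max_active = 0;

void* sem_worker(void* arg) {
    (void)arg;
    my_sem_wait(&sem);
    int now = atomic_fetch_add(&active, 1) + 1;
    int prev = atomic_load(&max_active);
    while (now > prev &&
           !atomic_compare_exchange_weak(&max_active, &prev, now)) {}
    usleep(10000);                       // Hold a slot briefly
    atomic_fetch_sub(&active, 1);
    my_sem_post(&sem);
    return NULL;
}

int main(void) {
    my_sem_init(&sem, 3);                // At most 3 holders at once
    pthread_t t[8];
    for (int i = 0; i < 8; i++) pthread_create(&t[i], NULL, sem_worker, NULL);
    for (int i = 0; i < 8; i++) pthread_join(t[i], NULL);
    printf("max concurrent = %d (expect <= 3)\n", atomic_load(&max_active));
    return 0;
}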

Key Atomic Operations to Master

Operation | Purpose
atomic_flag_test_and_set() | Atomically set the flag and return its old value
atomic_compare_exchange_weak() | CAS - the building block of lock-free code
atomic_fetch_add() | Atomically add and return the old value
atomic_load() / atomic_store() | Thread-safe read/write

Project 4: Green Thread Library (User-Space Scheduler)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 5: Master
  • Knowledge Area: Runtime Systems / Cooperative Scheduling
  • Software or Tool: ucontext.h, setjmp/longjmp, or raw assembly
  • Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau

What you’ll build

A complete green thread (user-space thread) library that manages its own stack allocations, implements cooperative scheduling (yield), and multiplexes many green threads onto a single OS thread.

Why it teaches threading deeply

This is where you become the operating system. You’ll allocate stacks, save/restore registers, and implement yield(). After this project, you’ll understand exactly what Go’s goroutines and Rust’s async/await are doing under the hood.

Core challenges you’ll face

  • Stack allocation (each green thread needs its own stack) → maps to memory layout
  • Context switching (save registers, swap stack pointer) → maps to CPU state
  • Scheduler implementation (round-robin, run queue) → maps to scheduling algorithms
  • Cooperative vs preemptive (explicit yield vs timer interrupt) → maps to scheduling models

Resources for Key Challenges

Key Concepts

Concept | Resource
Context Switching | Operating Systems: Three Easy Pieces, Chapter 6 (OSTEP)
Stack Management | Computer Systems: A Programmer’s Perspective, Chapter 3 (Bryant & O’Hallaron)
ucontext API | man pages + GNU C Library docs

Difficulty & Time

  • Difficulty: Advanced-Expert
  • Time estimate: 2-3 weeks
  • Prerequisites: Projects 1-3 completed, some assembly knowledge

Real World Outcome

$ ./green_threads_demo

=== Green Thread Library Demo ===
Creating 1000 green threads on 1 OS thread...

Green thread 0: Hello! (stack at 0x7f1234560000)
Green thread 0: Yielding...
Green thread 1: Hello! (stack at 0x7f1234570000)
Green thread 1: Yielding...
Green thread 2: Hello! (stack at 0x7f1234580000)
...
Green thread 0: Resumed!
Green thread 0: Done!
...

Stats:
- Total green threads created: 1000
- OS threads used: 1
- Context switches performed: 5000
- Average switch time: 45 nanoseconds

For comparison:
- 1000 OS threads would use ~8GB stack memory
- 1000 green threads used ~8MB stack memory

Learning Milestones

  1. Single green thread runs to completion → You understand stack allocation
  2. Two threads yield to each other → You understand context switching
  3. Round-robin scheduler works → You understand scheduling
  4. 1000 threads work smoothly → You understand why green threads scale

Implementation Hints

Using ucontext.h (simpler approach)

#include <ucontext.h>
#include <stdlib.h>
#include <stdio.h>

#define STACK_SIZE 16384
#define MAX_THREADS 64

typedef struct {
    ucontext_t context;
    char stack[STACK_SIZE];
    int id;
    int finished;
} green_thread_t;

green_thread_t threads[MAX_THREADS];
int current_thread = 0;
int num_threads = 0;
ucontext_t main_context;

void gt_yield() {
    int prev = current_thread;

    // Find next runnable thread (round-robin)
    do {
        current_thread = (current_thread + 1) % num_threads;
    } while (threads[current_thread].finished && current_thread != prev);

    if (current_thread != prev) {
        swapcontext(&threads[prev].context, &threads[current_thread].context);
    }
}

void gt_create(void (*func)(void*), void* arg) {
    green_thread_t* t = &threads[num_threads];
    t->id = num_threads;
    t->finished = 0;

    getcontext(&t->context);
    t->context.uc_stack.ss_sp = t->stack;
    t->context.uc_stack.ss_size = STACK_SIZE;
    t->context.uc_link = &main_context;

    // NOTE: passing a pointer through makecontext's int varargs works on
    // common 64-bit ABIs but is not strictly portable.
    makecontext(&t->context, (void (*)())func, 1, arg);
    num_threads++;
}

void thread_func(void* arg) {
    int id = *(int*)arg;
    for (int i = 0; i < 3; i++) {
        printf("Thread %d: iteration %d\n", id, i);
        gt_yield();
    }
    printf("Thread %d: done!\n", id);
    threads[id].finished = 1;
}
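
These fragments still need a driver. A minimal sketch of main(): create a few green threads, then keep entering unfinished ones until all are done (a returning thread lands back in main via uc_link = &main_context):

static int args[4];

int main(void) {
    for (int i = 0; i < 4; i++) {
        args[i] = i;
        gt_create(thread_func, &args[i]);
    }

    // Enter unfinished threads until none remain; when a thread's
    // function returns, uc_link resumes us right here in the loop.
    int alive = 1;
    while (alive) {
        alive = 0;
        for (int i = 0; i < num_threads; i++) {
            if (!threads[i].finished) {
                alive = 1;
                current_thread = i;
                swapcontext(&main_context, &threads[i].context);
            }
        }
    }
    printf("All %d green threads finished\n", num_threads);
    return 0;
}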

Raw Assembly Context Switch (x86-64)

#include <stdint.h>

// Context: the callee-saved registers plus the stack pointer
// (System V x86-64 ABI; caller-saved registers are already on the stack)
typedef struct {
    uint64_t rsp;
    uint64_t r15;
    uint64_t r14;
    uint64_t r13;
    uint64_t r12;
    uint64_t rbx;
    uint64_t rbp;
} context_t;

// Assembly function to switch contexts
// Saves current registers, restores target's registers
extern void context_switch(context_t* old, context_t* new);

; context_switch.s (NASM syntax)
global context_switch

context_switch:
    ; Save old context (rdi = old)
    mov [rdi + 0],  rsp
    mov [rdi + 8],  r15
    mov [rdi + 16], r14
    mov [rdi + 24], r13
    mov [rdi + 32], r12
    mov [rdi + 40], rbx
    mov [rdi + 48], rbp

    ; Restore new context (rsi = new)
    mov rsp, [rsi + 0]
    mov r15, [rsi + 8]
    mov r14, [rsi + 16]
    mov r13, [rsi + 24]
    mov r12, [rsi + 32]
    mov rbx, [rsi + 40]
    mov rbp, [rsi + 48]

    ret

Green Thread vs OS Thread Comparison

Aspect | OS Thread (kernel) | Green Thread (user)
Created by | pthread_create() | Your library
Scheduled by | OS kernel | Your scheduler
Context switch | ~1-10 μs (syscall) | ~50-100 ns (function call)
Stack size | 1-8 MB default | You decide (8 KB typical)
10,000 threads | 10-80 GB memory | 80 MB memory
Blocking syscall | Only that thread blocks | ALL green threads block!

Project 5: Dining Philosophers Visualizer

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Deadlock / Synchronization
  • Software or Tool: ncurses or terminal graphics
  • Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau

What you’ll build

An animated terminal visualization of the Dining Philosophers problem, where you can toggle between naive (deadlock-prone), ordered locking, and semaphore-based solutions, watching deadlocks happen in real-time.

Why it teaches deadlocks

Deadlock is abstract until you see it. Watching 5 philosophers all grab their left fork and freeze forever makes the concept visceral. Then implementing fixes (resource ordering, arbitrator semaphore) teaches how to prevent them.

Core challenges you’ll face

  • Deadlock creation (circular wait) → maps to Coffman conditions
  • Deadlock prevention (resource ordering) → maps to deadlock avoidance
  • Livelock (philosophers keep retrying) → maps to liveness properties
  • Starvation (one philosopher never eats) → maps to fairness

Key Concepts

Concept | Resource
Dining Philosophers Problem | Operating Systems: Three Easy Pieces, Chapter 31 (OSTEP)
Deadlock Conditions | Operating System Concepts, Chapter 7 (Silberschatz)
Semaphore Solutions | Day 12: Mutex and Semaphores - Exploring OS

Difficulty & Time

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Project 2-3 completed

Real World Outcome

$ ./dining_philosophers --mode naive

    🍝 Dining Philosophers 🍝

     P0          P1
      🧠          🧠
    🍴  🍴      🍴  🍴
        \      /
         \    /
    P4 🧠--🍽️--🧠 P2
         /    \
        /      \
      🍴        🍴
      🧠
     P3

Status: [Mode: NAIVE - Deadlock prone!]
P0: 🍴 HOLDING left fork, WAITING for right
P1: 🍴 HOLDING left fork, WAITING for right
P2: 🍴 HOLDING left fork, WAITING for right
P3: 🍴 HOLDING left fork, WAITING for right
P4: 🍴 HOLDING left fork, WAITING for right

⚠️  DEADLOCK DETECTED! All philosophers stuck!

Press 's' for semaphore solution, 'o' for ordered locking...

Learning Milestones

  1. Deadlock happens visually → You understand circular wait
  2. Resource ordering prevents deadlock → You understand prevention
  3. Semaphore (max 4 eating) works → You understand limiting concurrency
  4. You can explain all 4 Coffman conditions → Mastery

The Four Coffman Conditions for Deadlock

All four must be present for deadlock to occur:

  1. Mutual Exclusion: Resources cannot be shared (fork can only be held by one philosopher)
  2. Hold and Wait: Process holds one resource while waiting for another
  3. No Preemption: Resources cannot be forcibly taken away
  4. Circular Wait: P0 waits for P1, P1 waits for P2, … P4 waits for P0

Implementation Hints
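
All three variants below share the same scaffolding. A minimal sketch (think() and eat() just print and sleep; initialize each fork mutex with pthread_mutex_init() before starting the threads):

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM_PHILOSOPHERS 5

pthread_mutex_t forks[NUM_PHILOSOPHERS];  // One mutex per fork

void think(int id) { printf("P%d: thinking\n", id); usleep(1000); }
void eat(int id)   { printf("P%d: eating\n",   id); usleep(1000); }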

Naive Solution (causes deadlock)

void philosopher_naive(int id) {
    int left = id;
    int right = (id + 1) % NUM_PHILOSOPHERS;

    while (1) {
        think(id);

        pthread_mutex_lock(&forks[left]);   // Grab left fork
        printf("P%d: got left fork\n", id);

        usleep(1000);  // Small delay makes deadlock more likely

        pthread_mutex_lock(&forks[right]);  // Grab right fork (DEADLOCK!)
        printf("P%d: got right fork\n", id);

        eat(id);

        pthread_mutex_unlock(&forks[right]);
        pthread_mutex_unlock(&forks[left]);
    }
}

Solution 1: Resource Ordering

void philosopher_ordered(int id) {
    int left = id;
    int right = (id + 1) % NUM_PHILOSOPHERS;

    // Always lock lower-numbered fork first
    int first = (left < right) ? left : right;
    int second = (left < right) ? right : left;

    while (1) {
        think(id);

        pthread_mutex_lock(&forks[first]);
        pthread_mutex_lock(&forks[second]);

        eat(id);

        pthread_mutex_unlock(&forks[second]);
        pthread_mutex_unlock(&forks[first]);
    }
}

Solution 2: Semaphore (limit to N-1)

sem_t table_seats;  // Initialized to NUM_PHILOSOPHERS - 1

void philosopher_semaphore(int id) {
    int left = id;
    int right = (id + 1) % NUM_PHILOSOPHERS;

    while (1) {
        think(id);

        sem_wait(&table_seats);  // Only N-1 can try to eat at once

        pthread_mutex_lock(&forks[left]);
        pthread_mutex_lock(&forks[right]);

        eat(id);

        pthread_mutex_unlock(&forks[right]);
        pthread_mutex_unlock(&forks[left]);

        sem_post(&table_seats);
    }
}
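
The “DEADLOCK DETECTED” banner in the demo can come from a simple watchdog thread: if no philosopher has eaten for a full second, the table is almost certainly wedged (strictly, lack of progress could also be livelock or starvation). A sketch, assuming eat(id) also bumps a per-philosopher meal counter (the meals array is hypothetical):

#include <stdio.h>
#include <time.h>
#include <stdatomic.h>

atomic_long meals[NUM_PHILOSOPHERS];  // eat(id) does atomic_fetch_add(&meals[id], 1)

void* deadlock_watchdog(void* arg) {
    (void)arg;
    long prev_total = -1;
    while (1) {
        struct timespec one_sec = {1, 0};
        nanosleep(&one_sec, NULL);
        long total = 0;
        for (int i = 0; i < NUM_PHILOSOPHERS; i++)
            total += atomic_load(&meals[i]);
        if (total == prev_total)
            printf("⚠️  DEADLOCK DETECTED! No meals in the last second\n");
        prev_total = total;
    }
    return NULL;
}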

Project 6: Thread Pool with Work Stealing

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Parallel Computing / Load Balancing
  • Software or Tool: Lock-free queues, work stealing deques
  • Main Book: C++ Concurrency in Action by Anthony Williams

What you’ll build

A production-quality thread pool where worker threads can “steal” tasks from each other’s queues when idle, maximizing CPU utilization and reducing latency.

Why it teaches advanced concurrency

Real systems use thread pools, not raw threads. Work stealing is how Go, Tokio (Rust), and Java’s ForkJoin work. Building this teaches you lock-free data structures, cache locality, and real-world parallel performance.

Core challenges you’ll face

  • Thread-safe queue (multiple threads push/pop) → maps to concurrent data structures
  • Work stealing deque (steal from back, pop from front) → maps to load balancing
  • Lock-free implementation (CAS-based operations) → maps to lock-free programming
  • Cache-friendly design (avoid false sharing) → maps to performance optimization

Key Concepts

Concept | Resource
Thread Pools | C++ Concurrency in Action, Chapter 9 (Anthony Williams)
Work Stealing | “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson (paper)
Lock-Free Queues | Rust Atomics and Locks, Chapter 6 (Mara Bos)

Difficulty & Time

  • Difficulty: Expert
  • Time estimate: 2-3 weeks
  • Prerequisites: Projects 1-4 completed

Real World Outcome

$ ./thread_pool_bench --workers 8 --tasks 100000

=== Thread Pool Benchmark ===

[Naive approach: spawn thread per task]
Time: 45.2s
Threads created: 100000
Context switches: 2,340,000

[Thread pool with single queue]
Time: 2.1s
Threads created: 8
Context switches: 12,000
Queue contention: HIGH (single lock)

[Thread pool with work stealing]
Time: 0.8s
Threads created: 8
Context switches: 9,000
Load balance: 12.5% per worker (perfect!)
Steals performed: 3,421

Work stealing is 56x faster than naive, 2.6x faster than single queue!

Learning Milestones

  1. Basic thread pool works → You understand pooling
  2. Per-worker queues reduce contention → You understand lock contention
  3. Work stealing balances load → You understand dynamic scheduling
  4. Benchmark shows real speedup → You understand parallel efficiency

Architecture

                    ┌─────────────────────────────────────────┐
                    │            Thread Pool                   │
                    └─────────────────────────────────────────┘
                                      │
              ┌───────────────────────┼───────────────────────┐
              │                       │                       │
        ┌─────▼─────┐          ┌─────▼─────┐          ┌─────▼─────┐
        │  Worker 0  │          │  Worker 1  │          │  Worker 2  │
        │            │          │            │          │            │
        │ ┌────────┐ │          │ ┌────────┐ │          │ ┌────────┐ │
        │ │ Deque  │ │◄─STEAL───│ │ Deque  │ │───STEAL─►│ │ Deque  │ │
        │ │        │ │          │ │ T T T  │ │          │ │        │ │
        │ └────────┘ │          │ └────────┘ │          │ └────────┘ │
        └────────────┘          └────────────┘          └────────────┘
              │                       │                       │
              │    Pop from front     │    Steal from back    │
              ▼                       ▼                       ▼
         [Execute]               [Execute]               [Execute]

Implementation Hints

Work-Stealing Deque (Chase-Lev)

#include <stdatomic.h>

#define DEQUE_SIZE 1024  // Sketch: no overflow check; a full Chase-Lev deque grows its buffer

typedef struct {
    atomic_size_t top;     // Thieves steal from here
    atomic_size_t bottom;  // Owner pushes/pops here
    void* buffer[DEQUE_SIZE];
} work_stealing_deque_t;

// Owner: push task (only owner calls this)
void deque_push(work_stealing_deque_t* d, void* task) {
    size_t b = atomic_load(&d->bottom);
    d->buffer[b % DEQUE_SIZE] = task;
    atomic_thread_fence(memory_order_release);
    atomic_store(&d->bottom, b + 1);
}

// Owner: pop task (only owner calls this)
void* deque_pop(work_stealing_deque_t* d) {
    size_t b = atomic_load(&d->bottom) - 1;
    atomic_store(&d->bottom, b);
    atomic_thread_fence(memory_order_seq_cst);
    size_t t = atomic_load(&d->top);

    if (t <= b) {
        void* task = d->buffer[b % DEQUE_SIZE];
        if (t == b) {
            // Last element - race with stealers
            if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
                task = NULL;  // Lost race
            }
            atomic_store(&d->bottom, b + 1);
        }
        return task;
    }
    atomic_store(&d->bottom, b + 1);
    return NULL;
}

// Thief: steal task (other workers call this)
void* deque_steal(work_stealing_deque_t* d) {
    size_t t = atomic_load(&d->top);
    atomic_thread_fence(memory_order_seq_cst);
    size_t b = atomic_load(&d->bottom);

    if (t < b) {
        void* task = d->buffer[t % DEQUE_SIZE];
        if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) {
            return NULL;  // Lost race with another stealer
        }
        return task;
    }
    return NULL;
}

Thread Pool Structure

typedef struct {
    pthread_t* threads;
    work_stealing_deque_t* deques;  // One per worker
    int num_workers;
    atomic_bool shutdown;
} thread_pool_t;

typedef struct {
    thread_pool_t* pool;
    int id;
} worker_context_t;

void* worker_loop(void* arg) {
    worker_context_t* ctx = arg;

    while (!atomic_load(&ctx->pool->shutdown)) {
        // Try to get task from own deque
        void* task = deque_pop(&ctx->pool->deques[ctx->id]);

        if (!task) {
            // Own deque empty - try to steal from others
            for (int i = 0; i < ctx->pool->num_workers; i++) {
                if (i != ctx->id) {
                    task = deque_steal(&ctx->pool->deques[i]);
                    if (task) break;
                }
            }
        }

        if (task) {
            execute_task(task);
        } else {
            // No work anywhere - yield CPU
            sched_yield();
        }
    }
    return NULL;
}
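
Two pieces the sketch above leaves out: getting tasks in, and shutting down. pool_submit and pool_shutdown below are hypothetical helpers; note that deque_push is owner-only, so submission from outside the workers needs a dedicated injector queue or a lock:

// Hypothetical: external submission pushes onto deque 0. Safe only if
// worker 0 is not pushing concurrently; real pools (Tokio, Go's runtime)
// use a separate injector/global queue for outside submissions.
void pool_submit(thread_pool_t* pool, void* task) {
    deque_push(&pool->deques[0], task);
}

void pool_shutdown(thread_pool_t* pool) {
    atomic_store(&pool->shutdown, 1);          // Workers observe and exit
    for (int i = 0; i < pool->num_workers; i++)
        pthread_join(pool->threads[i], NULL);
}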

Project 7: Concurrent Hash Map

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Java
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Concurrent Data Structures
  • Software or Tool: Lock striping, RCU
  • Main Book: C++ Concurrency in Action by Anthony Williams

What you’ll build

A high-performance concurrent hash map that supports concurrent reads and writes from multiple threads, using techniques like lock striping, read-write locks, and optionally RCU (Read-Copy-Update).

Why it teaches real-world concurrency

Every serious system needs concurrent data structures. Building a concurrent hash map teaches you the trade-offs between correctness, performance, and complexity. You’ll understand why Java’s ConcurrentHashMap and Go’s sync.Map exist.

Core challenges you’ll face

  • Lock striping (lock per bucket, not whole table) → maps to fine-grained locking
  • Read-write locks (many readers, one writer) → maps to reader/writer problem
  • Resize under contention (grow table without blocking readers) → maps to online operations
  • RCU pattern (readers never block) → maps to lock-free reads

Key Concepts

Concept | Resource
Lock Striping | C++ Concurrency in Action, Chapter 6 (Anthony Williams)
Read-Write Locks | The Linux Programming Interface, Chapter 30 (Kerrisk)
RCU Basics | “What is RCU, Fundamentally?” - Paul McKenney (LWN article)

Difficulty & Time

  • Difficulty: Expert
  • Time estimate: 2-3 weeks
  • Prerequisites: Projects 1-3 and basic hash table implementation

Real World Outcome

$ ./concurrent_map_bench --readers 6 --writers 2 --ops 1000000

=== Concurrent Hash Map Benchmark ===

[Single global lock]
Throughput: 234,000 ops/sec
Reader wait time: 89% (readers blocked by writers)
Writer wait time: 45%

[Lock striping (256 buckets)]
Throughput: 1,890,000 ops/sec (8x faster!)
Reader wait time: 12%
Writer wait time: 8%

[RW-lock per bucket]
Throughput: 2,340,000 ops/sec
Reader wait time: 3%
Writer wait time: 15%

[RCU-style (readers never block)]
Throughput: 4,120,000 ops/sec
Reader wait time: 0% (readers NEVER blocked!)
Writer wait time: 5%
Memory overhead: +15% (deferred reclamation)

Correctness check: All 1,000,000 operations verified ✓

Learning Milestones

  1. Single lock works but is slow → You understand coarse-grained locking
  2. Lock striping dramatically improves throughput → You understand fine-grained locking
  3. Read-write locks help read-heavy workloads → You understand reader/writer trade-offs
  4. RCU gives maximum read performance → You understand lock-free reads

Implementation Hints

Lock Striping

#define NUM_BUCKETS 256
#define NUM_LOCKS 16  // Fewer locks than buckets

typedef struct node {
    char* key;
    void* value;
    struct node* next;
} node_t;

typedef struct {
    node_t* buckets[NUM_BUCKETS];
    pthread_mutex_t locks[NUM_LOCKS];
} concurrent_map_t;

static inline int get_lock_index(int bucket) {
    return bucket % NUM_LOCKS;  // Multiple buckets share a lock
}

void* map_get(concurrent_map_t* map, const char* key) {
    int bucket = hash(key) % NUM_BUCKETS;
    int lock_idx = get_lock_index(bucket);

    pthread_mutex_lock(&map->locks[lock_idx]);

    node_t* node = map->buckets[bucket];
    while (node) {
        if (strcmp(node->key, key) == 0) {
            void* value = node->value;
            pthread_mutex_unlock(&map->locks[lock_idx]);
            return value;
        }
        node = node->next;
    }

    pthread_mutex_unlock(&map->locks[lock_idx]);
    return NULL;
}

void map_put(concurrent_map_t* map, const char* key, void* value) {
    int bucket = hash(key) % NUM_BUCKETS;
    int lock_idx = get_lock_index(bucket);

    pthread_mutex_lock(&map->locks[lock_idx]);

    // Check if key exists
    node_t* node = map->buckets[bucket];
    while (node) {
        if (strcmp(node->key, key) == 0) {
            node->value = value;  // Update
            pthread_mutex_unlock(&map->locks[lock_idx]);
            return;
        }
        node = node->next;
    }

    // Insert new node
    node_t* new_node = malloc(sizeof(node_t));
    new_node->key = strdup(key);
    new_node->value = value;
    new_node->next = map->buckets[bucket];
    map->buckets[bucket] = new_node;

    pthread_mutex_unlock(&map->locks[lock_idx]);
}

Read-Write Lock Version

typedef struct {
    node_t* buckets[NUM_BUCKETS];
    pthread_rwlock_t locks[NUM_LOCKS];
} rw_concurrent_map_t;

void* map_get(rw_concurrent_map_t* map, const char* key) {
    int bucket = hash(key) % NUM_BUCKETS;
    int lock_idx = get_lock_index(bucket);

    pthread_rwlock_rdlock(&map->locks[lock_idx]);  // Read lock

    node_t* node = map->buckets[bucket];
    void* value = NULL;
    while (node) {
        if (strcmp(node->key, key) == 0) {
            value = node->value;
            break;
        }
        node = node->next;
    }

    pthread_rwlock_unlock(&map->locks[lock_idx]);
    return value;
}

void map_put(rw_concurrent_map_t* map, const char* key, void* value) {
    int bucket = hash(key) % NUM_BUCKETS;
    int lock_idx = get_lock_index(bucket);

    pthread_rwlock_wrlock(&map->locks[lock_idx]);  // Write lock

    // ... insertion logic ...

    pthread_rwlock_unlock(&map->locks[lock_idx]);
}
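
The RCU-style numbers in the outcome come from readers that never take a lock: they do a single acquire-load of the bucket head and walk a list that writers never mutate in place. A minimal sketch of the read side, assuming writers publish a rebuilt bucket list with one atomic store and defer freeing old nodes (rcu_map_t is a hypothetical name; reclamation, the genuinely hard part of RCU, is omitted):

#include <stdatomic.h>
#include <string.h>

typedef struct {
    _Atomic(node_t*) buckets[NUM_BUCKETS];   // Readers load, writers swap
} rcu_map_t;

// Read side: no lock, never blocks.
void* rcu_map_get(rcu_map_t* map, const char* key) {
    node_t* node = atomic_load_explicit(
        &map->buckets[hash(key) % NUM_BUCKETS], memory_order_acquire);
    while (node) {
        if (strcmp(node->key, key) == 0) return node->value;
        node = node->next;
    }
    return NULL;
}

A writer copies the affected nodes, builds the new list off to the side, then publishes it with a single release store of the bucket head; readers see either the old list or the new one, never a half-built mix.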

Concurrency Trade-offs

Approach | Read Performance | Write Performance | Complexity | Memory
Single lock | Poor | Poor | Low | Low
Lock striping | Good | Good | Medium | Low
RW-lock per bucket | Excellent | Moderate | Medium | Low
RCU | Perfect (never blocks) | Moderate | High | Higher

Project 8: Mini Operating System Scheduler (Capstone)

  • Main Programming Language: C + Assembly
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 5: Master
  • Knowledge Area: Operating Systems / Kernel Development
  • Software or Tool: QEMU, bare-metal x86
  • Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau

What you’ll build

A minimal bare-metal kernel (runs in QEMU) that implements preemptive multitasking with a timer interrupt, process/thread scheduling, and basic synchronization primitives—all running without Linux.

Why this is the ultimate test

Everything you learned culminates here. You ARE the operating system. There’s no pthread_create()—you implement context switching, timer interrupts, and scheduling from scratch on real (emulated) hardware.

Core challenges you’ll face

  • Bootloader (get code running without an OS) → maps to bare-metal programming
  • Timer interrupt (preemptive scheduling) → maps to interrupt handling
  • Task switching (save/restore full CPU state) → maps to context switch
  • Priority scheduler (multiple scheduling algorithms) → maps to scheduling policies
  • Kernel-mode synchronization (spinlocks in kernel) → maps to OS internals

Key Concepts

Concept | Resource
Bootloaders | Operating Systems: Three Easy Pieces, Appendix (OSTEP)
Interrupts | Computer Systems: A Programmer’s Perspective, Chapter 8 (Bryant & O’Hallaron)
Context Switching | Linux Kernel Development, Chapter 4 (Robert Love)
Scheduling | Operating Systems: Three Easy Pieces, Chapters 7-9 (OSTEP)

Difficulty & Time

  • Difficulty: Master
  • Time estimate: 1-2 months
  • Prerequisites: All previous projects completed, assembly knowledge

Real World Outcome

Booting mini-OS in QEMU...

[KERNEL] Timer interrupt handler installed
[KERNEL] Scheduler initialized (Round Robin)

[KERNEL] Creating Task 1: counter
[KERNEL] Creating Task 2: printer
[KERNEL] Creating Task 3: beeper

[Task 1] Counter: 0
[Task 2] Printing: Hello from Task 2!
[SCHEDULER] Timer tick! Switching 1 -> 2
[Task 2] Printing: Hello from Task 2!
[Task 3] BEEP!
[SCHEDULER] Timer tick! Switching 2 -> 3
[Task 3] BEEP!
[Task 1] Counter: 1
[SCHEDULER] Timer tick! Switching 3 -> 1
...

Press 'p' to show process table:
PID  STATE     PRIORITY  TICKS  NAME
1    RUNNING   5         42     counter
2    READY     5         38     printer
3    BLOCKED   5         25     beeper (waiting on semaphore)

Press 'd' to trigger deadlock demo...

Learning Milestones

  1. Kernel boots and prints → You understand bare-metal
  2. Timer interrupt fires → You understand hardware interrupts
  3. Two tasks switch preemptively → You ARE the scheduler
  4. Semaphores block/wake tasks → You understand kernel synchronization
  5. Priority scheduling works → Operating systems are demystified

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Your Mini-OS                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   Task 1     │  │   Task 2     │  │   Task 3     │           │
│  │  (counter)   │  │  (printer)   │  │  (beeper)    │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
│         │                 │                 │                    │
│         └────────────────┼─────────────────┘                    │
│                          │                                       │
│                    ┌─────▼─────┐                                │
│                    │ Scheduler │ ◄─── Timer Interrupt           │
│                    └─────┬─────┘                                │
│                          │                                       │
│         ┌────────────────┼────────────────┐                     │
│         │                │                │                      │
│  ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐              │
│  │   Mutex     │  │  Semaphore  │  │   Spinlock  │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│  Interrupt Handlers │ Memory Management │ I/O (VGA, Keyboard)   │
├─────────────────────────────────────────────────────────────────┤
│                    Hardware (via QEMU)                           │
│         x86 CPU │ PIC │ PIT (Timer) │ VGA │ Keyboard            │
└─────────────────────────────────────────────────────────────────┘

Implementation Roadmap

Phase 1: Bootloader & Basic Kernel

; boot.asm - Bootloader (loaded at 0x7C00)
[BITS 16]
[ORG 0x7C00]

start:
    ; Set up segments
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00

    ; Load kernel from disk
    mov ah, 0x02        ; BIOS read sectors
    mov al, 10          ; 10 sectors
    mov ch, 0           ; Cylinder 0
    mov cl, 2           ; Sector 2 (1-indexed)
    mov dh, 0           ; Head 0
    mov bx, 0x1000      ; Load at 0x1000
    int 0x13

    ; Switch to protected mode
    cli
    lgdt [gdt_descriptor]
    mov eax, cr0
    or eax, 1
    mov cr0, eax
    jmp 0x08:protected_mode

[BITS 32]
protected_mode:
    ; Set up segment registers
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esp, 0x90000

    ; Jump to kernel
    jmp 0x1000

; GDT
gdt_start:
    dq 0                    ; Null descriptor
gdt_code:
    dw 0xFFFF, 0x0000
    db 0x00, 0x9A, 0xCF, 0x00
gdt_data:
    dw 0xFFFF, 0x0000
    db 0x00, 0x92, 0xCF, 0x00
gdt_end:

gdt_descriptor:
    dw gdt_end - gdt_start - 1
    dd gdt_start

times 510-($-$$) db 0
dw 0xAA55               ; Boot signature

Phase 2: Timer Interrupt

// interrupts.c
#include <stdint.h>

#define PIC1_COMMAND 0x20
#define PIC1_DATA    0x21
#define PIT_COMMAND  0x43
#define PIT_DATA     0x40

// NOTE: sketch -- a real IRQ handler is entered through an assembly stub
// that saves registers and returns with iret; outb() is a one-line
// port-I/O helper written with inline assembly.
void timer_handler() {
    // Acknowledge interrupt (send EOI to the PIC)
    outb(PIC1_COMMAND, 0x20);

    // Call scheduler
    schedule();
}

void setup_timer(uint32_t frequency) {
    uint32_t divisor = 1193180 / frequency;

    outb(PIT_COMMAND, 0x36);
    outb(PIT_DATA, divisor & 0xFF);
    outb(PIT_DATA, (divisor >> 8) & 0xFF);
}

// IDT entry for timer (IRQ 0 -> INT 32)
void setup_idt() {
    idt[32].handler_low = (uint32_t)timer_handler & 0xFFFF;
    idt[32].handler_high = (uint32_t)timer_handler >> 16;
    idt[32].selector = 0x08;
    idt[32].flags = 0x8E;

    lidt(&idt_descriptor);
}

Phase 3: Task Structure & Context Switch

// task.h
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    uint32_t esi, edi, ebp, esp;
    uint32_t eip, eflags;
    uint32_t cs, ds, es, fs, gs, ss;
} registers_t;

typedef enum {
    TASK_READY,
    TASK_RUNNING,
    TASK_BLOCKED,
    TASK_TERMINATED
} task_state_t;

typedef struct task {
    int pid;
    char name[32];
    task_state_t state;
    int priority;
    registers_t regs;
    uint32_t* stack;
    struct task* next;
} task_t;

// In assembly
extern void context_switch(registers_t* old, registers_t* new);

Phase 4: Scheduler

// scheduler.c
task_t* current_task = NULL;
task_t* ready_queue = NULL;

void schedule() {
    if (!current_task) return;

    // Save current task state (done in interrupt handler)

    // Move current to end of queue if still runnable
    if (current_task->state == TASK_RUNNING) {
        current_task->state = TASK_READY;
        enqueue(&ready_queue, current_task);
    }

    // Get next task
    task_t* next = dequeue(&ready_queue);
    if (!next) {
        next = idle_task;  // Never block completely
    }

    next->state = TASK_RUNNING;
    current_task = next;

    // Context switch happens on return from interrupt
}
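
schedule() leans on two run-queue helpers the sketch leaves out; minimal versions over the task_t next pointers:

// Minimal run-queue helpers for the singly-linked ready list (sketch).
void enqueue(task_t** queue, task_t* task) {
    task->next = NULL;
    if (!*queue) { *queue = task; return; }
    task_t* t = *queue;
    while (t->next) t = t->next;
    t->next = task;
}

task_t* dequeue(task_t** queue) {
    task_t* t = *queue;
    if (t) *queue = t->next;
    return t;
}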

Phase 5: Kernel Semaphore

// sync.c
typedef struct {
    int count;
    task_t* wait_queue;
    spinlock_t lock;
} ksemaphore_t;

void ksem_wait(ksemaphore_t* sem) {
    spinlock_acquire(&sem->lock);

    while (sem->count <= 0) {
        // Block current task
        current_task->state = TASK_BLOCKED;
        enqueue(&sem->wait_queue, current_task);
        spinlock_release(&sem->lock);

        schedule();  // Switch to another task

        spinlock_acquire(&sem->lock);
    }

    sem->count--;
    spinlock_release(&sem->lock);
}

void ksem_signal(ksemaphore_t* sem) {
    spinlock_acquire(&sem->lock);

    sem->count++;

    if (sem->wait_queue) {
        task_t* task = dequeue(&sem->wait_queue);
        task->state = TASK_READY;
        enqueue(&ready_queue, task);
    }

    spinlock_release(&sem->lock);
}

Build & Run

# Makefile
CC = i686-elf-gcc
AS = nasm
LD = i686-elf-ld
QEMU = qemu-system-i386

CFLAGS = -ffreestanding -O2 -Wall -Wextra
LDFLAGS = -T linker.ld -nostdlib

all: os.img

boot.bin: boot.asm
	$(AS) -f bin $< -o $@

kernel.o: kernel.c
	$(CC) $(CFLAGS) -c $< -o $@

kernel.bin: kernel.o
	$(LD) $(LDFLAGS) $^ -o $@

os.img: boot.bin kernel.bin
	cat $^ > $@
	truncate -s 1440k $@

run: os.img
	$(QEMU) -fda $< -monitor stdio

clean:
	rm -f *.bin *.o *.img

Essential Resources Summary

Books (Priority Order)

  1. Operating Systems: Three Easy Pieces - FREE online, the best OS book
  2. Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron - Memory and low-level understanding
  3. The Linux Programming Interface by Michael Kerrisk - The bible for Linux system programming
  4. Rust Atomics and Locks by Mara Bos - Excellent even for C programmers on atomics/concurrency
  5. C++ Concurrency in Action by Anthony Williams - Best book on practical concurrency patterns
  6. Linux Kernel Development by Robert Love - When you’re ready for real kernel code

Papers (Advanced)

  • “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson
  • “What is RCU, Fundamentally?” - Paul McKenney (LWN article series)

Key Conceptual Distinctions

Question | Answer
Thread vs Process? | Threads share memory; processes are isolated
Concurrency vs Parallelism? | Concurrency = structure (dealing with many things); Parallelism = execution (doing many things simultaneously)
Mutex vs Semaphore? | Mutex = locking (ownership); Semaphore = signaling (counting)
Kernel vs User thread? | Kernel thread = the OS schedules it; User thread = your code schedules it
Spinlock vs Mutex? | Spinlock = busy-wait (wastes CPU); Mutex = sleep (efficient)
Green thread vs Coroutine? | Green thread = scheduled by a runtime; Coroutine = explicit yield by the programmer
Race condition vs Deadlock? | Race = wrong ordering causes a wrong result; Deadlock = circular wait causes a freeze
Preemptive vs Cooperative? | Preemptive = the OS forces a switch (timer); Cooperative = the task chooses when to yield

Appendix: Difficulty & Coolness Scales

Difficulty Levels

Level | Name | Description
1 | Beginner | Basic syntax, straightforward logic, single-purpose tools
2 | Intermediate | State management, async operations, third-party libraries
3 | Advanced | Concurrency, memory management, complex architectures
4 | Expert | Kernel-level, assembly, complex mathematical models
5 | Master | Self-referential systems, cryptography, bootstrapping

Coolness Levels

Level | Vibe | Description
1 | “This looks like work” | Vital for your career but zero flair
2 | “Neat, I guess” | Standard hobbyist projects
3 | “Hey, that’s pretty smart” | Mini versions of famous tools
4 | “Woah, you did that yourself?” | Assembly, kernel modules, bare-metal
5 | “Shut up and take my upvote!” | Visually stunning or technically impossible-looking

Business Potential

Level | Model | Description
1 | Resume Gold | Gets you a $200k+ job
2 | Micro-SaaS | $5-10/month subscriptions
3 | Service & Support | $500-2000/month retainers
4 | Open Core | Free engine, paid control plane
5 | Industry Disruptor | VC-backable platform play