← Back to all projects

DATA STRUCTURES FROM FIRST PRINCIPLES

Learning Data Structures: From First Principles

Goal: Deeply understand what data structures are, why they exist, how they work at the memory level, and when to use each one—through hands-on projects that produce real outcomes.


Part 1: Why Do Data Structures Exist?

The Fundamental Problem

Every program needs to store and organize collections of data. But here’s the critical insight:

How you organize data determines how fast your program runs.

Consider this: You have 1 million user records. How do you store them?

Operation Unsorted Array Sorted Array Hash Table Binary Search Tree
Find user by ID O(n) = 1M checks O(log n) = 20 checks O(1) = 1 check O(log n) = 20 checks
Add new user O(1) = instant O(n) = 1M shifts O(1) = instant O(log n) = 20 steps
Delete user O(n) = 1M checks O(n) = 1M shifts O(1) = instant O(log n) = 20 steps
List in order O(n log n) = sort O(n) = just read O(n log n) = sort O(n) = in-order walk

There is no perfect data structure. Every choice is a trade-off between:

  • Time (how fast are operations?)
  • Space (how much memory?)
  • Complexity (how hard to implement correctly?)

The Mental Model

Think of data structures as containers with rules:

┌─────────────────────────────────────────────────────────────────┐
│                    DATA STRUCTURE                                │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    INTERFACE                             │    │
│  │  - What operations can I do? (insert, find, delete...)  │    │
│  │  - How fast is each operation? (O(1), O(log n), O(n))   │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              │                                   │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                  IMPLEMENTATION                          │    │
│  │  - How is data physically stored in memory?             │    │
│  │  - How do the operations actually work?                 │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              │                                   │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    MEMORY                                │    │
│  │  - Contiguous? (array) or Scattered? (linked)          │    │
│  │  - Cache-friendly? (locality of reference)              │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

The Data Structure Family Tree

                           Data Structures
                                 │
            ┌────────────────────┴────────────────────┐
            │                                         │
       Linear                                    Non-Linear
            │                                         │
   ┌────────┼────────┐                    ┌──────────┼──────────┐
   │        │        │                    │          │          │
 Array   Linked    Stack                Tree      Graph      Hash
         List      Queue                  │                   Table
                     │           ┌────────┼────────┐
              ┌──────┴──────┐    │        │        │
              │             │   BST     Heap    Trie
           Priority     Deque    │
            Queue              B-Tree
                              AVL/RB

Part 2: The Learning Path

Phase 1: Foundations (3-4 weeks)

  1. Project 1: Memory Arena & Dynamic Array
  2. Project 2: Linked List Toolkit

Phase 2: Abstract Data Types (2-3 weeks)

  1. Project 3: Stack Machine (Calculator/VM)
  2. Project 4: Queue Systems (Task Scheduler)

Phase 3: Fast Lookup (3-4 weeks)

  1. Project 5: Hash Table (Dictionary/Cache)
  2. Project 6: Binary Search Tree & Balancing

Phase 4: Specialized Structures (3-4 weeks)

  1. Project 7: Heap & Priority Queue
  2. Project 8: Graph Algorithms Visualizer

Phase 5: Capstone (4-6 weeks)

  1. Project 9: Mini Database Engine

Project Comparison Table

Project Difficulty Time Key Insight Real-World Use
Memory Arena & Dynamic Array ⭐⭐ 1 week Memory is just bytes Every program uses arrays
Linked List Toolkit ⭐⭐ 1 week Pointers connect scattered memory Undo systems, playlists
Stack Machine ⭐⭐⭐ 1 week LIFO enables recursion/backtracking Compilers, calculators
Queue Systems ⭐⭐⭐ 1 week FIFO enables fairness/ordering Job schedulers, BFS
Hash Table ⭐⭐⭐⭐ 2 weeks Trade space for O(1) lookup Databases, caches, dictionaries
BST & Balancing ⭐⭐⭐⭐ 2 weeks Ordering enables fast search Databases, file systems
Heap & Priority Queue ⭐⭐⭐ 1 week Partial ordering is cheaper Schedulers, Dijkstra’s
Graph Visualizer ⭐⭐⭐⭐ 2 weeks Relationships are everywhere Maps, social networks, AI
Mini Database ⭐⭐⭐⭐⭐ 4-6 weeks Combine everything Real databases

Project 1: Memory Arena & Dynamic Array

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Memory Management / Low-Level Programming
  • Software or Tool: Custom memory allocator
  • Main Book: Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron

What you’ll build

A memory arena (bump allocator) and a dynamic array (like C++’s std::vector or Python’s list) from scratch, visualizing memory layout as you add and remove elements.

Why it teaches data structures fundamentals

This is where everything starts. Before you can understand ANY data structure, you must understand:

  • What is memory? (just a giant array of bytes)
  • What is an address? (index into that array)
  • What is allocation? (reserving bytes)
  • What is contiguous storage? (elements side-by-side)

Most people use malloc() as a black box. By building your own allocator, you’ll see that memory is just bytes, and data structures are just ways of interpreting those bytes.

Core challenges you’ll face

  • Memory as bytes (everything is just numbers at addresses) → maps to memory model
  • Pointer arithmetic (address + offset = new address) → maps to array indexing
  • Reallocation (grow array when full) → maps to amortized analysis
  • Memory fragmentation (holes in memory) → maps to allocator design
  • Cache locality (contiguous = fast) → maps to performance

Key Concepts

Concept Resource
Memory Layout Computer Systems: A Programmer’s Perspective Chapter 9 - Bryant & O’Hallaron
Pointer Arithmetic C Programming: A Modern Approach Chapter 12 - K.N. King
Dynamic Arrays Introduction to Algorithms Chapter 17.4 (Amortized Analysis) - CLRS
Cache Performance The Lost Art of Structure Packing - Eric S. Raymond

Difficulty & Time

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Basic C, understanding of pointers

Real World Outcome

$ ./memory_demo

=== Memory Arena Demo ===
Arena created: 1MB at address 0x7f1234500000

Allocating 100 bytes... got 0x7f1234500000
Allocating 200 bytes... got 0x7f1234500064  (offset: 100)
Allocating 50 bytes...  got 0x7f123450012C  (offset: 300)

Memory map:
[0x7f1234500000] ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                 ^100 bytes    ^200 bytes  ^50 bytes     ^free space

Arena reset! All allocations freed instantly (no individual free calls needed)

=== Dynamic Array Demo ===
Creating dynamic array with initial capacity 4...

Push 1: [1, _, _, _]           capacity=4, size=1
Push 2: [1, 2, _, _]           capacity=4, size=2
Push 3: [1, 2, 3, _]           capacity=4, size=3
Push 4: [1, 2, 3, 4]           capacity=4, size=4
Push 5: [1, 2, 3, 4, 5, _, _, _]  capacity=8, size=5  ← GREW! (copied to new memory)

Memory addresses:
  Before grow: data at 0x7f1234600000
  After grow:  data at 0x7f1234700000 (new location!)

Performance test: push 1,000,000 elements
  Total time: 0.023s
  Total reallocations: 20 (log2 of 1M)
  Amortized cost per push: O(1) ✓

Random access test:
  arr[0] = 1       (0.000001ms)
  arr[500000] = ?  (0.000001ms)  ← Same speed! O(1) random access ✓

Learning Milestones

  1. Arena allocates bytes → You understand memory is just bytes
  2. Dynamic array grows → You understand reallocation and copying
  3. Growth is O(1) amortized → You understand why doubling works
  4. Random access is O(1) → You understand contiguous memory

Implementation

Memory Arena (Bump Allocator)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t* base;      // Start of memory block
    size_t size;        // Total size
    size_t offset;      // Current position (bump pointer)
} Arena;

Arena* arena_create(size_t size) {
    Arena* arena = malloc(sizeof(Arena));
    arena->base = malloc(size);
    arena->size = size;
    arena->offset = 0;
    printf("Arena created: %zu bytes at %p\n", size, arena->base);
    return arena;
}

void* arena_alloc(Arena* arena, size_t size) {
    // Align to 8 bytes for performance
    size_t aligned_size = (size + 7) & ~7;

    if (arena->offset + aligned_size > arena->size) {
        printf("Arena out of memory!\n");
        return NULL;
    }

    void* ptr = arena->base + arena->offset;
    arena->offset += aligned_size;

    printf("Allocated %zu bytes at %p (offset: %zu)\n",
           size, ptr, arena->offset - aligned_size);
    return ptr;
}

void arena_reset(Arena* arena) {
    arena->offset = 0;  // Instant "free" of everything!
    printf("Arena reset! All memory available again.\n");
}

void arena_destroy(Arena* arena) {
    free(arena->base);
    free(arena);
}

// Visualize memory usage
void arena_visualize(Arena* arena) {
    printf("\nMemory map (%zu / %zu bytes used):\n[", arena->offset, arena->size);
    int blocks = 50;
    int used = (arena->offset * blocks) / arena->size;
    for (int i = 0; i < blocks; i++) {
        printf("%s", i < used ? "█" : "░");
    }
    printf("]\n\n");
}

Dynamic Array

typedef struct {
    int* data;          // Pointer to array
    size_t size;        // Number of elements
    size_t capacity;    // Allocated space
} DynArray;

DynArray* dynarray_create(size_t initial_capacity) {
    DynArray* arr = malloc(sizeof(DynArray));
    arr->data = malloc(initial_capacity * sizeof(int));
    arr->size = 0;
    arr->capacity = initial_capacity;
    printf("Created array with capacity %zu at %p\n",
           initial_capacity, arr->data);
    return arr;
}

void dynarray_push(DynArray* arr, int value) {
    // Check if we need to grow
    if (arr->size >= arr->capacity) {
        size_t new_capacity = arr->capacity * 2;  // Double!
        int* new_data = malloc(new_capacity * sizeof(int));

        // Copy old data to new location
        memcpy(new_data, arr->data, arr->size * sizeof(int));

        printf("GROW: %zu -> %zu (moved from %p to %p)\n",
               arr->capacity, new_capacity, arr->data, new_data);

        free(arr->data);
        arr->data = new_data;
        arr->capacity = new_capacity;
    }

    arr->data[arr->size++] = value;
}

int dynarray_get(DynArray* arr, size_t index) {
    if (index >= arr->size) {
        printf("Index out of bounds!\n");
        return -1;
    }
    return arr->data[index];  // O(1) - just pointer + offset!
}

void dynarray_visualize(DynArray* arr) {
    printf("[");
    for (size_t i = 0; i < arr->capacity; i++) {
        if (i < arr->size) {
            printf("%d", arr->data[i]);
        } else {
            printf("_");
        }
        if (i < arr->capacity - 1) printf(", ");
    }
    printf("]  size=%zu, capacity=%zu\n", arr->size, arr->capacity);
}

Why Arrays Are Cache-Friendly

ARRAY (contiguous memory):
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │  ← All in one cache line (64 bytes)
└───┴───┴───┴───┴───┴───┴───┴───┘
  ↑
  CPU fetches entire line, gets 8 elements for price of 1!

LINKED LIST (scattered memory):
┌───┐     ┌───┐     ┌───┐     ┌───┐
│ 1 │────►│ 2 │────►│ 3 │────►│ 4 │
└───┘     └───┘     └───┘     └───┘
  ↑ cache miss  ↑ cache miss  ↑ cache miss
  Each access may require fetching a new cache line!

The Amortization Insight

Why does doubling capacity give O(1) amortized push?

Operations to push n elements:
  n pushes (always)
  + copy 1 element (when growing from 1 to 2)
  + copy 2 elements (when growing from 2 to 4)
  + copy 4 elements (when growing from 4 to 8)
  + ...
  + copy n/2 elements (when growing from n/2 to n)

Total copies = 1 + 2 + 4 + ... + n/2 = n - 1

Total work = n pushes + (n-1) copies ≈ 2n = O(n)
Average work per push = O(n) / n = O(1) ✓

Project 2: Linked List Toolkit

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Pointers / Dynamic Memory
  • Software or Tool: CLI list manipulation tool
  • Main Book: C Programming: A Modern Approach by K.N. King

What you’ll build

A comprehensive linked list library (singly-linked, doubly-linked, circular) with visualization, plus a practical application: an undo/redo system for a text editor.

Why it teaches this concept

Linked lists are the opposite of arrays:

  • Arrays: contiguous memory, random access, expensive insert/delete
  • Linked lists: scattered memory, sequential access, cheap insert/delete

Understanding this trade-off is fundamental. You’ll also master pointers, which are essential for all complex data structures.

Core challenges you’ll face

  • Pointer manipulation (next, prev, head, tail) → maps to indirection
  • Edge cases (empty list, one element, head/tail operations) → maps to boundary conditions
  • Memory management (malloc each node, free on remove) → maps to manual memory
  • Traversal (must walk from head) → maps to O(n) access
  • Doubly-linked (can go backwards) → maps to trade-offs

Key Concepts

Concept Resource
Pointer Fundamentals C Programming: A Modern Approach Chapter 11 - K.N. King
Linked List Operations Introduction to Algorithms Chapter 10.2 - CLRS
Memory Allocation The C Programming Language Chapter 7 - K&R

Difficulty & Time

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Project 1 completed, comfortable with pointers

Real World Outcome

$ ./linkedlist_demo

=== Singly Linked List ===
Insert 10 at head: [10] -> NULL
Insert 20 at head: [20] -> [10] -> NULL
Insert 30 at tail: [20] -> [10] -> [30] -> NULL
Insert 25 after 10: [20] -> [10] -> [25] -> [30] -> NULL

Memory layout:
  Node@0x1000: data=20, next=0x1020
  Node@0x1020: data=10, next=0x1040
  Node@0x1040: data=25, next=0x1060
  Node@0x1060: data=30, next=NULL

Delete 10: [20] -> [25] -> [30] -> NULL
  (freed memory at 0x1020)

=== Doubly Linked List ===
[NULL <-> 20 <-> 25 <-> 30 <-> NULL]
Traverse forward:  20 -> 25 -> 30
Traverse backward: 30 -> 25 -> 20

=== Undo/Redo System Demo ===
Text: ""
> type "Hello"
Text: "Hello"                    [undo stack: "Hello"]
> type " World"
Text: "Hello World"              [undo stack: "Hello", " World"]
> undo
Text: "Hello"                    [redo stack: " World"]
> undo
Text: ""                         [redo stack: " World", "Hello"]
> redo
Text: "Hello"                    [undo stack: "Hello"]
> type "!"
Text: "Hello!"                   [redo stack cleared!]

Learning Milestones

  1. Insert/delete at head is O(1) → You understand the advantage over arrays
  2. Find element is O(n) → You understand the disadvantage
  3. Doubly-linked enables O(1) delete given node → You understand the trade-off
  4. Undo/redo works → You understand why linked lists exist in real systems

Implementation

Singly Linked List

typedef struct Node {
    int data;
    struct Node* next;
} Node;

typedef struct {
    Node* head;
    Node* tail;
    size_t size;
} SinglyLinkedList;

// O(1) - just update head
void sll_push_front(SinglyLinkedList* list, int data) {
    Node* new_node = malloc(sizeof(Node));
    new_node->data = data;
    new_node->next = list->head;

    list->head = new_node;
    if (list->tail == NULL) {
        list->tail = new_node;
    }
    list->size++;
}

// O(1) - just update tail
void sll_push_back(SinglyLinkedList* list, int data) {
    Node* new_node = malloc(sizeof(Node));
    new_node->data = data;
    new_node->next = NULL;

    if (list->tail) {
        list->tail->next = new_node;
    } else {
        list->head = new_node;
    }
    list->tail = new_node;
    list->size++;
}

// O(n) - must traverse to find
Node* sll_find(SinglyLinkedList* list, int data) {
    Node* current = list->head;
    while (current) {
        if (current->data == data) {
            return current;
        }
        current = current->next;
    }
    return NULL;
}

// O(n) - must find previous node
void sll_delete(SinglyLinkedList* list, int data) {
    if (!list->head) return;

    // Special case: deleting head
    if (list->head->data == data) {
        Node* to_delete = list->head;
        list->head = list->head->next;
        if (list->tail == to_delete) {
            list->tail = NULL;
        }
        free(to_delete);
        list->size--;
        return;
    }

    // Find previous node
    Node* prev = list->head;
    while (prev->next && prev->next->data != data) {
        prev = prev->next;
    }

    if (prev->next) {
        Node* to_delete = prev->next;
        prev->next = to_delete->next;
        if (list->tail == to_delete) {
            list->tail = prev;
        }
        free(to_delete);
        list->size--;
    }
}

void sll_print(SinglyLinkedList* list) {
    Node* current = list->head;
    while (current) {
        printf("[%d]", current->data);
        if (current->next) printf(" -> ");
        current = current->next;
    }
    printf(" -> NULL\n");
}

Doubly Linked List

typedef struct DNode {
    int data;
    struct DNode* prev;
    struct DNode* next;
} DNode;

typedef struct {
    DNode* head;
    DNode* tail;
    size_t size;
} DoublyLinkedList;

// O(1) delete when you have the node pointer!
void dll_delete_node(DoublyLinkedList* list, DNode* node) {
    if (node->prev) {
        node->prev->next = node->next;
    } else {
        list->head = node->next;
    }

    if (node->next) {
        node->next->prev = node->prev;
    } else {
        list->tail = node->prev;
    }

    free(node);
    list->size--;
}

Undo/Redo System (Practical Application)

typedef struct Action {
    char* text;              // The text that was added
    struct Action* next;
} Action;

typedef struct {
    char buffer[1024];       // Current text
    Action* undo_stack;      // Stack of actions to undo
    Action* redo_stack;      // Stack of actions to redo
} TextEditor;

void editor_type(TextEditor* editor, const char* text) {
    // Add to buffer
    strcat(editor->buffer, text);

    // Push to undo stack
    Action* action = malloc(sizeof(Action));
    action->text = strdup(text);
    action->next = editor->undo_stack;
    editor->undo_stack = action;

    // Clear redo stack (can't redo after new action)
    while (editor->redo_stack) {
        Action* to_free = editor->redo_stack;
        editor->redo_stack = editor->redo_stack->next;
        free(to_free->text);
        free(to_free);
    }
}

void editor_undo(TextEditor* editor) {
    if (!editor->undo_stack) {
        printf("Nothing to undo!\n");
        return;
    }

    // Pop from undo stack
    Action* action = editor->undo_stack;
    editor->undo_stack = action->next;

    // Remove text from buffer
    size_t len = strlen(action->text);
    editor->buffer[strlen(editor->buffer) - len] = '\0';

    // Push to redo stack
    action->next = editor->redo_stack;
    editor->redo_stack = action;
}

void editor_redo(TextEditor* editor) {
    if (!editor->redo_stack) {
        printf("Nothing to redo!\n");
        return;
    }

    // Pop from redo stack
    Action* action = editor->redo_stack;
    editor->redo_stack = action->next;

    // Add text back to buffer
    strcat(editor->buffer, action->text);

    // Push to undo stack
    action->next = editor->undo_stack;
    editor->undo_stack = action;
}

Array vs Linked List: The Trade-off

Operation Array Linked List Winner
Access by index O(1) O(n) Array
Insert at beginning O(n) O(1) Linked List
Insert at end O(1) amortized O(1) with tail Tie
Insert in middle O(n) O(1) if have node Linked List
Delete O(n) O(1) if have node Linked List
Memory overhead Low High (pointers) Array
Cache performance Excellent Poor Array

When to use Linked Lists:

  • Frequent insertions/deletions at arbitrary positions
  • When you have pointers to nodes (e.g., LRU cache)
  • When you need O(1) worst-case insert (not amortized)

When to use Arrays:

  • Need random access
  • Iterating through all elements (cache-friendly)
  • Memory is tight (no pointer overhead)

Project 3: Stack Machine (Calculator & Mini VM)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Abstract Data Types / Language Implementation
  • Software or Tool: RPN Calculator, Bytecode VM
  • Main Book: Writing a C Compiler by Nora Sandler

What you’ll build

A Reverse Polish Notation (RPN) calculator and a simple stack-based virtual machine that executes bytecode—demonstrating why stacks are fundamental to computing.

Why it teaches this concept

The stack is not just a data structure—it’s how computers work:

  • Function calls use a stack (call stack)
  • Expression evaluation uses a stack
  • Undo operations use a stack
  • Backtracking algorithms use a stack
  • Your CPU has a hardware stack

Building a stack machine shows you why LIFO (Last In, First Out) is so fundamental.

Core challenges you’ll face

  • LIFO semantics (last in, first out) → maps to stack discipline
  • Expression parsing (infix to postfix) → maps to operator precedence
  • VM execution (fetch-decode-execute) → maps to computation model
  • Stack underflow/overflow → maps to error handling

Key Concepts

Concept Resource
Stack ADT Introduction to Algorithms Chapter 10.1 - CLRS
Expression Evaluation Algorithms, Fourth Edition Chapter 1.3 - Sedgewick
Stack Machines Writing a C Compiler Chapter 3 - Nora Sandler
Shunting Yard Algorithm Dijkstra’s original paper or Wikipedia

Difficulty & Time

  • Difficulty: Intermediate-Advanced
  • Time estimate: 1 week
  • Prerequisites: Projects 1-2 completed

Real World Outcome

$ ./rpn_calc
RPN Calculator (type 'quit' to exit)

> 3 4 +
Stack: [7]
Result: 7

> 5 *
Stack: [35]
Result: 35

> 2 3 4 + *
Stack: [35, 14]
  Explanation: 2 * (3 + 4) = 14
Result: 14

> clear
Stack: []

=== Infix to RPN Demo ===
Infix:   (3 + 4) * 5
Postfix: 3 4 + 5 *
Result:  35

=== Mini VM Demo ===
Bytecode: PUSH 10, PUSH 20, ADD, PUSH 5, MUL, PRINT
Executing...
  PUSH 10    Stack: [10]
  PUSH 20    Stack: [10, 20]
  ADD        Stack: [30]
  PUSH 5     Stack: [30, 5]
  MUL        Stack: [150]
  PRINT      Output: 150

Learning Milestones

  1. RPN calculator works → You understand postfix evaluation
  2. Infix to postfix conversion works → You understand the shunting-yard algorithm
  3. VM executes bytecode → You understand stack machines
  4. You see the call stack in action → You understand how functions work

Implementation

Stack Data Structure

#define STACK_MAX 256

typedef struct {
    int items[STACK_MAX];
    int top;
} Stack;

void stack_init(Stack* s) {
    s->top = -1;
}

int stack_is_empty(Stack* s) {
    return s->top == -1;
}

int stack_is_full(Stack* s) {
    return s->top == STACK_MAX - 1;
}

void stack_push(Stack* s, int value) {
    if (stack_is_full(s)) {
        printf("Stack overflow!\n");
        exit(1);
    }
    s->items[++s->top] = value;
}

int stack_pop(Stack* s) {
    if (stack_is_empty(s)) {
        printf("Stack underflow!\n");
        exit(1);
    }
    return s->items[s->top--];
}

int stack_peek(Stack* s) {
    if (stack_is_empty(s)) {
        printf("Stack empty!\n");
        exit(1);
    }
    return s->items[s->top];
}

void stack_print(Stack* s) {
    printf("Stack: [");
    for (int i = 0; i <= s->top; i++) {
        printf("%d", s->items[i]);
        if (i < s->top) printf(", ");
    }
    printf("]\n");
}

RPN Calculator

int evaluate_rpn(const char* expression) {
    Stack stack;
    stack_init(&stack);

    char* expr = strdup(expression);
    char* token = strtok(expr, " ");

    while (token != NULL) {
        if (isdigit(token[0]) || (token[0] == '-' && isdigit(token[1]))) {
            // It's a number - push it
            stack_push(&stack, atoi(token));
        } else {
            // It's an operator - pop operands and compute
            int b = stack_pop(&stack);
            int a = stack_pop(&stack);
            int result;

            switch (token[0]) {
                case '+': result = a + b; break;
                case '-': result = a - b; break;
                case '*': result = a * b; break;
                case '/': result = a / b; break;
                default:
                    printf("Unknown operator: %s\n", token);
                    exit(1);
            }

            stack_push(&stack, result);
        }

        printf("  Token: %-4s  ", token);
        stack_print(&stack);

        token = strtok(NULL, " ");
    }

    free(expr);
    return stack_pop(&stack);
}

Shunting-Yard Algorithm (Infix to Postfix)

int precedence(char op) {
    switch (op) {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        default: return 0;
    }
}

void infix_to_postfix(const char* infix, char* postfix) {
    Stack op_stack;
    stack_init(&op_stack);

    int j = 0;
    for (int i = 0; infix[i]; i++) {
        char c = infix[i];

        if (isdigit(c)) {
            postfix[j++] = c;
            postfix[j++] = ' ';
        } else if (c == '(') {
            stack_push(&op_stack, c);
        } else if (c == ')') {
            while (!stack_is_empty(&op_stack) && stack_peek(&op_stack) != '(') {
                postfix[j++] = stack_pop(&op_stack);
                postfix[j++] = ' ';
            }
            stack_pop(&op_stack);  // Remove '('
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            while (!stack_is_empty(&op_stack) &&
                   precedence(stack_peek(&op_stack)) >= precedence(c)) {
                postfix[j++] = stack_pop(&op_stack);
                postfix[j++] = ' ';
            }
            stack_push(&op_stack, c);
        }
    }

    while (!stack_is_empty(&op_stack)) {
        postfix[j++] = stack_pop(&op_stack);
        postfix[j++] = ' ';
    }

    postfix[j] = '\0';
}

Mini Stack-Based VM

typedef enum {
    OP_PUSH,
    OP_POP,
    OP_ADD,
    OP_SUB,
    OP_MUL,
    OP_DIV,
    OP_PRINT,
    OP_HALT
} OpCode;

typedef struct {
    OpCode op;
    int operand;
} Instruction;

void vm_execute(Instruction* program, int length) {
    Stack stack;
    stack_init(&stack);

    for (int ip = 0; ip < length; ip++) {
        Instruction inst = program[ip];

        switch (inst.op) {
            case OP_PUSH:
                printf("  PUSH %d\t", inst.operand);
                stack_push(&stack, inst.operand);
                break;

            case OP_ADD: {
                printf("  ADD\t\t");
                int b = stack_pop(&stack);
                int a = stack_pop(&stack);
                stack_push(&stack, a + b);
                break;
            }

            case OP_MUL: {
                printf("  MUL\t\t");
                int b = stack_pop(&stack);
                int a = stack_pop(&stack);
                stack_push(&stack, a * b);
                break;
            }

            case OP_PRINT:
                printf("  PRINT\t\t");
                printf("Output: %d\n", stack_peek(&stack));
                break;

            case OP_HALT:
                printf("  HALT\n");
                return;
        }

        stack_print(&stack);
    }
}

// Example: Compute (10 + 20) * 5
int main() {
    Instruction program[] = {
        {OP_PUSH, 10},
        {OP_PUSH, 20},
        {OP_ADD, 0},
        {OP_PUSH, 5},
        {OP_MUL, 0},
        {OP_PRINT, 0},
        {OP_HALT, 0}
    };

    printf("Executing bytecode:\n");
    vm_execute(program, sizeof(program) / sizeof(Instruction));
    return 0;
}

Why Stacks Are Everywhere

FUNCTION CALLS (The Call Stack):
┌─────────────────┐
│ main()          │ ← Called first, returns last
├─────────────────┤
│ calculate()     │ ← Called second
├─────────────────┤
│ helper()        │ ← Called third, returns first
└─────────────────┘

Each function call PUSHES a stack frame.
Each return POPS a stack frame.
LIFO ensures functions return in the right order!

EXPRESSION EVALUATION:
Expression: 3 + 4 * 5

As RPN: 3 4 5 * +

Stack trace:
  Push 3:     [3]
  Push 4:     [3, 4]
  Push 5:     [3, 4, 5]
  Mul:        [3, 20]      (4 * 5)
  Add:        [23]         (3 + 20)

Result: 23 ✓

Project 4: Queue Systems (Task Scheduler & BFS)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Abstract Data Types / Systems Programming
  • Software or Tool: Job Queue, BFS Maze Solver
  • Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau

What you’ll build

A multi-priority task scheduler using queues, plus a visual maze solver using BFS—demonstrating why FIFO (First In, First Out) enables fairness and level-order processing.

Why it teaches this concept

Queues are about fairness and ordering:

  • First come, first served (FIFO)
  • Level-by-level processing (BFS)
  • Buffering between producer and consumer

Understanding queues is essential for:

  • Operating system schedulers
  • Network packet handling
  • Message queues (Kafka, RabbitMQ)
  • Breadth-first search

Core challenges you’ll face

  • FIFO semantics (first in, first out) → maps to fairness
  • Circular buffer (efficient array-based queue) → maps to memory efficiency
  • Priority queues (not FIFO - will connect to heaps) → maps to scheduling
  • Producer-consumer (handling different speeds) → maps to buffering

Key Concepts

Concept Resource
Queue ADT Introduction to Algorithms Chapter 10.1 - CLRS
Circular Buffers The Linux Programming Interface Chapter 44 - Kerrisk
BFS Algorithm Introduction to Algorithms Chapter 22.2 - CLRS
Process Scheduling Operating Systems: Three Easy Pieces Chapter 7 - OSTEP

Difficulty & Time

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Projects 1-3 completed

Real World Outcome

$ ./task_scheduler

=== Task Scheduler with Multiple Queues ===

Adding tasks:
  [HIGH]   Task: "Handle interrupt"      -> High priority queue
  [MEDIUM] Task: "Process user input"    -> Medium priority queue
  [LOW]    Task: "Background sync"       -> Low priority queue
  [HIGH]   Task: "Memory page fault"     -> High priority queue
  [MEDIUM] Task: "Render frame"          -> Medium priority queue

Processing (high priority first, then FIFO within each level):
  [1] HIGH:   "Handle interrupt"
  [2] HIGH:   "Memory page fault"
  [3] MEDIUM: "Process user input"
  [4] MEDIUM: "Render frame"
  [5] LOW:    "Background sync"

=== BFS Maze Solver ===

Maze:
█████████████
█S    █     █
███ ███ █████
█   █   █   █
█ █████ █ █ █
█       █   █
█████████ ███
█         █E█
█████████████

Solving with BFS...
Step 1: Exploring (1,1) - Added neighbors to queue
Step 2: Exploring (1,2) - Added neighbors to queue
...

Solution found! Path length: 24 steps
█████████████
█S....█.....█
███.███.█████
█...█...█...█
█.█████.█.█.█
█.......█...█
█████████.███
█.........█E█
█████████████

Why BFS finds shortest path: It explores level-by-level!
  Level 0: Start (1,1)
  Level 1: All cells 1 step away
  Level 2: All cells 2 steps away
  ...
  First time we reach End = shortest path!

Learning Milestones

  1. Basic queue works → You understand FIFO
  2. Circular buffer is efficient → You understand wrapping
  3. Priority scheduler works → You understand scheduling
  4. BFS finds shortest path → You understand level-order processing

Implementation

Queue with Circular Buffer

#define QUEUE_CAPACITY 100

typedef struct {
    int items[QUEUE_CAPACITY];
    int front;      // Index to dequeue from
    int rear;       // Index to enqueue to
    int size;
} Queue;

void queue_init(Queue* q) {
    q->front = 0;
    q->rear = 0;
    q->size = 0;
}

int queue_is_empty(Queue* q) {
    return q->size == 0;
}

int queue_is_full(Queue* q) {
    return q->size == QUEUE_CAPACITY;
}

void queue_enqueue(Queue* q, int value) {
    if (queue_is_full(q)) {
        printf("Queue overflow!\n");
        return;
    }
    q->items[q->rear] = value;
    q->rear = (q->rear + 1) % QUEUE_CAPACITY;  // Wrap around!
    q->size++;
}

int queue_dequeue(Queue* q) {
    if (queue_is_empty(q)) {
        printf("Queue underflow!\n");
        return -1;
    }
    int value = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;  // Wrap around!
    q->size--;
    return value;
}

// Visualize circular buffer
void queue_visualize(Queue* q) {
    printf("Queue (front=%d, rear=%d, size=%d):\n[", q->front, q->rear, q->size);
    for (int i = 0; i < QUEUE_CAPACITY && i < 20; i++) {
        if (i >= q->front && i < q->front + q->size) {
            printf("%d", q->items[i % QUEUE_CAPACITY]);
        } else {
            printf("_");
        }
        if (i < 19) printf(",");
    }
    printf("...]\n");
}

Multi-Level Priority Scheduler

typedef enum { HIGH, MEDIUM, LOW, NUM_PRIORITIES } Priority;

typedef struct {
    char name[64];
    Priority priority;
} Task;

typedef struct {
    Queue queues[NUM_PRIORITIES];
    Task tasks[1000];  // Task storage
    int task_count;
} Scheduler;

void scheduler_add_task(Scheduler* s, const char* name, Priority priority) {
    Task* task = &s->tasks[s->task_count];
    strcpy(task->name, name);
    task->priority = priority;

    // Add task index to appropriate queue
    queue_enqueue(&s->queues[priority], s->task_count);
    s->task_count++;

    const char* priority_names[] = {"HIGH", "MEDIUM", "LOW"};
    printf("Added [%s] task: \"%s\"\n", priority_names[priority], name);
}

Task* scheduler_get_next(Scheduler* s) {
    // Check high priority first, then medium, then low
    for (int p = HIGH; p < NUM_PRIORITIES; p++) {
        if (!queue_is_empty(&s->queues[p])) {
            int task_idx = queue_dequeue(&s->queues[p]);
            return &s->tasks[task_idx];
        }
    }
    return NULL;  // No tasks
}

BFS Maze Solver

#define MAZE_SIZE 13

typedef struct {
    int row, col;
    int distance;
} Cell;

int bfs_solve_maze(char maze[MAZE_SIZE][MAZE_SIZE],
                   int start_row, int start_col,
                   int end_row, int end_col) {

    // Queue for BFS
    Cell queue[MAZE_SIZE * MAZE_SIZE];
    int front = 0, rear = 0;

    // Visited array
    int visited[MAZE_SIZE][MAZE_SIZE] = {0};

    // Parent tracking for path reconstruction
    int parent_row[MAZE_SIZE][MAZE_SIZE];
    int parent_col[MAZE_SIZE][MAZE_SIZE];

    // Start BFS
    queue[rear++] = (Cell){start_row, start_col, 0};
    visited[start_row][start_col] = 1;

    int directions[4][2] = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    while (front < rear) {
        Cell current = queue[front++];

        // Found the end!
        if (current.row == end_row && current.col == end_col) {
            // Reconstruct path
            int r = end_row, c = end_col;
            while (r != start_row || c != start_col) {
                maze[r][c] = '.';
                int pr = parent_row[r][c];
                int pc = parent_col[r][c];
                r = pr;
                c = pc;
            }
            maze[start_row][start_col] = 'S';
            maze[end_row][end_col] = 'E';
            return current.distance;
        }

        // Explore neighbors
        for (int d = 0; d < 4; d++) {
            int nr = current.row + directions[d][0];
            int nc = current.col + directions[d][1];

            if (nr >= 0 && nr < MAZE_SIZE && nc >= 0 && nc < MAZE_SIZE &&
                maze[nr][nc] != '#' && !visited[nr][nc]) {

                visited[nr][nc] = 1;
                parent_row[nr][nc] = current.row;
                parent_col[nr][nc] = current.col;
                queue[rear++] = (Cell){nr, nc, current.distance + 1};
            }
        }
    }

    return -1;  // No path found
}

Stack vs Queue: When to Use Each

Use Case Stack (LIFO) Queue (FIFO)
Undo/Redo  
Function calls  
DFS traversal  
Expression evaluation  
Task scheduling  
BFS traversal  
Print queue  
Message buffering  

Key insight:

  • Stack: When most recent is most important
  • Queue: When order of arrival matters (fairness)

Project 5: Hash Table (Dictionary & Cache)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Data Structures / Algorithms
  • Software or Tool: Key-Value Store, LRU Cache
  • Main Book: Introduction to Algorithms by CLRS

What you’ll build

A complete hash table implementation with multiple collision resolution strategies (chaining, open addressing), plus a practical LRU (Least Recently Used) cache.

Why it teaches this concept

Hash tables are the most practically important data structure:

  • O(1) average lookup, insert, delete
  • Backbone of databases, caches, dictionaries
  • Used everywhere: Python dicts, JavaScript objects, database indexes

Understanding hash tables means understanding:

  • Hash functions (turning keys into indices)
  • Collision resolution (when two keys hash to same index)
  • Load factor and resizing
  • Trade-offs between time and space

Core challenges you’ll face

  • Hash function design (distribute keys uniformly) → maps to hashing theory
  • Collision handling (chaining vs open addressing) → maps to trade-offs
  • Load factor (when to resize) → maps to performance tuning
  • LRU eviction (combining hash table with linked list) → maps to real caches

Resources for Key Challenges

Key Concepts

Concept Resource
Hash Functions Introduction to Algorithms Chapter 11 - CLRS
Collision Resolution Algorithms, Fourth Edition Chapter 3.4 - Sedgewick
Universal Hashing Introduction to Algorithms Chapter 11.3.3 - CLRS
LRU Cache Design LeetCode Problem 146 explanation

Difficulty & Time

  • Difficulty: Advanced
  • Time estimate: 2 weeks
  • Prerequisites: Projects 1-2 completed, understanding of modular arithmetic

Real World Outcome

$ ./hashtable_demo

=== Hash Table with Chaining ===
Initial capacity: 16, load factor threshold: 0.75

put("apple", 5)   -> hash("apple") = 7, bucket 7
put("banana", 3)  -> hash("banana") = 2, bucket 2
put("cherry", 8)  -> hash("cherry") = 7, bucket 7 (collision with apple!)

Bucket visualization:
[0]: empty
[1]: empty
[2]: ["banana" -> 3]
...
[7]: ["apple" -> 5] -> ["cherry" -> 8]  ← Chained!
...

get("cherry") = 8  (found in bucket 7, 2nd in chain)
get("mango") = NOT FOUND

=== Open Addressing (Linear Probing) ===
put("apple", 5)   -> hash = 7, slot 7
put("cherry", 8)  -> hash = 7, collision! probe slot 8

Table visualization:
[0]: empty
...
[7]: ("apple", 5)
[8]: ("cherry", 8)  ← Probed here!
...

=== Resize Demo ===
Adding 12 items... load factor = 0.75
RESIZE triggered! 16 -> 32 buckets
All items rehashed to new positions

=== LRU Cache (capacity=3) ===
put(1, "one")    Cache: [1]
put(2, "two")    Cache: [2, 1]
put(3, "three")  Cache: [3, 2, 1]
get(1)           Cache: [1, 3, 2]  ← 1 moved to front (recently used)
put(4, "four")   Cache: [4, 1, 3]  ← 2 evicted (least recently used)!
get(2)           MISS! (was evicted)

Learning Milestones

  1. Hash function distributes evenly → You understand hashing
  2. Chaining handles collisions → You understand linked lists in practice
  3. Open addressing works → You understand the alternative trade-off
  4. Resize maintains O(1) amortized → You understand load factors
  5. LRU cache works → You understand combining data structures

Implementation

Hash Function

// DJB2 hash - simple and effective
uint32_t hash_string(const char* str) {
    uint32_t hash = 5381;
    int c;
    while ((c = *str++)) {
        hash = ((hash << 5) + hash) + c;  // hash * 33 + c
    }
    return hash;
}

// For integer keys
uint32_t hash_int(int key) {
    // Knuth's multiplicative hash
    return key * 2654435761u;
}

Hash Table with Chaining

typedef struct Entry {
    char* key;
    int value;
    struct Entry* next;  // For chaining
} Entry;

typedef struct {
    Entry** buckets;     // Array of linked lists
    size_t capacity;
    size_t size;
    float max_load_factor;
} HashTable;

HashTable* ht_create(size_t capacity) {
    HashTable* ht = malloc(sizeof(HashTable));
    ht->buckets = calloc(capacity, sizeof(Entry*));
    ht->capacity = capacity;
    ht->size = 0;
    ht->max_load_factor = 0.75;
    return ht;
}

void ht_put(HashTable* ht, const char* key, int value) {
    // Check load factor
    float load = (float)ht->size / ht->capacity;
    if (load > ht->max_load_factor) {
        ht_resize(ht, ht->capacity * 2);
    }

    uint32_t hash = hash_string(key);
    size_t index = hash % ht->capacity;

    // Check if key exists
    Entry* entry = ht->buckets[index];
    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            entry->value = value;  // Update
            return;
        }
        entry = entry->next;
    }

    // Insert new entry at head of chain
    Entry* new_entry = malloc(sizeof(Entry));
    new_entry->key = strdup(key);
    new_entry->value = value;
    new_entry->next = ht->buckets[index];
    ht->buckets[index] = new_entry;
    ht->size++;
}

int ht_get(HashTable* ht, const char* key, int* found) {
    uint32_t hash = hash_string(key);
    size_t index = hash % ht->capacity;

    Entry* entry = ht->buckets[index];
    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            *found = 1;
            return entry->value;
        }
        entry = entry->next;
    }

    *found = 0;
    return 0;
}

void ht_resize(HashTable* ht, size_t new_capacity) {
    Entry** old_buckets = ht->buckets;
    size_t old_capacity = ht->capacity;

    ht->buckets = calloc(new_capacity, sizeof(Entry*));
    ht->capacity = new_capacity;
    ht->size = 0;

    // Rehash all entries
    for (size_t i = 0; i < old_capacity; i++) {
        Entry* entry = old_buckets[i];
        while (entry) {
            Entry* next = entry->next;
            ht_put(ht, entry->key, entry->value);
            free(entry->key);
            free(entry);
            entry = next;
        }
    }

    free(old_buckets);
    printf("Resized: %zu -> %zu buckets\n", old_capacity, new_capacity);
}

Hash Table with Open Addressing (Linear Probing)

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    char* key;
    int value;
    SlotState state;
} Slot;

typedef struct {
    Slot* slots;
    size_t capacity;
    size_t size;
} OpenHashTable;

void oht_put(OpenHashTable* ht, const char* key, int value) {
    uint32_t hash = hash_string(key);
    size_t index = hash % ht->capacity;

    // Linear probing
    size_t start = index;
    do {
        if (ht->slots[index].state == EMPTY ||
            ht->slots[index].state == DELETED) {
            // Found empty slot
            ht->slots[index].key = strdup(key);
            ht->slots[index].value = value;
            ht->slots[index].state = OCCUPIED;
            ht->size++;
            return;
        }

        if (ht->slots[index].state == OCCUPIED &&
            strcmp(ht->slots[index].key, key) == 0) {
            // Key exists - update
            ht->slots[index].value = value;
            return;
        }

        index = (index + 1) % ht->capacity;  // Probe next slot
    } while (index != start);

    printf("Hash table full!\n");
}

LRU Cache (Hash Table + Doubly Linked List)

typedef struct LRUNode {
    int key;
    int value;
    struct LRUNode* prev;
    struct LRUNode* next;
} LRUNode;

typedef struct {
    int capacity;
    int size;
    LRUNode* head;          // Most recently used
    LRUNode* tail;          // Least recently used
    LRUNode** hash_table;   // For O(1) lookup
    int table_size;
} LRUCache;

void lru_move_to_front(LRUCache* cache, LRUNode* node) {
    if (node == cache->head) return;

    // Remove from current position
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;
    if (node == cache->tail) cache->tail = node->prev;

    // Insert at head
    node->prev = NULL;
    node->next = cache->head;
    if (cache->head) cache->head->prev = node;
    cache->head = node;
    if (!cache->tail) cache->tail = node;
}

int lru_get(LRUCache* cache, int key) {
    int index = key % cache->table_size;
    LRUNode* node = cache->hash_table[index];

    // Find in chain
    while (node && node->key != key) {
        node = node->next;  // Actually need separate chaining here
    }

    if (!node) return -1;  // Not found

    // Move to front (mark as recently used)
    lru_move_to_front(cache, node);
    return node->value;
}

void lru_put(LRUCache* cache, int key, int value) {
    // Check if exists
    int index = key % cache->table_size;
    LRUNode* existing = cache->hash_table[index];

    if (existing) {
        existing->value = value;
        lru_move_to_front(cache, existing);
        return;
    }

    // Check if at capacity
    if (cache->size >= cache->capacity) {
        // Evict LRU (tail)
        LRUNode* victim = cache->tail;
        printf("Evicting key %d (LRU)\n", victim->key);

        // Remove from hash table
        int victim_index = victim->key % cache->table_size;
        cache->hash_table[victim_index] = NULL;

        // Remove from list
        cache->tail = victim->prev;
        if (cache->tail) cache->tail->next = NULL;
        free(victim);
        cache->size--;
    }

    // Add new node
    LRUNode* node = malloc(sizeof(LRUNode));
    node->key = key;
    node->value = value;
    node->prev = NULL;
    node->next = cache->head;

    if (cache->head) cache->head->prev = node;
    cache->head = node;
    if (!cache->tail) cache->tail = node;

    cache->hash_table[index] = node;
    cache->size++;
}

Chaining vs Open Addressing

Aspect Chaining Open Addressing
Memory Pointer overhead per entry No extra pointers
Cache Poor (linked list traversal) Good (contiguous probing)
Load factor Can exceed 1.0 Must stay below ~0.7
Deletion Simple (unlink node) Complex (tombstones)
Implementation Easier Harder
Clustering No Yes (performance degrades)

Project 6: Binary Search Tree & Balancing

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Java
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Trees / Self-Balancing
  • Software or Tool: BST with visualization, AVL Tree
  • Main Book: Introduction to Algorithms by CLRS

What you’ll build

A binary search tree with visualization, then extend it to a self-balancing AVL tree, demonstrating why balancing is crucial for performance.

Why it teaches this concept

Trees enable O(log n) operations when balanced:

  • Searching
  • Insertion
  • Deletion
  • Range queries
  • Ordered traversal

But an unbalanced tree degrades to O(n)—essentially a linked list. Understanding balancing is crucial for:

  • Database indexes (B-trees)
  • File systems
  • Memory allocators (red-black trees in Linux kernel)
  • Priority scheduling

Core challenges you’ll face

  • BST property (left < node < right) → maps to binary search
  • Tree traversal (in-order, pre-order, post-order) → maps to recursive thinking
  • Deletion (three cases: leaf, one child, two children) → maps to pointer manipulation
  • Balancing (rotations to maintain height) → maps to self-balancing

Key Concepts

Concept Resource
BST Operations Introduction to Algorithms Chapter 12 - CLRS
Tree Traversals Algorithms, Fourth Edition Chapter 3.2 - Sedgewick
AVL Trees Introduction to Algorithms Chapter 13 (similar to RB trees) - CLRS
Rotations Visualgo.net BST visualization

Difficulty & Time

  • Difficulty: Advanced
  • Time estimate: 2 weeks
  • Prerequisites: Projects 1-4 completed, recursion comfort

Real World Outcome

$ ./bst_demo

=== Binary Search Tree Demo ===
Inserting: 50, 30, 70, 20, 40, 60, 80

Tree visualization:
         50
        /  \
      30    70
     /  \  /  \
   20  40 60  80

In-order traversal (sorted!): 20 30 40 50 60 70 80
Tree height: 3
Is balanced: YES

=== Unbalanced Insertion Demo ===
Inserting sorted: 10, 20, 30, 40, 50, 60, 70

Tree visualization:
10
  \
   20
     \
      30
        \
         40
           \
            50 ... (degenerates to linked list!)

Height: 7 (should be 3 for balanced)
Is balanced: NO
Search for 70: 7 comparisons (should be 3!)

=== AVL Tree (Self-Balancing) ===
Same insertions: 10, 20, 30, 40, 50, 60, 70

After each insert, rebalance:
Insert 10:     10
Insert 20:     10-20
Insert 30:     10-20-30 -> ROTATE LEFT
                 20
                /  \
              10   30
Insert 40:       20
                /  \
              10   30-40
Insert 50:       20           30
                /  \    ->   /  \
              10   30-40-50  20  40
                            /     \
                          10      50
...

Final AVL tree:
         40
        /  \
      20    60
     /  \  /  \
   10  30 50  70

Height: 3 (optimal!)
All operations: O(log n) guaranteed!

Learning Milestones

  1. BST insertion/search works → You understand the BST property
  2. In-order traversal is sorted → You understand tree ordering
  3. Unbalanced tree is slow → You understand why balancing matters
  4. AVL rotations fix balance → You understand self-balancing

Implementation

Basic BST

typedef struct TreeNode {
    int key;
    struct TreeNode* left;
    struct TreeNode* right;
    int height;  // For AVL
} TreeNode;

TreeNode* create_node(int key) {
    TreeNode* node = malloc(sizeof(TreeNode));
    node->key = key;
    node->left = NULL;
    node->right = NULL;
    node->height = 1;
    return node;
}

// O(h) where h is height
TreeNode* bst_insert(TreeNode* root, int key) {
    if (root == NULL) {
        return create_node(key);
    }

    if (key < root->key) {
        root->left = bst_insert(root->left, key);
    } else if (key > root->key) {
        root->right = bst_insert(root->right, key);
    }
    // If equal, ignore (no duplicates)

    return root;
}

// O(h) where h is height
TreeNode* bst_search(TreeNode* root, int key, int* comparisons) {
    (*comparisons)++;

    if (root == NULL || root->key == key) {
        return root;
    }

    if (key < root->key) {
        return bst_search(root->left, key, comparisons);
    } else {
        return bst_search(root->right, key, comparisons);
    }
}

// In-order: Left, Root, Right -> Sorted order!
void inorder_traversal(TreeNode* root) {
    if (root == NULL) return;
    inorder_traversal(root->left);
    printf("%d ", root->key);
    inorder_traversal(root->right);
}

// Find minimum (leftmost node)
TreeNode* find_min(TreeNode* root) {
    while (root->left) {
        root = root->left;
    }
    return root;
}

// Deletion (the tricky one!)
TreeNode* bst_delete(TreeNode* root, int key) {
    if (root == NULL) return NULL;

    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else {
        // Found the node to delete

        // Case 1: Leaf node
        if (!root->left && !root->right) {
            free(root);
            return NULL;
        }

        // Case 2: One child
        if (!root->left) {
            TreeNode* temp = root->right;
            free(root);
            return temp;
        }
        if (!root->right) {
            TreeNode* temp = root->left;
            free(root);
            return temp;
        }

        // Case 3: Two children
        // Find in-order successor (smallest in right subtree)
        TreeNode* successor = find_min(root->right);
        root->key = successor->key;
        root->right = bst_delete(root->right, successor->key);
    }

    return root;
}

Tree Visualization

void print_tree_helper(TreeNode* root, int space, int height) {
    if (root == NULL) return;

    space += height;

    print_tree_helper(root->right, space, height);

    printf("\n");
    for (int i = height; i < space; i++) {
        printf(" ");
    }
    printf("%d\n", root->key);

    print_tree_helper(root->left, space, height);
}

void print_tree(TreeNode* root) {
    print_tree_helper(root, 0, 5);
}

int tree_height(TreeNode* root) {
    if (root == NULL) return 0;
    int left_h = tree_height(root->left);
    int right_h = tree_height(root->right);
    return 1 + (left_h > right_h ? left_h : right_h);
}

int is_balanced(TreeNode* root) {
    if (root == NULL) return 1;

    int left_h = tree_height(root->left);
    int right_h = tree_height(root->right);

    if (abs(left_h - right_h) > 1) return 0;

    return is_balanced(root->left) && is_balanced(root->right);
}

AVL Tree (Self-Balancing)

int get_height(TreeNode* node) {
    return node ? node->height : 0;
}

int get_balance(TreeNode* node) {
    return node ? get_height(node->left) - get_height(node->right) : 0;
}

void update_height(TreeNode* node) {
    int left_h = get_height(node->left);
    int right_h = get_height(node->right);
    node->height = 1 + (left_h > right_h ? left_h : right_h);
}

// Right rotation
//       y                x
//      / \              / \
//     x   C    ->      A   y
//    / \                  / \
//   A   B                B   C
TreeNode* rotate_right(TreeNode* y) {
    TreeNode* x = y->left;
    TreeNode* B = x->right;

    x->right = y;
    y->left = B;

    update_height(y);
    update_height(x);

    return x;
}

// Left rotation
TreeNode* rotate_left(TreeNode* x) {
    TreeNode* y = x->right;
    TreeNode* B = y->left;

    y->left = x;
    x->right = B;

    update_height(x);
    update_height(y);

    return y;
}

TreeNode* avl_insert(TreeNode* root, int key) {
    // Standard BST insert
    if (root == NULL) {
        return create_node(key);
    }

    if (key < root->key) {
        root->left = avl_insert(root->left, key);
    } else if (key > root->key) {
        root->right = avl_insert(root->right, key);
    } else {
        return root;  // No duplicates
    }

    // Update height
    update_height(root);

    // Check balance
    int balance = get_balance(root);

    // Left Left Case
    if (balance > 1 && key < root->left->key) {
        printf("Left-Left case: rotating right\n");
        return rotate_right(root);
    }

    // Right Right Case
    if (balance < -1 && key > root->right->key) {
        printf("Right-Right case: rotating left\n");
        return rotate_left(root);
    }

    // Left Right Case
    if (balance > 1 && key > root->left->key) {
        printf("Left-Right case: double rotation\n");
        root->left = rotate_left(root->left);
        return rotate_right(root);
    }

    // Right Left Case
    if (balance < -1 && key < root->right->key) {
        printf("Right-Left case: double rotation\n");
        root->right = rotate_right(root->right);
        return rotate_left(root);
    }

    return root;
}

BST vs Hash Table

Operation Hash Table BST (balanced) BST (unbalanced)
Search O(1) avg O(log n) O(n)
Insert O(1) avg O(log n) O(n)
Delete O(1) avg O(log n) O(n)
Ordered traversal O(n log n) sort O(n) O(n)
Range query O(n) O(log n + k) O(n)
Min/Max O(n) O(log n) O(n)

Use BST when:

  • You need ordered data
  • Range queries are common
  • You need min/max quickly

Use Hash Table when:

  • Order doesn’t matter
  • Just need fast lookup by key

Project 7: Heap & Priority Queue

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Heaps / Priority Queues
  • Software or Tool: Task scheduler, Heapsort visualizer
  • Main Book: Introduction to Algorithms by CLRS

What you’ll build

A binary heap (min and max), priority queue, and heapsort implementation with visualization—plus a practical task scheduler that always runs the highest-priority task.

Why it teaches this concept

Heaps answer the question: What if I only care about the minimum (or maximum)?

A BST gives O(log n) for everything, but requires balancing. A heap gives:

  • O(1) find-min
  • O(log n) insert
  • O(log n) extract-min

And it’s simpler to implement (just an array!). Heaps are used in:

  • Priority scheduling (OS, network packets)
  • Dijkstra’s algorithm (shortest path)
  • Heap sort (in-place O(n log n))
  • Merge k sorted lists

Core challenges you’ll face

  • Heap property (parent ≤ children for min-heap) → maps to partial ordering
  • Array representation (implicit tree structure) → maps to memory efficiency
  • Heapify (restore heap property) → maps to bubble up/down
  • Build heap (O(n) from array) → maps to amortization

Key Concepts

Concept Resource
Heap Operations Introduction to Algorithms Chapter 6 - CLRS
Heapsort Algorithms, Fourth Edition Chapter 2.4 - Sedgewick
Priority Queues Introduction to Algorithms Chapter 6.5 - CLRS

Difficulty & Time

  • Difficulty: Intermediate-Advanced
  • Time estimate: 1 week
  • Prerequisites: Projects 1-4 completed, understanding of trees

Real World Outcome

$ ./heap_demo

=== Min Heap Demo ===
Inserting: 50, 30, 70, 20, 40, 60, 80

Heap as array: [20, 30, 60, 50, 40, 70, 80]

Heap as tree:
        20
       /  \
     30    60
    /  \  /  \
   50 40 70  80

Extract min: 20
After extract: [30, 40, 60, 50, 80, 70]

        30
       /  \
     40    60
    /  \  /
   50 80 70

=== Priority Queue Scheduler ===
Adding tasks:
  [Priority 5] "Low: check emails"
  [Priority 1] "URGENT: server down!"
  [Priority 3] "Medium: review PR"
  [Priority 2] "High: fix bug"

Processing (highest priority = lowest number):
  1. [Priority 1] "URGENT: server down!"
  2. [Priority 2] "High: fix bug"
  3. [Priority 3] "Medium: review PR"
  4. [Priority 5] "Low: check emails"

=== Heapsort Demo ===
Input:  [64, 34, 25, 12, 22, 11, 90]
Sorting...
  Build heap: [90, 34, 64, 12, 22, 11, 25]
  Extract 90: [64, 34, 25, 12, 22, 11 | 90]
  Extract 64: [34, 22, 25, 12, 11 | 64, 90]
  ...
Output: [11, 12, 22, 25, 34, 64, 90]

Heapsort: O(n log n), in-place, not stable

Learning Milestones

  1. Heap property maintained after insert → You understand bubble-up
  2. Extract-min works → You understand bubble-down
  3. Build-heap is O(n) → You understand the clever analysis
  4. Heapsort works in-place → You understand the algorithm

Implementation

Binary Heap (Array-based)

typedef struct {
    int* data;
    int size;
    int capacity;
    int is_min_heap;  // 1 for min-heap, 0 for max-heap
} Heap;

// Parent and child indices (0-indexed)
int parent(int i) { return (i - 1) / 2; }
int left_child(int i) { return 2 * i + 1; }
int right_child(int i) { return 2 * i + 2; }

Heap* heap_create(int capacity, int is_min_heap) {
    Heap* h = malloc(sizeof(Heap));
    h->data = malloc(capacity * sizeof(int));
    h->size = 0;
    h->capacity = capacity;
    h->is_min_heap = is_min_heap;
    return h;
}

int compare(Heap* h, int a, int b) {
    // Returns true if a should be above b
    return h->is_min_heap ? (a < b) : (a > b);
}

void swap(int* a, int* b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

// O(log n) - bubble up
void heap_insert(Heap* h, int value) {
    if (h->size >= h->capacity) {
        h->capacity *= 2;
        h->data = realloc(h->data, h->capacity * sizeof(int));
    }

    // Insert at end
    int i = h->size++;
    h->data[i] = value;

    // Bubble up
    while (i > 0 && compare(h, h->data[i], h->data[parent(i)])) {
        swap(&h->data[i], &h->data[parent(i)]);
        i = parent(i);
    }
}

// O(log n) - bubble down
void heapify_down(Heap* h, int i) {
    int best = i;
    int left = left_child(i);
    int right = right_child(i);

    if (left < h->size && compare(h, h->data[left], h->data[best])) {
        best = left;
    }
    if (right < h->size && compare(h, h->data[right], h->data[best])) {
        best = right;
    }

    if (best != i) {
        swap(&h->data[i], &h->data[best]);
        heapify_down(h, best);
    }
}

// O(1)
int heap_peek(Heap* h) {
    if (h->size == 0) {
        printf("Heap is empty!\n");
        return -1;
    }
    return h->data[0];
}

// O(log n)
int heap_extract(Heap* h) {
    if (h->size == 0) {
        printf("Heap is empty!\n");
        return -1;
    }

    int root = h->data[0];
    h->data[0] = h->data[--h->size];
    heapify_down(h, 0);

    return root;
}

// O(n) - not O(n log n)!
void build_heap(Heap* h, int* arr, int n) {
    h->size = n;
    for (int i = 0; i < n; i++) {
        h->data[i] = arr[i];
    }

    // Start from last non-leaf and heapify down
    for (int i = n / 2 - 1; i >= 0; i--) {
        heapify_down(h, i);
    }
}

Heap Visualization

void heap_print_tree(Heap* h) {
    if (h->size == 0) {
        printf("(empty)\n");
        return;
    }

    int levels = 0;
    int temp = h->size;
    while (temp > 0) {
        levels++;
        temp /= 2;
    }

    int index = 0;
    for (int level = 0; level < levels; level++) {
        int nodes_in_level = 1 << level;
        int spaces = (1 << (levels - level)) - 1;

        // Print leading spaces
        for (int s = 0; s < spaces; s++) printf("  ");

        // Print nodes
        for (int n = 0; n < nodes_in_level && index < h->size; n++) {
            printf("%2d", h->data[index++]);
            // Print spaces between nodes
            for (int s = 0; s < 2 * spaces + 1; s++) printf("  ");
        }
        printf("\n");
    }
}

Heapsort

void heapsort(int* arr, int n) {
    // Build max-heap
    Heap h;
    h.data = arr;
    h.size = n;
    h.capacity = n;
    h.is_min_heap = 0;  // Max heap for ascending sort

    // Build heap: O(n)
    for (int i = n / 2 - 1; i >= 0; i--) {
        heapify_down(&h, i);
    }

    // Extract max n times: O(n log n)
    for (int i = n - 1; i > 0; i--) {
        swap(&arr[0], &arr[i]);
        h.size--;
        heapify_down(&h, 0);
    }
}

Priority Queue Task Scheduler

typedef struct {
    int priority;
    char description[128];
} Task;

typedef struct {
    Task* tasks;
    int size;
    int capacity;
} PriorityQueue;

void pq_insert(PriorityQueue* pq, int priority, const char* desc) {
    if (pq->size >= pq->capacity) {
        pq->capacity *= 2;
        pq->tasks = realloc(pq->tasks, pq->capacity * sizeof(Task));
    }

    int i = pq->size++;
    pq->tasks[i].priority = priority;
    strcpy(pq->tasks[i].description, desc);

    // Bubble up (lower number = higher priority)
    while (i > 0 && pq->tasks[i].priority < pq->tasks[(i-1)/2].priority) {
        Task temp = pq->tasks[i];
        pq->tasks[i] = pq->tasks[(i-1)/2];
        pq->tasks[(i-1)/2] = temp;
        i = (i - 1) / 2;
    }
}

Task pq_extract(PriorityQueue* pq) {
    Task highest = pq->tasks[0];
    pq->tasks[0] = pq->tasks[--pq->size];

    // Bubble down
    int i = 0;
    while (1) {
        int best = i;
        int left = 2 * i + 1;
        int right = 2 * i + 2;

        if (left < pq->size &&
            pq->tasks[left].priority < pq->tasks[best].priority) {
            best = left;
        }
        if (right < pq->size &&
            pq->tasks[right].priority < pq->tasks[best].priority) {
            best = right;
        }

        if (best == i) break;

        Task temp = pq->tasks[i];
        pq->tasks[i] = pq->tasks[best];
        pq->tasks[best] = temp;
        i = best;
    }

    return highest;
}

Heap vs BST vs Sorted Array

Operation Heap BST (balanced) Sorted Array
Find min/max O(1) O(log n) O(1)
Insert O(log n) O(log n) O(n)
Extract min/max O(log n) O(log n) O(1) or O(n)
Build from array O(n) O(n log n) O(n log n)
Find arbitrary O(n) O(log n) O(log n)
Memory Minimal (array) Pointers overhead Minimal

Use Heap when:

  • You only need min or max, not both
  • Priority queue operations
  • Heapsort (in-place)

Project 8: Graph Algorithms Visualizer

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python, JavaScript
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Graphs / Algorithms
  • Software or Tool: Terminal/ncurses visualizer
  • Main Book: Introduction to Algorithms by CLRS

What you’ll build

A graph library with visualization that demonstrates BFS, DFS, Dijkstra’s shortest path, and topological sort—showing how graphs model real-world relationships.

Why it teaches this concept

Graphs are everywhere:

  • Social networks (people and friendships)
  • Maps (cities and roads)
  • The internet (pages and links)
  • Dependencies (packages, tasks)
  • Neural networks

Understanding graphs unlocks:

  • Shortest path (GPS navigation)
  • Social network analysis
  • Dependency resolution (npm, cargo)
  • Web crawling
  • Recommendation systems

Core challenges you’ll face

  • Graph representation (adjacency list vs matrix) → maps to space/time trade-off
  • BFS (shortest path in unweighted graphs) → maps to level-order
  • DFS (explore deeply, backtrack) → maps to stack/recursion
  • Dijkstra’s (shortest path with weights) → maps to greedy + heap
  • Topological sort (ordering with dependencies) → maps to DAGs

Resources for Key Challenges

Key Concepts

Concept Resource
Graph Representations Introduction to Algorithms Chapter 22 - CLRS
BFS/DFS Introduction to Algorithms Chapter 22.2-22.3 - CLRS
Dijkstra’s Algorithm Introduction to Algorithms Chapter 24.3 - CLRS
Topological Sort Introduction to Algorithms Chapter 22.4 - CLRS

Difficulty & Time

  • Difficulty: Advanced
  • Time estimate: 2 weeks
  • Prerequisites: Projects 1-7 completed, especially heap

Real World Outcome

$ ./graph_demo

=== Graph Representation ===
Creating a city road network...

Adjacency List:
  NYC -> [Boston(215), Philadelphia(97)]
  Boston -> [NYC(215), Portland(108)]
  Philadelphia -> [NYC(97), DC(140)]
  DC -> [Philadelphia(140), Richmond(109)]
  Portland -> [Boston(108)]
  Richmond -> [DC(109)]

Adjacency Matrix:
           NYC  BOS  PHL  DC   PRT  RCH
    NYC     0   215   97   0    0    0
    BOS   215    0    0    0   108   0
    PHL    97    0    0   140   0    0
    DC      0    0   140   0    0   109
    PRT     0   108   0    0    0    0
    RCH     0    0    0   109   0    0

=== BFS: Shortest Path (unweighted) ===
From NYC, how many hops to each city?

         NYC
         /  \
      BOS   PHL
       |      \
     PRT      DC
               |
             RCH

Distances: NYC(0) BOS(1) PHL(1) DC(2) PRT(2) RCH(3)

=== DFS: Explore Deeply ===
Starting from NYC...
Visit order: NYC -> Boston -> Portland -> (backtrack) ->
             Philadelphia -> DC -> Richmond

=== Dijkstra: Shortest Path (weighted) ===
Shortest path from NYC to Richmond:

Step 1: Process NYC, update neighbors
  Boston: ∞ -> 215, Philadelphia: ∞ -> 97
Step 2: Process Philadelphia (closest), update neighbors
  DC: ∞ -> 237
Step 3: Process Boston, update neighbors
  Portland: ∞ -> 323
Step 4: Process DC, update neighbors
  Richmond: ∞ -> 346
Step 5: Process Portland (no new paths)
Step 6: Process Richmond (destination!)

Shortest path NYC -> Richmond: 346 miles
Route: NYC -> Philadelphia -> DC -> Richmond

=== Topological Sort: Task Dependencies ===
Build system dependencies:
  main.c depends on: utils.h, config.h
  utils.c depends on: utils.h
  config.c depends on: config.h
  program depends on: main.o, utils.o, config.o

Valid build order:
  1. config.h
  2. utils.h
  3. config.c -> config.o
  4. utils.c -> utils.o
  5. main.c -> main.o
  6. program

Learning Milestones

  1. Can create graphs both ways → You understand representation trade-offs
  2. BFS finds shortest unweighted path → You understand level-order
  3. DFS explores deeply → You understand backtracking
  4. Dijkstra’s with heap works → You understand greedy algorithms
  5. Topological sort orders dependencies → You understand DAGs

Implementation

Graph Representation

// Adjacency List (better for sparse graphs)
typedef struct Edge {
    int destination;
    int weight;
    struct Edge* next;
} Edge;

typedef struct {
    Edge** adj_list;  // Array of linked lists
    int num_vertices;
    int is_directed;
} Graph;

Graph* graph_create(int vertices, int is_directed) {
    Graph* g = malloc(sizeof(Graph));
    g->num_vertices = vertices;
    g->is_directed = is_directed;
    g->adj_list = calloc(vertices, sizeof(Edge*));
    return g;
}

void graph_add_edge(Graph* g, int src, int dest, int weight) {
    Edge* edge = malloc(sizeof(Edge));
    edge->destination = dest;
    edge->weight = weight;
    edge->next = g->adj_list[src];
    g->adj_list[src] = edge;

    if (!g->is_directed) {
        Edge* reverse = malloc(sizeof(Edge));
        reverse->destination = src;
        reverse->weight = weight;
        reverse->next = g->adj_list[dest];
        g->adj_list[dest] = reverse;
    }
}
void bfs(Graph* g, int start) {
    int* visited = calloc(g->num_vertices, sizeof(int));
    int* distance = malloc(g->num_vertices * sizeof(int));
    for (int i = 0; i < g->num_vertices; i++) {
        distance[i] = -1;
    }

    // Queue
    int* queue = malloc(g->num_vertices * sizeof(int));
    int front = 0, rear = 0;

    visited[start] = 1;
    distance[start] = 0;
    queue[rear++] = start;

    printf("BFS from vertex %d:\n", start);

    while (front < rear) {
        int current = queue[front++];
        printf("  Visit %d (distance %d)\n", current, distance[current]);

        Edge* edge = g->adj_list[current];
        while (edge) {
            if (!visited[edge->destination]) {
                visited[edge->destination] = 1;
                distance[edge->destination] = distance[current] + 1;
                queue[rear++] = edge->destination;
            }
            edge = edge->next;
        }
    }

    free(visited);
    free(distance);
    free(queue);
}
void dfs_helper(Graph* g, int vertex, int* visited) {
    visited[vertex] = 1;
    printf("  Visit %d\n", vertex);

    Edge* edge = g->adj_list[vertex];
    while (edge) {
        if (!visited[edge->destination]) {
            dfs_helper(g, edge->destination, visited);
        }
        edge = edge->next;
    }

    printf("  Backtrack from %d\n", vertex);
}

void dfs(Graph* g, int start) {
    int* visited = calloc(g->num_vertices, sizeof(int));
    printf("DFS from vertex %d:\n", start);
    dfs_helper(g, start, visited);
    free(visited);
}

Dijkstra’s Algorithm (with Min-Heap)

typedef struct {
    int vertex;
    int distance;
} HeapNode;

void dijkstra(Graph* g, int start, int* dist, int* prev) {
    // Initialize distances
    for (int i = 0; i < g->num_vertices; i++) {
        dist[i] = INT_MAX;
        prev[i] = -1;
    }
    dist[start] = 0;

    // Min-heap priority queue
    HeapNode* heap = malloc(g->num_vertices * sizeof(HeapNode));
    int heap_size = 0;

    // Insert start
    heap[heap_size++] = (HeapNode){start, 0};

    int* processed = calloc(g->num_vertices, sizeof(int));

    while (heap_size > 0) {
        // Extract min (simplified - should use proper heap)
        int min_idx = 0;
        for (int i = 1; i < heap_size; i++) {
            if (heap[i].distance < heap[min_idx].distance) {
                min_idx = i;
            }
        }

        HeapNode current = heap[min_idx];
        heap[min_idx] = heap[--heap_size];

        if (processed[current.vertex]) continue;
        processed[current.vertex] = 1;

        printf("Process vertex %d (distance %d)\n",
               current.vertex, current.distance);

        // Update neighbors
        Edge* edge = g->adj_list[current.vertex];
        while (edge) {
            int new_dist = dist[current.vertex] + edge->weight;
            if (new_dist < dist[edge->destination]) {
                dist[edge->destination] = new_dist;
                prev[edge->destination] = current.vertex;
                heap[heap_size++] = (HeapNode){edge->destination, new_dist};
                printf("  Update %d: distance = %d\n",
                       edge->destination, new_dist);
            }
            edge = edge->next;
        }
    }

    free(heap);
    free(processed);
}

void print_path(int* prev, int dest) {
    if (prev[dest] != -1) {
        print_path(prev, prev[dest]);
        printf(" -> ");
    }
    printf("%d", dest);
}

Topological Sort (Kahn’s Algorithm)

void topological_sort(Graph* g) {
    int* in_degree = calloc(g->num_vertices, sizeof(int));

    // Calculate in-degrees
    for (int v = 0; v < g->num_vertices; v++) {
        Edge* edge = g->adj_list[v];
        while (edge) {
            in_degree[edge->destination]++;
            edge = edge->next;
        }
    }

    // Queue of vertices with in-degree 0
    int* queue = malloc(g->num_vertices * sizeof(int));
    int front = 0, rear = 0;

    for (int v = 0; v < g->num_vertices; v++) {
        if (in_degree[v] == 0) {
            queue[rear++] = v;
        }
    }

    printf("Topological order: ");
    int count = 0;

    while (front < rear) {
        int current = queue[front++];
        printf("%d ", current);
        count++;

        Edge* edge = g->adj_list[current];
        while (edge) {
            in_degree[edge->destination]--;
            if (in_degree[edge->destination] == 0) {
                queue[rear++] = edge->destination;
            }
            edge = edge->next;
        }
    }

    if (count != g->num_vertices) {
        printf("\nCycle detected! Not a DAG.\n");
    }
    printf("\n");

    free(in_degree);
    free(queue);
}

Adjacency List vs Adjacency Matrix

Aspect Adjacency List Adjacency Matrix
Space O(V + E) O(V²)
Check if edge exists O(degree) O(1)
Find all neighbors O(degree) O(V)
Add edge O(1) O(1)
Best for Sparse graphs Dense graphs

Project 9: Mini Database Engine (Capstone)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Databases / Systems Programming
  • Software or Tool: SQL-like query engine
  • Main Book: Database Internals by Alex Petrov

What you’ll build

A mini database that combines everything you’ve learned: B-tree indexing, hash tables for lookups, heaps for sorting, and more—with a simple SQL-like query language.

Why this is the capstone

Real databases are the ultimate data structure application:

  • Hash indexes for equality lookups
  • B-trees for range queries
  • Heaps for sorting and top-k
  • Graphs for query planning

Building a mini database proves you understand when to use each structure.

Core challenges you’ll face

  • Storage engine (pages, buffer pool) → maps to memory management
  • B-tree index (balanced tree for disk) → maps to tree structures
  • Hash index (fast equality lookup) → maps to hash tables
  • Query execution (scans, filters, joins) → maps to algorithm design
  • Query parsing (SQL-like syntax) → maps to stack machines

Key Concepts

Concept Resource
B-Trees Introduction to Algorithms Chapter 18 - CLRS
Database Storage Database Internals Chapters 1-4 - Alex Petrov
Query Processing Database System Concepts Chapter 12 - Silberschatz
Buffer Management Database Internals Chapter 5 - Alex Petrov

Difficulty & Time

  • Difficulty: Expert/Master
  • Time estimate: 4-6 weeks
  • Prerequisites: All previous projects completed

Real World Outcome

$ ./minidb

MiniDB v1.0 - A learning database engine
Type 'help' for commands.

minidb> CREATE TABLE users (id INT, name TEXT, age INT);
Table 'users' created.

minidb> CREATE INDEX idx_id ON users(id) USING HASH;
Hash index 'idx_id' created on users(id).

minidb> CREATE INDEX idx_age ON users(age) USING BTREE;
B-tree index 'idx_age' created on users(age).

minidb> INSERT INTO users VALUES (1, 'Alice', 30);
INSERT INTO users VALUES (2, 'Bob', 25);
INSERT INTO users VALUES (3, 'Charlie', 35);
Inserted 3 rows.

minidb> SELECT * FROM users;
+----+---------+-----+
| id | name    | age |
+----+---------+-----+
| 1  | Alice   | 30  |
| 2  | Bob     | 25  |
| 3  | Charlie | 35  |
+----+---------+-----+
3 rows returned (full table scan)

minidb> SELECT * FROM users WHERE id = 2;
+----+------+-----+
| id | name | age |
+----+------+-----+
| 2  | Bob  | 25  |
+----+------+-----+
1 row returned (used hash index 'idx_id')

minidb> SELECT * FROM users WHERE age > 28 ORDER BY age;
+----+---------+-----+
| id | name    | age |
+----+---------+-----+
| 1  | Alice   | 30  |
| 3  | Charlie | 35  |
+----+---------+-----+
2 rows returned (used B-tree index 'idx_age' for range scan)

minidb> EXPLAIN SELECT * FROM users WHERE age > 28;
Query Plan:
  -> Index Range Scan on idx_age (age > 28)
     -> Fetch rows from table 'users'
        -> Return results

minidb> .stats
Database Statistics:
  Tables: 1
  Total rows: 3
  Hash indexes: 1 (idx_id)
  B-tree indexes: 1 (idx_age)
  Buffer pool: 16 pages, 2 dirty

minidb> .dump users
Page 0: [1|Alice|30] [2|Bob|25] [3|Charlie|35]
B-tree idx_age:
        [30]
       /    \
    [25]    [35]

Learning Milestones

  1. Can store and retrieve rows → You understand storage
  2. Hash index speeds up equality queries → You understand hash tables in practice
  3. B-tree index enables range queries → You understand balanced trees in practice
  4. Query planner chooses correct index → You understand optimization

Architecture Overview

┌────────────────────────────────────────────────────────────────┐
│                         MiniDB                                  │
├────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │ SQL Parser   │  │ Query        │  │ Query Executor       │ │
│  │              │→ │ Planner      │→ │                      │ │
│  │ (Stack-based)│  │ (Choose idx) │  │ (Scan/Filter/Sort)  │ │
│  └──────────────┘  └──────────────┘  └──────────────────────┘ │
│                                              │                  │
│                                              ▼                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    Index Manager                          │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │ │
│  │  │ Hash Index  │  │ B-Tree Index│  │ (future: R-tree)│  │ │
│  │  │  O(1) ==    │  │ O(log n) <> │  │                 │  │ │
│  │  └─────────────┘  └─────────────┘  └─────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                              │                  │
│                                              ▼                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                   Storage Engine                          │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │ │
│  │  │ Buffer Pool │  │ Page Manager│  │ Heap File       │  │ │
│  │  │ (LRU Cache) │  │ (Slotted)   │  │ (Row Storage)   │  │ │
│  │  └─────────────┘  └─────────────┘  └─────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                              │                  │
│                                              ▼                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    Disk (Files)                           │ │
│  │  [users.dat] [users_idx_id.hash] [users_idx_age.btree]   │ │
│  └──────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘

Implementation Skeleton

// Row storage
typedef struct {
    int id;
    char name[64];
    int age;
} Row;

// Table metadata
typedef struct {
    char name[64];
    int num_rows;
    Row* rows;  // Simple in-memory for now
    HashTable* hash_indexes;
    BTree* btree_indexes;
} Table;

// Database
typedef struct {
    Table* tables[MAX_TABLES];
    int num_tables;
} Database;

// Query result
typedef struct {
    Row* rows;
    int count;
    char index_used[64];
} QueryResult;

// Execute SELECT with index selection
QueryResult* execute_select(Table* table, const char* column,
                           const char* op, int value) {
    QueryResult* result = malloc(sizeof(QueryResult));
    result->rows = malloc(table->num_rows * sizeof(Row));
    result->count = 0;

    // Check for applicable index
    if (strcmp(op, "=") == 0 && has_hash_index(table, column)) {
        // Use hash index for O(1) lookup
        int* row_ids = hash_index_lookup(table, column, value);
        strcpy(result->index_used, "hash");
        // Fetch rows...
    }
    else if ((strcmp(op, ">") == 0 || strcmp(op, "<") == 0) &&
             has_btree_index(table, column)) {
        // Use B-tree for range scan
        int* row_ids = btree_range_scan(table, column, op, value);
        strcpy(result->index_used, "btree");
        // Fetch rows...
    }
    else {
        // Full table scan
        strcpy(result->index_used, "none (full scan)");
        for (int i = 0; i < table->num_rows; i++) {
            if (matches_condition(&table->rows[i], column, op, value)) {
                result->rows[result->count++] = table->rows[i];
            }
        }
    }

    return result;
}

Essential Resources Summary

Primary Books (in order)

  1. Introduction to Algorithms (CLRS) - The bible of algorithms and data structures
  2. Algorithms, Fourth Edition by Sedgewick - More accessible, excellent visualizations
  3. Computer Systems: A Programmer’s Perspective - For understanding memory
  4. Grokking Algorithms - Visual, beginner-friendly
  5. Database Internals by Alex Petrov - For the capstone project

From Your Library

  • C Programming: A Modern Approach by K.N. King - For implementation
  • The C Programming Language by K&R - Reference
  • Data Structures the Fun Way by Jeremy Kubica - Visual learning
  • Grokking Data Structures by Marcello La Rocca - Beginner-friendly
  • Advanced Algorithms and Data Structures by Marcello La Rocca - Advanced topics

Online Resources


The Big Picture: When to Use What

I Need To… Use This Why
Access by index Array O(1) random access
Insert/delete frequently at ends Linked List O(1) with pointers
LIFO (last in, first out) Stack O(1) push/pop
FIFO (first in, first out) Queue O(1) enqueue/dequeue
Fast lookup by key Hash Table O(1) average
Ordered data + fast search BST / B-Tree O(log n) everything
Fast min/max only Heap O(1) peek, O(log n) extract
Model relationships Graph Represents connections
Combine multiple needs Custom e.g., LRU = Hash + DLL

Appendix: Complexity Cheat Sheet

Data Structure Access Search Insert Delete Space
Array O(1) O(n) O(n) O(n) O(n)
Dynamic Array O(1) O(n) O(1)* O(n) O(n)
Linked List O(n) O(n) O(1)† O(1)† O(n)
Stack O(n) O(n) O(1) O(1) O(n)
Queue O(n) O(n) O(1) O(1) O(n)
Hash Table N/A O(1)* O(1)* O(1)* O(n)
BST (balanced) O(log n) O(log n) O(log n) O(log n) O(n)
BST (unbalanced) O(n) O(n) O(n) O(n) O(n)
Heap O(1)‡ O(n) O(log n) O(log n) O(n)
Graph (adj list) O(V) O(V) O(1) O(E) O(V+E)

*amortized †if you have the pointer ‡min/max only