DATA STRUCTURES FROM FIRST PRINCIPLES
Learning Data Structures: From First Principles
Goal: Deeply understand what data structures are, why they exist, how they work at the memory level, and when to use each one—through hands-on projects that produce real outcomes.
Part 1: Why Do Data Structures Exist?
The Fundamental Problem
Every program needs to store and organize collections of data. But here’s the critical insight:
How you organize data determines how fast your program runs.
Consider this: You have 1 million user records. How do you store them?
| Operation | Unsorted Array | Sorted Array | Hash Table | Binary Search Tree |
|---|---|---|---|---|
| Find user by ID | O(n) = 1M checks | O(log n) = 20 checks | O(1) avg = 1 check | O(log n) = 20 checks |
| Add new user | O(1) = instant | O(n) = 1M shifts | O(1) avg = instant | O(log n) = 20 steps |
| Delete user | O(n) = 1M checks | O(n) = 1M shifts | O(1) avg = instant | O(log n) = 20 steps |
| List in order | O(n log n) = sort | O(n) = just read | O(n log n) = sort | O(n) = in-order walk |
There is no perfect data structure. Every choice is a trade-off between:
- Time (how fast are operations?)
- Space (how much memory?)
- Complexity (how hard to implement correctly?)
The Mental Model
Think of data structures as containers with rules:
┌─────────────────────────────────────────────────────────────────┐
│ DATA STRUCTURE │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ INTERFACE │ │
│ │ - What operations can I do? (insert, find, delete...) │ │
│ │ - How fast is each operation? (O(1), O(log n), O(n)) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ IMPLEMENTATION │ │
│ │ - How is data physically stored in memory? │ │
│ │ - How do the operations actually work? │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MEMORY │ │
│ │ - Contiguous? (array) or Scattered? (linked) │ │
│ │ - Cache-friendly? (locality of reference) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
The Data Structure Family Tree
Data Structures
│
┌────────────────────┴────────────────────┐
│ │
Linear Non-Linear
│ │
┌────────┼────────┐ ┌──────────┼──────────┐
│ │ │ │ │ │
Array Linked Stack Tree Graph Hash
List Queue │ Table
│ ┌────────┼────────┐
┌──────┴──────┐ │ │ │
│ │ BST Heap Trie
Priority Deque │
Queue B-Tree
AVL/RB
Part 2: The Learning Path
Phase 1: Foundations (3-4 weeks)
- Project 1: Memory Arena & Dynamic Array
- Project 2: Linked List Toolkit
Phase 2: Abstract Data Types (2-3 weeks)
- Project 3: Stack Machine (Calculator/VM)
- Project 4: Queue Systems (Task Scheduler)
Phase 3: Fast Lookup (3-4 weeks)
- Project 5: Hash Table (Dictionary/Cache)
- Project 6: Binary Search Tree & Balancing
Phase 4: Specialized Structures (3-4 weeks)
- Project 7: Heap & Priority Queue
- Project 8: Graph Algorithms Visualizer
Phase 5: Capstone (4-6 weeks)
- Project 9: Mini Database Engine
Project Comparison Table
| Project | Difficulty | Time | Key Insight | Real-World Use |
|---|---|---|---|---|
| Memory Arena & Dynamic Array | ⭐⭐ | 1 week | Memory is just bytes | Every program uses arrays |
| Linked List Toolkit | ⭐⭐ | 1 week | Pointers connect scattered memory | Undo systems, playlists |
| Stack Machine | ⭐⭐⭐ | 1 week | LIFO enables recursion/backtracking | Compilers, calculators |
| Queue Systems | ⭐⭐⭐ | 1 week | FIFO enables fairness/ordering | Job schedulers, BFS |
| Hash Table | ⭐⭐⭐⭐ | 2 weeks | Trade space for O(1) lookup | Databases, caches, dictionaries |
| BST & Balancing | ⭐⭐⭐⭐ | 2 weeks | Ordering enables fast search | Databases, file systems |
| Heap & Priority Queue | ⭐⭐⭐ | 1 week | Partial ordering is cheaper | Schedulers, Dijkstra’s |
| Graph Visualizer | ⭐⭐⭐⭐ | 2 weeks | Relationships are everywhere | Maps, social networks, AI |
| Mini Database | ⭐⭐⭐⭐⭐ | 4-6 weeks | Combine everything | Real databases |
Project 1: Memory Arena & Dynamic Array
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Memory Management / Low-Level Programming
- Software or Tool: Custom memory allocator
- Main Book: Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron
What you’ll build
A memory arena (bump allocator) and a dynamic array (like C++’s std::vector or Python’s list) from scratch, visualizing memory layout as you add and remove elements.
Why it teaches data structures fundamentals
This is where everything starts. Before you can understand ANY data structure, you must understand:
- What is memory? (just a giant array of bytes)
- What is an address? (index into that array)
- What is allocation? (reserving bytes)
- What is contiguous storage? (elements side-by-side)
Most people use malloc() as a black box. By building your own allocator, you’ll see that memory is just bytes, and data structures are just ways of interpreting those bytes.
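To make "memory is just bytes" concrete before diving in, here is a minimal standalone sketch (not part of the project code) that walks over an int one byte at a time using nothing but pointer arithmetic:
#include <stdio.h>
#include <stdint.h>
int main(void) {
    int x = 0x11223344;             // one 4-byte object...
    uint8_t* bytes = (uint8_t*)&x;  // ...viewed as raw bytes
    printf("x lives at %p and occupies %zu bytes\n", (void*)&x, sizeof x);
    for (size_t i = 0; i < sizeof x; i++) {
        // bytes + i is plain pointer arithmetic: base address + offset
        printf("  byte %zu at %p = 0x%02X\n", i, (void*)(bytes + i), (unsigned)bytes[i]);
    }
    // On a little-endian machine (e.g. x86) this typically prints 44 33 22 11
    return 0;
}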
Core challenges you’ll face
- Memory as bytes (everything is just numbers at addresses) → maps to memory model
- Pointer arithmetic (address + offset = new address) → maps to array indexing
- Reallocation (grow array when full) → maps to amortized analysis
- Memory fragmentation (holes in memory) → maps to allocator design
- Cache locality (contiguous = fast) → maps to performance
Key Concepts
| Concept | Resource |
|---|---|
| Memory Layout | Computer Systems: A Programmer’s Perspective Chapter 9 - Bryant & O’Hallaron |
| Pointer Arithmetic | C Programming: A Modern Approach Chapter 12 - K.N. King |
| Dynamic Arrays | Introduction to Algorithms Chapter 17.4 (Amortized Analysis) - CLRS |
| Cache Performance | The Lost Art of Structure Packing - Eric S. Raymond |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Basic C, understanding of pointers
Real World Outcome
$ ./memory_demo
=== Memory Arena Demo ===
Arena created: 1MB at address 0x7f1234500000
Allocating 100 bytes... got 0x7f1234500000
Allocating 200 bytes... got 0x7f1234500068 (offset: 104, the 100-byte block rounded up for 8-byte alignment)
Allocating 50 bytes... got 0x7f1234500130 (offset: 304)
Memory map:
[0x7f1234500000] ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
^100 bytes ^200 bytes ^50 bytes ^free space
Arena reset! All allocations freed instantly (no individual free calls needed)
=== Dynamic Array Demo ===
Creating dynamic array with initial capacity 4...
Push 1: [1, _, _, _] capacity=4, size=1
Push 2: [1, 2, _, _] capacity=4, size=2
Push 3: [1, 2, 3, _] capacity=4, size=3
Push 4: [1, 2, 3, 4] capacity=4, size=4
Push 5: [1, 2, 3, 4, 5, _, _, _] capacity=8, size=5 ← GREW! (copied to new memory)
Memory addresses:
Before grow: data at 0x7f1234600000
After grow: data at 0x7f1234700000 (new location!)
Performance test: push 1,000,000 elements
Total time: 0.023s
Total reallocations: 20 (log2 of 1M)
Amortized cost per push: O(1) ✓
Random access test:
arr[0] = 1 (0.000001ms)
arr[500000] = ? (0.000001ms) ← Same speed! O(1) random access ✓
Learning Milestones
- Arena allocates bytes → You understand memory is just bytes
- Dynamic array grows → You understand reallocation and copying
- Growth is O(1) amortized → You understand why doubling works
- Random access is O(1) → You understand contiguous memory
Implementation
Memory Arena (Bump Allocator)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t* base; // Start of memory block
size_t size; // Total size
size_t offset; // Current position (bump pointer)
} Arena;
Arena* arena_create(size_t size) {
Arena* arena = malloc(sizeof(Arena));
arena->base = malloc(size);
arena->size = size;
arena->offset = 0;
printf("Arena created: %zu bytes at %p\n", size, arena->base);
return arena;
}
void* arena_alloc(Arena* arena, size_t size) {
// Align to 8 bytes for performance
size_t aligned_size = (size + 7) & ~7;
if (arena->offset + aligned_size > arena->size) {
printf("Arena out of memory!\n");
return NULL;
}
void* ptr = arena->base + arena->offset;
arena->offset += aligned_size;
printf("Allocated %zu bytes at %p (offset: %zu)\n",
size, ptr, arena->offset - aligned_size);
return ptr;
}
void arena_reset(Arena* arena) {
arena->offset = 0; // Instant "free" of everything!
printf("Arena reset! All memory available again.\n");
}
void arena_destroy(Arena* arena) {
free(arena->base);
free(arena);
}
// Visualize memory usage
void arena_visualize(Arena* arena) {
printf("\nMemory map (%zu / %zu bytes used):\n[", arena->offset, arena->size);
int blocks = 50;
int used = (arena->offset * blocks) / arena->size;
for (int i = 0; i < blocks; i++) {
printf("%s", i < used ? "█" : "░");
}
printf("]\n\n");
}
Dynamic Array
typedef struct {
int* data; // Pointer to array
size_t size; // Number of elements
size_t capacity; // Allocated space
} DynArray;
DynArray* dynarray_create(size_t initial_capacity) {
DynArray* arr = malloc(sizeof(DynArray));
arr->data = malloc(initial_capacity * sizeof(int));
arr->size = 0;
arr->capacity = initial_capacity;
printf("Created array with capacity %zu at %p\n",
initial_capacity, arr->data);
return arr;
}
void dynarray_push(DynArray* arr, int value) {
// Check if we need to grow
if (arr->size >= arr->capacity) {
size_t new_capacity = arr->capacity * 2; // Double!
int* new_data = malloc(new_capacity * sizeof(int));
// Copy old data to new location
memcpy(new_data, arr->data, arr->size * sizeof(int));
printf("GROW: %zu -> %zu (moved from %p to %p)\n",
arr->capacity, new_capacity, arr->data, new_data);
free(arr->data);
arr->data = new_data;
arr->capacity = new_capacity;
}
arr->data[arr->size++] = value;
}
int dynarray_get(DynArray* arr, size_t index) {
if (index >= arr->size) {
printf("Index out of bounds!\n");
return -1;
}
return arr->data[index]; // O(1) - just pointer + offset!
}
void dynarray_visualize(DynArray* arr) {
printf("[");
for (size_t i = 0; i < arr->capacity; i++) {
if (i < arr->size) {
printf("%d", arr->data[i]);
} else {
printf("_");
}
if (i < arr->capacity - 1) printf(", ");
}
printf("] size=%zu, capacity=%zu\n", arr->size, arr->capacity);
}
Why Arrays Are Cache-Friendly
ARRAY (contiguous memory):
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ ← All in one cache line (64 bytes)
└───┴───┴───┴───┴───┴───┴───┴───┘
↑
CPU fetches entire line, gets 8 elements for price of 1!
LINKED LIST (scattered memory):
┌───┐ ┌───┐ ┌───┐ ┌───┐
│ 1 │────►│ 2 │────►│ 3 │────►│ 4 │
└───┘ └───┘ └───┘ └───┘
↑ cache miss ↑ cache miss ↑ cache miss
Each access may require fetching a new cache line!
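To see the difference yourself, here is a rough benchmark sketch. The exact numbers depend on your machine, compiler flags, and allocator, and a freshly built list (nodes allocated in order) understates the penalty compared to nodes that are truly scattered, so treat the output as a hint rather than a law:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
typedef struct LNode { int value; struct LNode* next; } LNode;
#define N 5000000
int main(void) {
    int* arr = malloc(N * sizeof(int));   // contiguous storage
    LNode* head = NULL;                   // one heap node per element
    for (int i = N - 1; i >= 0; i--) {
        arr[i] = i;
        LNode* n = malloc(sizeof(LNode));
        n->value = i;
        n->next = head;
        head = n;
    }
    clock_t t0 = clock();
    long long sum_a = 0;
    for (int i = 0; i < N; i++) sum_a += arr[i];              // sequential, cache-friendly
    clock_t t1 = clock();
    long long sum_l = 0;
    for (LNode* n = head; n; n = n->next) sum_l += n->value;  // pointer chasing
    clock_t t2 = clock();
    printf("array sum %lld in %.3fs\n", sum_a, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("list  sum %lld in %.3fs\n", sum_l, (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}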
The Amortization Insight
Why does doubling capacity give O(1) amortized push?
Operations to push n elements:
n pushes (always)
+ copy 1 element (when growing from 1 to 2)
+ copy 2 elements (when growing from 2 to 4)
+ copy 4 elements (when growing from 4 to 8)
+ ...
+ copy n/2 elements (when growing from n/2 to n)
Total copies = 1 + 2 + 4 + ... + n/2 = n - 1
Total work = n pushes + (n-1) copies ≈ 2n = O(n)
Average work per push = O(n) / n = O(1) ✓
Project 2: Linked List Toolkit
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Pointers / Dynamic Memory
- Software or Tool: CLI list manipulation tool
- Main Book: C Programming: A Modern Approach by K.N. King
What you’ll build
A comprehensive linked list library (singly-linked, doubly-linked, circular) with visualization, plus a practical application: an undo/redo system for a text editor.
Why it teaches this concept
Linked lists are the opposite of arrays:
- Arrays: contiguous memory, random access, expensive insert/delete
- Linked lists: scattered memory, sequential access, cheap insert/delete
Understanding this trade-off is fundamental. You’ll also master pointers, which are essential for all complex data structures.
Core challenges you’ll face
- Pointer manipulation (next, prev, head, tail) → maps to indirection
- Edge cases (empty list, one element, head/tail operations) → maps to boundary conditions
- Memory management (malloc each node, free on remove) → maps to manual memory
- Traversal (must walk from head) → maps to O(n) access
- Doubly-linked (can go backwards) → maps to trade-offs
Key Concepts
| Concept | Resource |
|---|---|
| Pointer Fundamentals | C Programming: A Modern Approach Chapter 11 - K.N. King |
| Linked List Operations | Introduction to Algorithms Chapter 10.2 - CLRS |
| Memory Allocation | The C Programming Language Chapter 7 - K&R |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Project 1 completed, comfortable with pointers
Real World Outcome
$ ./linkedlist_demo
=== Singly Linked List ===
Insert 10 at head: [10] -> NULL
Insert 20 at head: [20] -> [10] -> NULL
Insert 30 at tail: [20] -> [10] -> [30] -> NULL
Insert 25 after 10: [20] -> [10] -> [25] -> [30] -> NULL
Memory layout:
Node@0x1000: data=20, next=0x1020
Node@0x1020: data=10, next=0x1040
Node@0x1040: data=25, next=0x1060
Node@0x1060: data=30, next=NULL
Delete 10: [20] -> [25] -> [30] -> NULL
(freed memory at 0x1020)
=== Doubly Linked List ===
[NULL <-> 20 <-> 25 <-> 30 <-> NULL]
Traverse forward: 20 -> 25 -> 30
Traverse backward: 30 -> 25 -> 20
=== Undo/Redo System Demo ===
Text: ""
> type "Hello"
Text: "Hello" [undo stack: "Hello"]
> type " World"
Text: "Hello World" [undo stack: "Hello", " World"]
> undo
Text: "Hello" [redo stack: " World"]
> undo
Text: "" [redo stack: " World", "Hello"]
> redo
Text: "Hello" [undo stack: "Hello"]
> type "!"
Text: "Hello!" [redo stack cleared!]
Learning Milestones
- Insert/delete at head is O(1) → You understand the advantage over arrays
- Find element is O(n) → You understand the disadvantage
- Doubly-linked enables O(1) delete given node → You understand the trade-off
- Undo/redo works → You understand why linked lists exist in real systems
Implementation
Singly Linked List
typedef struct Node {
int data;
struct Node* next;
} Node;
typedef struct {
Node* head;
Node* tail;
size_t size;
} SinglyLinkedList;
// O(1) - just update head
void sll_push_front(SinglyLinkedList* list, int data) {
Node* new_node = malloc(sizeof(Node));
new_node->data = data;
new_node->next = list->head;
list->head = new_node;
if (list->tail == NULL) {
list->tail = new_node;
}
list->size++;
}
// O(1) - just update tail
void sll_push_back(SinglyLinkedList* list, int data) {
Node* new_node = malloc(sizeof(Node));
new_node->data = data;
new_node->next = NULL;
if (list->tail) {
list->tail->next = new_node;
} else {
list->head = new_node;
}
list->tail = new_node;
list->size++;
}
// O(n) - must traverse to find
Node* sll_find(SinglyLinkedList* list, int data) {
Node* current = list->head;
while (current) {
if (current->data == data) {
return current;
}
current = current->next;
}
return NULL;
}
// O(n) - must find previous node
void sll_delete(SinglyLinkedList* list, int data) {
if (!list->head) return;
// Special case: deleting head
if (list->head->data == data) {
Node* to_delete = list->head;
list->head = list->head->next;
if (list->tail == to_delete) {
list->tail = NULL;
}
free(to_delete);
list->size--;
return;
}
// Find previous node
Node* prev = list->head;
while (prev->next && prev->next->data != data) {
prev = prev->next;
}
if (prev->next) {
Node* to_delete = prev->next;
prev->next = to_delete->next;
if (list->tail == to_delete) {
list->tail = prev;
}
free(to_delete);
list->size--;
}
}
void sll_print(SinglyLinkedList* list) {
Node* current = list->head;
while (current) {
printf("[%d]", current->data);
if (current->next) printf(" -> ");
current = current->next;
}
printf(" -> NULL\n");
}
Doubly Linked List
typedef struct DNode {
int data;
struct DNode* prev;
struct DNode* next;
} DNode;
typedef struct {
DNode* head;
DNode* tail;
size_t size;
} DoublyLinkedList;
// O(1) delete when you have the node pointer!
void dll_delete_node(DoublyLinkedList* list, DNode* node) {
if (node->prev) {
node->prev->next = node->next;
} else {
list->head = node->next;
}
if (node->next) {
node->next->prev = node->prev;
} else {
list->tail = node->prev;
}
free(node);
list->size--;
}
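The listing above only shows deletion; here is a small companion sketch for insertion at the head and the forward/backward traversal from the demo output (it assumes the DNode and DoublyLinkedList definitions above):
// O(1) - insert at head
void dll_push_front(DoublyLinkedList* list, int data) {
    DNode* node = malloc(sizeof(DNode));
    node->data = data;
    node->prev = NULL;
    node->next = list->head;
    if (list->head) {
        list->head->prev = node;
    } else {
        list->tail = node;   // first node is both head and tail
    }
    list->head = node;
    list->size++;
}
// Forward and backward traversal - the whole point of the extra prev pointer
void dll_print_both_ways(DoublyLinkedList* list) {
    printf("Forward:  ");
    for (DNode* n = list->head; n; n = n->next) printf("%d ", n->data);
    printf("\nBackward: ");
    for (DNode* n = list->tail; n; n = n->prev) printf("%d ", n->data);
    printf("\n");
}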
Undo/Redo System (Practical Application)
typedef struct Action {
char* text; // The text that was added
struct Action* next;
} Action;
typedef struct {
char buffer[1024]; // Current text
Action* undo_stack; // Stack of actions to undo
Action* redo_stack; // Stack of actions to redo
} TextEditor;
void editor_type(TextEditor* editor, const char* text) {
// Add to buffer
strcat(editor->buffer, text);
// Push to undo stack
Action* action = malloc(sizeof(Action));
action->text = strdup(text);
action->next = editor->undo_stack;
editor->undo_stack = action;
// Clear redo stack (can't redo after new action)
while (editor->redo_stack) {
Action* to_free = editor->redo_stack;
editor->redo_stack = editor->redo_stack->next;
free(to_free->text);
free(to_free);
}
}
void editor_undo(TextEditor* editor) {
if (!editor->undo_stack) {
printf("Nothing to undo!\n");
return;
}
// Pop from undo stack
Action* action = editor->undo_stack;
editor->undo_stack = action->next;
// Remove text from buffer
size_t len = strlen(action->text);
editor->buffer[strlen(editor->buffer) - len] = '\0';
// Push to redo stack
action->next = editor->redo_stack;
editor->redo_stack = action;
}
void editor_redo(TextEditor* editor) {
if (!editor->redo_stack) {
printf("Nothing to redo!\n");
return;
}
// Pop from redo stack
Action* action = editor->redo_stack;
editor->redo_stack = action->next;
// Add text back to buffer
strcat(editor->buffer, action->text);
// Push to undo stack
action->next = editor->undo_stack;
editor->undo_stack = action;
}
Array vs Linked List: The Trade-off
| Operation | Array | Linked List | Winner |
|---|---|---|---|
| Access by index | O(1) | O(n) | Array |
| Insert at beginning | O(n) | O(1) | Linked List |
| Insert at end | O(1) amortized | O(1) with tail | Tie |
| Insert in middle | O(n) | O(1) if have node | Linked List |
| Delete | O(n) | O(1) if have node | Linked List |
| Memory overhead | Low | High (pointers) | Array |
| Cache performance | Excellent | Poor | Array |
When to use Linked Lists:
- Frequent insertions/deletions at arbitrary positions
- When you have pointers to nodes (e.g., LRU cache)
- When you need O(1) worst-case insert (not amortized)
When to use Arrays:
- Need random access
- Iterating through all elements (cache-friendly)
- Memory is tight (no pointer overhead)
Project 3: Stack Machine (Calculator & Mini VM)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Abstract Data Types / Language Implementation
- Software or Tool: RPN Calculator, Bytecode VM
- Main Book: Writing a C Compiler by Nora Sandler
What you’ll build
A Reverse Polish Notation (RPN) calculator and a simple stack-based virtual machine that executes bytecode—demonstrating why stacks are fundamental to computing.
Why it teaches this concept
The stack is not just a data structure—it’s how computers work:
- Function calls use a stack (call stack)
- Expression evaluation uses a stack
- Undo operations use a stack
- Backtracking algorithms use a stack
- Your CPU has a hardware stack
Building a stack machine shows you why LIFO (Last In, First Out) is so fundamental.
Core challenges you’ll face
- LIFO semantics (last in, first out) → maps to stack discipline
- Expression parsing (infix to postfix) → maps to operator precedence
- VM execution (fetch-decode-execute) → maps to computation model
- Stack underflow/overflow → maps to error handling
Key Concepts
| Concept | Resource |
|---|---|
| Stack ADT | Introduction to Algorithms Chapter 10.1 - CLRS |
| Expression Evaluation | Algorithms, Fourth Edition Chapter 1.3 - Sedgewick |
| Stack Machines | Writing a C Compiler Chapter 3 - Nora Sandler |
| Shunting Yard Algorithm | Dijkstra’s original paper or Wikipedia |
Difficulty & Time
- Difficulty: Intermediate-Advanced
- Time estimate: 1 week
- Prerequisites: Projects 1-2 completed
Real World Outcome
$ ./rpn_calc
RPN Calculator (type 'quit' to exit)
> 3 4 +
Stack: [7]
Result: 7
> 5 *
Stack: [35]
Result: 35
> 2 3 4 + *
Stack: [35, 14]
Explanation: 2 * (3 + 4) = 14
Result: 14
> clear
Stack: []
=== Infix to RPN Demo ===
Infix: (3 + 4) * 5
Postfix: 3 4 + 5 *
Result: 35
=== Mini VM Demo ===
Bytecode: PUSH 10, PUSH 20, ADD, PUSH 5, MUL, PRINT
Executing...
PUSH 10 Stack: [10]
PUSH 20 Stack: [10, 20]
ADD Stack: [30]
PUSH 5 Stack: [30, 5]
MUL Stack: [150]
PRINT Output: 150
Learning Milestones
- RPN calculator works → You understand postfix evaluation
- Infix to postfix conversion works → You understand the shunting-yard algorithm
- VM executes bytecode → You understand stack machines
- You see the call stack in action → You understand how functions work
Implementation
Stack Data Structure
#define STACK_MAX 256
typedef struct {
int items[STACK_MAX];
int top;
} Stack;
void stack_init(Stack* s) {
s->top = -1;
}
int stack_is_empty(Stack* s) {
return s->top == -1;
}
int stack_is_full(Stack* s) {
return s->top == STACK_MAX - 1;
}
void stack_push(Stack* s, int value) {
if (stack_is_full(s)) {
printf("Stack overflow!\n");
exit(1);
}
s->items[++s->top] = value;
}
int stack_pop(Stack* s) {
if (stack_is_empty(s)) {
printf("Stack underflow!\n");
exit(1);
}
return s->items[s->top--];
}
int stack_peek(Stack* s) {
if (stack_is_empty(s)) {
printf("Stack empty!\n");
exit(1);
}
return s->items[s->top];
}
void stack_print(Stack* s) {
printf("Stack: [");
for (int i = 0; i <= s->top; i++) {
printf("%d", s->items[i]);
if (i < s->top) printf(", ");
}
printf("]\n");
}
RPN Calculator
#include <ctype.h>   // isdigit
#include <string.h>  // strtok, strdup
int evaluate_rpn(const char* expression) {
Stack stack;
stack_init(&stack);
char* expr = strdup(expression);
char* token = strtok(expr, " ");
while (token != NULL) {
if (isdigit(token[0]) || (token[0] == '-' && isdigit(token[1]))) {
// It's a number - push it
stack_push(&stack, atoi(token));
} else {
// It's an operator - pop operands and compute
int b = stack_pop(&stack);
int a = stack_pop(&stack);
int result;
switch (token[0]) {
case '+': result = a + b; break;
case '-': result = a - b; break;
case '*': result = a * b; break;
case '/': result = a / b; break;
default:
printf("Unknown operator: %s\n", token);
exit(1);
}
stack_push(&stack, result);
}
printf(" Token: %-4s ", token);
stack_print(&stack);
token = strtok(NULL, " ");
}
free(expr);
return stack_pop(&stack);
}
Shunting-Yard Algorithm (Infix to Postfix)
int precedence(char op) {
switch (op) {
case '+': case '-': return 1;
case '*': case '/': return 2;
default: return 0;
}
}
void infix_to_postfix(const char* infix, char* postfix) {
Stack op_stack;
stack_init(&op_stack);
int j = 0;
for (int i = 0; infix[i]; i++) {
char c = infix[i];
if (isdigit(c)) {
postfix[j++] = c;
postfix[j++] = ' ';
} else if (c == '(') {
stack_push(&op_stack, c);
} else if (c == ')') {
while (!stack_is_empty(&op_stack) && stack_peek(&op_stack) != '(') {
postfix[j++] = stack_pop(&op_stack);
postfix[j++] = ' ';
}
stack_pop(&op_stack); // Remove '('
} else if (c == '+' || c == '-' || c == '*' || c == '/') {
while (!stack_is_empty(&op_stack) &&
precedence(stack_peek(&op_stack)) >= precedence(c)) {
postfix[j++] = stack_pop(&op_stack);
postfix[j++] = ' ';
}
stack_push(&op_stack, c);
}
}
while (!stack_is_empty(&op_stack)) {
postfix[j++] = stack_pop(&op_stack);
postfix[j++] = ' ';
}
postfix[j] = '\0';
}
Mini Stack-Based VM
typedef enum {
OP_PUSH,
OP_POP,
OP_ADD,
OP_SUB,
OP_MUL,
OP_DIV,
OP_PRINT,
OP_HALT
} OpCode;
typedef struct {
OpCode op;
int operand;
} Instruction;
void vm_execute(Instruction* program, int length) {
Stack stack;
stack_init(&stack);
for (int ip = 0; ip < length; ip++) {
Instruction inst = program[ip];
switch (inst.op) {
case OP_PUSH:
printf(" PUSH %d\t", inst.operand);
stack_push(&stack, inst.operand);
break;
case OP_ADD: {
printf(" ADD\t\t");
int b = stack_pop(&stack);
int a = stack_pop(&stack);
stack_push(&stack, a + b);
break;
}
case OP_MUL: {
printf(" MUL\t\t");
int b = stack_pop(&stack);
int a = stack_pop(&stack);
stack_push(&stack, a * b);
break;
}
case OP_PRINT:
printf(" PRINT\t\t");
printf("Output: %d\n", stack_peek(&stack));
break;
case OP_HALT:
printf(" HALT\n");
return;
}
stack_print(&stack);
}
}
// Example: Compute (10 + 20) * 5
int main() {
Instruction program[] = {
{OP_PUSH, 10},
{OP_PUSH, 20},
{OP_ADD, 0},
{OP_PUSH, 5},
{OP_MUL, 0},
{OP_PRINT, 0},
{OP_HALT, 0}
};
printf("Executing bytecode:\n");
vm_execute(program, sizeof(program) / sizeof(Instruction));
return 0;
}
Why Stacks Are Everywhere
FUNCTION CALLS (The Call Stack):
┌─────────────────┐
│ main() │ ← Called first, returns last
├─────────────────┤
│ calculate() │ ← Called second
├─────────────────┤
│ helper() │ ← Called third, returns first
└─────────────────┘
Each function call PUSHES a stack frame.
Each return POPS a stack frame.
LIFO ensures functions return in the right order!
EXPRESSION EVALUATION:
Expression: 3 + 4 * 5
As RPN: 3 4 5 * +
Stack trace:
Push 3: [3]
Push 4: [3, 4]
Push 5: [3, 4, 5]
Mul: [3, 20] (4 * 5)
Add: [23] (3 + 20)
Result: 23 ✓
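You can watch this happen on your own machine with a tiny sketch (addresses and the direction of growth are platform-dependent; on most mainstream systems each deeper call's local lands at a lower address, and the "leave" lines print in reverse order because frames pop LIFO):
#include <stdio.h>
void descend(int depth) {
    int local = depth;   // lives in this call's stack frame
    printf("enter depth %d: local at %p\n", depth, (void*)&local);
    if (depth < 3) {
        descend(depth + 1);   // PUSH a new frame
    }
    printf("leave depth %d (frame about to be popped)\n", local);
}
int main(void) {
    descend(0);
    return 0;
}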
Project 4: Queue Systems (Task Scheduler & BFS)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Abstract Data Types / Systems Programming
- Software or Tool: Job Queue, BFS Maze Solver
- Main Book: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau
What you’ll build
A multi-priority task scheduler using queues, plus a visual maze solver using BFS—demonstrating why FIFO (First In, First Out) enables fairness and level-order processing.
Why it teaches this concept
Queues are about fairness and ordering:
- First come, first served (FIFO)
- Level-by-level processing (BFS)
- Buffering between producer and consumer
Understanding queues is essential for:
- Operating system schedulers
- Network packet handling
- Message queues (Kafka, RabbitMQ)
- Breadth-first search
Core challenges you’ll face
- FIFO semantics (first in, first out) → maps to fairness
- Circular buffer (efficient array-based queue) → maps to memory efficiency
- Priority queues (not FIFO - will connect to heaps) → maps to scheduling
- Producer-consumer (handling different speeds) → maps to buffering
Key Concepts
| Concept | Resource |
|---|---|
| Queue ADT | Introduction to Algorithms Chapter 10.1 - CLRS |
| Circular Buffers | The Linux Programming Interface Chapter 44 - Kerrisk |
| BFS Algorithm | Introduction to Algorithms Chapter 22.2 - CLRS |
| Process Scheduling | Operating Systems: Three Easy Pieces Chapter 7 - OSTEP |
Difficulty & Time
- Difficulty: Intermediate
- Time estimate: 1 week
- Prerequisites: Projects 1-3 completed
Real World Outcome
$ ./task_scheduler
=== Task Scheduler with Multiple Queues ===
Adding tasks:
[HIGH] Task: "Handle interrupt" -> High priority queue
[MEDIUM] Task: "Process user input" -> Medium priority queue
[LOW] Task: "Background sync" -> Low priority queue
[HIGH] Task: "Memory page fault" -> High priority queue
[MEDIUM] Task: "Render frame" -> Medium priority queue
Processing (high priority first, then FIFO within each level):
[1] HIGH: "Handle interrupt"
[2] HIGH: "Memory page fault"
[3] MEDIUM: "Process user input"
[4] MEDIUM: "Render frame"
[5] LOW: "Background sync"
=== BFS Maze Solver ===
Maze:
█████████████
█S █ █
███ ███ █████
█ █ █ █
█ █████ █ █ █
█ █ █
█████████ ███
█ █E█
█████████████
Solving with BFS...
Step 1: Exploring (1,1) - Added neighbors to queue
Step 2: Exploring (1,2) - Added neighbors to queue
...
Solution found! Path length: 24 steps
█████████████
█S....█.....█
███.███.█████
█...█...█...█
█.█████.█.█.█
█.......█...█
█████████.███
█.........█E█
█████████████
Why BFS finds shortest path: It explores level-by-level!
Level 0: Start (1,1)
Level 1: All cells 1 step away
Level 2: All cells 2 steps away
...
First time we reach End = shortest path!
Learning Milestones
- Basic queue works → You understand FIFO
- Circular buffer is efficient → You understand wrapping
- Priority scheduler works → You understand scheduling
- BFS finds shortest path → You understand level-order processing
Implementation
Queue with Circular Buffer
#define QUEUE_CAPACITY 100
typedef struct {
int items[QUEUE_CAPACITY];
int front; // Index to dequeue from
int rear; // Index to enqueue to
int size;
} Queue;
void queue_init(Queue* q) {
q->front = 0;
q->rear = 0;
q->size = 0;
}
int queue_is_empty(Queue* q) {
return q->size == 0;
}
int queue_is_full(Queue* q) {
return q->size == QUEUE_CAPACITY;
}
void queue_enqueue(Queue* q, int value) {
if (queue_is_full(q)) {
printf("Queue overflow!\n");
return;
}
q->items[q->rear] = value;
q->rear = (q->rear + 1) % QUEUE_CAPACITY; // Wrap around!
q->size++;
}
int queue_dequeue(Queue* q) {
if (queue_is_empty(q)) {
printf("Queue underflow!\n");
return -1;
}
int value = q->items[q->front];
q->front = (q->front + 1) % QUEUE_CAPACITY; // Wrap around!
q->size--;
return value;
}
// Visualize circular buffer
void queue_visualize(Queue* q) {
printf("Queue (front=%d, rear=%d, size=%d):\n[", q->front, q->rear, q->size);
for (int i = 0; i < QUEUE_CAPACITY && i < 20; i++) {
if (i >= q->front && i < q->front + q->size) {
printf("%d", q->items[i % QUEUE_CAPACITY]);
} else {
printf("_");
}
if (i < 19) printf(",");
}
printf("...]\n");
}
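The core challenges above mentioned producer-consumer buffering without showing it. Here is a minimal single-threaded simulation sketch that reuses the Queue above (a real system would add threads and locking, which is out of scope here): the producer enqueues two items per tick, the consumer dequeues one, and the queue absorbs the speed mismatch.
void producer_consumer_demo(void) {
    Queue q;
    queue_init(&q);
    int produced = 0;
    for (int tick = 1; tick <= 5; tick++) {
        queue_enqueue(&q, ++produced);   // fast producer: two items per tick
        queue_enqueue(&q, ++produced);
        int item = queue_dequeue(&q);    // slow consumer: one item per tick
        printf("tick %d: consumed %d, %d item(s) still buffered\n", tick, item, q.size);
    }
    // The backlog grows by one item per tick: the queue is the buffer
    // between two sides running at different speeds.
}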
Multi-Level Priority Scheduler
typedef enum { HIGH, MEDIUM, LOW, NUM_PRIORITIES } Priority;
typedef struct {
char name[64];
Priority priority;
} Task;
typedef struct {
Queue queues[NUM_PRIORITIES];
Task tasks[1000]; // Task storage
int task_count;
} Scheduler;
void scheduler_add_task(Scheduler* s, const char* name, Priority priority) {
Task* task = &s->tasks[s->task_count];
strcpy(task->name, name);
task->priority = priority;
// Add task index to appropriate queue
queue_enqueue(&s->queues[priority], s->task_count);
s->task_count++;
const char* priority_names[] = {"HIGH", "MEDIUM", "LOW"};
printf("Added [%s] task: \"%s\"\n", priority_names[priority], name);
}
Task* scheduler_get_next(Scheduler* s) {
// Check high priority first, then medium, then low
for (int p = HIGH; p < NUM_PRIORITIES; p++) {
if (!queue_is_empty(&s->queues[p])) {
int task_idx = queue_dequeue(&s->queues[p]);
return &s->tasks[task_idx];
}
}
return NULL; // No tasks
}
BFS Maze Solver
#define MAZE_SIZE 13
typedef struct {
int row, col;
int distance;
} Cell;
int bfs_solve_maze(char maze[MAZE_SIZE][MAZE_SIZE],
int start_row, int start_col,
int end_row, int end_col) {
// Queue for BFS
Cell queue[MAZE_SIZE * MAZE_SIZE];
int front = 0, rear = 0;
// Visited array
int visited[MAZE_SIZE][MAZE_SIZE] = {0};
// Parent tracking for path reconstruction
int parent_row[MAZE_SIZE][MAZE_SIZE];
int parent_col[MAZE_SIZE][MAZE_SIZE];
// Start BFS
queue[rear++] = (Cell){start_row, start_col, 0};
visited[start_row][start_col] = 1;
int directions[4][2] = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
while (front < rear) {
Cell current = queue[front++];
// Found the end!
if (current.row == end_row && current.col == end_col) {
// Reconstruct path
int r = end_row, c = end_col;
while (r != start_row || c != start_col) {
maze[r][c] = '.';
int pr = parent_row[r][c];
int pc = parent_col[r][c];
r = pr;
c = pc;
}
maze[start_row][start_col] = 'S';
maze[end_row][end_col] = 'E';
return current.distance;
}
// Explore neighbors
for (int d = 0; d < 4; d++) {
int nr = current.row + directions[d][0];
int nc = current.col + directions[d][1];
if (nr >= 0 && nr < MAZE_SIZE && nc >= 0 && nc < MAZE_SIZE &&
maze[nr][nc] != '#' && !visited[nr][nc]) {
visited[nr][nc] = 1;
parent_row[nr][nc] = current.row;
parent_col[nr][nc] = current.col;
queue[rear++] = (Cell){nr, nc, current.distance + 1};
}
}
}
return -1; // No path found
}
Stack vs Queue: When to Use Each
| Use Case | Stack (LIFO) | Queue (FIFO) |
|---|---|---|
| Undo/Redo | ✓ | |
| Function calls | ✓ | |
| DFS traversal | ✓ | |
| Expression evaluation | ✓ | |
| Task scheduling | | ✓ |
| BFS traversal | | ✓ |
| Print queue | | ✓ |
| Message buffering | | ✓ |
Key insight:
- Stack: When most recent is most important
- Queue: When order of arrival matters (fairness)
Project 5: Hash Table (Dictionary & Cache)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Python, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Data Structures / Algorithms
- Software or Tool: Key-Value Store, LRU Cache
- Main Book: Introduction to Algorithms by CLRS
What you’ll build
A complete hash table implementation with multiple collision resolution strategies (chaining, open addressing), plus a practical LRU (Least Recently Used) cache.
Why it teaches this concept
Hash tables are the most practically important data structure:
- O(1) average lookup, insert, delete
- Backbone of databases, caches, dictionaries
- Used everywhere: Python dicts, JavaScript objects, database indexes
Understanding hash tables means understanding:
- Hash functions (turning keys into indices)
- Collision resolution (when two keys hash to same index)
- Load factor and resizing
- Trade-offs between time and space
Core challenges you’ll face
- Hash function design (distribute keys uniformly) → maps to hashing theory
- Collision handling (chaining vs open addressing) → maps to trade-offs
- Load factor (when to resize) → maps to performance tuning
- LRU eviction (combining hash table with linked list) → maps to real caches
Resources for Key Challenges
- Real-world Applications of Hash Tables - Practical use cases
- GeeksforGeeks: Real-life Applications of DSA - Hash tables in databases, caching
Key Concepts
| Concept | Resource |
|---|---|
| Hash Functions | Introduction to Algorithms Chapter 11 - CLRS |
| Collision Resolution | Algorithms, Fourth Edition Chapter 3.4 - Sedgewick |
| Universal Hashing | Introduction to Algorithms Chapter 11.3.3 - CLRS |
| LRU Cache Design | LeetCode Problem 146 explanation |
Difficulty & Time
- Difficulty: Advanced
- Time estimate: 2 weeks
- Prerequisites: Projects 1-2 completed, understanding of modular arithmetic
Real World Outcome
$ ./hashtable_demo
=== Hash Table with Chaining ===
Initial capacity: 16, load factor threshold: 0.75
put("apple", 5) -> hash("apple") = 7, bucket 7
put("banana", 3) -> hash("banana") = 2, bucket 2
put("cherry", 8) -> hash("cherry") = 7, bucket 7 (collision with apple!)
Bucket visualization:
[0]: empty
[1]: empty
[2]: ["banana" -> 3]
...
[7]: ["apple" -> 5] -> ["cherry" -> 8] ← Chained!
...
get("cherry") = 8 (found in bucket 7, 2nd in chain)
get("mango") = NOT FOUND
=== Open Addressing (Linear Probing) ===
put("apple", 5) -> hash = 7, slot 7
put("cherry", 8) -> hash = 7, collision! probe slot 8
Table visualization:
[0]: empty
...
[7]: ("apple", 5)
[8]: ("cherry", 8) ← Probed here!
...
=== Resize Demo ===
Adding 12 items... load factor = 0.75
RESIZE triggered! 16 -> 32 buckets
All items rehashed to new positions
=== LRU Cache (capacity=3) ===
put(1, "one") Cache: [1]
put(2, "two") Cache: [2, 1]
put(3, "three") Cache: [3, 2, 1]
get(1) Cache: [1, 3, 2] ← 1 moved to front (recently used)
put(4, "four") Cache: [4, 1, 3] ← 2 evicted (least recently used)!
get(2) MISS! (was evicted)
Learning Milestones
- Hash function distributes evenly → You understand hashing
- Chaining handles collisions → You understand linked lists in practice
- Open addressing works → You understand the alternative trade-off
- Resize maintains O(1) amortized → You understand load factors
- LRU cache works → You understand combining data structures
Implementation
Hash Function
// DJB2 hash - simple and effective
uint32_t hash_string(const char* str) {
uint32_t hash = 5381;
int c;
while ((c = *str++)) {
hash = ((hash << 5) + hash) + c; // hash * 33 + c
}
return hash;
}
// For integer keys
uint32_t hash_int(int key) {
// Knuth's multiplicative hash
return key * 2654435761u;
}
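A quick way to sanity-check the "distribute keys uniformly" requirement is to print where a handful of keys land. This sketch assumes the hash_string function above and a 16-bucket table; the bucket numbers are whatever DJB2 produces, so treat the output as something to inspect rather than a promised distribution:
void show_bucket_spread(void) {
    const char* keys[] = {"apple", "banana", "cherry", "date",
                          "elderberry", "fig", "grape", "honeydew"};
    for (int i = 0; i < 8; i++) {
        // Same mapping ht_put will use below: hash, then modulo the bucket count
        unsigned bucket = (unsigned)(hash_string(keys[i]) % 16u);
        printf("%-10s -> bucket %u\n", keys[i], bucket);
    }
}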
Hash Table with Chaining
typedef struct Entry {
char* key;
int value;
struct Entry* next; // For chaining
} Entry;
typedef struct {
Entry** buckets; // Array of linked lists
size_t capacity;
size_t size;
float max_load_factor;
} HashTable;
HashTable* ht_create(size_t capacity) {
HashTable* ht = malloc(sizeof(HashTable));
ht->buckets = calloc(capacity, sizeof(Entry*));
ht->capacity = capacity;
ht->size = 0;
ht->max_load_factor = 0.75;
return ht;
}
void ht_resize(HashTable* ht, size_t new_capacity);  // defined below; declared here so ht_put compiles
void ht_put(HashTable* ht, const char* key, int value) {
// Check load factor
float load = (float)ht->size / ht->capacity;
if (load > ht->max_load_factor) {
ht_resize(ht, ht->capacity * 2);
}
uint32_t hash = hash_string(key);
size_t index = hash % ht->capacity;
// Check if key exists
Entry* entry = ht->buckets[index];
while (entry) {
if (strcmp(entry->key, key) == 0) {
entry->value = value; // Update
return;
}
entry = entry->next;
}
// Insert new entry at head of chain
Entry* new_entry = malloc(sizeof(Entry));
new_entry->key = strdup(key);
new_entry->value = value;
new_entry->next = ht->buckets[index];
ht->buckets[index] = new_entry;
ht->size++;
}
int ht_get(HashTable* ht, const char* key, int* found) {
uint32_t hash = hash_string(key);
size_t index = hash % ht->capacity;
Entry* entry = ht->buckets[index];
while (entry) {
if (strcmp(entry->key, key) == 0) {
*found = 1;
return entry->value;
}
entry = entry->next;
}
*found = 0;
return 0;
}
void ht_resize(HashTable* ht, size_t new_capacity) {
Entry** old_buckets = ht->buckets;
size_t old_capacity = ht->capacity;
ht->buckets = calloc(new_capacity, sizeof(Entry*));
ht->capacity = new_capacity;
ht->size = 0;
// Rehash all entries
for (size_t i = 0; i < old_capacity; i++) {
Entry* entry = old_buckets[i];
while (entry) {
Entry* next = entry->next;
ht_put(ht, entry->key, entry->value);
free(entry->key);
free(entry);
entry = next;
}
}
free(old_buckets);
printf("Resized: %zu -> %zu buckets\n", old_capacity, new_capacity);
}
Hash Table with Open Addressing (Linear Probing)
typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;
typedef struct {
char* key;
int value;
SlotState state;
} Slot;
typedef struct {
Slot* slots;
size_t capacity;
size_t size;
} OpenHashTable;
void oht_put(OpenHashTable* ht, const char* key, int value) {
uint32_t hash = hash_string(key);
size_t index = hash % ht->capacity;
// Linear probing
size_t start = index;
do {
if (ht->slots[index].state == EMPTY ||
    ht->slots[index].state == DELETED) {
    // Found a free slot (simplification: a production table would keep
    // probing past DELETED slots to make sure the key isn't stored further on)
ht->slots[index].key = strdup(key);
ht->slots[index].value = value;
ht->slots[index].state = OCCUPIED;
ht->size++;
return;
}
if (ht->slots[index].state == OCCUPIED &&
strcmp(ht->slots[index].key, key) == 0) {
// Key exists - update
ht->slots[index].value = value;
return;
}
index = (index + 1) % ht->capacity; // Probe next slot
} while (index != start);
printf("Hash table full!\n");
}
LRU Cache (Hash Table + Doubly Linked List)
typedef struct LRUNode {
int key;
int value;
struct LRUNode* prev;
struct LRUNode* next;
} LRUNode;
typedef struct {
int capacity;
int size;
LRUNode* head; // Most recently used
LRUNode* tail; // Least recently used
LRUNode** hash_table; // For O(1) lookup
int table_size;
} LRUCache;
void lru_move_to_front(LRUCache* cache, LRUNode* node) {
if (node == cache->head) return;
// Remove from current position
if (node->prev) node->prev->next = node->next;
if (node->next) node->next->prev = node->prev;
if (node == cache->tail) cache->tail = node->prev;
// Insert at head
node->prev = NULL;
node->next = cache->head;
if (cache->head) cache->head->prev = node;
cache->head = node;
if (!cache->tail) cache->tail = node;
}
int lru_get(LRUCache* cache, int key) {
    int index = key % cache->table_size;
    LRUNode* node = cache->hash_table[index];
    // Direct-address table: one node per slot, no chaining. Following node->next
    // here would walk the recency list, not a bucket chain, so a colliding key
    // is simply treated as a miss in this simplified demo.
    if (!node || node->key != key) return -1;  // Not found
    // Move to front (mark as recently used)
    lru_move_to_front(cache, node);
    return node->value;
}
void lru_put(LRUCache* cache, int key, int value) {
    // Check if the key already exists (direct-address table: one node per slot)
    int index = key % cache->table_size;
    LRUNode* existing = cache->hash_table[index];
    if (existing && existing->key == key) {
        existing->value = value;
        lru_move_to_front(cache, existing);
        return;
    }
// Check if at capacity
if (cache->size >= cache->capacity) {
// Evict LRU (tail)
LRUNode* victim = cache->tail;
printf("Evicting key %d (LRU)\n", victim->key);
// Remove from hash table
int victim_index = victim->key % cache->table_size;
cache->hash_table[victim_index] = NULL;
// Remove from list
cache->tail = victim->prev;
if (cache->tail) cache->tail->next = NULL;
free(victim);
cache->size--;
}
// Add new node
LRUNode* node = malloc(sizeof(LRUNode));
node->key = key;
node->value = value;
node->prev = NULL;
node->next = cache->head;
if (cache->head) cache->head->prev = node;
cache->head = node;
if (!cache->tail) cache->tail = node;
cache->hash_table[index] = node;
cache->size++;
}
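A usage sketch that replays the demo session above with integer values. The lru_cache_create constructor is a hypothetical helper added here only so the example runs; it is not part of the listing above.
LRUCache* lru_cache_create(int capacity, int table_size) {
    LRUCache* cache = malloc(sizeof(LRUCache));
    cache->capacity = capacity;
    cache->size = 0;
    cache->head = NULL;
    cache->tail = NULL;
    cache->table_size = table_size;
    cache->hash_table = calloc(table_size, sizeof(LRUNode*));
    return cache;
}
void lru_demo(void) {
    LRUCache* cache = lru_cache_create(3, 17);   // 17 slots: keys 1..4 never collide
    lru_put(cache, 1, 100);
    lru_put(cache, 2, 200);
    lru_put(cache, 3, 300);
    printf("get(1) = %d\n", lru_get(cache, 1));  // 100; key 1 becomes most recent
    lru_put(cache, 4, 400);                      // evicts key 2 (least recently used)
    printf("get(2) = %d\n", lru_get(cache, 2));  // -1: miss, it was evicted
}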
Chaining vs Open Addressing
| Aspect | Chaining | Open Addressing |
|---|---|---|
| Memory | Pointer overhead per entry | No extra pointers |
| Cache | Poor (linked list traversal) | Good (contiguous probing) |
| Load factor | Can exceed 1.0 | Must stay below ~0.7 |
| Deletion | Simple (unlink node) | Complex (tombstones) |
| Implementation | Easier | Harder |
| Clustering | No | Yes (performance degrades) |
Project 6: Binary Search Tree & Balancing
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Java
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Trees / Self-Balancing
- Software or Tool: BST with visualization, AVL Tree
- Main Book: Introduction to Algorithms by CLRS
What you’ll build
A binary search tree with visualization, then extend it to a self-balancing AVL tree, demonstrating why balancing is crucial for performance.
Why it teaches this concept
Trees enable O(log n) operations when balanced:
- Searching
- Insertion
- Deletion
- Range queries
- Ordered traversal
But an unbalanced tree degrades to O(n)—essentially a linked list. Understanding balancing is crucial for:
- Database indexes (B-trees)
- File systems
- Memory allocators (red-black trees in Linux kernel)
- Priority scheduling
Core challenges you’ll face
- BST property (left < node < right) → maps to binary search
- Tree traversal (in-order, pre-order, post-order) → maps to recursive thinking
- Deletion (three cases: leaf, one child, two children) → maps to pointer manipulation
- Balancing (rotations to maintain height) → maps to self-balancing
Key Concepts
| Concept | Resource |
|---|---|
| BST Operations | Introduction to Algorithms Chapter 12 - CLRS |
| Tree Traversals | Algorithms, Fourth Edition Chapter 3.2 - Sedgewick |
| AVL Trees | Introduction to Algorithms Chapter 13 (similar to RB trees) - CLRS |
| Rotations | Visualgo.net BST visualization |
Difficulty & Time
- Difficulty: Advanced
- Time estimate: 2 weeks
- Prerequisites: Projects 1-4 completed, recursion comfort
Real World Outcome
$ ./bst_demo
=== Binary Search Tree Demo ===
Inserting: 50, 30, 70, 20, 40, 60, 80
Tree visualization:
50
/ \
30 70
/ \ / \
20 40 60 80
In-order traversal (sorted!): 20 30 40 50 60 70 80
Tree height: 3
Is balanced: YES
=== Unbalanced Insertion Demo ===
Inserting sorted: 10, 20, 30, 40, 50, 60, 70
Tree visualization:
10
\
20
\
30
\
40
\
50 ... (degenerates to linked list!)
Height: 7 (should be 3 for balanced)
Is balanced: NO
Search for 70: 7 comparisons (should be 3!)
=== AVL Tree (Self-Balancing) ===
Same insertions: 10, 20, 30, 40, 50, 60, 70
After each insert, rebalance:
Insert 10: 10
Insert 20: 10-20
Insert 30: 10-20-30 -> ROTATE LEFT
20
/ \
10 30
Insert 40: 20
/ \
10 30-40
Insert 50: 30's right side is too deep (Right-Right) -> ROTATE LEFT at 30
                 20
                /  \
              10    40
                   /  \
                 30    50
...
Final AVL tree:
40
/ \
20 60
/ \ / \
10 30 50 70
Height: 3 (optimal!)
All operations: O(log n) guaranteed!
Learning Milestones
- BST insertion/search works → You understand the BST property
- In-order traversal is sorted → You understand tree ordering
- Unbalanced tree is slow → You understand why balancing matters
- AVL rotations fix balance → You understand self-balancing
Implementation
Basic BST
typedef struct TreeNode {
int key;
struct TreeNode* left;
struct TreeNode* right;
int height; // For AVL
} TreeNode;
TreeNode* create_node(int key) {
TreeNode* node = malloc(sizeof(TreeNode));
node->key = key;
node->left = NULL;
node->right = NULL;
node->height = 1;
return node;
}
// O(h) where h is height
TreeNode* bst_insert(TreeNode* root, int key) {
if (root == NULL) {
return create_node(key);
}
if (key < root->key) {
root->left = bst_insert(root->left, key);
} else if (key > root->key) {
root->right = bst_insert(root->right, key);
}
// If equal, ignore (no duplicates)
return root;
}
// O(h) where h is height
TreeNode* bst_search(TreeNode* root, int key, int* comparisons) {
(*comparisons)++;
if (root == NULL || root->key == key) {
return root;
}
if (key < root->key) {
return bst_search(root->left, key, comparisons);
} else {
return bst_search(root->right, key, comparisons);
}
}
// In-order: Left, Root, Right -> Sorted order!
void inorder_traversal(TreeNode* root) {
if (root == NULL) return;
inorder_traversal(root->left);
printf("%d ", root->key);
inorder_traversal(root->right);
}
// Find minimum (leftmost node)
TreeNode* find_min(TreeNode* root) {
while (root->left) {
root = root->left;
}
return root;
}
// Deletion (the tricky one!)
TreeNode* bst_delete(TreeNode* root, int key) {
if (root == NULL) return NULL;
if (key < root->key) {
root->left = bst_delete(root->left, key);
} else if (key > root->key) {
root->right = bst_delete(root->right, key);
} else {
// Found the node to delete
// Case 1: Leaf node
if (!root->left && !root->right) {
free(root);
return NULL;
}
// Case 2: One child
if (!root->left) {
TreeNode* temp = root->right;
free(root);
return temp;
}
if (!root->right) {
TreeNode* temp = root->left;
free(root);
return temp;
}
// Case 3: Two children
// Find in-order successor (smallest in right subtree)
TreeNode* successor = find_min(root->right);
root->key = successor->key;
root->right = bst_delete(root->right, successor->key);
}
return root;
}
Tree Visualization
void print_tree_helper(TreeNode* root, int space, int height) {
if (root == NULL) return;
space += height;
print_tree_helper(root->right, space, height);
printf("\n");
for (int i = height; i < space; i++) {
printf(" ");
}
printf("%d\n", root->key);
print_tree_helper(root->left, space, height);
}
void print_tree(TreeNode* root) {
print_tree_helper(root, 0, 5);
}
int tree_height(TreeNode* root) {
if (root == NULL) return 0;
int left_h = tree_height(root->left);
int right_h = tree_height(root->right);
return 1 + (left_h > right_h ? left_h : right_h);
}
int is_balanced(TreeNode* root) {
if (root == NULL) return 1;
int left_h = tree_height(root->left);
int right_h = tree_height(root->right);
if (abs(left_h - right_h) > 1) return 0;
return is_balanced(root->left) && is_balanced(root->right);
}
AVL Tree (Self-Balancing)
int get_height(TreeNode* node) {
return node ? node->height : 0;
}
int get_balance(TreeNode* node) {
return node ? get_height(node->left) - get_height(node->right) : 0;
}
void update_height(TreeNode* node) {
int left_h = get_height(node->left);
int right_h = get_height(node->right);
node->height = 1 + (left_h > right_h ? left_h : right_h);
}
// Right rotation
// y x
// / \ / \
// x C -> A y
// / \ / \
// A B B C
TreeNode* rotate_right(TreeNode* y) {
TreeNode* x = y->left;
TreeNode* B = x->right;
x->right = y;
y->left = B;
update_height(y);
update_height(x);
return x;
}
// Left rotation
TreeNode* rotate_left(TreeNode* x) {
TreeNode* y = x->right;
TreeNode* B = y->left;
y->left = x;
x->right = B;
update_height(x);
update_height(y);
return y;
}
TreeNode* avl_insert(TreeNode* root, int key) {
// Standard BST insert
if (root == NULL) {
return create_node(key);
}
if (key < root->key) {
root->left = avl_insert(root->left, key);
} else if (key > root->key) {
root->right = avl_insert(root->right, key);
} else {
return root; // No duplicates
}
// Update height
update_height(root);
// Check balance
int balance = get_balance(root);
// Left Left Case
if (balance > 1 && key < root->left->key) {
printf("Left-Left case: rotating right\n");
return rotate_right(root);
}
// Right Right Case
if (balance < -1 && key > root->right->key) {
printf("Right-Right case: rotating left\n");
return rotate_left(root);
}
// Left Right Case
if (balance > 1 && key > root->left->key) {
printf("Left-Right case: double rotation\n");
root->left = rotate_left(root->left);
return rotate_right(root);
}
// Right Left Case
if (balance < -1 && key < root->right->key) {
printf("Right-Left case: double rotation\n");
root->right = rotate_right(root->right);
return rotate_left(root);
}
return root;
}
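A small driver sketch for the sorted-insert experiment from the demo output; it assumes bst_insert, avl_insert, tree_height, and inorder_traversal as defined above (plus <stdio.h>):
int main(void) {
    int keys[] = {10, 20, 30, 40, 50, 60, 70};
    int n = sizeof(keys) / sizeof(keys[0]);
    TreeNode* plain = NULL;
    TreeNode* avl = NULL;
    for (int i = 0; i < n; i++) {
        plain = bst_insert(plain, keys[i]);  // degenerates into a "linked list"
        avl = avl_insert(avl, keys[i]);      // rotations keep it balanced
    }
    printf("Plain BST height: %d\n", tree_height(plain));  // 7
    printf("AVL tree height:  %d\n", tree_height(avl));    // 3
    printf("AVL in-order: ");
    inorder_traversal(avl);                                 // 10 20 30 40 50 60 70
    printf("\n");
    return 0;
}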
BST vs Hash Table
| Operation | Hash Table | BST (balanced) | BST (unbalanced) |
|---|---|---|---|
| Search | O(1) avg | O(log n) | O(n) |
| Insert | O(1) avg | O(log n) | O(n) |
| Delete | O(1) avg | O(log n) | O(n) |
| Ordered traversal | O(n log n) sort | O(n) | O(n) |
| Range query | O(n) | O(log n + k) | O(n) |
| Min/Max | O(n) | O(log n) | O(n) |
Use BST when:
- You need ordered data
- Range queries are common
- You need min/max quickly
Use Hash Table when:
- Order doesn’t matter
- Just need fast lookup by key
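The range-query row in the table above deserves one concrete sketch. This bounded in-order walk over the TreeNode type from this project prunes every subtree that cannot contain keys in [lo, hi], which is exactly where the O(log n + k) behaviour of a balanced tree comes from:
// Print all keys in [lo, hi] in sorted order, skipping subtrees
// that lie entirely outside the range.
void bst_range_query(TreeNode* root, int lo, int hi) {
    if (root == NULL) return;
    if (root->key > lo) {
        bst_range_query(root->left, lo, hi);   // left side may still hold keys >= lo
    }
    if (root->key >= lo && root->key <= hi) {
        printf("%d ", root->key);              // this key is in range
    }
    if (root->key < hi) {
        bst_range_query(root->right, lo, hi);  // right side may still hold keys <= hi
    }
}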
Project 7: Heap & Priority Queue
- Main Programming Language: C
- Alternative Programming Languages: Rust, Python, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Heaps / Priority Queues
- Software or Tool: Task scheduler, Heapsort visualizer
- Main Book: Introduction to Algorithms by CLRS
What you’ll build
A binary heap (min and max), priority queue, and heapsort implementation with visualization—plus a practical task scheduler that always runs the highest-priority task.
Why it teaches this concept
Heaps answer the question: What if I only care about the minimum (or maximum)?
A BST gives O(log n) for everything, but requires balancing. A heap gives:
- O(1) find-min
- O(log n) insert
- O(log n) extract-min
And it’s simpler to implement (just an array!). Heaps are used in:
- Priority scheduling (OS, network packets)
- Dijkstra’s algorithm (shortest path)
- Heap sort (in-place O(n log n))
- Merge k sorted lists
Core challenges you’ll face
- Heap property (parent ≤ children for min-heap) → maps to partial ordering
- Array representation (implicit tree structure) → maps to memory efficiency
- Heapify (restore heap property) → maps to bubble up/down
- Build heap (O(n) from array) → maps to amortization
Key Concepts
| Concept | Resource |
|---|---|
| Heap Operations | Introduction to Algorithms Chapter 6 - CLRS |
| Heapsort | Algorithms, Fourth Edition Chapter 2.4 - Sedgewick |
| Priority Queues | Introduction to Algorithms Chapter 6.5 - CLRS |
Difficulty & Time
- Difficulty: Intermediate-Advanced
- Time estimate: 1 week
- Prerequisites: Projects 1-4 completed, understanding of trees
Real World Outcome
$ ./heap_demo
=== Min Heap Demo ===
Inserting: 50, 30, 70, 20, 40, 60, 80
Heap as array: [20, 30, 60, 50, 40, 70, 80]
Heap as tree:
20
/ \
30 60
/ \ / \
50 40 70 80
Extract min: 20
After extract: [30, 40, 60, 50, 80, 70]
30
/ \
40 60
/ \ /
50 80 70
=== Priority Queue Scheduler ===
Adding tasks:
[Priority 5] "Low: check emails"
[Priority 1] "URGENT: server down!"
[Priority 3] "Medium: review PR"
[Priority 2] "High: fix bug"
Processing (highest priority = lowest number):
1. [Priority 1] "URGENT: server down!"
2. [Priority 2] "High: fix bug"
3. [Priority 3] "Medium: review PR"
4. [Priority 5] "Low: check emails"
=== Heapsort Demo ===
Input: [64, 34, 25, 12, 22, 11, 90]
Sorting...
Build heap: [90, 34, 64, 12, 22, 11, 25]
Extract 90: [64, 34, 25, 12, 22, 11 | 90]
Extract 64: [34, 22, 25, 12, 11 | 64, 90]
...
Output: [11, 12, 22, 25, 34, 64, 90]
Heapsort: O(n log n), in-place, not stable
Learning Milestones
- Heap property maintained after insert → You understand bubble-up
- Extract-min works → You understand bubble-down
- Build-heap is O(n) → You understand the clever analysis
- Heapsort works in-place → You understand the algorithm
Implementation
Binary Heap (Array-based)
typedef struct {
int* data;
int size;
int capacity;
int is_min_heap; // 1 for min-heap, 0 for max-heap
} Heap;
// Parent and child indices (0-indexed)
int parent(int i) { return (i - 1) / 2; }
int left_child(int i) { return 2 * i + 1; }
int right_child(int i) { return 2 * i + 2; }
Heap* heap_create(int capacity, int is_min_heap) {
Heap* h = malloc(sizeof(Heap));
h->data = malloc(capacity * sizeof(int));
h->size = 0;
h->capacity = capacity;
h->is_min_heap = is_min_heap;
return h;
}
int compare(Heap* h, int a, int b) {
// Returns true if a should be above b
return h->is_min_heap ? (a < b) : (a > b);
}
void swap(int* a, int* b) {
int temp = *a;
*a = *b;
*b = temp;
}
// O(log n) - bubble up
void heap_insert(Heap* h, int value) {
if (h->size >= h->capacity) {
h->capacity *= 2;
h->data = realloc(h->data, h->capacity * sizeof(int));
}
// Insert at end
int i = h->size++;
h->data[i] = value;
// Bubble up
while (i > 0 && compare(h, h->data[i], h->data[parent(i)])) {
swap(&h->data[i], &h->data[parent(i)]);
i = parent(i);
}
}
// O(log n) - bubble down
void heapify_down(Heap* h, int i) {
int best = i;
int left = left_child(i);
int right = right_child(i);
if (left < h->size && compare(h, h->data[left], h->data[best])) {
best = left;
}
if (right < h->size && compare(h, h->data[right], h->data[best])) {
best = right;
}
if (best != i) {
swap(&h->data[i], &h->data[best]);
heapify_down(h, best);
}
}
// O(1)
int heap_peek(Heap* h) {
if (h->size == 0) {
printf("Heap is empty!\n");
return -1;
}
return h->data[0];
}
// O(log n)
int heap_extract(Heap* h) {
if (h->size == 0) {
printf("Heap is empty!\n");
return -1;
}
int root = h->data[0];
h->data[0] = h->data[--h->size];
heapify_down(h, 0);
return root;
}
// O(n) - not O(n log n)!
void build_heap(Heap* h, int* arr, int n) {
h->size = n;
for (int i = 0; i < n; i++) {
h->data[i] = arr[i];
}
// Start from last non-leaf and heapify down
for (int i = n / 2 - 1; i >= 0; i--) {
heapify_down(h, i);
}
}
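Why build-heap is O(n) and not O(n log n): a sketch of the standard counting argument (this is the "clever analysis" the milestone above refers to).
heapify_down from a node of height h does at most h swaps.
In an n-element heap, about n/2 nodes have height 0 (the leaves),
n/4 have height 1, n/8 have height 2, and so on.
Total work = n/4 * 1 + n/8 * 2 + n/16 * 3 + ... (at most)
           = n * (1/4 + 2/8 + 3/16 + ...)
           = n * (sum over h of h / 2^(h+1))  and that sum converges to 1
           ≈ n = O(n)
Inserting the same n elements one at a time would instead cost O(n log n),
because each insert can bubble up the full height of the heap.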
Heap Visualization
void heap_print_tree(Heap* h) {
if (h->size == 0) {
printf("(empty)\n");
return;
}
int levels = 0;
int temp = h->size;
while (temp > 0) {
levels++;
temp /= 2;
}
int index = 0;
for (int level = 0; level < levels; level++) {
int nodes_in_level = 1 << level;
int spaces = (1 << (levels - level)) - 1;
// Print leading spaces
for (int s = 0; s < spaces; s++) printf(" ");
// Print nodes
for (int n = 0; n < nodes_in_level && index < h->size; n++) {
printf("%2d", h->data[index++]);
// Print spaces between nodes
for (int s = 0; s < 2 * spaces + 1; s++) printf(" ");
}
printf("\n");
}
}
Heapsort
void heapsort(int* arr, int n) {
// Build max-heap
Heap h;
h.data = arr;
h.size = n;
h.capacity = n;
h.is_min_heap = 0; // Max heap for ascending sort
// Build heap: O(n)
for (int i = n / 2 - 1; i >= 0; i--) {
heapify_down(&h, i);
}
// Extract max n times: O(n log n)
for (int i = n - 1; i > 0; i--) {
swap(&arr[0], &arr[i]);
h.size--;
heapify_down(&h, 0);
}
}
Priority Queue Task Scheduler
typedef struct {
int priority;
char description[128];
} Task;
typedef struct {
Task* tasks;
int size;
int capacity;
} PriorityQueue;
void pq_insert(PriorityQueue* pq, int priority, const char* desc) {
if (pq->size >= pq->capacity) {
pq->capacity *= 2;
pq->tasks = realloc(pq->tasks, pq->capacity * sizeof(Task));
}
int i = pq->size++;
pq->tasks[i].priority = priority;
strcpy(pq->tasks[i].description, desc);
// Bubble up (lower number = higher priority)
while (i > 0 && pq->tasks[i].priority < pq->tasks[(i-1)/2].priority) {
Task temp = pq->tasks[i];
pq->tasks[i] = pq->tasks[(i-1)/2];
pq->tasks[(i-1)/2] = temp;
i = (i - 1) / 2;
}
}
Task pq_extract(PriorityQueue* pq) {
Task highest = pq->tasks[0];
pq->tasks[0] = pq->tasks[--pq->size];
// Bubble down
int i = 0;
while (1) {
int best = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < pq->size &&
pq->tasks[left].priority < pq->tasks[best].priority) {
best = left;
}
if (right < pq->size &&
pq->tasks[right].priority < pq->tasks[best].priority) {
best = right;
}
if (best == i) break;
Task temp = pq->tasks[i];
pq->tasks[i] = pq->tasks[best];
pq->tasks[best] = temp;
i = best;
}
return highest;
}
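A short driver sketch that reproduces the scheduler demo above; it assumes the PriorityQueue code in this section plus <stdio.h> and <stdlib.h>:
int main(void) {
    PriorityQueue pq;
    pq.size = 0;
    pq.capacity = 8;
    pq.tasks = malloc(pq.capacity * sizeof(Task));
    pq_insert(&pq, 5, "Low: check emails");
    pq_insert(&pq, 1, "URGENT: server down!");
    pq_insert(&pq, 3, "Medium: review PR");
    pq_insert(&pq, 2, "High: fix bug");
    // Tasks come out lowest-priority-number first, regardless of insert order
    while (pq.size > 0) {
        Task t = pq_extract(&pq);
        printf("[Priority %d] \"%s\"\n", t.priority, t.description);
    }
    free(pq.tasks);
    return 0;
}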
Heap vs BST vs Sorted Array
| Operation | Heap | BST (balanced) | Sorted Array |
|---|---|---|---|
| Find min/max | O(1) | O(log n) | O(1) |
| Insert | O(log n) | O(log n) | O(n) |
| Extract min/max | O(log n) | O(log n) | O(1) or O(n) |
| Build from array | O(n) | O(n log n) | O(n log n) |
| Find arbitrary | O(n) | O(log n) | O(log n) |
| Memory | Minimal (array) | Pointers overhead | Minimal |
Use Heap when:
- You only need min or max, not both
- Priority queue operations
- Heapsort (in-place)
Project 8: Graph Algorithms Visualizer
- Main Programming Language: C
- Alternative Programming Languages: Rust, Python, JavaScript
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Graphs / Algorithms
- Software or Tool: Terminal/ncurses visualizer
- Main Book: Introduction to Algorithms by CLRS
What you’ll build
A graph library with visualization that demonstrates BFS, DFS, Dijkstra’s shortest path, and topological sort—showing how graphs model real-world relationships.
Why it teaches this concept
Graphs are everywhere:
- Social networks (people and friendships)
- Maps (cities and roads)
- The internet (pages and links)
- Dependencies (packages, tasks)
- Neural networks
Understanding graphs unlocks:
- Shortest path (GPS navigation)
- Social network analysis
- Dependency resolution (npm, cargo)
- Web crawling
- Recommendation systems
Core challenges you’ll face
- Graph representation (adjacency list vs matrix) → maps to space/time trade-off
- BFS (shortest path in unweighted graphs) → maps to level-order
- DFS (explore deeply, backtrack) → maps to stack/recursion
- Dijkstra’s (shortest path with weights) → maps to greedy + heap
- Topological sort (ordering with dependencies) → maps to DAGs
Resources for Key Challenges
- GeeksforGeeks: Real-life Applications of DSA - Graphs in maps, networks
- FreeCodeCamp: Real-world data structures - Graph applications
Key Concepts
| Concept | Resource |
|---|---|
| Graph Representations | Introduction to Algorithms Chapter 22 - CLRS |
| BFS/DFS | Introduction to Algorithms Chapter 22.2-22.3 - CLRS |
| Dijkstra’s Algorithm | Introduction to Algorithms Chapter 24.3 - CLRS |
| Topological Sort | Introduction to Algorithms Chapter 22.4 - CLRS |
Difficulty & Time
- Difficulty: Advanced
- Time estimate: 2 weeks
- Prerequisites: Projects 1-7 completed, especially heap
Real World Outcome
$ ./graph_demo
=== Graph Representation ===
Creating a city road network...
Adjacency List:
NYC -> [Boston(215), Philadelphia(97)]
Boston -> [NYC(215), Portland(108)]
Philadelphia -> [NYC(97), DC(140)]
DC -> [Philadelphia(140), Richmond(109)]
Portland -> [Boston(108)]
Richmond -> [DC(109)]
Adjacency Matrix:
NYC BOS PHL DC PRT RCH
NYC 0 215 97 0 0 0
BOS 215 0 0 0 108 0
PHL 97 0 0 140 0 0
DC 0 0 140 0 0 109
PRT 0 108 0 0 0 0
RCH 0 0 0 109 0 0
=== BFS: Shortest Path (unweighted) ===
From NYC, how many hops to each city?
NYC
/ \
BOS PHL
| \
PRT DC
|
RCH
Distances: NYC(0) BOS(1) PHL(1) DC(2) PRT(2) RCH(3)
=== DFS: Explore Deeply ===
Starting from NYC...
Visit order: NYC -> Boston -> Portland -> (backtrack) ->
Philadelphia -> DC -> Richmond
=== Dijkstra: Shortest Path (weighted) ===
Shortest path from NYC to Richmond:
Step 1: Process NYC, update neighbors
Boston: ∞ -> 215, Philadelphia: ∞ -> 97
Step 2: Process Philadelphia (closest), update neighbors
DC: ∞ -> 237
Step 3: Process Boston, update neighbors
Portland: ∞ -> 323
Step 4: Process DC, update neighbors
Richmond: ∞ -> 346
Step 5: Process Portland (no new paths)
Step 6: Process Richmond (destination!)
Shortest path NYC -> Richmond: 346 miles
Route: NYC -> Philadelphia -> DC -> Richmond
=== Topological Sort: Task Dependencies ===
Build system dependencies:
main.c depends on: utils.h, config.h
utils.c depends on: utils.h
config.c depends on: config.h
program depends on: main.o, utils.o, config.o
Valid build order:
1. config.h
2. utils.h
3. config.c -> config.o
4. utils.c -> utils.o
5. main.c -> main.o
6. program
Learning Milestones
- Can create graphs both ways → You understand representation trade-offs
- BFS finds shortest unweighted path → You understand level-order
- DFS explores deeply → You understand backtracking
- Dijkstra’s with heap works → You understand greedy algorithms
- Topological sort orders dependencies → You understand DAGs
Implementation
Graph Representation
// Adjacency List (better for sparse graphs)
typedef struct Edge {
int destination;
int weight;
struct Edge* next;
} Edge;
typedef struct {
Edge** adj_list; // Array of linked lists
int num_vertices;
int is_directed;
} Graph;
Graph* graph_create(int vertices, int is_directed) {
Graph* g = malloc(sizeof(Graph));
g->num_vertices = vertices;
g->is_directed = is_directed;
g->adj_list = calloc(vertices, sizeof(Edge*));
return g;
}
void graph_add_edge(Graph* g, int src, int dest, int weight) {
Edge* edge = malloc(sizeof(Edge));
edge->destination = dest;
edge->weight = weight;
edge->next = g->adj_list[src];
g->adj_list[src] = edge;
if (!g->is_directed) {
Edge* reverse = malloc(sizeof(Edge));
reverse->destination = src;
reverse->weight = weight;
reverse->next = g->adj_list[dest];
g->adj_list[dest] = reverse;
}
}
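For contrast with the list form, here is a minimal adjacency-matrix sketch (plus <stdlib.h>); MatrixGraph and its functions are illustrative names, and the trade-offs are summarized in the table at the end of this project:
// Adjacency Matrix (better for dense graphs): one int per vertex pair
typedef struct {
    int** weights;        // weights[src][dest] = edge weight, 0 = no edge
    int num_vertices;
    int is_directed;
} MatrixGraph;

MatrixGraph* matrix_graph_create(int vertices, int is_directed) {
    MatrixGraph* g = malloc(sizeof(MatrixGraph));
    g->num_vertices = vertices;
    g->is_directed = is_directed;
    g->weights = malloc(vertices * sizeof(int*));
    for (int i = 0; i < vertices; i++) {
        g->weights[i] = calloc(vertices, sizeof(int));
    }
    return g;
}

void matrix_graph_add_edge(MatrixGraph* g, int src, int dest, int weight) {
    g->weights[src][dest] = weight;                        // O(1) edge insert
    if (!g->is_directed) g->weights[dest][src] = weight;
}

// Edge lookup is O(1); listing neighbors costs O(V) even when the row is sparse
int matrix_graph_has_edge(MatrixGraph* g, int src, int dest) {
    return g->weights[src][dest] != 0;
}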
BFS (Breadth-First Search)
void bfs(Graph* g, int start) {
int* visited = calloc(g->num_vertices, sizeof(int));
int* distance = malloc(g->num_vertices * sizeof(int));
for (int i = 0; i < g->num_vertices; i++) {
distance[i] = -1;
}
// Queue
int* queue = malloc(g->num_vertices * sizeof(int));
int front = 0, rear = 0;
visited[start] = 1;
distance[start] = 0;
queue[rear++] = start;
printf("BFS from vertex %d:\n", start);
while (front < rear) {
int current = queue[front++];
printf(" Visit %d (distance %d)\n", current, distance[current]);
Edge* edge = g->adj_list[current];
while (edge) {
if (!visited[edge->destination]) {
visited[edge->destination] = 1;
distance[edge->destination] = distance[current] + 1;
queue[rear++] = edge->destination;
}
edge = edge->next;
}
}
free(visited);
free(distance);
free(queue);
}
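The bfs() above records only distances. A common variation, sketched here with the same Graph/Edge types, also records each vertex's parent so the actual hop-by-hop route can be reconstructed (the parent[] array and the bfs_with_parents name are illustrative additions):
void bfs_with_parents(Graph* g, int start, int* parent) {
    int* visited = calloc(g->num_vertices, sizeof(int));
    int* queue = malloc(g->num_vertices * sizeof(int));
    int front = 0, rear = 0;
    for (int i = 0; i < g->num_vertices; i++) parent[i] = -1;
    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        int current = queue[front++];
        for (Edge* e = g->adj_list[current]; e; e = e->next) {
            if (!visited[e->destination]) {
                visited[e->destination] = 1;
                parent[e->destination] = current;   // remember the discovery edge
                queue[rear++] = e->destination;
            }
        }
    }
    free(visited);
    free(queue);
}
// Follow parent[] backwards from any vertex to the start to recover its
// fewest-hop path.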
DFS (Depth-First Search)
void dfs_helper(Graph* g, int vertex, int* visited) {
visited[vertex] = 1;
printf(" Visit %d\n", vertex);
Edge* edge = g->adj_list[vertex];
while (edge) {
if (!visited[edge->destination]) {
dfs_helper(g, edge->destination, visited);
}
edge = edge->next;
}
printf(" Backtrack from %d\n", vertex);
}
void dfs(Graph* g, int start) {
int* visited = calloc(g->num_vertices, sizeof(int));
printf("DFS from vertex %d:\n", start);
dfs_helper(g, start, visited);
free(visited);
}
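The recursive dfs() uses the call stack implicitly. The same traversal with an explicit stack makes the "DFS maps to stack" connection from the challenge list visible; a sketch reusing the Graph type above (dfs_iterative is an illustrative name):
void dfs_iterative(Graph* g, int start) {
    int* visited = calloc(g->num_vertices, sizeof(int));
    int capacity = g->num_vertices;
    int* stack = malloc(capacity * sizeof(int));
    int top = 0;
    stack[top++] = start;
    printf("Iterative DFS from vertex %d:\n", start);
    while (top > 0) {
        int vertex = stack[--top];              // pop the most recently discovered vertex
        if (visited[vertex]) continue;          // it may have been pushed more than once
        visited[vertex] = 1;
        printf("  Visit %d\n", vertex);
        for (Edge* e = g->adj_list[vertex]; e; e = e->next) {
            if (!visited[e->destination]) {
                if (top == capacity) {          // grow: pushes can outnumber vertices
                    capacity *= 2;
                    stack = realloc(stack, capacity * sizeof(int));
                }
                stack[top++] = e->destination;
            }
        }
    }
    free(visited);
    free(stack);
}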
Dijkstra’s Algorithm (with Min-Heap)
typedef struct {
int vertex;
int distance;
} HeapNode;
void dijkstra(Graph* g, int start, int* dist, int* prev) {
// Initialize distances
for (int i = 0; i < g->num_vertices; i++) {
dist[i] = INT_MAX;
prev[i] = -1;
}
dist[start] = 0;
    // Priority queue of (vertex, distance) pairs; it may hold more entries than
    // vertices, because a vertex is re-inserted whenever its distance improves
    int heap_capacity = g->num_vertices;
    HeapNode* heap = malloc(heap_capacity * sizeof(HeapNode));
    int heap_size = 0;
    // Insert start
    heap[heap_size++] = (HeapNode){start, 0};
int* processed = calloc(g->num_vertices, sizeof(int));
while (heap_size > 0) {
// Extract min (simplified - should use proper heap)
int min_idx = 0;
for (int i = 1; i < heap_size; i++) {
if (heap[i].distance < heap[min_idx].distance) {
min_idx = i;
}
}
HeapNode current = heap[min_idx];
heap[min_idx] = heap[--heap_size];
if (processed[current.vertex]) continue;
processed[current.vertex] = 1;
printf("Process vertex %d (distance %d)\n",
current.vertex, current.distance);
// Update neighbors
Edge* edge = g->adj_list[current.vertex];
while (edge) {
int new_dist = dist[current.vertex] + edge->weight;
if (new_dist < dist[edge->destination]) {
dist[edge->destination] = new_dist;
prev[edge->destination] = current.vertex;
                if (heap_size == heap_capacity) {   // grow before pushing
                    heap_capacity *= 2;
                    heap = realloc(heap, heap_capacity * sizeof(HeapNode));
                }
                heap[heap_size++] = (HeapNode){edge->destination, new_dist};
printf(" Update %d: distance = %d\n",
edge->destination, new_dist);
}
edge = edge->next;
}
}
free(heap);
free(processed);
}
void print_path(int* prev, int dest) {
if (prev[dest] != -1) {
print_path(prev, prev[dest]);
printf(" -> ");
}
printf("%d", dest);
}
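A driver sketch that reproduces the NYC-to-Richmond run from the Real World Outcome, assuming graph_create, graph_add_edge, dijkstra, and print_path above (plus their headers); indices 0-5 stand for NYC, Boston, Philadelphia, DC, Portland, and Richmond, and print_path prints those indices rather than city names:
#include <stdio.h>

int main(void) {
    enum { NYC, BOS, PHL, DC, PRT, RCH, NUM_CITIES };
    Graph* g = graph_create(NUM_CITIES, 0);   // undirected road network
    graph_add_edge(g, NYC, BOS, 215);
    graph_add_edge(g, NYC, PHL, 97);
    graph_add_edge(g, BOS, PRT, 108);
    graph_add_edge(g, PHL, DC, 140);
    graph_add_edge(g, DC, RCH, 109);

    int dist[NUM_CITIES], prev[NUM_CITIES];
    dijkstra(g, NYC, dist, prev);

    printf("Shortest NYC -> Richmond: %d miles\n", dist[RCH]);  // 346
    printf("Route (vertex numbers): ");
    print_path(prev, RCH);                    // 0 -> 2 -> 3 -> 5
    printf("\n");
    return 0;
}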
Topological Sort (Kahn’s Algorithm)
void topological_sort(Graph* g) {
int* in_degree = calloc(g->num_vertices, sizeof(int));
// Calculate in-degrees
for (int v = 0; v < g->num_vertices; v++) {
Edge* edge = g->adj_list[v];
while (edge) {
in_degree[edge->destination]++;
edge = edge->next;
}
}
// Queue of vertices with in-degree 0
int* queue = malloc(g->num_vertices * sizeof(int));
int front = 0, rear = 0;
for (int v = 0; v < g->num_vertices; v++) {
if (in_degree[v] == 0) {
queue[rear++] = v;
}
}
printf("Topological order: ");
int count = 0;
while (front < rear) {
int current = queue[front++];
printf("%d ", current);
count++;
Edge* edge = g->adj_list[current];
while (edge) {
in_degree[edge->destination]--;
if (in_degree[edge->destination] == 0) {
queue[rear++] = edge->destination;
}
edge = edge->next;
}
}
if (count != g->num_vertices) {
printf("\nCycle detected! Not a DAG.\n");
}
printf("\n");
free(in_degree);
free(queue);
}
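A tiny driver for topological_sort, assuming graph_create and graph_add_edge above; the four vertices stand for made-up build steps (0 = header, 1 = source, 2 = object file, 3 = binary), with each edge pointing from a prerequisite to the thing that depends on it:
int main(void) {
    Graph* g = graph_create(4, 1);    // directed graph (must be a DAG)
    graph_add_edge(g, 0, 1, 0);       // header -> source
    graph_add_edge(g, 1, 2, 0);       // source -> object file
    graph_add_edge(g, 2, 3, 0);       // object file -> binary
    topological_sort(g);              // Topological order: 0 1 2 3
    return 0;
}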
Adjacency List vs Adjacency Matrix
| Aspect | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Check if edge exists | O(degree) | O(1) |
| Find all neighbors | O(degree) | O(V) |
| Add edge | O(1) | O(1) |
| Best for | Sparse graphs | Dense graphs |
Project 9: Mini Database Engine (Capstone)
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Databases / Systems Programming
- Software or Tool: SQL-like query engine
- Main Book: Database Internals by Alex Petrov
What you’ll build
A mini database that combines everything you’ve learned: B-tree indexing, hash tables for lookups, heaps for sorting, and more—with a simple SQL-like query language.
Why this is the capstone
Real databases are the ultimate data structure application:
- Hash indexes for equality lookups
- B-trees for range queries
- Heaps for sorting and top-k
- Graphs for query planning
Building a mini database proves you understand when to use each structure.
Core challenges you’ll face
- Storage engine (pages, buffer pool) → maps to memory management (see the page-layout sketch after this list)
- B-tree index (balanced tree for disk) → maps to tree structures
- Hash index (fast equality lookup) → maps to hash tables
- Query execution (scans, filters, joins) → maps to algorithm design
- Query parsing (SQL-like syntax) → maps to stack machines
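To make the storage-engine challenge concrete, here is a minimal sketch of a slotted page, the layout hinted at by the "Page Manager (Slotted)" box in the architecture diagram below. All names (PAGE_SIZE, Page, page_insert) are illustrative assumptions, not part of a defined MiniDB spec:
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef struct {
    uint16_t num_slots;      // how many records live on this page
    uint16_t free_start;     // first free byte after the slot directory
    uint16_t free_end;       // last free byte before the record data
} PageHeader;

typedef struct {
    uint16_t offset;         // where the record's bytes begin in data[]
    uint16_t length;         // record length in bytes
} Slot;

typedef struct {
    PageHeader header;
    uint8_t data[PAGE_SIZE - sizeof(PageHeader)];
} Page;

void page_init(Page* p) {
    p->header.num_slots = 0;
    p->header.free_start = 0;
    p->header.free_end = sizeof(p->data);
}

// Records grow from the end of data[]; the slot directory grows from the front,
// so the free space sits between them. Returns the slot number, or -1 if full.
int page_insert(Page* p, const void* record, uint16_t length) {
    uint16_t needed = length + sizeof(Slot);
    if (p->header.free_end - p->header.free_start < needed) return -1;
    p->header.free_end -= length;
    memcpy(p->data + p->header.free_end, record, length);
    Slot slot = { p->header.free_end, length };
    memcpy(p->data + p->header.free_start, &slot, sizeof(Slot));
    p->header.free_start += sizeof(Slot);
    return p->header.num_slots++;
}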
Key Concepts
| Concept | Resource |
|---|---|
| B-Trees | Introduction to Algorithms Chapter 18 - CLRS |
| Database Storage | Database Internals Chapters 1-4 - Alex Petrov |
| Query Processing | Database System Concepts Chapter 12 - Silberschatz |
| Buffer Management | Database Internals Chapter 5 - Alex Petrov |
Difficulty & Time
- Difficulty: Expert/Master
- Time estimate: 4-6 weeks
- Prerequisites: All previous projects completed
Real World Outcome
$ ./minidb
MiniDB v1.0 - A learning database engine
Type 'help' for commands.
minidb> CREATE TABLE users (id INT, name TEXT, age INT);
Table 'users' created.
minidb> CREATE INDEX idx_id ON users(id) USING HASH;
Hash index 'idx_id' created on users(id).
minidb> CREATE INDEX idx_age ON users(age) USING BTREE;
B-tree index 'idx_age' created on users(age).
minidb> INSERT INTO users VALUES (1, 'Alice', 30);
INSERT INTO users VALUES (2, 'Bob', 25);
INSERT INTO users VALUES (3, 'Charlie', 35);
Inserted 3 rows.
minidb> SELECT * FROM users;
+----+---------+-----+
| id | name | age |
+----+---------+-----+
| 1 | Alice | 30 |
| 2 | Bob | 25 |
| 3 | Charlie | 35 |
+----+---------+-----+
3 rows returned (full table scan)
minidb> SELECT * FROM users WHERE id = 2;
+----+------+-----+
| id | name | age |
+----+------+-----+
| 2 | Bob | 25 |
+----+------+-----+
1 row returned (used hash index 'idx_id')
minidb> SELECT * FROM users WHERE age > 28 ORDER BY age;
+----+---------+-----+
| id | name | age |
+----+---------+-----+
| 1 | Alice | 30 |
| 3 | Charlie | 35 |
+----+---------+-----+
2 rows returned (used B-tree index 'idx_age' for range scan)
minidb> EXPLAIN SELECT * FROM users WHERE age > 28;
Query Plan:
-> Index Range Scan on idx_age (age > 28)
-> Fetch rows from table 'users'
-> Return results
minidb> .stats
Database Statistics:
Tables: 1
Total rows: 3
Hash indexes: 1 (idx_id)
B-tree indexes: 1 (idx_age)
Buffer pool: 16 pages, 2 dirty
minidb> .dump users
Page 0: [1|Alice|30] [2|Bob|25] [3|Charlie|35]
B-tree idx_age:
[30]
/ \
[25] [35]
Learning Milestones
- Can store and retrieve rows → You understand storage
- Hash index speeds up equality queries → You understand hash tables in practice
- B-tree index enables range queries → You understand balanced trees in practice
- Query planner chooses correct index → You understand optimization
Architecture Overview
┌────────────────────────────────────────────────────────────────┐
│ MiniDB │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ SQL Parser │ │ Query │ │ Query Executor │ │
│ │ │→ │ Planner │→ │ │ │
│ │ (Stack-based)│ │ (Choose idx) │ │ (Scan/Filter/Sort) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Index Manager │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Hash Index │ │ B-Tree Index│ │ (future: R-tree)│ │ │
│ │ │ O(1) == │ │ O(log n) <> │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Storage Engine │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Buffer Pool │ │ Page Manager│ │ Heap File │ │ │
│ │ │ (LRU Cache) │ │ (Slotted) │ │ (Row Storage) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Disk (Files) │ │
│ │ [users.dat] [users_idx_id.hash] [users_idx_age.btree] │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
Implementation Skeleton
// Row storage
typedef struct {
int id;
char name[64];
int age;
} Row;
// Table metadata
typedef struct {
char name[64];
int num_rows;
Row* rows; // Simple in-memory for now
HashTable* hash_indexes;
BTree* btree_indexes;
} Table;
// Database
typedef struct {
Table* tables[MAX_TABLES];
int num_tables;
} Database;
// Query result
typedef struct {
Row* rows;
int count;
char index_used[64];
} QueryResult;
// Execute SELECT with index selection
QueryResult* execute_select(Table* table, const char* column,
const char* op, int value) {
QueryResult* result = malloc(sizeof(QueryResult));
result->rows = malloc(table->num_rows * sizeof(Row));
result->count = 0;
// Check for applicable index
if (strcmp(op, "=") == 0 && has_hash_index(table, column)) {
// Use hash index for O(1) lookup
int* row_ids = hash_index_lookup(table, column, value);
strcpy(result->index_used, "hash");
// Fetch rows...
}
else if ((strcmp(op, ">") == 0 || strcmp(op, "<") == 0) &&
has_btree_index(table, column)) {
// Use B-tree for range scan
int* row_ids = btree_range_scan(table, column, op, value);
strcpy(result->index_used, "btree");
// Fetch rows...
}
else {
// Full table scan
strcpy(result->index_used, "none (full scan)");
for (int i = 0; i < table->num_rows; i++) {
if (matches_condition(&table->rows[i], column, op, value)) {
result->rows[result->count++] = table->rows[i];
}
}
}
return result;
}
Essential Resources Summary
Primary Books (in order)
- Introduction to Algorithms (CLRS) - The bible of algorithms and data structures
- Algorithms, Fourth Edition by Sedgewick - More accessible, excellent visualizations
- Computer Systems: A Programmer’s Perspective - For understanding memory
- Grokking Algorithms - Visual, beginner-friendly
- Database Internals by Alex Petrov - For the capstone project
From Your Library
- C Programming: A Modern Approach by K.N. King - For implementation
- The C Programming Language by K&R - Reference
- Data Structures the Fun Way by Jeremy Kubica - Visual learning
- Grokking Data Structures by Marcello La Rocca - Beginner-friendly
- Advanced Algorithms and Data Structures by Marcello La Rocca - Advanced topics
Online Resources
- Visualgo.net - Animated data structure visualizations
- GeeksforGeeks: Real-life Applications - Practical examples
- The Lost Art of Structure Packing - Memory layout optimization
The Big Picture: When to Use What
| I Need To… | Use This | Why |
|---|---|---|
| Access by index | Array | O(1) random access |
| Insert/delete frequently at ends | Linked List | O(1) with pointers |
| LIFO (last in, first out) | Stack | O(1) push/pop |
| FIFO (first in, first out) | Queue | O(1) enqueue/dequeue |
| Fast lookup by key | Hash Table | O(1) average |
| Ordered data + fast search | BST / B-Tree | O(log n) everything |
| Fast min/max only | Heap | O(1) peek, O(log n) extract |
| Model relationships | Graph | Represents connections |
| Combine multiple needs | Custom | e.g., LRU = Hash + DLL |
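As a taste of the "Combine multiple needs" row, here is a minimal sketch of an LRU cache built from a hash table plus a doubly-linked list, with integer keys; the capacity, bucket count, and all names are illustrative assumptions:
#include <stdlib.h>

#define CAPACITY 3      // max entries before eviction
#define BUCKETS  16     // fixed bucket count for the toy hash table

typedef struct Node {
    int key, value;
    struct Node *prev, *next;       // doubly-linked list in recency order
    struct Node *hnext;             // hash bucket chain
} Node;

typedef struct {
    Node* buckets[BUCKETS];         // hash table: key -> node, O(1) average
    Node *head, *tail;              // head = most recent, tail = least recent
    int size;
} LRU;

static unsigned lru_hash(int key) { return (unsigned)key % BUCKETS; }

static void detach(LRU* c, Node* n) {
    if (n->prev) n->prev->next = n->next; else c->head = n->next;
    if (n->next) n->next->prev = n->prev; else c->tail = n->prev;
}

static void push_front(LRU* c, Node* n) {
    n->prev = NULL; n->next = c->head;
    if (c->head) c->head->prev = n;
    c->head = n;
    if (!c->tail) c->tail = n;
}

static Node* find(LRU* c, int key) {
    for (Node* n = c->buckets[lru_hash(key)]; n; n = n->hnext)
        if (n->key == key) return n;
    return NULL;
}

int lru_get(LRU* c, int key, int* value) {
    Node* n = find(c, key);
    if (!n) return 0;
    detach(c, n); push_front(c, n);     // mark as most recently used
    *value = n->value;
    return 1;
}

void lru_put(LRU* c, int key, int value) {
    Node* n = find(c, key);
    if (n) { n->value = value; detach(c, n); push_front(c, n); return; }
    if (c->size == CAPACITY) {          // evict least recently used (tail)
        Node* victim = c->tail;
        detach(c, victim);
        Node** b = &c->buckets[lru_hash(victim->key)];
        while (*b != victim) b = &(*b)->hnext;
        *b = victim->hnext;
        free(victim);
        c->size--;
    }
    n = calloc(1, sizeof(Node));
    n->key = key; n->value = value;
    n->hnext = c->buckets[lru_hash(key)];
    c->buckets[lru_hash(key)] = n;
    push_front(c, n);
    c->size++;
}
// Usage: LRU cache = {0}; lru_put(&cache, 1, 100); int v; lru_get(&cache, 1, &v);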
Appendix: Complexity Cheat Sheet
| Data Structure | Access | Search | Insert | Delete | Space |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Dynamic Array | O(1) | O(n) | O(1)* | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1)† | O(1)† | O(n) |
| Stack | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table | N/A | O(1)* | O(1)* | O(1)* | O(n) |
| BST (balanced) | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| BST (unbalanced) | O(n) | O(n) | O(n) | O(n) | O(n) |
| Heap | O(1)‡ | O(n) | O(log n) | O(log n) | O(n) |
| Graph (adj list) | O(V) | O(V+E) | O(1) | O(E) | O(V+E) |
*amortized †if you have the pointer ‡min/max only