Project 5: Hash Table

Build a hash table with open addressing or chaining, then use it as a dictionary and cache.

Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   2 weeks
Language        C (Alternatives: Rust, C++)
Prerequisites   Projects 1-4, strong pointers
Key Topics      Hashing, collisions, load factor, resizing

1. Learning Objectives

By completing this project, you will:

  1. Implement a hash table with insert, delete, find.
  2. Compare collision strategies (chaining vs open addressing).
  3. Tune load factor and resize policies.
  4. Use a hash table to build a simple LRU cache.

2. Theoretical Foundation

2.1 Core Concepts

  • Hash function: Maps keys to uniform bucket indices.
  • Collision handling: Multiple keys may hash to the same index.
  • Load factor: n / capacity impacts speed and memory.
  • Amortized resizing: Grow table to maintain O(1) average ops.
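To make "maps keys to uniform bucket indices" concrete, here is a minimal sketch of FNV-1a, one of the string hashes recommended later in this guide; the function names are illustrative, not prescribed:

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a: a simple, well-distributed string hash. */
static uint64_t fnv1a(const char *key) {
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    for (const char *p = key; *p; p++) {
        h ^= (uint64_t)(unsigned char)*p;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Map a key to a bucket index for a table with `capacity` buckets. */
static size_t bucket_index(const char *key, size_t capacity) {
    return (size_t)(fnv1a(key) % capacity);
}
```

Note the two properties from 5.4: the same key always yields the same index (determinism), and the xor-multiply mixing spreads nearby keys across buckets (distribution).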

2.2 Why This Matters

Hash tables power databases, compilers, caches, and language runtimes. Understanding their trade-offs helps you choose the right design and avoid pathological performance.

2.3 Historical Context / Background

Hashing became practical in the 1950s and is now a standard tool for constant-time lookup in modern systems.

2.4 Common Misconceptions

  • “O(1) always”: Worst-case is O(n) if hashing is poor.
  • “Any hash function works”: Distribution quality matters.

3. Project Specification

3.1 What You Will Build

  • A hash table supporting string keys and integer values.
  • Collision handling with either chaining or open addressing.
  • An LRU cache using a hash table + doubly-linked list.

3.2 Functional Requirements

  1. Insert, find, delete operations.
  2. Resize when load factor exceeds threshold.
  3. Support iteration over keys.
  4. Optional: LRU cache with fixed capacity.
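Requirement 2 (resize on load factor) reduces to a small predicate. A sketch of one way to express it; the 0.75 threshold and doubling policy match the recommendations later in this guide, but both are tunable assumptions:

```c
#include <stddef.h>
#include <stdbool.h>

#define MAX_LOAD 0.75  /* common default; tune per section 5.11 */

/* Decide whether the table should grow before accepting one more entry. */
static bool needs_resize(size_t size, size_t capacity) {
    return (double)(size + 1) / (double)capacity > MAX_LOAD;
}

/* Typical growth policy: double the bucket count. */
static size_t next_capacity(size_t capacity) {
    return capacity * 2;
}
```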

3.3 Non-Functional Requirements

  • Performance: Average O(1) operations.
  • Reliability: Proper handling of collisions and deletes.
  • Usability: CLI demo for dictionary and cache behavior.

3.4 Example Usage / Output

$ ./hashtable_demo
Insert alice=30
Insert bob=25
Find bob -> 25
Delete alice
Load factor: 0.77 -> resize to 128 buckets

3.5 Real World Outcome

You run the demo and insert key/value pairs. The program prints the bucket index, collision count, and when it resizes. When you enable LRU mode, it shows cache hits, misses, and evictions as you exceed capacity.


4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────────┐     ┌────────────────┐
│ Hash Table   │────▶│ Collision Policy │────▶│ LRU Cache      │
└──────────────┘     └──────────────────┘     └────────────────┘

4.2 Key Components

Component   Responsibility         Key Decisions
Hash fn     Map keys to indices    Use FNV-1a or djb2
Buckets     Store entries          Chaining vs open addressing
Resizer     Maintain load factor   Double size at 0.75
LRU list    Track recency          Doubly-linked list

4.3 Data Structures

typedef struct Entry {
    char* key;
    int value;
    struct Entry* next;  // for chaining
} Entry;

typedef struct {
    Entry** buckets;   // array of chain heads
    size_t capacity;   // number of buckets
    size_t size;       // number of stored entries
} HashTable;

4.4 Algorithm Overview

Key Algorithm: Insert with Chaining

  1. Hash key to index.
  2. Scan chain for existing key.
  3. Insert new entry at head if absent.
  4. Resize if load factor exceeds threshold.

Complexity Analysis:

  • Time: O(1) average, O(n) worst-case
  • Space: O(n + capacity)
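Steps 1-3 can be sketched directly in C, assuming the Entry/HashTable layout from 4.3 (step 4, resizing, is omitted here for brevity; djb2 stands in for whichever hash you choose):

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;          /* chain link */
} Entry;

typedef struct {
    Entry **buckets;
    size_t capacity;
    size_t size;
} HashTable;

/* djb2 string hash. */
static size_t hash_str(const char *key) {
    size_t h = 5381;
    for (const char *p = key; *p; p++)
        h = h * 33 + (unsigned char)*p;
    return h;
}

/* 1. hash to index; 2. scan chain; 3. insert at head if absent. */
static int ht_insert(HashTable *ht, const char *key, int value) {
    size_t i = hash_str(key) % ht->capacity;
    for (Entry *e = ht->buckets[i]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;                  /* duplicate key: update */
            return 0;
        }
    }
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) { free(e); return -1; }
    strcpy(e->key, key);
    e->next = ht->buckets[i];                  /* head insertion */
    ht->buckets[i] = e;
    e->value = value;
    ht->size++;
    return 0;
}

static Entry *ht_find(HashTable *ht, const char *key) {
    size_t i = hash_str(key) % ht->capacity;
    for (Entry *e = ht->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Head insertion keeps the insert itself O(1); only the duplicate-key scan walks the chain.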

5. Implementation Guide

5.1 Development Environment Setup

clang -Wall -Wextra -g -o hashtable_demo src/*.c

5.2 Project Structure

project-root/
├── src/
│   ├── hash.c
│   ├── hashtable.c
│   ├── lru.c
│   └── main.c
├── tests/
│   └── test_hashtable.c
└── Makefile

5.3 The Core Question You’re Answering

“How can I trade extra memory to make lookup nearly constant time?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Hash function properties
    • Uniform distribution
    • Determinism
  2. Collision handling
    • Separate chaining vs open addressing
  3. Load factor
    • Why performance drops as it grows

5.5 Questions to Guide Your Design

  1. Which collision strategy will you implement first?
  2. How will you handle deletion in open addressing?
  3. What load factor threshold makes sense?

5.6 Thinking Exercise

Take 10 keys, hash them with a simple hash, and map to 8 buckets. Count collisions and estimate load factor impact.
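If you want to check your hand-worked answer, a small sketch like this counts how many inserts land in an already-occupied bucket (djb2 is used as the "simple hash"; by pigeonhole, 10 keys into 8 buckets always produce at least 2 such collisions):

```c
/* djb2 hash, as suggested elsewhere in this guide. */
static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Bucket `n` keys into `nbuckets` slots; fill `counts` with per-bucket
 * totals and return the number of inserts that hit an occupied bucket. */
int count_collisions(const char **keys, int n, int nbuckets, int *counts) {
    for (int b = 0; b < nbuckets; b++) counts[b] = 0;
    int collisions = 0;
    for (int i = 0; i < n; i++) {
        int b = (int)(djb2(keys[i]) % (unsigned long)nbuckets);
        if (counts[b] > 0) collisions++;   /* bucket already occupied */
        counts[b]++;
    }
    return collisions;
}
```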

5.7 The Interview Questions They’ll Ask

  1. “What happens when two keys hash to the same bucket?”
  2. “Why does load factor matter?”
  3. “How does LRU caching use a hash table?”

5.8 Hints in Layers

Hint 1: Start with chaining. It is simpler, and deletion is straightforward.

Hint 2: Add resizing early. Without it, performance degrades as the table fills.

Hint 3: Use a real hash function. FNV-1a is short and reliable for strings.

5.9 Books That Will Help

Topic         Book                               Chapter
Hash tables   CLRS                               Ch. 11
C pointers    C Programming: A Modern Approach   Ch. 11

5.10 Implementation Phases

Phase 1: Foundation (3 days)

Goals:

  • Hash function and basic insert/find

Tasks:

  1. Implement string hashing.
  2. Insert into buckets with chaining.

Checkpoint: Insert and find operations work on small sets.

Phase 2: Core Functionality (4 days)

Goals:

  • Deletion and resizing

Tasks:

  1. Implement delete with chain update.
  2. Add resize when load factor > 0.75.

Checkpoint: Table grows and maintains data.
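Phase 2's delete is where the chain update usually goes wrong. One sketch, assuming the Entry/HashTable layout from 4.3: walking the chain through a pointer-to-pointer handles deleting the head of the chain without a special case:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry { char *key; int value; struct Entry *next; } Entry;
typedef struct { Entry **buckets; size_t capacity; size_t size; } HashTable;

static size_t hash_str(const char *key) {
    size_t h = 5381;
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    return h;
}

/* Unlink via a pointer-to-pointer: `link` is either the bucket slot or
 * some entry's `next` field, so splicing out works uniformly. */
static int ht_delete(HashTable *ht, const char *key) {
    Entry **link = &ht->buckets[hash_str(key) % ht->capacity];
    while (*link) {
        Entry *e = *link;
        if (strcmp(e->key, key) == 0) {
            *link = e->next;   /* splice out */
            free(e->key);
            free(e);
            ht->size--;
            return 1;          /* deleted */
        }
        link = &e->next;
    }
    return 0;                  /* not found */
}
```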

Phase 3: Polish & Edge Cases (3 days)

Goals:

  • LRU cache demo
  • Metrics

Tasks:

  1. Add doubly-linked list for recency.
  2. Print collisions and load factor.

Checkpoint: Cache evicts least-recently used key.
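The Phase 3 mechanics fit in one compact sketch. To keep it short, this version uses integer keys and a fixed bucket count; every name here is illustrative, and a real implementation would add error handling and key copying:

```c
#include <stdlib.h>

#define NBUCKETS 16

typedef struct Node {
    int key, value;
    struct Node *hnext;          /* hash-chain link */
    struct Node *prev, *next;    /* recency list links (MRU at head) */
} Node;

typedef struct {
    Node *buckets[NBUCKETS];     /* hash table: key -> node */
    Node *head, *tail;           /* doubly-linked recency list */
    int size, capacity;
} LRU;

static Node **chain(LRU *c, int key) { return &c->buckets[(unsigned)key % NBUCKETS]; }

static void list_unlink(LRU *c, Node *n) {
    if (n->prev) n->prev->next = n->next; else c->head = n->next;
    if (n->next) n->next->prev = n->prev; else c->tail = n->prev;
}

static void list_push_front(LRU *c, Node *n) {
    n->prev = NULL; n->next = c->head;
    if (c->head) c->head->prev = n; else c->tail = n;
    c->head = n;
}

static Node *hash_find(LRU *c, int key) {
    for (Node *n = *chain(c, key); n; n = n->hnext)
        if (n->key == key) return n;
    return NULL;
}

/* Get: a hit moves the node to the front (most recently used). Returns 1 on hit. */
int lru_get(LRU *c, int key, int *out) {
    Node *n = hash_find(c, key);
    if (!n) return 0;
    list_unlink(c, n);
    list_push_front(c, n);
    *out = n->value;
    return 1;
}

/* Put: insert or update; evict the tail (least recently used) when full. */
void lru_put(LRU *c, int key, int value) {
    Node *n = hash_find(c, key);
    if (n) { n->value = value; list_unlink(c, n); list_push_front(c, n); return; }
    if (c->size == c->capacity) {
        Node *victim = c->tail;              /* least recently used */
        list_unlink(c, victim);
        Node **link = chain(c, victim->key); /* remove from hash chain too */
        while (*link != victim) link = &(*link)->hnext;
        *link = victim->hnext;
        free(victim);
        c->size--;
    }
    n = malloc(sizeof *n);
    n->key = key; n->value = value;
    n->hnext = *chain(c, key);
    *chain(c, key) = n;
    list_push_front(c, n);
    c->size++;
}
```

The hash table makes lookup O(1) while the list makes "find the least-recently-used entry" O(1); neither structure can do both jobs alone.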

5.11 Key Implementation Decisions

Decision             Options                     Recommendation   Rationale
Collision strategy   Chaining, open addressing   Chaining         Simpler deletes, easier to debug
Resize threshold     0.5, 0.75, 0.9              0.75             Common balance of space/time
Hash fn              djb2, FNV-1a                FNV-1a           Good distribution for strings

6. Testing Strategy

6.1 Test Categories

Category            Purpose              Examples
Unit Tests          Hash + insert/find   Colliding keys
Integration Tests   Resizing             Insert 1000 keys
Edge Case Tests     Deletion             Delete head of chain

6.2 Critical Test Cases

  1. Insert duplicate key updates value.
  2. Resize preserves all keys.
  3. Delete removes key and leaves others intact.

6.3 Test Data

Keys: "a", "b", "c", "aa", "bb", "cc"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                          Symptom             Solution
Forgetting to rehash on resize   Missing keys        Recompute indices
Poor hash function               Many collisions     Switch hash fn
Memory leaks                     valgrind warnings   Free keys and entries
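The first pitfall deserves a concrete sketch: bucket indices depend on capacity, so after growing the array every entry must be rehashed into its new position. Assuming the Entry/HashTable layout from section 4.3:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry { char *key; int value; struct Entry *next; } Entry;
typedef struct { Entry **buckets; size_t capacity; size_t size; } HashTable;

static size_t hash_str(const char *key) {
    size_t h = 5381;
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    return h;
}

/* Grow the bucket array and RE-HASH every entry; old indices are
 * meaningless once the capacity changes. */
static int ht_resize(HashTable *ht, size_t new_capacity) {
    Entry **fresh = calloc(new_capacity, sizeof *fresh);
    if (!fresh) return -1;
    for (size_t i = 0; i < ht->capacity; i++) {
        Entry *e = ht->buckets[i];
        while (e) {
            Entry *next = e->next;
            size_t j = hash_str(e->key) % new_capacity;  /* recompute index */
            e->next = fresh[j];
            fresh[j] = e;
            e = next;
        }
    }
    free(ht->buckets);
    ht->buckets = fresh;
    ht->capacity = new_capacity;
    return 0;
}
```

Entries are moved, not copied, so resizing allocates only the new bucket array.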

7.2 Debugging Strategies

  • Print bucket lengths after inserts.
  • Track collision counts and load factor.

7.3 Performance Traps

  • Excessive collisions turn O(1) into O(n).

8. Extensions & Challenges

8.1 Beginner Extensions

  • Implement open addressing (linear probing).
  • Add a contains API.
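For the first extension, the core of linear probing fits in a few lines. This sketch is deliberately simplified: fixed capacity, no deletion (and so no tombstones), and keys stored by pointer rather than copied:

```c
#include <stddef.h>
#include <string.h>

#define CAP 8   /* fixed table size for the sketch */

typedef struct { const char *key; int value; } Slot;  /* key == NULL: empty */

static size_t hash_str2(const char *key) {
    size_t h = 5381;
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    return h;
}

/* Linear probing: on collision, step to the next slot (wrapping around)
 * until we find the key or an empty slot. */
static int probe_insert(Slot table[CAP], const char *key, int value) {
    size_t i = hash_str2(key) % CAP;
    for (size_t step = 0; step < CAP; step++, i = (i + 1) % CAP) {
        if (table[i].key == NULL || strcmp(table[i].key, key) == 0) {
            table[i].key = key;
            table[i].value = value;
            return 0;
        }
    }
    return -1;  /* table full */
}

static int probe_find(const Slot table[CAP], const char *key, int *out) {
    size_t i = hash_str2(key) % CAP;
    for (size_t step = 0; step < CAP; step++, i = (i + 1) % CAP) {
        if (table[i].key == NULL) return 0;   /* hit empty slot: absent */
        if (strcmp(table[i].key, key) == 0) { *out = table[i].value; return 1; }
    }
    return 0;
}
```

Notice why the intermediate extension exists: once you delete a slot, "hit an empty slot means absent" breaks for keys probed past it, which is what tombstones fix.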

8.2 Intermediate Extensions

  • Add tombstones and rehashing for open addressing.
  • Make hash table generic with void* values.

8.3 Advanced Extensions

  • Implement Robin Hood hashing.
  • Add incremental resizing.

9. Real-World Connections

9.1 Industry Applications

  • Databases: hash indexes for equality lookups.
  • Caches: O(1) access to hot data.
  • Redis: dictionary implementation.
  • CPython: open addressing hash table.

9.2 Interview Relevance

  • Hash table design and collision handling are frequent interview topics.

10. Resources

10.1 Essential Reading

  • CLRS - Ch. 11
  • Algorithms, 4th Edition by Sedgewick - hashing chapter

10.2 Video Resources

  • “Hash Tables” - algorithms courses

10.3 Tools & Documentation

  • Valgrind: https://valgrind.org/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain collisions and load factor.
  • I can justify resize thresholds.
  • I can compare chaining vs open addressing.

11.2 Implementation

  • All operations pass tests.
  • Resizing preserves entries.
  • LRU cache behaves correctly.

11.3 Growth

  • I can swap in a better hash function.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Insert/find/delete operations with chaining.
  • Load factor tracking.

Full Completion:

  • Resizing and LRU cache demo.

Excellence (Going Above & Beyond):

  • Open addressing with Robin Hood hashing.

This guide was generated from DATA_STRUCTURES_FROM_FIRST_PRINCIPLES.md. For the complete learning path, see the parent directory README.