Project 5: Hash Table
Build a hash table with open addressing or chaining, then use it as a dictionary and cache.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2 weeks |
| Language | C (Alternatives: Rust, C++) |
| Prerequisites | Projects 1-4, strong pointers |
| Key Topics | Hashing, collisions, load factor, resizing |
1. Learning Objectives
By completing this project, you will:
- Implement a hash table with insert, delete, find.
- Compare collision strategies (chaining vs open addressing).
- Tune load factor and resize policies.
- Use a hash table to build a simple LRU cache.
2. Theoretical Foundation
2.1 Core Concepts
- Hash function: Maps keys to uniform bucket indices.
- Collision handling: Multiple keys may hash to the same index.
- Load factor: `n / capacity` impacts speed and memory (see the quick check below).
- Amortized resizing: Grow the table to maintain O(1) average operations.
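As a quick check of the load factor idea, here is a tiny standalone sketch; the entry and bucket counts are made up for the example, and the 0.75 threshold is the one recommended later in section 5.11:

```c
#include <stdio.h>

int main(void) {
    size_t size = 49, capacity = 64;              /* example counts only */
    double load = (double)size / (double)capacity;
    printf("load factor: %.2f\n", load);          /* prints 0.77 */
    if (load > 0.75) {                            /* threshold from section 5.11 */
        printf("resize: %zu -> %zu buckets\n", capacity, capacity * 2);
    }
    return 0;
}
```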
2.2 Why This Matters
Hash tables power databases, compilers, caches, and language runtimes. Understanding their trade-offs helps you choose the right design and avoid pathological performance.
2.3 Historical Context / Background
Hashing became practical in the 1950s and is now a standard tool for constant-time lookup in modern systems.
2.4 Common Misconceptions
- “O(1) always”: Worst-case is O(n) if hashing is poor.
- “Any hash function works”: Distribution quality matters.
3. Project Specification
3.1 What You Will Build
- A hash table supporting string keys and integer values.
- Collision handling with either chaining or open addressing.
- An LRU cache using a hash table + doubly-linked list.
3.2 Functional Requirements
- Insert, find, delete operations.
- Resize when load factor exceeds threshold.
- Support iteration over keys.
- Optional: LRU cache with fixed capacity.
3.3 Non-Functional Requirements
- Performance: Average O(1) operations.
- Reliability: Proper handling of collisions and deletes.
- Usability: CLI demo for dictionary and cache behavior.
3.4 Example Usage / Output
$ ./hashtable_demo
Insert alice=30
Insert bob=25
Find bob -> 25
Delete alice
Load factor: 0.77 -> resize to 128 buckets
3.5 Real World Outcome
You run the demo and insert key/value pairs. The program prints the bucket index, collision count, and when it resizes. When you enable LRU mode, it shows cache hits, misses, and evictions as you exceed capacity.
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────────┐ ┌────────────────┐
│ Hash Table │────▶│ Collision Policy │────▶│ LRU Cache │
└──────────────┘ └──────────────────┘ └────────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Hash fn | Map keys to indices | Use FNV-1a or djb2 |
| Buckets | Store entries | Chaining vs open addressing |
| Resizer | Maintain load factor | Double size at 0.75 |
| LRU list | Track recency | Doubly-linked list |
4.3 Data Structures
```c
#include <stddef.h>

typedef struct Entry {
    char* key;
    int value;
    struct Entry* next;   // next entry in this bucket's chain
} Entry;

typedef struct {
    Entry** buckets;      // array of chain heads
    size_t capacity;      // number of buckets
    size_t size;          // number of stored entries
} HashTable;
```
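The component table in section 4.2 and Hint 3 both suggest FNV-1a. A minimal 32-bit FNV-1a sketch for NUL-terminated string keys could look like this; the constants are the published FNV-1a offset basis and prime, and the helper names are placeholders:

```c
#include <stdint.h>
#include <stddef.h>

// 32-bit FNV-1a over a NUL-terminated key (sketch; not the only valid choice).
static uint32_t fnv1a_hash(const char* key) {
    uint32_t hash = 2166136261u;                     // FNV offset basis
    for (const unsigned char* p = (const unsigned char*)key; *p; p++) {
        hash ^= *p;                                  // xor in the next byte
        hash *= 16777619u;                           // multiply by the FNV prime
    }
    return hash;
}

// Bucket index: reduce the hash modulo the current capacity.
static size_t bucket_index(const char* key, size_t capacity) {
    return fnv1a_hash(key) % capacity;
}
```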
4.4 Algorithm Overview
Key Algorithm: Insert with Chaining
- Hash key to index.
- Scan chain for existing key.
- Insert new entry at head if absent.
- Resize if load factor exceeds threshold.
Complexity Analysis:
- Time: O(1) average, O(n) worst-case
- Space: O(n + capacity)
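A minimal sketch of these steps, assuming the `Entry`/`HashTable` structs from section 4.3 and the `fnv1a_hash` helper sketched above; error handling and the resize check are kept deliberately thin:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Sketch of insert with separate chaining; returns false on allocation failure.
bool ht_insert(HashTable* t, const char* key, int value) {
    size_t idx = fnv1a_hash(key) % t->capacity;

    // 1. Scan the chain for an existing key and update in place.
    for (Entry* e = t->buckets[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return true;
        }
    }

    // 2. Not found: push a new entry onto the head of the chain.
    Entry* e = malloc(sizeof(Entry));
    if (!e) return false;
    e->key = strdup(key);
    if (!e->key) { free(e); return false; }
    e->value = value;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->size++;

    // 3. The resize check is left to the caller (or a hypothetical
    //    ht_maybe_resize helper) once the load factor crosses the threshold.
    return true;
}
```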
5. Implementation Guide
5.1 Development Environment Setup
clang -Wall -Wextra -g -o hashtable_demo src/*.c
5.2 Project Structure
project-root/
├── src/
│ ├── hash.c
│ ├── hashtable.c
│ ├── lru.c
│ └── main.c
├── tests/
│ └── test_hashtable.c
└── Makefile
5.3 The Core Question You’re Answering
“How can I trade extra memory to make lookup nearly constant time?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Hash function properties
  - Uniform distribution
  - Determinism
- Collision handling
  - Separate chaining vs open addressing
- Load factor
  - Why performance drops as it grows
5.5 Questions to Guide Your Design
- Which collision strategy will you implement first?
- How will you handle deletion in open addressing?
- What load factor threshold makes sense?
5.6 Thinking Exercise
Take 10 keys, hash them with a simple hash function, and map them to 8 buckets. Count the collisions and note that the load factor is already 10/8 = 1.25, well past any sensible resize threshold.
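If you want to sanity-check your pencil-and-paper result, a throwaway program like this one does the same counting; the ten sample keys are arbitrary, and the deliberately simple byte-sum hash stands in for "a simple hash":

```c
#include <stdio.h>

int main(void) {
    const char* keys[] = {"cat", "dog", "bat", "owl", "fox",
                          "ant", "bee", "elk", "hen", "ram"};
    int counts[8] = {0};
    int collisions = 0;
    for (int i = 0; i < 10; i++) {
        unsigned h = 0;
        for (const char* p = keys[i]; *p; p++) h += (unsigned char)*p;
        int idx = h % 8;                      // map into 8 buckets
        if (counts[idx]++ > 0) collisions++;  // bucket already used: collision
        printf("%s -> bucket %d\n", keys[i], idx);
    }
    printf("collisions: %d, load factor: %.2f\n", collisions, 10 / 8.0);
    return 0;
}
```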
5.7 The Interview Questions They’ll Ask
- “What happens when two keys hash to the same bucket?”
- “Why does load factor matter?”
- “How does LRU caching use a hash table?”
5.8 Hints in Layers
Hint 1: Start with chaining. It is simpler, and deletion is straightforward.
Hint 2: Add resizing early. Performance will degrade without it.
Hint 3: Use a real hash function. FNV-1a is short and reliable for strings.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Hash tables | CLRS | Ch. 11 |
| C pointers | C Programming: A Modern Approach | Ch. 11 |
5.10 Implementation Phases
Phase 1: Foundation (3 days)
Goals:
- Hash function and basic insert/find
Tasks:
- Implement string hashing.
- Insert into buckets with chaining.
Checkpoint: Insert and find operations work on small sets.
Phase 2: Core Functionality (4 days)
Goals:
- Deletion and resizing
Tasks:
- Implement delete with chain update.
- Add resize when load factor > 0.75.
Checkpoint: Table grows and maintains data.
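A sketch of the resize step under chaining, assuming the structs from section 4.3 and the `fnv1a_hash` helper from earlier; note that every entry's index is recomputed against the new capacity, which is exactly the pitfall called out in section 7.1:

```c
#include <stdbool.h>
#include <stdlib.h>

// Sketch: allocate a larger bucket array and rehash every entry into it.
bool ht_resize(HashTable* t, size_t new_capacity) {
    Entry** new_buckets = calloc(new_capacity, sizeof(Entry*));
    if (!new_buckets) return false;

    for (size_t i = 0; i < t->capacity; i++) {
        Entry* e = t->buckets[i];
        while (e) {
            Entry* next = e->next;                          // save before relinking
            size_t idx = fnv1a_hash(e->key) % new_capacity; // recompute the index
            e->next = new_buckets[idx];                     // push onto the new chain
            new_buckets[idx] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = new_buckets;
    t->capacity = new_capacity;
    return true;
}
```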
Phase 3: Polish & Edge Cases (3 days)
Goals:
- LRU cache demo
- Metrics
Tasks:
- Add doubly-linked list for recency.
- Print collisions and load factor.
Checkpoint: Cache evicts least-recently used key.
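One possible shape for the recency bookkeeping, with a hypothetical `LRUNode`/`LRUCache` layout; the hash table maps key to node so both lookup and eviction stay O(1), and only the eviction step is sketched here:

```c
#include <stddef.h>

// Hypothetical LRU node: the hash table maps key -> LRUNode*.
typedef struct LRUNode {
    char* key;
    int value;
    struct LRUNode *prev, *next;   // doubly linked, most recent at the head
} LRUNode;

typedef struct {
    LRUNode *head, *tail;   // recency list: head = most recent, tail = least
    size_t size, capacity;  // evict from the tail when capacity would be exceeded
    /* plus a HashTable mapping key -> LRUNode* for O(1) lookup */
} LRUCache;

// Sketch of the eviction step when the cache is full.
static void lru_evict(LRUCache* c) {
    LRUNode* victim = c->tail;
    if (!victim) return;
    c->tail = victim->prev;
    if (c->tail) c->tail->next = NULL; else c->head = NULL;
    // Also remove victim->key from the hash table, then free the key and node.
    c->size--;
}
```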
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Collision strategy | Chaining, open addressing | Chaining | Simpler deletes, easier to debug |
| Resize threshold | 0.5, 0.75, 0.9 | 0.75 | Common balance of space/time |
| Hash fn | djb2, FNV-1a | FNV-1a | Good distribution for strings |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Hash + insert/find | collisions |
| Integration Tests | Resizing | insert 1000 keys |
| Edge Case Tests | Deletion | delete head of chain |
6.2 Critical Test Cases
- Insert duplicate key updates value.
- Resize preserves all keys.
- Delete removes key and leaves others intact.
6.3 Test Data
Keys: "a", "b", "c", "aa", "bb", "cc"
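A minimal assert-based test over these keys, covering the critical cases from 6.2; the `ht_*` signatures and the `hashtable.h` header are assumptions, so adapt them to whatever interface you expose:

```c
#include <assert.h>
#include "hashtable.h"   // assumed project header exposing HashTable and the API below
// Signatures assumed for illustration:
//   HashTable* ht_create(size_t capacity);
//   bool ht_insert(HashTable* t, const char* key, int value);
//   bool ht_find(const HashTable* t, const char* key, int* out);
//   bool ht_delete(HashTable* t, const char* key);

void test_basic_operations(void) {
    const char* keys[] = {"a", "b", "c", "aa", "bb", "cc"};
    HashTable* t = ht_create(8);

    for (int i = 0; i < 6; i++) assert(ht_insert(t, keys[i], i));

    int v;
    assert(ht_find(t, "aa", &v) && v == 3);   // lookup after inserts
    assert(ht_insert(t, "aa", 42));           // duplicate key updates the value
    assert(ht_find(t, "aa", &v) && v == 42);

    assert(ht_delete(t, "b"));                // delete one key...
    assert(!ht_find(t, "b", &v));
    assert(ht_find(t, "bb", &v) && v == 4);   // ...others stay intact
}
```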
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Forgetting to rehash on resize | Missing keys | Recompute indices |
| Poor hash function | Many collisions | Switch hash fn |
| Memory leaks | valgrind warnings | Free keys and entries |
7.2 Debugging Strategies
- Print bucket lengths after inserts.
- Track collision counts and load factor.
7.3 Performance Traps
- Excessive collisions turn O(1) into O(n).
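An easy way to observe this trap is to swap in a degenerate hash while benchmarking: with a constant hash every key lands in the same bucket, so each lookup walks a chain of length n. A sketch of such a hash, for experiments only:

```c
#include <stdint.h>

// Degenerate hash for experiments only: every key collides, so one chain
// grows to length n and find/delete degrade from O(1) average to O(n).
static uint32_t bad_hash(const char* key) {
    (void)key;       // ignore the key entirely
    return 0;        // every key maps to bucket 0
}
```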
8. Extensions & Challenges
8.1 Beginner Extensions
- Implement open addressing (linear probing).
- Add a `contains` API.
8.2 Intermediate Extensions
- Add tombstones and rehashing for open addressing.
- Make hash table generic with void* values.
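For the tombstone extension above, one common approach marks deleted slots instead of emptying them so that probe sequences are not broken. A sketch of linear-probing lookup with an assumed slot layout (names are placeholders):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Assumed slot layout for the open-addressing extension (not the chaining
// table above): EMPTY stops a probe, TOMBSTONE does not.
typedef enum { SLOT_EMPTY, SLOT_OCCUPIED, SLOT_TOMBSTONE } SlotState;

typedef struct {
    SlotState state;
    const char* key;
    int value;
} Slot;

// Linear-probing find: skip tombstones, stop only at a truly empty slot.
// `start` is the key's home index (hash % capacity), computed by the caller.
static bool oa_find(const Slot* slots, size_t capacity, size_t start,
                    const char* key, int* out) {
    for (size_t i = 0; i < capacity; i++) {
        const Slot* s = &slots[(start + i) % capacity];
        if (s->state == SLOT_EMPTY) return false;   // key was never stored here
        if (s->state == SLOT_OCCUPIED && strcmp(s->key, key) == 0) {
            *out = s->value;
            return true;
        }
        // SLOT_TOMBSTONE: keep probing past deleted entries.
    }
    return false;
}
```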
8.3 Advanced Extensions
- Implement Robin Hood hashing.
- Add incremental resizing.
9. Real-World Connections
9.1 Industry Applications
- Databases: hash indexes for equality lookups.
- Caches: O(1) access to hot data.
9.2 Related Open Source Projects
- Redis: dictionary implementation.
- CPython: open addressing hash table.
9.3 Interview Relevance
- Hash table design and collision handling are frequent interview topics.
10. Resources
10.1 Essential Reading
- CLRS - Ch. 11
- Algorithms, 4th Edition by Sedgewick - hashing chapter
10.2 Video Resources
- “Hash Tables” - algorithms courses
10.3 Tools & Documentation
- Valgrind: https://valgrind.org/
10.4 Related Projects in This Series
- Previous Project: Queue Systems
- Next Project: Binary Search Tree & Balancing
11. Self-Assessment Checklist
11.1 Understanding
- I can explain collisions and load factor.
- I can justify resize thresholds.
- I can compare chaining vs open addressing.
11.2 Implementation
- All operations pass tests.
- Resizing preserves entries.
- LRU cache behaves correctly.
11.3 Growth
- I can swap in a better hash function.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Insert/find/delete operations with chaining.
- Load factor tracking.
Full Completion:
- Resizing and LRU cache demo.
Excellence (Going Above & Beyond):
- Open addressing with Robin Hood hashing.
This guide was generated from DATA_STRUCTURES_FROM_FIRST_PRINCIPLES.md. For the complete learning path, see the parent directory README.