Project 5: Hash Table
Implement a hash table with collision handling, resizing, and string keys.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | C |
| Prerequisites | Projects 2-4, hashing basics |
| Key Topics | Hashing, collisions, load factor |
1. Learning Objectives
By completing this project, you will:
- Implement a hash table with chaining or open addressing.
- Understand load factors and resizing.
- Design APIs for insert, lookup, and delete.
- Handle memory ownership for keys and values.
2. Theoretical Foundation
2.1 Core Concepts
- Hash function: Maps keys to bucket indices.
- Collisions: Different keys can map to same index.
- Load factor: Ratio of items to buckets; drives resizing.
2.2 Why This Matters
Hash tables are foundational for symbol tables, caches, and databases. Implementing one teaches you performance trade-offs and careful memory handling.
2.3 Historical Context / Background
Hash tables are a classic computer science structure, balancing average O(1) operations with collision management.
2.4 Common Misconceptions
- “Hash functions avoid all collisions”: They cannot.
- “Resizing is optional”: Without it, performance degrades.
3. Project Specification
3.1 What You Will Build
A hashmap library that supports string keys and void-pointer values with:
map_put,map_get,map_removemap_contains,map_sizemap_iterate- Automatic resizing
3.2 Functional Requirements
- Support insertion and lookup by string key.
- Handle collisions correctly.
- Resize when load factor exceeds threshold.
- Provide deletion without memory leaks.
3.3 Non-Functional Requirements
- Performance: Average O(1) for put/get.
- Reliability: No leaks on rehash.
- Usability: Clear ownership rules for keys/values.
3.4 Example Usage / Output
Map map;
map_init(&map);
map_put(&map, "name", strdup("Ada"));
char *name = map_get(&map, "name");
map_free(&map, free);
3.5 Real World Outcome
You can store and retrieve data in O(1) average time and understand why resizing and hashing quality affect real-world performance.
4. Solution Architecture
4.1 High-Level Design
Map -> buckets[] -> (chaining) linked lists of entries
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Hash function | Compute index | Use FNV-1a or djb2 |
| Buckets | Store entries | Chaining for simplicity |
| Resizer | Rehash entries | Double capacity |
4.3 Data Structures
typedef struct Entry {
char *key;
void *value;
struct Entry *next;
} Entry;
typedef struct {
Entry **buckets;
size_t capacity;
size_t size;
} Map;
4.4 Algorithm Overview
Key Algorithm: Put with chaining
- Hash key to bucket index.
- Traverse list; update if key exists.
- Otherwise, insert new entry at head.
- Resize if load factor exceeded.
Complexity Analysis:
- Average: O(1)
- Worst-case: O(n)
5. Implementation Guide
5.1 Development Environment Setup
cc -Wall -Wextra -O2 -g -o test_map test_map.c map.c
5.2 Project Structure
hashmap/
├── src/
│ ├── map.c
│ └── map.h
├── tests/
│ └── test_map.c
└── README.md
5.3 The Core Question You’re Answering
“How do I map arbitrary keys to fast lookups without losing correctness?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Hash functions
- What makes a hash function “good”?
- Collisions
- How does chaining differ from open addressing?
- Load factor
- When should you resize?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Will you copy keys or store references?
- How will you handle memory cleanup?
- What load factor threshold will you use?
5.6 Thinking Exercise
Collision Analysis
If 1,000 keys are inserted into 1,024 buckets, what is the expected average chain length?
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a load factor and why does it matter?”
- “How do you handle hash collisions?”
- “Why do hash tables resize?”
5.8 Hints in Layers
Hint 1: Start with fixed capacity Implement put/get without resizing.
Hint 2: Add deletion Remove nodes from a bucket list safely.
Hint 3: Add resizing Rehash all entries into a larger table.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Hashing | “Algorithms in C” | Hash tables chapter |
| C pointers | “The C Programming Language” | Ch. 5 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- Basic put/get with chaining
Tasks:
- Implement hash function.
- Implement
map_putandmap_get.
Checkpoint: Basic lookup works.
Phase 2: Core Functionality (3-4 days)
Goals:
- Deletion and iteration
Tasks:
- Add
map_remove. - Add iteration helper.
Checkpoint: Remove and iterate work.
Phase 3: Resizing (2-3 days)
Goals:
- Automatic rehash
Tasks:
- Trigger rehash on load factor.
- Move entries to new buckets.
Checkpoint: Size grows and lookups stay correct.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Collision handling | Chaining vs open addressing | Chaining | Simpler deletion |
| Key ownership | Copy vs reference | Copy | Safer API |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Hash stability | Known keys |
| Stress Tests | High load | 100k keys |
| Edge Cases | Empty, duplicate keys | Update existing |
6.2 Critical Test Cases
- Insert duplicate key: Value updated.
- Delete from middle: Chain remains intact.
- Resize: All entries still reachable.
6.3 Test Data
"a", "b", "c", "long_key"
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not copying keys | Use-after-free | strdup keys |
| Resize leaks | Memory loss | Free old buckets |
| Bad hash function | Many collisions | Use FNV-1a or djb2 |
7.2 Debugging Strategies
- Log bucket index for each key.
- Verify chain integrity after deletes.
7.3 Performance Traps
High load factors degrade performance; resize early to keep average chain length low.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
map_keysfunction. - Add
map_valuesiterator.
8.2 Intermediate Extensions
- Support integer keys.
- Add open addressing implementation.
8.3 Advanced Extensions
- Implement Robin Hood hashing.
- Add incremental rehashing.
9. Real-World Connections
9.1 Industry Applications
- Compilers: Symbol tables.
- Caches: Fast key lookup.
9.2 Related Open Source Projects
- uthash: Popular C hash table library.
9.3 Interview Relevance
Hash tables are a staple in interviews for systems and backend roles.
10. Resources
10.1 Essential Reading
- “Algorithms in C” - Hash tables chapter
- “The C Programming Language” - Ch. 5
10.2 Video Resources
- Hashing and load factor lectures
10.3 Tools & Documentation
man 3 strdup: Key copying helper
10.4 Related Projects in This Series
- Dynamic Array: Alternative storage.
- Database Engine: Uses hashing for indexing.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain load factors and resizing.
- I can compare chaining vs open addressing.
- I can describe hash function requirements.
11.2 Implementation
- Put/get/remove work correctly.
- Resizing keeps data intact.
- No leaks or corruption.
11.3 Growth
- I can implement open addressing.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Put/get with collision handling.
Full Completion:
- Supports deletion and resizing.
Excellence (Going Above & Beyond):
- Implements Robin Hood hashing or incremental rehash.
This guide was generated from C_PROGRAMMING_COMPLETE_MASTERY.md. For the complete learning path, see the parent directory.