Project 5: Hash Table

Implement a hash table with collision handling, resizing, and string keys.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1-2 weeks
Language	C
Prerequisites	Projects 2-4, hashing basics
Key Topics	Hashing, collisions, load factor
Main Programming Language	C
Alternative Programming Languages	Rust, Go, Zig
Coolness Level	Level 3: Genuinely Clever
Business Potential	2. The “Micro-SaaS / Pro Tool”
Knowledge Area	Data Structures / Hashing / Collision Resolution
Software or Tool	Valgrind, Performance profiler
Main Book	Mastering Algorithms with C by Kyle Loudon

What You Will Build

Implement a hash table with collision handling, resizing, and string keys.

This project emphasizes Data Structures / Hashing / Collision Resolution using C with production-style constraints.

1. Learning Objectives

By completing this project, you will:

Implement a hash table with chaining or open addressing.
Understand load factors and resizing.
Design APIs for insert, lookup, and delete.
Handle memory ownership for keys and values.

2. Theoretical Foundation

2.1 Core Concepts

Hash function: Maps keys to bucket indices.
Collisions: Different keys can map to same index.
Load factor: Ratio of items to buckets; drives resizing.

2.2 Why This Matters

Hash tables are foundational for symbol tables, caches, and databases. Implementing one teaches you performance trade-offs and careful memory handling.

2.3 Historical Context / Background

Hash tables are a classic computer science structure, balancing average O(1) operations with collision management.

2.4 Common Misconceptions

“Hash functions avoid all collisions”: They cannot.
“Resizing is optional”: Without it, performance degrades.

3. Project Specification

3.1 What You Will Build

A hashmap library that supports string keys and void-pointer values with:

map_put, map_get, map_remove
map_contains, map_size
map_iterate
Automatic resizing

3.2 Functional Requirements

Support insertion and lookup by string key.
Handle collisions correctly.
Resize when load factor exceeds threshold.
Provide deletion without memory leaks.

3.3 Non-Functional Requirements

Performance: Average O(1) for put/get.
Reliability: No leaks on rehash.
Usability: Clear ownership rules for keys/values.

3.4 Example Usage / Output

Map map;
map_init(&map);
map_put(&map, "name", strdup("Ada"));
char *name = map_get(&map, "name");
map_free(&map, free);

3.5 Real World Outcome

You can store and retrieve data in O(1) average time and understand why resizing and hashing quality affect real-world performance.

4. Solution Architecture

4.1 High-Level Design

Map -> buckets[] -> (chaining) linked lists of entries

4.2 Key Components

Component	Responsibility	Key Decisions
Hash function	Compute index	Use FNV-1a or djb2
Buckets	Store entries	Chaining for simplicity
Resizer	Rehash entries	Double capacity

4.3 Data Structures

typedef struct Entry {
    char *key;
    void *value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    size_t capacity;
    size_t size;
} Map;

4.4 Algorithm Overview

Key Algorithm: Put with chaining

Hash key to bucket index.
Traverse list; update if key exists.
Otherwise, insert new entry at head.
Resize if load factor exceeded.

Complexity Analysis:

Average: O(1)
Worst-case: O(n)

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -g -o test_map test_map.c map.c

5.2 Project Structure

hashmap/
├── src/
│   ├── map.c
│   └── map.h
├── tests/
│   └── test_map.c
└── README.md

5.3 The Core Question You’re Answering

“How do I map arbitrary keys to fast lookups without losing correctness?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

Hash functions
- What makes a hash function “good”?
Collisions
- How does chaining differ from open addressing?
Load factor
- When should you resize?

5.5 Questions to Guide Your Design

Before implementing, think through these:

Will you copy keys or store references?
How will you handle memory cleanup?
What load factor threshold will you use?

5.6 Thinking Exercise

Collision Analysis

If 1,000 keys are inserted into 1,024 buckets, what is the expected average chain length?

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

“What is a load factor and why does it matter?”
“How do you handle hash collisions?”
“Why do hash tables resize?”

5.8 Hints in Layers

Hint 1: Start with fixed capacity Implement put/get without resizing.

Hint 2: Add deletion Remove nodes from a bucket list safely.

Hint 3: Add resizing Rehash all entries into a larger table.

5.9 Books That Will Help

Topic	Book	Chapter
Hashing	“Algorithms in C”	Hash tables chapter
C pointers	“The C Programming Language”	Ch. 5

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

Basic put/get with chaining

Tasks:

Implement hash function.
Implement map_put and map_get.

Checkpoint: Basic lookup works.

Phase 2: Core Functionality (3-4 days)

Goals:

Deletion and iteration

Tasks:

Add map_remove.
Add iteration helper.

Checkpoint: Remove and iterate work.

Phase 3: Resizing (2-3 days)

Goals:

Automatic rehash

Tasks:

Trigger rehash on load factor.
Move entries to new buckets.

Checkpoint: Size grows and lookups stay correct.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Collision handling	Chaining vs open addressing	Chaining	Simpler deletion
Key ownership	Copy vs reference	Copy	Safer API

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Hash stability	Known keys
Stress Tests	High load	100k keys
Edge Cases	Empty, duplicate keys	Update existing

6.2 Critical Test Cases

Insert duplicate key: Value updated.
Delete from middle: Chain remains intact.
Resize: All entries still reachable.

6.3 Test Data

"a", "b", "c", "long_key"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Not copying keys	Use-after-free	`strdup` keys
Resize leaks	Memory loss	Free old buckets
Bad hash function	Many collisions	Use FNV-1a or djb2

7.2 Debugging Strategies

Log bucket index for each key.
Verify chain integrity after deletes.

7.3 Performance Traps

High load factors degrade performance; resize early to keep average chain length low.

8. Extensions & Challenges

8.1 Beginner Extensions

Add map_keys function.
Add map_values iterator.

8.2 Intermediate Extensions

Support integer keys.
Add open addressing implementation.

8.3 Advanced Extensions

Implement Robin Hood hashing.
Add incremental rehashing.

9. Real-World Connections

9.1 Industry Applications

Compilers: Symbol tables.
Caches: Fast key lookup.

uthash: Popular C hash table library.

9.3 Interview Relevance

Hash tables are a staple in interviews for systems and backend roles.

10. Resources

10.1 Essential Reading

“Algorithms in C” - Hash tables chapter
“The C Programming Language” - Ch. 5

10.2 Video Resources

Hashing and load factor lectures

10.3 Tools & Documentation

man 3 strdup: Key copying helper

Dynamic Array: Alternative storage.
Database Engine: Uses hashing for indexing.

11. Self-Assessment Checklist

11.1 Understanding

I can explain load factors and resizing.
I can compare chaining vs open addressing.
I can describe hash function requirements.

11.2 Implementation

Put/get/remove work correctly.
Resizing keeps data intact.
No leaks or corruption.

11.3 Growth

I can implement open addressing.
I can explain this project in an interview.

Definition of Done

Minimum Viable Completion:

Put/get with collision handling.

Full Completion:

Supports deletion and resizing.

Excellence (Going Above & Beyond):

Implements Robin Hood hashing or incremental rehash.

This guide was generated from C_PROGRAMMING_COMPLETE_MASTERY.md. For the complete learning path, see the parent directory.

Project 5: Hash Table

Quick Reference

What You Will Build

1. Learning Objectives

2. Theoretical Foundation

2.1 Core Concepts

2.2 Why This Matters

2.3 Historical Context / Background

2.4 Common Misconceptions

3. Project Specification

3.1 What You Will Build

3.2 Functional Requirements

3.3 Non-Functional Requirements

3.4 Example Usage / Output

3.5 Real World Outcome

4. Solution Architecture

4.1 High-Level Design

4.2 Key Components

4.3 Data Structures

4.4 Algorithm Overview

5. Implementation Guide

5.1 Development Environment Setup

5.2 Project Structure

5.3 The Core Question You’re Answering

5.4 Concepts You Must Understand First

5.5 Questions to Guide Your Design

5.6 Thinking Exercise

Collision Analysis

5.7 The Interview Questions They’ll Ask

5.8 Hints in Layers

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Phase 2: Core Functionality (3-4 days)

Phase 3: Resizing (2-3 days)

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

6.3 Test Data

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

7.3 Performance Traps

8. Extensions & Challenges

8.1 Beginner Extensions

8.2 Intermediate Extensions

8.3 Advanced Extensions

9. Real-World Connections

9.1 Industry Applications

9.2 Related Open Source Projects

9.3 Interview Relevance

10. Resources

10.1 Essential Reading

10.2 Video Resources

10.3 Tools & Documentation

10.4 Related Projects in This Series

11. Self-Assessment Checklist

11.1 Understanding

11.2 Implementation

11.3 Growth

Definition of Done