Project 5: Hash Table

Implement a hash table with collision handling, resizing, and string keys.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 1-2 weeks
Language C
Prerequisites Projects 2-4, hashing basics
Key Topics Hashing, collisions, load factor

1. Learning Objectives

By completing this project, you will:

  1. Implement a hash table with chaining or open addressing.
  2. Understand load factors and resizing.
  3. Design APIs for insert, lookup, and delete.
  4. Handle memory ownership for keys and values.

2. Theoretical Foundation

2.1 Core Concepts

  • Hash function: Maps keys to bucket indices.
  • Collisions: Different keys can map to same index.
  • Load factor: Ratio of items to buckets; drives resizing.

2.2 Why This Matters

Hash tables are foundational for symbol tables, caches, and databases. Implementing one teaches you performance trade-offs and careful memory handling.

2.3 Historical Context / Background

Hash tables are a classic computer science structure, balancing average O(1) operations with collision management.

2.4 Common Misconceptions

  • “Hash functions avoid all collisions”: They cannot.
  • “Resizing is optional”: Without it, performance degrades.

3. Project Specification

3.1 What You Will Build

A hashmap library that supports string keys and void-pointer values with:

  • map_put, map_get, map_remove
  • map_contains, map_size
  • map_iterate
  • Automatic resizing

3.2 Functional Requirements

  1. Support insertion and lookup by string key.
  2. Handle collisions correctly.
  3. Resize when load factor exceeds threshold.
  4. Provide deletion without memory leaks.

3.3 Non-Functional Requirements

  • Performance: Average O(1) for put/get.
  • Reliability: No leaks on rehash.
  • Usability: Clear ownership rules for keys/values.

3.4 Example Usage / Output

Map map;
map_init(&map);
map_put(&map, "name", strdup("Ada"));
char *name = map_get(&map, "name");
map_free(&map, free);

3.5 Real World Outcome

You can store and retrieve data in O(1) average time and understand why resizing and hashing quality affect real-world performance.


4. Solution Architecture

4.1 High-Level Design

Map -> buckets[] -> (chaining) linked lists of entries

4.2 Key Components

Component Responsibility Key Decisions
Hash function Compute index Use FNV-1a or djb2
Buckets Store entries Chaining for simplicity
Resizer Rehash entries Double capacity

4.3 Data Structures

typedef struct Entry {
    char *key;
    void *value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    size_t capacity;
    size_t size;
} Map;

4.4 Algorithm Overview

Key Algorithm: Put with chaining

  1. Hash key to bucket index.
  2. Traverse list; update if key exists.
  3. Otherwise, insert new entry at head.
  4. Resize if load factor exceeded.

Complexity Analysis:

  • Average: O(1)
  • Worst-case: O(n)

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -g -o test_map test_map.c map.c

5.2 Project Structure

hashmap/
├── src/
│   ├── map.c
│   └── map.h
├── tests/
│   └── test_map.c
└── README.md

5.3 The Core Question You’re Answering

“How do I map arbitrary keys to fast lookups without losing correctness?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Hash functions
    • What makes a hash function “good”?
  2. Collisions
    • How does chaining differ from open addressing?
  3. Load factor
    • When should you resize?

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Will you copy keys or store references?
  2. How will you handle memory cleanup?
  3. What load factor threshold will you use?

5.6 Thinking Exercise

Collision Analysis

If 1,000 keys are inserted into 1,024 buckets, what is the expected average chain length?

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a load factor and why does it matter?”
  2. “How do you handle hash collisions?”
  3. “Why do hash tables resize?”

5.8 Hints in Layers

Hint 1: Start with fixed capacity Implement put/get without resizing.

Hint 2: Add deletion Remove nodes from a bucket list safely.

Hint 3: Add resizing Rehash all entries into a larger table.

5.9 Books That Will Help

Topic Book Chapter
Hashing “Algorithms in C” Hash tables chapter
C pointers “The C Programming Language” Ch. 5

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

  • Basic put/get with chaining

Tasks:

  1. Implement hash function.
  2. Implement map_put and map_get.

Checkpoint: Basic lookup works.

Phase 2: Core Functionality (3-4 days)

Goals:

  • Deletion and iteration

Tasks:

  1. Add map_remove.
  2. Add iteration helper.

Checkpoint: Remove and iterate work.

Phase 3: Resizing (2-3 days)

Goals:

  • Automatic rehash

Tasks:

  1. Trigger rehash on load factor.
  2. Move entries to new buckets.

Checkpoint: Size grows and lookups stay correct.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Collision handling Chaining vs open addressing Chaining Simpler deletion
Key ownership Copy vs reference Copy Safer API

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Hash stability Known keys
Stress Tests High load 100k keys
Edge Cases Empty, duplicate keys Update existing

6.2 Critical Test Cases

  1. Insert duplicate key: Value updated.
  2. Delete from middle: Chain remains intact.
  3. Resize: All entries still reachable.

6.3 Test Data

"a", "b", "c", "long_key"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Not copying keys Use-after-free strdup keys
Resize leaks Memory loss Free old buckets
Bad hash function Many collisions Use FNV-1a or djb2

7.2 Debugging Strategies

  • Log bucket index for each key.
  • Verify chain integrity after deletes.

7.3 Performance Traps

High load factors degrade performance; resize early to keep average chain length low.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add map_keys function.
  • Add map_values iterator.

8.2 Intermediate Extensions

  • Support integer keys.
  • Add open addressing implementation.

8.3 Advanced Extensions

  • Implement Robin Hood hashing.
  • Add incremental rehashing.

9. Real-World Connections

9.1 Industry Applications

  • Compilers: Symbol tables.
  • Caches: Fast key lookup.
  • uthash: Popular C hash table library.

9.3 Interview Relevance

Hash tables are a staple in interviews for systems and backend roles.


10. Resources

10.1 Essential Reading

  • “Algorithms in C” - Hash tables chapter
  • “The C Programming Language” - Ch. 5

10.2 Video Resources

  • Hashing and load factor lectures

10.3 Tools & Documentation

  • man 3 strdup: Key copying helper
  • Dynamic Array: Alternative storage.
  • Database Engine: Uses hashing for indexing.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain load factors and resizing.
  • I can compare chaining vs open addressing.
  • I can describe hash function requirements.

11.2 Implementation

  • Put/get/remove work correctly.
  • Resizing keeps data intact.
  • No leaks or corruption.

11.3 Growth

  • I can implement open addressing.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Put/get with collision handling.

Full Completion:

  • Supports deletion and resizing.

Excellence (Going Above & Beyond):

  • Implements Robin Hood hashing or incremental rehash.

This guide was generated from C_PROGRAMMING_COMPLETE_MASTERY.md. For the complete learning path, see the parent directory.