Project 10: Hash Table Laboratory

Implement and compare hashing strategies from TAOCP Vol 3.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2 weeks
Language	C
Prerequisites	Arrays, hashing basics
Key Topics	chaining, open addressing, load factor

1. Learning Objectives

Implement separate chaining and open addressing.
Measure collision rates and probe lengths.
Explore hash functions and load factors.
Validate correctness and performance.

2. Theoretical Foundation

2.1 Core Concepts

Load factor: alpha = n / m.
Collision handling: chaining vs probing.
Hash function quality: distribution and independence.

2.2 Why This Matters

Hash tables are fast only when collisions are controlled. TAOCP analyzes these tradeoffs rigorously.

2.3 Historical Context / Background

Knuth established key analysis for probing strategies and expected probe lengths.

2.4 Common Misconceptions

“Any hash function works”: Poor hashes break performance.
“Load factor can be 1”: Probing degrades quickly.

3. Project Specification

3.1 What You Will Build

A hash table lab that supports multiple hashing strategies and a benchmark harness.

3.2 Functional Requirements

Separate chaining with linked lists.
Open addressing with linear/quadratic probing.
Pluggable hash functions.
Metrics on collisions and probes.

3.3 Non-Functional Requirements

Correctness: lookup returns correct values.
Performance: measure probe costs.
Usability: CLI for experiments.

3.4 Example Usage / Output

$ ./hashlab --method linear --load 0.7
avg_probe=2.1

3.5 Real World Outcome

You can choose a hashing strategy based on empirical and theoretical evidence.

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ hash core    │────▶│ strategies   │────▶│ metrics      │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component	Responsibility	Key Decisions
Hash funcs	key -> index	djb2, FNV
Table impl	store items	chaining/probing
Metrics	collisions/probes	counters

4.3 Data Structures

typedef struct {
    char *key;
    int value;
} Entry;

4.4 Algorithm Overview

Key Algorithm: Linear probing

hash key to index.
probe sequentially until empty slot.

Complexity Analysis:

Expected O(1), worst O(n).

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── src/hash.c
├── src/bench.c
├── include/
└── Makefile

5.3 The Core Question You’re Answering

“How do load factor and hash quality affect performance?”

5.4 Concepts You Must Understand First

Hash function properties
Collision handling strategies
Load factor impact

5.5 Questions to Guide Your Design

When should you resize?
Which hash function performs best on your data?
How will you measure probes fairly?

5.6 Thinking Exercise

Derive expected probe count for linear probing at load factor alpha.

5.7 The Interview Questions They’ll Ask

Compare chaining vs open addressing.
What is load factor and why does it matter?
Why is deletion tricky in open addressing?

5.8 Hints in Layers

Hint 1: Implement chaining first. Hint 2: Add linear probing. Hint 3: Add quadratic probing and metrics.

5.9 Books That Will Help

Topic	Book	Chapter
Hashing	TAOCP Vol 3	Ch. 6.4

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Hash functions and chaining.

Phase 2: Core Functionality (4-5 days)

Open addressing and metrics.

Phase 3: Polish & Edge Cases (2-3 days)

Resize and benchmark output.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Strategy	chaining vs probing	both	comparison

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Correctness	insert/find/delete	small tables
Collisions	stress cases	forced collisions
Resizing	rehash	load factor thresholds

6.2 Critical Test Cases

Insert many keys and find them.
Delete and reinsert keys.
Resizing preserves all entries.

6.3 Test Data

Random keys, adversarial keys

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Poor hash	clustering	use stronger hash
Delete bug	lost entries	tombstones for probing
Resize bug	missing entries	rehash all items

7.2 Debugging Strategies

Log probe sequences for small tables.
Verify load factor after operations.

7.3 Performance Traps

High load factor causes long probe chains.

8. Extensions & Challenges

8.1 Beginner Extensions

Add double hashing.

8.2 Intermediate Extensions

Add Robin Hood hashing.

8.3 Advanced Extensions

Add perfect hashing for static sets.

9. Real-World Connections

Hash tables in runtime systems, caches, and databases.

10. Resources

TAOCP Vol 3 Chapter 6.4.

11. Self-Assessment Checklist

I can explain probe behavior at different load factors.
My benchmarks align with expected trends.

12. Submission / Completion Criteria

Minimum: Chaining implementation with tests. Full: Open addressing + benchmarks. Excellence: Robin Hood or double hashing.

This guide was generated from LEARN_TAOCP_DEEP_DIVE.md. For the complete learning path, see the parent directory README.