Project 10: Hash Table Laboratory

Implement and compare hashing strategies from TAOCP Vol 3.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2 weeks
Language C
Prerequisites Arrays, hashing basics
Key Topics chaining, open addressing, load factor

1. Learning Objectives

  1. Implement separate chaining and open addressing.
  2. Measure collision rates and probe lengths.
  3. Explore hash functions and load factors.
  4. Validate correctness and performance.

2. Theoretical Foundation

2.1 Core Concepts

  • Load factor: alpha = n / m.
  • Collision handling: chaining vs probing.
  • Hash function quality: distribution and independence.

2.2 Why This Matters

Hash tables are fast only when collisions are controlled. TAOCP analyzes these tradeoffs rigorously.

2.3 Historical Context / Background

Knuth established key analysis for probing strategies and expected probe lengths.

2.4 Common Misconceptions

  • “Any hash function works”: Poor hashes break performance.
  • “Load factor can be 1”: Probing degrades quickly.

3. Project Specification

3.1 What You Will Build

A hash table lab that supports multiple hashing strategies and a benchmark harness.

3.2 Functional Requirements

  1. Separate chaining with linked lists.
  2. Open addressing with linear/quadratic probing.
  3. Pluggable hash functions.
  4. Metrics on collisions and probes.

3.3 Non-Functional Requirements

  • Correctness: lookup returns correct values.
  • Performance: measure probe costs.
  • Usability: CLI for experiments.

3.4 Example Usage / Output

$ ./hashlab --method linear --load 0.7
avg_probe=2.1

3.5 Real World Outcome

You can choose a hashing strategy based on empirical and theoretical evidence.


4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ hash core    │────▶│ strategies   │────▶│ metrics      │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component Responsibility Key Decisions
Hash funcs key -> index djb2, FNV
Table impl store items chaining/probing
Metrics collisions/probes counters

4.3 Data Structures

typedef struct {
    char *key;
    int value;
} Entry;

4.4 Algorithm Overview

Key Algorithm: Linear probing

  1. hash key to index.
  2. probe sequentially until empty slot.

Complexity Analysis:

  • Expected O(1), worst O(n).

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── src/hash.c
├── src/bench.c
├── include/
└── Makefile

5.3 The Core Question You’re Answering

“How do load factor and hash quality affect performance?”

5.4 Concepts You Must Understand First

  • Hash function properties
  • Collision handling strategies
  • Load factor impact

5.5 Questions to Guide Your Design

  1. When should you resize?
  2. Which hash function performs best on your data?
  3. How will you measure probes fairly?

5.6 Thinking Exercise

Derive expected probe count for linear probing at load factor alpha.

5.7 The Interview Questions They’ll Ask

  1. Compare chaining vs open addressing.
  2. What is load factor and why does it matter?
  3. Why is deletion tricky in open addressing?

5.8 Hints in Layers

Hint 1: Implement chaining first. Hint 2: Add linear probing. Hint 3: Add quadratic probing and metrics.

5.9 Books That Will Help

Topic Book Chapter
Hashing TAOCP Vol 3 Ch. 6.4

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

  • Hash functions and chaining.

Phase 2: Core Functionality (4-5 days)

  • Open addressing and metrics.

Phase 3: Polish & Edge Cases (2-3 days)

  • Resize and benchmark output.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Strategy chaining vs probing both comparison

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Correctness insert/find/delete small tables
Collisions stress cases forced collisions
Resizing rehash load factor thresholds

6.2 Critical Test Cases

  1. Insert many keys and find them.
  2. Delete and reinsert keys.
  3. Resizing preserves all entries.

6.3 Test Data

Random keys, adversarial keys

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Poor hash clustering use stronger hash
Delete bug lost entries tombstones for probing
Resize bug missing entries rehash all items

7.2 Debugging Strategies

  • Log probe sequences for small tables.
  • Verify load factor after operations.

7.3 Performance Traps

High load factor causes long probe chains.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add double hashing.

8.2 Intermediate Extensions

  • Add Robin Hood hashing.

8.3 Advanced Extensions

  • Add perfect hashing for static sets.

9. Real-World Connections

  • Hash tables in runtime systems, caches, and databases.

10. Resources

  • TAOCP Vol 3 Chapter 6.4.

11. Self-Assessment Checklist

  • I can explain probe behavior at different load factors.
  • My benchmarks align with expected trends.

12. Submission / Completion Criteria

Minimum: Chaining implementation with tests. Full: Open addressing + benchmarks. Excellence: Robin Hood or double hashing.


This guide was generated from LEARN_TAOCP_DEEP_DIVE.md. For the complete learning path, see the parent directory README.