Project 6: List Module - Functional-Style Linked Lists

Build a singly-linked list module that follows functional programming conventions - operations create new lists rather than mutating existing ones, enabling safe composition and predictable behavior.

Quick Reference

Attribute Value
Difficulty Level 2 - Intermediate
Time Estimate Weekend (2-3 days)
Language C
Prerequisites Pointers, malloc/free, basic linked lists, function pointers
Key Topics Functional programming patterns, memory ownership, structure sharing, higher-order functions in C

1. Learning Objectives

By completing this project, you will:

  • Understand functional vs. imperative data structures - Why immutable operations create new lists instead of mutating existing ones
  • Master structure sharing - How multiple lists can share common suffixes without duplicating memory
  • Implement higher-order functions in C - Build map, filter, fold patterns using function pointers
  • Design clean APIs - Create interfaces that are hard to misuse and easy to compose
  • Handle memory ownership in shared structures - Navigate the complexity of freeing shared data

2. Theoretical Foundation

2.1 Core Concepts

Hanson’s List module is fundamentally different from typical textbook linked lists. It draws from the Lisp and ML traditions, treating lists as values rather than mutable containers.

The Key Insight: In functional programming, data structures are immutable. Instead of modifying a list in place, operations return new lists. This enables:

  • Safe sharing - Multiple variables can reference the same list structure
  • Predictable behavior - No action-at-a-distance bugs
  • Thread safety - Immutable data is inherently thread-safe
┌─────────────────────────────────────────────────────────────────────────────┐
│              IMPERATIVE vs. FUNCTIONAL LINKED LISTS                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  IMPERATIVE LIST (traditional):                                             │
│  ──────────────────────────────                                             │
│                                                                             │
│  List_T list = List_create();                                               │
│  List_append(list, "A");     // Mutates list                                │
│  List_append(list, "B");     // Mutates list                                │
│  List_insert(list, 0, "X");  // Mutates list                                │
│                                                                             │
│  list ──► [X] ──► [A] ──► [B] ──► NULL                                      │
│                                                                             │
│  Problem: If someone else has a reference to list, they see the changes!   │
│                                                                             │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                             │
│  FUNCTIONAL LIST (Hanson style):                                            │
│  ───────────────────────────────                                            │
│                                                                             │
│  List_T list1 = List_push(NULL, "A");   // Returns new list                 │
│  List_T list2 = List_push(list1, "B");  // Returns new list                 │
│  List_T list3 = List_push(list2, "C");  // Returns new list                 │
│                                                                             │
│  list1 still points to: [A] ──► NULL                                        │
│  list2 still points to: [B] ──► [A] ──► NULL                                │
│  list3 points to:       [C] ──► [B] ──► [A] ──► NULL                        │
│                                                                             │
│  All three variables are valid! Structure is SHARED:                        │
│                                                                             │
│  list3 ──► [C] ───┐                                                         │
│                   │                                                         │
│  list2 ──► [B] ───┴──► [A] ──► NULL                                         │
│                        ▲                                                    │
│  list1 ────────────────┘                                                    │
│                                                                             │
│  KEY INSIGHT: The tail [A]→NULL is shared by all three lists!               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

2.2 Why This Matters

In the CII philosophy, the List module demonstrates several critical principles:

  1. Interface Design Reflects Semantics: The API makes it clear that operations don’t mutate:
    List_T List_push(List_T list, void *x);   // Returns new list
    List_T List_pop(List_T list, void **x);   // Returns new list
    
  2. Composition Over Mutation: You can safely pass lists to functions without worrying about side effects.

  3. Memory Ownership Is Explicit: The caller must understand when to free what - the interface doesn’t hide this complexity because it can’t be hidden safely.

2.3 Historical Context

The List module connects to deep traditions in computer science:

Lisp (1958): The first language built around linked lists. Lisp’s cons cells are the spiritual ancestor of Hanson’s List nodes:

  • cons creates a new pair (like List_push)
  • car returns the head (like List_pop + first element)
  • cdr returns the tail (like List_pop return value)

ML/Haskell: Functional languages where lists are immutable by default. Pattern matching on lists ([], x::xs) mirrors Hanson’s operations.

Modern Use Cases:

  • SQLite: Uses linked lists for query plan operators
  • Git: Commit history is essentially an immutable linked list
  • Redis: LIST data type for queues and stacks
  • Linux Kernel: The list_head pattern for intrusive linked lists

2.4 Common Misconceptions

  1. “Functional lists are always slower”
    • False. Structure sharing means push is O(1), same as mutable lists
    • You only pay extra when you need to copy (like append)
  2. “You need garbage collection for functional data structures”
    • False. But you DO need disciplined memory management
    • Hanson’s approach: caller is responsible for freeing what they allocate
  3. “This is just a linked list, I already know this”
    • The data structure is similar, but the API design and usage patterns are completely different
    • Understanding WHY the operations return new lists is the key insight
  4. “Empty list should be a sentinel node”
    • Hanson uses NULL for empty lists, not a sentinel
    • This simplifies some operations but requires NULL checks everywhere

3. Project Specification

3.1 What You Will Build

A complete List module with:

  1. Core Operations:
    • List_push - Add element to front, return new list
    • List_pop - Remove and return first element, return new list
    • List_length - Count elements
    • List_free - Deallocate list nodes
  2. Construction Operations:
    • List_list - Create list from variable arguments
    • List_copy - Create a shallow copy
    • List_append - Concatenate two lists (creates new list)
    • List_reverse - Reverse a list (creates new list)
  3. Higher-Order Functions:
    • List_map - Apply function to each element, build new list
    • List_apply - Apply function to each element (for side effects)
    • List_toArray - Convert list to array

3.2 Functional Requirements

Function Signature Description
List_push List_T List_push(List_T list, void *x) Prepend x, return new list
List_pop List_T List_pop(List_T list, void **x) Remove first, store in *x, return rest
List_length int List_length(List_T list) Count elements (O(n))
List_free void List_free(List_T *list) Free all nodes, set *list to NULL
List_list List_T List_list(void *x, ...) Create from args, NULL-terminated
List_copy List_T List_copy(List_T list) Shallow copy all nodes
List_append List_T List_append(List_T list, List_T tail) Copy list, link to tail
List_reverse List_T List_reverse(List_T list) Create reversed copy
List_map List_T List_map(List_T list, void *apply(void *x, void *cl), void *cl) Transform elements
List_apply void List_apply(List_T list, void apply(void **x, void *cl), void *cl) Apply for side effects
List_toArray void **List_toArray(List_T list, void *end) Create array with sentinel end

3.3 Non-Functional Requirements

  • Memory Safety: No leaks, no double-frees, no use-after-free
  • Portability: Standard C11, no platform-specific code
  • Assertions: Assert preconditions (e.g., non-NULL pointers where required)
  • Performance: O(1) push/pop, O(n) length/reverse/copy/append

3.4 Example Usage / Output

# Build and run the word frequency counter
$ ./wordfreq sample.txt
Word Frequency Analysis
=======================
Total words: 18
Unique words: 10

the      4  ████████████
quick    2  ██████
fox      2  ██████
jumps    1  ███
over     1  ███
lazy     1  ███
dog      1  ███
brown    1  ███
a        1  ███

# Verify no memory leaks
$ valgrind --leak-check=full ./wordfreq sample.txt
==12345== Memcheck, a memory error detector
==12345== Command: ./wordfreq sample.txt
==12345==
Word Frequency Analysis
=======================
...
==12345==
==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks
==12345==   total heap usage: 45 allocs, 45 frees, 2,048 bytes allocated
==12345==
==12345== All heap blocks were freed -- no leaks are possible
==12345==
==12345== For lists of detected and suppressed errors, rerun with: -s
==12345== ERROR SUMMARY: 0 errors from 0 contexts

# Test list operations
$ ./list_test
Testing List Module
==================

Test: List_push and List_length
  Empty list length: 0
  After push("A"): length = 1
  After push("B"): length = 2
  After push("C"): length = 3
  PASSED

Test: List_pop
  List: [C, B, A]
  Pop: got "C", remaining length = 2
  Pop: got "B", remaining length = 1
  Pop: got "A", remaining length = 0
  PASSED

Test: Structure Sharing
  list1 = [A]
  list2 = push(list1, "B") = [B, A]
  list1 still valid: length = 1
  list2 valid: length = 2
  Same tail node: YES (0x7f8a4000 == 0x7f8a4000)
  PASSED

Test: List_reverse
  Original: [1, 2, 3, 4, 5]
  Reversed: [5, 4, 3, 2, 1]
  Original unchanged: [1, 2, 3, 4, 5]
  PASSED

Test: List_append
  list1: [A, B]
  list2: [C, D]
  List_append(list1, list2): [A, B, C, D]
  list1 unchanged: [A, B]
  list2 unchanged: [C, D]
  PASSED

Test: List_map (uppercase)
  Original: [hello, world, test]
  Mapped: [HELLO, WORLD, TEST]
  PASSED

All tests passed!

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                         LIST MODULE ARCHITECTURE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  list.h (PUBLIC INTERFACE)                                                  │
│  ────────────────────────                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ typedef struct List_T *List_T;    // OPAQUE - client can't see      │    │
│  │                                                                     │    │
│  │ // Construction                                                     │    │
│  │ List_T List_push(List_T list, void *x);                             │    │
│  │ List_T List_list(void *x, ...);                                     │    │
│  │ List_T List_append(List_T list, List_T tail);                       │    │
│  │ List_T List_copy(List_T list);                                      │    │
│  │ List_T List_reverse(List_T list);                                   │    │
│  │                                                                     │    │
│  │ // Deconstruction                                                   │    │
│  │ List_T List_pop(List_T list, void **x);                             │    │
│  │ void   List_free(List_T *list);                                     │    │
│  │                                                                     │    │
│  │ // Query                                                            │    │
│  │ int    List_length(List_T list);                                    │    │
│  │                                                                     │    │
│  │ // Higher-order                                                     │    │
│  │ List_T List_map(List_T list, ...);                                  │    │
│  │ void   List_apply(List_T list, ...);                                │    │
│  │ void **List_toArray(List_T list, void *end);                        │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                    │                                        │
│                                    │ hides implementation                   │
│                                    ▼                                        │
│  list.c (PRIVATE IMPLEMENTATION)                                            │
│  ───────────────────────────────                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ struct List_T {                                                     │    │
│  │     void *first;           // Data in this node                     │    │
│  │     struct List_T *rest;   // Pointer to next node (or NULL)        │    │
│  │ };                                                                  │    │
│  │                                                                     │    │
│  │ // List_push implementation:                                        │    │
│  │ List_T List_push(List_T list, void *x) {                            │    │
│  │     List_T p;                                                       │    │
│  │     NEW(p);               // Allocate node                          │    │
│  │     p->first = x;         // Store data                             │    │
│  │     p->rest = list;       // Point to existing list                 │    │
│  │     return p;             // Return NEW list (original unchanged)   │    │
│  │ }                                                                   │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  MEMORY LAYOUT EXAMPLE:                                                     │
│  ──────────────────────                                                     │
│                                                                             │
│  List_T a = List_list("x", "y", "z", NULL);                                 │
│  List_T b = List_push(a, "w");                                              │
│                                                                             │
│      a ────────────────────────────────┐                                    │
│                                        ▼                                    │
│      b ──► ┌────────────┐    ┌────────────┐    ┌────────────┐    ┌────┐    │
│            │ first: "w" │    │ first: "x" │    │ first: "y" │    │"z" │    │
│            │ rest: ─────┼───►│ rest: ─────┼───►│ rest: ─────┼───►│NULL│    │
│            └────────────┘    └────────────┘    └────────────┘    └────┘    │
│                                                                             │
│  Note: Lists a and b SHARE nodes! Only "w" node is new.                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

File Purpose
list.h Public interface - only declarations, opaque type
list.c Implementation - struct definition, all functions
mem.h/c Memory allocation wrapper (from earlier projects)
assert.h Assertions (from Project 1)

4.3 Data Structures

// The only data structure - a cons cell
struct List_T {
    void *first;           // The data (client-owned)
    struct List_T *rest;   // Next node or NULL for empty
};

// Convention: List_T is NULL for empty list, not a sentinel
// This matches Lisp's nil and functional language conventions

4.4 Algorithm Overview

List_push - O(1):

1. Allocate new node
2. Store data pointer in node->first
3. Set node->rest to existing list
4. Return pointer to new node

List_pop - O(1):

1. Assert list is not empty
2. Store list->first in caller's pointer
3. Return list->rest (the tail)

List_reverse - O(n) time, O(n) space:

1. Start with empty result list
2. Iterate through input list
3. Push each element onto result
4. Return result (order reversed by push order)

List_append - O(n) where n = length of first list:

1. Copy all nodes of first list
2. Last copied node points to tail (second list)
3. Return copy of first list (second list shared, not copied)

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
gcc --version      # GCC 7+ or Clang 6+
make --version     # GNU Make
valgrind --version # Memory checking (Linux) OR use -fsanitize=address

# Recommended compiler flags
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g

# For development (catches memory errors)
CFLAGS += -fsanitize=address -fsanitize=undefined

# For production
CFLAGS += -O2 -DNDEBUG

5.2 Project Structure

list-module/
├── include/
│   ├── list.h          # Public interface
│   ├── mem.h           # Memory wrapper (from earlier project)
│   └── assert.h        # Custom assert (from Project 1)
├── src/
│   ├── list.c          # Implementation
│   └── mem.c           # Memory wrapper implementation
├── test/
│   ├── list_test.c     # Unit tests
│   └── wordfreq.c      # Example application
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How do you implement functional programming patterns in an imperative language, and what does this teach you about data ownership and sharing?”

Traditional imperative linked lists let you insert and delete nodes anywhere. Hanson’s List is different - it’s closer to Lisp’s cons cells. Operations like push and pop work on the head. Concatenation and reversal create new lists. This design enables structure sharing: two lists can share a common tail without either “owning” it in the traditional sense.

5.4 Concepts You Must Understand First

  1. Singly-Linked List Basics
    • What is a node? How do you traverse?
    • Book Reference: “Algorithms in C” by Sedgewick - Ch. 3.3-3.4
  2. Functional vs. Imperative Lists
    • What does “immutable” mean for a data structure?
    • How does Lisp’s cons/car/cdr work?
    • Book Reference: “Structure and Interpretation of Computer Programs” - Ch. 2.2
  3. Memory Ownership Patterns
    • Who is responsible for freeing shared data?
    • What happens when two pointers reference the same memory?
    • Book Reference: “Mastering Algorithms with C” by Loudon - Ch. 5
  4. The Apply Pattern (Map/Fold)
    • What does it mean to “apply” a function to each element?
    • How do closures work in C (hint: extra void* parameter)?

5.5 Questions to Guide Your Design

API Design:

  • Should List_pop on an empty list assert-fail or return NULL?
  • Should List_free free just nodes or also the data?
  • How do you handle the variable argument list in List_list?

Memory Management:

  • When List_append copies the first list, who owns the copied nodes?
  • If two lists share a tail, calling List_free on one corrupts the other - how do you document this?
  • Should there be a List_freeDeep that also frees data?

Higher-Order Functions:

  • The apply function in List_map needs a “closure” - how do you pass extra data?
  • Should List_apply allow modifying elements? (Hanson says yes: void **x)

5.6 Thinking Exercise

Before writing code, trace through this scenario by hand:

List_T a = NULL;                        // Draw memory state
a = List_push(a, "first");              // Draw memory state
a = List_push(a, "second");             // Draw memory state
List_T b = a;                           // Draw memory state
a = List_push(a, "third");              // Draw memory state

// Now answer:
// 1. What does List_length(a) return?
// 2. What does List_length(b) return?
// 3. How many total nodes are allocated?
// 4. If you call List_free(&a), what happens to b?
// 5. If you call List_free(&b), then List_free(&a), what happens?

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual): A list node contains a void* for data and a pointer to the next node. List_T is just a pointer to the first node (or NULL for empty list). The key insight: push doesn’t modify the existing list - it creates a new node that points to it.

Hint 2 - Next Level (More Specific): List_apply takes a function pointer and a “closure” void* that gets passed to each invocation. This is how you pass context (like a counter, accumulator, or output file) to the apply function without using globals.

Hint 3 - Technical Details (Approach):

// List_reverse pseudocode:
List_T List_reverse(List_T list) {
    List_T result = NULL;
    while (list) {
        result = List_push(result, list->first);
        list = list->rest;
    }
    return result;
}
// Why does this work? Each push prepends, so [A,B,C] becomes [C,B,A]

Hint 4 - Tools/Debugging (Verification):

  • Write a List_print helper that shows addresses to verify sharing
  • Use Valgrind with --track-origins=yes to find use-after-free
  • Test structure sharing explicitly: after push, verify old list unchanged
  • Remember: empty list is NULL, so many functions need if (list == NULL) checks

5.8 The Interview Questions They’ll Ask

  1. “What’s the difference between a functional list and a mutable linked list?”

    Strong answer: A functional list treats the list as an immutable value. Operations like push don’t modify the existing list - they return a new list that shares structure with the old one. This enables safe sharing: multiple variables can reference overlapping lists. A mutable list has operations like insert and delete that modify in place, which is faster for single-owner scenarios but dangerous when shared.

  2. “If two lists share a tail, how do you manage memory?”

    Strong answer: This is the fundamental challenge of functional data structures in non-GC languages. Options: (1) Reference counting - each node tracks how many lists reference it, freed when count hits zero. (2) Single-owner rule - document that only one list “owns” shared nodes, others are views. (3) Arena allocation - allocate all nodes from a pool, free the whole pool at once. Hanson uses option 2 with careful documentation.

  3. “Implement a function that reverses a linked list. Time and space complexity?”

    Strong answer for functional list: Create new list, iterate original pushing each element. O(n) time AND O(n) space because we create n new nodes. For mutable list: can reverse in O(n) time and O(1) space by adjusting pointers (prev/curr/next technique). The functional version is simpler and preserves the original list.

  4. “How would you implement ‘filter’ for a linked list in C?”

    Strong answer:

    typedef int (*Predicate)(void *x, void *cl);
    
    List_T List_filter(List_T list, Predicate keep, void *cl) {
        List_T result = NULL;
        for (; list; list = list->rest) {
            if (keep(list->first, cl)) {
                result = List_push(result, list->first);
            }
        }
        return List_reverse(result);  // Preserve order
    }
    

    The closure parameter cl allows passing context without globals.

  5. “What’s the time complexity of appending two functional lists?”

    Strong answer: O(n) where n is the length of the first list. You must copy all nodes of the first list (to avoid mutating it), then point the last copy to the second list. The second list is NOT copied - it’s shared. This is why functional lists prefer cons (prepend) over append.

5.9 Books That Will Help

Topic Book Chapter
The List Module “C Interfaces and Implementations” - Hanson Ch. 7
Linked List Fundamentals “Algorithms in C” - Sedgewick Ch. 3.3-3.5
Functional Data Structures “Purely Functional Data Structures” - Okasaki Ch. 2
C Linked Lists “Mastering Algorithms with C” - Loudon Ch. 5
Lisp Lists (cons/car/cdr) “Structure and Interpretation of Computer Programs” Ch. 2.2

5.10 Implementation Phases

Phase 1: Core Structure (Day 1 morning)

1. Define struct List_T in list.c
2. Implement List_push - the fundamental operation
3. Implement List_length - for testing
4. Write basic tests: empty list, single push, multiple pushes

Phase 2: Deconstruction (Day 1 afternoon)

1. Implement List_pop
2. Implement List_free (careful: just nodes, not data)
3. Test: push/pop round-trip, free doesn't leak

Phase 3: Construction Helpers (Day 2 morning)

1. Implement List_list using <stdarg.h>
2. Implement List_copy
3. Implement List_reverse
4. Implement List_append
5. Test all construction operations preserve originals

Phase 4: Higher-Order Functions (Day 2 afternoon)

1. Implement List_apply (mutating iteration)
2. Implement List_map (transforming iteration)
3. Implement List_toArray
4. Test with various apply functions (print, uppercase, sum)

Phase 5: Integration & Polish (Day 3)

1. Build wordfreq application using all operations
2. Run Valgrind, fix any memory issues
3. Add documentation comments
4. Review against Hanson's implementation

5.11 Key Implementation Decisions

  1. Empty list representation: NULL (not sentinel node)
    • Pro: Natural NULL checks, matches functional convention
    • Con: Many functions need explicit NULL checks
  2. List_free semantics: Free nodes only, not data
    • Pro: Caller controls data lifetime, can share data across lists
    • Con: Easy to leak if caller forgets to free data
  3. List_pop on empty list: Assert failure
    • Pro: Bug caught immediately, can’t ignore
    • Con: Requires caller to check before calling
  4. Order of List_list arguments: First element first
    • List_list("A", "B", "C", NULL) creates [A, B, C]
    • Matches natural reading order

6. Testing Strategy

Unit Tests

// test/list_test.c
#include "list.h"
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Test empty list
void test_empty(void) {
    List_T list = NULL;
    assert(List_length(list) == 0);
    printf("test_empty: PASSED\n");
}

// Test push/pop round-trip
void test_push_pop(void) {
    List_T list = NULL;
    list = List_push(list, "A");
    list = List_push(list, "B");
    assert(List_length(list) == 2);

    void *x;
    list = List_pop(list, &x);
    assert(strcmp(x, "B") == 0);
    list = List_pop(list, &x);
    assert(strcmp(x, "A") == 0);
    assert(list == NULL);
    printf("test_push_pop: PASSED\n");
}

// Test structure sharing
void test_sharing(void) {
    List_T a = List_push(NULL, "X");
    List_T b = List_push(a, "Y");

    // 'a' should still be valid and unchanged
    assert(List_length(a) == 1);
    assert(List_length(b) == 2);

    // They should share the "X" node
    void *x_from_a, *x_from_b;
    List_pop(a, &x_from_a);
    List_pop(List_pop(b, &x_from_b), &x_from_b);
    assert(x_from_a == x_from_b);  // Same pointer!

    printf("test_sharing: PASSED\n");
}

// Test reverse creates new list
void test_reverse(void) {
    List_T orig = List_list("A", "B", "C", NULL);
    List_T rev = List_reverse(orig);

    assert(List_length(orig) == 3);
    assert(List_length(rev) == 3);

    void *x;
    rev = List_pop(rev, &x);
    assert(strcmp(x, "C") == 0);

    // Original unchanged
    orig = List_pop(orig, &x);
    assert(strcmp(x, "A") == 0);

    // Clean up
    List_free(&orig);
    List_free(&rev);

    printf("test_reverse: PASSED\n");
}

Edge Cases to Test

  • Empty list for all operations
  • Single element list
  • Very long list (1000+ elements)
  • List_free on already-NULL pointer
  • List_append with empty first or second list
  • List_map with identity function
  • List_toArray with various end sentinels

Memory Testing

# With Valgrind (Linux)
valgrind --leak-check=full --track-origins=yes ./list_test

# With AddressSanitizer (macOS/Linux)
make CFLAGS="-fsanitize=address -g" list_test
./list_test

7. Common Pitfalls & Debugging

Problem Root Cause Fix
“List_free causes double-free” Structure sharing means you can’t naively free all nodes if another list references them Document ownership clearly. Consider reference counting or arena allocation for complex sharing patterns.
“List_append modifies original” Forgot to copy nodes of first list Must allocate new nodes for first list, only share second list
“List_apply crashes on empty” Missing NULL check at start Add if (list == NULL) return;
“List_reverse returns wrong order” Built result backwards Push builds list in reverse of iteration order - that’s the point
“List_list gives segfault” Missing NULL terminator in arguments Always end with NULL: List_list("A", "B", NULL)
“Memory leak in List_map” Not freeing intermediate/old lists Caller must free both original and mapped lists separately
“List_pop assertion failed” Called on empty list Check List_length > 0 or list != NULL before calling

Debugging Tips

  1. Print with addresses: Helps visualize sharing
    void List_debug(List_T list) {
        while (list) {
            printf("[%p: %s] -> ", (void*)list, (char*)list->first);
            list = list->rest;
        }
        printf("NULL\n");
    }
    
  2. Track allocations: Count malloc/free calls
    static int alloc_count = 0;
    // In List_push: alloc_count++;
    // In List_free: alloc_count--;
    // At end: assert(alloc_count == 0);
    

8. Extensions & Challenges

After completing the basic implementation:

  1. Add Reference Counting
    • Track how many lists reference each node
    • Free node only when refcount hits zero
    • Challenge: How do you handle cycles?
  2. Implement List_sort
    • Merge sort is natural for linked lists
    • O(n log n) time, O(log n) stack space
    • Can you do it without modifying original?
  3. Add List_filter and List_reduce
    • Complete the functional toolkit
    • List_reduce should accumulate a single value
  4. Make It Thread-Safe
    • Immutable structure helps, but allocation needs protection
    • Use thread-local arenas?
  5. Implement Lazy Lists
    • Store a function instead of data
    • Compute elements on demand
    • Enables infinite lists!
  6. Compare Performance
    • Benchmark against mutable linked list
    • Profile cache behavior
    • When is functional style faster/slower?

9. Real-World Connections

SQLite: Query Plan Operators

SQLite represents query plans as trees of operators. Each operator is allocated per-query and freed all at once - similar to functional list patterns with arena allocation.

Git: Commit History

Git’s commit history is essentially an immutable linked list. Each commit points to its parent(s). “Branches” are just pointers into this list. Merging creates new nodes that share history with both parents.

Redis: LIST Data Type

Redis implements LPUSH/RPUSH and LPOP/RPOP for double-ended operations. Internally uses a quicklist (linked list of small arrays) for cache efficiency while maintaining O(1) operations.

Linux Kernel: list_head

Linux uses “intrusive” linked lists where the list node is embedded in the data structure. This is more efficient (one allocation) but less flexible than Hanson’s approach.

// Linux style (intrusive)
struct task_struct {
    struct list_head tasks;  // Embedded list node
    pid_t pid;
    // ...
};

// Hanson style (extrusive)
struct List_T {
    void *first;             // Points to data
    struct List_T *rest;
};

10. Resources

Primary Reference

  • “C Interfaces and Implementations” by David Hanson - Chapter 7
  • Source code: https://github.com/drh/cii

Supplementary Books

  • “Algorithms in C” by Sedgewick - Linked list fundamentals
  • “Purely Functional Data Structures” by Okasaki - Theory of immutable structures
  • “Structure and Interpretation of Computer Programs” - Lisp lists

Online Resources


11. Self-Assessment Checklist

Before moving to the next project, verify you can:

  • Explain why List_push returns a new list instead of modifying in place
  • Draw memory diagrams showing structure sharing between lists
  • Implement List_reverse without looking at reference
  • Explain the memory ownership issues with List_free on shared lists
  • Write a higher-order function using List_apply or List_map
  • Pass Valgrind with zero leaks on all test cases
  • Explain when functional lists are better/worse than mutable lists
  • Describe how this relates to Lisp cons cells and ML lists

12. Submission / Completion Criteria

Your implementation is complete when:

  1. All functions implemented matching Hanson’s interface
  2. Test suite passes with 100% of tests succeeding
  3. Memory clean - Valgrind reports no leaks, no errors
  4. Documentation - Header file has complete function documentation
  5. Example application - wordfreq or similar works correctly
  6. Code review - Compared against Hanson’s reference implementation

Quality Gates

Gate Requirement
Compiles make succeeds with -Wall -Wextra -Werror
Tests Pass All unit tests pass
Memory Safe Valgrind: 0 errors, 0 leaks
Documented Every function has a doc comment
Example Works wordfreq produces correct output