Project 5: Linked List Database

Build a tiny in-memory database using a linked list with safe iteration, deletion, and explicit ownership rules.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 1-2 weeks
Main Programming Language C (Alternatives: Rust, Zig)
Alternative Programming Languages Rust, Zig
Coolness Level Level 2 (Practical)
Business Potential Level 2 (Simple storage engine)
Prerequisites Linked lists, pointers, memory allocation
Key Topics List invariants, safe deletion, ownership

1. Learning Objectives

By completing this project, you will:

  1. Define the invariants of a singly linked list and enforce them in code.
  2. Implement CRUD operations with safe iteration and deletion.
  3. Avoid use-after-free by designing correct traversal patterns.
  4. Document ownership rules for list nodes and payloads.
  5. Test correctness under mutation during iteration.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Linked List Representation Invariants

Fundamentals

A linked list is a sequence of nodes where each node points to the next. The core invariants are that every next pointer is either NULL or points to a valid node, that the list head is either NULL or points to the first node, and that the list is acyclic unless explicitly designed to be circular. A valid list must also be consistent: if a node is reachable from the head, it is part of the list; if not, it is not. These invariants define correctness. If a node’s next pointer is corrupted, the list may become cyclic, causing infinite loops, or it may point to invalid memory, causing crashes.

Deep Dive into the Concept

Linked lists are simple but notoriously error-prone because their invariants are implicit and can be violated easily. At a stable point, you must be able to traverse from the head to the tail without encountering invalid pointers or cycles. This means the list is a directed acyclic chain terminating at a NULL pointer. In C, nothing enforces this except your discipline. That is why you must design explicit validation functions that walk the list and check for consistency.

A key invariant is acyclicity. Many bugs come from accidentally creating cycles, such as when inserting a node and setting node->next incorrectly. A cycle will cause traversal loops to never terminate. You can detect cycles using Floyd’s tortoise-and-hare algorithm in debug builds. Another invariant is correct head management: if you delete the head node, you must update the head to the next node; otherwise, you will lose the rest of the list or leak it. This is why many list implementations use a sentinel (dummy) head node: it simplifies edge cases and makes invariants easier to maintain.

Linked list invariants also include size consistency. If you store a size field, it must equal the number of nodes reachable from the head. This seems straightforward but can be violated if you forget to update size on insert or delete. A mismatch between size and actual nodes causes bounds errors and logic errors. Therefore, if you include a size field, you should validate it in debug builds by counting nodes and comparing.

Another invariant is that each node’s data pointer is valid. If nodes own their payload, then node->data must be allocated and valid for as long as the node exists. If nodes do not own the payload, then the list must not free it. This is an ownership contract; if you violate it, you will get leaks or double frees. Therefore, your list invariants include not just pointer connectivity but also ownership rules. This is why your list_destroy should accept an optional destructor callback for payloads.

The list is also defined by traversal order. If you insert at the end, you must update the tail’s next to point to the new node and ensure that the new node’s next is NULL. If you insert at the head, you must set the new node’s next to the old head. Any mistake here can reorder nodes incorrectly or lose nodes. These are logical invariants about the list’s order and reachability.

Finally, representation invariants matter for mutation during iteration. If you delete a node while iterating, the invariant must still hold: the list must remain connected and acyclic. This requires careful updates to the next pointers and to the iterator state. This concept connects directly to the next section on safe iteration.

A practical debugging technique is to include a list_validate function that checks for cycles, counts nodes, and verifies that the tail is reachable. Even if you do not ship it in production, it is invaluable during development. It turns hidden structural bugs into immediate failures, which is especially important in C where silent corruption can otherwise go undetected for long stretches of time.

How this fits on projects

Every operation in the linked list database depends on these invariants. Insert, delete, find, and iterate are all defined by maintaining a valid chain of nodes.

Definitions & key terms

  • Head: Pointer to the first node.
  • Tail: The last node whose next is NULL.
  • Acyclic: No node points back to an earlier node.
  • Sentinel node: Dummy node to simplify edge cases.

Mental model diagram (ASCII)

head -> [node1] -> [node2] -> [node3] -> NULL

How it works (step-by-step, with invariants and failure modes)

  1. Ensure head is valid (NULL or pointer to node).
  2. Traverse and verify each next is valid or NULL.
  3. Ensure no cycles (optional debug check).
  4. Update pointers carefully during insert/delete.

Failure modes: cycles, lost nodes, dangling next pointers.

Minimal concrete example

assert(list->head == NULL || is_valid_node(list->head));

Common misconceptions

  • “Linked lists are simple, so validation is unnecessary.” (They are easy to corrupt.)
  • “If it compiles, the pointers must be valid.” (Pointers can still be dangling.)
  • “Size fields always stay correct.” (They must be maintained explicitly.)

Check-your-understanding questions

  1. What invariant ensures the list terminates?
  2. Why are sentinel nodes useful?
  3. How can you detect cycles in a debug build?

Check-your-understanding answers

  1. The last node’s next must be NULL.
  2. They simplify head deletion and insertion logic.
  3. Use tortoise-and-hare cycle detection.

Real-world applications

  • Simple in-memory indexes.
  • Task lists and queues in embedded systems.

Where you will apply it

  • This project: See §3.2 Functional Requirements and §5.10 Phase 2.
  • Also used in: P01 Safe Dynamic Array for invariant thinking.

References

  • “Algorithms in C” by Sedgewick, linked list chapters.
  • “Effective C” by Robert Seacord.

Key insights

Linked lists are correct only when their pointer chain remains valid and acyclic.

Summary

The representation invariants of a linked list are its correctness conditions. Without them, operations become undefined.

Homework/Exercises to practice the concept

  1. Write a function that counts nodes and verifies size.
  2. Implement cycle detection and test with a manually created cycle.
  3. Create a sentinel-based list and compare insert/delete complexity.

Solutions to the homework/exercises

  1. Traverse from head to NULL and compare count to size.
  2. Use two pointers moving at different speeds.
  3. With a sentinel, deletion of head becomes a standard case.

2.2 Safe Iteration and Deletion Patterns

Fundamentals

Deleting nodes while iterating is dangerous because the current node may be freed, invalidating the iterator. The safe pattern is to store the next pointer before deleting the current node, then continue iteration using that saved pointer. Another safe pattern is to use a “previous” pointer and update prev->next when deleting curr. These patterns are invariants in themselves: the iteration must always have a valid next node before freeing the current one. If you violate this rule, you will access freed memory. This is a subtle but central correctness rule for any mutable traversal in C. Make it explicit in your API docs and tests so future changes do not regress it.

Deep Dive into the Concept

Safe iteration is the difference between a list that works in trivial cases and a list that is robust under real-world mutation. The core issue is that deleting a node invalidates the memory and therefore invalidates the next pointer stored in that node. If your loop uses curr = curr->next after freeing curr, you are reading from freed memory. The safe approach is to save next before freeing. This simple pattern is easy to get wrong, especially when you are also updating the list links.

There are two common iteration styles: using a prev pointer or using a pointer-to-pointer (often called “link” iteration). In the prev style, you maintain both prev and curr. When you delete curr, you set prev->next = curr->next and then free curr, then set curr = prev->next. This works but requires careful handling when curr is the head. The pointer-to-pointer style is simpler: you maintain a pointer to the link that points to the current node. Initially, this is &head. On each step, node = *link. If you delete, you set *link = node->next and free node without advancing the link. If you keep the node, you advance link = &node->next. This pattern avoids special cases and ensures that the list remains connected.

When you design a database-like linked list, you often need to remove nodes based on a predicate (e.g., delete key). The safe iteration pattern should be encapsulated in a helper function so that deletion logic is centralized and correct. This is another application of invariants: the list’s structure should be mutated in one well-tested function, not scattered across the codebase.

Safe iteration also matters for APIs that return iterators. If you design an iterator type, you must define what happens if the list is modified during iteration. You might declare that mutation invalidates iterators, or you might design iterators that are robust to certain mutations. In a simple project, you can define that only deletion of the current node is allowed, and you must provide a specific function like list_iter_remove that updates the iterator state safely. This makes the ownership and mutation rules explicit.

Another subtlety is that deletion can interact with memory ownership. If nodes own their payload, deleting a node must free the payload before freeing the node. But if a caller has a pointer to the payload, that pointer becomes dangling. Your API must document that deletion invalidates payload pointers. If you fail to document this, you create hidden use-after-free bugs.

Testing safe iteration should include deleting the head, deleting the tail, deleting every node, and deleting none. You should also test deletion in the middle of a traversal. If you use a pointer-to-pointer pattern, you should test that the list remains connected and that the iterator does not skip nodes. These tests are not optional; they are part of proving that your safe deletion invariant is implemented correctly.

Another useful pattern is to separate traversal from mutation by providing a list_remove_if function. This allows you to centralize deletion logic and reduce the chance of ad hoc, unsafe loops elsewhere in the codebase. It also makes testing easier, because you can exercise many deletion scenarios with a single function and verify that invariants hold after each call.

How this fits on projects

The linked list database requires safe deletion while iterating (e.g., delete records matching a predicate). This concept defines the correct iteration patterns.

Definitions & key terms

  • Iterator invalidation: When a mutation makes an iterator unsafe.
  • Pointer-to-pointer: A pointer to the link that points to the current node.
  • Safe deletion: Deleting a node without losing list connectivity.

Mental model diagram (ASCII)

link -> [node] -> next
*link updated to bypass node on delete

How it works (step-by-step, with invariants and failure modes)

  1. Save next before deletion, or use pointer-to-pointer.
  2. Update links to bypass the node.
  3. Free payload (if owned) and node.
  4. Continue iteration using saved next or updated link.

Failure modes: use-after-free, skipped nodes, broken links.

Minimal concrete example

Node **link = &list->head;
while (*link) {
    Node *n = *link;
    if (match(n)) {
        *link = n->next;
        free(n);
    } else {
        link = &n->next;
    }
}

Common misconceptions

  • “I can free curr and then use curr->next.” (Use-after-free.)
  • “Iterators are always safe.” (They are invalidated by mutation.)
  • “Deleting head is a special case in pointer-to-pointer.” (It is not.)

Check-your-understanding questions

  1. Why must you save next before freeing the current node?
  2. What advantage does pointer-to-pointer iteration provide?
  3. What happens if you delete a node that another pointer references?

Check-your-understanding answers

  1. Because the node memory becomes invalid after free.
  2. It eliminates special cases for head deletion.
  3. The other pointer becomes dangling; this must be documented.

Real-world applications

  • In-memory caches that remove expired entries.
  • Operating system process lists.

Where you will apply it

  • This project: See §3.2 Functional Requirements and §5.10 Phase 2.
  • Also used in: P06 HTTP Server for request lists.

References

  • “Effective C” by Robert Seacord.
  • Common C idioms for pointer-to-pointer list manipulation.

Key insights

Safe deletion is not a detail; it is the core correctness property of mutable lists.

Summary

The pointer-to-pointer iteration pattern provides a clean, invariant-preserving way to delete nodes during traversal.

Homework/Exercises to practice the concept

  1. Implement deletion by key using pointer-to-pointer.
  2. Write a test that deletes every node while iterating.
  3. Add a safe iterator API with iter_remove.

Solutions to the homework/exercises

  1. Use a loop that updates *link on match.
  2. Verify the list ends empty and no leaks remain.
  3. Store prev_link in the iterator and update on deletion.

2.3 Ownership of Node Payloads and Memory Management

Fundamentals

Each node in your database holds a key and a value. You must decide who owns these strings or data blobs. If the list owns them, it must free them when the node is deleted. If the caller owns them, the list must not free them. This ownership rule must be explicit. Otherwise you will leak memory or double-free. Ownership is part of the data structure’s contract and must be documented alongside the API. A single inconsistent insert/delete path can violate that contract and create hard-to-debug leaks. Without this clarity, deletion and destruction semantics become ambiguous and unsafe.

Deep Dive into the Concept

Ownership in a linked list database is more complex than in a simple list because the list stores user data. If you are storing strings, you must decide whether to copy them or store borrowed pointers. Copying is safer because it decouples the list from the caller’s memory. Borrowing is faster but requires the caller to keep the memory alive and not modify it. Both are valid choices, but they imply different ownership rules and different failure modes.

If you choose to copy, your insert function should allocate new memory for the key and value, then store those pointers in the node. The list owns them and must free them on deletion or destruction. This is the simplest model for users because they can pass temporary strings and the list keeps its own copies. The downside is the overhead of allocation and copying. However, for a small educational database, the safety benefits outweigh the cost.

If you choose to borrow, your insert must document that the caller retains ownership and must keep the memory valid. In this model, deletion does not free the payload. This can be useful for storing references to data owned elsewhere (e.g., interned strings or static literals). However, the risk is that callers will forget the lifetime rule and the list will contain dangling pointers. This is why borrowed ownership is dangerous in C unless the data is clearly static.

A middle ground is to allow a custom destructor callback. The list stores pointers and calls the destructor on deletion. This allows callers to decide ownership per list. It also makes the list more reusable: it can store structs, strings, or other resources. The invariant is that if a destructor is provided, it will be called exactly once per node. This must hold even when deletion happens during iteration or when an error occurs in insertion after partial allocation.

Memory management also includes node allocation. Nodes can be allocated individually with malloc, or you can use an arena for faster bulk allocation. If you use an arena, you must adjust your ownership rules because individual nodes cannot be freed. That would mean deletions remove nodes from the list but do not reclaim memory until the arena is reset. This may be acceptable for a short-lived database but should be documented. For this project, the simplest approach is to allocate nodes individually so that delete truly frees memory and to focus on explicit ownership of payloads.

Error handling is another ownership hazard. Suppose you allocate a node, then allocate the key string, but the value allocation fails. You must free the node and the key before returning an error. This requires careful ordering and cleanup code. A good pattern is to allocate all resources first, then attach them to the list only after all allocations succeed. That way, if any allocation fails, you can free the temporary resources without touching the list. This is an ownership invariant: the list only contains fully constructed nodes.

How this fits on projects

Ownership rules define the API for insert and delete, and they determine how you implement list_destroy and error handling. The database is only safe if ownership is explicit and enforced.

Definitions & key terms

  • Owned payload: Data allocated and freed by the list.
  • Borrowed payload: Data owned by the caller.
  • Destructor callback: Function invoked to free or clean up payloads.

Mental model diagram (ASCII)

Node owns key/value
[node] -> key (malloc)
       -> value (malloc)
Delete node => free key, free value, free node

How it works (step-by-step, with invariants and failure modes)

  1. Allocate node and payloads.
  2. If any allocation fails, free partial resources.
  3. Insert node into list only after success.
  4. On delete, free payloads and node in correct order.

Failure modes: leaked payloads, double frees, dangling pointers.

Minimal concrete example

node->key = strdup(key);
node->value = strdup(value);

Common misconceptions

  • “The list can free payloads without knowing ownership.” (It must know.)
  • “Borrowed strings are safe if you promise.” (Promises fail under pressure.)
  • “Error paths are rare.” (They are where leaks hide.)

Check-your-understanding questions

  1. What happens if you delete a node with borrowed payloads?
  2. Why is a destructor callback useful?
  3. How do you avoid leaks if allocation fails mid-insert?

Check-your-understanding answers

  1. You must not free the payloads; only the node.
  2. It allows the list to free payloads without knowing their type.
  3. Allocate everything first, and clean up before insertion on failure.

Real-world applications

  • In-memory caches with eviction.
  • Symbol tables in compilers.

Where you will apply it

  • This project: See §3.2 Functional Requirements and §5.10 Phase 1.
  • Also used in: P01 Safe Dynamic Array for destructor callbacks.

References

  • “Effective C” by Robert Seacord.
  • Data structure ownership patterns in systems programming.

Key insights

Ownership is a contract. If you do not write it down, you will violate it.

Summary

A safe linked list database depends on explicit ownership of node payloads and careful cleanup on error paths.

Homework/Exercises to practice the concept

  1. Implement list_destroy with optional destructor callback.
  2. Write tests that insert and delete keys repeatedly and check for leaks.
  3. Add a mode that stores borrowed strings and document it.

Solutions to the homework/exercises

  1. Traverse nodes and call destructor on payloads, then free nodes.
  2. Use Valgrind to verify no leaks.
  3. Provide a flag or separate API for borrowed insertion.

3. Project Specification

3.1 What You Will Build

A small in-memory key-value database backed by a singly linked list. It supports insert, delete, find, list-all, and a CLI for interactive testing.

Included:

  • Linked list implementation with invariants
  • Key/value record storage
  • CLI interface

Excluded:

  • Persistence to disk
  • Concurrency support

3.2 Functional Requirements

  1. Insert: Add or update a record by key.
  2. Find: Retrieve a value by key.
  3. Delete: Remove a record safely.
  4. List: Print all records in order.
  5. Validate: Verify list invariants.

3.3 Non-Functional Requirements

  • Performance: O(n) search, predictable memory usage.
  • Reliability: No use-after-free or leaks.
  • Usability: Clear ownership and error codes.

3.4 Example Usage / Output

./listdb insert key=1 value=alpha
./listdb find key=1

3.5 Data Formats / Schemas / Protocols

CLI commands:

insert <key> <value>
find <key>
delete <key>
list

3.6 Edge Cases

  • Deleting the head node.
  • Deleting the only node.
  • Searching for missing keys.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./listdb

3.7.2 Golden Path Demo (Deterministic)

Fixed command sequence showing insert, find, delete.

3.7.3 CLI Terminal Transcript (Exact)

$ ./listdb
insert 1 alpha
insert 2 beta
find 2
beta
delete 1
list
(2, beta)
exit_code=0

3.7.4 Failure Demo (Deterministic)

$ ./listdb
delete 999
error: key not found
exit_code=4

3.7.5 Exit Codes

  • 0: success
  • 4: key not found
  • 5: invariant violation
  • 6: allocation failure

4. Solution Architecture

4.1 High-Level Design

CLI -> listdb API -> linked list -> records

4.2 Key Components

| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Node | Stores key/value and next pointer | Owns payload by default | | List | Head pointer and size | Optional sentinel | | CLI | Parse commands and call API | Simple REPL |

4.3 Data Structures (No Full Code)

typedef struct Node {
    char *key;
    char *value;
    struct Node *next;
} Node;

typedef struct {
    Node *head;
    size_t size;
} List;

4.4 Algorithm Overview

Key Algorithm: Delete by Key

  1. Use pointer-to-pointer traversal.
  2. On match, unlink node.
  3. Free payload and node.

Complexity Analysis:

  • Time: O(n) for search and delete.
  • Space: O(n) for nodes.

5. Implementation Guide

5.1 Development Environment Setup

cc --version
make --version

5.2 Project Structure

listdb/
├── include/listdb.h
├── src/listdb.c
├── tests/listdb_test.c
├── examples/listdb_cli.c
└── Makefile

5.3 The Core Question You’re Answering

“How do I mutate a linked list safely without invalidating my iterator?”

5.4 Concepts You Must Understand First

  1. Linked list invariants.
  2. Safe iteration and deletion patterns.
  3. Ownership of node payloads.

5.5 Questions to Guide Your Design

  1. Will nodes own keys/values or borrow them?
  2. How will you handle updating an existing key?
  3. Will you use a sentinel node?

5.6 Thinking Exercise

Write the pointer-to-pointer loop for deleting a node and verify it works for head and middle nodes.

5.7 The Interview Questions They’ll Ask

  1. How do you delete a node in a singly linked list?
  2. How do you avoid use-after-free during iteration?
  3. What invariants define a linked list?

5.8 Hints in Layers

Hint 1: Use pointer-to-pointer iteration. Hint 2: Store next before freeing a node. Hint 3: Add a validate function that checks for cycles.

5.9 Books That Will Help

| Topic | Book | Chapter | |——|——|———| | Linked lists | “Algorithms in C” | Ch. 3 | | Memory safety | “Effective C” | Ch. 4-6 |

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals: Define structs and ownership rules. Tasks:

  1. Implement create/destroy.
  2. Add validate.
  3. Write tests for insert/find. Checkpoint: Insert/find works with no leaks.

Phase 2: Core Functionality (3-4 days)

Goals: Delete, update, list. Tasks:

  1. Implement delete with pointer-to-pointer.
  2. Implement update-on-insert.
  3. Add tests for deletion edge cases. Checkpoint: Deletion works for head/tail/middle.

Phase 3: Polish & Edge Cases (2-3 days)

Goals: CLI, errors, docs. Tasks:

  1. Implement REPL CLI.
  2. Add error codes and messages.
  3. Add failure demo. Checkpoint: CLI output matches golden transcript.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Ownership | Copy or borrow strings | Copy | Safer default | | Traversal | prev/curr or pointer-to-pointer | Pointer-to-pointer | Fewer edge cases | | Sentinel | Use or not | Optional | Simplifies head operations |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———|———|———-| | CRUD Tests | Verify operations | insert/find/delete | | Invariant Tests | Check list structure | cycle detection | | Error Tests | Missing key | delete nonexistent |

6.2 Critical Test Cases

  1. Delete head, then verify new head.
  2. Delete middle node and verify connectivity.
  3. Insert duplicate key updates value.

6.3 Test Data

Insert: (1, alpha), (2, beta)
Delete: 1
Expected list: (2, beta)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Freeing node before saving next | Crash | Save next first | | Not updating head on delete | Lost list | Use sentinel or pointer-to-pointer | | Leaking payloads | Valgrind leaks | Free payloads in delete |

7.2 Debugging Strategies

  • Use Valgrind to catch leaks.
  • Add a debug validator for cycles and size.

7.3 Performance Traps

Linked list search is O(n). Document this and avoid large-scale use.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a count command.
  • Add a clear command.

8.2 Intermediate Extensions

  • Add persistence to a file.
  • Add a sorted insert option.

8.3 Advanced Extensions

  • Replace list with a hash table for O(1) lookup.
  • Implement a skip list for faster searches.

9. Real-World Connections

9.1 Industry Applications

  • Simple caches and in-memory indexes.
  • Linked lists inside kernels and embedded systems.
  • Simple key-value stores in C.
  • Linked list utilities in OS kernels.

9.3 Interview Relevance

  • Linked list deletion and invariants.
  • Ownership and memory safety questions.

10. Resources

10.1 Essential Reading

  • “Algorithms in C” by Sedgewick.
  • “Effective C” by Robert Seacord.

10.2 Video Resources

  • Data structures lectures on linked lists.

10.3 Tools & Documentation

  • Valgrind Memcheck.

11. Self-Assessment Checklist

11.1 Understanding

  • I can state linked list invariants.
  • I can safely delete during iteration.
  • I can explain ownership of payloads.

11.2 Implementation

  • CRUD operations pass tests.
  • No leaks or invalid accesses.
  • CLI output matches golden transcript.

11.3 Growth

  • I can compare list vs array trade-offs.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Insert/find/delete work correctly.
  • Invariant checks pass.

Full Completion:

  • CLI and error handling included.

Excellence (Going Above & Beyond):

  • Persistence or alternate index structure.

13. Additional Content Rules (Hard Requirements)

13.1 Determinism

CLI demos use fixed command sequences and deterministic output.

13.2 Outcome Completeness

  • Success and failure demos included.
  • Exit codes specified in §3.7.5.

13.3 Cross-Linking

13.4 No Placeholder Text

All content is complete and explicit.