Project 3: Build a Graph Data Structure (Fighting the Borrow Checker)

Implement three graph representations (index-based, arena-based, and Rc/RefCell) and compare their trade-offs.

Quick Reference

Attribute Value
Difficulty Expert
Time Estimate 2-3 weeks
Main Programming Language Rust (Alternatives: C++ for comparison)
Alternative Programming Languages C++, Java
Coolness Level High
Business Potential Medium
Prerequisites Ownership, lifetimes, interior mutability, basic graph algorithms
Key Topics Borrow splitting, graph design, Rc/RefCell, arena allocation

1. Learning Objectives

By completing this project, you will:

  1. Design three Rust graph APIs that satisfy borrowing rules in different ways.
  2. Compare performance and ergonomics of index-based vs reference-based graphs.
  3. Use Rc<RefCell<T>> safely and understand runtime borrow failures.
  4. Document ownership strategies and their invariants.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Ownership Strategies for Graphs

Fundamentals

Graphs are naturally cyclic, but Rust’s ownership model is acyclic: every value has exactly one owner. To build graphs in Rust, you must choose an ownership strategy that reconciles these constraints. Common strategies are: (1) index-based graphs that store nodes in a Vec<Node> and refer to nodes by indices, (2) arena-based graphs that store nodes in an arena and return references tied to the arena lifetime, and (3) Rc/RefCell-based graphs that use shared ownership and interior mutability for cycles. Each strategy trades compile-time safety for different runtime costs.

Deep Dive into the concept

An index-based graph stores all nodes in a vector and uses numeric indices for edges. This avoids lifetimes entirely: a node is identified by a stable integer that refers to its position in the vector. Because the vector is owned by the graph, you can guarantee nodes live as long as the graph. The downside is that you must manage indices carefully: removing nodes is tricky because indices shift unless you use a slot map or arena index pattern. You also lose direct borrowing; you must borrow the vector to access nodes, which leads to borrow checker challenges when you want to mutate two nodes at once. The typical solution is to split the vector into disjoint slices or use split_at_mut, which demonstrates borrow splitting.

Arena-based graphs store nodes in a bump arena and return references with lifetime 'a tied to the arena. This gives you ergonomic &Node or &mut Node references without indices, and it preserves stable addresses. The trade-off is that you cannot remove nodes individually, and the arena must outlive all references. This is often acceptable in compilers or query planners where graphs live for the duration of a pass. The arena also encourages immutability: if you return &'a Node, you can’t mutate nodes without additional mechanisms.

Rc/RefCell-based graphs use shared ownership. Each node is stored in an Rc<RefCell<Node>>, and edges store cloned Rcs. This allows true cycles because multiple nodes share ownership. Mutation is allowed via RefCell, which enforces borrowing rules at runtime. The cost is runtime overhead and possible panics when borrow rules are violated. This strategy is the most flexible but also the least efficient. It’s a good teaching tool because it exposes the difference between compile-time and runtime enforcement.

The key insight is that ownership strategies are API design decisions. An index-based graph surfaces indices in its API, which can be less ergonomic but highly performant. An arena-based graph exposes lifetimes, which can be challenging but leads to safe borrowing. Rc/RefCell graphs are ergonomic but have runtime costs and less predictable performance. In real systems, you often mix strategies: e.g., an index-based core with a temporary Rc-based view for a GUI.

Rust’s borrow checker pushes you to design APIs that encode these decisions. For example, an index-based graph can offer methods like fn edge_mut(&mut self, a: NodeId, b: NodeId) -> (&mut Node, &mut Node) by using split_at_mut to ensure disjoint borrows. An arena-based graph might restrict mutation to a builder phase with &mut self, then expose read-only references afterward. These are not arbitrary restrictions; they express safety invariants that Rust can enforce.

How this fit on projects

This concept drives §3.1 (what you build) and §4.2 (components). It is also referenced in Project 5 (string interner ownership) and Project 10 (safe abstraction design).

Definitions & key terms

  • Index-based graph: Nodes stored in a vector, edges store indices.
  • Arena-based graph: Nodes stored in an arena, edges store references.
  • Rc/RefCell graph: Nodes stored with shared ownership and runtime borrow checks.
  • Borrow splitting: Technique to get multiple mutable borrows safely.

Mental model diagram (ASCII)

Index-based:
Vec<Node>  edges: [0, 2, 5]

Arena-based:
Arena<Node>  edges: &Node

Rc/RefCell:
Rc<RefCell<Node>> edges: Rc clones

How it works (step-by-step)

  1. Choose ownership strategy based on required mutability and lifetime.
  2. Implement node storage and edge representation.
  3. Design traversal APIs that respect borrow rules.
  4. Benchmark and compare trade-offs.

Minimal concrete example

struct Graph { nodes: Vec<Node> }
struct Node { edges: Vec<usize> }

Common misconceptions

  • “Rc/RefCell is always easiest.” It is easiest to start but costliest in runtime.
  • “Index graphs are unsafe.” They are safe if you keep indices valid.
  • “Arena graphs allow removals.” They do not without extra indirection.

Check-your-understanding questions

  1. Why do index-based graphs avoid lifetimes?
  2. What trade-off does Rc/RefCell introduce?
  3. When is arena-based the best choice?

Check-your-understanding answers

  1. Because indices are plain integers, not references.
  2. Runtime borrow checks and refcount overhead.
  3. When the graph lives for a fixed phase and removals are rare.

Real-world applications

  • Compiler IR graphs.
  • Dependency graphs in build systems.
  • Social graphs (often index-based for scale).

Where you’ll apply it

References

  • “Too Many Lists” (linked list ownership patterns).
  • TRPL Ch. 15 (Rc, RefCell).

Key insights

Ownership strategy is API design: it determines what the borrow checker can prove.

Summary

Graphs in Rust are about choosing the right ownership model, not forcing one structure into all use cases.

Homework/Exercises to practice the concept

  1. Implement a tiny index-based graph and add/remove nodes.
  2. Write a RefCell borrow violation and observe panic.
  3. Compare memory usage of index vs Rc graphs.

Solutions to the homework/exercises

  1. Use a Vec<Option<Node>> to handle removals.
  2. Borrow mutably twice without dropping the first borrow.
  3. Rc uses more heap allocations and refcount overhead.

2.2 Borrow Splitting and Disjoint Mutability

Fundamentals

Rust allows either one mutable borrow or many shared borrows. When you need two mutable references to distinct elements of a collection, you must prove to the compiler that they do not alias. This is called borrow splitting. The standard tool is split_at_mut, which divides a slice into two non-overlapping slices, allowing mutable borrows into each. This is essential for algorithms like swapping nodes or updating multiple edges in a graph.

Deep Dive into the concept

Borrow splitting is a pattern that turns global mutability into local, disjoint mutability. Consider a Vec<Node>. If you call let a = &mut vec[i]; let b = &mut vec[j];, the compiler rejects it because it cannot prove that i != j. However, you can split the slice into two parts: let (left, right) = vec.split_at_mut(max(i, j)); then index into each half, ensuring disjointness. This is a compile-time proof that the borrows do not alias.

In graph algorithms, you often need to mutate two nodes or edges simultaneously. For example, adding an edge might require updating adjacency lists for both nodes. With index-based graphs, you must implement helper functions that take two indices, order them, split the vector, and return mutable references. This is tedious but demonstrates how Rust encodes aliasing rules into code structure.

Borrow splitting also highlights why graph APIs sometimes return handles instead of references. A handle (NodeId) can be stored and passed around without borrowing the graph. The actual borrowing happens when you call a method like with_nodes(a, b, |na, nb| { ... }) that internally splits the borrow. This API design makes the borrow checker happy and keeps lifetimes local. It’s a key design lesson: you often need to reorganize APIs around borrowing, not just data structures.

In arena-based graphs, borrow splitting is less common because you often hold immutable references and avoid mutation. In Rc/RefCell graphs, borrow splitting shifts to runtime: RefCell enforces disjointness at runtime, and double mutable borrows cause panics. This is an explicit trade-off between compile-time complexity and runtime overhead.

A practical debugging tip is to use scopes. Shorten borrows by limiting their lexical scope. When you structure code so that mutable borrows end before new borrows begin, the compiler can accept patterns that otherwise seem impossible. This is especially important in graph traversals and updates.

How this fit on projects

This concept is applied in §5.5 (design questions) and §5.10 (Phase 2) for index-based graphs. It is also relevant in Project 4 (queue operations with multiple locks).

Definitions & key terms

  • Borrow splitting: Proving disjoint mutable borrows.
  • Aliasing: Two references to the same memory.
  • Disjointness: Guaranteeing non-overlap.

Mental model diagram (ASCII)

Vec<Node>
[0][1][2][3][4]
 split_at_mut(2)
 left: [0][1]
 right:[2][3][4]

How it works (step-by-step)

  1. Order indices so that i < j.
  2. Split vector at j.
  3. Borrow left[i] and right[0] (which is original j).
  4. Mutate safely.

Minimal concrete example

let (a, b) = two_mut(&mut nodes, i, j);

Common misconceptions

  • “The compiler should know indices differ.” It does not prove runtime values.
  • “Borrow splitting is only for slices.” It’s a general principle of aliasing.

Check-your-understanding questions

  1. Why does split_at_mut allow two mutable borrows?
  2. What happens if i == j?
  3. How can you design APIs to hide borrow splitting?

Check-your-understanding answers

  1. It produces two non-overlapping slices, proving disjointness.
  2. You would borrow the same element twice; must guard against it.
  3. Use closures that perform the split internally.

Real-world applications

  • Matrix algorithms needing two mutable rows.
  • Graph edge updates.

Where you’ll apply it

References

  • Rust Book chapter on borrowing.
  • Rustonomicon “Borrow Splitting”.

Key insights

Borrow splitting is how you encode disjointness into code so the compiler can prove safety.

Summary

You can mutate two elements safely if you can prove they are disjoint; split_at_mut is the tool.

Homework/Exercises to practice the concept

  1. Write a helper two_mut that returns two &mut for distinct indices.
  2. Add a test that panics when indices are equal.

Solutions to the homework/exercises

  1. Use split_at_mut and index into both halves.
  2. Guard with assert!(i != j).

2.3 Interior Mutability in Graphs

Fundamentals

RefCell allows mutation through shared references, enforcing borrowing rules at runtime. In graph structures, Rc<RefCell<Node>> enables cycles and shared mutation. The trade-off is runtime overhead and possible panics if borrow rules are violated. This strategy is common when you need flexible graph mutation and are willing to pay runtime checks.

Deep Dive into the concept

In a graph, each node may need to mutate its neighbors or itself while being shared across multiple owners. Rc<RefCell<Node>> is the standard Rust pattern: Rc provides shared ownership, RefCell provides runtime-checked mutability. Each borrow is tracked dynamically; if you try to mutably borrow while an immutable borrow exists (or another mutable borrow), the program panics. This makes correctness a runtime property, not a compile-time one.

The major benefit is flexibility. You can store Rc clones in edges and freely traverse the graph. When you need to modify a node, you call borrow_mut(). The API is ergonomic and closely mirrors how you might code in a GC language. The downside is that you pay runtime checks and can crash at runtime if you violate borrowing rules. This makes it important to design careful traversal patterns and ensure borrows are short-lived.

RefCell also interacts with Drop. If you keep Ref guards alive too long, you may block mutation and create unexpected panics. The fix is to scope borrows tightly, often using braces. Another risk is cycles: Rc cycles leak memory. You can mitigate this by using Weak for back edges or parent pointers. This ties back to Project 2.

Interior mutability is a trade-off: it gives you expressive power but moves safety guarantees from compile time to runtime. In a project like this, it is essential to build tests that validate borrow patterns, and to document which APIs may panic. A robust design can even wrap RefCell and expose safer APIs that avoid reentrant borrowing.

How this fit on projects

This concept is used in the Rc-based graph implementation and is cross-linked with Project 2’s refcounting and weak references. It also prepares you for Project 4’s use of mutexes for thread-safe interior mutability.

Definitions & key terms

  • RefCell: Runtime borrow-checked interior mutability.
  • Borrow guard: Ref or RefMut that enforces borrow rules.
  • Reentrant borrow: Nested borrows that can cause panics.

Mental model diagram (ASCII)

Rc -> RefCell<Node>
borrow()   => Ref<Node>
borrow_mut => RefMut<Node>

How it works (step-by-step)

  1. Call borrow() to read; counter increments.
  2. Call borrow_mut() to write; requires no active borrows.
  3. Drop guards to release borrows.

Minimal concrete example

let n = Rc::new(RefCell::new(Node::new()));
{
    let mut v = n.borrow_mut();
    v.value += 1;
}

Common misconceptions

  • RefCell is always safe.” It can panic at runtime.
  • Rc<RefCell<T>> is thread-safe.” It is not.

Check-your-understanding questions

  1. What happens if you call borrow_mut twice?
  2. Why are RefCell panics preferable to UB?
  3. How do you avoid long-lived borrow guards?

Check-your-understanding answers

  1. It panics because a mutable borrow already exists.
  2. Panics prevent undefined behavior by enforcing rules at runtime.
  3. Use smaller scopes or immediate value extraction.

Real-world applications

  • AST mutation in compilers during optimization passes.
  • GUI state graphs in single-threaded apps.

Where you’ll apply it

References

  • TRPL Ch. 15 (RefCell).

Key insights

RefCell trades compile-time checks for runtime checks to enable flexible mutation.

Summary

If you need cycles and mutation, Rc<RefCell<T>> is the practical but costly choice.

Homework/Exercises to practice the concept

  1. Create a small cycle graph with Rc<RefCell<Node>>.
  2. Trigger a runtime borrow panic and document the stack trace.

Solutions to the homework/exercises

  1. Use Rc::clone for edges and Weak for back edges.
  2. Keep a Ref alive and call borrow_mut.

3. Project Specification

3.1 What You Will Build

A Rust crate graph_lab with three graph implementations (index-based, arena-based, Rc/RefCell-based), plus traversal algorithms (DFS/BFS/topo), benchmarks, and a CLI comparison tool.

3.2 Functional Requirements

  1. Implement all three graph representations with similar APIs.
  2. Provide DFS, BFS, and topological sort for each.
  3. Include benchmark comparing performance and memory.
  4. Provide CLI demo for deterministic traversal outputs.

3.3 Non-Functional Requirements

  • Correctness: Traversals return expected ordering.
  • Performance: Benchmarks show relative costs.
  • Usability: APIs documented with trade-offs.

3.4 Example Usage / Output

$ cargo run --example graph_demo
arena: dfs=ok, bfs=ok, topo=ok
index: dfs=ok, bfs=ok, topo=ok
rc:    dfs=ok, bfs=ok, topo=ok
exit code: 0

3.5 Data Formats / Schemas / Protocols

  • Node storage: Vec<Node> / Arena<Node> / Rc<RefCell<Node>>.
  • Edge storage: Vec<usize> or Vec<Rc<RefCell<Node>>>.

3.6 Edge Cases

  • Graph with zero nodes.
  • Self-loop edges.
  • Removing nodes in index-based model.

3.7 Real World Outcome

Deterministic demo and benchmark outputs.

3.7.1 How to Run (Copy/Paste)

cargo run --example graph_demo
cargo bench

3.7.2 Golden Path Demo (Deterministic)

  • Use a fixed graph input file with 10 nodes and 20 edges.
  • Traversal outputs must match expected ordering.

3.7.3 CLI Transcript (Success)

$ cargo run --example graph_demo -- --input data/graph.json
arena: dfs=ok, bfs=ok, topo=ok
index: dfs=ok, bfs=ok, topo=ok
rc:    dfs=ok, bfs=ok, topo=ok
exit code: 0

3.7.4 Failure Demo (Bad Input)

$ cargo run --example graph_demo -- --input missing.json
error: input file not found
exit code: 2

4. Solution Architecture

4.1 High-Level Design

           +-------------------+
           | Graph API (traits)|
           +---------+---------+
                     |
         +-----------+-----------+
         |           |           |
   IndexGraph   ArenaGraph   RcGraph

4.2 Key Components

Component Responsibility Key Decisions
Graph trait Common API Minimal methods
IndexGraph Index-based NodeId handles
ArenaGraph Arena-based Lifetime-tied refs
RcGraph Rc/RefCell Runtime borrow checks
Benchmarks Compare perf Fixed dataset

4.4 Data Structures (No Full Code)

struct Node { edges: Vec<NodeId> }
struct IndexGraph { nodes: Vec<Node> }

4.4 Algorithm Overview

Key Algorithm: DFS

  1. Maintain stack of node IDs.
  2. Mark visited nodes.
  3. Traverse edges.

Complexity Analysis:

  • Time: O(V + E)
  • Space: O(V)

5. Implementation Guide

5.1 Development Environment Setup

cargo new graph_lab
cd graph_lab
cargo add serde serde_json

5.2 Project Structure

graph_lab/
├── src/
│   ├── lib.rs
│   ├── index_graph.rs
│   ├── arena_graph.rs
│   └── rc_graph.rs
├── examples/
│   └── graph_demo.rs
└── benches/
    └── graph_bench.rs

5.3 The Core Question You’re Answering

“How do I represent cyclic graphs in Rust while still obeying ownership and borrowing rules?”

5.4 Concepts You Must Understand First

  1. Ownership strategies (indices, arena, Rc/RefCell).
  2. Borrow splitting (split_at_mut).
  3. Runtime borrow checks with RefCell.

5.5 Questions to Guide Your Design

  1. How will node IDs stay stable in index-based graphs?
  2. When do you choose arena vs Rc?
  3. How do you avoid long-lived RefCell borrows?

5.6 Thinking Exercise

Design the same graph API in three styles and list the borrow checker challenges.

5.7 The Interview Questions They’ll Ask

  1. “Why are self-referential structures difficult in Rust?”
  2. “What is the cost of Rc<RefCell<T>>?”
  3. “How does split_at_mut help?”

5.8 Hints in Layers

Hint 1: Start with index-based graph. Hint 2: Add arena-based graph with lifetime ties. Hint 3: Implement Rc-based graph last to compare trade-offs.

5.9 Books That Will Help

| Topic | Book | Chapter | |—|—|—| | Ownership patterns | Too Many Lists | Graph-like structures | | Rc/RefCell | TRPL | Ch. 15 | | Borrow splitting | Rustonomicon | “Borrow Splitting” |

5.10 Implementation Phases

Phase 1: Index-Based Graph (4-5 days)

Goals:

  • Implement NodeId and adjacency list.

Checkpoint: DFS/BFS pass tests.

Phase 2: Arena Graph (3-4 days)

Goals:

  • Store nodes in arena and return references.

Checkpoint: Traversals work with lifetimes.

Phase 3: Rc Graph (3-4 days)

Goals:

  • Implement Rc/RefCell graph with cycles.

Checkpoint: Mutation and traversal work without panics.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |—|—|—|—| | Node storage | Vec vs arena | Start with Vec | Simpler | | Edge representation | indices vs refs | Indices for core | Avoid lifetimes | | Mutability | RefCell vs split | split_at_mut | Compile-time safety |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |—|—|—| | Unit Tests | Node add/remove | add_edge, remove_node | | Integration Tests | CLI demo | deterministic traversal | | Edge Case Tests | Self loops | detect cycles |

6.2 Critical Test Cases

  1. DFS/BFS produce expected order on fixed graph.
  2. Borrow errors do not occur in index-based implementation.
  3. Rc-based graph handles cycles without leaks (use Weak).

6.3 Test Data

node_count=10
edge_count=20
seed=42

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |—|—|—| | Double mutable borrow | compiler error | use split_at_mut | | Rc cycle leak | memory leak | use Weak for back edges | | Long-lived borrow | RefCell panic | shorten scopes |

7.2 Debugging Strategies

  • Add logging around borrow scopes.
  • Use cargo miri for Rc misuse.

7.3 Performance Traps

  • Rc graph allocations dominate runtime.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add graph serialization to JSON.

8.2 Intermediate Extensions

  • Implement weighted edges and Dijkstra.

8.3 Advanced Extensions

  • Add concurrent traversal using work-stealing.

9. Real-World Connections

9.1 Industry Applications

  • Graph-based query planners.
  • Dependency analysis in build systems.
  • petgraph crate.

9.3 Interview Relevance

  • Explain ownership strategies for complex data structures.

10. Resources

10.1 Essential Reading

  • TRPL Ch. 15.
  • Rustonomicon “Borrow Splitting”.

10.2 Video Resources

  • RustConf talks on ownership patterns.

10.3 Tools & Documentation

  • cargo bench

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain three ownership strategies for graphs.
  • I can describe borrow splitting.

11.2 Implementation

  • All three graph implementations pass tests.
  • Benchmarks compare performance.

11.3 Growth

  • I can justify the trade-offs in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Index-based graph with DFS/BFS.
  • CLI demo with deterministic output and failure case.

Full Completion:

  • Arena and Rc graphs implemented with benchmarks.

Excellence (Going Above & Beyond):

  • Weighted graphs and advanced algorithms.