Project 6: Binary Search Tree & Balancing
Build a BST, then implement a self-balancing tree (AVL or Red-Black) to preserve O(log n).
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2 weeks |
| Language | C (Alternatives: Rust, C++) |
| Prerequisites | Projects 1-5 |
| Key Topics | BST invariants, rotations, balancing |
1. Learning Objectives
By completing this project, you will:
- Implement BST insert, search, and delete.
- Visualize tree structure and rotations.
- Add balancing via AVL or Red-Black rules.
- Compare performance of balanced vs unbalanced trees.
2. Theoretical Foundation
2.1 Core Concepts
- BST invariant: Left subtree < node < right subtree.
- Tree height: Drives search and insert cost.
- Rotations: Local restructuring to restore balance.
- AVL vs Red-Black: Different balance guarantees and costs.
2.2 Why This Matters
Trees power ordered maps, file systems, indexes, and many OS structures. Without balancing, worst-case performance can degrade to O(n).
2.3 Historical Context / Background
AVL trees were introduced in 1962; Red-Black trees became widely used in the 1970s and are common in standard libraries.
2.4 Common Misconceptions
- “BSTs are always fast”: Inserting sorted data makes them a linked list.
- “Balancing is optional”: Real systems require predictable performance.
3. Project Specification
3.1 What You Will Build
- A BST with insert/search/delete.
- A tree visualizer for shape and height.
- A balanced tree implementation (AVL or Red-Black).
3.2 Functional Requirements
- Implement BST operations and traversal.
- Measure tree height and node counts.
- Implement balancing rotations.
- Provide performance comparison on sorted vs random input.
3.3 Non-Functional Requirements
- Performance: O(log n) average, near O(log n) worst-case after balancing.
- Reliability: Handle delete cases (0, 1, 2 children).
- Usability: Visualize tree structure textually.
3.4 Example Usage / Output
$ ./bst_demo
Insert: 10, 20, 5, 7
Tree height: 3
Rotate left at 10
Balanced height: 2
3.5 Real World Outcome
The program shows the BST after each insert. When you enable balancing, it prints rotation steps and the resulting height. You can compare how sorted input destroys the unbalanced tree and how balancing fixes it.
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────────┐ ┌────────────────┐
│ BST Core │────▶│ Balancing Logic │────▶│ Visualizer │
└──────────────┘ └──────────────────┘ └────────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| BST Node | Key/value + children | Store height or color |
| Rotation | Restore balance | Single vs double rotations |
| Visualizer | Display structure | ASCII tree output |
4.3 Data Structures
typedef struct Node {
int key;
int value;
int height; // for AVL
struct Node* left;
struct Node* right;
} Node;
4.4 Algorithm Overview
Key Algorithm: AVL Insert
- Insert recursively like BST.
- Update height.
- Compute balance factor.
- Rotate if factor is outside [-1, 1].
Complexity Analysis:
- Time: O(log n) after balancing
- Space: O(n)
5. Implementation Guide
5.1 Development Environment Setup
clang -Wall -Wextra -g -o bst_demo src/*.c
5.2 Project Structure
project-root/
├── src/
│ ├── bst.c
│ ├── avl.c
│ ├── visualize.c
│ └── main.c
├── tests/
│ └── test_bst.c
└── Makefile
5.3 The Core Question You’re Answering
“How can I preserve ordered search while preventing the tree from turning into a list?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- BST property
- Tree traversal (in-order, pre-order)
- Rotation mechanics
5.5 Questions to Guide Your Design
- Will you implement AVL or Red-Black first?
- How will you update height or color after rotation?
- How will you handle duplicate keys?
5.6 Thinking Exercise
Insert keys 1..7 into a BST and draw the shape. Then draw how AVL rotations would balance it.
5.7 The Interview Questions They’ll Ask
- “Why do BSTs degrade to O(n) in the worst case?”
- “What is a rotation and why does it preserve order?”
- “Compare AVL vs Red-Black trees.”
5.8 Hints in Layers
Hint 1: Visualize early Print the tree after each insert to catch rotation bugs.
Hint 2: Start with AVL AVL has simpler invariants for learning.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| BSTs | CLRS | Ch. 12 |
| Balanced trees | CLRS | Ch. 13 |
5.10 Implementation Phases
Phase 1: Foundation (3 days)
Goals:
- BST insert/search/delete
Tasks:
- Implement standard BST operations.
- Add traversal and height calculation.
Checkpoint: Insert/search/delete works on random input.
Phase 2: Core Functionality (5 days)
Goals:
- AVL or RB balancing
Tasks:
- Implement rotations.
- Apply balancing after insert/delete.
Checkpoint: Height remains O(log n) for sorted inserts.
Phase 3: Polish & Edge Cases (4 days)
Goals:
- Visualization and benchmarking
Tasks:
- Implement ASCII tree print.
- Compare operations counts on sorted vs random.
Checkpoint: Benchmark output shows balanced improvement.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Balanced tree | AVL, Red-Black | AVL | Easier to implement first |
| Duplicate keys | Reject, overwrite | Overwrite | Matches map semantics |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Rotations | left, right, double |
| Integration Tests | Insert/delete | mixed operations |
| Edge Case Tests | Skewed input | 1..1000 |
6.2 Critical Test Cases
- Insert sorted keys and confirm height <= 1.44 log2(n).
- Delete node with two children.
- Verify in-order traversal is sorted.
6.3 Test Data
Keys: 1..1000 (sorted) and random shuffle
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Missing height update | Wrong balance factor | Update after rotation |
| Rotation mislinks | Lost subtrees | Carefully rewire pointers |
| Delete bugs | Memory leaks | Free removed nodes |
7.2 Debugging Strategies
- Print subtree heights and balance factors.
- Use small trees to validate rotation cases.
7.3 Performance Traps
- Ignoring balance factors can degrade to O(n).
8. Extensions & Challenges
8.1 Beginner Extensions
- Add range query (print keys in range).
- Add iterative search to avoid recursion.
8.2 Intermediate Extensions
- Implement Red-Black tree.
- Add order statistics (kth smallest).
8.3 Advanced Extensions
- B-tree implementation for disk storage.
- Persistence (immutable BST).
9. Real-World Connections
9.1 Industry Applications
- Databases: ordered indexes.
- File systems: directories and inode trees.
9.2 Related Open Source Projects
- Linux kernel: RB tree implementation.
- PostgreSQL: B-tree indexes.
9.3 Interview Relevance
- BST operations and rotations are classic interview topics.
10. Resources
10.1 Essential Reading
- CLRS - Ch. 12-13
- Algorithms, 4th Edition - balanced search trees
10.2 Video Resources
- “AVL Trees” - data structures courses
10.3 Tools & Documentation
- Graphviz: https://graphviz.org/
10.4 Related Projects in This Series
- Previous Project: Hash Table
- Next Project: Heap & Priority Queue
11. Self-Assessment Checklist
11.1 Understanding
- I can explain BST invariants.
- I can perform rotations by hand.
- I can compare AVL vs RB trees.
11.2 Implementation
- Inserts and deletes maintain balance.
- Traversals return sorted order.
11.3 Growth
- I can reason about height bounds.
12. Submission / Completion Criteria
Minimum Viable Completion:
- BST with insert/search/delete.
- Visualization of structure.
Full Completion:
- Balanced tree with rotations.
Excellence (Going Above & Beyond):
- Red-Black or B-tree implementation.
This guide was generated from DATA_STRUCTURES_FROM_FIRST_PRINCIPLES.md. For the complete learning path, see the parent directory README.