Project 6: Binary Search Tree & Balancing

Build a BST, then implement a self-balancing tree (AVL or Red-Black) to preserve O(log n).

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2 weeks
Language C (Alternatives: Rust, C++)
Prerequisites Projects 1-5
Key Topics BST invariants, rotations, balancing

1. Learning Objectives

By completing this project, you will:

  1. Implement BST insert, search, and delete.
  2. Visualize tree structure and rotations.
  3. Add balancing via AVL or Red-Black rules.
  4. Compare performance of balanced vs unbalanced trees.

2. Theoretical Foundation

2.1 Core Concepts

  • BST invariant: Left subtree < node < right subtree.
  • Tree height: Drives search and insert cost.
  • Rotations: Local restructuring to restore balance.
  • AVL vs Red-Black: Different balance guarantees and costs.

2.2 Why This Matters

Trees power ordered maps, file systems, indexes, and many OS structures. Without balancing, worst-case performance can degrade to O(n).

2.3 Historical Context / Background

AVL trees were introduced in 1962; Red-Black trees became widely used in the 1970s and are common in standard libraries.

2.4 Common Misconceptions

  • “BSTs are always fast”: Inserting sorted data makes them a linked list.
  • “Balancing is optional”: Real systems require predictable performance.

3. Project Specification

3.1 What You Will Build

  • A BST with insert/search/delete.
  • A tree visualizer for shape and height.
  • A balanced tree implementation (AVL or Red-Black).

3.2 Functional Requirements

  1. Implement BST operations and traversal.
  2. Measure tree height and node counts.
  3. Implement balancing rotations.
  4. Provide performance comparison on sorted vs random input.

3.3 Non-Functional Requirements

  • Performance: O(log n) average, near O(log n) worst-case after balancing.
  • Reliability: Handle delete cases (0, 1, 2 children).
  • Usability: Visualize tree structure textually.

3.4 Example Usage / Output

$ ./bst_demo
Insert: 10, 20, 5, 7
Tree height: 3
Rotate left at 10
Balanced height: 2

3.5 Real World Outcome

The program shows the BST after each insert. When you enable balancing, it prints rotation steps and the resulting height. You can compare how sorted input destroys the unbalanced tree and how balancing fixes it.


4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────────┐     ┌────────────────┐
│ BST Core     │────▶│ Balancing Logic  │────▶│ Visualizer     │
└──────────────┘     └──────────────────┘     └────────────────┘

4.2 Key Components

Component Responsibility Key Decisions
BST Node Key/value + children Store height or color
Rotation Restore balance Single vs double rotations
Visualizer Display structure ASCII tree output

4.3 Data Structures

typedef struct Node {
    int key;
    int value;
    int height;    // for AVL
    struct Node* left;
    struct Node* right;
} Node;

4.4 Algorithm Overview

Key Algorithm: AVL Insert

  1. Insert recursively like BST.
  2. Update height.
  3. Compute balance factor.
  4. Rotate if factor is outside [-1, 1].

Complexity Analysis:

  • Time: O(log n) after balancing
  • Space: O(n)

5. Implementation Guide

5.1 Development Environment Setup

clang -Wall -Wextra -g -o bst_demo src/*.c

5.2 Project Structure

project-root/
├── src/
│   ├── bst.c
│   ├── avl.c
│   ├── visualize.c
│   └── main.c
├── tests/
│   └── test_bst.c
└── Makefile

5.3 The Core Question You’re Answering

“How can I preserve ordered search while preventing the tree from turning into a list?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. BST property
  2. Tree traversal (in-order, pre-order)
  3. Rotation mechanics

5.5 Questions to Guide Your Design

  1. Will you implement AVL or Red-Black first?
  2. How will you update height or color after rotation?
  3. How will you handle duplicate keys?

5.6 Thinking Exercise

Insert keys 1..7 into a BST and draw the shape. Then draw how AVL rotations would balance it.

5.7 The Interview Questions They’ll Ask

  1. “Why do BSTs degrade to O(n) in the worst case?”
  2. “What is a rotation and why does it preserve order?”
  3. “Compare AVL vs Red-Black trees.”

5.8 Hints in Layers

Hint 1: Visualize early Print the tree after each insert to catch rotation bugs.

Hint 2: Start with AVL AVL has simpler invariants for learning.

5.9 Books That Will Help

Topic Book Chapter
BSTs CLRS Ch. 12
Balanced trees CLRS Ch. 13

5.10 Implementation Phases

Phase 1: Foundation (3 days)

Goals:

  • BST insert/search/delete

Tasks:

  1. Implement standard BST operations.
  2. Add traversal and height calculation.

Checkpoint: Insert/search/delete works on random input.

Phase 2: Core Functionality (5 days)

Goals:

  • AVL or RB balancing

Tasks:

  1. Implement rotations.
  2. Apply balancing after insert/delete.

Checkpoint: Height remains O(log n) for sorted inserts.

Phase 3: Polish & Edge Cases (4 days)

Goals:

  • Visualization and benchmarking

Tasks:

  1. Implement ASCII tree print.
  2. Compare operations counts on sorted vs random.

Checkpoint: Benchmark output shows balanced improvement.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Balanced tree AVL, Red-Black AVL Easier to implement first
Duplicate keys Reject, overwrite Overwrite Matches map semantics

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Rotations left, right, double
Integration Tests Insert/delete mixed operations
Edge Case Tests Skewed input 1..1000

6.2 Critical Test Cases

  1. Insert sorted keys and confirm height <= 1.44 log2(n).
  2. Delete node with two children.
  3. Verify in-order traversal is sorted.

6.3 Test Data

Keys: 1..1000 (sorted) and random shuffle

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Missing height update Wrong balance factor Update after rotation
Rotation mislinks Lost subtrees Carefully rewire pointers
Delete bugs Memory leaks Free removed nodes

7.2 Debugging Strategies

  • Print subtree heights and balance factors.
  • Use small trees to validate rotation cases.

7.3 Performance Traps

  • Ignoring balance factors can degrade to O(n).

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add range query (print keys in range).
  • Add iterative search to avoid recursion.

8.2 Intermediate Extensions

  • Implement Red-Black tree.
  • Add order statistics (kth smallest).

8.3 Advanced Extensions

  • B-tree implementation for disk storage.
  • Persistence (immutable BST).

9. Real-World Connections

9.1 Industry Applications

  • Databases: ordered indexes.
  • File systems: directories and inode trees.
  • Linux kernel: RB tree implementation.
  • PostgreSQL: B-tree indexes.

9.3 Interview Relevance

  • BST operations and rotations are classic interview topics.

10. Resources

10.1 Essential Reading

  • CLRS - Ch. 12-13
  • Algorithms, 4th Edition - balanced search trees

10.2 Video Resources

  • “AVL Trees” - data structures courses

10.3 Tools & Documentation

  • Graphviz: https://graphviz.org/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain BST invariants.
  • I can perform rotations by hand.
  • I can compare AVL vs RB trees.

11.2 Implementation

  • Inserts and deletes maintain balance.
  • Traversals return sorted order.

11.3 Growth

  • I can reason about height bounds.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • BST with insert/search/delete.
  • Visualization of structure.

Full Completion:

  • Balanced tree with rotations.

Excellence (Going Above & Beyond):

  • Red-Black or B-tree implementation.

This guide was generated from DATA_STRUCTURES_FROM_FIRST_PRINCIPLES.md. For the complete learning path, see the parent directory README.