LEARN POSTGRESQL DEEP DIVE
Learn PostgreSQL: From Zero to Database Internals Master
Goal: Deeply understand PostgreSQL—not just SQL syntax, but how it works behind the scenes. Master MVCC, the query planner, storage engine, WAL, indexes, and understand why PostgreSQL makes the architectural choices that differentiate it from MySQL, Oracle, and other databases.
Why PostgreSQL Matters
PostgreSQL is not just another SQL database. It’s arguably the most advanced open-source relational database, and understanding its internals teaches you:
- MVCC (Multi-Version Concurrency Control): How readers and writers access the same data concurrently without blocking each other
- Write-Ahead Logging (WAL): The foundation of crash recovery and replication
- Cost-Based Query Planning: How databases decide between index scans and sequential scans
- B-tree and Beyond: Why different index types exist for different problems
- Transaction Isolation: What “ACID” actually means in practice
After completing these projects, you will:
- Understand every byte in a PostgreSQL page
- Know exactly what happens when you run SELECT, INSERT, and UPDATE
- Read EXPLAIN ANALYZE output like a book
- Build your own mini-database with PostgreSQL-style MVCC
- Know when PostgreSQL is the right choice vs MySQL, Oracle, or others
PostgreSQL vs Other Databases: The Key Differences
Before diving into the projects, it helps to understand the architectural choices that make PostgreSQL unique:
PostgreSQL vs MySQL
| Aspect | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| MVCC Implementation | Old versions stored in main table (heap) | Old versions in separate “undo logs” (rollback segment) |
| Cleanup | VACUUM required to reclaim space | Automatic purge of undo logs |
| Indexing | Indexes store TIDs (pointer to heap) | Clustered index stores data, secondary indexes point to PK |
| Read Performance | Faster (no version reconstruction) | May need to read undo log for old versions |
| Write Amplification | Higher (UPDATE = DELETE + INSERT) | Lower (in-place updates possible) |
| VACUUM Overhead | Must run periodically | No equivalent needed |
| Extensions | First-class (PostGIS, pg_trgm, etc.) | Limited |
| SQL Compliance | Very high | Moderate (silent truncations, etc.) |
| JSON Support | JSONB (binary, indexable) | JSON (text-based) |
PostgreSQL vs Oracle
| Aspect | PostgreSQL | Oracle |
|---|---|---|
| License | Open source (PostgreSQL License) | Commercial (expensive) |
| MVCC | Similar approach (multi-versioning) | Undo tablespace |
| Procedural Language | PL/pgSQL, PL/Python, etc. | PL/SQL |
| Partitioning | Declarative (since v10) | Highly mature |
| RAC Equivalent | None (use Citus, Patroni) | Real Application Clusters |
| Feature Parity | ~95% for most use cases | Industry leader in enterprise features |
PostgreSQL vs SQLite
| Aspect | PostgreSQL | SQLite |
|---|---|---|
| Architecture | Client-server | Embedded (single file) |
| Concurrency | Full MVCC, many writers | Single writer at a time |
| Use Case | Production applications | Mobile, embedded, testing |
| Scale | Petabytes | Gigabytes (practical) |
Core Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ PostgreSQL Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Client ───► Postmaster ───► Backend Process (per connection) │
│ │ │ │
│ │ ┌──────────┴──────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Shared Memory │ │
│ │ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Shared │ │ WAL Buffers │ │ │
│ │ │ Buffers │ │ (Write-Ahead │ │ │
│ │ │ (Page Cache)│ │ Log) │ │ │
│ │ └──────┬──────┘ └────────┬────────┘ │ │
│ └─────────┼──────────────────┼────────────┘ │
│ │ │ │
│ ┌─────────┼──────────────────┼────────────┐ │
│ │ ▼ ▼ │ │
│ Storage │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Data Files │ │ WAL Files │ │ │
│ │ │ (Tables, │ │ (pg_wal/) │ │ │
│ │ │ Indexes) │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ └────────────────────────────────────────┘ │
│ │
│ Background Processes: │
│ • WAL Writer • Checkpointer • Autovacuum │
│ • Background Writer • Stats Collector • WAL Archiver │
│ │
└─────────────────────────────────────────────────────────────────────┘
Fundamental Concepts
- Page-Based Storage
  - Everything is stored in 8KB pages (configurable at compile time)
  - Tables (heaps), indexes, and WAL all use pages
  - Pages contain: a header (24 bytes), item pointers, free space, and tuples
- MVCC (Multi-Version Concurrency Control)
  - Each tuple has xmin (the transaction that inserted it) and xmax (the transaction that deleted or updated it)
  - Readers never block writers, and writers never block readers
  - Old versions accumulate → VACUUM is needed to reclaim them
- Write-Ahead Logging (WAL)
  - All changes are written to WAL before the data files
  - Enables crash recovery: replay the WAL from the last checkpoint
  - Foundation for replication (streaming and logical)
- Transaction Isolation
  - Read Committed (default): each statement sees data committed before it started
  - Repeatable Read: snapshot taken at the start of the transaction
  - Serializable: full isolation (SSI - Serializable Snapshot Isolation)
- Query Processing Pipeline
  - SQL Text → Parser → Rewriter → Planner/Optimizer → Executor → Results
- Index Types
  - B-tree: Default, for equality and range queries
  - Hash: Equality only (rarely used, WAL-logged since v10)
  - GiST: Generalized Search Tree (geometric, full-text)
  - GIN: Generalized Inverted Index (full-text, arrays, JSONB)
  - BRIN: Block Range Index (very large tables with naturally ordered data)
Project List
Projects are ordered from foundational concepts to advanced implementations.
Project 1: Build a Page-Based Storage Engine
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Storage Engines / File Systems
- Software or Tool: Custom Storage Engine
- Main Book: “Database Internals” by Alex Petrov
What you’ll build: A simple page-based storage engine that stores records in 8KB pages, manages free space, and supports insert/read operations. This is the foundation PostgreSQL (and most databases) build upon.
Why it teaches PostgreSQL: PostgreSQL’s heap storage is built on pages. Understanding page layout, tuple storage, and free space management is essential to understanding VACUUM, bloat, and performance.
Core challenges you’ll face:
- Page layout design → maps to PostgreSQL’s PageHeaderData, ItemId, tuples
- Free space management → maps to Free Space Map (FSM)
- Tuple addressing → maps to TID (tuple identifier) = (page, offset)
- Buffer management → maps to shared_buffers, page replacement
Key Concepts:
- Page Structure: PostgreSQL Docs - Database Page Layout
- Heap Files: “Database Internals” Chapter 3 - Alex Petrov
- Buffer Pool: “Database Internals” Chapter 5 - Alex Petrov
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: C programming, understanding of file I/O, basic data structures.
Real world outcome:
# Initialize storage file
$ ./pagestore init mydb.dat
# Insert records
$ ./pagestore insert mydb.dat "Hello, World!"
Inserted at TID (0, 1)
$ ./pagestore insert mydb.dat "PostgreSQL rocks!"
Inserted at TID (0, 2)
# Read by TID
$ ./pagestore read mydb.dat 0 1
Hello, World!
# Show page info
$ ./pagestore pageinfo mydb.dat 0
Page 0:
Lower: 32 (items start)
Upper: 8156 (free space end)
Free Space: 8124 bytes
Items: 2
[1] offset=8176, len=16, data="Hello, World!"
[2] offset=8156, len=20, data="PostgreSQL rocks!"
# Fill up page, see it create a new one
$ for i in {1..400}; do ./pagestore insert mydb.dat "Record $i"; done
...
Inserted at TID (1, 45) # Now using page 1!
Implementation Hints:
PostgreSQL page layout (8KB):
+-------------------+
| PageHeaderData | 24 bytes
| (pd_lsn, pd_lower,|
| pd_upper, etc.) |
+-------------------+
| ItemId Array | 4 bytes each, grows downward
| [1] [2] [3] ... |
+-------------------+
| |
| Free Space |
| |
+-------------------+
| Tuple 3 | Tuples grow upward
| Tuple 2 |
| Tuple 1 |
+-------------------+
| Special Space | (Optional, for indexes)
+-------------------+
Core structures:
#define PAGE_SIZE 8192
typedef struct PageHeader {
uint32_t pd_lsn; // WAL position (we'll simplify)
uint16_t pd_lower; // Offset to start of free space
uint16_t pd_upper; // Offset to end of free space
uint16_t pd_special; // Offset to special space
// ... more fields in real PostgreSQL
} PageHeader;
typedef struct ItemId {
uint16_t lp_off; // Offset to tuple
uint16_t lp_len; // Length of tuple
// In PostgreSQL: also has lp_flags
} ItemId;
Insert algorithm:
- Check if the tuple fits: pd_upper - pd_lower >= sizeof(ItemId) + tuple_size
- If not, allocate a new page
- Write the tuple at pd_upper - tuple_size
- Add an ItemId at position pd_lower
- Update pd_lower and pd_upper
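A minimal C sketch of the insert steps above, assuming the PageHeader and ItemId structs from the previous listing and a raw 8KB page buffer (page_insert and its return convention are illustrative, not PostgreSQL's actual API):
#include <stdint.h>
#include <string.h>
// Returns the new item number (1-based), or 0 if the tuple does not fit.
int page_insert(char *page, const void *tuple, uint16_t tuple_len) {
    PageHeader *hdr = (PageHeader *)page;
    uint16_t needed = sizeof(ItemId) + tuple_len;
    if ((uint16_t)(hdr->pd_upper - hdr->pd_lower) < needed)
        return 0;                              // caller must allocate a new page
    hdr->pd_upper -= tuple_len;                // tuples grow down from the end of the page
    memcpy(page + hdr->pd_upper, tuple, tuple_len);
    ItemId *item = (ItemId *)(page + hdr->pd_lower);
    item->lp_off = hdr->pd_upper;              // points to where the tuple was placed
    item->lp_len = tuple_len;
    hdr->pd_lower += sizeof(ItemId);           // ItemId array grows up toward the tuples
    return (hdr->pd_lower - sizeof(PageHeader)) / sizeof(ItemId);
}
The returned item number plus the page number form the TID that stays stable for the tuple's lifetime.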
Learning milestones:
- Pages store and retrieve data → You understand basic page layout
- Free space is tracked correctly → You understand space management
- Multiple pages work → You understand file growth
- TIDs are stable → You understand tuple addressing
Project 2: Implement MVCC (Multi-Version Concurrency Control)
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, C++
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Database Internals / Concurrency
- Software or Tool: MVCC Implementation
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: Extend your storage engine with MVCC. Each tuple has xmin/xmax transaction IDs. Transactions see consistent snapshots. UPDATEs create new versions.
Why it teaches PostgreSQL: MVCC is PostgreSQL’s core concurrency mechanism. Understanding xmin/xmax, visibility rules, and why dead tuples accumulate explains VACUUM, bloat, and transaction isolation.
Core challenges you’ll face:
- Transaction ID management → maps to xid, xid wraparound
- Visibility rules → maps to snapshot, xmin/xmax checking
- UPDATE as DELETE+INSERT → maps to PostgreSQL’s HOT, bloat
- Snapshot isolation → maps to transaction isolation levels
Resources for key challenges:
- PostgreSQL MVCC Internals - Detailed xmin/xmax explanation
- The Internals of PostgreSQL - MVCC - Comprehensive guide
Key Concepts:
- MVCC: “Designing Data-Intensive Applications” Chapter 7 - Martin Kleppmann
- Transaction Snapshots: PostgreSQL Docs - MVCC Intro
- Tuple Visibility: PostgreSQL Source - heapam_visibility.c
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 1 completed, understanding of transactions.
Real world outcome:
# Start two concurrent transactions
$ ./mvccdb shell
mvcc> BEGIN;
Transaction 100 started
mvcc> INSERT INTO users VALUES (1, 'Alice');
Inserted at TID (0, 1), xmin=100
mvcc> SELECT * FROM users;
(1, 'Alice')
# In another terminal, start transaction 101
mvcc2> BEGIN;
Transaction 101 started
mvcc2> SELECT * FROM users;
(empty) # Can't see uncommitted row from tx 100!
# Back in first terminal
mvcc> COMMIT;
Transaction 100 committed
# Now second transaction can see it
mvcc2> SELECT * FROM users;
(1, 'Alice')
# Update creates new version
mvcc2> UPDATE users SET name = 'Alicia' WHERE id = 1;
Updated: old TID (0,1) xmax=101, new TID (0,2) xmin=101
# Show tuple versions
mvcc2> SELECT * FROM users WITH VERSIONS;
TID(0,1): (1, 'Alice') xmin=100 xmax=101 [DEAD]
TID(0,2): (1, 'Alicia') xmin=101 xmax=0 [LIVE]
Implementation Hints:
Tuple header with MVCC fields:
typedef struct HeapTupleHeader {
uint32_t t_xmin; // Inserting transaction ID
uint32_t t_xmax; // Deleting/updating transaction ID (0 if none)
uint32_t t_cid; // Command ID within transaction
ItemPointerData t_ctid; // Current TID of this or newer version
// ... data follows
} HeapTupleHeader;
Transaction snapshot:
typedef struct Snapshot {
uint32_t xmin; // All xids < xmin are visible
uint32_t xmax; // All xids >= xmax are invisible
uint32_t *xip; // Array of in-progress xids
int xip_count;
} Snapshot;
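One way such a snapshot can be built, sketched under the assumption of a small global registry of running transactions (running_xids, num_running, and next_xid are placeholders for whatever bookkeeping your transaction manager keeps; they are not PostgreSQL names):
#include <stdint.h>
#include <stdlib.h>
#define MAX_BACKENDS 64
extern uint32_t running_xids[MAX_BACKENDS];   // xids currently in progress
extern int      num_running;
extern uint32_t next_xid;                     // next xid to be assigned
Snapshot *get_snapshot(void) {
    Snapshot *snap = malloc(sizeof(Snapshot));
    snap->xip = malloc(MAX_BACKENDS * sizeof(uint32_t));
    snap->xip_count = 0;
    snap->xmax = next_xid;                    // anything starting later is invisible
    snap->xmin = snap->xmax;
    for (int i = 0; i < num_running; i++) {
        uint32_t xid = running_xids[i];
        snap->xip[snap->xip_count++] = xid;   // in progress: invisible even if it commits later
        if (xid < snap->xmin)
            snap->xmin = xid;                 // oldest in-progress xid
    }
    return snap;
}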
Visibility check (simplified PostgreSQL logic):
bool tuple_is_visible(HeapTupleHeader *tuple, Snapshot *snapshot) {
// Tuple was inserted by...
if (tuple->t_xmin == my_transaction_id) {
// ...me, and not yet deleted by me
if (tuple->t_xmax == 0 || tuple->t_xmax != my_transaction_id)
return true;
return false; // I deleted it
}
// Inserted by committed transaction visible in my snapshot?
if (!xid_is_visible(tuple->t_xmin, snapshot))
return false;
// Not deleted, or deleted by transaction not visible to me?
if (tuple->t_xmax == 0)
return true;
if (!xid_is_visible(tuple->t_xmax, snapshot))
return true; // Deletion not visible yet
return false; // Deleted by visible transaction
}
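The xid_is_visible helper used above boils down to a check against the snapshot plus a commit-log lookup; a sketch (xid_is_committed stands in for your CLOG equivalent):
bool xid_is_visible(uint32_t xid, Snapshot *snapshot) {
    if (xid >= snapshot->xmax)
        return false;                         // started after the snapshot was taken
    for (int i = 0; i < snapshot->xip_count; i++)
        if (snapshot->xip[i] == xid)
            return false;                     // was still in progress at snapshot time
    return xid_is_committed(xid);             // aborted transactions are never visible
}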
UPDATE flow:
- Find the current tuple and set its t_xmax = current_xid
- Insert the new tuple with t_xmin = current_xid
- Update the old tuple's t_ctid to point to the new tuple
Learning milestones:
- Transactions see only committed data → You understand visibility
- Concurrent transactions don’t interfere → You understand isolation
- UPDATE creates new version → You understand why VACUUM is needed
- Snapshot sees consistent state → You understand snapshot isolation
Project 3: Build a Write-Ahead Log (WAL)
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Database Internals / Crash Recovery
- Software or Tool: WAL Implementation
- Main Book: “Database Internals” by Alex Petrov
What you’ll build: A Write-Ahead Log that records all changes before they hit data files. Implement crash recovery by replaying the log.
Why it teaches PostgreSQL: WAL is the foundation of PostgreSQL’s durability and replication. Understanding LSN, checkpoints, and recovery explains how PostgreSQL survives crashes and how streaming replication works.
Core challenges you’ll face:
- Log record format → maps to XLogRecord structure
- Write ordering (WAL before data) → maps to WAL protocol
- Checkpoints → maps to reducing recovery time
- Crash recovery → maps to redo from checkpoint
Key Concepts:
- WAL Internals: PostgreSQL Docs - WAL Internals
- Crash Recovery: “Database Internals” Chapter 9 - Alex Petrov
- ARIES Algorithm: The classic WAL algorithm (PostgreSQL is influenced by it)
Difficulty: Expert. Time estimate: 3 weeks. Prerequisites: Projects 1-2 completed.
Real world outcome:
# Insert some data
$ ./waldb insert "Record 1"
LSN 0/00000001: INSERT Record 1
Inserted at TID (0, 1)
$ ./waldb insert "Record 2"
LSN 0/00000028: INSERT Record 2
Inserted at TID (0, 2)
# Show WAL contents
$ ./waldb walinfo
WAL Segment: 000000010000000000000001
Record at 0/00000001: INSERT len=16 "Record 1"
Record at 0/00000028: INSERT len=16 "Record 2"
Record at 0/00000050: CHECKPOINT
# Simulate crash (kill without proper shutdown)
$ ./waldb insert "Record 3"
LSN 0/00000078: INSERT Record 3
$ kill -9 $(pidof waldb)
# Data file might be corrupted/incomplete, but WAL has the record
$ ./waldb recover
Recovering from checkpoint at LSN 0/00000050
Replaying: INSERT at 0/00000078
Recovery complete. 1 records replayed.
$ ./waldb select
TID(0,1): Record 1
TID(0,2): Record 2
TID(0,3): Record 3 # Recovered!
Implementation Hints:
WAL record structure:
typedef struct WALRecord {
uint32_t xl_tot_len; // Total length of record
uint32_t xl_xid; // Transaction ID
uint64_t xl_prev; // LSN of previous record
uint8_t xl_info; // Flag bits
uint8_t xl_rmid; // Resource manager ID (heap, btree, etc.)
// Followed by record-specific data
} WALRecord;
typedef struct HeapInsertRecord {
WALRecord header;
ItemPointerData target_tid;
uint16_t data_len;
// Followed by tuple data
} HeapInsertRecord;
Write-ahead protocol:
void insert_tuple(Page *page, Tuple *tuple) {
// 1. Generate WAL record FIRST
WALRecord *record = create_insert_record(page, tuple);
uint64_t lsn = wal_insert(record);
// 2. Modify page in buffer
page_insert_tuple(page, tuple);
// 3. Mark page with LSN (for checkpoint)
page->pd_lsn = lsn;
// Note: Page might not be written to disk yet!
// That's okay - WAL guarantees durability
}
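A sketch of wal_insert under a deliberately simplified model where the LSN is just a byte offset into one in-memory WAL buffer (wal_buffer, wal_flush, and the fixed 1 MB size are assumptions of this sketch; there is no segment switching or wraparound handling):
#include <stdint.h>
#include <string.h>
#include <unistd.h>
static char     wal_buffer[1 << 20];   // single in-memory WAL buffer
static uint64_t wal_write_pos = 0;     // next byte to write; doubles as the current LSN
static uint64_t wal_flushed_pos = 0;   // everything before this is safely on disk
static uint64_t wal_prev_lsn = 0;
// Appends a record and returns the LSN at which it starts.
uint64_t wal_insert(WALRecord *record) {
    uint64_t lsn = wal_write_pos;
    record->xl_prev = wal_prev_lsn;                     // back-chain to the previous record
    memcpy(wal_buffer + wal_write_pos, record, record->xl_tot_len);
    wal_write_pos += record->xl_tot_len;
    wal_prev_lsn = lsn;
    return lsn;
}
// Called at commit: the WAL must reach disk before the transaction is durable.
void wal_flush(int wal_fd) {
    write(wal_fd, wal_buffer + wal_flushed_pos, wal_write_pos - wal_flushed_pos);
    fsync(wal_fd);                                      // data pages may still be dirty in memory
    wal_flushed_pos = wal_write_pos;
}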
Checkpoint:
void checkpoint() {
// 1. Write checkpoint record to WAL
uint64_t redo_lsn = current_wal_lsn();
wal_insert(create_checkpoint_record(redo_lsn));
// 2. Flush all dirty pages to disk
for (each dirty_page in buffer_pool) {
write_page_to_disk(dirty_page);
}
// 3. Update control file with checkpoint location
update_control_file(redo_lsn);
}
Recovery:
void recover() {
// 1. Find last checkpoint from control file
uint64_t redo_lsn = read_control_file();
// 2. Replay all WAL records from that point
WALReader *reader = open_wal_at(redo_lsn);
WALRecord *record;
while ((record = wal_read_next(reader)) != NULL) {
replay_record(record); // Apply change to data page
}
}
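The replay_record step is where idempotence matters: a page whose pd_lsn is already at or past the record's LSN must not be modified again. A sketch for the heap-insert record type defined above (the record's starting LSN is passed in alongside it here, and RM_HEAP, read_page_for_tid, page_insert_at, and mark_page_dirty are illustrative helpers):
void replay_record(WALRecord *record, uint64_t lsn) {
    if (record->xl_rmid != RM_HEAP)
        return;                                   // this sketch only replays heap inserts
    HeapInsertRecord *rec = (HeapInsertRecord *)record;
    Page *page = read_page_for_tid(&rec->target_tid);
    if (page->pd_lsn >= lsn)
        return;                                   // page already contains this change
    char *tuple_data = (char *)rec + sizeof(HeapInsertRecord);
    page_insert_at(page, &rec->target_tid, tuple_data, rec->data_len);
    page->pd_lsn = lsn;                           // stamp the page so replay stays idempotent
    mark_page_dirty(page);
}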
Learning milestones:
- WAL records are written correctly → You understand the format
- Data survives simulated crash → You understand the protocol
- Recovery replays from checkpoint → You understand crash recovery
- Checkpoints limit recovery time → You understand the tradeoff
Project 4: Implement a B-tree Index
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Data Structures / Indexing
- Software or Tool: B-tree Implementation
- Main Book: “Database Internals” by Alex Petrov
What you’ll build: A page-based B-tree index that stores keys and TIDs (pointers to heap tuples). Support insert, search, and range scans.
Why it teaches PostgreSQL: B-tree is PostgreSQL’s default index type. Understanding how keys map to TIDs, page splits, and leaf page linking explains index scans, index bloat, and REINDEX.
Core challenges you’ll face:
- Page-based B-tree structure → maps to internal vs leaf pages
- Page splits → maps to index growth, bloat
- TID storage → maps to non-clustered index design
- Range scans via leaf linking → maps to index-only scans
Key Concepts:
- B-tree Implementation: PostgreSQL Docs - B-tree Implementation
- B-tree Algorithms: “Database Internals” Chapters 2-4 - Alex Petrov
- PostgreSQL Indexes: PostgresPro - Indexes in PostgreSQL
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 1 completed, B-tree theory.
Real world outcome:
# Create index on existing heap data
$ ./btreedb createindex users_name_idx users name
# Insert data (goes to heap, index updated)
$ ./btreedb insert users '(1, "Alice")'
Heap: TID (0, 1)
Index: "Alice" -> (0, 1)
$ ./btreedb insert users '(2, "Bob")'
$ ./btreedb insert users '(3, "Charlie")'
# Exact match lookup
$ ./btreedb lookup users_name_idx "Bob"
Found: "Bob" at TID (0, 2)
Fetching from heap: (2, "Bob")
# Range scan
$ ./btreedb range users_name_idx "A" "C"
"Alice" -> (0, 1) -> (1, "Alice")
"Bob" -> (0, 2) -> (2, "Bob")
# Show index structure
$ ./btreedb indexinfo users_name_idx
B-tree Index: users_name_idx
Height: 1
Root Page: 0
Leaf Pages: 1
Page 0 (LEAF):
[1] key="Alice" tid=(0,1)
[2] key="Bob" tid=(0,2)
[3] key="Charlie" tid=(0,3)
right_sibling=NULL
# Insert enough to cause page split
$ for i in {4..500}; do ./btreedb insert users "($i, \"User$i\")"; done
$ ./btreedb indexinfo users_name_idx
B-tree Index: users_name_idx
Height: 2
Root Page: 2 (INTERNAL)
Leaf Pages: 2
Page 2 (ROOT/INTERNAL):
[1] key="Alice" child_page=0
[2] key="User250" child_page=1
Page 0 (LEAF): [Alice..User249] right_sibling=1
Page 1 (LEAF): [User250..User99] right_sibling=NULL
Implementation Hints:
B-tree page types:
#define BT_LEAF 1
#define BT_INTERNAL 2
#define BT_ROOT 4
typedef struct BTPageOpaque {
uint32_t btpo_prev; // Left sibling (for range scans)
uint32_t btpo_next; // Right sibling
uint16_t btpo_flags; // LEAF, INTERNAL, ROOT
uint16_t btpo_level; // 0 for leaf, increases upward
} BTPageOpaque;
// Leaf tuple: key + TID
typedef struct BTLeafTuple {
ItemPointerData t_tid; // Points to heap tuple
// Followed by key data
} BTLeafTuple;
// Internal tuple: key + child page
typedef struct BTInternalTuple {
uint32_t child_page; // Points to child page
// Followed by key data (high key of child)
} BTInternalTuple;
Search algorithm:
ItemPointer btree_search(BTIndex *index, Datum key) {
Page *page = read_page(index, index->root_page);
// Descend to leaf
while (!is_leaf(page)) {
int child = find_child_for_key(page, key);
page = read_page(index, child);
}
// Binary search within leaf
int pos = binary_search(page, key);
if (pos >= 0) {
return get_tid_at(page, pos);
}
return NULL; // Not found
}
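The find_child_for_key step in the descent is a binary search over the internal page's separator keys; a sketch assuming the BTInternalTuple layout above plus a few illustrative accessors (get_internal_tuple, internal_tuple_key, compare_keys):
// Returns the child page to descend into: the child whose separator key
// is the greatest key less than or equal to the search key.
uint32_t find_child_for_key(Page *page, Datum key) {
    int lo = 0, hi = page_tuple_count(page) - 1, result = 0;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        BTInternalTuple *tup = get_internal_tuple(page, mid);
        if (compare_keys(internal_tuple_key(tup), key) <= 0) {
            result = mid;         // candidate: separator <= key, try further right
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return get_internal_tuple(page, result)->child_page;
}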
Insert with split:
void btree_insert(BTIndex *index, Datum key, ItemPointer tid) {
// Find leaf page
Page *leaf = find_leaf_for_key(index, key);
if (has_space(leaf, key)) {
insert_into_page(leaf, key, tid);
} else {
// Split!
Page *new_leaf = allocate_page();
Datum split_key = split_page(leaf, new_leaf, key, tid);
// Insert split_key into parent
btree_insert_parent(index, leaf, new_leaf, split_key);
}
}
Learning milestones:
- Search finds correct tuple → You understand B-tree traversal
- Page splits work correctly → You understand B-tree growth
- Range scans follow leaf links → You understand linked leaves
- Index stays balanced → You understand B-tree properties
Project 5: Build a VACUUM Process
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Database Internals / Maintenance
- Software or Tool: VACUUM Implementation
- Main Book: “PostgreSQL 14 Internals” by Egor Rogov
What you’ll build: A VACUUM process that scans tables, identifies dead tuples (invisible to all transactions), and reclaims their space for reuse.
Why it teaches PostgreSQL: VACUUM is essential because PostgreSQL’s MVCC leaves dead tuples behind. Understanding VACUUM explains bloat, autovacuum tuning, and the famous “transaction ID wraparound” problem.
Core challenges you’ll face:
- Identifying dead tuples → maps to visibility rules, oldest active transaction
- Reclaiming space → maps to marking space free vs VACUUM FULL
- Updating visibility map → maps to optimization for future vacuums
- Transaction ID freezing → maps to wraparound prevention
Key Concepts:
- VACUUM Process: PostgreSQL Docs - VACUUM
- Dead Tuples: Dead Tuples and VACUUM
- Autovacuum: Tuning Autovacuum
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Projects 1-2 completed.
Real world outcome:
# Create table and do updates (creating dead tuples)
$ ./vacuumdb shell
> INSERT INTO users VALUES (1, 'Alice');
TID (0, 1), xmin=100
> UPDATE users SET name = 'Alicia' WHERE id = 1;
Old: TID (0, 1), xmax=101 [DEAD]
New: TID (0, 2), xmin=101
> UPDATE users SET name = 'ALICIA' WHERE id = 1;
Old: TID (0, 2), xmax=102 [DEAD]
New: TID (0, 3), xmin=102
# Check table stats
> \dt+ users
Table: users
Live tuples: 1
Dead tuples: 2
Table size: 8192 bytes (1 page)
# Run VACUUM
> VACUUM users;
Scanning page 0...
Tuple (0, 1): DEAD (xmax 101 < oldest_active 103) - marking for reuse
Tuple (0, 2): DEAD (xmax 102 < oldest_active 103) - marking for reuse
Tuple (0, 3): LIVE
Vacuum complete: 2 dead tuples reclaimed
> \dt+ users
Table: users
Live tuples: 1
Dead tuples: 0
Table size: 8192 bytes (1 page)
Free space: 168 bytes reclaimed
# New inserts reuse space
> INSERT INTO users VALUES (2, 'Bob');
TID (0, 1) # Reused!
Implementation Hints:
VACUUM algorithm:
void vacuum_table(Table *table) {
// 1. Get oldest active transaction (horizon)
uint32_t oldest_xid = get_oldest_active_xid();
// 2. Scan all pages
for (int pageno = 0; pageno < table->num_pages; pageno++) {
Page *page = read_page(table, pageno);
bool page_has_dead = false;
// Check each tuple
for (int i = 0; i < page_tuple_count(page); i++) {
HeapTuple *tuple = get_tuple(page, i);
if (tuple_is_dead(tuple, oldest_xid)) {
// Mark as dead - space can be reused
mark_dead(page, i);
page_has_dead = true;
stats.dead_tuples_removed++;
}
}
// 3. Compact page if needed (optional, defrag)
if (page_has_dead) {
compact_page(page);
}
// 4. Update visibility map
if (page_all_visible(page)) {
set_visibility_map_bit(table, pageno);
}
}
}
bool tuple_is_dead(HeapTuple *tuple, uint32_t oldest_xid) {
// Tuple is dead if:
// - It was deleted (xmax != 0)
// - The deleting transaction committed (xmax is committed)
// - No active transaction can see it (xmax < oldest_xid)
return tuple->t_xmax != 0 &&
xid_is_committed(tuple->t_xmax) &&
tuple->t_xmax < oldest_xid;
}
Visibility Map:
- 2 bits per page: “all-visible” and “all-frozen”
- “All-visible” pages can be skipped by VACUUM (no dead tuples possible)
- “All-frozen” pages can be skipped even for anti-wraparound vacuum
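A minimal sketch of that visibility map as a plain in-memory byte array, two flag bits per heap page (MAX_PAGES and the helper names are assumptions of this sketch; PostgreSQL stores the map in a separate fork of the relation):
#include <stdbool.h>
#include <stdint.h>
#define MAX_PAGES      65536
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
static uint8_t vm_bits[MAX_PAGES];               // one byte per heap page, two bits used
void set_visibility_map_bit(Table *table, int pageno) {
    vm_bits[pageno] |= VM_ALL_VISIBLE;           // VACUUM found no dead tuples on the page
}
void clear_visibility_map_bit(Table *table, int pageno) {
    vm_bits[pageno] = 0;                         // any INSERT/UPDATE/DELETE must clear both bits
}
bool vacuum_can_skip_page(int pageno, bool antiwraparound) {
    // Regular VACUUM may skip all-visible pages; anti-wraparound
    // (freezing) VACUUM may only skip pages that are also all-frozen.
    return antiwraparound ? (vm_bits[pageno] & VM_ALL_FROZEN) != 0
                          : (vm_bits[pageno] & VM_ALL_VISIBLE) != 0;
}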
Learning milestones:
- Dead tuples are identified correctly → You understand visibility
- Space is reclaimed → You understand free space management
- Visibility map is updated → You understand optimization
- Transaction freezing works → You understand wraparound prevention
Project 6: Build a Query Parser
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C (with Flex/Bison) or Python
- Alternative Programming Languages: Rust (nom), Go, Java (ANTLR)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Compilers / Language Processing
- Software or Tool: SQL Parser
- Main Book: “Language Implementation Patterns” by Terence Parr
What you’ll build: A SQL parser that converts SQL text into an Abstract Syntax Tree (AST). Support SELECT, INSERT, UPDATE, DELETE with WHERE clauses.
Why it teaches PostgreSQL: Every query goes through the parser first. Understanding how SQL becomes a parse tree explains error messages, syntax variations, and the first step of query processing.
Core challenges you’ll face:
- Lexical analysis → maps to tokenizing SQL keywords, identifiers, literals
- Grammar definition → maps to SQL syntax rules
- AST construction → maps to parse tree structure
- Error reporting → maps to syntax error messages
Key Concepts:
- PostgreSQL Parser: PostgreSQL Source - gram.y
- SQL Grammar: SQL standard, PostgreSQL extensions
- Parser Generators: Flex/Bison, ANTLR
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Understanding of parsing theory, basic SQL knowledge.
Real world outcome:
$ ./sqlparse "SELECT name, age FROM users WHERE age > 21"
Parse Tree:
SelectStmt
├── targetList
│ ├── ResTarget(name="name")
│ └── ResTarget(name="age")
├── fromClause
│ └── RangeVar(relname="users")
└── whereClause
└── A_Expr(kind=AEXPR_OP)
├── name: ">"
├── lexpr: ColumnRef(name="age")
└── rexpr: A_Const(val=21)
$ ./sqlparse "INSERT INTO users (name, age) VALUES ('Alice', 30)"
Parse Tree:
InsertStmt
├── relation: RangeVar(relname="users")
├── cols
│ ├── ResTarget(name="name")
│ └── ResTarget(name="age")
└── selectStmt
└── SelectStmt(valuesLists)
└── List
├── A_Const(val='Alice')
└── A_Const(val=30)
$ ./sqlparse "SELEC * FROM users"
Error at line 1, column 1: syntax error, unexpected IDENT, expecting SELECT
Implementation Hints:
Token types:
typedef enum TokenType {
T_SELECT, T_INSERT, T_UPDATE, T_DELETE,
T_FROM, T_WHERE, T_INTO, T_VALUES, T_SET,
T_AND, T_OR, T_NOT,
T_EQ, T_NE, T_LT, T_GT, T_LE, T_GE,
T_LPAREN, T_RPAREN, T_COMMA, T_SEMICOLON,
T_IDENT, T_STRING, T_NUMBER,
T_EOF
} TokenType;
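A hand-rolled next_token sketch covering a few of those token types, to make the lexing step concrete (the Lexer struct and this keyword handling are illustrative; Flex generates the equivalent from regular-expression rules):
#include <ctype.h>
#include <strings.h>
typedef struct Lexer { const char *src; int pos; } Lexer;
TokenType next_token(Lexer *lx, char *text, int textcap) {
    while (isspace((unsigned char)lx->src[lx->pos])) lx->pos++;    // skip whitespace
    char c = lx->src[lx->pos];
    if (c == '\0') return T_EOF;
    if (c == ',') { lx->pos++; return T_COMMA; }
    if (c == '(') { lx->pos++; return T_LPAREN; }
    if (c == ')') { lx->pos++; return T_RPAREN; }
    if (isdigit((unsigned char)c)) {                               // numeric literal
        int i = 0;
        while (isdigit((unsigned char)lx->src[lx->pos]) && i < textcap - 1)
            text[i++] = lx->src[lx->pos++];
        text[i] = '\0';
        return T_NUMBER;
    }
    if (isalpha((unsigned char)c) || c == '_') {                   // keyword or identifier
        int i = 0;
        while ((isalnum((unsigned char)lx->src[lx->pos]) || lx->src[lx->pos] == '_')
               && i < textcap - 1)
            text[i++] = lx->src[lx->pos++];
        text[i] = '\0';
        if (strcasecmp(text, "SELECT") == 0) return T_SELECT;
        if (strcasecmp(text, "FROM")   == 0) return T_FROM;
        if (strcasecmp(text, "WHERE")  == 0) return T_WHERE;
        return T_IDENT;                                            // not a keyword
    }
    lx->pos++;                                                     // operators, string quotes, etc.
    return T_EOF;                                                  // a real lexer handles these too
}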
AST node types:
typedef struct Node {
NodeType type;
} Node;
typedef struct SelectStmt {
NodeType type; // T_SelectStmt
List *targetList; // What to select
List *fromClause; // Tables
Node *whereClause; // Conditions
List *groupClause; // GROUP BY
List *orderClause; // ORDER BY
Node *limitCount; // LIMIT
} SelectStmt;
typedef struct A_Expr {
NodeType type; // T_A_Expr
A_Expr_Kind kind; // AEXPR_OP, AEXPR_AND, AEXPR_OR
char *name; // Operator name
Node *lexpr; // Left operand
Node *rexpr; // Right operand
} A_Expr;
Using Bison (simplified):
select_stmt:
SELECT target_list FROM from_clause where_clause
{
SelectStmt *n = makeNode(SelectStmt);
n->targetList = $2;
n->fromClause = $4;
n->whereClause = $5;
$$ = (Node *)n;
}
;
where_clause:
/* empty */ { $$ = NULL; }
| WHERE a_expr { $$ = $2; }
;
Learning milestones:
- Simple SELECT parses → You understand basic grammar
- WHERE with operators works → You understand expression parsing
- All CRUD statements parse → You have complete coverage
- Error messages are helpful → You understand error recovery
Project 7: Build a Query Planner/Optimizer
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Rust, Go, Java
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Database Internals / Query Optimization
- Software or Tool: Query Planner
- Main Book: “Database Internals” by Alex Petrov
What you’ll build: A cost-based query planner that takes a parse tree, considers available indexes and table statistics, and produces an execution plan. Output EXPLAIN-style plans.
Why it teaches PostgreSQL: The planner decides whether to use an index scan or sequential scan, which join algorithm to use, and in what order to join tables. Understanding this is essential for query tuning.
Core challenges you’ll face:
- Plan enumeration → maps to generating alternative plans
- Cost estimation → maps to I/O cost, CPU cost
- Statistics usage → maps to pg_stats, row estimates
- Join ordering → maps to combinatorial explosion, GEQO
Key Concepts:
- PostgreSQL Planner: PostgreSQL Docs - Planner/Optimizer
- Cost Model: Bruce Momjian’s Optimizer PDF
- Join Algorithms: Nested Loop, Hash Join, Merge Join
Difficulty: Expert. Time estimate: 4 weeks. Prerequisites: Project 6 completed, statistics basics.
Real world outcome:
$ ./queryplan "SELECT * FROM users WHERE id = 42"
Table Statistics:
users: 100,000 rows, 1000 pages
users_pkey (btree on id): 300 pages, unique
Plan 1: Sequential Scan
Scan all 1000 pages, filter for id=42
Estimated cost: 1000.00 (I/O) + 1000.00 (CPU filter)
Estimated rows: 1
Plan 2: Index Scan using users_pkey
Descend B-tree (3 levels) + read 1 heap page
Estimated cost: 4.00 (index I/O) + 1.00 (heap I/O)
Estimated rows: 1
CHOSEN: Plan 2 (Index Scan) - Cost 5.00 vs 2000.00
EXPLAIN output:
Index Scan using users_pkey on users (cost=0.42..5.00 rows=1)
Index Cond: (id = 42)
$ ./queryplan "SELECT * FROM orders JOIN users ON orders.user_id = users.id"
Join Planning:
orders: 1,000,000 rows
users: 100,000 rows
Plan 1: Nested Loop (orders outer)
For each of 1M orders, index lookup on users
Cost: 1,000,000 * 5.00 = 5,000,000.00
Plan 2: Hash Join
Build hash on users (smaller): 100,000 * 0.01 = 1,000.00
Probe with orders: 1,000,000 * 0.001 = 1,000.00
Cost: 2,000.00
Plan 3: Merge Join
Sort orders: 1,000,000 * log(1M) * 0.01 = 200,000.00
Sort users: 100,000 * log(100K) * 0.01 = 17,000.00
Merge: 1,100,000 * 0.001 = 1,100.00
Cost: 218,100.00
CHOSEN: Plan 2 (Hash Join) - Cost 2,000.00
EXPLAIN output:
Hash Join (cost=2500.00..5000.00 rows=1000000)
Hash Cond: (orders.user_id = users.id)
-> Seq Scan on orders (cost=0.00..1500.00 rows=1000000)
-> Hash (cost=1000.00..1000.00 rows=100000)
-> Seq Scan on users (cost=0.00..1000.00 rows=100000)
Implementation Hints:
Cost model components:
typedef struct PlanCost {
double startup_cost; // Cost before first row
double total_cost; // Cost for all rows
double rows; // Estimated row count
double width; // Average row width
} PlanCost;
// Cost parameters (like PostgreSQL's)
#define SEQ_PAGE_COST 1.0
#define RANDOM_PAGE_COST 4.0
#define CPU_TUPLE_COST 0.01
#define CPU_INDEX_TUPLE_COST 0.005
#define CPU_OPERATOR_COST 0.0025
Sequential scan cost:
PlanCost cost_seqscan(Table *table, Qual *qual) {
PlanCost cost;
cost.startup_cost = 0;
cost.total_cost = table->pages * SEQ_PAGE_COST +
table->rows * CPU_TUPLE_COST;
cost.rows = estimate_rows(table, qual);
return cost;
}
Index scan cost:
PlanCost cost_indexscan(Index *index, Table *table, Qual *qual) {
PlanCost cost;
double selectivity = estimate_selectivity(qual, index);
double index_pages = index->pages * selectivity;
double heap_pages = table->pages * selectivity;
cost.startup_cost = 0;
cost.total_cost = index_pages * RANDOM_PAGE_COST + // Index I/O
heap_pages * RANDOM_PAGE_COST + // Heap I/O
(table->rows * selectivity) * CPU_INDEX_TUPLE_COST;
cost.rows = table->rows * selectivity;
return cost;
}
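The estimate_selectivity call above is where statistics enter the picture. A sketch using per-column statistics similar in spirit to what ANALYZE collects (ColumnStats, the Qual fields, and the fallback constant are assumptions of this sketch, and the signature differs slightly from the call above, which passes the index):
typedef struct ColumnStats {
    double n_distinct;     // estimated number of distinct values in the column
    double min_value;      // low end of the observed range (numeric columns)
    double max_value;      // high end of the observed range
} ColumnStats;
double estimate_selectivity(Qual *qual, ColumnStats *stats) {
    switch (qual->op) {
    case OP_EQ:
        // Equality: assume a uniform distribution over the distinct values.
        return 1.0 / stats->n_distinct;
    case OP_LT:
    case OP_GT: {
        // Range: linear interpolation of the constant within [min, max].
        double span = stats->max_value - stats->min_value;
        if (span <= 0)
            return 0.5;
        double frac = (qual->constant - stats->min_value) / span;
        if (frac < 0) frac = 0;
        if (frac > 1) frac = 1;
        return (qual->op == OP_LT) ? frac : 1.0 - frac;
    }
    default:
        return 0.33;       // coarse fallback when no better estimate is available
    }
}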
Join planning:
Plan *plan_join(Table *outer, Table *inner, JoinType type) {
Plan *best = NULL;
double best_cost = INFINITY;
// Try nested loop
Plan *nl = create_nested_loop(outer, inner);
if (nl->cost.total_cost < best_cost) {
best = nl;
best_cost = nl->cost.total_cost;
}
// Try hash join (if equijoin)
if (has_equality_condition(type)) {
Plan *hj = create_hash_join(outer, inner);
if (hj->cost.total_cost < best_cost) {
best = hj;
best_cost = hj->cost.total_cost;
}
}
// Try merge join (if sortable)
Plan *mj = create_merge_join(outer, inner);
if (mj->cost.total_cost < best_cost) {
best = mj;
best_cost = mj->cost.total_cost;
}
return best;
}
Learning milestones:
- Chooses index scan over seq scan appropriately → You understand cost modeling
- Join algorithm selection works → You understand join costs
- Estimates match reality (roughly) → You understand statistics
- EXPLAIN output matches PostgreSQL format → Full understanding
Project 8: Build a Query Executor
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Database Internals / Execution
- Software or Tool: Query Executor
- Main Book: “Database Internals” by Alex Petrov
What you’ll build: An executor that takes a query plan and produces results. Implement the iterator model (Volcano) with operators like SeqScan, IndexScan, Filter, Join, Sort, Aggregate.
Why it teaches PostgreSQL: The executor is where queries actually run. Understanding the iterator model explains EXPLAIN ANALYZE timing, memory usage, and why some queries are slow.
Core challenges you’ll face:
- Iterator model → maps to Open/Next/Close per operator
- Operator implementation → maps to scan, join, sort nodes
- Memory management → maps to work_mem, spilling to disk
- Pipeline vs blocking → maps to why Sort blocks, Filter doesn’t
Key Concepts:
- Volcano Iterator Model: Classic paper by Goetz Graefe
- PostgreSQL Executor: PostgreSQL Source - execMain.c
- Operator Types: Scan, Join, Sort, Aggregate, Limit
Difficulty: Advanced. Time estimate: 3 weeks. Prerequisites: Projects 1-7 helpful.
Real world outcome:
$ ./executor run "SELECT name FROM users WHERE age > 21 ORDER BY name LIMIT 10"
Execution Plan:
Limit (limit=10)
└── Sort (key=name)
└── Filter (age > 21)
└── SeqScan (users)
Execution Trace:
[Limit] Open()
[Sort] Open()
[Filter] Open()
[SeqScan] Open() - table users
[Filter] Next() -> ('Alice', 25) ✓ passes filter
[Filter] Next() -> ('Bob', 19) ✗ filtered out
[Filter] Next() -> ('Charlie', 30) ✓ passes filter
... (reading all rows into Sort)
[Sort] sorting 4,521 rows by name
[Sort] Next() -> ('Aaron', 22)
[Limit] Next() -> ('Aaron', 22)
Output: Aaron
[Limit] Next() -> ('Abby', 25)
Output: Abby
... (8 more)
[Limit] count=10, stopping
Results:
Aaron
Abby
Adam
...
(10 rows)
Execution Stats:
SeqScan: 10000 rows scanned, 50ms
Filter: 4521 rows passed (45.2%)
Sort: 4521 rows sorted, 120ms, 2MB memory
Limit: 10 rows returned
Total: 175ms
Implementation Hints:
Iterator interface:
typedef struct PlanState {
NodeType type;
Plan *plan;
struct PlanState *lefttree;
struct PlanState *righttree;
// Methods
void (*Open)(struct PlanState *);
TupleTableSlot *(*Next)(struct PlanState *);
void (*Close)(struct PlanState *);
} PlanState;
SeqScan implementation:
typedef struct SeqScanState {
PlanState ps;
HeapScanDesc scan;
int current_page;
int current_tuple;
} SeqScanState;
void SeqScan_Open(PlanState *ps) {
SeqScanState *ss = (SeqScanState *)ps;
ss->scan = heap_beginscan(ss->ps.plan->table);
ss->current_page = 0;
ss->current_tuple = 0;
}
TupleTableSlot *SeqScan_Next(PlanState *ps) {
SeqScanState *ss = (SeqScanState *)ps;
HeapTuple tuple = heap_getnext(ss->scan);
if (tuple == NULL)
return NULL;
return MakeTupleTableSlot(tuple);
}
Filter implementation:
TupleTableSlot *Filter_Next(PlanState *ps) {
FilterState *fs = (FilterState *)ps;
while (true) {
TupleTableSlot *slot = fs->ps.lefttree->Next(fs->ps.lefttree);
if (slot == NULL)
return NULL;
if (ExecQual(fs->qual, slot))
return slot; // Passes filter
// else continue to next tuple
}
}
Sort implementation (blocking):
void Sort_Open(PlanState *ps) {
SortState *ss = (SortState *)ps;
ss->ps.lefttree->Open(ss->ps.lefttree);
// Must read ALL input before producing output
ss->tuples = NIL;
TupleTableSlot *slot;
while ((slot = ss->ps.lefttree->Next(ss->ps.lefttree)) != NULL) {
ss->tuples = lappend(ss->tuples, copy_slot(slot));
}
// Sort in memory (or spill to disk if too large)
qsort(ss->tuples, ...);
ss->current = 0;
}
TupleTableSlot *Sort_Next(PlanState *ps) {
SortState *ss = (SortState *)ps;
if (ss->current >= list_length(ss->tuples))
return NULL;
return list_nth(ss->tuples, ss->current++);
}
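The join milestone below needs at least one join operator. A minimal nested-loop join Next() in the same iterator style (NestLoopState, ExecQualJoin, and MakeJoinedSlot are illustrative names following the pattern of the snippets above):
typedef struct NestLoopState {
    PlanState ps;                   // lefttree = outer input, righttree = inner input
    TupleTableSlot *outer_slot;     // current outer tuple, NULL before the first call
    Node *joinqual;                 // join condition, e.g. orders.user_id = users.id
} NestLoopState;
TupleTableSlot *NestLoop_Next(PlanState *ps) {
    NestLoopState *nl = (NestLoopState *)ps;
    for (;;) {
        if (nl->outer_slot == NULL) {
            // Advance the outer side and restart the inner side from the top.
            nl->outer_slot = nl->ps.lefttree->Next(nl->ps.lefttree);
            if (nl->outer_slot == NULL)
                return NULL;                          // outer exhausted: join is finished
            nl->ps.righttree->Close(nl->ps.righttree);
            nl->ps.righttree->Open(nl->ps.righttree); // rescan the inner input
        }
        TupleTableSlot *inner = nl->ps.righttree->Next(nl->ps.righttree);
        if (inner == NULL) {
            nl->outer_slot = NULL;                    // inner exhausted: next outer tuple
            continue;
        }
        if (ExecQualJoin(nl->joinqual, nl->outer_slot, inner))
            return MakeJoinedSlot(nl->outer_slot, inner);   // emit one matching pair
    }
}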
Learning milestones:
- SeqScan returns all rows → You understand basic iteration
- Filter works correctly → You understand predicate evaluation
- Join produces correct results → You understand join algorithms
- Sort handles large data → You understand blocking operators
Project 9: Implement Connection Pooling
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Networking / Database Administration
- Software or Tool: Connection Pooler (like PgBouncer)
- Main Book: “High Performance PostgreSQL” by Gregory Smith
What you’ll build: A connection pooler that sits between clients and PostgreSQL, reusing connections to reduce overhead. Support session, transaction, and statement pooling modes.
Why it teaches PostgreSQL: PostgreSQL’s process-per-connection model is expensive. Understanding connection pooling explains why PgBouncer exists, connection limits, and performance under high concurrency.
Core challenges you’ll face:
- PostgreSQL wire protocol → maps to message parsing
- Connection state management → maps to session vs transaction pooling
- Authentication forwarding → maps to md5, scram-sha-256
- Transaction boundaries → maps to detecting BEGIN/COMMIT
Key Concepts:
- PostgreSQL Protocol: PostgreSQL Docs - Frontend/Backend Protocol
- PgBouncer Internals: PgBouncer Source
- Connection Pooling Modes: Session, Transaction, Statement
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Network programming, understanding of SQL transactions.
Real world outcome:
# Start pooler
$ ./pgpool --listen=5433 --backend=localhost:5432 --pool-size=10 --mode=transaction
Pooler started on :5433
Backend: localhost:5432
Pool size: 10 connections
Mode: transaction
# Connect via pooler
$ psql -p 5433 -U myuser mydb
psql (15.0)
Type "help" for help.
mydb=> SELECT 1;
?column?
----------
1
(1 row)
# In another terminal, show pool status
$ ./pgpool status
Active connections: 1
Idle connections: 9
Waiting clients: 0
Client 1 (192.168.1.10:54321):
State: idle
Backend: conn#3
Queries: 5
Transaction: none
# Simulate load
$ pgbench -p 5433 -c 100 -j 10 -T 60 mydb
Pool Status During Load:
Active connections: 10 (max)
Waiting clients: 90
Queries/sec: 15,234
Avg wait time: 0.3ms
Implementation Hints:
PostgreSQL message format:
type Message struct {
Type byte // 'Q' for Query, 'X' for Terminate, etc.
Length int32 // Length including self
Data []byte
}
// Read a message from connection
func ReadMessage(conn net.Conn) (*Message, error) {
header := make([]byte, 5) // 1-byte message type + 4-byte length
if _, err := io.ReadFull(conn, header); err != nil { // io.ReadFull avoids short reads
return nil, err
}
msg := &Message{
Type: header[0],
Length: int32(binary.BigEndian.Uint32(header[1:5])),
}
msg.Data = make([]byte, msg.Length-4) // length includes itself but not the type byte
if _, err := io.ReadFull(conn, msg.Data); err != nil {
return nil, err
}
return msg, nil
}
Connection pool:
type Pool struct {
backend string
size int
mode PoolMode // SESSION, TRANSACTION, STATEMENT
idle chan *BackendConn
mu sync.Mutex
}
func (p *Pool) Acquire() *BackendConn {
select {
case conn := <-p.idle:
return conn
default:
// Wait or create new if under limit
return <-p.idle
}
}
func (p *Pool) Release(conn *BackendConn) {
if p.mode == TRANSACTION && conn.inTransaction {
// Can't reuse - connection has uncommitted transaction
conn.Query("ROLLBACK")
}
// Reset session state if needed
conn.Query("DISCARD ALL")
p.idle <- conn
}
Transaction detection:
func (c *ClientHandler) HandleQuery(query string) {
upperQuery := strings.ToUpper(strings.TrimSpace(query))
if strings.HasPrefix(upperQuery, "BEGIN") {
c.inTransaction = true
} else if strings.HasPrefix(upperQuery, "COMMIT") ||
strings.HasPrefix(upperQuery, "ROLLBACK") {
c.inTransaction = false
if c.pool.mode == TRANSACTION {
c.releaseBackend() // Return to pool
}
}
// Forward to backend
c.backend.Send(query)
response := c.backend.ReadResponse()
c.client.Send(response)
}
Learning milestones:
- Queries route through pooler → You understand the protocol
- Connection reuse works → You understand pooling benefit
- Transaction mode respects boundaries → You understand state management
- High concurrency is handled → You understand the scalability gain
Project 10: Implement Streaming Replication
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: Go or C
- Alternative Programming Languages: Rust, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Database Internals / Replication
- Software or Tool: Streaming Replication
- Main Book: “PostgreSQL 14 Internals” by Egor Rogov
What you’ll build: A streaming replication system where a primary sends WAL records to a standby in real-time. The standby applies them to stay synchronized.
Why it teaches PostgreSQL: Streaming replication is how PostgreSQL achieves high availability. Understanding WAL shipping, LSN, and replay explains hot standby, synchronous commits, and failover.
Core challenges you’ll face:
- Replication protocol → maps to START_REPLICATION command
- WAL streaming → maps to continuous WAL shipping
- LSN tracking → maps to knowing standby position
- Synchronous vs async → maps to durability vs performance
Key Concepts:
- Streaming Replication Protocol: PostgreSQL Docs - Protocol Replication
- WAL Shipping: PostgreSQL Docs - Log Shipping
- Logical vs Physical: Comparison of Solutions
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 3 (WAL) completed.
Real world outcome:
# Start primary
$ ./repldb primary --port=5432 --wal-dir=/data/wal
# Start standby (connects to primary)
$ ./repldb standby --port=5433 --primary=localhost:5432
Standby connecting to primary...
Replication started from LSN 0/1000000
Streaming WAL...
# On primary: insert data
$ ./repldb query --port=5432 "INSERT INTO users VALUES (1, 'Alice')"
Inserted. WAL LSN: 0/1000100
# Standby receives it immediately
Standby log:
Received WAL record at 0/1000100: INSERT
Applied to local storage
Replay LSN: 0/1000100
# Query standby
$ ./repldb query --port=5433 "SELECT * FROM users"
(1, 'Alice')
# Show replication status
$ ./repldb repl-status --port=5432
Primary LSN: 0/1000200
Standby 1:
Address: localhost:5433
State: streaming
Sent LSN: 0/1000200
Write LSN: 0/1000200
Flush LSN: 0/1000180
Replay LSN: 0/1000180
Lag: 32 bytes
# Failover test: kill primary
$ kill $(pidof repldb-primary)
$ ./repldb promote --port=5433
Standby promoted to primary!
Now accepting writes on port 5433.
Implementation Hints:
Replication protocol flow:
Primary Standby
| |
|<------ IDENTIFY_SYSTEM --------|
|------- system_id, timeline --->|
| |
|<-- START_REPLICATION lsn ------|
| |
|-------- WAL data ------------->|
|-------- WAL data ------------->|
|<------ Standby status ---------|
|-------- WAL data ------------->|
...
WAL sender (primary):
func (s *WALSender) Stream(startLSN uint64) {
reader := OpenWAL(startLSN)
for {
// Read next WAL record
record, lsn := reader.Next()
// Send to standby
msg := &WALDataMessage{
StartLSN: lsn,
EndLSN: lsn + uint64(len(record)),
Data: record,
}
s.conn.Send(msg)
// Check for standby feedback
if feedback := s.conn.TryRecv(); feedback != nil {
s.updateStandbyStatus(feedback)
}
// If caught up, wait for new WAL
if reader.AtEnd() {
s.waitForNewWAL()
}
}
}
WAL receiver (standby):
func (r *WALReceiver) Receive() {
// Connect to primary
r.conn = Connect(r.primaryAddr)
r.conn.Send(&StartReplication{LSN: r.lastAppliedLSN})
for {
msg := r.conn.Recv()
switch m := msg.(type) {
case *WALDataMessage:
// Write to local WAL file
r.walWriter.Write(m.Data)
r.writeLSN = m.EndLSN
// Apply to local database
r.applyWALRecord(m.Data)
r.replayLSN = m.EndLSN
// Send feedback periodically
if time.Since(r.lastFeedback) > 10*time.Second {
r.sendFeedback()
}
}
}
}
Learning milestones:
- WAL streams continuously → You understand the protocol
- Standby stays synchronized → You understand replay
- Failover works → You understand promotion
- Lag is tracked → You understand monitoring
Project 11: Build a Full-Text Search Engine
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: Python or C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Information Retrieval / Indexing
- Software or Tool: Full-Text Search (like PostgreSQL tsvector/tsquery)
- Main Book: “Introduction to Information Retrieval” by Manning, Raghavan, Schütze
What you’ll build: A full-text search system with tokenization, stemming, stop words, and inverted index (GIN-style). Support queries with AND, OR, NOT, and phrase matching.
Why it teaches PostgreSQL: PostgreSQL’s tsvector/tsquery and GIN indexes are powerful but often misunderstood. Building your own explains when to use FTS vs LIKE, and how GIN indexes work.
Core challenges you’ll face:
- Text normalization → maps to tsvector creation
- Inverted index structure → maps to GIN index
- Query parsing → maps to tsquery operators
- Ranking → maps to ts_rank, relevance
Key Concepts:
- PostgreSQL Full Text Search: PostgreSQL Docs - Full Text Search
- GIN Index: PostgreSQL Docs - GIN Indexes
- Inverted Index: “Introduction to Information Retrieval” Chapter 1
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Basic understanding of text processing.
Real world outcome:
# Index some documents
$ ./ftsearch index doc1 "PostgreSQL is an advanced open source database"
Document 1 indexed. Tokens: [postgresql, advanc, open, sourc, databas]
$ ./ftsearch index doc2 "MySQL is also an open source database system"
Document 2 indexed. Tokens: [mysql, also, open, sourc, databas, system]
$ ./ftsearch index doc3 "Oracle is a commercial database"
Document 3 indexed. Tokens: [oracl, commerci, databas]
# Simple search
$ ./ftsearch query "open source"
Results (ranked by relevance):
1. doc1 (score: 0.85) - "PostgreSQL is an advanced open source database"
2. doc2 (score: 0.80) - "MySQL is also an open source database system"
# Boolean queries
$ ./ftsearch query "database & !open"
Results:
1. doc3 (score: 0.70) - "Oracle is a commercial database"
$ ./ftsearch query "postgresql | mysql"
Results:
1. doc1 (score: 0.90) - "PostgreSQL is an advanced..."
2. doc2 (score: 0.90) - "MySQL is also an open..."
# Show inverted index
$ ./ftsearch indexinfo
Inverted Index:
"databas" -> [doc1, doc2, doc3]
"open" -> [doc1, doc2]
"sourc" -> [doc1, doc2]
"postgresql" -> [doc1]
"mysql" -> [doc2]
...
Implementation Hints:
Text processing pipeline:
def normalize(text):
# 1. Lowercase
text = text.lower()
# 2. Tokenize (split on non-alphanumeric)
tokens = re.findall(r'\w+', text)
# 3. Remove stop words
tokens = [t for t in tokens if t not in STOP_WORDS]
# 4. Stem (Porter stemmer)
tokens = [stem(t) for t in tokens]
return tokens
# "PostgreSQL is an advanced open source database"
# -> ['postgresql', 'advanc', 'open', 'sourc', 'databas']
Inverted index (GIN-style):
class InvertedIndex:
def __init__(self):
self.index = {} # token -> [(doc_id, positions), ...]
def add_document(self, doc_id, text):
tokens = normalize(text)
for pos, token in enumerate(tokens):
if token not in self.index:
self.index[token] = []
self.index[token].append((doc_id, pos))
def search(self, token):
return self.index.get(stem(token.lower()), [])
def search_and(self, tokens):
results = None
for token in tokens:
docs = set(doc_id for doc_id, pos in self.search(token))
if results is None:
results = docs
else:
results &= docs
return results
Ranking (TF-IDF style):
def rank(doc_id, query_tokens, index, total_docs):
score = 0
for token in query_tokens:
postings = index.search(token)
if not postings:
continue
# TF: how often token appears in doc
tf = sum(1 for d, p in postings if d == doc_id)
# IDF: how rare is token across all docs
df = len(set(d for d, p in postings))
idf = math.log(total_docs / df)
score += tf * idf
return score
Learning milestones:
- Tokenization works → You understand text normalization
- Searches return correct documents → You understand inverted index
- Boolean queries work → You understand set operations
- Ranking makes sense → You understand relevance scoring
Project 12: Build a JSON Document Store
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: Python or Go
- Alternative Programming Languages: Rust, C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Document Databases / JSON
- Software or Tool: JSONB Implementation
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A document store with JSONB-style binary storage, path queries ($.address.city), and GIN indexing for fast lookups.
Why it teaches PostgreSQL: PostgreSQL’s JSONB is powerful but different from MongoDB. Understanding binary storage, path operators, and GIN indexing explains when to use JSONB vs normalized tables.
Core challenges you’ll face:
- Binary JSON format → maps to JSONB vs JSON
- Path queries → maps to jsonb_path_query, @>
- GIN indexing on JSONB → maps to jsonb_path_ops, jsonb_ops
- Efficient updates → maps to jsonb_set, why updates can be costly
Key Concepts:
- PostgreSQL JSONB: PostgreSQL Docs - JSON Types
- JSON Path: PostgreSQL Docs - JSON Path
- JSONB Indexing: PostgreSQL Docs - JSONB Indexing
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Understanding of JSON, Project 4 (B-tree) helpful.
Real world outcome:
# Insert documents
$ ./jsondb insert users '{"name": "Alice", "age": 30, "address": {"city": "NYC"}}'
Inserted doc_id=1
$ ./jsondb insert users '{"name": "Bob", "age": 25, "address": {"city": "LA"}}'
Inserted doc_id=2
# Path query
$ ./jsondb query users '$.address.city'
doc_id=1: "NYC"
doc_id=2: "LA"
# Filter query
$ ./jsondb find users 'age > 25'
{"id": 1, "name": "Alice", "age": 30, "address": {"city": "NYC"}}
# Containment query (like @>)
$ ./jsondb find users --contains '{"address": {"city": "NYC"}}'
{"id": 1, "name": "Alice", ...}
# Create GIN index
$ ./jsondb createindex users_data_idx users --gin
# Now containment queries use index
$ ./jsondb explain find users --contains '{"name": "Alice"}'
Index Scan using users_data_idx
Index Cond: (data @> '{"name": "Alice"}'::jsonb)
# Update nested path
$ ./jsondb update users 1 --set '$.address.zip' '"10001"'
Updated. New value:
{"name": "Alice", "age": 30, "address": {"city": "NYC", "zip": "10001"}}
Implementation Hints:
JSONB binary format (simplified):
class JsonbValue:
OBJECT = 1
ARRAY = 2
STRING = 3
NUMBER = 4
BOOL = 5
NULL = 6
def to_jsonb(value):
"""Convert Python value to binary JSONB format."""
if isinstance(value, dict):
# Object: type + count + (key_offset, value_offset)* + keys + values
entries = []
for k, v in sorted(value.items()): # Keys sorted!
entries.append((k, to_jsonb(v)))
return encode_object(entries)
elif isinstance(value, list):
# Array: type + count + (value_offset)* + values
return encode_array([to_jsonb(v) for v in value])
elif isinstance(value, str):
return bytes([JsonbValue.STRING]) + encode_string(value)
# ... etc
GIN indexing for JSONB:
class JsonbGinIndex:
def __init__(self, ops='jsonb_path_ops'):
self.ops = ops
self.index = {} # key -> set of doc_ids
def index_document(self, doc_id, jsonb):
if self.ops == 'jsonb_path_ops':
# Index paths: hash of path to value
for path, value in extract_paths(jsonb):
key = hash(path + str(value))
self.index.setdefault(key, set()).add(doc_id)
else: # jsonb_ops
# Index all keys and values separately
for key in extract_all_keys(jsonb):
self.index.setdefault(('key', key), set()).add(doc_id)
for value in extract_all_values(jsonb):
self.index.setdefault(('val', value), set()).add(doc_id)
def search_contains(self, pattern):
# @> operator: find docs containing pattern
required_keys = set()
for path, value in extract_paths(pattern):
key = hash(path + str(value))
required_keys.add(key)
result = None
for key in required_keys:
docs = self.index.get(key, set())
if result is None:
result = docs
else:
result &= docs
return result or set()
Path query evaluation:
def evaluate_path(doc, path):
"""Evaluate JSONPath like '$.address.city' on document."""
parts = parse_path(path) # ['$', 'address', 'city']
current = doc
for part in parts[1:]: # Skip '$'
if part.isdigit():
current = current[int(part)] # Array index
else:
current = current.get(part) # Object key
if current is None:
return None
return current
Learning milestones:
- Documents store and retrieve → You understand JSONB storage
- Path queries work → You understand navigation
- GIN index speeds up containment → You understand JSONB indexing
- Updates work correctly → You understand immutability tradeoffs
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Page-Based Storage | Advanced | 2 weeks | ⭐⭐⭐ | ⭐⭐⭐ |
| 2. MVCC Implementation | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 3. Write-Ahead Log | Expert | 3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 4. B-tree Index | Expert | 3-4 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 5. VACUUM Process | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 6. Query Parser | Advanced | 2 weeks | ⭐⭐⭐ | ⭐⭐⭐ |
| 7. Query Planner | Expert | 4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 8. Query Executor | Advanced | 3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 9. Connection Pooling | Advanced | 2-3 weeks | ⭐⭐⭐ | ⭐⭐⭐ |
| 10. Streaming Replication | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 11. Full-Text Search | Advanced | 2-3 weeks | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| 12. JSON Document Store | Advanced | 2 weeks | ⭐⭐⭐ | ⭐⭐⭐⭐ |
Recommended Learning Path
Core Track (The Heart of PostgreSQL)
Project 1 (Pages) → Project 2 (MVCC) → Project 3 (WAL) → Project 5 (VACUUM)
This path teaches you the storage engine fundamentals that make PostgreSQL unique.
Query Processing Track
Project 6 (Parser) → Project 7 (Planner) → Project 8 (Executor)
This path teaches you how SQL becomes results.
Advanced Features Track
Project 4 (B-tree) → Project 11 (FTS) → Project 12 (JSONB)
This path teaches you PostgreSQL’s powerful indexing and data types.
Production Track
Project 9 (Pooling) → Project 10 (Replication)
This path teaches you how to run PostgreSQL at scale.
Final Capstone: Build MiniPostgres
- File: LEARN_POSTGRESQL_DEEP_DIVE.md
- Main Programming Language: C or Rust
- Alternative Programming Languages: Go
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Database Internals / Full Stack
- Software or Tool: Complete RDBMS
- Main Book: All previous books combined
What you’ll build: Combine all projects into a working mini-PostgreSQL that accepts SQL connections, parses queries, uses MVCC for concurrency, stores data in pages, uses WAL for durability, and supports B-tree indexes.
Why it teaches mastery: This is the ultimate integration. You’ll understand how every piece fits together, why PostgreSQL makes its design choices, and what tradeoffs exist.
Real world outcome:
$ ./minipostgres start
MiniPostgres Server v0.1
Listening on port 5432
Storage: page-based, 8KB pages
Concurrency: MVCC (xmin/xmax)
Durability: WAL with checkpoints
Indexes: B-tree
$ psql -h localhost -p 5432
minipostgres=> CREATE TABLE users (id INT, name TEXT);
CREATE TABLE
minipostgres=> INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
INSERT 0 2
minipostgres=> SELECT * FROM users WHERE id = 1;
id | name
----+-------
1 | Alice
(1 row)
minipostgres=> CREATE INDEX users_id_idx ON users (id);
CREATE INDEX
minipostgres=> EXPLAIN SELECT * FROM users WHERE id = 1;
Index Scan using users_id_idx on users (cost=0.00..1.00 rows=1)
Index Cond: (id = 1)
minipostgres=> BEGIN;
BEGIN
minipostgres=> UPDATE users SET name = 'Alicia' WHERE id = 1;
UPDATE 1
-- In another session, still sees 'Alice' (MVCC!)
minipostgres=> COMMIT;
COMMIT
minipostgres=> VACUUM users;
VACUUM (removed 1 dead tuple)
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | Build a Page-Based Storage Engine | C |
| 2 | Implement MVCC | C |
| 3 | Build a Write-Ahead Log (WAL) | C |
| 4 | Implement a B-tree Index | C |
| 5 | Build a VACUUM Process | C |
| 6 | Build a Query Parser | C (Flex/Bison) |
| 7 | Build a Query Planner/Optimizer | C |
| 8 | Build a Query Executor | C |
| 9 | Implement Connection Pooling | Go |
| 10 | Implement Streaming Replication | Go |
| 11 | Build a Full-Text Search Engine | Python |
| 12 | Build a JSON Document Store | Python |
| Capstone | Build MiniPostgres | C/Rust |
Key Resources
Books
- “Database Internals” by Alex Petrov - Best book on storage engine internals
- “Designing Data-Intensive Applications” by Martin Kleppmann - Context for all database decisions
- “PostgreSQL 14 Internals” by Egor Rogov - Free, PostgreSQL-specific deep dive
- “The Internals of PostgreSQL” by Hironobu Suzuki - Free online
Official Documentation
- PostgreSQL Docs - Comprehensive and excellent
- PostgreSQL Source Code - The ultimate reference
Articles & Blogs
- PostgresPro Blog - Excellent internals series
- Bruce Momjian’s Writings - From a core contributor
- How PostgreSQL Works - Architecture overview
Talks
- CMU Database Group - Academic database courses
- PGCon Videos - PostgreSQL conference talks
Comparison Resources
- MySQL vs PostgreSQL Storage - Architecture comparison
- PostgreSQL vs Oracle - Enterprise comparison
“To truly understand PostgreSQL, you must build PostgreSQL. Only then will you know why every line of its code exists.”