Learn Nutanix & Hyperconverged Infrastructure (HCI) from Scratch
Goal: Deeply understand how Nutanix and Hyperconverged Infrastructure work by building the core components yourself in C—from block devices to distributed storage, virtualization, and cluster management.
What is Nutanix/HCI?
Nutanix is a Hyperconverged Infrastructure (HCI) platform that combines compute (virtualization), storage, and networking into a unified software-defined system. Instead of separate SAN/NAS arrays and compute servers, everything runs on commodity x86 nodes with local storage, managed by intelligent software.
The Nutanix Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ NUTANIX CLUSTER │
├─────────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ NODE 1 │ │ NODE 2 │ │ NODE 3 │ ... │
│ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │
│ │ │ VMs │ │ │ │ VMs │ │ │ │ VMs │ │ │
│ │ └────┬────┘ │ │ └────┬────┘ │ │ └────┬────┘ │ │
│ │ │ │ │ │ │ │ │ │ │
│ │ ┌────▼────┐ │ │ ┌────▼────┐ │ │ ┌────▼────┐ │ │
│ │ │ CVM │◄├───┼─┤ CVM │◄├───┼─┤ CVM │ │ ◄── Controller│
│ │ │(Storage │ │ │ │(Storage │ │ │ │(Storage │ │ VMs │
│ │ │Controller│ │ │ │Controller│ │ │ │Controller│ │ │
│ │ └────┬────┘ │ │ └────┬────┘ │ │ └────┬────┘ │ │
│ │ │ │ │ │ │ │ │ │ │
│ │ ┌────▼────┐ │ │ ┌────▼────┐ │ │ ┌────▼────┐ │ │
│ │ │SSD│HDD │ │ │ │SSD│HDD │ │ │ │SSD│HDD │ │ ◄── Local │
│ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ Storage │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ DISTRIBUTED STORAGE FABRIC (DSF/NDFS) │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ • Data Locality: VM data stored on same node when possible │ │
│ │ • Replication: RF2 (2 copies) or RF3 (3 copies) │ │
│ │ • Metadata: Distributed via Cassandra-like store │ │
│ │ • OpLog: Write journal on SSD for burst absorption │ │
│ │ • Extent Store: Actual data blocks (1MB/4MB chunks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Key Components You’ll Learn to Build
| Component | What It Does | Nutanix Equivalent |
|---|---|---|
| Block Device | Raw storage interface | Local SSD/HDD |
| File System | Organize data with metadata | ext4 (Extent Store) |
| iSCSI Target | Block storage over network | DSF iSCSI interface |
| NFS Server | File storage over network | DSF NFS interface |
| Consensus Algorithm | Cluster agreement | Paxos/Raft for metadata |
| Distributed KV Store | Metadata storage | Cassandra-based metadata |
| Hypervisor | Run virtual machines | AHV (KVM-based) |
| Write-Ahead Log | Durability & performance | OpLog |
| Replication Engine | Data redundancy | RF2/RF3 replication |
| Cluster Manager | Orchestrate everything | Prism/CVM services |
Core Concept Analysis
1. Storage Layer Architecture
Nutanix uses a tiered storage architecture:
User VM I/O
│
▼
┌─────────────────────────────────────────────────────────────┐
│ CONTROLLER VM (CVM) │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ UNIFIED │ │ OPLOG │ │ EXTENT │ │
│ │ CACHE │ │ (Write │ │ STORE │ │
│ │ (Read $) │ │ Journal) │ │ (Blocks) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ STORAGE TIERING │ │
│ │ SSD (Hot) ◄────────────────────► HDD (Cold) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
2. Distributed Storage Fabric (DSF)
The DSF provides:
- Data Locality: Keep data close to compute
- Automatic Tiering: Hot data on SSD, cold on HDD
- Replication: RF2 (2 copies) or RF3 (3 copies)
- Self-Healing: Automatic rebuild on failures
3. The CVM (Controller VM)
Each node runs a Controller VM that:
- Handles all storage I/O for VMs on that node
- Participates in cluster-wide metadata management
- Replicates data to other CVMs
- Provides iSCSI/NFS/SMB endpoints
4. Consensus & Metadata
- Metadata is stored in a Cassandra-like distributed database
- Cluster decisions use Paxos/Raft consensus
- Leader election for various services
5. Hypervisor Layer (AHV)
Nutanix’s Acropolis Hypervisor (AHV) is based on:
- Linux kernel with KVM
- QEMU for device emulation
- libvirt for management
Project List
Projects are ordered from fundamental building blocks to complete systems. Each builds on previous concepts.
Project 1: Block Device Emulator
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Storage / Block Devices / Linux Kernel
- Software or Tool: Linux NBD (Network Block Device)
- Main Book: “Linux Device Drivers, Third Edition” by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman
What you’ll build: A user-space block device that exposes a file as a virtual disk, accessible via /dev/nbdX, capable of being formatted and mounted like a real disk.
Why it teaches Nutanix storage: This is the foundation of all storage in Nutanix. The Extent Store ultimately stores data as blocks. Understanding how blocks work—reads, writes, sectors, and the interface between userspace and kernel—is essential before building anything more complex.
Core challenges you’ll face:
- Understanding block vs. character devices → maps to how storage is fundamentally addressed
- Implementing the NBD protocol → maps to network storage protocols (iSCSI)
- Handling concurrent I/O requests → maps to CVM I/O path handling
- Ensuring data durability (fsync) → maps to OpLog commit semantics
Key Concepts:
- Block Device Interface: “Linux Device Drivers” Ch. 16 - Corbet & Rubini
- NBD Protocol: Linux kernel documentation, Documentation/admin-guide/blockdev/nbd.rst
- Sector Addressing: “Computer Systems: A Programmer’s Perspective” Ch. 6 - Bryant & O’Hallaron
- I/O Scheduling: “Understanding the Linux Kernel” Ch. 14 - Bovet & Cesati
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: C programming, basic Linux system calls (open, read, write, mmap), understanding of file descriptors, basic networking (sockets).
Real world outcome:
$ ./block_device_server --file storage.img --size 1G &
$ sudo nbd-client localhost /dev/nbd0
$ sudo mkfs.ext4 /dev/nbd0
$ sudo mount /dev/nbd0 /mnt/test
$ echo "Hello Nutanix!" > /mnt/test/hello.txt
$ cat /mnt/test/hello.txt
Hello Nutanix!
$ sudo umount /mnt/test
Implementation Hints:
The Network Block Device (NBD) protocol is straightforward:
- Client connects via TCP
- Negotiation phase: exchange capabilities
- Transmission phase: handle READ/WRITE/FLUSH/TRIM commands
Each request has a simple structure:
- Magic number (request identifier)
- Command type (READ=0, WRITE=1, DISCONNECT=2, FLUSH=3, TRIM=4)
- Handle (client-provided ID to match responses)
- Offset in bytes
- Length in bytes
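A minimal sketch of that header as a packed C struct, using the classic transmission-phase layout (values are big-endian on the wire; check the NBD spec for the exact magics and the newer 16-bit flags/type split):
#include <stdint.h>
#define NBD_REQUEST_MAGIC 0x25609513u
struct nbd_request {
    uint32_t magic;      /* NBD_REQUEST_MAGIC */
    uint32_t type;       /* 0=READ, 1=WRITE, 2=DISCONNECT, 3=FLUSH, 4=TRIM */
    char     handle[8];  /* opaque client ID, echoed back in the reply */
    uint64_t from;       /* offset in bytes */
    uint32_t len;        /* length in bytes */
} __attribute__((packed));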
Questions to guide your implementation:
- How do you handle a READ request? (Seek + read from backing file)
- How do you handle a WRITE request? (Receive data, seek, write, optionally sync)
- What happens if two requests arrive simultaneously? (Threading or async I/O)
- How do you ensure data actually hits disk? (fsync vs fdatasync)
Start by reading the NBD protocol specification and the nbd-server source code.
Learning milestones:
- Basic read/write works → You understand block addressing and the NBD protocol
- Can format and mount the device → You understand how filesystems interact with block devices
- Concurrent I/O without corruption → You understand synchronization in storage systems
- fsync guarantees durability → You understand the critical difference between buffered and durable writes
Project 2: Simple Extent-Based File System
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: File Systems / Storage / Data Structures
- Software or Tool: FUSE (Filesystem in Userspace)
- Main Book: “Operating Systems: Three Easy Pieces” by Remzi H. Arpaci-Dusseau
What you’ll build: A FUSE-based file system that stores files as contiguous extents (like ext4 and Nutanix’s Extent Store), with a superblock, inode table, and extent allocation.
Why it teaches Nutanix storage: Nutanix uses ext4 as the underlying extent store, but the key insight is understanding extents—contiguous ranges of blocks. This is how Nutanix’s Extent Groups work (1MB or 4MB chunks of data).
Core challenges you’ll face:
- Designing the on-disk layout → maps to Extent Store organization
- Implementing extent allocation → maps to how Nutanix allocates Extent Groups
- Managing metadata (inodes) → maps to metadata vs. data separation in NDFS
- Handling file growth and fragmentation → maps to data placement decisions
Key Concepts:
- File System Layout: “Operating Systems: Three Easy Pieces” Ch. 40 - Arpaci-Dusseau
- Extent-Based Allocation: “Understanding the Linux Kernel” Ch. 18 - Bovet & Cesati
- FUSE API: libfuse documentation and examples
- Journaling Concepts: “Operating Systems: Three Easy Pieces” Ch. 42
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Project 1 (Block Device), understanding of basic file system concepts (inodes, directories), C pointers and structs.
Real world outcome:
$ ./my_extent_fs --format /dev/nbd0
Formatting with 256 inodes, 1024 extent groups (1MB each)...
Superblock written at block 0
Inode table at blocks 1-8
Extent bitmap at blocks 9-16
Data extents start at block 17
$ ./my_extent_fs /dev/nbd0 /mnt/myfs
$ dd if=/dev/urandom of=/mnt/myfs/bigfile bs=1M count=100
$ ./my_extent_fs --stats /dev/nbd0
Files: 1
Total extents allocated: 100
Fragmentation: 0% (all contiguous)
$ ls -la /mnt/myfs/
-rw-r--r-- 1 root root 104857600 Dec 21 10:30 bigfile
Implementation Hints:
Your on-disk layout should look like:
Block 0: Superblock (magic, version, block size, counts)
Blocks 1-N: Inode Table (fixed-size inode structures)
Blocks N+1-M: Extent Bitmap (which extents are free)
Blocks M+1-end: Data Extents (actual file data)
An inode structure might include:
- File type (regular, directory, symlink)
- Permissions, uid, gid
- Size in bytes
- Timestamps (atime, mtime, ctime)
- Extent list: array of (start_block, length) pairs
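A minimal C sketch of such an on-disk inode with a fixed-size extent list (the field sizes and the 12-extent limit are illustrative choices):
#include <stdint.h>
#define EXTENTS_PER_INODE 12
struct extent {
    uint64_t start_block;   /* first block of the extent */
    uint32_t length;        /* number of contiguous blocks */
};
struct my_inode {
    uint16_t mode;          /* file type + permissions */
    uint16_t uid, gid;
    uint64_t size;          /* size in bytes */
    uint64_t atime, mtime, ctime;
    uint32_t extent_count;  /* how many entries of extents[] are in use */
    struct extent extents[EXTENTS_PER_INODE];
} __attribute__((packed));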
Key questions:
- How do you find free extents quickly? (Bitmap search)
- How do you handle files larger than a single extent? (Extent list)
- How do you implement directories? (Special file containing name→inode mappings)
- What happens when you delete a file? (Mark extents as free in bitmap)
Start with the FUSE “hello” example and gradually add operations.
Learning milestones:
- Can create empty files → You understand inodes and the VFS interface
- Can write/read file data → You understand extent allocation
- Directories work → You understand hierarchical namespace
- Large files work → You understand multi-extent files
- Delete reclaims space → You understand the full lifecycle
Project 3: Write-Ahead Log (OpLog)
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Storage / Durability / Performance
- Software or Tool: Custom implementation
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A high-performance write-ahead log that absorbs random writes, batches them, and drains to a backend store—exactly like Nutanix’s OpLog.
Why it teaches Nutanix storage: The OpLog is critical to Nutanix performance. It absorbs burst writes on fast SSD, coalesces them, and drains to the Extent Store. This pattern is used in every high-performance storage system.
Core challenges you’ll face:
- Ensuring durability before acknowledgment → maps to OpLog sync semantics
- Batching and coalescing writes → maps to OpLog drain optimization
- Recovery after crash → maps to CVM crash recovery
- Managing log space → maps to OpLog size management
Key Concepts:
- Write-Ahead Logging: “Designing Data-Intensive Applications” Ch. 7 - Kleppmann
- Log-Structured Storage: “Operating Systems: Three Easy Pieces” Ch. 43 - Arpaci-Dusseau
- fsync Semantics: “The Linux Programming Interface” Ch. 13 - Kerrisk
- Checkpointing: Database internals literature
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Projects 1-2, understanding of durability guarantees, file I/O.
Real world outcome:
$ ./oplog_benchmark --writes 100000 --size 4K --random
Without OpLog: 2,500 IOPS (random 4K writes to HDD)
With OpLog: 45,000 IOPS (sequential log on SSD, async drain)
$ ./oplog_demo
Writing 1000 random 4K blocks...
OpLog entries: 1000
Triggering drain...
Coalesced to 847 unique blocks (15% dedup from overwrites)
Drain complete in 1.2s
$ # Simulate crash during drain
$ ./oplog_recovery /var/oplog
Found 523 uncommitted entries
Replaying to extent store...
Recovery complete. Data consistent.
Implementation Hints:
The OpLog structure:
┌────────────────────────────────────────────────────────────┐
│ OPLOG (on SSD) │
├────────────────────────────────────────────────────────────┤
│ Header: magic, version, head_offset, tail_offset │
├────────────────────────────────────────────────────────────┤
│ Entry 1: [seq_num][offset][length][checksum][data...] │
│ Entry 2: [seq_num][offset][length][checksum][data...] │
│ ... │
│ Entry N: [seq_num][offset][length][checksum][data...] │
└────────────────────────────────────────────────────────────┘
Write path:
- Receive write request (offset, length, data)
- Append to OpLog with monotonic sequence number
- fsync the OpLog
- Acknowledge write to client
- Background thread drains entries to Extent Store
- After drain, update checkpoint and reclaim log space
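A minimal sketch of steps 2-4 of that write path, assuming a crc32_of() helper (real code also needs error handling and log-space accounting):
#include <stdint.h>
#include <unistd.h>
struct oplog_entry_hdr {
    uint64_t seq_num;
    uint64_t offset;      /* target offset in the extent store */
    uint32_t length;
    uint32_t checksum;    /* CRC of the data that follows */
};
int oplog_append(int log_fd, uint64_t seq, uint64_t offset,
                 const void *data, uint32_t len)
{
    struct oplog_entry_hdr hdr = {
        .seq_num = seq, .offset = offset,
        .length = len,  .checksum = crc32_of(data, len),   /* assumed helper */
    };
    if (write(log_fd, &hdr, sizeof hdr) != (ssize_t)sizeof hdr) return -1;
    if (write(log_fd, data, len) != (ssize_t)len) return -1;
    if (fsync(log_fd) != 0) return -1;   /* durability boundary: acknowledge only after this */
    return 0;
}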
Key questions:
- How do you handle overwrites to the same offset? (Track in-memory map, coalesce)
- What if the log fills up? (Block new writes until drain catches up)
- How do you recover after a crash? (Replay from last checkpoint)
- How do you ensure entries are complete? (Checksums, length fields)
Learning milestones:
- Basic append works → You understand log structure
- fsync provides durability → You understand the durability boundary
- Drain coalesces writes → You understand write optimization
- Crash recovery works → You understand transaction semantics
Project 4: TCP Server Framework
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, C++
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Networking / Concurrency
- Software or Tool: epoll/kqueue
- Main Book: “The Linux Programming Interface” by Michael Kerrisk
What you’ll build: A high-performance, event-driven TCP server using epoll that can handle thousands of concurrent connections—the foundation for all network storage protocols.
Why it teaches Nutanix networking: Every CVM communicates over TCP. iSCSI, NFS, inter-CVM replication, cluster management—all TCP. Understanding non-blocking I/O and event-driven architecture is essential.
Core challenges you’ll face:
- Managing thousands of connections → maps to CVM handling many VM connections
- Non-blocking I/O with epoll → maps to high-performance I/O path
- Protocol parsing with partial reads → maps to iSCSI/NFS protocol handling
- Graceful shutdown and cleanup → maps to CVM maintenance operations
Key Concepts:
- Socket Programming: “The Linux Programming Interface” Ch. 56-61 - Kerrisk
- epoll: “The Linux Programming Interface” Ch. 63 - Kerrisk
- Non-blocking I/O: “Advanced Programming in the UNIX Environment” Ch. 14 - Stevens
- Protocol Buffers: (for later RPC work)
Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Basic socket programming (socket, bind, listen, accept), understanding of file descriptors.
Real world outcome:
$ ./tcp_echo_server --port 9000 --workers 4 &
Server listening on port 9000 with 4 worker threads
$ # Benchmark with many connections
$ ./tcp_benchmark --host localhost --port 9000 --connections 10000 --requests 100000
Connections: 10000
Requests: 100000
Throughput: 150,000 req/sec
Latency p50: 0.2ms, p99: 1.5ms
$ # Verify graceful handling
$ for i in {1..100}; do echo "hello $i" | nc localhost 9000 & done
$ # All 100 connections handled correctly
Implementation Hints:
The epoll-based server pattern:
// Create epoll instance
int epfd = epoll_create1(0);
// Add listening socket
struct epoll_event ev = {.events = EPOLLIN, .data.fd = listen_fd};
epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
// Event loop
while (running) {
int n = epoll_wait(epfd, events, MAX_EVENTS, timeout);
for (int i = 0; i < n; i++) {
if (events[i].data.fd == listen_fd) {
// Accept new connection
int client = accept(listen_fd, ...);
set_nonblocking(client);
// Add to epoll
} else {
// Handle client I/O
}
}
}
Key questions:
- How do you handle partial reads? (Buffer per connection, track position)
- How do you handle partial writes? (Enable EPOLLOUT when buffer not empty)
- How do you detect disconnection? (EPOLLHUP, EPOLLERR, or read returning 0)
- How do you scale to multiple cores? (Multiple threads, each with own epoll)
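One common answer to the partial read/write questions is explicit per-connection buffers; a minimal sketch (sizes and names are illustrative):
#include <stddef.h>
struct conn {
    int    fd;
    char   rbuf[4096];   /* bytes received but not yet parsed into a full message */
    size_t rlen;
    char   wbuf[4096];   /* bytes queued because the socket buffer was full */
    size_t woff, wlen;   /* next byte to send / total bytes queued */
};
/* On EPOLLIN: read() into rbuf+rlen until EAGAIN, then parse complete messages.
 * On EPOLLOUT: write() from wbuf+woff until wlen is drained, then disable EPOLLOUT. */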
Learning milestones:
- Basic echo works → You understand the epoll event loop
- Handles 1000+ connections → You understand scalability
- Partial read/write handling → You understand buffering
- Multi-threaded → You understand concurrent server architecture
Project 5: iSCSI Target Implementation
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Storage / Networking / SCSI Protocol
- Software or Tool: iSCSI protocol (RFC 7143)
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A user-space iSCSI target that exports block devices over the network, allowing remote hosts to mount them as local disks.
Why it teaches Nutanix storage: Nutanix’s DSF provides iSCSI endpoints for VMs. AHV uses iSCSI multi-pathing to CVMs. Understanding iSCSI is understanding how block storage is delivered over networks.
Core challenges you’ll face:
- Implementing iSCSI PDU parsing → maps to protocol handling in CVM
- SCSI command handling (READ, WRITE, INQUIRY) → maps to block I/O processing
- Session and connection management → maps to VM connection handling
- Error recovery and multi-pathing → maps to CVM failover (autopathing)
Key Concepts:
- iSCSI Protocol: RFC 7143 (iSCSI specification)
- SCSI Commands: T10 SCSI specifications
- Login Negotiation: RFC 7143 Section 5
- Error Recovery: RFC 7143 Section 7
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Projects 1, 4 (Block Device, TCP Server), understanding of binary protocols.
Real world outcome:
$ ./my_iscsi_target --port 3260 --target iqn.2024-01.com.mylab:storage1 \
--lun 0 --backing-file /data/disk0.img --size 100G &
iSCSI Target started
Target: iqn.2024-01.com.mylab:storage1
LUN 0: /data/disk0.img (100GB)
Listening on 0.0.0.0:3260
$ # On initiator (another machine)
$ iscsiadm -m discovery -t sendtargets -p 192.168.1.100
192.168.1.100:3260,1 iqn.2024-01.com.mylab:storage1
$ iscsiadm -m node --login
Logging in to [iface: default, target: iqn.2024-01.com.mylab:storage1]
Login successful.
$ lsblk
sdb 8:16 0 100G 0 disk ← Our remote iSCSI disk!
$ sudo mkfs.ext4 /dev/sdb
$ sudo mount /dev/sdb /mnt/iscsi
$ # Full disk functionality over network!
Implementation Hints:
iSCSI is SCSI commands encapsulated in TCP. The PDU (Protocol Data Unit) structure:
┌─────────────────────────────────────────────────────────────┐
│ iSCSI PDU HEADER (48 bytes) │
├─────────────────────────────────────────────────────────────┤
│ Opcode (1B) │ Flags (1B) │ ... │ Total AHS Length │ ... │
│ LUN │ Initiator Task Tag │ Target Transfer Tag │ ... │
│ CmdSN │ ExpStatSN │ ... │
├─────────────────────────────────────────────────────────────┤
│ AHS (Additional Header Segments) │
├─────────────────────────────────────────────────────────────┤
│ Header Digest (optional CRC) │
├─────────────────────────────────────────────────────────────┤
│ DATA SEGMENT │
├─────────────────────────────────────────────────────────────┤
│ Data Digest (optional CRC) │
└─────────────────────────────────────────────────────────────┘
Key iSCSI opcodes to implement:
- 0x01: SCSI Command (carries SCSI CDB)
- 0x03: Login Request
- 0x04: Text Request
- 0x06: Logout Request
Key SCSI commands (in the CDB):
- 0x12: INQUIRY (device info)
- 0x25: READ CAPACITY (disk size)
- 0x28: READ(10) (read blocks)
- 0x2A: WRITE(10) (write blocks)
- 0x00: TEST UNIT READY
Start with login negotiation, then implement INQUIRY and READ CAPACITY, then READ/WRITE.
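A packed C sketch of the 48-byte header shown above (big-endian on the wire; the field grouping is simplified, so check RFC 7143 for the per-opcode layouts):
#include <stdint.h>
struct iscsi_bhs {
    uint8_t  opcode;            /* e.g. 0x01 SCSI Command, 0x03 Login Request */
    uint8_t  flags;
    uint8_t  rsvd[2];
    uint8_t  total_ahs_len;     /* AHS length in 4-byte words */
    uint8_t  data_seg_len[3];   /* 24-bit data segment length */
    uint8_t  lun[8];
    uint32_t initiator_task_tag;
    uint8_t  opcode_specific[28];
} __attribute__((packed));      /* total: 48 bytes */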
Learning milestones:
- Login handshake works → You understand iSCSI sessions
- INQUIRY returns device info → You understand SCSI command structure
- READ/WRITE works → You have a functional iSCSI target
- Can boot a VM from it → Production-ready understanding
Project 6: NFS Server (v3)
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: File Systems / Networking / RPC
- Software or Tool: ONC RPC, NFS v3 protocol
- Main Book: “NFS Illustrated” by Brent Callaghan
What you’ll build: An NFS v3 server that exports a local directory over the network, accessible via standard mount commands.
Why it teaches Nutanix storage: Nutanix DSF provides NFS as a primary storage protocol. VMware datastores mount via NFS. Understanding NFS means understanding file-based storage access.
Core challenges you’ll face:
- Implementing ONC RPC → maps to RPC framework used throughout Nutanix
- NFS statelessness and filehandles → maps to distributed file identification
- Handling concurrent file operations → maps to multi-VM file access
- Performance optimization → maps to CVM NFS tuning
Key Concepts:
- ONC RPC: RFC 5531
- XDR Encoding: RFC 4506
- NFS v3 Protocol: RFC 1813
- Filehandle Design: “NFS Illustrated” - Callaghan
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 2 (File System), Project 4 (TCP Server), understanding of RPC concepts.
Real world outcome:
$ ./my_nfs_server --export /data/share --port 2049 &
NFS Server started
Export: /data/share
Listening on 0.0.0.0:2049 (TCP) and 0.0.0.0:2049 (UDP)
$ # On client machine
$ sudo mount -t nfs -o vers=3 192.168.1.100:/data/share /mnt/nfs
$ ls /mnt/nfs
documents photos videos
$ cp /tmp/large_file.iso /mnt/nfs/
$ md5sum /mnt/nfs/large_file.iso
d41d8cd98f00b204e9800998ecf8427e /mnt/nfs/large_file.iso
$ # Works exactly like local filesystem!
Implementation Hints:
NFS is built on ONC RPC (Sun RPC). You need to implement:
- RPC Layer:
- Record marking (4-byte length prefix for TCP)
- XDR encoding/decoding
- Program/version/procedure dispatch
- MOUNT Protocol (separate from NFS):
- MNT: Get filehandle for export root
- UMNT: Unmount
- NFS v3 Procedures:
- GETATTR: Get file attributes
- LOOKUP: Find file by name in directory
- READ: Read file data
- WRITE: Write file data
- CREATE: Create file
- REMOVE: Delete file
- READDIR: List directory
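Before any of these procedures can be decoded, the RPC layer must reassemble records from the TCP stream; a minimal sketch of record marking, assuming a read_exact() helper that loops over read():
#include <stdint.h>
#include <stddef.h>
#include <arpa/inet.h>
/* Read one RPC record fragment (record marking, RFC 5531). */
void read_rpc_record(int fd, unsigned char *buf, size_t bufsize)
{
    uint32_t marker;
    read_exact(fd, &marker, 4);                     /* assumed helper */
    uint32_t raw       = ntohl(marker);
    int      last_frag = (raw & 0x80000000u) != 0;  /* top bit: last fragment */
    uint32_t frag_len  = raw & 0x7fffffffu;         /* lower 31 bits: length  */
    read_exact(fd, buf, frag_len);                  /* then XDR-decode the call header */
    (void)last_frag; (void)bufsize;                 /* a real server loops until last_frag */
}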
Filehandle design is critical:
struct nfs_fh {
uint32_t dev; // Device ID
uint64_t ino; // Inode number
uint32_t gen; // Generation number (for stale detection)
};
Start with MOUNT to get root filehandle, then GETATTR and LOOKUP, then READ.
Learning milestones:
- MOUNT returns filehandle → You understand NFS bootstrapping
- LOOKUP and GETATTR work → You understand namespace traversal
- READ/WRITE work → You have a functional NFS server
- READDIR works → Full directory operations
Project 7: Raft Consensus Implementation
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Systems / Consensus
- Software or Tool: Custom implementation based on Raft paper
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A complete Raft consensus library with leader election, log replication, and safety guarantees—the foundation for any distributed system.
Why it teaches Nutanix distributed systems: Nutanix uses consensus for cluster decisions, leader election, and metadata consistency. Understanding Raft means understanding how distributed systems agree on state.
Core challenges you’ll face:
- Leader election with randomized timeouts → maps to cluster leadership in Nutanix
- Log replication with majority quorum → maps to metadata replication
- Safety despite network partitions → maps to split-brain prevention
- Log compaction (snapshotting) → maps to metadata compaction
Key Concepts:
- Raft Paper: “In Search of an Understandable Consensus Algorithm” - Ongaro & Ousterhout
- Distributed Consensus: “Designing Data-Intensive Applications” Ch. 9 - Kleppmann
- State Machine Replication: Lamport’s papers
- Leader Election: “Distributed Systems” - Tanenbaum
Resources for key challenges:
- Raft Paper - The original paper is excellent
- Raft Visualization - Interactive explanation
- TLA+ Spec - Formal specification
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 4 (TCP Server), understanding of distributed systems concepts, state machines.
Real world outcome:
$ # Start 3-node cluster
$ ./raft_node --id 1 --peers 2:localhost:5002,3:localhost:5003 --port 5001 &
$ ./raft_node --id 2 --peers 1:localhost:5001,3:localhost:5003 --port 5002 &
$ ./raft_node --id 3 --peers 1:localhost:5001,2:localhost:5002 --port 5003 &
$ ./raft_client localhost:5001 set key1 value1
OK (leader=1, term=1, index=1)
$ ./raft_client localhost:5002 get key1
value1
$ # Kill leader
$ kill %1
$ # Wait for election...
$ ./raft_client localhost:5002 set key2 value2
OK (leader=2, term=2, index=2) # New leader elected!
$ ./raft_client localhost:5003 get key1
value1 # State preserved!
Implementation Hints:
Raft has three main components:
- Leader Election:
- Nodes start as Followers
- If no heartbeat received within election timeout, become Candidate
- Request votes from peers
- If majority votes received, become Leader
- Leader sends heartbeats to maintain leadership
- Log Replication:
- Client sends command to Leader
- Leader appends to its log
- Leader sends AppendEntries to all peers
- When majority acknowledges, entry is committed
- Apply committed entries to state machine
- Safety:
- Election restriction: Only nodes with up-to-date logs can become leader
- Leader completeness: Committed entries are never lost
State transitions:
Follower ──(election timeout; starts election)──► Candidate
Candidate ──(receives votes from majority; wins election)──► Leader
Candidate ──(discovers current leader or a higher term)──► Follower
Leader ──(discovers a leader with a higher term)──► Follower
Key data structures:
struct raft_node {
int id;
enum { FOLLOWER, CANDIDATE, LEADER } state;
int current_term;
int voted_for;
struct log_entry *log;
int commit_index;
int last_applied;
// Leader state
int next_index[NUM_PEERS];
int match_index[NUM_PEERS];
};
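Alongside the node state, a sketch of the two RPC argument structures from the Raft paper (the responses carry the responder's term plus a success/voteGranted flag):
struct request_vote_args {
    int term;            /* candidate's term */
    int candidate_id;
    int last_log_index;  /* index of candidate's last log entry */
    int last_log_term;   /* term of candidate's last log entry */
};
struct append_entries_args {
    int term;            /* leader's term */
    int leader_id;
    int prev_log_index;  /* index of entry immediately preceding the new ones */
    int prev_log_term;
    int leader_commit;   /* leader's commit index */
    int n_entries;
    struct log_entry *entries;   /* empty for heartbeats */
};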
Learning milestones:
- Leader election works → You understand the core voting protocol
- Log replication works → You understand consensus
- Survives leader failure → You understand fault tolerance
- Handles network partitions → You understand safety properties
Project 8: Distributed Key-Value Store
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Systems / Storage
- Software or Tool: Build on top of Raft
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A distributed key-value store using Raft consensus, similar to etcd or Consul—and similar to Nutanix’s Cassandra-based metadata store.
Why it teaches Nutanix metadata storage: Nutanix stores all metadata (where blocks are, VM info, cluster state) in a distributed database based on Cassandra. Understanding distributed KV stores means understanding the metadata plane.
Core challenges you’ll face:
- Building a state machine on Raft → maps to metadata operations
- Handling read consistency levels → maps to strong vs. eventual reads
- Log compaction (snapshots) → maps to metadata compaction
- Client request handling → maps to Prism API operations
Key Concepts:
- State Machine Replication: Built on Project 7
- Linearizability: “Designing Data-Intensive Applications” Ch. 9 - Kleppmann
- Consistent Hashing: (for scaling beyond single Raft group)
- Snapshotting: Raft paper Section 7
Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Project 7 (Raft), hash tables, serialization.
Real world outcome:
$ # Start cluster
$ ./kv_cluster --nodes 3 &
$ ./kv_client put /cluster/nodes/node1 '{"ip":"192.168.1.1","status":"healthy"}'
OK
$ ./kv_client put /cluster/nodes/node2 '{"ip":"192.168.1.2","status":"healthy"}'
OK
$ ./kv_client get /cluster/nodes/node1
{"ip":"192.168.1.1","status":"healthy"}
$ ./kv_client list /cluster/nodes/
/cluster/nodes/node1
/cluster/nodes/node2
$ # Watch for changes (like etcd watch)
$ ./kv_client watch /cluster/nodes/ &
$ ./kv_client put /cluster/nodes/node3 '{"ip":"192.168.1.3","status":"healthy"}'
[WATCH] PUT /cluster/nodes/node3 = {"ip":"192.168.1.3","status":"healthy"}
Implementation Hints:
The KV store is a state machine driven by Raft:
// Command types for the Raft log
enum kv_op {
KV_PUT,
KV_DELETE,
KV_CAS, // Compare-and-swap
};
struct kv_command {
enum kv_op op;
char key[256];
char value[4096];
uint64_t expected_version; // For CAS
};
// State machine apply function
void kv_apply(struct kv_store *store, struct kv_command *cmd) {
switch (cmd->op) {
case KV_PUT:
hashtable_put(store->ht, cmd->key, cmd->value);
break;
case KV_DELETE:
hashtable_delete(store->ht, cmd->key);
break;
case KV_CAS:
// Check version, then update
break;
}
}
Key features to implement:
- Hierarchical keys: /cluster/nodes/node1 style paths with prefix listing support
- Versions: Track modification version for each key
- Watch: Notify clients of changes to keys/prefixes
- Snapshots: Periodically snapshot the hash table to truncate Raft log
Read handling options:
- Strong read: Forward to leader, consistent but slower
- Stale read: Read from local state, fast but may be stale
- Linearizable read: Leader confirms it’s still leader before responding
Learning milestones:
- Basic PUT/GET works → You understand state machine replication
- Consistent across nodes → You understand linearizability
- Snapshots work → You understand log compaction
- Watch works → You understand event-driven distributed systems
Project 9: Data Replication Engine (RF2/RF3)
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Storage / Fault Tolerance
- Software or Tool: Custom implementation
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A data replication engine that maintains RF2 (2 copies) or RF3 (3 copies) of data across nodes, with synchronous writes and automatic repair.
Why it teaches Nutanix data protection: This is exactly what Nutanix does. When you write to the OpLog, it’s synchronously replicated to 1 or 2 other CVMs before acknowledgment. Understanding this means understanding Nutanix’s fault tolerance.
Core challenges you’ll face:
- Synchronous replication with quorum → maps to RF2/RF3 write semantics
- Replica placement (node/rack awareness) → maps to availability domains
- Failure detection and rebuild → maps to self-healing storage
- Consistency during partial failures → maps to data integrity guarantees
Key Concepts:
- Replication Strategies: “Designing Data-Intensive Applications” Ch. 5 - Kleppmann
- Quorum Reads/Writes: “Designing Data-Intensive Applications” Ch. 5 - Kleppmann
- Chain Replication: OSDI ‘04 paper by van Renesse & Schneider
- Failure Detection: Phi Accrual Failure Detector
Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Projects 4, 7, 8 (TCP, Raft, KV Store).
Real world outcome:
$ # Start 4-node storage cluster with RF3
$ ./storage_cluster --nodes 4 --rf 3 &
$ # Write data
$ ./storage_client write block-0001 < /tmp/data.bin
Written block-0001 to nodes [1, 2, 4]
Replicas: 3, Quorum: 2
$ # Verify replication
$ ./storage_client locate block-0001
block-0001:
Primary: node-1 (192.168.1.1)
Replica: node-2 (192.168.1.2)
Replica: node-4 (192.168.1.4)
$ # Simulate node failure
$ ./storage_cluster kill-node 2
$ # Automatic rebuild triggers
$ ./storage_client locate block-0001
block-0001:
Primary: node-1 (192.168.1.1)
Replica: node-4 (192.168.1.4)
Replica: node-3 (192.168.1.3) [REBUILDING: 45%]
$ # Data still accessible during rebuild!
$ ./storage_client read block-0001 > /tmp/recovered.bin
$ diff /tmp/data.bin /tmp/recovered.bin
# No difference - data intact
Implementation Hints:
Write path for RF3:
Client Write Request
│
▼
Primary Node
│
├──────────────────┬──────────────────┐
│ │ │
▼ ▼ ▼
Write to Local Replicate to Replicate to
Storage Secondary 1 Secondary 2
│ │ │
└──────────────────┴──────────────────┘
│
Wait for Quorum (2 of 3)
│
▼
Acknowledge to Client
Key data structures:
struct block_location {
uint64_t block_id;
int nodes[3]; // Which nodes have copies
int primary; // Primary owner
uint64_t version; // For conflict resolution
enum { HEALTHY, DEGRADED, REBUILDING } status;
};
struct replication_config {
int rf; // Replication factor (2 or 3)
int write_quorum; // rf/2 + 1
int read_quorum; // 1 (for availability) or rf/2+1 (for consistency)
bool rack_aware; // Spread replicas across racks
};
Placement algorithm (simplified):
- Hash block ID to get primary node
- Select RF-1 replicas from different failure domains
- Store mapping in distributed metadata (Project 8)
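A minimal sketch of that placement step, assuming a failure_domain() helper (a real placer must also handle clusters with too few domains):
#include <stdint.h>
void place_block(uint64_t block_id, int num_nodes, int rf, int out_nodes[])
{
    int primary = (int)(block_id % (uint64_t)num_nodes);   /* hash block ID to primary */
    out_nodes[0] = primary;
    int chosen = 1;
    for (int i = 1; i < num_nodes && chosen < rf; i++) {
        int candidate = (primary + i) % num_nodes;
        if (failure_domain(candidate) != failure_domain(primary))   /* assumed helper */
            out_nodes[chosen++] = candidate;                        /* replica in another domain */
    }
}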
Rebuild process:
- Detect node failure (heartbeat timeout)
- Find all blocks with replicas on failed node
- For each block, read from surviving replica
- Write to new node (chosen by placement algorithm)
- Update metadata
Learning milestones:
- RF2 write works → You understand quorum writes
- Handles node failure → You understand fault detection
- Automatic rebuild → You understand self-healing
- No data loss → You understand durability guarantees
Project 10: Simple Hypervisor with KVM
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Virtualization / Low-Level Systems
- Software or Tool: Linux KVM, QEMU concepts
- Main Book: “Modern Operating Systems” by Andrew Tanenbaum
What you’ll build: A minimal hypervisor using KVM that can boot a simple guest (like a tiny bootloader or bare-metal program), teaching you how virtualization actually works.
Why it teaches Nutanix compute: Nutanix AHV is built on KVM. Understanding KVM means understanding how VMs are created, run, and managed at the lowest level.
Core challenges you’ll face:
- Understanding VMX/SVM extensions → maps to hardware virtualization
- Setting up vCPU state → maps to VM resource allocation
- Memory mapping (EPT/NPT) → maps to VM memory management
- Handling VM exits → maps to device emulation
Key Concepts:
- KVM API: Linux kernel KVM documentation
- Intel VT-x/AMD-V: Intel/AMD manuals
- Extended Page Tables: “Virtual Machines” - Smith & Nair
- BIOS/UEFI Basics: OSDev Wiki
Difficulty: Master. Time estimate: 2-4 weeks. Prerequisites: Assembly language (x86), understanding of protected mode, memory management.
Real world outcome:
$ # Create a simple guest program (prints "Hello from VM!")
$ cat > guest.asm << 'EOF'
BITS 16
org 0x7c00
mov si, msg
call print
hlt
print:
lodsb
or al, al
jz done
mov dx, 0x3f8   ; write to COM1 so the hypervisor sees an I/O exit
out dx, al
jmp print
done:
ret
msg: db "Hello from VM!", 0
times 510-($-$$) db 0
dw 0xaa55
EOF
$ nasm guest.asm -o guest.bin
$ ./my_hypervisor guest.bin
[KVM] Created VM (fd=3)
[KVM] Created vCPU (fd=4)
[KVM] Loaded guest at 0x7c00
[KVM] Entering guest...
[VM EXIT] I/O port 0x3f8: 'H'
[VM EXIT] I/O port 0x3f8: 'e'
[VM EXIT] I/O port 0x3f8: 'l'
[VM EXIT] I/O port 0x3f8: 'l'
[VM EXIT] I/O port 0x3f8: 'o'
...
[KVM] Guest halted
Implementation Hints:
The KVM API workflow:
// 1. Open KVM
int kvm = open("/dev/kvm", O_RDWR);
// 2. Create a VM
int vm = ioctl(kvm, KVM_CREATE_VM, 0);
// 3. Create memory region
void *mem = mmap(NULL, MEM_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
struct kvm_userspace_memory_region region = {
.slot = 0,
.guest_phys_addr = 0,
.memory_size = MEM_SIZE,
.userspace_addr = (uint64_t)mem,
};
ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);
// 4. Load guest code
memcpy(mem + 0x7c00, guest_code, guest_size);
// 5. Create vCPU
int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
// Map the shared kvm_run structure used below to read exit reasons
int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
struct kvm_run *run = mmap(NULL, run_size, PROT_READ|PROT_WRITE, MAP_SHARED, vcpu, 0);
// 6. Setup vCPU state (registers, sregs)
struct kvm_sregs sregs;
ioctl(vcpu, KVM_GET_SREGS, &sregs);
sregs.cs.base = 0;
sregs.cs.selector = 0;
ioctl(vcpu, KVM_SET_SREGS, &sregs);
struct kvm_regs regs = {
.rip = 0x7c00,
.rflags = 2,
};
ioctl(vcpu, KVM_SET_REGS, &regs);
// 7. Run loop
while (1) {
ioctl(vcpu, KVM_RUN, 0);
switch (run->exit_reason) {
case KVM_EXIT_IO:
handle_io(run);
break;
case KVM_EXIT_HLT:
return;
}
}
Start simple:
- Boot a “Hello World” that outputs to serial port (I/O port 0x3f8)
- Handle basic I/O exits
- Gradually add more features (interrupts, memory-mapped I/O)
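As a sketch of the I/O-exit handling above, assuming run is the kvm_run structure mmap'ed from the vCPU fd:
#include <stdio.h>
#include <stdint.h>
#include <linux/kvm.h>
void handle_io(struct kvm_run *run)
{
    if (run->io.direction == KVM_EXIT_IO_OUT && run->io.port == 0x3f8) {
        /* The guest's out data lives inside the shared kvm_run mapping. */
        char *p = (char *)run + run->io.data_offset;
        for (uint32_t i = 0; i < run->io.count; i++)
            putchar(p[i * run->io.size]);
    }
}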
Learning milestones:
- Guest code runs → You understand KVM fundamentals
- Handle I/O exits → You understand VM exits
- Serial output works → You understand device emulation basics
- Boot real-mode code → You understand CPU virtualization
Project 11: VM Memory Ballooning & Overcommit
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Virtualization / Memory Management
- Software or Tool: KVM, virtio-balloon
- Main Book: “Understanding the Linux Kernel” by Daniel P. Bovet
What you’ll build: A memory ballooning system where VMs can dynamically return memory to the host and reclaim it later—essential for efficient memory usage in HCI.
Why it teaches Nutanix compute: Nutanix uses memory ballooning and overcommit to run more VMs than would fit if all had dedicated memory. This is critical for density and efficiency.
Core challenges you’ll face:
- Understanding virtio balloon protocol → maps to VM memory management
- Guest-host communication → maps to CVM-hypervisor interaction
- Memory pressure detection → maps to resource scheduling
- KSM (Kernel Same-page Merging) → maps to memory deduplication
Key Concepts:
- Virtio Specification: OASIS virtio spec
- Memory Ballooning: “Virtual Machines” - Smith & Nair
- KSM: Linux kernel documentation
- Memory Overcommit: KVM documentation
Difficulty: Expert. Time estimate: 2 weeks. Prerequisites: Project 10 (KVM Hypervisor).
Real world outcome:
$ # Start VM with 4GB max, 2GB initial
$ ./my_vm --memory-max 4G --memory-initial 2G guest.img &
$ ./vm_balloon --vm 1 status
VM 1 Memory:
Allocated: 2048 MB
Balloon: 0 MB (inflated 0%)
Available: 2048 MB
$ # Host needs memory - inflate balloon
$ ./vm_balloon --vm 1 inflate 1024
Inflating balloon by 1024 MB...
Guest returned 1024 MB to host
$ ./vm_balloon --vm 1 status
VM 1 Memory:
Allocated: 2048 MB
Balloon: 1024 MB (inflated 50%)
Available: 1024 MB
$ # VM needs memory back
$ ./vm_balloon --vm 1 deflate 512
Deflating balloon by 512 MB...
Returned 512 MB to guest
$ # Show KSM savings
$ ./vm_ksm_stats
Pages shared: 15,234
Pages saved: 12,456 (48 MB)
Implementation Hints:
Memory ballooning concept:
BEFORE BALLOONING:
┌────────────────────────────────────────────┐
│ GUEST MEMORY (4GB) │
│ ████████████████████████████████████████ │
│ All allocated to guest │
└────────────────────────────────────────────┘
AFTER INFLATING BALLOON (1GB):
┌────────────────────────────────────────────┐
│ GUEST MEMORY (4GB) │
│ ████████████████████████ ░░░░░░░░░░░░░░ │
│ Guest usable (3GB) │ Balloon (1GB) │
│ │ Returned to │
│ │ host │
└────────────────────────────────────────────┘
The balloon driver (in guest):
- Receives “inflate” request from host
- Allocates memory pages inside the guest
- Tells host which guest-physical pages are in balloon
- Host can unmap those pages, freeing physical memory
Host-side:
- Uses virtio-balloon device for communication
- Sends inflate/deflate commands
- Tracks which pages are ballooned
- Can madvise(MADV_DONTNEED) to release pages
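A minimal sketch of that host-side release step (guest_mem is the mmap'ed guest memory region; the function name is illustrative):
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>
int release_ballooned_range(void *guest_mem, uint64_t gpa, size_t len)
{
    /* The guest agreed not to touch these pages until the balloon deflates,
     * so the host can drop their backing memory. */
    return madvise((char *)guest_mem + gpa, len, MADV_DONTNEED);
}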
Learning milestones:
- Virtio balloon device works → You understand virtio
- Inflate/deflate works → You understand memory ballooning
- Host reclaims memory → You understand the full cycle
- Multiple VMs balanced → You understand resource management
Project 12: Snapshot & Clone System
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Storage / Copy-on-Write
- Software or Tool: Custom implementation
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A snapshot and clone system using copy-on-write (CoW), allowing instant snapshots and space-efficient clones of block devices.
Why it teaches Nutanix storage: Nutanix provides instant VM snapshots and clones using CoW. This is critical for backup, testing, and disaster recovery.
Core challenges you’ll face:
- Copy-on-write block management → maps to Nutanix snapshot implementation
- Reference counting for shared blocks → maps to extent reference tracking
- Crash consistency for CoW → maps to snapshot durability
- Clone chain management → maps to linked clone VMs
Key Concepts:
- Copy-on-Write: “Operating Systems: Three Easy Pieces” Ch. 43 - Arpaci-Dusseau
- B-tree for Snapshots: ZFS/Btrfs design
- Reference Counting: General CS concepts
- Crash Consistency: “Operating Systems: Three Easy Pieces” Ch. 42
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Projects 1, 2 (Block Device, File System).
Real world outcome:
$ # Create base volume
$ ./cow_storage create base.vol 10G
Created volume: base.vol (10GB)
$ # Write some data
$ ./cow_storage write base.vol 0 < /tmp/data.bin
Wrote 1GB to base.vol
$ # Take snapshot (instant!)
$ time ./cow_storage snapshot base.vol snap1
Created snapshot: snap1
real 0m0.001s # Instant!
$ # Write more data to base
$ ./cow_storage write base.vol 1G < /tmp/more_data.bin
Wrote 500MB to base.vol (CoW: 125 new blocks)
$ # Snapshot uses no additional space for unchanged blocks
$ ./cow_storage stats
base.vol: 1.5GB used, 1.5GB unique
snap1: 1.0GB used, 0.0GB unique (100% shared with base)
$ # Create clone from snapshot (also instant!)
$ ./cow_storage clone snap1 clone1
Created clone: clone1
$ # Clone is writable, independent
$ ./cow_storage write clone1 0 < /tmp/different.bin
Wrote 100MB to clone1 (CoW: 25 new blocks)
Implementation Hints:
Copy-on-Write storage structure:
┌─────────────────────────────────────────────────────────────┐
│ BLOCK MAPPING TABLE │
├─────────────────────────────────────────────────────────────┤
│ Virtual Block 0 → Physical Block 17 (refcount: 2) │
│ Virtual Block 1 → Physical Block 23 (refcount: 1) │
│ Virtual Block 2 → Physical Block 17 (refcount: 2) ← shared! │
│ ... │
└─────────────────────────────────────────────────────────────┘
WRITE to Virtual Block 0 (shared):
1. Allocate new physical block (e.g., 42)
2. Copy data from block 17 to block 42
3. Apply write to block 42
4. Update mapping: Virtual Block 0 → Physical Block 42
5. Decrement refcount of block 17 (now 1)
Key data structures:
struct volume {
char name[64];
uint64_t size;
struct volume *parent; // For clones
uint64_t *block_map; // Virtual → Physical
uint64_t snapshot_epoch; // For ordering
};
struct physical_block {
uint64_t id;
uint32_t refcount;
uint8_t data[BLOCK_SIZE];
};
Snapshot creation:
- Create new volume struct pointing to same block_map
- Increment refcount of all physical blocks
- Mark both volumes as CoW
- Done! (No data copying)
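A sketch of the CoW write path described above, with the refcount and allocation helpers assumed (a freshly allocated block starts with refcount 1):
#include <stdint.h>
void cow_write(struct volume *vol, uint64_t vblock, const void *data)
{
    uint64_t pblock = vol->block_map[vblock];
    if (refcount_of(pblock) > 1) {                 /* block shared with a snapshot/clone */
        uint64_t fresh = alloc_physical_block();   /* assumed helper */
        copy_block(fresh, pblock);                 /* preserve old contents for other readers */
        refcount_dec(pblock);
        vol->block_map[vblock] = fresh;            /* remap this volume to its private copy */
        pblock = fresh;
    }
    write_block(pblock, data);                     /* apply the new data */
}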
Learning milestones:
- Basic CoW write works → You understand copy-on-write
- Snapshots are instant → You understand metadata-only snapshots
- Clones share blocks → You understand reference counting
- Crash recovery works → You understand CoW crash consistency
Project 13: Deduplication Engine
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Storage / Data Reduction
- Software or Tool: Custom implementation
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A content-addressable deduplication engine that identifies duplicate blocks across all storage and stores each unique block only once.
Why it teaches Nutanix storage efficiency: Nutanix uses deduplication to reduce storage consumption. When 100 VMs have the same OS blocks, they’re stored once. Understanding this means understanding storage efficiency.
Core challenges you’ll face:
- Fast content hashing → maps to fingerprint computation
- Hash table design for billions of entries → maps to dedup metadata
- Inline vs. post-process dedup → maps to performance tradeoffs
- Handling hash collisions → maps to data integrity
Key Concepts:
- Content-Addressable Storage: CAS fundamentals
- SHA-256/xxHash: Hashing for fingerprints
- Bloom Filters: For fast lookup
- Hash Table Scaling: “Designing Data-Intensive Applications” Ch. 3
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Projects 1, 12 (Block Device, Snapshots), hash tables.
Real world outcome:
$ # Enable deduplication on storage
$ ./dedup_storage init --volume data.vol --hash sha256
$ # Write 10 identical VMs (each 10GB)
$ for i in {1..10}; do
./dedup_storage write data.vol vm$i.img < base_ubuntu.img
done
$ ./dedup_storage stats data.vol
Total logical data: 100 GB (10 VMs × 10GB)
Total physical data: 10.2 GB
Dedup ratio: 9.8:1
Unique blocks: 2,621,440
Duplicate blocks: 23,592,960 (saved)
$ # Write a VM with 90% same content
$ ./dedup_storage write data.vol vm11.img < similar_ubuntu.img
Analyzed: 2,621,440 blocks
Duplicate: 2,359,296 (90%)
New: 262,144 (10%)
Physical space used: 1 GB (not 10 GB!)
Implementation Hints:
Deduplication architecture:
Write Request (block + data)
│
▼
┌─────────────────────┐
│ Compute Hash │
│ (SHA256/xxHash) │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Lookup in │
│ Hash Index │──────► Found? Return existing block ID
└──────────┬──────────┘
│ Not found
▼
┌─────────────────────┐
│ Store Block │
│ Update Index │
└─────────────────────┘
Key data structures:
struct dedup_index {
// Hash → Physical Block ID
struct hash_entry {
uint8_t hash[32]; // SHA-256
uint64_t block_id;
uint32_t refcount;
} *entries;
uint64_t capacity;
uint64_t count;
// Bloom filter for fast negative lookup
struct bloom_filter *bf;
};
Inline dedup workflow:
- Client sends write(offset, data)
- Compute hash of data block
- Check bloom filter (fast “probably not exists” test)
- If bloom says maybe exists, check hash table
- If exists: increment refcount, return existing block_id
- If not exists: allocate new block, store data, add to index
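A sketch of that inline workflow (sha256(), the index_* and bloom_* calls, and store_new_block() are assumed helpers):
#include <stdint.h>
#include <stddef.h>
uint64_t dedup_write(struct dedup_index *idx, const void *data, size_t len)
{
    uint8_t hash[32];
    sha256(data, len, hash);                          /* content fingerprint */
    if (bloom_maybe_contains(idx->bf, hash)) {        /* fast "maybe exists" test */
        struct hash_entry *e = index_lookup(idx, hash);
        if (e) { e->refcount++; return e->block_id; } /* duplicate: share existing block */
    }
    uint64_t block_id = store_new_block(data, len);   /* new unique block */
    index_insert(idx, hash, block_id);
    bloom_add(idx->bf, hash);
    return block_id;
}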
Learning milestones:
- Hash computation works → You understand content addressing
- Duplicates detected → You understand dedup lookup
- Space savings measured → You understand dedup ratio
- Performance acceptable → You understand inline vs. post-process tradeoffs
Project 14: Erasure Coding (EC-X)
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Storage / Error Correction / Math
- Software or Tool: Reed-Solomon or ISA-L library
- Main Book: “Serious Cryptography, 2nd Edition” by Jean-Philippe Aumasson
What you’ll build: An erasure coding system that provides fault tolerance with less overhead than replication (e.g., 1.5x instead of 2x or 3x).
Why it teaches Nutanix storage efficiency: Nutanix EC-X uses erasure coding for cold data, reducing storage overhead while maintaining fault tolerance. This is advanced storage optimization.
Core challenges you’ll face:
- Understanding Reed-Solomon math → maps to EC fundamentals
- Stripe layout and encoding → maps to EC-X implementation
- Partial stripe handling → maps to small write problem
- Degraded read performance → maps to EC tradeoffs
Key Concepts:
- Reed-Solomon Codes: “Error Control Coding” - Lin & Costello
- Galois Field Arithmetic: GF(2^8) operations
- Stripe Width Selection: Performance vs. overhead
- ISA-L Library: Intel’s optimized EC implementation
Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Linear algebra basics, Project 9 (Replication).
Real world outcome:
$ # Create EC storage pool (4 data + 2 parity)
$ ./ec_storage init --pool ecpool --data-chunks 4 --parity-chunks 2
$ # Write data
$ ./ec_storage write ecpool large_file.bin
Striping: 4+2 across 6 nodes
Total data: 10 GB
Parity overhead: 50% (5 GB parity)
Total space: 15 GB
$ # Compare to RF3
$ ./ec_storage simulate-rf3 large_file.bin
RF3 overhead: 200% (20 GB for 10 GB data)
Total space: 30 GB
EC saves: 15 GB (50% less than RF3!)
$ # Simulate 2 node failures
$ ./ec_storage fail-nodes 2 4
$ # Data still readable!
$ ./ec_storage read ecpool large_file.bin > /tmp/recovered.bin
Reading in degraded mode (2 nodes failed)
Reconstructing from parity...
Read complete.
$ diff large_file.bin /tmp/recovered.bin
# No difference!
Implementation Hints:
Erasure coding stripe layout (4+2 example):
DATA PARITY
┌────────┬────────┬────────┬────────┬────────┬────────┐
│ D0 │ D1 │ D2 │ D3 │ P0 │ P1 │
│ Node 1 │ Node 2 │ Node 3 │ Node 4 │ Node 5 │ Node 6 │
└────────┴────────┴────────┴────────┴────────┴────────┘
P0 = D0 ⊕ D1 ⊕ D2 ⊕ D3 (XOR parity)
P1 = Computed via Reed-Solomon (can recover any 2 failures)
The math (simplified):
- Work in Galois Field GF(2^8) - byte operations
- Encoding matrix multiplies data vector
- Parity chunks are linear combinations of data chunks
- To recover, solve system of equations with surviving chunks
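To make the P0 formula concrete, a sketch of byte-wise XOR parity over k data chunks (this alone tolerates one lost chunk; the Reed-Solomon parity adds the second):
#include <stddef.h>
#include <string.h>
void xor_parity(unsigned char *data[], int k, size_t chunk_size,
                unsigned char *parity)
{
    memset(parity, 0, chunk_size);
    for (int i = 0; i < k; i++)
        for (size_t j = 0; j < chunk_size; j++)
            parity[j] ^= data[i][j];   /* P0 = D0 xor D1 xor ... xor D(k-1) */
}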
Using Intel ISA-L (recommended for production):
#include <isa-l.h>
// Encode
unsigned char *data[4]; // 4 data chunks
unsigned char *parity[2]; // 2 parity chunks
unsigned char encode_matrix[6*4];
unsigned char g_tbls[2*4*32];
gf_gen_rs_matrix(encode_matrix, 6, 4);
ec_init_tables(4, 2, &encode_matrix[4*4], g_tbls);
ec_encode_data(chunk_size, 4, 2, g_tbls, data, parity);
// Decode (recover from 2 failures)
// ... (invert submatrix, reconstruct)
Learning milestones:
- Encoding works → You understand parity computation
- Single failure recovery → You understand XOR reconstruction
- Double failure recovery → You understand Reed-Solomon
- Performance is acceptable → You understand vectorization (SIMD)
Project 15: Cluster Health Monitor
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Distributed Systems / Monitoring
- Software or Tool: Custom implementation
- Main Book: “Site Reliability Engineering” by Google
What you’ll build: A cluster health monitoring system with failure detection, alerting, and automatic remediation—like Nutanix Prism’s health checks.
Why it teaches Nutanix operations: Nutanix constantly monitors cluster health, detects failures, and triggers automatic repair. Understanding this means understanding self-managing infrastructure.
Core challenges you’ll face:
- Distributed failure detection → maps to node heartbeating
- Consensus on cluster state → maps to quorum decisions
- Cascading failure prevention → maps to admission control
- Alert aggregation → maps to Prism alerting
Key Concepts:
- Failure Detectors: Phi Accrual Failure Detector
- Gossip Protocols: SWIM paper
- Alert Management: “Site Reliability Engineering” Ch. 29
- Chaos Engineering: Principles of controlled failure
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Projects 7, 8 (Raft, KV Store).
Real world outcome:
$ ./cluster_monitor --join node1:8001,node2:8001,node3:8001
$ ./cluster_monitor status
Cluster Health: HEALTHY
Nodes: 3/3 online
node1: HEALTHY (cpu: 45%, mem: 62%, disk: 34%)
node2: HEALTHY (cpu: 23%, mem: 58%, disk: 41%)
node3: HEALTHY (cpu: 67%, mem: 71%, disk: 28%)
Storage: HEALTHY
Capacity: 10TB / 30TB (33%)
RF: 3 (all data protected)
Recent Events: (last 24h)
[INFO] node2 disk sdb: latency spike (15ms avg → 45ms)
[WARN] node3 memory: usage above 70%
$ # Simulate node failure
$ ssh node2 "sudo systemctl stop cluster-monitor"
$ ./cluster_monitor status
Cluster Health: DEGRADED
Nodes: 2/3 online
node1: HEALTHY
node2: UNREACHABLE (last seen 30s ago) ◄── Detected!
node3: HEALTHY
Active Alerts:
[CRITICAL] Node node2 unreachable - rebuild initiated
Rebuild Progress:
Data on node2: 3.2TB
Rebuilt: 1.1TB (34%)
ETA: 12 minutes
Implementation Hints:
Failure detection using Phi Accrual Detector:
struct phi_detector {
double *heartbeat_intervals; // Ring buffer of intervals
int sample_count;
double last_heartbeat;
};
double compute_phi(struct phi_detector *d, double now) {
double interval = now - d->last_heartbeat;
double mean = compute_mean(d->heartbeat_intervals, d->sample_count);
double variance = compute_variance(d->heartbeat_intervals, d->sample_count);
// Phi = -log10(1 - F(now - last_heartbeat))
// F is CDF of normal distribution
double phi = -log10(1.0 - normal_cdf(interval, mean, sqrt(variance)));
return phi;
}
// Phi > 8 means "very likely failed"
Health check framework:
struct health_check {
char *name;
int (*check_fn)(struct node *);
enum { CRITICAL, WARNING, INFO } severity;
int interval_seconds;
};
struct health_check checks[] = {
{"disk_space", check_disk_space, CRITICAL, 60},
{"memory_usage", check_memory, WARNING, 30},
{"cpu_usage", check_cpu, INFO, 10},
{"network_latency", check_network, WARNING, 30},
{"service_status", check_services, CRITICAL, 10},
};
Learning milestones:
- Heartbeat detection works → You understand failure detection
- Phi detector accurate → You understand adaptive thresholds
- Automatic rebuild triggers → You understand remediation
- Alert aggregation → You understand operational systems
Project 16: Unified Storage Controller (Mini-CVM)
- File: LEARN_NUTANIX_HCI_FROM_SCRATCH.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: All of the Above
- Software or Tool: All previous projects combined
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A unified storage controller that combines all previous components: block devices, OpLog, extent store, iSCSI/NFS, replication, snapshots, dedup, and health monitoring—a mini Controller VM.
Why it teaches Nutanix architecture: This is the capstone. The CVM is the heart of Nutanix—it’s where everything comes together. Building your own (simplified) CVM means truly understanding HCI.
Core challenges you’ll face:
- Integrating all subsystems → maps to CVM architecture
- I/O path optimization → maps to CVM performance
- Failure handling across layers → maps to CVM resilience
- Resource management → maps to CVM scheduling
Key Concepts:
- Software Architecture: “Fundamentals of Software Architecture” - Richards & Ford
- I/O Path Design: All previous projects
- Service Orchestration: How CVMs coordinate
- State Machine Design: Managing controller state
Difficulty: Master Time estimate: 4-6 weeks Prerequisites: All previous projects (1-15)
Real world outcome:
$ # Start mini-CVM
$ ./mini_cvm --config cvm.yaml --cluster-id 1 &
$ ./mini_cvm status
Mini-CVM v0.1 (node-1)
Cluster: 3 nodes (all healthy)
Storage Subsystems:
[✓] Block Device Layer
[✓] Extent Store (ext4-based)
[✓] OpLog (1GB on SSD)
[✓] Unified Cache (2GB)
[✓] Replication Engine (RF2)
[✓] Snapshot Manager
[✓] Deduplication (inline)
Protocols:
[✓] iSCSI Target (port 3260)
[✓] NFS v3 Server (port 2049)
Cluster Services:
[✓] Raft Consensus (leader)
[✓] Metadata Store
[✓] Health Monitor
$ # Create a datastore
$ ./mini_cvm datastore create prod-ds1 --size 100G --rf 2
$ # Export via NFS
$ ./mini_cvm datastore export prod-ds1 --protocol nfs --path /prod-ds1
$ # On a client
$ mount -t nfs cvm-vip:/prod-ds1 /mnt/datastore
$ ls /mnt/datastore
# Ready for VM storage!
$ # Show I/O path
$ ./mini_cvm trace-io --duration 10s
I/O Trace (10 seconds):
Total I/Os: 15,234
Read: 8,456 (55%)
Cache hit: 6,789 (80%)
Extent read: 1,667 (20%)
Write: 6,778 (45%)
OpLog: 6,778 (100%)
Drain to extent: 2,134 (async)
Avg latency: 0.3ms (cache), 1.2ms (SSD), 8.5ms (HDD)
Implementation Hints:
CVM architecture (simplified):
┌────────────────────────────────────────────────────────────────┐
│ MINI-CVM │
├────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ iSCSI Server│ │ NFS Server │ │ REST API │ Protocols │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ I/O DISPATCHER │ │
│ │ - Request routing │ │
│ │ - QoS/prioritization │ │
│ │ - Caching decisions │ │
│ └───────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ UNIFIED │ │ OPLOG │ │ EXTENT │ │
│ │ CACHE │ │ (WAL) │ │ STORE │ │
│ └─────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┴────────────────┘ │
│ │ │
│ ┌──────────────────────▼──────────────────────────────────┐ │
│ │ REPLICATION ENGINE │ │
│ │ - Sync writes to remote CVMs │ │
│ │ - Failure detection & rebuild │ │
│ └──────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼──────────────────────────────────┐ │
│ │ CLUSTER SERVICES │ │
│ │ - Raft consensus │ │
│ │ - Metadata store │ │
│ │ - Health monitoring │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
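One way to mirror the diagram in code is a top-level struct that owns each subsystem, plus a dispatcher whose only job is routing. Everything below is a hypothetical sketch: the types and the stubbed `cache_lookup`/`extent_read`/`oplog_write` hooks stand in for the components built in earlier projects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Opaque stand-ins for the subsystems from earlier projects.
struct unified_cache { int unused; };
struct oplog         { int unused; };
struct extent_store  { int unused; };

struct mini_cvm {
    struct unified_cache *cache;    // read cache (Unified Cache)
    struct oplog         *oplog;    // SSD write journal (OpLog)
    struct extent_store  *extents;  // persistent block store
    // replication engine, cluster services, ... omitted here
};

enum io_kind { IO_READ, IO_WRITE };

struct io_request {
    enum io_kind kind;
    uint64_t     lba;
    size_t       len;
    void        *buf;
};

// Stub entry points so the sketch compiles; the real versions come from
// the block device, extent store, and OpLog projects.
static bool cache_lookup(struct unified_cache *c, struct io_request *r) { (void)c; (void)r; return false; }
static int  extent_read (struct extent_store *e,  struct io_request *r) { (void)e; (void)r; return 0; }
static int  oplog_write (struct oplog *o,         struct io_request *r) { (void)o; (void)r; return 0; }

// The dispatcher only routes: reads try the cache and fall back to the
// extent store; writes always hit the OpLog first (see the write path below).
int cvm_dispatch(struct mini_cvm *cvm, struct io_request *req) {
    if (req->kind == IO_READ) {
        if (cache_lookup(cvm->cache, req))
            return 0;                          // cache hit
        return extent_read(cvm->extents, req);
    }
    return oplog_write(cvm->oplog, req);
}
```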
I/O path for a write (a compressed C sketch follows the list):
1. Receive the write request (iSCSI or NFS).
2. Check the dedup index; if the block is a duplicate, just increment its refcount.
3. Write to the OpLog (local SSD).
4. Replicate the OpLog entry to the remote CVM(s).
5. Wait for quorum acknowledgment.
6. Acknowledge the write to the client.
7. Background: drain the OpLog to the Extent Store.
8. Update metadata (block location).
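The same sequence compressed into C. All function names are illustrative stand-ins (stubbed to return success), so only the ordering and the sync/async split carry meaning:

```c
#include <stddef.h>
#include <stdint.h>

// Illustrative hooks into the subsystems from earlier projects.
static int dedup_lookup(const void *buf, size_t len)  { (void)buf; (void)len; return 0; } // 1 = duplicate
static int dedup_add_ref(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
static int oplog_append(uint64_t lba, const void *buf, size_t len)
    { (void)lba; (void)buf; (void)len; return 0; }
static int replicate_and_wait_quorum(uint64_t lba, const void *buf, size_t len)
    { (void)lba; (void)buf; (void)len; return 0; }

// Synchronous half of the write path (steps 1-6); draining the OpLog to the
// extent store and the metadata update (steps 7-8) run on a background thread.
int cvm_handle_write(uint64_t lba, const void *buf, size_t len) {
    if (dedup_lookup(buf, len))                         // step 2: inline dedup
        return dedup_add_ref(buf, len);                 // duplicate: bump refcount
    if (oplog_append(lba, buf, len) != 0)               // step 3: local SSD journal
        return -1;
    if (replicate_and_wait_quorum(lba, buf, len) != 0)  // steps 4-5: RF quorum ack
        return -1;
    return 0;                                           // step 6: ack to client
}
```

Acknowledging after quorum replication but before the drain is what lets the OpLog absorb write bursts while still guaranteeing RF-level durability.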
Integration checklist:
- Block device layer (Project 1)
- Extent-based file system (Project 2)
- OpLog (Project 3)
- TCP server (Project 4)
- iSCSI target (Project 5)
- NFS server (Project 6)
- Raft consensus (Project 7)
- Distributed KV (Project 8)
- Replication (Project 9)
- Snapshots (Project 12)
- Dedup (Project 13)
- Health monitor (Project 15)
Learning milestones:
- All subsystems integrated → You understand the full architecture
- I/O works end-to-end → You understand the data path
- Failures handled gracefully → You understand fault tolerance
- Performance is reasonable → You understand optimization
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor | Nutanix Component |
|---|---|---|---|---|---|
| 1. Block Device | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐ | Local Storage |
| 2. Extent FS | Advanced | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ | Extent Store |
| 3. OpLog (WAL) | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ | OpLog |
| 4. TCP Server | Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐ | Network Layer |
| 5. iSCSI Target | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | DSF iSCSI |
| 6. NFS Server | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | DSF NFS |
| 7. Raft Consensus | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Cluster Consensus |
| 8. Distributed KV | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Metadata Store |
| 9. Replication | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | RF2/RF3 |
| 10. KVM Hypervisor | Master | 2-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | AHV |
| 11. Memory Balloon | Expert | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ | Memory Mgmt |
| 12. Snapshots | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Snapshots |
| 13. Deduplication | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ | Dedup |
| 14. Erasure Coding | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | EC-X |
| 15. Health Monitor | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ | Prism Health |
| 16. Mini-CVM | Master | 4-6 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | CVM |
Recommended Learning Path
Phase 1: Storage Fundamentals (4-6 weeks)
- Project 1: Block Device - Start here. Understand how block storage works.
- Project 2: Extent FS - Build on blocks to create a file system.
- Project 3: OpLog - Learn write optimization with journaling.
Phase 2: Network Storage (6-8 weeks)
- Project 4: TCP Server - Foundation for all network protocols.
- Project 5: iSCSI Target - Block storage over network (critical for Nutanix).
- Project 6: NFS Server - File storage over network.
Phase 3: Distributed Systems (6-8 weeks)
- Project 7: Raft Consensus - How clusters agree on state.
- Project 8: Distributed KV - Metadata storage foundation.
- Project 9: Replication - Data protection via RF2/RF3.
Phase 4: Virtualization (4-6 weeks)
- Project 10: KVM Hypervisor - Understand how VMs actually work.
- Project 11: Memory Ballooning - VM resource management.
Phase 5: Storage Optimization (4-6 weeks)
- Project 12: Snapshots - Copy-on-write for instant backups.
- Project 13: Deduplication - Reduce storage consumption.
- Project 14: Erasure Coding - Space-efficient fault tolerance.
Phase 6: Operations & Integration (6-8 weeks)
- Project 15: Health Monitor - Cluster self-management.
- Project 16: Mini-CVM - Put it all together!
Total estimated time: 6-12 months (depending on your pace)
Capstone: Full HCI Stack
After completing all projects, you’ll have built:
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR MINI-HCI PLATFORM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ MANAGEMENT PLANE │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Health Monitor │ Cluster State │ REST API │ CLI │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ COMPUTE PLANE │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ KVM Hypervisor │ Memory Ballooning │ VM Management │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ STORAGE PLANE (Your Mini-CVM × N nodes) │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ iSCSI │ │ NFS │ │ OpLog │ │ Extent │ │ Cache │ │ │
│ │ │ Target │ │ Server │ │ (WAL) │ │ Store │ │ │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │ │
│ │ └───────────┴───────────┼───────────┴───────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────────────▼────────────────────────────┐ │ │
│ │ │ Replication │ Snapshots │ Dedup │ Erasure Coding │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ CLUSTER PLANE │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Raft Consensus │ Distributed KV │ Failure Detection │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Congratulations! You now understand how Hyperconverged Infrastructure works from first principles. You’ve built every major component in C and understand the trade-offs, challenges, and design decisions that go into production systems like Nutanix.
Essential Resources
Books (In Order of Relevance)
- “Designing Data-Intensive Applications” by Martin Kleppmann - The book for distributed systems
- “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau - Free online, excellent for storage
- “The Linux Programming Interface” by Michael Kerrisk - Essential for systems programming
- “Linux Device Drivers” by Corbet & Rubini - For kernel-level understanding
- “Understanding the Linux Kernel” by Bovet & Cesati - Deep Linux internals
- “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Foundational knowledge
Papers
- The Nutanix Bible - Official Nutanix architecture guide
- Raft Paper - Consensus algorithm
- Google File System Paper - Distributed storage inspiration
- HDFS Architecture - Open-source DFS
Online Resources
- Nutanix Bible - Storage Deep Dive
- KVM API Documentation
- Raft Visualization
- Linux Kernel Source (Bootlin)
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | Block Device Emulator | C |
| 2 | Simple Extent-Based File System | C |
| 3 | Write-Ahead Log (OpLog) | C |
| 4 | TCP Server Framework | C |
| 5 | iSCSI Target Implementation | C |
| 6 | NFS Server (v3) | C |
| 7 | Raft Consensus Implementation | C |
| 8 | Distributed Key-Value Store | C |
| 9 | Data Replication Engine (RF2/RF3) | C |
| 10 | Simple Hypervisor with KVM | C |
| 11 | VM Memory Ballooning & Overcommit | C |
| 12 | Snapshot & Clone System | C |
| 13 | Deduplication Engine | C |
| 14 | Erasure Coding (EC-X) | C |
| 15 | Cluster Health Monitor | C |
| 16 | Unified Storage Controller (Mini-CVM) | C |
Sources
- Nutanix Bible - Storage Concepts
- Nutanix Distributed Storage Fabric – Polar Clouds
- Deep Dive into Nutanix CVM
- What is Hyperconverged Infrastructure - IBM
- What is HCI - Intel
- KVM Internals - How a VM is Created
- What is KVM - Red Hat
- Raft Consensus Algorithm
- Raft Visualization
- HDFS Architecture - Apache
- The Architecture of Open Source Applications - HDFS
- open-iscsi on GitHub
- Linux Kernel iSCSI Source - Bootlin