Project 7: The “Poor Man’s Docker” (Container Runtime)
Run a command in isolated PID and mount namespaces with a minimal rootfs.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2 weeks |
| Language | Go or C (Alt: Rust, Python) |
| Prerequisites | Project 2, root access |
| Key Topics | namespaces, cgroups, chroot/pivot_root |
1. Learning Objectives
By completing this project, you will:
- Create a new PID namespace with `clone` or `unshare`.
- Mount a new `/proc` for isolated process views.
- Set up a root filesystem with `chroot` or `pivot_root`.
- Apply basic resource limits with cgroups.
2. Theoretical Foundation
2.1 Core Concepts
- Namespaces: Isolate process trees, mounts, hostnames, and networks.
- PID 1 role: The first process in a PID namespace acts as its init. It must reap orphaned children, and when it exits the kernel kills every other process in the namespace.
- Cgroups: Enforce resource limits for CPU and memory.
2.2 Why This Matters
Containers are just processes with restricted views of kernel resources. This project reveals the mechanics.
2.3 Historical Context / Background
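Namespaces and cgroups entered the kernel piecemeal: mount namespaces in Linux 2.4.19 (2002), cgroups in 2.6.24 (2008), and user namespaces completed in 3.8 (2013). Container runtimes combine these primitives into what we now call a container.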
2.4 Common Misconceptions
- “Containers are VMs”: They share the kernel.
- “chroot is a container”: chroot changes only the filesystem root. Without namespaces, the process tree, hostname, and network are still shared with the host.
3. Project Specification
3.1 What You Will Build
A small runtime that spawns a command in new namespaces with a private root and /proc.
3.2 Functional Requirements
- Create PID, mount, and UTS namespaces.
- Mount a fresh `/proc` inside the container.
- Set the hostname inside the UTS namespace.
- Run a command as PID 1 inside container.
3.3 Non-Functional Requirements
- Safety: Clean up mounts and processes on exit.
- Reliability: PID 1 should reap child processes.
- Usability: Simple CLI of the form `run <cmd>`.
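The `run <cmd>` interface can be parsed with very little code. The following is a minimal sketch, not a prescribed design: `parse_cli` is a hypothetical helper name, and the struct mirrors the `child_config` shape from section 4.3. It relies on the C guarantee that `argv[argc]` is `NULL`, so the tail of `argv` can be passed to `execvp` unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the struct in section 4.3. */
struct child_config {
    char **argv;   /* command and its arguments, NULL-terminated */
    char *rootfs;  /* root filesystem path; not set by this parser */
};

/*
 * Parse "prog run <cmd> [args...]".
 * Returns 0 on success, -1 on a usage error.
 * parse_cli is a hypothetical name used only for this sketch.
 */
int parse_cli(int argc, char **argv, struct child_config *cfg)
{
    if (argc < 3 || strcmp(argv[1], "run") != 0)
        return -1;
    /* Everything after "run" is the command line to execute. */
    cfg->argv = &argv[2];
    return 0;
}
```

In `main`, a return value of -1 would typically lead to printing a usage line and exiting with a non-zero status.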
3.4 Example Usage / Output
$ sudo ./mycontainer run /bin/bash
container# ps aux
PID COMMAND
1 /bin/bash
2 ps aux
3.5 Real World Outcome
You will enter a shell with an isolated PID tree and hostname:
$ sudo ./mycontainer run /bin/bash
container# hostname
container-host
container# ps aux
PID COMMAND
1 /bin/bash
2 ps aux
4. Solution Architecture
4.1 High-Level Design
parent -> clone new namespaces -> setup rootfs/proc -> exec command -> monitor PID 1
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Namespace setup | PID/UTS/MNT | Use clone flags |
| Rootfs setup | chroot/pivot_root | chroot first |
| Proc mount | Fresh /proc inside the container | mount(2) with filesystem type "proc" |
| Cgroups | Resource limits | optional in v1 |
4.3 Data Structures
struct child_config { char **argv; char *rootfs; };
4.4 Algorithm Overview
Key Algorithm: Container Spawn
- `clone` with `CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS`.
- In child, set hostname and rootfs.
- Mount /proc, exec command.
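The three steps above can be sketched in C roughly as follows. This is one possible arrangement under stated assumptions, not a reference implementation: the function names and the 1 MiB stack size are choices made for this sketch, the hostname `container-host` is taken from the example output, the rootfs is assumed to contain a `/proc` directory, and the call needs root (or `CAP_SYS_ADMIN`) to succeed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)  /* assumed child stack size */

struct child_config { char **argv; char *rootfs; };

/* Step 1 flags. SIGCHLD in the low byte lets the parent waitpid() on the child. */
int container_clone_flags(void)
{
    return CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | SIGCHLD;
}

/* Runs as PID 1 inside the new namespaces. */
static int child_main(void *arg)
{
    struct child_config *cfg = arg;
    const char *name = "container-host";

    /* Step 2: hostname and root filesystem. */
    if (sethostname(name, strlen(name)) == -1)
        return 1;
    /* Keep mount changes from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return 1;
    if (chroot(cfg->rootfs) == -1 || chdir("/") == -1)
        return 1;

    /* Step 3: fresh procfs for the new PID namespace, then exec. */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        return 1;
    execvp(cfg->argv[0], cfg->argv);
    return 127;  /* only reached if exec failed */
}

/* Returns the child's PID as seen by the host, or -1 on error. */
pid_t spawn_container(struct child_config *cfg)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_main, stack + STACK_SIZE,
                      container_clone_flags(), cfg);
    /* Without CLONE_VM the child has its own copy of this memory. */
    free(stack);
    return pid;
}
```

The parent would then call `waitpid` on the returned PID. Using `unshare` followed by `fork` is an equally valid alternative to `clone`.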
Complexity Analysis:
- Time: O(1) per spawn
- Space: O(1)
5. Implementation Guide
5.1 Development Environment Setup
go version
# or gcc --version
5.2 Project Structure
project-root/
├── main.c
└── README.md
5.3 The Core Question You’re Answering
“What actually is a container, and how is it different from a VM?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- namespaces (man 7 namespaces)
- chroot vs pivot_root
- PID 1 behavior
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How will PID 1 handle signals and reaping?
- Will you isolate networking or use host network?
- How will you clean up mounts on exit?
5.6 Thinking Exercise
The ps lie
Explain why ps shows host processes if you don’t remount /proc.
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a namespace?”
- “Why is PID 1 special?”
- “How do cgroups differ from namespaces?”
5.8 Hints in Layers
Hint 1: Use unshare first
Experiment with unshare --fork --pid --mount-proc /bin/bash.
Hint 2: Set hostname
Use sethostname in UTS namespace.
Hint 3: Mount /proc
Mount a fresh procfs inside the container root.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Namespaces | “TLPI” | namespaces section |
| Containers | “Container Security” | Ch. 2-3 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- Spawn a process in new PID namespace.
Tasks:
- clone/unshare with CLONE_NEWPID.
- Exec a command.
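Checkpoint: The command sees itself as PID 1 (for example, `echo $$` in the shell prints 1). ps still lists host processes until /proc is remounted in Phase 2.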
Phase 2: Core Functionality (4-5 days)
Goals:
- Add rootfs and /proc mount.
Tasks:
- chroot into rootfs.
- mount proc.
Checkpoint: ps inside the container lists only container processes.
Phase 3: Polish & Edge Cases (3-4 days)
Goals:
- Add hostname and basic cgroups.
Tasks:
- Set hostname.
- Add memory/cpu limits.
Checkpoint: Limits enforced and cleanup works.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Root isolation | chroot vs pivot_root | chroot first | Simpler |
| Networking | host vs isolated | host first | Less setup |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Namespace | PID isolation | ps output |
| Rootfs | chroot works | ls / |
| Proc | proper mount | ps shows 2 processes |
6.2 Critical Test Cases
- PID 1 receives SIGTERM and exits.
- /proc mounted inside container.
- Host processes hidden from container.
6.3 Test Data
Command: /bin/bash
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No /proc mount | ps broken | mount proc inside |
| PID 1 never reaps | Zombie children accumulate | reap in init loop |
| chroot only | host visible | use namespaces |
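The "reap in init loop" fix can be sketched as below, assuming your runtime keeps a small init of its own as PID 1 and forks the user's command as a child. `reap_until_exit` is a hypothetical name, and mapping a fatal signal to 128 plus the signal number follows the common shell convention rather than anything this project requires.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child until none remain, including orphans that were
 * reparented to us. Returns the exit status of main_child, or -1 if
 * it was never seen.
 */
int reap_until_exit(pid_t main_child)
{
    int main_status = -1;

    for (;;) {
        int status;
        pid_t pid = wait(&status);
        if (pid == -1) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: retry */
            break;         /* ECHILD: no children left */
        }
        if (pid == main_child) {
            if (WIFEXITED(status))
                main_status = WEXITSTATUS(status);
            else if (WIFSIGNALED(status))
                main_status = 128 + WTERMSIG(status);
        }
    }
    return main_status;
}
```

Returning the main child's status lets the runtime exit with the same code as the command it ran.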
7.2 Debugging Strategies
- Start with the `unshare` CLI to validate.
- Use `strace` to confirm mount and clone calls.
7.3 Performance Traps
None significant for a minimal runtime.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a `--hostname` flag.
- Add simple logging.
8.2 Intermediate Extensions
- Add network namespace with veth pair.
- Add overlayfs rootfs.
8.3 Advanced Extensions
- Add image format and unpacking.
- Implement minimal OCI runtime spec.
9. Real-World Connections
9.1 Industry Applications
- Container runtimes and sandboxing tooling.
9.2 Related Open Source Projects
- runc: https://github.com/opencontainers/runc
- nsjail: https://github.com/google/nsjail
9.3 Interview Relevance
- Namespaces, cgroups, and PID 1 behavior are common interview topics.
10. Resources
10.1 Essential Reading
- namespaces(7) - `man 7 namespaces`
- clone(2) - `man 2 clone`
10.2 Video Resources
- Container internals talks (search “Linux namespaces cgroups”)
10.3 Tools & Documentation
- unshare(1) - `man 1 unshare`
10.4 Related Projects in This Series
- Build Your Own Shell: command execution inside containers.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain namespace types.
- I can explain PID 1 responsibilities.
- I can describe cgroup limits.
11.2 Implementation
- Container runs with isolated PID tree.
- /proc is correctly mounted.
- Hostname is set inside container.
11.3 Growth
- I can extend to network isolation.
- I can explain differences vs VMs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Run a command in a PID namespace with a new /proc.
Full Completion:
- Add rootfs isolation and hostname change.
Excellence (Going Above & Beyond):
- Add cgroup limits and network namespace.
This guide was generated from LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md. For the complete learning path, see the parent directory.