Project 10: The Poor Man’s Docker (Container Runtime)

Build a minimal container runtime that runs a process in its own PID and mount namespaces and applies cgroup limits.

Quick Reference

Attribute	Value
Difficulty	Expert
Time Estimate	2 weeks
Main Programming Language	Go or C
Alternative Programming Languages	Rust, Python
Coolness Level	See REFERENCE.md (Level 5)
Business Potential	See REFERENCE.md (Level 4)
Prerequisites	Process control, mounts, cgroups, root access
Key Topics	namespaces, pivot_root, cgroups, PID 1

1. Learning Objectives

By completing this project, you will:

Explain how namespaces provide isolated views of system resources.
Start a process as PID 1 inside a new PID namespace.
Mount a new root filesystem and /proc inside the container.
Apply resource limits using cgroups.

2. All Theory Needed (Per-Concept Breakdown)

Namespaces and Container Assembly

Fundamentals

Namespaces isolate a process’s view of global system resources like PIDs, mounts, and hostnames. Cgroups limit resource usage. A container is a process launched in new namespaces with limits and a dedicated root filesystem. The kernel is shared, but the process sees a different world. Understanding these primitives reveals that containers are not magic; they are controlled views and budgets built from standard kernel features.

Deep Dive

A container runtime is a program that orchestrates a precise sequence of syscalls to set up isolation. The first step is to create a process in new namespaces. PID namespaces provide an isolated process ID tree where the first process becomes PID 1 inside the container. Mount namespaces provide a separate mount table so the container can have its own filesystem layout. UTS namespaces isolate hostname and domain name. Network namespaces, if used, isolate network interfaces. The runtime can create these namespaces with clone (for a new child) or unshare (for the calling process) and then configure them before launching the target program. Note that unshare with the PID namespace flag does not move the caller itself: only children it creates afterwards enter the new PID namespace, and the first of them becomes PID 1.
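The namespace creation step described above can be sketched in C roughly as follows. This is a minimal sketch, not a reference implementation: the helper names (namespace_flags, spawn_in_namespaces), the struct child_args type, the 1 MiB stack size, and the hard-coded hostname "sandbox" are assumptions made for illustration. The clone call needs root or CAP_SYS_ADMIN to succeed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024) /* assumed size, ample for exec */

/* Hypothetical argument bundle handed to the child entry point. */
struct child_args {
    char **argv;   /* NULL-terminated command vector */
    int ns_flags;  /* CLONE_NEW* flags the child was created with */
};

/* Hypothetical helper: translate requested isolation into clone(2) flags. */
int namespace_flags(bool pid, bool mnt, bool uts)
{
    int flags = 0;
    if (pid) flags |= CLONE_NEWPID; /* child becomes PID 1 in a new PID namespace */
    if (mnt) flags |= CLONE_NEWNS;  /* child gets a private copy of the mount table */
    if (uts) flags |= CLONE_NEWUTS; /* child gets a private hostname */
    return flags;
}

/* Runs inside the new namespaces. */
static int child_main(void *arg)
{
    struct child_args *args = arg;
    /* Only touch the hostname when it is private to the container. */
    if (args->ns_flags & CLONE_NEWUTS) {
        const char *name = "sandbox"; /* illustrative fixed hostname */
        if (sethostname(name, strlen(name)) != 0) {
            perror("sethostname");
            return 1;
        }
    }
    execvp(args->argv[0], args->argv);
    perror("execvp"); /* reached only if exec failed */
    return 127;
}

/* Returns the child's PID as seen from the host, or -1 with errno set. */
pid_t spawn_in_namespaces(char **argv, int ns_flags)
{
    assert(argv != NULL && argv[0] != NULL);
    struct child_args args = { argv, ns_flags };
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      ns_flags | SIGCHLD, &args);
    /* Without CLONE_VM the child has its own copy of this memory. */
    free(stack);
    return pid;
}
```

A caller would typically pass namespace_flags(true, true, true) and then wait for the returned PID with waitpid.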

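The root filesystem is a critical piece. The runtime prepares a directory tree that contains the binaries and libraries needed by the container. It then switches the root to this tree using pivot_root, so that the process sees the new filesystem as /. A common mistake is to use only chroot. chroot changes only how the calling process resolves paths: the old root stays mounted in the namespace, and a privileged process that still holds a working directory or file descriptor outside the new root can escape. pivot_root is the more complete approach because it makes the new tree the root mount of the mount namespace and moves the old root to a directory underneath it, which allows the old root to be unmounted entirely. Note that pivot_root requires the new root to be a mount point, which is usually arranged by bind-mounting the directory onto itself.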
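One possible shape for the root switch is sketched below. It assumes the caller is already inside a new mount namespace and has the privileges to mount; the function names (old_root_path, enter_rootfs) and the .old_root directory name are illustrative choices, not part of any standard API. The pivot_root syscall is invoked through syscall(2) because not every C library exposes a wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: build "<rootfs>/.old_root" into out.
 * Returns 0 on success, -1 if the result does not fit. */
int old_root_path(const char *rootfs, char *out, size_t len)
{
    assert(rootfs != NULL && out != NULL);
    size_t need = strlen(rootfs) + strlen("/.old_root") + 1;
    if (need > len)
        return -1;
    snprintf(out, len, "%s/.old_root", rootfs);
    return 0;
}

/* Switch the root of the current mount namespace to rootfs.
 * Returns 0 on success, -1 with errno set on failure. */
int enter_rootfs(const char *rootfs)
{
    char put_old[PATH_MAX];
    if (old_root_path(rootfs, put_old, sizeof put_old) != 0)
        return -1;

    /* Keep mount changes from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return -1;
    /* pivot_root needs new_root to be a mount point: bind it onto itself. */
    if (mount(rootfs, rootfs, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -1;
    if (mkdir(put_old, 0700) != 0 && errno != EEXIST)
        return -1;
    /* New tree becomes /, the old root is moved to /.old_root. */
    if (syscall(SYS_pivot_root, rootfs, put_old) != 0)
        return -1;
    if (chdir("/") != 0)
        return -1;
    /* Detach the old root so the host tree is no longer reachable by path. */
    if (umount2("/.old_root", MNT_DETACH) != 0)
        return -1;
    return rmdir("/.old_root");
}
```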

The /proc filesystem must be mounted inside the container’s mount namespace. Without this, tools like ps either fail because /proc is empty or, if the host’s /proc is still visible, list host processes. Mounting procfs inside the namespace ensures that process introspection reflects the container’s PID namespace. This is a key insight: a procfs mount reflects the PID namespace of the process that mounted it, so it must be mounted by a process that is already inside the new PID namespace.
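The following sketch shows the mount call together with a small check of what ps would see. The helper names (mount_proc, count_visible_pids) are made up for this example, and the mount flags are a common hardening choice rather than a requirement. count_visible_pids simply counts the numeric directory entries under a procfs mount, which is where ps gets its process list.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <sys/mount.h>

/* Mount a fresh procfs at target (normally "/proc").
 * Call from a process that is already inside the new PID and
 * mount namespaces. Returns 0 on success, -1 with errno set. */
int mount_proc(const char *target)
{
    assert(target != NULL);
    return mount("proc", target, "proc",
                 MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL);
}

/* Count entries whose names are all digits: one per visible process.
 * Returns the count, or -1 if the directory cannot be opened. */
int count_visible_pids(const char *procdir)
{
    assert(procdir != NULL);
    DIR *dir = opendir(procdir);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        const char *p = entry->d_name;
        if (*p == '\0')
            continue;
        while (isdigit((unsigned char)*p))
            p++;
        if (*p == '\0') /* consumed the whole name: it was a PID */
            count++;
    }
    closedir(dir);
    return count;
}
```

Inside a correctly assembled container, count_visible_pids("/proc") should be small (the shell plus whatever it started), while on the host it returns the full process count.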

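Cgroups enforce resource limits. A container without cgroups is not a safe isolation boundary because it can still consume unlimited CPU or memory. The runtime should create a cgroup, set limits, and attach the container process to it. This ensures that the kernel enforces resource budgets. Because the cgroup filesystem lives on the host (typically under /sys/fs/cgroup), the limits are most easily written by the parent process, or by the child before it pivots away from the host root. For deterministic demos, use fixed limits and a controlled workload.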

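PID 1 behavior is special. In a PID namespace, the first process has PID 1 and is responsible for reaping zombies and handling signals correctly. Many subtle container bugs appear because PID 1 behaves differently from normal processes. Signals sent from inside the namespace are delivered to PID 1 only if it has installed a handler for them, so an unhandled SIGTERM is silently ignored. Orphaned processes in the namespace are reparented to PID 1, which must wait on them or they remain as zombies. When PID 1 exits, the kernel kills every other process in the namespace. A minimal runtime should be aware of this and either run a simple init process or implement basic reaping itself.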
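A basic reaping loop for a process acting as PID 1 might look like the sketch below. The function name reap_until_exit and the 128 + signal convention for the return value are choices made for this example (the latter borrowed from common shell behavior). It waits on any child, which collects adopted orphans along the way, and returns once the main child has exited.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that exits until main_child itself exits.
 * Returns main_child's exit code, 128 + signal number if it was
 * killed by a signal, or -1 if it was never seen. */
int reap_until_exit(pid_t main_child)
{
    assert(main_child > 0);
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0); /* -1: any child, including orphans */
        if (pid < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal handler, retry */
            return -1;     /* ECHILD: no children left */
        }
        if (pid != main_child)
            continue;      /* a zombie was reaped, keep waiting */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
    }
}
```

The init process would fork, exec the target program in the child, and call reap_until_exit with the child's PID, then exit with the returned code.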

Container assembly is therefore an orchestration of isolation (namespaces), filesystem setup (pivot_root and mounts), and resource control (cgroups). Each step has a clear contract and observable outcome, which is why this project is a deep test of your systems understanding.

How this fits in the project

You will apply this concept in §3.1 for container requirements, in §4.1 for runtime architecture, and in §5.10 for the implementation phases. It builds directly on P09-cgroup-resource-governor.md.

Definitions & key terms

Namespace: Kernel feature that isolates a resource view.
PID 1: First process in a PID namespace.
pivot_root: Switches the root filesystem.
Mount namespace: Isolates mount table and filesystem view.
Container: Process with isolated namespaces and limits.

Mental model diagram

Host kernel
  |
  +-- Namespace set -> process (PID 1)
  |        |
  |        +-- mount /proc
  |        +-- new root filesystem

How it works

Create new namespaces for PID and mount.
Set up root filesystem and pivot.
Mount procfs inside namespace.
Apply cgroup limits.
Launch target program as PID 1.

Minimal concrete example

Inside container:
PID 1 -> /bin/sh
hostname -> sandbox
ps -> shows only container processes

Common misconceptions

“Containers are lightweight VMs.” They share the host kernel.
“chroot equals container.” chroot does not isolate mounts or PIDs.
“cgroups are optional.” They are required for resource isolation.

Check-your-understanding questions

Why must /proc be mounted inside the namespace?
What is special about PID 1?
Why is pivot_root safer than chroot?
How do cgroups and namespaces complement each other?

Check-your-understanding answers

/proc reflects the PID namespace where it is mounted.
PID 1 must reap zombies and handles signals differently.
pivot_root swaps roots and allows old root to be unmounted.
Namespaces isolate views; cgroups enforce resource limits.

Real-world applications

Container runtimes and orchestrators.
Sandbox environments for untrusted code.
Lightweight isolation in CI pipelines.

Where you’ll apply it

See §3.1 What You Will Build and §4.2 Key Components.
Also used in: P09-cgroup-resource-governor.md

References

namespaces(7) man page: https://man7.org/linux/man-pages/man7/namespaces.7.html
cgroup v2 docs: https://docs.kernel.org/admin-guide/cgroup-v2.html

Key insights

Containers are composed, not invented: namespaces plus cgroups plus mounts.

Summary

If you can build a minimal container runtime, you understand the core of Docker.

Homework/Exercises to practice the concept

Identify which namespaces are required for a minimal container.
Explain why ps fails without a /proc mount.

Solutions to the homework/exercises

PID and mount namespaces are the minimum for basic isolation.
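ps reads process information from /proc, so without a procfs mounted inside the namespace it has nothing to read; mounting /proc from inside the container makes it show container processes.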

3. Project Specification

3.1 What You Will Build

A minimal container runtime that launches a command in new PID, mount, and UTS namespaces, sets a hostname, switches to a new root filesystem, mounts /proc, and applies cgroup limits.

3.2 Functional Requirements

Namespace isolation: PID and mount namespaces at minimum, plus a UTS namespace so the hostname change does not affect the host.
Filesystem isolation: new root filesystem and /proc mount.
Resource limits: CPU and memory limits via cgroups.

3.3 Non-Functional Requirements

Performance: container startup in under 1 second.
Reliability: cleanup of mounts and cgroups.
Usability: clear CLI and error messages.

3.4 Example Usage / Output

$ sudo ./mycontainer run /bin/sh
container# hostname
sandbox
container# ps
PID  USER  CMD
1    root  /bin/sh
2    root  ps

3.5 Data Formats / Schemas / Protocols

Root filesystem layout: /bin, /lib, /proc, /tmp.
CLI syntax: mycontainer run <command>.

3.6 Edge Cases

Missing root filesystem.
Failure to mount /proc.
Lack of privileges for namespaces or cgroups.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

Run as root or with sufficient capabilities.
From the project root, run: sudo ./mycontainer run /bin/sh

3.7.2 Golden Path Demo (Deterministic)

Use a fixed root filesystem and hostname.

3.7.3 If CLI: Exact terminal transcript

$ sudo ./mycontainer run /bin/sh
container# hostname
sandbox
container# ps
PID  USER  CMD
1    root  /bin/sh
2    root  ps
container# exit
# exit code: 0

Failure demo (deterministic), run with the root filesystem directory missing:

$ sudo ./mycontainer run /bin/sh
error: missing root filesystem
# exit code: 2

Exit codes:

0 success
2 missing root filesystem
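The exit code contract above could be enforced with an early check along these lines. This is a sketch: the enum names and the check_rootfs function are hypothetical, and the error text is copied from the failure demo.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Exit codes from the table above; the identifiers are illustrative. */
enum {
    EXIT_OK = 0,
    EXIT_NO_ROOTFS = 2
};

/* Verify that rootfs exists and is a directory before any namespace
 * or mount work begins. Returns the exit code the runtime should use. */
int check_rootfs(const char *rootfs)
{
    assert(rootfs != NULL);
    struct stat st;
    if (stat(rootfs, &st) != 0 || !S_ISDIR(st.st_mode)) {
        fprintf(stderr, "error: missing root filesystem\n");
        return EXIT_NO_ROOTFS;
    }
    return EXIT_OK;
}
```

Running the check before creating namespaces keeps the failure path free of cleanup work.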

4. Solution Architecture

4.1 High-Level Design

setup namespaces -> pivot root -> mount /proc -> apply cgroups -> exec

4.2 Key Components

Component	Responsibility	Key Decisions
Namespace setup	Create PID and mount namespaces	Use clone/unshare
Rootfs setup	Prepare and switch root	pivot_root preferred
Cgroup manager	Apply CPU/memory limits	Reuse Project 9 logic

4.3 Data Structures (No Full Code)

Namespace config: flags for PID, mount, UTS.
Rootfs config: path, required directories.
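One way to express these two configs in C, together with parsing of the CLI syntax from §3.5, is sketched here. All type, field, and function names are hypothetical, and the defaults (a rootfs directory in the project root, the hostname sandbox, the required directories from §3.5) are assumptions taken from the examples in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Namespace config: flags for PID, mount, UTS. */
struct namespace_config {
    bool pid;
    bool mount;
    bool uts;
};

/* Rootfs config: path and the directories that must exist in it. */
struct rootfs_config {
    const char *path;
    const char *const *required_dirs;
    size_t required_count;
};

struct container_config {
    struct namespace_config ns;
    struct rootfs_config rootfs;
    const char *hostname;
    char **command; /* NULL-terminated argv of the target program */
};

static const char *const REQUIRED_DIRS[] = { "bin", "lib", "proc", "tmp" };

/* Parse "mycontainer run <command> [args...]" into cfg.
 * Returns 0 on success, -1 on a usage error. */
int parse_cli(int argc, char **argv, struct container_config *cfg)
{
    assert(argv != NULL && cfg != NULL);
    if (argc < 3 || strcmp(argv[1], "run") != 0)
        return -1;

    cfg->ns.pid = true;
    cfg->ns.mount = true;
    cfg->ns.uts = true;
    cfg->rootfs.path = "./rootfs"; /* assumed default location */
    cfg->rootfs.required_dirs = REQUIRED_DIRS;
    cfg->rootfs.required_count = sizeof REQUIRED_DIRS / sizeof REQUIRED_DIRS[0];
    cfg->hostname = "sandbox";     /* fixed for the deterministic demo */
    cfg->command = &argv[2];       /* argv is already NULL-terminated */
    return 0;
}
```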

4.4 Algorithm Overview

Key Algorithm: container launch

Create new namespaces.
Set hostname, then pivot into the new root filesystem.
Mount /proc inside namespace.
Apply cgroup limits.
Exec target program as PID 1.

Complexity Analysis:

Time: O(1) setup plus program runtime.
Space: O(1) extra memory.

5. Implementation Guide

5.1 Development Environment Setup

# Run on a Linux system with root privileges

5.2 Project Structure

project-root/
├── src/
│   ├── container.c
│   ├── namespaces.c
│   └── cgroups.c
├── rootfs/
│   ├── bin/
│   └── lib/
└── README.md

5.3 The Core Question You’re Answering

“What is a container in kernel terms?”

5.4 Concepts You Must Understand First

Namespaces
- Which namespaces are required for isolation?
- Book Reference: “The Linux Programming Interface” - namespaces sections
Root filesystem setup
- Why pivot_root is safer than chroot.
- Book Reference: Linux kernel documentation

5.5 Questions to Guide Your Design

How will you ensure PID 1 reaps children?
How will you clean up mounts on exit?

5.6 Thinking Exercise

The /proc Trap

Explain what ps shows if /proc is not remounted inside the container.

5.7 The Interview Questions They’ll Ask

“How do namespaces differ from VMs?”
“What is PID 1 responsible for?”
“Why is pivot_root used?”
“How do cgroups enforce limits?”
“How does Docker isolate processes?”

5.8 Hints in Layers

Hint 1: Start with unshare. Experiment with unshare in the shell to understand its behavior.

Hint 2: Minimal rootfs. Use a tiny rootfs with just a shell and libraries.

Hint 3: Mount /proc. After entering the namespace, mount procfs.

Hint 4: Debugging. Compare ps inside and outside the container.

5.9 Books That Will Help

Topic	Book	Chapter
Namespaces	“The Linux Programming Interface”	namespaces sections
Containers	“Container Security” by Liz Rice	Ch. 2-3

5.10 Implementation Phases

Phase 1: Foundation (3 days)

Goals:

Create PID and mount namespaces.

Tasks:

Launch a child with new namespaces.
Set hostname inside namespace.

Checkpoint: hostname differs inside container.

Phase 2: Core Functionality (4 days)

Goals:

Set up rootfs and /proc.

Tasks:

Prepare rootfs directory.
pivot_root into new root.
Mount /proc.

Checkpoint: ps shows only container processes.

Phase 3: Polish & Edge Cases (3 days)

Goals:

Add cgroup limits and cleanup.

Tasks:

Apply CPU/memory limits.
Clean up mounts on exit.

Checkpoint: cgroup limits are enforced and cleanup is reliable.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Root switch	chroot vs pivot_root	pivot_root	stronger isolation
Namespace set	minimal vs full	PID + mount + UTS	focused scope

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Config parsing	validate CLI args
Integration Tests	Full container run	/bin/sh
Edge Case Tests	Missing rootfs	error output

6.2 Critical Test Cases

Isolation: ps shows only container processes.
Rootfs: files outside rootfs are inaccessible.
Limits: CPU/memory caps applied.

6.3 Test Data

Fixed rootfs tree with /bin/sh

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
No /proc mount	ps shows host processes	mount procfs inside
chroot only	host mounts visible	use pivot_root
No cleanup	stale mounts	unmount on exit

7.2 Debugging Strategies

Compare namespaces: inspect /proc/self/ns inside and outside.
Verbose logs: print each setup step.
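The namespace comparison can be automated with a small helper like the one sketched here. The function names are invented for this example. Each entry under /proc/<pid>/ns is a symbolic link whose target identifies the namespace, so two processes share a namespace exactly when the link targets match. Reading another process's links may require privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of /proc/<pid>/ns/<ns> into out.
 * Returns 0 on success, -1 on error. */
int read_ns_id(pid_t pid, const char *ns, char *out, size_t len)
{
    assert(ns != NULL && out != NULL && len > 1);
    char path[128];
    int n = snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)pid, ns);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    ssize_t got = readlink(path, out, len - 1);
    if (got < 0)
        return -1;
    out[got] = '\0'; /* readlink does not terminate the string */
    return 0;
}

/* Returns 1 if both processes are in the same namespace of type ns,
 * 0 if they differ, -1 on error. */
int same_namespace(pid_t a, pid_t b, const char *ns)
{
    char id_a[64], id_b[64];
    if (read_ns_id(a, ns, id_a, sizeof id_a) != 0 ||
        read_ns_id(b, ns, id_b, sizeof id_b) != 0)
        return -1;
    return strcmp(id_a, id_b) == 0 ? 1 : 0;
}
```

From the host, same_namespace(getpid(), container_pid, "pid") returning 0 confirms that the container really runs in a different PID namespace.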

7.3 Performance Traps

Mount operations are fast; the main cost is process startup and I/O in the rootfs.

8. Extensions & Challenges

8.1 Beginner Extensions

Add UTS namespace for hostname (if not already done in Phase 1).
Add environment variable injection.

8.2 Intermediate Extensions

Add network namespace with veth pair.
Add read-only rootfs option.

8.3 Advanced Extensions

Implement OCI bundle support.
Add seccomp syscall filtering.

9. Real-World Connections

9.1 Industry Applications

Container runtimes: runc and containerd.
Sandboxing: isolate untrusted code.

9.2 Related Open Source Projects

runc: https://github.com/opencontainers/runc - reference runtime.
containerd: https://github.com/containerd/containerd - container manager.

9.3 Interview Relevance

Containers and namespaces are standard systems interview topics.

10. Resources

10.1 Essential Reading

namespaces(7) man page
cgroup v2 documentation

10.2 Video Resources

“Containers from scratch” talks (search title)

10.3 Tools & Documentation

namespaces: https://man7.org/linux/man-pages/man7/namespaces.7.html
cgroup v2: https://docs.kernel.org/admin-guide/cgroup-v2.html

11. Self-Assessment Checklist

11.1 Understanding

I can explain PID and mount namespaces
I can explain pivot_root vs chroot
I understand why /proc must be remounted

11.2 Implementation

All functional requirements are met
Resource limits are applied
Cleanup is reliable

11.3 Growth

I can explain this project in an interview
I documented lessons learned
I can propose an extension

12. Submission / Completion Criteria

Minimum Viable Completion:

Run a process in PID + mount namespaces
Mount /proc inside container
Demonstrate isolation

Full Completion:

All minimum criteria plus:
Apply cgroup limits
Failure demo with exit code

Excellence (Going Above & Beyond):

OCI bundle support
Seccomp filtering
