Project 5: Build Your Own Container From Scratch
Create a minimal container runtime using Linux namespaces, cgroups, and filesystem isolation.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Go, Rust |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Infrastructure tooling |
| Prerequisites | Linux process model, syscalls, filesystem basics |
| Key Topics | Namespaces, cgroups, chroot/pivot_root, capabilities |
1. Learning Objectives
By completing this project, you will:
- Create isolated processes using Linux namespaces.
- Apply cgroups to limit CPU and memory.
- Build a root filesystem for containers.
- Understand capabilities and privilege dropping.
- Explain the difference between containers and VMs.
- Run a deterministic container demo.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Linux Namespaces
Description / Expanded Explanation
Namespaces provide isolated views of system resources such as PID, network, mount, and user IDs. A container is essentially a process running inside a set of namespaces.
Definitions & Key Terms
- PID namespace -> isolated process IDs
- mount namespace -> isolated filesystem mounts
- network namespace -> isolated network stack
- user namespace -> map user IDs
Mental Model Diagram (ASCII)
host PID 1
|- container init (PID 1 inside namespace)
How It Works (Step-by-Step)
- Call clone or unshare with namespace flags.
- The child process gets an isolated view (PID 1 inside the container).
- Processes inside the container cannot see host processes; the host still sees container processes, but under different PIDs.
Minimal Concrete Example
clone(child_fn, stack_top, CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
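The one-line call above leaves out stack allocation and error handling. A minimal sketch of a fuller helper follows, assuming glibc on Linux; the names spawn_in_namespaces, exit_with_code, and CHILD_STACK_SIZE are hypothetical and chosen only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024) /* hypothetical size, 1 MiB */

/* Hypothetical child entry point: exits with the int that arg points to. */
int exit_with_code(void *arg)
{
    return *(int *)arg;
}

/*
 * Start fn in a child created with the given CLONE_NEW* flags.
 * Returns the child's PID as seen by the caller, or -1 on error.
 */
pid_t spawn_in_namespaces(int ns_flags, int (*fn)(void *), void *arg)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top.
     * SIGCHLD lets the parent collect the child with waitpid(). */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, ns_flags | SIGCHLD, arg);

    /* Without CLONE_VM the child runs on its own copy of this memory,
     * so the parent's copy can be released right away. */
    int saved_errno = errno;
    free(stack);
    errno = saved_errno;
    return pid;
}
```

Passing CLONE_NEWPID | CLONE_NEWNS as ns_flags requires CAP_SYS_ADMIN (or a user namespace created at the same time); with ns_flags set to 0 the helper behaves like a plain fork and can be tried without privileges.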
Common Misconceptions
- “Namespaces provide security alone” -> they isolate but do not enforce limits.
- “PID 1 behaves like normal” -> PID 1 must reap zombies.
Check-Your-Understanding Questions
- Why does a container need a PID namespace?
- What happens if the container init process dies?
- How does a mount namespace affect file visibility?
Where You’ll Apply It
- See 3.1 for container creation.
- See 4.1 for architecture.
- Also used in: P08 TCP Socket Server
2.2 Cgroups and Resource Limits
Description / Expanded Explanation
Cgroups limit and account for resource usage. They control CPU shares, memory limits, and process counts to prevent containers from consuming all host resources.
Definitions & Key Terms
- cgroup -> kernel feature to limit resources
- cpu.shares -> relative CPU allocation (cgroup v1 name; cgroup v2 uses cpu.weight)
- memory.max -> memory limit
- pids.max -> process count limit
Mental Model Diagram (ASCII)
cgroup: /mycontainer
cpu.shares=256
memory.max=128M
How It Works (Step-by-Step)
- Create a cgroup directory.
- Write limits to control files.
- Add the container process to the cgroup.
- Kernel enforces limits.
Minimal Concrete Example
echo 134217728 > memory.max
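The same steps can be sketched in C for a cgroup v2 hierarchy. This is an illustration under stated assumptions, not the project's required design: the function names are hypothetical, the cgroup directory is assumed to exist already, and shares_to_weight uses a linear mapping from v1 cpu.shares to v2 cpu.weight that some runtimes have used rather than anything the kernel defines.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Map cgroup v1 cpu.shares (2..262144) onto cgroup v2 cpu.weight (1..10000).
 * Assumption: a linear conversion, not a kernel-defined rule. */
unsigned long shares_to_weight(unsigned long shares)
{
    if (shares < 2)
        shares = 2;
    if (shares > 262144)
        shares = 262144;
    return 1 + ((shares - 2) * 9999) / 262142;
}

/* Write value to <cgroup_dir>/<file>, like a shell redirect. 0 on success. */
int cgroup_write(const char *cgroup_dir, const char *file, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* Same flags a shell uses for "echo value > file"; on cgroupfs the
     * control file already exists, so nothing is actually created. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len); /* kernel rejects bad values here */
    int close_rc = close(fd);
    return (written == (ssize_t)len && close_rc == 0) ? 0 : -1;
}

/* Set a memory limit and CPU weight, then move pid into the cgroup. */
int cgroup_apply_limits(const char *cgroup_dir, unsigned long long memory_bytes,
                        unsigned long cpu_shares, pid_t pid)
{
    char buf[32];

    snprintf(buf, sizeof(buf), "%llu", memory_bytes);
    if (cgroup_write(cgroup_dir, "memory.max", buf) != 0)
        return -1;

    snprintf(buf, sizeof(buf), "%lu", shares_to_weight(cpu_shares));
    if (cgroup_write(cgroup_dir, "cpu.weight", buf) != 0)
        return -1;

    snprintf(buf, sizeof(buf), "%ld", (long)pid);
    return cgroup_write(cgroup_dir, "cgroup.procs", buf);
}
```

With this mapping, 256 shares become a weight of 10 and the v1 default of 1024 becomes 39. Writing the PID to cgroup.procs last means the limits are already in place when the process joins.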
Common Misconceptions
- “Cgroups are only for CPU” -> they cover multiple resources.
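- “Limits are advisory” -> memory.max and pids.max are hard limits enforced by the kernel; CPU shares are relative weights that only matter under contention.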
Check-Your-Understanding Questions
- What happens when a container exceeds memory.max?
- How are CPU shares enforced across containers?
- What is the difference between cgroup v1 and v2?
Where You’ll Apply It
- See 3.2 for functional requirements.
- See 5.10 Phase 2 for implementation.
- Also used in: P02 Load Balancer for resource isolation
2.3 Filesystem Isolation (chroot and pivot_root)
Description / Expanded Explanation
Containers need a root filesystem that looks like a full OS. chroot and pivot_root change the apparent root for a process, isolating file access.
Definitions & Key Terms
- chroot -> change root directory for a process
- pivot_root -> swap the root mount and move the old root to put_old, where it can then be unmounted
- bind mount -> mount directory to another location
- overlayfs -> layered filesystem for copy-on-write
Mental Model Diagram (ASCII)
/ (host)
/containers/rootfs -> new /
How It Works (Step-by-Step)
- Prepare rootfs directory with /bin, /lib, etc.
- Bind mount necessary files.
- Call pivot_root to switch to new root.
- Unmount old root to prevent escape.
Minimal Concrete Example
syscall(SYS_pivot_root, new_root, put_old); /* glibc provides no pivot_root() wrapper */
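A fuller sketch of the root switch follows, assuming it runs inside a new mount namespace with CAP_SYS_ADMIN. The function names enter_rootfs and mount_proc are hypothetical. The sketch uses the pivot_root(".", ".") form described in the pivot_root(2) manual page, which avoids creating a separate put_old directory.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Make new_root the root of the calling process's mount namespace.
 * Returns 0 on success, -1 on error.
 */
int enter_rootfs(const char *new_root)
{
    struct stat st;

    /* Refuse early, before touching any mounts, if the rootfs is missing. */
    if (stat(new_root, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;

    /* Stop mount events from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return -1;

    /* pivot_root needs new_root to be a mount point: bind it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -1;

    if (chdir(new_root) != 0)
        return -1;

    /* With "." for both arguments the old root is stacked on the same
     * path, so no put_old directory has to exist inside the rootfs. */
    if (syscall(SYS_pivot_root, ".", ".") != 0)
        return -1;

    /* Detach the old root so host files are no longer reachable. */
    if (umount2(".", MNT_DETACH) != 0)
        return -1;

    return chdir("/");
}

/* Mount a fresh /proc so tools such as ps see only the container's PIDs. */
int mount_proc(void)
{
    return mount("proc", "/proc", "proc",
                 MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL);
}
```

Calling mount_proc only makes sense after enter_rootfs has succeeded and only inside a new PID namespace; otherwise the mounted /proc would still describe the host's processes.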
Common Misconceptions
- “chroot is secure” -> without namespaces it is not.
- “rootfs can be empty” -> you need binaries and libs.
Check-Your-Understanding Questions
- Why is pivot_root safer than chroot?
- What files must exist for a minimal shell container?
- How do bind mounts help share host directories?
Where You’ll Apply It
- See 3.1 for container setup.
- See 5.10 Phase 2 for rootfs creation.
- Also used in: P10 Mini Git Object Store
2.4 Capabilities and Seccomp
Description / Expanded Explanation
Linux capabilities split root privileges into smaller units. Seccomp filters system calls. Together they reduce the attack surface of a container.
Definitions & Key Terms
- capabilities -> fine-grained privileges
- seccomp -> system call filtering
- drop privileges -> remove unneeded capabilities
Mental Model Diagram (ASCII)
root -> [cap_net_bind, cap_sys_admin] -> drop most
How It Works (Step-by-Step)
- Start as root in a user namespace.
- Drop capabilities not required.
- Install seccomp filter to allow only safe syscalls.
Minimal Concrete Example
prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN, 0, 0, 0);
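Dropping a single capability generalizes to an allowlist: remove everything from the bounding set except what the container needs. The sketch below assumes the caller still holds CAP_SETPCAP; cap_is_kept and drop_bounding_caps are hypothetical names, and seccomp filtering is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/capability.h>
#include <sys/prctl.h>

/* True if cap appears in the caller's keep list. */
bool cap_is_kept(int cap, const int *keep, size_t keep_len)
{
    for (size_t i = 0; i < keep_len; i++)
        if (keep[i] == cap)
            return true;
    return false;
}

/*
 * Drop every capability from the bounding set except those in keep.
 * Returns 0 on success, -1 on the first failure.
 */
int drop_bounding_caps(const int *keep, size_t keep_len)
{
    for (int cap = 0; cap <= CAP_LAST_CAP; cap++) {
        if (cap_is_kept(cap, keep, keep_len))
            continue;
        if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) != 0) {
            /* EINVAL: the running kernel does not know this capability. */
            if (errno == EINVAL)
                break;
            return -1;
        }
    }
    /* Stop execve() from granting new privileges, e.g. via setuid binaries. */
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}
```

A caller might keep only CAP_NET_BIND_SERVICE, for example. CAP_LAST_CAP comes from the build machine's kernel headers, so capabilities added by a newer running kernel are not covered by this loop.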
Common Misconceptions
- “Running as root inside container is safe” -> it can still be dangerous.
- “Seccomp is too complex” -> even a small allowlist helps.
Check-Your-Understanding Questions
- Why drop CAP_SYS_ADMIN?
- What syscalls are needed for a shell container?
- How does seccomp prevent container escape?
Where You’ll Apply It
- See 3.3 for security requirements.
- See 5.10 Phase 3 for hardening.
- Also used in: P01 Memory Allocator for debug safety
2.5 Init Process and Process Lifecycle
Description / Expanded Explanation
The first process inside a PID namespace is PID 1. It must reap zombie processes and handle signals correctly. Container runtimes typically run a small init to manage this.
Definitions & Key Terms
- PID 1 -> first process in namespace
- zombie -> process that has exited but not reaped
- signal -> asynchronous notification
Mental Model Diagram (ASCII)
PID 1 (init)
|- child process
How It Works (Step-by-Step)
- Container starts with PID 1 init.
- Init forks child command.
- Init waits and reaps child exit.
- Signals are forwarded to the child.
Minimal Concrete Example
while ((pid = waitpid(-1, &status, 0)) > 0) { }
Common Misconceptions
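- “PID 1 behaves like any process” -> signals sent from inside the namespace reach PID 1 only if it has installed a handler for them, so a default-action SIGTERM is dropped.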
- “Zombies are harmless” -> too many can exhaust PID space.
Check-Your-Understanding Questions
- Why must PID 1 reap zombies?
- What happens if PID 1 exits?
- How do you forward SIGTERM to the child process?
Where You’ll Apply It
- See 4.2 for component responsibilities.
- See 5.10 Phase 1 for init implementation.
- Also used in: P08 TCP Socket Server
3. Project Specification
3.1 What You Will Build
A minimal container runtime that can run a command in isolated namespaces with a dedicated root filesystem and resource limits.
3.2 Functional Requirements
- Create PID, mount, and UTS namespaces.
- Set up root filesystem via pivot_root.
- Apply CPU and memory cgroup limits.
- Run a user-specified command inside container.
- Provide basic logging and deterministic demo.
3.3 Non-Functional Requirements
- Performance: container startup under 200 ms.
- Reliability: correct cleanup of cgroups and mounts.
- Security: drop capabilities.
3.4 Example Usage / Output
$ ./minict run --rootfs ./rootfs -- /bin/sh
/ # ps
PID 1 sh
3.5 Data Formats / Schemas / Protocols
Config file:
rootfs: ./rootfs
cgroups:
memory: 128M
cpu_shares: 256
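The memory value in this config carries a unit suffix, while the kernel's control file expects a byte count. A small conversion sketch follows, assuming K, M, and G mean powers of 1024; parse_size is a hypothetical helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a size such as "128M" into bytes. Accepts an optional K, M, or G
 * suffix. Returns 0 on success, -1 on malformed or overflowing input.
 */
int parse_size(const char *text, unsigned long long *out_bytes)
{
    /* Require a leading digit so signs and whitespace are rejected. */
    if (text == NULL || *text < '0' || *text > '9')
        return -1;

    char *end;
    errno = 0;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;

    unsigned long long scale = 1;
    switch (*end) {
    case '\0': break;
    case 'K': scale = 1ULL << 10; end++; break;
    case 'M': scale = 1ULL << 20; end++; break;
    case 'G': scale = 1ULL << 30; end++; break;
    default: return -1;
    }
    if (*end != '\0')
        return -1; /* trailing characters after the suffix */
    if (value > ULLONG_MAX / scale)
        return -1; /* multiplication would overflow */

    *out_bytes = value * scale;
    return 0;
}
```

For the config shown here, "128M" parses to 134217728, the same byte count written to memory.max in section 2.2.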
3.6 Edge Cases
- Missing binaries in rootfs.
- Exceeding memory limit.
- Cleanup on failure.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./minict run --rootfs ./rootfs -- /bin/sh
3.7.2 Golden Path Demo (Deterministic)
Use a fixed rootfs and fixed command sequence.
3.7.3 CLI Transcript (Success)
$ ./minict run --rootfs ./rootfs -- /bin/echo hello
hello
$ echo $?
0
3.7.4 CLI Transcript (Failure)
$ ./minict run --rootfs ./rootfs -- /bin/missing
minict: exec failed: /bin/missing
$ echo $?
127
3.7.5 Exit Codes
- 0: success
- 1: invalid config
- 2: namespace setup failed
- 127: exec failed
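One way to keep these codes consistent across source files is a single enum. The identifiers below are hypothetical; only the numeric values and their meanings come from the list above.

```c
#include <stddef.h>

/* Exit codes of the minict binary; names are illustrative. */
enum minict_exit {
    MINICT_OK          = 0,
    MINICT_BAD_CONFIG  = 1,
    MINICT_NS_FAILED   = 2,
    MINICT_EXEC_FAILED = 127
};

/* Human-readable text for log messages. */
const char *minict_exit_message(int code)
{
    switch (code) {
    case MINICT_OK:          return "success";
    case MINICT_BAD_CONFIG:  return "invalid config";
    case MINICT_NS_FAILED:   return "namespace setup failed";
    case MINICT_EXEC_FAILED: return "exec failed";
    default:                 return "unknown";
    }
}
```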
4. Solution Architecture
4.1 High-Level Design
cli -> container setup -> namespaces -> cgroups -> exec cmd
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Namespace manager | create namespaces | clone/unshare |
| Rootfs manager | pivot_root, mounts | bind mounts |
| Cgroup manager | apply limits | v2 unified |
| Init process | PID 1 reaping | waitpid loop |
4.3 Data Structures (No Full Code)
struct container_config {
char *rootfs;
size_t memory_limit;
int cpu_shares;
};
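A validation pass over this struct, run before any namespace is created, might look like the sketch below. The struct is repeated so the example compiles on its own. config_validate is a hypothetical name, and the accepted cpu_shares range of 2 to 262144 is an assumption taken from the cgroup v1 cpu.shares limits.

```c
#include <stddef.h>
#include <sys/stat.h>

struct container_config {
    char *rootfs;
    size_t memory_limit;
    int cpu_shares;
};

/* Returns 0 if cfg is usable, 1 (the "invalid config" exit code) otherwise. */
int config_validate(const struct container_config *cfg)
{
    struct stat st;

    if (cfg == NULL || cfg->rootfs == NULL)
        return 1;
    /* The rootfs must exist and be a directory. */
    if (stat(cfg->rootfs, &st) != 0 || !S_ISDIR(st.st_mode))
        return 1;
    /* A zero memory limit would kill the container immediately. */
    if (cfg->memory_limit == 0)
        return 1;
    /* Assumed range, borrowed from cgroup v1 cpu.shares. */
    if (cfg->cpu_shares < 2 || cfg->cpu_shares > 262144)
        return 1;
    return 0;
}
```

Failing here, before clone is called, means there is nothing to clean up when the config is bad.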
4.4 Algorithm Overview
- Parse config and validate rootfs.
- Create namespaces and fork child.
- Set up rootfs and mounts.
- Apply cgroups and drop capabilities.
- Exec target command.
Complexity Analysis
- Time: O(1) setup per container
- Space: O(1) per container metadata
5. Implementation Guide
5.1 Development Environment Setup
make
5.2 Project Structure
project-root/
├── src/ns.c
├── src/cgroups.c
├── src/rootfs.c
├── src/init.c
└── tests/
5.3 The Core Question You’re Answering
“What is a container really, and how does Linux isolate it?”
5.4 Concepts You Must Understand First
- Namespace types and clone flags
- cgroup v2 layout
- pivot_root and mounts
- capabilities and seccomp basics
5.5 Questions to Guide Your Design
- Which namespaces are strictly required for isolation?
- How do you ensure cleanup on failure?
- What is the minimal rootfs content?
5.6 Thinking Exercise
Draw the process tree from host PID to container PID 1 to child process.
5.7 The Interview Questions They’ll Ask
- How do containers differ from VMs?
- What is the role of cgroups?
- Why is PID 1 special in a container?
5.8 Hints in Layers
Hint 1: Start with a PID namespace and echo a command.
Hint 2: Add mount namespace and chroot.
Hint 3: Add cgroups last.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Linux internals | The Linux Programming Interface | Ch. 41-44 |
| Containers | Linux Container Internals | Sections on namespaces |
5.10 Implementation Phases
Phase 1: Foundation (4-6 days)
Namespaces and init process.
Phase 2: Core Functionality (5-7 days)
Rootfs and cgroups.
Phase 3: Polish and Edge Cases (4-6 days)
Capabilities, cleanup, logging.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Namespace setup | clone vs unshare | clone | simple control |
| Rootfs | chroot vs pivot_root | pivot_root | more secure |
| Cgroups | v1 vs v2 | v2 | unified hierarchy |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | config parsing | invalid rootfs |
| Integration Tests | isolation | ps inside container |
| Resource Tests | limits | memory OOM |
6.2 Critical Test Cases
- Container sees PID 1 for its own init.
- Memory limit triggers OOM kill in container.
- Host files not accessible from container rootfs.
6.3 Test Data
commands: /bin/echo, /bin/sh, /usr/bin/yes
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Missing mounts | commands not found | add /proc and /dev mounts |
| No zombie reaping | zombie buildup | implement init loop |
| cgroup cleanup | leaked directories | cleanup on exit |
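For the cgroup cleanup pitfall, a create/remove pair along the following lines can help. The function names and the retry budget of 50 attempts are hypothetical. The retry exists because rmdir on a cgroup directory fails with EBUSY while processes are still attached to it.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Create the cgroup directory; an existing directory is not an error. */
int cgroup_create(const char *cgroup_dir)
{
    if (mkdir(cgroup_dir, 0755) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/*
 * Remove the cgroup directory, retrying briefly while the kernel still
 * reports attached processes. Returns 0 once the directory is gone.
 */
int cgroup_remove(const char *cgroup_dir)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (int attempt = 0; attempt < 50; attempt++) {
        if (rmdir(cgroup_dir) == 0 || errno == ENOENT)
            return 0;
        if (errno != EBUSY)
            return -1;
        nanosleep(&pause, NULL);
    }
    return -1;
}
```

Calling cgroup_remove from every exit path, including error paths after a failed clone or exec, is what keeps directories from leaking.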
7.2 Debugging Strategies
- Use strace to inspect clone and mount syscalls.
- Use lsns to list namespaces.
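Namespace membership can also be checked from inside the runtime by reading the symlinks under /proc/self/ns, the same information lsns reports. A minimal sketch, assuming /proc is mounted; namespace_id is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the identity of one of the calling process's namespaces into buf,
 * in the form "pid:[<inode>]". ns_name is an entry under /proc/self/ns
 * such as "pid", "mnt", or "uts". Returns 0 on success, -1 on error.
 */
int namespace_id(const char *ns_name, char *buf, size_t buf_len)
{
    char path[64];
    int n = snprintf(path, sizeof(path), "/proc/self/ns/%s", ns_name);
    if (n < 0 || (size_t)n >= sizeof(path) || buf_len == 0)
        return -1;

    ssize_t len = readlink(path, buf, buf_len - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0'; /* readlink does not NUL-terminate */
    return 0;
}
```

Logging the value in the parent and again in the child gives a quick isolation check: if both print the same identifier for "pid", the child did not get a new PID namespace.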
7.3 Performance Traps
- Excessive bind mounts slow startup.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add hostname isolation (UTS namespace).
- Add basic logging.
8.2 Intermediate Extensions
- Add network namespace and veth pair.
- Add seccomp filter.
8.3 Advanced Extensions
- Implement layered rootfs with overlayfs.
- Add container image import.
9. Real-World Connections
9.1 Industry Applications
- Docker, containerd, and Kubernetes runtimes.
9.2 Related Open Source Projects
- runc
- containerd
9.3 Interview Relevance
- Demonstrates OS isolation and security concepts.
10. Resources
10.1 Essential Reading
- The Linux Programming Interface (Namespaces and cgroups)
- Linux Container Internals
10.2 Video Resources
- Container internals talks
10.3 Tools & Documentation
- strace, lsns, cgget
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain how namespaces isolate processes.
- I can explain cgroup resource limits.
- I can explain why PID 1 needs reaping.
11.2 Implementation
- Container runs command successfully.
- Resource limits are enforced.
- Rootfs isolation works.
11.3 Growth
- I can explain container internals in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Runs command in new namespaces.
- Rootfs isolation works.
Full Completion:
- Cgroups and capabilities implemented.
- Deterministic demo passes.
Excellence (Going Above & Beyond):
- Network namespace and overlayfs support.
- Image import and snapshotting.