Project 9: Cgroup Resource Governor

Build a tool that runs a process inside a cgroup with CPU and memory limits and reports usage.

Quick Reference

Attribute                          Value
Difficulty                         Advanced
Time Estimate                      1 week
Main Programming Language          C or Python
Alternative Programming Languages  Go, Rust
Coolness Level                     See REFERENCE.md (Level 4)
Business Potential                 See REFERENCE.md (Level 4)
Prerequisites                      Process management, filesystem basics
Key Topics                         cgroup v2, CPU limits, memory limits

1. Learning Objectives

By completing this project, you will:

  1. Explain how cgroup v2 enforces resource limits.
  2. Configure CPU and memory limits for a process.
  3. Read cgroup statistics and interpret them.
  4. Build deterministic limit tests with clear exit codes.

2. All Theory Needed (Per-Concept Breakdown)

Cgroup v2 Resource Control

Fundamentals Cgroups are kernel mechanisms for grouping processes and controlling resource usage. In cgroup v2, all controllers are managed under a single unified hierarchy. Each cgroup can set limits for CPU time, memory usage, and other resources. Processes are assigned to cgroups by writing their PIDs to control files. The kernel enforces limits by throttling CPU time or reclaiming memory, and it exposes usage statistics in readable files. Understanding this interface is essential for container runtimes and system resource management.

Deep Dive The cgroup v2 hierarchy is a tree rooted at the cgroup filesystem mount. Each node in the tree can have resource controllers enabled. The CPU controller uses a bandwidth model: a period and a quota define how much CPU time a cgroup can use in a given interval. For example, a quota of 20 ms in a 100 ms period corresponds to 20 percent CPU. The kernel enforces this by throttling tasks in the cgroup once they exceed their quota, which can affect latency and throughput.
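To make the bandwidth model concrete, here is a small helper (an illustrative sketch, not part of the tool's required interface) that converts a percentage into the "quota period" pair written to cpu.max:

```python
def cpu_max_for_percent(percent, period_usec=100_000):
    """Translate a CPU percentage into the "quota period" string
    that cgroup v2 expects in the cpu.max file."""
    quota = period_usec * percent // 100
    return f"{quota} {period_usec}"

# 20% of a 100 ms period is 20 ms of CPU time per period
print(cpu_max_for_percent(20))   # 20000 100000
```

Integer division keeps the value in whole microseconds, which is what the kernel accepts.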

The memory controller enforces a maximum usage limit. When a process in the cgroup exceeds memory.max, the kernel must reclaim memory. If it cannot reclaim enough, it triggers the OOM killer within that cgroup. This is important: cgroup OOM behavior is scoped, so the system can kill only processes inside the limited group rather than the whole system. This makes cgroups the foundation for safe multi-tenant environments.
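The kernel records reclaim and OOM activity in the cgroup's memory.events file (counters such as oom and oom_kill). A small parser, sketched here for illustration, lets the tool report whether the limit was actually hit:

```python
def parse_memory_events(text):
    """Parse the contents of a cgroup v2 memory.events file
    (lines of "name value") into a dict of integer counters."""
    events = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            events[name] = int(value)
    return events

# A nonzero oom_kill counter means the kernel killed a task in this cgroup.
sample = "low 0\nhigh 3\nmax 12\noom 1\noom_kill 1\n"
print(parse_memory_events(sample)["oom_kill"])   # 1
```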

Cgroup files are plain-text interfaces. You create a cgroup by creating a directory, enable controllers in the parent, and then write configuration values into files like cpu.max and memory.max. To attach a process, you write its PID into cgroup.procs. The kernel immediately enforces limits. Metrics such as cpu.stat and memory.current show usage, and you can poll these to build a monitoring report.
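Because the whole interface is ordinary file I/O, the setup step can be sketched in a few lines. This is a sketch under assumptions: the caller passes the parent cgroup path as `base`, and needs root or a delegated subtree when `base` is the real hierarchy.

```python
import os

def setup_cgroup(base, name, cpu_max, memory_bytes):
    """Create a child cgroup under `base`, enable the cpu and memory
    controllers in the parent, and write the limit files."""
    # Enabling controllers in the parent makes them available to children.
    with open(os.path.join(base, "cgroup.subtree_control"), "w") as f:
        f.write("+cpu +memory")
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "cpu.max"), "w") as f:
        f.write(cpu_max)                 # e.g. "20000 100000"
    with open(os.path.join(path, "memory.max"), "w") as f:
        f.write(str(memory_bytes))       # e.g. "209715200"
    return path
```

The same code also runs against a scratch directory, which is handy for unit-testing the file-writing logic without privileges.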

A key challenge is delegation and permissions. To manage cgroups, a process usually needs elevated privileges, unless the system has delegated a subtree. This is why systemd manages cgroups for services and why container runtimes coordinate with systemd. Your tool should either require root or explain the delegation requirement clearly. It should also clean up cgroup directories to avoid clutter.

Another subtlety is burst behavior. CPU limits are enforced over a period, so a process can use more than its average limit briefly, then be throttled. This can lead to surprising short-term performance spikes followed by pauses. Your measurements should therefore be taken over long enough intervals to observe steady-state behavior.

How this fits into the project: You will apply this concept in §3.1 to define limit behavior, in §4.2 to design the cgroup manager interface, and in §6.2 for test cases. It also supports P10-container-runtime.md, where cgroups enforce container limits.

Definitions & key terms

  • Cgroup: Control group for resource accounting and limits.
  • Controller: Subsystem that enforces a resource type.
  • cpu.max: cgroup v2 CPU quota and period control.
  • memory.max: cgroup v2 memory limit.
  • cgroup.procs: file to attach PIDs to a cgroup.

Mental model diagram

/sys/fs/cgroup
  /serviceA (cpu.max, memory.max)
    |-- PID 1200
  /serviceB

How it works

  1. Create a cgroup directory.
  2. Enable controllers in parent.
  3. Write limits to control files.
  4. Attach process PID.
  5. Read stats and report.

Minimal concrete example

cpu.max: 20000 100000   (20 ms quota per 100 ms period = 20% CPU)
memory.max: 209715200   (200 MiB)

Common misconceptions

  • “cgroup limits are best-effort.” They are enforced by the kernel.
  • “Memory limits affect the whole system.” They are scoped to the cgroup.
  • “CPU limits are instantaneous.” They are enforced over a period.

Check-your-understanding questions

  1. Why is cgroup v2 called a unified hierarchy?
  2. How does the kernel enforce a CPU quota?
  3. What happens when memory.max is exceeded?
  4. Why are cgroups essential for containers?

Check-your-understanding answers

  1. All controllers share a single tree of cgroups.
  2. The kernel throttles the cgroup's tasks once they exhaust the quota for the current period.
  3. The kernel tries to reclaim memory; if it cannot, the OOM killer runs scoped to the cgroup.
  4. They enforce resource limits for isolated workloads.

Real-world applications

  • Service resource isolation in systemd.
  • Container runtime resource limits.
  • Multi-tenant server safety.

References

  • cgroups(7) man page: https://man7.org/linux/man-pages/man7/cgroups.7.html
  • cgroup v2 docs: https://docs.kernel.org/admin-guide/cgroup-v2.html

Key insights Cgroups convert resource sharing from best-effort to enforced policy.

Summary If you can set and observe cgroup limits, you understand Linux resource control.

Homework/Exercises to practice the concept

  1. Identify which cgroup controllers are enabled on your system.
  2. Explain how CPU quota maps to percentage.

Solutions to the homework/exercises

  1. Inspect controller files in the cgroup root.
  2. Quota divided by period yields the percentage.

3. Project Specification

3.1 What You Will Build

A tool that creates a cgroup, applies CPU and memory limits, runs a command inside the cgroup, and prints usage stats when it finishes.

3.2 Functional Requirements

  1. Create cgroup in a specified path.
  2. Apply limits for CPU and memory.
  3. Run command inside the cgroup and collect stats.

3.3 Non-Functional Requirements

  • Performance: minimal overhead for small commands.
  • Reliability: clean up cgroup after run.
  • Usability: clear output and error codes.

3.4 Example Usage / Output

$ ./cg-run --cpu=20% --mem=200M ./stress
cpu.max = 20000 100000
memory.max = 209715200
cpu.usage_usec = 512345
memory.current = 154320128

3.5 Data Formats / Schemas / Protocols

  • CPU limit format: quota + period.
  • Memory limit format: bytes.
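Parsing these two formats is a natural unit-test target. A sketch of the parsers follows; the K/M/G suffixes are a design choice of this sketch, taken as binary multiples so that 200M matches the 209715200-byte example above:

```python
def parse_cpu(spec, period_usec=100_000):
    """Parse a '20%' style CPU spec into a (quota, period) pair."""
    percent = int(spec.rstrip("%"))
    if not 1 <= percent <= 100:
        raise ValueError("CPU percent must be between 1 and 100")
    return (period_usec * percent // 100, period_usec)

def parse_mem(spec):
    """Parse a '200M' style memory spec into bytes (K/M/G binary units)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = spec[-1].upper()
    if suffix in units:
        return int(spec[:-1]) * units[suffix]
    return int(spec)   # plain byte count

print(parse_cpu("20%"))    # (20000, 100000)
print(parse_mem("200M"))   # 209715200
```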

3.6 Edge Cases

  • cgroup v2 not mounted.
  • Insufficient permissions.
  • Command fails to start.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

  • Run as root or with delegated cgroup permissions.
  • sudo ./cg-run --cpu=20% --mem=200M ./stress

3.7.2 Golden Path Demo (Deterministic)

Use a fixed CPU and memory limit with a deterministic workload.

3.7.3 If CLI: Exact terminal transcript

$ sudo ./cg-run --cpu=20% --mem=200M ./busyloop
cpu.max = 20000 100000
memory.max = 209715200
cpu.stat: usage_usec=512345
memory.current=154320128
# exit code: 0

Failure demo (deterministic):

$ ./cg-run --cpu=20% --mem=200M ./busyloop
error: permission denied
# exit code: 13

Exit codes:

  • 0 success
  • 13 permission denied
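Exit code 13 deliberately mirrors errno EACCES; mapping caught errors to documented codes keeps the failure demo deterministic. A sketch of that convention (the function name is illustrative, not prescribed):

```python
import errno

def exit_code_for(err: OSError) -> int:
    """Map setup failures to the tool's documented exit codes."""
    if err.errno in (errno.EACCES, errno.EPERM):
        return 13   # permission denied (EACCES is 13 on Linux)
    return 1        # generic failure

print(exit_code_for(OSError(errno.EACCES, "permission denied")))   # 13
```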

4. Solution Architecture

4.1 High-Level Design

Create cgroup -> set limits -> attach process -> run -> read stats -> cleanup

4.2 Key Components

Component       Responsibility                    Key Decisions
Cgroup manager  Create and clean up the cgroup    Use the v2 hierarchy
Limit writer    Configure cpu.max and memory.max  Validate units
Runner          Launch command inside the cgroup  Attach PID before exec

4.3 Data Structures (No Full Code)

  • Limit config: cpu_quota, cpu_period, memory_bytes.
  • Stat snapshot: cpu usage, memory current.

4.4 Algorithm Overview

Key Algorithm: apply limits

  1. Create cgroup dir.
  2. Enable controllers.
  3. Write limits.
  4. Launch process and attach PID.
  5. Read stats and cleanup.
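Steps 4 and 5 hide the one ordering rule that matters: the child must be attached before it execs the command. A minimal fork/attach/exec sketch (the function and its signature are illustrative, not prescribed by the spec):

```python
import os

def run_attached(cgroup_dir, argv):
    """Fork, write the child's PID into cgroup.procs, then exec argv.
    Returns the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Attach before exec so every instruction of the command
            # is accounted against the cgroup's limits.
            with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
                f.write(str(os.getpid()))
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)   # could not attach or exec
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Attaching from the child (rather than from the parent after fork) avoids a window where the command's earliest work escapes accounting.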

Complexity Analysis:

  • Time: O(1) setup plus process runtime.
  • Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

# Ensure cgroup v2 is mounted and accessible
stat -fc %T /sys/fs/cgroup   # expect "cgroup2fs"

5.2 Project Structure

project-root/
├── src/
│   ├── cg_run.c
│   └── limits.c
├── tests/
│   └── cgroup_tests.sh
└── README.md

5.3 The Core Question You’re Answering

“How does Linux enforce resource limits in practice?”

5.4 Concepts You Must Understand First

  1. cgroup v2 hierarchy
    • How controllers are enabled.
    • Book Reference: “The Linux Programming Interface” - resource control
  2. CPU quota math
    • How to translate percent into quota/period.
    • Book Reference: “Operating Systems: Three Easy Pieces” - scheduling

5.5 Questions to Guide Your Design

  1. When should you attach the PID to the cgroup?
  2. How will you handle cleanup on failure?

5.6 Thinking Exercise

Quota Calculation

Compute cpu.max values for 10% and 50% limits with a 100ms period.

5.7 The Interview Questions They’ll Ask

  1. “What is cgroup v2 and how is it different from v1?”
  2. “How does CPU throttling work?”
  3. “What happens on memory limit violations?”
  4. “Why are cgroups essential for containers?”

5.8 Hints in Layers

Hint 1: Verify v2. Check that the cgroup filesystem type is cgroup2.

Hint 2: Attach before exec. After fork, write the child PID into cgroup.procs before exec so that all of the command's usage is accounted.

Hint 3: Validate limits. Reject percent values outside 1-100 and malformed size strings.

Hint 4: Debugging. Read cpu.stat and memory.current to confirm enforcement.

5.9 Books That Will Help

Topic            Book                                     Chapter
Resource limits  “The Linux Programming Interface”        Resource control
Scheduling       “Operating Systems: Three Easy Pieces”   CPU chapters

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Create cgroup and set limits.

Tasks:

  1. Create directory and write limits.
  2. Read back configuration.

Checkpoint: Limits visible in cgroup files.

Phase 2: Core Functionality (2 days)

Goals:

  • Attach and run a process.

Tasks:

  1. Launch child process.
  2. Attach PID to cgroup.

Checkpoint: Process appears in cgroup.procs.

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Collect stats and cleanup.

Tasks:

  1. Read cpu/memory stats.
  2. Remove cgroup after exit.

Checkpoint: No leftover cgroup directories.
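Cleanup has one wrinkle: rmdir on a cgroup can fail with EBUSY for a moment after the last task exits, so a short retry loop is safer. A sketch (the retry count and delay are arbitrary choices, not kernel requirements):

```python
import errno
import os
import time

def remove_cgroup(path, retries=10, delay=0.05):
    """Remove a cgroup directory, retrying briefly while the kernel
    still reports it busy after the last task exits."""
    for _ in range(retries):
        try:
            os.rmdir(path)
            return True
        except OSError as e:
            if e.errno != errno.EBUSY:
                raise
            time.sleep(delay)
    return False
```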

5.11 Key Implementation Decisions

Decision             Options            Recommendation  Rationale
Controller enabling  Manual vs systemd  Manual          Learning focus
Limit units          Percent vs quota   Percent input   User friendly

6. Testing Strategy

6.1 Test Categories

Category           Purpose           Examples
Unit Tests         Limit parsing     20% -> quota
Integration Tests  Run workload      busyloop under limits
Edge Case Tests    Permission error  Non-root run

6.2 Critical Test Cases

  1. CPU limit: usage stays below expected bound.
  2. Memory limit: memory.current stays below max.
  3. Permission: non-root run fails with exit code 13.
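For test case 1, comparing the usage_usec delta from cpu.stat against wall-clock time gives the observed utilization; the 5% tolerance below is an assumption chosen to absorb burst behavior:

```python
def cpu_utilization(usage_usec_delta, wall_usec_delta):
    """Fraction of one CPU consumed over a measurement window,
    computed from the usage_usec counter in cpu.stat."""
    return usage_usec_delta / wall_usec_delta

# A 20% limit measured over a 5-second window, with 5% slack for bursts.
observed = cpu_utilization(1_050_000, 5_000_000)
assert observed <= 0.20 + 0.05
print(f"{observed:.0%}")   # 21%
```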

6.3 Test Data

Limits: CPU 20%, memory 200MB

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall               Symptom         Solution
v2 not mounted        Missing files   Mount cgroup2
Controllers disabled  Limits ignored  Enable controllers in subtree_control
Cleanup missing       Leftover dirs   Remove cgroup after use

7.2 Debugging Strategies

  • Inspect cgroup files: confirm values.
  • Read stats: confirm enforcement.

7.3 Performance Traps

Short runs may not show throttling; use longer workloads.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add IO limits if available.
  • Add JSON output.

8.2 Intermediate Extensions

  • Support multiple processes in one cgroup.
  • Add monitoring loop.

8.3 Advanced Extensions

  • Implement delegation to non-root users.
  • Integrate with systemd scopes.

9. Real-World Connections

9.1 Industry Applications

  • systemd: service resource control.
  • Container runtimes: enforce per-container limits.

9.2 Open-Source Projects

  • systemd: https://systemd.io/ - cgroup integration.
  • runc: https://github.com/opencontainers/runc - runtime using cgroups.

9.3 Interview Relevance

Resource control is a key topic in container and systems interviews.


10. Resources

10.1 Essential Reading

  • cgroup v2 documentation
  • cgroups(7) man page

10.2 Video Resources

  • “cgroups in practice” - talks (search title)

10.3 Tools & Documentation

  • cgroup v2: https://docs.kernel.org/admin-guide/cgroup-v2.html
  • man7: https://man7.org/linux/man-pages/man7/cgroups.7.html

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how cpu.max works
  • I can explain memory.max behavior
  • I understand cgroup hierarchy

11.2 Implementation

  • All functional requirements are met
  • Limits are enforced
  • Cleanup is reliable

11.3 Growth

  • I can explain this project in an interview
  • I documented lessons learned
  • I can propose an extension

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Create cgroup and apply limits
  • Run process inside cgroup
  • Report usage stats

Full Completion:

  • All minimum criteria plus:
  • Failure demo with exit code
  • Clean cleanup behavior

Excellence (Going Above & Beyond):

  • Multi-process group management
  • Delegation and integration with systemd