Project 1: Kernel Interface Explorer (The /proc and /sys Spelunker)
Build a comprehensive tool that reads and interprets data from /proc and /sys, presenting kernel information in a human-readable dashboard format.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Beginner |
| Time Estimate | Weekend |
| Language | C (Python for initial exploration) |
| Prerequisites | Basic C, Linux command line, file I/O |
| Key Topics | procfs, sysfs, kernel data structures |
1. Learning Objectives
By completing this project, you will:
- Understand how the Linux kernel exposes information to userspace through pseudo-filesystems
- Learn the structure of /proc and /sys and what data they contain
- Gain familiarity with kernel concepts like task_struct, memory zones, and the device model
- Build a foundation for understanding kernel internals before diving into kernel code
2. Theoretical Foundation
2.1 Core Concepts
procfs (/proc): A virtual filesystem that provides a view into the kernel’s internal data structures. Files in /proc don’t exist on disk—they’re generated on-the-fly when you read them.
sysfs (/sys): Another virtual filesystem that exposes the kernel’s device model. It provides a hierarchical view of devices, drivers, and their relationships.
Virtual Filesystem Architecture:
┌─────────────────────────────────────────────────────────────┐
│ User Application │
│ │ │
│ open("/proc/cpuinfo") │
│ │ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ VFS Layer │
│ (Routes to appropriate filesystem) │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ procfs │ │ sysfs │ │ ext4 │ │
│ │ handler │ │ handler │ │ handler │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Kernel │ │ Device │ │ Disk │ │
│ │ Data │ │ Model │ │ Data │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────┘
2.2 Why This Matters
Before you can write kernel code, you need to understand what the kernel knows and how it organizes that knowledge. The /proc and /sys filesystems are windows into the kernel’s soul—they show you:
- How processes are represented (task_struct via /proc/[pid]/)
- How memory is managed (zones, allocation stats)
- How devices are organized (the device model hierarchy)
- How the kernel tracks resources (file descriptors, network connections)
2.3 Historical Context
The /proc filesystem originated in Eighth Edition Research Unix in the mid-1980s, was adopted by System V Release 4, and was enhanced significantly in Linux. Originally it was just for process information (hence “proc”), but Linux expanded it to include system-wide information. The /sys filesystem was introduced in Linux 2.6 to provide a cleaner interface for device information, separating it from the increasingly cluttered /proc.
2.4 Common Misconceptions
- “/proc files are real files” - No, they’re generated when read. The “files” are actually functions that produce output.
- “/proc and /sys are the same” - /proc focuses on processes and system stats; /sys focuses on devices and drivers.
- “These files are always available” - Some require specific kernel config options to be enabled.
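To make the first misconception concrete: because the content is produced at read time, stat() reports a size of 0 for most /proc files, so a reader cannot size its buffer from st_size and has to read until EOF. The sketch below shows that pattern; read_all is a hypothetical helper name, not something the project prescribes.

```c
#include <stdio.h>
#include <stddef.h>

/* Read from fp until EOF or until buf is full; always NUL-terminates.
 * Returns the number of bytes stored. Hypothetical helper for illustration. */
size_t read_all(FILE *fp, char *buf, size_t cap)
{
    size_t total = 0;

    if (cap == 0)
        return 0;
    while (total < cap - 1) {
        size_t n = fread(buf + total, 1, cap - 1 - total, fp);
        if (n == 0)
            break;              /* EOF or error: stop either way */
        total += n;
    }
    buf[total] = '\0';
    return total;
}
```

A caller would typically fopen("/proc/version", "r"), pass the stream and a few-hundred-byte buffer to read_all, and fclose it afterwards.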
3. Project Specification
3.1 What You Will Build
A command-line tool called kernel_explorer that provides:
- A dashboard view of system status
- Detailed per-process information
- Device hierarchy exploration
- Memory and CPU statistics
3.2 Functional Requirements
- Dashboard Mode (--dashboard):
  - Display kernel version and uptime
  - Show CPU and memory summary
  - List top processes by memory usage
  - Show device overview
- Process Mode (--process PID):
  - Display detailed process information
  - Show memory maps
  - List open file descriptors
  - Display cgroup membership
- Device Mode (--devices):
  - Show device hierarchy from /sys
  - Display device attributes
  - Show driver bindings
- Memory Mode (--memory):
  - Parse /proc/meminfo completely
  - Show memory zones
  - Display slab cache information
3.3 Non-Functional Requirements
- Handle race conditions (processes disappearing)
- Parse files efficiently (streaming, not loading entire file)
- Provide clear error messages for missing data
- Support both terminal output and JSON export
3.4 Example Usage / Output
$ ./kernel_explorer --dashboard
╔══════════════════════ KERNEL EXPLORER v1.0 ══════════════════════╗
║ Kernel: 6.8.0-generic Uptime: 3d 14h 22m Load: 0.52 0.48 0.51
╠══════════════════════════════════════════════════════════════════╣
║ CPU: Intel i7-12700K (20 cores) Memory: 31.2 GB / 64 GB (48%)
║ Context Switches: 1,234,567,890 Interrupts: 987,654,321
╠══════════════════════════════════════════════════════════════════╣
║ TOP PROCESSES BY MEMORY:
║ PID NAME RSS STATE THREADS
║ 1234 firefox 2.1 GB S 156
║ 5678 code 1.8 GB S 89
║ 9012 chrome 1.2 GB S 45
╠══════════════════════════════════════════════════════════════════╣
║ DEVICES:
║ [block] nvme0n1 (Samsung SSD 980 PRO) - 1TB
║ [net] eth0 (Intel I225-V) - 1000Mb/s, UP
║ [usb] 1-1: Logitech USB Receiver
╚══════════════════════════════════════════════════════════════════╝
$ ./kernel_explorer --process 1234
Process: firefox (PID 1234)
├── State: Sleeping (interruptible)
├── Parent: 1 (systemd)
├── Threads: 156
├── Memory:
│ ├── Virtual: 15.2 GB
│ ├── Resident: 2.1 GB
│ ├── Shared: 234 MB
│ └── Memory Maps: 847 regions
├── File Descriptors: 234 open
├── CPU Affinity: 0-19
└── Cgroups: /user.slice/user-1000.slice
3.5 Real World Outcome
When complete, you’ll have a tool that rivals htop for system exploration, but with deeper insight into kernel structures. More importantly, you’ll have a mental model of what the kernel tracks and how it organizes that information.
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────┐
│ kernel_explorer │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ CLI │ │ Output │ │ Cache │ │
│ │ Parser │ │ Formatter │ │ Manager │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌─────────────────────────┴─────────────────────────┐ │
│ │ Data Collectors │ │
│ ├───────────┬───────────┬───────────┬───────────────┤ │
│ │ Process │ Memory │ Device │ System │ │
│ │ Collector │ Collector │ Collector │ Collector │ │
│ └─────┬─────┴─────┬─────┴─────┬─────┴───────┬───────┘ │
│ │ │ │ │ │
└────────┼───────────┼───────────┼─────────────┼───────────────┘
│ │ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌─────┴─────┐
│ /proc/ │ │ /proc/ │ │ /sys/ │ │ /proc/ │
│ [pid]/ │ │meminfo │ │ class/ │ │ version │
│ │ │slabinfo │ │ devices │ │ uptime │
└─────────┘ └─────────┘ └─────────┘ └───────────┘
4.2 Key Components
- CLI Parser: Handles command-line arguments and dispatches to appropriate mode
- Data Collectors: Modules that read and parse specific /proc and /sys files
- Output Formatter: Formats data for terminal display or JSON export
- Cache Manager: Caches directory listings to handle race conditions
4.3 Data Structures
#include <sys/types.h>   // pid_t

// Process information structure
struct process_info {
    pid_t pid;
    char comm[16];          // Command name (from /proc/[pid]/comm), NUL-terminated
    char state;             // Process state (R, S, D, Z, T)
    pid_t ppid;             // Parent PID
    int threads;            // Thread count
    unsigned long vsize;    // Virtual memory size
    long rss;               // Resident set size (pages)
    unsigned long utime;    // User CPU time
    unsigned long stime;    // System CPU time
};

// Memory information structure
struct memory_info {
    unsigned long total;
    unsigned long free;
    unsigned long available;
    unsigned long buffers;
    unsigned long cached;
    unsigned long swap_total;
    unsigned long swap_free;
};

// Device information structure
struct device_info {
    char class[32];         // Device class (block, net, etc.)
    char name[64];          // Device name
    char driver[64];        // Driver name
    char subsystem[32];     // Subsystem
    // ... additional attributes
};
4.4 Algorithm Overview
Process Enumeration:
- Read /proc directory for numeric entries (PIDs)
- For each PID, read /proc/[pid]/stat and parse fields
- Handle ENOENT (process exited) gracefully
- Sort by desired metric (memory, CPU)
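The final sorting step can be sketched with qsort. The struct here is a trimmed-down, hypothetical stand-in for struct process_info, and sort_by_rss is an illustrative name; the comparator compares instead of subtracting so that large values cannot overflow.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Trimmed-down record for illustration; the real struct has more fields. */
struct proc_entry {
    pid_t pid;
    long  rss;   /* resident set size in pages */
};

/* qsort comparator: larger RSS first. */
static int cmp_rss_desc(const void *a, const void *b)
{
    const struct proc_entry *pa = a;
    const struct proc_entry *pb = b;

    if (pa->rss < pb->rss) return 1;
    if (pa->rss > pb->rss) return -1;
    return 0;
}

/* Sort the collected entries so the biggest memory users come first. */
void sort_by_rss(struct proc_entry *list, size_t count)
{
    qsort(list, count, sizeof list[0], cmp_rss_desc);
}
```

Sorting by CPU time would only need a second comparator over utime + stime.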
Memory Parsing:
- Open /proc/meminfo
- Parse key-value pairs (format: Key: value kB)
- Convert to appropriate units
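As a rough sketch of the key-value step, the function below parses one line with sscanf. parse_meminfo_line is a hypothetical name; it assumes the unit, when present, is kB, and it passes unit-less values (such as the HugePages_* counters) through unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/meminfo line such as "MemTotal:       16384256 kB".
 * Stores the key (without the colon) and the value: kB values are
 * converted to bytes, unit-less counts are returned unchanged.
 * Returns 0 on success, -1 if the line does not match. */
int parse_meminfo_line(const char *line, char *key, size_t keylen,
                       unsigned long long *value_out)
{
    char name[64];
    char unit[8] = "";
    unsigned long long value;

    /* %63[^:] reads up to the colon; the trailing unit is optional. */
    int n = sscanf(line, "%63[^:]: %llu %7s", name, &value, unit);
    if (n < 2)
        return -1;
    if (strlen(name) >= keylen)
        return -1;              /* caller's key buffer is too small */
    strcpy(key, name);
    *value_out = (n == 3 && strcmp(unit, "kB") == 0) ? value * 1024ULL : value;
    return 0;
}
```

A collector would call this once per fgets line and copy the keys it cares about (MemTotal, MemAvailable, SwapFree, and so on) into struct memory_info.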
Device Enumeration:
- Walk /sys/class/ for device classes
- For each class, enumerate devices via symlinks
- Read device attributes from sysfs files
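The last step is usually the simplest, because sysfs attributes conventionally hold one value per file. A minimal sketch, assuming a hypothetical helper named read_attr and attributes that fit on one line:

```c
#include <stdio.h>
#include <string.h>

/* Read a single-value attribute file and strip the trailing newline.
 * Returns 0 on success, -1 if the file is missing, unreadable, or empty. */
int read_attr(const char *path, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;               /* attribute missing or unreadable */
    if (!fgets(buf, (int)len, fp)) {
        fclose(fp);
        return -1;               /* empty file or read error */
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, read_attr("/sys/class/net/eth0/operstate", buf, sizeof buf) would be expected to yield a string such as "up" on a machine that has an interface with that name. Not every device exposes every attribute, so a failure here should usually leave the field blank instead of aborting.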
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools
sudo apt install build-essential
# Create project structure
mkdir -p kernel_explorer/{src,include,tests}
cd kernel_explorer
5.2 Project Structure
kernel_explorer/
├── Makefile
├── include/
│ ├── process.h
│ ├── memory.h
│ ├── device.h
│ └── output.h
├── src/
│ ├── main.c
│ ├── process.c
│ ├── memory.c
│ ├── device.c
│ └── output.c
└── tests/
└── test_parsing.c
5.3 The Core Question You’re Answering
“How does the kernel expose its internal state to userspace, and what can we learn about kernel architecture by studying that interface?”
5.4 Concepts You Must Understand First
Before implementing, verify you can answer:
- What is a virtual filesystem?
  - Can you explain why /proc files have size 0 but contain data?
  - Reference: “The Linux Programming Interface” Chapter 12
- What is task_struct?
  - Can you describe what information the kernel tracks for each process?
  - Reference: “Linux Kernel Development” Chapter 3
- What is the device model?
  - Can you explain the relationship between devices, drivers, and buses?
  - Reference: “Linux Device Drivers, 3rd Edition” Chapter 14
5.5 Questions to Guide Your Design
Parsing Strategy:
- How will you handle files that can be gigabytes in size (like /proc/[pid]/maps)?
- What if a process exits between listing /proc and reading its files?
Data Organization:
- Should you cache the process list, or re-scan each time?
- How will you represent the device hierarchy in memory?
Output Formatting:
- How will you handle terminal width constraints?
- Should numeric values be human-readable (1.5GB) or precise (1610612736)?
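One way to approach the last question is to keep precise byte counts internally and convert only at display time. The sketch below assumes binary (1024-based) units and one decimal place, matching the style of the dashboard mock-up; format_bytes is a hypothetical name.

```c
#include <stdio.h>

/* Format a byte count with 1024-based units, e.g. 1610612736 -> "1.5 GB".
 * Values below 1024 are printed exactly, without a decimal. */
void format_bytes(unsigned long long bytes, char *out, size_t len)
{
    static const char *units[] = { "B", "KB", "MB", "GB", "TB" };
    double value = (double)bytes;
    size_t i = 0;

    while (value >= 1024.0 && i < 4) {
        value /= 1024.0;
        i++;
    }
    if (i == 0)
        snprintf(out, len, "%llu B", bytes);
    else
        snprintf(out, len, "%.1f %s", value, units[i]);
}
```

Keeping the raw number around also makes the JSON export straightforward, since machine consumers generally want the precise value.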
5.6 Thinking Exercise
Before writing code, trace through what happens when you run cat /proc/cpuinfo:
- Shell forks and execs cat
- cat calls open("/proc/cpuinfo", O_RDONLY)
- VFS routes to procfs handler
- procfs finds the cpuinfo entry and calls its read function
- The kernel function generates CPU info on the fly
- Data is copied to userspace buffer
- cat calls read() until EOF
- cat writes data to stdout
- cat exits
Now: What kernel functions are involved? What data structures are accessed?
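The userspace half of that trace is small enough to write out. The sketch below is a simplified stand-in for cat's copy loop (the real cat does more); copy_fd is a hypothetical name. Each read() call on a /proc descriptor is what drives the kernel-side generation described in the steps above.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd using read()/write().
 * Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {        /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;                    /* read() returned 0: EOF */
}
```

Opening /proc/cpuinfo with open() and passing the descriptor plus STDOUT_FILENO to copy_fd reproduces the core of the exercise; running the result under strace shows the open, read, and write calls in order.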
5.7 Hints in Layers
Hint 1 - Starting Point:
Start with /proc/version and /proc/uptime—they’re single-line files that are easy to parse. Get comfortable with file I/O first.
Hint 2 - Process Information:
The key file is /proc/[pid]/stat. It contains ~52 fields separated by spaces, but beware: the command name (field 2) is in parentheses and may contain spaces. Parse carefully!
Hint 3 - Parsing Strategy:
// For /proc/[pid]/stat, handle the comm field specially:
// Format: pid (comm) state ppid ...
// The comm can contain spaces and ')', so find the last ')' first
char *comm_end = strrchr(buffer, ')');
if (!comm_end || comm_end[1] != ' ') return -1;
// Parse the fields after ") ": state (field 3), ppid (field 4), ...
if (sscanf(comm_end + 2, "%c %d", &state, &ppid) != 2) return -1;
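Putting Hint 3 together, a self-contained version might look like the sketch below. It extracts only the first four fields; struct stat_basic and parse_stat_line are hypothetical names chosen for this example, and a real parser would continue the sscanf format for the remaining fields.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical minimal record: just the first four stat fields. */
struct stat_basic {
    pid_t pid;
    char  comm[16];
    char  state;
    pid_t ppid;
};

/* Parse "pid (comm) state ppid ..." where comm may contain spaces and ')'.
 * Returns 0 on success, -1 on malformed input. */
int parse_stat_line(const char *line, struct stat_basic *out)
{
    const char *lparen = strchr(line, '(');    /* first '(' opens comm */
    const char *rparen = strrchr(line, ')');   /* last ')' closes it   */
    int pid, ppid;
    char state;

    if (!lparen || !rparen || rparen < lparen)
        return -1;
    if (sscanf(line, "%d", &pid) != 1)
        return -1;

    size_t len = (size_t)(rparen - lparen - 1);
    if (len >= sizeof out->comm)
        len = sizeof out->comm - 1;            /* truncate defensively */
    memcpy(out->comm, lparen + 1, len);
    out->comm[len] = '\0';

    if (sscanf(rparen + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    out->pid = pid;
    out->state = state;
    out->ppid = ppid;
    return 0;
}
```

Searching from the right for the closing parenthesis is what keeps a name like "app (deleted)" intact.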
Hint 4 - Directory Walking:
// Use opendir/readdir for /proc (needs <dirent.h>, <stdio.h>, <stdlib.h>)
DIR *proc = opendir("/proc");
if (!proc) {
    perror("opendir /proc");
    return -1;
}
struct dirent *entry;
while ((entry = readdir(proc)) != NULL) {
    // Check if entry is a number (PID)
    char *endptr;
    long pid = strtol(entry->d_name, &endptr, 10);
    if (*endptr == '\0' && pid > 0) {
        // This is a process directory
    }
}
closedir(proc);
5.8 The Interview Questions They’ll Ask
- “Explain the difference between /proc and /sys”
  - /proc is for process info and kernel stats; /sys is for the device model
  - /proc is older and more ad-hoc; /sys is more structured
- “How would you find all open network sockets on a system?”
  - /proc/net/tcp, /proc/net/udp, /proc/net/unix
  - Or use /proc/[pid]/fd and follow symlinks
- “What’s the difference between VmRSS and VmSize?”
  - VmSize: Total virtual address space
  - VmRSS: Physical memory actually used (resident)
- “How do you handle processes disappearing while you read them?”
  - Check for ENOENT/ESRCH errors
  - Don’t assume a file exists because the directory was listed
- “What kernel config options affect what’s visible in /proc?”
  - CONFIG_PROC_FS enables /proc entirely
  - Various CONFIG_* options add/remove specific entries
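For the VmRSS versus VmSize question, it helps to have looked the values up yourself. They appear as "Key: value kB" lines in /proc/[pid]/status; the sketch below scans a stream for one key. status_field_kb is a hypothetical helper name, and it returns -1 when the key is missing, which is the normal case for kernel threads since they have no Vm* lines.

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/[pid]/status-style stream for "Key:   value kB".
 * Returns the value in kB, or -1 if the key is absent or malformed. */
long status_field_kb(FILE *fp, const char *key)
{
    char line[256];
    size_t klen = strlen(key);

    while (fgets(line, sizeof line, fp)) {
        /* Require the colon so "VmRSS" does not match a longer key. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
    }
    return -1;
}
```

Calling it with "VmSize", "VmRSS", and "VmHWM" on the same process makes the difference between address space, current residency, and peak residency easy to see.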
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| procfs structure | The Linux Programming Interface | Chapter 12 |
| sysfs and device model | Linux Device Drivers, 3rd Ed | Chapter 14 |
| Process representation | Linux Kernel Development | Chapter 3 |
| Memory management | Understanding the Linux Kernel | Chapter 8 |
5.10 Implementation Phases
Phase 1: Basic File Reading (Day 1)
- Read and print /proc/version
- Read and parse /proc/uptime
- Read and parse /proc/loadavg
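As a starting point for the /proc/uptime item, the file holds two decimal numbers: seconds since boot and cumulative idle seconds. The sketch below parses the first and renders it in the "3d 14h 22m" style used in the dashboard mock-up; both function names are hypothetical.

```c
#include <stdio.h>

/* Parse the first field of /proc/uptime ("seconds_up seconds_idle").
 * Returns 0 on success, -1 if no number was found. */
int parse_uptime(const char *text, double *seconds)
{
    return sscanf(text, "%lf", seconds) == 1 ? 0 : -1;
}

/* Render seconds as days, hours, and minutes, e.g. "3d 14h 22m". */
void format_uptime(double seconds, char *out, size_t len)
{
    unsigned long total = (unsigned long)seconds;   /* drop the fraction */
    unsigned long days  = total / 86400;
    unsigned long hours = (total % 86400) / 3600;
    unsigned long mins  = (total % 3600) / 60;

    snprintf(out, len, "%lud %luh %lum", days, hours, mins);
}
```

/proc/loadavg can be handled the same way with a sscanf format of three %lf conversions for the 1, 5, and 15 minute averages.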
Phase 2: Memory Information (Day 1)
- Parse /proc/meminfo
- Format output nicely
Phase 3: Process Enumeration (Day 2)
- List all processes from /proc
- Parse /proc/[pid]/stat for basic info
- Sort and display top processes
Phase 4: Device Information (Day 2)
- Walk /sys/class/block for block devices
- Walk /sys/class/net for network devices
- Display device attributes
Phase 5: Polish (Day 3)
- Add command-line argument parsing
- Improve output formatting
- Add error handling
5.11 Key Implementation Decisions
Decision 1: Buffered vs Unbuffered I/O
- Use standard fopen/fread for simplicity
- /proc files are small; performance isn’t critical
Decision 2: Dynamic vs Static Allocation
- Process list size varies; use dynamic allocation
- Memory info structure is fixed; use static
Decision 3: Error Handling Strategy
- Silently skip missing processes (normal behavior)
- Report errors for system-wide files (indicates problem)
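One possible shape for this strategy is to classify the errno from a failed open and let the caller decide whether to skip or report. The enum and both function names below are hypothetical; the mapping itself follows the decision above (a vanished process is normal, anything unexpected is worth reporting).

```c
#include <errno.h>
#include <stdio.h>

enum open_status { OPEN_OK, OPEN_GONE, OPEN_DENIED, OPEN_ERROR };

/* Map an errno value to how the caller should react. */
enum open_status classify_errno(int err)
{
    switch (err) {
    case ENOENT:
    case ESRCH:
        return OPEN_GONE;       /* process exited: skip silently */
    case EACCES:
    case EPERM:
        return OPEN_DENIED;     /* needs more privileges: note and move on */
    default:
        return OPEN_ERROR;      /* unexpected: report it */
    }
}

/* Open /proc/<pid>/<name>; on failure *status says why. */
FILE *open_pid_file(long pid, const char *name, enum open_status *status)
{
    char path[64];

    snprintf(path, sizeof path, "/proc/%ld/%s", pid, name);
    FILE *fp = fopen(path, "r");
    *status = fp ? OPEN_OK : classify_errno(errno);
    return fp;
}
```

In the enumeration loop, OPEN_GONE would simply continue to the next PID, while the same failure on a system-wide file such as /proc/meminfo would be printed as an error.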
6. Testing Strategy
Unit Tests
#include <assert.h>
#include <string.h>

// Test stat parsing
void test_parse_stat(void) {
    const char *stat_line = "1234 (process name) S 1 1234 ...";
    struct process_info info;
    assert(parse_stat(stat_line, &info) == 0);
    assert(info.pid == 1234);
    assert(strcmp(info.comm, "process name") == 0);
    assert(info.state == 'S');
}

// Test with tricky command names
void test_parse_stat_special_names(void) {
    struct process_info info;
    // Command name with space
    const char *line1 = "1 (Web Content) S 2 ...";
    assert(parse_stat(line1, &info) == 0);
    assert(strcmp(info.comm, "Web Content") == 0);
    // Command name with parenthesis
    const char *line2 = "1 (app (deleted)) S 2 ...";
    assert(parse_stat(line2, &info) == 0);
    assert(strcmp(info.comm, "app (deleted)") == 0);
}
Integration Tests
# Verify basic operation
./kernel_explorer --dashboard > /dev/null && echo "Dashboard OK"
# Verify process mode with known PID
./kernel_explorer --process 1 | grep -q "systemd" && echo "Process OK"
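# Compare with system tools
TOOL_UPTIME=$(./kernel_explorer --uptime)
SYS_UPTIME=$(uptime -p)   # -p prints the uptime; -s would print the boot timestamp instead
# Verify they match (allowing for formatting differences)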
Edge Cases
- Process that exits during enumeration
- Very long command names
- Kernel threads (empty cmdline and maps, no Vm* lines in status)
- Systems with 1000s of processes
7. Common Pitfalls & Debugging
| Problem | Symptom | Root Cause | Fix |
|---|---|---|---|
| Segfault reading process | Crash in parse_stat | Process exited, file gone | Check fopen return value |
| Wrong memory values | Numbers seem off | Not converting from kB | Multiply by 1024 |
| Truncated command names | Names cut off at 15 chars | comm is limited to 16 bytes | Use /proc/[pid]/cmdline for full |
| Can’t read some processes | Permission denied | Not running as root | Some info requires elevated privileges |
| Device names missing | Empty device list | Wrong sysfs path | Check /sys/class/ vs /sys/devices/ |
Quick Verification Tests
# Verify you're reading /proc correctly
cat /proc/version
# Should match your tool's output
# Check process parsing
cat /proc/self/stat
# Use this to verify your parser
# Verify sysfs access
ls /sys/class/block/
# Should list block devices
8. Extensions & Challenges
Easy Extensions
- Add refresh mode: Update display every N seconds (like top)
- Add JSON output: For integration with other tools
- Add filtering: Show only processes matching a pattern
Medium Extensions
- Add /proc/[pid]/maps parsing: Show memory regions with permissions
- Add network statistics: Parse /proc/net/dev for interface stats
- Add CPU per-core usage: Parse /proc/stat for per-CPU breakdown
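For the maps extension, each line of /proc/[pid]/maps has the form "start-end perms offset dev inode pathname", with the pathname absent for anonymous mappings. A minimal sketch of a line parser follows; struct map_region and parse_maps_line are hypothetical names, and the offset, device, and inode fields are skipped for brevity.

```c
#include <stdio.h>

struct map_region {
    unsigned long start;
    unsigned long end;
    char perms[5];      /* e.g. "r-xp" */
    char path[256];     /* empty for anonymous mappings */
};

/* Parse one maps line. Returns 0 on success, -1 on malformed input. */
int parse_maps_line(const char *line, struct map_region *r)
{
    int consumed = 0;

    r->path[0] = '\0';
    /* %*s skips offset, dev, and inode; %n records where the path starts. */
    if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
               &r->start, &r->end, r->perms, &consumed) < 3)
        return -1;
    if (consumed > 0)
        sscanf(line + consumed, "%255[^\n]", r->path);
    return 0;
}
```

Because maps can be long for large processes, this is a natural place to parse line by line with fgets instead of loading the whole file.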
Hard Extensions
- Add historical tracking: Track process memory over time
- Add container awareness: Handle cgroups and namespaces
- Add kernel symbol resolution: Map addresses to function names
9. Real-World Connections
How Production Tools Do This
htop: Uses the same /proc parsing but with ncurses UI. Study its source for efficient parsing techniques.
ps: The classic process tool. Uses /proc/[pid]/* extensively.
lsblk: Uses sysfs for block device information. Good example of device model navigation.
systemd-cgls: Shows cgroup hierarchy, parses /sys/fs/cgroup/.
Industry Usage
- Container runtimes (Docker, containerd): Read /proc to monitor container processes
- Monitoring agents (Prometheus node_exporter): Parse /proc and /sys for metrics
- Performance tools (perf, sysstat): Heavy users of /proc interfaces
10. Resources
Essential Documentation
- man 5 proc - Detailed documentation of the /proc filesystem
- Documentation/filesystems/proc.rst - In-kernel documentation
- Documentation/ABI/testing/sysfs-* - Sysfs interface documentation
Code References
- Linux kernel source: fs/proc/ - procfs implementation
- Linux kernel source: fs/sysfs/ - sysfs implementation
- htop source: linux/LinuxProcessList.c - Example /proc parsing
11. Self-Assessment Checklist
Before moving to the next project, verify:
- I can explain what happens when cat /proc/cpuinfo runs
- I understand why /proc files show size 0
- I can parse /proc/[pid]/stat correctly, including tricky command names
- I know the difference between VmSize, VmRSS, and VmHWM
- I can navigate the sysfs device hierarchy
- I understand why some /proc files require root access
- My tool handles missing processes gracefully
- I can explain the relationship between /sys/class/ and /sys/devices/
12. Submission / Completion Criteria
Your project is complete when:
- Dashboard mode shows kernel version, uptime, load, CPU info, memory stats, top processes, and devices
- Process mode shows detailed information for any valid PID
- Memory mode correctly parses all of /proc/meminfo
- Device mode shows the device hierarchy with attributes
- Error handling gracefully handles missing files and processes
- Code quality is clean, well-commented, and follows C best practices
Verification Commands
# Run these to verify your implementation:
./kernel_explorer --dashboard # Should show system overview
./kernel_explorer --process 1 # Should show init/systemd info
./kernel_explorer --memory # Should show memory breakdown
./kernel_explorer --devices # Should show device tree
./kernel_explorer --process 999999 # Should handle gracefully (no such process)
Next Project: P02 - System Call Tracer - Build your own strace to understand the user/kernel boundary.