Project 14: xv6 Operating System Study
Read and extend xv6 to understand real kernel subsystems working together.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 20-30 hours |
| Main Programming Language | C |
| Alternative Programming Languages | N/A |
| Coolness Level | Very High |
| Business Potential | Medium (OS expertise) |
| Prerequisites | boot, syscalls, VM, filesystem basics |
| Key Topics | xv6 codebase, traps, scheduler, fs |
1. Learning Objectives
By completing this project, you will:
- Navigate the xv6 codebase and explain core subsystems.
- Trace the syscall path from user space to kernel and back.
- Implement a small kernel extension (new syscall or scheduler tweak).
- Use QEMU and GDB to debug kernel behavior.
2. All Theory Needed (Per-Concept Breakdown)
Minimal Kernel Architecture (xv6)
Fundamentals
xv6 is a teaching OS with a small, readable kernel. It includes a bootloader, trap handling, process management, virtual memory, and a simple filesystem. Studying xv6 allows you to see how these subsystems fit together without the complexity of Linux. The codebase uses clear naming and minimal abstractions, which makes it ideal for tracing the flow of a syscall or a context switch.
Deep Dive into the concept
xv6 starts with a bootloader that loads the kernel into memory, then jumps to entry code that sets up page tables and enables paging. The kernel initializes devices, sets up the first user process, and enters the scheduler loop. Processes are represented by a struct proc containing state, page tables, and kernel stack pointers. The scheduler selects a runnable process and performs a context switch using swtch.
The trap path is the heart of the kernel-user boundary. When a user process makes a syscall, it executes a trap instruction (on xv6, int or syscall depending on version). The CPU jumps to the kernel trap handler, which saves registers, identifies the trap type, and dispatches to the syscall handler. The syscall handler uses a table to map syscall numbers to functions. Arguments are retrieved from the user stack or registers. After the syscall returns, control flows back to user space via iret.
The filesystem is a simplified inode-based design. There are direct blocks and a single indirect block for larger files. The buffer cache provides a cache of disk blocks and handles write-back. The file system code shows clear separation of concerns: name lookup, inode management, and block allocation. By reading and modifying xv6, you gain a holistic view of how OS subsystems coordinate.
For this project, you will implement a small extension. A classic task is adding a new syscall (e.g., getsysinfo) or modifying the scheduler to record runtime. This forces you to touch syscall tables, user stubs, and kernel handlers, ensuring you understand the full syscall path.
How this fit on projects
This concept ties together all prior projects, especially boot (Projects 1-2), syscalls (Project 8), VM (Project 4), and filesystem (Project 9).
Definitions & key terms
- Trap: CPU event that transfers control to kernel.
- swtch: context switch routine.
- syscall table: mapping of numbers to handlers.
- buffer cache: cached disk blocks.
Mental model diagram (ASCII)
User -> trap -> kernel -> syscall -> return
How it works (step-by-step)
- Build and boot xv6 in QEMU.
- Trace a syscall like
lsfrom user to kernel. - Modify kernel to add a new syscall.
- Rebuild and verify behavior.
Minimal concrete example
// in sysproc.c
int sys_uptime(void) { return ticks; }
Common misconceptions
- “xv6 is toy code”: it is small but structurally realistic.
- “syscalls are functions”: they are trap-based transitions.
Check-your-understanding questions
- Where does xv6 save registers during a trap?
- How does the syscall table map numbers to functions?
- What does the scheduler loop do?
Check-your-understanding answers
- In the trap frame stored on the kernel stack.
- An array indexed by syscall number.
- It scans for runnable processes and switches to them.
Real-world applications
- Understanding kernel design patterns.
Where you’ll apply it
References
- xv6 book and source code
- OSTEP and MIT 6.S081 lectures
Key insights
xv6 is a complete OS in miniature, showing real kernel architecture patterns.
Summary
Reading and extending xv6 connects all the OS concepts you have learned.
Homework/Exercises to practice the concept
- Add a syscall that returns free memory.
- Add a scheduler counter per process.
- Add a new user command that calls your syscall.
Solutions to the homework/exercises
- Walk free list and sum page counts.
- Increment a counter in the scheduler.
- Add user stub and program in user/.
3. Project Specification
3.1 What You Will Build
A documented exploration of xv6 plus a small extension (new syscall or scheduler tweak) with a demo program.
3.2 Functional Requirements
- Build and boot xv6 in QEMU.
- Trace a syscall path for a user program.
- Implement a kernel extension and test it.
- Document the modified files and reasoning.
3.3 Non-Functional Requirements
- Performance: xv6 boots normally.
- Reliability: no kernel panic after changes.
- Usability: clear notes and diff.
3.4 Example Usage / Output
$ make qemu
xv6 running...
$ uptime
1 1 1 100
3.5 Data Formats / Schemas / Protocols
- xv6 syscall table entries and user stubs.
3.6 Edge Cases
- Incorrect syscall number mapping.
- Missing user stub causing link errors.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make qemu
3.7.2 Golden Path Demo (Deterministic)
- Use a fixed syscall number and deterministic output string.
3.7.3 If CLI: exact terminal transcript
$ make qemu
xv6...
$ mysyscall
ok: ticks=123
Failure demo (deterministic):
$ make qemu
panic: unknown syscall
Exit codes:
0success2build error3runtime panic
4. Solution Architecture
4.1 High-Level Design
Boot -> init -> scheduler -> syscalls -> fs
4.2 Key Components
| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Boot | load kernel | QEMU + xv6 build | | Syscall path | trap + dispatch | table-based | | Extension | new syscall | minimal change |
4.3 Data Structures (No Full Code)
struct proc {
int pid;
int state;
int ticks;
};
4.4 Algorithm Overview
Key Algorithm: syscall dispatch
- Read syscall number from trap frame.
- Lookup handler in table.
- Call handler and return value.
Complexity Analysis:
- Time: O(1) dispatch
- Space: O(1)
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install qemu-system-x86 build-essential
5.2 Project Structure
xv6/
|-- kernel/
|-- user/
`-- Makefile
5.3 The Core Question You’re Answering
“How do real kernel subsystems fit together in a minimal, readable OS?”
5.4 Concepts You Must Understand First
- Trap handling and syscall dispatch.
- Process scheduler loop.
- Filesystem path resolution.
5.5 Questions to Guide Your Design
- Which files implement the syscall table?
- Where do you add user stubs?
- How will you test your new syscall?
5.6 Thinking Exercise
Trace ls from user program to kernel functions and back.
5.7 The Interview Questions They’ll Ask
- Why is xv6 useful for learning?
- How would you add a syscall safely?
5.8 Hints in Layers
Hint 1: Build and boot unmodified xv6.
Hint 2: Trace one syscall with print statements.
Hint 3: Add a new syscall and user command.
5.9 Books That Will Help
| Topic | Book | Chapter | |——-|——|———| | xv6 | xv6 book | all | | OS design | OSTEP | 4-6 |
5.10 Implementation Phases
Phase 1: Build and read (6-8 hours)
Goals: understand code layout.
Phase 2: Trace syscalls (6-8 hours)
Goals: map syscall flow.
Phase 3: Extend (8-10 hours)
Goals: add syscall and user program.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Extension | syscall vs scheduler | syscall | minimal risk | | Debugging | printf vs gdb | printf | simpler in xv6 |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples | |———-|———|———-| | Build | compile xv6 | make qemu | | Integration | syscall test | run new command | | Regression | ensure boot | run ls, sh |
6.2 Critical Test Cases
- New syscall returns expected value.
- No kernel panics after change.
- User program links successfully.
6.3 Test Data
Expected output: "ok: ticks=123"
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution | |——–|———|———-| | Missing syscall stub | link error | update user headers | | Wrong syscall number | panic | update syscall table | | Trap frame mismatch | crash | verify registers |
7.2 Debugging Strategies
- Use QEMU
-S -sand GDB if needed. - Add printk-style logs in kernel.
7.3 Performance Traps
- Excessive debug logging inside scheduler loop.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a syscall that returns PID.
8.2 Intermediate Extensions
- Add a scheduler metric.
8.3 Advanced Extensions
- Add a simple system call tracing tool.
9. Real-World Connections
9.1 Industry Applications
- Kernel development workflows.
9.2 Related Open Source Projects
- xv6 and JOS.
9.3 Interview Relevance
- Demonstrates understanding of kernel subsystems.
10. Resources
10.1 Essential Reading
- xv6 book
10.2 Video Resources
- MIT 6.S081 lectures
10.3 Tools & Documentation
- QEMU and GDB docs
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can trace a syscall in xv6.
- I can explain the scheduler loop.
11.2 Implementation
- My kernel extension works.
11.3 Growth
- I can discuss xv6 in interviews.
12. Submission / Completion Criteria
Minimum Viable Completion:
- xv6 builds and runs with notes.
Full Completion:
- New syscall implemented and tested.
Excellence (Going Above & Beyond):
- Deeper kernel modification or scheduler change.