Project 14: xv6 Operating System Study

Read and extend xv6 to understand real kernel subsystems working together.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 20-30 hours
Main Programming Language C
Alternative Programming Languages N/A
Coolness Level Very High
Business Potential Medium (OS expertise)
Prerequisites boot, syscalls, VM, filesystem basics
Key Topics xv6 codebase, traps, scheduler, fs

1. Learning Objectives

By completing this project, you will:

  1. Navigate the xv6 codebase and explain core subsystems.
  2. Trace the syscall path from user space to kernel and back.
  3. Implement a small kernel extension (new syscall or scheduler tweak).
  4. Use QEMU and GDB to debug kernel behavior.

2. All Theory Needed (Per-Concept Breakdown)

Minimal Kernel Architecture (xv6)

Fundamentals

xv6 is a teaching OS with a small, readable kernel. It includes a bootloader, trap handling, process management, virtual memory, and a simple filesystem. Studying xv6 allows you to see how these subsystems fit together without the complexity of Linux. The codebase uses clear naming and minimal abstractions, which makes it ideal for tracing the flow of a syscall or a context switch.

Deep Dive into the concept

xv6 starts with a bootloader that loads the kernel into memory, then jumps to entry code that sets up page tables and enables paging. The kernel initializes devices, sets up the first user process, and enters the scheduler loop. Processes are represented by a struct proc containing state, page tables, and kernel stack pointers. The scheduler selects a runnable process and performs a context switch using swtch.

The trap path is the heart of the kernel-user boundary. When a user process makes a syscall, it executes a trap instruction (on xv6, int or syscall depending on version). The CPU jumps to the kernel trap handler, which saves registers, identifies the trap type, and dispatches to the syscall handler. The syscall handler uses a table to map syscall numbers to functions. Arguments are retrieved from the user stack or registers. After the syscall returns, control flows back to user space via iret.

The filesystem is a simplified inode-based design. There are direct blocks and a single indirect block for larger files. The buffer cache provides a cache of disk blocks and handles write-back. The file system code shows clear separation of concerns: name lookup, inode management, and block allocation. By reading and modifying xv6, you gain a holistic view of how OS subsystems coordinate.

For this project, you will implement a small extension. A classic task is adding a new syscall (e.g., getsysinfo) or modifying the scheduler to record runtime. This forces you to touch syscall tables, user stubs, and kernel handlers, ensuring you understand the full syscall path.

How this fit on projects

This concept ties together all prior projects, especially boot (Projects 1-2), syscalls (Project 8), VM (Project 4), and filesystem (Project 9).

Definitions & key terms

  • Trap: CPU event that transfers control to kernel.
  • swtch: context switch routine.
  • syscall table: mapping of numbers to handlers.
  • buffer cache: cached disk blocks.

Mental model diagram (ASCII)

User -> trap -> kernel -> syscall -> return

How it works (step-by-step)

  1. Build and boot xv6 in QEMU.
  2. Trace a syscall like ls from user to kernel.
  3. Modify kernel to add a new syscall.
  4. Rebuild and verify behavior.

Minimal concrete example

// in sysproc.c
int sys_uptime(void) { return ticks; }

Common misconceptions

  • “xv6 is toy code”: it is small but structurally realistic.
  • “syscalls are functions”: they are trap-based transitions.

Check-your-understanding questions

  1. Where does xv6 save registers during a trap?
  2. How does the syscall table map numbers to functions?
  3. What does the scheduler loop do?

Check-your-understanding answers

  1. In the trap frame stored on the kernel stack.
  2. An array indexed by syscall number.
  3. It scans for runnable processes and switches to them.

Real-world applications

  • Understanding kernel design patterns.

Where you’ll apply it

References

  • xv6 book and source code
  • OSTEP and MIT 6.S081 lectures

Key insights

xv6 is a complete OS in miniature, showing real kernel architecture patterns.

Summary

Reading and extending xv6 connects all the OS concepts you have learned.

Homework/Exercises to practice the concept

  1. Add a syscall that returns free memory.
  2. Add a scheduler counter per process.
  3. Add a new user command that calls your syscall.

Solutions to the homework/exercises

  1. Walk free list and sum page counts.
  2. Increment a counter in the scheduler.
  3. Add user stub and program in user/.

3. Project Specification

3.1 What You Will Build

A documented exploration of xv6 plus a small extension (new syscall or scheduler tweak) with a demo program.

3.2 Functional Requirements

  1. Build and boot xv6 in QEMU.
  2. Trace a syscall path for a user program.
  3. Implement a kernel extension and test it.
  4. Document the modified files and reasoning.

3.3 Non-Functional Requirements

  • Performance: xv6 boots normally.
  • Reliability: no kernel panic after changes.
  • Usability: clear notes and diff.

3.4 Example Usage / Output

$ make qemu
xv6 running...
$ uptime
1 1 1 100

3.5 Data Formats / Schemas / Protocols

  • xv6 syscall table entries and user stubs.

3.6 Edge Cases

  • Incorrect syscall number mapping.
  • Missing user stub causing link errors.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make qemu

3.7.2 Golden Path Demo (Deterministic)

  • Use a fixed syscall number and deterministic output string.

3.7.3 If CLI: exact terminal transcript

$ make qemu
xv6...
$ mysyscall
ok: ticks=123

Failure demo (deterministic):

$ make qemu
panic: unknown syscall

Exit codes:

  • 0 success
  • 2 build error
  • 3 runtime panic

4. Solution Architecture

4.1 High-Level Design

Boot -> init -> scheduler -> syscalls -> fs

4.2 Key Components

| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Boot | load kernel | QEMU + xv6 build | | Syscall path | trap + dispatch | table-based | | Extension | new syscall | minimal change |

4.3 Data Structures (No Full Code)

struct proc {
    int pid;
    int state;
    int ticks;
};

4.4 Algorithm Overview

Key Algorithm: syscall dispatch

  1. Read syscall number from trap frame.
  2. Lookup handler in table.
  3. Call handler and return value.

Complexity Analysis:

  • Time: O(1) dispatch
  • Space: O(1)

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install qemu-system-x86 build-essential

5.2 Project Structure

xv6/
|-- kernel/
|-- user/
`-- Makefile

5.3 The Core Question You’re Answering

“How do real kernel subsystems fit together in a minimal, readable OS?”

5.4 Concepts You Must Understand First

  1. Trap handling and syscall dispatch.
  2. Process scheduler loop.
  3. Filesystem path resolution.

5.5 Questions to Guide Your Design

  1. Which files implement the syscall table?
  2. Where do you add user stubs?
  3. How will you test your new syscall?

5.6 Thinking Exercise

Trace ls from user program to kernel functions and back.

5.7 The Interview Questions They’ll Ask

  1. Why is xv6 useful for learning?
  2. How would you add a syscall safely?

5.8 Hints in Layers

Hint 1: Build and boot unmodified xv6.

Hint 2: Trace one syscall with print statements.

Hint 3: Add a new syscall and user command.

5.9 Books That Will Help

| Topic | Book | Chapter | |——-|——|———| | xv6 | xv6 book | all | | OS design | OSTEP | 4-6 |

5.10 Implementation Phases

Phase 1: Build and read (6-8 hours)

Goals: understand code layout.

Phase 2: Trace syscalls (6-8 hours)

Goals: map syscall flow.

Phase 3: Extend (8-10 hours)

Goals: add syscall and user program.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Extension | syscall vs scheduler | syscall | minimal risk | | Debugging | printf vs gdb | printf | simpler in xv6 |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———-|———|———-| | Build | compile xv6 | make qemu | | Integration | syscall test | run new command | | Regression | ensure boot | run ls, sh |

6.2 Critical Test Cases

  1. New syscall returns expected value.
  2. No kernel panics after change.
  3. User program links successfully.

6.3 Test Data

Expected output: "ok: ticks=123"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |——–|———|———-| | Missing syscall stub | link error | update user headers | | Wrong syscall number | panic | update syscall table | | Trap frame mismatch | crash | verify registers |

7.2 Debugging Strategies

  • Use QEMU -S -s and GDB if needed.
  • Add printk-style logs in kernel.

7.3 Performance Traps

  • Excessive debug logging inside scheduler loop.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a syscall that returns PID.

8.2 Intermediate Extensions

  • Add a scheduler metric.

8.3 Advanced Extensions

  • Add a simple system call tracing tool.

9. Real-World Connections

9.1 Industry Applications

  • Kernel development workflows.
  • xv6 and JOS.

9.3 Interview Relevance

  • Demonstrates understanding of kernel subsystems.

10. Resources

10.1 Essential Reading

  • xv6 book

10.2 Video Resources

  • MIT 6.S081 lectures

10.3 Tools & Documentation

  • QEMU and GDB docs

11. Self-Assessment Checklist

11.1 Understanding

  • I can trace a syscall in xv6.
  • I can explain the scheduler loop.

11.2 Implementation

  • My kernel extension works.

11.3 Growth

  • I can discuss xv6 in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • xv6 builds and runs with notes.

Full Completion:

  • New syscall implemented and tested.

Excellence (Going Above & Beyond):

  • Deeper kernel modification or scheduler change.