Project 8: The Mirror Filesystem (FUSE)

Build a user-space filesystem that mirrors a directory while transforming file contents.

Quick Reference

Attribute | Value
Difficulty | Advanced
Time Estimate | 1-2 weeks
Main Programming Language | C or Python
Alternative Programming Languages | Rust, Go
Coolness Level | See REFERENCE.md (Level 4)
Business Potential | See REFERENCE.md (Level 4)
Prerequisites | VFS basics, file I/O, callbacks
Key Topics | FUSE, VFS callbacks, user-kernel bridge

1. Learning Objectives

By completing this project, you will:

  1. Explain how VFS forwards operations to a filesystem driver.
  2. Implement core filesystem callbacks in user space.
  3. Preserve metadata while transforming file contents.
  4. Debug user-kernel boundaries with deterministic tests.

2. All Theory Needed (Per-Concept Breakdown)

FUSE and User-Space Filesystems

Fundamentals

FUSE (Filesystem in Userspace) lets you implement a filesystem without writing kernel code. The kernel exposes a device interface, /dev/fuse, that forwards VFS operations such as open, read, write, and getattr to a user-space daemon. Your daemon responds with data or errors, and the kernel presents the results as if they came from a normal filesystem. This makes prototyping filesystem behavior safer and quicker (a bug crashes your daemon rather than the kernel) while relying on the same VFS contract that in-kernel filesystems use.

Deep Dive

When a process accesses a path under a FUSE mount, the VFS layer recognizes that the mount point is backed by the FUSE driver. Instead of calling an in-kernel filesystem implementation, the kernel queues a request that your daemon reads from /dev/fuse. A user-space library (libfuse) decodes these requests and dispatches them to your callbacks. Each callback corresponds to a VFS operation: getattr for metadata, readdir for listing directories, open for opening files, read and write for data access, and so on.

The key concept is that the VFS contract remains the same. Tools like ls and cat do not know they are talking to a user-space filesystem. They simply call syscalls, and the kernel forwards the operations to your daemon. This means you must respect VFS expectations: you must return correct metadata, set appropriate error codes, and ensure consistent behavior across operations. If your getattr reports a size that does not match what read returns, applications will behave unpredictably. If your readdir returns inconsistent entries, directory listings will appear to change randomly.
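
To make the consistency requirement concrete, here is a minimal Python sketch of a getattr-style helper. The name mirror_getattr and the shape of the returned dict are illustrative assumptions, not the API of any FUSE binding. The point is only that the reported size comes from the same backing file that read will serve, which holds for a length-preserving transform such as reversal.

```python
import os
import tempfile

def mirror_getattr(backing_path):
    """Return a metadata dict for backing_path (hypothetical helper).

    lstat is used so that a symlink is described as itself, not its target.
    """
    st = os.lstat(backing_path)
    return {
        "st_mode": st.st_mode,
        "st_size": st.st_size,
        "st_uid": st.st_uid,
        "st_gid": st.st_gid,
        "st_mtime": st.st_mtime,
    }

# Demo: the reported size equals the number of bytes a full read returns.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"olleh")
    attrs = mirror_getattr(path)
    with open(path, "rb") as f:
        data = f.read()
    size_matches = attrs["st_size"] == len(data)
```

If a transform ever changes the length of the data (compression, for example), the size reported here would have to be the transformed size instead.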

Because FUSE is user-space, it adds context switches and copying overhead. This makes it slower than in-kernel filesystems, but easier to develop and safer to debug. It is ideal for experimental filesystems, encrypted overlays, network filesystems, and educational projects. Many real-world tools rely on FUSE for these reasons.

This project uses a mirror filesystem: all paths are mapped to an underlying real directory, but file contents are transformed (for example, reversed) when read or written. This teaches two important concepts. First, it separates metadata from data: you can preserve metadata while transforming content. Second, it shows how VFS operations compose to produce expected behavior from user space. When you write to a file through the mount point, the kernel calls your write callback; your code transforms the data and writes it to the underlying file. When you read, you reverse the transformation. The filesystem is consistent if these transformations are inverse operations.

Concurrency is another challenge. The kernel can issue multiple requests in parallel, so your daemon must be thread-safe. If your transformation logic uses shared buffers or global state, you must guard it. You also need to handle partial reads and writes, which means you must respect offsets and lengths, not assume full-file operations. This mirrors how real filesystems behave and is a key reason FUSE projects teach VFS internals well.
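
One possible way to guard shared state is a lock per backing path, sketched below with Python's standard threading module. PathLocks is a hypothetical helper invented for this illustration; a real daemon might instead avoid shared state entirely or run single-threaded at first.

```python
import threading

class PathLocks:
    """Hand out one lock per path so operations on the same file serialize."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the dict itself
        self._locks = {}

    def lock_for(self, path):
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

locks = PathLocks()
counter = {"value": 0}

def worker():
    for _ in range(1000):
        with locks.lock_for("/test.txt"):
            counter["value"] += 1   # stands in for a read-modify-write

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Operations on different paths take different locks and can still run in parallel. Note that this sketch never removes entries, so the dictionary grows with the number of distinct paths seen.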

How this fits into the project

You will apply this concept in §3.1 for the filesystem behavior, in §4.2 for callback components, and in §6.2 for tests. It also builds on P04-filesystem-explorer.md, which introduced VFS and inodes.

Definitions & key terms

  • FUSE: Filesystem in Userspace, a kernel interface for implementing filesystems in user space.
  • Callback: Function invoked by libfuse for a VFS operation.
  • Mount point: Directory where the filesystem is attached.
  • Passthrough: Filesystem that forwards operations to another directory.
  • Transformation: Content modification on read/write.

Mental model diagram

App -> syscall -> VFS -> /dev/fuse -> libfuse -> your callbacks

How it works

  1. Mount FUSE filesystem at a mount point.
  2. Kernel forwards VFS ops to your daemon.
  3. Your daemon maps paths to a backing store.
  4. Your daemon returns data or errors.

Minimal concrete example

write("hello") -> stored as "olleh"
read() -> returns "hello"
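
The same round trip can be written as a few lines of Python. This is a sketch of the reversal transform only; the function name transform is an illustrative choice, and it operates on bytes rather than text.

```python
def transform(data: bytes) -> bytes:
    """Reverse the bytes; applying it twice returns the input."""
    return data[::-1]

stored = transform(b"hello")     # what lands in the backing file
restored = transform(stored)     # what a read through the mount returns
```

Because reversal is its own inverse, one function serves both the write path and the read path.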

Common misconceptions

  • “FUSE is only for experiments.” Many production tools use it.
  • “Metadata is optional.” Incorrect metadata breaks user programs.
  • “Reads are whole-file.” The kernel can request partial reads.

Check-your-understanding questions

  1. Why does a FUSE filesystem need to implement getattr?
  2. What happens if you ignore file offsets in read?
  3. Why is FUSE slower than in-kernel filesystems?
  4. How does passthrough mode help debugging?

Check-your-understanding answers

  1. The kernel needs attributes to resolve paths, and tools like ls need size and permissions from metadata.
  2. Reads will return incorrect data and break applications.
  3. It adds context switches and user-kernel data copies.
  4. It verifies your wiring before adding transformations.

Real-world applications

  • Encrypted filesystems.
  • Remote filesystems over SSH.
  • Content transformation layers.

Where you’ll apply it

  • §3.1 (filesystem behavior), §4.2 (callback components), and §6.2 (tests).

References

  • libfuse repository: https://github.com/libfuse/libfuse
  • VFS documentation: https://docs.kernel.org/filesystems/vfs.html

Key insights

FUSE exposes the VFS contract without kernel coding.

Summary

If you can implement a FUSE filesystem, you understand how VFS drives file behavior.

Homework/Exercises to practice the concept

  1. Write down the sequence of callbacks triggered by cat file.
  2. Explain why a passthrough filesystem is a good first step.

Solutions to the homework/exercises

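  1. Typically getattr -> open -> read -> flush -> release; read repeats until end of file, and the exact trace can vary with kernel caching.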
  2. It validates the wiring without transformation complexity.

3. Project Specification

3.1 What You Will Build

A FUSE filesystem that mirrors a real directory. Writes are transformed (for example, reversed) before saving. Reads reverse the transformation so the user sees original content through the mount.

3.2 Functional Requirements

  1. Mount and unmount reliably.
  2. Pass-through metadata (size, permissions, timestamps).
  3. Transform content on read and write.

3.3 Non-Functional Requirements

  • Performance: Acceptable for small files.
  • Reliability: No data corruption.
  • Usability: Clear mount/unmount instructions.

3.4 Example Usage / Output

$ ./mirrorfs real_dir/ mount_point/
$ printf 'hello' > mount_point/test.txt   # printf, not echo: no trailing newline to reverse
$ cat real_dir/test.txt
olleh

3.5 Data Formats / Schemas / Protocols

  • File content transformation must be reversible.
  • Metadata must match backing files.

3.6 Edge Cases

  • Partial writes.
  • Concurrent reads and writes.
  • Non-regular files (directories, symlinks).
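
Partial writes are the awkward case for whole-file reversal, because extending the file moves every stored byte. The sketch below takes the simplest route, a read-modify-write of the entire file, which fits the small-file performance target in §3.3. The helper name write_reversed and the zero-filling of gaps are assumptions made for this illustration.

```python
import os
import tempfile

def write_reversed(backing_path, data, offset):
    """Apply a write at a logical offset to a file stored reversed.

    Read-modify-write of the whole file: simple, but O(file size) per call.
    """
    try:
        with open(backing_path, "rb") as f:
            logical = bytearray(f.read()[::-1])   # undo the stored reversal
    except FileNotFoundError:
        logical = bytearray()
    end = offset + len(data)
    if len(logical) < end:
        logical.extend(b"\0" * (end - len(logical)))  # zero-fill any gap
    logical[offset:end] = data
    with open(backing_path, "wb") as f:
        f.write(bytes(logical[::-1]))
    return len(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    write_reversed(path, b"hel", 0)     # first partial write
    write_reversed(path, b"lo", 3)      # second write extends the file
    with open(path, "rb") as f:
        stored = f.read()
```

This helper is not safe against concurrent writers on its own; two overlapping calls on the same path would need to be serialized.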

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

  • Ensure real_dir and mount_point exist.
  • Build, then run ./mirrorfs real_dir/ mount_point/.
  • When finished, unmount with fusermount3 -u mount_point/ (fusermount -u with libfuse 2).

3.7.2 Golden Path Demo (Deterministic)

Use a fixed string and verify transformation is reversible.

3.7.3 If CLI: Exact terminal transcript

$ ./mirrorfs real_dir/ mount_point/
$ printf 'hello' > mount_point/test.txt   # printf, not echo: no trailing newline to reverse
$ cat real_dir/test.txt
olleh
# exit code: 0

Failure demo (deterministic):

$ ./mirrorfs missing_dir/ mount_point/
error: backing directory not found
# exit code: 2

Exit codes:

  • 0 success
  • 2 invalid path
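
The failure demo can be backed by a small argument check that runs before any mount is attempted. This is a sketch: check_args and the constant names are hypothetical, and the "mount point not found" message is an assumption, since the specification only gives the message for a missing backing directory.

```python
import os
import tempfile

EXIT_OK = 0
EXIT_INVALID_PATH = 2

def check_args(backing_dir, mount_point):
    """Return (exit_code, message) without mounting anything."""
    if not os.path.isdir(backing_dir):
        return EXIT_INVALID_PATH, "error: backing directory not found"
    if not os.path.isdir(mount_point):
        # Assumed wording; the spec does not define this message.
        return EXIT_INVALID_PATH, "error: mount point not found"
    return EXIT_OK, ""

with tempfile.TemporaryDirectory() as tmp:
    ok = check_args(tmp, tmp)
    bad = check_args(os.path.join(tmp, "missing_dir"), tmp)
```

A main function would print the message to standard error and pass the code to sys.exit.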

4. Solution Architecture

4.1 High-Level Design

VFS -> /dev/fuse -> libfuse -> callbacks -> backing directory

4.2 Key Components

Component | Responsibility | Key Decisions
Mount handler | Initialize FUSE session | Single-threaded first
Path mapper | Map virtual paths to real paths | Simple prefix mapping
Transformer | Reverse or encode content | Invertible transform

4.3 Data Structures (No Full Code)

  • Path mapping: mount path -> backing path.
  • Request context: operation, path, offset, size.
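
As a rough illustration of these two structures, the sketch below maps a mount-relative path onto the backing directory and bundles the per-operation fields into a small record. The names to_backing_path and Request, and the example root /srv/real_dir, are hypothetical. The escape check is defensive; the kernel normally resolves ".." components before the daemon sees a path.

```python
import os
from dataclasses import dataclass

def to_backing_path(root, fuse_path):
    """Map a mount-relative path such as '/a/b.txt' to a path under root."""
    root = os.path.normpath(root)
    joined = os.path.normpath(os.path.join(root, fuse_path.lstrip("/")))
    # Refuse anything that escapes the backing directory (e.g. via '..').
    if joined != root and not joined.startswith(root + os.sep):
        raise ValueError("path escapes backing directory")
    return joined

@dataclass(frozen=True)
class Request:
    """What a callback needs to know about one operation."""
    operation: str
    path: str
    offset: int = 0
    size: int = 0

req = Request("read", "/docs/test.txt", offset=0, size=4096)
mapped = to_backing_path("/srv/real_dir", req.path)
```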

4.4 Algorithm Overview

Key Algorithm: read with transform

  1. Translate virtual path to backing path.
  2. Read backing file range.
  3. Apply transform and return bytes.

Complexity Analysis:

  • Time: O(n) for n bytes.
  • Space: O(n) buffer for transform.
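
For the whole-file reversal transform, step 2 needs an offset translation: logical byte i is stored at backing position (file size - 1 - i), so a logical range maps to a mirrored backing range. The following sketch shows one way to do that with os.pread; read_reversed is a hypothetical helper, not a libfuse callback signature.

```python
import os
import tempfile

def read_reversed(fd, size, offset):
    """Return up to `size` logical bytes starting at logical `offset`.

    Logical byte i lives at backing position (file_size - 1 - i).
    """
    file_size = os.fstat(fd).st_size
    if offset >= file_size:
        return b""                        # read at or past end of file
    size = min(size, file_size - offset)  # clamp to the remaining bytes
    start = file_size - offset - size     # backing range is [start, start + size)
    return os.pread(fd, size, start)[::-1]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"fedcba")                # backing bytes for logical "abcdef"
    fd = os.open(path, os.O_RDONLY)
    try:
        whole = read_reversed(fd, 4096, 0)
        middle = read_reversed(fd, 3, 2)
        past_end = read_reversed(fd, 10, 6)
    finally:
        os.close(fd)
```

Only the requested range is read and reversed, so the cost stays O(n) in the bytes returned rather than in the file size.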

5. Implementation Guide

5.1 Development Environment Setup

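# Install libfuse and its headers (Debian/Ubuntu package names shown; other distributions differ)
sudo apt-get install fuse3 libfuse3-dev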

5.2 Project Structure

project-root/
├── src/
│   ├── mirrorfs.c
│   └── transform.c
├── tests/
│   └── fuse_tests.sh
└── README.md

5.3 The Core Question You’re Answering

“How does the kernel support many filesystems with one API?”

5.4 Concepts You Must Understand First

  1. VFS operations
    • What calls are required for a file read?
    • Book Reference: “Linux Kernel Development” - VFS chapter
  2. FUSE bridge
    • How does /dev/fuse forward requests?
    • Book Reference: libfuse documentation

5.5 Questions to Guide Your Design

  1. How will you handle partial reads and writes?
  2. How will you preserve metadata like permissions?

5.6 Thinking Exercise

Callback Trace

List the callbacks triggered by ls -l on the mount.

5.7 The Interview Questions They’ll Ask

  1. “Why is FUSE slower than kernel filesystems?”
  2. “What is the VFS contract?”
  3. “How do you handle concurrency in FUSE?”
  4. “Why must getattr be correct?”

5.8 Hints in Layers

Hint 1: Start with passthrough. Implement a mirror without transforms.

Hint 2: Add transform. Modify data only in read/write callbacks.

Hint 3: Handle errors. Return correct error codes for missing paths.

Hint 4: Debugging. Compare ls -l output on mount vs backing dir.
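
For Hint 3, one common pattern is to let the backing filesystem decide the error and pass its errno through unchanged. The sketch below returns a negative errno, which mirrors the convention of libfuse's C callbacks; Python bindings may expect an exception instead, so treat the return style as an assumption. call_with_errno is a hypothetical helper.

```python
import errno
import os
import tempfile

def call_with_errno(func, *args):
    """Run func and convert OSError into a negative errno value."""
    try:
        return func(*args)
    except OSError as exc:
        # Fall back to EIO if the exception carries no errno.
        return -(exc.errno or errno.EIO)

with tempfile.TemporaryDirectory() as tmp:
    missing = call_with_errno(os.lstat, os.path.join(tmp, "missing.txt"))
    present = call_with_errno(os.lstat, tmp)
```

A missing path then surfaces to the caller as ENOENT, which is what tools such as cat and ls expect to see.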

5.9 Books That Will Help

Topic | Book | Chapter
VFS | “Linux Kernel Development” | VFS chapter
FUSE | libfuse docs | reference

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Mount a passthrough filesystem.

Tasks:

  1. Implement minimal callbacks.
  2. Verify basic read/write.

Checkpoint: ls and cat work on mount.

Phase 2: Core Functionality (3 days)

Goals:

  • Add transformation logic.

Tasks:

  1. Transform writes to backing store.
  2. Reverse transform on reads.

Checkpoint: Data round-trips correctly.

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Handle errors and concurrency.

Tasks:

  1. Correct error codes.
  2. Ensure thread safety.

Checkpoint: No crashes under concurrent access.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Threading | single vs multi | single first | easier debugging
Transform | reverse vs encode | reverse | easy to verify

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Transform correctness | reverse twice -> original
Integration Tests | Mount behavior | ls, cat, echo
Edge Case Tests | Partial writes | small buffers

6.2 Critical Test Cases

  1. Write/read: data matches after transform.
  2. Metadata: permissions and size preserved.
  3. Missing path: correct error code.
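
Test case 2 can be automated with a small comparison helper. The sketch below checks file type, permission bits, and size; same_metadata is a hypothetical name, and the demo compares two ordinary files in a temporary directory. In a real test run the two arguments would be the path under the mount point and the matching path under the backing directory.

```python
import os
import stat
import tempfile

def same_metadata(path_a, path_b):
    """Compare the fields the test cases care about: type, permissions, size."""
    a, b = os.lstat(path_a), os.lstat(path_b)
    return (
        stat.S_IFMT(a.st_mode) == stat.S_IFMT(b.st_mode)
        and stat.S_IMODE(a.st_mode) == stat.S_IMODE(b.st_mode)
        and a.st_size == b.st_size
    )

with tempfile.TemporaryDirectory() as tmp:
    backing = os.path.join(tmp, "backing.txt")
    other = os.path.join(tmp, "other.txt")
    for p, payload in ((backing, b"olleh"), (other, b"hello")):
        with open(p, "wb") as f:
            f.write(payload)
        os.chmod(p, 0o640)
    equal = same_metadata(backing, other)
    os.chmod(other, 0o600)                 # change permissions on one side
    differs = not same_metadata(backing, other)
```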

6.3 Test Data

Test strings: "hello", "abcdef"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Incorrect getattr | ls output is wrong | Mirror the backing file's metadata
Offset errors | Corrupted data | Respect offsets in read/write
Missing unmount | Mount persists | Provide cleanup steps

7.2 Debugging Strategies

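  • Use strace: inspect the syscalls a client program (for example, cat) issues against the mount.
  • Log callbacks: verify order and parameters. Running the daemon with libfuse's -f (foreground) or -d (debug) option keeps the output on the terminal.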
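
Callback logging can be added without touching each callback body by wrapping them in a decorator. This is a minimal sketch using Python's logging module; the logger name mirrorfs and the stand-in read function are assumptions for the demo.

```python
import functools
import logging

log = logging.getLogger("mirrorfs")

def logged(func):
    """Log each callback's name and arguments before running it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("%s args=%r kwargs=%r", func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

calls = []

@logged
def read(path, size, offset):
    calls.append(("read", path, size, offset))   # stand-in for real work
    return b""

logging.basicConfig(level=logging.DEBUG)
read("/test.txt", 4096, 0)
```

Reading the resulting log next to the output of a command such as cat shows which callbacks fired, in which order, and with which offsets.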

7.3 Performance Traps

Transforming large files in memory can be slow; use streaming where possible.
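
For the reversal transform, streaming can be done by walking the backing file from its end toward its start and reversing one chunk at a time. The sketch below assumes that approach; iter_logical_chunks and the default chunk size are illustrative choices, not measured recommendations.

```python
import os
import tempfile

def iter_logical_chunks(backing_path, chunk_size=64 * 1024):
    """Yield the logical (un-reversed) content of a reversed file in chunks.

    Walks the backing file from its end toward its start, so memory use
    stays near chunk_size regardless of file size.
    """
    with open(backing_path, "rb") as f:
        remaining = os.fstat(f.fileno()).st_size
        while remaining > 0:
            step = min(chunk_size, remaining)
            f.seek(remaining - step)
            yield f.read(step)[::-1]
            remaining -= step

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.bin")
    with open(path, "wb") as f:
        f.write(b"jihgfedcba")            # backing bytes for "abcdefghij"
    streamed = b"".join(iter_logical_chunks(path, chunk_size=4))
```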


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a passthrough mode flag.
  • Add read-only mode.

8.2 Intermediate Extensions

  • Add encryption transform.
  • Add logging for each operation.

8.3 Advanced Extensions

  • Implement caching layer.
  • Support sparse files.

9. Real-World Connections

9.1 Industry Applications

  • Encrypted overlays: secure storage.
  • Remote filesystems: sshfs-like tools.

9.2 Related Open Source Projects

  • sshfs: https://github.com/libfuse/sshfs - SSH-based FS.
  • encfs: https://github.com/vgough/encfs - encrypted FS.

9.3 Interview Relevance

VFS and FUSE concepts show deep understanding of filesystem internals.


10. Resources

10.1 Essential Reading

  • libfuse documentation
  • VFS documentation

10.2 Video Resources

  • “FUSE filesystems” - talks (search title)

10.3 Tools & Documentation

  • libfuse: https://github.com/libfuse/libfuse
  • VFS docs: https://docs.kernel.org/filesystems/vfs.html

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain VFS callbacks
  • I can explain why getattr matters
  • I understand how FUSE bridges user and kernel space

11.2 Implementation

  • All functional requirements are met
  • Transform is reversible
  • Mount/unmount are reliable

11.3 Growth

  • I can explain this project in an interview
  • I documented lessons learned
  • I can propose an extension

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Mount works
  • Read/write works with transform
  • Metadata preserved

Full Completion:

  • All minimum criteria plus:
  • Error handling and cleanup documented
  • Failure demo with exit code

Excellence (Going Above & Beyond):

  • Encryption transform
  • Performance optimizations and caching