Project 3: The Filesystem Explorer (ls -R Clone)

Build a recursive ls clone that reads directory entries and inode metadata.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate Weekend
Language C (Alt: Rust)
Prerequisites C structs, recursion
Key Topics inodes, stat, directory traversal

1. Learning Objectives

By completing this project, you will:

  1. Traverse directories with opendir/readdir.
  2. Use stat to read inode metadata.
  3. Format permissions and timestamps.
  4. Recursively walk directory trees safely.

2. Theoretical Foundation

2.1 Core Concepts

  • Directory entries: Map file names to inode numbers.
  • Inodes: Store metadata like size, permissions, and timestamps.
  • Stat syscall: The gateway from names to inode data.

2.2 Why This Matters

Understanding directory entries vs inodes explains how filenames, hard links, and metadata really work.

2.3 Historical Context / Background

Unix directory structures have long separated names (directory) from metadata (inode).

2.4 Common Misconceptions

  • “The name is in the file”: It lives in the directory.
  • “readdir gives metadata”: You must call stat for that.

3. Project Specification

3.1 What You Will Build

A tool myls that supports -l and -R, printing permissions, sizes, and timestamps.

3.2 Functional Requirements

  1. List directory contents in simple mode.
  2. Support -l for detailed metadata.
  3. Support -R for recursive traversal.

3.3 Non-Functional Requirements

  • Reliability: Skip unreadable directories gracefully.
  • Usability: Output format matches ls closely.

3.4 Example Usage / Output

$ ./myls -l
-rw-r--r-- 1 user group 512 Dec 22 10:01 main.c

3.5 Real World Outcome

You will list directories recursively and see accurate metadata:

$ ./myls -R
./src:
main.c
util.c

4. Solution Architecture

4.1 High-Level Design

open dir -> read entries -> stat each -> format -> recurse into subdirs

4.2 Key Components

Component Responsibility Key Decisions
Directory walker Iterate entries Skip . and ..
Stat formatter Permissions/time Use localtime
Recursion Traverse tree Avoid symlink loops

4.3 Data Structures

struct stat st;

4.4 Algorithm Overview

Key Algorithm: Recursive Walk

  1. List current directory.
  2. For each entry, if directory, recurse.

Complexity Analysis:

  • Time: O(n) entries
  • Space: O(depth) recursion

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── myls.c
└── README.md

5.3 The Core Question You’re Answering

“Where is a file name stored, and how do you get its metadata?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. dirent vs stat
  2. Permission bit masks
  3. Symlinks and recursion safety

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How will you format permission bits as rwx?
  2. How do you avoid recursing into symlink loops?
  3. How will you resolve user and group names?

5.6 Thinking Exercise

Map name -> inode

Pick a file, use ls -li, then compare inode number with stat output.

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is an inode?”
  2. “What is the difference between hard and soft links?”
  3. “Why is stat expensive on network filesystems?”

5.8 Hints in Layers

Hint 1: Format permissions Use bit masks on st_mode to build a string.

Hint 2: Skip . and .. Never recurse into these entries.

Hint 3: User/group names Use getpwuid and getgrgid.

5.9 Books That Will Help

Topic Book Chapter
File attributes “TLPI” Ch. 15
Directories “TLPI” Ch. 18

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

  • Basic directory listing.

Tasks:

  1. Use opendir/readdir.
  2. Print names.

Checkpoint: Equivalent to ls.

Phase 2: Core Functionality (2 days)

Goals:

  • Add -l details.

Tasks:

  1. Call stat.
  2. Format permissions and time.

Checkpoint: Output matches ls -l.

Phase 3: Polish & Edge Cases (1-2 days)

Goals:

  • Add recursion and safety.

Tasks:

  1. Implement -R.
  2. Avoid symlink loops.

Checkpoint: Recursion works on nested dirs.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Metadata source stat vs lstat lstat Preserve symlink info
Recursion DFS vs BFS DFS Simpler

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Listing Names correct Basic dir
Stat Metadata correct ls -l compare
Recursion Nested dirs small tree

6.2 Critical Test Cases

  1. Hidden files are excluded unless -a implemented.
  2. Symlinks are not followed recursively.
  3. Permission errors do not crash tool.

6.3 Test Data

Directory tree with nested folders

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Using stat on broken path Errors Build full path for entries
Recursing into symlink Infinite loop Use lstat
Incorrect perms Wrong rwx Verify bit masks

7.2 Debugging Strategies

  • Print full path before stat.
  • Compare with ls -l output.

7.3 Performance Traps

Calling stat per file is expensive; accept cost for correctness.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add -a to show hidden files.
  • Add -h for human sizes.

8.2 Intermediate Extensions

  • Add colorized output.
  • Add sorting options.

8.3 Advanced Extensions

  • Add file type icons.
  • Parallelize directory stat calls.

9. Real-World Connections

9.1 Industry Applications

  • Filesystem inspection tools and backup utilities.
  • coreutils ls: https://www.gnu.org/software/coreutils/

9.3 Interview Relevance

  • Inode, stat, and directory traversal are common Unix interview topics.

10. Resources

10.1 Essential Reading

  • stat(2) - man 2 stat
  • readdir(3) - man 3 readdir

10.2 Video Resources

  • Filesystem internals talks (search “inode directory entry”)

10.3 Tools & Documentation

  • ls(1) - man 1 ls

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain inode vs filename.
  • I can decode permission bits.
  • I can explain stat vs lstat.

11.2 Implementation

  • Basic listing works.
  • -l formatting is correct.
  • Recursion is safe.

11.3 Growth

  • I can extend to sorting and colors.
  • I can explain why ls is slow on NFS.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • List directory entries and support -l.

Full Completion:

  • Add -R recursion with safe traversal.

Excellence (Going Above & Beyond):

  • Add sorting, colors, and human sizes.

This guide was generated from LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md. For the complete learning path, see the parent directory.