Project 3: The Filesystem Explorer (ls -R Clone)
Build a recursive
lsclone that reads directory entries and inode metadata.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | Weekend |
| Language | C (Alt: Rust) |
| Prerequisites | C structs, recursion |
| Key Topics | inodes, stat, directory traversal |
1. Learning Objectives
By completing this project, you will:
- Traverse directories with
opendir/readdir. - Use
statto read inode metadata. - Format permissions and timestamps.
- Recursively walk directory trees safely.
2. Theoretical Foundation
2.1 Core Concepts
- Directory entries: Map file names to inode numbers.
- Inodes: Store metadata like size, permissions, and timestamps.
- Stat syscall: The gateway from names to inode data.
2.2 Why This Matters
Understanding directory entries vs inodes explains how filenames, hard links, and metadata really work.
2.3 Historical Context / Background
Unix directory structures have long separated names (directory) from metadata (inode).
2.4 Common Misconceptions
- “The name is in the file”: It lives in the directory.
- “readdir gives metadata”: You must call stat for that.
3. Project Specification
3.1 What You Will Build
A tool myls that supports -l and -R, printing permissions, sizes, and timestamps.
3.2 Functional Requirements
- List directory contents in simple mode.
- Support
-lfor detailed metadata. - Support
-Rfor recursive traversal.
3.3 Non-Functional Requirements
- Reliability: Skip unreadable directories gracefully.
- Usability: Output format matches
lsclosely.
3.4 Example Usage / Output
$ ./myls -l
-rw-r--r-- 1 user group 512 Dec 22 10:01 main.c
3.5 Real World Outcome
You will list directories recursively and see accurate metadata:
$ ./myls -R
./src:
main.c
util.c
4. Solution Architecture
4.1 High-Level Design
open dir -> read entries -> stat each -> format -> recurse into subdirs
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Directory walker | Iterate entries | Skip . and .. |
| Stat formatter | Permissions/time | Use localtime |
| Recursion | Traverse tree | Avoid symlink loops |
4.3 Data Structures
struct stat st;
4.4 Algorithm Overview
Key Algorithm: Recursive Walk
- List current directory.
- For each entry, if directory, recurse.
Complexity Analysis:
- Time: O(n) entries
- Space: O(depth) recursion
5. Implementation Guide
5.1 Development Environment Setup
gcc --version
5.2 Project Structure
project-root/
├── myls.c
└── README.md
5.3 The Core Question You’re Answering
“Where is a file name stored, and how do you get its metadata?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- dirent vs stat
- Permission bit masks
- Symlinks and recursion safety
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How will you format permission bits as
rwx? - How do you avoid recursing into symlink loops?
- How will you resolve user and group names?
5.6 Thinking Exercise
Map name -> inode
Pick a file, use ls -li, then compare inode number with stat output.
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is an inode?”
- “What is the difference between hard and soft links?”
- “Why is
statexpensive on network filesystems?”
5.8 Hints in Layers
Hint 1: Format permissions
Use bit masks on st_mode to build a string.
Hint 2: Skip . and .. Never recurse into these entries.
Hint 3: User/group names
Use getpwuid and getgrgid.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| File attributes | “TLPI” | Ch. 15 |
| Directories | “TLPI” | Ch. 18 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
Goals:
- Basic directory listing.
Tasks:
- Use opendir/readdir.
- Print names.
Checkpoint: Equivalent to ls.
Phase 2: Core Functionality (2 days)
Goals:
- Add
-ldetails.
Tasks:
- Call stat.
- Format permissions and time.
Checkpoint: Output matches ls -l.
Phase 3: Polish & Edge Cases (1-2 days)
Goals:
- Add recursion and safety.
Tasks:
- Implement
-R. - Avoid symlink loops.
Checkpoint: Recursion works on nested dirs.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Metadata source | stat vs lstat | lstat | Preserve symlink info |
| Recursion | DFS vs BFS | DFS | Simpler |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Listing | Names correct | Basic dir |
| Stat | Metadata correct | ls -l compare |
| Recursion | Nested dirs | small tree |
6.2 Critical Test Cases
- Hidden files are excluded unless
-aimplemented. - Symlinks are not followed recursively.
- Permission errors do not crash tool.
6.3 Test Data
Directory tree with nested folders
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Using stat on broken path | Errors | Build full path for entries |
| Recursing into symlink | Infinite loop | Use lstat |
| Incorrect perms | Wrong rwx | Verify bit masks |
7.2 Debugging Strategies
- Print full path before stat.
- Compare with
ls -loutput.
7.3 Performance Traps
Calling stat per file is expensive; accept cost for correctness.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
-ato show hidden files. - Add
-hfor human sizes.
8.2 Intermediate Extensions
- Add colorized output.
- Add sorting options.
8.3 Advanced Extensions
- Add file type icons.
- Parallelize directory stat calls.
9. Real-World Connections
9.1 Industry Applications
- Filesystem inspection tools and backup utilities.
9.2 Related Open Source Projects
- coreutils ls: https://www.gnu.org/software/coreutils/
9.3 Interview Relevance
- Inode, stat, and directory traversal are common Unix interview topics.
10. Resources
10.1 Essential Reading
- stat(2) -
man 2 stat - readdir(3) -
man 3 readdir
10.2 Video Resources
- Filesystem internals talks (search “inode directory entry”)
10.3 Tools & Documentation
- ls(1) -
man 1 ls
10.4 Related Projects in This Series
- Process Psychic: uses /proc for metadata.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain inode vs filename.
- I can decode permission bits.
- I can explain stat vs lstat.
11.2 Implementation
- Basic listing works.
-lformatting is correct.- Recursion is safe.
11.3 Growth
- I can extend to sorting and colors.
- I can explain why ls is slow on NFS.
12. Submission / Completion Criteria
Minimum Viable Completion:
- List directory entries and support
-l.
Full Completion:
- Add
-Rrecursion with safe traversal.
Excellence (Going Above & Beyond):
- Add sorting, colors, and human sizes.
This guide was generated from LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md. For the complete learning path, see the parent directory.