Project 2: Advanced File Information Tool
Build a comprehensive file information tool that displays all metadata available from the stat structure—permissions, ownership, timestamps, inode numbers, block counts, and file type.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2 - Intermediate |
| Time Estimate | 10-20 hours |
| Language | C (primary), Rust/Go (alternatives) |
| Prerequisites | Project 1 (File Copy), C programming, basic UNIX |
| Key Topics | File Metadata, stat/lstat, Permissions, Inodes, File Types |
1. Learning Objectives
After completing this project, you will:
- Understand the complete struct stat and what each field represents
- Master the difference between stat() and lstat() for symbolic links
- Parse and display file permission bits including special bits (setuid, setgid, sticky)
- Distinguish all UNIX file types (regular, directory, symlink, device, FIFO, socket)
- Convert between numeric UIDs/GIDs and human-readable usernames/groups
- Understand inode structure and what metadata filesystems actually store
- Work with UNIX timestamps and nanosecond precision
- Understand the relationship between hard links, inodes, and filenames
2. Theoretical Foundation
2.1 Core Concepts
The Inode: Heart of UNIX Filesystems
Every file in a UNIX filesystem has an inode (index node) that stores its metadata. The filename is NOT stored in the inode—it’s stored in the directory.
Directory Entry                         Inode (on disk)
┌───────────────────────────┐           ┌─────────────────────────────┐
│ filename: "myfile.txt"    │           │ st_mode:   0100644          │
│ inode:    12345      ─────┼──────────▶│ st_nlink:  2                │
└───────────────────────────┘           │ st_uid:    1000             │
                                        │ st_gid:    1000             │
Directory Entry (hard link)             │ st_size:   4096             │
┌───────────────────────────┐           │ st_atime:  1710523456       │
│ filename: "hardlink.txt"  │           │ st_mtime:  1710523400       │
│ inode:    12345      ─────┼──────────▶│ st_ctime:  1710523400       │
└───────────────────────────┘           │ st_blocks: 8                │
                                        │ st_blksize: 4096            │
SAME INODE, different names!            │ data block pointers...      │
(st_nlink = 2)                          └─────────────────────────────┘
This design explains:
- Hard links: Multiple directory entries pointing to same inode
- Why rm doesn’t always free space: Another link may exist
- Why mv within a filesystem is instant: Only the directory entry changes
- Why permissions are per-file, not per-name: They are stored in the inode
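The hard-link picture above can be checked from code. The following is a minimal sketch, not part of the project specification: the helper name same_file is hypothetical, and it assumes that a matching (st_dev, st_ino) pair is an adequate identity test for your purposes.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() under strict -std= modes */
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same inode on the
 * same device, 0 if they do not, and -1 if either path cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (lstat(a, &sa) == -1 || lstat(b, &sb) == -1)
        return -1;

    /* Inode numbers are only unique within one filesystem,
     * so the device ID has to match as well. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Calling same_file("regular_file.txt", "hardlink") on two hard links to one file should return 1, while a copy made with cp would give 0 because the copy gets its own inode.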
The stat Structure
// Simplified view; field order and padding vary by platform
struct stat {
    dev_t           st_dev;      // ID of device containing file
    ino_t           st_ino;      // Inode number
    mode_t          st_mode;     // File type and permissions
    nlink_t         st_nlink;    // Number of hard links
    uid_t           st_uid;      // User ID of owner
    gid_t           st_gid;      // Group ID of owner
    dev_t           st_rdev;     // Device ID (if special file)
    off_t           st_size;     // Total size in bytes
    blksize_t       st_blksize;  // Block size for filesystem I/O
    blkcnt_t        st_blocks;   // Number of 512-byte blocks allocated
    struct timespec st_atim;     // Time of last access
    struct timespec st_mtim;     // Time of last modification
    struct timespec st_ctim;     // Time of last status change
};
// POSIX.1-2008 keeps the older names as macros for the seconds part:
// st_atime is st_atim.tv_sec, and likewise for st_mtime and st_ctime
File Type and Permission Encoding (st_mode)
The st_mode field packs file type AND permissions into the low 16 bits of a single integer (mode_t itself is often wider than 16 bits):
st_mode bit layout (16 bits):
┌────────────────────────────────────────────────────────────────┐
│ Bits 15-12 │ Bit 11 │ Bit 10 │ Bit 9 │ Bits 8-6 │ Bits 5-3 │ Bits 2-0 │
├────────────┼────────┼────────┼────────┼──────────┼──────────┼──────────┤
│ File Type │ SetUID │ SetGID │ Sticky │ User │ Group │ Other │
│ (4 bits) │ │ │ │ rwx │ rwx │ rwx │
└────────────┴────────┴────────┴────────┴──────────┴──────────┴──────────┘
File Type Values (bits 15-12):
S_IFREG = 0100000 (regular file) '-'
S_IFDIR = 0040000 (directory) 'd'
S_IFLNK = 0120000 (symbolic link) 'l'
S_IFCHR = 0020000 (character device) 'c'
S_IFBLK = 0060000 (block device) 'b'
S_IFIFO = 0010000 (FIFO/named pipe) 'p'
S_IFSOCK = 0140000 (socket) 's'
Example: st_mode = 0100755 (octal)
┌───────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
│ 1000 (file) │ 0 │ 0 │ 0 │ 111 │ 101 │ 101 │
│ S_IFREG │ no │ no │ no │ rwx │ r-x │ r-x │
└───────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
Display: -rwxr-xr-x
Example: st_mode = 0041777 (octal) - /tmp directory
┌───────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
│ 0100 (dir) │ 0 │ 0 │ 1 │ 111 │ 111 │ 111 │
│ S_IFDIR │ no │ no │ sticky│ rwx │ rwx │ rwx │
└───────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
Display: drwxrwxrwt (note the 't' for sticky bit)
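The type half of this encoding can be decoded by masking with S_IFMT and comparing against the S_IF* constants. The sketch below is one way to do that; the function name file_type_name is hypothetical, and the returned strings are taken from the sample output later in this document.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFMT and friends under strict -std= modes */
#include <sys/stat.h>

/* Hypothetical helper: maps the file-type bits of st_mode to a display name. */
const char *file_type_name(mode_t mode)
{
    switch (mode & S_IFMT) {          /* keep only bits 15-12 */
    case S_IFREG:  return "regular file";
    case S_IFDIR:  return "directory";
    case S_IFLNK:  return "symbolic link";
    case S_IFCHR:  return "character special";
    case S_IFBLK:  return "block special";
    case S_IFIFO:  return "FIFO (named pipe)";
    case S_IFSOCK: return "socket";
    default:       return "unknown";
    }
}
```

The S_ISREG(), S_ISDIR() style macros perform the same comparison one type at a time; a switch on mode & S_IFMT is simply a compact alternative when every type needs a name.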
stat() vs lstat()
Regular file:
stat("/path/to/file") → Returns file's metadata
lstat("/path/to/file") → Returns file's metadata (same)
Symbolic link:
stat("/path/to/link") → Returns TARGET's metadata (follows link)
lstat("/path/to/link") → Returns LINK's metadata (doesn't follow)
        lstat()                              stat()
           │                                    │
           ▼                                    │
┌────────────────────────────────┐              │
│ symlink: /path/to/link         │              │
│ → points to: /real/target      │──────────────┤
│ size: 12 (length of target)    │              │
│ type: S_IFLNK                  │              │
└────────────────────────────────┘              │
                                                ▼
                ┌────────────────────────────────┐
                │ file: /real/target             │
                │ size: 4096 (actual content)    │
                │ type: S_IFREG                  │
                └────────────────────────────────┘
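In code, the whole difference comes down to which of the two calls is made. A minimal sketch, assuming a hypothetical wrapper named stat_path whose follow argument plays the role of the -L flag described later:

```c
#define _XOPEN_SOURCE 700   /* expose lstat() under strict -std= modes */
#include <sys/stat.h>

/* Hypothetical wrapper: fills *st for path.
 * follow != 0 follows symlinks (stat); follow == 0 examines the link itself (lstat).
 * Returns 0 on success, -1 on error with errno set. */
int stat_path(const char *path, int follow, struct stat *st)
{
    return follow ? stat(path, st) : lstat(path, st);
}
```

For anything that is not a symbolic link, both branches fill in the same data, so the flag only matters when the final path component is a link.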
The Three Timestamps
┌─────────────┬──────────────────────────────────────────────────────────┐
│ Timestamp │ What It Represents / When It's Updated │
├─────────────┼──────────────────────────────────────────────────────────┤
│ st_atime │ Access time - last time file data was read │
│ (atime) │ Updated by: read(), mmap(), open()+read │
│ │ Note: Many systems use "noatime" or "relatime" to │
│ │ reduce disk writes │
├─────────────┼──────────────────────────────────────────────────────────┤
│ st_mtime │ Modification time - last time file content changed │
│ (mtime) │ Updated by: write(), truncate() │
│ │ This is what `ls -l` shows by default │
├─────────────┼──────────────────────────────────────────────────────────┤
│ st_ctime │ Change time - last time inode metadata changed │
│ (ctime) │ Updated by: chmod(), chown(), rename(), link() │
│ │ Also updated when mtime changes │
│ │ Note: NOT creation time! UNIX has no creation time* │
└─────────────┴──────────────────────────────────────────────────────────┘
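* Birth (creation) time is not part of POSIX struct stat. macOS and the BSDs expose st_birthtime; on Linux, filesystems that record it (ext4, xfs) report it only through statx() as stx_btime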
2.2 Why This Matters
Understanding file metadata is essential for:
- Backup software: Must preserve permissions, ownership, timestamps
- Security tools: Detect setuid binaries, world-writable files
- File managers: Display file information, size, type
- Build systems: Use mtime to detect changed files
- Forensics: Analyze file access patterns via timestamps
- Version control: Git uses inode info for efficient status checks
Real-world examples:
- find -perm -4000 finds setuid files (security audit)
- make compares source and object mtimes
- rsync uses size + mtime for quick change detection
- git status uses stat() to detect modified files
2.3 Historical Context
The inode concept dates to the original UNIX filesystem (1969-1971). Dennis Ritchie and Ken Thompson designed a system where:
- Filenames are just human-readable labels
- The real file identity is the inode number
- Multiple names can point to the same file (hard links)
- This enables efficient file operations
The stat structure has remained remarkably stable. Modern additions (nanosecond timestamps, 64-bit sizes) maintain backward compatibility. POSIX standardizes the core fields, so the same code works across UNIX-like systems with only minor variations.
2.4 Common Misconceptions
Misconception 1: “ctime is creation time”
- Reality: ctime is “change time” (metadata change). UNIX traditionally has no creation time.
Misconception 2: “Deleting a file removes it immediately”
- Reality: It removes a directory entry. The inode persists until all links and open file handles are gone.
Misconception 3: “File size equals disk usage”
- Reality: Sparse files can have large st_size but few st_blocks. Compression/dedup also affects this.
Misconception 4: “chmod changes mtime”
- Reality: chmod changes ctime (metadata change), not mtime (content change).
Misconception 5: “Symbolic links have the same permissions as their target”
- Reality: Symlink permissions (usually lrwxrwxrwx) are mostly ignored; target permissions matter.
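Misconception 3 is easy to check from the stat fields themselves. The sketch below compares the logical size with the allocated size; the helper names are hypothetical, and it assumes st_blocks counts 512-byte units, as the struct stat comments above describe. Treat looks_sparse as a heuristic only, since compression and deduplication can produce the same signature.

```c
#define _XOPEN_SOURCE 700   /* expose st_blocks under strict -std= modes */
#include <sys/stat.h>

/* Hypothetical helper: bytes actually allocated on disk.
 * Assumes st_blocks is in 512-byte units, independent of st_blksize. */
long long allocated_bytes(const struct stat *st)
{
    return (long long)st->st_blocks * 512;
}

/* Hypothetical heuristic: a regular file with fewer allocated bytes
 * than its logical size probably contains holes. */
int looks_sparse(const struct stat *st)
{
    return S_ISREG(st->st_mode) &&
           allocated_bytes(st) < (long long)st->st_size;
}
```

This is the same gap you see between ls -l (which reports st_size) and du (which reports allocated blocks).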
3. Project Specification
3.1 What You Will Build
A file information tool called mystat that:
- Displays all metadata from the stat structure
- Shows file type (regular, directory, symlink, device, etc.)
- Displays permissions in both symbolic (rwxr-xr-x) and octal (0755) formats
- Resolves UIDs/GIDs to usernames/group names
- Shows all three timestamps with nanosecond precision
- Handles symbolic links correctly (with -L flag to follow)
- Matches or exceeds the system stat command in detail
3.2 Functional Requirements
- Basic operation: mystat FILE displays complete file metadata
- Multiple files: mystat FILE1 FILE2 ... shows info for each
- Symbolic link handling: Default uses lstat(); -L follows links
- Permission display: Show both “-rwxr-xr-x” and “(0755)” formats
- Username lookup: Convert UIDs to usernames, GIDs to group names
- File type detection: Correctly identify all 7 UNIX file types
- Time formatting: Human-readable timestamps with nanoseconds
- Device numbers: For block/char devices, show major/minor numbers
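For the device-number requirement, the major() and minor() macros split a dev_t into its two parts. The following is a sketch under the assumption that the target is Linux (where glibc and musl declare the macros in <sys/sysmacros.h>) or a BSD-derived system such as macOS (where <sys/types.h> provides them); the helper name format_device is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#if defined(__linux__)
#include <sys/sysmacros.h>   /* major(), minor(), makedev() on Linux */
#endif

/* Hypothetical helper: writes "MAJOR,MINOR" for a device ID into buf.
 * Returns the value snprintf() returns. */
int format_device(dev_t dev, char *buf, size_t len)
{
    return snprintf(buf, len, "%u,%u",
                    (unsigned int)major(dev), (unsigned int)minor(dev));
}
```

Pass st_dev to describe the filesystem that holds the file, and st_rdev to describe the device that a block or character special file represents.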
3.3 Non-Functional Requirements
- Correctness: Output must match actual file metadata
- Error handling: Graceful handling of non-existent files, permission issues
- Portability: Should work on Linux and macOS (with minor variations)
- Performance: Should handle thousands of files efficiently
- Memory safety: No leaks, no buffer overflows
3.4 Example Usage / Output
# 1. Check a regular file
$ ./mystat /bin/ls
File: /bin/ls
Type: regular file
Size: 142144 bytes (139 KiB)
Blocks: 280 (of 512 bytes each)
IO Block: 4096
Device: 8,1 (major,minor)
Inode: 131073
Links: 1
Access: -rwxr-xr-x (0755)
Uid: root (0)
Gid: root (0)
Access Time: 2024-03-15 10:23:45.123456789 -0700
Modify Time: 2023-11-10 08:15:32.000000000 -0800
Change Time: 2024-01-02 14:30:00.123456789 -0800
# 2. Check a directory
$ ./mystat /tmp
File: /tmp
Type: directory
Size: 4096 bytes (4 KiB)
Blocks: 8 (of 512 bytes each)
IO Block: 4096
Device: 8,1 (major,minor)
Inode: 1048577
Links: 15
Access: drwxrwxrwt (1777) # Note the sticky bit!
Uid: root (0)
Gid: root (0)
Access Time: 2024-03-15 09:00:00.000000000 -0700
Modify Time: 2024-03-15 08:45:00.000000000 -0700
Change Time: 2024-03-15 08:45:00.000000000 -0700
# 3. Check a symbolic link (with lstat - default)
$ ./mystat /usr/bin/python3
File: /usr/bin/python3 -> python3.11
Type: symbolic link
Size: 10 bytes # Size of the link text itself
Link Target: python3.11
Inode: 524289
Links: 1
Access: lrwxrwxrwx (0777) # Symlink permissions (usually ignored)
Uid: root (0)
Gid: root (0)
# 4. Check a symbolic link (with -L flag - follows link)
$ ./mystat -L /usr/bin/python3
File: /usr/bin/python3 (-> python3.11)
Type: regular file
Size: 5479736 bytes (5.2 MiB)
...
# 5. Check a block device
$ ./mystat /dev/sda
File: /dev/sda
Type: block special
Size: 0 bytes
Device ID: 8,0 (major,minor) # This is st_rdev for devices
Inode: 123
Access: brw-rw---- (0660)
Uid: root (0)
Gid: disk (6)
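# 6. Check multiple files (as a non-root user; /root is usually not searchable)
# Note: stat() needs search permission on each directory in the path,
# not read permission on the file, so /etc/shadow itself can be stat'ed
$ ./mystat /etc/passwd /root/.bashrc
File: /etc/passwd
Type: regular file
...
File: /root/.bashrc
mystat: /root/.bashrc: Permission denied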
# 7. Check a FIFO (named pipe)
$ mkfifo /tmp/testpipe
$ ./mystat /tmp/testpipe
File: /tmp/testpipe
Type: FIFO (named pipe)
Access: prw-r--r-- (0644)
3.5 Real World Outcome
What success looks like:
- Complete metadata display: All stat fields shown correctly
- Type detection: All 7 file types properly identified
- Permission parsing: Both human-readable and octal formats
- Symlink handling: Correct behavior with and without -L
- Edge case handling: Devices, FIFOs, sockets all work
- Proper error messages: Permission denied, not found handled gracefully
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ mystat │
│ │
│ ┌─────────────────┐ │
│ │ Parse Args │ ← -L flag, file paths │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ For each file argument: │ │
│ │ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ stat()/lstat() │ ← Get struct stat │ │
│ │ └────────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Display Info │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ File Type │ │ Permissions │ │ │ │
│ │ │ │ S_ISREG? │ │ parse mode │ │ │ │
│ │ │ │ S_ISDIR? │ │ build string│ │ │ │
│ │ │ └─────────────┘ └─────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ User/Group │ │ Timestamps │ │ │ │
│ │ │ │ getpwuid() │ │ strftime() │ │ │ │
│ │ │ │ getgrgid() │ │ + nanosec │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ Size/Blocks │ │ Inode/Links │ │ │ │
│ │ │ │ human size │ │ st_ino │ │ │ │
│ │ │ │ st_blocks │ │ st_nlink │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ │ │ │
│ │ │ │ Symlink? │ ← readlink() if symlink │ │ │
│ │ │ │ show target │ │ │ │
│ │ │ └─────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Purpose | Key Functions |
|---|---|---|
| Argument Parser | Handle -L flag and file list | getopt() or manual |
| Stat Caller | Call stat() or lstat() based on flags | stat(), lstat() |
| Type Detector | Identify file type from st_mode | S_ISREG, S_ISDIR, etc. |
| Permission Formatter | Convert mode to “-rwxr-xr-x” string | Bit manipulation |
| User/Group Resolver | Convert UIDs/GIDs to names | getpwuid(), getgrgid() |
| Time Formatter | Format timestamps human-readably | strftime(), localtime() |
| Symlink Handler | Read and display symlink targets | readlink() |
4.3 Data Structures
// Configuration from command line
typedef struct {
    int follow_symlinks;   // -L flag
    int verbose;           // -v flag (extra detail)
    const char **files;    // Array of file paths
    int num_files;         // Number of files to process
} config_t;

// File information (wrapper around struct stat)
typedef struct {
    struct stat st;        // The raw stat structure
    const char *path;      // Original path argument
    char *link_target;     // Symlink target (if applicable)
    char *username;        // Resolved username
    char *groupname;       // Resolved group name
    char type_char;        // 'd', '-', 'l', 'c', 'b', 'p', 's'
    char perms[11];        // "-rwxr-xr-x" string
} file_info_t;
4.4 Algorithm Overview
FUNCTION display_file_info(path, follow_symlinks):
    // Step 1: Get file metadata
    IF follow_symlinks:
        result = stat(path, &st)
    ELSE:
        result = lstat(path, &st)
    IF result == -1:
        PRINT error message with errno
        RETURN

    // Step 2: Determine file type
    type_char = get_type_char(st.st_mode)
    type_name = get_type_name(st.st_mode)

    // Step 3: Build permission string
    perms = format_permissions(st.st_mode)

    // Step 4: Resolve user/group names
    pw = getpwuid(st.st_uid)
    username = pw ? pw->pw_name : numeric_string(st.st_uid)
    gr = getgrgid(st.st_gid)
    groupname = gr ? gr->gr_name : numeric_string(st.st_gid)

    // Step 5: Handle symlink target
    IF S_ISLNK(st.st_mode):
        target = readlink(path)

    // Step 6: Format timestamps
    atime_str = format_time(st.st_atime, st.st_atim.tv_nsec)
    mtime_str = format_time(st.st_mtime, st.st_mtim.tv_nsec)
    ctime_str = format_time(st.st_ctime, st.st_ctim.tv_nsec)

    // Step 7: Print all information
    PRINT all formatted fields
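Step 5 hides two details worth spelling out: readlink() does not NUL-terminate its result, and the buffer has to be sized by the caller. One possible sketch follows; the helper name read_link_target is hypothetical, and the PATH_MAX fallback is an assumption for filesystems that report a symlink st_size of 0.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() and readlink() under strict -std= modes */
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096       /* assumption: fallback when the platform omits it */
#endif

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the symlink
 * target, or NULL on error or when path is not a symlink. Caller frees. */
char *read_link_target(const char *path)
{
    struct stat st;
    if (lstat(path, &st) == -1 || !S_ISLNK(st.st_mode))
        return NULL;

    /* For a symlink, st_size is normally the length of the target text. */
    size_t cap = (st.st_size > 0 ? (size_t)st.st_size : PATH_MAX) + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    ssize_t n = readlink(path, buf, cap);
    if (n == -1 || (size_t)n >= cap) {   /* error, or result may be truncated */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';                        /* readlink() does not terminate */
    return buf;
}
```

The returned string is the raw link text, so a relative target such as python3.11 comes back exactly as stored, without being resolved against the link's directory.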
5. Implementation Guide
5.1 Development Environment Setup
# Create project directory
$ mkdir -p ~/projects/mystat
$ cd ~/projects/mystat
# Create test files of different types
$ touch regular_file.txt # Regular file
$ mkdir test_directory # Directory
$ ln -s regular_file.txt symlink # Symbolic link
$ ln regular_file.txt hardlink # Hard link
$ mkfifo fifo_pipe # Named pipe (FIFO)
# Check what devices exist (need root for some)
$ ls -la /dev/sda /dev/null /dev/tty
# Create socket for testing (requires netcat)
$ nc -lU /tmp/test.sock &
$ ls -la /tmp/test.sock
5.2 Project Structure
mystat/
├── mystat.c # Main source file
├── Makefile # Build configuration
├── regular_file.txt # Test: regular file
├── test_directory/ # Test: directory
├── symlink # Test: symbolic link
├── hardlink # Test: hard link
├── fifo_pipe # Test: FIFO
└── README.md # Documentation
5.3 The Core Question You’re Answering
“How does UNIX store and represent file metadata, and what’s the difference between a file’s data and its attributes?”
Every file has two parts: the data (contents) and the inode (metadata). Understanding the inode structure explains why hard links work, why mv is instant within a filesystem, and why permissions are per-file, not per-name.
5.4 Concepts You Must Understand First
Stop and research these before coding:
- The stat Structure
  - What fields does struct stat contain?
  - What’s the difference between stat() and lstat()?
  - Book Reference: “APUE” Ch. 4.2
- File Types in UNIX
  - Regular file, directory, character device, block device, FIFO, socket, symbolic link
  - How are these encoded in st_mode?
  - Book Reference: “APUE” Ch. 4.3
- Permission Bits
  - What do r, w, x mean for files vs directories?
  - What are setuid, setgid, and sticky bits?
  - Book Reference: “APUE” Ch. 4.5-4.6
- The Three Timestamps
  - Access time (atime), modification time (mtime), change time (ctime)
  - Which operations update which timestamps?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Output Format
  - Should you match the stat command format or design your own?
  - How to display times: Unix epoch or human-readable?
  - Should sizes be in bytes only, or also human-readable (KiB, MiB)?
- Symlink Handling
  - Should you follow symlinks by default or not?
  - How to show both link info and target info?
  - What if the symlink target doesn’t exist (dangling link)?
- Error Messages
  - How to handle permission denied?
  - How to handle non-existent files?
  - Should one bad file stop processing of subsequent files?
5.6 Thinking Exercise
Decode a Mode Value
Given st_mode = 0100755 (octal), decode it:
st_mode bits:
┌───────────────┬────────┬────────┬────────┬──────┬───────┬───────┐
│ File Type     │ SetUID │ SetGID │ Sticky │ User │ Group │ Other │
│ (4 bits)      │   S    │   S    │   T    │ rwx  │  rwx  │  rwx  │
├───────────────┼────────┼────────┼────────┼──────┼───────┼───────┤
│ 1000          │   0    │   0    │   0    │  7   │   5   │   5   │
│ (regular)     │  (no)  │  (no)  │  (no)  │ rwx  │  r-x  │  r-x  │
└───────────────┴────────┴────────┴────────┴──────┴───────┴───────┘
Result: -rwxr-xr-x
S_ISREG(mode) → true (0100000 = regular file)
mode & S_IRWXU → 0700 (user: rwx)
mode & S_IRWXG → 0050 (group: r-x)
mode & S_IRWXO → 0005 (other: r-x)
Exercise Questions:
- What would st_mode = 040755 represent? (Answer: directory, drwxr-xr-x)
- What about st_mode = 0104755? (Answer: regular file with setuid, -rwsr-xr-x)
- What about st_mode = 0041777? (Answer: directory with sticky bit, drwxrwxrwt)
5.7 Hints in Layers
Hint 1: Getting Started
Call stat() or lstat() on the file path. Check the return value for errors.
struct stat st;
if (lstat(path, &st) == -1) {
    perror(path);
    return -1;
}
Hint 2: File Type Detection
Use the macros: S_ISREG(mode), S_ISDIR(mode), S_ISLNK(mode), etc.
char get_type_char(mode_t mode) {
    if (S_ISREG(mode))  return '-';
    if (S_ISDIR(mode))  return 'd';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISFIFO(mode)) return 'p';
    if (S_ISSOCK(mode)) return 's';
    return '?';
}
Hint 3: Permission String
// Build the permission string, e.g. "-rwxr-xr-x".
// perms must have room for at least 11 bytes (10 characters plus '\0').
void format_permissions(mode_t mode, char *perms) {
    perms[0] = get_type_char(mode);

    perms[1] = (mode & S_IRUSR) ? 'r' : '-';
    perms[2] = (mode & S_IWUSR) ? 'w' : '-';
    perms[3] = (mode & S_IXUSR) ? 'x' : '-';
    // setuid replaces the user x slot: 's' if x is also set, 'S' if not
    if (mode & S_ISUID)
        perms[3] = (mode & S_IXUSR) ? 's' : 'S';

    perms[4] = (mode & S_IRGRP) ? 'r' : '-';
    perms[5] = (mode & S_IWGRP) ? 'w' : '-';
    perms[6] = (mode & S_IXGRP) ? 'x' : '-';
    // setgid replaces the group x slot: 's' if x is also set, 'S' if not
    if (mode & S_ISGID)
        perms[6] = (mode & S_IXGRP) ? 's' : 'S';

    perms[7] = (mode & S_IROTH) ? 'r' : '-';
    perms[8] = (mode & S_IWOTH) ? 'w' : '-';
    perms[9] = (mode & S_IXOTH) ? 'x' : '-';
    // sticky replaces the other x slot: 't' if x is also set, 'T' if not
    if (mode & S_ISVTX)
        perms[9] = (mode & S_IXOTH) ? 't' : 'T';

    perms[10] = '\0';
}
Hint 4: Username Lookup
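Use getpwuid(st.st_uid) to get the username from the UID. It returns NULL if no matching entry is found (or if the lookup itself fails), so always keep a numeric fallback. Because uid_t is an unsigned type of platform-dependent width, cast it before printing.
// Requires <pwd.h> for getpwuid() and <stdio.h> for printf()
struct passwd *pw = getpwuid(st.st_uid);
if (pw) {
    printf("Uid: %s (%lu)\n", pw->pw_name, (unsigned long)st.st_uid);
} else {
    printf("Uid: %lu\n", (unsigned long)st.st_uid);
}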
5.8 The Interview Questions They’ll Ask
- “What’s the difference between stat() and lstat()?”
  - stat() follows symbolic links, lstat() doesn’t
  - For symlinks: stat() returns target info, lstat() returns link info
- “How are hard links implemented at the filesystem level?”
  - Multiple directory entries pointing to the same inode
  - st_nlink shows the link count
  - File is deleted when link count reaches 0 AND no open handles remain
- “Why does rm on a file sometimes not free disk space?”
  - Another hard link may exist
  - A process may still have the file open
  - Space is freed when all links AND handles are gone
- “What’s the sticky bit and when would you use it?”
  - On directories: only the file’s owner, the directory’s owner, or root can delete or rename an entry
  - Used on /tmp to prevent users from deleting each other’s files
  - Shows as ‘t’ or ‘T’ in permission string
- “How would you find all setuid programs on a system?”
  - find / -perm -4000 -type f
  - Check st_mode & S_ISUID
  - Important for security audits
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| File metadata | “APUE” by Stevens | Ch. 4 |
| Filesystem structure | “The Linux Programming Interface” | Ch. 14-18 |
| Inode details | “Understanding the Linux Kernel” | Ch. 12 |
| Permission system | “The Linux Programming Interface” | Ch. 15 |
5.10 Implementation Phases
Phase 1: Basic stat() (2-3 hours)
- Accept single file argument
- Call stat() and print raw values
- Handle stat() errors
Phase 2: File Type Detection (2-3 hours)
- Implement type detection with S_IS* macros
- Add type name display
- Test with various file types
Phase 3: Permission Formatting (2-3 hours)
- Build permission string from st_mode
- Handle setuid, setgid, sticky bits
- Show both symbolic and octal formats
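For the octal half of Phase 3, masking st_mode with 07777 keeps the nine permission bits plus setuid, setgid, and sticky while dropping the file type. A minimal sketch, assuming the four-digit style used in the sample output ("0755", "1777"); the helper name format_octal_mode is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes the permission and special bits as four
 * octal digits, e.g. "0755". Returns the value snprintf() returns. */
int format_octal_mode(mode_t mode, char *buf, size_t len)
{
    /* 07777 = special bits (07000) + rwx for user, group, other (0777) */
    return snprintf(buf, len, "%04o", (unsigned int)(mode & 07777));
}
```

Printing the unmasked st_mode with %o instead would include the type bits, giving values like 100755 that do not match what chmod accepts.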
Phase 4: User/Group Resolution (1-2 hours)
- Use getpwuid() and getgrgid()
- Handle missing users/groups gracefully
Phase 5: Symlink Handling (2-3 hours)
- Implement lstat() as default
- Add -L flag for stat()
- Use readlink() to get symlink target
Phase 6: Timestamps and Polish (2-3 hours)
- Format timestamps with strftime()
- Add nanosecond precision
- Handle timezone display
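The three Phase 6 steps can be combined in one function. strftime() has no conversion for nanoseconds, so the sketch below formats the date and the zone offset separately and splices tv_nsec in between. The helper name format_timestamp is hypothetical, and the layout is an assumption chosen to resemble the sample output (for example 2024-03-15 10:23:45.123456789 -0700).

```c
#define _XOPEN_SOURCE 700   /* expose localtime_r() under strict -std= modes */
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats ts as "YYYY-MM-DD HH:MM:SS.nnnnnnnnn +ZZZZ"
 * in the local time zone. Returns 0 on success, -1 if the time cannot be
 * converted or buf is too small. */
int format_timestamp(const struct timespec *ts, char *buf, size_t len)
{
    struct tm tm;
    char date[32];   /* e.g. "2024-03-15 10:23:45" */
    char zone[16];   /* e.g. "-0700" */

    if (localtime_r(&ts->tv_sec, &tm) == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &tm) == 0 ||
        strftime(zone, sizeof zone, "%z", &tm) == 0)
        return -1;

    /* %09ld zero-pads the nanoseconds to exactly nine digits */
    int n = snprintf(buf, len, "%s.%09ld %s", date, (long)ts->tv_nsec, zone);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Taking a struct timespec keeps the function independent of the field-name difference between platforms: on Linux you would pass &st.st_mtim, while macOS names the same member st_mtimespec.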
5.11 Key Implementation Decisions
| Decision | Trade-offs |
|---|---|
| lstat() vs stat() default | lstat() is safer (don’t follow links by accident) |
| Output format | Match stat command for familiarity, or custom for readability |
| Nanosecond timestamps | Use st_atim.tv_nsec (POSIX.1-2008, Linux) or st_atimespec.tv_nsec (macOS, older BSDs) |
| Error handling | Continue on errors or stop? Usually continue for multiple files |
| Human-readable sizes | Calculate KiB/MiB/GiB from st_size |
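For the human-readable size decision, one possible approach is to divide by 1024 until the value drops below 1024 and then pick a unit. The rounding rule in this sketch (whole numbers for KiB, one decimal from MiB upward) is an assumption chosen to reproduce the sample output, such as 139 KiB and 5.2 MiB; the helper name format_size is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes a binary-prefixed size such as "139 KiB"
 * or "5.2 MiB" into buf. Returns the value snprintf() returns. */
int format_size(off_t size, char *buf, size_t len)
{
    static const char *units[] = { "KiB", "MiB", "GiB", "TiB" };
    double value = (double)size;
    int unit = -1;                     /* -1 means plain bytes */

    while (value >= 1024.0 && unit < 3) {
        value /= 1024.0;
        unit++;
    }
    if (unit < 0)
        return snprintf(buf, len, "%lld B", (long long)size);
    if (unit == 0)
        return snprintf(buf, len, "%.0f KiB", value);
    return snprintf(buf, len, "%.1f %s", value, units[unit]);
}
```

Always print the exact byte count next to the rounded value, as the sample output does, since the rounded form loses information.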
6. Testing Strategy
6.1 Unit Tests
| Test | Input | Expected Result |
|---|---|---|
| type_char for regular file | 0100644 | '-' |
| type_char for directory | 0040755 | 'd' |
| type_char for symlink | 0120777 | 'l' |
| perms for 0755 | 0100755 | “-rwxr-xr-x” |
| perms for 4755 | 0104755 | “-rwsr-xr-x” |
| perms for 1777 | 0041777 | “drwxrwxrwt” |
6.2 Integration Tests
# Test regular file
$ ./mystat /bin/ls
# Should show: Type: regular file, permissions, etc.
# Test directory
$ ./mystat /tmp
# Should show: Type: directory, sticky bit if set
# Test symbolic link without -L
$ ln -s /etc/passwd testlink
$ ./mystat testlink
# Should show: Type: symbolic link, link target
# Test symbolic link with -L
$ ./mystat -L testlink
# Should show target file info
# Compare with system stat
$ stat /bin/ls
$ ./mystat /bin/ls
# Major fields should match
6.3 Edge Cases to Test
| Case | Command | Expected Behavior |
|---|---|---|
| Non-existent file | ./mystat nosuchfile | Error: No such file |
| Permission denied | ./mystat /root/.bashrc (as a non-root user) | Error: Permission denied |
| Dangling symlink | ln -s /nonexistent broken && ./mystat broken | Show link info, note broken |
| Device files | ./mystat /dev/null | Show character special, 1,3 |
| Setuid binary | ./mystat /bin/su | Show ‘s’ in permissions |
| Socket | ./mystat /var/run/docker.sock | Show socket type |
| Multiple files | ./mystat /etc/passwd /root/.bashrc | Show the first, error on the second |
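The dangling-symlink case in the table can be detected by combining both calls: lstat() succeeds on the link itself, while stat() fails with ENOENT because nothing exists at the other end. A minimal sketch, with a hypothetical helper name; treating only ENOENT as "dangling" is an assumption, and other failures such as ELOOP are deliberately not counted.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() under strict -std= modes */
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path is a symlink whose target does not
 * exist, 0 if it is not a symlink or the target resolves, -1 if lstat() fails. */
int is_dangling_symlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return -1;
    if (!S_ISLNK(st.st_mode))
        return 0;

    /* stat() follows the link; ENOENT means the target is missing. */
    return stat(path, &st) == -1 && errno == ENOENT;
}
```

A tool running without -L can use this to append a note such as "(broken)" after the link target instead of failing.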
6.4 Verification Commands
# Compare output with system stat
$ stat --printf='Type: %F\nSize: %s\nInode: %i\nLinks: %h\n' /bin/ls   # --printf interprets \n; --format does not
$ ./mystat /bin/ls | grep -E 'Type|Size|Inode|Links'
# Verify permission parsing
$ ls -la /bin/ls
$ ./mystat /bin/ls | grep Access
# Check symlink handling
$ stat -L /usr/bin/python3 # follows link
$ stat /usr/bin/python3 # doesn't follow
$ ./mystat -L /usr/bin/python3
$ ./mystat /usr/bin/python3
# Test with valgrind
$ valgrind --leak-check=full ./mystat /bin/ls
7. Common Pitfalls & Debugging
Problem 1: “Always shows ‘regular file’ for symlinks”
- Why: Using stat() instead of lstat()
- Fix: Use lstat() by default to get info about the link itself
Problem 2: “Permission bits wrong for special files”
- Why: Not handling setuid/setgid/sticky in permission string
- Fix: Check S_ISUID, S_ISGID, S_ISVTX and modify x to s/S or t/T
Problem 3: “getpwuid returns NULL”
- Why: User doesn’t exist (e.g., file from another system)
- Fix: Display numeric UID when lookup fails
Problem 4: “Timestamps show wrong timezone”
- Why: Using gmtime() instead of localtime()
- Fix: Use localtime() for local timezone display
Problem 5: “Size shows wrong for block devices”
- Why: st_size is 0 for devices; actual size requires ioctl
- Fix: Note that st_size is not meaningful for devices
Problem 6: “Nanoseconds not showing”
- Why: Using st_atime (seconds only) instead of st_atim.tv_nsec
- Fix: Access the timespec structure for nanoseconds
8. Extensions & Challenges
8.1 Easy Extensions
| Extension | Description | Learning |
|---|---|---|
| JSON output | -j flag for JSON format | Structured output |
| Human sizes | Show “4.5 MiB” instead of bytes | Size formatting |
| Recursive | -R for directory contents | Directory iteration |
| Minimal output | -s for single-line summary | Output formatting |
8.2 Advanced Challenges
| Challenge | Description | Learning |
|---|---|---|
| ACL support | Show Access Control Lists | getfacl equivalent |
| Extended attrs | Show xattrs | getxattr() |
| SELinux context | Show security context | lgetfilecon() |
| File capabilities | Show capabilities | libcap |
| btrfs info | Show subvolume, compression | ioctl() |
8.3 Research Topics
- How do different filesystems store metadata? (ext4 vs xfs vs btrfs)
- What is the birth time (crtime) and which filesystems support it?
- How does NFS handle stat() across the network?
- What information is lost when copying files between filesystems?
9. Real-World Connections
9.1 Production Systems Using This
| System | How It Uses stat() | Notable Feature |
|---|---|---|
| ls | Lists file metadata | Sorts by size/time |
| find | Searches by metadata | Permission/time filters |
| rsync | Compares files | Size+mtime optimization |
| git | Tracks file changes | Uses stat() for status |
| make | Dependency checking | Compares mtimes |
| tar/cpio | Archive metadata | Preserves permissions |
9.2 How the Pros Do It
GNU coreutils stat:
- Uses statx() on modern Linux for extended info
- Supports multiple output formats
- Handles SELinux contexts
rsync:
- Uses stat() to quickly compare files
- Falls back to checksums when timestamps unreliable
- Preserves all metadata with -a flag
9.3 Reading the Source
# View GNU coreutils stat source
$ git clone https://github.com/coreutils/coreutils
$ less coreutils/src/stat.c
# View FreeBSD stat source
$ less /usr/src/usr.bin/stat/stat.c
# Key functions to study:
# - struct stat field access
# - Permission string formatting
# - Time formatting
10. Resources
10.1 Man Pages
$ man 2 stat # stat() system call
$ man 2 lstat # lstat() system call
$ man 2 fstat # fstat() on file descriptor
$ man 7 inode # Inode structure details
$ man 3 getpwuid # User ID to name
$ man 3 getgrgid # Group ID to name
$ man 2 readlink # Read symlink target (system call)
$ man 3 strftime # Time formatting
10.2 Online Resources
- Linux man pages - stat(2)
- The Linux Programming Interface - File Attributes
- POSIX stat specification
10.3 Book Chapters
| Book | Chapters | Topics Covered |
|---|---|---|
| “APUE” by Stevens | Ch. 4 | Files and Directories |
| “TLPI” by Kerrisk | Ch. 14-18 | File Systems, File Attributes |
| “Linux System Programming” by Love | Ch. 7 | File and Directory Management |
11. Self-Assessment Checklist
Before considering this project complete, verify:
- I can explain what an inode is and what it contains
- I understand the difference between stat() and lstat()
- I can decode st_mode to determine file type and permissions
- I know what the setuid, setgid, and sticky bits do
- I can explain the three timestamps and what updates each
- My tool correctly identifies all 7 UNIX file types
- My permission string handles special bits correctly (s, S, t, T)
- I handle non-existent users/groups gracefully
- Symbolic links are handled correctly with and without -L
- I can answer all the interview questions listed above
- My code compiles without warnings and has no memory leaks
12. Submission / Completion Criteria
Your project is complete when:
- Functionality
  - ./mystat FILE displays complete file metadata
  - All 7 file types correctly identified
  - Permission string matches ls -l output
  - Timestamps show with nanosecond precision
- Quality
  - Compiles with gcc -Wall -Wextra -Werror with no warnings
  - Zero valgrind errors
  - Graceful error handling
- Testing
  - Works with regular files, directories, symlinks
  - Works with devices, FIFOs, sockets
  - Handles permission errors gracefully
  - -L flag correctly follows symlinks
- Understanding
  - Can explain each field of struct stat
  - Can decode any permission mode by hand
  - Understands inode concept thoroughly
Next Steps
After completing this project, you’re ready for:
- Project 3: Directory Walker - Apply stat() recursively
- Project 4: Shell Implementation - Use stat() for command validation
- Project 5: Process Monitor - Apply these concepts to /proc filesystem
The understanding of file metadata you’ve gained here is essential for backup software, security tools, and file management applications.