Project 2: Advanced File Information Tool

Build a comprehensive file information tool that displays all metadata available from the stat structure—permissions, ownership, timestamps, inode numbers, block counts, and file type.

Quick Reference

Attribute       Value
Difficulty      Level 2 - Intermediate
Time Estimate   10-20 hours
Language        C (primary), Rust/Go (alternatives)
Prerequisites   Project 1 (File Copy), C programming, basic UNIX
Key Topics      File Metadata, stat/lstat, Permissions, Inodes, File Types

1. Learning Objectives

After completing this project, you will:

  • Understand the complete struct stat and what each field represents
  • Master the difference between stat() and lstat() for symbolic links
  • Parse and display file permission bits including special bits (setuid, setgid, sticky)
  • Distinguish all UNIX file types (regular, directory, symlink, device, FIFO, socket)
  • Convert between numeric UIDs/GIDs and human-readable usernames/groups
  • Understand inode structure and what metadata filesystems actually store
  • Work with UNIX timestamps and nanosecond precision
  • Understand the relationship between hard links, inodes, and filenames

2. Theoretical Foundation

2.1 Core Concepts

The Inode: Heart of UNIX Filesystems

Every file in a UNIX filesystem has an inode (index node) that stores its metadata. The filename is NOT stored in the inode—it’s stored in the directory.

Directory Entry                          Inode (on disk)
┌───────────────────────────┐           ┌─────────────────────────────┐
│ filename: "myfile.txt"    │           │ st_mode: 0100644            │
│ inode:    12345      ─────┼──────────▶│ st_nlink: 2                 │
└───────────────────────────┘           │ st_uid: 1000                │
                                        │ st_gid: 1000                │
Directory Entry (hard link)             │ st_size: 4096               │
┌───────────────────────────┐           │ st_atime: 1710523456        │
│ filename: "hardlink.txt"  │           │ st_mtime: 1710523400        │
│ inode:    12345      ─────┼──────────▶│ st_ctime: 1710523400        │
└───────────────────────────┘           │ st_blocks: 8                │
                                        │ st_blksize: 4096            │
SAME INODE, different names!            │ data block pointers...      │
(st_nlink = 2)                          └─────────────────────────────┘

This design explains:

  • Hard links: Multiple directory entries pointing to same inode
  • Why rm doesn’t always free space: Another link may exist
  • Why mv within filesystem is instant: Just change directory entry
  • Why permissions are per-file, not per-name: Stored in inode

The stat Structure

struct stat {
    dev_t     st_dev;      // ID of device containing file
    ino_t     st_ino;      // Inode number
    mode_t    st_mode;     // File type and permissions
    nlink_t   st_nlink;    // Number of hard links
    uid_t     st_uid;      // User ID of owner
    gid_t     st_gid;      // Group ID of owner
    dev_t     st_rdev;     // Device ID (if special file)
    off_t     st_size;     // Total size in bytes
    blksize_t st_blksize;  // Block size for filesystem I/O
    blkcnt_t  st_blocks;   // Number of 512-byte blocks allocated
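    struct timespec st_atim;  // Time of last access (st_atime is st_atim.tv_sec)
    struct timespec st_mtim;  // Time of last modification (st_mtime is st_mtim.tv_sec)
    struct timespec st_ctim;  // Time of last status change (st_ctime is st_ctim.tv_sec)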
};

File Type and Permission Encoding (st_mode)

The st_mode field packs file type AND permissions into a single value. The traditional layout uses the low 16 bits (mode_t itself is often wider):

st_mode bit layout (16 bits):
┌────────────────────────────────────────────────────────────────┐
│ Bits 15-12 │ Bit 11 │ Bit 10 │ Bit 9  │ Bits 8-6 │ Bits 5-3 │ Bits 2-0 │
├────────────┼────────┼────────┼────────┼──────────┼──────────┼──────────┤
│ File Type  │ SetUID │ SetGID │ Sticky │   User   │  Group   │  Other   │
│  (4 bits)  │        │        │        │   rwx    │   rwx    │   rwx    │
└────────────┴────────┴────────┴────────┴──────────┴──────────┴──────────┘

File Type Values (bits 15-12):
  S_IFREG  = 0100000  (regular file)     '-'
  S_IFDIR  = 0040000  (directory)        'd'
  S_IFLNK  = 0120000  (symbolic link)    'l'
  S_IFCHR  = 0020000  (character device) 'c'
  S_IFBLK  = 0060000  (block device)     'b'
  S_IFIFO  = 0010000  (FIFO/named pipe)  'p'
  S_IFSOCK = 0140000  (socket)           's'

Example: st_mode = 0100755 (octal)
  ┌───────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
  │ 1000 (file)   │   0   │   0   │   0   │  111  │  101  │  101  │
  │ S_IFREG       │ no    │ no    │ no    │  rwx  │  r-x  │  r-x  │
  └───────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
  Display: -rwxr-xr-x

Example: st_mode = 0041777 (octal) - /tmp directory
  ┌───────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
  │ 0100 (dir)    │   0   │   0   │   1   │  111  │  111  │  111  │
  │ S_IFDIR       │ no    │ no    │ sticky│  rwx  │  rwx  │  rwx  │
  └───────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
  Display: drwxrwxrwt  (note the 't' for sticky bit)

stat() vs lstat()

Regular file:
stat("/path/to/file")   →  Returns file's metadata
lstat("/path/to/file")  →  Returns file's metadata (same)

Symbolic link:
stat("/path/to/link")   →  Returns TARGET's metadata (follows link)
lstat("/path/to/link")  →  Returns LINK's metadata (doesn't follow)

                     lstat()                      stat()
                        │                            │
                        ▼                            │
    ┌────────────────────────────────┐               │
    │ symlink: /path/to/link         │               │
    │   → points to: /real/target    │───────────────┘
    │   size: 12 (length of target)  │               │
    │   type: S_IFLNK                │               │
    └────────────────────────────────┘               │
                                                     ▼
                                     ┌────────────────────────────────┐
                                     │ file: /real/target             │
                                     │   size: 4096 (actual content)  │
                                     │   type: S_IFREG                │
                                     └────────────────────────────────┘

The Three Timestamps

┌─────────────┬──────────────────────────────────────────────────────────┐
│  Timestamp  │  What It Represents / When It's Updated                  │
├─────────────┼──────────────────────────────────────────────────────────┤
│  st_atime   │  Access time - last time file data was read             │
│  (atime)    │  Updated by: read(), mmap(), open()+read                 │
│             │  Note: Many systems use "noatime" or "relatime" to      │
│             │  reduce disk writes                                      │
├─────────────┼──────────────────────────────────────────────────────────┤
│  st_mtime   │  Modification time - last time file content changed     │
│  (mtime)    │  Updated by: write(), truncate()                        │
│             │  This is what `ls -l` shows by default                  │
├─────────────┼──────────────────────────────────────────────────────────┤
│  st_ctime   │  Change time - last time inode metadata changed         │
│  (ctime)    │  Updated by: chmod(), chown(), rename(), link()         │
│             │  Also updated when mtime changes                         │
│             │  Note: NOT creation time! UNIX has no creation time*    │
└─────────────┴──────────────────────────────────────────────────────────┘

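* Some filesystems (ext4, XFS, Btrfs, APFS) also record a birth (creation) time. macOS and the BSDs expose it as st_birthtime in struct stat; Linux exposes it through statx() as stx_btime.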

2.2 Why This Matters

Understanding file metadata is essential for:

  • Backup software: Must preserve permissions, ownership, timestamps
  • Security tools: Detect setuid binaries, world-writable files
  • File managers: Display file information, size, type
  • Build systems: Use mtime to detect changed files
  • Forensics: Analyze file access patterns via timestamps
  • Version control: Git uses inode info for efficient status checks

Real-world examples:

  • find -perm -4000 finds setuid files (security audit)
  • make compares source and object mtimes
  • rsync uses size + mtime for quick change detection
  • git status uses stat() to detect modified files

2.3 Historical Context

The inode concept dates to the original UNIX filesystem (1969-1971). Dennis Ritchie and Ken Thompson designed a system where:

  • Filenames are just human-readable labels
  • The real file identity is the inode number
  • Multiple names can point to the same file (hard links)
  • This enables efficient file operations

The stat structure has remained remarkably stable. Modern additions (nanosecond timestamps, 64-bit sizes) maintain backward compatibility, and POSIX standardizes the core fields, so the same code works across UNIX-like systems with only minor variations (for example, the names of the nanosecond fields).

2.4 Common Misconceptions

Misconception 1: “ctime is creation time”

  • Reality: ctime is “change time” (metadata change). UNIX traditionally has no creation time.

Misconception 2: “Deleting a file removes it immediately”

  • Reality: It removes a directory entry. The inode persists until all links and open file handles are gone.

Misconception 3: “File size equals disk usage”

  • Reality: Sparse files can have large st_size but few st_blocks. Compression/dedup also affects this.

Misconception 4: “chmod changes mtime”

  • Reality: chmod changes ctime (metadata change), not mtime (content change).

Misconception 5: “Symbolic links have the same permissions as their target”

  • Reality: Symlink permissions (usually lrwxrwxrwx) are mostly ignored; target permissions matter.
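Misconception 3 can be made concrete by comparing st_size with the space st_blocks accounts for. This is a sketch with hypothetical helper names; the "sparse" test is only a heuristic, since compression or deduplication can also make allocated space smaller than the logical size.

```c
#include <sys/stat.h>

/* Bytes actually allocated on disk: st_blocks counts 512-byte units,
 * regardless of the filesystem's own block size (st_blksize). */
long long allocated_bytes(const struct stat *st)
{
    return (long long)st->st_blocks * 512;
}

/* Heuristic: fewer bytes allocated than the logical size suggests holes. */
int is_sparse(const struct stat *st)
{
    return allocated_bytes(st) < (long long)st->st_size;
}
```

A file created with `truncate -s 1M hole.bin` typically reports st_size = 1048576 with very few blocks allocated, so is_sparse() returns 1 on filesystems that support holes.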

3. Project Specification

3.1 What You Will Build

A file information tool called mystat that:

  • Displays all metadata from the stat structure
  • Shows file type (regular, directory, symlink, device, etc.)
  • Displays permissions in both symbolic (rwxr-xr-x) and octal (0755) formats
  • Resolves UIDs/GIDs to usernames/group names
  • Shows all three timestamps with nanosecond precision
  • Handles symbolic links correctly (with -L flag to follow)
  • Matches or exceeds the system stat command in detail

3.2 Functional Requirements

  1. Basic operation: mystat FILE displays complete file metadata
  2. Multiple files: mystat FILE1 FILE2 ... shows info for each
  3. Symbolic link handling: Default uses lstat(); -L follows links
  4. Permission display: Show both “-rwxr-xr-x” and “(0755)” formats
  5. Username lookup: Convert UIDs to usernames, GIDs to group names
  6. File type detection: Correctly identify all 7 UNIX file types
  7. Time formatting: Human-readable timestamps with nanoseconds
  8. Device numbers: For block/char devices, show major/minor numbers
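For requirement 8, the major and minor numbers come from st_rdev via the major() and minor() macros. The sketch below assumes glibc on Linux, where they live in <sys/sysmacros.h>; on macOS and the BSDs they are provided by <sys/types.h>. The function name format_device_id() is illustrative.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#ifdef __linux__
#include <sys/sysmacros.h>  /* major(), minor() on glibc */
#endif

/* Writes "major,minor" for a device node into buf.
 * Returns 0 on success, -1 if path is not a block/char device or on error. */
int format_device_id(const char *path, char *buf, size_t buflen)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;
    /* st_rdev identifies the device the node represents;
     * st_dev would be the filesystem that holds the node itself. */
    snprintf(buf, buflen, "%u,%u",
             (unsigned)major(st.st_rdev), (unsigned)minor(st.st_rdev));
    return 0;
}
```

On Linux, calling this on /dev/null should produce "1,3".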

3.3 Non-Functional Requirements

  1. Correctness: Output must match actual file metadata
  2. Error handling: Graceful handling of non-existent files, permission issues
  3. Portability: Should work on Linux and macOS (with minor variations)
  4. Performance: Should handle thousands of files efficiently
  5. Memory safety: No leaks, no buffer overflows

3.4 Example Usage / Output

# 1. Check a regular file
$ ./mystat /bin/ls
File: /bin/ls
Type: regular file
Size: 142144 bytes (139 KiB)
Blocks: 280 (of 512 bytes each)
IO Block: 4096
Device: 8,1 (major,minor)
Inode: 131073
Links: 1
Access: -rwxr-xr-x (0755)
Uid: root (0)
Gid: root (0)
Access Time: 2024-03-15 10:23:45.123456789 -0700
Modify Time: 2023-11-10 08:15:32.000000000 -0800
Change Time: 2024-01-02 14:30:00.123456789 -0800

# 2. Check a directory
$ ./mystat /tmp
File: /tmp
Type: directory
Size: 4096 bytes (4 KiB)
Blocks: 8 (of 512 bytes each)
IO Block: 4096
Device: 8,1 (major,minor)
Inode: 1048577
Links: 15
Access: drwxrwxrwt (1777)  # Note the sticky bit!
Uid: root (0)
Gid: root (0)
Access Time: 2024-03-15 09:00:00.000000000 -0700
Modify Time: 2024-03-15 08:45:00.000000000 -0700
Change Time: 2024-03-15 08:45:00.000000000 -0700

# 3. Check a symbolic link (with lstat - default)
$ ./mystat /usr/bin/python3
File: /usr/bin/python3 -> python3.11
Type: symbolic link
Size: 10 bytes  # Size of the link text itself
Link Target: python3.11
Inode: 524289
Links: 1
Access: lrwxrwxrwx (0777)  # Symlink permissions (usually ignored)
Uid: root (0)
Gid: root (0)

# 4. Check a symbolic link (with -L flag - follows link)
$ ./mystat -L /usr/bin/python3
File: /usr/bin/python3 (-> python3.11)
Type: regular file
Size: 5479736 bytes (5.2 MiB)
...

# 5. Check a block device
$ ./mystat /dev/sda
File: /dev/sda
Type: block special
Size: 0 bytes
Device ID: 8,0 (major,minor)  # This is st_rdev for devices
Inode: 123
Access: brw-rw---- (0660)
Uid: root (0)
Gid: disk (6)

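# 6. Check multiple files (run as a non-root user)
$ ./mystat /etc/passwd /root/.bashrc
File: /etc/passwd
Type: regular file
...

mystat: /root/.bashrc: Permission denied
# stat() needs search permission on every directory in the path, not read
# permission on the file, so /etc/shadow itself can still be stat'ed.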

# 7. Check a FIFO (named pipe)
$ mkfifo /tmp/testpipe
$ ./mystat /tmp/testpipe
File: /tmp/testpipe
Type: FIFO (named pipe)
Access: prw-r--r-- (0644)

3.5 Real World Outcome

What success looks like:

  1. Complete metadata display: All stat fields shown correctly
  2. Type detection: All 7 file types properly identified
  3. Permission parsing: Both human-readable and octal formats
  4. Symlink handling: Correct behavior with and without -L
  5. Edge case handling: Devices, FIFOs, sockets all work
  6. Proper error messages: Permission denied, not found handled gracefully

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                          mystat                                  │
│                                                                  │
│  ┌─────────────────┐                                            │
│  │  Parse Args     │ ← -L flag, file paths                      │
│  └────────┬────────┘                                            │
│           │                                                      │
│           ▼                                                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              For each file argument:                     │    │
│  │                                                          │    │
│  │  ┌─────────────────┐                                    │    │
│  │  │ stat()/lstat()  │ ← Get struct stat                  │    │
│  │  └────────┬────────┘                                    │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌─────────────────────────────────────────────────┐    │    │
│  │  │                  Display Info                    │    │    │
│  │  │                                                  │    │    │
│  │  │  ┌─────────────┐  ┌─────────────┐              │    │    │
│  │  │  │ File Type   │  │ Permissions │              │    │    │
│  │  │  │ S_ISREG?    │  │ parse mode  │              │    │    │
│  │  │  │ S_ISDIR?    │  │ build string│              │    │    │
│  │  │  └─────────────┘  └─────────────┘              │    │    │
│  │  │                                                  │    │    │
│  │  │  ┌─────────────┐  ┌─────────────┐              │    │    │
│  │  │  │ User/Group  │  │ Timestamps  │              │    │    │
│  │  │  │ getpwuid()  │  │ strftime()  │              │    │    │
│  │  │  │ getgrgid()  │  │ + nanosec   │              │    │    │
│  │  │  └─────────────┘  └─────────────┘              │    │    │
│  │  │                                                  │    │    │
│  │  │  ┌─────────────┐  ┌─────────────┐              │    │    │
│  │  │  │ Size/Blocks │  │ Inode/Links │              │    │    │
│  │  │  │ human size  │  │ st_ino      │              │    │    │
│  │  │  │ st_blocks   │  │ st_nlink    │              │    │    │
│  │  │  └─────────────┘  └─────────────┘              │    │    │
│  │  │                                                  │    │    │
│  │  │  ┌─────────────┐                                │    │    │
│  │  │  │ Symlink?    │ ← readlink() if symlink        │    │    │
│  │  │  │ show target │                                │    │    │
│  │  │  └─────────────┘                                │    │    │
│  │  └──────────────────────────────────────────────────┘    │    │
│  │                                                          │    │
│  └──────────────────────────────────────────────────────────┘    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component             Purpose                                 Key Functions
Argument Parser       Handle -L flag and file list            getopt() or manual
Stat Caller           Call stat() or lstat() based on flags   stat(), lstat()
Type Detector         Identify file type from st_mode         S_ISREG, S_ISDIR, etc.
Permission Formatter  Convert mode to “-rwxr-xr-x” string     Bit manipulation
User/Group Resolver   Convert UIDs/GIDs to names              getpwuid(), getgrgid()
Time Formatter        Format timestamps human-readably        strftime(), localtime()
Symlink Handler       Read and display symlink targets        readlink()

4.3 Data Structures

// Configuration from command line
typedef struct {
    int follow_symlinks;     // -L flag
    int verbose;             // -v flag (extra detail)
    const char **files;      // Array of file paths
    int num_files;           // Number of files to process
} config_t;

// File information (wrapper around struct stat)
typedef struct {
    struct stat st;          // The raw stat structure
    const char *path;        // Original path argument
    char *link_target;       // Symlink target (if applicable)
    char *username;          // Resolved username
    char *groupname;         // Resolved group name
    char type_char;          // 'd', '-', 'l', 'c', 'b', 'p', 's'
    char perms[11];          // "-rwxr-xr-x" string
} file_info_t;

4.4 Algorithm Overview

FUNCTION display_file_info(path, follow_symlinks):
    // Step 1: Get file metadata
    IF follow_symlinks:
        result = stat(path, &st)
    ELSE:
        result = lstat(path, &st)

    IF result == -1:
        PRINT error message with errno
        RETURN

    // Step 2: Determine file type
    type_char = get_type_char(st.st_mode)
    type_name = get_type_name(st.st_mode)

    // Step 3: Build permission string
    perms = format_permissions(st.st_mode)

    // Step 4: Resolve user/group names
    pw = getpwuid(st.st_uid)
    username = pw ? pw->pw_name : numeric_string(st.st_uid)

    gr = getgrgid(st.st_gid)
    groupname = gr ? gr->gr_name : numeric_string(st.st_gid)

    // Step 5: Handle symlink target
    IF S_ISLNK(st.st_mode):
        target = readlink(path)

    // Step 6: Format timestamps
    atime_str = format_time(st.st_atime, st.st_atim.tv_nsec)
    mtime_str = format_time(st.st_mtime, st.st_mtim.tv_nsec)
    ctime_str = format_time(st.st_ctime, st.st_ctim.tv_nsec)

    // Step 7: Print all information
    PRINT all formatted fields
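Step 5 hides two details of the real readlink() call: it takes a caller-supplied buffer, and it does not NUL-terminate the result. A minimal sketch follows. The function name read_link_target() and the 255-byte fallback size are assumptions for illustration.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns a malloc'd, NUL-terminated copy of the symlink's target,
 * or NULL on error. The caller frees the result. */
char *read_link_target(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1 || !S_ISLNK(st.st_mode))
        return NULL;

    /* st_size is the target length on most filesystems; some report 0
     * (for example under /proc), so fall back to a fixed guess. */
    size_t cap = (st.st_size > 0 ? (size_t)st.st_size : 255) + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    ssize_t n = readlink(path, buf, cap);
    if (n == -1 || (size_t)n >= cap) {  /* error, or possibly truncated */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';                      /* readlink() does not terminate */
    return buf;
}
```

Treating a completely filled buffer as a failure is deliberately conservative: the link may have been replaced between lstat() and readlink(), and a production tool would retry with a larger buffer.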

5. Implementation Guide

5.1 Development Environment Setup

# Create project directory
$ mkdir -p ~/projects/mystat
$ cd ~/projects/mystat

# Create test files of different types
$ touch regular_file.txt                    # Regular file
$ mkdir test_directory                       # Directory
$ ln -s regular_file.txt symlink            # Symbolic link
$ ln regular_file.txt hardlink              # Hard link
$ mkfifo fifo_pipe                          # Named pipe (FIFO)

# Check what devices exist (need root for some)
$ ls -la /dev/sda /dev/null /dev/tty

# Create socket for testing (requires netcat)
$ nc -lU /tmp/test.sock &
$ ls -la /tmp/test.sock

5.2 Project Structure

mystat/
├── mystat.c              # Main source file
├── Makefile              # Build configuration
├── regular_file.txt      # Test: regular file
├── test_directory/       # Test: directory
├── symlink               # Test: symbolic link
├── hardlink              # Test: hard link
├── fifo_pipe             # Test: FIFO
└── README.md             # Documentation

5.3 The Core Question You’re Answering

“How does UNIX store and represent file metadata, and what’s the difference between a file’s data and its attributes?”

Every file has two parts: the data (contents) and the inode (metadata). Understanding the inode structure explains why hard links work, why mv is instant within a filesystem, and why permissions are per-file, not per-name.

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. The stat Structure
    • What fields does struct stat contain?
    • What’s the difference between stat() and lstat()?
    • Book Reference: “APUE” Ch. 4.2
  2. File Types in UNIX
    • Regular file, directory, character device, block device, FIFO, socket, symbolic link
    • How are these encoded in st_mode?
    • Book Reference: “APUE” Ch. 4.3
  3. Permission Bits
    • What do r, w, x mean for files vs directories?
    • What are setuid, setgid, and sticky bits?
    • Book Reference: “APUE” Ch. 4.5-4.6
  4. The Three Timestamps
    • Access time (atime), modification time (mtime), change time (ctime)
    • Which operations update which timestamps?

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Output Format
    • Should you match stat command format or design your own?
    • How to display times—Unix epoch or human-readable?
    • Should sizes be in bytes only, or also human-readable (KiB, MiB)?
  2. Symlink Handling
    • Should you follow symlinks by default or not?
    • How to show both link info and target info?
    • What if the symlink target doesn’t exist (dangling link)?
  3. Error Messages
    • How to handle permission denied?
    • How to handle non-existent files?
    • Should one bad file stop processing of subsequent files?
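One common answer to the last question is to report the error, keep going, and return a non-zero exit status at the end. The sketch below shows that shape under the assumption of a hypothetical stat_all() helper; it prints only two fields to keep the example short.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Stats each path, reporting failures on stderr and moving on.
 * Returns the number of paths that failed, so main() can turn any
 * failure into a non-zero exit status. */
int stat_all(int count, const char **paths, int follow_symlinks)
{
    int failures = 0;

    for (int i = 0; i < count; i++) {
        struct stat st;
        int rc = follow_symlinks ? stat(paths[i], &st)
                                 : lstat(paths[i], &st);
        if (rc == -1) {
            fprintf(stderr, "mystat: %s: %s\n", paths[i], strerror(errno));
            failures++;
            continue;  /* keep going with the next file */
        }
        printf("File: %s\nSize: %lld bytes\n",
               paths[i], (long long)st.st_size);
    }
    return failures;
}
```

In main(), `return stat_all(...) > 0 ? EXIT_FAILURE : EXIT_SUCCESS;` gives scripts a reliable way to detect partial failure.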

5.6 Thinking Exercise

Decode a Mode Value

Given st_mode = 0100755 (octal), decode it:

st_mode bits:
┌────────────────────────────────────────────────────────────────┐
│   File Type   │ SetUID │ SetGID │ Sticky │ User │Group│Other   │
│   (4 bits)    │   S    │   S    │   T    │ rwx  │ rwx │ rwx    │
├───────────────┼────────┼────────┼────────┼──────┼─────┼────────┤
│    1000       │   0    │   0    │   0    │  7   │  5  │  5     │
│  (regular)    │  (no)  │  (no)  │  (no)  │ rwx  │ r-x │ r-x    │
└────────────────────────────────────────────────────────────────┘

Result: -rwxr-xr-x

S_ISREG(mode) → true (0100000 = regular file)
mode & S_IRWXU → 0700 (user: rwx)
mode & S_IRWXG → 0050 (group: r-x)
mode & S_IRWXO → 0005 (other: r-x)

Exercise Questions:

  • What would st_mode = 0040755 represent? (Answer: directory, drwxr-xr-x)
  • What about st_mode = 0104755? (Answer: regular file with setuid, -rwsr-xr-x)
  • What about st_mode = 0041777? (Answer: directory with sticky bit, drwxrwxrwt)

5.7 Hints in Layers

Hint 1: Getting Started

Call stat() or lstat() on the file path. Check the return value for errors.

struct stat st;
if (lstat(path, &st) == -1) {
    perror(path);
    return -1;
}

Hint 2: File Type Detection

Use the macros: S_ISREG(mode), S_ISDIR(mode), S_ISLNK(mode), etc.

char get_type_char(mode_t mode) {
    if (S_ISREG(mode))  return '-';
    if (S_ISDIR(mode))  return 'd';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISFIFO(mode)) return 'p';
    if (S_ISSOCK(mode)) return 's';
    return '?';
}

Hint 3: Permission String

// Build the 10-character permission string; perms must hold at least 11 bytes
void format_permissions(mode_t mode, char *perms) {
    perms[0] = get_type_char(mode);
    perms[1] = (mode & S_IRUSR) ? 'r' : '-';
    perms[2] = (mode & S_IWUSR) ? 'w' : '-';
    perms[3] = (mode & S_IXUSR) ? 'x' : '-';
    // Handle setuid: if S_ISUID, perms[3] = 's' or 'S'
    if (mode & S_ISUID)
        perms[3] = (mode & S_IXUSR) ? 's' : 'S';

    perms[4] = (mode & S_IRGRP) ? 'r' : '-';
    perms[5] = (mode & S_IWGRP) ? 'w' : '-';
    perms[6] = (mode & S_IXGRP) ? 'x' : '-';
    // Handle setgid: if S_ISGID, perms[6] = 's' or 'S'
    if (mode & S_ISGID)
        perms[6] = (mode & S_IXGRP) ? 's' : 'S';

    perms[7] = (mode & S_IROTH) ? 'r' : '-';
    perms[8] = (mode & S_IWOTH) ? 'w' : '-';
    perms[9] = (mode & S_IXOTH) ? 'x' : '-';
    // Handle sticky: if S_ISVTX, perms[9] = 't' or 'T'
    if (mode & S_ISVTX)
        perms[9] = (mode & S_IXOTH) ? 't' : 'T';

    perms[10] = '\0';
}

Hint 4: Username Lookup

Use getpwuid(st.st_uid) to get the username from UID. Returns NULL if not found.

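// Requires <pwd.h>. uid_t is unsigned, so cast explicitly for printf.
struct passwd *pw = getpwuid(st.st_uid);
if (pw) {
    printf("Uid: %s (%lu)\n", pw->pw_name, (unsigned long)st.st_uid);
} else {
    printf("Uid: %lu\n", (unsigned long)st.st_uid);
}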
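Group lookup follows the same pattern with getgrgid(). The sketch below wraps it in a hypothetical format_group() helper that falls back to the numeric GID when the group database has no entry.

```c
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>

/* Writes the group name for gid into buf, or the decimal gid when the
 * group database has no matching entry. */
void format_group(gid_t gid, char *buf, size_t buflen)
{
    struct group *gr = getgrgid(gid);

    if (gr != NULL)
        snprintf(buf, buflen, "%s", gr->gr_name);
    else
        snprintf(buf, buflen, "%lu", (unsigned long)gid);
}
```

Both getpwuid() and getgrgid() return pointers to static storage that later calls may overwrite, so copy the name out (as done here) before the next lookup.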

5.8 The Interview Questions They’ll Ask

  1. “What’s the difference between stat() and lstat()?”
    • stat() follows symbolic links, lstat() doesn’t
    • For symlinks: stat() returns target info, lstat() returns link info
  2. “How are hard links implemented at the filesystem level?”
    • Multiple directory entries pointing to the same inode
    • st_nlink shows the link count
    • File is deleted when link count reaches 0 AND no open handles
  3. “Why does rm on a file sometimes not free disk space?”
    • Another hard link may exist
    • A process may still have the file open
    • Space is freed when all links AND handles are gone
  4. “What’s the sticky bit and when would you use it?”
    • On directories: an entry can be deleted or renamed only by the file’s owner, the directory’s owner, or root
    • Used on /tmp to prevent users from deleting each other’s files
    • Shows as ‘t’ or ‘T’ in permission string
  5. “How would you find all setuid programs on a system?”
    • find / -perm -4000 -type f
    • Check st_mode & S_ISUID
    • Important for security audits

5.9 Books That Will Help

Topic                 Book                                Chapter
File metadata         “APUE” by Stevens                   Ch. 4
Filesystem structure  “The Linux Programming Interface”   Ch. 14-18
Inode details         “Understanding the Linux Kernel”    Ch. 12
Permission system     “The Linux Programming Interface”   Ch. 15

5.10 Implementation Phases

Phase 1: Basic stat() (2-3 hours)

  • Accept single file argument
  • Call stat() and print raw values
  • Handle stat() errors

Phase 2: File Type Detection (2-3 hours)

  • Implement type detection with S_IS* macros
  • Add type name display
  • Test with various file types

Phase 3: Permission Formatting (2-3 hours)

  • Build permission string from st_mode
  • Handle setuid, setgid, sticky bits
  • Show both symbolic and octal formats

Phase 4: User/Group Resolution (1-2 hours)

  • Use getpwuid() and getgrgid()
  • Handle missing users/groups gracefully

Phase 5: Symlink Handling (2-3 hours)

  • Implement lstat() as default
  • Add -L flag for stat()
  • Use readlink() to get symlink target

Phase 6: Timestamps and Polish (2-3 hours)

  • Format timestamps with strftime()
  • Add nanosecond precision
  • Handle timezone display

5.11 Key Implementation Decisions

Decision                   Trade-offs
lstat() vs stat() default  lstat() is safer (don’t follow links by accident)
Output format              Match stat command for familiarity, or custom for readability
Nanosecond timestamps      Use st_atim.tv_nsec (POSIX.1-2008, Linux) or st_atimespec.tv_nsec (macOS)
Error handling             Continue on errors or stop? Usually continue for multiple files
Human-readable sizes       Calculate KiB/MiB/GiB from st_size
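For the human-readable size decision, one possible approach is to divide by 1024 until the value fits the unit. The sketch below is an assumption about formatting, not the required output: it always prints one decimal place above the byte range, which differs slightly from the rounded values in the sample output.

```c
#include <stdio.h>

/* Formats a byte count using binary units (1 KiB = 1024 bytes),
 * with one decimal place for anything of 1 KiB or more. */
void format_size(long long bytes, char *buf, size_t buflen)
{
    static const char *units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
    double value = (double)bytes;
    size_t u = 0;

    while (value >= 1024.0 && u < sizeof units / sizeof units[0] - 1) {
        value /= 1024.0;
        u++;
    }
    if (u == 0)
        snprintf(buf, buflen, "%lld B", bytes);
    else
        snprintf(buf, buflen, "%.1f %s", value, units[u]);
}
```

With this rounding, 5479736 bytes formats as "5.2 MiB" and 142144 bytes as "138.8 KiB".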

6. Testing Strategy

6.1 Unit Tests

Test                        Input    Expected Result
type_char for regular file  0100644  '-'
type_char for directory     0040755  'd'
type_char for symlink       0120777  'l'
perms for 0755              0100755  "-rwxr-xr-x"
perms for 4755              0104755  "-rwsr-xr-x"
perms for 1777              0041777  "drwxrwxrwt"

6.2 Integration Tests

# Test regular file
$ ./mystat /bin/ls
# Should show: Type: regular file, permissions, etc.

# Test directory
$ ./mystat /tmp
# Should show: Type: directory, sticky bit if set

# Test symbolic link without -L
$ ln -s /etc/passwd testlink
$ ./mystat testlink
# Should show: Type: symbolic link, link target

# Test symbolic link with -L
$ ./mystat -L testlink
# Should show target file info

# Compare with system stat
$ stat /bin/ls
$ ./mystat /bin/ls
# Major fields should match

6.3 Edge Cases to Test

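Case               Command                                        Expected Behavior
Non-existent file  ./mystat nosuchfile                            Error: No such file
Permission denied  ./mystat /root/.bashrc (as non-root)           Error: Permission denied
Dangling symlink   ln -s /nonexistent broken && ./mystat broken   Show link info, note broken
Device files       ./mystat /dev/null                             Show character special, 1,3
Setuid binary      ./mystat /bin/su                               Show ‘s’ in permissions
Socket             ./mystat /var/run/docker.sock                  Show socket type
Multiple files     ./mystat /etc/passwd /root/.bashrc             Show passwd, error on /root/.bashrc

Note: stat() needs search (x) permission on each directory in the path, not read permission on the file. ./mystat /etc/shadow therefore succeeds for ordinary users, so use a path inside an unsearchable directory to trigger "Permission denied".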
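The dangling-symlink case can be detected by combining the two calls: lstat() succeeds on the link itself, while stat() fails with ENOENT because the target is missing. A sketch with a hypothetical helper name:

```c
#include <errno.h>
#include <sys/stat.h>

/* Returns 1 if path is a symlink whose target does not exist,
 * 0 if it is anything else, -1 if path itself cannot be examined. */
int is_dangling_symlink(const char *path)
{
    struct stat link_st, target_st;

    if (lstat(path, &link_st) == -1)
        return -1;  /* the link itself is missing or unreachable */
    if (!S_ISLNK(link_st.st_mode))
        return 0;
    /* The link exists; stat() follows it and fails with ENOENT
     * when the target is gone. */
    if (stat(path, &target_st) == -1 && errno == ENOENT)
        return 1;
    return 0;
}
```

A tool that defaults to lstat() can call this after printing the link's own metadata and append a "(broken)" note to the target line.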

6.4 Verification Commands

# Compare output with system stat
$ stat --format='Type: %F\nSize: %s\nInode: %i\nLinks: %h' /bin/ls
$ ./mystat /bin/ls | grep -E 'Type|Size|Inode|Links'

# Verify permission parsing
$ ls -la /bin/ls
$ ./mystat /bin/ls | grep Access

# Check symlink handling
$ stat -L /usr/bin/python3  # follows link
$ stat /usr/bin/python3     # doesn't follow
$ ./mystat -L /usr/bin/python3
$ ./mystat /usr/bin/python3

# Test with valgrind
$ valgrind --leak-check=full ./mystat /bin/ls

7. Common Pitfalls & Debugging

Problem 1: “Always shows ‘regular file’ for symlinks”

  • Why: Using stat() instead of lstat()
  • Fix: Use lstat() by default to get info about the link itself

Problem 2: “Permission bits wrong for special files”

  • Why: Not handling setuid/setgid/sticky in permission string
  • Fix: Check S_ISUID, S_ISGID, S_ISVTX and modify x to s/S or t/T

Problem 3: “getpwuid returns NULL”

  • Why: User doesn’t exist (e.g., file from another system)
  • Fix: Display numeric UID when lookup fails

Problem 4: “Timestamps show wrong timezone”

  • Why: Using gmtime() instead of localtime()
  • Fix: Use localtime() for local timezone display

Problem 5: “Size shows wrong for block devices”

  • Why: st_size is 0 for devices; actual size requires ioctl
  • Fix: Note that st_size is not meaningful for devices

Problem 6: “Nanoseconds not showing”

  • Why: Using st_atime (seconds only) instead of st_atim.tv_nsec
  • Fix: Access the timespec structure for nanoseconds

8. Extensions & Challenges

8.1 Easy Extensions

Extension       Description                       Learning
JSON output     -j flag for JSON format           Structured output
Human sizes     Show “4.5 MiB” instead of bytes   Size formatting
Recursive       -R for directory contents         Directory iteration
Minimal output  -s for single-line summary        Output formatting

8.2 Advanced Challenges

Challenge          Description                   Learning
ACL support        Show Access Control Lists     getfacl equivalent
Extended attrs     Show xattrs                   getxattr()
SELinux context    Show security context         lgetfilecon()
File capabilities  Show capabilities             libcap
btrfs info         Show subvolume, compression   ioctl()

8.3 Research Topics

  • How do different filesystems store metadata? (ext4 vs xfs vs btrfs)
  • What is the birth time (crtime) and which filesystems support it?
  • How does NFS handle stat() across the network?
  • What information is lost when copying files between filesystems?

9. Real-World Connections

9.1 Production Systems Using This

System    How It Uses stat()     Notable Feature
ls        Lists file metadata    Sorts by size/time
find      Searches by metadata   Permission/time filters
rsync     Compares files         Size+mtime optimization
git       Tracks file changes    Uses stat() for status
make      Dependency checking    Compares mtimes
tar/cpio  Archive metadata       Preserves permissions

9.2 How the Pros Do It

GNU coreutils stat:

  • Uses statx() on modern Linux for extended info
  • Supports multiple output formats
  • Handles SELinux contexts

rsync:

  • Uses stat() to quickly compare files
  • Falls back to checksums when timestamps unreliable
  • Preserves all metadata with -a flag

9.3 Reading the Source

# View GNU coreutils stat source
$ git clone https://github.com/coreutils/coreutils
$ less coreutils/src/stat.c

# View FreeBSD stat source
$ less /usr/src/usr.bin/stat/stat.c

# Key functions to study:
# - struct stat field access
# - Permission string formatting
# - Time formatting

10. Resources

10.1 Man Pages

$ man 2 stat        # stat() system call
$ man 2 lstat       # lstat() system call
$ man 2 fstat       # fstat() on file descriptor
$ man 7 inode       # Inode structure details
$ man 3 getpwuid    # User ID to name
$ man 3 getgrgid    # Group ID to name
$ man 2 readlink    # Read symlink target (system call)
$ man 3 strftime    # Time formatting

10.2 Online Resources

10.3 Book Chapters

Book                                  Chapters   Topics Covered
“APUE” by Stevens                     Ch. 4      Files and Directories
“TLPI” by Kerrisk                     Ch. 14-18  File Systems, File Attributes
“Linux System Programming” by Love    Ch. 7      File and Directory Management

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain what an inode is and what it contains
  • I understand the difference between stat() and lstat()
  • I can decode st_mode to determine file type and permissions
  • I know what the setuid, setgid, and sticky bits do
  • I can explain the three timestamps and what updates each
  • My tool correctly identifies all 7 UNIX file types
  • My permission string handles special bits correctly (s, S, t, T)
  • I handle non-existent users/groups gracefully
  • Symbolic links are handled correctly with and without -L
  • I can answer all the interview questions listed above
  • My code compiles without warnings and has no memory leaks

12. Submission / Completion Criteria

Your project is complete when:

  1. Functionality
    • ./mystat FILE displays complete file metadata
    • All 7 file types correctly identified
    • Permission string matches ls -l output
    • Timestamps show with nanosecond precision
  2. Quality
    • Compiles with gcc -Wall -Wextra -Werror with no warnings
    • Zero valgrind errors
    • Graceful error handling
  3. Testing
    • Works with regular files, directories, symlinks
    • Works with devices, FIFOs, sockets
    • Handles permission errors gracefully
    • -L flag correctly follows symlinks
  4. Understanding
    • Can explain each field of struct stat
    • Can decode any permission mode by hand
    • Understands inode concept thoroughly

Next Steps

After completing this project, you’re ready for:

  • Project 3: Directory Walker - Apply stat() recursively
  • Project 4: Shell Implementation - Use stat() for command validation
  • Project 5: Process Monitor - Apply these concepts to /proc filesystem

The understanding of file metadata you’ve gained here is essential for backup software, security tools, and file management applications.