Project 13: Fmt Module - Extensible Printf

Build an extensible printf-like formatting system where you can register custom format specifiers.

Quick Reference

Attribute       Value
Difficulty      Level 4 - Expert
Time Estimate   5-7 days
Language        C
Prerequisites   Mem module, Assert module, Deep understanding of variadic functions
Key Topics      Format string parsing, Variadic functions, Extensibility patterns, Output abstraction

1. Learning Objectives

By completing this project, you will:

  1. Master variadic functions - Deeply understand va_start, va_arg, va_end, and their limitations
  2. Implement extensible APIs - Design systems that users can extend without modifying library code
  3. Parse format strings - Handle the complex printf format syntax (flags, width, precision, specifiers)
  4. Abstract output destinations - Write to strings, files, or callbacks with the same core code
  5. Understand the Open/Closed Principle - Create code that’s open for extension but closed for modification

Why Fmt Matters:

The printf family of functions is one of the most widely used APIs in programming history. Understanding how it works internally reveals:

  • How variadic functions actually work at the ABI level
  • Why printf is both powerful and dangerous (format string vulnerabilities)
  • How to design truly extensible interfaces in C
  • The challenges of type safety in a weakly-typed language

Real-World Applications:

  • Logging frameworks: Custom formatters for timestamps, log levels, contexts
  • Debugging tools: Formatters for complex data structures
  • Protocol handlers: Custom serialization formats
  • Database systems: SQL query formatting with type-specific handling

2. Theoretical Foundation

2.1 Core Concepts

Variadic Functions in C

Variadic functions accept a variable number of arguments. This is how printf can take any number of arguments:

Variadic Function Mechanics
--------------------------------------------------------------------------------

The va_* macros from <stdarg.h>:

    int printf(const char *fmt, ...) {
        va_list ap;           // Argument pointer
        va_start(ap, fmt);    // Initialize ap to point after fmt

        // ... process format string ...

        int value = va_arg(ap, int);     // Get next arg as int
        double d = va_arg(ap, double);   // Get next arg as double
        char *s = va_arg(ap, char *);    // Get next arg as char*

        va_end(ap);           // Clean up
        return count;
    }

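How it works (simplified view that assumes every argument is pushed on the stack
in 8-byte slots; real x86-64 calling conventions pass the first several
arguments in registers, and va_arg hides that difference):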

    Stack after printf("x=%d y=%f", 42, 3.14) is called:

    Higher addresses
    ┌─────────────────────────────┐
    │ 3.14 (double, 8 bytes)      │  ← va_arg(ap, double) gets this
    ├─────────────────────────────┤
    │ 42 (int, in an 8-byte slot) │  ← va_arg(ap, int) gets this
    ├─────────────────────────────┤
    │ "x=%d y=%f" (pointer)       │  ← fmt parameter
    ├─────────────────────────────┤
    │ return address              │
    └─────────────────────────────┘
    Lower addresses

CRITICAL LIMITATION:
    va_arg(ap, TYPE) doesn't know what type is actually there!
    If you call va_arg(ap, int) but a double was passed, UNDEFINED BEHAVIOR.
    The format string is the ONLY way printf knows argument types.

Type Promotions:
    - char, short → int
    - float → double
    - Arrays → pointers

    So: va_arg(ap, char) is WRONG, use va_arg(ap, int)
        va_arg(ap, float) is WRONG, use va_arg(ap, double)
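
The promotion rules above are easiest to see in a tiny variadic function. The following is a minimal sketch, not part of Fmt: the names sum_ints and describe are hypothetical and exist only for illustration. The first function uses an explicit count where printf would use its format string; the second shows that char and float arguments have to be fetched as int and double.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Sum `count` int arguments. The count plays the role of the format
   string: it is the only thing that says how many va_arg calls are valid. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Expects one char and one float after `size`. The caller's default
   argument promotions turned them into int and double, so they must be
   fetched with those types and narrowed afterwards. */
int describe(char *out, size_t size, ...) {
    va_list ap;

    va_start(ap, size);
    char c = (char)va_arg(ap, int);        /* char  arrives as int    */
    float f = (float)va_arg(ap, double);   /* float arrives as double */
    va_end(ap);
    return snprintf(out, size, "%c:%.1f", c, f);
}
```

Calling sum_ints(3, 1, 2, 3) yields 6. Passing a count larger than the number of actual arguments is undefined behavior, which is the same failure mode as a printf format string that names more arguments than were supplied.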

Printf Format Specification

The format string syntax is surprisingly complex:

Printf Format Specification
--------------------------------------------------------------------------------

Format: %[flags][width][.precision][length]specifier

Components:
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   % - 0 8 . 3 l d                                                       │
│   │ │ │ │ │ │ │ │                                                       │
│   │ │ │ │ │ │ │ └── specifier: d (signed decimal)                       │
│   │ │ │ │ │ │ └──── length: l (long)                                    │
│   │ │ │ │ │ └────── precision: 3 (minimum 3 digits)                     │
│   │ │ │ │ └──────── precision prefix: .                                 │
│   │ │ │ └────────── width: 8 (minimum field width)                      │
│   │ │ └──────────── flag: 0 (pad with zeros)                            │
│   │ └────────────── flag: - (left justify)                              │
│   └──────────────── format start: %                                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Flags:
    -     Left-justify within field width
    +     Always show sign for numbers
    (space) Prefix positive numbers with space
    #     Alternate form (0x for hex, etc.)
    0     Pad with zeros instead of spaces

Width:
    N     Minimum field width (pad if shorter)
    *     Width specified as argument

Precision:
    .N    For d/i/o/u/x: minimum digits
          For e/E/f/F: decimal places
          For s: maximum characters
    .*    Precision specified as argument

Length modifiers:
    hh    char
    h     short
    l     long
    ll    long long
    z     size_t
    t     ptrdiff_t
    L     long double

Common specifiers:
    d, i  Signed decimal integer
    u     Unsigned decimal integer
    x, X  Unsigned hexadecimal (lower/upper)
    o     Unsigned octal
    f, F  Floating point (lower/upper for inf/nan)
    e, E  Scientific notation
    g, G  Shortest of %f or %e
    c     Character
    s     String
    p     Pointer
    n     Store count of characters written (DANGEROUS!)
    %     Literal %
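
One way to make the grammar above concrete is to parse a single specification into a structure. This is a sketch under stated assumptions: struct fmt_spec and parse_spec are hypothetical names, the sentinel values -1 (absent) and -2 (taken from an argument via *) are an illustrative choice, and only the length modifiers listed above are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical container for one parsed conversion specification. */
struct fmt_spec {
    unsigned char flags[256]; /* flags['-'], flags['0'], ... set to 1 */
    int width;                /* -1 if absent, -2 if given as '*' */
    int precision;            /* -1 if absent, -2 if given as '*' */
    char length[3];           /* "", "h", "hh", "l", "ll", "z", "t" or "L" */
    char specifier;           /* conversion character, '\0' if missing */
};

/* Parse the specification starting at s[0] == '%'.
   Returns a pointer just past the conversion character. */
const char *parse_spec(const char *s, struct fmt_spec *out) {
    assert(s && *s == '%' && out);
    memset(out, 0, sizeof *out);
    out->width = out->precision = -1;
    s++;

    while (*s && strchr("-+ #0", *s))          /* flags, in any order */
        out->flags[(unsigned char)*s++] = 1;

    if (*s == '*') {                            /* width */
        out->width = -2;
        s++;
    } else if (isdigit((unsigned char)*s)) {
        out->width = 0;
        while (isdigit((unsigned char)*s))
            out->width = out->width * 10 + (*s++ - '0');
    }

    if (*s == '.') {                            /* precision */
        s++;
        if (*s == '*') {
            out->precision = -2;
            s++;
        } else {
            out->precision = 0;
            while (isdigit((unsigned char)*s))
                out->precision = out->precision * 10 + (*s++ - '0');
        }
    }

    if (*s && strchr("hlztL", *s)) {            /* length modifier */
        out->length[0] = *s++;
        if ((*s == 'h' || *s == 'l') && *s == out->length[0])
            out->length[1] = *s++;              /* hh or ll */
    }

    if (*s)
        out->specifier = *s++;
    return s;
}
```

For the string "%-08.3ld" from the diagram, this sets flags['-'] and flags['0'], width 8, precision 3, length "l" and specifier 'd'.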

Extensibility Through Function Pointers

The key insight of Hanson’s Fmt module is that format specifiers can be user-defined:

Extensible Format Design
--------------------------------------------------------------------------------

Traditional printf:
    Specifier table is HARDCODED
    Adding 'D' for dates? Modify printf source code!

Hanson's Fmt:
    Specifier table is a RUNTIME ARRAY of function pointers
    Adding 'D' for dates? Just register a function!

    // Registration
    Fmt_register('D', date_converter);

    // The converter function signature
    void date_converter(
        int code,           // The format character ('D')
        va_list *ap,        // Argument pointer (note: pointer to va_list!)
        int (*put)(int c, void *cl),  // Output function
        void *cl,           // Client data for output function
        unsigned char flags[256],     // Flag array
        int width,          // Field width
        int precision       // Precision
    );

    // Usage
    struct Date today = {2024, 12, 28};
    Fmt_print("Today: %D\n", &today);
    // Output: Today: 2024-12-28

The Converter Table:
    ┌─────────────────────────────────────────────────────────────────┐
    │ Index │ Converter Function                                       │
    ├───────┼──────────────────────────────────────────────────────────┤
    │  'd'  │ int_converter (built-in)                                 │
    │  's'  │ string_converter (built-in)                              │
    │  'x'  │ hex_converter (built-in)                                 │
    │  ...  │ ...                                                      │
    │  'D'  │ date_converter (USER REGISTERED)                         │
    │  'I'  │ ip_addr_converter (USER REGISTERED)                      │
    └───────┴──────────────────────────────────────────────────────────┘

Why va_list *ap (pointer)?
    The converter advances ap by calling va_arg.
    If we passed va_list by value, the C standard leaves the caller's
    copy indeterminate after the converter returns, so the caller could
    not reliably continue with the next argument.
    Passing a pointer lets the converter advance the caller's ap.
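
The pointer-passing idea can be tried in isolation. The sketch below uses hypothetical helpers (take_int, pack3, vpack2, pack2) that are not part of Fmt. It also shows a portability detail: inside a function that receives va_list as a parameter, taking its address may not give a va_list * on ABIs where va_list is an array type, so the sketch copies it into a local with va_copy first.

```c
#include <assert.h>
#include <stdarg.h>

/* Consume ONE int from the caller's argument list through a pointer. */
static int take_int(va_list *app) {
    return va_arg(*app, int);
}

/* Combine three ints as decimal digits. Each take_int call advances the
   same ap, so the second call sees the argument after the first. */
int pack3(int first, ...) {
    va_list ap;

    va_start(ap, first);
    int second = take_int(&ap);
    int third = take_int(&ap);
    va_end(ap);
    return first * 100 + second * 10 + third;
}

/* v-style variant: ap is a parameter here, so copy it into a local
   va_list before taking its address. */
int vpack2(va_list ap) {
    va_list args;

    va_copy(args, ap);
    int a = take_int(&args);
    int b = take_int(&args);
    va_end(args);
    return a * 10 + b;
}

int pack2(int unused, ...) {
    va_list ap;

    (void)unused;
    va_start(ap, unused);
    int result = vpack2(ap);
    va_end(ap);
    return result;
}
```

pack3(1, 2, 3) returns 123 and pack2(0, 4, 5) returns 45, which only works because each helper call advances the same underlying argument list.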

Output Abstraction

Fmt separates formatting logic from output destination:

Output Abstraction Pattern
--------------------------------------------------------------------------------

The Problem:
    printf writes to stdout
    sprintf writes to a string
    fprintf writes to a file
    ...all with duplicated formatting logic!

The Solution:
    One formatting engine + pluggable output function

    int (*put)(int c, void *cl);
    // put(char, client_data) returns char written or EOF on error

For stdout:
    int stdout_put(int c, void *cl) {
        return putchar(c);
    }

For string buffer:
    struct string_state {
        char *buf;
        int size;
        int pos;
    };

    int string_put(int c, void *cl) {
        struct string_state *s = cl;
        if (s->pos < s->size - 1) {
            s->buf[s->pos++] = c;
            s->buf[s->pos] = '\0';
        }
        return c;
    }

For file:
    int file_put(int c, void *cl) {
        return fputc(c, (FILE *)cl);
    }

Architecture:
    ┌──────────────────────────────────────────────────────────────────┐
    │                       Fmt_print / Fmt_fprint / Fmt_string         │
    │                                                                  │
    │  ┌────────────┐    ┌────────────┐    ┌────────────┐             │
    │  │ Fmt_print  │    │ Fmt_fprint │    │ Fmt_string │             │
    │  │ (stdout)   │    │ (file)     │    │ (buffer)   │             │
    │  └─────┬──────┘    └─────┬──────┘    └─────┬──────┘             │
    │        │                 │                 │                     │
    │        └────────────────┬┴─────────────────┘                     │
    │                         │                                        │
    │                         ▼                                        │
    │              ┌─────────────────────┐                             │
    │              │ Fmt_vfmt (core)     │                             │
    │              │                     │                             │
    │              │ parse format string │                             │
    │              │ call converters     │                             │
    │              │ call put() for each │                             │
    │              │   character         │                             │
    │              └─────────────────────┘                             │
    │                         │                                        │
    │              ┌──────────┼──────────┐                             │
    │              │          │          │                             │
    │              ▼          ▼          ▼                             │
    │         stdout_put  file_put  string_put                         │
    │                                                                  │
    └──────────────────────────────────────────────────────────────────┘
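
A small experiment shows how little the engine needs to know about its destination. In this sketch, emit_str stands in for the formatting engine, and count_put and bounded_put are two interchangeable sinks; all three names and struct bounded are hypothetical, not part of the Fmt interface.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*put_fn)(int c, void *cl);

/* Stand-in for the engine: sends every character of s to put and
   returns how many characters it sent. */
int emit_str(put_fn put, void *cl, const char *s) {
    int n = 0;

    assert(put && s);
    for (; *s; s++, n++)
        put((unsigned char)*s, cl);
    return n;
}

/* Sink 1: count characters without storing them. cl is a size_t *. */
int count_put(int c, void *cl) {
    ++*(size_t *)cl;
    return c;
}

/* Sink 2: bounded buffer that stays NUL-terminated and silently
   drops characters once it is full. */
struct bounded {
    char *buf;
    size_t size;
    size_t pos;
};

int bounded_put(int c, void *cl) {
    struct bounded *b = cl;

    if (b->pos + 1 < b->size) {
        b->buf[b->pos++] = (char)c;
        b->buf[b->pos] = '\0';
    }
    return c;
}
```

A counting sink like this is one way to measure the formatted length before allocating, at the cost of formatting twice.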

2.2 Why This Matters

Understanding printf internals is foundational for:

  1. Security: Format string vulnerabilities are still discovered in production software
  2. Debugging: Custom formatters make debugging complex data structures trivial
  3. Logging: Production logging systems need extensible formatting
  4. Serialization: Custom formats for protocols and data storage

Industry Impact:

  • Most C programs, and many C++ programs, use printf-family functions
  • Log4j, syslog, and similar systems extend printf concepts
  • Database query formatters use similar patterns
  • Network protocol handlers format packets for debugging

2.3 Historical Context

The printf function dates back to the earliest days of C:

  • 1970s: printf developed for Unix, influenced by BCPL’s writef
  • 1989: ANSI C standardized printf format specifiers
  • 1999: C99 added %a (hex float), %zd (size_t), etc.
  • 2000s: Format string attacks became well-known security issue
  • Today: printf’s design is essentially unchanged; type-safe alternatives exist (C++ streams, Rust format!)

Hanson’s Contribution: Hanson’s Fmt module shows that printf’s extensibility problem is solvable without language changes. By using function pointers for converters and abstracting output, Fmt achieves what C++ iostream tried (and many argue failed) to do elegantly.

2.4 Common Misconceptions

Misconception 1: “printf knows argument types at runtime”

  • Reality: printf has NO type information. It trusts the format string completely. printf("%d", 3.14) compiles and runs (incorrectly).

Misconception 2: “va_arg validates types”

  • Reality: va_arg just interprets memory as the specified type. There’s no validation.

Misconception 3: “%n is safe because it’s rarely used”

  • Reality: %n writes to memory and is a major security risk. Hanson’s Fmt doesn’t implement it.

Misconception 4: “Variadic functions are the same as varargs in other languages”

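  • Reality: C’s variadics walk raw argument storage laid out by the ABI (registers and stack), not an array or list object. Nothing records how many arguments were passed or what their types are, which makes them fundamentally unsafe.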

3. Project Specification

3.1 What You Will Build

A complete Fmt module implementing Hanson’s CII interface:

// fmt.h - The Interface

#ifndef FMT_INCLUDED
#define FMT_INCLUDED

#include <stdarg.h>
#include <stdio.h>

#define T Fmt_T
typedef void (*T)(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision);

/* Output functions */
extern void Fmt_fmt    (int put(int c, void *cl), void *cl,
                        const char *fmt, ...);
extern void Fmt_vfmt   (int put(int c, void *cl), void *cl,
                        const char *fmt, va_list ap);

/* Convenience wrappers */
extern void Fmt_print  (const char *fmt, ...);
extern void Fmt_fprint (FILE *stream, const char *fmt, ...);
extern int  Fmt_sfmt   (char *buf, int size, const char *fmt, ...);
extern int  Fmt_vsfmt  (char *buf, int size, const char *fmt, va_list ap);
extern char *Fmt_string(const char *fmt, ...);

/* Output to dynamically-sized string */
extern char *Fmt_vstring(const char *fmt, va_list ap);

/* Converter registration */
extern T Fmt_register(int code, T cvt);

/* Convenience: Put formatted output */
extern void Fmt_putd   (const char *str, int len,
                        int put(int c, void *cl), void *cl,
                        unsigned char flags[256], int width, int precision);
extern void Fmt_puts   (const char *str, int len,
                        int put(int c, void *cl), void *cl,
                        unsigned char flags[256], int width, int precision);

#undef T
#endif

3.2 Functional Requirements

  1. Core formatting engine that parses format strings and dispatches to converters
  2. Built-in converters for standard types: d, i, u, x, X, o, c, s, p, f, e, g
  3. Flag handling: -, +, space, #, 0
  4. Width and precision: Both numeric and * (from argument)
  5. Length modifiers: h, l, L for different integer/float sizes
  6. Output abstraction: Support stdout, FILE*, and string buffers
  7. Extensibility: Fmt_register() to add custom converters
  8. Dynamic string output: Fmt_string() returns malloc’d result

3.3 Non-Functional Requirements

  1. Correctness: Produce identical output to standard printf for supported specifiers
  2. Safety: Never buffer overflow, validate format strings
  3. Efficiency: Minimal overhead vs direct output
  4. Portability: Handle different integer sizes correctly

3.4 Example Usage / Output

$ ./fmt_test

# Test 1: Standard conversions
Fmt_print("Integer: %d, String: %s\n", 42, "hello")
Output: Integer: 42, String: hello

# Test 2: Width and precision
Fmt_print("|%10d|%-10d|%010d|\n", 42, 42, 42)
Output: |        42|42        |0000000042|

Fmt_print("|%.5s|%10.5s|\n", "hello world", "hello world")
Output: |hello|     hello|

# Test 3: Custom converter - dates
Registering 'D' for Date type...
struct Date { int year, month, day; };
struct Date today = {2024, 12, 28};
Fmt_print("Today: %D\n", &today)
Output: Today: 2024-12-28

# Test 4: Custom converter - IP addresses
Registering 'I' for IP address...
uint32_t ip = 0xC0A80101;  // 192.168.1.1
Fmt_print("Server: %I\n", ip)
Output: Server: 192.168.1.1

# Test 5: Output to string
char buffer[100];
int n = Fmt_sfmt(buffer, sizeof(buffer), "Name: %s, Age: %d", "Alice", 30);
Buffer contains: "Name: Alice, Age: 30"
Characters written: 20

# Test 6: Dynamic string allocation
char *result = Fmt_string("PI = %.10f", 3.14159265358979);
Result: "PI = 3.1415926536"
(caller must free result)

# Test 7: Output to file
FILE *log = fopen("app.log", "a");
Fmt_fprint(log, "[%D %T] Error code: %d\n", &date, &time, errno);
Written to file: [2024-12-28 14:30:45] Error code: 2

# Test 8: Flags and padding
Fmt_print("Hex: %#x, %#X\n", 255, 255)
Output: Hex: 0xff, 0XFF

Fmt_print("Sign: %+d, %+d\n", 42, -42)
Output: Sign: +42, -42

Fmt_print("Space: % d, % d\n", 42, -42)
Output: Space:  42, -42

4. Solution Architecture

4.1 High-Level Design

Fmt Module Architecture
--------------------------------------------------------------------------------

                    ┌─────────────────────────────────────┐
                    │           fmt.h (Interface)           │
                    │                                       │
                    │  Fmt_T ─────► converter function type │
                    │  Fmt_print, Fmt_fprint, Fmt_string    │
                    │  Fmt_register                         │
                    └─────────────────┬─────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────────┐
                    │          fmt.c (Implementation)        │
                    │                                       │
                    │  ┌─────────────────────────────────┐  │
                    │  │ Converter Table [256]           │  │
                    │  │ cvt['d'] = cvt_d (integer)      │  │
                    │  │ cvt['s'] = cvt_s (string)       │  │
                    │  │ cvt['x'] = cvt_x (hex)          │  │
                    │  │ cvt['D'] = (user registered)    │  │
                    │  └─────────────────────────────────┘  │
                    │                                       │
                    │  Fmt_vfmt:                            │
                    │  1. Parse format string               │
                    │  2. Extract flags, width, precision   │
                    │  3. Look up converter                 │
                    │  4. Call converter with put function  │
                    └───────────────────────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────────┐
                    │         Output Abstractions           │
                    │                                       │
                    │  stdout_put ──► putchar               │
                    │  file_put ────► fputc                 │
                    │  string_put ──► buffer[pos++]         │
                    └───────────────────────────────────────┘

4.2 Key Components

Converter Table:

// Array of function pointers, indexed by character
static Fmt_T cvt[256];

// Initialization (in module init or first use)
cvt['d'] = cvt['i'] = cvt_d;
cvt['u'] = cvt_u;
cvt['x'] = cvt_x;
cvt['X'] = cvt_X;
cvt['s'] = cvt_s;
cvt['c'] = cvt_c;
cvt['p'] = cvt_p;
// ... etc

Core Formatting Loop:

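// Needs <ctype.h>, <stdarg.h>, <string.h>
void Fmt_vfmt(int put(int c, void *cl), void *cl,
              const char *fmt, va_list ap) {
    // Work on a local copy: &args is a real va_list * even on ABIs
    // where va_list is an array type and the parameter ap has decayed.
    va_list args;
    va_copy(args, ap);

    while (*fmt) {
        if (*fmt != '%' || *++fmt == '%') {
            put((unsigned char)*fmt++, cl);  // Literal character or %%
        } else {
            // Parse flags, width, precision
            unsigned char flags[256] = {0};
            int width = 0, precision = -1;   // -1 means "not specified"

            // Parse flags
            while (*fmt && strchr("-+ #0", *fmt))
                flags[(unsigned char)*fmt++] = 1;

            // Parse width (a negative '*' width means left-justify)
            if (*fmt == '*') {
                width = va_arg(args, int);
                if (width < 0) {
                    flags['-'] = 1;
                    width = -width;
                }
                fmt++;
            } else {
                while (isdigit((unsigned char)*fmt))
                    width = width * 10 + (*fmt++ - '0');
            }

            // Parse precision (a negative '*' precision means "none")
            if (*fmt == '.') {
                fmt++;
                precision = 0;
                if (*fmt == '*') {
                    precision = va_arg(args, int);
                    if (precision < 0)
                        precision = -1;
                    fmt++;
                } else {
                    while (isdigit((unsigned char)*fmt))
                        precision = precision * 10 + (*fmt++ - '0');
                }
            }

            // Dispatch to converter
            int c = (unsigned char)*fmt;
            if (c == '\0')
                break;                       // Format ended inside a specifier
            fmt++;
            if (cvt[c])
                cvt[c](c, &args, put, cl, flags, width, precision);
            else
                put(c, cl);  // Unknown specifier: echo it (or assert instead)
        }
    }
    va_end(args);
}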

4.3 Data Structures

Flags Array:

// Instead of bitfield, use array indexed by flag character
unsigned char flags[256];
flags['-'] = 1;  // Left justify
flags['+'] = 1;  // Show sign
flags[' '] = 1;  // Space for positive
flags['#'] = 1;  // Alternate form
flags['0'] = 1;  // Zero pad

// In converter:
if (flags['-']) { /* left justify */ }

Output State for Strings:

struct string_cl {
    char *buf;      // Output buffer
    char *bp;       // Current position
    char *be;       // End of buffer (buf + size)
};

static int string_put(int c, void *cl) {
    struct string_cl *p = cl;
    if (p->bp < p->be)
        *p->bp++ = c;
    return c;
}
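
Converters usually finish by handing their text to a padding helper that consults the flags array. The following is a rough sketch of what a Fmt_puts-style helper might do, not Hanson's implementation: put_field, struct sink and sink_put are hypothetical names, and only the '-' flag, width and precision are handled.

```c
#include <assert.h>
#include <string.h>

/* Emit str[0..len-1] in a field of at least `width` characters.
   precision >= 0 truncates the string; flags['-'] left-justifies. */
void put_field(const char *str, int len,
    int put(int c, void *cl), void *cl,
    const unsigned char flags[256], int width, int precision) {
    assert(str && len >= 0 && put && flags);
    if (precision >= 0 && precision < len)
        len = precision;
    int pad = width > len ? width - len : 0;

    if (!flags['-'])                       /* right-justify: pad first */
        for (int i = 0; i < pad; i++)
            put(' ', cl);
    for (int i = 0; i < len; i++)
        put((unsigned char)str[i], cl);
    if (flags['-'])                        /* left-justify: pad last */
        for (int i = 0; i < pad; i++)
            put(' ', cl);
}

/* Test sink: append to a fixed buffer, keeping it NUL-terminated. */
struct sink {
    char buf[64];
    size_t pos;
};

int sink_put(int c, void *cl) {
    struct sink *s = cl;

    if (s->pos + 1 < sizeof s->buf) {
        s->buf[s->pos++] = (char)c;
        s->buf[s->pos] = '\0';
    }
    return c;
}
```

With width 10 and precision 5, "hello world" comes out as five spaces followed by "hello", matching the |%10.5s| example in section 3.4.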

4.4 Algorithm Overview

Format String Parsing State Machine:

State Machine for Format Parsing
--------------------------------------------------------------------------------

Start ──► Literal ─┐
              │    │
              │    │ (non-%)
              │    │
              │    └─► Output character ──► Start
              │
              │ (%)
              │
              ▼
          Percent ─┐
              │    │
              │    │ (%)
              │    └─► Output '%' ──► Start
              │
              │ (flag: -, +, space, #, 0)
              │
              ▼
          Flags ────► (loop on flags)
              │
              │ (digit or *)
              │
              ▼
          Width ────► (accumulate or get from args)
              │
              │ (.)
              │
              ▼
          Precision ─► (accumulate or get from args)
              │
              │ (h, l, L, etc.)
              │
              ▼
          Length ───► (set length modifier)
              │
              │ (conversion specifier)
              │
              ▼
          Convert ──► Dispatch to converter ──► Start

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
gcc --version          # GCC with C11 support
make --version

# Project structure
C_INTERFACES_AND_IMPLEMENTATIONS_MASTERY/
├── include/
│   ├── fmt.h
│   ├── mem.h
│   └── assert.h
├── src/
│   ├── fmt.c
│   ├── mem.c
│   └── assert.c
├── tests/
│   └── fmt_test.c
└── Makefile

# Compiler flags
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g

5.2 Project Structure

fmt.h          Interface (public API)
fmt.c          Implementation (converters, parser)
fmt_test.c     Comprehensive test suite

5.3 The Core Question You’re Answering

“How do you design a formatting system that’s as convenient as printf but allows users to add new format specifiers without modifying the library?”

This embodies the Open/Closed Principle: the Fmt module is open for extension (new converters can be registered) but closed for modification (you don’t edit fmt.c to add features).

5.4 Concepts You Must Understand First

Before implementing, ensure you can answer:

  1. va_list behavior: What happens if you call va_arg with the wrong type?
  2. Type promotions: Why can’t you va_arg a float? Why not a char?
  3. va_list copying: Why does va_copy exist? When do you need it?
  4. Function pointers: How do you declare, assign, and call function pointers?

5.5 Questions to Guide Your Design

  1. Registration: How do users register a custom converter? What’s the function signature?
  2. Dispatch: How do you look up the converter for a given character?
  3. Context: What information does a converter need? (output function, flags, width, precision)
  4. Output: How do you abstract output to work with files, strings, and callbacks?
  5. Errors: What happens with unknown specifiers? Invalid format strings?

5.6 Thinking Exercise

Trace through this format string:

Fmt_print("Value: %+8.3f\n", 3.14159);

Walk through the parsing:

  1. “Value: ” - literal characters, output directly
  2. “%” - start of format specifier
  3. “+” - flag: show sign
  4. “8” - width: 8 characters minimum
  5. “.3” - precision: 3 decimal places
  6. “f” - specifier: floating point

What should the output be?

  • “+3.142” is 6 characters
  • Pad to 8 with two leading spaces: “  +3.142”
  • Full output: “Value:   +3.142\n” (three spaces after the colon: one literal, two from padding)

Now design the converter signature:

What does the f converter need to receive?

  • The argument (via va_list)
  • Where to output (put function)
  • Flags (+ is set)
  • Width (8)
  • Precision (3)
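
Since the correctness requirement is to match standard printf, the C library itself can serve as the reference for this exercise. A minimal sketch, where reference_format is a hypothetical helper name and the default "C" locale is assumed:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format `value` with the exercise's format string using the C library,
   giving a reference result to compare an Fmt converter against. */
int reference_format(char *out, size_t size, double value) {
    assert(out && size > 0);
    return snprintf(out, size, "Value: %+8.3f\n", value);
}
```

For 3.14159 this produces "Value:", three spaces, "+3.142" and a newline, 16 characters in total. The same pattern (format with snprintf, format with Fmt_sfmt, compare with strcmp) works for every built-in specifier.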

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual Direction):

Create a table mapping characters to converter functions:

typedef void (*Fmt_T)(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision);

static Fmt_T cvt[256] = {0};

// Initialize built-in converters
void init_converters(void) {
    cvt['d'] = cvt['i'] = cvt_d;
    cvt['s'] = cvt_s;
    cvt['c'] = cvt_c;
    // ...
}

Hint 2 - Next Level (More Specific Guidance):

Each converter has this structure:

static void cvt_d(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision) {

    // 1. Get the argument
    long val = va_arg(*ap, int);  // Note: int, not long!

    // 2. Convert to string representation
    char buf[43];  // Enough for 128-bit int + sign + null
    char *p = buf + sizeof(buf);
    *--p = '\0';

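    // Negate in unsigned arithmetic so the most negative value cannot overflow
    unsigned long m = val < 0 ? 0UL - (unsigned long)val : (unsigned long)val;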
    do {
        *--p = m % 10 + '0';
    } while ((m /= 10) > 0);

    if (val < 0)
        *--p = '-';
    else if (flags['+'])
        *--p = '+';
    else if (flags[' '])
        *--p = ' ';

    // 3. Apply width/padding and output
    Fmt_putd(p, buf + sizeof(buf) - 1 - p, put, cl, flags, width, precision);
}

Hint 3 - Technical Details (Approach/Pseudocode):

The output function put(char, client_data) abstracts where output goes:

// For Fmt_print (stdout)
static int stdout_put(int c, void *cl) {
    (void)cl;
    return putchar(c);
}

void Fmt_print(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    Fmt_vfmt(stdout_put, NULL, fmt, ap);
    va_end(ap);
}

// For Fmt_sfmt (string buffer)
struct buf_cl {
    char *buf;
    int size;
    int pos;
};

static int string_put(int c, void *cl) {
    struct buf_cl *b = cl;
    if (b->pos < b->size - 1)
        b->buf[b->pos++] = c;
    return c;
}

int Fmt_sfmt(char *buf, int size, const char *fmt, ...) {
    struct buf_cl cl = {buf, size, 0};
    va_list ap;
    va_start(ap, fmt);
    Fmt_vfmt(string_put, &cl, fmt, ap);
    va_end(ap);
    if (cl.pos < size)
        buf[cl.pos] = '\0';
    return cl.pos;
}
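
Fmt_string needs a third kind of sink: one that grows. The sketch below shows one possible approach using realloc and capacity doubling; struct dyn_cl, dyn_put and dyn_collect are hypothetical names, and dyn_collect merely stands in for the formatting engine by pushing the characters of a string. A real Fmt_string would more likely allocate through the Mem module.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client data for a growable output buffer. */
struct dyn_cl {
    char *buf;
    size_t len;
    size_t cap;
    int failed;      /* set once realloc fails; later puts are refused */
};

/* put callback: append c, doubling the buffer when it is nearly full. */
int dyn_put(int c, void *cl) {
    struct dyn_cl *d = cl;

    if (d->failed)
        return EOF;
    if (d->len + 1 >= d->cap) {            /* keep room for the NUL */
        size_t ncap = d->cap ? d->cap * 2 : 16;
        char *nbuf = realloc(d->buf, ncap);
        if (!nbuf) {
            d->failed = 1;
            return EOF;
        }
        d->buf = nbuf;
        d->cap = ncap;
    }
    d->buf[d->len++] = (char)c;
    d->buf[d->len] = '\0';
    return c;
}

/* Stand-in for the engine: push every character of s through dyn_put.
   Returns a heap string the caller must free, or NULL on failure. */
char *dyn_collect(const char *s) {
    struct dyn_cl d = { NULL, 0, 0, 0 };

    assert(s);
    for (; *s; s++)
        dyn_put((unsigned char)*s, &d);
    if (!d.failed && !d.buf) {             /* empty input still yields "" */
        d.buf = calloc(1, 1);
        if (!d.buf)
            d.failed = 1;
    }
    if (d.failed) {
        free(d.buf);
        return NULL;
    }
    return d.buf;
}
```

Doubling keeps the total copying cost proportional to the final length, whereas growing by a fixed increment would make long outputs quadratic.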

Hint 4 - Tools/Debugging (Verification Methods):

Parse format string character by character. On %, parse flags/width/precision, then dispatch:

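// Needs <assert.h>, <ctype.h>, <stdarg.h>, <string.h>
void Fmt_vfmt(int put(int c, void *cl), void *cl,
              const char *fmt, va_list ap) {
    assert(put);
    assert(fmt);

    // Copy into a local va_list so that &args really is a va_list *,
    // even where va_list is an array type (e.g. x86-64 System V).
    va_list args;
    va_copy(args, ap);

    while (*fmt) {
        if (*fmt != '%') {
            put((unsigned char)*fmt++, cl);   // Literal character
            continue;
        }
        if (*++fmt == '%') {                  // Skip %, check for %%
            put(*fmt++, cl);
            continue;
        }

        // Parse flags
        unsigned char flags[256] = {0};
        for (; *fmt && strchr("-+ #0", *fmt); fmt++)
            flags[(unsigned char)*fmt] = 1;

        // Parse width (a negative '*' width means left-justify)
        int width = 0;
        if (*fmt == '*') {
            width = va_arg(args, int);
            if (width < 0) {
                flags['-'] = 1;
                width = -width;
            }
            fmt++;
        } else {
            while (isdigit((unsigned char)*fmt))
                width = width * 10 + (*fmt++ - '0');
        }

        // Parse precision
        int precision = -1;  // -1 means not specified
        if (*fmt == '.') {
            fmt++;
            precision = 0;
            if (*fmt == '*') {
                precision = va_arg(args, int);
                if (precision < 0)
                    precision = -1;           // Negative: treat as omitted
                fmt++;
            } else {
                while (isdigit((unsigned char)*fmt))
                    precision = precision * 10 + (*fmt++ - '0');
            }
        }

        // Get converter and dispatch
        int c = (unsigned char)*fmt;
        if (c == '\0')
            break;           // Format string ended inside a specifier
        fmt++;
        assert(cvt[c]);      // Unknown specifier
        (*cvt[c])(c, &args, put, cl, flags, width, precision);
    }
    va_end(args);
}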

5.8 The Interview Questions They’ll Ask

  1. “Design a logging library with custom format specifiers. How do you handle extensions?”

    Expected Strong Answer: I’d use a registration pattern like Hanson’s Fmt. Each custom type (timestamps, request IDs, user objects) gets a registered converter. The converter function signature provides everything needed: access to arguments (va_list*), output function, and formatting parameters. This is the Open/Closed Principle in action - the logging library doesn’t need to know about custom types at compile time.

  2. “Implement printf for an embedded system with no heap. What changes?”

    Expected Strong Answer: The core algorithm works the same, but: (1) No Fmt_string() since it mallocs. (2) Stack-based buffers for number conversion must handle maximum size. (3) No floating point if the system lacks FPU. (4) Possibly no varargs on some architectures - might need macro tricks or fixed-argument versions. Consider using a circular buffer for output if memory is very constrained.

  3. “How does printf know the types of its arguments?”

    Expected Strong Answer: It doesn’t! Printf has zero runtime type information and trusts the format string completely. If you write printf("%d", 3.14), printf fetches an int where a double was passed: undefined behavior, usually garbage output. On x86-64 it may not even read the double’s bits, because integer and floating-point arguments travel in different registers. The same blind trust is what makes attacker-controlled format strings exploitable. Modern compilers (with -Wformat) can warn about mismatches, but only for literal format strings.

  4. “What’s wrong with printf from a type-safety perspective? How do languages fix this?”

    Expected Strong Answer: Printf is fundamentally type-unsafe because: (1) Format string is just a string, not a typed template, (2) va_arg has no type checking, (3) Compiler can’t enforce format/argument agreement at compile time. Solutions: C++ uses operator<< with overloading (type-safe but verbose). Rust’s format! is a macro that verifies types at compile time. Python’s f-strings are evaluated with type context. D and Zig use compile-time format string parsing.

  5. “Implement a format string vulnerability detector.”

    Expected Strong Answer: Key vulnerabilities: (1) %n writes to memory - detect and warn/block, (2) Missing arguments - count specifiers vs actual args, (3) Type mismatches - track expected types from format string. Static analysis: parse format strings at compile time, match against argument types. Runtime: track arguments consumed vs provided, never execute %n. The safest approach is to not support %n at all, which is what Hanson’s Fmt does.
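
The static-analysis half of that last answer can be prototyped in a few lines. This is a simplified sketch, not a complete detector: audit_format and struct fmt_audit are hypothetical names, positional arguments and type tracking are ignored, and the scanner only counts how many variadic arguments a format would consume and whether %n appears.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct fmt_audit {
    int args_needed;  /* variadic arguments the format would consume */
    int uses_n;       /* nonzero if a %n conversion appears */
};

struct fmt_audit audit_format(const char *fmt) {
    struct fmt_audit a = { 0, 0 };

    assert(fmt);
    while (*fmt) {
        if (*fmt++ != '%')
            continue;                       /* ordinary character */
        if (*fmt == '%') {                  /* %% consumes no argument */
            fmt++;
            continue;
        }
        while (*fmt && strchr("-+ #0", *fmt))
            fmt++;                          /* flags */
        if (*fmt == '*') {                  /* width from an argument */
            a.args_needed++;
            fmt++;
        } else {
            while (isdigit((unsigned char)*fmt))
                fmt++;
        }
        if (*fmt == '.') {
            fmt++;
            if (*fmt == '*') {              /* precision from an argument */
                a.args_needed++;
                fmt++;
            } else {
                while (isdigit((unsigned char)*fmt))
                    fmt++;
            }
        }
        while (*fmt && strchr("hlztL", *fmt))
            fmt++;                          /* length modifiers */
        if (*fmt == '\0')
            break;                          /* dangling % at end of string */
        if (*fmt == 'n')
            a.uses_n = 1;
        a.args_needed++;                    /* the conversion itself */
        fmt++;
    }
    return a;
}
```

A logging wrapper could call such a scanner on every non-literal format string and refuse any that set uses_n or need more arguments than the caller supplied.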

5.9 Books That Will Help

Topic                  Book                                            Chapter
Fmt implementation     “C Interfaces and Implementations” by Hanson    Ch. 14
Variadic functions     “The C Programming Language” by K&R             Ch. 7.3
Printf internals       “Expert C Programming” by van der Linden        Ch. 5
Format string attacks  “The Art of Software Security Assessment”       Format string chapter
Type safety            “Types and Programming Languages” by Pierce     For theoretical background

5.10 Implementation Phases

Phase 1: Framework (Day 1-2)

  • Define Fmt_T converter type
  • Implement converter registration
  • Implement Fmt_vfmt skeleton (output literals)
  • Test with format strings containing no specifiers

Phase 2: Format Parsing (Day 2-3)

  • Parse flags, width, precision
  • Handle * for width and precision from arguments
  • Test parsing with various format strings

Phase 3: Basic Converters (Day 3-4)

  • Implement cvt_d (integers)
  • Implement cvt_s (strings)
  • Implement cvt_c (characters)
  • Test basic formatting

Phase 4: Advanced Converters (Day 4-5)

  • Implement cvt_x, cvt_X (hexadecimal)
  • Implement cvt_o (octal)
  • Implement cvt_p (pointer)
  • Handle length modifiers (h, l, ll)

Phase 5: Output Abstractions (Day 5-6)

  • Implement Fmt_print (stdout)
  • Implement Fmt_fprint (FILE*)
  • Implement Fmt_sfmt (buffer)
  • Implement Fmt_string (malloc’d)
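
All four output functions can share one formatting core if the core only ever calls a put callback. The sketch below shows two such callbacks; struct buf_sink, emit_str, and sink_to_buffer are hypothetical names, and emit_str merely stands in for the real formatting core. Silently truncating when the buffer is full is one possible policy, chosen here for brevity; raising an error is another.

```c
#include <assert.h>
#include <stdio.h>

/* Sink for a caller-supplied buffer (hypothetical struct for this sketch). */
struct buf_sink {
    char *bp;    /* next free byte */
    int   left;  /* bytes still available, not counting the final '\0' */
};

/* put callback for FILE* output; cl is the FILE*. */
int put_file(int c, void *cl) {
    return putc(c, (FILE *)cl);
}

/* put callback for bounded buffers; drops characters once full. */
int put_buf(int c, void *cl) {
    struct buf_sink *s = cl;

    if (s->left <= 0)
        return EOF;
    *s->bp++ = (char)c;
    s->left--;
    return (unsigned char)c;
}

/* Stand-in for the formatting core: it only ever sees put and cl. */
void emit_str(const char *str, int put(int c, void *cl), void *cl) {
    while (*str)
        put((unsigned char)*str++, cl);
}

/* Copy str into buf, keeping at most size-1 characters; returns bytes stored. */
int sink_to_buffer(char *buf, int size, const char *str) {
    struct buf_sink s;

    assert(buf && size > 0);
    s.bp = buf;
    s.left = size - 1;
    emit_str(str, put_buf, &s);
    *s.bp = '\0';
    return (int)(s.bp - buf);
}
```

With this shape, a stdout variant is just emit_str(text, put_file, stdout), and a heap-allocating variant only needs a third callback that grows its buffer.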

Phase 6: Polish (Day 6-7)

  • Implement floating point converters
  • Add custom converter demonstration
  • Comprehensive testing
  • Edge case handling

5.11 Key Implementation Decisions

  1. Float Support: Full printf compatibility requires floating-point formatting, which is complex. You may implement this last or skip it initially.

  2. %n Support: Don’t implement %n. It’s a security risk and rarely useful.

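  3. Thread Safety: The converter table is global. For thread safety, either treat it as read-only once all converters are registered (do all registration at startup, before any other thread formats) or give each thread its own table via thread-local storage.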

  4. Error Handling: Unknown specifiers should assert or be passed through literally. Choose one and document it.

  5. Buffer Sizing: For number-to-string conversion, ensure buffers are large enough for 64-bit integers in all bases.
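
For the buffer sizing decision, one approach is to size the scratch buffer for the worst case (base 2, one character per bit) and fill it from the end. The names UTOA_BUFSIZE, fmt_utoa, and fmt_magnitude are hypothetical; this is a sketch, not the layout any particular implementation uses.

```c
#include <assert.h>
#include <limits.h>

/* Worst case is base 2: one character per bit, plus the terminating '\0'. */
#define UTOA_BUFSIZE (sizeof(unsigned long long) * CHAR_BIT + 1)

/* Convert val to text in base 2..16, writing backwards from the end of buf.
 * Returns a pointer to the first digit, which lies somewhere inside buf. */
char *fmt_utoa(unsigned long long val, int base, int upper,
               char buf[UTOA_BUFSIZE]) {
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    char *p = buf + UTOA_BUFSIZE - 1;

    assert(base >= 2 && base <= 16);
    *p = '\0';
    do {
        *--p = digits[val % (unsigned)base];
        val /= (unsigned)base;
    } while (val != 0);
    return p;
}

/* Magnitude of a signed value without overflowing on LLONG_MIN. */
unsigned long long fmt_magnitude(long long v) {
    return v < 0 ? 0ULL - (unsigned long long)v : (unsigned long long)v;
}
```

A signed converter would emit the sign separately and pass fmt_magnitude(v) to fmt_utoa, which avoids the classic bug of negating the most negative value.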


6. Testing Strategy

Unit Tests

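#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "fmt.h"   // the module under test

void test_simple_formats(void) {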
    char buf[100];

    Fmt_sfmt(buf, sizeof(buf), "hello");
    assert(strcmp(buf, "hello") == 0);

    Fmt_sfmt(buf, sizeof(buf), "%d", 42);
    assert(strcmp(buf, "42") == 0);

    Fmt_sfmt(buf, sizeof(buf), "%s", "world");
    assert(strcmp(buf, "world") == 0);

    Fmt_sfmt(buf, sizeof(buf), "x=%d, y=%d", 10, 20);
    assert(strcmp(buf, "x=10, y=20") == 0);

    printf("test_simple_formats: PASSED\n");
}

void test_width_precision(void) {
    char buf[100];

    Fmt_sfmt(buf, sizeof(buf), "|%10d|", 42);
    assert(strcmp(buf, "|        42|") == 0);

    Fmt_sfmt(buf, sizeof(buf), "|%-10d|", 42);
    assert(strcmp(buf, "|42        |") == 0);

    Fmt_sfmt(buf, sizeof(buf), "|%010d|", 42);
    assert(strcmp(buf, "|0000000042|") == 0);

    Fmt_sfmt(buf, sizeof(buf), "|%.5s|", "hello world");
    assert(strcmp(buf, "|hello|") == 0);

    printf("test_width_precision: PASSED\n");
}

void test_flags(void) {
    char buf[100];

    Fmt_sfmt(buf, sizeof(buf), "%+d %+d", 42, -42);
    assert(strcmp(buf, "+42 -42") == 0);

    Fmt_sfmt(buf, sizeof(buf), "% d % d", 42, -42);
    assert(strcmp(buf, " 42 -42") == 0);

    Fmt_sfmt(buf, sizeof(buf), "%#x", 255);
    assert(strcmp(buf, "0xff") == 0);

    Fmt_sfmt(buf, sizeof(buf), "%#X", 255);
    assert(strcmp(buf, "0XFF") == 0);

    printf("test_flags: PASSED\n");
}

Custom Converter Test

// Custom converter for dates
struct Date {
    int year, month, day;
};

static void cvt_date(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision) {
    (void)code; (void)flags; (void)width; (void)precision;

    struct Date *d = va_arg(*ap, struct Date *);
    char buf[64];  // roomy: three ints at full width still fit

    // snprintf bounds the write even if a field is out of range
    snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d->year, d->month, d->day);

    for (char *p = buf; *p; p++)
        put((unsigned char)*p, cl);
}

void test_custom_converter(void) {
    Fmt_register('D', cvt_date);

    struct Date today = {2024, 12, 28};
    char buf[100];

    Fmt_sfmt(buf, sizeof(buf), "Today: %D", &today);
    assert(strcmp(buf, "Today: 2024-12-28") == 0);

    printf("test_custom_converter: PASSED\n");
}

Comparison with Standard Printf

void test_printf_compatibility(void) {
    char expected[1000], actual[1000];

    // Each row: a format string and the output expected for 42 (integer formats) or "hello" (string formats)
    const char *tests[][2] = {
        {"%d", "42"},
        {"%5d", "   42"},
        {"%-5d", "42   "},
        {"%+d", "+42"},
        {"%x", "2a"},
        {"%X", "2A"},
        {"%#x", "0x2a"},
        {"%s", "hello"},
        {"%10s", "     hello"},
        {"%.3s", "hel"},
        {"%%", "%"},
        {NULL, NULL}
    };

    for (int i = 0; tests[i][0]; i++) {
        const char *fmt = tests[i][0];

        // Pick the argument from the conversion character; "%%" takes none
        if (strchr(fmt, 'd') || strchr(fmt, 'x') || strchr(fmt, 'X')) {
            snprintf(expected, sizeof(expected), fmt, 42);
            Fmt_sfmt(actual, sizeof(actual), fmt, 42);
        } else if (strchr(fmt, 's')) {
            snprintf(expected, sizeof(expected), fmt, "hello");
            Fmt_sfmt(actual, sizeof(actual), fmt, "hello");
        } else {
            snprintf(expected, sizeof(expected), fmt);
            Fmt_sfmt(actual, sizeof(actual), fmt);
        }

        assert(strcmp(expected, tests[i][1]) == 0);  // the table matches printf
        assert(strcmp(expected, actual) == 0);       // Fmt matches printf
    }

    printf("test_printf_compatibility: PASSED\n");
}

7. Common Pitfalls & Debugging

Problem 1: “va_arg gets wrong values”

  • Symptom: Numbers look like garbage, strings crash
  • Why: Using wrong type with va_arg, or misaligned argument consumption
  • Fix: Remember the default argument promotions: read char and short with va_arg(ap, int), and float with va_arg(ap, double)
  • Test: Print va_arg values at each step to verify
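
A small standalone experiment makes the promotion rule concrete. The function sum_promoted and its kind letters are hypothetical, invented only for this illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* kinds holds one letter per argument: 'c' (char), 'h' (short), 'f' (float).
 * Returns the sum of all arguments as a double. */
double sum_promoted(const char *kinds, ...) {
    va_list ap;
    double total = 0.0;

    assert(kinds != 0);
    va_start(ap, kinds);
    for (; *kinds; kinds++) {
        switch (*kinds) {
        case 'c':
        case 'h':
            total += va_arg(ap, int);     /* char and short arrive as int */
            break;
        case 'f':
            total += va_arg(ap, double);  /* float arrives as double */
            break;
        default:
            assert(!"unknown kind letter");
        }
    }
    va_end(ap);
    return total;
}
```

Writing va_arg(ap, char) or va_arg(ap, float) in the cases above would be undefined behavior, even though the caller really did pass a char or a float.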

Problem 2: “Width from * doesn’t work”

  • Symptom: Width always zero when using %*d
  • Why: Forgot to consume the width argument with va_arg
  • Fix: When you see *, call va_arg(ap, int) for the width BEFORE fetching the value to format
  • Test: Fmt_print("%*d", 10, 42) should output 42 right-aligned in a 10-character field (eight spaces, then 42)

Problem 3: “Segfault in custom converter”

  • Symptom: Crash when using registered converter
  • Why: Forgetting that ap is va_list*, not va_list
  • Fix: Use va_arg(*ap, Type), not va_arg(ap, Type)
  • Test: Step through converter in debugger

Problem 4: “Buffer overflow with long strings”

  • Symptom: Corruption when formatting long strings or large numbers
  • Why: Fixed-size buffer too small
  • Fix: For Fmt_string, dynamically allocate. For Fmt_sfmt, respect size parameter
  • Test: Format very long strings, verify truncation

Problem 5: “Output missing after custom converter”

  • Symptom: Characters after custom specifier don’t appear
  • Why: Converter didn’t advance ap correctly, or corrupted format pointer
  • Fix: Ensure va_arg is called for every argument the converter consumes
  • Test: Format string with specifier in middle: "A %D B"

8. Extensions & Challenges

8.1 Floating Point Support

Implement %f, %e, %g converters with full precision handling. This is surprisingly complex - consider using snprintf internally at first.
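
A first version that delegates to snprintf might look like the sketch below. It assumes the converter signature used in this guide and assumes that a negative width or precision means "not given". The demo_cvt_f harness and put_cursor callback are hypothetical and exist only to exercise the converter. Output longer than the scratch buffer is truncated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a floating-point converter that rebuilds a printf spec. */
void cvt_f(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision) {
    char spec[16], *sp = spec;
    char buf[512];
    int i, n;
    int w = width > 0 ? width : 0;            /* absent width: no padding */
    int p = precision >= 0 ? precision : -1;  /* negative: printf's default */
    double x = va_arg(*ap, double);

    assert(code == 'f' || code == 'e' || code == 'g');
    *sp++ = '%';
    if (flags['-']) *sp++ = '-';
    if (flags['+']) *sp++ = '+';
    if (flags[' ']) *sp++ = ' ';
    if (flags['#']) *sp++ = '#';
    if (flags['0']) *sp++ = '0';
    *sp++ = '*'; *sp++ = '.'; *sp++ = '*';    /* width and precision as args */
    *sp++ = (char)code;
    *sp = '\0';

    n = snprintf(buf, sizeof buf, spec, w, p, x);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;              /* output was truncated */
    for (i = 0; i < n; i++)
        put((unsigned char)buf[i], cl);
}

/* Harness callback: append to a char* cursor and keep it terminated. */
static int put_cursor(int c, void *cl) {
    char **pp = cl;
    *(*pp)++ = (char)c;
    **pp = '\0';
    return c;
}

/* Harness: format one double with %f into out (must be large enough). */
void demo_cvt_f(char *out, int width, int precision, ...) {
    unsigned char flags[256];
    va_list ap;

    memset(flags, 0, sizeof flags);
    *out = '\0';
    va_start(ap, precision);
    cvt_f('f', &ap, put_cursor, &out, flags, width, precision);
    va_end(ap);
}
```

Once the rest of the module is stable, the snprintf call is the only part that needs replacing with a hand-written conversion.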

8.2 Positional Arguments

Support %1$d %2$s syntax for argument reordering (useful for internationalization).

8.3 Color Codes

Register converters for terminal colors: Fmt_print("%{red}Error%{reset}: %s", msg)

8.4 JSON Output

Register a %J converter that properly escapes and quotes strings for JSON.
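
One way such a converter could look is sketched below. It assumes the argument is a NUL-terminated char * holding UTF-8 text, escapes only what JSON requires (quote, backslash, and control characters), and passes all other bytes through unchanged. The demo_json harness and put_cursor callback are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static void put_str(const char *s, int put(int c, void *cl), void *cl) {
    while (*s)
        put((unsigned char)*s++, cl);
}

/* Hypothetical %J converter: consumes one char* and emits a JSON string. */
void cvt_json(int code, va_list *ap,
    int put(int c, void *cl), void *cl,
    unsigned char flags[256], int width, int precision) {
    const char *s = va_arg(*ap, char *);
    (void)code; (void)flags; (void)width; (void)precision;

    assert(s != NULL);
    put('"', cl);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        switch (c) {
        case '"':  put_str("\\\"", put, cl); break;
        case '\\': put_str("\\\\", put, cl); break;
        case '\b': put_str("\\b", put, cl); break;
        case '\f': put_str("\\f", put, cl); break;
        case '\n': put_str("\\n", put, cl); break;
        case '\r': put_str("\\r", put, cl); break;
        case '\t': put_str("\\t", put, cl); break;
        default:
            if (c < 0x20) {               /* remaining control characters */
                char u[8];
                snprintf(u, sizeof u, "\\u%04x", (unsigned)c);
                put_str(u, put, cl);
            } else {
                put(c, cl);               /* everything else is copied as-is */
            }
        }
    }
    put('"', cl);
}

/* Harness callback: append to a char* cursor and keep it terminated. */
static int put_cursor(int c, void *cl) {
    char **pp = cl;
    *(*pp)++ = (char)c;
    **pp = '\0';
    return c;
}

/* Harness: run cvt_json on one string argument, writing into out. */
void demo_json(char *out, ...) {
    unsigned char flags[256];
    va_list ap;

    memset(flags, 0, sizeof flags);
    *out = '\0';
    va_start(ap, out);
    cvt_json('J', &ap, put_cursor, &out, flags, -1, -1);
    va_end(ap);
}
```

In the real module the converter would be installed with Fmt_register('J', cvt_json) and invoked through the normal format path rather than through a harness.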

8.5 Localization

Add locale-aware number formatting (thousands separators, decimal point).
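
The grouping step itself is small once the digits are already in a string. The sketch below takes the separator as a parameter and always groups by three, which is a simplification: a locale-aware version would read the separator and group sizes from localeconv() in <locale.h>. The name fmt_group_digits is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an unsigned decimal digit string into out, inserting sep between
 * groups of three digits counted from the right.
 * Returns the length written, or -1 if out is too small. */
int fmt_group_digits(const char *digits, char sep, char *out, size_t outsize) {
    size_t len, i, need;
    char *o = out;

    assert(digits && out);
    len = strlen(digits);
    need = len + (len ? (len - 1) / 3 : 0) + 1;   /* digits + seps + '\0' */
    if (need > outsize)
        return -1;
    for (i = 0; i < len; i++) {
        if (i > 0 && (len - i) % 3 == 0)
            *o++ = sep;                           /* boundary between groups */
        *o++ = digits[i];
    }
    *o = '\0';
    return (int)(o - out);
}
```

A converter could run its normal integer conversion first, pass the resulting digits through this function, and then apply sign, width, and padding as usual.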


9. Real-World Connections

Logging Frameworks

Log4j/Logback Pattern Layout:

// Custom patterns are exactly this concept
%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n

Database Formatting

SQLite printf Extension:

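// SQLite ships its own printf implementation (sqlite3_mprintf and friends)
// with extra specifiers such as %q, %Q, and %w for safe SQL quoting.
// The specifier set is fixed, but it shows domain-specific conversions in practice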

Protocol Buffers Text Format

// Google's protobuf text format uses custom formatters
// for message fields, enums, etc.

Debugging Tools

GDB Pretty Printers:

# GDB allows custom formatters for complex types
# Same concept, different implementation language

10. Resources

Primary References

  1. C Interfaces and Implementations by David Hanson, Chapter 14
    • Official CII source: https://github.com/drh/cii
  2. The C Programming Language by K&R, Chapter 7
    • Variadic functions foundation

Online Resources

  1. Printf specification: https://en.cppreference.com/w/c/io/fprintf
  2. Format string attacks: https://owasp.org/www-community/attacks/Format_string_attack
  3. GNU libc printf internals: https://sourceware.org/glibc/wiki/Debugging/Formatter

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • Fmt_print outputs to stdout correctly
  • Fmt_fprint outputs to FILE* correctly
  • Fmt_sfmt outputs to buffer with size limit
  • Fmt_string returns malloc’d string
  • Basic specifiers work: %d, %s, %c, %x, %p
  • Width works: %10d, %-10s
  • Precision works: %.5s, %.3d
  • Width from argument works: %*d
  • Flags work: -, +, space, #, 0
  • %% outputs literal %
  • Fmt_register adds custom converter
  • Custom converter receives correct parameters
  • Output matches printf for supported formats
  • No memory leaks in Fmt_string
  • No buffer overflows in Fmt_sfmt

12. Submission / Completion Criteria

Your Fmt module implementation is complete when:

  1. All tests pass: Both your tests and comparison with standard printf
  2. Memory-safe: No leaks, no overflows
  3. Documented: Header file explains usage clearly
  4. Extensible: Demonstrated with at least one custom converter
  5. Follows CII conventions: Function pointer type, registration API

Deliverables:

  • fmt.h - Interface with documentation
  • fmt.c - Implementation with converters
  • fmt_test.c - Comprehensive test suite
  • Makefile - Build configuration
  • Brief writeup explaining: (1) your converter table design, (2) how you handle output abstraction, (3) a custom converter example