Project 13: Fmt Module - Extensible Printf
Build an extensible printf-like formatting system where you can register custom format specifiers.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4 - Expert |
| Time Estimate | 5-7 days |
| Language | C |
| Prerequisites | Mem module, Assert module, Deep understanding of variadic functions |
| Key Topics | Format string parsing, Variadic functions, Extensibility patterns, Output abstraction |
1. Learning Objectives
By completing this project, you will:
- Master variadic functions - Deeply understand va_start, va_arg, va_end, and their limitations
- Implement extensible APIs - Design systems that users can extend without modifying library code
- Parse format strings - Handle the complex printf format syntax (flags, width, precision, specifiers)
- Abstract output destinations - Write to strings, files, or callbacks with the same core code
- Understand the Open/Closed Principle - Create code that’s open for extension but closed for modification
Why Fmt Matters:
The printf family of functions is one of the most widely used APIs in programming history. Understanding how it works internally reveals:
- How variadic functions actually work at the ABI level
- Why printf is both powerful and dangerous (format string vulnerabilities)
- How to design truly extensible interfaces in C
- The challenges of type safety in a weakly-typed language
Real-World Applications:
- Logging frameworks: Custom formatters for timestamps, log levels, contexts
- Debugging tools: Formatters for complex data structures
- Protocol handlers: Custom serialization formats
- Database systems: SQL query formatting with type-specific handling
2. Theoretical Foundation
2.1 Core Concepts
Variadic Functions in C
Variadic functions accept a variable number of arguments. This is how printf can take any number of arguments:
Variadic Function Mechanics
--------------------------------------------------------------------------------
The va_* macros from <stdarg.h>:
int printf(const char *fmt, ...) {
va_list ap; // Argument pointer
va_start(ap, fmt); // Initialize ap to point after fmt
// ... process format string ...
int value = va_arg(ap, int); // Get next arg as int
double d = va_arg(ap, double); // Get next arg as double
char *s = va_arg(ap, char *); // Get next arg as char*
va_end(ap); // Clean up
return count; // count: characters written, tallied while processing (not shown)
}
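How it works (simplified stack-based view, as in the 32-bit x86 cdecl convention; on x86-64 System V the first arguments are passed in registers and va_list also tracks a register save area):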
Stack after printf("x=%d y=%f", 42, 3.14) is called:
Higher addresses
┌─────────────────────────────┐
│ 3.14 (double, 8 bytes) │ ← va_arg(ap, double) gets this
├─────────────────────────────┤
│ 42 (int, 4 bytes) │ ← va_arg(ap, int) gets this
├─────────────────────────────┤
│ "x=%d y=%f" (pointer) │ ← fmt parameter
├─────────────────────────────┤
│ return address │
└─────────────────────────────┘
Lower addresses
CRITICAL LIMITATION:
va_arg(ap, TYPE) doesn't know what type is actually there!
If you call va_arg(ap, int) but a double was passed, UNDEFINED BEHAVIOR.
The format string is the ONLY way printf knows argument types.
Type Promotions:
- char, short → int
- float → double
- Arrays → pointers
So: va_arg(ap, char) is WRONG, use va_arg(ap, int)
va_arg(ap, float) is WRONG, use va_arg(ap, double)
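To see the promotion rules in action, here is a minimal sketch (sum_promoted is a hypothetical example, not part of Fmt) that receives char and float arguments and therefore has to fetch them as int and double:

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` arguments that alternate char, float, char, float, ...
 * Each char arrives promoted to int and each float promoted to double,
 * so those are the types va_arg must name. */
static double sum_promoted(int count, ...) {
    va_list ap;
    double total = 0.0;
    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        if (i % 2 == 0)
            total += (char)va_arg(ap, int);   /* not va_arg(ap, char) */
        else
            total += va_arg(ap, double);      /* not va_arg(ap, float) */
    }
    va_end(ap);
    return total;
}
```

For example, with char c = 10 and float f = 2.5f, sum_promoted(2, c, f) returns 12.5.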
Printf Format Specification
The format string syntax is surprisingly complex:
Printf Format Specification
--------------------------------------------------------------------------------
Format: %[flags][width][.precision][length]specifier
Components:
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ % - 0 8 . 3 l d │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ └── specifier: d (signed decimal) │
│ │ │ │ │ │ │ └──── length: l (long) │
│ │ │ │ │ │ └────── precision: 3 (for d: at least 3 digits) │
│ │ │ │ │ └──────── precision prefix: . │
│ │ │ │ └────────── width: 8 (minimum field width) │
│ │ │ └──────────── flag: 0 (pad with zeros) │
│ │ └────────────── flag: - (left justify) │
│ └──────────────── format start: % │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Flags:
- Left-justify within field width
+ Always show sign for numbers
(space) Prefix positive numbers with space
# Alternate form (0x for hex, etc.)
0 Pad with zeros instead of spaces
Width:
N Minimum field width (pad if shorter)
* Width specified as argument
Precision:
.N For d/i/o/u/x: minimum digits
For e/E/f/F: decimal places
For s: maximum characters
.* Precision specified as argument
Length modifiers:
hh char
h short
l long
ll long long
z size_t
t ptrdiff_t
L long double
Common specifiers:
d, i Signed decimal integer
u Unsigned decimal integer
x, X Unsigned hexadecimal (lower/upper)
o Unsigned octal
f, F Floating point (lower/upper for inf/nan)
e, E Scientific notation
g, G %e or %f style, chosen by exponent and precision; trailing zeros removed
c Character
s String
p Pointer
n Store count of characters written (DANGEROUS!)
% Literal %
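The grammar above can be parsed left to right in a single pass. The sketch below is one way to do it, assuming a hypothetical struct spec to hold the pieces; because it has no va_list, it only records that a '*' was seen instead of fetching the int, and it does not validate the conversion character:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical record of one parsed conversion specification. */
struct spec {
    unsigned char flags[256]; /* flags[c] != 0 if flag character c appeared */
    int width;                /* 0 if absent */
    int width_star;           /* 1 if width was given as '*' */
    int precision;            /* -1 if absent */
    int prec_star;            /* 1 if precision was given as '*' */
    char length[3];           /* "", "h", "hh", "l", "ll", "j", "z", "t", "L" */
    int conv;                 /* conversion character, '\0' if missing */
};

/* Parses [flags][width][.precision][length]specifier starting just after
 * a '%'. Returns a pointer just past the specifier. */
static const char *parse_spec(const char *p, struct spec *s) {
    assert(p != NULL && s != NULL);
    memset(s, 0, sizeof *s);
    s->precision = -1;
    while (*p != '\0' && strchr("-+ #0", *p) != NULL)
        s->flags[(unsigned char)*p++] = 1;
    if (*p == '*') {
        s->width_star = 1;
        p++;
    } else {
        while (isdigit((unsigned char)*p))
            s->width = s->width * 10 + (*p++ - '0');
    }
    if (*p == '.') {
        p++;
        s->precision = 0;
        if (*p == '*') {
            s->prec_star = 1;
            p++;
        } else {
            while (isdigit((unsigned char)*p))
                s->precision = s->precision * 10 + (*p++ - '0');
        }
    }
    int n = 0;
    if (*p == 'h' || *p == 'l') {
        s->length[n++] = *p++;
        if (*p == s->length[0])           /* "hh" or "ll" */
            s->length[n++] = *p++;
    } else if (*p != '\0' && strchr("jztL", *p) != NULL) {
        s->length[n++] = *p++;
    }
    s->length[n] = '\0';
    s->conv = (unsigned char)*p;
    return *p != '\0' ? p + 1 : p;
}
```

Note that in %08d the 0 is consumed as a flag and the 8 as the width, which is how printf reads it too.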
Extensibility Through Function Pointers
The key insight of Hanson’s Fmt module is that format specifiers can be user-defined:
Extensible Format Design
--------------------------------------------------------------------------------
Traditional printf:
Specifier table is HARDCODED
Adding 'D' for dates? Modify printf source code!
Hanson's Fmt:
Specifier table is a RUNTIME ARRAY of function pointers
Adding 'D' for dates? Just register a function!
// Registration
Fmt_register('D', date_converter);
// The converter function signature
void date_converter(
int code, // The format character ('D')
va_list *ap, // Argument pointer (note: pointer to va_list!)
int (*put)(int c, void *cl), // Output function
void *cl, // Client data for output function
unsigned char flags[256], // Flag array
int width, // Field width
int precision // Precision
);
// Usage
struct Date today = {2024, 12, 28};
Fmt_print("Today: %D\n", &today);
// Output: Today: 2024-12-28
The Converter Table:
┌─────────────────────────────────────────────────────────────────┐
│ Index │ Converter Function │
├───────┼──────────────────────────────────────────────────────────┤
│ 'd' │ int_converter (built-in) │
│ 's' │ string_converter (built-in) │
│ 'x' │ hex_converter (built-in) │
│ ... │ ... │
│ 'D' │ date_converter (USER REGISTERED) │
│ 'I' │ ip_addr_converter (USER REGISTERED) │
└───────┴──────────────────────────────────────────────────────────┘
Why va_list *ap (pointer)?
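The converter advances ap by calling va_arg.
If we passed va_list by value, the caller's ap would become
indeterminate once the converter used va_arg on its copy (C11 7.16p3):
the caller could only va_end it, not keep reading arguments.
Passing a pointer lets the converter advance the caller's ap safely.
Caveat: if ap is itself a function parameter, &ap may not be a va_list *
(on x86-64, va_list is an array type that decays to a pointer), so
va_copy it into a local va_list and pass that local's address.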
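A minimal sketch of the pointer-passing pattern, with hypothetical helpers next_int and sum_ints standing in for a converter and Fmt_vfmt:

```c
#include <assert.h>
#include <stdarg.h>

/* Reads one int through a pointer to the caller's va_list, so the caller
 * sees the advanced position when this returns. */
static int next_int(va_list *app) {
    return va_arg(*app, int);
}

/* Sums `count` ints by handing each va_arg call to next_int(). */
static int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += next_int(&ap);   /* ap is a local here, so &ap is a va_list * */
    va_end(ap);
    return total;
}
```

Here sum_ints(3, 1, 2, 3) returns 6, because each next_int call advances the same ap that the loop keeps using.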
Output Abstraction
Fmt separates formatting logic from output destination:
Output Abstraction Pattern
--------------------------------------------------------------------------------
The Problem:
printf writes to stdout
sprintf writes to a string
fprintf writes to a file
...all with duplicated formatting logic!
The Solution:
One formatting engine + pluggable output function
int (*put)(int c, void *cl);
// put(char, client_data) returns char written or EOF on error
For stdout:
int stdout_put(int c, void *cl) {
return putchar(c);
}
For string buffer:
struct string_state {
char *buf;
int size;
int pos;
};
int string_put(int c, void *cl) {
struct string_state *s = cl;
if (s->pos < s->size - 1) {
s->buf[s->pos++] = c;
s->buf[s->pos] = '\0';
}
return c;
}
For file:
int file_put(int c, void *cl) {
return fputc(c, (FILE *)cl);
}
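Because the formatter only ever calls put(), adding a destination means writing one small function. The sketch below (emit and count_put are hypothetical names) drives the same text into a character counter and into the bounded string buffer:

```c
#include <assert.h>

/* Sends a NUL-terminated string through any put(). */
static void emit(int put(int c, void *cl), void *cl, const char *s) {
    assert(s != 0);
    while (*s != '\0')
        put((unsigned char)*s++, cl);
}

/* A put() that only counts characters (cl points to an int counter). */
static int count_put(int c, void *cl) {
    ++*(int *)cl;
    return c;
}

/* Same bounded string_put as above: stores while room remains and keeps
 * the buffer NUL-terminated. */
struct string_state {
    char *buf;
    int size;
    int pos;
};

static int string_put(int c, void *cl) {
    struct string_state *s = cl;
    if (s->pos < s->size - 1) {
        s->buf[s->pos++] = (char)c;
        s->buf[s->pos] = '\0';
    }
    return c;
}
```

A counting pass like this is one way a Fmt_string-style function could size its allocation before formatting for real; that two-pass approach is an assumption here, not necessarily what CII does.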
Architecture:
┌──────────────────────────────────────────────────────────────────┐
│ Fmt_print / Fmt_fprint / Fmt_string │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Fmt_print │ │ Fmt_fprint │ │ Fmt_string │ │
│ │ (stdout) │ │ (file) │ │ (buffer) │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ └────────────────┬┴─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Fmt_vfmt (core) │ │
│ │ │ │
│ │ parse format string │ │
│ │ call converters │ │
│ │ call put() for each │ │
│ │ character │ │
│ └─────────────────────┘ │
│ │ │
│ ┌──────────┼──────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ stdout_put file_put string_put │
│ │
└──────────────────────────────────────────────────────────────────┘
2.2 Why This Matters
Understanding printf internals is foundational for:
- Security: Format string vulnerabilities are still discovered in production software
- Debugging: Custom formatters make debugging complex data structures trivial
- Logging: Production logging systems need extensible formatting
- Serialization: Custom formats for protocols and data storage
Industry Impact:
- Most C programs, and many C++ programs, use printf-family functions
- Log4j, syslog, and similar systems extend printf concepts
- Database query formatters use similar patterns
- Network protocol handlers format packets for debugging
2.3 Historical Context
The printf function dates back to the earliest days of C:
- 1970s: printf developed for Unix, influenced by BCPL’s writef
- 1989: ANSI C standardized printf format specifiers
- 1999: C99 added %a (hex float), %zd (size_t), etc.
- 2000s: Format string attacks became a well-known security issue
- Today: printf remains largely unchanged; type-safe alternatives exist (C++ streams, Rust format!)
Hanson’s Contribution: Hanson’s Fmt module shows that printf’s extensibility problem is solvable without language changes. By using function pointers for converters and abstracting output, Fmt achieves what C++ iostream tried (and many argue failed) to do elegantly.
2.4 Common Misconceptions
Misconception 1: “printf knows argument types at runtime”
- Reality: printf has NO type information. It trusts the format string completely. printf("%d", 3.14) compiles and runs (incorrectly).
Misconception 2: “va_arg validates types”
- Reality: va_arg just interprets memory as the specified type. There’s no validation.
Misconception 3: “%n is safe because it’s rarely used”
- Reality: %n writes to memory and is a major security risk. Hanson’s Fmt doesn’t implement it.
Misconception 4: “Variadic functions are the same as varargs in other languages”
- Reality: C’s variadics are a calling-convention mechanism (stack slots and, on x86-64, registers plus a save area), not an array or list object. The callee receives no argument count and no types, which makes them fundamentally unsafe.
3. Project Specification
3.1 What You Will Build
A complete Fmt module implementing Hanson’s CII interface:
// fmt.h - The Interface
#ifndef FMT_INCLUDED
#define FMT_INCLUDED
#include <stdarg.h>
#include <stdio.h>
#define T Fmt_T
typedef void (*T)(int code, va_list *ap,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision);
/* Output functions */
extern void Fmt_fmt (int put(int c, void *cl), void *cl,
const char *fmt, ...);
extern void Fmt_vfmt (int put(int c, void *cl), void *cl,
const char *fmt, va_list ap);
/* Convenience wrappers */
extern void Fmt_print (const char *fmt, ...);
extern void Fmt_fprint (FILE *stream, const char *fmt, ...);
extern int Fmt_sfmt (char *buf, int size, const char *fmt, ...);
extern int Fmt_vsfmt (char *buf, int size, const char *fmt, va_list ap);
/* Output to dynamically-sized string */
extern char *Fmt_string(const char *fmt, ...);
extern char *Fmt_vstring(const char *fmt, va_list ap);
/* Converter registration */
extern T Fmt_register(int code, T cvt);
/* Convenience: Put formatted output */
extern void Fmt_putd (const char *str, int len,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision);
extern void Fmt_puts (const char *str, int len,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision);
#undef T
#endif
3.2 Functional Requirements
- Core formatting engine that parses format strings and dispatches to converters
- Built-in converters for standard types: d, i, u, x, X, o, c, s, p, f, e, g
- Flag handling: -, +, space, #, 0
- Width and precision: Both numeric and * (from argument)
- Length modifiers: h, l, L for different integer/float sizes
- Output abstraction: Support stdout, FILE*, and string buffers
- Extensibility: Fmt_register() to add custom converters
- Dynamic string output: Fmt_string() returns malloc’d result
3.3 Non-Functional Requirements
- Correctness: Produce identical output to standard printf for supported specifiers
- Safety: Never overflow a buffer; validate format strings
- Efficiency: Minimal overhead vs direct output
- Portability: Handle different integer sizes correctly
3.4 Example Usage / Output
$ ./fmt_test
# Test 1: Standard conversions
Fmt_print("Integer: %d, String: %s\n", 42, "hello")
Output: Integer: 42, String: hello
# Test 2: Width and precision
Fmt_print("|%10d|%-10d|%010d|\n", 42, 42, 42)
Output: |        42|42        |0000000042|
Fmt_print("|%.5s|%10.5s|\n", "hello world", "hello world")
Output: |hello|     hello|
# Test 3: Custom converter - dates
Registering 'D' for Date type...
struct Date { int year, month, day; };
struct Date today = {2024, 12, 28};
Fmt_print("Today: %D\n", &today)
Output: Today: 2024-12-28
# Test 4: Custom converter - IP addresses
Registering 'I' for IP address...
uint32_t ip = 0xC0A80101; // 192.168.1.1
Fmt_print("Server: %I\n", ip)
Output: Server: 192.168.1.1
# Test 5: Output to string
char buffer[100];
int n = Fmt_sfmt(buffer, sizeof(buffer), "Name: %s, Age: %d", "Alice", 30);
Buffer contains: "Name: Alice, Age: 30"
Characters written: 20
# Test 6: Dynamic string allocation
char *result = Fmt_string("PI = %.10f", 3.14159265358979);
Result: "PI = 3.1415926536"
(caller must free result)
# Test 7: Output to file
FILE *log = fopen("app.log", "a");
Fmt_fprint(log, "[%D %T] Error code: %d\n", &date, &time, errno);
Written to file: [2024-12-28 14:30:45] Error code: 2
# Test 8: Flags and padding
Fmt_print("Hex: %#x, %#X\n", 255, 255)
Output: Hex: 0xff, 0XFF
Fmt_print("Sign: %+d, %+d\n", 42, -42)
Output: Sign: +42, -42
Fmt_print("Space: % d, % d\n", 42, -42)
Output: Space:  42, -42
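Test 4 assumes an 'I' converter has been registered. One minimal sketch, assuming the address arrives as a 32-bit value in host byte order and that unsigned int is 32 bits wide (so a uint32_t argument is fetched as unsigned int). cvt_ip, collect_put, and run_cvt are hypothetical names; run_cvt only builds a real va_list the way Fmt_vfmt would:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative 'I' converter: 0xC0A80101 -> "192.168.1.1".
 * Honors width and the '-' flag; ignores precision. */
static void cvt_ip(int code, va_list *app,
                   int put(int c, void *cl), void *cl,
                   unsigned char flags[256], int width, int precision) {
    (void)code; (void)precision;
    unsigned int ip = va_arg(*app, unsigned int);
    char buf[16];                             /* "255.255.255.255" + NUL */
    snprintf(buf, sizeof buf, "%u.%u.%u.%u",
             (ip >> 24) & 0xFFu, (ip >> 16) & 0xFFu,
             (ip >> 8) & 0xFFu, ip & 0xFFu);
    int len = (int)strlen(buf);
    int pad = width > len ? width - len : 0;
    if (!flags['-'])
        for (; pad > 0; pad--) put(' ', cl);
    for (int i = 0; i < len; i++)
        put((unsigned char)buf[i], cl);
    for (; pad > 0; pad--) put(' ', cl);      /* nonzero only with '-' */
}

/* Hypothetical test scaffolding: a collecting put() and a variadic shim. */
struct collect { char buf[64]; int len; };

static int collect_put(int c, void *cl) {
    struct collect *k = cl;
    assert(k != NULL);
    if (k->len < (int)sizeof k->buf - 1) {
        k->buf[k->len++] = (char)c;
        k->buf[k->len] = '\0';
    }
    return c;
}

static void run_cvt(struct collect *k, int width, int left, ...) {
    unsigned char flags[256] = {0};
    va_list ap;
    flags['-'] = (unsigned char)(left != 0);
    va_start(ap, left);
    cvt_ip('I', &ap, collect_put, k, flags, width, -1);
    va_end(ap);
}
```

With this in place, Fmt_register('I', cvt_ip) would enable the %I usage shown in Test 4.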
4. Solution Architecture
4.1 High-Level Design
Fmt Module Architecture
--------------------------------------------------------------------------------
┌─────────────────────────────────────┐
│ fmt.h (Interface) │
│ │
│ Fmt_T ─────► converter function type │
│ Fmt_print, Fmt_fprint, Fmt_string │
│ Fmt_register │
└─────────────────┬─────────────────────┘
│
┌─────────────────┴─────────────────────┐
│ fmt.c (Implementation) │
│ │
│ ┌─────────────────────────────────┐ │
│ │ Converter Table [256] │ │
│ │ cvt['d'] = cvt_d (integer) │ │
│ │ cvt['s'] = cvt_s (string) │ │
│ │ cvt['x'] = cvt_x (hex) │ │
│ │ cvt['D'] = (user registered) │ │
│ └─────────────────────────────────┘ │
│ │
│ Fmt_vfmt: │
│ 1. Parse format string │
│ 2. Extract flags, width, precision │
│ 3. Look up converter │
│ 4. Call converter with put function │
└───────────────────────────────────────┘
│
┌─────────────────┴─────────────────────┐
│ Output Abstractions │
│ │
│ stdout_put ──► putchar │
│ file_put ────► fputc │
│ string_put ──► buffer[pos++] │
└───────────────────────────────────────┘
4.2 Key Components
Converter Table:
// Array of function pointers, indexed by character
static Fmt_T cvt[256];
// Initialization (in module init or first use)
cvt['d'] = cvt['i'] = cvt_d;
cvt['u'] = cvt_u;
cvt['x'] = cvt_x;
cvt['X'] = cvt_X;
cvt['s'] = cvt_s;
cvt['c'] = cvt_c;
cvt['p'] = cvt_p;
// ... etc
Core Formatting Loop:
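#include <ctype.h>
#include <stdarg.h>
#include <string.h>

void Fmt_vfmt(int put(int c, void *cl), void *cl,
              const char *fmt, va_list ap) {
    va_list aq;                        // Local copy, so &aq really is a va_list *
    va_copy(aq, ap);
    while (*fmt) {
        if (*fmt != '%' || *++fmt == '%') {
            put((unsigned char)*fmt++, cl);   // Literal character or %%
        } else {
            unsigned char flags[256] = {0};
            int width = 0, precision = -1;
            // Parse flags
            while (*fmt && strchr("-+ #0", *fmt))
                flags[(unsigned char)*fmt++] = 1;
            // Parse width
            if (*fmt == '*') {
                width = va_arg(aq, int);
                if (width < 0) {       // Negative * width means the '-' flag
                    flags['-'] = 1;
                    width = -width;
                }
                fmt++;
            } else {
                while (isdigit((unsigned char)*fmt))
                    width = width * 10 + (*fmt++ - '0');
            }
            // Parse precision
            if (*fmt == '.') {
                fmt++;
                precision = 0;
                if (*fmt == '*') {
                    precision = va_arg(aq, int);
                    if (precision < 0) // Negative * precision: as if omitted
                        precision = -1;
                    fmt++;
                } else {
                    while (isdigit((unsigned char)*fmt))
                        precision = precision * 10 + (*fmt++ - '0');
                }
            }
            // (Length modifiers would be parsed here.)
            if (*fmt == '\0')          // Lone '%' at the end: never read past NUL
                break;
            // Dispatch to converter
            int c = (unsigned char)*fmt++;
            if (cvt[c]) {
                cvt[c](c, &aq, put, cl, flags, width, precision);
            } else {                   // Unknown specifier: one policy is to echo it
                put('%', cl);
                put(c, cl);
            }
        }
    }
    va_end(aq);
}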
4.3 Data Structures
Flags Array:
// Instead of bitfield, use array indexed by flag character
unsigned char flags[256];
flags['-'] = 1; // Left justify
flags['+'] = 1; // Show sign
flags[' '] = 1; // Space for positive
flags['#'] = 1; // Alternate form
flags['0'] = 1; // Zero pad
// In converter:
if (flags['-']) { /* left justify */ }
Output State for Strings:
struct string_cl {
char *buf; // Output buffer
char *bp; // Current position
char *be; // End of buffer (buf + size)
};
static int string_put(int c, void *cl) {
struct string_cl *p = cl;
if (p->bp < p->be)
*p->bp++ = c;
return c;
}
4.4 Algorithm Overview
Format String Parsing State Machine:
State Machine for Format Parsing
--------------------------------------------------------------------------------
Start ──► Literal ─┐
│ │
│ │ (non-%)
│ │
│ └─► Output character ──► Start
│
│ (%)
│
▼
Percent ─┐
│ │
│ │ (%)
│ └─► Output '%' ──► Start
│
│ (flag: -, +, space, #, 0)
│
▼
Flags ────► (loop on flags)
│
│ (digit or *)
│
▼
Width ────► (accumulate or get from args)
│
│ (.)
│
▼
Precision ─► (accumulate or get from args)
│
│ (h, l, L, etc.)
│
▼
Length ───► (set length modifier)
│
│ (conversion specifier)
│
▼
Convert ──► Dispatch to converter ──► Start
5. Implementation Guide
5.1 Development Environment Setup
# Required tools
gcc --version # GCC with C11 support
make --version
# Project structure
C_INTERFACES_AND_IMPLEMENTATIONS_MASTERY/
├── include/
│ ├── fmt.h
│ ├── mem.h
│ └── assert.h
├── src/
│ ├── fmt.c
│ ├── mem.c
│ └── assert.c
├── tests/
│ └── fmt_test.c
└── Makefile
# Compiler flags
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g
5.2 Project Structure
fmt.h Interface (public API)
fmt.c Implementation (converters, parser)
fmt_test.c Comprehensive test suite
5.3 The Core Question You’re Answering
“How do you design a formatting system that’s as convenient as printf but allows users to add new format specifiers without modifying the library?”
This embodies the Open/Closed Principle: the Fmt module is open for extension (new converters can be registered) but closed for modification (you don’t edit fmt.c to add features).
5.4 Concepts You Must Understand First
Before implementing, ensure you can answer:
- va_list behavior: What happens if you call va_arg with the wrong type?
- Type promotions: Why can’t you va_arg a float? Why not a char?
- va_list copying: Why does va_copy exist? When do you need it?
- Function pointers: How do you declare, assign, and call function pointers?
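va_copy matters whenever the same arguments must be walked twice, for example to measure output before allocating it. A minimal sketch with a hypothetical concat function that takes a NULL-terminated list of strings:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates a NULL-terminated list of strings into a malloc'd buffer.
 * Pass 1 measures, pass 2 copies; va_copy lets both passes start at the
 * same argument. The caller must free() the result. */
static char *concat(const char *first, ...) {
    va_list ap, aq;
    size_t len = 0;
    va_start(ap, first);
    va_copy(aq, ap);                          /* snapshot before pass 1 */
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    char *out = malloc(len + 1);
    if (out != NULL) {
        char *p = out;
        for (const char *s = first; s != NULL; s = va_arg(aq, char *)) {
            size_t n = strlen(s);
            memcpy(p, s, n);
            p += n;
        }
        *p = '\0';
        assert((size_t)(p - out) == len);     /* both passes agree */
    }
    va_end(aq);
    return out;
}
```

Callers must end the list with (char *)NULL: a bare NULL may be defined as the integer constant 0, which is not converted to a pointer when passed through the ... part of a call.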
5.5 Questions to Guide Your Design
- Registration: How do users register a custom converter? What’s the function signature?
- Dispatch: How do you look up the converter for a given character?
- Context: What information does a converter need? (output function, flags, width, precision)
- Output: How do you abstract output to work with files, strings, and callbacks?
- Errors: What happens with unknown specifiers? Invalid format strings?
5.6 Thinking Exercise
Trace through this format string:
Fmt_print("Value: %+8.3f\n", 3.14159);
Walk through the parsing:
- “Value: ” - literal characters, output directly
- “%” - start of format specifier
- “+” - flag: show sign
- “8” - width: 8 characters minimum
- “.3” - precision: 3 decimal places
- “f” - specifier: floating point
What should the output be?
- “+3.142” is 6 characters
- Pad to 8 with two leading spaces: “  +3.142”
- Full output: “Value:   +3.142\n”
Now design the converter signature:
What does the f converter need to receive?
- The argument (via va_list)
- Where to output (put function)
- Flags (+ is set)
- Width (8)
- Precision (3)
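One way to answer that design question without writing a full floating-point printer is a converter that rebuilds a printf spec from the parsed pieces and delegates digit generation to the C library's snprintf, in the spirit of the snprintf suggestion in Section 8.1. A sketch under that assumption (cvt_f, collect_put, and run_cvt_f are hypothetical names; this is not how Hanson's Fmt formats floats):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative f/e/g converter with the Fmt_T signature. Output longer
 * than buf is truncated. */
static void cvt_f(int code, va_list *app,
                  int put(int c, void *cl), void *cl,
                  unsigned char flags[256], int width, int precision) {
    double x = va_arg(*app, double);     /* floats arrive promoted to double */
    char spec[32], buf[512];
    char *s = spec;
    *s++ = '%';
    for (const char *f = "-+ #0"; *f != '\0'; f++)
        if (flags[(unsigned char)*f])
            *s++ = *f;
    if (precision < 0)
        precision = 6;                   /* printf's default precision */
    size_t room = sizeof spec - (size_t)(s - spec);
    if (width > 0)
        snprintf(s, room, "%d.%d%c", width, precision, code);
    else
        snprintf(s, room, ".%d%c", precision, code);
    snprintf(buf, sizeof buf, spec, x);
    for (const char *p = buf; *p != '\0'; p++)
        put((unsigned char)*p, cl);
}

/* Hypothetical test scaffolding: a collecting put() and a variadic shim. */
struct collect { char buf[128]; int len; };

static int collect_put(int c, void *cl) {
    struct collect *k = cl;
    assert(k != NULL);
    if (k->len < (int)sizeof k->buf - 1) {
        k->buf[k->len++] = (char)c;
        k->buf[k->len] = '\0';
    }
    return c;
}

static void run_cvt_f(struct collect *k, unsigned char flags[256],
                      int width, int precision, ...) {
    va_list ap;
    va_start(ap, precision);
    cvt_f('f', &ap, collect_put, k, flags, width, precision);
    va_end(ap);
}
```

run_cvt_f with the '+' flag, width 8, and precision 3 on 3.14159 produces "  +3.142", matching the trace above.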
5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual Direction):
Create a table mapping characters to converter functions:
typedef void (*Fmt_T)(int code, va_list *ap,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision);
static Fmt_T cvt[256] = {0};
// Initialize built-in converters
void init_converters(void) {
cvt['d'] = cvt['i'] = cvt_d;
cvt['s'] = cvt_s;
cvt['c'] = cvt_c;
// ...
}
Hint 2 - Next Level (More Specific Guidance):
Each converter has this structure:
static void cvt_d(int code, va_list *ap,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision) {
// 1. Get the argument
long val = va_arg(*ap, int); // Note: int, not long!
// 2. Convert to string representation
char buf[43]; // Enough for 128-bit int + sign + null
char *p = buf + sizeof(buf);
*--p = '\0';
unsigned long m = val < 0 ? 0UL - (unsigned long)val : (unsigned long)val; // No overflow for INT_MIN, even where long is 32 bits
do {
*--p = m % 10 + '0';
} while ((m /= 10) > 0);
if (val < 0)
*--p = '-';
else if (flags['+'])
*--p = '+';
else if (flags[' '])
*--p = ' ';
// 3. Apply width/padding and output
Fmt_putd(p, buf + sizeof(buf) - 1 - p, put, cl, flags, width, precision);
}
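cvt_d leaves sign-aware padding to Fmt_putd. The sketch below is a simplified stand-in with the same parameter list, not Hanson's implementation; putd_sketch and collect_put are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Emits a decimal digit string with an optional leading sign character
 * ('-', '+' or ' '), honoring width, precision (minimum digits), and the
 * '-' and '0' flags. Does not special-case precision 0 with the value 0. */
static void putd_sketch(const char *str, int len,
                        int put(int c, void *cl), void *cl,
                        unsigned char flags[256], int width, int precision) {
    assert(str != NULL && len >= 0);
    int sign = 0;
    if (len > 0 && (*str == '-' || *str == '+' || *str == ' ')) {
        sign = (unsigned char)*str++;
        len--;
    }
    int zeros = precision > len ? precision - len : 0;  /* leading zero digits */
    int body = (sign != 0) + zeros + len;                /* sign + zeros + digits */
    int pad = width > body ? width - body : 0;           /* field padding */
    if (flags['0'] && !flags['-'] && precision < 0) {    /* printf ignores '0' otherwise */
        zeros += pad;
        pad = 0;
    }
    if (!flags['-'])
        for (; pad > 0; pad--) put(' ', cl);
    if (sign)
        put(sign, cl);
    for (; zeros > 0; zeros--) put('0', cl);
    for (int i = 0; i < len; i++) put((unsigned char)str[i], cl);
    for (; pad > 0; pad--) put(' ', cl);                 /* nonzero only with '-' */
}

/* Minimal collecting put() for trying the helper in isolation. */
struct collect { char buf[128]; int len; };

static int collect_put(int c, void *cl) {
    struct collect *k = cl;
    if (k->len < (int)sizeof k->buf - 1) {
        k->buf[k->len++] = (char)c;
        k->buf[k->len] = '\0';
    }
    return c;
}
```

For example, "-42" with the '0' flag and width 6 comes out as "-00042", and "42" with the '-' flag, width 5, and precision 3 as "042  ", both matching printf.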
Hint 3 - Technical Details (Approach/Pseudocode):
The output function put(char, client_data) abstracts where output goes:
// For Fmt_print (stdout)
static int stdout_put(int c, void *cl) {
(void)cl;
return putchar(c);
}
void Fmt_print(const char *fmt, ...) {
va_list ap;
va_start(ap, fmt);
Fmt_vfmt(stdout_put, NULL, fmt, ap);
va_end(ap);
}
// For Fmt_sfmt (string buffer)
struct buf_cl {
char *buf;
int size;
int pos;
};
static int string_put(int c, void *cl) {
struct buf_cl *b = cl;
if (b->pos < b->size - 1)
b->buf[b->pos++] = c;
return c;
}
int Fmt_sfmt(char *buf, int size, const char *fmt, ...) {
struct buf_cl cl = {buf, size, 0};
va_list ap;
va_start(ap, fmt);
Fmt_vfmt(string_put, &cl, fmt, ap);
va_end(ap);
if (cl.pos < size)
buf[cl.pos] = '\0';
return cl.pos;
}
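Fmt_string cannot use a fixed buffer, so it needs a put() that grows its storage. A sketch, assuming realloc-based doubling; grow_put, grow_finish, and struct grow_cl are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <stdlib.h>

/* Client state for a growable output buffer. */
struct grow_cl {
    char *buf;     /* malloc'd; NUL-terminated once non-NULL */
    size_t len;    /* characters stored */
    size_t cap;    /* bytes allocated */
    int failed;    /* set once realloc fails; later puts are ignored */
};

/* put() that doubles the buffer when full; returns EOF after a failure. */
static int grow_put(int c, void *cl) {
    struct grow_cl *g = cl;
    assert(g != NULL);
    if (g->failed)
        return EOF;
    if (g->len + 1 >= g->cap) {               /* need room for c and a NUL */
        size_t ncap = g->cap ? g->cap * 2 : 16;
        char *nbuf = realloc(g->buf, ncap);
        if (nbuf == NULL) {
            g->failed = 1;
            return EOF;
        }
        g->buf = nbuf;
        g->cap = ncap;
    }
    g->buf[g->len++] = (char)c;
    g->buf[g->len] = '\0';
    return c;
}

/* Transfers ownership of the result to the caller, who must free() it.
 * Returns NULL if any allocation failed. */
static char *grow_finish(struct grow_cl *g) {
    if (g->failed) {
        free(g->buf);
        return NULL;
    }
    if (g->buf == NULL)                       /* nothing was written */
        return calloc(1, 1);                  /* empty string "" */
    return g->buf;
}
```

A Fmt_vstring built on this might call Fmt_vfmt(grow_put, &g, fmt, ap) and then return grow_finish(&g); the caller frees the result.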
Hint 4 - Tools/Debugging (Verification Methods):
Parse format string character by character. On %, parse flags/width/precision, then dispatch:
void Fmt_vfmt(int put(int c, void *cl), void *cl,
              const char *fmt, va_list ap) {
    assert(fmt);
    va_list aq;                 // Local copy: &aq is a true va_list *
    va_copy(aq, ap);
    while (*fmt) {
        if (*fmt != '%') {      // Literal character
            put((unsigned char)*fmt++, cl);
            continue;
        }
        fmt++;                  // Skip %
        if (*fmt == '%') {      // "%%" -> literal %
            put('%', cl);
            fmt++;
            continue;
        }
        // Parse flags
        unsigned char flags[256] = {0};
        for (; *fmt && strchr("-+ #0", *fmt); fmt++)
            flags[(unsigned char)*fmt] = 1;
        // Parse width
        int width = 0;
        if (*fmt == '*') {
            width = va_arg(aq, int);
            fmt++;
        } else {
            while (isdigit((unsigned char)*fmt))
                width = width * 10 + *fmt++ - '0';
        }
        // Parse precision
        int precision = -1;     // -1 means not specified
        if (*fmt == '.') {
            fmt++;
            precision = 0;
            if (*fmt == '*') {
                precision = va_arg(aq, int);
                fmt++;
            } else {
                while (isdigit((unsigned char)*fmt))
                    precision = precision * 10 + *fmt++ - '0';
            }
        }
        // Get converter and dispatch
        assert(*fmt);           // Format must not end in the middle of a spec
        int c = (unsigned char)*fmt++;
        assert(cvt[c]);         // Unknown specifier
        (*cvt[c])(c, &aq, put, cl, flags, width, precision);
    }
    va_end(aq);
}
5.8 The Interview Questions They’ll Ask
- “Design a logging library with custom format specifiers. How do you handle extensions?”
Expected Strong Answer: I’d use a registration pattern like Hanson’s Fmt. Each custom type (timestamps, request IDs, user objects) gets a registered converter. The converter function signature provides everything needed: access to arguments (va_list*), output function, and formatting parameters. This is the Open/Closed Principle in action - the logging library doesn’t need to know about custom types at compile time.
- “Implement printf for an embedded system with no heap. What changes?”
Expected Strong Answer: The core algorithm works the same, but: (1) No Fmt_string() since it mallocs. (2) Stack-based buffers for number conversion must handle the maximum size. (3) Possibly no floating point if the system lacks an FPU. (4) <stdarg.h> is required even in freestanding implementations, so variadics still work, but fixed-argument versions may save stack and code size. Consider using a circular buffer for output if memory is very constrained.
- “How does printf know the types of its arguments?”
Expected Strong Answer: It doesn’t! Printf has zero runtime type information. It completely trusts the format string. If you write printf("%d", 3.14), printf reads an int from wherever the next int argument would be (on x86-64, not even the register that holds the double): undefined behavior, usually garbage output. The same blind trust is what format string attacks exploit. Modern compilers (with -Wformat) can warn about mismatches, but only for literal format strings.
- “What’s wrong with printf from a type-safety perspective? How do languages fix this?”
Expected Strong Answer: Printf is fundamentally type-unsafe because: (1) Format string is just a string, not a typed template, (2) va_arg has no type checking, (3) Compiler can’t enforce format/argument agreement at compile time. Solutions: C++ uses operator<< with overloading (type-safe but verbose). Rust’s format! is a macro that verifies types at compile time. Python’s f-strings call each value’s own __format__ method, so formatting is dispatched on the runtime type. D and Zig use compile-time format string parsing.
- “Implement a format string vulnerability detector.”
Expected Strong Answer: Key vulnerabilities: (1) %n writes to memory - detect and warn/block, (2) Missing arguments - count specifiers vs actual args, (3) Type mismatches - track expected types from format string. Static analysis: parse format strings at compile time, match against argument types. Runtime: track arguments consumed vs provided, never execute %n. The safest approach is to not support %n at all, which is what Hanson’s Fmt does.
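A sketch of the static half of that answer: a hypothetical fmt_arg_count that walks a format string with the same grammar, counts consumed arguments (including '*' widths and precisions), and rejects %n. It does not check types or whether the conversion character is known:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns the number of arguments the format consumes, or -1 if it
 * contains %n or ends with a dangling '%'. */
static int fmt_arg_count(const char *fmt) {
    int count = 0;
    assert(fmt != NULL);
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {                    /* "%%" consumes nothing */
            fmt++;
            continue;
        }
        while (*fmt != '\0' && strchr("-+ #0", *fmt) != NULL)
            fmt++;
        if (*fmt == '*') { count++; fmt++; }  /* width from argument */
        while (isdigit((unsigned char)*fmt))
            fmt++;
        if (*fmt == '.') {
            fmt++;
            if (*fmt == '*') { count++; fmt++; }  /* precision from argument */
            while (isdigit((unsigned char)*fmt))
                fmt++;
        }
        while (*fmt != '\0' && strchr("hljztL", *fmt) != NULL)
            fmt++;
        if (*fmt == '\0' || *fmt == 'n')
            return -1;                        /* dangling '%' or %n */
        fmt++;                                /* the conversion character */
        count++;
    }
    return count;
}
```

Comparing its result with the number of arguments actually supplied catches the missing-argument case; for example, fmt_arg_count("%*.*f") is 3.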
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Fmt implementation | “C Interfaces and Implementations” by Hanson | Ch. 14 |
| Variadic functions | “The C Programming Language” by K&R | Ch. 7.3 |
| Printf internals | “Expert C Programming” by van der Linden | Ch. 5 |
| Format string attacks | “The Art of Software Security Assessment” | Format string chapter |
| Type safety | “Types and Programming Languages” by Pierce | For theoretical background |
5.10 Implementation Phases
Phase 1: Framework (Day 1-2)
- Define Fmt_T converter type
- Implement converter registration
- Implement Fmt_vfmt skeleton (output literals)
- Test with format strings containing no specifiers
Phase 2: Format Parsing (Day 2-3)
- Parse flags, width, precision
- Handle * for width and precision from arguments
- Test parsing with various format strings
Phase 3: Basic Converters (Day 3-4)
- Implement cvt_d (integers)
- Implement cvt_s (strings)
- Implement cvt_c (characters)
- Test basic formatting
Phase 4: Advanced Converters (Day 4-5)
- Implement cvt_x, cvt_X (hexadecimal)
- Implement cvt_o (octal)
- Implement cvt_p (pointer)
- Handle length modifiers (h, l, ll)
Phase 5: Output Abstractions (Day 5-6)
- Implement Fmt_print (stdout)
- Implement Fmt_fprint (FILE*)
- Implement Fmt_sfmt (buffer)
- Implement Fmt_string (malloc’d)
Phase 6: Polish (Day 6-7)
- Implement floating point converters
- Add custom converter demonstration
- Comprehensive testing
- Edge case handling
5.11 Key Implementation Decisions
- Float Support: Full printf compatibility requires floating-point formatting, which is complex. You may implement this last or skip it initially.
- %n Support: Don’t implement %n. It’s a security risk and rarely useful.
- Thread Safety: The converter table is global. Either register every converter before starting threads and treat the table as read-only afterward, guard Fmt_register and lookups with a lock, or give each thread its own table via thread-local storage.
- Error Handling: Unknown specifiers should assert or be passed through literally. Choose one and document it.
- Buffer Sizing: For number-to-string conversion, ensure buffers are large enough for 64-bit integers in every supported base (22 octal digits, 20 decimal, 16 hex), plus sign and terminating NUL.
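A base-agnostic way to size that buffer is one character per bit of the widest unsigned type, since no base of 2 or more needs more digits than base 2. A sketch of a digit generator built on that bound (UDIGITS_MAX and utoa_base are hypothetical names); cvt_x, cvt_X, and cvt_o could share something like it:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Enough for any base >= 2: one char per bit, plus the NUL. */
#define UDIGITS_MAX (sizeof(unsigned long long) * CHAR_BIT + 1)

/* Writes m in the given base (2..16) at the END of buf, NUL-terminated,
 * and returns a pointer to the first digit. */
static char *utoa_base(unsigned long long m, unsigned base, int upper,
                       char buf[UDIGITS_MAX]) {
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    char *p = buf + UDIGITS_MAX;
    assert(base >= 2 && base <= 16);
    *--p = '\0';
    do {
        *--p = digits[m % base];
    } while ((m /= base) > 0);
    return p;
}
```

utoa_base(255, 16, 0, buf) yields "ff" and utoa_base(8, 8, 0, buf) yields "10".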
6. Testing Strategy
Unit Tests
void test_simple_formats(void) {
char buf[100];
Fmt_sfmt(buf, sizeof(buf), "hello");
assert(strcmp(buf, "hello") == 0);
Fmt_sfmt(buf, sizeof(buf), "%d", 42);
assert(strcmp(buf, "42") == 0);
Fmt_sfmt(buf, sizeof(buf), "%s", "world");
assert(strcmp(buf, "world") == 0);
Fmt_sfmt(buf, sizeof(buf), "x=%d, y=%d", 10, 20);
assert(strcmp(buf, "x=10, y=20") == 0);
printf("test_simple_formats: PASSED\n");
}
void test_width_precision(void) {
char buf[100];
Fmt_sfmt(buf, sizeof(buf), "|%10d|", 42);
assert(strcmp(buf, "|        42|") == 0);
Fmt_sfmt(buf, sizeof(buf), "|%-10d|", 42);
assert(strcmp(buf, "|42        |") == 0);
Fmt_sfmt(buf, sizeof(buf), "|%010d|", 42);
assert(strcmp(buf, "|0000000042|") == 0);
Fmt_sfmt(buf, sizeof(buf), "|%.5s|", "hello world");
assert(strcmp(buf, "|hello|") == 0);
printf("test_width_precision: PASSED\n");
}
void test_flags(void) {
char buf[100];
Fmt_sfmt(buf, sizeof(buf), "%+d %+d", 42, -42);
assert(strcmp(buf, "+42 -42") == 0);
Fmt_sfmt(buf, sizeof(buf), "% d % d", 42, -42);
assert(strcmp(buf, " 42 -42") == 0);
Fmt_sfmt(buf, sizeof(buf), "%#x", 255);
assert(strcmp(buf, "0xff") == 0);
Fmt_sfmt(buf, sizeof(buf), "%#X", 255);
assert(strcmp(buf, "0XFF") == 0);
printf("test_flags: PASSED\n");
}
Custom Converter Test
// Custom converter for dates
struct Date {
int year, month, day;
};
static void cvt_date(int code, va_list *ap,
int put(int c, void *cl), void *cl,
unsigned char flags[256], int width, int precision) {
(void)code; (void)flags; (void)width; (void)precision;
struct Date *d = va_arg(*ap, struct Date *);
char buf[48]; // Room for three full-width ints plus separators
snprintf(buf, sizeof buf, "%04d-%02d-%02d", d->year, d->month, d->day);
for (char *p = buf; *p; p++)
put(*p, cl);
}
void test_custom_converter(void) {
Fmt_register('D', cvt_date);
struct Date today = {2024, 12, 28};
char buf[100];
Fmt_sfmt(buf, sizeof(buf), "Today: %D", &today);
assert(strcmp(buf, "Today: 2024-12-28") == 0);
printf("test_custom_converter: PASSED\n");
}
Comparison with Standard Printf
void test_printf_compatibility(void) {
char expected[1000], actual[1000];
// Test various formats
const char *tests[][2] = {
{"%d", "42"},
{"%5d", "   42"},
{"%-5d", "42   "},
{"%+d", "+42"},
{"%x", "2a"},
{"%X", "2A"},
{"%#x", "0x2a"},
{"%s", "hello"},
{"%10s", "     hello"},
{"%.3s", "hel"},
{"%%", "%"},
{NULL, NULL}
};
for (int i = 0; tests[i][0]; i++) {
if (strchr(tests[i][0], 'd') || strchr(tests[i][0], 'x') || strchr(tests[i][0], 'X')) {
sprintf(expected, tests[i][0], 42);
Fmt_sfmt(actual, sizeof(actual), tests[i][0], 42);
} else if (strchr(tests[i][0], 's')) {
sprintf(expected, tests[i][0], "hello");
Fmt_sfmt(actual, sizeof(actual), tests[i][0], "hello");
} else {
sprintf(expected, tests[i][0]);
Fmt_sfmt(actual, sizeof(actual), tests[i][0]);
}
assert(strcmp(expected, actual) == 0);
}
printf("test_printf_compatibility: PASSED\n");
}
7. Common Pitfalls & Debugging
Problem 1: “va_arg gets wrong values”
- Symptom: Numbers look like garbage, strings crash
- Why: Using wrong type with va_arg, or misaligned argument consumption
- Fix: Remember type promotions: use int, not char; use double, not float
- Test: Print va_arg values at each step to verify
Problem 2: “Width from * doesn’t work”
- Symptom: Width always zero when using %*d
- Why: Forgot to consume the width argument with va_arg
- Fix: When you see *, call va_arg(ap, int) for the width BEFORE fetching the value
- Test: Fmt_print("%*d", 10, 42) should output "        42"
Problem 3: “Segfault in custom converter”
- Symptom: Crash when using registered converter
- Why: Forgetting that ap is va_list*, not va_list
- Fix: Use va_arg(*ap, Type), not va_arg(ap, Type)
- Test: Step through converter in debugger
Problem 4: “Buffer overflow with long strings”
- Symptom: Corruption when formatting long strings or large numbers
- Why: Fixed-size buffer too small
- Fix: For Fmt_string, dynamically allocate. For Fmt_sfmt, respect size parameter
- Test: Format very long strings, verify truncation
Problem 5: “Output missing after custom converter”
- Symptom: Characters after custom specifier don’t appear
- Why: Converter didn’t advance ap correctly, or corrupted format pointer
- Fix: Ensure va_arg is called for every argument the converter consumes
- Test: Format string with specifier in middle: "A %D B"
8. Extensions & Challenges
8.1 Floating Point Support
Implement %f, %e, %g converters with full precision handling. This is surprisingly complex - consider using snprintf internally at first.
8.2 Positional Arguments
Support %1$d %2$s syntax for argument reordering (useful for internationalization).
8.3 Color Codes
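Register a converter that takes the color as an argument, e.g. a hypothetical %C: Fmt_print("%CError%C: %s", COLOR_RED, COLOR_RESET, msg). A %{red} syntax would require extending the parser itself, because converters receive only the code character and the argument list, never the format string.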
8.4 JSON Output
Register a %J converter that properly escapes and quotes strings for JSON.
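One possible shape for such a converter, assuming the argument is a char * holding UTF-8 text (bytes at or above 0x80 pass through unchanged) and that NULL should print as JSON null. cvt_json, collect_put, and run_json are hypothetical names:

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Illustrative 'J' converter: writes a char * argument as a quoted,
 * escaped JSON string. Ignores flags, width, and precision. */
static void cvt_json(int code, va_list *app,
                     int put(int c, void *cl), void *cl,
                     unsigned char flags[256], int width, int precision) {
    static const char hex[] = "0123456789abcdef";
    (void)code; (void)flags; (void)width; (void)precision;
    const char *s = va_arg(*app, char *);
    if (s == NULL) {
        for (const char *p = "null"; *p != '\0'; p++)
            put(*p, cl);
        return;
    }
    put('"', cl);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (c) {
        case '"':  put('\\', cl); put('"', cl);  break;
        case '\\': put('\\', cl); put('\\', cl); break;
        case '\n': put('\\', cl); put('n', cl);  break;
        case '\r': put('\\', cl); put('r', cl);  break;
        case '\t': put('\\', cl); put('t', cl);  break;
        default:
            if (c < 0x20) {   /* other control chars: backslash, 'u', 4 hex digits */
                put('\\', cl); put('u', cl); put('0', cl); put('0', cl);
                put(hex[c >> 4], cl);
                put(hex[c & 0xF], cl);
            } else {
                put(c, cl);
            }
        }
    }
    put('"', cl);
}

/* Hypothetical test scaffolding: a collecting put() and a variadic shim. */
struct collect { char buf[128]; int len; };

static int collect_put(int c, void *cl) {
    struct collect *k = cl;
    assert(k != NULL);
    if (k->len < (int)sizeof k->buf - 1) {
        k->buf[k->len++] = (char)c;
        k->buf[k->len] = '\0';
    }
    return c;
}

static void run_json(struct collect *k, ...) {
    unsigned char flags[256] = {0};
    va_list ap;
    va_start(ap, k);
    cvt_json('J', &ap, collect_put, k, flags, 0, -1);
    va_end(ap);
}
```

Registering it would then be Fmt_register('J', cvt_json).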
8.5 Localization
Add locale-aware number formatting (thousands separators, decimal point).
9. Real-World Connections
Logging Frameworks
Log4j/Logback Pattern Layout:
// Custom patterns are exactly this concept
%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n
Database Formatting
SQLite printf Extension:
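// SQLite's own printf (sqlite3_mprintf and friends) adds built-in
// specifiers such as %q, %Q, and %w for SQL quoting; they are fixed,
// not user-registered, but apply the same type-specific formatting idea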
Protocol Buffers Text Format
// Google's protobuf text format uses custom formatters
// for message fields, enums, etc.
Debugging Tools
GDB Pretty Printers:
# GDB allows custom formatters for complex types
# Same concept, different implementation language
10. Resources
Primary References
- C Interfaces and Implementations by David Hanson, Chapter 14
- Official CII source: https://github.com/drh/cii
- The C Programming Language by K&R, Chapter 7 (variadic functions foundation)
Online Resources
- Printf specification: https://en.cppreference.com/w/c/io/fprintf
- Format string attacks: https://owasp.org/www-community/attacks/Format_string_attack
- GNU libc printf internals: https://sourceware.org/glibc/wiki/Debugging/Formatter
11. Self-Assessment Checklist
Before considering this project complete, verify:
- Fmt_print outputs to stdout correctly
- Fmt_fprint outputs to FILE* correctly
- Fmt_sfmt outputs to buffer with size limit
- Fmt_string returns malloc’d string
- Basic specifiers work: %d, %s, %c, %x, %p
- Width works: %10d, %-10s
- Precision works: %.5s, %.3d
- Width from argument works: %*d
- Flags work: -, +, space, #, 0
- %% outputs literal %
- Fmt_register adds custom converter
- Custom converter receives correct parameters
- Output matches printf for supported formats
- No memory leaks in Fmt_string
- No buffer overflows in Fmt_sfmt
12. Submission / Completion Criteria
Your Fmt module implementation is complete when:
- All tests pass: Both your tests and comparison with standard printf
- Memory-safe: No leaks, no overflows
- Documented: Header file explains usage clearly
- Extensible: Demonstrated with at least one custom converter
- Follows CII conventions: Function pointer type, registration API
Deliverables:
- fmt.h - Interface with documentation
- fmt.c - Implementation with converters
- fmt_test.c - Comprehensive test suite
- Makefile - Build configuration
- Brief writeup explaining: (1) your converter table design, (2) how you handle output abstraction, (3) a custom converter example