Project 6: Document Processing Pipeline (Capstone)


Project Overview

Attribute              Details
Difficulty             Level 4: Expert
Time Estimate          2-3 months
Programming Language   C (core), Python (web interface)
Knowledge Area         Full Stack / Systems Design
Prerequisites          All previous projects

What you’ll build: A complete document processing system that accepts PostScript, PDF, or a custom markup language as input, converts it to a unified internal representation, and writes PDF, SVG, PNG, or printer commands.

Why this is the ultimate test: This mirrors what production systems like Ghostscript, Cairo, and print servers actually do. You’ll understand why these systems are architected the way they are.


Learning Objectives

By completing this capstone project, you will:

  1. Design a unified graphics model that captures PS and PDF semantics
  2. Implement multiple input parsers feeding one representation
  3. Implement multiple output backends from one representation
  4. Build production-quality software with error handling, logging, and testing
  5. Create a usable interface (CLI and optionally web)
  6. Understand the architecture of professional document processors

The Core Question You’re Answering

“How do you build a document processing pipeline that can handle any input format and produce any output format?”

This is the fundamental architecture question behind:

  • Ghostscript: PostScript/PDF → PDF/PNG/printer
  • Cairo: Abstract graphics → PDF/SVG/PNG/X11
  • Print servers: Application → printer language
  • Browser engines: HTML/CSS → screen/PDF

The answer is the Intermediate Representation (IR) pattern:

┌─────────────────────────────────────────────────────────────────┐
│             THE INTERMEDIATE REPRESENTATION PATTERN             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   INPUTS                    IR                    OUTPUTS       │
│   ──────                   ────                   ───────       │
│                                                                 │
│   PostScript ─────┐                        ┌───── PDF           │
│                   │                        │                    │
│   PDF ────────────┼───▶ Graphics Model ────┼───── SVG           │
│                   │                        │                    │
│   Custom Markup ──┘                        ├───── PNG           │
│                                            │                    │
│                                            └───── Printer       │
│                                                                 │
│   N inputs + M outputs = N+M implementations                    │
│   (Not N×M implementations!)                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
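
The N+M claim can be made concrete with two lookup tables, one for parsers and one for backends. The following is a minimal sketch under stated assumptions: `Doc`, `Parser`, `Backend`, `convert`, and all the stub functions are hypothetical placeholders, not the project's real API. The stubs do no real parsing or rendering; they only show that each backend sees the IR and never the input format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real graphics model. */
typedef struct { int page_count; } Doc;

typedef int (*ParseFn)(const char *input, Doc *out);             /* input -> IR  */
typedef int (*RenderFn)(const Doc *doc, char *out, size_t cap);  /* IR -> output */

typedef struct { const char *name; ParseFn parse; } Parser;
typedef struct { const char *name; RenderFn render; } Backend;

/* Stub parsers: a real one would fill the IR from its own format. */
static int parse_ps(const char *in, Doc *d)  { (void)in; d->page_count = 1; return 0; }
static int parse_pdf(const char *in, Doc *d) { (void)in; d->page_count = 1; return 0; }
static int parse_dml(const char *in, Doc *d) { (void)in; d->page_count = 1; return 0; }

/* Stub backends: each one only ever reads the IR. */
static int render_pdf(const Doc *d, char *o, size_t n) { snprintf(o, n, "pdf:%d", d->page_count); return 0; }
static int render_svg(const Doc *d, char *o, size_t n) { snprintf(o, n, "svg:%d", d->page_count); return 0; }
static int render_png(const Doc *d, char *o, size_t n) { snprintf(o, n, "png:%d", d->page_count); return 0; }

/* 3 parsers + 3 backends = 6 implementations covering 9 conversions. */
static const Parser PARSERS[] = {
    {"ps", parse_ps}, {"pdf", parse_pdf}, {"dml", parse_dml}
};
static const Backend BACKENDS[] = {
    {"pdf", render_pdf}, {"svg", render_svg}, {"png", render_png}
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Routes any input format to any output format through the IR.
   Returns 0 on success, -1 on an unknown format name or a parse failure. */
int convert(const char *from, const char *to, const char *input,
            char *out, size_t cap) {
    assert(from && to && out);
    for (size_t i = 0; i < COUNT(PARSERS); i++) {
        if (strcmp(PARSERS[i].name, from) != 0) continue;
        for (size_t j = 0; j < COUNT(BACKENDS); j++) {
            if (strcmp(BACKENDS[j].name, to) != 0) continue;
            Doc doc = {0};
            if (PARSERS[i].parse(input, &doc) != 0) return -1;
            return BACKENDS[j].render(&doc, out, cap);
        }
    }
    return -1;
}
```

Adding a fourth input format in this sketch means adding one row to `PARSERS`; none of the backends change.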

Deep Theoretical Foundation

1. The Graphics Model

Your unified graphics model must capture all operations from all input formats:

┌─────────────────────────────────────────────────────────────────┐
│                     UNIFIED GRAPHICS MODEL                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DOCUMENT                                                       │
│  ├── Metadata (title, author, creation date)                    │
│  ├── Resources (fonts, images, color profiles)                  │
│  └── Pages[]                                                    │
│       └── Page                                                  │
│            ├── Size (width, height)                             │
│            ├── Resources (fonts, images)                        │
│            └── Operations[]                                     │
│                 ├── Path Operations                             │
│                 │    ├── MoveTo(x, y)                           │
│                 │    ├── LineTo(x, y)                           │
│                 │    ├── CurveTo(x1,y1, x2,y2, x3,y3)           │
│                 │    └── ClosePath()                            │
│                 ├── Paint Operations                            │
│                 │    ├── Stroke(path, color, width)             │
│                 │    ├── Fill(path, color, rule)                │
│                 │    └── Clip(path, rule)                       │
│                 ├── Text Operations                             │
│                 │    ├── SetFont(name, size)                    │
│                 │    ├── ShowText(string, x, y)                 │
│                 │    └── ShowGlyphs(glyphs[], positions[])      │
│                 ├── Image Operations                            │
│                 │    └── DrawImage(image, matrix)               │
│                 ├── State Operations                            │
│                 │    ├── Save()                                 │
│                 │    ├── Restore()                              │
│                 │    ├── SetColor(color)                        │
│                 │    ├── SetLineWidth(width)                    │
│                 │    └── ConcatMatrix(matrix)                   │
│                 └── Group Operations                            │
│                      ├── BeginGroup(transparency)               │
│                      └── EndGroup()                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

2. Pipeline Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       PROCESSING PIPELINE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Stage 1: INPUT PARSING                                         │
│  ──────────────────────                                         │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ PS Parser    │  │ PDF Parser   │  │ Markup Parser│           │
│  │              │  │              │  │              │           │
│  │ Tokenize     │  │ Parse xref   │  │ Parse custom │           │
│  │ Execute      │  │ Dereference  │  │ syntax       │           │
│  │ Capture ops  │  │ Extract ops  │  │ Build ops    │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
│         │                 │                 │                   │
│         └────────────┬────┴─────────────────┘                   │
│                      ▼                                          │
│  Stage 2: GRAPHICS MODEL                                        │
│  ───────────────────────                                        │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                     Document Object                     │    │
│  │                                                         │    │
│  │  • Page list with operations                            │    │
│  │  • Font references                                      │    │
│  │  • Image data                                           │    │
│  │  • Metadata                                             │    │
│  └─────────────────────────────────────────────────────────┘    │
│                      │                                          │
│         ┌────────────┼────────────┬────────────┐                │
│         ▼            ▼            ▼            ▼                │
│  Stage 3: OUTPUT RENDERING                                      │
│  ─────────────────────────                                      │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │PDF Writer│  │SVG Writer│  │PNG Render│  │PCL Writer│         │
│  │          │  │          │  │          │  │          │         │
│  │Generate  │  │Generate  │  │Rasterize │  │Generate  │         │
│  │PDF objs  │  │SVG XML   │  │to bitmap │  │printer   │         │
│  │+ xref    │  │          │  │          │  │commands  │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

3. The Device Abstraction

Like Ghostscript, your system uses a device abstraction:

// Abstract output device interface
typedef struct Device {
    // Device identification
    const char* name;
    DeviceType type;  // VECTOR, RASTER, STREAM

    // Device capabilities
    bool supports_color;
    bool supports_transparency;
    int max_resolution;

    // Device procedures
    int (*open)(struct Device* dev, const char* output);
    int (*close)(struct Device* dev);

    // Graphics operations
    int (*begin_page)(struct Device* dev, double width, double height);
    int (*end_page)(struct Device* dev);

    int (*set_color)(struct Device* dev, Color color);
    int (*set_line_width)(struct Device* dev, double width);
    int (*set_transform)(struct Device* dev, Matrix matrix);

    int (*move_to)(struct Device* dev, double x, double y);
    int (*line_to)(struct Device* dev, double x, double y);
    int (*curve_to)(struct Device* dev, double x1, double y1,
                    double x2, double y2, double x3, double y3);
    int (*close_path)(struct Device* dev);

    int (*stroke)(struct Device* dev);
    int (*fill)(struct Device* dev, FillRule rule);

    int (*draw_text)(struct Device* dev, const char* text,
                     double x, double y, Font* font);
    int (*draw_image)(struct Device* dev, Image* img, Matrix transform);

    int (*save)(struct Device* dev);
    int (*restore)(struct Device* dev);

    // Device-specific data
    void* private_data;
} Device;
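
To show how a table of function pointers lets one loop drive any backend, here is a minimal sketch of replaying recorded operations through a device. It uses reduced, hypothetical types (`MiniDevice`, `MiniOp`, `TraceBuf`, `mini_replay`, `trace_device`) rather than the full `Device` above, and the example backend simply writes the PDF path operators `m`, `l`, and `S` into a fixed-size text buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reduced, hypothetical version of the Device interface. */
typedef struct MiniDevice MiniDevice;
struct MiniDevice {
    int (*move_to)(MiniDevice *dev, double x, double y);
    int (*line_to)(MiniDevice *dev, double x, double y);
    int (*stroke)(MiniDevice *dev);
    void *private_data;
};

typedef enum { MINI_MOVE_TO, MINI_LINE_TO, MINI_STROKE } MiniOpType;
typedef struct { MiniOpType type; double x, y; } MiniOp;

/* Replays operations in order; stops at the first failing call. */
int mini_replay(MiniDevice *dev, const MiniOp *ops, size_t count) {
    assert(dev && (ops || count == 0));
    for (size_t i = 0; i < count; i++) {
        int rc = 0;
        switch (ops[i].type) {
            case MINI_MOVE_TO: rc = dev->move_to(dev, ops[i].x, ops[i].y); break;
            case MINI_LINE_TO: rc = dev->line_to(dev, ops[i].x, ops[i].y); break;
            case MINI_STROKE:  rc = dev->stroke(dev); break;
        }
        if (rc != 0) return rc;
    }
    return 0;
}

/* Example backend state: a small text buffer. */
typedef struct { char text[128]; size_t len; } TraceBuf;

static int trace_append(MiniDevice *dev, const char *s) {
    TraceBuf *b = dev->private_data;
    size_t n = strlen(s);
    if (b->len + n >= sizeof b->text) return -1;  /* buffer full */
    memcpy(b->text + b->len, s, n + 1);           /* copy including the NUL */
    b->len += n;
    return 0;
}

static int trace_move_to(MiniDevice *dev, double x, double y) {
    char line[48];
    snprintf(line, sizeof line, "%g %g m\n", x, y);
    return trace_append(dev, line);
}

static int trace_line_to(MiniDevice *dev, double x, double y) {
    char line[48];
    snprintf(line, sizeof line, "%g %g l\n", x, y);
    return trace_append(dev, line);
}

static int trace_stroke(MiniDevice *dev) {
    return trace_append(dev, "S\n");
}

/* Builds a device whose procedures write into `buf`. */
MiniDevice trace_device(TraceBuf *buf) {
    MiniDevice dev = { trace_move_to, trace_line_to, trace_stroke, buf };
    return dev;
}
```

`mini_replay` never mentions PDF. Swapping in an SVG or raster backend only means filling the same three pointers with different functions.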

Project Specification

Core Features

Input Formats

  1. PostScript (subset)
    • Path operations
    • Paint operations
    • Transformations
    • Basic text
  2. PDF (subset)
    • Object parsing
    • Content stream interpretation
    • Basic fonts
  3. Custom Markup Language
    • Simple, readable syntax for document creation
    • Example:
      page 612x792
      rect 100 100 200 300 fill=#ff0000
      text "Hello" at 200 500 font="Helvetica" size=24
      line from 0 0 to 612 792 stroke=#000000 width=2
      endpage
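
Because each statement in this markup sits on its own line, a line-at-a-time parser built on `sscanf` may be enough to get started. The sketch below handles only the `rect` statement. The `DmlRect` struct and `dml_parse_rect` function are hypothetical names, and reading the four numbers as x, y, width, height is an assumption, since the example above does not define their meaning.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    double x, y, width, height;  /* assumed meaning of the four numbers */
    unsigned r, g, b;            /* fill color, 0-255 per channel */
} DmlRect;

/* Parses one line of the form: rect X Y W H fill=#RRGGBB
   Returns 0 on success, -1 if the line does not match. */
int dml_parse_rect(const char *line, DmlRect *out) {
    assert(line && out);
    /* The leading space in the format skips any indentation. */
    int fields = sscanf(line, " rect %lf %lf %lf %lf fill=#%2x%2x%2x",
                        &out->x, &out->y, &out->width, &out->height,
                        &out->r, &out->g, &out->b);
    return fields == 7 ? 0 : -1;
}
```

A fuller parser would dispatch on the first word of each line (`page`, `rect`, `text`, `line`, `endpage`) and report the line number on failure.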
      

Output Formats

  1. PDF
    • Valid PDF 1.4+ output
    • Optional compression
  2. SVG
    • Vector graphics output
    • Viewable in browsers
  3. PNG
    • Rasterized output
    • Configurable DPI
  4. Printer (optional)
    • PCL or ESC/P commands
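
For the PNG backend, the configurable DPI determines the bitmap size. PostScript and PDF coordinates are in points, with 72 points per inch, so a page dimension converts to pixels as points × DPI / 72. A minimal sketch follows; the function name `points_to_pixels` is hypothetical, and rounding up is one possible policy.

```c
#include <assert.h>

/* PostScript and PDF user space: 72 points per inch. */
#define POINTS_PER_INCH 72.0

/* Pixels needed to cover `points` at `dpi`, rounded up so that no
   part of the page is cut off. */
int points_to_pixels(double points, int dpi) {
    assert(points >= 0.0 && dpi > 0);
    double exact = points * dpi / POINTS_PER_INCH;
    int whole = (int)exact;                 /* truncates toward zero */
    return (exact > whole) ? whole + 1 : whole;
}
```

For the 612x792 page used in the markup example, 300 DPI gives a 2550x3300 pixel bitmap and 150 DPI gives 1275x1650.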

Command-Line Interface

# Convert PostScript to PDF
./docpipe convert input.ps -o output.pdf

# Convert PDF to PNG at 300 DPI
./docpipe convert input.pdf -o output.png --dpi 300

# Convert custom markup to SVG
./docpipe convert input.dml -o output.svg

# Specify input/output formats explicitly
./docpipe convert --from=postscript --to=pdf input.ps output.pdf

# Multi-page to single pages
./docpipe convert input.pdf -o page_%d.png

# List supported formats
./docpipe formats

Web Interface (Optional)

┌─────────────────────────────────────────────────────────────────┐
│  Document Processing Pipeline                            [?]    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Upload Document                                          │  │
│  │  ───────────────                                          │  │
│  │  [    Drag & drop or click to select    ]                 │  │
│  │                                                           │  │
│  │  Supported: .ps, .pdf, .dml                               │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Output Format:  [PDF ▼]     DPI: [150]     Pages: [All ▼]      │
│                                                                 │
│  [     Convert     ]                                            │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Preview                                                  │  │
│  │  ───────                                                  │  │
│  │                                                           │  │
│  │  [    Rendered preview will appear here    ]              │  │
│  │                                                           │  │
│  │  [< Page 1 of 3 >]          [Download PDF]                │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                  DOCUMENT PROCESSING PIPELINE                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       CLI / Web API                       │  │
│  │                                                           │  │
│  │  docpipe convert input.ps -o output.pdf                   │  │
│  │  POST /api/convert { file, output_format }                │  │
│  └───────────────────────────────────────────────────────────┘  │
│                            │                                    │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      FORMAT DETECTOR                      │  │
│  │                                                           │  │
│  │  Detect input format from extension or magic bytes        │  │
│  │  Select appropriate parser                                │  │
│  └───────────────────────────────────────────────────────────┘  │
│                            │                                    │
│         ┌──────────────────┼──────────────────┐                 │
│         ▼                  ▼                  ▼                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │ PS Parser   │    │ PDF Parser  │    │ DML Parser  │          │
│  │             │    │             │    │             │          │
│  │ Stack-based │    │ Object-     │    │ Line-based  │          │
│  │ interpreter │    │ oriented    │    │ parser      │          │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘          │
│         │                  │                  │                 │
│         └──────────────────┼──────────────────┘                 │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      GRAPHICS MODEL                       │  │
│  │                                                           │  │
│  │  Document                                                 │  │
│  │  ├── metadata: {...}                                      │  │
│  │  ├── fonts: [Font, ...]                                   │  │
│  │  ├── images: [Image, ...]                                 │  │
│  │  └── pages: [                                             │  │
│  │       Page { size, operations: [Op, ...] }                │  │
│  │     ]                                                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                            │                                    │
│         ┌──────────────────┼──────────────────┬────────┐        │
│         ▼                  ▼                  ▼        ▼        │
│  ┌─────────────┐    ┌─────────────┐    ┌──────────┐   ...       │
│  │ PDF Device  │    │ SVG Device  │    │PNG Device│             │
│  │             │    │             │    │          │             │
│  │ Generate    │    │ Generate    │    │Rasterize │             │
│  │ PDF objects │    │ SVG XML     │    │+ encode  │             │
│  └──────┬──────┘    └──────┬──────┘    └────┬─────┘             │
│         │                  │                │                   │
│         └──────────────────┼────────────────┘                   │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                        OUTPUT FILE                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Data Structures

// Color representation
typedef struct {
    ColorSpace space;  // RGB, CMYK, Gray
    union {
        struct { double r, g, b; } rgb;
        struct { double c, m, y, k; } cmyk;
        double gray;
    } value;
    double alpha;
} Color;

// 2D transformation matrix
typedef struct {
    double a, b, c, d, tx, ty;
} Matrix;

// Path segment
typedef enum {
    PATH_MOVE, PATH_LINE, PATH_CURVE, PATH_CLOSE
} PathSegmentType;

typedef struct {
    PathSegmentType type;
    double x, y;
    double x1, y1, x2, y2;  // Control points for curves
} PathSegment;

// Path
typedef struct {
    PathSegment* segments;
    size_t count;
    size_t capacity;
} Path;

// Graphics operation
typedef enum {
    OP_MOVE_TO, OP_LINE_TO, OP_CURVE_TO, OP_CLOSE_PATH,
    OP_STROKE, OP_FILL, OP_CLIP,
    OP_SET_COLOR, OP_SET_LINE_WIDTH, OP_SET_TRANSFORM,
    OP_DRAW_TEXT, OP_DRAW_IMAGE,
    OP_SAVE, OP_RESTORE
} OpType;

typedef struct {
    OpType type;
    union {
        struct { double x, y; } point;
        struct { double x1, y1, x2, y2, x3, y3; } curve;
        struct { Color color; } set_color;
        struct { double width; } set_line_width;
        struct { Matrix matrix; } set_transform;
        struct { char* text; double x, y; int font_id; } draw_text;
        struct { int image_id; Matrix transform; } draw_image;
        struct { FillRule rule; } fill;
    } data;
} Operation;

// Page
typedef struct {
    double width, height;
    Operation* operations;
    size_t op_count;
    size_t op_capacity;
} Page;

// Font resource
typedef struct {
    int id;
    char* name;
    char* family;
    double size;
    unsigned char* data;  // Embedded font data
    size_t data_len;
} Font;

// Image resource
typedef struct {
    int id;
    int width, height;
    int channels;
    unsigned char* data;
    size_t data_len;
    ImageFormat format;  // RAW, JPEG, PNG
} Image;

// Document (the unified representation)
typedef struct {
    char* title;
    char* author;
    time_t creation_date;

    Font* fonts;
    size_t font_count;

    Image* images;
    size_t image_count;

    Page* pages;
    size_t page_count;
} Document;
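
The six-field `Matrix` is what makes ConcatMatrix and DrawImage work, so it helps to see the arithmetic. The sketch below assumes the fields follow the PostScript/PDF convention [a b c d tx ty], where a point maps as x' = a·x + c·y + tx and y' = b·x + d·y + ty. The function names `matrix_multiply` and `matrix_apply` are hypothetical.

```c
#include <assert.h>

/* Same fields as the Matrix struct above; PostScript/PDF layout assumed. */
typedef struct { double a, b, c, d, tx, ty; } Matrix;

/* Returns the matrix that applies `first`, then `second`. */
Matrix matrix_multiply(Matrix first, Matrix second) {
    Matrix r;
    r.a  = first.a  * second.a + first.b  * second.c;
    r.b  = first.a  * second.b + first.b  * second.d;
    r.c  = first.c  * second.a + first.d  * second.c;
    r.d  = first.c  * second.b + first.d  * second.d;
    r.tx = first.tx * second.a + first.ty * second.c + second.tx;
    r.ty = first.tx * second.b + first.ty * second.d + second.ty;
    return r;
}

/* Transforms the point (x, y) by `m`. */
void matrix_apply(Matrix m, double x, double y, double *out_x, double *out_y) {
    assert(out_x && out_y);
    *out_x = m.a * x + m.c * y + m.tx;
    *out_y = m.b * x + m.d * y + m.ty;
}
```

As a check: translating by (10, 20) and then scaling by 2 sends the point (1, 1) to (11, 21) and then to (22, 42), which is what the combined matrix produces in one step.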

Implementation Guide

Phase 1: Graphics Model (Week 1)

Start with the core data structures:

// graphics.h - Graphics model API

Document* doc_create(void);
void doc_destroy(Document* doc);

Page* doc_add_page(Document* doc, double width, double height);
int doc_add_font(Document* doc, const char* name, const char* family);
int doc_add_image(Document* doc, int width, int height,
                  const unsigned char* data, size_t len);

void page_move_to(Page* page, double x, double y);
void page_line_to(Page* page, double x, double y);
void page_curve_to(Page* page, double x1, double y1,
                   double x2, double y2, double x3, double y3);
void page_close_path(Page* page);

void page_stroke(Page* page);
void page_fill(Page* page, FillRule rule);

void page_set_color(Page* page, Color color);
void page_set_line_width(Page* page, double width);
void page_set_transform(Page* page, Matrix m);

void page_draw_text(Page* page, const char* text,
                    double x, double y, int font_id);
void page_draw_image(Page* page, int image_id, Matrix transform);

void page_save(Page* page);
void page_restore(Page* page);
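
Each `page_*` call in this API appends one entry to the page's operation list, so the list has to grow on demand. Here is a minimal sketch of that append step using a doubling array. The `Sk`-prefixed types and functions are hypothetical, reduced versions of `Page` and `Operation`. Unlike the `void` prototypes above, these return an `int` so an out-of-memory condition can be reported, which is a design choice of this sketch and not something the original API specifies.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { SK_MOVE_TO, SK_LINE_TO } SkOpType;
typedef struct { SkOpType type; double x, y; } SkOp;
typedef struct { SkOp *ops; size_t count, capacity; } SkPage;

/* Appends one operation, doubling capacity when the array is full.
   Returns 0 on success, -1 if memory could not be allocated. */
static int sk_page_append(SkPage *page, SkOp op) {
    assert(page);
    if (page->count == page->capacity) {
        size_t new_cap = page->capacity ? page->capacity * 2 : 16;
        SkOp *grown = realloc(page->ops, new_cap * sizeof *grown);
        if (!grown) return -1;   /* the old array is still valid */
        page->ops = grown;
        page->capacity = new_cap;
    }
    page->ops[page->count++] = op;
    return 0;
}

int sk_page_move_to(SkPage *page, double x, double y) {
    SkOp op = { SK_MOVE_TO, x, y };
    return sk_page_append(page, op);
}

int sk_page_line_to(SkPage *page, double x, double y) {
    SkOp op = { SK_LINE_TO, x, y };
    return sk_page_append(page, op);
}

/* Releases the operation list and resets the page to empty. */
void sk_page_free(SkPage *page) {
    assert(page);
    free(page->ops);
    page->ops = NULL;
    page->count = page->capacity = 0;
}
```

A zero-initialized `SkPage` is a valid empty page, so no separate init function is needed in this sketch.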

Phase 2: Output Devices (Weeks 2-3)

Implement the device interface for each output format:

// device.h - Device abstraction

typedef struct Device Device;

// Create devices
Device* pdf_device_create(const char* output_path);
Device* svg_device_create(const char* output_path);
Device* png_device_create(const char* output_path, int dpi);

// Common device operations
void device_destroy(Device* dev);
int device_render_document(Device* dev, Document* doc);

// --- PDF Device Implementation ---

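typedef struct {
    Device base;       // Must be the first member so Device* and PDFDevice* can be cast
    FILE* file;
    Buffer* content;   // Content stream for the current page (Buffer: a growable
                       // byte buffer assumed to be defined elsewhere in the project)
    long* offsets;     // Byte offset of each object, needed for the xref table
    int num_objects;
    int next_obj_num;
    // PDF-specific state
} PDFDevice;

static int pdf_begin_page(Device* dev, double width, double height) {
    PDFDevice* pdf = (PDFDevice*)dev;
    // Start accumulating content stream
    (void)pdf; (void)width; (void)height;  // Unused until the body is filled in
    return 0;
}

static int pdf_stroke(Device* dev) {
    PDFDevice* pdf = (PDFDevice*)dev;
    // Add "S" to content stream
    buffer_append(pdf->content, "S\n");
    return 0;
}

// ... implement all operations ...

Device* pdf_device_create(const char* output_path) {
    PDFDevice* pdf = calloc(1, sizeof(PDFDevice));
    if (!pdf) return NULL;

    pdf->base.name = "pdfwrite";
    pdf->base.begin_page = pdf_begin_page;
    pdf->base.end_page = pdf_end_page;
    pdf->base.stroke = pdf_stroke;
    // ... set all function pointers ...

    pdf->file = fopen(output_path, "wb");
    if (!pdf->file) {
        // Fail cleanly instead of returning a device with no output file
        free(pdf);
        return NULL;
    }
    return (Device*)pdf;
}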
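
The `offsets` array in `PDFDevice` exists so the writer can emit the cross-reference table when the file is finished. In the classic (non-stream) xref format, every entry is exactly 20 bytes: a 10-digit byte offset, a 5-digit generation number, a flag (`n` for in use, `f` for free), and a two-character line ending. Below is a minimal sketch that formats such a table into a buffer. The function name `format_xref` is hypothetical, and a real writer would follow the table with the `trailer` dictionary, `startxref`, and `%%EOF`.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a classic xref table for objects 1..count, where offsets[i]
   is the byte offset of object i+1. Object 0 is the mandatory free
   entry. Returns the number of bytes written (excluding the NUL),
   or -1 if `buf` is too small. */
int format_xref(char *buf, size_t cap, const long *offsets, int count) {
    assert(buf && offsets && count >= 0);
    int used = snprintf(buf, cap, "xref\n0 %d\n0000000000 65535 f \n", count + 1);
    if (used < 0 || (size_t)used >= cap) return -1;

    for (int i = 0; i < count; i++) {
        size_t room = cap - (size_t)used;
        /* "space + newline" is one of the permitted 2-byte line endings. */
        int n = snprintf(buf + used, room, "%010ld 00000 n \n", offsets[i]);
        if (n < 0 || (size_t)n >= room) return -1;
        used += n;
    }
    return used;
}
```

Recording `ftell(file)` just before writing each object is one straightforward way to fill `offsets`.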

Phase 3: Input Parsers (Weeks 4-6)

Implement parsers that produce the graphics model:

// parser.h - Input format parsers

typedef enum {
    FORMAT_UNKNOWN,
    FORMAT_POSTSCRIPT,
    FORMAT_PDF,
    FORMAT_DML
} InputFormat;

InputFormat detect_format(const char* filename, const unsigned char* data, size_t len);

Document* parse_postscript(const char* filename);
Document* parse_pdf(const char* filename);
Document* parse_dml(const char* filename);

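// Unified parse function
Document* parse_document(const char* filename) {
    FILE* f = fopen(filename, "rb");
    if (!f) {
        fprintf(stderr, "Cannot open %s\n", filename);
        return NULL;
    }

    // Read only the first bytes; files shorter than the buffer are fine
    unsigned char header[32];
    size_t header_len = fread(header, 1, sizeof(header), f);
    fclose(f);

    // Pass the number of bytes actually read, not the buffer size
    InputFormat format = detect_format(filename, header, header_len);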

    switch (format) {
        case FORMAT_POSTSCRIPT:
            return parse_postscript(filename);
        case FORMAT_PDF:
            return parse_pdf(filename);
        case FORMAT_DML:
            return parse_dml(filename);
        default:
            fprintf(stderr, "Unknown format: %s\n", filename);
            return NULL;
    }
}
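
`detect_format` is declared above but not shown. One possible implementation checks magic bytes first and falls back to the file extension. This is a sketch under assumptions: PDF files begin with `%PDF-`, PostScript files conventionally begin with `%!` (for example `%!PS-Adobe-3.0`), and the custom markup has no signature, so it is recognized by its `.dml` extension only. The helpers `has_prefix` and `has_extension` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    FORMAT_UNKNOWN,
    FORMAT_POSTSCRIPT,
    FORMAT_PDF,
    FORMAT_DML
} InputFormat;

/* True if the first bytes of `data` equal the string `magic`. */
static int has_prefix(const unsigned char *data, size_t len, const char *magic) {
    size_t n = strlen(magic);
    return len >= n && memcmp(data, magic, n) == 0;
}

/* True if `filename` ends in `ext` (including the dot, case-sensitive). */
static int has_extension(const char *filename, const char *ext) {
    const char *dot = strrchr(filename, '.');
    return dot != NULL && strcmp(dot, ext) == 0;
}

InputFormat detect_format(const char *filename,
                          const unsigned char *data, size_t len) {
    assert(filename && (data || len == 0));

    /* File content is more trustworthy than the name. "%PDF-" is
       tested before "%!" only for clarity; neither prefixes the other. */
    if (has_prefix(data, len, "%PDF-")) return FORMAT_PDF;
    if (has_prefix(data, len, "%!"))    return FORMAT_POSTSCRIPT;

    /* No signature matched: fall back to the extension. */
    if (has_extension(filename, ".pdf")) return FORMAT_PDF;
    if (has_extension(filename, ".ps"))  return FORMAT_POSTSCRIPT;
    if (has_extension(filename, ".dml")) return FORMAT_DML;
    return FORMAT_UNKNOWN;
}
```

Checking content first means a PDF that was renamed to `.ps` is still parsed as a PDF.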

Phase 4: CLI Tool (Week 7)

// main.c - Command-line interface

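#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>

#include "graphics.h"  // Document, doc_destroy
#include "device.h"    // Device, *_device_create, device_render_document
#include "parser.h"    // parse_document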

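void print_usage(const char* prog) {
    printf("Usage: %s convert [options] input [output]\n", prog);
    printf("\nOptions:\n");
    printf("  -o, --output=FILE  Output file (or give it as the second argument)\n");
    printf("  -f, --from=FORMAT  Input format (ps, pdf, dml)\n");
    printf("  -t, --to=FORMAT    Output format (pdf, svg, png)\n");
    printf("  -d, --dpi=N        DPI for raster output (default: 150)\n");
    printf("  -p, --pages=RANGE  Pages to convert (e.g., 1-5 or 2,4,6)\n");
    printf("  -v, --verbose      Verbose output\n");
    printf("  -h, --help         Show this help\n");
}

int cmd_convert(int argc, char** argv) {
    // argv[0] here is the subcommand ("convert"), so use the tool name in messages
    const char* prog = "docpipe";
    int dpi = 150;
    int verbose = 0;
    const char* from_format = NULL;  // Parsed but not yet passed to the parser
    const char* to_format = NULL;
    const char* pages = "all";       // Parsed but not yet passed to the renderer
    const char* output = NULL;

    static struct option long_options[] = {
        {"output", required_argument, 0, 'o'},
        {"from", required_argument, 0, 'f'},
        {"to", required_argument, 0, 't'},
        {"dpi", required_argument, 0, 'd'},
        {"pages", required_argument, 0, 'p'},
        {"verbose", no_argument, 0, 'v'},
        {"help", no_argument, 0, 'h'},
        {0, 0, 0, 0}
    };

    // GNU getopt_long reorders argv, so options may follow the input file
    int c;
    while ((c = getopt_long(argc, argv, "o:f:t:d:p:vh", long_options, NULL)) != -1) {
        switch (c) {
            case 'o': output = optarg; break;
            case 'f': from_format = optarg; break;
            case 't': to_format = optarg; break;
            case 'd': dpi = atoi(optarg); break;
            case 'p': pages = optarg; break;
            case 'v': verbose = 1; break;
            case 'h': print_usage(prog); return 0;
            default:  print_usage(prog); return 1;  // Unknown option
        }
    }
    (void)from_format; (void)pages;

    if (dpi <= 0) {
        fprintf(stderr, "Error: --dpi must be a positive integer\n");
        return 1;
    }

    // Input is positional; output comes from -o or the second positional argument
    const char* input = (optind < argc) ? argv[optind] : NULL;
    if (!output && optind + 1 < argc) {
        output = argv[optind + 1];
    }
    if (!input || !output) {
        fprintf(stderr, "Error: Missing input or output file\n");
        print_usage(prog);
        return 1;
    }

    if (verbose) {
        printf("Converting %s to %s\n", input, output);
    }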

    // Parse input
    Document* doc = parse_document(input);
    if (!doc) {
        fprintf(stderr, "Error: Failed to parse %s\n", input);
        return 1;
    }

    // Create output device: an explicit --to wins, otherwise use the file extension
    const char* format = to_format;
    if (!format) {
        const char* ext = strrchr(output, '.');
        if (ext) format = ext + 1;
    }

    Device* dev = NULL;
    if (format && strcmp(format, "pdf") == 0) {
        dev = pdf_device_create(output);
    } else if (format && strcmp(format, "svg") == 0) {
        dev = svg_device_create(output);
    } else if (format && strcmp(format, "png") == 0) {
        dev = png_device_create(output, dpi);
    }

    if (!dev) {
        fprintf(stderr, "Error: Unknown output format\n");
        doc_destroy(doc);
        return 1;
    }

    // Render
    int result = device_render_document(dev, doc);

    // Cleanup
    device_destroy(dev);
    doc_destroy(doc);

    if (result != 0) {
        fprintf(stderr, "Error: Failed to render %s\n", output);
        return 1;
    }
    if (verbose) {
        printf("Successfully created %s\n", output);
    }

    return 0;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        print_usage(argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "convert") == 0) {
        return cmd_convert(argc - 1, argv + 1);
    } else if (strcmp(argv[1], "formats") == 0) {
        printf("Input formats: ps, pdf, dml\n");
        printf("Output formats: pdf, svg, png\n");
        return 0;
    } else {
        fprintf(stderr, "Unknown command: %s\n", argv[1]);
        print_usage(argv[0]);
        return 1;
    }
}
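
The `--pages` option is accepted by the CLI, and its help text gives `1-5` and `2,4,6` as examples. As a rough prototype of how such a range might be interpreted, here is a small sketch, written in Python for brevity before porting to C. `parse_page_range` is a hypothetical helper, not part of the project's API, and the accepted grammar ("all", single pages, ranges, comma-separated lists, 1-based numbering) is an assumption inferred from the help text.

```python
def parse_page_range(spec, page_count):
    """Return the sorted 1-based page numbers selected by spec.

    Accepts "all", single pages ("3"), ranges ("1-5"), and comma lists
    ("2,4,6" or "1-3,7"). Raises ValueError on malformed or
    out-of-range input.
    """
    if spec.strip().lower() == "all":
        return list(range(1, page_count + 1))

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page range: {spec!r}")
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)  # ValueError if not numeric
        else:
            first = last = int(part)
        if first < 1 or last > page_count or first > last:
            raise ValueError(f"page range out of bounds: {part!r}")
        selected.update(range(first, last + 1))
    return sorted(selected)
```

Returning a sorted, de-duplicated list keeps the rendering loop simple: it can walk the document once and skip any page not in the list.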

Phase 5: Web Interface (Week 8)

# web/app.py - Flask web interface

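from flask import Flask, request, jsonify, send_file
import io
import os
import subprocess
import tempfile

app = Flask(__name__)

# Supported output formats and their file extensions
OUTPUT_EXTENSIONS = {'pdf': '.pdf', 'svg': '.svg', 'png': '.png'}

@app.route('/')
def index():
    return app.send_static_file('index.html')

@app.route('/api/convert', methods=['POST'])
def convert():
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400

    file = request.files['file']
    output_format = request.form.get('format', 'pdf')

    # Reject unsupported formats and invalid DPI before touching the disk
    if output_format not in OUTPUT_EXTENSIONS:
        return jsonify({'error': f'Unsupported format: {output_format}'}), 400
    try:
        dpi = int(request.form.get('dpi', '150'))
    except ValueError:
        return jsonify({'error': 'dpi must be an integer'}), 400
    if dpi <= 0:
        return jsonify({'error': 'dpi must be positive'}), 400

    output_ext = OUTPUT_EXTENSIONS[output_format]

    # Save uploaded file, keeping its extension
    input_ext = os.path.splitext(file.filename or '')[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=input_ext) as tmp:
        file.save(tmp)
        input_path = tmp.name

    # Reserve an output path (mkstemp avoids the race in tempfile.mktemp)
    fd, output_path = tempfile.mkstemp(suffix=output_ext)
    os.close(fd)

    try:
        # Run converter; options go before the file names so that getopt
        # implementations that stop at the first non-option still see them
        cmd = ['./docpipe', 'convert', '--to', output_format]
        if output_format == 'png':
            cmd.extend(['--dpi', str(dpi)])
        cmd.extend([input_path, output_path])

        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return jsonify({'error': result.stderr}), 500

        # Read the result into memory so the temp file can be removed
        with open(output_path, 'rb') as f:
            data = f.read()
    finally:
        # Clean up both temp files on every path
        os.unlink(input_path)
        os.unlink(output_path)

    # Return output file
    return send_file(
        io.BytesIO(data),
        as_attachment=True,
        download_name=f'converted{output_ext}'
    )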

@app.route('/api/formats')
def formats():
    return jsonify({
        'input': ['ps', 'pdf', 'dml'],
        'output': ['pdf', 'svg', 'png']
    })

if __name__ == '__main__':
    # debug=True enables the interactive debugger; use it for local development only
    app.run(debug=True, port=8080)

Phase 6: Testing and Polish (Weeks 9-10)

Create comprehensive tests:

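// test_graphics.c - Graphics model tests

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   // system()
#include <unistd.h>   // unlink()
// Also include the project's own headers for Document, Page, and Device.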

void test_document_creation() {
    Document* doc = doc_create();
    assert(doc != NULL);
    assert(doc->page_count == 0);
    doc_destroy(doc);
    printf("PASS: test_document_creation\n");
}

void test_page_operations() {
    Document* doc = doc_create();
    Page* page = doc_add_page(doc, 612, 792);

    page_move_to(page, 100, 100);
    page_line_to(page, 200, 200);
    page_stroke(page);

    assert(page->op_count == 3);
    assert(page->operations[0].type == OP_MOVE_TO);
    assert(page->operations[1].type == OP_LINE_TO);
    assert(page->operations[2].type == OP_STROKE);

    doc_destroy(doc);
    printf("PASS: test_page_operations\n");
}

void test_pdf_output() {
    Document* doc = doc_create();
    Page* page = doc_add_page(doc, 612, 792);

    page_move_to(page, 100, 100);
    page_line_to(page, 200, 200);
    page_stroke(page);

    Device* dev = pdf_device_create("test_output.pdf");
    int result = device_render_document(dev, doc);
    device_destroy(dev);

    assert(result == 0);

    // Validate output with external tool
    int check = system("qpdf --check test_output.pdf > /dev/null 2>&1");
    assert(check == 0);

    unlink("test_output.pdf");
    doc_destroy(doc);
    printf("PASS: test_pdf_output\n");
}

void test_conversion_ps_to_pdf() {
    // Create test PostScript file
    FILE* f = fopen("test_input.ps", "w");
    assert(f != NULL);
    fprintf(f, "%%!PS-Adobe-3.0\n");
    fprintf(f, "100 100 moveto\n");
    fprintf(f, "200 200 lineto\n");
    fprintf(f, "stroke\n");
    fprintf(f, "showpage\n");
    fclose(f);

    Document* doc = parse_postscript("test_input.ps");
    assert(doc != NULL);
    assert(doc->page_count == 1);

    Device* dev = pdf_device_create("test_output.pdf");
    assert(dev != NULL);
    int result = device_render_document(dev, doc);
    device_destroy(dev);
    doc_destroy(doc);
    assert(result == 0);

    // Validate
    int check = system("qpdf --check test_output.pdf > /dev/null 2>&1");
    assert(check == 0);

    unlink("test_input.ps");
    unlink("test_output.pdf");
    printf("PASS: test_conversion_ps_to_pdf\n");
}

Testing Strategy

Unit Tests

  • Test each data structure
  • Test each operation
  • Test each device

Integration Tests

  • Test complete pipelines (input → output)
  • Test all format combinations
  • Compare with reference tools (Ghostscript, Cairo)

Visual Regression Tests

  • Render test documents
  • Compare images pixel-by-pixel
  • Flag regressions
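
A pixel-by-pixel comparison can be sketched as follows. This is a minimal illustration, assuming both renders have already been decoded into raw, tightly packed pixel buffers of the same size (for example RGBA, 4 bytes per pixel). The function names, the per-channel `tolerance`, and the `max_differing_fraction` threshold are hypothetical choices, not part of the project.

```python
def count_pixel_differences(expected, actual, width, height,
                            channels=4, tolerance=0):
    """Count pixels with any channel differing by more than tolerance.

    expected and actual are row-major byte buffers holding
    width * height pixels of `channels` bytes each.
    """
    size = width * height * channels
    if len(expected) != size or len(actual) != size:
        raise ValueError("buffer size does not match image dimensions")

    differing = 0
    for offset in range(0, size, channels):
        for channel in range(channels):
            delta = abs(expected[offset + channel] - actual[offset + channel])
            if delta > tolerance:
                differing += 1
                break  # count each pixel at most once
    return differing


def is_regression(expected, actual, width, height,
                  max_differing_fraction=0.0, **kwargs):
    """Flag a regression when too many pixels differ."""
    differing = count_pixel_differences(expected, actual, width, height, **kwargs)
    return differing > max_differing_fraction * width * height
```

A small nonzero tolerance is often useful in practice, because anti-aliasing can shift edge pixels by a few levels between otherwise identical renders.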

Performance Tests

  • Measure conversion time
  • Profile memory usage
  • Test with large documents
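
One way to collect these numbers is a small harness that runs the converter as a child process and records wall-clock time and peak memory. The sketch below is an assumption-laden starting point: `measure_command` is a hypothetical helper, it relies on the Unix-only `resource` module, and `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS. The value also reflects the largest child process waited for so far, so run each measurement from a fresh harness process when comparing commands.

```python
import resource
import subprocess
import sys
import time


def measure_command(cmd):
    """Run cmd once; return (returncode, elapsed_seconds, peak_rss).

    peak_rss comes from getrusage(RUSAGE_CHILDREN) and is the peak
    resident set size of the largest child waited for so far.
    """
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, elapsed, peak_rss


if __name__ == "__main__":
    # Stand-in command; replace with the real converter invocation.
    print(measure_command([sys.executable, "-c", "pass"]))
```

Repeating each measurement several times and reporting the median helps smooth out noise from disk caches and other processes.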

Common Pitfalls

1. Coordinate System Mismatches

Different formats use different coordinate systems:

  • PostScript/PDF: Origin at bottom-left
  • SVG: Origin at top-left
  • PNG: Origin at top-left

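Pick one orientation for the internal representation (for example, the bottom-left origin that PostScript and PDF share) and apply the y-axis flip in exactly one place per output device, so that parsers and backends never disagree about which way is up.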
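
The flip between a bottom-left and a top-left origin is a single affine transform. The sketch below shows it three ways: on one coordinate, as an SVG `transform` value, and as a general six-element matrix in the `[a b c d e f]` order that PostScript, PDF, and SVG all use. The function names are hypothetical, and the page height of 792 points in the usage note is just the US Letter size used elsewhere in this project.

```python
def flip_y(y, page_height):
    """Convert a y coordinate between bottom-left and top-left origins."""
    return page_height - y


def svg_flip_transform(page_height):
    """SVG transform value mapping bottom-left-origin coordinates
    onto SVG's top-left-origin canvas."""
    return f"matrix(1 0 0 -1 0 {page_height:g})"


def apply_matrix(matrix, x, y):
    """Apply an affine matrix (a, b, c, d, e, f) to a point:
    x' = a*x + c*y + e,  y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)
```

For a 792-point page, `apply_matrix((1, 0, 0, -1, 0, 792), 100, 100)` maps the point to `(100, 692)`. Note that flipping the whole coordinate system also mirrors text and images, so those usually need a compensating flip of their own.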

2. Font Handling Complexity

Fonts are the hardest part:

  • Font subsetting
  • Glyph mapping
  • Embedded vs. system fonts

Start with simple font references, add complexity later.

3. Memory Management

Documents can be large:

  • Use streaming where possible
  • Free resources promptly
  • Consider memory pools
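
As an illustration of the streaming idea, the sketch below deflates a content stream chunk by chunk instead of loading it into memory first, using Python's `zlib` module (the same deflate format that PDF's FlateDecode filter uses). `compress_stream` and its 64 KiB default chunk size are hypothetical; the same shape carries over to the C zlib API.

```python
import io
import zlib


def compress_stream(source, sink, chunk_size=64 * 1024):
    """Deflate source into sink one chunk at a time.

    source and sink are binary file-like objects. Returns
    (bytes_read, bytes_written). Only one chunk of input is held
    in memory at any moment.
    """
    compressor = zlib.compressobj()
    bytes_read = bytes_written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        bytes_read += len(chunk)
        compressed = compressor.compress(chunk)
        sink.write(compressed)
        bytes_written += len(compressed)
    tail = compressor.flush()  # emit whatever the compressor still buffers
    sink.write(tail)
    bytes_written += len(tail)
    return bytes_read, bytes_written
```

Because the function returns the compressed size, a PDF writer could use it to fill in the stream's `/Length` entry after the data has been written.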

4. Thread Safety

For the web interface:

  • Don't share mutable state
  • Use separate processes for conversion
  • Handle concurrent requests
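
The points above can be combined into one pattern: give every request its own temporary directory and its own converter process, so that nothing is shared between concurrent conversions. The sketch below is a minimal version of that idea. `convert_isolated` is a hypothetical helper, the 30-second timeout is an arbitrary assumption, and the converter is assumed to take the input and output paths as its last two arguments.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def convert_isolated(converter_cmd, input_bytes, output_name, timeout=30):
    """Run one conversion in its own process and scratch directory.

    converter_cmd is the command prefix (for example the converter
    binary plus options); the input and output paths are appended.
    Returns the output file's bytes, or raises RuntimeError on failure.
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = Path(workdir) / "input"
        out_path = Path(workdir) / output_name
        in_path.write_bytes(input_bytes)

        # subprocess.run kills the child if the timeout expires
        proc = subprocess.run(
            [*converter_cmd, str(in_path), str(out_path)],
            capture_output=True, text=True, timeout=timeout,
        )
        if proc.returncode != 0:
            raise RuntimeError(proc.stderr.strip() or "conversion failed")
        return out_path.read_bytes()
    # The directory and both files are removed when the block exits.


if __name__ == "__main__":
    # Stand-in "converter" that just copies input to output.
    copy_cmd = [sys.executable, "-c",
                "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"]
    print(convert_isolated(copy_cmd, b"%!PS\nshowpage\n", "out.pdf"))
```

A crash or runaway loop in the converter then affects only the request that triggered it, and the timeout bounds how long a single request can hold a worker.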

Extensions

Level 1: More Input Formats

  • Add SVG input
  • Add HTML input (with simple CSS)

Level 2: More Output Formats

  • Add TIFF output
  • Add EPS output
  • Add PCL for printers

Level 3: Advanced Features

  • PDF encryption
  • PDF/A compliance
  • Transparency groups

Level 4: Optimization

  • Parallel rendering
  • GPU acceleration
  • Caching

Self-Assessment

Before considering this capstone complete:

  • Can convert PostScript to PDF, SVG, and PNG
  • Can convert PDF to PNG
  • Custom markup language works as documented
  • CLI tool handles all options correctly
  • Web interface uploads and downloads correctly
  • All tests pass
  • Performance is acceptable for multi-page documents
  • Documentation is complete

Resources

Architecture References

  • Cairo Graphics Library: https://cairographics.org/
  • Ghostscript Architecture: Study the source
  • PDF.js (Mozilla): JavaScript PDF renderer

Books

  • Software Architecture in Practice by Bass, Clements & Kazman
  • Designing Data-Intensive Applications by Martin Kleppmann
  • Computer Graphics: Principles and Practice by Foley et al.

Libraries

  • Cairo: 2D graphics library
  • FreeType: Font rendering
  • libpng/libjpeg: Image encoding
  • zlib: Compression