Project 3: PostScript-to-PDF Converter

Project Overview

Attribute             Details
Difficulty            Level 4: Expert
Time Estimate         1 month+
Programming Language  C
Knowledge Area        Code Generation / Graphics
Prerequisites         Projects 1 and 2, or solid understanding of both formats

What you’ll build: A mini “Ghostscript” that executes PostScript and outputs a valid PDF file.

Why it teaches PostScript→PDF: This is the exact transformation Ghostscript performs. You’ll execute PS code and instead of drawing to screen, you’ll capture the operations and emit them as PDF content streams.


Learning Objectives

By completing this project, you will:

  1. Execute PostScript programs using a stack-based interpreter
  2. Capture graphics operations during execution (instead of rendering)
  3. Generate valid PDF structure with objects, xref, and trailer
  4. Map PostScript operators to PDF operators correctly
  5. Handle font references between PostScript and PDF
  6. Understand the code generation pattern used in compilers

The Core Question You’re Answering

“How do you transform an executable program (PostScript) into a static document (PDF)?”

This is the fundamental question of PS→PDF conversion. PostScript is a full Turing-complete programming language with loops, conditionals, and procedures. PDF is a static page description format with no flow control. Your converter must:

  1. Execute PostScript code (run the program)
  2. Capture the side effects (what gets drawn)
  3. Serialize those operations into PDF’s declarative format

The paradox: How do you freeze a running program into a static snapshot?

The answer reveals deep truths about interpreters, intermediate representations, and the distinction between computation and presentation.


Deep Theoretical Foundation

1. The Fundamental Transformation

PostScript and PDF look similar but are fundamentally different:

┌─────────────────────────────────────────────────────────────────┐
│                    POSTSCRIPT → PDF                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  POSTSCRIPT (Executable)           PDF (Static)                 │
│  ───────────────────────           ────────────                 │
│                                                                 │
│  % Draw 10 circles                 % No loops, no arc operator: │
│  10 10 100 {                       % each circle = m + 4 c + S  │
│    /r exch def                     10 0 m  <4 c curves>  S      │
│    newpath                         20 0 m  <4 c curves>  S      │
│    0 0 r 0 360 arc                 30 0 m  <4 c curves>  S      │
│    stroke                          40 0 m  <4 c curves>  S      │
│  } for                             50 0 m  <4 c curves>  S      │
│                                    60 0 m  <4 c curves>  S      │
│  7 lines → 10 circles              70 0 m  <4 c curves>  S      │
│                                    80 0 m  <4 c curves>  S      │
│                                    90 0 m  <4 c curves>  S      │
│                                    100 0 m  <4 c curves>  S     │
│                                                                 │
│                                    60 operators → 10 circles    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
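
PDF content streams have path operators for lines (l) and cubic Bézier curves (c) but none for arcs, so a converter has to approximate each PostScript arc. The sketch below shows one common way to do that for a full circle, using four Bézier segments. The function name emit_circle and its buffer-based signature are assumptions made for this illustration, not part of any existing API.

```c
#include <stdio.h>

/* Hypothetical helper: writes the PDF operators for one stroked circle
 * centered at (cx, cy) with radius r into out (cap bytes).
 * Returns the snprintf result: the number of bytes the text needs. */
int emit_circle(char *out, size_t cap, double cx, double cy, double r)
{
    /* Control-point distance for a quarter circle: 4/3 * (sqrt(2) - 1) * r */
    const double k = 0.5522847498 * r;

    return snprintf(out, cap,
        "%.4g %.4g m\n"
        "%.4g %.4g %.4g %.4g %.4g %.4g c\n"
        "%.4g %.4g %.4g %.4g %.4g %.4g c\n"
        "%.4g %.4g %.4g %.4g %.4g %.4g c\n"
        "%.4g %.4g %.4g %.4g %.4g %.4g c\n"
        "S\n",
        cx + r, cy,                                   /* start at angle 0  */
        cx + r, cy + k,  cx + k, cy + r,  cx, cy + r, /* up to 90 degrees  */
        cx - k, cy + r,  cx - r, cy + k,  cx - r, cy, /* up to 180 degrees */
        cx - r, cy - k,  cx - k, cy - r,  cx, cy - r, /* up to 270 degrees */
        cx + k, cy - r,  cx + r, cy - k,  cx + r, cy  /* back to the start */
    );
}
```

Calling emit_circle once per loop iteration, with r taking the values 10, 20, and so on, would produce the flattened PDF column of the diagram. A general arc (arbitrary start and end angles) needs the same idea applied per segment of at most 90 degrees.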

The transformation process:

┌─────────────────────────────────────────────────────────────────┐
│                CONVERSION PIPELINE                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  PostScript Source                                              │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           POSTSCRIPT INTERPRETER                         │   │
│  │                                                          │   │
│  │  - Execute stack operations                              │   │
│  │  - Evaluate loops and conditionals                       │   │
│  │  - Call procedures                                       │   │
│  │  - Update graphics state                                 │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │          OPERATION RECORDER                              │   │
│  │                                                          │   │
│  │  Instead of rendering pixels, capture:                   │   │
│  │  - Path operations (moveto, lineto, curveto)             │   │
│  │  - Paint operations (stroke, fill)                       │   │
│  │  - Color changes (setgray, setrgbcolor)                  │   │
│  │  - State changes (gsave, grestore, translate)            │   │
│  │  - Text operations (show)                                │   │
│  │  - showpage boundaries                                   │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │          PDF GENERATOR                                   │   │
│  │                                                          │   │
│  │  - Build PDF object structure                            │   │
│  │  - Convert recorded operations to PDF operators          │   │
│  │  - Generate content streams                              │   │
│  │  - Create xref table                                     │   │
│  │  - Write trailer                                         │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│                       PDF File                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

2. Graphics Operation Recording

The key insight: Instead of drawing pixels, record the operations.

This is the “output device” abstraction that Ghostscript uses:

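// Forward declaration; the full struct (path, color, recorded ops) is defined elsewhere
typedef struct Device Device;

// Traditional rendering device
typedef struct {
    void (*moveto)(Device* dev, double x, double y);
    void (*lineto)(Device* dev, double x, double y);
    void (*stroke)(Device* dev);
    void (*fill)(Device* dev);
    // ...
} DeviceOps;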

// Pixel rendering implementation
void render_stroke(Device* dev) {
    // Actually draw pixels to bitmap
    rasterize_path(dev->path, dev->color);
}

// PDF capturing implementation
void record_stroke(Device* dev) {
    // Just record "stroke" operation
    add_operation(dev->ops, OP_STROKE, NULL, 0);
}

3. PostScript to PDF Operator Mapping

Most mappings are direct:

PostScript                 PDF                      Notes
moveto                     m                        Identical semantics
lineto                     l                        Identical semantics
curveto                    c                        Identical semantics
closepath                  h                        Identical semantics
arc, arcn                  c (several)              No PDF arc operator; approximate with Bézier curves
stroke                     S                        Identical semantics
fill                       f                        Identical semantics
gsave                      q                        Unlike gsave, q does not save the current path
grestore                   Q                        q and Q must balance within a content stream
setgray                    g (fill) / G (stroke)    PDF separates fill/stroke color
setrgbcolor                rg (fill) / RG (stroke)  PDF separates fill/stroke color
translate, scale, rotate   cm                       Concatenate to CTM
show                       Tj inside BT/ET          PDF requires text mode
showpage                   (page boundary)          Start new page

Key differences:

  1. Color: PostScript has one color for both fill and stroke. PDF separates them.
  2. Text: PostScript uses show directly. PDF requires a BT…ET block.
  3. Loops/Conditionals: Must be executed; output is flattened.
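
The translate and scale rows of the mapping table can be made concrete with a small matrix sketch. Both formats describe a transformation as six numbers [a b c d e f], and both PostScript's transformation operators and PDF's cm pre-multiply the CTM, so each PostScript call can be emitted as one cm line. The type and function names below (Matrix, mat_translate, mat_scale, mat_concat, emit_cm) are assumptions made for this illustration; rotate follows the same pattern with cosine and sine entries and is omitted here.

```c
#include <stdio.h>

/* Transformation matrix [a b c d e f], laid out as in both formats. */
typedef struct { double a, b, c, d, e, f; } Matrix;

Matrix mat_translate(double tx, double ty) {
    return (Matrix){ 1, 0, 0, 1, tx, ty };
}

Matrix mat_scale(double sx, double sy) {
    return (Matrix){ sx, 0, 0, sy, 0, 0 };
}

/* Product m x n in the row-vector convention: a point is transformed
 * by m first, then by n. */
Matrix mat_concat(Matrix m, Matrix n) {
    Matrix r;
    r.a = m.a * n.a + m.b * n.c;
    r.b = m.a * n.b + m.b * n.d;
    r.c = m.c * n.a + m.d * n.c;
    r.d = m.c * n.b + m.d * n.d;
    r.e = m.e * n.a + m.f * n.c + n.e;
    r.f = m.e * n.b + m.f * n.d + n.f;
    return r;
}

/* Writes "a b c d e f cm\n" into out; returns the snprintf result. */
int emit_cm(char *out, size_t cap, Matrix m) {
    return snprintf(out, cap, "%.4g %.4g %.4g %.4g %.4g %.4g cm\n",
                    m.a, m.b, m.c, m.d, m.e, m.f);
}
```

For the PostScript sequence 10 20 translate 2 3 scale, the interpreter's own CTM would be mat_concat(mat_scale(2, 3), mat_translate(10, 20)), while the content stream simply receives two cm lines in the same order as the PostScript calls.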

4. PDF Object Structure

Your converter must generate this structure:

┌─────────────────────────────────────────────────────────────────┐
│                PDF OBJECT HIERARCHY                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Object 1: CATALOG                                              │
│  << /Type /Catalog                                              │
│     /Pages 2 0 R >>                                             │
│           │                                                     │
│           ▼                                                     │
│  Object 2: PAGES TREE                                           │
│  << /Type /Pages                                                │
│     /Kids [3 0 R 5 0 R 7 0 R]  ← One entry per page             │
│     /Count 3 >>                                                 │
│           │                                                     │
│     ┌─────┴─────┬───────────┐                                   │
│     ▼           ▼           ▼                                   │
│  Object 3     Object 5    Object 7                              │
│  PAGE 1       PAGE 2      PAGE 3                                │
│  << /Type /Page                                                 │
│     /Parent 2 0 R                                               │
│     /MediaBox [0 0 612 792]                                     │
│     /Contents 4 0 R                                             │
│     /Resources <<                                               │
│       /Font << /F1 9 0 R >> >> >>                               │
│           │                                                     │
│           ▼                                                     │
│  Object 4: CONTENT STREAM (Page 1)                              │
│  << /Length 26 >>    ← byte count of the stream data            │
│  stream                                                         │
│  q                                                              │
│  100 100 m                                                      │
│  200 200 l                                                      │
│  S                                                              │
│  Q                                                              │
│  endstream                                                      │
│                                                                 │
│  Object 9: FONT                                                 │
│  << /Type /Font                                                 │
│     /Subtype /Type1                                             │
│     /BaseFont /Helvetica >>                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

5. xref Table Generation

You must track byte offsets as you write:

#include <stdio.h>

#define MAX_OBJECTS 1000

// Byte offset of every object, indexed by object number.
// Index 0 is unused: object 0 is always the head of the free list.
static long offsets[MAX_OBJECTS + 1];
static int max_obj_num = 0;

void write_object(FILE* f, int obj_num, const char* content) {
    if (obj_num < 1 || obj_num > MAX_OBJECTS) return;  // out of range

    // Record offset BEFORE writing
    offsets[obj_num] = ftell(f);
    if (obj_num > max_obj_num) max_obj_num = obj_num;

    // Write object
    fprintf(f, "%d 0 obj\n%s\nendobj\n", obj_num, content);
}

void write_xref(FILE* f) {
    long xref_offset = ftell(f);

    fprintf(f, "xref\n");
    fprintf(f, "0 %d\n", max_obj_num + 1);
    fprintf(f, "0000000000 65535 f \n");  // Object 0 is free

    // Entries must be listed in object-number order, whatever order the
    // objects were written in. Assumes objects 1..max_obj_num all exist.
    for (int i = 1; i <= max_obj_num; i++) {
        fprintf(f, "%010ld 00000 n \n", offsets[i]);
    }

    fprintf(f, "trailer\n");
    fprintf(f, "<< /Size %d /Root 1 0 R >>\n", max_obj_num + 1);
    fprintf(f, "startxref\n");
    fprintf(f, "%ld\n", xref_offset);
    fprintf(f, "%%%%EOF\n");
}
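
One detail of the xref table is easy to get wrong: every entry must be exactly 20 bytes long, including its two-character line ending, because readers seek into the table by arithmetic rather than by parsing lines. The sketch below isolates that formatting so it can be checked on its own. The helper name format_xref_entry is an assumption made for this illustration, and the generation number is fixed at 0 as in the code above.

```c
#include <stdio.h>

/* Formats the xref entry for an in-use object located at byte_offset.
 * out must hold at least 21 bytes (20 for the entry plus the NUL).
 * Returns the number of characters written, which should be 20. */
int format_xref_entry(char *out, long byte_offset) {
    /* 10-digit offset, space, 5-digit generation, space, 'n', space, LF */
    return snprintf(out, 21, "%010ld %05d n \n", byte_offset, 0);
}
```

A converter could assert that the return value is 20 before writing each entry, which catches offsets that no longer fit in ten digits.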

Project Specification

Core Features

Your PS→PDF converter must:

  1. Parse PostScript
    • Tokenize numbers, names, strings, procedures
    • Handle comments (% to end of line)
  2. Execute PostScript
    • Implement stack operations (dup, exch, pop, roll, etc.)
    • Implement arithmetic (add, sub, mul, div, etc.)
    • Implement path operations (moveto, lineto, curveto, closepath)
    • Implement paint operations (stroke, fill)
    • Implement state operations (gsave, grestore, translate, scale, rotate)
    • Implement color operations (setgray, setrgbcolor)
    • Detect page boundaries (showpage)
  3. Record Operations
    • Capture all graphics operations during execution
    • Track page boundaries
    • Record font usage
  4. Generate PDF
    • Write valid PDF header
    • Create catalog, pages tree, page objects
    • Generate content streams from recorded operations
    • Create font resource dictionaries
    • Build xref table with correct byte offsets
    • Write trailer
  5. Handle Fonts
    • Map standard font names (Helvetica, Times-Roman, Courier)
    • Create font resource dictionaries

Command-Line Interface

# Basic conversion
./ps2pdf input.ps output.pdf

# With verbose logging
./ps2pdf --verbose input.ps output.pdf

# With compression
./ps2pdf --compress input.ps output.pdf
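
The three invocations above can be handled with a few lines of argument parsing. The sketch below is one possible shape, assuming the two flags may appear anywhere before or between the file names; the Options struct and the parse_args function are names invented for this illustration.

```c
#include <string.h>

typedef struct {
    int verbose;          /* set by --verbose */
    int compress;         /* set by --compress */
    const char *input;    /* first non-flag argument */
    const char *output;   /* second non-flag argument */
} Options;

/* Fills *opt from the command line.
 * Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, Options *opt) {
    *opt = (Options){0};

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--verbose") == 0) {
            opt->verbose = 1;
        } else if (strcmp(argv[i], "--compress") == 0) {
            opt->compress = 1;
        } else if (strncmp(argv[i], "--", 2) == 0) {
            return -1;                    /* unknown flag */
        } else if (!opt->input) {
            opt->input = argv[i];
        } else if (!opt->output) {
            opt->output = argv[i];
        } else {
            return -1;                    /* too many file names */
        }
    }

    /* Both file names are required */
    return (opt->input && opt->output) ? 0 : -1;
}
```

In main, a return value of -1 would typically lead to printing a usage line and exiting with a nonzero status.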

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                    PS→PDF CONVERTER                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │                    PS TOKENIZER                           │ │
│  │                                                           │ │
│  │  Input: PostScript text                                   │ │
│  │  Output: Token stream (NUMBER, NAME, STRING, PROC, ...)   │ │
│  └───────────────────────────────────────────────────────────┘ │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │                  PS INTERPRETER                           │ │
│  │                                                           │ │
│  │  ┌─────────────────┐  ┌──────────────────────────────┐    │ │
│  │  │  Operand Stack  │  │     Graphics State           │    │ │
│  │  │  [100, 200]     │  │  CTM, color, path, font      │    │ │
│  │  └─────────────────┘  └──────────────────────────────┘    │ │
│  │                                                           │ │
│  │  Operators: add, moveto, stroke, gsave, show, etc.        │ │
│  └───────────────────────────────────────────────────────────┘ │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │                OPERATION RECORDER                         │ │
│  │                                                           │ │
│  │  Record instead of render:                                │ │
│  │  - moveto(100, 200) → OP_MOVETO{100, 200}                 │ │
│  │  - stroke()         → OP_STROKE{}                         │ │
│  │  - showpage()       → OP_SHOWPAGE{} (page boundary)       │ │
│  │                                                           │ │
│  │  Page 1: [op, op, op, ...]                                │ │
│  │  Page 2: [op, op, op, ...]                                │ │
│  └───────────────────────────────────────────────────────────┘ │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │                  PDF GENERATOR                            │ │
│  │                                                           │ │
│  │  1. Allocate object numbers                               │ │
│  │  2. Write header                                          │ │
│  │  3. Write catalog (obj 1)                                 │ │
│  │  4. Write pages tree (obj 2)                              │ │
│  │  5. For each page:                                        │ │
│  │     - Write page object                                   │ │
│  │     - Convert ops to content stream                       │ │
│  │     - Write content stream object                         │ │
│  │  6. Write font objects                                    │ │
│  │  7. Write xref table                                      │ │
│  │  8. Write trailer                                         │ │
│  └───────────────────────────────────────────────────────────┘ │
│                            ↓                                    │
│                       PDF Output                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Data Structures

// Recorded operation
typedef enum {
    OP_MOVETO, OP_LINETO, OP_CURVETO, OP_CLOSEPATH,
    OP_STROKE, OP_FILL,
    OP_GSAVE, OP_GRESTORE,
    OP_SETGRAY, OP_SETRGBCOLOR,
    OP_CONCAT,
    OP_SETFONT, OP_SHOWTEXT,
    OP_SHOWPAGE
} OpType;

typedef struct {
    OpType type;
    union {
        struct { double x, y; } point;                    // moveto, lineto
        struct { double x1, y1, x2, y2, x3, y3; } curve; // curveto
        struct { double gray; } setgray;
        struct { double r, g, b; } setrgb;
        struct { double matrix[6]; } concat;
        struct { char* font; double size; } setfont;
        struct { char* text; } showtext;
    } data;
} RecordedOp;

// Page of recorded operations
typedef struct {
    RecordedOp* ops;
    size_t count;
    size_t capacity;
    double width, height;  // Page size
} RecordedPage;

// Document (all pages)
typedef struct {
    RecordedPage** pages;  // array of page pointers, one per page
    size_t page_count;

    // Fonts used
    char** fonts;
    size_t font_count;
} RecordedDocument;

// Interpreter state
typedef struct {
    // Operand stack
    double* stack;
    int stack_ptr;

    // Graphics state
    double ctm[6];
    double current_x, current_y;  // current point, set by moveto/lineto
    double color_r, color_g, color_b;
    double line_width;
    char* current_font;
    double font_size;

    // Current path
    struct {
        double x, y;
        int type;  // 0=move, 1=line, 2=curve
    } path[10000];
    size_t path_len;

    // Graphics state stack
    struct GraphicsState* gstack;
    int gstack_ptr;

    // Recording
    RecordedDocument* doc;
    RecordedPage* current_page;
} Interpreter;
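
The Interpreter struct refers to a graphics state stack without defining it. A minimal sketch of what gsave and grestore need is shown below: a fixed-depth stack of state snapshots. The names GState, GStateStack, gs_save, gs_restore, and the depth limit GSTACK_MAX are assumptions made for this illustration, and the snapshot deliberately leaves out the current path and font name to stay short.

```c
#define GSTACK_MAX 32   /* hypothetical depth limit */

/* The subset of graphics state that gsave/grestore must preserve. */
typedef struct {
    double ctm[6];
    double color_r, color_g, color_b;
    double line_width;
    double font_size;
} GState;

typedef struct {
    GState current;
    GState saved[GSTACK_MAX];
    int    depth;
} GStateStack;

/* gsave: push a copy of the current state.
 * Returns 0 on success, -1 if the stack is full. */
int gs_save(GStateStack *gs) {
    if (gs->depth >= GSTACK_MAX) return -1;
    gs->saved[gs->depth++] = gs->current;
    return 0;
}

/* grestore: pop the most recent copy back into the current state.
 * Returns 0 on success, -1 if there is no matching gsave. */
int gs_restore(GStateStack *gs) {
    if (gs->depth == 0) return -1;
    gs->current = gs->saved[--gs->depth];
    return 0;
}
```

When gs_restore returns -1, the recorder should not emit a Q, since an unbalanced Q makes the content stream invalid even though an unmatched grestore is tolerated in PostScript.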

Implementation Guide

Phase 1: Minimal PDF Generator (Week 1, Days 1-2)

Start by generating a valid empty PDF:

void generate_empty_pdf(const char* filename) {
    FILE* f = fopen(filename, "wb");
    if (!f) { perror(filename); return; }

    // Track offsets
    long offset_1, offset_2, offset_3, xref_offset;

    // Header
    fprintf(f, "%%PDF-1.4\n");
    fprintf(f, "%%\xe2\xe3\xcf\xd3\n");  // Binary marker

    // Object 1: Catalog
    offset_1 = ftell(f);
    fprintf(f, "1 0 obj\n");
    fprintf(f, "<< /Type /Catalog /Pages 2 0 R >>\n");
    fprintf(f, "endobj\n");

    // Object 2: Pages tree
    offset_2 = ftell(f);
    fprintf(f, "2 0 obj\n");
    fprintf(f, "<< /Type /Pages /Kids [3 0 R] /Count 1 >>\n");
    fprintf(f, "endobj\n");

    // Object 3: Page
    offset_3 = ftell(f);
    fprintf(f, "3 0 obj\n");
    // /Resources is a required page entry, even when it is empty
    fprintf(f, "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>\n");
    fprintf(f, "endobj\n");

    // xref table
    xref_offset = ftell(f);
    fprintf(f, "xref\n");
    fprintf(f, "0 4\n");
    fprintf(f, "0000000000 65535 f \n");
    fprintf(f, "%010ld 00000 n \n", offset_1);
    fprintf(f, "%010ld 00000 n \n", offset_2);
    fprintf(f, "%010ld 00000 n \n", offset_3);

    // Trailer
    fprintf(f, "trailer\n");
    fprintf(f, "<< /Size 4 /Root 1 0 R >>\n");
    fprintf(f, "startxref\n");
    fprintf(f, "%ld\n", xref_offset);
    fprintf(f, "%%%%EOF\n");

    fclose(f);
}

Test: Open the PDF in a viewer; it should show a blank page.

Phase 2: PDF with Content Stream (Week 1, Days 3-4)

Add a hardcoded content stream:

void generate_pdf_with_line(const char* filename) {
    FILE* f = fopen(filename, "wb");
    if (!f) { perror(filename); return; }

    long offsets[10];
    int obj_num = 0;

    fprintf(f, "%%PDF-1.4\n%%\xe2\xe3\xcf\xd3\n");

    // Object 1: Catalog
    offsets[++obj_num] = ftell(f);
    fprintf(f, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");

    // Object 2: Pages
    offsets[++obj_num] = ftell(f);
    fprintf(f, "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n");

    // Object 3: Page with content reference
    offsets[++obj_num] = ftell(f);
    fprintf(f, "3 0 obj\n");
    fprintf(f, "<< /Type /Page /Parent 2 0 R ");
    fprintf(f, "/MediaBox [0 0 612 792] ");
    fprintf(f, "/Contents 4 0 R >>\n");
    fprintf(f, "endobj\n");

    // Object 4: Content stream
    const char* content = "100 100 m\n200 200 l\nS\n";
    size_t content_len = strlen(content);

    offsets[++obj_num] = ftell(f);
    fprintf(f, "4 0 obj\n");
    fprintf(f, "<< /Length %zu >>\n", content_len);
    fprintf(f, "stream\n");
    fprintf(f, "%s", content);
    fprintf(f, "endstream\n");
    fprintf(f, "endobj\n");

    // xref
    long xref_offset = ftell(f);
    fprintf(f, "xref\n0 %d\n", obj_num + 1);
    fprintf(f, "0000000000 65535 f \n");
    for (int i = 1; i <= obj_num; i++) {
        fprintf(f, "%010ld 00000 n \n", offsets[i]);
    }

    fprintf(f, "trailer\n<< /Size %d /Root 1 0 R >>\n", obj_num + 1);
    fprintf(f, "startxref\n%ld\n%%%%EOF\n", xref_offset);

    fclose(f);
}

Test: Should show a diagonal line from (100,100) to (200,200).

Phase 3: Operation Recording (Week 1, Day 5 - Week 2, Day 2)

Create the recording infrastructure:

RecordedPage* new_page(double width, double height) {
    RecordedPage* page = calloc(1, sizeof(RecordedPage));
    page->width = width;
    page->height = height;
    page->capacity = 1000;
    page->ops = calloc(page->capacity, sizeof(RecordedOp));
    return page;
}

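void record_op(RecordedPage* page, RecordedOp op) {
    if (page->count >= page->capacity) {
        // Grow through a temporary so a failed realloc keeps the old buffer
        size_t new_capacity = page->capacity * 2;
        RecordedOp* grown = realloc(page->ops, new_capacity * sizeof(RecordedOp));
        if (!grown) return;  // out of memory: drop this op
        page->ops = grown;
        page->capacity = new_capacity;
    }
    page->ops[page->count++] = op;
}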

void record_moveto(Interpreter* interp, double x, double y) {
    RecordedOp op = {
        .type = OP_MOVETO,
        .data.point = {x, y}
    };
    record_op(interp->current_page, op);
}

void record_stroke(Interpreter* interp) {
    RecordedOp op = { .type = OP_STROKE };
    record_op(interp->current_page, op);
}

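void record_showpage(Interpreter* interp) {
    RecordedOp op = { .type = OP_SHOWPAGE };
    record_op(interp->current_page, op);

    // Start new page. Grow through a temporary so a failed realloc
    // does not lose the existing array.
    RecordedPage** grown = realloc(interp->doc->pages,
        (interp->doc->page_count + 1) * sizeof(RecordedPage*));
    if (!grown) return;
    interp->doc->pages = grown;
    interp->current_page = new_page(612, 792);
    interp->doc->pages[interp->doc->page_count++] = interp->current_page;

    // NOTE: most files end with showpage, so this leaves one empty
    // trailing page; skip it when generating the PDF.
}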

Phase 4: PS Interpreter with Recording (Week 2, Days 3-5)

Integrate recording into the interpreter:

void ps_moveto(Interpreter* interp) {
    double y = pop(interp);
    double x = pop(interp);

    // Update internal state
    interp->current_x = x;
    interp->current_y = y;

    // Record operation
    record_moveto(interp, x, y);
}

void ps_lineto(Interpreter* interp) {
    double y = pop(interp);
    double x = pop(interp);

    // Update internal state
    interp->current_x = x;
    interp->current_y = y;

    // Record operation
    RecordedOp op = {
        .type = OP_LINETO,
        .data.point = {x, y}
    };
    record_op(interp->current_page, op);
}

void ps_stroke(Interpreter* interp) {
    // Record the paint operation
    record_stroke(interp);

    // stroke consumes the current path, so reset it
    interp->path_len = 0;
}

void ps_setgray(Interpreter* interp) {
    double gray = pop(interp);

    // Update internal state
    interp->color_r = interp->color_g = interp->color_b = gray;

    // Record operation
    RecordedOp op = {
        .type = OP_SETGRAY,
        .data.setgray = {gray}
    };
    record_op(interp->current_page, op);
}

void ps_gsave(Interpreter* interp) {
    // Push current state
    push_gstate(interp);

    // Record operation
    RecordedOp op = { .type = OP_GSAVE };
    record_op(interp->current_page, op);
}

void ps_showpage(Interpreter* interp) {
    record_showpage(interp);
}
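
The handlers above rely on a pop function for the operand stack. A minimal sketch of such a stack is given below, assuming numeric operands only, as in the Interpreter struct. The names OperandStack, stack_push, and stack_pop are invented for this illustration; a fuller interpreter would store tagged objects (numbers, names, strings, procedures) instead of plain doubles.

```c
#include <stdlib.h>

typedef struct {
    double *items;
    size_t  count;
    size_t  capacity;
    int     error;   /* nonzero after a stack underflow or out-of-memory */
} OperandStack;

/* Pushes v, growing the array when it is full. */
void stack_push(OperandStack *s, double v) {
    if (s->count == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 16;
        double *grown = realloc(s->items, new_cap * sizeof *grown);
        if (!grown) { s->error = 1; return; }
        s->items = grown;
        s->capacity = new_cap;
    }
    s->items[s->count++] = v;
}

/* Pops the top value. On an empty stack it sets the error flag and
 * returns 0.0 instead of reading out of bounds. */
double stack_pop(OperandStack *s) {
    if (s->count == 0) { s->error = 1; return 0.0; }
    return s->items[--s->count];
}
```

Note the pop order in handlers such as ps_moveto: for the input 100 200 moveto, the first pop yields y (200) and the second yields x (100).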

Phase 5: PDF Content Stream Generation (Week 3, Days 1-3)

Convert recorded operations to PDF syntax:

char* generate_content_stream(RecordedPage* page, size_t* out_len) {
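    // Allocate buffer (estimate size); the loop grows it on demand
    size_t capacity = page->count * 64 + 256;
    char* buffer = malloc(capacity);
    size_t len = 0;
    if (!buffer) { *out_len = 0; return NULL; }

    for (size_t i = 0; i < page->count; i++) {
        RecordedOp* op = &page->ops[i];

        // Make room BEFORE formatting: 256 bytes covers any numeric
        // operator, plus the string length for text
        size_t need = 256;
        if (op->type == OP_SHOWTEXT) need += strlen(op->data.showtext.text);
        if (capacity - len < need) {
            capacity = (capacity + need) * 2;
            char* grown = realloc(buffer, capacity);
            if (!grown) { free(buffer); *out_len = 0; return NULL; }
            buffer = grown;
        }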

        switch (op->type) {
            case OP_MOVETO:
                len += sprintf(buffer + len, "%.4g %.4g m\n",
                    op->data.point.x, op->data.point.y);
                break;

            case OP_LINETO:
                len += sprintf(buffer + len, "%.4g %.4g l\n",
                    op->data.point.x, op->data.point.y);
                break;

            case OP_CURVETO:
                len += sprintf(buffer + len, "%.4g %.4g %.4g %.4g %.4g %.4g c\n",
                    op->data.curve.x1, op->data.curve.y1,
                    op->data.curve.x2, op->data.curve.y2,
                    op->data.curve.x3, op->data.curve.y3);
                break;

            case OP_CLOSEPATH:
                len += sprintf(buffer + len, "h\n");
                break;

            case OP_STROKE:
                len += sprintf(buffer + len, "S\n");
                break;

            case OP_FILL:
                len += sprintf(buffer + len, "f\n");
                break;

            case OP_GSAVE:
                len += sprintf(buffer + len, "q\n");
                break;

            case OP_GRESTORE:
                len += sprintf(buffer + len, "Q\n");
                break;

            case OP_SETGRAY:
                // PDF uses 'g' for fill gray, 'G' for stroke gray
                // For simplicity, set both
                len += sprintf(buffer + len, "%.4g g\n", op->data.setgray.gray);
                len += sprintf(buffer + len, "%.4g G\n", op->data.setgray.gray);
                break;

            case OP_SETRGBCOLOR:
                len += sprintf(buffer + len, "%.4g %.4g %.4g rg\n",
                    op->data.setrgb.r, op->data.setrgb.g, op->data.setrgb.b);
                len += sprintf(buffer + len, "%.4g %.4g %.4g RG\n",
                    op->data.setrgb.r, op->data.setrgb.g, op->data.setrgb.b);
                break;

            case OP_CONCAT:
                len += sprintf(buffer + len, "%.4g %.4g %.4g %.4g %.4g %.4g cm\n",
                    op->data.concat.matrix[0], op->data.concat.matrix[1],
                    op->data.concat.matrix[2], op->data.concat.matrix[3],
                    op->data.concat.matrix[4], op->data.concat.matrix[5]);
                break;

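            case OP_SETFONT:
                // Tf is a text-state operator, so it is legal outside BT/ET.
                // Note: need font mapping (F1, F2, etc.)
                len += sprintf(buffer + len, "/F1 %.4g Tf\n",
                    op->data.setfont.size);
                break;

            case OP_SHOWTEXT:
                // Each show gets its own BT...ET so repeated shows stay balanced.
                // Caveats: '(', ')' and '\\' in text must be escaped first, and
                // without a Td/Tm the text is placed at the origin.
                len += sprintf(buffer + len, "BT\n(%s) Tj\nET\n",
                    op->data.showtext.text);
                break;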

            case OP_SHOWPAGE:
                // Ignored in content stream (handled at page level)
                break;
        }

        // Expand buffer if needed
        if (len > capacity - 100) {
            capacity *= 2;
            buffer = realloc(buffer, capacity);
        }
    }

    *out_len = len;
    return buffer;
}
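
Text passed to Tj is written as a PDF literal string, where parentheses and backslashes have special meaning. The sketch below shows one way to escape them before formatting the OP_SHOWTEXT case. The function name pdf_escape_string and its return convention are assumptions made for this illustration. Balanced parentheses are technically allowed unescaped, but escaping every one is always valid and simpler.

```c
#include <stddef.h>

/* Copies in to out, putting a backslash before '(', ')' and '\'.
 * Returns the escaped length (excluding the NUL), or -1 if out,
 * which holds cap bytes, is too small. */
long pdf_escape_string(const char *in, char *out, size_t cap) {
    size_t n = 0;

    for (; *in; in++) {
        int needs_escape = (*in == '(' || *in == ')' || *in == '\\');

        /* Keep one byte free for the terminating NUL */
        if (n + (needs_escape ? 2 : 1) >= cap) return -1;

        if (needs_escape) out[n++] = '\\';
        out[n++] = *in;
    }

    if (cap == 0) return -1;
    out[n] = '\0';
    return (long)n;
}
```

An output buffer of twice the input length plus one byte is always large enough, since each input character produces at most two output characters.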

Phase 6: Complete PDF Generation (Week 3, Days 4-5)

void generate_pdf_from_document(RecordedDocument* doc, const char* filename) {
    FILE* f = fopen(filename, "wb");
    if (!f) {
        perror(filename);
        return;
    }

    // Allocate object numbers
    // 1 = Catalog
    // 2 = Pages tree
    // 3, 5, 7, ... = Page objects
    // 4, 6, 8, ... = Content stream objects
    // Last objects = Fonts

    int total_objects = 2 + doc->page_count * 2 + doc->font_count;
    long* offsets = calloc(total_objects + 1, sizeof(long));

    // Write header
    fprintf(f, "%%PDF-1.4\n%%\xe2\xe3\xcf\xd3\n");

    // Object 1: Catalog
    offsets[1] = ftell(f);
    fprintf(f, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");

    // Object 2: Pages tree
    offsets[2] = ftell(f);
    fprintf(f, "2 0 obj\n<< /Type /Pages /Kids [");
    for (size_t i = 0; i < doc->page_count; i++) {
        fprintf(f, "%zu 0 R ", 3 + i * 2);
    }
    fprintf(f, "] /Count %zu >>\nendobj\n", doc->page_count);

    // Font object numbers start here
    int font_base = 3 + doc->page_count * 2;

    // Pages and content streams
    for (size_t i = 0; i < doc->page_count; i++) {
        RecordedPage* page = doc->pages[i];
        int page_obj = 3 + i * 2;
        int content_obj = 4 + i * 2;

        // Generate content stream
        size_t content_len;
        char* content = generate_content_stream(page, &content_len);

        // Write content stream object
        offsets[content_obj] = ftell(f);
        fprintf(f, "%d 0 obj\n", content_obj);
        fprintf(f, "<< /Length %zu >>\n", content_len);
        fprintf(f, "stream\n");
        fwrite(content, 1, content_len, f);
        fprintf(f, "endstream\nendobj\n");
        free(content);

        // Write page object
        offsets[page_obj] = ftell(f);
        fprintf(f, "%d 0 obj\n", page_obj);
        fprintf(f, "<< /Type /Page\n");
        fprintf(f, "   /Parent 2 0 R\n");
        fprintf(f, "   /MediaBox [0 0 %.4g %.4g]\n", page->width, page->height);
        fprintf(f, "   /Contents %d 0 R\n", content_obj);

        // Resources with fonts
        if (doc->font_count > 0) {
            fprintf(f, "   /Resources <<\n");
            fprintf(f, "     /Font <<\n");
            for (size_t j = 0; j < doc->font_count; j++) {
                fprintf(f, "       /F%zu %d 0 R\n", j + 1, font_base + (int)j);
            }
            fprintf(f, "     >>\n");
            fprintf(f, "   >>\n");
        }

        fprintf(f, ">>\nendobj\n");
    }

    // Font objects
    for (size_t i = 0; i < doc->font_count; i++) {
        offsets[font_base + i] = ftell(f);
        fprintf(f, "%d 0 obj\n", font_base + (int)i);
        fprintf(f, "<< /Type /Font\n");
        fprintf(f, "   /Subtype /Type1\n");
        fprintf(f, "   /BaseFont /%s >>\n", doc->fonts[i]);
        fprintf(f, "endobj\n");
    }

    // xref table
    long xref_offset = ftell(f);
    fprintf(f, "xref\n0 %d\n", total_objects + 1);
    fprintf(f, "0000000000 65535 f \n");
    for (int i = 1; i <= total_objects; i++) {
        fprintf(f, "%010ld 00000 n \n", offsets[i]);
    }

    // Trailer
    fprintf(f, "trailer\n<< /Size %d /Root 1 0 R >>\n", total_objects + 1);
    fprintf(f, "startxref\n%ld\n%%%%EOF\n", xref_offset);

    free(offsets);
    fclose(f);
}
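The object numbering used above (1 = Catalog, 2 = Pages, then page/content pairs, then fonts) is easy to get wrong when it is recomputed inline in several places. As a small sketch, the same arithmetic can be collected into helpers; the function names here are hypothetical and only restate the layout from the comments in `generate_pdf_from_document`.

```c
#include <assert.h>
#include <stddef.h>

/* Layout: 1 = Catalog, 2 = Pages, then (page, content) pairs, then fonts. */
static int page_obj_num(size_t page_index)    { return 3 + (int)page_index * 2; }
static int content_obj_num(size_t page_index) { return 4 + (int)page_index * 2; }

static int font_obj_num(size_t page_count, size_t font_index) {
    return 3 + (int)page_count * 2 + (int)font_index;
}

static int total_object_count(size_t page_count, size_t font_count) {
    return 2 + (int)page_count * 2 + (int)font_count;
}

/* Sanity check: the highest-numbered object must equal the total count,
   otherwise the xref table would have gaps or run past the offsets array. */
static void check_object_layout(size_t page_count, size_t font_count) {
    int total = total_object_count(page_count, font_count);
    if (font_count > 0)
        assert(font_obj_num(page_count, font_count - 1) == total);
    else if (page_count > 0)
        assert(content_obj_num(page_count - 1) == total);
}
```

For a two-page document with one font this gives pages 3 and 5, content streams 4 and 6, and the font as object 7, which matches `total_objects`.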

Phase 7: Main Conversion Function (Week 4)

int convert_ps_to_pdf(const char* ps_file, const char* pdf_file) {
    // Read PostScript file
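    FILE* f = fopen(ps_file, "rb");  // binary mode so ftell() matches bytes read
    if (!f) {
        fprintf(stderr, "Cannot open %s\n", ps_file);
        return 1;
    }

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size < 0) {
        fclose(f);
        return 1;
    }

    char* ps_content = malloc((size_t)size + 1);
    if (!ps_content) {
        fclose(f);
        return 1;
    }
    size_t nread = fread(ps_content, 1, (size_t)size, f);
    ps_content[nread] = '\0';  // terminate at what was actually read
    fclose(f);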

    // Initialize interpreter
    Interpreter* interp = interpreter_new();

    // Initialize document
    RecordedDocument* doc = calloc(1, sizeof(RecordedDocument));
    doc->pages = calloc(1, sizeof(RecordedPage*));
    doc->pages[0] = new_page(612, 792);  // US Letter
    doc->page_count = 1;
    interp->doc = doc;
    interp->current_page = doc->pages[0];

    // Execute PostScript
    execute_postscript(interp, ps_content);

    // Generate PDF
    generate_pdf_from_document(doc, pdf_file);

    // Cleanup
    free(ps_content);
    // ... free document and interpreter

    printf("Converted %s to %s\n", ps_file, pdf_file);
    return 0;
}

int main(int argc, char** argv) {
    if (argc < 3) {
        fprintf(stderr, "Usage: %s input.ps output.pdf\n", argv[0]);
        return 1;
    }

    return convert_ps_to_pdf(argv[1], argv[2]);
}

Testing Strategy

Test Cases

  1. Empty page:
    %!PS-Adobe-3.0
    showpage
    
  2. Simple line:
    %!PS-Adobe-3.0
    100 100 moveto
    200 200 lineto
    stroke
    showpage
    
  3. Filled shape:
    %!PS-Adobe-3.0
    newpath
    100 100 moveto
    300 100 lineto
    300 300 lineto
    100 300 lineto
    closepath
    0.5 setgray
    fill
    showpage
    
  4. Transformations:
    %!PS-Adobe-3.0
    gsave
    200 200 translate
    45 rotate
    0 0 moveto 100 0 lineto stroke
    grestore
    showpage
    
  5. Multiple pages:
    %!PS-Adobe-3.0
    /Helvetica findfont 12 scalefont setfont
    100 100 moveto (Page 1) show
    showpage
    200 200 moveto (Page 2) show
    showpage
    

Validation

# Compare with Ghostscript
gs -sDEVICE=pdfwrite -o reference.pdf test.ps
./ps2pdf test.ps output.pdf

# Validate structure
qpdf --check output.pdf

# Compare text extraction
diff <(pdftotext reference.pdf -) <(pdftotext output.pdf -)

# Visual comparison
gs -sDEVICE=png16m -r150 -o ref%d.png reference.pdf
gs -sDEVICE=png16m -r150 -o out%d.png output.pdf
compare ref1.png out1.png diff.png

Common Pitfalls

1. xref Offset Precision

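Each xref entry must be exactly 20 bytes: a 10-digit zero-padded byte offset, a space, a 5-digit generation number, a space, n or f, and a 2-byte end-of-line (here a space followed by \n):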

// WRONG:
fprintf(f, "%ld 00000 n \n", offset);  // Might be 7 digits

// CORRECT:
fprintf(f, "%010ld 00000 n \n", offset);  // Always 10 digits
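
One way to make the 20-byte rule hard to violate is to format each entry through a helper that checks its own output length. This is a sketch under stated assumptions: `format_xref_entry` is a hypothetical name, and it takes the offset as `long long` so the 10-digit range check is meaningful even where `long` is 32 bits.

```c
#include <assert.h>
#include <stdio.h>

/* Format one in-use ("n") xref entry into out, which must hold 21 bytes
   (20 for the entry plus the terminating NUL).
   Returns 20 on success, -1 if a value cannot fit the fixed-width fields. */
static int format_xref_entry(char out[21], long long offset, int generation) {
    if (offset < 0 || offset > 9999999999LL) return -1;
    if (generation < 0 || generation > 65535) return -1;

    /* 10 digits + space + 5 digits + space + 'n' + space + LF = 20 bytes */
    int n = snprintf(out, 21, "%010lld %05d n \n", offset, generation);
    assert(n == 20);
    return n;
}
```

The caller can then write the entry with `fwrite(entry, 1, 20, f)`, which also avoids any newline translation if the file was accidentally opened in text mode.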

2. Stream Length Accuracy

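The /Length value must equal the exact number of bytes written between the stream keyword's end-of-line and endstream. Write the data with fwrite and a known length, not with a string function that stops at the first NUL byte: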

size_t content_len;
char* content = generate_content_stream(page, &content_len);

fprintf(f, "<< /Length %zu >>\n", content_len);  // Must match exactly
fprintf(f, "stream\n");
fwrite(content, 1, content_len, f);  // Write exactly content_len bytes
fprintf(f, "endstream\n");
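
A defensive variant can confirm the byte count with `ftell` while writing. The sketch below assumes the file is opened in binary mode; `write_stream_object` is a hypothetical helper, not a function from the code above. It also writes an end-of-line before `endstream`, which the PDF specification says is not counted in /Length.

```c
#include <assert.h>
#include <stdio.h>

/* Write "N 0 obj << /Length len >> stream ... endstream endobj" and verify
   that exactly len bytes landed between the two keywords.
   Returns 0 on success, -1 on any write error or length mismatch. */
static int write_stream_object(FILE *f, int obj_num,
                               const char *data, size_t len) {
    assert(f != NULL && data != NULL);

    if (fprintf(f, "%d 0 obj\n<< /Length %zu >>\nstream\n", obj_num, len) < 0)
        return -1;

    long start = ftell(f);
    if (fwrite(data, 1, len, f) != len) return -1;
    long end = ftell(f);

    /* The measured span must match the declared /Length. */
    if (start < 0 || end < 0 || (size_t)(end - start) != len) return -1;

    /* This end-of-line is outside the counted stream data. */
    if (fprintf(f, "\nendstream\nendobj\n") < 0) return -1;
    return 0;
}
```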

3. Color Separation

PDF keeps separate fill and stroke colors, while PostScript has a single current color. Emit both operators for every PostScript color change:

// PS: setrgbcolor sets both
// PDF: Need to set both rg (fill) and RG (stroke)
len += sprintf(buffer + len, "%.4g %.4g %.4g rg\n", r, g, b);
len += sprintf(buffer + len, "%.4g %.4g %.4g RG\n", r, g, b);

4. Text Mode

PDF requires BT/ET around text operations:

// WRONG:
fprintf(f, "(Hello) Tj\n");

// CORRECT:
fprintf(f, "BT\n");
fprintf(f, "/F1 12 Tf\n");
fprintf(f, "100 700 Td\n");
fprintf(f, "(Hello) Tj\n");
fprintf(f, "ET\n");
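
A related trap is the text itself: unescaped parentheses or backslashes in a PostScript string can break the PDF literal string. The sketch below combines escaping with a complete text object. Both function names are hypothetical, the 512-byte scratch buffer is an arbitrary assumption, and balanced parentheses are escaped too, which is stricter than required but always valid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape '(', ')' and '\' for a PDF literal string.
   Returns the escaped length (excluding NUL), or -1 if dst is too small. */
static int pdf_escape_string(const char *src, char *dst, size_t dst_cap) {
    assert(src != NULL && dst != NULL);
    if (dst_cap == 0) return -1;

    size_t o = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        int esc = (c == '(' || c == ')' || c == '\\');
        /* room for this char, its optional backslash, and the final NUL */
        if (o + (esc ? 2u : 1u) + 1u > dst_cap) return -1;
        if (esc) dst[o++] = '\\';
        dst[o++] = c;
    }
    dst[o] = '\0';
    return (int)o;
}

/* Emit one self-contained text object: BT, font, position, string, ET.
   font_name is the resource name without the slash, e.g. "F1".
   Returns bytes written, or -1 if the text or output does not fit. */
static int emit_text_object(char *out, size_t out_cap, const char *font_name,
                            double size, double x, double y, const char *text) {
    char escaped[512];  /* assumption: short strings only */
    if (pdf_escape_string(text, escaped, sizeof escaped) < 0) return -1;

    int n = snprintf(out, out_cap,
                     "BT\n/%s %.4g Tf\n%.4g %.4g Td\n(%s) Tj\nET\n",
                     font_name, size, x, y, escaped);
    if (n < 0 || (size_t)n >= out_cap) return -1;
    return n;
}
```

Because every call opens and closes its own text object, the BT/ET pairing cannot drift out of balance when a PostScript program calls show several times after one setfont.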

Extensions

Level 1: Stream Compression

Add Flate compression for content streams:

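#include <stdlib.h>
#include <zlib.h>

// Returns a malloc'd buffer of zlib-format data (what /FlateDecode expects),
// or NULL on failure. The caller frees the result.
char* compress_stream(const char* data, size_t len, size_t* out_len) {
    uLongf compressed_len = compressBound(len);
    char* compressed = malloc(compressed_len);
    if (!compressed) return NULL;

    if (compress((Bytef*)compressed, &compressed_len,
                 (const Bytef*)data, len) != Z_OK) {
        free(compressed);
        return NULL;
    }

    *out_len = compressed_len;
    return compressed;
}

// When writing, /Length is the compressed size returned through out_len:
fprintf(f, "<< /Length %zu /Filter /FlateDecode >>\n", compressed_len);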

Level 2: Font Embedding

Embed fonts instead of referencing by name.

Level 3: Image Support

Handle PostScript image operator and convert to PDF XObjects.

Level 4: Complete PostScript Support

Add loops, conditionals, procedures, dictionaries.


Self-Assessment

Before considering this project complete:

  • Can convert a PostScript file with lines and fills to PDF
  • Generated PDF opens correctly in multiple readers
  • xref table has correct byte offsets
  • Content stream syntax matches PDF specification
  • Multiple pages are handled correctly
  • Output matches Ghostscript's output for simple inputs
  • Colors and transformations work correctly

Resources

Essential Reading

  • Developing with PDF by Leonard Rosenthol - PDF generation
  • PostScript Language Reference Manual - PS operators
  • Engineering a Compiler by Cooper & Torczon - Code generation patterns

Tools

  • Ghostscript: Reference implementation (gs -sDEVICE=pdfwrite)
  • qpdf: PDF validation
  • pdfinfo/pdftotext: PDF analysis