Project 3: PostScript-to-PDF Converter
Project Overview
| Attribute | Details |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 1 month+ |
| Programming Language | C |
| Knowledge Area | Code Generation / Graphics |
| Prerequisites | Projects 1 and 2, or solid understanding of both formats |
What you'll build: A mini "Ghostscript" that executes PostScript and outputs a valid PDF file.
Why it teaches PostScript→PDF: This is the exact transformation Ghostscript performs. You'll execute PS code and, instead of drawing to the screen, capture the operations and emit them as PDF content streams.
Learning Objectives
By completing this project, you will:
- Execute PostScript programs using a stack-based interpreter
- Capture graphics operations during execution (instead of rendering)
- Generate valid PDF structure with objects, xref, and trailer
- Map PostScript operators to PDF operators correctly
- Handle font references between PostScript and PDF
- Understand the code generation pattern used in compilers
The Core Question You're Answering
"How do you transform an executable program (PostScript) into a static document (PDF)?"
This is the fundamental question of PS→PDF conversion. PostScript is a full Turing-complete programming language with loops, conditionals, and procedures. PDF is a static page description format with no flow control. Your converter must:
- Execute PostScript code (run the program)
- Capture the side effects (what gets drawn)
- Serialize those operations into PDF's declarative format
The paradox: How do you freeze a running program into a static snapshot?
The answer reveals deep truths about interpreters, intermediate representations, and the distinction between computation and presentation.
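To make the "freeze a running program" idea concrete, here is a minimal sketch in C. It does not interpret PostScript; it hard-codes one loop (shown in the comment) and writes out what that loop would draw, one flat group of PDF operators per iteration. The function name `flatten_loop` and the loop itself are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical sketch: "executes" the PostScript loop
 *     1 1 n { 10 mul dup 0 moveto 100 lineto stroke } for
 * in C and writes the flattened PDF operators into out.
 * Returns the number of bytes written (cut short if out is too small). */
size_t flatten_loop(int n, char* out, size_t out_size) {
    size_t len = 0;
    if (out_size == 0) return 0;
    out[0] = '\0';
    for (int i = 1; i <= n; i++) {            /* the loop runs once, here */
        int x = i * 10;
        int written = snprintf(out + len, out_size - len,
                               "%d 0 m\n%d 100 l\nS\n", x, x);
        if (written < 0 || (size_t)written >= out_size - len) {
            out[len] = '\0';                  /* drop the partial iteration */
            break;
        }
        len += (size_t)written;
    }
    return len;                               /* PDF side: no loop, only results */
}
```

The PDF output contains no trace of the loop variable or the multiplication; only the side effects survive. That is the sense in which the converter takes a snapshot of a finished computation.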
Deep Theoretical Foundation
1. The Fundamental Transformation
PostScript and PDF look similar but are fundamentally different:
┌───────────────────────────────────────────────────────────────┐
│                      POSTSCRIPT vs. PDF                       │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  POSTSCRIPT (executable)     PDF (static)                     │
│  ───────────────────────     ────────────                     │
│                                                               │
│  % Draw 10 circles           % Every circle spelled out       │
│  10 10 100 {                 10 0 m                           │
│    0 0 3 -1 roll             10 5.523 5.523 10 0 10 c         │
│    0 360 arc                 -5.523 10 -10 5.523 -10 0 c      │
│    stroke                    -10 -5.523 -5.523 -10 0 -10 c    │
│  } for                       5.523 -10 10 -5.523 10 0 c       │
│                              S                                │
│                              % ...same 6 lines for r = 20     │
│                              %    through r = 100             │
│                                                               │
│  5 lines → 10 circles        60 lines → 10 circles            │
│                                                               │
└───────────────────────────────────────────────────────────────┘
Note that PDF has no arc operator: each circle is written as four Bézier curves (c).
The transformation process:
┌───────────────────────────────────────────────────────────────┐
│                      CONVERSION PIPELINE                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│                       PostScript Source                       │
│                               │                               │
│                               ▼                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                  POSTSCRIPT INTERPRETER                   │ │
│ │                                                           │ │
│ │  - Execute stack operations                               │ │
│ │  - Evaluate loops and conditionals                        │ │
│ │  - Call procedures                                        │ │
│ │  - Update graphics state                                  │ │
│ └─────────────────────────────┬─────────────────────────────┘ │
│                               │                               │
│                               ▼                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                    OPERATION RECORDER                     │ │
│ │                                                           │ │
│ │  Instead of rendering pixels, capture:                    │ │
│ │  - Path operations (moveto, lineto, curveto)              │ │
│ │  - Paint operations (stroke, fill)                        │ │
│ │  - Color changes (setgray, setrgbcolor)                   │ │
│ │  - State changes (gsave, grestore, translate)             │ │
│ │  - Text operations (show)                                 │ │
│ │  - showpage boundaries                                    │ │
│ └─────────────────────────────┬─────────────────────────────┘ │
│                               │                               │
│                               ▼                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                       PDF GENERATOR                       │ │
│ │                                                           │ │
│ │  - Build PDF object structure                             │ │
│ │  - Convert recorded operations to PDF operators           │ │
│ │  - Generate content streams                               │ │
│ │  - Create xref table                                      │ │
│ │  - Write trailer                                          │ │
│ └─────────────────────────────┬─────────────────────────────┘ │
│                               │                               │
│                               ▼                               │
│                           PDF File                            │
│                                                               │
└───────────────────────────────────────────────────────────────┘
2. Graphics Operation Recording
The key insight: Instead of drawing pixels, record the operations.
This is the โoutput deviceโ abstraction that Ghostscript uses:
// Traditional rendering device
typedef struct {
void (*moveto)(Device* dev, double x, double y);
void (*lineto)(Device* dev, double x, double y);
void (*stroke)(Device* dev);
void (*fill)(Device* dev);
// ...
} DeviceOps;
// Pixel rendering implementation
void render_stroke(Device* dev) {
// Actually draw pixels to bitmap
rasterize_path(dev->path, dev->color);
}
// PDF capturing implementation
void record_stroke(Device* dev) {
// Just record "stroke" operation
add_operation(dev->ops, OP_STROKE, NULL, 0);
}
3. PostScript to PDF Operator Mapping
Most mappings are direct:
| PostScript | PDF | Notes |
|---|---|---|
| moveto | m | Identical semantics |
| lineto | l | Identical semantics |
| curveto | c | Identical semantics |
| closepath | h | Identical semantics |
| stroke | S | Identical semantics |
| fill | f | Identical semantics |
| gsave | q | Identical semantics |
| grestore | Q | Identical semantics |
| setgray | g (fill) / G (stroke) | PDF separates fill/stroke color |
| setrgbcolor | rg (fill) / RG (stroke) | PDF separates fill/stroke color |
| translate, scale, rotate | cm | Concatenate to CTM |
| show | Tj inside BT/ET | PDF requires text mode |
| showpage | (page boundary) | Start new page |
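Key differences:
- Color: PostScript has one current color used for both fill and stroke. PDF keeps separate fill and stroke colors.
- Text: PostScript uses show directly. PDF requires a BT…ET block.
- Loops/Conditionals: Must be executed; output is flattened.
- Arcs: PDF has no arc operator, so anything drawn with arc must be approximated with Bézier curves (c).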
4. PDF Object Structure
Your converter must generate this structure:
┌───────────────────────────────────────────────────────────────┐
│                     PDF OBJECT HIERARCHY                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Object 1: CATALOG                                            │
│  << /Type /Catalog                                            │
│     /Pages 2 0 R >>                                           │
│        │                                                      │
│        ▼                                                      │
│  Object 2: PAGES TREE                                         │
│  << /Type /Pages                                              │
│     /Kids [3 0 R 5 0 R ...]   ← One entry per page            │
│     /Count N >>                                               │
│        │                                                      │
│  ┌─────┴─────┬───────────┐                                    │
│  ▼           ▼           ▼                                    │
│  Object 3    Object 5    Object 7 ...                         │
│  PAGE 1      PAGE 2      PAGE 3                               │
│  << /Type /Page                                               │
│     /Parent 2 0 R                                             │
│     /MediaBox [0 0 612 792]                                   │
│     /Contents 4 0 R                                           │
│     /Resources <<                                             │
│       /Font << /F1 9 0 R >> >> >>                             │
│        │                                                      │
│        ▼                                                      │
│  Object 4: CONTENT STREAM (Page 1)                            │
│  << /Length 26 >>                                             │
│  stream                                                       │
│  q                                                            │
│  100 100 m                                                    │
│  200 200 l                                                    │
│  S                                                            │
│  Q                                                            │
│  endstream                                                    │
│                                                               │
│  Object 9: FONT                                               │
│  << /Type /Font                                               │
│     /Subtype /Type1                                           │
│     /BaseFont /Helvetica >>                                   │
│                                                               │
└───────────────────────────────────────────────────────────────┘
The object numbers follow the layout used in Phase 6: page objects at 3, 5, 7, ..., content streams at 4, 6, 8, ..., and fonts after the last content stream (object 9 for a three-page document). The content stream is shown uncompressed; with /Filter /FlateDecode the bytes between stream and endstream would be compressed data and /Length would be the compressed size.
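Since every reference of the form `N 0 R` depends on knowing object numbers before the objects are written, it helps to put the numbering rule in one place. The helpers below are a sketch of the scheme Phase 6 of this guide uses (catalog = 1, pages tree = 2, then page/content pairs, then fonts); the function names are invented for this example.

```c
/* Object-number layout assumed here:
 *   1            catalog
 *   2            pages tree
 *   3, 5, 7...   page objects
 *   4, 6, 8...   content streams
 *   then         one object per font */
int page_obj_num(int page_index)    { return 3 + 2 * page_index; }
int content_obj_num(int page_index) { return 4 + 2 * page_index; }

int font_obj_num(int page_count, int font_index) {
    return 3 + 2 * page_count + font_index;
}

int total_obj_count(int page_count, int font_count) {
    return 2 + 2 * page_count + font_count;
}
```

With three pages and one font, `font_obj_num(3, 0)` is 9 and `total_obj_count(3, 1)` is also 9, so the trailer's /Size would be 10 (object 0 counts).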
5. xref Table Generation
You must track byte offsets as you write:
#define MAX_OBJECTS 1000
long offsets[MAX_OBJECTS + 1];  // Indexed by object number; entry 0 is unused
int max_obj_num = 0;
void write_object(FILE* f, int obj_num, const char* content) {
    // Record offset BEFORE writing
    offsets[obj_num] = ftell(f);
    if (obj_num > max_obj_num) max_obj_num = obj_num;
    // Write object
    fprintf(f, "%d 0 obj\n%s\nendobj\n", obj_num, content);
}
void write_xref(FILE* f) {
    long xref_offset = ftell(f);
    fprintf(f, "xref\n");
    fprintf(f, "0 %d\n", max_obj_num + 1);
    fprintf(f, "0000000000 65535 f \n"); // Object 0 is free
    // Entries must appear in object-number order, regardless of the order
    // in which the objects were written (assumes numbers 1..max are all used)
    for (int i = 1; i <= max_obj_num; i++) {
        fprintf(f, "%010ld 00000 n \n", offsets[i]);
    }
    fprintf(f, "trailer\n");
    fprintf(f, "<< /Size %d /Root 1 0 R >>\n", max_obj_num + 1);
    fprintf(f, "startxref\n");
    fprintf(f, "%ld\n", xref_offset);
    fprintf(f, "%%%%EOF\n");
}
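Wrong offsets are hard to spot by eye, so a small self-check can save debugging time. The sketch below seeks to a recorded offset and confirms that the bytes there really begin with `N 0 obj`. The function name is hypothetical, and it assumes the output file was opened for both writing and reading (for example with mode "wb+").

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the bytes at `offset` in f start with "<obj_num> 0 obj".
 * The file position is restored afterwards so writing can continue. */
int offset_points_at_object(FILE* f, long offset, int obj_num) {
    char expected[32], actual[32];
    int n = snprintf(expected, sizeof expected, "%d 0 obj", obj_num);
    long saved = ftell(f);
    if (n < 0 || saved < 0 || fseek(f, offset, SEEK_SET) != 0) return 0;
    size_t got = fread(actual, 1, (size_t)n, f);
    fseek(f, saved, SEEK_SET);
    return got == (size_t)n && memcmp(expected, actual, (size_t)n) == 0;
}
```

Calling this for every entry just before writing the xref table catches the most common mistake, which is recording the offset after the object has already been written.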
Project Specification
Core Features
Your PS→PDF converter must:
- Parse PostScript
  - Tokenize numbers, names, strings, procedures
  - Handle comments (% to end of line)
- Execute PostScript
  - Implement stack operations (dup, exch, pop, roll, etc.)
  - Implement arithmetic (add, sub, mul, div, etc.)
  - Implement path operations (moveto, lineto, curveto, closepath)
  - Implement paint operations (stroke, fill)
  - Implement state operations (gsave, grestore, translate, scale, rotate)
  - Implement color operations (setgray, setrgbcolor)
  - Detect page boundaries (showpage)
- Record Operations
  - Capture all graphics operations during execution
  - Track page boundaries
  - Record font usage
- Generate PDF
  - Write valid PDF header
  - Create catalog, pages tree, page objects
  - Generate content streams from recorded operations
  - Create font resource dictionaries
  - Build xref table with correct byte offsets
  - Write trailer
- Handle Fonts
  - Map standard font names (Helvetica, Times-Roman, Courier)
  - Create font resource dictionaries
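The parsing requirement is the first one you will need in practice, so here is a rough sketch of a tokenizer covering numbers, names, literal names, strings, procedure braces, and comments. All type and function names are assumptions for this example. It leans on `strtod` for number recognition, so radix numbers such as `16#FF` are not handled, and escape sequences inside strings are kept literally rather than decoded.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_NUMBER, TOK_NAME, TOK_LITERAL_NAME, TOK_STRING,
               TOK_PROC_BEGIN, TOK_PROC_END, TOK_EOF } TokenType;

typedef struct {
    TokenType type;
    double number;      /* valid for TOK_NUMBER */
    char text[256];     /* names and strings (truncated if longer) */
} Token;

static int is_delim(char c) {
    return c == '\0' || isspace((unsigned char)c) || strchr("()<>[]{}/%", c) != NULL;
}

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char** src) {
    Token tok = { TOK_EOF, 0.0, "" };
    const char* p = *src;
    for (;;) {                                   /* skip whitespace and % comments */
        while (isspace((unsigned char)*p)) p++;
        if (*p != '%') break;
        while (*p && *p != '\n') p++;
    }
    if (*p == '\0') { *src = p; return tok; }
    if (*p == '{') { tok.type = TOK_PROC_BEGIN; p++; }
    else if (*p == '}') { tok.type = TOK_PROC_END; p++; }
    else if (*p == '(') {
        /* String: copy up to the matching ')', allowing nested parentheses */
        size_t n = 0;
        int depth = 1;
        tok.type = TOK_STRING;
        p++;
        while (*p && depth > 0) {
            if (*p == '\\' && p[1]) p++;         /* keep the escaped char as-is */
            else if (*p == '(') depth++;
            else if (*p == ')' && --depth == 0) break;
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
        if (*p == ')') p++;
    } else {
        /* Number or name: a run of non-delimiters; a leading '/' marks a literal name */
        size_t n = 0;
        int literal = (*p == '/');
        char* end;
        if (literal) p++;
        while (!is_delim(*p)) {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
        if (n == 0 && !literal) tok.text[n++] = *p++;   /* lone delimiter such as [ or ] */
        tok.text[n] = '\0';
        tok.number = strtod(tok.text, &end);
        if (!literal && *end == '\0') tok.type = TOK_NUMBER;
        else tok.type = literal ? TOK_LITERAL_NAME : TOK_NAME;
    }
    *src = p;
    return tok;
}
```

A driver loop would call `next_token` until it returns `TOK_EOF`, pushing numbers and strings onto the operand stack and looking names up in the operator table.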
Command-Line Interface
# Basic conversion
./ps2pdf input.ps output.pdf
# With verbose logging
./ps2pdf --verbose input.ps output.pdf
# With compression
./ps2pdf --compress input.ps output.pdf
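One way to handle these invocations is a small hand-rolled parser; the sketch below accepts the two flags in any position plus exactly two file names. The `Options` struct and `parse_args` are names chosen for this example, not something the guide prescribes.

```c
#include <string.h>

typedef struct {
    int verbose;
    int compress;
    const char* input;
    const char* output;
} Options;

/* Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char** argv, Options* opts) {
    *opts = (Options){0};
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--verbose") == 0) opts->verbose = 1;
        else if (strcmp(argv[i], "--compress") == 0) opts->compress = 1;
        else if (argv[i][0] == '-' && argv[i][1] == '-') return -1;  /* unknown flag */
        else if (!opts->input) opts->input = argv[i];
        else if (!opts->output) opts->output = argv[i];
        else return -1;                          /* too many file arguments */
    }
    return (opts->input && opts->output) ? 0 : -1;
}
```

`main` can then print the usage line whenever `parse_args` returns -1 and pass the populated struct to the conversion function otherwise.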
Solution Architecture
High-Level Design
┌───────────────────────────────────────────────────────────────┐
│                       PS→PDF CONVERTER                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                       PS TOKENIZER                        │ │
│ │                                                           │ │
│ │  Input: PostScript text                                   │ │
│ │  Output: Token stream (NUMBER, NAME, STRING, PROC, ...)   │ │
│ └───────────────────────────────────────────────────────────┘ │
│                               ↓                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                      PS INTERPRETER                       │ │
│ │                                                           │ │
│ │  ┌─────────────────┐  ┌─────────────────────────────┐     │ │
│ │  │ Operand Stack   │  │ Graphics State              │     │ │
│ │  │ [100, 200]      │  │ CTM, color, path, font      │     │ │
│ │  └─────────────────┘  └─────────────────────────────┘     │ │
│ │                                                           │ │
│ │  Operators: add, moveto, stroke, gsave, show, etc.        │ │
│ └───────────────────────────────────────────────────────────┘ │
│                               ↓                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                    OPERATION RECORDER                     │ │
│ │                                                           │ │
│ │  Record instead of render:                                │ │
│ │  - moveto(100, 200) → OP_MOVETO{100, 200}                 │ │
│ │  - stroke() → OP_STROKE{}                                 │ │
│ │  - showpage() → OP_SHOWPAGE{} (page boundary)             │ │
│ │                                                           │ │
│ │  Page 1: [op, op, op, ...]                                │ │
│ │  Page 2: [op, op, op, ...]                                │ │
│ └───────────────────────────────────────────────────────────┘ │
│                               ↓                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │                       PDF GENERATOR                       │ │
│ │                                                           │ │
│ │  1. Allocate object numbers                               │ │
│ │  2. Write header                                          │ │
│ │  3. Write catalog (obj 1)                                 │ │
│ │  4. Write pages tree (obj 2)                              │ │
│ │  5. For each page:                                        │ │
│ │     - Write page object                                   │ │
│ │     - Convert ops to content stream                       │ │
│ │     - Write content stream object                         │ │
│ │  6. Write font objects                                    │ │
│ │  7. Write xref table                                      │ │
│ │  8. Write trailer                                         │ │
│ └───────────────────────────────────────────────────────────┘ │
│                               ↓                               │
│                          PDF Output                           │
│                                                               │
└───────────────────────────────────────────────────────────────┘
Key Data Structures
// Recorded operation
typedef enum {
OP_MOVETO, OP_LINETO, OP_CURVETO, OP_CLOSEPATH,
OP_STROKE, OP_FILL,
OP_GSAVE, OP_GRESTORE,
OP_SETGRAY, OP_SETRGBCOLOR,
OP_CONCAT,
OP_SETFONT, OP_SHOWTEXT,
OP_SHOWPAGE
} OpType;
typedef struct {
OpType type;
union {
struct { double x, y; } point; // moveto, lineto
struct { double x1, y1, x2, y2, x3, y3; } curve; // curveto
struct { double gray; } setgray;
struct { double r, g, b; } setrgb;
struct { double matrix[6]; } concat;
struct { char* font; double size; } setfont;
struct { char* text; } showtext;
} data;
} RecordedOp;
// Page of recorded operations
typedef struct {
RecordedOp* ops;
size_t count;
size_t capacity;
double width, height; // Page size
} RecordedPage;
// Document (all pages)
typedef struct {
RecordedPage** pages; // Array of page pointers (matches doc->pages[i] usage)
size_t page_count;
// Fonts used
char** fonts;
size_t font_count;
} RecordedDocument;
// Interpreter state
typedef struct {
// Operand stack
double* stack;
int stack_ptr;
// Graphics state
double ctm[6];
double color_r, color_g, color_b;
double line_width;
char* current_font;
double font_size;
double current_x, current_y; // Current point (updated by moveto/lineto)
// Current path
struct {
double x, y;
int type; // 0=move, 1=line, 2=curve
} path[10000];
size_t path_len;
// Graphics state stack
struct GraphicsState* gstack;
int gstack_ptr;
// Recording
RecordedDocument* doc;
RecordedPage* current_page;
} Interpreter;
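The operator implementations later in this guide call `pop(interp)` without showing it. A minimal sketch of an operand stack with overflow and underflow checks might look like the following. It uses a small standalone `OperandStack` type rather than the full `Interpreter` so that it compiles on its own; the names and the 500-entry limit are assumptions.

```c
#define STACK_MAX 500   /* assumed limit for this sketch */

typedef struct {
    double items[STACK_MAX];
    int count;
    int error;          /* set to 1 on overflow or underflow */
} OperandStack;

void stack_push(OperandStack* s, double value) {
    if (s->count >= STACK_MAX) { s->error = 1; return; }   /* PS: stackoverflow */
    s->items[s->count++] = value;
}

double stack_pop(OperandStack* s) {
    if (s->count == 0) { s->error = 1; return 0.0; }       /* PS: stackunderflow */
    return s->items[--s->count];
}

/* exch: swap the top two operands */
void stack_exch(OperandStack* s) {
    if (s->count < 2) { s->error = 1; return; }
    double top = s->items[s->count - 1];
    s->items[s->count - 1] = s->items[s->count - 2];
    s->items[s->count - 2] = top;
}

/* dup: duplicate the top operand */
void stack_dup(OperandStack* s) {
    if (s->count == 0) { s->error = 1; return; }
    stack_push(s, s->items[s->count - 1]);
}
```

Operands come off in reverse order: after `100 200 moveto`, the first pop yields y (200) and the second yields x (100), which is why the operator code pops y before x.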
Implementation Guide
Phase 1: Minimal PDF Generator (Week 1, Days 1-2)
Start by generating a valid empty PDF:
void generate_empty_pdf(const char* filename) {
FILE* f = fopen(filename, "wb");
// Track offsets
long offset_1, offset_2, offset_3, xref_offset;
// Header
fprintf(f, "%%PDF-1.4\n");
fprintf(f, "%%\xe2\xe3\xcf\xd3\n"); // Binary marker
// Object 1: Catalog
offset_1 = ftell(f);
fprintf(f, "1 0 obj\n");
fprintf(f, "<< /Type /Catalog /Pages 2 0 R >>\n");
fprintf(f, "endobj\n");
// Object 2: Pages tree
offset_2 = ftell(f);
fprintf(f, "2 0 obj\n");
fprintf(f, "<< /Type /Pages /Kids [3 0 R] /Count 1 >>\n");
fprintf(f, "endobj\n");
// Object 3: Page
offset_3 = ftell(f);
fprintf(f, "3 0 obj\n");
fprintf(f, "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\n");
fprintf(f, "endobj\n");
// xref table
xref_offset = ftell(f);
fprintf(f, "xref\n");
fprintf(f, "0 4\n");
fprintf(f, "0000000000 65535 f \n");
fprintf(f, "%010ld 00000 n \n", offset_1);
fprintf(f, "%010ld 00000 n \n", offset_2);
fprintf(f, "%010ld 00000 n \n", offset_3);
// Trailer
fprintf(f, "trailer\n");
fprintf(f, "<< /Size 4 /Root 1 0 R >>\n");
fprintf(f, "startxref\n");
fprintf(f, "%ld\n", xref_offset);
fprintf(f, "%%%%EOF\n");
fclose(f);
}
Test: Open the PDF in a viewer - should show a blank page.
Phase 2: PDF with Content Stream (Week 1, Days 3-4)
Add a hardcoded content stream:
void generate_pdf_with_line(const char* filename) {
FILE* f = fopen(filename, "wb");
long offsets[10];
int obj_num = 0;
fprintf(f, "%%PDF-1.4\n%%\xe2\xe3\xcf\xd3\n");
// Object 1: Catalog
offsets[++obj_num] = ftell(f);
fprintf(f, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
// Object 2: Pages
offsets[++obj_num] = ftell(f);
fprintf(f, "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n");
// Object 3: Page with content reference
offsets[++obj_num] = ftell(f);
fprintf(f, "3 0 obj\n");
fprintf(f, "<< /Type /Page /Parent 2 0 R ");
fprintf(f, "/MediaBox [0 0 612 792] ");
fprintf(f, "/Contents 4 0 R >>\n");
fprintf(f, "endobj\n");
// Object 4: Content stream
const char* content = "100 100 m\n200 200 l\nS\n";
size_t content_len = strlen(content);
offsets[++obj_num] = ftell(f);
fprintf(f, "4 0 obj\n");
fprintf(f, "<< /Length %zu >>\n", content_len);
fprintf(f, "stream\n");
fprintf(f, "%s", content);
fprintf(f, "endstream\n");
fprintf(f, "endobj\n");
// xref
long xref_offset = ftell(f);
fprintf(f, "xref\n0 %d\n", obj_num + 1);
fprintf(f, "0000000000 65535 f \n");
for (int i = 1; i <= obj_num; i++) {
fprintf(f, "%010ld 00000 n \n", offsets[i]);
}
fprintf(f, "trailer\n<< /Size %d /Root 1 0 R >>\n", obj_num + 1);
fprintf(f, "startxref\n%ld\n%%%%EOF\n", xref_offset);
fclose(f);
}
Test: Should show a diagonal line from (100,100) to (200,200).
Phase 3: Operation Recording (Week 1, Day 5 - Week 2, Day 2)
Create the recording infrastructure:
RecordedPage* new_page(double width, double height) {
RecordedPage* page = calloc(1, sizeof(RecordedPage));
page->width = width;
page->height = height;
page->capacity = 1000;
page->ops = calloc(page->capacity, sizeof(RecordedOp));
return page;
}
void record_op(RecordedPage* page, RecordedOp op) {
if (page->count >= page->capacity) {
page->capacity *= 2;
page->ops = realloc(page->ops, page->capacity * sizeof(RecordedOp));
}
page->ops[page->count++] = op;
}
void record_moveto(Interpreter* interp, double x, double y) {
RecordedOp op = {
.type = OP_MOVETO,
.data.point = {x, y}
};
record_op(interp->current_page, op);
}
void record_stroke(Interpreter* interp) {
RecordedOp op = { .type = OP_STROKE };
record_op(interp->current_page, op);
}
void record_showpage(Interpreter* interp) {
RecordedOp op = { .type = OP_SHOWPAGE };
record_op(interp->current_page, op);
// Start new page
interp->doc->page_count++;
interp->doc->pages = realloc(interp->doc->pages,
interp->doc->page_count * sizeof(RecordedPage*));
interp->current_page = new_page(612, 792);
interp->doc->pages[interp->doc->page_count - 1] = interp->current_page;
}
Phase 4: PS Interpreter with Recording (Week 2, Days 3-5)
Integrate recording into the interpreter:
void ps_moveto(Interpreter* interp) {
double y = pop(interp);
double x = pop(interp);
// Update internal state
interp->current_x = x;
interp->current_y = y;
// Record operation
record_moveto(interp, x, y);
}
void ps_lineto(Interpreter* interp) {
double y = pop(interp);
double x = pop(interp);
// Update internal state
interp->current_x = x;
interp->current_y = y;
// Record operation
RecordedOp op = {
.type = OP_LINETO,
.data.point = {x, y}
};
record_op(interp->current_page, op);
}
void ps_stroke(Interpreter* interp) {
// Record (no internal state change needed)
record_stroke(interp);
}
void ps_setgray(Interpreter* interp) {
double gray = pop(interp);
// Update internal state
interp->color_r = interp->color_g = interp->color_b = gray;
// Record operation
RecordedOp op = {
.type = OP_SETGRAY,
.data.setgray = {gray}
};
record_op(interp->current_page, op);
}
void ps_gsave(Interpreter* interp) {
// Push current state
push_gstate(interp);
// Record operation
RecordedOp op = { .type = OP_GSAVE };
record_op(interp->current_page, op);
}
void ps_showpage(Interpreter* interp) {
record_showpage(interp);
}
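Something still has to connect a name token such as `moveto` to the matching `ps_moveto` function. A lookup table of name/function pairs is one simple way to do that. The sketch below uses a toy `Interp` struct and toy operators so it stands alone; in the real converter the table would point at the `ps_*` functions and take the guide's `Interpreter*`.

```c
#include <stddef.h>
#include <string.h>

typedef struct { int calls; const char* last; } Interp;   /* toy stand-in */
typedef void (*OperatorFn)(Interp*);
typedef struct { const char* name; OperatorFn fn; } OperatorEntry;

static void op_moveto(Interp* in)   { in->calls++; in->last = "moveto"; }
static void op_lineto(Interp* in)   { in->calls++; in->last = "lineto"; }
static void op_stroke(Interp* in)   { in->calls++; in->last = "stroke"; }
static void op_showpage(Interp* in) { in->calls++; in->last = "showpage"; }

static const OperatorEntry OPERATORS[] = {
    { "moveto",   op_moveto },
    { "lineto",   op_lineto },
    { "stroke",   op_stroke },
    { "showpage", op_showpage },
};

/* Runs the operator called `name`. Returns 1 if it exists, 0 otherwise. */
int execute_name(Interp* in, const char* name) {
    for (size_t i = 0; i < sizeof OPERATORS / sizeof OPERATORS[0]; i++) {
        if (strcmp(OPERATORS[i].name, name) == 0) {
            OPERATORS[i].fn(in);
            return 1;
        }
    }
    return 0;   /* a real interpreter would raise PostScript's `undefined` error */
}
```

A linear scan is fine for a few dozen operators. Once user-defined procedures and dictionaries enter the picture, the lookup would need to consult the dictionary stack first.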
Phase 5: PDF Content Stream Generation (Week 3, Days 1-3)
Convert recorded operations to PDF syntax:
char* generate_content_stream(RecordedPage* page, size_t* out_len) {
    // Allocate buffer (initial estimate; grown below as needed)
    size_t capacity = page->count * 64 + 256;
    char* buffer = malloc(capacity);
    size_t len = 0;
    // State needed to place text: current point, font size, open path
    double cur_x = 0, cur_y = 0, font_size = 12;
    int path_open = 0;
    for (size_t i = 0; i < page->count; i++) {
        RecordedOp* op = &page->ops[i];
        // Make sure the next operator fits BEFORE writing it
        size_t need = 256;
        if (op->type == OP_SHOWTEXT) need += strlen(op->data.showtext.text);
        if (len + need > capacity) {
            capacity = (len + need) * 2;
            buffer = realloc(buffer, capacity);
        }
        // Numbers use %.4f, not %g: PDF does not allow exponent notation
        switch (op->type) {
        case OP_MOVETO:
            len += sprintf(buffer + len, "%.4f %.4f m\n",
                           op->data.point.x, op->data.point.y);
            cur_x = op->data.point.x; cur_y = op->data.point.y; path_open = 1;
            break;
        case OP_LINETO:
            len += sprintf(buffer + len, "%.4f %.4f l\n",
                           op->data.point.x, op->data.point.y);
            cur_x = op->data.point.x; cur_y = op->data.point.y; path_open = 1;
            break;
        case OP_CURVETO:
            len += sprintf(buffer + len, "%.4f %.4f %.4f %.4f %.4f %.4f c\n",
                           op->data.curve.x1, op->data.curve.y1,
                           op->data.curve.x2, op->data.curve.y2,
                           op->data.curve.x3, op->data.curve.y3);
            cur_x = op->data.curve.x3; cur_y = op->data.curve.y3; path_open = 1;
            break;
        case OP_CLOSEPATH:
            len += sprintf(buffer + len, "h\n");
            break;
        case OP_STROKE:
            len += sprintf(buffer + len, "S\n");
            path_open = 0;
            break;
        case OP_FILL:
            len += sprintf(buffer + len, "f\n");
            path_open = 0;
            break;
        case OP_GSAVE:
            len += sprintf(buffer + len, "q\n");
            break;
        case OP_GRESTORE:
            len += sprintf(buffer + len, "Q\n");
            break;
        case OP_SETGRAY:
            // PDF uses 'g' for fill gray, 'G' for stroke gray
            // For simplicity, set both
            len += sprintf(buffer + len, "%.4f g\n", op->data.setgray.gray);
            len += sprintf(buffer + len, "%.4f G\n", op->data.setgray.gray);
            break;
        case OP_SETRGBCOLOR:
            len += sprintf(buffer + len, "%.4f %.4f %.4f rg\n",
                           op->data.setrgb.r, op->data.setrgb.g, op->data.setrgb.b);
            len += sprintf(buffer + len, "%.4f %.4f %.4f RG\n",
                           op->data.setrgb.r, op->data.setrgb.g, op->data.setrgb.b);
            break;
        case OP_CONCAT:
            len += sprintf(buffer + len, "%.4f %.4f %.4f %.4f %.4f %.4f cm\n",
                           op->data.concat.matrix[0], op->data.concat.matrix[1],
                           op->data.concat.matrix[2], op->data.concat.matrix[3],
                           op->data.concat.matrix[4], op->data.concat.matrix[5]);
            break;
        case OP_SETFONT:
            // Only remember the size; Tf is emitted inside each BT...ET block
            // Note: need font mapping (F1, F2, etc.)
            font_size = op->data.setfont.size;
            break;
        case OP_SHOWTEXT:
            // The moveto that positioned the text left an unpainted path open;
            // end it with 'n' because BT may not appear inside a path
            if (path_open) {
                len += sprintf(buffer + len, "n\n");
                path_open = 0;
            }
            // One self-contained text object at the current point.
            // The text must already have '(', ')' and '\' escaped.
            // Simplification: the current point is not advanced by the text width.
            len += sprintf(buffer + len, "BT\n/F1 %.4f Tf\n%.4f %.4f Td\n(%s) Tj\nET\n",
                           font_size, cur_x, cur_y, op->data.showtext.text);
            break;
        case OP_SHOWPAGE:
            // Ignored in content stream (handled at page level)
            break;
        }
    }
    *out_len = len;
    return buffer;
}
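One detail the content stream generator depends on is string escaping. Text written between parentheses for Tj must not contain a bare backslash or an unbalanced parenthesis. The helper below is a sketch that escapes every `(`, `)`, and `\` with a backslash, which is always valid even when the parentheses happen to be balanced. The function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of text with '(', ')' and '\' escaped,
 * ready to be placed between parentheses in a content stream.
 * The caller frees the result. Returns NULL if allocation fails. */
char* pdf_escape_string(const char* text) {
    size_t n = strlen(text);
    char* out = malloc(2 * n + 1);          /* worst case: every char escaped */
    if (!out) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (text[i] == '(' || text[i] == ')' || text[i] == '\\') {
            out[j++] = '\\';
        }
        out[j++] = text[i];
    }
    out[j] = '\0';
    return out;
}
```

A natural place to call it is when the `show` operator records its text, so that every recorded string is already safe to print with `(%s) Tj`.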
Phase 6: Complete PDF Generation (Week 3, Days 4-5)
void generate_pdf_from_document(RecordedDocument* doc, const char* filename) {
FILE* f = fopen(filename, "wb");
// Allocate object numbers
// 1 = Catalog
// 2 = Pages tree
// 3, 5, 7, ... = Page objects
// 4, 6, 8, ... = Content stream objects
// Last objects = Fonts
int total_objects = 2 + doc->page_count * 2 + doc->font_count;
long* offsets = calloc(total_objects + 1, sizeof(long));
// Write header
fprintf(f, "%%PDF-1.4\n%%\xe2\xe3\xcf\xd3\n");
// Object 1: Catalog
offsets[1] = ftell(f);
fprintf(f, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
// Object 2: Pages tree
offsets[2] = ftell(f);
fprintf(f, "2 0 obj\n<< /Type /Pages /Kids [");
for (size_t i = 0; i < doc->page_count; i++) {
fprintf(f, "%zu 0 R ", 3 + i * 2);
}
fprintf(f, "] /Count %zu >>\nendobj\n", doc->page_count);
// Font object numbers start here
int font_base = 3 + doc->page_count * 2;
// Pages and content streams
for (size_t i = 0; i < doc->page_count; i++) {
RecordedPage* page = doc->pages[i];
int page_obj = 3 + i * 2;
int content_obj = 4 + i * 2;
// Generate content stream
size_t content_len;
char* content = generate_content_stream(page, &content_len);
// Write content stream object
offsets[content_obj] = ftell(f);
fprintf(f, "%d 0 obj\n", content_obj);
fprintf(f, "<< /Length %zu >>\n", content_len);
fprintf(f, "stream\n");
fwrite(content, 1, content_len, f);
fprintf(f, "endstream\nendobj\n");
free(content);
// Write page object
offsets[page_obj] = ftell(f);
fprintf(f, "%d 0 obj\n", page_obj);
fprintf(f, "<< /Type /Page\n");
fprintf(f, " /Parent 2 0 R\n");
fprintf(f, " /MediaBox [0 0 %.4g %.4g]\n", page->width, page->height);
fprintf(f, " /Contents %d 0 R\n", content_obj);
// Resources with fonts
if (doc->font_count > 0) {
fprintf(f, " /Resources <<\n");
fprintf(f, " /Font <<\n");
for (size_t j = 0; j < doc->font_count; j++) {
fprintf(f, " /F%zu %d 0 R\n", j + 1, font_base + (int)j);
}
fprintf(f, " >>\n");
fprintf(f, " >>\n");
}
fprintf(f, ">>\nendobj\n");
}
// Font objects
for (size_t i = 0; i < doc->font_count; i++) {
offsets[font_base + i] = ftell(f);
fprintf(f, "%d 0 obj\n", font_base + (int)i);
fprintf(f, "<< /Type /Font\n");
fprintf(f, " /Subtype /Type1\n");
fprintf(f, " /BaseFont /%s >>\n", doc->fonts[i]);
fprintf(f, "endobj\n");
}
// xref table
long xref_offset = ftell(f);
fprintf(f, "xref\n0 %d\n", total_objects + 1);
fprintf(f, "0000000000 65535 f \n");
for (int i = 1; i <= total_objects; i++) {
fprintf(f, "%010ld 00000 n \n", offsets[i]);
}
// Trailer
fprintf(f, "trailer\n<< /Size %d /Root 1 0 R >>\n", total_objects + 1);
fprintf(f, "startxref\n%ld\n%%%%EOF\n", xref_offset);
free(offsets);
fclose(f);
}
Phase 7: Main Conversion Function (Week 4)
int convert_ps_to_pdf(const char* ps_file, const char* pdf_file) {
// Read PostScript file
FILE* f = fopen(ps_file, "r");
if (!f) {
fprintf(stderr, "Cannot open %s\n", ps_file);
return 1;
}
fseek(f, 0, SEEK_END);
size_t size = ftell(f);
fseek(f, 0, SEEK_SET);
char* ps_content = malloc(size + 1);
fread(ps_content, 1, size, f);
ps_content[size] = '\0';
fclose(f);
// Initialize interpreter
Interpreter* interp = interpreter_new();
// Initialize document
RecordedDocument* doc = calloc(1, sizeof(RecordedDocument));
doc->pages = calloc(1, sizeof(RecordedPage*));
doc->pages[0] = new_page(612, 792); // US Letter
doc->page_count = 1;
interp->doc = doc;
interp->current_page = doc->pages[0];
// Execute PostScript
execute_postscript(interp, ps_content);
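// record_showpage() opens a new page after every showpage, so the last
// page is empty if the file ended with showpage; drop it
if (doc->page_count > 1 && doc->pages[doc->page_count - 1]->count == 0) {
free(doc->pages[doc->page_count - 1]->ops);
free(doc->pages[doc->page_count - 1]);
doc->page_count--;
}
// Generate PDF
generate_pdf_from_document(doc, pdf_file);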
// Cleanup
free(ps_content);
// ... free document and interpreter
printf("Converted %s to %s\n", ps_file, pdf_file);
return 0;
}
int main(int argc, char** argv) {
if (argc < 3) {
fprintf(stderr, "Usage: %s input.ps output.pdf\n", argv[0]);
return 1;
}
return convert_ps_to_pdf(argv[1], argv[2]);
}
Testing Strategy
Test Cases
- Empty page:
%!PS-Adobe-3.0
showpage
- Simple line:
%!PS-Adobe-3.0
100 100 moveto
200 200 lineto
stroke
showpage
- Filled shape:
%!PS-Adobe-3.0
newpath
100 100 moveto
300 100 lineto
300 300 lineto
100 300 lineto
closepath
0.5 setgray
fill
showpage
- Transformations:
%!PS-Adobe-3.0
gsave
200 200 translate
45 rotate
0 0 moveto
100 0 lineto
stroke
grestore
showpage
- Multiple pages (a font must be selected before show):
%!PS-Adobe-3.0
/Helvetica findfont 12 scalefont setfont
100 100 moveto (Page 1) show showpage
200 200 moveto (Page 2) show showpage
Validation
# Compare with Ghostscript
gs -sDEVICE=pdfwrite -o reference.pdf test.ps
./ps2pdf test.ps output.pdf
# Validate structure
qpdf --check output.pdf
# Compare text extraction
diff <(pdftotext reference.pdf -) <(pdftotext output.pdf -)
# Visual comparison
gs -sDEVICE=png16m -r150 -o ref%d.png reference.pdf
gs -sDEVICE=png16m -r150 -o out%d.png output.pdf
compare ref1.png out1.png diff.png
Common Pitfalls
1. xref Offset Precision
Each xref entry must be exactly 20 bytes long, so the offset must be zero-padded to exactly 10 digits:
// WRONG:
fprintf(f, "%ld 00000 n \n", offset); // Might be 7 digits
// CORRECT:
fprintf(f, "%010ld 00000 n \n", offset); // Always 10 digits
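Since the fixed width is what readers rely on, it can be worth funnelling every entry through one function that checks it. The sketch below formats an in-use entry and refuses offsets that cannot fit; `format_xref_entry` is a name invented for this example.

```c
#include <stdio.h>

/* Formats one in-use xref entry into buf, which needs room for at least
 * 21 bytes (20 for the entry plus the terminating NUL).
 * Returns 1 on success, 0 if the offset or the buffer does not fit. */
int format_xref_entry(char* buf, size_t buf_size, long offset) {
    if (buf_size < 21) return 0;
    if (offset < 0 || offset > 9999999999LL) return 0;   /* needs more than 10 digits */
    int n = snprintf(buf, buf_size, "%010ld 00000 n \n", offset);
    return n == 20;   /* 10 + space + 5 + space + 'n' + space + newline */
}
```

The trailing space before the newline matters: with a one-byte line ending, it is what brings the entry to 20 bytes.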
2. Stream Length Accuracy
The /Length value must exactly match the stream content:
size_t content_len;
char* content = generate_content_stream(page, &content_len);
fprintf(f, "<< /Length %zu >>\n", content_len); // Must match exactly
fprintf(f, "stream\n");
fwrite(content, 1, content_len, f); // Write exactly content_len bytes
fprintf(f, "endstream\n");
3. Color Separation
PDF separates fill and stroke colors. When converting from PostScript (one color):
// PS: setrgbcolor sets both
// PDF: Need to set both rg (fill) and RG (stroke)
// %.4f rather than %g: PDF numbers may not use exponent notation
len += sprintf(buffer + len, "%.4f %.4f %.4f rg\n", r, g, b);
len += sprintf(buffer + len, "%.4f %.4f %.4f RG\n", r, g, b);
4. Text Mode
PDF requires BT/ET around text operations:
// WRONG:
fprintf(f, "(Hello) Tj\n");
// CORRECT:
fprintf(f, "BT\n");
fprintf(f, "/F1 12 Tf\n");
fprintf(f, "100 700 Td\n");
fprintf(f, "(Hello) Tj\n");
fprintf(f, "ET\n");
Extensions
Level 1: Stream Compression
Add Flate compression for content streams:
#include <zlib.h>
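char* compress_stream(const char* data, size_t len, size_t* out_len) {
    uLongf compressed_len = compressBound(len);
    char* compressed = malloc(compressed_len);
    if (!compressed) return NULL;
    // compress() returns Z_OK on success and updates compressed_len
    if (compress((Bytef*)compressed, &compressed_len, (const Bytef*)data, len) != Z_OK) {
        free(compressed);
        return NULL;
    }
    *out_len = compressed_len;
    return compressed;
}
// When writing (compressed_len is the size returned through out_len;
// /Length must be the compressed size, not the original size):
fprintf(f, "<< /Length %zu /Filter /FlateDecode >>\n", compressed_len);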
Level 2: Font Embedding
Embed fonts instead of referencing by name.
Level 3: Image Support
Handle PostScript image operator and convert to PDF XObjects.
Level 4: Complete PostScript Support
Add loops, conditionals, procedures, dictionaries.
Self-Assessment
Before considering this project complete:
- Can convert a PostScript file with lines and fills to PDF
- Generated PDF opens correctly in multiple readers
- xref table has correct byte offsets
- Content stream syntax matches PDF specification
- Multiple pages are handled correctly
- Output matches Ghostscript's output for simple inputs
- Colors and transformations work correctly
Resources
Essential Reading
- Developing with PDF by Leonard Rosenthol - PDF generation
- PostScript Language Reference Manual - PS operators
- Engineering a Compiler by Cooper & Torczon - Code generation patterns
Tools
- Ghostscript: Reference implementation (gs -sDEVICE=pdfwrite)
- qpdf: PDF validation
- pdfinfo/pdftotext: PDF analysis