Project 16: XP Module - Extended Precision Unsigned Arithmetic

Build an extended precision arithmetic library for unsigned integers of arbitrary length, enabling computations far beyond native hardware limits.

Quick Reference

Attribute	Value
Difficulty	Level 4 - Expert
Time Estimate	2-3 weeks
Language	C
Prerequisites	Mem module, Assert module, understanding of binary arithmetic, bitwise operations
Key Topics	Arbitrary precision math, carry propagation, grade-school algorithms, cryptography foundations

1. Learning Objectives

What You Will Build

A complete XP (Extended Precision) module that represents and manipulates unsigned integers of arbitrary length using arrays of machine words. Your implementation will support:

Addition with carry propagation
Subtraction with borrow propagation
Multiplication (grade-school and optionally Karatsuba)
Division with remainder
Comparison operations
Conversion to/from strings and native integers
Bit shifting operations

Why It Teaches The Topic

Modern cryptography operates on numbers with hundreds or thousands of digits. A 2048-bit RSA key is a 617-digit decimal number. Your CPU’s largest native integer (typically 64 bits) maxes out at about 20 decimal digits. The XP module teaches you how to bridge this gap by representing large numbers as arrays of smaller units, implementing arithmetic algorithms that handle carry and borrow propagation across word boundaries.

This project connects abstract algorithm knowledge to practical implementation, teaching you:

How arbitrary precision libraries like GMP work internally
The mathematical foundations of public-key cryptography
Low-level optimization techniques for arithmetic operations
How to design clean C interfaces for complex mathematical operations

2. Theoretical Foundation

2.1 Core Concepts

Why Extended Precision Matters

Hardware Limit (64-bit):    18,446,744,073,709,551,615
2048-bit RSA modulus:       A number with ~617 decimal digits
1000!:                      A number with 2,568 decimal digits

Modern cryptography, scientific computing, and financial applications routinely work with numbers that exceed hardware capabilities. Extended precision arithmetic provides the foundation for:

RSA, Diffie-Hellman, and elliptic curve cryptography
Exact computation in computer algebra systems
High-precision scientific simulations
Cryptocurrency and blockchain implementations

Representing Large Numbers

The fundamental insight is storing numbers as arrays of smaller units (typically 32-bit or 64-bit words), similar to how we write decimal numbers as sequences of digits:

Extended Precision Number Representation
────────────────────────────────────────────────────────────────────────────────

Traditional decimal: 1,234,567,890
                     ^     ^     ^
                     |     |     |
                  Digit  Digit  Digit
                  positions

Extended precision (base 2^32):

    Number: 0x123456789ABCDEF0123456789ABCDEF0

    +-----------+-----------+-----------+-----------+
    | Word[3]   | Word[2]   | Word[1]   | Word[0]   |
    | 0x12345678| 0x9ABCDEF0| 0x12345678| 0x9ABCDEF0|
    +-----------+-----------+-----------+-----------+
         ^           ^           ^           ^
         |           |           |           |
    Most Significant          Least Significant

    Value = Word[3] * (2^32)^3 + Word[2] * (2^32)^2 +
            Word[1] * (2^32)^1 + Word[0] * (2^32)^0

Addition with Carry Propagation

Adding two XP numbers requires handling carries that ripple from least significant to most significant words:

XP Addition with Carry
────────────────────────────────────────────────────────────────────────────────

    a:    [0xFFFFFFFF] [0x00000001] [0x00000000]
  + b:    [0x00000001] [0xFFFFFFFF] [0x00000001]
    -------------------------------------------

Step 1: Add Word[0]s
    0x00000000 + 0x00000001 = 0x00000001, carry = 0

Step 2: Add Word[1]s + carry
    0x00000001 + 0xFFFFFFFF + 0 = 0x100000000
    Result = 0x00000000, carry = 1  (overflow!)

Step 3: Add Word[2]s + carry
    0xFFFFFFFF + 0x00000001 + 1 = 0x100000001
    Result = 0x00000001, carry = 1

Final:  [0x00000001] [0x00000000] [0x00000001] + overflow bit

Subtraction with Borrow

Subtraction mirrors addition but propagates borrows instead of carries:

XP Subtraction (Pseudocode)
────────────────────────────────────────────────────────────────────────────────

function xp_subtract(a[], b[], result[], n):
    borrow = 0
    for i = 0 to n-1:
        diff = a[i] - b[i] - borrow
        if diff underflows (negative):
            diff = diff + BASE  // BASE = 2^32
            borrow = 1
        else:
            borrow = 0
        result[i] = diff
    return borrow  // Non-zero means a < b

Multiplication Algorithms

The simplest approach is grade-school multiplication, where each word of one operand multiplies every word of the other:

Grade-School Multiplication
────────────────────────────────────────────────────────────────────────────────

Multiplying 3-word numbers:

         a[2]     a[1]     a[0]
    x    b[2]     b[1]     b[0]
    --------------------------------
         a*b[0]   (partial product 0)
    + a*b[1]      (partial product 1, shifted)
    + a*b[2]      (partial product 2, shifted twice)
    --------------------------------

Each a*b[i] produces a 4-word result (3 words * 1 word = 4 words max)

Time Complexity: O(n^2) multiplications

For very large numbers, the Karatsuba algorithm reduces multiplications at the cost of more additions:

Karatsuba Insight
────────────────────────────────────────────────────────────────────────────────

Traditional: (a1*B + a0) * (b1*B + b0) requires 4 multiplications:
    a1*b1, a1*b0, a0*b1, a0*b0

Karatsuba trick:
    z2 = a1 * b1
    z0 = a0 * b0
    z1 = (a1 + a0) * (b1 + b0) - z2 - z0

Result = z2*B^2 + z1*B + z0

Only 3 multiplications!
Time Complexity: O(n^1.585)

Division Algorithm

Division is the most complex operation, typically implemented as long division with estimation:

XP Division (Simplified Pseudocode)
────────────────────────────────────────────────────────────────────────────────

function xp_divide(dividend[], divisor[], quotient[], remainder[]):
    // Normalize: shift divisor so high bit is set
    shift = count_leading_zeros(divisor[high])
    dividend = shift_left(dividend, shift)
    divisor = shift_left(divisor, shift)

    for i = high_word down to 0:
        // Estimate quotient digit
        q_estimate = (dividend[i+1] * BASE + dividend[i]) / divisor[high]

        // Refine estimate (may be off by 1 or 2)
        while q_estimate * divisor > remaining_dividend:
            q_estimate -= 1

        quotient[i] = q_estimate
        dividend -= q_estimate * divisor * BASE^i

    remainder = shift_right(dividend, shift)  // Unnormalize

2.2 Why This Matters

The XP module is the foundation for the AP (Arbitrary Precision signed) and MP (Multiple Precision modular) modules. Understanding how to implement basic arithmetic operations on arbitrary-length integers gives you insight into:

How high-level languages like Python implement their unlimited-precision integers
How cryptographic libraries like OpenSSL and GMP work internally
Performance considerations in numerical computing
The relationship between algorithm complexity and practical performance

2.3 Historical Context

The Need for Extended Precision

Extended precision arithmetic has a long history:

1960s: Early work on arbitrary precision by Knuth and others
1970s: First arbitrary precision libraries for computer algebra systems
1991: GMP (GNU Multiple Precision) library released
1996: Hanson’s CII book formalizes the XP/AP/MP pattern
Today: Every major cryptographic implementation relies on these techniques

Real-World Applications

Where XP Arithmetic Is Used Today
────────────────────────────────────────────────────────────────────────────────

Cryptography:
    RSA Key Generation: Generate two 1024-bit primes, multiply to get 2048-bit modulus
    Diffie-Hellman: Exponentiate in large prime fields
    Elliptic Curves: Operations on curves over large finite fields

Scientific Computing:
    Mathematica/Maple/Sage: Exact symbolic computation
    High-precision physics simulations
    Numerical stability analysis

Financial Systems:
    Cryptocurrency: Bitcoin uses 256-bit arithmetic
    High-frequency trading: Avoid floating-point errors

Verification:
    Compiler correctness proofs
    Formal verification of hardware designs

2.4 Common Misconceptions

Misconception 1: “Extended precision is just like regular arithmetic”

Reality: Carry and borrow propagation, memory management, and overflow detection add significant complexity. A single addition may touch every word in the number.

Misconception 2: “Grade-school multiplication is always good enough”

Reality: For very large numbers (thousands of digits), algorithms like Karatsuba, Toom-Cook, or FFT-based multiplication provide significant speedups.

Misconception 3: “Division is similar in complexity to multiplication”

Reality: Division is significantly harder. It requires normalization, quotient estimation, and correction steps that make it 3-5x slower than multiplication.

Misconception 4: “Just use bigger integers”

Reality: Hardware integers have fixed sizes. When you need more, you must implement it yourself (or use a library).

3. Project Specification

3.1 What You Will Build

A complete XP module providing:

Type definition: XP_T as an array of unsigned machine words
Basic arithmetic: Addition, subtraction, multiplication, division
Comparison: Less than, equal, greater than
Conversion: To/from strings, to/from native integers
Bit operations: Shift left, shift right, bit counting
Utility functions: Copy, zero, normalization

3.2 Functional Requirements

Core Arithmetic Operations:

Operation	Function Signature	Description
Add	`int XP_add(int n, XP_T z, XP_T x, XP_T y, int carry)`	z = x + y + carry, returns final carry
Subtract	`int XP_sub(int n, XP_T z, XP_T x, XP_T y, int borrow)`	z = x - y - borrow, returns final borrow
Multiply	`void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y)`	z = x * y (z must have n+m words)
Divide	`int XP_div(int n, XP_T q, XP_T x, int m, XP_T y, XP_T r, XP_T tmp)`	q = x / y, r = x % y

Comparison Operations:

Operation	Function Signature	Description
Compare	`int XP_cmp(int n, XP_T x, XP_T y)`	Returns -1, 0, or 1

Conversion Operations:

Operation	Function Signature	Description
From unsigned long	`void XP_fromint(int n, XP_T z, unsigned long u)`	z = u
To unsigned long	`unsigned long XP_toint(int n, XP_T x)`	Returns low-order bits
From string	`int XP_fromstr(int n, XP_T z, const char str, int base, char *end)`	Parse string
To string	`char XP_tostr(char str, int size, int base, int n, XP_T x)`	Convert to string

Bit Operations:

Operation	Function Signature	Description
Shift left	`void XP_lshift(int n, XP_T z, int m, XP_T x, int s, int fill)`	z = x « s
Shift right	`void XP_rshift(int n, XP_T z, int m, XP_T x, int s, int fill)`	z = x » s
Count bits	`int XP_length(int n, XP_T x)`	Number of significant bits

3.3 Non-Functional Requirements

Portability: Must work on both 32-bit and 64-bit systems
Memory Safety: No buffer overflows, proper bounds checking
Performance: O(n) for addition/subtraction, O(n^2) for multiplication (minimum)
Testability: Clear preconditions that can be verified with assertions
Documentation: Clear API documentation in header file

3.4 Example Usage / Output

#include "xp.h"
#include <stdio.h>
#include <string.h>

int main(void) {
    // Calculate 2^256 - 1 (maximum value for 256-bit number)
    unsigned char a[32], b[32], c[32];
    char str[100];

    // Set a = 2^256 - 1 (all 1s)
    memset(a, 0xFF, 32);

    // Set b = 1
    memset(b, 0, 32);
    b[0] = 1;

    // c = a + b (should overflow to 0 with carry)
    int carry = XP_add(32, c, a, b, 0);

    printf("2^256 - 1 + 1:\n");
    printf("Carry: %d\n", carry);  // Should print 1
    printf("Result: %s\n", XP_tostr(str, sizeof(str), 16, 32, c));
    // Should print all zeros

    // Calculate 1000!
    unsigned char fact[400];  // Need about 400 bytes for 1000!
    memset(fact, 0, sizeof(fact));
    fact[0] = 1;

    for (int i = 2; i <= 1000; i++) {
        // Multiply fact by i
        unsigned char temp[400];
        unsigned char ival[4];
        XP_fromint(4, ival, i);
        XP_mul(temp, 400, fact, 4, ival);
        memcpy(fact, temp, 400);
    }

    printf("\n1000! has %d bits\n", XP_length(400, fact));
    // Should print about 8530 bits

    return 0;
}

Expected Output:

2^256 - 1 + 1:
Carry: 1
Result: 0

1000! has 8530 bits

4. Solution Architecture

4.1 High-Level Design

XP Module Architecture
────────────────────────────────────────────────────────────────────────────────

                    +------------------------+
                    |      Client Code       |
                    +------------------------+
                              |
                              v
    +----------------------------------------------------------+
    |                    xp.h (Interface)                       |
    |                                                          |
    |  typedef unsigned char *XP_T;  // Byte array             |
    |                                                          |
    |  Arithmetic:  XP_add, XP_sub, XP_mul, XP_div             |
    |  Comparison:  XP_cmp                                      |
    |  Conversion:  XP_fromint, XP_toint, XP_fromstr, XP_tostr |
    |  Bit ops:     XP_lshift, XP_rshift, XP_length            |
    +----------------------------------------------------------+
                              |
                              v
    +----------------------------------------------------------+
    |                    xp.c (Implementation)                  |
    |                                                          |
    |  Internal helpers:                                        |
    |  - normalize(): Remove leading zeros                      |
    |  - single_mul(): Multiply by single word                  |
    |  - single_div(): Divide by single word                    |
    |                                                          |
    |  Uses: assert.h, mem.h                                   |
    +----------------------------------------------------------+

4.2 Key Components

Header File (xp.h):

#ifndef XP_INCLUDED
#define XP_INCLUDED

/*
 * XP: Extended Precision Unsigned Arithmetic
 *
 * Provides operations on unsigned integers of arbitrary length,
 * represented as arrays of bytes in little-endian order.
 *
 * All numbers are stored least-significant byte first.
 * Length n is always in bytes.
 */

typedef unsigned char *XP_T;

/* Arithmetic operations - return carry/borrow */
extern int  XP_add(int n, XP_T z, XP_T x, XP_T y, int carry);
extern int  XP_sub(int n, XP_T z, XP_T x, XP_T y, int borrow);
extern void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y);
extern int  XP_div(int n, XP_T q, XP_T x, int m, XP_T y,
                   XP_T r, XP_T tmp);

/* Comparison - returns -1, 0, or 1 */
extern int  XP_cmp(int n, XP_T x, XP_T y);

/* Conversion */
extern void XP_fromint(int n, XP_T z, unsigned long u);
extern unsigned long XP_toint(int n, XP_T x);
extern int  XP_fromstr(int n, XP_T z, const char *str,
                       int base, char **end);
extern char *XP_tostr(char *str, int size, int base, int n, XP_T x);

/* Bit operations */
extern void XP_lshift(int n, XP_T z, int m, XP_T x, int s, int fill);
extern void XP_rshift(int n, XP_T z, int m, XP_T x, int s, int fill);
extern int  XP_length(int n, XP_T x);

/* Utility */
extern int  XP_sum(int n, XP_T z, XP_T x, int y);
extern int  XP_diff(int n, XP_T z, XP_T x, int y);
extern int  XP_product(int n, XP_T z, XP_T x, int y);
extern int  XP_quotient(int n, XP_T z, XP_T x, int y);
extern int  XP_neg(int n, XP_T z, XP_T x, int carry);

#endif

Implementation File Structure (xp.c):

#include "xp.h"
#include "assert.h"
#include <string.h>

/* Internal constants */
#define BASE 256  /* 2^8 for byte-based arithmetic */

/* Addition with carry propagation */
int XP_add(int n, XP_T z, XP_T x, XP_T y, int carry) {
    int i;
    for (i = 0; i < n; i++) {
        carry += x[i] + y[i];
        z[i] = carry % BASE;
        carry /= BASE;
    }
    return carry;
}

/* ... other functions ... */

4.3 Data Structures

Number Representation:

XP Number Layout (Little-Endian Byte Array)
────────────────────────────────────────────────────────────────────────────────

Example: The number 0x123456789ABCDEF0

In memory (little-endian):
    Index:  [0]   [1]   [2]   [3]   [4]   [5]   [6]   [7]
    Value:  0xF0  0xDE  0xBC  0x9A  0x78  0x56  0x34  0x12
            ^                                         ^
            |                                         |
           LSB                                       MSB

Why little-endian?
    - Natural alignment with x86/ARM memory order
    - Addition proceeds from index 0 upward (natural loop)
    - Easy to extend by appending bytes
    - Carry propagates in increasing index order

4.4 Algorithm Overview

Addition Algorithm:

XP Addition (carry propagation)
────────────────────────────────────────────────────────────────────────────────

For each byte position i from 0 to n-1:
    1. sum = x[i] + y[i] + carry_in
    2. z[i] = sum mod 256 (low byte)
    3. carry_out = sum / 256 (0 or 1)
    4. carry_in = carry_out for next iteration

Return final carry (0 or 1)

Example: 0xFFFF + 0x0001
    Position 0: 0xFF + 0x01 + 0 = 0x100 -> z[0]=0x00, carry=1
    Position 1: 0xFF + 0x00 + 1 = 0x100 -> z[1]=0x00, carry=1
    Return: carry=1

    Result: 0x10000 (as 3 bytes: [0x00, 0x00, 0x01])

Multiplication Algorithm (Grade-School):

XP Multiplication
────────────────────────────────────────────────────────────────────────────────

z[0..n+m-1] = x[0..n-1] * y[0..m-1]

for i = 0 to n-1:
    carry = 0
    for j = 0 to m-1:
        product = x[i] * y[j] + z[i+j] + carry
        z[i+j] = product mod 256
        carry = product / 256
    z[i+m] = carry

Note: z must be zeroed before multiplication
      z must have at least n+m bytes

Division Algorithm (Knuth’s Algorithm D):

XP Division (Simplified)
────────────────────────────────────────────────────────────────────────────────

Given: dividend x, divisor y
Find:  quotient q, remainder r such that x = q*y + r

1. Handle special cases (y=0 error, x<y gives q=0,r=x)

2. Normalize: shift both x and y left until y's high bit is set
   This makes quotient digit estimation more accurate

3. For each quotient digit position (high to low):
   a. Estimate q_digit using 2-byte / 1-byte division
   b. Multiply q_digit * y
   c. If product > remaining dividend, decrease q_digit
   d. Subtract product from dividend

4. Denormalize remainder (shift right by normalization amount)

Time: O(n*m) where n = dividend length, m = divisor length

5. Implementation Guide

5.1 Development Environment Setup

Required Tools
────────────────────────────────────────────────────────────────────────────────

Compiler:
    - GCC 7+ or Clang 6+
    - C11 standard support (-std=c11)

Testing:
    - Valgrind for memory leak detection
    - AddressSanitizer (-fsanitize=address)

Build:
    - GNU Make

Recommended flags:
    CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g -O2
    CFLAGS += -fsanitize=address -fsanitize=undefined  # for testing

5.2 Project Structure

xp/
├── Makefile
├── include/
│   └── xp.h
├── src/
│   └── xp.c
├── test/
│   ├── test_xp_basic.c      # Basic operations tests
│   ├── test_xp_mul.c        # Multiplication tests
│   ├── test_xp_div.c        # Division tests
│   ├── test_xp_convert.c    # Conversion tests
│   └── test_factorial.c     # Integration test: compute n!
└── examples/
    └── factorial.c          # Example: compute large factorials

5.3 The Core Question You’re Answering

“How do you perform arithmetic on numbers larger than your hardware can natively represent?”

The fundamental insight is:

Represent large numbers as arrays of smaller units
Implement arithmetic operations by processing these units sequentially
Handle carry/borrow propagation between adjacent units
Maintain proper boundary conditions and edge cases

This is the same principle that lets us add multi-digit decimal numbers by hand, scaled to computer word sizes.

5.4 Concepts You Must Understand First

Before implementing XP, ensure you can answer these questions:

What happens when you add two 8-bit numbers that sum to more than 255?
- The result overflows; you get the low 8 bits plus a carry
- In C: (uint8_t)(200 + 100) gives 44, carry = 1
How do you detect overflow in unsigned addition?
- If a + b < a (or < b), overflow occurred
- Or: use a larger type for the intermediate result
What does little-endian mean for multi-byte integers?
- Least significant byte at lowest address
- Natural for x86/ARM; carry propagates to higher addresses
Why is multiplication harder than addition?
- Each digit of one operand interacts with each digit of the other
- O(n^2) vs O(n) complexity
How does long division work with multi-digit numbers?
- Estimate quotient digit, multiply, subtract, repeat
- May need to adjust estimate if it was too high

5.5 Questions to Guide Your Design

Representation Questions:

Should you use bytes (8-bit), 16-bit, or 32-bit words as your base unit?
- Bytes are simplest but slowest
- 32-bit words balance simplicity and speed
- Consider your target platform
Little-endian or big-endian storage?
- Little-endian: easier carry propagation, matches x86/ARM
- Big-endian: easier string conversion, matches mathematical notation
How do you handle numbers of different lengths?
- Require equal lengths (simpler)
- Allow unequal, treat missing high-order words as zero

Algorithm Questions:

How do you handle the carry from addition?
- Return it as function result (Hanson’s approach)
- Store in extra word of result
- Raise exception on overflow
For multiplication, how do you accumulate partial products?
- Accumulate in-place in result array
- Use separate temporary storage
- Consider word size for intermediate results
For division, how accurate must your quotient estimate be?
- Can be off by at most 2 with proper normalization
- Must be able to correct by iterating

5.6 Thinking Exercise

Before writing code, work through these by hand:

Exercise 1: Binary Addition

Add these 3-byte numbers (in hex):

  0x12 0x34 0x56  (little-endian: actual value 0x563412)
+ 0xFE 0xDC 0xBA  (little-endian: actual value 0xBADCFE)

Work out each byte position, tracking carries.

Exercise 2: Single-Digit Multiplication

Multiply 0x12 0x34 (2 bytes) by 0x56 (1 byte):

Position 0: 0x12 * 0x56 = ?
Position 1: 0x34 * 0x56 + carry = ?

What's the 3-byte result?

Exercise 3: Quotient Estimation

Divide 0x01 0x00 0x00 by 0x80 0x00 (both little-endian):

First estimate: what's the quotient of the top 2 bytes by the top 1 byte?
Is this estimate correct? Too high? Too low?

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual):

Begin with addition and subtraction. Use a simple array of unsigned char values. The carry/borrow is just a single bit you track across iterations. Write out the loop structure before worrying about edge cases.

Hint 2 - Next Level (More Specific):

For multiplication, the product of two n-word numbers is at most 2n words. Allocate appropriately. Zero the result array first, then accumulate partial products by adding to existing values (not replacing).

Hint 3 - Technical Details (Approach):

When multiplying two 8-bit numbers, the result can be 16 bits. Use unsigned int for intermediate results:

unsigned int product = (unsigned int)x[i] * y[j];
unsigned int sum = product + z[i+j] + carry;
z[i+j] = sum & 0xFF;
carry = sum >> 8;

Hint 4 - Verification (Testing Strategy):

Test with numbers you can verify manually:

0xFFFFFFFF + 1 should give 0x100000000 (5 bytes)
0xFFFF * 0xFFFF should give 0xFFFE0001 (4 bytes)
Use Python’s unlimited integers to generate test cases

5.8 The Interview Questions They’ll Ask

Q1: Explain how you’d add two 256-bit numbers using 64-bit registers.

Strong Answer: “I’d represent each 256-bit number as an array of four 64-bit words in little-endian order. Addition proceeds from the least significant word to the most significant. For each position, I add the two corresponding words plus any carry from the previous position. I can detect carry by checking if the result is less than either operand (unsigned overflow). The final carry indicates if the sum needs an extra word. Time complexity is O(n) where n is the number of words.”

Q2: What’s the time complexity of grade-school multiplication? Can you do better?

Strong Answer: “Grade-school multiplication is O(n^2) where n is the number of digits. Each digit of one number multiplies each digit of the other. We can do better with Karatsuba (O(n^1.585)), Toom-Cook (O(n^1.465)), or FFT-based methods (O(n log n)). However, the crossover point where Karatsuba beats grade-school is typically around 30-80 words due to the overhead of recursive calls and extra additions. For most practical sizes below a few thousand bits, optimized grade-school is often competitive.”

Q3: How would you implement comparison of two XP numbers?

Strong Answer: “Compare from the most significant word down. If the numbers have different lengths, the longer one is larger (assuming normalized representation without leading zeros). For equal lengths, find the first position where they differ and compare those words. If all words are equal, the numbers are equal. This is O(n) in the worst case but often terminates early.”

Q4: Why is division so much harder than multiplication?

Strong Answer: “Division requires estimating each quotient digit, which involves dividing two words of the dividend by one word of the divisor. This estimate can be off by up to 2, requiring correction. You must also normalize (shift) the divisor so its high bit is set, which improves estimate accuracy. After finding each quotient digit, you multiply and subtract, potentially requiring borrow propagation. Division is typically 3-5x slower than multiplication due to the estimation, correction, and normalization overhead.”

Q5: How do you convert an XP number to a decimal string?

Strong Answer: “Repeatedly divide by 10 (or a power of 10 for efficiency) and collect the remainders. Each remainder is a decimal digit (or group of digits). The digits come out in reverse order, so store them and reverse at the end, or start from the end of the output buffer. For large numbers, dividing by 10^9 gives 9 digits at once, reducing the number of expensive multi-precision divisions. Time complexity is O(n^2) because each division is O(n) and we need O(n) digits.”

5.9 Books That Will Help

Topic	Book	Chapter/Section
XP Module Design	“C Interfaces and Implementations” by Hanson	Chapter 17
Arithmetic Algorithms	“The Art of Computer Programming” Vol. 2 by Knuth	Chapter 4 (Arithmetic)
Cryptographic Context	“Applied Cryptography” by Schneier	Chapter 11 (Mathematical Background)
Modern Optimization	“Hacker’s Delight” by Warren	Chapters 8-10
GMP Algorithms	“Modern Computer Arithmetic” by Brent & Zimmermann	Chapters 1-2 (free online)

5.10 Implementation Phases

Phase 1: Basic Infrastructure (Days 1-2)

Set up project structure with header and implementation files
Implement XP_add and XP_sub with carry/borrow
Implement XP_cmp for comparison
Write comprehensive tests for basic operations
Verify with simple cases: 0+0, 0xFF+1, 0x100-1

Phase 2: Conversion (Days 3-4)

Implement XP_fromint and XP_toint
Implement XP_tostr for decimal/hex output
Implement XP_fromstr for parsing strings
Test round-trip: number -> string -> number

Phase 3: Multiplication (Days 5-7)

Implement grade-school XP_mul
Handle the product size (n+m words)
Test with known products
Verify commutativity: xy == yx

Phase 4: Division (Days 8-12)

Start with single-word divisor (simpler)
Implement normalization (left shift until high bit set)
Implement quotient estimation
Implement full multi-word division
Test: verify x == q*y + r for all test cases

Phase 5: Bit Operations & Polish (Days 13-14)

Implement XP_lshift and XP_rshift
Implement XP_length (bit count)
Add edge case handling throughout
Run under Valgrind, fix any memory issues
Document the API

5.11 Key Implementation Decisions

Decision 1: Base Unit Size

8-bit (bytes): Simplest, most portable, but slowest
32-bit words: Good balance, uses hardware multiply efficiently
64-bit words: Fastest on 64-bit systems, but need 128-bit intermediate for multiply

Recommendation: Start with bytes for simplicity, optimize later if needed.

Decision 2: Memory Ownership

Caller allocates all memory (Hanson’s approach)
Library manages memory internally

Recommendation: Caller-managed is simpler and avoids hidden allocations.

Decision 3: Error Handling

Assertions for precondition violations (division by zero, invalid inputs)
Return codes for overflow conditions (add returns carry)
No silent failures

6. Testing Strategy

Unit Tests

Test Categories:

// test_xp_add.c

// Basic functionality
void test_add_zero() {
    unsigned char a[4] = {0x12, 0x34, 0x56, 0x78};
    unsigned char b[4] = {0, 0, 0, 0};
    unsigned char c[4];
    int carry = XP_add(4, c, a, b, 0);
    assert(carry == 0);
    assert(memcmp(a, c, 4) == 0);
}

// Carry propagation
void test_add_carry_chain() {
    unsigned char a[4] = {0xFF, 0xFF, 0xFF, 0xFF};
    unsigned char b[4] = {0x01, 0, 0, 0};
    unsigned char c[4];
    int carry = XP_add(4, c, a, b, 0);
    assert(carry == 1);
    assert(c[0] == 0 && c[1] == 0 && c[2] == 0 && c[3] == 0);
}

// Commutativity
void test_add_commutative() {
    unsigned char a[4], b[4], c1[4], c2[4];
    // Fill with random values
    XP_add(4, c1, a, b, 0);
    XP_add(4, c2, b, a, 0);
    assert(memcmp(c1, c2, 4) == 0);
}

Multiplication Tests:

// test_xp_mul.c

void test_mul_by_zero() {
    unsigned char a[4] = {0x12, 0x34, 0x56, 0x78};
    unsigned char b[4] = {0, 0, 0, 0};
    unsigned char c[8] = {1, 1, 1, 1, 1, 1, 1, 1};  // Non-zero to verify clearing
    XP_mul(c, 4, a, 4, b);
    // c should be all zeros
    for (int i = 0; i < 8; i++) assert(c[i] == 0);
}

void test_mul_known_values() {
    // 0xFFFF * 0xFFFF = 0xFFFE0001
    unsigned char a[2] = {0xFF, 0xFF};
    unsigned char b[2] = {0xFF, 0xFF};
    unsigned char c[4];
    XP_mul(c, 2, a, 2, b);
    assert(c[0] == 0x01);
    assert(c[1] == 0x00);
    assert(c[2] == 0xFE);
    assert(c[3] == 0xFF);
}

Division Tests:

// test_xp_div.c

void test_div_exact() {
    // 100 / 10 = 10, remainder 0
    unsigned char dividend[2], divisor[1], quotient[2], remainder[1];
    XP_fromint(2, dividend, 100);
    XP_fromint(1, divisor, 10);
    XP_div(2, quotient, dividend, 1, divisor, remainder, NULL);
    assert(XP_toint(2, quotient) == 10);
    assert(XP_toint(1, remainder) == 0);
}

void test_div_with_remainder() {
    // 100 / 7 = 14 remainder 2
    unsigned char dividend[2], divisor[1], quotient[2], remainder[1];
    XP_fromint(2, dividend, 100);
    XP_fromint(1, divisor, 7);
    XP_div(2, quotient, dividend, 1, divisor, remainder, NULL);
    assert(XP_toint(2, quotient) == 14);
    assert(XP_toint(1, remainder) == 2);
}

void test_div_identity() {
    // For all x, y: x == (x/y)*y + (x%y)
    // Generate random values and verify
}

Integration Test: Factorial

// test_factorial.c

void test_factorial_100() {
    // 100! is known to have 158 decimal digits
    unsigned char fact[200];  // Enough space
    memset(fact, 0, sizeof(fact));
    fact[0] = 1;

    for (int i = 2; i <= 100; i++) {
        unsigned char temp[200];
        unsigned char ival[4];
        XP_fromint(4, ival, i);
        XP_mul(temp, 200, fact, 4, ival);
        memcpy(fact, temp, 200);
    }

    char str[500];
    XP_tostr(str, sizeof(str), 10, 200, fact);

    // 100! starts with "93326215443944..." and has 158 digits
    assert(strlen(str) == 158);
    assert(strncmp(str, "9332621544", 10) == 0);
}

7. Common Pitfalls & Debugging

Pitfall 1: Forgetting to Initialize the Product Array

Problem: Multiplication accumulates into the result, so it must be zeroed first.

Symptom: Random values in product, inconsistent results.

Fix:

void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y) {
    memset(z, 0, n + m);  // CRITICAL: zero before accumulating
    // ... multiplication loop
}

Pitfall 2: Integer Overflow in Intermediate Calculations

Problem: Multiplying two bytes can produce up to 16 bits; adding carry can add more.

Symptom: Incorrect results, especially with large operands.

Fix:

// BAD: overflow risk
unsigned char product = x[i] * y[j] + z[i+j] + carry;

// GOOD: use larger intermediate type
unsigned int product = (unsigned int)x[i] * y[j] + z[i+j] + carry;
z[i+j] = product & 0xFF;
carry = product >> 8;

Pitfall 3: Off-by-One Errors in Division

Problem: Quotient estimation can be too high by 1 or 2.

Symptom: Division produces wrong results, especially near word boundaries.

Fix: Always verify and correct the estimate:

// After estimating q_digit
while (q_digit * divisor > remaining_dividend) {
    q_digit--;
}

Pitfall 4: Forgetting Endianness in String Conversion

Problem: Printing bytes in wrong order produces reversed number.

Symptom: “0x12345678” prints as “78563412”.

Fix: Account for little-endian storage when building output strings.

Pitfall 5: Not Normalizing the Divisor

Problem: Division without normalization makes quotient estimation very inaccurate.

Symptom: Division fails or requires many correction iterations.

Fix: Shift divisor left until its high bit is set, apply same shift to dividend.

8. Extensions & Challenges

Basic Extensions

GCD (Greatest Common Divisor): Implement Euclidean algorithm using your XP_div
Power: Implement XP_power(base, exponent) using repeated squaring
Primality Test: Implement Miller-Rabin using your XP operations

Advanced Extensions

Karatsuba Multiplication: Implement the recursive O(n^1.585) algorithm
Barrett Reduction: Fast modular reduction without division
Montgomery Multiplication: Foundation for the MP module

Challenge Projects

RSA Key Generation: Generate large primes, compute modular inverse
Big Integer Calculator: Interactive calculator for arbitrary precision math
Factorial Competition: How fast can you compute 100,000!?

9. Real-World Connections

GMP (GNU Multiple Precision Arithmetic Library)

The most widely used arbitrary precision library uses similar techniques:

Little-endian limb (word) storage
Careful handling of carry propagation
Karatsuba, Toom-Cook, and FFT for large multiplications
Sophisticated division algorithms

Your XP module implements the core ideas; GMP adds decades of optimization.

OpenSSL/BoringSSL

Cryptographic libraries implement specialized versions:

Constant-time operations to prevent timing attacks
Montgomery multiplication for modular arithmetic
Optimized assembly for critical paths

Python’s Integer Implementation

Python’s unlimited precision integers work similarly:

Array of 30-bit “digits” (32-bit words with some bits unused)
Grade-school algorithms for small numbers
Karatsuba for larger numbers
Sign-magnitude representation (like the AP module)

10. Resources

Books

Primary: “C Interfaces and Implementations” by David Hanson, Chapter 17
Algorithm Depth: “The Art of Computer Programming” Vol. 2 by Donald Knuth
Modern Techniques: “Modern Computer Arithmetic” by Brent & Zimmermann (free PDF)
Cryptographic Context: “Handbook of Applied Cryptography” by Menezes et al. (free online)

Online Resources

CII Source Code: https://github.com/drh/cii (official implementation)
GMP Documentation: https://gmplib.org/manual/
Python Integer Implementation: https://github.com/python/cpython/blob/main/Objects/longobject.c

Tools

GMP: Compare your implementation’s results against GMP
Python: Use Python’s unlimited integers to generate test cases
Valgrind: Essential for catching memory errors

11. Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

Can explain why extended precision is needed for cryptography
Can describe carry propagation in addition
Can explain the grade-school multiplication algorithm
Can describe why division requires normalization
Understand the trade-offs of different base unit sizes

Implementation

Addition handles carry across all word boundaries
Subtraction handles borrow correctly (returns nonzero if result is negative)
Multiplication produces correct results for all operand sizes
Division produces correct quotient and remainder
Comparison works correctly for numbers of different magnitudes
String conversion handles all bases (at least 2, 10, 16)

Testing

All basic operations pass unit tests
Edge cases handled: zero operands, maximum values, single-byte numbers
Algebraic properties verified: a+b=b+a, ab=ba, (a/b)*b+a%b=a
No memory leaks (verified by Valgrind)
No undefined behavior (verified by sanitizers)

Documentation

Header file documents all public functions
Preconditions and postconditions are clear
Usage examples provided

12. Submission / Completion Criteria

This project is complete when:

All Operations Functional: Addition, subtraction, multiplication, division, comparison, and conversion all work correctly
Tests Pass: Comprehensive test suite with >90% coverage passes
No Memory Issues: Valgrind reports no leaks or errors
Integration Demo: Can compute 1000! and display it in decimal
Documentation Complete: API reference in header, usage examples provided
Code Quality: Clean code following Hanson’s interface/implementation pattern

Verification Command:

# Build and test
make clean all test

# Memory check
valgrind --leak-check=full ./test_xp

# Compute 1000! as integration test
./factorial 1000

Expected Output for 1000!:

1000! has 8530 bits (2568 decimal digits)
1000! = 402387260077093773543702433923003985719374864210714632543799910429938512398629020...