Project 16: XP Module - Extended Precision Unsigned Arithmetic
Build an extended precision arithmetic library for unsigned integers of arbitrary length, enabling computations far beyond native hardware limits.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4 - Expert |
| Time Estimate | 2-3 weeks |
| Language | C |
| Prerequisites | Mem module, Assert module, understanding of binary arithmetic, bitwise operations |
| Key Topics | Arbitrary precision math, carry propagation, grade-school algorithms, cryptography foundations |
1. Learning Objectives
What You Will Build
A complete XP (Extended Precision) module that represents and manipulates unsigned integers of arbitrary length using arrays of machine words. Your implementation will support:
- Addition with carry propagation
- Subtraction with borrow propagation
- Multiplication (grade-school and optionally Karatsuba)
- Division with remainder
- Comparison operations
- Conversion to/from strings and native integers
- Bit shifting operations
Why It Teaches The Topic
Modern cryptography operates on numbers with hundreds or thousands of digits. A 2048-bit RSA key is a 617-digit decimal number. Your CPU’s largest native integer (typically 64 bits) maxes out at about 20 decimal digits. The XP module teaches you how to bridge this gap by representing large numbers as arrays of smaller units, implementing arithmetic algorithms that handle carry and borrow propagation across word boundaries.
This project connects abstract algorithm knowledge to practical implementation, teaching you:
- How arbitrary precision libraries like GMP work internally
- The mathematical foundations of public-key cryptography
- Low-level optimization techniques for arithmetic operations
- How to design clean C interfaces for complex mathematical operations
2. Theoretical Foundation
2.1 Core Concepts
Why Extended Precision Matters
Hardware Limit (64-bit): 18,446,744,073,709,551,615
2048-bit RSA modulus: A number with ~617 decimal digits
1000!: A number with 2,568 decimal digits
Modern cryptography, scientific computing, and financial applications routinely work with numbers that exceed hardware capabilities. Extended precision arithmetic provides the foundation for:
- RSA, Diffie-Hellman, and elliptic curve cryptography
- Exact computation in computer algebra systems
- High-precision scientific simulations
- Cryptocurrency and blockchain implementations
Representing Large Numbers
The fundamental insight is storing numbers as arrays of smaller units (typically 32-bit or 64-bit words), similar to how we write decimal numbers as sequences of digits:
Extended Precision Number Representation
────────────────────────────────────────────────────────────────────────────────
Traditional decimal: 1,234,567,890
^ ^ ^
| | |
Digit Digit Digit
positions
Extended precision (base 2^32):
Number: 0x123456789ABCDEF0123456789ABCDEF0
+-----------+-----------+-----------+-----------+
| Word[3] | Word[2] | Word[1] | Word[0] |
| 0x12345678| 0x9ABCDEF0| 0x12345678| 0x9ABCDEF0|
+-----------+-----------+-----------+-----------+
^ ^ ^ ^
| | | |
Most Significant Least Significant
Value = Word[3] * (2^32)^3 + Word[2] * (2^32)^2 +
Word[1] * (2^32)^1 + Word[0] * (2^32)^0
Addition with Carry Propagation
Adding two XP numbers requires handling carries that ripple from least significant to most significant words:
XP Addition with Carry
────────────────────────────────────────────────────────────────────────────────
a: [0xFFFFFFFF] [0x00000001] [0x00000000]
+ b: [0x00000001] [0xFFFFFFFF] [0x00000001]
-------------------------------------------
Step 1: Add Word[0]s
0x00000000 + 0x00000001 = 0x00000001, carry = 0
Step 2: Add Word[1]s + carry
0x00000001 + 0xFFFFFFFF + 0 = 0x100000000
Result = 0x00000000, carry = 1 (overflow!)
Step 3: Add Word[2]s + carry
0xFFFFFFFF + 0x00000001 + 1 = 0x100000001
Result = 0x00000001, carry = 1
Final: [0x00000001] [0x00000000] [0x00000001] + overflow bit
Subtraction with Borrow
Subtraction mirrors addition but propagates borrows instead of carries:
XP Subtraction (Pseudocode)
────────────────────────────────────────────────────────────────────────────────
function xp_subtract(a[], b[], result[], n):
borrow = 0
for i = 0 to n-1:
diff = a[i] - b[i] - borrow
if diff underflows (negative):
diff = diff + BASE // BASE = 2^32
borrow = 1
else:
borrow = 0
result[i] = diff
return borrow // Non-zero means a < b
Multiplication Algorithms
The simplest approach is grade-school multiplication, where each word of one operand multiplies every word of the other:
Grade-School Multiplication
────────────────────────────────────────────────────────────────────────────────
Multiplying 3-word numbers:
a[2] a[1] a[0]
x b[2] b[1] b[0]
--------------------------------
a*b[0] (partial product 0)
+ a*b[1] (partial product 1, shifted)
+ a*b[2] (partial product 2, shifted twice)
--------------------------------
Each a*b[i] produces a 4-word result (3 words * 1 word = 4 words max)
Time Complexity: O(n^2) multiplications
For very large numbers, the Karatsuba algorithm reduces multiplications at the cost of more additions:
Karatsuba Insight
────────────────────────────────────────────────────────────────────────────────
Traditional: (a1*B + a0) * (b1*B + b0) requires 4 multiplications:
a1*b1, a1*b0, a0*b1, a0*b0
Karatsuba trick:
z2 = a1 * b1
z0 = a0 * b0
z1 = (a1 + a0) * (b1 + b0) - z2 - z0
Result = z2*B^2 + z1*B + z0
Only 3 multiplications!
Time Complexity: O(n^1.585)
Division Algorithm
Division is the most complex operation, typically implemented as long division with estimation:
XP Division (Simplified Pseudocode)
────────────────────────────────────────────────────────────────────────────────
function xp_divide(dividend[], divisor[], quotient[], remainder[]):
// Normalize: shift divisor so high bit is set
shift = count_leading_zeros(divisor[high])
dividend = shift_left(dividend, shift)
divisor = shift_left(divisor, shift)
for i = high_word down to 0:
// Estimate quotient digit
q_estimate = (dividend[i+1] * BASE + dividend[i]) / divisor[high]
// Refine estimate (may be off by 1 or 2)
while q_estimate * divisor > remaining_dividend:
q_estimate -= 1
quotient[i] = q_estimate
dividend -= q_estimate * divisor * BASE^i
remainder = shift_right(dividend, shift) // Unnormalize
2.2 Why This Matters
The XP module is the foundation for the AP (Arbitrary Precision signed) and MP (Multiple Precision modular) modules. Understanding how to implement basic arithmetic operations on arbitrary-length integers gives you insight into:
- How high-level languages like Python implement their unlimited-precision integers
- How cryptographic libraries like OpenSSL and GMP work internally
- Performance considerations in numerical computing
- The relationship between algorithm complexity and practical performance
2.3 Historical Context
The Need for Extended Precision
Extended precision arithmetic has a long history:
- 1960s: Early work on arbitrary precision by Knuth and others
- 1970s: First arbitrary precision libraries for computer algebra systems
- 1991: GMP (GNU Multiple Precision) library released
- 1996: Hanson’s CII book formalizes the XP/AP/MP pattern
- Today: Every major cryptographic implementation relies on these techniques
Real-World Applications
Where XP Arithmetic Is Used Today
────────────────────────────────────────────────────────────────────────────────
Cryptography:
RSA Key Generation: Generate two 1024-bit primes, multiply to get 2048-bit modulus
Diffie-Hellman: Exponentiate in large prime fields
Elliptic Curves: Operations on curves over large finite fields
Scientific Computing:
Mathematica/Maple/Sage: Exact symbolic computation
High-precision physics simulations
Numerical stability analysis
Financial Systems:
Cryptocurrency: Bitcoin uses 256-bit arithmetic
High-frequency trading: Avoid floating-point errors
Verification:
Compiler correctness proofs
Formal verification of hardware designs
2.4 Common Misconceptions
Misconception 1: “Extended precision is just like regular arithmetic”
Reality: Carry and borrow propagation, memory management, and overflow detection add significant complexity. A single addition may touch every word in the number.
Misconception 2: “Grade-school multiplication is always good enough”
Reality: For very large numbers (thousands of digits), algorithms like Karatsuba, Toom-Cook, or FFT-based multiplication provide significant speedups.
Misconception 3: “Division is similar in complexity to multiplication”
Reality: Division is significantly harder. It requires normalization, quotient estimation, and correction steps that make it 3-5x slower than multiplication.
Misconception 4: “Just use bigger integers”
Reality: Hardware integers have fixed sizes. When you need more, you must implement it yourself (or use a library).
3. Project Specification
3.1 What You Will Build
A complete XP module providing:
- Type definition:
XP_Tas an array of unsigned machine words - Basic arithmetic: Addition, subtraction, multiplication, division
- Comparison: Less than, equal, greater than
- Conversion: To/from strings, to/from native integers
- Bit operations: Shift left, shift right, bit counting
- Utility functions: Copy, zero, normalization
3.2 Functional Requirements
Core Arithmetic Operations:
| Operation | Function Signature | Description |
|---|---|---|
| Add | int XP_add(int n, XP_T z, XP_T x, XP_T y, int carry) |
z = x + y + carry, returns final carry |
| Subtract | int XP_sub(int n, XP_T z, XP_T x, XP_T y, int borrow) |
z = x - y - borrow, returns final borrow |
| Multiply | void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y) |
z = x * y (z must have n+m words) |
| Divide | int XP_div(int n, XP_T q, XP_T x, int m, XP_T y, XP_T r, XP_T tmp) |
q = x / y, r = x % y |
Comparison Operations:
| Operation | Function Signature | Description |
|---|---|---|
| Compare | int XP_cmp(int n, XP_T x, XP_T y) |
Returns -1, 0, or 1 |
Conversion Operations:
| Operation | Function Signature | Description |
|---|---|---|
| From unsigned long | void XP_fromint(int n, XP_T z, unsigned long u) |
z = u |
| To unsigned long | unsigned long XP_toint(int n, XP_T x) |
Returns low-order bits |
| From string | int XP_fromstr(int n, XP_T z, const char *str, int base, char **end) |
Parse string |
| To string | char *XP_tostr(char *str, int size, int base, int n, XP_T x) |
Convert to string |
Bit Operations:
| Operation | Function Signature | Description |
|---|---|---|
| Shift left | void XP_lshift(int n, XP_T z, int m, XP_T x, int s, int fill) |
z = x « s |
| Shift right | void XP_rshift(int n, XP_T z, int m, XP_T x, int s, int fill) |
z = x » s |
| Count bits | int XP_length(int n, XP_T x) |
Number of significant bits |
3.3 Non-Functional Requirements
- Portability: Must work on both 32-bit and 64-bit systems
- Memory Safety: No buffer overflows, proper bounds checking
- Performance: O(n) for addition/subtraction, O(n^2) for multiplication (minimum)
- Testability: Clear preconditions that can be verified with assertions
- Documentation: Clear API documentation in header file
3.4 Example Usage / Output
#include "xp.h"
#include <stdio.h>
#include <string.h>
int main(void) {
// Calculate 2^256 - 1 (maximum value for 256-bit number)
unsigned char a[32], b[32], c[32];
char str[100];
// Set a = 2^256 - 1 (all 1s)
memset(a, 0xFF, 32);
// Set b = 1
memset(b, 0, 32);
b[0] = 1;
// c = a + b (should overflow to 0 with carry)
int carry = XP_add(32, c, a, b, 0);
printf("2^256 - 1 + 1:\n");
printf("Carry: %d\n", carry); // Should print 1
printf("Result: %s\n", XP_tostr(str, sizeof(str), 16, 32, c));
// Should print all zeros
// Calculate 1000!
unsigned char fact[400]; // Need about 400 bytes for 1000!
memset(fact, 0, sizeof(fact));
fact[0] = 1;
for (int i = 2; i <= 1000; i++) {
// Multiply fact by i
unsigned char temp[400];
unsigned char ival[4];
XP_fromint(4, ival, i);
XP_mul(temp, 400, fact, 4, ival);
memcpy(fact, temp, 400);
}
printf("\n1000! has %d bits\n", XP_length(400, fact));
// Should print about 8530 bits
return 0;
}
Expected Output:
2^256 - 1 + 1:
Carry: 1
Result: 0
1000! has 8530 bits
4. Solution Architecture
4.1 High-Level Design
XP Module Architecture
────────────────────────────────────────────────────────────────────────────────
+------------------------+
| Client Code |
+------------------------+
|
v
+----------------------------------------------------------+
| xp.h (Interface) |
| |
| typedef unsigned char *XP_T; // Byte array |
| |
| Arithmetic: XP_add, XP_sub, XP_mul, XP_div |
| Comparison: XP_cmp |
| Conversion: XP_fromint, XP_toint, XP_fromstr, XP_tostr |
| Bit ops: XP_lshift, XP_rshift, XP_length |
+----------------------------------------------------------+
|
v
+----------------------------------------------------------+
| xp.c (Implementation) |
| |
| Internal helpers: |
| - normalize(): Remove leading zeros |
| - single_mul(): Multiply by single word |
| - single_div(): Divide by single word |
| |
| Uses: assert.h, mem.h |
+----------------------------------------------------------+
4.2 Key Components
Header File (xp.h):
#ifndef XP_INCLUDED
#define XP_INCLUDED
/*
* XP: Extended Precision Unsigned Arithmetic
*
* Provides operations on unsigned integers of arbitrary length,
* represented as arrays of bytes in little-endian order.
*
* All numbers are stored least-significant byte first.
* Length n is always in bytes.
*/
typedef unsigned char *XP_T;
/* Arithmetic operations - return carry/borrow */
extern int XP_add(int n, XP_T z, XP_T x, XP_T y, int carry);
extern int XP_sub(int n, XP_T z, XP_T x, XP_T y, int borrow);
extern void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y);
extern int XP_div(int n, XP_T q, XP_T x, int m, XP_T y,
XP_T r, XP_T tmp);
/* Comparison - returns -1, 0, or 1 */
extern int XP_cmp(int n, XP_T x, XP_T y);
/* Conversion */
extern void XP_fromint(int n, XP_T z, unsigned long u);
extern unsigned long XP_toint(int n, XP_T x);
extern int XP_fromstr(int n, XP_T z, const char *str,
int base, char **end);
extern char *XP_tostr(char *str, int size, int base, int n, XP_T x);
/* Bit operations */
extern void XP_lshift(int n, XP_T z, int m, XP_T x, int s, int fill);
extern void XP_rshift(int n, XP_T z, int m, XP_T x, int s, int fill);
extern int XP_length(int n, XP_T x);
/* Utility */
extern int XP_sum(int n, XP_T z, XP_T x, int y);
extern int XP_diff(int n, XP_T z, XP_T x, int y);
extern int XP_product(int n, XP_T z, XP_T x, int y);
extern int XP_quotient(int n, XP_T z, XP_T x, int y);
extern int XP_neg(int n, XP_T z, XP_T x, int carry);
#endif
Implementation File Structure (xp.c):
#include "xp.h"
#include "assert.h"
#include <string.h>
/* Internal constants */
#define BASE 256 /* 2^8 for byte-based arithmetic */
/* Addition with carry propagation */
int XP_add(int n, XP_T z, XP_T x, XP_T y, int carry) {
int i;
for (i = 0; i < n; i++) {
carry += x[i] + y[i];
z[i] = carry % BASE;
carry /= BASE;
}
return carry;
}
/* ... other functions ... */
4.3 Data Structures
Number Representation:
XP Number Layout (Little-Endian Byte Array)
────────────────────────────────────────────────────────────────────────────────
Example: The number 0x123456789ABCDEF0
In memory (little-endian):
Index: [0] [1] [2] [3] [4] [5] [6] [7]
Value: 0xF0 0xDE 0xBC 0x9A 0x78 0x56 0x34 0x12
^ ^
| |
LSB MSB
Why little-endian?
- Natural alignment with x86/ARM memory order
- Addition proceeds from index 0 upward (natural loop)
- Easy to extend by appending bytes
- Carry propagates in increasing index order
4.4 Algorithm Overview
Addition Algorithm:
XP Addition (carry propagation)
────────────────────────────────────────────────────────────────────────────────
For each byte position i from 0 to n-1:
1. sum = x[i] + y[i] + carry_in
2. z[i] = sum mod 256 (low byte)
3. carry_out = sum / 256 (0 or 1)
4. carry_in = carry_out for next iteration
Return final carry (0 or 1)
Example: 0xFFFF + 0x0001
Position 0: 0xFF + 0x01 + 0 = 0x100 -> z[0]=0x00, carry=1
Position 1: 0xFF + 0x00 + 1 = 0x100 -> z[1]=0x00, carry=1
Return: carry=1
Result: 0x10000 (as 3 bytes: [0x00, 0x00, 0x01])
Multiplication Algorithm (Grade-School):
XP Multiplication
────────────────────────────────────────────────────────────────────────────────
z[0..n+m-1] = x[0..n-1] * y[0..m-1]
for i = 0 to n-1:
carry = 0
for j = 0 to m-1:
product = x[i] * y[j] + z[i+j] + carry
z[i+j] = product mod 256
carry = product / 256
z[i+m] = carry
Note: z must be zeroed before multiplication
z must have at least n+m bytes
Division Algorithm (Knuth’s Algorithm D):
XP Division (Simplified)
────────────────────────────────────────────────────────────────────────────────
Given: dividend x, divisor y
Find: quotient q, remainder r such that x = q*y + r
1. Handle special cases (y=0 error, x<y gives q=0,r=x)
2. Normalize: shift both x and y left until y's high bit is set
This makes quotient digit estimation more accurate
3. For each quotient digit position (high to low):
a. Estimate q_digit using 2-byte / 1-byte division
b. Multiply q_digit * y
c. If product > remaining dividend, decrease q_digit
d. Subtract product from dividend
4. Denormalize remainder (shift right by normalization amount)
Time: O(n*m) where n = dividend length, m = divisor length
5. Implementation Guide
5.1 Development Environment Setup
Required Tools
────────────────────────────────────────────────────────────────────────────────
Compiler:
- GCC 7+ or Clang 6+
- C11 standard support (-std=c11)
Testing:
- Valgrind for memory leak detection
- AddressSanitizer (-fsanitize=address)
Build:
- GNU Make
Recommended flags:
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g -O2
CFLAGS += -fsanitize=address -fsanitize=undefined # for testing
5.2 Project Structure
xp/
├── Makefile
├── include/
│ └── xp.h
├── src/
│ └── xp.c
├── test/
│ ├── test_xp_basic.c # Basic operations tests
│ ├── test_xp_mul.c # Multiplication tests
│ ├── test_xp_div.c # Division tests
│ ├── test_xp_convert.c # Conversion tests
│ └── test_factorial.c # Integration test: compute n!
└── examples/
└── factorial.c # Example: compute large factorials
5.3 The Core Question You’re Answering
“How do you perform arithmetic on numbers larger than your hardware can natively represent?”
The fundamental insight is:
- Represent large numbers as arrays of smaller units
- Implement arithmetic operations by processing these units sequentially
- Handle carry/borrow propagation between adjacent units
- Maintain proper boundary conditions and edge cases
This is the same principle that lets us add multi-digit decimal numbers by hand, scaled to computer word sizes.
5.4 Concepts You Must Understand First
Before implementing XP, ensure you can answer these questions:
- What happens when you add two 8-bit numbers that sum to more than 255?
- The result overflows; you get the low 8 bits plus a carry
- In C:
(uint8_t)(200 + 100)gives 44, carry = 1
- How do you detect overflow in unsigned addition?
- If
a + b < a(or< b), overflow occurred - Or: use a larger type for the intermediate result
- If
- What does little-endian mean for multi-byte integers?
- Least significant byte at lowest address
- Natural for x86/ARM; carry propagates to higher addresses
- Why is multiplication harder than addition?
- Each digit of one operand interacts with each digit of the other
- O(n^2) vs O(n) complexity
- How does long division work with multi-digit numbers?
- Estimate quotient digit, multiply, subtract, repeat
- May need to adjust estimate if it was too high
5.5 Questions to Guide Your Design
Representation Questions:
- Should you use bytes (8-bit), 16-bit, or 32-bit words as your base unit?
- Bytes are simplest but slowest
- 32-bit words balance simplicity and speed
- Consider your target platform
- Little-endian or big-endian storage?
- Little-endian: easier carry propagation, matches x86/ARM
- Big-endian: easier string conversion, matches mathematical notation
- How do you handle numbers of different lengths?
- Require equal lengths (simpler)
- Allow unequal, treat missing high-order words as zero
Algorithm Questions:
- How do you handle the carry from addition?
- Return it as function result (Hanson’s approach)
- Store in extra word of result
- Raise exception on overflow
- For multiplication, how do you accumulate partial products?
- Accumulate in-place in result array
- Use separate temporary storage
- Consider word size for intermediate results
- For division, how accurate must your quotient estimate be?
- Can be off by at most 2 with proper normalization
- Must be able to correct by iterating
5.6 Thinking Exercise
Before writing code, work through these by hand:
Exercise 1: Binary Addition
Add these 3-byte numbers (in hex):
0x12 0x34 0x56 (little-endian: actual value 0x563412)
+ 0xFE 0xDC 0xBA (little-endian: actual value 0xBADCFE)
Work out each byte position, tracking carries.
Exercise 2: Single-Digit Multiplication
Multiply 0x12 0x34 (2 bytes) by 0x56 (1 byte):
Position 0: 0x12 * 0x56 = ?
Position 1: 0x34 * 0x56 + carry = ?
What's the 3-byte result?
Exercise 3: Quotient Estimation
Divide 0x01 0x00 0x00 by 0x80 0x00 (both little-endian):
First estimate: what's the quotient of the top 2 bytes by the top 1 byte?
Is this estimate correct? Too high? Too low?
5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual):
Begin with addition and subtraction. Use a simple array of unsigned char values. The carry/borrow is just a single bit you track across iterations. Write out the loop structure before worrying about edge cases.
Hint 2 - Next Level (More Specific):
For multiplication, the product of two n-word numbers is at most 2n words. Allocate appropriately. Zero the result array first, then accumulate partial products by adding to existing values (not replacing).
Hint 3 - Technical Details (Approach):
When multiplying two 8-bit numbers, the result can be 16 bits. Use unsigned int for intermediate results:
unsigned int product = (unsigned int)x[i] * y[j];
unsigned int sum = product + z[i+j] + carry;
z[i+j] = sum & 0xFF;
carry = sum >> 8;
Hint 4 - Verification (Testing Strategy):
Test with numbers you can verify manually:
- 0xFFFFFFFF + 1 should give 0x100000000 (5 bytes)
- 0xFFFF * 0xFFFF should give 0xFFFE0001 (4 bytes)
- Use Python’s unlimited integers to generate test cases
5.8 The Interview Questions They’ll Ask
Q1: Explain how you’d add two 256-bit numbers using 64-bit registers.
Strong Answer: “I’d represent each 256-bit number as an array of four 64-bit words in little-endian order. Addition proceeds from the least significant word to the most significant. For each position, I add the two corresponding words plus any carry from the previous position. I can detect carry by checking if the result is less than either operand (unsigned overflow). The final carry indicates if the sum needs an extra word. Time complexity is O(n) where n is the number of words.”
Q2: What’s the time complexity of grade-school multiplication? Can you do better?
Strong Answer: “Grade-school multiplication is O(n^2) where n is the number of digits. Each digit of one number multiplies each digit of the other. We can do better with Karatsuba (O(n^1.585)), Toom-Cook (O(n^1.465)), or FFT-based methods (O(n log n)). However, the crossover point where Karatsuba beats grade-school is typically around 30-80 words due to the overhead of recursive calls and extra additions. For most practical sizes below a few thousand bits, optimized grade-school is often competitive.”
Q3: How would you implement comparison of two XP numbers?
Strong Answer: “Compare from the most significant word down. If the numbers have different lengths, the longer one is larger (assuming normalized representation without leading zeros). For equal lengths, find the first position where they differ and compare those words. If all words are equal, the numbers are equal. This is O(n) in the worst case but often terminates early.”
Q4: Why is division so much harder than multiplication?
Strong Answer: “Division requires estimating each quotient digit, which involves dividing two words of the dividend by one word of the divisor. This estimate can be off by up to 2, requiring correction. You must also normalize (shift) the divisor so its high bit is set, which improves estimate accuracy. After finding each quotient digit, you multiply and subtract, potentially requiring borrow propagation. Division is typically 3-5x slower than multiplication due to the estimation, correction, and normalization overhead.”
Q5: How do you convert an XP number to a decimal string?
Strong Answer: “Repeatedly divide by 10 (or a power of 10 for efficiency) and collect the remainders. Each remainder is a decimal digit (or group of digits). The digits come out in reverse order, so store them and reverse at the end, or start from the end of the output buffer. For large numbers, dividing by 10^9 gives 9 digits at once, reducing the number of expensive multi-precision divisions. Time complexity is O(n^2) because each division is O(n) and we need O(n) digits.”
5.9 Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| XP Module Design | “C Interfaces and Implementations” by Hanson | Chapter 17 |
| Arithmetic Algorithms | “The Art of Computer Programming” Vol. 2 by Knuth | Chapter 4 (Arithmetic) |
| Cryptographic Context | “Applied Cryptography” by Schneier | Chapter 11 (Mathematical Background) |
| Modern Optimization | “Hacker’s Delight” by Warren | Chapters 8-10 |
| GMP Algorithms | “Modern Computer Arithmetic” by Brent & Zimmermann | Chapters 1-2 (free online) |
5.10 Implementation Phases
Phase 1: Basic Infrastructure (Days 1-2)
- Set up project structure with header and implementation files
- Implement
XP_addandXP_subwith carry/borrow - Implement
XP_cmpfor comparison - Write comprehensive tests for basic operations
- Verify with simple cases: 0+0, 0xFF+1, 0x100-1
Phase 2: Conversion (Days 3-4)
- Implement
XP_fromintandXP_toint - Implement
XP_tostrfor decimal/hex output - Implement
XP_fromstrfor parsing strings - Test round-trip: number -> string -> number
Phase 3: Multiplication (Days 5-7)
- Implement grade-school
XP_mul - Handle the product size (n+m words)
- Test with known products
- Verify commutativity: xy == yx
Phase 4: Division (Days 8-12)
- Start with single-word divisor (simpler)
- Implement normalization (left shift until high bit set)
- Implement quotient estimation
- Implement full multi-word division
- Test: verify x == q*y + r for all test cases
Phase 5: Bit Operations & Polish (Days 13-14)
- Implement
XP_lshiftandXP_rshift - Implement
XP_length(bit count) - Add edge case handling throughout
- Run under Valgrind, fix any memory issues
- Document the API
5.11 Key Implementation Decisions
Decision 1: Base Unit Size
- 8-bit (bytes): Simplest, most portable, but slowest
- 32-bit words: Good balance, uses hardware multiply efficiently
- 64-bit words: Fastest on 64-bit systems, but need 128-bit intermediate for multiply
Recommendation: Start with bytes for simplicity, optimize later if needed.
Decision 2: Memory Ownership
- Caller allocates all memory (Hanson’s approach)
- Library manages memory internally
Recommendation: Caller-managed is simpler and avoids hidden allocations.
Decision 3: Error Handling
- Assertions for precondition violations (division by zero, invalid inputs)
- Return codes for overflow conditions (add returns carry)
- No silent failures
6. Testing Strategy
Unit Tests
Test Categories:
// test_xp_add.c
// Basic functionality
void test_add_zero() {
unsigned char a[4] = {0x12, 0x34, 0x56, 0x78};
unsigned char b[4] = {0, 0, 0, 0};
unsigned char c[4];
int carry = XP_add(4, c, a, b, 0);
assert(carry == 0);
assert(memcmp(a, c, 4) == 0);
}
// Carry propagation
void test_add_carry_chain() {
unsigned char a[4] = {0xFF, 0xFF, 0xFF, 0xFF};
unsigned char b[4] = {0x01, 0, 0, 0};
unsigned char c[4];
int carry = XP_add(4, c, a, b, 0);
assert(carry == 1);
assert(c[0] == 0 && c[1] == 0 && c[2] == 0 && c[3] == 0);
}
// Commutativity
void test_add_commutative() {
unsigned char a[4], b[4], c1[4], c2[4];
// Fill with random values
XP_add(4, c1, a, b, 0);
XP_add(4, c2, b, a, 0);
assert(memcmp(c1, c2, 4) == 0);
}
Multiplication Tests:
// test_xp_mul.c
void test_mul_by_zero() {
unsigned char a[4] = {0x12, 0x34, 0x56, 0x78};
unsigned char b[4] = {0, 0, 0, 0};
unsigned char c[8] = {1, 1, 1, 1, 1, 1, 1, 1}; // Non-zero to verify clearing
XP_mul(c, 4, a, 4, b);
// c should be all zeros
for (int i = 0; i < 8; i++) assert(c[i] == 0);
}
void test_mul_known_values() {
// 0xFFFF * 0xFFFF = 0xFFFE0001
unsigned char a[2] = {0xFF, 0xFF};
unsigned char b[2] = {0xFF, 0xFF};
unsigned char c[4];
XP_mul(c, 2, a, 2, b);
assert(c[0] == 0x01);
assert(c[1] == 0x00);
assert(c[2] == 0xFE);
assert(c[3] == 0xFF);
}
Division Tests:
// test_xp_div.c
void test_div_exact() {
// 100 / 10 = 10, remainder 0
unsigned char dividend[2], divisor[1], quotient[2], remainder[1];
XP_fromint(2, dividend, 100);
XP_fromint(1, divisor, 10);
XP_div(2, quotient, dividend, 1, divisor, remainder, NULL);
assert(XP_toint(2, quotient) == 10);
assert(XP_toint(1, remainder) == 0);
}
void test_div_with_remainder() {
// 100 / 7 = 14 remainder 2
unsigned char dividend[2], divisor[1], quotient[2], remainder[1];
XP_fromint(2, dividend, 100);
XP_fromint(1, divisor, 7);
XP_div(2, quotient, dividend, 1, divisor, remainder, NULL);
assert(XP_toint(2, quotient) == 14);
assert(XP_toint(1, remainder) == 2);
}
void test_div_identity() {
// For all x, y: x == (x/y)*y + (x%y)
// Generate random values and verify
}
Integration Test: Factorial
// test_factorial.c
void test_factorial_100() {
// 100! is known to have 158 decimal digits
unsigned char fact[200]; // Enough space
memset(fact, 0, sizeof(fact));
fact[0] = 1;
for (int i = 2; i <= 100; i++) {
unsigned char temp[200];
unsigned char ival[4];
XP_fromint(4, ival, i);
XP_mul(temp, 200, fact, 4, ival);
memcpy(fact, temp, 200);
}
char str[500];
XP_tostr(str, sizeof(str), 10, 200, fact);
// 100! starts with "93326215443944..." and has 158 digits
assert(strlen(str) == 158);
assert(strncmp(str, "9332621544", 10) == 0);
}
7. Common Pitfalls & Debugging
Pitfall 1: Forgetting to Initialize the Product Array
Problem: Multiplication accumulates into the result, so it must be zeroed first.
Symptom: Random values in product, inconsistent results.
Fix:
void XP_mul(XP_T z, int n, XP_T x, int m, XP_T y) {
memset(z, 0, n + m); // CRITICAL: zero before accumulating
// ... multiplication loop
}
Pitfall 2: Integer Overflow in Intermediate Calculations
Problem: Multiplying two bytes can produce up to 16 bits; adding carry can add more.
Symptom: Incorrect results, especially with large operands.
Fix:
// BAD: overflow risk
unsigned char product = x[i] * y[j] + z[i+j] + carry;
// GOOD: use larger intermediate type
unsigned int product = (unsigned int)x[i] * y[j] + z[i+j] + carry;
z[i+j] = product & 0xFF;
carry = product >> 8;
Pitfall 3: Off-by-One Errors in Division
Problem: Quotient estimation can be too high by 1 or 2.
Symptom: Division produces wrong results, especially near word boundaries.
Fix: Always verify and correct the estimate:
// After estimating q_digit
while (q_digit * divisor > remaining_dividend) {
q_digit--;
}
Pitfall 4: Forgetting Endianness in String Conversion
Problem: Printing bytes in wrong order produces reversed number.
Symptom: “0x12345678” prints as “78563412”.
Fix: Account for little-endian storage when building output strings.
Pitfall 5: Not Normalizing the Divisor
Problem: Division without normalization makes quotient estimation very inaccurate.
Symptom: Division fails or requires many correction iterations.
Fix: Shift divisor left until its high bit is set, apply same shift to dividend.
8. Extensions & Challenges
Basic Extensions
- GCD (Greatest Common Divisor): Implement Euclidean algorithm using your XP_div
- Power: Implement XP_power(base, exponent) using repeated squaring
- Primality Test: Implement Miller-Rabin using your XP operations
Advanced Extensions
- Karatsuba Multiplication: Implement the recursive O(n^1.585) algorithm
- Barrett Reduction: Fast modular reduction without division
- Montgomery Multiplication: Foundation for the MP module
Challenge Projects
- RSA Key Generation: Generate large primes, compute modular inverse
- Big Integer Calculator: Interactive calculator for arbitrary precision math
- Factorial Competition: How fast can you compute 100,000!?
9. Real-World Connections
GMP (GNU Multiple Precision Arithmetic Library)
The most widely used arbitrary precision library uses similar techniques:
- Little-endian limb (word) storage
- Careful handling of carry propagation
- Karatsuba, Toom-Cook, and FFT for large multiplications
- Sophisticated division algorithms
Your XP module implements the core ideas; GMP adds decades of optimization.
OpenSSL/BoringSSL
Cryptographic libraries implement specialized versions:
- Constant-time operations to prevent timing attacks
- Montgomery multiplication for modular arithmetic
- Optimized assembly for critical paths
Python’s Integer Implementation
Python’s unlimited precision integers work similarly:
- Array of 30-bit “digits” (32-bit words with some bits unused)
- Grade-school algorithms for small numbers
- Karatsuba for larger numbers
- Sign-magnitude representation (like the AP module)
10. Resources
Books
- Primary: “C Interfaces and Implementations” by David Hanson, Chapter 17
- Algorithm Depth: “The Art of Computer Programming” Vol. 2 by Donald Knuth
- Modern Techniques: “Modern Computer Arithmetic” by Brent & Zimmermann (free PDF)
- Cryptographic Context: “Handbook of Applied Cryptography” by Menezes et al. (free online)
Online Resources
- CII Source Code: https://github.com/drh/cii (official implementation)
- GMP Documentation: https://gmplib.org/manual/
- Python Integer Implementation: https://github.com/python/cpython/blob/main/Objects/longobject.c
Tools
- GMP: Compare your implementation’s results against GMP
- Python: Use Python’s unlimited integers to generate test cases
- Valgrind: Essential for catching memory errors
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- Can explain why extended precision is needed for cryptography
- Can describe carry propagation in addition
- Can explain the grade-school multiplication algorithm
- Can describe why division requires normalization
- Understand the trade-offs of different base unit sizes
Implementation
- Addition handles carry across all word boundaries
- Subtraction handles borrow correctly (returns nonzero if result is negative)
- Multiplication produces correct results for all operand sizes
- Division produces correct quotient and remainder
- Comparison works correctly for numbers of different magnitudes
- String conversion handles all bases (at least 2, 10, 16)
Testing
- All basic operations pass unit tests
- Edge cases handled: zero operands, maximum values, single-byte numbers
- Algebraic properties verified: a+b=b+a, ab=ba, (a/b)*b+a%b=a
- No memory leaks (verified by Valgrind)
- No undefined behavior (verified by sanitizers)
Documentation
- Header file documents all public functions
- Preconditions and postconditions are clear
- Usage examples provided
12. Submission / Completion Criteria
This project is complete when:
-
All Operations Functional: Addition, subtraction, multiplication, division, comparison, and conversion all work correctly
-
Tests Pass: Comprehensive test suite with >90% coverage passes
-
No Memory Issues: Valgrind reports no leaks or errors
-
Integration Demo: Can compute 1000! and display it in decimal
-
Documentation Complete: API reference in header, usage examples provided
-
Code Quality: Clean code following Hanson’s interface/implementation pattern
Verification Command:
# Build and test
make clean all test
# Memory check
valgrind --leak-check=full ./test_xp
# Compute 1000! as integration test
./factorial 1000
Expected Output for 1000!:
1000! has 8530 bits (2568 decimal digits)
1000! = 402387260077093773543702433923003985719374864210714632543799910429938512398629020...