Project 2: Payment Tokenization Vault
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 1-2 weeks |
| Programming Language | C |
| Knowledge Area | Payment Security / Cryptography |
| Key Technologies | AES-256, Tokenization, Format-Preserving Encryption |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 4. The "Open Core" Infrastructure |
Learning Objectives
By completing this project, you will:
- Understand tokenization architecture - Learn why tokenization is THE core technique in modern payment security
- Implement AES-256 encryption - Master symmetric encryption for data at rest
- Design format-preserving tokens - Create tokens that pass Luhn validation for legacy system compatibility
- Build secure key management - Understand key lifecycle, rotation, and storage
- Create vault architecture - Design bidirectional token-to-PAN mapping with O(1) lookup
- Understand PCI scope reduction - Learn how tokenization removes systems from compliance requirements
The Core Question You're Answering
"How can merchants process payments without ever seeing the actual card number?"
This is the central problem tokenization solves. Consider this scenario:
- Customer enters card 4532015112830366 on your website
- Your server receives... tok_9f8g7h6j5k4l3m2n (a token)
- You store, process, and reference the token everywhere
- Only the tokenization vault knows the mapping back to the real card
- Your entire system never touches the real PAN
Why this matters:
========================================================================
PCI SCOPE: WITH vs. WITHOUT TOKENIZATION
========================================================================

  WITHOUT TOKENIZATION                    WITH TOKENIZATION
  --------------------                    -----------------

  +-----------------+                     +-----------------+
  |   Web Server    | <- PCI Scope        |   Web Server    | <- OUT of PCI
  |  (handles PAN)  |                     | (handles token) |
  +--------+--------+                     +--------+--------+
           |                                       |
  +--------v--------+                     +--------v--------+
  |   App Server    | <- PCI Scope        |   App Server    | <- OUT of PCI
  |  (handles PAN)  |                     | (handles token) |
  +--------+--------+                     +--------+--------+
           |                                       |
  +--------v--------+                     +--------v--------+
  |    Database     | <- PCI Scope        |   Token Vault   | <- PCI Scope
  |  (stores PAN)   |                     |  (stores PAN)   |    (ONLY this!)
  +-----------------+                     +-----------------+

  PCI Scope: 3 systems                    PCI Scope: 1 system
  Audit cost: $$$$$                       Audit cost: $$

========================================================================
Tokenization reduces PCI scope from your entire infrastructure to a single, hardened vault.
Deep Theoretical Foundation
1. What is Tokenization?
Tokenization replaces sensitive data (PAN) with a non-sensitive substitute (token) that has no exploitable value outside the tokenization system.
Key properties of tokens:
- Non-reversible externally: You cannot derive the PAN from the token without access to the vault
- Unique mapping: Each PAN maps to one (or more) tokens
- Useless if stolen: A random token has no mathematical relationship to the PAN, so tokens stolen from a merchant system reveal nothing without access to the vault
- Format-compatible: Can optionally match PAN format for legacy systems
Tokenization vs. Encryption:
| Aspect | Encryption | Tokenization |
|---|---|---|
| Reversibility | Key holder can decrypt | Only vault can map back |
| If data stolen | Attacker can try to crack | Token is useless outside vault |
| Format preservation | Ciphertext is different format | Token can match original format |
| Key management | Keys must be protected everywhere | Keys only in vault |
| PCI scope | All systems with keys are in scope | Only vault is in scope |
2. Token Types
========================================================================
TOKEN TYPES
========================================================================

  RANDOM TOKEN
  ------------
  PAN:   4532015112830366
  Token: tok_9f8g7h6j5k4l3m2n

  - No format relationship to PAN
  - Highest security
  - Requires system changes to accept new format

  ----------------------------------------------------------------------

  FORMAT-PRESERVING TOKEN
  -----------------------
  PAN:   4532015112830366
  Token: 4532019876543215  <- Same length, passes Luhn!

  - Looks like a card number
  - Works with legacy systems expecting 16 digits
  - BIN may or may not be preserved

  ----------------------------------------------------------------------

  PARTIAL TOKEN (BIN-preserving)
  ------------------------------
  PAN:   4532015112830366
  Token: 4532019876543215
         ^^^^^^
         BIN preserved (for routing)

  - Allows network routing without detokenization
  - Common in payment processing

========================================================================
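Format-preserving and BIN-preserving tokens only work with legacy systems if they pass the Luhn check, so it helps to have both a validator and a check-digit calculator. The following is a minimal sketch; the names `luhn_check` and `calculate_luhn_check` match helpers referenced later in this guide, but their exact signatures here are an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Sum the Luhn-weighted digits of `digits` (length n), walking right to left.
// `double_rightmost` is true when computing a check digit for a payload and
// false when validating a complete number (whose last digit is not doubled).
static int luhn_sum(const char* digits, size_t n, bool double_rightmost) {
    int sum = 0;
    bool dbl = double_rightmost;
    for (size_t i = n; i-- > 0; ) {
        int d = digits[i] - '0';
        if (dbl) {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        dbl = !dbl;
    }
    return sum;
}

// True if s is non-empty and contains only decimal digits.
static bool all_digits(const char* s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!isdigit((unsigned char)s[i])) return false;
    }
    return n > 0;
}

// True if `number` is all digits and passes the Luhn check.
bool luhn_check(const char* number) {
    size_t n = strlen(number);
    return all_digits(number, n) && luhn_sum(number, n, false) % 10 == 0;
}

// Returns the check digit (as a character) to append to `payload`,
// or '\0' if the payload is not purely numeric.
char calculate_luhn_check(const char* payload) {
    size_t n = strlen(payload);
    if (!all_digits(payload, n)) return '\0';
    return (char)('0' + (10 - luhn_sum(payload, n, true) % 10) % 10);
}
```

For example, the 15-digit payload 453201987654321 needs the check digit 5 to form a Luhn-valid 16-digit token.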
3. AES-256 Encryption
AES (Advanced Encryption Standard) with 256-bit keys is the standard for protecting PANs at rest.
AES Block Cipher Basics:
- Block size: 128 bits (16 bytes)
- Key sizes: 128, 192, or 256 bits
- Mode of operation: CBC, GCM, CTR (each with different properties)
Why AES-256 for payment data:
- Symmetric: Same key encrypts and decrypts (fast)
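- 256-bit: Large security margin. Grover's algorithm roughly halves the effective key length, leaving about 128 bits of security against a quantum attacker (AES-128 would drop to about 64)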
- Well-analyzed: 20+ years of cryptanalysis, no practical attacks
- Hardware acceleration: Modern CPUs have AES-NI instructions
Mode Selection for Token Vault:
| Mode | Pros | Cons | Use Case |
|---|---|---|---|
| ECB | Simple | Patterns visible, NEVER use | Never |
| CBC | Hides patterns | Needs IV, padding oracle attacks | Legacy |
| GCM | Auth + encryption | Nonce reuse catastrophic | Recommended |
| CTR | Streaming, parallelizable | No authentication | With HMAC |
Recommended: AES-256-GCM (authenticated encryption)
4. Format-Preserving Encryption (FPE)
FPE encrypts data while preserving its format (length, character set).
NIST Standards:
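- FF1: Feistel-based mode built on AES, specified in NIST SP 800-38G
- FF3: Alternate algorithm in the same publication; cryptanalysis published in 2017 led NIST to propose a revised FF3-1, so prefer FF1 for new work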
How FPE works (conceptual):
========================================================================
FORMAT-PRESERVING ENCRYPTION
========================================================================

  Regular AES:
    Input:  "4532015112830366" (16 chars)
    Output: "a7f3b2c1d4e5f60718293a4b5c6d7e8f" (32+ hex chars, illustrative)

  Format-Preserving (FF1):
    Input:  "4532015112830366" (16 digits)
    Output: "9876543210123456" (16 digits)

  Algorithm (Feistel-based):
    1. Split input into two halves: L, R
    2. For i = 0 to rounds - 1:
         L' = R
         R' = (L + F(R, key, tweak, i)) mod radix^len(L)
         (L, R) = (L', R')
    3. Return L || R

  F() = AES-based round function. A textbook Feistel network combines
        the halves with XOR; FF1 uses modular addition so that every
        value stays in the input alphabet (radix 10 for card digits).

========================================================================
For this project: Start with random tokens. FPE is complex, so implement it as an extension.
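To make the Feistel idea concrete, here is a toy sketch that encrypts a 16-digit string into another 16-digit string and back. It is not FF1 and not secure: the round function `toy_round` is a simple integer mixer standing in for the AES-based function, and the names, the 64-bit key, and the fixed 10 rounds are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HALF_MOD 100000000ULL  // 10^8: each half holds 8 decimal digits
#define ROUNDS 10

// Toy round function: NOT cryptographic. FF1 uses an AES-based PRF here.
static uint64_t toy_round(uint64_t half, uint64_t key, int round) {
    uint64_t x = half ^ key ^ ((uint64_t)round * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33; x *= 0xC4CEB9FE1A85EC53ULL;
    x ^= x >> 33;
    return x % HALF_MOD;
}

// Parse exactly 16 decimal digits into two 8-digit halves.
static bool split16(const char* s, uint64_t* l, uint64_t* r) {
    if (strlen(s) != 16) return false;
    *l = *r = 0;
    for (int i = 0; i < 16; i++) {
        if (s[i] < '0' || s[i] > '9') return false;
        if (i < 8) *l = *l * 10 + (uint64_t)(s[i] - '0');
        else       *r = *r * 10 + (uint64_t)(s[i] - '0');
    }
    return true;
}

// Each round maps (L, R) to (R, (L + F(R)) mod 10^8).
bool toy_fpe_encrypt(const char* in, uint64_t key, char out[17]) {
    uint64_t l, r;
    if (!split16(in, &l, &r)) return false;
    for (int i = 0; i < ROUNDS; i++) {
        uint64_t next_r = (l + toy_round(r, key, i)) % HALF_MOD;
        l = r;
        r = next_r;
    }
    snprintf(out, 17, "%08llu%08llu", (unsigned long long)l, (unsigned long long)r);
    return true;
}

// Runs the rounds in reverse, subtracting instead of adding.
bool toy_fpe_decrypt(const char* in, uint64_t key, char out[17]) {
    uint64_t l, r;
    if (!split16(in, &l, &r)) return false;
    for (int i = ROUNDS - 1; i >= 0; i--) {
        uint64_t prev_l = (r + HALF_MOD - toy_round(l, key, i)) % HALF_MOD;
        r = l;
        l = prev_l;
    }
    snprintf(out, 17, "%08llu%08llu", (unsigned long long)l, (unsigned long long)r);
    return true;
}
```

Because each round only adds a value derived from the other half, decryption never needs to invert the round function itself. That property is what lets FF1 use a one-way AES-based function inside a reversible cipher.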
5. Key Management
Keys are the critical element. A stolen vault with encrypted data is useless without keys.
Key Hierarchy:
========================================================================
KEY HIERARCHY
========================================================================

                     +--------------------+
                     |  Master Key (MK)   | <- Stored in HSM or
                     |                    |    key management service
                     +---------+----------+
                               |
           +-------------------+-------------------+
           v                   v                   v
  +-----------------+ +-----------------+ +-----------------+
  | Key Encryption  | | Data Encryption | | Token Generation|
  |    Key (KEK)    | |    Key (DEK)    | |    Key (TGK)    |
  +-----------------+ +-----------------+ +-----------------+
           |                   |                   |
           v                   v                   v
    Encrypts other       Encrypts PANs      Generates tokens
     keys at rest        in the vault         (HMAC-based)

========================================================================
Key Rotation:
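- Generate new DEK
- Encrypt new DEK with KEK and store it under a new key version
- Re-encrypt all PANs with new DEK
- Retire the old DEK once no record references it (destroy it when your recovery policy allows)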
For this project: Simulate key management. Production would use HSM.
6. Vault Architecture
Bidirectional Mapping:
========================================================================
TOKEN VAULT DATA MODEL
========================================================================

  TOKENIZE: PAN -> Token
  ----------------------

  1. Receive PAN: 4532015112830366
  2. Check if PAN already tokenized (hash lookup)
  3. If exists: return existing token
  4. If new:
     a. Generate random token
     b. Encrypt PAN with DEK
     c. Store mapping: token -> encrypted_pan
     d. Store hash index: hash(PAN) -> token
     e. Return token

  ----------------------------------------------------------------------

  DETOKENIZE: Token -> PAN
  ------------------------

  1. Receive token: tok_9f8g7h6j5k4l3m2n
  2. Lookup: token -> encrypted_pan
  3. Decrypt encrypted_pan with DEK
  4. Return PAN: 4532015112830366

  ----------------------------------------------------------------------

  DATABASE SCHEMA:
  ----------------

  tokens
    |-- token_id (PK): varchar(36)   <- "tok_" + 32 characters
    |-- encrypted_pan: varbinary(64)
    |-- pan_hash: char(64)           <- For dedup, HMAC-SHA256 with a
    |                                   key separate from the DEK
    |-- created_at: timestamp
    |-- last_used: timestamp
    `-- metadata: json               <- BIN, card type, etc.

  INDEX on pan_hash for fast duplicate detection
  (SQLite indexes are B-trees, so lookups are O(log n))

========================================================================
Project Specification
What You'll Build
A secure tokenization service that:
- Accepts card numbers via API
- Stores them encrypted in a vault
- Returns format-preserving (or random) tokens
- Provides authenticated detokenization
- Supports key rotation
Expected Output
# Tokenize a card
$ curl -X POST http://localhost:8080/tokenize \
-H "Content-Type: application/json" \
-d '{"pan": "4532015112830366"}'
{
"success": true,
"token": "tok_a1b2c3d4e5f6g7h8",
"metadata": {
"network": "Visa",
"last_four": "0366",
"bin": "453201"
}
}
# Detokenize (requires authentication)
$ curl -X POST http://localhost:8080/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <api_key>" \
-d '{"token": "tok_a1b2c3d4e5f6g7h8"}'
{
"success": true,
"pan": "4532015112830366"
}
# Vault statistics (admin only)
$ curl http://localhost:8080/admin/stats \
-H "Authorization: Bearer <admin_key>"
{
"total_tokens": 15234,
"unique_pans": 12891,
"tokens_by_network": {
"Visa": 8234,
"Mastercard": 4521,
"Amex": 1479
},
"created_today": 523
}
Project Structure
tokenization_vault/
|-- src/
|   |-- main.c              # HTTP server entry point
|   |-- vault.c             # Core vault logic
|   |-- vault.h
|   |-- crypto.c            # AES encryption wrapper
|   |-- crypto.h
|   |-- token_gen.c         # Token generation
|   |-- token_gen.h
|   |-- key_manager.c       # Key lifecycle
|   |-- key_manager.h
|   |-- storage.c           # SQLite persistence
|   |-- storage.h
|   |-- http_server.c       # Minimal HTTP handling
|   `-- http_server.h
|-- include/
|   `-- config.h            # Build configuration
|-- tests/
|   |-- test_vault.c
|   |-- test_crypto.c
|   |-- test_token_gen.c
|   `-- test_storage.c
|-- scripts/
|   |-- key_ceremony.sh     # Key generation script
|   `-- rotate_keys.sh      # Key rotation script
|-- data/
|   `-- vault.db            # SQLite database
|-- Makefile
`-- README.md
Core API Design
// vault.h
#ifndef VAULT_H
#define VAULT_H
#include <stdbool.h>
#include <stdint.h>
typedef struct {
    char token[37];     // "tok_" + 32 chars + null
    char network[32];   // Card network
    char last_four[5];  // Last 4 digits (safe to store)
    char bin[9];        // First 6-8 digits
    bool new_token;     // True if newly created
} TokenizeResult;
typedef struct {
    char pan[20];       // Decrypted PAN (up to 19 digits + null)
    bool success;
    char error[128];    // Error message if failed
} DetokenizeResult;
// Primary operations
TokenizeResult vault_tokenize(const char* pan);
DetokenizeResult vault_detokenize(const char* token, const char* api_key);
// Vault management
bool vault_init(const char* db_path, const char* master_key_path);
void vault_shutdown(void);
// Key management
bool vault_rotate_keys(void);
int vault_get_key_version(void);
// Statistics (for admin dashboard)
typedef struct {
    int64_t total_tokens;
    int64_t unique_pans;
    int64_t tokens_today;
} VaultStats;
VaultStats vault_get_stats(void);
#endif
Solution Architecture
System Design
========================================================================
TOKENIZATION VAULT ARCHITECTURE
========================================================================

                     +----------------------+
                     |    API Gateway /     |
                     |     HTTP Server      |
                     +----------+-----------+
                                |
           +--------------------+--------------------+
           v                    v                    v
    +--------------+     +--------------+     +--------------+
    |  /tokenize   |     | /detokenize  |     | /admin/stats |
    |              |     |              |     |              |
    | - Validate   |     | - Auth check |     | - Auth check |
    | - Luhn check |     | - Rate limit |     | - Read only  |
    | - Dedupe     |     | - Audit log  |     |              |
    +------+-------+     +------+-------+     +--------------+
           |                    |
           +----------+---------+
                      v
  VAULT CORE
    Token Generator: CSPRNG, format-preserving, uniqueness
    Crypto Engine:   AES-256-GCM, key derivation, key wrapping
                      |
                      v
  KEY MANAGER
    Master Key (env/file)
    DEK Cache (in memory)
                      |
                      v
  STORAGE LAYER (SQLite Database)
    tokens:    token_id, encrypted_pan, pan_hash, metadata
    keys:      key_version, encrypted_dek, created_at, active
    audit_log: timestamp, action, token_id, actor

========================================================================
Tokenization Flow
========================================================================
TOKENIZATION FLOW
========================================================================

  INPUT: PAN "4532015112830366"

  Step 1: VALIDATE
  ----------------
    - Check Luhn validity
    - Verify length (13-19 digits)
    - Identify network from BIN
        |
        v
  Step 2: DEDUPLICATE
  -------------------
    pan_hash = HMAC-SHA256(PAN, hash_key)
    existing = SELECT token_id FROM tokens WHERE pan_hash = ?
    IF existing: return existing token
        |
        v
  Step 3: GENERATE TOKEN
  ----------------------
    token = "tok_" + random_base62(32)
    Verify uniqueness in database
    (or use UUID for guaranteed uniqueness)
        |
        v
  Step 4: ENCRYPT PAN
  -------------------
    nonce = random(12 bytes)
    encrypted = AES-256-GCM(PAN, DEK, nonce)
    ciphertext = nonce || encrypted || tag
        |
        v
  Step 5: STORE
  -------------
    INSERT INTO tokens (
      token_id, encrypted_pan, pan_hash,
      metadata, key_version, created_at
    )
        |
        v
  OUTPUT: token "tok_a1b2c3d4e5f6g7h8"

========================================================================
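Step 1 of the flow can be sketched as a small helper that rejects malformed input and derives the non-sensitive metadata stored next to the token. The struct `PanMetadata` and the function names are hypothetical, and the network table covers only three common prefix ranges; Luhn validation is assumed to happen separately.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char network[32];
    char bin[9];
    char last_four[5];
} PanMetadata;

// Map leading digits to a card network. Caller guarantees at least 4 digits.
static const char* identify_network(const char* pan) {
    char prefix2[3] = { pan[0], pan[1], '\0' };
    char prefix4[5] = { pan[0], pan[1], pan[2], pan[3], '\0' };
    int p2 = atoi(prefix2);
    int p4 = atoi(prefix4);
    if (pan[0] == '4') return "Visa";
    if ((p2 >= 51 && p2 <= 55) || (p4 >= 2221 && p4 <= 2720)) return "Mastercard";
    if (p2 == 34 || p2 == 37) return "Amex";
    return "Unknown";
}

// Returns false unless `pan` is 13-19 decimal digits; fills `out` on success.
bool extract_pan_metadata(const char* pan, PanMetadata* out) {
    size_t n = strlen(pan);
    if (n < 13 || n > 19) return false;
    if (strspn(pan, "0123456789") != n) return false;
    snprintf(out->network, sizeof out->network, "%s", identify_network(pan));
    snprintf(out->bin, sizeof out->bin, "%.6s", pan);             // first six digits
    snprintf(out->last_four, sizeof out->last_four, "%s", pan + n - 4);
    return true;
}
```

Only the BIN and the last four digits leave this function, which keeps the stored metadata consistent with what the API response exposes.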
Security Design Decisions
- Never log PANs: Even for debugging, only log tokens and metadata
- Immediate memory clearing: Zero out PAN buffers after use
- Authenticated encryption: AES-GCM prevents tampering
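- Keyed PAN hashes: Use HMAC with a hash key that is separate from the encryption key; PANs have too little entropy for a plain or salted hash to resist brute force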
- Key versioning: Support old keys during rotation
- Audit logging: Every detokenization is logged with actor ID
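One way to honor the "never log PANs" rule while keeping logs useful is to mask the number before it reaches any logging call. The sketch below keeps the first six and last four digits, a commonly permitted truncation; `mask_pan` is a hypothetical helper, not part of the API defined earlier.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Writes a log-safe form of `pan` into `out`: first six and last four digits
// kept, everything in between replaced by '*'. Returns false and writes
// "[invalid]" if the input is not 13-19 digits or `out` is too small.
bool mask_pan(const char* pan, char* out, size_t out_size) {
    size_t n = strlen(pan);
    if (n < 13 || n > 19 || strspn(pan, "0123456789") != n || out_size < n + 1) {
        snprintf(out, out_size, "[invalid]");
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (i < 6 || i >= n - 4) ? pan[i] : '*';
    }
    out[n] = '\0';
    return true;
}
```

On invalid input the helper deliberately echoes nothing from the original string, so a malformed request cannot leak digits into the log.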
Implementation Guide
Phase 1: Crypto Foundation
Goal: Implement AES-256-GCM encryption wrapper.
Using OpenSSL:
#include <openssl/evp.h>
#include <openssl/rand.h>
typedef struct {
unsigned char nonce[12];
unsigned char tag[16];
unsigned char* ciphertext;
size_t ciphertext_len;
} EncryptedData;
EncryptedData* encrypt_pan(const char* pan, const unsigned char* key);
char* decrypt_pan(const EncryptedData* data, const unsigned char* key);
Key tasks:
- Generate cryptographically secure random nonces
- Implement AES-256-GCM encryption
- Implement decryption with tag verification
- Zero memory after use
Phase 2: Token Generation
Goal: Generate unique, random tokens.
Random token approach:
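// Generate "tok_" + 32 random base62 characters (caller frees; NULL on failure).
// Needs <openssl/rand.h>, <stdlib.h>, and <string.h>.
char* generate_token(void) {
    static const char alphabet[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    char* token = malloc(37); // "tok_" + 32 + null
    if (!token) return NULL;
    memcpy(token, "tok_", 4);
    for (int n = 0; n < 32; ) {
        unsigned char b;
        if (RAND_bytes(&b, 1) != 1) { free(token); return NULL; }
        // Reject 248-255 so that b % 62 is unbiased (248 = 4 * 62)
        if (b < 248) token[4 + n++] = alphabet[b % 62];
    }
    token[36] = '\0';
    return token;
}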
Format-preserving token (extension):
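// Generate a token that looks like a 16-digit card number.
// Preserves BIN, generates random middle, calculates valid Luhn check digit.
// Assumes a 16-digit PAN; calculate_luhn_check() returns the digit as a character.
char* generate_fp_token(const char* original_pan) {
    char* token = malloc(17);
    if (!token) return NULL;
    // Copy BIN (first 6 digits)
    memcpy(token, original_pan, 6);
    // Generate random middle from the CSPRNG (never rand(), which is predictable)
    for (int i = 6; i < 15; ) {
        unsigned char b;
        if (RAND_bytes(&b, 1) != 1) { free(token); return NULL; }
        if (b < 250) token[i++] = '0' + (b % 10); // reject 250-255 to avoid bias
    }
    token[15] = '\0'; // terminate the 15-digit payload before computing the check digit
    // Calculate valid Luhn check digit
    token[15] = calculate_luhn_check(token);
    token[16] = '\0';
    // The caller should still reject a token that equals the original PAN
    return token;
}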
Phase 3: Storage Layer
Goal: Persist tokens and encrypted PANs in SQLite.
Schema:
CREATE TABLE tokens (
    token_id TEXT PRIMARY KEY,
    encrypted_pan BLOB NOT NULL,
    pan_hash TEXT NOT NULL UNIQUE,
    bin TEXT NOT NULL,
    last_four TEXT NOT NULL,
    network TEXT NOT NULL,
    key_version INTEGER NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP
);
-- The UNIQUE constraint on pan_hash already creates an index, so a separate CREATE INDEX is unnecessary.
CREATE TABLE keys (
key_version INTEGER PRIMARY KEY,
encrypted_dek BLOB NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
active BOOLEAN DEFAULT 1
);
CREATE TABLE audit_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
action TEXT NOT NULL, -- 'tokenize', 'detokenize'
token_id TEXT,
actor_id TEXT,
ip_address TEXT
);
Phase 4: Vault Core
Goal: Implement tokenize/detokenize operations.
Key functions:
// Assumes the helper functions return heap-allocated values that the caller frees.
TokenizeResult vault_tokenize(const char* pan) {
    TokenizeResult result = {0};
    // 1. Validate; an empty result.token signals failure to the caller
    if (pan == NULL || !luhn_check(pan)) {
        return result;
    }
    // 2. Check for existing token
    char* hash = hmac_sha256(pan, get_hash_key());
    char* existing = storage_find_by_hash(hash);
    if (existing) {
        snprintf(result.token, sizeof result.token, "%s", existing);
        result.new_token = false;
        free(existing);
        free(hash);
        return result;
    }
    // 3. Generate new token
    char* token = generate_token();
    // 4. Encrypt PAN
    EncryptedData* encrypted = encrypt_pan(pan, get_current_dek());
    if (token == NULL || encrypted == NULL) {
        free(token);
        free(hash);
        return result;
    }
    // 5. Store
    storage_insert_token(token, encrypted, hash, extract_metadata(pan));
    // 6. Build result
    snprintf(result.token, sizeof result.token, "%s", token);
    result.new_token = true;
    // ... copy metadata
    // 7. Clean up; the caller owns `pan` and must zero its own buffer
    free_encrypted(encrypted);
    free(token);
    free(hash);
    return result;
}
Phase 5: HTTP Server
Goal: Expose vault via REST API.
Minimal HTTP server options:
- libmicrohttpd: Small, embeddable HTTP server
- mongoose: Single-file HTTP library
- Custom: Parse HTTP manually (educational but tedious)
Recommended: Use mongoose for simplicity.
Phase 6: Key Management
Goal: Implement key loading and rotation.
Key storage options:
- Environment variable: VAULT_MASTER_KEY (development)
- File: Key file with strict permissions (simple production)
- HSM: Hardware Security Module (real production)
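For the development option, a loader might look like the sketch below. The source does not say how `VAULT_MASTER_KEY` is encoded, so the 64-character hex format is an assumption, as are the function names `parse_hex_key` and `load_master_key_from_env`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes exactly 64 hex characters into a 32-byte key.
// On any error the output buffer is zeroed and false is returned.
bool parse_hex_key(const char* hex, unsigned char key[32]) {
    memset(key, 0, 32);
    if (hex == NULL || strlen(hex) != 64) return false;
    for (size_t i = 0; i < 32; i++) {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            memset(key, 0, 32);
            return false;
        }
        key[i] = (unsigned char)((hi << 4) | lo);
    }
    return true;
}

// Development-only loader: reads the key from the VAULT_MASTER_KEY variable.
bool load_master_key_from_env(unsigned char key[32]) {
    return parse_hex_key(getenv("VAULT_MASTER_KEY"), key);
}
```

Rejecting anything other than exactly 64 hex characters avoids silently running with a truncated or padded key.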
Key rotation process:
bool vault_rotate_keys(void) {
    // Capture the current version before anything changes
    int old_version = get_key_version();
    int new_version = old_version + 1;
    // 1. Generate new DEK
    unsigned char new_dek[32];
    if (RAND_bytes(new_dek, sizeof new_dek) != 1) return false;
    // 2. Encrypt new DEK with master key
    EncryptedData* wrapped_dek = encrypt_dek(new_dek, get_master_key());
    // 3. Store new key with incremented version
    storage_insert_key(new_version, wrapped_dek);
    // 4. Re-encrypt all PANs (must complete before step 5)
    storage_reencrypt_all(get_dek(old_version), new_dek, new_version);
    // 5. Mark old key inactive (but keep for disaster recovery)
    storage_deactivate_key(old_version);
    // 6. Wipe the plaintext DEK from the stack
    secure_zero(new_dek, sizeof new_dek);
    return true;
}
Testing Strategy
Unit Tests
// test_crypto.c
void test_encrypt_decrypt_roundtrip() {
const char* pan = "4111111111111111";
unsigned char key[32];
RAND_bytes(key, 32);
EncryptedData* encrypted = encrypt_pan(pan, key);
char* decrypted = decrypt_pan(encrypted, key);
assert(strcmp(pan, decrypted) == 0);
free_encrypted(encrypted);
free(decrypted);
}
void test_wrong_key_fails() {
const char* pan = "4111111111111111";
unsigned char key1[32], key2[32];
RAND_bytes(key1, 32);
RAND_bytes(key2, 32);
EncryptedData* encrypted = encrypt_pan(pan, key1);
char* result = decrypt_pan(encrypted, key2); // Wrong key
assert(result == NULL); // Decryption should fail
}
// test_vault.c
void test_tokenize_same_pan_returns_same_token() {
const char* pan = "4111111111111111";
TokenizeResult r1 = vault_tokenize(pan);
TokenizeResult r2 = vault_tokenize(pan);
assert(strcmp(r1.token, r2.token) == 0);
assert(r1.new_token == true);
assert(r2.new_token == false);
}
Integration Tests
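#!/bin/bash
# test_api.sh
# Minimal assertion helpers (exit non-zero on the first failure)
assert_equals() {
  [ "$1" = "$2" ] || { echo "FAIL: expected '$1', got '$2'"; exit 1; }
}
assert_starts_with() {
  [[ "$2" == "$1"* ]] || { echo "FAIL: '$2' does not start with '$1'"; exit 1; }
}
# Start server and stop it when the script exits, even on failure
./vault_server &
PID=$!
trap 'kill "$PID"' EXIT
sleep 1
# Test tokenize
RESPONSE=$(curl -s -X POST http://localhost:8080/tokenize \
  -H "Content-Type: application/json" \
  -d '{"pan": "4111111111111111"}')
TOKEN=$(echo "$RESPONSE" | jq -r '.token')
assert_starts_with "tok_" "$TOKEN"
# Test detokenize
RESPONSE=$(curl -s -X POST http://localhost:8080/detokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test_api_key" \
  -d "{\"token\": \"$TOKEN\"}")
PAN=$(echo "$RESPONSE" | jq -r '.pan')
assert_equals "4111111111111111" "$PAN"
echo "All API tests passed"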
Security Tests
// test_security.c
void test_pan_not_in_logs() {
// Redirect stderr to capture logs
// Tokenize a PAN
// Verify PAN doesn't appear in log output
}
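void test_memory_cleared_after_use() {
    // This is hard to test directly.
    // Valgrind and AddressSanitizer catch leaks and invalid accesses, not unzeroed buffers.
    // One option: pass a test-owned buffer to the wiping routine and inspect it afterwards.
}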
void test_detokenize_requires_auth() {
// Call detokenize without auth header
// Verify 401 response
}
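The authentication test needs something concrete to call. The sketch below shows one possible shape for the check behind /detokenize: it parses a Bearer credential and compares it in constant time. The function name `detokenize_auth_status` and the single shared `expected_key` are assumptions; a real vault would look up per-client keys.

```c
#include <stddef.h>
#include <string.h>

// Constant-time equality for two buffers of the same length.
static int ct_equal(const unsigned char* a, const unsigned char* b, size_t len) {
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (unsigned char)(a[i] ^ b[i]);
    }
    return diff == 0;
}

// Returns the HTTP status the /detokenize handler should use for the given
// Authorization header value (NULL when the header is absent).
int detokenize_auth_status(const char* auth_header, const char* expected_key) {
    static const char prefix[] = "Bearer ";
    size_t prefix_len = sizeof prefix - 1;
    if (auth_header == NULL || strncmp(auth_header, prefix, prefix_len) != 0) {
        return 401;  // missing or malformed credentials
    }
    const char* presented = auth_header + prefix_len;
    size_t n = strlen(expected_key);
    // The key length may leak through timing; the key bytes do not.
    if (strlen(presented) != n ||
        !ct_equal((const unsigned char*)presented,
                  (const unsigned char*)expected_key, n)) {
        return 401;
    }
    return 200;
}
```

With a function like this, `test_detokenize_requires_auth` can assert that a NULL header and a wrong key both yield 401 without starting the HTTP server.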
Common Pitfalls & Debugging
Pitfall 1: Nonce Reuse
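Symptom: Security completely broken. Reusing a nonce under the same key in GCM reveals the XOR of the two plaintexts and lets an attacker recover the authentication key and forge tags.
Cause: Using a static or repeating nonce (for example, a counter that resets on restart).
Fix: Generate every nonce with RAND_bytes(), check its return value, and never reuse a nonce under the same key.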
Pitfall 2: Not Verifying GCM Tag
Symptom: Accepting tampered ciphertext.
Cause: Ignoring authentication tag verification.
Fix: Always check EVP_DecryptFinal_ex() return value.
Pitfall 3: Timing Attacks on Hash Comparison
Symptom: Attackers can determine token validity by timing responses.
Cause: Using strcmp() for hash comparison (early exit on mismatch).
Fix: Use constant-time comparison:
int secure_compare(const char* a, const char* b, size_t len) {
    unsigned char result = 0;
    for (size_t i = 0; i < len; i++) {
        result |= a[i] ^ b[i];
    }
    return result == 0;
}
Pitfall 4: Key in Memory After Use
Symptom: Key extractable from memory dump.
Cause: Not zeroing key buffers.
Fix: Use explicit_bzero() or volatile-write loop:
void secure_zero(void* ptr, size_t len) {
    volatile unsigned char* p = ptr;
    while (len--) *p++ = 0;
}
Extensions & Challenges
Extension 1: Format-Preserving Encryption
Implement FF1 algorithm from NIST SP 800-38G for true format-preserving tokens.
Extension 2: Multi-Tenant Vault
Add tenant isolation: each merchant has their own token space.
Extension 3: HSM Integration
Integrate with SoftHSM2 for realistic key management simulation.
Extension 4: Token Scoping
Implement scoped tokens: same PAN gets different tokens for different merchants/channels.
Extension 5: Expiring Tokens
Add token expiration for temporary/session tokens.
Interview Questions This Prepares You For
- "Why use tokenization instead of encryption?"
- Scope reduction, no key management outside vault, tokens useless if stolen
- "How do you ensure unique tokens?"
- CSPRNG + collision check, or UUID-based
- "What's the difference between FF1 and random tokenization?"
- FF1 preserves format (length, character set), random is simpler but changes format
- "How do you handle key rotation?"
- Re-encrypt data, version keys, keep old keys temporarily
- "What's in PCI scope with a tokenization vault?"
- The vault itself, plus any system that sees the PAN before it is tokenized; systems handling only tokens are out of scope
Resources
Books
| Topic | Book | Chapter |
|---|---|---|
| AES & Block Ciphers | Serious Cryptography (Aumasson) | Ch. 4-5 |
| Key Management | Security in Computing (Pfleeger) | Key management chapter |
| Format-Preserving Encryption | NIST SP 800-38G | Full document |
Specifications
- PCI Token Guidelines
- NIST SP 800-38G (FPE)
- NIST SP 800-108 (Key Derivation)
Libraries
- OpenSSL (libcrypto): AES, RAND, HMAC
- libsodium: Modern crypto primitives
- SQLite: Storage
Self-Assessment Checklist
- AES-256-GCM encryption works correctly
- Tokens are unique and non-predictable
- Same PAN always returns same token
- Detokenization requires authentication
- PANs never appear in logs
- Memory is cleared after PAN handling
- Key rotation re-encrypts all data
- Audit log captures all detokenization
- Can explain why tokenization reduces PCI scope
What's Next?
After this project, you understand how to protect card data at rest. Move to Project 3: P2PE Simulator to learn how data is protected in transit from the card reader to the processor.