Project 6: The Matrix Encoder (Linear Algebra for Reversible Transforms)

Build a deterministic matrix-based message encoder/decoder that demonstrates how linear transformations scramble and recover structured data.

Quick Reference

Attribute Value
Difficulty Level 2 (Intermediate)
Time Estimate 10-18 hours
Main Language Python
Alternatives Julia, MATLAB/Octave, TypeScript
Key Topics Matrix multiplication, invertibility, block encoding, modular arithmetic, numerical stability
Deliverable CLI tool to encode plaintext into numeric blocks and decode back
Observable Outcome Exact round-trip recovery of messages under valid key matrix

Learning Objectives

By the end of this project, you should be able to:

  1. Map text symbols to numeric vectors with explicit encoding rules.
  2. Apply matrix multiplication as a reversible transformation pipeline.
  3. Validate key invertibility before encoding begins.
  4. Handle block sizing, padding, and message normalization deterministically.
  5. Decode using matrix inverse (real-valued or modular mode, depending on design).
  6. Identify failure modes caused by non-invertible keys or numeric precision.
  7. Build deterministic CLI transcripts for encode/decode regression tests.
  8. Explain why this project is educational cryptography, not production-grade security.

Theory Needed

1) Linear Transformations as Data Scramblers

A matrix can transform vectors in consistent, rule-based ways. If a plaintext block is represented as vector v, encoding can be:

c = K * v

Where K is your key matrix and c is the cipher vector.

If K is invertible, decoding is:

v = K^{-1} * c

This project turns abstract linear algebra into a concrete reversible pipeline.

ASCII intuition:

plaintext block v  --(multiply by K)-->  encoded block c
      ^                                      |
      |                                      v
      +-----------(multiply by K^-1)---------+

2) Invertibility, Determinant, and Rank

A square matrix is invertible iff determinant is non-zero (equivalently full rank). If determinant is zero, information is lost: multiple inputs map to the same output and exact decode is impossible.

Practical engineering checks:

  • matrix is square (n x n)
  • determinant is not near zero (real-valued mode)
  • for modular mode, determinant is coprime with modulus

Failure example:

  • Rows are multiples of each other -> rank deficiency -> not invertible.

3) Block Encoding Design

Text usually has variable length, while matrix multiplication expects fixed vector dimension. So message must be chunked into blocks of size n.

Process:

  1. Normalize text (uppercase policy, allowed symbols).
  2. Map each symbol to number.
  3. Chunk into length n.
  4. Pad final block (e.g., with _ or X).

ASCII block pipeline:

"HELLO WORLD"
   |
   v
[H,E,L,L,O,_,W,O,R,L,D]
   |
   v
chunk n=3 -> [H,E,L] [L,O,_] [W,O,R] [L,D,_]
   |
   v
vectorize -> multiply by K block-by-block

4) Numeric Mode vs Modular Mode

You have two common educational approaches:

  1. Real-valued inverse mode:
    • straightforward linear algebra
    • but can suffer floating-point rounding
  2. Modular arithmetic mode (like Hill-cipher style):
    • integer math under modulus m
    • avoids floating drift
    • requires modular inverse of determinant

For high-school math learning, start with real-valued mode and optionally add modular mode extension.

5) Determinism and Input Contracts

Encoding systems fail when rules are ambiguous. Deterministic contracts are required:

  • explicit alphabet order
  • explicit symbol-to-number table
  • explicit padding symbol
  • explicit key and matrix dimension

Pseudo-logic:

normalize(text)
nums <- map_symbols(text)
blocks <- chunk_and_pad(nums, n)
encoded_blocks <- [K * b for b in blocks]
write encoded blocks in stable format

Project Spec

What You Will Build

A CLI tool with two commands:

  1. encode: plaintext -> encoded numeric blocks.
  2. decode: encoded numeric blocks -> plaintext.

Core features:

  • key matrix validation
  • deterministic text normalization and padding
  • block-by-block transform
  • round-trip verification mode

Out of scope:

  • secure key exchange
  • modern cryptographic security claims
  • binary file encryption pipeline

Functional Requirements

  1. Accept key matrix from file or inline argument.
  2. Validate shape and invertibility before processing.
  3. Normalize plaintext using documented alphabet policy.
  4. Encode/decode deterministically with stable output formatting.
  5. Return clear errors for invalid symbols, malformed blocks, and bad key matrices.

Non-Functional Requirements

  • Correctness: encode/decode round-trip must match normalized input exactly.
  • Reproducibility: same inputs produce same outputs byte-for-byte.
  • Explainability: output should include intermediate metadata (block count, padding count).

Data Shapes (Conceptual)

AlphabetMap:
  symbols: ordered list[str]
  to_int: map[str -> int]
  to_symbol: map[int -> str]

MatrixKey:
  size: int
  values: n x n numbers
  inverse: n x n numbers (or modular inverse representation)

EncodedPayload:
  key_id: string
  mode: "real" | "mod"
  block_size: int
  pad_symbol: string
  blocks: list[list[number]]

Real World Outcome

You should be able to run a deterministic encode/decode pair and compare your output with this transcript structure.

Deterministic CLI transcript:

$ python matrix_encoder.py \
  encode \
  --message "MEET AT NOON" \
  --alphabet "A-Z_" \
  --block-size 3 \
  --key-file keys/key3x3.json \
  --mode real

[INFO] mode=real block_size=3 key_id=key3x3-v1
[INFO] normalized_message=MEET_AT_NOON_
[INFO] symbol_count=12 pad_added=1
[INFO] blocks_plain=4
[ENCODED] [132,  89, 170]
[ENCODED] [141, 118, 162]
[ENCODED] [115,  76, 154]
[ENCODED] [ 88,  57, 103]
[INFO] wrote outputs/encoded_key3x3_v1.txt

$ python matrix_encoder.py decode --input outputs/encoded_key3x3_v1.txt --key-file keys/key3x3.json --mode real
[INFO] mode=real block_size=3 key_id=key3x3-v1
[INFO] decoded_blocks=4
[DECODED] MEET_AT_NOON_
[CHECK] round_trip=PASS

Failure transcript (invalid key):

$ python matrix_encoder.py encode --message "HELLO" --key-file keys/singular.json
[ERROR] key matrix is not invertible (determinant=0)

Success signal: normalized plaintext is recovered exactly in decode path.

Solution Architecture

System architecture:

+---------------------+
| CLI Layer           |
+----------+----------+
           |
           v
+---------------------+      +----------------------+
| Input Normalizer    |----->| Alphabet Mapper      |
| text policy/padding |      | symbols <-> integers |
+----------+----------+      +----------+-----------+
           |                            |
           v                            v
+---------------------+      +----------------------+
| Block Processor     |----->| Matrix Engine        |
| chunk/un-chunk      |      | multiply / inverse   |
+----------+----------+      +----------+-----------+
           |                            |
           +-------------+--------------+
                         v
                  +-------------+
                  | Serializer   |
                  +-------------+

Key Components

Component Responsibility Design Decision
KeyValidator Shape + invertibility checks Fail-fast before any transform
AlphabetCodec Symbol/integer mapping Explicit ordered alphabet prevents ambiguity
BlockCodec Chunk and padding policy Separate layer keeps matrix engine generic
MatrixEngine Encode/decode block transforms Isolated for unit-testing arithmetic correctness
PayloadIO Stable on-disk format Deterministic serialization for regression tests

Algorithm Overview

Encode path pseudocode:

function encode(message, key, alphabet, block_size):
  ensure key is invertible
  normalized <- normalize_message(message, alphabet)
  ints <- map_symbols_to_ints(normalized)
  blocks <- chunk_with_padding(ints, block_size)
  encoded <- []
  for block in blocks:
    encoded.append(matrix_multiply(key, block))
  return serialize(encoded, metadata)

Decode path pseudocode:

function decode(payload, key, alphabet):
  ensure payload.block_size == key.size
  inv <- inverse(key)
  plain_blocks <- []
  for c in payload.blocks:
    plain_blocks.append(matrix_multiply(inv, c))
  ints <- flatten(plain_blocks)
  symbols <- map_ints_to_symbols(round_or_mod(ints))
  return join(symbols)

Complexity:

  • For B blocks and n x n key: O(B * n^2) time (naive matrix-vector multiply), O(B * n) storage for blocks.

Tradeoff:

  • Real mode is easier to understand but can require rounding safeguards.
  • Modular mode is arithmetic-cleaner but conceptually heavier.

Implementation Guide

Phase 1: Contracts First

  1. Define alphabet and normalization policy.
  2. Define key matrix format and validation rules.
  3. Define payload format for encoded output.

Checkpoint: schema-level tests pass before arithmetic logic.

Phase 2: Arithmetic Core

  1. Implement matrix-vector multiply.
  2. Implement invertibility check.
  3. Implement inverse path (or modular inverse path if chosen).

Checkpoint: K^-1 * (K * v) == v for multiple test vectors.

Phase 3: Text and Block Pipeline

  1. Normalize plaintext.
  2. Convert symbols to numbers.
  3. Chunk/pad blocks and transform each block.

Checkpoint: known message round-trip matches normalized message exactly.

Phase 4: CLI and Reporting

  1. Add encode and decode subcommands.
  2. Print deterministic metadata lines.
  3. Add clear failure messages for key and symbol errors.

Checkpoint: transcript-compatible output verified by snapshot tests.

Testing

Unit Tests

  1. Alphabet map reversibility (symbol -> int -> symbol).
  2. Invertibility detection for valid and singular matrices.
  3. Block split/merge with padding behavior.
  4. Matrix round-trip on sample vectors.

Integration Tests

  1. Full encode/decode CLI round-trip using fixed key.
  2. Payload file read/write stability test.
  3. Invalid input tests (bad symbol, malformed block, wrong key size).

Deterministic test transcript example:

$ pytest tests/test_matrix_encoder.py -q
.......
7 passed in 0.22s

$ python matrix_encoder.py encode --message "ALGEBRA" --block-size 3 --key-file keys/key3x3.json --mode real
[INFO] normalized_message=ALGEBRA__
[CHECKSUM] encoded_payload=1d4fa6c2

Numerical Stability Checks

  1. In real mode, assert decoded values are within tolerance before rounding.
  2. Add boundary tests where rounding could flip nearest symbol.

Pitfalls

Symptom Likely Cause Fix
Decode output has wrong letters Rounding drift in real mode Use tolerance checks and deterministic rounding policy
Cannot decode at all Singular key matrix Reject key early with determinant/rank diagnostics
Random extra characters Padding not removed/marked correctly Include explicit pad_count metadata
Different outputs across runs Non-deterministic serialization order Sort keys and use stable formatting
Garbled symbols Alphabet mismatch between encode/decode Store alphabet ID in payload and validate

Extensions

  1. Add modular arithmetic mode (mod 27 or mod 29) for integer-safe decode.
  2. Support multi-key rotation by message block index.
  3. Add frequency-analysis experiment to show weakness of linear ciphers.
  4. Add “explain mode” that prints each matrix-vector step for learning.
  5. Add key-generation helper that enforces invertibility constraints.

Real-World Connections

  1. Data transforms: many compression and signal pipelines rely on reversible linear transforms.
  2. Error-correcting codes: linear algebra over finite fields underpins coding theory.
  3. Computer graphics: matrix transformations are core to rendering pipelines.
  4. Cryptography history: Hill-cipher style ideas show why modern crypto needs stronger primitives.

Resources

  • Gilbert Strang, Linear Algebra and Its Applications.
  • Sheldon Axler, Linear Algebra Done Right.
  • David C. Lay et al., Linear Algebra and Its Applications.
  • Douglas Stinson, Cryptography: Theory and Practice (historical cipher context).

Self-Assessment

  1. Why is non-zero determinant necessary for reversible decoding?
  2. What concrete problem does block padding solve?
  3. How does real-mode rounding error appear in decoded text?
  4. Why is explicit alphabet ordering part of correctness, not just formatting?
  5. What metadata must be stored so decoding is unambiguous?
  6. Why is this educational and not secure modern encryption?
  7. If two different keys decode the same payload plausibly, what does that imply?
  8. How would modular arithmetic change your validation rules?

Completion Criteria

  • Encode/decode CLI supports deterministic round-trip for fixed key and input.
  • Key validation rejects singular or malformed matrices before processing.
  • Payload format is stable and includes enough metadata for safe decode.
  • Tests cover mapping, block handling, arithmetic correctness, and CLI integration.
  • Deterministic transcript run matches expected structure and outcomes.
  • You can explain invertibility, block encoding, and failure modes clearly.