Project 5: Random Number Generator Suite

Implement multiple RNGs and validate them with statistical tests.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2-3 weeks
Language	C
Prerequisites	Math toolkit, basic stats
Key Topics	LCGs, tests, distributions

1. Learning Objectives

Implement several RNGs (LCG, xorshift, MWC).
Generate uniform and non-uniform distributions.
Apply statistical tests (chi-square, serial tests).
Compare empirical results to theory.

2. Theoretical Foundation

2.1 Core Concepts

Period and recurrence quality.
Uniformity and independence tests.
Distribution transforms (e.g., inverse CDF).

2.2 Why This Matters

TAOCP emphasizes rigorous evaluation of randomness. Weak RNGs corrupt simulations and algorithms.

2.3 Historical Context / Background

Knuth formalized statistical testing methods widely used today.

2.4 Common Misconceptions

“Looks random” is not evidence.
One test is not sufficient.

3. Project Specification

3.1 What You Will Build

A CLI tool that can generate sequences with different RNGs and run statistical tests.

3.2 Functional Requirements

Implement at least three RNG algorithms.
Support fixed seeds and reproducibility.
Run chi-square and serial correlation tests.
Output pass/fail thresholds.

3.3 Non-Functional Requirements

Correctness: Exact formula implementations.
Reproducibility: Seeded runs.
Usability: Clear output.

3.4 Example Usage / Output

$ ./rngtest --lcg --n 100000
chi-square: 0.98 (pass)
serial: 0.12 (pass)

3.5 Real World Outcome

You can evaluate RNGs and see which ones are suitable for simulations.

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ RNG engine   │────▶│ test suite   │────▶│ reports      │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component	Responsibility	Key Decisions
RNG core	Generate values	pluggable interface
Tests	Validate stats	chi-square + serial
Output	Report results	numeric summaries

4.3 Data Structures

typedef struct {
    uint64_t state;
} LcgRng;

4.4 Algorithm Overview

Key Algorithm: LCG

state = (a*state + c) mod m.
return state.

Complexity Analysis:

Time: O(1) per sample.
Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── src/rng.c
├── src/tests.c
├── include/
└── Makefile

5.3 The Core Question You’re Answering

“Is this sequence statistically random by rigorous tests?”

5.4 Concepts You Must Understand First

Probability distributions
Chi-square test basics
Correlation and independence

5.5 Questions to Guide Your Design

What sample size is sufficient for tests?
How will you compare different RNGs fairly?
How will you avoid modulo bias?

5.6 Thinking Exercise

Prove why a full-period LCG requires specific parameter choices.

5.7 The Interview Questions They’ll Ask

What is the period of an LCG?
Why are statistical tests needed?
How do you generate non-uniform distributions?

5.8 Hints in Layers

Hint 1: Implement LCG first. Hint 2: Add chi-square test. Hint 3: Add serial correlation test.

5.9 Books That Will Help

Topic	Book	Chapter
RNG theory	TAOCP Vol 2	Ch. 3
Statistics	Knuth references	RNG sections

5.10 Implementation Phases

Phase 1: Foundation (1 week)

Implement RNGs and seeding.

Phase 2: Core Functionality (1 week)

Implement statistical tests.

Phase 3: Polish & Edge Cases (3-4 days)

Add reports and comparisons.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
RNG set	LCG only vs multiple	multiple	comparison value

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
RNG correctness	Reproducibility	fixed seed outputs
Test accuracy	Known distributions	uniform buckets
Regression	Consistency	saved baselines

6.2 Critical Test Cases

Fixed seed yields same sequence.
Chi-square within expected range.
Serial correlation near zero.

6.3 Test Data

N=1e5 samples, 100 buckets

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Bad parameters	Short period	Use known good constants
Small samples	False pass/fail	Increase N
Modulo bias	Skewed buckets	Use rejection sampling

7.2 Debugging Strategies

Plot histograms to sanity check.
Compare with reference RNG output.

7.3 Performance Traps

Large tests can be slow; provide fast mode.

8. Extensions & Challenges

8.1 Beginner Extensions

Add xorshift RNG.

8.2 Intermediate Extensions

Add Gaussian generation (Box-Muller).

8.3 Advanced Extensions

Implement test batteries (Dieharder-lite).

9. Real-World Connections

9.1 Industry Applications

Simulations, randomized algorithms, cryptography.

rngtest, dieharder.

9.3 Interview Relevance

Demonstrates statistical reasoning in algorithms.

10. Resources

10.1 Essential Reading

TAOCP Vol 2 Chapter 3.

10.2 Video Resources

Search: “LCG and chi-square test”.

10.3 Tools & Documentation

None beyond standard C tooling.

Arbitrary Precision Arithmetic

11. Self-Assessment Checklist

11.1 Understanding

I can explain RNG period and tests.
I can justify parameter choices.

11.2 Implementation

RNGs and tests run correctly.
Results are reproducible.

11.3 Growth

I can evaluate RNGs for applications.

12. Submission / Completion Criteria

Minimum: LCG + one statistical test. Full: Multiple RNGs + tests. Excellence: Distribution transforms and test battery.

This guide was generated from LEARN_TAOCP_DEEP_DIVE.md. For the complete learning path, see the parent directory README.