Project 5: Random Number Generator Suite

Implement multiple RNGs and validate them with statistical tests.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2-3 weeks
Language C
Prerequisites Math toolkit, basic stats
Key Topics LCGs, tests, distributions

1. Learning Objectives

  1. Implement several RNGs (LCG, xorshift, MWC).
  2. Generate uniform and non-uniform distributions.
  3. Apply statistical tests (chi-square, serial tests).
  4. Compare empirical results to theory.

2. Theoretical Foundation

2.1 Core Concepts

  • Period and recurrence quality.
  • Uniformity and independence tests.
  • Distribution transforms (e.g., inverse CDF).

2.2 Why This Matters

TAOCP emphasizes rigorous evaluation of randomness. Weak RNGs corrupt simulations and algorithms.

2.3 Historical Context / Background

Knuth formalized statistical testing methods widely used today.

2.4 Common Misconceptions

  • “Looks random” is not evidence.
  • One test is not sufficient.

3. Project Specification

3.1 What You Will Build

A CLI tool that can generate sequences with different RNGs and run statistical tests.

3.2 Functional Requirements

  1. Implement at least three RNG algorithms.
  2. Support fixed seeds and reproducibility.
  3. Run chi-square and serial correlation tests.
  4. Output pass/fail thresholds.

3.3 Non-Functional Requirements

  • Correctness: Exact formula implementations.
  • Reproducibility: Seeded runs.
  • Usability: Clear output.

3.4 Example Usage / Output

$ ./rngtest --lcg --n 100000
chi-square: 0.98 (pass)
serial: 0.12 (pass)

3.5 Real World Outcome

You can evaluate RNGs and see which ones are suitable for simulations.


4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ RNG engine   │────▶│ test suite   │────▶│ reports      │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component Responsibility Key Decisions
RNG core Generate values pluggable interface
Tests Validate stats chi-square + serial
Output Report results numeric summaries

4.3 Data Structures

typedef struct {
    uint64_t state;
} LcgRng;

4.4 Algorithm Overview

Key Algorithm: LCG

  1. state = (a*state + c) mod m.
  2. return state.

Complexity Analysis:

  • Time: O(1) per sample.
  • Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── src/rng.c
├── src/tests.c
├── include/
└── Makefile

5.3 The Core Question You’re Answering

“Is this sequence statistically random by rigorous tests?”

5.4 Concepts You Must Understand First

  • Probability distributions
  • Chi-square test basics
  • Correlation and independence

5.5 Questions to Guide Your Design

  1. What sample size is sufficient for tests?
  2. How will you compare different RNGs fairly?
  3. How will you avoid modulo bias?

5.6 Thinking Exercise

Prove why a full-period LCG requires specific parameter choices.

5.7 The Interview Questions They’ll Ask

  1. What is the period of an LCG?
  2. Why are statistical tests needed?
  3. How do you generate non-uniform distributions?

5.8 Hints in Layers

Hint 1: Implement LCG first. Hint 2: Add chi-square test. Hint 3: Add serial correlation test.

5.9 Books That Will Help

Topic Book Chapter
RNG theory TAOCP Vol 2 Ch. 3
Statistics Knuth references RNG sections

5.10 Implementation Phases

Phase 1: Foundation (1 week)

  • Implement RNGs and seeding.

Phase 2: Core Functionality (1 week)

  • Implement statistical tests.

Phase 3: Polish & Edge Cases (3-4 days)

  • Add reports and comparisons.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
RNG set LCG only vs multiple multiple comparison value

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
RNG correctness Reproducibility fixed seed outputs
Test accuracy Known distributions uniform buckets
Regression Consistency saved baselines

6.2 Critical Test Cases

  1. Fixed seed yields same sequence.
  2. Chi-square within expected range.
  3. Serial correlation near zero.

6.3 Test Data

N=1e5 samples, 100 buckets

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Bad parameters Short period Use known good constants
Small samples False pass/fail Increase N
Modulo bias Skewed buckets Use rejection sampling

7.2 Debugging Strategies

  • Plot histograms to sanity check.
  • Compare with reference RNG output.

7.3 Performance Traps

Large tests can be slow; provide fast mode.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add xorshift RNG.

8.2 Intermediate Extensions

  • Add Gaussian generation (Box-Muller).

8.3 Advanced Extensions

  • Implement test batteries (Dieharder-lite).

9. Real-World Connections

9.1 Industry Applications

  • Simulations, randomized algorithms, cryptography.
  • rngtest, dieharder.

9.3 Interview Relevance

  • Demonstrates statistical reasoning in algorithms.

10. Resources

10.1 Essential Reading

  • TAOCP Vol 2 Chapter 3.

10.2 Video Resources

  • Search: “LCG and chi-square test”.

10.3 Tools & Documentation

  • None beyond standard C tooling.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain RNG period and tests.
  • I can justify parameter choices.

11.2 Implementation

  • RNGs and tests run correctly.
  • Results are reproducible.

11.3 Growth

  • I can evaluate RNGs for applications.

12. Submission / Completion Criteria

Minimum: LCG + one statistical test. Full: Multiple RNGs + tests. Excellence: Distribution transforms and test battery.


This guide was generated from LEARN_TAOCP_DEEP_DIVE.md. For the complete learning path, see the parent directory README.