Project 5: Random Number Generator Suite
Implement multiple RNGs and validate them with statistical tests.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | C |
| Prerequisites | Math toolkit, basic stats |
| Key Topics | LCGs, tests, distributions |
1. Learning Objectives
- Implement several RNGs (LCG, xorshift, MWC).
- Generate uniform and non-uniform distributions.
- Apply statistical tests (chi-square, serial tests).
- Compare empirical results to theory.
2. Theoretical Foundation
2.1 Core Concepts
- Period and recurrence quality.
- Uniformity and independence tests.
- Distribution transforms (e.g., inverse CDF).
2.2 Why This Matters
TAOCP emphasizes rigorous evaluation of randomness. Weak RNGs corrupt simulations and algorithms.
2.3 Historical Context / Background
Knuth formalized statistical testing methods widely used today.
2.4 Common Misconceptions
- “Looks random” is not evidence.
- One test is not sufficient.
3. Project Specification
3.1 What You Will Build
A CLI tool that can generate sequences with different RNGs and run statistical tests.
3.2 Functional Requirements
- Implement at least three RNG algorithms.
- Support fixed seeds and reproducibility.
- Run chi-square and serial correlation tests.
- Output pass/fail thresholds.
3.3 Non-Functional Requirements
- Correctness: Exact formula implementations.
- Reproducibility: Seeded runs.
- Usability: Clear output.
3.4 Example Usage / Output
$ ./rngtest --lcg --n 100000
chi-square: 0.98 (pass)
serial: 0.12 (pass)
3.5 Real World Outcome
You can evaluate RNGs and see which ones are suitable for simulations.
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ RNG engine │────▶│ test suite │────▶│ reports │
└──────────────┘ └──────────────┘ └──────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| RNG core | Generate values | pluggable interface |
| Tests | Validate stats | chi-square + serial |
| Output | Report results | numeric summaries |
4.3 Data Structures
typedef struct {
uint64_t state;
} LcgRng;
4.4 Algorithm Overview
Key Algorithm: LCG
- state = (a*state + c) mod m.
- return state.
Complexity Analysis:
- Time: O(1) per sample.
- Space: O(1).
5. Implementation Guide
5.1 Development Environment Setup
gcc --version
5.2 Project Structure
project-root/
├── src/rng.c
├── src/tests.c
├── include/
└── Makefile
5.3 The Core Question You’re Answering
“Is this sequence statistically random by rigorous tests?”
5.4 Concepts You Must Understand First
- Probability distributions
- Chi-square test basics
- Correlation and independence
5.5 Questions to Guide Your Design
- What sample size is sufficient for tests?
- How will you compare different RNGs fairly?
- How will you avoid modulo bias?
5.6 Thinking Exercise
Prove why a full-period LCG requires specific parameter choices.
5.7 The Interview Questions They’ll Ask
- What is the period of an LCG?
- Why are statistical tests needed?
- How do you generate non-uniform distributions?
5.8 Hints in Layers
Hint 1: Implement LCG first. Hint 2: Add chi-square test. Hint 3: Add serial correlation test.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| RNG theory | TAOCP Vol 2 | Ch. 3 |
| Statistics | Knuth references | RNG sections |
5.10 Implementation Phases
Phase 1: Foundation (1 week)
- Implement RNGs and seeding.
Phase 2: Core Functionality (1 week)
- Implement statistical tests.
Phase 3: Polish & Edge Cases (3-4 days)
- Add reports and comparisons.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| RNG set | LCG only vs multiple | multiple | comparison value |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| RNG correctness | Reproducibility | fixed seed outputs |
| Test accuracy | Known distributions | uniform buckets |
| Regression | Consistency | saved baselines |
6.2 Critical Test Cases
- Fixed seed yields same sequence.
- Chi-square within expected range.
- Serial correlation near zero.
6.3 Test Data
N=1e5 samples, 100 buckets
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Bad parameters | Short period | Use known good constants |
| Small samples | False pass/fail | Increase N |
| Modulo bias | Skewed buckets | Use rejection sampling |
7.2 Debugging Strategies
- Plot histograms to sanity check.
- Compare with reference RNG output.
7.3 Performance Traps
Large tests can be slow; provide fast mode.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add xorshift RNG.
8.2 Intermediate Extensions
- Add Gaussian generation (Box-Muller).
8.3 Advanced Extensions
- Implement test batteries (Dieharder-lite).
9. Real-World Connections
9.1 Industry Applications
- Simulations, randomized algorithms, cryptography.
9.2 Related Open Source Projects
- rngtest, dieharder.
9.3 Interview Relevance
- Demonstrates statistical reasoning in algorithms.
10. Resources
10.1 Essential Reading
- TAOCP Vol 2 Chapter 3.
10.2 Video Resources
- Search: “LCG and chi-square test”.
10.3 Tools & Documentation
- None beyond standard C tooling.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain RNG period and tests.
- I can justify parameter choices.
11.2 Implementation
- RNGs and tests run correctly.
- Results are reproducible.
11.3 Growth
- I can evaluate RNGs for applications.
12. Submission / Completion Criteria
Minimum: LCG + one statistical test. Full: Multiple RNGs + tests. Excellence: Distribution transforms and test battery.
This guide was generated from LEARN_TAOCP_DEEP_DIVE.md. For the complete learning path, see the parent directory README.