Project 8: ETS Cache and Session Service
An ETS-backed cache with TTL eviction and stats.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2 |
| Time Estimate | 10-15 hours |
| Main Programming Language | Erlang or Elixir |
| Alternative Programming Languages | Gleam |
| Coolness Level | Level 3 |
| Business Potential | Level 2 |
| Prerequisites | ETS + State |
| Key Topics | ETS, caching, TTL eviction |
1. Learning Objectives
By completing this project, you will:
- Design a BEAM process model that isolates failures.
- Apply OTP supervision strategies to real services.
- Validate correctness with deterministic test scenarios.
2. All Theory Needed (Per-Concept Breakdown)
ETS + State
Fundamentals ETS provides in-memory tables for fast access to shared data. Tables are created and owned by a process and are destroyed when the owner exits. ETS supports different table types (set, ordered_set, bag, duplicate_bag) and is optimized for key-based access. It is not a full database; it is a high-performance in-memory store that must be used with care. Mnesia builds on these concepts to provide transactions and replication, but it adds complexity.
Deep Dive into the concept ETS is central to many BEAM systems because it provides fast shared access without requiring explicit locks. The owner process defines lifecycle, while other processes can read or write depending on access settings. This enables cache-like behavior and registry-like lookups.
The key design challenge is lifecycle management. If the owner process crashes, the table disappears. For ephemeral caches, this might be acceptable. For critical state, you must add persistence or replication. ETS itself does not provide durability.
Querying ETS requires discipline. While match and select operations exist, they can be expensive for large tables because they may require scanning. This means you should design keys and indexes for constant-time lookups whenever possible.
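The contrast can be sketched in a few lines (the table and key names here are illustrative, not part of the project spec): a keyed lookup on a `:set` table is constant-time, while a `select` with a match spec may walk the whole table.

```elixir
# Constant-time lookup by key: a :set table holds at most one object
# per key, so :ets.lookup/2 does not scan.
table = :ets.new(:demo_cache, [:set, :public])
:ets.insert(table, {:user_42, %{name: "Ada"}})
[{:user_42, user}] = :ets.lookup(table, :user_42)

# A select with a match spec may examine every object; fine for small
# tables, expensive for large ones.
spec = [{{:"$1", :"$2"}, [], [:"$1"]}]
keys = :ets.select(table, spec)
```

This is why cache keys should be designed so the hot path is always a direct `lookup`, with `select` reserved for rare maintenance tasks.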
Mnesia adds distributed transactions and replication but comes with operational complexity. It is powerful but not always necessary. For learning projects, ETS plus a journaled log is often sufficient and teaches you the same trade-offs.
The best practice is to wrap ETS behind a GenServer or dedicated API so you control access and prevent uncontrolled writes. This preserves the benefits of isolation while still allowing fast lookups.
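One possible shape for that wrapper, as a minimal sketch (module name, table name, and record layout are assumptions for illustration): writes are serialized through the GenServer that owns the table, while reads hit ETS directly.

```elixir
defmodule Cache do
  use GenServer

  @table :cache_table

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Writes go through the server process, so the table owner
  # controls all mutation.
  def put(key, value, ttl_ms) do
    GenServer.call(__MODULE__, {:put, key, value, ttl_ms})
  end

  # Reads bypass the server entirely: callers hit ETS directly,
  # checking the stored expiry so stale entries read as misses.
  def get(key) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now -> {:ok, value}
      _ -> :miss
    end
  end

  @impl true
  def init(_opts) do
    # The GenServer owns the table; if this process crashes, the
    # table (and the cached data) dies with it.
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, key, value, ttl_ms}, _from, state) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    :ets.insert(@table, {key, value, expires_at})
    {:reply, :ok, state}
  end
end
```

Usage: `Cache.start_link()`, then `Cache.put(:a, 1, 60_000)` and `Cache.get(:a)` return `{:ok, 1}` while the entry is fresh and `:miss` afterward.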
How this fits into the project This concept is essential to this project and appears repeatedly in BEAM systems.
Definitions & key terms
- ETS table: an in-memory key-value store created and owned by a single process.
- Owner process: the process that created the table; the table is destroyed when it exits.
- TTL (time to live): how long a cached entry remains valid before it is evicted.
- Eviction: removing expired or excess entries so memory use stays bounded.
Mental model diagram
[Input] -> [Process] -> [State] -> [Output]
How it works (step-by-step, with invariants and failure modes)
- Identify inputs and their constraints.
- Apply the core rules of the concept.
- Validate outputs and error states.
- Invariant: the system preserves isolation and deterministic behavior.
- Failure modes: overload, incorrect state transitions, missing supervision.
Minimal concrete example
Small example flow using messages and state updates.
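A hedged sketch of such a flow, assuming a bare process whose state is a single counter and a two-message protocol (`{:add, n}` and `{:get, caller}` are illustrative names):

```elixir
defmodule Counter do
  # Spawn a process whose entire state is one integer.
  def start(initial \\ 0), do: spawn(fn -> loop(initial) end)

  defp loop(state) do
    receive do
      # Input message -> state update; no shared mutation.
      {:add, n} ->
        loop(state + n)

      # Output: reply to the caller with the current state.
      {:get, caller} ->
        send(caller, {:value, state})
        loop(state)
    end
  end
end

pid = Counter.start()
send(pid, {:add, 5})
send(pid, {:get, self()})

receive do
  {:value, v} -> IO.puts("state = #{v}")  # prints "state = 5"
end
```

Because a process handles its mailbox in order, the `{:add, 5}` is applied before the `{:get, ...}` is answered, which is what makes the flow deterministic.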
Common misconceptions
- Confusing representation with runtime behavior.
- Assuming failures are rare instead of expected.
Check-your-understanding questions
- Explain the concept in your own words.
- Predict the outcome of a simple failure scenario.
- Why is this concept crucial for reliability?
Check-your-understanding answers
- It defines how the system behaves under concurrency.
- The supervisor should recover failed components.
- Without it, failure handling becomes ad hoc and unreliable.
Real-world applications
- High-concurrency services
- Fault-tolerant backends
- Real-time pipelines
Where you’ll apply it
- In this project’s core runtime loop and error handling.
References
- Official OTP and BEAM documentation for this concept.
Key insights This concept is the lever that makes BEAM systems resilient and scalable.
Summary Mastering this concept makes the project predictable and robust.
Homework/Exercises to practice the concept
- Draw a failure path and its recovery.
- Design a message flow for a small subsystem.
Solutions to the homework/exercises
- The failure should trigger a supervisor restart.
- The flow should isolate state and avoid shared mutation.
3. Project Specification
3.1 What You Will Build
Build a focused service with clear inputs, outputs, and failure behavior. It should be observable and testable, and should demonstrate the core BEAM concept for the project.
3.2 Functional Requirements
- Validated Input: Reject malformed or out-of-range values.
- Deterministic Output: Same input always yields the same output.
- Fault Behavior: Defined recovery path when a worker crashes.
3.3 Non-Functional Requirements
- Performance: Must handle expected input rate without unbounded queues.
- Reliability: Must recover from injected failures.
- Usability: Outputs are explicit and reproducible.
3.4 Example Usage / Output
$ run-project --demo
[expected output]
3.5 Data Formats / Schemas / Protocols
- Inputs: CLI arguments or small config file
- Outputs: structured logs + CLI status
3.6 Edge Cases
- Empty input
- Over-limit bursts
- Process crashes mid-operation
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
- Build: mix compile (Elixir) or rebar3 compile (Erlang)
- Run: ./bin/project --demo
3.7.2 Golden Path Demo (Deterministic)
A known input produces a known, testable output.
3.7.3 If CLI: exact terminal transcript
$ ./bin/project --demo
[result line 1]
[result line 2]
4. Solution Architecture
4.1 High-Level Design
[Client] -> [Router] -> [Worker] -> [State]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Router | Dispatch requests | Choose routing strategy |
| Worker | Handle operations | Isolation and failure handling |
| Storage | Maintain state | ETS vs process state |
4.3 Data Structures (No Full Code)
- Process state maps
- ETS tables for shared data
- Message structs with tagged fields
4.4 Algorithm Overview
Key Algorithm: Core Service Loop
- Parse input into a message.
- Dispatch to target process.
- Update state and emit output.
Complexity Analysis:
- Time: O(n) in number of messages
- Space: O(k) in number of active keys/processes
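The three steps of the core loop can be sketched as pure functions, which keeps them trivially testable (the module name, command grammar, and map-based state are assumptions for illustration, not the required design):

```elixir
defmodule ServiceLoop do
  # Step 1: parse raw input into a tagged message; reject
  # malformed input before it reaches any stateful process.
  def parse("put " <> rest) do
    case String.split(rest, " ", parts: 2) do
      [key, value] -> {:ok, {:put, key, value}}
      _ -> {:error, :malformed}
    end
  end

  def parse("get " <> key), do: {:ok, {:get, key}}
  def parse(_), do: {:error, :malformed}

  # Steps 2-3: dispatch the message against the current state and
  # return {output, new_state}; O(1) work per message.
  def dispatch({:put, key, value}, state), do: {:ok, Map.put(state, key, value)}
  def dispatch({:get, key}, state), do: {Map.get(state, key, :miss), state}
end
```

In the real service the state map would live inside a worker process (or ETS), but separating parse and dispatch like this is what makes the O(n)-in-messages bound easy to reason about.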
5. Implementation Guide
5.1 Development Environment Setup
# Use standard OTP tooling (mix or rebar3)
5.2 Project Structure
project-root/
├── lib/
│ ├── router.ex
│ ├── worker.ex
│ └── storage.ex
├── test/
│ └── project_test.exs
└── README.md
5.3 The Core Question You’re Answering
“How do I make this service recover automatically without global failure?”
5.4 Concepts You Must Understand First
- Review the concepts above and ensure you can explain them clearly.
5.5 Questions to Guide Your Design
- How will you partition work across processes?
- Where is state stored and how is it protected?
- What happens when a worker crashes?
5.6 Thinking Exercise
Sketch the message flow and failure paths before coding.
5.7 The Interview Questions They’ll Ask
- “Why does this design scale better than shared-memory locks?”
- “How do you detect and recover from failures?”
- “How do you prevent mailbox buildup?”
5.8 Hints in Layers
Hint 1: Start with one worker process and a single message type.
Hint 2: Add supervision and confirm restart behavior.
Hint 3: Add routing and concurrency only after correctness.
Hint 4: Validate output with scripted test vectors.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Core BEAM | “Programming Erlang” | Processes chapter |
| OTP | “Designing for Scalability with Erlang/OTP” | Supervision chapter |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
- Build a single worker process
- Define message shapes
Phase 2: Core Functionality (4-8 hours)
- Add routing and state storage
- Validate correctness on test cases
Phase 3: Polish & Edge Cases (2-4 hours)
- Add failure injection tests
- Document recovery behavior
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| State location | process vs ETS | ETS | fast shared reads; this project is built around ETS |
| Supervision | one_for_one vs one_for_all | one_for_one | isolate failures |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate core logic | message parsing |
| Integration Tests | End-to-end flow | demo scenario |
| Failure Tests | Crash recovery | kill worker |
6.2 Critical Test Cases
- Normal path: request -> response
- Crash path: worker exit -> restart
- Overload: burst input -> bounded queue
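The crash path can be exercised directly against a one_for_one supervisor. A minimal sketch, assuming a worker registered under its module name (the module here is illustrative):

```elixir
defmodule CrashyWorker do
  use GenServer
  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  @impl true
  def init(_), do: {:ok, nil}
end

# one_for_one: only the crashed child is restarted.
children = [%{id: CrashyWorker, start: {CrashyWorker, :start_link, [nil]}}]
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(CrashyWorker)
Process.exit(old_pid, :kill)

# Give the supervisor a moment to restart the child, then confirm a
# fresh process has taken over the registered name.
Process.sleep(50)
new_pid = Process.whereis(CrashyWorker)
true = is_pid(new_pid) and new_pid != old_pid
```

The same pattern generalizes: kill the worker mid-operation, then assert both that the name is re-registered and that any ETS-backed state behaves as documented (survives or is rebuilt, depending on the design).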
6.3 Test Data
inputs: demo messages
expected: stable outputs
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Missing supervisor | app dies on crash | add supervisor |
| Oversized mailbox | latency spikes | split processes |
| Unbounded state | memory growth | add eviction |
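For the unbounded-state pitfall, a periodic sweep is one common eviction shape. A minimal sketch, assuming entries stored as `{key, value, expires_at}` tuples (the interval and module name are illustrative):

```elixir
defmodule Sweeper do
  use GenServer

  @interval_ms 1_000

  def start_link(table), do: GenServer.start_link(__MODULE__, table)

  @impl true
  def init(table) do
    schedule_sweep()
    {:ok, table}
  end

  @impl true
  def handle_info(:sweep, table) do
    now = System.monotonic_time(:millisecond)
    # Delete every {key, value, expires_at} whose deadline has passed.
    spec = [{{:_, :_, :"$1"}, [{:<, :"$1", now}], [true]}]
    :ets.select_delete(table, spec)
    schedule_sweep()
    {:noreply, table}
  end

  defp schedule_sweep, do: Process.send_after(self(), :sweep, @interval_ms)
end
```

`:ets.select_delete/2` removes matching objects in a single pass, so memory stays bounded without reads ever blocking on the sweeper.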
7.2 Debugging Strategies
- Use observer to inspect process counts and queues
- Add structured logs on message receive
7.3 Performance Traps
- Avoid heavy work in a single process
8. Extensions & Challenges
8.1 Beginner Extensions
- Add structured logging
- Add basic metrics
8.2 Intermediate Extensions
- Add distribution across two nodes
- Add persistence layer
8.3 Advanced Extensions
- Add rolling upgrades
- Add multi-region replication
9. Real-World Connections
9.1 Industry Applications
- Chat, presence, and realtime monitoring systems
9.2 Related Open Source Projects
- Phoenix Channels
- GenStage-based pipelines
9.3 Interview Relevance
- OTP supervision and process model questions
10. Resources
10.1 Essential Reading
- “Programming Erlang” by Joe Armstrong
- “Designing for Scalability with Erlang/OTP” by Cesarini/Thompson
10.2 Video Resources
- Conference talks on OTP supervision and BEAM concurrency