Learn Phoenix Framework: From BEAM Fundamentals to Real-Time Mastery
Goal: Deeply understand Phoenix—not just how to use it, but how it works internally, why it’s so fast, what makes LiveView magical, and how it compares to other web frameworks. You’ll build everything from a raw TCP server to a full real-time application, understanding each layer.
Why Phoenix Matters
Phoenix isn’t just another web framework. It’s built on Elixir, which runs on the BEAM (Erlang Virtual Machine)—the same technology that powers WhatsApp (handling 2 million connections per server), Discord (handling 5 million concurrent users), and telecom systems with 99.9999999% uptime.
After completing these projects, you will:
- Understand the BEAM’s actor model and why it enables massive concurrency
- Know how Phoenix handles millions of WebSocket connections
- Understand how LiveView creates reactive UIs without JavaScript
- See how Plug middleware composes the request pipeline
- Master Ecto’s functional approach to database interactions
- Compare Phoenix to Rails, Django, Express, and Go frameworks
- Build real-time applications that scale horizontally
Core Concept Analysis
The Technology Stack
┌─────────────────────────────────────────────────────────────────────────┐
│ YOUR APPLICATION │
├─────────────────────────────────────────────────────────────────────────┤
│ PHOENIX FRAMEWORK │
│ (Router, Controllers, Views, Channels, LiveView) │
├─────────────────────────────────────────────────────────────────────────┤
│ PLUG │
│ (Composable middleware, Conn struct) │
├─────────────────────────────────────────────────────────────────────────┤
│ ECTO │
│ (Database toolkit: Repos, Schemas, Changesets, Query) │
├─────────────────────────────────────────────────────────────────────────┤
│ COWBOY HTTP SERVER │
│ (Erlang HTTP/1.1, HTTP/2, WebSocket server) │
├─────────────────────────────────────────────────────────────────────────┤
│ ELIXIR LANGUAGE │
│ (Functional, immutable, pattern matching, metaprogramming) │
├─────────────────────────────────────────────────────────────────────────┤
│ OTP (Open Telecom Platform) │
│ (GenServer, Supervisor, Application - behaviors for concurrency) │
├─────────────────────────────────────────────────────────────────────────┤
│ BEAM Virtual Machine │
│ (Lightweight processes, preemptive scheduling, fault tolerance) │
└─────────────────────────────────────────────────────────────────────────┘
The BEAM’s Secret Sauce: Lightweight Processes
Traditional Web Server (Node.js, Python, Ruby)
──────────────────────────────────────────────
┌───────────────────────────────────────────────────────────┐
│ Single OS Thread/Process │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Request 1 ───► Request 2 ───► Request 3 ───► ... │ │
│ │ (blocking or callback-based async) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Problem: One slow request can affect others │
│ Problem: Crash in one request can crash everything │
└───────────────────────────────────────────────────────────┘
BEAM Virtual Machine (Elixir/Phoenix)
─────────────────────────────────────
┌─────────────────────────────────────────────────────────────────────┐
│ BEAM VM (one scheduler per CPU core) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Scheduler 1 Scheduler 2 Scheduler 3 ... │ │
│ │ │ │ │ │ │
│ │ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ │ │
│ │ │ P P │ │ P P │ │ P P │ │ │
│ │ │ P P │ │ P P │ │ P P │ │ │
│ │ │ P P │ │ P P │ │ P P │ │ │
│ │ └───────┘ └───────┘ └───────┘ │ │
│ │ (Each P is a lightweight process, ~2KB memory) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ✓ Each request gets its own isolated process │
│ ✓ Processes are preemptively scheduled (no blocking) │
│ ✓ Crash in one process doesn't affect others │
│ ✓ Millions of processes can run concurrently │
└─────────────────────────────────────────────────────────────────────┘
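The guarantees in the box above (cheap processes, isolation, message passing) can be seen directly in a few lines of Elixir, runnable in iex. A minimal sketch:

```elixir
# Spawn a process with its own mailbox; processes communicate only via messages.
parent = self()

pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, {:pong, self()})
    end
  end)

send(pid, {:ping, parent})

receive do
  {:pong, ^pid} -> IO.puts("pong received")
after
  1_000 -> IO.puts("timeout")
end
```

The spawned process exits as soon as it replies; the parent is unaffected either way. That isolation is exactly what Phoenix relies on when it gives each request its own process.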
Phoenix Request Flow
HTTP Request
│
▼
┌─────────────┐
│ Endpoint │ ← First stop: static files, logging, session, CSRF
│ (Plug) │
└─────┬───────┘
│
▼
┌─────────────┐
│ Router │ ← Matches URL pattern, selects pipeline and controller
│ (Plug) │
└─────┬───────┘
│
▼
┌─────────────┐
│ Pipeline │ ← :browser adds session, flash, etc.
│ (Plugs) │ :api adds JSON parsing
└─────┬───────┘
│
▼
┌─────────────┐
│ Controller │ ← Handles business logic, calls context functions
│ (Plug) │
└─────┬───────┘
│
▼
┌─────────────┐
│ View │ ← Prepares data for rendering
│ │
└─────┬───────┘
│
▼
┌─────────────┐
│ Template │ ← Compiled EEx, generates HTML
│ (HEEx) │
└─────┬───────┘
│
▼
HTTP Response
LiveView: How It Works
Initial Page Load (HTTP)
────────────────────────
Browser ──HTTP GET──► Phoenix ──renders──► Full HTML Page
│
▼
Browser displays page
│
▼
JavaScript loads
│
▼
WebSocket connects
│
▼
┌────────────────┴────────────────┐
│ │
Phoenix spawns Browser ready
LiveView process for updates
(stateful, long-lived)
User Interaction (WebSocket)
────────────────────────────
Browser LiveView Process
│ │
│──── phx-click="increment" ────────────────────────►│
│ │
│ handle_event("increment")
│ │
│ Updates socket.assigns
│ │
│ Re-renders template
│ │
│ Computes DIFF (only changes)
│ │
│◄───────── {diff: [{0: "6"}]} ─────────────────────│
│ │
│ JavaScript patches DOM │
│ with minimal changes │
▼ │
Key Insight: Only the changed parts are sent!
- Template has static and dynamic parts
- Phoenix tracks which assigns changed
- Only sends new values for changed dynamics
How Phoenix Compares to Other Frameworks
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONCURRENCY MODEL COMPARISON │
├─────────────────┬───────────────────────────────────────────────────────────┤
│ Framework │ How it handles 10,000 concurrent connections │
├─────────────────┼───────────────────────────────────────────────────────────┤
│ │ │
│ Ruby on Rails │ Thread pool (typically 5-25 threads) │
│ │ Each request blocks a thread │
│ │ Need load balancer + many processes │
│ │ │
│ Django │ Similar to Rails (WSGI is synchronous) │
│ │ ASGI + async views help but add complexity │
│ │ │
│ Node.js/Express │ Single-threaded event loop │
│ │ Non-blocking I/O, callbacks/promises │
│ │ CPU-bound work blocks everything │
│ │ Need worker threads or cluster mode │
│ │ │
│ Go (net/http) │ Goroutines (lightweight, like BEAM processes) │
│ │ Excellent concurrency, but manual error handling │
│ │ No supervision trees │
│ │ │
│ Phoenix/Elixir │ 10,000 BEAM processes (one per connection) │
│ │ Preemptive scheduling, no blocking │
│ │ Supervision trees auto-restart failed processes │
│ │ Built for "let it crash" philosophy │
│ │ │
└─────────────────┴───────────────────────────────────────────────────────────┘
Feature Comparison
| Feature | Phoenix | Rails | Django | Express | Go (std lib) |
|---|---|---|---|---|---|
| Real-time (WebSockets) | Built-in Channels + LiveView | ActionCable (bolt-on) | Channels (bolt-on) | Socket.io (3rd party) | gorilla/websocket |
| Concurrency Model | Actor model (millions of processes) | Thread pool | Thread pool (async optional) | Event loop | Goroutines |
| Fault Tolerance | Supervisor trees (auto-restart) | External (systemd) | External | External | Manual |
| Hot Code Reload | Yes (BEAM hot code swapping) | No (dev-only code reload) | No (dev-only code reload) | No | No |
| Database | Ecto (functional) | ActiveRecord (ORM) | Django ORM | Choose your own | database/sql |
| Learning Curve | Steep (new paradigm) | Low | Low | Low | Medium |
| Community Size | Small but growing | Very large | Very large | Massive | Large |
| Performance | Excellent | Good | Good | Excellent | Excellent |
| Productivity | High (once learned) | Very High | Very High | Medium | Medium |
When to Choose Phoenix
Phoenix excels at:
- Real-time features (chat, live updates, notifications)
- High concurrency (many simultaneous connections)
- Fault-tolerant systems (financial, telecom)
- Long-running connections (IoT, streaming)
- Applications that need to scale horizontally
Consider alternatives when:
- Team has no functional programming experience and can’t invest in learning
- Hiring is a concern (smaller developer pool)
- You need extensive third-party library ecosystem
- Simple CRUD apps where Rails/Django productivity wins
Project List
Projects are ordered from foundational understanding to advanced implementations.
Project 1: Build a TCP Echo Server (Understanding BEAM Processes)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Concurrency / Networking
- Software or Tool: Elixir, :gen_tcp
- Main Book: “Elixir in Action” by Saša Jurić
What you’ll build: A TCP server that accepts multiple simultaneous connections, echoing back whatever clients send. Each connection is handled by its own lightweight BEAM process.
Why it teaches Phoenix: Before Phoenix, there’s Elixir. Before Elixir, there’s the BEAM. This project shows you the foundation—how lightweight processes work, how they communicate via messages, and why this model enables Phoenix’s performance.
Core challenges you’ll face:
- Spawning processes for each connection → maps to the actor model
- Message passing between processes → maps to how Phoenix Channels work
- Handling process crashes → maps to fault tolerance
- Using :gen_tcp → maps to understanding Cowboy’s foundation
Key Concepts:
- BEAM Processes: “Elixir in Action” Chapter 5 - Saša Jurić
- Message Passing: “Elixir in Action” Chapter 5
- :gen_tcp module: Erlang/Elixir documentation
- Process Linking: “Elixir in Action” Chapter 8
Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic Elixir syntax (pattern matching, functions, modules). Install Elixir.
Real world outcome:
# Terminal 1: Start your server
$ iex -S mix
iex> EchoServer.start(4000)
Listening on port 4000...
# Terminal 2: Connect with telnet
$ telnet localhost 4000
Connected to localhost.
Hello, BEAM!
Hello, BEAM! # Server echoes back
# Terminal 3: Another simultaneous connection
$ telnet localhost 4000
Connected to localhost.
Second connection!
Second connection!
# Both connections work independently, each in its own process!
Implementation Hints:
The core pattern for accepting connections:
1. Open a listening socket with :gen_tcp.listen/2
2. Accept a connection with :gen_tcp.accept/1
3. Spawn a new process to handle this connection
4. Go back to step 2 (accept loop)
For each client process:
1. Receive data with :gen_tcp.recv/2
2. Send it back with :gen_tcp.send/2
3. Loop until connection closes
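Those two loops translate almost directly into Elixir. A minimal sketch (the socket options, such as packet: :line, are one reasonable choice among several):

```elixir
defmodule EchoServer do
  # Accept loop: one process accepts; each client gets its own process.
  def start(port) do
    {:ok, listen} =
      :gen_tcp.listen(port, [:binary, packet: :line, active: false, reuseaddr: true])

    IO.puts("Listening on port #{port}...")
    accept_loop(listen)
  end

  defp accept_loop(listen) do
    {:ok, client} = :gen_tcp.accept(listen)
    # A crash in serve/1 kills only this spawned process, not the acceptor.
    spawn(fn -> serve(client) end)
    accept_loop(listen)
  end

  defp serve(client) do
    case :gen_tcp.recv(client, 0) do
      {:ok, data} ->
        :gen_tcp.send(client, data)
        serve(client)

      {:error, :closed} ->
        :ok
    end
  end
end
```

Start it with EchoServer.start(4000) in iex. Note that it blocks the shell; wrap it in spawn/1 or a Task if you want the prompt back.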
Key questions to answer:
- What happens when you spawn a function?
- How does receive block a process without blocking others?
- What happens if a client process crashes? Does the server crash?
- How many connections can you handle simultaneously?
Learning milestones:
- Single connection works → You understand :gen_tcp basics
- Multiple connections work simultaneously → You understand process spawning
- Server survives client crashes → You understand process isolation
- You can track active connections → You understand process communication
Project 2: Add Supervision Trees (Fault Tolerance)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: OTP / Fault Tolerance
- Software or Tool: Elixir, OTP Supervisor
- Main Book: “Elixir in Action” by Saša Jurić
What you’ll build: Wrap your TCP server in a supervision tree that automatically restarts failed components. Crash the acceptor? It restarts. Crash a connection handler? Only that connection dies.
Why it teaches Phoenix: Phoenix applications are OTP applications with supervision trees. Understanding Supervisors is essential for understanding how Phoenix stays resilient under load.
Core challenges you’ll face:
- Designing a supervision tree → maps to Phoenix’s application structure
- Choosing restart strategies → maps to one_for_one vs one_for_all
- GenServer behavior → maps to how Channels and PubSub work
- Application behavior → maps to how Phoenix apps start
Key Concepts:
- Supervisors: “Elixir in Action” Chapter 8 - Saša Jurić
- GenServer: “Elixir in Action” Chapter 6
- Application Behavior: “Elixir in Action” Chapter 9
- Restart Strategies: OTP documentation
Difficulty: Intermediate. Time estimate: Weekend to 1 week. Prerequisites: Project 1 (TCP server). Understanding of basic OTP concepts.
Real world outcome:
$ iex -S mix
iex> EchoServer.Application.start(:normal, [])
{:ok, #PID<0.150.0>}
# View the supervision tree
iex> :observer.start()
# Opens GUI showing:
# EchoServer.Application
# └── EchoServer.Supervisor
# ├── EchoServer.Acceptor (GenServer)
# └── EchoServer.ConnectionSupervisor (DynamicSupervisor)
# ├── Connection #PID<0.200.0>
# ├── Connection #PID<0.201.0>
# └── Connection #PID<0.202.0>
# Kill the acceptor - it restarts automatically!
iex> Process.exit(Process.whereis(EchoServer.Acceptor), :kill)
# Logs: Acceptor crashed, restarting...
# Server continues working!
Implementation Hints:
Supervision tree structure:
Application
│
└── Supervisor (one_for_one)
│
├── Acceptor (GenServer)
│ └── Accepts connections, spawns handlers
│
└── ConnectionSupervisor (DynamicSupervisor)
└── Dynamically supervises connection handlers
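The restart behavior can be tried in iex before writing any TCP code. In this sketch an Agent stands in for the real Acceptor, and all module names (Demo.*) are illustrative:

```elixir
children = [
  # Will supervise one handler process per connection in the real server.
  {DynamicSupervisor, name: Demo.ConnectionSupervisor, strategy: :one_for_one},
  # Stand-in for the acceptor: a named Agent we can kill on purpose.
  %{
    id: :acceptor,
    start: {Agent, :start_link, [fn -> :accepting end, [name: Demo.Acceptor]]}
  }
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one, name: Demo.Supervisor)

old_pid = Process.whereis(Demo.Acceptor)
Process.exit(old_pid, :kill)
Process.sleep(50)

# :one_for_one restarted just the acceptor, under a fresh pid.
new_pid = Process.whereis(Demo.Acceptor)
IO.inspect(new_pid != old_pid, label: "restarted")
```

Killing a connection handler under the DynamicSupervisor would, by contrast, affect only that one child.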
GenServer callback skeleton:
defmodule EchoServer.Acceptor do
  use GenServer

  def start_link(port) do
    GenServer.start_link(__MODULE__, port, name: __MODULE__)
  end

  @impl true
  def init(port) do
    # Open the listening socket
    {:ok, listen_socket} = :gen_tcp.listen(port, [...])
    # Send ourselves a message so init/1 returns without blocking the Supervisor
    send(self(), :accept)
    {:ok, %{socket: listen_socket}}
  end

  @impl true
  def handle_info(:accept, state) do
    # Accept one connection, hand it to a handler process,
    # then send(self(), :accept) again to loop
    {:noreply, state}
  end
end
Learning milestones:
- Supervisor starts child processes → You understand supervision basics
- Crashed process restarts automatically → You understand restart strategies
- Dynamic supervisor manages connections → You understand DynamicSupervisor
- :observer shows your tree → You can visualize OTP applications
Project 3: Build a Minimal Plug Application
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Web / HTTP
- Software or Tool: Plug, Cowboy
- Main Book: “Programming Phoenix 1.4” by Chris McCord
What you’ll build: A web application using only Plug and Cowboy—no Phoenix. You’ll see exactly what Phoenix does for you by doing it yourself: routing, parsing, rendering, and middleware.
Why it teaches Phoenix: Phoenix is built on Plug. Every Phoenix endpoint, router, and controller is a Plug. Understanding Plug means understanding Phoenix’s core abstraction.
Core challenges you’ll face:
- The Plug specification → maps to function and module plugs
- The Conn struct → maps to request/response data structure
- Plug pipelines → maps to Phoenix pipelines
- Plug.Router → maps to Phoenix.Router
Key Concepts:
- Plug Specification: Plug Documentation
- Conn Struct: Understanding request/response state
- Cowboy: Erlang HTTP server
- Pipelines: Composing transformations
Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic Elixir, understanding of HTTP. Projects 1-2 helpful but not required.
Real world outcome:
$ iex -S mix
iex> MiniWeb.Application.start(:normal, [])
Server running at http://localhost:4000
$ curl http://localhost:4000/
Welcome to MiniWeb!
$ curl http://localhost:4000/hello/world
Hello, world!
$ curl http://localhost:4000/users -H 'content-type: application/json' -d '{"name": "Alice"}'
Created user: Alice
$ curl http://localhost:4000/unknown
404 Not Found
Implementation Hints:
Minimal Plug module:
defmodule MiniWeb.Router do
  use Plug.Router

  plug Plug.Parsers, parsers: [:json], json_decoder: Jason
  plug :match
  plug :dispatch

  get "/" do
    send_resp(conn, 200, "Welcome to MiniWeb!")
  end

  get "/hello/:name" do
    send_resp(conn, 200, "Hello, #{name}!")
  end

  post "/users" do
    send_resp(conn, 201, "Created user: #{conn.params["name"]}")
  end

  match _ do
    send_resp(conn, 404, "Not Found")
  end
end
The Conn struct (simplified):
%Plug.Conn{
  host: "localhost",
  port: 4000,
  method: "GET",
  path_info: ["hello", "world"],
  params: %{"name" => "world"},
  req_headers: [...],
  resp_headers: [...],
  status: nil,      # Set by send_resp
  resp_body: nil,   # Set by send_resp
  assigns: %{},     # Your custom data
  ...
}
Key insight: A Plug is a function that takes a conn, transforms it, and returns a conn. That’s it!
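That shape can be played with in plain iex with no dependencies; below, a bare map stands in for %Plug.Conn{} (the real struct simply carries more fields):

```elixir
# Each "plug" here is a conn -> conn function; a pipeline is just composition.
conn = %{assigns: %{}, status: nil, resp_body: nil}

assign_user = fn conn ->
  put_in(conn, [:assigns, :current_user], "alice")
end

respond = fn conn ->
  %{conn | status: 200, resp_body: "Hello, #{conn.assigns.current_user}!"}
end

conn = conn |> assign_user.() |> respond.()
IO.puts(conn.resp_body)
# prints: Hello, alice!
```

Phoenix pipelines, endpoints, and controllers are all this same pattern with a richer conn.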
Learning milestones:
- Cowboy serves your Plug → You understand the HTTP server layer
- Routes match correctly → You understand Plug.Router
- Custom plugs transform requests → You understand middleware
- You parse JSON bodies → You understand plug pipelines
Project 4: Your First Phoenix Application
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Web Framework
- Software or Tool: Phoenix Framework
- Main Book: “Programming Phoenix 1.4” by Chris McCord
What you’ll build: A standard Phoenix CRUD application—but with deep understanding of every generated file and concept. You’ll trace a request through the entire stack.
Why it teaches Phoenix: Now that you understand BEAM processes, OTP, and Plug, you can appreciate what Phoenix provides. This project connects the dots between the foundations and the framework.
Core challenges you’ll face:
- Phoenix project structure → maps to where things live and why
- Contexts (Phoenix 1.3+) → maps to domain-driven design
- Ecto basics → maps to database interactions
- Templates (HEEx) → maps to HTML generation
Key Concepts:
- Phoenix Architecture: Phoenix Overview
- Contexts: “Programming Phoenix 1.4” Chapter 2
- Router & Pipelines: “Programming Phoenix 1.4” Chapter 2
- Controllers & Views: “Programming Phoenix 1.4” Chapter 3
Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Projects 1-3 (foundations), basic SQL knowledge.
Real world outcome:
$ mix phx.new blog
$ cd blog
$ mix ecto.create
$ mix phx.gen.html Content Post posts title:string body:text
$ mix ecto.migrate
$ mix phx.server
# Browser: http://localhost:4000/posts
# Full CRUD interface for blog posts!
# You understand:
# - Why files are organized this way
# - How the router dispatches to controllers
# - How Ecto persists data
# - How templates render HTML
# - How the whole request flows through the stack
Implementation Hints:
Phoenix project structure:
blog/
├── lib/
│ ├── blog/ # Business logic (contexts)
│ │ ├── content.ex # Content context
│ │ └── content/
│ │ └── post.ex # Post schema
│ ├── blog_web/ # Web interface
│ │ ├── controllers/
│ │ ├── components/ # (Phoenix 1.7+) or templates/
│ │ ├── router.ex
│ │ └── endpoint.ex
│ ├── blog.ex # Application module
│ └── blog_web.ex # Web module macros
├── config/ # Configuration
├── priv/
│ └── repo/
│ └── migrations/ # Database migrations
└── test/ # Tests
The request flow for GET /posts:
1. Endpoint receives HTTP request
2. Router matches "/posts" to PostController.index
3. Pipeline `:browser` applies session, flash, CSRF plugs
4. PostController.index calls Content.list_posts()
5. Context queries database via Ecto
6. Controller renders "index.html" with posts
7. Template generates HTML
8. Response sent to browser
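The router declaration behind steps 2-3 looks roughly like this (a fragment of lib/blog_web/router.ex; the phx.gen.html output asks you to add the resources line yourself):

```elixir
scope "/", BlogWeb do
  pipe_through :browser

  # Expands into GET/POST/PATCH/DELETE routes for PostController
  resources "/posts", PostController
end
```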
Learning milestones:
- Generated app runs → You understand project structure
- You can trace a request through the stack → You understand the flow
- You modify a context function → You understand the boundary
- You add a new route and controller → You understand the patterns
Project 5: Deep Dive into Ecto
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Database / Functional
- Software or Tool: Ecto, PostgreSQL
- Main Book: “Programming Ecto” by Darin Wilson
What you’ll build: A data-intensive application that uses Ecto’s advanced features: complex queries, associations, transactions, custom types, and understanding why Ecto is NOT an ORM.
Why it teaches Phoenix: Ecto is Elixir’s database toolkit. Unlike ActiveRecord or Django ORM, it’s explicitly functional—no hidden state, explicit changesets, composable queries. Understanding Ecto’s philosophy is essential for Phoenix development.
Core challenges you’ll face:
- Changesets → maps to validating and casting data
- Composable queries → maps to building queries piece by piece
- Associations → maps to has_many, belongs_to, many_to_many
- Transactions → maps to multi-step database operations
Key Concepts:
- Repos: “Programming Ecto” Chapter 2 - All DB operations go through Repo
- Schemas: “Programming Ecto” Chapter 3 - Mapping DB to Elixir
- Changesets: “Programming Ecto” Chapter 4 - Validating changes
- Query: “Programming Ecto” Chapter 5 - Composable queries
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Project 4 (basic Phoenix), SQL knowledge.
Real world outcome:
# Complex composable query
query =
  from p in Post,
    join: u in assoc(p, :user),
    left_join: c in assoc(p, :comments),
    where: p.published == true,
    where: p.inserted_at > ^one_week_ago,
    group_by: [p.id, u.id],
    order_by: [desc: p.inserted_at],
    select: %{title: p.title, author: u.name, comment_count: count(c.id)}
posts = Repo.all(query)
# Changeset with validations
def changeset(user, attrs) do
  user
  |> cast(attrs, [:email, :password, :name])
  |> validate_required([:email, :password])
  |> validate_format(:email, ~r/@/)
  |> validate_length(:password, min: 8)
  |> unique_constraint(:email)
  |> put_password_hash()
end
# Transaction for complex operations
Repo.transaction(fn ->
  with {:ok, user} <- Accounts.create_user(attrs),
       {:ok, _profile} <- Profiles.create_profile(user, profile_attrs),
       :ok <- Mailer.send_welcome_email(user) do
    user
  else
    {:error, reason} -> Repo.rollback(reason)
  end
end)
Implementation Hints:
Ecto is NOT an ORM - key differences:
| ActiveRecord (ORM) | Ecto (Functional Toolkit) |
|---|---|
| user.save (object tracks its own dirty state) | Repo.insert(changeset) (changeset is passed explicitly to the Repo) |
| user.posts.build(...) (implicit association magic) | build_assoc(user, :posts, ...) (explicit function call) |
| User.find(1) (model has class methods) | Repo.get(User, 1) (Repo handles all queries) |
| user.posts (lazy loading, the N+1 trap) | Repo.preload(user, :posts) (explicit preloading) |
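Because an Ecto query is just a data structure, filters compose as ordinary functions. A sketch (module and function names are illustrative; assumes a Post schema and Repo from your app):

```elixir
defmodule Blog.PostQueries do
  import Ecto.Query

  def published(query), do: where(query, [p], p.published == true)
  def since(query, time), do: where(query, [p], p.inserted_at > ^time)
  def newest_first(query), do: order_by(query, [p], desc: p.inserted_at)
end

# Call sites mix and match the pieces:
Post
|> Blog.PostQueries.published()
|> Blog.PostQueries.since(one_week_ago)
|> Blog.PostQueries.newest_first()
|> Repo.all()
```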
Learning milestones:
- You write composable queries → You understand Ecto.Query
- Changeset validates and transforms → You understand the pattern
- Associations load explicitly → You understand preloading
- Transaction handles failures → You understand Multi and rollbacks
Project 6: Phoenix Channels (Real-Time Communication)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir + JavaScript
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Real-Time / WebSockets
- Software or Tool: Phoenix Channels, JavaScript client
- Main Book: “Programming Phoenix 1.4” by Chris McCord
What you’ll build: A real-time chat application where messages appear instantly for all users without page refresh. You’ll understand Phoenix’s pub-sub system and how millions of connections are handled.
Why it teaches Phoenix: Channels showcase Phoenix’s killer feature: real-time at scale. Each WebSocket connection is a lightweight BEAM process. Phoenix PubSub distributes messages. This is why Discord uses Elixir.
Core challenges you’ll face:
- Socket lifecycle → maps to connect, join, handle_in, terminate
- Topics and rooms → maps to pub-sub patterns
- Presence → maps to tracking who’s online
- JavaScript client → maps to browser-side integration
Key Concepts:
- Channels: “Programming Phoenix 1.4” Chapter 11
- PubSub: Phoenix.PubSub documentation
- Presence: Phoenix.Presence documentation
- JavaScript Client: phoenix.js documentation
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Project 4 (Phoenix basics), JavaScript knowledge.
Real world outcome:
┌─────────────────────────────────────────────────────────────────┐
│ Phoenix Chat - Room: #general │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Alice: Hey everyone! 10:30 AM │
│ Bob: Hi Alice! How's it going? 10:31 AM │
│ Charlie: Great to see you both! 10:31 AM │
│ │
│ ───────────────────────────────────────────────────────────── │
│ Online: Alice, Bob, Charlie (3 users) │
│ ───────────────────────────────────────────────────────────── │
│ │
│ [Type a message...] [Send] │
└─────────────────────────────────────────────────────────────────┘
# Messages appear INSTANTLY for all connected users
# User list updates in real-time as people join/leave
# Each user is a separate BEAM process on the server
Implementation Hints:
Channel lifecycle:
Browser Phoenix
│ │
│── WebSocket connect ─────────────────────────►│
│ UserSocket.connect/3
│◄───────────────────── :ok ────────────────────│
│ │
│── channel.join("room:lobby") ────────────────►│
│ RoomChannel.join/3
│◄───────────────────── :ok ────────────────────│
│ │
│── channel.push("new_msg", {body: "Hi"}) ─────►│
│ RoomChannel.handle_in/3
│ broadcast!(socket, "new_msg", ...)
│◄─────────── broadcast to all in room ─────────│
│ │
Channel module structure:
defmodule MyAppWeb.RoomChannel do
  use MyAppWeb, :channel

  def join("room:" <> room_id, _params, socket) do
    # Called when a client joins this topic
    {:ok, assign(socket, :room_id, room_id)}
  end

  def handle_in("new_msg", %{"body" => body}, socket) do
    # Handle an incoming message from this client
    broadcast!(socket, "new_msg", %{body: body, user: socket.assigns.user})
    {:noreply, socket}
  end
end
Learning milestones:
- WebSocket connects → You understand Socket
- Messages broadcast to all → You understand pub-sub
- Presence tracks online users → You understand Presence
- Multiple rooms work → You understand topics
Project 7: Phoenix LiveView (Interactive UI without JavaScript)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir (with minimal JavaScript)
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Real-Time / UI
- Software or Tool: Phoenix LiveView
- Main Book: “Programming Phoenix LiveView” by Bruce Tate
What you’ll build: A fully interactive single-page application with search-as-you-type, form validation, sorting, pagination, and real-time updates—all without writing JavaScript. LiveView sends minimal DOM diffs over WebSocket.
Why it teaches Phoenix: LiveView is Phoenix’s most innovative feature. It combines the productivity of server-rendered apps with the interactivity of SPAs. Understanding how it tracks state and computes diffs teaches you the framework’s core philosophy.
Core challenges you’ll face:
- LiveView lifecycle → maps to mount, handle_event, render
- Socket assigns → maps to state management
- DOM patching → maps to how minimal updates work
- Live navigation → maps to SPA-like routing
Key Concepts:
- LiveView Lifecycle: Phoenix LiveView Introduction
- Assigns & Diff Tracking: How Phoenix LiveView Works
- Events & Bindings: LiveView documentation
- Live Components: Reusable LiveView pieces
Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Projects 4-6 (Phoenix and Channels). Understanding of HTML/CSS.
Real world outcome:
┌─────────────────────────────────────────────────────────────────┐
│ Product Catalog [🔍 Search...] │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Sort by: [Name ▼] [Price ▼] Filter: [All Categories] │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ 📱 │ │ 💻 │ │ 🎧 │ │ ⌚ │ │
│ │ iPhone │ │ MacBook │ │ AirPods │ │ Watch │ │
│ │ $999 │ │ $1,299 │ │ $249 │ │ $399 │ │
│ │ [Add 🛒] │ │ [Add 🛒] │ │ [Add 🛒] │ │ [Add 🛒] │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ Showing 1-4 of 24 products [◀ Prev] [1] [2] [3] [Next ▶] │
│ │
│ Cart: 3 items ($1,647) [Checkout] │
└─────────────────────────────────────────────────────────────────┘
# As you type in search: results filter instantly
# Click column header: sorts without page reload
# Add to cart: cart updates, counter changes
# ALL without writing JavaScript—just Elixir!
Implementation Hints:
LiveView module structure:
defmodule MyAppWeb.ProductLive.Index do
  use MyAppWeb, :live_view

  def mount(_params, _session, socket) do
    # Called on initial load AND on WebSocket connect
    products = Products.list_products()
    {:ok, assign(socket, products: products, cart: [])}
  end

  def handle_event("search", %{"query" => query}, socket) do
    # Called as the user types in the search box
    products = Products.search(query)
    {:noreply, assign(socket, products: products)}
  end

  def handle_event("add_to_cart", %{"id" => id}, socket) do
    product = Products.get_product!(id)
    cart = [product | socket.assigns.cart]
    {:noreply, assign(socket, cart: cart)}
  end

  def render(assigns) do
    ~H"""
    <form phx-change="search">
      <input type="text" name="query" placeholder="Search..." phx-debounce="300" />
    </form>
    <div class="products">
      <%= for product <- @products do %>
        <div class="product">
          <h3><%= product.name %></h3>
          <button phx-click="add_to_cart" phx-value-id={product.id}>
            Add to Cart
          </button>
        </div>
      <% end %>
    </div>
    <div class="cart">Items: <%= length(@cart) %></div>
    """
  end
end
How diff tracking works:
Template divides into static and dynamic parts:
~H"""
<div class="product"> ← static (sent once)
<h3><%= @product.name %></h3> ← dynamic (tracked)
<p>$<%= @product.price %></p> ← dynamic (tracked)
</div> ← static (sent once)
"""
When @product.price changes from 99 to 89:
- Server ONLY sends: {position_2: "89"}
- Client patches just that text node
- No full HTML re-render!
Learning milestones:
- Mount and render work → You understand the lifecycle
- Events update state → You understand handle_event
- You see minimal diff in DevTools → You understand optimization
- Live navigation works → You understand SPA-like behavior
Project 8: Authentication from Scratch
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Security / Web
- Software or Tool: Phoenix, Argon2, JWT (optional)
- Main Book: “Programming Phoenix 1.4” by Chris McCord
What you’ll build: A complete authentication system with registration, login, logout, password hashing, session management, and protected routes. You’ll understand both how phx.gen.auth works and build the key pieces manually.
Why it teaches Phoenix: Authentication touches many Phoenix concepts: plugs (for auth checks), contexts (for user management), Ecto (for password hashing), sessions, and LiveView integration. It’s a great integration project.
Core challenges you’ll face:
- Password hashing → maps to bcrypt/argon2, never store plaintext
- Session management → maps to cookies, tokens, security
- Auth plugs → maps to protecting routes
- LiveView auth → maps to socket assigns, on_mount
Key Concepts:
- Password Hashing: Comeonin/Argon2 libraries
- Plug.Session: Cookie-based sessions
- CSRF Protection: Phoenix built-in
- phx.gen.auth: Phoenix auth generator
Difficulty: Advanced. Time estimate: 1-2 weeks. Prerequisites: Projects 4-5 (Phoenix, Ecto), understanding of web security basics.
Real world outcome:
# Registration
POST /users/register
{email: "alice@example.com", password: "secret123"}
→ Creates user with hashed password
→ Sets session token
→ Redirects to dashboard
# Login
POST /users/login
{email: "alice@example.com", password: "secret123"}
→ Verifies password hash
→ Sets session token
→ Redirects to dashboard
# Protected route
GET /dashboard
→ Auth plug checks session
→ If valid: shows dashboard with current_user
→ If invalid: redirects to login
# LiveView protected page
→ on_mount hook assigns current_user to socket
→ If no user: redirect to login
Implementation Hints:
Auth plug for protecting routes:
defmodule MyAppWeb.Plugs.RequireAuth do
import Plug.Conn
import Phoenix.Controller
def init(opts), do: opts
def call(conn, _opts) do
if conn.assigns[:current_user] do
conn
else
conn
|> put_flash(:error, "You must log in to access this page")
|> redirect(to: "/login")
|> halt()
end
end
end
# In router:
pipeline :protected do
plug :fetch_current_user # a function plug you define, e.g. in MyAppWeb.UserAuth
plug MyAppWeb.Plugs.RequireAuth
end
Password hashing with Argon2:
# In changeset
def registration_changeset(user, attrs) do
user
|> cast(attrs, [:email, :password])
|> validate_required([:email, :password])
|> validate_length(:password, min: 8)
|> hash_password()
end
defp hash_password(changeset) do
case get_change(changeset, :password) do
nil -> changeset
password ->
put_change(changeset, :password_hash, Argon2.hash_pwd_salt(password))
end
end
Learning milestones:
- Passwords are hashed correctly → You understand security basics
- Sessions persist across requests → You understand session management
- Protected routes redirect → You understand auth plugs
- LiveView pages are protected → You understand on_mount hooks
Project 9: Build a REST API with JSON:API or GraphQL
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: API Design
- Software or Tool: Phoenix, Absinthe (for GraphQL), or JSON:API
- Main Book: “Craft GraphQL APIs in Elixir with Absinthe” by Bruce Williams
What you’ll build: A well-designed API (REST or GraphQL) with authentication, pagination, filtering, proper error handling, and documentation. You’ll understand Phoenix’s API capabilities.
Why it teaches Phoenix: Phoenix isn’t just for HTML. Its lightweight nature and Elixir’s pattern matching make it excellent for APIs. Understanding the :api pipeline versus :browser teaches important architectural decisions.
Core challenges you’ll face:
- API pipeline → maps to JSON rendering, no session
- Error handling → maps to fallback controllers
- GraphQL types → maps to Absinthe schema
- Pagination → maps to cursor vs offset pagination
Key Concepts:
- Phoenix JSON API: Phoenix documentation
- Absinthe GraphQL: “Craft GraphQL APIs in Elixir with Absinthe”
- API Authentication: Token-based auth
- Error Handling: FallbackController pattern
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Projects 4-5 (Phoenix, Ecto), understanding of REST/GraphQL concepts.
Real world outcome:
# REST API
$ curl -H "Authorization: Bearer <token>" \
http://localhost:4000/api/posts
{
"data": [
{"id": 1, "title": "Hello Phoenix", "author": "Alice"},
{"id": 2, "title": "Elixir Rocks", "author": "Bob"}
],
"meta": {"total": 42, "page": 1, "per_page": 10}
}
# GraphQL API
$ curl -X POST http://localhost:4000/graphql \
-H "Content-Type: application/json" \
-d '{"query": "{ posts { title author { name } } }"}'
{
"data": {
"posts": [
{"title": "Hello Phoenix", "author": {"name": "Alice"}},
{"title": "Elixir Rocks", "author": {"name": "Bob"}}
]
}
}
Implementation Hints:
API pipeline (JSON only: no session, no HTML rendering):
pipeline :api do
plug :accepts, ["json"]
plug MyAppWeb.Plugs.APIAuth
end
scope "/api", MyAppWeb.API do
pipe_through :api
resources "/posts", PostController, except: [:new, :edit]
end
FallbackController for error handling (enabled per controller with action_fallback MyAppWeb.FallbackController):
defmodule MyAppWeb.FallbackController do
use MyAppWeb, :controller
def call(conn, {:error, :not_found}) do
conn
|> put_status(:not_found)
|> json(%{error: "Not found"})
end
def call(conn, {:error, %Ecto.Changeset{} = changeset}) do
conn
|> put_status(:unprocessable_entity)
|> json(%{errors: format_errors(changeset)})
end
end
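The format_errors/1 helper above is left undefined; in a real app you would typically build it with Ecto.Changeset.traverse_errors/2. As a hedged, dependency-free sketch, here is a hand-rolled version that works directly on the keyword-list shape Ecto stores in changeset.errors, including %{count}-style interpolation:

```elixir
defmodule ErrorFormatter do
  # errors example: [password: {"should be at least %{count} character(s)", count: 8}]
  def format_errors(errors) do
    Enum.reduce(errors, %{}, fn {field, {msg, opts}}, acc ->
      # Interpolate %{key} placeholders from the error's options.
      msg =
        Enum.reduce(opts, msg, fn {key, value}, interpolated ->
          String.replace(interpolated, "%{#{key}}", to_string(value))
        end)

      Map.update(acc, field, [msg], &(&1 ++ [msg]))
    end)
  end
end

ErrorFormatter.format_errors(
  email: {"can't be blank", [validation: :required]},
  password: {"should be at least %{count} character(s)", [count: 8]}
)
# => %{email: ["can't be blank"], password: ["should be at least 8 character(s)"]}
```

The resulting map JSON-encodes cleanly into the {"errors": {...}} response body shown earlier.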
Learning milestones:
- API returns JSON → You understand the api pipeline
- Auth protects endpoints → You understand token auth
- Errors return proper status codes → You understand FallbackController
- GraphQL queries work → You understand Absinthe
Project 10: PubSub and Distributed Phoenix
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Systems
- Software or Tool: Phoenix.PubSub, libcluster
- Main Book: “Elixir in Action” by Saša Jurić
What you’ll build: A Phoenix application that runs on multiple nodes, with PubSub messages reaching all connected users across all nodes. You’ll see how Phoenix scales horizontally.
Why it teaches Phoenix: Phoenix’s ability to scale across multiple nodes—with Channels, LiveView, and PubSub working seamlessly—is why companies choose it for high-traffic applications. Understanding distribution is understanding Phoenix’s power.
Core challenges you’ll face:
- Connecting BEAM nodes → maps to libcluster, Erlang distribution
- PubSub across nodes → maps to Phoenix.PubSub and its PG2 adapter (Erlang process groups)
- Session sharing → maps to Redis, distributed cache
- Deployment → maps to releases, clustering in production
Key Concepts:
- Erlang Distribution: Node connections, cookies
- PubSub Adapters: PG2 (built-in), Redis
- libcluster: Auto-discovery of nodes
- Releases: mix release
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Projects 6-7 (Channels, LiveView), understanding of distributed systems.
Real world outcome:
# Start node 1
$ PORT=4000 iex --name node1@127.0.0.1 -S mix phx.server
# Start node 2
$ PORT=4001 iex --name node2@127.0.0.1 -S mix phx.server
# Nodes discover each other (libcluster)
[libcluster] Connected to node2@127.0.0.1
# User A connects to node1, User B connects to node2
# User A sends message in chat
# User B receives it INSTANTLY (PubSub across nodes)
# In iex on node1:
iex(node1@127.0.0.1)> Node.list()
[:node2@127.0.0.1]
iex(node1@127.0.0.1)> Phoenix.PubSub.broadcast(MyApp.PubSub, "room:lobby", {:msg, "Hi!"})
# Message received on BOTH nodes!
Implementation Hints:
libcluster configuration:
# config/runtime.exs
config :libcluster,
topologies: [
local: [
strategy: Cluster.Strategy.Gossip
]
]
PubSub across nodes:
Node 1 Node 2
┌─────────────────────┐ ┌─────────────────────┐
│ Phoenix.PubSub │ │ Phoenix.PubSub │
│ (local subscribers) │◄────────────►│ (local subscribers) │
│ │ pg2/Redis │ │
│ User A (WebSocket) │ │ User B (WebSocket) │
└─────────────────────┘ └─────────────────────┘
When User A sends message:
1. Channel broadcasts to local PubSub
2. PubSub adapter forwards to other nodes
3. Other nodes broadcast to their local subscribers
4. User B receives message
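The local half of this flow can be sketched with Elixir’s built-in Registry, whose duplicate-key mode acts as a topic subscription table; this is a single-node illustration only, and Phoenix.PubSub’s adapters add step 2, forwarding broadcasts to the other nodes’ local registries. The MySketch.PubSub name is just for this example.

```elixir
# Duplicate keys let many processes subscribe to the same topic.
{:ok, _} = Registry.start_link(keys: :duplicate, name: MySketch.PubSub)

subscribe = fn topic -> Registry.register(MySketch.PubSub, topic, []) end

broadcast = fn topic, msg ->
  Registry.dispatch(MySketch.PubSub, topic, fn entries ->
    for {pid, _value} <- entries, do: send(pid, msg)
  end)
end

subscribe.("room:lobby")
broadcast.("room:lobby", {:msg, "Hi!"})

receive do
  {:msg, text} -> IO.puts("got: #{text}")
after
  1000 -> IO.puts("no message")
end
```

Subscribers are plain processes (Channel or LiveView processes in Phoenix), so a broadcast is just a send/2 fan-out to every registered pid.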
Learning milestones:
- Two nodes connect → You understand Erlang distribution
- PubSub reaches both nodes → You understand distributed PubSub
- LiveView works across nodes → You understand horizontal scaling
- You can add/remove nodes dynamically → You understand clustering
Project 11: Background Jobs with Oban
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Background Processing
- Software or Tool: Oban
- Main Book: Oban documentation
What you’ll build: A robust background job system for sending emails, processing images, syncing data, and other tasks that shouldn’t block web requests. You’ll understand job queues, retries, and scheduling.
Why it teaches Phoenix: Real applications need background processing. Oban is the standard for Elixir, leveraging PostgreSQL for reliability. Understanding async patterns complements your synchronous Phoenix knowledge.
Core challenges you’ll face:
- Job definition → maps to workers and args
- Queues and concurrency → maps to controlling parallelism
- Retries and backoff → maps to handling failures
- Scheduling → maps to cron-like recurring jobs
Key Concepts:
- Oban Workers: Defining job behavior
- Queues: Controlling concurrency
- Pruning: Cleaning old jobs
- Telemetry: Monitoring job execution
Difficulty: Advanced Time estimate: 1 week Prerequisites: Project 4-5 (Phoenix, Ecto), PostgreSQL.
Real world outcome:
# Enqueue a job
%{user_id: 123, email: "welcome"}
|> MyApp.Workers.SendEmail.new()
|> Oban.insert()
# Job executes in background
# defmodule MyApp.Workers.SendEmail do
# use Oban.Worker, queue: :emails, max_attempts: 3
#
# def perform(%Oban.Job{args: %{"user_id" => user_id, "email" => email}}) do
# user = Accounts.get_user!(user_id)
# Mailer.send(user, email)
# end
# end
# Web UI shows:
# ┌────────────────────────────────────────────────────────────┐
# │ Oban Dashboard │
# ├────────────────────────────────────────────────────────────┤
# │ Queues: │
# │ emails: 5 available, 2 executing, 0 failed │
# │ images: 10 available, 5 executing, 1 retrying │
# │ sync: 0 available, 0 executing, 0 failed │
# │ │
# │ Recent Jobs: │
# │ SendEmail (user:123) - completed 5s ago │
# │ ProcessImage (file:abc) - executing... │
# │ SendEmail (user:124) - completed 10s ago │
# └────────────────────────────────────────────────────────────┘
Implementation Hints:
Oban worker with retry:
defmodule MyApp.Workers.ProcessImage do
use Oban.Worker,
queue: :images,
max_attempts: 5,
priority: 1
@impl Oban.Worker
def perform(%Oban.Job{args: %{"image_id" => id}}) do
image = Media.get_image!(id)
case ImageProcessor.resize(image) do
{:ok, resized} ->
Media.update_image(image, %{processed: true, url: resized.url})
:ok
{:error, reason} ->
# Returning error triggers retry with backoff
{:error, reason}
end
end
end
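Retry timing usually grows exponentially with some jitter so failing jobs do not retry in lockstep. The sketch below is a generic illustration of that idea, not Oban’s actual built-in backoff formula (which you can override via the backoff/1 callback):

```elixir
defmodule Backoff do
  # Seconds to wait before the given attempt (1-based):
  # exponential growth, random jitter, capped at a maximum.
  def snooze(attempt, base \\ 2, max \\ 3600) do
    jitter = :rand.uniform(10)
    min(trunc(:math.pow(base, attempt)) + jitter, max)
  end
end

for attempt <- 1..5 do
  IO.puts("attempt #{attempt}: retry in ~#{Backoff.snooze(attempt)}s")
end
```

Capping matters: without max, attempt 20 would already mean waiting days.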
Learning milestones:
- Jobs execute in background → You understand async processing
- Failed jobs retry → You understand error handling
- Queues control concurrency → You understand resource management
- Scheduled jobs run on time → You understand cron-like behavior
Project 12: Telemetry and Observability
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Monitoring / Observability
- Software or Tool: Telemetry, PromEx, LiveDashboard
- Main Book: “Elixir in Action” by Saša Jurić
What you’ll build: Full observability for your Phoenix app: request timing, database query metrics, custom business metrics, Prometheus integration, and live dashboards.
Why it teaches Phoenix: Phoenix is built on Telemetry—it emits events for everything. Understanding how to capture and act on these events is essential for running Phoenix in production.
Core challenges you’ll face:
- Telemetry events → maps to what Phoenix emits
- Handlers and metrics → maps to collecting data
- LiveDashboard → maps to built-in monitoring
- Prometheus/Grafana → maps to production monitoring
Key Concepts:
- :telemetry library: Event emission and handling
- Phoenix telemetry events: built-in [:phoenix, ...] event names
- PromEx: Prometheus metrics for Phoenix
- LiveDashboard: Built-in Phoenix dashboard
Difficulty: Advanced Time estimate: 1 week Prerequisites: Project 4 (Phoenix basics), understanding of metrics concepts.
Real world outcome:
LiveDashboard (http://localhost:4000/dashboard)
┌─────────────────────────────────────────────────────────────────┐
│ Phoenix LiveDashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Home │ Metrics │ Request Logger │ Applications │ Processes │
│ │
│ System: │
│ BEAM Memory: 234 MB │ Atoms: 12,453 │ Processes: 1,234 │
│ CPU: 15% │ Run Queue: 0 │
│ │
│ Phoenix: │
│ Requests/sec: 1,234 │ Avg Latency: 12ms │ 99p: 45ms │
│ WebSocket connections: 567 │
│ │
│ Ecto: │
│ Queries/sec: 2,345 │ Avg Time: 2ms │ Pool Size: 10 │
│ │
│ Custom Metrics: │
│ Signups today: 123 │ Orders: 45 │ Revenue: $12,345 │
└─────────────────────────────────────────────────────────────────┘
Implementation Hints:
Attaching to Phoenix telemetry events:
# In application.ex
def start(_type, _args) do
:telemetry.attach_many(
"my-app-handler",
[
[:phoenix, :endpoint, :stop],
[:my_app, :repo, :query],
[:my_app, :user, :signup]
],
&MyApp.Telemetry.handle_event/4,
nil
)
# ... start supervision tree
end
# Handler
defmodule MyApp.Telemetry do
require Logger
def handle_event([:phoenix, :endpoint, :stop], measurements, metadata, _config) do
# duration arrives in native time units; always convert before logging
duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
Logger.info("Request to #{metadata.conn.request_path} took #{duration_ms}ms")
end
end
Learning milestones:
- LiveDashboard shows metrics → You understand built-in observability
- Custom events emit → You understand Telemetry API
- Prometheus scrapes metrics → You understand external monitoring
- Grafana dashboard works → You have production observability
Project 13: Testing Phoenix Applications
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Testing / Quality
- Software or Tool: ExUnit, Mox, Wallaby
- Main Book: “Testing Elixir” by Andrea Leopardi
What you’ll build: A comprehensive test suite covering unit tests, integration tests, controller tests, LiveView tests, and end-to-end browser tests. You’ll understand Phoenix’s testing patterns.
Why it teaches Phoenix: Phoenix has excellent testing support built-in. Understanding how to test each layer—contexts, controllers, channels, LiveView—makes you a more effective Phoenix developer.
Core challenges you’ll face:
- ConnTest → maps to testing controllers
- DataCase → maps to testing with database
- ChannelCase → maps to testing channels
- LiveViewTest → maps to testing LiveView
Key Concepts:
- ExUnit: Elixir’s testing framework
- ConnTest: Phoenix controller testing
- Mox: Behavior-based mocking
- Sandbox: Database isolation
Difficulty: Intermediate Time estimate: 1 week Prerequisites: Projects 4-7 (Phoenix, Ecto, Channels, LiveView).
Real world outcome:
$ mix test
...............................................................
Finished in 2.3 seconds
42 tests, 0 failures
Randomized with seed 12345
# Test breakdown:
# - 15 context tests (pure business logic)
# - 10 controller tests (HTTP layer)
# - 8 LiveView tests (interactive UI)
# - 5 channel tests (real-time)
# - 4 integration tests (full stack)
Implementation Hints:
Testing a controller:
defmodule MyAppWeb.PostControllerTest do
use MyAppWeb.ConnCase
describe "index" do
test "lists all posts", %{conn: conn} do
_post = insert(:post, title: "Hello") # insert/2 comes from your test factory (e.g. ExMachina)
conn = get(conn, ~p"/posts")
assert html_response(conn, 200) =~ "Hello"
end
end
end
Testing LiveView:
defmodule MyAppWeb.CounterLiveTest do
use MyAppWeb.ConnCase
import Phoenix.LiveViewTest
test "increments counter", %{conn: conn} do
{:ok, view, _html} = live(conn, "/counter")
assert view |> element("span.count") |> render() =~ "0"
view |> element("button", "Increment") |> render_click()
assert view |> element("span.count") |> render() =~ "1"
end
end
Learning milestones:
- Context tests pass → You test business logic
- Controller tests pass → You test HTTP layer
- LiveView tests pass → You test interactive UIs
- Full test suite is fast → You understand async testing
Project 14: Deployment and Production Phoenix
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Docker, Bash
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: DevOps / Deployment
- Software or Tool: Mix releases, Docker, Fly.io or self-hosted
- Main Book: “Real-World Elixir Deployment” (various resources)
What you’ll build: A production-ready Phoenix deployment with releases, environment configuration, migrations, health checks, and clustering across multiple instances.
Why it teaches Phoenix: Development is only half the story. Understanding how to package Phoenix as a release, handle secrets, run migrations, and scale horizontally completes your knowledge.
Core challenges you’ll face:
- Releases → maps to packaging for production
- Runtime configuration → maps to config/runtime.exs
- Migrations in production → maps to Ecto.Migrator
- Clustering → maps to connecting nodes
Key Concepts:
- Mix Releases: mix release
- Runtime Config: config/runtime.exs
- Docker: Containerized deployment
- Health Checks: Ready/alive probes
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: All previous projects, Docker knowledge helpful.
Real world outcome:
# Dockerfile
FROM hexpm/elixir:1.15.0-erlang-26.0-alpine-3.18.0 AS build
# ... build release
FROM alpine:3.18.0
COPY --from=build /app/_build/prod/rel/my_app ./
CMD ["bin/my_app", "start"]
# Build and deploy
$ docker build -t my_app .
$ docker push my_registry/my_app
# On Fly.io
$ fly deploy
# Check clustering
$ fly ssh console
> Node.list()
[:"my_app@fdaa:0:1234::3"] # Connected to a peer!
# Run migrations
$ fly ssh console
> MyApp.Release.migrate()
Implementation Hints:
Release configuration:
# config/runtime.exs
import Config
if config_env() == :prod do
config :my_app, MyApp.Repo,
url: System.get_env("DATABASE_URL"),
pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
config :my_app, MyAppWeb.Endpoint,
url: [host: System.get_env("PHX_HOST")],
secret_key_base: System.get_env("SECRET_KEY_BASE")
end
Release module for migrations:
defmodule MyApp.Release do
def migrate do
for repo <- repos() do
{:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
end
end
defp repos, do: Application.fetch_env!(:my_app, :ecto_repos)
end
Learning milestones:
- Release builds successfully → You understand mix release
- Docker container runs → You understand containerization
- App starts in production → You understand runtime config
- Multiple instances cluster → You understand horizontal scaling
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. TCP Echo Server | Intermediate | Weekend | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 2. Supervision Trees | Intermediate | Weekend-1wk | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 3. Minimal Plug App | Intermediate | Weekend | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 4. First Phoenix App | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 5. Ecto Deep Dive | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 6. Phoenix Channels | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 7. Phoenix LiveView | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 8. Authentication | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 9. REST/GraphQL API | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 10. Distributed Phoenix | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 11. Background Jobs | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 12. Telemetry | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 13. Testing | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐ |
| 14. Deployment | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
Recommended Learning Path
If you’re new to Elixir and Phoenix:
- Learn Elixir basics (pattern matching, modules, recursion)
- Project 1: TCP Echo Server → Understand BEAM processes
- Project 2: Supervision Trees → Understand OTP
- Project 3: Minimal Plug → Understand the web layer
- Project 4: First Phoenix App → Connect the dots
- Projects 5-7 → Deep dive into Ecto, Channels, LiveView
- Continue with remaining projects
If you know another web framework (Rails, Django, Express):
- Quick Elixir syntax review
- Project 1-2 (Quick: understand the concurrency model)
- Project 4: First Phoenix App → See Phoenix conventions
- Project 7: LiveView → See what’s unique
- Project 10: Distributed Phoenix → See the scalability story
- Fill in gaps as needed
If you want to understand Phoenix internals:
- Projects 1-3 (essential: understand the layers)
- Read Phoenix source code (it’s very readable!)
- Project 6-7 (Channels, LiveView internals)
- Project 10 (Distribution)
- Consider “Build Your Own Web Framework in Elixir” book
Final Capstone Project: Real-Time Collaborative Application
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Full-Stack Phoenix
- Software or Tool: All Phoenix features
- Main Book: All referenced books
What you’ll build: A real-time collaborative document editor (like a mini Google Docs) where multiple users can edit the same document simultaneously, see each other’s cursors, and changes merge correctly.
This project integrates:
- BEAM Processes: Each document is a GenServer
- OTP: Supervision for document processes
- Plug: Request pipeline
- Phoenix: Web framework structure
- Ecto: Document persistence
- Channels: Real-time sync
- LiveView: Interactive UI
- PubSub: Multi-node support
- Presence: Who’s editing what
- Telemetry: Monitoring
- Deployment: Production clustering
Why this is the ultimate Phoenix project: It demonstrates everything Phoenix excels at—real-time, stateful, distributed, fault-tolerant. Products like Figma, Notion, and Google Docs face these exact challenges. Building even a simple version proves mastery.
Real world outcome:
┌─────────────────────────────────────────────────────────────────┐
│ Collaborative Editor - Document: "Team Notes" │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Online: 🟢 Alice (editing) 🟢 Bob (viewing) 🟢 Charlie │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ # Meeting Notes │ │
│ │ │ │
│ │ ## Decisions │ │
│ │ - Use Phoenix for the backend ← Alice's cursor │ │
│ │ - Deploy on Fly.io| │ │
│ │ ↑ Bob is typing here │ │
│ │ ## Action Items │ │
│ │ - [ ] Set up CI/CD │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ 💾 Auto-saved 2 seconds ago │
└─────────────────────────────────────────────────────────────────┘
Features:
- Real-time sync across all connected users
- Cursor positions shown for other users
- Conflict resolution for concurrent edits
- Works across multiple server nodes
- Survives server restarts (document state recovered)
- Presence shows who's online
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | TCP Echo Server (BEAM Processes) | Elixir |
| 2 | Supervision Trees (Fault Tolerance) | Elixir |
| 3 | Minimal Plug Application | Elixir |
| 4 | First Phoenix Application | Elixir |
| 5 | Deep Dive into Ecto | Elixir |
| 6 | Phoenix Channels (Real-Time) | Elixir + JavaScript |
| 7 | Phoenix LiveView (Interactive UI) | Elixir |
| 8 | Authentication from Scratch | Elixir |
| 9 | REST API with JSON:API or GraphQL | Elixir |
| 10 | PubSub and Distributed Phoenix | Elixir |
| 11 | Background Jobs with Oban | Elixir |
| 12 | Telemetry and Observability | Elixir |
| 13 | Testing Phoenix Applications | Elixir |
| 14 | Deployment and Production | Elixir + Docker |
| Final | Real-Time Collaborative Editor (Capstone) | Elixir |
Key Resources Referenced
Books
- “Elixir in Action” by Saša Jurić (2nd Edition)
- “Programming Phoenix 1.4” by Chris McCord, Bruce Tate, José Valim
- “Programming Phoenix LiveView” by Bruce Tate and Sophie DeBenedetto
- “Programming Ecto” by Darin Wilson and Eric Meadows-Jönsson
- “Craft GraphQL APIs in Elixir with Absinthe” by Bruce Williams
- “Build Your Own Web Framework in Elixir” by Aditya Iyengar
Online Resources
- Phoenix Framework
- Phoenix Overview
- How Phoenix LiveView Works
- Elixir and The BEAM Concurrency
- Phoenix LiveView Introduction
- Ecto Introduction
- Building a Web Framework from Scratch
- Elixir School
- Comparing Phoenix and Modern Web Frameworks
Elite BEAM Mastery Extension (Advanced to Elite)
Goal: Move from “advanced Phoenix developer” to “BEAM systems engineer” by mastering runtime internals, distributed failure handling, production observability, release mechanics, and architecture patterns that survive chaos. This extension adds deep concept chapters and elite projects that intentionally force hard tradeoffs: mailbox pressure, netsplits, GC pressure, coordination under partition, and runtime upgrades under load. You will build systems where each subsystem has measurable behavior, explicit failure modes, and clear invariants. By the end you will be able to design, benchmark, and operate multi-node real-time Phoenix systems with confidence.
Introduction
Phoenix mastery is not just about controllers, contexts, and LiveView components. The real advantage of Phoenix appears when you can reason at BEAM runtime level: scheduler fairness, memory ownership, mailbox growth, supervision strategy, process discovery, distributed consistency, and upgrade safety.
This extension focuses on that exact gap.
- In scope: BEAM internals, advanced OTP architecture, distributed resilience, performance engineering, release/runtime engineering, native interop, architecture patterns, real-time internals, reliability patterns, advanced testing, SaaS operations, and elite low-level build projects.
- Out of scope: beginner Phoenix setup, basic CRUD, and introductory Elixir syntax.
What you will build across new projects:
- Scheduler and mailbox diagnostics lab
- Stateful workflow engines with :gen_statem
- Partition-tolerant cluster services
- Load-tested and profiled LiveView systems
- Hot-upgrade simulation pipelines
- NIF/Port boundary prototypes
- CQRS + event-sourced, multi-tenant services
- Chaos-tested SaaS operational platform
- A full insane capstone with CRDT shared state and 50k+ concurrent connection test plan
Big-picture map:
BEAM Internals OTP Design Distribution Operations
┌────────────────────┐ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐
│ Scheduling/GC/MQ │→│ Supervision/SM │→│ Clustering/CRDT │→│ Tracing/Load/Rel │
└────────────────────┘ └────────────────┘ └──────────────────┘ └───────────────────┘
│ │ │ │
└──────────────────────┴────────────────────┴─────────────────────┘
Phoenix Product Architecture
How to Use This Guide
- Work through the existing projects (at least through Project 10) if you have not already; this extension assumes those basics.
- Read the Theory Primer chapters below before attempting Projects 15-27.
- For each project, define observability first: what metric, trace, log, and failure signal prove correctness.
- Keep a lab notebook for every project:
- target invariant,
- what you deliberately broke,
- how the system signaled failure,
- what fixed it.
- Do not skip post-project verification; elite systems work because behavior is measured, not assumed.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
- Elixir/OTP fluency: GenServer, Supervisor, DynamicSupervisor, Registry, Task, Ecto basics.
- Phoenix fluency: Endpoint, Channels, LiveView lifecycle, Telemetry hooks.
- Distributed systems basics: CAP tradeoffs, retry/backoff, idempotency semantics.
- Linux operations basics: process inspection, networking, container runtime, deploy pipelines.
Helpful But Not Required
- Erlang shell debugging commands (:observer, :sys, :erlang.trace)
- Rust basics (for Rustler experiments)
- k6 scripting for load generation
Self-Assessment Questions
- Can you explain why mailbox growth can crash latency before CPU saturation?
- Can you justify one supervision strategy over another for a production subsystem?
- Can you describe what exactly fails during a cluster netsplit and what must stay available?
- Can you reason about when to choose ETS vs process state vs database?
- Can you define a safe path for runtime config and hot upgrade rollback?
Development Environment Setup
Required Tools:
- Elixir 1.17+ (Erlang/OTP 27 requires Elixir 1.17 or later)
- Erlang/OTP 27+
- Phoenix 1.7+
- PostgreSQL 15+
- Docker and docker compose
- k6
Recommended Tools:
- :observer (GUI runtime view)
- :recon utilities
- Grafana + Prometheus + OpenTelemetry collector
Testing Your Setup:
$ elixir --version
$ mix phx.new --version
$ k6 version
$ elixir -e ':erlang.system_info(:schedulers_online) |> IO.inspect()'
Time Investment
- Projects 15-20: 2-3 weeks each
- Projects 21-26: 2-4 weeks each
- Project 27 capstone: 4-8 weeks
- Total extension: ~6-10 months part-time
Important Reality Check These projects are intentionally uncomfortable. You will trigger real failure modes: mailbox blowups, split-brain behavior, failed upgrade windows, and instrumentation overload. That is the point.
Big Picture / Mental Model
Incoming Load
│
▼
┌───────────────┐ ┌─────────────────┐ ┌────────────────────┐
│ Runtime Model │ -> │ OTP Architecture│ -> │ Distributed Behavior│
│ (sched/GC/MQ) │ │ (trees/states) │ │ (partition/recovery)│
└───────────────┘ └─────────────────┘ └────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────────────────────────────┐
│ Observability + Release Control + Testing + Chaos + SLO Feedback │
└────────────────────────────────────────────────────────────────────┘
│
▼
Reliable, measurable Phoenix system under real failure and load
Theory Primer
Concept Cluster 1: Deep BEAM Internals (Schedulers, Reductions, Memory, Mailboxes)
Fundamentals: The BEAM does not schedule OS threads per request. It schedules BEAM processes across scheduler threads using reductions as a unit of work, preempting long-running work to keep fairness. Each BEAM process has isolated heap and GC, so pauses are local, not global. Message passing copies data into recipient process heaps except reference-counted large binaries, which changes memory behavior under high-throughput systems. Mailboxes are per-process queues; if consumers cannot keep up, latency and memory grow even while CPU appears “fine.” Dirty schedulers are dedicated execution pools for operations that would otherwise block normal schedulers. Understanding these fundamentals is the difference between guessing performance and proving it.
Deep Dive: At elite scale, performance debugging begins with scheduler pressure and mailbox pressure, not just endpoint timing. Reductions approximate executed function calls and operations; the runtime gives each process a reduction budget before preemption. This creates fairness under mixed workloads, but fairness can degrade when you introduce long NIF calls, blocking ports, or single-process hotspots. Dirty CPU schedulers and dirty I/O schedulers exist to isolate heavy operations from normal schedulers, but misuse can still starve systems if dirty pools saturate and callers pile up.
Per-process heaps are one of BEAM’s strongest design decisions because they localize GC pauses. A process with bursty allocations can GC repeatedly without freezing unrelated sockets. However, this advantage creates a different class of bug: thousands of mostly-idle processes each hold modest state, creating significant aggregate memory. Another subtle issue is large binary handling. Binaries above internal thresholds are reference counted and shared; if a tiny sub-binary keeps a reference to a huge original binary, memory appears “leaked” even though references still exist. Streaming systems often encounter this when parsing partial payloads and storing slices in process state.
Mailboxes are operational truth. A healthy actor model depends on bounded queue growth or explicit backpressure. Selective receive can worsen queue scanning overhead because older unmatched messages remain and force repeated scans. This creates nonlinear latency. For real systems, you need explicit flow control: producer credits, mailbox size thresholds, admission control, or bounded demand protocols.
How this fits in projects:
- Project 15 establishes baseline scheduler/memory/mailbox observability.
- Projects 18 and 22 apply this to load and LiveView process behavior.
- Project 27 uses all of it under multi-node stress.
Definitions and key terms:
- Reduction: runtime accounting unit for scheduler preemption.
- Scheduler run queue: queue of runnable processes per scheduler.
- Dirty scheduler: scheduler pool for blocking/long native work.
- Mailbox: per-process message queue.
- Sub-binary retention: long-lived reference to a large underlying binary.
Mental model diagram:
Producer Processes ---> [Mailbox of Target PID] ---> handle_info/receive
│ │ │
│ ├─ grows unbounded? ├─ fast enough?
│ └─ selective receive scan └─ sends downstream
▼
Scheduler run queues balance runnable processes by reductions
How it works (step-by-step):
- Incoming work creates runnable processes.
- Scheduler assigns a process and executes until reduction budget is spent.
- Process yields/preempts; scheduler picks next runnable process.
- Incoming messages append to destination mailbox.
- Receiver pattern-matches mailbox entries; unmatched messages remain queued.
- GC runs per-process based on heap pressure.
Invariants and failure modes:
- Invariant: no single process should have unbounded mailbox growth.
- Invariant: blocking native work should not run on normal schedulers.
- Failure mode: selective receive over huge mailbox causes latency spikes.
- Failure mode: retained sub-binaries hold large memory unexpectedly.
Minimal concrete example:
Telemetry sample:
process_mailbox_len{pid="chat_room_42"} 18234
scheduler_run_queue{scheduler="3"} 211
vm_memory_binary_bytes 1.8GB
Interpretation: actor backlog + binary retention, not just HTTP latency.
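A hedged sketch of how those three signals could be sampled with plain VM introspection (no telemetry library assumed; the metric names in the sample above are illustrative):

```elixir
snapshot = %{
  # runnable processes waiting across all normal run queues
  run_queue: :erlang.statistics(:total_run_queue_lengths),
  # bytes currently allocated for binaries, including refc binaries
  binary_bytes: :erlang.memory(:binary),
  # mailbox length of one process of interest (self() here)
  mailbox_len: Process.info(self(), :message_queue_len) |> elem(1)
}

IO.inspect(snapshot)
```

Sampling these together every second is enough to correlate backlog, binary retention, and scheduler pressure during an incident.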
Common misconceptions:
- “BEAM preemption means no process can hurt latency.” False; mailbox and hot actors still hurt.
- “GC is automatic so memory tuning is unnecessary.” False; ownership and data shape still matter.
Check-your-understanding questions:
- Why can latency spike while CPU remains moderate?
- Why can a 50-byte slice keep a 5 MB binary alive?
- Why is selective receive risky in hot paths?
Check-your-understanding answers:
- Queueing delay and mailbox scans dominate before CPU maxes.
- Sub-binary references original binary allocation.
- Unmatched messages force repeated mailbox scans.
Real-world applications:
- Chat systems, collaborative editors, telemetry fanout, event ingestion.
Where you will apply it:
- Projects 15, 18, 22, 27.
References:
- https://www.erlang.org/doc/system/eff_guide_processes.html
- https://www.erlang.org/doc/system/eff_guide_binaryhandling.html
- https://www.erlang.org/doc/apps/erts/erlang.html#system_info-1
Key insight: Runtime fairness is necessary but not sufficient; queue shape and memory ownership decide real latency.
Summary: Measure scheduler pressure, mailbox pressure, and binary retention together or you will miss root cause.
Homework/exercises:
- Build a mailbox stress harness and chart mailbox_len over time.
- Reproduce sub-binary memory retention and then fix it with explicit copy boundary.
Solutions:
- Use controlled producer rates and telemetry snapshots every second.
- Force binary copy at ownership boundary, then compare binary memory metrics.
Concept Cluster 2: Advanced OTP Architecture (State Machines, Supervision, Discovery, ETS, Mnesia)
Fundamentals:
OTP behaviors encode architecture decisions in reusable contracts. GenServer is one tool, not the only tool. :gen_statem is often better when workflow has explicit states, transition rules, and timers. Supervision trees express failure boundaries and restart policies. Registries and discovery decide how processes are located in single-node and multi-node environments. ETS provides in-memory tables with concurrency tuning and match specs. Mnesia provides distributed transactional storage with tradeoffs that must be explicit.
Deep Dive:
Systems become fragile when every service is a generic request-reply GenServer. Stateful workflows with temporal rules are cleaner with :gen_statem because state transitions become first-class and testable. You can model retries, timeouts, and compensation paths with explicit transition functions instead of scattered if/else branches. This also improves observability because every transition can emit telemetry events.
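As an illustration, an order workflow with pending/authorized/shipped states can be modeled as an explicit state machine. A minimal sketch (module, state, and event names are hypothetical):

```elixir
defmodule OrderFSM do
  @behaviour :gen_statem

  # Each state has its own handler function; invalid events fail to
  # match instead of hiding in scattered conditionals.
  def callback_mode, do: :state_functions

  def start_link(order_id),
    do: :gen_statem.start_link(__MODULE__, order_id, [])

  def init(order_id), do: {:ok, :pending, %{order_id: order_id}}

  # pending --payment_ok--> authorized
  def pending({:call, from}, :payment_ok, data),
    do: {:next_state, :authorized, data, [{:reply, from, :authorized}]}

  # authorized --warehouse_ack--> shipped
  def authorized({:call, from}, :warehouse_ack, data),
    do: {:next_state, :shipped, data, [{:reply, from, :shipped}]}

  # Terminal state rejects further commands explicitly.
  def shipped({:call, from}, event, _data),
    do: {:keep_state_and_data, [{:reply, from, {:error, {:invalid, event}}}]}
end
```

Each transition point is also a natural hook for emitting telemetry events.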
Supervision requires matching restart semantics to business semantics. :temporary children are right for one-off jobs that should never restart; :transient children restart only on abnormal exits; :permanent children always restart. Restart intensity (max_restarts and max_seconds) is safety against crash loops, but values must reflect subsystem criticality. A common anti-pattern is one huge supervisor where unrelated failures share restart budget. Partition trees by bounded context and failure domain.
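The restart semantics above map directly onto child specs. A sketch using a stand-in Noop worker (illustrative only; real children would be your own modules):

```elixir
# `Noop` exists purely so the tree can start; it holds no real state.
defmodule Noop do
  def start_link(_opts \\ []), do: Agent.start_link(fn -> :ok end)
end

children = [
  # always restart: core stateful service
  %{id: :session_store, start: {Noop, :start_link, [[]]}, restart: :permanent},
  # restart only on abnormal exit: workflow runner
  %{id: :order_fsm, start: {Noop, :start_link, [[]]}, restart: :transient},
  # never restart: one-off job
  %{id: :backfill, start: {Noop, :start_link, [[]]}, restart: :temporary}
]

{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    # crash-loop guardrail: more than 3 restarts within 5 seconds stops
    # this subtree and escalates the failure to the parent
    max_restarts: 3,
    max_seconds: 5
  )

IO.inspect(Supervisor.count_children(sup))
```

Keeping unrelated children under separate supervisors gives each failure domain its own restart budget.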
Discovery options evolve with scale. Local Registry is simple and fast for node-local routing. For cluster-wide process ownership, libraries like Horde use Delta CRDT state convergence, tolerating partitions with eventual reconciliation. This changes conflict behavior: two sides can temporarily believe they own a process; reconciliation must be deterministic.
ETS is often your first high-performance store. Table type (set, ordered_set, bag) plus read/write concurrency options determine behavior under load. Ownership is critical: when the owner process dies, the table is destroyed unless an ownership transfer strategy exists. Match specs let you query/filter inside the VM without copying full table contents. Rate limiting can be built with ETS counters and time buckets, but global limits need a cross-node strategy.
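A node-local rate limiter along those lines might look like this (table name and bucket size are assumptions; cross-node limits still need a separate strategy):

```elixir
defmodule LocalLimiter do
  @table :rate_buckets

  def init do
    # write_concurrency lets concurrent callers update distinct keys
    # without serializing on a single table lock
    :ets.new(@table, [:set, :public, :named_table, write_concurrency: true])
  end

  # Allow at most `limit` hits per key per 1-second bucket. `bucket`
  # is injectable so tests can pin the time window.
  def allow?(key, limit, bucket \\ System.system_time(:second)) do
    # update_counter/4 atomically increments, inserting the default
    # object {key_tuple, 0} on first sight of this key+bucket
    count = :ets.update_counter(@table, {key, bucket}, {2, 1}, {{key, bucket}, 0})
    count <= limit
  end
end
```

Usage: call LocalLimiter.init() once at boot, then LocalLimiter.allow?(user_id, 100) on each request.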
Mnesia can be excellent for BEAM-native metadata and replicated state, but it is not a drop-in replacement for external transactional databases. Network partitions and table replica strategy determine the consistency/availability tradeoff. Use it where BEAM-local integration and operational simplicity matter more than broad ecosystem tooling.
How this fits in projects:
- Project 16 builds explicit state and supervision control plane.
- Project 17 extends discovery and distributed ownership.
- Projects 21 and 25 use ETS/Mnesia patterns for product systems.
Definitions and key terms:
- :gen_statem: OTP behavior for event-driven finite state machines.
- Restart intensity: supervisor crash-loop guardrails.
- Registry: name-to-process mapping.
- ETS ownership: process that owns lifecycle of table.
- Replica type: how Mnesia stores/replicates table copies.
Mental model diagram:
Supervision Tree
│
┌─────────────┴─────────────┐
│ │
Workflow FSM Data Plane
(:gen_statem) (Registry + ETS + Mnesia)
│ │
explicit states fast lookup / replicated metadata
How it works:
- Supervisor starts FSM + support services.
- FSM handles events and transitions state.
- Registry maps business IDs to process IDs.
- ETS stores hot-path counters/cache.
- Optional Mnesia stores replicated control metadata.
Failure modes:
- Wrong restart type causes dangerous auto-retry loops.
- ETS table disappears on owner crash without handoff strategy.
- Split brain in discovery causes duplicate workers.
Minimal concrete example:
State transition log:
order_fsm order=123 pending -> authorized (event=payment_ok)
order_fsm order=123 authorized -> shipped (event=warehouse_ack)
Common misconceptions:
- “GenServer is always enough.” No, explicit state models reduce hidden complexity.
- “ETS is global and safe by default.” No, ownership and table options matter.
Check-your-understanding questions:
- When should you choose :transient child restart?
- Why might ordered_set hurt write throughput?
- What failure happens if ETS owner dies?
Check-your-understanding answers:
- When normal exits should not restart but crashes should.
- Maintaining key order adds overhead to inserts and updates.
- Table is removed unless transferred/managed differently.
Real-world applications:
- Order workflow engines, session routing, local caches, distributed service registries.
Where you will apply it:
- Projects 16, 17, 21, 25.
References:
- https://www.erlang.org/doc/system/statem
- https://www.erlang.org/doc/system/sup_princ.html
- https://hexdocs.pm/elixir/Registry.html
- https://www.erlang.org/doc/apps/stdlib/ets.html
- https://www.erlang.org/doc/apps/mnesia/mnesia_chap1.html
Key insight: OTP architecture quality comes from explicit failure boundaries and state transitions, not just process count.
Summary: Use the right OTP primitive for the problem; architecture is a runtime contract.
Homework/exercises:
- Refactor a GenServer workflow to :gen_statem.
- Simulate ETS owner crash and implement recovery.
Solutions:
- Define states/events/timeouts first, then transition table.
- Move critical tables under controlled owner and test handoff.
Concept Cluster 3: Distributed BEAM Engineering (Netsplits, CRDT, Global Limits)
Fundamentals: A BEAM cluster is a distributed system, not a magic LAN. Network partitions, delayed messages, and node churn are normal states. You need explicit policy for availability vs consistency, process ownership, and global quotas. CRDT-based coordination (for example Delta CRDT approaches) can keep systems available with eventual convergence.
Deep Dive: Netsplits create two truths. During partition, each side can continue accepting writes unless you intentionally gate behavior. If your domain cannot tolerate divergent writes for a specific operation, you must enforce leader-based control or reject writes during uncertain membership. For lower-risk state (presence, ephemeral counters, soft coordination), eventual consistency with CRDTs can preserve availability and user experience.
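To see why CRDTs tolerate partitions, a minimal grow-only counter sketch (a toy, not a production CRDT library) shows the merge guarantees:

```elixir
defmodule GCounter do
  def new, do: %{}

  # Each node increments only its own slot.
  def inc(counter, node), do: Map.update(counter, node, 1, &(&1 + 1))

  def value(counter), do: counter |> Map.values() |> Enum.sum()

  # merge is commutative, associative, and idempotent: replicas
  # converge regardless of delivery order, which is what makes
  # heal-after-netsplit reconciliation deterministic.
  def merge(a, b), do: Map.merge(a, b, fn _node, va, vb -> max(va, vb) end)
end
```

The same merge discipline is what Delta CRDT based registries rely on, just with richer data types than a counter.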
Distributed process patterns include sharding (deterministic key-to-shard mapping), ownership registries, and partition-aware routers. Global registries simplify addressing but introduce conflict logic during partition and heal events. Always define what happens when ownership conflicts are detected after heal.
Global rate limiting is a good example of consistency tradeoff. Strict global limit requires strong coordination and can reduce availability/latency. Approximate distributed limit with periodic gossip can maintain higher availability but allows short burst inaccuracy. Choose intentionally and document user-facing semantics.
How this fits in projects:
- Project 17 simulates split brain and recovery.
- Project 22 exercises presence and multiplayer consistency.
- Project 27 combines CRDT shared state with global limit behavior.
Definitions and key terms:
- Netsplit: partition that breaks cluster connectivity.
- Split brain: simultaneous conflicting authority during partition.
- CRDT: convergent replicated data type with merge guarantees.
- Shard: partition of keyspace handled by specific process/node.
Mental model diagram:
Node A Cluster View X partition X Node B Cluster View
[A1 A2 A3] thinks leader=A1 [B1 B2] thinks leader=B1
│ │
└---- divergent writes possible until heal -----------┘
How it works:
- Nodes join and exchange membership.
- Router places keys by shard strategy.
- Partition occurs; each side makes local decisions.
- Heal happens; conflicting state reconciles (CRDT or policy).
Failure modes:
- Duplicate process ownership after heal.
- Double counting in global limit windows.
- Lost user intent if reconciliation drops updates.
Minimal concrete example:
limit policy:
strict_global=false
local_limit=100 req/s per node
sync_interval_ms=500
Common misconceptions:
- “Clustering means linear scalability automatically.” No, cross-node coordination costs dominate.
- “CRDT means no bugs.” No, semantics can still violate business expectations.
Check-your-understanding questions:
- When is eventual consistency acceptable for product behavior?
- Why can strict global limit hurt availability?
- What must be deterministic during conflict resolution?
Check-your-understanding answers:
- Presence/ephemeral signals where brief divergence is acceptable.
- It depends on coordination path that may be partitioned.
- Ownership and merge policy to avoid oscillation.
Real-world applications:
- Presence systems, collaborative tools, geo-distributed edge services.
Where you will apply it:
- Projects 17, 22, 27.
References:
- https://hexdocs.pm/horde/readme.html
- https://hexdocs.pm/delta_crdt_ex/readme.html
- https://hexdocs.pm/phoenix_pubsub/Phoenix.PubSub.html
- https://hexdocs.pm/phoenix_pubsub/Phoenix.Tracker.html
Key insight: Distributed success comes from explicit conflict policy, not from clustering alone.
Summary: Design for partition first, then optimize happy path.
Homework/exercises:
- Simulate a 30-second partition and record divergent state.
- Implement two rate-limit policies: strict and approximate.
Solutions:
- Use network controls to isolate nodes and compare state snapshots.
- Compare error bounds vs availability under packet loss.
Concept Cluster 4: Performance Engineering (Tracing, Telemetry, Profiling, Load)
Fundamentals:
Performance work starts with measurement hierarchy: user-facing SLOs, subsystem metrics, traces, then profiler output. BEAM gives powerful built-ins (:observer, tracing, profilers) and Phoenix emits telemetry events across request and LiveView paths.
Deep Dive:
Tracing is for causality; metrics are for trends. During incidents, capture narrow traces with bounded duration and clear hypotheses. Broad trace enablement can itself cause pressure. :observer gives runtime snapshots (processes, reductions, memory, ports), while targeted tracepoints answer “who is sending what to whom, and how often?” Profilers (:fprof, :eprof) explain where CPU time goes, but results are only valid under representative workload.
Load testing must model realistic traffic mixes: websocket fanout, burst joins/leaves, API writes, background jobs, and cache misses. For LiveView, process-per-socket means connection count directly maps to runtime process count and memory footprint. Tune heartbeat intervals, diff payload shape, and assign lifecycles.
Telemetry design requires cardinality discipline. High-cardinality labels (raw user_id, request path with IDs) destroy metric store utility. Prefer normalized route labels and sampled detail paths.
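A sketch of the normalization idea, assuming numeric path segments are the ID-like ones (real routers expose the route pattern directly, which is preferable when available):

```elixir
defmodule RouteLabel do
  # Replace ID-like segments with a placeholder so the number of
  # distinct label values stays bounded.
  def normalize(path) do
    path
    |> String.split("/", trim: true)
    |> Enum.map(fn seg ->
      if String.match?(seg, ~r/^\d+$/), do: ":id", else: seg
    end)
    |> then(&("/" <> Enum.join(&1, "/")))
  end
end
```

With this, a million users produce one metric series per route instead of one per user.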
How this fits in projects:
- Project 18 is dedicated performance lab.
- Project 22 validates real-time scaling.
- Project 27 establishes production SLO dashboards.
Definitions and key terms:
- SLO: measurable reliability/performance target.
- Cardinality: number of distinct label combinations for metric series.
- Profiling: attributing runtime resource usage to functions/processes.
Mental model diagram:
SLO breach -> metric anomaly -> targeted trace -> profile hotspot -> fix -> repeat load test
How it works:
- Define SLO and budget (latency/error/resource).
- Instrument request, socket, queue, and job paths.
- Load test controlled scenarios.
- Trace/profile suspected hotspots.
- Apply change and re-run same scenario.
Failure modes:
- Instrumentation overload increases latency.
- Non-representative load gives false confidence.
- Optimizing p50 while p99 worsens.
Minimal concrete example:
slo_http_p99_ms <= 250
slo_ws_broadcast_p99_ms <= 120
slo_error_rate <= 0.5%
Common misconceptions:
- “If average latency is good, system is good.” Tail latency drives outages.
- “More metrics always help.” No, low-signal high-cardinality metrics hurt.
Check-your-understanding questions:
- Why is p99 often more important than average?
- What makes a trace unsafe in production?
- Why should load tests include warmup and steady-state phases?
Check-your-understanding answers:
- User pain clusters in tail latency.
- Excessive trace scope can amplify load.
- Caches/JIT/system state differ before steady state.
Real-world applications:
- Capacity planning, incident response, regression prevention.
Where you will apply it:
- Projects 18, 22, 25, 27.
References:
- https://hexdocs.pm/telemetry/
- https://www.erlang.org/doc/apps/runtime_tools/observer_ug.html
- https://www.erlang.org/doc/apps/tools/eprof.html
- https://www.erlang.org/doc/apps/tools/fprof.html
- https://k6.io/docs/
Key insight: Performance wins come from disciplined measurement loops, not from isolated micro-optimizations.
Summary: Define SLOs, observe behavior, run reproducible load, then optimize.
Homework/exercises:
- Create a load scenario that isolates websocket fanout.
- Add one intentionally bad metric label and observe cardinality impact.
Solutions:
- Keep API writes constant; vary connection count and fanout size.
- Compare time-series count before/after adding unique user labels.
Concept Cluster 5: Release Engineering and Runtime Power (Hot Upgrade, Runtime Config, Mix vs Distillery)
Fundamentals: BEAM releases package runtime, app code, and upgrade metadata. Runtime configuration decides deployment portability and secret hygiene. Hot code upgrade is powerful but requires strict state transition strategy and rollback planning.
Deep Dive: Mix releases are the modern default for Elixir deployments. Distillery was historically dominant and shaped many operational practices, but modern teams should understand it mostly for legacy migration and release internals history. Runtime configuration should be done with environment-aware providers and explicit boot validation; compile-time secrets and environment assumptions are common production failures.
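Boot validation can be sketched as a fail-fast check run before application start (variable names are illustrative; a real app would invoke this from runtime.exs or a start hook):

```elixir
defmodule BootConfig do
  # Assumed required variables for illustration.
  @required ~w(DATABASE_URL SECRET_KEY_BASE)

  # Raise before serving traffic if any required variable is absent,
  # instead of failing lazily on first use.
  def validate!(env \\ System.get_env()) do
    missing = Enum.reject(@required, &Map.has_key?(env, &1))

    case missing do
      [] -> :ok
      _ -> raise "missing required env vars: #{Enum.join(missing, ", ")}"
    end
  end
end
```

Failing at boot keeps a misconfigured node out of the load balancer rather than letting it serve broken responses.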
Hot upgrade path requires migration strategy for long-lived process state. If process state struct changes between versions, you need explicit conversion logic and compatibility window. Upgrades in cluster require orchestration: canary node, health validation, then rolling wave. Rollback must be rehearsed, not theoretical.
How this fits in projects:
- Project 19 focuses on release internals and upgrade rehearsal.
- Project 25 applies blue/green and graceful shutdown patterns.
- Project 27 includes hot-upgrade simulation under active traffic.
Definitions and key terms:
- appup/relup: upgrade instructions for application/release transitions.
- Config provider: runtime mechanism to load config on boot.
- Rolling upgrade: incrementally replacing node versions.
Mental model diagram:
Build artifact -> validate config -> deploy canary -> rolling update -> observe -> finalize
└-> rollback path validated first
How it works:
- Build reproducible release artifact.
- Validate runtime config and secrets before boot.
- Upgrade one node with live traffic.
- Confirm health metrics and state migration.
- Continue rollout or rollback.
Failure modes:
- Incompatible state shape causes process crashes post-upgrade.
- Runtime secret missing causes partial node boot and hidden drift.
Minimal concrete example:
release_gate:
- config_schema_check: pass
- health_endpoint: pass
- migration_dry_run: pass
Common misconceptions:
- “Hot upgrade means zero operational risk.” Risk shifts to state-compatibility correctness.
- “Runtime config is just env vars.” It also includes validation, defaults, and boot policy.
Check-your-understanding questions:
- Why is rollback path part of release design, not an afterthought?
- What breaks when state structs change without migration?
- Why should config validation happen before app start?
Check-your-understanding answers:
- Fast rollback is reliability budget protection.
- Long-lived processes crash or misbehave on pattern mismatch.
- Fail-fast prevents serving broken traffic.
Real-world applications:
- Continuous delivery with low downtime and controlled risk.
Where you will apply it:
- Projects 19, 25, 27.
References:
- https://hexdocs.pm/mix/Mix.Tasks.Release.html
- https://hex.pm/packages/distillery
- https://www.erlang.org/docs/17/design_principles/release_handling
Key insight: Release engineering is system design under time pressure; rehearsed rollback is the true feature.
Summary: Treat upgrades as testable workflows with explicit compatibility and safety gates.
Homework/exercises:
- Write a preflight release checklist with fail-fast rules.
- Simulate failed migration and rollback in staging.
Solutions:
- Include schema, config, health, and drain checks.
- Rehearse rollback timing and data consistency validation.
Concept Cluster 6: Native Interop and Escape Hatches (Ports, NIFs, Rustler)
Fundamentals: When BEAM code is not enough for specific performance or integration constraints, you can cross runtime boundaries via Ports or NIFs. Ports isolate crashes in external OS processes. NIFs run native code in VM address space and therefore have stronger risk profile.
Deep Dive: Ports are safer by default: if the external process crashes, the BEAM survives and can restart the component via supervision. The cost is serialization/IPC overhead and the operational complexity of managing sidecar binaries. NIFs have low call overhead and direct native execution but can block schedulers or crash the VM if implemented unsafely. Dirty NIFs mitigate scheduler blocking for long native work, but memory and thread safety remain critical.
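A minimal Port sketch, using echo purely as a stand-in external binary; the key point is that the external exit arrives as an ordinary message rather than a VM crash:

```elixir
defmodule PortDemo do
  def run do
    # Locate the stand-in binary on PATH (assumed present).
    exe = :os.find_executable(~c"echo")

    # :binary delivers data as binaries; :exit_status delivers the
    # external exit code as a message to the owning process.
    port = Port.open({:spawn_executable, exe}, [:binary, :exit_status, args: ["hello"]])

    collect(port, "")
  end

  defp collect(port, acc) do
    receive do
      {^port, {:data, data}} -> collect(port, acc <> data)
      {^port, {:exit_status, status}} -> {status, String.trim(acc)}
    after
      5_000 -> {:timeout, acc}
    end
  end
end
```

If the external process were killed mid-run, the owner would simply receive a nonzero exit_status and a supervisor could restart the component.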
Rustler improves ergonomics and safety for many NIF scenarios, but it does not eliminate all risk. You still need strict time budgeting per call, clear ownership boundaries, and load tests that include native failure simulation.
How this fits in projects:
- Project 20 compares Port vs NIF under same workload.
- Project 26 includes custom libraries and runtime hooks.
Definitions and key terms:
- Port: external OS process communicating via stdio/socket.
- NIF: native function loaded into BEAM VM.
- Dirty NIF: NIF scheduled on dirty scheduler pools.
Mental model diagram:
BEAM Process -> [Port Protocol] -> External Binary (isolated crash)
BEAM Process -> [NIF Call] -----> Native code in VM (shared fate)
How it works:
- Define boundary contract and data schema.
- Implement both Port and NIF prototype.
- Benchmark latency/throughput and failure behavior.
- Choose safety/performance tradeoff with explicit policy.
Failure modes:
- NIF long execution stalls scheduler lanes.
- Native memory corruption crashes whole VM.
- Port protocol mismatch causes silent decode failures.
Minimal concrete example:
policy:
if native_call_ms > 1ms then use dirty_nif or port
if crash_isolation_required then prefer port
Common misconceptions:
- “NIF is always better performance.” End-to-end system performance can worsen via scheduler pressure.
- “Rust means no runtime risk.” Unsafe boundaries and protocol errors still exist.
Check-your-understanding questions:
- Why are ports often preferred for untrusted integrations?
- What type of work is a good dirty NIF candidate?
- How do you prove boundary correctness?
Check-your-understanding answers:
- Crash isolation and supervised restart.
- CPU-heavy bounded work that cannot be split on BEAM side.
- Contract tests + fuzz input + kill/restart drills.
Real-world applications:
- ML inference integration, codecs, crypto, hardware interfaces.
Where you will apply it:
- Projects 20, 26.
References:
- https://www.erlang.org/docs/25/tutorial/nif
- https://www.erlang.org/doc/apps/erts/erl_nif.html
- https://www.erlang.org/docs/22/tutorial/c_port
- https://github.com/rusterlium/rustler
Key insight: Escape hatches are architecture decisions, not just optimization tricks.
Summary: Prioritize isolation first, then optimize with bounded and measurable native paths.
Homework/exercises:
- Implement identical hash service as Port and NIF.
- Kill external port process during load and document recovery.
Solutions:
- Compare p99 and failure blast radius.
- Supervisor restart should restore service with bounded loss.
Concept Cluster 7: Architectural Patterns (Event Sourcing, CQRS, Multi-Tenant, Actor Domains)
Fundamentals: Architectural patterns align domain behavior with runtime primitives. Event sourcing stores facts (events) rather than mutable row snapshots. CQRS separates write consistency path from read optimization path. Multi-tenancy isolates data and performance across customer boundaries. Actor-based domain modeling maps bounded contexts to supervision trees and process ownership boundaries.
Deep Dive: Event sourcing gives auditability and temporal replay, but introduces projection complexity and eventual consistency between write and read models. Commands represent intent with invariants; events represent accepted facts. Projections convert event streams into query-optimized views, often asynchronously via worker pipelines (for example Oban-based projection jobs). This decouples write throughput from read shapes but requires versioned events and replay-safe handlers.
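A toy command/event/projection pipeline in plain data structures (no event store library assumed) makes the separation concrete:

```elixir
defmodule InvoiceAggregate do
  # Command handling validates intent and, on success, emits an
  # accepted fact (the event), versioned for future schema migration.
  def handle(%{cmd: :create_invoice, id: id, amount: amount}) when amount > 0 do
    {:ok, %{event: :invoice_created, v: 1, id: id, amount: amount}}
  end

  def handle(%{cmd: :create_invoice}), do: {:error, :invalid_amount}

  # Projection folds the event stream into a query-optimized read
  # model; replaying the same events always yields the same model.
  def project(events) do
    Enum.reduce(events, %{}, fn
      %{event: :invoice_created, id: id, amount: amount}, model ->
        Map.put(model, id, %{amount: amount, status: :open})
    end)
  end
end
```

In a real system the projection step runs asynchronously (for example as Oban jobs), which is exactly where projection lag comes from.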
CQRS is not mandatory for every feature. Use it where read/write asymmetry is significant or where write validation and read latency goals diverge strongly. Keep idempotency keys on write side to avoid duplicate command effects.
Multi-tenant architecture decisions include shared schema with tenant_id, isolated schema prefixes, or isolated databases. Isolation level must match regulatory and noisy-neighbor risk, not just convenience. Runtime tenant routing should be explicit in boundary layers; accidental cross-tenant query leakage is a top SaaS failure mode.
How this fits in projects:
- Project 21 focuses on event-sourced CQRS and tenant routing.
- Project 25 operationalizes SaaS governance patterns.
- Project 27 inherits projection and tenant correctness requirements.
Definitions and key terms:
- Command: request to change state.
- Event: immutable fact of accepted change.
- Projection: read model derived from event stream.
- Bounded context: domain boundary with explicit language and ownership.
Mental model diagram:
Command API -> Validate -> Append Event -> Async Projection -> Read Model API
│
└-> Audit stream / replay / compensation
How it works:
- Receive command with idempotency key.
- Validate invariants.
- Append event atomically.
- Project event to read models.
- Serve queries from optimized read side.
Failure modes:
- Projection lag hides recent writes.
- Event schema changes break old replay.
- Tenant boundary leaks via missing filters.
Minimal concrete example:
Command: CreateInvoice(tenant_id, invoice_id, amount)
Event: InvoiceCreated(v1, tenant_id, invoice_id, amount, timestamp)
Common misconceptions:
- “Event sourcing automatically solves all consistency issues.” It shifts where consistency is enforced.
- “CQRS always increases performance.” It can add complexity without payoff for simple domains.
Check-your-understanding questions:
- Why keep commands and events separate?
- When is projection lag acceptable?
- Which tenant isolation model fits regulated data?
Check-your-understanding answers:
- Intent and fact have different semantics and lifecycle.
- Analytics/read-heavy paths where slight staleness is tolerable.
- Dedicated schemas or databases, depending on compliance and risk.
Real-world applications:
- Billing, compliance-sensitive workflows, audit-heavy SaaS domains.
Where you will apply it:
- Projects 21, 25, 27.
References:
- https://martinfowler.com/bliki/CQRS.html
- https://martinfowler.com/eaaDev/EventSourcing.html
- https://hexdocs.pm/oban/Oban.html
Key insight: Pattern power comes from explicit tradeoffs: auditability and scalability in exchange for coordination complexity.
Summary: Use event and read/write separation where domain behavior justifies the cost.
Homework/exercises:
- Design event versioning strategy for one aggregate.
- Document tenant isolation threat model.
Solutions:
- Add schema version and migration replay tests.
- Enumerate query boundaries and enforce tenant-scoped access layer.
Concept Cluster 8: Real-Time System Engineering (LiveView Internals, Presence, Multiplayer Ticks)
Fundamentals: Phoenix real-time architecture combines process isolation, PubSub fanout, LiveView server-rendered diffs, and presence tracking. Each connected client maps to process lifecycle decisions that affect memory and latency.
Deep Dive: LiveView initial render is HTTP, then stateful websocket session. Server holds assigns; diff tracking sends only changed dynamic segments, reducing wire size but still requiring careful state shape control. Large assigns copied frequently can inflate memory and GC churn. Use streaming patterns and granular assigns for high-cardinality UIs.
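As a toy illustration of the diff principle (LiveView's real engine diffs compiled templates, not raw assign maps), only changed values cross the wire:

```elixir
defmodule DiffSketch do
  # Return only the keys whose values changed between renders; the
  # client patches those spots in the DOM and ignores the rest.
  def diff(old_assigns, new_assigns) do
    for {k, v} <- new_assigns, Map.get(old_assigns, k) != v, into: %{} do
      {k, v}
    end
  end
end
```

This is also why state shape matters: a huge value that changes on every event produces a huge diff, no matter how clever the tracking is.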
Presence is built on tracking and distributed metadata propagation. It is not a strict source of truth for billing/security decisions because eventual propagation and transient disconnects can create short inconsistencies. For multiplayer/state-synchronized apps, use an authoritative server model with deterministic tick loops and explicit conflict resolution. Client prediction can improve UX, but server authority must resolve final state.
How this fits in projects:
- Project 22 is dedicated real-time internals build.
- Projects 17 and 27 consume Presence/CRDT patterns in distributed contexts.
Definitions and key terms:
- LiveView diff: minimal patch payload from server state delta.
- Presence metadata: distributed user/session descriptors.
- Tick loop: fixed cadence state update cycle.
Mental model diagram:
HTTP render -> websocket connect -> LiveView PID per socket -> event handlers -> diff patches
-> PubSub/Presence fanout across nodes
How it works:
- Client loads initial page over HTTP.
- JS client establishes websocket.
- LiveView process handles events and updates assigns.
- Framework computes diff and patches DOM.
- Presence broadcasts joins/leaves metadata.
Failure modes:
- Large per-socket state causes memory pressure.
- Fanout storms create mailbox spikes.
- Tick drift causes multiplayer desync.
Minimal concrete example:
tick_rate_hz=20
authoritative_state_update_every=50ms
presence_staleness_budget_ms=2000
Common misconceptions:
- “LiveView means no frontend complexity.” Real-time state modeling is still complex.
- “Presence equals strict online truth.” It is eventually convergent tracking.
Check-your-understanding questions:
- Why does per-socket process design simplify isolation but increase memory sensitivity?
- Why is authoritative server model preferred for multiplayer fairness?
- What is the risk of huge assigns in LiveView?
Check-your-understanding answers:
- Failures are isolated, but each socket carries state overhead.
- Prevents conflicting client authority and cheating/desync.
- Diff and GC pressure increase latency and memory usage.
Real-world applications:
- Collaborative editing, operations dashboards, multiplayer games.
Where you will apply it:
- Projects 22, 27.
References:
- https://hexdocs.pm/phoenix_live_view/welcome.html
- https://hexdocs.pm/phoenix/Phoenix.Presence.html
- https://hexdocs.pm/phoenix_pubsub/Phoenix.Tracker.html
Key insight: Real-time scale depends on state shape and fanout discipline more than websocket count alone.
Summary: Treat every connection as a process with explicit memory and message budget.
Homework/exercises:
- Measure memory growth per 1k LiveView sockets.
- Implement deterministic tick reconciliation for one shared state.
Solutions:
- Collect memory snapshots during staged connection ramp.
- Use server sequence numbers and conflict resolution rules.
Concept Cluster 9: Reliability and Fault Tolerance (Chaos, Circuit Breakers, Idempotency)
Fundamentals: Fault tolerance is engineered by design and verified by failure injection. Circuit breakers, bulkheads, backoff strategies, and idempotency keys control blast radius and duplicate effects.
Deep Dive: Chaos engineering on BEAM should target realistic faults: random process kill, selective node isolation, DNS failures, storage latency spikes, and dropped PubSub propagation. The objective is to verify resilience hypotheses and recovery time objectives, not to randomly break production for entertainment.
Circuit breakers prevent cascading failure by cutting traffic to degraded dependencies after failure thresholds. Bulkheads isolate resource pools so one bad dependency does not starve unrelated features. Backoff with jitter avoids synchronized retries that create retry storms.
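Exponential backoff with full jitter can be sketched in a few lines (base/max defaults mirror the retry_policy example in this cluster):

```elixir
defmodule Backoff do
  # Full jitter: delay is uniform in [0, min(max, base * 2^attempt)],
  # so clients that failed together do not retry together.
  def delay_ms(attempt, base \\ 100, max \\ 5_000) do
    cap = min(max, base * Integer.pow(2, attempt))
    :rand.uniform(cap + 1) - 1
  end
end
```

Callers sleep for delay_ms(attempt) before retry attempt n, typically capped at a maximum attempt count.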
Exactly-once delivery in distributed systems is usually an illusion; practical systems achieve at-least-once delivery with idempotent handlers and deduplication keys. You design command handling so repeated message processing is safe and produces the same final state.
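A minimal idempotent handler sketch using ETS for the key store (a production system would add TTL expiry and durable storage, per the TTL caveat below):

```elixir
defmodule IdempotentHandler do
  def init, do: :ets.new(:idem_keys, [:set, :public, :named_table])

  # insert_new/2 is atomic: only the first caller with a given key
  # runs the side effect; redelivered duplicates are skipped safely.
  def handle(key, fun) do
    if :ets.insert_new(:idem_keys, {key, :done}) do
      {:ok, fun.()}
    else
      {:duplicate, :skipped}
    end
  end
end
```

The key must cover the retry horizon: if dedupe records expire before the last possible redelivery, duplicates re-apply side effects.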
How this fits in projects:
- Project 23 focuses on chaos and resilience controls.
- Projects 21 and 25 require idempotent boundaries.
- Project 27 validates full-system fault injections.
Definitions and key terms:
- Circuit breaker: failure-state gate around dependency.
- Bulkhead: resource partition for isolation.
- Idempotency key: unique operation key to deduplicate retries.
Mental model diagram:
Client -> Service -> Dependency
│
breaker + bulkhead + retry(backoff+jitter)
│
idempotent command handler
How it works:
- Detect dependency failures over rolling window.
- Open circuit when threshold exceeded.
- Route fallbacks or fast-fail responses.
- Retry with jitter when safe.
- Deduplicate command effects with idempotency key store.
Failure modes:
- Shared pool starvation without bulkheads.
- Duplicate side effects without idempotency.
- Retry storm from synchronized clients.
Minimal concrete example:
breaker_open_if: >50% failures in 20 requests
retry_policy: exp_backoff(base=100ms,max=5s,jitter=true)
idempotency_ttl: 24h
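The retry_policy line above can be sketched as a small Elixir function. This is a minimal illustration of exponential backoff with full jitter, assuming the base/max values from the example; the `Backoff` module name is hypothetical, not a library API.

```elixir
defmodule Backoff do
  @base_ms 100
  @max_ms 5_000

  # attempt is 0-based; returns a randomized delay capped at @max_ms.
  def delay_ms(attempt, base \\ @base_ms, max \\ @max_ms) do
    cap = min(max, base * Integer.pow(2, attempt))
    # Full jitter: uniform in [0, cap] so clients never retry in lockstep.
    :rand.uniform(cap + 1) - 1
  end
end
```

Full jitter (random over the whole window) trades predictable per-client latency for maximal desynchronization across clients, which is usually the right call for retry storms.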
Common misconceptions:
- “Let it crash means no resilience patterns needed.” Supervisors are necessary but not sufficient.
- “At-least-once means duplicates are rare.” Under failure, duplicates are normal.
Check-your-understanding questions:
- Why pair breaker with bulkhead?
- What breaks if idempotency store has shorter TTL than retry horizon?
- Why add jitter to backoff?
Check-your-understanding answers:
- Breakers control traffic; bulkheads control resource isolation.
- Late retries can re-apply side effects.
- Prevent synchronized retry spikes.
Real-world applications:
- Payments, messaging, webhooks, workflow engines.
Where you will apply it:
- Projects 23, 25, 27.
References:
- https://principlesofchaos.org/
- https://martinfowler.com/bliki/CircuitBreaker.html
- https://stripe.com/docs/idempotency
Key insight: Reliability comes from repeatable failure handling policies, not optimism.
Summary: Design duplicate-safe operations and isolate degradation paths.
Homework/exercises:
- Inject 500 ms latency into one dependency and observe breaker behavior.
- Replay duplicate commands with same idempotency key.
Solutions:
- Verify fallback latency and error budget impact.
- Confirm exactly one durable side effect.
Concept Cluster 10: Advanced Testing (Property, Concurrency, Distributed)
Fundamentals: Unit tests validate examples; property-based tests validate invariants across broad input spaces. Concurrency tests validate race-sensitive behavior. Distributed tests validate convergence and fault recovery over node boundaries.
Deep Dive: Property-based testing with StreamData forces you to define system invariants clearly. Instead of asserting one value, you assert properties that must always hold: no negative balances, eventual convergence, monotonic sequence, idempotent replay, or bounded queue growth under given constraints. Stateful property tests model command sequences and find hidden transition bugs.
Concurrency testing requires deliberate race orchestration. Use barriers, randomized scheduling, and fault injection to expose message ordering assumptions. Distributed tests require multi-node harnesses with controlled partition and heal operations, then assertions about convergence and acceptable divergence windows.
How this fits in projects:
- Project 24 is the dedicated advanced testing project.
- Projects 17, 21, 23, 27 depend on these test techniques.
Definitions and key terms:
- Property: invariant expected to hold for generated inputs.
- Stateful model test: generated operation sequences against model + system.
- Fault injection: deliberate disturbance to test resilience.
Mental model diagram:
Model invariants -> generator -> random sequences -> system under faults -> invariant checks
How it works:
- Define invariant before implementation.
- Build generators for valid and edge inputs.
- Run many trials with shrinking.
- Add fault injections during trials.
- Persist counterexamples as regression tests.
Failure modes:
- Weak generators miss hard edge cases.
- Non-deterministic assertions create flaky tests.
- No seed capture prevents reproducibility.
Minimal concrete example:
property: applying same command with same idempotency key twice yields one side effect
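The property above can be exercised against a minimal duplicate-safe handler. This sketch uses a plain MapSet as the dedup store (a real system would use a durable store with a TTL at least as long as the retry horizon); in a property test you would generate random key sequences with StreamData and assert that the effect count equals the number of unique keys. Module and function names are illustrative.

```elixir
defmodule IdempotentHandler do
  # Apply a command keyed by idempotency key; duplicates are no-ops.
  def handle(store, key) do
    if MapSet.member?(store, key),
      do: {:duplicate, store},
      else: {:applied, MapSet.put(store, key)}
  end

  # Replay a command sequence; returns how many produced a durable effect.
  def replay(keys) do
    {count, _store} =
      Enum.reduce(keys, {0, MapSet.new()}, fn key, {n, store} ->
        case handle(store, key) do
          {:applied, store} -> {n + 1, store}
          {:duplicate, store} -> {n, store}
        end
      end)

    count
  end
end
```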
Common misconceptions:
- “Property tests replace example tests.” They complement each other.
- “Flaky distributed tests are unavoidable.” Deterministic harness design reduces flake.
Check-your-understanding questions:
- What makes a good property definition?
- Why store failing seeds?
- How do you test netsplit reconciliation deterministically?
Check-your-understanding answers:
- Domain invariant independent of specific sample values.
- Reproduce and debug minimized counterexample.
- Script partition/heal timing and compare state snapshots.
Real-world applications:
- Billing invariants, replicated state correctness, queue reliability.
Where you will apply it:
- Projects 24, 27.
References:
- https://hexdocs.pm/stream_data/StreamData.html
- https://hexdocs.pm/ex_unit/ExUnit.html
Key insight: Advanced tests are executable system contracts under uncertainty.
Summary: State the invariant, generate adversarial inputs, and keep failures reproducible.
Homework/exercises:
- Write one property test for global rate limiter monotonicity.
- Add deterministic partition test with convergence assertion.
Solutions:
- Generate request bursts; assert non-decreasing accepted count bound by policy.
- Compare CRDT merge results after heal across nodes.
Concept Cluster 11: Production SaaS Patterns (Auditability, Background Work, Deploy Safety)
Fundamentals: Production SaaS reliability depends on operational patterns: soft deletion, audit logging, job orchestration, distributed scheduling, release strategies, graceful shutdown, and observability-first defaults.
Deep Dive: Soft deletion preserves recoverability and compliance workflows. Audit logging captures who changed what and when with immutable event records. Background orchestration handles long-running workflows with retries, dead-letter handling, and visibility. Distributed cron avoids duplicate runs across nodes through leadership or lock coordination.
Blue/green and canary deployments reduce risk by shifting traffic gradually and enabling rollback with minimal downtime. Graceful shutdown ensures inflight requests/jobs are drained before termination. Observability-first architecture means every critical flow emits structured logs, metrics, and traces by default, not as incident patchwork.
How this fits in projects:
- Project 25 is the complete SaaS operations platform project.
- Project 27 capstone enforces these patterns end-to-end.
Definitions and key terms:
- Soft delete: logical deletion via metadata flag/timestamp.
- Audit log: immutable change record.
- Graceful drain: controlled stop that preserves inflight work integrity.
Mental model diagram:
User action -> write path -> audit event -> async jobs -> observability pipeline -> operations dashboard
How it works:
- Every state mutation emits audit event.
- Deletion marks records, does not immediately destroy.
- Jobs execute with retry/backoff and dead-letter routes.
- Deployment shifts traffic gradually with health gates.
- Shutdown drains requests and workers before exit.
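The first two rules above (every mutation emits an audit event; deletion only marks records) can be sketched with plain maps standing in for Ecto schemas. The `SoftDelete` module and field names are illustrative assumptions; a real system would persist the audit event through its logging pipeline.

```elixir
defmodule SoftDelete do
  # Soft-delete a record: mark it and emit an immutable audit event.
  def delete(record, actor, now \\ DateTime.utc_now()) do
    audit = %{
      actor: actor,
      action: :soft_delete,
      target: record.id,
      before: record,
      at: now
    }

    {Map.put(record, :deleted_at, now), audit}
  end

  # Read paths must filter marked records by default.
  def visible?(record), do: is_nil(record[:deleted_at])
end
```

With Ecto, the same read-path rule is typically a default query scope such as `where(query, [r], is_nil(r.deleted_at))`.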
Failure modes:
- Duplicate cron jobs across nodes.
- Hard delete breaking restoration/compliance needs.
- Deploy without drain causing dropped requests.
Minimal concrete example:
deploy_policy:
- max_unavailable: 1 node
- drain_timeout: 30s
- rollback_on_error_budget_breach: true
Common misconceptions:
- “We can add audit later.” Retroactive reconstruction is costly and incomplete.
- “Blue/green is overkill for mid-size SaaS.” Outage cost usually proves otherwise.
Check-your-understanding questions:
- Why separate business events and audit events?
- How do you prevent duplicate distributed cron execution?
- What metrics gate rollout progression?
Check-your-understanding answers:
- Different retention, schema, and compliance goals.
- Use leader election or distributed lock with timeout.
- Error rate, latency, saturation, and job backlog.
Real-world applications:
- B2B SaaS billing, admin systems, compliance-sensitive workflows.
Where you will apply it:
- Projects 25, 27.
References:
- https://hexdocs.pm/oban/Oban.html
- https://fly.io/docs/
- https://12factor.net/
Key insight: Operational patterns are product features because they decide availability and trust.
Summary: Design operations paths as first-class architecture.
Homework/exercises:
- Define audit schema for account lifecycle.
- Build deployment gate checklist with rollback criteria.
Solutions:
- Include actor, action, target, before/after, timestamp, request context.
- Gate on SLO deltas and queue drain completion.
Concept Cluster 12: Elite Mode Topics (Custom OTP Behaviours and Runtime Primitives)
Fundamentals: Elite BEAM mastery means you can create your own runtime abstractions: custom OTP behaviours, boot hooks, PubSub adapters, lightweight distributed stores, custom rate limiter libraries, job queues, CRDT implementations, and protocol servers.
Deep Dive: Building your own behaviour teaches callback contracts, lifecycle semantics, and extension points. Boot hooks teach startup order and dependency readiness management. Custom PubSub adapters teach transport abstraction and fanout policy. A tiny distributed DB or CRDT implementation teaches data model convergence and conflict semantics better than any slide deck.
A custom rate limiter library forces API design plus correctness/performance tradeoffs. Building an Oban-lite queue forces understanding of scheduling, retries, visibility timeout, and poison message handling. Implementing a small TCP protocol server tests backpressure, framing, parsing, and connection lifecycle control directly on BEAM.
How this fits in projects:
- Project 26 is the dedicated elite primitives build.
- Project 27 integrates selected primitives into one production-like system.
Definitions and key terms:
- Behaviour: callback contract enforced by runtime conventions.
- Boot hook: startup phase extension before full service ready.
- Adapter: pluggable implementation behind stable interface.
Mental model diagram:
Public API -> Behaviour Contract -> Adapter/Implementation -> Runtime Metrics -> Failure Policy
How it works:
- Define boundary interface and callback contracts.
- Build reference adapter with clear invariants.
- Add observability and failure policy.
- Validate under fault and load tests.
Failure modes:
- Leaky abstractions hide transport/storage semantics.
- Boot-order races create intermittent startup failures.
- Missing backpressure leads to queue explosions.
Minimal concrete example:
@callback acquire(key, tokens, now_ms) :: {:ok, state} | {:deny, retry_ms, state}
Common misconceptions:
- “Custom primitives are reinvention.” They are focused learning accelerators when scope is controlled.
- “Framework internals are too complex to learn by building mini versions.” Small replicas are the best learning path.
Check-your-understanding questions:
- What makes a callback contract robust?
- Why should adapters expose metrics uniformly?
- Which invariants must a queue guarantee?
Check-your-understanding answers:
- Explicit return types, failure semantics, and timeout expectations.
- Swap implementations without losing operational visibility.
- Delivery semantics, retry policy, ordering guarantees.
Real-world applications:
- Internal platform tooling, runtime extension libraries, protocol gateways.
Where you will apply it:
- Projects 26, 27.
References:
- https://www.erlang.org/doc/system/design_principles.html
- https://hexdocs.pm/phoenix_pubsub/Phoenix.PubSub.html
- https://www.erlang.org/docs/25/tutorial/c_port
Key insight: Building small runtime primitives gives first-principles confidence in every higher-level abstraction.
Summary: You become elite when you can define and verify your own runtime contracts.
Homework/exercises:
- Design a minimal behaviour for distributed lock acquisition.
- Implement wire protocol framing rules for a TCP command server.
Solutions:
- Include acquire/release/renew callbacks with timeout semantics.
- Use length-prefixed frames with checksum and schema version.
Glossary
- Reduction: BEAM runtime work unit used for scheduling fairness.
- Dirty Scheduler: dedicated scheduler pool for blocking or long native work.
- Mailbox Explosion: unbounded message queue growth causing latency/memory issues.
- Selective Receive: mailbox pattern matching that can cause expensive queue scans.
- Split Brain: partitioned cluster where multiple nodes believe they are authoritative.
- CRDT: data type with deterministic merge properties for eventual convergence.
- Hot Upgrade: changing running code/state without full node restart.
- Idempotency: repeated same operation produces same final durable effect.
- Bulkhead: isolation boundary preventing one subsystem from exhausting shared resources.
- High Cardinality Metric: metric label strategy creating too many series to manage safely.
Why Phoenix Matters (2026 Update)
Modern motivation:
- Phoenix gives unusual leverage for real-time product workloads because BEAM process isolation and OTP recovery are runtime-native, not bolt-on.
- This matters most when systems must stay responsive under concurrency, partial failure, and frequent deploy cycles.
Recent signals and statistics:
- Stack Overflow 2025 survey reports Elixir with a strong admired score (66.68%) among developers using the language, signaling unusually high satisfaction once adopted. Source: https://survey.stackoverflow.co/2025/technology#admired-and-desired-programming-scripting-and-markup-languages
- Hex package ecosystem shows large sustained package pull volume for Phoenix and related libraries, indicating ongoing production usage and ecosystem maturity. Source: https://hex.pm/packages/phoenix
Old vs new architecture perspective:
Traditional Web Tier BEAM/Phoenix Tier
┌──────────────────────────┐ ┌──────────────────────────┐
│ Thread/process pools │ │ Process per connection │
│ External worker retries │ │ OTP supervised retries │
│ Runtime restarts as fix │ │ Fault isolation by design│
└──────────────────────────┘ └──────────────────────────┘
Context and evolution:
- Distillery historically shaped release workflows in early Elixir production stacks.
- Mix releases became default modern path and reduced operational friction for most teams.
- Today, the strategic advantage is less about framework syntax and more about operable distributed runtime behavior.
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| 1. Deep BEAM Internals | Scheduler reductions, mailbox growth, per-process GC, binary memory ownership, and flow-control impact. |
| 2. Advanced OTP Architecture | Correct behavior choice (:gen_statem, supervisors, registry, ETS, Mnesia) based on explicit failure and state semantics. |
| 3. Distributed BEAM Engineering | Netsplit-aware design, CRDT convergence, shard ownership, and global limit tradeoffs under partition. |
| 4. Performance Engineering | SLO-first measurement loops using telemetry, tracing, profiling, and reproducible load tests. |
| 5. Release Engineering & Runtime | Runtime config correctness, upgrade compatibility, canary/rolling orchestration, and rollback discipline. |
| 6. Native Interop | Port vs NIF safety/performance boundaries and dirty scheduler implications. |
| 7. Architectural Patterns | Event sourcing, CQRS, multi-tenant isolation, and actor-mapped bounded contexts. |
| 8. Real-Time Systems | LiveView process lifecycle, diff behavior, presence semantics, and deterministic multiplayer sync. |
| 9. Reliability & Fault Tolerance | Chaos experiments, breaker/bulkhead design, and idempotency-first write paths. |
| 10. Advanced Testing | Property/stateful/concurrency/distributed testing for invariants under fault conditions. |
| 11. Production SaaS Patterns | Auditability, soft deletion, job orchestration, distributed cron, safe deploy and graceful shutdown. |
| 12. Elite Runtime Primitives | Building custom behaviours, adapters, queues, CRDTs, and protocol servers with explicit contracts. |
Project-to-Concept Map
| Project | Concepts Applied |
|---|---|
| Project 15 | 1, 4 |
| Project 16 | 2 |
| Project 17 | 2, 3 |
| Project 18 | 1, 4, 8 |
| Project 19 | 5 |
| Project 20 | 1, 6 |
| Project 21 | 2, 7, 9 |
| Project 22 | 3, 8 |
| Project 23 | 3, 9, 10 |
| Project 24 | 10, 9 |
| Project 25 | 5, 7, 11 |
| Project 26 | 6, 12 |
| Project 27 (Insane Capstone) | 1-12 |
Deep Dive Reading by Concept
| Concept | Book and Chapter | Why This Matters |
|---|---|---|
| BEAM Internals | “Elixir in Action” (Concurrency + OTP chapters) | Maps runtime behavior to practical instrumentation decisions. |
| OTP Architecture | “Elixir in Action” (Behaviours/Supervision) | Teaches behavior-level architectural constraints. |
| Distributed Systems | “Designing Data-Intensive Applications” Ch. 5-9 | Formalizes partition and consistency tradeoffs. |
| Performance Engineering | “Systems Performance” (Gregg), tracing chapters | Establishes rigorous performance workflow. |
| Release Engineering | Erlang Design Principles - Release handling docs | Required for safe live upgrade and rollback design. |
| Native Interop | Erlang NIF/Port docs + Rustler docs | Defines crash isolation and scheduler safety boundaries. |
| Event/CQRS Patterns | Fowler Event Sourcing + CQRS articles | Grounded model for audit-heavy SaaS workflows. |
| LiveView and Presence | “Programming Phoenix LiveView” + official docs | Explains server-driven real-time UI internals. |
| Reliability | “Release It!” by Michael Nygard | Concrete patterns for breaker/bulkhead/backoff systems. |
| Advanced Testing | StreamData + ExUnit docs | Invariant-driven correctness under concurrency/distribution. |
Quick Start: Your First 48 Hours
Day 1:
- Read Concept Clusters 1, 2, and 4 from this extension.
- Start Project 15 and capture your first scheduler/mailbox dashboard.
Day 2:
- Implement one backpressure rule in Project 15 and confirm mailbox stabilization.
- Draft the transition table for Project 16's :gen_statem workflow before coding.
Recommended Learning Paths
Path 1: Runtime Internals First
- Project 15 -> Project 18 -> Project 20 -> Project 19 -> Project 27
Path 2: Distributed Systems First
- Project 17 -> Project 22 -> Project 23 -> Project 24 -> Project 27
Path 3: SaaS Architecture First
- Project 21 -> Project 25 -> Project 24 -> Project 27
Success Metrics
- You can explain and prove where latency comes from (scheduler, mailbox, GC, I/O, or dependency).
- You can simulate partition and describe expected vs observed behavior.
- You can run a release canary + rollback with explicit safety gates.
- You can enforce idempotency and show duplicate-safe outcomes.
- You can load-test to 50k+ concurrent sessions with clear bottleneck diagnosis.
Project Overview Table
| Project | Topic Layer | Difficulty | Time Estimate | Observable Outcome |
|---|---|---|---|---|
| 15 | Deep BEAM Internals | Expert | 2 weeks | Scheduler/mailbox/memory dashboards with backpressure controls |
| 16 | Advanced OTP Architecture | Expert | 2-3 weeks | Workflow engine with :gen_statem and tuned supervision |
| 17 | Distributed BEAM Engineering | Expert | 3 weeks | Netsplit-tolerant clustered service with global discovery |
| 18 | Performance Engineering | Expert | 2 weeks | Reproducible load-test + trace/profile optimization loop |
| 19 | Release Engineering | Expert | 2 weeks | Hot-upgrade rehearsal pipeline and rollback drills |
| 20 | Native Interop | Expert | 2 weeks | Port vs NIF benchmark with crash-isolation report |
| 21 | Architectural Patterns | Expert | 3 weeks | Event-sourced, CQRS, multi-tenant bounded-context service |
| 22 | Real-Time Systems | Expert | 3 weeks | LiveView + Presence multiplayer consistency prototype |
| 23 | Reliability/Fault Tolerance | Expert | 2-3 weeks | Chaos-tested system with circuit breaker and bulkheads |
| 24 | Advanced Testing | Expert | 2 weeks | Property + concurrency + distributed invariant harness |
| 25 | Production SaaS Patterns | Expert | 3 weeks | Audit-ready, soft-delete, cron-safe, blue/green-capable SaaS core |
| 26 | Elite Runtime Primitives | Master | 4 weeks | Custom behaviour, queue, CRDT, PubSub adapter, protocol server |
| 27 | Insane Capstone | Master | 4-8 weeks | Multi-node collaborative platform validated at 50k+ concurrent sessions |
Project List
The following elite projects extend the original sprint from framework fluency to runtime and distributed systems mastery.
Project 15: Scheduler, Reductions, and Mailbox Pressure Lab
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: BEAM Internals
- Software or Tool: :observer, telemetry, custom probes
- Main Book: “Elixir in Action” by Sasa Juric
What you will build: A runtime diagnostics service that tracks scheduler run queues, process reductions, mailbox length, and binary memory pressure, then applies adaptive backpressure policy when limits are exceeded.
Why it teaches Phoenix: Phoenix behavior under stress is runtime behavior. This project makes internal pressure visible before it becomes user-visible incidents.
Core challenges you will face:
- Run queue saturation interpretation -> maps to scheduler fairness and preemption
- Mailbox explosion handling -> maps to flow control and bounded work
- Binary retention detection -> maps to memory model and GC behavior
Real World Outcome
$ mix run priv/labs/scheduler_lab.exs
[lab] warmup complete
[metric] scheduler.run_queue.max=24
[metric] process.mailbox.max=18420 pid=#PID<0.412.0>
[metric] vm.memory.binary=1910MB
[action] applied_backpressure=chat_broadcast channel=lobby reason=mailbox_limit
[result] p99_ws_latency_ms before=412 after=138
You will see a dashboard panel where queue and mailbox lines flatten after backpressure policy activation.
The Core Question You Are Answering
“How do I distinguish CPU pressure, queue pressure, and memory pressure before users report latency?”
This question matters because blind tuning often optimizes the wrong subsystem.
Concepts You Must Understand First
- Reduction budgeting
- What does preemption mean in BEAM?
- Book Reference: “Elixir in Action” - Concurrency chapters
- Mailbox behavior
- Why does selective receive degrade with queue size?
- Book Reference: “Elixir in Action” - Process communication
- Binary memory ownership
- Why can tiny slices retain huge binaries?
- Reference: Erlang Efficiency Guide
Questions to Guide Your Design
- Which three pressure signals trigger write throttling?
- How will you avoid noisy alerts from short spikes?
- Which operations can be safely deferred under pressure?
Thinking Exercise
Create two timelines:
- Timeline A: no backpressure, write-heavy broadcast.
- Timeline B: adaptive backpressure at mailbox threshold.
Predict how p99 latency and memory differ before running tests.
The Interview Questions They Will Ask
- “What is a reduction, and how does it affect fairness?”
- “Why can mailbox growth hurt latency without high CPU?”
- “How do dirty schedulers change tuning decisions?”
- “How do you prove a memory issue is binary retention?”
- “How would you mitigate a selective receive hotspot?”
Hints in Layers
Hint 1: Starting Point Instrument first, optimize second.
Hint 2: Next Level Track queue length at both scheduler and mailbox levels.
Hint 3: Technical Details Use periodic telemetry samples and threshold-based throttling state machine.
Hint 4: Tools/Debugging
Correlate :observer snapshots with custom metrics every 5 seconds.
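The probes this lab needs are available through built-in BEAM introspection. This sketch samples the three pressure signals named above; the threshold is a placeholder to be calibrated against your own baseline (Hint 1: instrument first), and the module name is illustrative.

```elixir
defmodule PressureProbe do
  @mailbox_limit 10_000

  def sample do
    %{
      # Sum of all scheduler run queue lengths (CPU/queue pressure).
      run_queue: :erlang.statistics(:total_run_queue_lengths),
      # Total bytes held by refc binaries (memory pressure).
      binary_bytes: :erlang.memory(:binary),
      # Largest message queue across all live processes (mailbox pressure).
      worst_mailbox: worst_mailbox()
    }
  end

  defp worst_mailbox do
    Process.list()
    |> Enum.map(fn pid ->
      case Process.info(pid, :message_queue_len) do
        {:message_queue_len, n} -> {n, pid}
        nil -> {0, pid}
      end
    end)
    |> Enum.max_by(fn {n, _pid} -> n end)
  end

  def throttle?(%{worst_mailbox: {n, _pid}}), do: n > @mailbox_limit
end
```

Sampling these on a timer and feeding a threshold state machine gives you the throttling trigger described in Hint 3.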
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| BEAM process model | “Elixir in Action” | Concurrency + OTP chapters |
| Runtime memory behavior | Erlang Efficiency Guide | Processes + Binary Handling |
Common Pitfalls and Debugging
Problem 1: “Backpressure never triggers”
- Why: threshold values too high for observed load.
- Fix: calibrate thresholds from baseline p95/p99 metrics.
- Quick test: replay same load profile and confirm action events.
Problem 2: “Latency improves but memory still climbs”
- Why: retained large binaries in long-lived process state.
- Fix: copy or re-encode binary at ownership boundary.
- Quick test: compare vm.memory.binary before/after the patch.
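The fix for binary retention can be shown in a few lines: a small slice taken with `binary_part/3` is a sub-binary that keeps the whole parent alive, while `:binary.copy/1` re-allocates just the slice. The module name is illustrative.

```elixir
defmodule BinaryBoundary do
  # Return an independent copy of the slice so the large parent binary
  # can be garbage-collected once no other reference holds it.
  def detach_slice(large, pos, len) do
    large
    |> binary_part(pos, len)
    |> :binary.copy()
  end
end
```

Apply this at the ownership boundary of any long-lived process state that stores fragments of large incoming payloads.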
Definition of Done
- Scheduler, mailbox, and binary metrics are visible in one dashboard.
- Backpressure policy activates automatically under stress.
- Latency improvement is measurable in repeatable load scenario.
- Root-cause report explains one real bottleneck with evidence.
Project 16: OTP Control Plane with gen_statem, Dynamic Supervision, Registry, ETS, and Mnesia
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: OTP Architecture
- Software or Tool: :gen_statem, DynamicSupervisor, Registry, ETS, Mnesia
- Main Book: “Elixir in Action” by Sasa Juric
What you will build: A workflow control plane where each workflow instance is a :gen_statem process, discovered via registry, accelerated with ETS lookup, and persisted in Mnesia for replicated metadata.
Why it teaches Phoenix: It replaces “just use GenServer” instincts with explicit architecture primitives aligned to runtime semantics.
Core challenges you will face:
- State transition correctness -> maps to lifecycle modeling
- Restart policy tuning -> maps to failure containment
- Ownership and table lifecycle -> maps to ETS/Mnesia correctness
Real World Outcome
$ mix run priv/labs/control_plane.exs
[boot] supervisor tree started
[fsm] workflow=onboarding/42 state=pending -> state=verifying
[registry] lookup workflow:onboarding/42 -> #PID<0.551.0>
[ets] read latency p95=0.28ms
[mnesia] replicated write ok table=workflow_meta nodes=3
[recovery] killed worker #PID<0.551.0>, restarted and restored state=verifying
The Core Question You Are Answering
“Which OTP primitive should own each part of my workflow so failures recover predictably instead of randomly?”
Concepts You Must Understand First
- :gen_statem transition modeling
- Why model timers and retries as explicit states?
- Book Reference: OTP Design Principles
- Supervisor child restart types
- Temporary vs transient vs permanent behavior.
- Book Reference: OTP supervision docs
- ETS ownership and concurrency options
- What happens when table owner dies?
- Reference: ETS docs
Questions to Guide Your Design
- Which states require timeout transitions?
- Which child processes should never auto-restart?
- Which data belongs in ETS vs Mnesia?
Thinking Exercise
Draw failure domains for each tree branch and mark the expected restart scope for each crash type.
The Interview Questions They Will Ask
- “When would you choose :gen_statem over GenServer?”
- “What are the risks of a single global ETS owner?”
- “Why might Mnesia be useful here, and where would you avoid it?”
- “How do you validate state machine transitions?”
Hints in Layers
Hint 1: Starting Point Write state transition table before any implementation.
Hint 2: Next Level Separate control plane metadata from high-frequency counters.
Hint 3: Technical Details Use one supervisor branch for FSM workers, one for storage/discovery support.
Hint 4: Tools/Debugging Trace state transitions with telemetry event per transition.
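A minimal :gen_statem sketch for one workflow instance, with the timeout modeled as a first-class transition (the pitfall below: a missing timeout path leaves the FSM stuck forever). Module, state, and timeout names are illustrative assumptions, not the project's required design.

```elixir
defmodule WorkflowFSM do
  @behaviour :gen_statem

  def start_link(id), do: :gen_statem.start_link(__MODULE__, id, [])
  def verify(pid), do: :gen_statem.call(pid, :verify)
  def state(pid), do: :gen_statem.call(pid, :which_state)

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(id), do: {:ok, :pending, %{id: id}}

  @impl true
  def handle_event({:call, from}, :which_state, state, _data),
    do: {:keep_state_and_data, [{:reply, from, state}]}

  # pending -> verifying, arming an explicit 5s state timeout.
  def handle_event({:call, from}, :verify, :pending, data) do
    {:next_state, :verifying, data,
     [{:reply, from, :ok}, {:state_timeout, 5_000, :verify_expired}]}
  end

  # Timeout is an explicit transition, not an implicit hang.
  def handle_event(:state_timeout, :verify_expired, :verifying, data),
    do: {:next_state, :failed, data}
end
```

Each `handle_event` clause is one row of your transition table, which is why writing the table first (Hint 1) makes missing paths obvious.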
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OTP behaviors | “Elixir in Action” | Behaviors + supervision chapters |
| State machines | Erlang/OTP docs | :gen_statem documentation |
Common Pitfalls and Debugging
Problem 1: “State machine stuck forever”
- Why: missing timeout transition path.
- Fix: define timeout event and fallback transition.
- Quick test: run with intentionally missing external ack.
Problem 2: “ETS table disappears unexpectedly”
- Why: owner process crash removed table.
- Fix: move ownership under supervised stable process.
- Quick test: kill owner and verify handoff/recreation path.
Definition of Done
- Workflow FSM transitions are explicit and logged.
- Supervisor restarts only intended scope.
- Registry lookup and ETS reads hit performance target.
- Mnesia metadata replication survives single-node failure.
Project 17: Partition-Tolerant Cluster with Horde, CRDT Membership, and Global Rate Limiting
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Distributed BEAM Engineering
- Software or Tool: libcluster, Horde, Delta CRDT, Phoenix.PubSub
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you will build: A multi-node service that survives netsplits, reconciles CRDT-backed ownership state, and enforces a configurable global rate-limit policy (strict mode vs approximate mode).
Why it teaches Phoenix: Real-time Phoenix at scale is distributed systems engineering. This project makes partition semantics explicit.
Core challenges you will face:
- Split-brain ownership conflicts -> maps to distributed process patterns
- Convergence semantics -> maps to Delta CRDT behavior
- Global quota strategy -> maps to CAP tradeoffs
Real World Outcome
$ mix run priv/labs/netsplit_lab.exs
[cluster] nodes=3 connected
[test] partition node_a <-> node_b,node_c
[mode] approximate_global_limit enabled
[metric] accepted_rate node_a=99/s node_b=101/s
[heal] partition resolved in 32s
[reconcile] ownership_conflicts=3 resolved=3
[result] no double-owned workers after convergence window
The Core Question You Are Answering
“What does my system promise during a partition, and how do I prove it after healing?”
Concepts You Must Understand First
- CAP and partition tolerance semantics
- Which operations can remain available under partition?
- CRDT convergence basics
- Why deterministic merge logic matters.
- Sharding and ownership mapping
- How keys map to workers and nodes.
Questions to Guide Your Design
- Which requests should fail-fast in strict mode?
- What is acceptable divergence window in approximate mode?
- How will you detect post-heal duplicate ownership?
Thinking Exercise
Describe two truth tables:
- strict limit under partition,
- approximate limit under partition.
Include user-visible behavior for each path.
The Interview Questions They Will Ask
- “How do you avoid split-brain double writes?”
- “When would you pick eventual consistency intentionally?”
- “How do you validate CRDT convergence in tests?”
- “Why can strict global limits hurt availability?”
- “How do you make partition behavior product-visible?”
Hints in Layers
Hint 1: Starting Point Document semantics first, implementation second.
Hint 2: Next Level Build deterministic ownership conflict resolver.
Hint 3: Technical Details Maintain conflict journal with before/after owner snapshots.
Hint 4: Tools/Debugging Run scripted partition/heal tests with fixed timeline.
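Hint 2's deterministic resolver can be very small: after heal, every node must resolve a conflicted key the same way with no coordination, which only requires a stable total order over the claims. This sketch orders by earliest claim time, breaking ties by node name; the module name and claim shape are illustrative assumptions.

```elixir
defmodule OwnershipResolver do
  # claims: [{node_name, claimed_at_ms}] gathered from both sides
  # of the split. Every node computes the same winner independently.
  def resolve(claims) do
    claims
    |> Enum.sort_by(fn {node, at} -> {at, node} end)
    |> List.first()
    |> elem(0)
  end
end
```

Any stable rule works; what matters is that repeating the same partition scenario always converges to the same final owner (the repeatability check in Pitfall 1).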
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Partition behavior | “Designing Data-Intensive Applications” | Replication + partition chapters |
| Cluster process discovery | Horde docs | Readme and usage sections |
Common Pitfalls and Debugging
Problem 1: “Cluster heals but stale owners remain”
- Why: non-deterministic conflict resolution policy.
- Fix: enforce stable owner tie-break rule.
- Quick test: repeat same partition scenario three times and compare final owners.
Problem 2: “Global limit oscillates wildly”
- Why: sync interval too short or too long for traffic pattern.
- Fix: tune window and sync cadence, or switch mode by workload.
- Quick test: replay burst profile and chart accept/deny variance.
Definition of Done
- Netsplit simulation is automated and repeatable.
- Conflict resolution produces deterministic final ownership.
- Global rate limit mode behavior is documented and tested.
- Cluster converges within defined recovery budget.
Project 18: Phoenix Performance War Room (Tracing, Telemetry, Profiling, and 100k-Connection Modeling)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: JavaScript (k6)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 5: Master
- Knowledge Area: Performance Engineering
- Software or Tool: k6, :observer, telemetry, :fprof, :eprof
- Main Book: “Systems Performance” by Brendan Gregg
What you will build: A full performance pipeline for a LiveView + API Phoenix app that includes load profiles, SLO dashboards, targeted tracing, and profiler-driven optimization reports.
Why it teaches Phoenix: It turns performance from guesswork into reproducible engineering.
Core challenges you will face:
- Load model realism -> maps to production confidence
- High-cardinality metric control -> maps to observability economics
- Hotspot attribution -> maps to profiling discipline
Real World Outcome
$ k6 run priv/perf/liveview_mix_profile.js
running (10m), 000/50000 VUs, 125.6k req/s
http_req_duration{route=/api/feed}: p(99)=182ms
ws_broadcast_latency_ms: p(99)=118
error_rate=0.31%
$ mix run priv/perf/profile_report.exs
[profile] top hotspot=MyApp.Live.Room.handle_event/3 (31.2%)
[fix] reduced assign payload size by 67%
[result] ws_broadcast_latency_ms p99=74
The Core Question You Are Answering
“Can I explain every major latency jump with evidence, and can I prove improvement under identical load?”
Concepts You Must Understand First
- SLO and error budget design
- Telemetry event schema and cardinality
- Profiler output interpretation (:fprof/:eprof)
Questions to Guide Your Design
- Which user journeys must be modeled in load scenarios?
- Which labels are safe for metrics, and which are too high-cardinality?
- What constitutes statistically meaningful improvement?
Thinking Exercise
Define one “false improvement” scenario (p50 improves, p99 worsens). Explain why this is a regression.
The Interview Questions They Will Ask
- “How do you design a realistic Phoenix load test?”
- “Why can telemetry hurt performance if misused?”
- “When do you use tracing vs profiling?”
- “How do you tune LiveView memory per connection?”
- “How do you make perf tests reproducible in CI?”
Hints in Layers
Hint 1 (Starting Point): Pick 3 SLOs before writing any script.
Hint 2 (Next Level): Separate websocket-heavy and API-heavy traffic profiles.
Hint 3 (Technical Details): Use fixed warmup, steady-state, and cooldown phases.
Hint 4 (Tools/Debugging): Capture profile snapshots only during steady-state window.
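One way to keep metric label cardinality bounded is to normalize concrete ids back into route shapes before attaching labels; a minimal sketch (the helper name and regex are illustrative):

```elixir
# Illustrative label normalization: collapse concrete ids into
# route-shaped labels so the metrics backend sees a bounded set of
# series instead of one series per user or resource.
defmodule MetricLabels do
  def normalize_path(path) do
    path
    |> String.split("/", trim: true)
    |> Enum.map(fn segment ->
      # Numeric ids and UUID-shaped segments become ":id"; static
      # segments are kept as-is.
      if Regex.match?(~r/^\d+$|^[0-9a-f-]{36}$/, segment), do: ":id", else: segment
    end)
    |> then(&("/" <> Enum.join(&1, "/")))
  end
end

"/api/users/:id/invoices/:id" = MetricLabels.normalize_path("/api/users/482/invoices/1042")
```

Tracking the series count before and after this kind of cleanup is exactly the quick test for the overloaded-backend pitfall below.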
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Performance workflow | “Systems Performance” | Methodology + CPU analysis |
| Phoenix observability | telemetry docs | Event handling and measurements |
Common Pitfalls and Debugging
Problem 1: “Great benchmark, bad production”
- Why: unrealistic traffic shape and no background jobs in test profile.
- Fix: include mixed workload and burst patterns.
- Quick test: compare synthetic and production metric distributions.
Problem 2: “Metrics backend overloaded”
- Why: high-cardinality labels (user_id, full URL path).
- Fix: normalize labels to route and cohort.
- Quick test: track series count before/after label cleanup.
Definition of Done
- Load scripts cover at least three realistic traffic patterns.
- SLO dashboard tracks p50/p95/p99 and error rate.
- One profiling-informed optimization improves p99 measurably.
- Results are reproducible from versioned scripts and configs.
Project 19: Release Engineering Lab (Hot Upgrades, Runtime Config, Mix vs Distillery Internals)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Shell
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Release Engineering
- Software or Tool: Mix releases, release handling scripts
- Main Book: Erlang Design Principles docs
What you will build: A release pipeline that validates runtime config on boot, performs canary upgrade, simulates hot code/state migration, and rehearses rollback under active traffic.
Why it teaches Phoenix: BEAM operational power is realized only when release workflows are safe under pressure.
Core challenges you will face:
- State migration safety -> maps to hot upgrade correctness
- Config provider reliability -> maps to runtime boot stability
- Rollback speed -> maps to incident recovery budget
Real World Outcome
$ ./ops/release_drill.sh
[build] release artifact generated
[preflight] runtime config schema check: PASS
[deploy] canary node upgraded to v2
[migrate] state converter executed: 124 live processes migrated
[verify] p99 latency delta +6ms (within budget)
[rollback-test] success in 41s
The Core Question You Are Answering
“Can I upgrade and rollback a live Phoenix cluster without losing correctness or blowing SLO budgets?”
Concepts You Must Understand First
- Mix release lifecycle and runtime config providers
- Hot code upgrade constraints for long-lived process state
- Rolling deploy orchestration and rollback gates
Questions to Guide Your Design
- What preflight checks are mandatory before canary?
- Which states require explicit conversion logic?
- What metrics trigger automatic rollback?
Thinking Exercise
Design a failure drill where the upgrade succeeds technically but must roll back due to a latency regression.
The Interview Questions They Will Ask
- “How do appup/relup concepts influence upgrade planning?”
- “What breaks first during a bad runtime config rollout?”
- “How do you test hot upgrade compatibility safely?”
- “Why compare Distillery’s history with modern Mix releases?”
- “What is your rollback SLO?”
Hints in Layers
Hint 1 (Starting Point): Build deterministic preflight checklist first.
Hint 2 (Next Level): Include state schema version in process state.
Hint 3 (Technical Details): Implement conversion function for each version bump.
Hint 4 (Tools/Debugging): Run canary under steady traffic, not idle environment.
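A minimal fail-fast boot check follows the pattern Phoenix generators already use in `config/runtime.exs` (the env var and application/module names here are examples):

```elixir
# Illustrative fail-fast runtime config check in config/runtime.exs for
# a Mix release. A missing secret should stop boot with a clear error
# instead of hiding behind a default that fails later in a feature path.
import Config

if config_env() == :prod do
  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      Generate one with: mix phx.gen.secret
      """

  config :my_app, MyAppWeb.Endpoint, secret_key_base: secret_key_base
end
```

Booting the release with the variable intentionally unset is the quick test for the "node boots but features fail" pitfall below.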
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Release handling | Erlang Design Principles | Release handling section |
| Deploy safety | “Release It!” | Deployment and stability chapters |
Common Pitfalls and Debugging
Problem 1: “Node boots but features fail”
- Why: config defaults masked missing secret values.
- Fix: fail-fast config schema validation before app start.
- Quick test: run boot with intentionally missing secret.
Problem 2: “Upgrade succeeds but memory grows”
- Why: migrated state retains legacy fields and large payloads.
- Fix: trim migrated state and enforce shape contracts.
- Quick test: compare memory snapshots pre/post upgrade.
Definition of Done
- Preflight checks are automated and version-controlled.
- Canary upgrade and rollback are both rehearsed.
- State migration logic is tested against prior versions.
- Runtime config fails fast on invalid or missing inputs.
Project 20: Native Boundary Lab (Ports vs NIFs with Rustler and Dirty Scheduler Safety)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Rust, C
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 5: Master
- Knowledge Area: Native Interop
- Software or Tool: Port protocol, Rustler NIF, dirty schedulers
- Main Book: Erlang NIF/Port docs
What you will build: The same CPU-heavy and I/O-heavy service exposed through both Port and NIF integration paths, with safety and latency benchmarks.
Why it teaches Phoenix: You will learn exactly where to draw runtime boundaries for performance without sacrificing crash isolation.
Core challenges you will face:
- Boundary contract design -> maps to data integrity and compatibility
- Scheduler blocking risk -> maps to dirty NIF discipline
- Crash blast radius -> maps to operational risk management
Real World Outcome
$ mix run priv/labs/native_boundary.exs
[benchmark] port_path p99=3.8ms throughput=29k/s
[benchmark] nif_path p99=1.6ms throughput=41k/s
[fault] forced native crash in port binary: VM survived, worker restarted
[fault] forced panic in NIF: process crash contained, no scheduler starvation observed
[decision] selected port for untrusted workload, nif for bounded compute path
The Core Question You Are Answering
“Where should native code run so I get performance gains without turning one bug into a full-platform outage?”
Concepts You Must Understand First
- NIF execution model and scheduler impact
- Port communication and supervision recovery
- Memory ownership across VM/native boundaries
Questions to Guide Your Design
- What latency budget justifies NIF complexity?
- Which calls must be bounded or offloaded to dirty schedulers?
- What failure behavior is acceptable for each integration path?
Thinking Exercise
Create a risk matrix: performance gain vs crash isolation for three workloads (trusted CPU, untrusted parser, external model inference).
The Interview Questions They Will Ask
- “Why are ports usually safer than NIFs?”
- “When is dirty NIF usage mandatory?”
- “How do you benchmark boundary overhead honestly?”
- “How does Rustler help, and what risk remains?”
- “How do you design kill-switch fallback if native path fails?”
Hints in Layers
Hint 1 (Starting Point): Define one canonical request/response schema for both paths.
Hint 2 (Next Level): Use identical workload generator for fair comparison.
Hint 3 (Technical Details): Add timeout wrappers and circuit-breaker around native calls.
Hint 4 (Tools/Debugging): Track scheduler utilization during NIF stress runs.
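Hint 3's timeout wrapper can be sketched around a Port like this (the external binary path is whatever your lab builds):

```elixir
# Sketch of a Port call with a hard timeout. {:packet, 4} gives
# length-prefixed framing, so partial reads never corrupt message
# boundaries, and a crash in the external binary kills only the OS
# process -- never the VM.
defmodule NativePort do
  def call(binary_path, request, timeout \\ 5_000) do
    port = Port.open({:spawn_executable, binary_path}, [:binary, {:packet, 4}])
    send(port, {self(), {:command, request}})

    receive do
      {^port, {:data, response}} ->
        Port.close(port)
        {:ok, response}
    after
      timeout ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end
```

Because `/bin/cat` echoes its input, `NativePort.call("/bin/cat", "ping")` is a cheap smoke test of the framing before the real native binary exists.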
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| NIF behavior | Erlang docs | NIF tutorial and API docs |
| Port architecture | Erlang docs | C nodes and ports tutorial |
Common Pitfalls and Debugging
Problem 1: “NIF benchmark great, production unstable”
- Why: benchmark lacked failure and long-tail scenarios.
- Fix: include fault injection and sustained saturation runs.
- Quick test: run 30-minute soak with failure injection every minute.
Problem 2: “Port throughput too low”
- Why: chatty protocol and small message framing overhead.
- Fix: batch messages and optimize binary framing.
- Quick test: compare messages/sec before and after batching.
Definition of Done
- Port and NIF implementations pass identical contract tests.
- Benchmarks include latency, throughput, and failure drills.
- Scheduler safety is validated under stress.
- Decision memo states which path is used for which workload and why.
Project 21: Event-Sourced, CQRS, Multi-Tenant Phoenix Domain
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: SQL
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Architecture Patterns
- Software or Tool: Phoenix, Ecto, Oban, tenant routing
- Main Book: “Domain-Driven Design” by Eric Evans
What you will build: A tenant-aware command API that appends domain events, projects async read models, and exposes optimized read endpoints with strict tenant isolation.
Why it teaches Phoenix: This is how Phoenix becomes enterprise architecture rather than framework code.
Core challenges you will face:
- Command/event schema evolution -> maps to long-lived system maintainability
- Projection lag handling -> maps to user experience tradeoffs
- Tenant isolation correctness -> maps to security and compliance
Real World Outcome
$ mix run priv/labs/cqrs_tenant_demo.exs
[tenant] acme command CreateInvoice accepted id=inv_1042
[event] InvoiceCreated appended version=1
[projection] read model updated lag_ms=128
[query] GET /api/acme/invoices/inv_1042 -> status=200
[audit] actor=user_91 action=create_invoice recorded
[isolation-test] cross-tenant query blocked status=403
The Core Question You Are Answering
“How do I preserve strict write correctness and fast reads while keeping tenant boundaries unbreakable?”
Concepts You Must Understand First
- Event sourcing command vs event separation
- CQRS read/write model responsibilities
- Multi-tenant routing and authorization boundaries
Questions to Guide Your Design
- Which invariants are enforced on command path only?
- What read staleness is acceptable for each endpoint?
- Where is tenant context attached and validated?
Thinking Exercise
Map one invoice lifecycle from command to projection and identify where duplicates or replays can occur.
The Interview Questions They Will Ask
- “Why separate commands and events?”
- “How do you backfill read models after schema changes?”
- “How do you enforce tenant isolation in every query path?”
- “How do you make projections idempotent?”
- “When would CQRS be unnecessary complexity?”
Hints in Layers
Hint 1 (Starting Point): Define aggregate invariants before endpoint design.
Hint 2 (Next Level): Use versioned events and projection checkpointing.
Hint 3 (Technical Details): Use Oban jobs for projection with idempotent upserts.
Hint 4 (Tools/Debugging): Track projection lag and per-tenant error counters.
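Before wiring Oban and Ecto, the idempotence property from Hint 3 can be checked against a pure in-memory read model (the names and event shape are illustrative):

```elixir
# Illustrative in-memory projection with a version guard. Replaying the
# same event, or delivering an older one out of order, leaves the read
# model unchanged -- the same property the database-backed projection
# must satisfy via checkpointed upserts.
defmodule Projection do
  def apply_event(read_model, %{id: id, version: v} = event) do
    case Map.get(read_model, id) do
      # Already applied this version (or a newer one): no-op.
      %{version: existing} when existing >= v -> read_model
      _ -> Map.put(read_model, id, %{version: v, total: event.total})
    end
  end
end

event = %{id: "inv_1042", version: 1, total: 100}
once = Projection.apply_event(%{}, event)
^once = Projection.apply_event(once, event)  # replay is a no-op
```

Replaying the same event batch twice and comparing rows (Problem 1's quick test) is the database-level version of this check.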
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Domain boundaries | “Domain-Driven Design” | Strategic + tactical design |
| Event/CQRS patterns | Fowler articles | Event Sourcing + CQRS |
Common Pitfalls and Debugging
Problem 1: “Read model inconsistent after retries”
- Why: projection handler not idempotent.
- Fix: use deterministic event_id checkpoints and upserts.
- Quick test: replay same event batch twice and compare rows.
Problem 2: “Tenant leak in admin endpoint”
- Why: missing tenant scope in one query path.
- Fix: enforce tenant scoping at repository boundary.
- Quick test: run cross-tenant fuzz query suite.
Definition of Done
- Commands enforce domain invariants with idempotency keys.
- Event stream and projections are versioned and replay-safe.
- Tenant isolation tests pass for API and job paths.
- Projection lag is observable and within target budgets.
Project 22: LiveView and Presence Internals for Multiplayer Real-Time Systems
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: JavaScript (optional client prediction)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Real-Time System Engineering
- Software or Tool: Phoenix LiveView, Presence, PubSub
- Main Book: “Programming Phoenix LiveView” by Bruce Tate and Sophie DeBenedetto
What you will build: A multiplayer room service with authoritative server ticks, Presence state propagation, and LiveView UI updates optimized for connection scale.
Why it teaches Phoenix: This project exposes the process lifecycle and diff behavior behind “magic” real-time UX.
Core challenges you will face:
- Socket memory budgeting -> maps to LiveView scaling internals
- Presence convergence behavior -> maps to distributed metadata reliability
- Deterministic tick processing -> maps to multiplayer correctness
Real World Outcome
$ mix run priv/labs/multiplayer_room.exs
[room] started room=lobby tick_rate=20Hz
[presence] users_online=1240
[latency] server_tick_apply_p99=34ms
[liveview] diff_payload_avg_bytes=412
[consistency] divergent_state_events=0 over 10min deterministic run
UI outcome:
- User list updates in near real-time.
- Shared canvas/state updates every tick.
- Reconnect path restores state and presence metadata.
The Core Question You Are Answering
“How do I keep real-time shared state fair and consistent when thousands of clients join, leave, and burst actions?”
Concepts You Must Understand First
- LiveView socket lifecycle and diff model
- Presence/Tracker propagation semantics
- Authoritative server tick loop design
Questions to Guide Your Design
- Which state is authoritative on server vs client?
- What is the max acceptable tick drift?
- How do you recover quickly from reconnect storms?
Thinking Exercise
Design conflict resolution for two simultaneous moves on the same game object from different clients.
The Interview Questions They Will Ask
- “How does LiveView send only partial updates?”
- “Why is Presence not a strict source of truth for billing?”
- “How do you design deterministic tick processing?”
- “What causes memory blowup per connection?”
- “How do you make reconnect experience smooth?”
Hints in Layers
Hint 1 (Starting Point): Keep per-socket assigns minimal and structured.
Hint 2 (Next Level): Batch state changes per tick instead of per event.
Hint 3 (Technical Details): Emit sequence numbers for authoritative updates.
Hint 4 (Tools/Debugging): Compare server state hashes across nodes every interval.
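Hints 2 and 3 together suggest a tick loop that buffers actions and broadcasts one sequenced batch per tick; a sketch (the PubSub name and topic format are assumptions):

```elixir
# Sketch of an authoritative tick loop. Client actions are buffered and
# broadcast as one batch per tick with a gap-free sequence number, so
# clients can detect missed updates and request a resync.
defmodule Room.TickLoop do
  use GenServer

  @tick_ms 50  # 20 Hz

  def init(room) do
    Process.send_after(self(), :tick, @tick_ms)
    {:ok, %{room: room, pending: [], seq: 0}}
  end

  # Actions are only buffered here, never broadcast per event.
  def handle_cast({:action, action}, state) do
    {:noreply, %{state | pending: [action | state.pending]}}
  end

  def handle_info(:tick, state) do
    Process.send_after(self(), :tick, @tick_ms)
    seq = state.seq + 1
    batch = Enum.reverse(state.pending)
    Phoenix.PubSub.broadcast(MyApp.PubSub, "room:#{state.room}", {:tick, seq, batch})
    {:noreply, %{state | pending: [], seq: seq}}
  end
end
```

Coalescing per tick instead of per event is also the fix for the "UI smooth, server overloaded" pitfall below.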
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| LiveView internals | “Programming Phoenix LiveView” | State and lifecycle chapters |
| Real-time architecture | Phoenix docs | Presence + PubSub docs |
Common Pitfalls and Debugging
Problem 1: “UI smooth, server overloaded”
- Why: too many per-event broadcasts and large assigns.
- Fix: coalesce updates by tick and slim assigns.
- Quick test: compare diff bytes and mailbox length before/after.
Problem 2: “Presence ghost users”
- Why: stale metadata window not handled in UI.
- Fix: apply grace timeout and metadata freshness checks.
- Quick test: force disconnect and confirm stale entries expire.
Definition of Done
- Room tick loop is deterministic under replay.
- LiveView memory per connection is measured and documented.
- Presence convergence behavior is validated across nodes.
- Reconnect behavior restores user state correctly.
Project 23: Chaos and Resilience Engineering on BEAM
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Shell
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 5: Master
- Knowledge Area: Reliability and Fault Tolerance
- Software or Tool: chaos scripts, circuit breakers, retry policies
- Main Book: “Release It!” by Michael Nygard
What you will build: A resilience test harness that kills random processes, injects network delay and netsplits, validates breaker and bulkhead behavior, and verifies idempotent command handling.
Why it teaches Phoenix: It proves fault tolerance properties instead of assuming them.
Core challenges you will face:
- Experiment safety design -> maps to controlled chaos practice
- Breaker/bulkhead tuning -> maps to blast-radius control
- Duplicate handling -> maps to exactly-once illusion management
Real World Outcome
$ ./chaos/run_suite.sh
[experiment] random_worker_kill every=5s duration=2m
[experiment] dependency_latency +600ms on payments service
[breaker] payments opened at t+38s
[bulkhead] feed service unaffected p99=112ms
[idempotency] duplicate_command_replays=120 side_effect_duplicates=0
[result] all resilience hypotheses passed
The Core Question You Are Answering
“When systems fail in ugly ways, what still works, what degrades, and what data remains correct?”
Concepts You Must Understand First
- Chaos engineering hypothesis-driven experiments
- Circuit breaker and bulkhead isolation strategy
- Idempotency key design and retry semantics
Questions to Guide Your Design
- Which user-facing SLOs must remain within budget during failures?
- Which dependencies get dedicated bulkheads?
- How long must idempotency keys remain valid?
Thinking Exercise
Write a failure budget table for three dependencies: payments, notifications, analytics. Define expected behavior when each fails.
The Interview Questions They Will Ask
- “How do you run chaos experiments safely in production-like environments?”
- “Why pair circuit breakers with bulkheads?”
- “What does exactly-once illusion mean operationally?”
- “How do you test idempotency under retry storms?”
- “How do you communicate degraded modes to product stakeholders?”
Hints in Layers
Hint 1 (Starting Point): Define hypothesis and stop conditions for every experiment.
Hint 2 (Next Level): Tag every command with idempotency key at ingress.
Hint 3 (Technical Details): Store dedupe outcomes with TTL exceeding max retry horizon.
Hint 4 (Tools/Debugging): Correlate chaos events with SLO graph annotations.
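The idempotency-key mechanics from Hints 2 and 3 can be sketched with ETS (names are illustrative):

```elixir
# Illustrative idempotency guard backed by ETS. The first execution
# records its result under the command's key; replays return the stored
# result without re-running side effects. A production version would
# add TTL expiry past the retry horizon and use :ets.insert_new/2 (or a
# single owner process) to close the lookup/insert race.
defmodule Idempotency do
  def init, do: :ets.new(:idempotency, [:set, :public, :named_table])

  def run(key, fun) do
    case :ets.lookup(:idempotency, key) do
      [{^key, result}] ->
        {:replayed, result}
      [] ->
        result = fun.()
        :ets.insert(:idempotency, {key, result})
        {:executed, result}
    end
  end
end

Idempotency.init()
{:executed, :charged} = Idempotency.run("cmd-91", fn -> :charged end)
{:replayed, :charged} = Idempotency.run("cmd-91", fn -> :charged end)
```

A retry storm against this guard should show many replays and zero duplicate side effects, matching the `side_effect_duplicates=0` outcome above.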
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Resilience patterns | “Release It!” | Stability patterns |
| Chaos methodology | Principles of Chaos Engineering | Entire principles page |
Common Pitfalls and Debugging
Problem 1: “Chaos tests pass but incidents still bad”
- Why: experiments did not reflect real failure modes.
- Fix: derive scenarios from incident history and dependency graphs.
- Quick test: compare experiment catalog against incident postmortems.
Problem 2: “Idempotency store grew uncontrollably”
- Why: TTL too long without cleanup strategy.
- Fix: partition and expire with bounded retention policy.
- Quick test: run 24h simulation and verify bounded table growth.
Definition of Done
- Chaos suite runs automatically with clear stop conditions.
- Breaker and bulkhead behavior is observable and validated.
- Duplicate command replay never creates duplicate side effects.
- Degraded mode behavior is documented for product teams.
Project 24: Advanced Verification Lab (Property-Based, Concurrency, and Distributed Tests)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 4: Expert
- Knowledge Area: Advanced Testing
- Software or Tool: ExUnit, StreamData, multi-node test harness
- Main Book: StreamData docs + ExUnit docs
What you will build: A test harness that validates critical invariants with generated data, race simulations, and controlled partition/heal scenarios.
Why it teaches Phoenix: It upgrades testing from examples to system guarantees under failure and concurrency.
Core challenges you will face:
- Invariant definition quality -> maps to domain correctness
- Deterministic race reproduction -> maps to concurrency confidence
- Multi-node test control -> maps to distributed correctness
Real World Outcome
$ MIX_ENV=test mix test test/advanced --seed 424242
[property] invoice_projection_idempotent: PASS (1000 cases)
[property] global_limit_monotonicity: PASS (500 cases)
[concurrency] race_suite: PASS (32 orchestrated interleavings)
[distributed] partition_reconcile_suite: PASS (10 scripted partitions)
The Core Question You Are Answering
“Can I prove core invariants still hold under random input, race conditions, and network partitions?”
Concepts You Must Understand First
- Property-based testing and shrinking
- Stateful model testing
- Deterministic distributed fault harness design
Questions to Guide Your Design
- Which invariants are safety-critical and must never break?
- Which random generators represent realistic bad input?
- How will failing seeds be persisted and replayed?
Thinking Exercise
Choose one invariant and describe how it could fail under replay, race, and partition separately.
The Interview Questions They Will Ask
- “What makes a property test meaningful?”
- “How do you avoid flaky concurrency tests?”
- “Why store random seeds from failing runs?”
- “How do you test eventual convergence properties?”
- “What should remain example-based even with property tests?”
Hints in Layers
Hint 1 (Starting Point): Start with one invariant that matters to money or data integrity.
Hint 2 (Next Level): Build generators for both valid and malformed inputs.
Hint 3 (Technical Details): Record failing seeds and minimize counterexamples automatically.
Hint 4 (Tools/Debugging): Use structured logs tagged with test run id and seed.
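A property for projection idempotence might look like this with StreamData (a real library; `Projection.apply_event/2` is assumed from your own projection code):

```elixir
# Sketch of a property test with StreamData. Shrinking reduces any
# failing event to a minimal counterexample, and a failing run's seed
# can be replayed with `mix test --seed N`.
defmodule ProjectionPropertyTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  property "replaying an event never changes the read model" do
    check all id <- string(:alphanumeric, min_length: 1),
              version <- positive_integer(),
              total <- integer(0..10_000) do
      event = %{id: id, version: version, total: total}
      once = Projection.apply_event(%{}, event)
      assert Projection.apply_event(once, event) == once
    end
  end
end
```

This one property covers replay; race and partition behavior need the orchestrated interleaving and multi-node suites on top.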
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Property testing | StreamData docs | Generators and properties |
| Test harness design | ExUnit docs | Async, tags, setup/teardown |
Common Pitfalls and Debugging
Problem 1: “Property tests pass but bug still escaped”
- Why: generators did not represent critical edge cases.
- Fix: enrich generators with boundary and adversarial distributions.
- Quick test: mutation run that intentionally breaks invariant path.
Problem 2: “Distributed tests flaky”
- Why: timing-dependent assertions without deterministic control.
- Fix: scripted barriers and explicit convergence windows.
- Quick test: run suite 20x with fixed seed and compare outcomes.
Definition of Done
- At least three high-value invariants are property-tested.
- Concurrency race suite is deterministic and repeatable.
- Distributed partition tests validate reconciliation behavior.
- Failing seeds are captured and replayable in CI.
Project 25: Production SaaS Reliability Platform for Phoenix
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: SQL, Shell
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Production-Grade SaaS Patterns
- Software or Tool: Oban, scheduler/cron coordination, deployment gates
- Main Book: “Release It!” by Michael Nygard
What you will build: A SaaS operations layer implementing soft deletion, audit logging, background job orchestration, distributed cron safety, blue/green deploy controls, graceful shutdown, and observability-first defaults.
Why it teaches Phoenix: This is the layer that converts a functional app into a durable business system.
Core challenges you will face:
- Compliance-ready audit design -> maps to enterprise trust
- Distributed scheduler correctness -> maps to operational integrity
- Deploy/shutdown safety -> maps to production reliability
Real World Outcome
$ mix run priv/ops/saas_readiness_demo.exs
[audit] account.updated actor=user_18 recorded immutable=true
[soft_delete] customer_91 marked deleted_at=2026-02-12T09:12:00Z
[jobs] retry queue healthy failed_jobs=0
[cron] leader lock acquired node=app@node2
[deploy] blue->green switch complete error_rate=0.2%
[shutdown] drained_http=132 drained_jobs=17 lost_work=0
The Core Question You Are Answering
“Can this Phoenix system survive real SaaS operational demands without data loss, compliance gaps, or fragile deploys?”
Concepts You Must Understand First
- Audit event schema design
- Background orchestration and retry semantics
- Blue/green and graceful drain mechanics
Questions to Guide Your Design
- Which actions must always produce audit events?
- How do you guarantee one active distributed cron leader?
- What shutdown timeout balances safety and availability?
Thinking Exercise
Draft an outage scenario during deploy and define how blue/green plus graceful drain should prevent user-visible errors.
The Interview Questions They Will Ask
- “How do you implement soft delete without creating query bugs?”
- “What belongs in an audit event payload?”
- “How do you avoid duplicate cron jobs across nodes?”
- “What is your graceful shutdown sequence?”
- “How do observability defaults reduce MTTR?”
Hints in Layers
Hint 1 (Starting Point): Define write-path hooks for audit logging first.
Hint 2 (Next Level): Add tenant-aware job idempotency keys.
Hint 3 (Technical Details): Implement deploy gate pipeline with rollback trigger conditions.
Hint 4 (Tools/Debugging): Tag logs with deployment_id and tenant_id for incident slicing.
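Centralized soft-delete scoping can be sketched at the repository boundary (schema and field names are assumptions, not a library API):

```elixir
# Illustrative repository-boundary scoping for soft delete. All read
# helpers go through active/1, so no ad-hoc query path can forget the
# deleted_at filter -- the root cause of the leak pitfall below.
defmodule MyApp.Customers do
  import Ecto.Query

  defp active(queryable), do: where(queryable, [c], is_nil(c.deleted_at))

  def list(repo), do: repo.all(active(MyApp.Customer))
  def get(repo, id), do: repo.get(active(MyApp.Customer), id)

  # Deletion is an update, preserving the row for audit and restore.
  def soft_delete(repo, customer) do
    customer
    |> Ecto.Changeset.change(deleted_at: DateTime.truncate(DateTime.utc_now(), :second))
    |> repo.update()
  end
end
```

Running the full endpoint list against soft-deleted fixtures then verifies that every path really goes through the scoped builders.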
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Production stability | “Release It!” | Stability and release chapters |
| Operational design | 12factor.net | Config, logs, process principles |
Common Pitfalls and Debugging
Problem 1: “Soft deleted records still appear”
- Why: one query path bypassed default scope.
- Fix: centralize scoped query builders.
- Quick test: run full endpoint list against soft-deleted fixtures.
Problem 2: “Cron ran twice after failover”
- Why: lock expiration and handover race.
- Fix: tighten lock semantics and handover jitter.
- Quick test: force leader restart and inspect execution count.
Definition of Done
- All critical mutations emit immutable audit events.
- Soft-delete restore and retention policies are tested.
- Distributed cron executes exactly once per schedule window.
- Blue/green rollout and graceful shutdown drills pass.
Project 26: Elite Runtime Primitives Workshop (Custom Behaviour, PubSub Adapter, Queue, CRDT, and TCP Protocol Server)
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Elite Mode Topics
- Software or Tool: custom OTP behaviour, custom adapter implementations
- Main Book: OTP Design Principles
What you will build: A package of mini runtime primitives: custom OTP behaviour, boot hooks, custom PubSub adapter, lightweight distributed key-value layer, rate limiter library, Oban-lite queue, minimal CRDT, and custom TCP command protocol server.
Why it teaches Phoenix: Building mini runtime components reveals where framework abstractions come from and where they can be extended safely.
Core challenges you will face:
- Callback contract rigor -> maps to reusable runtime APIs
- Adapter compatibility -> maps to pluggable architecture
- Protocol correctness under pressure -> maps to system integrity
Real World Outcome
$ mix run priv/labs/elite_primitives_demo.exs
[behaviour] custom limiter adapter loaded
[boot_hook] warm cache seeded entries=5000
[pubsub_adapter] local fanout + cross-node relay active
[queue] retry/deadletter flow validated
[crdt] merge conflict resolved deterministic=true
[tcp] protocol server handled 20k commands/min without frame errors
The Core Question You Are Answering
“Can I design runtime primitives with explicit contracts, measurable behavior, and safe failure semantics?”
Concepts You Must Understand First
- OTP behaviour contract design
- Adapter and abstraction boundaries
- Protocol framing and backpressure handling
Questions to Guide Your Design
- What invariants must every adapter preserve?
- How will boot hook ordering be validated?
- Which protocol errors must force connection close?
Thinking Exercise
Write a compatibility checklist for swapping one PubSub adapter implementation for another without code changes at call sites.
The Interview Questions They Will Ask
- “What makes a custom OTP behaviour useful rather than overengineering?”
- “How do you design stable callback contracts?”
- “How would you build a minimal distributed queue safely?”
- “Why is protocol framing design critical for TCP servers?”
- “How do you test adapter interchangeability?”
Hints in Layers
Hint 1 (Starting Point): Define interfaces and invariants in docs before implementation.
Hint 2 (Next Level): Build one reference adapter and one alternative adapter.
Hint 3 (Technical Details): Add compliance test suite each adapter must pass.
Hint 4 (Tools/Debugging): Use protocol fuzzing for frame parser hardening.
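A custom behaviour with adapter dispatch might be sketched like this (module names and the config key are illustrative):

```elixir
# Sketch of a custom behaviour plus one adapter. Call sites use
# RateLimiter only; swapping adapters is a config change, and every
# adapter must pass the same compliance suite against these callbacks.
defmodule RateLimiter do
  @callback allow?(key :: term(), limit :: pos_integer()) :: boolean()
  @callback reset(key :: term()) :: :ok

  # Dispatch to the configured adapter so call sites never name one.
  def allow?(key, limit), do: adapter().allow?(key, limit)
  def reset(key), do: adapter().reset(key)

  defp adapter, do: Application.get_env(:my_app, :rate_limiter, RateLimiter.AlwaysAllow)
end

defmodule RateLimiter.AlwaysAllow do
  @behaviour RateLimiter
  @impl true
  def allow?(_key, _limit), do: true
  @impl true
  def reset(_key), do: :ok
end

true = RateLimiter.allow?("user:1", 10)
```

The compliance suite from Hint 3 then runs once per adapter module, which is also how you test interchangeability in interviews and in CI.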
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Behaviour design | OTP design principles | Behaviours and applications |
| Protocol engineering | TCP/IP references | framing and reliability sections |
Common Pitfalls and Debugging
Problem 1: “Adapters technically work but metrics mismatch”
- Why: inconsistent telemetry schema across implementations.
- Fix: define shared metric contract in behaviour spec.
- Quick test: swap adapters and compare dashboard continuity.
Problem 2: “TCP parser fails under burst input”
- Why: frame boundary assumptions invalid under partial reads.
- Fix: implement buffered incremental parser.
- Quick test: replay fragmented and concatenated frame streams.
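The buffered incremental parser from the fix can be sketched for one framing choice — a 2-byte big-endian length prefix is assumed here, not prescribed. It pulls every complete frame out of whatever bytes have arrived and returns the leftover as the new buffer, so partial reads never break frame boundaries:

```elixir
# Incremental parser for 2-byte length-prefixed frames.
# Partial reads stay in the buffer until the rest of the frame arrives.
defmodule FrameParser do
  # Returns {complete_frames, remaining_buffer}.
  def parse(buffer, frames \\ [])

  # A full frame is available: extract it and keep parsing the rest.
  def parse(<<len::16, payload::binary-size(len), rest::binary>>, frames) do
    parse(rest, [payload | frames])
  end

  # Incomplete frame: keep the bytes as the carry-over buffer.
  def parse(rest, frames), do: {Enum.reverse(frames), rest}
end
```

The quick test above maps directly onto this: feed `<<0, 5, "hel">>` (a fragment), get `{[], <<0, 5, "hel">>}` back, then prepend that buffer to the next chunk and parse again.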
Definition of Done
- Custom behaviour has clear callback and invariant docs.
- Multiple adapters pass same compliance test suite.
- Queue/CRDT/protocol primitives handle fault scenarios.
- All components emit unified telemetry schema.
Project 27: Insane Capstone - Multi-Node Eventually Consistent Real-Time Collaborative Platform
- File: LEARN_PHOENIX_FRAMEWORK_DEEP_DIVE.md
- Main Programming Language: Elixir
- Alternative Programming Languages: JavaScript (k6), Rust (optional native paths)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Full-Stack Distributed Phoenix + BEAM Internals
- Software or Tool: LiveView, CRDT, ETS, distributed rate limiting, chaos tools, telemetry stack
- Main Book: Composite reading across all prior references
What you will build: A multi-node, eventually consistent, real-time collaborative system with LiveView, CRDT-based shared state, ETS caching, distributed rate limiting, hot upgrade simulation, chaos testing, telemetry dashboards, and load testing toward 50k+ concurrent users.
Why it teaches Phoenix: It is the full demonstration of BEAM runtime mastery plus production operations discipline.
Core challenges you will face:
- Convergence under partition -> maps to CRDT and distributed design
- Tail latency under high concurrency -> maps to runtime and performance engineering
- Upgrade/chaos resilience -> maps to release and reliability mastery
Real World Outcome
$ ./ops/capstone_runbook.sh
[stage-1] cluster boot nodes=6 regions=2
[stage-2] collaborative rooms active=1200 users=50874
[stage-3] induced netsplit between region_a and region_b (45s)
[stage-4] conflict merges applied=18492 unresolved=0
[stage-5] rolling hot-upgrade v1.9 -> v2.0 completed
[stage-6] chaos kill random room supervisors every 10s (5m)
[stage-7] load test complete ws_p99=143ms http_p99=201ms error_rate=0.44%
[success] SLO targets met, convergence verified, rollback rehearsal passed
User-visible product behavior:
- Multiple users edit the same shared canvas/document with near real-time sync.
- Presence shows participants accurately with bounded staleness.
- During partition, writes continue with eventual convergence once healed.
- During rolling upgrade, active sessions remain usable.
The Core Question You Are Answering
“Can I design and operate a real-time distributed Phoenix platform that stays correct, observable, and resilient when everything goes wrong at once?”
Concepts You Must Understand First
- BEAM scheduler/memory/mailbox diagnostics
- OTP architecture and distributed ownership
- CRDT convergence and partition policy
- Telemetry/tracing/profiling under scale
- Release safety and chaos resilience patterns
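CRDT convergence is easiest to internalize with the simplest member of the family, a state-based G-Counter: each node increments only its own slot, and merge is an element-wise max, which is commutative, associative, and idempotent, so replicas converge in any merge order. A minimal sketch:

```elixir
# State-based G-Counter: per-node slots, merge = element-wise max.
# Merge is commutative, associative, and idempotent, so replicas
# converge regardless of the order or number of merges.
defmodule GCounter do
  def new, do: %{}

  # Each node may only increment its own slot.
  def increment(counter, node_id, by \\ 1) do
    Map.update(counter, node_id, by, &(&1 + by))
  end

  def merge(a, b), do: Map.merge(a, b, fn _node, x, y -> max(x, y) end)

  def value(counter), do: counter |> Map.values() |> Enum.sum()
end
```

The capstone's shared canvas needs richer types (LWW registers, sequence CRDTs for text), but they all rest on this same merge discipline.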
Questions to Guide Your Design
- What are hard invariants vs eventually consistent invariants?
- Which degraded modes are acceptable and for how long?
- What metrics decide automated rollback during upgrade?
Thinking Exercise
Design a “worst 15 minutes” incident timeline combining partition + dependency latency + rolling upgrade, and specify expected system behavior minute by minute.
The Interview Questions They Will Ask
- “How does your system behave during split brain and how do you prove convergence?”
- “Where did you apply backpressure and why there?”
- “What did your chaos tests reveal that normal testing missed?”
- “How did you design distributed rate limiting semantics?”
- “How do you decide whether to rollback or continue rollout?”
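For the rate-limiting question, a common building block is a per-node token bucket on an atomic ETS counter; distribution then becomes a policy decision about how the global budget is split across nodes. A sketch under those assumptions — the module name and refill policy are illustrative, not a complete limiter:

```elixir
# Per-node token bucket: an atomic ETS counter with a lower clamp,
# so concurrent callers can never overspend. A periodic refill task
# would restore the budget; in a cluster each node gets a share of
# the global capacity.
defmodule TokenBucket do
  def new(capacity) do
    table = :ets.new(:bucket, [:public])
    :ets.insert(table, {:tokens, capacity})
    {table, capacity}
  end

  # Spend one token atomically; the counter is clamped at -1 so an
  # empty bucket keeps reporting :rate_limited instead of drifting
  # unboundedly negative under load.
  def take({table, _capacity}) do
    case :ets.update_counter(table, :tokens, {2, -1, -1, -1}) do
      -1 -> :rate_limited
      _ -> :ok
    end
  end

  def refill({table, capacity}), do: :ets.insert(table, {:tokens, capacity})
end
```

Because `:ets.update_counter/3` is atomic, thousands of concurrent processes can call `take/1` without a serializing GenServer in the hot path.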
Hints in Layers
Hint 1: Starting Point Define explicit SLOs and invariants before implementation.
Hint 2: Next Level Integrate one subsystem at a time with observability gates.
Hint 3: Technical Details Use deterministic replay scenarios for partition and conflict tests.
Hint 4: Tools/Debugging Store incident timeline artifacts for every chaos + load run.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Distributed correctness | “Designing Data-Intensive Applications” | Replication and consistency |
| Resilience under load | “Release It!” + “Systems Performance” | Stability and measurement chapters |
Common Pitfalls and Debugging
Problem 1: “Convergence eventually, but users saw confusing state jumps”
- Why: conflict resolution policy not communicated or visualized in UI.
- Fix: expose merge/conflict indicators and operation timeline.
- Quick test: run scripted conflicting edits and user-journey review.
Problem 2: “Upgrades succeed in staging but fail under real load”
- Why: state migration logic untested with high mailbox pressure.
- Fix: run upgrade drill during active load/chaos scenario.
- Quick test: compare migration error rate in idle vs stressed conditions.
Definition of Done
- A 50k+ concurrent-session scenario is executed with reproducible scripts.
- Partition and heal scenarios converge within target recovery window.
- Rolling upgrade and rollback drills pass under active traffic.
- Chaos suite validates breaker, bulkhead, and idempotency invariants.
- SLO dashboard proves latency/error targets are met at target scale.