Project 22: LiveView and Presence Internals for Multiplayer Real-Time Systems

Build an authoritative tick-based multiplayer room with LiveView diff optimization and Presence state propagation.

Quick Reference

Attribute                          Value
Difficulty                         Master
Time Estimate                      3 weeks
Main Programming Language          Elixir
Alternative Programming Languages  Erlang, Shell/JS/Rust (as needed)
Coolness Level                     Level 4-5
Business Potential                 Resume Gold to Open Core
Prerequisites                      Projects 17-18 and Phoenix real-time basics
Key Topics                         LiveView internals, Presence, deterministic ticks

1. Learning Objectives

By completing this project, you will:

  1. Build a production-like subsystem with explicit failure semantics.
  2. Define measurable success criteria using metrics, traces, and logs.
  3. Validate behavior under stress and fault scenarios.
  4. Document architecture and operational tradeoffs clearly.

2. All Theory Needed (Per-Concept Breakdown)

Core Concepts

  • Socket lifecycle: per-connection memory and process budget
  • Presence convergence: metadata staleness handling
  • Deterministic state sync: tick ordering and conflict policy
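The third concept, tick ordering with a conflict policy, can be sketched as a pure reducer. This is a minimal illustration rather than LiveView or Presence code: the module name TickSketch, the input shape, and the "first claim in sorted order wins" tiebreak are all assumptions chosen for the example.

```elixir
defmodule TickSketch do
  # Hypothetical room state: the tick counter plus a map of cell => owner.
  def new, do: %{tick: 0, cells: %{}}

  # Inputs for one tick may arrive in any order. Sorting by {player, seq}
  # before applying them gives every replica the same result.
  # Conflict policy (assumed): the first claim in sorted order keeps the cell.
  def step(state, inputs) do
    cells =
      inputs
      |> Enum.sort_by(fn %{player: p, seq: s} -> {p, s} end)
      |> Enum.reduce(state.cells, fn %{player: p, cell: c}, acc ->
        Map.put_new(acc, c, p)
      end)

    %{state | tick: state.tick + 1, cells: cells}
  end
end

inputs = [
  %{player: "b", seq: 1, cell: {0, 0}},
  %{player: "a", seq: 1, cell: {0, 0}},
  %{player: "b", seq: 2, cell: {1, 1}}
]

forward = TickSketch.step(TickSketch.new(), inputs)
backward = TickSketch.step(TickSketch.new(), Enum.reverse(inputs))
IO.inspect(forward == backward, label: "same state regardless of arrival order")
```

A fixed tiebreak like this is deterministic but not fair, since the lowest player id always wins. Rotating the priority by tick number is one possible alternative.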

Mental Model Diagram

Requirements -> Architecture -> Implementation -> Instrumentation -> Fault Injection -> Validation

How It Works (Step-by-Step)

  1. Define invariant and SLO targets.
  2. Build smallest working path for core behavior.
  3. Add instrumentation for correctness and latency.
  4. Introduce realistic load and failure scenarios.
  5. Compare observed behavior against stated invariants.

Minimal Concrete Example

project_status:
  phase: verify
  slo_latency_p99_ms: within_budget
  invariant_violations: 0

Common Misconceptions

  • Happy-path success means production readiness.
  • Average latency is enough to judge reliability.
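To see why the second misconception fails, compare the mean with a tail percentile on the same samples. The sketch below uses the nearest-rank definition of a percentile; the module name LatencySketch and the sample values are made up for illustration.

```elixir
defmodule LatencySketch do
  # Nearest-rank percentile: the smallest sample such that at least
  # p percent of all samples are less than or equal to it.
  def percentile(samples, p)
      when is_list(samples) and samples != [] and is_integer(p) and p in 1..100 do
    sorted = Enum.sort(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, rank - 1)
  end

  def mean(samples), do: Enum.sum(samples) / length(samples)
end

# 98 fast requests and 2 slow ones, in milliseconds (invented data).
samples = List.duplicate(10, 98) ++ [900, 1100]

IO.inspect(LatencySketch.mean(samples), label: "mean_ms")
IO.inspect(LatencySketch.percentile(samples, 99), label: "p99_ms")
```

With these samples the mean stays under 30 ms while the p99 is 900 ms, so a dashboard that only shows the average would hide the slow requests entirely.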

Check-Your-Understanding Questions

  1. Which invariant is most important for this project?
  2. What failure mode can violate it?
  3. What metric proves the fix worked?

Check-Your-Understanding Answers

  1. The invariant tied to safety/data integrity.
  2. The failure mode that bypasses backpressure/retry/consistency policy.
  3. A stable metric delta across repeatable scenarios.

Real-World Applications

  • Multi-tenant SaaS backends
  • Real-time collaboration systems
  • Distributed workflow services

Key Insight

Model the failure path first, then optimize the happy path.


3. Project Specification

3.1 What You Will Build

Build an authoritative tick-based multiplayer room with LiveView diff optimization and Presence state propagation.

3.2 Functional Requirements

  1. Implement the primary runtime behavior: an authoritative, tick-based room that applies player inputs in a deterministic order.
  2. Expose observability signals required to validate correctness.
  3. Include one controlled fault scenario and recovery flow.

3.3 Non-Functional Requirements

  • Performance: maintain target p99 latency under defined load.
  • Reliability: recover from injected fault within recovery budget.
  • Operability: produce clear dashboards/log traces for diagnosis.

3.4 Example Usage / Output

deterministic_outcome:

  • deterministic replay yields same state hash
  • memory per connection measured and bounded
  • reconnect and presence churn handled cleanly
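The first outcome, a replay that yields the same state hash, could be checked along these lines. This is a sketch under assumptions: the event shape {tick, player, operation}, the module name ReplaySketch, and the choice of :erlang.phash2/1 as the hash are illustrative, not requirements of the project.

```elixir
defmodule ReplaySketch do
  # Hypothetical log entries: {tick, player, {:add, n}} or {tick, player, {:set, n}}.
  def apply_event({_tick, player, {:add, n}}, state),
    do: Map.update(state, player, n, &(&1 + n))

  def apply_event({_tick, player, {:set, n}}, state),
    do: Map.put(state, player, n)

  # Sorting the whole tuple orders events by tick, then player, then
  # operation, so the result does not depend on recording order.
  def replay(events) do
    events
    |> Enum.sort()
    |> Enum.reduce(%{}, &apply_event/2)
  end

  # Hash a canonical form (sorted key/value pairs) rather than the map itself.
  def state_hash(state) do
    state |> Enum.sort() |> :erlang.phash2()
  end
end

log = [
  {1, "a", {:add, 5}},
  {2, "a", {:set, 1}},
  {3, "a", {:add, 2}},
  {1, "b", {:add, 4}}
]

h1 = log |> ReplaySketch.replay() |> ReplaySketch.state_hash()
h2 = log |> Enum.shuffle() |> ReplaySketch.replay() |> ReplaySketch.state_hash()
IO.inspect(h1 == h2, label: "replay hashes match")
```

Because :set and :add do not commute, the hashes only match when replay imposes a total order on the events. That ordering is the property the lab script needs to demonstrate.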

3.5 Edge Cases

  • Burst traffic and queue growth
  • Partial dependency failure
  • Restart during active workload

4. Solution Architecture

4.1 High-Level Design

Ingress -> Domain Process -> Storage/Cache -> Async Pipeline -> Observability -> Operator Actions

4.2 Key Components

Component            Responsibility                       Key Decision
Ingress Handler      Admits work and validates contracts  Fast-fail invalid input
Domain Workers       Execute core state transitions       Supervised and isolated
Observability Layer  Emits metrics/traces/logs            Low-cardinality schema

4.3 Data Structures (No Full Code)

  • Command envelope: idempotency key, tenant, correlation id
  • State snapshot: versioned domain state
  • Metric tags: route, operation, outcome class

4.4 Algorithm Overview

  1. Validate command and current state.
  2. Apply transition or reject with explicit reason.
  3. Persist and publish state changes.
  4. Emit telemetry and evaluate SLO impact.
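The four steps above can be sketched with a pure transition function wrapped by a handler that takes its side effects as callbacks. Everything named here is hypothetical: the :move command, the 10x10 board, and the publish/emit callbacks, which stand in for whatever broadcast and telemetry calls the project actually uses.

```elixir
defmodule TransitionSketch do
  # Steps 1 and 2: validate the command against current state, then apply
  # it or reject it with an explicit reason.
  def transition(state, {:move, player, pos}) do
    cond do
      not Map.has_key?(state.players, player) ->
        {:error, :unknown_player}

      not in_bounds?(pos) ->
        {:error, :out_of_bounds}

      true ->
        players = Map.put(state.players, player, pos)
        {:ok, %{state | players: players, version: state.version + 1}}
    end
  end

  # Steps 3 and 4: pass the new state to a publish callback and report the
  # outcome to an emit callback. Both callbacks are placeholders.
  def handle(state, command, publish, emit) do
    case transition(state, command) do
      {:ok, next} ->
        publish.(next)
        emit.(%{outcome: :ok})
        next

      {:error, reason} ->
        emit.(%{outcome: :rejected, reason: reason})
        state
    end
  end

  # Assumed board size: coordinates 0..9 on both axes.
  defp in_bounds?({x, y}) when x in 0..9 and y in 0..9, do: true
  defp in_bounds?(_), do: false
end

log = fn term -> IO.inspect(term) end

state = %{version: 0, players: %{"a" => {0, 0}}}
state = TransitionSketch.handle(state, {:move, "a", {3, 4}}, log, log)
state = TransitionSketch.handle(state, {:move, "a", {42, 0}}, log, log)
IO.inspect(state.version, label: "version after one accepted, one rejected")
```

Keeping transition/2 free of side effects means the same function can be reused for deterministic replay and unit tests.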

5. Implementation Guide

5.1 Development Environment Setup

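  • Install dependencies: mix deps.get
  • Set up the database if the project uses one (with Ecto, mix ecto.create followed by mix ecto.migrate).
  • Run the baseline test suite: mix test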

5.2 Project Structure

  • lib/my_app//runtime.ex
  • lib/my_app//supervisor.ex
  • lib/my_app//telemetry.ex
  • priv/labs/runbook.exs

5.3 The Core Question You Are Answering

How do I keep shared real-time state fair, consistent, and scalable?

5.4 Concepts You Must Understand First

  • BEAM process model and failure isolation
  • OTP supervision semantics
  • Runtime observability discipline

5.5 Questions to Guide Your Design

  1. What is your safety invariant?
  2. What is your bounded recovery target?
  3. What is your fallback behavior under stress?

5.6 Milestones

  1. Baseline behavior validated.
  2. Instrumentation and dashboards available.
  3. Fault injection and recovery validated.
  4. Final report with tradeoffs and next steps.

6. Validation and Testing

6.1 Test Strategy

  • Unit tests for core transitions
  • Integration tests for runtime flows
  • Fault scenario tests for recovery
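A fault scenario test can start as small as killing a supervised process and checking what comes back. The sketch below uses only OTP primitives. RoomSketch and FaultLab are hypothetical names, and the counter stands in for real room state.

```elixir
defmodule RoomSketch do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def tick, do: GenServer.call(__MODULE__, :tick)

  @impl true
  def init(:ok), do: {:ok, 0}

  @impl true
  def handle_call(:tick, _from, n), do: {:reply, n + 1, n + 1}
end

defmodule FaultLab do
  # Polls until `name` is registered to a pid other than `old_pid`.
  def await_restart(name, old_pid, attempts \\ 50)
  def await_restart(_name, _old_pid, 0), do: {:error, :not_restarted}

  def await_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _ ->
        Process.sleep(10)
        await_restart(name, old_pid, attempts - 1)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([RoomSketch], strategy: :one_for_one)
1 = RoomSketch.tick()
2 = RoomSketch.tick()

# Inject the fault: kill the room and wait until it is confirmed down.
old = Process.whereis(RoomSketch)
ref = Process.monitor(old)
Process.exit(old, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
after
  1_000 -> raise "room did not exit"
end

{:ok, new} = FaultLab.await_restart(RoomSketch, old)
IO.inspect(new != old, label: "restarted with a new pid")
IO.inspect(RoomSketch.tick(), label: "tick after restart")
```

The restarted process begins from its initial state, so the counter starts over. A recovery flow therefore has to reload or rebuild room state rather than rely on the supervisor alone.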

6.2 Verification Steps

  • Run the full test suite (mix test).
  • Run the deterministic lab script and confirm that repeated runs produce the same state hash.

6.3 Definition of Done

  • Core functionality works on reference scenarios.
  • Failure path behavior matches design policy.
  • SLO metrics are captured and reproducible.
  • Findings are documented with evidence.

7. Production Hardening Checklist

  • Alert thresholds and runbook entries exist.
  • Rollback/degrade strategy documented.
  • Capacity assumptions verified by test.
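One way to verify a capacity assumption by test is to measure memory per process directly. Each LiveView connection runs in its own process, so a rough per-connection budget can be estimated the same way. The sketch below spawns plain processes holding a list as stand-in state. CapacitySketch and its parameters are hypothetical, and real LiveView processes will differ in size.

```elixir
defmodule CapacitySketch do
  # Spawns `n` processes that each hold a list of `list_len` integers and
  # returns the average of Process.info(pid, :memory) in bytes.
  def avg_bytes(n, list_len) do
    parent = self()

    pids =
      for _ <- 1..n do
        spawn(fn ->
          state = Enum.to_list(1..list_len)
          send(parent, {:ready, self()})

          # Keep the state alive until told to stop.
          receive do
            :stop -> length(state)
          end
        end)
      end

    # Wait until every process has built its state before measuring.
    Enum.each(pids, fn pid ->
      receive do
        {:ready, ^pid} -> :ok
      end
    end)

    total =
      pids
      |> Enum.map(fn pid ->
        {:memory, bytes} = Process.info(pid, :memory)
        bytes
      end)
      |> Enum.sum()

    Enum.each(pids, &send(&1, :stop))
    div(total, n)
  end
end

IO.inspect(CapacitySketch.avg_bytes(100, 1_000), label: "avg bytes per process")
```

Note that Process.info(pid, :memory) does not count large binaries stored off the process heap, so payload-heavy state may need a separate measurement.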
  • Security and tenant boundaries reviewed.

8. Interview Deep Dive

  1. What tradeoff did you choose and why?
  2. How do you prove correctness under failure?
  3. Which metric is your leading indicator of incident risk?

9. Extensions

  • Add stricter invariants and adversarial load profiles.
  • Add multi-region network-latency simulation.
  • Add automated CI gates for regression thresholds.

10. Resources

  • https://hexdocs.pm/phoenix/overview.html
  • https://hexdocs.pm/telemetry/
  • https://www.erlang.org/doc/