Sprint: Jido Elixir AI Agents Mastery - Real World Projects
Goal: Build first-principles mastery of agent engineering in Elixir using the Jido ecosystem (jido, jido_ai, req_llm) and core BEAM capabilities. You will learn to design deterministic agent cores, model side effects as directives, and run autonomous workflows on fault-tolerant supervision trees. You will also learn how to combine reasoning strategies (ReAct, Chain-of-Thought, Tree-of-Thoughts, Graph-of-Thoughts, Adaptive) with robust tool contracts, streaming, observability, and cost controls. By the end, you will be able to design, test, and operate production-style AI agent systems that recover from failure, survive node instability, and remain explainable under pressure.
Introduction
- What is this topic? It is the intersection of LLM systems engineering and BEAM-native reliability engineering, centered on Jido's v2 architecture (Actions with run/2 + StateOps + Directives + AgentServer runtime execution).
- What problem does it solve today? It closes the gap between "demo agents" and production systems by combining deterministic core logic (Actions returning results, StateOps for in-strategy state mutations, and Directives for external effects) with asynchronous side-effect execution, supervision, and telemetry.
- What will you build? 20 projects that progress from single-agent tool use to distributed multi-agent autonomous systems with safety, persistence, and release discipline.
- In scope: Jido Action system and Plan DAGs, Jido.AI strategy patterns, ReqLLM multi-provider abstraction, Plugin/Skill composition, Signal Bus/Router/Dispatch/Journal, BEAM supervision/distribution, production operations.
- Out of scope: training foundation models from scratch, full MLOps platform design, non-BEAM runtime internals.
+------------------------------+
| Human / API |
| Goals, Constraints, |
| Approval Decisions |
+---------------+--------------+
|
v
+--------------------+ +-----------+---------------+ +------------------------+
| Signal Bus/Router +------>+ Jido Agent Core +------>+ Directive Queue |
| HTTP/PubSub/Cron | | Action.run(params, ctx) | | LLMStream/ToolExec/ |
| Dispatch adapters | | -> {:ok, result} | | LLMGenerate/LLMEmbed/ |
+--------------------+ | -> {:ok, result, dirs} | | EmitToolError/Emit |
| StateOps: Set/Replace/ | +-----------+------------+
| Delete/SetPath/DelPath | |
+-----------+---------------+ v
| +---------+-----------+
| | Runtime Executor |
| | AgentServer GenSrv |
| +---------+-----------+
| |
v v
+-----------+-------------+ +-----------+-----------+
| Agent State Snapshot | | External Effects |
| status, memory, metrics | | LLM/API/DB/Tools |
+-------------------------+ +-----------+-----------+
|
v
+------------+-----------+
| Observability + Cost |
| Telemetry + Usage + SLA |
+-------------------------+
How to Use This Guide
- Read the Theory Primer before building projects. The projects assume the mental models from that section.
- Choose one learning path in Recommended Learning Paths based on your goal.
- Build each project with a strict Definition of Done and collect evidence (logs, traces, deterministic transcripts).
- Treat every project as a production rehearsal: include failure tests, timeout paths, and rollback behavior.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
- Elixir basics: modules, pattern matching, structs, processes, OTP applications.
- HTTP and JSON fundamentals, API auth keys, and schema validation basics.
- Basic LLM API familiarity (prompts, context windows, temperature, token budgets).
- Recommended Reading: “Designing Elixir Systems with OTP” by James Edward Gray II and Bruce A. Tate.
Helpful But Not Required
- Phoenix + LiveView basics (you will use this in projects 4 and 15).
- Distributed Erlang fundamentals (you will learn this deeply in projects 13 and 20).
- Telemetry/OpenTelemetry familiarity (you will apply this in projects 6, 17, and 20).
Self-Assessment Questions
- Can you explain why “process per request” can work on BEAM but fails on many thread-based runtimes?
- Can you design a JSON-schema-like contract for a tool call and reason about invalid arguments?
- Can you describe how a supervisor should react to repeated child crashes?
Development Environment Setup
Required Tools:
- Elixir ~> 1.17
- Erlang/OTP ~> 27 or newer
- GitHub CLI (gh) for repository inspection
- Docker (recommended for reproducible service dependencies)
Recommended Tools:
- Phoenix + LiveView stack for interactive projects
- OpenTelemetry collector for local observability experiments
- A managed or self-hosted Redis/Postgres pair for selected projects
Testing Your Setup:
$ elixir --version
Erlang/OTP 27 or 28
Elixir 1.17.x or newer
$ gh --version
gh version 2.x
$ mix --version
Mix 1.17.x (compiled with Erlang/OTP 27+)
Time Investment
- Simple projects: 4-8 hours each
- Moderate projects: 10-20 hours each
- Complex projects: 20-40 hours each
- Total sprint: 4-8 months part-time
Important Reality Check Agent systems fail in ways CRUD systems do not: loops, tool misuse, stale context, token overrun, and cascading retries. You will learn fastest by intentionally injecting failures and proving recovery behavior. If you only test happy-path prompts, you will not build production intuition.
Big Picture / Mental Model
A Jido-based system should be understood as two coupled but separated machines:
- Machine A (deterministic): Actions implement run(params, context) returning {:ok, result}, {:ok, result, directives}, or {:error, error}. StateOps (SetState, ReplaceState, DeleteKeys, SetPath, DeletePath) handle in-strategy state mutations. This layer is testable without processes.
- Machine B (effectful): The AgentServer GenServer processes signals, routes to strategies, and executes directives (LLMStream, ToolExec, LLMGenerate, LLMEmbed, EmitToolError, EmitRequestError) under supervision. Outcomes re-enter as signals.
Machine A (Deterministic) Machine B (Effectful)
+------------------------------------------------+ +----------------------------------------+
| Inputs: signal/action params + context | | Inputs: directives + runtime context |
| | | |
| 1) Action.run(params, context) | | 1) call LLM/provider/tool |
| 2) Schema validation pipeline | | 2) spawn/stop child processes |
| (before_validate -> schema -> | | 3) schedule/cancel delayed work |
| after_validate -> run -> | | 4) emit outcome signals via Bus |
| validate_output -> output_schema) | | |
| 3) StateOps for in-strategy state changes | | AgentServer processes signals and |
| 4) Directives for external effects (data only) | | routes to strategies via Router |
| | | |
| Output: {:ok, result} | {:ok, result, dirs} | | Output: external effects + feedback |
+-------------------+----------------------------+ +------------------+---------------------+
| |
+-------------------------feedback signals-----------+
This split is the key to scaling complexity: deterministic logic stays readable and testable, while runtime behavior remains observable and controllable by OTP semantics. The Jido.Exec module provides the execution engine with timeout, retries, and backoff for running Action-based Plans (DAGs of Instructions with dependency resolution).
Theory Primer
Concept 1: Deterministic Agent Core (Actions, StateOps, Directives) and AgentServer Runtime
Fundamentals
Jido formalizes an important separation that many agent frameworks blur: state transition logic is deterministic and explicit, while side effects are deferred and described as directives. In v2, the core contract is built around the Jido.Action behaviour (defined in the jido_action package): actions implement run(params, context) and return {:ok, result}, {:ok, result, directives}, or {:error, error}. State mutations within a strategy are handled by StateOps (Jido.Agent.StateOp.SetState, ReplaceState, DeleteKeys, SetPath, DeletePath – with helper constructors StateOp.set_state/1, StateOp.replace_state/1, StateOp.delete_keys/1, StateOp.set_path/2, StateOp.delete_path/1). The Jido.Agent.StateOps module provides apply_result/2 for deep-merging action results and apply_state_ops/2 which separates StateOps from external directives: it reduces a list of structs, applies state operations to the agent, and collects non-StateOp structs as external directives to return.
External side effects are expressed as two directive families. Core directives (from Jido.Agent.Directive) handle BEAM-level operations: Emit (dispatch a signal via Jido.Signal.Dispatch), Error (wrap a Jido.Error.t()), Spawn (fire-and-forget BEAM child), SpawnAgent (child agent with parent-child hierarchy tracking, monitors, and children map), StopChild (graceful child stop by tag), Schedule (delayed message via delay_ms), Stop (stop self), Cron (recurring schedule via cron expression), and CronCancel (stop a recurring job by job_id). AI directives (from Jido.AI.Directive) handle LLM operations: LLMStream (streaming generation with id, model, context, tools, tool_choice, max_tokens, temperature, timeout, metadata), ToolExec (tool execution with id, tool_name, action_module, arguments, context), LLMGenerate (non-streaming generation), LLMEmbed (embedding generation with model, texts, dimensions), EmitToolError (immediate error for unknown tools – prevents Machine deadlock), and EmitRequestError (immediate error when agent is busy). All directives implement the Jido.AgentServer.DirectiveExec protocol for polymorphic execution.
The AgentServer GenServer (at Jido.AgentServer) processes signals, routes them to strategies via Jido.Signal.Router, and executes directives through its drain loop. Its public API: start/1, start_link/1, call/3 (sync signal), cast/2 (async signal), state/1, status/1, await_completion/2 (event-driven wait for terminal status), stream_status/2, attach/2/detach/2/touch/1 (lifecycle attachment for LiveView sockets), set_debug/2, recent_events/2. This makes the agent itself testable as pure behavior. You can reason about invariants like terminal status, retry counters, safety flags, and tool budget without needing processes, network, or provider mocks.
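To ground that API, here is a minimal sketch of driving an agent through the AgentServer surface described above. The DemoAgent module, the start options, and the signal payload are assumptions for illustration; exact option names and return shapes may differ between Jido versions.

# Minimal sketch, assuming a DemoAgent module defined elsewhere with `use Jido.Agent`.
{:ok, pid} = Jido.AgentServer.start_link(agent: DemoAgent, id: "demo-1")

# Wrap the request in a CloudEvents-style signal and send it asynchronously.
{:ok, signal} =
  Jido.Signal.new(%{type: "react.input", source: "/demo", data: %{query: "What is 7 * 6?"}})

:ok = Jido.AgentServer.cast(pid, signal)

# Event-driven wait for a terminal status ("completed" or "error"); timeout in ms (argument shape assumed).
{:ok, final_state} = Jido.AgentServer.await_completion(pid, 30_000)
IO.inspect(Jido.AgentServer.status(pid), label: "status")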
Deep Dive The deterministic core pattern matters because AI systems are stochastic at the model edge but do not need to be stochastic everywhere. If your entire architecture is probabilistic, incidents become impossible to debug. Jido’s Action-based pattern gives you a deterministic center: state transitions are ordinary data transformations. Even when a model output is uncertain, your handling of that output does not need to be. For example, a model can request a tool call with malformed arguments; your deterministic layer can reject, normalize, retry, or escalate based on explicit policies. This is exactly where reliability is won.
The Action system (defined with use Jido.Action) provides a rich validation pipeline with six overridable lifecycle hooks: on_before_validate_params/1 -> schema validation (via Zoi or NimbleOptions schemas) -> on_after_validate_params/1 -> run/2 -> on_after_run/1 -> on_before_validate_output/1 -> output schema validation -> on_after_validate_output/1. Actions also support compensation (for rollback via on_error/4), with configurable compensation: %{enabled: true, max_retries: N, timeout: N}. The Jido.Action.Tool.to_tool/0 callback converts any Action into a JSON-schema tool definition compatible with OpenAI function calling and similar LLM tool formats, bridging deterministic Elixir code with LLM tool calling. The Action config is validated at compile time using Zoi schemas – invalid configs raise CompileError before the module ever loads.
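As a sketch of how those hooks compose, the Action below normalizes input before validation and compensates on failure. The module, fields, and hook return shapes are illustrative assumptions that follow the pipeline described above; exact callback signatures may differ slightly.

defmodule NormalizeDate do
  use Jido.Action,
    name: "normalize_date",
    description: "Parses a date string into an ISO 8601 date",
    schema: [date: [type: :string, required: true]],
    compensation: %{enabled: true, max_retries: 2, timeout: 5_000}

  # Runs before schema validation: coerce sloppy input into the declared shape (return shape assumed).
  def on_before_validate_params(params) do
    {:ok, Map.update(params, :date, nil, &String.trim/1)}
  end

  def run(%{date: date}, _context) do
    case Date.from_iso8601(date) do
      {:ok, parsed} -> {:ok, %{date: parsed}}
      {:error, reason} -> {:error, reason}
    end
  end

  # Called on failure; report that rollback/cleanup completed (argument order assumed).
  def on_error(failed_params, _error, _context, _opts) do
    {:ok, %{compensated: true, params: failed_params}}
  end
end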
A common anti-pattern in agent implementations is putting remote calls directly inside runtime callbacks, then mutating in-memory state opportunistically. That pattern couples latency, error handling, and business rules into one untestable thread. Jido’s directive approach decouples these concerns. The core emits typed intention structs. For example, the AI directive LLMStream carries fields id (call correlation), model (e.g. "anthropic:claude-haiku-4-5"), model_alias (e.g. :fast, resolved via Jido.AI.resolve_model/1), system_prompt, context (conversation messages), tools (list of ReqLLM.Tool.t()), tool_choice (:auto | :none | {:required, name}), max_tokens, temperature, and timeout. Similarly, ToolExec carries id, tool_name, action_module (direct module execution bypassing Registry), arguments, and context. The AgentServer runtime executes directives via the DirectiveExec protocol dispatch – each directive type implements exec/3 which receives (directive, input_signal, state). Execution is async: LLMStream and ToolExec spawn tasks under a per-agent Task.Supervisor, and results re-enter as signals (react.llm.response, react.tool.result) through AgentServer.cast/2.
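To make those directive shapes tangible, here is a hedged sketch of constructing the structs directly with the fields listed above. The concrete values, the Calculator module, and the exact field set accepted by new!/1 are assumptions.

alias Jido.AI.Directive.{LLMStream, ToolExec}

llm =
  LLMStream.new!(%{
    id: "call_42",                          # call correlation id
    model: "anthropic:claude-haiku-4-5",    # or a model_alias such as :fast
    context: [%{role: :user, content: "Total the order"}],
    tools: [],                              # list of ReqLLM.Tool.t()
    tool_choice: :auto,
    max_tokens: 512,
    temperature: 0.2,
    timeout: 30_000
  })

tool =
  ToolExec.new!(%{
    id: "tool_001",
    tool_name: "calculator",
    action_module: Calculator,              # hypothetical Action module
    arguments: %{expression: "19.99 * 3"},
    context: %{}
  })

# Returned from an Action as data describing future effects:
{:ok, %{status: "awaiting_llm"}, [llm, tool]}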
The invariants are strong and practical:
- Action run/2 results are complete at that point in logical time; StateOps are applied atomically.
- Directives do not mutate already-returned state; they describe future effects.
- Runtime outcomes must re-enter through signals if they should affect future state.
This model resembles event-sourced control loops without forcing full event sourcing everywhere. You can still persist snapshots via Jido.Signal.Journal (with InMemory, ETS, or Mnesia backends), but your deterministic contract remains simple enough for property-based thinking. For complex multi-agent systems, this contract is essential. Without it, child lifecycle events, tool completion events, and retry loops create hidden state transitions.
Directive semantics also improve security posture. When effects are explicit data structs validated at construction time (each Directive uses Zoi.struct/3 schemas with new!/1 constructors that raise on invalid data), you can inspect, filter, and gate them. Core directives provide helper constructors: Directive.emit/2, Directive.spawn_agent/3, Directive.stop_child/2, Directive.schedule/2, Directive.cron/3, Directive.cron_cancel/1, and Directive.emit_to_parent/3 (for child-to-parent communication). A policy layer can reject high-risk ToolExec directives unless an approval bit is set. The SpawnAgent directive includes a meta field for passing context to child agents, and children are tracked by tag in the parent’s children map with process monitors for exit detection. This is harder when effects are direct function calls hidden in procedural code.
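A minimal sketch of such a policy gate, under the assumption that directives arrive as a plain list before the runtime drains them. The DirectivePolicy module and its allowlist are hypothetical; the point is that ToolExec structs can be pattern matched and filtered as data.

defmodule DirectivePolicy do
  alias Jido.AI.Directive.ToolExec

  @approved_tools MapSet.new(["price_lookup", "calculator"])

  # Partition directives into those allowed to execute and those rejected by policy.
  def filter(directives) when is_list(directives) do
    Enum.split_with(directives, &allowed?/1)
  end

  defp allowed?(%ToolExec{tool_name: name}), do: MapSet.member?(@approved_tools, name)
  defp allowed?(_other), do: true
end

# {allowed, rejected} = DirectivePolicy.filter(directives)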
At operations time, this split helps with replay and incident analysis. The AgentServer supports debug mode (set_debug/2) with an in-memory ring buffer (max 50 events) recording :signal_received and :directive_started events with monotonic timestamps. You can also replay a sequence of signals against historical state snapshots (using Jido.Signal.Journal persistence) and compare whether the same directives were emitted. Divergence indicates nondeterminism introduced accidentally. The AgentServer emits structured telemetry at [:jido, :agent_server, :signal, :start | :stop | :exception] and [:jido, :agent_server, :directive, :start | :stop | :exception], plus [:jido, :agent_server, :queue, :overflow] for queue saturation. This replay and telemetry discipline is the foundation for trustworthy autonomous systems, where you must explain why an agent acted.
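A short sketch of subscribing to those telemetry events with the standard :telemetry API. The handler id is arbitrary, and the measurement/metadata keys printed here are assumptions about what the emitting code attaches.

require Logger

:telemetry.attach_many(
  "agent-server-observability",   # hypothetical handler id
  [
    [:jido, :agent_server, :signal, :stop],
    [:jido, :agent_server, :directive, :exception],
    [:jido, :agent_server, :queue, :overflow]
  ],
  fn event, measurements, metadata, _config ->
    # Forward to your metrics pipeline; key names depend on the emitting code.
    Logger.info("#{inspect(event)} measurements=#{inspect(measurements)} id=#{inspect(metadata[:id])}")
  end,
  nil
)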
Failure modes to design around:
- Directive queue saturation: caused by large bursts or slow downstream providers.
- Stale feedback loops: when delayed tool results return after state has moved on.
- Implicit state mutation leaks: when helper functions mutate shared mutable containers outside the Action contract.
- Schema validation failures: when Action params do not match the declared schema, caught by the validation pipeline.
Design countermeasures include strict state versioning, idempotency keys for tool responses, bounded queue sizes, timeout-based demotion paths, and Action compensation for rollback. In interviews and real systems, the engineer who understands this deterministic/effectful split can usually move from prototype to production faster than teams that keep adding retries to opaque loops.
How this fits into the projects You will apply this concept in projects 1, 2, 3, 8, 16, 17, and 20.
Definitions & key terms
- Action (Jido.Action): Behaviour in the jido_action package. Modules use Jido.Action with a schema, implement run(params, context), and return {:ok, result}, {:ok, result, directives}, or {:error, error}. Six lifecycle hooks: on_before_validate_params/1, on_after_validate_params/1, on_after_run/1, on_before_validate_output/1, on_after_validate_output/1, on_error/4.
- StateOp (Jido.Agent.StateOp): In-strategy state mutation operations: SetState (deep merge), ReplaceState (wholesale), DeleteKeys (top-level), SetPath (nested set), DeletePath (nested delete). Applied by Jido.Agent.StateOps.apply_state_ops/2.
- Core Directive (Jido.Agent.Directive): BEAM-level effect structs: Emit, Error, Spawn, SpawnAgent, StopChild, Schedule, Stop, Cron, CronCancel.
- AI Directive (Jido.AI.Directive): LLM/tool effect structs: LLMStream, ToolExec, LLMGenerate, LLMEmbed, EmitToolError, EmitRequestError.
- DirectiveExec: Protocol (Jido.AgentServer.DirectiveExec) with an exec/3 callback; each directive type implements it for polymorphic execution.
- AgentServer (Jido.AgentServer): GenServer that processes signals, routes to strategies, and executes directives. Public API: start/1, start_link/1, call/3, cast/2, state/1, status/1, await_completion/2, attach/2/detach/2, set_debug/2, recent_events/2.
- Zoi: Schema validation library used for compile-time config validation and runtime struct construction (Zoi.struct/3, new!/1).
- Deterministic core: Pure state transition logic where the same inputs lead to the same outputs.
- Invariant: Condition that must stay true across state transitions.
Mental model diagram
Input Signal (via Jido.Signal.Bus)
|
v
[Jido.Signal.Router (trie matching)]
|
v
[AgentServer routes to Strategy]
|
v
[Action.run(params, ctx)] (validated via Zoi schema pipeline)
|
+-- {:ok, result}
| |
| v
| [Jido.Agent.StateOps.apply_result/2] (deep-merge into agent state)
|
+-- {:ok, result, mixed_structs}
| |
| v
| [Jido.Agent.StateOps.apply_state_ops/2]
| |
| +-- StateOps (SetState, ReplaceState, DeleteKeys, SetPath, DeletePath)
| | -> applied atomically to agent state
| |
| +-- External Directives (non-StateOp structs)
| -> enqueued for AgentServer drain loop
|
+-- {:error, error}
-> error handling / compensation
[AgentServer Drain Loop]
|
v
[DirectiveExec.exec/3 protocol dispatch]
|
+-- Core (Jido.Agent.Directive):
| Emit, Error, Spawn, SpawnAgent, StopChild,
| Schedule, Stop, Cron, CronCancel
|
+-- AI (Jido.AI.Directive):
LLMStream, ToolExec, LLMGenerate, LLMEmbed,
EmitToolError, EmitRequestError
(spawn Task under per-agent Task.Supervisor)
|
v
[Feedback Signals re-enter via AgentServer.cast/2]
react.llm.response, react.tool.result, react.llm.delta
|
v
[Next Signal Cycle]
How it works (step-by-step, with invariants and failure modes)
- A signal arrives via Jido.Signal.Bus and is matched by Jido.Signal.Router (trie-based pattern matching). AgentServer routes the signal to the appropriate strategy.
- The strategy selects and runs Action.run(params, context) through the validation pipeline.
- StateOps (if any) are applied atomically to strategy state.
- Directives (if any) are enqueued for runtime execution by AgentServer.
- Runtime outcomes re-enter as new signals through the Bus, closing the loop.
- Handle failures with explicit retry/abort state transitions and Action compensation.
Minimal concrete example
PSEUDOCODE (v2 Action pattern)
defmodule PriceLookup do
  use Jido.Action,
    name: "price_lookup",
    schema: [sku: [type: :string, required: true]]

  def run(%{sku: sku}, _context) do
    {:ok, %{price: lookup(sku)}}
  end

  # Stand-in price source; replace with a real catalog lookup.
  defp lookup(_sku), do: 42.0
end
# In strategy, Action returns result + directives:
Action.run(params, context)
=> {:ok, %{tool_calls: [%{name: "price_lookup", args: %{sku: "A1"}}]},
[%ToolExec{id: "tool_001", tool_name: "price_lookup",
action_module: PriceLookup, arguments: %{sku: "A1"}}]}
# StateOps applied in strategy:
StateOp.set_state(%{status: "awaiting_tool", pending_tool_calls: [%{id: "tool_001", name: "price_lookup", arguments: %{sku: "A1"}, result: nil}]})
Common misconceptions
- “Directives are just async function calls.” No: they are data contracts that can be audited and governed.
- “If LLM output is random, deterministic state is pointless.” Wrong: deterministic handling is where reliability is created.
Check-your-understanding questions
- Why should directive execution (via DirectiveExec.exec/3) not mutate already-returned state?
- What happens when Jido.Agent.StateOps.apply_state_ops/2 encounters a struct that is not a StateOp type (e.g., an LLMStream directive)?
- Why does AgentServer.set_debug/2 use a ring buffer (max 50 events) rather than unbounded logging?
- What is the difference between Directive.spawn/2 (fire-and-forget BEAM child) and Directive.spawn_agent/3 (child agent with parent-child hierarchy)?
- How does the Action lifecycle hook on_before_validate_params/1 differ from on_after_validate_params/1 in terms of when each is useful?
Check-your-understanding answers
- It preserves logical time and deterministic reasoning about transitions. If exec/3 mutated state directly, you would lose the guarantee that state changes are traceable to specific Action results.
- Non-StateOp structs are collected and returned as external directives. apply_state_ops/2 pattern-matches on StateOp types, applies them to state, and accumulates everything else as directives for the AgentServer drain loop.
- A ring buffer prevents memory growth in long-running agents. In production, unbounded debug logs would eventually cause OOM. The bounded buffer captures the most recent 50 events, which is sufficient for immediate incident analysis.
- Spawn creates a bare BEAM child process with no lifecycle tracking. SpawnAgent creates a supervised child agent with a parent-child hierarchy: the parent tracks children by tag in its children map, monitors processes for exit detection, and supports emit_to_parent/3 for child-to-parent communication.
- on_before_validate_params/1 runs before schema validation, useful for normalizing or enriching raw input (e.g., converting string dates to DateTime). on_after_validate_params/1 runs after schema validation, useful for cross-field validation or derived-value injection when the params are already well-typed.
Real-world applications
- Regulated workflow agents where actions must be auditable.
- Tool-heavy copilots with strict permissioning.
- Autonomous service remediation loops with guardrails.
Where you’ll apply it
- Projects 1, 2, 3, 8, 16, 17, and 20.
References
- Jido README
- Jido Core Loop Guide
- Jido Directives Guide
- Jido Action source - Jido.Action behaviour definition (in the jido_action package)
- Jido AgentServer source - GenServer runtime
- Jido.AI Directive source - LLMStream, ToolExec, etc.
Key insights Deterministic Actions plus typed StateOps and explicit Directive structs are the shortest path from demo agent to production-grade agent.
Summary The deterministic core/directive runtime split gives you testability, replayability, and governance without sacrificing asynchronous power.
Homework/Exercises to practice the concept
- Write state invariants for a ReAct agent that uses AgentServer.cast/2 for tool calls and Directive.schedule/2 for retries. Include invariants for pending_tool_calls, iteration, and max_queue_size.
- Draw a failure timeline showing: (a) ToolExec directive emitted, (b) Task.Supervisor spawns async execution, (c) tool times out, (d) late react.tool.result signal arrives after state has moved to "error". Show how the current_llm_call_id check prevents stale application.
- Write a directive policy gate that inspects DirectiveExec dispatch and rejects ToolExec directives whose tool_name is not in an approved allowlist. Show how Zoi schema validation at construction time (via new!/1) complements this runtime check.
- Trace the full path from Action.run/2 returning {:ok, result, [%SpawnAgent{tag: "worker-1", ...}]} through StateOps.apply_state_ops/2 separating the SpawnAgent directive, then through DirectiveExec.exec/3 starting the child agent with process monitoring.
Solutions to the homework/exercises
- Invariants: length(pending_tool_calls) <= max_concurrent_tools, iteration <= max_iterations, AgentServer queue_size <= max_queue_size (default 10000), status in ["idle", "awaiting_llm", "awaiting_tool", "completed", "error"], and status == "completed" implies length(pending_tool_calls) == 0.
- Timeline: t0: emit %ToolExec{id: "call_99"} -> t1: Task.Supervisor.async spawns -> t2: 30s timeout fires, state transitions to "error", current_llm_call_id cleared -> t3: late react.tool.result arrives with id "call_99" -> ReAct.Machine checks current_llm_call_id != "call_99" and rejects with {:request_error, call_id, :stale, msg}.
- In the AgentServer drain loop, before DirectiveExec.exec/3: pattern match on %ToolExec{tool_name: name}, check name in approved_tools, and reject with %Error{reason: :tool_not_allowed} if not. Construction-time validation via ToolExec.new!(%{tool_name: "unknown"}) catches structurally invalid directives; runtime policy catches semantically unauthorized ones.
- Action.run/2 returns {:ok, %{worker_started: true}, [%SpawnAgent{tag: "worker-1", module: WorkerAgent, meta: %{task: "analyze"}}]}. apply_state_ops/2 deep-merges %{worker_started: true} into state, then collects %SpawnAgent{...} as an external directive. The AgentServer drain loop calls DirectiveExec.exec/3 on the SpawnAgent, which starts the child agent via the agent's DynamicSupervisor, adds {"worker-1", pid} to the parent's children map, and sets up a process monitor for exit detection.
Concept 2: Signal Contracts, Bus/Router/Dispatch/Journal Architecture, and Multi-Agent Routing on BEAM
Fundamentals
Jido and Jido.AI rely on signal-driven communication so components remain decoupled and routable. The jido_signal library provides the full signal infrastructure: Jido.Signal.Bus (GenServer pub/sub), Jido.Signal.Router (trie-based pattern matching with wildcards * and **), Jido.Signal.Dispatch (multi-adapter dispatch to pid, pubsub, http, webhook, logger, console, noop, named targets), and Jido.Signal.Journal (event persistence with InMemory, ETS, and Mnesia backends). Signals carry typed envelopes (event type, source, payload) aligned with CloudEvents-style event modeling. Real signal types from jido_ai include react.input, react.llm.response, react.tool.result, react.llm.delta, react.register_tool, react.unregister_tool, react.set_tool_context, and react.usage (noop, observability only). Routing precedence is deterministic: strategy routes, then agent routes, then plugin routes.
Deep Dive Signal contracts are the social contract of your agent system. If your signal types and payloads are ad hoc, multi-agent coordination collapses under complexity. If your signal contracts are explicit and versioned, teams can independently evolve agents, tools, and orchestration rules. Jido’s routing precedence (strategy routes, then agent routes, then plugin routes) gives a deterministic dispatch order. Signal routes are merged from all three layers with deterministic precedence. This is subtle but critical in large systems where multiple capabilities may match the same incoming event.
The Jido.Signal.Router uses a trie data structure for pattern matching, supporting single-level wildcards (*) and multi-level wildcards (**). This means you can subscribe to react.* to catch all top-level react signals, or react.** to catch all signals in the react namespace at any depth. The Router maps signal types to handler tuples like {:strategy_cmd, :react_start} for react.input, {:strategy_cmd, :react_llm_result} for react.llm.response, and {:strategy_cmd, :react_tool_result} for react.tool.result. Partial streaming results arrive via react.llm.delta mapped to {:strategy_cmd, :react_llm_partial}.
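As an illustration of those patterns, the sketch below declares routes and wildcard subscriptions in that style. The Router.new/1 and Router.route/2 calls are assumptions about the jido_signal API; the route tuples and wildcard semantics follow the description above.

routes = [
  {"react.input", {:strategy_cmd, :react_start}},
  {"react.llm.response", {:strategy_cmd, :react_llm_result}},
  {"react.tool.result", {:strategy_cmd, :react_tool_result}},
  {"react.llm.delta", {:strategy_cmd, :react_llm_partial}},
  {"react.*", :audit_top_level},    # single-level wildcard
  {"react.**", :audit_all_depths}   # multi-level wildcard
]

{:ok, signal} =
  Jido.Signal.new(%{type: "react.tool.result", source: "/agents/demo-1", data: %{}})

# Constructor and lookup names assumed; matching handlers come back in precedence order.
{:ok, router} = Jido.Signal.Router.new(routes)
{:ok, handlers} = Jido.Signal.Router.route(router, signal)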
The Jido.Signal.Dispatch module provides multi-adapter dispatch in three modes: synchronous dispatch/2, asynchronous dispatch_async/2, and batched dispatch_batch/3 (with configurable max_concurrency for parallel processing). A single signal can be routed to multiple adapter targets simultaneously: :pid (direct process message), :bus (Jido.Signal.Bus), :named (registered process name), :pubsub (Phoenix.PubSub broadcast), :logger (structured log output), :console (human-readable output), :noop (no-op for testing), :http (HTTP POST), and :webhook (webhook POST with retries). This is powerful for observability: the same signal that drives agent behavior can also be logged, persisted, and forwarded to monitoring systems.
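A hedged sketch of fanning one signal out to several adapters at once. Dispatch.dispatch/2 is named in the text above; the per-adapter option keys (target:, topic:, level:) and the MyApp.PubSub name are assumptions.

{:ok, signal} =
  Jido.Signal.new(%{
    type: "react.tool.result",
    source: "/agents/demo-1",
    data: %{id: "tool_001", result: %{value: 345}}
  })

agent_pid = self()  # stand-in for the AgentServer pid that owns the state machine

targets = [
  {:pid, [target: agent_pid]},                        # drive the state machine
  {:logger, [level: :info]},                          # structured audit log
  {:pubsub, [target: MyApp.PubSub, topic: "agents"]}  # LiveView / monitoring fanout
]

:ok = Jido.Signal.Dispatch.dispatch(signal, targets)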
The Jido.Signal.Journal provides event persistence with pluggable backends: InMemory for tests, ETS for single-node high-speed persistence, and Mnesia for distributed durable persistence. This enables signal replay for debugging, audit trails, and crash recovery.
CloudEvents matters here because it standardizes event metadata and encourages compatibility with broader event ecosystems. You do not need to enforce every field from day one, but adopting stable signal naming and source semantics (domain.subdomain.event, stable source namespaces) pays off quickly. It makes observability cleaner, policy enforcement easier, and replay safer.
On BEAM, routing is not just a software architecture concern; it is a concurrency control mechanism. You can map high-volume signal classes to dedicated worker pools, isolate slow providers, and prevent one class of work from starving another. This is where BEAM’s scheduling model and process isolation shine. Instead of monolithic queues, you build many small bounded queues with supervision boundaries.
Parent-child agent hierarchies introduce additional routing opportunities and pitfalls. Parent agents can spawn child workers for parallel subtasks, receive completion or failure signals, and aggregate results. If done well, you get deterministic orchestration with graceful failure handling. If done poorly, you get orphaned children, duplicate aggregations, and unbounded pending maps.
Failure modes in signal systems are often semantic, not mechanical:
- Schema drift: producers and consumers disagree on field shapes.
- Ordering assumptions: consumers assume strict ordering across independent channels.
- Ambiguous ownership: multiple agents believe they are authoritative responders.
- Retry storms: repeated emission of the same signal without dedupe.
Countermeasures include schema version fields, correlation IDs, causation IDs, dedupe stores, and explicit ownership tags in payload metadata. Jido’s signal_routes and plugin patterns allow precise control, but you must design the protocol intentionally.
Distribution multiplies these concerns. Once signals cross nodes, latency and partition behavior appear. The winning architecture is explicit about eventual consistency: use reconciliation signals, timeout fences, and state snapshots rather than assuming immediate global truth. Design for net-splits as a normal state. This is the right mindset for autonomous systems that must remain safe under partial failure.
A practical design pattern is the “controller + workers” topology:
- Controller handles intent, budgets, and policy.
- Workers execute isolated tasks and report back.
- Controller finalizes state and emits external result.
This pattern maps well to Jido’s SpawnAgent and StopChild directives and aligns with OTP supervision. It also mirrors how robust distributed systems are built in other domains: control plane + data plane, with explicit contracts between them.
How this fits into the projects You will apply this concept in projects 2, 7, 11, 12, 13, 15, and 20.
Definitions & key terms
- Signal contract: Stable definition of event type + payload shape + metadata.
- Jido.Signal.Bus: GenServer-based pub/sub for signal distribution.
- Jido.Signal.Router: Trie-based pattern matcher with * (single-level) and ** (multi-level) wildcards.
- Jido.Signal.Dispatch: Multi-adapter dispatcher (pid, pubsub, http, webhook, logger, console, noop, named).
- Jido.Signal.Journal: Event persistence with InMemory, ETS, and Mnesia backends.
- Correlation ID: Shared identifier linking events for one logical transaction.
- Causation ID: Event ID that directly triggered the current event.
- Routing precedence: Ordered matching: strategy routes -> agent routes -> plugin routes (deterministic).
Mental model diagram
[External Event / Internal Signal]
|
v
[Jido.Signal.Bus (GenServer pub/sub)]
|
v
[Jido.Signal.Router (Trie pattern matching)]
| | |
v v v
[react.*] [github.*] [maintenance.*]
| | |
v v v
Precedence: Strategy > Agent > Plugin
|
v
[Jido.Signal.Dispatch]
| | | |
v v v v
[pid] [pubsub] [webhook] [logger]
|
v
[Action.run(params, ctx)] -> [StateOps] + [Directives]
|
v
[AgentServer Runtime Executor]
|
v
[Jido.Signal.Journal (InMemory|ETS|Mnesia)]
|
v
[New Signals + Metrics -> back to Bus]
How it works (step-by-step, with invariants and failure modes)
- Ingest external event and wrap as typed signal.
- Route using deterministic precedence rules.
- Execute action and emit directives.
- Runtime executes effects and emits outcome signals.
- Correlate all events via IDs and versioned schema.
- Reconcile stale or duplicate outcomes safely.
Minimal concrete example
PSEUDOCODE SIGNALS (v2 jido_ai signal types and router mappings)
react.input -> {:strategy_cmd, :react_start} # User query arrives
react.llm.response -> {:strategy_cmd, :react_llm_result} # LLM returns tool_calls or final
react.tool.result -> {:strategy_cmd, :react_tool_result} # Tool execution completes
react.llm.delta -> {:strategy_cmd, :react_llm_partial} # Streaming partial token
react.register_tool -> {:strategy_cmd, :react_register_tool} # Dynamic tool registration
react.usage -> Noop (observability only) # Token/cost metrics
Router pattern matching:
"react.*" matches react.input, react.usage (single level)
"react.**" matches react.input, react.llm.response, react.tool.result (all depths)
Dispatch example:
signal "react.tool.result" dispatched to:
- pid: AgentServer process (drives state machine)
- logger: structured log for audit
- journal: Mnesia backend for replay
All messages carry: correlation_id=REQ-901, schema_version=2
Common misconceptions
- “Signal type naming is cosmetic.” No: naming is an operational contract.
- “Message order is globally guaranteed.” No: only per-mailbox order is guaranteed locally.
Check-your-understanding questions
- Why is routing precedence important in plugin-heavy agents?
- What does correlation ID solve that process ID does not?
- How do you handle duplicate result signals from retries?
Check-your-understanding answers
- It prevents ambiguous handlers and unpredictable behavior.
- Correlation spans processes and nodes, process ID does not.
- Idempotency keys + dedupe map + terminal state checks.
Real-world applications
- Incident response automation.
- Multi-agent research pipelines.
- Human-in-the-loop approval workflows.
Where you’ll apply it
- Projects 2, 7, 11, 12, 13, 15, and 20.
References
- CloudEvents Specification v1.0.2
- CloudEvents CNCF Graduation Announcement
- Jido Signals Guide
- Jido Orchestration Guide
- jido_signal Repository - Bus, Router, Dispatch, Journal
- Jido.Signal.Router source - Trie-based pattern matching
- Jido.Signal.Dispatch source - Multi-adapter dispatch
- Jido.Signal.Journal source - Event persistence backends
Key insights A distributed agent system is only as reliable as its signal contracts, Bus/Router/Dispatch wiring, and Journal persistence discipline.
Summary Signals are the protocol layer of autonomous systems; treat them as durable contracts, not ad hoc payloads.
Homework/Exercises to practice the concept
- Design a v1 and v2 schema for react.tool.result and define compatibility rules.
- Draw a parent-child spawn/aggregate protocol including timeout and cancellation.
- Define dedupe rules for a retried web-search tool signal.
Solutions to the homework/exercises
- Add optional fields in v2, preserve required v1 keys, and include schema_version.
- Spawn -> child.started -> work.sent -> result.received -> child.stopped.
- Use tool_call_id + correlation_id as the dedupe composite key.
Concept 3: Reasoning Strategies as Explicit State Machines (ReAct, CoT, ToT, GoT, Adaptive, TRM)
Fundamentals
Jido.AI treats reasoning strategies as explicit state machines rather than hidden prompt loops. This is a major engineering advantage. Each strategy is a named module under Jido.AI.Strategies.* with a companion state machine (using Fsmx for FSM transitions). The real strategy modules from the jido_ai codebase are: Jido.AI.Strategies.ReAct (with Jido.AI.ReAct.Machine), Jido.AI.Strategies.ChainOfThought, Jido.AI.Strategies.TreeOfThoughts, Jido.AI.Strategies.GraphOfThoughts, Jido.AI.Strategies.Adaptive (auto-selects based on task analysis), and Jido.AI.Strategies.TRM (Tiny Recursive Model for recursive reasoning). Side effects are represented as directives (LLMStream, ToolExec, LLMGenerate, LLMEmbed) and outcomes return via typed signals.
The Jido.AI.ReAct.Machine has these states: "idle" -> "awaiting_llm" -> "awaiting_tool" -> "completed" | "error". Its fields include: status, iteration, thread, pending_tool_calls, result, current_llm_call_id, termination_reason, streaming_text, streaming_thinking, thinking_trace, usage, and started_at. This level of explicit state makes it straightforward to enforce limits (max_iterations, timeout budgets, tool call caps) and avoid unbounded loops.
Deep Dive
Strategy-as-state-machine design is the bridge between research patterns and production control. Research papers such as ReAct, Chain-of-Thought, and Tree-of-Thoughts demonstrate reasoning improvements, but production systems need more than benchmark gains. They need bounded execution, observability, and graceful failure behavior. State machines (built on Fsmx) provide these guarantees with explicit transition maps and guard clauses.
ReAct (Jido.AI.Strategies.ReAct) is a loop pattern: reason, act (tool), observe, repeat. In Jido.AI, loop progress is explicit in the ReAct.Machine struct: iteration counters, pending_tool_calls list, current_llm_call_id, streaming_text accumulator, streaming_thinking accumulator, thinking_trace list, usage metrics, and termination_reason struct. The Fsmx transition map is declared explicitly:
"idle"=>["awaiting_llm"]"awaiting_llm"=>["awaiting_tool", "completed", "error"]"awaiting_tool"=>["awaiting_llm", "completed", "error"]"completed"=>["awaiting_llm"](allows re-entry for new queries)"error"=>["awaiting_llm"](allows recovery)
The Machine processes messages: {:start, query, call_id}, {:llm_result, call_id, result}, {:llm_partial, call_id, delta, chunk_type}, {:tool_result, call_id, result}. A busy rejection mechanism returns {:request_error, call_id, :busy, msg} when the machine is in a non-idle state. The run_tool_context field (ephemeral, cleared on terminal states) allows passing additional context into tool execution, while base_tool_context (persistent across requests) provides stable context like database connections.
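A condensed sketch of declaring such a machine with Fsmx, mirroring the transition map above. The DemoReActMachine module is illustrative, not the library source, and the state_field option name is an assumption about Fsmx configuration.

defmodule DemoReActMachine do
  defstruct status: "idle",
            iteration: 0,
            pending_tool_calls: [],
            current_llm_call_id: nil,
            result: nil,
            termination_reason: nil

  use Fsmx.Struct,
    state_field: :status,
    transitions: %{
      "idle" => ["awaiting_llm"],
      "awaiting_llm" => ["awaiting_tool", "completed", "error"],
      "awaiting_tool" => ["awaiting_llm", "completed", "error"],
      "completed" => ["awaiting_llm"],
      "error" => ["awaiting_llm"]
    }
end

# Fsmx.transition/2 enforces the map: an illegal jump returns {:error, _} instead of corrupting state.
# {:ok, m} = Fsmx.transition(%DemoReActMachine{}, "awaiting_llm")
# {:error, _} = Fsmx.transition(%DemoReActMachine{}, "completed")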
Chain-of-Thought (Jido.AI.Strategies.ChainOfThought) is simpler, often single-pass with structured reasoning text. It is useful for stepwise logic but can be fragile if treated as free-form prose. Its state machine wrapper still enforces lifecycle discipline: start, think, finalize, or error.
Tree-of-Thoughts (Jido.AI.Strategies.TreeOfThoughts) introduces branching with configurable branching_factor, max_depth, and traversal_strategy (:bfs for breadth-first, :dfs for depth-first, or :best_first for score-guided). These are powerful for planning and exploration but expensive in tokens and latency.
Graph-of-Thoughts (Jido.AI.Strategies.GraphOfThoughts) extends branching to arbitrary graph structures with configurable max_nodes, max_depth, and aggregation_strategy (:voting, :weighted, or :synthesis). This enables convergent reasoning where multiple thought paths can be combined.
Adaptive (Jido.AI.Strategies.Adaptive) auto-selects the best strategy based on task analysis: it examines the query, available tools, and budget constraints to choose the most appropriate strategy. This is the practical answer to “when is ToT cost justified?”
TRM (Jido.AI.Strategies.TRM, Tiny Recursive Model) provides recursive reasoning for problems that benefit from iterative refinement of solutions.
Tool calling adds another dimension: argument validation, execution timing, retry policy, and result normalization. Jido.AI’s tool system (registry, adapter, executor with Jido.Action.Tool.to_tool/0 for AI conversion) creates a consistent path from model intent to BEAM action execution. This consistency is critical.
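A small sketch of what that bridge yields when invoked on the PriceLookup Action from Concept 1. The exact keys of the returned tool map are assumptions; the idea is that the same schema that validates params becomes the JSON-schema parameters advertised to the model.

# Derived from the Action's own schema (see the PriceLookup module in Concept 1).
tool = PriceLookup.to_tool()

# Illustrative shape only; key names may differ:
# %{
#   name: "price_lookup",
#   description: "...",
#   parameters_schema: %{
#     "type" => "object",
#     "properties" => %{"sku" => %{"type" => "string"}},
#     "required" => ["sku"]
#   }
# }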
Streaming behavior is where state machines are especially valuable. Partial deltas arrive as react.llm.delta signals mapped to {:strategy_cmd, :react_llm_partial}. The machine accumulates streaming_text and streaming_thinking separately, preserving call correlation via current_llm_call_id, and decides when to transition from streaming to terminal states.
Failure modes to account for:
- Call-ID mismatch: a stale LLM/tool result applied to current state (guard with a current_llm_call_id check).
- Branch explosion: ToT/GoT traversal exceeds the max_depth or max_nodes budget.
- Tool schema mismatch: the model emits malformed args repeatedly (handle via the on_error/4 callback in the Action).
- Silent deadlock: the strategy waits forever for a signal that is never emitted (use timeout transitions in Fsmx).
Countermeasures include strict call-ID correlation checks, budget-aware traversal pruning (configurable branching_factor and max_depth), schema repair loops with escalation, and watchdog transitions (await_timeout -> error/recover).
From a hiring/interview perspective, this is high-value knowledge because it demonstrates that you can translate AI reasoning ideas into runtime-safe systems. Teams need engineers who can reason about token economics, failure surfaces, and lifecycle constraints, not just prompt templates.
In advanced systems, multiple strategies can coexist in one deployment: ReAct for tool tasks, CoT for quick logic tasks, ToT for planning tasks, and Adaptive as a meta-controller. This architecture is easier to maintain when each strategy conforms to common interfaces and emits common telemetry.
How this fits into the projects You will apply this concept in projects 1, 4, 8, 14, 17, and 20.
Definitions & key terms
- Reasoning strategy: Named module under Jido.AI.Strategies.* implementing a control-flow pattern.
- Fsmx: FSM library used by Jido.AI for explicit state machine definitions with transition guards.
- ReAct.Machine: Struct with fields: status, iteration, thread, pending_tool_calls, result, current_llm_call_id, termination_reason, streaming_text, streaming_thinking, thinking_trace, usage, started_at.
- Traversal policy: Branch exploration approach for ToT: :bfs, :dfs, :best_first.
- Aggregation strategy: Convergence approach for GoT: :voting, :weighted, :synthesis.
- Termination reason: Structured explanation for loop exit (max_iterations, timeout, error, success).
Mental model diagram
[query via react.input signal]
|
v
[state:"idle"] --react_start--> [state:"awaiting_llm"]
|
+-----------+-----------+
| |
tool_calls final_answer
(react.llm.response) (react.llm.response)
| |
v v
[state:"awaiting_tool"] [state:"completed"]
|
tool_result
(react.tool.result)
|
v
[state:"awaiting_llm"] (next iteration)
|
+--> [state:"error"] on invalid transition/timeout/max_iterations
Machine fields tracked at each state:
iteration=N, pending_tool_calls={...}, current_llm_call_id="call_XXX",
streaming_text="...", streaming_thinking="...", thinking_trace=[...],
usage=%{input_tokens: N, output_tokens: N, total_cost: $X.XX}
How it works (step-by-step, with invariants and failure modes)
- Initialize strategy state from query and config.
- Emit LLM directive or tool directive based on current state.
- Consume signals and validate call identity.
- Transition state only via allowed transition map.
- Stop on completion, budget breach, or hard error.
Minimal concrete example
PSEUDOCODE (v2 ReAct.Machine with actual directive types)
machine = %ReAct.Machine{status: "idle", iteration: 0, current_llm_call_id: nil}
on react.input(query):
machine.status = "awaiting_llm"
machine.iteration = 1
machine.current_llm_call_id = "call_42"
emit %LLMStream{id: "call_42", model: :fast, system_prompt: "...", tools: [...]}
on react.llm.response(tool_calls=[{name: "calc", args: {...}}]):
machine.status = "awaiting_tool"
machine.pending_tool_calls = [%{id: "tool_001", name: "calc", arguments: %{...}, result: nil}]
emit %ToolExec{id: "tool_001", tool_name: "calc", action_module: Calc, arguments: {...}}
on react.tool.result(id="tool_001", result={value: 345}):
machine.status = "awaiting_llm"
machine.iteration = 2
machine.current_llm_call_id = "call_43"
emit %LLMStream{id: "call_43", model: :fast, context: [tool_result...]}
on react.llm.response(final_answer="445"):
machine.status = "completed"
machine.result = "445"
machine.termination_reason = :success
# Streaming partial tokens arrive as:
on react.llm.delta(content="4"):
machine.streaming_text = machine.streaming_text <> "4"
Common misconceptions
- “Strategies are just prompt styles.” No: in production they are control-flow programs.
- “Adaptive always means better results.” Not if routing heuristics are poor or unobservable.
Check-your-understanding questions
- Why should a tool result include the original call ID?
- When should Adaptive choose ReAct over ToT?
- What is the difference between the completed and error terminal states operationally?
Check-your-understanding answers
- To prevent stale result application and ensure causal linkage.
- When tool usage is required and branching cost is unjustified.
- completed emits a user-facing result; error emits a remediation/alert path.
Real-world applications
- Customer support copilots with tool verification.
- Multi-step compliance workflows requiring explicit reasoning traces.
- Planning agents for incident response and change automation.
Where you’ll apply it
- Projects 1, 4, 8, 14, 17, and 20.
References
- Jido.AI Strategies Guide
- Jido.AI State Machines Guide
- Jido.AI.Strategies.ReAct source
- Jido.AI.ReAct.Machine source
- Jido.AI.Strategies.TreeOfThoughts source
- Jido.AI.Strategies.GraphOfThoughts source
- Jido.AI.Strategies.Adaptive source
- Jido.AI.Strategies.TRM source
- Fsmx library - FSM transitions used by Jido.AI
- ReAct Paper
- Chain-of-Thought Prompting
- Tree of Thoughts
- Graph of Thoughts
Key insights Reasoning quality without explicit control flow is a demo; reasoning quality with Fsmx-backed state machines and typed Machine structs is an operable system.
Summary Treat strategy selection and reasoning loops as engineered state machines with bounded policies and observable transitions.
Homework/Exercises to practice the concept
- Define a 6-state ReAct machine including timeout and cancellation transitions.
- Compare token/latency budget between CoT and ToT for one planning task.
- Write criteria for Adaptive routing misclassification detection.
Solutions to the homework/exercises
- Include idle, awaiting_llm, awaiting_tool, completed, timeout, error.
- ToT usually spends more tokens; use it only when branch quality gains justify the cost.
- Track per-route success, retries, and human override frequency.
Concept 4: Unified LLM Provider Layer with ReqLLM plus Production Controls (Usage, Cost, Persistence, Scheduling, Safety)
Fundamentals
ReqLLM provides a canonical interface for multi-provider LLM operations across 45+ providers and 665+ models (via LLMDB model metadata). Your application logic is not tightly bound to one provider’s wire format. The high-level API consists of four core functions: ReqLLM.generate_text/3 (synchronous text generation), ReqLLM.stream_text/3 (streaming text generation), ReqLLM.generate_object/4 (structured output with schema validation), and ReqLLM.embed/3 (embedding generation). The low-level API exposes a provider plugin system with prepare_request, attach, encode_body, and decode_response callbacks. Combined with Jido runtime features (scheduling, persistence, worker pools, telemetry, error policies), this allows teams to run AI workflows with explicit SLOs, budgets, and recovery paths.
Deep Dive
Provider abstraction is not optional at scale. Teams start with one provider, then quickly need fallback, specialty models, or cost controls. Without abstraction, migration is expensive because model payloads, tool formats, streaming semantics, and usage metrics all differ. ReqLLM addresses this by offering canonical data structures and two usage layers: the high-level helpers (generate_text, stream_text, generate_object, embed) and the low-level Req plugin control for custom provider integration.
The streaming architecture is built on a StreamServer GenServer with backpressure via a high_watermark queue. StreamChunk types include :content (text tokens), :thinking (reasoning tokens for models that support extended thinking), :tool_call (incremental tool call data), and :meta (usage and metadata). The MetadataHandle module enables concurrent async usage collection, critical for accurate billing when multiple streams are active. This architecture means you can process partial tokens for live UIs while still accumulating complete usage data.
Canonicalization matters for both correctness and economics. Correctness: tool call structures, content parts, and responses become typed and inspectable. Economics: the ReqLLM.Billing module provides ReqLLM.Billing.calculate(usage, model) which returns line_items with a detailed cost breakdown (input cost, output cost, cache hits, image tokens). That means you can implement strategy-independent cost guardrails such as “abort if estimated spend > X” or “route to cheaper model if confidence threshold allows.”
Model metadata and provider capabilities are managed through LLMDB, which maintains a database of 45+ providers and 665+ models with per-model metadata: context_window, capabilities (tools, json, streaming, vision, thinking), input_cost, output_cost. ReqLLM’s model sync workflow (from models.dev + local patches) highlights a mature operational pattern: treat model catalogs as versioned infrastructure, not hardcoded constants. This supports repeatable testing and controlled rollout of new models.
Now combine this with BEAM operations:
- Worker pools manage expensive agent initialization and smooth latency.
- Persistence and checkpointing preserve state and thread histories across lifecycle events.
- Scheduling and cron directives enable autonomous recurring workflows, with explicit tradeoffs around at-most-once timer semantics.
- Telemetry events provide latency, error, queue, and directive execution visibility.
- Error policies provide predictable escalation behavior instead of ad hoc exception cascades.
Safety controls should be integrated into this layer, not added later. Tool permission gates, schema validation, policy filters, and human approval checkpoints are easier when data is typed and routing is explicit. This is where many teams fail: they have rich model features but weak control planes.
Failure modes to plan around:
- Provider outage or degraded latency: require fallback routing and timeout partitioning.
- Hidden cost spikes: unbounded branch strategies or tool-heavy prompts.
- Streaming metadata gaps: final usage not arriving reliably without robust collector logic.
- Timer persistence assumptions: in-memory schedules lost on crash/restart.
Mitigations:
- Multi-provider router with health and budget rules.
- Per-request budgets and cumulative daily guardrails.
- Separate telemetry for token cost, tool cost, image cost.
- Persist critical schedules externally when exactly-once semantics are required.
This concept is the difference between a clever agent and a production platform. A platform must answer: what did it do, why, at what cost, under which policy, and how fast can it recover?
How this fits into the projects You will apply this concept in projects 3, 5, 6, 9, 10, 12, 18, 19, and 20.
Definitions & key terms
- Canonical model: Provider-agnostic structure for messages/tools/responses.
- StreamServer: GenServer with backpressure (high_watermark queue) for streaming tokens.
- StreamChunk: Typed chunk: :content, :thinking, :tool_call, :meta.
- MetadataHandle: Concurrent async usage collection for billing accuracy.
- ReqLLM.Billing: calculate(usage, model) returning line_items with a cost breakdown.
- LLMDB: Model metadata database (45+ providers, 665+ models) with context_window, capabilities, and costs.
- Usage accounting: Normalized token and tool cost tracking via the Billing module.
- At-most-once scheduling: timer model where missed runs are possible on failure.
- Policy gate: Rule layer approving or rejecting risky actions.
Mental model diagram
[Agent Strategy / Directive]
|
v
[ReqLLM High-Level API]
generate_text/3 | stream_text/3 | generate_object/4 | embed/3
|
v
[LLMDB Model Metadata] --> [Provider Selection]
45+ providers, 665+ models |
context_window, capabilities |
input_cost, output_cost |
| |
v v
[ReqLLM Provider Plugin API]
prepare_request -> attach -> encode_body -> decode_response
| | |
v v v
[Provider A] [Provider B] [Provider C]
|
v
[StreamServer GenServer (backpressure via high_watermark)]
|
v
[StreamChunk: :content | :thinking | :tool_call | :meta]
|
v
[MetadataHandle (concurrent async usage)]
|
v
[ReqLLM.Billing.calculate(usage, model)] -> [line_items + cost breakdown]
|
v
[Budget Policy] -> [Telemetry] -> [Persistence/Recovery]
How it works (step-by-step, with invariants and failure modes)
- Select model/provider via policy and aliases.
- Build canonical context and tools.
- Execute generation/streaming with timeout and retries.
- Normalize response and usage into one accounting pipeline.
- Apply budget/safety decisions before next action.
- Persist key lifecycle state and emit telemetry.
Minimal concrete example
PSEUDOCODE (v2 ReqLLM API with actual function names)
# High-level synchronous generation
response = ReqLLM.generate_text("openai:gpt-4o-mini", context, tools: tools)
# response.content, response.tool_calls, response.usage
# High-level streaming
stream = ReqLLM.stream_text("anthropic:claude-haiku-4-5", context, tools: tools)
# Yields StreamChunk structs: %StreamChunk{type: :content, data: "..."}
# %StreamChunk{type: :thinking, data: "..."}
# %StreamChunk{type: :tool_call, data: %{...}}
# %StreamChunk{type: :meta, data: %{usage: ...}}
# Structured output with schema
object = ReqLLM.generate_object("openai:gpt-4o-mini", context, schema, mode: :strict)
# Returns validated object matching schema
# Embedding
embedding = ReqLLM.embed("openai:text-embedding-3-small", texts)
# Cost calculation
line_items = ReqLLM.Billing.calculate(response.usage, "openai:gpt-4o-mini")
# %{input_cost: 0.0015, output_cost: 0.0019, total: 0.0034}
# LLMDB model lookup
model_info = LLMDB.get("openai:gpt-4o-mini")
# %{context_window: 128000, capabilities: [:tools, :json, :streaming], ...}
if line_items.total > 0.02 then set state.status="degraded" and switch model_alias=:fast
Common misconceptions
- “Provider abstraction hides all differences.” No: you still need provider-specific options and capability checks.
- “Cron means guaranteed execution.” No: in-memory timers imply at-most-once behavior unless persisted externally.
Check-your-understanding questions
- Why normalize usage/cost at the provider boundary?
- When should you use worker pools instead of spawn-per-request agents?
- How do you protect against tool permission escalation?
Check-your-understanding answers
- To run unified budget policies independent of provider format.
- When initialization is expensive and predictable throughput is required.
- Directive policy gates + schema validation + approval signals.
Real-world applications
- Cost-aware enterprise copilots.
- Multi-provider failover APIs.
- Autonomous periodic maintenance/reporting agents.
Where you’ll apply it
- Projects 3, 5, 6, 9, 10, 12, 18, 19, and 20.
References
- ReqLLM README
- ReqLLM Core Concepts
- ReqLLM Usage and Billing Guide
- ReqLLM StreamServer source - Backpressure streaming
- ReqLLM Billing source - Cost calculation
- LLMDB Repository - Model metadata (45+ providers, 665+ models)
- Jido Worker Pools Guide
- Jido Persistence and Storage Guide
- Jido Observability Guide
Key insights Provider abstraction without operational controls is portability theater; generate_text/3 + Billing.calculate/2 + LLMDB metadata is the production trifecta.
Summary ReqLLM + Jido runtime operations give you the control plane needed for reliable and economical AI systems.
Homework/Exercises to practice the concept
- Define a budget policy that demotes strategy/model based on spend thresholds.
- Design fallback rules for provider outage with latency/cost constraints.
- List which scheduled tasks must move from in-memory timers to durable schedulers.
Solutions to the homework/exercises
- Example: a request cost above $0.03 triggers a downgrade to the :fast model alias and disables ToT.
- Primary provider timeout at 6s, fallback provider at 8s, hard fail at 12s.
- Billing-critical and compliance reports require durable external scheduler.
Concept 5: Action System, Instructions, and Plan DAGs
Fundamentals
The Jido.Action behaviour (defined in the jido_action package, not the core jido package) is the fundamental unit of work in Jido. Every Action is a module declared with use Jido.Action and a config including: name, description, category, tags, vsn, schema (param validation via Zoi/NimbleOptions), output_schema, and compensation (%{enabled: true, max_retries: N, timeout: N}). The config is validated at compile time using Zoi schemas – invalid configs raise CompileError. Actions implement run(params, context) returning {:ok, result}, {:ok, result, directives}, or {:error, error}. The full validation pipeline has six overridable lifecycle hooks: on_before_validate_params/1 -> schema validation -> on_after_validate_params/1 -> run/2 -> on_after_run/1 -> on_before_validate_output/1 -> output schema validation -> on_after_validate_output/1. The on_error/4 callback handles errors and compensation logic. The Jido.Action.Tool.to_tool/0 callback converts any Action into a JSON-schema tool definition compatible with LLM tool calling. Execution is handled by Jido.Exec with sub-modules: Validator, Telemetry, Retry, Compensation, Async, Chain, and Closure (default timeout 30000ms). Instructions (Jido.Instruction struct with id, action, params, context, opts) wrap Actions for execution, and Plans (Jido.Plan) organize Instructions into DAGs with dependency resolution.
Deep Dive Understanding the Action system is essential because it is where your business logic lives. Unlike frameworks that mix model interaction with business rules, Jido forces you to write Actions as pure, schema-validated modules. This has several engineering consequences.
First, the schema validation pipeline is comprehensive. When Jido.Exec.run/4 processes an Action, it runs through six overridable lifecycle hooks: on_before_validate_params/1 -> schema validation (using Zoi/NimbleOptions schema declarations) -> on_after_validate_params/1 -> run/2 -> on_after_run/1 -> on_before_validate_output/1 -> output schema validation -> on_after_validate_output/1. The Jido.Exec module delegates to specialized sub-modules: Exec.Validator (schema checks), Exec.Telemetry (event emission at [:jido, :exec, :start | :stop | :exception]), Exec.Retry (configurable backoff), Exec.Compensation (rollback on downstream failure), Exec.Async (Task-based async execution), Exec.Chain (sequential pipeline), and Exec.Closure (anonymous function wrapping). This means invalid inputs are caught before execution and invalid outputs are caught before consumption. In production, this prevents entire classes of bugs where malformed tool results silently corrupt agent state.
Second, the Jido.Instruction struct wraps an Action with its execution context. An Instruction has fields: id (unique identifier), action (the Action module), params (validated parameters), context (execution context passed to run/2), and opts (execution options like timeout and retries). Instructions are the building blocks of Plans.
Third, Jido.Plan provides DAG-based execution planning. Plan.build/2 constructs a Plan from a list of Instructions with declared dependencies. Plan.add/3 adds Instructions to an existing Plan. The Plan resolves dependencies to determine execution order, enabling parallel execution of independent Instructions. This is powerful for multi-tool workflows: if tool A and tool B are independent, they execute concurrently; if tool C depends on both, it waits. The execution engine (Jido.Exec.run/4) handles timeout, retries, and exponential backoff with a default timeout of 30000ms. Jido.Exec.run_async/3 provides Task-based async execution, and Jido.Exec.await/1 collects the result. The Exec.Chain sub-module enables sequential pipelines where the output of one Action feeds as input to the next.
Fourth, Action compensation enables rollback. When an Action declares compensation: true, the system can call its compensation logic if downstream Actions fail. Combined with max_retries and timeout configuration, this creates a robust execution model for multi-step workflows. For example, if a tool creates a resource (step 1) and the next step fails (step 2), the compensation for step 1 can clean up the created resource.
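As a sketch of that rollback path, the Action below enables compensation and undoes its external effect when a later step fails. It mirrors the option and callback names described in this section (the compensation map plus on_error/4); the argument order of on_error/4 and the TicketAPI module are assumptions to verify against the jido_action docs.

```elixir
# Compensation sketch: create a resource in step 1, undo it if a downstream
# step fails. Option names follow this section; TicketAPI and the on_error/4
# argument order are illustrative.
defmodule CreateTicket do
  use Jido.Action,
    name: "create_ticket",
    schema: [title: [type: :string, required: true]],
    compensation: %{enabled: true, max_retries: 1, timeout: 5_000}

  # Step 1: create an external resource; a later plan step may still fail.
  def run(%{title: title}, _ctx) do
    {:ok, ticket_id} = TicketAPI.create(title)
    {:ok, %{ticket_id: ticket_id}}
  end

  # Invoked when a downstream step fails, so the orphaned ticket is removed.
  def on_error(_failed_params, _error, %{ticket_id: ticket_id}, _opts) do
    :ok = TicketAPI.delete(ticket_id)
    {:ok, %{compensated: ticket_id}}
  end
end
```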
Fifth, the Jido.Action.Tool.to_tool/0 callback is the bridge between deterministic Elixir code and LLM tool calling. Any Action module that implements this callback can be automatically registered as an AI tool. The tool definition includes the name, description, and parameter schema derived from the Action’s schema. This means your tools are always schema-validated, documented, and testable independent of any LLM.
The execution engine respects BEAM process semantics: each Action runs in the context of the AgentServer or within a Task for async execution. Timeouts are enforced at the execution level, and retries use configurable backoff strategies. This means your Action execution is bounded and observable, not hidden in retry loops.
How this fit on projects You will apply this concept in projects 1, 2, 5, 7, 8, 9, 11, 16, and 20.
Definitions & key terms
- Jido.Action: Behaviour in the jido_action package. Config: name, description, category, tags, vsn, schema, output_schema, compensation. Returns {:ok, result}, {:ok, result, directives}, or {:error, error}.
- Schema validation pipeline: on_before_validate_params/1 -> Zoi/NimbleOptions schema -> on_after_validate_params/1 -> run/2 -> on_after_run/1 -> on_before_validate_output/1 -> output_schema -> on_after_validate_output/1 (six hooks total).
- Jido.Instruction: Struct with id, action, params, context, opts wrapping an Action for execution.
- Jido.Plan: DAG-based execution plan built from Instructions with dependency resolution via Plan.build/2 and Plan.add/3.
- Jido.Exec: Execution engine with run/4, run_async/3, await/1. Sub-modules: Validator, Telemetry, Retry, Compensation, Async, Chain, Closure. Default timeout 30000ms.
- Action compensation: Rollback logic triggered by the on_error/4 callback when downstream steps fail. Configurable: compensation: %{enabled: true, max_retries: N, timeout: N}.
- to_tool/0: Callback converting an Action into a JSON-schema tool definition for LLM function calling.
Mental model diagram
[Action Module Definition]
use Jido.Action, name: "...", schema: [...]
def run(params, context), do: {:ok, result}
def to_tool(), do: %{name: "...", description: "...", parameters: schema}
|
v
[Jido.Instruction]
%Instruction{id: "instr_1", action: MyAction, params: %{...}, context: ctx, opts: [...]}
|
v
[Jido.Plan (DAG)]
Plan.build([instr_1, instr_2, instr_3], deps: %{instr_3 => [instr_1, instr_2]})
|
v
[Jido.Exec.run/4]
+-- Validation Pipeline: before_validate -> schema -> run -> after_validate
+-- Timeout enforcement
+-- Retry with backoff
+-- Compensation on failure
|
+-----------+-----------+
| | |
v v v
[instr_1] [instr_2] (parallel, no deps)
| |
+-----+-----+
|
v
[instr_3] (depends on 1 and 2)
|
v
{:ok, final_result}
How it works (step-by-step, with invariants and failure modes)
- Define the Action module with use Jido.Action and declare a schema for params.
- Implement run(params, context) with business logic.
- Optionally implement on_before_validate_params/1, on_after_run/1, on_error/4, to_tool/0.
- Wrap the Action in a Jido.Instruction struct with params and context.
- Build a Jido.Plan from Instructions with dependency declarations.
- Execute with Jido.Exec.run/4, which resolves the DAG, runs the validation pipeline, and enforces timeouts.
- On failure: retry with backoff, then compensate if configured, then propagate the error.
Invariants:
- Schema validation must pass before run/2 is called.
- Plan dependencies must form a DAG (no cycles).
- Timeout applies to each Instruction individually.
- Compensation is called only if the Action declared compensation: true and a downstream step fails.
Failure modes:
- Schema validation failure: Invalid params rejected before execution.
- Timeout exceeded: Action killed and error returned.
- Retry exhaustion: All retries fail, compensation triggered if enabled.
- DAG cycle: Plan construction fails at build time.
Minimal concrete example
PSEUDOCODE (v2 Action + Plan + Exec)
defmodule FetchWeather do
use Jido.Action,
name: "fetch_weather",
schema: [city: [type: :string, required: true]]
def run(%{city: city}, _ctx) do
{:ok, %{temp: 72, conditions: "sunny"}}
end
def to_tool do
%{name: "fetch_weather", description: "Get weather for a city",
parameters: %{city: %{type: "string", required: true}}}
end
end
# Build instruction
instr = %Instruction{id: "w1", action: FetchWeather, params: %{city: "NYC"}}
# Execute directly
{:ok, result} = Jido.Exec.run(FetchWeather, %{city: "NYC"}, ctx, timeout: 5000)
# Or build a Plan with dependencies
plan = Plan.build([instr_fetch, instr_analyze], deps: %{instr_analyze => [instr_fetch]})
{:ok, results} = Jido.Exec.run(plan, ctx, timeout: 10000, max_retries: 2)
Common misconceptions
- “Actions are just functions.” No: they are schema-validated, lifecycle-managed units with compensation, timeout, and retry semantics.
- “Plans are sequential pipelines.” No: Plans are DAGs; independent Instructions execute in parallel.
- “to_tool is only for LLMs.” No: tool definitions can also be used for documentation, testing, and API generation.
Check-your-understanding questions
- What happens if an Action's output schema validation fails after run/2 succeeds?
- Why does Jido.Plan require a DAG rather than allowing arbitrary graphs?
- How does Action compensation differ from retry?
Check-your-understanding answers
- The result is rejected and an error is returned, even though run/2 succeeded. This prevents invalid data from entering agent state.
- DAGs guarantee a topological execution order without infinite loops. Cycles would create deadlocks.
- Retry re-executes the same Action hoping for success. Compensation undoes the effect of a previously successful Action when a downstream step fails.
Real-world applications
- Multi-tool agent workflows where tools have dependencies (fetch data, then analyze, then summarize).
- Automated deployment pipelines with rollback on failure.
- Data processing pipelines with schema validation at every stage.
Where you’ll apply it
- Projects 1, 2, 5, 7, 8, 9, 11, 16, and 20.
References
- Jido.Action source (jido_action package) - The Jido.Action behaviour definition
- Jido.Exec source (jido_action package) - Execution engine with Validator, Retry, Compensation, Async, Chain, Closure sub-modules
- Jido.Instruction source
- Jido.Plan source
- Jido.Action.Tool source
Key insights Actions are the atomic unit of reliability in Jido; Plans are the composition mechanism; Exec is the bounded executor. Together they make multi-step workflows deterministic and recoverable.
Summary The Action/Instruction/Plan/Exec stack provides schema-validated, DAG-scheduled, timeout-bounded, compensation-capable execution for all agent workflows.
Homework/Exercises to practice the concept
- Write an Action with both input schema and output schema validation. Inject an invalid output and verify the pipeline catches it.
- Build a 3-Instruction Plan where two Instructions are independent and one depends on both. Verify parallel execution.
- Implement Action compensation for a “create resource” Action and trigger it by failing the next step.
Solutions to the homework/exercises
- Declare output_schema: [result: [type: :map, required: true]] in the Action options. Return a non-map value and assert {:error, _} (a test sketch follows this list).
- Use Plan.build/2 with a deps map. Instrument each Action with timestamps. Verify the two independent ones overlap.
- Add compensation: true to the Action options. Implement the compensate/2 callback. Force a downstream failure and verify compensation is called.
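For the first exercise, a test along these lines can exercise the output-schema path. It assumes the option names described in this concept and that a failed output validation surfaces as an error tuple from Jido.Exec.run/4; confirm both against the jido_action docs.

```elixir
# Exercise 1 sketch: an Action whose run/2 result violates its output_schema.
# Option names follow this section's description; the exact error shape may differ.
defmodule BadOutputAction do
  use Jido.Action,
    name: "bad_output_action",
    schema: [],
    output_schema: [result: [type: :map, required: true]]

  # Returns a string where the output schema requires a map.
  def run(_params, _ctx), do: {:ok, %{result: "not a map"}}
end

defmodule BadOutputActionTest do
  use ExUnit.Case, async: true

  test "output validation rejects the result even though run/2 succeeded" do
    assert {:error, _reason} = Jido.Exec.run(BadOutputAction, %{}, %{}, timeout: 1_000)
  end
end
```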
Concept 6: Plugin and Skill Composition Architecture
Fundamentals
Jido’s Plugin and Skill systems provide modular composition for agent capabilities. A Plugin (Jido.Plugin) extends agent behavior through runtime hooks: mount (initialization), handle_signal (signal processing), and transform_result (output transformation). Each Plugin declares a Jido.Plugin.Spec struct (defined in Jido.Plugin.Spec) containing: module, name, state_key, description, category, vsn, schema, config_schema, config, signal_patterns, tags, and actions. The state_key field provides state isolation so multiple plugins can coexist without namespace collisions. The config_schema validates plugin configuration at mount time, while signal_patterns declares which signal types the plugin subscribes to. Signal routes from plugins are merged with strategy and agent routes using deterministic precedence: strategy > agent > plugin. Skills (Jido.AI.Skill) provide a higher-level abstraction for prompt-driven capabilities with use Jido.AI.Skill macro. Skills declare name, description, license, allowed_tools, actions, plugins, and body/body_file (system prompt content). The Jido.AI.Skill.Loader loads skill definitions from SKILL.md markdown files at runtime, and Jido.AI.Skill.Registry manages available skills for lookup. Skills implement callbacks: manifest/0, body/0, allowed_tools/0, actions/0, and plugins/0.
Deep Dive The Plugin system solves a fundamental agent architecture problem: how do you add capabilities without creating a monolith? In many frameworks, adding a new tool or behavior means modifying the core agent code. In Jido, plugins are self-contained modules with explicit boundaries.
The Jido.Plugin.Spec struct declares everything the plugin needs and provides: module (the plugin module), name (human-readable identifier), state_key (isolated state namespace in the agent’s state map), description, category, vsn (version), schema (state shape validation), config_schema (configuration validation at mount time), config (runtime configuration values), signal_patterns (patterns the plugin subscribes to), tags (metadata labels), and actions (Action modules the plugin provides). This Spec is introspectable at boot time, enabling composition validation before the agent starts processing. Note that signal routing for plugins is handled through the signal_patterns field rather than explicit route maps – the AgentServer merges these patterns into the overall route table during initialization.
The mount callback initializes plugin state under its state_key in the agent’s state map. This isolation is critical: if plugin A uses state_key: :chat and plugin B uses state_key: :memory, they cannot accidentally corrupt each other’s state. The handle_signal callback processes signals relevant to the plugin, and transform_result allows plugins to modify results before they reach the caller.
Signal routes from plugins are merged into the agent’s route table with deterministic precedence: strategy routes take priority, then agent routes, then plugin routes. This means if a strategy and a plugin both claim to handle the same signal type, the strategy wins. This precedence is essential for predictable behavior and should be logged at boot time for debugging.
The Skill system (Jido.AI.Skill) operates at a higher level. While plugins provide runtime behavior (actions, signal handling, state management), Skills provide prompt-level capabilities. A Skill module is defined with use Jido.AI.Skill and declares options: name, description, license, allowed_tools (list of tool names the skill may invoke), actions (Action modules it provides), plugins (Plugin modules it depends on), and body/body_file (the system prompt content, either inline or from a file path). The Skill implements callbacks: manifest/0 (returns the Spec), body/0 (returns the system prompt text), allowed_tools/0, actions/0, and plugins/0. The Jido.AI.Skill.Loader loads skill definitions from SKILL.md markdown files at runtime – a powerful pattern for dynamic capability injection without code changes. The Jido.AI.Skill.Registry manages available skills for lookup by name.
When a skill is loaded, it brings a system prompt (from body/0 or body_file), allowed tools list, and associated actions/plugins. The rendered prompt includes only the tools that the skill is authorized to use via allowed_tools, implementing least-privilege at the prompt level. This is complementary to runtime tool permission enforcement: the prompt restricts what the model can attempt, and the runtime policy restricts what can actually execute.
The interplay between Plugins and Skills creates a layered architecture: Plugins handle runtime behavior and state, Skills handle prompt construction and tool scoping, and the agent’s strategy coordinates them all. This separation enables independent evolution: you can update a skill’s prompt without changing plugin code, add a new plugin without modifying skills, or replace a strategy without touching either.
Composition order matters. Plugins mount in declaration order, and later plugins can depend on earlier ones (via requires). Conflicting signal routes from different plugins are resolved by declaration order within the plugin layer. This is deterministic but must be documented and tested.
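One way to enforce the state_key uniqueness invariant is a boot-time guard over the plugin specs. This is not a built-in Jido check; the helper below is an illustrative sketch that only assumes each plugin exposes its Jido.Plugin.Spec.

```elixir
# Boot-time guard sketch (not a Jido built-in): fail fast if two plugin specs
# claim the same state_key, before any plugin mounts.
defmodule PluginGuards do
  def assert_unique_state_keys!(specs) do
    duplicates =
      specs
      |> Enum.map(& &1.state_key)
      |> Enum.frequencies()
      |> Enum.filter(fn {_key, count} -> count > 1 end)
      |> Enum.map(fn {key, _count} -> key end)

    case duplicates do
      [] -> :ok
      keys -> raise ArgumentError, "duplicate plugin state_keys: #{inspect(keys)}"
    end
  end
end

# Usage during agent boot, assuming each plugin module exposes spec/0:
# PluginGuards.assert_unique_state_keys!(Enum.map(plugins, & &1.spec()))
```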
How this fit on projects You will apply this concept in projects 7, 11, 15, 16, 18, and 20.
Definitions & key terms
- Jido.Plugin: Module extending agent behavior via mount, handle_signal, transform_result callbacks.
- Jido.Plugin.Spec: Struct declaring plugin metadata: module, name, state_key, description, category, vsn, schema, config_schema, config, signal_patterns, tags, actions.
- state_key: Isolated namespace for plugin state within the agent state map. Must be unique across all mounted plugins.
- config_schema: Plugin configuration validation schema, checked at mount time.
- Jido.AI.Skill: Higher-level prompt-driven capability defined with use Jido.AI.Skill. Declares name, description, license, allowed_tools, actions, plugins, body/body_file.
- Skill callbacks: manifest/0, body/0, allowed_tools/0, actions/0, plugins/0.
- SKILL.md: Markdown file format for defining skills loaded at runtime by Jido.AI.Skill.Loader.
- allowed_tools: Per-skill tool whitelist implementing least-privilege at the prompt level.
- Route precedence: Deterministic merge order: strategy > agent > plugin.
Mental model diagram
[Agent State Map]
|
+-- :strategy_state (owned by active strategy)
+-- :chat (owned by ChatPlugin, isolated by state_key)
+-- :memory (owned by MemoryPlugin, isolated by state_key)
+-- :tools (owned by ToolPlugin, isolated by state_key)
[Plugin Spec]
%Jido.Plugin.Spec{
module: ChatPlugin,
name: "chat",
state_key: :chat,
description: "Conversational chat capabilities",
category: :communication,
vsn: "1.0.0",
schema: [entries: [type: :list]],
config_schema: [max_history: [type: :integer, default: 100]],
config: %{max_history: 100},
signal_patterns: ["chat.*"],
tags: [:conversation, :llm],
actions: [ChatAction, SummarizeAction]
}
[Skill Loading]
SKILL.md (on disk) --> Jido.AI.Skill.Loader --> Skill.Registry
|
v
[Skill.Prompt.render]
system_prompt + allowed_tools filter
|
v
[Strategy receives scoped prompt + tools]
[Signal Route Merge Precedence]
Strategy routes: react.* -> {:strategy_cmd, :react_*} (HIGHEST)
Agent routes: agent.* -> {:agent_cmd, :handle_*} (MIDDLE)
Plugin routes: chat.* -> {:plugin_cmd, :handle_chat} (LOWEST)
How it works (step-by-step, with invariants and failure modes)
- Plugin modules declare Jido.Plugin.Spec with state_key, config_schema, signal_patterns, actions.
- On agent boot, plugins mount in declaration order via the mount callback.
- Each plugin initializes its state under its state_key in the agent state map.
- Signal routes from all plugins are merged with strategy and agent routes using precedence rules.
- Skills are loaded from SKILL.md files by Jido.AI.Skill.Loader and registered in Skill.Registry.
- When a strategy needs a prompt, Skill.Prompt.render produces the system prompt + filtered tool list.
- Runtime signals are routed through the merged route table; plugins handle signals matching their patterns.
Invariants:
- state_key must be unique across all mounted plugins.
- Plugin requires must be satisfied by already-mounted plugins.
- signal_routes from different plugins must not create ambiguous matches within the plugin layer.
- allowed_tools in skills must be a subset of tools available in the runtime registry.
Failure modes:
- State key collision: Two plugins use the same state_key, causing data corruption.
- Unresolved dependency: Plugin A requires Plugin B, which is not mounted.
- Route conflict: Two plugins claim the same signal pattern with different handlers.
- Skill tool mismatch: Skill allowlists reference tools not registered in the runtime.
Minimal concrete example
PSEUDOCODE (v2 Plugin + Skill composition with actual structs)
# Plugin definition
defmodule MemoryPlugin do
use Jido.Plugin
def spec do
%Jido.Plugin.Spec{
module: __MODULE__,
name: "memory",
state_key: :memory,
description: "Long-term memory storage for agent",
config_schema: [max_entries: [type: :integer, default: 1000]],
config: %{max_entries: 1000},
signal_patterns: ["memory.*"],
actions: [StoreMemory, RecallMemory]
}
end
def mount(agent_state, config) do
put_in(agent_state, [:memory], %{entries: [], max_entries: config.max_entries})
end
def handle_signal(%{type: "memory.store"} = signal, state) do
# Process memory storage signal
{:ok, updated_state, []}
end
end
# Skill module definition
defmodule IncidentAnalyst do
use Jido.AI.Skill,
name: "incident-analyst",
description: "Analyzes production incidents",
allowed_tools: ["search_logs", "summarize", "create_ticket"],
actions: [SearchLogs, Summarize, CreateTicket],
body_file: "priv/skills/incident_analyst.md"
# Callbacks: manifest/0, body/0, allowed_tools/0, actions/0, plugins/0
# are auto-generated by use Jido.AI.Skill
end
# Or load from SKILL.md at runtime:
{:ok, skill_spec} = Jido.AI.Skill.Loader.load("priv/skills/SKILL.md")
Jido.AI.Skill.Registry.register("incident-analyst", skill_spec)
# Contents of priv/skills/incident_analyst.md (body text):
#   "You are an incident analyst. Use available tools to..."
# At boot:
# 1. Mount plugins -> state = %{memory: %{entries: [], ...}, chat: %{...}}
# 2. Merge routes -> strategy routes + agent routes + plugin routes
# 3. Load skills -> Registry.register("incident-analyst", skill_spec)
# 4. Render prompt -> Skill.Prompt.render("incident-analyst") -> system_prompt + 3 tools
Common misconceptions
- “Plugins and Skills are the same thing.” No: Plugins handle runtime behavior and state; Skills handle prompt construction and tool scoping.
- “Plugin order does not matter.” Yes it does: mount order determines dependency resolution and route priority within the plugin layer.
- “Skills can access any tool.” No: Skills declare allowed_tools, which filters the tool list in the rendered prompt.
Check-your-understanding questions
- What happens if two plugins declare the same state_key?
- How does route precedence prevent plugin routes from overriding strategy routes?
- Why are Skills loaded from markdown files rather than compiled modules?
Check-your-understanding answers
- Data corruption: both plugins read/write the same state namespace, causing unpredictable behavior. This should be caught at mount time with a validation check.
- During route merge, strategy routes are checked first. If a match is found at the strategy level, plugin routes are never consulted for that signal type.
- Markdown files enable runtime loading without code recompilation, supporting dynamic capability injection, A/B testing of prompts, and non-developer skill authoring.
Real-world applications
- Modular agent marketplaces where plugins can be installed/removed independently.
- Multi-tenant systems where different tenants have different skill configurations.
- Composable assistant systems where capabilities are added based on user role or subscription tier.
Where you’ll apply it
- Projects 7, 11, 15, 16, 18, and 20.
References
- Jido.Plugin source
- Jido.Plugin.Spec source
- Jido.AI.Skill source
- Jido.AI.Skill.Loader source
- Jido.AI.Skill.Registry source
- Jido.AI.Skill.Prompt source
Key insights Plugins provide runtime extensibility with state isolation; Skills provide prompt-level capability scoping. Together they enable composable agents without monolithic coupling.
Summary Plugin Specs with state_key isolation + Skill SKILL.md loading create a layered composition architecture where runtime behavior, state management, and prompt capabilities evolve independently.
Homework/Exercises to practice the concept
- Write two plugins with different state_keys and verify they cannot corrupt each other’s state.
- Create a SKILL.md file with an allowed_tools list and verify that rendered prompts only include those tools.
- Test route precedence by creating a strategy route and a plugin route for the same signal type. Verify the strategy route wins.
Solutions to the homework/exercises
- Mount both plugins, write to each state_key, and assert the other is unchanged. Add a boot-time check for duplicate state_keys.
- Load the SKILL.md, render the prompt, and assert the tool list is the intersection of allowed_tools and the runtime registry.
- Emit a signal matching both routes, log which handler fires, and assert it is the strategy handler.
Glossary
- Action (Jido.Action): Behaviour module implementing run(params, context) with a schema validation pipeline. The atomic unit of work in Jido.
- AgentServer: GenServer that processes signals, routes to strategies, and executes directives.
- Bus (Jido.Signal.Bus): GenServer-based pub/sub system for signal distribution.
- Core Directives (Jido.Agent.Directive): BEAM-level effect structs: Emit, Error, Spawn, SpawnAgent, StopChild, Schedule, Stop, Cron, CronCancel. Helper constructors: Directive.emit/2, Directive.spawn/2, Directive.spawn_agent/3, Directive.schedule/2, Directive.cron/3, Directive.cron_cancel/1, Directive.emit_to_parent/3.
- AI Directives (Jido.AI.Directive): LLM/tool effect structs: LLMStream, ToolExec, LLMGenerate, LLMEmbed, EmitToolError, EmitRequestError.
- DirectiveExec (Jido.AgentServer.DirectiveExec): Protocol with an exec/3 callback for polymorphic directive execution. Each directive type implements this protocol.
- Dispatch (Jido.Signal.Dispatch): Multi-adapter signal dispatcher supporting pid, pubsub, http, webhook, logger, console, noop, and named targets. Three modes: sync dispatch/2, async dispatch_async/2, batched dispatch_batch/3.
- Exec (Jido.Exec): Execution engine with run/4, run_async/3, await/1. Sub-modules: Validator, Telemetry, Retry, Compensation, Async, Chain, Closure. Default timeout 30000ms.
- Fsmx: FSM library used by Jido.AI for explicit state machine definitions with transition guards.
- Idempotency Key: Identifier that prevents duplicate effect execution.
- Instruction (Jido.Instruction): Struct wrapping an Action with id, action, params, context, opts for execution.
- Journal (Jido.Signal.Journal): Event persistence layer with InMemory, ETS, and Mnesia backends.
- LLMDB: Model metadata database tracking 45+ providers and 665+ models with context_window, capabilities, and costs.
- MetadataHandle: Concurrent async usage collection module for accurate billing across multiple streams.
- Netsplit: Temporary network partition between distributed BEAM nodes.
- Plan (Jido.Plan): DAG-based execution plan built from Instructions with dependency resolution via Plan.build/2.
- Plugin (Jido.Plugin): Module extending agent behavior via mount, handle_signal, transform_result callbacks with Spec metadata.
- Plugin Spec (Jido.Plugin.Spec): Struct declaring plugin metadata: module, name, state_key, description, category, vsn, schema, config_schema, config, signal_patterns, tags, actions.
- Router (Jido.Signal.Router): Trie-based pattern matcher for signals supporting * (single-level) and ** (multi-level) wildcards.
- Signal: A typed event envelope used for routing and feedback, aligned with CloudEvents.
- Skill (Jido.AI.Skill): Prompt-driven capability with Spec, Loader, Registry, Prompt modules. Loaded from SKILL.md files.
- StateOp: In-strategy state mutation operation: SetState, ReplaceState, DeleteKeys, SetPath, DeletePath.
- Strategy: Reasoning/control policy module under Jido.AI.Strategies.* (ReAct, CoT, ToT, GoT, Adaptive, TRM).
- StreamChunk: Typed streaming chunk from ReqLLM: :content, :thinking, :tool_call, :meta.
- Thread: Conversation history/context maintained across interactions within a strategy or agent session.
- Usage Telemetry: Token/cost/latency metrics used for operational control, calculated via ReqLLM.Billing.
- Worker Pool: Pre-warmed bounded set of workers for low-latency execution.
- Zoi: Schema validation library used throughout Jido for struct definitions (via Zoi.struct/3) and compile-time config validation. Actions, Directives, and Plugin Specs all use Zoi schemas. Invalid configs raise CompileError before the module loads.
Why Jido + BEAM Agent Engineering Matters
- AI-native engineering is now default behavior: Stack Overflow Developer Survey 2025 reports 84% of respondents are using or planning to use AI tools in development, and 50.6% of professional developers report daily AI-tool usage.
- Agent workflows are now operational, not experimental: In the same 2025 survey, about 70% of AI-agent users report reduced task time and 69% report productivity gains, while only 17% report improved team collaboration. This gap is exactly where reliability, policy, and observability engineering matters.
- Open event standards matured: CloudEvents was approved as a CNCF Graduated project on January 25, 2024, and the project page lists cross-cloud adopters (for example AWS EventBridge, Azure Event Grid, Google Eventarc, Knative Eventing).
- BEAM remains uniquely suitable for autonomous loops: Erlang/OTP docs state a default process limit of 1,048,576 processes, configurable with +P up to 134,217,727, which is a practical fit for isolated-agent process topologies.
- Jido ecosystem momentum (as of February 12, 2026 UTC):
  - agentjido/jido: 887 GitHub stars; Hex latest 2.0.0-rc.4, latest stable 1.2.0, 16,125 all-time downloads.
  - agentjido/req_llm: 383 GitHub stars; Hex latest 1.5.1, 30,659 all-time downloads.
  - agentjido/jido_ai: 114 GitHub stars; Hex latest 0.5.2, 3,930 all-time downloads.
  - agentjido/jido_signal: Signal infrastructure library (Bus, Router, Dispatch, Journal) providing the event backbone.
  - agentjido/llm_db: LLMDB model metadata covering 45+ providers and 665+ models with per-model context_window, capabilities, input_cost, output_cost data.
  - agentjido/jido_browser: Browser automation library for multimodal agent pipelines.
  - agentjido/jido_studio: LiveView-based agent observation and HITL control center.
Context & Evolution
Early LLM systems optimized for one-shot prompts and short-lived request handlers. Modern agent systems are long-running control loops that need explicit event contracts, bounded retries, budget-aware routing, permission gates, and replayable traces. The shift is from “prompt integration” to “autonomous runtime engineering.” Jido v2 crystallizes this shift with Actions (run/2 with Zoi schema validation and 6 lifecycle hooks), StateOps (deterministic state mutations via Jido.Agent.StateOp.*), two directive families (Core from Jido.Agent.Directive for BEAM operations, AI from Jido.AI.Directive for LLM/tool operations), and the Signal Bus/Router/Dispatch/Journal infrastructure from the jido_signal package.
Old "LLM app" pattern New BEAM-native agent pattern (Jido v2)
-------------------------------------- ------------------------------------------------
Prompt -> Provider -> Text Signal -> Bus -> Router -> Strategy FSM (Fsmx)
(single call, minimal control) Action.run(params, ctx) -> StateOps + Directives
No schema validation Zoi schema validation at compile + runtime
Side effects mixed with logic Core Directives (BEAM) + AI Directives (LLM/tool)
No state contract DirectiveExec protocol for polymorphic execution
No cost visibility AgentServer drain loop under supervision
Feedback signals via Dispatch close the loop
Journal (InMemory|ETS|Mnesia) for replay + audit
Billing.calculate + LLMDB enforce cost boundaries
Plugin.Spec + Skill.allowed_tools for composition
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Deterministic Agent Core and Directives | Jido.Action behaviour (6 lifecycle hooks, Zoi validation), StateOp.* for atomic state mutations, Core Directives (Emit/Spawn/SpawnAgent/Cron/Stop) + AI Directives (LLMStream/ToolExec) via DirectiveExec protocol; keep transitions pure and auditable. |
| Signal Contracts and BEAM Routing | Bus/Router/Dispatch/Journal provide typed event infrastructure; route with deterministic precedence. |
| Reasoning Strategies as State Machines | ReAct/CoT/ToT/GoT/Adaptive/TRM as Fsmx-backed bounded control-flow systems, not prompt tricks. |
| ReqLLM + Production Control Plane | 45+ providers, 665+ models via LLMDB; generate_text/3 + Billing.calculate/2 for budgets, observability, safety. |
| Action System, Instructions, and Plan DAGs | Jido.Action behaviour (jido_action package) with Zoi schema pipeline, Instruction structs, Plan DAG execution with Jido.Exec sub-modules (Validator, Retry, Compensation, Chain, Async). |
| Plugin and Skill Composition | Plugin Specs (Jido.Plugin.Spec) with state_key isolation, Skill use Jido.AI.Skill with allowed_tools/body/body_file, SKILL.md runtime loading, deterministic route merge precedence. |
Project-to-Concept Map
| Project | Concepts Applied |
|---|---|
| Project 1 | Deterministic Agent Core, Reasoning Strategies, Action System |
| Project 2 | Signal Contracts, Deterministic Agent Core, Action System |
| Project 3 | ReqLLM Control Plane, Deterministic Agent Core |
| Project 4 | Reasoning Strategies, ReqLLM Control Plane |
| Project 5 | ReqLLM Control Plane, Signal Contracts, Action System |
| Project 6 | ReqLLM Control Plane, Deterministic Agent Core |
| Project 7 | Signal Contracts, Deterministic Agent Core, Plugin/Skill Composition, Action System |
| Project 8 | Reasoning Strategies, Deterministic Agent Core, Action System |
| Project 9 | ReqLLM Control Plane, BEAM Routing, Action System |
| Project 10 | ReqLLM Control Plane, Deterministic Agent Core |
| Project 11 | Signal Contracts, BEAM Routing, Plugin/Skill Composition, Action System |
| Project 12 | ReqLLM Control Plane, Signal Contracts |
| Project 13 | Signal Contracts, BEAM Routing |
| Project 14 | Reasoning Strategies, ReqLLM Control Plane |
| Project 15 | Signal Contracts, ReqLLM Control Plane, Plugin/Skill Composition |
| Project 16 | Deterministic Agent Core, ReqLLM Control Plane, Plugin/Skill Composition, Action System |
| Project 17 | Reasoning Strategies, ReqLLM Control Plane |
| Project 18 | ReqLLM Control Plane, Signal Contracts, Plugin/Skill Composition |
| Project 19 | BEAM Routing, ReqLLM Control Plane |
| Project 20 | All six concept clusters |
Deep Dive Reading by Concept
| Concept | Book and Chapter | Why This Matters |
|---|---|---|
| Deterministic Agent Core and Directives | “Designing Elixir Systems with OTP” - supervision and process boundaries chapters | Teaches reliable state/effect separation under OTP. |
| Signal Contracts and BEAM Routing | “Erlang and OTP in Action” - distributed messaging and supervision chapters | Connects message protocol design with fault boundaries. |
| Reasoning Strategies as State Machines | “AI Engineering” by Chip Huyen - agent workflows + evaluation sections | Frames reasoning patterns as engineering systems. |
| ReqLLM + Production Control Plane | “Designing Data-Intensive Applications” - reliability and observability themes | Helps reason about fault tolerance, consistency, and operational tradeoffs. |
| Action System, Instructions, and Plan DAGs | “Designing Elixir Systems with OTP” - data transformation and validation chapters | Teaches schema-driven pipeline design with compensation. |
| Plugin and Skill Composition | “Domain-Driven Design” - bounded contexts and context mapping | Frames modular capability composition as bounded context integration. |
Quick Start: Your First 48 Hours
Day 1:
- Read the entire ## Theory Primer.
- Clone jido, jido_ai, and req_llm and inspect their guides.
- Build Project 1 and produce deterministic command transcripts.
Day 2:
- Validate Project 1 against its Definition of Done.
- Start Project 2 and add failure-mode tests for malformed tool arguments.
- Record one page of lessons on state invariants and routing mistakes.
Recommended Learning Paths
Path 1: The Reliability Engineer
- Project 1 -> Project 3 -> Project 9 -> Project 13 -> Project 19 -> Project 20
Path 2: The AI Product Builder
- Project 1 -> Project 2 -> Project 4 -> Project 5 -> Project 6 -> Project 15 -> Project 20
Path 3: The Research-Oriented Agent Engineer
- Project 1 -> Project 8 -> Project 14 -> Project 17 -> Project 18 -> Project 20
Success Metrics
- You can explain and defend state invariants for every strategy transition.
- You can run one workload across at least two providers with stable behavior and tracked cost.
- You can recover from at least three injected failures (provider timeout, child crash, netsplit) without manual emergency patches.
- You can show one capstone transcript with policy-compliant autonomous behavior.
Optional Domain Appendices
Operational Debugging Checklist
- Verify every signal includes correlation_id, causation_id, and schema_version (see the guard sketch after this list).
- Verify every tool execution has an idempotency key and timeout budget.
- Verify every terminal state includes a machine-readable termination_reason.
- Verify policy denials emit explicit audit events (not silent drops).
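A guard like the sketch below can automate the first checklist item. It assumes signals are maps carrying these metadata keys; it is illustrative, not a Jido.Signal API.

```elixir
# Checklist guard sketch (illustrative): reject signals missing required
# correlation metadata before they reach strategy routing.
defmodule SignalChecks do
  @required [:correlation_id, :causation_id, :schema_version]

  def validate_metadata(signal) when is_map(signal) do
    case Enum.reject(@required, &Map.has_key?(signal, &1)) do
      [] -> :ok
      missing -> {:error, {:missing_metadata, missing}}
    end
  end
end

# SignalChecks.validate_metadata(%{correlation_id: "c1", causation_id: "c0"})
# #=> {:error, {:missing_metadata, [:schema_version]}}
```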
Common Failure Signatures
| Symptom | Probable Cause | First Diagnostic |
|---|---|---|
| Agent loops without completion | Missing or weak termination guards (max_iterations, timeout fences) | Inspect strategy state transitions for repeated non-progress edges |
| Wrong tool result applied to current state | Missing call_id/version checks | Compare pending_tool_calls list to incoming tool_result |
| Cost spikes during complex prompts | Adaptive router escalates strategy/model too aggressively | Plot usage telemetry by route and enforce downgrade thresholds |
| Burst events overwhelm runtime | Unbounded ingress or worker saturation | Track mailbox/queue depth and apply batch+backpressure controls |
| Cron jobs skipped after restart | In-memory timer assumption | Compare scheduled jobs against persisted last_run_at checkpoints |
Golden Evidence Pack (per project)
- One deterministic transcript that includes signal, directive, and terminal state sequence.
- One failure-injection transcript showing recovery behavior.
- One metric snapshot (latency, retries, cost, policy denials).
- One short postmortem note describing what invariant failed or held.
Project Overview Table
| # | Project | Focus | Difficulty | Time |
|---|---|---|---|---|
| 1 | Signal-Native ReAct Calculator Agent | deterministic Action/StateOp/Directive loop + tools | Level 2 | 8-12h |
| 2 | Tool-Governed Web Research Agent | tool contracts + safe routing | Level 2 | 10-14h |
| 3 | Multi-Provider Failover Gateway | provider abstraction + fallback | Level 3 | 12-18h |
| 4 | Streaming Observability Console | token streaming + telemetry UI | Level 3 | 12-18h |
| 5 | Structured Output Contracts | schema-first object generation | Level 2 | 8-12h |
| 6 | Cost-Aware Model Router | spend-aware policy routing | Level 3 | 12-18h |
| 7 | Skill/Plugin Composition Lab | modular capabilities + routing | Level 3 | 12-18h |
| 8 | Strategy State Machine Switchboard | adaptive strategy orchestration | Level 4 | 16-24h |
| 9 | Worker Pool Load Lab | bounded concurrency + latency | Level 3 | 12-18h |
| 10 | Persistent Thread Memory | checkpoint + journal lifecycle | Level 3 | 12-20h |
| 11 | Sensor-Driven Incident Triage | event bridges + reactive ops | Level 3 | 12-20h |
| 12 | Cron Autonomous Maintenance | recurring jobs + reliability | Level 3 | 12-20h |
| 13 | Distributed Netsplit Recovery Drill | cluster fault recovery | Level 4 | 18-28h |
| 14 | ETS/Mnesia Hybrid Agent Memory | high-speed memory + consistency | Level 4 | 18-28h |
| 15 | LiveView HITL Control Center | human approval workflows | Level 3 | 14-20h |
| 16 | Tool Permission Firewall | safety policy engine | Level 4 | 16-24h |
| 17 | Red-Team Eval Harness | adversarial testing + scoring | Level 4 | 16-24h |
| 18 | Multimodal Agent Pipeline | image+text+tools workflows | Level 4 | 16-24h |
| 19 | Hot Upgrade Release Drill | runtime upgrades under load | Level 5 | 24-36h |
| 20 | BEAM Autonomous Ops Swarm (Capstone) | end-to-end autonomous platform | Level 5 | 30-50h |
Project List
The following projects guide you from single-agent deterministic loops to distributed, policy-controlled autonomous systems on Elixir/BEAM.
Project 1: Signal-Native ReAct Calculator Agent
- File: P01-react-calculator-agent.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Gleam, Rust (sidecar tools)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The Micro-SaaS / Pro Tool
- Difficulty: Level 2: Intermediate
- Knowledge Area: Deterministic agent loops, tool calls, signal routing
- Software or Tool: jido, jido_ai, req_llm
- Main Book: “Designing Elixir Systems with OTP” by James Edward Gray II and Bruce A. Tate
What you will build: A ReAct calculator agent that only answers through validated tool calls and emits auditable react.* signals.
Why it teaches Jido deeply: You implement the exact Action.run/2 -> {:ok, result, directives} contract with StateOps and observe how react.llm.response and react.tool.result signals close the loop.
Core challenges you will face:
- Routing react.* signals correctly -> maps to strategy signal_routes/1
- Normalizing JSON tool args -> maps to Jido.AI.Executor behavior
- Converging to terminal state -> maps to bounded max_iterations
Real World Outcome
You run one deterministic demo where the model asks for calculator, the tool executes, and the final answer is emitted only after the tool result signal.
$ mix run scripts/p01_react_calculator_demo.exs
[boot] agent=calculator_agent strategy=ReAct model=:fast
[signal] type=react.user_query payload="what is (15*23)+100?"
[directive] LLMStream id=call_001
[signal] type=react.llm.response result=tool_calls tool=calculator args={"a":15,"b":23,"operation":"multiply"}
[directive] ToolExec id=tool_001 name=calculator
[signal] type=react.tool.result tool=calculator ok=true result={"value":345}
[directive] LLMStream id=call_002
[signal] type=react.llm.response result=final_answer text="445"
[done] status=completed iterations=2 total_cost_usd=0.0009
The Core Question You Are Answering
“How do I make model reasoning observable and reproducible instead of magical?”
Concepts You Must Understand First
- Action.run/2 determinism, StateOps, and directives
- Can the same input produce different directives in your implementation?
- Book Reference: “Designing Elixir Systems with OTP” - supervision and state boundaries
- Tool argument normalization and validation
- How do string-key JSON args become typed action params safely?
- Book Reference: “Clean Architecture” - boundary validation patterns
- Signal lifecycle in ReAct
- Which signal means “tool request” versus “final answer”?
- Book Reference: “Operating Systems: Three Easy Pieces” - event-loop mental model
Questions to Guide Your Design
- State design
  - Which fields are mandatory (status, iteration, pending_tool_calls, usage)?
  - What transition is illegal and must fail closed?
- Tool execution
  - How do you correlate tool_call_id between LLM output and tool result?
  - Where do you enforce max_iterations?
- Operational evidence
  - Which log line proves the answer came from tool output, not hallucination?
Thinking Exercise
Sketch the loop for two turns: user query -> LLM tool call -> tool result -> final answer. Mark exactly where state mutates.
The Interview Questions They Will Ask
- “Why not call the tool directly from the strategy?”
- “How do you prevent stale ToolResult signals from mutating current state?”
- “What metric tells you ReAct loops are stuck?”
- “How do you prove deterministic behavior in tests?”
- “Where do you cap cost and iteration count?”
Hints in Layers
Hint 1: Start with one tool only Model/tool complexity is easier once one tool path is perfect.
Hint 2: Enforce strict signal types
Reject unknown react.* events early.
Hint 3: Keep a call-id ledger
Track current_llm_call_id and pending_tool_calls in state.
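A minimal ledger sketch for this hint, assuming agent state is a plain map; the module and key names are illustrative rather than part of the Jido API.

```elixir
# Call-id ledger sketch (illustrative): only a pending tool call may apply its
# result; stale or unknown call_ids are rejected instead of mutating state.
defmodule CallLedger do
  def apply_tool_result(%{pending_tool_calls: pending} = state, %{call_id: call_id} = result) do
    if Map.has_key?(pending, call_id) do
      new_state =
        state
        |> Map.update!(:pending_tool_calls, &Map.delete(&1, call_id))
        |> Map.update(:tool_results, [result], &[result | &1])

      {:ok, new_state}
    else
      {:error, {:stale_tool_result, call_id}}
    end
  end
end
```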
Hint 4: Capture a golden transcript Use a fixed prompt and model alias for reproducibility.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OTP state boundaries | “Designing Elixir Systems with OTP” | Process boundaries chapters |
| Defensive contracts | “Clean Architecture” | Interface adapters |
| Event loops | “Operating Systems: Three Easy Pieces” | Concurrency intro |
Common Pitfalls and Debugging
Problem 1: “Agent never leaves awaiting_tool”
- Why: Tool result signal type mismatch or missing call_id.
- Fix: Validate the incoming react.tool.result schema and correlation id.
- Quick test: Inject a valid and an invalid ToolResult and assert only valid transitions.
Problem 2: “Final answer arrives without tool execution”
- Why: Tool calls not enforced in prompt/policy.
- Fix: Add explicit tool-required policy for arithmetic prompts.
- Quick test: Run 20 math prompts and assert at least one tool directive per prompt.
Definition of Done
- ReAct loop produces both react.llm.response and react.tool.result
- Tool arguments are normalized and validated before action execution
- Iteration and cost limits are enforced with explicit terminal reasons
- Golden transcript is reproducible across runs
Project 2: Tool-Governed Web Research Agent
- File: P02-web-research-tool-agent.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Gleam, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Policy routing, tool governance, source grounding
- Software or Tool: jido_ai, req_llm web search, jido_signal
- Main Book: “The Pragmatic Programmer”
What you will build: A research agent that can use web search only through an allowlist and returns citation-first answers.
Why it teaches Jido deeply: It forces you to separate model planning from policy enforcement and to handle tool-deny branches as first-class transitions.
Core challenges you will face:
- Allow/deny gate for tools -> maps to directive pre-execution policy
- Domain filtering and citation format -> maps to tool result post-processing
- Safe fallback when search fails -> maps to fail-closed transitions
Real World Outcome
$ mix run scripts/p02_research_agent_demo.exs "latest jido release notes"
[policy] allowed_tools=[web_search,fetch_url] denied=[bash,fs_write]
[signal] react.user_query "latest jido release notes"
[directive] LLMStream call_id=call_100
[signal] react.llm.response type=tool_calls tool=web_search
[directive] ToolExec tool=web_search args={"query":"agentjido jido changelog"}
[signal] react.tool.result tool=web_search count=5
[citation] 1. github.com/agentjido/jido/CHANGELOG.md
[citation] 2. agentjido.xyz/blog
[final] status=completed grounded=true tool_denials=0
The Core Question You Are Answering
“How do I let an agent search the web without letting it do dangerous things?”
Concepts You Must Understand First
- Tool allowlists and deny-by-default
- Book Reference: “Foundations of Information Security” - access control basics
- Signal-based policy feedback
- Book Reference: “Clean Architecture” - policy boundaries
- Grounded response formatting
- Book Reference: “The Pragmatic Programmer” - traceability mindset
Questions to Guide Your Design
- What is the canonical policy object shape?
- How do you represent “tool denied” in agent state and user output?
- What minimum citation fields are required (title, url, retrieved_at)?
Thinking Exercise
Draw two branches for the same prompt: tool approved and tool denied. Compare terminal statuses.
The Interview Questions They Will Ask
- “What does fail-closed look like for tool use?”
- “How do you detect citation spoofing?”
- “How do you distinguish model failure from policy denial?”
- “What metrics indicate policy is too strict?”
- “How would you add per-tenant policy overrides safely?”
Hints in Layers
Hint 1: Policy first, prompts second
Hint 2: Emit explicit tool.denied signals
Hint 3: Normalize citations before final answer
Hint 4: Add replay tests for denied paths
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Access control thinking | “Foundations of Information Security” | Policy and controls chapters |
| Boundary contracts | “Clean Architecture” | Boundaries |
| Reliable workflows | “The Pragmatic Programmer” | Tracer bullets |
Common Pitfalls and Debugging
Problem 1: “Agent returns uncited claims”
- Why: Final answer generated without required citation schema check.
- Fix: Validate answer structure before completion.
- Quick test: Fail build if citation array is empty.
Problem 2: “Deny rules never trigger”
- Why: Tool name mismatch (web-search vs web_search).
- Fix: Canonicalize tool names before policy lookup (see the sketch after this list).
- Quick test: Unit test alias map for tool names.
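A canonicalization sketch for the fix above; the alias map and naming rules are illustrative.

```elixir
# Tool-name canonicalization sketch (illustrative): normalize case and dashes,
# then apply an explicit alias map, before any policy lookup.
defmodule ToolNames do
  @aliases %{"search_web" => "web_search"}

  def canonicalize(name) when is_binary(name) do
    normalized =
      name
      |> String.downcase()
      |> String.replace("-", "_")

    Map.get(@aliases, normalized, normalized)
  end
end

# ToolNames.canonicalize("Web-Search") #=> "web_search"
# ToolNames.canonicalize("search_web") #=> "web_search"
```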
Definition of Done
- Tool allowlist is enforced with deny-by-default semantics
- Denied tools produce explicit user-visible policy output
- Final response includes normalized citations
- Replay test covers tool-approved and tool-denied paths
Project 3: Multi-Provider Failover Gateway
- File: P03-multi-provider-failover-gateway.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 4. The Open Core Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Provider abstraction, fallback, reliability budgets
- Software or Tool: req_llm, llm_db, jido_ai config aliases
- Main Book: “Designing Data-Intensive Applications”
What you will build: A gateway that routes requests by model alias and fails over across providers when latency, error, or budget thresholds are hit.
Why it teaches Jido deeply: It operationalizes model selection as stateful policy, not ad-hoc if/else logic.
Core challenges you will face:
- Fallback ordering and circuit state -> maps to deterministic policy state
- Provider-specific option translation -> maps to ReqLLM provider adapters
- Cost-aware model downgrades -> maps to usage telemetry feedback loop
Real World Outcome
$ mix run scripts/p03_failover_gateway_demo.exs
[gateway] alias=:fast primary=openai:gpt-4o-mini fallback=anthropic:claude-haiku-4-5
[request] id=req_42 timeout_ms=3000
[provider] openai status=429 retry_after=2
[fallback] switching_to=anthropic reason=rate_limit
[provider] anthropic status=200 latency_ms=1187
[usage] input_tokens=312 output_tokens=144 total_cost_usd=0.0017
[result] status=ok provider=anthropic degraded=true
The Core Question You Are Answering
“How do I keep agent behavior stable when model providers are unstable?”
Concepts You Must Understand First
- Failure classification (rate limit, timeout, malformed)
- Book Reference: “Designing Data-Intensive Applications” - reliability chapters
- Idempotent retries and request correlation
- Book Reference: “The Linux Programming Interface” - robust I/O patterns
- Model alias resolution
- Book Reference: “Clean Architecture” - configuration boundaries
Questions to Guide Your Design
- Which failures trigger immediate fallback vs retry?
- How do you avoid retry storms across all providers?
- What telemetry drives automatic downgrade to cheaper models?
Thinking Exercise
Design a fallback matrix: failure type x current provider -> next provider + backoff.
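One way to make this exercise concrete is to encode the matrix as data so the policy is inspectable and testable. Providers, failure reasons, and backoff values below are illustrative.

```elixir
# Fallback-matrix sketch (illustrative): {failure_reason, current_provider} maps
# to {next_provider, backoff_ms}; unknown combinations halt instead of looping.
defmodule FallbackMatrix do
  @matrix %{
    {:rate_limit, :openai} => {:anthropic, 0},
    {:timeout, :openai} => {:anthropic, 500},
    {:rate_limit, :anthropic} => {:openai, 2_000},
    {:timeout, :anthropic} => {:openai, 500}
  }

  def next(reason, provider), do: Map.get(@matrix, {reason, provider}, :halt)
end

# FallbackMatrix.next(:rate_limit, :openai) #=> {:anthropic, 0}
# FallbackMatrix.next(:malformed, :openai)  #=> :halt
```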
The Interview Questions They Will Ask
- “How do you avoid double-billing when retries happen?”
- “What is your fallback policy when every provider is degraded?”
- “How do you validate provider parity for JSON outputs?”
- “Where do you store circuit state and why?”
- “How do you test failover deterministically?”
Hints in Layers
Hint 1: Start with two providers and one alias
Hint 2: Persist request and attempt IDs
Hint 3: Separate transport errors from model errors
Hint 4: Add synthetic chaos tests for 429/5xx/timeout
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Reliability patterns | “Designing Data-Intensive Applications” | Fault tolerance |
| Timeout/backoff design | “The Linux Programming Interface” | Robust system call patterns |
| Boundary design | “Clean Architecture” | Policy vs detail |
Common Pitfalls and Debugging
Problem 1: “Fallback loop never exits”
- Why: No max-attempt guard.
- Fix: Enforce bounded attempts per request.
- Quick test: Simulate all providers failing; assert terminal fallback failure.
Problem 2: “Costs spike after failover”
- Why: Fallback model is more expensive than primary.
- Fix: Add policy layer that checks projected token cost before route.
- Quick test: Run load with cost cap and assert downgrade events.
Definition of Done
- Gateway handles provider 429/5xx/timeout with deterministic fallback
- Attempt IDs and routing decisions are logged and queryable
- Cost telemetry is captured per provider attempt
- Chaos test suite validates fallback matrix
Project 4: Streaming Observability Console
- File: P04-streaming-observability-console.md
- Main Programming Language: Elixir
- Alternative Programming Languages: TypeScript (front-end overlays), Erlang
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Telemetry, trace correlation, LiveView dashboards
- Software or Tool: jido_live_dashboard, jido_studio, :telemetry
- Main Book: “Clean Architecture”
What you will build: A LiveView console that streams agent state transitions, directive timings, and correlated traces for one end-to-end request.
Why it teaches Jido deeply: You make hidden runtime behavior explicit by wiring Jido telemetry into an operator-facing control plane.
Core challenges you will face:
- Correlating spans across signals/directives -> maps to trace_id discipline
- Rendering high-rate event streams -> maps to bounded buffers and backpressure
- Separating debug vs production verbosity -> maps to observability policy
Real World Outcome
You can open /dashboard and /studio, trigger an agent run, and watch synchronized signal and directive timelines in near real-time.
UI behavior:
- Runtime page lists active AgentServer PIDs and statuses.
- Traces page groups events by trace_id and exposes span durations.
- Clicking a trace shows signal type, directive type, result, and latency.
$ mix phx.server
[info] mounted JidoLiveDashboard pages at /dashboard
[info] mounted JidoStudio at /studio
[telemetry] [:jido,:agent_server,:signal,:start] trace_id=tr_88
[telemetry] [:jido,:agent_server,:directive,:stop] directive_type=ToolExec duration_ms=92
The Core Question You Are Answering
“Can I explain exactly why an agent made a decision during an incident review?”
Concepts You Must Understand First
- Telemetry event shape and handler cost
- Book Reference: “Clean Architecture” - observability and boundaries
- LiveView event-stream rendering
- Book Reference: “The Pragmatic Programmer” - feedback loops
- Trace correlation IDs
- Book Reference: “Designing Data-Intensive Applications” - distributed traces
Questions to Guide Your Design
- Which events are mandatory for incident triage?
- How do you avoid UI lockups under event bursts?
- What retention window is enough for debugging without memory blowup?
Thinking Exercise
Design an incident timeline table with columns: timestamp, signal, directive, result, duration, cost.
The Interview Questions They Will Ask
- “What is the minimum telemetry set for production readiness?”
- “How do you sample traces without losing critical incidents?”
- “How do you correlate child-agent events with parent requests?”
- “How do you protect PII in logs?”
- “How does this dashboard change your on-call MTTR?”
Hints in Layers
Hint 1: Attach to a small subset of events first
Hint 2: Keep trace buffer bounded
Hint 3: Normalize metadata keys across events
Hint 4: Add one-click trace export for postmortems
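For Hint 2, a bounded per-trace buffer keeps memory flat under event bursts. The sketch below is framework-agnostic and illustrative.

```elixir
# Bounded trace-buffer sketch (illustrative): keep at most @max_events recent
# events per trace_id so telemetry bursts cannot grow memory without limit.
defmodule TraceBuffer do
  @max_events 500

  def put(buffer, trace_id, event) do
    Map.update(buffer, trace_id, [event], fn events ->
      Enum.take([event | events], @max_events)
    end)
  end

  # Events in arrival order for rendering a timeline.
  def events(buffer, trace_id), do: buffer |> Map.get(trace_id, []) |> Enum.reverse()
end
```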
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Observability design | “Clean Architecture” | System quality attributes |
| Feedback-driven engineering | “The Pragmatic Programmer” | Tracer bullets |
| Incident analysis | “Designing Data-Intensive Applications” | Monitoring and operations |
Common Pitfalls and Debugging
Problem 1: “Trace view misses events”
- Why: Inconsistent trace_id propagation.
- Fix: Inject trace metadata when the signal enters the runtime.
- Quick test: End-to-end trace must contain both signal and directive spans.
Problem 2: “Dashboard slows app”
- Why: Heavy synchronous handlers.
- Fix: Forward telemetry to async workers.
- Quick test: Load test with and without dashboard; compare latency delta.
Definition of Done
- Runtime and trace dashboards show live AgentServer activity
- Event correlation works from request signal to final directive result
- Buffer limits prevent unbounded memory growth
- One incident replay can be reconstructed from captured traces
Project 5: Structured Output Contracts
- File: P05-structured-output-contracts.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Python, TypeScript
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The Micro-SaaS / Pro Tool
- Difficulty: Level 2: Intermediate
- Knowledge Area: Schema-first AI responses, contract tests
- Software or Tool: `ReqLLM.generate_object/4`, NimbleOptions, Zoi
- Main Book: “Clean Architecture”
What you will build: A structured-output gateway that validates every model object against explicit schemas and rejects malformed payloads.
Why it teaches Jido deeply: It transforms model output into typed contracts that strategies can trust.
Core challenges you will face:
- Schema drift across providers -> maps to provider compatibility tests
- Strict vs relaxed mode behavior -> maps to policy decisions
- Error surfacing for retries -> maps to actionable failures
Real World Outcome
$ mix run scripts/p05_structured_output_demo.exs
[schema] ticket={priority:enum,severity:enum,summary:string,actions:list}
[request] model=openai:gpt-4o-mini mode=strict
[result] validation=ok object={"priority":"high","severity":"s2","summary":"db timeout","actions":["restart pool"]}
[request] model=anthropic:claude-haiku-4-5 mode=strict
[result] validation=error field=severity reason="not in enum"
[policy] retry_with_repair_prompt=true attempt=2
[result] validation=ok
The Core Question You Are Answering
“How do I treat LLM output like API data instead of free-form text?”
Concepts You Must Understand First
- Schema compilation and validation
- Book Reference: “Clean Architecture” - data contracts
- Provider-specific structured output modes
- Book Reference: “The Pragmatic Programmer” - adaptability
- Retry with targeted repair prompts
- Book Reference: “Designing Data-Intensive Applications” - robust pipelines
Questions to Guide Your Design
- Which fields are hard-required vs optional defaults?
- How do you present validation errors to strategy state?
- What repair strategy is deterministic and bounded?
Thinking Exercise
Create three malformed payload examples and map each to a repair action.
The Interview Questions They Will Ask
- “Why not parse JSON manually?”
- “How do you prevent silent schema downgrades?”
- “What’s your strict-mode fallback?”
- “How do you test cross-provider parity?”
- “How do you avoid infinite repair loops?”
Hints in Layers
Hint 1: Start with one small schema
Hint 2: Capture field-level validation errors
Hint 3: Build repair prompts from validation failures
Hint 4: Add provider matrix tests for the same schema
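A minimal sketch of the parse-then-validate split, assuming Jason for JSON decoding; the ticket fields mirror the demo output above and the module name is hypothetical. The field-level error tuple is exactly what Hint 3's repair prompt should be built from:

```elixir
defmodule MyApp.TicketContract do
  @priorities ~w(low medium high)
  @severities ~w(s1 s2 s3 s4)

  # Stage 1: JSON syntax. Stage 2: business schema. Never conflate the two.
  def from_llm(raw) when is_binary(raw) do
    with {:ok, decoded} <- Jason.decode(raw),
         :ok <- validate(decoded) do
      {:ok, decoded}
    else
      {:error, %Jason.DecodeError{} = e} -> {:error, {:parse, Exception.message(e)}}
      {:error, field, reason} -> {:error, {:validation, field, reason}}
    end
  end

  defp validate(%{"priority" => p, "severity" => s, "summary" => sum, "actions" => acts})
       when is_binary(sum) and is_list(acts) do
    cond do
      p not in @priorities -> {:error, "priority", "not in enum"}
      s not in @severities -> {:error, "severity", "not in enum"}
      true -> :ok
    end
  end

  defp validate(_), do: {:error, "root", "missing required fields"}
end
```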
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Contracts and boundaries | “Clean Architecture” | Entities and DTOs |
| Robust parsing | “The Pragmatic Programmer” | Design by contracts |
| Data pipeline reliability | “Designing Data-Intensive Applications” | Data quality |
Common Pitfalls and Debugging
Problem 1: “Valid JSON but invalid business object”
- Why: JSON parse success mistaken for schema success.
- Fix: Separate parse and validation stages.
- Quick test: Invalid enum must fail despite valid JSON syntax.
Problem 2: “Repair prompt makes object worse”
- Why: Entire object rewritten each retry.
- Fix: Ask model to patch only failed fields.
- Quick test: Preserve unchanged fields across retries.
Definition of Done
- Structured outputs are validated against explicit schemas
- Invalid payloads generate field-level errors and bounded retries
- Same schema works across at least two providers
- Contract tests prevent silent drift
Project 6: Cost-Aware Model Router
- File: P06-cost-aware-model-router.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: AI FinOps, policy routing, usage telemetry
- Software or Tool: `req_llm` usage metadata, `llm_db`, `:telemetry`
- Main Book: “Designing Data-Intensive Applications”
What you will build: A router that automatically chooses model/provider by budget, latency target, and required capabilities.
Why it teaches Jido deeply: It turns cost from an afterthought into a deterministic control variable in strategy execution.
Core challenges you will face:
- Capability constraints vs budget constraints -> maps to policy precedence
- Rolling cost windows -> maps to stateful telemetry aggregation
- Downgrade safety -> maps to quality guardrails
Real World Outcome
$ mix run scripts/p06_cost_router_demo.exs
[policy] minute_budget_usd=0.05 require={tools:true,json:true}
[route] req=1 model=openai:gpt-4o-mini projected_cost=0.0031
[usage] req=1 total_cost=0.0034 rolling_minute=0.0034
[route] req=7 model=anthropic:claude-haiku-4-5 reason=budget_pressure
[usage] req=7 total_cost=0.0012 rolling_minute=0.0468
[route] req=8 model=anthropic:claude-haiku-4-5 reason=capabilities_ok_budget_guard
[alert] budget_near_limit=true
The Core Question You Are Answering
“How do I keep quality acceptable while preventing runaway model spend?”
Concepts You Must Understand First
- Usage and cost fields from `ReqLLM.Response`
- Book Reference: “Designing Data-Intensive Applications” - metrics and feedback
- Capability-based selection (`tools`, `json`, `streaming`)
- Book Reference: “Clean Architecture” - policy decisions
- Rolling-window aggregation
- Book Reference: “Algorithms, Fourth Edition” - sliding windows
Questions to Guide Your Design
- Which constraints are hard stops vs soft preferences?
- How do you model budget by tenant/team/request class?
- What quality fallback happens when only cheap models remain?
Thinking Exercise
Define a policy table for three request classes: critical, normal, batch.
The Interview Questions They Will Ask
- “How do you prevent policy oscillation between models?”
- “How do you price unknown models?”
- “What is your outage strategy if cheap models fail?”
- “How do you audit routing fairness across tenants?”
- “How do you test budget logic deterministically?”
Hints in Layers
Hint 1: Start with static cost metadata
Hint 2: Add rolling-minute state next
Hint 3: Keep routing reason in every response
Hint 4: Add canary quality checks before aggressive downgrades
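A minimal sketch of the rolling-minute window from Hint 2, kept as plain data so budget decisions stay deterministic and unit-testable; module and field names are illustrative:

```elixir
defmodule MyApp.SpendWindow do
  @window_ms 60_000

  # entries: list of {timestamp_ms, cost_usd}, newest first
  def record(entries, cost_usd, now_ms \\ System.system_time(:millisecond)) do
    prune([{now_ms, cost_usd} | entries], now_ms)
  end

  def rolling_total(entries, now_ms \\ System.system_time(:millisecond)) do
    entries
    |> prune(now_ms)
    |> Enum.reduce(0.0, fn {_ts, cost}, acc -> acc + cost end)
  end

  # The router consults this before choosing a model and records the reason code.
  def over_budget?(entries, budget_usd, now_ms \\ System.system_time(:millisecond)) do
    rolling_total(entries, now_ms) >= budget_usd
  end

  defp prune(entries, now_ms) do
    Enum.take_while(entries, fn {ts, _cost} -> now_ms - ts <= @window_ms end)
  end
end
```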
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Feedback control in systems | “Designing Data-Intensive Applications” | Monitoring and adaptation |
| Policy layering | “Clean Architecture” | Use-case policy |
| Sliding window math | “Algorithms, Fourth Edition” | Data structures for streams |
Common Pitfalls and Debugging
Problem 1: “Budget exceeded despite guard”
- Why: Guard checks pre-request only; post-request costs ignored.
- Fix: Update rolling window on completion and re-evaluate next route.
- Quick test: Simulate 100 requests; budget breach must trigger downgrade.
Problem 2: “Low-cost routing breaks output quality”
- Why: Capability checks too coarse.
- Fix: Add per-task quality floors and fallback to capable model when needed.
- Quick test: Regression set with expected JSON correctness.
Definition of Done
- Router chooses model/provider by capability + budget policy
- Rolling spend windows are tracked and exposed in metrics
- Routing decisions include explicit reason codes
- Quality regression suite prevents unsafe downgrades
Project 7: Skill and Plugin Composition Lab
- File: P07-skill-plugin-composition-lab.md
- Main Programming Language: Elixir
- Alternative Programming Languages: TypeScript, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The Micro-SaaS / Pro Tool
- Difficulty: Level 3: Advanced
- Knowledge Area: Capability composition, state isolation, prompt skills
- Software or Tool: `Jido.Plugin`, `Jido.AI.Skill`, skill registry/loader
- Main Book: “Domain-Driven Design”
What you will build: An agent composed from multiple plugins (chat, memory, tools) plus loaded SKILL.md capabilities with tool allowlists.
Why it teaches Jido deeply: You learn the difference between runtime capabilities (plugins/actions) and prompt capabilities (skills).
Core challenges you will face:
- Plugin state-key isolation -> maps to modular correctness
- Skill prompt rendering + tool filtering -> maps to controlled tool exposure
- Mount order and lifecycle hooks -> maps to compositional behavior
Real World Outcome
$ mix run scripts/p07_skill_plugin_lab.exs
[plugin] mounted=chat state_key=:chat
[plugin] mounted=memory state_key=:memory
[skill] loaded=incident-analyst allowed_tools=[search_logs,summarize]
[prompt] rendered_skills=1 filtered_tools=2/6
[query] "summarize latest incident and propose next step"
[result] status=ok used_tools=[search_logs,summarize] blocked_tools=[]
The Core Question You Are Answering
“How do I compose many capabilities without creating a tangled agent monolith?”
Concepts You Must Understand First
- Plugin lifecycle (`mount`, `handle_signal`, `transform_result`)
- Book Reference: “Domain-Driven Design” - bounded contexts
- Skill manifests and allowlists
- Book Reference: “Clean Architecture” - policy enforcement
- State isolation by `state_key`
- Book Reference: “Design Patterns” - modular composition
Questions to Guide Your Design
- Which capabilities belong in plugins vs skills?
- How do you detect conflicting signal routes across plugins?
- How do you test that a skill cannot escalate tool permissions?
Thinking Exercise
Model a conflict case where two plugins route the same signal type.
The Interview Questions They Will Ask
- “When do you choose a plugin over a skill?”
- “How do you avoid plugin state collisions?”
- “How do skill allowlists interact with global policy?”
- “How do you version capabilities safely?”
- “How do you test composition order effects?”
Hints in Layers
Hint 1: Start with two plugins and one skill
Hint 2: Log merged route table at startup
Hint 3: Enforce allowed_tools intersection with global policy
Hint 4: Add composition snapshot tests
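A minimal sketch of Hint 3: the effective tool set is the intersection of the skill allowlist and the global policy after canonicalizing names; the alias map and module name are assumptions:

```elixir
defmodule MyApp.ToolPolicy do
  @aliases %{"grep" => "search_logs", "summarise" => "summarize"}

  # A skill can only narrow, never widen, what the global policy already allows.
  def effective_tools(skill_allowlist, global_allowlist) do
    skill = MapSet.new(skill_allowlist, &canonical/1)
    global = MapSet.new(global_allowlist, &canonical/1)

    MapSet.intersection(skill, global) |> MapSet.to_list() |> Enum.sort()
  end

  def canonical(name) when is_atom(name), do: canonical(Atom.to_string(name))
  def canonical(name) when is_binary(name), do: Map.get(@aliases, name, name)
end
```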
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Bounded contexts | “Domain-Driven Design” | Context mapping |
| Composition patterns | “Design Patterns” | Strategy/Decorator |
| Policy control | “Clean Architecture” | Use case boundaries |
Common Pitfalls and Debugging
Problem 1: “Plugin overrides another plugin unexpectedly”
- Why: Route precedence not documented.
- Fix: Emit deterministic route order at boot and assert in tests.
- Quick test: Snapshot route table.
Problem 2: “Skill loads but tools stay unavailable”
- Why: Skill allowlist names don’t match registry tool names.
- Fix: Add canonical tool name adapter.
- Quick test: Validate every allowlisted tool exists in registry.
Definition of Done
- Plugins mount with isolated state keys and no collisions
- Skills load from `SKILL.md` and filter tools correctly
- Route precedence is explicit and tested
- Capability composition passes regression tests
Project 8: Strategy State Machine Switchboard
- File: P08-strategy-state-machine-switchboard.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 4: Expert
- Knowledge Area: Strategy orchestration, finite states, adaptive routing
- Software or Tool: `Jido.AI.Strategies.*`, `Fsmx`, strategy snapshots
- Main Book: “Operating System Concepts”
What you will build: A switchboard agent that routes tasks across ReAct, CoT, ToT, GoT, and Adaptive based on query traits and runtime constraints.
Why it teaches Jido deeply: You treat reasoning strategies as explicit state machines with deterministic transitions and bounded resources.
Core challenges you will face:
- State machine compatibility across strategies -> maps to normalized snapshots
- Switch costs and mode transitions -> maps to orchestration policy
- Preventing strategy thrash -> maps to hysteresis logic
Real World Outcome
$ mix run scripts/p08_strategy_switchboard.exs
[input] query="compare 3 migration plans with risks"
[classifier] tags=[multi_path,tradeoff]
[switch] selected=tree_of_thoughts reason=requires_branching
[state] status=awaiting_llm iteration=1
[result] candidates=3 best_score=0.81
[switch] selected=graph_of_thoughts reason=synthesis_phase
[final] status=completed strategy_path=[tot,got] cost_usd=0.0062
The Core Question You Are Answering
“How do I choose reasoning strategy as a control problem instead of guesswork?”
Concepts You Must Understand First
- Machine states and legal transitions
- Book Reference: “Operating System Concepts” - state models
- Strategy-specific cost profiles
- Book Reference: “Designing Data-Intensive Applications” - resource tradeoffs
- Snapshot-driven orchestration
- Book Reference: “Clean Architecture” - stable contracts
Questions to Guide Your Design
- Which query features trigger strategy changes?
- How do you prevent infinite switching between two strategies?
- What status values are strategy-agnostic and mandatory?
Thinking Exercise
Build a transition matrix for 5 strategies and mark forbidden transitions.
The Interview Questions They Will Ask
- “Why use FSMs for strategy orchestration?”
- “How do you detect and stop oscillation?”
- “How do you compare outputs from different strategy types?”
- “How do you budget token cost across stages?”
- “What if one strategy crashes mid-run?”
Hints in Layers
Hint 1: Start with ReAct vs CoT only
Hint 2: Add one normalized snapshot shape
Hint 3: Track last N strategy choices for hysteresis
Hint 4: Add switch reason and confidence to logs
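A minimal sketch of a legal-transition table plus hysteresis (Hint 3); the strategy atoms and thresholds are illustrative assumptions:

```elixir
defmodule MyApp.StrategySwitchboard do
  @legal %{
    react: [:cot, :tot],
    cot: [:react, :tot],
    tot: [:got, :react],
    got: [:react]
  }
  @max_recent_switches 2

  # history: most recent strategy choices first
  def next(current, candidate, history) do
    cond do
      candidate == current -> {:ok, current}
      candidate not in Map.get(@legal, current, []) -> {:error, :illegal_transition}
      thrashing?(history, candidate) -> {:ok, current}   # hysteresis: stay put
      true -> {:ok, candidate}
    end
  end

  # Too many recent flips back to the same candidate means we are oscillating.
  defp thrashing?(history, candidate) do
    recent = Enum.take(history, 4)
    Enum.count(recent, &(&1 == candidate)) >= @max_recent_switches
  end
end
```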
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| State-machine reasoning | “Operating System Concepts” | Process state models |
| Tradeoff analysis | “Designing Data-Intensive Applications” | Performance and cost |
| Stable interfaces | “Clean Architecture” | Interface contracts |
Common Pitfalls and Debugging
Problem 1: “Switchboard keeps restarting strategies”
- Why: No persisted orchestration state.
- Fix: Store orchestrator snapshot in agent state.
- Quick test: Resume mid-run and assert same strategy path.
Problem 2: “Output format differs by strategy”
- Why: No canonical result schema.
- Fix: Normalize outputs before aggregation.
- Quick test: All strategy outputs pass one schema validator.
Definition of Done
- Switchboard selects strategy by explicit policy
- State transitions are legal and tested
- Strategy switches are logged with reasons and confidence
- Oscillation protection and budget controls are active
Project 9: Worker Pool Load Lab
- File: P09-worker-pool-load-lab.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency bounds, queue pressure, latency SLOs
- Software or Tool: `Jido.Agent.WorkerPool`, Telemetry metrics
- Main Book: “Operating Systems: Three Easy Pieces”
What you will build: A benchmark harness for WorkerPool.call/4 and with_agent/4 under varying pool sizes and overflow settings.
Why it teaches Jido deeply: It exposes the operational tradeoff between cold starts and stateful pooled workers.
Core challenges you will face:
- Pool sizing and overflow strategy -> maps to throughput/latency tuning
- State leakage across checkouts -> maps to reset discipline
- Timeout layering -> maps to checkout vs call timeout separation
Real World Outcome
$ mix run scripts/p09_worker_pool_bench.exs
[config] pool=:search size=8 max_overflow=4 strategy=lifo
[load] rps=120 duration=60s
[status] available=0 checked_out=8 overflow=3
[metric] p50=31ms p95=89ms p99=144ms timeout_rate=0.7%
[warning] overflow_active=true recommendation="increase size to 10 or lower call_timeout"
The Core Question You Are Answering
“How do I bound concurrency without sacrificing latency under bursts?”
Concepts You Must Understand First
- Worker pool semantics and checkout lifecycle
- Book Reference: “Operating Systems: Three Easy Pieces” - scheduling
- Stateful worker reuse risks
- Book Reference: “The Linux Programming Interface” - process/resource lifecycle
- Tail-latency measurement
- Book Reference: “Algorithms, Fourth Edition” - percentile/statistics basics
Questions to Guide Your Design
- What is your target p95 latency and why?
- Which state fields must reset between requests?
- How should overload be signaled to callers?
Thinking Exercise
Calculate initial pool size from expected RPS and mean service time, then validate experimentally.
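A back-of-the-envelope sizing sketch using Little's law; the 120 rps figure comes from the demo output above and the mean service time is an assumption you should replace with a measured value:

```elixir
# Little's law: average busy workers = arrival rate x mean service time.
rps = 120                      # from the load profile above
mean_service_time_s = 0.05     # assumed; substitute your measured mean
avg_busy_workers = rps * mean_service_time_s   # => 6.0
pool_size = ceil(avg_busy_workers * 1.3)       # ~30% burst headroom => 8
IO.puts("start with pool size #{pool_size}, then validate under load")
```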
The Interview Questions They Will Ask
- “Why choose `:lifo` vs `:fifo`?”
- “How do you detect leaked checkouts?”
- “How do you model burst capacity?”
- “What does healthy overflow usage look like?”
- “How would you autoscale pool size safely?”
Hints in Layers
Hint 1: Benchmark one pool profile at a time
Hint 2: Record pool status every 5 seconds
Hint 3: Add reset action before each call
Hint 4: Separate timeout errors by phase (checkout vs processing)
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Scheduling and queues | “Operating Systems: Three Easy Pieces” | Scheduling chapters |
| Resource lifecycle | “The Linux Programming Interface” | Process/resource management |
| Performance analysis | “Algorithms, Fourth Edition” | Statistical analysis basics |
Common Pitfalls and Debugging
Problem 1: “Unexpected state from previous request”
- Why: Reused pooled worker state not reset.
- Fix: Add deterministic reset signal in `with_agent` transaction.
- Quick test: Repeated calls must produce independent outcomes.
Problem 2: “Timeouts despite low CPU”
- Why: Checkout timeout too short for burst queue.
- Fix: Tune checkout timeout or increase pool size.
- Quick test: Compare timeout rate across timeout values.
Definition of Done
- Benchmark report includes p50/p95/p99 and timeout breakdown
- Pool status metrics are captured and graphed
- State reset strategy prevents cross-request contamination
- Capacity recommendation is justified by measured data
Project 10: Persistent Thread Memory
- File: P10-persistent-thread-memory.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Checkpointing, journal pointers, resume safety
- Software or Tool: `Jido.Storage.File`, `Jido.Agent.Persistence.hibernate/4`, `Jido.Agent.Persistence.thaw/3`
- Main Book: “The Linux Programming Interface”
What you will build: A memory-capable agent that survives restarts using thread journal + checkpoint pointer invariants.
Why it teaches Jido deeply: It forces you to reason about durability correctness (thread_rev pointer) instead of just serialization convenience.
Core challenges you will face:
- Checkpoint/thread consistency -> maps to thaw safety checks
- Custom checkpoint/restore callbacks -> maps to backward compatibility
- Manual vs auto lifecycle (InstanceManager) -> maps to runtime control
Real World Outcome
$ mix run scripts/p10_persistence_demo.exs
[state] session=user-123 messages=5 thread_rev=42
[persist] hibernate=true adapter=Jido.Storage.File path=priv/jido/storage
[simulate] process_restart=true
[restore] thaw=true session=user-123 thread_rev=42
[check] last_message="deploy approved" pointer_match=true
The Core Question You Are Answering
“How do I persist conversational agent state without corrupting event history?”
Concepts You Must Understand First
- Checkpoint pointer invariant (`thread_id`, `thread_rev`)
- Book Reference: “The Linux Programming Interface” - file/data integrity
- Journal-first then snapshot write ordering
- Book Reference: “Designing Data-Intensive Applications” - log + snapshot pattern
- Restore-time mismatch handling
- Book Reference: “Clean Architecture” - explicit error boundaries
Questions to Guide Your Design
- Which state is durable vs ephemeral?
- How do you migrate checkpoint schema versions?
- What do you do on `:thread_mismatch`?
Thinking Exercise
Draw recovery flow for three cases: happy path, missing thread, revision mismatch.
The Interview Questions They Will Ask
- “Why not embed full thread in checkpoint?”
- “How do you guarantee replay consistency?”
- “What migration strategy do you use for checkpoint versions?”
- “How do you test crash recovery deterministically?”
- “When do you use auto hibernation?”
Hints in Layers
Hint 1: Keep checkpoint schema versioned
Hint 2: Never persist transient cache fields
Hint 3: Verify pointer revision during thaw
Hint 4: Build one crash-restart integration test
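A minimal sketch of the thaw-time pointer check from Hint 3: the checkpoint pointer must agree with the journal head before restored state is accepted; field and module names are assumptions modeled on the demo output:

```elixir
defmodule MyApp.ThawGuard do
  # checkpoint: %{thread_id: ..., thread_rev: ...}
  # journal_head: %{id: ..., rev: ...}
  def verify(%{thread_id: cp_thread, thread_rev: cp_rev}, %{id: journal_thread, rev: journal_rev}) do
    cond do
      cp_thread != journal_thread ->
        {:error, :thread_mismatch}

      cp_rev > journal_rev ->
        # Checkpoint claims events the journal never recorded: refuse to thaw.
        {:error, {:revision_ahead_of_journal, cp_rev, journal_rev}}

      true ->
        # Safe: restore snapshot, then replay journal entries cp_rev..journal_rev.
        {:ok, %{replay_from: cp_rev, replay_to: journal_rev}}
    end
  end
end
```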
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Durable state handling | “The Linux Programming Interface” | File and process durability |
| Log/snapshot systems | “Designing Data-Intensive Applications” | Storage and recovery |
| Migration design | “Clean Architecture” | Evolution of interfaces |
Common Pitfalls and Debugging
Problem 1: “Agent restores but history is missing”
- Why: Thread pointer not persisted or thread not flushed.
- Fix: Flush journal before checkpoint write.
- Quick test: Assert non-empty thread after thaw.
Problem 2: “Restore crashes after schema changes”
- Why: Unversioned checkpoint payload.
- Fix: Add `version` field and migration path.
- Quick test: Restore old fixture checkpoint in CI.
Definition of Done
- Hibernate/thaw works across process restart
- Thread pointer and revision checks pass
- Checkpoint schema is versioned and migration-tested
- Crash-recovery test proves durable behavior
Project 11: Sensor-Driven Incident Triage
- File: P11-sensor-driven-incident-triage.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Sensor runtime, event ingestion, triage workflows
- Software or Tool: `Jido.Sensor.Runtime`, `JidoCode.GitHub.Sensors.WebhookSensor`
- Main Book: “The Pragmatic Programmer”
What you will build: A webhook-to-signal ingestion pipeline where a sensor emits github.issue.* signals that trigger triage agents.
Why it teaches Jido deeply: You bridge external systems into Jido’s deterministic loop with clear transformation contracts.
Core challenges you will face:
- Polling and idempotent delivery marking -> maps to robust sensor design
- Signal type normalization -> maps to routing reliability
- Backpressure on burst deliveries -> maps to batching strategy
Real World Outcome
$ mix run scripts/p11_sensor_triage_demo.exs
[sensor] started name=github_webhook poll_interval=5000 batch_size=10
[poll] pending_deliveries=3
[emit] type=github.issue.opened repo=acme/api delivery_id=del_01
[route] coordinator=issue_run_coordinator signal=issue.start
[triage] classification=bug severity=s2
[ack] delivery_id=del_01 status=processed
The Core Question You Are Answering
“How do I ingest real-world events into agents without losing or duplicating work?”
Concepts You Must Understand First
- Sensor callbacks and directives (`:schedule`, `:emit`)
- Book Reference: “The Pragmatic Programmer” - automation reliability
- Idempotent processing markers
- Book Reference: “Designing Data-Intensive Applications” - exactly/at-most-once tradeoffs
- Batch poll patterns
- Book Reference: “Algorithms, Fourth Edition” - batching and queues
Questions to Guide Your Design
- What delivery state model do you need (`pending`, `processed`, `failed`)?
- How do you route event type/action into signal names?
- How do you handle partial batch failures?
Thinking Exercise
Model a burst of 200 webhook events and design batching + retry strategy.
The Interview Questions They Will Ask
- “Why use a polling sensor instead of direct webhook handlers?”
- “How do you avoid reprocessing after crash?”
- “How do you map external event schemas safely?”
- “What happens if marking processed fails?”
- “How would you scale this for many repos?”
Hints in Layers
Hint 1: Build a deterministic `build_signal_type/2` helper
Hint 2: Mark processed only after successful emit path
Hint 3: Capture per-batch metrics
Hint 4: Add dead-letter queue for repeated failures
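A minimal sketch of Hint 1's helper: a pure, deterministic mapping from GitHub event name and action to a signal type; the event list and module name are illustrative:

```elixir
defmodule MyApp.GitHubSignals do
  @known_events ~w(issues issue_comment pull_request push)

  # "issues" + "opened" -> "github.issue.opened"
  def build_signal_type(event, action) when event in @known_events and is_binary(action) do
    base = event |> String.trim_trailing("s") |> String.replace("_", ".")
    {:ok, "github." <> base <> "." <> action}
  end

  # Unknown events never become signals silently; route them to a dead letter.
  def build_signal_type(event, _action), do: {:error, {:unknown_event, event}}
end
```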
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Reliable automation | “The Pragmatic Programmer” | Pragmatic automation |
| Event delivery guarantees | “Designing Data-Intensive Applications” | Messaging semantics |
| Queue/batch behavior | “Algorithms, Fourth Edition” | Queue structures |
Common Pitfalls and Debugging
Problem 1: “Same delivery processed twice”
- Why: Non-atomic mark-processed flow.
- Fix: Guard by delivery status and idempotency key.
- Quick test: Re-run same batch; no duplicate downstream artifacts.
Problem 2: “Sensor floods agent with bursts”
- Why: Batch size too high and no pacing.
- Fix: Tune batch size and poll interval; add per-batch limit.
- Quick test: Measure queue depth under synthetic burst.
Definition of Done
- Sensor emits correct `github.*` signal types from delivery records
- Delivery marking is idempotent and crash-safe
- Burst handling maintains bounded queue growth
- End-to-end triage run is triggered from emitted signal
Project 12: Cron Autonomous Maintenance
- File: P12-cron-autonomous-maintenance.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Scheduling, idempotency, recurring operations
- Software or Tool: `Directive.cron`, `Directive.schedule`, `Directive.cron_cancel`
- Main Book: “The Linux Programming Interface”
What you will build: A maintenance agent that runs recurring health jobs, emits reports, and avoids duplicate work during restarts.
Why it teaches Jido deeply: You learn Jido’s timer semantics (in-memory, at-most-once) and design safe idempotent tasks.
Core challenges you will face:
- Missed-run handling -> maps to explicit last-run state
- Cron upsert semantics -> maps to job id lifecycle
- Safe cancellation -> maps to operational controls
Real World Outcome
$ mix run scripts/p12_cron_maintenance_demo.exs
[setup] cron job_id=:nightly_health expr="0 2 * * *" timezone=Etc/UTC
[tick] signal=maintenance.run run_id=run_2026_02_12
[task] checks={queue_depth,dead_letters,cost_spend} status=ok
[state] last_run_at=2026-02-12T02:00:01Z report_count=14
[control] cron_cancel job_id=:nightly_health result=ok
The Core Question You Are Answering
“How do I run autonomous recurring jobs safely when timers are non-persistent?”
Concepts You Must Understand First
- Schedule/Cron at-most-once semantics
- Book Reference: “The Linux Programming Interface” - timer behavior
- Idempotency keys for recurring jobs
- Book Reference: “Designing Data-Intensive Applications” - idempotent processing
- Timezone and job identity
- Book Reference: “The Pragmatic Programmer” - operational correctness
Questions to Guide Your Design
- Which jobs are safe to skip vs must replay externally?
- How do you generate deterministic run IDs?
- How do you disable a job in emergencies?
Thinking Exercise
Simulate crash at 01:59:59 for a 02:00 cron job and design compensating logic.
The Interview Questions They Will Ask
- “What guarantees does Jido Cron provide and not provide?”
- “How do you avoid duplicate reports?”
- “How do you handle timezone drift across environments?”
- “When would you switch to an external scheduler (Oban/Quantum)?”
- “How do you test cron behavior deterministically?”
Hints in Layers
Hint 1: Use explicit job_id always
Hint 2: Persist last_run_at and dedupe key
Hint 3: Separate scheduler from business action
Hint 4: Add manual trigger signal for debugging
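A minimal sketch of Hints 1-2: run IDs derived deterministically from the job id and the scheduled tick, so a retried or duplicated tick cannot produce a second report; module and field names are hypothetical:

```elixir
defmodule MyApp.MaintenanceRuns do
  def run_id(job_id, %DateTime{} = scheduled_at) do
    "run_" <> Calendar.strftime(scheduled_at, "%Y_%m_%d_%H%M") <> "_" <> Atom.to_string(job_id)
  end

  # Execute the job body only when this tick's run id has not been recorded yet.
  def execute_once(state, job_id, scheduled_at, fun) when is_function(fun, 0) do
    id = run_id(job_id, scheduled_at)

    if id in Map.get(state, :completed_runs, []) do
      {:skipped, state}
    else
      result = fun.()
      {result, Map.update(state, :completed_runs, [id], &[id | &1])}
    end
  end
end
```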
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Timer behavior | “The Linux Programming Interface” | Time and timers |
| Idempotent jobs | “Designing Data-Intensive Applications” | Reliable batch processing |
| Practical operations | “The Pragmatic Programmer” | Automation discipline |
Common Pitfalls and Debugging
Problem 1: “Cron runs twice after config reload”
- Why: Duplicate job IDs or duplicate registration path.
- Fix: Enforce single registration and rely on upsert semantics.
- Quick test: Reload config repeatedly; only one run per schedule.
Problem 2: “Expected run missing after restart”
- Why: In-memory timers do not catch up.
- Fix: Add startup reconciliation check based on `last_run_at`.
- Quick test: Restart before tick and verify compensating run policy.
Definition of Done
- Recurring jobs run with explicit job IDs and timezone configuration
- Idempotency prevents duplicate side effects
- Missed-run policy is documented and tested
- Jobs can be cancelled and resumed operationally
Project 13: Distributed Netsplit Recovery Drill
- File: P13-distributed-netsplit-recovery.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The Open Core Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed BEAM, partition handling, reconciliation
- Software or Tool: Distributed Erlang, `jido_signal` replay/snapshots
- Main Book: “Erlang and OTP in Action”
What you will build: A two-node Jido deployment that detects netsplits, buffers/replays missed signals, and reconciles child-agent state after healing.
Why it teaches Jido deeply: You practice failure-first operations where message ordering and eventual consistency matter more than happy-path throughput.
Core challenges you will face:
- Detecting `nodedown` and degraded mode entry -> maps to cluster health policy
- Signal backlog replay and dedupe -> maps to idempotency and causal ordering
- Parent/child state reconciliation -> maps to explicit merge strategies
Real World Outcome
$ iex --sname node_a -S mix
$ iex --sname node_b -S mix
[node_a] connected=node_b@127.0.0.1 status=healthy
[inject] simulate_netsplit=true
[node_a] event=nodedown node=node_b@127.0.0.1 mode=degraded
[node_a] buffered_signals=17
[heal] nodeup=node_b@127.0.0.1
[replay] replayed=17 deduped=3 failed=0
[reconcile] children_synced=true divergence=0
The Core Question You Are Answering
“How do I keep autonomous workflows safe when the cluster partitions?”
Concepts You Must Understand First
- Netsplit failure modes
- Book Reference: “Erlang and OTP in Action” - distributed nodes
- Replayable event logs and dedupe keys
- Book Reference: “Designing Data-Intensive Applications” - log-based recovery
- Conflict resolution strategies
- Book Reference: “Domain-Driven Design” - aggregate consistency
Questions to Guide Your Design
- Which actions are safe during degraded mode?
- How do you order replay after healing?
- What conflict resolution rule wins on diverged state?
Thinking Exercise
Write a recovery runbook with 5 steps from nodedown to steady state.
The Interview Questions They Will Ask
- “How do you distinguish slow node from partition?”
- “How do you guarantee replay idempotency?”
- “What data can diverge and why?”
- “How do you test netsplits in CI?”
- “When do you abort automation and require human approval?”
Hints in Layers
Hint 1: Start with read-only degraded mode
Hint 2: Buffer signals with monotonic sequence IDs
Hint 3: Reconcile before accepting new writes
Hint 4: Add post-heal consistency check command
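A minimal sketch of netsplit detection using the standard `:net_kernel.monitor_nodes/1` subscription; buffering and replay are left out and the module name is hypothetical:

```elixir
defmodule MyApp.ClusterWatch do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def degraded?, do: GenServer.call(__MODULE__, :degraded?)

  @impl true
  def init(_opts) do
    # Subscribe this process to {:nodeup, node} / {:nodedown, node} messages.
    :ok = :net_kernel.monitor_nodes(true)
    {:ok, %{degraded: false, down_nodes: MapSet.new()}}
  end

  @impl true
  def handle_info({:nodedown, node}, state) do
    down = MapSet.put(state.down_nodes, node)
    {:noreply, %{state | down_nodes: down, degraded: MapSet.size(down) > 0}}
  end

  def handle_info({:nodeup, node}, state) do
    down = MapSet.delete(state.down_nodes, node)
    # A real system would trigger signal replay and reconciliation here.
    {:noreply, %{state | down_nodes: down, degraded: MapSet.size(down) > 0}}
  end

  @impl true
  def handle_call(:degraded?, _from, state), do: {:reply, state.degraded, state}
end
```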
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Distributed BEAM patterns | “Erlang and OTP in Action” | Distribution and supervision |
| Replay and recovery | “Designing Data-Intensive Applications” | Event logs |
| Consistency modeling | “Domain-Driven Design” | Aggregates |
Common Pitfalls and Debugging
Problem 1: “Replay causes duplicate effects”
- Why: Missing idempotency keys for emitted directives.
- Fix: Attach stable dedupe keys to side-effect signals.
- Quick test: Re-run replay twice; no new side effects second time.
Problem 2: “Cluster heals but state still diverged”
- Why: No deterministic merge rule.
- Fix: Define conflict policy (timestamp, version vector, authority node).
- Quick test: Inject divergent writes and verify deterministic winner.
Definition of Done
- Netsplit detection transitions system into safe degraded mode
- Buffered signals replay successfully after heal with dedupe
- State reconciliation policy is deterministic and tested
- Recovery runbook is executable by on-call engineers
Project 14: ETS and Mnesia Hybrid Agent Memory
- File: P14-ets-mnesia-hybrid-agent-memory.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The Open Core Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Hot-cache + durable memory layering
- Software or Tool: ETS tables, Mnesia journal, Jido checkpoint pointers
- Main Book: “Operating Systems: Three Easy Pieces”
What you will build: A hybrid memory subsystem with ETS for low-latency working set and Mnesia for durable event/journal state.
Why it teaches Jido deeply: It demonstrates memory-tier tradeoffs in long-lived agents and explicit promotion/eviction policy.
Core challenges you will face:
- Cache coherence across tiers -> maps to invalidation rules
- Write path durability guarantees -> maps to journal-first policy
- Recovery speed vs correctness -> maps to snapshot cadence
Real World Outcome
$ mix run scripts/p14_hybrid_memory_demo.exs
[mem] ets_hits=1842 ets_misses=211 hit_rate=89.7%
[mem] mnesia_writes=211 checkpoint_interval=500 events
[evict] policy=lru evicted=120
[restart] recover_from=mnesia_journal restored_entries=211
[check] consistency=ok cache_warmup_ms=340
The Core Question You Are Answering
“How do I get fast memory access without sacrificing restart safety?”
Concepts You Must Understand First
- ETS strengths and limits
- Book Reference: “Operating Systems: Three Easy Pieces” - in-memory data access
- Mnesia durability semantics
- Book Reference: “Erlang and OTP in Action” - distributed storage basics
- Snapshot and replay tradeoffs
- Book Reference: “Designing Data-Intensive Applications” - storage architecture
Questions to Guide Your Design
- Which keys belong in hot cache only vs durable store?
- When do you flush and checkpoint?
- How do you verify consistency after restart?
Thinking Exercise
Model a failure during write path and decide what can be lost.
The Interview Questions They Will Ask
- “Why hybrid instead of one store?”
- “How do you avoid stale cache reads?”
- “What is your crash-consistency model?”
- “How do you tune checkpoint frequency?”
- “How do you test tier coherence?”
Hints in Layers
Hint 1: Implement read-through cache first
Hint 2: Use write-ahead journal before cache mutation
Hint 3: Add periodic consistency sweeps
Hint 4: Measure warmup time and hit-rate separately
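A minimal sketch of Hint 1's read-through cache, with ETS as the hot tier and a caller-supplied loader standing in for the durable store; eviction and journaling are omitted and names are illustrative:

```elixir
defmodule MyApp.HotCache do
  @table :agent_hot_cache

  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  def fetch(key, loader) when is_function(loader, 0) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        {:hit, value}

      [] ->
        value = loader.()                 # read from the durable tier
        :ets.insert(@table, {key, value}) # promote into the hot tier
        {:miss, value}
    end
  end

  # Called whenever the durable tier changes a key (invalidation rule).
  def invalidate(key), do: :ets.delete(@table, key)
end
```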
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory hierarchy thinking | “Operating Systems: Three Easy Pieces” | Memory chapters |
| Durable logs and snapshots | “Designing Data-Intensive Applications” | Storage engines |
| BEAM data systems | “Erlang and OTP in Action” | Mnesia/distribution |
Common Pitfalls and Debugging
Problem 1: “Cache returns old value after restart”
- Why: Cache restored from stale snapshot without replay.
- Fix: Replay journal deltas after snapshot load.
- Quick test: Verify latest version after forced restart.
Problem 2: “Writes succeed but disappear”
- Why: Cache mutated before durable write commit.
- Fix: Journal-first write policy.
- Quick test: Crash immediately after write; verify persisted value.
Definition of Done
- Hybrid memory read/write path is implemented with explicit tier policy
- Restart recovery restores durable state and warms cache safely
- Hit-rate and warmup metrics are observable
- Crash-consistency tests pass
Project 15: LiveView Human-in-the-Loop Control Center
- File: P15-liveview-hitl-control-center.md
- Main Programming Language: Elixir
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Human approval gates, operator UX, traceability
- Software or Tool: Phoenix LiveView, `jido_studio`, resolver-based access control
- Main Book: “The Pragmatic Programmer”
What you will build: A HITL control center where risky directives pause for operator review (approve/reject/escalate) before execution.
Why it teaches Jido deeply: It connects autonomous agent loops to real governance and operational accountability.
Core challenges you will face:
- Pause/resume semantics for pending directives -> maps to strategy status transitions
- Role-based access (`:all`, `:read_only`) -> maps to resolver policy
- Audit trails for approvals -> maps to compliance-grade observability
Real World Outcome
Users open /studio, inspect pending actions, and choose approval outcomes with real-time state updates.
Screen behavior:
- Queue tab shows pending directives with risk score and policy reason.
- Action panel offers `Approve`, `Reject`, `Escalate`.
- Timeline records operator identity, decision, and resulting signal.
$ mix phx.server
[studio] mounted at /studio resolver=MyApp.StudioResolver
[approval] directive_id=dir_77 risk=high status=pending
[user] role=admin action=approve
[signal] type=policy.approved directive_id=dir_77
[result] directive_executed=true audit_event_id=audit_901
The Core Question You Are Answering
“How do I keep humans in control of high-risk autonomous actions without slowing everything down?”
Concepts You Must Understand First
- Approval-state modeling
- Book Reference: “Domain-Driven Design” - aggregate state transitions
- LiveView real-time UX constraints
- Book Reference: “The Pragmatic Programmer” - user feedback loops
- Access control resolvers
- Book Reference: “Foundations of Information Security” - authorization
Questions to Guide Your Design
- Which directive types require human approval?
- What timeout policy applies to unreviewed items?
- How do you make audit logs immutable and searchable?
Thinking Exercise
Define risk categories and required approver role for each.
The Interview Questions They Will Ask
- “How do you prevent bypassing approval gates?”
- “How do you handle stale approvals?”
- “How do you model multi-approver workflows?”
- “How do you design for low-latency operator feedback?”
- “What belongs in the audit event schema?”
Hints in Layers
Hint 1: Start with one approval queue
Hint 2: Represent decisions as signals, not direct mutations
Hint 3: Gate only risky directives first
Hint 4: Add role-based UI states (read-only vs approve)
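A minimal sketch of the server-side check behind Hint 4 and the Problem 2 fix below: every decision passes through one pure authorization function regardless of what the UI rendered; the roles and decision atoms are assumptions:

```elixir
defmodule MyApp.ApprovalPolicy do
  @approver_roles [:admin, :sre]

  def authorize(%{role: role}, :approve) when role in @approver_roles, do: :ok
  def authorize(%{role: role}, :reject) when role in @approver_roles, do: :ok
  def authorize(%{role: _role}, :escalate), do: :ok   # any authenticated role may escalate
  def authorize(_actor, _decision), do: {:error, :forbidden}
end
```

The LiveView `handle_event/3` callback then calls `authorize/2` before emitting the decision signal, so a read-only session that somehow renders the button still cannot approve.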
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Workflow modeling | “Domain-Driven Design” | Aggregates and invariants |
| Human-centered operations | “The Pragmatic Programmer” | Feedback and automation |
| Authorization principles | “Foundations of Information Security” | Access controls |
Common Pitfalls and Debugging
Problem 1: “Approved action executes twice”
- Why: Duplicate approval signals.
- Fix: Idempotent approval decision key.
- Quick test: Re-submit same decision; only first takes effect.
Problem 2: “Read-only users can approve”
- Why: Resolver not enforced server-side.
- Fix: Check access in action handlers, not only UI.
- Quick test: Attempt approval with read-only role; assert forbidden.
Definition of Done
- Risky directives pause for explicit human decisions
- Role-based access control is enforced server-side
- Approval/rejection events are audit logged
- End-to-end approve and reject paths are tested
Project 16: Tool Permission Firewall
- File: P16-tool-permission-firewall.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Rust (sandbox), Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The Open Core Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Policy engines, sandbox boundaries, least privilege
- Software or Tool: `Jido.AI.Skill` allowlists, `jido_claude` allowed_tools, policy middleware
- Main Book: “Foundations of Information Security”
What you will build: A tool firewall enforcing per-agent and per-skill permission policies with reasoned deny events.
Why it teaches Jido deeply: It operationalizes least privilege at directive execution time.
Core challenges you will face:
- Merging policy layers (global, agent, skill, user role) -> maps to deterministic precedence
- Context-aware approvals for risky tools -> maps to runtime policy hooks
- Clear deny diagnostics -> maps to debuggable security posture
Real World Outcome
$ mix run scripts/p16_tool_firewall_demo.exs
[policy] global_allow=[Read,Glob,Grep] global_deny=[Bash,FsWrite]
[request] tool=Bash actor=agent/researcher
[decision] denied code=tool_not_allowed reason="blocked by global policy"
[signal] type=policy.tool.denied tool=Bash
[request] tool=Read actor=agent/researcher
[decision] allowed
The Core Question You Are Answering
“How do I guarantee an agent cannot execute tools outside policy even if prompted to?”
Concepts You Must Understand First
- Least-privilege tool design
- Book Reference: “Foundations of Information Security” - access control
- Policy precedence rules
- Book Reference: “Clean Architecture” - policy layers
- Security observability signals
- Book Reference: “The Pragmatic Programmer” - operational diagnostics
Questions to Guide Your Design
- What policy source has highest precedence?
- Which denied actions need escalation instead of silent block?
- How do you prove policy tamper resistance?
Thinking Exercise
Write a precedence matrix: global, tenant, skill, session override.
The Interview Questions They Will Ask
- “Where is policy enforced: prompt, strategy, or runtime?”
- “How do you prevent policy bypass through aliases?”
- “How do you audit denied tool attempts?”
- “How do you support temporary emergency exceptions?”
- “How do you test firewall correctness?”
Hints in Layers
Hint 1: Canonicalize tool names first
Hint 2: Emit structured deny events
Hint 3: Keep policy evaluation pure and testable
Hint 4: Add shadow-mode policy before enforcement rollout
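A minimal sketch of a pure, layered evaluation: canonicalize first (Hint 1), then walk layers from highest precedence down, with deny beating allow inside a layer and the first layer with an opinion winning; the reason code feeds the structured deny signal from Hint 2. All names and the alias rule are illustrative assumptions:

```elixir
defmodule MyApp.ToolFirewall do
  # layers: keyword-style list of {layer_name, %{allow: [...], deny: [...]}},
  # ordered highest precedence first.
  def evaluate(tool, layers) do
    tool = canonical(tool)

    Enum.find_value(layers, {:error, :tool_not_allowed, :no_matching_policy}, fn {layer, policy} ->
      cond do
        tool in Map.get(policy, :deny, []) -> {:error, :tool_not_allowed, layer}
        tool in Map.get(policy, :allow, []) -> {:ok, layer}
        true -> nil   # no opinion: fall through to the next layer
      end
    end)
  end

  defp canonical("bash"), do: "Bash"
  defp canonical(tool), do: tool
end

# MyApp.ToolFirewall.evaluate("Bash", [
#   {:session, %{allow: []}},
#   {:skill, %{allow: ["Read", "Grep"]}},
#   {:global, %{deny: ["Bash", "FsWrite"], allow: ["Read", "Glob", "Grep"]}}
# ])
# #=> {:error, :tool_not_allowed, :global}
```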
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Access control | “Foundations of Information Security” | Authorization models |
| Policy engine architecture | “Clean Architecture” | Policy vs detail |
| Operational rollout | “The Pragmatic Programmer” | Incremental deployment |
Common Pitfalls and Debugging
Problem 1: “Allowed tool blocked unexpectedly”
- Why: Policy precedence bug.
- Fix: Return evaluation trace in debug mode.
- Quick test: Unit tests for all precedence combinations.
Problem 2: “Tool alias bypasses deny rule”
- Why: Matching pre-normalization.
- Fix: Normalize aliases to a canonical tool ID before evaluation.
- Quick test: Attempt blocked tool through alias variants.
Definition of Done
- Policy firewall enforces least privilege at runtime
- Denied attempts emit structured security signals
- Policy evaluation precedence is explicit and tested
- Shadow-mode and enforce-mode behavior are both validated
Project 17: Red-Team Evaluation Harness
- File: P17-red-team-evaluation-harness.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 4: Expert
- Knowledge Area: Adversarial testing, scorecards, regression safety
- Software or Tool: `jido_eval` (experimental), custom eval suites, telemetry
- Main Book: “Practical Malware Analysis” (for adversarial mindset)
What you will build: A harness that runs adversarial prompts/tool attacks and scores policy compliance, grounding quality, and recovery behavior.
Why it teaches Jido deeply: It shifts evaluation from anecdotal demos to repeatable security and reliability benchmarks.
Core challenges you will face:
- Scenario design quality -> maps to realistic threat models
- Deterministic scoring despite LLM variance -> maps to rubric design
- Regression gating in CI -> maps to production safety culture
Real World Outcome
$ mix run scripts/p17_redteam_harness.exs
[suite] scenarios=32 categories=[prompt_injection,tool_escalation,schema_fuzz]
[run] case=inj_07 expected=deny_tool actual=deny_tool score=1.0
[run] case=schema_03 expected=repair actual=repair score=1.0
[run] case=ground_04 expected=citations>=2 actual=1 score=0.0
[summary] pass_rate=87.5% critical_failures=1
[gate] ci_status=failed threshold=95%
The Core Question You Are Answering
“How do I know my agent is still safe after every prompt, tool, or model change?”
Concepts You Must Understand First
- Adversarial test taxonomy
- Book Reference: “Practical Malware Analysis” - adversarial patterns mindset
- Scoring rubrics and confidence thresholds
- Book Reference: “Algorithms, Fourth Edition” - scoring/statistics
- Regression gates in delivery pipelines
- Book Reference: “The Pragmatic Programmer” - quality automation
Questions to Guide Your Design
- Which failures are release blockers?
- How do you score partially correct responses?
- How do you keep the suite representative over time?
Thinking Exercise
Design 10 attack scenarios across three categories and define expected safe behavior.
The Interview Questions They Will Ask
- “How do you prevent eval overfitting?”
- “What makes a red-team scenario realistic?”
- “How do you score non-deterministic outputs?”
- “How do you tie evals to release decisions?”
- “How do you prioritize failing scenarios?”
Hints in Layers
Hint 1: Start with 5 high-value scenarios
Hint 2: Separate hard rules from soft quality metrics
Hint 3: Version your eval datasets
Hint 4: Publish trendline metrics over weekly runs
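A minimal sketch of structural scoring per Hint 2: hard rules score boolean facts about the run record rather than free-form wording, so repeated runs stay stable; the record fields are assumptions modeled on the demo output above:

```elixir
defmodule MyApp.RedTeamScore do
  # Prompt-injection cases: the only fact that matters is that the attack was blocked.
  def score(%{category: :prompt_injection} = run) do
    bool(run.tool_denied?)
  end

  # Grounding cases: count citations instead of judging prose quality.
  def score(%{category: :grounding} = run) do
    bool(length(run.citations) >= run.required_citations)
  end

  # Schema-fuzz cases: final validation succeeded within a bounded repair budget.
  def score(%{category: :schema_fuzz} = run) do
    bool(run.final_validation == :ok and run.repair_attempts <= 2)
  end

  defp bool(true), do: 1.0
  defp bool(false), do: 0.0
end
```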
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Adversarial thinking | “Practical Malware Analysis” | Attack mindset |
| Scoring systems | “Algorithms, Fourth Edition” | Metrics/statistics |
| Continuous quality | “The Pragmatic Programmer” | Automation and testing |
Common Pitfalls and Debugging
Problem 1: “Pass rate fluctuates wildly”
- Why: Rubric depends on free-form wording.
- Fix: Score structural signals (policy action, citations, tool usage) first.
- Quick test: Re-run same suite 5x and inspect variance.
Problem 2: “CI too noisy”
- Why: Thresholds not tiered by severity.
- Fix: Separate critical blocker metrics from advisory metrics.
- Quick test: Inject one advisory fail and confirm release policy behavior.
Definition of Done
- Red-team suite runs reproducibly with versioned scenarios
- Critical policy failures gate releases
- Score reports include per-category breakdowns
- Historical trend tracking is available
Project 18: Multimodal Agent Pipeline
- File: P18-multimodal-agent-pipeline.md
- Main Programming Language: Elixir
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The Service and Support Model
- Difficulty: Level 4: Expert
- Knowledge Area: Browser automation, vision input, multimodal context
- Software or Tool: `jido_browser`, `req_llm` multimodal APIs
- Main Book: “Clean Architecture”
What you will build: An agent that navigates a web page, captures screenshot/content, and produces structured incident summaries from multimodal inputs.
Why it teaches Jido deeply: It combines external interaction, extraction, and model reasoning into one controlled pipeline.
Core challenges you will face:
- Session lifecycle for browser tools -> maps to robust setup/cleanup
- Large multimodal context shaping -> maps to token control
- Cross-modal grounding -> maps to verifiable outputs
Real World Outcome
$ mix run scripts/p18_multimodal_pipeline.exs --url https://status.example.com
[browser] session_started adapter=vibium
[navigate] ok url=https://status.example.com
[extract] markdown_chars=9421 screenshot_bytes=183204
[llm] model=openai:gpt-4o-mini input_modalities=[text,image]
[result] severity=s2 affected_services=3 confidence=0.84
[artifact] wrote=artifacts/p18_incident_summary.json
[browser] session_ended=true
The Core Question You Are Answering
“How do I make multimodal agent outputs grounded in what was actually seen on screen?”
Concepts You Must Understand First
- Browser action sequence and waits
- Book Reference: “The Pragmatic Programmer” - automation reliability
- Multimodal message construction
- Book Reference: “Clean Architecture” - adapter patterns
- Output grounding checks
- Book Reference: “Designing Data-Intensive Applications” - data quality
Questions to Guide Your Design
- Which browser actions are mandatory before extraction?
- How do you avoid stale page capture?
- How do you link output claims to screenshot/text evidence?
Thinking Exercise
Design a provenance object linking every final claim to source modality.
The Interview Questions They Will Ask
- “How do you ensure deterministic browser flows?”
- “What is your strategy for long page content?”
- “How do you test multimodal grounding quality?”
- “How do you recover from browser-session failures?”
- “How do you secure browser automation in production?”
Hints in Layers
Hint 1: Add explicit wait_for_selector steps
Hint 2: Capture both markdown and screenshot every run
Hint 3: Use structured output schema for summary
Hint 4: Add failure path for partial extraction
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Reliable automation | “The Pragmatic Programmer” | Automation workflows |
| Adapter architecture | “Clean Architecture” | Interface adapters |
| Data quality controls | “Designing Data-Intensive Applications” | Data correctness |
Common Pitfalls and Debugging
Problem 1: “Vision summary contradicts extracted text”
- Why: Different capture timestamps.
- Fix: Capture all modalities in one atomic step sequence.
- Quick test: Assert same page URL/timestamp in all artifacts.
Problem 2: “Browser process leaks”
- Why: Session not closed on error path.
- Fix: Ensure cleanup runs in an `after` block or termination callback on every exit path.
- Quick test: Stress run 100 sessions; no orphan processes.
Definition of Done
- Browser session lifecycle is reliable under success/failure
- Multimodal prompt includes synchronized text and image artifacts
- Structured summary output is validated and stored
- Grounding/provenance fields are present for key claims
Project 19: Hot Upgrade Release Drill
- File: P19-hot-upgrade-release-drill.md
- Main Programming Language: Elixir/Erlang
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The Open Core Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: OTP release handling, appup/relup, zero-downtime ops
- Software or Tool: OTP releases, `release_handler`, `appup`, `relup`
- Main Book: “Erlang and OTP in Action”
What you will build: A controlled hot-upgrade drill where a running Jido agent system is upgraded with no lost in-flight work and validated rollback path.
Why it teaches Jido deeply: It connects agent runtime guarantees with BEAM’s operational superpower: live upgrades.
Core challenges you will face:
- State transformation between versions -> maps to code_change safety
- Release packaging correctness (`.appup`, `relup`) -> maps to deploy reliability
- Rollback under partial failure -> maps to operational resilience
Real World Outcome
$ _build/prod/rel/my_app/bin/my_app upgrade 0.2.0
[release] install_release from=0.1.0 to=0.2.0
[agent] in_flight_requests=12
[upgrade] code_change module=MyApp.Agent.Runtime result=ok
[upgrade] directives_queue_dropped=0
[health] status=green p95_latency_ms=96
[rollback_test] install_release 0.1.0 result=ok
The Core Question You Are Answering
“Can I evolve a live autonomous system without stopping it or corrupting state?”
Concepts You Must Understand First
- OTP release handling basics (`appup`, `relup`)
- Book Reference: “Erlang and OTP in Action” - releases and operations
- State migration (`code_change/3`)
- Book Reference: “Clean Architecture” - versioned contracts
- Upgrade/rollback runbooks
- Book Reference: “The Pragmatic Programmer” - operational discipline
Questions to Guide Your Design
- Which modules require state transformation?
- What pre-upgrade health checks are mandatory?
- How do you prove no in-flight loss?
Thinking Exercise
Draft a go/no-go checklist for upgrade initiation and rollback trigger criteria.
The Interview Questions They Will Ask
- “What can and cannot be hot-upgraded safely?”
- “How do you test `code_change` paths?”
- “How do you detect silent state corruption post-upgrade?”
- “What is your rollback SLO?”
- “How do you coordinate upgrades across nodes?”
Hints in Layers
Hint 1: Upgrade one non-critical module first
Hint 2: Add explicit state version tag
Hint 3: Track in-flight request counters before/after
Hint 4: Practice rollback drill in staging weekly
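A minimal sketch of Hint 2 applied to `code_change/3`: the runtime state carries an explicit version tag and migrations are written to be reversible. The module name matches the demo output above, but the state fields and migration content are illustrative assumptions, not the actual Jido runtime:

```elixir
defmodule MyApp.Agent.Runtime do
  use GenServer

  @impl true
  def init(opts), do: {:ok, %{version: 2, opts: opts, retry_counts: %{}}}

  # Upgrade path: v2 adds a directive retry counter; default it for live processes.
  @impl true
  def code_change(_old_vsn, %{version: 1} = state, _extra) do
    {:ok, state |> Map.put(:retry_counts, %{}) |> Map.put(:version, 2)}
  end

  # Downgrade path ({:down, vsn}): drop the v2-only field so rollback stays possible.
  def code_change({:down, _vsn}, %{version: 2} = state, _extra) do
    {:ok, state |> Map.delete(:retry_counts) |> Map.put(:version, 1)}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```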
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Release handling | “Erlang and OTP in Action” | OTP releases |
| State evolution | “Clean Architecture” | Interface evolution |
| Ops runbooks | “The Pragmatic Programmer” | Pragmatic operations |
Common Pitfalls and Debugging
Problem 1: “Upgrade succeeds but behavior regresses”
- Why: Missing post-upgrade verification suite.
- Fix: Run synthetic workload immediately after upgrade.
- Quick test: Compare golden workload before/after upgrade.
Problem 2: “Rollback fails due to incompatible state”
- Why: One-way state migration.
- Fix: Design reversible migration where required.
- Quick test: Upgrade then rollback in staging on every release candidate.
Definition of Done
- Hot upgrade succeeds with no dropped in-flight work
- Rollback path is tested and documented
- State migration functions are versioned and validated
- Upgrade runbook includes health gates and abort criteria
Project 20: BEAM Autonomous Ops Swarm
- File: P20-beam-autonomous-ops-swarm.md
- Main Programming Language: Elixir
- Alternative Programming Languages: Erlang, Rust sidecars
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The Industry Disruptor
- Difficulty: Level 5: Master
- Knowledge Area: End-to-end multi-agent operations platform
- Software or Tool: `jido`, `jido_ai`, `jido_signal`, `req_llm`, `jido_studio`, `jido_runic`
- Main Book: “Designing Data-Intensive Applications”
What you will build: A capstone swarm with coordinator + specialist agents handling incidents autonomously with policy gates, dashboards, persistence, and distributed recovery.
Why it teaches Jido deeply: It integrates every core concept into one production-like system under fault injection.
Core challenges you will face:
- Cross-agent protocol design -> maps to typed signals and causality
- Governed autonomy -> maps to permission firewall + HITL control
- Operational resilience -> maps to failover, replay, upgrades, and observability
Real World Outcome
$ mix run scripts/p20_ops_swarm_capstone.exs --scenario incident_simulation
[swarm] agents={coordinator:1,triage:3,repair:4,verify:2}
[incident] id=inc_2026_02_12_01 severity=s2 source=github.issue.opened
[phase] triage -> research -> patch -> quality -> approval -> deploy
[policy] high_risk_action requires_human_approval=true
[operator] approved action=deploy_patch
[resilience] provider_failover=true netsplit_recovered=true
[summary] mttr_minutes=14 cost_usd=0.42 policy_violations=0
[result] status=completed postmortem=artifacts/p20_postmortem.md
The Core Question You Are Answering
“Can I run a policy-governed autonomous operations system that is fast, explainable, and resilient?”
Concepts You Must Understand First
- Multi-agent orchestration and handoffs
- Book Reference: “Designing Data-Intensive Applications” - distributed workflows
- Signal causality and replay
- Book Reference: “Erlang and OTP in Action” - distributed message handling
- Governance and safety controls
- Book Reference: “Foundations of Information Security” - policy and audit
Questions to Guide Your Design
- Which phases are fully autonomous vs approval-gated?
- How do you define success/failure SLOs for the swarm?
- How do you keep postmortems auto-generated and trustworthy?
Thinking Exercise
Define end-to-end SLOs: availability, MTTR, cost budget, policy violation rate.
The Interview Questions They Will Ask
- “How do you keep swarm behavior explainable?”
- “What prevents cascading failures across agents?”
- “How do you enforce global policy across heterogeneous agents?”
- “How do you validate resilience claims?”
- “How do you transition this capstone to production governance?”
Hints in Layers
Hint 1: Build one vertical slice first (triage->approve->notify)
Hint 2: Add specialist agents incrementally
Hint 3: Use one canonical signal schema registry
Hint 4: Run weekly chaos drills and publish metrics
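A minimal sketch of enforcing causal metadata on every signal (the fix for Problem 1 below): one helper stamps `trace_id` and `parent_signal_id` and refuses to emit without a causal parent; field and module names are illustrative:

```elixir
defmodule MyApp.SignalMeta do
  # parent: the signal (or request context) that caused this emission.
  def stamp(signal, %{trace_id: trace_id, signal_id: parent_id}) do
    Map.merge(signal, %{
      id: "sig_" <> Integer.to_string(System.unique_integer([:positive])),
      trace_id: trace_id,
      parent_signal_id: parent_id
    })
  end

  # No causal context means the postmortem chain would break: reject emission.
  def stamp(_signal, _parent), do: {:error, :missing_causal_context}
end
```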
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Distributed platform design | “Designing Data-Intensive Applications” | Reliability and scale |
| BEAM operations | “Erlang and OTP in Action” | Distribution and supervision |
| Governance and controls | “Foundations of Information Security” | Risk and policy |
Common Pitfalls and Debugging
Problem 1: “Swarm finishes but postmortem is incomplete”
- Why: Missing causal links in signal metadata.
- Fix: Enforce `trace_id` and `parent_signal_id` on all signals (see the sketch after these pitfalls).
- Quick test: Postmortem generator must reconstruct the full phase chain.
Problem 2: “Automation stalls at approval boundaries”
- Why: No timeout/escalation policy for pending approvals.
- Fix: Add escalation signals and fallback responders.
- Quick test: Simulate absent operator; verify escalation path.
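A minimal sketch of a canonical signal envelope that carries causal metadata is shown below. The struct and helper are hypothetical (not the `jido_signal` API); the point is that every emitted signal records `trace_id` and `parent_signal_id`, so the postmortem generator can rebuild the full phase chain by following parent links within one trace.

```elixir
# Hypothetical canonical envelope for swarm signals with causal metadata.
defmodule Swarm.SignalEnvelope do
  @enforce_keys [:id, :type, :source, :trace_id]
  defstruct [:id, :type, :source, :trace_id, :parent_signal_id, data: %{}]

  @doc "Builds a child signal that inherits the parent's trace and records causality."
  def child_of(%__MODULE__{} = parent, type, data) do
    %__MODULE__{
      id: unique_id(),
      type: type,
      source: parent.source,
      trace_id: parent.trace_id,    # same trace across the whole incident
      parent_signal_id: parent.id,  # direct causal link for replay/postmortem
      data: data
    }
  end

  defp unique_id, do: Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
end
```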
Definition of Done
- Full swarm run completes with traceable phase transitions
- Policy gates and human approvals are enforced for risky actions
- Resilience drills (failover + netsplit + restart) pass
- Capstone emits measurable SLO report and postmortem artifact
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Signal-Native ReAct Calculator Agent | Level 2 | Weekend | Medium | ★★★★☆ |
| 2. Tool-Governed Web Research Agent | Level 2 | Weekend | Medium | ★★★★☆ |
| 3. Multi-Provider Failover Gateway | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 4. Streaming Observability Console | Level 3 | 1-2 weeks | High | ★★★★★ |
| 5. Structured Output Contracts | Level 2 | Weekend | Medium | ★★★★☆ |
| 6. Cost-Aware Model Router | Level 3 | 1-2 weeks | High | ★★★★★ |
| 7. Skill and Plugin Composition Lab | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 8. Strategy State Machine Switchboard | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 9. Worker Pool Load Lab | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 10. Persistent Thread Memory | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 11. Sensor-Driven Incident Triage | Level 3 | 1-2 weeks | High | ★★★★★ |
| 12. Cron Autonomous Maintenance | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 13. Distributed Netsplit Recovery Drill | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 14. ETS and Mnesia Hybrid Agent Memory | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 15. LiveView HITL Control Center | Level 3 | 1-2 weeks | High | ★★★★☆ |
| 16. Tool Permission Firewall | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 17. Red-Team Evaluation Harness | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 18. Multimodal Agent Pipeline | Level 4 | 2-3 weeks | Very High | ★★★★★ |
| 19. Hot Upgrade Release Drill | Level 5 | 3-4 weeks | Expert | ★★★★★ |
| 20. BEAM Autonomous Ops Swarm | Level 5 | 4-6 weeks | Expert+ | ★★★★★ |
Recommendation
If you are new to Jido/BEAM agents: Start with Project 1, then Project 5, then Project 6 to build deterministic fundamentals plus cost awareness.
If you are an SRE/platform engineer: Start with Project 9, Project 13, and Project 19 to focus on runtime guarantees, partition behavior, and safe upgrades.
If you want to build production AI products quickly: Start with Project 2, Project 4, Project 15, then move to Project 20.
Final Overall Project: Autonomous Reliability Control Plane
The Goal: Combine Projects 3, 6, 13, 16, and 20 into a single autonomous operations system.
- Build multi-provider routing with policy controls and fallback.
- Add distributed supervisor topology with netsplit detection and reconciliation.
- Enforce directive safety gates and human approval paths for high-risk actions.
- Add live observability with usage/cost telemetry and replayable traces.
- Run staged hot-upgrade drills and prove no unsafe state transitions.
Success Criteria: The system remains available and policy-compliant during injected provider failures, child crashes, and node partition events while keeping budget and latency within defined bounds.
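As a starting point for the runtime topology described above, the sketch below (hypothetical module names, standard OTP only) shows a top-level supervision tree that separates agent processes, external-effect tasks, and agent lookup into their own supervised children, so a crash in one subtree cannot take down the others.

```elixir
# A minimal, OTP-only skeleton for the control plane's top-level supervision tree.
defmodule ControlPlane.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Lookup agents by stable id (used for routing signals to the right process).
      {Registry, keys: :unique, name: ControlPlane.AgentRegistry},
      # Supervised tasks for external effects (LLM calls, tools, deploys).
      {Task.Supervisor, name: ControlPlane.EffectTaskSup},
      # One dynamically started child per agent; crashes restart only that agent.
      {DynamicSupervisor, name: ControlPlane.AgentSup, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: ControlPlane.Supervisor)
  end
end
```

Routing, policy gates, and telemetry from Projects 3, 6, and 16 would then slot in as additional supervised children of the same tree.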
From Learning to Production
| Your Project | Production Equivalent | Gap to Fill |
|---|---|---|
| Project 1 | Tool-using assistant microservice | policy and observability hardening |
| Project 3 | Multi-provider inference gateway | enterprise auth + SLA governance |
| Project 6 | AI FinOps control service | org-level budgeting and chargeback |
| Project 13 | Geo-distributed autonomous cluster | formal reconciliation + compliance controls |
| Project 19 | Continuous upgrade pipeline | change management and canary policy |
| Project 20 | Autonomous operations platform | team process, governance, and on-call maturity |
Summary
This learning path covers Jido + BEAM-native agent engineering through 20 hands-on projects.
| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Signal-Native ReAct Calculator Agent | Elixir | Level 2 | 8-12h |
| 2 | Tool-Governed Web Research Agent | Elixir | Level 2 | 10-14h |
| 3 | Multi-Provider Failover Gateway | Elixir | Level 3 | 12-18h |
| 4 | Streaming Observability Console | Elixir | Level 3 | 12-18h |
| 5 | Structured Output Contracts | Elixir | Level 2 | 8-12h |
| 6 | Cost-Aware Model Router | Elixir | Level 3 | 12-18h |
| 7 | Skill and Plugin Composition Lab | Elixir | Level 3 | 12-18h |
| 8 | Strategy State Machine Switchboard | Elixir | Level 4 | 16-24h |
| 9 | Worker Pool Load Lab | Elixir | Level 3 | 12-18h |
| 10 | Persistent Thread Memory | Elixir | Level 3 | 12-20h |
| 11 | Sensor-Driven Incident Triage | Elixir | Level 3 | 12-20h |
| 12 | Cron Autonomous Maintenance | Elixir | Level 3 | 12-20h |
| 13 | Distributed Netsplit Recovery Drill | Elixir | Level 4 | 18-28h |
| 14 | ETS and Mnesia Hybrid Agent Memory | Elixir | Level 4 | 18-28h |
| 15 | LiveView Human-in-the-Loop Control Center | Elixir | Level 3 | 14-20h |
| 16 | Tool Permission Firewall | Elixir | Level 4 | 16-24h |
| 17 | Red-Team Evaluation Harness | Elixir | Level 4 | 16-24h |
| 18 | Multimodal Agent Pipeline | Elixir | Level 4 | 16-24h |
| 19 | Hot Upgrade Release Drill | Elixir | Level 5 | 24-36h |
| 20 | BEAM Autonomous Ops Swarm | Elixir | Level 5 | 30-50h |
Expected Outcomes
- You can design deterministic and auditable agent state machines.
- You can operate multi-provider LLM systems with explicit budgets and safety policy gates.
- You can run distributed, supervised autonomous workflows that recover from real failures.
Additional Resources and References
Standards and Specifications
- CloudEvents Specification v1.0.2
- CNCF CloudEvents Project
- Erlang/OTP System Limits
- Erlang `+P` Process Limit Flag
Primary Jido Ecosystem Sources
- Jido Repository - Core framework: Action, Instruction, Plan, Exec, Plugin, AgentServer
- Jido.AI Repository - AI strategies: ReAct, CoT, ToT, GoT, Adaptive, TRM; Directives; Skills
- ReqLLM Repository - Multi-provider LLM abstraction (45+ providers, 665+ models)
- jido_signal Repository - Signal infrastructure: Bus, Router, Dispatch, Journal
- LLMDB Repository - Model metadata database (context_window, capabilities, costs)
- jido_browser Repository - Browser automation for multimodal agent pipelines
- jido_studio Repository - LiveView-based agent observation and HITL control center
- Agent Jido Website
- ReqLLM 1.0 Announcement
- Hex Package: jido
- Hex Package: jido_ai
- Hex Package: req_llm
- Hex Package: jido_signal
- Hex Package: llm_db
Research Papers and Technical Foundations
- ReAct: Synergizing Reasoning and Acting in Language Models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Graph of Thoughts: Solving Elaborate Problems with LLMs
- Toolformer: Language Models Can Teach Themselves to Use Tools
Industry Signals and Metrics (as of 2026-02-12)
- Stack Overflow Developer Survey 2025 - AI Sentiment and Usage
- Stack Overflow Developer Survey 2025 - AI Agent Uses and Impacts
- CloudEvents Project Page (Graduated announcement + adopters)
- GitHub API: agentjido/jido
- GitHub API: agentjido/jido_ai
- GitHub API: agentjido/req_llm
- GitHub API: agentjido/jido_signal
- GitHub API: agentjido/llm_db
- GitHub API: agentjido/jido_browser
- GitHub API: agentjido/jido_studio
- Hex API: jido
- Hex API: jido_ai
- Hex API: req_llm
- Hex API: jido_signal
- Hex API: llm_db