Project 1: Multi-Model Gateway with req_llm

Quick Reference

  Attribute     Value
  ------------  -----------------------------------------
  Difficulty    2
  Time          1.5-2 weeks
  Main Stack    Elixir + req_llm
  Alternatives  openai_ex, direct Req, custom HTTP client
  Why Now       Required for resilient LLM products

What You Will Build

A reusable Gateway service that accepts a single request shape and routes traffic across providers supported by req_llm (OpenAI-compatible baseline + provider-specific extensions) without changing caller code.

Real World Outcome

You will produce a local CLI-driven gateway with one observable interface:

$ mix run -e "GatewayDemo.demo(:dev)"
[info] request_id=bbf2f6 provider=anthropic model=claude-sonnet-4
[info] tokens_in=74 tokens_out=192 cost_usd=0.00640 latency_ms=1210
[info] stream=off structured=false
[info] result=triage_summary: "Customer requested feature change; priority=high"

The user-level behavior:

  • One endpoint spec, no provider-specific logic at callers
  • Automatic model metadata lookup, with transparent fallback to compatible provider/model
  • Strictly typed request envelope for policy, telemetry, and output shape control

The Core Question You Are Answering

“How do we make model provider differences disappear for upstream teams while keeping control of quality, cost, and observability?”

Why This Project Matters

req_llm explicitly positions itself as a provider-agnostic interface above the OpenAI chat baseline. That abstraction becomes a production requirement when:

  • model availability changes by region/provider
  • costs shift by latency and output length
  • compliance requires complete usage logging

Minimal Conceptual Example

Input:
  user_prompt + context + guardrails
  -> policy.select_model(request)
  -> req_llm.generate_text(model_ref, prompt, options)
  -> response, usage, provider_metadata
  -> normalized_output
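
The flow above can be sketched as one `with` chain. This is a minimal sketch: `Gateway.Policy` and `Gateway.Normalize` are hypothetical modules introduced in the Deep Dive Plan, and `ReqLLM.generate_text/3` follows the invocation shape described later in this document.

```elixir
defmodule Gateway.Pipeline do
  # Hypothetical top-level flow: select a model by policy, call the
  # unified req_llm API, then normalize usage and output for the caller.
  def handle(request) do
    with {:ok, model_ref, _rationale} <- Gateway.Policy.select(request),
         {:ok, response} <- ReqLLM.generate_text(model_ref, request.prompt, request.options) do
      {:ok, Gateway.Normalize.run(response)}
    end
  end
end
```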

ASCII Design Diagram

                +----------------------+
                |  Upstream Request    |
                +----------+-----------+
                           |
                           v
       +--------------------------------------------+
       | req_llm Gateway Contract (single shape)    |
       | - provider_hint                            |
       | - task_class (chat/rewrite/extract/etc)    |
       | - policy flags (max_cost, latency budget)  |
       +----------------------+---------------------+
                             |
          +------------------+------------------+
          |                                     |
          v                                     v
 +----------------------+            +----------------------+
 | Provider Registry    |            | Strategy Router      |
 | model metadata       |            | policy evaluation    |
 | fallback policy      |            | timeout + budget     |
 +----------+-----------+            +----------+-----------+
            |                                     |
            +-----------------+-------------------+
                              v
                   +--------------------------+
                    | req_llm.generate_* APIs  |
                   +--------------------------+
                              |
                              v
                 +-------------------------------+
                  | normalized usage + result     |
                  | token + cost telemetry        |
                 +-------------------------------+

Deep Dive Plan

1. Canonical Input Layer

Define a single request structure with fields:

  • provider_hint (:openai, :anthropic, etc.) and optional hard constraints
  • mode (stream, text, object, image)
  • quality_profile (latency-first, quality-first, balanced)
  • max_cost_usd (hard budget when possible)
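
A minimal envelope for these fields might look like the struct below. Field names and defaults are illustrative design choices for this project, not a req_llm API.

```elixir
defmodule Gateway.Request do
  @moduledoc "Canonical request envelope shared by all callers."

  @enforce_keys [:prompt]
  defstruct prompt: nil,
            provider_hint: nil,          # :openai | :anthropic | nil (auto-select)
            mode: :text,                 # :text | :stream | :object | :image
            quality_profile: :balanced,  # :latency_first | :quality_first | :balanced
            max_cost_usd: nil,           # hard budget; nil means unbounded
            options: []                  # passed through to the provider call
end
```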

2. Policy Router

Implement policy evaluation that:

  • sorts candidate models by suitability for task
  • enforces hard budget caps
  • rejects unsupported provider/model combinations
  • records decision rationale for auditability
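
One way to make this evaluation deterministic and auditable is a filter-sort-pick pipeline over registry candidates. The module below is a sketch under assumed metadata fields (`cost_per_1k`, `quality`, `est_tokens`); none of it is part of req_llm.

```elixir
defmodule Gateway.Policy do
  # Illustrative candidate list; in the real gateway this would come from
  # the provider registry's model metadata.
  @candidates [
    %{model: "openai:gpt-4o-mini", cost_per_1k: 0.00015, quality: 2},
    %{model: "anthropic:claude-sonnet-4", cost_per_1k: 0.003, quality: 3}
  ]

  def select(%{max_cost_usd: cap, est_tokens: tokens} = request) do
    @candidates
    |> Enum.filter(fn c -> cap == nil or estimate_cost(c, tokens) <= cap end)
    |> Enum.sort_by(&rank(&1, request))
    |> case do
      [] -> {:error, :budget_exceeded}
      # Return the rationale alongside the pick so every decision is auditable.
      [best | _] -> {:ok, best.model, %{rationale: rank(best, request)}}
    end
  end

  defp estimate_cost(candidate, tokens), do: candidate.cost_per_1k * tokens / 1000

  # Deterministic sort keys: the primary objective first, then stable
  # tie-breakers, ending with the model string so ordering is never ambiguous.
  defp rank(c, %{quality_profile: :latency_first}), do: {c.cost_per_1k, -c.quality, c.model}
  defp rank(c, _request), do: {-c.quality, c.cost_per_1k, c.model}
end
```

Tuple sort keys give you the explicit tie-breakers that the Common Pitfalls section recommends.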

3. req_llm Integration Layer

Use the same invocation shape for all providers:

  • generate_text(model, prompt, options)
  • optional stream_text and generate_object (added in later projects)
  • maintain per-request api_key override for incident-driven rotation
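
With one invocation shape, fallback reduces to walking an ordered candidate list. This is a sketch: `ReqLLM.generate_text/3` matches the shape listed above, but check your req_llm version for the exact option names (including any per-request `api_key` override).

```elixir
defmodule Gateway.Invoke do
  def run(models, prompt, opts \\ [])

  # Try each candidate in order; return the first success together with
  # the model that served it, so telemetry can record fallback decisions.
  def run([model | rest], prompt, opts) do
    case ReqLLM.generate_text(model, prompt, opts) do
      {:ok, response} -> {:ok, model, response}
      {:error, _reason} when rest != [] -> run(rest, prompt, opts)
      {:error, reason} -> {:error, model, reason}
    end
  end
end
```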

4. Observability and Determinism

Capture:

  • request ID, model string, provider string, tokens, cost, latency
  • fallback decisions and reasons
  • structured logs and stable output envelope
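
A stable output envelope can be a plain map built once per request. The sketch below uses the usage field names listed above; extracting them from the provider response is left as an exercise against your req_llm version.

```elixir
defmodule Gateway.Telemetry do
  require Logger

  # One normalized record per request, regardless of provider. Keys are
  # fixed so downstream log parsing never depends on provider shape.
  def record(request_id, model_ref, usage, latency_ms, fallback?) do
    entry = %{
      request_id: request_id,
      model: model_ref,
      tokens_in: usage.input_tokens,
      tokens_out: usage.output_tokens,
      cost_usd: usage.cost_usd,
      latency_ms: latency_ms,
      fallback: fallback?
    }

    Logger.info(fn -> inspect(entry) end)
    entry
  end
end
```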

5. Contract Regression Harness

Build scenario fixtures:

  • provider down, budget exceeded, unsupported mode
  • stable outputs under same prompt with deterministic settings
  • one CLI command to print golden runbook traces
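
The scenario fixtures can live in a plain data table that the harness iterates over; the names, fault shapes, and expected outcomes below are illustrative.

```elixir
defmodule Gateway.Scenarios do
  # Each fixture pairs an injected fault with the outcome the gateway
  # contract requires; the harness runs all of them on one CLI command.
  def all do
    [
      %{name: :provider_down,    fault: {:http_status, 503}, expect: :fallback_used},
      %{name: :budget_exceeded,  fault: :none,               expect: {:error, :budget_exceeded}},
      %{name: :unsupported_mode, fault: :none,               expect: {:error, :unsupported_mode}}
    ]
  end
end
```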

Concepts You Must Understand First

  1. Provider abstraction
    • Why a unified API is safer than hand-rolled per-provider clients.
    • Reference: req_llm overview (https://hexdocs.pm/req_llm/1.5.1/overview.html)
  2. Canonical model specification
    • provider:model parsing and validation behavior.
    • Reference: req_llm documentation models/registry concepts.
  3. Telemetry as a first-class output
    • Usage and cost fields as output contracts, not side effects.
    • Reference: req_llm usage tracking section.

Questions to Guide Your Design

  1. Routing policy
    • Which request fields should force provider choice vs allow auto-selection?
    • How do you avoid routing oscillation during transient failures?
  2. Contract stability
    • What minimal fields must always exist in output?
    • How do you preserve compatibility while evolving provider options?
  3. Failure and retry
    • What are your retry bounds for provider errors before fallback?
    • How should error surfaces differ between user and operator visibility?

Thinking Exercise

Design a policy matrix that forces deterministic selection when budget is strict.

  • Scenario A: max_cost_usd=0.005, prompt small, no streaming.
  • Scenario B: max_cost_usd=0.005, long prompt, stream: true.
  • Scenario C: Preferred provider unavailable for 30 seconds.

For each scenario, pick model/provider and justify why in one sentence.

Interview Questions They Will Ask

  1. “How does req_llm reduce provider lock-in while avoiding the adapter anti-pattern?”
  2. “Explain the difference between policy routing and failover.”
  3. “How would you prove cost governance with observability?”
  4. “How do you safely override keys per request?”
  5. “What invariants guarantee that routing decisions are reproducible?”

Hints in Layers

Hint 1: Start with the contract. Create one envelope for every request and every response. Everything else (provider, retries, logs) depends on it.

Hint 2: Policy first. Build routing logic before adding provider-specific code. Your router should return a model string and an execution plan.

Hint 3: Normalize telemetry. Store usage as input_tokens, output_tokens, cost_usd, and provider.

Hint 4: Add chaos tests. Simulate timeouts, invalid credentials, and unsupported model names to verify your deterministic fallback.

Common Pitfalls and Debugging

  • Problem: Same request returns different provider each run.
    • Why: missing secondary sort keys in policy.
    • Fix: include explicit tie-breakers (provider_preference, region, cost_per_token).
    • Quick test:
      • mix run -e "GatewayDemo.stability_check(100)" and assert that at most one candidate flips.
  • Problem: Cost tracking absent for streaming.
    • Why: usage metadata pulled after stream consumption only.
    • Fix: read StreamResponse.usage(response) in a completion callback.
    • Quick test: run a long stream with usage assertion at end.
  • Problem: API key source confusion.
    • Why: environment and in-memory keys collide.
    • Fix: define explicit precedence and include it in trace logs.
    • Quick test: rotate keys in one process and assert request shows chosen source.

Books That Will Help

  Topic                Book                                    Chapter
  -------------------  --------------------------------------  -------------------------------
  Distributed systems  Designing Data-Intensive Applications   Data Models and Serialization
  Elixir OTP           Programming Elixir                      Fault Tolerance and Supervision

Definition of Done

  • One request form maps to all providers without caller changes
  • Fallback matrix tested under forced provider outage
  • Usage + cost logs include provider and request metadata
  • Policy decisions are reproducible and explainable
  • No business logic is tied to a single provider API

References

  • https://hexdocs.pm/req_llm/1.5.1/overview.html