Project 1: Multi-Model Gateway with req_llm

Quick Reference

  Attribute     Value
  ------------  -----------------------------------------
  Difficulty    2
  Time          1.5-2 weeks
  Main Stack    Elixir + req_llm
  Alternatives  openai_ex, direct Req, custom HTTP client
  Why Now       Required for resilient LLM products

What You Will Build

A reusable Gateway service that accepts a single request shape and routes traffic across providers supported by req_llm (OpenAI-compatible baseline + provider-specific extensions) without changing caller code.

Real World Outcome

You will produce a local CLI-driven gateway with one observable interface:

$ mix run -e "GatewayDemo.demo(:dev)"
[info] request_id=bbf2f6 provider=anthropic model=claude-sonnet-4
[info] tokens_in=74 tokens_out=192 cost_usd=0.00640 latency_ms=1210
[info] stream=off structured=false
[info] result=triage_summary: "Customer requested feature change; priority=high"

The user-level behavior:

  • One endpoint spec, no provider-specific logic at callers
  • Automatic model metadata lookup, with transparent fallback to compatible provider/model
  • Strictly typed request envelope for policy, telemetry, and output shape control

The Core Question You Are Answering

“How do we make model provider differences disappear for upstream teams while keeping control of quality, cost, and observability?”

Why This Project Matters

req_llm explicitly positions itself as a provider-agnostic interface above the OpenAI chat baseline. That abstraction becomes a production requirement when:

  • model availability changes by region/provider
  • costs shift by latency and output length
  • compliance requires complete usage logging

Minimal Conceptual Example

Input:
  user_prompt + context + guardrails
  -> policy.select_model(request)
  -> req_llm.generate_text(model_ref, prompt, options)
  -> response, usage, provider_metadata
  -> normalized_output
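
The flow above can be sketched as one `with` chain. This is a minimal sketch: `Gateway.Policy` and `Gateway.Normalize` are hypothetical modules introduced in the Deep Dive Plan, and `ReqLLM.generate_text/3` follows the invocation shape described later in this document.

```elixir
defmodule Gateway.Pipeline do
  # Hypothetical top-level flow: select a model by policy, call the
  # unified req_llm API, then normalize usage and output for the caller.
  def handle(request) do
    with {:ok, model_ref, _rationale} <- Gateway.Policy.select(request),
         {:ok, response} <- ReqLLM.generate_text(model_ref, request.prompt, request.options) do
      {:ok, Gateway.Normalize.run(response)}
    end
  end
end
```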

ASCII Design Diagram

                +----------------------+
                |  Upstream Request    |
                +----------+-----------+
                           |
                           v
       +--------------------------------------------+
       | req_llm Gateway Contract (single shape)    |
       | - provider_hint                            |
       | - task_class (chat/rewrite/extract/etc)    |
       | - policy flags (max_cost, latency budget)  |
       +----------------------+---------------------+
                             |
          +------------------+------------------+
          |                                     |
          v                                     v
 +----------------------+            +----------------------+
 | Provider Registry    |            | Strategy Router      |
 | model metadata       |            | policy evaluation    |
 | fallback policy      |            | timeout + budget     |
 +----------+-----------+            +----------+-----------+
            |                                     |
            +-----------------+-------------------+
                              v
                   +--------------------------+
                    | req_llm.generate_* APIs  |
                   +--------------------------+
                              |
                              v
                 +-------------------------------+
                  | normalized usage + result     |
                  | token + cost telemetry        |
                 +-------------------------------+

Deep Dive Plan

1. Canonical Input Layer

Define a single request structure with fields:

  • provider_hint (:openai, :anthropic, etc.) and optional hard constraints
  • mode (stream, text, object, image)
  • quality_profile (latency-first, quality-first, balanced)
  • max_cost_usd (hard budget when possible)
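
A minimal envelope for these fields might look like the struct below. Field names and defaults are illustrative design choices for this project, not a req_llm API.

```elixir
defmodule Gateway.Request do
  @moduledoc "Canonical request envelope shared by all callers."

  @enforce_keys [:prompt]
  defstruct prompt: nil,
            provider_hint: nil,          # :openai | :anthropic | nil (auto-select)
            mode: :text,                 # :text | :stream | :object | :image
            quality_profile: :balanced,  # :latency_first | :quality_first | :balanced
            max_cost_usd: nil,           # hard budget; nil means unbounded
            options: []                  # passed through to the provider call
end
```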

2. Policy Router

Implement policy evaluation that:

  • sorts candidate models by suitability for task
  • enforces hard budget caps
  • rejects unsupported provider/model combinations
  • records decision rationale for auditability
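
One way to make this evaluation deterministic and auditable is a filter-sort-pick pipeline over registry candidates. The module below is a sketch under assumed metadata fields (`cost_per_1k`, `quality`, `est_tokens`); none of it is part of req_llm.

```elixir
defmodule Gateway.Policy do
  # Illustrative candidate list; in the real gateway this would come from
  # the provider registry's model metadata.
  @candidates [
    %{model: "openai:gpt-4o-mini", cost_per_1k: 0.00015, quality: 2},
    %{model: "anthropic:claude-sonnet-4", cost_per_1k: 0.003, quality: 3}
  ]

  def select(%{max_cost_usd: cap, est_tokens: tokens} = request) do
    @candidates
    |> Enum.filter(fn c -> cap == nil or estimate_cost(c, tokens) <= cap end)
    |> Enum.sort_by(&rank(&1, request))
    |> case do
      [] -> {:error, :budget_exceeded}
      # Return the rationale alongside the pick so every decision is auditable.
      [best | _] -> {:ok, best.model, %{rationale: rank(best, request)}}
    end
  end

  defp estimate_cost(candidate, tokens), do: candidate.cost_per_1k * tokens / 1000

  # Deterministic sort keys: the primary objective first, then stable
  # tie-breakers, ending with the model string so ordering is never ambiguous.
  defp rank(c, %{quality_profile: :latency_first}), do: {c.cost_per_1k, -c.quality, c.model}
  defp rank(c, _request), do: {-c.quality, c.cost_per_1k, c.model}
end
```

Tuple sort keys give you the explicit tie-breakers that the Common Pitfalls section recommends.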

3. req_llm Integration Layer

Use the same invocation shape for all providers:

  • generate_text(model, prompt, options)
  • optional stream_text and generate_object (added in later projects)
  • maintain per-request api_key override for incident-driven rotation
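
With one invocation shape, fallback reduces to walking an ordered candidate list. This is a sketch: `ReqLLM.generate_text/3` matches the shape listed above, but check your req_llm version for the exact option names (including any per-request `api_key` override).

```elixir
defmodule Gateway.Invoke do
  def run(models, prompt, opts \\ [])

  # Try each candidate in order; return the first success together with
  # the model that served it, so telemetry can record fallback decisions.
  def run([model | rest], prompt, opts) do
    case ReqLLM.generate_text(model, prompt, opts) do
      {:ok, response} -> {:ok, model, response}
      {:error, _reason} when rest != [] -> run(rest, prompt, opts)
      {:error, reason} -> {:error, model, reason}
    end
  end
end
```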

4. Observability and Determinism

Capture:

  • request ID, model string, provider string, tokens, cost, latency
  • fallback decisions and reasons
  • structured logs and stable output envelope
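
A stable output envelope can be a plain map built once per request. The sketch below uses the usage field names listed above; extracting them from the provider response is left as an exercise against your req_llm version.

```elixir
defmodule Gateway.Telemetry do
  require Logger

  # One normalized record per request, regardless of provider. Keys are
  # fixed so downstream log parsing never depends on provider shape.
  def record(request_id, model_ref, usage, latency_ms, fallback?) do
    entry = %{
      request_id: request_id,
      model: model_ref,
      tokens_in: usage.input_tokens,
      tokens_out: usage.output_tokens,
      cost_usd: usage.cost_usd,
      latency_ms: latency_ms,
      fallback: fallback?
    }

    Logger.info(fn -> inspect(entry) end)
    entry
  end
end
```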

5. Contract Regression Harness

Build scenario fixtures:

  • provider down, budget exceeded, unsupported mode
  • stable outputs under same prompt with deterministic settings
  • one CLI command to print golden runbook traces
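
The scenario fixtures can live in a plain data table that the harness iterates over; the names, fault shapes, and expected outcomes below are illustrative.

```elixir
defmodule Gateway.Scenarios do
  # Each fixture pairs an injected fault with the outcome the gateway
  # contract requires; the harness runs all of them on one CLI command.
  def all do
    [
      %{name: :provider_down,    fault: {:http_status, 503}, expect: :fallback_used},
      %{name: :budget_exceeded,  fault: :none,               expect: {:error, :budget_exceeded}},
      %{name: :unsupported_mode, fault: :none,               expect: {:error, :unsupported_mode}}
    ]
  end
end
```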

Concepts You Must Understand First

  1. Provider abstraction
    • Why a unified API is safer than hand-rolled per-provider clients.
    • Reference: req_llm overview (https://hexdocs.pm/req_llm/1.5.1/overview.html)
  2. Canonical model specification
    • provider:model parsing and validation behavior.
    • Reference: req_llm documentation models/registry concepts.
  3. Telemetry as a first-class output
    • Usage and cost fields as output contracts, not side effects.
    • Reference: req_llm usage tracking section.

Questions to Guide Your Design

  1. Routing policy
    • Which request fields should force provider choice vs allow auto-selection?
    • How do you avoid routing oscillation during transient failures?
  2. Contract stability
    • What minimal fields must always exist in output?
    • How do you preserve compatibility while evolving provider options?
  3. Failure and retry
    • What are your retry bounds for provider errors before fallback?
    • How should error surfaces differ between user and operator visibility?

Thinking Exercise

Design a policy matrix that forces deterministic selection when budget is strict.

  • Scenario A: max_cost_usd=0.005, prompt small, no streaming.
  • Scenario B: max_cost_usd=0.005, long prompt, stream: true.
  • Scenario C: Preferred provider unavailable for 30 seconds.

For each scenario, pick model/provider and justify why in one sentence.

Interview Questions They Will Ask

  1. “How does req_llm reduce provider lock-in while avoiding the adapter anti-pattern?”
  2. “Explain the difference between policy routing and failover.”
  3. “How would you prove cost governance with observability?”
  4. “How do you safely override keys per request?”
  5. “What invariants guarantee that routing decisions are reproducible?”

Hints in Layers

Hint 1: Start with the contract. Create one envelope for every request and every response. Everything else (provider, retries, logs) depends on it.

Hint 2: Policy first. Build routing logic before adding provider-specific code. Your router should return a model string and an execution plan.

Hint 3: Normalize telemetry. Store usage as input_tokens, output_tokens, cost_usd, and provider.

Hint 4: Add chaos tests. Simulate timeouts, invalid credentials, and unsupported model names to verify your deterministic fallback.

Common Pitfalls and Debugging

  • Problem: Same request returns different provider each run.
    • Why: missing secondary sort keys in policy.
    • Fix: include explicit tie-breakers (provider_preference, region, cost_per_token).
    • Quick test:
      • mix run -e "GatewayDemo.stability_check(100)" and assert that at most one candidate flips.
  • Problem: Cost tracking absent for streaming.
    • Why: usage metadata pulled after stream consumption only.
    • Fix: read StreamResponse.usage(response) in a completion callback.
    • Quick test: run a long stream with usage assertion at end.
  • Problem: API key source confusion.
    • Why: environment and in-memory keys collide.
    • Fix: define explicit precedence and include it in trace logs.
    • Quick test: rotate keys in one process and assert request shows chosen source.

Books That Will Help

  Topic                Book                                    Chapter
  -------------------  --------------------------------------  -------------------------------
  Distributed systems  Designing Data-Intensive Applications   Data Models and Serialization
  Elixir OTP           Programming Elixir                      Fault Tolerance and Supervision

Definition of Done

  • One request form maps to all providers without caller changes
  • Fallback matrix tested under forced provider outage
  • Usage + cost logs include provider and request metadata
  • Policy decisions are reproducible and explainable
  • No business logic is tied to a single provider API

References

  • https://hexdocs.pm/req_llm/1.5.1/overview.html