Project 3: Streaming Copilot Operations Center

Quick Reference

Attribute     Value
Difficulty    3
Time          1.5-2 weeks
Main Stack    req_llm streaming + Phoenix LiveView-style event flow
Alternatives  Polling API, long-polling buffers
Why Now       Real agents need responsiveness under uncertain latency

What You Will Build

An operations center that streams token-by-token responses into a shared operations UI model while preserving backpressure, partial completion, and final cost attribution.

Real World Outcome

$ mix run -e "OpsCenterDemo.start_session(:support)"
[info] session_id=op-884f state=streaming provider=openai:gpt-4o-mini
[info] token_rate=18 t/s backpressure=low p99=1.8s
[info] stream_status=finalized duration_ms=14220
[info] usage input_tokens=120 output_tokens=900 total_cost=0.0071
[info] ui_broadcast=complete

The session page should display:

  • incremental answer text
  • model/provider badge
  • rolling latency
  • streaming progress bar
  • final telemetry summary

The Core Question You Are Answering

“How do we make LLM streaming usable for operators, not just technically possible?”

Why This Project Matters

req_llm supports production-grade streaming, delivering tokens as a stream while usage metadata is collected concurrently. This is where UX, reliability, and observability intersect.

Conceptual Diagram

+-------------+     tokens/chunks      +--------------------+
| User Input  |----------------------->| Streaming Gateway  |
+-------------+                        +---------+----------+
                                                 |
                                    +------------v-----------+
                                    | req_llm StreamResponse |
                                    | (tokens + usage async) |
                                    +------------+-----------+
                                                 |
                                       +---------v----------+
                                       | BackpressureBuffer |
                                       +---------+----------+
                                                 |
                                     +-----------+-----------+
                                     |                       |
                           +---------v---------+     +-------v-------+
                           | PubSub / Bus      |     | Session Store |
                           | emit update msg   |     | snapshots     |
                           +---------+---------+     +-------+-------+
                                     |                       |
                                     +-----------+-----------+
                                                 |
                                        +--------v---------+
                                        | Ops UI Client    |
                                        | + Logs/Telemetry |
                                        +------------------+

Design Notes

Token Stream Ingestion

  • Consume stream chunks from StreamResponse.tokens/1.
  • Emit updates at bounded intervals to avoid UI chattiness (e.g., 100-250ms flush windows).
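One way to implement those flush windows is a small buffer process that accumulates chunks and emits one batched update per tick, so the UI is updated per interval rather than per token. A minimal sketch (the module name, the `flush_ms` default, and the `notify` callback are all assumptions, not req_llm API):

```elixir
defmodule OpsCenter.FlushBuffer do
  @moduledoc """
  Accumulates stream chunks and flushes them to a subscriber at a
  bounded interval. Sketch only: `notify` is whatever broadcast
  function your UI layer uses (e.g. a PubSub publish).
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Called from the stream consumer for every incoming chunk.
  def push(pid, chunk), do: GenServer.cast(pid, {:push, chunk})

  @impl true
  def init(opts) do
    flush_ms = Keyword.get(opts, :flush_ms, 150)
    notify = Keyword.fetch!(opts, :notify)
    schedule_flush(flush_ms)
    {:ok, %{pending: [], flush_ms: flush_ms, notify: notify}}
  end

  @impl true
  def handle_cast({:push, chunk}, state) do
    {:noreply, %{state | pending: [chunk | state.pending]}}
  end

  @impl true
  def handle_info(:flush, %{pending: []} = state) do
    # Nothing arrived this tick; just rearm the timer.
    schedule_flush(state.flush_ms)
    {:noreply, state}
  end

  def handle_info(:flush, state) do
    # Emit one batched update containing everything since the last tick.
    state.notify.(state.pending |> Enum.reverse() |> Enum.join())
    schedule_flush(state.flush_ms)
    {:noreply, %{state | pending: []}}
  end

  defp schedule_flush(ms), do: Process.send_after(self(), :flush, ms)
end
```

Pushes are casts, so the producer never blocks on the UI; the timer tick is the only place rendering work is triggered.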

Metadata Timing

  • Use concurrent metadata collection so the token stream stays smooth.
  • Finalize billing numbers only after stream ends, then merge into session record.
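A sketch of that ordering, assuming `ReqLLM.stream_text/2` and a `StreamResponse.usage/1` accessor that blocks until the provider's final metadata arrives (verify both names against the req_llm docs for your version; `buffer`, `session_id`, and the `OpsCenter.*` calls are placeholders from this project, not library API):

```elixir
# Render tokens as they arrive, then attach usage only afterwards.
{:ok, stream} = ReqLLM.stream_text("openai:gpt-4o-mini", "Summarize the incident.")

stream
|> ReqLLM.StreamResponse.tokens()
|> Enum.each(&OpsCenter.FlushBuffer.push(buffer, &1))

# Only after the stream is drained is usage metadata trustworthy;
# querying it mid-stream is what produces the "cost shown as zero" bug.
usage = ReqLLM.StreamResponse.usage(stream)
OpsCenter.Sessions.attach_billing(session_id, usage)
```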

Fault Handling

  • Short provider hiccups: buffer the last known-good state and continue gracefully.
  • Hard failures: emit an explicit terminal error state carrying the failure reason, plus a recovery action in the UI.

Concepts You Must Understand First

  1. Stream lifecycle
    • Token stream start, midstream errors, and final metadata.
  2. Backpressure mechanics
    • Batching updates avoids both memory blow-up and UI jank.
  3. Observability by phase
    • First-token latency, inter-token gap, finalization latency.
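These phase metrics are easy to derive once you record raw arrival timestamps. A sketch (module and field names are made up for illustration):

```elixir
defmodule OpsCenter.PhaseMetrics do
  @moduledoc """
  Derives per-phase latency metrics from a request start time and the
  monotonic timestamps (ms) at which tokens arrived: time to first
  token, largest inter-token gap, and mean token rate.
  """

  def from_timestamps(_started, []),
    do: %{first_token_ms: nil, max_gap_ms: nil, tokens_per_s: 0.0}

  def from_timestamps(started, [first | _] = times) do
    # Inter-token gaps: pairwise differences between consecutive arrivals.
    gaps =
      times
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.map(fn [a, b] -> b - a end)

    duration_ms = List.last(times) - started

    %{
      first_token_ms: first - started,
      max_gap_ms: if(gaps == [], do: 0, else: Enum.max(gaps)),
      tokens_per_s: if(duration_ms > 0, do: length(times) * 1000 / duration_ms, else: 0.0)
    }
  end
end
```

Feeding these three numbers into your telemetry pipeline per session gives you the first-token, inter-token, and finalization views separately rather than one blended latency figure.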

Questions to Guide Your Design

  1. UI contract
    • Which token transitions should update clients immediately?
  2. Session semantics
    • What exactly causes a session to become “complete”?
  3. Reliability
    • Can we resume partially dropped sessions?

Thinking Exercise

Simulate a stream for 15 seconds in your head.

  • At second 1 provider returns nothing.
  • At second 4 first token lands.
  • At second 8 metadata arrives late.
  • At second 12 connection drops once then resumes.

Identify exactly what user-visible states are shown at each checkpoint.

Interview Questions They Will Ask

  1. “How do you avoid flooding clients with token-level events?”
  2. “How is usage metadata kept in sync with streamed content?”
  3. “What do you do when a stream fails after partial output?”
  4. “How would you benchmark token smoothness across providers?”
  5. “Why does this matter for agentic copilots?”

Hints in Layers

Hint 1: Define session states. Model the lifecycle as starting -> streaming -> finalizing -> complete | errored.
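Encoding those states as an explicit transition table makes illegal jumps (e.g. complete back to streaming) fail loudly instead of silently corrupting the UI. A sketch, with a hypothetical module name:

```elixir
defmodule OpsCenter.SessionState do
  @moduledoc "Legal lifecycle transitions for a streaming session."

  @transitions %{
    starting: [:streaming, :errored],
    streaming: [:finalizing, :errored],
    finalizing: [:complete, :errored],
    complete: [],
    errored: []
  }

  @doc "Returns {:ok, next} for a legal transition, {:error, reason} otherwise."
  def transition(current, next) do
    if next in Map.get(@transitions, current, []) do
      {:ok, next}
    else
      {:error, {:invalid_transition, current, next}}
    end
  end
end
```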

Hint 2: Separate transport and presentation. Store raw chunks in a buffer; render derived state for the UI.

Hint 3: Track correlation IDs. One ID per stream should tie together tokens, telemetry, retries, and support-ticket updates.

Hint 4: Add synthetic load tests. Run repeated high-latency and bursty sessions to validate jitter handling.
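One way to generate such sessions is a synthetic provider stream that injects jitter and periodic stalls, which you can feed through the same buffer and UI path as a real stream. Sketch (all names and default values are assumptions):

```elixir
defmodule OpsCenter.SyntheticStream do
  @moduledoc """
  Emits fake tokens with randomized inter-token gaps and occasional
  long stalls, for exercising buffering and jitter handling in tests.
  """

  def bursty(token_count, opts \\ []) do
    base_ms = Keyword.get(opts, :base_ms, 30)
    stall_every = Keyword.get(opts, :stall_every, 25)
    stall_ms = Keyword.get(opts, :stall_ms, 1_500)

    Stream.map(1..token_count, fn i ->
      # Every Nth token, simulate a provider stall instead of normal jitter.
      delay = if rem(i, stall_every) == 0, do: stall_ms, else: base_ms + :rand.uniform(base_ms)
      Process.sleep(delay)
      "tok#{i} "
    end)
  end
end
```

Because it is a lazy `Stream`, it plugs into the same `Enum.each`/buffer pipeline as a real token stream; only the source differs.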

Common Pitfalls and Debugging

  • Problem: UI appears frozen while tokens arrive.
    • Why: synchronous rendering for every token.
    • Fix: batch by tick and publish diffs.
    • Quick test: 100-token prompt should keep UI responsive.
  • Problem: Cost shown as zero.
    • Why: usage queried before stream ends.
    • Fix: gate billing report on final stream metadata.
    • Quick test: enforce cost assertion in end-of-stream hook.
  • Problem: User sees empty completion state after partial failure.
    • Why: missing error branch in stream loop.
    • Fix: explicit terminal error event with recovery actions.
    • Quick test: inject dropped connection and assert error UI.

Books That Will Help

Topic         Book                      Chapter
Reactive UIs  Reactive Design Patterns  Streaming data and buffering
Concurrency   Erlang and OTP in Action  Message passing and process design

Definition of Done

  • Sessions stream tokens and update the UI at a controlled cadence
  • Final telemetry is attached to each session record
  • Provider-level failures are visible and recoverable
  • Latency metrics and token-rate KPIs are available
  • Backpressure controls prevent client-side lag

References

  • https://hexdocs.pm/req_llm/1.5.1/overview.html