Project 3: Streaming Copilot Operations Center
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | 3 |
| Time | 1.5-2 weeks |
| Main Stack | req_llm streaming + Phoenix LiveView-style event flow |
| Alternatives | Polling API, long polling buffers |
| Why Now | Real agents need responsiveness under uncertain latency |
What You Will Build
An operations center that streams token-by-token responses into a shared operations UI model while preserving backpressure, partial completion, and final cost attribution.
Real World Outcome
$ mix run -e "OpsCenterDemo.start_session(:support)"
[info] session_id=op-884f state=streaming provider=openai:gpt-4o-mini
[info] token_rate=18 t/s backpressure=low p99=1.8s
[info] stream_status=finalized duration_ms=14220
[info] usage input_tokens=120 output_tokens=900 total_cost=0.0071
[info] ui_broadcast=complete
The session page should display:
- incremental answer text
- model/provider badge
- rolling latency
- streaming progress bar
- final telemetry summary
The Core Question You Are Answering
“How do we make LLM streaming usable for operators, not just technically possible?”
Why This Project Matters
req_llm supports production-grade streaming, collecting usage metadata concurrently while tokens flow. This project sits where UX, reliability, and observability intersect.
Conceptual Diagram
+-------------+ tokens/chunks +--------------------+
| User Input |--------------------->| Streaming Gateway |
+-------------+ +---------+----------+
|
               +----------+-----------+
               |                      |
               |req_llm StreamResponse|
               |(tokens + usage async)|
               +----------+-----------+
|
+---------v----------+
| BackpressureBuffer |
+---------+----------+
|
+-----------+-----------+
| |
+---------v---------+ +-------v------+
| PubSub / Bus | | Session Store|
| emit update msg | | snapshots |
+---------+---------+ +-------+------+
| |
+-----------+-----------+
|
+-------v--------+
| Ops UI Client |
| + Logs/Telemetry|
+----------------+
Design Notes
Token Stream Ingestion
- Consume stream chunks from StreamResponse.tokens/1.
- Emit updates at bounded intervals to avoid UI chattiness (e.g., 100-250 ms flush windows).
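A minimal sketch of the bounded-flush idea: a small process that accumulates tokens and publishes one joined chunk per tick, so subscribers see a few updates per second rather than one message per token. `OpsCenter.TokenBatcher` and its callback shape are illustrative assumptions, not part of req_llm.

```elixir
defmodule OpsCenter.TokenBatcher do
  # Accumulates streamed tokens and flushes them to a callback on a
  # fixed tick instead of forwarding every token individually.
  use GenServer

  @flush_ms 150

  def start_link(on_flush), do: GenServer.start_link(__MODULE__, on_flush)

  def push(pid, token), do: GenServer.cast(pid, {:token, token})

  @impl true
  def init(on_flush) do
    Process.send_after(self(), :flush, @flush_ms)
    {:ok, %{buffer: [], on_flush: on_flush}}
  end

  @impl true
  def handle_cast({:token, token}, state) do
    {:noreply, %{state | buffer: [token | state.buffer]}}
  end

  @impl true
  def handle_info(:flush, %{buffer: []} = state) do
    # Nothing arrived this window; just schedule the next tick.
    Process.send_after(self(), :flush, @flush_ms)
    {:noreply, state}
  end

  def handle_info(:flush, state) do
    state.on_flush.(state.buffer |> Enum.reverse() |> Enum.join())
    Process.send_after(self(), :flush, @flush_ms)
    {:noreply, %{state | buffer: []}}
  end
end
```

In practice you would feed it by iterating the token stream, e.g. `response |> ReqLLM.StreamResponse.tokens() |> Enum.each(&OpsCenter.TokenBatcher.push(batcher, &1))`.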
Metadata Timing
- Use concurrent metadata collection to keep stream smooth.
- Finalize billing numbers only after stream ends, then merge into session record.
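The merge step can be sketched as a pure finalizer that is only called once the stream's concurrently collected usage metadata has resolved (check the req_llm docs for the exact accessor). The struct fields and the per-1k cost table below are invented for illustration.

```elixir
defmodule OpsCenter.Session do
  # Billing fields stay nil while streaming; they are merged in only
  # after the provider's final usage metadata arrives.
  # Field names and cost rates are illustrative assumptions.
  defstruct id: nil, status: :streaming, text: "", usage: nil, cost: nil

  @cost_per_1k %{input: 0.00015, output: 0.0006}

  def finalize(%__MODULE__{} = session, %{input_tokens: inp, output_tokens: out} = usage) do
    cost = inp / 1000 * @cost_per_1k.input + out / 1000 * @cost_per_1k.output
    %{session | status: :complete, usage: usage, cost: Float.round(cost, 6)}
  end
end
```

Because `finalize/2` is the only path that sets `cost`, a session can never report a partial or zero bill while tokens are still arriving.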
Fault Handling
- Short provider hiccups: buffer the last known-good state and continue gracefully.
- Hard failures: send an explicit terminal error state and failure reason, with a recovery action in the UI.
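One way to guarantee the terminal-error branch exists is to wrap stream consumption so that any midstream crash becomes an explicit event rather than a silently truncated answer. The `emit` callback and event shapes here are illustrative assumptions.

```elixir
defmodule OpsCenter.StreamGuard do
  # Consumes a token stream, emitting each token, a :done marker on
  # success, or an explicit terminal error event on midstream failure.
  def consume(token_stream, emit) do
    Enum.each(token_stream, fn token -> emit.({:token, token}) end)
    emit.(:done)
    :ok
  rescue
    error ->
      emit.({:error, Exception.message(error), recoverable?: true})
      :error
  end
end
```

The UI layer then only has to render three shapes: token batches, a done marker, or an error event carrying a reason and a recovery flag.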
Concepts You Must Understand First
- Stream lifecycle
- Token stream start, midstream errors, and final metadata.
- Backpressure mechanics
- Batching updates avoids both memory blow-up and UI jank.
- Observability by phase
- First-token latency, inter-token gap, finalization latency.
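These phase metrics fall out of two numbers you already have: the request start time and the arrival timestamp of each token. A small sketch, with illustrative names (in a real system you would emit these as `:telemetry` events):

```elixir
defmodule OpsCenter.StreamMetrics do
  # Derives phase-level latency metrics from a request start time and
  # a list of per-token arrival timestamps (all in milliseconds).
  def from_timestamps(started_at_ms, token_times_ms) when token_times_ms != [] do
    gaps =
      token_times_ms
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.map(fn [a, b] -> b - a end)

    %{
      first_token_ms: hd(token_times_ms) - started_at_ms,
      max_inter_token_gap_ms: Enum.max([0 | gaps]),
      finalization_ms: List.last(token_times_ms) - started_at_ms
    }
  end
end
```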
Questions to Guide Your Design
- UI contract
- Which token transitions should update clients immediately?
- Session semantics
- What exactly causes a session to become “complete”?
- Reliability
- Can we resume partially dropped sessions?
Thinking Exercise
Simulate a stream for 15 seconds in your head.
- At second 1, the provider returns nothing.
- At second 4, the first token lands.
- At second 8, metadata arrives late.
- At second 12, the connection drops once, then resumes.
Identify exactly what user-visible states are shown at each checkpoint.
Interview Questions They Will Ask
- “How do you avoid flooding clients with token-level events?”
- “How is usage metadata kept in sync with streamed content?”
- “What do you do when a stream fails after partial output?”
- “How would you benchmark token smoothness across providers?”
- “Why does this matter for agentic copilots?”
Hints in Layers
Hint 1: Define session states
starting -> streaming -> finalizing -> complete|errored
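The state chain above can be made explicit as a transition table, so illegal moves fail loudly instead of corrupting a session. Module and function names are illustrative.

```elixir
defmodule OpsCenter.SessionState do
  # Explicit transition table for the session lifecycle. Any move not
  # listed is rejected, which catches bugs like finalizing a session
  # that has already errored.
  @transitions %{
    starting: [:streaming, :errored],
    streaming: [:finalizing, :errored],
    finalizing: [:complete, :errored]
  }

  def transition(from, to) do
    if to in Map.get(@transitions, from, []) do
      {:ok, to}
    else
      {:error, {:invalid_transition, from, to}}
    end
  end
end
```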
Hint 2: Separate transport and presentation
Store raw chunks in a buffer; render derived state for the UI.
Hint 3: Track correlation IDs
One ID per stream should tie together tokens, telemetry, retries, and support-ticket updates.
Hint 4: Add synthetic load tests
Run repeated high-latency and bursty sessions to validate jitter handling.
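For the synthetic load tests, a lazy stream with randomized delays stands in for a bursty provider, letting you exercise flush windows and jitter handling without an API key. The delay range is an illustrative assumption.

```elixir
defmodule OpsCenter.SyntheticStream do
  # Emits tokens with randomized delays to simulate a bursty provider.
  # Useful for validating batching and jitter handling locally.
  def tokens(count, max_delay_ms \\ 120) do
    Stream.map(1..count, fn i ->
      Process.sleep(:rand.uniform(max_delay_ms))
      "tok#{i} "
    end)
  end
end
```

Pipe it into the same consumer you use for real streams so the test exercises the production code path.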
Common Pitfalls and Debugging
- Problem: UI appears frozen while tokens arrive.
- Why: synchronous rendering for every token.
- Fix: batch by tick and publish diffs.
- Quick test: 100-token prompt should keep UI responsive.
- Problem: Cost shown as zero.
- Why: usage queried before stream ends.
- Fix: gate billing report on final stream metadata.
- Quick test: enforce cost assertion in end-of-stream hook.
- Problem: User sees empty completion state after partial failure.
- Why: missing error branch in stream loop.
- Fix: explicit terminal error event with recovery actions.
- Quick test: inject dropped connection and assert error UI.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Reactive UIs | Reactive Design Patterns | Streaming data and buffering |
| Concurrency | Erlang and OTP in Action | Message passing and process design |
Definition of Done
- Sessions stream tokens and update the UI at a controlled cadence
- Final telemetry is attached to each session record
- Provider-level failures are visible and recoverable
- Latency metrics and token-rate KPIs are available
- Backpressure controls prevent client-side lag
References
- https://hexdocs.pm/req_llm/1.5.1/overview.html