Project 17: Agent Observability with OpenTelemetry

Build complete trace and metric instrumentation for agent loops, tool calls, policy checks, and memory operations.


Quick Reference

Attribute      Value
Difficulty     Level 3: Advanced
Time Estimate  10-18 hours
Language       TypeScript (alt: Python, Go)
Prerequisites  Projects 6, 9
Key Topics     tracing, semantic conventions, root-cause workflows

Learning Objectives

  1. Instrument agent execution with end-to-end trace IDs.
  2. Emit GenAI semantic attributes for model/tool spans.
  3. Build latency, token, and cost dashboards.
  4. Perform root-cause analysis from a single failed request.

The Core Question You’re Answering

“How do you make stochastic agent behavior operationally debuggable?”


Concepts You Must Understand First

Concept                    Why It Matters                    Where to Learn
Span trees                 Causality across components       OpenTelemetry fundamentals
GenAI semantic attributes  Standardized AI telemetry fields  OTel GenAI conventions
Sampling strategy          Controls observability cost       SRE observability practices
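Sampling is what keeps observability cost bounded. A key property of head sampling is that the decision is deterministic per trace ID, so every span in a trace is either fully kept or fully dropped. The sketch below is a self-contained illustration of that idea (names are hypothetical; the real OTel SDK provides ratio-based samplers for this):

```typescript
// Illustrative head sampler: a deterministic per-trace sampling decision.
// Hashing the trace ID means every span in the same trace agrees on the outcome.
function shouldSample(traceId: string, ratio: number): boolean {
  // Simple string hash into a 32-bit unsigned integer.
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  // Map the hash into [0, 1) and compare against the configured ratio.
  return hash / 0x100000000 < ratio;
}

console.log(shouldSample("trace_a9f2d", 1.0)); // ratio 1.0 keeps every trace
console.log(shouldSample("trace_a9f2d", 0.1) === shouldSample("trace_a9f2d", 0.1));
```

Because the decision depends only on the trace ID and the ratio, a collector or a downstream service can recompute it without coordination.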

Theoretical Foundation

Request -> Trace Root -> (LLM span, Tool span, Policy span, Memory span) -> Outcome

Without span-level causality, agent debugging becomes guesswork.
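The diagram above can be sketched as a minimal span-tree model (this is an illustrative data model, not the OTel SDK): each span carries its trace ID, a parent pointer, and a kind matching the four component types.

```typescript
// Minimal span-tree model: trace ID ties spans together, parentId encodes causality.
type SpanKind = "root" | "llm" | "tool" | "policy" | "memory";

interface Span {
  traceId: string;
  spanId: string;
  parentId?: string; // undefined only for the trace root
  kind: SpanKind;
  name: string;
}

let nextId = 0;
function startSpan(traceId: string, kind: SpanKind, name: string, parent?: Span): Span {
  return { traceId, spanId: `span_${nextId++}`, parentId: parent?.spanId, kind, name };
}

// One request, one root; every child shares the trace ID, so causality is recoverable.
const root = startSpan("trace_a9f2d", "root", "agent.run");
const llm = startSpan(root.traceId, "llm", "chat.completion", root);
const tool = startSpan(root.traceId, "tool", "search.web", root);
console.log([root, llm, tool].every(s => s.traceId === root.traceId)); // true
```

Walking `parentId` links from any failed span back to the root is exactly the root-cause workflow this project builds.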


Project Specification

What You’ll Build

An observability layer that:

  • Instruments all critical boundaries
  • Emits OTLP traces + metrics
  • Redacts sensitive fields
  • Supports trace-driven incident analysis

Functional Requirements

  1. Correlation ID propagation
  2. Span emission for model/tool/policy/memory operations
  3. Token/cost metric extraction
  4. Dashboard and alert definitions
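Requirement 3 above can be sketched as a pure function over a span's usage counters. The pricing rates here are placeholders, not real model prices:

```typescript
// Sketch of token/cost metric extraction from a model span's usage counters.
interface Usage { inputTokens: number; outputTokens: number; }

// Rates are expressed in USD per million tokens; values below are placeholders.
function estimateCostUsd(usage: Usage, inputPerMTok: number, outputPerMTok: number): number {
  return (usage.inputTokens / 1e6) * inputPerMTok + (usage.outputTokens / 1e6) * outputPerMTok;
}

const usage = { inputTokens: 120_000, outputTokens: 8_000 };
// With placeholder rates of $3 / $15 per million tokens:
console.log(estimateCostUsd(usage, 3, 15).toFixed(2)); // "0.48"
```

Summing this per-span estimate across all LLM spans in a trace yields the per-request cost figure shown in the CLI output below.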

Non-Functional Requirements

  • Low instrumentation overhead
  • Privacy-safe logs and traces
  • Replay-friendly trace export

Real-World Outcome

$ npm run p17:trace -- --goal "summarize incident retro"
[trace] id=trace_a9f2d
[spans] llm=7 tool=5 memory=3 policy=5
[latency] p50=1.2s p95=4.8s
[cost] est_usd=0.41
[dashboard] updated successfully

Architecture Overview

Agent Runtime -> Instrumentation SDK -> OTel Collector -> Storage -> Dashboards/Alerts

Implementation Guide

Phase 1: Trace Foundations

  • Request IDs, root spans, and propagation.
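One way to sketch Phase 1 is with Node's built-in AsyncLocalStorage, which carries a correlation ID across async boundaries within a request. Production code would use OTel context propagation instead; the function names here are illustrative:

```typescript
// Phase 1 sketch: request-scoped trace ID propagation via AsyncLocalStorage.
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

const requestContext = new AsyncLocalStorage<{ traceId: string }>();

// Open a root span context for one agent run; everything inside shares the ID.
function withRootSpan<T>(fn: () => T): T {
  return requestContext.run({ traceId: `trace_${randomUUID().slice(0, 8)}` }, fn);
}

// Any nested call (LLM client, tool executor, policy check) can read the ID.
function currentTraceId(): string | undefined {
  return requestContext.getStore()?.traceId;
}

withRootSpan(() => {
  console.log(currentTraceId() !== undefined); // true inside the request scope
});
```

The key invariant is that the ID is stable within one request and invisible outside it, which is what makes cross-component span stitching possible.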

Phase 2: GenAI Attributes

  • Add model/tool metadata and token usage.
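Phase 2 amounts to attaching standardized attributes to each model span. The `gen_ai.*` keys below follow the (still-incubating) OTel GenAI semantic conventions; verify exact names against the current spec before shipping:

```typescript
// Phase 2 sketch: building GenAI semantic-convention attributes for a model span.
interface ModelCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

function genAiAttributes(call: ModelCall): Record<string, string | number> {
  return {
    // Attribute keys per the OTel GenAI conventions (subject to change while incubating).
    "gen_ai.request.model": call.model,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
  };
}

console.log(genAiAttributes({ model: "gpt-4o", inputTokens: 812, outputTokens: 64 }));
```

Using the shared convention keys (rather than ad-hoc names) is what lets off-the-shelf dashboards and backends aggregate token usage without custom mapping.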

Phase 3: Ops Workflows

  • Build runbooks for trace-based incident triage.

Testing Strategy

  • Missing span detection tests
  • PII redaction tests
  • Synthetic incident replay tests
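The PII redaction tests above can be sketched as a redaction pass that runs before export, plus the kind of assertion such a test makes. The pattern here covers only email addresses and is illustrative, not exhaustive:

```typescript
// Sketch of a pre-export redaction pass over span attributes.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g; // illustrative pattern, not production-grade

function redact(attrs: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    out[key] = value.replace(EMAIL, "[REDACTED]");
  }
  return out;
}

// A redaction test asserts that no PII survives in the exported attributes.
const span = { "tool.input": "email bob@example.com the retro" };
const clean = redact(span);
console.log(clean["tool.input"].includes("@")); // false after redaction
```

Running redaction inside the export pipeline (rather than at call sites) guarantees no span reaches storage unscrubbed, which is the property the test should pin down.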

Common Pitfalls & Debugging

Pitfall                      Symptom                Fix
Fragmented traces            No end-to-end view     Enforce correlation ID propagation
Over-verbose labels          Telemetry cost spikes  Reduce high-cardinality attributes
Unredacted sensitive fields  Compliance risk        Redact before export

Interview Questions They’ll Ask

  1. What should every agent trace include?
  2. How do you map business failures to technical spans?
  3. How do you balance observability depth against telemetry cost?
  4. How do traces support evaluation and routing?

Hints in Layers

  • Hint 1: Instrument only critical path first.
  • Hint 2: Add policy and memory spans explicitly.
  • Hint 3: Track tokens and cost per model call.
  • Hint 4: Build one-click trace drilldown playbook.

Submission / Completion Criteria

Minimum Completion

  • End-to-end trace for one full agent run

Full Completion

  • Dashboard with latency/cost/error breakdown

Excellence

  • Alerting + incident triage workflow from trace IDs