# Project 17: Agent Observability with OpenTelemetry

Build complete trace and metric instrumentation for agent loops, tool calls, policy checks, and memory operations.
## Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 10-18 hours |
| Language | TypeScript (alt: Python, Go) |
| Prerequisites | Projects 6, 9 |
| Key Topics | tracing, semantic conventions, root-cause workflows |
## Learning Objectives
- Instrument agent execution with end-to-end trace IDs.
- Emit GenAI semantic attributes for model/tool spans.
- Build latency, token, and cost dashboards.
- Perform root-cause analysis from a single failed request.
## The Core Question You’re Answering

> “How do you make stochastic agent behavior operationally debuggable?”
## Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Span trees | Causality across components | OpenTelemetry fundamentals |
| GenAI semantic attributes | Standardized AI telemetry fields | OTel GenAI conventions |
| Sampling strategy | Controls observability cost | SRE observability practices |
## Theoretical Foundation

```
Request -> Trace Root -> (LLM span, Tool span, Policy span, Memory span) -> Outcome
```

Without span-level causality, agent debugging becomes guesswork.
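The span tree above can be sketched as a plain data structure. This is a hand-rolled recorder for illustration, not the real OpenTelemetry SDK; names like `AgentSpan` and `childSpan` are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Every span carries the trace ID of its root plus a link to its parent.
interface AgentSpan {
  traceId: string;
  spanId: string;
  parentId?: string;
  name: string;
  kind: "agent" | "llm" | "tool" | "policy" | "memory";
  children: AgentSpan[];
}

function startTrace(rootName: string): AgentSpan {
  return { traceId: randomUUID(), spanId: randomUUID(), name: rootName, kind: "agent", children: [] };
}

function childSpan(parent: AgentSpan, name: string, kind: AgentSpan["kind"]): AgentSpan {
  const span: AgentSpan = {
    traceId: parent.traceId, // the same trace ID flows to every descendant
    spanId: randomUUID(),
    parentId: parent.spanId, // the parent link is what gives you causality
    name,
    kind,
    children: [],
  };
  parent.children.push(span);
  return span;
}

const root = startTrace("summarize incident retro");
const llm = childSpan(root, "model-call-1", "llm");
childSpan(llm, "search-tool", "tool");
console.log(root.children[0].children[0].traceId === root.traceId); // true
```

The real SDK adds timing, status, and context propagation on top, but the parent/child linkage is the core of why a single trace ID reconstructs the whole run.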
## Project Specification

### What You’ll Build

An observability layer that:
- Instruments all critical boundaries
- Emits OTLP traces + metrics
- Redacts sensitive fields
- Supports trace-driven incident analysis
### Functional Requirements
- Correlation ID propagation
- Span emission for model/tool/policy/memory operations
- Token/cost metric extraction
- Dashboard and alert definitions
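Correlation ID propagation, the first requirement above, can be sketched with Node's `AsyncLocalStorage`. The OpenTelemetry SDK does this for you via its context API; this shows the underlying mechanism. Helper names are illustrative.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// One store per in-flight request; anything run inside sees the same ID.
const requestContext = new AsyncLocalStorage<{ correlationId: string }>();

function withCorrelationId<T>(fn: () => T): T {
  return requestContext.run({ correlationId: randomUUID() }, fn);
}

function currentCorrelationId(): string {
  // "unknown" signals a span emitted outside any request context — a bug
  // worth alerting on, since it produces a fragmented trace.
  return requestContext.getStore()?.correlationId ?? "unknown";
}

withCorrelationId(() => {
  const a = currentCorrelationId();
  const b = currentCorrelationId(); // same ID, even in nested calls
  console.log(a === b); // true
});
```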
### Non-Functional Requirements
- Low instrumentation overhead
- Privacy-safe logs and traces
- Replay-friendly trace export
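Privacy-safe traces mean redacting before export, not after storage. A minimal sketch, assuming span attributes are flat string maps; the key patterns here are examples to tune for your data.

```typescript
// Keys matching these patterns are scrubbed before the span leaves the process.
const SENSITIVE_KEYS = /(api[_-]?key|token|password|email|ssn)/i;

function redactAttributes(attrs: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(attrs)) {
    out[k] = SENSITIVE_KEYS.test(k) ? "[REDACTED]" : v;
  }
  return out;
}

// tool.name survives; user.email is replaced with "[REDACTED]"
console.log(redactAttributes({ "user.email": "a@b.com", "tool.name": "search" }));
```

In a real pipeline this runs inside a span processor or the Collector, so no exporter ever sees the raw value.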
## Real World Outcome

```
$ npm run p17:trace -- --goal "summarize incident retro"
[trace] id=trace_a9f2d
[spans] llm=7 tool=5 memory=3 policy=5
[latency] p50=1.2s p95=4.8s
[cost] est_usd=0.41
[dashboard] updated successfully
```
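The p50/p95 figures in the output above come from raw span durations. A sketch using nearest-rank percentiles, which is what most dashboard backends use:

```typescript
// Nearest-rank percentile: sort, take the ceil(p% * n)-th value.
function percentile(durationsMs: number[], p: number): number {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const durations = [800, 1200, 1500, 4800, 900, 1100];
console.log(percentile(durations, 50), percentile(durations, 95)); // 1100 4800
```

In production you would compute these from histogram metrics rather than raw durations, since exporting every duration does not scale.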
## Architecture Overview

```
Agent Runtime -> Instrumentation SDK -> OTel Collector -> Storage -> Dashboards/Alerts
```
## Implementation Guide

### Phase 1: Trace Foundations

- Request IDs, root spans, and propagation.
### Phase 2: GenAI Attributes

- Add model/tool metadata and token usage.
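A sketch of GenAI span attributes plus a derived cost metric. The attribute keys follow the OTel GenAI semantic conventions, which are still marked experimental, so verify the exact names against the current spec; the per-token prices are placeholders, not real rates.

```typescript
interface Usage { inputTokens: number; outputTokens: number; }

// Attributes attached to each model-call span (keys per OTel GenAI semconv).
function genAiAttributes(model: string, usage: Usage): Record<string, string | number> {
  return {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": model,
    "gen_ai.usage.input_tokens": usage.inputTokens,
    "gen_ai.usage.output_tokens": usage.outputTokens,
  };
}

// Placeholder prices per 1M tokens; load real rates from config.
const PRICE_PER_M = { input: 3.0, output: 15.0 };

function estimateCostUsd(usage: Usage): number {
  return (usage.inputTokens * PRICE_PER_M.input + usage.outputTokens * PRICE_PER_M.output) / 1_000_000;
}

const usage = { inputTokens: 1200, outputTokens: 400 };
console.log(estimateCostUsd(usage).toFixed(4)); // 0.0096
```

Emitting cost as a metric keyed by model name keeps the dashboard query trivial: sum over the trace, group by model.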
### Phase 3: Ops Workflows

- Build runbooks for trace-based incident triage.
## Testing Strategy
- Missing span detection tests
- PII redaction tests
- Synthetic incident replay tests
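The missing-span detection test above can be sketched as a coverage check over exported spans: every boundary this project instruments must appear at least once per run. The kind names mirror this project's span taxonomy.

```typescript
// The four boundaries every complete agent run must have instrumented.
const REQUIRED_KINDS = ["llm", "tool", "policy", "memory"] as const;

function missingKinds(exported: { kind: string }[]): string[] {
  const seen = new Set(exported.map((s) => s.kind));
  return REQUIRED_KINDS.filter((k) => !seen.has(k));
}

// A run that never emitted a policy span fails the check.
const spans = [{ kind: "llm" }, { kind: "tool" }, { kind: "memory" }];
console.log(missingKinds(spans)); // [ 'policy' ]
```

Run this assertion against a synthetic end-to-end request in CI so instrumentation gaps are caught before they show up in a real incident.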
## Common Pitfalls & Debugging

| Pitfall | Symptom | Fix |
|---|---|---|
| Fragmented traces | No end-to-end view of a request | Enforce correlation ID propagation at every boundary |
| Over-verbose labels | Telemetry cost spikes | Drop or bucket high-cardinality attributes |
| Unredacted sensitive fields | Compliance risk | Redact before export |
## Interview Questions They’ll Ask
- What should every agent trace include?
- How do you map business failures to technical spans?
- How do you balance observability depth vs cost?
- How do traces support evaluation and routing?
## Hints in Layers
- Hint 1: Instrument only critical path first.
- Hint 2: Add policy and memory spans explicitly.
- Hint 3: Track tokens and cost per model call.
- Hint 4: Build one-click trace drilldown playbook.
## Submission / Completion Criteria

### Minimum Completion

- End-to-end trace for one full agent run

### Full Completion

- Dashboard with latency/cost/error breakdown

### Excellence

- Alerting + incident triage workflow from trace IDs