Project 15: Long-Context Memory Compression Engine
Build a memory subsystem that compresses long interactions into retrievable, verifiable capsules.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 14-26 hours |
| Language | Python (alt: TypeScript, Rust) |
| Prerequisites | Projects 4, 5, 9 |
| Key Topics | context engineering, memory quality metrics, provenance |
Learning Objectives
- Design memory tiers (working/episodic/semantic) for long sessions.
- Compress context while preserving critical constraints.
- Evaluate compression quality with retrieval benchmarks.
- Enforce provenance for compressed claims.
The Core Question You’re Answering
“How do you scale context over months of interaction without losing critical facts?”
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Memory hierarchy | Controls retention and retrieval behavior | Prior memory projects + agent papers |
| Compression metrics | Quantifies information loss | IR evaluation fundamentals |
| Workflow transforms | Structured multi-stage memory pipelines | LlamaIndex workflow example |
Theoretical Foundation
```
Raw Events -> Capsule Builder -> Semantic Distiller -> Retrieval Index -> Rehydration
```
Compression must be measured, not assumed.
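To make "measured, not assumed" concrete, here is a minimal sketch of a compression report. It uses a whitespace word count as a crude token proxy; a real pipeline would use the model's actual tokenizer, and the sample strings are purely illustrative.

```python
# Minimal compression-measurement sketch. Assumption: whitespace-split
# words stand in for tokens; swap in a real tokenizer in practice.

def token_count(text: str) -> int:
    """Crude token proxy: whitespace-delimited words."""
    return len(text.split())

def compression_report(raw_events: list[str], capsules: list[str]) -> dict:
    """Report measured (not assumed) token reduction."""
    raw = sum(token_count(e) for e in raw_events)
    compressed = sum(token_count(c) for c in capsules)
    return {
        "raw_tokens": raw,
        "compressed_tokens": compressed,
        "reduction_pct": round(100 * (1 - compressed / raw), 1),
    }

report = compression_report(
    ["user asked about refund policy for order 1234 placed last march",
     "agent confirmed refund window is 30 days from delivery date"],
    ["fact: refund window is 30 days from delivery"],
)
print(report)
```

Reporting the reduction as a measured ratio per run is what lets the benchmark harness later flag regressions instead of trusting the summarizer.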
Project Specification
What You’ll Build
A compression pipeline that:
- Converts long transcripts to episodic capsules
- Distills durable semantic facts
- Supports query-time rehydration
- Tracks citation pointers for each fact
Functional Requirements
- Capsule schema with timestamps and actors
- Semantic fact extraction with provenance
- Retrieval API with confidence scoring
- Compression quality evaluator
Non-Functional Requirements
- Bounded compression latency
- Deterministic benchmark runs
- No unsourced semantic claims
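The "deterministic benchmark runs" requirement can be sketched as fixed iteration order plus a seeded RNG, with a fingerprint over the run so any drift is detectable byte-for-byte. The query IDs and fingerprinting scheme here are assumptions, not a prescribed format.

```python
# Deterministic-benchmark sketch. Assumption: iteration order and
# sampling are the only nondeterminism sources; query IDs are made up.
import hashlib
import random

def run_benchmark(queries: dict[str, str], seed: int = 42) -> str:
    """Run queries in a fixed order with a seeded RNG, and fingerprint
    the run so regressions show up as a changed digest."""
    rng = random.Random(seed)
    results = []
    for qid in sorted(queries):   # fixed iteration order
        sample = rng.random()     # seeded, reproducible sampling
        results.append(f"{qid}:{sample:.6f}")
    return hashlib.sha256("\n".join(results).encode()).hexdigest()

d1 = run_benchmark({"q1": "refund window?", "q2": "actor of capsule 7?"})
d2 = run_benchmark({"q1": "refund window?", "q2": "actor of capsule 7?"})
assert d1 == d2  # identical inputs and seed yield identical fingerprints
```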
Real World Outcome
```console
$ python p15_context_compressor.py --session support_90d.json
[input] 3.2M tokens
[compress] 42 capsules + 128 semantic facts
[reduction] tokens -78.4%
[eval] recall@5=0.86 faithfulness=0.93
[artifact] memory_index.json + quality_report.md
```
Architecture Overview
```
Ingestion -> Capsuleizer -> Fact Distiller -> Indexer -> Retrieval Gateway
```
Implementation Guide
Phase 1: Capsule Schema
- Build deterministic event-to-capsule transforms.
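A capsule schema covering the functional requirements (timestamps, actors, source pointers) might look like the sketch below. The field names are assumptions for illustration, not a prescribed format; the key property is that the transform is deterministic and every capsule points back at its raw-event span.

```python
# Hypothetical capsule schema; field names (actors, source_span) are
# assumptions. The transform is a pure function of its input events.
from dataclasses import dataclass

@dataclass(frozen=True)
class Capsule:
    capsule_id: str
    start_ts: float               # epoch seconds of first covered event
    end_ts: float                 # epoch seconds of last covered event
    actors: tuple[str, ...]       # participants in the covered events
    summary: str                  # extractive summary of the span
    source_span: tuple[int, int]  # (first, last) raw-event indices

def events_to_capsule(events: list[dict], capsule_id: str) -> Capsule:
    """Deterministic event-to-capsule transform: same events in, same
    capsule out, with pointers back to the raw span."""
    return Capsule(
        capsule_id=capsule_id,
        start_ts=events[0]["ts"],
        end_ts=events[-1]["ts"],
        actors=tuple(sorted({e["actor"] for e in events})),
        summary=" | ".join(e["text"] for e in events),
        source_span=(events[0]["idx"], events[-1]["idx"]),
    )

events = [
    {"ts": 1.0, "actor": "user", "text": "hi", "idx": 0},
    {"ts": 2.0, "actor": "agent", "text": "hello", "idx": 1},
]
capsule = events_to_capsule(events, "c1")
```

Freezing the dataclass and sorting the actor set keep the output stable across runs, which the deterministic-benchmark requirement depends on.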
Phase 2: Semantic Distillation
- Extract durable facts with source pointers.
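One way to enforce "no unsourced semantic claims" at this stage is to keep distillation extractive: a candidate fact survives only if its text can be located verbatim in a source capsule. This is a simplifying sketch; abstractive distillation would need a stronger entailment check than substring matching.

```python
# Extractive distillation sketch. Assumption: verbatim containment in a
# capsule is an acceptable provenance check for extractive facts.

def distill_facts(candidates: list[str],
                  capsules: dict[str, str]) -> list[dict]:
    """Keep only candidate facts traceable to a source capsule."""
    facts = []
    for text in candidates:
        source = next(
            (cid for cid, body in capsules.items() if text in body), None
        )
        if source is None:
            continue  # drop unsourced claims instead of emitting them
        facts.append({"fact": text, "source": source})
    return facts

capsules = {"c1": "refund window is 30 days from delivery"}
facts = distill_facts(
    ["refund window is 30 days", "customer is always angry"], capsules
)
```

The second candidate has no source capsule, so it is silently dropped rather than stored without a citation.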
Phase 3: Benchmark Harness
- Evaluate retrieval and faithfulness on held-out queries.
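The recall side of the harness reduces to a standard recall@k over fixed held-out queries. The ranked lists and gold labels below are hypothetical; in the real harness they come from the retrieval index and a hand-labeled query set.

```python
# recall@k sketch over fixed held-out queries; gold labels are made up.

def recall_at_k(ranked: dict[str, list[str]],
                gold: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of queries whose gold capsule appears in the top-k."""
    hits = sum(1 for q, docs in ranked.items() if gold[q] & set(docs[:k]))
    return hits / len(ranked)

ranked = {"q1": ["c3", "c1", "c9"], "q2": ["c2", "c4", "c5"]}
gold = {"q1": {"c1"}, "q2": {"c7"}}
print(recall_at_k(ranked, gold, k=2))  # 0.5: q1 hits at rank 2, q2 misses
```

Faithfulness needs a separate check (comparing each retrieved fact against its cited source snippet), since high recall says nothing about whether the stored facts are still true to their sources.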
Testing Strategy
- Retrieval recall tests
- Faithfulness checks against source snippets
- Regression tests for stale memory pruning
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Over-compression | lost critical constraints | define non-compressible fields |
| Hallucinated summaries | unsourced facts | enforce citation-required outputs |
| Drift over time | outdated memory dominates | temporal decay and recency weighting |
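The "temporal decay and recency weighting" fix from the table can be as simple as an exponential down-weighting of older matches. The half-life here is an assumed tuning knob, not a recommended value.

```python
# Exponential recency-weighting sketch to keep stale memory from
# dominating retrieval. Assumption: half_life_days is a tunable knob.

def decayed_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Down-weight a match by its age: score halves every half-life."""
    return relevance * 0.5 ** (age_days / half_life_days)

print(decayed_score(0.9, age_days=0))   # fresh fact keeps full score
print(decayed_score(0.9, age_days=30))  # one half-life: score halved
```

Applied at ranking time, this lets an older fact still win when nothing recent is relevant, instead of hard-deleting it.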
Interview Questions They’ll Ask
- Why not just increase context window?
- How do you evaluate memory quality objectively?
- How do you enforce faithfulness in summaries?
- How should memory age out over time?
Hints in Layers
- Hint 1: Start with extractive summaries before moving to abstractive ones.
- Hint 2: Keep source pointers in every capsule.
- Hint 3: Evaluate with fixed benchmark queries.
- Hint 4: Add confidence-based retrieval gating.
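Hint 4 can be sketched as a threshold filter over scored retrieval results: low-confidence matches are withheld rather than returned, trading coverage for faithfulness. The score values and threshold are assumptions about the index's output, not fixed parameters.

```python
# Confidence-gated retrieval sketch. Assumption: the index returns
# (capsule_id, score) pairs with scores in [0, 1].

def gate_results(scored: list[tuple[str, float]],
                 threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return only matches the index is confident about."""
    return [(doc, s) for doc, s in scored if s >= threshold]

results = gate_results([("c1", 0.91), ("c4", 0.55), ("c7", 0.72)])
print(results)  # low-confidence c4 is filtered out
```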
Submission / Completion Criteria
Minimum Completion
- Measured token reduction with a working retrieval API
Full Completion
- Provenance-aware compression + benchmark quality metrics
Excellence
- Adaptive rehydration policy with measurable quality lift