Project 15: Long-Context Memory Compression Engine

Build a memory subsystem that compresses long interactions into retrievable, verifiable capsules.


Quick Reference

Attribute | Value
Difficulty | Level 4: Expert
Time Estimate | 14-26 hours
Language | Python (alt: TypeScript, Rust)
Prerequisites | Projects 4, 5, 9
Key Topics | context engineering, memory quality metrics, provenance

Learning Objectives

  1. Design memory tiers (working/episodic/semantic) for long sessions.
  2. Compress context while preserving critical constraints.
  3. Evaluate compression quality with retrieval benchmarks.
  4. Enforce provenance for compressed claims.

The Core Question You’re Answering

“How do you scale context over months of interaction without losing critical facts?”


Concepts You Must Understand First

Concept | Why It Matters | Where to Learn
Memory hierarchy | Controls retention and retrieval behavior | Prior memory projects + agent papers
Compression metrics | Quantifies information loss | IR evaluation fundamentals
Workflow transforms | Structured multi-stage memory pipelines | LlamaIndex workflow example

Theoretical Foundation

Raw Events -> Capsule Builder -> Semantic Distiller -> Retrieval Index -> Rehydration

Compression must be measured, not assumed.


Project Specification

What You’ll Build

A compression pipeline that:

  • Converts long transcripts to episodic capsules
  • Distills durable semantic facts
  • Supports query-time rehydration
  • Tracks citation pointers for each fact

Functional Requirements

  1. Capsule schema with timestamps and actors
  2. Semantic fact extraction with provenance
  3. Retrieval API with confidence scoring
  4. Compression quality evaluator
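One way to make requirements 1 and 2 concrete is a small set of frozen dataclasses. This is a minimal sketch, not a prescribed schema: the class and field names (`Capsule`, `SemanticFact`, `SourcePointer`, `started_at`, etc.) are illustrative choices, not part of the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePointer:
    """Pointer back to the raw transcript span a claim came from."""
    capsule_id: str
    start: int  # character offsets into the source transcript
    end: int


@dataclass(frozen=True)
class Capsule:
    """Episodic summary of a contiguous slice of the session (requirement 1)."""
    capsule_id: str
    started_at: float  # unix timestamps
    ended_at: float
    actors: tuple[str, ...]
    summary: str


@dataclass(frozen=True)
class SemanticFact:
    """Durable fact distilled from capsules, with provenance (requirement 2)."""
    fact_id: str
    claim: str
    confidence: float
    sources: tuple[SourcePointer, ...] = ()


def validate(fact: SemanticFact) -> None:
    """Enforce the 'no unsourced semantic claims' rule at write time."""
    if not fact.sources:
        raise ValueError(f"fact {fact.fact_id} has no provenance")
```

Freezing the dataclasses keeps stored memories immutable, so an edit is forced to become a new versioned fact rather than a silent overwrite.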

Non-Functional Requirements

  • Bounded compression latency
  • Deterministic benchmark runs
  • No unsourced semantic claims

Real World Outcome

$ python p15_context_compressor.py --session support_90d.json
[input] 3.2M tokens
[compress] 42 capsules + 128 semantic facts
[reduction] tokens -78.4%
[eval] recall@5=0.86 faithfulness=0.93
[artifact] memory_index.json + quality_report.md

Architecture Overview

Ingestion -> Capsuleizer -> Fact Distiller -> Indexer -> Retrieval Gateway

Implementation Guide

Phase 1: Capsule Schema

  • Build deterministic event-to-capsule transforms.
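A deterministic transform can be as simple as fixed-size chunking with content-hashed ids, which also satisfies the "deterministic benchmark runs" requirement. A sketch, assuming events are dicts with hypothetical `ts` and `actor` keys:

```python
import hashlib
import json


def capsuleize(events, max_events=8):
    """Group a chronological event stream into fixed-size capsules.

    Deterministic: the same input always produces the same capsule ids,
    because ids are derived from a hash of the chunk's canonical JSON.
    """
    capsules = []
    for i in range(0, len(events), max_events):
        chunk = events[i:i + max_events]
        digest = hashlib.sha256(
            json.dumps(chunk, sort_keys=True).encode()
        ).hexdigest()[:12]
        capsules.append({
            "capsule_id": f"cap-{digest}",
            "started_at": chunk[0]["ts"],
            "ended_at": chunk[-1]["ts"],
            "actors": sorted({e["actor"] for e in chunk}),
            "events": chunk,
        })
    return capsules
```

In a real pipeline the chunk boundaries would follow topic or session breaks rather than a fixed count, but the content-hash id trick carries over unchanged.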

Phase 2: Semantic Distillation

  • Extract durable facts with source pointers.
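Per Hint 1, you can start with a purely extractive distiller before introducing an LLM. The sketch below uses a hypothetical marker-word heuristic to flag durable-sounding sentences; the point is the source pointer, which records exactly which span of the capsule each claim came from:

```python
import re

# Hypothetical markers for durable facts; a real distiller would use an
# LLM or classifier here instead of keyword matching.
DURABLE_MARKERS = ("prefers", "always", "never", "must", "deadline")


def distill_facts(capsule):
    """Extractive stand-in for a semantic distiller.

    Pulls sentences that look like durable facts and attaches a pointer
    to the exact character span in the source capsule.
    """
    facts = []
    text = capsule["text"]
    for m in re.finditer(r"[^.!?]+[.!?]", text):
        sentence = m.group().strip()
        if any(w in sentence.lower() for w in DURABLE_MARKERS):
            facts.append({
                "claim": sentence,
                "source": {
                    "capsule_id": capsule["capsule_id"],
                    "start": m.start(),
                    "end": m.end(),
                },
            })
    return facts
```

Because every claim is a verbatim span, faithfulness is trivially 1.0 at this stage; the pointers become essential once an abstractive model starts paraphrasing.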

Phase 3: Benchmark Harness

  • Evaluate retrieval and faithfulness on held-out queries.
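The two metrics in the sample output (recall@5 and faithfulness) can be sketched as follows. The verbatim-substring check is the strictest extractive baseline; an NLI model or token-overlap score would replace it for abstractive summaries:

```python
def recall_at_k(results, relevant, k=5):
    """Fraction of queries whose relevant fact id appears in the top-k results.

    results: query -> ranked list of fact ids; relevant: query -> gold fact id.
    """
    hits = sum(1 for q, ranked in results.items() if relevant[q] in ranked[:k])
    return hits / len(results)


def faithfulness(facts, source_text):
    """Share of facts whose claim appears verbatim in the source text.

    Exact substring match only works for extractive claims; swap in an
    entailment model once the distiller starts paraphrasing.
    """
    supported = sum(1 for f in facts if f["claim"] in source_text)
    return supported / len(facts) if facts else 1.0
```

Running both on a fixed, held-out query set (Hint 3) is what makes "tokens -78.4%" a defensible number rather than a hope.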

Testing Strategy

  • Retrieval recall tests
  • Faithfulness checks against source snippets
  • Regression tests for stale memory pruning
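A regression test for stale-memory pruning can be this small. The TTL-based `prune_stale` and its field names (`last_seen`) are one possible design, not the required one:

```python
def prune_stale(facts, now, ttl_days=90):
    """Drop facts not reaffirmed within the TTL window."""
    cutoff = now - ttl_days * 86400
    return [f for f in facts if f["last_seen"] >= cutoff]


def test_prune_keeps_fresh_and_drops_stale():
    now = 1_700_000_000
    facts = [
        {"claim": "fresh fact", "last_seen": now - 86400},          # 1 day old
        {"claim": "stale fact", "last_seen": now - 120 * 86400},    # 120 days old
    ]
    kept = prune_stale(facts, now)
    assert [f["claim"] for f in kept] == ["fresh fact"]
```

Pinning `now` in the test keeps the run deterministic; never call a wall clock inside a benchmark assertion.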

Common Pitfalls & Debugging

Pitfall | Symptom | Fix
Over-compression | Lost critical constraints | Define non-compressible fields
Hallucinated summaries | Unsourced facts | Enforce citation-required outputs
Drift over time | Outdated memory dominates | Apply temporal decay and recency weighting
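The drift fix in the table above is commonly implemented as exponential recency decay, where a memory's retrieval score halves every fixed number of days. A minimal sketch (the half-life value is a tuning knob, not a recommendation):

```python
def decayed_score(base_score, age_days, half_life_days=30.0):
    """Exponential recency decay: the score halves every half_life_days.

    Combine with the raw similarity score at query time so that an old
    memory must be much more relevant than a fresh one to outrank it.
    """
    return base_score * 0.5 ** (age_days / half_life_days)
```

Facts marked non-compressible (the over-compression fix) should typically be exempted from decay as well, so hard constraints never age out.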

Interview Questions They’ll Ask

  1. Why not just increase context window?
  2. How do you evaluate memory quality objectively?
  3. How do you enforce faithfulness in summaries?
  4. How should memory age out over time?

Hints in Layers

  • Hint 1: Start with extractive summaries before moving to abstractive ones.
  • Hint 2: Keep source pointers in every capsule.
  • Hint 3: Evaluate with fixed benchmark queries.
  • Hint 4: Add confidence-based retrieval gating.
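Hint 4 can be sketched as a filter between the index and the caller: below a confidence threshold, the gateway abstains rather than returning a weakly supported fact. The threshold and field names here are illustrative:

```python
def gated_retrieve(scored_facts, min_confidence=0.6):
    """Confidence-gated retrieval: surface a fact only if its confidence
    clears the gate. An empty result means 'abstain', which downstream
    callers should treat as 'unknown', never as 'false'."""
    return [f for f in scored_facts if f["confidence"] >= min_confidence]
```

Calibrating the threshold against the benchmark harness (trading recall@5 against faithfulness) is what turns the gate from a magic number into a measured policy.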

Submission / Completion Criteria

Minimum Completion

  • Token reduction with working retrieval API

Full Completion

  • Provenance-aware compression + benchmark quality metrics

Excellence

  • Adaptive rehydration policy with measurable quality lift