Project 2: Conversation Memory Manager
Build a policy-driven memory service that balances short-term context, summaries, and durable user facts without polluting memory.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 10-20 hours |
| Main Programming Language | Python |
| Alternative Programming Languages | TypeScript, Go |
| Coolness Level | Level 4 |
| Business Potential | Level 3 |
| Prerequisites | Project 1, basic DB design, API thinking |
| Key Topics | memory taxonomy, summarization, write gates, retrieval policy |
1. Learning Objectives
By completing this project, you will:
- Design memory types with explicit retention and retrieval rules.
- Implement safe memory writes with provenance requirements.
- Maintain context quality over long multi-turn conversations.
- Detect summary drift and contradiction in stored user facts.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Memory Taxonomy and Write-Gate Policy
Fundamentals
Conversation memory is not one store. It is a set of stores with different responsibilities: recent-turn buffer for immediate context, summary snapshots for compressed continuity, and durable profile memory for stable user preferences. Each store should have explicit rules for writes, reads, expiry, and conflict resolution. Without policy boundaries, memory quality decays and user trust drops.
Deep Dive into the concept
A frequent mistake in LLM applications is to store everything from every turn, then retrieve broadly. At first this feels “smart” because the assistant seems informed, but over time it leads to noise, contradiction, and latency. Good memory systems are selective. They separate what should be transient (small talk), what should be compressed (older task details), and what should be durable (confirmed preferences or identity facts).
The memory taxonomy usually starts with three classes:
- episodic memory: recent events and task context;
- semantic memory: stable knowledge snippets grounded in documents;
- profile memory: persistent user attributes and preferences.
Each class needs a write gate. For example, profile memory should not be written from speculative model inferences. Durable writes should require explicit user confirmation or high-confidence extraction with clear provenance. Episodic memory can be written automatically but expires quickly. Semantic memory often originates from ingest pipelines, not live chat.
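The per-class rules above can be sketched as a single gate function. This is a minimal illustration, not a prescribed design: the `Candidate` fields, the `PROFILE_MIN_CONFIDENCE` threshold, and the `origin` values are hypothetical names chosen for the example, and rejecting semantic writes from live chat is a stricter assumption than the text requires.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical threshold; tune it against your own fixtures.
PROFILE_MIN_CONFIDENCE = 0.9


@dataclass(frozen=True)
class Candidate:
    memory_type: str               # "episodic", "semantic", or "profile"
    content: str
    confidence: float              # extractor confidence in [0, 1]
    source_turn_id: Optional[str]  # provenance: the turn that produced it
    user_confirmed: bool = False   # True if the user stated or confirmed it
    origin: str = "chat"           # "chat" or "ingest" (assumed labels)


def write_gate(c: Candidate) -> Tuple[bool, str]:
    """Return (allowed, reason) for one memory candidate."""
    if c.memory_type == "episodic":
        # Written automatically; expiry is handled by the store's TTL.
        return True, "episodic: auto-write with TTL"
    if c.memory_type == "semantic":
        # Assumption: only the ingest pipeline may write semantic memory.
        if c.origin == "ingest":
            return True, "semantic: from ingest pipeline"
        return False, "semantic: live chat is not an accepted source"
    if c.memory_type == "profile":
        if c.source_turn_id is None:
            return False, "profile: missing provenance"
        if c.user_confirmed:
            return True, "profile: explicit user confirmation"
        if c.confidence >= PROFILE_MIN_CONFIDENCE:
            return True, "profile: high-confidence extraction"
        return False, "profile: unconfirmed low-confidence inference"
    return False, f"unknown memory type: {c.memory_type}"
```

Returning a reason string alongside the decision makes every accept or reject traceable in the decision logs.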
Summarization is a compression tool, not a truth source. If you repeatedly summarize summaries, detail loss accumulates. To mitigate this, keep summary snapshots versioned and linked to source turn ranges. Use periodic refreshes from original turns when feasible. Also keep extraction logs so you can inspect where a persistent fact came from.
Conflict resolution is another core policy. Users change preferences over time, and memory must reconcile contradictions. Last-write-wins may be simple but can preserve accidental inputs. A better approach combines recency, confidence, and confirmation. For high-impact preferences, ask a clarification question before overwriting durable memory.
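One way to combine recency, confidence, and confirmation is sketched below. The `Fact` fields, the `Resolution` outcomes, and the contents of `HIGH_IMPACT_KEYS` are illustrative assumptions; the ordering of the checks is one reasonable policy among several.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    KEEP_EXISTING = "keep_existing"
    OVERWRITE = "overwrite"
    ASK_USER = "ask_user"


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    confidence: float
    updated_at: float        # epoch seconds
    confirmed: bool = False  # True if the user explicitly confirmed it


# Hypothetical set of keys treated as high-impact.
HIGH_IMPACT_KEYS = {"dietary_restriction", "preferred_name"}


def resolve(existing: Fact, incoming: Fact) -> Resolution:
    """Decide what to do when `incoming` targets the same key as `existing`."""
    if existing.value == incoming.value:
        return Resolution.KEEP_EXISTING  # no contradiction to reconcile
    if incoming.confirmed:
        return Resolution.OVERWRITE      # explicit confirmation wins
    if existing.key in HIGH_IMPACT_KEYS or existing.confirmed:
        return Resolution.ASK_USER       # clarify before touching durable facts
    newer = incoming.updated_at > existing.updated_at
    if newer and incoming.confidence >= existing.confidence:
        return Resolution.OVERWRITE
    return Resolution.KEEP_EXISTING
```

Unlike plain last-write-wins, a newer but less confident or unconfirmed value does not silently replace a confirmed one.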
Retrieval should be context-aware. Not every turn requires pulling durable profile facts. Over-retrieval bloats context and can produce repetitive responses. A retrieval gate classifies the query type and fetches only relevant memory types. For example, a technical troubleshooting query may need semantic memory from docs plus recent episodic context, but not food preferences.
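A retrieval gate can start as a deterministic keyword classifier that maps query intent to the stores worth fetching. The intent names, keyword lists, and mapping below are hypothetical; a real classifier would likely be richer, but the gate's shape stays the same.

```python
from typing import Dict, Tuple

# Hypothetical mapping from query intent to the memory types to fetch.
INTENT_TO_STORES: Dict[str, Tuple[str, ...]] = {
    "troubleshooting": ("episodic", "semantic"),
    "personalization": ("episodic", "profile"),
    "smalltalk": ("episodic",),
}

# Hypothetical keyword lists; checked in insertion order.
_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "troubleshooting": ("error", "fails", "crash", "stack trace", "bug"),
    "personalization": ("recommend", "suggest", "my preference", "for me"),
}


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query."""
    q = query.lower()
    for intent, words in _KEYWORDS.items():
        if any(w in q for w in words):
            return intent
    return "smalltalk"


def stores_for(query: str) -> Tuple[str, ...]:
    """Return the memory types the retrieval gate should consult."""
    return INTENT_TO_STORES[classify_intent(query)]
```

With this mapping, a troubleshooting query never pulls profile facts such as food preferences into context.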
Security and privacy policies must be first-class. Durable profile facts often contain sensitive information. Add labels, purpose constraints, and deletion workflows. In multi-tenant environments, isolation guarantees are mandatory. Provenance metadata should include source type, timestamp, and consent flags.
Finally, measure memory quality continuously. Useful metrics include contradiction rate in profile store, summary drift score, retrieval precision by memory type, and stale-memory hit rate. When metrics degrade, adjust write gate thresholds and summarization cadence before changing models.
How this fits in projects
Definitions & key terms
- Episodic memory: short-lived event memory.
- Profile memory: durable user preference/identity facts.
- Write gate: policy deciding whether memory is persisted.
- Summary drift: meaning loss caused by repeated compression.
Mental model diagram (ASCII)
incoming turn
     |
     v
[classifier] -> episodic store (TTL)
     |       -> summary candidate
     |       -> profile candidate (write gate)
     v
[retrieval gate] -> assemble context for next response
How it works (step-by-step)
- Classify each new turn into memory candidates.
- Apply write-gate policy per memory type.
- Update short-term buffer and optional summary snapshots.
- Resolve profile conflicts using recency + confidence + confirmation.
- At response time, retrieve only required memory types.
Invariants:
- Durable writes always include provenance.
- Sensitive profile data requires stricter write rules.
- Memory retrieval respects token budget and relevance.
Failure modes:
- Profile pollution from unconfirmed inference.
- Summary drift from repeated lossy compression.
- Over-retrieval causing context bloat.
Minimal concrete example
turn: "I prefer concise answers."
classifier => profile_candidate
write_gate => allowed (explicit preference statement)
store => PRF-0008 with source_turn_id and timestamp
Common misconceptions
- “If memory exists, always include it.”
- “Summaries are equivalent to raw history.”
- “Last-write-wins is enough for profile conflicts.”
Check-your-understanding questions
- Why should profile memory use stricter write gates?
- How can summary drift be detected?
- When should retrieval skip profile memory?
Check-your-understanding answers
- Durable profile errors persist and harm trust.
- Compare summary claims to original turn fixtures.
- When query intent has no user-preference dependency.
Real-world applications
- Customer support copilots with persistent customer context.
- Personal assistants that adapt style while avoiding unsafe memory persistence.
Where you’ll apply it
References
- MemGPT: https://arxiv.org/abs/2310.08560
- RAG paper: https://arxiv.org/abs/2005.11401
Key insights
Memory quality depends more on write policy than storage volume.
Summary
You are designing controlled persistence for a stateless model, with explicit trade-offs.
Homework/Exercises to practice the concept
- Write a policy table for episodic, summary, and profile stores.
- Create a contradiction scenario and define conflict resolution behavior.
Solutions to the homework/exercises
- Include write conditions, TTL, retrieval triggers, and privacy label.
- Require confirmation before overriding high-confidence durable facts.
3. Project Specification
3.1 What You Will Build
A memory manager service with:
- per-type memory stores,
- policy-controlled persistence,
- retrieval gate for context assembly,
- trace logs for memory decisions.
3.2 Functional Requirements
- Support episodic, summary, and profile memory types.
- Persist memory with provenance metadata.
- Apply write gate thresholds and rules.
- Assemble context under token budget constraints.
- Provide memory inspection and replay CLI commands.
3.3 Non-Functional Requirements
- Performance: Retrieval plus context assembly completes in under 120 ms at p95 for standard sessions.
- Reliability: Deterministic replay for fixed transcript fixtures.
- Security: Tenant-safe memory access boundaries.
3.4 Example Usage / Output
$ llm-memory session replay --fixture fixtures/session_alpha.json
[TURN 12] profile_write=ALLOW key=response_style value=concise
[TURN 18] summary_created=SUM-0003 turns=1..18
[TURN 19] context_assembled tokens=2960 stores=[episodic,summary,profile]
3.5 Data Formats / Schemas / Protocols
memory_record:
- id
- memory_type
- content
- confidence
- source_turn_id
- created_at
- expires_at (optional)
- tenant_id
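The record above could be modeled as a frozen dataclass that rejects invalid input at construction time. This is a sketch under assumptions: the allowed type names come from the functional requirements, while the `[0, 1]` confidence range and the exact validation rules are illustrative choices rather than part of the schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_TYPES = {"episodic", "summary", "profile"}


@dataclass(frozen=True)
class MemoryRecord:
    id: str
    memory_type: str
    content: str
    confidence: float
    source_turn_id: str
    created_at: datetime
    tenant_id: str
    expires_at: Optional[datetime] = None  # None means no expiry

    def __post_init__(self) -> None:
        if self.memory_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown memory_type: {self.memory_type!r}")
        if not self.source_turn_id:
            raise ValueError("required field source_turn_id missing")
        if not self.tenant_id:
            raise ValueError("required field tenant_id missing")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.expires_at is not None and self.expires_at <= self.created_at:
            raise ValueError("expires_at must be later than created_at")


record = MemoryRecord(
    id="PRF-0008",
    memory_type="profile",
    content="response_style=concise",
    confidence=0.95,
    source_turn_id="turn-12",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    tenant_id="tenant-a",
)
```

Validating in `__post_init__` means a record without provenance can never exist in memory, so the store layer does not need to re-check it.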
3.6 Edge Cases
- Contradictory profile updates.
- Missing provenance metadata.
- Session with no valid summary candidates.
- Retrieval request exceeding token budget.
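For the last edge case, a greedy assembler that fills the budget in relevance order is a reasonable starting point. The whitespace-based `approx_tokens` function is only a stand-in for a real tokenizer, and the `(score, text)` candidate shape is an assumption made for this sketch.

```python
from typing import Callable, Iterable, List, Tuple


def approx_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def assemble_context(
    candidates: Iterable[Tuple[float, str]],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> Tuple[List[str], int]:
    """Pick the highest-scoring texts that fit within `budget` tokens.

    `candidates` holds (relevance_score, text) pairs. Returns the selected
    texts in descending score order and the number of tokens used.
    """
    selected: List[str] = []
    used = 0
    for _, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # skip this item; a smaller one may still fit
        selected.append(text)
        used += cost
    return selected, used
```

Because the function never exceeds the budget, an oversized retrieval request degrades to a smaller context instead of failing.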
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
$ llm-memory session ingest --fixture fixtures/demo_conversation.json
$ llm-memory session inspect --session demo-001
$ llm-memory session assemble-context --session demo-001 --budget 3200
3.7.2 Golden Path Demo (Deterministic)
$ llm-memory session inspect --session golden
[PROFILE] PRF-0002 response_style=concise
[SUMMARY] SUM-0001 turns=1..14
[EPISODIC] 6 active entries
exit_code=0
3.7.3 Failure Demo (Deterministic)
$ llm-memory session ingest --fixture fixtures/missing_provenance.json
[ERROR] record rejected: required field source_turn_id missing
exit_code=3
4. Solution Architecture
4.1 High-Level Design
conversation stream -> classifier -> write gate -> typed stores -> retrieval gate -> context assembler
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Classifier | Identify memory candidates | Keep simple and deterministic |
| Write Gate | Approve/reject persistence | Confidence + consent rules |
| Store Layer | Persist typed records | Separate TTL and indexing by type |
| Retrieval Gate | Choose what to fetch | Query-intent-aware |
4.3 Data Structures (No Full Code)
ProfileFact{key,value,confidence,source,updated_at}
SummarySnapshot{id,turn_range,text,version}
4.4 Algorithm Overview
- Parse incoming turn.
- Extract memory candidates.
- Apply write gate.
- Persist accepted memory records.
- Build next-turn context using retrieval gate.
Complexity:
- Time: O(n) per turn, where n is the number of memory candidates extracted from that turn.
- Space: O(m), where m is the number of memory records kept over the session lifecycle.
5. Implementation Guide
5.1 Development Environment Setup
# initialize db, load fixture transcripts, run deterministic replay
5.2 Project Structure
p02-conversation-memory-manager/
src/
classifier
write_gate
stores
retrieval_gate
cli
fixtures/
tests/
5.3 The Core Question You’re Answering
“How can I preserve useful continuity without persisting low-quality or unsafe memory?”
5.4 Concepts You Must Understand First
- Memory taxonomy.
- Write gate policy.
- Summary drift and conflict resolution.
5.5 Questions to Guide Your Design
- Which facts deserve durable storage?
- What proof is required before profile overwrite?
5.6 Thinking Exercise
Trace one preference from mention to storage to retrieval across two separate sessions.
5.7 The Interview Questions They’ll Ask
- How do you classify memory types?
- What should trigger durable writes?
- How do you avoid profile contradictions?
- How do you test memory quality over long sessions?
- How do you implement memory deletion?
5.8 Hints in Layers
- Hint 1: Start with strict schemas.
- Hint 2: Add write gate before persistence.
- Hint 3: Version summaries to detect drift.
- Hint 4: Add replay-based regression tests.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Policy boundaries | Clean Architecture | Boundary and policy chapters |
| System-level quality | Fundamentals of Software Architecture | Quality attributes |
5.10 Implementation Phases
- Phase 1: typed stores + schema validation.
- Phase 2: write gate + conflict resolution.
- Phase 3: retrieval gate + replay tests.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Profile overwrite | last-write-wins / confirmation | confirmation for sensitive keys | trust and safety |
| Summary cadence | fixed every N turns / adaptive | adaptive on token pressure | quality + efficiency |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | policy correctness | write gate thresholds |
| Integration | full session lifecycle | ingest -> retrieve -> assemble |
| Edge Case | safety and conflicts | contradictory preferences |
6.2 Critical Test Cases
- Explicit preference statement must persist.
- Low-confidence inferred preference must not persist.
- Contradictory profile update triggers policy-defined resolution.
6.3 Test Data
Use fixed transcript fixtures with expected memory snapshots per turn.
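A replay test can compare the expected snapshot for each turn against what the manager actually produced. The sketch below assumes a simplified snapshot shape (record id mapped to content); the helper names are hypothetical.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # record id -> content (simplified for the sketch)


def diff_snapshot(expected: Snapshot, actual: Snapshot) -> Dict[str, dict]:
    """Describe how `actual` deviates from `expected` for one turn."""
    return {
        "missing": {k: expected[k] for k in expected.keys() - actual.keys()},
        "unexpected": {k: actual[k] for k in actual.keys() - expected.keys()},
        "changed": {
            k: {"expected": expected[k], "actual": actual[k]}
            for k in expected.keys() & actual.keys()
            if expected[k] != actual[k]
        },
    }


def first_divergence(
    expected: List[Snapshot], actual: List[Snapshot]
) -> Optional[int]:
    """Return the 1-based turn number of the first mismatch, or None."""
    for turn, (exp, act) in enumerate(zip(expected, actual), start=1):
        if exp != act:
            return turn
    if len(expected) != len(actual):
        # One replay ended early; the first unmatched turn is the divergence.
        return min(len(expected), len(actual)) + 1
    return None
```

Reporting the first diverging turn keeps failures small: you inspect one snapshot diff rather than the whole session.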
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Ungated writes | memory pollution | strict write gate |
| Summary recursion | detail loss | periodic refresh from raw turns |
| Retrieval overreach | repetitive responses | query-aware retrieval gate |
7.2 Debugging Strategies
- Replay transcript and diff memory snapshot by turn.
- Compare raw-turn facts to summary facts for drift detection.
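The second strategy can be made concrete once facts are represented as key-value pairs. The sketch assumes fact extraction has already happened upstream and that both sides use the same keys; the scoring formula is an illustrative choice, not a standard metric.

```python
from typing import Dict, List, Union

Report = Dict[str, Union[List[str], float]]


def drift_report(raw_facts: Dict[str, str], summary_facts: Dict[str, str]) -> Report:
    """Compare facts extracted from raw turns with facts from a summary."""
    missing = sorted(k for k in raw_facts if k not in summary_facts)
    contradicted = sorted(
        k for k in raw_facts
        if k in summary_facts and summary_facts[k] != raw_facts[k]
    )
    # Claims in the summary that no raw turn supports.
    unsupported = sorted(k for k in summary_facts if k not in raw_facts)
    total = len(raw_facts)
    # Assumed score: share of raw facts that were lost or altered.
    score = (len(missing) + len(contradicted)) / total if total else 0.0
    return {
        "missing": missing,
        "contradicted": contradicted,
        "unsupported": unsupported,
        "drift_score": score,
    }
```

Tracking the score per summary version shows whether repeated compression is losing detail over time.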
7.3 Performance Traps
Global scans of all memory records per turn without indexing by session/type.
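With SQLite, a composite index on the lookup columns avoids the full scan. The table layout below is a sketch: `session_id` is an assumed column that does not appear in the record schema in this document, and `idx_memory_lookup` is a hypothetical index name.

```python
import sqlite3
from typing import List, Tuple

LOOKUP_SQL = (
    "SELECT id, content FROM memory_records "
    "WHERE tenant_id = ? AND session_id = ? AND memory_type = ?"
)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memory_records (
        id TEXT PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        session_id TEXT NOT NULL,   -- assumption: not in the documented schema
        memory_type TEXT NOT NULL,
        content TEXT NOT NULL,
        source_turn_id TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE INDEX idx_memory_lookup
        ON memory_records (tenant_id, session_id, memory_type);
    """
)


def fetch(
    db: sqlite3.Connection, tenant_id: str, session_id: str, memory_type: str
) -> List[Tuple[str, str]]:
    """Fetch records for one tenant, session, and memory type."""
    return db.execute(LOOKUP_SQL, (tenant_id, session_id, memory_type)).fetchall()


def query_plan(db: sqlite3.Connection) -> str:
    """Return SQLite's plan for the lookup, to check that the index is used."""
    rows = db.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("t", "s", "episodic")
    ).fetchall()
    return " | ".join(row[3] for row in rows)
```

Putting `tenant_id` first in the index also means every lookup is scoped to a tenant by construction, which supports the isolation requirement.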
8. Extensions & Challenges
8.1 Beginner Extensions
- Add memory export command.
- Add explainable write decision logs.
8.2 Intermediate Extensions
- Add contradiction detector with confidence weighting.
- Add profile key-level TTL policies.
8.3 Advanced Extensions
- Add privacy labels and purpose-based retrieval rules.
- Add online memory quality monitoring dashboard.
9. Real-World Connections
9.1 Industry Applications
- AI support agents.
- Personalized productivity assistants.
9.2 Related Open Source Projects
- LangGraph and memory-oriented agent frameworks.
9.3 Interview Relevance
This project demonstrates state design, policy controls, and reliability discipline for LLM systems.
10. Resources
10.1 Essential Reading
- MemGPT paper.
- RAG foundational paper.
10.2 Video Resources
- Architecture talks on conversational agents and memory safety.
10.3 Tools & Documentation
- SQLite or Postgres docs for schema and indexing.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain memory taxonomy and write gates.
- I can explain why summary drift happens.
11.2 Implementation
- Durable writes require provenance.
- Context assembly remains within budget.
11.3 Growth
- I can defend policy choices with concrete trade-offs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- typed stores + write gate + context assembly
Full Completion:
- conflict handling + summary drift checks + replay tests
Excellence (Going Above & Beyond):
- privacy-aware retrieval and quality telemetry dashboard