Project 2: Conversation Memory Manager

Build a policy-driven memory service that balances short-term context, summaries, and durable user facts without polluting memory.

Quick Reference

Attribute	Value
Difficulty	Level 2: Intermediate
Time Estimate	10-20 hours
Main Programming Language	Python
Alternative Programming Languages	TypeScript, Go
Coolness Level	Level 4
Business Potential	Level 3
Prerequisites	Project 1, basic DB design, API thinking
Key Topics	memory taxonomy, summarization, write gates, retrieval policy

1. Learning Objectives

By completing this project, you will:

Design memory types with explicit retention and retrieval rules.
Implement safe memory writes with provenance requirements.
Maintain context quality over long multi-turn conversations.
Detect summary drift and contradiction in stored user facts.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Memory Taxonomy and Write-Gate Policy

Fundamentals

Conversation memory is not one store. It is a set of stores with different responsibilities: a recent-turn buffer for immediate context, summary snapshots for compressed continuity, and durable profile memory for stable user preferences. Each store should have explicit rules for writes, reads, expiry, and conflict resolution. Without policy boundaries, memory quality decays and user trust drops.

Deep Dive into the concept

A frequent mistake in LLM applications is to store everything from every turn, then retrieve broadly. At first this feels “smart” because the assistant seems informed, but over time it leads to noise, contradiction, and latency. Good memory systems are selective. They separate what should be transient (small talk), what should be compressed (older task details), and what should be durable (confirmed preferences or identity facts).

The memory taxonomy usually starts with three classes:

episodic memory: recent events and task context;
semantic memory: stable knowledge snippets grounded in documents;
profile memory: persistent user attributes and preferences.

Each class needs a write gate. For example, profile memory should not be written from speculative model inferences. Durable writes should require explicit user confirmation or high-confidence extraction with clear provenance. Episodic memory can be written automatically but expires quickly. Semantic memory often originates from ingest pipelines, not live chat.
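The per-class rules above could be sketched as a single gate function. This is a minimal sketch, not a prescribed implementation: the `Candidate` record, its field names, and the 0.9 threshold are hypothetical, and rejecting all live-chat semantic writes is a simplification of "often originates from ingest pipelines".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold for "high-confidence extraction"; tune it against fixtures.
PROFILE_MIN_CONFIDENCE = 0.9


@dataclass(frozen=True)
class Candidate:
    memory_type: str               # "episodic", "semantic", or "profile"
    content: str
    confidence: float              # extractor confidence in [0, 1]
    source_turn_id: Optional[str]  # provenance: the turn the content came from
    user_confirmed: bool = False   # True when the user stated or confirmed it
    from_ingest: bool = False      # True when produced by a document ingest pipeline


def write_gate(c: Candidate) -> tuple[bool, str]:
    """Return (allowed, reason) for one memory candidate."""
    if c.memory_type == "episodic":
        return True, "episodic writes are automatic and expire quickly"
    if c.memory_type == "semantic":
        if c.from_ingest:
            return True, "semantic memory from ingest pipeline"
        return False, "semantic memory is not written from live chat"
    if c.memory_type == "profile":
        if c.source_turn_id is None:
            return False, "durable write without provenance"
        if c.user_confirmed:
            return True, "explicit user confirmation"
        if c.confidence >= PROFILE_MIN_CONFIDENCE:
            return True, "high-confidence extraction with provenance"
        return False, "speculative inference"
    return False, f"unknown memory type: {c.memory_type}"
```

Returning a reason string alongside the decision keeps every write explainable, which helps when inspecting why a durable fact was or was not stored.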

Summarization is a compression tool, not a truth source. If you repeatedly summarize summaries, detail loss accumulates. To mitigate this, keep summary snapshots versioned and linked to source turn ranges. Use periodic refreshes from original turns when feasible. Also keep extraction logs so you can inspect where a persistent fact came from.
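One way to avoid summarizing summaries is to rebuild each snapshot from the turns it is linked to. The sketch below assumes a hypothetical `summarize` callable (in practice an LLM call) and 1-based inclusive turn ranges; neither is specified by this project.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SummarySnapshot:
    id: str
    turn_range: tuple[int, int]  # inclusive range of source turns, 1-based (assumption)
    text: str
    version: int


def refresh_summary(
    previous: SummarySnapshot,
    raw_turns: list[str],
    summarize: Callable[[list[str]], str],
) -> SummarySnapshot:
    """Rebuild a summary from the original turns, not from the previous summary text."""
    start, end = previous.turn_range
    source = raw_turns[start - 1 : end]  # the turns this snapshot is linked to
    return SummarySnapshot(
        id=previous.id,
        turn_range=previous.turn_range,
        text=summarize(source),
        version=previous.version + 1,
    )
```

Because the previous summary text is never fed back into `summarize`, detail loss does not compound across versions, and the version counter makes each refresh auditable.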

Conflict resolution is another core policy. Users change preferences over time, and memory must reconcile contradictions. Last-write-wins may be simple but can preserve accidental inputs. A better approach combines recency, confidence, and confirmation. For high-impact preferences, ask a clarification question before overwriting durable memory.
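A resolution policy that combines recency, confidence, and confirmation might look like the following. Treat it as a sketch under stated assumptions: the `HIGH_IMPACT_KEYS` set, the `confirmed` flag, and the three outcome labels are hypothetical choices, not part of the project specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical list of keys where a wrong overwrite is costly.
HIGH_IMPACT_KEYS = frozenset({"language", "timezone"})


@dataclass(frozen=True)
class ProfileFact:
    key: str
    value: str
    confidence: float
    confirmed: bool       # True when the user explicitly stated or confirmed it
    updated_at: datetime


def resolve_conflict(existing: ProfileFact, incoming: ProfileFact) -> str:
    """Return 'keep', 'overwrite', or 'ask_user'."""
    if incoming.value == existing.value:
        return "keep"
    newer = incoming.updated_at > existing.updated_at
    if incoming.confirmed and newer:
        return "overwrite"  # a newer confirmed statement wins
    if incoming.key in HIGH_IMPACT_KEYS or existing.confirmed:
        return "ask_user"   # do not silently replace important or confirmed facts
    if newer and incoming.confidence >= existing.confidence:
        return "overwrite"
    return "keep"


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 2, 1, tzinfo=timezone.utc)
old = ProfileFact("response_style", "concise", 0.9, True, t0)
guess = ProfileFact("response_style", "detailed", 0.6, False, t1)
print(resolve_conflict(old, guess))  # ask_user: the existing fact was confirmed
```

Unlike last-write-wins, an accidental or inferred input cannot displace a confirmed fact without a clarification step.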

Retrieval should be context-aware. Not every turn requires pulling durable profile facts. Over-retrieval bloats context and can produce repetitive responses. A retrieval gate classifies the query type and fetches only relevant memory types. For example, a technical troubleshooting query may need semantic memory from docs plus recent episodic context, but not food preferences.
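A retrieval gate can be as simple as an intent lookup. In this sketch the intent names, the keyword patterns, and the intent-to-store mapping are all hypothetical; a real system would likely replace the regex heuristics with a trained or LLM-based classifier.

```python
import re

# Hypothetical mapping from query intent to the memory types worth fetching.
STORES_BY_INTENT = {
    "troubleshooting": ("semantic", "episodic"),
    "personal": ("profile", "episodic"),
    "chitchat": ("episodic",),
}

# Keyword heuristics standing in for a real intent classifier.
_TROUBLE = re.compile(r"\b(error|exception|crash|fails?|traceback|timeout)\b", re.I)
_PERSONAL = re.compile(r"\b(recommend|suggest|my usual|my preference|for me)\b", re.I)


def classify_intent(query: str) -> str:
    if _TROUBLE.search(query):
        return "troubleshooting"
    if _PERSONAL.search(query):
        return "personal"
    return "chitchat"


def retrieval_gate(query: str) -> tuple[str, ...]:
    """Return the memory types to fetch for this query."""
    return STORES_BY_INTENT[classify_intent(query)]


print(retrieval_gate("Why does the build crash with a timeout?"))
# ('semantic', 'episodic')
```

Keeping the mapping as data rather than branching logic makes the retrieval policy easy to review and to adjust when retrieval precision metrics change.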

Security and privacy policies must be first-class. Durable profile facts often contain sensitive information. Add labels, purpose constraints, and deletion workflows. In multi-tenant environments, isolation guarantees are mandatory. Provenance metadata should include source type, timestamp, and consent flags.

Finally, measure memory quality continuously. Useful metrics include contradiction rate in profile store, summary drift score, retrieval precision by memory type, and stale-memory hit rate. When metrics degrade, adjust write gate thresholds and summarization cadence before changing models.
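Two of these metrics are easy to compute from stored records. The definitions below are assumptions chosen for illustration: contradiction rate is taken as the share of profile keys holding more than one distinct value, and stale-memory hit rate as the share of retrieved records whose expiry has already passed.

```python
from collections import defaultdict
from typing import Optional


def contradiction_rate(facts: list[tuple[str, str]]) -> float:
    """facts is a list of (key, value) pairs currently active in the profile store."""
    values_by_key: dict[str, set[str]] = defaultdict(set)
    for key, value in facts:
        values_by_key[key].add(value)
    if not values_by_key:
        return 0.0
    conflicted = sum(1 for values in values_by_key.values() if len(values) > 1)
    return conflicted / len(values_by_key)


def stale_hit_rate(retrieved_expiries: list[Optional[float]], now: float) -> float:
    """Each entry is the expires_at timestamp of a retrieved record, or None if it never expires."""
    if not retrieved_expiries:
        return 0.0
    stale = sum(1 for exp in retrieved_expiries if exp is not None and exp <= now)
    return stale / len(retrieved_expiries)
```

Tracking both per replayed fixture gives a baseline to compare against before and after changing write-gate thresholds.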

How this fits into the projects

Primary concept in this project.
Reused in Project 4 and Project 6.

Definitions & key terms

Episodic memory: short-lived event memory.
Profile memory: durable user preference/identity facts.
Write gate: policy deciding whether memory is persisted.
Summary drift: meaning loss caused by repeated compression.

Mental model diagram (ASCII)

incoming turn
   |
   v
[classifier] -> episodic store (TTL)
      |       -> summary candidate
      |       -> profile candidate (write gate)
      v
[retrieval gate] -> assemble context for next response

How it works (step-by-step)

Classify each new turn into memory candidates.
Apply write-gate policy per memory type.
Update short-term buffer and optional summary snapshots.
Resolve profile conflicts using recency + confidence + confirmation.
At response time, retrieve only required memory types.

Invariants:

Durable writes always include provenance.
Sensitive profile data requires stricter write rules.
Memory retrieval respects token budget and relevance.
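The token-budget invariant can be enforced with a greedy assembler. This is a sketch, assuming a hypothetical `Retrieved` record with a precomputed relevance score and a crude word-count token estimate; a real implementation would use the target model's tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieved:
    memory_type: str
    text: str
    relevance: float  # higher means more relevant to the current query


def estimate_tokens(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(items: list[Retrieved], budget: int) -> tuple[list[Retrieved], int]:
    """Greedily pick the most relevant items that fit; return (chosen, tokens_used)."""
    chosen: list[Retrieved] = []
    used = 0
    for item in sorted(items, key=lambda r: r.relevance, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost > budget:
            continue  # skip this item; a smaller, less relevant one may still fit
        chosen.append(item)
        used += cost
    return chosen, used
```

Greedy selection is not optimal in the knapsack sense, but it is deterministic and never exceeds the budget, which matters more for replayable tests.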

Failure modes:

Profile pollution from unconfirmed inference.
Summary drift from repeated lossy compression.
Over-retrieval causing context bloat.

Minimal concrete example

turn: "I prefer concise answers."
classifier => profile_candidate
write_gate => allowed (explicit preference statement)
store => PRF-0008 with source_turn_id and timestamp
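The classification and storage steps of this trace could be sketched as follows. The regex, the candidate dictionary shape, and the in-memory `ProfileStore` are hypothetical; only the `PRF-` style identifier and the provenance fields mirror the example above.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches first-person statements such as "I prefer concise answers."
_EXPLICIT_PREF = re.compile(r"^\s*i\s+(prefer|like|want)\s+(?P<what>.+?)[.!]?\s*$", re.I)


def classify_turn(turn_id: str, text: str) -> Optional[dict]:
    """Return a profile candidate for explicit preference statements, else None."""
    match = _EXPLICIT_PREF.match(text)
    if match is None:
        return None
    return {
        "memory_type": "profile",
        "content": match.group("what"),
        "source_turn_id": turn_id,  # provenance
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


class ProfileStore:
    """In-memory stand-in for a durable profile store."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, candidate: dict) -> str:
        record_id = f"PRF-{len(self.records) + 1:04d}"
        self.records.append({"id": record_id, **candidate})
        return record_id
```

A pattern match on an explicit first-person statement is a deliberately conservative trigger: it misses many phrasings, but it rarely produces a durable write the user did not intend.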

Common misconceptions

“If memory exists, always include it.”
“Summaries are equivalent to raw history.”
“Last-write-wins is enough for profile conflicts.”

Check-your-understanding questions

Why should profile memory use stricter write gates?
How can summary drift be detected?
When should retrieval skip profile memory?

Check-your-understanding answers

Durable profile errors persist and harm trust.
Compare summary claims to original turn fixtures.
When query intent has no user-preference dependency.
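For the second answer, a fixture can list the facts that must survive summarization, and a check can report how many went missing. This sketch uses case-insensitive substring matching, which is a crude assumption; paraphrased facts would need semantic matching instead.

```python
def drift_score(required_facts: list[str], summary_text: str) -> float:
    """Fraction of fixture facts that no longer appear in the summary (0.0 means no drift)."""
    if not required_facts:
        return 0.0
    haystack = summary_text.casefold()
    missing = [fact for fact in required_facts if fact.casefold() not in haystack]
    return len(missing) / len(required_facts)


facts = ["postgres 15", "deadline friday"]
print(drift_score(facts, "User runs Postgres 15 in production."))  # 0.5
```

Running the same check against each summary version shows whether drift grows as snapshots are refreshed.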

Real-world applications

Customer support copilots with persistent customer context.
Personal assistants that adapt style while avoiding unsafe memory persistence.

Where you’ll apply it

This project directly.
Also used in: Project 4, Project 6.

References

MemGPT: https://arxiv.org/abs/2310.08560
RAG paper: https://arxiv.org/abs/2005.11401

Key insights

Memory quality depends more on write policy than storage volume.

Summary

You are designing controlled persistence for a stateless model, with explicit trade-offs.

Homework/Exercises to practice the concept

Write a policy table for episodic, summary, and profile stores.
Create a contradiction scenario and define conflict resolution behavior.

Solutions to the homework/exercises

Include write conditions, TTL, retrieval triggers, and privacy label.
Require confirmation before overriding high-confidence durable facts.

3. Project Specification

3.1 What You Will Build

A memory manager service with:

per-type memory stores,
policy-controlled persistence,
retrieval gate for context assembly,
trace logs for memory decisions.

3.2 Functional Requirements

Support episodic, summary, and profile memory types.
Persist memory with provenance metadata.
Apply write gate thresholds and rules.
Assemble context under token budget constraints.
Provide memory inspection and replay CLI commands.

3.3 Non-Functional Requirements

Performance: Retrieval and context assembly complete in under 120 ms at p95 for standard sessions.
Reliability: Deterministic replay for fixed transcript fixtures.
Security: Tenant-safe memory access boundaries.

3.4 Example Usage / Output

$ llm-memory session replay --fixture fixtures/session_alpha.json
[TURN 12] profile_write=ALLOW key=response_style value=concise
[TURN 18] summary_created=SUM-0003 turns=1..18
[TURN 19] context_assembled tokens=2960 stores=[episodic,summary,profile]

3.5 Data Formats / Schemas / Protocols

memory_record:
- id
- memory_type
- content
- confidence
- source_turn_id
- created_at
- expires_at (optional)
- tenant_id
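The schema above maps naturally onto a validated dataclass. This is one possible sketch: the allowed type set, the `[0, 1]` confidence range, and the `RecordRejected` exception are assumptions, and `expires_at` is moved last only because Python requires defaulted fields to follow required ones.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

MEMORY_TYPES = frozenset({"episodic", "summary", "profile"})


class RecordRejected(ValueError):
    """Raised when a record fails schema validation (hypothetical name)."""


@dataclass(frozen=True)
class MemoryRecord:
    id: str
    memory_type: str
    content: str
    confidence: float
    source_turn_id: str
    created_at: datetime
    tenant_id: str
    expires_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        if self.memory_type not in MEMORY_TYPES:
            raise RecordRejected(f"unknown memory_type: {self.memory_type}")
        if not self.source_turn_id:
            raise RecordRejected("required field source_turn_id missing")
        if not self.tenant_id:
            raise RecordRejected("required field tenant_id missing")
        if not 0.0 <= self.confidence <= 1.0:
            raise RecordRejected("confidence must be within [0, 1]")


record = MemoryRecord(
    id="PRF-0008",
    memory_type="profile",
    content="response_style=concise",
    confidence=0.95,
    source_turn_id="t-12",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    tenant_id="tenant-a",
)
```

Validating in `__post_init__` means an invalid record cannot exist in memory at all, so every downstream component can rely on provenance being present.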

3.6 Edge Cases

Contradictory profile updates.
Missing provenance metadata.
Session with no valid summary candidates.
Retrieval request exceeding token budget.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

$ llm-memory session ingest --fixture fixtures/demo_conversation.json
$ llm-memory session inspect --session demo-001
$ llm-memory session assemble-context --session demo-001 --budget 3200

3.7.2 Golden Path Demo (Deterministic)

$ llm-memory session inspect --session golden
[PROFILE] PRF-0002 response_style=concise
[SUMMARY] SUM-0001 turns=1..14
[EPISODIC] 6 active entries
exit_code=0

3.7.3 Failure Demo (Deterministic)

$ llm-memory session ingest --fixture fixtures/missing_provenance.json
[ERROR] record rejected: required field source_turn_id missing
exit_code=3

4. Solution Architecture

4.1 High-Level Design

conversation stream -> classifier -> write gate -> typed stores -> retrieval gate -> context assembler

4.2 Key Components

Component	Responsibility	Key Decisions
Classifier	Identify memory candidates	Keep simple and deterministic
Write Gate	Approve/reject persistence	Confidence + consent rules
Store Layer	Persist typed records	Separate TTL and indexing by type
Retrieval Gate	Choose what to fetch	Query-intent-aware

4.3 Data Structures (No Full Code)

ProfileFact{key,value,confidence,source,updated_at}
SummarySnapshot{id,turn_range,text,version}

4.4 Algorithm Overview

Parse incoming turn.
Extract memory candidates.
Apply write gate.
Persist accepted memory records.
Build next-turn context using retrieval gate.
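Steps 2 through 4 can be wired together with injected functions so that each stage stays independently testable. In this sketch the `extract` and `gate` callables, the dictionary-shaped candidate, and the trace line format are illustrative assumptions rather than the tool's actual log output.

```python
from typing import Callable, Iterable

Candidate = dict  # hypothetical shape: {"memory_type": str, "content": str, ...}


def process_turn(
    turn_no: int,
    text: str,
    extract: Callable[[str], Iterable[Candidate]],
    gate: Callable[[Candidate], tuple[bool, str]],
    store: list[Candidate],
    trace: list[str],
) -> None:
    """Extract candidates, apply the write gate, persist accepted ones, and log each decision."""
    for candidate in extract(text):
        allowed, reason = gate(candidate)
        decision = "ALLOW" if allowed else "REJECT"
        trace.append(
            f"[TURN {turn_no}] {candidate['memory_type']}_write={decision} reason={reason}"
        )
        if allowed:
            store.append(candidate)
```

Logging rejected candidates as well as accepted ones is what makes the trace useful for debugging memory pollution later.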

Complexity:

Time: O(n) per turn, where n is the number of memory candidates extracted from that turn.
Space: O(m) per session, where m is the number of memory records persisted over the session lifecycle.

5. Implementation Guide

5.1 Development Environment Setup

# initialize db, load fixture transcripts, run deterministic replay

5.2 Project Structure

p02-conversation-memory-manager/
  src/
    classifier
    write_gate
    stores
    retrieval_gate
    cli
  fixtures/
  tests/

5.3 The Core Question You’re Answering

“How can I preserve useful continuity without persisting low-quality or unsafe memory?”

5.4 Concepts You Must Understand First

Memory taxonomy.
Write gate policy.
Summary drift and conflict resolution.

5.5 Questions to Guide Your Design

Which facts deserve durable storage?
What proof is required before profile overwrite?

5.6 Thinking Exercise

Trace one preference from mention to storage to retrieval across two separate sessions.

5.7 The Interview Questions They’ll Ask

How do you classify memory types?
What should trigger durable writes?
How do you avoid profile contradictions?
How do you test memory quality over long sessions?
How do you implement memory deletion?

5.8 Hints in Layers

Hint 1: Start with strict schemas.
Hint 2: Add write gate before persistence.
Hint 3: Version summaries to detect drift.
Hint 4: Add replay-based regression tests.

5.9 Books That Will Help

Topic	Book	Chapter
Policy boundaries	Clean Architecture	Boundary and policy chapters
System-level quality	Fundamentals of Software Architecture	Quality attributes

5.10 Implementation Phases

Phase 1: typed stores + schema validation.
Phase 2: write gate + conflict resolution.
Phase 3: retrieval gate + replay tests.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Profile overwrite	last-write / confirmation	confirmation for sensitive keys	trust and safety
Summary cadence	fixed every N turns / adaptive	adaptive on token pressure	quality + efficiency

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit	policy correctness	write gate thresholds
Integration	full session lifecycle	ingest -> retrieve -> assemble
Edge Case	safety and conflicts	contradictory preferences

6.2 Critical Test Cases

Explicit preference statement must persist.
Low-confidence inferred preference must not persist.
Contradictory profile update triggers policy-defined resolution.

6.3 Test Data

Use fixed transcript fixtures with expected memory snapshots per turn.
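A replay harness for such fixtures could record one snapshot per turn and diff it against the expected state. The fixture shape and the `apply_turn` callback here are hypothetical; plug in whatever function advances the memory state in your implementation.

```python
import copy
from typing import Callable


def replay(turns: list[dict], apply_turn: Callable[[dict, dict], dict]) -> list[dict]:
    """Apply each turn in order and return the memory state captured after every turn."""
    state: dict = {}
    snapshots: list[dict] = []
    for turn in turns:
        state = apply_turn(state, turn)
        snapshots.append(copy.deepcopy(state))  # freeze the state as of this turn
    return snapshots


def snapshot_diff(expected: dict, actual: dict) -> dict:
    """Return {key: (expected_value, actual_value)} for every key that differs."""
    keys = expected.keys() | actual.keys()
    return {
        key: (expected.get(key), actual.get(key))
        for key in sorted(keys)
        if expected.get(key) != actual.get(key)
    }
```

An empty diff at every turn is the pass condition; a non-empty diff pinpoints the first turn at which memory diverged from the expected snapshot.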

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Ungated writes	memory pollution	strict write gate
Summary recursion	detail loss	periodic refresh from raw turns
Retrieval overreach	repetitive responses	query-aware retrieval gate

7.2 Debugging Strategies

Replay transcript and diff memory snapshot by turn.
Compare raw-turn facts to summary facts for drift detection.

7.3 Performance Traps

Global scans of all memory records per turn without indexing by session/type.
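With SQLite, a composite index lets per-turn lookups avoid a full table scan. The sketch below uses Python's standard `sqlite3` module; the table layout follows the `memory_record` fields, while the `session_id` column and the index name are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memory_record (
    id             TEXT PRIMARY KEY,
    tenant_id      TEXT NOT NULL,
    session_id     TEXT NOT NULL,   -- assumption: not listed in the record schema
    memory_type    TEXT NOT NULL,
    content        TEXT NOT NULL,
    confidence     REAL NOT NULL,
    source_turn_id TEXT NOT NULL,
    created_at     TEXT NOT NULL,
    expires_at     TEXT
);
CREATE INDEX idx_memory_lookup
    ON memory_record (tenant_id, session_id, memory_type);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def fetch(conn: sqlite3.Connection, tenant_id: str, session_id: str, memory_type: str):
    """Fetch records for one tenant, session, and memory type, oldest first."""
    cursor = conn.execute(
        "SELECT id, content FROM memory_record "
        "WHERE tenant_id = ? AND session_id = ? AND memory_type = ? "
        "ORDER BY created_at",
        (tenant_id, session_id, memory_type),
    )
    return cursor.fetchall()
```

Leading the index with `tenant_id` also supports the tenant-isolation requirement, since every query must name a tenant to use it. Running `EXPLAIN QUERY PLAN` on the query is a quick way to confirm the index is actually used.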

8. Extensions & Challenges

8.1 Beginner Extensions

Add memory export command.
Add explainable write decision logs.

8.2 Intermediate Extensions

Add contradiction detector with confidence weighting.
Add profile key-level TTL policies.

8.3 Advanced Extensions

Add privacy labels and purpose-based retrieval rules.
Add online memory quality monitoring dashboard.

9. Real-World Connections

9.1 Industry Applications

AI support agents.
Personalized productivity assistants.

9.2 Related Open Source Projects

LangGraph and memory-oriented agent frameworks.

9.3 Interview Relevance

This project demonstrates state design, policy controls, and reliability discipline for LLM systems.

10. Resources

10.1 Essential Reading

MemGPT paper.
RAG foundational paper.

10.2 Video Resources

Architecture talks on conversational agents and memory safety.

10.3 Tools & Documentation

SQLite or Postgres docs for schema and indexing.

11. Self-Assessment Checklist

11.1 Understanding

I can explain memory taxonomy and write gates.
I can explain why summary drift happens.

11.2 Implementation

Durable writes require provenance.
Context assembly remains within budget.

11.3 Growth

I can defend policy choices with concrete trade-offs.

12. Submission / Completion Criteria

Minimum Viable Completion:

typed stores + write gate + context assembly

Full Completion:

conflict handling + summary drift checks + replay tests

Excellence (Going Above & Beyond):

privacy-aware retrieval and quality telemetry dashboard