Project 2: Conversation Episode Store

Build a storage system that captures conversations as “episodes” with embeddings, timestamps, and metadata—the foundation for episodic memory in AI agents.

Quick Reference

Attribute      Value
Difficulty     Level 2: Intermediate
Time Estimate  1 week (15-20 hours)
Language       Python (alternatives: TypeScript, Go)
Prerequisites  Project 1 (graph basics), understanding of vector embeddings, async Python
Key Topics     Episodic memory, vector embeddings, chunking strategies, time-series data, hybrid storage

1. Learning Objectives

By completing this project, you will:

  1. Understand the difference between episodic and semantic memory in AI systems.
  2. Design a storage schema that captures both raw conversations and their embeddings.
  3. Implement efficient chunking strategies for conversation data.
  4. Build a retrieval system that combines recency and semantic similarity.
  5. Create the foundation for temporal queries (“what did we discuss last week?”).

2. Theoretical Foundation

2.1 Core Concepts

  • Episodic Memory: Stores specific events/experiences with temporal context. In AI, this means raw conversation turns with timestamps, not just extracted facts.

  • Vector Embeddings: Dense numerical representations of text that capture semantic meaning. Similar texts map to nearby vectors in embedding space (see the snippet after this list).

  • Chunking Strategy: How you split conversations affects retrieval quality. Options: by turn, by time window, by topic shift, by token count.

  • Hybrid Storage: Combining graph (relationships), vector (semantics), and traditional (metadata) storage for complete memory.
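To make the embedding idea concrete, here is a tiny similarity check using sentence-transformers (the same model as in section 5.1); exact scores vary by model, but the related pair should score clearly higher:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
texts = [
    "How do I connect to Neo4j?",            # anchor
    "What is the database connection URI?",  # related
    "My cat likes sunny windowsills.",       # unrelated
]
vecs = model.encode(texts)

# Cosine similarity: related texts land close together in embedding space
print(util.cos_sim(vecs[0], vecs[1]).item())  # relatively high
print(util.cos_sim(vecs[0], vecs[2]).item())  # relatively low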

2.2 Why This Matters

Raw conversations contain nuance that extracted facts lose:

  • Tone and sentiment (“user was frustrated”)
  • Context (“this was discussed after the meeting”)
  • Uncertainty (“user mentioned they might change their mind”)

Episodic memory preserves this richness for later retrieval and synthesis.

2.3 Common Misconceptions

  • “Just store everything in vectors.” Vectors alone lose structure, time, and exact wording.
  • “Chunking doesn’t matter.” Poor chunking fragments context and hurts retrieval.
  • “Recency is enough.” Users often ask about semantically relevant old conversations.

2.4 ASCII Diagram: Episode Structure

CONVERSATION SESSION
====================

┌────────────────────────────────────────────────────────┐
│ Session: sess_001                                      │
│ Started: 2024-12-15T10:00:00Z                          │
│ User: user_123                                         │
└────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Episode 1       │   │ Episode 2       │   │ Episode 3       │
│ Turns: 1-5      │──▶│ Turns: 6-12     │──▶│ Turns: 13-18    │
│ Time: 10:00-02  │   │ Time: 10:03-08  │   │ Time: 10:09-15  │
│ Topic: greeting │   │ Topic: API help │   │ Topic: debugging│
│                 │   │                 │   │                 │
│ ┌─────────────┐ │   │ ┌─────────────┐ │   │ ┌─────────────┐ │
│ │ Embedding   │ │   │ │ Embedding   │ │   │ │ Embedding   │ │
│ │ [0.12, ...] │ │   │ │ [0.45, ...] │ │   │ │ [0.78, ...] │ │
│ └─────────────┘ │   │ └─────────────┘ │   │ └─────────────┘ │
└─────────────────┘   └─────────────────┘   └─────────────────┘

3. Project Specification

3.1 What You Will Build

A Python library and CLI for storing and retrieving conversation episodes:

  • Store conversations as episodes with embeddings
  • Retrieve by recency, semantic similarity, or both
  • Support multiple users/sessions
  • Export episodes for analysis

3.2 Functional Requirements

  1. Store episode: store.add_episode(session_id, user_id, turns, metadata)
  2. Retrieve by recency: store.get_recent(user_id, limit=10)
  3. Retrieve by similarity: store.search(query, user_id, top_k=5)
  4. Hybrid retrieval: store.retrieve(query, user_id, recency_weight=0.3)
  5. List sessions: store.get_sessions(user_id)
  6. Export session: store.export(session_id, format='json')
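The signatures above imply data shapes along these lines. This is a minimal sketch; the model names and fields are assumptions consistent with the example usage in section 3.4:

from datetime import datetime, timezone
from typing import Literal
from pydantic import BaseModel, Field

class Turn(BaseModel):
    role: Literal["user", "assistant"]
    content: str

class Episode(BaseModel):
    id: str
    session_id: str
    user_id: str
    turns: list[Turn]
    summary: str = ""
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict = Field(default_factory=dict)

class SearchResult(BaseModel):
    episode: Episode
    score: float  # similarity, optionally blended with recency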

3.3 Non-Functional Requirements

  • Latency: Search queries < 200 ms for 10K episodes (see the timing sketch after this list)
  • Scalability: Handle 100K+ episodes per user
  • Isolation: Strict user data separation
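One rough way to check the latency target is a timing harness run against a store seeded with 10K episodes; store.search is the method specified in section 3.2, and the query list and p95 cut are up to you:

import time

def p95_search_latency_ms(store, queries, user_id):
    """Return the approximate p95 latency, in ms, of store.search over queries."""
    samples = []
    for q in queries:
        t0 = time.perf_counter()
        store.search(q, user_id=user_id, top_k=5)
        samples.append((time.perf_counter() - t0) * 1000)
    samples.sort()
    return samples[int(0.95 * (len(samples) - 1))]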

3.4 Example Usage / Output

from episode_store import EpisodeStore

store = EpisodeStore()

# Store a conversation episode
episode = store.add_episode(
    session_id="sess_001",
    user_id="user_123",
    turns=[
        {"role": "user", "content": "How do I connect to Neo4j?"},
        {"role": "assistant", "content": "You can use the neo4j Python driver..."},
        {"role": "user", "content": "What about authentication?"},
        {"role": "assistant", "content": "Use the NEO4J_AUTH environment variable..."}
    ],
    metadata={"topic": "neo4j_setup", "sentiment": "neutral"}
)
print(f"Stored episode {episode.id} with {len(episode.turns)} turns")

# Retrieve by semantic similarity
results = store.search("database connection", user_id="user_123", top_k=3)
for r in results:
    print(f"[{r.score:.2f}] {r.episode.summary[:50]}...")

# Hybrid retrieval
results = store.retrieve(
    query="Neo4j authentication",
    user_id="user_123",
    recency_weight=0.3,
    top_k=5
)

CLI Example:

$ episode-store search "API rate limiting" --user user_123 --top-k 3

Found 3 relevant episodes:

1. [0.89] Session sess_045 (2024-12-10)
   Topic: API design discussion
   "...we talked about implementing rate limiting with Redis..."

2. [0.82] Session sess_032 (2024-11-28)
   Topic: Backend architecture
   "...rate limiting was mentioned as a future consideration..."

3. [0.76] Session sess_012 (2024-10-15)
   Topic: API security review
   "...discussed rate limiting as a security measure..."

4. Solution Architecture

4.1 High-Level Design

┌───────────────────┐
│  EpisodeStore API │
└─────────┬─────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐  ┌──────────┐
│ SQLite │  │ Vector   │
│ (meta) │  │ Store    │
└────────┘  │ (embed)  │
            └──────────┘

4.2 Key Components

Component      Responsibility                      Technology
EpisodeStore   Main API, orchestration             Python class
MetadataStore  Sessions, episodes, metadata        SQLite
VectorStore    Embeddings, similarity search       ChromaDB/FAISS
Embedder       Text → vector conversion            sentence-transformers
Chunker        Split conversations into episodes   Custom logic

4.3 Data Model

-- Sessions table
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    started_at TIMESTAMP,
    ended_at TIMESTAMP,
    metadata JSON
);

-- Episodes table
CREATE TABLE episodes (
    id TEXT PRIMARY KEY,
    session_id TEXT REFERENCES sessions(id),
    sequence_num INTEGER,
    content TEXT,  -- JSON array of turns
    summary TEXT,
    created_at TIMESTAMP,
    token_count INTEGER,
    metadata JSON
);

-- Vector store (in ChromaDB)
-- Collection: episodes
-- Documents: episode content
-- Embeddings: episode vectors
-- Metadata: episode_id, session_id, user_id, timestamp
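Below is a minimal sketch of the vector side, assuming the chromadb 0.4+ client API; the collection name and metadata keys mirror the comments above, and the IDs and document text are illustrative:

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_data")  # survives restarts
collection = client.get_or_create_collection(name="episodes")

text = "user: How do I connect to Neo4j?\nassistant: Use the neo4j Python driver..."
collection.add(
    ids=["ep_001"],
    documents=[text],
    embeddings=[model.encode(text).tolist()],
    metadatas=[{
        "episode_id": "ep_001",
        "session_id": "sess_001",
        "user_id": "user_123",
        "timestamp": "2024-12-15T10:00:00Z",
    }],
)

# Similarity search, scoped to one user for isolation
hits = collection.query(
    query_embeddings=[model.encode("database connection").tolist()],
    n_results=5,
    where={"user_id": "user_123"},
)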

4.4 Chunking Algorithm

Strategy: Sliding Window with Topic Detection (sketched below)

  1. Start with the first N turns
  2. Compute embedding similarity between consecutive turns
  3. If similarity drops below a threshold, close the current episode and start a new one
  4. Enforce minimum and maximum episode sizes
  5. Overlap episode edges by a turn for context continuity
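A minimal sketch of steps 2-5, assuming embed is any text-to-vector callable (e.g. model.encode from sentence-transformers); the threshold and size limits are illustrative and worth tuning:

import numpy as np

def chunk_turns(turns, embed, threshold=0.5, min_turns=2, max_turns=10):
    """Split a list of turn dicts into episodes at topic shifts."""
    if not turns:
        return []

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    vectors = [embed(t["content"]) for t in turns]
    episodes, current = [], [turns[0]]
    for prev_vec, cur_vec, turn in zip(vectors, vectors[1:], turns[1:]):
        topic_shift = cos(prev_vec, cur_vec) < threshold
        if (topic_shift and len(current) >= min_turns) or len(current) >= max_turns:
            episodes.append(current)
            current = [current[-1], turn]  # overlap one turn for continuity (step 5)
        else:
            current.append(turn)
    episodes.append(current)
    return episodes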

5. Implementation Guide

5.1 Development Environment Setup

# Create project
mkdir episode-store && cd episode-store
python -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install chromadb sentence-transformers sqlalchemy pydantic click

# Verify embedding model
python -c "from sentence_transformers import SentenceTransformer; m = SentenceTransformer('all-MiniLM-L6-v2'); print(m.encode('test').shape)"

5.2 Project Structure

episode-store/
├── src/
│   ├── __init__.py
│   ├── store.py         # Main EpisodeStore class
│   ├── metadata.py      # SQLite metadata storage
│   ├── vectors.py       # ChromaDB vector operations
│   ├── embedder.py      # Embedding generation
│   ├── chunker.py       # Conversation chunking
│   └── models.py        # Pydantic models
├── cli/
│   └── main.py          # Click CLI
├── tests/
└── README.md

5.3 Implementation Phases

Phase 1: Basic Storage (4-5h)

Goals:

  • Store episodes with metadata
  • Generate and store embeddings

Tasks:

  1. Set up SQLite schema for sessions/episodes
  2. Integrate sentence-transformers for embedding
  3. Set up ChromaDB for vector storage
  4. Implement the add_episode method (one possible shape is sketched below)

Checkpoint: Can store and retrieve an episode.
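One possible shape for add_episode, assuming the MetadataStore, VectorStore, and Embedder components from section 4.2. The wrapper methods (insert_episode, vectors.add) are assumptions, and a full implementation would return an Episode model rather than a bare id:

import json
import uuid
from datetime import datetime, timezone

class EpisodeStore:
    def __init__(self, metadata_store, vector_store, embedder):
        self.meta = metadata_store    # SQLite wrapper (metadata.py)
        self.vectors = vector_store   # ChromaDB wrapper (vectors.py)
        self.embedder = embedder      # sentence-transformers wrapper (embedder.py)

    def add_episode(self, session_id, user_id, turns, metadata=None):
        episode_id = f"ep_{uuid.uuid4().hex[:8]}"
        text = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
        # 1. Persist raw turns and metadata in SQLite
        self.meta.insert_episode(
            id=episode_id,
            session_id=session_id,
            content=json.dumps(turns),
            created_at=datetime.now(timezone.utc),
            metadata=json.dumps(metadata or {}),
        )
        # 2. Persist the embedding, tagged with user_id for isolation
        self.vectors.add(
            id=episode_id,
            text=text,
            embedding=self.embedder.encode(text),
            metadata={"session_id": session_id, "user_id": user_id},
        )
        return episode_id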

Phase 2: Retrieval Methods (4-5h)

Goals:

  • Implement similarity search
  • Implement recency-based retrieval
  • Combine into hybrid retrieval

Tasks:

  1. Implement search with vector similarity
  2. Implement get_recent with timestamp ordering
  3. Implement hybrid fusion: a weighted similarity/recency blend driven by recency_weight (or reciprocal rank fusion as an alternative); a sketch follows the checkpoint
  4. Add user isolation to all queries

Checkpoint: All three retrieval methods working.
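For the fusion step, here is a minimal weighted-score sketch matching the recency_weight parameter from section 3.2; the exponential decay and the 30-day half-life are assumptions to tune:

from datetime import datetime, timezone

def hybrid_score(similarity, created_at, recency_weight=0.3, half_life_days=30.0):
    """Blend semantic similarity with an exponentially decaying recency score.

    similarity: cosine similarity in [0, 1]; created_at must be timezone-aware.
    """
    age_days = (datetime.now(timezone.utc) - created_at).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 now, 0.5 after one half-life
    return (1 - recency_weight) * similarity + recency_weight * recency

# Rank vector-search candidates by the blended score:
# ranked = sorted(hits, key=lambda h: hybrid_score(h.score, h.created_at), reverse=True)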

Phase 3: Chunking and Polish (4-5h)

Goals:

  • Intelligent episode chunking
  • CLI interface
  • Export functionality

Tasks:

  1. Implement topic-based chunking
  2. Build the Click CLI (skeleton sketched after the checkpoint)
  3. Add JSON/CSV export
  4. Write documentation

Checkpoint: Full functionality with CLI.
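A skeleton for the Click CLI, matching the command shown in section 3.4; the option names and output format are assumptions:

import click
from episode_store import EpisodeStore

@click.group()
def cli():
    """episode-store command-line interface."""

@cli.command()
@click.argument("query")
@click.option("--user", "user_id", required=True, help="User whose episodes to search.")
@click.option("--top-k", default=3, show_default=True)
def search(query, user_id, top_k):
    """Search episodes by semantic similarity."""
    store = EpisodeStore()
    results = store.search(query, user_id=user_id, top_k=top_k)
    click.echo(f"Found {len(results)} relevant episodes:\n")
    for i, r in enumerate(results, 1):
        click.echo(f"{i}. [{r.score:.2f}] Session {r.episode.session_id}")

if __name__ == "__main__":
    cli()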


6. Testing Strategy

6.1 Test Categories

Category     Purpose                     Examples
Unit         Test individual components  Chunker logic, embedding generation
Integration  Test storage operations     CRUD with both stores
Retrieval    Test search quality         Precision/recall on test queries

6.2 Critical Test Cases

  1. User isolation: User A cannot see User B’s episodes (a pytest sketch follows this list)
  2. Embedding consistency: Same text produces same embedding
  3. Hybrid ranking: Recency weight affects final ordering
  4. Chunking boundaries: Topics are correctly separated
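A sketch of the first case, assuming the EpisodeStore API from section 3.2; the data_dir constructor argument is hypothetical:

import pytest
from episode_store import EpisodeStore

@pytest.fixture
def store(tmp_path):
    # Hypothetical constructor argument for a throwaway on-disk store
    return EpisodeStore(data_dir=tmp_path)

def test_user_isolation(store):
    store.add_episode(
        session_id="sess_a", user_id="user_a",
        turns=[{"role": "user", "content": "secret project details"}],
    )
    # User B searching for User A's content must get nothing back
    results = store.search("secret project", user_id="user_b", top_k=5)
    assert results == []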

7. Common Pitfalls & Debugging

Pitfall                    Symptom                Solution
Embedding model too large  Slow startup, OOM      Use a smaller model (e.g. all-MiniLM-L6-v2)
ChromaDB not persisted     Data lost on restart   Configure persistence (PersistentClient path)
Chunk size too small       Fragmented context     Increase the minimum chunk size
Missing user filter        Cross-user data leaks  Filter by user_id in every query

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add episode summarization using an LLM
  • Add sentiment analysis per episode
  • Implement session timeline visualization

8.2 Intermediate Extensions

  • Add reranking with cross-encoder
  • Implement streaming episode ingestion
  • Add episode clustering for topic discovery

8.3 Advanced Extensions

  • Multi-modal episodes (text + images)
  • Distributed vector storage (Qdrant, Pinecone)
  • Real-time episode updates with CDC

9. Real-World Connections

9.1 Industry Applications

  • ChatGPT Memory: Stores conversation context across sessions
  • Zep: Episodic memory layer for LLM applications
  • Customer Support AI: Maintains conversation history per user

9.2 Interview Relevance

  • Explain the tradeoff between RAG and long context
  • Discuss embedding model selection criteria
  • Describe hybrid retrieval strategies

10. Resources

10.1 Essential Reading

  • “AI Engineering” by Chip Huyen — Ch. on RAG and Memory
  • Sentence Transformers Documentation — Embedding best practices
  • ChromaDB Documentation — Vector storage patterns
  • Previous: Project 1 (Personal Memory Graph)
  • Next: Project 3 (Entity Extraction Pipeline)

11. Self-Assessment Checklist

  • I can explain episodic vs. semantic memory
  • I understand how vector embeddings capture meaning
  • I can implement hybrid retrieval with weighted fusion
  • I know when to chunk by turns vs. by topic

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Store and retrieve episodes with embeddings
  • Semantic similarity search working
  • User isolation enforced

Full Completion:

  • Hybrid retrieval with configurable weights
  • Intelligent chunking strategy
  • CLI with search and export

Excellence:

  • Episode summarization
  • Topic clustering
  • Performance benchmarks