Sprint: Temporal Knowledge Graph AI Agent Memory Mastery - Real World Projects
Goal: Deeply understand how to build, query, and reason over temporal knowledge graphs for AI agent memory systems. You will learn why traditional RAG fails for dynamic, multi-session agent interactions, how episodic and semantic memory mirror human cognition, why bi-temporal data models enable point-in-time reasoning, and how frameworks like Zep/Graphiti, Mem0, and MemGPT implement these concepts. By the end, you will be able to architect production-ready memory systems that enable agents to remember, reason, and evolve over time.
Introduction
What is Temporal Knowledge Graph Memory?
A Temporal Knowledge Graph (TKG) is a graph-based data structure where nodes represent entities (people, concepts, events), edges represent relationships between them, and every edge carries explicit timestamps indicating when that relationship was valid. When applied to AI agent memory, TKGs become the “external brain” that allows agents to:
- Remember facts, preferences, and interactions across sessions
- Reason about how knowledge evolved over time (“What did the user prefer last month?”)
- Update beliefs when new information contradicts old facts
- Collaborate by sharing a persistent knowledge substrate with other agents
TRADITIONAL RAG vs TEMPORAL KG MEMORY
┌─────────────────────────────────────────────────────────────┐
│ TRADITIONAL RAG │
│ │
│ Documents ──► Chunk ──► Embed ──► Vector DB ──► Retrieve │
│ │
│ Problems: │
│ • No temporal awareness (when was this true?) │
│ • No entity relationships (who is connected to whom?) │
│ • No contradiction handling (old vs new facts) │
│ • Static after indexing (requires re-embedding) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ TEMPORAL KNOWLEDGE GRAPH │
│ │
│ ┌─────────┐ │
│ │ Alice │ │
│ │(Person) │ │
│ └────┬────┘ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────┐ ┌────────────────┐ │
│ │ WORKS_AT │ │ LIKES │ │ MANAGES │ │
│ │ Acme Corp │ │ Python │ │ Bob │ │
│ │ │ │ │ │ │ │
│ │ valid: │ │ valid: │ │ valid: │ │
│ │ 2023-01-01 │ │ 2020-* │ │ 2024-03-15 │ │
│ │ to: * │ │ to: * │ │ to: * │ │
│ │ │ │ │ │ │ │
│ │ ingested: │ │ ingested:│ │ ingested: │ │
│ │ 2024-06-01 │ │ 2024-01 │ │ 2024-03-16 │ │
│ └─────────────┘ └──────────┘ └────────────────┘ │
│ │
│ Capabilities: │
│ ✓ Bi-temporal: event time + ingestion time │
│ ✓ Entity relationships with semantic meaning │
│ ✓ Contradiction detection and resolution │
│ ✓ Real-time incremental updates │
│ ✓ Multi-hop reasoning ("Alice's manager's projects") │
└─────────────────────────────────────────────────────────────┘
What Problem Does It Solve Today?
AI agents increasingly need to operate over extended time horizons—days, weeks, or months of interaction with users. The LLM’s context window (even at 128K or 1M tokens) cannot hold everything. Traditional RAG retrieves semantically similar documents but fails at:
- Temporal queries: “What did we decide about the API design before the refactor?”
- Entity tracking: “Which projects is Sarah working on now vs. six months ago?”
- Contradiction handling: “The user said they prefer Python, but recently mentioned switching to Rust”
- Cross-session synthesis: “Summarize everything we discussed about authentication across our 12 sessions”
Temporal Knowledge Graphs solve these problems by making time and relationships first-class citizens in the memory architecture.
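As a minimal sketch of what treating time as a first-class citizen can look like, the snippet below stores each fact with a validity interval and answers a point-in-time question. The `TemporalFact` class, the `facts_as_of` helper, the dates, and the half-open interval convention (`valid_from` inclusive, `valid_to` exclusive) are illustrative assumptions, not the API of any framework covered later.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: date           # when the fact became true in the world
    valid_to: Optional[date]   # None means "still true"


def facts_as_of(facts: List[TemporalFact], subject: str, predicate: str, when: date) -> List[TemporalFact]:
    """Return the facts about (subject, predicate) that were valid on `when`."""
    return [
        f for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
    ]


# Illustrative data: the user preferred Python, then switched to Rust.
facts = [
    TemporalFact("user", "PREFERS", "Python", date(2020, 1, 1), date(2024, 11, 1)),
    TemporalFact("user", "PREFERS", "Rust", date(2024, 11, 1), None),
]

last_month = [f.obj for f in facts_as_of(facts, "user", "PREFERS", date(2024, 10, 15))]
today = [f.obj for f in facts_as_of(facts, "user", "PREFERS", date(2024, 12, 1))]
print(last_month, today)  # ['Python'] ['Rust']
```

A plain vector store would return both statements as similar text; the interval check is what lets the agent pick the one that was true at the time being asked about.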
What Will You Build Across the Projects?
You will build 15 progressively complex projects:
| Phase | Projects | What You Learn |
|---|---|---|
| Foundation | 1-4 | Graph fundamentals, entity extraction, basic memory |
| Temporal | 5-8 | Bi-temporal models, time-aware queries, versioning |
| Frameworks | 9-12 | Zep/Graphiti, Mem0, MemGPT integration |
| Production | 13-15 | Hybrid retrieval, benchmarking, multi-agent memory |
What Is In Scope vs Out of Scope?
In Scope:
- Graph database fundamentals (Neo4j, FalkorDB)
- Temporal data modeling (bi-temporal, event sourcing)
- Entity and relationship extraction with LLMs
- Memory frameworks (Zep/Graphiti, Mem0, MemGPT, LangGraph)
- Hybrid retrieval (semantic + graph + keyword)
- Benchmarking (DMR, LongMemEval)
- Multi-agent memory sharing
Out of Scope:
- General LLM fine-tuning (covered elsewhere)
- Vector database internals (you should already understand embeddings)
- Distributed graph databases at massive scale (Dgraph, TigerGraph)
- Real-time streaming architectures (Kafka, Flink)
How to Use This Guide
Reading Order
- Read the Theory Primer first (Chapters 1-6). This is your mini-book on temporal knowledge graphs. Each chapter builds on the previous.
- Run the Quick Start within 48 hours. Get a working memory system up before diving deep.
- Pick a Learning Path based on your background:
- Path A (Systems Engineer): Projects 1, 2, 5, 9, 13
- Path B (ML/AI Engineer): Projects 3, 4, 7, 10, 14
- Path C (Full Stack): Projects 1, 3, 6, 11, 15
- Validate with Definition of Done. Every project has explicit success criteria. Don’t move on until you’ve met them.
How to Learn Effectively
- Build first, read second: Start each project by attempting it, then read hints when stuck
- Draw diagrams: Before coding, draw the graph schema on paper
- Test with edge cases: What happens with contradictions? Time boundaries? Missing entities?
- Benchmark everything: Measure latency, accuracy, and token usage
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
Programming Skills:
- Intermediate Python (async/await, type hints, dataclasses)
- Basic SQL (SELECT, JOIN, WHERE with temporal predicates)
- Comfort with REST APIs and JSON
Recommended Reading: “Fluent Python” by Luciano Ramalho - Ch. 17-21 (Concurrency)
Domain Fundamentals:
- Understanding of embeddings and vector similarity (cosine, dot product)
- Basic knowledge of LLM APIs (OpenAI, Anthropic)
- Familiarity with graph concepts (nodes, edges, traversal)
Recommended Reading: “Graph Databases” by Ian Robinson et al. - Ch. 1-3
Tool Proficiency:
- Docker (for running Neo4j, databases)
- Git and basic CLI operations
- Jupyter notebooks or similar interactive environment
Helpful But Not Required
- Graph query languages (Cypher, Gremlin) - Learn during Projects 1-2
- LangChain/LlamaIndex basics - Learn during Projects 9-12
- Database administration - Learn as needed during Production projects
Self-Assessment Questions
Before starting, you should be able to answer these:
- Embeddings: “If I have two text chunks with cosine similarity of 0.95, what does that tell me? What does it NOT tell me?”
- Graphs: “What is the difference between a directed and undirected graph? When would you use each?”
- Temporal Logic: “If event A happened at time T1 and event B at time T2 where T1 < T2, and I query at time T3 > T2, how many facts should be valid?”
- LLM Context: “Why can’t we just put all conversation history in the context window? What are the failure modes?”
- Consistency: “If a user says ‘I prefer Python’ in session 1 and ‘I’ve switched to Rust’ in session 5, what should the agent believe in session 6?”
If you struggle with questions 1-3, review embeddings and graph basics first. If you struggle with questions 4-5, you’re in the right place—this guide will teach you.
Development Environment Setup
Required Tools:
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Primary language |
| Neo4j | 5.x | Graph database |
| Docker | 24.x+ | Container runtime |
| pip/poetry | Latest | Package management |
Required Python Packages:
neo4j>=5.0.0
graphiti-core>=0.5.0
mem0ai>=0.1.0
openai>=1.0.0
langchain>=0.1.0
langgraph>=0.2.0
numpy>=1.24.0
Recommended Tools:
| Tool | Purpose |
|---|---|
| Neo4j Browser | Visual graph exploration |
| Postman/httpie | API testing |
| Jupyter | Interactive experimentation |
Testing Your Setup:
# Start Neo4j
$ docker run -d --name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
neo4j:5-community
# Verify connection
$ python -c "from neo4j import GraphDatabase; \
d = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password')); \
d.verify_connectivity(); print('Connected!')"
Connected!
# Test Graphiti installation
$ python -c "from graphiti_core import Graphiti; print('Graphiti ready!')"
Graphiti ready!
Time Investment
| Project Category | Time Per Project | Cumulative |
|---|---|---|
| Foundation (1-4) | 8-12 hours each | ~40 hours |
| Temporal (5-8) | 12-16 hours each | ~100 hours |
| Frameworks (9-12) | 10-14 hours each | ~150 hours |
| Production (13-15) | 16-24 hours each | ~210 hours |
Total Sprint: 3-5 months at 10-15 hours/week
Important Reality Check
Temporal Knowledge Graphs are not a silver bullet. They add complexity that is only justified when:
- Your agent operates across multiple sessions over days/weeks
- You need to answer temporal queries (“What changed?”, “When did X happen?”)
- Entities and their relationships matter (not just document retrieval)
- You need to handle contradictions and knowledge evolution
If you only need simple Q&A over static documents, traditional RAG is simpler and sufficient. This guide is for when RAG isn’t enough.
Big Picture / Mental Model
The Memory Stack
Think of agent memory as a layered stack, similar to computer memory hierarchy:
┌─────────────────────────────────────────────────────────────────┐
│ MEMORY HIERARCHY │
│ (Fastest → Slowest, Smallest → Largest) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LLM CONTEXT WINDOW │ │
│ │ (Main Context / "RAM") │ │
│ │ │ │
│ │ • System prompt + instructions │ │
│ │ • Recent conversation turns (FIFO queue) │ │
│ │ • Working memory scratchpad │ │
│ │ • Retrieved facts from lower layers │ │
│ │ │ │
│ │ Size: 8K - 200K tokens │ │
│ │ Latency: 0ms (always in context) │ │
│ │ Persistence: None (per-request only) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Retrieve / Evict │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ SEMANTIC ENTITY SUBGRAPH │ │
│ │ (Core Memory / "L2 Cache") │ │
│ │ │ │
│ │ • Extracted entities (people, places, concepts) │ │
│ │ • Relationships with validity intervals │ │
│ │ • Compressed summaries of key facts │ │
│ │ • User persona and preferences │ │
│ │ │ │
│ │ Size: 10K - 100K facts │ │
│ │ Latency: 50-300ms (graph traversal + embedding search) │ │
│ │ Persistence: Graph database (Neo4j, FalkorDB) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Extract / Consolidate │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ EPISODIC MEMORY SUBGRAPH │ │
│ │ (Recall Memory / "Disk") │ │
│ │ │ │
│ │ • Raw conversation episodes (full text) │ │
│ │ • Timestamps and session metadata │ │
│ │ • Source attribution for traceability │ │
│ │ • Links to extracted entities │ │
│ │ │ │
│ │ Size: Unlimited (append-only log) │ │
│ │ Latency: 100-500ms (search + fetch) │ │
│ │ Persistence: Vector DB + Graph DB │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Summarize / Index │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ COMMUNITY SUBGRAPH │ │
│ │ (Archival Memory / "Cold Storage") │ │
│ │ │ │
│ │ • High-level domain summaries │ │
│ │ • Community clusters of related entities │ │
│ │ • Global statistics and patterns │ │
│ │ • Cross-session synthesis │ │
│ │ │ │
│ │ Size: Compressed summaries of entity subgraph │ │
│ │ Latency: 200-1000ms (LLM summarization if not cached) │ │
│ │ Persistence: Graph DB + Pre-computed summaries │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
The Data Flow
When a user message arrives, here’s how a temporal KG memory system processes it:
┌─────────────────────────────────────────────────────────────────┐
│ MEMORY WRITE PATH │
│ (User Message → Knowledge Graph) │
└─────────────────────────────────────────────────────────────────┘
User Message: "I just got promoted to VP at TechCorp"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 1. EPISODE CREATION │
│ • Create Episode node with timestamp, session_id │
│ • Store raw text as episode content │
│ • Generate embedding for semantic search │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 2. ENTITY EXTRACTION (LLM Call) │
│ • Identify entities: [User, VP, TechCorp] │
│ • Extract relationships: │
│ - User --[HAS_TITLE]--> VP │
│ - User --[WORKS_AT]--> TechCorp │
│ • Assign confidence scores │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 3. ENTITY RESOLUTION │
│ • "User" → resolve to existing User entity │
│ • "TechCorp" → match existing or create new │
│ • Handle aliases: "TechCorp" = "Tech Corp Inc" │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 4. CONFLICT DETECTION │
│ • Check existing: User --[HAS_TITLE]--> "Manager" │
│ • Detect conflict: old title vs new title │
│ • Resolution: Invalidate old edge, create new edge │
│ - Old: valid_from=2023-01, valid_to=2024-12 (NOW) │
│ - New: valid_from=2024-12, valid_to=NULL (current) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 5. GRAPH UPDATE │
│ • Create/update nodes with embeddings │
│ • Create edges with bi-temporal timestamps │
│ • Update community memberships if needed │
│ • Trigger any downstream summarization │
└─────────────────────────────────────────────────────────────┘
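Steps 4 and 5 of the write path can be sketched in plain Python. This is a simplified illustration under stated assumptions: the `Edge` class, the `SINGLE_VALUED` set, and the rule "a new value for a single-valued predicate closes the old edge" are hypothetical choices for this example, and real frameworks typically involve an LLM in deciding what counts as a contradiction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                     # event time: when the fact became true
    valid_to: Optional[datetime] = None      # None = currently valid
    ingested_at: Optional[datetime] = None   # ingestion time: when the system learned it


# Hypothetical rule: these predicates hold only one value at a time.
SINGLE_VALUED = {"HAS_TITLE"}


def upsert_fact(edges: List[Edge], new: Edge) -> List[Edge]:
    """Close any conflicting open edge, append the new one, return the closed edges."""
    invalidated = []
    for edge in edges:
        conflicts = (
            edge.subject == new.subject
            and edge.predicate == new.predicate
            and edge.predicate in SINGLE_VALUED
            and edge.valid_to is None
            and edge.obj != new.obj
        )
        if conflicts:
            edge.valid_to = new.valid_from   # old fact ends where the new one begins
            invalidated.append(edge)
    edges.append(new)                        # history is kept, nothing is deleted
    return invalidated


edges = [Edge("user", "HAS_TITLE", "Manager", datetime(2023, 1, 1), ingested_at=datetime(2023, 1, 5))]
closed = upsert_fact(
    edges,
    Edge("user", "HAS_TITLE", "VP", datetime(2024, 12, 1), ingested_at=datetime(2024, 12, 2)),
)
current = [e.obj for e in edges if e.predicate == "HAS_TITLE" and e.valid_to is None]
print(current)  # ['VP']
```

Because the old edge is closed rather than deleted, a later query such as "what was the user's title in mid-2024?" can still be answered from the same graph.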
┌─────────────────────────────────────────────────────────────────┐
│ MEMORY READ PATH │
│ (Agent Query → Retrieved Context) │
└─────────────────────────────────────────────────────────────────┘
Agent needs: "What is the user's current role and company?"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 1. QUERY UNDERSTANDING │
│ • Parse intent: entity lookup (User, role, company) │
│ • Identify temporal scope: "current" = valid_to IS NULL │
│ • Plan retrieval strategy │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 2. HYBRID RETRIEVAL (Parallel) │
│ ┌────────────────┬────────────────┬────────────────┐ │
│ │ Semantic Search│ Graph Traversal│ BM25 Keyword │ │
│ │ (embeddings) │ (Cypher query) │ (text match) │ │
│ │ │ │ │ │
│ │ Top-k similar │ MATCH (u:User) │ "role" "VP" │ │
│ │ episodes │ -[r:HAS_TITLE] │ "company" │ │
│ │ │ ->(t:Title) │ "TechCorp" │ │
│ │ │ WHERE r.valid │ │ │
│ └────────────────┴────────────────┴────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 3. RESULT FUSION & RERANKING │
│ • Combine results from all retrieval paths │
│ • Reciprocal Rank Fusion (RRF) or MMR │
│ • Episode-mentions reranking (graph-aware) │
│ • Filter by temporal validity │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 4. CONTEXT ASSEMBLY │
│ • Format retrieved facts for LLM context │
│ • Include source attribution │
│ • Respect token budget │
│ • Return: "User is VP at TechCorp (as of Dec 2024)" │
└─────────────────────────────────────────────────────────────┘
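The fusion step in the read path can be illustrated with Reciprocal Rank Fusion, which scores each item by summing 1 / (k + rank) over every ranked list it appears in. The sketch below is a generic implementation, not taken from any specific framework; the result identifiers are made up, and k = 60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists; items ranked highly in many lists come first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by item id for deterministic output.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result ids from the three retrieval paths.
semantic = ["ep_12", "fact_title_vp", "ep_7"]
graph = ["fact_title_vp", "fact_works_at"]
keyword = ["fact_works_at", "fact_title_vp", "ep_12"]

fused = reciprocal_rank_fusion([semantic, graph, keyword])
print(fused)  # ['fact_title_vp', 'fact_works_at', 'ep_12', 'ep_7']
```

RRF only needs ranks, not raw scores, which is why it is convenient for combining embedding similarity, graph hits, and BM25 results that live on incompatible scales.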
How Frameworks Fit Together
┌─────────────────────────────────────────────────────────────────┐
│ FRAMEWORK LANDSCAPE │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ │
│ AGENT FRAMEWORKS │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ LangChain │ │ LlamaIndex │ │ CrewAI │ │
│ │ /LangGraph │ │ │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MEMORY LAYER │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Zep Cloud │ │ Mem0 │ │ MemGPT │ │ │
│ │ │ (Graphiti) │ │ (Mem0^g) │ │ (Letta) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Temporal KG │ │ Flat + Graph│ │ Virtual Ctx │ │ │
│ │ │ Bi-temporal │ │ Decay Model │ │ Self-Edit │ │ │
│ │ │ 3-tier hier │ │ Token-optim │ │ OS-inspired │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │
│ │ │ │ │ │ │
│ └─────────┼────────────────┼────────────────┼────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ STORAGE LAYER │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ Neo4j │ │ FalkorDB │ │ Neptune │ ← Graph DBs │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ Chroma │ │ Pinecone │ │ pgvector │ ← Vector DBs │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
This big picture should anchor your understanding as you work through the projects. Each project will zoom into a specific part of this architecture.
Theory Primer
This mini-book covers the foundational concepts you need before building temporal knowledge graph memory systems. Read these chapters in order—each builds on the previous.
Chapter 1: Knowledge Graph Foundations
Fundamentals
A knowledge graph is a structured representation of information where nodes (also called vertices) represent entities and edges (also called relationships or predicates) represent how those entities relate to each other. Unlike relational databases that store data in rigid tables, knowledge graphs model the world as a flexible network of interconnected concepts.
The fundamental unit of a knowledge graph is the triple: (subject, predicate, object). For example:
(Alice, works_at, TechCorp)
(TechCorp, located_in, San Francisco)
(Alice, manages, Bob)
These triples can be chained together to answer complex queries through graph traversal. To find “Where does Alice’s manager work?”, you traverse: Alice <-[manages]- ? -[works_at]-> ?.
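A small in-memory sketch makes the chaining concrete. It stores the three example triples in a list and follows a `manages` edge backwards, then a `works_at` edge forwards. The helper names are illustrative, and the query asks about Bob's manager because that is the only management relationship present in the example data.

```python
triples = [
    ("Alice", "works_at", "TechCorp"),
    ("TechCorp", "located_in", "San Francisco"),
    ("Alice", "manages", "Bob"),
]


def objects(subject, predicate):
    """Follow an edge forwards: subject -[predicate]-> ?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow an edge backwards: ? -[predicate]-> obj"""
    return [s for s, p, o in triples if p == predicate and o == obj]


# "Where does Bob's manager work?"  Bob <-[manages]- ? -[works_at]-> ?
managers = subjects("manages", "Bob")
workplaces = [w for m in managers for w in objects(m, "works_at")]
print(managers, workplaces)  # ['Alice'] ['TechCorp']
```

Scanning a list is linear in the number of triples; graph databases avoid that scan with indexes and adjacency structures, but the traversal logic is the same.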
Deep Dive
Knowledge graphs emerged from the Semantic Web vision of the early 2000s, where Tim Berners-Lee proposed a web of machine-readable data. The Resource Description Framework (RDF) became the standard format, representing all knowledge as subject-predicate-object triples. However, RDF’s complexity led to the rise of property graphs—a more pragmatic model where nodes and edges can have arbitrary key-value properties attached.
Property Graph Model: The dominant model in modern graph databases (Neo4j, FalkorDB, Neptune). Each node has:
- A unique identifier
- One or more labels (types): Person, Company, Concept
- Properties (key-value pairs): {name: "Alice", age: 32}
Each edge has:
- A type (relationship name): WORKS_AT, MANAGES, KNOWS
- Properties: {since: "2023-01-15", role: "Senior Engineer"}
- Direction: from source node to target node
RDF Model: Used in academic and enterprise settings (SPARQL endpoints, Wikidata). Everything is a triple, including properties. More verbose but more standardized.
PROPERTY GRAPH vs RDF TRIPLE STORE
Property Graph (Neo4j): RDF Triples (SPARQL):
┌─────────────────────────┐
│ (n:Person) │ <Alice> rdf:type foaf:Person .
│ { │ <Alice> foaf:name "Alice" .
│ name: "Alice", │ <Alice> foaf:age 32 .
│ age: 32 │ <Alice> org:worksAt <TechCorp> .
│ } │ <TechCorp> rdf:type org:Organization .
│ │ │
│ [WORKS_AT] │
│ {since: "2023"} │
│ │ │
│ ▼ │
│ (c:Company) │
│ {name: "TechCorp"} │
└─────────────────────────┘
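To see why the RDF form is more verbose, the sketch below flattens a property-graph node and edge into triples. Node properties map directly to triples, but a property on an edge has no place to live in a plain triple, so the example uses standard RDF reification (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) to describe the relationship itself. The function names and the `edge:1` identifier are made up for illustration, and real converters handle namespaces and datatypes that are ignored here.

```python
def node_to_triples(node_id, labels, properties):
    """One rdf:type triple per label, one triple per property."""
    triples = [(node_id, "rdf:type", label) for label in labels]
    triples += [(node_id, key, value) for key, value in properties.items()]
    return triples


def edge_to_triples(edge_id, source, rel_type, target, properties):
    """The relationship itself is one triple; its properties need a reified statement."""
    triples = [(source, rel_type, target)]
    if properties:
        triples += [
            (edge_id, "rdf:type", "rdf:Statement"),
            (edge_id, "rdf:subject", source),
            (edge_id, "rdf:predicate", rel_type),
            (edge_id, "rdf:object", target),
        ]
        triples += [(edge_id, key, value) for key, value in properties.items()]
    return triples


node_triples = node_to_triples("Alice", ["Person"], {"name": "Alice", "age": 32})
edge_triples = edge_to_triples("edge:1", "Alice", "WORKS_AT", "TechCorp", {"since": "2023"})
print(len(node_triples), len(edge_triples))  # 3 6
```

One node and one edge with a single property become nine triples, which is the verbosity trade-off mentioned above. It also hints at why property graphs are convenient for temporal memory, where every edge carries timestamps.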
Graph Traversal Patterns:
- Breadth-First Search (BFS): Explore all neighbors at distance 1, then distance 2, etc. Good for “shortest path” queries.
- Depth-First Search (DFS): Follow one path to its end before backtracking. Good for detecting cycles or exploring hierarchies.
- Pattern Matching: Declarative queries that describe the subgraph shape you want. Cypher (Neo4j) and Gremlin are the dominant query languages.
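As an illustration of the first pattern, here is a minimal breadth-first search over an adjacency dictionary. It returns a path with the fewest hops, which is what "shortest" means when edges are unweighted. The graph contents and the function name are assumptions made for this example.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()          # FIFO order explores distance 1, then 2, ...
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:  # never revisit, so cycles cannot loop forever
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = {
    "Alice": ["TechCorp", "Bob"],
    "Bob": ["Python"],
    "TechCorp": ["San Francisco"],
}

print(shortest_path(adjacency, "Alice", "Python"))  # ['Alice', 'Bob', 'Python']
```

Swapping `popleft()` for `pop()` turns the queue into a stack and the search into depth-first order, which no longer guarantees the fewest hops.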
Cypher Query Example:
// Find all people who work at companies in San Francisco
// and manage someone who knows Python
MATCH (p:Person)-[:WORKS_AT]->(c:Company)-[:LOCATED_IN]->(city:City {name: "San Francisco"})
MATCH (p)-[:MANAGES]->(employee:Person)-[:KNOWS]->(skill:Skill {name: "Python"})
RETURN p.name, c.name, employee.name
How This Fits in Projects
Projects 1-2 focus on building basic knowledge graphs from scratch. You’ll implement node/edge creation, basic traversal, and Cypher queries. This foundation is essential before adding temporal dimensions.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Node/Vertex | An entity in the graph (person, place, concept) |
| Edge/Relationship | A connection between two nodes with a type and direction |
| Triple | The atomic unit: (subject, predicate, object) |
| Property | Key-value metadata on nodes or edges |
| Label | A type classification for nodes (e.g., Person, Company) |
| Traversal | Walking through the graph following edges |
| Cypher | Neo4j’s declarative graph query language |
| Subgraph | A portion of a larger graph matching some criteria |
Mental Model Diagram
KNOWLEDGE GRAPH STRUCTURE
┌──────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────┐ │
│ │ Person │ │
│ │ "Alice" │ │
│ │ age: 32 │ │
│ └──────┬──────┘ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ WORKS_AT │ │ MANAGES │ │ KNOWS │ │
│ │since: 2023 │ │ │ │level: expert│ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Company │ │ Person │ │ Skill │ │
│ │"TechCorp"│ │ "Bob" │ │ "Python" │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Query: "Who does Alice manage?" │
│ Traversal: Alice -[MANAGES]-> Bob │
│ Result: Bob │
│ │
│ Query: "What skills do Alice's reports have?" │
│ Traversal: Alice -[MANAGES]-> ? -[KNOWS]-> ? │
│ Result: (Bob, Python) │
│ │
└──────────────────────────────────────────────────────────┘
How It Works (Step-by-Step)
- Schema Design: Define your node labels and relationship types. What entities exist? How do they connect?
- Data Ingestion: Create nodes for each entity, then create edges to connect them. Assign properties.
- Index Creation: Create indexes on frequently-queried properties (e.g., Person.name) for fast lookups.
- Query Execution: Write Cypher/Gremlin queries. The database plans the optimal traversal path.
- Result Assembly: Collect matching subgraphs, extract requested properties, return to caller.
Failure Modes:
- Cartesian explosions: Queries without proper constraints match everything × everything
- Missing indexes: Full graph scans on large graphs are slow
- Unbounded traversals: Variable-length or “all paths” queries can explode combinatorially (or never finish) in cyclic graphs, so cap the number of hops
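The unbounded-traversal failure mode is easiest to see on a tiny cyclic graph. The sketch below enumerates paths with two safeguards: a hop limit and a rule that a path never revisits a node. Both safeguards and the function name are illustrative choices for this example; graph databases apply their own rules for pattern uniqueness.

```python
def paths_up_to(adjacency, start, max_hops):
    """Enumerate simple paths (no repeated node) from start with at most max_hops edges."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            results.append(path)
        if len(path) - 1 == max_hops:    # hop limit reached: do not extend further
            continue
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in path:     # refusing to revisit nodes breaks the cycle
                stack.append(path + [neighbor])
    return results


# A -> B -> C -> A is a cycle; without the guards the walk would never end.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(paths_up_to(cyclic, "A", 5))  # [['A', 'B'], ['A', 'B', 'C']]
print(paths_up_to(cyclic, "A", 1))  # [['A', 'B']]
```

In Cypher the equivalent habit is to put an upper bound on variable-length patterns, for example `[:MANAGES*1..3]` rather than `[:MANAGES*]`.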
Minimal Concrete Example
// Create nodes
CREATE (alice:Person {name: "Alice", age: 32})
CREATE (bob:Person {name: "Bob", age: 28})
CREATE (techcorp:Company {name: "TechCorp"})
CREATE (python:Skill {name: "Python"})
// Create relationships
CREATE (alice)-[:WORKS_AT {since: "2023-01-15"}]->(techcorp)
CREATE (alice)-[:MANAGES]->(bob)
CREATE (bob)-[:KNOWS {level: "expert"}]->(python)
// Query: Find skills of people Alice manages
MATCH (alice:Person {name: "Alice"})-[:MANAGES]->(report)-[:KNOWS]->(skill)
RETURN report.name, skill.name
// Result:
// | report.name | skill.name |
// |-------------|------------|
// | "Bob" | "Python" |
Common Misconceptions
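- “Graphs are slower than SQL”: False for connected data. In SQL, every extra hop is another join, usually an index lookup against the whole table. A native graph database follows stored references from a node to its neighbors, so traversal cost scales with the number of edges actually visited rather than with total table size.
- “You need a graph database for graphs”: False. You can model graphs in PostgreSQL with recursive CTEs, although specialized graph databases are usually faster for deep, multi-hop traversals.
- “All data should be a graph”: False. Tabular, time-series, and document data often don’t benefit from graph modeling.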
Check-Your-Understanding Questions
- What is the difference between a node label and a node property?
- If you have 1000 Person nodes and 1000 Company nodes, and you write MATCH (p:Person), (c:Company) RETURN p, c, how many results do you get?
- Why would you create an index on Person.name?
- In a property graph, can an edge have properties? Can it have multiple labels?
Check-Your-Understanding Answers
- A label is a type classification (Person, Company) used for filtering and schema. A property is a key-value pair storing data (name: “Alice”). Labels are categorical; properties are data.
- 1,000,000 results (1000 × 1000 Cartesian product). This is a common mistake: always constrain your queries with WHERE clauses or relationship patterns.
- Without an index, finding Person {name: "Alice"} requires scanning all Person nodes (O(n)). With an index, it’s O(log n) or O(1).
- Yes, edges can have properties (e.g., since: "2023"). In most property graphs, edges have exactly one type/label (e.g., WORKS_AT), unlike nodes which can have multiple labels.
Real-World Applications
- Social networks: Facebook’s social graph, LinkedIn’s professional network
- Recommendation engines: “People who bought X also bought Y”
- Fraud detection: Detecting rings of connected accounts
- Knowledge bases: Google Knowledge Graph, Wikidata
- AI Agent Memory: This guide—storing entities and relationships from conversations
Where You’ll Apply It
- Project 1: Build a basic knowledge graph from conversation data
- Project 2: Implement Cypher queries for entity lookup
- Project 5: Add temporal dimensions to edges
- Project 9: Use Neo4j with Graphiti framework
References
- “Graph Databases” by Ian Robinson, Jim Webber, Emil Eifrem (O’Reilly)
- Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/current/
- “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 2 (Data Models)
Key Insight
Knowledge graphs model the world as entities and relationships, enabling queries that “hop” through connections—something relational databases struggle with at scale.
Summary
Knowledge graphs represent information as nodes (entities) connected by edges (relationships). The property graph model lets you attach key-value properties to both nodes and edges. Graph databases like Neo4j optimize for traversal queries, making them ideal for connected data like social networks, recommendations, and—crucially for this guide—AI agent memory where you need to track entities mentioned across conversations.
Homework/Exercises
- Exercise 1: Model a small family tree (5 people, relationships like PARENT_OF, MARRIED_TO) as a property graph. Draw it on paper.
- Exercise 2: Write Cypher queries to find: (a) All children of “John”, (b) All grandchildren of “Mary”, (c) All married couples.
- Exercise 3: Given a graph of Users and Posts with AUTHORED and LIKED relationships, write a query to find “posts liked by people who also liked posts I liked” (collaborative filtering).
Solutions to Homework/Exercises
- Solution to Exercise 1:
  (John:Person)-[:MARRIED_TO]->(Mary:Person)
  (John)-[:PARENT_OF]->(Alice:Person)
  (John)-[:PARENT_OF]->(Bob:Person)
  (Mary)-[:PARENT_OF]->(Alice)
  (Mary)-[:PARENT_OF]->(Bob)
  (Alice)-[:PARENT_OF]->(Charlie:Person)
- Solution to Exercise 2:
```cypher
// (a) Children of John
MATCH (john:Person {name: "John"})-[:PARENT_OF]->(child)
RETURN child.name

// (b) Grandchildren of Mary
MATCH (mary:Person {name: "Mary"})-[:PARENT_OF]->()-[:PARENT_OF]->(grandchild)
RETURN grandchild.name

// (c) All married couples
// Match without direction so the stored direction of MARRIED_TO does not matter;
// the elementId comparison keeps one row per couple instead of both (A, B) and (B, A).
MATCH (a:Person)-[:MARRIED_TO]-(b:Person)
WHERE elementId(a) < elementId(b)
RETURN a.name, b.name
```
- Solution to Exercise 3:
```cypher
// Collaborative filtering: posts liked by people who liked posts I liked
MATCH (me:User {name: "CurrentUser"})-[:LIKED]->(myPost:Post)<-[:LIKED]-(similar:User)
MATCH (similar)-[:LIKED]->(recommended:Post)
WHERE NOT (me)-[:LIKED]->(recommended)
// Count each similar user once, even if they share several liked posts with me
RETURN recommended, COUNT(DISTINCT similar) AS score
ORDER BY score DESC
```
Chapter 2: Episodic vs Semantic Memory
Fundamentals
Human memory is not a single system but multiple specialized systems working together. Cognitive scientists distinguish between:
- Episodic Memory: Memories of specific events and experiences (“I had coffee with Sarah last Tuesday”)
- Semantic Memory: General knowledge and facts (“Coffee contains caffeine”)
AI agent memory systems mirror this architecture. Episodic memory stores raw conversation logs, timestamps, and session contexts. Semantic memory stores extracted facts, entities, and relationships distilled from those episodes.
Deep Dive
The episodic/semantic distinction comes from Endel Tulving’s work in the 1970s. He observed that patients with certain brain injuries could remember general facts but not personal experiences, and vice versa. This suggests separate neurological substrates for each memory type.
For AI agents, this translates to:
Episodic Memory Layer:
- Stores raw conversation turns with full context
- Indexed by time and session ID
- Enables replay and source attribution
- Grows unboundedly (append-only log)
- High fidelity but expensive to search
Semantic Memory Layer:
- Stores extracted entities and relationships
- Indexed by entity and relationship type
- Enables fast fact lookup
- Grows more slowly (deduplicated, consolidated)
- Lower fidelity but efficient to query
EPISODIC vs SEMANTIC MEMORY IN AI AGENTS
┌─────────────────────────────────────────────────────────────────┐
│ EPISODIC MEMORY │
│ (What Happened, When, Where) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Session 1 (2024-01-15 10:00) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ User: "Hi, I'm working on the authentication module" │ │
│ │ Agent: "Great! What aspect are you focusing on?" │ │
│ │ User: "I need to implement OAuth with Google" │ │
│ │ Agent: "I'll help you set up OAuth 2.0..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Session 2 (2024-01-17 14:30) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ User: "The OAuth is working but now I need refresh tokens"│ │
│ │ Agent: "Building on our previous work..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Properties: Full text, timestamp, session_id, user_id │
│ Query style: "Show me our conversation from Jan 15" │
│ │
└─────────────────────────────────────────────────────────────────┘
│
│ EXTRACTION
│ (LLM processes episodes)
▼
┌─────────────────────────────────────────────────────────────────┐
│ SEMANTIC MEMORY │
│ (Facts, Entities, Relations) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ User │ │ Project │ │ Technology │ │
│ │ (Entity) │ │ (Entity) │ │ (Entity) │ │
│ │ │ │"auth_module" │ │ "OAuth" │ │
│ └──────┬───────┘ └───────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ WORKS_ON │ USES │ │
│ └──────────────────────┼─────────────────────┘ │
│ │ │
│ Extracted facts: │
│ • User WORKS_ON auth_module │
│ • auth_module USES OAuth │
│ • auth_module USES Google (provider) │
│ • User NEEDS refresh_tokens (status: in_progress) │
│ │
│ Properties: Entity type, relationship type, confidence │
│ Query style: "What technologies does the auth module use?" │
│ │
└─────────────────────────────────────────────────────────────────┘
The Extraction Process:
- Episode arrives: Raw conversation turn is stored in episodic memory
- Entity extraction: LLM identifies entities mentioned (people, projects, technologies)
- Relationship extraction: LLM identifies how entities relate (“User is working on Project”)
- Entity resolution: Match extracted entities to existing ones or create new nodes
- Semantic update: Add/update facts in the semantic layer with timestamps
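The five steps above can be sketched end to end with a stubbed extractor standing in for the LLM call. Everything here is a simplified assumption for illustration: `MemoryStore`, `stub_extract`, and the lowercase-and-strip `canonical` function are hypothetical, and production entity resolution usually relies on embeddings or an LLM rather than string normalization.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    episode_id: int
    text: str
    timestamp: datetime


@dataclass
class MemoryStore:
    episodes: list = field(default_factory=list)
    entities: dict = field(default_factory=dict)  # canonical key -> stored name
    facts: list = field(default_factory=list)     # (subj, pred, obj, episode_id, timestamp)


def canonical(name):
    """Toy entity resolution: ignore case and whitespace."""
    return "".join(name.lower().split())


def stub_extract(text):
    """Stand-in for the LLM extraction call; returns hard-coded triples."""
    if "OAuth" in text:
        return [("User", "WORKS_ON", "auth_module"), ("auth_module", "USES", "OAuth")]
    return []


def ingest(store, text, timestamp, extract=stub_extract):
    episode = Episode(len(store.episodes) + 1, text, timestamp)
    store.episodes.append(episode)                  # 1. episode arrives (append-only)
    for subj, pred, obj in extract(text):           # 2-3. entity and relationship extraction
        for name in (subj, obj):                    # 4. resolve to existing entity or create
            store.entities.setdefault(canonical(name), name)
        store.facts.append((                        # 5. semantic update with timestamp
            store.entities[canonical(subj)], pred,
            store.entities[canonical(obj)], episode.episode_id, timestamp,
        ))
    return episode


store = MemoryStore()
ingest(store, "I need to implement OAuth with Google", datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc))
# A later mention spelled "user" resolves to the existing "User" entity.
ingest(store, "Now I need refresh tokens", datetime(2024, 1, 17, 14, 30, tzinfo=timezone.utc),
       extract=lambda text: [("user", "NEEDS", "refresh_tokens")])
print(len(store.episodes), len(store.entities), store.facts[-1][:3])
# 2 4 ('User', 'NEEDS', 'refresh_tokens')
```

Passing the extractor as a parameter keeps the pipeline testable without network calls; the real LLM-backed function can be swapped in later.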
Why Both Layers?
You might ask: why not just use semantic memory? Or just episodic?
Episodic-only problems:
- Slow to search (must scan all conversations)
- Redundant storage (same fact mentioned 100 times)
- No structured queries (“What does Alice work on?” requires NLP on raw text)
Semantic-only problems:
- Loss of context (extracted fact may miss nuance)
- No source attribution (“Where did we discuss this?”)
- Extraction errors accumulate without raw data to correct against
The hybrid approach gives you the best of both:
- Fast structured queries against semantic layer
- Source attribution by linking semantic facts to episodes
- Error correction by re-extracting from episodic layer
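A minimal in-memory sketch of the hybrid idea follows. The `Episode` and `Fact` records and both helper functions are hypothetical names chosen for illustration, not taken from any framework. Each fact carries the ID of the episode it came from, so a structured lookup can be traced back to the raw text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    id: str
    text: str


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source_episode_id: str  # link back to the episodic layer


# Episodic layer: raw record, keyed by episode ID
episodes = {
    "ep1": Episode("ep1", "I need to implement OAuth with Google"),
    "ep2": Episode("ep2", "The OAuth is working but now I need refresh tokens"),
}

# Semantic layer: extracted facts (hand-written here; normally LLM-extracted)
facts = [
    Fact("auth_module", "USES", "OAuth", "ep1"),
    Fact("auth_module", "USES", "Google", "ep1"),
    Fact("User", "NEEDS", "refresh_tokens", "ep2"),
]


def objects_of(subject: str, predicate: str) -> list[str]:
    """Structured query against the semantic layer."""
    return [f.obj for f in facts if f.subject == subject and f.predicate == predicate]


def sources_for(subject: str, predicate: str, obj: str) -> list[str]:
    """Source attribution: raw text of the episodes behind a fact."""
    return [
        episodes[f.source_episode_id].text
        for f in facts
        if (f.subject, f.predicate, f.obj) == (subject, predicate, obj)
    ]


print(objects_of("auth_module", "USES"))
print(sources_for("User", "NEEDS", "refresh_tokens"))
```

Because the raw episode text is still available, a wrong fact can be deleted and re-extracted from `episodes` without losing anything.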
How This Fits in Projects
Project 3 builds the episodic memory layer (conversation storage with timestamps). Project 4 adds the semantic extraction pipeline. Later projects (9-12) show how frameworks like Graphiti and Mem0 implement this dual architecture.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Episodic Memory | Storage of specific events/experiences with temporal context |
| Semantic Memory | Storage of facts, concepts, and their relationships |
| Episode | A single unit of episodic memory (e.g., one conversation turn) |
| Entity | A named thing extracted from episodes (person, place, concept) |
| Extraction | The process of deriving semantic facts from episodic data |
| Entity Resolution | Matching a new mention to an existing entity (deduplication) |
| Source Attribution | Linking a semantic fact back to the episode(s) it came from |
Mental Model Diagram
MEMORY SYSTEM INFORMATION FLOW
┌────────────────────────────────────────────────┐
│ USER CONVERSATION │
│ │
│ "I just finished the API refactor for the │
│ payments service. It now uses Stripe." │
└────────────────────────┬───────────────────────┘
│
▼
┌────────────────────────────────────────────────┐
│ EPISODIC MEMORY (Append) │
│ │
│ Episode #427 │
│ ├── text: "I just finished the API..." │
│ ├── timestamp: 2024-12-15T14:32:00Z │
│ ├── session_id: sess_abc123 │
│ ├── user_id: user_xyz │
│ └── embedding: [0.12, -0.34, 0.56, ...] │
└────────────────────────┬───────────────────────┘
│
│ LLM Extraction
▼
┌────────────────────────────────────────────────┐
│ EXTRACTION OUTPUT (JSON) │
│ │
│ { │
│ "entities": [ │
│ {"name": "User", "type": "Person"}, │
│ {"name": "payments_service", "type": │
│ "Project"}, │
│ {"name": "Stripe", "type": "Technology"}, │
│ {"name": "API_refactor", "type": "Task"} │
│ ], │
│ "relationships": [ │
│ {"subj": "User", "pred": "COMPLETED", │
│ "obj": "API_refactor"}, │
│ {"subj": "API_refactor", "pred": │
│ "AFFECTS", "obj": "payments_service"}, │
│ {"subj": "payments_service", "pred": │
│ "USES", "obj": "Stripe"} │
│ ] │
│ } │
└────────────────────────┬───────────────────────┘
│
│ Entity Resolution + Graph Update
▼
┌────────────────────────────────────────────────┐
│ SEMANTIC MEMORY (Graph) │
│ │
│ (User)──COMPLETED──>(API_refactor) │
│ │ │
│ AFFECTS │
│ │ │
│ ▼ │
│ (payments_service) │
│ │ │
│ USES │
│ │ │
│ ▼ │
│ (Stripe) │
│ │
│ + Link: API_refactor.source = Episode #427 │
└────────────────────────────────────────────────┘
How It Works (Step-by-Step)
- Receive message: User sends a conversation turn
- Create episode: Store raw text + metadata + embedding in episodic layer
- Trigger extraction: Send episode to LLM with entity/relationship extraction prompt
- Parse extraction: Structure LLM output into entities and relationships
- Resolve entities: For each entity, find existing match or create new node
- Update graph: Create relationship edges in semantic layer
- Link source: Add edge from semantic facts to source episode
- Index: Update semantic and vector indexes for retrieval
Invariants:
- Every semantic fact should trace back to at least one episode
- Episodic layer is append-only (never delete, only add)
- Entity names should be normalized (canonical form)
Failure Modes:
- Extraction hallucination: LLM invents entities not in the text
- Entity fragmentation: Same entity gets multiple nodes (“Alice”, “alice”, “A. Smith”)
- Relationship over-extraction: Creating edges for implied but unstated relationships
- Stale semantics: If extraction is async, semantic layer lags behind episodic
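One way to limit extraction hallucination is a cheap grounding check before anything reaches the graph. The sketch below is an assumption-laden heuristic, not a method from Zep, Mem0, or MemGPT: it keeps an entity only if its name literally appears in the source text, and keeps a relationship only if both endpoints survived. The function name `validate_extraction` and the JSON shape (copied from the diagram above) are illustrative.

```python
import json


def validate_extraction(raw_json: str, source_text: str) -> dict:
    """Parse LLM output and drop items that fail basic grounding checks."""
    data = json.loads(raw_json)
    text = source_text.lower()

    # Keep an entity only if its name (underscores read as spaces) occurs in the text.
    entities = [
        e for e in data.get("entities", [])
        if e["name"].replace("_", " ").lower() in text
    ]
    known = {e["name"] for e in entities}

    # Keep a relationship only if both endpoints passed the entity check.
    relationships = [
        r for r in data.get("relationships", [])
        if r["subj"] in known and r["obj"] in known
    ]
    return {"entities": entities, "relationships": relationships}


llm_output = json.dumps({
    "entities": [
        {"name": "payments_service", "type": "Project"},
        {"name": "Stripe", "type": "Technology"},
        {"name": "PayPal", "type": "Technology"},  # not in the text: hallucinated
    ],
    "relationships": [
        {"subj": "payments_service", "pred": "USES", "obj": "Stripe"},
        {"subj": "payments_service", "pred": "USES", "obj": "PayPal"},
    ],
})
result = validate_extraction(llm_output, "The payments service now uses Stripe.")
print([e["name"] for e in result["entities"]])
```

A literal substring check is deliberately strict: it would also drop an implicit speaker entity such as "User", so a real pipeline would need an allow-list or a softer match for those cases.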
Minimal Concrete Example
# Pseudo-code for episodic → semantic extraction
# (generate_uuid, embed, llm, graph, and episodic_store stand in for your own components)
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Episode:
    id: str
    text: str
    timestamp: datetime
    session_id: str
    embedding: list[float]

@dataclass
class SemanticFact:
    subject: str
    predicate: str
    object: str
    source_episode_id: str
    confidence: float

def process_message(user_message: str, session_id: str):
    # 1. Create episode (timezone-aware timestamp avoids ambiguity across hosts)
    episode = Episode(
        id=generate_uuid(),
        text=user_message,
        timestamp=datetime.now(timezone.utc),
        session_id=session_id,
        embedding=embed(user_message),
    )
    episodic_store.append(episode)

    # 2. Extract semantics via LLM
    extraction_prompt = f"""
    Extract entities and relationships from this text:
    "{user_message}"
    Return JSON with "entities" and "relationships" arrays.
    """
    # Assumes llm.complete returns raw JSON text, so parse it before indexing
    extraction = json.loads(llm.complete(extraction_prompt))

    # 3. Resolve entities and update graph
    for entity in extraction["entities"]:
        existing = graph.find_entity(entity["name"])
        if not existing:
            graph.create_node(entity["name"], entity["type"])
    for rel in extraction["relationships"]:
        graph.create_edge(
            rel["subj"], rel["pred"], rel["obj"],
            source_episode_id=episode.id,
        )
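The exact-match lookup above would create separate nodes for "Alice" and "alice". A minimal sketch of the normalization invariant and alias handling follows; `EntityIndex` and its methods are hypothetical names, and the decision that "A. Smith" refers to Alice is assumed to come from some other signal (context, embeddings, or a human).

```python
import re


class EntityIndex:
    """Maps normalized mention strings to canonical entity IDs."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    @staticmethod
    def normalize(mention: str) -> str:
        # Lowercase, drop punctuation, collapse whitespace: "A. Smith" -> "a smith"
        cleaned = re.sub(r"[^\w\s]", "", mention.lower())
        return " ".join(cleaned.split())

    def resolve(self, mention: str) -> str:
        """Return the canonical ID for a mention, creating one if unseen."""
        key = self.normalize(mention)
        if key not in self._by_key:
            self._by_key[key] = key.replace(" ", "_")
        return self._by_key[key]

    def add_alias(self, alias: str, canonical_mention: str) -> None:
        """Record that `alias` refers to the same entity as `canonical_mention`."""
        self._by_key[self.normalize(alias)] = self.resolve(canonical_mention)


index = EntityIndex()
index.resolve("Alice")                # creates canonical ID "alice"
index.add_alias("A. Smith", "Alice")  # alias confirmed by some other signal
print(index.resolve("alice"), index.resolve("A. Smith"))  # alice alice
```

Normalization alone only fixes case and punctuation variants; anything that needs judgment ("Sarah from Finance") has to go through an explicit alias step like `add_alias`.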
Common Misconceptions
- “Just use embeddings for memory”: Embeddings capture semantic similarity but lose temporal and relational structure. You can find similar topics but not answer “what changed between session 1 and 5?”
- “Extract everything”: Over-extraction creates noise. A mention of “coffee” in casual chat shouldn’t become a permanent entity. Focus on salient entities.
- “Semantic memory replaces episodic”: Wrong. They complement each other. Semantic is for fast lookup; episodic is the source of truth.
Check-Your-Understanding Questions
- Why would you store both the raw conversation AND extracted entities?
- What happens if entity resolution fails and “Alice” becomes two separate nodes?
- How would you answer “What did we discuss about authentication last month?” using both memory layers?
- What’s the risk of only using semantic memory without episodic backup?
Check-Your-Understanding Answers
- Source of truth + efficiency. Episodic memory is the complete record (for auditing, correction, re-extraction). Semantic memory is the efficient index (for structured queries). You need both.
- Entity fragmentation. Facts about Alice are split across nodes, making queries incomplete. “What does Alice work on?” misses half the relationships. This is why entity resolution is critical.
- Hybrid query: (a) Search semantic graph for entities related to “authentication”, (b) Find episodes that mention those entities, (c) Filter episodes by timestamp (last month), (d) Return relevant episodes. The semantic layer narrows the search; the episodic layer provides full context.
- No recovery from extraction errors. If the LLM misinterprets “I hate JavaScript” as “User LIKES JavaScript”, you have no raw data to correct it. Episodic memory enables error correction and re-extraction.
Real-World Applications
- Customer support agents: Episodic = ticket history; Semantic = customer profile, product ownership
- Personal assistants: Episodic = calendar, messages; Semantic = contacts, preferences
- Research assistants: Episodic = papers read, notes taken; Semantic = concepts, authors, citations
Where You’ll Apply It
- Project 3: Build episodic memory with conversation storage
- Project 4: Add semantic extraction pipeline
- Project 10: Implement Mem0’s dual memory architecture
- Project 11: Integrate MemGPT’s self-editing semantic memory
References
- Tulving, E. (1972). “Episodic and Semantic Memory” - The foundational paper
- “Zep: A Temporal Knowledge Graph Architecture for Agent Memory” (2025) - arXiv:2501.13956
- “Mem0: Building Production-Ready AI Agents” (2025) - arXiv:2504.19413
Key Insight
Episodic memory stores what happened; semantic memory stores what it means. Effective agent memory requires both: episodic for fidelity and attribution, semantic for efficient structured queries.
Summary
AI agent memory mirrors human cognition by separating episodic memory (raw events with timestamps) from semantic memory (extracted facts and relationships). Episodes are append-only and high-fidelity. Semantic facts are extracted via LLM, deduplicated through entity resolution, and stored in a graph for efficient querying. The two layers work together: semantic for fast lookup, episodic for source attribution and error correction.
Homework/Exercises
- Exercise 1: Given this conversation snippet, list the entities and relationships you would extract:
User: "I'm meeting with Sarah tomorrow to discuss the Q4 budget."
- Exercise 2: Design an entity resolution strategy for handling: “Sarah”, “sarah”, “Sarah Johnson”, “S. Johnson”, “Sarah from Finance”
- Exercise 3: Write pseudocode for a function that answers “What did we discuss about X?” using both episodic and semantic memory.
Solutions to Homework/Exercises
- Solution to Exercise 1:
- Entities: User (Person), Sarah (Person), Q4_budget (Topic/Document), tomorrow (Time reference → resolve to actual date)
- Relationships: User MEETING_WITH Sarah, MEETING discusses Q4_budget, MEETING scheduled_for [resolved date]
- Solution to Exercise 2:
- Normalize case: “sarah” → “Sarah”
- Check for full name match: “Sarah Johnson” matches “Sarah” if context suggests same person
- Use embeddings: embed “Sarah from Finance” and compare to known entities
- Attribute matching: if only one Sarah in the company, merge
- Create alias edges: (Sarah) -[ALIAS]-> (“S. Johnson”) for future resolution
- Solution to Exercise 3:
def what_did_we_discuss(topic: str) -> list[Episode]:
    # 1. Find entities related to topic in semantic layer
    related_entities = graph.search_entities(topic, limit=10)

    # 2. Get episodes that mention these entities
    episode_ids = set()
    for entity in related_entities:
        episodes = graph.get_episodes_mentioning(entity)
        episode_ids.update(episodes)

    # 3. Retrieve full episodes from episodic layer
    episodes = episodic_store.get_by_ids(list(episode_ids))

    # 4. Sort by relevance (could use embedding similarity to topic)
    episodes.sort(key=lambda e: similarity(e.embedding, embed(topic)), reverse=True)
    return episodes[:10]  # Top 10 most relevant
Chapter 3: Bi-Temporal Data Models
Fundamentals
Standard databases track data as it is now. Bi-temporal databases track data across two independent time dimensions:
- Valid Time (t_valid): When the fact was true in the real world
- Transaction Time (t_transaction): When the fact was recorded in the system
This distinction is critical for AI agent memory because:
- Facts change over time (“Alice worked at Company A, then moved to Company B”)
- We learn about facts at different times (“We just found out Alice changed jobs last month”)
- We need to answer historical queries (“What did we believe on January 1st?”)
Deep Dive
Consider this scenario:
- On January 1, 2024, Alice starts working at TechCorp (valid time)
- On January 5, 2024, she tells the agent about her new job (transaction time)
- On June 1, 2024, Alice leaves TechCorp and joins StartupXYZ (valid time)
- On June 3, 2024, she mentions her new job (transaction time)
A single-temporal model (only tracking valid time) would show:
- Jan 1 - Jun 1: Alice works at TechCorp
- Jun 1 - present: Alice works at StartupXYZ
But we lose important information:
- What did the system believe on January 3? (Nothing—we hadn’t learned it yet)
- When did we first learn Alice was at TechCorp? (January 5)
A bi-temporal model tracks both:
BI-TEMPORAL RECORD: Alice's Employment
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Relationship: (Alice)-[WORKS_AT]->(TechCorp) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ valid_from: 2024-01-01 │ │
│ │ valid_to: 2024-06-01 │ │
│ │ transaction_from: 2024-01-05 │ │
│ │ transaction_to: ∞ (still the recorded truth) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Relationship: (Alice)-[WORKS_AT]->(StartupXYZ) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ valid_from: 2024-06-01 │ │
│ │ valid_to: ∞ (still employed) │ │
│ │ transaction_from: 2024-06-03 │ │
│ │ transaction_to: ∞ (still the recorded truth) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Query Examples:
Q: "Where does Alice work NOW?"
Filter: valid_to IS NULL AND transaction_to IS NULL
Answer: StartupXYZ
Q: "Where did Alice work on Feb 15, 2024?"
Filter: valid_from <= 2024-02-15 < valid_to
Answer: TechCorp
Q: "What did the system believe about Alice on Jan 3, 2024?"
Filter: transaction_from <= 2024-01-03 < transaction_to
Answer: Nothing (first record arrived Jan 5)
Q: "When did we learn Alice left TechCorp?"
Filter: Look at transaction_from for the invalidating record
Answer: 2024-06-03
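The filters above can be sketched over plain Python records. The two records are copied from the diagram; the `Record` class and the function names are illustrative assumptions, and `None` plays the role of ∞ / NULL.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    employer: str
    valid_from: date
    valid_to: Optional[date]  # None = still true
    txn_from: date
    txn_to: Optional[date]    # None = still the recorded truth


records = [
    Record("TechCorp", date(2024, 1, 1), date(2024, 6, 1), date(2024, 1, 5), None),
    Record("StartupXYZ", date(2024, 6, 1), None, date(2024, 6, 3), None),
]


def covers(start: date, end: Optional[date], t: date) -> bool:
    """True if t lies in the half-open interval [start, end)."""
    return start <= t and (end is None or t < end)


def current() -> list[str]:
    """Where does Alice work now?"""
    return [r.employer for r in records if r.valid_to is None and r.txn_to is None]


def valid_at(t: date) -> list[str]:
    """Where did Alice work at time t, according to current knowledge?"""
    return [r.employer for r in records
            if covers(r.valid_from, r.valid_to, t) and r.txn_to is None]


def believed_at(t: date) -> list[str]:
    """Which records had the system stored as of time t?"""
    return [r.employer for r in records if covers(r.txn_from, r.txn_to, t)]


print(current())                      # ['StartupXYZ']
print(valid_at(date(2024, 2, 15)))    # ['TechCorp']
print(believed_at(date(2024, 1, 3)))  # []
```

Using half-open intervals means June 1 belongs to StartupXYZ only, so a point-in-time query never returns both employers for the handover date.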
Allen’s Interval Algebra:
Bi-temporal models use interval logic. James Allen defined 13 possible relationships between two time intervals:
ALLEN'S INTERVAL RELATIONS
A: ════════════
B:                   ════════════
Relation: A BEFORE B (A ends before B starts)
A: ════════════
B:             ════════════
Relation: A MEETS B (A ends exactly when B starts)
A: ════════════════
B:         ════════════
Relation: A OVERLAPS B (partial overlap)
A: ════════════════════════
B:         ════════
Relation: A CONTAINS B (B is fully within A)
A: ════════════
B: ════════════
Relation: A EQUALS B (same interval)
... plus STARTS, FINISHES, and the inverses (AFTER, MET_BY, OVERLAPPED_BY, DURING, STARTED_BY, FINISHED_BY)
For AI agent memory, the most common pattern is MEETS: when a fact becomes invalid, a new fact starts exactly at that point (Alice leaves TechCorp → Alice joins StartupXYZ).
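As a rough illustration, the 13 relations can be computed from the four endpoints. The function name `allen_relation` is an assumption for this sketch; it works on any comparable endpoints (numbers or dates) and expects closed-ended intervals with start < end, so an open-ended fact needs a stand-in end value.

```python
from datetime import date


def allen_relation(a_start, a_end, b_start, b_end) -> str:
    """Classify interval A against interval B (requires start < end for both)."""
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must have start < end")
    if a_end < b_start:
        return "BEFORE"
    if a_end == b_start:
        return "MEETS"
    if b_end < a_start:
        return "AFTER"
    if b_end == a_start:
        return "MET_BY"
    # From here on the intervals share more than a single point.
    if a_start == b_start and a_end == b_end:
        return "EQUALS"
    if a_start == b_start:
        return "STARTS" if a_end < b_end else "STARTED_BY"
    if a_end == b_end:
        return "FINISHES" if a_start > b_start else "FINISHED_BY"
    if b_start < a_start and a_end < b_end:
        return "DURING"
    if a_start < b_start and b_end < a_end:
        return "CONTAINS"
    return "OVERLAPS" if a_start < b_start else "OVERLAPPED_BY"


# Alice's two jobs; the StartupXYZ end date is an arbitrary stand-in for "open".
techcorp = (date(2024, 1, 1), date(2024, 6, 1))
startup = (date(2024, 6, 1), date(2025, 1, 1))
print(allen_relation(*techcorp, *startup))  # MEETS
```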
Contradiction Handling:
Bi-temporal models excel at handling contradictions:
- User says: “I work at TechCorp” (Session 1)
- User later says: “Actually, I work at StartupXYZ” (Session 5)
Without bi-temporal: Delete TechCorp record, insert StartupXYZ. History lost.
With bi-temporal:
- Record 1: WORKS_AT TechCorp, valid_to = Session 5 timestamp
- Record 2: WORKS_AT StartupXYZ, valid_from = Session 5 timestamp
Both records remain. The system knows:
- What the user currently believes (StartupXYZ)
- What the user previously believed (TechCorp)
- When the change occurred (Session 5)
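A minimal sketch of this invalidate-and-insert pattern is shown below. It tracks only valid time, assumes the predicate is single-valued (one employer at a time), and uses made-up session dates; `Fact` and `assert_fact` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = still believed true


def assert_fact(store: list[Fact], subject: str, predicate: str, obj: str,
                at: datetime) -> Fact:
    """Add a fact, closing any conflicting open fact instead of deleting it."""
    for fact in store:
        if (fact.subject, fact.predicate) != (subject, predicate):
            continue
        if fact.valid_to is not None:
            continue  # already closed, part of history
        if fact.obj == obj:
            return fact  # already known, nothing to change
        fact.valid_to = at  # contradiction: close the old fact, keep the record
    new_fact = Fact(subject, predicate, obj, valid_from=at)
    store.append(new_fact)
    return new_fact


store: list[Fact] = []
session_1 = datetime(2024, 1, 10, tzinfo=timezone.utc)  # illustrative dates
session_5 = datetime(2024, 3, 2, tzinfo=timezone.utc)
assert_fact(store, "User", "WORKS_AT", "TechCorp", session_1)
assert_fact(store, "User", "WORKS_AT", "StartupXYZ", session_5)
print([(f.obj, f.valid_to is None) for f in store])
```

Using the session timestamp as `valid_to` is only a best estimate of when the real-world change happened; if the user states an explicit date, that date is the better value.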
How This Fits in Projects
Projects 5-8 focus on temporal modeling. Project 5 implements basic bi-temporal edges. Project 6 adds temporal query support. Project 7 handles contradiction detection and resolution.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Valid Time | When a fact was/is true in the real world |
| Transaction Time | When a fact was recorded in the database |
| Bi-Temporal | Tracking both valid and transaction time independently |
| Validity Interval | [valid_from, valid_to) range when fact is true |
| Current Record | A record where valid_to IS NULL (still true) and txn_to IS NULL (not superseded) |
| Point-in-Time Query | Query asking “what was true at time T?” |
| As-of Query | Query asking “what did we believe at time T?” |
Mental Model Diagram
BI-TEMPORAL TIME DIMENSIONS
Transaction Time (When we learned it)
───────────────────────────────────────►
│
Valid │ Jan 1 Jan 5 Jun 1 Jun 3
Time │ │ │ │ │
(When true) │ │ │ │ │
│ │ │ │ │ │
▼ │ │ │ │ │
│ │ ┌─────┴────────────────────────
2024-01-01 ─┼───────┼────┤ Alice @ TechCorp │
│ │ │ (valid: Jan1-Jun1) │
│ │ │ (recorded: Jan5+) │
2024-06-01 ─┼───────┼────┼────────────────┬───────────┘
│ │ │ │
│ │ │ ┌───────────┴─────────────
│ │ │ │ Alice @ StartupXYZ │
│ │ │ │ (valid: Jun1+) │
│ │ │ │ (recorded: Jun3+) │
NOW ─┼───────┼────┼────┼────────────────────────
│ │ │ │
│ │ │ │
┌─────────────────────────────────────────────────────────┐
│ The shaded regions show what we know and when. │
│ │
│ • Before Jan 5: We know nothing about Alice's job │
│ • Jan 5 - Jun 3: We know Alice @ TechCorp │
│ • After Jun 3: We know Alice @ StartupXYZ │
│ (and historical fact about TechCorp) │
└─────────────────────────────────────────────────────────┘
How It Works (Step-by-Step)
- Receive new fact: “Alice works at StartupXYZ”
- Check for conflicts: Query existing facts about Alice’s employment
- Invalidate old fact: Set valid_to = now on TechCorp relationship
- Create new fact: Insert StartupXYZ relationship with valid_from = now
- Preserve history: Both records remain, queryable by time
Invariants:
- Validity intervals should not overlap for same-type relationships (can’t work at two companies simultaneously—unless modeled as separate relationships)
- Transaction time always moves forward (records are never backdated)
- valid_from is always set; valid_to is NULL for current facts
Failure Modes:
- Clock skew: Different systems recording with different timestamps
- Retroactive changes: “Actually, I started on Dec 15, not Jan 1”—valid_from needs correction
- Merge conflicts: Two agents learning different facts simultaneously
Minimal Concrete Example
# Bi-temporal edge schema
# (graph is a placeholder for your graph client)
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class BiTemporalEdge:
    source: str                   # Source node ID
    target: str                   # Target node ID
    relationship: str             # Edge type (WORKS_AT, KNOWS, etc.)
    valid_from: datetime          # When fact became true
    valid_to: Optional[datetime]  # When fact stopped being true (None = current)
    txn_from: datetime            # When we recorded this
    txn_to: Optional[datetime]    # When we superseded this (None = current)
    properties: dict = field(default_factory=dict)  # Additional metadata

# Creating an edge
def create_edge(source, rel, target, valid_time=None):
    now = datetime.now(timezone.utc)
    edge = BiTemporalEdge(
        source=source,
        target=target,
        relationship=rel,
        valid_from=valid_time or now,
        valid_to=None,  # Current
        txn_from=now,
        txn_to=None,    # Current
    )
    graph.insert(edge)
    return edge

# Invalidating an edge (when fact changes)
def invalidate_edge(edge, valid_end_time=None):
    edge.valid_to = valid_end_time or datetime.now(timezone.utc)
    graph.update(edge)

# Point-in-time query
def query_at_time(entity, relationship, at_time):
    # [r:$relationship] is not valid Cypher, so match any relationship
    # and filter on type(r) instead.
    return graph.query("""
        MATCH (e {id: $entity})-[r]->(target)
        WHERE type(r) = $relationship
          AND r.valid_from <= $at_time
          AND (r.valid_to IS NULL OR r.valid_to > $at_time)
          AND r.txn_to IS NULL  // Current knowledge
        RETURN target
    """, entity=entity, relationship=relationship, at_time=at_time)
Common Misconceptions
- “Just use updated_at”: A single timestamp loses history. You can’t answer “what did we believe last month?” or “when did this fact change?”
- “Delete old records”: Deleting loses provenance. In AI agents, you need to know what changed and when for debugging and auditing.
- “Bi-temporal is overkill”: For simple chatbots, yes. For agents operating over weeks/months where facts change, bi-temporal prevents subtle bugs and enables powerful queries.
Check-Your-Understanding Questions
- What’s the difference between “Where did Alice work on March 1?” and “What did we believe Alice’s job was on March 1?”
- If a user says “Actually, I started at TechCorp in December, not January”, which timestamp do you update?
- Why would valid_from and txn_from ever be different?
- How do you model “Alice works at TechCorp AND moonlights at StartupXYZ”?
Check-Your-Understanding Answers
- First is a valid-time query (what was true in reality). Second is an as-of query (what did the system know). They can differ if we learned about her job after the fact.
- Correct valid_from to December (fixing the real-world timeline) by writing a new version of the record with txn_from = now (when we learned the correction), and close the old version by setting its txn_to = now so the earlier belief stays queryable. This is called a “retroactive update.”
- Late data arrival. If Alice tells you in June that she started in January, valid_from = January, txn_from = June. The system didn’t know in January; it learned in June.
- Separate relationships or different relationship types. Option A: Two WORKS_AT edges with overlapping validity (if your model allows). Option B: PRIMARY_EMPLOYER vs SECONDARY_EMPLOYER relationship types. The right choice depends on your schema design.
Real-World Applications
- Financial systems: Audit trails, regulatory compliance (what did we believe when we made a trade?)
- Healthcare: Medical history with corrections (when was the diagnosis made vs. corrected?)
- Legal: Contract validity periods, retroactive amendments
- AI agents: Tracking user beliefs, preferences, and facts as they evolve
Where You’ll Apply It
- Project 5: Implement bi-temporal edge storage
- Project 6: Build temporal query interface
- Project 7: Handle contradictions with automatic invalidation
- Project 9: Use Graphiti’s built-in bi-temporal model
References
- Snodgrass, R. “Developing Time-Oriented Database Applications in SQL” (free online)
- ISO 9075:2011 (SQL:2011) - Temporal database extensions
- “Zep: A Temporal Knowledge Graph Architecture” - Section on bi-temporal model
Key Insight
Bi-temporal models separate “when it happened” from “when we knew.” This enables point-in-time queries, historical auditing, and graceful handling of contradictions—essential for AI agents operating over extended time.
Summary
Bi-temporal data models track two independent time dimensions: valid time (real-world truth) and transaction time (system recording). This enables queries like “what was true at T?” and “what did we believe at T?” For AI agent memory, bi-temporal models handle fact evolution (user changed jobs), late data arrival (learned about past events), and contradictions (user corrected earlier statement). Every edge carries four timestamps: valid_from, valid_to, txn_from, txn_to.
Homework/Exercises
- Exercise 1: Model this scenario with bi-temporal records:
- Jan 1: Alice is hired at TechCorp
- Jan 5: Agent learns about Alice’s job
- Mar 1: Alice gets promoted to VP
- Mar 3: Agent learns about the promotion
- Jun 1: Alice leaves TechCorp
- Jun 2: Agent learns Alice left
- Exercise 2: Write queries for: (a) “What was Alice’s title on Feb 15?”, (b) “What did the system believe on Feb 15?”, (c) “When did we learn Alice became VP?”
- Exercise 3: How would you handle: “Actually, my promotion was effective Feb 1, not Mar 1” (received on Jun 5)?
Solutions to Homework/Exercises
- Solution to Exercise 1:
```
Edge 1: (Alice)-[EMPLOYED_BY]->(TechCorp)
  valid_from: Jan 1, valid_to: Jun 1
  txn_from: Jan 5, txn_to: NULL

Edge 2: (Alice)-[HAS_TITLE]->(Employee)
  valid_from: Jan 1, valid_to: Mar 1
  txn_from: Jan 5, txn_to: NULL

Edge 3: (Alice)-[HAS_TITLE]->(VP)
  valid_from: Mar 1, valid_to: Jun 1
  txn_from: Mar 3, txn_to: NULL
```
- Solution to Exercise 2:
```python
# (a) What was Alice's title on Feb 15? (valid-time query)
# Answer: Employee (Feb 15 is between Jan 1 and Mar 1)
# (b) What did system believe on Feb 15? (as-of query)
# Answer: Employee (txn_from Jan 5 < Feb 15, txn_to NULL)
# (c) When did we learn Alice became VP?
# Answer: Mar 3 (txn_from of the VP title edge)
```
- Solution to Exercise 3:
```
Create new edge with corrected valid_from:

Edge 4: (Alice)-[HAS_TITLE]->(VP)
  valid_from: Feb 1    # Corrected date
  valid_to: Jun 1
  txn_from: Jun 5      # When we learned the correction
  txn_to: NULL

Optionally mark old edge as superseded:

Edge 3: txn_to = Jun 5    # This version is no longer current
```
Now the system knows:
- VP was valid from Feb 1 (correct)
- We first learned about VP on Mar 3 (original record)
- We learned the correct start date on Jun 5 (corrected record)
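The correction in Exercise 3 can be sketched in Python as follows. The `Edge` class and both functions are hypothetical, the year 2024 is assumed because the exercise gives only month and day, and the VP edge is copied from the solution to Exercise 1.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    title: str
    valid_from: date
    valid_to: Optional[date]
    txn_from: date
    txn_to: Optional[date] = None


def correct_valid_from(edges: list[Edge], old: Edge, new_valid_from: date,
                       learned_on: date) -> list[Edge]:
    """Supersede `old` with a corrected version; history is never edited in place."""
    superseded = replace(old, txn_to=learned_on)
    corrected = replace(old, valid_from=new_valid_from,
                        txn_from=learned_on, txn_to=None)
    return [superseded if e is old else e for e in edges] + [corrected]


def title_as_known(edges: list[Edge], valid_at: date, known_at: date) -> list[str]:
    """Titles valid at `valid_at` according to what was recorded by `known_at`."""
    return [
        e.title for e in edges
        if e.valid_from <= valid_at and (e.valid_to is None or valid_at < e.valid_to)
        and e.txn_from <= known_at and (e.txn_to is None or known_at < e.txn_to)
    ]


vp = Edge("VP", date(2024, 3, 1), date(2024, 6, 1), txn_from=date(2024, 3, 3))
edges = correct_valid_from([vp], vp, date(2024, 2, 1), learned_on=date(2024, 6, 5))

print(title_as_known(edges, date(2024, 2, 15), known_at=date(2024, 6, 10)))  # ['VP']
print(title_as_known(edges, date(2024, 2, 15), known_at=date(2024, 4, 1)))   # []
```

The second query returns nothing because, as of April 1, the system still believed the promotion started on March 1. A full solution would apply the matching correction to the Employee edge (ending it on Feb 1) so the two title intervals do not overlap.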
Chapter 4: Graph Databases for AI Memory
Fundamentals
Graph databases store and query data as nodes and edges, optimizing for traversal operations. For AI agent memory, graph databases provide:
- Efficient relationship traversal: Follow edges without expensive joins
- Flexible schema: Add new relationship types without migrations
- Pattern matching: Find complex subgraph structures
- Index-free adjacency: Each node stores direct pointers to neighbors
The dominant graph databases for AI memory are Neo4j (property graph, Cypher query language), FalkorDB (Redis-based, fast), and Amazon Neptune (managed, RDF and property graph).
Deep Dive
Why Not Just Use PostgreSQL?
You can model graphs in SQL:
```sql
CREATE TABLE nodes (id INT, label VARCHAR, properties JSONB);
CREATE TABLE edges (source INT, target INT, type VARCHAR, properties JSONB);
```
But graph traversal becomes expensive:
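-- Find friends of friends of friends
-- (assumes each person's name is stored under the 'name' key of nodes.properties)
SELECT DISTINCT f3.*
FROM nodes a
JOIN edges e1 ON e1.source = a.id
JOIN edges e2 ON e2.source = e1.target
JOIN edges e3 ON e3.source = e2.target
JOIN nodes f3 ON f3.id = e3.target
WHERE a.properties->>'name' = 'Alice'
AND e1.type = 'FRIEND'
AND e2.type = 'FRIEND'
AND e3.type = 'FRIEND';
Each JOIN costs an index lookup per intermediate row (O(log n)), or a full scan (O(n)) when no index exists, and the intermediate result grows with every hop. With millions of edges, deep traversals become prohibitive.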
Graph databases use index-free adjacency: each node physically stores pointers to its neighbors. Traversing an edge is O(1), making deep traversals efficient.
INDEX-FREE ADJACENCY
SQL (with indexes):
┌─────────────────────────────────────────────────────────────┐
│ To find Alice's friends: │
│ 1. Look up Alice in nodes table (index: O(log n)) │
│ 2. Scan edges table for source=Alice (index: O(log n)) │
│ 3. For each friend, look up in nodes table │
│ Total: O(k * log n) where k = number of friends │
└─────────────────────────────────────────────────────────────┘
Graph DB (index-free adjacency):
┌─────────────────────────────────────────────────────────────┐
│ Alice node contains: [ptr_to_Bob, ptr_to_Carol, ...] │
│ To find friends: │
│ 1. Look up Alice (index: O(log n)) │
│ 2. Follow pointers directly (O(k)) │
│ Total: O(log n + k), dominated by O(k) for local queries │
└─────────────────────────────────────────────────────────────┘
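The contrast in the diagram can be imitated in plain Python. This is an analogy only: a Python dict is hash-based rather than pointer-based, and the names below are invented for the sketch. What it does show is the difference between filtering a shared edge list on every hop and following per-node neighbor lists.

```python
from collections import defaultdict

edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"), ("Carol", "Eve")]


# "Edge table" style: every hop filters the full edge list (no index).
def neighbors_scan(node: str) -> list[str]:
    return [dst for src, dst in edges if src == node]


# Adjacency-list style: each node holds direct references to its neighbors.
adjacency: dict[str, list[str]] = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)


def neighbors_adjacent(node: str) -> list[str]:
    return adjacency.get(node, [])


def friends_of_friends(node: str, neighbors) -> set[str]:
    """Two-hop traversal using whichever neighbor function is supplied."""
    return {fof for friend in neighbors(node) for fof in neighbors(friend)}


print(sorted(friends_of_friends("Alice", neighbors_scan)))      # ['Dave', 'Eve']
print(sorted(friends_of_friends("Alice", neighbors_adjacent)))  # ['Dave', 'Eve']
```

Both versions return the same answer. The scan version touches every edge once per visited node, while the adjacency version touches only the edges it actually follows, which is the property that keeps deep traversals cheap.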
Neo4j Architecture:
Neo4j is the most popular graph database for AI applications. Key concepts:
- Nodes: Entities with labels (types) and properties
- Relationships: Typed, directed edges with properties
- Cypher: Declarative pattern-matching query language
- APOC: Extended library of utility procedures and functions (graph algorithms are provided by the separate Graph Data Science library)
NEO4J DATA MODEL
┌─────────────────────────────────────────────────────────────┐
│ │
│ Node: Relationship: │
│ ┌──────────────────────┐ ┌───────────────────┐ │
│ │ (p:Person:Employee) │ │ [r:WORKS_AT] │ │
│ │ { │──────►│ { │ │
│ │ name: "Alice", │ │ since: 2023, │ │
│ │ age: 32, │ │ role: "Engineer"│ │
│ │ email: "a@..." │ │ } │ │
│ │ } │ └─────────┬─────────┘ │
│ └──────────────────────┘ │ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ (c:Company) │ │
│ │ { │ │
│ │ name:"TechCorp"│ │
│ │ } │ │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Cypher Query:
MATCH (p:Person {name: "Alice"})-[r:WORKS_AT]->(c:Company)
RETURN p.name, r.role, c.name
FalkorDB (formerly RedisGraph):
- Built on Redis, extremely fast for small-to-medium graphs
- Cypher-compatible query language
- In-memory by default, persistence optional
- Good for real-time agent applications
Amazon Neptune:
- Managed service, scales automatically
- Supports both RDF (SPARQL) and property graph (Gremlin, openCypher)
- Integrates with AWS ecosystem
- Higher latency but fully managed
How This Fits in Projects
Projects 1-2 teach Neo4j basics. Project 9 uses Neo4j with Graphiti. Projects 13-15 may explore FalkorDB for performance optimization.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Property Graph | Graph model where nodes/edges have typed properties |
| Cypher | Neo4j’s declarative graph query language |
| Index-Free Adjacency | Architecture where nodes store direct pointers to neighbors |
| APOC | Neo4j’s extended library of utility procedures and functions |
| Gremlin | Apache TinkerPop’s graph traversal language |
| Graph Traversal | Walking through a graph following edges |
| Pattern Matching | Finding subgraphs that match a specified structure |
Mental Model Diagram
GRAPH DATABASE QUERY EXECUTION
Query: MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.name = "Alice"
RETURN b.name, c.name
Step 1: Index Lookup
┌─────────────────────────────────────────────────────────┐
│ Find node where label=Person AND name="Alice" │
│ → Index returns: Node #42 │
└─────────────────────────────────────────────────────────┘
│
▼
Step 2: Traverse KNOWS edges (index-free)
┌─────────────────────────────────────────────────────────┐
│ Node #42 (Alice) has outgoing KNOWS edges to: │
│ → Node #17 (Bob) │
│ → Node #23 (Carol) │
│ → Node #56 (Dave) │
└─────────────────────────────────────────────────────────┘
│
▼
Step 3: Traverse WORKS_AT edges from each
┌─────────────────────────────────────────────────────────┐
│ Node #17 (Bob) → WORKS_AT → Node #89 (TechCorp) │
│ Node #23 (Carol) → WORKS_AT → Node #89 (TechCorp) │
│ Node #56 (Dave) → WORKS_AT → Node #91 (StartupXYZ) │
└─────────────────────────────────────────────────────────┘
│
▼
Step 4: Return results
┌─────────────────────────────────────────────────────────┐
│ | b.name | c.name | │
│ |---------|------------| │
│ | Bob | TechCorp | │
│ | Carol | TechCorp | │
│ | Dave | StartupXYZ | │
└─────────────────────────────────────────────────────────┘
Total operations: 1 index lookup + 3 KNOWS traversals + 3 WORKS_AT traversals = 7
SQL equivalent would require: 2 index lookups + 2 hash joins = O(n) or worse
How It Works (Step-by-Step)
- Parse query: Convert Cypher text to abstract syntax tree (AST)
- Plan execution: Optimizer chooses traversal order, index usage
- Index lookup: Find starting nodes using property indexes
- Traverse: Follow edges using index-free adjacency
- Filter: Apply WHERE conditions at each step
- Collect results: Gather matching paths into result set
- Return: Project requested properties from matched nodes/edges
Invariants:
- All relationships have exactly one type
- Relationships are always directed (though you can query both directions)
- Node labels and relationship types are case-sensitive
Failure Modes:
- Cartesian products: Forgetting to connect patterns causes explosion
- Missing indexes: Full scans on large graphs are slow
- Unbounded variable-length paths: [:KNOWS*] with no limit can explode
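To see why a hop limit matters, here is a small breadth-first sketch over a hypothetical KNOWS graph containing a cycle. It is analogous to a bounded pattern such as [:KNOWS*1..3] but not identical: Cypher enumerates paths, while this function only reports which nodes are reachable and at what distance.

```python
from collections import deque


def reachable_within(graph: dict[str, list[str]], start: str,
                     max_hops: int) -> dict[str, int]:
    """Breadth-first search that stops expanding after max_hops edges.

    Returns each reached node with its hop distance from start.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # hop limit reached; do not expand further
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # visited check also guards against cycles
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


# Carol knows Alice, so the graph has a cycle Alice -> Bob -> Carol -> Alice.
knows = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave", "Alice"],
    "Dave": ["Eve"],
}
print(reachable_within(knows, "Alice", 2))  # {'Bob': 1, 'Carol': 2}
print(reachable_within(knows, "Alice", 3))  # {'Bob': 1, 'Carol': 2, 'Dave': 3}
```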
Minimal Concrete Example
# Neo4j Python driver example
from neo4j import GraphDatabase

# Connect
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

# Create nodes and relationships
with driver.session() as session:
    session.run("""
        CREATE (alice:Person {name: 'Alice', age: 32})
        CREATE (bob:Person {name: 'Bob', age: 28})
        CREATE (techcorp:Company {name: 'TechCorp'})
        CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
        CREATE (alice)-[:WORKS_AT {role: 'Engineer'}]->(techcorp)
        CREATE (bob)-[:WORKS_AT {role: 'Manager'}]->(techcorp)
    """)

# Query: Find Alice's coworkers
with driver.session() as session:
    result = session.run("""
        MATCH (alice:Person {name: 'Alice'})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
        WHERE coworker <> alice
        RETURN coworker.name, company.name
    """)
    for record in result:
        print(f"{record['coworker.name']} works at {record['company.name']}")
# Output: Bob works at TechCorp

driver.close()
Common Misconceptions
- “Graphs are only for social networks”: False. Any connected data benefits: recommendations, fraud detection, knowledge bases, AI memory.
- “Graph queries are complex”: Cypher is actually more intuitive than SQL for connected data. (a)-[:KNOWS]->(b) is clearer than JOIN syntax.
- “Graph DBs don’t scale”: Neo4j handles billions of nodes. For truly massive scale, distributed graph DBs (TigerGraph, Dgraph) exist.
Check-Your-Understanding Questions
- Why is index-free adjacency faster for graph traversal than SQL joins?
- What happens if you write MATCH (a), (b) RETURN a, b in Cypher?
- When would you use FalkorDB instead of Neo4j?
- How do you prevent unbounded traversal explosion in Cypher?
Check-Your-Understanding Answers
- SQL joins require looking up rows through indexes for each hop. Graph DBs store direct pointers to neighbors, making each hop O(1) instead of O(log n).
- Cartesian product. It matches every node a with every node b. If you have 1000 nodes, you get 1,000,000 results. Always connect your patterns with relationships.
- Low latency requirements (FalkorDB is faster for small graphs), Redis ecosystem integration, or simpler deployment (single binary). Neo4j is better for complex queries, larger graphs, and enterprise features.
- Use bounded variable-length paths: [:KNOWS*1..3] limits to 1-3 hops. Or use APOC procedures with termination conditions. Never use unbounded * on large graphs.
Real-World Applications
- Recommendation systems: Netflix, Amazon use graphs for collaborative filtering
- Fraud detection: Banks model transaction networks to find suspicious patterns
- Knowledge management: Enterprise knowledge bases linking documents, people, concepts
- AI agents: Storing extracted entities and relationships from conversations
Where You’ll Apply It
- Project 1: Set up Neo4j, create basic schema
- Project 2: Implement Cypher queries for entity lookup
- Project 5: Add temporal properties to edges
- Project 9: Use Neo4j with Graphiti framework
- Project 13: Compare Neo4j vs FalkorDB performance
References
- “Graph Databases” by Robinson, Webber, Eifrem (O’Reilly) - Definitive introduction
- Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/current/
- FalkorDB Documentation: https://docs.falkordb.com/
Key Insight
Graph databases optimize for connection traversal, making them ideal for AI memory where you need to query “who is connected to what” across multiple hops efficiently.
Summary
Graph databases store data as nodes and edges, optimizing for traversal through index-free adjacency. Neo4j (Cypher), FalkorDB (Redis-based), and Neptune (managed) are the primary options for AI agent memory. Graph DBs excel at pattern matching, multi-hop queries, and flexible schema evolution—all critical for storing and querying extracted entities and relationships from conversations.
Homework/Exercises
- Exercise 1: Install Neo4j locally (Docker recommended) and create a small social network: 5 people who KNOWS and WORKS_AT 2 companies.
- Exercise 2: Write Cypher queries for: (a) All people at TechCorp, (b) Friends of friends of Alice, (c) Shortest path between Alice and Eve.
- Exercise 3: Compare query performance: Run the “friends of friends” query in Neo4j vs a SQL equivalent. Time both.
Solutions to Homework/Exercises
- Solution to Exercise 1:
```bash
# Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/testpassword neo4j:5-community
# Access browser at http://localhost:7474
```
Run in Neo4j Browser:
```cypher
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})
CREATE (dave:Person {name: 'Dave'})
CREATE (eve:Person {name: 'Eve'})
CREATE (techcorp:Company {name: 'TechCorp'})
CREATE (startup:Company {name: 'StartupXYZ'})
CREATE (alice)-[:KNOWS]->(bob)
CREATE (bob)-[:KNOWS]->(carol)
CREATE (carol)-[:KNOWS]->(dave)
CREATE (dave)-[:KNOWS]->(eve)
CREATE (alice)-[:WORKS_AT]->(techcorp)
CREATE (bob)-[:WORKS_AT]->(techcorp)
CREATE (carol)-[:WORKS_AT]->(startup)
CREATE (dave)-[:WORKS_AT]->(startup)
CREATE (eve)-[:WORKS_AT]->(techcorp)
```
- Solution to Exercise 2:
```cypher
// (a) All people at TechCorp
MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'TechCorp'})
RETURN p.name

// (b) Friends of friends of Alice (2 hops)
MATCH (alice:Person {name: 'Alice'})-[:KNOWS*2]->(fof:Person)
RETURN DISTINCT fof.name

// (c) Shortest path between Alice and Eve
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:KNOWS*]-(eve:Person {name: 'Eve'})
)
RETURN path, length(path)
```
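- Solution to Exercise 3 (conceptual):
  - Neo4j: `MATCH (:Person {name:'Alice'})-[:KNOWS*2]->(fof) RETURN fof` runs in roughly 1-5ms
  - SQL equivalent with recursive CTE or double-JOIN: roughly 10-100ms or more, depending on indexes
  - These figures are illustrative; actual timings depend on data size, indexing, and caching, and on a five-person graph both will be fast
  - For deeper traversals (3+ hops), the gap widens significantly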
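For the SQL side of the comparison, the following is a minimal sketch using Python's built-in `sqlite3` module and the same KNOWS chain as Exercise 1. The table names (`person`, `knows`) and column names are assumptions chosen for this illustration, and an in-memory SQLite database is only a stand-in for whichever relational database you actually benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE knows (src INTEGER REFERENCES person(id),
                        dst INTEGER REFERENCES person(id));
    CREATE INDEX knows_src ON knows(src);
""")

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
conn.executemany("INSERT INTO person (name) VALUES (?)", [(n,) for n in names])

# Same directed chain as the Cypher solution: Alice -> Bob -> Carol -> Dave -> Eve
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Dave", "Eve")]
conn.executemany(
    """INSERT INTO knows (src, dst)
       SELECT a.id, b.id FROM person a, person b
       WHERE a.name = ? AND b.name = ?""",
    edges,
)

# Two hops require two self-joins on the edge table; each extra hop adds another join.
FRIENDS_OF_FRIENDS = """
    SELECT DISTINCT fof.name
    FROM person AS p
    JOIN knows AS k1 ON k1.src = p.id
    JOIN knows AS k2 ON k2.src = k1.dst
    JOIN person AS fof ON fof.id = k2.dst
    WHERE p.name = ?
"""
fof_names = [row[0] for row in conn.execute(FRIENDS_OF_FRIENDS, ("Alice",))]
print(fof_names)
conn.close()
```

To time it, wrap the `conn.execute(...)` call with `time.perf_counter()` and repeat the query many times; a single run on five rows is dominated by overhead rather than join cost.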
Chapter 5: Entity and Relationship Extraction
Fundamentals
Entity and relationship extraction is the process of identifying structured information in unstructured text. For AI agent memory, this means:
1. Entity extraction: Identifying mentions of people, organizations, concepts, events
2. Relationship extraction: Identifying how entities are connected
3. Entity resolution: Matching extracted mentions to canonical entities
Modern approaches use LLMs with structured output to perform extraction, replacing older NLP pipelines (spaCy NER, OpenIE).
Deep Dive
Traditional NLP Pipeline:
Text → Tokenize → POS Tag → NER → Dependency Parse → OpenIE → Triples
This pipeline is fast but brittle:
- NER models only recognize trained entity types (PERSON, ORG, LOCATION)
- Relationship extraction depends on grammatical patterns
- Misses implicit relationships ("I work with Sarah" doesn't explicitly name the company)
LLM-Based Extraction:
Text → LLM with extraction prompt → JSON output → Parse → Entities + Relationships
LLMs understand context, handle implicit relationships, and can extract custom entity types. The tradeoff: slower and more expensive, but typically far more accurate for open-domain extraction.
LLM EXTRACTION PIPELINE
Input: “I just finished refactoring the authentication module. Sarah helped me debug the OAuth integration with Google.”
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LLM EXTRACTION PROMPT                                           │
│                                                                 │
│ System: You are an entity and relationship extractor.           │
│ Extract all entities (people, projects, technologies,           │
│ organizations) and relationships between them.                  │
│ Return JSON with "entities" and "relationships".                │
│                                                                 │
│ User: [text above]                                              │
│                                                                 │
│ Expected output:                                                │
│ {                                                               │
│   "entities": [                                                 │
│     {"name": "User", "type": "Person"},                         │
│     {"name": "Sarah", "type": "Person"},                        │
│     {"name": "authentication_module", "type": "Project"},       │
│     {"name": "OAuth", "type": "Technology"},                    │
│     {"name": "Google", "type": "Organization"}                  │
│   ],                                                            │
│   "relationships": [                                            │
│     {"subject": "User", "predicate": "REFACTORED",              │
│      "object": "authentication_module"},                        │
│     {"subject": "Sarah", "predicate": "HELPED_DEBUG",           │
│      "object": "OAuth"},                                        │
│     {"subject": "authentication_module", "predicate": "USES",   │
│      "object": "OAuth"},                                        │
│     {"subject": "OAuth", "predicate": "INTEGRATES_WITH",        │
│      "object": "Google"}                                        │
│   ]                                                             │
│ }                                                               │
└─────────────────────────────────────────────────────────────────┘
Entity Resolution (Deduplication):
Raw extraction produces mentions like “Sarah”, “sarah”, “Sarah from engineering”, “S. Chen”. Entity resolution matches these to a single canonical entity.
Approaches:
- Exact match: Normalize case, strip titles
- Fuzzy match: Levenshtein distance, Jaro-Winkler similarity
- Embedding similarity: Embed mentions, compare vectors
- LLM-based: Ask LLM if two mentions refer to same entity
ENTITY RESOLUTION APPROACHES
Mention: "Sarah from engineering"
1. Exact Match (after normalization):
"sarah from engineering" → No exact match
2. Fuzzy Match (Jaro-Winkler):
vs "Sarah": 0.85 (partial match)
vs "Sarah Chen": 0.78
vs "Bob": 0.30
→ Best match: "Sarah" (but below threshold 0.90)
3. Embedding Similarity:
embed("Sarah from engineering") • embed("Sarah Chen") = 0.92
→ Match! (above threshold)
4. LLM Verification:
"Do 'Sarah from engineering' and 'Sarah Chen' refer to the same person
given context about a tech company?"
→ "Yes, likely the same person given engineering context"
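The first two stages of this cascade can be sketched with the Python standard library alone. Note the assumptions: `difflib.SequenceMatcher` is used here as a stand-in similarity measure (the standard library has no Jaro-Winkler, so the scores will differ from the figures in the diagram), and `normalize`, `similarity`, `resolve`, and the `known` list are hypothetical names for this illustration. A return value of `None` marks the point where a real pipeline would escalate to embedding similarity or LLM verification.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(mention.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings (stand-in for Jaro-Winkler)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(mention: str, canonical: list, threshold: float = 0.85) -> Optional[str]:
    """Return the matching canonical name, or None if cheap string matching is inconclusive."""
    norm = normalize(mention)
    # Stage 1: exact match after normalization
    for name in canonical:
        if normalize(name) == norm:
            return name
    # Stage 2: fuzzy match against every canonical name, keep the best one
    best = max(canonical, key=lambda name: similarity(mention, name), default=None)
    if best is not None and similarity(mention, best) >= threshold:
        return best
    # Stages 3 and 4 (embeddings, LLM verification) would run here
    return None


known = ["Sarah Chen", "Bob Ortiz"]
print(resolve("  sarah   CHEN ", known))         # exact match after normalization
print(resolve("Sara Chen", known))               # small typo, caught by fuzzy matching
print(resolve("Sarah from engineering", known))  # too different as a string: escalate
```

Ordering the stages from cheapest to most expensive keeps most mentions away from the embedding and LLM calls, which is the main reason to structure resolution as a cascade.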
Relationship Types:
For AI agent memory, common relationship types include:
- WORKS_ON: Person → Project
- WORKS_AT: Person → Organization
- KNOWS: Person → Person
- USES: Project → Technology
- PREFERS: User → Concept/Technology
- DISCUSSED: Conversation → Topic
- MENTIONED_IN: Entity → Episode
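One way to put this list to work is as a schema check applied to each extracted relationship before it is written to the graph. The sketch below is an assumption about how such a check might look, not part of any framework: `RELATIONSHIP_SCHEMA`, `ANY`, and `is_valid_edge` are hypothetical names, "User" is modeled as a `Person`, and "Entity" in MENTIONED_IN is treated as a wildcard.

```python
ANY = "*"  # wildcard: any entity type is accepted in this position

# predicate -> (allowed subject types, allowed object types)
RELATIONSHIP_SCHEMA = {
    "WORKS_ON": ({"Person"}, {"Project"}),
    "WORKS_AT": ({"Person"}, {"Organization"}),
    "KNOWS": ({"Person"}, {"Person"}),
    "USES": ({"Project"}, {"Technology"}),
    "PREFERS": ({"Person"}, {"Concept", "Technology"}),
    "DISCUSSED": ({"Conversation"}, {"Topic"}),
    "MENTIONED_IN": ({ANY}, {"Episode"}),
}


def is_valid_edge(predicate: str, subject_type: str, object_type: str) -> bool:
    """Check an extracted relationship against the allowed endpoint types."""
    if predicate not in RELATIONSHIP_SCHEMA:
        return False  # unknown predicate: reject or route to manual review
    allowed_subjects, allowed_objects = RELATIONSHIP_SCHEMA[predicate]
    subject_ok = ANY in allowed_subjects or subject_type in allowed_subjects
    object_ok = ANY in allowed_objects or object_type in allowed_objects
    return subject_ok and object_ok


print(is_valid_edge("WORKS_AT", "Person", "Organization"))   # True
print(is_valid_edge("WORKS_AT", "Project", "Organization"))  # False: wrong subject type
print(is_valid_edge("MENTIONED_IN", "Technology", "Episode"))  # True via wildcard
```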
How This Fits in Projects
Project 4 builds the extraction pipeline. Project 6 adds entity resolution. Projects 9-12 show how frameworks like Graphiti handle extraction automatically.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Entity Extraction | Identifying named things (people, places, concepts) in text |
| Relationship Extraction | Identifying connections between entities |
| Entity Resolution | Matching mentions to canonical entities (deduplication) |
| Triple | (subject, predicate, object) fact structure |
| Named Entity Recognition (NER) | Traditional ML approach to entity extraction |
| Coreference Resolution | Linking pronouns to their referents (“she” → “Sarah”) |
| Structured Output | LLM response in a predefined format (JSON, schema) |
Mental Model Diagram
EXTRACTION PIPELINE STAGES
┌──────────────────────────────────────────────────────┐
│ RAW CONVERSATION TEXT │
│ │
│ "I'm working on the payments service with Alice. │
│ We're integrating Stripe for payment processing. │
│ Alice is handling the webhook endpoints." │
│ │
└────────────────────────┬─────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ STAGE 1: ENTITY EXTRACTION │
│ │
│ Entities found: │
│ • "I" → User (Person) │
│ • "payments service" → Project │
│ • "Alice" → Person │
│ • "Stripe" → Technology/Service │
│ • "webhook endpoints" → Component │
└────────────────────────┬─────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ STAGE 2: RELATIONSHIP EXTRACTION │
│ │
│ Relationships found: │
│ • User WORKS_ON payments_service │
│ • Alice WORKS_ON payments_service │
│ • User COLLABORATES_WITH Alice │
│ • payments_service INTEGRATES Stripe │
│ • Alice HANDLES webhook_endpoints │
│ • webhook_endpoints PART_OF payments_service │
└────────────────────────┬─────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ STAGE 3: ENTITY RESOLUTION │
│ │
│ Check existing entities: │
│ • "Alice" → Match: Alice Chen (existing node #42) │
│ • "payments service" → Match: payments_api (node │
│ #89, alias added) │
│ • "Stripe" → No match → Create new node │
│ • "webhook endpoints" → No match → Create new node │
└────────────────────────┬─────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ STAGE 4: GRAPH UPDATE │
│ │
│ (User)──────WORKS_ON──────►(payments_api) │
│ │ │
│ (Alice#42)──WORKS_ON────────────►│ │
│ │ │ │
│ │ INTEGRATES │
│ HANDLES │ │
│ │ ▼ │
│ ▼ (Stripe) [NEW] │
│ (webhook_endpoints) [NEW] │
│ │ │
│ PART_OF │
│ │ │
│ └──────────────►(payments_api) │
└──────────────────────────────────────────────────────┘
How It Works (Step-by-Step)
- Receive text: Conversation turn arrives for processing
- Construct prompt: Build extraction prompt with schema and examples
- Call LLM: Send prompt, receive structured JSON response
- Parse response: Extract entities and relationships from JSON
- Validate: Check for required fields, reasonable types
- Resolve entities: For each entity, find or create canonical node
- Create edges: Add relationships to graph with timestamps
- Link source: Connect new facts to source episode
Invariants:
- Every relationship has exactly one subject and one object
- Entity names should be normalized (consistent casing, no extra whitespace)
- Confidence scores should be between 0 and 1
Failure Modes:
- Hallucinated entities: LLM invents entities not in the text
- Over-extraction: Creating entities for every noun
- Under-extraction: Missing implicit relationships
- Resolution errors: Merging distinct entities or splitting one entity
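The validation step and the invariants above can be sketched as a small checker that runs on the parsed JSON before anything touches the graph. This is a minimal illustration under stated assumptions: `validate_extraction` is a hypothetical helper, the optional `confidence` field is assumed to live on each entity, and the hallucination check is a crude substring test that only flags candidates for review (it would wrongly flag a legitimate entity such as "User" derived from the pronoun "I").

```python
def validate_extraction(extraction: dict, source_text: str) -> list:
    """Return a list of human-readable problems; an empty list means the output passed."""
    problems = []
    names = set()
    text_lower = source_text.lower()

    for entity in extraction.get("entities", []):
        name = entity.get("name", "")
        if not name or not entity.get("type"):
            problems.append(f"entity missing name or type: {entity!r}")
            continue
        names.add(name)
        confidence = entity.get("confidence")
        if confidence is not None and not 0 <= confidence <= 1:
            problems.append(f"confidence out of range for {name!r}: {confidence}")
        # Crude hallucination heuristic: the name should appear somewhere in the text
        if name.lower() not in text_lower:
            problems.append(f"possible hallucination: {name!r} not found in text")

    for rel in extraction.get("relationships", []):
        if not rel.get("predicate"):
            problems.append(f"relationship missing predicate: {rel!r}")
        # Every relationship needs exactly one subject and one object, both known entities
        for role in ("subject", "object"):
            if rel.get(role) not in names:
                problems.append(f"relationship {role} is not an extracted entity: {rel!r}")

    return problems


text = "Sarah helped me debug OAuth."
extraction = {
    "entities": [
        {"name": "Sarah", "type": "Person"},
        {"name": "OAuth", "type": "Technology"},
        {"name": "Kubernetes", "type": "Technology"},  # not in the text
    ],
    "relationships": [
        {"subject": "Sarah", "predicate": "HELPED_DEBUG", "object": "OAuth"},
        {"subject": "Sarah", "predicate": "USES", "object": "Redis"},  # unknown object
    ],
}
problems = validate_extraction(extraction, text)
for problem in problems:
    print(problem)
```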
Minimal Concrete Example
# LLM-based extraction with OpenAI
# Note: `graph` is a placeholder storage interface, not a specific library.
import json
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Literal JSON braces are doubled so str.format() only substitutes {text}
EXTRACTION_PROMPT = """
Extract entities and relationships from this conversation.
Entities should be: Person, Project, Technology, Organization, Concept
Relationships should be: WORKS_ON, WORKS_AT, USES, KNOWS, PREFERS, DISCUSSED
Return JSON:
{{
  "entities": [{{"name": "...", "type": "..."}}],
  "relationships": [{{"subject": "...", "predicate": "...", "object": "..."}}]
}}
Text: {text}
"""

def fuzzy_match(a: str, b: str) -> float:
    # Stand-in similarity in [0, 1]; swap in Jaro-Winkler or embeddings as needed
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def extract_from_text(text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You extract structured data from text."},
            {"role": "user", "content": EXTRACTION_PROMPT.format(text=text)},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def resolve_entity(name: str, entity_type: str, graph) -> str:
    normalized = " ".join(name.lower().split())
    # Try exact match on the normalized name
    existing = graph.find_node(name=normalized, type=entity_type)
    if existing:
        return existing.id
    # Try fuzzy match
    candidates = graph.find_nodes_by_type(entity_type)
    for candidate in candidates:
        if fuzzy_match(normalized, candidate.name) > 0.85:
            # Add alias
            graph.add_alias(candidate.id, name)
            return candidate.id
    # Create new, stored under the normalized name so later exact matches succeed
    new_node = graph.create_node(name=normalized, type=entity_type)
    return new_node.id

def process_conversation(text: str, episode_id: str, graph):
    # Extract
    extraction = extract_from_text(text)
    # Resolve entities
    entity_map = {}
    for entity in extraction.get("entities", []):
        node_id = resolve_entity(entity["name"], entity["type"], graph)
        entity_map[entity["name"]] = node_id
    # Create relationships
    for rel in extraction.get("relationships", []):
        source = entity_map.get(rel["subject"])
        target = entity_map.get(rel["object"])
        if source is None or target is None:
            continue  # skip relationships that reference an unextracted entity
        graph.create_edge(
            source=source,
            target=target,
            type=rel["predicate"],
            source_episode=episode_id,
        )
Common Misconceptions
- “NER is sufficient”: Traditional NER only finds predefined types (PERSON, ORG, GPE). LLMs can extract domain-specific entities (Project, Technology, Concept).
- “More extraction is better”: Over-extraction creates noise. Focus on salient entities: things the user might ask about later.
- “Entity resolution is optional”: Without resolution, “Alice”, “alice@company.com”, and “Alice Smith” become three separate entities, fragmenting the knowledge graph.
Check-Your-Understanding Questions
- Why would you use LLM extraction over spaCy NER?
- Given “Alice and Bob are working on the auth module”, what entities and relationships would you extract?
- How do you handle “She finished the project” when “she” refers to Alice mentioned earlier?
- What’s the risk of entity resolution being too aggressive (merging too much)?
Check-Your-Understanding Answers
- Flexibility and context. spaCy NER only finds types it was trained on. LLMs can extract custom types (Project, Technology) and understand implicit relationships (“I work with Bob” implies shared workplace).
- Entities: Alice (Person), Bob (Person), auth_module (Project). Relationships: Alice WORKS_ON auth_module, Bob WORKS_ON auth_module, Alice COLLABORATES_WITH Bob.
- Coreference resolution. Either use a coreference model to replace “She” with “Alice” before extraction, or ensure your extraction prompt includes the full conversation context so the LLM can resolve pronouns.
- False merges. Two different people named “John” become one entity, mixing up their facts. You lose the ability to distinguish them. Better to under-merge (with aliases) than over-merge.
Real-World Applications
- Enterprise search: Extracting entities from documents for knowledge management
- News analysis: Building knowledge graphs from news articles
- Biomedical NLP: Extracting drug-gene-disease relationships
- AI agents: Building memory graphs from conversations
Where You’ll Apply It
- Project 4: Build extraction pipeline from scratch
- Project 6: Add entity resolution with fuzzy matching
- Project 9: Use Graphiti’s built-in extraction
- Project 10: Use Mem0’s extraction pipeline
References
- “KGGen: Extracting Knowledge Graphs from Plain Text with Language Models” (2025)
- Neo4j LLM Knowledge Graph Builder: https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
- Relik framework for entity linking: https://github.com/SapienzaNLP/relik
Key Insight
Entity and relationship extraction transforms unstructured conversation into structured knowledge. LLMs provide flexibility; entity resolution ensures consistency. Together, they build the semantic memory layer.
Summary
Entity and relationship extraction converts raw text into structured triples (subject, predicate, object). LLM-based extraction is more flexible than traditional NER, handling custom entity types and implicit relationships. Entity resolution matches extracted mentions to canonical nodes, preventing fragmentation. The extraction pipeline processes each conversation turn, building up the semantic memory graph incrementally.
Homework/Exercises
- Exercise 1: Write an extraction prompt that extracts software development entities (Developer, Task, Bug, Feature, Technology, Repository).
- Exercise 2: Given the text “John fixed the login bug that was causing issues for the mobile app”, extract all entities and relationships.
- Exercise 3: Design an entity resolution strategy that handles: email addresses (john@company.com), usernames (@john_dev), full names (John Smith), and nicknames (Johnny).
Solutions to Homework/Exercises
- Solution to Exercise 1:
```
Extract software development entities and relationships from this text.

Entity types:
- Developer: A person who writes code
- Task: A unit of work (ticket, story, todo)
- Bug: A defect or issue
- Feature: A product capability
- Technology: A language, framework, tool
- Repository: A code repository

Relationship types:
- WORKS_ON: Developer → Task/Bug/Feature
- FIXED: Developer → Bug
- IMPLEMENTED: Developer → Feature
- USES: Repository/Feature → Technology
- BLOCKED_BY: Task → Bug
- PART_OF: Bug/Feature → Repository

Return JSON with "entities" and "relationships" arrays.
```
- Solution to Exercise 2:
```json
{
  "entities": [
    {"name": "John", "type": "Developer"},
    {"name": "login_bug", "type": "Bug"},
    {"name": "mobile_app", "type": "Project"}
  ],
  "relationships": [
    {"subject": "John", "predicate": "FIXED", "object": "login_bug"},
    {"subject": "login_bug", "predicate": "AFFECTED", "object": "mobile_app"}
  ]
}
```
- Solution to Exercise 3:
# normalize_name(), fuzzy_match(), and the graph methods are assumed helpers
import re

def resolve_person(mention: str, context: str, graph) -> str:
    # Extract potential identifiers
    email_pattern = r'[\w.-]+@[\w.-]+\.\w+'
    username_pattern = r'@\w+'
    # Check if mention is an email
    if re.fullmatch(email_pattern, mention):
        existing = graph.find_by_property("email", mention)
        if existing:
            return existing.id
    # Check if mention is a username
    if re.fullmatch(username_pattern, mention):
        existing = graph.find_by_property("username", mention)
        if existing:
            return existing.id
    # Normalize name
    normalized = normalize_name(mention)  # "Johnny" → "John", case normalize
    # Fuzzy match against existing persons
    candidates = graph.find_nodes_by_type("Person")
    for candidate in candidates:
        # Check name similarity
        if fuzzy_match(normalized, candidate.name) > 0.85:
            return candidate.id
        # Check against aliases
        for alias in candidate.aliases:
            if fuzzy_match(normalized, alias) > 0.85:
                return candidate.id
    # Create new if no match
    return graph.create_node(name=normalized, type="Person").id
Chapter 6: Hybrid Retrieval for Agent Memory
Fundamentals
No single retrieval method is optimal for all queries. Hybrid retrieval combines multiple approaches:
- Semantic search: Vector similarity for conceptually related content
- Graph traversal: Follow relationships for connected entities
- Keyword search (BM25): Exact term matching for specific names/codes
For AI agent memory, hybrid retrieval lets you answer:
- “What do we know about authentication?” (semantic)
- “What projects does Alice work on?” (graph)
- “Find mentions of API_KEY_12345” (keyword)
Deep Dive
Why Hybrid?
Each retrieval method has strengths and weaknesses:
| Method | Strengths | Weaknesses |
|---|---|---|
| Semantic (Vector) | Conceptual similarity, handles paraphrasing | Misses exact matches, no temporal reasoning |
| Graph Traversal | Structured relationships, multi-hop queries | Requires schema knowledge, no fuzzy matching |
| Keyword (BM25) | Exact matches, fast, handles codes/IDs | No semantic understanding, brittle to typos |
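To show why the keyword row matters for identifiers, here is a compact BM25 scorer in plain Python. It is a sketch under stated assumptions: tokenization is naive whitespace splitting, the IDF uses the non-negative `log(1 + ...)` variant, and `bm25_scores` with its `k1` and `b` defaults is a hypothetical helper rather than the API of any search library.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with BM25 (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "rotated the credential API_KEY_12345 after the incident",
    "discussed authentication and api key rotation policy",
    "Alice prefers OAuth for login",
]
scores = bm25_scores("API_KEY_12345", docs)
print(scores)  # only the first document contains the exact token
```

The second document is topically close and might rank well under vector similarity, but it scores zero here because the exact token never appears, which is the behavior you want for codes and IDs.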
HYBRID RETRIEVAL IN ACTION
Query: "What authentication methods does Alice's project use?"
┌─────────────────────────────────────────────────────────────────┐
│ RETRIEVAL PATHS (PARALLEL) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PATH 1: SEMANTIC SEARCH │
│ │
│ embed("authentication methods Alice project") │
│ → Search episodes by embedding similarity │
│ → Results: │
│ • Episode #127: "I'm implementing OAuth for the..." (0.89)│
│ • Episode #89: "Auth module uses JWT tokens..." (0.85) │
│ • Episode #203: "Security review for login..." (0.78) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PATH 2: GRAPH TRAVERSAL │
│ │
│ MATCH (alice:Person {name: "Alice"}) │
│ -[:WORKS_ON]->(project) │
│ -[:USES]->(tech) │
│ WHERE tech.type = "Authentication" │
│ → Results: │
│ • Alice → auth_module → OAuth │
│ • Alice → auth_module → JWT │
│ • Alice → api_gateway → API_KEY │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PATH 3: KEYWORD SEARCH (BM25) │
│ │
│ Search terms: ["authentication", "Alice", "project"] │
│ → Results: │
│ • Episode #127: "Alice" + "authentication" (score: 4.2) │
│ • Episode #156: "auth" + "Alice's project" (score: 3.8) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FUSION & RERANKING │
│ │
│ Reciprocal Rank Fusion (RRF): │
│ score(doc) = Σ 1/(k + rank_i(doc)) for each retriever i │
│ │
│ Combined ranking: │
│ 1. Episode #127 (appeared in all 3 paths) │
│ 2. OAuth entity (graph + semantic) │
│ 3. Episode #89 (semantic + keyword) │
│ 4. JWT entity (graph only, but highly relevant) │
│ │
│ → Return top 5 results for LLM context │
└─────────────────────────────────────────────────────────────────┘
Result Fusion Algorithms:
Reciprocal Rank Fusion (RRF):
RRF_score(d) = Σ 1/(k + rank_i(d))
Where k is typically 60. Documents appearing in multiple result lists get higher scores.
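A short worked calculation makes the last point concrete. The sketch below assumes ranks start at 1 and uses a hypothetical helper, `rrf_score`, that takes the positions one document holds in each list where it appears.

```python
def rrf_score(ranks: list, k: int = 60) -> float:
    """RRF score of one document, given its 1-based rank in each list that contains it."""
    return sum(1 / (k + rank) for rank in ranks)


in_two_lists = rrf_score([2, 2])  # ranked 2nd by two retrievers: 1/62 + 1/62
top_of_one = rrf_score([1])       # ranked 1st by a single retriever: 1/61

print(round(in_two_lists, 4), round(top_of_one, 4))
print(in_two_lists > top_of_one)
```

With k = 60, a document ranked second by two retrievers scores about 0.0323, roughly double the 0.0164 earned by a document ranked first by only one. The large constant flattens the difference between adjacent ranks, so agreement across retrievers counts for more than a single top position.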
Maximal Marginal Relevance (MMR):
MMR = λ · Sim(d, query) - (1-λ) · max(Sim(d, selected_docs))
Balances relevance with diversity, avoiding redundant results.
Episode-Mentions Reranking (Graphiti-specific):
- Count how many extracted entities in each episode are mentioned elsewhere
- Episodes with frequently-referenced entities rank higher
- This graph-aware reranking improves precision
How This Fits in Projects
Projects 13-14 implement hybrid retrieval. Project 9 uses Graphiti’s built-in hybrid search. Project 15 optimizes retrieval performance.
Definitions & Key Terms
| Term | Definition |
|---|---|
| Hybrid Retrieval | Combining multiple retrieval methods (semantic, graph, keyword) |
| BM25 | Best Match 25, a probabilistic keyword ranking algorithm |
| Reciprocal Rank Fusion | Algorithm to combine ranked lists by reciprocal of ranks |
| MMR | Maximal Marginal Relevance, balances relevance and diversity |
| Reranking | Second-pass scoring to improve initial retrieval results |
| Top-k | Returning the k highest-scoring results |
Mental Model Diagram
HYBRID RETRIEVAL ARCHITECTURE
┌─────────────────────────────────────────────────────────┐
│ USER QUERY │
│ "What did Alice say about the API refactor?" │
└────────────────────────┬────────────────────────────────┘
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SEMANTIC │ │ GRAPH │ │ KEYWORD │
│ SEARCH │ │ TRAVERSAL │ │ (BM25) │
│ │ │ │ │ │
│ Vector index │ │ Cypher query │ │ Full-text │
│ (episodes) │ │ (entities) │ │ index │
│ │ │ │ │ │
│ Top-10 by │ │ Paths from │ │ Top-10 by │
│ similarity │ │ Alice │ │ term match │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────────────┼────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ FUSION LAYER │
│ │
│ 1. Collect results from all retrievers │
│ 2. Apply RRF to combine rankings │
│ 3. Apply MMR to reduce redundancy │
│ 4. Apply temporal filtering (if time-scoped query) │
│ 5. Apply episode-mentions reranking │
│ │
└────────────────────────┬────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ CONTEXT ASSEMBLY │
│ │
│ Format top results for LLM: │
│ • Include source attribution │
│ • Add temporal context ("discussed on Jan 15") │
│ • Respect token budget │
│ • Prioritize entity facts over raw episodes │
│ │
└─────────────────────────────────────────────────────────┘
How It Works (Step-by-Step)
- Parse query: Identify entities, temporal scope, intent
- Plan retrieval: Decide which methods to use based on query type
- Execute parallel: Run semantic, graph, keyword searches concurrently
- Collect results: Gather candidate documents/entities from each
- Score fusion: Apply RRF or similar to combine rankings
- Rerank: Apply domain-specific reranking (MMR, episode-mentions)
- Filter: Apply temporal and access control filters
- Format: Assemble context for LLM within token budget
Invariants:
- Fusion should never lose highly-ranked results from any single retriever
- Temporal filters should be applied after fusion (don’t pre-filter)
- Token budget should be enforced as late as possible
Failure Modes:
- Over-reliance on one retriever: If semantic dominates, you miss exact matches
- Fusion parameter tuning: Wrong k in RRF can hurt performance
- Ignoring entity results: Graph entities are facts, not just documents
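The final formatting step, where the token budget is enforced after fusion and reranking, might look like the sketch below. Everything here is an assumption for illustration: the `Result` dataclass, the `assemble_context` function, and especially `estimate_tokens`, which uses a rough four-characters-per-token guess in place of a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    source: str   # e.g. "Episode #127", used for attribution
    date: str     # ISO date, used for temporal context
    score: float  # fused relevance score


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (about 4 characters per token); swap in a real tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(results: list, token_budget: int) -> str:
    """Greedily add the best-scoring results that still fit within the budget."""
    lines = []
    used = 0
    for result in sorted(results, key=lambda r: r.score, reverse=True):
        line = f"[{result.source}, {result.date}] {result.text}"
        cost = estimate_tokens(line)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; a shorter result may still fit
        lines.append(line)
        used += cost
    return "\n".join(lines)


results = [
    Result("Auth module uses JWT tokens.", "Episode #89", "2025-01-15", 0.9),
    Result("Security review notes. " * 20, "Episode #127", "2025-01-20", 0.8),
    Result("Alice owns the webhook endpoints.", "Episode #203", "2025-02-01", 0.5),
]
context = assemble_context(results, token_budget=40)
print(context)
```

Skipping an oversized result instead of stopping at it lets a lower-ranked but shorter fact use the leftover budget; whether that is preferable to truncating the long result is a design choice that depends on how much partial context helps your agent.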
Minimal Concrete Example
# Hybrid retrieval implementation
# Note: embed(), cosine_sim(), fetch_documents(), and the graph, vector_store,
# and text_index objects are placeholders for your own components.
from collections import defaultdict

def hybrid_retrieve(query: str, graph, vector_store, text_index, top_k: int = 10):
    # Run the three retrievers; they are independent and can run concurrently
    semantic_results = vector_store.search(
        embed(query),
        limit=top_k * 2,  # Over-retrieve for fusion
    )
    graph_results = graph.query("""
        CALL db.index.fulltext.queryNodes('entityIndex', $search_text)
        YIELD node, score
        MATCH (node)-[r]->(related)
        RETURN node, r, related, score
        ORDER BY score DESC
        LIMIT $limit
    """, search_text=query, limit=top_k * 2)
    keyword_results = text_index.bm25_search(query, limit=top_k * 2)

    # Reciprocal Rank Fusion (ranks start at 1)
    k = 60
    scores = defaultdict(float)
    for rank, doc in enumerate(semantic_results, start=1):
        scores[doc.id] += 1 / (k + rank)
    # A node can appear once per relationship row; count it once, keeping order
    graph_ids = list(dict.fromkeys(row.node.id for row in graph_results))
    for rank, node_id in enumerate(graph_ids, start=1):
        scores[node_id] += 1 / (k + rank)
    for rank, doc in enumerate(keyword_results, start=1):
        scores[doc.id] += 1 / (k + rank)

    # Sort by combined score
    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    # Fetch full documents for top-k
    result_ids = [doc_id for doc_id, _ in ranked[:top_k]]
    return fetch_documents(result_ids)

def apply_mmr(results, query_embedding, lambda_param=0.7):
    """Maximal Marginal Relevance for diversity"""
    selected = []
    remaining = list(results)
    while remaining:
        best_score = -float('inf')
        best_doc = None
        for doc in remaining:
            relevance = cosine_sim(doc.embedding, query_embedding)
            # Redundancy is the highest similarity to anything already selected
            redundancy = max(
                (cosine_sim(doc.embedding, s.embedding) for s in selected),
                default=0,
            )
            mmr_score = lambda_param * relevance - (1 - lambda_param) * redundancy
            if mmr_score > best_score:
                best_score = mmr_score
                best_doc = doc
        selected.append(best_doc)
        remaining.remove(best_doc)
    return selected
Common Misconceptions
- “Vector search is enough”: Vector search misses exact matches, codes, and IDs. Hybrid catches what vectors miss.
- “Graph queries are slow”: With proper indexes on the starting nodes, traversal cost is proportional to the number of edges actually visited, not to the size of the whole graph, so it is often faster than scanning documents.
- “Just return more results”: More results without fusion means noise. Quality > quantity.
Check-Your-Understanding Questions
- Why use RRF instead of just averaging similarity scores?
- When would keyword (BM25) retrieval outperform semantic search?
- What’s the purpose of MMR after RRF?
- How does episode-mentions reranking improve results?
Check-Your-Understanding Answers
- Different score scales. Cosine similarity is bounded (between -1 and 1, and usually between 0 and 1 for text embeddings), while BM25 scores can be any non-negative number. RRF uses ranks (ordinal), which are comparable across systems.
- Exact matches: API keys, error codes, specific IDs, proper nouns that embeddings might not distinguish (e.g., “Alice” vs “Alex” can have similar embeddings).
- Diversity. RRF might rank multiple paraphrases of the same fact highly. MMR ensures you get diverse information, not redundant copies.
- Graph-aware relevance. Episodes mentioning entities that appear frequently in the graph (well-connected nodes) are likely more important. It leverages graph structure for ranking.
Real-World Applications
- Enterprise search: Combining vector, structured, and keyword search
- E-commerce: Product search with attributes + descriptions
- Legal research: Keyword for statutes + semantic for concepts
- AI agents: Memory retrieval for context injection
Where You’ll Apply It
- Project 9: Use Graphiti’s built-in hybrid retrieval
- Project 13: Implement custom hybrid retrieval
- Project 14: Optimize retrieval for latency
- Project 15: Benchmark retrieval accuracy
References
- “Stop Using RAG for Agent Memory” - Zep blog on hybrid approaches
- BM25 original paper: Robertson & Walker (1994)
- RRF: Cormack et al. “Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods”
Key Insight
Hybrid retrieval combines the strengths of semantic (conceptual), graph (relational), and keyword (exact) search. No single method handles all query types—the combination catches what each alone misses.
Summary
Hybrid retrieval combines semantic search (vector similarity), graph traversal (relationship queries), and keyword search (BM25 exact matching). Results are fused using algorithms like Reciprocal Rank Fusion, then diversified with MMR. For AI agent memory, hybrid retrieval ensures you can answer conceptual questions, entity-relationship queries, and exact-match lookups from a single query interface.
Homework/Exercises
- Exercise 1: Given these queries, identify which retrieval method(s) would be most effective: (a) “What’s the company’s policy on remote work?”, (b) “Who reports to Alice?”, (c) “Find all mentions of ERROR_CODE_5043”
- Exercise 2: Implement RRF in pseudocode for combining three ranked lists of 10 items each.
- Exercise 3: Design a query router that decides which retrieval methods to use based on query analysis.
Solutions to Homework/Exercises
- Solution to Exercise 1:
  - (a) Semantic primary: “policy on remote work” is conceptual, might be phrased differently in documents
  - (b) Graph primary: Direct relationship query `(Alice)-[:MANAGES]->(reports)`
  - (c) Keyword primary: Exact code match, embedding won’t help
- Solution to Exercise 2:
def reciprocal_rank_fusion(ranked_lists, k=60):
    """
    ranked_lists: list of lists, each list is ordered by relevance
    k: constant (typically 60)
    """
    scores = {}
    for ranked_list in ranked_lists:
        # Ranks start at 1, so the top item contributes 1 / (k + 1)
        for rank, item in enumerate(ranked_list, start=1):
            if item not in scores:
                scores[item] = 0
            scores[item] += 1 / (k + rank)
    # Sort by RRF score descending
    fused = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [item for item, score in fused]
- Solution to Exercise 3:
import re

def route_query(query: str) -> list[str]:
    """Returns list of retrieval methods to use"""
    methods = []
    # Always use semantic as baseline
    methods.append("semantic")
    # Check for entity relationship patterns
    relationship_patterns = ["who", "reports to", "works on", "works at", "knows"]
    if any(p in query.lower() for p in relationship_patterns):
        methods.append("graph")
    # Check for exact match patterns
    exact_patterns = [
        r'[A-Z_]+_\d+',          # ERROR_CODE_123
        r'[a-zA-Z0-9_.+-]+@',    # email
        r'"[^"]+"',              # quoted string
    ]
    if any(re.search(p, query) for p in exact_patterns):
        methods.append("keyword")
    # Check for temporal patterns
    if any(t in query.lower() for t in ["when", "before", "after", "last week"]):
        methods.append("temporal_filter")
    return methods
Glossary
High-Signal Definitions for Quick Reference
- Bi-Temporal Model: Data model tracking two independent time dimensions: valid time (when the fact was true in the real world) and transaction time (when the fact was recorded in the system).
- Community Detection: Graph algorithm (e.g., Leiden, Louvain) that identifies densely connected clusters of nodes, used to group related entities for summarization.
- Edge (Relationship): Connection between two nodes in a graph, with a type label and optional properties. E.g., (Alice)-[:WORKS_AT {since: "2023"}]->(Acme).
- Embedding: Dense vector representation of text in high-dimensional space where semantic similarity corresponds to geometric proximity.
- Entity: Named object in the world (person, organization, product, concept) represented as a node in the knowledge graph.
- Entity Resolution: Process of determining whether two entity mentions refer to the same real-world object and merging them if so.
- Episode/Episodic Memory: Record of a specific event or conversation with temporal bounds; the “what happened when” layer.
- Fact (Triple): Atomic unit of a knowledge graph: subject-predicate-object. E.g., “Alice WORKS_AT Acme”.
- Graph Database: Database optimized for storing and querying highly connected data using nodes, edges, and properties rather than tables.
- Graphiti: Open-source temporal knowledge graph framework by Zep for building AI agent memory with episodic/semantic layers.
- Hallucination: AI generating plausible but factually incorrect information, often mitigated by grounding responses in knowledge graph facts.
- Hybrid Retrieval: Combining multiple retrieval methods (semantic, keyword, graph) and fusing results for comprehensive recall.
- Index-Free Adjacency: Graph database property where each node directly points to its neighbors, enabling O(1) edge traversal without index lookups.
- Knowledge Graph (KG): Graph structure where entities are nodes and relationships are labeled edges, encoding facts about the world.
- LLM (Large Language Model): Neural network trained on text that generates human-like responses; used for entity extraction, summarization, and reasoning.
- Maximal Marginal Relevance (MMR): Algorithm that balances relevance and diversity when selecting results, avoiding redundancy.
- Mem0: AI memory framework with graph extensions (Mem0g) for structured long-term agent memory.
- MemGPT/Letta: Architecture using virtual context management: OS-inspired memory tiers with explicit memory operations.
- Neo4j: Leading native graph database using the Cypher query language.
- Node: Vertex in a graph representing an entity, with labels (types) and properties.
- RAG (Retrieval-Augmented Generation): Pattern of retrieving relevant context before generating LLM responses.
- Reciprocal Rank Fusion (RRF): Score-agnostic algorithm for combining ranked results from multiple retrieval systems.
- Relationship Extraction: NLP task of identifying typed connections between entities in text.
- Semantic Memory: General knowledge about the world, abstracted from specific episodes; the “what we know” layer.
- Temporal Decay: Memory relevance decreasing over time, with recent information weighted more heavily.
- Temporal Knowledge Graph (TKG): Knowledge graph where facts have temporal validity (start/end times), enabling point-in-time queries.
- Transaction Time: When a fact was recorded or modified in the database (system-managed, immutable).
- Triple Store: Database storing subject-predicate-object triples, often supporting SPARQL queries.
- Valid Time: When a fact was true in the real world (application-managed).
- Vector Database: Database optimized for storing and querying high-dimensional vectors using approximate nearest neighbor search.
- Zep: Commercial platform for AI agent memory built on temporal knowledge graphs; its graph framework is open-sourced as Graphiti.
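The Maximal Marginal Relevance entry above can be made concrete with a short sketch. This is a minimal pure-Python version under stated assumptions: the two-dimensional vectors, the `mmr` and `cosine` helper names, and the `lam` trade-off weight are all illustrative, and a real system would use embedding vectors from its own model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, candidates, top_n=2, lam=0.5):
    """candidates: dict of id -> vector. Returns ids picked greedily by MMR."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(cid):
            relevance = cosine(query_vec, remaining[cid])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(remaining[cid], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

query = [1.0, 0.0]
docs = {
    "a": [1.0, 0.2],    # most relevant
    "b": [1.0, 0.25],   # near-duplicate of "a"
    "c": [1.0, -0.5],   # less relevant, but points elsewhere
}
diverse = mmr(query, docs, top_n=2, lam=0.5)
relevance_only = mmr(query, docs, top_n=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=0.5` the near-duplicate "b" is penalized for its similarity to "a", so "c" takes the second slot; with `lam=1.0` the redundancy term vanishes and the selection falls back to plain relevance ranking.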
Why Temporal Knowledge Graphs for AI Agent Memory Matter
The Problem: AI Agents Have Goldfish Memory
When you chat with most AI systems today, each conversation starts fresh. Ask about a project you discussed yesterday, and the AI draws a blank. This isn’t a minor inconvenience—it’s a fundamental limitation that prevents AI from being truly useful for:
- Personal assistants that should remember your preferences, relationships, and history
- Customer support agents that should recall previous issues and context
- Research assistants that should track what they’ve learned across sessions
- Enterprise copilots that should understand organizational knowledge
Current State and Adoption (2024-2025)
The AI memory space is rapidly evolving:
| Metric | Value | Source |
|---|---|---|
| Vector DB market size | $1.5B+ (2024) | Industry reports |
| Neo4j enterprise deployments | 75% of Fortune 100 | Neo4j 2024 |
| RAG adoption | 60%+ of production LLM apps | Developer surveys |
| LangChain memory issues | #1 limitation cited | Community feedback |
| Zep users | 10,000+ developers | Zep 2024 |
Why Traditional Approaches Fail
Traditional RAG Memory
┌─────────────────────────────────────────────┐
│ │
│ User Query → Vector Search → Top-K Chunks │
│ │
│ Problems: │
│ • No relationship understanding │
│ • No temporal reasoning │
│ • Context window stuffing │
│ • No contradiction detection │
│ • Information scattered across chunks │
│ │
└─────────────────────────────────────────────┘
Temporal Knowledge Graph Memory
┌─────────────────────────────────────────────┐
│ │
│ User Query → Hybrid Retrieval → Structured │
│ ↓ Context │
│ • Entity + Relationship graph traversal │
│ • Temporal filtering (what's current?) │
│ • Community summaries (bird's eye view) │
│ • Contradiction resolution (latest wins) │
│ • Episodic recall (specific conversations) │
│ │
└─────────────────────────────────────────────┘
What Temporal KGs Enable
- Longitudinal Reasoning: “Based on our conversations over the past month, what patterns do you see in my concerns about the project?”
- Entity-Centric Recall: “What do you know about the competitor we discussed?” (retrieves all connected facts, not just keyword matches)
- Temporal Precision: “What was our strategy for Q3?” (returns Q3 facts, not confused with Q4)
- Contradiction Handling: “Actually, we changed the deadline to next Friday.” (old deadline invalidated, new one recorded)
- Organizational Knowledge: “Who’s the best person to talk to about Kubernetes issues?” (traverses expertise relationships)
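The temporal precision and contradiction handling items above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how any particular framework stores facts: the `Fact` and `FactStore` names, the `HAS_DEADLINE` predicate, and the dates are hypothetical, and only valid time is tracked here (transaction time is left out to keep the sketch short).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"

class FactStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, obj, at):
        """Record a fact and close any open fact it contradicts."""
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.valid_to is None:
                f.valid_to = at  # invalidate the old fact, never delete it
        self.facts.append(Fact(subject, predicate, obj, valid_from=at))

    def as_of(self, subject, predicate, when):
        """Return the object that was valid at time `when`, or None."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_from <= when and (f.valid_to is None or when < f.valid_to):
                return f.obj
        return None

store = FactStore()
store.assert_fact("Project", "HAS_DEADLINE", "2025-01-10", at=datetime(2025, 1, 2))
# "Actually, we changed the deadline to next Friday."
store.assert_fact("Project", "HAS_DEADLINE", "2025-01-17", at=datetime(2025, 1, 6))

before_change = store.as_of("Project", "HAS_DEADLINE", datetime(2025, 1, 3))
after_change = store.as_of("Project", "HAS_DEADLINE", datetime(2025, 1, 7))
print(before_change, after_change)
```

Because the old deadline is closed rather than overwritten, the store can still answer what the deadline was on January 3, which a plain key-value update could not.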
The Landscape Evolution
2020-2022: Context Window Era
┌────────────────────────────────────────┐
│ Stuff everything in the prompt │
│ Memory = conversation history │
│ Limit: ~4K-8K tokens │
└────────────────────────────────────────┘
↓
2023: Vector RAG Era
┌────────────────────────────────────────┐
│ Embed everything, retrieve top-K │
│ Memory = vector database │
│ Limit: semantic similarity only │
└────────────────────────────────────────┘
↓
2024-2025: Structured Memory Era
┌────────────────────────────────────────┐
│ Temporal knowledge graphs │
│ Hybrid retrieval (vector + graph) │
│ Memory = entities + relationships + │
│ temporal facts + summaries │
└────────────────────────────────────────┘
Industry Momentum
- Microsoft GraphRAG: Uses community detection for global queries
- Zep/Graphiti: Open-source temporal KG for AI memory
- Mem0: Memory layer with graph extensions
- LangGraph: Persistence and memory for agent workflows
- MemGPT/Letta: OS-inspired memory architecture
Why This Matters for Your Career
Understanding temporal knowledge graphs for AI memory positions you at the intersection of:
- Graph databases (high-demand skill)
- LLM applications (fastest-growing area)
- Systems design (architectural thinking)
- AI engineering (emerging discipline)
This is not academic: teams from startups to FAANG are building these patterns into production AI agents today.
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Knowledge Graph Foundations | Entities are nodes, relationships are edges, facts are triples. Graph structure enables multi-hop reasoning that vector search cannot do. |
| Episodic vs Semantic Memory | Episodes are timestamped events (“what happened”); semantics are abstracted facts (“what we know”). Both layers serve different query types. |
| Bi-Temporal Data Models | Two time dimensions: valid_time (when true in world) and transaction_time (when recorded). Enables point-in-time queries and audit trails. |
| Graph Databases | Index-free adjacency means O(1) edge traversal. Cypher is the SQL of graphs. Property graphs are usually a better fit than triple stores for AI memory use cases. |
| Entity & Relationship Extraction | LLMs extract structured (entity, relationship, entity) triples from text. Structured output and pipelining reduce hallucination. Entity resolution handles duplicates. |
| Hybrid Retrieval | No single retrieval method handles all queries. Combine semantic (conceptual), graph (relational), and keyword (exact) with RRF fusion and MMR diversity. |
Project-to-Concept Map
| Project | Concepts Applied |
|---|---|
| Project 1: Personal Memory Graph CLI | Knowledge Graph Foundations, Entity Extraction |
| Project 2: Conversation Episode Store | Episodic Memory, Bi-Temporal Models |
| Project 3: Entity Extraction Pipeline | Entity Extraction, Relationship Extraction |
| Project 4: Entity Resolution System | Entity Resolution, Knowledge Graph |
| Project 5: Bi-Temporal Fact Store | Bi-Temporal Models, Graph Databases |
| Project 6: Temporal Query Engine | Bi-Temporal Models, Cypher Queries |
| Project 7: Semantic Memory Synthesizer | Semantic Memory, LLM Summarization |
| Project 8: Community Detection & Summaries | Community Detection, Semantic Memory |
| Project 9: Graphiti Integration | All Concepts (Framework Integration) |
| Project 10: Mem0g Memory Layer | Mem0 Architecture, Hybrid Memory |
| Project 11: MemGPT-Style Virtual Context | Virtual Context, Memory Tiers |
| Project 12: Hybrid Retrieval Engine | Hybrid Retrieval, RRF, MMR |
| Project 13: Multi-Agent Shared Memory | Graph Databases, Access Control |
| Project 14: Production Memory Service | All Concepts (System Integration) |
| Project 15: Memory Benchmark Suite | Evaluation, All Retrieval Methods |
Deep Dive Reading by Concept
This section maps each concept to specific book chapters and resources for deeper understanding. Read these before or alongside the projects.
Knowledge Graph Foundations
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Graph theory basics | “Graph Algorithms” by Mark Needham & Amy Hodler - Ch. 1-2 | Foundation for understanding graph structures |
| Property graphs | “Graph Databases” by Robinson, Webber & Eifrem - Ch. 3 | Neo4j model used by most TKG frameworks |
| Cypher query language | “Learning Neo4j” by Rik Van Bruggen - Ch. 4-6 | Essential for querying knowledge graphs |
| Knowledge representation | “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 2 | Data modeling tradeoffs |
Memory Architecture
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Episodic memory | “Cognitive Science” by Bermúdez - Ch. 8 | Psychological foundations of memory types |
| Memory consolidation | “AI Engineering” by Chip Huyen - Ch. 8 | LLM memory patterns |
| Context management | “Building LLM Applications” (various) | Practical context handling |
Temporal Data
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Bi-temporal modeling | “Temporal Data & The Relational Model” by Date, Darwen & Lorentzos | Canonical reference for temporal databases |
| Time-series patterns | “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 11 | Event sourcing and time handling |
| Temporal queries | Allen’s Interval Algebra (academic paper) | Formal foundation for temporal reasoning |
NLP & Extraction
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Named Entity Recognition | “Speech and Language Processing” by Jurafsky & Martin - Ch. 8 | NER foundations |
| Information extraction | “Natural Language Processing with Transformers” by Tunstall et al. - Ch. 10 | Modern extraction with LLMs |
| Structured output | OpenAI Function Calling docs, Instructor library docs | Practical implementation |
Retrieval Systems
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Vector search | “Introduction to Information Retrieval” by Manning et al. - Ch. 6 | Embedding search foundations |
| Hybrid retrieval | “AI Engineering” by Chip Huyen - Ch. 6-7 | RAG patterns and retrieval |
| Ranking algorithms | “Introduction to Information Retrieval” by Manning et al. - Ch. 7 | Scoring and ranking |
Essential Reading Order
For maximum comprehension, read in this order:
- Foundation (Week 1):
- “Graph Databases” Ch. 1-3 (What are knowledge graphs)
- “Designing Data-Intensive Applications” Ch. 2 (Data models)
- Temporal & Memory (Week 2):
- “AI Engineering” Ch. 8 (Memory for LLMs)
- Zep blog posts on temporal KG architecture
- Implementation (Week 3+):
- Neo4j Cypher documentation
- Graphiti/Zep documentation and source code
- Mem0 documentation
Quick Start: Your First 48 Hours
Day 1: Foundation (4-6 hours)
- Morning: Graph Database Setup (2 hours)
# Install Neo4j locally (Docker recommended)
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest
# Open browser: http://localhost:7474
# Login with neo4j/password
- Afternoon: First Knowledge Graph (2 hours)
- Complete Neo4j’s “Movie Graph” tutorial (built-in)
- Write 10 Cypher queries from scratch
- Create a simple “People who know People” graph
- Evening: Read Theory Primer Chapters 1-2 (2 hours)
- Knowledge Graph Foundations
- Episodic vs Semantic Memory
- Complete the homework exercises
Day 2: Temporal + Extraction (4-6 hours)
- Morning: Bi-Temporal Concepts (2 hours)
- Read Theory Primer Chapter 3 (Bi-Temporal Models)
- Add valid_time and transaction_time to your Neo4j nodes
- Write a point-in-time query
- Afternoon: Entity Extraction (2 hours)
- Set up OpenAI API key
- Write a simple entity extractor using structured output
- Extract entities from 5 sample sentences
- Evening: Start Project 1 (2 hours)
- Begin the Personal Memory Graph CLI
- Goal: Store and retrieve 3 facts about yourself
- Verify you can query: “What do I know about X?”
Validation Checkpoints
After 48 hours, you should be able to:
- Run Cypher queries against Neo4j
- Explain the difference between episodic and semantic memory
- Add timestamps to graph nodes
- Extract entities from text using an LLM
- Have a working (basic) personal memory graph
Recommended Learning Paths
Path 1: The Backend Engineer (Focus: Systems & Storage)
Build robust storage and retrieval infrastructure.
Project 2 (Episode Store)
↓
Project 5 (Bi-Temporal Facts)
↓
Project 6 (Temporal Queries)
↓
Project 12 (Hybrid Retrieval)
↓
Project 14 (Production Service)
Time: 6-8 weeks | Key skills: Graph databases, temporal modeling, systems design
Path 2: The AI/ML Engineer (Focus: Extraction & Intelligence)
Build the intelligence layer that processes and understands content.
Project 3 (Entity Extraction)
↓
Project 4 (Entity Resolution)
↓
Project 7 (Semantic Synthesis)
↓
Project 8 (Community Detection)
↓
Project 15 (Benchmarking)
Time: 6-8 weeks | Key skills: NLP, LLMs, information extraction, evaluation
Path 3: The Full-Stack Builder (Focus: End-to-End)
Build complete memory systems using existing frameworks.
Project 1 (Personal Memory CLI)
↓
Project 9 (Graphiti Integration)
↓
Project 10 (Mem0g Layer)
↓
Project 11 (MemGPT Virtual Context)
↓
Project 13 (Multi-Agent Memory)
Time: 5-7 weeks | Key skills: Framework integration, API design, practical application
Path 4: The Speed Runner (Minimum Viable Understanding)
Fastest path to building something useful.
Project 1 (Personal Memory CLI) - Weekend
↓
Project 9 (Graphiti Integration) - 1 week
↓
Project 14 (Production Service) - 2 weeks
Time: 3-4 weeks | Key skills: Practical integration, production deployment
Path 5: The Researcher (Focus: Evaluation & Improvement)
For those wanting to advance the field or deeply understand tradeoffs.
Project 3 (Entity Extraction)
↓
Project 12 (Hybrid Retrieval)
↓
Project 15 (Benchmark Suite)
↓
Project 8 (Community Detection)
Time: 8-10 weeks | Key skills: Benchmarking, research methodology, ablation studies
Success Metrics
You have achieved Level 1 (Foundation) when you can:
- Explain why vector-only RAG is insufficient for agent memory
- Write Cypher queries to traverse 3+ hops in a graph
- Distinguish episodic from semantic memory with examples
- Add bi-temporal properties to any data model
You have achieved Level 2 (Practitioner) when you can:
- Build a complete entity extraction pipeline
- Implement RRF fusion for hybrid retrieval
- Configure and deploy Graphiti or Mem0g
- Debug temporal query anomalies
- Design a memory schema for a new domain
You have achieved Level 3 (Expert) when you can:
- Architect a production memory system for 100K+ users
- Benchmark and compare retrieval strategies quantitatively
- Implement custom community detection algorithms
- Optimize graph queries for sub-100ms latency
- Handle multi-agent memory isolation and sharing
Measurable Milestones
| Milestone | Metric | Target |
|---|---|---|
| Graph fluency | Cypher queries written | 50+ without reference |
| Extraction accuracy | Entity F1 score | >0.85 on test set |
| Retrieval quality | MRR@10 | >0.7 on benchmark |
| System performance | Query latency p95 | <200ms |
| Production readiness | Uptime | 99.9% over 30 days |
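Two of the metrics in the table, entity F1 and MRR@10, are simple enough to compute by hand. The sketch below shows one common way to define them; the function names, the exact-match criterion for entities, and the sample data are assumptions for illustration, and a benchmark may define matching differently.

```python
def mrr_at_k(results, relevant, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k.

    results: list of ranked id lists, one per query.
    relevant: list of sets of relevant ids, aligned with results.
    """
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results) if results else 0.0

def entity_f1(predicted, gold):
    """Exact-match F1 over sets of extracted entity names."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical retrieval runs: hits at rank 2, rank 1, and no hit at all
runs = [["f3", "f1", "f9"], ["f2", "f7"], ["f5", "f6"]]
gold_facts = [{"f1"}, {"f2"}, {"f8"}]
mrr = mrr_at_k(runs, gold_facts)  # (1/2 + 1/1 + 0) / 3

# Two of three predicted entities are correct, two of three gold ones are found
f1 = entity_f1(["Alice", "Acme", "CTO"], ["Alice", "Acme", "Bob"])
print(round(mrr, 3), round(f1, 3))
```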
Project List
The following 15 projects guide you from basic knowledge graph operations to production-grade temporal memory systems for AI agents. Each project builds on previous concepts while introducing new challenges.
Project 1: Personal Memory Graph CLI
- File: P01-personal-memory-graph-cli.md
- Expanded Project Guide: P01-personal-memory-graph-cli.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Knowledge Graphs, Graph Databases
- Software or Tool: Neo4j, Python, Click/Typer
- Main Book: “Graph Databases” by Robinson, Webber & Eifrem
What you will build: A command-line tool that lets you store personal facts (“I work at Acme”, “I know Alice”) and query them using natural language, building your first knowledge graph from scratch.
Why it teaches temporal knowledge graphs: This is your “Hello World” for knowledge graphs. You’ll confront the fundamental challenge: how do you represent facts about the world as nodes and relationships? By building a personal memory, you’ll internalize graph thinking before adding temporal complexity.
Core challenges you will face:
- Designing your first schema → Maps to Knowledge Graph Foundations
- Writing Cypher queries → Maps to Graph Database Operations
- Parsing natural language into triples → Maps to Entity Extraction basics
- Handling queries that span multiple hops → Maps to Graph Traversal
Real World Outcome
You’ll have a working CLI that stores facts about your life and answers questions by traversing the graph.
Example Session:
$ memory add "I work at Acme Corp as a Software Engineer"
✓ Added: (You)-[:WORKS_AT {role: "Software Engineer"}]->(Acme Corp)
✓ Added: (You)-[:HAS_ROLE]->(Software Engineer)
$ memory add "Alice is my manager at Acme"
✓ Added: (Alice)-[:MANAGES]->(You)
✓ Added: (Alice)-[:WORKS_AT]->(Acme Corp)
$ memory add "Bob works on the Platform team with me"
✓ Added: (Bob)-[:WORKS_ON]->(Platform Team)
✓ Added: (You)-[:WORKS_ON]->(Platform Team)
$ memory query "Who do I work with?"
Based on your knowledge graph:
• You work at Acme Corp
• Alice is your manager at Acme Corp
• Bob works on the Platform Team with you
Graph traversal: (You)-[:WORKS_AT|WORKS_ON]->()<-[:WORKS_AT|WORKS_ON]-(?)
$ memory query "What's my relationship with Alice?"
Alice is your manager at Acme Corp.
Path: (You)<-[:MANAGES]-(Alice), (You)-[:WORKS_AT]->(Acme Corp)<-[:WORKS_AT]-(Alice)
$ memory show
Nodes: 6 (You, Acme Corp, Software Engineer, Alice, Bob, Platform Team)
Relationships: 6
Last updated: 2025-01-03 14:32:00
What you’ll see in Neo4j Browser:
Navigate to http://localhost:7474 and run:
MATCH (n) RETURN n
You’ll see an interactive visualization with nodes as circles and relationships as arrows connecting them.
The Core Question You Are Answering
“How do I represent knowledge about my world as a graph, and how do I query it?”
This is the fundamental question of knowledge representation. Before temporal knowledge graphs, before AI agents, before anything else—you need to understand that facts can be decomposed into entities and relationships, and that graphs let you traverse those relationships in powerful ways.
Concepts You Must Understand First
- Property Graphs
- What’s the difference between a node and an edge?
- What are labels and properties?
- Why use graphs instead of tables?
- Book Reference: “Graph Databases” by Robinson, Webber & Eifrem - Ch. 2-3
- Basic Cypher
- How do you CREATE nodes and relationships?
- How do you MATCH patterns?
- How do you RETURN results?
- Book Reference: Neo4j Cypher Manual, “Learning Neo4j” - Ch. 4
- Triple Thinking
- How do you decompose “Alice manages Bob at Acme” into triples?
- When should something be a node vs. a property?
- Book Reference: “Designing Data-Intensive Applications” - Ch. 2
Questions to Guide Your Design
- Schema Design
- What entities will you track? (People, Organizations, Roles, Projects?)
- What relationship types do you need? (WORKS_AT, KNOWS, MANAGES?)
- Where do you store attributes like “since 2023”?
- Query Interface
- How will users phrase questions?
- Do you need NLP or can you use simple patterns?
- How do you translate questions to Cypher?
- Data Entry
- Free text parsing vs. structured commands?
- How do you handle ambiguous input?
- How do you confirm what was added?
Thinking Exercise
Before coding, trace this on paper:
Given these three statements:
- “I started working at TechCorp in 2022”
- “Sarah is the CEO of TechCorp”
- “I report to Sarah”
Draw the graph that represents these facts. For each node, decide:
- What label should it have?
- What properties should it have?
- What relationships connect it?
Questions while drawing:
- Is “2022” a property on the relationship or a separate node?
- Should “CEO” be a node or a property on Sarah?
- How would you query “Who is my CEO?” (requires multi-hop!)
The Interview Questions They Will Ask
- “Explain when you’d use a graph database vs. a relational database.”
- “How do you model many-to-many relationships in a graph vs. SQL?”
- “What is index-free adjacency and why does it matter for traversal performance?”
- “Walk me through how you’d find the shortest path between two entities.”
- “How do you handle bi-directional relationships in a property graph?”
Hints in Layers
Hint 1: Starting Point
Use Neo4j’s Python driver (neo4j package). Start with just three Cypher queries: CREATE for adding, MATCH for reading, and MERGE for upserts.
Hint 2: Simple Schema
Start with just two node labels: Person and Organization. Add more as needed. Relationship types like WORKS_AT, KNOWS, MANAGES cover most cases.
Hint 3: Query Translation
For the MVP, use keyword matching: “who” → find people, “where” → find organizations, “work” → WORKS_AT relationship. You don’t need full NLP yet.
Hint 4: Debugging
Use Neo4j Browser to visualize your graph. Run MATCH (n) RETURN n LIMIT 50 to see up to 50 nodes. Check relationship directions carefully, because they matter!
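The keyword-matching idea from the query translation hint might look like the sketch below. It only builds a Cypher string and never touches a database; the `question_to_cypher` name, the `REL_KEYWORDS` table, and the `$name` parameter are assumptions for illustration, using the Person/Organization labels and relationship types suggested in the hints.

```python
# Keyword -> relationship type (hypothetical mapping; extend as your schema grows)
REL_KEYWORDS = {"work": "WORKS_AT", "know": "KNOWS", "manage": "MANAGES"}

def question_to_cypher(question: str) -> str:
    """Translate a question into a Cypher query using keyword rules only."""
    q = question.lower()
    # "who" -> people, "where" -> organizations, otherwise any node
    if "who" in q:
        target = "(other:Person)"
    elif "where" in q:
        target = "(other:Organization)"
    else:
        target = "(other)"
    rel_types = [rel for keyword, rel in REL_KEYWORDS.items() if keyword in q]
    rel = "[r:" + "|".join(rel_types) + "]" if rel_types else "[r]"
    # Undirected pattern, so a wrong guess about direction does not hide matches
    return f"MATCH (me:Person {{name: $name}})-{rel}-{target} RETURN other, type(r)"

where_query = question_to_cypher("Where do I work?")
who_query = question_to_cypher("Who do I know?")
print(where_query)
print(who_query)
```

The user's name travels as a query parameter rather than being spliced into the string, which with the Neo4j Python driver would be passed along the lines of `session.run(query, name="You")`. One-hop rules like these cannot answer "Who do I work with?", which needs a two-hop pattern through the shared organization; that gap is a good reason to grow the rule table gradually.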
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Property graph model | “Graph Databases” by Robinson et al. | Ch. 2-3 |
| Cypher basics | “Learning Neo4j” by Van Bruggen | Ch. 4-6 |
| Data modeling | “Designing Data-Intensive Applications” by Kleppmann | Ch. 2 |
| Python CLI | “Click” documentation | Getting Started |
Common Pitfalls and Debugging
Problem 1: “Duplicate nodes keep appearing”
- Why: Using CREATE instead of MERGE
- Fix: Always use MERGE for entities that might already exist
- Quick test: MATCH (n:Person {name: "Alice"}) RETURN count(n) should return 1
Problem 2: “Can’t find paths that should exist”
- Why: Relationship direction matters in MATCH
- Fix: Use undirected patterns such as (a)-[:KNOWS]-(b) when direction doesn’t matter
- Quick test: Try both (a)-[:REL]->(b) and (a)<-[:REL]-(b)
Problem 3: “Queries return nothing but nodes exist”
- Why: Label or property name typo (case-sensitive!)
- Fix: Use MATCH (n) RETURN labels(n), keys(n) to inspect
- Quick test: MATCH (n) WHERE n.name CONTAINS "Ali" RETURN n
Definition of Done
- Can add facts via CLI: memory add "fact"
- Can query facts: memory query "question"
- Graph has at least 10 nodes and 15 relationships
- Can traverse 2+ hops: “Who does my manager report to?”
- Data persists across CLI restarts (stored in Neo4j)
- Can visualize graph in Neo4j Browser
Project 2: Conversation Episode Store
- File: P02-conversation-episode-store.md
- Expanded Project Guide: P02-conversation-episode-store.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Episodic Memory, Temporal Data
- Software or Tool: Neo4j, PostgreSQL (optional), Python
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you will build: A system that stores conversations as timestamped episodes, links them to entities mentioned, and enables queries like “What did we discuss about Project X last week?”
Why it teaches temporal knowledge graphs: This project introduces the temporal dimension. Every conversation has a start and end time. Facts mentioned in conversations have a “when we learned it” timestamp. You’ll build the episodic layer that forms the foundation of agent memory.
Core challenges you will face:
- Modeling episodes with temporal bounds → Maps to Bi-Temporal Models
- Linking episodes to entities → Maps to Knowledge Graph structure
- Querying across time ranges → Maps to Temporal Query patterns
- Handling episode boundaries → Maps to Episodic Memory concepts
Real World Outcome
You’ll have a conversation storage system that remembers when things were discussed and can recall context based on time and entities.
Example Session:
$ episodes ingest --source chat_log.json
Processing 47 messages...
✓ Created 12 episodes
✓ Extracted 34 entity mentions
✓ Linked 89 episode-entity relationships
$ episodes query "What did we discuss about the API migration?"
Found 3 episodes mentioning "API migration":
[Episode 2024-12-15 14:30 - 15:45]
Participants: Alice, Bob
Summary: Discussed timeline for API v2 migration. Decided to delay
until Q1 due to dependency on auth service.
Key entities: API v2, Auth Service, Q1 Timeline
Confidence: 0.92
[Episode 2024-12-20 10:00 - 10:30]
Participants: Alice, Carol
Summary: Carol raised concerns about backward compatibility.
Agreed to maintain v1 endpoints for 6 months.
Key entities: API v1, API v2, Backward Compatibility
Confidence: 0.87
[Episode 2025-01-02 09:15 - 09:45]
Participants: Bob
Summary: Bob confirmed auth service will be ready by Jan 15.
Green light for migration to proceed.
Key entities: Auth Service, API v2, January Timeline
Confidence: 0.94
$ episodes timeline --entity "API migration" --last 30d
Timeline for "API migration":
Dec 15 ──●── "Delay until Q1" (Alice, Bob)
│
Dec 20 ──●── "Maintain v1 for 6 months" (Alice, Carol)
│
Jan 02 ──●── "Auth ready Jan 15, proceed" (Bob)
Current status: Migration approved, waiting on Auth Service
What you’ll see in Neo4j:
// Query episode with its entities
MATCH (e:Episode)-[:MENTIONS]->(entity)
WHERE e.start_time > datetime('2024-12-01')
RETURN e, entity
You’ll see Episode nodes connected to Entity nodes, forming a bipartite graph where you can trace what was discussed when.
The Core Question You Are Answering
“How do I store conversations so I can recall what was said, when it was said, and what it was about?”
This is the fundamental episodic memory challenge. Humans remember events in context—who was there, what happened, when it occurred. Your system needs to capture this richness rather than treating all text as a flat bag of words.
Concepts You Must Understand First
- Episode Structure
- What defines episode boundaries? (time gaps, topic shifts, participants)
- What metadata should an episode have?
- How do episodes differ from raw messages?
- Book Reference: “AI Engineering” by Chip Huyen - Ch. 8
- Temporal Properties
- How do you store timestamps in Neo4j?
- What’s the difference between datetime() and timestamp()?
- How do you query time ranges in Cypher?
- Book Reference: Neo4j Temporal documentation
- Entity Linking
- How do you connect mentions in text to canonical entities?
- What’s the MENTIONS relationship pattern?
- How do you handle ambiguous references?
- Book Reference: “Speech and Language Processing” by Jurafsky & Martin - Ch. 22
Questions to Guide Your Design
- Episode Boundaries
- What triggers a new episode? (time gap > N minutes? topic change?)
- Can episodes overlap?
- How long is a typical episode?
- Entity Extraction
- Do you extract entities during ingestion or query time?
- How do you handle mentions of the same entity with different names?
- What entity types matter? (People, Projects, Dates, Decisions)
- Temporal Queries
- How do you express “last week” in Cypher?
- Can you query “episodes where X was discussed before Y”?
- How do you rank by recency vs. relevance?
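One way to approach the recency-versus-relevance question is to multiply a relevance score by an exponential decay on the episode's age. The sketch below is a starting point rather than a recommendation: the `decayed_score` name, the 14-day half-life, and the sample scores are assumptions you would tune against your own data.

```python
from datetime import datetime, timedelta

def decayed_score(relevance, event_time, now, half_life_days=14.0):
    """Scale relevance by exponential decay: the score halves every half-life."""
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return relevance * 0.5 ** (age_days / half_life_days)

now = datetime(2025, 1, 3)
episodes = [
    # (label, relevance from retrieval, episode start time)
    ("old but on-topic", 0.95, now - timedelta(days=28)),
    ("recent, weaker match", 0.60, now - timedelta(days=1)),
]
ranked = sorted(
    episodes,
    key=lambda e: decayed_score(e[1], e[2], now),
    reverse=True,
)
print([label for label, _, _ in ranked])
```

After two half-lives the older episode keeps only a quarter of its relevance (0.95 × 0.25), so the recent episode ranks first despite its weaker match. A longer half-life shifts the balance back toward relevance.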
Thinking Exercise
Before coding, design the schema:
Given this conversation fragment:
[2024-12-15 14:30] Alice: Let's discuss the API migration
[2024-12-15 14:32] Bob: I think we should wait for the auth service
[2024-12-15 14:35] Alice: Good point. Let's target Q1 then
[2024-12-15 14:40] Bob: Agreed. I'll update the roadmap
--- 2 hour gap ---
[2024-12-15 16:45] Alice: Quick question about the database backup
Draw the Episode and Entity nodes with their relationships. Decide:
- How many episodes?
- What are the episode boundaries?
- What entities are mentioned?
- What temporal properties do nodes have?
The Interview Questions They Will Ask
- “How do you decide where one episode ends and another begins?”
- “Explain the tradeoffs between storing raw messages vs. episode summaries.”
- “How would you handle a query like ‘What did Alice and Bob disagree about?’”
- “What indexes would you create for efficient temporal queries?”
- “How do you handle entity mentions that span multiple episodes?”
Hints in Layers
Hint 1: Starting Point
Create Episode nodes with start_time, end_time, summary, and participants properties. Create Entity nodes and link them with MENTIONS relationships.
Hint 2: Boundary Detection
Simple approach: new episode after 30+ minute gap. Better approach: use LLM to detect topic shifts. Start simple, improve later.
Hint 3: Temporal Queries
Use Cypher’s temporal functions:
WHERE e.start_time > datetime() - duration('P7D') // last 7 days
Hint 4: Debugging
Create a “timeline view” function that prints episodes chronologically. Visual inspection catches boundary issues fast.
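The simple gap-based boundary rule from the hints can be sketched as follows, using the conversation fragment from the thinking exercise as input. The `split_into_episodes` name and the tuple layout for messages are assumptions for illustration; the 30-minute threshold is the one suggested in the hint.

```python
from datetime import datetime, timedelta

def split_into_episodes(messages, max_gap=timedelta(minutes=30)):
    """Group messages into episodes, splitting on long silences.

    messages: list of (timestamp, speaker, text) tuples sorted by timestamp.
    """
    episodes = []
    for msg in messages:
        # Start a new episode on the first message or after a gap > max_gap
        if not episodes or msg[0] - episodes[-1][-1][0] > max_gap:
            episodes.append([])
        episodes[-1].append(msg)
    return episodes

log = [
    (datetime(2024, 12, 15, 14, 30), "Alice", "Let's discuss the API migration"),
    (datetime(2024, 12, 15, 14, 32), "Bob", "I think we should wait for the auth service"),
    (datetime(2024, 12, 15, 14, 35), "Alice", "Good point. Let's target Q1 then"),
    (datetime(2024, 12, 15, 14, 40), "Bob", "Agreed. I'll update the roadmap"),
    (datetime(2024, 12, 15, 16, 45), "Alice", "Quick question about the database backup"),
]

episodes = split_into_episodes(log)
for ep in episodes:
    participants = sorted({speaker for _, speaker, _ in ep})
    # start_time, end_time, and participants map onto Episode node properties
    print(ep[0][0], "->", ep[-1][0], participants)
```

The start time, end time, and participant set printed for each group are the values you would store on the Episode node; swapping the gap test for an LLM topic-shift check changes only the condition inside the loop.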
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Episodic memory patterns | “AI Engineering” by Chip Huyen | Ch. 8 |
| Temporal data modeling | “Designing Data-Intensive Applications” | Ch. 11 |
| Neo4j temporal types | Neo4j Documentation | Temporal section |
| Entity linking | “Speech and Language Processing” | Ch. 22 |
Common Pitfalls and Debugging
Problem 1: “Episodes are too long or too short”
- Why: Poor boundary detection heuristics
- Fix: Tune gap threshold, add topic detection, or use LLM segmentation
- Quick test: Check average episode duration—should be 5-30 minutes for conversations
Problem 2: “Same entity has multiple nodes”
- Why: Not normalizing entity names before creating nodes
- Fix: Implement entity resolution (see Project 4) or use MERGE with canonical names
- Quick test: MATCH (e:Entity) RETURN e.name, count(*) ORDER BY count(*) DESC
Problem 3: “Temporal queries are slow”
- Why: Missing indexes on temporal properties
- Fix: Create index: CREATE INDEX FOR (e:Episode) ON (e.start_time)
- Quick test: PROFILE MATCH (e:Episode) WHERE e.start_time > datetime()...
Definition of Done
- Can ingest conversation data (JSON/CSV format)
- Creates Episode nodes with temporal bounds
- Extracts and links mentioned entities
- Queries by time range work: “last week”, “December”
- Queries by entity work: “episodes about X”
- Can show timeline visualization for an entity
- Has at least 20 episodes with 50+ entity mentions
Project 3: Entity Extraction Pipeline
- File: P03-entity-extraction-pipeline.md
- Expanded Project Guide: P03-entity-extraction-pipeline.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: NLP, Information Extraction, LLMs
- Software or Tool: OpenAI/Anthropic API, Instructor, Pydantic
- Main Book: “Natural Language Processing with Transformers” by Tunstall et al.
What you will build: A pipeline that extracts structured entities (people, organizations, concepts) and relationships from unstructured text, outputting clean triples for knowledge graph ingestion.
Why it teaches temporal knowledge graphs: The knowledge graph is only as good as the data you put into it. This project tackles the hard problem: turning messy human text into structured facts. You’ll learn prompt engineering, structured output, and extraction pipelines.
Core challenges you will face:
- Getting LLMs to output structured data → Maps to Structured Output
- Extracting typed relationships → Maps to Relationship Extraction
- Handling extraction errors gracefully → Maps to Pipeline Robustness
- Balancing precision vs. recall → Maps to Extraction Quality
Real World Outcome
You’ll have a pipeline that takes raw text and outputs structured knowledge graph triples.
Example Session:
$ extract --text "Alice joined Acme Corp as CTO in 2023. She reports to Bob, the CEO."
Extracting from text (67 chars)...
Entities extracted:
┌─────────────┬─────────────┬────────────────────────────────┐
│ Name │ Type │ Properties │
├─────────────┼─────────────┼────────────────────────────────┤
│ Alice │ PERSON │ {} │
│ Bob │ PERSON │ {} │
│ Acme Corp │ ORG │ {} │
│ CTO │ ROLE │ {} │
│ CEO │ ROLE │ {} │
│ 2023 │ DATE │ {year: 2023} │
└─────────────┴─────────────┴────────────────────────────────┘
Relationships extracted:
┌─────────────┬─────────────┬─────────────┬──────────────────┐
│ Subject │ Predicate │ Object │ Properties │
├─────────────┼─────────────┼─────────────┼──────────────────┤
│ Alice │ WORKS_AT │ Acme Corp │ {since: 2023} │
│ Alice │ HAS_ROLE │ CTO │ {} │
│ Alice │ REPORTS_TO │ Bob │ {} │
│ Bob │ HAS_ROLE │ CEO │ {} │
│ Bob │ WORKS_AT │ Acme Corp │ {inferred: true} │
└─────────────┴─────────────┴─────────────┴──────────────────┘
$ extract --file meeting_notes.txt --output triples.json
Processing 2,456 chars across 3 paragraphs...
Extracted: 12 entities, 18 relationships
Confidence scores: min=0.72, avg=0.89, max=0.98
Output written to triples.json
$ extract --text "The project was cancelled" --validate
⚠ Warning: Low-information extraction
- No named entities found
- "project" is too generic without context
- Consider providing more context or entity hints
Output Format (triples.json):
{
"entities": [
{"id": "e1", "name": "Alice", "type": "PERSON", "confidence": 0.95},
{"id": "e2", "name": "Acme Corp", "type": "ORGANIZATION", "confidence": 0.92}
],
"relationships": [
{
"subject": "e1",
"predicate": "WORKS_AT",
"object": "e2",
"properties": {"since": "2023", "role": "CTO"},
"confidence": 0.88,
"source_span": [0, 35]
}
]
}
The Core Question You Are Answering
“How do I reliably convert unstructured text into structured knowledge graph facts?”
This is the information extraction challenge at the heart of knowledge graph construction. Without good extraction, your graph is empty or noisy. You’ll learn that extraction is not just NER—it’s typed relationships with properties.
Concepts You Must Understand First
- Structured Output from LLMs
- How do function calling and tool use work?
- What is the Instructor library?
- How do Pydantic models constrain outputs?
- Book Reference: OpenAI Function Calling documentation, Instructor docs
- Entity and Relationship Types
- What entity types should you extract? (PERSON, ORG, CONCEPT, DATE)
- What relationship types make sense? (WORKS_AT, KNOWS, RELATED_TO)
- How do you handle open-ended vs. constrained schemas?
- Book Reference: “NLP with Transformers” by Tunstall et al. - Ch. 10
- Extraction Quality
- What is precision vs. recall in extraction?
- How do you measure extraction quality?
- When should you favor precision over recall?
- Book Reference: “Speech and Language Processing” - Ch. 8
Questions to Guide Your Design
- Schema Definition
- What entity types does your domain need?
- What relationship types are most common?
- Should you use a fixed schema or allow open extraction?
- Prompt Engineering
- How do you instruct the LLM to extract consistently?
- Do you use few-shot examples?
- How do you handle edge cases in the prompt?
- Pipeline Architecture
- Single LLM call or multi-stage pipeline?
- How do you handle long documents?
- How do you validate outputs?
Thinking Exercise
Before coding, manually extract from this text:
"Yesterday, Sarah from Engineering mentioned that the new authentication
system developed by the Security team is causing issues with the mobile app.
She's scheduled a meeting with Marcus (mobile lead) and Chen (security)
for Friday to resolve this."
Extract all entities and relationships. For each:
- What type is it?
- What confidence would you assign?
- What relationships exist?
- What’s ambiguous or requires inference?
The Interview Questions They Will Ask
- “How do you handle extraction from text that’s longer than the context window?”
- “What’s the tradeoff between few-shot prompting and fine-tuning for extraction?”
- “How do you measure and improve extraction quality over time?”
- “What do you do when the LLM extracts hallucinated entities?”
- “How do you handle coreference resolution (e.g., ‘she’ referring to ‘Sarah’)?”
Hints in Layers
Hint 1: Starting Point
Use the Instructor library with Pydantic models. Define Entity and Relationship classes with required fields. Let the LLM fill in the structure.
Hint 2: Prompt Structure
Extract entities and relationships from the following text.
Entity types: PERSON, ORGANIZATION, ROLE, PROJECT, DATE
Relationship types: WORKS_AT, REPORTS_TO, WORKS_ON, KNOWS
Text: {text}
Hint 3: Validation
Add a validation step: check that all relationship subjects/objects are in the entity list. Check that entity types are from the allowed set.
Hint 4: Debugging
Log the raw LLM response before parsing. When extraction fails, compare expected vs. actual output. Build a test set of 20 sentences with expected extractions.
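The validation step from Hint 3 might look like the sketch below. It works on plain dictionaries shaped like the triples.json output so that it runs with the standard library alone; in a real pipeline the same checks would more likely live in Pydantic validators. The allowed type sets are copied from the Hint 2 prompt, and the function name `validate_extraction` is an assumption.

```python
# Allowed values taken from the Hint 2 prompt; adjust to your own schema.
ALLOWED_ENTITY_TYPES = {"PERSON", "ORGANIZATION", "ROLE", "PROJECT", "DATE"}
ALLOWED_PREDICATES = {"WORKS_AT", "REPORTS_TO", "WORKS_ON", "KNOWS"}


def validate_extraction(result):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    entity_ids = set()
    for ent in result.get("entities", []):
        entity_ids.add(ent["id"])
        if ent["type"] not in ALLOWED_ENTITY_TYPES:
            problems.append(f"entity {ent['id']}: unknown type {ent['type']}")
    for i, rel in enumerate(result.get("relationships", [])):
        if rel["predicate"] not in ALLOWED_PREDICATES:
            problems.append(f"relationship {i}: unknown predicate {rel['predicate']}")
        for end in ("subject", "object"):
            if rel[end] not in entity_ids:
                problems.append(
                    f"relationship {i}: {end} {rel[end]} is not an extracted entity"
                )
    return problems


good = {
    "entities": [
        {"id": "e1", "name": "Alice", "type": "PERSON"},
        {"id": "e2", "name": "Acme Corp", "type": "ORGANIZATION"},
    ],
    "relationships": [{"subject": "e1", "predicate": "WORKS_AT", "object": "e2"}],
}
bad = {
    "entities": [{"id": "e1", "name": "Alice", "type": "WIZARD"}],
    "relationships": [{"subject": "e1", "predicate": "WORKS_AT", "object": "e9"}],
}
print(validate_extraction(good))  # []
for problem in validate_extraction(bad):
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a single LLM response.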
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| LLM extraction | “NLP with Transformers” by Tunstall et al. | Ch. 10 |
| Structured output | Instructor documentation | All |
| Information extraction | “Speech and Language Processing” | Ch. 17-18 |
| Prompt engineering | “AI Engineering” by Chip Huyen | Ch. 3 |
Common Pitfalls and Debugging
Problem 1: “LLM outputs malformed JSON”
- Why: Not using structured output correctly
- Fix: Use Instructor or OpenAI function calling, not raw JSON prompts
- Quick test: Validate with Pydantic before processing
Problem 2: “Extracts entities but misses relationships”
- Why: Prompt focuses on entities, not relationships
- Fix: Explicitly prompt for relationship extraction in a second pass
- Quick test: Run on text with obvious relationships, count extracted vs. expected
Problem 3: “Too many/too few entities extracted”
- Why: Prompt ambiguity on what counts as an entity
- Fix: Provide explicit examples and counter-examples in prompt
- Quick test: Check precision/recall on a labeled test set
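For the precision/recall quick test, one possible scoring helper is shown below. It treats each extraction as a hashable tuple and uses exact matching, which is a simplifying assumption: real evaluations often need fuzzy matching on names. The sample triples are hypothetical.

```python
def precision_recall(predicted, expected):
    """Exact-match precision and recall over two collections of hashable items."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical labeled example: 5 expected triples, 4 predicted, 3 correct.
expected = {
    ("Alice", "WORKS_AT", "Acme Corp"),
    ("Alice", "HAS_ROLE", "CTO"),
    ("Alice", "REPORTS_TO", "Bob"),
    ("Bob", "HAS_ROLE", "CEO"),
    ("Bob", "WORKS_AT", "Acme Corp"),
}
predicted = {
    ("Alice", "WORKS_AT", "Acme Corp"),
    ("Alice", "HAS_ROLE", "CTO"),
    ("Alice", "REPORTS_TO", "Bob"),
    ("Alice", "KNOWS", "Bob"),  # not in the labels: counts against precision
}
print(precision_recall(predicted, expected))  # (0.75, 0.6)
```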
Definition of Done
- Extracts entities with type and confidence
- Extracts relationships with properties
- Handles multi-sentence input
- Outputs JSON suitable for graph ingestion
- Achieves >80% precision on test set of 20 sentences
- Handles input with no entities gracefully
- Documents supported entity and relationship types
Project 4: Entity Resolution System
- File: P04-entity-resolution-system.md
- Expanded Project Guide: P04-entity-resolution-system.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Entity Resolution, Deduplication, NLP
- Software or Tool: Neo4j, Embedding Model, Python
- Main Book: “Speech and Language Processing” by Jurafsky & Martin
What you will build: A system that determines when two entity mentions refer to the same real-world entity (“Bob Smith”, “Robert”, “Bob S.”) and merges them in the knowledge graph.
Why it teaches temporal knowledge graphs: Without entity resolution, your graph becomes polluted with duplicates. “Bob” and “Robert” might be the same person, mentioned in different conversations. This project teaches you to maintain graph integrity as data grows.
Core challenges you will face:
- Detecting potential duplicates → Maps to Similarity Metrics
- Deciding when to merge → Maps to Threshold Tuning
- Merging without data loss → Maps to Graph Operations
- Handling false positives → Maps to Human-in-the-loop
Real World Outcome
You’ll have a system that finds duplicate entities and merges them, keeping your knowledge graph clean.
Example Session:
$ resolve scan --threshold 0.8
Scanning 156 entities for potential duplicates...
High-confidence matches (auto-merge candidates):
┌──────────────────┬──────────────────┬─────────┬────────────┐
│ Entity A │ Entity B │ Score │ Evidence │
├──────────────────┼──────────────────┼─────────┼────────────┤
│ Bob Smith │ Robert Smith │ 0.95 │ same email │
│ Acme Corp │ Acme Corporation │ 0.92 │ alias │
│ Q1 Planning │ Q1 Planning Mtg │ 0.89 │ overlap │
└──────────────────┴──────────────────┴─────────┴────────────┘
Medium-confidence matches (review recommended):
┌──────────────────┬──────────────────┬─────────┬────────────┐
│ Entity A │ Entity B │ Score │ Evidence │
├──────────────────┼──────────────────┼─────────┼────────────┤
│ Alice │ Alice Chen │ 0.75 │ name │
│ Platform Team │ Platform │ 0.71 │ substring │
└──────────────────┴──────────────────┴─────────┴────────────┘
$ resolve merge "Bob Smith" "Robert Smith"
Merging: Robert Smith → Bob Smith
Before merge:
Bob Smith: 12 relationships, 3 episodes
Robert Smith: 8 relationships, 2 episodes
After merge:
Bob Smith: 18 relationships, 5 episodes (2 deduped)
Robert Smith: archived as alias
✓ Merge complete. Created alias: Robert Smith → Bob Smith
$ resolve history
Recent resolution actions:
2025-01-03 14:30: Merged "Robert Smith" → "Bob Smith"
2025-01-03 14:28: Marked "Alice" ≠ "Alice Chen" (different people)
2025-01-02 10:15: Auto-merged "Acme Corp" → "Acme Corporation"
$ resolve undo --last
Undoing: Merged "Robert Smith" → "Bob Smith"
✓ Entities restored to pre-merge state
The Core Question You Are Answering
“When do two different mentions refer to the same real-world entity, and how do I safely merge them?”
This is the entity resolution problem—fundamental to any knowledge base. Without it, you have a graph full of duplicates that fragment your knowledge. With it, you can confidently say “Bob” in conversation 1 and “Robert” in conversation 5 are the same person.
Concepts You Must Understand First
- Similarity Metrics
- What is string similarity? (Levenshtein, Jaro-Winkler, fuzzy matching)
- What is embedding similarity? (cosine, euclidean)
- When do you use string vs. embedding similarity?
- Book Reference: “Speech and Language Processing” - Ch. 2
- Blocking Strategies
- How do you avoid comparing every pair (O(n²))?
- What is blocking? (grouping by first letter, type, etc.)
- How do you balance recall vs. efficiency?
- Book Reference: Entity Resolution literature (academic papers)
- Merge Strategies
- How do you combine properties from merged entities?
- How do you handle relationship conflicts?
- Should you delete or archive the merged entity?
- Book Reference: “Designing Data-Intensive Applications” - Ch. 5
Questions to Guide Your Design
- Candidate Generation
- How do you find potential duplicates without checking all pairs?
- What properties indicate likely matches?
- How do you handle different entity types?
- Scoring
- What features contribute to the match score?
- How do you weight different signals?
- What threshold separates matches from non-matches?
- Merge Operations
- What happens to relationships of the merged entity?
- How do you preserve provenance?
- Can merges be undone?
Thinking Exercise
Before coding, work through this scenario:
Your graph has these entities:
e1: {name: "Bob", type: PERSON, email: "bob@acme.com"}
e2: {name: "Robert Smith", type: PERSON, department: "Engineering"}
e3: {name: "Bob S.", type: PERSON}
e4: {name: "Bob", type: PROJECT} // Different Bob!
And these relationships:
e1 -[WORKS_AT]-> Acme
e2 -[WORKS_AT]-> Acme
e3 -[MANAGES]-> Project X
e4 -[OWNED_BY]-> Engineering
Work through:
- Which entities might be the same person?
- What evidence supports/opposes each merge?
- If e1 and e2 are merged, what’s the result?
- How do you avoid merging e1 with e4 (different types)?
The Interview Questions They Will Ask
- “How do you scale entity resolution to millions of entities?”
- “What’s the difference between precision and recall in entity resolution?”
- “How do you handle transitivity? If A=B and B=C, does A=C?”
- “What signals beyond name similarity help identify matches?”
- “How do you handle entity resolution when new data arrives continuously?”
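The transitivity question is commonly handled by clustering pairwise matches with a union-find (disjoint-set) structure, so that A=B and B=C place all three in one cluster. The sketch below is one way to do it; `cluster_matches` and the sample IDs are assumptions for illustration. Note that blind transitive closure can chain weak matches together, so a production system would likely add a per-cluster sanity check.

```python
def cluster_matches(entity_ids, matched_pairs):
    """Group entities into clusters implied by pairwise matches (union-find)."""
    parent = {e: e for e in entity_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for e in entity_ids:
        clusters.setdefault(find(e), set()).add(e)
    return list(clusters.values())


# e1=e2 and e2=e3 imply e1=e3; e4 stays on its own.
clusters = cluster_matches(["e1", "e2", "e3", "e4"], [("e1", "e2"), ("e2", "e3")])
print(sorted(sorted(c) for c in clusters))  # [['e1', 'e2', 'e3'], ['e4']]
```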
Hints in Layers
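Hint 1: Starting Point
Start with same-type entities only. Use fuzzy string matching for names, for example the fuzz module in rapidfuzz or thefuzz. Those scorers return 0-100, so divide by 100 before comparing against a threshold; a normalized score above 0.9 is a likely match.
Hint 2: Better Signals
Add context: same email = strong match. Connected to same organization = medium signal. Embedding similarity on entity descriptions = semantic signal for names that do not look alike.
Hint 3: Blocking
Group by entity type and first letter. Only compare within blocks. This cuts the O(n²) all-pairs comparison down to roughly the sum of the squared block sizes, which is far smaller when blocks are small and evenly sized. The cost is recall: first-letter blocking never compares “Bob” with “Robert”, so add a second blocking key (shared email, shared organization) for such cases.
Hint 4: Debugging
Keep a resolution log. When you find a false positive (wrongly merged), analyze what signals were misleading. Build a labeled test set.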
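The blocking and scoring hints can be combined as in the sketch below, run against the entities from the thinking exercise. Two things here are assumptions rather than the project's method: `difflib.SequenceMatcher` from the standard library stands in for a dedicated fuzzy-matching library, and the rule "shared email is decisive" is a hypothetical scoring choice.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(entity):
    """Block on entity type plus the lowercased first letter of the name."""
    return (entity["type"], entity["name"][:1].lower())


def candidate_pairs(entities):
    """Yield only pairs that share a block, instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for ent in entities:
        blocks[block_key(ent)].append(ent)
    for members in blocks.values():
        yield from combinations(members, 2)


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stdlib stand-in for a fuzz library)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_pair(a, b):
    """Hypothetical rule: a shared email is decisive, otherwise use the name."""
    if a.get("email") and a.get("email") == b.get("email"):
        return 1.0
    return name_similarity(a["name"], b["name"])


entities = [
    {"id": "e1", "name": "Bob", "type": "PERSON", "email": "bob@acme.com"},
    {"id": "e2", "name": "Robert Smith", "type": "PERSON"},
    {"id": "e3", "name": "Bob S.", "type": "PERSON"},
    {"id": "e4", "name": "Bob", "type": "PROJECT"},  # different type, different block
]
pairs = [
    (a["id"], b["id"], round(score_pair(a, b), 2))
    for a, b in candidate_pairs(entities)
]
print(pairs)
```

Only e1 and e3 are compared: e4 is excluded by type, and e2 ("Robert Smith") lands in a different first-letter block, which shows how blocking trades recall for speed.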
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| String matching | “Speech and Language Processing” | Ch. 2 |
| Entity resolution | “Data Matching” by Christen | Ch. 3-5 |
| Embedding similarity | “NLP with Transformers” | Ch. 5 |
| Graph merging | Neo4j APOC documentation | Merge functions |
Common Pitfalls and Debugging
Problem 1: “Merged entities that shouldn’t be merged”
- Why: Threshold too low, or missing negative signals
- Fix: Add type checking, raise threshold, require multiple signals
- Quick test: Check if entity types match before considering merge
Problem 2: “Obvious duplicates not detected”
- Why: Blocking too aggressive, or similarity metric inappropriate
- Fix: Use embedding similarity for semantic matches, relax blocking
- Quick test: Manually add known duplicates, verify detection
Problem 3: “Merge breaks relationships”
- Why: Relationship direction or properties not preserved
- Fix: Use MERGE with ON CREATE/ON MATCH to preserve data
- Quick test: Count relationships before/after merge
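To make the before/after relationship count concrete, here is an in-memory sketch of a merge that rewires relationships, preserves direction, drops exact duplicates, and records an alias. It models relationships as `(subject, predicate, object)` tuples, which is an assumption for illustration; in Neo4j the same steps would be expressed in Cypher or APOC.

```python
def merge_entities(relationships, aliases, keep, drop):
    """Rewire every relationship from `drop` to `keep`, dropping exact duplicates.

    relationships: list of (subject, predicate, object) tuples.
    aliases: dict mapping an archived name to its canonical name (mutated).
    """
    merged, seen = [], set()
    for subj, pred, obj in relationships:
        # Direction and predicate are preserved; only the endpoint changes.
        triple = (keep if subj == drop else subj, pred, keep if obj == drop else obj)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    aliases[drop] = keep
    return merged


relationships = [
    ("Bob Smith", "WORKS_AT", "Acme"),
    ("Robert Smith", "WORKS_AT", "Acme"),  # becomes a duplicate after the merge
    ("Robert Smith", "MANAGES", "Project X"),
    ("Alice", "REPORTS_TO", "Robert Smith"),  # incoming edge must be rewired too
]
aliases = {}
merged = merge_entities(relationships, aliases, keep="Bob Smith", drop="Robert Smith")
print(len(relationships), "->", len(merged))  # 4 -> 3
print(aliases)  # {'Robert Smith': 'Bob Smith'}
```

Keeping the original `relationships` list untouched gives you the pre-merge state needed for an undo.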
Definition of Done
- Scans graph for potential duplicates
- Scores pairs with explainable signals
- Merges entities preserving all relationships
- Creates aliases for merged names
- Can undo recent merges
- Achieves >90% precision on labeled test set
- Handles continuous resolution as new entities arrive
Project 5: Bi-Temporal Fact Store
- File: P05-bi-temporal-fact-store.md
- Expanded Project Guide: P05-bi-temporal-fact-store.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Temporal Databases, Bi-Temporal Modeling
- Software or Tool: Neo4j, PostgreSQL (optional), Python
- Main Book: “Temporal Data & The Relational Model” by Date, Darwen & Lorentzos
What you will build: A storage system where every fact has two timestamps—when it was true in the real world (valid_time) and when it was recorded (transaction_time)—enabling queries like “What did we believe about X on date Y?” and “Show me the history of changes to this fact.”
Why it teaches temporal knowledge graphs: This is the core temporal infrastructure. Bi-temporal models let you track not just what’s true now, but what was true when, and what you knew when. This is essential for AI agents that need to reason about time and handle corrections.
Core challenges you will face:
- Implementing two time dimensions → Maps to Bi-Temporal Data Models
- Invalidating facts without deleting → Maps to Temporal Versioning
- Point-in-time queries → Maps to Temporal Query Patterns
- Handling time zone edge cases → Maps to Temporal Data Handling
Real World Outcome
You’ll have a fact store that tracks truth across two time dimensions, enabling sophisticated temporal queries.
Example Session:
$ facts add "Alice works at Acme" --valid-from 2023-01-15
✓ Fact stored
valid_time: [2023-01-15, ∞)
transaction_time: [2025-01-03T14:30:00, ∞)
$ facts add "Alice works at TechCorp" --valid-from 2024-06-01
✓ Fact stored (supersedes previous employment)
Invalidated: (Alice)-[WORKS_AT]->(Acme) valid_time now [2023-01-15, 2024-06-01)
New: (Alice)-[WORKS_AT]->(TechCorp) valid_time [2024-06-01, ∞)
$ facts query "Where did Alice work in March 2023?"
Point-in-time query: valid_time contains 2023-03-15
Result: Alice worked at Acme
Fact: (Alice)-[WORKS_AT]->(Acme)
Valid: 2023-01-15 to 2024-06-01
Recorded: 2025-01-03T14:30:00
$ facts query "Where did Alice work in August 2024?"
Point-in-time query: valid_time contains 2024-08-15
Result: Alice worked at TechCorp
Fact: (Alice)-[WORKS_AT]->(TechCorp)
Valid: 2024-06-01 to present
Recorded: 2025-01-03T14:32:00
$ facts history "Alice employment"
Employment history for Alice:
Timeline (valid_time):
2023-01-15 ────────────────── 2024-06-01 ────────────────── present
│ Acme Corp │ TechCorp │
└─────────────────────────────┴─────────────────────────┘
Transaction history:
2025-01-03T14:30:00: Recorded "Alice works at Acme" (valid from 2023-01-15)
2025-01-03T14:32:00: Recorded "Alice works at TechCorp" (valid from 2024-06-01)
→ Invalidated Acme employment at 2024-06-01
$ facts as-of 2025-01-03T14:31:00 "Where does Alice work?"
As-of query: transaction_time <= 2025-01-03T14:31:00
Result: Alice works at Acme (we didn't know about TechCorp yet)
The Core Question You Are Answering
“How do I track both when facts were true and when I learned them, so I can query any historical state?”
This is the bi-temporal challenge. In the real world, facts change (Alice changes jobs) and our knowledge changes (we learn about it later). A bi-temporal store lets you separate these concerns and answer questions like “What did we believe last Tuesday?”
Concepts You Must Understand First
- Bi-Temporal Dimensions
- What is valid time? (when fact is true in the world)
- What is transaction time? (when fact was recorded)
- Why do you need both?
- Book Reference: “Temporal Data & The Relational Model” - Ch. 1-3
- Temporal Intervals
- How do you represent [start, end) intervals?
- What does “open” vs. “closed” interval mean?
- How do you query interval overlap?
- Book Reference: Allen’s Interval Algebra
- Fact Invalidation
- How do you “delete” without losing history?
- What happens when facts contradict?
- How do you handle retroactive corrections?
- Book Reference: “Designing Data-Intensive Applications” - Ch. 11
Questions to Guide Your Design
- Time Representation
- Do you store instants or intervals? ([start, end) is common)
- How do you represent “forever” (infinity)?
- How do you handle time zones?
- Fact Updates
- When a fact changes, do you modify the existing record or insert a new one?
- How do you link old and new versions?
- Can you have overlapping valid times?
- Query Patterns
- How do you query “as of” a transaction time?
- How do you query “at” a valid time?
- How do you query the intersection?
Thinking Exercise
Before coding, work through this timeline:
Events in the real world:
- Jan 2023: Alice joins Acme
- Jun 2024: Alice moves to TechCorp
- Sep 2024: We learn Alice was actually at Acme since Dec 2022 (retroactive correction)
Recording timeline:
- Day 1: We record “Alice at Acme from Jan 2023”
- Day 2: We record “Alice at TechCorp from Jun 2024”
- Day 3: We correct: “Alice at Acme from Dec 2022” (not Jan 2023)
Draw the bi-temporal table/graph showing:
- What does the database look like after each day?
- Query “Where was Alice in Feb 2023?” on Day 2 vs. Day 3
- What rows/nodes have been invalidated?
The Interview Questions They Will Ask
- “Explain the difference between valid time and transaction time with a concrete example.”
- “How would you implement ‘as of’ queries efficiently?”
- “What indexes do you need for bi-temporal queries?”
- “How do you handle corrections to historical facts?”
- “What are the storage implications of bi-temporal vs. single-temporal?”
Hints in Layers
Hint 1: Starting Point
Add four properties to relationships: valid_from, valid_to, txn_from, txn_to. Use datetime.max or a far-future date for “infinity.”
Hint 2: Invalidation Pattern
Never delete. When a fact changes, set valid_to on the old fact and create a new fact. Set txn_to when correcting history.
Hint 3: Query Pattern
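// Point-in-time: valid_time contains target_date
// Assumes an open-ended valid_to holds a far-future sentinel, and txn_to is NULL while the row is current
MATCH (a)-[r:WORKS_AT]->(c)
WHERE r.valid_from <= $target_date < r.valid_to
AND r.txn_to IS NULL // current knowledge
RETURN a, c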
Hint 4: Debugging
Create a “timeline view” that visualizes both dimensions. Test with known facts and verify queries return expected results at each time point.
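The hints above can be tried out without a database. The sketch below is an in-memory bi-temporal store that reproduces the Alice example session. Several choices are assumptions rather than the project's prescribed design: the `Fact` and `FactStore` names, a far-future sentinel for every open interval, and the decision to version the old row when it is superseded (close its transaction interval and append a copy with the bounded valid interval), which is what lets the as-of query still see the earlier belief.

```python
from dataclasses import dataclass, replace
from datetime import datetime

FOREVER = datetime(9999, 12, 31)  # far-future sentinel for open-ended intervals


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    txn_from: datetime
    valid_to: datetime = FOREVER
    txn_to: datetime = FOREVER


class FactStore:
    def __init__(self):
        self.rows = []  # rows are never deleted, only closed via txn_to

    def add(self, subject, predicate, obj, valid_from, now):
        """Record a fact at transaction time `now`, superseding any open-ended
        current fact with the same subject and predicate."""
        for i in range(len(self.rows)):
            row = self.rows[i]
            is_current = row.txn_to == FOREVER and row.valid_to == FOREVER
            if is_current and (row.subject, row.predicate) == (subject, predicate):
                self.rows[i] = replace(row, txn_to=now)  # retire the old belief
                self.rows.append(replace(row, valid_to=valid_from, txn_from=now))
        self.rows.append(Fact(subject, predicate, obj, valid_from, txn_from=now))

    def query(self, subject, predicate, valid_at, as_of):
        """Objects believed at transaction time `as_of` to hold at `valid_at`."""
        return [
            r.obj
            for r in self.rows
            if (r.subject, r.predicate) == (subject, predicate)
            and r.valid_from <= valid_at < r.valid_to
            and r.txn_from <= as_of < r.txn_to
        ]


store = FactStore()
store.add("Alice", "WORKS_AT", "Acme", datetime(2023, 1, 15),
          now=datetime(2025, 1, 3, 14, 30))
store.add("Alice", "WORKS_AT", "TechCorp", datetime(2024, 6, 1),
          now=datetime(2025, 1, 3, 14, 32))

today = datetime(2025, 1, 4)
print(store.query("Alice", "WORKS_AT", datetime(2023, 3, 15), as_of=today))  # ['Acme']
print(store.query("Alice", "WORKS_AT", datetime(2024, 8, 15), as_of=today))  # ['TechCorp']
# What did we believe at 14:31, before TechCorp was recorded?
print(store.query("Alice", "WORKS_AT", datetime(2025, 1, 3),
                  as_of=datetime(2025, 1, 3, 14, 31)))  # ['Acme']
```

This sketch does not handle retroactive corrections such as the Day 3 case in the thinking exercise; that requires closing a row whose valid interval is already bounded.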
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Bi-temporal fundamentals | “Temporal Data & The Relational Model” | Ch. 1-6 |
| Interval queries | Neo4j Temporal documentation | All |
| Event sourcing | “Designing Data-Intensive Applications” | Ch. 11 |
| Time handling | “Pragmatic Programmer” | Time zone chapter |
Common Pitfalls and Debugging
Problem 1: “Queries return duplicates”
- Why: Not filtering by transaction_time (seeing all versions)
- Fix: Add txn_to IS NULL for current knowledge queries
- Quick test: Count results: should match expected unique facts
Problem 2: “Infinite dates cause errors”
- Why: datetime.max doesn’t work in some systems
- Fix: Use a far-future date like ‘9999-12-31’ consistently
- Quick test: Insert and query a currently-valid fact
Problem 3: “Retroactive corrections don’t show”
- Why: Only querying valid_time, ignoring transaction_time
- Fix: Include as-of transaction_time in historical queries
- Quick test: Correct a fact, query before/after correction time
Definition of Done
- Facts have valid_time [from, to) intervals
- Facts have transaction_time [from, to) intervals
- Can add facts with valid_from date
- Updates invalidate old facts (don’t delete)
- Point-in-time queries work for any valid_time
- As-of queries work for any transaction_time
- Can show full history of a fact
- Handles time zones consistently
Project 6: Temporal Query Engine
- File: P06-temporal-query-engine.md
- Expanded Project Guide: P06-temporal-query-engine.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Query Languages, Temporal Reasoning
- Software or Tool: Neo4j, Custom DSL, Python
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you will build: A query interface that translates natural language temporal expressions (“last month”, “before the meeting”, “when Alice was at Acme”) into precise graph queries with temporal filters.
Why it teaches temporal knowledge graphs: Users don’t think in ISO timestamps—they think in human time expressions. This project bridges natural language to precise temporal queries, making your memory system actually usable.
Core challenges you will face:
- Parsing temporal expressions → Maps to NLP for time
- Translating to Cypher → Maps to Query Generation
- Handling relative times → Maps to Temporal Context
- Supporting Allen’s relations → Maps to Interval Algebra
Real World Outcome
You’ll have a query engine that understands human time expressions and returns temporally-precise results.
Example Session:
$ temporal-query "What did we discuss last week?"
Parsing temporal expression: "last week"
Resolved to: [2024-12-27, 2025-01-03)
Cypher generated:
MATCH (e:Episode)
WHERE e.start_time >= datetime('2024-12-27')
AND e.start_time < datetime('2025-01-03')
RETURN e ORDER BY e.start_time DESC
Results: 4 episodes found
[List of episodes with summaries]
$ temporal-query "Who worked at Acme when Alice was there?"
Parsing: "when Alice was at Acme"
Resolved: Looking up Alice's Acme employment period...
Found: [2023-01-15, 2024-06-01)
Cypher generated:
MATCH (alice:Person {name: "Alice"})-[r1:WORKS_AT]->(acme:Org {name: "Acme"})
MATCH (other:Person)-[r2:WORKS_AT]->(acme)
WHERE r2.valid_from < r1.valid_to AND r2.valid_to > r1.valid_from
AND other <> alice
RETURN DISTINCT other
Results: Bob, Carol, David worked at Acme during Alice's tenure
$ temporal-query "Show meetings before the Q3 planning session"
Parsing: "before the Q3 planning session"
Resolving reference: "Q3 planning session"...
Found: Episode "Q3 Planning" at 2024-07-15T10:00:00
Cypher generated:
MATCH (e:Episode)
WHERE e.end_time < datetime('2024-07-15T10:00:00')
RETURN e ORDER BY e.end_time DESC LIMIT 10
Results: 10 episodes before Q3 planning
$ temporal-query "What changed between version 1.0 and 2.0 release?"
Parsing: Resolving "version 1.0 release" and "version 2.0 release"...
Found: v1.0 released 2024-03-01, v2.0 released 2024-09-01
Showing facts that changed between [2024-03-01, 2024-09-01):
- Alice: Acme → TechCorp (changed 2024-06-01)
- Platform team: +3 members, -1 member
- 47 new episodes recorded
The Core Question You Are Answering
“How do I translate human time expressions into precise database queries?”
Users say “last week” or “during the project.” Your system must understand these expressions and generate queries that return correct results. This is the usability layer that makes temporal knowledge graphs practical.
Concepts You Must Understand First
- Temporal Expression Parsing
- What are absolute vs. relative time references?
- How do you parse “last week”, “next month”, “yesterday”?
- What libraries exist for temporal NLP?
- Book Reference: “Speech and Language Processing” - Temporal expressions chapter
- Allen’s Interval Relations
- What are the 13 interval relations? (before, after, meets, overlaps, etc.)
- How do you express “during”, “while”, “before” in queries?
- When do you need interval vs. instant queries?
- Book Reference: Allen’s Interval Algebra paper
- Query Generation
- How do you safely generate Cypher from user input?
- How do you parameterize temporal filters?
- How do you optimize temporal queries?
- Book Reference: Neo4j Query Tuning documentation
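As a study aid for Allen's interval relations, the sketch below classifies any two intervals into one of the 13 relations and adds the simpler overlap test used in the Cypher examples. The function names are assumptions. Endpoints are compared as plain values; with half-open [start, end) intervals, "meets" means the intervals touch but share no instant.

```python
from datetime import date


def allen_relation(a, b):
    """Name the Allen relation of interval a relative to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


def share_time(a, b):
    """True when two half-open [start, end) intervals have an instant in common."""
    return a[0] < b[1] and b[0] < a[1]


acme = (date(2023, 1, 15), date(2024, 6, 1))
techcorp = (date(2024, 6, 1), date(9999, 12, 31))
q3_2023 = (date(2023, 7, 1), date(2023, 10, 1))
print(allen_relation(acme, techcorp))  # meets
print(allen_relation(q3_2023, acme))   # during
print(share_time(acme, techcorp))      # False
```

`share_time` is the same condition as `r2.valid_from < r1.valid_to AND r2.valid_to > r1.valid_from`; it covers every relation except before, after, meets, and met-by.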
Questions to Guide Your Design
- Expression Types
- What temporal expressions will you support?
- How do you handle ambiguous expressions (“this week”)?
- Do you need entity-relative times (“when Alice was at Acme”)?
- Resolution Strategy
- How do you resolve “last week” to dates?
- What’s the reference time for relative expressions?
- How do you handle multiple time zones?
- Query Output
- Do you generate Cypher directly or use an intermediate representation?
- How do you explain what temporal filter was applied?
- How do you handle queries that find nothing?
Thinking Exercise
Before coding, parse these expressions manually:
For each expression, determine the interval [start, end):
- “last week” (assume today is 2025-01-03, Friday)
- “Q4 2024”
- “before the launch” (launch was 2024-11-15T09:00:00)
- “when Bob was on the platform team”
- “the meeting after standup” (standup was at 9am today)
For expressions 4 and 5, what entity lookups are needed first?
The Interview Questions They Will Ask
- “How do you handle ambiguous temporal expressions like ‘this morning’?”
- “Explain Allen’s interval relations and when you’d use ‘overlaps’ vs. ‘during’.”
- “How do you optimize temporal range queries in a graph database?”
- “What’s the complexity of interval overlap queries?”
- “How do you handle queries that span multiple time zones?”
Hints in Layers
Hint 1: Starting Point
Use the dateparser or parsedatetime Python library for basic temporal expression parsing. Map their output to datetime intervals.
Hint 2: Expression Categories
Categorize expressions:
- Absolute: “January 2024”, “2024-01-15”
- Relative: “last week”, “yesterday”, “3 days ago”
- Entity-relative: “when X was at Y”, “before the meeting”
Hint 3: Query Templates
Create Cypher templates with placeholders:
// Template: episodes in time range
MATCH (e:Episode)
WHERE e.start_time >= $start AND e.start_time < $end
RETURN e
Hint 4: Debugging
Always show the resolved interval to users. If results are unexpected, they can see “last week resolved to [Dec 27 - Jan 3)” and spot the issue.
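A hand-rolled resolver for two of the expression types makes the interval semantics explicit. The sketch below is illustrative only: the function names and the `mode` switch are assumptions, and it uses the standard library instead of a parsing library. It shows both readings of "last week" (rolling 7 days, as in the example session, versus the previous calendar week) and pairs the Hint 3 template with parameters instead of splicing values into the query string. It also assumes `start_time` is stored as a datetime; whether the parameters must be zone-aware depends on how the property was written.

```python
from datetime import date, datetime, time, timedelta


def resolve_last_week(today, mode="rolling"):
    """Resolve "last week" to a half-open [start, end) date interval.

    mode="rolling":  the 7 days ending today.
    mode="calendar": the previous Monday-to-Monday calendar week.
    """
    if mode == "rolling":
        return today - timedelta(days=7), today
    this_monday = today - timedelta(days=today.weekday())  # Monday is weekday 0
    return this_monday - timedelta(days=7), this_monday


def resolve_quarter(year, quarter):
    """Resolve e.g. "Q4 2024" to a half-open [start, end) date interval."""
    start = date(year, 3 * (quarter - 1) + 1, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
    return start, end


def episode_query(start, end):
    """Return the fixed Cypher template plus its parameters (midnight datetimes)."""
    cypher = (
        "MATCH (e:Episode) "
        "WHERE e.start_time >= $start AND e.start_time < $end "
        "RETURN e"
    )
    params = {
        "start": datetime.combine(start, time.min),
        "end": datetime.combine(end, time.min),
    }
    return cypher, params


today = date(2025, 1, 3)  # a Friday
print(resolve_last_week(today))                   # 2024-12-27 .. 2025-01-03
print(resolve_last_week(today, mode="calendar"))  # 2024-12-23 .. 2024-12-30
print(resolve_quarter(2024, 4))                   # 2024-10-01 .. 2025-01-01
cypher, params = episode_query(*resolve_last_week(today))
print(cypher, params)
```

Because user text only ever influences `params`, never the query string, the template approach also answers the "safely generate Cypher" question for this class of query.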
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Temporal NLP | “Speech and Language Processing” | Ch. 20 |
| Interval algebra | Allen 1983 paper | All |
| Query optimization | Neo4j documentation | Query tuning |
| Date parsing | dateparser library | Documentation |
Common Pitfalls and Debugging
Problem 1: “Last week” returns wrong dates
- Why: Week start ambiguity (Sunday vs. Monday)
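- Fix: Choose a week-start convention explicitly, compute the interval in your own code rather than relying on parser defaults, and document the behavior
- Quick test: Test on a Monday: with calendar-week semantics, “last week” should be the previous Mon-Sun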
Problem 2: “Entity-relative queries are slow”
- Why: Doing two queries sequentially instead of joining
- Fix: Combine into single query with pattern matching
- Quick test: Use EXPLAIN to check query plan
Problem 3: “Ambiguous expressions return nothing”
- Why: Parsed to unexpected interval
- Fix: Show resolved interval to user, ask for clarification
- Quick test: Always log what interval was resolved to
Definition of Done
- Parses absolute dates: “January 15, 2024”
- Parses relative dates: “last week”, “yesterday”
- Parses ranges: “between X and Y”
- Handles entity-relative: “when Alice was at Acme”
- Generates valid Cypher queries
- Shows resolved interval to user
- Returns results with temporal context
- Handles edge cases gracefully (no results, ambiguous)
Project 7: Semantic Memory Synthesizer
- File: P07-semantic-memory-synthesizer.md
- Expanded Project Guide: P07-semantic-memory-synthesizer.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: LLM Summarization, Memory Consolidation
- Software or Tool: OpenAI/Anthropic API, Neo4j, Python
- Main Book: “AI Engineering” by Chip Huyen
What you will build: A system that synthesizes episodic memories into semantic facts—taking raw conversation episodes and distilling them into general knowledge (“Alice is an expert in Kubernetes”) that can be queried independently.
Why it teaches temporal knowledge graphs: This is the consolidation process—turning “what happened” into “what we know.” Just like human memory consolidates episodes into general knowledge during sleep, your system needs to extract durable facts from transient conversations.
Core challenges you will face:
- Summarization without hallucination → Maps to Grounded Generation
- Fact extraction from summaries → Maps to Entity/Relationship Extraction
- Contradiction detection → Maps to Knowledge Graph Consistency
- Provenance tracking → Maps to Episodic-Semantic Linking
Real World Outcome
You’ll have a synthesizer that processes episodes and creates semantic facts with source attribution.
Example Session:
$ synthesize --episodes last-week
Processing 12 episodes from last week...
Episode analysis:
[2024-12-30] Team standup: Alice mentioned K8s deployment issues
[2024-12-31] 1:1 with Alice: Deep dive on K8s networking
[2025-01-02] Planning: Alice assigned as K8s migration lead
Synthesized semantic facts:
NEW FACT: Alice has expertise in Kubernetes
Confidence: 0.89
Evidence: 3 episodes (Dec 30, Dec 31, Jan 2)
Reasoning: Multiple mentions of K8s expertise + assigned as lead
NEW FACT: Kubernetes migration is in progress
Confidence: 0.92
Evidence: 2 episodes (Dec 30, Jan 2)
Reasoning: Deployment issues + planning for migration
UPDATED FACT: Alice's role expanded
Previous: Alice works on Platform team
New: Alice leads Kubernetes migration on Platform team
Confidence: 0.85
$ synthesize --show-reasoning "Alice has expertise in Kubernetes"
Fact: (Alice)-[HAS_EXPERTISE]->(Kubernetes)
Source episodes:
┌─────────────┬────────────────────────────────────────────────────────┐
│ Date │ Evidence │
├─────────────┼────────────────────────────────────────────────────────┤
│ Dec 30 │ "Alice is debugging the K8s deployment issues" │
│ Dec 31 │ "Discussed K8s CNI plugins and network policies" │
│ Jan 02 │ "Alice will lead the K8s migration project" │
└─────────────┴────────────────────────────────────────────────────────┘
Synthesis reasoning:
1. Debugging implies hands-on knowledge (weight: 0.3)
2. Technical discussion implies deep understanding (weight: 0.4)
3. Assigned as lead implies recognized expertise (weight: 0.3)
Combined confidence: 0.89
$ synthesize --detect-contradictions
Scanning semantic facts for contradictions...
⚠ POTENTIAL CONTRADICTION DETECTED:
Fact 1: Alice works at Acme (added 2024-01-15, confidence: 0.95)
Fact 2: Alice works at TechCorp (added 2024-06-15, confidence: 0.90)
Resolution options:
1. [SUPERSEDE] Alice moved from Acme to TechCorp
2. [CONCURRENT] Alice works at both (consulting?)
3. [ERROR] One fact is incorrect
→ Auto-resolved: SUPERSEDE (temporal sequence suggests job change)
Updated: (Alice)-[WORKS_AT]->(Acme) valid_to = 2024-06-01
The Core Question You Are Answering
“How do I distill general knowledge from specific conversations without hallucinating or losing provenance?”
This is memory consolidation—the process of turning episodic “what happened” into semantic “what we know.” The challenge is doing this accurately while maintaining links back to sources for verification.
Concepts You Must Understand First
- Memory Consolidation
- How does episodic → semantic transfer work in humans?
- What makes a fact “general enough” to extract?
- When should facts remain episode-specific?
- Book Reference: “AI Engineering” by Chip Huyen - Ch. 8
- Grounded Summarization
- How do you prevent LLM hallucination in synthesis?
- What is faithful summarization?
- How do you verify synthesized facts?
- Book Reference: “NLP with Transformers” - Ch. 6
- Contradiction Handling
- How do you detect contradictory facts?
- What resolution strategies exist?
- When is a contradiction actually an update?
- Book Reference: Knowledge Base literature
Questions to Guide Your Design
- Synthesis Trigger
- When do you synthesize? (periodic, on-demand, threshold-based)
- How many episodes should inform a semantic fact?
- What confidence threshold for creating facts?
- Fact Quality
- How do you distinguish signal from noise?
- What makes a fact worth extracting?
- How do you handle uncertain facts?
- Provenance
- How do you link semantic facts to source episodes?
- Can users trace back to verify?
- What happens when source episodes are deleted?
Thinking Exercise
Before coding, synthesize manually:
Given these three episodes:
Episode 1 (Dec 15): “Bob mentioned he’s been learning Rust for a side project”
Episode 2 (Dec 22): “Bob showed the team his Rust CLI tool demo”
Episode 3 (Jan 3): “Bob is writing the new monitoring agent in Rust”
What semantic facts would you extract?
- What confidence would you assign to each?
- What’s the minimum evidence needed?
- What if Episode 2 said “Bob struggled with Rust syntax”?
The Interview Questions They Will Ask
- “How do you prevent LLM hallucination when synthesizing facts?”
- “What’s the difference between episodic and semantic memory in your system?”
- “How do you handle contradictions between synthesized facts?”
- “How do you decide when to synthesize—periodically or incrementally?”
- “How do you maintain provenance when consolidating multiple episodes?”
Hints in Layers
Hint 1: Starting Point
Create a synthesis prompt that takes episode summaries and outputs structured facts. Require the LLM to cite which episodes support each fact.
Hint 2: Evidence Requirement
Require at least 2 episodes to support a semantic fact. Single-mention facts stay episodic until corroborated.
Hint 3: Contradiction Detection
After synthesis, query for existing facts about the same entities. Compare new facts with existing ones and flag overlaps for review.
Hint 4: Debugging
Create a “fact audit” view that shows each semantic fact with its source episodes. Manually verify a sample to calibrate confidence thresholds.
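Hints 1 and 2 can be sketched together in plain Python. This is a minimal illustration, not a prescribed design: `CandidateFact`, `promote_facts`, and the episode IDs are hypothetical names, and the LLM call that would produce the candidates is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFact:
    """A fact proposed by the synthesis step (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    episode_ids: frozenset  # episodes the LLM cited as evidence


def promote_facts(candidates, known_episode_ids, min_episodes=2):
    """Split candidates into promoted semantic facts and held-back ones.

    Citations to unknown episodes are dropped first, so a made-up
    episode ID cannot count as evidence.
    """
    promoted, held_back = [], []
    for fact in candidates:
        valid = fact.episode_ids & known_episode_ids
        checked = CandidateFact(fact.subject, fact.predicate, fact.obj, frozenset(valid))
        if len(valid) >= min_episodes:
            promoted.append(checked)
        else:
            held_back.append(checked)
    return promoted, held_back


episodes = {"ep-1230", "ep-1231", "ep-0102"}
candidates = [
    CandidateFact("Alice", "HAS_EXPERTISE", "Kubernetes",
                  frozenset({"ep-1230", "ep-1231", "ep-0102"})),
    # Cites one real episode and one that does not exist.
    CandidateFact("Alice", "LIKES", "Go", frozenset({"ep-1231", "ep-9999"})),
]
promoted, held_back = promote_facts(candidates, episodes)
for fact in promoted:
    print(f"({fact.subject})-[{fact.predicate}]->({fact.obj}) from {sorted(fact.episode_ids)}")
```

Checking cited IDs against the set of known episodes before counting them is a cheap guard against hallucinated provenance. The surviving `episode_ids` are what you would later store as links back to the source episodes.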
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory consolidation | “AI Engineering” by Chip Huyen | Ch. 8 |
| Summarization | “NLP with Transformers” | Ch. 6 |
| Knowledge base consistency | “Designing Data-Intensive Applications” | Ch. 5 |
| Prompt engineering | Anthropic/OpenAI documentation | Best practices |
Common Pitfalls and Debugging
Problem 1: “Synthesized facts are too generic”
- Why: LLM defaulting to safe, vague statements
- Fix: Prompt for specific, falsifiable facts with evidence
- Quick test: Each fact should be testable—can you verify it?
Problem 2: “Losing source provenance”
- Why: Not storing episode links with facts
- Fix: Create `SYNTHESIZED_FROM` relationships to source episodes
- Quick test: Can you trace every fact back to its sources?
Problem 3: “Contradictions not detected”
- Why: Different wording for same relationship
- Fix: Normalize relationship types, check entity overlap
- Quick test: Add a known contradiction, verify detection
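The fix for Problem 3 can be sketched as follows. The alias table, the `SINGLE_VALUED` set, and the `Fact` fields are assumptions made for illustration. The sketch also treats each fact's "added" date as its `valid_from` and closes the older fact at the newer one's start, which is only one possible SUPERSEDE policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical alias table: different wordings of the same relationship.
RELATION_ALIASES = {
    "WORKS_AT": "WORKS_AT",
    "EMPLOYED_BY": "WORKS_AT",
    "WORKS_FOR": "WORKS_AT",
}
# Relationship types assumed to allow only one object at a time.
SINGLE_VALUED = {"WORKS_AT"}


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


def normalize(predicate):
    key = predicate.strip().upper().replace(" ", "_")
    return RELATION_ALIASES.get(key, key)


def find_contradictions(facts):
    """Return pairs of open facts that fill the same single-valued slot differently."""
    open_facts = [f for f in facts if f.valid_to is None]
    pairs = []
    for i, a in enumerate(open_facts):
        for b in open_facts[i + 1:]:
            same_slot = a.subject == b.subject and normalize(a.predicate) == normalize(b.predicate)
            if same_slot and normalize(a.predicate) in SINGLE_VALUED and a.obj != b.obj:
                pairs.append((a, b))
    return pairs


def supersede(a, b):
    """Close the older fact at the start of the newer one; return (older, newer)."""
    older, newer = sorted((a, b), key=lambda f: f.valid_from)
    older.valid_to = newer.valid_from
    return older, newer


facts = [
    Fact("Alice", "WORKS_AT", "Acme", date(2024, 1, 15)),
    Fact("Alice", "employed by", "TechCorp", date(2024, 6, 15)),
    Fact("Alice", "HAS_EXPERTISE", "Kubernetes", date(2024, 12, 30)),
]
conflicts = find_contradictions(facts)
for a, b in conflicts:
    older, newer = supersede(a, b)
    print(f"{older.obj} closed at {older.valid_to}; {newer.obj} remains open")
```

Without `normalize`, "employed by" and `WORKS_AT` would look like unrelated relationships and the conflict would go unnoticed. Restricting the check to single-valued relationship types keeps multi-valued ones such as `HAS_EXPERTISE` from being flagged.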
Definition of Done
- Processes episodes and extracts semantic facts
- Assigns confidence scores to synthesized facts
- Links facts to source episodes (provenance)
- Detects contradictions with existing facts
- Proposes resolution for contradictions
- Shows reasoning for each synthesized fact
- Achieves >85% precision on manual review of 20 facts
Project 8: Community Detection and Summaries
- File: P08-community-detection-summaries.md
- Expanded Project Guide: P08-community-detection-summaries.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Graph Algorithms, Clustering, Summarization
- Software or Tool: Neo4j GDS, NetworkX, LLM API
- Main Book: “Graph Algorithms” by Mark Needham & Amy Hodler
What you will build: A system that identifies clusters of related entities in your knowledge graph (communities), generates summaries of each community, and enables queries at different levels of abstraction.
Why it teaches temporal knowledge graphs: Real knowledge graphs have structure—teams, projects, domains. Community detection surfaces this structure automatically. Community summaries enable “big picture” queries that would be impossible with just individual facts.
Core challenges you will face:
- Running graph algorithms → Maps to Community Detection Algorithms
- Interpreting clusters → Maps to Community Labeling
- Multi-level summaries → Maps to Hierarchical Summarization
- Updating as graph changes → Maps to Incremental Algorithms
Real World Outcome
You’ll have a system that identifies entity clusters and generates human-readable summaries at multiple levels.
Example Session:
$ communities detect --algorithm leiden --resolution 1.0
Running Leiden community detection...
Nodes: 234
Edges: 1,456
Resolution: 1.0
Communities detected: 8
┌────────┬───────────────────────────────────────┬───────┬──────────┐
│ ID │ Representative Members │ Size │ Density │
├────────┼───────────────────────────────────────┼───────┼──────────┤
│ C1 │ Alice, Bob, Carol, Platform Team │ 23 │ 0.72 │
│ C2 │ David, Eve, Security Project │ 18 │ 0.68 │
│ C3 │ Frank, Grace, Acme Corp │ 31 │ 0.54 │
│ C4 │ API, Database, Cache Service │ 15 │ 0.81 │
│ ... │ ... │ ... │ ... │
└────────┴───────────────────────────────────────┴───────┴──────────┘
$ communities summarize C1
Generating summary for community C1 (23 entities)...
**Platform Team Engineering**
This community represents the Platform Team at Acme Corp, focused on
infrastructure and developer tools.
Key Members:
- Alice (Tech Lead, Kubernetes expert)
- Bob (Senior Engineer, API design)
- Carol (Engineer, monitoring specialist)
Major Projects:
- Kubernetes migration (in progress)
- API v2 development (completed Q4)
- Observability platform (planning)
Key Relationships:
- Closely collaborates with Security team (C2)
- Primary stakeholder for Cache Service (C4)
Recent Activity:
- 12 episodes in past 30 days
- Main topics: K8s deployment, API performance, monitoring
$ communities query "What's happening with infrastructure?"
Query type: High-level topic query
Matching communities: C1 (Platform Team), C4 (Core Services)
Combined summary:
The Platform Team is leading a Kubernetes migration while maintaining
the API infrastructure. Core Services (API, Database, Cache) are
stable with planned performance improvements for Q1.
Key insights:
- K8s migration is the main infrastructure initiative
- API v2 launched successfully in Q4
- Cache service optimization planned for January
Source: 41 entities, 156 relationships, 47 episodes
$ communities hierarchy
Community hierarchy (multi-resolution):
Level 0 (coarse, 3 communities):
├── Engineering (C1, C2, C4 merged): 56 entities
├── Business (C3, C5 merged): 48 entities
└── Operations (C6, C7, C8 merged): 34 entities
Level 1 (medium, 8 communities):
├── Platform Team (C1): 23 entities
├── Security Team (C2): 18 entities
├── Core Services (C4): 15 entities
└── ...
Level 2 (fine, 21 sub-communities):
├── K8s Migration Squad: 8 entities
├── API Team: 7 entities
└── ...
The Core Question You Are Answering
“How do I discover and summarize the natural groupings in my knowledge graph?”
Knowledge graphs have inherent structure—clusters of related entities that form meaningful groups. Community detection discovers this structure automatically. Summaries make it queryable at a high level, enabling questions like “What’s the security team working on?”
Concepts You Must Understand First
- Community Detection Algorithms
- What is modularity optimization?
- How do Louvain and Leiden algorithms work?
- What does the resolution parameter control?
- Book Reference: “Graph Algorithms” by Needham & Hodler - Ch. 6
- Graph Density and Metrics
- What makes a “good” community?
- How do you measure cluster quality?
- What’s the tradeoff between community size and cohesion?
- Book Reference: “Networks: An Introduction” by Newman
- Hierarchical Clustering
- How do you get multi-level communities?
- What’s the dendrogram representation?
- How do you choose the right level for queries?
- Book Reference: Graph Algorithms documentation
Questions to Guide Your Design
- Detection Parameters
- What resolution gives meaningful communities for your graph?
- How often do you re-detect communities?
- How do you handle communities that are too small/large?
- Summarization Strategy
- What information should a community summary include?
- How do you handle communities with diverse members?
- How long should summaries be?
- Query Integration
- How do you match queries to communities?
- When do you use community summaries vs. individual facts?
- How do you blend multiple community summaries?
Thinking Exercise
Before coding, analyze this graph manually:
Entities and relationships:
Alice --[WORKS_ON]--> Project A
Bob --[WORKS_ON]--> Project A
Carol --[WORKS_ON]--> Project A
Alice --[KNOWS]--> David
David --[WORKS_ON]--> Project B
Eve --[WORKS_ON]--> Project B
Frank --[WORKS_ON]--> Project B
Project A --[DEPENDS_ON]--> Service X
Project B --[DEPENDS_ON]--> Service X
- How many communities would you expect?
- What would be the summary of each?
- Where does Service X belong?
- How does the `DEPENDS_ON` edge affect community structure?
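One way to check your manual answer is to score candidate partitions with modularity, the quantity that Louvain and Leiden try to maximize. The sketch below is plain Python rather than a library call, and it assumes the exercise edges are undirected and unweighted. The partition names are illustrative.

```python
from collections import defaultdict

# Edges from the exercise, treated as undirected and unweighted (an assumption).
EDGES = [
    ("Alice", "Project A"), ("Bob", "Project A"), ("Carol", "Project A"),
    ("Alice", "David"),
    ("David", "Project B"), ("Eve", "Project B"), ("Frank", "Project B"),
    ("Project A", "Service X"), ("Project B", "Service X"),
]


def modularity(edges, community_of):
    """Newman modularity: Q = sum over communities of L_c/m - (d_c/2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def assign(groups):
    """Turn {label: [nodes]} into {node: label}."""
    return {node: label for label, nodes in groups.items() for node in nodes}


team_a = ["Alice", "Bob", "Carol", "Project A"]
team_b = ["David", "Eve", "Frank", "Project B"]

x_with_a = assign({"A": team_a + ["Service X"], "B": team_b})
x_with_b = assign({"A": team_a, "B": team_b + ["Service X"]})
x_alone = assign({"A": team_a, "B": team_b, "X": ["Service X"]})
everything = assign({"all": team_a + team_b + ["Service X"]})

q_with_a = modularity(EDGES, x_with_a)
q_with_b = modularity(EDGES, x_with_b)
q_alone = modularity(EDGES, x_alone)
q_single = modularity(EDGES, everything)
print(f"X with A: {q_with_a:.4f}, X with B: {q_with_b:.4f}, "
      f"X alone: {q_alone:.4f}, one community: {q_single:.4f}")
```

Placing Service X with either project scores the same, and both score higher than leaving it alone. Structure by itself cannot decide where a shared dependency belongs, so edge weights or semantic signals have to break the tie.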
The Interview Questions They Will Ask
- “Explain the Leiden algorithm and how it improves on Louvain.”
- “How do you choose the resolution parameter for community detection?”
- “What’s the time complexity of community detection on a graph with N nodes?”
- “How do you update communities incrementally when the graph changes?”
- “How do you generate meaningful summaries from heterogeneous communities?”
Hints in Layers
Hint 1: Starting Point
Use Neo4j GDS (Graph Data Science) library for community detection. It has built-in Louvain and Leiden algorithms. Store community IDs as node properties.
Hint 2: Summary Generation
For each community, extract: member names, common labels, shared relationships, recent episodes. Feed these to an LLM for a natural language summary.
Hint 3: Multi-Resolution
Run Leiden at multiple resolutions (0.5, 1.0, 2.0). Store all levels. Use lower resolution for broad queries, higher for specific ones.
Hint 4: Debugging
Visualize communities in Neo4j Browser with different colors. Manually inspect the largest and smallest communities. Check if they make semantic sense.
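The context-gathering step in Hint 2 might look like the sketch below. The node and edge records are hypothetical stand-ins for what a graph query would return, and `community_context` and `build_prompt` are illustrative helpers. The LLM call itself is left out.

```python
from collections import Counter, defaultdict

# Hypothetical node and edge records, as they might come back from a graph query.
NODES = [
    {"name": "Alice", "label": "Person", "community": "C1"},
    {"name": "Bob", "label": "Person", "community": "C1"},
    {"name": "Platform Team", "label": "Team", "community": "C1"},
    {"name": "David", "label": "Person", "community": "C2"},
]
EDGES = [
    ("Alice", "WORKS_ON", "Platform Team"),
    ("Bob", "WORKS_ON", "Platform Team"),
    ("Alice", "KNOWS", "David"),
]


def community_context(nodes, edges):
    """Group members, entity types, and internal relationships by community."""
    community_of = {n["name"]: n["community"] for n in nodes}
    context = defaultdict(lambda: {"members": [], "labels": Counter(), "relations": Counter()})
    for n in nodes:
        entry = context[n["community"]]
        entry["members"].append(n["name"])
        entry["labels"][n["label"]] += 1
    for src, rel, dst in edges:
        # Count only relationships that stay inside one community.
        if community_of[src] == community_of[dst]:
            context[community_of[src]]["relations"][rel] += 1
    return dict(context)


def build_prompt(community_id, entry):
    """Assemble the text handed to the LLM for one community."""
    labels = ", ".join(f"{k} x{v}" for k, v in entry["labels"].most_common())
    relations = ", ".join(f"{k} x{v}" for k, v in entry["relations"].most_common()) or "none"
    return (
        f"Summarize community {community_id}.\n"
        f"Members: {', '.join(entry['members'])}\n"
        f"Entity types: {labels}\n"
        f"Internal relationships: {relations}\n"
        "Name specific people and projects; do not add facts that are not listed."
    )


context = community_context(NODES, EDGES)
prompt = build_prompt("C1", context["C1"])
print(prompt)
```

Passing concrete names and relationship counts, plus an instruction to stay within them, is one way to avoid generic summaries. Cross-community edges such as `KNOWS` are excluded here but could feed a separate "Key Relationships" section.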
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Community detection | “Graph Algorithms” by Needham & Hodler | Ch. 6 |
| Leiden algorithm | Original paper (Traag et al.) | All |
| Graph metrics | “Networks: An Introduction” by Newman | Ch. 7 |
| Neo4j GDS | Neo4j GDS documentation | Community detection |
Common Pitfalls and Debugging
Problem 1: “All nodes in one giant community”
- Why: Resolution too low, or graph too connected
- Fix: Increase resolution parameter, or filter weak edges
- Quick test: Try resolution 2.0, 5.0 and see if communities emerge
Problem 2: “Communities don’t make semantic sense”
- Why: Algorithm uses structure, not semantics
- Fix: Weight edges by strength, filter noisy edges, add semantic similarity
- Quick test: Manually inspect 5 communities—do they feel cohesive?
Problem 3: “Summaries are too generic”
- Why: LLM doesn’t have enough specific context
- Fix: Include entity types, relationship types, recent episode summaries
- Quick test: Does summary mention specific names and projects?
Definition of Done
- Runs community detection (Leiden or Louvain)
- Stores community assignments on nodes
- Generates natural language summary per community
- Supports multi-resolution detection
- Can query by community topic
- Can show community hierarchy
- Communities update when graph changes significantly
- Summaries are specific and actionable (not generic)
Project 9: Graphiti Framework Integration
- File: P09-graphiti-framework-integration.md
- Expanded Project Guide: P09-graphiti-framework-integration.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript (Zep SDK)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 2: Intermediate
- Knowledge Area: Framework Integration, AI Memory
- Software or Tool: Graphiti, Neo4j, Zep, Python
- Main Book: “AI Engineering” by Chip Huyen
What you will build: A fully integrated memory system using Zep’s Graphiti framework, with episodic and semantic memory layers, entity extraction, and hybrid retrieval—without building everything from scratch.
Why it teaches temporal knowledge graphs: Graphiti is the production-grade implementation of everything you’ve been building. By integrating it, you’ll understand how the pieces fit together in a real system and learn the design decisions made by practitioners who’ve solved these problems at scale.
Core challenges you will face:
- Understanding framework architecture → Maps to System Design
- Configuring for your use case → Maps to Framework Customization
- Extending default behavior → Maps to Plugin Architecture
- Production deployment → Maps to DevOps for AI
Real World Outcome
You’ll have a production-ready AI memory system with all the features you’ve built individually, plus optimizations you didn’t think of.
Example Session:
$ graphiti init --neo4j-uri bolt://localhost:7687
Initializing Graphiti...
✓ Connected to Neo4j
✓ Created schema constraints
✓ Initialized entity extractor
✓ Ready for episodic ingestion
$ graphiti ingest --source conversations.json
Processing 100 conversations...
Episodes created: 100
Entities extracted: 234
Relationships created: 567
Facts synthesized: 89
Time: 45.2s
$ graphiti query "What do I know about the API migration project?"
Query type: Hybrid (semantic + graph)
Semantic results (top 3):
1. Episode 2024-12-15: API v2 migration planning
2. Episode 2024-12-20: Compatibility discussion
3. Semantic fact: API migration targets Q1 2025
Graph results:
Entity: API Migration Project
- Led by: Alice
- Team: Platform Team
- Status: In Progress
- Depends on: Auth Service, Cache Service
- Timeline: Q1 2025
Community context:
Platform Team is actively working on the API migration, with
primary focus on backward compatibility and performance.
$ graphiti search --entity "Alice" --hops 2
Traversing 2 hops from Alice...
Direct relationships:
Alice -[WORKS_AT]-> Acme Corp
Alice -[LEADS]-> API Migration Project
Alice -[WORKS_ON]-> Platform Team
2-hop relationships:
Alice -> API Migration -> Auth Service
Alice -> Platform Team -> Bob, Carol, David
Alice -> Acme Corp -> CEO Bob Smith
$ graphiti episodes --entity "API Migration" --since "last month"
Episodes mentioning API Migration (Dec 3 - Jan 3):
[Dec 15] Planning meeting - API v2 migration kickoff
[Dec 20] Technical review - Backward compatibility approach
[Dec 28] Status update - Auth service dependency identified
[Jan 2] Progress check - Timeline confirmed for Q1
The Core Question You Are Answering
“How do I use a production framework to get all the temporal KG benefits without rebuilding everything?”
Understanding frameworks accelerates your learning. Graphiti implements patterns you’ve studied—seeing how they fit together in production code teaches you things documentation can’t.
Concepts You Must Understand First
- Graphiti Architecture
- What are the three layers? (Episodic, Semantic, Community)
- How does async processing work?
- What’s the entity extraction pipeline?
- Reference: Graphiti documentation and source code
- Zep Platform
- What’s the relationship between Graphiti and Zep?
- What does the cloud platform add?
- When do you use open-source vs. cloud?
- Reference: Zep documentation
- Framework Extension Points
- How do you customize entity extraction?
- How do you add new relationship types?
- How do you tune retrieval weights?
- Reference: Graphiti SDK documentation
Questions to Guide Your Design
- Setup Decisions
- Self-hosted Neo4j or Zep cloud?
- Which LLM for extraction? (OpenAI, Anthropic, local)
- What entity types does your domain need?
- Data Pipeline
- How will you ingest data? (API, batch, streaming)
- What preprocessing is needed?
- How do you handle failures?
- Query Patterns
- What types of queries will your users make?
- How do you tune hybrid retrieval weights?
- When do you need community summaries vs. raw facts?
Thinking Exercise
Before coding, trace through Graphiti’s flow:
Given input: “Alice mentioned that the auth service deadline moved to January 15th”
Trace what happens:
- Episode creation (what metadata?)
- Entity extraction (what entities?)
- Relationship extraction (what relationships?)
- Semantic fact creation (what facts?)
- Retrieval index updates (what indexes?)
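To make the trace concrete, the records produced by the first four steps could be modeled as below. This is a plain-Python stand-in, not Graphiti's own code or schema. The class and field names are illustrative, the extraction results are hand-written, and the year 2025 for "January 15th" is an assumption the input sentence does not state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Episode:
    episode_id: str
    body: str
    source: str
    reference_time: datetime  # when the statement was made
    ingested_at: datetime     # when the system recorded it


@dataclass
class FactEdge:
    subject: str
    predicate: str
    obj: str
    valid_at: Optional[datetime]    # when the fact became true in the world
    invalid_at: Optional[datetime]  # when it stopped being true, if known
    created_at: datetime            # when the system learned it
    episode_ids: list = field(default_factory=list)  # provenance


said_at = datetime(2025, 1, 3, 10, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 3, 10, 5, tzinfo=timezone.utc)

# Step 1: episode creation, with both timestamps kept separately.
episode = Episode(
    episode_id="ep-001",
    body="Alice mentioned that the auth service deadline moved to January 15th",
    source="message",
    reference_time=said_at,
    ingested_at=now,
)

# Steps 2-4: hand-written stand-ins for what an extraction step might return.
entities = ["Alice", "Auth Service"]
facts = [
    FactEdge("Auth Service", "HAS_DEADLINE", "2025-01-15",  # year is assumed
             valid_at=said_at, invalid_at=None, created_at=now,
             episode_ids=[episode.episode_id]),
]
print(facts[0])
```

The word "moved" implies an earlier deadline. If a previous `HAS_DEADLINE` edge exists, a bi-temporal design would set its `invalid_at` to `said_at` rather than delete it, so point-in-time queries still work.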
The Interview Questions They Will Ask
- “Walk me through the architecture of a temporal knowledge graph memory system.”
- “How would you extend Graphiti for a new domain with custom entity types?”
- “What’s the tradeoff between using a framework vs. building custom?”
- “How do you tune retrieval when semantic and graph results conflict?”
- “What operational concerns do you have for a production memory system?”
Hints in Layers
Hint 1: Starting Point
Install Graphiti: pip install graphiti-core. Follow the quickstart to connect to Neo4j and ingest your first episode.
Hint 2: Configuration
Graphiti uses environment variables for config. Set NEO4J_URI, OPENAI_API_KEY, and entity schema in config file.
Hint 3: Custom Entities Define a schema file with your entity and relationship types. Graphiti’s extraction will follow your schema.
Hint 4: Debugging Enable debug logging to see entity extraction and graph operations. Use Neo4j Browser to inspect what’s being created.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| AI memory architecture | “AI Engineering” by Chip Huyen | Ch. 8 |
| Production ML systems | “Designing Machine Learning Systems” by Chip Huyen | Ch. 9-10 |
| Framework patterns | “Software Architecture Patterns” | All |
| Graph database ops | Neo4j Operations Manual | All |
Common Pitfalls and Debugging
Problem 1: “Entity extraction is slow”
- Why: Making too many LLM calls
- Fix: Batch episodes, use smaller model for extraction, cache common entities
- Quick test: Time single episode ingestion
Problem 2: “Queries return unexpected results”
- Why: Retrieval weights not tuned for your data
- Fix: Adjust semantic vs. graph weights, inspect what each returns separately
- Quick test: Run semantic-only and graph-only queries, compare
Problem 3: “Memory usage keeps growing”
- Why: Not managing Neo4j memory settings
- Fix: Configure heap size, set up periodic maintenance jobs
- Quick test: Monitor with `neo4j.metrics`
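For Problem 2, it helps to see how weights change a fused ranking. The sketch below uses weighted reciprocal rank fusion, a generic technique chosen for illustration. It is not a description of how Graphiti combines results, and the item IDs are made up.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked ID lists with weighted reciprocal rank fusion.

    rankings: {"semantic": [ids, best first], "graph": [...]}
    weights:  {"semantic": 0.5, "graph": 0.5}
    """
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += weights[source] / (k + rank)
    # Sort by score, then by ID so ties resolve deterministically.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


rankings = {
    "semantic": ["ep-1215", "ep-1220", "fact-q1"],
    "graph": ["fact-q1", "ep-1228"],
}
balanced = weighted_rrf(rankings, {"semantic": 0.5, "graph": 0.5})
graph_heavy = weighted_rrf(rankings, {"semantic": 0.1, "graph": 0.9})
print("balanced:   ", balanced)
print("graph-heavy:", graph_heavy)
```

An item returned by both retrievers rises to the top under either weighting, while items found by only one retriever trade places as the weights shift. Running the two retrievers separately first, as the quick test suggests, tells you which input list is responsible for an unexpected result.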
Definition of Done
- Graphiti connected to Neo4j
- Can ingest episodes from your data source
- Entity extraction creates expected entities
- Queries return relevant results
- Can traverse graph from any entity
- Community summaries generate correctly
- Performance acceptable for your use case (<500ms queries)
Project 10: Mem0g Memory Layer
- File: P10-mem0g-memory-layer.md
- Expanded Project Guide: P10-mem0g-memory-layer.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 2: Intermediate
- Knowledge Area: Memory Frameworks, Graph Extensions
- Software or Tool: Mem0, Neo4j, Python
- Main Book: “AI Engineering” by Chip Huyen
What you will build: A memory layer using Mem0 with its graph memory extensions (Mem0g), providing a simpler API than Graphiti with graph-based memory organization for AI agents.
Why it teaches temporal knowledge graphs: Mem0 takes a different approach than Graphiti—simpler API, focus on memory “add/get/search” primitives. Understanding both approaches shows you the spectrum of design choices in AI memory systems.
Core challenges you will face:
- Understanding Mem0’s mental model → Maps to API Design
- Enabling graph extensions → Maps to Feature Configuration
- Comparing with Graphiti → Maps to Architecture Tradeoffs
- Building on top of Mem0 → Maps to Framework Extension
Real World Outcome
You’ll have a second memory system to compare with Graphiti, understanding when each is appropriate.
Example Session:
$ mem0 init --graph-memory
Initializing Mem0 with graph memory...
✓ Connected to Neo4j (graph storage)
✓ Connected to vector store
✓ Memory client ready
$ mem0 add --user alice "I prefer morning meetings"
Memory added:
ID: mem_abc123
Type: Preference
User: alice
Graph: (alice)-[PREFERS]->(morning_meetings)
$ mem0 add --user alice "I'm working on the Kubernetes migration"
Memory added:
ID: mem_def456
Type: Context
User: alice
Graph: (alice)-[WORKS_ON]->(kubernetes_migration)
$ mem0 add --user alice "Bob is helping me with K8s networking"
Memory added:
ID: mem_ghi789
Type: Relationship
User: alice
Graph: (bob)-[HELPS]->(alice), (bob)-[KNOWS]->(k8s_networking)
$ mem0 search --user alice "meeting preferences"
Memories found:
1. [Preference] "I prefer morning meetings" (score: 0.92)
Created: 2025-01-03
Graph context: (alice)-[PREFERS]->(morning_meetings)
2. [Context] "Standup is at 9am" (score: 0.71)
Created: 2025-01-02
Graph context: (alice)-[ATTENDS]->(standup)
$ mem0 graph --user alice --hops 2
Graph view for alice (2 hops):
alice
├── PREFERS -> morning_meetings
├── WORKS_ON -> kubernetes_migration
│ └── DEPENDS_ON -> auth_service
├── WORKS_WITH -> bob
│ └── KNOWS -> k8s_networking
└── WORKS_AT -> acme_corp
$ mem0 context --user alice --for "scheduling a meeting about K8s"
Relevant context for scheduling K8s meeting:
Preferences:
- alice prefers morning meetings
People:
- bob is helping with K8s (should be invited)
Projects:
- kubernetes_migration is the relevant project
- auth_service is a dependency (may need that team)
Suggested attendees: alice, bob, auth_service_team
Suggested time: Morning slot
The Core Question You Are Answering
“How does Mem0’s approach to AI memory differ from Graphiti, and when should I use each?”
Understanding multiple frameworks shows you the design space. Mem0 is simpler and more opinionated; Graphiti is more flexible and comprehensive. Knowing both helps you choose the right tool.
Concepts You Must Understand First
- Mem0 Architecture
- What are Mem0’s core primitives? (add, get, search, delete)
- How does graph memory extend base functionality?
- What’s the memory lifecycle?
- Reference: Mem0 documentation
- Mem0 vs Graphiti
- What does Mem0 simplify compared to Graphiti?
- What does Graphiti offer that Mem0 doesn’t?
- When is simpler better?
- Reference: Compare both frameworks’ docs
- User-Centric Memory
- How does Mem0 organize memory by user/agent?
- What’s the isolation model?
- How do you share memory between users?
- Reference: Mem0 user management docs
Questions to Guide Your Design
- Mem0 vs Graphiti Decision
- What’s your primary use case?
- Do you need temporal queries?
- How important is framework simplicity?
- Memory Organization
- How do you structure memories for your application?
- What memory types do you need?
- How do you handle memory cleanup?
- Integration Points
- How does memory integrate with your LLM?
- How do you inject memory into prompts?
- How do you update memory from responses?
Thinking Exercise
Before coding, compare the approaches:
For the use case “Personal assistant that remembers user preferences”:
With Mem0:
mem0.add("User prefers dark mode", user_id=user_id)
mem0.search("theme preferences", user_id=user_id)
With Graphiti:
graphiti.add_episode("User said they prefer dark mode", ...)
graphiti.search("What are the user's UI preferences?")
Compare:
- API complexity
- Data model
- Query capabilities
- When would you choose each?
The Interview Questions They Will Ask
- “Compare Mem0 and Graphiti—when would you use each?”
- “How does Mem0’s user-centric model affect multi-tenant applications?”
- “What are the tradeoffs of Mem0’s simpler API?”
- “How would you migrate from Mem0 to Graphiti if you needed more features?”
- “How do you handle memory privacy in a shared system?”
Hints in Layers
Hint 1: Starting Point
Install: pip install mem0ai. Follow quickstart to add and search memories. Enable graph memory in config.
Hint 2: Graph Configuration
Set graph_store in config to use Neo4j. This enables relationship extraction and graph queries.
Hint 3: User Management
Always specify user_id for isolation. Use agent_id for agent-specific memories that shouldn’t leak to users.
Hint 4: Debugging
Use mem0.get_all(user_id=user_id) to see all memories. Check Neo4j for graph structure. Compare vector and graph retrieval.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory systems | “AI Engineering” by Chip Huyen | Ch. 8 |
| API design | “REST API Design Rulebook” | All |
| User isolation | “Building Microservices” | Multi-tenancy chapter |
| Framework comparison | N/A | Compare docs of both |
Common Pitfalls and Debugging
Problem 1: “Memories not connecting in graph”
- Why: Graph memory not enabled or entity extraction failing
- Fix: Check config, ensure Neo4j connected, verify entity types
- Quick test: Add memory, then query Neo4j directly
Problem 2: “Search returns irrelevant memories”
- Why: Only using vector similarity, not graph context
- Fix: Use graph-aware search, adjust similarity threshold
- Quick test: Compare graph vs. vector-only search
Problem 3: “User memories leaking to other users”
- Why: Not filtering by user_id
- Fix: Always include user_id in queries, check isolation config
- Quick test: Search as user A, verify user B memories don’t appear
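The quick test for Problem 3 can be rehearsed against a toy store before you run it against the real system. `ScopedMemoryStore` below is an in-memory stand-in written for this example. It is not Mem0's implementation, and its keyword matching only approximates real search.

```python
class ScopedMemoryStore:
    """In-memory stand-in for a memory backend; not Mem0's implementation."""

    def __init__(self):
        self._records = []

    def add(self, text, *, user_id):
        if not user_id:
            raise ValueError("user_id is required")
        self._records.append({"user_id": user_id, "text": text})

    def search(self, query, *, user_id):
        # Filter by owner first, then match; the scope is never optional.
        terms = query.lower().split()
        return [
            r["text"] for r in self._records
            if r["user_id"] == user_id and any(t in r["text"].lower() for t in terms)
        ]


store = ScopedMemoryStore()
store.add("I prefer morning meetings", user_id="alice")
store.add("I prefer afternoon meetings", user_id="bob")

alice_hits = store.search("meetings", user_id="alice")
bob_hits = store.search("meetings", user_id="bob")
print(alice_hits)
print(bob_hits)
```

Making `user_id` a required keyword-only argument means a caller cannot forget the scope by accident. The same idea applies to any wrapper you put around a real memory client.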
Definition of Done
- Mem0 connected with graph memory enabled
- Can add memories with graph relationships
- Search returns relevant memories
- Can view graph structure for a user
- User isolation works correctly
- Can compare results with Graphiti for same data
- Documented tradeoffs between frameworks
Project 11: MemGPT-Style Virtual Context
- File: P11-memgpt-virtual-context.md
- Expanded Project Guide: P11-memgpt-virtual-context.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 4: Expert
- Knowledge Area: Context Management, OS Concepts
- Software or Tool: MemGPT/Letta, LLM API, Python
- Main Book: “Operating Systems: Three Easy Pieces” (for concepts)
What you will build: A virtual context management system inspired by MemGPT/Letta that gives LLMs explicit memory operations—treating context like an OS treats memory, with main memory, archival storage, and explicit load/save operations.
Why it teaches temporal knowledge graphs: MemGPT represents a different philosophy—instead of automatic memory management, the LLM explicitly decides what to remember, retrieve, and forget. Understanding this approach complements the automatic approaches in Graphiti/Mem0.
Core challenges you will face:
- Implementing memory tiers → Maps to Memory Hierarchy
- Teaching LLMs to manage memory → Maps to Tool Use
- Balancing automatic vs. explicit → Maps to System Design
- Maintaining coherence → Maps to State Management
Real World Outcome
You’ll have a system where the LLM explicitly manages its own memory through function calls, enabling unbounded conversations.
Example Session:
$ memgpt chat --persona "research assistant"
MemGPT initialized with:
Core memory: 2000 tokens
Recall memory: Last 50 messages
Archival memory: Unlimited (Neo4j + Vector)
You: I'm researching temporal knowledge graphs for AI memory
[MemGPT internal]
> core_memory_append("User researching temporal KGs for AI memory")
> archival_memory_search("temporal knowledge graphs")
> Found 3 relevant entries from previous sessions
> Loading relevant context into working memory...
Assistant: I found some interesting context from our previous discussions!
[Showing from archival memory]
- "Temporal KGs use bi-temporal models (valid_time, transaction_time)"
- "Key frameworks: Zep/Graphiti, Mem0, MemGPT"
- "Graph traversal enables multi-hop reasoning"
Based on your research focus, I can help you explore:
1. The theoretical foundations (episodic vs semantic memory)
2. Practical implementation with specific frameworks
3. Production deployment considerations
What aspect would you like to dive into first?
You: Let's focus on how Graphiti organizes memory
[MemGPT internal]
> core_memory_search("Graphiti architecture")
> archival_memory_insert("User wants to focus on Graphiti memory organization")
> conversation_search_date(2024-10-01, 2024-12-15)
> Found 2 previous discussions about Graphiti
Assistant: Great choice! Graphiti uses a three-tier architecture:
1. **Episodic Store**: Raw conversation chunks with timestamps
2. **Semantic Memory**: Extracted facts as graph edges
3. **Community Summaries**: Clustered entity groups (label propagation)
[Retrieving Graphiti architecture diagram from archival...]
# Memory State After Session:
Core Memory (2000 tokens):
- User: Researching temporal KGs for AI memory
- Focus: Graphiti architecture
- Context: Previous discussions on bi-temporal models
Archival Writes This Session: 2
Archival Reads This Session: 5
Total Conversation Turns: 3
Memory Operation Log:
$ memgpt logs --last-session
Session: 2024-12-15T14:30:00Z (duration: 45min)
Memory Operations:
┌────────────┬──────────────────────────────────────────────┬─────────┐
│ Operation │ Content │ Tokens │
├────────────┼──────────────────────────────────────────────┼─────────┤
│ CORE_APPEND│ "User researching temporal KGs" │ 45 │
│ ARCH_SEARCH│ "temporal knowledge graphs" -> 3 results │ 0 │
│ ARCH_INSERT│ "User focus: Graphiti architecture" │ 67 │
│ CORE_SEARCH│ "Graphiti" -> 1 match in core │ 0 │
│ CONV_SEARCH│ date_range search -> 2 results │ 0 │
└────────────┴──────────────────────────────────────────────┴─────────┘
Core Memory Usage: 892/2000 tokens (44.6%)
Archival Memory: 2,847 entries
The Core Question You Are Answering
“How do we give an LLM explicit control over its own memory, and why might explicit memory management outperform automatic systems in certain scenarios?”
This question challenges the assumption that automatic memory (like Graphiti’s invisible extraction) is always better. MemGPT shows that for complex reasoning tasks, explicit memory operations give the LLM more agency and transparency. You will understand when each approach is appropriate.
Concepts You Must Understand First
- Operating System Memory Hierarchy
- What is the difference between registers, cache, RAM, and disk?
- How does virtual memory abstract physical memory?
- What is paging and why does it matter?
- Book Reference: “Operating Systems: Three Easy Pieces” by Remzi and Andrea Arpaci-Dusseau - Ch. 13-22
- Context Window Economics
- How many tokens fit in GPT-4’s context window?
- What is the cost per token for different models?
- Why can’t we just use infinite context?
- Book Reference: OpenAI and Anthropic documentation on context limits
- Function Calling / Tool Use
- How does an LLM invoke external functions?
- What is the difference between ReAct and function calling?
- How do you prompt an LLM to manage its own memory?
- Book Reference: “AI Engineering” by Chip Huyen - Ch. on Tool Use
- State Machine Design
- How do you model the LLM’s internal reasoning state?
- When should the LLM decide to save vs. retrieve vs. forget?
- How do you handle errors in memory operations?
- Book Reference: Any systems programming book on state machines
Questions to Guide Your Design
- Memory Tier Architecture
- What goes in core memory (always in context) vs. archival (searched on demand)?
- How large should core memory be? What’s the tradeoff?
- How do you decide when core memory is “full” and needs eviction?
- Memory Operations
- What operations should the LLM be able to perform? (append, search, insert, delete)
- How do you format memory operation results for the LLM?
- What happens if a memory search returns nothing?
- Prompting Strategy
- How do you teach the LLM to use memory operations appropriately?
- When should the LLM proactively save information vs. wait to be asked?
- How do you prevent the LLM from over-using memory operations?
- Integration with Knowledge Graph
- How do you connect archival memory to the temporal knowledge graph?
- Can the LLM perform graph queries directly, or only semantic search?
- How do you surface relationship context alongside retrieved memories?
Thinking Exercise
Design a Memory Policy
Consider this scenario: A user has been discussing a complex software architecture over 5 sessions. The conversation includes:
- Technical decisions (which database, framework choices)
- People mentioned (team members, stakeholders)
- Timeline constraints (launch date, sprint deadlines)
- Evolving requirements (features added/removed)
Questions to think through:
- What should go in core memory (always visible)?
- What should be saved to archival immediately?
- What queries would the LLM need to make to archival?
- How would you handle contradictions (requirement changed)?
- When should the LLM forget outdated information?
Sketch out a memory policy document that defines:
- Core memory schema (what sections/categories)
- Archival save triggers (when to persist)
- Search patterns (what kinds of retrieval)
- Eviction policy (what to remove when full)
The Interview Questions They Will Ask
- “How would you implement unbounded conversation memory for an AI assistant?”
- “What are the tradeoffs between automatic memory extraction and explicit memory management?”
- “How does MemGPT/Letta handle context window limitations?”
- “Describe the memory hierarchy in a virtual context system.”
- “How would you measure the effectiveness of an LLM’s memory management?”
- “What prompting techniques help LLMs manage their own memory?”
- “How would you debug an LLM that’s making poor memory management decisions?”
- “Compare MemGPT’s approach to Graphiti’s approach. When would you use each?”
Hints in Layers
Hint 1: Start with the Memory Schema
Define what core memory looks like. MemGPT’s default core memory has two sections, “persona” (who the assistant is) and “human” (who the user is); you can add a “scratchpad” section for working notes. Start simple.
Hint 2: Implement Basic Operations
Create functions for: core_memory_append(section, content), core_memory_replace(section, content), archival_memory_insert(content), archival_memory_search(query, top_k). The LLM will call these.
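A minimal sketch of the four operations named in Hint 2, assuming a single-process, in-memory store. The `AgentMemory` class, the string return values, and the word-overlap ranking are illustrative assumptions, not MemGPT’s actual implementation; a real archival tier would search embeddings or the temporal KG instead.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy two-tier memory: a small in-context core plus a searchable archive."""
    core: dict[str, str] = field(default_factory=lambda: {"persona": "", "human": ""})
    archival: list[str] = field(default_factory=list)

    def core_memory_append(self, section: str, content: str) -> str:
        if section not in self.core:
            return f"error: unknown section '{section}'"
        self.core[section] = (self.core[section] + "\n" + content).strip()
        return "ok"

    def core_memory_replace(self, section: str, content: str) -> str:
        if section not in self.core:
            return f"error: unknown section '{section}'"
        self.core[section] = content
        return "ok"

    def archival_memory_insert(self, content: str) -> str:
        self.archival.append(content)
        return "ok"

    def archival_memory_search(self, query: str, top_k: int = 3) -> list[str]:
        # Placeholder ranking (assumption): number of shared lowercase words.
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(doc.lower().split())), doc) for doc in self.archival]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]
```

Returning a short status string (rather than raising) is one possible design choice: the result is fed back to the LLM as a tool response, so an error message it can read is often more useful than an exception it never sees.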
Hint 3: Design the System Prompt
The system prompt must explain the memory model to the LLM. Include:
- What each memory tier contains
- When to use each operation
- Examples of good memory management
- Token budget awareness
Hint 4: Connect to Your Knowledge Graph
Instead of a flat vector store for archival, use your temporal KG. When the LLM searches archival memory, translate the query into:
- Semantic search over episode embeddings
- Entity extraction + graph traversal
- BM25 keyword search
Combine the results with Reciprocal Rank Fusion (RRF) and return them to the LLM.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OS Memory Concepts | “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau | Ch. 13-22 (Virtual Memory) |
| LLM Tool Use | “AI Engineering” by Chip Huyen | Ch. on Agents and Tools |
| State Machines | “Computer Systems: A Programmer’s Perspective” by Bryant | Ch. 8 (Exceptional Control Flow) |
| Prompt Engineering | “AI Engineering” by Chip Huyen | Ch. on Prompt Engineering |
| System Design | “Designing Data-Intensive Applications” by Kleppmann | Ch. 1-3 (Foundations) |
Common Pitfalls and Debugging
Problem 1: “LLM ignores memory operations”
- Why: System prompt doesn’t emphasize memory management importance
- Fix: Add explicit instructions: “You MUST use memory operations to manage long conversations. Proactively save important information.”
- Quick test: Ask about something from 20 messages ago; if the LLM answers without issuing a retrieval call, the full history is still in context and memory isn’t being managed
Problem 2: “LLM over-uses memory operations”
- Why: Every turn triggers save/search, slowing conversation
- Fix: Add guidelines: “Only save information likely to be useful in future sessions. Don’t save trivial exchanges.”
- Quick test: Count memory operations per turn; should average 0.5-2, not 5+
Problem 3: “Core memory grows unbounded”
- Why: No eviction policy; core keeps growing past limit
- Fix: Implement token counting and summarization: When core exceeds limit, summarize oldest sections and move to archival
- Quick test: Monitor core memory tokens over a long session
Problem 4: “Archival search returns irrelevant results”
- Why: Semantic search alone isn’t capturing the query intent
- Fix: Use hybrid retrieval (semantic + keyword + entity extraction)
- Quick test: Search for a specific fact; check if it appears in top 3 results
Problem 5: “Memory operations have high latency”
- Why: Synchronous calls to vector DB and graph DB
- Fix: Batch operations, use async calls, consider local caching for frequent queries
- Quick test: Time a memory search; should be < 500ms
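The eviction fix from Problem 3 can be sketched as follows. This is a rough illustration under stated assumptions: `count_tokens` is a whitespace word count standing in for a real tokenizer, and `summarize` is a truncation placeholder for an LLM summarization call. Both names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(text: str, max_words: int = 8) -> str:
    # Placeholder for an LLM summarization call: keep only the first few words.
    words = text.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")


def enforce_core_budget(core: dict[str, str], archival: list[str], limit: int) -> None:
    """Archive the oldest sections in full and keep a short summary in core.

    Relies on dict insertion order to mean "oldest first". Stops as soon as
    the total fits; it cannot shrink sections that are already short.
    """
    def total() -> int:
        return sum(count_tokens(value) for value in core.values())

    for name in list(core.keys()):
        if total() <= limit:
            break
        full_text = core[name]
        short_text = summarize(full_text)
        if short_text == full_text:
            continue  # nothing to gain from summarizing this section
        archival.append(f"[{name}] {full_text}")
        core[name] = short_text
```

Calling `enforce_core_budget` after every core write is one way to run the “monitor core memory tokens” quick test continuously.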
Definition of Done
- Core memory system with defined sections and token limits
- Archival memory connected to temporal knowledge graph
- All memory operations (append, replace, insert, search) working
- LLM successfully uses operations in multi-turn conversation
- Conversation spans “unlimited” length (tested with 100+ turns)
- Memory retrieval returns relevant past context
- Token budget is respected (core never exceeds limit)
- Can inspect memory state at any point in conversation
- Performance: memory operations < 500ms average
Project 12: Hybrid Retrieval Engine
- File: P12-hybrid-retrieval-engine.md
- Expanded Project Guide: P12-hybrid-retrieval-engine.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Information Retrieval, Search
- Software or Tool: Neo4j, Vector DB, BM25, Python
- Main Book: “Introduction to Information Retrieval” by Manning et al.
What you will build: A hybrid retrieval system that combines semantic search (vector similarity), graph traversal (relationship following), and keyword search (BM25) with intelligent result fusion using Reciprocal Rank Fusion (RRF) and Maximal Marginal Relevance (MMR).
Why it teaches temporal knowledge graphs: The power of temporal KGs is unlocked through smart retrieval. Semantic search alone misses temporal relationships; graph traversal alone misses semantic similarity. Combining them with proper fusion is the key to production-quality memory systems.
Core challenges you will face:
- Implementing multiple retrieval paths → Maps to Search Architecture
- Fusing ranked lists from different sources → Maps to RRF/MMR
- Tuning weights and thresholds → Maps to Relevance Engineering
- Measuring retrieval quality → Maps to Evaluation Metrics
Real World Outcome
You’ll have a retrieval API that queries all three sources and returns fused, deduplicated, diverse results.
Example Query:
$ curl -X POST http://localhost:8000/retrieve \
-H "Content-Type: application/json" \
-d '{
"query": "What did Alice say about the API redesign last month?",
"user_id": "user_123",
"top_k": 5,
"retrieval_config": {
"semantic_weight": 0.4,
"graph_weight": 0.4,
"keyword_weight": 0.2,
"use_mmr": true,
"mmr_lambda": 0.7
}
}'
{
"results": [
{
"id": "mem_001",
"content": "Alice proposed splitting the monolith API into microservices",
"source": "semantic",
"score": 0.89,
"fused_rank": 1,
"metadata": {
"timestamp": "2024-11-15T10:30:00Z",
"episode_id": "ep_045",
"entities": ["Alice", "API", "microservices"]
}
},
{
"id": "edge_045",
"content": "Alice PROPOSED API_redesign (valid: 2024-11-15 to present)",
"source": "graph",
"score": 0.85,
"fused_rank": 2,
"metadata": {
"relationship": "PROPOSED",
"subject": "Alice",
"object": "API_redesign",
"valid_from": "2024-11-15"
}
},
{
"id": "mem_003",
"content": "Discussion about API versioning strategy with the team",
"source": "keyword",
"score": 0.72,
"fused_rank": 3,
"metadata": {
"bm25_terms_matched": ["API", "strategy"],
"timestamp": "2024-11-18T14:00:00Z"
}
},
{
"id": "mem_007",
"content": "Alice mentioned concerns about API backward compatibility",
"source": "semantic",
"score": 0.78,
"fused_rank": 4,
"metadata": {
"timestamp": "2024-11-20T09:15:00Z",
"entities": ["Alice", "API", "compatibility"]
}
},
{
"id": "comm_002",
"content": "Community summary: Alice leads API modernization effort",
"source": "graph",
"score": 0.71,
"fused_rank": 5,
"metadata": {
"community_id": "engineering_team",
"summary_date": "2024-11-25"
}
}
],
"retrieval_stats": {
"semantic_candidates": 15,
"graph_candidates": 8,
"keyword_candidates": 12,
"pre_fusion_total": 35,
"post_dedup": 28,
"post_mmr": 5,
"latency_ms": 127
}
}
Retrieval Pipeline Visualization:
Query: "What did Alice say about API redesign last month?"
│
▼
┌────────────────┐
│ Query Analysis │
│ - Embed │
│ - Extract │
│ - Tokenize │
└───────┬────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Semantic │ │ Graph │ │ Keyword │
│ Search │ │ Traversal│ │ BM25 │
│ │ │ │ │ │
│ Vector │ │ Neo4j │ │ Inverted │
│ Index │ │ Cypher │ │ Index │
│ │ │ │ │ │
│ top_20 │ │ top_15 │ │ top_15 │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└─────────────┼─────────────┘
▼
┌────────────────┐
│ Deduplication │
│ (by content │
│ fingerprint) │
└───────┬────────┘
▼
┌────────────────┐
│ RRF Fusion │
│ score = Σ 1/(k+rank)│
└───────┬────────┘
▼
┌────────────────┐
│ MMR Selection │
│ diversity vs │
│ relevance │
└───────┬────────┘
▼
┌────────────────┐
│ Final top_k │
└────────────────┘
The Core Question You Are Answering
“How do we combine the strengths of semantic understanding, structural knowledge, and lexical matching to retrieve the most relevant memories from a temporal knowledge graph?”
Each retrieval method has blind spots: semantic search misses exact names, graph traversal misses semantic similarity, keyword search misses synonyms. Understanding how to combine them—and when to weight each—is the key to production retrieval.
Concepts You Must Understand First
- Vector Similarity Search
- What is cosine similarity vs. dot product vs. Euclidean distance?
- How do approximate nearest neighbor (ANN) algorithms work?
- What is the recall-latency tradeoff in vector search?
- Book Reference: “Foundations of Information Retrieval” - Ch. on Vector Space Models
- BM25 and Lexical Retrieval
- How does BM25 score documents?
- What is TF-IDF and how does BM25 improve on it?
- When does keyword search outperform semantic search?
- Book Reference: “Introduction to Information Retrieval” by Manning - Ch. 6 (TF-IDF) and Ch. 11 (Okapi BM25)
- Graph Query Patterns
- What Cypher patterns find related entities?
- How do you traverse N hops efficiently?
- How do you incorporate temporal filters in graph queries?
- Book Reference: “Graph Databases” by Robinson - Ch. 3-4
- Rank Fusion Methods
- What is Reciprocal Rank Fusion (RRF)?
- How do you handle different score scales?
- What alternatives to RRF exist (CombSUM, CombMNZ)?
- Paper Reference: “Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods” by Cormack et al.
- Diversity in Retrieval
- What is Maximal Marginal Relevance (MMR)?
- Why is diversity important in retrieval?
- How do you balance relevance vs. diversity?
- Book Reference: “Introduction to Information Retrieval” - Ch. 8 (Evaluation)
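For the “How does BM25 score documents?” question above, here is a small sketch of Okapi BM25 over whitespace-tokenized text. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the “+1 inside the log” IDF variant (which keeps IDF non-negative) are choices made for this illustration; a production index would use a proper tokenizer and an inverted index.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: long documents need more matches to score the same.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Compared with plain TF-IDF, the two visible differences are term-frequency saturation (controlled by `k1`) and document-length normalization (controlled by `b`).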
Questions to Guide Your Design
- Query Analysis
- How do you extract entities from the query for graph search?
- How do you detect temporal expressions (“last month”, “in Q3”)?
- Should you expand the query with synonyms for keyword search?
- Retrieval Configuration
- What default weights work for your domain?
- Should weights be static or query-dependent?
- How many candidates should each retriever return?
- Fusion Strategy
- How do you handle items that appear in multiple retrievers?
- Should you normalize scores before fusion or use RRF’s rank-based approach?
- How do you handle retrievers that return no results?
- Diversity and Deduplication
- How do you detect near-duplicate results?
- What similarity threshold triggers deduplication?
- How aggressively should MMR diversify results?
- Performance
- Can the three retrievals run in parallel?
- What’s the latency budget for the entire pipeline?
- How do you cache frequent queries or entity lookups?
Thinking Exercise
Design a Fusion Strategy
Given these results from three retrievers for the query “Alice’s API work in November”:
Semantic Results (by embedding similarity):
- “Alice proposed the microservices migration” (score: 0.91)
- “Bob reviewed Alice’s API documentation” (score: 0.85)
- “The team discussed API authentication” (score: 0.82)
Graph Results (by traversal relevance):
- Alice –[PROPOSED]–> API_redesign (November 15)
- Alice –[AUTHORED]–> API_docs (November 20)
- API_redesign –[DISCUSSED_BY]–> Engineering_Team
Keyword Results (by BM25):
- “November 2024 API planning meeting with Alice” (score: 12.3)
- “Alice’s November objectives include API modernization” (score: 10.1)
- “API versioning discussion” (score: 8.7)
Questions:
- Which results should appear in the final top-5?
- How would you handle the fact that Alice’s API documentation appears in both the semantic and graph results?
- Should the November temporal filter be applied to all results or just keyword?
- What MMR lambda would give good diversity here?
The Interview Questions They Will Ask
- “Explain the tradeoffs between semantic, keyword, and graph-based retrieval.”
- “What is Reciprocal Rank Fusion and why is it preferred over score averaging?”
- “How would you implement MMR for search result diversification?”
- “How do you handle temporal queries in a hybrid retrieval system?”
- “What metrics would you use to evaluate retrieval quality?”
- “How would you debug a hybrid retrieval system that returns irrelevant results?”
- “What’s the latency breakdown for a typical hybrid retrieval query?”
- “How would you A/B test different retrieval configurations?”
Hints in Layers
Hint 1: Start with Independent Retrievers
Build each retriever separately first. Test them independently. Make sure semantic search, graph traversal, and BM25 all return reasonable results on their own.
Hint 2: Implement RRF
RRF is simple: for each item, sum 1/(k + rank) across all retrievers where it appears, where k is typically 60. Items with lower rank numbers (better positions) get higher scores.
RRF_score(item) = Σ 1 / (k + rank_in_retriever)
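The formula above translates almost directly into code. This sketch assumes each retriever returns a list of item IDs ordered best-first (rank 1 is the top result); the function name `reciprocal_rank_fusion` is chosen for this example, and the per-retriever weights from the example API request are left out for brevity.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings into one list of (item_id, rrf_score)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Because only ranks are used, the incompatible score scales of cosine similarity and BM25 never need to be normalized against each other.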
Hint 3: Add Deduplication
Before fusion, compute content fingerprints (could be embedding similarity or hash). Merge items that are semantically the same, keeping the best source metadata.
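A sketch of the hash-based variant of Hint 3, assuming each candidate is a dict with `content`, `source`, and `score` keys (field names borrowed from the example response; the helper names are hypothetical). A normalized hash only catches duplicates that differ in case, punctuation, or spacing; true near-duplicates would need an embedding-similarity check instead.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the text after lowercasing, dropping punctuation, collapsing spaces."""
    normalized = re.sub(r"[^a-z0-9\s]", "", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(candidates: list[dict]) -> list[dict]:
    """Merge candidates with the same fingerprint, remembering every source."""
    merged: dict[str, dict] = {}
    for cand in candidates:
        key = fingerprint(cand["content"])
        if key not in merged:
            # Copy so the caller's dicts are not mutated.
            merged[key] = {**cand, "sources": [cand["source"]]}
            continue
        kept = merged[key]
        if cand["source"] not in kept["sources"]:
            kept["sources"].append(cand["source"])
        kept["score"] = max(kept["score"], cand["score"])
    return list(merged.values())
```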
Hint 4: Implement MMR
After RRF gives you a ranked list, apply MMR to select the final top_k. MMR iteratively selects items that maximize: λ * relevance - (1-λ) * max_similarity_to_selected.
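The MMR selection rule from Hint 4 might look like the sketch below. It assumes you already have a relevance score per item (for example, the RRF score) and an embedding vector per item; `mmr_select` and `cosine` are names chosen for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(relevance: dict[str, float], vectors: dict[str, list[float]],
               top_k: int, lam: float = 0.7) -> list[str]:
    """Greedily pick items maximizing lam * relevance - (1 - lam) * max similarity."""
    selected: list[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < top_k:
        def mmr_score(item: str) -> float:
            # Penalty is the similarity to the closest already-selected item.
            max_sim = max((cosine(vectors[item], vectors[s]) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the penalty term vanishes and the function reduces to plain top-k by relevance, which is the quick test suggested for debugging MMR.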
Hint 5: Parallelize Retrievers
Use asyncio or threading to run all three retrievers concurrently. The total latency should be max(retriever latencies), not their sum.
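One way to run the retrievers concurrently is `asyncio.gather`. In this sketch, `fake_retriever` is a stand-in that sleeps instead of calling a vector index, Neo4j, or a BM25 index; the names and the 0.2 s delay are assumptions for illustration only.

```python
import asyncio


async def fake_retriever(name: str, query: str, delay: float) -> tuple[str, list[str]]:
    # Stand-in for a real async call to a vector DB, graph DB, or keyword index.
    await asyncio.sleep(delay)
    return name, [f"{name} hit for {query!r}"]


async def retrieve_all(query: str, delay: float = 0.2) -> dict[str, list[str]]:
    """Run all three retrievers at once; a failed retriever is skipped, not fatal."""
    outcomes = await asyncio.gather(
        fake_retriever("semantic", query, delay),
        fake_retriever("graph", query, delay),
        fake_retriever("keyword", query, delay),
        return_exceptions=True,
    )
    results: dict[str, list[str]] = {}
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # in a real service, log this and fall back to the others
        name, hits = outcome
        results[name] = hits
    return results
```

Usage: `asyncio.run(retrieve_all("Alice API redesign"))` should take roughly one delay rather than three, since the sleeps overlap.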
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| IR Fundamentals | “Introduction to Information Retrieval” by Manning | Ch. 1-6 |
| BM25 | “Introduction to Information Retrieval” by Manning | Ch. 11.4.3 (Okapi BM25) |
| Vector Search | “Foundations of Vector Retrieval” (online resources) | ANN algorithms |
| Evaluation | “Introduction to Information Retrieval” by Manning | Ch. 8 |
| Diversity | Papers on MMR by Carbonell & Goldstein | Original MMR paper |
| Production Search | “Relevant Search” by Turnbull & Berryman | Ch. 9-11 |
Common Pitfalls and Debugging
Problem 1: “Graph results dominate even when irrelevant”
- Why: Graph traversal returns anything connected, regardless of semantic relevance
- Fix: Filter graph results by embedding similarity to query; only keep edges above threshold
- Quick test: Run graph retrieval alone; manually check if results are relevant
Problem 2: “RRF scores are too close together”
- Why: All results have similar ranks across retrievers
- Fix: Increase the number of candidates from each retriever; use score-weighted RRF variant
- Quick test: Log the RRF score distribution; should have clear separation
Problem 3: “MMR removes the most relevant result”
- Why: Lambda is too low (over-emphasizing diversity)
- Fix: Increase lambda to 0.7-0.9; ensure the most relevant item is always selected first
- Quick test: Set lambda=1.0 (pure relevance); verify top result is correct
Problem 4: “Temporal queries return results from wrong time period”
- Why: Temporal filter only applied to one retriever
- Fix: Parse temporal expressions early; apply date filters to all three retrievers
- Quick test: Query “Alice in November”; verify no October/December results
Problem 5: “Retrieval latency exceeds 500ms”
- Why: Sequential retriever calls or slow graph queries
- Fix: Parallelize retrievers; add indexes to graph DB; cache embedding for frequent entities
- Quick test: Time each component separately; identify bottleneck
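For Problem 4, the idea of parsing the temporal expression once and applying the same window to every retriever can be sketched like this. Only “last month” is handled, the reference date is passed in explicitly, and candidates are assumed to carry an ISO 8601 `timestamp` field as in the example response; the helper names are hypothetical.

```python
from datetime import date, datetime


def last_month_window(today: date) -> tuple[date, date]:
    """Return the half-open range [start, end) for the calendar month before `today`."""
    end = today.replace(day=1)
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


def filter_by_window(candidates: list[dict], window: tuple[date, date]) -> list[dict]:
    """Keep only candidates whose timestamp falls inside the window."""
    start, end = window
    kept = []
    for cand in candidates:
        # Replace a trailing "Z" so older Python versions can parse the string.
        ts = datetime.fromisoformat(cand["timestamp"].replace("Z", "+00:00")).date()
        if start <= ts < end:
            kept.append(cand)
    return kept
```

Running the same `filter_by_window` over the semantic, graph, and keyword candidates (or pushing the window down into each query) avoids the case where only one retriever respects the date range.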
Definition of Done
- Semantic retriever working with vector similarity
- Graph retriever working with Cypher temporal queries
- Keyword retriever working with BM25 index
- RRF fusion combines results from all three
- Deduplication handles near-duplicate results
- MMR provides diversity in final results
- Temporal expressions parsed and applied as filters
- All retrievers run in parallel
- Latency < 200ms for typical queries
- API returns structured results with source attribution
- Retrieval quality measured on test dataset
Project 13: Multi-Agent Shared Memory
- File: P13-multi-agent-shared-memory.md
- Expanded Project Guide: P13-multi-agent-shared-memory.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript, Go
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Systems, Multi-Agent Coordination
- Software or Tool: Neo4j, Redis, LangGraph, Python
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you will build: A shared memory substrate that multiple AI agents can read from and write to, with conflict resolution, access control, and real-time synchronization. Agents can collaborate on tasks by sharing facts, observations, and decisions through the temporal knowledge graph.
Why it teaches temporal knowledge graphs: Real-world AI systems increasingly involve multiple agents. Understanding how to share a knowledge graph across agents—while maintaining consistency, handling conflicts, and enabling collaboration—is essential for production multi-agent architectures.
Core challenges you will face:
- Concurrent writes from multiple agents → Maps to Distributed Consistency
- Conflict detection and resolution → Maps to CRDT / Versioning
- Agent-specific vs. shared knowledge → Maps to Access Control
- Real-time synchronization → Maps to Event-Driven Architecture
Real World Outcome
You’ll have a memory system where multiple agents can collaborate through shared knowledge, see each other’s contributions, and resolve conflicts when they disagree.
Example Multi-Agent Session:
$ multiagent start --agents "researcher,analyst,writer" --shared-memory neo4j
[System] Starting multi-agent session
[System] Shared memory: Neo4j @ localhost:7687
[System] Agents connected: researcher, analyst, writer
# Researcher agent finds information
[researcher] Found: "OpenAI released GPT-4 Turbo in November 2023"
[researcher] Writing to shared memory...
[Memory] Created: (GPT4_Turbo)-[:RELEASED_BY {date: 2023-11}]->(OpenAI)
# Analyst agent reads and adds analysis
[analyst] Reading shared memory for "GPT-4 Turbo"...
[analyst] Found 1 fact from researcher
[analyst] Adding analysis: "GPT-4 Turbo has 128K context window"
[Memory] Created: (GPT4_Turbo)-[:HAS_FEATURE]->(Context_128K)
[Memory] Created: (analyst)-[:CONTRIBUTED]->(GPT4_Turbo_analysis)
# Writer agent synthesizes
[writer] Reading shared memory for "GPT-4 Turbo" and "context window"...
[writer] Found 2 facts from researcher, analyst
[writer] Creating summary node...
[Memory] Created: (Summary_001)-[:SYNTHESIZES]->(GPT4_Turbo)
[Memory] Created: (Summary_001)-[:SYNTHESIZES]->(GPT4_Turbo_analysis)
# Conflict scenario
[researcher] Update: "GPT-4 Turbo context is actually 128K tokens"
[analyst] Concurrent update: "GPT-4 Turbo context is 200K tokens"
[Memory] CONFLICT DETECTED on (Context_128K)
[Memory] Resolution: Keep researcher's version (higher confidence score)
[Memory] Created: (Context_conflict)-[:REJECTED_CLAIM {agent: analyst, value: 200K}]
Memory State Visualization:
$ multiagent memory --graph-view
Multi-Agent Shared Memory Graph
================================
Entities (shared):
[GPT4_Turbo] ← researcher (creator)
[OpenAI] ← researcher (creator)
[Context_128K] ← researcher (creator, analyst conflict)
[Summary_001] ← writer (creator)
Relationships:
GPT4_Turbo --RELEASED_BY--> OpenAI
└─ created_by: researcher
└─ created_at: 2024-12-15T10:30:00Z
GPT4_Turbo --HAS_FEATURE--> Context_128K
└─ created_by: researcher
└─ conflict_from: analyst (rejected: 200K)
└─ resolution: higher_confidence
Summary_001 --SYNTHESIZES--> GPT4_Turbo
└─ created_by: writer
└─ sources: [researcher, analyst]
Agent Contributions:
researcher: 3 entities, 2 relationships
analyst: 1 entity, 1 relationship (1 rejected)
writer: 1 entity, 2 relationships
Conflicts Resolved: 1 (confidence-based)
Access Control Example:
$ multiagent acl --show
Access Control Matrix
=====================
| Resource | researcher | analyst | writer |
|-----------------|------------|---------|--------|
| Entity: Create | ✓ | ✓ | ✓ |
| Entity: Read | ✓ | ✓ | ✓ |
| Entity: Delete | ✓ | ✗ | ✗ |
| Relation: Create| ✓ | ✓ | ✓ |
| Summary: Create | ✗ | ✗ | ✓ |
| Conflict: Resolve| ✓ | ✓ | ✗ |
# Researcher tries to create summary (denied)
[researcher] Creating summary...
[Memory] ACCESS DENIED: researcher cannot create Summary entities
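The access control matrix above can be expressed as a simple role-to-permission lookup. This is a sketch of an API-level check only; the action strings such as `summary:create` and the `AccessDenied` exception are naming assumptions made for this example.

```python
# Permissions transcribed from the access control matrix above.
PERMISSIONS: dict[str, set[str]] = {
    "researcher": {"entity:create", "entity:read", "entity:delete",
                   "relation:create", "conflict:resolve"},
    "analyst": {"entity:create", "entity:read", "relation:create", "conflict:resolve"},
    "writer": {"entity:create", "entity:read", "relation:create", "summary:create"},
}


class AccessDenied(Exception):
    """Raised when an agent attempts an action its role does not allow."""


def check_access(agent: str, action: str) -> None:
    # Unknown agents get an empty permission set, so they are denied by default.
    if action not in PERMISSIONS.get(agent, set()):
        raise AccessDenied(f"{agent} cannot perform {action}")
```

Calling `check_access("researcher", "summary:create")` raises `AccessDenied`, mirroring the denied request in the transcript.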
The Core Question You Are Answering
“How do multiple AI agents share a common knowledge base while maintaining consistency, resolving conflicts, and respecting access boundaries?”
This question is fundamental to the future of AI as systems move from single agents to agent swarms. Understanding how to build collaborative memory is essential for orchestration frameworks like LangGraph, CrewAI, and AutoGPT.
Concepts You Must Understand First
- Distributed Consistency Models
- What is eventual consistency vs. strong consistency?
- What are the CAP theorem tradeoffs?
- How do you handle concurrent writes?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 5, 7
- Conflict Resolution Strategies
- What is last-writer-wins (LWW)?
- How do CRDTs handle concurrent updates?
- When should conflicts require human/agent arbitration?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 5
- Event Sourcing and Change Propagation
- How do you notify agents of memory changes?
- What is pub/sub in the context of shared state?
- How do you handle agent disconnection and reconnection?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 11
- Access Control Models
- What is role-based access control (RBAC)?
- How do you implement attribute-based access control (ABAC)?
- How do you audit who changed what?
- Book Reference: Security and access control literature
- Multi-Agent Architectures
- How do agents coordinate in LangGraph?
- What is the difference between shared memory vs. message passing?
- How do you handle agent failures?
- Book Reference: LangGraph and CrewAI documentation
Questions to Guide Your Design
- Memory Partitioning
- Should agents have private memory plus shared memory?
- How do you migrate knowledge from private to shared?
- What’s the schema for attributing facts to agents?
- Conflict Handling
- How do you detect conflicting facts (e.g., contradictory dates)?
- What’s the default resolution strategy?
- How do you log rejected alternatives for future review?
- Synchronization
- How quickly do changes propagate to other agents?
- Do agents poll for changes or receive push notifications?
- How do you handle offline agents?
- Access Control
- What operations can each agent type perform?
- Can agents grant permissions to other agents?
- How do you audit access and modifications?
- Agent Identity
- How do you identify which agent made which contribution?
- Can agents see each other’s reasoning process?
- How do you handle anonymous or system-generated facts?
Thinking Exercise
Design a Multi-Agent Knowledge Flow
Three agents are researching a topic:
- Researcher: Finds raw facts from external sources
- Analyst: Evaluates and synthesizes facts
- Writer: Creates final summaries
Design the memory flow:
- What entities/relationships does each agent create?
- What can each agent read vs. write?
- How do you handle when Researcher and Analyst disagree on a fact?
- How does Writer know when enough facts are ready for synthesis?
- What happens if Researcher updates a fact after Writer has used it?
Sketch the state transitions and conflict scenarios.
The Interview Questions They Will Ask
- “How would you design a shared memory system for multiple AI agents?”
- “What consistency model would you choose for multi-agent knowledge graphs?”
- “How do you handle conflicting facts from different agents?”
- “Explain the tradeoffs between shared memory and message passing for agent coordination.”
- “How would you implement access control for a multi-agent system?”
- “What happens when an agent crashes mid-write to shared memory?”
- “How do you ensure all agents see a consistent view of the knowledge graph?”
- “How would you debug a multi-agent system where agents are writing conflicting facts?”
Hints in Layers
Hint 1: Start with Agent Attribution
Add an agent_id field to every node and edge in your graph. This is the foundation for tracking who contributed what and for access control.
Hint 2: Implement Optimistic Locking
Use version numbers on entities. When an agent updates an entity, it must provide the version it read. If the version has changed, the update is rejected (conflict).
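A minimal in-memory sketch of the version check described in Hint 2. `VersionedStore` and `VersionConflict` are hypothetical names, and the check-then-write here is only safe in a single thread; in the real system the comparison and the write must happen atomically inside one database transaction.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the entity."""


class VersionedStore:
    def __init__(self) -> None:
        # key -> (version, value); version 0 means "does not exist yet".
        self._data: dict[str, tuple[int, dict]] = {}

    def read(self, key: str) -> tuple[int, dict]:
        return self._data.get(key, (0, {}))

    def write(self, key: str, value: dict, expected_version: int, agent_id: str) -> int:
        current_version, _ = self._data.get(key, (0, {}))
        if current_version != expected_version:
            raise VersionConflict(
                f"{agent_id} expected v{expected_version}, but {key} is at v{current_version}"
            )
        new_version = current_version + 1
        # Attribution is stored alongside the value (see Hint 1).
        self._data[key] = (new_version, {**value, "agent_id": agent_id})
        return new_version
```

When a `VersionConflict` is raised, the caller can re-read the entity and either retry or hand both claims to the conflict resolution strategy.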
Hint 3: Build a Change Log
Create a separate event log (in Redis or Kafka) that records every memory operation. Agents can subscribe to relevant events to stay synchronized.
Hint 4: Use Neo4j Transactions
Graph databases support transactions. Ensure conflicting writes are handled atomically. Use MERGE with conflict detection.
Hint 5: Integrate with LangGraph
LangGraph provides state management for multi-agent workflows. Your shared memory can be the persistent backing store for LangGraph’s state object.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Distributed Consistency | “Designing Data-Intensive Applications” by Kleppmann | Ch. 5, 7, 9 |
| Event Sourcing | “Designing Data-Intensive Applications” by Kleppmann | Ch. 11 |
| Conflict Resolution | “Designing Data-Intensive Applications” by Kleppmann | Ch. 5 (CRDT section) |
| Multi-Agent Systems | LangGraph Documentation | State Management |
| Access Control | “Security Engineering” by Ross Anderson | Ch. 4 |
Common Pitfalls and Debugging
Problem 1: “Lost updates - agent’s write disappears”
- Why: Another agent overwrote without checking version
- Fix: Implement optimistic locking; reject updates with stale versions
- Quick test: Have two agents write to same entity simultaneously; verify conflict is detected
Problem 2: “Agents see stale data”
- Why: Caching or propagation delay
- Fix: Use real-time subscriptions (WebSocket/Redis pub-sub); invalidate cache on change events
- Quick test: Agent A writes; measure time until Agent B sees update
Problem 3: “Circular attribution - who contributed what?”
- Why: Agents read from each other and re-contribute same facts
- Fix: Track provenance chain; deduplicate facts by content hash regardless of agent
- Quick test: Have Agent B read Agent A’s fact and re-write it; verify no duplicate
Problem 4: “Access control bypassed”
- Why: Enforcement only at API level, not database level
- Fix: Use database-level constraints (Neo4j roles); validate in middleware
- Quick test: Have low-privilege agent attempt forbidden operation via direct DB access
Problem 5: “Conflict resolution always picks same agent”
- Why: Confidence scores are biased toward one agent type
- Fix: Calibrate confidence scores; add randomization or round-robin for ties
- Quick test: Create equal-confidence conflict; verify fair resolution
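The confidence-based resolution with fair tie-breaking from Problem 5 could be sketched as below. The `Claim` dataclass, the `tie_epsilon` threshold, and the caller-supplied `round_robin_index` (for example, a running count of resolved conflicts) are assumptions made for this illustration; the list of claims is assumed to be non-empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    agent_id: str
    value: str
    confidence: float


def resolve(claims: list[Claim], round_robin_index: int = 0,
            tie_epsilon: float = 1e-9) -> tuple[Claim, list[Claim]]:
    """Pick the highest-confidence claim; rotate among agents on ties.

    Returns (winner, rejected) so rejected alternatives can be stored for audit.
    """
    best = max(claim.confidence for claim in claims)
    # Sort tied claims by agent ID so the rotation order is stable.
    tied = sorted(
        (claim for claim in claims if best - claim.confidence <= tie_epsilon),
        key=lambda claim: claim.agent_id,
    )
    winner = tied[round_robin_index % len(tied)]
    rejected = [claim for claim in claims if claim is not winner]
    return winner, rejected
```

Persisting the `rejected` list next to the winning fact corresponds to the “Rejected alternatives stored for audit” item in the Definition of Done.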
Definition of Done
- Multiple agents can connect to shared Neo4j instance
- Every entity/edge has agent attribution
- Concurrent writes to same entity detected as conflict
- Conflict resolution strategy implemented (configurable)
- Rejected alternatives stored for audit
- Real-time change propagation between agents
- Access control enforced (agents have different permissions)
- Audit log records all operations with agent ID
- LangGraph integration working (shared state backed by graph)
- Demo scenario: 3 agents collaborate on research task
Project 14: Production Memory Service
- File: P14-production-memory-service.md
- Expanded Project Guide: P14-production-memory-service.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Production Systems, DevOps
- Software or Tool: Docker, Kubernetes, Neo4j, Redis, FastAPI
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you will build: A production-ready memory service with multi-tenancy, rate limiting, monitoring, horizontal scaling, backup/restore, and operational tooling. This is the service you would deploy to power memory for thousands of AI agents.
Why it teaches temporal knowledge graphs: Building a service that works in development is easy. Building one that works in production—with real traffic, multi-tenancy, failures, and scale—requires understanding the full stack: database operations, caching, observability, and operational procedures.
Core challenges you will face:
- Multi-tenancy isolation → Maps to Database Design
- Rate limiting and quotas → Maps to API Gateway
- Horizontal scaling → Maps to Distributed Systems
- Operational tooling → Maps to DevOps / SRE
Real World Outcome
You’ll have a deployable memory service with proper multi-tenancy, monitoring, and operational procedures.
Production API Example:
# Create a memory for a tenant
$ curl -X POST https://memory.yourdomain.com/v1/memory \
-H "Authorization: Bearer $TENANT_API_KEY" \
-H "X-Tenant-ID: tenant_acme" \
-d '{
"user_id": "user_123",
"episode": {
"content": "User discussed project timeline",
"metadata": {"session_id": "sess_456"}
}
}'
{
"memory_id": "mem_789",
"tenant_id": "tenant_acme",
"user_id": "user_123",
"status": "processing",
"created_at": "2024-12-15T10:30:00Z"
}
# Check processing status
$ curl https://memory.yourdomain.com/v1/memory/mem_789/status \
-H "Authorization: Bearer $TENANT_API_KEY"
{
"memory_id": "mem_789",
"status": "completed",
"entities_extracted": 3,
"relationships_created": 2,
"processing_time_ms": 1250
}
Monitoring Dashboard:
Memory Service Dashboard (Grafana)
==================================
┌─────────────────────────────────────────────────────────────┐
│ Request Rate (last 1h) │
│ │
│ 800 ┤ ╭────╮ │
│ 600 ┤ ╭────╯ ╰────╮ │
│ 400 ┤ ╭────╯ ╰────╮ │
│ 200 ┤ ╭────╯ ╰────╮ │
│ 0 └─────────────────────────────────────────────────────│
│ 10:00 10:15 10:30 10:45 11:00 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Latency Percentiles │
│ │
│ p50: 45ms [██████████░░░░░░░░░░] │
│ p95: 125ms [█████████████████░░░] │
│ p99: 280ms [███████████████████░] │
│ │
│ SLA: p99 < 500ms ✓ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Tenant Usage (today) │
│ │
│ tenant_acme: 45,230 requests [████████████████░░░░] │
│ tenant_beta: 23,100 requests [████████░░░░░░░░░░░░] │
│ tenant_gamma: 12,450 requests [████░░░░░░░░░░░░░░░░] │
│ │
│ Rate limit alerts: tenant_acme (approaching 80%) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ System Health │
│ │
│ Neo4j Primary: ✓ Healthy (12.3% CPU, 45% RAM) │
│ Neo4j Replica: ✓ Healthy (8.1% CPU, 42% RAM) │
│ Redis Cache: ✓ Healthy (15.2% RAM, 89% hit rate) │
│ API Pods: ✓ 4/4 Ready (avg 23% CPU) │
│ Queue Depth: 127 messages (< 500 threshold) │
└─────────────────────────────────────────────────────────────┘
Operational Procedures:
# Backup tenant data
$ memctl backup --tenant tenant_acme --output s3://backups/
Backing up tenant_acme...
- Neo4j nodes: 45,230
- Neo4j relationships: 89,450
- Vector embeddings: 45,230
- Total size: 1.2 GB
Backup completed: s3://backups/tenant_acme_2024-12-15.tar.gz
# Restore tenant data
$ memctl restore --tenant tenant_acme --from s3://backups/tenant_acme_2024-12-15.tar.gz
Validating backup integrity... ✓
Restoring to staging environment first...
- Neo4j nodes: 45,230 ✓
- Neo4j relationships: 89,450 ✓
- Vector embeddings: 45,230 ✓
Verification passed. Apply to production? [y/N] y
Restore completed.
# Scale up for traffic spike
$ kubectl scale deployment memory-api --replicas=8
deployment.apps/memory-api scaled
# View tenant quotas
$ memctl quota --tenant tenant_acme
Tenant: tenant_acme
Plan: Pro
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│ Resource │ Used │ Limit │
├───────────────────┼─────────┼─────────┤
│ Requests/day │ 45,230 │ 100,000 │
│ Memories stored │ 234,567 │ 500,000 │
│ Storage (GB) │ 2.3 │ 10 │
│ Users │ 156 │ 500 │
│ Entity extraction │ 12,345 │ 50,000 │
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
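Behind a report like this, quota enforcement is a comparison of usage against a limit for each resource. The sketch below assumes a hypothetical `Quota` record and an 80% warning threshold (borrowed from the dashboard's "approaching 80%" alert); the figures are copied from the sample output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    resource: str
    used: float
    limit: float

    @property
    def utilization(self) -> float:
        return self.used / self.limit


def classify(quota: Quota, warn_at: float = 0.8) -> str:
    """Return 'ok', 'warning', or 'exceeded' for one resource."""
    if quota.used >= quota.limit:
        return "exceeded"
    if quota.utilization >= warn_at:
        return "warning"
    return "ok"


# Values copied from the sample `memctl quota` output.
quotas = [
    Quota("Requests/day", 45_230, 100_000),
    Quota("Memories stored", 234_567, 500_000),
    Quota("Storage (GB)", 2.3, 10),
    Quota("Users", 156, 500),
    Quota("Entity extraction", 12_345, 50_000),
]
report = {q.resource: classify(q) for q in quotas}
```

With these numbers every resource is below the warning threshold, so a real service would accept the request; an "exceeded" result is where the API would reject it.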
The Core Question You Are Answering
“What does it take to run a temporal knowledge graph memory service at production scale, serving multiple tenants with reliability, security, and observability?”
The difference between a demo and a production system is enormous. This project forces you to think about multi-tenancy, failures, scaling, security, and operations—skills essential for any production AI system.
Concepts You Must Understand First
- Multi-Tenancy Patterns
- What is the difference between shared-database and database-per-tenant designs?
- How do you ensure tenant data isolation?
- How do you handle noisy neighbors?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 12
- Rate Limiting and Quotas
- What algorithms exist for rate limiting (token bucket, sliding window)?
- How do you enforce quotas across distributed API servers?
- How do you handle burst traffic?
- Book Reference: “System Design Interview” by Alex Xu - Rate Limiting chapter
- Observability (Logs, Metrics, Traces)
- What should you log for debugging?
- What metrics indicate service health?
- How does distributed tracing work?
- Book Reference: “Observability Engineering” by Charity Majors et al.
- Database Operations
- How do you backup and restore Neo4j?
- How do you handle schema migrations?
- What is the procedure for database failover?
- Book Reference: Neo4j Operations Manual
- Container Orchestration
- How does Kubernetes handle deployments and scaling?
- What are health checks, and how do liveness and readiness probes differ?
- How do you do zero-downtime deployments?
- Book Reference: “Kubernetes in Action” by Marko Luksa
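Of the rate limiting algorithms named above, the token bucket is the easiest to reason about when handling bursts. The following is a minimal single-process sketch; the class name, parameters, and injectable clock are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Deterministic demo with a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=5.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(6)]   # 5 allowed, the 6th rejected
fake_now[0] = 1.0                            # one second later: 2 tokens refilled
after_refill = [bucket.allow() for _ in range(3)]
```

This version keeps its state in process memory, so each API pod would count separately. Enforcing one global limit across pods requires moving the token count and timestamp into a shared store.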
Questions to Guide Your Design
- Multi-Tenancy
- How do you partition data by tenant in Neo4j?
- How do you prevent one tenant from querying another’s data?
- What metadata do you store about each tenant?
- API Design
- Which authentication method will you use (API keys, JWT, OAuth)?
- How do you version your API?
- What rate limits apply per endpoint?
- Reliability
- What is your SLA (e.g., 99.9% uptime)?
- How do you handle Neo4j primary failure?
- What is your backup/restore procedure?
- Scaling
- What is the bottleneck as traffic increases?
- How do you scale API servers vs. database?
- When do you need to shard the graph?
- Operational Tooling
- What CLI commands do operators need?
- What alerts should page on-call?
- What runbooks do you need?
Thinking Exercise
Design Tenant Isolation
You have 100 tenants sharing one Neo4j instance. Design the isolation:
- How do you label nodes/edges with tenant ID?
- How do you ensure every query includes a tenant filter?
- What happens if a bug allows a cross-tenant query?
- How do you audit tenant data access?
- Can you guarantee a tenant’s data is fully deleted?
Sketch the data model and query patterns that ensure isolation.
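One possible starting point for the query-pattern half of this exercise is a wrapper that refuses to run any query not bound to the caller's tenant. This is a sketch under assumptions: `TenantScopedRunner` and `fake_run` are hypothetical, and the `run` callable stands in for whatever driver call your service uses.

```python
from typing import Any, Callable

RunFn = Callable[[str, dict[str, Any]], list[dict[str, Any]]]


class TenantScopedRunner:
    """Wrap a query function so every Cypher query is bound to one tenant."""

    def __init__(self, tenant_id: str, run: RunFn) -> None:
        self.tenant_id = tenant_id
        self._run = run

    def run(self, cypher: str, **params: Any) -> list[dict[str, Any]]:
        # Crude guard: refuse queries that never reference the tenant parameter.
        if "$tenant_id" not in cypher:
            raise ValueError("query must filter on $tenant_id")
        # The tenant comes from the authenticated context, never from the caller.
        if "tenant_id" in params:
            raise ValueError("tenant_id is injected, not caller-supplied")
        return self._run(cypher, {**params, "tenant_id": self.tenant_id})


# Stand-in for a real driver call; records what would be sent to the database.
sent: list[tuple[str, dict[str, Any]]] = []


def fake_run(cypher: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    sent.append((cypher, params))
    return []


runner = TenantScopedRunner("tenant_acme", fake_run)
runner.run(
    "MATCH (e:Entity {tenant_id: $tenant_id, name: $name}) RETURN e",
    name="Alice",
)
```

The substring check only catches queries that forget the parameter entirely. It does not prove that every pattern in the query is filtered, so isolation tests and audit logging remain necessary.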
The Interview Questions They Will Ask
- “How would you design a multi-tenant memory service?”
- “What’s your strategy for database backup and disaster recovery?”
- “How do you handle a traffic spike from one tenant?”
- “What metrics would you monitor for a memory service?”
- “How do you ensure tenant data isolation at the database level?”
- “Describe your zero-downtime deployment process.”
- “What happens when Neo4j runs out of disk space?”
- “How would you debug slow queries affecting multiple tenants?”
Hints in Layers
Hint 1: Use Tenant Labels
Every node and relationship in Neo4j should carry a tenant_id property (nodes can also get a per-tenant label; relationships have a type but no labels). Create indexes on this property. All queries must filter by tenant.
Hint 2: Implement API Gateway Pattern
Use an API gateway (Kong, Ambassador, or custom) for authentication, rate limiting, and tenant routing. This keeps those cross-cutting concerns out of the API servers, which are left with only business logic.
Hint 3: Set Up Prometheus + Grafana
Expose metrics from your FastAPI service using prometheus_client. Track request count, latency histograms, error rates, and custom business metrics.
Hint 4: Write Runbooks First
Before building operational tooling, write runbooks for common scenarios: tenant onboarding, backup/restore, scaling, incident response. Then automate.
Hint 5: Test Failure Scenarios
Use chaos engineering principles. What happens when Neo4j is unavailable? When the Redis cache fails? When an API pod crashes? Build resilience for each.
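For the Redis failure scenario in Hint 5, one resilient pattern is to treat a cache outage like a cache miss and fall back to the database. The sketch below uses plain callables in place of real Redis and Neo4j clients; `CacheUnavailable`, `get_with_fallback`, and the three helper functions are hypothetical names.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when Redis cannot be reached."""


def get_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], str],
) -> tuple[str, str]:
    """Return (value, source); a cache outage degrades to a database read."""
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        cached = None  # treat an outage like a miss instead of failing the request
        source = "db (cache down)"
    else:
        source = "cache" if cached is not None else "db (cache miss)"
    if cached is not None:
        return cached, source
    return db_get(key), source


def broken_cache(key: str) -> Optional[str]:
    raise CacheUnavailable("connection refused")


def healthy_cache(key: str) -> Optional[str]:
    return {"user_123": "cached profile"}.get(key)


def database(key: str) -> str:
    return f"profile for {key} from Neo4j"


hit = get_with_fallback("user_123", healthy_cache, database)
miss = get_with_fallback("user_999", healthy_cache, database)
outage = get_with_fallback("user_123", broken_cache, database)
```

Returning the source alongside the value makes it easy to count fallbacks in metrics, so a cache outage shows up as a spike in database reads and not as silent extra load.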
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Production Systems | “Designing Data-Intensive Applications” by Kleppmann | Ch. 12 |
| Observability | “Observability Engineering” by Charity Majors | Ch. 1-5 |
| SRE Practices | “Site Reliability Engineering” by Google | Ch. 4, 8, 10 |
| Kubernetes | “Kubernetes in Action” by Luksa | Ch. 5, 11 |
| API Design | “Design and Build Great Web APIs” by Amundsen | Ch. 6-8 |
Common Pitfalls and Debugging
Problem 1: “Cross-tenant data leak”
- Why: Query missing tenant_id filter
- Fix: Middleware that injects tenant filter into all queries; integration tests for isolation
- Quick test: Attempt to query with wrong tenant ID; verify 0 results
Problem 2: “Rate limiter is inconsistent across pods”
- Why: In-memory rate limiting; each pod has separate count
- Fix: Use Redis for centralized rate limit counters
- Quick test: Send requests to different pods; verify global limit applies
Problem 3: “Backup restore fails with schema mismatch”
- Why: Schema evolved since backup; restore incompatible
- Fix: Include schema version in backup; run migrations during restore
- Quick test: Restore week-old backup to staging; verify migrations apply
Problem 4: “Can’t scale beyond 4 API pods”
- Why: Database connection pool exhausted
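- Fix: Share one driver instance per pod (the official Neo4j drivers pool connections internally); tune the pool size (max_connection_pool_size in the Python driver) so that pods × pool size stays within what the database can serve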
- Quick test: Monitor active connections during scale-up
Problem 5: “No visibility into tenant-specific issues”
- Why: Metrics not tagged by tenant
- Fix: Add tenant_id label to all metrics; create per-tenant dashboards
- Quick test: Filter Grafana dashboard by tenant; verify data appears
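The fix for Problem 3 (include a schema version in the backup and run migrations during restore) can be sketched as a chain of versioned upgrade steps. The snapshot layout, the two migrations, and the version numbers below are all hypothetical and exist only to show the mechanism.

```python
from typing import Callable

Snapshot = dict  # hypothetical layout: {"schema_version": int, "tenant_id": str, "nodes": [...]}


def v1_to_v2(snapshot: Snapshot) -> None:
    # Hypothetical migration: v2 requires tenant_id on every node.
    for node in snapshot["nodes"]:
        node.setdefault("tenant_id", snapshot["tenant_id"])


def v2_to_v3(snapshot: Snapshot) -> None:
    # Hypothetical migration: v3 renames `created` to `created_at`.
    for node in snapshot["nodes"]:
        if "created" in node:
            node["created_at"] = node.pop("created")


# Maps "from version" -> migration that upgrades to the next version.
MIGRATIONS: dict[int, Callable[[Snapshot], None]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_SCHEMA_VERSION = 3


def migrate(snapshot: Snapshot) -> Snapshot:
    """Upgrade a restored snapshot step by step to the current schema."""
    version = snapshot["schema_version"]
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(f"backup schema v{version} is newer than this service")
    while version < CURRENT_SCHEMA_VERSION:
        MIGRATIONS[version](snapshot)
        version += 1
        snapshot["schema_version"] = version
    return snapshot


old_backup = {
    "schema_version": 1,
    "tenant_id": "tenant_acme",
    "nodes": [{"name": "Alice", "created": "2024-12-01T00:00:00Z"}],
}
restored = migrate(old_backup)
```

Because each step upgrades exactly one version, a week-old backup and a year-old backup go through the same code path, which is what the staging quick test exercises.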
Definition of Done
- Multi-tenancy with data isolation (tenant_id on all nodes/edges)
- API authentication with tenant-scoped API keys
- Rate limiting per tenant (Redis-backed)
- Quota enforcement (storage, requests, users)
- Prometheus metrics with tenant labels
- Grafana dashboard with key metrics
- Alerting for SLA breaches and quota warnings
- Backup procedure tested (Neo4j + vectors)
- Restore procedure tested on staging
- Kubernetes deployment with health checks
- Zero-downtime deployment verified
- Runbooks for common operations
- CLI tool for operator tasks (memctl)
Project 15: Memory Benchmark Suite
- File: P15-memory-benchmark-suite.md
- Expanded Project Guide: P15-memory-benchmark-suite.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Evaluation, Benchmarking, ML Metrics
- Software or Tool: Python, Pytest, LLM APIs
- Main Book: “Evaluating Machine Learning Models” by Alice Zheng
What you will build: A comprehensive benchmark suite for evaluating memory systems, including datasets, metrics, and comparison tooling. You’ll implement benchmarks inspired by DMR (Deep Memory Retrieval) and LongMemEval, plus custom metrics for temporal reasoning.
Why it teaches temporal knowledge graphs: Building effective memory systems requires measuring effectiveness. Understanding how to evaluate retrieval quality, temporal reasoning, and end-to-end task performance is essential for iterating on your memory architecture.
Core challenges you will face:
- Defining what “good memory” means → Maps to Metric Design
- Creating realistic test datasets → Maps to Data Engineering
- Measuring temporal reasoning ability → Maps to Temporal Evaluation
- Comparing systems fairly → Maps to Experimental Design
Real World Outcome
You’ll have a benchmark suite that can evaluate any memory system and produce detailed comparison reports.
Benchmark Execution:
$ membench run --suite full --systems "graphiti,mem0,baseline_rag"
Memory Benchmark Suite v1.0
============================
Loading benchmark datasets...
- DMR-derived: 500 dialogues, 5,000 queries
- LongMemEval-derived: 100 long conversations
- Temporal reasoning: 200 time-based queries
- Multi-hop: 150 relationship queries
Running benchmarks...
[1/4] Retrieval Quality (DMR-derived)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
| Metric | Graphiti | Mem0 | Baseline RAG |
|------------|----------|-------|--------------|
| Recall@5 | 0.847 | 0.812 | 0.723 |
| Recall@10 | 0.912 | 0.889 | 0.801 |
| MRR | 0.756 | 0.721 | 0.634 |
| NDCG@10 | 0.834 | 0.798 | 0.712 |
[2/4] Temporal Reasoning
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
| Query Type | Graphiti | Mem0 | Baseline RAG |
|--------------------|----------|-------|--------------|
| "Before X" | 0.89 | 0.72 | 0.34 |
| "After X" | 0.91 | 0.75 | 0.38 |
| "During period" | 0.85 | 0.68 | 0.29 |
| "Sequence order" | 0.78 | 0.61 | 0.22 |
| "Most recent" | 0.94 | 0.88 | 0.67 |
| Overall | 0.874 | 0.728 | 0.380 |
[3/4] Multi-hop Reasoning
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
| Hops | Graphiti | Mem0 | Baseline RAG |
|-------|----------|-------|--------------|
| 1-hop | 0.92 | 0.88 | 0.85 |
| 2-hop | 0.81 | 0.71 | 0.52 |
| 3-hop | 0.67 | 0.48 | 0.23 |
[4/4] End-to-End Task Performance
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
| Task | Graphiti | Mem0 | Baseline RAG |
|-------------------------|----------|-------|--------------|
| User preference recall | 0.89 | 0.84 | 0.71 |
| Fact consistency | 0.94 | 0.91 | 0.82 |
| Contradiction detection | 0.78 | 0.65 | 0.31 |
| Long-term coherence | 0.85 | 0.79 | 0.58 |
Summary Report
==============
Overall Winner: Graphiti (avg score: 0.847)
Strengths by system:
- Graphiti: Best temporal reasoning (0.874), best multi-hop (3-hop: 0.67)
- Mem0: Good balance, simpler setup
- Baseline RAG: Fast, simple, good for 1-hop queries
Recommendations:
- Use Graphiti when temporal queries are important
- Use Mem0 for simpler use cases with less temporal complexity
- Baseline RAG is insufficient for production memory needs
Detailed report: ./benchmark_results/report_2024-12-15.html
Temporal Reasoning Test Cases:
$ membench examples --type temporal
Temporal Reasoning Test Examples
================================
Test 1: "Before X" Query
------------------------
Context: Alice mentioned preferring Python on March 1st. She started learning
Rust on March 15th. She completed her first Rust project on April 1st.
Query: "What programming language did Alice prefer before starting Rust?"
Expected: Python
Rationale: Must understand temporal ordering to answer correctly
Test 2: "Most Recent" Query
---------------------------
Context: Bob's favorite restaurant was Italian (Jan), then Mexican (March),
then Japanese (September).
Query: "What is Bob's current favorite restaurant type?"
Expected: Japanese
Rationale: Must retrieve most recent fact, not all facts
Test 3: Contradiction Detection
-------------------------------
Context: "The project deadline is December 15" (said on Nov 1).
"The project deadline was moved to January 5" (said on Dec 1).
Query: "What is the project deadline?"
Expected: January 5 (with note about change)
Rationale: Must handle fact updates correctly
Test 4: Sequence Ordering
-------------------------
Context: Five meetings about the product launch; the fifth is the final design review.
Query: "What was discussed in the meeting before the final design review?"
Expected: Content from the fourth meeting
Rationale: Must understand relative ordering
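Expected answers for cases like Test 1 and Test 2 can be generated programmatically from timestamped facts, which keeps the ground truth consistent as the dataset grows. This is a sketch, assuming a hypothetical `Fact` record; the years and days of the month are invented, since the examples give only months or day/month.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_on: date


def most_recent(facts: list[Fact], subject: str, attribute: str) -> Optional[str]:
    """Ground truth for 'Most recent' queries: the latest observed value."""
    matching = [f for f in facts if f.subject == subject and f.attribute == attribute]
    return max(matching, key=lambda f: f.observed_on).value if matching else None


def value_before(facts: list[Fact], subject: str, attribute: str,
                 cutoff: date) -> Optional[str]:
    """Ground truth for 'Before X' queries: the latest value observed before cutoff."""
    earlier = [f for f in facts if f.observed_on < cutoff]
    return most_recent(earlier, subject, attribute)


# Assumed dates: the year 2024 and the first of each month are placeholders.
facts = [
    Fact("Alice", "preferred_language", "Python", date(2024, 3, 1)),
    Fact("Bob", "favorite_restaurant", "Italian", date(2024, 1, 1)),
    Fact("Bob", "favorite_restaurant", "Mexican", date(2024, 3, 1)),
    Fact("Bob", "favorite_restaurant", "Japanese", date(2024, 9, 1)),
]
started_rust = date(2024, 3, 15)

test_1 = value_before(facts, "Alice", "preferred_language", started_rust)
test_2 = most_recent(facts, "Bob", "favorite_restaurant")
```

Here `test_1` resolves to Python and `test_2` to Japanese, matching the expected answers listed in the examples.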
The Core Question You Are Answering
“How do we objectively measure whether a memory system is effective, and what metrics capture the unique requirements of temporal knowledge graphs?”
Without benchmarks, memory system development is guesswork. Understanding how to evaluate retrieval quality, temporal reasoning, and task performance enables data-driven iteration and fair comparison.
Concepts You Must Understand First
- Information Retrieval Metrics
- How do Precision@k and Recall@k differ?
- What is Mean Reciprocal Rank (MRR)?
- What is Normalized Discounted Cumulative Gain (NDCG)?
- Book Reference: “Introduction to Information Retrieval” by Manning - Ch. 8
- Benchmark Dataset Design
- What makes a good benchmark dataset?
- How do you avoid data leakage?
- How do you ensure realistic difficulty?
- Paper Reference: DMR and LongMemEval papers
- Temporal Evaluation
- How do you measure temporal reasoning ability?
- What query types test temporal understanding?
- How do you create ground truth for temporal queries?
- Book Reference: Temporal database literature
- Statistical Significance
- How do you know if one system is truly better?
- What statistical tests apply to ranking metrics?
- How many test samples do you need?
- Book Reference: “Statistics for Machine Learning” or any ML evaluation book
- End-to-End Evaluation
- How do you measure task completion quality?
- What is the role of human evaluation?
- How do you use LLMs as evaluators?
- Paper Reference: LLM-as-Judge papers
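The three retrieval metrics listed under Information Retrieval Metrics are short enough to write out by hand, which helps when checking a library's output. The sketch below is for a single query with made-up memory IDs. MRR is the mean of `reciprocal_rank` over all queries, and the NDCG variant shown uses linear gains with a log2 discount.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(values: list[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(item, 0.0) for item in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical query: memories m1 and m4 are relevant; the system returned this order.
ranked = ["m3", "m1", "m2", "m4"]
relevant = {"m1", "m4"}

r_at_2 = recall_at_k(ranked, relevant, k=2)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, {"m1": 1.0, "m4": 1.0}, k=4)
```

Recall@2 and the reciprocal rank are both 0.5 here, while NDCG@4 is about 0.65 because it also credits the second relevant item found at rank 4.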
Questions to Guide Your Design
- Dataset Creation
- What sources will you use for test dialogues?
- How do you annotate ground truth for retrieval?
- How do you ensure diversity in test cases?
- Metric Selection
- Which metrics best capture memory system quality?
- How do you weight different metric categories?
- What thresholds indicate “good enough”?
- Temporal Benchmarks
- What temporal query types will you test?
- How do you generate temporal ground truth?
- How do you handle ambiguous temporal references?
- Reproducibility
- How do you ensure consistent LLM outputs for evaluation?
- How do you handle model updates?
- How do you share benchmarks with the community?
- Automation
- How do you run benchmarks without manual intervention?
- How do you generate reports automatically?
- How do you track performance over time?
Thinking Exercise
Design a Temporal Reasoning Benchmark
Create 5 test cases for each temporal reasoning type:
- Recency: “What is the latest X?”
- Ordering: “What happened before/after X?”
- Duration: “How long was X valid?”
- Change detection: “When did X change?”
- Point-in-time: “What was X on date Y?”
For each test case, define:
- The context (facts with timestamps)
- The query
- The expected answer
- Why this tests temporal reasoning
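For the duration, change-detection, and point-in-time types, it can help to turn dated observations into validity intervals before writing expected answers. The sketch assumes that each value stays valid until the next observation; the `Interval` type, helper names, and dates are invented for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"


def to_intervals(observations: list[tuple[date, str]]) -> list[Interval]:
    """Assume each value holds from its observation until the next one."""
    ordered = sorted(observations)
    return [
        Interval(value, start, ordered[i + 1][0] if i + 1 < len(ordered) else None)
        for i, (start, value) in enumerate(ordered)
    ]


def value_on(intervals: list[Interval], day: date) -> Optional[str]:
    """Point-in-time: which value was valid on `day`? (valid_to is exclusive)"""
    for iv in intervals:
        if iv.valid_from <= day and (iv.valid_to is None or day < iv.valid_to):
            return iv.value
    return None


def days_valid(intervals: list[Interval], value: str) -> Optional[int]:
    """Duration: how many days was `value` valid? None if still open or unknown."""
    for iv in intervals:
        if iv.value == value and iv.valid_to is not None:
            return (iv.valid_to - iv.valid_from).days
    return None


# Hypothetical context; the dates are made up for illustration.
history = to_intervals([
    (date(2024, 1, 1), "Italian"),
    (date(2024, 3, 1), "Mexican"),
    (date(2024, 9, 1), "Japanese"),
])
point_in_time = value_on(history, date(2024, 6, 15))
duration = days_valid(history, "Mexican")
change_dates = [iv.valid_from for iv in history[1:]]
```

Treating `valid_to` as exclusive avoids two values being valid on the same day, which is one of the ambiguities a temporal benchmark has to settle explicitly.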
The Interview Questions They Will Ask
- “How would you evaluate a memory system’s retrieval quality?”
- “What metrics would you use for temporal reasoning benchmarks?”
- “How do you handle subjective evaluation in memory systems?”
- “Describe the difference between Recall@k, MRR, and NDCG.”
- “How would you create a benchmark dataset for memory systems?”
- “What’s the role of LLM-as-Judge in memory evaluation?”
- “How do you ensure benchmark results are statistically significant?”
- “How would you benchmark contradiction detection ability?”
Hints in Layers
Hint 1: Start with Existing Benchmarks
Look at the DMR and LongMemEval papers. Adapt their methodology for your temporal KG context. Don’t reinvent evaluation from scratch.
Hint 2: Implement Standard IR Metrics First
Use well-tested libraries (ranx, evaluate) for Recall, MRR, NDCG. These are your baseline metrics.
Hint 3: Create Synthetic Temporal Data
Generate dialogues with controlled temporal properties. This lets you test specific temporal reasoning abilities in isolation.
Hint 4: Use LLM-as-Judge for Subjective Tasks
For tasks like “coherence” or “helpfulness,” use GPT-4 as an evaluator. Prompt it with rubrics and examples.
Hint 5: Build a Leaderboard
Create a simple web page that tracks benchmark results over time. This helps you see progress and regressions.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| IR Evaluation | “Introduction to Information Retrieval” by Manning | Ch. 8 |
| ML Evaluation | “Evaluating Machine Learning Models” by Zheng | Full book |
| Benchmark Design | MemGPT paper by Packer et al. (introduces DMR) | Methodology section |
| Statistical Testing | “Statistics for ML Engineers” | Hypothesis testing |
| LLM Evaluation | “LLM Evaluation” literature | Recent papers |
Common Pitfalls and Debugging
Problem 1: “Benchmark results are inconsistent”
- Why: LLM outputs vary; different random seeds
- Fix: Use temperature=0 for LLM calls (this reduces output variance but does not always eliminate it); set random seeds; average over multiple runs
- Quick test: Run benchmark twice; verify variance < 5%
Problem 2: “All systems score similarly”
- Why: Benchmark is too easy or metrics not discriminative
- Fix: Add harder test cases; prefer rank-sensitive metrics (NDCG, MRR) over Recall@k, which saturates once every system retrieves the relevant item somewhere in the top k
- Quick test: Check score distribution; should have clear separation
Problem 3: “Temporal benchmark has ambiguous ground truth”
- Why: Human annotators disagree on temporal interpretation
- Fix: Create clearer temporal constraints; use multiple annotators; measure inter-annotator agreement
- Quick test: Have 3 people annotate same cases; compute agreement
Problem 4: “Benchmark takes too long to run”
- Why: Too many LLM calls; large dataset
- Fix: Create “quick” vs. “full” benchmark modes; cache LLM responses and embeddings
- Quick test: “Quick” mode should complete in < 10 minutes
Problem 5: “Can’t compare systems fairly”
- Why: Different preprocessing, different context lengths
- Fix: Standardize inputs; control for context length; document all settings
- Quick test: Verify all systems see identical inputs for each test case
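Once inputs are standardized, the "clear separation" check from Problem 2 can be made quantitative with a paired bootstrap over per-query scores. This is one of several reasonable significance tests, not a prescribed one. The function name, resample count, and sample scores are assumptions.

```python
import random


def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                     n_resamples: int = 2000, seed: int = 0) -> float:
    """Estimate how often system A's mean beats B's when queries are resampled.

    Both lists must hold per-query scores for the same queries in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per query")
    rng = random.Random(seed)          # fixed seed keeps the report reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    wins = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) > 0:
            wins += 1
    return wins / n_resamples


# Made-up per-query scores, only to exercise the function.
system_a = [0.9, 0.8, 0.85, 0.7, 0.95, 0.9, 0.75, 0.8]
system_b = [0.6, 0.7, 0.65, 0.5, 0.70, 0.6, 0.55, 0.6]
win_rate = paired_bootstrap(system_a, system_b)
tie_rate = paired_bootstrap(system_a, system_a)
```

A win rate close to 1.0 suggests A is consistently better on this query set, while a value near 0.5 means the two systems are hard to tell apart and more test cases are needed.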
Definition of Done
- Retrieval benchmark with Recall, MRR, NDCG
- Temporal reasoning benchmark with 5+ query types
- Multi-hop reasoning benchmark (1-hop to 3-hop)
- End-to-end task benchmark (user preference, contradiction, coherence)
- Dataset with 500+ test cases
- Automated benchmark runner (CLI)
- HTML report generation with visualizations
- Statistical significance testing
- Comparison of 3+ memory systems
- Reproducible results (seeds, versioning)
- Documentation of benchmark methodology
Project Comparison Table
| # | Project Name | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|---|
| 1 | Personal Memory Graph CLI | Level 1: Beginner | Weekend | Foundation | ★★★☆☆ |
| 2 | Conversation Episode Store | Level 2: Intermediate | 1 week | Core Storage | ★★★☆☆ |
| 3 | Entity Extraction Pipeline | Level 2: Intermediate | 1 week | NLP Integration | ★★★★☆ |
| 4 | Entity Resolution System | Level 3: Advanced | 1-2 weeks | Deduplication | ★★★☆☆ |
| 5 | Bi-Temporal Fact Store | Level 3: Advanced | 1-2 weeks | Temporal Models | ★★★★☆ |
| 6 | Temporal Query Engine | Level 3: Advanced | 2 weeks | Query Language | ★★★★★ |
| 7 | Semantic Memory Synthesizer | Level 3: Advanced | 2 weeks | Summarization | ★★★★☆ |
| 8 | Community Detection & Summaries | Level 4: Expert | 2-3 weeks | Graph Algorithms | ★★★★★ |
| 9 | Graphiti Framework Integration | Level 4: Expert | 2-3 weeks | Production Framework | ★★★★★ |
| 10 | Mem0g Memory Layer | Level 3: Advanced | 1-2 weeks | Alternative Approach | ★★★★☆ |
| 11 | MemGPT-Style Virtual Context | Level 4: Expert | 3-4 weeks | OS-Inspired Memory | ★★★★★ |
| 12 | Hybrid Retrieval Engine | Level 4: Expert | 2-3 weeks | Search Architecture | ★★★★★ |
| 13 | Multi-Agent Shared Memory | Level 4: Expert | 3-4 weeks | Distributed Systems | ★★★★★ |
| 14 | Production Memory Service | Level 5: Master | 4+ weeks | DevOps/SRE | ★★★★☆ |
| 15 | Memory Benchmark Suite | Level 4: Expert | 2-3 weeks | Evaluation | ★★★★☆ |
Recommendation
If You Are New to Knowledge Graphs
Start with Project 1: Personal Memory Graph CLI
This project gives you hands-on experience with Neo4j and graph data modeling without overwhelming complexity. You’ll learn to think in nodes and relationships, which is foundational for everything else.
Then progress to: Project 2 (storage patterns) → Project 3 (entity extraction) → Project 5 (bi-temporal)
If You Are a Backend Developer Exploring AI Memory
Start with Project 9: Graphiti Framework Integration
You already understand databases and APIs. Graphiti gives you a production-quality framework to study. Understanding how professionals solved these problems accelerates your learning.
Then progress to: Project 10 (compare with Mem0) → Project 12 (hybrid retrieval) → Project 14 (production)
If You Want to Deeply Understand Temporal Reasoning
Start with Project 5: Bi-Temporal Fact Store
Bi-temporal modeling is the intellectual heart of temporal knowledge graphs. Master this, and the rest follows logically.
Then progress to: Project 6 (temporal queries) → Project 8 (community detection) → Project 11 (MemGPT)
If You Are Building a Multi-Agent System
Start with Project 13: Multi-Agent Shared Memory
If your immediate need is multi-agent coordination, jump to the relevant project. You can backfill foundational knowledge as needed.
Prerequisites to review first: Projects 1, 3, and basic graph concepts
If You Want the Full Journey
Follow the project order as listed (1 → 15)
The projects are sequenced to build on each other. Each project assumes knowledge from previous ones. This path takes 4-6 months but gives you comprehensive understanding.
Final Overall Project: Enterprise AI Memory Platform
The Goal
Combine the best elements from all 15 projects into a comprehensive AI memory platform that could power memory for an enterprise AI assistant deployment.
What You Will Build
A complete memory platform with:
- Multi-tenant architecture (from Project 14)
- Graphiti-style 3-tier memory (from Project 9)
- Bi-temporal fact storage (from Project 5)
- LLM-powered entity extraction (from Project 3)
- Hybrid retrieval (from Project 12)
- Multi-agent support (from Project 13)
- MemGPT-style explicit memory operations (from Project 11)
- Comprehensive benchmarking (from Project 15)
Architecture
Enterprise AI Memory Platform
==============================
┌─────────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ (Authentication, Rate Limiting, Tenant Routing, Load Balancing) │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Memory │ │ Query │ │ Admin │
│ Ingestion │ │ Service │ │ Service │
│ Service │ │ │ │ │
└──────┬───────┘ └──────┬───────┘ └──────────────┘
│ │
│ ┌─────────────┴─────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐
│ Entity │ │ Hybrid │
│ Extraction │ │ Retrieval │
│ Pipeline │ │ Engine │
└──────┬───────┘ └──────┬───────┘
│ │
│ ┌─────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────┐
│ Data Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Neo4j │ │ Vector │ │ Redis │ │
│ │ Cluster │ │ Store │ │ (Cache + │ │
│ │ (Graph) │ │ (Embeddings│ │ Pub/Sub) │ │
│ └─────────────┘ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Observability │
│ Prometheus │ Grafana │ Jaeger │ ELK Stack │
└─────────────────────────────────────────────────────────┘
Implementation Steps
Phase 1: Core Infrastructure (2 weeks)
- Set up Neo4j cluster with multi-tenancy
- Deploy vector store (Weaviate or Pinecone)
- Configure Redis for caching and pub/sub
- Create Docker Compose for local development
Phase 2: Memory Ingestion (2 weeks)
- Build episode ingestion API
- Integrate LLM-powered entity extraction
- Implement bi-temporal fact storage
- Add entity resolution pipeline
Phase 3: Retrieval Layer (2 weeks)
- Implement semantic search
- Build graph traversal queries
- Add BM25 keyword search
- Fuse the result lists with Reciprocal Rank Fusion (RRF) and diversify the final list with Maximal Marginal Relevance (MMR)
Phase 4: Agent Integration (2 weeks)
- Add MemGPT-style memory operations
- Implement multi-agent shared memory
- Build conflict resolution
- Create agent attribution tracking
Phase 5: Production Readiness (2 weeks)
- Add API gateway with auth
- Implement rate limiting and quotas
- Set up monitoring and alerting
- Create operational tooling (CLI)
Phase 6: Validation (1 week)
- Run benchmark suite
- Load test with realistic traffic
- Test failover scenarios
- Document runbooks
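The fusion step in Phase 3 is small enough to sketch here. Reciprocal Rank Fusion scores each item by summing 1 / (k + rank) over every result list it appears in. The constant k = 60 is a commonly used default, and the fact IDs below are invented. MMR diversification is left out of this sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the three Phase 3 retrievers.
semantic = ["f1", "f2", "f3"]
graph = ["f2", "f4"]
bm25 = ["f2", "f1", "f5"]

fused = reciprocal_rank_fusion([semantic, graph, bm25])
top_ids = [doc_id for doc_id, _ in fused]
```

RRF only needs ranks, not raw scores, so it can combine semantic similarity, graph traversal order, and BM25 without normalizing their very different score scales.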
Success Criteria
- Handles 1000 requests/second per tenant
- p99 latency < 500ms for retrieval
- Supports 100+ concurrent agents
- Zero cross-tenant data leaks (verified by security tests)
- 99.9% uptime over 30 days
- Temporal reasoning benchmark scores are at least 2x the baseline RAG scores
- Documentation covers all operations
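Two of these criteria translate directly into arithmetic that is worth automating. The sketch below converts the uptime target into a downtime budget and checks a p99 latency target with a nearest-rank percentile; the latency samples are made up, and the helper names are assumptions.

```python
import math


def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of downtime allowed by an availability target over a window."""
    return (1.0 - availability) * days * 24 * 60


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


budget = downtime_budget_minutes(0.999, days=30)   # about 43.2 minutes

# Made-up latency samples in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [40.0] * 98 + [300.0, 900.0]
p99 = percentile(latencies_ms, 99)
meets_latency_target = p99 < 500
```

A 99.9% target over 30 days leaves roughly 43 minutes for all outages combined, which is a useful number to keep in mind when planning failover tests and deployments.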
From Learning to Production: What Is Next
After completing these projects, here’s how your work maps to production systems:
| Your Project | Production Equivalent | Gap to Fill |
|---|---|---|
| Project 1: Personal Memory Graph | Neo4j Aura (managed Neo4j) | Schema migrations, cluster config |
| Project 3: Entity Extraction | Anthropic Claude / OpenAI | Prompt optimization, cost management |
| Project 5: Bi-Temporal Store | Apache Iceberg / Delta Lake | Distributed storage at scale |
| Project 9: Graphiti Integration | Zep Cloud (commercial Graphiti) | Managed service, SLA |
| Project 11: MemGPT Virtual Context | Letta Cloud | Managed agents, enterprise features |
| Project 12: Hybrid Retrieval | Pinecone + Neo4j + Elasticsearch | Fully managed search stack |
| Project 14: Production Service | AWS/GCP/Azure deployment | Cloud-native architecture, IAM |
| Project 15: Benchmark Suite | LangSmith / Braintrust | Commercial evaluation platforms |
Career Paths Enabled
AI/ML Engineer: Focus on Projects 3, 7, 11, 12. Build entity extraction, summarization, and retrieval systems.
Backend/Infrastructure Engineer: Focus on Projects 5, 13, 14. Build production-grade memory services with multi-tenancy.
Research Engineer: Focus on Projects 8, 11, 15. Explore community detection, virtual context, and evaluation methods.
AI Product Engineer: Focus on Projects 9, 10, 14. Integrate existing frameworks into products.
Summary
This learning path covers Temporal Knowledge Graphs for AI Agent Memory through 15 hands-on projects.
| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Personal Memory Graph CLI | Python | Level 1 | Weekend |
| 2 | Conversation Episode Store | Python | Level 2 | 1 week |
| 3 | Entity Extraction Pipeline | Python | Level 2 | 1 week |
| 4 | Entity Resolution System | Python | Level 3 | 1-2 weeks |
| 5 | Bi-Temporal Fact Store | Python | Level 3 | 1-2 weeks |
| 6 | Temporal Query Engine | Python | Level 3 | 2 weeks |
| 7 | Semantic Memory Synthesizer | Python | Level 3 | 2 weeks |
| 8 | Community Detection & Summaries | Python | Level 4 | 2-3 weeks |
| 9 | Graphiti Framework Integration | Python | Level 4 | 2-3 weeks |
| 10 | Mem0g Memory Layer | Python | Level 3 | 1-2 weeks |
| 11 | MemGPT-Style Virtual Context | Python | Level 4 | 3-4 weeks |
| 12 | Hybrid Retrieval Engine | Python | Level 4 | 2-3 weeks |
| 13 | Multi-Agent Shared Memory | Python | Level 4 | 3-4 weeks |
| 14 | Production Memory Service | Python | Level 5 | 4+ weeks |
| 15 | Memory Benchmark Suite | Python | Level 4 | 2-3 weeks |
Recommended Learning Paths
For beginners: Start with Projects 1, 2, 3, 4 → then 5, 6 → then choose a track
For backend engineers: Start with Projects 9, 10 → then 12, 13 → then 14
For ML engineers: Start with Projects 3, 7 → then 8, 11 → then 15
For full mastery: Complete all 15 projects in order (4-6 months)
Expected Outcomes
After completing these projects, you will:
- Understand graph data modeling for representing knowledge with entities, relationships, and temporal metadata
- Master bi-temporal data models that track both when facts were true and when they were recorded
- Build entity extraction pipelines using LLMs with structured output
- Implement hybrid retrieval combining semantic search, graph traversal, and keyword matching
- Evaluate temporal knowledge graphs using industry-standard benchmarks (DMR, LongMemEval)
- Compare major frameworks (Graphiti, Mem0, MemGPT) and understand their tradeoffs
- Design production memory systems with multi-tenancy, monitoring, and operational procedures
- Build multi-agent shared memory with conflict resolution and access control
You will have built 15 working projects that demonstrate deep understanding of temporal knowledge graphs for AI agent memory—from first principles to production deployment.
Additional Resources and References
Standards and Specifications
- Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/current/
- openCypher Specification: https://opencypher.org/
- JSON-LD for Knowledge Graphs: https://json-ld.org/
- RDF and SPARQL: https://www.w3.org/RDF/
Research Papers
- DMR (Deep Memory Retrieval): Evaluation task introduced in the MemGPT paper - Foundation for memory benchmarks
- LongMemEval: “LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory” - Extended memory evaluation
- MemGPT: “MemGPT: Towards LLMs as Operating Systems” - Virtual context management
- Graphiti: Zep’s technical blog posts on temporal knowledge graphs
- Mem0: Technical documentation and architecture discussions
Books
- “Designing Data-Intensive Applications” by Martin Kleppmann - Essential for understanding data systems
- “Graph Databases” by Robinson, Webber, Eifrem - Neo4j fundamentals
- “Introduction to Information Retrieval” by Manning et al. - Retrieval metrics and methods
- “AI Engineering” by Chip Huyen - Practical AI system design
- “Operating Systems: Three Easy Pieces” by Remzi and Andrea Arpaci-Dusseau - Virtual memory and paging background for MemGPT memory concepts
Framework Documentation
- Zep/Graphiti: https://docs.getzep.com/
- Mem0: https://docs.mem0.ai/
- MemGPT/Letta: https://docs.letta.com/
- LangGraph: https://langchain-ai.github.io/langgraph/
Tools and Libraries
- Neo4j: https://neo4j.com/
- FalkorDB: https://www.falkordb.com/
- LlamaIndex: https://docs.llamaindex.ai/ (for graph RAG patterns)
- NetworkX: https://networkx.org/ (graph algorithms)
- CDLib: https://cdlib.readthedocs.io/ (community detection)
Community and Discussion
- Neo4j Community: https://community.neo4j.com/
- Zep Discord: Active discussion of temporal KG patterns
- LangChain Discord: Memory architecture discussions
- Hacker News: Search for “temporal knowledge graph” and “AI memory”
Video Resources
- Neo4j YouTube Channel: Graph database tutorials
- AI Engineering World’s Fair talks: Memory system architecture
- Stanford CS224W: Machine Learning with Graphs (for graph algorithms)