Project 11: MemGPT-Style Virtual Context
Build an OS-inspired memory management system that pages context in/out of the LLM’s limited window, enabling “infinite” conversation memory.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks (30-40 hours) |
| Language | Python |
| Prerequisites | Projects 1-10, OS concepts, LLM function calling |
| Key Topics | Virtual memory, context window management, page faults, working memory, archival storage, self-editing prompts |
1. Learning Objectives
By completing this project, you will:
- Understand the virtual memory analogy for LLM context.
- Implement tiered memory (core, recall, archival).
- Build page-in/page-out mechanisms for context management.
- Create self-editing memory capabilities.
- Design autonomous memory management tools.
2. Theoretical Foundation
2.1 Core Concepts
- Virtual Memory Analogy: The LLM context window is like RAM—limited. Archival storage is like disk—unlimited, but slower to access.
- Memory Tiers:
  - Core Memory: Always in context (system prompt, persona, key facts)
  - Recall Memory: Recently accessed, pageable (recent conversations)
  - Archival Memory: Persistent storage, retrieved on demand (all history)
- Page Fault: When the model needs information that is not in the current context, it triggers a “page fault” and retrieves it from archival storage.
- Self-Editing: The model can modify its own core memory through tool calls.
- Heartbeat Loop: Continuous processing in which the model can take actions (including memory operations) even without user input.
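To make the tier ordering concrete, here is a minimal lookup sketch. It uses naive substring matching purely for illustration; in a real MemGPT-style agent the LLM itself decides when to page in via tool calls, and retrieval is semantic. All names here are ours, not from any library.

```python
def lookup(query: str, core: dict[str, str], recall: list[str],
           archival_search) -> str | None:
    # Tier 1: core memory is always in context, so it is "free" to check.
    for text in core.values():
        if query.lower() in text.lower():
            return text
    # Tier 2: recall memory holds recent, still-paged-in messages.
    for message in reversed(recall):
        if query.lower() in message.lower():
            return message
    # Tier 3: PAGE FAULT -- fall back to archival retrieval.
    results = archival_search(query, limit=1)
    return results[0] if results else None
```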
2.2 Why This Matters
LLM context windows are limited (8K-200K tokens). Without virtual memory:
- The agent forgets earlier conversation once the context fills
- There is no way to reference information from weeks ago
- There is no persistent personality or learning
MemGPT’s approach enables “infinite” memory within finite context.
2.3 Common Misconceptions
- “Just use RAG.” RAG retrieves; MemGPT actively manages and writes memory.
- “Long context solves this.” Long context is expensive (every token is paid for on every call) and still finite; virtual memory keeps only what is relevant in the window.
- “Memory is read-only.” Self-editing is core to MemGPT’s power.
2.4 ASCII Diagram: Virtual Memory Architecture
MEMGPT VIRTUAL MEMORY ARCHITECTURE
══════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────┐
│ LLM CONTEXT WINDOW │
│ (Limited: ~8K-128K tokens) │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ CORE MEMORY (Always Present) │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ System Prompt: "You are a helpful assistant..." │ │ │
│ │ ├─────────────────────────────────────────────────┤ │ │
│ │ │ Persona: "I am curious and precise..." │ │ │
│ │ ├─────────────────────────────────────────────────┤ │ │
│ │ │ Human Facts: │ │ │
│ │ │ - Name: Alice │ │ │
│ │ │ - Prefers: Python, dark mode │ │ │
│ │ │ - Working on: API redesign │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ RECALL MEMORY (Recent, Pageable) │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Recent Messages: │ │ │
│ │ │ [2 mins ago] User: "How's the API work?" │ │ │
│ │ │ [1 min ago] Assistant: "Let me explain..." │ │ │
│ │ │ [now] User: "What about authentication?" │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ AVAILABLE SPACE (for new messages/retrieval) │ │
│ │ [~~~~~~~~~~~~ ~2K tokens remaining ~~~~~~~~~~~~] │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│ ▲
│ PAGE OUT │ PAGE IN
│ (when full) │ (on demand)
▼ │
┌─────────────────────────────────────────────────────────────┐
│ ARCHIVAL MEMORY │
│ (Unlimited: Vector + Graph Store) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Conversation History (All episodes) │ │
│ │ - Episode 001: "Introduction conversation..." │ │
│ │ - Episode 002: "Discussed Python preferences..." │ │
│ │ - Episode 003: "API design discussion..." │ │
│ │ - ... (thousands of episodes) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Knowledge Graph (Entities + Relationships) │ │
│ │ - (Alice)-[:WORKS_ON]->(API_Redesign) │ │
│ │ - (Alice)-[:PREFERS]->(Python) │ │
│ │ - ... (all extracted facts) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
MEMORY OPERATIONS (Tool Calls)
══════════════════════════════
The LLM can call these tools to manage its own memory:
┌─────────────────────────────────────────────────────────────┐
│ core_memory_append(section, content) │
│ - Add fact to core memory │
│ - Example: core_memory_append("human", "Alice's birthday │
│ is March 15") │
├─────────────────────────────────────────────────────────────┤
│ core_memory_replace(section, old_content, new_content) │
│ - Update fact in core memory │
│ - Example: Replace "Working on: API redesign" with │
│ "Working on: Mobile app" │
├─────────────────────────────────────────────────────────────┤
│ archival_memory_insert(content) │
│ - Store information for long-term │
│ - Example: Save detailed meeting notes │
├─────────────────────────────────────────────────────────────┤
│ archival_memory_search(query, limit) │
│ - Retrieve relevant information (PAGE IN) │
│ - Example: "What did we discuss about authentication?" │
├─────────────────────────────────────────────────────────────┤
│ conversation_search(query, limit) │
│ - Search recent conversation history │
│ - Example: "What did the user say about deadlines?" │
└─────────────────────────────────────────────────────────────┘
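In practice, these operations are exposed to the model as function-calling schemas. Below is a hedged sketch of what `core_memory_append` might look like in the OpenAI tools format; the description strings are illustrative, not MemGPT's actual wording.

```python
# One possible schema for core_memory_append, in the OpenAI
# function-calling format. The structure follows the OpenAI tools
# spec; the descriptions are our own phrasing.
CORE_MEMORY_APPEND_TOOL = {
    "type": "function",
    "function": {
        "name": "core_memory_append",
        "description": "Append a fact to a section of core memory. "
                       "Use for durable facts worth keeping in context.",
        "parameters": {
            "type": "object",
            "properties": {
                "section": {
                    "type": "string",
                    "enum": ["persona", "human", "system"],
                    "description": "Which core memory section to edit.",
                },
                "content": {
                    "type": "string",
                    "description": "The fact to append.",
                },
            },
            "required": ["section", "content"],
        },
    },
}
```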
PAGE FAULT EXAMPLE
══════════════════
User: "What was that library you recommended last month?"
LLM Internal Process:
┌─────────────────────────────────────────────────────────────┐
│ 1. Check core memory → Not found │
│ 2. Check recall memory → Not found (too old) │
│ 3. PAGE FAULT! Need archival retrieval │
│ 4. Call: archival_memory_search("library recommendation") │
│ 5. Result: "On Nov 15, recommended 'FastAPI' for REST APIs" │
│ 6. Incorporate into response │
└─────────────────────────────────────────────────────────────┘
Response: "Last month I recommended FastAPI for building
REST APIs. It's great for your use case because..."
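From the agent loop's perspective, the page-fault path above is an ordinary tool-call round trip. A minimal sketch, assuming an OpenAI-style client and a hypothetical `tools` registry mapping tool names to Python callables:

```python
import json

MAX_TOOL_ROUNDS = 5  # guard against tool-call loops (see Section 7)

def run_turn(client, model: str, messages: list[dict],
             tool_schemas: list[dict], tools: dict) -> str:
    for _ in range(MAX_TOOL_ROUNDS):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tool_schemas)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # ordinary reply, no page fault
        messages.append(msg)  # keep the assistant's tool request in history
        for call in msg.tool_calls:
            # PAGE IN: e.g. the model calls archival_memory_search(...)
            args = json.loads(call.function.arguments)
            result = tools[call.function.name](**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "Tool-call limit reached; please try rephrasing."
```

The round cap is the recursion limit recommended in the pitfalls table in Section 7.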
3. Project Specification
3.1 What You Will Build
A Python framework that:
- Implements tiered memory (core, recall, archival)
- Provides memory management tools for the LLM
- Handles automatic context overflow
- Supports self-editing core memory
3.2 Functional Requirements
- Initialize agent: `agent = MemGPTAgent(persona, core_memory)`
- Process message: `agent.step(user_message)` → response + tool calls
- Core memory edit: `agent.tools.core_memory_append(section, content)`
- Archival search: `agent.tools.archival_memory_search(query, limit)`
- Context management: automatic page-out when the context is full
- Heartbeat: `agent.heartbeat()` → allow autonomous actions
3.3 Example Usage / Output
```python
from memgpt_agent import MemGPTAgent, Persona

# Initialize agent
persona = Persona(
    name="Claude",
    description="A helpful assistant with excellent memory",
    traits=["curious", "precise", "friendly"],
)
agent = MemGPTAgent(
    persona=persona,
    archival_store=archival_db,
    llm_client=llm_client,
)

# First interaction
response = agent.step("Hi! I'm Alice, I'm a Python developer.")
print(response.message)
# "Hello Alice! Nice to meet you. I see you're a Python developer -
#  that's great! What kind of projects do you work on?"
# Agent internally called:
#   core_memory_append("human", "Name: Alice, Occupation: Python developer")

# Later interaction (context may have overflowed)
response = agent.step("What's my name again?")
print(response.message)
# "Your name is Alice! You mentioned you're a Python developer."
# (Retrieved from core memory, no archival needed)

# Much later interaction (needs archival)
response = agent.step("What library did you recommend back in our first chat about APIs?")
print(response.message)
# "Let me check my archives..."
# (Agent calls archival_memory_search("library API recommendation first chat"))
# "In our early conversations, I recommended FastAPI for building REST APIs!"

# View agent's core memory
print(agent.core_memory)
# {
#   "persona": "I am Claude, a helpful assistant...",
#   "human": "Name: Alice\nOccupation: Python developer\nPrefers: FastAPI\n..."
# }

# Manual archival insertion
agent.step("Here's the meeting notes from today: [detailed notes]")
# Agent internally calls: archival_memory_insert("Meeting notes 2024-12-15: ...")
```
4. Solution Architecture
4.1 High-Level Design
┌─────────────────┐
│ User Input │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MEMGPT AGENT │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Context │────▶│ LLM │────▶│ Tool │ │
│ │ Builder │ │ Client │ │ Executor │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Core │ │ │ Archival │ │
│ │ Memory │◀───────────┼──────────│ Store │ │
│ └─────────────┘ │ └─────────────┘ │
│ │ │ │ │
│ └───────────────────┴───────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Response │
└─────────────────┘
4.2 Key Components
| Component | Responsibility | Technology |
|---|---|---|
| ContextBuilder | Assemble context from tiers | Python class |
| LLMClient | Call LLM with tools | OpenAI/Anthropic |
| ToolExecutor | Execute memory tools | Function dispatch |
| CoreMemory | In-context persistent facts | Python dict |
| RecallMemory | Recent conversation buffer | Circular buffer |
| ArchivalStore | Long-term storage | Vector DB + Graph |
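To make the ContextBuilder's job concrete, here is a hedged sketch that assembles a prompt from the tiers within a token budget. It assumes the `CoreMemory` and `Message` models defined in 4.3 below; the tagged section layout is our convention, not a requirement.

```python
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def build_context(core: "CoreMemory", recall: list["Message"],
                  budget: int) -> list[dict]:
    """Assemble the prompt: core memory first (always present), then
    as many recent messages as fit in the remaining token budget."""
    system = (f"{core.system}\n\n<persona>\n{core.persona}\n</persona>\n"
              f"<human>\n{core.human}\n</human>")
    used = count_tokens(system)
    context: list[dict] = []
    # Walk recall newest-to-oldest; stop when the budget is exhausted.
    for msg in reversed(recall):
        cost = count_tokens(msg.content)
        if used + cost > budget:
            break  # older messages stay paged out (reachable via archival)
        context.insert(0, {"role": msg.role, "content": msg.content})
        used += cost
    return [{"role": "system", "content": system}] + context
```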
4.3 Data Models
```python
from pydantic import BaseModel
from typing import Literal

class CoreMemory(BaseModel):
    persona: str  # Agent's self-description
    human: str    # Facts about the user
    system: str   # Additional system context

class Message(BaseModel):
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    tool_calls: list[dict] | None = None
    tool_call_id: str | None = None

class AgentResponse(BaseModel):
    message: str
    tool_calls: list[dict]
    memory_operations: list[str]  # Log of memory ops
    context_usage: int            # Tokens used

class MemoryTool(BaseModel):
    name: str
    description: str
    parameters: dict
```
5. Implementation Guide
5.1 Development Environment Setup
```bash
mkdir memgpt-agent && cd memgpt-agent
python -m venv .venv && source .venv/bin/activate
pip install openai chromadb pydantic tiktoken
```
5.2 Project Structure
```
memgpt-agent/
├── src/
│   ├── agent.py             # Main MemGPTAgent class
│   ├── context.py           # Context building
│   ├── memory/
│   │   ├── core.py          # Core memory management
│   │   ├── recall.py        # Recall buffer
│   │   └── archival.py      # Archival storage
│   ├── tools/
│   │   ├── definitions.py   # Tool schemas
│   │   └── executor.py      # Tool execution
│   └── models.py            # Data models
├── tests/
│   ├── test_memory.py
│   └── test_agent.py
└── README.md
```
5.3 Implementation Phases
Phase 1: Core Memory + Basic Agent (10-12h)
Goals:
- Agent with core memory always in context
- Basic conversation handling
- Core memory tools working
Tasks:
- Build CoreMemory class with append/replace
- Implement context builder with core memory injection
- Create basic agent loop
- Add core_memory_append and core_memory_replace tools
Checkpoint: Agent remembers facts added to core memory.
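A minimal sketch of the two Phase 1 self-editing operations, using a plain dict-backed store (class and method names are ours):

```python
class CoreMemoryStore:
    """Dict-backed core memory with the two self-editing operations."""

    def __init__(self, persona: str = "", human: str = "", system: str = ""):
        self.sections = {"persona": persona, "human": human, "system": system}

    def append(self, section: str, content: str) -> str:
        if section not in self.sections:
            raise ValueError(f"unknown section: {section}")
        self.sections[section] = (self.sections[section] + "\n" + content).strip()
        return f"Appended to {section}."

    def replace(self, section: str, old_content: str, new_content: str) -> str:
        if old_content not in self.sections.get(section, ""):
            # Surface the miss so the model can retry with the exact text.
            return f"'{old_content}' not found in {section}."
        self.sections[section] = self.sections[section].replace(
            old_content, new_content)
        return f"Updated {section}."
```

Returning short status strings instead of raising lets the result be fed straight back to the model as a tool response.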
Phase 2: Archival Storage + Retrieval (10-12h)
Goals:
- Archival memory storage working
- Search and retrieval implemented
- Page fault handling
Tasks:
- Set up vector store for archival
- Implement archival_memory_insert tool
- Implement archival_memory_search tool
- Add automatic page fault detection
Checkpoint: Agent can store and retrieve from archival.
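For the archival tier, the Chroma package installed in 5.1 is sufficient. A minimal sketch, relying on Chroma's default embedding function (class and method names are ours):

```python
import uuid
import chromadb

class ArchivalStore:
    """Archival memory backed by a persistent local Chroma collection."""

    def __init__(self, path: str = "./archival_db"):
        self.client = chromadb.PersistentClient(path=path)
        self.collection = self.client.get_or_create_collection("archival")

    def insert(self, content: str) -> str:
        # Chroma requires unique IDs; a random suffix avoids collisions.
        doc_id = f"mem-{uuid.uuid4().hex[:8]}"
        self.collection.add(documents=[content], ids=[doc_id])
        return f"Stored as {doc_id}."

    def search(self, query: str, limit: int = 5) -> list[str]:
        # This is the PAGE IN path: semantic search over all history.
        results = self.collection.query(query_texts=[query], n_results=limit)
        return results["documents"][0] if results["documents"] else []
```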
Phase 3: Context Management + Polish (8-10h)
Goals:
- Automatic context overflow handling
- Recall memory buffer
- Heartbeat loop
Tasks:
- Implement token counting
- Add automatic page-out when context full
- Build recall memory circular buffer
- Add heartbeat mechanism
Checkpoint: Full MemGPT-style memory management.
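The page-out path ties the phases together. A hedged sketch that evicts the oldest recall messages into archival once the budget is exceeded, reusing `count_tokens` from the Section 4.2 sketch and the `ArchivalStore` above:

```python
def page_out_if_needed(recall: list[dict], archival: "ArchivalStore",
                       budget: int, low_water: float = 0.7) -> None:
    """When recall memory exceeds the token budget, evict the oldest
    messages to archival until usage drops below low_water * budget."""
    def usage() -> int:
        return sum(count_tokens(m["content"]) for m in recall)

    if usage() <= budget:
        return  # still fits; nothing to do
    while recall and usage() > low_water * budget:
        oldest = recall.pop(0)  # FIFO eviction: oldest message first
        archival.insert(f"[{oldest['role']}] {oldest['content']}")
```

The intermediate-extension variant (Section 8.2) would summarize the evicted span before inserting it, rather than storing messages verbatim.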
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Test memory operations | Core memory append |
| Integration | Test full agent loop | Message → response with tools |
| Long-running | Test context management | 100+ message conversation |
6.2 Critical Test Cases
- Core memory persistence: Facts survive across messages
- Archival retrieval: Old information retrievable
- Context overflow: Graceful handling when full
- Self-editing: Agent updates its own memory
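Two of these cases as pytest sketches, assuming the `CoreMemoryStore` and `ArchivalStore` classes sketched in Section 5.3:

```python
def test_core_memory_append_persists():
    core = CoreMemoryStore(human="Name: Alice")
    core.append("human", "Prefers: Python")
    assert "Prefers: Python" in core.sections["human"]

def test_archival_roundtrip(tmp_path):
    store = ArchivalStore(path=str(tmp_path / "db"))
    store.insert("Recommended FastAPI for REST APIs")
    hits = store.search("library recommendation", limit=1)
    assert any("FastAPI" in h for h in hits)
```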
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Token counting wrong | Context overflows | Use tiktoken for accurate counting |
| Tool call loop | Agent keeps calling tools | Add recursion limit |
| Memory inconsistency | Facts contradict | Version core memory edits |
| Lost context | Agent forgets mid-conversation | Check recall buffer size |
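For the memory-inconsistency row, "version core memory edits" can be as simple as snapshotting a section before every write. A sketch extending the Phase 1 `CoreMemoryStore`:

```python
import time

class VersionedCoreMemory(CoreMemoryStore):
    """Logs every edit so contradictory facts can be traced and rolled back."""

    def __init__(self, **sections):
        super().__init__(**sections)
        # (timestamp, section, snapshot-before-edit) tuples, append-only.
        self.history: list[tuple[float, str, str]] = []

    def append(self, section: str, content: str) -> str:
        self.history.append((time.time(), section, self.sections.get(section, "")))
        return super().append(section, content)

    def replace(self, section: str, old_content: str, new_content: str) -> str:
        self.history.append((time.time(), section, self.sections.get(section, "")))
        return super().replace(section, old_content, new_content)
```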
8. Extensions & Challenges
8.1 Beginner Extensions
- Add memory visualization dashboard
- Implement memory export/import
8.2 Intermediate Extensions
- Add memory summarization on page-out
- Implement multi-agent memory sharing
8.3 Advanced Extensions
- Add planning/reflection loops
- Implement hierarchical memory
9. Real-World Connections
9.1 Industry Applications
- MemGPT/Letta: The original implementation
- Long-running Agents: Autonomous AI systems
- Personal Assistants: Persistent memory across sessions
9.2 Interview Relevance
- Explain virtual memory analogy for LLMs
- Discuss context window management strategies
- Describe self-editing memory tradeoffs
10. Resources
10.1 Essential Reading
- MemGPT Paper — “MemGPT: Towards LLMs as Operating Systems”
- Letta Documentation — https://docs.letta.com
- “AI Engineering” by Chip Huyen — Ch. on Memory
10.2 Related Projects
- Previous: Project 10 (Mem0g Memory Layer)
- Next: Project 12 (Hybrid Retrieval Engine)
11. Self-Assessment Checklist
- I understand the virtual memory analogy
- I can implement tiered memory storage
- I know how to handle context overflow
- I understand self-editing memory tradeoffs
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core memory with append/replace tools
- Basic archival storage and search
- Agent loop with tool execution
Full Completion:
- Automatic context management
- Recall buffer for recent messages
- Heartbeat loop implemented
Excellence:
- Memory summarization
- Multi-session persistence
- Production-ready error handling