Project 11: MemGPT-Style Virtual Context
Build an OS-inspired memory management system that pages context in/out of the LLM’s limited window, enabling “infinite” conversation memory.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks (30-40 hours) |
| Language | Python |
| Prerequisites | Projects 1-10, OS concepts, LLM function calling |
| Key Topics | Virtual memory, context window management, page faults, working memory, archival storage, self-editing prompts |
1. Learning Objectives
By completing this project, you will:
- Understand the virtual memory analogy for LLM context.
- Implement tiered memory (core, recall, archival).
- Build page-in/page-out mechanisms for context management.
- Create self-editing memory capabilities.
- Design autonomous memory management tools.
2. Theoretical Foundation
2.1 Core Concepts
- Virtual Memory Analogy: The LLM context window is like RAM—limited. Archival storage is like disk—unlimited, but slower to access.
- Memory Tiers:
  - Core Memory: Always in context (system prompt, persona, key facts)
  - Recall Memory: Recently accessed, pageable (recent conversations)
  - Archival Memory: Persistent storage, retrieved on demand (all history)
- Page Fault: When the model needs information that is not in the current context, it triggers a “page fault” and retrieves it from archival storage.
- Self-Editing: The model can modify its own core memory through tool calls.
- Heartbeat Loop: Continuous processing in which the model can take actions (including memory operations) even without user input.
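To make the tier ordering concrete, here is a minimal lookup sketch. It uses naive substring matching purely for illustration; in a real MemGPT-style agent the LLM itself decides when to page in via tool calls, and retrieval is semantic. All names here are ours, not from any library.

```python
def lookup(query: str, core: dict[str, str], recall: list[str],
           archival_search) -> str | None:
    # Tier 1: core memory is always in context, so it is "free" to check.
    for text in core.values():
        if query.lower() in text.lower():
            return text
    # Tier 2: recall memory holds recent, still-paged-in messages.
    for message in reversed(recall):
        if query.lower() in message.lower():
            return message
    # Tier 3: PAGE FAULT -- fall back to archival retrieval.
    results = archival_search(query, limit=1)
    return results[0] if results else None
```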
2.2 Why This Matters
LLM context windows are limited (8K-200K tokens). Without virtual memory:
- The agent forgets earlier conversation once the context fills
- There is no way to reference information from weeks ago
- There is no persistent personality or learning
MemGPT’s approach enables “infinite” memory within finite context.
2.3 Common Misconceptions
- “Just use RAG.” RAG retrieves; MemGPT actively manages and writes memory.
- “Long context solves this.” Long context is expensive (every token is paid for on every call) and still finite; virtual memory keeps only what is relevant in the window.
- “Memory is read-only.” Self-editing is core to MemGPT’s power.
2.4 ASCII Diagram: Virtual Memory Architecture
MEMGPT VIRTUAL MEMORY ARCHITECTURE
══════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────┐
│ LLM CONTEXT WINDOW │
│ (Limited: ~8K-128K tokens) │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ CORE MEMORY (Always Present) │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ System Prompt: "You are a helpful assistant..." │ │ │
│ │ ├─────────────────────────────────────────────────┤ │ │
│ │ │ Persona: "I am curious and precise..." │ │ │
│ │ ├─────────────────────────────────────────────────┤ │ │
│ │ │ Human Facts: │ │ │
│ │ │ - Name: Alice │ │ │
│ │ │ - Prefers: Python, dark mode │ │ │
│ │ │ - Working on: API redesign │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ RECALL MEMORY (Recent, Pageable) │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Recent Messages: │ │ │
│ │ │ [2 mins ago] User: "How's the API work?" │ │ │
│ │ │ [1 min ago] Assistant: "Let me explain..." │ │ │
│ │ │ [now] User: "What about authentication?" │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ AVAILABLE SPACE (for new messages/retrieval) │ │
│ │ [~~~~~~~~~~~~ ~2K tokens remaining ~~~~~~~~~~~~] │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│ ▲
│ PAGE OUT │ PAGE IN
│ (when full) │ (on demand)
▼ │
┌─────────────────────────────────────────────────────────────┐
│ ARCHIVAL MEMORY │
│ (Unlimited: Vector + Graph Store) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Conversation History (All episodes) │ │
│ │ - Episode 001: "Introduction conversation..." │ │
│ │ - Episode 002: "Discussed Python preferences..." │ │
│ │ - Episode 003: "API design discussion..." │ │
│ │ - ... (thousands of episodes) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Knowledge Graph (Entities + Relationships) │ │
│ │ - (Alice)-[:WORKS_ON]->(API_Redesign) │ │
│ │ - (Alice)-[:PREFERS]->(Python) │ │
│ │ - ... (all extracted facts) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
MEMORY OPERATIONS (Tool Calls)
══════════════════════════════
The LLM can call these tools to manage its own memory:
┌─────────────────────────────────────────────────────────────┐
│ core_memory_append(section, content) │
│ - Add fact to core memory │
│ - Example: core_memory_append("human", "Alice's birthday │
│ is March 15") │
├─────────────────────────────────────────────────────────────┤
│ core_memory_replace(section, old_content, new_content) │
│ - Update fact in core memory │
│ - Example: Replace "Working on: API redesign" with │
│ "Working on: Mobile app" │
├─────────────────────────────────────────────────────────────┤
│ archival_memory_insert(content) │
│ - Store information for long-term │
│ - Example: Save detailed meeting notes │
├─────────────────────────────────────────────────────────────┤
│ archival_memory_search(query, limit) │
│ - Retrieve relevant information (PAGE IN) │
│ - Example: "What did we discuss about authentication?" │
├─────────────────────────────────────────────────────────────┤
│ conversation_search(query, limit) │
│ - Search recent conversation history │
│ - Example: "What did the user say about deadlines?" │
└─────────────────────────────────────────────────────────────┘
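In practice, these operations are exposed to the model as function-calling schemas. Below is a hedged sketch of what `core_memory_append` might look like in the OpenAI tools format; the description strings are illustrative, not MemGPT's actual wording.

```python
# One possible schema for core_memory_append, in the OpenAI
# function-calling format. The structure follows the OpenAI tools
# spec; the descriptions are our own phrasing.
CORE_MEMORY_APPEND_TOOL = {
    "type": "function",
    "function": {
        "name": "core_memory_append",
        "description": "Append a fact to a section of core memory. "
                       "Use for durable facts worth keeping in context.",
        "parameters": {
            "type": "object",
            "properties": {
                "section": {
                    "type": "string",
                    "enum": ["persona", "human", "system"],
                    "description": "Which core memory section to edit.",
                },
                "content": {
                    "type": "string",
                    "description": "The fact to append.",
                },
            },
            "required": ["section", "content"],
        },
    },
}
```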
PAGE FAULT EXAMPLE
══════════════════
User: "What was that library you recommended last month?"
LLM Internal Process:
┌─────────────────────────────────────────────────────────────┐
│ 1. Check core memory → Not found │
│ 2. Check recall memory → Not found (too old) │
│ 3. PAGE FAULT! Need archival retrieval │
│ 4. Call: archival_memory_search("library recommendation") │
│ 5. Result: "On Nov 15, recommended 'FastAPI' for REST APIs" │
│ 6. Incorporate into response │
└─────────────────────────────────────────────────────────────┘
Response: "Last month I recommended FastAPI for building
REST APIs. It's great for your use case because..."
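From the agent loop's perspective, the page-fault path above is an ordinary tool-call round trip. A minimal sketch, assuming an OpenAI-style client and a hypothetical `tools` registry mapping tool names to Python callables:

```python
import json

MAX_TOOL_ROUNDS = 5  # guard against tool-call loops (see Section 7)

def run_turn(client, model: str, messages: list[dict],
             tool_schemas: list[dict], tools: dict) -> str:
    for _ in range(MAX_TOOL_ROUNDS):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tool_schemas)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # ordinary reply, no page fault
        messages.append(msg)  # keep the assistant's tool request in history
        for call in msg.tool_calls:
            # PAGE IN: e.g. the model calls archival_memory_search(...)
            args = json.loads(call.function.arguments)
            result = tools[call.function.name](**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "Tool-call limit reached; please try rephrasing."
```

The round cap is the recursion limit recommended in the pitfalls table in Section 7.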
3. Project Specification
3.1 What You Will Build
A Python framework that:
- Implements tiered memory (core, recall, archival)
- Provides memory management tools for the LLM
- Handles automatic context overflow
- Supports self-editing core memory
3.2 Functional Requirements
- Initialize agent: `agent = MemGPTAgent(persona, core_memory)`
- Process message: `agent.step(user_message)` → response + tool calls
- Core memory edit: `agent.tools.core_memory_append(section, content)`
- Archival search: `agent.tools.archival_memory_search(query, limit)`
- Context management: automatic page-out when the context is full
- Heartbeat: `agent.heartbeat()` → allow autonomous actions
3.3 Example Usage / Output
```python
from memgpt_agent import MemGPTAgent, Persona

# Initialize agent
persona = Persona(
    name="Claude",
    description="A helpful assistant with excellent memory",
    traits=["curious", "precise", "friendly"],
)
agent = MemGPTAgent(
    persona=persona,
    archival_store=archival_db,
    llm_client=llm_client,
)

# First interaction
response = agent.step("Hi! I'm Alice, I'm a Python developer.")
print(response.message)
# "Hello Alice! Nice to meet you. I see you're a Python developer -
#  that's great! What kind of projects do you work on?"
# Agent internally called:
#   core_memory_append("human", "Name: Alice, Occupation: Python developer")

# Later interaction (context may have overflowed)
response = agent.step("What's my name again?")
print(response.message)
# "Your name is Alice! You mentioned you're a Python developer."
# (Retrieved from core memory, no archival needed)

# Much later interaction (needs archival)
response = agent.step("What library did you recommend back in our first chat about APIs?")
print(response.message)
# "Let me check my archives..."
# (Agent calls archival_memory_search("library API recommendation first chat"))
# "In our early conversations, I recommended FastAPI for building REST APIs!"

# View agent's core memory
print(agent.core_memory)
# {
#   "persona": "I am Claude, a helpful assistant...",
#   "human": "Name: Alice\nOccupation: Python developer\nPrefers: FastAPI\n..."
# }

# Manual archival insertion
agent.step("Here's the meeting notes from today: [detailed notes]")
# Agent internally calls: archival_memory_insert("Meeting notes 2024-12-15: ...")
```
4. Solution Architecture
4.1 High-Level Design
┌─────────────────┐
│ User Input │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MEMGPT AGENT │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Context │────▶│ LLM │────▶│ Tool │ │
│ │ Builder │ │ Client │ │ Executor │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Core │ │ │ Archival │ │
│ │ Memory │◀───────────┼──────────│ Store │ │
│ └─────────────┘ │ └─────────────┘ │
│ │ │ │ │
│ └───────────────────┴───────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Response │
└─────────────────┘
4.2 Key Components
| Component | Responsibility | Technology |
|---|---|---|
| ContextBuilder | Assemble context from tiers | Python class |
| LLMClient | Call LLM with tools | OpenAI/Anthropic |
| ToolExecutor | Execute memory tools | Function dispatch |
| CoreMemory | In-context persistent facts | Python dict |
| RecallMemory | Recent conversation buffer | Circular buffer |
| ArchivalStore | Long-term storage | Vector DB + Graph |
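To make the ContextBuilder's job concrete, here is a hedged sketch that assembles a prompt from the tiers within a token budget. It assumes the `CoreMemory` and `Message` models defined in 4.3 below; the tagged section layout is our convention, not a requirement.

```python
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def build_context(core: "CoreMemory", recall: list["Message"],
                  budget: int) -> list[dict]:
    """Assemble the prompt: core memory first (always present), then
    as many recent messages as fit in the remaining token budget."""
    system = (f"{core.system}\n\n<persona>\n{core.persona}\n</persona>\n"
              f"<human>\n{core.human}\n</human>")
    used = count_tokens(system)
    context: list[dict] = []
    # Walk recall newest-to-oldest; stop when the budget is exhausted.
    for msg in reversed(recall):
        cost = count_tokens(msg.content)
        if used + cost > budget:
            break  # older messages stay paged out (reachable via archival)
        context.insert(0, {"role": msg.role, "content": msg.content})
        used += cost
    return [{"role": "system", "content": system}] + context
```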
4.3 Data Models
```python
from pydantic import BaseModel
from typing import Literal

class CoreMemory(BaseModel):
    persona: str  # Agent's self-description
    human: str    # Facts about the user
    system: str   # Additional system context

class Message(BaseModel):
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    tool_calls: list[dict] | None = None
    tool_call_id: str | None = None

class AgentResponse(BaseModel):
    message: str
    tool_calls: list[dict]
    memory_operations: list[str]  # Log of memory ops
    context_usage: int            # Tokens used

class MemoryTool(BaseModel):
    name: str
    description: str
    parameters: dict
```
5. Implementation Guide
5.1 Development Environment Setup
```bash
mkdir memgpt-agent && cd memgpt-agent
python -m venv .venv && source .venv/bin/activate
pip install openai chromadb pydantic tiktoken
```
5.2 Project Structure
```
memgpt-agent/
├── src/
│   ├── agent.py             # Main MemGPTAgent class
│   ├── context.py           # Context building
│   ├── memory/
│   │   ├── core.py          # Core memory management
│   │   ├── recall.py        # Recall buffer
│   │   └── archival.py      # Archival storage
│   ├── tools/
│   │   ├── definitions.py   # Tool schemas
│   │   └── executor.py      # Tool execution
│   └── models.py            # Data models
├── tests/
│   ├── test_memory.py
│   └── test_agent.py
└── README.md
```
5.3 Implementation Phases
Phase 1: Core Memory + Basic Agent (10-12h)
Goals:
- Agent with core memory always in context
- Basic conversation handling
- Core memory tools working
Tasks:
- Build CoreMemory class with append/replace
- Implement context builder with core memory injection
- Create basic agent loop
- Add core_memory_append and core_memory_replace tools
Checkpoint: Agent remembers facts added to core memory.
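A minimal sketch of the two Phase 1 self-editing operations, using a plain dict-backed store (class and method names are ours):

```python
class CoreMemoryStore:
    """Dict-backed core memory with the two self-editing operations."""

    def __init__(self, persona: str = "", human: str = "", system: str = ""):
        self.sections = {"persona": persona, "human": human, "system": system}

    def append(self, section: str, content: str) -> str:
        if section not in self.sections:
            raise ValueError(f"unknown section: {section}")
        self.sections[section] = (self.sections[section] + "\n" + content).strip()
        return f"Appended to {section}."

    def replace(self, section: str, old_content: str, new_content: str) -> str:
        if old_content not in self.sections.get(section, ""):
            # Surface the miss so the model can retry with the exact text.
            return f"'{old_content}' not found in {section}."
        self.sections[section] = self.sections[section].replace(
            old_content, new_content)
        return f"Updated {section}."
```

Returning short status strings instead of raising lets the result be fed straight back to the model as a tool response.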
Phase 2: Archival Storage + Retrieval (10-12h)
Goals:
- Archival memory storage working
- Search and retrieval implemented
- Page fault handling
Tasks:
- Set up vector store for archival
- Implement archival_memory_insert tool
- Implement archival_memory_search tool
- Add automatic page fault detection
Checkpoint: Agent can store and retrieve from archival.
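For the archival tier, the Chroma package installed in 5.1 is sufficient. A minimal sketch, relying on Chroma's default embedding function (class and method names are ours):

```python
import uuid
import chromadb

class ArchivalStore:
    """Archival memory backed by a persistent local Chroma collection."""

    def __init__(self, path: str = "./archival_db"):
        self.client = chromadb.PersistentClient(path=path)
        self.collection = self.client.get_or_create_collection("archival")

    def insert(self, content: str) -> str:
        # Chroma requires unique IDs; a random suffix avoids collisions.
        doc_id = f"mem-{uuid.uuid4().hex[:8]}"
        self.collection.add(documents=[content], ids=[doc_id])
        return f"Stored as {doc_id}."

    def search(self, query: str, limit: int = 5) -> list[str]:
        # This is the PAGE IN path: semantic search over all history.
        results = self.collection.query(query_texts=[query], n_results=limit)
        return results["documents"][0] if results["documents"] else []
```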
Phase 3: Context Management + Polish (8-10h)
Goals:
- Automatic context overflow handling
- Recall memory buffer
- Heartbeat loop
Tasks:
- Implement token counting
- Add automatic page-out when context full
- Build recall memory circular buffer
- Add heartbeat mechanism
Checkpoint: Full MemGPT-style memory management.
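The page-out path ties the phases together. A hedged sketch that evicts the oldest recall messages into archival once the budget is exceeded, reusing `count_tokens` from the Section 4.2 sketch and the `ArchivalStore` above:

```python
def page_out_if_needed(recall: list[dict], archival: "ArchivalStore",
                       budget: int, low_water: float = 0.7) -> None:
    """When recall memory exceeds the token budget, evict the oldest
    messages to archival until usage drops below low_water * budget."""
    def usage() -> int:
        return sum(count_tokens(m["content"]) for m in recall)

    if usage() <= budget:
        return  # still fits; nothing to do
    while recall and usage() > low_water * budget:
        oldest = recall.pop(0)  # FIFO eviction: oldest message first
        archival.insert(f"[{oldest['role']}] {oldest['content']}")
```

The intermediate-extension variant (Section 8.2) would summarize the evicted span before inserting it, rather than storing messages verbatim.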
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Test memory operations | Core memory append |
| Integration | Test full agent loop | Message → response with tools |
| Long-running | Test context management | 100+ message conversation |
6.2 Critical Test Cases
- Core memory persistence: Facts survive across messages
- Archival retrieval: Old information retrievable
- Context overflow: Graceful handling when full
- Self-editing: Agent updates its own memory
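Two of these cases as pytest sketches, assuming the `CoreMemoryStore` and `ArchivalStore` classes sketched in Section 5.3:

```python
def test_core_memory_append_persists():
    core = CoreMemoryStore(human="Name: Alice")
    core.append("human", "Prefers: Python")
    assert "Prefers: Python" in core.sections["human"]

def test_archival_roundtrip(tmp_path):
    store = ArchivalStore(path=str(tmp_path / "db"))
    store.insert("Recommended FastAPI for REST APIs")
    hits = store.search("library recommendation", limit=1)
    assert any("FastAPI" in h for h in hits)
```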
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Token counting wrong | Context overflows | Use tiktoken for accurate counting |
| Tool call loop | Agent keeps calling tools | Add recursion limit |
| Memory inconsistency | Facts contradict | Version core memory edits |
| Lost context | Agent forgets mid-conversation | Check recall buffer size |
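For the memory-inconsistency row, "version core memory edits" can be as simple as snapshotting a section before every write. A sketch extending the Phase 1 `CoreMemoryStore`:

```python
import time

class VersionedCoreMemory(CoreMemoryStore):
    """Logs every edit so contradictory facts can be traced and rolled back."""

    def __init__(self, **sections):
        super().__init__(**sections)
        # (timestamp, section, snapshot-before-edit) tuples, append-only.
        self.history: list[tuple[float, str, str]] = []

    def append(self, section: str, content: str) -> str:
        self.history.append((time.time(), section, self.sections.get(section, "")))
        return super().append(section, content)

    def replace(self, section: str, old_content: str, new_content: str) -> str:
        self.history.append((time.time(), section, self.sections.get(section, "")))
        return super().replace(section, old_content, new_content)
```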
8. Extensions & Challenges
8.1 Beginner Extensions
- Add memory visualization dashboard
- Implement memory export/import
8.2 Intermediate Extensions
- Add memory summarization on page-out
- Implement multi-agent memory sharing
8.3 Advanced Extensions
- Add planning/reflection loops
- Implement hierarchical memory
9. Real-World Connections
9.1 Industry Applications
- MemGPT/Letta: The original implementation
- Long-running Agents: Autonomous AI systems
- Personal Assistants: Persistent memory across sessions
9.2 Interview Relevance
- Explain virtual memory analogy for LLMs
- Discuss context window management strategies
- Describe self-editing memory tradeoffs
10. Resources
10.1 Essential Reading
- MemGPT Paper — “MemGPT: Towards LLMs as Operating Systems”
- Letta Documentation — https://docs.letta.com
- “AI Engineering” by Chip Huyen — Ch. on Memory
10.2 Related Projects
- Previous: Project 10 (Mem0g Memory Layer)
- Next: Project 12 (Hybrid Retrieval Engine)
11. Self-Assessment Checklist
- I understand the virtual memory analogy
- I can implement tiered memory storage
- I know how to handle context overflow
- I understand self-editing memory tradeoffs
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core memory with append/replace tools
- Basic archival storage and search
- Agent loop with tool execution
Full Completion:
- Automatic context management
- Recall buffer for recent messages
- Heartbeat loop implemented
Excellence:
- Memory summarization
- Multi-session persistence
- Production-ready error handling