Project 11: MemGPT-Style Virtual Context

Build an OS-inspired memory management system that pages context in/out of the LLM’s limited window, enabling “infinite” conversation memory.

Quick Reference

Attribute       Value
Difficulty      Level 4: Expert
Time Estimate   2-3 weeks (30-40 hours)
Language        Python
Prerequisites   Projects 1-10, OS concepts, LLM function calling
Key Topics      Virtual memory, context window management, page faults, working memory, archival storage, self-editing prompts

1. Learning Objectives

By completing this project, you will:

  1. Understand the virtual memory analogy for LLM context.
  2. Implement tiered memory (core, recall, archival).
  3. Build page-in/page-out mechanisms for context management.
  4. Create self-editing memory capabilities.
  5. Design autonomous memory management tools.

2. Theoretical Foundation

2.1 Core Concepts

  • Virtual Memory Analogy: LLM context window is like RAM—limited. Archival storage is like disk—unlimited but slower to access.

  • Memory Tiers:
    • Core Memory: Always in context (system prompt, persona, key facts)
    • Recall Memory: Recently accessed, pageable (recent conversations)
    • Archival Memory: Persistent storage, retrieved on demand (all history)
  • Page Fault: When the model needs information that is not in the current context, it triggers a “page fault” and retrieves it from archival memory.

  • Self-Editing: The model can modify its own core memory through tool calls.

  • Heartbeat Loop: Continuous processing where the model can take actions (including memory ops) even without user input.
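The three tiers above can be sketched as a small data structure (an illustrative sketch; the class and field names are hypothetical, not the real MemGPT API):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TieredMemory:
    """Minimal sketch of MemGPT-style memory tiers (illustrative only)."""
    core: dict = field(default_factory=lambda: {"persona": "", "human": ""})  # always in context
    recall: deque = field(default_factory=lambda: deque(maxlen=50))           # recent, pageable
    archival: list = field(default_factory=list)                              # unlimited, searched on demand

    def page_out(self) -> None:
        # Move the oldest recall entries to archival when the recall tier fills.
        while len(self.recall) > 10:
            self.archival.append(self.recall.popleft())

mem = TieredMemory()
for i in range(20):
    mem.recall.append(f"message {i}")
mem.page_out()
print(len(mem.recall), len(mem.archival))  # 10 10
```

The key property to preserve in a real implementation is the direction of flow: recall evicts oldest-first into archival, and nothing is ever deleted outright.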

2.2 Why This Matters

LLM context windows are finite, typically 8K-200K tokens depending on the model. Without virtual memory:

  • Conversations forget after context fills
  • No way to reference information from weeks ago
  • No persistent personality or learning

MemGPT’s approach enables “infinite” memory within finite context.

2.3 Common Misconceptions

  • “Just use RAG.” RAG retrieves; MemGPT actively manages and writes memory.
  • “Long context solves this.” Longer contexts raise cost and latency on every call; virtual memory keeps the active context small.
  • “Memory is read-only.” Self-editing is core to MemGPT’s power.

2.4 ASCII Diagram: Virtual Memory Architecture

MEMGPT VIRTUAL MEMORY ARCHITECTURE
══════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────┐
│                    LLM CONTEXT WINDOW                        │
│                    (Limited: ~8K-128K tokens)                │
│                                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  CORE MEMORY (Always Present)                         │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │ System Prompt: "You are a helpful assistant..." │  │  │
│  │  ├─────────────────────────────────────────────────┤  │  │
│  │  │ Persona: "I am curious and precise..."         │  │  │
│  │  ├─────────────────────────────────────────────────┤  │  │
│  │  │ Human Facts:                                    │  │  │
│  │  │   - Name: Alice                                 │  │  │
│  │  │   - Prefers: Python, dark mode                  │  │  │
│  │  │   - Working on: API redesign                    │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  RECALL MEMORY (Recent, Pageable)                     │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │ Recent Messages:                                │  │  │
│  │  │   [2 mins ago] User: "How's the API work?"     │  │  │
│  │  │   [1 min ago] Assistant: "Let me explain..."   │  │  │
│  │  │   [now] User: "What about authentication?"     │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  AVAILABLE SPACE (for new messages/retrieval)         │  │
│  │  [~~~~~~~~~~~~ ~2K tokens remaining ~~~~~~~~~~~~]    │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
└─────────────────────────────────────────────────────────────┘
           │                              ▲
           │ PAGE OUT                     │ PAGE IN
           │ (when full)                  │ (on demand)
           ▼                              │
┌─────────────────────────────────────────────────────────────┐
│                    ARCHIVAL MEMORY                           │
│                    (Unlimited: Vector + Graph Store)         │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Conversation History (All episodes)                │    │
│  │  - Episode 001: "Introduction conversation..."      │    │
│  │  - Episode 002: "Discussed Python preferences..."   │    │
│  │  - Episode 003: "API design discussion..."          │    │
│  │  - ... (thousands of episodes)                      │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Knowledge Graph (Entities + Relationships)         │    │
│  │  - (Alice)-[:WORKS_ON]->(API_Redesign)              │    │
│  │  - (Alice)-[:PREFERS]->(Python)                     │    │
│  │  - ... (all extracted facts)                        │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
└─────────────────────────────────────────────────────────────┘


MEMORY OPERATIONS (Tool Calls)
══════════════════════════════

The LLM can call these tools to manage its own memory:

┌─────────────────────────────────────────────────────────────┐
│  core_memory_append(section, content)                        │
│  - Add fact to core memory                                   │
│  - Example: core_memory_append("human", "Alice's birthday    │
│             is March 15")                                    │
├─────────────────────────────────────────────────────────────┤
│  core_memory_replace(section, old_content, new_content)      │
│  - Update fact in core memory                                │
│  - Example: Replace "Working on: API redesign" with          │
│             "Working on: Mobile app"                         │
├─────────────────────────────────────────────────────────────┤
│  archival_memory_insert(content)                             │
│  - Store information for long-term                           │
│  - Example: Save detailed meeting notes                      │
├─────────────────────────────────────────────────────────────┤
│  archival_memory_search(query, limit)                        │
│  - Retrieve relevant information (PAGE IN)                   │
│  - Example: "What did we discuss about authentication?"      │
├─────────────────────────────────────────────────────────────┤
│  conversation_search(query, limit)                           │
│  - Search recent conversation history                        │
│  - Example: "What did the user say about deadlines?"        │
└─────────────────────────────────────────────────────────────┘
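Each of these tools is presented to the model as a function-calling schema. As a sketch, here is one of them in the JSON-schema format that OpenAI-style function calling expects (the description strings and defaults are illustrative):

```python
# archival_memory_search expressed as a JSON-schema tool definition,
# in the format OpenAI-style function calling expects (values illustrative).
ARCHIVAL_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "archival_memory_search",
        "description": "Search archival memory and page relevant results into context.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query."},
                "limit": {"type": "integer", "description": "Maximum results to return.", "default": 5},
            },
            "required": ["query"],
        },
    },
}
print(ARCHIVAL_SEARCH_TOOL["function"]["name"])  # archival_memory_search
```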


PAGE FAULT EXAMPLE
══════════════════

User: "What was that library you recommended last month?"

LLM Internal Process:
┌─────────────────────────────────────────────────────────────┐
│ 1. Check core memory → Not found                            │
│ 2. Check recall memory → Not found (too old)                │
│ 3. PAGE FAULT! Need archival retrieval                      │
│ 4. Call: archival_memory_search("library recommendation")   │
│ 5. Result: "On Nov 15, recommended 'FastAPI' for REST APIs" │
│ 6. Incorporate into response                                │
└─────────────────────────────────────────────────────────────┘

Response: "Last month I recommended FastAPI for building
REST APIs. It's great for your use case because..."
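The lookup order in that internal process can be sketched in a few lines (hypothetical helper names; `archival_search` stands in for the real vector-store retrieval call):

```python
def answer(query: str, core: dict, recall: list[str], archival_search) -> str:
    """Sketch of the page-fault flow: in-context tiers first, archival on miss."""
    # Steps 1-2: check core memory, then recall memory (both already in context).
    for fact in list(core.values()) + recall:
        if query.lower() in fact.lower():
            return fact
    # Steps 3-5: PAGE FAULT -- fall back to archival retrieval.
    results = archival_search(query)
    return results[0] if results else "No memory found."

# Toy archival store with naive word matching (a real one would use embeddings).
store = ["On Nov 15, recommended 'FastAPI' for REST APIs"]
search = lambda q: [e for e in store if any(w in e.lower() for w in q.lower().split())]

print(answer("recommended library", {"human": "Name: Alice"}, [], search))
```

In the real system the model itself decides to issue the `archival_memory_search` tool call; this sketch only shows the tier ordering.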

3. Project Specification

3.1 What You Will Build

A Python framework that:

  • Implements tiered memory (core, recall, archival)
  • Provides memory management tools for the LLM
  • Handles automatic context overflow
  • Supports self-editing core memory

3.2 Functional Requirements

  1. Initialize agent: agent = MemGPTAgent(persona, archival_store, llm_client)
  2. Process message: agent.step(user_message) → Response + tool calls
  3. Core memory edit: agent.tools.core_memory_append(section, content)
  4. Archival search: agent.tools.archival_memory_search(query, limit)
  5. Context management: Auto page-out when context full
  6. Heartbeat: agent.heartbeat() → Allow autonomous actions

3.3 Example Usage / Output

from memgpt_agent import MemGPTAgent, Persona

# Initialize agent
persona = Persona(
    name="Claude",
    description="A helpful assistant with excellent memory",
    traits=["curious", "precise", "friendly"]
)

agent = MemGPTAgent(
    persona=persona,
    archival_store=archival_db,  # vector store, created elsewhere
    llm_client=llm_client        # OpenAI/Anthropic client, created elsewhere
)

# First interaction
response = agent.step("Hi! I'm Alice, I'm a Python developer.")
print(response.message)
# "Hello Alice! Nice to meet you. I see you're a Python developer -
#  that's great! What kind of projects do you work on?"

# Agent internally called: core_memory_append("human", "Name: Alice, Occupation: Python developer")

# Later interaction (context may have overflowed)
response = agent.step("What's my name again?")
print(response.message)
# "Your name is Alice! You mentioned you're a Python developer."
# (Retrieved from core memory, no archival needed)

# Much later interaction (needs archival)
response = agent.step("What library did you recommend back in our first chat about APIs?")
print(response.message)
# "Let me check my archives..."
# (Agent calls archival_memory_search("library API recommendation first chat"))
# "In our early conversations, I recommended FastAPI for building REST APIs!"

# View agent's core memory
print(agent.core_memory)
# {
#   "persona": "I am Claude, a helpful assistant...",
#   "human": "Name: Alice\nOccupation: Python developer\nPrefers: FastAPI\n..."
# }

# Manual archival insertion
agent.step("Here's the meeting notes from today: [detailed notes]")
# Agent internally calls: archival_memory_insert("Meeting notes 2024-12-15: ...")

4. Solution Architecture

4.1 High-Level Design

┌─────────────────┐
│   User Input    │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│                    MEMGPT AGENT                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐   │
│  │   Context   │────▶│    LLM      │────▶│   Tool      │   │
│  │   Builder   │     │   Client    │     │  Executor   │   │
│  └─────────────┘     └─────────────┘     └─────────────┘   │
│         │                   │                   │           │
│         │                   │                   │           │
│         ▼                   │                   ▼           │
│  ┌─────────────┐            │          ┌─────────────┐     │
│  │    Core     │            │          │  Archival   │     │
│  │   Memory    │◀───────────┼──────────│   Store     │     │
│  └─────────────┘            │          └─────────────┘     │
│         │                   │                   │           │
│         └───────────────────┴───────────────────┘           │
│                                                              │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│    Response     │
└─────────────────┘

4.2 Key Components

Component       Responsibility               Technology
ContextBuilder  Assemble context from tiers  Python class
LLMClient       Call LLM with tools          OpenAI/Anthropic
ToolExecutor    Execute memory tools         Function dispatch
CoreMemory      In-context persistent facts  Python dict
RecallMemory    Recent conversation buffer   Circular buffer
ArchivalStore   Long-term storage            Vector DB + Graph
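A minimal sketch of the ContextBuilder's job: core memory always goes in, and recall messages fill the remaining budget newest-first. The whitespace token count here is a placeholder; a real implementation would use tiktoken:

```python
class ContextBuilder:
    """Assemble the prompt from memory tiers under a token budget (sketch).
    Uses a naive whitespace token count; use tiktoken for accuracy."""

    def __init__(self, max_tokens: int = 100):
        self.max_tokens = max_tokens

    @staticmethod
    def count(text: str) -> int:
        return len(text.split())

    def build(self, core: dict, recall: list[str]) -> str:
        # Core memory is always included; recall fills the remaining budget,
        # newest messages first (older ones are implicitly paged out).
        parts = [f"{k.upper()}: {v}" for k, v in core.items()]
        budget = self.max_tokens - sum(self.count(p) for p in parts)
        kept: list[str] = []
        for msg in reversed(recall):
            if self.count(msg) > budget:
                break
            kept.append(msg)
            budget -= self.count(msg)
        return "\n".join(parts + list(reversed(kept)))

cb = ContextBuilder(max_tokens=12)
ctx = cb.build({"human": "Name: Alice"}, ["old old old old old old old old", "recent question"])
print(ctx)
```

Note the asymmetry this encodes: core memory can overflow the budget (that is a hard error in practice), while recall degrades gracefully by dropping its oldest entries.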

4.3 Data Models

from pydantic import BaseModel
from typing import Literal

class CoreMemory(BaseModel):
    persona: str  # Agent's self-description
    human: str  # Facts about the user
    system: str  # Additional system context

class Message(BaseModel):
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    tool_calls: list[dict] | None = None
    tool_call_id: str | None = None

class AgentResponse(BaseModel):
    message: str
    tool_calls: list[dict]
    memory_operations: list[str]  # Log of memory ops
    context_usage: int  # Tokens used

class MemoryTool(BaseModel):
    name: str
    description: str
    parameters: dict

5. Implementation Guide

5.1 Development Environment Setup

mkdir memgpt-agent && cd memgpt-agent
python -m venv .venv && source .venv/bin/activate
pip install openai chromadb pydantic tiktoken

5.2 Project Structure

memgpt-agent/
├── src/
│   ├── agent.py           # Main MemGPTAgent class
│   ├── context.py         # Context building
│   ├── memory/
│   │   ├── core.py        # Core memory management
│   │   ├── recall.py      # Recall buffer
│   │   └── archival.py    # Archival storage
│   ├── tools/
│   │   ├── definitions.py # Tool schemas
│   │   └── executor.py    # Tool execution
│   └── models.py          # Data models
├── tests/
│   ├── test_memory.py
│   └── test_agent.py
└── README.md

5.3 Implementation Phases

Phase 1: Core Memory + Basic Agent (10-12h)

Goals:

  • Agent with core memory always in context
  • Basic conversation handling
  • Core memory tools working

Tasks:

  1. Build CoreMemory class with append/replace
  2. Implement context builder with core memory injection
  3. Create basic agent loop
  4. Add core_memory_append and core_memory_replace tools

Checkpoint: Agent remembers facts added to core memory.
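A Phase 1 sketch of the CoreMemory class with append/replace (section names mirror the examples earlier in this document; error handling is deliberately minimal):

```python
class CoreMemory:
    """Self-editing core memory (Phase 1 sketch; names illustrative)."""

    def __init__(self, persona: str = "", human: str = ""):
        self.sections = {"persona": persona, "human": human}

    def append(self, section: str, content: str) -> None:
        # Backs the core_memory_append tool: add a fact to a section.
        if section not in self.sections:
            raise KeyError(f"Unknown section: {section}")
        self.sections[section] = (self.sections[section] + "\n" + content).strip()

    def replace(self, section: str, old: str, new: str) -> None:
        # Backs the core_memory_replace tool: update an existing fact.
        if old not in self.sections[section]:
            raise ValueError(f"Content not found in {section!r}: {old}")
        self.sections[section] = self.sections[section].replace(old, new)

mem = CoreMemory(persona="I am a helpful assistant.")
mem.append("human", "Working on: API redesign")
mem.replace("human", "API redesign", "Mobile app")
print(mem.sections["human"])  # Working on: Mobile app
```

Raising on a failed replace (rather than silently doing nothing) matters: the error message goes back to the model as the tool result, so it can correct itself.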

Phase 2: Archival Storage + Retrieval (10-12h)

Goals:

  • Archival memory storage working
  • Search and retrieval implemented
  • Page fault handling

Tasks:

  1. Set up vector store for archival
  2. Implement archival_memory_insert tool
  3. Implement archival_memory_search tool
  4. Add automatic page fault detection

Checkpoint: Agent can store and retrieve from archival.
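A Phase 2 sketch of the archival store. To stay self-contained it scores entries by word overlap instead of embeddings; swap in a real vector store (e.g. ChromaDB, as installed in 5.1) for semantic search:

```python
class ArchivalStore:
    """In-memory stand-in for the vector store (Phase 2 sketch).
    Relevance here is simple word overlap; a real implementation
    would embed entries and run a nearest-neighbor query."""

    def __init__(self):
        self.entries: list[str] = []

    def insert(self, content: str) -> int:
        # Backs the archival_memory_insert tool; returns an entry id.
        self.entries.append(content)
        return len(self.entries) - 1

    def search(self, query: str, limit: int = 3) -> list[str]:
        # Backs the archival_memory_search tool (the PAGE IN path).
        q = set(query.lower().split())
        scored = [(len(q & set(e.lower().split())), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda p: -p[0])
        return [e for _, e in scored[:limit]]

store = ArchivalStore()
store.insert("Recommended FastAPI for building REST APIs")
store.insert("Alice prefers dark mode")
print(store.search("FastAPI REST", limit=1))
```

Keeping `insert` and `search` as the only two operations keeps the tool surface small, which makes the model's memory decisions easier to audit.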

Phase 3: Context Management + Polish (8-10h)

Goals:

  • Automatic context overflow handling
  • Recall memory buffer
  • Heartbeat loop

Tasks:

  1. Implement token counting
  2. Add automatic page-out when context full
  3. Build recall memory circular buffer
  4. Add heartbeat mechanism

Checkpoint: Full MemGPT-style memory management.
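Task 2 (automatic page-out) can be sketched as a loop that evicts the oldest messages until the buffer fits the budget. The `count` callback defaults to a naive whitespace count here; in practice plug in tiktoken for accurate token counts:

```python
def maybe_page_out(messages: list[str], max_tokens: int,
                   count=lambda m: len(m.split())) -> list[str]:
    """Phase 3 sketch: evict oldest messages until the buffer fits the budget.
    Returns the evicted messages so the caller can write them to archival."""
    evicted = []
    while messages and sum(count(m) for m in messages) > max_tokens:
        evicted.append(messages.pop(0))  # oldest first
    return evicted

buf = ["a b c d", "e f", "g h i"]
evicted = maybe_page_out(buf, max_tokens=5)
print(evicted, buf)  # ['a b c d'] ['e f', 'g h i']
```

Returning the evicted messages (instead of discarding them) is the essential design choice: page-out must feed archival insertion, or information is lost rather than paged.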


6. Testing Strategy

6.1 Test Categories

Category      Purpose                  Examples
Unit          Test memory operations   Core memory append
Integration   Test full agent loop     Message → response with tools
Long-running  Test context management  100+ message conversation

6.2 Critical Test Cases

  1. Core memory persistence: Facts survive across messages
  2. Archival retrieval: Old information retrievable
  3. Context overflow: Graceful handling when full
  4. Self-editing: Agent updates its own memory
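Test case 3 (context overflow) as a pytest-style sketch, assuming a page-out helper that evicts oldest messages as described in Phase 3 (names are hypothetical; adapt to your own classes):

```python
def page_out(messages, max_tokens, count=lambda m: len(m.split())):
    # Assumed Phase 3 helper: evict oldest messages until within budget.
    evicted = []
    while messages and sum(count(m) for m in messages) > max_tokens:
        evicted.append(messages.pop(0))
    return evicted

def test_context_overflow_is_graceful():
    buf = [f"msg {i}" for i in range(100)]  # the "100+ message conversation" case
    evicted = page_out(buf, max_tokens=20)
    assert sum(len(m.split()) for m in buf) <= 20  # buffer fits the budget
    assert evicted and evicted[0] == "msg 0"       # oldest evicted first, nothing dropped silently

test_context_overflow_is_graceful()
print("ok")
```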

7. Common Pitfalls & Debugging

Pitfall               Symptom                         Solution
Token counting wrong  Context overflows               Use tiktoken for accurate counting
Tool call loop        Agent keeps calling tools       Add a recursion limit
Memory inconsistency  Facts contradict each other     Version core memory edits
Lost context          Agent forgets mid-conversation  Check recall buffer size
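The recursion-limit fix for the tool-call loop can be sketched as a hard cap on consecutive tool calls per user message (function names here are hypothetical):

```python
MAX_TOOL_STEPS = 5  # cap on consecutive tool calls per user message

def run_step(llm_call, execute_tool, user_message: str) -> str:
    """Agent step with a recursion limit on the tool-call loop (sketch).
    llm_call returns either a final string or {"tool_call": ...}."""
    reply = llm_call(user_message)
    steps = 0
    while isinstance(reply, dict) and reply.get("tool_call"):
        if steps >= MAX_TOOL_STEPS:
            return "Tool-call limit reached; answering with available context."
        result = execute_tool(reply["tool_call"])
        reply = llm_call(result)  # feed tool result back to the model
        steps += 1
    return reply

# Fake LLM that always asks for another tool call -> the limit kicks in.
looping_llm = lambda _: {"tool_call": {"name": "archival_memory_search"}}
print(run_step(looping_llm, lambda tc: "result", "hi"))
```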

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add memory visualization dashboard
  • Implement memory export/import

8.2 Intermediate Extensions

  • Add memory summarization on page-out
  • Implement multi-agent memory sharing

8.3 Advanced Extensions

  • Add planning/reflection loops
  • Implement hierarchical memory

9. Real-World Connections

9.1 Industry Applications

  • MemGPT/Letta: The original implementation
  • Long-running Agents: Autonomous AI systems
  • Personal Assistants: Persistent memory across sessions

9.2 Interview Relevance

  • Explain virtual memory analogy for LLMs
  • Discuss context window management strategies
  • Describe self-editing memory tradeoffs

10. Resources

10.1 Essential Reading

  • MemGPT Paper — “MemGPT: Towards LLMs as Operating Systems”
  • Letta Documentation — https://docs.letta.com
  • “AI Engineering” by Chip Huyen — Ch. on Memory
  • Previous: Project 10 (Mem0g Memory Layer)
  • Next: Project 12 (Hybrid Retrieval Engine)

11. Self-Assessment Checklist

  • I understand the virtual memory analogy
  • I can implement tiered memory storage
  • I know how to handle context overflow
  • I understand self-editing memory tradeoffs

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Core memory with append/replace tools
  • Basic archival storage and search
  • Agent loop with tool execution

Full Completion:

  • Automatic context management
  • Recall buffer for recent messages
  • Heartbeat loop implemented

Excellence:

  • Memory summarization
  • Multi-session persistence
  • Production-ready error handling