Project 10: OS-Style Memory Manager (MemGPT-Inspired)

Build a hierarchical memory manager that pages memories in and out of the prompt like an operating system.

Quick Reference

Attribute	Value
Difficulty	Level 4
Time Estimate	4-6 weeks
Main Programming Language	Python
Alternative Programming Languages	Rust, Go
Coolness Level	Level 4
Business Potential	Level 4
Prerequisites	Routing, vector memory, summarization
Key Topics	Memory tiers, paging policies, prompt assembly

1. Learning Objectives

By completing this project, you will:

Design a multi-tier memory hierarchy.
Implement paging policies to move memory between tiers.
Assemble prompts with stable anchors and budgets.
Evaluate memory manager quality using probes and replay.

2. All Theory Needed (Per-Concept Breakdown)

OS-Style Memory Tiering and Paging Policies

Fundamentals

An OS-style memory manager treats the prompt as limited RAM and external memory stores as disk. Core memories stay in the prompt, while archival memories are paged in only when needed. This requires explicit paging policies for promotion and eviction. Without these policies, agents either forget critical context or overload the prompt.

Deep Dive into the Concept

MemGPT proposes a memory hierarchy for agents: a small core memory (always in context), a larger archival memory (retrieved on demand), and optional summary tiers that compress history. This mirrors operating system memory management: the prompt is RAM, the vector store is disk, and the memory manager is the OS kernel. The key is that paging decisions must be explicit and auditable. You must decide what stays in core (e.g., system rules, stable preferences, current task state) and what lives in archive (episodic logs, long histories). Paging policies decide when to promote a memory into core and when to evict it. Common signals include recency, importance, and retrieval frequency.
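A minimal sketch of the two-tier layout described above, assuming memories are small records keyed by an ID. The class and field names (`MemoryItem`, `TieredMemory`, `importance`) are illustrative choices, not part of MemGPT or any library.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    memory_id: str
    text: str
    importance: float  # 0.0 to 1.0, assumed to come from an upstream scorer


@dataclass
class TieredMemory:
    """Two tiers: core is always in the prompt, archive is fetched on demand."""
    core: dict[str, MemoryItem] = field(default_factory=dict)
    archive: dict[str, MemoryItem] = field(default_factory=dict)

    def promote(self, memory_id: str) -> None:
        # Move an item from archive ("disk") into core ("RAM").
        self.core[memory_id] = self.archive.pop(memory_id)

    def evict(self, memory_id: str) -> None:
        # Move an item from core back to archive; nothing is deleted.
        self.archive[memory_id] = self.core.pop(memory_id)


store = TieredMemory()
store.archive["EP-1"] = MemoryItem("EP-1", "User asked about flights to Oslo.", 0.8)
store.promote("EP-1")
print(sorted(store.core))     # ['EP-1']
store.evict("EP-1")
print(sorted(store.archive))  # ['EP-1']
```

Keeping eviction as a move rather than a delete means a demoted memory can always be paged back in later.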

Prompt assembly is the last step. The memory manager must inject core memory into a stable anchor position, then add retrieved archival memories within a defined budget. If the prompt grows too large, the manager must drop or summarize lower-priority memories. This is similar to cache eviction policies like LRU (least recently used) or LFU (least frequently used), but applied to text. You will implement these policies and observe how they affect recall and quality.
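The assembly step can be sketched as follows. This is a simplified illustration: `count_tokens` is a whitespace-based stand-in for a real tokenizer, the section markers (`[CORE]`, `[RETRIEVED]`, `[QUERY]`) are arbitrary, and over-budget memories are dropped rather than summarized.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_prompt(core: list[str], retrieved: list[tuple[float, str]],
                    query: str, budget_tokens: int) -> str:
    """Core memories go first in a fixed anchor; retrieved ones fill what is left."""
    parts = ["[CORE]"] + core
    used = sum(count_tokens(m) for m in core)
    parts.append("[RETRIEVED]")
    # Highest-priority memories first; lower-priority ones are dropped on overflow.
    for priority, text in sorted(retrieved, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # a fuller version might summarize instead of dropping
        parts.append(text)
        used += cost
    parts += ["[QUERY]", query]
    return "\n".join(parts)


prompt = assemble_prompt(
    core=["Always answer in English.", "User prefers window seats."],
    retrieved=[
        (0.2, "Asked about the weather once last year."),
        (0.9, "Booked a flight to Oslo in March."),
    ],
    query="Where am I flying next?",
    budget_tokens=16,
)
print(prompt)
```

With a 16-token budget, the core memories use 8 tokens, the Oslo memory fits (7 more), and the lower-priority weather memory is dropped.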

Paging introduces a new problem: consistency. If a memory is promoted into core and then later demoted, the agent’s behavior may change unexpectedly. To handle this, paging should be logged and replayable. A replay mode allows you to run the same conversation under different paging policies and compare outcomes. This makes memory management a testable engineering discipline rather than a heuristic art.
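To make the replay idea concrete, here is a small sketch that runs one fixed access trace under two eviction policies and compares the resulting paging logs. The function name `run_session`, the trace format, and the slot-based capacity (instead of a token budget) are simplifying assumptions.

```python
from collections import Counter


def run_session(accesses: list[str], policy: str, capacity: int) -> list[tuple[str, str]]:
    """Replay a fixed access trace and return the paging log as (action, memory_id)."""
    core: list[str] = []  # ordered least recently used first
    hits = Counter()      # how often each memory has been accessed
    log: list[tuple[str, str]] = []
    for memory_id in accesses:
        hits[memory_id] += 1
        if memory_id in core:
            core.remove(memory_id)
            core.append(memory_id)  # refresh recency
            continue
        if len(core) >= capacity:
            if policy == "LRU":
                victim = core[0]
            elif policy == "LFU":
                # Ties go to the least recently used entry (first in the list).
                victim = min(core, key=lambda m: hits[m])
            else:
                raise ValueError(f"unknown policy: {policy}")
            core.remove(victim)
            log.append(("evict", victim))
        core.append(memory_id)
        log.append(("promote", memory_id))
    return log


trace = ["A", "A", "B", "C", "A", "D"]
lru_log = run_session(trace, "LRU", capacity=2)
lfu_log = run_session(trace, "LFU", capacity=2)
print([m for a, m in lru_log if a == "evict"])  # ['A', 'B', 'C']
print([m for a, m in lfu_log if a == "evict"])  # ['B', 'C']
```

On this trace LRU evicts the frequently used memory A and has to promote it again, while LFU keeps it in core. Because the trace is fixed, rerunning either policy yields the same log.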

The memory manager must also integrate safety policies. Quarantined or sensitive memories should never be promoted into core without explicit approval. This connects to Projects 7 and 9. A production-grade memory manager therefore combines routing, budgeting, paging, and governance. This project is the capstone: it forces you to integrate every previous concept into a single coherent system.
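A promotion guard for the safety rule above might look like this. The `quarantined` and `sensitive` flags and the `approved_ids` set are hypothetical names; the real fields depend on how Projects 7 and 9 represent governance metadata.

```python
def can_promote(memory: dict, approved_ids: set[str]) -> bool:
    """Allow promotion into core unless the memory is flagged and unapproved."""
    flagged = memory.get("quarantined", False) or memory.get("sensitive", False)
    return (not flagged) or memory["memory_id"] in approved_ids


memories = [
    {"memory_id": "PRF-1", "quarantined": False},
    {"memory_id": "EP-7", "quarantined": True},
    {"memory_id": "EP-8", "sensitive": True},
]
approved = {"EP-8"}  # explicit approvals, e.g. granted by a reviewer
promotable = [m["memory_id"] for m in memories if can_promote(m, approved)]
print(promotable)  # ['PRF-1', 'EP-8']
```

Running the guard before scoring keeps flagged memories out of the candidate set entirely, so a high importance score cannot override the policy.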

From a systems perspective, the memory manager must be treated as a first-class interface between data and behavior. That means you need explicit invariants (what must always be true), observability (how you know it is true), and failure signatures (how it breaks when it is not). In practice, engineers often skip this and rely on ad-hoc fixes, which creates hidden coupling between the memory subsystem and the rest of the agent stack. A better approach is to model paging as a pipeline stage with clear inputs, outputs, and preconditions: if inputs violate the contract, the stage should fail fast rather than silently corrupt memory. This is especially important because memory errors are long-lived and compound over time. You should also define operational metrics that reveal drift early. Examples include the percentage of memory entries that lack required metadata, the ratio of retrieved memories that the model never uses, and the fraction of queries that trigger a fallback route because the primary memory store is empty. These metrics are not just for dashboards; they are design constraints that force you to keep the system testable and predictable.
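The three example metrics can be computed from simple logs. This sketch assumes hypothetical record shapes: each memory entry is a dict, each retrieval record carries a boolean `used`, and each query record carries a boolean `fallback`. The default list of required fields is also an assumption.

```python
def drift_metrics(entries, retrievals, queries,
                  required=("memory_id", "tier", "importance")):
    """Return three fractions in [0, 1]; an empty input yields 0.0 for its metric."""
    def fraction(flags):
        flags = list(flags)
        return sum(flags) / len(flags) if flags else 0.0

    return {
        "missing_metadata": fraction(any(k not in e for k in required) for e in entries),
        "unused_retrievals": fraction(not r["used"] for r in retrievals),
        "fallback_queries": fraction(q["fallback"] for q in queries),
    }


metrics = drift_metrics(
    entries=[
        {"memory_id": "A", "tier": "core", "importance": 0.9},
        {"memory_id": "B", "tier": "archive"},  # lacks importance
    ],
    retrievals=[{"used": True}, {"used": False}, {"used": False}, {"used": True}],
    queries=[{"fallback": False}, {"fallback": False}, {"fallback": False}, {"fallback": True}],
)
print(metrics)
# {'missing_metadata': 0.5, 'unused_retrievals': 0.5, 'fallback_queries': 0.25}
```

Thresholds on these values can then serve as test assertions or alerts, which is what turns them into design constraints.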

Another critical dimension is lifecycle management. A paging policy may work well at small scale but degrade as the memory grows. This is where policies and thresholds matter: you need rules for promotion, demotion, merging, and deletion that prevent the memory from becoming a landfill. The policy should be deterministic and versioned. When it changes, you should be able to replay historical inputs and measure the delta in outputs. This is the same discipline used in data engineering for schema changes and backfills, and it applies equally to memory systems. Finally, remember that memory is an interface to user trust. If the memory system is noisy, the agent feels unreliable; if it is overly strict, the agent feels forgetful. The best designs expose these trade-offs explicitly, so you can tune them according to product goals rather than guessing.

How This Fits in the Projects

This concept is the capstone and integrates Projects 1-9.

Definitions & key terms

Core memory: Memories that are always present in the prompt.
Archive memory: Memories stored outside the prompt and retrieved on demand.
Paging: Moving memory between tiers.
Eviction policy: Rule for removing memory from core.

Mental model diagram (ASCII)

Core (RAM):    [Rules] [Preferences] [Current Task]
Archive (Disk): [Episodic Logs] [Summaries] [Graph]
Paging: archive -> core when needed, core -> archive on eviction

How It Works (Step-by-Step)

Maintain a core memory set with fixed size.
When a query arrives, retrieve candidates from archive.
Score candidates by importance and recency.
Promote top candidates into core (paging).
Evict low-priority core memories if budget exceeded.
Assemble prompt with core + retrieved memories.
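The six steps above can be sketched end to end. Everything here is a simplified assumption: keyword overlap stands in for vector retrieval, `score` is one possible blend of importance and recency, token costs are precomputed, and no memory is pinned (a fuller version would protect system rules from eviction).

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    memory_id: str
    text: str
    importance: float  # 0.0 to 1.0
    last_used: int     # logical clock tick; larger means more recent
    tokens: int        # precomputed token cost (assumed)


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def score(m: Memory, now: int) -> float:
    # Assumed blend: importance plus a recency bonus that shrinks with age.
    return m.importance + 1.0 / (1 + now - m.last_used)


def handle_query(query, core, archive, now, core_budget=600, top_k=2):
    # Step 2: retrieve candidates (keyword overlap stands in for vector search).
    candidates = [m for m in archive if words(query) & words(m.text)]
    # Step 3: score candidates by importance and recency.
    candidates.sort(key=lambda m: score(m, now), reverse=True)
    # Step 4: promote the top candidates into core.
    for m in candidates[:top_k]:
        archive.remove(m)
        m.last_used = now
        core.append(m)
    # Step 5: evict the lowest-scoring core memories while over budget.
    while sum(m.tokens for m in core) > core_budget:
        victim = min(core, key=lambda m: score(m, now))
        core.remove(victim)
        archive.append(victim)
    # Step 6: assemble the prompt from whatever is now in core.
    return "\n".join(m.text for m in core) + "\n\nQuery: " + query


core = [
    Memory("RULE-1", "Be concise.", 1.0, last_used=0, tokens=300),
    Memory("PRF-9", "User once mentioned liking jazz.", 0.2, last_used=1, tokens=200),
]
archive = [
    Memory("EP-1", "User likes trains for short trips.", 0.8, last_used=2, tokens=200),
    Memory("EP-2", "The cat is named Miso.", 0.3, last_used=3, tokens=100),
]
prompt = handle_query("plan trips by train", core, archive, now=5)
print([m.memory_id for m in core])     # ['RULE-1', 'EP-1']
print([m.memory_id for m in archive])  # ['EP-2', 'PRF-9']
```

Promoting EP-1 pushes core to 700 tokens, so the lowest-scoring entry (PRF-9) is evicted to bring it back under the 600-token budget.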

Minimal Concrete Example

policy:
  core_budget_tokens: 600
  eviction: LRU
  promote_if: importance > 0.7
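One way to represent this policy in code is a small immutable dataclass. This is a sketch under assumptions: the field `promote_threshold` is a hypothetical stand-in for the `promote_if` expression, and the set of supported eviction names is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    core_budget_tokens: int = 600
    eviction: str = "LRU"
    promote_threshold: float = 0.7  # mirrors promote_if: importance > 0.7

    def __post_init__(self):
        # Fail fast on values the pager would not know how to handle.
        if self.eviction not in {"LRU", "LFU"}:
            raise ValueError(f"unsupported eviction policy: {self.eviction}")
        if self.core_budget_tokens <= 0:
            raise ValueError("core_budget_tokens must be positive")

    def should_promote(self, importance: float) -> bool:
        return importance > self.promote_threshold


policy = Policy()
print(policy.should_promote(0.9))  # True
print(policy.should_promote(0.7))  # False (the comparison is strict)
```

Freezing the dataclass keeps a policy from changing mid-session, which matters when a replay must run under exactly one policy.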

Common Misconceptions

“Paging is only about size.” (False: it also affects consistency.)
“Core memory never changes.” (False: it should evolve carefully.)

Check-Your-Understanding Questions

Why is the prompt treated as RAM?
What is the role of eviction policies?
How do you ensure paging is auditable?

Check-Your-Understanding Answers

It is limited and expensive, like RAM.
They remove low-priority memories to make room.
Log every promotion and eviction.

Real-World Applications

Long-running personal assistants.
Multi-agent platforms with persistent memory.

Where You’ll Apply It

In this project: §5.4 Concepts You Must Understand First and §6 Testing Strategy.
Also used in: Project 4, Project 7, Project 9.

References

MemGPT - https://arxiv.org/abs/2310.08560

Key Insights

Memory tiering turns the context window into a managed resource rather than a passive buffer.

Summary

An OS-style memory manager integrates routing, paging, and governance to make agent memory reliable at scale.

Homework/Exercises to Practice the Concept

Define a core memory policy for a personal assistant.
Simulate a paging decision with five candidate memories.

Solutions to the Homework/Exercises

Core: system rules, top preferences, current tasks.
Promote highest importance and recent memories, evict least recently used.
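A worked version of the second exercise, with invented numbers. The five candidates, the four-slot core, the 0.7 threshold, and the limit of two promotions per turn are all assumptions chosen to make the simulation small.

```python
# Five hypothetical candidates: (memory_id, importance, last_used_tick)
candidates = [
    ("M1", 0.9, 10),
    ("M2", 0.4, 12),
    ("M3", 0.8, 3),
    ("M4", 0.75, 11),
    ("M5", 0.2, 1),
]
core = {"C1": 9, "C2": 2, "C3": 7}  # memory_id -> last_used_tick
CORE_SLOTS = 4
THRESHOLD = 0.7
now = 13

# Promote: keep candidates above the threshold, most important first,
# breaking ties by recency.
eligible = sorted(
    (c for c in candidates if c[1] > THRESHOLD),
    key=lambda c: (c[1], c[2]),
    reverse=True,
)
promoted = [c[0] for c in eligible[:2]]  # at most two promotions per turn (assumed)
for memory_id in promoted:
    core[memory_id] = now  # promotion counts as a use

# Evict: drop least recently used entries until core fits its slots.
evicted = []
while len(core) > CORE_SLOTS:
    victim = min(core, key=core.get)
    evicted.append(victim)
    del core[victim]

print(promoted)      # ['M1', 'M3']
print(evicted)       # ['C2']
print(sorted(core))  # ['C1', 'C3', 'M1', 'M3']
```

M4 clears the threshold but loses to M1 and M3 on importance, and C2 is evicted because it has the oldest `last_used` tick.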

3. Project Specification

3.1 What You Will Build

A memory manager that:

Maintains core and archive tiers
Applies paging and eviction policies
Assembles prompts with budgets
Logs decisions and supports replay

3.2 Functional Requirements

Tier Management: Separate core and archive memory.
Paging Policy: Promote and evict memories.
Prompt Assembly: Enforce token budgets.
Replay Mode: Re-run conversations with different policies.

3.3 Non-Functional Requirements

Performance: Paging decisions complete in under 100 ms.
Reliability: Deterministic replay with fixed seeds.
Usability: Clear logs of promotions and evictions.

3.4 Example Usage / Output

$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[PROMPT] memory_tokens=620

3.5 Data Formats / Schemas / Protocols

{
  "memory_id": "PRF-00012",
  "tier": "core",
  "last_used": "2026-01-01T10:00:00Z",
  "importance": 0.9
}
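A record in this shape could be validated on load roughly as follows. The allowed tier names and the 0 to 1 range for `importance` are assumptions inferred from the rest of this guide, not a fixed schema.

```python
import json
from datetime import datetime

REQUIRED = {"memory_id", "tier", "last_used", "importance"}
TIERS = {"core", "archive"}  # assumed tier names


def parse_memory(raw: str) -> dict:
    """Parse one JSON memory record and fail fast on contract violations."""
    record = json.loads(raw)
    missing = REQUIRED - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if record["tier"] not in TIERS:
        raise ValueError(f"unknown tier: {record['tier']}")
    if not 0.0 <= record["importance"] <= 1.0:
        raise ValueError("importance must be between 0 and 1")
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    record["last_used"] = datetime.fromisoformat(record["last_used"].replace("Z", "+00:00"))
    return record


rec = parse_memory(
    '{"memory_id": "PRF-00012", "tier": "core",'
    ' "last_used": "2026-01-01T10:00:00Z", "importance": 0.9}'
)
print(rec["last_used"].isoformat())  # 2026-01-01T10:00:00+00:00
```

Parsing `last_used` into a timezone-aware `datetime` at load time avoids comparing raw strings when an LRU policy later sorts by recency.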

3.6 Edge Cases

Core memory full
No archive candidates
Conflicting promotions

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

$ memos run --query "Summarize my preferences"
$ memos replay --session session_01 --policy LRU

3.7.2 Golden Path Demo (Deterministic)

$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[RESULT] response includes PRF-00012
exit_code=0

3.7.3 Failure Demo (Deterministic)

$ memos run --query ""
[ERROR] empty query
exit_code=1
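The failure path can be made deterministic by validating input before any memory is touched. This is a sketch of a possible entry point for the `memos` CLI you are building; only the `run` subcommand and the `--query` flag come from the demo above, and the function body is hypothetical.

```python
import argparse


def main(argv: list[str]) -> int:
    """Return the process exit code: 0 on success, 1 on an empty query."""
    parser = argparse.ArgumentParser(prog="memos")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--query", required=True)
    args = parser.parse_args(argv)

    if not args.query.strip():
        print("[ERROR] empty query")
        return 1
    # Paging and prompt assembly would happen here.
    return 0


print(main(["run", "--query", ""]))                          # prints the error, then 1
print(main(["run", "--query", "Summarize my preferences"]))  # 0
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the function easy to call from tests.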

4. Solution Architecture

4.1 High-Level Design

Query -> Router -> Archive Retrieval -> Paging -> Prompt Assembly -> LLM

4.2 Key Components

Component	Responsibility	Key Decisions
Tier Manager	Maintain core/archive	Budget sizes
Pager	Promote/evict memories	Policy choice
Assembler	Build prompt	Anchor placement
Logger	Record decisions	Replay format

4.3 Data Structures (No Full Code)

PagingEvent:
  memory_id: string
  action: promote/evict
  reason: string
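As an illustration only, the structure above maps naturally onto a frozen dataclass that serializes to one JSON object per log line. The method names and the JSON Lines choice are assumptions, not requirements of the project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass(frozen=True)
class PagingEvent:
    memory_id: str
    action: Literal["promote", "evict"]
    reason: str

    def to_json(self) -> str:
        # Sorted keys keep log lines byte-identical across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(line: str) -> "PagingEvent":
        return PagingEvent(**json.loads(line))


event = PagingEvent("PRF-00012", "promote", "importance 0.9 > 0.7")
line = event.to_json()
print(line)
# {"action": "promote", "memory_id": "PRF-00012", "reason": "importance 0.9 > 0.7"}
print(PagingEvent.from_json(line) == event)  # True
```

A stable, round-trippable log format is what lets replay mode compare two runs line by line.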

4.4 Algorithm Overview

Retrieve candidates from archive.
Rank by importance and recency.
Promote top candidates.
Evict low-priority core memories.
Assemble prompt with budget.

5. Implementation Guide

5.1 Development Environment Setup

- Configure memory tier sizes
- Setup replay logging

5.2 Project Structure

project-root/
├── src/
│   ├── tier/
│   ├── paging/
│   ├── assemble/
│   └── replay/

5.3 The Core Question You’re Answering

“How do I manage memory like an operating system manages RAM?”

5.4 Concepts You Must Understand First

Memory tiering
Eviction policies

5.5 Questions to Guide Your Design

What belongs in core memory?
How do you choose eviction policies?

5.6 Thinking Exercise

Create a paging table with five memories and simulate an eviction.

5.7 The Interview Questions They’ll Ask

“How is agent memory like OS memory?”
“What is a paging policy?”
“How do you choose core memory?”
“How do you evaluate memory manager quality?”
“Why is replay important?”

5.8 Hints in Layers

Hint 1: Start with fixed budgets
Hint 2: Add LRU eviction
Hint 3: Log every paging event
Hint 4: Implement replay mode

5.9 Books That Will Help

Topic	Book	Chapter
Storage	“Designing Data-Intensive Applications”	Ch. 3
Architecture	“Fundamentals of Software Architecture”	Ch. 2

5.10 Implementation Phases

Phase 1: Foundation

Core/archive separation

Phase 2: Core

Paging and assembly

Phase 3: Polish

Replay and evaluation

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Eviction	LRU / LFU	LRU	Simple to implement; needs only a last-used timestamp per memory
Promotion	Recency / Importance	Importance + recency	Balanced relevance
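The recommended "importance + recency" promotion signal could be combined in many ways. One option, shown here as an assumption rather than a prescribed formula, is a weighted sum where recency decays exponentially with a configurable half-life.

```python
def promotion_score(importance: float, age_seconds: float,
                    half_life_seconds: float = 3600.0,
                    recency_weight: float = 0.3) -> float:
    """Weighted blend of importance and an exponentially decaying recency term."""
    # Recency is 1.0 for a fresh memory and halves every half_life_seconds.
    recency = 0.5 ** (age_seconds / half_life_seconds)
    return (1.0 - recency_weight) * importance + recency_weight * recency


fresh_minor = promotion_score(importance=0.4, age_seconds=0)     # about 0.58
old_major = promotion_score(importance=0.9, age_seconds=7200)    # about 0.705
print(old_major > fresh_minor)  # True
```

With these default weights, an important memory from two hours ago still outranks a fresh but minor one. Raising `recency_weight` shifts that balance toward recent context.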

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit	Paging logic	Promote/evict
Integration	Full run	Query -> prompt
Edge	Budget overflow	Trimming

6.2 Critical Test Cases

Core budget never exceeded.
Paging events logged.
Replay produces identical output.
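The three critical cases can be expressed as plain test functions. This sketch tests a tiny stand-in pager (`simulate`, a slot-based LRU driven by a seeded random trace) rather than your real implementation; the names and the slot-based budget are assumptions.

```python
import random


def simulate(seed: int, budget: int = 3, steps: int = 50):
    """Tiny LRU pager driven by a seeded random trace; returns (log, max_core_size)."""
    rng = random.Random(seed)  # fixed seed makes the trace reproducible
    core: list[str] = []       # ordered least recently used first
    log: list[tuple[str, str]] = []
    max_size = 0
    for _ in range(steps):
        memory_id = f"M{rng.randrange(8)}"
        if memory_id in core:
            core.remove(memory_id)  # will be re-appended as most recent
        else:
            if len(core) >= budget:
                log.append(("evict", core.pop(0)))
            log.append(("promote", memory_id))
        core.append(memory_id)
        max_size = max(max_size, len(core))
    return log, max_size


def test_core_budget_never_exceeded():
    _, max_size = simulate(seed=42)
    assert max_size <= 3


def test_paging_events_logged():
    log, _ = simulate(seed=42)
    promotes = sum(1 for action, _ in log if action == "promote")
    evicts = sum(1 for action, _ in log if action == "evict")
    # Everything promoted is either still in core (at most 3) or was evicted.
    assert 0 <= promotes - evicts <= 3


def test_replay_is_identical():
    assert simulate(seed=42) == simulate(seed=42)


test_core_budget_never_exceeded()
test_paging_events_logged()
test_replay_is_identical()
```

The functions follow the `test_` naming convention, so a runner such as pytest can collect them; the direct calls at the bottom let the file run standalone.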

6.3 Test Data

query: "Summarize my preferences"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Over-promotion	Prompt overflow	Tighten promotion rules
Under-promotion	Agent forgets	Raise importance weights
No replay logs	Hard to debug	Log every decision

7.2 Debugging Strategies

Replay sessions with different policies.
Compare recall probes across versions.

7.3 Performance Traps

Excessive archive retrieval each query.

8. Extensions & Challenges

8.1 Beginner Extensions

Add summary tier between core and archive

8.2 Intermediate Extensions

Add adaptive eviction rules

8.3 Advanced Extensions

Add multi-agent shared memory tier

9. Real-World Connections

9.1 Industry Applications

Long-running agent platforms

9.2 Related Projects

MemGPT

9.3 Interview Relevance

System design and memory management topics.

10. Resources

10.1 Essential Reading

MemGPT paper

10.2 Video Resources

Talks on agent memory architectures

10.3 Tools & Documentation

Vector database docs

11. Self-Assessment Checklist

11.1 Understanding

I can explain tiering and paging.

11.2 Implementation

Paging and replay work correctly.

11.3 Growth

I can justify eviction and promotion policies.

12. Submission / Completion Criteria

Minimum Viable Completion:

Tiering and paging implemented

Full Completion:

Replay and evaluation metrics

Excellence (Going Above & Beyond):

Adaptive policies and multi-agent memory