Project 10: OS-Style Memory Manager (MemGPT-Inspired)
Build a hierarchical memory manager that pages memories in and out of the prompt like an operating system.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4 |
| Time Estimate | 4-6 weeks |
| Main Programming Language | Python |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 4 |
| Business Potential | Level 4 |
| Prerequisites | Routing, vector memory, summarization |
| Key Topics | Memory tiers, paging policies, prompt assembly |
1. Learning Objectives
By completing this project, you will:
- Design a multi-tier memory hierarchy.
- Implement paging policies to move memory between tiers.
- Assemble prompts with stable anchors and budgets.
- Evaluate memory manager quality using probes and replay.
2. All Theory Needed (Per-Concept Breakdown)
OS-Style Memory Tiering and Paging Policies
Fundamentals An OS-style memory manager treats the prompt as limited RAM and external memory stores as disk. Core memories stay in the prompt, while archival memories are paged in only when needed. This requires explicit policies for promotion and eviction. Without these policies, agents either forget critical context or overload the prompt.
Deep Dive into the concept MemGPT proposes a memory hierarchy for agents: a small core memory (always in context), a larger archival memory (retrieved on demand), and optional summary tiers that compress history. This mirrors operating system memory management: the prompt is RAM, the vector store is disk, and the memory manager is the OS kernel. The key is that paging decisions must be explicit and auditable. You must decide what stays in core (e.g., system rules, stable preferences, current task state) and what lives in archive (episodic logs, long histories). Paging policies decide when to promote a memory into core and when to evict it. Common signals include recency, importance, and retrieval frequency.
Prompt assembly is the last step. The memory manager must inject core memory into a stable anchor position, then add retrieved archival memories within a defined budget. If the prompt grows too large, the manager must drop or summarize lower-priority memories. This is similar to cache eviction policies like LRU (least recently used) or LFU (least frequently used), but applied to text. You will implement these policies and observe how they affect recall and quality.
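The assembly step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Memory` class, the `assemble_prompt` function, the section labels, and the whitespace-based `count_tokens` stand-in are all assumptions made for the example. A real system would use the tokenizer of its target model.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    memory_id: str
    text: str
    priority: float  # hypothetical field: higher means more worth keeping


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_prompt(core, retrieved, query, budget_tokens):
    """Put core memories first (the stable anchor), then fit retrieved
    memories into the remaining budget, highest priority first."""
    lines = ["[CORE]"]
    used = 0
    for mem in core:
        lines.append(f"- {mem.text}")
        used += count_tokens(mem.text)
    if used > budget_tokens:
        raise ValueError("core memory alone exceeds the budget")

    lines.append("[RETRIEVED]")
    dropped = []
    for mem in sorted(retrieved, key=lambda m: m.priority, reverse=True):
        cost = count_tokens(mem.text)
        if used + cost <= budget_tokens:
            lines.append(f"- {mem.text}")
            used += cost
        else:
            dropped.append(mem.memory_id)  # over budget: leave it in the archive

    lines.append("[QUERY]")
    lines.append(query)
    return "\n".join(lines), used, dropped


core = [Memory("RULE-1", "Always answer in English", 1.0)]
retrieved = [
    Memory("EP-1", "User prefers short answers", 0.9),
    Memory("EP-2", "User asked about flights to Lisbon last week", 0.4),
]
prompt, used, dropped = assemble_prompt(core, retrieved, "Summarize my preferences", 10)
print(used, dropped)  # 8 ['EP-2']
```

Dropping is the simplest overflow strategy. Summarizing the dropped memories instead, as the text suggests, would replace the `dropped.append` branch with a call to a summarizer.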
Paging introduces a new problem: consistency. If a memory is promoted into core and then later demoted, the agent’s behavior may change unexpectedly. To handle this, paging should be logged and replayable. A replay mode allows you to run the same conversation under different paging policies and compare outcomes. This makes memory management a testable engineering discipline rather than a heuristic art.
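To make the replay idea concrete, here is a small sketch that replays one recorded access trace under two eviction policies and compares the results. The `replay` function, the trace format (a list of memory IDs), and the slot-based capacity are assumptions for illustration. A real manager would budget in tokens and log richer events.

```python
from collections import Counter, OrderedDict


def replay(trace, capacity, policy):
    """Replay a recorded sequence of memory accesses under one eviction policy.

    Returns the list of paging events and the final core contents."""
    core = OrderedDict()  # memory_id -> None, ordered least to most recently used
    hits = Counter()      # access counts, consulted by LFU
    events = []
    for step, memory_id in enumerate(trace):
        hits[memory_id] += 1
        if memory_id in core:
            core.move_to_end(memory_id)  # refresh recency on a hit
            continue
        if len(core) >= capacity:
            if policy == "LRU":
                victim = next(iter(core))  # front of the dict is least recent
            elif policy == "LFU":
                # Fewest accesses; ties fall to the least recently used entry
                # because min() returns the first minimal item it sees.
                victim = min(core, key=lambda m: hits[m])
            else:
                raise ValueError(f"unknown policy: {policy}")
            del core[victim]
            events.append((step, "evict", victim))
        core[memory_id] = None
        events.append((step, "promote", memory_id))
    return events, list(core)


trace = ["A", "A", "B", "C", "D"]
print(replay(trace, capacity=2, policy="LRU")[1])  # ['C', 'D']
print(replay(trace, capacity=2, policy="LFU")[1])  # ['A', 'D']
```

With the same trace, LRU ends up without `A` while LFU keeps it because it was accessed twice. Because the function has no hidden state or randomness, running it twice on the same trace yields the same event list, which is the property that makes policy comparisons meaningful.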
The memory manager must also integrate safety policies. Quarantined or sensitive memories should never be promoted into core without explicit approval. This connects to Projects 7 and 9. A production-grade memory manager therefore combines routing, budgeting, paging, and governance. This project is the capstone: it forces you to integrate every previous concept into a single coherent system.
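One way to enforce that rule is a guard that every promotion must pass. The sketch below is an assumption-laden illustration: the `Candidate` fields, the `approved_ids` set, and the `may_promote` function are hypothetical names, and the 0.7 threshold simply reuses the value from this project's example policy.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    memory_id: str
    importance: float
    quarantined: bool = False  # hypothetical flag set by a governance layer
    sensitive: bool = False    # hypothetical flag for sensitive content


def may_promote(candidate, approved_ids, min_importance=0.7):
    """Return (allowed, reason) for promoting a candidate into core.

    Safety checks run before the importance check so that a high score
    can never override a quarantine."""
    if candidate.quarantined and candidate.memory_id not in approved_ids:
        return False, "quarantined without approval"
    if candidate.sensitive and candidate.memory_id not in approved_ids:
        return False, "sensitive without approval"
    if candidate.importance <= min_importance:
        return False, "importance below threshold"
    return True, "ok"


print(may_promote(Candidate("EP-7", 0.95, quarantined=True), approved_ids=set()))
# (False, 'quarantined without approval')
```

Returning a reason string alongside the decision keeps the guard auditable: the reason can be written straight into the paging log.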
From a systems perspective, the memory manager is a first-class interface between data and behavior. That means it needs explicit invariants (what must always be true, such as core memory never exceeding its token budget), observability (how you know it is true), and failure signatures (how it breaks when it is not). Skipping this and relying on ad-hoc fixes creates hidden coupling between the memory subsystem and the rest of the agent stack. A better approach is to model paging as a pipeline stage with clear inputs, outputs, and preconditions: if inputs violate the contract, the stage should fail fast rather than silently corrupt memory. This matters because memory errors are long-lived and compound over time. You should also define operational metrics that reveal drift early. Examples include the percentage of memory entries that lack required metadata, the ratio of retrieved memories that the model never uses, and the fraction of queries that trigger a fallback route because the primary memory store is empty. These metrics are not just for dashboards; they are design constraints that force you to keep the system testable and predictable.
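The three example metrics can each be computed in a few lines. This sketch assumes memory entries are dictionaries using the field names from this project's record schema, that you already know which retrieved memories the model actually used (how to detect that is left open here), and that each query is logged with a route label such as `"primary"` or `"fallback"`. All function names are illustrative.

```python
REQUIRED_FIELDS = ("memory_id", "tier", "last_used", "importance")


def missing_metadata_rate(entries):
    """Fraction of entries missing at least one required field."""
    if not entries:
        return 0.0
    bad = sum(1 for e in entries if any(e.get(f) is None for f in REQUIRED_FIELDS))
    return bad / len(entries)


def unused_retrieval_rate(retrieved_ids, used_ids):
    """Fraction of retrieved memories the model did not end up using."""
    if not retrieved_ids:
        return 0.0
    used = set(used_ids)
    return sum(1 for m in retrieved_ids if m not in used) / len(retrieved_ids)


def fallback_rate(routes):
    """Fraction of queries that were served by the fallback route."""
    if not routes:
        return 0.0
    return sum(1 for r in routes if r == "fallback") / len(routes)


entries = [
    {"memory_id": "A", "tier": "core", "last_used": "2026-01-01T10:00:00Z", "importance": 0.9},
    {"memory_id": "B", "tier": "archive"},  # lacks last_used and importance
]
print(missing_metadata_rate(entries))                      # 0.5
print(unused_retrieval_rate(["A", "B", "C", "D"], ["A"]))  # 0.75
print(fallback_rate(["primary", "fallback", "primary", "primary"]))  # 0.25
```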
Another critical dimension is lifecycle management. A paging policy may work well at small scale but degrade as the memory grows. This is where policies and thresholds matter: you need rules for promotion, demotion, merging, or deletion that prevent the memory from becoming a landfill. The policy should be deterministic and versioned. When it changes, you should be able to replay historical inputs and measure the delta in outputs. This is the same discipline used in data engineering for schema changes and backfills, and it applies equally to memory systems. Finally, remember that memory is an interface to user trust. If the memory system is noisy, the agent feels unreliable; if it is overly strict, the agent feels forgetful. The best designs expose these trade-offs explicitly, so you can tune them according to product goals rather than guessing.
How this fits in the projects This concept is the capstone and integrates Projects 1-9.
Definitions & key terms
- Core memory: Always-present memories in prompt.
- Archive memory: Retrieved on demand.
- Paging: Moving memory between tiers.
- Eviction policy: Rule for removing memory from core.
Mental model diagram (ASCII)
Core (RAM): [Rules] [Preferences] [Current Task]
Archive (Disk): [Episodic Logs] [Summaries] [Graph]
Paging: archive -> core when needed
How It Works (Step-by-Step)
- Maintain a core memory set with fixed size.
- When a query arrives, retrieve candidates from archive.
- Score candidates by importance and recency.
- Promote top candidates into core (paging).
- Evict low-priority core memories if budget exceeded.
- Assemble prompt with core + retrieved memories.
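The scoring, promotion, and eviction steps above can be sketched in one function. Everything here is an assumption for illustration: the `Mem` fields, the `pinned` flag (for memories such as system rules that must never be evicted), the exponential recency decay with a one-hour half-life, and the 0.7/0.3 weighting between importance and recency.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    memory_id: str
    tokens: int
    importance: float     # assumed to lie in [0, 1]
    last_used: float      # seconds since some epoch
    pinned: bool = False  # pinned memories are never evicted


def score(mem, now, half_life_s=3600.0, w_importance=0.7):
    # Recency halves every half_life_s seconds since the memory was last used.
    recency = 0.5 ** ((now - mem.last_used) / half_life_s)
    return w_importance * mem.importance + (1 - w_importance) * recency


def page_in(core, candidates, now, budget_tokens, top_k=2):
    """Promote the top_k candidates into core, then evict the lowest-scoring
    unpinned memories until core fits the token budget. Mutates `core`."""
    events = []
    ranked = sorted(candidates, key=lambda m: score(m, now), reverse=True)
    for mem in ranked[:top_k]:
        if mem.memory_id in core:
            continue  # already resident
        core[mem.memory_id] = mem
        events.append(("promote", mem.memory_id))
    while sum(m.tokens for m in core.values()) > budget_tokens:
        evictable = [m for m in core.values() if not m.pinned]
        if not evictable:
            raise RuntimeError("pinned memories alone exceed the budget")
        victim = min(evictable, key=lambda m: score(m, now))
        del core[victim.memory_id]
        events.append(("evict", victim.memory_id))
    return events


now = 10_000.0
core = {
    "RULE": Mem("RULE", 200, 1.0, now, pinned=True),
    "OLD": Mem("OLD", 300, 0.3, now - 7200),
}
candidates = [Mem("NEW1", 200, 0.9, now - 3600), Mem("NEW2", 100, 0.2, 0.0)]
print(page_in(core, candidates, now, budget_tokens=600, top_k=1))
# [('promote', 'NEW1'), ('evict', 'OLD')]
```

Promoting first and evicting afterwards means a freshly promoted memory can itself be the eviction victim if it scores lowest. That is a deliberate simplification. A stricter design would reject such a promotion up front.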
Minimal Concrete Example
policy:
  core_budget_tokens: 600
  eviction: LRU
  promote_if: importance > 0.7
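A policy like this has to be validated before the pager can trust it. The sketch below assumes the configuration has already been parsed into a dictionary (for example by a YAML library) and that `promote_if` only ever takes the form `importance > <number>`. The `PagingPolicy` class and `load_policy` function are hypothetical names. Parsing the rule with a regular expression avoids calling `eval` on configuration text.

```python
import re
from dataclasses import dataclass

# Accepts only rules of the form "importance > <number>".
_RULE = re.compile(r"^\s*importance\s*>\s*([0-9]*\.?[0-9]+)\s*$")


@dataclass(frozen=True)
class PagingPolicy:
    core_budget_tokens: int
    eviction: str
    promote_threshold: float

    def should_promote(self, importance: float) -> bool:
        return importance > self.promote_threshold


def load_policy(raw: dict) -> PagingPolicy:
    """Validate a parsed policy mapping and turn it into a typed object."""
    budget = int(raw["core_budget_tokens"])
    if budget <= 0:
        raise ValueError("core_budget_tokens must be positive")
    if raw["eviction"] not in {"LRU", "LFU"}:
        raise ValueError(f"unsupported eviction policy: {raw['eviction']}")
    match = _RULE.match(raw["promote_if"])
    if match is None:
        raise ValueError(f"unsupported promote_if rule: {raw['promote_if']}")
    return PagingPolicy(budget, raw["eviction"], float(match.group(1)))


policy = load_policy(
    {"core_budget_tokens": 600, "eviction": "LRU", "promote_if": "importance > 0.7"}
)
print(policy.should_promote(0.9), policy.should_promote(0.7))  # True False
```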
Common Misconceptions
- “Paging is only about size.” (False: it also affects consistency.)
- “Core memory never changes.” (False: it should evolve carefully.)
Check-Your-Understanding Questions
- Why is prompt treated as RAM?
- What is the role of eviction policies?
- How do you ensure paging is auditable?
Check-Your-Understanding Answers
- The context window is limited and expensive, like RAM.
- They remove low-priority memories to make room.
- Log every promotion and eviction.
Real-World Applications
- Long-running personal assistants.
- Multi-agent platforms with persistent memory.
Where You’ll Apply It
- In this project: §5.4 Concepts You Must Understand First and §6 Testing Strategy.
- Also used in: Project 4, Project 7, Project 9.
References
- MemGPT - https://arxiv.org/abs/2310.08560
Key Insights Memory tiering turns the context window into a managed resource rather than a passive buffer.
Summary An OS-style memory manager integrates routing, paging, and governance to make agent memory reliable at scale.
Homework/Exercises to Practice the Concept
- Define a core memory policy for a personal assistant.
- Simulate a paging decision with five candidate memories.
Solutions to the Homework/Exercises
- Core: system rules, top preferences, current tasks.
- Promote the most important and most recent memories; evict the least recently used.
3. Project Specification
3.1 What You Will Build
A memory manager that:
- Maintains core and archive tiers
- Applies paging and eviction policies
- Assembles prompts with budgets
- Logs decisions and supports replay
3.2 Functional Requirements
- Tier Management: Separate core and archive memory.
- Paging Policy: Promote and evict memories.
- Prompt Assembly: Enforce token budgets.
- Replay Mode: Re-run conversations with different policies.
3.3 Non-Functional Requirements
- Performance: Paging decisions complete in under 100 ms.
- Reliability: Deterministic replay with fixed seeds.
- Usability: Clear logs of promotions and evictions.
3.4 Example Usage / Output
$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[PROMPT] memory_tokens=620
3.5 Data Formats / Schemas / Protocols
{
  "memory_id": "PRF-00012",
  "tier": "core",
  "last_used": "2026-01-01T10:00:00Z",
  "importance": 0.9
}
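A record in this format can be loaded into a typed object with a little validation. The sketch below is one possible approach. The `MemoryRecord` class and `parse_record` function are illustrative names, and the constraints that `tier` is either `core` or `archive` and that `importance` lies in [0, 1] are assumptions, since the schema above does not state them.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    memory_id: str
    tier: str
    last_used: datetime
    importance: float


def parse_record(raw: str) -> MemoryRecord:
    """Parse one JSON memory record, rejecting malformed values early."""
    data = json.loads(raw)
    if data["tier"] not in ("core", "archive"):  # assumed set of tiers
        raise ValueError(f"unknown tier: {data['tier']}")
    importance = float(data["importance"])
    if not 0.0 <= importance <= 1.0:  # assumed range
        raise ValueError(f"importance out of range: {importance}")
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    last_used = datetime.fromisoformat(data["last_used"].replace("Z", "+00:00"))
    return MemoryRecord(data["memory_id"], data["tier"], last_used, importance)


record = parse_record(
    '{"memory_id": "PRF-00012", "tier": "core", '
    '"last_used": "2026-01-01T10:00:00Z", "importance": 0.9}'
)
print(record.last_used == datetime(2026, 1, 1, 10, 0, tzinfo=timezone.utc))  # True
```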
3.6 Edge Cases
- Core memory full
- No archive candidates
- Conflicting promotions
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
$ memos run --query "Summarize my preferences"
$ memos replay --session session_01 --policy LRU
3.7.2 Golden Path Demo (Deterministic)
$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[RESULT] response includes PRF-00012
exit_code=0
3.7.3 Failure Demo (Deterministic)
$ memos run --query ""
[ERROR] empty query
exit_code=1
4. Solution Architecture
4.1 High-Level Design
Query -> Router -> Archive Retrieval -> Paging -> Prompt Assembly -> LLM
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Tier Manager | Maintain core/archive | Budget sizes |
| Pager | Promote/evict memories | Policy choice |
| Assembler | Build prompt | Anchor placement |
| Logger | Record decisions | Replay format |
4.3 Data Structures (No Full Code)
PagingEvent:
  memory_id: string
  action: promote/evict
  reason: string
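In Python, this structure maps naturally onto a frozen dataclass, and writing one JSON object per line gives a simple replay log. The sketch below is one possible encoding. The JSON Lines format and the `dump_events` and `load_events` helpers are assumptions, not a prescribed replay format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PagingEvent:
    memory_id: str
    action: str  # "promote" or "evict"
    reason: str

    def __post_init__(self):
        if self.action not in ("promote", "evict"):
            raise ValueError(f"unknown action: {self.action}")


def dump_events(events) -> str:
    """Serialize events as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def load_events(text: str):
    """Inverse of dump_events; blank lines are ignored."""
    return [PagingEvent(**json.loads(line)) for line in text.splitlines() if line.strip()]


log = [
    PagingEvent("PRF-00012", "promote", "importance > 0.7"),
    PagingEvent("EP-00003", "evict", "least recently used"),
]
print(load_events(dump_events(log)) == log)  # True
```

Sorting the keys makes the serialized log byte-for-byte stable, so two runs can be compared with a plain text diff.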
4.4 Algorithm Overview
- Retrieve candidates from archive.
- Rank by importance and recency.
- Promote top candidates.
- Evict low-priority core memories.
- Assemble prompt with budget.
5. Implementation Guide
5.1 Development Environment Setup
- Configure memory tier sizes
- Set up replay logging
5.2 Project Structure
project-root/
├── src/
│   ├── tier/
│   ├── paging/
│   ├── assemble/
│   └── replay/
5.3 The Core Question You’re Answering
“How do I manage memory like an operating system manages RAM?”
5.4 Concepts You Must Understand First
- Memory tiering
- Eviction policies
5.5 Questions to Guide Your Design
- What belongs in core memory?
- How do you choose eviction policies?
5.6 Thinking Exercise
Create a paging table with five memories and simulate an eviction.
5.7 The Interview Questions They’ll Ask
- “How is agent memory like OS memory?”
- “What is a paging policy?”
- “How do you choose core memory?”
- “How do you evaluate memory manager quality?”
- “Why is replay important?”
5.8 Hints in Layers
- Hint 1: Start with fixed budgets
- Hint 2: Add LRU eviction
- Hint 3: Log every paging event
- Hint 4: Implement replay mode
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Storage | “Designing Data-Intensive Applications” | Ch. 3 |
| Architecture | “Fundamentals of Software Architecture” | Ch. 2 |
5.10 Implementation Phases
Phase 1: Foundation
- Core/archive separation
Phase 2: Core
- Paging and assembly
Phase 3: Polish
- Replay and evaluation
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Eviction | LRU / LFU | LRU | Simple and effective |
| Promotion | Recency / Importance | Importance + recency | Balanced relevance |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Paging logic | Promote/evict |
| Integration | Full run | Query -> prompt |
| Edge | Budget overflow | Trimming |
6.2 Critical Test Cases
- Core budget never exceeded.
- Paging events logged.
- Replay produces identical output.
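These three cases translate directly into tests. The sketch below uses a toy `TinyPager` so the tests have something to run against. The class, the `run_session` driver, and the synthetic workload are all hypothetical stand-ins for your real manager. The test functions use plain `assert`, so a runner such as pytest can collect them or they can be called directly.

```python
import random
from collections import OrderedDict


class TinyPager:
    """Toy LRU pager with a token budget, only here to give the tests a target."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.core = OrderedDict()  # memory_id -> token cost, oldest first
        self.log = []

    def touch(self, memory_id, tokens):
        if tokens > self.budget_tokens:
            self.log.append(("reject", memory_id))  # can never fit
            return
        if memory_id in self.core:
            self.core.move_to_end(memory_id)
            return
        while sum(self.core.values()) + tokens > self.budget_tokens:
            victim, _ = self.core.popitem(last=False)  # evict least recently used
            self.log.append(("evict", victim))
        self.core[memory_id] = tokens
        self.log.append(("promote", memory_id))


def run_session(seed, steps=200, budget_tokens=600):
    rng = random.Random(seed)  # fixed seed makes the session replayable
    pager = TinyPager(budget_tokens)
    peak = 0
    for _ in range(steps):
        idx = rng.randrange(20)
        pager.touch(f"M-{idx}", tokens=50 + 10 * idx)
        peak = max(peak, sum(pager.core.values()))
    return pager, peak


def test_core_budget_never_exceeded():
    _, peak = run_session(seed=1)
    assert peak <= 600


def test_paging_events_logged():
    pager, _ = run_session(seed=1)
    promoted = {m for action, m in pager.log if action == "promote"}
    assert set(pager.core) <= promoted  # nothing is in core without a log entry


def test_replay_is_identical():
    first, _ = run_session(seed=42)
    second, _ = run_session(seed=42)
    assert first.log == second.log
    assert list(first.core) == list(second.core)
```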
6.3 Test Data
query: "Summarize my preferences"
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Over-promotion | Prompt overflow | Tighten promotion rules |
| Under-promotion | Agent forgets | Raise importance weights |
| No replay logs | Hard to debug | Log every decision |
7.2 Debugging Strategies
- Replay sessions with different policies.
- Compare recall probes across versions.
7.3 Performance Traps
- Excessive archive retrieval on every query.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add summary tier between core and archive
8.2 Intermediate Extensions
- Add adaptive eviction rules
8.3 Advanced Extensions
- Add multi-agent shared memory tier
9. Real-World Connections
9.1 Industry Applications
- Long-running agent platforms
9.2 Related Open Source Projects
- MemGPT
9.3 Interview Relevance
- System design and memory management topics.
10. Resources
10.1 Essential Reading
- MemGPT paper
10.2 Video Resources
- Talks on agent memory architectures
10.3 Tools & Documentation
- Vector database docs
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain tiering and paging.
11.2 Implementation
- Paging and replay work correctly.
11.3 Growth
- I can justify eviction and promotion policies.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Tiering and paging implemented
Full Completion:
- Replay and evaluation metrics
Excellence (Going Above & Beyond):
- Adaptive policies and multi-agent memory