Project 10: OS-Style Memory Manager (MemGPT-Inspired)

Build a hierarchical memory manager that pages memories in and out of the prompt like an operating system.

Quick Reference

Attribute | Value
Difficulty | Level 4
Time Estimate | 4-6 weeks
Main Programming Language | Python
Alternative Programming Languages | Rust, Go
Coolness Level | Level 4
Business Potential | Level 4
Prerequisites | Routing, vector memory, summarization
Key Topics | Memory tiers, paging policies, prompt assembly

1. Learning Objectives

By completing this project, you will:

  1. Design a multi-tier memory hierarchy.
  2. Implement paging policies to move memory between tiers.
  3. Assemble prompts with stable anchors and budgets.
  4. Evaluate memory manager quality using probes and replay.

2. All Theory Needed (Per-Concept Breakdown)

OS-Style Memory Tiering and Paging Policies

Fundamentals

An OS-style memory manager treats the prompt as limited RAM and external memory stores as disk. Core memories stay in the prompt, while archival memories are paged in only when needed. This requires explicit policies for promotion and eviction, the two halves of paging. Without these policies, agents either forget critical context or overload the prompt.

Deep Dive into the concept

MemGPT proposes a memory hierarchy for agents that separates what sits in the context window from what lives in external storage. This project models that hierarchy as a small core memory (always in context), a larger archival memory (retrieved on demand), and optional summary tiers that compress history. This mirrors operating system memory management: the prompt is RAM, the vector store is disk, and the memory manager is the OS kernel. The key is that paging decisions must be explicit and auditable. You must decide what stays in core (e.g., system rules, stable preferences, current task state) and what lives in archive (episodic logs, long histories). Paging policies decide when to promote a memory into core and when to evict it. Common signals include recency, importance, and retrieval frequency.

Prompt assembly is the last step. The memory manager must inject core memory into a stable anchor position, then add retrieved archival memories within a defined budget. If the prompt grows too large, the manager must drop or summarize lower-priority memories. This is similar to cache eviction policies like LRU (least recently used) or LFU (least frequently used), but applied to text. You will implement these policies and observe how they affect recall and quality.
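The assembly step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Memory` class, the `priority` field, the section labels, and the whitespace-based `count_tokens` are all assumptions made for the example. A real system would use the tokenizer of the target model and would also count the labels against the budget.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    memory_id: str
    text: str
    priority: float  # hypothetical field: higher means keep first


def count_tokens(text):
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def assemble_prompt(core, retrieved, query, budget_tokens):
    """Return (prompt, dropped_ids). Core always goes first as the anchor."""
    core_block = "\n".join(m.text for m in core)
    used = count_tokens(core_block) + count_tokens(query)
    kept, dropped = [], []
    # Spend the remaining budget on the highest-priority retrieved memories.
    for m in sorted(retrieved, key=lambda m: m.priority, reverse=True):
        cost = count_tokens(m.text)
        if used + cost <= budget_tokens:
            kept.append(m)
            used += cost
        else:
            dropped.append(m.memory_id)
    sections = ["[CORE]", core_block,
                "[RETRIEVED]", "\n".join(m.text for m in kept),
                "[QUERY]", query]
    return "\n".join(sections), dropped


core = [Memory("R1", "Always answer concisely.", 1.0)]
retrieved = [
    Memory("E1", "User prefers dark mode.", 0.9),
    Memory("E2", "User once asked about the weather in Paris last spring.", 0.2),
]
prompt, dropped = assemble_prompt(core, retrieved,
                                  "Summarize my preferences", budget_tokens=12)
```

With a 12-token budget, the core text and the query use 6 tokens, `E1` fits, and `E2` is dropped. Summarizing dropped memories instead of discarding them would be a natural next step.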

Paging introduces a new problem: consistency. If a memory is promoted into core and then later demoted, the agent’s behavior may change unexpectedly. To handle this, paging should be logged and replayable. A replay mode allows you to run the same conversation under different paging policies and compare outcomes. This makes memory management a testable engineering discipline rather than a heuristic art.
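One way to make replay concrete is to record the sequence of memory accesses and re-run it under different eviction policies. The sketch below assumes a slot-based core (capacity counted in memories, not tokens) and a hypothetical access trace; the function name `replay` and the log tuple shape are illustrative choices, not part of any existing tool.

```python
from collections import Counter, OrderedDict


def replay(trace, capacity, policy):
    """Replay a recorded access trace under one eviction policy.

    Returns the paging log as (step, action, memory_id) tuples.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    core = OrderedDict()  # keys ordered least recently used first
    hits = Counter()      # access counts, consulted only by LFU
    log = []
    for step, memory_id in enumerate(trace):
        hits[memory_id] += 1
        if memory_id in core:
            core.move_to_end(memory_id)  # refresh recency on a hit
            continue
        if len(core) >= capacity:
            if policy == "LRU":
                victim = next(iter(core))
            elif policy == "LFU":
                # min() keeps the first minimum, so ties fall to the
                # least recently used entry and stay deterministic.
                victim = min(core, key=lambda m: hits[m])
            else:
                raise ValueError(f"unknown policy: {policy}")
            del core[victim]
            log.append((step, "evict", victim))
        core[memory_id] = None
        log.append((step, "promote", memory_id))
    return log


trace = ["A", "B", "A", "C", "B", "D"]  # hypothetical recorded session
lru_log = replay(trace, capacity=2, policy="LRU")
lfu_log = replay(trace, capacity=2, policy="LFU")
lru_evicted = [m for _, action, m in lru_log if action == "evict"]
lfu_evicted = [m for _, action, m in lfu_log if action == "evict"]
```

On this trace the two policies evict the same memories in a different order (`B, A, C` for LRU versus `B, C, A` for LFU), which is exactly the kind of difference a replay log lets you inspect.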

The memory manager must also integrate safety policies. Quarantined or sensitive memories should never be promoted into core without explicit approval. This connects to Projects 7 and 9. A production-grade memory manager therefore combines routing, budgeting, paging, and governance. This project is the capstone: it forces you to integrate every previous concept into a single coherent system.
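A promotion gate is one simple way to enforce that rule. The sketch below is illustrative only: the `quarantined` and `sensitive` flags, the `approved_ids` set, and the 0.7 threshold are assumptions for the example, and the real flags would come from whatever governance layer you built earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    importance: float
    quarantined: bool = False  # hypothetical governance flag
    sensitive: bool = False    # hypothetical governance flag


def can_promote(candidate, approved_ids, threshold=0.7):
    """Return (allowed, reason) so the decision can be logged."""
    restricted = candidate.quarantined or candidate.sensitive
    if restricted and candidate.memory_id not in approved_ids:
        return False, "blocked: needs explicit approval"
    if candidate.importance <= threshold:
        return False, "below importance threshold"
    return True, "ok"


approved = {"M2"}
decisions = [
    can_promote(Candidate("M1", 0.9, quarantined=True), approved),
    can_promote(Candidate("M2", 0.9, sensitive=True), approved),
    can_promote(Candidate("M3", 0.5), approved),
]
```

The safety check runs before the importance check on purpose, so a high score can never override a missing approval.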

From a systems perspective, the paging layer must be treated as a first-class interface between data and behavior. That means you need explicit invariants (what must always be true), observability (how you know it is true), and failure signatures (how it breaks when it is not). In practice, engineers often skip this and rely on ad-hoc fixes, which creates hidden coupling between the memory subsystem and the rest of the agent stack. A better approach is to model paging as a pipeline stage with clear inputs, outputs, and preconditions: if inputs violate the contract, the stage should fail fast rather than silently corrupt memory. This is especially important because memory errors are long-lived and compound over time. You should also define operational metrics that reveal drift early. Examples include: the percentage of memory entries that lack required metadata, the ratio of retrieved memories that are later unused by the model, or the fraction of queries that trigger a fallback route because the primary memory store is empty. These metrics are not just for dashboards; they are design constraints that force you to keep the system testable and predictable.
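The three example metrics can be computed with a few small functions. This is a sketch under stated assumptions: the required field names are borrowed from the memory record schema in this project, while the route labels (`"primary"`, `"fallback"`) and the way "used" memories are identified are placeholders you would define yourself.

```python
REQUIRED_FIELDS = ("memory_id", "tier", "last_used", "importance")


def missing_metadata_rate(entries):
    """Fraction of entries with a required field absent or None."""
    if not entries:
        return 0.0
    bad = sum(
        1 for e in entries
        if any(e.get(field) is None for field in REQUIRED_FIELDS)
    )
    return bad / len(entries)


def unused_retrieval_ratio(retrieved_ids, used_ids):
    """Fraction of retrieved memories the model did not end up using."""
    if not retrieved_ids:
        return 0.0
    used = set(used_ids)
    return sum(1 for m in retrieved_ids if m not in used) / len(retrieved_ids)


def fallback_rate(routes):
    """Fraction of queries that took the fallback route."""
    if not routes:
        return 0.0
    return routes.count("fallback") / len(routes)


entries = [
    {"memory_id": "A", "tier": "core", "last_used": "t1", "importance": 0.9},
    {"memory_id": "B", "tier": "archive", "last_used": "t2", "importance": 0.4},
    {"memory_id": "C", "tier": "archive", "last_used": "t3"},  # no importance
    {"memory_id": "D", "tier": "core", "last_used": "t4", "importance": 0.7},
]
metadata_gap = missing_metadata_rate(entries)
waste = unused_retrieval_ratio(["A", "B", "C", "D"], ["A"])
fallbacks = fallback_rate(["primary", "fallback", "primary", "primary"])
```

Tracking these per policy version makes it possible to turn them into thresholds that fail a build or trigger an alert.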

Another critical dimension is lifecycle management. Tiering may work well at small scale but degrade as the memory grows. This is where policies and thresholds matter: you need rules for promotion, demotion, merging, or deletion that prevent the memory from becoming a landfill. The policy should be deterministic and versioned. When it changes, you should be able to replay historical inputs and measure the delta in outputs. This is the same discipline used in data engineering for schema changes and backfills, and it applies equally to memory systems. Finally, remember that memory is an interface to user trust. If the memory system is noisy, the agent feels unreliable; if it is overly strict, the agent feels forgetful. The best designs expose these trade-offs explicitly, so you can tune them according to product goals rather than guessing in the dark.

How this fits in the projects

This concept is the capstone and integrates Projects 1-9.

Definitions & key terms

  • Core memory: Memories that are always present in the prompt.
  • Archive memory: Retrieved on demand.
  • Paging: Moving memory between tiers.
  • Eviction policy: Rule for removing memory from core.

Mental model diagram (ASCII)

Core (RAM):    [Rules] [Preferences] [Current Task]
Archive (Disk): [Episodic Logs] [Summaries] [Graph]
Paging: archive -> core when needed

How It Works (Step-by-Step)

  1. Maintain a core memory set with fixed size.
  2. When a query arrives, retrieve candidates from archive.
  3. Score candidates by importance and recency.
  4. Promote top candidates into core (paging).
  5. Evict low-priority core memories if the budget is exceeded.
  6. Assemble prompt with core + retrieved memories.

Minimal Concrete Example

policy:
  core_budget_tokens: 600
  eviction: LRU
  promote_if: importance > 0.7
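To show how such a policy might drive a single paging step, here is a minimal sketch. The `Mem` class, the logical `last_used` tick, and the memory IDs other than `PRF-00012` are hypothetical; the policy values mirror the example above, with `promote_if` expressed as a numeric threshold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mem:
    memory_id: str
    tokens: int
    importance: float
    last_used: int  # logical clock tick; larger means more recent


POLICY = {"core_budget_tokens": 600, "eviction": "LRU",
          "promote_threshold": 0.7}


def apply_policy(core, candidates, now, policy=POLICY):
    """Promote qualifying candidates, then evict LRU entries to fit budget."""
    core = list(core)
    events = []
    for c in candidates:
        if c.importance > policy["promote_threshold"]:
            core.append(replace(c, last_used=now))  # promotion counts as use
            events.append(("promote", c.memory_id))
    while sum(m.tokens for m in core) > policy["core_budget_tokens"]:
        victim = min(core, key=lambda m: m.last_used)  # least recently used
        core.remove(victim)
        events.append(("evict", victim.memory_id))
    return core, events


core = [
    Mem("RULES", 200, 1.0, last_used=5),
    Mem("PRF-00012", 250, 0.9, last_used=3),
    Mem("OLD-TASK", 100, 0.5, last_used=1),
]
candidates = [Mem("EP-1", 120, 0.8, last_used=0),
              Mem("EP-2", 80, 0.4, last_used=0)]
new_core, events = apply_policy(core, candidates, now=6)
```

Here `EP-1` is promoted, the core grows to 670 tokens, and `OLD-TASK` is evicted to bring it back to 570. A production version would likely pin system rules so that they can never be chosen as the victim.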

Common Misconceptions

  • “Paging is only about size.” (False: it also affects consistency.)
  • “Core memory never changes.” (False: it should evolve carefully.)

Check-Your-Understanding Questions

  1. Why is the prompt treated as RAM?
  2. What is the role of eviction policies?
  3. How do you ensure paging is auditable?

Check-Your-Understanding Answers

  1. It is limited and expensive, like RAM.
  2. They remove low-priority memories to make room.
  3. Log every promotion and eviction.

Real-World Applications

  • Long-running personal assistants.
  • Multi-agent platforms with persistent memory.

Where You’ll Apply It

References

  • MemGPT - https://arxiv.org/abs/2310.08560

Key Insights

Memory tiering turns the context window into a managed resource rather than a passive buffer.

Summary

An OS-style memory manager integrates routing, paging, and governance to make agent memory reliable at scale.

Homework/Exercises to Practice the Concept

  1. Define a core memory policy for a personal assistant.
  2. Simulate a paging decision with five candidate memories.

Solutions to the Homework/Exercises

  1. Core: system rules, top preferences, current tasks.
  2. Promote the most important and most recent memories; evict the least recently used.

3. Project Specification

3.1 What You Will Build

A memory manager that:

  • Maintains core and archive tiers
  • Applies paging and eviction policies
  • Assembles prompts with budgets
  • Logs decisions and supports replay

3.2 Functional Requirements

  1. Tier Management: Separate core and archive memory.
  2. Paging Policy: Promote and evict memories.
  3. Prompt Assembly: Enforce token budgets.
  4. Replay Mode: Re-run conversations with different policies.

3.3 Non-Functional Requirements

  • Performance: Paging decisions < 100ms.
  • Reliability: Deterministic replay with fixed seeds.
  • Usability: Clear logs of promotions and evictions.

3.4 Example Usage / Output

$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[PROMPT] memory_tokens=620

3.5 Data Formats / Schemas / Protocols

{
  "memory_id": "PRF-00012",
  "tier": "core",
  "last_used": "2026-01-01T10:00:00Z",
  "importance": 0.9
}
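A record in this shape can be parsed into a typed object that fails fast on bad input. The sketch below makes two assumptions that the schema itself does not state: that `tier` is either `"core"` or `"archive"`, and that `importance` lies in the range 0 to 1.

```python
import json
from dataclasses import dataclass
from datetime import datetime

VALID_TIERS = ("core", "archive")  # assumption: only two tiers exist


@dataclass(frozen=True)
class MemoryRecord:
    memory_id: str
    tier: str
    last_used: datetime
    importance: float


def parse_record(raw):
    """Parse one JSON record, raising ValueError on contract violations."""
    data = json.loads(raw)
    if data["tier"] not in VALID_TIERS:
        raise ValueError(f"unknown tier: {data['tier']}")
    importance = float(data["importance"])
    if not 0.0 <= importance <= 1.0:  # assumed range
        raise ValueError(f"importance out of range: {importance}")
    # Older Python versions reject a trailing "Z", so normalize it first.
    last_used = datetime.fromisoformat(data["last_used"].replace("Z", "+00:00"))
    return MemoryRecord(data["memory_id"], data["tier"], last_used, importance)


raw = ('{"memory_id": "PRF-00012", "tier": "core", '
       '"last_used": "2026-01-01T10:00:00Z", "importance": 0.9}')
record = parse_record(raw)
```

Keeping `last_used` as a timezone-aware `datetime` avoids subtle ordering bugs when LRU decisions compare timestamps from different sources.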

3.6 Edge Cases

  • Core memory full
  • No archive candidates
  • Conflicting promotions

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

$ memos run --query "Summarize my preferences"
$ memos replay --session session_01 --policy LRU

3.7.2 Golden Path Demo (Deterministic)

$ memos run --query "Summarize my preferences"
[CORE] 3 memories loaded
[PAGING] 2 memories promoted
[RESULT] response includes PRF-00012
exit_code=0

3.7.3 Failure Demo (Deterministic)

$ memos run --query "" 
[ERROR] empty query
exit_code=1
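The failure path above could be handled at the command-line entry point. The following sketch uses the standard `argparse` module; the `main` function and its structure are assumptions about how you might organize your own `memos` tool, and the success branch is left as a placeholder.

```python
import argparse


def main(argv):
    """Entry point sketch for a hypothetical `memos` CLI; returns exit code."""
    parser = argparse.ArgumentParser(prog="memos")
    subcommands = parser.add_subparsers(dest="command", required=True)
    run = subcommands.add_parser("run")
    run.add_argument("--query", required=True)
    args = parser.parse_args(argv)

    if not args.query.strip():
        # Reject empty or whitespace-only queries before touching memory.
        print("[ERROR] empty query")
        return 1
    # Placeholder: load core, page in candidates, assemble the prompt.
    return 0


empty_code = main(["run", "--query", ""])
ok_code = main(["run", "--query", "Summarize my preferences"])
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the function easy to test.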

4. Solution Architecture

4.1 High-Level Design

Query -> Router -> Archive Retrieval -> Paging -> Prompt Assembly -> LLM

4.2 Key Components

Component | Responsibility | Key Decisions
Tier Manager | Maintain core/archive | Budget sizes
Pager | Promote/evict memories | Policy choice
Assembler | Build prompt | Anchor placement
Logger | Record decisions | Replay format

4.3 Data Structures (No Full Code)

PagingEvent:
  memory_id: string
  action: promote/evict
  reason: string
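In Python, this structure might be expressed as a small dataclass with a line-oriented log format. The JSON Lines encoding and the helper names `to_jsonl` and `from_jsonl` are assumptions chosen for the sketch; any append-only format that round-trips cleanly would serve replay equally well.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PagingEvent:
    memory_id: str
    action: str  # "promote" or "evict"
    reason: str

    def __post_init__(self):
        if self.action not in ("promote", "evict"):
            raise ValueError(f"unknown action: {self.action}")


def to_jsonl(events):
    """Serialize events as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def from_jsonl(text):
    """Rebuild events from a JSON Lines string, skipping blank lines."""
    return [PagingEvent(**json.loads(line))
            for line in text.splitlines() if line.strip()]


events = [
    PagingEvent("PRF-00012", "promote", "importance above threshold"),
    PagingEvent("OLD-TASK", "evict", "least recently used"),
]
encoded = to_jsonl(events)
decoded = from_jsonl(encoded)
```

Sorting keys during serialization keeps the log byte-for-byte stable, which helps when comparing replay outputs.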

4.4 Algorithm Overview

  1. Retrieve candidates from archive.
  2. Rank by importance and recency.
  3. Promote top candidates.
  4. Evict low-priority core memories.
  5. Assemble prompt with budget.

5. Implementation Guide

5.1 Development Environment Setup

- Configure memory tier sizes
- Set up replay logging

5.2 Project Structure

project-root/
├── src/
│   ├── tier/
│   ├── paging/
│   ├── assemble/
│   └── replay/

5.3 The Core Question You’re Answering

“How do I manage memory like an operating system manages RAM?”

5.4 Concepts You Must Understand First

  1. Memory tiering
  2. Eviction policies

5.5 Questions to Guide Your Design

  1. What belongs in core memory?
  2. How do you choose eviction policies?

5.6 Thinking Exercise

Create a paging table with five memories and simulate an eviction.

5.7 The Interview Questions They’ll Ask

  1. “How is agent memory like OS memory?”
  2. “What is a paging policy?”
  3. “How do you choose core memory?”
  4. “How do you evaluate memory manager quality?”
  5. “Why is replay important?”

5.8 Hints in Layers

Hint 1: Start with fixed budgets
Hint 2: Add LRU eviction
Hint 3: Log every paging event
Hint 4: Implement replay mode

5.9 Books That Will Help

Topic | Book | Chapter
Storage | “Designing Data-Intensive Applications” | Ch. 3
Architecture | “Fundamentals of Software Architecture” | Ch. 2

5.10 Implementation Phases

Phase 1: Foundation

  • Core/archive separation

Phase 2: Core

  • Paging and assembly

Phase 3: Polish

  • Replay and evaluation

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Eviction | LRU / LFU | LRU | Simple and effective
Promotion | Recency / Importance | Importance + recency | Balanced relevance
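One possible way to combine importance and recency into a single promotion score is a weighted sum with exponential recency decay. The weights (0.7 and 0.3) and the one-hour half-life below are illustrative assumptions, not recommended values; they are the knobs you would tune through replay.

```python
import math


def promotion_score(importance, age_seconds,
                    half_life_seconds=3600.0,
                    w_importance=0.7, w_recency=0.3):
    """Weighted sum of importance and an exponentially decaying recency."""
    if age_seconds < 0:
        raise ValueError("age_seconds must be non-negative")
    # Recency is 1.0 for a brand-new memory and halves every half-life.
    recency = 0.5 ** (age_seconds / half_life_seconds)
    return w_importance * importance + w_recency * recency


fresh_important = promotion_score(1.0, age_seconds=0)
hour_old_trivial = promotion_score(0.0, age_seconds=3600)
ranked = sorted(
    [("old-but-vital", promotion_score(0.9, 86400)),
     ("new-but-minor", promotion_score(0.2, 60))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

With these weights, a day-old memory with importance 0.9 still outranks a minute-old memory with importance 0.2, which reflects the recommendation to let importance dominate while recency breaks near-ties.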

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit | Paging logic | Promote/evict
Integration | Full run | Query -> prompt
Edge | Budget overflow | Trimming

6.2 Critical Test Cases

  1. Core budget never exceeded.
  2. Paging events logged.
  3. Replay produces identical output.

6.3 Test Data

query: "Summarize my preferences"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Over-promotion | Prompt overflow | Tighten promotion rules
Under-promotion | Agent forgets | Raise importance weights
No replay logs | Hard to debug | Log every decision

7.2 Debugging Strategies

  • Replay sessions with different policies.
  • Compare recall probes across versions.

7.3 Performance Traps

  • Excessive archive retrieval each query.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add summary tier between core and archive

8.2 Intermediate Extensions

  • Add adaptive eviction rules

8.3 Advanced Extensions

  • Add multi-agent shared memory tier

9. Real-World Connections

9.1 Industry Applications

  • Long-running agent platforms
  • MemGPT

9.3 Interview Relevance

  • System design and memory management topics.

10. Resources

10.1 Essential Reading

  • MemGPT paper

10.2 Video Resources

  • Talks on agent memory architectures

10.3 Tools & Documentation

  • Vector database docs

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain tiering and paging.

11.2 Implementation

  • Paging and replay work correctly.

11.3 Growth

  • I can justify eviction and promotion policies.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Tiering and paging implemented

Full Completion:

  • Replay and evaluation metrics

Excellence (Going Above & Beyond):

  • Adaptive policies and multi-agent memory