Project 4: Hybrid Memory Router

Build a policy-based router that decides which memory stores to query and how to merge results.

Quick Reference

Attribute	Value
Difficulty	Level 3
Time Estimate	2-3 weeks
Main Programming Language	Python
Alternative Programming Languages	TypeScript, Go
Coolness Level	Level 3
Business Potential	Level 3
Prerequisites	Memory taxonomy, vector search basics
Key Topics	Routing policies, prompt budgeting, conflict resolution

1. Learning Objectives

By completing this project, you will:

Design routing policies for different memory types.
Merge results from multiple memory stores.
Enforce a memory token budget.
Explain retrieval decisions with a debug trace.

2. All Theory Needed (Per-Concept Breakdown)

Memory Routing Policies and Prompt Budgeting

Fundamentals A hybrid memory router decides which memory stores to query and how to allocate limited prompt space. It is the layer that makes memory practical: without routing, every store is queried and the prompt overflows. Routing policies combine query intent, memory type, recency, and confidence. Prompt budgeting limits how much memory is injected, ensuring that high-priority memories fit within the context window.

Deep Dive into the concept Routing is a policy problem, not a storage problem. The router must interpret the query and decide which memory types are relevant. A question like “What did we decide last week?” should prioritize episodic memory and summaries, while “What tools did we use?” should prioritize procedural memory. You can implement this via explicit rules, lightweight classifiers, or an LLM that selects memory types (retrieval-of-retrieval). Whichever approach you choose, the router must be deterministic and auditable, because evaluation results are only reproducible if the same query always yields the same routing decision. An LLM-based selector is not deterministic by default, so its decisions typically need to be logged or cached to meet this requirement.

Prompt budgeting is the second half of routing. Even if retrieval is perfect, injecting too much memory can harm answer quality. The model’s context window behaves like RAM: overfill it and you lose important task instructions. The router must allocate a memory budget (tokens or character count) and decide how to fill it. This often involves prioritization: high-confidence, high-recency, and policy-approved memories are injected first. Lower priority memories are either dropped or summarized.
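The prioritization described above can be sketched as a greedy fill. This is a minimal illustration, not a prescribed implementation: the `Memory` fields, the `fill_budget` name, and the whitespace-based `estimate_tokens` counter are all assumptions made for the example, and a real router would use the tokenizer of its target model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    id: str
    text: str
    confidence: float  # 0.0 to 1.0 (hypothetical field)
    age_days: int      # smaller means more recent (hypothetical field)


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # This is a crude stand-in for a real tokenizer.
    return len(text.split())


def fill_budget(candidates: list[Memory], budget_tokens: int) -> tuple[list[Memory], list[Memory]]:
    """Greedily select memories by priority until the budget is exhausted."""
    # Priority: higher confidence first, then more recent, then id as a stable tie-break.
    ranked = sorted(candidates, key=lambda m: (-m.confidence, m.age_days, m.id))
    selected, dropped, used = [], [], 0
    for memory in ranked:
        cost = estimate_tokens(memory.text)
        if used + cost <= budget_tokens:
            selected.append(memory)
            used += cost
        else:
            # Lower-priority memories that do not fit: drop or summarize later.
            dropped.append(memory)
    return selected, dropped
```

Note that the loop keeps scanning after a memory fails to fit, so a small low-priority memory can still use leftover budget. Whether that is desirable is a policy choice; stopping at the first miss is the stricter alternative.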

Conflict resolution is another critical policy. Two memories may disagree due to drift or user updates. The router should not blindly inject both, as that confuses the model. Instead, it should apply conflict resolution rules: prefer newer memories, prefer memories with higher confidence, or surface conflicts explicitly. This is especially important for preference memory, where outdated preferences can cause user frustration.
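One way to make these rules concrete is to group memories by what they describe and keep a single winner per group. The sketch below assumes each memory carries a hypothetical `key` field naming its subject, and it orders the rules as newer first, then higher confidence; both choices are illustrative rather than required by the project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Memory:
    id: str
    key: str          # what the memory is about, e.g. "theme" (hypothetical field)
    value: str
    confidence: float
    updated: date


def resolve_conflicts(memories: list[Memory]) -> tuple[list[Memory], list[tuple[str, list[str]]]]:
    """Keep one memory per key and report keys whose values disagreed."""
    by_key: dict[str, list[Memory]] = {}
    for m in memories:
        by_key.setdefault(m.key, []).append(m)

    winners, conflicts = [], []
    for key in sorted(by_key):  # sorted for deterministic output order
        group = by_key[key]
        # Prefer newer, then higher confidence, then id as a deterministic tie-break.
        best = max(group, key=lambda m: (m.updated, m.confidence, m.id))
        winners.append(best)
        if len({m.value for m in group}) > 1:
            # Surface the disagreement so the trace can explain the choice.
            conflicts.append((key, sorted(m.id for m in group)))
    return winners, conflicts
```

Returning the conflict list alongside the winners lets the caller decide whether to inject only the winner or to surface the conflict explicitly.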

Routing policies must also be safe. Sensitive memories should be filtered unless explicit consent is present. This requires the router to inspect metadata fields and enforce privacy constraints. A robust router therefore combines type selection, budgeting, conflict resolution, and safety filtering into a single decision pipeline. You will build that pipeline in this project.
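A minimal sketch of the safety filter, assuming each memory is a dict with an `id` and a `metadata` dict that holds boolean `sensitive` and `consent` fields (these names are illustrative). Missing metadata is treated as the unsafe case, which is a deliberate assumption of this example.

```python
def is_injectable(metadata: dict) -> bool:
    """Return True if a memory may be injected under the consent rule."""
    # Assumption: missing fields default to the unsafe case
    # (sensitive, no consent), so incomplete entries are blocked.
    sensitive = metadata.get("sensitive", True)
    consent = metadata.get("consent", False)
    return (not sensitive) or consent


def filter_sensitive(memories: list[dict]) -> tuple[list[dict], list[str]]:
    """Split memories into those kept and the ids of those blocked."""
    kept, blocked_ids = [], []
    for memory in memories:
        if is_injectable(memory.get("metadata", {})):
            kept.append(memory)
        else:
            blocked_ids.append(memory["id"])  # recorded for the trace log
    return kept, blocked_ids
```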

From a systems perspective, routing must be treated as a first-class interface between data and behavior. That means you need explicit invariants (what must always be true), observability (how you know it is true), and failure signatures (how it breaks when it is not). In practice, engineers often skip this and rely on ad-hoc fixes, which creates hidden coupling between the memory subsystem and the rest of the agent stack. A better approach is to model the router as a pipeline stage with clear inputs, outputs, and preconditions: if inputs violate the contract, the stage should fail fast rather than silently pass bad data downstream. This is especially important because memory errors are long-lived and compound over time. You should also define operational metrics that reveal drift early. Examples include: the percentage of memory entries that lack required metadata, the ratio of retrieved memories that are later unused by the model, or the fraction of queries that trigger a fallback route because the primary memory store is empty. These metrics are not just for dashboards; they are design constraints that force you to keep the system testable and predictable.
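The three example metrics can be computed with a few small functions. This is a sketch under stated assumptions: the required field names are taken from the metadata list used elsewhere in this guide, the `fallback` flag on a trace is hypothetical, and `used_ids` presumes you have some way to tell which memories the model actually relied on.

```python
REQUIRED_FIELDS = ("type", "timestamp", "confidence", "sensitivity", "consent")


def missing_metadata_rate(entries: list[dict]) -> float:
    """Fraction of memory entries lacking at least one required field."""
    if not entries:
        return 0.0
    incomplete = sum(1 for e in entries if any(f not in e for f in REQUIRED_FIELDS))
    return incomplete / len(entries)


def unused_retrieval_ratio(retrieved_ids: list[str], used_ids: set[str]) -> float:
    """Fraction of retrieved memories that the model did not use."""
    if not retrieved_ids:
        return 0.0
    unused = sum(1 for memory_id in retrieved_ids if memory_id not in used_ids)
    return unused / len(retrieved_ids)


def fallback_rate(route_traces: list[dict]) -> float:
    """Fraction of queries whose trace records a fallback route."""
    if not route_traces:
        return 0.0
    fallbacks = sum(1 for t in route_traces if t.get("fallback", False))
    return fallbacks / len(route_traces)
```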

Another critical dimension is lifecycle management. A routing policy may work well at small scale but degrade as the memory grows. This is where policies and thresholds matter: you need rules for promotion, demotion, merging, or deletion that prevent the memory from becoming a landfill. The policy should be deterministic and versioned. When it changes, you should be able to replay historical inputs and measure the delta in outputs. This is the same discipline used in data engineering for schema changes and backfills, and it applies equally to memory systems. Finally, remember that memory is an interface to user trust. If the memory system is noisy, the agent feels unreliable; if it is overly strict, the agent feels forgetful. The best designs expose these trade-offs explicitly, so you can tune them according to product goals rather than guessing in the dark.
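Replaying historical inputs against two policy versions can be as simple as the sketch below. `policy_v1`, `policy_v2`, and the `semantic` fallback store are hypothetical stand-ins; the point is only that deterministic policies make the delta measurable.

```python
from typing import Callable

Policy = Callable[[str], tuple[str, ...]]


def policy_v1(query: str) -> tuple[str, ...]:
    # Hypothetical first version: matches only the keyword "decide".
    return ("episodic", "summary") if "decide" in query.lower() else ("semantic",)


def policy_v2(query: str) -> tuple[str, ...]:
    # Hypothetical revision: also matches "decision".
    text = query.lower()
    if "decide" in text or "decision" in text:
        return ("episodic", "summary")
    return ("semantic",)


def replay_delta(queries: list[str], old: Policy, new: Policy) -> list[tuple[str, tuple[str, ...], tuple[str, ...]]]:
    """Return (query, old_stores, new_stores) for every query whose routing changed."""
    changes = []
    for query in queries:
        before, after = old(query), new(query)
        if before != after:
            changes.append((query, before, after))
    return changes
```

Running `replay_delta` over a saved query log before shipping a policy change shows exactly which queries will be routed differently.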

How this fits into the projects This concept is central to Project 4 and will be used directly in Project 10’s OS-style memory manager.

Definitions & key terms

Routing policy: Rule set for selecting memory stores.
Budgeting: Allocation of prompt space for memory.
Conflict resolution: Rules for handling contradictory memories.
Traceability: Logging why each memory was selected.

Mental model diagram (ASCII)

Query -> Router -> [Episodic] [Semantic] [Preference]
                       |          |           |
                       +----------+-----------+
                                  v
                             Rank/Filter
                                  v
                               Prompt

How It Works (Step-by-Step)

Parse query and classify intent.
Select eligible memory stores.
Retrieve candidates from each store.
Apply filters (recency, sensitivity, confidence).
Resolve conflicts and enforce budget.
Inject into prompt with trace log.
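The six steps above can be sketched as a single function over in-memory stores. Everything here is illustrative: the `STORES` contents, the keyword rule, the 0.5 confidence threshold, and the word-count token estimate are assumptions, and conflict resolution is omitted to keep the example short.

```python
# Hypothetical in-memory stores standing in for real backends.
STORES = {
    "episodic": [
        {"id": "EPI-0021", "text": "Team chose blue-green deployment", "confidence": 0.9, "sensitive": False},
        {"id": "EPI-0022", "text": "Salary discussion notes", "confidence": 0.8, "sensitive": True},
    ],
    "summary": [
        {"id": "SUM-0008", "text": "Weekly summary of deployment work", "confidence": 0.7, "sensitive": False},
    ],
    "preference": [
        {"id": "PREF-0001", "text": "User prefers short answers", "confidence": 0.6, "sensitive": False},
    ],
}


def route(query: str, budget_tokens: int = 10) -> dict:
    trace = []
    # Steps 1-2: classify intent and select stores (placeholder keyword rule).
    stores = ["episodic", "summary"] if "decision" in query.lower() else ["preference"]
    trace.append(f"[ROUTE] {' + '.join(stores)}")
    # Step 3: retrieve candidates from each selected store.
    candidates = [m for name in stores for m in STORES[name]]
    trace.append(f"[FETCH] candidates={len(candidates)}")
    # Step 4: filter by sensitivity and confidence.
    eligible = [m for m in candidates if not m["sensitive"] and m["confidence"] >= 0.5]
    # Step 5: enforce the budget in priority order (conflict resolution omitted).
    eligible.sort(key=lambda m: (-m["confidence"], m["id"]))
    selected, used = [], 0
    for m in eligible:
        cost = len(m["text"].split())  # crude token estimate
        if used + cost <= budget_tokens:
            selected.append(m)
            used += cost
    # Step 6: return the memory to inject together with the trace.
    trace.append(f"[INJECT] selected_ids={','.join(m['id'] for m in selected)}")
    return {"prompt_memory": [m["text"] for m in selected], "trace": trace}
```

Keeping the trace as a list built alongside the decisions means every stage leaves a record, which is what makes the final selection explainable.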

Minimal Concrete Example

routing_policy:
  if query contains "decide" -> episodic + summary
  if query contains "preference" -> preference only
  budget: 800 tokens
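The policy above translates almost directly into a rule table. In this sketch the `RULES` and `select_stores` names are illustrative, and `DEFAULT_STORES` is an assumption, since the example policy does not say what to do when no rule matches.

```python
# Rules are checked in order; the first matching keyword wins.
RULES = [
    ("decide", ("episodic", "summary")),
    ("preference", ("preference",)),
]
BUDGET_TOKENS = 800

# Assumption: the example policy defines no fallback, so one is invented here.
DEFAULT_STORES = ("semantic",)


def select_stores(query: str) -> tuple[str, ...]:
    """Return the stores to query for a given user query."""
    text = query.lower()
    for keyword, stores in RULES:
        if keyword in text:
            return stores
    return DEFAULT_STORES
```

Substring rules are brittle: "Summarize the deployment decision" contains "decision" but not "decide", so it falls through to the default. A working policy needs keyword variants, stemming, or a classifier to cover such cases.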

Common Misconceptions

“More memory is always better.” (False: excess memory reduces quality.)
“Routing can be ad-hoc.” (False: inconsistent routing breaks evaluation.)

Check-Your-Understanding Questions

Why is budgeting necessary?
How do you resolve conflicting memories?
What metadata fields are critical for routing?

Check-Your-Understanding Answers

Because the context window is limited and overfilling reduces clarity.
Prefer higher confidence or more recent memory; optionally surface conflict.
Type, timestamp, confidence, sensitivity, consent.

Real-World Applications

Assistants that combine vector memory and summaries.
Enterprise agents with strict privacy constraints.

Where You’ll Apply It

In this project: §5.4 Concepts You Must Understand First and §6 Testing Strategy.
Also used in: Project 10.

References

MemGPT (memory routing concept) - https://arxiv.org/abs/2310.08560

Key Insights Routing policies determine whether memory helps or harms the agent.

Summary Hybrid routing is the decision layer that makes memory selective, safe, and effective.

Homework/Exercises to Practice the Concept

Write a routing policy for three query types.
Design a budgeting strategy for 600 tokens.

Solutions to the Homework/Exercises

Example: decision queries -> episodic+summary; preference queries -> preference; tool queries -> procedural.
Allocate 60% (360 tokens) to high-priority memory and 40% (240 tokens) to recent context.
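The split for the 600-token exercise can be checked with a small helper. The function name and the use of integer percentages (to avoid floating-point rounding) are choices made for this sketch.

```python
def split_budget(total_tokens: int, high_priority_percent: int = 60) -> dict[str, int]:
    """Split a token budget between high-priority memory and recent context."""
    if not 0 <= high_priority_percent <= 100:
        raise ValueError("high_priority_percent must be between 0 and 100")
    high = total_tokens * high_priority_percent // 100
    # The remainder goes to recent context, so the two parts always sum to the total.
    return {"high_priority": high, "recent_context": total_tokens - high}
```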

3. Project Specification

3.1 What You Will Build

A hybrid router that:

Classifies query intent
Selects memory stores
Retrieves and merges candidates
Enforces token budget
Logs decisions

3.2 Functional Requirements

Store Selection: Choose relevant stores per query.
Budget Enforcement: Limit memory tokens.
Conflict Resolution: Handle contradictory memories.
Trace Log: Explain each selection.

3.3 Non-Functional Requirements

Performance: Routing adds < 50ms overhead.
Reliability: Same query yields same routing decisions.
Usability: Debug traces are human-readable.

3.4 Example Usage / Output

$ memrouter query "Summarize the deployment decision"
[ROUTE] episodic + summary
[FETCH] episodic=4 summary=1
[MERGE] selected=2 (budget=6)

3.5 Data Formats / Schemas / Protocols

route_trace:
  query: "Summarize the deployment decision"
  stores: ["episodic", "summary"]
  selected_ids: ["EPI-0021", "SUM-0008"]

3.6 Edge Cases

Conflicting memories with same confidence
Budget too small for required memory
Queries with ambiguous intent

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

$ memrouter query "Summarize the deployment decision"
$ memrouter trace last

3.7.2 Golden Path Demo (Deterministic)

$ memrouter query "Summarize the deployment decision"
[ROUTE] episodic + summary
[INJECT] selected_ids=EPI-0021,SUM-0008
exit_code=0

3.7.3 Failure Demo (Deterministic)

$ memrouter query "" 
[ERROR] empty query
exit_code=1
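The failure path can be sketched as an entry point that validates the query before any routing happens. This is an assumed shape for the `memrouter query` command, not its defined behavior: treating whitespace-only input as empty and writing the error to stderr are both choices made for the example.

```python
import sys


def main(argv: list[str]) -> int:
    """Hypothetical entry point for `memrouter query "<text>"`; returns the exit code."""
    query = argv[0] if argv else ""
    if not query.strip():
        # Fail fast before touching any memory store.
        print("[ERROR] empty query", file=sys.stderr)
        return 1
    # Routing would run here; omitted in this sketch.
    return 0


# In a real script this would be wired up as: sys.exit(main(sys.argv[1:]))
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the function testable.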

4. Solution Architecture

4.1 High-Level Design

Query -> Intent Classifier -> Store Selector -> Retrieve -> Merge -> Prompt

4.2 Key Components

Component	Responsibility	Key Decisions
Intent Classifier	Determine query type	Rule-based vs ML
Store Selector	Pick memory stores	Type mapping
Merger	Combine results	Conflict rules
Budgeter	Enforce prompt size	Token counting

4.3 Data Structures (No Full Code)

RouteDecision:
  stores: list
  budget: int
  selected: list
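In Python, the structure above maps naturally onto a dataclass. The element types (`str` store names and memory ids) and the validation rules are assumptions added for this sketch.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RouteDecision:
    stores: list[str]                                   # stores chosen for this query
    budget: int                                         # memory budget in tokens
    selected: list[str] = field(default_factory=list)   # ids of injected memories

    def __post_init__(self) -> None:
        # Assumed invariants: a decision names at least one store
        # and never carries a negative budget.
        if not self.stores:
            raise ValueError("at least one store is required")
        if self.budget < 0:
            raise ValueError("budget must be non-negative")


decision = RouteDecision(stores=["episodic", "summary"], budget=800,
                         selected=["EPI-0021", "SUM-0008"])
record = asdict(decision)  # plain dict, ready to serialize into a trace
```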

4.4 Algorithm Overview

Classify query intent.
Select eligible stores.
Retrieve candidates.
Filter and rerank.
Enforce budget and resolve conflicts.

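Complexity Analysis: O(k) per store to filter k retrieved candidates (excluding the store's own search cost); sorting the merged pool of n candidates for reranking adds O(n log n).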

5. Implementation Guide

5.1 Development Environment Setup

- Load memory store configs
- Define token budget limits

5.2 Project Structure

project-root/
├── src/
│   ├── classify/
│   ├── route/
│   ├── merge/
│   └── trace/

5.3 The Core Question You’re Answering

“Which memories should be injected, and why?”

5.4 Concepts You Must Understand First

Routing policies
Budget enforcement

5.5 Questions to Guide Your Design

How will you resolve conflicts?
What is the maximum memory budget?

5.6 Thinking Exercise

Design a policy for handling preference vs episodic conflicts.

5.7 The Interview Questions They’ll Ask

“How does a router improve memory quality?”
“What is a memory budget?”
“How do you handle conflicts?”
“Why is traceability important?”
“How do you evaluate routing?”

5.8 Hints in Layers

Hint 1: Start with keyword-based routing
Hint 2: Add confidence filters
Hint 3: Add budgeting
Hint 4: Add trace logging

5.9 Books That Will Help

Topic	Book	Chapter
Architecture	“Fundamentals of Software Architecture”	Ch. 2
Agent systems	“AI Engineering”	Ch. 6

5.10 Implementation Phases

Phase 1: Foundation

Intent classifier and store selector

Phase 2: Core Functionality

Retrieval and merge logic

Phase 3: Polish

Budget enforcement and trace logs

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Routing method	Rule-based / LLM	Rule-based	Deterministic and auditable
Conflict policy	Recency / Confidence	Confidence + recency	Balances freshness and reliability

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit	Policy rules	Intent mapping
Integration	Full routing	Query -> prompt
Edge	Conflicts	Contradictory memories

6.2 Critical Test Cases

Budget enforcement limits memory size.
Conflicting memories resolved deterministically.
Trace log includes selection reasons.
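These cases can be written as ordinary unit tests. The `route` function below is a hypothetical stand-in for the router under test (it ignores the query and ranks by confidence), included only so the tests run on their own. Determinism is checked by shuffling the input order, which also exercises the equal-confidence tie-break.

```python
import unittest


def route(query: str, memories: list[dict], budget_tokens: int) -> dict:
    """Hypothetical stand-in for the router under test; the query is unused here."""
    # Tie-break on id so equal-confidence memories are ordered deterministically.
    ranked = sorted(memories, key=lambda m: (-m["confidence"], m["id"]))
    selected, reasons, used = [], {}, 0
    for m in ranked:
        cost = len(m["text"].split())  # crude token estimate
        if used + cost <= budget_tokens:
            selected.append(m["id"])
            reasons[m["id"]] = f"confidence={m['confidence']}, cost={cost}"
            used += cost
    return {"selected_ids": selected, "reasons": reasons, "tokens_used": used}


MEMORIES = [
    {"id": "EPI-0021", "text": "Team chose blue-green deployment", "confidence": 0.9},
    {"id": "EPI-0030", "text": "Team chose canary deployment", "confidence": 0.9},
    {"id": "SUM-0008", "text": "Weekly deployment summary", "confidence": 0.7},
]


class RouterTests(unittest.TestCase):
    def test_budget_is_enforced(self):
        result = route("q", MEMORIES, budget_tokens=5)
        self.assertLessEqual(result["tokens_used"], 5)

    def test_routing_is_deterministic(self):
        first = route("q", MEMORIES, budget_tokens=8)
        second = route("q", list(reversed(MEMORIES)), budget_tokens=8)
        self.assertEqual(first["selected_ids"], second["selected_ids"])

    def test_trace_has_reason_per_selection(self):
        result = route("q", MEMORIES, budget_tokens=8)
        self.assertEqual(set(result["selected_ids"]), set(result["reasons"]))
```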

6.3 Test Data

query: "Summarize the deployment decision"
expected_stores: ["episodic", "summary"]

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Over-routing	Prompt overflow	Enforce budget
Under-routing	Missing context	Expand store selection
No traceability	Hard to debug	Add trace log

7.2 Debugging Strategies

Compare routing outputs for similar queries.
Add trace IDs to each decision.

7.3 Performance Traps

Excessive reranking of large candidate sets.

8. Extensions & Challenges

8.1 Beginner Extensions

Add manual override rules

8.2 Intermediate Extensions

Add reranking by confidence

8.3 Advanced Extensions

Add adaptive budget based on task complexity

9. Real-World Connections

9.1 Industry Applications

Production memory routers in multi-store RAG systems

9.2 Related Projects

MemGPT

9.3 Interview Relevance

Routing policies are a common system design topic.

10. Resources

10.1 Essential Reading

MemGPT paper

10.2 Video Resources

Talks on RAG routing and retrieval policies

10.3 Tools & Documentation

LangChain memory routing examples

11. Self-Assessment Checklist

11.1 Understanding

I can explain routing and budgeting.

11.2 Implementation

Router selects stores and enforces budget.

11.3 Growth

I can justify my routing policy.

12. Submission / Completion Criteria

Minimum Viable Completion:

Routing and budgeting implemented

Full Completion:

Conflict resolution and trace logging

Excellence (Going Above & Beyond):

Adaptive routing based on task complexity