Project 1: Personal Memory Graph CLI

Build a command-line tool that stores personal facts as nodes and relationships in Neo4j, with CRUD operations and basic Cypher queries—your first hands-on experience with graph data modeling for AI memory.

Quick Reference

Attribute     | Value
--------------+------------------------------------------------------------
Difficulty    | Level 1: Beginner
Time Estimate | Weekend (8-12 hours)
Language      | Python (alternatives: TypeScript, Go)
Prerequisites | Basic Python, familiarity with databases, Docker basics
Key Topics    | Graph data modeling, Neo4j, Cypher query language, nodes and relationships, property graphs

1. Learning Objectives

By completing this project, you will:

  1. Understand the fundamental difference between graph databases and relational databases.
  2. Model real-world knowledge as nodes (entities) and relationships (edges) with properties.
  3. Write basic Cypher queries for creating, reading, updating, and deleting graph data.
  4. Design a schema for personal facts that could power an AI assistant’s memory.
  5. Experience the “aha moment” of graph traversal vs. SQL joins.

2. Theoretical Foundation

2.1 Core Concepts

  • Property Graph Model: Nodes have labels (types) and properties (key-value pairs). Relationships have types and direction, and can also have properties. This is the data model used by Neo4j.

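  • Index-Free Adjacency: Relational databases connect rows through foreign keys and resolve them at query time with joins, typically via index lookups. Graph databases like Neo4j instead store direct references between connected nodes. Following one relationship (one hop) therefore costs roughly the same whether the graph has a thousand nodes or a billion: the work grows with the number of relationships on the nodes you visit, not with total graph size.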

  • Cypher Query Language: Neo4j’s declarative query language for pattern matching. The syntax (a)-[r:KNOWS]->(b) reads naturally as “a KNOWS b”.

  • Labels vs. Properties: Labels categorize nodes (Person, Company, Fact); properties store attributes (name, date, value). Choose labels for filtering large sets; properties for specific attributes.
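As a toy illustration of the property graph model and index-free adjacency, the plain-Python sketch below gives each node its own list of outgoing relationships. This is not how Neo4j lays out data on disk, and the names `PropertyGraph`, `add_node`, `relate`, and `neighbors` are hypothetical. The point is that following one hop reads only the current node's relationship list, so its cost tracks that node's degree rather than the size of the whole graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]  # e.g. {"Person"}
    properties: dict  # key-value pairs, e.g. {"name": "Alice"}


@dataclass
class Relationship:
    rel_type: str  # e.g. "WORKS_AT"
    start: int  # id of the start node (relationships are directed)
    end: int  # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: every node keeps its own outgoing relationships."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.outgoing: dict[int, list[Relationship]] = {}

    def add_node(self, labels: set[str], **properties) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(labels, properties)
        self.outgoing[node_id] = []
        return node_id

    def relate(self, start: int, rel_type: str, end: int, **properties) -> None:
        self.outgoing[start].append(Relationship(rel_type, start, end, properties))

    def neighbors(self, node_id: int, rel_type: str | None = None) -> list[Node]:
        # One hop scans only this node's relationship list: the cost depends
        # on the node's degree, not on how many nodes the graph holds.
        return [
            self.nodes[r.end]
            for r in self.outgoing[node_id]
            if rel_type is None or r.rel_type == rel_type
        ]


g = PropertyGraph()
alice = g.add_node({"Person"}, name="Alice")
acme = g.add_node({"Company"}, name="Acme Corp")
g.relate(alice, "WORKS_AT", acme, since="2022-06-01")
print([n.properties["name"] for n in g.neighbors(alice, "WORKS_AT")])  # ['Acme Corp']
```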

2.2 Why This Matters

Before you can build sophisticated AI memory systems, you need to internalize how graph databases think differently:

  • Relationships are first-class citizens: In SQL, relationships are implicit (foreign keys). In graphs, relationships have their own identity, type, and properties.
  • Traversal is cheap: Finding “friends of friends of friends” is a single pattern with a variable-length path, not a chain of self-joins.
  • Schema flexibility: Add new relationship types without migrations.
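In Cypher, “friends of friends of friends” can be written as one variable-length pattern, for example `MATCH (me:Person {name: $name})-[:KNOWS*1..3]->(other) RETURN DISTINCT other.name`. To show what that bounded traversal does, here is a breadth-first search over a hypothetical in-memory adjacency dict. It only approximates the Cypher semantics (it excludes the starting node and never revisits a node) and is meant for intuition, not as a replacement for the database.

```python
from __future__ import annotations

from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Return every node reachable from `start` in 1..max_depth directed hops."""
    seen = {start}
    found: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


knows = {
    "Me": ["Alice"],
    "Alice": ["Bob"],
    "Bob": ["Charlie"],
    "Charlie": ["Dana"],
}
print(sorted(within_hops(knows, "Me", 3)))  # ['Alice', 'Bob', 'Charlie']
```

Because breadth-first search reaches each node at its shortest distance first, the depth check is enough to stop at three hops; Dana, four hops away, is not returned.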

2.3 Common Misconceptions

  • “Graph databases are just for social networks.” They excel at any connected data: knowledge bases, recommendations, fraud detection, and yes, AI memory.
  • “You need to know graph theory.” You don’t. The property graph model is intuitive—nodes are things, relationships connect things.
  • “Cypher is hard to learn.” It’s actually more readable than SQL for relationship queries. MATCH (a)-[:FRIEND]->(b) is clearer than three-way joins.

2.4 ASCII Diagram: Graph vs Relational

RELATIONAL (SQL)                    GRAPH (Neo4j)
================                    ==============

┌─────────────────┐
│ persons         │                      (Alice)
├─────────────────┤                         │
│ id │ name       │                    [:WORKS_AT]
│ 1  │ Alice      │                         │
│ 2  │ Bob        │                         ▼
└─────────────────┘                     (Acme Corp)
                                            ▲
┌─────────────────┐                    [:WORKS_AT]
│ employment      │                         │
├─────────────────┤                         │
│ person_id│org_id│                       (Bob)
│ 1        │ 1    │
│ 2        │ 1    │
└─────────────────┘

┌─────────────────┐
│ companies       │
├─────────────────┤
│ id │ name       │
│ 1  │ Acme Corp  │
└─────────────────┘

Query: "Who works at Acme?"

SQL:
SELECT p.name
FROM persons p
JOIN employment e ON p.id = e.person_id
JOIN companies c ON e.org_id = c.id
WHERE c.name = 'Acme Corp';

Cypher:
MATCH (p)-[:WORKS_AT]->(c:Company)
WHERE c.name = 'Acme Corp'
RETURN p.name
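The relational half of this comparison can be run without any server using Python's built-in `sqlite3` module. The schema and rows below are a minimal sketch of the three tables the SQL query joins (column types are assumed). The Cypher half needs a running Neo4j instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employment (person_id INTEGER, org_id INTEGER);
    INSERT INTO persons VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO companies VALUES (1, 'Acme Corp');
    INSERT INTO employment VALUES (1, 1), (2, 1);
    """
)

# Two joins are needed just to walk person -> employment -> company.
rows = conn.execute(
    """
    SELECT p.name
    FROM persons p
    JOIN employment e ON p.id = e.person_id
    JOIN companies c ON e.org_id = c.id
    WHERE c.name = ?
    ORDER BY p.name
    """,
    ("Acme Corp",),
).fetchall()
print([name for (name,) in rows])  # ['Alice', 'Bob']
```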

3. Project Specification

3.1 What You Will Build

A command-line tool that lets you:

  • Add facts about yourself (preferences, relationships, events)
  • Query facts using natural patterns
  • Update facts when things change
  • Delete facts that are no longer relevant
  • See the graph structure visually (ASCII or Neo4j Browser)

3.2 Functional Requirements

  1. Add a fact: memory add "I prefer Python for scripting"
  2. Add a relationship: memory relate "Alice" "WORKS_WITH" "Bob"
  3. Query by entity: memory query "Alice" → shows all facts about Alice
  4. Query by relationship: memory query --rel WORKS_WITH → all work relationships
  5. Update a fact: memory update <id> "I now prefer Rust for scripting"
  6. Delete a fact: memory delete <id>
  7. Visualize: memory show → ASCII representation of the graph

3.3 Non-Functional Requirements

  • Reliability: Handle Neo4j connection failures gracefully
  • Usability: Clear error messages and help text
  • Performance: Queries should complete in < 100ms for graphs under 1000 nodes
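One way to meet the reliability requirement is to retry the initial connectivity check a few times with backoff, then fail with one readable message. The helper below is a hypothetical, library-agnostic sketch. With the official driver you might call it as `with_retries(driver.verify_connectivity, (ServiceUnavailable,))`, importing `ServiceUnavailable` from `neo4j.exceptions`. Authentication errors should not be retried.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run `action`, retrying with exponential backoff on the given exception types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == attempts:
                # Out of retries: raise one readable error the CLI can print.
                raise RuntimeError(
                    f"Could not reach Neo4j after {attempts} attempts: {exc}"
                ) from exc
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

The CLI layer can catch the `RuntimeError` and print its message instead of a stack trace.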

3.4 Example Usage / Output

$ memory add "I prefer dark mode in all applications"
Created fact: (Preference {value: "dark mode in all applications"})

$ memory relate "Me" "PREFERS" "dark mode" --since "2023-01-01"
Created relationship: (Me)-[:PREFERS {since: 2023-01-01}]->(Preference)

$ memory query "Me"
Entity: Me
├── [:PREFERS] → dark mode (since: 2023-01-01)
├── [:PREFERS] → Python (since: 2020-03-15)
├── [:WORKS_AT] → Acme Corp (since: 2022-06-01)
└── [:KNOWS] → Alice, Bob, Charlie

$ memory show
Graph Visualization (15 nodes, 23 relationships):
   Me ─┬─PREFERS───► dark_mode
       ├─PREFERS───► Python
       ├─WORKS_AT──► Acme_Corp
       └─KNOWS─────► Alice ──WORKS_WITH──► Bob
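The `memory query` tree above is just formatting applied to query results. Here is a minimal sketch of that formatting step, assuming the service layer has already reduced each result row to a hypothetical `(relationship type, target name, relationship properties)` tuple.

```python
from __future__ import annotations


def format_tree(entity: str, rows: list[tuple[str, str, dict]]) -> str:
    """Render an entity's outgoing relationships as an ASCII tree."""
    lines = [f"Entity: {entity}"]
    for i, (rel_type, target, props) in enumerate(rows):
        branch = "└──" if i == len(rows) - 1 else "├──"  # last child closes the tree
        details = ", ".join(f"{key}: {value}" for key, value in props.items())
        suffix = f" ({details})" if details else ""
        lines.append(f"{branch} [:{rel_type}] → {target}{suffix}")
    return "\n".join(lines)


print(format_tree("Me", [
    ("PREFERS", "dark mode", {"since": "2023-01-01"}),
    ("KNOWS", "Alice", {}),
]))
```

Grouping several targets under one relationship type, as the example output does for `KNOWS`, would be a small extension: bucket the rows by type before rendering.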

4. Solution Architecture

4.1 High-Level Design

┌───────────────┐      commands      ┌──────────────────┐
│   CLI (Click) │───────────────────▶│  Memory Service  │
└───────────────┘                    └────────┬─────────┘
                                              │
                                              │ Cypher
                                              ▼
                                     ┌──────────────────┐
                                     │   Neo4j Driver   │
                                     └────────┬─────────┘
                                              │
                                              ▼
                                     ┌──────────────────┐
                                     │   Neo4j (Docker) │
                                     └──────────────────┘

4.2 Key Components

Component      | Responsibility                       | Key Decisions
---------------+--------------------------------------+---------------------------------------------
CLI Layer      | Parse commands, format output        | Use Click for argument parsing
Memory Service | Business logic, query building       | Keep Cypher queries centralized
Neo4j Driver   | Database connection, transactions    | Use the official neo4j Python driver
Data Model     | Define node labels and relationships | Start simple: Entity and Fact nodes plus typed relationships

4.3 Data Model

Node Labels:
- Entity: Anything that can have facts (Person, Place, Concept)
- Fact: A piece of information with a value
- Preference: A special type of fact about preferences

Relationship Types:
- HAS_FACT: Entity → Fact
- PREFERS: Entity → Entity/Concept
- KNOWS: Person → Person
- WORKS_AT: Person → Organization
- (custom types as needed)

Properties:
- All nodes: id (UUID), created_at, updated_at
- Entity: name, type
- Fact: value, source, confidence
- Relationships: since, until, source
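One way to encode these property conventions in `models.py` is with dataclasses. This is a sketch: the field names follow the list above, while the defaults (`type="Concept"`, `source="cli"`, `confidence=1.0`), the ISO-8601 string timestamps, and the `to_properties` helper are assumptions.

```python
from __future__ import annotations

import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    # ISO-8601 UTC strings sort chronologically and round-trip cleanly.
    return datetime.now(timezone.utc).isoformat()


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class Entity:
    name: str
    type: str = "Concept"  # assumed default
    id: str = field(default_factory=_new_id)
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)

    def to_properties(self) -> dict:
        return asdict(self)


@dataclass
class Fact:
    value: str
    source: str = "cli"  # assumed default: where the fact came from
    confidence: float = 1.0  # assumed default
    id: str = field(default_factory=_new_id)
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)

    def to_properties(self) -> dict:
        return asdict(self)
```

A map like `Fact("dark mode").to_properties()` can be sent as a single parameter, as in `CREATE (f:Fact $props)`. `MERGE` does not accept a parameter map in its pattern, so entities are usually merged on `name` and their other properties set afterwards.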

4.4 Algorithm Overview

Adding a Fact:

  1. Parse the fact text to identify entity and value
  2. Get or create the entity with MERGE, which matches an existing node or creates one if none exists
  3. Create the fact node
  4. Create relationship from entity to fact
  5. Return confirmation with IDs
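The five steps fit in a single write transaction. The sketch below is a transaction function to pass to `session.execute_write`. `parse_fact`, `add_fact_query`, and `add_fact` are hypothetical names. `parse_fact` is a deliberately naive heuristic: every fact belongs to “Me”, and only text starting with “I prefer ” becomes a Preference. Real fact extraction is listed as an advanced extension.

```python
from __future__ import annotations

ALLOWED_FACT_LABELS = {"Fact", "Preference"}


def parse_fact(text: str) -> tuple[str, str, str]:
    """Step 1 (naive, hypothetical): split text into (entity, label, value)."""
    prefix = "I prefer "
    if text.startswith(prefix):
        return "Me", "Preference", text[len(prefix):].strip()
    return "Me", "Fact", text.strip()


def add_fact_query(label: str) -> str:
    # Labels cannot be Cypher parameters, so only known labels are interpolated.
    if label not in ALLOWED_FACT_LABELS:
        raise ValueError(f"unsupported label: {label!r}")
    labels = "Fact" if label == "Fact" else f"Fact:{label}"
    return (
        # Steps 2-4: get-or-create the entity, create the fact, link them.
        "MERGE (e:Entity {name: $entity}) "
        "ON CREATE SET e.id = randomUUID(), e.created_at = datetime() "
        f"CREATE (f:{labels} {{id: randomUUID(), value: $value, created_at: datetime()}}) "
        "CREATE (e)-[:HAS_FACT]->(f) "
        "RETURN f.id AS id"  # step 5: hand the new id back for confirmation
    )


def add_fact(tx, text: str) -> str:
    """Transaction function, e.g. session.execute_write(add_fact, text)."""
    entity, label, value = parse_fact(text)
    record = tx.run(add_fact_query(label), entity=entity, value=value).single()
    return record["id"]
```

Generating ids with Cypher's `randomUUID()` keeps id creation in the same statement as the write.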

Querying:

  1. Match the starting pattern
  2. Optionally filter by relationship type or properties
  3. Collect connected nodes and relationships
  4. Format as tree or table for display
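Here is a sketch of the query-building half of `memory query`, using a hypothetical signature `build_query(entity, rel_type, depth)`. The important detail is that Cypher parameters can carry values but not relationship types. The `--rel` value is therefore checked against an UPPER_SNAKE_CASE pattern (a project convention assumed here) before it is interpolated.

```python
from __future__ import annotations

import re

# Relationship types cannot be Cypher parameters, so validate before interpolating.
REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def build_query(
    entity: str | None = None, rel_type: str | None = None, depth: int = 1
) -> tuple[str, dict]:
    """Build the read query behind `memory query NAME` and `--rel TYPE`."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")  # keep traversals bounded
    rel = ""
    if rel_type is not None:
        if not REL_TYPE.fullmatch(rel_type):
            raise ValueError(f"invalid relationship type: {rel_type!r}")
        rel = f":{rel_type}"
    params: dict = {}
    start = "(a)"
    if entity is not None:
        start = "(a:Entity {name: $name})"  # user input stays a parameter
        params["name"] = entity
    hops = "" if depth == 1 else f"*1..{depth}"  # variable-length path when depth > 1
    query = (
        f"MATCH p = {start}-[{rel}{hops}]->(b) "
        "RETURN [n IN nodes(p) | coalesce(n.name, n.value)] AS path, "
        "[r IN relationships(p) | type(r)] AS rels"
    )
    return query, params
```

For example, `build_query("Me", "KNOWS", depth=3)` yields a pattern containing `-[:KNOWS*1..3]->` with `{"name": "Me"}` as its only parameter.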

5. Implementation Guide

5.1 Development Environment Setup

# Start Neo4j with Docker
docker run -d \
  --name neo4j-memory \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest

# Create Python project
mkdir personal-memory-graph && cd personal-memory-graph
python -m venv .venv && source .venv/bin/activate
pip install neo4j click python-dotenv

# Verify connection
python -c "from neo4j import GraphDatabase; d = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password')); d.verify_connectivity(); print('Connected!')"

5.2 Project Structure

personal-memory-graph/
├── src/
│   ├── __init__.py
│   ├── cli.py          # Click command definitions
│   ├── service.py      # Memory service business logic
│   ├── driver.py       # Neo4j connection management
│   └── models.py       # Data structures
├── tests/
│   ├── test_service.py
│   └── test_queries.py
├── .env                # NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
└── README.md
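For `driver.py`, here is a minimal sketch that assumes the three variable names listed for `.env`. python-dotenv's `load_dotenv()` copies `.env` into `os.environ`, so call it once at startup. The default URI and user, and the names `Settings`, `load_settings`, and `make_driver`, are assumptions.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    uri: str
    user: str
    password: str


def load_settings(env: dict | None = None) -> Settings:
    """Read connection settings from the environment (populated from .env)."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("NEO4J_PASSWORD is not set; add it to .env")
    return Settings(
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),  # assumed default
        user=env.get("NEO4J_USER", "neo4j"),  # assumed default
        password=password,
    )


def make_driver(settings: Settings):
    """Create one driver per process; it maintains its own connection pool."""
    from neo4j import GraphDatabase  # lazy import: settings tests need no driver

    driver = GraphDatabase.driver(settings.uri, auth=(settings.user, settings.password))
    driver.verify_connectivity()  # fail fast with a clear error
    return driver
```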

5.3 Implementation Phases

Phase 1: Connection and Basic CRUD (3-4h)

Goals:

  • Connect to Neo4j
  • Create and read single nodes

Tasks:

  1. Create a single shared driver instance (the official driver manages a connection pool internally)
  2. Implement add command for simple facts
  3. Implement query command to retrieve nodes by name
  4. Add basic error handling

Checkpoint: Can add a fact and retrieve it by name.

Phase 2: Relationships and Traversal (3-4h)

Goals:

  • Create relationships between entities
  • Traverse the graph

Tasks:

  1. Implement relate command for creating relationships
  2. Enhance query to show connected nodes
  3. Add relationship type filtering
  4. Implement basic visualization

Checkpoint: Can create relationships and see graph structure.
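For the `relate` command, here is a sketch of a transaction function. `relate_query` and `relate` are hypothetical names, and the UPPER_SNAKE_CASE check is an assumed convention. The key modeling point is to `MERGE` each endpoint on its own before merging the relationship.

```python
from __future__ import annotations

import re

REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")  # UPPER_SNAKE_CASE only


def relate_query(rel_type: str) -> str:
    # Relationship types cannot be parameters, so only validated names are interpolated.
    if not REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"invalid relationship type: {rel_type!r}")
    return (
        # MERGE each endpoint separately: merging the whole path at once would
        # create duplicate nodes whenever the relationship itself is new.
        "MERGE (a:Entity {name: $source}) ON CREATE SET a.id = randomUUID() "
        "MERGE (b:Entity {name: $target}) ON CREATE SET b.id = randomUUID() "
        f"MERGE (a)-[r:{rel_type}]->(b) "
        "ON CREATE SET r.created_at = datetime() "
        "SET r.since = coalesce($since, r.since) "
        "RETURN a.name AS source, type(r) AS rel, b.name AS target, r.since AS since"
    )


def relate(tx, source: str, rel_type: str, target: str, since: str | None = None) -> dict:
    """Transaction function for `memory relate SOURCE TYPE TARGET [--since DATE]`."""
    result = tx.run(relate_query(rel_type), source=source, target=target, since=since)
    return result.single().data()
```

Running `session.execute_write(relate, "Alice", "WORKS_WITH", "Bob")` twice leaves one relationship, not two, because the final `MERGE` matches the existing one.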

Phase 3: Update, Delete, and Polish (2-3h)

Goals:

  • Complete CRUD operations
  • Improve UX

Tasks:

  1. Implement update command
  2. Implement delete command (with confirmation)
  3. Add timestamps and metadata
  4. Improve output formatting

Checkpoint: Full CRUD with nice output.
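Update and delete can be two small transaction functions. This sketch uses hypothetical names and assumes facts carry the `id` property from the data model. The confirmation prompt belongs in the CLI layer, for example `click.confirm("Delete this fact?", abort=True)` before calling `delete_fact`.

```python
UPDATE_FACT = (
    "MATCH (f:Fact {id: $id}) "
    "SET f.value = $value, f.updated_at = datetime() "
    "RETURN f.id AS id"
)

# DETACH DELETE also removes every relationship attached to the node; a plain
# DELETE raises an error while the node still has relationships.
DELETE_FACT = (
    "MATCH (f:Fact {id: $id}) "
    "DETACH DELETE f "
    "RETURN count(*) AS deleted"  # aggregation yields one row even if nothing matched
)


def update_fact(tx, fact_id: str, value: str) -> bool:
    """Return True if a fact with this id existed and was updated."""
    return tx.run(UPDATE_FACT, id=fact_id, value=value).single() is not None


def delete_fact(tx, fact_id: str) -> bool:
    """Return True if a fact with this id existed and was deleted."""
    return tx.run(DELETE_FACT, id=fact_id).single()["deleted"] > 0
```

Returning a boolean lets the CLI print “No fact with that id” instead of silently succeeding.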

5.4 Key Implementation Decisions

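Decision               | Options                           | Recommendation                      | Rationale
-----------------------+-----------------------------------+-------------------------------------+------------------------------------
Entity identification  | Name-based vs. UUID               | Both (name for UX, UUID internally) | Names can change; UUIDs are stable
Relationship direction | Always directed vs. bidirectional | Always directed                     | Neo4j stores every relationship with a direction; queries can still match both ways
Schema enforcement     | Strict labels vs. freeform        | Freeform initially                  | Discover patterns before constraining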
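For the name-plus-UUID decision, a small helper can decide how to look up whatever the user typed. Here is a sketch with hypothetical function names.

```python
from __future__ import annotations

import uuid


def resolve_identifier(arg: str) -> tuple[str, str]:
    """Treat a valid UUID as an internal id; anything else as a display name."""
    try:
        return "id", str(uuid.UUID(arg))  # normalizes case and hyphenation
    except ValueError:
        return "name", arg


def match_clause(arg: str) -> tuple[str, dict]:
    key, value = resolve_identifier(arg)
    # `key` is always "id" or "name", so interpolating it is safe; the
    # user-supplied value is always sent as a parameter.
    return f"MATCH (e:Entity {{{key}: $value}})", {"value": value}
```

The direction decision needs no helper. To ignore direction at query time, omit the arrowhead, as in `MATCH (a:Entity {name: $name})-[:KNOWS]-(b)`.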

6. Testing Strategy

6.1 Test Categories

Category    | Purpose               | Examples
------------+-----------------------+----------------------------------
Unit        | Test query building   | Cypher generation, input parsing
Integration | Test Neo4j operations | CRUD operations, transactions
E2E         | Test CLI commands     | Full command execution

6.2 Critical Test Cases

  1. Create node: Verify node exists with correct properties
  2. Create relationship: Verify both nodes and relationship exist
  3. Duplicate handling: MERGE doesn’t create duplicates
  4. Query traversal: Returns all connected nodes within depth
  5. Delete cascade: DETACH DELETE removes the node and all of its relationships; a plain DELETE on a connected node raises an error

7. Common Pitfalls & Debugging

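Pitfall                      | Symptom               | Solution
-----------------------------+-----------------------+------------------------------------------------
Wrong connection string      | “Unable to connect”   | Use bolt:// (or neo4j://) on port 7687, not http:// on port 7474
Authentication failed        | “Invalid credentials” | Check that NEO4J_AUTH matches the driver's user and password; NEO4J_AUTH only applies when the database is first initialized
CREATE vs. MERGE confusion   | Duplicate nodes       | Use MERGE for entities; use CREATE for facts that should always be new nodes
Querying the wrong direction | Query returns nothing | Match the direction the relationship was stored with, or omit the arrowhead: (a)-[:KNOWS]-(b)
Transaction not committed    | Data disappears       | session.run() auto-commits and session.execute_write() commits for you; an explicit session.begin_transaction() needs tx.commit()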

Debugging Strategies:

  • Use Neo4j Browser (http://localhost:7474) to visualize graph state
  • Log Cypher queries before execution
  • Check EXPLAIN output for query plans
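Logging Cypher before it runs can be done with a thin wrapper around `tx.run`. In this sketch, `run_logged` and `explain` are hypothetical helpers, and the logger name is an assumption. Prefixing a statement with `EXPLAIN` returns the query plan without executing it, while `PROFILE` executes it and reports actual row counts.

```python
import logging

logger = logging.getLogger("memory.cypher")  # assumed logger name


def run_logged(tx, query: str, **params):
    """Log each Cypher statement (whitespace collapsed) and its parameters, then run it."""
    logger.debug("cypher=%s params=%r", " ".join(query.split()), params)
    return tx.run(query, **params)


def explain(query: str) -> str:
    # EXPLAIN shows the plan without running the query; swap in PROFILE to run it.
    return "EXPLAIN " + query
```

Enable the output with `logging.basicConfig(level=logging.DEBUG)`, for example behind a `--verbose` flag.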

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a memory import command to load facts from a JSON file
  • Add colored output using the rich library
  • Add --format json flag for machine-readable output
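For `memory import`, one option is to parse and validate the whole file before writing anything. The JSON shape used here, with a `facts` list of strings and a `relationships` list of `from`/`type`/`to` objects, is a hypothetical format; the project does not define one.

```python
from __future__ import annotations

import json


def load_import_file(text: str) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Parse and validate an import document before anything touches the database."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    facts = data.get("facts", [])
    if not isinstance(facts, list) or not all(isinstance(f, str) and f.strip() for f in facts):
        raise ValueError("'facts' must be a list of non-empty strings")
    triples = []
    for i, rel in enumerate(data.get("relationships", [])):
        try:
            triples.append((rel["from"], rel["type"], rel["to"]))
        except (KeyError, TypeError) as exc:
            raise ValueError(f"relationship #{i} needs 'from', 'type' and 'to'") from exc
    return facts, triples


sample = """
{
  "facts": ["I prefer dark mode in all applications"],
  "relationships": [{"from": "Alice", "type": "WORKS_WITH", "to": "Bob"}]
}
"""
facts, triples = load_import_file(sample)
print(triples)  # [('Alice', 'WORKS_WITH', 'Bob')]
```

Each validated fact and triple can then go through the same code paths as `memory add` and `memory relate`, ideally inside one transaction so a failure does not leave a half-imported graph.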

8.2 Intermediate Extensions

  • Add temporal properties (valid_from, valid_until) to relationships
  • Implement fuzzy search for entity names
  • Add graph export to GraphML or JSON format
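For temporal properties, one approach is to store `valid_from` and `valid_until` as Neo4j `DATE` values and treat them as a half-open interval, where a missing `valid_until` means “still true”. The query text and the Python predicate in this sketch express the same rule, with `$as_of` passed as an ISO-8601 string such as `"2024-01-01"`. The names are hypothetical. Relationships without `valid_from` are excluded, because comparisons with null are never true.

```python
from __future__ import annotations

from datetime import date

# "What was true on $as_of?" over the half-open interval [valid_from, valid_until).
AS_OF_QUERY = """
MATCH (e:Entity {name: $name})-[r]->(x)
WHERE r.valid_from <= date($as_of)
  AND (r.valid_until IS NULL OR date($as_of) < r.valid_until)
RETURN type(r) AS rel, x.name AS target
"""


def is_valid(valid_from: date, valid_until: date | None, as_of: date) -> bool:
    """Same rule as AS_OF_QUERY, handy for unit-testing the interval logic."""
    return valid_from <= as_of and (valid_until is None or as_of < valid_until)
```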

8.3 Advanced Extensions

  • Add natural language parsing for fact extraction (basic NLP)
  • Implement shortest path queries between entities
  • Add Cypher REPL for direct query execution

9. Real-World Connections

9.1 Industry Applications

  • Personal Knowledge Management: Tools such as Roam and Obsidian use graph structures
  • AI Memory Systems: Zep and Mem0 offer graph-based memory that can be backed by Neo4j; LangGraph models agent workflows as graphs
  • Enterprise Knowledge Graphs: Google Knowledge Graph, Amazon Product Graph
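
9.2 Related Tools and Projects

  • Neo4j: The graph database you’re using
  • LangChain Neo4j Integration: Graph-based RAG patterns
  • Memgraph: Alternative in-memory graph database that speaks Cypher over the Bolt protocol and supports query modules written in Python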

9.3 Interview Relevance

  • Explain when to use graph vs. relational databases
  • Discuss trade-offs of property graph vs. RDF models
  • Describe how graph databases enable AI memory

10. Resources

10.1 Essential Reading

  • “Graph Databases” by Robinson, Webber, Eifrem — Neo4j fundamentals (Ch. 1-3)
  • Neo4j Cypher Manual — Official query language reference
  • “Designing Data-Intensive Applications” by Kleppmann — Ch. 2 (Data Models)

10.2 Tools & Documentation

  • Neo4j Desktop (visualization and development)
  • Neo4j Browser (web-based query interface)
  • Cypher Refcard (quick reference)

10.3 Related Projects in This Series

  • Previous: None (start here)
  • Next: Project 2 (Conversation Episode Store): add time-series conversation storage

11. Self-Assessment Checklist

  • I can explain the difference between graph and relational databases
  • I can write MERGE, MATCH, and CREATE Cypher queries
  • I understand when to use labels vs. properties
  • I can traverse relationships with variable-length paths
  • I can design a graph schema for a new domain

12. Submission / Completion Criteria

Minimum Viable Completion:

  • CLI with add, query, and delete commands
  • Neo4j connection working
  • Can create entities and relationships

Full Completion:

  • All CRUD operations working
  • Relationship traversal in queries
  • ASCII or formatted visualization
  • Proper error handling

Excellence (Going Above & Beyond):

  • Temporal properties on relationships
  • Import/export functionality
  • Natural language fact parsing