Project 5: Bi-Temporal Fact Store

Build a fact storage system with both valid time (when facts are true) and transaction time (when facts were recorded), enabling temporal queries like “What did we know on date X about Y?”

Quick Reference

Attribute       Value
Difficulty      Level 3: Advanced
Time Estimate   2 weeks (25-35 hours)
Language        Python (Alternatives: TypeScript, Go)
Prerequisites   Projects 1-4, temporal databases, SQL datetime handling
Key Topics      Bi-temporal modeling, valid time, transaction time, temporal queries, point-in-time reconstruction, Allen’s interval algebra

1. Learning Objectives

By completing this project, you will:

  1. Understand the difference between valid time and transaction time.
  2. Design a bi-temporal schema for knowledge graph facts.
  3. Implement temporal CRUD operations that preserve history.
  4. Build queries for point-in-time reconstruction.
  5. Handle temporal predicates (before, during, overlaps, etc.).

2. Theoretical Foundation

2.1 Core Concepts

  • Valid Time (VT): When a fact is true in the real world. “Alice worked at Acme from 2020 to 2023.”

  • Transaction Time (TT): When a fact was recorded in the system. “We learned this on 2024-01-15.”

  • Bi-Temporal Model: Combining both dimensions allows four types of queries:
    • Current knowledge about current facts
    • Current knowledge about past facts
    • Past knowledge about past facts (what we knew then)
    • Past knowledge versus current knowledge (what was corrected and when, for audit trails)
  • Temporal Predicates: Allen’s interval algebra defines 13 relations between time intervals:
    • before, meets, overlaps, starts, during, finishes, their six inverses (after, met-by, overlapped-by, started-by, contains, finished-by), and equals
  • Snapshot vs. Delta Storage: Store full state at each point vs. store changes only.
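
The 13 Allen relations mentioned above can be told apart with a handful of endpoint comparisons. The following is a minimal sketch, not a reference implementation: the function name allen_relation is hypothetical, and it assumes both intervals are bounded with start < end (an ongoing fact with no end date would need a sentinel such as date.max first).

```python
from datetime import date


def allen_relation(a_start, a_end, b_start, b_end):
    """Name the Allen relation of interval A relative to interval B.

    Assumes bounded intervals with start < end; works for any comparable
    endpoints (dates, datetimes, integers).
    """
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains; the earlier start decides the direction.
    return "overlaps" if a_start < b_start else "overlapped-by"


employment = (date(2020, 3, 15), date(2023, 6, 30))
year_2021 = (date(2021, 1, 1), date(2022, 1, 1))
print(allen_relation(*year_2021, *employment))  # during
```

Each pair of bounded intervals falls into exactly one branch, which makes the function a convenient oracle for unit tests of the SQL predicates later in the project.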

2.2 Why This Matters

AI memory without temporality is broken:

  • “Alice works at Acme” vs “Alice worked at Acme” (validity)
  • “User said X yesterday but corrected to Y today” (transaction)
  • “What did the agent believe last week?” (debugging, audit)

Without bi-temporal tracking, you can’t distinguish outdated facts from corrections.

2.3 Common Misconceptions

  • “Just use updated_at.” That’s transaction time only—you lose when facts were actually true.
  • “Delete old facts.” Deletion destroys audit trails and makes debugging impossible.
  • “One timestamp is enough.” You need two independent dimensions for full temporal reasoning.

2.4 ASCII Diagram: Bi-Temporal Space

                     TRANSACTION TIME (when recorded)
                     ─────────────────────────────────►
                     Jan      Feb      Mar      Apr
                  ┌────────────────────────────────────┐
              2020│ ░░░░░░░░ ░░░░░░░░                  │
                  │          (recorded in Feb that     │
     V            │           Alice worked at Acme     │
     A            │           from 2020)               │
     L        2021│                                    │
     I            │                                    │
     D            │                                    │
                  │                                    │
     T        2022│          ████████ ████████         │
     I            │          (recorded in Mar that     │
     M            │           Alice left Acme in 2022) │
     E            │                                    │
              2023│                    ▓▓▓▓▓▓▓▓        │
     ↓            │                   (correction:     │
                  │                    actually 2023)  │
                  └────────────────────────────────────┘

Query: "What did we know in February about Alice's employment?"
Answer: Alice works at Acme (started 2020, no end recorded yet)

Query: "What do we know NOW about Alice's employment?"
Answer: Alice worked at Acme 2020-2023 (corrected in April)

2.5 Bi-Temporal Fact Example

FACT: Alice works at Acme

Version 1 (recorded Feb 1):
┌─────────────────────────────────────────────────────┐
│ subject: Alice                                      │
│ predicate: WORKS_AT                                 │
│ object: Acme                                        │
│ valid_from: 2020-03-15                              │
│ valid_to: NULL (ongoing)                            │
│ tx_from: 2024-02-01                                 │
│ tx_to: NULL (current version)                       │
└─────────────────────────────────────────────────────┘

Version 2 (recorded Apr 1, correcting end date):
┌─────────────────────────────────────────────────────┐
│ subject: Alice                                      │
│ predicate: WORKS_AT                                 │
│ object: Acme                                        │
│ valid_from: 2020-03-15                              │
│ valid_to: 2023-06-30  ◄── Correction                │
│ tx_from: 2024-04-01   ◄── New version               │
│ tx_to: NULL (current version)                       │
└─────────────────────────────────────────────────────┘

Previous version now closed:
┌─────────────────────────────────────────────────────┐
│ ... (same as Version 1)                             │
│ tx_to: 2024-04-01  ◄── No longer current            │
└─────────────────────────────────────────────────────┘
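
The version boxes above can be mirrored in a few lines of plain Python. This is only an illustrative sketch: FactVersion, correct, and known_at are hypothetical names, transaction time is simplified to a date, and the transaction interval is assumed to be half-open (tx_from inclusive, tx_to exclusive).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    subject: str
    predicate: str
    object: str
    valid_from: date
    valid_to: Optional[date]      # None means ongoing
    tx_from: date
    tx_to: Optional[date] = None  # None means current version


def correct(versions, recorded_on, **changes):
    """Close the current version and append a corrected copy; nothing is overwritten."""
    current = next(v for v in versions if v.tx_to is None)
    closed = replace(current, tx_to=recorded_on)
    new = replace(current, tx_from=recorded_on, tx_to=None, **changes)
    return [closed if v is current else v for v in versions] + [new]


def known_at(versions, when):
    """Versions the system believed at transaction time `when`."""
    return [v for v in versions
            if v.tx_from <= when and (v.tx_to is None or when < v.tx_to)]


v1 = FactVersion("Alice", "WORKS_AT", "Acme",
                 valid_from=date(2020, 3, 15), valid_to=None,
                 tx_from=date(2024, 2, 1))
versions = correct([v1], recorded_on=date(2024, 4, 1), valid_to=date(2023, 6, 30))

print(known_at(versions, date(2024, 3, 1))[0].valid_to)  # None (still ongoing)
print(known_at(versions, date(2024, 5, 1))[0].valid_to)  # 2023-06-30
```

Because the old version is closed rather than edited, a query pinned to March still sees the open-ended employment even after the April correction.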

3. Project Specification

3.1 What You Will Build

A Python fact store with:

  • Bi-temporal fact storage (valid time + transaction time)
  • Temporal CRUD operations
  • Point-in-time queries
  • Interval predicates for complex temporal logic

3.2 Functional Requirements

  1. Store fact: store.add_fact(subject, predicate, object, valid_from, valid_to)
  2. Update fact: store.update_fact(fact_id, valid_to=new_date) (creates new version)
  3. Invalidate fact: store.invalidate(fact_id) (closes the current version by setting tx_to; the row is kept)
  4. Query current: store.query(subject, predicate) → current valid facts
  5. Query at time: store.query_at(subject, predicate, as_of, known_at) → point-in-time (as_of selects valid time, known_at selects transaction time)
  6. Temporal predicates: store.query(valid_time_overlaps=(start, end))
  7. Fact history: store.history(fact_id) → all versions

3.3 Example Usage / Output

from temporal_store import BiTemporalFactStore
from datetime import date

store = BiTemporalFactStore()

# Record that Alice works at Acme (learned today).
# add_fact is assumed here to return the id of the new fact.
fact_id = store.add_fact(
    subject="Alice",
    predicate="WORKS_AT",
    object="Acme",
    valid_from=date(2020, 3, 15),
    valid_to=None  # ongoing
)

# Later, we learn she left (closes the old version and creates a new one)
store.update_fact(fact_id, valid_to=date(2023, 6, 30))

# Query current knowledge
facts = store.query(subject="Alice", predicate="WORKS_AT")
print(facts)
# [Fact(object="Acme", valid_from=2020-03-15, valid_to=2023-06-30)]

# Query what we knew last month
facts = store.query_at(
    subject="Alice",
    predicate="WORKS_AT",
    known_at=date(2024, 3, 1)  # Before correction
)
print(facts)
# [Fact(object="Acme", valid_from=2020-03-15, valid_to=None)]  # Still ongoing!

# Query facts valid during 2021
facts = store.query(
    subject="Alice",
    valid_time_overlaps=(date(2021, 1, 1), date(2021, 12, 31))
)
print(facts)
# [Fact(predicate="WORKS_AT", object="Acme", ...)]

# Get full history (tx_from/tx_to bound when each version was the recorded one)
history = store.history(fact_id)
for version in history:
    print(f"Version recorded {version.tx_from} - {version.tx_to}: {version}")

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────┐
│  BiTemporalFactStore│
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌─────────┐ ┌─────────────┐
│ SQLite  │ │  Temporal   │
│ Storage │ │  Index      │
└─────────┘ └─────────────┘
     │           │
     └─────┬─────┘
           ▼
    ┌─────────────┐
    │   Query     │
    │   Engine    │
    └─────────────┘

4.2 Key Components

Component       Responsibility                            Technology
FactStore       CRUD operations with temporal semantics   Python class
TemporalIndex   Efficient interval queries                Interval trees or SQL indexes
QueryEngine     Temporal predicate evaluation             Allen’s algebra implementation
VersionManager  Handle fact versioning                    Transaction time tracking

4.3 Data Model

CREATE TABLE facts (
    id TEXT PRIMARY KEY,
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    properties JSON,

    -- Valid time (when true in reality)
    valid_from DATE NOT NULL,
    valid_to DATE,  -- NULL means ongoing

    -- Transaction time (when recorded)
    tx_from TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    tx_to TIMESTAMP,  -- NULL means current version

    -- Provenance
    source_episode TEXT,
    confidence REAL DEFAULT 1.0
);

-- Index for current facts
CREATE INDEX idx_current ON facts(subject, predicate)
    WHERE tx_to IS NULL;

-- Index for valid time queries
CREATE INDEX idx_valid_time ON facts(valid_from, valid_to);
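
To see how a schema of this shape answers a point-in-time question, here is a small sketch using the standard-library sqlite3 module. It keeps only the temporal columns, stores dates as ISO-8601 text (which sorts correctly as strings), and assumes half-open intervals on both time axes. The query_at helper and the two sample rows are illustrative, not part of the project code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        id TEXT PRIMARY KEY, subject TEXT NOT NULL, predicate TEXT NOT NULL,
        object TEXT NOT NULL, valid_from DATE NOT NULL, valid_to DATE,
        tx_from TIMESTAMP NOT NULL, tx_to TIMESTAMP)
""")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # Version 1: recorded Feb 1, superseded Apr 1
        ("v1", "Alice", "WORKS_AT", "Acme", "2020-03-15", None, "2024-02-01", "2024-04-01"),
        # Version 2: recorded Apr 1, still current
        ("v2", "Alice", "WORKS_AT", "Acme", "2020-03-15", "2023-06-30", "2024-04-01", None),
    ],
)

# Both axes get the same test: start <= point AND (end IS NULL OR point < end).
QUERY_AT = """
    SELECT id, valid_from, valid_to FROM facts
    WHERE subject = :subject AND predicate = :predicate
      AND valid_from <= :as_of AND (valid_to IS NULL OR :as_of < valid_to)
      AND tx_from <= :known_at AND (tx_to IS NULL OR :known_at < tx_to)
"""


def query_at(as_of, known_at, subject="Alice", predicate="WORKS_AT"):
    params = {"subject": subject, "predicate": predicate,
              "as_of": as_of, "known_at": known_at}
    return conn.execute(QUERY_AT, params).fetchall()


# Was Alice at Acme on 2023-12-01? Depends on when you ask.
print(query_at(as_of="2023-12-01", known_at="2024-03-01"))  # [('v1', '2020-03-15', None)]
print(query_at(as_of="2023-12-01", known_at="2024-05-01"))  # []
```

The explicit IS NULL branches matter: in SQL a comparison against NULL is never true, so without them ongoing facts and current versions would silently drop out of the result.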

5. Implementation Guide

5.1 Development Environment Setup

mkdir bi-temporal-store && cd bi-temporal-store
python -m venv .venv && source .venv/bin/activate
pip install pydantic sqlalchemy intervaltree

5.2 Project Structure

bi-temporal-store/
├── src/
│   ├── store.py          # BiTemporalFactStore
│   ├── models.py         # Fact, Query models
│   ├── temporal.py       # Allen's algebra
│   ├── indexing.py       # Interval tree index
│   └── queries.py        # Query builder
├── tests/
│   ├── test_temporal.py
│   └── test_queries.py
└── README.md

5.3 Implementation Phases

Phase 1: Basic Bi-Temporal Storage (8-10h)

Goals:

  • Store facts with valid_time and tx_time
  • Implement version-creating updates

Tasks:

  1. Design SQLite schema with temporal columns
  2. Implement add_fact with automatic tx_from
  3. Implement update_fact that closes old version, creates new
  4. Implement invalidate that sets tx_to

Checkpoint: Facts can be added and updated with version history.
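
One possible shape for the Phase 1 tasks is sketched below with the standard-library sqlite3 module. MiniStore is a hypothetical stand-in, not the final BiTemporalFactStore. It makes two assumptions beyond the schema in 4.3: an extra fact_id column that groups all versions of one logical fact, and an optional now argument so tests can pin transaction time.

```python
import sqlite3
import uuid
from datetime import date, datetime, timezone


def utc_now():
    """Current time as an ISO-8601 UTC string (sorts correctly as text)."""
    return datetime.now(timezone.utc).isoformat()


class MiniStore:
    """Hypothetical, stripped-down stand-in for BiTemporalFactStore."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        # Assumption: fact_id ties versions together; id is unique per version row.
        self.conn.execute(
            """CREATE TABLE facts (
                   id TEXT PRIMARY KEY, fact_id TEXT NOT NULL,
                   subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,
                   valid_from TEXT NOT NULL, valid_to TEXT,
                   tx_from TEXT NOT NULL, tx_to TEXT)"""
        )

    def _insert_version(self, fact_id, subject, predicate, obj, valid_from, valid_to, now):
        self.conn.execute(
            "INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, ?, ?, NULL)",
            (str(uuid.uuid4()), fact_id, subject, predicate, obj, valid_from, valid_to, now),
        )

    def add_fact(self, subject, predicate, obj, valid_from, valid_to=None, now=None):
        fact_id = str(uuid.uuid4())
        with self.conn:  # commit on success, roll back on error
            self._insert_version(fact_id, subject, predicate, obj,
                                 valid_from.isoformat(),
                                 valid_to.isoformat() if valid_to else None,
                                 now or utc_now())
        return fact_id

    def invalidate(self, fact_id, now=None):
        with self.conn:
            self.conn.execute(
                "UPDATE facts SET tx_to = ? WHERE fact_id = ? AND tx_to IS NULL",
                (now or utc_now(), fact_id),
            )

    def update_fact(self, fact_id, valid_to, now=None):
        now = now or utc_now()
        with self.conn:  # closing the old row and inserting the new one succeed together
            row = self.conn.execute(
                "SELECT subject, predicate, object, valid_from FROM facts "
                "WHERE fact_id = ? AND tx_to IS NULL", (fact_id,)).fetchone()
            if row is None:
                raise KeyError(f"no current version for fact {fact_id}")
            self.conn.execute(
                "UPDATE facts SET tx_to = ? WHERE fact_id = ? AND tx_to IS NULL",
                (now, fact_id))
            self._insert_version(fact_id, *row,
                                 valid_to.isoformat() if valid_to else None, now)

    def history(self, fact_id):
        return self.conn.execute(
            "SELECT valid_to, tx_from, tx_to FROM facts WHERE fact_id = ? ORDER BY tx_from",
            (fact_id,)).fetchall()


store = MiniStore()
fid = store.add_fact("Alice", "WORKS_AT", "Acme", date(2020, 3, 15),
                     now="2024-02-01T00:00:00+00:00")
store.update_fact(fid, date(2023, 6, 30), now="2024-04-01T00:00:00+00:00")
for row in store.history(fid):
    print(row)
```

Reusing the same timestamp for the old row's tx_to and the new row's tx_from leaves no gap and no overlap between versions, so every transaction-time point maps to at most one version.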

Phase 2: Temporal Queries (8-10h)

Goals:

  • Query current facts
  • Query as-of any point in time

Tasks:

  1. Implement query for current facts (tx_to IS NULL)
  2. Implement query_at with both as_of and known_at parameters
  3. Add valid_time_overlaps predicate
  4. Build history retrieval

Checkpoint: Can query “what did we know on date X about Y?”
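
The valid_time_overlaps predicate from this phase reduces to one comparison per endpoint once ongoing facts are handled. A minimal sketch, assuming half-open [from, to) intervals and None for an open end; the function name overlaps is hypothetical:

```python
from datetime import date


def overlaps(a_from, a_to, b_from, b_to):
    """True if half-open intervals [a_from, a_to) and [b_from, b_to) share any day.

    None as an end date means the interval is ongoing.
    """
    a_to = a_to or date.max
    b_to = b_to or date.max
    return a_from < b_to and b_from < a_to


window = (date(2021, 1, 1), date(2022, 1, 1))

# Ongoing employment that started in 2020 overlaps the 2021 window.
print(overlaps(date(2020, 3, 15), None, *window))               # True
# An interval that ends exactly where the window starts only "meets" it.
print(overlaps(date(2020, 1, 1), date(2021, 1, 1), *window))    # False
```

Under the same assumptions, the SQL form of this test would be valid_from < :end AND (valid_to IS NULL OR valid_to > :start).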

Phase 3: Advanced Predicates (6-8h)

Goals:

  • Full Allen’s interval algebra
  • Complex temporal logic

Tasks:

  1. Implement all 13 Allen relations
  2. Add compound temporal queries
  3. Optimize with interval tree indexes
  4. Add timeline visualization

Checkpoint: Complex temporal reasoning working.
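
For the timeline visualization task, a text rendering is often enough to debug interval logic. The sketch below is one simple approach, assuming year granularity and half-open ranges; render_timeline is a hypothetical helper and "Globex" is made-up sample data.

```python
def render_timeline(intervals, start_year, end_year):
    """Draw one row per (label, from_year, to_year); to_year None means ongoing.

    Year ranges are half-open: a fact with to_year 2023 is not drawn in 2023.
    """
    lines = []
    for label, y_from, y_to in intervals:
        y_to = end_year if y_to is None else y_to
        cells = "".join("█" if y_from <= y < y_to else "·"
                        for y in range(start_year, end_year))
        lines.append(f"{label:<12}{cells}")
    return "\n".join(lines)


print(render_timeline(
    [("Acme", 2020, 2023), ("Globex", 2023, None)],
    start_year=2019, end_year=2025,
))
```

Each column is one year from start_year up to (but not including) end_year, so adjacent facts that meet show up as blocks that touch without overlapping.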


6. Testing Strategy

6.1 Test Categories

Category     Purpose                   Examples
Unit         Test temporal predicates  Interval overlap logic
Integration  Test versioning           Update → query old version
Regression   Test edge cases           NULL end dates, same-day changes

6.2 Critical Test Cases

  1. Version isolation: Update doesn’t affect past queries
  2. Current facts: tx_to IS NULL filter works
  3. Overlapping intervals: Correct interval detection
  4. NULL handling: Ongoing facts (valid_to=NULL) handled correctly

7. Common Pitfalls & Debugging

Pitfall                 Symptom                     Solution
Timezone confusion      Facts appear on wrong date  Use UTC everywhere, convert at display
Inclusive vs exclusive  Off-by-one errors           Document and test boundary behavior
NULL comparisons        Queries miss ongoing facts  Handle NULL explicitly in SQL
Version ordering        Wrong version returned      Order by tx_from DESC, take first
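
The timezone pitfall is easy to reproduce: a timestamp taken shortly after midnight in a UTC+2 zone belongs to the previous calendar day in UTC. One way to guard against it, sketched here with a hypothetical to_utc_iso helper, is to reject naive datetimes and normalize everything to UTC before storing.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(dt):
    """Normalize an aware datetime to a UTC ISO-8601 string for storage."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before storing")
    return dt.astimezone(timezone.utc).isoformat()


local = datetime(2024, 4, 1, 1, 30, tzinfo=timezone(timedelta(hours=2)))
print(to_utc_iso(local))  # 2024-03-31T23:30:00+00:00
```

Storing a single normalized representation also keeps plain string comparison of tx_from and tx_to consistent with chronological order.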

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add CLI for temporal queries
  • Add fact timeline ASCII visualization

8.2 Intermediate Extensions

  • Implement temporal joins (facts valid at same time)
  • Add coalescing for adjacent intervals
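
Coalescing merges valid-time intervals of the same fact that overlap or touch, so that two adjacent rows read as one continuous period. A minimal sketch, assuming half-open (start, end) pairs, None for an ongoing end, and input already restricted to one subject/predicate/object; the function name coalesce is hypothetical:

```python
from datetime import date


def coalesce(intervals):
    """Merge half-open (start, end) intervals that overlap or touch.

    end=None means ongoing. Returns a new list sorted by start.
    """
    merged = []
    for start, end in sorted(intervals, key=lambda iv: iv[0]):
        if merged:
            last_start, last_end = merged[-1]
            if last_end is None:
                continue  # previous interval is open-ended and absorbs the rest
            if start <= last_end:  # overlaps or touches the previous interval
                new_end = None if end is None else max(last_end, end)
                merged[-1] = (last_start, new_end)
                continue
        merged.append((start, end))
    return merged


periods = [
    (date(2021, 1, 1), date(2022, 1, 1)),
    (date(2020, 1, 1), date(2021, 1, 1)),  # touches the interval above
    (date(2023, 1, 1), None),              # separate, ongoing
]
print(coalesce(periods))
```

With the sample data, the first two periods collapse into a single 2020-01-01 to 2022-01-01 interval, while the ongoing period from 2023 stays separate.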

8.3 Advanced Extensions

  • Integrate with Neo4j temporal properties
  • Build temporal reasoning rules engine

9. Real-World Connections

9.1 Industry Applications

  • Financial Systems: Trading records, audit trails
  • Healthcare: Patient history with corrections
  • AI Memory: Graphiti’s temporal edge model

9.2 Interview Relevance

  • Explain bi-temporal vs uni-temporal modeling
  • Discuss immutability and audit requirements
  • Describe temporal query optimization strategies

10. Resources

10.1 Essential Reading

  • “Temporal Data & the Relational Model” by Date, Darwen, Lorentzos
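  • “Designing Data-Intensive Applications” by Kleppmann, Ch. 11 (section on Change Data Capture)
  • “Maintaining Knowledge about Temporal Intervals” by James F. Allen, Communications of the ACM, 1983 (the interval algebra paper)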

10.2 Related Projects

  • Previous: Project 4 (Entity Resolution)
  • Next: Project 6 (Temporal Query Engine)

11. Self-Assessment Checklist

  • I can explain the difference between valid time and transaction time
  • I understand why updates create new versions instead of modifying
  • I can query “what did we know at time X about Y at time Z”
  • I know Allen’s basic interval relations

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Facts stored with valid_from, valid_to, tx_from, tx_to
  • Updates create new versions
  • Basic query_at working

Full Completion:

  • All temporal query types implemented
  • Allen’s interval predicates
  • Full version history retrieval

Excellence:

  • Interval tree optimization
  • Temporal join operations
  • Integration with graph database