LEARN ORMS DEEP DIVE
Learn ORMs: From First Principles to Advanced Implementation
Goal: Deeply understand Object-Relational Mapping (ORM)—from the raw mechanics of mapping C structs to SQL, to building a feature-rich, modern ORM in Python with sessions, relationships, and migrations.
Why Learn ORMs?
Object-Relational Mapping is the bridge between the object-oriented world of your application and the relational world of your database. Most developers use ORMs like SQLAlchemy or Django ORM, but treat them as a black box. Understanding how they work unlocks performance tuning, better data modeling, and the ability to solve complex database interaction problems.
After completing these projects, you will:
- Understand the impedance mismatch between objects and relational tables.
- Know how to map objects to tables, handle relationships, and manage data consistency.
- Be able to build your own ORM from scratch, understanding patterns like Unit of Work and Identity Map.
- Use existing ORMs more effectively, diagnosing performance issues and leveraging advanced features.
Core Concept Analysis
The Object-Relational Impedance Mismatch
This is the core problem ORMs solve. Object-oriented and relational paradigms are fundamentally different.
| Aspect | Object-Oriented (e.g., Python/C) | Relational (e.g., SQL) | The Gap (and ORM’s Solution) |
|---|---|---|---|
| Identity | Object identity (memory address) | Primary key | Identity Map Pattern: Ensures one row = one object instance. |
| Structure | Graphs of objects (pointers/references) | Flat rows in tables | Relationship Mapping: Manages foreign keys to represent one-to-many, many-to-many. |
| Data Types | Rich types (classes, enums, lists) | Simple types (INT, VARCHAR, TEXT) | Type Conversion: Automatically converts between application and DB types. |
| Inheritance | Class hierarchies | No direct equivalent | Inheritance Mapping Strategies: Single table, class table, etc. |
| State | In-memory object state | Persistent database state | Unit of Work Pattern: Tracks changes and syncs them to the DB in a transaction. |
The Anatomy of an ORM
┌───────────────────────────────────────────────────┐
│ Application Code │
│ user = User(name="Alice") │
│ session.save(user) │
│ session.commit() │
└───────────────────────────────────────────────────┘
│ (Calls to ORM Public API)
▼
┌───────────────────────────────────────────────────┐
│ Session / Unit of Work │
│ • Tracks objects (new, dirty, deleted) │
│ • Manages the Identity Map (cache) │
│ • Orchestrates transactions │
└───────────────────────────────────────────────────┘
│ (Delegates to Mapper)
▼
┌───────────────────────────────────────────────────┐
│ Mapper / Registry │
│ • Maps User class to 'users' table │
│ • Maps user.name to 'name' column │
│ • Defines relationships (e.g., user.posts) │
└───────────────────────────────────────────────────┘
│ (Uses Query Builder)
▼
┌───────────────────────────────────────────────────┐
│ Query Builder (DSL) │
│ • Creates SQL strings from Python expressions │
│ session.query(User).filter(User.name == "Alice")│
│ -> "SELECT id, name FROM users WHERE name = ?" │
└───────────────────────────────────────────────────┘
│ (Executes via DB-API Driver)
▼
┌───────────────────────────────────────────────────┐
│ Database Connector (DB-API) │
│ • Executes SQL against the database │
│ • Fetches raw rows (tuples/dicts) │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ Database (PostgreSQL, etc.) │
└───────────────────────────────────────────────────┘
Project List
These 10 projects will guide you from the absolute basics of manual mapping to building a sophisticated ORM.
Project 1: The Manual “ORM” in C
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: C++ (without an ORM lib), Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Database Interaction / Memory Management
- Software or Tool: libpq (PostgreSQL) or SQLite3 C API
- Main Book: “The C Programming Language” by Kernighan & Ritchie
What you’ll build: A set of C functions to perform basic CRUD (Create, Read, Update, Delete) operations for a single User struct by manually writing SQL strings and mapping results to and from the struct.
Why it teaches ORMs: This project makes you feel the pain that ORMs were invented to solve. You will write boilerplate code to convert structs to SQL and SQL results back to structs. You’ll understand the “mapping” problem at the most fundamental level.
Core challenges you’ll face:
- Manual SQL string formatting → maps to risk of SQL injection and the need for a query builder
- Mapping rows to structs → maps to the concept of “hydration” or “deserialization”
- Handling memory management → maps to understanding object lifecycles (who frees the struct?)
- Error handling with database drivers → maps to the need for a robust data layer
Key Concepts:
- Database Driver APIs: libpq documentation or SQLite3 C API docs
- Structs and Pointers: “The C Programming Language” Ch. 5-6 - K&R
- String Manipulation in C: “The C Programming Language” Ch. 5 - K&R
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Strong C programming, pointers, structs, dynamic memory allocation.
Real world outcome:
// Your C code
User* user = find_user_by_id(db_conn, 1);
if (user) {
printf("Found user: ID=%d, Name=%s\n", user->id, user->name);
// Update the user's name
strcpy(user->name, "Alice Smith");
update_user(db_conn, user);
free_user(user);
}
// In the background, your functions are doing this:
// find_user_by_id -> "SELECT id, name, email FROM users WHERE id = 1;"
// update_user -> "UPDATE users SET name = 'Alice Smith', ... WHERE id = 1;"
Implementation Hints:
Start with a simple User struct:
// Not real code, just for guidance
typedef struct {
int id;
char name[100];
char email[100];
} User;
// Function to create a user (builds an INSERT statement)
int create_user(PGconn* conn, const char* name, const char* email);
// Function to find a user (builds a SELECT, allocates and returns a User*)
User* find_user_by_id(PGconn* conn, int id);
// Function to free the memory for a user struct
void free_user(User* user);
Questions to guide you:
- How do you safely insert a string containing a single quote (like “O’Malley”) into the database? This reveals the need for parameter binding.
- If
find_user_by_idreturns aUser*, who is responsible for callingfree()on it? This highlights the challenge of object lifecycle management. - What happens if the database schema changes (e.g., a column is added)? Your manual mapping code breaks.
Learning milestones:
- Successfully save a struct → You understand serialization.
- Successfully read a row into a struct → You understand hydration.
- Handle updates and deletions → You complete the CRUD cycle.
- Appreciate how much boilerplate this is → You understand the core value proposition of an ORM.
Project 2: A Python Active Record ORM
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Ruby, C#
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Metaprogramming / Object-Oriented Design
- Software or Tool: sqlite3 or psycopg2 Python libraries
- Main Book: “Fluent Python” by Luciano Ramalho
What you’ll build: A base class Model that other classes can inherit from to get automatic CRUD functionality. This is the Active Record pattern, famous from Ruby on Rails.
Why it teaches ORMs: This project introduces the magic. You’ll use Python’s metaprogramming features (like __init_subclass__ or metaclasses) to inspect a class (e.g., User) and automatically know the table name and columns, generating SQL on the fly.
Core challenges you’ll face:
- Inspecting a class’s attributes → maps to automatic schema detection
- Dynamically generating SQL → maps to building a simple query builder
- Mixing class methods and instance methods → maps to
User.find()vsuser.save() - Separating object state from persistence logic → maps to the core Active Record pattern
Key Concepts:
- Active Record Pattern: “Patterns of Enterprise Application Architecture” - Martin Fowler
- Python Metaclasses: “Fluent Python” Ch. 21 - Luciano Ramalho
- Python’s
sqlite3module: Python official documentation
Difficulty: Intermediate Time estimate: 1-2 weeks
- Prerequisites: Solid Python OOP, basic understanding of SQL.
Real world outcome:
# Your user-facing code looks clean and declarative
class User(Model):
__table__ = "users"
id: int
name: str
email: str
# And the usage is elegant:
user = User(name="Bob", email="bob@example.com")
user.save() # Automatically generates and runs an INSERT query
found_user = User.find(id=user.id) # Runs a SELECT query
print(found_user.name) # "Bob"
found_user.name = "Robert"
found_user.save() # Automatically generates and runs an UPDATE query
Implementation Hints:
The Model base class will do all the heavy lifting.
# Not real code, for guidance
class Model:
# A class-level dictionary to store field names and types
_fields = {}
# This is where the magic happens. When a subclass like User is defined,
# this method is called.
def __init_subclass__(cls):
# Inspect the class's annotations (e.g., name: str)
# to populate _fields and determine table name.
...
def __init__(self, **kwargs):
# Initialize instance attributes from kwargs
...
def save(self):
# If self.id exists, generate an UPDATE statement.
# Otherwise, generate an INSERT statement.
# Execute the SQL.
...
@classmethod
def find(cls, id):
# Generate a "SELECT * FROM {cls.__table__} WHERE id = ?"
# Execute it, get the row.
# Create an instance of the class (e.g., User) with the row data.
# Return the instance.
...
You’ll need a global database connection object. How do you manage it? A simple global variable is a start, but this leads to thinking about connection management.
Learning milestones:
- A
Userclass can be defined with type hints → You understand schema declaration. user.save()creates a new row → You have a workingINSERTgenerator.User.find(1)returns aUserobject → You have working hydration.- Saving an existing object updates the correct row → You have a working
UPDATEgenerator.
Project 3: A Fluent Query Builder API
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: JavaScript, C#
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: API Design / Domain Specific Languages (DSL)
- Software or Tool: None (pure logic)
- Main Book: “Design Patterns: Elements of Reusable Object-Oriented Software” (Builder Pattern)
What you’ll build: A chainable API that constructs complex SQL queries programmatically, without ever making you write raw SQL.
Why it teaches ORMs: This isolates a critical component of any ORM: the Domain Specific Language (DSL) for queries. It teaches you how to translate object-oriented method calls into a structured, safe SQL query string.
Core challenges you’ll face:
- Chainable method design → maps to returning
selfto allowQuery().select().where().limit() - Managing query state → maps to storing selects, table names, and where clauses internally
- Separating query building from execution → maps to the
build()method vs theexecute()method - Handling different WHERE clause operators → maps to
equals,greater_than,like,in
Key Concepts:
- Builder Pattern: “Design Patterns” - Gamma et al.
- Fluent Interfaces: Martin Fowler’s article on the topic.
- Operator Overloading: “Fluent Python” Ch. 13 (for
__eq__,__gt__, etc. on columns)
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Python OOP.
Real world outcome:
# Your API allows for clean, readable query construction
query = (
Query()
.select("id", "name", "email")
.from_table("users")
.where("age", ">", 18)
.where("account_type", "=", "premium")
.order_by("name", "ASC")
.limit(10)
)
# The builder can output the final SQL and its parameters
sql, params = query.build()
# sql == "SELECT id, name, email FROM users WHERE age > ? AND account_type = ? ORDER BY name ASC LIMIT ?;"
# params == [18, "premium", 10]
# You can then execute this with any DB driver
results = db_connection.execute(sql, params).fetchall()
Implementation Hints:
Your Query class will hold the pieces of the SQL statement in lists.
# Guidance, not real code
class Query:
def __init__(self):
self._select_cols = []
self._from = ""
self._where_clauses = []
# ... other parts
def select(self, *cols):
self._select_cols.extend(cols)
return self # This is the key to chaining!
def from_table(self, table_name):
self._from = table_name
return self
def where(self, col, op, val):
self._where_clauses.append((col, op, val))
return self
def build(self):
# Assemble the parts into a SQL string and a list of parameters
sql = f"SELECT {', '.join(self._select_cols)} FROM {self._from}"
if self._where_clauses:
# build the WHERE part
...
return sql, params
A more advanced version would use operator overloading on column objects, so you could write User.age > 18 instead of where("age", ">", 18).
Learning milestones:
- A simple
select().from()query can be built → You understand the basic builder flow. - Multiple
where()clauses are joined byAND→ You are managing query state. - The
.build()method produces a correct, parameterized query → You have successfully separated building from execution and prevented SQL injection. order_by()andlimit()are added → You can handle the full query structure.
Project 4: Relationship Mapping (One-to-Many)
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Any OO language
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: ORM Internals / Data Loading Patterns
- Software or Tool: Your existing Python ORM
- Main Book: “Patterns of Enterprise Application Architecture” by Martin Fowler
What you’ll build: Extend your Active Record ORM to handle a one-to-many relationship. For example, a User has many Posts. Accessing user.posts should automatically fetch the related posts from the database.
Why it teaches ORMs: This is where ORMs start to show their true power and complexity. You’ll have to implement “lazy loading,” where related objects are only fetched when they are first accessed. This introduces concepts of proxies and dynamic attribute access.
Core challenges you’ll face:
- Lazy loading vs. Eager loading → maps to performance trade-offs in data fetching
- Dynamic attribute access → maps to using
__getattr__or descriptors to intercept access touser.posts - Caching results → maps to storing the loaded posts so you don’t query the DB every time
user.postsis accessed - Handling foreign keys → maps to associating a
Postwith aUserviauser_id
Key Concepts:
- Lazy Load: “Patterns of Enterprise Application Architecture” - Martin Fowler
- Foreign Key Mapping: “Patterns of Enterprise Application Architecture” - Martin Fowler
- Python Descriptors: “Fluent Python” Ch. 20 - Luciano Ramalho
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2 (Active Record ORM).
Real world outcome:
class User(Model):
# ...
posts = has_many("Post")
class Post(Model):
# ...
user = belongs_to("User")
# Find a user
user = User.find(1)
# The first time you access .posts, a query is fired
print("About to access posts...")
for post in user.posts: # SELECT * FROM posts WHERE user_id = 1;
print(post.title)
# The second time, no query is fired (it's cached)
print("Accessing posts again...")
print(f"User has {len(user.posts)} posts.") # No DB call
Implementation Hints: You’ll need special descriptor objects for relationships.
# Guidance
class has_many:
def __init__(self, related_class_name):
self.related_class_name = related_class_name
self._cache_name = f"_{related_class_name}_cache"
def __get__(self, instance, owner):
# instance is the User object
if not hasattr(instance, self._cache_name):
# Find the related class (e.g., Post)
RelatedModel = find_model_by_name(self.related_class_name)
# Find the foreign key (e.g., user_id)
fk_name = f"{owner.__name__.lower()}_id"
# Query the database for related objects
related_objects = RelatedModel.where({fk_name: instance.id})
# Cache the result on the instance
setattr(instance, self._cache_name, related_objects)
return getattr(instance, self._cache_name)
This is a simplified example. A real implementation is more complex, but this shows the core idea of intercepting attribute access to trigger a database query.
Learning milestones:
- Declare a
has_manyrelationship on a model → You have a way to define object graphs. - Accessing
user.poststriggers a lazy-load query → Your descriptor logic is working. - The results of the lazy-load are cached → You are preventing redundant DB calls.
- You can add a
belongs_torelationship for the other side → You have bi-directional relationships.
Project 5: The Identity Map Pattern
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Java, C#
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: ORM Internals / Caching
- Software or Tool: Your existing Python ORM
- Main Book: “Patterns of Enterprise Application Architecture” by Martin Fowler
What you’ll build: A “session” or “context” object that maintains a cache of all objects loaded from the database during its lifetime. If you ask for the user with ID 1 twice, you get the exact same Python object instance both times.
Why it teaches ORMs: This is a cornerstone of modern data mapper ORMs (like SQLAlchemy). It ensures data consistency. If you change an attribute on an object, all other parts of your code that hold a reference to that object see the change immediately, because they are all referencing the same object.
Core challenges you’ll face:
- Scoping the cache → maps to the lifecycle of a session object
- Creating a unique key for objects → maps to using (class, primary_key) as the dictionary key
- Integrating with your data-loading logic → maps to checking the map before querying the database
- Deciding when to clear the map → maps to committing or closing a session
Key Concepts:
- Identity Map: “Patterns of Enterprise Application Architecture” - Martin Fowler
- Weak References:
weakrefmodule in Python, to allow objects to be garbage collected if they are no longer used anywhere else.
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2 or a similar data mapper.
Real world outcome:
session = Session()
# The first time, it hits the database
user_a = session.find(User, 1) # SELECT * FROM users WHERE id = 1;
# The second time, it pulls from the session's internal cache
user_b = session.find(User, 1) # NO DATABASE QUERY
# This is the crucial test
assert user_a is user_b # They are the SAME object in memory!
user_a.name = "A-lice"
print(user_b.name) # "A-lice" -- The change is reflected everywhere
Implementation Hints:
The Session object will contain a dictionary.
# Guidance
class Session:
def __init__(self):
# The key will be a tuple like (User, 1)
# The value will be the actual User object
self.identity_map = {}
def find(self, model_class, pk):
key = (model_class, pk)
# 1. Check the map first
if key in self.identity_map:
print("Found in cache!")
return self.identity_map[key]
# 2. If not in map, query the database
print("Not in cache, querying DB...")
# ... logic to run SELECT query ...
row = db_connection.execute(...)
if row:
# 3. Create the object
obj = model_class(**row)
# 4. Store it in the map
self.identity_map[key] = obj
return obj
return None
This moves the find logic from the Model class to a Session object, shifting from an Active Record pattern towards a Data Mapper pattern.
Learning milestones:
- Create a
Sessionthat can load an object → You have a basic Data Mapper. - Loading the same object twice returns the same instance → Your Identity Map is working.
- Changes to one reference are visible in the other → You have achieved data consistency within a session.
- Consider weak references in your map → You are thinking about long-term memory management in long-lived sessions.
Project 6: The Unit of Work Pattern
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Java, C#
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: ORM Internals / Transaction Management
- Software or Tool: Your session-based ORM
- Main Book: “Patterns of Enterprise Application Architecture” by Martin Fowler
What you’ll build: Enhance your Session to track all changes to objects (newly added, modified, deleted). A commit() call will then figure out the correct INSERT, UPDATE, and DELETE statements and execute them all inside a single database transaction.
Why it teaches ORMs: This is the secret sauce of modern ORMs. It decouples making changes to your objects in memory from writing those changes to the database. This is more efficient (batching updates) and safer (all-or-nothing transactions).
Core challenges you’ll face:
- Tracking object state → maps to maintaining
new,dirty, anddeletedlists within the session - Detecting changes (“dirtiness”) → maps to comparing current attribute values to their original loaded values
- Ordering operations → maps to running
INSERTs beforeUPDATEs, and handling foreign key constraints - Managing transaction boundaries → maps to
BEGIN,COMMIT, andROLLBACK
Key Concepts:
- Unit of Work: “Patterns of Enterprise Application Architecture” - Martin Fowler
- Snapshotting: The technique of saving the original state of an object to compare for changes.
- Topological Sort: Useful for ordering inserts/deletes based on dependencies.
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Project 5 (Identity Map).
Real world outcome:
session = Session()
# Load objects
user = session.find(User, 1)
post = session.find(Post, 10)
# Modify one object
user.name = "Updated Name"
# Create a new object and add it to the session
new_post = Post(title="New Post", content="...", user_id=user.id)
session.add(new_post)
# Mark an object for deletion
session.delete(post)
# So far, NO database queries have been executed for these changes.
# Now, commit the unit of work.
session.commit()
# This will trigger:
# BEGIN;
# UPDATE users SET name = 'Updated Name' WHERE id = 1;
# INSERT INTO posts (title, content, user_id) VALUES ('New Post', ...);
# DELETE FROM posts WHERE id = 10;
# COMMIT;
Implementation Hints:
The Session gets more complex.
# Guidance
class Session:
def __init__(self):
self.identity_map = {}
self.new = []
self.dirty = set()
self.deleted = set()
self._original_state = {} # Maps object id() to its original db values
def find(self, ...):
# ... when an object is loaded ...
self._original_state[id(obj)] = # snapshot of its values
def add(self, obj):
self.new.append(obj)
def delete(self, obj):
self.deleted.add(obj)
def commit(self):
# Check identity map for dirty objects
for obj in self.identity_map.values():
if self._is_dirty(obj):
self.dirty.add(obj)
# Start transaction
# For obj in self.new: generate INSERT
# For obj in self.dirty: generate UPDATE
# For obj in self.deleted: generate DELETE
# Commit transaction
# Clear the new, dirty, deleted lists
Detecting “dirty” objects is the hardest part. When an object is loaded, you need to save a copy of its state. Then, before commit, you compare its current state to the saved state.
Learning milestones:
- Objects added with
session.add()areINSERTed oncommit()→ You have a working “new” list. - Objects marked with
session.delete()areDELETEd oncommit()→ You have a working “deleted” list. - Changes to loaded objects are
UPDATEd oncommit()→ You have a working “dirty” tracking mechanism. - All statements are wrapped in a single transaction → You have achieved atomicity.
Project 7: A Simple Migration Tool
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: DevOps / Database Administration
- Software or Tool: Your existing ORM
- Main Book: “Refactoring Databases” by Scott Ambler & Pramod Sadalage
What you’ll build: A command-line tool that can inspect your Python models, compare them to the current database schema, and generate (and apply) SQL scripts to update the database schema.
Why it teaches ORMs: An ORM is only half the story; managing schema changes over time is the other half. This project teaches you how tools like Alembic (for SQLAlchemy) or Django’s makemigrations work under the hood.
Core challenges you’ll face:
- Introspecting the database schema → maps to querying system tables like
information_schema - Introspecting Python models → maps to reusing the model inspection logic from your ORM
- Generating DDL statements → maps to creating
CREATE TABLE,ALTER TABLE ADD COLUMN, etc. - Versioning migrations → maps to creating a
schema_versionstable and ordered script files
Key Concepts:
- Schema Introspection: SQL commands to view tables, columns, and indexes.
- Database Change Management: The core ideas in “Refactoring Databases”.
- Directed Acyclic Graphs (DAGs): For handling complex migration dependencies (optional, but how real tools work).
Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: A working ORM with model definitions (Project 2).
Real world outcome:
# You define your models in Python
# models.py
class User(Model):
id: int
name: str
# You just added this new field:
email: str
# Then you run your tool
$ ./manage.py makemigrations
Detecting model changes...
- Add column 'email' to table 'users'.
Generated new migration: 0002_add_email_to_user.sql
$ ./manage.py migrate
Applying migration 0002_add_email_to_user.sql...
> ALTER TABLE users ADD COLUMN email VARCHAR(255);
Done.
Implementation Hints: Your tool will have a few main commands.
introspect_models: Loads your Python code and builds a representation of what the schema should look like.{'users': {'id': 'INT', 'name': 'VARCHAR', 'email': 'VARCHAR'}}.introspect_db: Connects to the DB and builds a representation of the current schema.{'users': {'id': 'INT', 'name': 'VARCHAR'}}.diff_schemas: Compares the two representations and generates a list of changes.[('add_column', 'users', 'email', 'VARCHAR')].generate_sql: Takes the list of changes and generates SQL DDL.ALTER TABLE users ADD COLUMN email VARCHAR(255);.apply: Executes the SQL against the database and updates a specialversionstable.
Learning milestones:
- You can generate a
CREATE TABLEstatement from a model → You can map models to DDL. - You can detect a new field in a model → Your diffing logic works for simple cases.
- Your tool generates and applies an
ALTER TABLEscript → You have a working migration flow. - The tool knows not to re-apply old migrations → Your versioning system is effective.
Project 8: ORM Performance Analysis with a Profiler
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Any
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Performance Tuning / Database Internals
- Software or Tool: SQLAlchemy or Django ORM, and a profiler like
py-spyorcProfile - Main Book: “High Performance SQL” by Baron Schwartz
What you’ll build: A small web application with known performance problems (like the N+1 query problem), and then use profiling tools to identify and fix them using advanced ORM features.
Why it teaches ORMs: This project teaches you how to use a real ORM effectively. You’ll learn that while ORMs are convenient, they can generate inefficient queries if used naively. This bridges the gap from building an ORM to using one like an expert.
Core challenges you’ll face:
- Identifying the N+1 query problem → maps to seeing many identical queries in your logs
- Using eager loading (
selectinload,prefetch_related) → maps to the fix for the N+1 problem - Analyzing query plans → maps to using
EXPLAIN ANALYZEto see why a query is slow - Understanding connection pooling and caching → maps to advanced ORM performance features
Key Concepts:
- The N+1 Problem: A classic ORM pitfall explained in countless blog posts.
- Eager Loading Strategies: SQLAlchemy’s documentation on loader strategies is excellent.
- Query Plan Analysis: PostgreSQL or MySQL documentation on
EXPLAIN.
Difficulty: Intermediate Time estimate: 1 week Prerequisites: Basic knowledge of a major Python ORM.
Real world outcome: You build a page that lists all blog authors and the title of their latest post.
Initial (slow) code:
# Generates 1 query for all users, then N queries for each user's posts.
users = session.query(User).all() # 1 query
for user in users:
print(user.name, user.posts[0].title) # N queries!
Profiler/Log Output:
SELECT * FROM users;
SELECT * FROM posts WHERE user_id = 1 LIMIT 1;
SELECT * FROM posts WHERE user_id = 2 LIMIT 1;
SELECT * FROM posts WHERE user_id = 3 LIMIT 1;
... (100 more times)
Fixed (fast) code:
# Using SQLAlchemy's eager loading
from sqlalchemy.orm import selectinload
# Generates just 2 queries, no matter how many users.
users = (
session.query(User)
.options(selectinload(User.posts)) # The magic is here
.all()
)
for user in users:
print(user.name, user.posts[0].title) # No new queries!
Learning milestones:
- You build an app with a clear N+1 problem → You can recognize the anti-pattern.
- You use logs or a profiler to confirm the problem → You can diagnose performance issues.
- You implement an eager loading strategy to fix it → You know how to use advanced ORM features.
- You use
EXPLAINto verify the new query is more efficient → You are thinking at the database level.
Project 9: Asynchronous ORM Support
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Rust, C#
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 4: Expert
- Knowledge Area: Asynchronous Programming / Concurrency
- Software or Tool:
asyncio,asyncpg,Tortoise ORMorSQLAlchemy 2.0+ - Main Book: “Using Asyncio in Python” by Caleb Hattingh
What you’ll build: Adapt your session-based ORM (from Unit of Work) to support async/await syntax for all database operations.
Why it teaches ORMs: Modern web applications are increasingly asynchronous. This project teaches you the deep architectural changes required to make an ORM non-blocking, which is a huge challenge in library design.
Core challenges you’ll face:
- Replacing the DB driver → maps to using an async driver like
asyncpginstead ofpsycopg2 - Making all I/O operations awaitable → maps to
await session.find()andawait session.commit() - Managing async context → maps to ensuring a “session” is tied to a single async task
- Preserving lazy loading in an async world → maps to
await user.posts
Key Concepts:
- Asyncio Event Loop: The core of Python’s async model.
- Asynchronous Context Managers:
async withfor sessions. - Awaitable Descriptors: Making
__get__on a descriptor anasync def.
Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Project 6 (Unit of Work) and a strong grasp of Python’s asyncio.
Real world outcome:
# Your ORM now works with modern async web frameworks
import asyncio
async def main():
# Session management uses async context
async with AsyncSession() as session:
# DB calls are now awaitable
user = await session.find(User, 1)
user.name = "Async Alice"
# Committing is also an async operation
await session.commit()
# Lazy loading becomes awaitable too
posts = await user.posts
print(f"User has {len(posts)} posts.")
asyncio.run(main())
Implementation Hints:
Every method in your Session and Query objects that touches the database needs to become async def.
# Guidance
class AsyncSession:
async def find(self, ...):
# The DB call must be awaited
row = await async_db_driver.fetchrow(...)
...
async def commit(self):
# Begin, execute, and commit are all awaitable
async with async_db_driver.transaction():
await self._flush_changes()
# ...
Lazy loading becomes much trickier. A simple descriptor __get__ can’t be async. You might need to return an “awaitable” object that, when awaited, performs the query.
Learning milestones:
- You can
await session.find()→ Your core data loading is now async. await session.commit()works → Your Unit of Work is async-compatible.- You can use
async with AsyncSession()→ Your session management is context-aware. - Lazy loading an attribute like
await user.postsworks → You have solved the async descriptor problem.
Project 10: Build a Full Application with Your ORM
- File: LEARN_ORMS_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Any
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Full-Stack Development / API Design
- Software or Tool: Flask or FastAPI, and your custom ORM
- Main Book: “Building Microservices” by Sam Newman
What you’ll build: A complete, real-world application (like a simple blog, a Reddit clone, or an e-commerce backend) using the custom ORM you’ve built throughout these projects. This is the ultimate test. Dogfooding your own creation will expose every design flaw, missing feature, and performance bottleneck. It forces you to think about the user experience of your ORM from the perspective of an application developer.
Core challenges you’ll face:
- Session management in a web context → maps to one session per request, and how to manage its lifecycle
- Handling concurrent requests → maps to ensuring sessions don’t interfere with each other
- Serialization (Objects to JSON) → maps to writing helper functions to convert your model instances to API responses
- Implementing complex business logic → maps to pushing your ORM’s query capabilities to their limits
Key Concepts:
- Request-Scoped Dependencies: How frameworks like FastAPI provide a “session” to each request.
- Object Serialization: Marshmallow or Pydantic are great libraries to study for this.
- Graceful Error Handling: What happens when a
commitfails due to a constraint violation?
Difficulty: Expert Time estimate: 1 month+ Prerequisites: All previous Python-based projects.
Real world outcome: A fully functional web API for a blog.
$ curl http://localhost:8000/posts/
[
{"id": 1, "title": "My First Post", "author_name": "Alice"},
{"id": 2, "title": "My Second Post", "author_name": "Alice"}
]
$ curl -X POST http://localhost:8000/posts/ -d '{"title": "New from API", "user_id": 1}'
{"id": 3, "title": "New from API", "author_name": "Alice"}
Implementation Hints: With a web framework like Flask, a common pattern for session management is to use middleware or decorators.
# Guidance with Flask
@app.before_request
def start_session():
# Store the session in a request-global object
g.db_session = ORMSession()
@app.after_request
def close_session(response):
if hasattr(g, 'db_session'):
g.db_session.close() # or commit()
return response
@app.route('/users/<int:user_id>')
def get_user(user_id):
# Access the session for this request
user = g.db_session.find(User, user_id)
if not user:
abort(404)
return jsonify(user.to_dict()) # You'll need to write to_dict()
This project will force you to add features you didn’t think you needed, like more complex query operators, better error handling, and serialization helpers.
Learning milestones:
- Your ORM is successfully integrated into a web framework → You have solved the session lifecycle problem.
- You can build a non-trivial API endpoint with multiple queries → Your ORM is expressive enough for real logic.
- You find and fix at least one major bug or design flaw in your ORM → You have experienced the full cycle of library development.
- The application is stable under basic load → Your ORM is robust enough for production-like use.
Summary
| Project | Main Language | Difficulty | Focus |
|---|---|---|---|
| 1. The Manual “ORM” in C | C | Advanced | The fundamental “pain” of mapping |
| 2. A Python Active Record ORM | Python | Intermediate | Automatic schema mapping with metaprogramming |
| 3. A Fluent Query Builder API | Python | Intermediate | Building a clean, chainable query DSL |
| 4. Relationship Mapping | Python | Advanced | Lazy loading for one-to-many relationships |
| 5. The Identity Map Pattern | Python | Advanced | Ensuring object identity and consistency |
| 6. The Unit of Work Pattern | Python | Expert | Transactional, batched database writes |
| 7. A Simple Migration Tool | Python | Advanced | Automating schema changes |
| 8. ORM Performance Analysis | Python | Intermediate | Diagnosing and fixing real-world ORM problems |
| 9. Asynchronous ORM Support | Python | Expert | Adapting an ORM for modern async applications |
| 10. Build a Full App | Python | Expert | Dogfooding your creation in a real project |
```