Project 7: Stored Logic Lab
Create functions and procedures that enforce domain rules.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | 16-24 hours |
| Main Programming Language | PL/pgSQL (pseudocode) |
| Alternative Programming Languages | SQL only (limited) |
| Coolness Level | See REFERENCE.md |
| Business Potential | See REFERENCE.md |
| Prerequisites | PostgreSQL installed, psql available |
| Key Topics | Functions, Security |
1. Learning Objectives
- Explain the core ideas behind Stored Logic Lab and why they matter.
- Demonstrate the workflow with a repeatable PostgreSQL session transcript.
- Identify and explain key metrics or evidence from the run.
- Document findings in a short operational report.
2. All Theory Needed (Per-Concept Breakdown)
Functions, Procedures, and Extensions
Fundamentals PostgreSQL supports user-defined functions and procedures, enabling server-side logic. Functions return values and can be used in queries. Procedures can manage transactions within their bodies. PL/pgSQL is the most common procedural language, but PostgreSQL also supports other languages. Extensions allow you to add new capabilities, such as full-text search enhancements, spatial types, or foreign data wrappers. This makes PostgreSQL a programmable platform rather than just a storage engine. Used well, server-side logic centralizes rules close to the data, reduces duplicated logic across applications, improves the auditability of core rules, and cuts network round trips, all of which matter for shared data services.
Deep Dive Functions and procedures are central to advanced PostgreSQL usage. A function encapsulates logic and returns a value, making it usable in SQL expressions and SELECT lists. A procedure is invoked with CALL and can include transaction control statements. This difference matters for workflows like batch processing, where you may want to commit in steps. PL/pgSQL provides control structures such as loops, conditionals, and error handling. It is tightly integrated with SQL and supports variables, records, and composite types.
Extensions are PostgreSQL’s modular system for adding features. Built-in extensions include additional data types, index types, or utilities. Third-party extensions include full-text search improvements, time-series features, and spatial support. Extensions are installed per database and often require superuser privileges. Understanding extensions matters because many real-world Postgres deployments rely on them. For example, full-text search uses text search configuration and indexes, while foreign data wrappers allow querying external data sources.
Functions and extensions are not purely a convenience; they affect performance and security. Server-side logic reduces round trips, but complex functions can hide expensive operations. Functions can be marked as immutable, stable, or volatile, which affects whether the planner can cache or reorder them. Security definer functions can run with the privileges of their owner, which is powerful but must be carefully controlled. Procedures can also be used to orchestrate maintenance tasks or data transformations.
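To make the volatility distinction concrete, here is a minimal sketch using hypothetical names (price_with_tax, active_tax_rate, and the tax_rates table are illustrative, not part of this project):

-- IMMUTABLE: same inputs always yield the same output; the planner may
-- precompute the call, and it can be used in expression indexes.
CREATE FUNCTION price_with_tax(price numeric) RETURNS numeric
LANGUAGE sql IMMUTABLE
AS $$ SELECT price * 1.20 $$;

-- STABLE: reads the database but does not modify it within one statement.
CREATE FUNCTION active_tax_rate() RETURNS numeric
LANGUAGE sql STABLE
AS $$ SELECT rate FROM tax_rates WHERE active LIMIT 1 $$;

-- VOLATILE is the default: re-evaluated on every call. Labeling a function
-- IMMUTABLE when it actually reads mutable state can return stale results.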
When designing functions, you must consider how they are called, how they affect transaction boundaries, and how they interact with permissions. You should also think about how functions are tested and versioned. In PostgreSQL, functions are schema objects; altering them is part of schema migration, and their behavior should be tracked in version control like any other schema change.
Extensions expand the database’s capabilities but add operational constraints. Each extension has versioning, dependencies, and sometimes performance implications. A production team should document which extensions are installed and why, and should test upgrades carefully.
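A short sketch of that lifecycle; pg_stat_statements is a real contrib extension (note it also requires shared_preload_libraries before it collects data):

-- Install per database; often requires superuser privileges.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Record installed extensions and versions in your runbook.
SELECT extname, extversion FROM pg_extension;

-- Upgrade deliberately, after testing in a staging environment.
ALTER EXTENSION pg_stat_statements UPDATE;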
Language choice also matters. PL/pgSQL is tightly integrated and usually the right default, but other languages can be enabled for specialized tasks. Each language has its own security and performance profile. If you allow a language like PL/Python, you must consider dependency management and deterministic behavior. Function volatility settings also influence the planner: immutable functions can be used in indexes and precomputed, while volatile functions are re-evaluated each call. Mislabeling volatility can cause incorrect results or poor performance.
Finally, stored logic is part of your API surface. When you expose a function, you are effectively defining a contract for application teams. That contract should be stable, documented, and versioned. This is why many teams create a dedicated schema for API functions and restrict direct table access. It creates a controlled surface where permissions, behavior, and performance are easier to manage.
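One common shape of that contract, sketched with hypothetical names (the api schema, orders table, and app_rw role are illustrative):

CREATE SCHEMA api;

CREATE FUNCTION api.create_order(p_customer int, p_total numeric)
RETURNS int
LANGUAGE plpgsql
SECURITY DEFINER                       -- runs with the owner's rights
SET search_path = pg_catalog, public   -- pin name resolution
AS $$
DECLARE
  new_id int;
BEGIN
  INSERT INTO orders (customer_id, total)
  VALUES (p_customer, p_total)
  RETURNING id INTO new_id;
  RETURN new_id;
END;
$$;

-- The application role can call the API but gets no direct table grants.
REVOKE ALL ON FUNCTION api.create_order(int, numeric) FROM PUBLIC;
GRANT USAGE ON SCHEMA api TO app_rw;
GRANT EXECUTE ON FUNCTION api.create_order(int, numeric) TO app_rw;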
Testing stored logic requires deterministic inputs and clear error reporting. You should design functions to return structured results or explicit error codes, rather than relying on unhandled exceptions. In production, good stored logic is predictable and observable, which makes it safer than ad-hoc application-side scripts.
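A sketch of structured error reporting, again with hypothetical names; the caller inspects the result instead of catching exceptions:

CREATE TYPE validation_result AS (ok boolean, code text, detail text);

CREATE FUNCTION check_order_total(total numeric)
RETURNS validation_result
LANGUAGE plpgsql
AS $$
BEGIN
  IF total IS NULL OR total < 0 THEN
    RETURN (false, 'ERR_NEGATIVE_TOTAL', 'total must be >= 0')::validation_result;
  END IF;
  RETURN (true, 'OK', NULL)::validation_result;
END;
$$;

-- Usage: SELECT * FROM check_order_total(-5); returns ok=false and a stable code.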
Definitions and key terms
- Function: server-side logic that returns a value.
- Procedure: server-side logic that can manage transactions.
- Extension: packaged feature set installed into a database.
Mental model diagram
SQL -> Function -> Result
SQL -> Procedure -> Transaction steps
DB -> Extension -> New types/operators
How it works (step-by-step, include invariants and failure modes)
- Define function/procedure in a schema (invariant: privileges required).
- Call from SQL or application (failure: wrong search_path or permissions).
- Execute logic (failure: unexpected exceptions or slow execution).
- Return value or commit/rollback (failure: misuse of transaction control).
Minimal concrete example (pseudocode, config, protocol transcript, or CLI output)
Pseudo PL/pgSQL:
FUNCTION <name>(<args>) RETURNS <type>
BEGIN
<logic>
RETURN <value>
END
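A concrete rendering of this skeleton, assuming hypothetical objects (clamp_discount, and an orders table with an archived flag); the procedure shows why transaction control matters, since COMMIT is legal in a procedure invoked via CALL but never in a function:

CREATE FUNCTION clamp_discount(pct numeric) RETURNS numeric
LANGUAGE plpgsql
AS $$
BEGIN
  IF pct < 0 THEN RETURN 0; END IF;
  IF pct > 50 THEN RETURN 50; END IF;  -- domain rule: cap discounts at 50
  RETURN pct;
END;
$$;

CREATE PROCEDURE archive_orders(batch_size int)
LANGUAGE plpgsql
AS $$
BEGIN
  LOOP
    DELETE FROM orders
    WHERE id IN (SELECT id FROM orders WHERE archived LIMIT batch_size);
    EXIT WHEN NOT FOUND;
    COMMIT;  -- commit each batch; functions cannot do this
  END LOOP;
END;
$$;

-- CALL archive_orders(1000);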
Common misconceptions
- “Functions are always faster” -> They can hide heavy logic and slow queries.
- “Extensions are optional” -> Core workflows often depend on them: query statistics (pg_stat_statements), spatial types (PostGIS), and foreign tables (postgres_fdw) all ship as extensions.
Check-your-understanding questions
- Why would you choose a procedure over a function?
- What is the risk of SECURITY DEFINER functions?
Check-your-understanding answers
- Procedures can commit or roll back inside their bodies; functions run within the caller's transaction and cannot.
- They can elevate privileges if not carefully controlled.
Real-world applications
- Data validation and transformations in stored functions.
- Installing extensions for full-text search or time-series use cases.
Where you will apply it
- See Section 3.7 Real World Outcome and Section 5.10 Implementation Phases in this file.
- Also used in: None
References
- PostgreSQL CREATE PROCEDURE (postgresql.org/docs/current/sql-createprocedure.html)
- PostgreSQL PL/pgSQL overview (postgresql.org/docs/current/plpgsql.html)
Key insight Functions and extensions turn PostgreSQL into a programmable platform with domain-specific power.
Summary Server-side logic and extensions expand PostgreSQL beyond raw SQL.
Homework/exercises to practice the concept
- Outline a function that validates a business rule.
- List three extensions you might need in a real project.
Solutions to the homework/exercises
- Example: validate order total and return status.
- Examples: pg_stat_statements, postgis, postgres_fdw.
Security, Roles, and Row Level Security
Fundamentals PostgreSQL security is based on roles, privileges, and policies. Roles can represent users or groups. Privileges control access to tables, schemas, and functions. Row level security (RLS) policies restrict which rows a role can see or modify. These mechanisms let you implement multi-tenant isolation and least privilege access at the database level, not just in application code. Security design is therefore part of schema design, not an afterthought. It also shapes how applications authenticate and connect. Small mistakes here have large consequences and can be difficult to detect. Strong defaults prevent silent data exposure and reduce blast radius.
Deep Dive Roles are the foundation of PostgreSQL security. A role can have LOGIN and can be granted membership in other roles. Group roles simplify privilege management. Privileges can be granted on schemas, tables, and functions, and are inherited depending on role settings. This allows fine-grained control over who can read or write data.
Row level security builds on this by filtering rows based on policies. Policies are expressions that are applied to queries, ensuring that even if a role has access to a table, it can only see the rows allowed by the policy. Policies can differ for SELECT, INSERT, UPDATE, or DELETE. This makes RLS ideal for multi-tenant systems, where each tenant should only see their own data.
RLS is powerful but also subtle. Policies are applied as security barriers, and they depend on session context, such as the current user. Misconfigured policies can lead to data leaks or blocked writes. You must design and test policies carefully. Also, superusers bypass RLS by default, which means testing policies with superuser accounts can give misleading results.
Security design also intersects with functions. SECURITY DEFINER functions run with the privileges of their owner. This can be used to expose controlled operations without granting broad privileges, but it is also a risk if not carefully controlled. Role membership and inheritance interact with policies, so you must understand both to reason about security.
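A hedged sketch of standard SECURITY DEFINER hardening, with hypothetical names (unlock_account, app_users, support_role):

CREATE FUNCTION unlock_account(p_user int) RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pg_catalog, public  -- pin search_path so callers cannot hijack name resolution
AS $$
BEGIN
  UPDATE app_users SET failed_logins = 0 WHERE id = p_user;
END;
$$;

-- EXECUTE is granted to PUBLIC by default; revoke it, then grant narrowly.
REVOKE EXECUTE ON FUNCTION unlock_account(int) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION unlock_account(int) TO support_role;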
Security is not just about correctness; it is also about compliance and auditability. PostgreSQL provides system catalogs and logs that let you audit role memberships and privileges. A secure PostgreSQL deployment includes documented role models, explicit grants, and least privilege defaults.
RLS is especially powerful in multi-tenant architectures. It allows you to keep a single physical table while isolating rows logically. This can simplify operational overhead and reduce schema sprawl. The tradeoff is that application logic must set the correct context (for example, tenant identifiers) and must never bypass those policies. A common pattern is to use a secure function to set session-level variables and then enforce them in policies.
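A sketch of that session-context pattern; set_config and current_setting are real built-ins, while set_tenant and the app.tenant_id setting name are illustrative:

-- The application calls this once per session or connection checkout.
CREATE FUNCTION set_tenant(p_tenant int) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
  -- third argument false = lasts for the session; true = transaction-local
  PERFORM set_config('app.tenant_id', p_tenant::text, false);
END;
$$;

-- Policies then read the context back, for example:
--   USING (tenant_id = current_setting('app.tenant_id')::int)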
Privilege management also influences performance and developer workflows. If you grant blanket access to a role, you simplify development but weaken security. If you lock down everything, you may slow development and encourage unsafe workarounds. A good role model balances safety and productivity, often by creating separate roles for migrations, read-only analytics, and application services. This keeps production safe while still enabling operational tasks.
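One possible role split, sketched with hypothetical role names:

-- Group roles hold privileges; login roles join them.
CREATE ROLE analytics_ro NOLOGIN;
CREATE ROLE app_rw NOLOGIN;
CREATE ROLE migrator NOLOGIN;

GRANT USAGE ON SCHEMA public TO analytics_ro, app_rw;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analytics_ro;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
GRANT CREATE ON SCHEMA public TO migrator;

-- Each service logs in with its own account, inheriting a group role.
CREATE ROLE reporting_svc LOGIN PASSWORD '...' IN ROLE analytics_ro;

-- Remember ALTER DEFAULT PRIVILEGES if future tables should inherit grants.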
Security also includes authentication and connection policies. PostgreSQL supports password authentication, certificates, and host-based access rules. These are defined in configuration and enforced before a session even starts. If you leave these at their defaults, you might accidentally expose your database to the network. A secure deployment includes explicit network rules, limited role privileges, and dedicated accounts for each service.
Finally, auditing is a practical requirement. While PostgreSQL does not include full audit logging by default, you can use logs, extensions, or external monitoring to track changes and access patterns. Knowing who changed what, and when, is often required for compliance and incident response. Security is therefore not just access control; it is also traceability.
Definitions and key terms
- Role: account or group entity with privileges.
- Privilege: permission to access an object.
- RLS: row level security policy applied to rows.
Mental model diagram
Role -> Privileges -> Table
Role + Policy -> Visible rows subset
How it works (step-by-step, include invariants and failure modes)
- Create roles and group roles (invariant: no circular membership).
- Grant privileges (failure: excessive grants cause leaks).
- Enable RLS and define policies (failure: policy blocks writes).
- Test as non-superuser (failure: superuser bypasses policies).
Minimal concrete example (pseudocode, config, protocol transcript, or CLI output)
Policy idea (pseudo):
ALLOW SELECT on <tenant_data> WHERE tenant_id = current_user
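A concrete version of the same idea, assuming a hypothetical tenant_data table and the app.tenant_id session setting from the theory section:

ALTER TABLE tenant_data ENABLE ROW LEVEL SECURITY;
ALTER TABLE tenant_data FORCE ROW LEVEL SECURITY;  -- apply to the table owner too

CREATE POLICY tenant_isolation ON tenant_data
  USING (tenant_id = current_setting('app.tenant_id')::int)        -- filters reads
  WITH CHECK (tenant_id = current_setting('app.tenant_id')::int);  -- constrains writes

-- Test as a non-superuser; superusers bypass RLS by default.
SET ROLE app_rw;
SELECT count(*) FROM tenant_data;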
Common misconceptions
- “RLS is just a view” -> It is enforced as a security barrier for all queries.
- “Superuser testing is enough” -> Superusers bypass RLS.
Check-your-understanding questions
- Why do group roles simplify privilege management?
- What is the biggest risk of misconfigured RLS?
Check-your-understanding answers
- Privileges can be granted to a group and inherited by members.
- Data leakage or blocked updates across tenants.
Real-world applications
- Multi-tenant SaaS databases.
- Role-based access for analytics vs. write services.
Where you will apply it
- See Section 3.7 Real World Outcome and Section 5.10 Implementation Phases in this file.
- Also used in: P11-rls-multi-tenant
References
- PostgreSQL Role Membership (postgresql.org/docs/current/role-membership.html)
- PostgreSQL Row Security Policies (postgresql.org/docs/current/ddl-rowsecurity.html)
Key insight PostgreSQL security is enforceable at the data layer, not just in application code.
Summary Roles and RLS policies let you encode access control directly in the database.
Homework/exercises to practice the concept
- Design a role hierarchy for read-only analytics and write services.
- Write a policy rule in words that enforces tenant isolation.
Solutions to the homework/exercises
- Create roles for analytics, service, and admin, with explicit grants.
- Allow access only when tenant_id matches session tenant context.
3. Project Specification
3.1 What You Will Build
Create functions and procedures that enforce domain rules. You will produce a repeatable transcript and a short evidence report for the workflow.
3.2 Functional Requirements
- Provide a deterministic scenario with a known outcome.
- Capture at least two key pieces of evidence (metrics, plans, or outputs).
- Produce a short report that another engineer could verify.
3.3 Non-Functional Requirements
- Performance: Queries should complete within a few seconds on a small dataset.
- Reliability: The workflow should be repeatable across multiple runs.
- Usability: Outputs should be readable and structured.
3.4 Example Usage / Output
$ psql -d <database>
<database>=# SELECT <function_call_pseudo>;
result
--------
<expected value>
3.5 Data Formats / Schemas / Protocols
Use a simple evidence log format:
SESSION_START
command: <psql or SQL-like command>
result: <summary line>
KEY_EVIDENCE: name=value, name=value
SESSION_END
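A filled-in example with purely illustrative values, reusing the hypothetical clamp_discount function from Section 2:
SESSION_START
command: SELECT clamp_discount(75);
result: returned 50, matching the discount cap rule
KEY_EVIDENCE: expected=50, actual=50
SESSION_END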
3.6 Edge Cases
- Missing permissions for a required command.
- Stale statistics leading to unexpected results.
- Misconfiguration that hides expected data.
3.7 Real World Outcome
This section is the gold standard for verification.
3.7.1 How to Run (Copy/Paste)
- Use a local PostgreSQL instance or container.
- Load a minimal dataset with predictable values.
- Execute the command sequence exactly as described.
3.7.2 Golden Path Demo (Deterministic)
Function outputs match expected business rules. Exit code: 0
3.7.3 If CLI: exact terminal transcript
$ psql -d <database>
<database>=# SELECT <function_call_pseudo>;
result
--------
<expected value>
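For instance, with the hypothetical clamp_discount function from Section 2 loaded into a database named storedlogic, the transcript might read:
$ psql -d storedlogic
storedlogic=# SELECT clamp_discount(75);
 clamp_discount
----------------
             50
(1 row)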
3.7.4 Failure Demo (Deterministic)
A function error shows missing permissions or bad input. Exit code: 1
4. Solution Architecture
4.1 High-Level Design
+--------------------+ +----------------------+ +-------------------+
| Dataset + Schema | --> | PostgreSQL Workflow | --> | Evidence Report |
+--------------------+ +----------------------+ +-------------------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Dataset | Provide deterministic inputs | Keep small and fixed |
| Workflow | Execute queries or commands | Use explicit steps |
| Evidence Report | Capture outcomes | Use consistent format |
4.3 Data Structures (No Full Code)
EvidenceRecord:
- step
- command
- output_summary
- key_metrics
Report:
- summary
- evidence
- conclusions
4.4 Algorithm Overview
Key Algorithm: Evidence Capture Loop
- Prepare dataset and baseline metrics.
- Execute the workflow step by step.
- Capture outputs and compare with expected outcomes.
- Write a short report with conclusions.
Complexity Analysis:
- Time: O(Q) where Q is number of queries.
- Space: O(R) for report size.
5. Implementation Guide
5.1 Development Environment Setup
psql --version
5.2 Project Structure
project-root/
|-- notes/
| `-- report.md
|-- transcripts/
| `-- session.txt
`-- README.md
5.3 The Core Question You’re Answering
“When should logic live inside the database?”
Write down the evidence you will need before you start.
5.4 Concepts You Must Understand First
Stop and research these before debugging:
- Core PostgreSQL mechanics
- What does PostgreSQL guarantee at this layer?
- Book Reference: “PostgreSQL: The Definitive Guide” - Ch. 3
- Planning and observation
- Which signals confirm success?
- Book Reference: “SQL Performance Explained” - Ch. 1
- Data integrity rules
- Which invariants must always hold?
- Book Reference: “Database System Concepts” - Ch. 2
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Evidence capture
- Which outputs prove correctness?
- What is the smallest set of metrics that matters?
- Repeatability
- How will you ensure the same results each run?
- Which inputs must be fixed?
5.6 Thinking Exercise
Sketch a timeline of actions and identify the key evidence at each step.
5.7 The Interview Questions They’ll Ask
- “How do you verify correctness in PostgreSQL workflows?”
- “What evidence shows a query plan change?”
- “How do you ensure isolation and consistency?”
- “Why do you need VACUUM?”
- “How do you validate a backup?”
5.8 Hints in Layers
Hint 1: Start with a small dataset. Keep the dataset minimal and predictable.
Hint 2: Record every step. Use a transcript so you can compare runs.
Hint 3: Compare against expected outputs. Use a golden path to validate correctness.
Hint 4: Use system catalogs for verification. Confirm results with pg_stat views and catalog queries, as in the sketch below.
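As a sketch of Hint 4, the pg_proc catalog (real, including these columns) can confirm which functions exist, how they are labeled, and whether they run as SECURITY DEFINER; the api schema name is illustrative:

SELECT p.proname,
       CASE p.provolatile WHEN 'i' THEN 'immutable'
                          WHEN 's' THEN 'stable'
                          ELSE 'volatile' END AS volatility,
       p.prosecdef AS security_definer
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'api';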
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| PostgreSQL fundamentals | “PostgreSQL: The Definitive Guide” | Ch. 3 |
| SQL performance | “SQL Performance Explained” | Ch. 1-4 |
| Data modeling | “Database System Concepts” | Ch. 2 |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
Goals:
- Set up the environment.
- Define the minimal dataset.
Tasks:
- Install and connect to PostgreSQL.
- Load a deterministic dataset.
Checkpoint: You can run a basic query and see expected output.
Phase 2: Core Workflow (4-8 hours)
Goals:
- Execute the core steps.
- Capture evidence.
Tasks:
- Run the main workflow commands.
- Capture outputs and metrics.
Checkpoint: Evidence matches the golden path.
Phase 3: Polish and Edge Cases (2-6 hours)
Goals:
- Validate edge cases.
- Document failures and fixes.
Tasks:
- Trigger a failure scenario.
- Document how you detected and fixed it.
Checkpoint: Failure case and fix are documented.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Dataset size | Small or large | Small | Easier to verify outputs |
| Evidence format | Transcript or report | Both | Transcript for detail, report for summary |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Verify single steps | One query or command |
| Integration Tests | End-to-end workflow | Full golden path |
| Edge Case Tests | Validate failures | Missing permissions |
6.2 Critical Test Cases
- Primary path: expected output is produced.
- Failure path: error is detected and documented.
- Repeatability: results match across two runs.
6.3 Test Data
Input: deterministic dataset with fixed values
Expected: same outputs each run
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong database | Output references unexpected objects | Verify connection and search_path |
| Missing privileges | Permission denied errors | Grant least-privilege access for the task |
| Stale stats | Planner chooses unexpected plan | Run ANALYZE and re-check |
7.2 Debugging Strategies
- Capture before changing: record outputs first.
- Verify connection context: ensure correct database and schema.
7.3 Performance Traps
If dataset size grows, some commands may appear slow. Keep tests small and repeatable.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add one extra table or metric and document it.
- Create a short checklist for future runs.
8.2 Intermediate Extensions
- Add a second scenario and compare results.
- Automate part of the workflow with a script.
8.3 Advanced Extensions
- Integrate the workflow into a CI or scheduled job.
- Create a dashboard or long-term trend report.
9. Real-World Connections
9.1 Industry Applications
- Operational runbooks for production PostgreSQL systems.
- Data platform reliability and incident response.
9.2 Related Open Source Projects
- PostgreSQL core project - open source database engine.
- pg_stat_statements - query statistics extension.
9.3 Interview Relevance
- Transaction semantics and MVCC.
- Index selection and query planning.
10. Resources
10.1 Essential Reading
- PostgreSQL documentation (current) - reference for commands and behavior.
- SQL Performance Explained - reasoning about indexes and plans.
- Designing Data-Intensive Applications - reliability patterns.
10.2 Video Resources
- PostgreSQL conference talks on operations and performance
- Database reliability talks from systems conferences
10.3 Tools & Documentation
- psql - interactive CLI for PostgreSQL.
- pg_stat_* views - built-in monitoring tables.
- pg_dump and pg_basebackup - backup tooling.
10.4 Related Projects in This Series
- Previous Project: P06-bloat-vacuum-lab.md
- Next Project: P08-full-text-search.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the key concept without notes.
- I can justify the evidence I collected.
- I understand the tradeoffs in this workflow.
11.2 Implementation
- All functional requirements are met.
- The golden path transcript matches expected output.
- Failure modes are documented with evidence.
11.3 Growth
- I documented lessons learned.
- I can explain this project in a job interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- A working transcript that matches the golden path.
- A short report with evidence and conclusions.
- A failure case documented with explanation.
Full Completion:
- All minimum criteria plus:
- Comparison across at least two runs.
- Notes on tradeoffs and possible improvements.
Excellence (Going Above & Beyond):
- Automated capture of transcripts.
- A reusable checklist for future runs.