25 Iron Rules for PostgreSQL Database Design
A comprehensive guide of 25 battle-tested rules for PostgreSQL schema design, covering schema foundations, foreign keys, normalization, indexing, and migrations.
This article presents 25 best practices for PostgreSQL database schema design, organized into five categories with code examples and explanations. Each "wrong" example is drawn from a real production incident that someone survived, often outside of working hours.
I. Schema Foundation
1. Always Use a Surrogate Primary Key
Criticality: maximum
Natural keys (email, tax ID, username) change. When that happens, you face a cascading update of millions of rows and all FK references. Surrogate keys (BIGSERIAL / UUID) remain immutable.
-- Correct
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Wrong
CREATE TABLE users (
email TEXT PRIMARY KEY -- a catastrophe waiting to happen
);
2. Every Table MUST Have created_at and updated_at
Criticality: maximum
Without timestamps you cannot debug production incidents, build an audit trail, or perform incremental ETL. Use TIMESTAMPTZ (with timezone), not TIMESTAMP.
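Note that DEFAULT NOW() only fires on INSERT; nothing keeps updated_at fresh on UPDATE. A common approach is a small trigger. A sketch (function and trigger names are arbitrary; EXECUTE FUNCTION requires PostgreSQL 11+):

```sql
-- Generic trigger function: stamps updated_at on every UPDATE
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to each table that has an updated_at column
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
```

The same function can be reused across every table, so the per-table cost is one CREATE TRIGGER statement.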
-- Correct
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Wrong
-- No timestamp columns at all
-- Or TIMESTAMP without timezone
3. Use TIMESTAMPTZ, Not TIMESTAMP
Criticality: high
TIMESTAMP silently discards timezone information. When your application servers run in different time zones, the stored time becomes ambiguous and inconsistent. TIMESTAMPTZ stores everything in UTC internally.
-- Correct
event_time TIMESTAMPTZ NOT NULL
-- Wrong
event_time TIMESTAMP -- timezone lost forever
4. Use TEXT Instead of VARCHAR(n)
Criticality: normal
In PostgreSQL, TEXT and VARCHAR deliver identical performance. VARCHAR(n) merely adds a CHECK constraint that you will have to migrate when requirements change. For real validation, use CHECK.
-- Correct
name TEXT NOT NULL,
CONSTRAINT chk_name_len CHECK(length(name) <= 255)
-- Wrong
name VARCHAR(255) -- will need changing when requirements change
5. Use BIGINT / BIGSERIAL for IDs, Not INT
Criticality: high
INT has a maximum of ~2.1 billion. Companies like Slack and Digg have already hit this ceiling. BIGINT costs only 4 extra bytes per row but holds 9.2 quintillion values.
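You can check how close existing sequences are to overflow before it becomes an incident. A hedged monitoring query (the pg_sequences view exists in PostgreSQL 10+; last_value is NULL for sequences that have never been used):

```sql
-- Sequences that have already consumed more than half of their range
SELECT schemaname,
       sequencename,
       last_value,
       max_value,
       round(100.0 * last_value / max_value, 1) AS pct_used
FROM pg_sequences
WHERE last_value IS NOT NULL
  AND last_value > max_value / 2
ORDER BY pct_used DESC;
```

Any SERIAL-backed sequence showing up here is a candidate for an INT-to-BIGINT migration while you still have headroom.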
-- Correct
id BIGSERIAL PRIMARY KEY
-- Wrong
id SERIAL PRIMARY KEY -- a ticking time bomb
II. Relations and Foreign Keys
6. ALWAYS Define Explicit Foreign Keys
Criticality: maximum
Without FK constraints, orphaned records will silently accumulate. Application logic is no substitute: crashes, bugs, and race conditions will create inconsistency.
-- Correct
user_id BIGINT NOT NULL
REFERENCES users(id)
ON DELETE CASCADE
-- Wrong
user_id BIGINT -- "we'll check in the application code"
7. Choose ON DELETE Deliberately
Criticality: maximum
The default is NO ACTION, which blocks deletion of a referenced row (RESTRICT is nearly identical but checked immediately rather than at the end of the statement). CASCADE automatically deletes child records. SET NULL keeps the child row but clears the reference. The wrong choice either blocks operations or silently destroys data.
-- Dependent data: CASCADE
REFERENCES orders(id) ON DELETE CASCADE
-- Optional reference: SET NULL
REFERENCES users(id) ON DELETE SET NULL
-- Critical data: NO ACTION (the default)
REFERENCES accounts(id) -- blocks deletion
8. Use Junction Tables for Many-to-Many Relationships
Criticality: maximum
Never use arrays or comma-separated strings for many-to-many relationships. Junction tables are indexable, queryable, and can store relationship metadata.
-- Correct
CREATE TABLE user_roles (
user_id BIGINT REFERENCES users(id),
role_id BIGINT REFERENCES roles(id),
granted_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, role_id)
);
-- Wrong
role_ids INTEGER[] -- can't JOIN, can't FK
roles TEXT -- 'admin,editor'
9. Index Every FK Column
Criticality: high
PostgreSQL does NOT automatically create indexes on FK columns (unlike MySQL's InnoDB). Without one, every DELETE or UPDATE of a parent key triggers a sequential scan of the child table to check for referencing rows, and JOINs on the FK column degrade the same way.
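Missing FK indexes can be found from the system catalogs. A simplified sketch (it only handles single-column FKs and looks for any index whose leading column is the FK column; indkey is 0-based while conkey is 1-based):

```sql
-- Single-column foreign keys with no index starting on the FK column
SELECT c.conrelid::regclass AS child_table,
       c.conname            AS constraint_name
FROM pg_constraint c
WHERE c.contype = 'f'                 -- foreign-key constraints only
  AND cardinality(c.conkey) = 1      -- skip multi-column FKs for simplicity
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey[0] = c.conkey[1]  -- leading index column = FK column
      );
```

Running this after every migration is a cheap way to catch the "FK without index" mistake before it reaches production.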
-- Correct
CREATE INDEX idx_orders_user_id
ON orders(user_id);
-- Wrong
-- FK without index = seq scan on JOIN
10. Prefer Soft Deletes for Critical Business Data
Criticality: high
Hard deletes are irreversible and break the audit trail. Add a deleted_at column and filter in queries. Use partial indexes to maintain performance.
-- Correct
deleted_at TIMESTAMPTZ DEFAULT NULL;
-- Partial index: only active rows
CREATE INDEX idx_users_active
ON users(email)
WHERE deleted_at IS NULL;
-- Wrong
DELETE FROM users WHERE id = 42;
-- deleted forever, no audit trail
III. Normalization and Data Integrity
11. Normalize to 3NF Minimum, Denormalize Deliberately
Criticality: maximum
Start with normalization. Every denormalization is a conscious trade-off: read speed at the cost of write complexity and inconsistency risks. Document why you denormalized.
-- Correct: single source of truth
orders.user_id -> users.id -> users.email
-- Denormalization ONLY if measured:
-- "Added email to orders table for billing
-- report: query sped up from 800ms to 12ms
-- on 50M rows"
-- Wrong
-- Duplicating user_name, user_email
-- in every table "for convenience"
12. Use NOT NULL by Default, NULL Only Intentionally
Criticality: maximum
NULL introduces three-valued logic: NULL = NULL evaluates to NULL (not true), any comparison with NULL returns NULL, and aggregates silently skip NULLs. Every nullable column requires COALESCE everywhere.
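The surprises are easy to reproduce in psql. A short demo (the items table and its price column are hypothetical):

```sql
-- Three-valued logic in action
SELECT NULL = NULL;   -- returns NULL, not true
SELECT 1 <> NULL;     -- also NULL: the row would NOT pass a WHERE clause

-- Aggregates skip NULLs: these two counts can differ
SELECT count(*), count(price) FROM items;

-- Rows with price IS NULL match NEITHER branch of this "exhaustive" filter
SELECT * FROM items WHERE price > 100 OR price <= 100;
```

This last query is the classic trap: a condition that looks like it covers every row silently drops the NULL ones.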
-- Correct
status TEXT NOT NULL DEFAULT 'pending',
deleted_at TIMESTAMPTZ -- NULL = intentional
-- Wrong
name TEXT, -- nullable by accident
price NUMERIC -- NULL or 0? who knows
13. Use CHECK Constraints for Data Validation
Criticality: high
Application-level validation is bypassed by migrations, scripts, and direct DB access. Database constraints are the last line of defense, and they always work.
-- Correct
CONSTRAINT chk_price_positive
CHECK (price > 0),
CONSTRAINT chk_status_valid
CHECK (status IN ('active','inactive','suspended'))
-- Wrong
-- "validation is handled in the API"
-- *in the DB: price = -500 and status = 'yolo'*
14. Use NUMERIC for Money, Never FLOAT/DOUBLE
Criticality: maximum
Floating-point arithmetic: 0.1 + 0.2 = 0.30000000000000004. For financial data, NUMERIC(precision, scale) provides exact decimal math. Or store cents as BIGINT.
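The rounding error is easy to demonstrate directly in psql (PostgreSQL 12+ prints floats with shortest-round-trip precision, so the noise is visible by default):

```sql
-- Binary floating point cannot represent 0.1 exactly
SELECT 0.1::float8 + 0.2::float8;    -- 0.30000000000000004
SELECT 0.1::numeric + 0.2::numeric;  -- 0.3, exact decimal math

-- The error compounds: sum 0.1 a million times
SELECT sum(0.1::float8) FROM generate_series(1, 1000000);
-- not exactly 100000 -- accumulated drift
```

For a ledger, that drift is not a display nuisance but a reconciliation bug.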
-- Correct
price NUMERIC(12,2) NOT NULL,
balance NUMERIC(15,2) NOT NULL
-- or: price_cents BIGINT NOT NULL
-- Wrong
price FLOAT -- $0.30000000000000004
price DOUBLE PRECISION -- same problem
15. Use ENUM Cautiously — Prefer CHECK or Lookup Tables
Criticality: normal
PostgreSQL ENUMs are hard to modify: you can add values (and rename them since PostgreSQL 10), but you cannot remove a value without recreating the type. CHECK constraints or lookup tables are far more flexible.
-- Correct: CHECK constraint
status TEXT NOT NULL
CHECK(status IN ('draft','published'))
-- Or: lookup table for many values
REFERENCES statuses(code)
-- Wrong
CREATE TYPE status AS ENUM(
'draft','published'
); -- hard to change later
IV. Indexing and Performance
16. Create Indexes for Every WHERE, JOIN, and ORDER BY
Criticality: maximum
Without indexes, PostgreSQL performs a sequential scan — reading every row. On a table with 100M rows, that is the difference between 5 ms and 5 minutes.
-- Correct
-- Composite index for a typical query
CREATE INDEX idx_orders_user_status
ON orders(user_id, status)
WHERE deleted_at IS NULL;
-- Wrong
-- "We'll add indexes when it gets slow"
-- (3 AM, production is on fire)
17. Use Partial Indexes to Index Only the Rows You Need
Criticality: high
If 95% of rows have status completed and you only query pending, a full index is a waste of space. Partial indexes are more compact, faster, and save memory.
-- Correct
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- Wrong
CREATE INDEX idx_orders_created
ON orders(created_at);
-- indexes ALL 100M rows for 5% of queries
18. Use EXPLAIN ANALYZE Before Deploying Queries
Criticality: high
You cannot guess query performance. EXPLAIN ANALYZE shows the actual execution plan, row estimates, and timing. A Seq Scan on a large table = add an index.
-- Correct
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending';
-- Wrong
-- Deploy query to prod
-- "On dev with 10 rows it seemed fine"19. Use Connection Pooling (PgBouncer)
Criticality: maximum
Every PostgreSQL connection costs ~10 MB of RAM. 1,000 direct connections = 10 GB just for connections. PgBouncer multiplexes connections and serves thousands of clients with minimal overhead.
-- Correct
App -> PgBouncer (port 6432) -> PostgreSQL
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
-- Wrong
App -> PostgreSQL (directly, 500 connections)
-- OOM killer enters the chat
V. Migrations and Operations
20. Never Alter Columns in Production Without a Migration Plan
Criticality: maximum
ALTER TABLE can lock a table for hours with large data volumes. Always: add a new column -> backfill -> switch reads -> drop the old column.
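The backfill step is where large tables usually get hurt: one giant UPDATE holds locks and bloats the table. A batched sketch (column names follow the example below; the batch size of 10,000 is an arbitrary starting point):

```sql
-- Run repeatedly (from a script or a scheduled job) until it updates 0 rows.
-- Each run is one small transaction: locks stay short, autovacuum keeps up.
UPDATE users
SET    name_new = name
WHERE  id IN (
         SELECT id
         FROM   users
         WHERE  name_new IS NULL
         LIMIT  10000
       );
```

Between batches, a short pause gives replication and autovacuum room to breathe.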
-- Correct
-- Step 1: Add new column (instant)
ALTER TABLE users ADD COLUMN name_new TEXT;
-- Step 2: Backfill in batches
-- Step 3: Switch application to name_new
-- Step 4: Drop old column
-- Wrong
ALTER TABLE users RENAME COLUMN
name TO full_name;
-- application breaks instantly
21. UUID v7 for Distributed Systems, BIGSERIAL for Single Node
Criticality: high
BIGSERIAL is simpler and more compact (8 bytes vs. 16). But in distributed systems / microservices, UUID eliminates coordination. UUIDv7 sorts by time and is index-friendly.
-- Correct
-- Single PostgreSQL instance:
id BIGSERIAL PRIMARY KEY
-- Distributed system / microservices:
id UUID PRIMARY KEY
DEFAULT gen_random_uuid() -- v4
-- Or generate UUIDv7 on the application side
-- Wrong
-- UUID v4 as the PK on huge tables
-- Random inserts = constant B-tree page splits
22. Always Use Transactions for Multi-Step Operations
Criticality: maximum
Without explicit transactions, each statement auto-commits. If step 2 of 3 fails, you will have incomplete data. Wrap related operations in BEGIN/COMMIT.
-- Correct
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
UPDATE accounts SET balance = balance + 100
WHERE id = 2;
COMMIT;
-- Wrong
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
-- crash here = money evaporates
UPDATE accounts SET balance = balance + 100
WHERE id = 2;
23. Partition Large Tables (100M+ Rows)
Criticality: high
Partition by time (range) or by tenant (list/hash). Queries to one partition skip scanning the rest. VACUUM and index maintenance run per partition.
-- Correct
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2025_01
PARTITION OF events
FOR VALUES FROM ('2025-01-01')
TO ('2025-02-01');
-- Wrong
-- 500M rows in a single table
-- VACUUM takes 6 hours
-- Every query = full table scan
24. Store JSON in JSONB, Not JSON or TEXT
Criticality: normal
JSONB is a binary format that supports GIN indexes and the containment (@>) and key-existence (?) operators. JSON is just validated text, re-parsed on every access. TEXT does not validate at all.
-- Correct
metadata JSONB NOT NULL DEFAULT '{}';
CREATE INDEX idx_meta_gin
ON products USING GIN(metadata);
-- Wrong
metadata JSON -- re-parsed on every read
metadata TEXT -- no validation whatsoever
25. Use Row-Level Security (RLS) for Multi-Tenant Applications
Criticality: normal
Application bugs can lead to tenant data leaks. RLS enforces isolation at the database level — even SQL injection cannot cross tenant boundaries.
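For this to work, the application has to set the tenant context on every request. A sketch (the custom setting name 'app.tenant_id' is arbitrary, and tenant_id is assumed to be BIGINT):

```sql
-- Inside the request's transaction; SET LOCAL resets at COMMIT/ROLLBACK
BEGIN;
SET LOCAL app.tenant_id = '42';
-- ... all queries here see only tenant 42's rows via the RLS policy,
-- which compares tenant_id = current_setting('app.tenant_id')::BIGINT
COMMIT;

-- Caveat: the table owner and superusers bypass RLS by default.
-- To enforce it for the owner too:
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
```

current_setting() returns text, so the cast in the policy must match the column's type.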
-- Correct
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::BIGINT);
-- current_setting returns TEXT, hence the cast
-- Wrong
-- WHERE tenant_id = ? in every query
-- One missed WHERE = data leak
Naming Conventions Cheat Sheet
- Tables — plural, snake_case: users, order_items. Never singular or camelCase.
- Primary keys — always id with type BIGSERIAL or UUID. Never composite PKs on business data.
- Foreign keys — pattern {singular_table}_id: user_id, order_id.
- Indexes — pattern idx_{table}_{columns}: idx_users_email.
- Constraints — pattern chk_{table}_{desc} or uq_{table}_{cols}. Be explicit.
- Timestamps — every table gets created_at + updated_at. Always TIMESTAMPTZ, never TIMESTAMP.
Conclusion
These 25 rules are not dogma but the concentrated experience of teams that maintain databases with billions of rows. Every "wrong" example above is a real incident that someone survived (often outside of working hours).
Start small: go through your current project's schema and check at least the first five rules. You will likely find at least a couple of landmines that have not yet exploded. Better to defuse them now than to deal with them at the postmortem.
If this article was useful, save the cheat sheet and share it with your team. Fewer rakes stepped on means more sleep.