25 Iron Rules for PostgreSQL Database Design
A comprehensive guide of 25 battle-tested rules for PostgreSQL schema design, covering schema foundations, foreign keys, normalization, indexing, and migrations.
This article presents 25 best practices for PostgreSQL database schema design, organized into five categories with code examples and explanations. Each "wrong" example is drawn from a real production incident that someone survived, often outside of working hours.
I. Schema Foundation
1. Always Use a Surrogate Primary Key
Criticality: maximum
Natural keys (email, tax ID, username) change. When that happens, you face a cascading update of millions of rows and all FK references. Surrogate keys (BIGSERIAL / UUID) remain immutable.
-- Correct
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Wrong
CREATE TABLE users (
email TEXT PRIMARY KEY -- a catastrophe waiting to happen
);
2. Every Table MUST Have created_at and updated_at
Criticality: maximum
Without timestamps you cannot debug production incidents, build an audit trail, or perform incremental ETL. Use TIMESTAMPTZ (with timezone), not TIMESTAMP.
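Note that DEFAULT NOW() only fires on INSERT; nothing keeps updated_at fresh on UPDATE. A common approach is a small trigger. A sketch (function and trigger names are arbitrary; EXECUTE FUNCTION requires PostgreSQL 11+):

```sql
-- Generic trigger function: stamps updated_at on every UPDATE
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to each table that has an updated_at column
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
```

The same function can be reused across every table, so the per-table cost is one CREATE TRIGGER statement.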
-- Correct
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Wrong
-- No timestamp columns at all
-- Or TIMESTAMP without timezone
3. Use TIMESTAMPTZ, Not TIMESTAMP
Criticality: high
TIMESTAMP silently discards timezone information. When your application servers run in different time zones, the stored time becomes ambiguous and inconsistent. TIMESTAMPTZ stores everything in UTC internally.
-- Correct
event_time TIMESTAMPTZ NOT NULL
-- Wrong
event_time TIMESTAMP -- timezone lost forever
4. Use TEXT Instead of VARCHAR(n)
Criticality: normal
In PostgreSQL, TEXT and VARCHAR deliver identical performance. VARCHAR(n) merely adds a CHECK constraint that you will have to migrate when requirements change. For real validation, use CHECK.
-- Correct
name TEXT NOT NULL,
CONSTRAINT chk_name_len CHECK(length(name) <= 255)
-- Wrong
name VARCHAR(255) -- will need changing when requirements change
5. Use BIGINT / BIGSERIAL for IDs, Not INT
Criticality: high
INT has a maximum of ~2.1 billion. Companies like Slack and Digg have already hit this ceiling. BIGINT costs only 4 extra bytes per row but holds 9.2 quintillion values.
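You can check how close existing sequences are to overflow before it becomes an incident. A hedged monitoring query (the pg_sequences view exists in PostgreSQL 10+; last_value is NULL for sequences that have never been used):

```sql
-- Sequences that have already consumed more than half of their range
SELECT schemaname,
       sequencename,
       last_value,
       max_value,
       round(100.0 * last_value / max_value, 1) AS pct_used
FROM pg_sequences
WHERE last_value IS NOT NULL
  AND last_value > max_value / 2
ORDER BY pct_used DESC;
```

Any SERIAL-backed sequence showing up here is a candidate for an INT-to-BIGINT migration while you still have headroom.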
-- Correct
id BIGSERIAL PRIMARY KEY
-- Wrong
id SERIAL PRIMARY KEY -- a ticking time bomb
II. Relations and Foreign Keys
6. ALWAYS Define Explicit Foreign Keys
Criticality: maximum
Without FK constraints, orphaned records will silently accumulate. Application logic is no substitute: crashes, bugs, and race conditions will create inconsistency.
-- Correct
user_id BIGINT NOT NULL
REFERENCES users(id)
ON DELETE CASCADE
-- Wrong
user_id BIGINT -- "we'll check in the application code"
7. Choose ON DELETE Deliberately
Criticality: maximum
The default is NO ACTION, which blocks deletion of a referenced row (RESTRICT is nearly identical but checked immediately rather than at the end of the statement). CASCADE automatically deletes child records. SET NULL keeps the child row but clears the reference. The wrong choice either blocks operations or silently destroys data.
-- Dependent data: CASCADE
REFERENCES orders(id) ON DELETE CASCADE
-- Optional reference: SET NULL
REFERENCES users(id) ON DELETE SET NULL
-- Critical data: NO ACTION (the default)
REFERENCES accounts(id) -- blocks deletion
8. Use Junction Tables for Many-to-Many Relationships
Criticality: maximum
Never use arrays or comma-separated strings for many-to-many relationships. Junction tables are indexable, queryable, and can store relationship metadata.
-- Correct
CREATE TABLE user_roles (
user_id BIGINT REFERENCES users(id),
role_id BIGINT REFERENCES roles(id),
granted_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, role_id)
);
-- Wrong
role_ids INTEGER[] -- can't JOIN, can't FK
roles TEXT -- 'admin,editor'
9. Index Every FK Column
Criticality: high
PostgreSQL does NOT automatically create indexes on FK columns (unlike MySQL's InnoDB). Without one, every DELETE or UPDATE of a parent key triggers a sequential scan of the child table to check for referencing rows, and JOINs on the FK column degrade the same way.
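Missing FK indexes can be found from the system catalogs. A simplified sketch (it only handles single-column FKs and looks for any index whose leading column is the FK column; indkey is 0-based while conkey is 1-based):

```sql
-- Single-column foreign keys with no index starting on the FK column
SELECT c.conrelid::regclass AS child_table,
       c.conname            AS constraint_name
FROM pg_constraint c
WHERE c.contype = 'f'                 -- foreign-key constraints only
  AND cardinality(c.conkey) = 1      -- skip multi-column FKs for simplicity
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey[0] = c.conkey[1]  -- leading index column = FK column
      );
```

Running this after every migration is a cheap way to catch the "FK without index" mistake before it reaches production.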
-- Correct
CREATE INDEX idx_orders_user_id
ON orders(user_id);
-- Wrong
-- FK without index = seq scan on JOIN
10. Prefer Soft Deletes for Critical Business Data
Criticality: high
Hard deletes are irreversible and break the audit trail. Add a deleted_at column and filter in queries. Use partial indexes to maintain performance.
-- Correct
deleted_at TIMESTAMPTZ DEFAULT NULL;
-- Partial index: only active rows
CREATE INDEX idx_users_active
ON users(email)
WHERE deleted_at IS NULL;
-- Wrong
DELETE FROM users WHERE id = 42;
-- deleted forever, no audit trail
III. Normalization and Data Integrity
11. Normalize to 3NF Minimum, Denormalize Deliberately
Criticality: maximum
Start with normalization. Every denormalization is a conscious trade-off: read speed at the cost of write complexity and inconsistency risks. Document why you denormalized.
-- Correct: single source of truth
orders.user_id -> users.id -> users.email
-- Denormalization ONLY if measured:
-- "Added email to orders table for billing
-- report: query sped up from 800ms to 12ms
-- on 50M rows"
-- Wrong
-- Duplicating user_name, user_email
-- in every table "for convenience"
12. Use NOT NULL by Default, NULL Only Intentionally
Criticality: maximum
NULL introduces three-valued logic: NULL = NULL evaluates to NULL (not true), any comparison with NULL returns NULL, and aggregates silently skip NULLs. Every nullable column requires COALESCE everywhere.
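The surprises are easy to reproduce in psql. A short demo (the items table and its price column are hypothetical):

```sql
-- Three-valued logic in action
SELECT NULL = NULL;   -- returns NULL, not true
SELECT 1 <> NULL;     -- also NULL: the row would NOT pass a WHERE clause

-- Aggregates skip NULLs: these two counts can differ
SELECT count(*), count(price) FROM items;

-- Rows with price IS NULL match NEITHER branch of this "exhaustive" filter
SELECT * FROM items WHERE price > 100 OR price <= 100;
```

This last query is the classic trap: a condition that looks like it covers every row silently drops the NULL ones.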
-- Correct
status TEXT NOT NULL DEFAULT 'pending',
deleted_at TIMESTAMPTZ -- NULL = intentional
-- Wrong
name TEXT, -- nullable by accident
price NUMERIC -- NULL or 0? who knows
13. Use CHECK Constraints for Data Validation
Criticality: high
Application-level validation is bypassed by migrations, scripts, and direct DB access. Database constraints are the last line of defense, and they always work.
-- Correct
CONSTRAINT chk_price_positive
CHECK (price > 0),
CONSTRAINT chk_status_valid
CHECK (status IN ('active','inactive','suspended'))
-- Wrong
-- "validation is handled in the API"
-- *in the DB: price = -500 and status = 'yolo'*
14. Use NUMERIC for Money, Never FLOAT/DOUBLE
Criticality: maximum
Floating-point arithmetic: 0.1 + 0.2 = 0.30000000000000004. For financial data, NUMERIC(precision, scale) provides exact decimal math. Or store cents as BIGINT.
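The rounding error is easy to demonstrate directly in psql (PostgreSQL 12+ prints floats with shortest-round-trip precision, so the noise is visible by default):

```sql
-- Binary floating point cannot represent 0.1 exactly
SELECT 0.1::float8 + 0.2::float8;    -- 0.30000000000000004
SELECT 0.1::numeric + 0.2::numeric;  -- 0.3, exact decimal math

-- The error compounds: sum 0.1 a million times
SELECT sum(0.1::float8) FROM generate_series(1, 1000000);
-- not exactly 100000 -- accumulated drift
```

For a ledger, that drift is not a display nuisance but a reconciliation bug.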
-- Correct
price NUMERIC(12,2) NOT NULL,
balance NUMERIC(15,2) NOT NULL
-- or: price_cents BIGINT NOT NULL
-- Wrong
price FLOAT -- $0.30000000000000004
price DOUBLE PRECISION -- same problem
15. Use ENUM Cautiously — Prefer CHECK or Lookup Tables
Criticality: normal
PostgreSQL ENUMs are hard to modify: you can add values (and rename them since PostgreSQL 10), but you cannot remove a value without recreating the type. CHECK constraints or lookup tables are far more flexible.
-- Correct: CHECK constraint
status TEXT NOT NULL
CHECK(status IN ('draft','published'))
-- Or: lookup table for many values
REFERENCES statuses(code)
-- Wrong
CREATE TYPE status AS ENUM(
'draft','published'
); -- hard to change later
IV. Indexing and Performance
16. Create Indexes for Every WHERE, JOIN, and ORDER BY
Criticality: maximum
Without indexes, PostgreSQL performs a sequential scan — reading every row. On a table with 100M rows, that is the difference between 5 ms and 5 minutes.
-- Correct
-- Composite index for a typical query
CREATE INDEX idx_orders_user_status
ON orders(user_id, status)
WHERE deleted_at IS NULL;
-- Wrong
-- "We'll add indexes when it gets slow"
-- (3 AM, production is on fire)
17. Use Partial Indexes to Index Only the Rows You Need
Criticality: high
If 95% of rows have status completed and you only query pending, a full index is a waste of space. Partial indexes are more compact, faster, and save memory.
-- Correct
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- Wrong
CREATE INDEX idx_orders_created
ON orders(created_at);
-- indexes ALL 100M rows for 5% of queries
18. Use EXPLAIN ANALYZE Before Deploying Queries
Criticality: high
You cannot guess query performance. EXPLAIN ANALYZE shows the actual execution plan, row estimates, and timing. A Seq Scan on a large table = add an index.
-- Correct
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending';
-- Wrong
-- Deploy query to prod
-- "On dev with 10 rows it seemed fine"19. Use Connection Pooling (PgBouncer)
Criticality: maximum
Every PostgreSQL connection costs ~10 MB of RAM. 1,000 direct connections = 10 GB just for connections. PgBouncer multiplexes connections and serves thousands of clients with minimal overhead.
-- Correct
App -> PgBouncer (port 6432) -> PostgreSQL
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
-- Wrong
App -> PostgreSQL (directly, 500 connections)
-- OOM killer enters the chat
V. Migrations and Operations
20. Never Alter Columns in Production Without a Migration Plan
Criticality: maximum
ALTER TABLE can lock a table for hours with large data volumes. Always: add a new column -> backfill -> switch reads -> drop the old column.
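The backfill step is where large tables usually get hurt: one giant UPDATE holds locks and bloats the table. A batched sketch (column names follow the example below; the batch size of 10,000 is an arbitrary starting point):

```sql
-- Run repeatedly (from a script or a scheduled job) until it updates 0 rows.
-- Each run is one small transaction: locks stay short, autovacuum keeps up.
UPDATE users
SET    name_new = name
WHERE  id IN (
         SELECT id
         FROM   users
         WHERE  name_new IS NULL
         LIMIT  10000
       );
```

Between batches, a short pause gives replication and autovacuum room to breathe.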
-- Correct
-- Step 1: Add new column (instant)
ALTER TABLE users ADD COLUMN name_new TEXT;
-- Step 2: Backfill in batches
-- Step 3: Switch application to name_new
-- Step 4: Drop old column
-- Wrong
ALTER TABLE users RENAME COLUMN
name TO full_name;
-- application breaks instantly
21. UUID v7 for Distributed Systems, BIGSERIAL for Single Node
Criticality: high
BIGSERIAL is simpler and more compact (8 bytes vs. 16). But in distributed systems / microservices, UUID eliminates coordination. UUIDv7 sorts by time and is index-friendly.
-- Correct
-- Single PostgreSQL instance:
id BIGSERIAL PRIMARY KEY
-- Distributed system / microservices:
id UUID PRIMARY KEY
DEFAULT gen_random_uuid() -- v4
-- Or generate UUIDv7 on the application side
-- Wrong
-- UUID v4 as the PK on huge tables
-- Random inserts = constant B-tree page splits
22. Always Use Transactions for Multi-Step Operations
Criticality: maximum
Without explicit transactions, each statement auto-commits. If step 2 of 3 fails, you will have incomplete data. Wrap related operations in BEGIN/COMMIT.
-- Correct
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
UPDATE accounts SET balance = balance + 100
WHERE id = 2;
COMMIT;
-- Wrong
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
-- crash here = money evaporates
UPDATE accounts SET balance = balance + 100
WHERE id = 2;
23. Partition Large Tables (100M+ Rows)
Criticality: high
Partition by time (range) or by tenant (list/hash). Queries to one partition skip scanning the rest. VACUUM and index maintenance run per partition.
-- Correct
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2025_01
PARTITION OF events
FOR VALUES FROM ('2025-01-01')
TO ('2025-02-01');
-- Wrong
-- 500M rows in a single table
-- VACUUM takes 6 hours
-- Every query = full table scan
24. Store JSON in JSONB, Not JSON or TEXT
Criticality: normal
JSONB is a binary format that supports GIN indexes and the containment (@>) and key-existence (?) operators. JSON is just validated text, re-parsed on every access. TEXT does not validate at all.
-- Correct
metadata JSONB NOT NULL DEFAULT '{}';
CREATE INDEX idx_meta_gin
ON products USING GIN(metadata);
-- Wrong
metadata JSON -- re-parsed on every read
metadata TEXT -- no validation whatsoever
25. Use Row-Level Security (RLS) for Multi-Tenant Applications
Criticality: normal
Application bugs can lead to tenant data leaks. RLS enforces isolation at the database level — even SQL injection cannot cross tenant boundaries.
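For this to work, the application has to set the tenant context on every request. A sketch (the custom setting name 'app.tenant_id' is arbitrary, and tenant_id is assumed to be BIGINT):

```sql
-- Inside the request's transaction; SET LOCAL resets at COMMIT/ROLLBACK
BEGIN;
SET LOCAL app.tenant_id = '42';
-- ... all queries here see only tenant 42's rows via the RLS policy,
-- which compares tenant_id = current_setting('app.tenant_id')::BIGINT
COMMIT;

-- Caveat: the table owner and superusers bypass RLS by default.
-- To enforce it for the owner too:
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
```

current_setting() returns text, so the cast in the policy must match the column's type.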
-- Correct
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::BIGINT);
-- current_setting returns TEXT, hence the cast
-- Wrong
-- WHERE tenant_id = ? in every query
-- One missed WHERE = data leak
Naming Conventions Cheat Sheet
- Tables — plural, snake_case: users, order_items. Never singular or camelCase.
- Primary keys — always id with type BIGSERIAL or UUID. Never composite PKs on business data.
- Foreign keys — pattern {singular_table}_id: user_id, order_id.
- Indexes — pattern idx_{table}_{columns}: idx_users_email.
- Constraints — pattern chk_{table}_{desc} or uq_{table}_{cols}. Be explicit.
- Timestamps — every table gets created_at + updated_at. Always TIMESTAMPTZ, never TIMESTAMP.
Conclusion
These 25 rules are not dogma but the concentrated experience of teams that maintain databases with billions of rows. Every "wrong" example above is a real incident that someone survived (often outside of working hours).
Start small: go through your current project's schema and check at least the first five rules. You will likely find at least a couple of landmines that have not yet exploded. Better to defuse them now than to deal with them at the postmortem.
If this article was useful, save the cheat sheet and share it with your team. Fewer rakes stepped on means more sleep.