When developers first encounter PostgreSQL, they often view it simply as 'another relational database.' This perception dramatically underestimates what PostgreSQL brings to the table. Over three decades of development have produced a system with capabilities that rival—and often exceed—commercial databases costing hundreds of thousands of dollars.
PostgreSQL isn't just SQL-compliant; it extends SQL with features that eliminate the need for external tools, additional databases, or complex application logic. Understanding this feature depth is essential for system designers who want to do more with less infrastructure complexity.
This page explores PostgreSQL's advanced capabilities: rich data types (arrays, JSON, geometric, network), full-text search, generated columns, table inheritance, powerful indexing options, and the procedural language ecosystem. You'll understand how these features can simplify your architecture by consolidating functionality that would otherwise require multiple systems.
PostgreSQL's type system extends far beyond the standard SQL types (INTEGER, VARCHAR, TIMESTAMP). It provides native support for complex data structures that, in other databases, would require application-level handling, external tools, or NoSQL databases.
Array Types:
PostgreSQL supports arrays of any built-in or user-defined type. This isn't just storage—arrays can be queried, indexed, and manipulated with dedicated operators and functions.
```sql
-- Storing and querying arrays
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    tags TEXT[]  -- Array of text
);

INSERT INTO products (name, tags) VALUES
    ('PostgreSQL Book', ARRAY['database', 'sql', 'learning']);

-- Query using array operators
SELECT * FROM products WHERE 'sql' = ANY(tags);                     -- Contains element
SELECT * FROM products WHERE tags @> ARRAY['database', 'sql'];      -- Contains all
SELECT * FROM products WHERE tags && ARRAY['nosql', 'sql'];         -- Overlaps with

-- Array functions
SELECT array_length(tags, 1) FROM products;                         -- Length of array
SELECT unnest(tags) FROM products;                                  -- Expand to rows
SELECT array_agg(DISTINCT tag) FROM products, unnest(tags) AS tag;  -- Aggregate back

-- GIN index for efficient array queries
CREATE INDEX idx_products_tags ON products USING GIN(tags);
```

JSON and JSONB Types:
PostgreSQL provides first-class JSON support through two types:

- json stores an exact copy of the input text, preserving whitespace, key order, and duplicate keys; it is re-parsed on every access.
- jsonb stores a decomposed binary representation; it is slightly slower to write but much faster to query and supports GIN indexing.
For most use cases, JSONB is preferred due to its query performance and indexing capabilities.
```sql
-- JSONB column for flexible schemas
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

INSERT INTO events (event_type, payload) VALUES
    ('user.created', '{"userId": 123, "email": "user@example.com", "plan": "pro"}'),
    ('order.placed', '{"orderId": 456, "items": [{"sku": "ABC", "qty": 2}], "total": 99.99}');

-- Query JSON fields
SELECT payload->>'email' AS email FROM events WHERE event_type = 'user.created';
SELECT payload->'items'->0->>'sku' AS first_item_sku FROM events WHERE event_type = 'order.placed';

-- Filter on JSON values
SELECT * FROM events WHERE payload->>'plan' = 'pro';
SELECT * FROM events WHERE (payload->>'total')::numeric > 50;

-- JSON containment queries (@> operator)
SELECT * FROM events WHERE payload @> '{"plan": "pro"}';

-- Index JSONB for performance
CREATE INDEX idx_events_payload ON events USING GIN(payload);

-- Index specific JSON paths
CREATE INDEX idx_events_user_id ON events ((payload->>'userId'));

-- JSON aggregation
SELECT jsonb_agg(payload) FROM events WHERE event_type LIKE 'user.%';

-- Build JSON dynamically
SELECT jsonb_build_object(
    'type', event_type,
    'data', payload,
    'timestamp', created_at
) FROM events;
```

| Type Category | Types | Use Cases | Index Support |
|---|---|---|---|
| Network | inet, cidr, macaddr, macaddr8 | IP addresses, subnets, MAC addresses | B-tree, GiST |
| Geometric | point, line, lseg, box, path, polygon, circle | 2D spatial data, collision detection | GiST |
| Range | int4range, int8range, numrange, tsrange, tstzrange, daterange | Scheduling, availability windows, version ranges | GiST, SP-GiST |
| UUID | uuid | Distributed identifiers, primary keys | B-tree, hash |
| Bit String | bit, bit varying | Flags, permissions, packed data | B-tree |
| Text Search | tsvector, tsquery | Full-text search documents and queries | GIN, GiST |
| Composite | User-defined row types | Returning multiple values, complex structures | Limited |
Range types are particularly powerful for scheduling systems. Instead of storing start_time and end_time as separate columns and writing complex overlap queries, you store a single tstzrange and use the && (overlaps) operator. PostgreSQL handles all the edge cases (inclusive/exclusive bounds, unbounded ranges, empty ranges) that typically cause application bugs.
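To make this concrete, here is a minimal sketch of a reservation table built on tstzrange; the table and column names are illustrative, and the EXCLUDE constraint relies on the btree_gist extension so that equality on room_id can be combined with range overlap in a single constraint:

```sql
-- Reservations stored as a single time range (illustrative schema)
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- Lets GiST handle plain equality alongside &&

CREATE TABLE room_reservations (
    id SERIAL PRIMARY KEY,
    room_id INTEGER NOT NULL,
    during TSTZRANGE NOT NULL,
    EXCLUDE USING GIST (room_id WITH =, during WITH &&)  -- No overlapping bookings per room
);

INSERT INTO room_reservations (room_id, during)
VALUES (1, tstzrange('2024-03-01 09:00+00', '2024-03-01 11:00+00'));

-- Overlap query: does anything conflict with a proposed slot?
SELECT * FROM room_reservations
WHERE room_id = 1
  AND during && tstzrange('2024-03-01 10:00+00', '2024-03-01 12:00+00');
```

With this design, double-bookings are rejected by the database itself rather than by application checks that can race under concurrency.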
PostgreSQL includes a sophisticated full-text search engine that, for many use cases, eliminates the need for external search infrastructure like Elasticsearch or Solr. This reduces operational complexity, keeps data in sync automatically, and leverages transactional guarantees.
How Full-Text Search Works:
Full-text search in PostgreSQL revolves around two special types:

- tsvector: a document reduced to a sorted list of normalized lexemes, with positions and optional weights
- tsquery: a search expression combining lexemes with & (AND), | (OR), ! (NOT), and <-> (followed by)
The search process:

1. Parse the document into tokens (words, numbers, email addresses, and so on).
2. Normalize tokens into lexemes: lowercase them, apply language-specific stemming, and drop stop words, producing a tsvector.
3. Convert the search input into a tsquery using the same normalization.
4. Match the tsvector against the tsquery with the @@ operator, optionally ranking results with ts_rank.
```sql
-- Basic full-text search
SELECT to_tsvector('english', 'PostgreSQL is a powerful database system');
-- Result: 'databas':5 'postgresql':1 'power':4 'system':6
-- Note: 'is' and 'a' removed (stop words), words stemmed ('powerful' → 'power')

SELECT to_tsquery('english', 'database & powerful');
-- Result: 'databas' & 'power'

-- Search example
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    search_vector TSVECTOR GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED
);

-- GIN index for fast search
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

-- Search with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

-- Phrase search: words must appear adjacent
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'master <-> replica');

-- Prefix search (autocomplete)
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'postg:*');

-- Highlighting matches
SELECT ts_headline(
    'english', body,
    to_tsquery('english', 'PostgreSQL & performance'),
    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
) FROM articles
WHERE search_vector @@ to_tsquery('english', 'PostgreSQL & performance');
```

Advanced Search Features:
PostgreSQL full-text search is excellent for many use cases, but Elasticsearch or a similar engine may still be warranted when you need: (1) highly complex relevance tuning with machine-learning ranking, (2) faceted search with deep aggregation capabilities, (3) multi-language indexing with automatic language detection, or (4) search across terabytes of data on a distributed architecture. For most applications with up to millions of documents, PostgreSQL's built-in search is sufficient and simpler to operate.
PostgreSQL implements SQL more completely and extends it more thoughtfully than most other databases. These advanced features can dramatically simplify application logic and improve performance.
Common Table Expressions (CTEs) and Recursive Queries:
CTEs (WITH clauses) improve query readability and enable recursive queries for hierarchical data like organization charts, threaded comments, or bill-of-materials explosions.
```sql
-- Employee hierarchy traversal
WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers (no manager)
    SELECT id, name, manager_id, 1 AS level, ARRAY[name] AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees with managers already in result
    SELECT e.id, e.name, e.manager_id, oc.level + 1, oc.path || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
    repeat('  ', level - 1) || name AS org_structure,
    level,
    array_to_string(path, ' → ') AS reporting_chain
FROM org_chart
ORDER BY path;

-- Result:
-- org_structure       | level | reporting_chain
-- CEO                 | 1     | CEO
--   VP Engineering    | 2     | CEO → VP Engineering
--     Senior Engineer | 3     | CEO → VP Engineering → Senior Engineer
--   VP Sales          | 2     | CEO → VP Sales
```

Window Functions:
Window functions perform calculations across sets of rows related to the current row without collapsing the result set. They're essential for analytics, reporting, and complex business logic.
```sql
-- Running total and row numbers
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total,
    ROW_NUMBER() OVER (ORDER BY order_date) AS order_num
FROM orders;

-- Ranking within partitions
SELECT
    department,
    employee_name,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS gap_from_prev
FROM employees;

-- Moving averages and analysis
SELECT
    date,
    revenue,
    AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS week_avg,
    revenue - AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS deviation,
    NTILE(4) OVER (ORDER BY revenue) AS revenue_quartile
FROM daily_metrics;

-- First/last values in partitions
SELECT DISTINCT
    customer_id,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_order,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order
FROM orders;
```

Generated Columns and Computed Values:
Generated columns automatically maintain computed values, ensuring consistency without application logic or triggers.
```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255),
    price DECIMAL(10, 2) NOT NULL,
    quantity INTEGER NOT NULL,

    -- Stored generated column (computed on write and persisted)
    total DECIMAL(12, 2) GENERATED ALWAYS AS (price * quantity) STORED,

    -- Can be indexed since it's STORED
    search_name TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', coalesce(name, ''))) STORED,

    -- Useful for normalized lookups
    lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED
);

-- Generated columns are maintained automatically
INSERT INTO products (price, quantity) VALUES (29.99, 10);
SELECT total FROM products;  -- Returns 299.90, computed by the database
```

LATERAL Joins:

LATERAL joins allow subqueries to reference columns from preceding FROM items—essentially a correlated subquery in the FROM clause. This is powerful for 'get the top N items for each category' queries and for calling set-returning functions with different parameters for each row.
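Here is a sketch of the 'top N per category' pattern, assuming hypothetical categories and products tables where products carries a category_id column:

```sql
-- Top 3 most expensive products in each category
SELECT c.name AS category, p.name AS product, p.price
FROM categories c
CROSS JOIN LATERAL (
    SELECT name, price
    FROM products
    WHERE products.category_id = c.id   -- References the outer row: only legal with LATERAL
    ORDER BY price DESC
    LIMIT 3                             -- The "top N" is evaluated per category row
) p;
```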
PostgreSQL offers multiple index types, each optimized for different query patterns. Understanding when to use each type is crucial for performance optimization.
B-Tree (Default):
The standard index type, optimal for equality and range queries on ordered data. Supports <, <=, =, >=, >, BETWEEN, IS NULL, and LIKE with left-anchored patterns (the latter only when the index uses the C collation or a text_pattern_ops operator class).
Hash:
Optimized for simple equality comparisons. Uses less space than B-tree for long keys but only supports = operations. Now fully WAL-logged and crash-safe in PostgreSQL 10+.
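A minimal sketch (the pages table and url column are illustrative):

```sql
-- Hash index: equality lookups only; useful for long keys such as URLs
CREATE INDEX idx_pages_url_hash ON pages USING HASH (url);

SELECT * FROM pages WHERE url = 'https://example.com/docs';  -- Can use the hash index
```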
GiST (Generalized Search Tree):
A balanced tree structure that supports complex queries: containment (@>, <@), overlap (&&), nearest-neighbor (ORDER BY ... <->). Used for geometric data, range types, and full-text search.
GIN (Generalized Inverted Index):
An inverted index that maps keys to multiple rows. Excellent for full-text search, JSONB containment, and array queries where you're looking for elements within composite values.
BRIN (Block Range Index):
A compact index that stores summary information about ranges of physical table blocks. Extremely small (often <1% of table size) and ideal for naturally ordered data like timestamps.
| Index Type | Best For | Size | Supports ORDER BY | Update Cost |
|---|---|---|---|---|
| B-Tree | Equality, range queries, sorting | Medium | Yes | Medium |
| Hash | Equality only, very long keys | Small | No | Very Low |
| GiST | Geometric, range overlap, nearest-neighbor | Medium | For KNN only | High |
| GIN | Full-text search, JSONB, arrays | Large | No | Very High |
| BRIN | Large tables with naturally-ordered data | Tiny | No | Very Low |
| SP-GiST | Non-balanced structures (quadtrees, tries) | Medium | For some ops | Medium |
```sql
-- B-Tree: Standard for most queries
CREATE INDEX idx_orders_created ON orders (created_at);
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);

-- Multicolumn index with INCLUDE (covering index)
CREATE INDEX idx_orders_covering ON orders (customer_id, created_at)
    INCLUDE (status, total);  -- Include columns for index-only scans

-- Partial index: only index relevant rows
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';  -- Much smaller than full index

-- Expression index: index computed values
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT * FROM users WHERE lower(email) = 'user@example.com';  -- Uses index

-- GIN for JSONB
CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);
SELECT * FROM events WHERE payload @> '{"user_id": 123}';

-- GiST for geometric/range operations
CREATE INDEX idx_reservations_during ON reservations USING GIST (during);
SELECT * FROM reservations WHERE during && '[2024-01-01, 2024-01-07]'::daterange;

-- BRIN for time-series data (append-only tables)
CREATE INDEX idx_logs_timestamp ON logs USING BRIN (timestamp);
-- BRIN index on 1 billion rows might be only 1-2 MB!
```

GIN indexes are expensive to update because a single row change can affect many index entries. For write-heavy workloads, consider using the fastupdate=on option (default), which defers index updates into a pending list, periodically applied in bulk. However, this adds query overhead for checking the pending list. For OLTP with heavy writes, evaluate whether the query benefit justifies the write cost.
PostgreSQL supports multiple procedural languages for writing functions, stored procedures, and triggers. This capability enables moving complex logic closer to the data, reducing round-trips and ensuring consistency.
PL/pgSQL (Built-in):
The default procedural language, designed for SQL operations. Supports variables, conditionals, loops, exception handling, and has direct access to all PostgreSQL types and operators.
```sql
-- Complex business logic in the database
CREATE OR REPLACE FUNCTION transfer_funds(
    from_account_id INTEGER,
    to_account_id INTEGER,
    amount DECIMAL(10,2)
) RETURNS BOOLEAN AS $$
DECLARE
    from_balance DECIMAL(10,2);
BEGIN
    -- Check source account balance
    SELECT balance INTO from_balance
    FROM accounts
    WHERE id = from_account_id
    FOR UPDATE;  -- Lock the row

    IF from_balance IS NULL THEN
        RAISE EXCEPTION 'Source account % not found', from_account_id;
    END IF;

    IF from_balance < amount THEN
        RAISE EXCEPTION 'Insufficient funds: % available, % requested', from_balance, amount;
    END IF;

    -- Perform transfer
    UPDATE accounts SET balance = balance - amount WHERE id = from_account_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_account_id;

    -- Log the transaction
    INSERT INTO transfer_log (from_id, to_id, amount, timestamp)
    VALUES (from_account_id, to_account_id, amount, NOW());

    RETURN TRUE;
EXCEPTION
    WHEN OTHERS THEN
        -- Log error and re-raise
        INSERT INTO error_log (operation, message, timestamp)
        VALUES ('transfer_funds', SQLERRM, NOW());
        RAISE;
END;
$$ LANGUAGE plpgsql;

-- Trigger for audit logging
CREATE OR REPLACE FUNCTION audit_changes() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_data, timestamp)
        VALUES (TG_TABLE_NAME, 'INSERT', to_jsonb(NEW), NOW());
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_data, new_data, timestamp)
        VALUES (TG_TABLE_NAME, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW), NOW());
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, operation, old_data, timestamp)
        VALUES (TG_TABLE_NAME, 'DELETE', to_jsonb(OLD), NOW());
        RETURN OLD;  -- NEW is NULL for DELETE; returning OLD keeps BEFORE DELETE triggers from cancelling the delete
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

Other Supported Languages:
Beyond PL/pgSQL, PostgreSQL ships with PL/Python, PL/Perl, and PL/Tcl, and third-party extensions add PL/V8 (JavaScript), PL/Java, PL/R, and others. Each language runs in the PostgreSQL process space and can access the execution context, transaction state, and all PostgreSQL types.
```sql
-- Enable PL/Python
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Use Python for complex text processing
CREATE OR REPLACE FUNCTION extract_email_domain(email TEXT)
RETURNS TEXT AS $$
    import re
    match = re.search(r'@([\w.-]+)', email)
    return match.group(1) if match else None
$$ LANGUAGE plpython3u;

-- Data science inside the database
-- (parameter named vals because VALUES is a reserved word in SQL)
CREATE OR REPLACE FUNCTION calculate_statistics(vals FLOAT8[])
RETURNS TABLE(mean FLOAT8, std_dev FLOAT8, median FLOAT8) AS $$
    import numpy as np
    arr = np.array(vals)
    return [(
        float(np.mean(arr)),
        float(np.std(arr)),
        float(np.median(arr))
    )]
$$ LANGUAGE plpython3u;
```

Procedural languages come in 'trusted' and 'untrusted' variants. Trusted languages (like PL/pgSQL) can only access the database and have limited system access. Untrusted languages (like plpython3u—note the 'u') can access files, network, and system resources. Only superusers can create functions in untrusted languages. Choose appropriately based on security requirements.
PostgreSQL pioneered object-relational concepts including table inheritance, which has evolved into modern declarative partitioning—essential for managing large tables.
Table Inheritance (Legacy):
Child tables inherit columns and constraints from parent tables. While still available, declarative partitioning (PostgreSQL 10+) is preferred for most use cases.
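A minimal inheritance sketch (table names are illustrative):

```sql
-- Child table inherits all columns and CHECK constraints from the parent
CREATE TABLE cities (
    name TEXT,
    population INTEGER
);

CREATE TABLE capitals (
    country TEXT
) INHERITS (cities);

INSERT INTO capitals (name, population, country) VALUES ('Paris', 2100000, 'France');

SELECT * FROM cities;        -- Includes rows from capitals
SELECT * FROM ONLY cities;   -- Parent table only
```

Note that indexes, unique constraints, and foreign keys are not inherited, which is one reason declarative partitioning replaced this approach for splitting up large tables.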
Declarative Partitioning:
Partitioning divides a large table into smaller, more manageable pieces. PostgreSQL supports list, range, and hash partitioning with automatic query routing.
```sql
-- Range partitioning by date (common for time-series data)
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- ... continue for each month

-- PostgreSQL automatically routes inserts to the correct partition
INSERT INTO events (event_type, payload, created_at)
VALUES ('click', '{}', '2024-01-15');  -- Goes to events_2024_01

-- Queries automatically scan only relevant partitions
SELECT * FROM events
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';
-- Only scans events_2024_01, skips all other partitions (partition pruning)

-- List partitioning by category
CREATE TABLE products (
    id SERIAL,
    name VARCHAR(100),
    category VARCHAR(50) NOT NULL,
    price DECIMAL(10,2)
) PARTITION BY LIST (category);

CREATE TABLE products_electronics PARTITION OF products
    FOR VALUES IN ('electronics', 'computers', 'phones');
CREATE TABLE products_clothing PARTITION OF products
    FOR VALUES IN ('clothing', 'shoes', 'accessories');
CREATE TABLE products_default PARTITION OF products DEFAULT;

-- Hash partitioning for even distribution
CREATE TABLE sessions (
    id UUID,
    user_id INTEGER NOT NULL,
    data JSONB,
    PRIMARY KEY (id, user_id)  -- A primary key on a partitioned table must include the partition key
) PARTITION BY HASH (user_id);

-- Create 16 partitions for parallel operations
CREATE TABLE sessions_0 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ... through sessions_15
```

Use the pg_partman extension for automatic partition management. It handles partition creation, detachment, and cleanup on a schedule, which is essential for time-series data where you don't want to create each month's partition by hand.
PostgreSQL's feature set continues well beyond what we've covered. Here are additional capabilities that system designers should know about:
| Feature | Description | Use Case |
|---|---|---|
| Foreign Data Wrappers | Query external data sources as if they were local tables | Access S3, MySQL, Oracle, MongoDB, CSV files, REST APIs from SQL |
| LISTEN/NOTIFY | Publisher-subscriber messaging within PostgreSQL | Real-time notifications, cache invalidation, simple pub/sub |
| Advisory Locks | Application-level locking without table locks | Coordinating distributed processes, preventing duplicate jobs |
| Row-Level Security | Policies that filter rows based on current user | Multi-tenant applications, data access control without views |
| Materialized Views | Cached query results with REFRESH capability | Complex analytics dashboards, pre-computed aggregations |
| Logical Decoding | Stream database changes as events | CDC pipelines, real-time replication to other systems |
| COPY Protocol | Bulk data import/export at wire-speed | Loading millions of rows per second, data lake integration |
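Several of these features appear in the combined example below; advisory locks deserve a quick sketch of their own (the lock key 42 is an arbitrary number chosen by the application):

```sql
-- Advisory lock: ensure only one worker runs the nightly job at a time
SELECT pg_try_advisory_lock(42) AS got_lock;  -- true if acquired, false if another session holds it

-- ... run the job only when got_lock is true ...

SELECT pg_advisory_unlock(42);  -- Release explicitly (session-level locks are also released at disconnect)
```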
```sql
-- LISTEN/NOTIFY for real-time updates
-- Session 1:
LISTEN order_updates;

-- Session 2 (different connection):
INSERT INTO orders (...) VALUES (...);
NOTIFY order_updates, '{"order_id": 123, "status": "created"}';

-- Session 1 receives: Asynchronous notification "order_updates" with payload "{"order_id": 123...}"

-- Row-Level Security for multi-tenancy
ALTER TABLE customer_data ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON customer_data
    USING (tenant_id = current_setting('app.current_tenant')::int);

SET app.current_tenant = '42';
SELECT * FROM customer_data;  -- Only sees tenant 42's data, guaranteed by database

-- Materialized Views for fast dashboards
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
    date_trunc('day', created_at) AS day,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY 1;

CREATE UNIQUE INDEX ON daily_sales_summary (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;  -- Non-blocking refresh
```

PostgreSQL's documentation is exceptionally comprehensive. When facing a problem, search the docs before assuming you need external tools. More often than not, PostgreSQL has a built-in solution that's been battle-tested for years.
We've explored the extensive capabilities that make PostgreSQL far more than a simple SQL database:

- Rich data types: arrays, JSON/JSONB, ranges, network, geometric, and UUID types
- Built-in full-text search with tsvector, tsquery, ranking, and highlighting
- Advanced SQL: recursive CTEs, window functions, generated columns, and LATERAL joins
- Specialized index types: B-Tree, Hash, GiST, GIN, BRIN, and SP-GiST
- Procedural languages for in-database logic, from PL/pgSQL to PL/Python
- Declarative partitioning for managing very large tables
- Operational extras such as LISTEN/NOTIFY, advisory locks, row-level security, materialized views, and foreign data wrappers
What's Next:
Now that we understand PostgreSQL's native capabilities, the next page explores the extension ecosystem—PostGIS for geospatial data, TimescaleDB for time-series, and other extensions that transform PostgreSQL into a specialized database for any domain.
You now understand PostgreSQL's rich feature set and how it can simplify your architecture by consolidating functionality that would otherwise require multiple systems.