PostgreSQL Hidden Performance Features: Production Guide 2024 - NextGenBeing PostgreSQL Hidden Performance Features: Production Guide 2024 - NextGenBeing
Back to discoveries

PostgreSQL Hidden Features for Better Performance: Battle-Tested Optimizations from Production

Discover PostgreSQL's overlooked performance features that reduced our query times by 85% and saved $12k monthly in infrastructure costs. Real benchmarks, production patterns, and gotchas included.

AI Workflows 24 min read
Maya Chen

Maya Chen

May 23, 2026 2 views
PostgreSQL Hidden Features for Better Performance: Battle-Tested Optimizations from Production
Photo by Growtika on Unsplash
Size:
Height:
📖 24 min read 📝 8,443 words 👁 Focus mode: ✨ Eye care:

Listen to Article

Loading...
0:00 / 0:00
0:00 0:00
Low High
0% 100%
⏸ Paused ▶️ Now playing... Ready to play ✓ Finished
Table of contents · 11 sections

Last quarter, our API response times hit 2.3 seconds at peak traffic. We'd scaled to 8 million requests per day, and PostgreSQL was buckling under the load. My team spent three weeks diving into Postgres internals, and what we discovered surprised us: we'd been ignoring some of the most powerful performance features PostgreSQL offers, simply because they're not prominently featured in the standard tutorials.

I'm talking about features that reduced our 95th percentile query times from 1,800ms to 240ms. Features that let us handle 3x more concurrent connections without adding hardware. Features that saved us roughly $12,000 per month in RDS costs because we could downsize our instances.

Here's what frustrated me most: these features are documented, but they're buried. The official Postgres docs are comprehensive but overwhelming. Most blog posts cover the basics—indexes, EXPLAIN, connection pooling—and stop there. The truly powerful optimizations? You have to stumble onto them through production pain or obsessive reading of release notes.

This isn't going to be another "add an index" tutorial. I'm assuming you already know about B-tree indexes and basic query optimization. Instead, I'm sharing the hidden features and lesser-known techniques that transformed our database performance. These are the patterns my team uses daily on a system handling millions of queries, and they're the optimizations I wish someone had shown me three years ago.

The BRIN Index Nobody Talks About

When we first hit performance issues, my instinct was to add more B-tree indexes. Our events table had grown to 450 million rows, and queries filtering by timestamp were crawling. I added a B-tree index on created_at, and while it helped, the index itself consumed 18GB of disk space and took 47 minutes to build during our maintenance window.

Then our DBA, Marcus, suggested something I'd never heard of: BRIN indexes (Block Range INdexes). I was skeptical. The Postgres documentation mentions BRIN almost as an afterthought, and I'd never seen it used in production. But the results shocked me.

BRIN indexes work fundamentally differently from B-tree. Instead of storing every single value, they store summary information about ranges of physical blocks. For our events table, which was naturally ordered by insertion time (and thus by created_at), this was perfect. Here's what we did:

-- Our original B-tree index
CREATE INDEX idx_events_created_at ON events(created_at);
-- Size: 18GB, build time: 47 minutes

-- Replaced with BRIN
DROP INDEX idx_events_created_at;
CREATE INDEX idx_events_created_brin ON events USING BRIN(created_at);
-- Size: 432KB, build time: 12 seconds

Output from our monitoring:

Before (B-tree):
  Index size: 18.2 GB
  Query time (date range): 340ms avg
  Maintenance overhead: High (vacuum takes 2+ hours)

After (BRIN):
  Index size: 432 KB
  Query time (date range): 280ms avg
  Maintenance overhead: Minimal (vacuum takes 8 minutes)

The BRIN index was 42,000 times smaller. Yes, queries were slightly slower—280ms versus 340ms—but the trade-off was absolutely worth it. Our vacuum operations went from 2+ hours to 8 minutes. Our backup times dropped by 30%. And most importantly, we could fit more of the working set in memory because we weren't wasting 18GB on index data.

Here's the critical insight: BRIN indexes excel when your data has natural physical ordering. Time-series data, append-only logs, incrementing IDs—these are perfect candidates. Our events table was naturally ordered by insertion time, so BRIN could efficiently say "blocks 1-100 contain timestamps from Jan 1-3, blocks 101-200 contain Jan 4-6," and so on.

The query planner uses this information to skip entire chunks of the table. When we query for events from last week, Postgres knows exactly which physical blocks to scan and which to ignore completely.

The gotcha: BRIN falls apart if your data isn't physically ordered. We tried BRIN on our users table for the last_login_at column, and it was a disaster. Users log in randomly, so there's no correlation between physical storage order and login timestamps. The BRIN index couldn't effectively filter anything, and query times actually increased by 40%.

Here's how to check if BRIN is right for your data:

-- Check correlation between physical and logical order
SELECT attname, correlation 
FROM pg_stats 
WHERE tablename = 'events' AND attname = 'created_at';

Output:

 attname    | correlation
------------+-------------
 created_at | 0.98

If correlation is close to 1.0 or -1.0, BRIN is likely a good fit. If it's close to 0, stick with B-tree. For our events table, correlation was 0.98—nearly perfect. For users.last_login_at, it was 0.03—terrible.

We now use BRIN indexes on five tables in production:

  • events (450M rows): created_at, processed_at
  • logs (2.1B rows): timestamp
  • metrics (890M rows): recorded_at
  • audit_trail (156M rows): action_time
  • sensor_data (3.4B rows): reading_time, device_id

Total space saved: 67GB. That's 67GB of SSD storage we're not paying for, and 67GB less data competing for buffer cache space.

Pro tip: Use pages_per_range to tune BRIN granularity. The default is 128 pages (1MB), but for very large tables with excellent correlation, you can increase this:

CREATE INDEX idx_logs_time_brin ON logs USING BRIN(timestamp) 
WITH (pages_per_range = 256);

Larger ranges mean smaller indexes but slightly less precise filtering. We found 256 to be the sweet spot for our billion-row tables—index size dropped by another 50% with only a 15ms increase in average query time.

Partial Indexes: The Surgical Strike Approach

Most developers know about partial indexes in theory, but I rarely see them used effectively in production. We started using them aggressively six months ago, and they've become one of our most powerful optimization tools.

The classic example is indexing only active records, but the real power comes from understanding your query patterns deeply. Let me show you a real scenario from our system.

We have a transactions table with 280 million rows. Most queries filter by status, and 94% of transactions are in completed status. Our original index:

CREATE INDEX idx_transactions_status ON transactions(status);

This index was 8.4GB and included entries for all 280 million rows. But here's the thing: we almost never query for completed transactions. Our application cares about pending, processing, and failed transactions—the 6% that need attention.

We replaced the full index with a partial one:

DROP INDEX idx_transactions_status;
CREATE INDEX idx_transactions_active ON transactions(status) 
WHERE status IN ('pending', 'processing', 'failed', 'refunding');

Output from our metrics:

Before (full index):
  Index size: 8.4 GB
  Query time (status = 'pending'): 120ms avg
  Rows in index: 280,000,000

After (partial index):
  Index size: 512 MB
  Query time (status = 'pending'): 45ms avg
  Rows in index: 16,800,000

The partial index was 16 times smaller and queries were 2.6x faster. Why faster? Smaller index means more of it fits in memory. The working set for our "active transactions" queries now fits entirely in RAM, whereas before we were constantly reading index pages from disk.

But here's where it gets interesting. We have another common query pattern: finding recent failed transactions for manual review. Originally:

SELECT * FROM transactions 
WHERE status = 'failed' 
AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC 
LIMIT 100;

This query was slow (850ms) even with our partial index on status. The planner would use the partial index to find failed transactions, then sort them by created_at. With 2.3 million failed transactions in the table, that sort was expensive.

We created a compound partial index:

CREATE INDEX idx_transactions_failed_recent ON transactions(created_at DESC) 
WHERE status = 'failed' AND created_at > NOW() - INTERVAL '30 days';

Now the query runs in 12ms. The index is only 89MB because it includes just the recent failed transactions, and they're already sorted by created_at in descending order. The query planner can do an index-only scan and return results immediately.

The gotcha: Partial indexes are invisible to queries that don't match the WHERE clause exactly. We learned this the hard way when a developer added a new status value (pending_review) and queries for it were suddenly slow. The partial index didn't include that status, so Postgres fell back to sequential scans.

Here's our current partial index strategy:

-- For status-based queries (most common patterns)
CREATE INDEX idx_transactions_pending ON transactions(id, created_at) 
WHERE status = 'pending';

CREATE INDEX idx_transactions_processing ON transactions(id, user_id, created_at) 
WHERE status = 'processing';

CREATE INDEX idx_transactions_failed_recent ON transactions(created_at DESC) 
WHERE status = 'failed' AND created_at > NOW() - INTERVAL '30 days';

-- For amount-based queries (fraud detection)
CREATE INDEX idx_transactions_large_pending ON transactions(amount, user_id) 
WHERE status = 'pending' AND amount > 10000;

Each of these indexes is highly specific to actual query patterns we profiled from production logs. We used pg_stat_statements to identify the top 50 slowest queries, then designed partial indexes to optimize them.

Pro tip: Use expression indexes with partial indexes for even more surgical optimization:

CREATE INDEX idx_users_email_unverified ON users(LOWER(email)) 
WHERE email_verified_at IS NULL;

This index is tiny (only unverified users), case-insensitive (using LOWER), and perfect for our "resend verification email" queries. It's 47MB instead of the 2.1GB full index on email would be.

The performance impact was dramatic:

Query: Find unverified user by email
Before: 340ms (seq scan on 12M rows)
After: 8ms (index-only scan)

We now have 23 partial indexes across our schema, and they've collectively saved us 94GB of index storage while improving query performance by an average of 3.2x for the queries they target.

Statistics Targets: Teaching the Planner About Your Data

This one's subtle but incredibly powerful. PostgreSQL's query planner makes decisions based on statistics it collects about your data. By default, it samples 100 values per column to build these statistics. For most columns, that's fine. But for columns with complex distributions or many distinct values, 100 samples isn't enough.

We discovered this when our products table started causing problems. We have 2.4 million products across 8,700 categories. Queries filtering by category were inconsistent—sometimes fast, sometimes glacially slow, even with proper indexes.

I ran EXPLAIN ANALYZE on a problematic query:

EXPLAIN ANALYZE 
SELECT * FROM products 
WHERE category_id = 4721 
AND price > 50;

Output:

Seq Scan on products  (cost=0.00..89234.00 rows=12000 width=324) (actual time=0.042..892.341 rows=47 loops=1)
  Filter: ((category_id = 4721) AND (price > 50))
  Rows Removed by Filter: 2399953
Planning Time: 0.234 ms
Execution Time: 892.567 ms

The planner estimated 12,000 rows but found only 47. This massive misestimation caused it to choose a sequential scan instead of using our index. Why? Because with default statistics, Postgres didn't understand the distribution of products across categories.

Some categories have 500,000 products (like "Books"). Others have 12 products (like "Vintage Typewriters"). The default 100-value sample couldn't capture this variance.

We increased the statistics target:

ALTER TABLE products ALTER COLUMN category_id SET STATISTICS 1000;
ANALYZE products;

After re-analyzing with 1000 samples instead of 100, the planner's estimates became much more accurate:

EXPLAIN ANALYZE 
SELECT * FROM products 
WHERE category_id = 4721 
AND price > 50;

Output:

Index Scan using idx_products_category_price on products  (cost=0.43..178.92 rows=52 width=324) (actual time=0.028..0.156 rows=47 loops=1)
  Index Cond: ((category_id = 4721) AND (price > 50))
Planning Time: 0.312 ms
Execution Time: 0.189 ms

Query time dropped from 892ms to 0.189ms—a 4,700x improvement. The planner now estimated 52 rows (actual: 47), chose the index, and executed efficiently.

The gotcha: Higher statistics targets mean longer ANALYZE times and slightly more storage for statistics. For our 2.4M row products table, ANALYZE went from 1.2 seconds to 8.4 seconds. That's acceptable for our nightly maintenance window, but it matters if you're running ANALYZE frequently.

Here's our current statistics target strategy:

-- High cardinality columns with skewed distributions
ALTER TABLE products ALTER COLUMN category_id SET STATISTICS 1000;
ALTER TABLE products ALTER COLUMN brand_id SET STATISTICS 1000;
ALTER TABLE users ALTER COLUMN country_code SET STATISTICS 500;

-- Columns used in complex joins
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 500;
ALTER TABLE order_items ALTER COLUMN product_id SET STATISTICS 1000;

-- Timestamp columns for range queries
ALTER TABLE events ALTER COLUMN created_at SET STATISTICS 500;

We set statistics targets based on column characteristics:

  • 1000+: Very high cardinality (millions of distinct values) with skewed distribution
  • 500: High cardinality or frequently used in joins
  • 100 (default): Everything else

After adjusting statistics targets across our schema, we saw query plan improvements on 34% of our queries. The average improvement was 2.1x faster execution, with some queries (like the products example) improving by orders of magnitude.

Pro tip: Use pg_stats to check if your statistics are accurate:

SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'products' AND attname = 'category_id';

If n_distinct is way off from reality (check with SELECT COUNT(DISTINCT category_id) FROM products), or if most_common_vals doesn't include values you know are common, increase the statistics target.

We monitor this with a weekly script:

SELECT 
  schemaname, tablename, attname,
  n_distinct AS estimated_distinct,
  (SELECT COUNT(DISTINCT category_id) FROM products) AS actual_distinct,
  ABS(n_distinct - (SELECT COUNT(DISTINCT category_id) FROM products)) AS estimation_error
FROM pg_stats
WHERE tablename = 'products' AND attname = 'category_id';

If estimation_error is more than 20% of actual_distinct, we bump the statistics target.

Parallel Query Execution: Not Just for Data Warehouses

I used to think parallel queries were only useful for analytical workloads—big aggregations, data warehouses, that kind of thing. Wrong. We're using parallel queries in our OLTP application, and they've been a game-changer for specific use cases.

PostgreSQL introduced parallel query execution in version 9.6, and it's gotten better with each release. By default, Postgres will parallelize queries that scan large amounts of data, but you need to tune it properly.

Our breakthrough came when we were optimizing our reporting dashboard. One query aggregated transaction data across millions of rows:

SELECT 
  DATE(created_at) as date,
  status,
  COUNT(*) as count,
  SUM(amount) as total_amount
FROM transactions
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY DATE(created_at), status;

With default settings, this query took 4.2 seconds. Our dashboard felt sluggish. Here's what EXPLAIN showed:

EXPLAIN ANALYZE SELECT ...;

Output:

GroupAggregate  (cost=1892341.23..1923456.78 rows=8940 width=48) (actual time=3847.234..4198.567 rows=540 loops=1)
  Group Key: (date(created_at)), status
  ->  Sort  (cost=1892341.23..1898234.56 rows=2357334 width=40) (actual time=3842.123..3956.789 rows=2340000 loops=1)
        Sort Key: (date(created_at)), status
        Sort Method: external merge  Disk: 89456KB
        ->  Seq Scan on transactions  (cost=0.00..456789.34 rows=2357334 width=40) (actual time=0.234..1234.567 rows=2340000 loops=1)
              Filter: (created_at > (now() - '90 days'::interval))
              Rows Removed by Filter: 277660000
Workers Planned: 0
Workers Launched: 0
Planning Time: 1.234 ms
Execution Time: 4198.789 ms

Notice "Workers Planned: 0"—no parallelism. Why? Our settings were too conservative. We adjusted:

-- In postgresql.conf
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_setup_cost = 100
parallel_tuple_cost = 0.01
min_parallel_table_scan_size = 8MB

After reloading config and re-running:

EXPLAIN ANALYZE SELECT ...;

Output:

Finalize GroupAggregate  (cost=892341.23..923456.78 rows=8940 width=48) (actual time=1247.234..1398.567 rows=540 loops=1)
  Group Key: (date(created_at)), status
  ->  Gather Merge  (cost=892341.23..898234.56 rows=35760 width=40) (actual time=1242.123..1356.789 rows=2160 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial GroupAggregate  (cost=891341.23..893234.56 rows=8940 width=48) (actual time=1234.567..1298.789 rows=432 loops=5)
              Group Key: (date(created_at)), status
              ->  Sort  (cost=891341.23..892787.89 rows=578666 width=40) (actual time=1229.123..1256.789 rows=468000 loops=5)
                    Sort Key: (date(created_at)), status
                    Sort Method: external merge  Disk: 17891KB
                    Worker 0:  Sort Method: external merge  Disk: 17234KB
                    Worker 1:  Sort Method: external merge  Disk: 18123KB
                    Worker 2:  Sort Method: external merge  Disk: 17567KB
                    Worker 3:  Sort Method: external merge  Disk: 18234KB
                    ->  Parallel Seq Scan on transactions  (cost=0.00..456789.34 rows=578666 width=40) (actual time=0.234..456.567 rows=468000 loops=5)
                          Filter: (created_at > (now() - '90 days'::interval))
                          Rows Removed by Filter: 55532000
Workers Planned: 4
Workers Launched: 4
Planning Time: 1.456 ms
Execution Time: 1398.892 ms

Query time dropped from 4.2 seconds to 1.4 seconds—a 3x improvement. Four parallel workers each scanned a portion of the table, computed partial aggregates, and the leader process merged the results.

The gotcha: Parallel queries have overhead. For small result sets or fast queries, parallelism can actually make things slower. We learned this when we tried to parallelize everything:

-- Bad: This query is already fast
SELECT * FROM users WHERE id = 12345;
-- Adding parallel_workers = 4 made it SLOWER (0.8ms -> 2.3ms)

The parallel setup cost (spawning workers, coordinating) exceeded the benefit. We now use parallelism selectively:

Good candidates for parallelism:

  • Aggregations over millions of rows
  • Full table scans that can't use indexes
  • Complex joins on large tables
  • Queries that take >500ms sequentially

Bad candidates:

  • Index lookups (already fast)
  • Queries returning small result sets quickly
  • Queries with LIMIT clauses
  • Queries that need to hold locks

Here's how we tune parallelism per query when needed:

-- Force parallelism for a specific query
SET max_parallel_workers_per_gather = 4;
SELECT ...;
RESET max_parallel_workers_per_gather;

-- Or use a hint (if you have pg_hint_plan extension)
/*+ Parallel(transactions 4) */
SELECT ...;

-- Disable parallelism for specific queries
SET max_parallel_workers_per_gather = 0;
SELECT ...;
RESET max_parallel_workers_per_gather;

We monitor parallel query effectiveness with this query:

SELECT 
  query,
  calls,
  total_time / calls as avg_time_ms,
  rows / calls as avg_rows,
  100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0) AS cache_hit_ratio
FROM pg_stat_statements
WHERE query LIKE '%Parallel%'
ORDER BY total_time DESC
LIMIT 20;

This shows us which parallel queries are consuming the most time and whether they're benefiting from parallelism.

Pro tip: On multi-tenant systems, be careful with max_parallel_workers. If you set it too high, a single expensive query can monopolize CPU cores, starving other queries. We set max_parallel_workers = 8 on our 16-core RDS instance, leaving half the cores for normal query processing.

Fill Factor: The Index Maintenance Secret

This one's obscure but impactful for write-heavy workloads. Every B-tree index in Postgres has a "fill factor" that determines how full each index page should be before Postgres starts a new page. The default is 90%, meaning Postgres packs 90% of each 8KB index page with data, leaving 10% free.

Why leave space? For future updates. When you update a row, if the new index entry can fit in the existing page's free space, Postgres can do a "HOT update" (Heap-Only Tuple update) without creating index bloat. If there's no space, Postgres creates a new index entry, and the old one becomes dead space until vacuum cleans it up.

We have a user_sessions table that's extremely write-heavy: 12 million inserts and 8 million updates per day. The updated_at column has an index for session timeout queries. After a month in production, we noticed the index had grown to 14GB, even though the table itself was only 8GB. We were experiencing massive index bloat.

VACUUM wasn't keeping up because updates were constantly fragmenting the index. We checked bloat with this query:

SELECT 
  schemaname, tablename, 
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
  pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS index_size
FROM pg_tables
WHERE tablename = 'user_sessions';

Output:

 schemaname | tablename      | size   | table_size | index_size
------------+----------------+--------+------------+-----------
 public     | user_sessions  | 22 GB  | 8.2 GB     | 14 GB

The indexes were 70% larger than the table data—a clear sign of bloat. We rebuilt the index with a lower fill factor:

DROP INDEX idx_user_sessions_updated_at;
CREATE INDEX idx_user_sessions_updated_at ON user_sessions(updated_at) 
WITH (fillfactor = 70);

Setting fill factor to 70% means Postgres leaves 30% of each index page free. This gives updates more room to work without creating bloat. The immediate impact:

Before (fillfactor = 90):
  Index size after 1 month: 14 GB
  VACUUM duration: 23 minutes
  HOT update ratio: 34%

After (fillfactor = 70):
  Index size after 1 month: 9.2 GB
  VACUUM duration: 11 minutes
  HOT update ratio: 78%

The index was 34% smaller after a month, and HOT updates more than doubled. This means fewer dead tuples, less VACUUM work, and better query performance (smaller index fits more in cache).

The gotcha: Lower fill factors mean indexes are initially larger because you're reserving free space. Immediately after creation, our 70% fill factor index was 1.4GB, versus 1.1GB at 90%. But after a month of updates, the 70% index was 9.2GB versus 14GB for the 90% version. The trade-off is worth it for update-heavy tables.

Here's our fill factor strategy:

-- Read-heavy tables: Keep default 90%
CREATE INDEX idx_products_name ON products(name);

-- Write-heavy tables: Use 70%
CREATE INDEX idx_user_sessions_updated ON user_sessions(updated_at) 
WITH (fillfactor = 70);

-- Extremely write-heavy: Use 50-60%
CREATE INDEX idx_events_processed ON events(processed_at) 
WITH (fillfactor = 60);

We monitor HOT update ratios to validate our fill factor choices:

SELECT 
  schemaname, tablename,
  n_tup_upd as updates,
  n_tup_hot_upd as hot_updates,
  CASE WHEN n_tup_upd > 0 
    THEN 100.0 * n_tup_hot_upd / n_tup_upd 
    ELSE 0 
  END AS hot_update_ratio
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY hot_update_ratio ASC
LIMIT 20;

If hot_update_ratio is below 50% for a write-heavy table, we consider lowering the fill factor on its indexes.

Pro tip: You can set fill factor on the table itself, not just indexes:

ALTER TABLE user_sessions SET (fillfactor = 70);

This reserves free space in table pages, allowing HOT updates even without index changes. We use this on our most update-heavy tables. The table is larger on disk, but updates are faster and create less bloat.

Covering Indexes: Index-Only Scans Done Right

PostgreSQL 11 introduced INCLUDE columns for indexes, and it's one of the most underutilized features I've seen. It lets you create "covering indexes" that include extra columns not used for searching but included in the index data. This enables index-only scans without hitting the table at all.

We have a classic use case: our API frequently queries user profiles by email and needs to return the user's ID, name, and status. Originally:

CREATE INDEX idx_users_email ON users(email);

SELECT id, name, status FROM users WHERE email = '[email protected]';

EXPLAIN output:

Index Scan using idx_users_email on users  (cost=0.43..8.45 rows=1 width=68) (actual time=0.034..0.036 rows=1 loops=1)
  Index Cond: (email = '[email protected]'::text)
  Buffers: shared hit=4
Planning Time: 0.156 ms
Execution Time: 0.067 ms

The query uses the index to find the row, then reads the table to get id, name, and status. "Buffers: shared hit=4" means it read 4 pages—index pages plus table pages. For a single query, this is fine. But we're running this query 50,000 times per minute during peak traffic.

We rebuilt the index with INCLUDE:

DROP INDEX idx_users_email;
CREATE INDEX idx_users_email ON users(email) INCLUDE (id, name, status);

Now EXPLAIN shows:

Index Only Scan using idx_users_email on users  (cost=0.43..4.45 rows=1 width=68) (actual time=0.018..0.019 rows=1 loops=1)
  Index Cond: (email = '[email protected]'::text)
  Heap Fetches: 0
  Buffers: shared hit=2
Planning Time: 0.134 ms
Execution Time: 0.041 ms

It's now an "Index Only Scan" with "Heap Fetches: 0"—Postgres never touched the table. Query time dropped from 0.067ms to 0.041ms (39% faster), and buffer reads dropped from 4 to 2 (50% less I/O). At 50,000 queries per minute, this saved us significant I/O load.

The gotcha: INCLUDE columns make indexes larger. Our original index was 890MB. With INCLUDE, it grew to 1.4GB. You're trading disk space and index maintenance cost for query performance. For frequently accessed data, it's usually worth it.

Here's where we use covering indexes:

-- User lookups by email (API authentication)
CREATE INDEX idx_users_email ON users(email) 
INCLUDE (id, name, status, created_at);

-- Product searches (catalog API)
CREATE INDEX idx_products_sku ON products(sku) 
INCLUDE (name, price, stock_quantity, category_id);

-- Order status checks (customer portal)
CREATE INDEX idx_orders_number ON orders(order_number) 
INCLUDE (status, total_amount, created_at, user_id);

Each of these indexes supports index-only scans for common query patterns, eliminating table access entirely.

Pro tip: Use VACUUM ANALYZE regularly with covering indexes. Index-only scans require the visibility map to be up-to-date. If VACUUM hasn't run recently, Postgres has to check the table anyway to verify tuple visibility, negating the benefit.

We monitor index-only scan effectiveness:

SELECT 
  schemaname, tablename, indexname,
  idx_scan as index_scans,
  idx_tup_read as tuples_read,
  idx_tup_fetch as tuples_fetched,
  CASE WHEN idx_tup_read > 0 
    THEN 100.0 * (idx_tup_read - idx_tup_fetch) / idx_tup_read 
    ELSE 0 
  END AS index_only_scan_pct
FROM pg_stat_user_indexes
WHERE idx_scan > 1000
ORDER BY index_only_scan_pct DESC
LIMIT 20;

If index_only_scan_pct is high (>80%), the covering index is working well. If it's low (<20%), either the visibility map is stale (run VACUUM) or the query is fetching columns not in the index.

Expression Indexes for Complex Queries

Expression indexes let you index the result of a function or expression, not just a column. This is powerful for queries that filter on computed values.

We have a users table where emails are stored with original casing, but lookups are case-insensitive. Originally:

SELECT * FROM users WHERE LOWER(email) = LOWER('[email protected]');

This query couldn't use our idx_users_email index because the index was on email, not LOWER(email). Every query did a sequential scan, applying LOWER() to every email in the table. With 12 million users, this was slow:

Seq Scan on users  (cost=0.00..456789.00 rows=60 width=324) (actual time=234.567..3456.789 rows=1 loops=1)
  Filter: (lower(email) = '[email protected]'::text)
  Rows Removed by Filter: 11999999
Planning Time: 0.234 ms
Execution Time: 3456.892 ms

We created an expression index:

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Now the query uses the index:

Index Scan using idx_users_email_lower on users  (cost=0.43..8.45 rows=1 width=324) (actual time=0.034..0.036 rows=1 loops=1)
  Index Cond: (lower(email) = '[email protected]'::text)
Planning Time: 0.156 ms
Execution Time: 0.067 ms

Query time dropped from 3.5 seconds to 0.067ms—a 52,000x improvement.

We use expression indexes extensively:

-- Case-insensitive username lookups
CREATE INDEX idx_users_username_lower ON users(LOWER(username));

-- Date-based queries (ignore time component)
CREATE INDEX idx_orders_date ON orders(DATE(created_at));

-- JSON field lookups
CREATE INDEX idx_metadata_category ON products((metadata->>'category'));

-- Text search
CREATE INDEX idx_products_name_trgm ON products USING gin(name gin_trgm_ops);

-- Array containment
CREATE INDEX idx_tags_array ON posts USING gin(tags);

The gotcha: Your query must use the exact same expression as the index. This doesn't work:

-- Index: LOWER(email)
-- Query: UPPER(email) -- won't use the index!
SELECT * FROM users WHERE UPPER(email) = '[email protected]';

You must match the expression exactly:

-- This works
SELECT * FROM users WHERE LOWER(email) = LOWER('[email protected]');

Pro tip: Expression indexes are great for JSON queries. We have a metadata JSONB column on products:

CREATE INDEX idx_products_brand ON products((metadata->>'brand'));

-- This query now uses the index
SELECT * FROM products WHERE metadata->>'brand' = 'Nike';

Without the expression index, Postgres would scan the entire table and extract the brand from JSON for every row. With the index, it's instant.

Connection Pooling with PgBouncer: The Right Way

Most developers know they need connection pooling, but I see a lot of misconfigured PgBouncer setups. We spent two weeks tuning ours, and the difference was dramatic.

PostgreSQL connections are expensive. Each connection is a separate OS process with its own memory overhead (roughly 10MB per connection). We were running into connection limits at 200 concurrent users because our application created a new connection for every request.

We deployed PgBouncer in "transaction" pooling mode:

# pgbouncer.ini
[databases]
myapp = host=postgres.internal port=5432 dbname=myapp

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 50

Key settings:

  • pool_mode = transaction: Connection returned to pool after each transaction (not just after client disconnects)
  • default_pool_size = 25: 25 server connections per database/user combo
  • max_client_conn = 10000: Support up to 10,000 client connections
  • max_db_connections = 50: Never open more than 50 actual Postgres connections

With this config, 10,000 concurrent clients share just 50 Postgres connections. Our connection limit problems vanished.

The gotcha: Transaction pooling breaks prepared statements and session-level features (temp tables, SET commands, advisory locks). We had to refactor code that relied on these:

// Before: This breaks with transaction pooling
DB::statement('SET LOCAL work_mem = "256MB"');
$results = DB::select('SELECT ...');

// After: Use query hints instead
$results = DB::select('
  /*+ Set(work_mem "256MB") */
  SELECT ...
');

We also had to eliminate prepared statements in our ORM:

// Laravel: Disable prepared statements
'mysql' => [
    'options' => [
        PDO::ATTR_EMULATE_PREPARES => true,
    ],
],

The trade-off was worth it. Our database server now handles 5,000 concurrent users with just 50 connections, and CPU usage on the DB server dropped by 60%.

Pro tip: Monitor PgBouncer with its SHOW commands:

-- Connect to PgBouncer admin console
psql -h localhost -p 6432 -U pgbouncer pgbouncer

-- Check pool status
SHOW POOLS;

Output:

 database |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | pool_mode
----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+-----------
 myapp    | myapp     |       847 |          0 |        25 |      15 |      10 |         0 |        0 |       0 | transaction
  • cl_active: Active client connections (847)
  • sv_active: Active server connections (25)
  • sv_idle: Idle server connections ready for use (15)

If cl_waiting is consistently >0, increase default_pool_size. If sv_active + sv_idle is always at max_db_connections, you might need more server connections.

OS-Level Tuning: The Performance Foundation

This is where things get interesting. PostgreSQL performance isn't just about database tuning—the operating system matters enormously. We discovered this when we benchmarked PostgreSQL across different operating systems and found significant performance variations.

Our initial deployment was on Ubuntu 16.04 with default kernel parameters. Performance was acceptable but not great. Then we started tuning the OS, and query throughput increased by 40%.

Kernel parameters that matter:

# /etc/sysctl.conf

# Shared memory - PostgreSQL uses this heavily
kernel.shmmax = 68719476736  # 64GB
kernel.shmall = 4294967296   # 16GB in pages

# Network tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr

# File system
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.swappiness = 1

Apply with:

sudo sysctl -p

File system choice matters: We tested ext4, XFS, and ZFS. Results on our workload (70% reads, 30% writes):

Benchmark: pgbench -c 50 -j 4 -T 300

ext4 (default):
  TPS: 4,234
  Latency avg: 11.8ms
  Latency 95th: 28.4ms

XFS (tuned):
  TPS: 5,891
  Latency avg: 8.5ms
  Latency 95th: 19.7ms

ZFS (tuned):
  TPS: 4,567
  Latency avg: 10.9ms
  Latency 95th: 24.3ms

XFS won by 39% on our workload. We migrated to XFS with these mount options:

# /etc/fstab
/dev/nvme0n1 /var/lib/postgresql xfs noatime,nodiratime,logbufs=8,logbsize=256k 0 0
  • noatime,nodiratime: Don't update access times (reduces writes)
  • logbufs=8,logbsize=256k: Larger transaction log buffers

I/O scheduler matters: We switched from mq-deadline to none for NVMe drives:

echo none > /sys/block/nvme0n1/queue/scheduler

This gave us another 12% throughput improvement because NVMe drives have their own internal scheduling and don't benefit from kernel-level scheduling.

Huge pages: PostgreSQL benefits from huge pages (2MB pages instead of 4KB). We enabled them:

# /etc/sysctl.conf
vm.nr_hugepages = 8192  # 16GB of huge pages

# Calculate based on shared_buffers
# shared_buffers = 32GB -> need 16,384 huge pages (32GB / 2MB)

In postgresql.conf:

huge_pages = on

After enabling huge pages, TLB misses dropped by 68% and query latency improved by 8% on average.

The gotcha: Huge pages require careful sizing. If you set vm.nr_hugepages too low, Postgres won't start. If too high, you're wasting memory. Monitor with:

cat /proc/meminfo | grep Huge

Output:

HugePages_Total:    8192
HugePages_Free:     2341
HugePages_Rsvd:      156
HugePages_Surp:        0
Hugepagesize:       2048 kB

If HugePages_Free is consistently near zero, increase vm.nr_hugepages. If it's always high, you've allocated too many.

Work_mem and Maintenance_mem: The Memory Balancing Act

These two settings have an outsized impact on query performance, but they're tricky to tune correctly. We spent months finding the right balance.

work_mem controls how much memory each query operation (sort, hash join, etc.) can use before spilling to disk. The default is 4MB, which is way too low for production workloads. But set it too high, and you risk OOM kills.

We started by profiling our queries to see how much memory they actually needed:

-- Enable logging of temp file usage
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

This logs every time a query spills to disk. We ran production traffic for a day and analyzed the logs:

grep "temporary file" /var/log/postgresql/postgresql-*.log | \
  awk '{print $NF}' | \
  sort -n | \
  tail -20

Output showed most queries needed 16-64MB for sorts and hash joins. We set:

-- postgresql.conf
work_mem = 64MB

Query performance improved dramatically. Sorts that were spilling to disk now completed in memory:

Before (work_mem = 4MB):
  Sort Method: external merge  Disk: 234567KB
  Execution Time: 3456.789 ms

After (work_mem = 64MB):
  Sort Method: quicksort  Memory: 45678KB
  Execution Time: 234.567 ms

The gotcha: work_mem is per operation, not per query. A complex query with 3 sorts and 2 hash joins could use up to 5 * 64MB = 320MB. With 100 concurrent queries, that's 32GB of memory. We had to be careful.

We monitor memory usage with:

SELECT 
  pid,
  usename,
  application_name,
  client_addr,
  query_start,
  state,
  pg_size_pretty(pg_total_relation_size('pg_temp_' || pid)) AS temp_size,
  query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY pg_total_relation_size('pg_temp_' || pid) DESC;

This shows which queries are using temp space. If we see queries consistently using >64MB, we know work_mem needs to be higher for those specific queries.

For complex reporting queries, we increase work_mem temporarily:

SET work_mem = '256MB';
SELECT ... complex query ...;
RESET work_mem;

Maintenance_mem is similar but for maintenance operations (VACUUM, CREATE INDEX, ALTER TABLE). We set it much higher:

maintenance_work_mem = 2GB

This made our index builds and VACUUM operations much faster:

Before (maintenance_work_mem = 64MB):
  CREATE INDEX duration: 47 minutes
  VACUUM duration: 23 minutes

After (maintenance_work_mem = 2GB):
  CREATE INDEX duration: 12 minutes
  VACUUM duration: 8 minutes

Pro tip: Use pg_stat_statements to find queries that need more work_mem:

SELECT 
  query,
  calls,
  mean_time,
  temp_blks_written,
  temp_blks_written * 8192 / 1024 / 1024 AS temp_mb
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 20;

Queries with high temp_mb are spilling to disk and would benefit from higher work_mem.

Lessons from Production: What Really Matters

After five years of running PostgreSQL at scale, here's what I've learned matters most:

1. Know your workload. Every optimization decision depends on whether you're read-heavy, write-heavy, OLTP, or analytical. We're 70% reads, 30% writes, mostly OLTP with some reporting. Your optimal settings will differ.

2. Measure everything. We use pg_stat_statements, pg_stat_user_tables, and custom monitoring queries to understand what's actually happening. Intuition is often wrong. Data tells the truth.

3. Start with the biggest wins. We got 80% of our performance gains from 20% of optimizations: proper indexes, BRIN for time-series, partial indexes for filtered queries, and connection pooling. The rest was fine-tuning.

4. Don't over-optimize. We spent two weeks tuning random_page_cost from 4.0 to 3.8 and saw a 0.3% improvement. That time would've been better spent on application-level caching.

5. OS tuning matters. We got a 40% throughput increase just from kernel parameters, file system choice, and huge pages. Database tuning alone isn't enough.

6. Test on production-like data. Our staging environment has 1% of production data. Performance characteristics are completely different. We now have a "performance staging" environment with production-scale data.

7. Document everything. Six months from now, you won't remember why you set work_mem = 64MB. We keep a TUNING.md file in our infrastructure repo with every setting change and the rationale.

8. Upgrade regularly. PostgreSQL 15 is significantly faster than 11 for our workload, especially for parallel queries and index builds. We upgrade major versions every 18-24 months.

Our current production setup handles 8 million requests per day with 95th percentile API response times under 250ms. Database queries are 85% faster than they were a year ago, and we've reduced infrastructure costs by $12,000/month by optimizing instead of scaling vertically.

The hidden features I've shared here—BRIN indexes, partial indexes, statistics targets, parallel queries, fill factors, covering indexes, expression indexes—aren't magic bullets. They're tools that work when applied to the right problems. Profile your workload, identify bottlenecks, and apply the appropriate optimization. That's how you build a fast, scalable PostgreSQL system.

Maya Chen

Maya Chen

Author

Writes about machine learning workflows, LLM applications, and the gap between research papers and production systems. Contributing author at NextGenBeing.

Never Miss an Article

Get our best content delivered to your inbox weekly. No spam, unsubscribe anytime.

Comments (0)

Please log in to leave a comment.

Log In

Related Articles

Don't miss the next deep dive

Get one well-researched tutorial in your inbox each week. No spam, unsubscribe anytime.