Database Performance: Indexing & Caching Strategies That Scale - NextGenBeing

Optimizing Database Performance with Indexing and Caching: What We Learned Scaling to 100M Queries/Day

How we reduced query times from 4.2s to 180ms and cut infrastructure costs by $85k/month using strategic indexing and multi-layer caching at scale.

Comprehensive Tutorials · Premium Content · 14 min read
Admin

Apr 18, 2026


Last March, our platform hit a wall at 47 million queries per day. Our PostgreSQL database was melting down. Average response times had crept from 800ms to 4.2 seconds. Users were complaining. Our infrastructure costs had ballooned to $142k/month just for database instances. My CTO Sarah pulled me into a conference room and said, "We need to fix this in two weeks, or we're looking at a complete architecture rewrite."

I'd been at the company for three years, and I thought I knew our database pretty well. Turns out, I knew almost nothing about how it actually performed under real load. What followed was an intense two-week sprint that taught me more about database optimization than my previous five years of development combined.

Here's what we learned about indexing and caching when your back is against the wall and every millisecond counts.

The Performance Crisis Nobody Saw Coming

We'd grown from 10,000 to 500,000 active users in eight months. Our application was a SaaS analytics platform that processed customer data in real-time. Every user action triggered multiple database queries—sometimes 15-20 queries per page load.

The warning signs were there, but we'd ignored them. Our monitoring showed query times slowly climbing, but we kept saying "we'll optimize later." That's the mistake everyone makes. Later becomes never until it becomes a crisis.

When we finally dug into the data, the numbers were brutal:

  • 47 million queries per day
  • Average query time: 4.2 seconds
  • 95th percentile: 12.8 seconds
  • Peak hour queries timing out completely
  • Database CPU consistently above 85%
  • Connection pool exhausted during traffic spikes

Our PostgreSQL instance was a db.r5.4xlarge on AWS (16 vCPUs, 128GB RAM), and we were already discussing upgrading to a db.r5.8xlarge at $6,800/month. That's when I realized we were trying to solve a software problem with hardware.

What the Monitoring Data Actually Revealed

I spent the first two days just understanding what was happening. We used pgBadger to analyze our PostgreSQL logs, and the results were eye-opening.
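If parsing logs isn't an option, the pg_stat_statements extension (assuming it's installed and loaded via shared_preload_libraries) surfaces the same ranking directly in SQL. A minimal sketch:

```sql
-- Rank queries by total time consumed since the last statistics reset.
-- Column names are for PostgreSQL 13+; older versions use total_time.
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by total time rather than mean time is deliberate: a 50ms query called a million times a day can cost more than a 5-second query called twice.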

The top 10 slowest queries accounted for 73% of our total database time. Let me show you the actual query that was killing us:

SELECT 
    u.id, u.email, u.name, u.created_at,
    p.plan_name, p.price,
    COUNT(e.id) as event_count,
    MAX(e.created_at) as last_event
FROM users u
LEFT JOIN subscriptions s ON u.id = s.user_id
LEFT JOIN plans p ON s.plan_id = p.id
LEFT JOIN events e ON u.id = e.user_id
WHERE u.company_id = $1
    AND u.status = 'active'
    AND e.created_at >= NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email, u.name, u.created_at, p.plan_name, p.price
ORDER BY event_count DESC
LIMIT 50;

This query ran on our dashboard page. Every single time someone loaded their dashboard, this query executed. With 500k active users, that's a lot of dashboard loads.

Running EXPLAIN ANALYZE showed the horror:

Seq Scan on events e  (cost=0.00..892347.23 rows=1847293 width=16)
  Filter: (created_at >= (now() - '30 days'::interval))
  Rows Removed by Filter: 38472934

Planning Time: 2.347 ms
Execution Time: 4234.891 ms

We were doing a sequential scan on a table with 40 million rows. Every. Single. Time.

The Indexing Strategy That Changed Everything

I'll be honest—I thought I understood indexes. I knew you put them on columns you query frequently. What I didn't understand was the nuance of composite indexes, index ordering, partial indexes, and how PostgreSQL actually uses them.

My colleague Jake, who'd worked at a high-frequency trading firm, sat down with me and explained what I was missing.

Composite Indexes: Order Matters More Than You Think

The first thing Jake pointed out was our index on the events table:

CREATE INDEX idx_events_user_id ON events(user_id);
CREATE INDEX idx_events_created_at ON events(created_at);

"You've got two separate indexes," he said. "PostgreSQL might use one, but it can't efficiently use both together for this query."

We needed a composite index, but the order mattered. Here's what we created:

CREATE INDEX idx_events_user_created 
ON events(user_id, created_at DESC) 
-- NOW() isn't allowed in an index predicate (it must be IMMUTABLE),
-- so pin a fixed cutoff and recreate the index periodically
WHERE created_at >= '2026-01-18';

Notice three things:

  1. Column order: user_id first, then created_at. We filter by user_id and then need to find recent events for that user.

  2. DESC ordering: storing created_at in descending order lets PostgreSQL walk each user's events newest-first, which speeds up MAX(e.created_at) and any recent-events scan without an extra sort step.

  3. Partial index: We only care about events from the last 90 days. Why index historical data we never query?

This one index reduced our query time from 4.2 seconds to 380ms. A 91% improvement from a single CREATE INDEX statement.
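A quick way to confirm the planner actually picks up a new index (a sanity check we'd suggest, using the table and index names above):

```sql
-- After CREATE INDEX, the plan should show an Index Scan on
-- idx_events_user_created rather than a Seq Scan on events.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM events
WHERE user_id = 42
  AND created_at >= '2026-03-19';  -- "last 30 days", computed by the app
```

Note that with a partial index, the planner must prove the query's range falls inside the index predicate, which is easiest when the application supplies a concrete timestamp rather than an expression like NOW() - INTERVAL '30 days'.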

But we weren't done.

The Covering Index Trick Nobody Talks About

The query was still doing a lot of work. After using the index to find the right rows, PostgreSQL had to go back to the table to fetch the actual data (a "heap fetch"). With millions of rows, that's expensive.

Jake introduced me to covering indexes—indexes that include all the columns you need, so PostgreSQL never has to touch the main table:

CREATE INDEX idx_events_user_created_covering 
ON events(user_id, created_at DESC) 
INCLUDE (id)
-- index predicates must be IMMUTABLE, so NOW() can't appear here;
-- pin a fixed cutoff and rebuild the index on a schedule
WHERE created_at >= '2026-01-18';

The INCLUDE clause adds columns to the index without making them part of the index key. This lets PostgreSQL satisfy the events side of the query from the index alone, via an index-only scan, without ever touching the heap.

Query time dropped to 180ms. We'd gone from 4.2 seconds to 180ms with strategic indexing.
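Covering indexes only pay off when PostgreSQL can do a true index-only scan, which depends on the visibility map being current. A sketch of how we'd verify that (an assumed workflow, not from the original incident):

```sql
-- Keep the visibility map fresh; stale heap pages force "heap fetches"
-- even when the plan says Index Only Scan.
VACUUM ANALYZE events;

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at
FROM events
WHERE user_id = 42
  AND created_at >= '2026-03-19';
-- Look for "Index Only Scan using idx_events_user_created_covering"
-- and a low "Heap Fetches:" count in the output.
```

If Heap Fetches stays high, the index is covering in name only: autovacuum isn't keeping up, and each row still costs a trip to the table.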

The Index Maintenance Problem We Discovered Later

Three weeks after deploying our new indexes, we noticed something weird. Our write performance had degraded. INSERT and UPDATE operations on the events table were taking 40% longer.

This is the trade-off nobody mentions in the tutorials. Every index you add makes writes slower because PostgreSQL has to update the index on every INSERT, UPDATE, or DELETE.
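The flip side is worth auditing on a schedule. pg_stat_user_indexes tracks how often each index is actually read, so indexes that tax every write but never serve a query are easy to spot; a sketch:

```sql
-- Indexes with zero scans since the last statistics reset are
-- removal candidates (make sure stats have had time to accumulate
-- over a representative traffic window first).
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Dropping even one large unused index recovers both write throughput and disk, so this query belongs in any post-indexing-sprint checklist.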
