Last October, our API started throwing 503s at 2am. We'd hit 12 million requests per day, and our PostgreSQL read replicas were maxed out at 95% CPU. My manager Sarah called me - "We need caching, and we need it yesterday." I'd used Redis casually before, mostly for session storage. But this was different. We needed something that could handle tens of millions of operations per day, with sub-millisecond latency, without breaking our AWS budget.
I spent the next three weeks deep in the trenches with both Redis and Memcached. We A/B tested them in production, ran benchmarks until 3am, and discovered things that aren't in any documentation. This isn't a theoretical comparison - this is what actually happened when we scaled from 12M to 50M requests per day.
Why We Couldn't Just "Add More Database Servers"
Before I dive into Redis vs Memcached, let me explain why we even needed caching. You might think, "Just scale your database horizontally." We tried that first. Here's what we learned the hard way.
Our application serves product catalog data for an e-commerce platform. We had a primary PostgreSQL instance and three read replicas. The problem wasn't write throughput - it was reads. Specifically, these queries:
SELECT p.*, c.name as category_name, b.name as brand_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN brands b ON p.brand_id = b.id
WHERE p.status = 'active'
AND p.stock_quantity > 0
ORDER BY p.created_at DESC
LIMIT 50;
Even with proper indexes, this query took 45-80ms during peak hours. We were running it thousands of times per minute. At 12M requests per day, that's roughly 8,300 requests per minute on average - and noticeably more at peak. Even distributed across three read replicas, each replica was handling 2,700+ queries per minute just for this endpoint.
We added a fourth read replica. Cost went up by $850/month (db.r5.2xlarge). Performance improved by maybe 15%. Not enough. The real issue was that we were querying the database for data that rarely changed. Product information updates maybe a few times per hour, but we were hitting the database thousands of times per minute.
That's when I realized we needed a proper caching layer. The question was: Redis or Memcached?
The Architecture Decision That Kept Me Up at Night
I started researching both options. The internet is full of comparisons, but they're mostly theoretical. "Redis supports more data structures." "Memcached is simpler and faster." Great, but what does that mean when you're serving real traffic?
My colleague Jake had used Memcached at his previous company (a social media analytics platform). He swore by its simplicity and raw speed. "It's just a hash table," he said. "That's all you need for caching."
But I'd heard Redis was more versatile. Our CTO had mentioned we might need pub/sub for real-time features eventually. Redis supports that natively. It also has persistence options, which sounded appealing.
I decided to test both in a staging environment first, then run a controlled A/B test in production. Here's what I set up:
Test Environment:
- AWS ElastiCache for both Redis and Memcached
- Redis: cache.r6g.xlarge (4 vCPUs, 12.93 GB RAM)
- Memcached: cache.r6g.xlarge (4 vCPUs, 12.93 GB RAM)
- Same instance size for fair comparison
- Cost: $0.282/hour each (~$203/month)
Application Stack:
- Python 3.11 with Flask
- Flask-Caching library (supports both backends)
- Gunicorn with 8 workers
- NGINX as reverse proxy
Test Methodology: I used Apache Bench (ab) and Locust for load testing. I wanted to simulate real-world traffic patterns, not just synthetic benchmarks.
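The Locust scenario looked roughly like this - a simplified sketch, with illustrative endpoint paths and task weights rather than our exact test plan:

# locustfile.py - simplified sketch of the traffic simulation
# (paths and weights are illustrative, not our exact test plan)
import random
from locust import HttpUser, task, between

class CatalogUser(HttpUser):
    wait_time = between(0.05, 0.5)  # think time between requests

    @task(10)  # product detail requests dominate our real traffic
    def product_detail(self):
        self.client.get(f"/api/products/{random.randint(1, 100000)}")

    @task(3)
    def category_listing(self):
        self.client.get(f"/api/categories/{random.randint(1, 50)}/products")

Run with something like locust -f locustfile.py --headless -u 100 -r 10 --host https://staging.example.com to ramp up to 100 simulated users (the host is a placeholder).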
Round 1: Simple Key-Value Operations (Where Memcached Shines)
Let me start with the basics - simple GET/SET operations. This is where Memcached is supposed to excel. It's designed for exactly this use case.
Test Setup:
# Redis implementation
from flask_caching import Cache

redis_cache = Cache(config={
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_HOST': 'my-redis-cluster.cache.amazonaws.com',
    'CACHE_REDIS_PORT': 6379,
    'CACHE_REDIS_DB': 0,
    'CACHE_DEFAULT_TIMEOUT': 3600,
    'CACHE_KEY_PREFIX': 'prod_'
})

# Memcached implementation
memcached_cache = Cache(config={
    'CACHE_TYPE': 'memcached',
    'CACHE_MEMCACHED_SERVERS': ['my-memcached-cluster.cache.amazonaws.com:11211'],
    'CACHE_DEFAULT_TIMEOUT': 3600,
    'CACHE_KEY_PREFIX': 'prod_'
})

# The endpoint below reads from whichever backend the test arm uses
cache = redis_cache  # swapped to memcached_cache for the Memcached arm
@app.route('/api/products/<int:product_id>')
def get_product(product_id):
    cache_key = f'product_{product_id}'

    # Try cache first
    cached_data = cache.get(cache_key)
    if cached_data:
        return jsonify(cached_data), 200

    # Cache miss - query database
    product = db.session.query(Product).filter_by(id=product_id).first()
    if not product:
        return jsonify({'error': 'Not found'}), 404

    product_data = {
        'id': product.id,
        'name': product.name,
        'price': float(product.price),
        'stock': product.stock_quantity,
        'category': product.category.name,
        'brand': product.brand.name
    }

    # Store in cache
    cache.set(cache_key, product_data, timeout=3600)
    return jsonify(product_data), 200
I ran 100,000 requests with 100 concurrent connections:
ab -n 100000 -c 100 https://api.example.com/api/products/12345
Memcached Results:
Requests per second: 18,423.67 [#/sec] (mean)
Time per request: 5.428 [ms] (mean)
Time per request: 0.054 [ms] (mean, across all concurrent requests)
Transfer rate: 12,847.23 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 3
Processing: 1 5 2.1 5 28
Waiting: 1 5 2.1 5 28
Total: 1 5 2.1 5 28
Percentage of requests served within a certain time (ms)
50% 5
66% 6
75% 6
80% 7
90% 8
95% 9
98% 11
99% 13
100% 28 (longest request)
Redis Results:
Requests per second: 16,891.23 [#/sec] (mean)
Time per request: 5.920 [ms] (mean)
Time per request: 0.059 [ms] (mean, across all concurrent requests)
Transfer rate: 11,778.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 4
Processing: 1 6 2.4 5 32
Waiting: 1 6 2.4 5 32
Total: 1 6 2.4 6 32
Percentage of requests served within a certain time (ms)
50% 6
66% 7
75% 7
80% 8
90% 9
95% 11
98% 13
99% 15
100% 32 (longest request)
Memcached was about 9% faster for simple GET operations. Not a huge difference, but noticeable at scale. With 50M requests per day, that ~0.5ms per request adds up to roughly 25 million milliseconds (about seven hours) of cumulative processing time per day.
But here's what surprised me: the difference was most pronounced during peak load. When I pushed to 200 concurrent connections, Memcached maintained its performance better:
At 200 concurrent connections:
- Memcached: 17,234 req/sec
- Redis: 14,567 req/sec
Memcached's simpler architecture (no persistence layer, no complex data structures) meant less overhead. For pure key-value operations, Jake was right - it's faster.
Round 2: Complex Data Structures (Where Redis Dominates)
But then I tested something more realistic. Our product catalog doesn't just cache individual products. We cache:
- Product lists by category
- Search results
- User shopping carts
- Product recommendations
- Real-time inventory counts
This is where things got interesting. Let me show you a real scenario we encountered.
Use Case: Shopping Cart Management
With Memcached, a shopping cart is just a serialized object:
# Memcached approach - serialize everything
import pickle

def add_to_cart_memcached(user_id, product_id, quantity):
    cart_key = f'cart_{user_id}'

    # Get entire cart
    cart = memcached_cache.get(cart_key)
    if not cart:
        cart = {}
    else:
        cart = pickle.loads(cart)

    # Modify cart
    if product_id in cart:
        cart[product_id] += quantity
    else:
        cart[product_id] = quantity

    # Save entire cart back
    memcached_cache.set(cart_key, pickle.dumps(cart))
    return cart

def get_cart_item_count_memcached(user_id):
    cart_key = f'cart_{user_id}'
    cart = memcached_cache.get(cart_key)
    if not cart:
        return 0
    cart = pickle.loads(cart)
    return sum(cart.values())
Every operation requires this full fetch-modify-store cycle (which also creates a race condition - see the sketch after this list):
- Fetching the entire cart
- Deserializing it
- Modifying it
- Serializing it
- Storing it back
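And because that cycle isn't atomic, two concurrent requests can read the same cart and the later write silently clobbers the earlier one. Memcached's answer is CAS (check-and-set). Here's a minimal sketch using pymemcache (the client we eventually settled on); the retry count is an arbitrary choice:

# Sketch: avoiding lost updates with Memcached CAS.
# default_noreply=False so set/add/cas return real results.
import pickle
from pymemcache.client.base import Client

mc = Client('my-memcached-cluster.cache.amazonaws.com:11211',
            default_noreply=False)

def add_to_cart_cas(user_id, product_id, quantity, retries=5):
    cart_key = f'cart_{user_id}'
    for _ in range(retries):
        value, cas_token = mc.gets(cart_key)
        cart = pickle.loads(value) if value else {}
        cart[product_id] = cart.get(product_id, 0) + quantity
        if cas_token is None:
            # Key doesn't exist yet; add() succeeds only if still absent
            if mc.add(cart_key, pickle.dumps(cart)):
                return cart
        elif mc.cas(cart_key, pickle.dumps(cart), cas_token):
            return cart
        # Another request changed the cart in between - retry
    raise RuntimeError(f'cart update contention for {user_id}')

It works, but every retry still ships the whole cart over the wire - exactly the overhead Redis's hash operations avoid.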
With Redis, I can use native hash operations:
# Redis approach - atomic operations
import redis

redis_client = redis.Redis(
    host='my-redis-cluster.cache.amazonaws.com',
    port=6379,
    decode_responses=True
)

def add_to_cart_redis(user_id, product_id, quantity):
    cart_key = f'cart_{user_id}'

    # Atomic increment
    redis_client.hincrby(cart_key, product_id, quantity)

    # Set expiration
    redis_client.expire(cart_key, 86400)  # 24 hours

    # Return updated cart
    return redis_client.hgetall(cart_key)

def get_cart_item_count_redis(user_id):
    cart_key = f'cart_{user_id}'
    cart = redis_client.hgetall(cart_key)
    return sum(int(qty) for qty in cart.values())
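One refinement worth noting: hincrby, expire, and hgetall in add_to_cart_redis are three separate round trips. A pipeline batches them into one, and since redis-py pipelines are transactional by default, the group also executes atomically:

# Variant: batch the three commands into a single round trip
def add_to_cart_redis_pipelined(user_id, product_id, quantity):
    cart_key = f'cart_{user_id}'
    pipe = redis_client.pipeline()  # transaction=True by default
    pipe.hincrby(cart_key, product_id, quantity)
    pipe.expire(cart_key, 86400)
    pipe.hgetall(cart_key)
    _, _, cart = pipe.execute()
    return cart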
I benchmarked 10,000 cart operations (mix of adds, updates, and reads):
Memcached:
- Total time: 8.3 seconds
- Average per operation: 0.83ms
- Peak memory usage: 245 MB (due to serialization overhead)
Redis:
- Total time: 3.1 seconds
- Average per operation: 0.31ms
- Peak memory usage: 89 MB
Redis was 2.7x faster for cart operations. Why? No serialization overhead. Redis understands hashes natively. Operations are atomic. No need to fetch-modify-store.
But the real win came when I tested concurrent cart modifications. Imagine 100 users all adding items to their carts simultaneously:
# Stress test with concurrent modifications
import concurrent.futures
import time

def stress_test_cart_operations(cache_impl, num_users=100, operations_per_user=50):
    # add_to_cart / get_cart_item_count dispatch to the *_memcached
    # or *_redis implementations above based on cache_impl
    start = time.time()

    def user_session(user_id):
        for i in range(operations_per_user):
            product_id = f"prod_{i % 20}"
            add_to_cart(cache_impl, user_id, product_id, 1)
            if i % 10 == 0:
                get_cart_item_count(cache_impl, user_id)

    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
        futures = [executor.submit(user_session, f"user_{i}") for i in range(num_users)]
        concurrent.futures.wait(futures)

    elapsed = time.time() - start
    total_ops = num_users * operations_per_user
    return elapsed, total_ops / elapsed

# Results
memcached_time, memcached_ops = stress_test_cart_operations('memcached')
redis_time, redis_ops = stress_test_cart_operations('redis')
print(f"Memcached: {memcached_time:.2f}s, {memcached_ops:.0f} ops/sec")
print(f"Redis: {redis_time:.2f}s, {redis_ops:.0f} ops/sec")
Output:
Memcached: 24.67s, 202 ops/sec
Redis: 7.89s, 633 ops/sec
Redis handled concurrent modifications 3.1x better than Memcached. This is crucial for high-traffic applications where multiple operations happen simultaneously.
The Persistence Question That Changed Everything
Three weeks into our testing, something happened that made me really glad we'd chosen Redis for one of our use cases.
At 4:30am on a Tuesday, our Memcached cluster had a node failure. AWS ElastiCache automatically replaced it, but here's what happened: we lost all cached data on that node. Our cache hit rate dropped from 89% to 31% instantly. Database load spiked to 98% CPU. We had to throttle traffic for 20 minutes while the cache warmed up.
This is Memcached's design - it's a pure memory cache. When a node dies, data is gone. There's no persistence, no replication, no recovery.
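Our mitigation since then is a warm-up script we run against any freshly replaced node before it takes full traffic. A simplified sketch, where top_product_ids() is a hypothetical helper (ours pulls the most-viewed product IDs from an analytics table):

# Cache warm-up after a node replacement (simplified sketch).
# top_product_ids() is a hypothetical helper - ours reads the
# most-viewed product IDs from an analytics table.
def warm_cache(limit=50000):
    hot_ids = top_product_ids(limit)
    for product in db.session.query(Product).filter(Product.id.in_(hot_ids)):
        cache.set(f'product_{product.id}', {
            'id': product.id,
            'name': product.name,
            'price': float(product.price),
            'stock': product.stock_quantity,
        }, timeout=3600)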
Redis offers persistence options:
RDB (Redis Database Backup):
# redis.conf
save 900 1 # Save after 900 seconds if at least 1 key changed
save 300 10 # Save after 300 seconds if at least 10 keys changed
save 60 10000 # Save after 60 seconds if at least 10000 keys changed
dbfilename dump.rdb
dir /var/lib/redis
AOF (Append Only File):
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec # fsync every second (good balance)
I configured Redis with AOF persistence using appendfsync everysec. This means Redis writes every command to disk, but only fsyncs once per second. It's a good balance between durability and performance.
The performance impact? About 8-12% throughput reduction with AOF enabled. Here's what I measured:
Redis without persistence:
- 16,891 req/sec
- 0.059ms average latency
Redis with AOF (appendfsync everysec):
- 14,823 req/sec
- 0.067ms average latency
Is the performance hit worth it? Depends on your use case. For session data and shopping carts, absolutely. Losing a user's cart because a cache node died is a terrible user experience. For product catalog data that can be quickly regenerated from the database, maybe not.
With Memcached, you don't have this choice. It's always volatile.
Memory Efficiency: The Surprise Winner
I expected Redis to use more memory because of its richer data structures and persistence. I was wrong.
I loaded 1 million product records into both caches and measured memory usage:
Test Data:
# Each product record
{
'id': 12345,
'name': 'Sample Product Name',
'price': 29.99,
'stock': 100,
'category': 'Electronics',
'brand': 'BrandName',
'description': 'A 200-character product description...',
'attributes': {
'color': 'blue',
'size': 'medium',
'weight': '1.5kg'
}
}
Memcached Memory Usage:
# Check Memcached stats
echo "stats" | nc my-memcached-cluster.cache.amazonaws.com 11211
STAT bytes 3847293184
STAT limit_maxbytes 13421772800
Total: 3.58 GB for 1M records
Redis Memory Usage:
redis-cli -h my-redis-cluster.cache.amazonaws.com info memory
used_memory:2938472448
used_memory_human:2.74G
used_memory_peak:2938472448
used_memory_peak_human:2.74G
Total: 2.74 GB for 1M records
Redis used 23% less memory than Memcached for the same data. How?
Redis has better compression and more efficient storage for certain data types. When I stored the product attributes as a Redis hash instead of a serialized JSON string, memory usage dropped even further:
# Instead of storing as JSON string
cache.set(f'product_{id}', json.dumps(product_data))
# Store as Redis hash
redis_client.hset(f'product_{id}', mapping={
'name': product_data['name'],
'price': str(product_data['price']),
'stock': str(product_data['stock']),
# ... etc
})
With this approach, Redis memory usage dropped to 2.31 GB - a 35% reduction compared to Memcached.
But here's a gotcha: Redis memory usage can grow unpredictably if you're not careful with persistence. The AOF file can grow large over time. You need to configure AOF rewriting:
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
This tells Redis to rewrite the AOF file when it grows 100% larger than the last rewrite and is at least 64MB. Without this, we saw AOF files grow to 15GB+ over a week, even though actual data was only 3GB.
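We now also watch AOF size relative to the live dataset, since Redis exposes both through INFO. A minimal sketch of the check - the 4x threshold is our arbitrary choice, and alert() is a hypothetical stand-in for a paging hook:

# Sketch: flag AOF bloat. The 4x threshold is arbitrary;
# alert() is a hypothetical stand-in for our paging hook.
def check_aof_bloat(client):
    used = client.info('memory')['used_memory']
    persistence = client.info('persistence')
    if persistence.get('aof_enabled'):
        aof_size = persistence['aof_current_size']
        if aof_size > 4 * used:
            alert(f'AOF is {aof_size / used:.1f}x the dataset size')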
The Eviction Policy Nightmare
Both Redis and Memcached use LRU (Least Recently Used) eviction when memory is full, but they behave differently. This caused us a production incident I'll never forget.
We were caching API responses with a 1-hour TTL. Under normal load, everything worked fine. Then we had a traffic spike - a product went viral on social media. Traffic increased 8x in 30 minutes.
What happened with Memcached:
Memcached started evicting keys to make room for new data. But here's the problem: its LRU runs within each slab class, and it doesn't take TTLs into account. So it was evicting keys that were only 5 minutes old (with 55 minutes left on their TTL) to make room for new keys.
Our cache hit rate dropped from 87% to 43%. Database load spiked. We had to emergency-scale our database.
What happened with Redis:
Redis has multiple eviction policies. We were using volatile-lru, which only considers keys that have an expiration set and evicts the least recently used among them.
maxmemory 10gb
maxmemory-policy volatile-lru
During the same traffic spike, Redis maintained a 76% cache hit rate. It was smarter about what to evict.
Here are Redis's eviction policies:
- noeviction: Return errors when the memory limit is reached
- allkeys-lru: Evict any key using LRU
- volatile-lru: Evict keys with an expiration set, using LRU
- allkeys-random: Evict random keys
- volatile-random: Evict random keys with an expiration set
- volatile-ttl: Evict keys with an expiration set, preferring shorter TTLs
For our use case, volatile-lru was perfect. All our cached data had TTLs, and we wanted to keep frequently accessed data even during memory pressure.
With Memcached, you only get LRU across everything. No nuance.
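On self-managed Redis you can even switch policies at runtime with CONFIG SET (note that ElastiCache restricts the CONFIG command, so there you change the parameter group instead):

# Runtime policy switch on self-managed Redis.
# (ElastiCache blocks CONFIG; use a parameter group there.)
redis_client.config_set('maxmemory-policy', 'volatile-lru')
print(redis_client.config_get('maxmemory-policy'))
# {'maxmemory-policy': 'volatile-lru'}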
Real-World Production Architecture
After three months of testing, here's what we actually deployed:
Redis for:
- User sessions (need persistence, need expiration)
- Shopping carts (need atomic operations, need persistence)
- Real-time data (pub/sub, sorted sets for leaderboards)
- Rate limiting (atomic increments, TTLs)
Memcached for:
- Product catalog cache (pure speed, can regenerate quickly)
- API response cache (simple key-value, high throughput)
- Database query cache (temporary, high volume)
We're running:
- 2x Redis clusters (cache.r6g.xlarge) - $406/month
- 1x Memcached cluster (cache.r6g.2xlarge) - $407/month
- Total: ~$813/month
Redis Cluster Configuration:
# Production Redis config
maxmemory 10gb
maxmemory-policy volatile-lru
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# Replication
replicaof no one # This is the master
replica-read-only yes
min-replicas-to-write 1
min-replicas-max-lag 10
# Persistence
save 900 1
save 300 10
save 60 10000
Memcached Configuration:
# ElastiCache parameter group
maxmemory: 20gb
chunk_size: 48
max_item_size: 1mb
The Client Library Minefield
One thing that bit us hard: not all Redis/Memcached clients are created equal. We started with pylibmc for Memcached and redis-py for Redis. Both are popular, but we ran into issues.
Memcached Client Problems:
pylibmc is fast but has quirks:
import pylibmc

# Connection pooling is critical
mc = pylibmc.Client(
    ['my-memcached-cluster.cache.amazonaws.com:11211'],
    binary=True,
    behaviors={
        'tcp_nodelay': True,
        'ketama': True,  # Consistent hashing
        'no_block': True,
        'num_threads': 4
    }
)

# But it doesn't handle connection failures gracefully
try:
    mc.set('key', 'value')
except pylibmc.Error as e:
    # Connection died, but pylibmc doesn't auto-reconnect -
    # you need to recreate the client
    mc = pylibmc.Client([...])
We had to implement our own retry logic and connection pooling. After two weeks of debugging intermittent failures, we switched to pymemcache:
from pymemcache.client.base import PooledClient
from pymemcache import serde
# Much better connection handling
mc = PooledClient(
'my-memcached-cluster.cache.amazonaws.com',
max_pool_size=32,
connect_timeout=2.0,
timeout=1.0,
no_delay=True,
serde=serde.pickle_serde
)
pymemcache handles connection failures gracefully, has better pooling, and is actively maintained.
Redis Client Lessons:
redis-py is solid, but you need to configure connection pooling properly:
import redis
# Bad: creates new connection for every operation
r = redis.Redis(host='my-redis-cluster.cache.amazonaws.com', port=6379)
# Good: uses connection pool
pool = redis.ConnectionPool(
host='my-redis-cluster.cache.amazonaws.com',
port=6379,
max_connections=50,
socket_keepalive=True,
socket_connect_timeout=2,
socket_timeout=2,
retry_on_timeout=True,
health_check_interval=30
)
r = redis.Redis(connection_pool=pool)
We also discovered a gotcha with redis-py's decode_responses flag. We'd enabled it for convenience (strings instead of bytes), but it decodes every response as UTF-8, which blows up on binary data:
# With decode_responses=True, every response gets UTF-8 decoded
r = redis.Redis(host='...', decode_responses=True)
r.set('key', b'\x00\x01\x02')  # Binary data
value = r.get('key')  # Raises UnicodeDecodeError!

# Fix: leave decode_responses off (the default) for binary data
r = redis.Redis(host='...', decode_responses=False)
r.set('key', b'\x00\x01\x02')
value = r.get('key')  # Returns b'\x00\x01\x02'
Monitoring and Observability
You can't optimize what you don't measure. Here's what we monitor for both Redis and Memcached.
Redis Monitoring:
We use CloudWatch for AWS ElastiCache, but also export metrics to Prometheus:
from prometheus_client import Counter, Histogram, Gauge
import time

# Metrics
redis_ops = Counter('redis_operations_total', 'Total Redis operations', ['operation', 'status'])
redis_latency = Histogram('redis_operation_duration_seconds', 'Redis operation latency', ['operation'])
redis_connections = Gauge('redis_connections_active', 'Active Redis connections')

def monitored_redis_get(key):
    start = time.time()
    try:
        result = redis_client.get(key)
        redis_ops.labels(operation='get', status='success').inc()
        return result
    except Exception:
        redis_ops.labels(operation='get', status='error').inc()
        raise
    finally:
        redis_latency.labels(operation='get').observe(time.time() - start)
Key Redis Metrics:
# Get Redis stats
redis-cli -h my-redis-cluster.cache.amazonaws.com info
# Important metrics:
connected_clients:42
used_memory:2847293184
used_memory_peak:3012847392
instantaneous_ops_per_sec:8234
keyspace_hits:182374
keyspace_misses:23847
evicted_keys:0
expired_keys:14823
# Cache hit rate
hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses)
# 182374 / (182374 + 23847) = 88.4%
We alert on the following (the hit-rate check behind the first threshold is sketched after this list):
- Hit rate < 75%
- Evicted keys > 1000/hour
- Used memory > 90% of max
- Connected clients > 80% of max
- Replication lag > 5 seconds
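Here's roughly what that hit-rate check looks like in our metrics worker - a minimal sketch, where send_alert is a hypothetical stand-in for the actual paging integration:

# Minimal sketch of the hit-rate alert. send_alert is a
# hypothetical stand-in for our paging integration.
def check_redis_hit_rate(client, threshold=0.75):
    stats = client.info('stats')
    hits = stats['keyspace_hits']
    misses = stats['keyspace_misses']
    total = hits + misses
    if total > 0 and hits / total < threshold:
        send_alert(f'Redis cache hit rate dropped to {hits / total:.1%}')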
Memcached Monitoring:
Memcached's stats are simpler but still valuable:
import pymemcache
from pymemcache.client.base import PooledClient
mc = PooledClient('my-memcached-cluster.cache.amazonaws.com')
# Get stats
stats = mc.stats()
# Key metrics
print(f"Get hits: {stats[b'get_hits']}")
print(f"Get misses: {stats[b'get_misses']}")
print(f"Evictions: {stats[b'evictions']}")
print(f"Bytes used: {stats[b'bytes']}")
print(f"Current connections: {stats[b'curr_connections']}")
# Calculate hit rate
hits = int(stats[b'get_hits'])
misses = int(stats[b'get_misses'])
hit_rate = hits / (hits + misses) if (hits + misses) > 0 else 0
print(f"Hit rate: {hit_rate:.2%}")
Output:
Get hits: 8234782
Get misses: 892341
Evictions: 12847
Bytes used: 3847293184
Current connections: 156
Hit rate: 90.22%
Cache Invalidation: The Hard Problem
Phil Karlton said there are only two hard things in Computer Science: cache invalidation and naming things. He was right.
We tried several cache invalidation strategies. Here's what worked and what didn't.
Strategy 1: TTL-Based (Simple but Wasteful)
# Set everything with a TTL
cache.set('product_12345', product_data, timeout=3600) # 1 hour
# Problem: Data might update every 5 minutes, but we're serving
# stale data for up to an hour
This is the simplest approach, but it's wasteful. We were serving stale product prices for up to an hour. Not acceptable for e-commerce.
Strategy 2: Write-Through Cache (Complex but Accurate)
def update_product(product_id, new_data):
    # Update database
    product = db.session.query(Product).get(product_id)
    product.name = new_data['name']
    product.price = new_data['price']
    db.session.commit()

    # Immediately update cache
    cache_key = f'product_{product_id}'
    cache.set(cache_key, {
        'id': product.id,
        'name': product.name,
        'price': float(product.price),
        # ...
    }, timeout=3600)
This works but requires changing every database write. We had 47 different places in our codebase that updated products. Retrofitting this was a nightmare.
Strategy 3: Event-Based Invalidation (Our Solution)
We implemented a pub/sub system using Redis:
# Publisher (runs after database updates)
def publish_cache_invalidation(entity_type, entity_id, **extra):
    redis_client.publish(
        'cache_invalidation',
        json.dumps({
            'type': entity_type,
            'id': entity_id,
            'timestamp': time.time(),
            **extra
        })
    )

# After any product update
def update_product(product_id, new_data):
    product = db.session.query(Product).get(product_id)
    product.name = new_data['name']
    product.price = new_data['price']
    db.session.commit()

    # Publish invalidation event (include category_id so the
    # subscriber can invalidate related caches too)
    publish_cache_invalidation('product', product_id,
                               category_id=product.category_id)
# Subscriber (runs in background worker)
def cache_invalidation_worker():
    pubsub = redis_client.pubsub()
    pubsub.subscribe('cache_invalidation')

    for message in pubsub.listen():
        if message['type'] == 'message':
            data = json.loads(message['data'])
            if data['type'] == 'product':
                # Delete from both caches
                cache_key = f"product_{data['id']}"
                redis_client.delete(cache_key)
                memcached_client.delete(cache_key)

                # Also invalidate related caches, using the
                # category_id carried in the event payload
                category_key = f"category_products_{data['category_id']}"
                redis_client.delete(category_key)
                memcached_client.delete(category_key)
This approach works with both Redis and Memcached, but it requires Redis for the pub/sub channel. Memcached doesn't support pub/sub.
Strategy 4: Cache Tags (Redis Only)
For complex invalidation scenarios, we use Redis sets as tags:
def cache_product_with_tags(product):
    cache_key = f'product_{product.id}'

    # Store product data
    redis_client.hset(cache_key, mapping={
        'name': product.name,
        'price': str(product.price),
        'category_id': str(product.category_id)
    })
    redis_client.expire(cache_key, 3600)

    # Add to tag sets
    redis_client.sadd(f'tag:category_{product.category_id}', cache_key)
    redis_client.sadd(f'tag:brand_{product.brand_id}', cache_key)

def invalidate_by_category(category_id):
    tag_key = f'tag:category_{category_id}'

    # Get all keys with this tag
    keys = redis_client.smembers(tag_key)

    # Delete them all
    if keys:
        redis_client.delete(*keys)

    # Delete tag set
    redis_client.delete(tag_key)
This is powerful but Redis-specific. Memcached can't do this.
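One caveat: SMEMBERS followed by DELETE isn't atomic, so a key tagged between the read and the delete can slip through. A short Lua script closes that gap by running both steps server-side as one operation. A sketch (in Redis Cluster, the tag set and its members would also need a shared hash tag to land in the same slot):

# Atomic tag invalidation via a Lua script (a sketch)
INVALIDATE_TAG = """
local keys = redis.call('SMEMBERS', KEYS[1])
for _, k in ipairs(keys) do redis.call('DEL', k) end
redis.call('DEL', KEYS[1])
return #keys
"""
invalidate_tag = redis_client.register_script(INVALIDATE_TAG)

def invalidate_by_category_atomic(category_id):
    return invalidate_tag(keys=[f'tag:category_{category_id}'])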
The Cost Analysis Nobody Talks About
Let's talk money. Everyone focuses on performance, but what about cost?
Our Monthly AWS ElastiCache Costs:
Redis (2x cache.r6g.xlarge):
- Instance cost: 2 × $0.282/hour × 730 hours = $411.72
- Data transfer: ~$15/month
- Backup storage (snapshots): $8/month
- Total: ~$435/month
Memcached (1x cache.r6g.2xlarge):
- Instance cost: $0.564/hour × 730 hours = $411.72
- Data transfer: ~$12/month
- No backup costs (no persistence)
- Total: ~$424/month
Similar costs, but here's what we saved:
Before caching:
- 4x PostgreSQL read replicas (db.r5.2xlarge): 4 × $0.96/hour × 730 = $2,803/month
- RDS data transfer: ~$80/month
- Total: ~$2,883/month
After caching:
- 2x PostgreSQL read replicas (db.r5.xlarge): 2 × $0.48/hour × 730 = $700/month
- Redis: $435/month
- Memcached: $424/month
- RDS data transfer: ~$25/month
- Total: ~$1,584/month
Savings: $1,299/month (~45% reduction)
But the real savings came from avoiding database scaling. Without caching, we would have needed to scale to db.r5.4xlarge instances at 50M requests/day. That would have cost $3.84/hour per instance - $5,606/month for two instances.
With caching, our database costs actually went down as traffic increased.
Performance Tuning: The Deep Cuts
Here are the performance optimizations that made the biggest difference:
1. Connection Pooling (30% Improvement)
# Before: creating a new connection for every operation
def get_product_bad(product_id):
    r = redis.Redis(host='...')  # New connection every time!
    return r.get(f'product_{product_id}')

# After: using a connection pool
pool = redis.ConnectionPool(
    host='my-redis-cluster.cache.amazonaws.com',
    port=6379,
    max_connections=50,
    socket_keepalive=True
)
redis_client = redis.Redis(connection_pool=pool)

def get_product_good(product_id):
    return redis_client.get(f'product_{product_id}')
This single change reduced our P95 latency from 8.2ms to 5.7ms - a 30% improvement.
2. Pipelining (2.5x Throughput)
When fetching multiple keys, use pipelining:
# Bad: multiple round trips
def get_products_bad(product_ids):
    products = []
    for pid in product_ids:
        product = redis_client.get(f'product_{pid}')
        if product:
            products.append(json.loads(product))
    return products

# Good: single round trip
def get_products_good(product_ids):
    pipe = redis_client.pipeline()
    for pid in product_ids:
        pipe.get(f'product_{pid}')
    results = pipe.execute()

    products = []
    for result in results:
        if result:
            products.append(json.loads(result))
    return products
For 100 products:
- Bad approach: 100 network round trips, ~80ms total
- Good approach: 1 network round trip, ~3ms total
That's roughly 27x faster for the batch itself; measured end to end, it worked out to about 2.5x higher throughput on endpoints that fetch many keys.
3. Serialization Format (40% Memory Reduction)
We tested different serialization formats:
import json
import pickle
import msgpack
product_data = {
'id': 12345,
'name': 'Sample Product',
'price': 29.99,
# ... more fields
}
# JSON
json_size = len(json.dumps(product_data))
# 487 bytes
# Pickle
pickle_size = len(pickle.dumps(product_data))
# 412 bytes (15% smaller)
# MessagePack
msgpack_size = len(msgpack.packb(product_data))
# 291 bytes (40% smaller!)
We switched to MessagePack for large objects. Memory usage dropped by 38%, and we could cache 60% more data in the same memory.
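The swap itself was mechanical - roughly the pair of helpers below. One hedge: unlike pickle, msgpack only handles plain types (dicts, lists, strings, numbers, bytes), which our cached payloads already were, and it needs a client created with decode_responses=False:

# Sketch of the msgpack-backed helpers. Assumes redis_client was
# created with decode_responses=False (bytes in, bytes out).
import msgpack

def cache_set_msgpack(key, value, timeout=3600):
    redis_client.setex(key, timeout, msgpack.packb(value))

def cache_get_msgpack(key):
    raw = redis_client.get(key)
    return msgpack.unpackb(raw, raw=False) if raw is not None else None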
4. Key Naming Strategy (Faster Lookups)
We standardized our key naming:
# Bad: Inconsistent naming
cache.set(f'prod_{id}', ...)
cache.set(f'product-{id}', ...)
cache.set(f'{id}_product', ...)
# Good: Consistent, hierarchical naming
cache.set(f'product:{id}', ...)
cache.set(f'product:{id}:reviews', ...)
cache.set(f'category:{cat_id}:products', ...)
This made invalidation easier and debugging faster. We could also use Redis's SCAN command to find related keys:
# Find all product keys
cursor = 0
product_keys = []
while True:
    cursor, keys = redis_client.scan(cursor, match='product:*', count=100)
    product_keys.extend(keys)
    if cursor == 0:
        break
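redis-py also wraps that cursor loop in a generator, which is what we actually use:

# Same scan, using redis-py's generator wrapper
product_keys = list(redis_client.scan_iter(match='product:*', count=100))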
5. Compression for Large Values (70% Size Reduction)
For values > 1KB, we added compression:
import zlib
import json

def cache_set_compressed(key, value, timeout=3600):
    json_data = json.dumps(value)
    if len(json_data) > 1024:  # Only compress if > 1KB
        compressed = zlib.compress(json_data.encode('utf-8'))
        redis_client.setex(f'{key}:compressed', timeout, compressed)
        redis_client.delete(key)  # drop any stale uncompressed copy
    else:
        redis_client.setex(key, timeout, json_data)
        redis_client.delete(f'{key}:compressed')  # drop any stale compressed copy

def cache_get_compressed(key):
    # Try compressed first
    compressed = redis_client.get(f'{key}:compressed')
    if compressed:
        json_data = zlib.decompress(compressed).decode('utf-8')
        return json.loads(json_data)

    # Fall back to uncompressed
    json_data = redis_client.get(key)
    if json_data:
        return json.loads(json_data)
    return None
For product descriptions (average 2KB), this reduced memory usage by 68%.
The Gotchas That Cost Us Days
Here are the painful lessons we learned:
1. Redis Maxmemory Policy Confusion
We set maxmemory-policy allkeys-lru thinking it would work for all use cases. Wrong. Under allkeys-lru, every key is fair game for eviction - including our rate limit counters. Under memory pressure, Redis started evicting them mid-window!
# Rate limiting code
def check_rate_limit(user_id):
    key = f'rate_limit:{user_id}'
    count = redis_client.incr(key)
    if count == 1:
        redis_client.expire(key, 60)  # 1 minute window
    return count <= 100  # Max 100 requests per minute

# Problem: under allkeys-lru, a full Redis can evict rate limit keys!
# Users could bypass rate limits during high load.
Fix: We separated rate limiting into a dedicated Redis instance with maxmemory-policy noeviction. This prevents eviction and returns errors when full, which we handle gracefully:
def check_rate_limit_safe(user_id):
    key = f'rate_limit:{user_id}'
    try:
        count = redis_client.incr(key)
        if count == 1:
            redis_client.expire(key, 60)
        return count <= 100
    except redis.exceptions.ResponseError as e:
        if 'OOM' in str(e):
            # Redis is full - fail closed
            return False
        raise
2. Memcached Silent Failures
Memcached silently drops values larger than 1MB by default. We were caching search results that occasionally exceeded this limit. No error, no warning - just cache misses.
# This silently fails if data > 1MB
memcached_client.set('search_results', large_data)
# Returns None, even though set() returned True!
result = memcached_client.get('search_results')
Fix: We added size checks and split large values:
def memcached_set_safe(key, value, timeout=3600):
    serialized = pickle.dumps(value)
    if len(serialized) > 900_000:  # 900KB threshold
        # Split into chunks
        chunks = [serialized[i:i + 900_000] for i in range(0, len(serialized), 900_000)]
        for i, chunk in enumerate(chunks):
            memcached_client.set(f'{key}:chunk:{i}', chunk, timeout)
        # Store metadata
        memcached_client.set(f'{key}:chunks', len(chunks), timeout)
    else:
        memcached_client.set(key, serialized, timeout)
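The matching read path reassembles the chunks - a sketch. Note that chunks can be evicted independently, so one missing chunk has to be treated as a full cache miss, which is the main weakness of this workaround:

def memcached_get_safe(key):
    num_chunks = memcached_client.get(f'{key}:chunks')
    if num_chunks is not None:
        chunks = []
        for i in range(int(num_chunks)):
            chunk = memcached_client.get(f'{key}:chunk:{i}')
            if chunk is None:
                return None  # partial eviction - treat as a miss
            chunks.append(chunk)
        return pickle.loads(b''.join(chunks))
    serialized = memcached_client.get(key)
    return pickle.loads(serialized) if serialized is not None else None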
3. Redis Cluster Hash Tags
When we moved to Redis Cluster for horizontal scaling, our multi-key operations broke:
# This fails in Redis Cluster if keys are on different nodes
pipe = redis_client.pipeline()
pipe.get('user:123:cart')
pipe.get('user:123:wishlist')
pipe.execute() # CROSSSLOT error!
Fix: Use hash tags to ensure related keys are on the same node:
# Force both keys to same slot using {user:123}
pipe = redis_client.pipeline()
pipe.get('user:{user:123}:cart')
pipe.get('user:{user:123}:wishlist')
pipe.execute() # Works!
The part in curly braces determines which slot the key goes to. All keys with the same hash tag go to the same node.
4. Cache Stampede
When a popular cache key expires, hundreds of requests simultaneously try to regenerate it. We saw this with our homepage product feed:
def get_homepage_products():
    products = cache.get('homepage_products')
    if products:
        return products

    # Cache miss - query database
    # Problem: 100 concurrent requests all hit this code path
    products = db.session.query(Product).filter_by(featured=True).all()
    cache.set('homepage_products', products, timeout=300)
    return products
During a cache expiration, we saw 200+ simultaneous database queries for the same data.
Fix: We implemented a cache stampede prevention using Redis locks:
import time

def get_homepage_products():
    products = cache.get('homepage_products')
    if products:
        return products

    # Try to acquire lock
    lock_key = 'lock:homepage_products'
    lock_acquired = redis_client.set(lock_key, '1', nx=True, ex=10)

    if lock_acquired:
        # We got the lock - regenerate cache
        try:
            products = db.session.query(Product).filter_by(featured=True).all()
            cache.set('homepage_products', products, timeout=300)
            return products
        finally:
            redis_client.delete(lock_key)
    else:
        # Someone else is regenerating - wait and retry
        time.sleep(0.1)
        products = cache.get('homepage_products')
        if products:
            return products
        # Still not ready - query database (fallback)
        return db.session.query(Product).filter_by(featured=True).all()
This reduced database load during cache regeneration by 95%.
When to Choose Redis vs Memcached
After six months in production, here's my honest recommendation:
Choose Memcached when:
- You need absolute maximum throughput for simple key-value operations
- Your data is purely ephemeral (OK to lose on restart)
- You're caching data that's easy to regenerate (database query results)
- You want the simplest possible setup
- You're operating at extreme scale (100M+ ops/sec) where every microsecond matters
Choose Redis when:
- You need data structures beyond key-value (hashes, sets, sorted sets, lists)
- You need persistence (sessions, carts, user state)
- You need pub/sub or messaging
- You need atomic operations (counters, rate limiting)
- You need better eviction policies
- You want better observability and debugging tools
- You might need advanced features later (streams, transactions, Lua scripts)
For most teams, I recommend Redis. The flexibility is worth the slight performance trade-off. You'll eventually need one of Redis's advanced features, and retrofitting is painful.
But if you're building a pure caching layer for database queries and you're absolutely sure you'll never need anything beyond key-value, Memcached is slightly faster and simpler.
Our Final Architecture
Here's what we run in production today:
Primary Redis Cluster (cache.r6g.xlarge):
- User sessions
- Shopping carts
- Rate limiting
- Real-time features (pub/sub)
- Configuration: AOF persistence, volatile-lru eviction
- ~12GB memory, 89% hit rate
Secondary Redis Cluster (cache.r6g.large):
- Product metadata (hashes)
- Search indexes (sorted sets)
- Analytics counters
- Configuration: RDB snapshots only, volatile-lru eviction
- ~6GB memory, 84% hit rate
Memcached Cluster (cache.r6g.2xlarge):
- Database query results
- API response cache
- Template fragments
- Configuration: LRU eviction, no persistence
- ~20GB memory, 91% hit rate
Performance at 50M requests/day:
- Average API response time: 87ms (down from 340ms)
- P95 response time: 210ms (down from 1,200ms)
- Database CPU usage: 45% (down from 98%)
- Cache hit rate: 88% overall
- Infrastructure cost: $1,584/month (down from $2,883/month)
We're handling 4x more traffic with 45% lower infrastructure costs. That's the power of proper caching.
What I'd Do Differently
Looking back, here's what I wish I'd known:
- Start with Redis everywhere. We wasted time managing two caching systems. The performance difference isn't significant enough to justify the operational complexity.
- Invest in monitoring from day one. We added detailed metrics after our first incident. Should have done it from the start.
- Design for cache invalidation upfront. We retrofitted event-based invalidation after launch. Building it in from the beginning would have saved weeks.
- Load test with realistic data. Our initial tests used small, uniform cache values. Real production data has high variance in size and access patterns.
- Plan for failure modes. What happens when the cache is down? When it's full? When a node fails? We learned this the hard way at 3am.
- Document your caching strategy. Six months later, new team members are confused about what goes where and why. Write it down.
The journey from 12M to 50M requests per day taught us more about caching than any tutorial ever could. Redis and Memcached are both excellent tools - the key is understanding when to use each one and how to use them properly.
If you're facing similar scaling challenges, I hope this helps you avoid some of the mistakes we made. And if you're still on the fence between Redis and Memcached, my advice is simple: start with Redis. You can always add Memcached later if you need that extra 10% performance. But you probably won't need to.