Caching
Systems
The art of trading memory for speed, and knowing exactly when that tradeoff breaks.
Caching 01
Why Caching Matters
A database query might take 50ms. The same query result served from Redis takes 0.5ms, roughly 100x faster. Multiply that by millions of requests and caching becomes the difference between a system that works and one that collapses under load.
Your brain doesn't re-read a textbook every time you need a fact you've already learned; you cached that knowledge in working memory. A cache works the same way: store the answers to expensive computations close to where you'll need them again.
What to Cache
- Expensive database queries: Complex JOINs, aggregations, full-text search results.
- Computed values: User recommendation scores, trending post counts, analytics dashboards.
- External API responses: Exchange rates, weather data, third-party lookups.
- Static/semi-static data: Configuration, feature flags, product catalogues.
- Session data: Authentication state, shopping carts.
What NOT to Cache
- User-specific highly dynamic data (real-time inventory)
- Data where stale reads cause financial or safety harm
- Data that changes on every request
- Sensitive data you don't want lingering in memory
Caching 02
Cache Strategies
Cache-Aside (Lazy Loading)
The application manages the cache. On a read: check the cache → on a miss, read the DB → write the result to the cache → return it. On a write: update the DB → invalidate (or update) the cache.
def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:  # Cache miss
        user = db.query("SELECT * FROM users WHERE id=?", user_id)
        cache.set(f"user:{user_id}", user, ttl=300)  # Cache for 5 min
    return user
Pros: Only caches what's actually needed. Cache failure degrades gracefully (falls back to DB). Cons: First request is always a miss (cold start). Risk of stale data if DB is updated without invalidating cache.
Write-Through
Every write goes to the cache AND the database synchronously, so the cache is always up to date. Pros: No stale data. Cons: Write latency is higher (must wait for both), and the cache fills with data that may never be read (write-heavy workloads waste cache memory).
def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id=?", user_id, data)
    cache.set(f"user:{user_id}", data)  # Update cache synchronously
Write-Back (Write-Behind)
Write to the cache only; asynchronously flush to the DB later. Pros: Fastest possible writes. Batching reduces DB load. Cons: Risk of data loss if the cache crashes before the flush. Complex to implement correctly. Only appropriate where occasional data loss is acceptable (analytics counters, view counts).
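A minimal write-back sketch, reusing the illustrative cache and db handles from the snippets above: writes touch only the cache plus a dirty-key set, and a background worker flushes dirty entries to the database in batches. The flush interval and threading approach are assumptions for illustration.

import threading
import time

dirty_keys = set()
dirty_lock = threading.Lock()

def update_user(user_id, data):
    cache.set(f"user:{user_id}", data)   # fast path: write to cache only
    with dirty_lock:
        dirty_keys.add(user_id)          # remember what still needs persisting

def flush_worker(interval=5):
    while True:                          # runs in a background thread
        time.sleep(interval)
        with dirty_lock:
            batch = list(dirty_keys)
            dirty_keys.clear()
        for user_id in batch:            # batched flush reduces DB load
            data = cache.get(f"user:{user_id}")
            if data is not None:
                db.update("UPDATE users SET ... WHERE id=?", user_id, data)

threading.Thread(target=flush_worker, daemon=True).start()

If the process dies between the cache write and the flush, that batch is lost, which is exactly the trade-off described above.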
Read-Through
The cache sits in front of the DB and the application only talks to the cache. On a miss, the cache itself fetches from the DB and stores the result, so application code stays simpler. Used by: AWS ElastiCache read-through mode, some ORMs.
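A minimal read-through sketch, assuming the cache is a small in-process class handed a loader function; the loader (here db.query, as in the earlier snippets) stands in for the real source fetch, and the application only ever calls get.

import time

class ReadThroughCache:
    def __init__(self, loader, ttl=300):
        self.loader = loader              # called on a miss to fetch from the source
        self.ttl = ttl
        self.store = {}                   # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                               # hit
        value = self.loader(key)                          # miss: the cache itself fetches
        self.store[key] = (value, time.time() + self.ttl)
        return value

users = ReadThroughCache(loader=lambda uid: db.query("SELECT * FROM users WHERE id=?", uid))
user = users.get(42)   # application code never touches the DB directly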
Caching 03
Eviction Policies
When the cache is full, what do you remove? The eviction policy determines this, and it has a massive impact on your cache hit rate.
- LRU (Least Recently Used): Evict the item that hasn't been accessed in the longest time. Works well for most workloads. Assumption: recently used items are more likely to be needed again. Redis default (approximated); a minimal sketch follows this list.
- LFU (Least Frequently Used): Evict the item accessed fewest times. Better for skewed workloads where some items are always popular. Redis 4.0+ supports this.
- FIFO (First In, First Out): Evict oldest item regardless of access. Simple but poor hit rate for most real workloads.
- Random: Evict a random item. Surprisingly effective and cheap to implement. Used in CPU hardware caches.
- TTL-based: Items expire after a set time. Not an eviction policy per se: expired items are removed regardless of the other policies.
- allkeys-lru vs volatile-lru (Redis): allkeys-lru evicts any key; volatile-lru only evicts keys that have a TTL set. Use allkeys-lru when all data is cache data, and volatile-lru when some keys must persist.
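A minimal LRU cache sketch using Python's OrderedDict. Real systems like Redis approximate LRU by sampling instead of tracking exact access order, but the behaviour is the same idea.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()           # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None                      # miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item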
Caching 04
TTL & Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." โ Phil Karlton
TTL Strategies
- Fixed TTL: Every item expires after N seconds. Simple. Risk: if your data changes before TTL, clients see stale data for up to N seconds. Choose TTL based on acceptable staleness.
- Sliding TTL (access-based): TTL resets every time the item is accessed. Hot items stay cached indefinitely. Risk: stale but frequently accessed data stays cached forever.
- Event-driven invalidation: When data changes, explicitly delete or update the cache entry. Most accurate but tightly couples your cache logic to your write logic.
- Versioned keys: The cache key includes a version, e.g. user:123:v7. When the user changes, increment the version; readers use the new key immediately, and no explicit invalidation is needed because the old key simply expires via TTL.
Classic invalidation bug: You update a user's email in the DB, but forget to invalidate the cache. For the next 5 minutes (TTL), any service reading from cache gets the old email. This causes: wrong email notifications, auth failures, data inconsistency. Always invalidate or update cache on writes.
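A minimal sketch of the fix, in the cache-aside style used above; cache.delete is assumed to behave like Redis DEL.

def update_user_email(user_id, new_email):
    db.update("UPDATE users SET email=? WHERE id=?", new_email, user_id)
    cache.delete(f"user:{user_id}")   # invalidate; the next read repopulates from the DB
    # Alternative: write the fresh value back immediately (write-through style),
    # e.g. cache.set(f"user:{user_id}", updated_user, ttl=300)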
Caching 05
Cache Stampede (Thundering Herd)
A cache stampede occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously discover a cache miss. They all hit the database at the same time, potentially overwhelming it.
Scenario: Your homepage cache expires at midnight. 10,000 users are active. All 10,000 requests miss the cache simultaneously and fire 10,000 identical database queries. Your DB struggles. Responses are slow. Users retry. More queries. DB dies. Site down. All because of a TTL expiry.
Solutions
- Mutex / Locking: The first request to detect a miss acquires a distributed lock, fetches from the DB, and populates the cache. Other requests wait; after the lock is released, they all hit the now-populated cache (a sketch follows this list). Risk: lock contention, lock failure.
- Probabilistic Early Recomputation (PER): Before the TTL expires, with some probability, recompute the value early. The probability increases as you approach the TTL. Ensures cache is refreshed before expiry without a central lock.
- Jitter / Staggered TTLs: Add random jitter to TTLs: ttl = base_ttl + random(0, base_ttl * 0.1). Prevents all keys from expiring in the same second.
- Background refresh: A background job refreshes cache entries before they expire, so the cache never actually misses for popular items. Used by CDNs (stale-while-revalidate).
- Serve stale: Return the stale cached value immediately while asynchronously refreshing in the background. Users see slightly stale data but never see a slow response.
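A sketch combining the mutex and jitter ideas, in the same illustrative style as the earlier snippets. The nx=True and ttl arguments mirror Redis's SET ... NX EX (set only if the key is absent, with an expiry); the exact signature depends on your client library and is an assumption here.

import random
import time

def get_homepage():
    value = cache.get("homepage")
    if value is not None:
        return value                                     # normal hit
    # Miss: let only one worker recompute, behind a short-lived lock.
    if cache.set("lock:homepage", "1", nx=True, ttl=10):
        try:
            value = db.query("SELECT ...")               # the expensive query, run once
            jitter = random.uniform(0, 30)               # stagger future expiry
            cache.set("homepage", value, ttl=300 + jitter)
            return value
        finally:
            cache.delete("lock:homepage")
    time.sleep(0.05)                                     # brief wait, then retry;
    return get_homepage()                                # usually a hit by now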
Caching 06
Consistent Hashing
When you have multiple cache servers, you need to know which server holds which key. Naive approach: hash(key) % N. Problem: adding or removing a server changes N, which remaps almost every key and causes a thundering herd as all keys miss and hit the database simultaneously.
Consistent hashing maps both keys AND servers onto a hash ring (0 to 2³²). A key is owned by the first server encountered clockwise on the ring. Adding a server only remaps ~1/N of the keys; removing a server only remaps its keys to the next server clockwise.
Hash Ring (simplified): S1 sits at position 45, S2 at 160, S3 at 270 on a ring that wraps back around to 0.
Key "user:123" hashes to position 80 → the first server clockwise is S2 (position 160).
Add S4 at position 100 → only the keys between 80 and 100 move from S2 to S4; all other keys are unaffected.
Virtual Nodes (Vnodes)
A single physical server is placed at multiple positions on the ring (virtual nodes). This keeps the distribution even, even with few physical servers, and allows proportional capacity: a server with 2x the memory gets 2x the virtual nodes.
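A self-contained sketch of a hash ring with virtual nodes; 100 vnodes per server and MD5 are arbitrary illustrative choices, not a recommendation.

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = []                                   # (position, server), kept sorted
        for server in servers:
            for i in range(vnodes):                      # each server appears many times on the ring
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def get_server(self, key):
        pos = self._hash(key)
        idx = bisect.bisect_left(self.positions, pos) % len(self.ring)  # first server clockwise, wrapping
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_server("user:123"))   # the same key always maps to the same server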
Caching 07
Hot Keys
A hot key is a cache key accessed so frequently that a single cache server becomes a bottleneck, even if you have 100 servers. The key "trending:feed:global" might receive 100,000 requests per second, all hitting one Redis node.
Solutions
- Key replication: Store the same value under N keys: trending:feed:0 through trending:feed:9. On read, pick one of the keys at random. Spreads load across 10 cache nodes (a sketch follows this list).
- Local in-process cache: Add a tiny in-memory cache (LRU, 100 entries) inside each application server. Super-hot keys are served in-process with zero network hop. Valid for data that can tolerate a few seconds of staleness.
- CDN for truly global hot content: If a response is the same for all users, cache it at the CDN edge. 0 database hits, 0 Redis hits.
- Read replicas for hot DB queries: If hot keys keep missing and hammering the database, add DB read replicas and route those hot reads there.
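A minimal key-replication sketch in the same illustrative style as the earlier snippets; the replica count and TTL are assumptions. Each replica key hashes differently, so under consistent hashing the reads spread across cache nodes.

import random

REPLICAS = 10

def set_hot(key, value, ttl=60):
    for i in range(REPLICAS):
        cache.set(f"{key}:{i}", value, ttl=ttl)               # write every replica key

def get_hot(key):
    return cache.get(f"{key}:{random.randrange(REPLICAS)}")   # read a random replica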
Caching 08
Live Cache Simulator
An interactive demo of LRU eviction: requests for random keys fill a cache of size 5; when the cache is full and a new key arrives, the Least Recently Used item is evicted.