Caching
Systems
The art of trading memory for speed, and knowing exactly when that tradeoff breaks.
Caching 01
Why Caching Matters
A database query might take 50ms. The same query result served from Redis takes 0.5ms, roughly 100x faster. Multiply that by millions of requests and caching becomes the difference between a system that works and one that collapses under load.
Your brain doesn't re-read a textbook every time you need a fact you've already learned; you cached that knowledge in working memory. A cache works the same way: store the answers to expensive computations close to where you'll need them again.
What to Cache
- Expensive database queries: Complex JOINs, aggregations, full-text search results.
- Computed values: User recommendation scores, trending post counts, analytics dashboards.
- External API responses: Exchange rates, weather data, third-party lookups.
- Static/semi-static data: Configuration, feature flags, product catalogues.
- Session data: Authentication state, shopping carts.
What NOT to Cache
- User-specific highly dynamic data (real-time inventory)
- Data where stale reads cause financial or safety harm
- Data that changes on every request
- Sensitive data you don't want lingering in memory
Caching 02
Cache Strategies
Cache-Aside (Lazy Loading)
The application manages the cache. On a read: check the cache → on a miss, read the DB → write the result to the cache → return it. On a write: update the DB → invalidate (or update) the cache.
def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:  # Cache miss
        user = db.query("SELECT * FROM users WHERE id=?", user_id)
        cache.set(f"user:{user_id}", user, ttl=300)  # Cache for 5 min
    return user
Pros: Only caches what's actually needed. Cache failure degrades gracefully (falls back to DB). Cons: First request is always a miss (cold start). Risk of stale data if DB is updated without invalidating cache.
Write-Through
Every write goes to the cache AND the database synchronously, so the cache is always up to date. Pros: No stale data. Cons: Write latency is higher (must wait for both), and the cache fills with data that may never be read (write-heavy workloads waste cache memory).
def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id=?", user_id, data)
    cache.set(f"user:{user_id}", data)  # Update cache synchronously
Write-Back (Write-Behind)
Write to the cache only; asynchronously flush to the DB later. Pros: Fastest possible writes. Batching reduces DB load. Cons: Risk of data loss if the cache crashes before the flush. Complex to implement correctly. Only appropriate where occasional data loss is acceptable (analytics counters, view counts).
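A minimal write-back sketch, reusing the illustrative cache and db handles from the snippets above: writes touch only the cache plus a dirty-key set, and a background worker flushes dirty entries to the database in batches. The flush interval and threading approach are assumptions for illustration.

import threading
import time

dirty_keys = set()
dirty_lock = threading.Lock()

def update_user(user_id, data):
    cache.set(f"user:{user_id}", data)   # fast path: write to cache only
    with dirty_lock:
        dirty_keys.add(user_id)          # remember what still needs persisting

def flush_worker(interval=5):
    while True:                          # runs in a background thread
        time.sleep(interval)
        with dirty_lock:
            batch = list(dirty_keys)
            dirty_keys.clear()
        for user_id in batch:            # batched flush reduces DB load
            data = cache.get(f"user:{user_id}")
            if data is not None:
                db.update("UPDATE users SET ... WHERE id=?", user_id, data)

threading.Thread(target=flush_worker, daemon=True).start()

If the process dies between the cache write and the flush, that batch is lost, which is exactly the trade-off described above.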
Read-Through
The cache sits in front of the DB and the application only talks to the cache. On a miss, the cache itself fetches from the DB and stores the result, so application code stays simpler. Used by: AWS ElastiCache read-through mode, some ORMs.
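A minimal read-through sketch, assuming the cache is a small in-process class handed a loader function; the loader (here db.query, as in the earlier snippets) stands in for the real source fetch, and the application only ever calls get.

import time

class ReadThroughCache:
    def __init__(self, loader, ttl=300):
        self.loader = loader              # called on a miss to fetch from the source
        self.ttl = ttl
        self.store = {}                   # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                               # hit
        value = self.loader(key)                          # miss: the cache itself fetches
        self.store[key] = (value, time.time() + self.ttl)
        return value

users = ReadThroughCache(loader=lambda uid: db.query("SELECT * FROM users WHERE id=?", uid))
user = users.get(42)   # application code never touches the DB directly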
Caching 03
Eviction Policies
When the cache is full, what do you remove? The eviction policy determines this, and it has a massive impact on your cache hit rate.
- LRU (Least Recently Used): Evict the item that hasn't been accessed in the longest time. Works well for most workloads. Assumption: recently used items are more likely to be needed again. Redis default (approximated); a minimal sketch follows this list.
- LFU (Least Frequently Used): Evict the item accessed fewest times. Better for skewed workloads where some items are always popular. Redis 4.0+ supports this.
- FIFO (First In, First Out): Evict oldest item regardless of access. Simple but poor hit rate for most real workloads.
- Random: Evict a random item. Surprisingly effective and cheap to implement. Used in CPU hardware caches.
- TTL-based: Items expire after a set time. Not an eviction policy per se: expired items are removed regardless of the other policies.
- allkeys-lru vs volatile-lru (Redis): allkeys-lru evicts any key; volatile-lru only evicts keys that have a TTL set. Use allkeys-lru when all data is cache data, and volatile-lru when some keys must persist.
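A minimal LRU cache sketch using Python's OrderedDict. Real systems like Redis approximate LRU by sampling instead of tracking exact access order, but the behaviour is the same idea.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()           # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None                      # miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item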
Caching 04
TTL & Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." โ Phil Karlton
TTL Strategies
- Fixed TTL: Every item expires after N seconds. Simple. Risk: if your data changes before TTL, clients see stale data for up to N seconds. Choose TTL based on acceptable staleness.
- Sliding TTL (access-based): TTL resets every time the item is accessed. Hot items stay cached indefinitely. Risk: stale but frequently accessed data stays cached forever.
- Event-driven invalidation: When data changes, explicitly delete or update the cache entry. Most accurate but tightly couples your cache logic to your write logic.
- Versioned keys: The cache key includes a version, e.g. user:123:v7. When the user changes, increment the version; readers use the new key immediately, and no explicit invalidation is needed because the old key simply expires via TTL.
Classic invalidation bug: You update a user's email in the DB, but forget to invalidate the cache. For the next 5 minutes (TTL), any service reading from cache gets the old email. This causes: wrong email notifications, auth failures, data inconsistency. Always invalidate or update cache on writes.
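A minimal sketch of the fix, in the cache-aside style used above; cache.delete is assumed to behave like Redis DEL.

def update_user_email(user_id, new_email):
    db.update("UPDATE users SET email=? WHERE id=?", new_email, user_id)
    cache.delete(f"user:{user_id}")   # invalidate; the next read repopulates from the DB
    # Alternative: write the fresh value back immediately (write-through style),
    # e.g. cache.set(f"user:{user_id}", updated_user, ttl=300)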
Caching 05
Cache Stampede (Thundering Herd)
A cache stampede occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously discover a cache miss. They all hit the database at the same time, potentially overwhelming it.
Scenario: Your homepage cache expires at midnight. 10,000 users are active. All 10,000 requests miss the cache simultaneously and fire 10,000 identical database queries. Your DB struggles. Responses are slow. Users retry. More queries. DB dies. Site down. All because of a TTL expiry.
Solutions
- Mutex / Locking: The first request to detect a miss acquires a distributed lock, fetches from the DB, and populates the cache. Other requests wait; after the lock is released, they all hit the now-populated cache (a sketch follows this list). Risk: lock contention, lock failure.
- Probabilistic Early Recomputation (PER): Before the TTL expires, with some probability, recompute the value early. The probability increases as you approach the TTL. Ensures cache is refreshed before expiry without a central lock.
- Jitter / Staggered TTLs: Add random jitter to TTLs: ttl = base_ttl + random(0, base_ttl * 0.1). Prevents all keys from expiring in the same second.
- Background refresh: A background job refreshes cache entries before they expire, so the cache never actually misses for popular items. Used by CDNs (stale-while-revalidate).
- Serve stale: Return the stale cached value immediately while asynchronously refreshing in the background. Users see slightly stale data but never see a slow response.
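A sketch combining the mutex and jitter ideas, in the same illustrative style as the earlier snippets. The nx=True and ttl arguments mirror Redis's SET ... NX EX (set only if the key is absent, with an expiry); the exact signature depends on your client library and is an assumption here.

import random
import time

def get_homepage():
    value = cache.get("homepage")
    if value is not None:
        return value                                     # normal hit
    # Miss: let only one worker recompute, behind a short-lived lock.
    if cache.set("lock:homepage", "1", nx=True, ttl=10):
        try:
            value = db.query("SELECT ...")               # the expensive query, run once
            jitter = random.uniform(0, 30)               # stagger future expiry
            cache.set("homepage", value, ttl=300 + jitter)
            return value
        finally:
            cache.delete("lock:homepage")
    time.sleep(0.05)                                     # brief wait, then retry;
    return get_homepage()                                # usually a hit by now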
Caching 06
Consistent Hashing
When you have multiple cache servers, you need to know which server holds which key. Naive approach: hash(key) % N. Problem: adding or removing a server changes N, which remaps almost every key and causes a thundering herd as all keys miss and hit the database simultaneously.
Consistent hashing maps both keys AND servers onto a hash ring (0 to 2³²). A key is owned by the first server encountered clockwise on the ring. Adding a server only remaps ~1/N of the keys; removing a server only remaps its keys to the next server clockwise.
Hash Ring (simplified): S1 sits at position 45, S2 at 160, S3 at 270 on a ring that wraps back around to 0.
Key "user:123" hashes to position 80 → the first server clockwise is S2 (position 160).
Add S4 at position 100 → only the keys between 80 and 100 move from S2 to S4; all other keys are unaffected.
Virtual Nodes (Vnodes)
A single physical server is placed at multiple positions on the ring (virtual nodes). This keeps the distribution even, even with few physical servers, and allows proportional capacity: a server with 2x the memory gets 2x the virtual nodes.
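A self-contained sketch of a hash ring with virtual nodes; 100 vnodes per server and MD5 are arbitrary illustrative choices, not a recommendation.

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = []                                   # (position, server), kept sorted
        for server in servers:
            for i in range(vnodes):                      # each server appears many times on the ring
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def get_server(self, key):
        pos = self._hash(key)
        idx = bisect.bisect_left(self.positions, pos) % len(self.ring)  # first server clockwise, wrapping
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_server("user:123"))   # the same key always maps to the same server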
Caching 07
Hot Keys
A hot key is a cache key accessed so frequently that a single cache server becomes a bottleneck, even if you have 100 servers. The key "trending:feed:global" might receive 100,000 requests per second, all hitting one Redis node.
Solutions
- Key replication: Store the same value under N keys: trending:feed:0 through trending:feed:9. On read, pick one of the keys at random. Spreads load across 10 cache nodes (a sketch follows this list).
- Local in-process cache: Add a tiny in-memory cache (LRU, 100 entries) inside each application server. Super-hot keys are served in-process with zero network hop. Valid for data that can tolerate a few seconds of staleness.
- CDN for truly global hot content: If a response is the same for all users, cache it at the CDN edge. 0 database hits, 0 Redis hits.
- Read replicas for hot DB queries: If hot keys keep missing and hammering the database, add DB read replicas and route those hot reads there.
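A minimal key-replication sketch in the same illustrative style as the earlier snippets; the replica count and TTL are assumptions. Each replica key hashes differently, so under consistent hashing the reads spread across cache nodes.

import random

REPLICAS = 10

def set_hot(key, value, ttl=60):
    for i in range(REPLICAS):
        cache.set(f"{key}:{i}", value, ttl=ttl)               # write every replica key

def get_hot(key):
    return cache.get(f"{key}:{random.randrange(REPLICAS)}")   # read a random replica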
Caching 08
Live Cache Simulator
An interactive demo of LRU eviction: requests for random keys fill a cache of size 5; when the cache is full and a new key arrives, the Least Recently Used item is evicted.