Module 07

Scalability
Patterns

Architecture patterns that let systems grow from thousands to billions of users.

Scalability 01

CQRS – Command Query Responsibility Segregation

CQRS separates read and write models. Commands (writes) go to one model optimised for writes. Queries (reads) go to a separate model optimised for reads. This lets you scale reads and writes independently, and optimise each for its purpose.

A bank has tellers for transactions (writes) and an ATM network for balance checks (reads). The systems are separate, scale independently, and are optimised for their specific jobs. The ATM doesn't need the same infrastructure as the teller system.

CQRS Flow

Write side:
  User → Command (CreateOrder) → Command Handler
       → Validates, updates Write DB
       → Emits domain event (OrderCreated)
       → Event updates Read Model (async)

Read side:
  User → Query (GetOrderSummary) → Query Handler
       → Reads from denormalised Read DB (fast!)
       → Returns response immediately

Tradeoff: the read model is eventually consistent (updated asynchronously after writes). There is a lag, from milliseconds to seconds, between a write and the read model reflecting it. Acceptable for most use cases, but not where a read must reflect the latest write, such as enforcing account balances or reserving inventory.
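The flow above fits in a few lines. This is a minimal single-process sketch, not a framework: `write_db`, `read_db`, and the in-memory queue stand in for a real write store, read store, and message bus, and all names are illustrative.

```python
import queue

write_db = {}        # write model (normalised, validated)
read_db = {}         # read model (denormalised, eventually consistent)
bus = queue.Queue()  # stand-in for a real message bus

def handle_create_order(order_id, items):
    """Command handler: validate, update the write DB, emit a domain event."""
    if not items:
        raise ValueError("order must contain at least one item")
    write_db[order_id] = {"items": items}
    bus.put({"type": "OrderCreated", "orderId": order_id, "items": items})

def run_projector():
    """Projector: applies pending events to the read model (async in reality)."""
    while not bus.empty():
        event = bus.get()
        read_db[event["orderId"]] = {"itemCount": len(event["items"])}

def get_order_summary(order_id):
    """Query handler: reads only the denormalised read DB."""
    return read_db.get(order_id)  # may lag behind the write model
```

Until `run_projector()` has applied the event, `get_order_summary` returns nothing for a freshly created order, which is exactly the eventual-consistency lag described above.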

When to Use CQRS

  • Read-to-write ratio is very high (100:1 or more)
  • Read and write workloads have very different scaling needs
  • Complex domain logic on writes, simple projections on reads
  • You need multiple read models (mobile app view vs analytics view)
โฆ

Scalability 02

Event Sourcing

Instead of storing current state (balance = £500), store all events that led to that state (deposited £300, withdrew £100, deposited £300). The current state is derived by replaying events.

Traditional: users table
  id=1, name='Alice', balance=500

Event Sourced: events table
  {userId:1, type:'AccountOpened',  amount:0,   ts: t1}
  {userId:1, type:'MoneyDeposited', amount:300, ts: t2}
  {userId:1, type:'MoneyWithdrawn', amount:100, ts: t3}
  {userId:1, type:'MoneyDeposited', amount:300, ts: t4}
  → replay → balance = 500

Event Sourcing Benefits

  • Complete audit trail – every change recorded
  • Time travel – replay to any point in time
  • Event-driven integration – publish events
  • Easy debugging – see exactly what happened

Event Sourcing Costs

  • Querying current state requires replay (use snapshots)
  • Schema evolution is hard – old events must still be valid
  • Increased storage
  • Mental model shift – unfamiliar to most teams
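The first cost is usually mitigated with snapshots: persist the folded state every N events and replay only the tail. A hedged sketch of the idea, reusing the bank-account event shapes above (the function name and snapshot representation are our own):

```python
def replay_from(snapshot_balance, tail_events):
    """Replay only the events recorded after the snapshot was taken,
    starting from the balance the snapshot captured."""
    balance = snapshot_balance
    for e in tail_events:
        if e["type"] == "MoneyDeposited":
            balance += e["amount"]
        elif e["type"] == "MoneyWithdrawn":
            balance -= e["amount"]
    return balance
```

A snapshot of £300 taken after the first deposit, plus the remaining two events, yields the same £500 as a full replay from t1.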
โฆ

Scalability 03

Geo-Distribution & Multi-Region

Running your application in multiple geographic regions reduces latency for global users and provides disaster recovery. A user in Tokyo gets served from Tokyo, not US-East.

Data Residency Challenge

The hardest part of geo-distribution is data. You want user data close to the user, but you also need global consistency. Strategies:

  • Data locality: Pin user data to their region (EU users' data stays in EU). GDPR compliance + lower latency. Challenge: what if user travels?
  • Global replicated DB: Google Spanner, CockroachDB – replicate across regions with strong consistency. Higher write latency (cross-region round trips), extreme durability.
  • Async cross-region replication: Primary region handles writes, replicates asynchronously to other regions. Low latency writes, possible staleness on reads in non-primary regions.
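The data-locality strategy boils down to a routing lookup: resolve the user's pinned region before touching any data. A minimal sketch; the pinning table, endpoints, and user IDs are all made up for illustration.

```python
# Hypothetical pinning table and per-region endpoints.
USER_HOME_REGION = {
    "u-alice": "eu-west-1",        # EU user: data stays in the EU
    "u-kenji": "ap-northeast-1",   # Tokyo user: served from Tokyo
}
REGION_ENDPOINT = {
    "eu-west-1":      "https://eu.api.example.com",
    "ap-northeast-1": "https://tokyo.api.example.com",
    "us-east-1":      "https://us.api.example.com",
}

def endpoint_for(user_id, default_region="us-east-1"):
    """Route a request to the region that owns this user's data."""
    region = USER_HOME_REGION.get(user_id, default_region)
    return REGION_ENDPOINT[region]
```

Note the travelling-user challenge from the text: a pinned EU user visiting Tokyo is still routed to eu-west-1 and pays the cross-region latency.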

In interviews: "How would you make this globally available?" → Discuss: which data is user-specific (can be local), which is global (e.g., follower counts, trending feeds), and how you'd handle cross-region writes. Latency numbers matter: cross-US ~40ms, US-EU ~80ms, US-Asia ~150ms.

โฆ

Scalability 04

Active-Active vs Active-Passive

Active-Active

  • All nodes serve traffic simultaneously
  • No wasted capacity
  • Immediate failover (no promotion delay)
  • Write conflicts if multiple masters
  • Complex data synchronisation
  • Good for: stateless services, read-heavy workloads

Active-Passive

  • One primary serves traffic; others on standby
  • Standby capacity is wasted (or used for reads)
  • Failover requires promotion (seconds of downtime)
  • No write conflicts – single writer
  • Simpler consistency model
  • Good for: stateful databases, write-heavy workloads
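The active-passive failover decision itself is small; the promotion step is where the seconds of downtime live. A toy sketch of that decision (the class and field names are ours; real systems gate promotion on a lease or quorum to avoid split-brain):

```python
class Node:
    def __init__(self, name, role):
        self.name = name
        self.role = role      # "primary" or "standby"
        self.healthy = True

def check_and_failover(primary, standby):
    """Promote the standby if the primary fails its health check.
    Real systems fence the old primary first to prevent two writers."""
    if primary.healthy:
        return primary        # no change: single writer keeps serving
    primary.role = "standby"  # demote / fence the failed primary
    standby.role = "primary"  # promote: writes are unavailable until this lands
    return standby
```

Contrast with active-active, where there is no promotion step at all: every node already serves traffic, at the cost of conflict handling.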
โฆ

Scalability 05

Bulkhead Pattern

The bulkhead pattern isolates different parts of a system into pools so that if one fails, the others continue. Named after the bulkheads in ships: watertight compartments that prevent one leak from sinking the whole ship.

Without bulkheads: your app has one shared thread pool (200 threads). A slow third-party payment API ties up all 200 threads waiting for responses, leaving none for other requests. The entire app becomes unresponsive – even user login, which doesn't touch payments.

Implementation

  • Thread pool isolation: Separate thread pool per downstream dependency. Payment pool: 50 threads. Email pool: 20 threads. Core app pool: 130 threads. Payment slowdown only exhausts payment pool.
  • Connection pool isolation: Separate DB connection pool for analytics vs. transactional queries.
  • Process isolation: Run risky operations in separate processes/containers. A crash in one doesn't affect others.
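Thread-pool isolation maps directly onto one executor per dependency. A sketch using Python's standard library with the pool sizes from the text; the `submit` wrapper and pool names are our own.

```python
from concurrent.futures import ThreadPoolExecutor

# One isolated pool per downstream dependency (sizes from the text).
POOLS = {
    "payments": ThreadPoolExecutor(max_workers=50),
    "email":    ThreadPoolExecutor(max_workers=20),
    "core":     ThreadPoolExecutor(max_workers=130),
}

def submit(dependency, fn, *args, **kwargs):
    """Run work on the pool owned by its dependency. If the payment API
    hangs, only the 50 payment threads block; core and email keep serving."""
    return POOLS[dependency].submit(fn, *args, **kwargs)
```

A real service would also put a timeout on `.result()` so callers fail fast instead of queueing behind a saturated pool.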
โฆ

Scalability 06

Service Mesh & Sidecars

A service mesh handles cross-cutting concerns (auth, observability, retries, circuit breaking) at the infrastructure layer, not in application code. A sidecar proxy (e.g., Envoy) runs next to each service instance and intercepts all traffic.

  • mTLS: All service-to-service traffic is automatically encrypted and mutually authenticated.
  • Observability: Metrics, traces, and logs collected automatically for every request.
  • Traffic management: Canary deployments, A/B testing, circuit breaking – configured centrally.
  • Examples: Istio (uses Envoy), Linkerd, Consul Connect.
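As one concrete example of centrally configured traffic management, an Istio VirtualService can split traffic for a canary. A sketch, assuming a `payments` service whose `v1` and `v2` subsets are already defined in a DestinationRule:

```yaml
# Hypothetical canary: 90% of traffic to v1, 10% to the new v2.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments
spec:
  hosts:
    - payments
  http:
    - route:
        - destination:
            host: payments
            subset: v1
          weight: 90
        - destination:
            host: payments
            subset: v2
          weight: 10
```

The application code never changes: shifting the weights rolls the canary forward or back.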

Service meshes add 1–5ms latency per hop (sidecar overhead). For very latency-sensitive systems, this matters. They also add operational complexity. Only adopt one when you genuinely need the capabilities, not just because it's fashionable.