Module 01

Foundations of
System Design

The mental models every backend engineer needs, explained from first principles, not memorisation.


Foundation 01

Client-Server Architecture

Before anything else, understand the fundamental model that powers virtually every application you've ever used. A client is any process that initiates a request. A server is any process that responds to it. The key insight is that these are roles, not fixed machines.

Think of a restaurant. You (the client) sit down and order food. The kitchen (server) receives your order, prepares it, and sends it back. The waiter is your network. The menu is your API contract. If the kitchen is overwhelmed, orders slow down; that's your server under load.

The Request-Response Cycle

Every interaction follows a pattern: client sends a request, server processes it, server sends a response. Understanding this cycle deeply (what happens in each step, what can fail, and what adds latency) is the foundation for diagnosing every system problem you'll encounter.

Client                Network              Server
  |                     |                    |
  |--- HTTP GET /user -->|                    |
  |                     |--- TCP segment --->|
  |                     |                    |-- Parse request
  |                     |                    |-- Auth check
  |                     |                    |-- DB query
  |                     |<-- TCP segment ----|
  |<-- HTTP 200 JSON ---|                    |
  |                     |                    |

What Can Go Wrong

Network failures

  • Packet loss → retransmission
  • High latency → timeouts
  • Partition → no response
  • DNS failure → unreachable

Server failures

  • CPU/memory exhaustion → slow responses
  • DB connection pool full → queue buildup
  • Disk I/O saturation → timeouts
  • Application bug → 500 errors

Interviewers love asking "walk me through what happens when a user types google.com." A strong answer covers: DNS resolution, TCP handshake, TLS negotiation, HTTP request, load balancer, application server, database, CDN, and what can fail at each step.

โฆ

Foundation 02

Monolith vs Microservices

This is one of the most misunderstood topics in the industry. Teams rush to microservices without understanding the costs. Let's think through both options rigorously.

๐Ÿข Monolith

  • Single deployable unit
  • In-process communication (fast)
  • Simple local transactions (ACID)
  • Easy to test end-to-end
  • Scales as a whole unit
  • Shared memory state possible

🔬 Microservices

  • Multiple independently deployable services
  • Network calls between services (slow, fallible)
  • Distributed transactions (hard)
  • Independent scaling per service
  • Technology heterogeneity possible
  • Team autonomy and ownership

When Microservices Make Sense

Microservices solve organisational problems more than technical ones. When you have 200 engineers all deploying to the same codebase, deployments become a coordination nightmare. Microservices let teams own their services independently.

The microservices trap: Teams adopt microservices at 5 engineers, then spend 80% of their time on distributed systems problems (service discovery, distributed tracing, cross-service transactions, eventual consistency) when a monolith would have shipped features 10x faster.

The Real Costs of Microservices

  • Network latency: In-process calls take ~100ns. Network calls take ~1ms. That's 10,000x slower, per hop.
  • Distributed transactions: You lose ACID guarantees. You now need sagas, two-phase commit, or eventual consistency.
  • Operational overhead: Each service needs its own CI/CD, monitoring, logging, scaling policies, health checks.
  • Debugging complexity: A bug might span 6 services. Distributed tracing becomes essential.
  • Data consistency: Each service owns its data. Cross-service queries become API calls, not SQL joins.

Don't default to microservices in interviews. Say "I'd start with a modular monolith, with clear bounded contexts, and extract services when we hit specific bottlenecks." This shows engineering maturity.

โฆ

Foundation 03

Horizontal vs Vertical Scaling

⬆️ Vertical Scaling (Scale Up)

  • Add more CPU, RAM, disk to one machine
  • Simple โ€” no code changes needed
  • Has a hard ceiling (biggest machine exists)
  • Single point of failure
  • Downtime usually required to upgrade
  • Good for: databases, stateful services

↔️ Horizontal Scaling (Scale Out)

  • Add more machines
  • Theoretically unlimited
  • Requires stateless services or shared state
  • Requires load balancer in front
  • Zero downtime possible
  • Good for: stateless web servers, APIs

Vertical scaling is like making one cashier work faster. Horizontal scaling is like opening more checkout lanes. There's a limit to how fast one cashier can move, but you can keep adding lanes. The challenge with more lanes: how do customers know which lane to join? That's your load balancer.

The Stateless Requirement

Horizontal scaling requires your servers to be stateless: any server must be able to handle any request. This means session state must live outside the server, in a database, Redis, or a JWT that the client carries.

Stateless tradeoff: Moving state out of servers adds a network hop to read/write it. A request that previously hit local memory now hits Redis (1–5ms) or a database (5–50ms). This is acceptable and expected, but account for it in your latency budgets.
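
The requirement can be sketched in a few lines of Python. This is a toy model: a plain dict stands in for Redis, and the server names are made up. The point is only that no server keeps session state in its own memory.

```python
# A shared store (a dict standing in for Redis) holds all session state.
session_store = {}

def handle_request(server_name, session_id):
    """Any server can serve any request, because session state
    lives in the shared store rather than in server memory."""
    session = session_store.setdefault(session_id, {"views": 0})
    session["views"] += 1
    return server_name, session["views"]

# Two different servers handle consecutive requests for the same session:
assert handle_request("server-1", "sess-42") == ("server-1", 1)
assert handle_request("server-2", "sess-42") == ("server-2", 2)
```

Because the state travels with the session ID rather than with any one server, the load balancer is free to route each request anywhere.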

โฆ

Foundation 04

Latency vs Throughput

Two of the most important metrics in system design, and they're often in tension with each other.

Operation                          Typical Latency   Notes
L1 cache reference                 0.5 ns            CPU on-chip
L2 cache reference                 7 ns              14x L1
RAM access                         100 ns            200x L1
SSD random read                    150 µs            150,000 ns
HDD random read                    10 ms             10,000,000 ns
Same-datacenter network            0.5 ms            Varies widely
Cross-datacenter (e.g. US → EU)    150 ms            Speed of light + hops
DNS lookup (uncached)              20–120 ms         Multiple round trips

Memorise the orders of magnitude. L1 cache → RAM → SSD → network → disk. Each is 10x–1000x slower than the previous. When you understand this, caching strategies become obvious: keep hot data as close to the CPU as possible.

Throughput vs Latency Tradeoff

Throughput is how many requests you can handle per second. Latency is how long each request takes. These can conflict:

  • Batching increases throughput but increases latency (you wait to fill a batch)
  • Caching reduces latency but doesn't always improve throughput
  • More connections can increase throughput but also increase contention and latency
โฆ

Foundation 05

CAP Theorem

The CAP Theorem states that a distributed system can guarantee at most two of these three properties simultaneously: Consistency, Availability, and Partition Tolerance.

Imagine a bank with two branches. Customer A withdraws £100 at Branch 1. The network between branches goes down (partition). Now Branch 2 still shows the old balance. Do you: (a) let Branch 2 show stale data and remain available, or (b) block all transactions at Branch 2 until the network recovers? That's the AP vs CP choice.

CP Systems (Consistency + Partition Tolerance)

  • Returns an error when partition occurs
  • Data is always accurate
  • Examples: HBase, MongoDB (with majority read/write concern), ZooKeeper
  • Good for: banking, inventory, anything where stale data = real harm

AP Systems (Availability + Partition Tolerance)

  • Returns potentially stale data during partition
  • System always responds
  • Examples: Cassandra, DynamoDB, CouchDB
  • Good for: social feeds, DNS, shopping carts, recommendations

Why "CA" Doesn't Exist in Practice

A system that sacrifices partition tolerance assumes the network never fails. Networks always fail. Every distributed system must be partition tolerant. So the real choice is: CP or AP when a partition happens.

Don't just say "I'd use a CP system." Explain why based on requirements. "Our payment system can't show incorrect balances, so we'd accept temporary unavailability. Our social feed can show stale posts for a few seconds; availability matters more there."

PACELC: A Better Model

The PACELC theorem extends CAP: even when the system is running normally (no partition), there's a tradeoff between Latency and Consistency. Strong consistency requires waiting for all nodes to agree, which adds latency. This is the tradeoff that Cassandra, DynamoDB, and others expose through tunable consistency levels.

โฆ

Foundation 06

ACID vs BASE

ACID (Relational DBs)

  • Atomicity: Transaction is all-or-nothing
  • Consistency: DB moves from valid state to valid state
  • Isolation: Concurrent transactions behave as serial
  • Durability: Committed data survives crashes

BASE (NoSQL/Distributed)

  • Basically Available: System guarantees availability
  • Soft State: State may change over time without input
  • Eventually Consistent: Given enough time, all nodes agree

ACID is like a bank vault. Every transaction is verified, logged, and reversible. BASE is like a whiteboard that multiple people can edit simultaneously: eventually everyone sees the same thing, but for a moment different people may see different versions.

Isolation Levels Deep Dive

Most engineers underestimate how much ACID's Isolation property costs. The four isolation levels (weakest to strongest):

  • Read Uncommitted: Reads dirty data. Almost never used. Maximum concurrency.
  • Read Committed: Only sees committed data. Default in most DBs. Prevents dirty reads but allows non-repeatable reads.
  • Repeatable Read: Same query returns same results within a transaction. Prevents phantom reads in MySQL (with gap locks). Default in MySQL/InnoDB.
  • Serializable: Full isolation. Transactions behave as if serial. Heaviest locking. Lowest throughput.

Classic failure: Two transactions both read a seat availability as "1 available", both decide to book it, both write "booked". Overbooking! You need Serializable isolation or an optimistic lock (compare-and-swap) to prevent this.
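
The compare-and-swap fix can be sketched in plain Python. This toy in-memory version (the `Seat` class and `book_seat` function are invented for illustration) mirrors what a real database does with a conditional `UPDATE ... WHERE version = ?`:

```python
class Seat:
    """A seat row with a version column for optimistic locking."""
    def __init__(self):
        self.status = "available"
        self.version = 1

def book_seat(seat, expected_version):
    """Compare-and-swap: the write succeeds only if nobody
    has modified the row since we read it."""
    if seat.version != expected_version:
        return False              # someone else booked first; retry or fail
    seat.status = "booked"
    seat.version += 1             # bump version so a concurrent CAS fails
    return True

seat = Seat()
v = seat.version                  # both "transactions" read version 1
assert book_seat(seat, v) is True     # first booking wins
assert book_seat(seat, v) is False    # second sees a stale version, rejected
```

The second caller sees the failure and can refresh the seat's state instead of silently overbooking.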

โฆ

Foundation 07

Consistency Models

Consistency is a spectrum. Understanding where each model sits, and what guarantees it provides, is essential for designing distributed systems.

  • Linearisability (Strong Consistency): Every read returns the most recently written value. Operations appear instantaneous. Requires coordination (Raft, Paxos). High latency cost. Examples: ZooKeeper, etcd, Google Spanner.
  • Sequential Consistency: Operations appear in the same order to all nodes, but not necessarily in real-time order. Weaker than linearisability. Hard to implement efficiently.
  • Causal Consistency: Operations that are causally related are seen in the same order by all nodes. Independent operations may be seen in any order. MongoDB sessions.
  • Eventual Consistency: Given no new updates, all replicas eventually agree. Reads may return stale data in the interim. DynamoDB, Cassandra, DNS. High availability, low latency.
  • Read-Your-Writes: After a write, you always see your own write, even if other nodes are stale. Session consistency. User profile updates must have this.

A brilliant interview move: ask "What consistency model does this feature need?" For a user's own profile, you need read-your-writes. For a global news feed, eventual consistency is fine. Don't over-engineer: strong consistency is expensive.

โฆ

Foundation 08

Replication

Replication means keeping copies of data on multiple nodes. It serves two purposes: fault tolerance (if one node dies, data survives) and performance (read from the nearest replica).

Single-Leader Replication

One node is the leader (primary). All writes go to the leader. Leader replicates to followers. Reads can come from followers (may be slightly stale) or the leader (strongly consistent). Used by: PostgreSQL, MySQL, MongoDB primary-secondary.

Replication lag: If a follower is 5 seconds behind and the leader crashes, you lose 5 seconds of writes. This is why write-ahead logs and replication acknowledgement settings matter.

Multi-Leader Replication

Multiple nodes accept writes. Used for multi-datacenter setups. Problem: write conflicts. If two users edit the same row in different datacenters simultaneously, which wins? Requires conflict resolution: last-write-wins, CRDTs, or application-level merge.

Leaderless Replication (Dynamo-style)

Any node can accept writes. Uses quorum reads/writes for consistency. If N=3 nodes, W=2 write quorum, R=2 read quorum: W+R > N means you'll always read at least one up-to-date node. Used by: DynamoDB, Cassandra, Riak.

Lower W = faster writes but more risk of stale reads. Higher W = stronger consistency but higher write latency. Tune W and R based on your consistency vs latency requirements.
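
The quorum-overlap claim is easy to verify by brute force. The helper below is invented for illustration, not any database's API; it checks every possible write set against every possible read set:

```python
from itertools import combinations

def worst_case_sees_write(n, w, r):
    """Exhaustively check every (write set, read set) pair: does a
    read quorum always include at least one replica that took the write?"""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

assert worst_case_sees_write(3, 2, 2) is True   # W+R > N: overlap guaranteed
assert worst_case_sees_write(3, 1, 2) is False  # W+R = N: a read can miss the write
```

With N=3, W=1, R=2, the write can land on node 0 while the read hits nodes 1 and 2, which is exactly the stale-read risk the text describes.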

โฆ

Foundation 09

Partitioning & Sharding

When a single database becomes too large or too slow, you split the data across multiple nodes. Each piece is called a shard or partition.

Imagine a phone book. Instead of one massive book (A–Z), you split it into 26 books, one per letter. If you need to find "Smith", you immediately go to the S-book. That's range-based partitioning. But if all your users are called "Smith" or "Jones", your S-book and J-book are overwhelmed while the A-book sits empty. That's a hot partition.

Partitioning Strategies

  • Range-based: Users A–M on Shard 1, N–Z on Shard 2. Simple but prone to hot spots if distribution is uneven.
  • Hash-based: hash(user_id) % N shards. Even distribution but makes range queries expensive (data is scattered).
  • Consistent hashing: Hash both keys and nodes onto a ring. Minimises data movement when nodes are added/removed. Used by: DynamoDB, Cassandra, Chord.
  • Directory-based: A lookup table maps keys to shards. Flexible but the directory is a bottleneck and single point of failure.
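
Consistent hashing is worth seeing in code. Below is a minimal sketch (the class and node names are invented, and real systems add replication and stronger hash functions); it demonstrates that adding a node moves only a fraction of the keys:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes. Adding or
    removing a node only remaps the keys adjacent to it on the ring."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):        # virtual nodes smooth the load
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def get(self, key):
        """Walk clockwise from the key's position to the next node."""
        idx = bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {f"user_{i}": ring.get(f"user_{i}") for i in range(1000)}
ring.add("node-d")
moved = sum(ring.get(k) != before[k] for k in before)
assert 0 < moved < 500    # only keys now owned by node-d moved (~1/4)
```

Contrast with `hash(key) % N`: going from 3 to 4 shards there would remap roughly three quarters of all keys.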

The Hot Partition Problem

If 10% of your users generate 90% of your traffic (celebrity accounts, viral posts), they land on the same shard. That shard becomes overwhelmed while others are idle. Solutions:

  • Add a random suffix to hot keys (user_123_<0-9>) and fan out reads
  • Cache hot data in front of the database
  • Use a separate "celebrity" tier with different hardware
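
The random-suffix technique from the first bullet can be sketched as follows; `SUFFIXES`, `write_key`, and `read_keys` are hypothetical names for illustration:

```python
import random

SUFFIXES = 10  # fan-out factor; tune to how hot the key is

def write_key(user_id):
    """Writes for a hot user are spread across 10 sub-keys,
    so they can land on up to 10 different partitions."""
    return f"{user_id}_{random.randrange(SUFFIXES)}"

def read_keys(user_id):
    """Reads must fan out to every sub-key and merge the results."""
    return [f"{user_id}_{i}" for i in range(SUFFIXES)]

assert write_key("user_123") in read_keys("user_123")
assert len(read_keys("user_123")) == SUFFIXES
```

The tradeoff is explicit: write load spreads 10 ways, but every read now costs 10 lookups plus a merge.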

Always ask about data access patterns before choosing a partition key. If users mostly look up by user_id, partition by user_id. If they mostly query by location, partition by region. The wrong partition key causes hot spots that no amount of hardware can fix.

โฆ

Foundation 10

Load Balancing

A load balancer distributes incoming traffic across multiple servers. It sits between clients and your server pool, providing: horizontal scaling, fault tolerance, and session management.

Load Balancing Algorithms

  • Round Robin: Send request 1 to server A, 2 to B, 3 to C, cycle. Simple, ignores server load. Bad if servers are heterogeneous.
  • Weighted Round Robin: More powerful servers get more requests. Better resource utilisation.
  • Least Connections: Send to the server with fewest active connections. Better than round robin when requests have variable duration.
  • Least Response Time: Combine fewest connections + fastest response time. Most intelligent but requires monitoring overhead.
  • IP Hash: Hash client IP to always route to the same server. Useful for session affinity (sticky sessions). Problem: if server dies, those sessions are lost.
  • Consistent Hashing: Used by CDNs and distributed caches. Minimises reassignment when servers change.
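
To make the first and third algorithms concrete, here is a minimal sketch of round robin versus least connections, with invented server names and no health checking:

```python
import itertools

class RoundRobin:
    """Cycle through servers regardless of their current load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)
    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the server with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}
    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1      # request starts
        return server
    def done(self, server):
        self.active[server] -= 1      # request finishes

rr = RoundRobin(["a", "b"])
assert [rr.pick() for _ in range(4)] == ["a", "b", "a", "b"]

lc = LeastConnections(["a", "b"])
assert lc.pick() == "a"   # both idle; tie goes to the first server
assert lc.pick() == "b"   # "a" now has a request in flight, so "b" wins
```

Round robin is stateless and cheap; least connections needs the balancer to track in-flight counts, which is why it copes better with variable-duration requests.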

Layer 4 vs Layer 7 Load Balancing

L4 (Transport Layer)

  • Routes based on IP + TCP/UDP
  • Very fast, minimal overhead
  • Cannot inspect HTTP headers
  • Cannot route based on URL path
  • Examples: AWS NLB, HAProxy TCP mode

L7 (Application Layer)

  • Routes based on HTTP headers, URL, cookies
  • Can route /api to API servers, /static to CDN
  • Can modify requests/responses
  • SSL termination here
  • Examples: NGINX, AWS ALB, Envoy
โฆ

Foundation 11

Reverse Proxy & API Gateway

Reverse Proxy (NGINX, HAProxy)

  • SSL termination
  • Static file serving
  • Compression
  • Caching
  • Rate limiting
  • Hides backend topology

API Gateway (Kong, AWS API GW)

  • Authentication & authorisation
  • Rate limiting per client
  • Request/response transformation
  • Service routing in microservices
  • Analytics & logging
  • Circuit breaking

In a microservices architecture, every external request should enter through an API Gateway. This gives you a single point to enforce auth, rate limiting, and observability โ€” instead of implementing these in every service.

โฆ

Foundation 12

Content Delivery Network (CDN)

A CDN is a globally distributed network of cache servers (edge nodes) that serve content from the location nearest to the user. This reduces latency from ~150ms (cross-continent) to ~10ms (local edge node).

Amazon doesn't ship everything from one warehouse in Seattle. They have fulfilment centres near major cities. When you order, it ships from the closest warehouse. A CDN works the same way โ€” your JavaScript files, images, and videos are pre-positioned close to users worldwide.

Push vs Pull CDN

Push CDN

  • You upload content to CDN proactively
  • Good for large, static, infrequently changing files
  • More control over what's cached
  • Higher storage cost

Pull CDN

  • CDN fetches from origin on cache miss
  • First request is slow; subsequent are fast
  • Less maintenance, auto-scales
  • Popular for web assets
โฆ

Foundation 13

Rate Limiting

Rate limiting controls how many requests a client can make in a time window. Essential for: preventing abuse, preventing DDoS, ensuring fair usage, and protecting downstream services from being overwhelmed.

Algorithms

  • Token Bucket: Bucket holds N tokens. Each request consumes one token. Tokens replenish at a constant rate. Allows short bursts (up to bucket size) while maintaining a long-term rate. Used by: most APIs (Twitter, Stripe).
  • Leaky Bucket: Requests enter a queue (bucket) and are processed at a fixed rate. Smooths out bursts. If the queue is full, requests are dropped. Good for traffic shaping.
  • Fixed Window Counter: Count requests per fixed window (e.g., 100 req/minute). Simple. Vulnerable to boundary attacks (100 at 00:59 + 100 at 01:00 = 200 in 2 seconds).
  • Sliding Window Log: Track timestamps of all requests. Check if N requests in the past 60s. Accurate but memory-intensive.
  • Sliding Window Counter: Hybrid: use current + previous window weighted by overlap. Good accuracy at low memory cost.
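
A minimal single-process token bucket (the first algorithm above) might look like this. The injectable clock is purely a device to keep the demo deterministic; a production limiter would share its state across servers, e.g. in Redis:

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity` while
    enforcing a long-term rate of `rate` tokens per second."""
    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # replenish tokens accrued since the last check, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Fake clock so the demo is deterministic:
t = [0.0]
bucket = TokenBucket(capacity=3, rate=1, clock=lambda: t[0])
assert [bucket.allow() for _ in range(4)] == [True, True, True, False]  # burst of 3
t[0] += 2.0                         # two seconds pass -> two tokens refill
assert bucket.allow() and bucket.allow()
assert not bucket.allow()
```

Note how the burst of 3 succeeds immediately while the fourth request is rejected: that is the "allow short bursts, enforce the long-term rate" behaviour the bullet describes.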

Rate limiting in distributed systems: if you have 10 servers, a per-server counter means each server allows N requests, so the real limit is 10N. You need a shared counter in Redis. Use Redis INCR with EXPIRE for distributed rate limiting. Be careful of race conditions: use a Lua script or atomic pipeline.

โฆ

Foundation 14

Circuit Breakers

A circuit breaker is a pattern that prevents cascading failures. When a downstream service starts failing, the circuit breaker "opens" and stops sending requests to it, giving it time to recover.

Like an electrical circuit breaker: when there's a fault (short circuit), the breaker trips and stops power flow, preventing the fault from burning out the whole system. Once the fault is fixed, you reset the breaker.

Three States

  • Closed (normal): Requests flow through. Failures are counted. If failures exceed threshold, transition to Open.
  • Open (tripped): All requests immediately fail (fast fail) without hitting the downstream service. After a timeout, transition to Half-Open.
  • Half-Open (testing): Allow a small number of trial requests. If they succeed, transition to Closed. If they fail, back to Open.

Cascading failure example: Service A calls Service B. B is slow (taking 30s per request). A's thread pool fills up waiting for B. Now A is slow. Service C calls A, and C's thread pool fills up. The entire system grinds to a halt because of one slow downstream service. A circuit breaker at A→B would have failed fast and kept A's thread pool free.
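
A minimal sketch of the three-state machine might look like this. Thresholds, names, and the fake clock are illustrative; production libraries (e.g. resilience4j) add rolling failure windows and metrics:

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open once `reset_after` seconds have passed;
    a trial success closes the circuit, a trial failure reopens it."""
    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_after:
                self.state = "half-open"     # let a trial request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"          # trip (or re-trip) the breaker
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"
        return result

# Deterministic demo with a fake clock:
t = [0.0]
cb = CircuitBreaker(threshold=2, reset_after=30.0, clock=lambda: t[0])

def flaky():
    raise ValueError("downstream timeout")

for _ in range(2):                           # two failures trip the breaker
    try:
        cb.call(flaky)
    except ValueError:
        pass
assert cb.state == "open"

try:
    cb.call(lambda: "ok")                    # fast fail, downstream untouched
except RuntimeError:
    pass

t[0] = 31.0                                  # reset window has elapsed
assert cb.call(lambda: "ok") == "ok"         # half-open trial succeeds
assert cb.state == "closed"
```

The crucial property is the fast fail in the open state: callers get an immediate error instead of tying up a thread for 30 seconds.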

โฆ

Foundation 15

Idempotency

An operation is idempotent if calling it multiple times produces the same result as calling it once. Critical for distributed systems where retries are inevitable.

Classic disaster: User clicks "Pay £50". Network times out. Client retries. User is charged twice. To prevent this, the payment endpoint must be idempotent: if the same payment request arrives twice, it processes once. Idempotency keys (a UUID sent by the client) are the standard solution.

Making Operations Idempotent

  • HTTP GET, PUT, and DELETE are idempotent by definition; POST is not.
  • Idempotency keys: Client sends a unique key with each request. Server stores the key + result. If the key is seen again, return the cached result without reprocessing.
  • Conditional updates: Use version numbers or ETags. "Update record only if version = 5." If version has changed, reject the update.
  • Natural idempotency: "Set balance to £100" is idempotent. "Add £10 to balance" is not. Design APIs to use set semantics where possible.
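
An idempotency-key handler can be sketched in a few lines. The `charge` function and the key value are stand-ins invented for illustration; a real implementation would persist results durably and record them atomically with the charge:

```python
results = {}   # in production: a durable store (Redis, a DB table)

def charge(amount):
    """Stand-in for the real payment-processor call (the side effect)."""
    charge.calls += 1
    return f"charged {amount}"
charge.calls = 0

def pay(idempotency_key, amount):
    """Process a payment at most once per client-supplied key."""
    if idempotency_key in results:
        return results[idempotency_key]   # replay: return the cached result
    result = charge(amount)
    results[idempotency_key] = result     # remember before acknowledging
    return result

key = "b7f3c2d0"   # hypothetical key the client generated
assert pay(key, 50) == "charged 50"
assert pay(key, 50) == "charged 50"   # retry after a timeout
assert charge.calls == 1              # the card was charged exactly once
```

The client's job is simply to reuse the same key on every retry of the same logical payment, and to generate a fresh key for a genuinely new one.
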
โฆ

Foundation 16

Fault Tolerance & Availability

The Nines of Availability

SLA                     Downtime/Year   Downtime/Month
99%     (two nines)     3.65 days       7.2 hours
99.9%   (three nines)   8.7 hours       43.8 minutes
99.99%  (four nines)    52 minutes      4.4 minutes
99.999% (five nines)    5.2 minutes     26 seconds

If your system calls three services each with 99.9% availability, the combined availability is 99.9% × 99.9% × 99.9% ≈ 99.7%. Chaining services degrades availability. This is why timeouts, retries, fallbacks, and circuit breakers are essential.
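
The arithmetic generalises to any chain of serial dependencies; a small helper (invented for illustration) makes the decay obvious:

```python
def chained_availability(*slas):
    """Serial dependencies multiply: the request fails if any hop fails."""
    combined = 1.0
    for sla in slas:
        combined *= sla
    return combined

# Three 99.9% services chained:
assert round(chained_availability(0.999, 0.999, 0.999), 4) == 0.997
# Ten 99.9% services: overall availability falls below 99.1%
assert chained_availability(*[0.999] * 10) < 0.9911
```

This is also why redundancy helps: two *parallel* replicas each at 99% fail together only 0.01% of the time, taking the pair to 99.99%.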

Strategies for High Availability

  • Redundancy: Multiple instances of every component. No single point of failure.
  • Health checks + auto-restart: Load balancers probe health endpoints and route around unhealthy nodes.
  • Graceful degradation: If recommendations service is down, show popular items. Don't make the whole page fail.
  • Chaos engineering: Intentionally kill services in production to test failover (Netflix Chaos Monkey).
  • Geographic redundancy: Run in multiple regions. If us-east-1 goes down, fail over to eu-west-1.

Knowledge Check

Test Your Understanding

1. A distributed database must choose between strong consistency and availability during a network partition. Which theorem describes this tradeoff?

ACID theorem
CAP theorem
PACELC theorem
BASE theorem

2. A payment API receives the same £50 charge request twice due to a network retry. What property prevents the user from being charged twice?

Atomicity
Consistency
Idempotency
Durability

3. Service A calls Service B, which is slow (30s response), causing Service A's thread pool to fill up. Service C calls A and the same thing happens. What pattern prevents this cascading failure?

Rate Limiter
Load Balancer
Reverse Proxy
Circuit Breaker