Module 10

Real System Design
Interviews

20 complete system designs, each with requirements, capacity estimation, architecture, database design, scaling strategy, and tradeoff analysis.

How to use this module: First, spend 35 minutes designing each system yourself without looking at the solution. Then compare. The thinking process matters more than the answer. Interviewers are evaluating your reasoning, not whether you got the "right" answer.

Covered systems: URL Shortener · YouTube · WhatsApp · Uber · Instagram · Twitter/X · Netflix · Google Drive · Spotify · Discord · Notification System · Search Engine · Payment System · News Feed · Chat System · Rate Limiter · Web Crawler · Pastebin · Ticketmaster · Zoom

Design a URL Shortener (bit.ly / TinyURL)

Classic starter problem · Hashing · Redirects · High read traffic


Functional Requirements

  • Given a long URL, return a short URL
  • Redirect short URL to original long URL
  • Custom aliases (optional)
  • Expiry time per URL
  • Analytics: click count, geo, referrer

Non-Functional Requirements

  • 100M URLs created per day
  • 10:1 read:write ratio
  • URLs must be unique
  • Redirect latency < 10ms
  • 99.99% availability
  • Short URLs: 7 characters

Capacity Estimation

Write QPS: 100M / 86,400 ≈ 1,200 writes/sec
Read QPS: 10 × 1,200 = 12,000 reads/sec
Storage: 100M/day × 365 × 10 years ≈ 365B URLs
Each URL ≈ 500 bytes → 365B × 500 bytes ≈ 182 TB

Core Design: Short URL Generation

Option 1 - Hash the URL: MD5(longURL) → base62-encode the digest → take the first 7 characters. Problem: collisions (different URLs can map to the same 7-character prefix). Handle collisions by appending a suffix and retrying.

Option 2 - Auto-increment ID: DB auto-increment ID → base62 encode. 7 chars of base62 = 62⁷ ≈ 3.5 trillion URLs. Simple, no collisions. Problem: IDs are predictable (enumerable). Mitigate by adding a random offset or bijectively scrambling the ID bits before encoding.

Option 3 - Key Generation Service (KGS): pre-generate 7-char keys in advance, store them in a Key DB, and assign one per request. Eliminates the hotspot on ID generation and needs no collision handling at request time. (Twitter's Snowflake solves a related problem: generating unique, time-sortable 64-bit IDs across machines without coordination.) Used in production.
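A minimal sketch of the base62 encoding step from Option 2, assuming the integer ID comes from a database sequence (the alphabet ordering here is an arbitrary choice):

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

# Example: encode an auto-increment ID; anything above 62^6 (~56.8 billion) yields 7 characters
print(base62_encode(125_000_000_000))   # a 7-character code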

High-Level Architecture

Client → Load Balancer → URL Service (stateless, N instances)

Write path: fetch a pre-generated key from the Key DB → store {shortKey, longURL} in MySQL
Read path:  Redis cache (CDN edge) → on miss → MySQL

Short URL redirect: GET /abc1234 → Redis lookup → 301/302 redirect to longURL
Cache hit rate: ~99% (popular URLs cached at the edge)
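A sketch of the hot redirect path under these assumptions: a Flask handler, a local Redis cache, and a hypothetical db_lookup_long_url() stub standing in for the MySQL query:

import redis
from flask import Flask, abort, redirect

app = Flask(__name__)
cache = redis.Redis(decode_responses=True)

def db_lookup_long_url(short_code: str) -> str | None:
    return None   # stub: query MySQL for the short_code → longURL mapping here

@app.route("/<short_code>")
def follow(short_code: str):
    long_url = cache.get(f"url:{short_code}")
    if long_url is None:
        long_url = db_lookup_long_url(short_code)
        if long_url is None:
            abort(404)
        cache.set(f"url:{short_code}", long_url, ex=3600)   # cache hot URLs for an hour
    # 302 keeps analytics on your servers; 301 offloads repeat hits to browsers
    return redirect(long_url, code=302)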

Key Design Decisions

  • 301 vs 302 redirect: 301 (permanent) is cached by browsers, which reduces server load but loses analytics (browsers don't re-request). 302 (temporary) hits your server every time, giving better analytics. Choose based on requirements.
  • Database: MySQL with key-value access pattern. Could use Cassandra for global scale. Redis for hot URLs (99% of reads will be top 20% of URLs).
  • Analytics: Stream click events to Kafka → Flink/Spark → analytics DB (ClickHouse/BigQuery). Don't put analytics logic in the hot redirect path.
  • Custom aliases: Store in same table, check uniqueness on create. Rate limit per user to prevent squatting.

Design YouTube / Netflix Video Streaming

Video upload · Transcoding pipeline · CDN streaming · Adaptive bitrate


Functional Requirements

  • Upload videos (up to 10GB)
  • Stream videos globally
  • Multiple resolutions (360p-4K)
  • Recommendations, search, comments
  • Like/dislike, subscriptions

Scale

  • 2 billion users, 500 hours/min uploaded
  • 1 billion hours watched per day
  • Read:write ≈ 1000:1
  • Peak: 10M concurrent viewers

Video Upload & Processing Pipeline

User → Upload Service → raw video in blob storage (S3)
        ↓
Message Queue (Kafka/SQS)
        ↓
Transcoding Workers (many, in parallel)
  ├── 360p  → CDN
  ├── 720p  → CDN
  ├── 1080p → CDN
  └── 4K    → CDN
        ↓
Metadata DB: {videoId, title, userId, urls[], status}
        ↓
Notify user: "Video ready"

Adaptive Bitrate Streaming (ABR)

Videos are split into 2-10 second segments. The player downloads a manifest file (HLS/DASH) listing all available quality levels. Based on measured network speed, it switches between qualities mid-stream: if bandwidth drops, it steps down to 360p; if it improves, it steps back up to 1080p, seamlessly.
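A toy version of the player-side decision, with an assumed bitrate ladder (real players such as hls.js or ExoPlayer also smooth their bandwidth estimates and factor in buffer level):

RENDITIONS = [           # (name, bitrate in bits/sec) - illustrative ladder
    ("360p", 1_000_000),
    ("720p", 3_000_000),
    ("1080p", 6_000_000),
    ("4k", 16_000_000),
]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits inside the measured bandwidth."""
    usable = measured_bps * safety        # leave headroom so the buffer doesn't drain
    best = RENDITIONS[0][0]
    for name, bitrate in RENDITIONS:
        if bitrate <= usable:
            best = name
    return best

print(pick_rendition(5_200_000))   # "720p": only 4.16 Mbps of usable bandwidth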

CDN Strategy

99% of views are on videos uploaded >24h ago. Pre-warm CDN edge nodes with popular videos. New video uploads go to origin, then are pushed/pulled to CDN as popularity grows. Long-tail videos stay on origin storage (rarely accessed, no need to cache everywhere).

Database Design

Videos:   {videoId, userId, title, description, thumbnail_url, segment_urls[], duration, status, created_at}
Users:    {userId, name, channel_name, subscriber_count}
Comments: {commentId, videoId, userId, text, likes, created_at} → Cassandra (write-heavy)
Likes:    {videoId, userId} → Redis sorted set for counts
Views:    Kafka → Flink aggregation → analytics store (not a real-time DB)

Key Tradeoffs

  • View count: Don't update the DB on every view. Buffer increments in Redis and flush to the DB every minute (sketch after this list). Accept slight inaccuracy.
  • Recommendations: Offline ML pipeline. Pre-compute recommendations per user. Store in Redis. Don't compute in real-time for each request.
  • Search: Elasticsearch for full-text title/description search. Index updated asynchronously after video is published.
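A hedged sketch of that view-count buffering, with illustrative key names and an assumed DB-API style handle passed in by the flush worker:

import redis

r = redis.Redis(decode_responses=True)

def record_view(video_id: str) -> None:
    r.hincrby("view_buffer", video_id, 1)        # O(1) Redis write, no DB hit on the hot path

def flush_view_buffer(db) -> None:
    """Run every minute from a worker; moves buffered counts into the videos table."""
    buffered = r.hgetall("view_buffer")
    if not buffered:
        return
    r.delete("view_buffer")                      # tiny race window, acceptable for view counts
    for video_id, count in buffered.items():
        db.execute(
            "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
            (int(count), video_id),
        )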

Design WhatsApp / Chat System

Real-time messaging · Message delivery guarantees · Group chat · Media sharing


Functional Requirements

  • 1:1 real-time messaging
  • Group chats (up to 1000 members)
  • Message delivery receipts (sent/delivered/read)
  • Media sharing (images, videos, documents)
  • Online presence indicator
  • End-to-end encryption

Scale

  • 2 billion active users
  • 100 billion messages/day ≈ 1.2M msg/sec
  • Each user: ~50 messages/day avg
  • Max message size: 64KB text, 2GB media

Connection Architecture

Client โ†โ†’ WebSocket โ†โ†’ Chat Server (stateful) โ†“ Message Queue (Kafka) โ†“ Message Processor โ”œโ”€โ”€ Persist to Cassandra โ”œโ”€โ”€ Route to recipient's Chat Server โ””โ”€โ”€ Push notification if offline Service Registry (ZooKeeper): User 123 โ†’ connected to Chat Server A:8001 User 456 โ†’ connected to Chat Server B:8003 Routing: User A โ†’ sends to User B Chat Server A looks up: "User B is on Server B" Chat Server A โ†’ publishes to Kafka partition(userId_B) Chat Server B consumes โ†’ delivers to User B's WebSocket

Message Storage

Cassandra table: messages
  PRIMARY KEY (chat_id, message_id)   -- chat_id partitions; message_id (timeuuid) orders within the partition
  Fields: sender_id, content, type (text/image/video), status, created_at

Why Cassandra:
  - Write-heavy (1.2M msg/sec)
  - Append-only (messages are immutable)
  - Time-ordered within a partition
  - Horizontal scale, no single point of failure

Group Messaging at Scale

For a group of 1000 members: when a message is sent, the server must deliver to all 1000 members. Fan-out approaches:

  • Fan-out on write: When message received, immediately deliver to all 1000 members' queues. Low read latency but high write amplification. OK for small groups (<100).
  • Fan-out on read: Store message once. Each member fetches on next poll. Lower write load but higher read latency. Good for large groups.
  • Hybrid: WhatsApp uses a hybrid: deliver immediately to online members (fan-out on write) and store for offline members, who fetch when they reconnect (fan-out on read). See the sketch after this list.
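A toy sketch of that hybrid delivery, using in-memory dicts as stand-ins for the service registry, the WebSocket layer, and the offline store:

online_sessions = {"alice": "chat-server-a", "bob": "chat-server-b"}   # user -> server
offline_inbox: dict[str, list[str]] = {}                               # user -> pending message IDs

def deliver_group_message(message_id: str, members: list[str]) -> None:
    for member in members:
        server = online_sessions.get(member)
        if server is not None:
            print(f"push {message_id} to {member} via {server}")       # fan-out on write
        else:
            offline_inbox.setdefault(member, []).append(message_id)    # fetched on reconnect

deliver_group_message("m1", ["alice", "bob", "carol"])
print(offline_inbox)   # {'carol': ['m1']}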

End-to-End Encryption

WhatsApp uses the Signal Protocol. Each device has a public/private key pair. Messages are encrypted with the recipient's public key, and only the recipient's private key can decrypt them. WhatsApp servers see only encrypted bytes; even they cannot read your messages.


Design Uber / Ride Matching System

Geo-spatial queries · Real-time location tracking · Matching algorithm · Surge pricing


Functional Requirements

  • Rider requests a ride with pickup/dropoff
  • Match rider to nearest available driver
  • Real-time driver location tracking
  • ETA estimation
  • Trip tracking during ride
  • Pricing (surge based on supply/demand)

Scale

  • 5M rides/day globally
  • 3M active drivers
  • Driver location update every 4 seconds
  • 3M × 15 updates/min ≈ 750K location updates/sec

Location Storage - The Core Problem

The challenge: Given a rider's location, find all available drivers within 5km, in milliseconds, updated every 4 seconds, for 3M drivers globally. A naive SQL query WHERE distance(lat, lon, driver_lat, driver_lon) < 5000 requires computing the distance for every driver: O(3M) per request.

Solution: Geohash / S2 cells. Divide the world into a grid. Each cell has an ID. A driver at lat/lon belongs to cell "9q8yy". "Find nearby drivers" becomes "find all drivers in cell 9q8yy and adjacent cells": a simple key lookup.

Redis geospatial index:
  GEOADD active_drivers -73.9857 40.7484 driver_123     (args: longitude latitude member; these coords are NYC)
  GEORADIUS active_drivers -73.9857 40.7484 5 km
    → [driver_123, driver_456, ...]

Location update every 4 seconds per driver:
  GEOADD active_drivers {new_lon} {new_lat} driver_123

That is 750K writes/sec → you need a Redis cluster and pipelined writes.
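A sketch of the same commands from Python; execute_command is used to sidestep differences between client-library geo helpers, and the key name is illustrative:

import redis

r = redis.Redis(decode_responses=True)

def update_driver_location(driver_id: str, lon: float, lat: float) -> None:
    # In production these 750K updates/sec go through pipelines on a Redis cluster.
    r.execute_command("GEOADD", "active_drivers", lon, lat, driver_id)

def nearby_drivers(lon: float, lat: float, radius_km: float = 5.0) -> list[str]:
    return r.execute_command("GEORADIUS", "active_drivers", lon, lat, radius_km, "km", "ASC")

update_driver_location("driver_123", -73.9857, 40.7484)
print(nearby_drivers(-73.9857, 40.7484, 2.0))   # ['driver_123']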

Matching Algorithm

1. Rider requests a ride → Matching Service
2. Get nearby drivers (Redis GEORADIUS, r = 2km; expand to 5km if too few results)
3. Filter: available, correct vehicle type, acceptable rating
4. Rank by: ETA (traffic-aware, not straight-line distance), driver rating, surge zone
5. Send the ride offer to the top driver (5-second timeout)
6. If rejected or timed out → offer to the next driver
7. If accepted → create the trip, notify the rider, start tracking
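A toy version of steps 5-7, the offer loop; send_offer is an illustrative stand-in for pushing the offer to the driver app and waiting up to 5 seconds for a reply:

import random

def send_offer(driver_id: str, timeout_s: int = 5) -> bool:
    """Stand-in for the offer push + acceptance wait."""
    return random.random() < 0.5        # pretend roughly half of drivers accept

def match(ranked_driver_ids: list[str]) -> str | None:
    for driver_id in ranked_driver_ids:
        if send_offer(driver_id):
            return driver_id            # accepted: create the trip, notify the rider
    return None                         # nobody accepted: widen the search radius and retry

print(match(["driver_123", "driver_456", "driver_789"]))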

Surge Pricing

Surge multiplier = f(demand/supply in a geohash cell), computed every 5 minutes. Demand = ride requests; supply = available drivers. If demand/supply > threshold → trigger surge. The multiplier increases until supply meets demand (a higher price attracts more drivers). Implemented as a background job, with the result stored in Redis keyed by geohash.
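A toy surge formula per cell; the threshold, slope, and cap below are assumptions, not Uber's actual numbers:

def surge_multiplier(requests: int, available_drivers: int,
                     threshold: float = 1.2, slope: float = 0.5, cap: float = 3.0) -> float:
    if available_drivers == 0:
        return cap
    ratio = requests / available_drivers          # demand / supply in this geohash cell
    if ratio <= threshold:
        return 1.0                                # no surge
    return min(cap, round(1.0 + (ratio - threshold) * slope, 1))

print(surge_multiplier(requests=90, available_drivers=30))   # 1.9 with these assumed constants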


Design Instagram / Photo Sharing

Media upload · News feed · Follow graph · Stories


Functional Requirements

  • Upload photos/videos with captions
  • Follow/unfollow users
  • Home feed (posts from followed users)
  • Like, comment on posts
  • Stories (24h expiry)
  • Explore/discovery page

Scale

  • 1 billion MAU, 500M DAU
  • 100M photos uploaded/day
  • 4.2B likes/day
  • Read:write ≈ 100:1
  • Celebrity accounts: 100M+ followers

Feed Generation - The Hard Part

Fan-out on write (push model): When a user posts, push to all followers' feed caches immediately. Reading your feed is fast (pre-computed). Problem: a celebrity with 100M followers posting = 100M cache writes. Celebrities can't use the push model.

Fan-out on read (pull model): When you open your feed, fetch the latest posts from everyone you follow. Merge and rank. Problem: if you follow 5000 people, you need 5000 DB reads per feed load. Too slow.

Hybrid (Instagram's actual approach): Regular users: push model (pre-computed feed). Celebrity users: pull model on read, merged with your pre-computed feed. Best of both worlds.
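A toy sketch of that hybrid read path, using in-memory dicts in place of the Redis feed lists and the celebrity post store:

precomputed_feed = {"user_42": [("post_9", 1700000300), ("post_7", 1700000100)]}   # pushed at write time
celebrity_posts  = {"celeb_1": [("post_8", 1700000200)]}                            # pulled at read time
follows_celebs   = {"user_42": ["celeb_1"]}

def home_feed(user_id: str, limit: int = 20) -> list[str]:
    merged = list(precomputed_feed.get(user_id, []))
    for celeb in follows_celebs.get(user_id, []):
        merged.extend(celebrity_posts.get(celeb, []))
    merged.sort(key=lambda item: item[1], reverse=True)    # rank by recency (real ranking is ML-driven)
    return [post_id for post_id, _ in merged[:limit]]

print(home_feed("user_42"))   # ['post_9', 'post_8', 'post_7']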

Architecture

Post upload:
  User → Upload Service → S3 (raw image)
            ↓ async
  Image Processing (resize, push to CDN)
            ↓
  PostgreSQL (metadata) + Cassandra (feed events)
            ↓
  Kafka → Feed Worker → push post IDs to followers' Redis feed lists

Feed read:
  User → Feed Service → Redis (precomputed feed list of post IDs)
            ↓ batch fetch
  PostgreSQL (post metadata, like counts)
            ↓
  CDN URLs for images → response assembled

Follow Graph

Store follow relationships in a graph-like structure. For feed generation, you need: "give me the user IDs of everyone that user 123 follows." Use a column-family DB (Cassandra) or a denormalised Redis set. Avoid JOINs on a large social graph; they don't scale.

๐Ÿฆ

Design Twitter/X

Tweets · Timeline · Trending · Search · Notifications


Functional Requirements

  • Post tweets (280 chars, media)
  • Follow users
  • Home timeline (followed users' tweets)
  • Retweet, like, reply
  • Trending hashtags
  • Search tweets

Scale

  • 300M MAU, 100M DAU
  • 500M tweets/day ≈ 6K tweets/sec
  • Read:write ≈ 1000:1
  • Timeline read: <100ms p99
  • Max followers: Elon Musk ~150M

Timeline Architecture

Tweet posted by user 123 (5K followers):
  → written to the Tweets table (MySQL/Cassandra)
  → Kafka event: {tweetId, authorId, timestamp}
  → Timeline Worker: for each follower_id, push tweetId to their Redis timeline list
      LPUSH timeline:{follower_id} {tweetId}   (keep the last 800 tweets)

User reads the home timeline:
  → LRANGE timeline:{user_id} 0 19   (get 20 tweet IDs from Redis)
  → batch-fetch tweet data from the Tweets cache/DB
  → merge with "celebrity tweets" fetched directly at read time

Trending hashtags:
  → count #hashtag occurrences in real time with Redis ZINCRBY
  → ZREVRANGE trending 0 9 → top 10 trending
  → recompute every 5 minutes, cache the result
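A hedged sketch of those Redis operations with the redis-py client (key names follow the scheme above):

import redis

r = redis.Redis(decode_responses=True)

def fan_out_tweet(tweet_id: str, follower_ids: list[str]) -> None:
    pipe = r.pipeline()
    for follower_id in follower_ids:
        key = f"timeline:{follower_id}"
        pipe.lpush(key, tweet_id)
        pipe.ltrim(key, 0, 799)                 # keep only the newest 800 entries
    pipe.execute()

def read_timeline(user_id: str, count: int = 20) -> list[str]:
    return r.lrange(f"timeline:{user_id}", 0, count - 1)

def record_hashtag(tag: str) -> None:
    r.zincrby("trending", 1, tag)               # ZINCRBY trending 1 {tag}

def top_trending(n: int = 10) -> list[str]:
    return r.zrevrange("trending", 0, n - 1)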

Design Google Drive / Dropbox

File storage · Sync · Sharing · Conflict resolution · Versioning


Functional Requirements

  • Upload/download files (any size)
  • File sync across devices
  • Share files/folders with others
  • File versioning (restore previous versions)
  • Offline access
  • Collaborative editing

Scale

  • 1 billion users, 15GB free storage each
  • Total: 1B × 15GB = 15 EB of data
  • Strong consistency required (no conflicting edits)

File Upload - Block-Level Chunking

Large file upload (e.g., a 4GB video):
  1. Client splits the file into 4MB chunks
  2. For each chunk: hash(chunk) → check whether it is already stored (deduplication)
  3. Upload only the missing chunks to S3 (parallel, resumable)
  4. Once all chunks are uploaded: send a manifest {file_id, chunk_ids[]} to the Metadata DB

Benefits:
  - Resume interrupted uploads (only re-upload the missing chunks)
  - Deduplication: if 1,000 users upload the same PDF, store it once
  - Delta sync: only upload the changed chunks when a file is updated
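A minimal sketch of the client-side chunk hashing; already_stored stands in for the Block DB lookup that drives deduplication and delta sync:

import hashlib
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024   # 4MB chunks

def chunk_hashes(path: Path) -> list[str]:
    hashes = []
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())   # content-addressed chunk ID
    return hashes

def chunks_to_upload(path: Path, already_stored: set[str]) -> list[int]:
    """Indexes of the chunks the server does not already have."""
    return [i for i, h in enumerate(chunk_hashes(path)) if h not in already_stored]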

Sync Algorithm

Client maintains local file metadata + last_sync_version.
On change: upload the delta → server stores a new version and broadcasts to the other clients via WebSocket.
Other clients: receive the notification → download the changed chunks → apply them to the local copy.

Conflict resolution (two clients edit the same file offline):
  → both upload their changes
  → server creates a conflict copy: "file (John's copy).docx"
  → the user resolves it manually (Google Docs uses CRDTs for real-time editing, so conflicts don't arise)

Storage Architecture

  • Blob storage: S3/GCS for actual file chunks. Content-addressed by hash.
  • Metadata DB: PostgreSQL for file hierarchy, permissions, version history, user accounts.
  • Block DB: Mapping of chunk hashes to storage locations. Enables deduplication across users.
  • CDN: For frequently accessed files: thumbnails, profile pictures, shared documents.

Design a Notification System

Push · Email · SMS · In-app · Prioritisation · Throttling


Functional Requirements

  • Support push (iOS/Android), email, SMS, in-app
  • User notification preferences
  • Priority levels (critical vs marketing)
  • Deduplication (don't send same notification twice)
  • Scheduled notifications
  • Template management

Scale

  • 10M notifications/day baseline
  • Spikes: 100M/day for breaking news
  • Email SLA: delivered within 1 minute
  • SMS SLA: delivered within 30 seconds
  • Push: delivered within 5 seconds

Architecture

Event Sources (any service) → Notification Service API
        ↓
Notification DB (stores every notification and its status)
        ↓
Kafka topics (partitioned by user_id for ordering):
  ├── push-notifications
  ├── email-notifications
  └── sms-notifications
        ↓
Workers (per channel):
  ├── Push Worker  → APNs (Apple) / FCM (Google)
  ├── Email Worker → SendGrid / SES
  └── SMS Worker   → Twilio / SNS
        ↓
Delivery status → update the Notification DB
Failed → DLQ → retry with exponential backoff

Key Design Decisions

  • Deduplication: Before sending, check Redis: SET sent:{notifId} 1 NX EX 86400. NX = only set if the key does not already exist. If it was already set, skip sending (see the sketch after this list).
  • User preferences: Check opt-out before sending. Cache preferences in Redis to avoid DB lookup per notification.
  • Throttling: Don't send 50 emails in 5 minutes to one user. Use a rate limiter per user per channel per time window.
  • Priority queues: Critical alerts (security, transaction) jump the queue. Marketing notifications are deprioritised during high load.
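A small sketch of that dedup check with redis-py; the key prefix and TTL mirror the deduplication bullet above:

import redis

r = redis.Redis(decode_responses=True)

def should_send(notification_id: str) -> bool:
    # SET ... NX EX: only the first caller gets True; duplicates are blocked for 24h.
    return bool(r.set(f"sent:{notification_id}", 1, nx=True, ex=86_400))

if should_send("notif_123"):
    print("send it")
else:
    print("duplicate, skip")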

Design a Distributed Rate Limiter

Token bucket · Redis · Per-user limits · Multi-region


Requirements

  • Limit requests per user/IP per time window
  • Different limits per API endpoint
  • Return rate limit headers (X-RateLimit-*)
  • Distributed (multiple servers, same limits)
  • Low latency (<5ms overhead)

Scale

  • 10,000 QPS per server
  • 100 server instances
  • Rate limit state must be shared
  • Redis cluster for shared state

Rate-Limit Counter in Redis (Lua)

-- Fixed-window counter, run atomically in Redis as a Lua script
-- (a token bucket is a refinement that smooths bursts; the skeleton is the same).
-- KEYS[1] = "ratelimit:{user_id}:{endpoint}"
-- ARGV[1] = limit (e.g. 100 requests), ARGV[2] = window in seconds (e.g. 60)
local limit  = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], window)   -- start the window on the first request
end
if count > limit then
  return 0  -- rate limited
end
return 1    -- allowed
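A hedged sketch of calling that script from the middleware with redis-py's EVAL; in production you would load it once and use EVALSHA, and the file name here is illustrative:

import redis

r = redis.Redis(decode_responses=True)
RATE_LIMIT_LUA = open("ratelimit.lua").read()   # the script above, saved to a file

def allow_request(user_id: str, endpoint: str, limit: int = 100, window: int = 60) -> bool:
    key = f"ratelimit:{user_id}:{endpoint}"
    return r.eval(RATE_LIMIT_LUA, 1, key, limit, window) == 1

if not allow_request("user_42", "/search"):
    print("429 Too Many Requests")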

Architecture

Client request → API Gateway
        ↓
Rate Limiter Middleware
        ↓
Redis cluster (shared counter per user + endpoint)
  ├── INCR + EXPIRE per window
  └── returns: {allowed: true/false, remaining: N, reset_at: T}
        ↓
If allowed → forward to the backend
If denied  → 429 Too Many Requests + Retry-After: 30

Design Ticketmaster / High-Concurrency Booking

Seat reservation · Race conditions · Virtual queue · Inventory management


Functional Requirements

  • Browse events, view seat map
  • Reserve seats (10-min hold)
  • Purchase (payment + confirmation)
  • No double-booking (critical)
  • Handle traffic spikes (Taylor Swift = 10M concurrent)

The Core Challenge

  • Concert with 50,000 seats
  • 5M people try to buy at 10am
  • Must not oversell even 1 seat
  • Response must be fast (or queue)
  • Seat holds must expire correctly

Seat Reservation - Race Condition Prevention

Option 1 - Pessimistic locking (DB):
  BEGIN TRANSACTION
  SELECT * FROM seats WHERE id = 123 AND status = 'available' FOR UPDATE
  -- lock held: no one else can select this seat
  UPDATE seats SET status = 'held', held_by = user_id, held_until = now() + 600
  COMMIT
  ✓ Correct, but locks queue up under high concurrency

Option 2 - Redis atomic hold:
  SET seat:123 user_456 NX EX 600
  -- NX = only if not exists, EX = expire in 600 seconds
  -- atomic: exactly one user wins the race
  ✓ Fast (sub-ms), expires automatically, no DB lock needed
  ✗ If Redis fails, holds may be lost (mitigate with persistence + replicas)

Option 3 - Virtual queue (Ticketmaster's approach):
  Under extreme load, put users in a virtual queue.
  Issue a queue-position token and show the estimated wait time.
  Process N users at a time from the queue.
  Prevents the stampede and gives fair access.
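A sketch of the Option 2 hold with redis-py; key names and the unlock check are illustrative (a small Lua script would make the release fully atomic):

import redis

r = redis.Redis(decode_responses=True)
HOLD_SECONDS = 600   # 10-minute payment window

def hold_seat(event_id: str, seat_id: str, user_id: str) -> bool:
    key = f"hold:{event_id}:{seat_id}"
    return bool(r.set(key, user_id, nx=True, ex=HOLD_SECONDS))   # exactly one buyer wins

def release_seat(event_id: str, seat_id: str, user_id: str) -> None:
    key = f"hold:{event_id}:{seat_id}"
    if r.get(key) == user_id:       # only the current holder may release early
        r.delete(key)

print(hold_seat("concert_1", "A12", "user_456"))   # True for the first caller
print(hold_seat("concert_1", "A12", "user_789"))   # False: the seat is already held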

Booking Flow

1. User views the event → cached seat map (Redis/CDN)
2. User selects a seat → reserve: Redis SET seat:X user:Y NX EX 600
3. If reserved: proceed to payment (10-minute window)
4. Payment succeeds → atomically release the Redis hold and mark the seat SOLD in the DB
5. Payment fails or times out → the Redis key expires → the seat becomes available again
6. Confirm the ticket → generate the PDF, send the email notification
Interview Strategy

How to Structure Any System Design Answer

The 35-minute framework interviewers use at FAANG:
  Requirements (5 min) → capacity estimation (3 min) → high-level design (10 min) → deep dive on 2-3 components (15 min) → wrap up with tradeoffs, bottlenecks, and improvements (2 min)

Red Flags in Interviews