Databases
Deep Dive
80% of system design problems are data layer problems. Master this module.
Database 01
Indexing & B+ Trees
An index is the single most impactful performance tool in a database. Understanding why indexes work the way they do, through B+ trees, gives you the intuition to design them correctly every time.
Imagine a textbook with no index. To find every mention of "sharding", you read every page. With an index, you go straight to page 347. A database without an index does a full table scan: it reads every row. With an index, it jumps directly to the matching rows. The tradeoff: the index itself takes up space and must be updated on every write.
How B+ Trees Work
Most database indexes use a B+ Tree: a balanced tree where all data lives in leaf nodes, and internal nodes are navigation pointers. Leaf nodes are linked in a doubly-linked list, enabling efficient range scans.
B+ Tree (index on age):
[30 | 50]
/ | \
[15|25] [35|45] [55|65]
| | | | | |
Data Data Data ← Leaf nodes (sorted, linked)
Query: WHERE age BETWEEN 25 AND 45
→ Find 25 in the tree (O(log n))
→ Scan linked leaf nodes rightward until > 45
→ Very efficient range scan!
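The walk above can be sketched in a few lines of Python: a simplified model in which sorted leaf pages are linked rightward, so a range query is one descent (here just a binary search into the first relevant leaf) plus a sequential scan. This is illustrative only, not a real B+ tree (no internal nodes, splits, or balancing):

```python
import bisect

# Simplified model: leaf "pages" hold sorted keys and link to the right sibling.
class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.next = None  # right sibling pointer, enables range scans

def range_scan(first_leaf, lo, hi):
    """Walk leaves rightward, collecting keys in [lo, hi]."""
    results, leaf = [], first_leaf
    while leaf is not None:
        start = bisect.bisect_left(leaf.keys, lo)
        for key in leaf.keys[start:]:
            if key > hi:
                return results  # keys are sorted, so we can stop early
            results.append(key)
        leaf = leaf.next
    return results

# Build the three leaves from the diagram and link them.
l1, l2, l3 = Leaf([15, 25]), Leaf([35, 45]), Leaf([55, 65])
l1.next, l2.next = l2, l3

print(range_scan(l1, 25, 45))  # [25, 35, 45]
```

Note that the scan never revisits the tree root: once positioned at 25, the sibling pointers alone carry the query to 45.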
Index Types
- Primary Index (Clustered): Data rows are physically stored in index order. Only one per table. Fast for range queries on primary key. In InnoDB, the primary key IS the clustered index.
- Secondary Index (Non-clustered): Separate structure pointing back to primary key. Multiple per table. Looking up a non-key column requires index lookup + primary key lookup (two hops).
- Composite Index: Index on (col_a, col_b, col_c). Useful for queries filtering/sorting multiple columns. Must follow the leftmost prefix rule: index (a,b,c) is used by queries on (a), (a,b), (a,b,c) but NOT (b,c) alone.
- Covering Index: Index contains all columns a query needs, so the query never touches the actual row. Extremely fast. Design covering indexes for your most critical queries.
- Partial Index: Index only rows matching a condition. e.g., WHERE status='active'. Smaller, faster than full index. Supported in PostgreSQL, SQLite.
Index downsides: Every index slows down INSERT/UPDATE/DELETE (the index must also be updated). Don't add indexes blindly. Measure with EXPLAIN ANALYSE before and after. A table with 20 indexes on a write-heavy workload can be slower than one with 3.
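The leftmost prefix rule is easy to verify empirically. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (the table and index names are invented for the demo; PostgreSQL's EXPLAIN behaves analogously):

```python
import sqlite3

# Demo of the leftmost-prefix rule on a composite index (a, b, c).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INT, b INT, c INT, d TEXT);
    CREATE INDEX idx_abc ON t (a, b, c);
""")

def plan(sql):
    """Return the query plan as one string; the last column is the detail text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# Leftmost prefix (a, b) is satisfied: the planner uses idx_abc.
print(plan("SELECT d FROM t WHERE a = 1 AND b = 2"))
# Filtering on b alone skips the leading column: full table scan instead.
print(plan("SELECT d FROM t WHERE b = 2"))
```

Running the same check against your real tables before and after adding an index is a cheap way to confirm the planner actually uses it.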
Database 02
Query Optimisation
How the Query Planner Works
When you run a query, the database doesn't execute it literally. The query planner generates multiple possible execution plans, estimates the cost of each (using table statistics), and picks the cheapest one. EXPLAIN ANALYSE shows you what it chose.
EXPLAIN ANALYSE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.country = 'UK'
GROUP BY u.id;
-- Output shows:
-- Hash Join (cost=120.00..5400.00 rows=800)
--   Seq Scan on orders (cost=0.00..4200.00 rows=50000)
--   Hash
--     Index Scan using users_country_idx on users (cost=0.15..45.00 rows=800)
Common Query Anti-Patterns
- SELECT *: Fetches all columns, defeating covering indexes and saturating the network. Always select only the columns you need.
- Function on indexed column: `WHERE YEAR(created_at) = 2024` cannot use an index on `created_at`. Use `WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'`.
- Implicit type conversion: `WHERE user_id = '123'` when `user_id` is INT forces a cast on every row, so the index is not used.
- N+1 queries: Loop over 100 users and query their orders one by one = 101 queries. Use a JOIN or an IN clause instead.
- Missing index on JOIN column: Joining on an unindexed foreign key causes a full table scan on every join. Always index foreign keys.
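The N+1 pattern and its JOIN fix can be sketched with SQLite (the schema and data here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE INDEX idx_orders_user ON orders (user_id);  -- always index FKs
    INSERT INTO users VALUES (1, 'ana'), (2, 'bob');
    INSERT INTO orders (user_id) VALUES (1), (1), (2);
""")

# N+1 anti-pattern: one query for users, then one query per user.
def n_plus_one():
    counts = {}
    for uid, name in conn.execute("SELECT id, name FROM users"):
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)
        ).fetchone()
        counts[name] = n
    return counts

# Fix: a single query with JOIN and GROUP BY -- one round trip.
def single_query():
    rows = conn.execute("""
        SELECT u.name, COUNT(o.id)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """)
    return dict(rows)

print(n_plus_one())    # {'ana': 2, 'bob': 1}
print(single_query())  # same result, one query instead of three
```

With 100 users the loop issues 101 queries; the JOIN version stays at one regardless of user count, which is the entire point.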
Database 03
Transactions, Locks & Deadlocks
Locking Strategies
Pessimistic Locking
- Lock the row before reading it
- `SELECT ... FOR UPDATE`
- Prevents concurrent modification
- Risk: deadlocks if lock order differs
- Good for: high contention, short transactions
Optimistic Locking
- Read row + version number
- Write only if version unchanged
- No lock held during processing
- Risk: retry storms under high contention
- Good for: low contention, long processing
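Optimistic locking is usually implemented as a compare-and-set UPDATE on a version column. A minimal sketch in SQLite (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT, version INT)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")

def read(acct_id):
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (acct_id,)
    ).fetchone()

def try_update(acct_id, new_balance, expected_version):
    """Compare-and-set: the WHERE clause silently rejects stale writers."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, expected_version),
    )
    return cur.rowcount == 1  # False means someone else won the race: retry

balance, version = read(1)                       # two writers both read version 0
assert try_update(1, balance + 50, version)      # first writer wins
assert not try_update(1, balance - 30, version)  # second writer must re-read and retry
```

No lock is held between the read and the write, which is why long processing is safe; the cost is that the losing writer must retry, and under high contention those retries pile up.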
Deadlock Example
Transaction A:          Transaction B:
Lock row 1 ✓            Lock row 2 ✓
Wait for row 2...       Wait for row 1...
        ⚠ DEADLOCK ⚠
Solution: always acquire locks in the same order.
Or use `SELECT ... FOR UPDATE SKIP LOCKED` for queue-like patterns.
PostgreSQL automatically detects deadlocks and kills one transaction (returning an error). Your application must handle this and retry. Design transactions to be short, acquire locks in a consistent order, and avoid user interaction during a transaction.
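The consistent-lock-order rule can be sketched with two in-process locks standing in for row locks (a simplified model, not real database locking):

```python
import threading

# One lock per "row". Both transactions below touch rows 1 and 2, but in
# opposite argument order -- the classic deadlock setup.
locks = {1: threading.Lock(), 2: threading.Lock()}

def transfer(row_a, row_b, action):
    # Sort by row id so every transaction acquires locks in the same order.
    first, second = sorted((row_a, row_b))
    with locks[first], locks[second]:
        action()

log = []
t1 = threading.Thread(target=transfer, args=(1, 2, lambda: log.append("A")))
t2 = threading.Thread(target=transfer, args=(2, 1, lambda: log.append("B")))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['A', 'B'] -- both complete, no deadlock
```

Without the `sorted()` line, t1 could hold lock 1 while t2 holds lock 2, each waiting forever for the other: exactly the cycle from the diagram above.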
Database 04
NoSQL Database Types
NoSQL doesn't mean "no structure"; it means "different structure, optimised for different access patterns." There are five major categories.
Key-Value Stores
Examples: Redis, DynamoDB, Memcached
Simplest NoSQL. A hash map at scale. O(1) reads and writes by key. No query language; you must know the exact key. Perfect for: session storage, caching, user preferences, rate limiting counters, leaderboards.
You lose the ability to query by any field other than the key. If you store user sessions by session_id, you cannot efficiently find all sessions for user_123 without a secondary index (which most KV stores don't have).
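A common workaround is to maintain your own reverse index alongside the primary keyspace. A sketch using plain Python dicts as a stand-in for a KV store (in Redis this would typically be a HASH per session plus a SET per user; all names here are illustrative):

```python
sessions = {}       # session_id -> session data   (the primary keyspace)
user_sessions = {}  # user_id -> {session_id, ...} (our hand-rolled secondary index)

def create_session(session_id, user_id, data):
    sessions[session_id] = {"user_id": user_id, **data}
    # Maintain the reverse index on every write -- same cost an RDBMS
    # would pay to keep a secondary index up to date.
    user_sessions.setdefault(user_id, set()).add(session_id)

def sessions_for_user(user_id):
    return {sid: sessions[sid] for sid in user_sessions.get(user_id, set())}

create_session("s1", "user_123", {"device": "phone"})
create_session("s2", "user_123", {"device": "laptop"})
create_session("s3", "user_456", {"device": "tablet"})

print(sorted(sessions_for_user("user_123")))  # ['s1', 's2']
```

The catch is that the application now owns index consistency: forget to update `user_sessions` on delete, and lookups silently go stale.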
Document Databases
Examples: MongoDB, CouchDB, Firestore
Stores documents (JSON/BSON). Flexible schema โ different documents in the same collection can have different fields. Rich query language. Indexes on any field. Atomic updates within a single document. Perfect for: user profiles, product catalogues, content management, event logging.
The embedding trap: MongoDB lets you embed related data (e.g., posts inside a user document). This is fast for reads but causes problems when embedded data grows unboundedly (a user with 10M posts in their document becomes huge), or when embedded data is accessed independently.
Column-Family (Wide-Column) Databases
Examples: Cassandra, HBase, Google Bigtable
Data is organised by row key, then column families. Designed for write-heavy, append-heavy workloads. Excellent for horizontal scaling and time-series data. Queries must be designed around your partition key; ad hoc queries are not efficient.
Cassandra table for user events:
PRIMARY KEY (user_id, event_time)
-- Partition key: user_id → same node
-- Clustering key: event_time → sorted on disk
-- Query: get last 100 events for user 123
SELECT * FROM events WHERE user_id = 123
ORDER BY event_time DESC LIMIT 100;
-- Extremely fast: single partition, sorted read
Graph Databases
Examples: Neo4j, Amazon Neptune, JanusGraph
Stores nodes (entities) and edges (relationships). Optimised for traversing relationships. In SQL, every additional hop is another join whose cost grows with table size; graph databases use index-free adjacency, making each hop roughly O(1). Perfect for: social networks, recommendation engines, fraud detection, knowledge graphs.
-- Find friends of friends of user 123 in Cypher (Neo4j):
MATCH (u:User {id: 123})-[:FOLLOWS*2]->(fof:User)
WHERE NOT (u)-[:FOLLOWS]->(fof)
RETURN fof.name LIMIT 20;
Time-Series Databases
Examples: InfluxDB, TimescaleDB (PostgreSQL extension), Prometheus
Optimised for data that's always appended, queried over time ranges, and aggregated. Stores metrics, sensor readings, financial ticks. Features: automatic data compression (time-series data is highly compressible), downsampling (store raw data for 7 days, hourly aggregates for 1 year), built-in time functions.
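Downsampling itself is just bucketed aggregation. A sketch of an hourly roll-up in Python (real time-series databases do this incrementally and combine it with compression; this only shows the idea):

```python
from collections import defaultdict

def hourly_averages(points):
    """Roll raw (epoch_seconds, value) samples up into per-hour means."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // 3600].append(value)  # bucket key = hour number
    # Keyed by the hour's start timestamp, like a downsampled series.
    return {hour * 3600: sum(vals) / len(vals)
            for hour, vals in sorted(buckets.items())}

raw = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
print(hourly_averages(raw))  # {0: 15.0, 3600: 40.0}
```

Four raw points collapse to two hourly aggregates; applied to a year of per-second metrics, that 2x becomes 3600x, which is why retention policies like "raw for 7 days, hourly for 1 year" are so cheap.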
Database 05
Redis Internals
Redis is an in-memory data structure store. Not just a cache: it's a powerful data structure server with persistence options.
Data Structures
- String: Binary-safe. Max 512MB. Atomic INCR/DECR. Use for: counters, rate limiting, session tokens, feature flags.
- List: Linked list. O(1) push/pop from head or tail. O(n) for middle access. Use for: message queues, activity feeds (LPUSH, LTRIM), task queues.
- Hash: Field-value pairs. O(1) field access. Memory-efficient for small hashes. Use for: user profiles, object storage.
- Set: Unordered unique strings. O(1) add/remove/contains. Set operations (UNION, INTERSECT, DIFF). Use for: tags, social connections, unique visitors.
- Sorted Set (ZSet): Set + floating point score. O(log n) operations. Range queries by score or rank. Use for: leaderboards, priority queues, rate limiting windows, delayed tasks.
- Bitmap: Bit array. Space-efficient. 512MB = 4 billion bits. Use for: daily active users, feature toggles per user ID.
- HyperLogLog: Probabilistic cardinality estimation. ~12KB regardless of set size. 0.81% error rate. Use for: unique visitor counts at massive scale.
- Stream: Append-only log. Consumer groups. Like a lightweight Kafka. Use for: event sourcing, audit logs, chat history.
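As one concrete pattern, a Sorted Set leaderboard boils down to score-ordered retrieval. A sketch using a plain dict in place of Redis (the real commands would be ZADD/ZINCRBY to write and ZREVRANGE ... WITHSCORES to read; Redis keeps a skip list so reads are O(log n), not a full sort):

```python
scores = {}  # member -> score, standing in for a Redis Sorted Set

def zincrby(member, amount):
    """Like Redis ZINCRBY: add to a member's score, creating it if absent."""
    scores[member] = scores.get(member, 0) + amount

def top(n):
    """Highest scores first, like ZREVRANGE 0 n-1 WITHSCORES."""
    return sorted(scores.items(), key=lambda kv: -kv[1])[:n]

zincrby("ana", 50)
zincrby("bob", 30)
zincrby("ana", 25)
print(top(2))  # [('ana', 75), ('bob', 30)]
```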
Persistence Options
RDB (Snapshots)
- Point-in-time snapshot to disk
- Compact binary format
- Fast restarts
- Risk: up to minutes of data loss on crash
AOF (Append-Only File)
- Log every write command
- fsync every second (default) or every write
- Near-zero data loss
- Larger files, slower restart
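Both modes are configured in redis.conf; a minimal sketch (the directive values here are illustrative, and RDB and AOF can also be enabled together):

```
# RDB: snapshot if >=1 key changed in 900s, >=10 in 300s, >=10000 in 60s
save 900 1
save 300 10
save 60 10000

# AOF: log every write, fsync once per second (the default trade-off)
appendonly yes
appendfsync everysec
```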
Database 06
Cassandra Architecture
Cassandra is a masterless, AP distributed database designed for massive write throughput and linear scalability. Used by Netflix, Instagram, Apple (10+ petabytes).
Key Design Principles
- Masterless ring: All nodes are equal peers. No single point of failure. Any node can accept reads and writes. Coordinator node routes requests.
- Consistent hashing: Data is distributed around a hash ring using virtual nodes (vnodes). Adding nodes is seamless: they take over a portion of the ring.
- Tunable consistency: Per-query consistency level. ONE (fastest, least consistent), QUORUM (balanced), ALL (slowest, strongest). With RF=3, QUORUM means W = R = 2, so W + R > N: every read set overlaps every write set, guaranteeing at least one node returns the latest data.
- SSTable writes: Writes go to an in-memory MemTable plus a commit log. The MemTable flushes to an immutable SSTable on disk. SSTables are compacted periodically. Writes are always sequential I/O, which makes them extremely fast.
- No joins, no transactions: By design. If you need them, think again about your data model. Cassandra requires query-first design.
"Why use Cassandra over PostgreSQL?" Write throughput at scale (millions of writes/sec), geographic distribution, no downtime for node additions, no single point of failure. Don't use Cassandra for: complex queries, strong consistency requirements, small datasets, or when your team doesn't have the operational expertise.
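The vnode idea can be sketched in Python: hash many virtual points per physical node onto a ring, then assign each key to the first node clockwise from its hash. This is a toy model (Cassandra uses Murmur3 tokens and replicates to multiple nodes, both omitted here):

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes=8):
        self.tokens = []  # sorted list of (hash, node) pairs on the ring
        for node in nodes:
            for v in range(vnodes):
                self.tokens.append((self._hash(f"{node}#{v}"), node))
        self.tokens.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        i = bisect.bisect(self.tokens, (h, ""))
        return self.tokens[i % len(self.tokens)][1]  # wrap around the ring

ring = Ring(["node-a", "node-b", "node-c"])
# Every key maps to exactly one node, deterministically.
print({k: ring.node_for(k) for k in ["user:1", "user:2", "user:3"]})
```

The payoff shows up when the ring changes: adding a fourth node reassigns only the keys whose clockwise successor changed (roughly 1/4 of them), instead of rehashing everything the way `hash(key) % N` would.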
Database 07
Choosing the Right Database
| Use Case | Best Choice | Why |
|---|---|---|
| User accounts, orders, financial data | PostgreSQL / MySQL | ACID, complex queries, JOINs |
| Session storage, caching | Redis | Sub-ms latency, TTL, data structures |
| Time-series metrics | InfluxDB / TimescaleDB | Compression, time functions, downsampling |
| Social graph, recommendations | Neo4j | Relationship traversal performance |
| Write-heavy event log at scale | Cassandra | Masterless, linear scale, sequential writes |
| Full-text search | Elasticsearch | Inverted index, relevance scoring |
| Flexible schema, mobile apps | MongoDB / Firestore | Document model, offline sync |
| Data warehouse / analytics | BigQuery / Redshift / Snowflake | Columnar, MPP, massive aggregations |
In interviews, don't just pick one database. Real systems use multiple: PostgreSQL for transactions, Redis for caching, Cassandra for event logs, Elasticsearch for search. Explain why each is used and what each one's limitations are. This shows real engineering maturity.
Test Your Understanding
1. You need to find all friends of friends of a user in a social network. You have 1 billion users. Which database type is most appropriate?
2. A composite index exists on columns (country, age, name). Which query can use this index?