Networking
Fundamentals
Understanding the pipes your data travels through – and exactly where latency hides.
Networking 01
TCP vs UDP
Every byte your application sends travels over one of two transport protocols. Choosing the wrong one for your use case is a common architecture mistake.
TCP – Transmission Control Protocol
- Connection-oriented (3-way handshake)
- Guaranteed delivery – retransmits lost packets
- Ordered delivery – data arrives in sequence
- Flow control & congestion control
- Higher overhead, higher latency
- Use: HTTP, databases, file transfer, email
UDP – User Datagram Protocol
- Connectionless – no handshake
- No delivery guarantee – packets may be lost
- No ordering – packets can arrive out of order
- No flow control
- Lower overhead, lower latency
- Use: video streaming, gaming, DNS, VoIP
TCP is like sending a registered letter – you get confirmation of delivery, it arrives in order, and if it's lost it's resent. UDP is like shouting across a room – fast, but some words might not reach everyone. For live video, a slightly choppy frame is better than waiting for a resend.
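The difference shows up directly at the socket API. A minimal sketch (Python, loopback only): UDP needs no connection – a single `sendto` fires a datagram, and it arrives here only because loopback never drops packets.

```python
import socket

# UDP receiver: bind to an ephemeral port on loopback (no listen/accept needed)
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
port = recv_sock.getsockname()[1]

# UDP sender: no handshake - sendto() fires one datagram and returns
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"player pos x=10 y=4", ("127.0.0.1", port))

data, _addr = recv_sock.recvfrom(1024)
print(data)  # delivered this time, but UDP itself makes no such promise
send_sock.close()
recv_sock.close()
```

A TCP version of the same exchange would first need `listen`/`accept` on the server and `connect` on the client – that is the 3-way handshake described next.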
The TCP 3-Way Handshake
Client Server
| |
|--- SYN (seq=x) -------->| Client initiates
| |
|<-- SYN-ACK (seq=y, | Server acknowledges
| ack=x+1) ------------|
| |
|--- ACK (ack=y+1) ------>| Client confirms
| |
|=== Connection Open ======|
| |
This handshake adds one round-trip time (RTT) before any data flows. At 150ms cross-continent latency, that's 150ms just to open the connection – before a single byte of your HTTP request is sent. This is why HTTP keep-alive and connection pooling matter so much.
When asked "what happens when you open a WebSocket?": TCP handshake (1 RTT) → TLS handshake (1–2 RTTs) → HTTP Upgrade request → WebSocket connection established. Total: 3–4 RTTs before real-time messaging starts. This is why proximity to users matters for latency-sensitive apps.
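The RTT cost above is easy to put into numbers. A back-of-the-envelope sketch (the 150 ms figure is the assumed cross-continent RTT from above):

```python
RTT_MS = 150  # assumed cross-continent round-trip time

def websocket_setup_ms(rtt_ms: int, tls_rtts: int = 1) -> int:
    # TCP handshake (1 RTT) + TLS handshake (1-2 RTTs) + HTTP Upgrade (1 RTT)
    return rtt_ms * (1 + tls_rtts + 1)

print(websocket_setup_ms(RTT_MS))              # TLS 1.3 -> 450 ms
print(websocket_setup_ms(RTT_MS, tls_rtts=2))  # TLS 1.2 -> 600 ms
```

Nearly half a second before the first real-time message, purely from round trips – which is exactly the motivation for edge deployment and 0-RTT resumption later in this module.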
Networking 02
HTTP/1.1 vs HTTP/2 vs HTTP/3
Each version of HTTP was invented to solve performance bottlenecks of the previous one. Understanding why they were created reveals deep insight into how the web works.
HTTP/1.1 โ The Baseline
One request per TCP connection at a time. The connection can be reused (keep-alive), but only one outstanding request is allowed. Problem: loading a page with 100 assets means 100 sequential requests on each connection. Browsers work around this by opening 6 parallel connections per domain – a hack, not a solution.
- Head-of-line blocking: If request #1 stalls, requests #2–#6 wait behind it on that connection.
- No header compression: The same headers (cookies, user-agent) are sent with every request – can be kilobytes of overhead.
- Text-based protocol: Human-readable but inefficient to parse.
HTTP/2 โ Multiplexing
HTTP/2 sends multiple requests over a single TCP connection simultaneously using streams. Each request is a stream; streams are multiplexed. This eliminates the need for 6 parallel connections.
- Multiplexing: Many requests in parallel over one connection. No more 6-connection hack.
- Header compression (HPACK): Headers are compressed and deduplicated – a massive bandwidth saving for repeated requests.
- Server Push: Server can proactively send resources the client hasn't requested yet (e.g., push CSS before browser parses HTML). Rarely used well in practice.
- Binary framing: Efficient machine-readable format instead of text.
- Still TCP: TCP-level head-of-line blocking remains. One lost packet stalls all streams.
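A toy latency model makes the multiplexing win concrete. It deliberately ignores bandwidth, TCP slow start, and HOL blocking, so real-world gains are smaller, but the shape is right:

```python
import math

def http1_load_ms(n_assets: int, rtt_ms: int, connections: int = 6) -> int:
    # HTTP/1.1: one outstanding request per connection, 6 connections per domain
    rounds = math.ceil(n_assets / connections)
    return rounds * rtt_ms

def http2_load_ms(n_assets: int, rtt_ms: int) -> int:
    # HTTP/2: all requests multiplexed onto one connection in parallel
    return rtt_ms

print(http1_load_ms(100, 50))  # 17 request rounds -> 850 ms
print(http2_load_ms(100, 50))  # one round of parallel streams -> 50 ms
```

Under this model a 100-asset page at 50 ms RTT needs 17 sequential rounds on HTTP/1.1 but a single round on HTTP/2.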
HTTP/3 โ Built on QUIC (UDP)
HTTP/3 replaces TCP with QUIC – a protocol built on UDP that reimplements TCP's reliability features but eliminates TCP's head-of-line blocking.
- QUIC solves TCP HOL blocking: Each stream is independent. A lost packet only stalls its stream, not all streams.
- 0-RTT connection resumption: Reconnecting to a known server can send data immediately – no handshake wait.
- Built-in TLS 1.3: Connection establishment and encryption happen in parallel.
- Connection migration: Switching from WiFi to mobile doesn't drop the connection – it's identified by connection ID, not IP.
Because QUIC runs on UDP, firewalls and middleboxes that block UDP traffic prevent HTTP/3 from working at all. Many corporate networks block UDP port 443. Clients fall back to HTTP/2 in those cases. Always support both.
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | No | Yes (TCP) | Yes (QUIC) |
| HOL Blocking | Yes | At TCP level | No |
| Header Compression | No | HPACK | QPACK |
| Transport | TCP | TCP | UDP (QUIC) |
| 0-RTT Reconnect | No | No | Yes |
| TLS Required | Optional | Effectively yes | Always |
Networking 03
TLS & HTTPS
TLS (Transport Layer Security) provides encryption, authentication, and integrity for network communication. HTTPS = HTTP over TLS.
TLS 1.3 Handshake (Simplified)
Client Server
| |
|-- ClientHello (supported ciphers, |
| random, key_share) ------------->|
| |
|<-- ServerHello (chosen cipher, |
| random, key_share, cert, |
| Finished) ----------------------|
| |
|-- Finished + HTTP Request -------->| ← 1 RTT only in TLS 1.3!
| |
|<-- HTTP Response ------------------|
TLS 1.3 reduced the handshake from 2 RTTs (TLS 1.2) to 1 RTT – and 0 RTTs for session resumption. This was a major latency improvement. Always use TLS 1.3 in new systems.
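In Python's standard `ssl` module, enforcing that policy for a client is one line on the context – a sketch, assuming Python 3.7+ where `TLSVersion` exists:

```python
import ssl

# Build a client context that refuses anything older than TLS 1.3
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# Certificate verification and hostname checking stay on by default
assert ctx.verify_mode == ssl.CERT_REQUIRED
```

Any server that only speaks TLS 1.2 or older will now fail the handshake instead of silently negotiating a slower, weaker protocol.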
Where TLS Termination Happens
- At the load balancer / reverse proxy: Most common. NGINX/ALB decrypts traffic, forwards plain HTTP to backend. Simpler certificate management, backend servers don't need TLS config. Traffic inside your network is unencrypted.
- End-to-end (mTLS): Traffic is encrypted all the way to the backend service. Required for zero-trust architectures. Each service has a certificate and verifies the other (mutual TLS). More secure, more operational overhead.
Networking 04
DNS Resolution
DNS translates human-readable domain names (google.com) into IP addresses (142.250.80.46). Every network request begins with DNS โ it's the phone book of the internet.
Resolution Flow
Browser → OS DNS cache? → Yes: use IP
→ No:
Browser → Local Resolver (ISP / 8.8.8.8)
Resolver → Root Name Server (knows TLDs)
Resolver → TLD Name Server (.com, .net)
Resolver → Authoritative Name Server (owns google.com records)
Authoritative → returns IP + TTL
Resolver → caches + returns to browser
DNS Record Types
| Record | Purpose | Example |
|---|---|---|
| A | IPv4 address | google.com → 142.250.80.46 |
| AAAA | IPv6 address | google.com → 2a00:1450:... |
| CNAME | Alias to another name | www → google.com |
| MX | Mail server | gmail.com → smtp.google.com |
| TXT | Arbitrary text (SPF, verification) | v=spf1 include:... |
| NS | Name servers for domain | ns1.example.com |
| SOA | Start of Authority – zone metadata | Serial, refresh intervals |
DNS for System Design
- DNS load balancing: Return multiple A records for the same domain. Client picks one (often the first). Low-overhead global LB, but no health checks – dead servers still get traffic until the TTL expires.
- GeoDNS: Return different IPs based on client's location. Users in Asia get Asia region IPs. Used by CDNs and multi-region apps.
- Low TTL for failover: Set TTL to 60 seconds for services that need fast failover. Cost: more DNS queries, more load on resolvers.
- DNS propagation: Changing a DNS record takes up to TTL seconds to propagate. Plan deployments around this.
Networking 05
WebSockets
WebSockets provide full-duplex, persistent communication between client and server over a single TCP connection. Unlike HTTP (request → response), WebSockets allow the server to push data to the client at any time.
HTTP is like a walkie-talkie – one party talks, the other listens, then they switch. A WebSocket is like a phone call – both parties can talk simultaneously, at any time, without one needing to initiate.
Upgrade Handshake
Client โ Server:
GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Server โ Client:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
// Connection is now a WebSocket. Both sides can send frames anytime.
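The Accept value is not arbitrary: per RFC 6455, the server appends a fixed GUID to the client's key, SHA-1 hashes the result, and base64-encodes the digest. A sketch, using the key/accept pair from the RFC's own example:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def websocket_accept(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The point of the dance is to prove the server actually understood the WebSocket upgrade rather than being a confused HTTP cache echoing headers back.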
WebSockets at Scale: The Hard Parts
- Sticky sessions required: A WebSocket connection is stateful – it lives on one server. Your load balancer must route a client to the same server every time (sticky sessions / IP hash).
- Horizontal scaling problem: If User A is on Server 1 and User B is on Server 2, how does a message from A reach B? You need a pub/sub broker (Redis Pub/Sub, Kafka) to broadcast between servers.
- Connection limits: Each open WebSocket consumes a file descriptor, and default per-process limits are low (often 1024 on Linux). Raise the limit with ulimit, and use event-driven servers (Node.js, Go) rather than one thread per connection.
- Heartbeats: Idle connections are killed by NATs and load balancers after 60–90 seconds. Send ping/pong frames every ~30s to keep connections alive.
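The cross-server fan-out problem reduces to pub/sub. A toy in-memory broker (a stand-in for Redis Pub/Sub – channel names and callbacks here are illustrative) shows the shape:

```python
from collections import defaultdict

class PubSub:
    """Toy stand-in for Redis Pub/Sub: any server publishes, all subscribers hear it."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for cb in self._subscribers[channel]:
            cb(message)

broker = PubSub()
server1_outbox = []  # Server 1 would forward these down User B's WebSocket
broker.subscribe("room:42", server1_outbox.append)

# User A's message lands on Server 2, which only needs to publish it
broker.publish("room:42", "hi from A")
print(server1_outbox)  # ['hi from A']
```

In production the broker is a separate process (Redis, Kafka, NATS), so every app server holding WebSockets subscribes once and receives everything published to its channels.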
Networking 06
REST APIs
REST (Representational State Transfer) is an architectural style for building APIs over HTTP. It's not a protocol โ it's a set of constraints. When followed correctly, REST gives you a predictable, cacheable, scalable interface.
REST Constraints
- Stateless: Each request contains all information needed. Server holds no client state. Enables horizontal scaling.
- Uniform interface: Resources identified by URLs. Standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Standard status codes.
- Cacheable: GET responses can be cached by browsers, CDNs, proxies. Must be explicit about cache headers.
- Client-Server: Clear separation of concerns. UI and data storage evolve independently.
HTTP Status Codes That Matter
| Code | Meaning | When to Use |
|---|---|---|
| 200 OK | Success | Successful GET, PUT, PATCH |
| 201 Created | Resource created | Successful POST |
| 204 No Content | Success, no body | Successful DELETE |
| 400 Bad Request | Invalid input | Validation errors |
| 401 Unauthorized | Not authenticated | Missing/invalid token |
| 403 Forbidden | Not authorized | Valid token, wrong permissions |
| 404 Not Found | Resource missing | ID doesn't exist |
| 409 Conflict | State conflict | Duplicate create, optimistic lock fail |
| 429 Too Many Requests | Rate limited | Rate limit exceeded |
| 500 Internal Server Error | Server bug | Unhandled exceptions |
| 503 Service Unavailable | Overloaded/down | Circuit open, health check fail |
Networking 07
GraphQL
GraphQL is a query language for APIs. Instead of the server deciding what data to return, the client specifies exactly what it needs. This eliminates over-fetching and under-fetching.
REST over-fetching problem
- GET /users/123 returns 40 fields
- You need 3 fields
- 37 fields are wasted bandwidth
- Mobile clients on 3G are hurt most
REST under-fetching (N+1)
- GET /users returns 20 users
- For each user, GET /users/{id}/posts
- = 21 HTTP requests for one screen
- Each is an extra RTT
# GraphQL: client asks for exactly what it needs
query {
user(id: "123") {
name
email
posts(last: 5) {
title
createdAt
}
}
}
# One request โ exactly the data you asked for
GraphQL tradeoffs: more complex caching (POST requests and dynamic queries are harder to CDN-cache), N+1 queries on the server side without DataLoader, and schema complexity that grows over time. It is not always the right choice – REST is simpler for internal APIs. GraphQL shines for mobile apps and when multiple clients need different data shapes from the same API.
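The DataLoader fix mentioned above batches the per-user lookups behind the scenes. A toy sketch of the batching idea (not the real dataloader API – `BatchLoader` and `fetch_posts_batch` are illustrative names):

```python
class BatchLoader:
    """Toy DataLoader: queue keys during resolution, fetch them in one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn
        self._queue = []

    def load(self, key) -> None:
        self._queue.append(key)            # a resolver asks for one user's posts

    def dispatch(self) -> dict:
        results = self._batch_fn(self._queue)  # ONE query instead of N
        self._queue = []
        return results

queries = []
def fetch_posts_batch(user_ids):
    queries.append(list(user_ids))  # pretend: SELECT ... WHERE user_id IN (...)
    return {uid: f"posts of user {uid}" for uid in user_ids}

loader = BatchLoader(fetch_posts_batch)
for uid in range(20):      # 20 users on one screen
    loader.load(uid)
posts = loader.dispatch()
print(len(queries), len(posts))  # 1 20
```

Twenty resolver calls collapse into a single backend query – the server-side mirror of GraphQL collapsing 21 HTTP requests into one.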
Networking 08
gRPC
gRPC is a high-performance RPC framework from Google. It uses HTTP/2 as its transport and Protocol Buffers for serialisation, and is designed for internal service-to-service communication.
gRPC Advantages
- Much faster than JSON/REST (compact binary Protobuf)
- Strongly typed schema (proto files)
- Auto-generated client SDKs in any language
- Streaming support (client, server, bidirectional)
- HTTP/2 multiplexing
gRPC Disadvantages
- Not human-readable (binary)
- Browser support requires gRPC-Web proxy
- Harder to debug without tooling
- Schema evolution needs care
- Not appropriate for public APIs
In system design: use REST or GraphQL for external/public APIs (developer-friendly, cacheable, browser-native). Use gRPC for internal microservice communication (performance, type safety, streaming). This is what Google, Netflix, and Uber do.
Networking 09
Connection Pooling
Opening a TCP connection and completing a TLS handshake takes 2–3 RTTs (depending on TLS version) and significant CPU. Creating a database connection is even more expensive (auth, session setup). Connection pooling reuses existing connections instead of creating a new one per request.
Classic failure: An application opens a new DB connection per request. At 1000 req/s, it tries to open 1000 DB connections. PostgreSQL has a default max of 100. Connections are refused. The app fails. A connection pool of 20โ50 connections handles this easily โ requests wait briefly in the pool queue rather than failing.
Sizing a Connection Pool
A common formula: pool_size = (core_count * 2) + effective_spindle_count. More connections don't always mean more throughput โ too many connections cause context switching overhead. Start small (10โ20), measure, then tune.
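A minimal blocking pool can be built on a thread-safe queue. A sketch – the connection "factory" stands in for your DB driver's connect call:

```python
import queue

class ConnectionPool:
    """Fixed-size pool: acquire() blocks until a connection is free."""

    def __init__(self, factory, size: int):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):      # connections are created once, up front
            self._pool.put(factory())

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)  # waits in the pool queue

    def release(self, conn) -> None:
        self._pool.put(conn)

created = []
def fake_connect():                # stand-in for an expensive DB connect
    created.append(1)
    return object()

pool = ConnectionPool(fake_connect, size=5)
for _ in range(100):               # 100 "requests" reuse the same 5 connections
    conn = pool.acquire()
    pool.release(conn)
print(len(created))  # 5
```

One hundred requests, five connections ever created – the classic failure mode above would have attempted one hundred.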
Networking 10
Long Polling & Server-Sent Events
Long Polling
- Client sends request, server holds it open
- Server responds when data is available
- Client immediately re-connects after response
- Works through all firewalls (plain HTTP)
- Higher latency than WebSockets
- Good for: chat fallback, notifications
Server-Sent Events (SSE)
- One-way: server → client only
- Single long-lived HTTP connection
- Auto-reconnect built in
- Text-based (easy to debug)
- Works through HTTP/2 multiplexing
- Good for: live dashboards, stock tickers, notifications
Choosing the right real-time protocol: SSE for one-way server push (simpler), WebSockets for bidirectional (chat, gaming, collaboration), Long Polling as a fallback when WebSockets are blocked. Many production systems (Slack, GitHub) use WebSockets with Long Polling fallback.
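Part of SSE's appeal is how simple its wire format is. A sketch that serialises one text/event-stream frame (field names come from the SSE spec; a blank line terminates each event):

```python
def sse_frame(data: str, event: str = None, event_id: str = None) -> str:
    """Serialise one Server-Sent Events frame (text/event-stream)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")  # client resends it as Last-Event-ID on reconnect
    for chunk in data.splitlines():
        lines.append(f"data: {chunk}")   # multi-line data becomes repeated data: fields
    return "\n".join(lines) + "\n\n"     # blank line terminates the event

print(repr(sse_frame("AAPL 191.45", event="tick", event_id="7")))
# 'event: tick\nid: 7\ndata: AAPL 191.45\n\n'
```

Because the `id:` field drives the browser's automatic `Last-Event-ID` reconnect header, a stock-ticker server gets resumable streams essentially for free.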
Module 02 Quiz
Test Your Networking Knowledge
Scenario-based questions. Select the best answer – then read the explanation.
Q1. Your API uses HTTP/1.1. Users complain about slow page loads โ the browser makes 40 parallel requests to load a page. You switch to HTTP/2. What is the PRIMARY improvement?
Q2. You're building a multiplayer game that sends 60 position updates per second per player. Which protocol should you use?
Q3. A client profiling tool shows: DNS 50ms, TCP connect 80ms, TLS handshake 160ms, server processing 2,600ms, transfer 110ms – total 3s. Where do you focus optimization?
Q4. A response arrives with Cache-Control: no-cache. What does this actually mean?
Q5. You need real-time server → client notifications (new messages, alerts) for 500K concurrent users. Lowest server resource footprint?
Q6. Your gRPC service handles 50K RPC calls/second. New requirement: add a browser client. Browsers cannot use gRPC natively. Standard solution?