Networking Protocols
HTTP (HyperText Transfer Protocol)
- The foundation of data communication on the web.
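- HTTP/1.1: Connections are persistent (keep-alive) by default, but each connection carries only one request/response at a time (pipelining exists but is rarely used). Head-of-line blocking is a problem: a slow response delays every request queued behind it, so browsers open several parallel connections per origin.
- HTTP/2: Multiplexes many concurrent requests (streams) over a single TCP connection. Header compression (HPACK). Still has TCP-level head-of-line blocking: one lost packet stalls every stream until it is retransmitted.
- HTTP/3: Built on QUIC (UDP-based). Eliminates TCP head-of-line blocking because QUIC recovers lost packets per stream. Faster connection setup: the transport and TLS 1.3 handshakes are combined, and resumed connections can use 0-RTT.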
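To make the head-of-line blocking differences concrete, here is a toy Python model (an illustration, not a network simulator). It assumes hypothetical server processing times, ignores round trips and bandwidth sharing, and models a single lost packet as a fixed retransmission delay.

```python
from itertools import accumulate
from typing import List


def http11_finish_times(service_ms: List[int]) -> List[int]:
    """One HTTP/1.1 connection, one request at a time: each response waits
    for every response queued ahead of it (application-level HOL blocking)."""
    return list(accumulate(service_ms))


def multiplexed_finish_times(service_ms: List[int]) -> List[int]:
    """HTTP/2 or HTTP/3 streams on one connection: each response finishes on
    its own schedule (bandwidth sharing is ignored in this toy model)."""
    return list(service_ms)


def after_packet_loss(finish_ms: List[int], lost_stream: int, loss_at_ms: int,
                      retransmit_ms: int, per_stream_recovery: bool) -> List[int]:
    """Delay streams by one retransmission after a single lost packet.

    per_stream_recovery=False models TCP (HTTP/2): every stream still in flight
    stalls until the lost bytes arrive. True models QUIC (HTTP/3): only the
    stream that owned the lost packet stalls."""
    result = []
    for i, t in enumerate(finish_ms):
        in_flight = t > loss_at_ms
        affected = in_flight and (not per_stream_recovery or i == lost_stream)
        result.append(t + retransmit_ms if affected else t)
    return result


service = [300, 50, 50]  # hypothetical: one slow response, then two fast ones
print(http11_finish_times(service))  # [300, 350, 400]
streams = multiplexed_finish_times(service)
print(after_packet_loss(streams, 0, 20, 100, per_stream_recovery=False))  # [400, 150, 150]
print(after_packet_loss(streams, 0, 20, 100, per_stream_recovery=True))   # [400, 50, 50]
```

The HTTP/1.1 case shows two fast responses waiting behind a slow one. The two loss cases show why HTTP/2 still stalls at the TCP layer while QUIC confines the stall to one stream.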
WebSockets
A full-duplex, persistent, bidirectional communication channel over a single TCP connection.
How it works
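- Client initiates an HTTP/1.1 handshake: a GET request with `Upgrade: websocket`, `Connection: Upgrade`, and a random `Sec-WebSocket-Key` header.
- Server responds with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the client's key.
- Both sides can now send messages freely at any time without re-establishing the connection.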
Characteristics
- Bidirectional: Both client and server can push messages independently.
- Low latency: No per-message HTTP headers after the initial handshake; each frame carries only a 2 to 14 byte header.
- Stateful: The connection is long-lived and maintained.
- Protocol: `ws://` or `wss://` (secure, over TLS).
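The low per-message overhead comes from WebSocket framing. As a sketch (Python, text frames only, no fragmentation or control frames), this encodes a single frame following the RFC 6455 layout. Clients must mask the frames they send; servers must not.

```python
import os
import struct


def encode_text_frame(text: str, mask: bool = False) -> bytes:
    """Encode one unfragmented text frame (FIN=1, opcode 0x1) per RFC 6455.

    Clients must set mask=True; servers send unmasked frames."""
    payload = text.encode("utf-8")
    header = bytearray([0x80 | 0x1])       # FIN bit + text opcode
    mask_bit = 0x80 if mask else 0x00
    length = len(payload)
    if length < 126:
        header.append(mask_bit | length)   # 7-bit length
    elif length < 2 ** 16:
        header.append(mask_bit | 126)      # 16-bit extended length follows
        header += struct.pack("!H", length)
    else:
        header.append(mask_bit | 127)      # 64-bit extended length follows
        header += struct.pack("!Q", length)
    if mask:
        key = os.urandom(4)                # fresh random masking key per frame
        header += key
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + payload


print(encode_text_frame("Hello"))          # b'\x81\x05Hello'
```

The smallest frame adds just 2 bytes to the payload; the 14-byte maximum is a masked client frame with a 64-bit length field.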
When to use WebSockets
- Real-time, interactive applications where both sides need to send data frequently.
- Examples:
- Chat applications (e.g., Slack, WhatsApp Web)
- Multiplayer online games
- Collaborative editing (e.g., Google Docs live cursors)
- Live trading dashboards (stock prices with user interactions)
- Live sports scoreboards with user interactions
Drawbacks
- More complex to scale horizontally — sticky sessions or a pub/sub layer (e.g., Redis) needed.
- Not ideal if only the server needs to push data (SSE is simpler for that).
- Firewalls/proxies may block WebSocket upgrades.
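A pub/sub layer lets any server instance deliver a message to clients connected to any other instance. The sketch below uses a hypothetical in-process `InMemoryBroker` as a stand-in for something like Redis Pub/Sub, and models each client's WebSocket as a plain list. All class and method names are illustrative.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Hypothetical in-process stand-in for a pub/sub service such as Redis."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in self._subscribers[channel]:
            callback(message)


class ChatServerNode:
    """One WebSocket server instance; each client 'socket' is modeled as a list."""

    def __init__(self, name: str, broker: InMemoryBroker, room: str) -> None:
        self.name = name
        self.broker = broker
        self.room = room
        self.local_clients: Dict[str, List[str]] = {}
        broker.subscribe(room, self._deliver_locally)

    def connect(self, client_id: str) -> List[str]:
        inbox: List[str] = []
        self.local_clients[client_id] = inbox
        return inbox

    def on_client_message(self, client_id: str, text: str) -> None:
        # Publish to the broker rather than writing only to local sockets,
        # so clients connected to *other* nodes receive the message too.
        self.broker.publish(self.room, f"{client_id}: {text}")

    def _deliver_locally(self, message: str) -> None:
        for inbox in self.local_clients.values():
            inbox.append(message)


broker = InMemoryBroker()
node_a = ChatServerNode("node-a", broker, room="general")
node_b = ChatServerNode("node-b", broker, room="general")
alice = node_a.connect("alice")
bob = node_b.connect("bob")
node_a.on_client_message("alice", "hi")
print(bob)  # ['alice: hi']
```

Because each node publishes to the broker instead of writing only to its own sockets, it does not matter which node a given client happens to be connected to.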
Server-Sent Events (SSE)
A unidirectional, server-to-client streaming mechanism over a standard HTTP connection.
How it works
- Client makes a regular HTTP GET request with `Accept: text/event-stream`.
- Server responds with `Content-Type: text/event-stream`, keeps the connection open, and streams events as `data: ...\n\n`-formatted text.
- Client receives events via the `EventSource` API in the browser.
- If the connection drops, the browser automatically reconnects.
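The wire format the server writes is simple enough to build by hand. Here is a minimal Python sketch of serializing one event. The function name is illustrative; the field names `id`, `event`, `retry`, and `data` come from the SSE format.

```python
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None,
                     retry_ms: Optional[int] = None) -> str:
    """Serialize one Server-Sent Event as it appears on the wire."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")      # echoed back later as Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")      # custom type; default is "message"
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")   # reconnection delay hint for the client
    for line in data.split("\n"):
        lines.append(f"data: {line}")        # one data: line per payload line
    return "\n".join(lines) + "\n\n"         # blank line terminates the event


print(format_sse_event("hello"), end="")
print(format_sse_event("line one\nline two", event="update", event_id="42"), end="")
```

Multi-line payloads become several `data:` lines, which the browser rejoins with newlines. The blank line at the end tells `EventSource` to dispatch the event.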
Characteristics
- Unidirectional: Only server → client.
- Built on HTTP: Works over HTTP/1.1 and HTTP/2. No special protocol upgrade needed.
- Auto-reconnect: The browser's `EventSource` reconnects automatically and sends the last received event ID in a `Last-Event-ID` header, so the server can resume where it left off.
- Text-based: Events are UTF-8 text. Binary data requires encoding (e.g., base64).
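Auto-reconnect works because both sides track event IDs. The sketch below pairs a simplified client-side parser with a server-side replay step driven by `Last-Event-ID`. The parser roughly follows what `EventSource` does but ignores CR line endings and the `retry` field. The in-memory event log, the numeric-ID assumption, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (event type, data)


def parse_sse(stream: str) -> Tuple[List[Event], Optional[str]]:
    """Simplified client-side parser: returns dispatched events and the last ID seen."""
    events: List[Event] = []
    data_lines: List[str] = []
    event_type = "message"
    last_event_id: Optional[str] = None
    for line in stream.split("\n"):
        if line == "":                      # blank line: dispatch the buffered event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            data_lines, event_type = [], "message"
            continue
        if line.startswith(":"):            # comment, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]               # strip one optional leading space
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_event_id = value           # sent as Last-Event-ID on reconnect
    return events, last_event_id


def events_to_replay(log: List[Tuple[int, str]],
                     last_event_id: Optional[str]) -> List[Tuple[int, str]]:
    """Server side: pick the events a reconnecting client missed.
    Assumes a hypothetical in-memory log of (numeric id, data) in send order."""
    if last_event_id is None:
        return []                           # fresh connection: only new events (a policy choice)
    last = int(last_event_id)
    return [(eid, data) for eid, data in log if eid > last]


received, last_id = parse_sse("id: 1\ndata: a\n\nid: 2\nevent: tick\ndata: b\n\n")
print(received, last_id)                    # [('message', 'a'), ('tick', 'b')] 2
log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(events_to_replay(log, last_id))       # [(3, 'c'), (4, 'd')]
```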
When to use SSE
- Server needs to push updates to the client, but the client doesn’t need to send data back frequently.
- Examples:
- Live news feeds or social media timelines
- Notification systems
- Progress updates for long-running jobs (e.g., file upload processing)
- Live dashboards (metrics, monitoring) where the client only reads
- AI/LLM streaming responses (e.g., ChatGPT token-by-token output)
Drawbacks
- Unidirectional — client must use separate HTTP requests to send data back.
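- Browsers limit HTTP/1.1 to about 6 concurrent connections per domain, shared across all tabs, and each open `EventSource` holds one (not an issue with HTTP/2, which multiplexes streams over one connection).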
- Text-only by default.
Long Polling
A technique to simulate server push over plain HTTP before WebSockets/SSE were widely supported.
How it works
- Client sends an HTTP request.
- Server holds the request open until it has new data (or a timeout occurs).
- Server responds, client immediately sends another request.
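The hold-until-data-or-timeout cycle above can be sketched with a condition variable. This Python version assumes a hypothetical in-memory `LongPollChannel`. In a real server each `poll` call would be one HTTP request handler, and `cursor` would travel as a query parameter.

```python
import threading
from typing import List


class LongPollChannel:
    """Hypothetical server-side feed that long-poll requests wait on."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._cond = threading.Condition()

    def publish(self, message: str) -> None:
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()          # wake every request being held open

    def poll(self, cursor: int, timeout: float) -> List[str]:
        """Hold the 'request' until there are messages past `cursor` or the
        timeout expires. Returns the new messages, or [] on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > cursor, timeout=timeout)
            return self._messages[cursor:]


def client_loop(channel: LongPollChannel, rounds: int, timeout: float) -> List[str]:
    """Each iteration stands for one HTTP request; the next is sent immediately."""
    cursor = 0
    received: List[str] = []
    for _ in range(rounds):
        batch = channel.poll(cursor, timeout)
        received.extend(batch)
        cursor += len(batch)
    return received


channel = LongPollChannel()
threading.Timer(0.1, channel.publish, args=("job done",)).start()
print(client_loop(channel, rounds=2, timeout=0.5))
```

A timed-out poll returns an empty list, and the client re-requests with the same cursor. That repeated request/response cycle is where the extra latency and per-request overhead come from.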
When to use Long Polling
- Legacy systems or environments where WebSockets/SSE are not supported.
- Low-frequency updates where the overhead of a persistent connection isn’t justified.
Drawbacks
- Higher latency than WebSockets or SSE.
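- More server overhead: every delivered update costs a full HTTP request/response cycle (with headers), and held requests tie up server connections while they wait.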
- Largely superseded by SSE and WebSockets.
WebRTC (Web Real-Time Communication)
A protocol and API standard that enables direct peer-to-peer communication between browsers (or native clients) — for audio, video, and arbitrary data — without needing a server to relay the media.
How it works
Connection setup is the complex part. Two peers can’t just connect directly because they’re usually behind NATs and firewalls. The process:
- Signaling: peers exchange session descriptions (SDP: the codecs and media formats each side supports) and network candidates via a signaling server. WebRTC does not mandate a signaling transport; WebSockets are the typical choice.
- ICE (Interactive Connectivity Establishment) — each peer gathers its possible network addresses (candidates) and they try to find a path to each other.
- STUN — a lightweight server that tells a peer its public IP/port as seen from the outside. Used to attempt a direct connection.
- TURN — a relay server used as a fallback when a direct connection can’t be established (e.g., symmetric NATs). Media flows through TURN, so it’s more expensive.
- Once connected, media/data flows directly peer-to-peer, bypassing the signaling server (unless a TURN relay is required).
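ICE decides which path to try first by ranking candidates. RFC 8445 recommends a priority formula weighted by candidate type, which is why direct (host) paths are preferred and TURN relays are a last resort. Here is a Python sketch using the RFC's recommended type preferences. The local preference of 65535 assumes a single network interface, browsers may choose different values (so real SDP priorities can differ), and the addresses are placeholders.

```python
from typing import Dict, List, Tuple

# Type preferences recommended by RFC 8445 (higher is preferred)
TYPE_PREFERENCE: Dict[str, int] = {
    "host": 126,   # address of a local network interface
    "prflx": 110,  # peer-reflexive: learned during connectivity checks
    "srflx": 100,  # server-reflexive: public address reported by a STUN server
    "relay": 0,    # relayed through a TURN server: the last resort
}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """RFC 8445 formula: 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (2 ** 24) * TYPE_PREFERENCE[cand_type] + (2 ** 8) * local_pref + (256 - component_id)


# Placeholder candidates (addresses are illustrative only)
candidates: List[Tuple[str, str]] = [
    ("relay", "198.51.100.7:3478"),
    ("host", "192.168.1.20:54321"),
    ("srflx", "203.0.113.5:61000"),
]

for cand_type, addr in sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True):
    print(f"{cand_type:6} {addr:22} priority={candidate_priority(cand_type)}")
```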
Key components
- MediaStream — captures audio/video from camera/mic.
- RTCPeerConnection — manages the peer connection, codec negotiation, and media transmission.
- RTCDataChannel — sends arbitrary data (text, binary) directly between peers. Can be configured as reliable (like TCP) or unreliable (like UDP).
Characteristics
- Peer-to-peer: After signaling, no server is in the media path unless a TURN relay is needed.
- UDP-based: Lower latency, tolerates some packet loss (acceptable for video/audio).
- Encryption mandatory: Media streams are encrypted with SRTP (keys negotiated over DTLS), and data channels run over DTLS.
- Complex setup: ICE/STUN/TURN negotiation is significantly more involved than WebSockets.
When to use WebRTC
- Any use case requiring real-time audio or video between users.
- Examples:
- Video/voice calling (e.g., Google Meet, Zoom web client)
- Screen sharing
- P2P file transfer between browsers
- Low-latency multiplayer games requiring direct peer connections
Drawbacks
- Complex to implement — ICE negotiation, STUN/TURN infrastructure, SDP handling.
- TURN servers are needed as fallback and can be costly at scale (media flows through them).
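- Doesn’t scale well for group calls in a pure P2P mesh: with n participants, each peer maintains n - 1 connections and uploads its own stream n - 1 times (10 participants means 9 uploads per peer and 45 connections in total). SFU (Selective Forwarding Unit) servers solve this: each peer uploads once and the SFU forwards the streams to everyone else.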
- ~95% browser support (slightly less than WebSockets).
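The mesh-versus-SFU trade-off is simple arithmetic. A small Python sketch, under the simplifying assumption that every participant sends one stream and receives everyone else's:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallLoad:
    connections: int         # total media connections in the call
    uploads_per_peer: int    # copies of its own stream each participant sends
    downloads_per_peer: int  # streams each participant receives


def mesh(n: int) -> CallLoad:
    # Every pair of participants is connected directly.
    return CallLoad(connections=n * (n - 1) // 2, uploads_per_peer=n - 1, downloads_per_peer=n - 1)


def sfu(n: int) -> CallLoad:
    # Every participant connects only to the SFU, which forwards the other streams.
    return CallLoad(connections=n, uploads_per_peer=1, downloads_per_peer=n - 1)


for n in (2, 4, 10):
    print(n, mesh(n), sfu(n))
```

At 10 participants the mesh needs 45 connections and 9 uploads per peer, while the SFU needs 10 connections and 1 upload per peer. The download side is the same in both, which is why the SFU's main win is on each peer's upload bandwidth.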
WebRTC + WebSockets together
WebRTC needs a signaling channel to bootstrap the peer connection — WebSockets are the standard choice for this. So in a video calling app:
- WebSockets handle signaling (SDP exchange, ICE candidates, chat messages, presence)
- WebRTC handles the actual audio/video/data once the peer connection is established
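A signaling server only needs to relay messages; it never touches the media. The sketch below is a hypothetical room-based relay in Python. The message types `offer`, `answer`, and `candidate` follow common WebRTC practice, but the JSON shape (`type`, `to`, `from`) is an assumption, and each `send` callback stands in for writing to that peer's WebSocket.

```python
import json
from typing import Callable, Dict

Send = Callable[[str], None]  # stands in for "write this text to the peer's WebSocket"


class SignalingRoom:
    """Hypothetical relay: forwards SDP offers/answers and ICE candidates between peers."""

    RELAYED_TYPES = {"offer", "answer", "candidate"}

    def __init__(self) -> None:
        self.peers: Dict[str, Send] = {}

    def join(self, peer_id: str, send: Send) -> None:
        self.peers[peer_id] = send

    def leave(self, peer_id: str) -> None:
        self.peers.pop(peer_id, None)

    def handle(self, sender_id: str, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("type") not in self.RELAYED_TYPES:
            raise ValueError(f"unsupported signaling message type: {msg.get('type')!r}")
        target = self.peers.get(msg.get("to", ""))
        if target is None:
            return  # recipient has left; dropping the message is a policy choice
        msg["from"] = sender_id  # trust the connection's identity, not the payload
        target(json.dumps(msg))


alice_inbox: list = []
bob_inbox: list = []
room = SignalingRoom()
room.join("alice", alice_inbox.append)
room.join("bob", bob_inbox.append)
# "<sdp>" is a placeholder; a browser would produce it via RTCPeerConnection.createOffer()
room.handle("alice", json.dumps({"type": "offer", "to": "bob", "sdp": "<sdp>"}))
print(json.loads(bob_inbox[0]))
```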
Comparison: WebSockets vs SSE vs Long Polling
| Feature | WebSockets | SSE | Long Polling |
|---|---|---|---|
| Direction | Bidirectional | Server → Client only | Server → Client (simulated) |
| Protocol | ws:// / wss:// | HTTP | HTTP |
| Latency | Very low | Low | Medium |
| Auto-reconnect | Manual | Built-in | Manual |
| Browser support | Excellent | Excellent | Universal |
| Complexity | Higher | Low | Low |
| Binary support | Yes | No (text only) | Yes (any HTTP response body) |
| HTTP/2 compatible | Yes (via RFC 8441; browser/server support varies) | Yes (multiplexed) | Yes |
| Best for | Real-time interactive apps | Server push / streaming | Legacy / low-frequency updates |
When to Use What
| Scenario | Recommended |
|---|---|
| Chat / messaging app | WebSockets |
| Multiplayer game | WebSockets |
| Collaborative editing | WebSockets |
| Live notifications | SSE |
| LLM streaming output | SSE |
| Progress bar for background job | SSE |
| Live metrics dashboard (read-only) | SSE |
| Legacy browser support needed | Long Polling |
| Client sends data frequently too | WebSockets |
| Video/voice calling | WebRTC |
| P2P file transfer | WebRTC |
| Screen sharing | WebRTC |