Networking Protocols

HTTP (HyperText Transfer Protocol)

The foundation of data communication on the web.
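HTTP/1.1: Connections are persistent by default (keep-alive), but each connection serves one request at a time, so a slow response delays every request queued behind it (head-of-line blocking).
HTTP/2: Multiplexes many concurrent requests over a single TCP connection and compresses headers (HPACK). Still has TCP-level head-of-line blocking: one lost packet stalls every stream on the connection.
HTTP/3: Built on QUIC (UDP-based). Streams are delivered independently, which eliminates TCP head-of-line blocking. Connection setup is faster because QUIC combines the transport and TLS 1.3 handshakes.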
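
A toy model can make head-of-line blocking concrete. The sketch below assumes each request has a fixed service time in milliseconds and ignores bandwidth sharing, packet loss, and connection setup; the function names are illustrative and not part of any library.

```python
from itertools import accumulate
from typing import List


def sequential_finish_times(durations_ms: List[int]) -> List[int]:
    """One request at a time on one connection: each waits for all before it."""
    return list(accumulate(durations_ms))


def multiplexed_finish_times(durations_ms: List[int]) -> List[int]:
    """Idealized multiplexing: every request progresses on its own stream."""
    return list(durations_ms)


# A slow response (3000 ms) followed by two fast ones (100 ms each).
durations_ms = [3000, 100, 100]
print(sequential_finish_times(durations_ms))   # [3000, 3100, 3200]
print(multiplexed_finish_times(durations_ms))  # [3000, 100, 100]
```

In the sequential case the two fast requests finish only after the slow one, which is the blocking that multiplexing is meant to avoid.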

WebSockets

A full-duplex, persistent, bidirectional communication channel over a single TCP connection.

How it works

Client initiates an HTTP handshake with an Upgrade: websocket header.
Server responds with 101 Switching Protocols.
Both sides can now send messages freely at any time without re-establishing the connection.
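
The handshake steps above can be sketched from the server's side. RFC 6455 has the client send a Sec-WebSocket-Key header (not mentioned above), and the server proves it understood the upgrade by returning a Sec-WebSocket-Accept value derived from it. The function names here are illustrative; a real server would also validate the request headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept_key(sec_websocket_key: str) -> str:
    """Return base64(SHA-1(key + GUID)), the Sec-WebSocket-Accept value."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(sec_websocket_key: str) -> str:
    """Build the 101 response that completes the upgrade."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key taken from RFC 6455.
print(build_handshake_response("dGhlIHNhbXBsZSBub25jZQ=="))
```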

Characteristics

Bidirectional: Both client and server can push messages independently.
Low latency: After the initial handshake, each message carries only a small frame header (2 to 14 bytes) instead of full HTTP headers.
Stateful: The connection is long-lived and maintained.
Protocol: ws:// or wss:// (secure).
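
To illustrate how little framing a message needs after the handshake, here is a minimal sketch of encoding an unmasked, unfragmented text frame as a server would send it, following the RFC 6455 frame layout. The function name is an assumption for illustration; client-to-server frames must additionally be masked, which adds a 4-byte masking key.

```python
import struct


def encode_text_frame(message: str) -> bytes:
    """Encode an unmasked, unfragmented text frame (server-to-client)."""
    payload = message.encode("utf-8")
    first_byte = 0x80 | 0x1  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        # Length fits in the 7-bit field of the second byte.
        header = struct.pack("!BB", first_byte, length)
    elif length < 65536:
        # 126 signals a 16-bit extended length.
        header = struct.pack("!BBH", first_byte, 126, length)
    else:
        # 127 signals a 64-bit extended length.
        header = struct.pack("!BBQ", first_byte, 127, length)
    return header + payload


frame = encode_text_frame("Hello")
print(frame.hex())  # 810548656c6c6f
```

A five-character message costs two bytes of framing here, compared with the hundreds of bytes of headers a typical HTTP request carries.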

When to use WebSockets

Real-time, interactive applications where both sides need to send data frequently.
Examples:
- Chat applications (e.g., Slack, WhatsApp Web)
- Multiplayer online games
- Collaborative editing (e.g., Google Docs live cursors)
- Live trading dashboards (stock prices with user interactions)
- Live sports scoreboards with user interactions

Drawbacks

More complex to scale horizontally: connections are stateful, so deployments typically need sticky sessions or a pub/sub layer (e.g., Redis) to deliver messages across server instances.
Not ideal if only the server needs to push data (SSE is simpler for that).
Firewalls/proxies may block WebSocket upgrades.
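
The pub/sub approach to horizontal scaling can be sketched in memory. In this sketch `InMemoryBroker` is a hypothetical stand-in for an external system such as Redis, `ServerNode` models one WebSocket server instance, and client connections are modeled as plain lists acting as inboxes. None of these names come from a real library.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Hypothetical stand-in for an external pub/sub system such as Redis."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subscribers[channel]):
            callback(message)


class ServerNode:
    """One server instance; each connected user is modeled as an inbox list."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.connections: Dict[str, List[str]] = {}
        broker.subscribe("chat", self._deliver_locally)

    def connect(self, user: str) -> None:
        self.connections[user] = []

    def receive_from_client(self, message: str) -> None:
        # Publish rather than deliver directly, so every node sees the message.
        self.broker.publish("chat", message)

    def _deliver_locally(self, message: str) -> None:
        for inbox in self.connections.values():
            inbox.append(message)


broker = InMemoryBroker()
node_a, node_b = ServerNode("a", broker), ServerNode("b", broker)
node_a.connect("alice")
node_b.connect("bob")
node_a.receive_from_client("hi from alice")
print(node_b.connections["bob"])  # ['hi from alice']
```

Bob is connected to a different node than Alice, yet still receives her message because both nodes subscribe to the same channel.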

Server-Sent Events (SSE)

A unidirectional, server-to-client streaming mechanism over a standard HTTP connection.

How it works

Client makes a regular HTTP GET request with Accept: text/event-stream.
Server responds with Content-Type: text/event-stream, keeps the connection open, and streams events as data: ...\n\n formatted text.
Client receives events via the EventSource API in the browser.
If the connection drops, the browser automatically reconnects.
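
The wire format described above is simple enough to sketch by hand. The helper below is a hypothetical name, not a library function; it serializes one event, using the optional `id:` and `event:` fields defined by the event-stream format alongside `data:`.

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream format."""
    lines: List[str] = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload gets its own "data:" field.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("hello")))  # 'data: hello\n\n'
print(repr(format_sse("a\nb", event="update", event_id="7")))
```

A server would write these strings to the open response and flush after each one so the client sees events as they are produced.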

Characteristics

Unidirectional: Only server → client.
Built on HTTP: Works over HTTP/1.1 and HTTP/2. No special protocol upgrade needed.
Auto-reconnect: The browser's EventSource reconnects automatically and sends the Last-Event-ID header so the server can resume from the last event the client received.
Text-based: Events are UTF-8 text. Binary data requires encoding (e.g., base64).
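
What EventSource does with the incoming text can be approximated with a small parser. This is a simplified sketch, assuming `\n` line endings only and ignoring the `retry:` field; `parse_sse` is an illustrative name. It tracks the last seen event ID, which is the value a reconnecting client would send back as Last-Event-ID.

```python
from typing import Dict, Iterator, List, Optional


def parse_sse(stream: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield events from event-stream text, tracking the last event ID."""
    event_type = "message"
    data_lines: List[str] = []
    last_event_id: Optional[str] = None
    for line in stream.split("\n"):
        if line == "":
            # A blank line dispatches the buffered event, if there is one.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines), "id": last_event_id}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_event_id = value  # persists across later events


stream = "id: 1\ndata: first\n\n: keep-alive\n\ndata: second\ndata: line\n\n"
events = list(parse_sse(stream))
print(events)
```

After a dropped connection, a client built on this parser would reconnect with the header `Last-Event-ID: 1`.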

When to use SSE

Server needs to push updates to the client, but the client doesn’t need to send data back frequently.
Examples:
- Live news feeds or social media timelines
- Notification systems
- Progress updates for long-running jobs (e.g., file upload processing)
- Live dashboards (metrics, monitoring) where the client only reads
- AI/LLM streaming responses (e.g., ChatGPT token-by-token output)

Drawbacks

Unidirectional — client must use separate HTTP requests to send data back.
Browsers limit HTTP/1.1 to about 6 concurrent connections per domain, and each SSE stream occupies one of them (not an issue with HTTP/2, which multiplexes streams over one connection).
Text-only by default.

Long Polling

A technique to simulate server push over plain HTTP before WebSockets/SSE were widely supported.

How it works

Client sends an HTTP request.
Server holds the request open until it has new data (or a timeout occurs).
Server responds, client immediately sends another request.
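
The request cycle above can be simulated without a network. In this sketch a `queue.Queue` stands in for the server's source of new data, `handle_poll` plays the server holding a request open, and `poll_loop` plays the client; all names and the 200/204 status convention are assumptions for illustration.

```python
import queue
import threading
from typing import Any, Dict, List


def handle_poll(updates: "queue.Queue[str]", timeout_s: float) -> Dict[str, Any]:
    """Server side: hold the request until data arrives or the timeout fires."""
    try:
        return {"status": 200, "body": updates.get(timeout=timeout_s)}
    except queue.Empty:
        return {"status": 204, "body": None}  # nothing new; client should retry


def poll_loop(updates: "queue.Queue[str]", rounds: int, timeout_s: float) -> List[str]:
    """Client side: issue a new request as soon as the previous one returns."""
    received: List[str] = []
    for _ in range(rounds):
        response = handle_poll(updates, timeout_s)
        if response["status"] == 200:
            received.append(response["body"])
        # On 204 (timeout) the client simply polls again.
    return received


updates: "queue.Queue[str]" = queue.Queue()
# A producer publishes one update shortly after the client starts waiting.
threading.Timer(0.05, updates.put, args=["price=101"]).start()
received = poll_loop(updates, rounds=2, timeout_s=0.5)
print(received)  # ['price=101']
```

The first poll returns as soon as the update arrives; the second is held for the full timeout and returns empty.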

When to use Long Polling

Legacy systems or environments where WebSockets/SSE are not supported.
Low-frequency updates where the overhead of a persistent connection isn’t justified.

Drawbacks

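Higher latency than WebSockets or SSE: every message costs a full HTTP request/response cycle, and data that becomes available between polls waits for the next request.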
More server resource usage (holding open connections).
Largely superseded by SSE and WebSockets.

WebRTC (Web Real-Time Communication)

A set of protocols and browser APIs that enables direct peer-to-peer communication between browsers (or native clients) for audio, video, and arbitrary data, usually without a server relaying the media.

How it works

Connection setup is the complex part. Two peers can’t just connect directly because they’re usually behind NATs and firewalls. The process:

Signaling — peers exchange session descriptions (SDP: what codecs, formats they support) and network candidates via a signaling server. This is typically done over WebSockets.
ICE (Interactive Connectivity Establishment) — each peer gathers its possible network addresses (candidates) and they try to find a path to each other.
STUN — a lightweight server that tells a peer its public IP/port as seen from the outside. Used to attempt a direct connection.
TURN — a relay server used as a fallback when a direct connection can’t be established (e.g., symmetric NATs). Media flows through TURN, so it’s more expensive.
Once connected, media/data flows directly peer-to-peer, bypassing the server entirely (unless the connection fell back to a TURN relay).
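
One way to see why ICE tries direct paths before the relay is the candidate priority formula recommended in RFC 8445. The sketch below uses that RFC's recommended type preferences; "srflx" is a server-reflexive candidate (the public address learned via STUN) and "relay" is a TURN candidate. The function name and default arguments are illustrative, and real ICE agents rank candidate pairs and run connectivity checks rather than sorting single candidates.

```python
# Type preferences recommended by RFC 8445; higher means preferred.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(candidate_type: str, local_preference: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    type_pref = TYPE_PREFERENCE[candidate_type]
    return (type_pref << 24) + (local_preference << 8) + (256 - component_id)


candidates = ["relay", "srflx", "host"]
ranked = sorted(candidates, key=candidate_priority, reverse=True)
print(ranked)                      # ['host', 'srflx', 'relay']
print(candidate_priority("host"))  # 2130706431
```

The relay candidate ranks last, which matches the role of TURN as a fallback.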

Key components

MediaStream — captures audio/video from camera/mic.
RTCPeerConnection — manages the peer connection, codec negotiation, and media transmission.
RTCDataChannel — sends arbitrary data (text, binary) directly between peers. Can be configured as reliable (like TCP) or unreliable (like UDP).

Characteristics

Peer-to-peer: After signaling, no server is in the media path unless a TURN relay or an SFU is used.
UDP-based: Lower latency, tolerates some packet loss (acceptable for video/audio).
Encryption mandatory: Media is encrypted with SRTP (keys negotiated via DTLS), and data channels are encrypted with DTLS.
Complex setup: ICE/STUN/TURN negotiation is significantly more involved than WebSockets.

When to use WebRTC

Any use case requiring real-time audio or video between users.
Examples:
- Video/voice calling (e.g., Google Meet, Zoom web client)
- Screen sharing
- P2P file transfer between browsers
- Low-latency multiplayer games requiring direct peer connections

Drawbacks

Complex to implement — ICE negotiation, STUN/TURN infrastructure, SDP handling.
TURN servers are needed as fallback and can be costly at scale (media flows through them).
Doesn’t scale well for group calls in pure P2P mesh: 10 participants means each peer maintains 9 connections (45 in total) and uploads its media 9 times. SFU (Selective Forwarding Unit) servers are used to solve this.
Browser support is around 95%, slightly lower than for WebSockets.
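
The mesh scaling problem is plain arithmetic, sketched below. The function names are illustrative, and the SFU figure assumes the common setup where each participant keeps a single connection to the forwarding server.

```python
def mesh_connections_per_peer(participants: int) -> int:
    """In a full mesh, each peer connects to every other peer."""
    return max(participants - 1, 0)


def mesh_total_connections(participants: int) -> int:
    """Total distinct peer-to-peer links: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def sfu_connections_per_peer(participants: int) -> int:
    """Assumes one connection from each peer to the SFU."""
    return 1 if participants > 0 else 0


for n in (2, 10, 50):
    print(n, mesh_connections_per_peer(n), mesh_total_connections(n), sfu_connections_per_peer(n))
# 2 1 1 1
# 10 9 45 1
# 50 49 1225 1
```

Per-peer cost grows linearly in a mesh and total links grow quadratically, while the per-peer cost with an SFU stays constant.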

WebRTC + WebSockets together

WebRTC needs a signaling channel to bootstrap the peer connection but does not define one; WebSockets are a common choice. So in a video calling app:

WebSockets handle signaling (SDP exchange, ICE candidates, chat messages, presence)
WebRTC handles the actual audio/video/data once the peer connection is established
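
The signaling half of this split can be sketched as a relay that forwards opaque messages between peers. Everything here is hypothetical: the `SignalingRelay` class, the JSON envelope with `type`, `from`, and `to` fields, and the in-memory outboxes that stand in for WebSocket connections. WebRTC itself does not prescribe a message format.

```python
import json
from typing import Dict, List

ALLOWED_TYPES = {"offer", "answer", "candidate"}


class SignalingRelay:
    """Hypothetical in-memory relay; a real one would push over WebSockets."""

    def __init__(self) -> None:
        self.outboxes: Dict[str, List[str]] = {}

    def register(self, peer_id: str) -> None:
        self.outboxes[peer_id] = []

    def relay(self, raw_message: str) -> bool:
        """Forward a signaling message to its target; return False if rejected."""
        message = json.loads(raw_message)
        target = message.get("to")
        if message.get("type") not in ALLOWED_TYPES or target not in self.outboxes:
            return False
        # The relay never inspects the SDP or candidate payload itself.
        self.outboxes[target].append(raw_message)
        return True


server = SignalingRelay()
server.register("alice")
server.register("bob")
offer = json.dumps({"type": "offer", "from": "alice", "to": "bob", "sdp": "<SDP offer placeholder>"})
delivered = server.relay(offer)
print(delivered, len(server.outboxes["bob"]))  # True 1
```

Once both peers have exchanged an offer, an answer, and their candidates this way, the media itself no longer touches the signaling server.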

Comparison: WebSockets vs SSE vs Long Polling

| Feature | WebSockets | SSE | Long Polling |
| --- | --- | --- | --- |
| Direction | Bidirectional | Server → Client only | Server → Client (simulated) |
| Protocol | `ws://` / `wss://` | HTTP | HTTP |
| Latency | Very low | Low | Medium |
| Auto-reconnect | Manual | Built-in | Manual |
| Browser support | Excellent | Excellent | Universal |
| Complexity | Higher | Low | Low |
| Binary support | Yes | No (text only) | Yes (any HTTP response body) |
| HTTP/2 compatible | Yes (via extended CONNECT, RFC 8441) | Yes (multiplexed) | Yes |
| Best for | Real-time interactive apps | Server push / streaming | Legacy / low-frequency updates |

When to Use What

| Scenario | Recommended |
| --- | --- |
| Chat / messaging app | WebSockets |
| Multiplayer game | WebSockets |
| Collaborative editing | WebSockets |
| Live notifications | SSE |
| LLM streaming output | SSE |
| Progress bar for background job | SSE |
| Live metrics dashboard (read-only) | SSE |
| Legacy browser support needed | Long Polling |
| Client sends data frequently too | WebSockets |
| Video/voice calling | WebRTC |
| P2P file transfer | WebRTC |
| Screen sharing | WebRTC |
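
The table above can be condensed into a rough decision function. This is only one possible reading of the table: the parameter names and the order of the checks are assumptions, and real choices usually weigh more factors than three booleans.

```python
def recommend_transport(
    needs_media_or_p2p: bool,
    client_sends_frequently: bool,
    legacy_only: bool = False,
) -> str:
    """Map coarse requirements to a transport, following the table above."""
    if needs_media_or_p2p:
        return "WebRTC"        # calls, screen sharing, P2P file transfer
    if legacy_only:
        return "Long Polling"  # environments without WebSockets/SSE
    if client_sends_frequently:
        return "WebSockets"    # chat, games, collaborative editing
    return "SSE"               # notifications, progress, LLM streaming


print(recommend_transport(needs_media_or_p2p=False, client_sends_frequently=True))   # WebSockets
print(recommend_transport(needs_media_or_p2p=False, client_sends_frequently=False))  # SSE
```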