WebSocket


WebSocket is often described as “HTTP that can talk both ways” or “HTTP better suited for real-time scenarios”. That gets the direction right, but if you stop there, implementation, packet capture, and troubleshooting will go off track. WebSocket does not add a push feature to HTTP. It keeps the browser and the existing Web infrastructure path intact, then replaces HTTP's request/response exchange, which is half-duplex by nature, with a persistent full-duplex channel.

What it really solves is not “how to let the server send something occasionally”. It is “how to keep one connection alive between browser and server where both sides can speak at any time, while still reusing the current HTTP / TCP / TLS deployment reality as much as possible.” That is why the important parts are not the JavaScript API, but the upgrade handshake, connection lifetime, frame boundaries, heartbeats, and middlebox compatibility.

WebSocket is not about “being faster than HTTP”. It first uses HTTP to switch protocols, then turns later communication into a persistent full-duplex connection with clear message boundaries.

Where the Problem Comes From

HTTP is great for request/response, but its default model is always:

The client speaks first
The server replies later
One request maps to one response

That works naturally for page loads, form submissions, and REST APIs. But it starts to struggle with scenarios like these:

Chat messages and collaborative docs need low-latency two-way updates
Market data, alerts, monitoring, and game state need continuous push
Browser and server want to keep one long-lived session instead of repeatedly opening many requests

The early Web certainly tried other approaches:

Polling
Long polling
Comet-style hacks that try to force real-time behavior through an HTTP shell

They are not impossible, but they all run into the same kinds of problems:

Repeated request-header overhead is expensive
Server push is unnatural and still has to start from a client-initiated request
Latency and concurrency cost grow with the number of connections
Implementation and intermediary behavior become awkward

So WebSocket is not trying to “simulate real-time better”. It is trying to let the browser and the server keep one bidirectional channel alive.

The Background It Came From

WebSocket appeared when the Web had already stopped being just document retrieval and had started carrying chat, collaboration, real-time interfaces, and high-frequency interaction. It faced a very concrete reality:

The client is usually a browser, so you cannot just invent a completely new access protocol
The path already contains HTTP proxies, load balancers, TLS termination, and other middleboxes
Developers want to reuse the same domain, the same port, and the same access stack

That directly shapes the design:

The establishment phase stays close to HTTP so it can pass through the existing Web infrastructure
Once the handshake is complete, later traffic should no longer be bound by HTTP request/response semantics
The connection must have clear message boundaries instead of being a raw TCP byte stream
The protocol must consider idle timeouts, heartbeats, and large-scale long-lived connection management

So WebSocket is not a “purer real-time protocol” that ignores the world. It is a very practical compromise for the Web.

Grasp the Main Model First

Separate these layers:

Establishment layer: use HTTP/1.1 Upgrade to switch protocols
Transport layer: usually still TCP, or TLS on top for wss
Session layer: one long-lived connection where both sides can send messages whenever they want
Message layer: framed messages, control frames, closing semantics, and heartbeats

The easiest mistake is to treat the handshake phase and the data phase as one thing. The point of WebSocket is:

The front half looks like HTTP so it fits the real world
The back half is no longer HTTP so it can become a real full-duplex message channel

The Most Common Interaction Path

The usual WebSocket main path can be simplified like this:

The browser sends an HTTP request that declares it wants to upgrade to websocket
The server returns 101 Switching Protocols
Once the upgrade succeeds, both sides stop exchanging data as HTTP request/response
Client and server can both send message frames at any time
Heartbeats, closing frames, and TCP/TLS state maintain the connection lifetime

You can think of it like this:

Client -> Server: GET /chat
                  Upgrade: websocket
                  Connection: Upgrade
                  Sec-WebSocket-Key: ...
                  Sec-WebSocket-Version: 13

Server -> Client: HTTP/1.1 101 Switching Protocols
                  Upgrade: websocket
                  Connection: Upgrade
                  Sec-WebSocket-Accept: ...

Client <-> Server: WebSocket Frames

The most important thing in that path is not the 101 status code itself. It is that the protocol boundary changes there:

Before 101, intermediaries mainly interpret the traffic using HTTP rules
After 101, the two sides are speaking WebSocket frames, not alternating HTTP request/response

So the first step for many problems is to ask: is the issue in the handshake phase, or in the long-lived connection phase after the upgrade?
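The Sec-WebSocket-Key and Sec-WebSocket-Accept headers in the exchange above are tied together by a fixed computation from RFC 6455: the server appends a constant GUID to the key, hashes the result with SHA-1, and Base64-encodes the digest. A minimal sketch of that step follows. The function name compute_accept is only an illustration, and the sample key is the one used in the RFC's own handshake example.

```python
import base64
import hashlib

# Constant GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    material = (sec_websocket_key.strip() + WS_GUID).encode("ascii")
    digest = hashlib.sha1(material).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from the RFC 6455 handshake example.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This value is not authentication. It only proves that the peer understood the WebSocket handshake and did not replay a cached plain HTTP response.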

What It Actually Solves

The core problems WebSocket solves can be compressed into three:

It creates a long-lived bidirectional channel between browser and server
It keeps clear message boundaries instead of exposing only a raw TCP byte stream
It reuses the existing Web access path as much as possible instead of requiring a new port and a new infrastructure path

The things it does not solve should be equally clear:

It does not erase TCP’s reliability cost
It does not automatically provide publish/subscribe, replay, persistence, or offline catch-up
It does not define message semantics, ordering policy, or authentication model for your application

So WebSocket is a good fit for browser-to-server real-time sessions. That does not mean every real-time system should default to WebSocket.

Why It Was Designed This Way

Why upgrade over HTTP instead of opening a brand-new browser protocol

If WebSocket had required browsers to talk a completely new protocol from the start, deployment problems would have appeared immediately:

Proxies and firewalls might not pass it
Developers could not easily reuse existing domains, ports, and TLS setup
It would be harder to connect to the browser security model and the current access path

Using HTTP for the upgrade has real value:

The handshake can pass through the existing Web stack
Load balancers, reverse proxies, and TLS termination layers are easier to reuse
Browser-side development feels more natural

The tradeoff is:

The handshake phase must respect the reality of HTTP middleboxes
Not every proxy can support upgrade and long-lived connections correctly

So WebSocket is not “the purest protocol design”. It is “the design most likely to survive in the Web world”.

Why the post-upgrade traffic can no longer be read as HTTP

Many people instinctively think of WebSocket as “continuous HTTP”. That leads to the wrong mental model immediately.

HTTP’s core is discrete requests and discrete responses. After a successful upgrade, the situation is different:

The connection stays alive
Both sides can send messages actively
Data enters the application as frames, not as alternating HTTP responses and requests

If you keep using HTTP thinking on the later traffic, you will misread a lot of things:

A server-initiated message is not a response
Two client messages in a row are not “two HTTP requests”
A live link does not mean one side is still logically healthy at the business layer

Why there are frames instead of plain TCP bytes

If the upgraded WebSocket connection only exposed a raw TCP byte stream, browser APIs and the application layer would immediately run into problems:

Where does one message start and end
How do text and binary differ
How do heartbeats and close signals stay separate from business messages

That is why WebSocket is a message-frame protocol. The benefits are:

Applications can handle messages instead of raw bytes
Text frames, binary frames, and control frames are clearly separated
Fragmentation and reassembly can stay inside the protocol layer

The tradeoff is also there:

The frame layer itself needs extra format and state logic
Applications still need to understand that message boundaries and business boundaries are not always identical
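To make the frame layer concrete, here is a rough sketch of decoding a single frame from a byte buffer, following the RFC 6455 base framing layout (FIN bit, opcode, mask bit, 7-bit length with 16-bit and 64-bit extensions, optional 4-byte masking key). The names Frame and parse_frame are illustrative, and the sketch ignores the RSV bits, extensions, and fragment reassembly.

```python
import struct
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    fin: bool        # True if this is the final fragment of a message
    opcode: int      # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked: bool     # client-to-server frames are masked
    payload: bytes   # payload after unmasking
    consumed: int    # number of buffer bytes this frame occupied


def parse_frame(buf: bytes) -> Optional[Frame]:
    """Parse one frame from the start of buf, or return None if incomplete."""
    if len(buf) < 2:
        return None
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    pos = 2
    if length == 126:                      # 16-bit extended length
        if len(buf) < pos + 2:
            return None
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:                    # 64-bit extended length
        if len(buf) < pos + 8:
            return None
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    key = b""
    if masked:
        if len(buf) < pos + 4:
            return None
        key = buf[pos:pos + 4]
        pos += 4
    if len(buf) < pos + length:
        return None                        # payload not fully received yet
    payload = buf[pos:pos + length]
    if masked:
        # Unmask by XOR-ing each byte with the repeating 4-byte key.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, masked, payload, pos + length)
```

Returning None for an incomplete buffer is the part that matters in practice: a TCP read can end in the middle of a frame, so the caller has to keep buffering until a whole frame is available.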

Why ping/pong exists instead of relying only on TCP keepalive

TCP keepalive exists, but it is not enough for WebSocket engineering. The reason is practical:

Default probe intervals are often too long
Middle proxies and load balancers may time out idle connections sooner
The application wants to know whether this WebSocket session is still alive, not only whether the socket has physically broken

So WebSocket ping/pong is more like application-visible session keepalive and liveness probing.

It is not only about whether the link is physically up. It answers:

Can this long connection still pass through the middle layers
Is the peer application still responding
Did some proxy reclaim the connection while it was idle
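One way to turn those questions into code is a small timer-driven state tracker that decides when to send a ping and when to declare the peer dead. The sketch below is an assumption about how an application might structure this, not something the protocol prescribes. The class name Heartbeat, its fields, and the choice to treat any inbound frame as proof of life are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heartbeat:
    """Decides when to ping and when to give up on the peer."""
    interval: float                       # seconds of silence before pinging
    timeout: float                        # seconds to wait for a reply to a ping
    last_seen: float = 0.0                # time the last inbound frame arrived
    ping_sent_at: Optional[float] = None  # set while a ping is outstanding

    def on_frame(self, now: float) -> None:
        """Call for every inbound frame, including pong."""
        self.last_seen = now
        self.ping_sent_at = None

    def tick(self, now: float) -> str:
        """Call periodically; returns 'ok', 'send_ping', or 'dead'."""
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at >= self.timeout:
                return "dead"             # ping went unanswered for too long
            return "ok"
        if now - self.last_seen >= self.interval:
            self.ping_sent_at = now
            return "send_ping"
        return "ok"
```

The interval should be shorter than the smallest idle timeout of any proxy or load balancer on the path, otherwise the intermediary reclaims the connection before the first ping is sent.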

Why closing also needs a close frame

If the connection just drops at TCP level, the application cannot easily tell:

Normal business close
Abnormal interruption
Idle timeout reclaimed by an intermediary

The close frame gives the protocol a chance to say “I am ending now, and here is roughly why.” That matters for browsers, servers, and troubleshooting.

It does not guarantee graceful shutdown, because the network can still die suddenly, but on the normal path it at least provides clear ending semantics.
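The “here is roughly why” part has a concrete shape. In RFC 6455 a close frame's payload is an optional 2-byte big-endian status code followed by an optional UTF-8 reason, and control frame payloads are limited to 125 bytes. A minimal sketch of encoding and decoding that payload is shown below. The function names are illustrative.

```python
import struct
from typing import Tuple

# A few status codes defined by RFC 6455.
NORMAL_CLOSURE = 1000   # the purpose of the connection has been fulfilled
GOING_AWAY = 1001       # e.g. server shutting down or page navigating away
NO_STATUS = 1005        # reserved: close frame carried no status code
ABNORMAL = 1006         # reserved: connection dropped without a close frame


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Encode the payload of a close frame."""
    data = struct.pack("!H", code) + reason.encode("utf-8")
    if len(data) > 125:
        raise ValueError("control frame payload must not exceed 125 bytes")
    return data


def parse_close_payload(payload: bytes) -> Tuple[int, str]:
    """Decode a close frame payload into (status code, reason)."""
    if len(payload) == 0:
        return NO_STATUS, ""
    if len(payload) == 1:
        raise ValueError("close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack_from("!H", payload)
    return code, payload[2:].decode("utf-8")
```

Codes 1005 and 1006 are never sent on the wire. They are what an endpoint reports locally when the close frame had no code or when the connection ended without a close frame at all, which is the signal that separates an abnormal interruption from a normal close.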

The Design Choices That Look Less Obvious but Matter A Lot

WebSocket is not a “better push version of HTTP”

If you only think of WebSocket as “the server can push messages”, you are framing it too narrowly. What it really changes is the communication model itself:

From request/response to full-duplex session
From discrete transactions to long-lived connections
From HTTP messages to WebSocket frames

Push is a result, not the essence.

The hardest part is often not the protocol. It is the intermediaries

Many production issues are not WebSocket frame-format problems. They are:

Reverse proxies not forwarding the upgrade headers correctly
Load balancer idle timeouts being too short
TLS termination layers and backends not matching on connection policy
Certain gateways being unfriendly to long-lived connections or high concurrency

So WebSocket troubleshooting cannot stop at the browser and the application server. The middle of the path is often the first place to look.

A long-lived connection does not automatically fit every semantic

WebSocket is easy to turn into a “dump everything into one pipe” bus, but it does not naturally provide:

Topic routing
Message persistence
Catch-up after reconnect
Multi-consumer semantics

If the business needs those, the application protocol, broker, or a higher-level system usually has to provide them. Otherwise “great while connected, messy after reconnect” becomes very common.
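As an illustration of what “the application protocol has to provide them” can mean for catch-up after reconnect, the sketch below assumes a hypothetical application protocol in which the server stamps every message with an increasing seq number and accepts a resume request after reconnect. None of this is part of WebSocket. The message shapes and the name ResumableStream are invented for the example.

```python
from typing import Any, Dict, List


class ResumableStream:
    """Client-side bookkeeping for a hypothetical seq-numbered protocol."""

    def __init__(self) -> None:
        self.last_seq = 0                 # highest seq delivered so far
        self.delivered: List[Any] = []    # payloads handed to the application

    def resume_request(self) -> Dict[str, Any]:
        """First message to send after reconnecting (hypothetical shape)."""
        return {"type": "resume", "after": self.last_seq}

    def on_message(self, msg: Dict[str, Any]) -> bool:
        """Deliver msg once; return False for a replayed duplicate."""
        if msg["seq"] <= self.last_seq:
            return False                  # already seen before the disconnect
        self.last_seq = msg["seq"]
        self.delivered.append(msg["data"])
        return True
```

The point of the design is that the client, not the connection, owns the position in the stream. The server can then replay generously after a reconnect, and duplicates are dropped on arrival.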

WebSocket still runs on TCP, so TCP’s benefits and costs are still there

WebSocket gives you message frames and full duplex, but it does not remove TCP’s in-order delivery, retransmission, and head-of-line blocking cost.

That means:

Weak-network latency spikes can still be amplified by lower-layer retransmissions
Stuffing too many different business concerns into one connection can make them slow each other down
“Real-time” does not mean “no waiting in the transport layer”
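A toy model makes the head-of-line effect visible. Assume each message travels in its own TCP segment and that the list below gives the time each segment finally arrives at the receiver, retransmissions included. Because TCP delivers bytes in order, a message becomes readable only once every earlier segment has also arrived. The function name and the numbers are illustrative, not measurements.

```python
from itertools import accumulate
from typing import List


def delivery_times(arrival_times: List[float]) -> List[float]:
    """Time at which each in-order message becomes readable by the application.

    arrival_times[i] is when segment i reaches the receiver. In-order delivery
    means message i is readable at the latest arrival among segments 0..i.
    """
    return list(accumulate(arrival_times, max))


if __name__ == "__main__":
    # Segment 1 was lost once and retransmitted, arriving at t=50.
    print(delivery_times([10, 50, 20, 30]))  # [10, 50, 50, 50]
```

Messages 2 and 3 arrived on time at t=20 and t=30, yet the application sees them only at t=50. WebSocket framing sits above this behavior and cannot change it.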

If an article calls WebSocket a low-latency real-time protocol without mentioning TCP reality, it is already overstating what the protocol can deliver.

How It Has Evolved

WebSocket did not evolve by throwing away the original model. It kept reinforcing the surrounding deployment and usage reality.

Secure access largely moved to wss
Proxies, reverse proxies, and cloud load balancers gradually improved support for upgrade and long-lived connections
Application-side conventions around reconnection, heartbeats, session recovery, and message idempotency became common practice

So when you read WebSocket today, the important question is not “how much did the frame format change?” The model usually did not change much:

It still starts with HTTP upgrade
It still becomes a long-lived full-duplex message connection afterward
It still has to live with proxies, middleboxes, and TCP reality
Most engineering complexity grows in connection management and application semantics, not in the base frame format

How to Use This Understanding in Engineering

If you are implementing a minimal viable version, what should you get right first

Get these right first:

The upgrade headers in the handshake and the 101 response
Basic handling of text frames, binary frames, close frames, and ping/pong
Connection lifetime management
Heartbeat and idle-timeout policy
The minimum constraints for authentication, reconnection, and message idempotency

If those are not solid, growth in connection count or network jitter will expose the problems very quickly.

What to inspect first in packet capture

The most useful order is usually:

Check whether the HTTP upgrade succeeded
Check who closed the connection first and whether there was a close frame
Check ping/pong and idle duration to see whether an intermediary reclaimed the link
Only then look at business message frames and application semantics
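The first check in that order, whether the HTTP upgrade succeeded, can be sketched as a small validator over the response seen in the capture. It applies the RFC 6455 rules for a valid server handshake: status 101, an Upgrade header of websocket, a Connection header containing upgrade, and a Sec-WebSocket-Accept that matches the key the client sent. The function name and the wording of the problem strings are illustrative.

```python
import base64
import hashlib
from typing import Dict, List

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # constant from RFC 6455


def check_upgrade_response(status: int, headers: Dict[str, str],
                           sent_key: str) -> List[str]:
    """Return a list of problems; an empty list means the upgrade looks valid."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    problems = []
    if status != 101:
        problems.append(f"status is {status}, expected 101")
    if h.get("upgrade", "").lower() != "websocket":
        problems.append("Upgrade header missing or not 'websocket'")
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        problems.append("Connection header does not include 'upgrade'")
    digest = hashlib.sha1((sent_key + WS_GUID).encode("ascii")).digest()
    expected = base64.b64encode(digest).decode("ascii")
    if h.get("sec-websocket-accept") != expected:
        problems.append("Sec-WebSocket-Accept does not match the sent key")
    return problems
```

A response such as status 200 with Connection: keep-alive fails all four checks, which usually points at an intermediary that did not forward the upgrade headers and not at the application server.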

Many “WebSocket instability” incidents are not frame-format bugs. They are:

Upgrade headers not forwarded correctly
Proxy timeouts
Heartbeats missing or too infrequent
State recovery after reconnect poorly designed

The most common troubleshooting misreads

Treating WebSocket as a long HTTP request
Looking only at the browser while ignoring reverse proxies and load balancers
Thinking that once it connects, it will stay stable forever
Thinking that using WebSocket automatically gives you message-system features

Which default assumptions are dangerous in system design

Assuming every real-time requirement fits WebSocket
Assuming one long connection can carry unlimited concerns forever
Assuming that “real-time on top of TCP” is immune to retransmission and congestion
Assuming reconnect alone naturally restores the previous state