HTTP

HTTP is so common that it is easy to reduce it to “the client sends a request and the server returns a response.” But as soon as you work on caching, proxies, authentication, cross-origin access, long connections, packet capture, or performance debugging, the hard part turns out not to be the start line and headers. It is the fact that HTTP must serve browsers, origins, intermediaries, caches, and later Web applications all at once.

This article treats HTTP as an evolving protocol family. It focuses on three questions: why HTTP first converged on shared resource semantics, why intermediaries were always part of the design, and why each version upgrade trades transmission cost against compatibility. HTML, browser APIs, cookie details, CORS, and HTTP/3 framing come up along the way but are not expanded into separate topics.

HTTP is not mainly about “serving pages.” It decouples clients, resources, and intermediaries through shared request semantics, so the same interaction model can keep evolving on the open Internet.

Why It Appeared

HTTP was not originally created for today’s world of backend APIs. It was created so distributed document systems could reliably reference and fetch content across heterogeneous machines.

Early networks already had file transfer, remote login, and mail protocols, but none of them fit hypertext very well:

Documents needed links to each other, not just file movement
Clients needed a uniform way to fetch many resource types
Servers had to grow capability gradually instead of requiring everyone to upgrade at once

Once browsers and servers were exchanging not just static files but requests whose handling depended on which resource was named and what action was asked for, a shared interface like HTTP became necessary.

From the beginning, HTTP optimized not just for transfer, but also for:

Being simple enough to deploy quickly on the open Internet
Being general enough for documents, images, scripts, and later APIs
Being loosely coupled enough for proxies, caches, and gateways to sit in the middle
Being extensible enough for new headers, methods, and versions to grow over time

Who Built It and Under What Conditions

HTTP began in the CERN Web environment and was later standardized mainly through the IETF. It was never a vendor-specific RPC protocol. It was built for interoperability on the open Internet.

The default conditions were always harsh:

Clients and servers came from different operating systems and implementations
Links were slow, expensive, unstable, and full of middleboxes
Once deployed at scale, the protocol could not be replaced casually

That is why HTTP often looks less clean than a textbook design. It is driven by compatibility at its core:

Semantics and transport can evolve separately
New capabilities should not break old intermediaries
Methods, status codes, and caching semantics should stay stable across versions whenever possible

HTTP did not get replaced by a wholly new semantic system. Instead, it kept adding intermediary support, cache control, connection reuse, and version upgrade mechanisms to the same model. That evolution path is typical for IETF Internet protocols.

The Main Model

The most important mental model is not the shape of a packet. It is this:

The client states what it wants to do to which resource
The server returns the result of that action
Intermediaries may forward, cache, compress, authenticate, or route without changing the business semantics

The key is not “client and server talk directly.” HTTP has always allowed other parties on the path:

User Agent: browser, SDK, crawler, CLI client
Origin Server: the final authority for the resource semantic
Proxy / Gateway / CDN: forward on behalf of client or server, cache nearby, or provide unified access
Cache: reuse previous responses according to protocol rules

HTTP really manages two things:

Request and response semantics
How those semantics move through a path that may contain many intermediaries

A Typical Main Path

When a browser visits https://example.com/articles/http, the logical HTTP path can be simplified as:

The client builds a request for a resource, such as GET /articles/http
The request leaves the browser and may pass through a local proxy, a CDN, or a reverse proxy
Any cache on the way, including the browser’s own, may return a stored response immediately if it is still fresh
Otherwise the request reaches the origin, which decides status code, response headers, and body
The response travels back along the same path, while intermediaries may log, compress, cache, or add forwarding metadata
The client then decides what to do next based on status code, cache policy, content type, and security context

Simplified:

Client
  -> CDN / Reverse Proxy: GET /articles/http

CDN / Reverse Proxy
  -> Origin Server: GET /articles/http

Origin Server
  -> CDN / Reverse Proxy: 200 OK + Cache-Control + Content-Type + Body

CDN / Reverse Proxy
  -> Client: 200 OK + ...

HTTP decides:

Which resource this is
What action the client wants
Whether the response can be cached
Whether this representation is still fresh
Whether an intermediary can safely do part of the work on behalf of the origin

Without those shared semantics, intermediaries would be hard to trust, and the Web would be much harder to scale.
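The cache step in this path can be sketched as a toy shared cache. This is a minimal sketch under stated assumptions, not how any real CDN works: `Origin`, `CachingProxy`, and `max_age` are hypothetical names, only `Cache-Control: max-age` (plus refusing `no-store` and `private`) is honored, and the current time is passed in as `now` so the freshness decision is explicit.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    headers: dict[str, str]
    body: str


def max_age(resp: Response) -> int | None:
    """Freshness lifetime in seconds from Cache-Control, or None if not storable."""
    cc = resp.headers.get("Cache-Control", "")
    if "no-store" in cc or "private" in cc:
        return None  # a shared cache must not store these
    m = re.search(r"\bmax-age=(\d+)", cc)
    return int(m.group(1)) if m else None


class Origin:
    """Hypothetical origin server that counts how often it is reached."""

    def __init__(self) -> None:
        self.hits = 0

    def handle(self, method: str, path: str) -> Response:
        self.hits += 1
        return Response(200, {"Cache-Control": "max-age=60",
                              "Content-Type": "text/html"}, f"<h1>{path}</h1>")


class CachingProxy:
    """Toy shared cache (CDN / reverse proxy) in front of an origin."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: dict[str, tuple[Response, float]] = {}

    def handle(self, method: str, path: str, now: float) -> Response:
        if method == "GET" and path in self.store:
            resp, stored_at = self.store[path]
            age = now - stored_at
            lifetime = max_age(resp)
            if lifetime is not None and age < lifetime:
                # Fresh: answer locally and report how old the copy is.
                return Response(resp.status, dict(resp.headers, Age=str(int(age))), resp.body)
        # Miss or stale: forward to the origin and store a cacheable result.
        resp = self.origin.handle(method, path)
        if method == "GET" and resp.status == 200 and max_age(resp) is not None:
            self.store[path] = (resp, now)
        return resp
```

A request at `now=30` is answered by the proxy with `Age: 30`, while one at `now=61` goes back to the origin because the stored copy is no longer fresh. Real caches also handle `s-maxage`, `Vary`, `Expires`, revalidation, and much more.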

Why It Was Designed This Way

Why Resource + Method, Not “Every Service Invents Its Own Dialect”

HTTP chose a shared set of methods and resource identifiers instead of letting every service invent its own interaction format. The main benefit is shared semantics.

The meaning of GET is not just “read data on one site.” Across the Web, it means “retrieve a representation.” PUT, DELETE, HEAD, conditional requests, and range requests all follow the same idea. Once semantics are shared, caches, proxies, browsers, and debugging tools know how to participate.
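Shared method semantics are what let generic software act without knowing the application. As a small illustration, the sets below encode the safe and idempotent methods defined in RFC 9110. The two helper functions are hypothetical, but the rules they encode (idempotent requests may be repeated after a connection failure, safe requests may be sent speculatively by crawlers and prefetchers) come from that specification.

```python
# Method properties from RFC 9110, Section 9.2.
SAFE = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT = SAFE | {"PUT", "DELETE"}


def may_retry_automatically(method: str) -> bool:
    """Idempotent requests can be replayed after a connection failure,
    because sending them twice has the same intended effect as once."""
    return method.upper() in IDEMPOTENT


def may_prefetch(method: str) -> bool:
    """Safe requests are not expected to change server state, so crawlers
    and prefetchers can send them speculatively."""
    return method.upper() in SAFE
```

POST and PATCH fall outside both sets, which is one reason a proxy cannot blindly retry them.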

The downside is also real:

Not every application action fits neatly into a small set of methods
Application teams often turn HTTP into a tunnel and end up using only POST
Many systems run on HTTP but do not really use HTTP semantics well

The hard part is often not that the specification is too weak. It is that real application logic does not always fit those shared semantics, or its designers do not want to model it that way.

Why It Is Naturally Stateless

HTTP’s request-response model treats each request as independently understandable. That is not because sessions do not matter, but because stateless requests are much easier for open networks, intermediaries, and servers to handle.

Benefits:

Easier horizontal scaling
Easier forwarding and caching
A single connection drop does not necessarily destroy the whole business state

The costs remain:

Login state, carts, and preferences still exist
State moves into cookies, tokens, sessions, and application storage
Debugging must keep “HTTP itself is stateless” separate from “the application has no state,” which is almost never true

Cookies are often treated as part of HTTP, but they are really a session mechanism layered on top of HTTP’s stateless model.
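A rough sketch of how session state rides on top of stateless requests, using Python’s standard `http.cookies` module. `SESSIONS` and `handle` are hypothetical: an in-memory session store and a request handler that sees only the request headers. Each request carries its link to earlier requests in the `Cookie` header, so any server that can reach the store could answer it.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store; real systems use a shared database or cache.
SESSIONS: dict[str, dict[str, int]] = {}


def handle(request_headers: dict[str, str]) -> tuple[int, dict[str, str], str]:
    """Each request is understood on its own: the only link to earlier
    requests is the session ID carried in the Cookie header."""
    cookie = SimpleCookie()
    cookie.load(request_headers.get("Cookie", ""))
    if "sid" in cookie and cookie["sid"].value in SESSIONS:
        session = SESSIONS[cookie["sid"].value]
        session["visits"] += 1
        return 200, {}, f"visits={session['visits']}"
    # No known session: create one and hand the ID to the client.
    sid = secrets.token_hex(8)
    SESSIONS[sid] = {"visits": 1}
    out = SimpleCookie()
    out["sid"] = sid
    out["sid"]["httponly"] = True
    return 200, {"Set-Cookie": out["sid"].OutputString()}, "visits=1"
```

The first call returns a `Set-Cookie` header; replaying its `sid=...` pair in a `Cookie` header on the next request finds the same session.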

Why Caching Is Core, Not an Add-On

Many protocols define correctness first and treat caching as an optimization. HTTP does not. Caching has been central for a long time because repeated reads on an open network are too expensive to always send back to the origin.

Cache-Control, ETag, Last-Modified, If-None-Match, and 304 Not Modified all answer the same question: how can we reuse a previous response without requiring everyone to stay strongly consistent?
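To make the validator flow concrete, here is a minimal origin-side sketch of `ETag` plus `If-None-Match`. Assumptions: `serve` and `make_etag` are hypothetical names, the ETag is a truncated SHA-256 of the body (any stable fingerprint would do), and header parsing is simplified (it splits on commas and ignores edge cases).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong validator derived from the content (truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve(body: bytes, request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    etag = make_etag(body)
    # no-cache: caches may store the response but must revalidate before reuse.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            # The client's stored copy is still valid: headers only, no body.
            return 304, headers, b""
    return 200, headers, body
```

The 304 still costs a round trip to the origin but skips the body, which is exactly the trade between reuse and freshness that these headers negotiate.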

In engineering terms:

200 OK is not always more “successful” than 304 Not Modified; it is just a different cost model
Many performance problems are not about a slow origin, but about a cache policy that never takes effect
Many consistency problems are not “the data is wrong,” but “different nodes are in different freshness windows”

HTTP here is similar to DNS: it accepts limited inconsistency in exchange for system-wide scalability.

Why Intermediaries Never Went Away

HTTP was never designed only for direct browser-to-origin traffic. Proxies, gateways, load balancers, reverse proxies, and CDNs exist not because deployment happened to need them, but because HTTP semantics naturally allow them to take part.

Intermediaries can:

Cache static resources
Terminate TLS and forward internally
Centralize authentication, rate limiting, logging, and compression
Route by Host, path, headers, or method

Metadata like Host, Via, X-Forwarded-For, and Forwarded matters a lot in real systems. Many “the same URL behaves differently” problems are not in application code at all. They are caused by rewriting, caching, or routing somewhere in the middle.
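A sketch of the metadata a forwarding hop adds, assuming each proxy appends the address it saw to `X-Forwarded-For` (a de facto convention, not part of the HTTP specification) and its own name or pseudonym to `Via` (defined in RFC 9110). `forward` and the hop names are hypothetical.

```python
def forward(headers: dict[str, str], client_ip: str, proxy_id: str) -> dict[str, str]:
    """Headers a proxy sends upstream after receiving a request from client_ip."""
    out = dict(headers)  # copy: each hop builds its own outgoing message
    xff = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{xff}, {client_ip}" if xff else client_ip
    # Via: received protocol version plus a name or pseudonym for this hop.
    hop = f"1.1 {proxy_id}"
    via = out.get("Via")
    out["Via"] = f"{via}, {hop}" if via else hop
    return out
```

Only the last `X-Forwarded-For` entry was observed directly by the proxy that added it; earlier entries are whatever the previous hop claimed, which is why servers should trust them only from known proxies. The standardized `Forwarded` header (RFC 7239) carries the same kind of information in a structured form.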

Design Choices That Look Awkward but Matter

Why Does an Application Protocol Carry So Much History

Because HTTP is deployed everywhere. Browsers, servers, proxies, CDNs, enterprise gateways, and mobile networks all depend on it. A version upgrade that requires every link in the chain to change at the same time is nearly impossible in practice.

HTTP has therefore evolved in a very specific way:

Keep the original semantics as much as possible
Allow transport-layer changes to be more aggressive in new versions
Preserve compatibility through negotiation and fallback

HTTP/2 rewrote message framing and multiplexing, but kept methods, status codes, and header semantics largely intact. HTTP/3 moved transport to QUIC, but still tries to preserve the same semantic boundary. These are not totally new protocols replacing old ones. They are the same protocol family changing its transport strategy.
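Over TLS, the choice between HTTP/1.1 and HTTP/2 is usually made with ALPN: the client lists the protocols it speaks (`h2`, `http/1.1`) and the server picks one. The function below is a toy model of that decision under stated assumptions: the server’s preference order wins (a common choice, not a requirement), a client that sends no ALPN is treated as HTTP/1.1, and no overlap is reported as `None` where a real TLS stack would abort the handshake. HTTP/3 is discovered differently, for example through an `Alt-Svc` response header, and is not modeled here. `select_protocol` is a hypothetical name.

```python
from __future__ import annotations


def select_protocol(client_offers: list[str] | None, server_prefs: list[str]) -> str | None:
    """Toy ALPN decision for HTTP over TLS (protocol IDs as in RFC 7301)."""
    if client_offers is None:
        # No ALPN extension at all: fall back to HTTP/1.1.
        return "http/1.1"
    for proto in server_prefs:  # server preference order wins here
        if proto in client_offers:
            return proto
    return None  # no overlap: a real TLS stack would abort the handshake
```

An old client that only offers `http/1.1` still gets a working connection from a server that prefers `h2`, which is the negotiation-and-fallback pattern in miniature.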

Why `Host` Was Needed Even Though URLs Already Exist

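When one IP address meant one site, nobody needed to name the host. A client talking directly to an origin sends only the path in the request line (GET /articles/http), not the full URL, so the connection itself identified the site. Once virtual hosting became common and one IP had to serve multiple domains, the server needed an explicit signal saying “which host name do I want” to route correctly on the same listening address.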

Host looks like a patch, but it captures the shift from “small document site” to “shared Internet infrastructure.” Reverse proxies, SNI, CDNs, and multi-tenant deployments still depend on the same patch-like evolution.
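A minimal sketch of name-based virtual hosting: parse the request line and headers of a raw HTTP/1.1 request, then choose a site by `Host`. The `SITES` table and `route` function are hypothetical, IPv6 literal hosts and duplicate headers are not handled, and answering 404 for an unknown host is one choice among several (many servers fall back to a default site instead). The 400 for a missing `Host` follows RFC 9112, which requires the header in HTTP/1.1.

```python
# Hypothetical virtual hosts sharing one IP address and port.
SITES = {
    "example.com": "/srv/example",
    "blog.example.com": "/srv/blog",
}


def route(raw_request: bytes) -> tuple[int, str]:
    """Return (status, file path) for a raw HTTP/1.1 request, chosen by Host."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    host = headers.get("host")
    if host is None:
        return 400, ""  # RFC 9112: an HTTP/1.1 request without Host gets 400
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    root = SITES.get(hostname)
    if root is None:
        return 404, ""  # one possible policy; many servers use a default site
    return 200, root + target
```

Two requests with the identical request line `GET /articles/http HTTP/1.1` land in different document roots depending only on `Host`.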

Why HTTP/1.1’s Long Connections Were Not Enough

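Because the problem was no longer just that connections are expensive. A page could now fetch many resources at once, but an HTTP/1.1 connection delivers responses one at a time and in request order, so browsers opened several connections per host and sites relied on tricks such as domain sharding and asset bundling. Those workarounds started to hurt.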

HTTP/2 mainly solved:

Too much header repetition
No real multiplexing on one connection
Browsers needing many TCP connections for parallel resource loading

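But HTTP/2 still runs on TCP, which delivers a single ordered byte stream. If one packet is lost, data for every stream that arrives after it is held back until the retransmission. HTTP/3 goes further by moving stream multiplexing into QUIC, where a loss only holds back the stream whose data was in the lost packet, removing transport-layer head-of-line blocking between streams.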
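The difference can be shown with a deliberately simplified model: a list of stream IDs in the order their packets were sent, and one lost packet. Over TCP the receiver must deliver bytes in order, so everything after the gap waits; with per-stream ordering, as in QUIC, only later data of the affected stream waits. `stalled_streams` is a hypothetical helper, and the model ignores congestion control, retransmission timing, and packets that carry data for several streams.

```python
def stalled_streams(send_order: list[int], lost_index: int, per_stream_ordering: bool) -> set[int]:
    """Streams with data held back until the packet at lost_index is retransmitted.

    send_order[i] is the stream ID carried by packet i (one stream per packet).
    """
    lost_stream = send_order[lost_index]
    stalled = set()
    for i in range(lost_index, len(send_order)):
        if per_stream_ordering and send_order[i] != lost_stream:
            continue  # QUIC-style: other streams keep flowing
        # Held back: in TCP mode every later packet, in per-stream mode only the lost stream's.
        stalled.add(send_order[i])
    return stalled


packets = [1, 2, 3, 1, 2, 3]
print(stalled_streams(packets, 1, per_stream_ordering=False))  # HTTP/2 over TCP: {1, 2, 3}
print(stalled_streams(packets, 1, per_stream_ordering=True))   # HTTP/3 over QUIC: {2}
```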

The main line never changed: HTTP version evolution is mostly about reducing transport cost, not replacing resource semantics.

How to Read HTTP in Real Engineering Work

When implementing, capturing, or debugging, HTTP is most often misunderstood in three ways.

First, do not think of HTTP as just the application talking directly to the origin. The first question is always: which intermediaries are in the path, which one terminates the connection, which one caches, and which one rewrites headers?

Second, do not read a status code alone. 301, 304, 401, 403, 404, 429, 502, and 503 all need to be interpreted together with the method, cache policy, authentication context, and reverse-proxy logs.

Third, do not treat a protocol upgrade as “newer must be faster.” Whether HTTP/2 or HTTP/3 is better depends on path quality, intermediary support, multiplexing gains, header size, loss behavior, and deployment maturity.
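For the first point, response headers often reveal which hop produced an answer. This is a rough triage helper, not a diagnostic tool: `path_hints` is hypothetical and only reads signals with standard meaning. `Via` lists intermediaries that chose to announce themselves, an `Age` header means the response came from a cache rather than fresh from the origin (RFC 9111), and 502 and 504 are defined as a gateway reporting a problem with its upstream.

```python
def path_hints(status: int, headers: dict[str, str]) -> list[str]:
    """Rough hints about which hop produced a response, from standard headers."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    hints = []
    if "via" in h:
        hops = [hop.strip() for hop in h["via"].split(",")]
        hints.append(f"passed through {len(hops)} declared hop(s): {', '.join(hops)}")
    if "age" in h:
        hints.append(f"served from a cache, about {h['age']}s old")
    if status == 304:
        hints.append("revalidation: check which validators the client sent")
    if status in (502, 504):
        hints.append("a gateway reported an upstream problem: check proxy logs too")
    return hints


print(path_hints(200, {"Via": "1.1 edge, 1.1 shield", "Age": "42"}))
```

An empty result does not prove a direct path, since intermediaries are not required to add `Via`.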

In one sentence, HTTP should be understood as:

A shared set of semantics for interacting with resources
An open protocol designed for large intermediary participation
A protocol family that keeps evolving to fit real deployments

HTTP has not been replaced by some “more modern” protocol. What the Web depends on is not only its ability to carry data, but also the fact that it has already turned semantics, caching, intermediaries, and evolution into infrastructure.
