
HTTP

HTTP is so common that it is easy to reduce it to “the client sends a request and the server returns a response.” But as soon as you work on caching, proxies, authentication, cross-origin access, long connections, packet capture, or performance debugging, the hard part turns out not to be the start line and headers. It is the fact that HTTP must serve browsers, origins, intermediaries, caches, and later Web applications all at once.

This article treats HTTP as an evolving protocol family. The three main questions are: why it first converged on a shared resource semantic, why intermediaries were always part of the design, and why version evolution keeps trading transmission cost against compatibility. HTML, browser APIs, cookie details, CORS, and HTTP/3 framing are mentioned, but not expanded into separate topics.

HTTP is not mainly about "serving pages". It decouples clients, resources, and intermediaries through shared request semantics so the same interaction model can keep evolving on the open Internet.

Why It Appeared

HTTP was not originally created for today’s backend API world. It was created so distributed document systems could reliably reference and fetch content across heterogeneous machines.

Early networks already had file transfer, remote login, and mail protocols, but none of them fit hypertext very well:

  • Documents needed links to each other, not just file movement
  • Clients needed a uniform way to fetch many resource types
  • Servers had to grow capability gradually instead of requiring everyone to upgrade at once

Once browsers and servers started exchanging not just static files but interactions based on resource requests and semantic handling, a shared interface like HTTP became necessary.

From the beginning, HTTP optimized not just for transfer, but also for:

  • Being simple enough to deploy quickly on the open Internet
  • Being general enough for documents, images, scripts, and later APIs
  • Being loosely coupled enough for proxies, caches, and gateways to sit in the middle
  • Being extensible enough for new headers, methods, and versions to grow over time

Who Built It and Under What Conditions

HTTP began in the CERN Web environment and was later standardized mainly through the IETF. It was never a vendor-specific RPC protocol. It was built for interoperability on the open Internet.

The default conditions were always harsh:

  • Clients and servers came from different operating systems and implementations
  • Links were slow, expensive, unstable, and full of middleboxes
  • Once deployed at scale, the protocol could not be replaced casually

That is why HTTP often looks less pure than a textbook design. It is deeply compatibility-driven:

  • Semantics and transport can evolve separately
  • New capabilities should not break old intermediaries
  • Methods, status codes, and caching semantics should stay stable across versions whenever possible

HTTP did not get replaced by a wholly new semantic system. Instead, it kept adding intermediary support, cache control, connection reuse, and version upgrade mechanisms to the same model. That evolution path is typical for IETF Internet protocols.

The Main Model

The most important mental model is not the shape of a packet. It is this:

  • The client states what it wants to do to which resource
  • The server returns the result of that action
  • Intermediaries may forward, cache, compress, authenticate, or route without changing the business semantic

The key is not “client and server talk directly.” HTTP has always allowed other parties on the path:

  • User Agent: browser, SDK, crawler, CLI client
  • Origin Server: the final authority for the resource semantic
  • Proxy / Gateway / CDN: forward on behalf of client or server, cache nearby, or provide unified access
  • Cache: reuse previous responses according to protocol rules

HTTP really manages two things:

  • Request and response semantics
  • How those semantics move through a path that may contain many intermediaries

A Typical Main Path

When a browser visits https://example.com/articles/http, the logical HTTP path can be simplified as:

  1. The client builds a request for a resource, such as GET /articles/http
  2. The request travels through the browser, local proxy, CDN, or reverse proxy
  3. A cache may return a fresh response immediately
  4. Otherwise the request reaches the origin, which decides status code, response headers, and body
  5. The response travels back along the same path, while intermediaries may log, compress, cache, or add forwarding metadata
  6. The client then decides what to do next based on status code, cache policy, content type, and security context

Simplified:

Client
  -> CDN / Reverse Proxy: GET /articles/http

CDN / Reverse Proxy
  -> Origin Server: GET /articles/http

Origin Server
  -> CDN / Reverse Proxy: 200 OK + Cache-Control + Content-Type + Body

CDN / Reverse Proxy
  -> Client: 200 OK + ...
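
The path above can be sketched as a toy reverse proxy sitting in front of an origin. This is a minimal illustration, not a real proxy: the names `Response`, `origin`, and `ReverseProxy` are hypothetical, time is passed in explicitly, and only `max-age` is honored.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict
    body: bytes


def origin(method: str, path: str) -> Response:
    """Hypothetical origin server: the final authority for the resource."""
    if method == "GET" and path == "/articles/http":
        return Response(
            200,
            {"Content-Type": "text/html", "Cache-Control": "max-age=60"},
            b"<h1>HTTP</h1>",
        )
    return Response(404, {"Content-Type": "text/plain"}, b"not found")


def max_age(cache_control: str) -> int:
    """Extract max-age in seconds; 0 means 'do not reuse' in this sketch."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return 0


@dataclass
class ReverseProxy:
    """Toy CDN / reverse proxy that caches GET responses by path."""
    store: dict = field(default_factory=dict)  # path -> (expires_at, Response)
    origin_hits: int = 0

    def handle(self, method: str, path: str, now: float) -> Response:
        cached = self.store.get(path)
        if method == "GET" and cached and now < cached[0]:
            return cached[1]  # step 3: fresh copy, the origin is not contacted
        self.origin_hits += 1
        resp = origin(method, path)  # step 4: the origin decides the response
        ttl = max_age(resp.headers.get("Cache-Control", ""))
        if method == "GET" and resp.status == 200 and ttl > 0:
            self.store[path] = (now + ttl, resp)  # step 5: cache on the way back
        return resp
```

Calling `handle("GET", "/articles/http", now=0.0)` and then again at `now=30.0` reaches the origin only once; a third call after the 60-second window goes back to the origin.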

HTTP decides:

  • Which resource this is
  • What action the client wants
  • Whether the response can be cached
  • Whether this representation is still fresh
  • Whether an intermediary can safely do part of the work on behalf of the origin

Without those shared semantics, intermediaries would be hard to trust, and the Web would be much harder to scale.

Why It Was Designed This Way

Why Resource + Method, Not “Every Service Invents Its Own Dialect”

HTTP chose a shared set of methods and resource identifiers instead of letting every service invent its own interaction format. The main benefit is shared semantics.

The meaning of GET is not just “read data on one site.” Across the Web, it means “retrieve a representation.” PUT, DELETE, HEAD, conditional requests, and range requests all follow the same idea. Once semantics are shared, caches, proxies, browsers, and debugging tools know how to participate.
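
One way to see why shared method semantics matter is to write down what a generic intermediary can assume without knowing the application. The sketch below encodes the "safe" and "idempotent" properties that RFC 9110 assigns to the common methods; the retry policy in `may_retry_automatically` is a hypothetical example, not something the specification mandates.

```python
from typing import NamedTuple


class MethodTraits(NamedTuple):
    safe: bool        # the client does not ask the server to change state
    idempotent: bool  # repeating the request has the same intended effect


# Properties of the standard methods as defined in RFC 9110.
METHODS = {
    "GET": MethodTraits(safe=True, idempotent=True),
    "HEAD": MethodTraits(safe=True, idempotent=True),
    "OPTIONS": MethodTraits(safe=True, idempotent=True),
    "PUT": MethodTraits(safe=False, idempotent=True),
    "DELETE": MethodTraits(safe=False, idempotent=True),
    "POST": MethodTraits(safe=False, idempotent=False),
}


def may_retry_automatically(method: str) -> bool:
    """Hypothetical proxy policy: only retry methods known to be idempotent."""
    traits = METHODS.get(method.upper())
    return traits is not None and traits.idempotent


def may_prefetch(method: str) -> bool:
    """Hypothetical crawler policy: only issue safe methods speculatively."""
    traits = METHODS.get(method.upper())
    return traits is not None and traits.safe
```

A service that tunnels everything through POST gives up both decisions: the proxy cannot retry and the crawler cannot prefetch, because POST promises neither property.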

The downside is also real:

  • Not every application action fits neatly into a small set of methods
  • Application teams often turn HTTP into a tunnel and end up using only POST
  • Many systems run on HTTP but do not really use HTTP semantics well

The hard part is often not that the specification is too weak. It is that real application logic may not want to model itself in that shared semantic.

Why It Is Naturally Stateless

HTTP’s request-response model treats each request as independently understandable. That is not because sessions do not matter, but because stateless requests are much easier for open networks, intermediaries, and servers to handle.

Benefits:

  • Easier horizontal scaling
  • Easier forwarding and caching
  • A single connection drop does not necessarily destroy the whole business state

The costs remain:

  • Login state, carts, and preferences still exist
  • State moves into cookies, tokens, sessions, and application storage
  • Debugging must distinguish “HTTP itself is stateless” from “the business is absolutely not”

Cookies are often treated as part of HTTP, but they are really a session mechanism layered on top of HTTP’s stateless model.
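
A minimal sketch of that layering, using the standard library's `http.cookies` parser: the handler keeps nothing between calls, and everything it knows about the user comes from the `Cookie` header plus application-side storage. The `SESSIONS` table, the `session_id` cookie name, and the `handle` function are assumptions made for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical application storage. HTTP itself keeps none of this.
SESSIONS = {"abc123": {"user": "alice", "cart": ["book"]}}


def handle(headers: dict) -> str:
    """Answer one request using only what that request carries."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    morsel = cookie.get("session_id")
    session = SESSIONS.get(morsel.value) if morsel else None
    if session is None:
        return "anonymous"
    return f"{session['user']} has {len(session['cart'])} item(s)"
```

Because the handler is a pure function of the request and shared storage, any server replica can answer it, which is the horizontal-scaling benefit listed above.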

Why Caching Is Core, Not an Add-On

Many protocols define correctness first and treat caching as an optimization. HTTP does not. Caching has been central for a long time because repeated reads on an open network are too expensive to always send back to the origin.

Cache-Control, ETag, Last-Modified, If-None-Match, and 304 Not Modified all answer the same question: how can we reuse a previous response without requiring everyone to stay strongly consistent?
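
The revalidation exchange can be sketched as a single server-side function. This is a simplified illustration: `make_etag` and `conditional_get` are hypothetical names, the ETag is derived from a hash of the body (one possible scheme among many), and only `If-None-Match` is handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the body; real servers may use other schemes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    if "*" in candidates or etag in candidates:
        return 304, headers, b""  # the client's copy is still valid; no body
    return 200, headers, body
```

The first request returns 200 with an `ETag`. Sending that value back in `If-None-Match` yields 304 with an empty body as long as the resource is unchanged, and 200 with the new body once it changes.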

In engineering terms:

  • 200 OK is not always more “successful” than 304 Not Modified; it is just a different cost model
  • Many performance problems are not about the origin being slow, but about a cache policy that rarely produces hits
  • Many consistency problems are not “the data is wrong,” but “different nodes are in different freshness windows”
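
The last point can be made concrete with a small freshness check. The sketch assumes two hypothetical edge nodes, an origin that changed the resource at t=100, and a simplified age calculation (`now - stored_at`) that ignores the `Age` header and heuristic freshness.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def is_fresh(cache_control: str, stored_at: float, now: float) -> bool:
    """A stored response is fresh while its age is below max-age."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age", "")
    if not max_age.isdigit():
        return False  # heuristic freshness is out of scope for this sketch
    return (now - stored_at) < int(max_age)


# Two hypothetical edge nodes that cached the same URL at different times.
POLICY = "public, max-age=300"
node_a = {"stored_at": 0, "version": "v1"}    # cached before the change
node_b = {"stored_at": 250, "version": "v2"}  # cached after the change


def served_version(node: dict, now: float) -> str:
    if is_fresh(POLICY, node["stored_at"], now):
        return node["version"]
    return "revalidate"
```

At t=280 node A still serves `v1` and node B serves `v2`. Both are behaving correctly under the policy; the difference comes from where each node sits in its freshness window.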

HTTP here is similar to DNS: it accepts limited inconsistency in exchange for system-wide scalability.

Why Intermediaries Never Went Away

HTTP was never designed only for direct browser-to-origin traffic. Proxies, gateways, load balancers, reverse proxies, and CDNs exist not because deployment happened to need them, but because HTTP semantics naturally allow them to take part.

Intermediaries can:

  • Cache static resources
  • Terminate TLS and forward internally
  • Centralize authentication, rate limiting, logging, and compression
  • Route by Host, path, headers, or method

Metadata like Host, Via, X-Forwarded-For, and Forwarded matters a lot in real systems. Many “the same URL behaves differently” problems are not in application code at all. They are caused by rewriting, caching, or routing somewhere in the middle.
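
As a rough illustration of how that metadata accumulates, the sketch below shows what a forwarding hop might do to the request headers. The function name, the proxy names, and the use of a plain dict for headers are all assumptions; real proxies differ in which of these headers they set or strip.

```python
def add_forwarding_headers(headers: dict, client_ip: str, proxy_name: str) -> dict:
    """Return a copy of the headers as one forwarding hop might rewrite them."""
    out = dict(headers)

    # X-Forwarded-For is a de facto convention: each hop appends the address
    # it received the request from.
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip

    # Via records the protocol version and the identity of each hop.
    hop = f"1.1 {proxy_name}"
    via = out.get("Via")
    out["Via"] = f"{via}, {hop}" if via else hop
    return out
```

After two hops, `X-Forwarded-For` lists the original client followed by the first proxy's address, which is why an application should only trust the entries appended by proxies it controls.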

Design Choices That Look Awkward but Matter

Why Does an Application Protocol Carry So Much History

Because HTTP is deployed everywhere. Browsers, servers, proxies, CDNs, enterprise gateways, and mobile networks all depend on it. A version upgrade that requires every link in the chain to change at the same time is nearly impossible in practice.

HTTP has therefore evolved in a very specific way:

  • Keep the original semantics as much as possible
  • Allow transport-layer changes to be more aggressive in new versions
  • Preserve compatibility through negotiation and fallback

HTTP/2 replaced the text-based message framing with binary frames and added multiplexing, but kept methods, status codes, and header semantics largely intact. HTTP/3 moved transport to QUIC, but still tries to preserve the same semantic boundary. These are not totally new protocols replacing old ones. They are the same protocol family changing its transport strategy.
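
Negotiation and fallback can be sketched as a small selection function in the spirit of TLS ALPN, where the client offers protocol identifiers such as `h2` and `http/1.1` and the server picks one. The function below is a simplification under stated assumptions: it falls back to `http/1.1` when nothing matches, whereas a real TLS stack may handle a mismatch differently.

```python
def negotiate(client_offers: list[str], server_preference: list[str]) -> str:
    """Pick the first server-preferred protocol that the client also offered."""
    for protocol in server_preference:
        if protocol in client_offers:
            return protocol
    # Assumption for this sketch: an unmatched or absent offer means HTTP/1.1.
    return "http/1.1"
```

An old client that offers nothing and a new client that offers `h2` can both talk to the same server, and both see the same methods, status codes, and headers once the connection is up.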

Why Host Was Needed Even Though URLs Already Exist

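The URL does contain the host name, but an early HTTP request line carried only the path. The host name was used for DNS resolution and never reached the server. When one server meant one site, this was not a big deal. But once virtual hosting became common, one IP had to serve multiple domains. Without an explicit signal saying “which host name do I want,” the server could not route correctly on the same listening address.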

Host looks like a patch, but it captures the shift from “small document site” to “shared Internet infrastructure.” Reverse proxies, CDNs, and multi-tenant deployments still route on it, and SNI later solved the same problem for TLS by naming the intended host during the handshake.
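
A minimal sketch of name-based virtual hosting: two requests with the same request line arrive on the same address and are told apart only by `Host`. The `SITES` table and the function names are hypothetical, and the parser is deliberately naive (no IPv6 literals, no repeated headers).

```python
# Hypothetical mapping from host name to the site that should answer.
SITES = {"blog.example.com": "blog", "shop.example.com": "shop"}


def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request head into method, target, and headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


def route(raw: bytes) -> str:
    """Choose a site from the Host header alone."""
    _method, _target, headers = parse_request(raw)
    host = headers.get("host", "").split(":")[0].lower()  # drop an optional port
    if not host:
        return "400 Bad Request"  # HTTP/1.1 requires a Host header
    return SITES.get(host, "default")
```

Both `GET /index.html` requests look identical on the request line; only the `Host` value decides whether `blog` or `shop` answers.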

Why HTTP/1.1’s Long Connections Were Not Enough

Because the problem was no longer just “connections are expensive.” A page could now fetch many resources in parallel, and HTTP/1.1’s serial request patterns and application-layer concurrency tricks started to hurt.

HTTP/2 mainly solved:

  • Too much header repetition
  • No real multiplexing on one connection
  • Browsers needing many TCP connections for parallel resource loading

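But HTTP/2 still runs on TCP. Because TCP delivers bytes strictly in order, one lost packet stalls every stream on the same connection until it is retransmitted. HTTP/3 goes further by moving multiplexing semantics down to QUIC streams, so a loss delays only the streams whose data was in the lost packet, reducing transport-layer head-of-line blocking.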
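
The difference can be illustrated with a toy model of one lost packet. This is not a transport implementation: `PACKETS` is a made-up trace of three streams interleaved on one connection, and the two functions only show which data a receiver could hand to the application before the retransmission arrives.

```python
# (connection-wide sequence number, stream id, per-stream offset)
PACKETS = [(0, "A", 0), (1, "B", 0), (2, "A", 1), (3, "B", 1), (4, "C", 0)]


def deliverable_single_ordered_stream(packets, lost_seq):
    """TCP-like: data is released in connection order, so a gap stalls the rest."""
    delivered = []
    for seq, stream, offset in packets:
        if seq == lost_seq:
            break  # everything after the gap waits for the retransmission
        delivered.append((stream, offset))
    return delivered


def deliverable_independent_streams(packets, lost_seq):
    """QUIC-like: ordering is per stream, so only the affected stream waits."""
    blocked = set()
    delivered = []
    for seq, stream, offset in packets:
        if seq == lost_seq:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append((stream, offset))
    return delivered
```

With packet 1 lost, the single ordered stream can deliver only the first piece of A, while the independent-stream model still delivers all of A and C and holds back only B.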

The main line never changed: HTTP version evolution is mostly about reducing transport cost, not replacing resource semantics.

How to Read HTTP in Real Engineering Work

When implementing, capturing, or debugging, HTTP is most often misunderstood in three ways.

First, do not think of HTTP as just the application talking directly to the origin. The first question is always: which intermediaries are in the path, which one terminates the connection, which one caches, and which one rewrites headers?

Second, do not read a status code alone. 301, 304, 401, 403, 404, 429, 502, and 503 all need to be interpreted together with the method, cache policy, authentication context, and reverse-proxy logs.

Third, do not treat a protocol upgrade as “newer must be faster.” Whether HTTP/2 or HTTP/3 is better depends on path quality, intermediary support, multiplexing gains, header size, loss behavior, and deployment maturity.

In short, HTTP should be understood as:

  • A resource interaction semantic
  • An open protocol designed for large intermediary participation
  • A protocol family that keeps evolving to fit real deployments

HTTP has not been replaced by some “more modern” protocol. What the Web depends on is not only its ability to carry data, but also the fact that it has already turned semantics, caching, intermediaries, and evolution into infrastructure.

Further Reading

  • TCP: the transport layer that HTTP/1.1 and HTTP/2 have long depended on
  • HTTPS: how HTTP enters a TLS-secured channel
  • QUIC: why HTTP/3 moved onto a new transport layer
