CoAP

CoAP is often described as “a lightweight HTTP over UDP.” That is not wrong, but it only helps you remember the name. It is not enough to judge an implementation, a packet capture, or a debugging session. The real difficulty is that CoAP is not facing browsers and cloud services. It is facing device networks with battery-powered nodes, small memory, lossy links, and constrained packet sizes. In that environment, the HTTP assumptions of reliable connections, long messages, and relatively rich endpoints often do not hold.

The hard part is not the method names or the option list either. It is how CoAP compresses REST semantics into a constrained network: it tries to preserve URI, methods, status codes, and caching, all of which have already been validated in the Web world, but it cannot carry over TCP, large messages, or high state overhead at the same time. So the result looks like this: the resource model feels like HTTP, but the transport reality is closer to “make interactions as solid as possible on an unreliable datagram network.”

The core of CoAP is not to reinvent HTTP for device networks. It is to use the smallest possible message mechanism over UDP to carry resource-access semantics on constrained endpoints, while splitting reliability, deduplication, asynchronous notifications, and large-message handling into several mechanisms that cooperate but do different jobs.

Where the Problem Comes From

IoT devices also need capabilities such as “read a resource,” “change a parameter,” and “subscribe to a state change.” If every device were powerful enough, running HTTP/TLS directly would seem like the easiest path, because:

The method semantics are already mature
The URI model is clear
Proxies, caches, and status codes are widely understood
Cloud-side systems can reuse existing Web infrastructure more easily

But once you move onto low-power devices and low-power networks, that path quickly runs into trouble.

Many devices have very limited memory and code space, so a full HTTP stack is not cheap
The link MTU is small, so long headers and long text formats are too expensive
Network quality is unstable, so connection maintenance and reconnects are costly
Devices often sleep, so they cannot stay online like a server all the time
Multi-hop low-power networks care about every byte and every round trip

So what CoAP needs to solve is not “how do we make another request-response protocol,” but “how do we preserve the most valuable part of the Web interaction model under these constraints?”

Who Built It and Under What Background

CoAP came out of the IETF work on constrained RESTful environments (the CoRE working group). The background is explicit: it targets constrained nodes and constrained networks, especially environments with low power, low bandwidth, high packet loss, and very small messages. If you look at it together with technologies such as 6LoWPAN and RPL from the same era, the positioning becomes clear.

The default world assumed by the designers looks roughly like this:

Endpoints are not general-purpose computers, but resource-constrained sensors, actuators, and edge nodes
The network is not stable Ethernet, but may be slow, lossy, and possibly multi-hop
Interactions cannot assume the connection is always there, and too much state cannot be pushed onto the device
But the system still wants to reuse Web ideas such as resource naming, method semantics, and caching

This is also where CoAP and MQTT differ sharply. MQTT explicitly places a central broker in the middle of the system. CoAP is more like putting “resource access” directly back between endpoints, while reworking the transport reality so it fits constrained networks.

The Main Model

Do not start with option encoding yet. The most important parts of CoAP to keep in your head first are these layers of responsibility:

Resource semantics: GET, POST, PUT, DELETE, URI, response codes, and caching
Message-layer semantics: acknowledgments, retransmission, deduplication, and message types
Request-matching semantics: which request goes with which response
Extension mechanisms: observing resource changes, block-wise transfer, and proxies and caches

The easiest mistake is to mix these responsibilities into one layer. CoAP very deliberately does not make a single field do all the jobs.

Token matches a response back to a request
Message ID handles message deduplication and ACK pairing
Confirmable / Non-confirmable decides whether the message needs acknowledgment
Method codes and response codes carry resource semantics, not transport reliability

If you do not separate those objects, packet capture analysis quickly becomes misleading.
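
To make the separation concrete, here is a minimal sketch of decoding the fixed 4-byte CoAP header plus the Token, following the layout in RFC 7252. The names `Header` and `parse_header` are made up for this example, and option and payload parsing are deliberately left out.

```python
import struct
from dataclasses import dataclass

# Message type is a 2-bit field in the first header byte.
TYPE_NAMES = {0: "CON", 1: "NON", 2: "ACK", 3: "RST"}


@dataclass
class Header:
    version: int
    type_name: str    # message layer: does this need an ACK, is it an ACK?
    code: str         # resource layer: "0.01" is GET, "2.05" is Content
    message_id: int   # message layer: ACK pairing and deduplication
    token: bytes      # request layer: which request a response belongs to


def parse_header(datagram: bytes) -> Header:
    """Decode the fixed header and Token of one CoAP datagram (illustrative helper)."""
    if len(datagram) < 4:
        raise ValueError("a CoAP message is at least 4 bytes long")
    first, code, message_id = struct.unpack("!BBH", datagram[:4])
    token_length = first & 0x0F
    if token_length > 8:
        raise ValueError("token lengths 9 to 15 are reserved")
    if len(datagram) < 4 + token_length:
        raise ValueError("datagram is shorter than its declared token")
    return Header(
        version=first >> 6,
        type_name=TYPE_NAMES[(first >> 4) & 0x03],
        code=f"{code >> 5}.{code & 0x1F:02d}",  # 3-bit class, 5-bit detail
        message_id=message_id,
        token=datagram[4:4 + token_length],
    )


# An ACK carrying 2.05 Content with MID=0x1001 and Token=0x52.
print(parse_header(bytes([0x61, 0x45, 0x10, 0x01, 0x52])))
```

Each field in the result answers a different question, which is why reading a capture field by field works better than reading it as one blob.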

Follow One Typical Interaction

The most common CoAP flow can be compressed like this:

The client builds a CON-type GET
The message carries the URI-related options and a Token for matching the response
The server replies with an ACK, either an empty one or one that already carries the response (piggybacked)
If the response takes longer, the server first sends an empty ACK and then later sends the actual response as a separate message
The client uses the Token to map the response back to the correct request

Logically, there are two common paths.

The first is a piggybacked response, where the result is fast enough to be carried inside the ACK:

Client -> Server: CON GET /temp, MID=0x1001, Token=0x52
Server -> Client: ACK 2.05 Content, MID=0x1001, Token=0x52, Payload=23.4

The second is a separate response, where the server first confirms receipt and later sends the result:

Client -> Server: CON GET /fw/version, MID=0x1002, Token=0x77
Server -> Client: ACK, MID=0x1002
Server -> Client: CON 2.05 Content, MID=0x8A01, Token=0x77, Payload="1.3.8"
Client -> Server: ACK, MID=0x8A01

This main flow already exposes the most important tradeoffs in CoAP:

There is no connection setup before the exchange. Reliability is handled directly at the message level
It allows “I have received your request” and “the business result is ready” to be separated
It splits “request matching” and “message acknowledgment” into two different mechanisms instead of merging them into one sequence number

That all looks much more like a design for unstable, low-state networks than for rich endpoints on a stable long connection.
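
The two paths can be sketched from the client side. This is a simplified model, not a real stack: the `Client` class, its method names, and the event strings are hypothetical, retransmission timers are omitted, and the MID 0x8A01 used for the separate response is an arbitrary value the server would pick.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    # Message layer: MIDs of CON requests that have not been ACKed yet.
    awaiting_ack: dict = field(default_factory=dict)
    # Request layer: Tokens of requests that have no response yet.
    awaiting_response: dict = field(default_factory=dict)

    def sent_con_request(self, mid, token, description):
        self.awaiting_ack[mid] = token
        self.awaiting_response[token] = description

    def on_message(self, msg_type, code, mid, token, payload=b""):
        """Return a list of human-readable events for one incoming message."""
        events = []
        if msg_type == "ACK":
            # ACKs are paired by Message ID and only stop retransmission.
            if self.awaiting_ack.pop(mid, None) is not None:
                events.append(f"stop retransmitting MID 0x{mid:04X}")
            if code == "0.00":
                return events  # empty ACK: the result comes later
        # Anything carrying a response code is matched by Token.
        request = self.awaiting_response.pop(token, None)
        if request is None:
            events.append("response matches no open request")
            return events
        events.append(f"{request} -> {code} {payload!r}")
        if msg_type == "CON":
            # A separate response sent as CON needs its own ACK.
            events.append(f"send empty ACK for MID 0x{mid:04X}")
        return events


client = Client()
client.sent_con_request(0x1002, b"\x77", "GET /fw/version")
print(client.on_message("ACK", "0.00", 0x1002, b""))
print(client.on_message("CON", "2.05", 0x8A01, b"\x77", b"1.3.8"))
```

Note that the second message arrives with a Message ID the client has never seen. Only the Token ties it back to the original GET.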

What It Actually Solves

CoAP solves three core problems:

It preserves a resource-oriented interaction model on constrained devices
It adds the minimum recoverable semantics on top of a connectionless datagram transport such as UDP
It lets asynchronous events, caching, and proxies fit into the same model

It also does not solve these things, and that matters just as much:

It does not guarantee that the underlying link is reliable
It does not automatically provide end-to-end security
It does not define the business object model or device semantics for you
It is not optimized for large file transfer or streaming interaction

So CoAP is a good fit for “resource access + constrained network + small and frequent interactions,” not for “all IoT communication should default to it.”

Why It Was Designed This Way

Why keep REST semantics instead of inventing a device-specific command protocol

If CoAP only cared about saving bytes, it could have been a pure binary command protocol: resource IDs, opcodes, and result codes, all defined by device-specific conventions. But that would quickly create two problems:

Systems would not interoperate well, and gateways and proxies would not have a common understanding
Cloud-side, management-side, and device-side systems would end up with many duplicated but incompatible resource models

Keeping Web resource, method, status-code, and caching semantics is valuable because:

“Read a resource” and “change a resource” mean the same thing across systems
Proxies, caches, and discovery can be combined more easily
When a gateway maps CoAP to HTTP or the other way around, the semantic mapping is more natural

The cost is also real:

More general semantics mean extra option and status-code machinery in the encoding
Beginners can easily assume that because it looks like HTTP, it should also be implemented like HTTP

CoAP is not about abandoning the Web. It keeps the most valuable abstractions and removes the transport assumptions that do not fit constrained networks.

Why sit on UDP instead of directly reusing TCP

The era and target environment of CoAP make it care most about small messages, low state, and application-controlled message exchange. For those scenarios, putting TCP underneath is often not worth it:

Connection establishment and maintenance add extra state cost
If one segment is lost, everything queued behind it on the connection waits for recovery, which is heavier than retransmitting a single datagram
In multi-hop low-power networks, both byte overhead and round-trip overhead are highly sensitive
Many interactions are short enough that it is not worth carrying a full connection model for each resource access

Of course, choosing UDP also means several things do not come for free:

Loss and duplication must be handled explicitly
Ordering cannot be assumed
Path MTU and message size become constant practical constraints

So CoAP does not choose UDP because “UDP is faster.” It chooses UDP because it wants to shrink reliable interaction down to the minimum message-layer mechanism and leave the remaining choices to upper-layer logic.

Why `Token` and `Message ID` must be separate

This is one of the most important CoAP design choices to understand clearly.

If one sequence number had to do acknowledgment pairing, duplicate detection, and request-response matching all at once, it would look smaller on paper, but it would immediately collide with real interactions:

A request may first receive an empty ACK, and the real response may arrive later
In proxy and retry paths, message-level behavior and request-level behavior are not always one-to-one
Asynchronous notifications and later responses need to continue the request context, but should not inherit the original message deduplication semantics

So CoAP separates the responsibilities:

Message ID is for the message exchange itself and mainly serves acknowledgment and deduplication
Token is for the request context and mainly serves to return the response to the correct request

This is not as neat as “one field does everything,” but it is much more stable in practice. The same is true when you capture packets: if an ACK does not line up with a response, that does not necessarily mean there is an error, because ACKs are matched at the message level, while the business response still has to be matched by Token.
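
The message-level half of that split can be sketched from the server side: duplicates are recognized by peer address and Message ID alone, without looking at the Token. The `DedupCache` class and its `handle` signature are hypothetical. The 247-second retention is the default EXCHANGE_LIFETIME derived in RFC 7252, and time is passed in explicitly so the example stays deterministic.

```python
EXCHANGE_LIFETIME = 247.0  # seconds, default value from RFC 7252


class DedupCache:
    """Remember recent (peer, Message ID) pairs and replay the cached reply."""

    def __init__(self):
        self._seen = {}  # (peer, mid) -> (expiry time, cached reply)

    def handle(self, peer, mid, request, process, now):
        """Return (reply, was_duplicate). `process` runs at most once per message."""
        # Drop entries whose lifetime has passed so the MID can be reused.
        self._seen = {k: v for k, v in self._seen.items() if v[0] > now}
        key = (peer, mid)
        if key in self._seen:
            return self._seen[key][1], True
        reply = process(request)
        self._seen[key] = (now + EXCHANGE_LIFETIME, reply)
        return reply, False


executed = []


def unlock(request):
    # Stand-in for a non-idempotent action that must not run twice.
    executed.append(request)
    return "2.04 Changed"


cache = DedupCache()
peer = ("192.0.2.7", 5683)  # documentation address, purely illustrative
print(cache.handle(peer, 0x2001, "POST /lock", unlock, now=0.0))
print(cache.handle(peer, 0x2001, "POST /lock", unlock, now=3.0))  # retransmission
print(len(executed))
```

A retransmitted CON gets the same reply again, but the action behind it runs only once. The Token plays no part in that decision.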

Why reliability is opt-in per message (Confirmable) instead of mandatory for every message

In CoAP’s operating environment, many data items are simply not worth the cost of acknowledgment and retransmission. A periodic temperature reading may not matter if one sample is lost, but an “unlock” or “close the valve” command can become an incident if it is lost.

So CoAP does not make every message strongly reliable. It splits them into:

CON: confirmation required, retransmit on timeout
NON: no confirmation required, lower interaction cost

The value of this design is that reliability can be layered by message type and business meaning instead of being one-size-fits-all across the whole protocol.

The tradeoff is:

The application must genuinely decide which messages need reliability and which do not
In packet captures, you cannot just ask “did an ACK appear?” without first checking whether the message was CON. A NON request gets no ACK, although it normally still gets a response
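
What “retransmit on timeout” costs can be estimated with a short sketch. The parameter names and default values below come from RFC 7252 (ACK_TIMEOUT of 2 seconds, ACK_RANDOM_FACTOR of 1.5, MAX_RETRANSMIT of 4). The function name `retransmission_schedule` is made up, and a real stack would use timers rather than a precomputed list.

```python
import random

ACK_TIMEOUT = 2.0        # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4


def retransmission_schedule(rng=random):
    """Return the waits after each transmission of one CON message.

    The first MAX_RETRANSMIT waits each end in a retransmission.
    The final wait ends in giving up on the exchange.
    """
    # The initial timeout is randomized so that nodes do not retry in lockstep.
    timeout = rng.uniform(ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR)
    waits = []
    for _ in range(MAX_RETRANSMIT + 1):
        waits.append(timeout)
        timeout *= 2  # binary exponential backoff
    return waits


waits = retransmission_schedule()
print("waits:", [round(w, 2) for w in waits])
print("last retransmission after", round(sum(waits[:-1]), 2), "s")
print("give up after", round(sum(waits), 2), "s")
```

With the defaults, the last retransmission happens at most 45 seconds after the first transmission and the sender gives up after at most 93 seconds. For a sleepy sensor that is a long time to keep state and stay awake, which is one reason not to mark everything CON.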

Why separate responses are allowed

If CoAP forced every request to return a result immediately, like the simplest request-response protocol, devices would have a hard time in many situations:

Reading a sensor may need to wait for a sample
Some actions only produce a result later
A node may want to confirm request receipt first and then schedule the work afterward

The value of a separate response is that “I have received the request” and “the business result is ready” can be split apart. That means the server does not need to stay stuck in one synchronous processing step, which is also a better fit for task scheduling on constrained devices.

The cost is:

The client cannot treat the ACK as final business completion
The implementation must clearly distinguish transport acknowledgment from resource response completion

Design Choices That Do Not Look Obvious at First

ACK is not a synonym for “success”

One of the easiest mistakes for CoAP beginners is to read ACK as “the server has successfully executed the request.” That is only approximately true on the piggybacked-response fast path.

More precisely:

ACK only means that a CON message has been received by the peer at the message layer
Whether the business action succeeded depends on the later response code and payload

If you merge those two layers, debugging becomes full of false conclusions: “the server already ACKed, so why does the client still say failure?” The answer is often that the message layer succeeded while the resource layer failed, or the separate response has not arrived yet.
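
One way to keep the two layers apart is to interpret the message type and the code byte separately. The sketch below assumes the RFC 7252 code layout (3-bit class, 5-bit detail). The function name `interpret` and its wording are invented for illustration.

```python
def interpret(msg_type: str, code: int) -> str:
    """Describe what one reply means, keeping message layer and resource layer apart."""
    if msg_type == "RST":
        return "peer rejected the message"
    if code == 0:
        # Code 0.00 is the empty message: pure message-layer signalling.
        if msg_type == "ACK":
            return "received at the message layer, result still pending"
        return "empty message"
    code_class, detail = code >> 5, code & 0x1F
    label = f"{code_class}.{detail:02d}"
    if code_class == 2:
        return f"success ({label})"
    if code_class == 4:
        return f"client error ({label})"
    if code_class == 5:
        return f"server error ({label})"
    return f"not a response code ({label})"


print(interpret("ACK", 0x00))  # empty ACK
print(interpret("ACK", 0x45))  # piggybacked 2.05 Content
print(interpret("ACK", 0x84))  # piggybacked 4.04 Not Found
```

The third case is the one that confuses people: the message was ACKed, so the message layer succeeded, yet the resource layer reports a failure in the same datagram.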

Observe is not just polling reduction. It fits resource changes into the existing request model

Observe is often described in a shallow way as “subscription.” What is clever about it is not just that it reduces polling. It is that it does not invent a completely separate message system. The client still starts by sending a request for a resource, but it adds a declaration: I do not only want the current value, I also want to keep receiving changes.

The advantages are:

It still uses resource URI, methods, and response semantics
Proxies, caches, and resource models do not need to be split by a second event protocol
The client can enter asynchronous notifications with the familiar “request a resource” mental model

The cost is that Observe implementation and debugging depend more on context. A stream of notifications is not a set of independent new requests. It is the continuation of one observation relationship.
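
Because notifications travel over UDP, they can arrive out of order, so the client has to decide whether an incoming notification is newer than the one it already holds. The sketch below follows the freshness rule from RFC 7641 (24-bit Observe values with wraparound, plus a 128-second age cutoff). The `Observation` class and its method names are hypothetical.

```python
def is_fresher(v1, t1, v2, t2):
    """True if a notification (Observe value v2, arrival time t2) is newer
    than the latest accepted one (v1, t1). Times are in seconds."""
    return (
        (v1 < v2 and v2 - v1 < 2**23)
        or (v1 > v2 and v1 - v2 > 2**23)  # the 24-bit counter wrapped around
        or t2 > t1 + 128                  # old state is too old to compare against
    )


class Observation:
    """One observation relationship, identified by the Token of the original GET."""

    def __init__(self, token):
        self.token = token
        self.latest = None  # (observe value, arrival time, payload)

    def on_notification(self, token, observe, payload, now):
        """Accept the notification if it belongs here and is fresh."""
        if token != self.token:
            return False
        if self.latest is not None:
            v1, t1, _ = self.latest
            if not is_fresher(v1, t1, observe, now):
                return False  # reordered or stale: keep the newer state
        self.latest = (observe, now, payload)
        return True


obs = Observation(token=b"\x52")
print(obs.on_notification(b"\x52", 11, b"23.5", now=1.0))
print(obs.on_notification(b"\x52", 10, b"23.4", now=1.2))  # arrived late
print(obs.latest)
```

Every notification reuses the Token of the original request, which is what makes the stream a continuation of one relationship rather than a series of unrelated messages.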

Block-wise shows that CoAP was never mainly about moving large content

CoAP can carry larger content, but it does not pretend that the small-message limit does not exist. Block-wise transfer is basically an admission of reality: message size is limited, and large payloads cannot simply be sent all at once.

The point of this design is not “CoAP can also transfer big files.” It is:

When you occasionally must transfer a larger resource, the protocol still has a workable path
But that path itself reminds you that CoAP’s main job is still small messages and small resources

If a system relies on large payloads, frequent block transfers, and complicated reassembly all the time, it is usually worth asking whether the main protocol choice was wrong.
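
The bookkeeping behind block-wise transfer is small, which is part of the point. The sketch below encodes and decodes the Block option value as defined in RFC 7959: a block number, a “more blocks follow” bit, and a size exponent SZX where the block size is 2 to the power of (SZX + 4). The helper names are invented for this example, and the request/response sequencing around the blocks is not shown.

```python
def encode_block(num: int, more: bool, szx: int) -> int:
    """Pack block number, more flag, and size exponent into one option value."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be 0..6 (block sizes 16..1024 bytes)")
    if not 0 <= num < 2**20:
        raise ValueError("block number must fit in 20 bits")
    return (num << 4) | (int(more) << 3) | szx


def decode_block(value: int):
    """Return (block number, more flag, block size in bytes)."""
    return value >> 4, bool(value & 0x08), 2 ** ((value & 0x07) + 4)


def split_payload(payload: bytes, szx: int):
    """Yield (block option value, chunk) pairs for one payload."""
    size = 2 ** (szx + 4)
    count = max(1, -(-len(payload) // size))  # ceiling division, at least one block
    for num in range(count):
        chunk = payload[num * size:(num + 1) * size]
        yield encode_block(num, num < count - 1, szx), chunk


# A 150-byte resource sent in 64-byte blocks (SZX = 2).
for value, chunk in split_payload(bytes(150), szx=2):
    print(hex(value), decode_block(value), len(chunk))
```

Three round trips for 150 bytes is workable for an occasional transfer. At firmware-image sizes the same arithmetic shows why heavy reliance on block-wise transfer deserves a second look.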

Proxies and caches are not side features. They are an important part of preserving Web semantics in CoAP

CoAP looks device-side, but from the beginning it was not limited to a “private device bus protocol.” Resource, URI, method, and response-code semantics are valuable in large part because they can travel through proxies, caches, and cross-protocol gateways.

That means CoAP’s design goal is not only point-to-point communication. It also includes:

Semantic mapping between constrained networks and larger systems
Intermediate nodes taking on caching and forwarding to reduce load on low-power networks

That is also why treating CoAP only as “a lower-bandwidth device RPC” is too narrow.

How It Evolved

CoAP’s evolution did not overthrow the original resource model and message model. It kept that foundation and added the repeated real-world needs that showed up over time.

Observe reduces polling pressure by tracking resource changes
Block-wise allows larger resources to be exchanged in segments
Security for more sensitive scenarios can be layered on in ways that fit constrained environments, such as DTLS underneath CoAP or object-level protection with OSCORE
Proxies, discovery, and cross-protocol integration keep getting stronger, which improves how CoAP connects with the HTTP/Web world

So when you look at CoAP today, the question is not “how many more options did it grow?” It is whether those extensions changed its core judgment. Most of the time, they did not:

The resource model is still there
The minimum message reliability on UDP is still there
The division of labor between Token and Message ID is still there
The extensions mainly improve asynchronous observation, large-message handling, and real deployment capability

How to Use This Understanding in Practice

If you are implementing a minimal usable version, what should you get right first?

Get these things right first, then worry about the more complicated extensions:

Basic parsing of methods, URI, response codes, and options
The semantics of the four message types: CON, NON, ACK, and RST
Token matching for request and response
Message ID deduplication and ACK pairing
Timeout retransmission and duplicate-message handling for CON

If these basic behaviors are not stable, Observe, block-wise transfer, and proxy scenarios will all become shaky as well.
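
As a starting point for the first item, here is a minimal sketch that serializes a CON GET with Uri-Path options, using the delta encoding from RFC 7252 (Uri-Path is option number 11). The function names are made up, and the sketch only covers option deltas and lengths up to 12, so longer path segments would need the extended encoding that is not shown here.

```python
import struct

URI_PATH = 11  # option number for Uri-Path


def encode_option(delta: int, value: bytes) -> bytes:
    """Encode one option whose delta and length both fit in a 4-bit nibble."""
    if delta > 12 or len(value) > 12:
        raise ValueError("extended option delta/length is not covered by this sketch")
    return bytes([(delta << 4) | len(value)]) + value


def build_con_get(message_id: int, token: bytes, path: str) -> bytes:
    """Serialize a Confirmable GET for the given path."""
    if len(token) > 8:
        raise ValueError("a Token is at most 8 bytes")
    # Version 1, type 0 (CON), token length, code 0.01 (GET), Message ID.
    header = struct.pack("!BBH", (1 << 6) | (0 << 4) | len(token), 0x01, message_id)
    options = b""
    previous = 0
    for segment in filter(None, path.split("/")):
        # Each option stores the difference to the previous option number.
        options += encode_option(URI_PATH - previous, segment.encode("utf-8"))
        previous = URI_PATH
    return header + token + options


print(build_con_get(0x1001, b"\x52", "/temp").hex())
print(build_con_get(0x1002, b"\x77", "/fw/version").hex())
```

The first request is ten bytes on the wire. The second shows the delta encoding at work: the second path segment is encoded with delta 0 because it repeats the same option number.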

When you capture packets, what should you check first?

The most informative order is usually:

First check the message type: is it CON, NON, ACK, or RST?
Then check the Token: which request does the response actually belong to?
Then check the Message ID: did retransmission, deduplication, or ACK pairing happen?
Only then look at options and payload: is the resource semantic correct?

Many CoAP problems initially look like “the server did not reply.” Once you break them apart, they often become completely different kinds of issues:

The request never reached the peer, so neither an ACK nor an RST came back
The ACK arrived, but the business response did not
The response arrived, but the Token did not match
The packet was too large, and block-wise transfer or path MTU became the problem
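
The checking order above can be turned into a rough triage helper for a decoded capture. This is only a sketch under simplifying assumptions: the tuple layout, the `triage` function, and the finding texts are all invented here, each Token is expected to receive a single response (so Observe streams would be misreported), and timing is ignored.

```python
def triage(capture):
    """capture: list of (sender, msg_type, code, mid, token) in arrival order.
    Returns a list of findings as strings."""
    findings = []

    # Message type and Message ID: retransmissions and missing ACKs.
    con_counts = {}
    replied = set()
    for sender, msg_type, code, mid, token in capture:
        if msg_type == "CON":
            con_counts[(sender, mid)] = con_counts.get((sender, mid), 0) + 1
        elif msg_type in ("ACK", "RST"):
            replied.add((sender, mid))
    for (sender, mid), count in con_counts.items():
        if count > 1:
            findings.append(f"{sender} sent CON MID 0x{mid:04X} {count} times (retransmission)")
        if not any(other != sender and m == mid for other, m in replied):
            findings.append(f"CON MID 0x{mid:04X} from {sender} was never ACKed or reset")

    # Token: which requests have responses, and which responses have requests.
    open_tokens = {}
    for sender, msg_type, code, mid, token in capture:
        code_class = code >> 5
        if code_class == 0 and code != 0:      # method code: a request
            open_tokens[token] = sender
        elif code_class >= 2:                  # response code
            if open_tokens.pop(token, None) is None:
                findings.append(f"response with Token {token.hex()} matches no request")
    for token in open_tokens:
        findings.append(f"request with Token {token.hex()} has no response yet")
    return findings


capture = [
    ("client", "CON", 0x01, 0x1002, b"\x77"),  # GET
    ("client", "CON", 0x01, 0x1002, b"\x77"),  # same MID again
    ("server", "ACK", 0x00, 0x1002, b""),      # empty ACK
]
for finding in triage(capture):
    print(finding)
```

For this capture the helper reports a retransmission and an open request, which is the “ACK arrived, but the business response did not” case rather than a silent server.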

What are the most common misjudgments during debugging?

Treating ACK as final success
Treating Message ID as the request identifier instead of the message identifier
Looking only at the payload and ignoring message type and retransmission
Ignoring block-wise transfer and link constraints in large-payload scenarios
Misreading timing problems caused by device sleep, NAT timeout, or link jitter as application-logic errors

What default assumptions are the most dangerous in system design?

Assuming all messages should be reliable
Assuming every device should be exposed as a CoAP resource
Assuming large-content transfer can stay equally simple
Assuming that because CoAP “looks like HTTP,” deployment conditions are also like HTTP

What CoAP is really good at is small-resource interaction, state reading, parameter configuration, event observation, and gateway bridging in constrained environments. Once you go beyond that boundary, it is not that CoAP cannot do the job. It is that the cost rises quickly.

Further Deep Dive

Start with RFC 7252 for the main specification, and separate the message layer from the request/response layer in your head
If you want resource-change notifications, look at the Observe extension (RFC 7641)
If you need larger payloads, look at block-wise transfer (RFC 7959)
If you need to integrate with HTTP or gateways, focus on proxies, caches, and semantic mapping, not just the packet fields