CAN

People often remember CAN through a few surface facts first: two wires, differential signaling, up to 1 Mbps, common in cars, and no traditional source or destination address in the frame. Those facts are true, but they are not enough for implementation, packet capture, or debugging. What really defines CAN is not what it looks like, but how it lets many nodes watch the same wire and still keep working when they try to send at the same time.

The first judgment worth keeping in mind is this: CAN is fundamentally a predictable contention mechanism on a shared bus. Nodes do not need a central scheduler, and they do not need to take turns in advance. Any node may try to send, but when multiple nodes start at the same time, arbitration is resolved at the bit level immediately, without destroying the whole transfer the way an early Ethernet collision would.

Why This Design Is Needed

If multiple controllers sit on the same bus, the most obvious designs are usually one of two kinds.

The first is master polling. It is simple, but it scales poorly, and if the master fails, the link loses all of its scheduling. The second is “everyone sends whenever they want and retransmits after collisions.” That can work when the load is low, but control systems expose its weaknesses quickly: high-priority messages cannot reliably get a slot, latency bounds are hard to estimate, and the more collisions there are, the worse the real-time behavior becomes.

CAN chooses a third path: it allows multi-master transmission, but it does not wait until the end of the frame to resolve contention. Instead, it compares bits while the frame is being sent. The goal is not maximum throughput. The priority is determinism, real-time behavior, and controlled fault handling.

The Smallest Useful Mental Model

It is more accurate to think of CAN as a “broadcast arbitration system that prioritizes frames by identifier” than as a “better serial port.”

Several facts need to be true at the same time:

Every node on the bus sees the same bit stream
Multiple nodes can start sending at the same time
The frame identifier is part of both the message meaning and the arbitration priority
Nodes that lose arbitration must stop quickly and try again when the bus becomes idle
When an error is detected, a node must actively corrupt the current frame so the whole network knows that the transmission is invalid

Many of the seemingly separate CAN mechanisms are just different ways of supporting those facts.

Why Collisions Do Not Become a Disaster

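The classic CAN mechanism behind arbitration is the pair of bus states called dominant and recessive. Dominant represents logical 0 and recessive represents logical 1. A simple way to think about it: if any node on the bus drives a dominant bit, the bus ends up dominant; the bus stays recessive only when every node drives recessive. In other words, the bus behaves like a wired-AND of all transmitters.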

That leads to a critical property: while sending each bit, a node can also read the bus state back and check whether the line matches what it just sent.

If two nodes start transmitting at the same time, the first several bits may be identical, because both are sending the frame start and part of the identifier. The first real difference decides the winner:

One node sends a recessive bit
The other sends a dominant bit
The bus becomes dominant
The node that sent recessive immediately knows it lost arbitration and stops sending
The winning node continues and finishes the frame

That is the core value of CAN. It does not “recover after a collision.” It “orders the traffic while the collision is happening.” The leading bits already on the wire are not wasted, and the frame that wins arbitration does not need to restart from scratch.
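The arbitration steps above can be sketched as a small Python simulation. It is a minimal model, not a controller implementation: it assumes every contender starts on the same start-of-frame bit, uses 11-bit standard identifiers only, and ignores bit stuffing, the RTR/IDE bits, and bit timing. The function names are illustrative.

```python
# Minimal model of CAN bitwise arbitration (assumptions: simultaneous start,
# 11-bit identifiers only, no stuffing, no RTR/IDE bits, no bit timing).
DOMINANT = 0   # logical 0: wins whenever any node drives it
RECESSIVE = 1  # logical 1: seen only if every node sends recessive
ID_BITS = 11   # standard (base-format) identifier width


def bus_level(driven_bits):
    """Wired-AND bus: dominant if any node drives dominant."""
    return DOMINANT if DOMINANT in driven_bits else RECESSIVE


def id_bits(can_id):
    """Identifier bits in wire order, most significant bit first."""
    return [(can_id >> (ID_BITS - 1 - i)) & 1 for i in range(ID_BITS)]


def arbitrate(ids):
    """Return (winner_id, {loser_id: identifier bit index where it backed off})."""
    if len(set(ids)) != len(ids):
        raise ValueError("two nodes must not send the same identifier")
    contenders = set(ids)
    lost_at = {}
    for i in range(ID_BITS):
        sent = {cid: id_bits(cid)[i] for cid in contenders}
        level = bus_level(list(sent.values()))
        for cid, bit in sent.items():
            # Bit monitoring: a node that sent recessive but reads back
            # dominant has lost arbitration and stops transmitting.
            if bit == RECESSIVE and level == DOMINANT:
                lost_at[cid] = i
        contenders -= set(lost_at)
    # Unique identifiers always differ somewhere, so exactly one node remains.
    (winner,) = contenders
    return winner, lost_at


if __name__ == "__main__":
    winner, lost_at = arbitrate([0x65A, 0x123, 0x12F])
    print(hex(winner), {hex(k): v for k, v in lost_at.items()})
```

With identifiers 0x65A, 0x123, and 0x12F, 0x65A backs off at the very first identifier bit and 0x12F at bit 7, so 0x123 wins. It is also the numerically lowest identifier, which is the general rule this model reproduces.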

This is also why the identifier cannot be understood only as a message number. It is half semantics and half scheduling policy: a lower identifier means higher priority. If a system mixes low-latency control messages and low-value status reports in the same priority space, the bus saturation, jitter, or deadline misses that show up later are usually not caused by a bad driver. They are caused by an ID plan that never took its scheduling role seriously.
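One way to make the ID plan's scheduling role explicit is a small lint over the message list. This is a hypothetical sketch: the class names, their ranking, and the example message names and identifiers are assumptions made for illustration. It flags every pair in which a message from a more urgent class carries a numerically higher identifier than a message from a less urgent class, and would therefore lose arbitration to it whenever both contend.

```python
# Hypothetical ID-plan lint. CLASS_RANK and EXAMPLE_PLAN are assumptions;
# a lower rank means "more urgent". Lower CAN identifiers win arbitration.
CLASS_RANK = {"control": 0, "alarm": 1, "status": 2, "diagnostic": 3}


def priority_inversions(messages):
    """Return (urgent_name, blocking_name) pairs where the more urgent message
    has the higher identifier and so loses arbitration to the less urgent one."""
    messages = list(messages)
    inversions = []
    for name_a, id_a, cls_a in messages:
        for name_b, id_b, cls_b in messages:
            if CLASS_RANK[cls_a] < CLASS_RANK[cls_b] and id_a > id_b:
                inversions.append((name_a, name_b))
    return inversions


# Hypothetical plan: (name, 11-bit identifier, traffic class)
EXAMPLE_PLAN = [
    ("torque_cmd", 0x080, "control"),
    ("overtemp_alarm", 0x100, "alarm"),
    ("battery_status", 0x0F0, "status"),  # sits below the alarm: an inversion
    ("diag_request", 0x7E0, "diagnostic"),
]

if __name__ == "__main__":
    for urgent, blocker in priority_inversions(EXAMPLE_PLAN):
        print(f"{urgent} loses arbitration to {blocker}")
```

Here the only finding is that the alarm would lose to a routine status report. The fix is a renumbering decision, not a driver change.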

What the Frame Is Actually Guaranteeing

A CAN frame is not complicated, but every important field serves the goal of “reliable broadcast on a shared bus.”

The identifier tells the whole network what this message is and also determines arbitration priority. The control field carries the data length code (DLC), which says how long the payload is. The data field carries the payload itself. The CRC lets receivers check whether any bits were corrupted in transit. The ACK slot is not an application-level response: the transmitter sends a recessive bit there, and any node that received the frame correctly pulls the bus dominant to say, “at least one node received a valid frame.”
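To make the CRC field concrete, here is a bit-serial sketch of the classical CAN CRC-15, whose generator polynomial is x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 (0x4599). It assumes a classical base-format data frame and runs over the unstuffed bits from start of frame through the data field; CAN FD uses different CRCs and is not covered. The helper names are hypothetical.

```python
# Sketch of the classical CAN CRC-15 over unstuffed SOF..data bits.
CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1
CRC15_MASK = 0x7FFF


def to_bits(value, width):
    """MSB-first bit list of a field."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def frame_bits(can_id, data):
    """Unstuffed bits of a classical base-format data frame, SOF through data."""
    if not 0 <= can_id <= 0x7FF or len(data) > 8:
        raise ValueError("need an 11-bit identifier and at most 8 data bytes")
    bits = [0]                     # SOF (dominant)
    bits += to_bits(can_id, 11)    # identifier, MSB first
    bits += [0, 0, 0]              # RTR=0 (data frame), IDE=0 (base format), r0
    bits += to_bits(len(data), 4)  # DLC
    for byte in data:
        bits += to_bits(byte, 8)
    return bits


def can_crc15(bits):
    """Shift-register CRC: feedback is the incoming bit XOR register bit 14."""
    crc = 0
    for bit in bits:
        crc_next = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & CRC15_MASK
        if crc_next:
            crc ^= CRC15_POLY
    return crc


if __name__ == "__main__":
    bits = frame_bits(0x123, bytes([0x01, 0x02]))
    crc = can_crc15(bits)
    print(f"CRC-15 = 0x{crc:04X}")
    # Receiver-style check: frame bits followed by the CRC leave a zero register.
    print(can_crc15(bits + to_bits(crc, 15)) == 0)
```

The receiver-side check relies on a standard CRC property: running the same register over the frame bits followed by the received CRC leaves zero when no detectable corruption occurred, and any single flipped bit always changes the CRC.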

Two mistakes happen here all the time.

First, ACK is not the same as application success. It only means that at least one node received a physically and link-layer valid frame. It does not mean the business logic has processed it, and it does not mean the target device is definitely online. Second, CAN does not have a traditional destination address, but that does not mean it has no receiver. In real systems, who consumes a frame is usually determined by local node filters, database definitions, and higher-layer protocol rules.

So when reading captures, it is often more useful to ask “who is supposed to consume this ID” than to start with every byte in the payload.
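A node's local filters decide which IDs it consumes. The sketch below models the common identifier/mask style of acceptance filtering: a frame passes a filter when the bits selected by the mask match. This is the same rule Linux SocketCAN documents for its can_filter structure, although hardware controllers lay out their filter registers in different ways. The node, its filter values, and the class and function names are hypothetical, and an empty filter list here accepts nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceFilter:
    can_id: int  # identifier pattern
    mask: int    # 1 bits must match, 0 bits are "don't care"

    def matches(self, frame_id: int) -> bool:
        return (frame_id & self.mask) == (self.can_id & self.mask)


def accepts(filters, frame_id):
    """A frame is consumed if any configured filter matches its identifier."""
    return any(f.matches(frame_id) for f in filters)


# Hypothetical node: exactly 0x120, plus the whole block 0x300..0x30F.
EXAMPLE_FILTERS = [
    AcceptanceFilter(can_id=0x120, mask=0x7FF),
    AcceptanceFilter(can_id=0x300, mask=0x7F0),
]

if __name__ == "__main__":
    for frame_id in (0x120, 0x121, 0x305, 0x310):
        print(hex(frame_id), accepts(EXAMPLE_FILTERS, frame_id))
```

Read against a capture, this means 0x120 and 0x305 are consumed by this node, while 0x121 and 0x310 still appear on the wire but are silently ignored here.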

Why Error Handling Is So Aggressive

CAN is strict about errors. If a node detects a bit error, stuff error, CRC error, form error, or ACK error, it sends an error frame that interrupts the current communication, normally starting at the very next bit (for a CRC error, right after the ACK delimiter). That sounds noisy, but it solves a more serious problem: if a bad frame were allowed to pass quietly, different nodes could develop different views of the bus state, and that is far more dangerous than a single retransmission.
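A stuff error is easiest to see in code. In classical CAN, after five consecutive bits of the same level the transmitter inserts one bit of the opposite level, which also guarantees receivers regular edges to resynchronize on. A receiver that sees six equal bits in a row where stuffing applies reports a stuff error. The sketch assumes a plain list of 0/1 bits covering the stuffed region (start of frame through the CRC sequence); the function names are illustrative.

```python
def stuff(bits):
    """Insert a complementary bit after every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        run_len = run_len + 1 if b == run_bit else 1
        run_bit = b
        if run_len == 5:
            stuff_bit = 1 - b
            out.append(stuff_bit)
            # The stuff bit itself starts the next run.
            run_bit, run_len = stuff_bit, 1
    return out


def destuff(bits):
    """Remove stuff bits. Returns (bits, None), or (bits_so_far, index) on a stuff error."""
    out = []
    run_bit, run_len = None, 0
    expect_stuff = False
    for i, b in enumerate(bits):
        if expect_stuff:
            if b == run_bit:
                return out, i  # sixth equal bit in a row: stuff error
            run_bit, run_len = b, 1  # valid stuff bit: dropped, but it counts
            expect_stuff = False
            continue
        out.append(b)
        run_len = run_len + 1 if b == run_bit else 1
        run_bit = b
        if run_len == 5:
            expect_stuff = True
    return out, None


if __name__ == "__main__":
    data = [0] * 7 + [1] * 6
    print(stuff(data))
    print(destuff([0] * 6))  # six dominant bits in a row
```

Because the stuff bit counts toward the next run, a long field of zeros followed by ones is stuffed more than once, and any noise that flips a stuff bit shows up as six equal bits at the receiver.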

Error reporting alone is not enough. CAN also needs to answer the question “is one node itself broken?” So it adds error counters and error states. A node that keeps sending bad frames or keeps detecting errors will see its counters rise. It may move from error active to error passive, and eventually to bus off. An error-passive node can still flag errors, but only with recessive bits, so it can no longer destroy frames sent by other nodes. Once a node is bus off, it is forced off the bus entirely, so that one faulty participant does not drag down the whole network.
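The error-confinement rules can be sketched as a small counter model. It keeps only the core rules of classical CAN fault confinement: a transmit error adds 8 to the transmit error counter (TEC), a receive error adds 1 to the receive error counter (REC), each success subtracts 1, a counter above 127 makes the node error passive, and a TEC above 255 puts it in bus off. The standard's special cases (for example, receive errors that add 8) are deliberately left out, and the class and method names are hypothetical.

```python
from enum import Enum


class ErrorState(Enum):
    ERROR_ACTIVE = "error active"
    ERROR_PASSIVE = "error passive"
    BUS_OFF = "bus off"


class ErrorCounters:
    """Simplified CAN fault confinement; special-case rules are omitted."""

    def __init__(self):
        self.tec = 0  # transmit error counter
        self.rec = 0  # receive error counter

    @property
    def state(self):
        if self.tec > 255:
            return ErrorState.BUS_OFF
        if self.tec > 127 or self.rec > 127:
            return ErrorState.ERROR_PASSIVE
        return ErrorState.ERROR_ACTIVE

    def transmit_error(self):
        self.tec += 8

    def receive_error(self):
        self.rec += 1

    def transmit_ok(self):
        self.tec = max(0, self.tec - 1)

    def receive_ok(self):
        if self.rec > 127:
            self.rec = 119  # the standard allows any value from 119 to 127
        elif self.rec > 0:
            self.rec -= 1

    def bus_off_recovered(self):
        # Modeled as happening after 128 occurrences of 11 consecutive
        # recessive bits; many controllers also wait for a software request.
        self.tec = 0
        self.rec = 0


if __name__ == "__main__":
    node = ErrorCounters()
    for n in range(1, 33):
        node.transmit_error()
        if n in (15, 16, 32):
            print(n, node.tec, node.state.value)
```

In this simplified model an error costs 8 while a success gives back only 1, so the TEC keeps climbing whenever the failure rate e satisfies 8e > 1 - e, that is, when more than about one transmission in nine fails. That is why bus off reads as a long-run signal rather than a single glitch.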

The tradeoff is direct: CAN reliability does not mean “keep going even if errors appear.” It means “make sure the whole network agrees on what happened first, then recover.” So if you see a termination resistor problem, baud-rate mismatch, grounding issue, or too much noise in the field, the usual result is not just a few bad frames. It is a burst of error frames, retransmissions, and node dropouts.

What to Check First in the Field

Using CAN well is less about memorizing the full frame format and more about keeping a few high-value checks in mind.

First, check whether the ID plan is actually carrying priority design. If cyclic control frames, event alarms, and diagnostic traffic are all thrown into the same priority zone, the first thing to break under load is usually not “can it send?” but “can the critical frames still get out on time?”

Second, if a frame does not get out, do not blame the transmit function first. Distinguish between losing arbitration repeatedly, getting no ACK after transmission, and being interrupted by error frames after transmission. Those three symptoms point to three very different classes of problems: scheduling contention, missing receivers, or physical-layer/timing faults.

Third, if the node goes bus off, do not treat it as a simple software exception. It is more like an outcome signal: the node has been observing serious communication errors for some time. The real things to check are the transceiver, power, grounding, harness, termination, sample point, and baud-rate configuration.

Fourth, if the higher layer needs proof that a specific device actually processed a command, do not assign that job to CAN ACK. The link layer only guarantees frame-level error detection and, by default, automatic retransmission of frames that fail. It does not provide application confirmation, session state, or end-to-end reliable delivery.
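If a higher layer does need that confirmation, one common pattern is an explicit request/response exchange on top of CAN. The sketch below is hypothetical: it assumes the command frame carries a one-byte sequence number that the target echoes in a response frame, and all names and the 100 ms timeout are illustrative. A link-layer ACK is recorded but never completes a command; only the echoed response does, and anything unconfirmed times out.

```python
import time
from dataclasses import dataclass


@dataclass
class PendingCommand:
    seq: int
    sent_at: float
    link_acked: bool = False  # a CAN ACK was seen: some node got the frame


class CommandTracker:
    """Hypothetical application-level confirmation on top of CAN."""

    def __init__(self, timeout_s=0.1):
        self.timeout_s = timeout_s
        self.pending = {}
        self._next_seq = 0

    def _now(self, now):
        return time.monotonic() if now is None else now

    def send(self, now=None):
        """Register a command; the caller puts the returned seq in the frame."""
        seq = self._next_seq
        self._next_seq = (seq + 1) % 256  # wraps to fit one payload byte
        self.pending[seq] = PendingCommand(seq, self._now(now))
        return seq

    def on_can_ack(self, seq):
        # Link-layer ACK: recorded for diagnostics, but NOT a confirmation.
        if seq in self.pending:
            self.pending[seq].link_acked = True

    def on_response(self, seq):
        """The target's response echoing seq is the only real confirmation."""
        return self.pending.pop(seq, None) is not None

    def expired(self, now=None):
        """Drop and return commands that were never confirmed in time."""
        now = self._now(now)
        late = [s for s, p in self.pending.items() if now - p.sent_at > self.timeout_s]
        for s in late:
            del self.pending[s]
        return late
```

A command whose frame was ACKed on the wire but never answered still ends up in expired(), which is exactly the case where ACK succeeds while the business logic fails.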

Three Final Judgments

To understand CAN, the most important thing is not that it is common in cars, but these three points:

First, CAN’s core mechanism is non-destructive bit-level arbitration, and identifier design is also scheduling design. Second, CAN provides the foundation for consistent transmission on a shared broadcast bus, not point-to-point session semantics. Third, the goal of error handling is first to prevent the whole network from diverging, and only then to continue communication.

If you then read drivers, captures, and debug logs with those three judgments in mind, many things become easier to explain immediately: why some frames never get out, why ACK can succeed while business logic still fails, and why one physical-layer fault can quickly spread into bus-wide instability.
