Wi-Fi Roaming: Why Switching APs Is Not the Same as Reconnecting

A device stays on the same SSID (Service Set Identifier) as it moves around a building, the service occasionally stutters, and the logs show only a single “disconnected” event followed by a “connected” event. That is where many problems get misread. It looks like “disconnect and reconnect,” but what you really need to understand is why the device left the current BSSID (Basic Service Set Identifier), when it decided to switch, and whether security or upper-layer networking had to be rebuilt after the switch.

The easiest way to misunderstand Wi-Fi roaming is to mix three things together:

Switching BSSID inside the same ESS (Extended Service Set)
Restarting the whole Wi-Fi access path from scratch
Actually interrupting the IP or business layer

These three are often related, but they are not synonyms. For implementation, packet analysis, and debugging, roaming is not “the device connected to Wi-Fi again.” It is the device moving to another access point within the same wireless network while trying to keep the service interruption as short as possible.

Wi-Fi roaming = switching BSSID within the same logical network
             != restarting the entire network access flow
             != instant switching just because a stronger signal appears
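The same distinction can be applied mechanically when reading logs. Below is a minimal sketch, assuming a hypothetical Attachment record (SSID, BSSID, IP) captured before and after an event, plus a flag that says whether the log showed a reassociation. It also assumes that a matching SSID means the same ESS, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attachment:
    """Hypothetical snapshot of where a station is attached (illustrative fields)."""
    ssid: str
    bssid: str
    ip: Optional[str]  # None when no address is assigned


def classify(before: Attachment, after: Attachment, reassociated: bool) -> str:
    """Describe a transition per layer instead of calling it 'reconnected'.

    Assumes a matching SSID means the same ESS, which real sites do not guarantee.
    """
    if before.ssid != after.ssid:
        return "different network"
    if before.bssid == after.bssid:
        return "no roam (same BSSID)"
    # Radio layer: the attachment target changed, so this is a roam.
    access = "reassociation" if reassociated else "fresh association"
    # IP layer: did the address survive the switch?
    kept = before.ip is not None and before.ip == after.ip
    ip_part = "IP kept" if kept else "IP rebuilt"
    return f"roam via {access}, {ip_part}"
```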

This article focuses on the common infrastructure-mode roaming path for a STA (Station) inside the same ESS. The discussion follows the main sequence: trigger, discovery, selection, and switching. Mesh, vendor-specific controller optimization, cellular mobility management, and multi-link concurrency are not covered.

What Roaming Is Solving

You can stay connected without roaming, but once the device moves, the link quality changes:

Signal gets weaker and retransmissions increase
Medium contention rises and latency jitter grows
Another AP (Access Point) may be closer, but the terminal still stays on the old one

If the device waits until the current link is completely broken before reconnecting, the service will visibly interrupt. If the device switches whenever it sees a slightly stronger signal, it may bounce back and forth between APs. Roaming exists to balance those two extremes:

When is it no longer worth staying on the current AP?
Where is the next candidate BSSID?
How can authentication, association, and key setup be shortened during the switch?

So roaming is first a continuity problem, and only then a “which AP should I use?” problem.

Minimal Mental Model: Three Layers of State

Layer 1: The radio attachment target changes

The terminal was attached to BSSID-A and is now attached to BSSID-B. That is the most direct definition of roaming.

Layer 2: Part of the Layer-2 access path may run again

When switching BSSID, the device may still need to:

Scan or use cached results to confirm a candidate AP
Authenticate or fast-authenticate
Reassociate
Restore security context or reinstall keys

How long this step takes largely determines whether voice, video, and real-time control traffic stutter.

Layer 3: IP and application traffic do not always restart

If the device remains in the same Layer-2 broadcast domain, the common case is:

IP address stays the same
Default gateway stays the same
Application connections (for example, TCP sessions) do not necessarily drop immediately

If roaming crosses a different Layer-3 boundary or VLAN, or if the vendor implementation requires a rejoin, the device may also trigger:

DHCP
ARP
DNS
Upper-layer TCP / TLS rebuild
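A quick way to reason about whether the upper layers must restart is to check whether the station's current address is even valid behind the target AP. Here is a minimal sketch using Python's standard ipaddress module, assuming you know the subnet served on the target AP's VLAN; the helper name and its return values are illustrative.

```python
import ipaddress
from typing import List


def upper_layer_work(current_if: str, target_subnet: str) -> List[str]:
    """Hypothetical helper: which upper-layer steps may have to run after a roam.

    current_if:    the STA's address with prefix, e.g. "10.0.1.23/24"
    target_subnet: the subnet served behind the target AP, e.g. "10.0.2.0/24"
    """
    iface = ipaddress.ip_interface(current_if)
    subnet = ipaddress.ip_network(target_subnet)
    if iface.ip in subnet:
        # Same broadcast domain: address and gateway normally survive the roam.
        return []
    # Crossed a Layer-3 boundary: the old address is not valid behind the target AP,
    # so these steps may be needed (whether each one runs depends on the stack).
    return ["DHCP", "ARP", "DNS", "TCP/TLS rebuild"]
```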

If you mix these layers in debugging, you usually end up with a statement that has no diagnostic value: roaming failed. A better description is often:

The device switched to the new BSSID, but the four-way handshake did not finish
Reassociation succeeded, but DHCP timed out
The wireless switch completed, but the service interruption came from upper-layer reconnection

Why the Same SSID Does Not Automatically Mean Smooth Roaming

In the same-ESS roaming discussed here, multiple APs present the same SSID to the terminal, so the terminal can treat them as candidate attachment points for the same network. That is only a prerequisite for roaming, not the result.

Behind the same SSID, there may still be different:

BSSIDs
Channels and loads
Security capabilities or PMF (Protected Management Frames) settings
Vendor implementations and RF policies

From the terminal’s point of view, the real attachment is to a specific BSSID, not to the abstract SSID name. Smooth roaming depends on whether the following are all true:

The terminal can discover a better candidate BSSID in time
The current and target APs are compatible in security, capabilities, and policy
The switch does not repeat too much work

So “they all have the same Wi-Fi name” only means the user-facing network may be unified. It does not mean the terminal will switch, and it does not mean the switch will be fast.

What Actually Happens in the Roaming Path

The common roaming path looks like this:

Link quality degrades or policy triggers
-> Discover candidate APs
-> Select target BSSID
-> Authenticate / reassociate
-> Restore or rebuild security context
-> Continue data exchange on the new AP
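To see which step dominates an interruption, you can timestamp the start of each stage and compute the gaps. The sketch below assumes a hypothetical event list of (stage, start time in ms) pairs; the stage names simply mirror the path above and are not taken from any driver or tool.

```python
from typing import Dict, List, Tuple

# Stage names mirror the path above; they are labels for this sketch only.
STAGES = ["trigger", "discovery", "selection", "reassociation", "security", "data"]


def stage_durations(events: List[Tuple[str, float]]) -> Dict[str, float]:
    """events: (stage, start_time_ms), one entry per stage reached, in order.

    Returns how long each stage lasted before the next one began.
    """
    order = [name for name, _ in events]
    if order != STAGES[: len(order)]:
        raise ValueError(f"stages out of order: {order}")
    return {name: next_t - t for (name, t), (_, next_t) in zip(events, events[1:])}


# Made-up timeline for illustration only:
timeline = [("trigger", 0), ("discovery", 10), ("selection", 310),
            ("reassociation", 315), ("security", 330), ("data", 360)]
print(stage_durations(timeline))
# {'trigger': 10, 'discovery': 300, 'selection': 5, 'reassociation': 15, 'security': 30}
```

In this invented timeline, discovery accounts for 300 ms of a 360 ms gap, which is the pattern the Discovery section describes.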

The important part is not the names. It is what each step is solving.

Trigger: Why the device decides to leave the current AP

Roaming is rarely driven by just one RSSI threshold. Common inputs include:

RSSI keeps dropping
Retransmission rate increases
Packet loss and latency worsen
The current AP is overloaded
The AP explicitly suggests the terminal move away
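In practice a trigger combines several of these signals rather than relying on one threshold. The sketch below assumes a hypothetical LinkStats record and purely illustrative thresholds (none of them come from the standard). It returns every reason that currently argues for leaving.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkStats:
    """Hypothetical per-link measurements; field names are illustrative."""
    rssi_dbm: List[float]    # recent samples, oldest first
    retry_rate: float        # fraction of frames retransmitted, 0.0-1.0
    latency_ms: float
    ap_channel_util: float   # 0.0-1.0, e.g. derived from the AP's BSS Load element
    btm_requested: bool      # AP sent an 802.11v BSS Transition Management Request


def trigger_reasons(s: LinkStats) -> List[str]:
    """Every threshold below is an illustrative assumption, not a standard value."""
    reasons = []
    if len(s.rssi_dbm) >= 2 and s.rssi_dbm[-1] < -72 and s.rssi_dbm[-1] < s.rssi_dbm[0]:
        reasons.append("rssi falling below threshold")
    if s.retry_rate > 0.3:
        reasons.append("high retransmission rate")
    if s.latency_ms > 100:
        reasons.append("latency degraded")
    if s.ap_channel_util > 0.8:
        reasons.append("current AP overloaded")
    if s.btm_requested:
        reasons.append("AP suggested a move")
    return reasons
```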

The standard defines the available mechanisms, but when to trigger a roam is usually decided by a combination of the terminal driver, firmware, and vendor policy.

That is why two devices may behave very differently in the same site:

One is sticky and stays on a bad signal too long
One is too aggressive and keeps bouncing around

The first is often called a sticky client; the second often produces ping-pong roaming.
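Both failure modes can be tuned with the same two knobs: hysteresis (the current link must actually be weak, and the candidate must be clearly better) and a hold-down timer after each roam. Below is a minimal sketch with a hypothetical RoamGuard class and illustrative default values.

```python
import time
from typing import Optional


class RoamGuard:
    """Hysteresis plus a hold-down timer; all defaults are illustrative assumptions.

    threshold_dbm: only consider leaving once the current AP is weaker than this
    margin_db:     the candidate must beat the current AP by at least this much
    hold_s:        minimum time between roams, to damp ping-pong
    """

    def __init__(self, threshold_dbm: float = -70.0, margin_db: float = 8.0,
                 hold_s: float = 10.0) -> None:
        self.threshold_dbm = threshold_dbm
        self.margin_db = margin_db
        self.hold_s = hold_s
        self.last_roam: Optional[float] = None

    def should_roam(self, current_dbm: float, best_candidate_dbm: float,
                    now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last_roam is not None and now - self.last_roam < self.hold_s:
            return False  # too soon after the previous roam
        if current_dbm >= self.threshold_dbm:
            return False  # current link still good enough; do not chase stronger signals
        if best_candidate_dbm - current_dbm < self.margin_db:
            return False  # candidate is not clearly better
        self.last_roam = now
        return True
```

Pushing threshold_dbm very low produces sticky behavior; a high threshold with margin_db=0 and hold_s=0 roams on every fluctuation, which is how ping-pong starts.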

Discovery: Where the next candidate comes from

The terminal cannot wait until it is fully disconnected and then blindly scan every channel; that would make the interruption too long. It needs to know where to go next before the current link fails.

Common sources of candidates are:

Cached scan results
Fresh real-time scan results
Neighbor information provided by the current AP

The main cost is usually not the switch itself but how long it takes to find the next AP. Many roaming cases that look like “authentication is slow” or “association is slow” are actually slow because discovery was slow.
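The scan-cost difference is mostly arithmetic: number of channels scanned times the dwell time per channel. The sketch below picks a candidate source in the order listed above. The 50 ms dwell time, the cache freshness limit, and the names are assumptions for illustration; real values depend on band, regulatory domain, and active versus passive scanning.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DWELL_MS = 50  # assumed per-channel dwell time, illustration only


@dataclass
class CachedScan:
    channels: List[int]  # channels where candidates were last seen
    age_s: float         # how old these results are


def plan_scan(all_channels: List[int],
              cache: Optional[CachedScan],
              neighbor_channels: Optional[List[int]],
              max_cache_age_s: float = 5.0) -> Tuple[str, List[int], int]:
    """Return (source, channels, estimated scan time in ms)."""
    if cache is not None and cache.age_s <= max_cache_age_s:
        # Fresh cached results: no new scan needed before moving.
        return "cache", cache.channels, 0
    if neighbor_channels:
        # Neighbor-report hint: only scan the channels neighbors are reported on.
        chans = sorted(set(neighbor_channels))
        return "neighbor report", chans, len(chans) * DWELL_MS
    # Nothing better available: blind scan of every channel.
    return "full scan", list(all_channels), len(all_channels) * DWELL_MS
```

With 25 channels, a full scan costs 1250 ms under these assumptions, while a two-channel neighbor list costs 100 ms.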

Selection: Strongest signal is not always the best choice

The choice of target BSSID often depends on:

Signal strength and quality
Channel utilization and congestion
Security compatibility
Band preference, for example favoring 5 GHz
Terminal power budget and scan cost

Roaming is not purely an RF problem, and it is not purely a protocol problem. It is the result of air-interface reality and device policy together.
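A toy scoring function shows how the strongest BSSID can lose. The weights below are invented for illustration; real drivers use their own, often undocumented, formulas. Security compatibility acts as a hard filter rather than as a score term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    bssid: str
    rssi_dbm: float
    band_ghz: float       # 2.4, 5 or 6
    channel_util: float   # 0.0-1.0
    security_ok: bool     # outcome of a compatibility check (AKM, PMF, ...)


def score(c: Candidate) -> float:
    """Illustrative weights only."""
    s = c.rssi_dbm                 # higher (less negative) is better
    if c.band_ghz >= 5:
        s += 5                     # band preference bonus
    s -= 20 * c.channel_util       # penalize congested channels
    return s


def select(cands: List[Candidate]) -> Optional[Candidate]:
    usable = [c for c in cands if c.security_ok]  # incompatible APs are never chosen
    return max(usable, key=score, default=None)
```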

Switching: Reassociation

When the terminal moves to a new AP, it often sends a Reassociation Request instead of treating the whole path as a first-time connection. That means:

It tells the network that it already belonged to this ESS
It tells the target AP which old AP it came from
It gives the network a chance to migrate context or update state

Reassociation alone does not make the switch fast. Whether the security stage is also accelerated largely determines the interruption time.
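For reference, the Reassociation Request body begins with the Capability Information and Listen Interval fields, followed by the Current AP Address; that field is how the target AP learns which AP the station came from. The sketch below packs only those fixed fields and the SSID element. The MAC header and the other required elements (Supported Rates, the RSNE, and for FT the mobility domain and FT elements) are deliberately omitted, so this is not a complete frame.

```python
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw octets."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return octets


def reassoc_body_prefix(capability: int, listen_interval: int,
                        current_ap: str, ssid: str) -> bytes:
    """Fixed fields plus the SSID element of a Reassociation Request body (incomplete)."""
    # Capability Information and Listen Interval: 2-octet little-endian fields.
    fixed = struct.pack("<HH", capability, listen_interval)
    # Current AP Address: the BSSID the station is roaming away from.
    fixed += mac_bytes(current_ap)
    ssid_raw = ssid.encode("utf-8")
    ssid_element = bytes([0, len(ssid_raw)]) + ssid_raw  # element ID 0 = SSID
    return fixed + ssid_element
```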

What 802.11k, 802.11v, and 802.11r Each Do

These three abbreviations are often listed together as if roaming does not work unless all of them are enabled. In practice, they support different stages of the roaming flow.

802.11k: Help the terminal know who is nearby sooner

802.11k provides neighbor reports and radio measurements. It mainly answers:

Which candidate APs are nearby?
On which channels are they likely to be found?

This reduces the cost of blind full-channel scans. It improves “finding candidates,” not the final decision itself.
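A Neighbor Report element (element ID 52) carries a 6-octet BSSID, 4 octets of BSSID Information, then Operating Class, Channel Number, and PHY Type, optionally followed by subelements. The parser below is a minimal sketch that extracts the (BSSID, operating class, channel) tuples a scan planner needs and skips subelements. It assumes it is given a run of raw element bytes, not a full Action frame.

```python
from typing import Iterator, Tuple

NEIGHBOR_REPORT_ID = 52  # Neighbor Report element ID


def parse_neighbor_reports(ies: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (bssid, operating_class, channel) from a run of elements.

    Subelements after the 13-octet fixed part are ignored in this sketch.
    """
    i = 0
    while i + 2 <= len(ies):
        eid, length = ies[i], ies[i + 1]
        body = ies[i + 2 : i + 2 + length]
        if eid == NEIGHBOR_REPORT_ID and len(body) >= 13:
            bssid = ":".join(f"{b:02x}" for b in body[0:6])
            # body[6:10] is BSSID Information; body[12] is PHY Type.
            yield bssid, body[10], body[11]
        i += 2 + length  # skip to the next element, whatever its type
```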

802.11v: Help the network suggest a move

In the roaming context, 802.11v usually means BSS Transition Management (BTM). Through it, the current network can suggest:

Which candidate APs you should consider
Staying here is probably no longer the best option

It is usually advice, not a hard command. Whether the terminal accepts it depends on the implementation.
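Because a BTM Request is advice, the decision stays with the client. The sketch below shows one hypothetical client policy: accept only if a suggested BSSID is also heard well by the client itself. Status 0 is Accept in the BTM Response; using 1 as a generic reject is a simplification, since the standard defines more specific reject reasons.

```python
from typing import Dict, Optional, Tuple

BTM_ACCEPT = 0
BTM_REJECT_UNSPECIFIED = 1  # simplification: one generic reject code


def handle_btm_request(candidate_list: Dict[str, int],
                       visible_rssi: Dict[str, float],
                       current_rssi: float,
                       min_gain_db: float = 5.0) -> Tuple[int, Optional[str]]:
    """Hypothetical client policy for an 802.11v BTM Request.

    candidate_list: BSSID -> preference suggested by the AP (higher = preferred)
    visible_rssi:   BSSID -> RSSI the client measured itself
    """
    # Keep only suggested APs that the client hears clearly better than the current one.
    usable = [b for b in candidate_list
              if b in visible_rssi and visible_rssi[b] - current_rssi >= min_gain_db]
    if not usable:
        return BTM_REJECT_UNSPECIFIED, None  # advice, not a command: the client may decline
    best = max(usable, key=lambda b: (candidate_list[b], visible_rssi[b]))
    return BTM_ACCEPT, best
```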

802.11r: Help the security switch finish faster

802.11r, also called FT (Fast BSS Transition), is not about choosing the AP. It is about:

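Letting the terminal derive keys for the target AP from a key hierarchy set up at the initial connection, so the new keys can be established during the transition exchange itself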
Reducing the time cost of repeating the full security flow
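The saving is easiest to see by counting frames during the cutover. In a full, non-FT roam the station runs Open System authentication (2 frames), reassociation (2 frames), and the 4-way handshake (4 EAPOL-Key frames), plus a full EAP exchange when 802.1X is used without key caching. With FT over the air, the new keys are derived within the FT authentication and reassociation exchange (4 frames total). The AKM labels, per-frame time, EAP round-trip count, and AAA latency below are placeholder assumptions, not measurements.

```python
def cutover_frames(akm: str, ft: bool, eap_round_trips: int = 8) -> int:
    """Frames exchanged during the cutover (simplified count).

    akm: illustrative label, "psk" or "8021x".
    """
    if ft:
        # FT over the air: FT Authentication (2) + Reassociation (2).
        return 4
    frames = 2 + 2 + 4  # Open System auth + Reassociation + 4-way handshake
    if akm == "8021x":
        # Full EAP exchange, assuming no PMKSA caching; the count is an assumption.
        frames += 2 * eap_round_trips
    return frames


def cutover_estimate_ms(akm: str, ft: bool, per_frame_ms: float = 2.0,
                        eap_round_trips: int = 8, aaa_rtt_ms: float = 10.0) -> float:
    """Rough time estimate; every default here is a placeholder, not a measurement."""
    frames = cutover_frames(akm, ft, eap_round_trips)
    backend = eap_round_trips * aaa_rtt_ms if (akm == "8021x" and not ft) else 0.0
    return frames * per_frame_ms + backend
```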

The division of labor is clear: 11k is more about finding candidates, 11v is more about migration advice, and 11r is more about speeding up the secure cutover. They should not be collapsed into “fast roaming protocol.”

Why Roaming Problems Often Look Like Signal Problems

If you only stare at signal strength, roaming problems are easy to misread.

Signal matters, of course, but these are also common:

11r settings do not match (for example, different mobility domains), so fast transition cannot complete
PMF, cipher suite, or WPA2 / WPA3 capabilities do not match
Neighbor reports are incomplete, so the terminal does not know where to go
The terminal scan policy is conservative and discovers candidates too late
The controller or network-side context sync is slow, so reassociation is followed by waiting
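Several of these mismatches can be checked before blaming the radio. The sketch below compares hypothetical profiles of the current and target AP against the AKM the station is using. The AKM labels (psk, ft-psk, sae) and field names are illustrative, not a real driver API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ApProfile:
    ssid: str
    akms: FrozenSet[str]             # illustrative labels, e.g. {"psk", "ft-psk", "sae"}
    pmf: str                         # "disabled", "optional" or "required"
    mobility_domain: Optional[int]   # FT Mobility Domain ID; None when 11r is off


def roam_blockers(cur: ApProfile, tgt: ApProfile, akm_in_use: str,
                  sta_pmf_capable: bool) -> List[str]:
    """List configuration reasons a roam from cur to tgt would fail or slow down."""
    problems = []
    if cur.ssid != tgt.ssid:
        problems.append("different SSID")
    if akm_in_use not in tgt.akms:
        problems.append(f"target does not offer {akm_in_use}")
    if tgt.pmf == "required" and not sta_pmf_capable:
        problems.append("target requires PMF")
    if akm_in_use.startswith("ft-") and cur.mobility_domain != tgt.mobility_domain:
        problems.append("FT mobility domain mismatch")
    return problems
```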

“It drops when I walk to that spot” does not necessarily mean there is a coverage hole. It may mean the terminal found a candidate too late, or found one but could not finish the security stage quickly enough.

Capture and Log Focus

When debugging roaming, it is easy to get lost in management frames and key fields. First decide which stage is broken: did the device never switch away, did it switch but fail to attach, or did the wireless switch finish while the upper layer still had to recover?

First locate the stage

This is not about root cause yet. It is about giving the later capture a boundary:

Still on the old BSSID
Left the old BSSID, but failed to attach to the new one
Already attached to the new BSSID, but application traffic is still interrupted

Once this is clear, you will know whether to inspect scanning, reassociation, or the upper layer.
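The three-way split above can be written as a small decision function. The inputs are hypothetical facts you would pull from the station log or a capture; the function only tells you where to look next.

```python
from typing import Optional


def locate_stage(old_bssid: str, current_bssid: Optional[str],
                 keys_installed: bool, app_traffic_ok: bool) -> str:
    """Map observations to a stage; all inputs are hypothetical log facts."""
    if current_bssid == old_bssid:
        return "still on old BSSID: inspect trigger and candidate discovery"
    if current_bssid is None or not keys_installed:
        return "left old BSSID, not attached to new one: inspect reassociation and security"
    if not app_traffic_ok:
        return "attached to new BSSID, traffic interrupted: inspect DHCP, ARP and upper layer"
    return "roam completed"
```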

When it is slow to find the next AP, inspect candidate discovery

Many roaming delays are not caused by the switch itself. They are caused by late candidate discovery. Typical checks are:

Did the terminal scan the target BSSID before the switch?
Is the scan cache stale?
Did 11k neighbor reports provide the right information?

The symptom is that the link is already getting worse, but the terminal still does not have a good enough candidate to move to.
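The first two checks can be answered from scan logs alone. Here is a minimal sketch, assuming a hypothetical log of (timestamp in seconds, BSSID) sightings and an arbitrary 5-second freshness limit.

```python
from typing import List, Tuple


def discovery_check(scan_log: List[Tuple[float, str]], target: str,
                    switch_t: float, max_age_s: float = 5.0) -> str:
    """scan_log: (timestamp_s, bssid) sightings; the format is an assumption."""
    seen = [t for t, b in scan_log if b == target and t <= switch_t]
    if not seen:
        return "target never scanned before switch"
    age = switch_t - max(seen)
    if age > max_age_s:
        return f"stale: last seen {age:.1f}s before switch"
    return "fresh"
```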

Once the switch has started, inspect reassociation and security

When the terminal starts moving to a new AP, the usual slowdown is in reassociation and security:

Reassociation Request / Response
Whether FT was used
Whether the four-way handshake had to run again in full
Status codes, rejection reasons, and deauthentication reasons

Many cases that look like “roaming failed” are actually cases where the target AP did not cleanly accept the security context.
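When reading the Reassociation Response, the status code is the first field to decode. The table below is a small subset, with values as listed in hostapd's src/common/ieee802_11_defs.h, which mirrors the standard's status code table; treat it as a quick reference and verify it against the standard revision your devices implement. The interpretation helper is hypothetical.

```python
from typing import Dict

# Small subset of 802.11 status codes; verify against the standard you target.
STATUS: Dict[int, str] = {
    0: "success",
    17: "AP unable to handle additional associated STAs",
    30: "rejected temporarily, try again later (often with PMF SA Query)",
    43: "invalid AKMP",
    53: "invalid PMKID",
    54: "invalid MDE (FT mobility domain)",
    55: "invalid FTE",
}


def explain_reassoc(status: int, ft_attempted: bool, full_4way_seen: bool) -> str:
    """Hypothetical helper combining the status code with what the capture showed."""
    meaning = STATUS.get(status, f"status {status} (not in this table)")
    if status != 0:
        return f"rejected: {meaning}"
    if ft_attempted and full_4way_seen:
        # FT should make a separate 4-way handshake unnecessary.
        return "accepted, but a full 4-way handshake still ran: FT context not reused"
    return "accepted via FT" if ft_attempted else "accepted; full security flow expected"
```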

After the wireless switch, inspect IP and application recovery

Once the wireless switch is confirmed, look at:

Whether DHCP ran again
Whether ARP updated in time
Whether application connections survived or had to be re-established

That avoids counting upper-layer recovery time as wireless roaming time.

Engineering Judgment

Same SSID does not mean smooth roaming
Signal strength alone does not determine the roaming result
Roaming is a combination of trigger policy, discovery, switching, and security recovery
11k, 11v, and 11r help different stages and should be evaluated separately
A device that roams badly is not necessarily a signal problem; it may be a candidate-discovery, policy, or security-context problem