
Wi-Fi Roaming: Why Switching APs Is Not the Same as Reconnecting

A device stays on the same SSID (Service Set Identifier) as it moves around a building, the service occasionally stutters, and the logs show only a single “disconnected” followed by a single “connected.” That is where many problems get misread. It looks like “disconnect and reconnect,” but what you really need to understand is why the device left the current BSSID (Basic Service Set Identifier), when it decided to switch, and whether security or upper-layer networking had to be rebuilt after the switch.

The easiest way to misunderstand Wi-Fi roaming is to mix three things together:

  • Switching BSSID inside the same ESS (Extended Service Set)
  • Restarting the whole Wi-Fi access path from scratch
  • Actually interrupting the IP or business layer

These three are often related, but they are not synonyms. For implementation, packet analysis, and debugging, roaming is not “the device connected to Wi-Fi again.” It is the device moving to another access point within the same wireless network while trying to keep the service interruption as short as possible.

Wi-Fi roaming = switching BSSID within the same logical network (ESS)
             != restarting the entire network access flow
             != instant switching just because a stronger signal appears

This article focuses on the common infrastructure-mode roaming path for a STA (Station) inside the same ESS. The main thread follows four stages: trigger, discovery, selection, and switching. Mesh, vendor-specific controller optimization, cellular mobility management, and multi-link concurrency are not covered.

What Roaming Is Solving

You can stay connected without roaming, but once the device moves, the link quality changes:

  • Signal gets weaker and retransmissions increase
  • Medium contention rises and latency jitter grows
  • Another AP (Access Point) may be closer, but the terminal still stays on the old one

If the device waits until the current link is completely broken before reconnecting, the service will visibly interrupt. If the device switches whenever it sees a slightly stronger signal, it may bounce back and forth between APs. Roaming exists to balance those two extremes:

  • When is it no longer worth staying on the current AP?
  • Where is the next candidate BSSID?
  • How can authentication, association, and key setup be shortened during the switch?

So roaming is first a continuity problem, and only then a “which AP should I use?” problem.

Minimal Mental Model: Three Layers of State

Layer 1: The radio attachment target changes

The terminal was attached to BSSID-A and is now attached to BSSID-B. That is the most direct definition of roaming.

Layer 2: Part of the Layer-2 access path may run again

When switching BSSID, the device may still need to:

  • Scan or use cached results to confirm a candidate AP
  • Authenticate or fast-authenticate
  • Reassociate
  • Restore security context or reinstall keys

How long these steps take determines whether voice, video, and real-time control traffic stutter.

Layer 3: IP and application traffic do not always restart

If the device remains in the same Layer-2 broadcast domain, the common case is:

  • IP address stays the same
  • Default gateway stays the same
  • Business connections do not necessarily drop immediately

If roaming crosses a Layer-3 boundary or lands in a different VLAN, or if the vendor implementation requires a rejoin, the device may also trigger:

  • DHCP
  • ARP
  • DNS
  • Upper-layer TCP / TLS rebuild
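To tell these two cases apart in practice, it can help to compare the Layer-3 state before and after the BSSID switch. The following is a minimal sketch; the `L3Snapshot` type and the `l3_changes` helper are hypothetical names introduced here, and how the snapshots are collected on a given device is left open.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class L3Snapshot:
    """Layer-3 state of the station at one point in time (hypothetical type)."""
    ip: str          # e.g. "192.168.10.23"
    prefix_len: int  # e.g. 24
    gateway: str     # e.g. "192.168.10.1"


def l3_changes(before: L3Snapshot, after: L3Snapshot) -> list:
    """Return which Layer-3 facts changed across a BSSID switch."""
    changes = []
    # strict=False lets us build the network from a host address.
    net_before = ipaddress.ip_network(f"{before.ip}/{before.prefix_len}", strict=False)
    net_after = ipaddress.ip_network(f"{after.ip}/{after.prefix_len}", strict=False)
    if net_before != net_after:
        changes.append("subnet")
    if before.ip != after.ip:
        changes.append("ip")
    if before.gateway != after.gateway:
        changes.append("gateway")
    return changes
```

An empty result suggests the roam stayed inside the same Layer-2 domain; a non-empty one suggests DHCP and upper-layer recovery should be expected in the timeline.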

If you mix these layers in debugging, you usually end up with a statement that has no diagnostic value: roaming failed. A better description is often:

  • The device switched to the new BSSID, but the four-way handshake did not finish
  • Reassociation succeeded, but DHCP timed out
  • The wireless switch completed, but the service interruption came from upper-layer reconnection

Why the Same SSID Does Not Automatically Mean Smooth Roaming

In the same-ESS roaming discussed here, multiple APs present the same SSID to the terminal, so the terminal can treat them as candidate attachment points for the same network. That is only a prerequisite for roaming, not the result.

Behind the same SSID, there may still be different:

  • BSSIDs
  • Channels and loads
  • Security capabilities or PMF (Protected Management Frames) settings
  • Vendor implementations and RF policies

From the terminal’s point of view, the real attachment is to a specific BSSID, not to the abstract SSID name. Smooth roaming depends on whether the following are all true:

  • The terminal can discover a better candidate BSSID in time
  • The current and target APs are compatible in security, capabilities, and policy
  • The switch does not repeat too much work

So “they all have the same Wi-Fi name” only means the user-facing network may be unified. It does not mean the terminal will switch, and it does not mean the switch will be fast.

What Actually Happens in the Roaming Path

The common roaming path looks like this:

Link quality degrades or policy triggers
-> Discover candidate APs
-> Select target BSSID
-> Authenticate / reassociate
-> Restore or rebuild security context
-> Continue data exchange on the new AP

The important part is not the names. It is what each step is solving.

Trigger: Why the device decides to leave the current AP

Roaming is rarely driven by just one RSSI threshold. Common inputs include:

  • RSSI keeps dropping
  • Retransmission rate increases
  • Packet loss and latency worsen
  • The current AP is overloaded
  • The AP explicitly suggests the terminal move away

The standard defines the available frames and procedures, but the decision of when to trigger is usually left to a combination of terminal driver, firmware, and vendor policy.

That is why two devices may behave very differently in the same site:

  • One is sticky and stays on a bad signal too long
  • One is too aggressive and keeps bouncing around

The first is often called a sticky client; the second often produces ping-pong roaming.
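The tension between sticky and ping-pong behavior can be sketched as a trigger with hysteresis. This is only an illustration: the thresholds, the `LinkStats` type, and the `should_roam` function are assumptions made up for this example, and real drivers use their own inputs and values.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    rssi_dbm: float    # current AP signal strength
    retry_rate: float  # fraction of frames retransmitted, 0.0 to 1.0


# Hypothetical policy values, chosen only for illustration.
LOW_RSSI_DBM = -72.0    # below this the link counts as poor
HIGH_RETRY_RATE = 0.30  # above this the link counts as poor
MIN_GAIN_DB = 8.0       # candidate must be this much stronger (hysteresis)
MIN_DWELL_S = 5.0       # minimum time on an AP before leaving again


def should_roam(current: LinkStats,
                best_candidate_rssi_dbm: float,
                seconds_on_current_ap: float) -> bool:
    """Decide whether leaving the current AP is worth it."""
    link_is_poor = (current.rssi_dbm < LOW_RSSI_DBM
                    or current.retry_rate > HIGH_RETRY_RATE)
    if not link_is_poor:
        return False  # a slightly stronger neighbor alone is not a reason
    if seconds_on_current_ap < MIN_DWELL_S:
        return False  # dwell guard against ping-pong roaming
    # Require a clear improvement, not just any improvement.
    return best_candidate_rssi_dbm - current.rssi_dbm >= MIN_GAIN_DB
```

Raising `LOW_RSSI_DBM` toward zero makes the client less sticky; raising `MIN_GAIN_DB` or `MIN_DWELL_S` makes it less likely to bounce between APs.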

Discovery: Where the next candidate comes from

The terminal cannot wait until it is fully disconnected and then blindly scan every channel. That would make the interruption too long. It needs to know the next step faster.

Common sources of candidates are:

  • Cached scan results
  • Fresh real-time scan results
  • Neighbor information provided by the current AP

The cost is usually not the switch itself. The cost is how long it takes to find the next hop. Many roaming cases that look like “authentication is slow” or “association is slow” are actually slow because discovery was slow.
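One way to picture the role of cached results is a freshness filter: cached entries are free to use, but only while they are recent enough to trust. The sketch below assumes a hypothetical `ScanEntry` record and a made-up `MAX_AGE_S` limit; it is not taken from any particular driver.

```python
from dataclasses import dataclass


@dataclass
class ScanEntry:
    bssid: str
    channel: int
    rssi_dbm: float
    seen_at: float  # timestamp in seconds, same clock as `now`


MAX_AGE_S = 10.0  # hypothetical freshness limit for cached results


def fresh_candidates(cache, now, current_bssid, max_age_s=MAX_AGE_S):
    """Return cached entries that are recent and are not the current AP."""
    return [
        entry for entry in cache
        if entry.bssid != current_bssid and now - entry.seen_at <= max_age_s
    ]
```

If the returned list is empty, the terminal has to pay for a real-time scan before it can move, which is where much of the perceived roaming delay tends to come from.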

Selection: Strongest signal is not always the best choice

The choice of target BSSID often depends on:

  • Signal strength and quality
  • Channel utilization and congestion
  • Security compatibility
  • Band preference, for example favoring 5 GHz
  • Terminal power budget and scan cost

Roaming is not purely an RF problem, and it is not purely a protocol problem. It is the result of air-interface reality and device policy together.
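The factors above can be combined into a simple score to show why the strongest signal does not always win. The weights and the `Candidate` fields here are assumptions chosen for illustration, not values from any standard or vendor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    bssid: str
    rssi_dbm: float
    channel_utilization: float  # 0.0 (idle) to 1.0 (saturated)
    is_5ghz: bool
    security_compatible: bool


def score(c: Candidate) -> Optional[float]:
    """Score a candidate; None means it must not be selected at all."""
    if not c.security_compatible:
        return None  # incompatible security is a hard filter, not a penalty
    s = c.rssi_dbm                     # start from signal strength
    s -= 20.0 * c.channel_utilization  # hypothetical congestion penalty
    if c.is_5ghz:
        s += 5.0                       # hypothetical band preference bonus
    return s


def pick_target(candidates) -> Optional[Candidate]:
    """Return the best-scoring compatible candidate, or None."""
    scored = [(score(c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

With these weights, a 5 GHz AP at -62 dBm on a quiet channel beats a 2.4 GHz AP at -55 dBm on a nearly saturated one, and an AP with incompatible security is never chosen regardless of signal.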

Switching: Reassociation

When the terminal moves to a new AP, it often sends a Reassociation Request instead of treating the whole path as a first-time connection. That means:

  • It tells the network that it already belonged to this ESS
  • It tells the target AP which old AP it came from
  • It gives the network a chance to migrate context or update state

Reassociation does not mean the switch is automatically fast. The interruption time depends largely on whether the security stage that follows is accelerated.

What 802.11k, 802.11v, and 802.11r Each Do

These three abbreviations are often listed together as if roaming does not work unless all of them are enabled. In practice, they support different stages of the roaming flow.

802.11k: Help the terminal know who is nearby sooner

802.11k provides neighbor reports and radio measurements. It mainly answers:

  • Which candidate APs are nearby?
  • On which channels are they likely to be found?

This reduces the cost of blind full-channel scans. It improves “finding candidates,” not the final decision itself.
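A rough calculation shows the size of the saving. The sketch below assumes scan time is simply the number of channels multiplied by a fixed per-channel dwell time; the channel list, the 40 ms dwell, and both function names are hypothetical.

```python
def channels_to_scan(neighbor_channels, supported_channels):
    """Use neighbor-report channels when available, else scan everything."""
    supported = set(supported_channels)
    targeted = sorted({ch for ch in neighbor_channels if ch in supported})
    return targeted if targeted else sorted(supported)


def estimated_scan_ms(channels, dwell_ms):
    """Simplified model: one fixed dwell per channel, nothing else counted."""
    return len(channels) * dwell_ms


# Hypothetical device: 11 supported channels, 40 ms dwell per channel.
SUPPORTED = [1, 6, 11, 36, 40, 44, 48, 149, 153, 157, 161]
DWELL_MS = 40

full = channels_to_scan([], SUPPORTED)                 # no neighbor report
targeted = channels_to_scan([36, 149, 36], SUPPORTED)  # report names 2 channels

print(estimated_scan_ms(full, DWELL_MS))      # 11 channels * 40 ms = 440
print(estimated_scan_ms(targeted, DWELL_MS))  # 2 channels * 40 ms = 80
```

Under these assumptions the targeted scan takes 80 ms instead of 440 ms. Real numbers depend on dwell times, passive versus active scanning, and regulatory constraints on each channel.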

802.11v: Help the network suggest a move

In roaming discussions, 802.11v usually means BSS Transition Management (BTM), one feature of its wireless network management framework. The current network can suggest:

  • Which candidate APs you should consider
  • Staying here is probably no longer the best option

It is usually advice, not a hard command. Whether the terminal accepts it depends on the implementation.

802.11r: Help the security switch finish faster

802.11r, also called FT (Fast BSS Transition), is not about choosing the AP. It is about:

  • Making key derivation and switch preparation more efficient during roaming
  • Reducing the time cost of repeating the full security flow

The division of labor is clear: 11k is more about finding candidates, 11v is more about migration advice, and 11r is more about speeding up the secure cutover. They should not be collapsed into a single “fast roaming protocol.”

Why Roaming Problems Often Look Like Signal Problems

If you only stare at signal strength, roaming problems are easy to misread.

Signal matters, of course, but these are also common:

  • 11r settings do not match, so fast transition cannot complete
  • PMF, cipher suite, or WPA2 / WPA3 capabilities do not match
  • Neighbor reports are incomplete, so the terminal does not know where to go
  • The terminal scan policy is conservative and discovers candidates too late
  • The controller or network-side context sync is slow, so reassociation is followed by waiting

“It drops when I walk to that spot” does not necessarily mean there is a coverage hole. It may mean the terminal found a candidate too late, or found one but could not finish the security stage quickly enough.
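Configuration mismatches of this kind can be checked before any capture is taken. The sketch below compares the security profile of two APs in the same ESS; the `ApSecurity` fields, the AKM labels, and the returned tags are illustrative assumptions, and a flagged difference is something to review rather than a guaranteed failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApSecurity:
    akm_suites: frozenset            # illustrative labels, e.g. {"PSK", "FT-PSK"}
    pmf: str                         # "disabled", "optional", or "required"
    mobility_domain: Optional[str]   # 11r mobility domain ID, None if FT is off


def roam_mismatches(current: ApSecurity, target: ApSecurity) -> list:
    """List settings that differ in ways that may slow or block a roam."""
    issues = []
    if not (current.akm_suites & target.akm_suites):
        issues.append("akm")              # no common key-management suite
    if current.pmf != target.pmf:
        issues.append("pmf")              # PMF policy differs between APs
    if current.mobility_domain != target.mobility_domain:
        issues.append("mobility_domain")  # FT cannot span these two APs
    return issues
```

Running this over every AP pair that a walking path crosses gives a short list of places where the security stage is likely to fall back to the full flow or be rejected.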

Capture and Log Focus

Roaming problems are easy to drag into the weeds of management frames and key fields. First decide which stage is broken: did the device never switch out, did it switch but fail to attach, or did the wireless switch finish while the upper layer still had to recover?

First locate the stage

This is not about root cause yet. It is about giving the later capture a boundary:

  • Still on the old BSSID
  • Left the old BSSID, but failed to attach to the new one
  • Already attached to the new BSSID, but application traffic is still interrupted

Once this is clear, you will know whether to inspect scanning, reassociation, or the upper layer.
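The three boundaries map directly onto where to look next. A minimal sketch, assuming the three observations have already been established from logs or captures; the function name and the returned labels are made up for this example.

```python
def locate_stage(left_old_bssid: bool,
                 attached_to_new_bssid: bool,
                 app_traffic_ok: bool) -> str:
    """Map three observations to the stage that deserves inspection."""
    if not left_old_bssid:
        # Never switched out: look at trigger policy and candidate discovery.
        return "discovery"
    if not attached_to_new_bssid:
        # Left but failed to attach: look at reassociation and security.
        return "reassociation"
    if not app_traffic_ok:
        # Wireless side is done: look at DHCP, ARP, and application recovery.
        return "upper-layer"
    return "none"
```

The order of the checks matters: each later observation is only meaningful once the earlier one is true.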

When it is slow to find the next AP, inspect candidate discovery

Many roaming delays are not caused by the switch itself. They are caused by late candidate discovery. Typical checks are:

  • Did the terminal scan the target BSSID before the switch?
  • Is the scan cache stale?
  • Did 11k neighbor reports provide the right information?

The symptom is that the link is already getting worse, but the terminal still does not have a good enough candidate to move to.

Once the switch has started, inspect reassociation and security

When the terminal starts moving to a new AP, the usual slowdown is in reassociation and security:

  • Reassociation Request / Response
  • Whether FT was used
  • Whether the four-way handshake had to run again in full
  • Status codes, rejection reasons, and deauthentication reasons

Many cases that look like “roaming failed” are actually that the target AP did not accept the security context cleanly.
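When working from a capture, the checks above can be reduced to a small summary over the management frames exchanged with the target AP. The sketch below assumes the frames have already been extracted as `(subtype, code)` pairs in capture order; the subtype numbers are the 802.11 management-frame subtypes, while the input format and the `summarize_switch` name are assumptions for illustration.

```python
# 802.11 management-frame subtypes relevant to the switch.
REASSOC_REQUEST = 2
REASSOC_RESPONSE = 3
DISASSOCIATION = 10
DEAUTHENTICATION = 12


def summarize_switch(frames):
    """Summarize a switch attempt.

    frames: iterable of (subtype, code) in capture order, where code is the
    status code for responses, the reason code for deauthentication or
    disassociation, and None for frames that carry neither.
    """
    summary = {
        "used_reassociation": False,
        "reassoc_status": None,       # 0 means success
        "deauth_or_disassoc": False,
    }
    for subtype, code in frames:
        if subtype == REASSOC_REQUEST:
            summary["used_reassociation"] = True
        elif subtype == REASSOC_RESPONSE:
            summary["reassoc_status"] = code
        elif subtype in (DISASSOCIATION, DEAUTHENTICATION):
            summary["deauth_or_disassoc"] = True
    return summary
```

A result with `used_reassociation` set, a non-zero `reassoc_status`, and a following deauthentication points at the target AP rejecting the attach rather than at the radio link.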

After the wireless switch, inspect IP and business recovery

Once the wireless side is confirmed switched, look at:

  • Whether DHCP ran again
  • Whether ARP updated in time
  • Whether business connections were reused

That avoids counting upper-layer recovery time as wireless roaming time.
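The separation can be made explicit with three timestamps. This is a sketch under the assumption that the last data frame on the old BSSID, the moment the link on the new BSSID is usable (keys installed), and the moment application traffic resumes can each be identified; the function and key names are hypothetical.

```python
def split_interruption(last_data_on_old_ms: int,
                       link_ready_on_new_ms: int,
                       app_recovered_ms: int) -> dict:
    """Split a service gap into wireless roaming time and upper-layer time."""
    return {
        "wireless_ms": link_ready_on_new_ms - last_data_on_old_ms,
        "upper_layer_ms": app_recovered_ms - link_ready_on_new_ms,
    }


# Example: a 2060 ms gap where only 60 ms was the wireless switch.
print(split_interruption(1000, 1060, 3060))
# {'wireless_ms': 60, 'upper_layer_ms': 2000}
```

In this example, tuning 11r would not help much: almost all of the interruption comes after the wireless switch has finished.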

Engineering Judgment

  • Same SSID does not mean smooth roaming
  • Signal strength alone does not determine the roaming result
  • Roaming is a combination of trigger policy, discovery, switching, and security recovery
  • 11k, 11v, and 11r help different stages and should be evaluated separately
  • A device that roams badly is not necessarily a signal problem; it may be a candidate-discovery, policy, or security-context problem
