Routing

When a packet goes out, the hard part is not whether there is a destination address. The hard part is why so many routers in the network all seem to agree, at that moment, to send it in one direction. Links fail, exits change, prefixes get summarized, default routes act as fallback, and policy may deliberately avoid the path that looks shortest. If one of those decisions disagrees with the others, packets disappear into black holes, detour, flap, or fall into a loop.

Routing is easy to describe as a table of destination network, mask, next hop, and exit interface. The questions that come up most often in practice are different: why was this path chosen, why is failover slow, why do both sides think they have a path, and why is the destination alive but still unreachable. To answer those questions, routing and forwarding have to be separated first.

Routing is not just “find the shortest path for one packet”. It is a continuous distributed judgment that keeps deciding the next hop for each destination prefix while the network changes, and tries to keep the whole domain converged on compatible forwarding results.

Why It Appears

If there were only one link, there would be no need for routing: a frame goes out, whatever can receive it does, and whatever cannot just misses it. Once the network is split into multiple subnets, hosts and routers have to answer a few questions:

Is the destination directly connected
If not, which device should get the packet first
Does that next hop know where to send it next
When the old route breaks, how does the replacement path take over

So routing does not solve “an address exists”. It solves “how an address becomes a chain of executable next-hop decisions”. IP gives you a unified address and datagram model. Routing is what attaches those addresses to a path that can actually run.

The Background It Came From

Routing is not a standalone product feature. It is a whole layer of machinery that had to emerge when the Internet grew from local interconnection into large-scale heterogeneous interconnection. The network could not expect every device to know the precise location of every other device, and it could not manually recalculate the whole-network path for every failure.

That shaped routing’s design direction:

Prefer letting each device keep only local useful information instead of global absolute truth
Prefer reducing forwarding to a simple action instead of making every packet trigger a full global solve
Prefer allowing different network sizes and management domains to interconnect instead of requiring one central controller

So routing was never just a shortest-path graph problem. It was first a distributed-systems problem.

Grasp the Main Model First

Keep these three layers separate:

Forwarding: once the packet arrives, which interface and which next hop should carry it onward
Routing: how the device got that forwarding table
Convergence: after the network changes, how multiple devices settle again on compatible forwarding decisions

You can compress the logic like this:

Destination address
  -> Match a prefix
  -> Find the next hop for that prefix
  -> Hand the packet to the outbound interface

The hard part is not the last forwarding step. It is the two things before it:

Why does the table look like this
Is it close enough to what neighboring devices think the world looks like

The Most Common Main Path

A normal cross-subnet path can be simplified like this:

sequenceDiagram
    participant A as Host A
    participant R1 as Router 1
    participant R2 as Router 2
    participant R3 as Router 3
    participant B as Host B
    A->>R1: Packet(dst=10.20.30.8)
    Note over R1: Longest prefix match<br/>10.20.30.0/24 -> R2
    R1->>R2: Forward to next hop
    Note over R2: Longest prefix match<br/>10.20.30.0/24 -> R3
    R2->>R3: Forward to next hop
    Note over R3: Target prefix directly connected
    R3->>B: Deliver packet

At each hop, the device is not “finding the final destination again”. It is doing something smaller:

Look at which prefix the destination address belongs to
Find the most specific match in the local table
Hand the packet to the next hop that corresponds to that prefix

So the unit of routing is not “the whole path”. It is “this device’s next-hop decision for this destination prefix”.
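The per-hop decision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TABLES` dictionary, the `"direct"` marker, and the function names `next_hop` and `forward` are hypothetical and simply mirror the routers in the diagram above. No real router stores its table this way.

```python
import ipaddress

# Hypothetical per-router tables: prefix -> next hop.
# "direct" means the prefix sits on a connected interface.
TABLES = {
    "R1": {"10.20.30.0/24": "R2"},
    "R2": {"10.20.30.0/24": "R3"},
    "R3": {"10.20.30.0/24": "direct"},
}

def next_hop(router, dst):
    """Return this router's next hop for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in TABLES[router].items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific match (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def forward(first_router, dst, max_hops=16):
    """Follow purely local next-hop decisions; return the routers visited."""
    path, router = [], first_router
    for _ in range(max_hops):
        path.append(router)
        hop = next_hop(router, dst)
        if hop is None:
            raise LookupError(f"{router} has no route to {dst}")
        if hop == "direct":
            return path
        router = hop
    raise RuntimeError("hop limit exceeded (possible loop)")

print(forward("R1", "10.20.30.8"))  # ['R1', 'R2', 'R3']
```

Note that `forward` never computes a whole path up front. The path only exists as the sequence of independent lookups, which is the point of the paragraph above.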

Why Longest Prefix Match Is the Foundation

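Routing tables are often thought of as “the first matching entry wins”. The actual foundation is longest prefix match: among all entries that contain the destination, the one with the longest prefix wins, regardless of the order in which the entries are listed.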

For example, if a device has all of these routes:

10.0.0.0/8
10.20.0.0/16
10.20.30.0/24
0.0.0.0/0

and the destination is 10.20.30.8, the route that wins is 10.20.30.0/24, not the broader /16, not the /8, and certainly not the default route. The reason is direct:

A longer prefix represents a more specific destination range
A more specific route is what makes local exceptions possible
Without this rule, summary routes and specific routes would fight each other

This matters a lot in practice. Many “I added a static route but it still does not work” incidents are not caused by the configuration never arriving. They happen because a more specific prefix already in the table overrides the new route, or because the new route's prefix does not actually cover the destination in question.
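The rule is easy to check with Python's standard `ipaddress` module. The sketch below uses the four example routes from this section. The function name `longest_prefix_match` is an assumption made for illustration, and a linear scan like this only demonstrates the rule; it is not how forwarding hardware performs the lookup.

```python
import ipaddress

# The example routes from this section.
ROUTES = ["10.0.0.0/8", "10.20.0.0/16", "10.20.30.0/24", "0.0.0.0/0"]

def longest_prefix_match(dst, routes=ROUTES):
    """Return the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    networks = [ipaddress.ip_network(r) for r in routes]
    matching = [n for n in networks if addr in n]
    if not matching:
        return None
    # Table order is irrelevant; only the prefix length decides.
    return str(max(matching, key=lambda n: n.prefixlen))

for dst in ["10.20.30.8", "10.20.99.1", "10.9.9.9", "192.0.2.1"]:
    print(dst, "->", longest_prefix_match(dst))
# 10.20.30.8 -> 10.20.30.0/24
# 10.20.99.1 -> 10.20.0.0/16
# 10.9.9.9 -> 10.0.0.0/8
# 192.0.2.1 -> 0.0.0.0/0
```

Reversing `ROUTES` gives the same answers, which is the difference from a “first matching entry wins” model.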

Why the Default Route Is So Common

For most devices, maintaining a full Internet routing table is not worth the cost. A normal host, a device behind a home gateway, and even many edge nodes only need to know:

How to reach the locally connected networks
Which upstream router should receive everything else

That is the purpose of the default route 0.0.0.0/0. It solves “if I do not know a more specific path, at least send the packet to a device that knows more about the world”.

The default route is useful, but its boundary should stay clear:

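It is only a fallback, not a priority: it matches every destination, but as the shortest possible prefix it loses to any more specific route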
It gives you an exit, but it does not guarantee the upstream truly has the more specific reachability
If it is wrong, many issues show up as “all external traffic is broken”

So the default route is not laziness. It is a way to centralize path knowledge upstream.
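For a typical host, the whole routing decision reduces to “on my own subnet, or hand it to the gateway”. A minimal sketch, assuming a made-up host on 192.168.1.0/24 with gateway 192.168.1.1 (both values and the name `first_hop` are hypothetical):

```python
import ipaddress

# Hypothetical host configuration, for illustration only.
CONNECTED = ipaddress.ip_network("192.168.1.0/24")
DEFAULT_GATEWAY = "192.168.1.1"

def first_hop(dst):
    """Decide where a host sends a packet: on-link neighbour or gateway."""
    addr = ipaddress.ip_address(dst)
    if addr in CONNECTED:
        return dst  # directly connected: deliver on the local link
    # Nothing more specific is known, so 0.0.0.0/0 applies.
    return DEFAULT_GATEWAY

print(first_hop("192.168.1.42"))  # 192.168.1.42
print(first_hop("203.0.113.9"))   # 192.168.1.1
```

The host knows nothing about how 203.0.113.9 is reached. It only knows who to ask, and if `DEFAULT_GATEWAY` is wrong, every non-local destination fails at once.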

Why Routing and Forwarding Must Stay Separate

When a packet arrives, the forwarding action has to be simple enough that high-throughput networks can actually run. The common separation in reality is:

The control plane learns, chooses, and converges on paths
The data plane forwards at line rate using the current FIB

You can think of it like this:

Routing protocol / static config
  -> Build RIB
  -> Select installable entries
  -> Push FIB
  -> Line-rate forwarding uses the FIB

The common confusion is to treat RIB, FIB, and what show route displays as the same thing. They are not:

RIB is closer to the control-plane candidate route set
FIB is the installed forwarding state that the data plane actually consults for each packet
A route being learned does not mean it has already become the current forwarding path

That is why many troubleshooting mistakes come from assuming “the control plane has a route, so the data plane must already be using it”.
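The gap between “learned” and “installed” can be made concrete with a toy model. Everything here is an assumption made for illustration: the `Candidate` type, the `build_fib` function, the rule that a candidate is installable only when its next hop is currently reachable, and the numeric preference values. Real platforms use their own selection rules and values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    prefix: str
    next_hop: str
    source: str      # e.g. "static" or "ospf"
    preference: int  # lower wins; the numbers below are illustrative only

def build_fib(rib, reachable_next_hops):
    """Select one installable candidate per prefix: prefix -> next hop."""
    best = {}
    for cand in rib:
        if cand.next_hop not in reachable_next_hops:
            continue  # learned, but not installable right now
        current = best.get(cand.prefix)
        if current is None or cand.preference < current.preference:
            best[cand.prefix] = cand
    return {prefix: cand.next_hop for prefix, cand in best.items()}

rib = [
    Candidate("10.20.30.0/24", "R2", "static", 1),
    Candidate("10.20.30.0/24", "R4", "ospf", 110),
    Candidate("10.50.0.0/16", "R9", "ospf", 110),
]

# Only R4 is currently reachable as a next hop.
fib = build_fib(rib, reachable_next_hops={"R4"})
print(fib)  # {'10.20.30.0/24': 'R4'}
```

In this sketch the RIB holds three entries, yet the FIB forwards 10.20.30.0/24 via R4 rather than the preferred static route, and has nothing at all for 10.50.0.0/16. Looking only at the RIB would suggest otherwise.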

Why Dynamic Routing’s Hard Part Is Convergence, Not Calculation

Static routing works for small networks, but it becomes unmanageable as the network grows. The value of dynamic routing protocols is not that they “fill tables automatically”. It is that devices exchange reachability information and reselect paths after topology changes.

The really hard part is not the path calculation. It is the convergence cost:

After a link breaks, how quickly can the old route be withdrawn
Once a new route appears, how quickly can it be accepted by the whole domain
When devices disagree for a short time, will black holes, loops, and flapping appear

So evaluating dynamic routing is not only “did the network eventually get a route”. It is also “how long does the middle of the change stay unstable”.
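A transient disagreement is easy to reproduce on paper. The sketch below assumes a hypothetical four-router topology and three snapshots of the forwarding state for one prefix: before a link failure, during the window where R1 has switched to its backup but R2 has not yet updated, and after convergence. The `trace` function and the table layout are illustrative, not a model of any specific protocol.

```python
def trace(tables, start, prefix, max_hops=8):
    """Walk next hops for one prefix and classify what happens to the packet."""
    seen, node = [], start
    while len(seen) < max_hops:
        if node in seen:
            return "loop", seen + [node]
        seen.append(node)
        hop = tables.get(node, {}).get(prefix)
        if hop is None:
            return "black hole", seen
        if hop == "direct":
            return "delivered", seen
        node = hop
    return "hop limit", seen

PREFIX = "10.20.30.0/24"

# Hypothetical snapshots of each router's next hop for PREFIX.
before = {"R1": {PREFIX: "R3"}, "R2": {PREFIX: "R1"},
          "R3": {PREFIX: "direct"}, "R4": {PREFIX: "direct"}}
# The R1-R3 link fails. R1 already points at R2, but R2 still points at R1.
during = {"R1": {PREFIX: "R2"}, "R2": {PREFIX: "R1"},
          "R4": {PREFIX: "direct"}}
# R2 has now learned to use R4.
after = {"R1": {PREFIX: "R2"}, "R2": {PREFIX: "R4"},
         "R4": {PREFIX: "direct"}}

for name, tables in [("before", before), ("during", during), ("after", after)]:
    print(name, trace(tables, "R1", PREFIX))
# before ('delivered', ['R1', 'R3'])
# during ('loop', ['R1', 'R2', 'R1'])
# after ('delivered', ['R1', 'R2', 'R4'])
```

Each router's table is locally reasonable in every snapshot. The loop exists only because the two tables describe different moments of the same change.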

Why Policy Often Matters More Than the Shortest Path

In engineering, “shortest path” is only one possible criterion, and often not even the most important one. ISPs, enterprise networks, and cross-region deployments often care more about:

Which exit is cheaper
Which link is more stable
Which prefixes must use a private line
Which traffic must not cross a given administrative domain

That is why real routing often produces results that do not look shortest. The router did not calculate wrong. It is enforcing a higher-priority policy objective.

So if you think of routing as pure graph shortest path, you will misread many real network behaviors:

Forward and return paths are asymmetric
The same destination may use different exits at different times
A backup link may stay idle until the primary fails
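One way to picture policy-first selection is as a sort key in which a policy attribute is compared before any measure of shortness. The sketch below is loosely inspired by preference-style attributes in routing protocols, but the `Path` type, the field names, and the values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    exit_name: str
    local_pref: int  # higher is preferred; stands in for operator policy
    hop_count: int   # lower is shorter

def choose(paths):
    """Policy decides first; path length is only a tie-breaker."""
    return min(paths, key=lambda p: (-p.local_pref, p.hop_count))

paths = [
    Path("transit-a", local_pref=100, hop_count=2),
    Path("private-line", local_pref=200, hop_count=4),
]

print(choose(paths).exit_name)      # private-line
# If the preferred exit disappears, the idle alternative takes over.
print(choose(paths[:1]).exit_name)  # transit-a
```

The chosen exit is two hops longer than the alternative, and that is the intended result rather than a miscalculation.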

Why Summarization and Visibility Are a Tradeoff

A large network cannot keep advertising every small prefix forever, or the control plane and table size would explode. So routes get summarized: multiple more-specific subnets are compressed into a larger prefix that is advertised outward.

That buys you:

Fewer entries
Smaller change-propagation scope
Less need for upper layers to know every lower-layer detail

The cost is equally clear:

The outside sees abstracted reachability, not the real internal topology
Local failures inside a summary boundary may not be visible immediately outside
If summarization is too coarse, traffic can get pulled in and then black-holed internally

So summarization is not just an optimization. It is a trade between scalability and path precision.
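Both sides of the trade can be shown with the standard `ipaddress` module. The sketch assumes four hypothetical internal /24 subnets, one of which is currently down. The `outcome` function and its return strings are illustrative only.

```python
import ipaddress

# Four hypothetical internal subnets: 10.20.0.0/24 through 10.20.3.0/24.
specifics = [ipaddress.ip_network(f"10.20.{i}.0/24") for i in range(4)]

# What gets advertised outward: one summary instead of four entries.
summary = list(ipaddress.collapse_addresses(specifics))
print(summary)  # [IPv4Network('10.20.0.0/22')]

# Assume 10.20.2.0/24 has failed internally; the summary is still advertised.
live = [n for n in specifics if n != ipaddress.ip_network("10.20.2.0/24")]

def outcome(dst):
    """What happens to traffic from outside that only sees the summary."""
    addr = ipaddress.ip_address(dst)
    if not any(addr in s for s in summary):
        return "not attracted"
    if any(addr in n for n in live):
        return "delivered"
    return "attracted by summary, dropped inside"

print(outcome("10.20.1.5"))  # delivered
print(outcome("10.20.2.5"))  # attracted by summary, dropped inside
```

The outside network has one entry instead of four and never sees the internal failure, which is exactly why traffic for 10.20.2.5 still arrives and then has nowhere to go.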

What to Look At in Packet Capture and Troubleshooting

It is easy to waste time on a routing problem by jumping straight into protocol details such as the protocol type, LSAs, updates, or metrics. A more useful order is below.

First check whether this is “no route” or “route exists but is wrong”

Confirm:

Whether the source host’s default gateway is correct
Whether the target prefix has a matching entry on the current device
Which exact longest-prefix match is being hit

Many problems are not complex convergence failures. They fail at the first hop because the next-hop decision was already wrong.

Then check whether the control plane and data plane agree

Focus on:

Whether the routing protocol or static config really produced the expected route
Whether that route entered the actual forwarding table
Whether the neighboring device sees the same next hop

If you do not separate those layers, it is easy to say “the command output has the route, so it cannot be routing”.

Finally check whether the issue is convergence, policy, or summarization boundary

These symptoms usually carry the most signal:

Short black holes during failover
Forward path fine, return path broken
More specific prefixes unreachable while the summary prefix still exists
The same destination flaps between different exits periodically

These are usually not caused by one bad field. They happen because distributed path selection never stabilized at one boundary.

How Engineers Should Actually Think About Routing Today

Do not think of routing as “finding the shortest path for a packet”. It is first a continuous next-hop decision for each destination prefix.
Do not mix longest-prefix match, default route, and policy routing into one layer. They solve problems at different granularities.
Do not treat the route learned by the control plane as the same thing as the path currently used by the data plane. Verify them separately.
Do not underestimate convergence time and summarization boundaries. Many black holes and detours happen inside the window where the network is still changing.
Do not assume “not shortest” means “wrong”. Real networks often prefer policy, cost, and administrative boundaries first.