BGP | IoT Worker

BGP

An Internet prefix may be served from a data center in Shanghai today and from Beijing or Hong Kong tomorrow. The same public address may follow very different paths for users on different carriers. A data center may already be healthy again, yet external traffic still detours around it. When you see this kind of behavior, the problem is usually no longer in a single router’s forwarding table. It sits higher up, in how autonomous systems tell each other which prefixes are reachable.

BGP handles exactly that. It is not a protocol that computes the shortest path across the whole Internet for every packet, and it is not simply a way for routers to synchronize a topology map. It lets different autonomous systems exchange “which prefixes I can reach, which ones you can hand to me, and which policy and constraints come with that statement.” Without it, the Internet could form locally, but it would be very hard to assemble into the global multi-carrier, multi-country, multi-admin-domain network that exists today.

The core of BGP is not “find the shortest route.”
It is to propagate prefix reachability and path attributes between autonomous systems,
then, according to policy, choose the route each AS currently trusts and is willing to propagate further.

Why It Exists

Inside a single administrative domain, routers can exchange relatively detailed internal topology information using link-state or distance-vector protocols. But the Internet is not one unified network. It is a collection of many autonomous systems:

  • Carriers maintain their own network boundaries and exit policies
  • Cloud providers, CDNs, enterprises, and campus networks each manage their own prefixes
  • Different networks do not want to share their full internal topology
  • Routing decisions are often influenced by business relationships, cost, and policy, not just technical shortest path

That means an inter-domain routing protocol must answer a different kind of question:

  • Which prefixes can you reach?
  • Through which autonomous systems did those prefixes travel?
  • Should I accept this path, prefer it, and tell others about it?

If every participant had to exchange the full topology and then compute a single global shortest path, scale, trust boundaries, and policy flexibility would all collapse very quickly. The value of BGP is that it does not try to make everyone share the same internal map of the Internet. Instead, each AS only publishes the reachability and path clues it is willing to expose.

What It Was Built For, and in What Context

BGP evolved with the Internet’s shift from a small early interconnect to a world of multiple carriers and multiple administrative domains. Its default world is not “everyone is optimizing together for the whole Internet.” Its default world is:

  • Each autonomous system first protects its own boundary and interests
  • Routing information must propagate across organizations without exposing everything inside
  • Scalability must matter more than global optimality

That gives BGP a very different character from intra-domain routing protocols:

  • It is first an inter-domain routing protocol, not an intra-domain shortest-path protocol
  • It propagates prefixes and attributes, not the full Internet link-state graph
  • It puts policy first; the shortest AS path is only one input among several
  • Its convergence goal is “sufficiently consistent and operable,” not “mathematically globally optimal”

So if you think of BGP as “Internet-scale OSPF,” many later behaviors will look like strange anomalies.

The Main Model

To understand BGP, keep four things in mind:

  • One autonomous system tells its neighbors, “these prefixes can be reached through me”
  • That advertisement carries a set of path attributes describing how the path looks and how preferred it is
  • The neighbor does not accept everything blindly. It first applies local policy to decide whether to trust, use, or re-announce the path
  • What finally goes into the forwarding table is only the currently selected path, or a small number of selected paths, from the candidate set

The main flow can be summarized like this:

AS A advertises Prefix P + Attributes
  -> AS B receives it and applies policy
  -> AS B chooses the best path it is willing to use
  -> AS B decides whether to keep advertising it to AS C
  -> The advertisement keeps spreading across more autonomous systems
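The flow above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not router code: a route is just a prefix plus an AS_PATH tuple, each AS applies a hypothetical import policy (`import_policy`, which here rejects anything more specific than /24, a common but not universal filter), and on eBGP export the sender prepends its own ASN. All function names are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # leftmost ASN = the neighbor that sent the route


def originate(asn: int, prefix: str) -> Route:
    # The origin AS advertises the prefix with only its own ASN in AS_PATH.
    return Route(prefix, (asn,))


def import_policy(route: Route) -> bool:
    # Hypothetical import filter: drop anything more specific than /24.
    return ipaddress.ip_network(route.prefix).prefixlen <= 24


def export(route: Route, local_asn: int) -> Route:
    # On eBGP export, the sending AS prepends its own ASN.
    return Route(route.prefix, (local_asn,) + route.as_path)


def propagate(origin_asn: int, prefix: str, chain: List[int]) -> Dict[int, Tuple[int, ...]]:
    """Push one advertisement down a linear chain of ASes.

    Returns the AS_PATH each AS received; stops at the first AS that rejects it.
    """
    seen: Dict[int, Tuple[int, ...]] = {}
    route = originate(origin_asn, prefix)
    for asn in chain:
        if not import_policy(route):
            break  # rejected here, so nothing is re-advertised further
        seen[asn] = route.as_path
        route = export(route, asn)
    return seen


print(propagate(65001, "203.0.113.0/24", [65002, 65003]))
# {65002: (65001,), 65003: (65002, 65001)}
```

With the same hypothetical filter, a /25 from the same origin is dropped at the first hop, so `propagate(65001, "203.0.113.128/25", [65002, 65003])` returns an empty dict. The linear chain is the main simplification: real ASes have many neighbors and choose among candidates before exporting anything.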

Once this flow is clear, later behavior around AS_PATH, LOCAL_PREF, MED, eBGP/iBGP, and convergence becomes much easier to read, instead of dissolving into router-manual noise.

A Common End-to-End Path

A typical BGP reachability propagation flow looks like this:

sequenceDiagram
    participant A as AS 65001
    participant B as AS 65002
    participant C as AS 65003
    A->>B: Advertise 203.0.113.0/24
    Note over B: Includes path attributes, AS_PATH=65001
    B->>C: Re-advertise the prefix
    Note over C: Sees AS_PATH=65002 65001

If C also learns the same prefix from another neighbor, the process becomes:

sequenceDiagram
    participant B as AS 65002
    participant C as AS 65003
    participant D as AS 65004
    B->>C: 203.0.113.0/24 via 65002 65001
    D->>C: 203.0.113.0/24 via 65004 65005 65001
    Note over C: Install the route selected by local policy

The key point is not “all paths have been learned.” It is:

  • BGP can learn multiple candidate paths for the same prefix at once
  • Which one is used is not decided by the sender
  • When you advertise onward, you do not necessarily forward every candidate unchanged

So the working unit of BGP is not “the one true path for the whole Internet.” It is “the local choice each AS makes based on external advertisements.”

Why It Spreads Prefix Reachability, Not Full Topology

One of BGP’s most important simplifications is that it does not require each AS to expose all of its internal links, costs, and failure details to the outside. It is closer to saying:

  • These prefixes can be reached via me
  • These are the autonomous systems the path has gone through
  • These are the attributes you can use as reference

That gives BGP two huge advantages:

  • Better scalability, because there is no need to synchronize fine-grained topology globally
  • Clearer administrative boundaries, because each AS only exposes the layer it is willing to expose

But the cost is also obvious:

  • The outside world cannot see your true internal structure
  • Why a path is slow or detouring may not be obvious from a BGP-only view
  • The whole network does not solve a single globally optimal shortest-path problem

So BGP’s strength has never been “knowing the most,” but “making the Internet keep working even when no one knows everything.”

Why Policy Matters More Than Shortest Path

When looking at BGP, it is tempting to focus on AS_PATH length and conclude that the shorter path should win. That may be true in some cases, but it is far from the whole picture.

Real networks often prioritize:

  • Using a preferred local exit or upstream
  • Business relationships, such as the relative priority of customer, peer, and provider routes
  • Keeping certain flows inside a specific country, carrier, or private line
  • Avoiding a path that is shorter but more expensive, more jittery, or less stable

So in BGP, the most important question is not “is this the shortest?” but “is this a path I am willing to use and propagate?” That is why LOCAL_PREF and similar attributes matter so much: LOCAL_PREF is compared before AS_PATH length in the usual decision process, so local intent outranks path length.
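The business-relationship priorities listed above are often modeled with the customer/peer/provider export rules usually attributed to Gao and Rexford. The sketch below illustrates that model; it is not a requirement of the BGP specification, and the LOCAL_PREF numbers in `RELATIONSHIP_PREF` are hypothetical values chosen only to express “customer over peer over provider.”

```python
from typing import Optional

CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"

# Hypothetical LOCAL_PREF values: higher wins, so customer routes are preferred.
RELATIONSHIP_PREF = {CUSTOMER: 200, PEER: 150, PROVIDER: 100}


def local_pref_for(learned_from: str) -> int:
    # Import side: tag each route with a preference based on who sent it.
    return RELATIONSHIP_PREF[learned_from]


def should_export(learned_from: Optional[str], send_to: str) -> bool:
    """Export side of the customer/peer/provider model.

    learned_from: relationship with the neighbor that sent us the route,
                  or None for a prefix we originate ourselves.
    send_to: relationship with the neighbor we are considering advertising to.
    """
    if learned_from is None or learned_from == CUSTOMER:
        return True  # our own and our customers' prefixes go to everyone
    # Routes from peers or providers are passed on only to customers,
    # so we never carry transit traffic between two non-customers for free.
    return send_to == CUSTOMER


print(should_export(PEER, PROVIDER))   # False: no free transit between them
print(should_export(PROVIDER, CUSTOMER))  # True
```

This is one reason different carriers see different routes to the same prefix: whether a path is visible at all depends on who agreed to carry it for whom.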

That is also why you often see in real deployments:

  • A shorter path is not selected
  • Forward and return paths are asymmetric
  • Different carriers see very different routes to the same prefix

Many of these are not anomalies. They are BGP executing policy.
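To see why “a shorter path is not selected” is normal, here is a deliberately reduced best-path comparison in Python. It keeps only two steps of the real decision process, highest LOCAL_PREF first and then shortest AS_PATH, and uses the neighbor name as a hypothetical stand-in for the remaining tie-breakers (origin, MED, eBGP over iBGP, IGP cost to the next hop, router ID, and so on). The default LOCAL_PREF of 100 is a common vendor default, assumed here. The two candidates reuse the paths C received in the second diagram.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    neighbor: str
    as_path: Tuple[int, ...]
    local_pref: int = 100  # assumed default; normally set by local import policy


def best_path(candidates: List[Candidate]) -> Candidate:
    # Sort key: higher LOCAL_PREF first, then fewer AS hops, then neighbor name
    # as a hypothetical stand-in for the real tie-breakers.
    return min(candidates, key=lambda c: (-c.local_pref, len(c.as_path), c.neighbor))


via_b = Candidate("AS65002", (65002, 65001))
via_d = Candidate("AS65004", (65004, 65005, 65001))
print(best_path([via_b, via_d]).neighbor)  # AS65002: equal LOCAL_PREF, shorter path

# Local policy raises LOCAL_PREF for the longer path, e.g. a preferred upstream.
via_d_preferred = Candidate("AS65004", (65004, 65005, 65001), local_pref=200)
print(best_path([via_b, via_d_preferred]).neighbor)  # AS65004 despite the longer AS_PATH
```

Only the winner is normally re-advertised onward, so neighbors of C never learn that a shorter candidate existed.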

Why AS_PATH Matters, but Is Not the Whole Answer

AS_PATH does at least two important things:

  • It gives the receiver a coarse path clue
  • It helps BGP detect loops and prevent a route from bouncing back forever

If an AS sees its own ASN already in the AS_PATH of an incoming advertisement, it will usually reject that path. That is one of the key mechanisms that keeps BGP stable.
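The loop check itself is simple. A minimal Python sketch, assuming AS_PATH is a plain tuple of ASNs (real AS_PATHs can also contain AS_SET segments, ignored here) and using a hypothetical `receive` helper that stores accepted paths in a per-prefix candidate list:

```python
from typing import Dict, List, Tuple

AsPath = Tuple[int, ...]


def loop_free(as_path: AsPath, local_asn: int) -> bool:
    # An AS_PATH that already contains our ASN has come back around: reject it.
    return local_asn not in as_path


def receive(rib_in: Dict[str, List[AsPath]], prefix: str, as_path: AsPath, local_asn: int) -> bool:
    """Store an incoming path as a candidate unless it fails the loop check."""
    if not loop_free(as_path, local_asn):
        return False  # discarded before it can ever be compared or re-advertised
    rib_in.setdefault(prefix, []).append(as_path)
    return True


rib_in: Dict[str, List[AsPath]] = {}
receive(rib_in, "203.0.113.0/24", (65002, 65001), local_asn=65003)         # kept
receive(rib_in, "203.0.113.0/24", (65004, 65003, 65001), local_asn=65003)  # looped: dropped
print(rib_in)  # {'203.0.113.0/24': [(65002, 65001)]}
```

The word “usually” in the text matters: most implementations let operators relax this check for specific neighbors, for example when one ASN is reused across separate sites.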

But AS_PATH is not the whole answer, because it only tells you which ASes the route passed through. It does not tell you:

  • How the path is routed internally inside those ASes
  • Which links are actually better
  • Whether this path is cheaper or more aligned with local policy

So AS_PATH is important, but if you treat it as the only truth, you will misread many field behaviors.

Why eBGP and iBGP Must Be Treated Separately

Many people know the terms eBGP and iBGP, but still think of them as just “two configuration modes of the same protocol.” A more accurate view is that they solve two different layers of problems.

eBGP is about:

  • How one AS and another AS exchange prefix reachability

iBGP is about:

  • How one AS internally distributes the external routes it learned so the whole AS stays consistent

If these two are not separated, many behaviors become confusing:

  • The outside route has been learned, but some internal border routers do not use it consistently
  • One exit has already learned a better external route, but other devices are still using the old exit
  • You changed outbound policy, but the internal forwarding side did not catch up

So BGP is not only about “advertising to the outside world.” How an AS digests the routes it learned from outside is just as important.
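One reason some internal routers “do not use it consistently” is the default iBGP rule: a route learned from one iBGP peer is not re-advertised to another iBGP peer, which is why iBGP traditionally needs a full mesh (route reflectors and confederations exist to relax this). A toy Python model of which routers end up learning an eBGP route, assuming no route reflection and using hypothetical router names:

```python
from typing import FrozenSet, Set


def ibgp_learners(border: str, sessions: Set[FrozenSet[str]]) -> Set[str]:
    """Routers inside the AS that learn a route first learned over eBGP at `border`.

    Default rule, no route reflectors: `border` advertises the eBGP-learned route
    to each of its iBGP peers, but those peers do not pass it on to other iBGP
    peers, so only routers with a direct session to `border` learn it.
    """
    learners = {border}
    for session in sessions:
        if border in session:
            learners |= session
    return learners


full_mesh = {frozenset(p) for p in [("R1", "R2"), ("R1", "R3"), ("R2", "R3")]}
partial = {frozenset(p) for p in [("R1", "R2"), ("R2", "R3")]}

print(sorted(ibgp_learners("R1", full_mesh)))  # ['R1', 'R2', 'R3']
print(sorted(ibgp_learners("R1", partial)))    # ['R1', 'R2']: R3 never hears it
```

In the partial mesh, R3 keeps forwarding to whatever exit it already knew, which matches the “other devices are still using the old exit” symptom.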

Why Convergence Delay and Flapping Are Its Long-Term Cost

BGP is not designed for the Internet to become globally consistent instantly. It is designed to gradually converge in a world that is huge, policy-heavy, and not fully trusted. That means it will always be more cautious, and slower, than many intra-domain protocols.

That cost shows up in many places:

  • After an exit is withdrawn, the outside world needs time to stop sending traffic there
  • Replacement paths for the same prefix need time to propagate and be reselected
  • Flapping sites can trigger repeated updates and magnify instability

So black holes, detours, and short windows of unreachability are often not “BGP not working.” They are BGP converging, and that convergence window is already large enough to matter to the business.

That is why engineering teams care so much about:

  • The cadence of prefix advertisement and withdrawal
  • Route flap suppression
  • How quickly health state is reflected in route publication
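Route flap suppression is commonly implemented as route flap damping (described in RFC 2439): each flap adds a penalty that decays exponentially, the route is suppressed once the penalty crosses a suppress limit, and it is reused only after decaying below a lower reuse limit. The constants below (1000 per flap, 15-minute half-life, suppress at 2000, reuse at 750) are commonly cited vendor defaults, used here as assumptions; a maximum suppress time is omitted for brevity.

```python
HALF_LIFE_S = 15 * 60      # penalty halves every 15 minutes (assumed default)
PENALTY_PER_FLAP = 1000    # assumed default
SUPPRESS_LIMIT = 2000      # assumed default
REUSE_LIMIT = 750          # assumed default


class FlapDamper:
    """Per-prefix damping state; times are in seconds."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_t = 0.0
        self.suppressed = False

    def _decay(self, t: float) -> None:
        # Exponential decay since the last update.
        self.penalty *= 0.5 ** ((t - self.last_t) / HALF_LIFE_S)
        self.last_t = t

    def flap(self, t: float) -> bool:
        """Record a flap at time t; return True if the route is now suppressed."""
        self._decay(t)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True
        return self.suppressed

    def usable(self, t: float) -> bool:
        """Return True if the route may be used and advertised at time t."""
        self._decay(t)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed


d = FlapDamper()
print([d.flap(t) for t in (0, 60, 120)])  # [False, False, True]
print(d.usable(120 + 600))                # False: penalty still about 1806
print(d.usable(120 + 1800))               # True: two half-lives later, about 717
```

The gap between the suppress and reuse limits is the cost the text describes: under these assumed constants, a prefix that flapped three times in two minutes stays suppressed for roughly half an hour, even if the site is healthy right after the third flap.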

Why More Specific Prefixes Often Matter More Than Attributes

BGP decides “which prefixes are reachable from where,” but packet forwarding still obeys longest-prefix match. That leads to an important engineering judgment:

  • More specific prefixes often override attribute comparison and directly change traffic direction

For example, if both of the following are visible:

  • 203.0.112.0/23
  • 203.0.113.0/24

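traffic to addresses inside that /24 follows the /24 route, as long as it is installed. BGP selects a best path separately for each prefix, so the attributes of the /23 are never compared against those of the /24; longest-prefix match in the forwarding table settles the question first.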

That is why traffic engineering, blackholing, DDoS diversion, and some Anycast designs rely so much on prefix granularity. Many field problems look like “the attributes did not work,” when in reality the traffic was already captured earlier by a more specific prefix.
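Longest-prefix match itself can be shown with Python’s standard `ipaddress` module. This linear scan is only a sketch (real forwarding tables use tries or hardware lookups), the exit names are hypothetical, and the covering prefix is written as 203.0.112.0/23, the aligned /23 that contains 203.0.113.0/24. The comments assume the /23 won its own BGP comparison with a more attractive LOCAL_PREF; it still loses for addresses inside the /24.

```python
import ipaddress
from typing import Dict, Optional


def lookup(fib: Dict[str, str], dst: str) -> Optional[str]:
    """Return the exit of the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), exit_)
        for prefix, exit_ in fib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Each entry is the BGP winner for its own prefix; attributes were compared
# only among paths for the same prefix, never between the /23 and the /24.
fib = {
    "203.0.112.0/23": "exit-A",  # hypothetically won with LOCAL_PREF 300
    "203.0.113.0/24": "exit-B",  # hypothetically won with LOCAL_PREF 100
}
print(lookup(fib, "203.0.113.10"))  # exit-B: the /24 is more specific
print(lookup(fib, "203.0.112.10"))  # exit-A: only the /23 covers this address
```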

Why the Idealized Spec Path and the Field Path Differ So Much

In the abstract model, BGP looks like a clean chain:

  • Build a neighbor session
  • Exchange reachable prefixes
  • Compare attributes
  • Select the best path
  • Re-advertise it

The real network is much messier. Common deviations usually come from:

  • Route policy and service health not being in sync
  • External advertisements changed, but internal forwarding has not caught up
  • Some upstream filtered your prefix or attributes
  • A site is still advertising the address even though the service should no longer receive traffic

That means the protocol model and the field model must be read separately:

  • Learning a BGP route does not mean it is already in the forwarding path
  • A prefix being reachable does not mean the service is usable
  • A policy that “looks right” does not mean every network will see it as you expected

So the hard part of BGP is usually not configuring a few neighbors. It is keeping prefix publication, service health, intra-domain forwarding, and external propagation aligned for a long time.
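One common way to keep prefix publication tied to service health is to let a health checker decide when a site announces or withdraws its prefix, with hysteresis so that a noisy check does not turn into route flapping. The class below is a hypothetical sketch of that decision logic only: `rise` and `fall` are illustrative parameter names (similar knobs appear in many load-balancer health checkers), and actually instructing a BGP daemon to announce or withdraw is left out.

```python
class HealthGatedAnnouncement:
    """Decide whether a site should keep announcing its prefix.

    Withdraw after `fall` consecutive failed checks; re-announce only after
    `rise` consecutive successful checks. Isolated blips change nothing.
    """

    def __init__(self, rise: int = 3, fall: int = 2) -> None:
        self.rise = rise
        self.fall = fall
        self.announced = True
        self._streak = 0  # consecutive checks that disagree with the current state

    def observe(self, healthy: bool) -> bool:
        """Feed one health-check result; return whether the prefix should be announced."""
        if healthy == self.announced:
            self._streak = 0  # result agrees with current state: reset the counter
            return self.announced
        self._streak += 1
        needed = self.fall if self.announced else self.rise
        if self._streak >= needed:
            self.announced = not self.announced  # withdraw or re-announce
            self._streak = 0
        return self.announced


site = HealthGatedAnnouncement(rise=3, fall=2)
print([site.observe(h) for h in [False, True, False, True]])        # [True, True, True, True]
print([site.observe(h) for h in [False, False, True, True, True]])  # [True, False, False, False, True]
```

The asymmetry (withdraw quickly, return slowly) is a design choice: it trades a slightly longer recovery for fewer BGP updates while a site is unstable.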

What to Look at When Debugging

When investigating a BGP problem, do not start by staring at every path attribute. A more useful order is:

First, see whether the route was not learned, or learned but not selected

Confirm:

  • Whether the target prefix was learned from the expected neighbor
  • How many candidate paths exist
  • Which path is currently selected

Many problems are not “BGP is completely broken.” They are “the path exists, but a higher-priority policy is suppressing it.”

Then, check whether the selected best path really entered the forwarding table

Look at:

  • Whether the control-plane best route was actually installed
  • Whether the next hop is resolvable and reachable
  • Whether the intra-domain forwarding plane can really deliver traffic to that exit

If you do not separate those layers, it is easy to mistake “BGP selected it already” for “the data plane must be using it.”
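The “is the next hop resolvable” check can be sketched as a recursive lookup: a BGP route is only usable if its NEXT_HOP address is itself covered by a route in the IGP or connected table. A minimal Python sketch with a hypothetical IGP table and interface names. A frequent real-world cause of failure is an iBGP route whose next hop is still the external peer’s address, which the IGP does not carry unless the border router rewrites the next hop (often called next-hop-self) or the link is injected into the IGP.

```python
import ipaddress
from typing import Dict, Optional


def resolve_next_hop(next_hop: str, igp: Dict[str, str]) -> Optional[str]:
    """Longest-prefix match of the BGP next hop against the IGP/connected table.

    Returns the outgoing interface, or None if the next hop is unresolvable.
    """
    addr = ipaddress.ip_address(next_hop)
    matches = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in igp.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def installable(bgp_best: Dict[str, str], igp: Dict[str, str]) -> bool:
    # Control-plane "best" is not enough: the next hop must resolve as well.
    return resolve_next_hop(bgp_best["next_hop"], igp) is not None


igp = {"192.0.2.1/32": "ge-0/0/1", "10.0.0.0/8": "ge-0/0/2"}  # hypothetical
via_loopback = {"prefix": "203.0.113.0/24", "next_hop": "192.0.2.1"}
via_peer_addr = {"prefix": "203.0.113.0/24", "next_hop": "198.51.100.9"}
print(installable(via_loopback, igp))   # True
print(installable(via_peer_addr, igp))  # False: selected, but not forwardable
```

Passing this check still says nothing about the third point above: whether the path to that exit actually delivers packets has to be verified in the data plane.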

Finally, check whether the issue is policy, propagation, or a more specific prefix

These symptoms are especially telling:

  • One carrier always detours
  • Forward and return paths stay asymmetric for a long time
  • One region already switched, another has not yet
  • A summarized prefix looks fine, but some more specific prefixes are wrong

These are often not a simple attribute misconfiguration. They are usually about policy boundaries, propagation scope, or prefix granularity.

What You Should Really Think About BGP Today

  • Do not treat BGP as the Internet shortest-path protocol. It first exchanges reachability and policy between autonomous systems
  • Do not treat AS_PATH as the only criterion. Real path choice is often dominated by local preference, business relationships, and policy
  • Do not mix eBGP and iBGP into one layer. The former exchanges external reachability; the latter distributes those external routes inside the AS
  • Do not assume the control-plane best route is automatically the data-plane route. Next-hop resolution and intra-domain reachability must be verified separately
  • Do not underestimate more specific prefixes. In many traffic-engineering cases, longest-prefix match decides the outcome before attributes do
  • Do not equate routability with business availability. If service health cannot be reflected in prefix publication quickly enough, BGP will faithfully spread the bad entry point to the whole network

Further Reading

  • Routing: why BGP is only one special layer inside a larger routing problem
  • OSPF: why routing inside one AS is more like each router computing from shared link state than propagating external prefix policy
  • Anycast: how BGP decides which site a user reaches first when the same address is reachable from many places
  • IP: BGP ultimately publishes prefix reachability, not business semantics
  • DNS: why public resolvers and edge entries often depend on both DNS and BGP
  • CDN: why edge access, origin return, and multi-site exits remain sensitive to BGP path choice
  • ICMP: how the network exposes failure when a BGP-selected path does not work in the data plane

References