In a campus network, if one uplink fails, traffic should quickly switch to the other path. If a distribution switch reboots, access-layer devices should not keep sending traffic to it for a long time. If a new subnet is added, the rest of the domain should soon know where to send that prefix. The hard part in cases like these is not whether there is a static route. It is how many devices inside one autonomous system can quickly form a consistent judgment after the topology changes.
OSPF handles exactly that kind of internal routing problem. It is not routers copying each other’s routing tables, and it is not BGP-style commercial policy negotiation. Routers in the same area first build a link-state map that is as consistent as possible, and each router then computes a shortest-path tree from that map to every destination prefix.
OSPF does not mean “a neighbor tells me how to reach a prefix and I just use it”. It means the routers inside the domain share a link-state view, and each one runs SPF to compute its own next hop toward the target subnet.
Why It Appears
In a very small network, static routes can barely hold together. Once the number of devices grows, redundancy appears, and subnets change, static routing quickly shows obvious problems:
- The maintenance burden grows quickly with the number of devices
- When a link fails, many places have to be changed manually
- Different devices often have inconsistent views of the topology change
- Redundant links exist but are not automatically used correctly
What an interior dynamic routing protocol has to solve is not “distribute a few configuration lines automatically”. It has to answer more basic questions:
- What does the current topology actually look like
- Based on that topology, what is the best path to each prefix for each device
- When a link breaks, how quickly can the domain converge again
If routers only pass around “I have this route” without a more stable topology view, convergence and loop control become much weaker. OSPF’s value is that it first gets the domain to agree on a common link-state base view, then lets each router compute its own path.
The Background It Came From
OSPF came from the real-world need for interior dynamic routing in medium and large IP networks. It was not about propagating policy across carriers. It was about a more focused target inside one autonomous system:
- Detect topology change quickly
- Select paths stably across multiple links and redundant topologies
- Let routers from different vendors and different device families cooperate under one IGP
That makes OSPF very different from BGP:
- It is first and foremost an IGP, an interior gateway protocol
- It propagates link state, not external prefix policy
- It cares about faster convergence inside the domain, not policy autonomy across management boundaries
- It assumes everyone is inside the same administrative boundary and can share more internal structure
So if you think of OSPF as “another protocol that carries prefixes”, its design will seem unnecessarily heavy. If you think of it as “domain-internal BGP”, you will also misunderstand why it floods link state.
Grasp the Main Model First
To understand OSPF, keep four things in mind:
- Routers first discover neighbors and form adjacencies
- Devices flood their link-state information through LSAs inside the area
- Devices in the same area converge on the same link-state database (LSDB)
- Each device runs the SPF algorithm locally and gets the best next hop for each subnet
The main path can be compressed like this:
Router advertises its link state
-> Devices in the area synchronize into the LSDB
-> Each device runs SPF on the LSDB
-> The OSPF route is installed into the local RIB / FIB
Once that sequence is clear, later terms like neighbor state, DR/BDR, area, and LSA types stop feeling like a glossary dump.
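As a minimal sketch of the "LSDB in, next hops out" step, the following runs Dijkstra's algorithm over a toy link-state database. The router names, costs, and the `spf_next_hops` and `without_link` helpers are hypothetical, and the dictionary is a heavy simplification of a real LSDB (no LSA types, no two-way link check, no prefixes attached to routers).

```python
import heapq

# Toy LSDB: each router lists its neighbors and the cost of the link to them.
LSDB = {
    "R1": {"R2": 10, "R3": 10},
    "R2": {"R1": 10, "R4": 10},
    "R3": {"R1": 10, "R4": 30},
    "R4": {"R2": 10, "R3": 30},
}

def spf_next_hops(lsdb, root):
    """Dijkstra from `root`; returns {destination: (total_cost, first_hop)}."""
    best = {root: (0, None)}
    heap = [(0, root, None)]  # (cost so far, node, first hop used from root)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            hop = neighbor if node == root else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    del best[root]
    return best

def without_link(lsdb, a, b):
    """Copy of the LSDB with the a<->b link removed in both directions."""
    return {
        node: {nbr: c for nbr, c in links.items() if {node, nbr} != {a, b}}
        for node, links in lsdb.items()
    }

print(spf_next_hops(LSDB, "R1")["R4"])                            # (20, 'R2')
print(spf_next_hops(without_link(LSDB, "R2", "R4"), "R1")["R4"])  # (40, 'R3')
```

The second call illustrates the convergence idea: nobody tells R1 "use R3 now". The map changes, and R1 derives the new next hop itself.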
The Most Common Main Path
A typical OSPF convergence path inside a domain looks like this:
If the link from R2 to an upstream fails, the main path becomes:
The key is not “who told whom a prefix”. It is:
- Synchronize the topology change first
- Then let each router recompute its paths from the shared map
So OSPF is closer to “jointly maintaining one map” than “copying each other’s route outcomes”.
Why It Propagates Link State, Not Final Route Results
What makes OSPF different from many people’s intuitive idea of “dynamic routing” is that it does not primarily exchange final route results. It is more like saying:
- Who am I connected to
- What is the current state of those links
- What is the cost of using those links
That has several important consequences:
- Each router can compute the shortest path to each destination locally
- Routers do not have to trust a neighbor’s “conclusion” blindly. Each one derives its own result from the same raw topology data
- Once part of the topology changes, the affected routers can recompute fairly consistently
That is why OSPF’s stability depends so much on LSDB consistency. If routers are looking at different maps, their SPF results will quickly diverge.
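One way to picture "looking at different maps" is to diff two routers' databases. The sketch below assumes each LSDB has been reduced to a dictionary keyed by the LSA identity (LS type, link-state ID, advertising router) with the LS sequence number as the value. The `lsdb_diff` helper and the sample data are hypothetical, and a real comparison would also consider checksum and age.

```python
def lsdb_diff(lsdb_a, lsdb_b):
    """Return {lsa_key: (seq_in_a, seq_in_b)} for LSAs that differ or are missing."""
    diff = {}
    for key in lsdb_a.keys() | lsdb_b.keys():
        seq_a = lsdb_a.get(key)  # None means the LSA is absent on that router
        seq_b = lsdb_b.get(key)
        if seq_a != seq_b:
            diff[key] = (seq_a, seq_b)
    return diff

# Hypothetical snapshots: R2 still holds an older instance of one router LSA.
on_r1 = {
    ("router", "1.1.1.1", "1.1.1.1"): 0x80000005,
    ("router", "2.2.2.2", "2.2.2.2"): 0x80000003,
}
on_r2 = {
    ("router", "1.1.1.1", "1.1.1.1"): 0x80000005,
    ("router", "2.2.2.2", "2.2.2.2"): 0x80000002,
}

for key, (seq_a, seq_b) in lsdb_diff(on_r1, on_r2).items():
    print(key, seq_a, seq_b)
```

An empty result is the precondition for two routers in the same area to reach compatible SPF results; any entry in the diff marks a place where their maps disagree.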
Why Hello and Adjacency Matter Before Learning Routes
Many OSPF failures look like “why did I not learn this route?”, but the earlier question is usually:
- Did the neighbors even discover each other
- Did the adjacency actually form
- Did database synchronization complete
OSPF is not a protocol where an Update arrives and routes can be installed immediately. Before that, several layers must already be standing:
- Layer-2 and Layer-3 connectivity must be okay
- Hello parameters must be compatible
- Neighbor state must advance correctly
- The necessary database description and synchronization must finish
So if you start by staring at one prefix, you may miss the more important earlier boundary: adjacency never formed, so SPF and route installation never had a chance.
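The "Hello parameters must be compatible" step can be sketched as a field-by-field comparison. The fields below are the ones OSPFv2 Hellos are generally required to agree on (area, Hello and dead intervals, authentication, stub flag, and the network mask on broadcast-style networks). The `HelloParams` class and `hello_mismatches` function are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class HelloParams:
    area_id: str
    hello_interval: int   # seconds
    dead_interval: int    # seconds
    network_mask: str     # compared on broadcast-style networks
    auth_type: int
    stub_flag: bool

def hello_mismatches(local, remote):
    """Names of the fields on which two Hello parameter sets disagree."""
    return [
        f.name
        for f in fields(HelloParams)
        if getattr(local, f.name) != getattr(remote, f.name)
    ]

local = HelloParams("0.0.0.0", 10, 40, "255.255.255.0", 0, False)
remote = replace(local, hello_interval=5, dead_interval=20)

print(hello_mismatches(local, remote))  # ['hello_interval', 'dead_interval']
```

Any non-empty result means the neighbor relationship is unlikely to get past its earliest states, long before any prefix is in question.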
Why Cost Is Only a Domain-Internal Preference, Not Absolute Truth
OSPF Cost is easy to misread as “the higher-bandwidth link is always best because it has lower cost”. That is only partly true.
More accurately, cost is:
- A locally chosen metric used by domain devices to compare internal paths
It may be based on bandwidth, but it is not the same as real latency, jitter, or business experience. In practice:
- A high-bandwidth link is not necessarily the most stable one
- A lower-cost path may simply be the path the operator wants preferred
- If the cost design is messy, the domain will start showing strange traffic splits
So OSPF “shortest path” usually means shortest with respect to cost, not optimal with respect to every real-world metric.
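To make the gap between cost and real link quality concrete, here is a sketch of the reference-bandwidth convention that several vendors use to derive a default interface cost. The OSPF specification itself does not mandate a formula; the `interface_cost` function and the 100 Mbit/s default reference are assumptions for illustration.

```python
def interface_cost(bandwidth_bps, reference_bps=100_000_000):
    """Cost = reference bandwidth / interface bandwidth, never below 1."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return max(1, reference_bps // bandwidth_bps)

# With a 100 Mbit/s reference, every link at or above 100 Mbit/s gets cost 1.
print(interface_cost(10_000_000))        # 10
print(interface_cost(1_000_000_000))     # 1
print(interface_cost(10_000_000_000))    # 1

# Raising the reference to 100 Gbit/s separates 1G from 10G again.
print(interface_cost(1_000_000_000, reference_bps=100_000_000_000))   # 100
print(interface_cost(10_000_000_000, reference_bps=100_000_000_000))  # 10
```

Under the assumed default, a 1G and a 10G link look identical to SPF, which is one way a domain ends up with traffic splits nobody intended.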
Why Areas Exist for Scale and Convergence, Not Decoration
If you dump an entire large network into one OSPF area, the cost of flooding and SPF recomputation grows very quickly:
- The database is larger
- The change propagation scope is larger
- Each recomputation touches more objects
Areas are the way OSPF keeps those costs under control. They are not there to make topology diagrams look neat. They trade off:
- Richer topology visibility inside an area
- Less unnecessary detail propagation across areas
- Boundary routers taking on some summarization and abstraction responsibility
The benefits are clear:
- Domain scale is easier to grow
- Some local changes do not force the whole network to recompute
The tradeoffs are also real:
- Area boundaries add more design constraints
- Summarization and abstraction reduce visibility into internal details
- If the design is poor, troubleshooting bounces between “intra-area problem” and “inter-area reachability”
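The trade between detail and scale at an area boundary can be illustrated with prefix aggregation. This sketch uses Python's standard `ipaddress` module to collapse contiguous subnets into a covering prefix. The `summarize` helper and the addresses are hypothetical, and a real boundary router only summarizes along ranges the operator configures.

```python
import ipaddress

def summarize(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

inside_area = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]

# Four intra-area subnets become a single advertisement outside the area.
print(summarize(inside_area))                    # ['10.1.0.0/22']

# A prefix outside the contiguous block cannot be folded in.
print(summarize(inside_area + ["10.2.0.0/24"]))  # ['10.1.0.0/22', '10.2.0.0/24']
```

Other areas now carry one entry instead of four and are shielded from flaps of an individual /24, but they can no longer tell which of those /24s is actually reachable.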
Why DR/BDR Is a Real-World Compromise on Broadcast Networks
On Ethernet-style multiaccess broadcast networks, if every OSPF device formed full adjacency with every other device, control-plane overhead would grow very quickly. DR/BDR exists to keep adjacency complexity under control in that environment.
It is not about “who is more advanced”. It is about:
- Preventing adjacency count from exploding on a broadcast segment
- Making database sync and LSA exchange more orderly
So if a DR fails, the symptom is not necessarily “OSPF is completely broken”. The BDR exists to take over, so it may simply be that:
- Synchronization efficiency and stability on that broadcast segment drop while the segment settles on its new DR
That is one reason many OSPF mechanisms look complicated: they are all trying to keep the control plane from exploding once scale increases.
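The adjacency-count argument is simple arithmetic. As a sketch, assume a segment with n routers: a full mesh needs n(n-1)/2 adjacencies, while with a DR and BDR every other router forms full adjacencies only with those two, plus one adjacency between the DR and BDR. The function names are hypothetical.

```python
def full_mesh_adjacencies(n):
    """Adjacencies if every router pairs with every other router."""
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n):
    """Adjacencies when non-DR routers pair only with the DR and BDR."""
    if n < 2:
        return 0
    # (n - 2) other routers, each adjacent to DR and BDR, plus DR<->BDR.
    return 2 * (n - 2) + 1

for n in (3, 10, 50):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
```

For 10 routers this is 45 versus 17, and for 50 routers 1225 versus 97: quadratic growth against linear growth.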
Why It Must Be Kept Separate from BGP
OSPF and BGP often coexist in the same network, but they solve different layers of the problem.
OSPF is closer to answering:
- Inside my autonomous system, how do routers share a common topology view and send traffic toward the correct exit or internal subnet
BGP is closer to answering:
- Which prefixes should this autonomous system accept, choose, and advertise across administrative boundaries
If you blur those layers, a lot of misunderstandings appear:
- The external prefix may already have been selected by BGP, but the IGP cost to the exit is wrong, so traffic leaves from an unexpected place
- The internal OSPF topology may be fine, but cross-AS external reachability is not OSPF’s job at all
So OSPF is not an Internet-wide selection protocol. It is what keeps “how to reach internal networks and exits” stable inside one autonomous system.
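The first misunderstanding above, where BGP has chosen the prefix but the IGP cost decides the exit, can be sketched as follows. The sketch assumes BGP has already narrowed the choice to exits that tie on its earlier attributes, so the IGP cost to each exit breaks the tie. The `pick_exit` function, router names, and costs are hypothetical.

```python
def pick_exit(candidate_exits, igp_cost):
    """Among BGP-equal exits, choose the one with the lowest IGP cost.

    `igp_cost` maps exit router -> cost as computed by OSPF on this router.
    Exits with no IGP route are unusable and are skipped.
    """
    reachable = [e for e in candidate_exits if e in igp_cost]
    if not reachable:
        return None
    # Tie-break on name only to keep the sketch deterministic.
    return min(reachable, key=lambda e: (igp_cost[e], e))

exits = ["ASBR-A", "ASBR-B"]

print(pick_exit(exits, {"ASBR-A": 30, "ASBR-B": 20}))  # ASBR-B
# One mis-set link cost inside the domain moves the exit without any BGP change.
print(pick_exit(exits, {"ASBR-A": 30, "ASBR-B": 200}))  # ASBR-A
```

Nothing in the BGP view differs between the two calls, which is why this class of problem has to be debugged on the OSPF side.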
Why the Spec Path and the Real-World Path Often Differ So Much
On paper, OSPF looks very clean:
- Build neighbors
- Flood link state
- Synchronize databases
- Run SPF
- Install routes
But the most common real-world deviations come from:
- Neighbor parameter mismatch causing adjacency to stall
- One device not fully synchronized but still forwarding
- Frequent link flaps causing SPF churn and route churn
- Area design, summarization, or boundary devices hiding the real problem behind abstraction
That means the protocol model and the field reality need to be read separately:
- OSPF running does not mean the adjacency is healthy
- Healthy adjacency does not mean the LSDB is identical
- Identical LSDB does not mean the installed FIB exactly matches your business expectation
So the hard part of OSPF is usually not “can you configure area 0”. It is keeping adjacency, database, SPF, and real forwarding aligned over time.
What to Look At in Packet Capture and Troubleshooting
Do not start by reading every LSA type. A more useful order is below.
First check whether the neighbors and adjacencies are really standing
Confirm:
- Whether the neighbors discovered each other
- Which adjacency state they are stuck in
- Whether Hello, area, network type, and authentication parameters match
Many “I did not learn the route” cases are really adjacency failures.
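As a rough aid for the "which adjacency state are they stuck in" question, the sketch below maps each OSPFv2 neighbor state to a first thing to check. The state names follow the protocol's neighbor state machine; the hints are common field causes rather than an exhaustive or authoritative list, and the `triage` function is a hypothetical helper.

```python
STATES = ["Down", "Attempt", "Init", "2-Way", "ExStart", "Exchange", "Loading", "Full"]

HINTS = {
    "Down": "no Hellos seen: check link, addressing, and that OSPF runs on the interface",
    "Attempt": "NBMA only: Hellos sent to a configured neighbor, nothing heard back",
    "Init": "Hellos are one-way: the neighbor does not list us, check filters and Hello parameters",
    "2-Way": "bidirectional but not adjacent: check DR/BDR election and network type",
    "ExStart": "database exchange cannot start: MTU mismatch is a frequent cause",
    "Exchange": "database description stalled: check MTU and packet loss",
    "Loading": "LSA requests unanswered: check loss or a corrupted LSA",
    "Full": "adjacency is up: move on to LSDB, SPF, and route installation",
}

def triage(state, both_drother=False):
    """Return a first-check hint for a neighbor stuck in `state`."""
    if state not in STATES:
        raise ValueError(f"unknown OSPF neighbor state: {state}")
    if state == "2-Way" and both_drother:
        # Two non-DR routers on a broadcast segment stay at 2-Way by design.
        return "expected: neither router is DR or BDR on this segment"
    return HINTS[state]

print(triage("ExStart"))
print(triage("2-Way", both_drother=True))
```

The 2-Way special case matters in practice: on a broadcast segment it is normal between two routers that are neither DR nor BDR, so it should not be chased as a fault.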
Then check whether the database synchronized and SPF reran as expected
Look at:
- Whether the LSDB is roughly consistent across routers in the same area
- Whether a topology change triggered updates and SPF computation
- Whether the result really created the expected routes
If this layer is not stable, “why did it choose this path” is often just the symptom.
Finally check whether the issue is cost, area boundary, or FIB installation
These cases often carry the most signal:
- The route was learned, but the path selection is not what you expected
- One area is fine, but across areas the path looks wrong or black-holed
- The control plane has converged, but the data plane still takes the old path
These problems are often not caused by one bad packet. They happen because the OSPF model and the actual network boundaries did not line up.
What Engineers Should Actually Keep in Mind About OSPF Today
- Do not think of OSPF as routers copying route tables. It first shares link state inside the domain, then each router calculates its own path
- Do not treat “learned a route” as the first checkpoint. Neighbor discovery, adjacency formation, and database synchronization often decide success earlier
- Do not think of Cost as absolute real-world quality. It is only one metric for comparing domain-internal paths
- Do not underestimate area design. Areas are not decoration. They are a tradeoff among scale, convergence, and visibility
- Do not confuse OSPF with BGP. The former keeps domain-internal topology and exit reachability stable; the latter handles inter-domain prefix propagation and policy
- Do not only inspect the control-plane output. The final data-plane FIB behavior still needs to be validated separately
Further Reading
- Routing - why OSPF is only the domain-internal topology-synchronization layer in the larger routing picture
- BGP - why once you cross an autonomous-system boundary, the problem turns into prefix-policy propagation
- IP - the final thing OSPF serves is still IP prefix forwarding
- ICMP - how the network exposes failure when the OSPF-selected path does not actually work in the data plane
- Anycast - how IGP inside the domain sends traffic to the correct local exit once an Anycast site has been chosen
- WireGuard - how domain routing decides which prefixes enter which exit once tunnels and overlay networks are involved