OSPF



In a campus network, if one uplink fails, traffic should quickly switch to the other path. If a distribution switch reboots, access-layer devices should not keep sending traffic to it for a long time. If a new subnet is added, the rest of the domain should soon know where to send that prefix. The hard part in cases like these is not whether a static route exists. It is how the many devices inside one autonomous system can quickly reach a consistent view after the topology changes.

OSPF handles exactly that kind of internal routing problem. It does not have routers copy each other’s routing tables, and it is not BGP-style commercial policy negotiation. Routers in the same area first build a map of link state that is as consistent as possible, and then each router computes a shortest-path tree to every destination prefix from that map.

OSPF does not mean “a neighbor tells me how to go to a prefix and I just use it”. It means the routers inside the domain share a link-state view and each one runs SPF to compute the next hop to the target subnet.

Why It Appears

In a very small network, static routes can barely hold together. Once the number of devices grows, redundancy appears, and subnets change, static routing quickly shows obvious problems:

The maintenance burden grows quickly with the number of devices
When a link fails, many places have to be changed manually
Different devices often have inconsistent views of the topology change
Redundant links exist but are not automatically used correctly

What an interior dynamic routing protocol has to solve is not “distribute a few configuration lines automatically”. It has to answer more basic questions:

What does the current topology actually look like
Based on that topology, what is the best path to each prefix for each device
When a link breaks, how quickly can the domain converge again

If routers only pass around “I have this route” without a more stable topology view, convergence and loop control become much weaker. OSPF’s value is that it first gets the domain to agree on a common link-state base view, then lets each router compute its own path.

The Background It Came From

OSPF came from the real-world need for interior dynamic routing in medium and large IP networks. It was not about propagating policy across carriers. It was about a more focused target inside one autonomous system:

Detect topology change quickly
Select paths stably across multiple links and redundant topologies
Let routers from different vendors and different device families cooperate under one IGP

That makes OSPF very different from BGP:

It is first and foremost an IGP, an interior gateway protocol
It propagates link state, not external prefix policy
It cares about faster convergence inside the domain, not policy autonomy across management boundaries
It assumes everyone is inside the same administrative boundary and can share more internal structure

So if you think of OSPF as “another protocol that carries prefixes”, its design will seem unnecessarily heavy. If you think of it as “domain-internal BGP”, you will also misunderstand why it floods link state.

Grasp the Main Model First

To understand OSPF, keep four things in mind:

Routers first discover neighbors and form adjacencies
Devices flood their link-state information through LSAs inside the area
Devices in the same area synchronize toward the same link-state database
Each device runs the SPF algorithm locally and gets the best next hop for each subnet

The main path can be compressed like this:

Router advertises its link state
  -> Devices in the area synchronize into the LSDB
  -> Each device runs SPF on the LSDB
  -> The OSPF route is installed into the local RIB / FIB

Once that line is stable, later terms like neighbor state, DR/BDR, area, and LSA types stop feeling like a glossary dump.
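
The "run SPF on the LSDB" step can be sketched in a few lines. This is a minimal illustration, not how a real implementation stores its database: the `LSDB` dictionary (router name to neighbor costs) and the `spf` helper are hypothetical, and a real OSPF LSDB holds typed LSAs and prefixes rather than a plain adjacency map.

```python
import heapq

# Hypothetical, simplified LSDB: router -> {neighbor: cost of the link}.
LSDB = {
    "R1": {"R2": 10, "R3": 30},
    "R2": {"R1": 10, "R3": 10},
    "R3": {"R1": 30, "R2": 10},
}


def spf(lsdb, root):
    """Dijkstra from `root`; returns {destination: (total_cost, first_hop)}."""
    best = {}
    # Heap entries are (cost so far, node, first hop used when leaving root).
    heap = [(0, root, root)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue  # a cheaper path to this node was already settled
        best[node] = (cost, first_hop)
        for neighbor, link_cost in lsdb.get(node, {}).items():
            if neighbor not in best:
                hop = neighbor if node == root else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbor, hop))
    del best[root]  # the root does not need a route to itself
    return best


routes = spf(LSDB, "R1")
print(routes)
```

Every router runs the same computation on the same database but with itself as the root, which is why each one ends up with its own next hops without anyone sending finished routes around.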

The Most Common Main Path

A typical OSPF convergence path inside a domain looks like this:

sequenceDiagram
  participant R1 as Router 1
  participant R2 as Router 2
  participant R3 as Router 3
  R1->>R2: Hello
  R2->>R1: Hello
  Note over R1,R2: Establish adjacency
  R2->>R1: LSA(my links and costs)
  R2->>R3: LSA(my links and costs)
  Note over R1,R3: Similar LSDB formed after flooding in the area
  Note over R1,R3: Each router runs SPF to compute next hops

If the link from R2 to an upstream fails, the main path becomes:

sequenceDiagram
  participant R1 as Router 1
  participant R2 as Router 2
  participant R3 as Router 3
  Note over R2: Link-state change
  R2->>R1: Updated LSA
  R2->>R3: Updated LSA
  Note over R1,R3: Trigger SPF recalculation

The key is not “who told whom a prefix”. It is:

Synchronize the topology change first
Then let each router recompute its paths from the same map

So OSPF is closer to “jointly maintaining one map” than “copying each other’s route outcomes”.
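
The "synchronize first, then recompute" order can be sketched as a toy flooding loop. It assumes that an LSA is just a dictionary with an originating router and a sequence number, and that "newer" means a higher sequence number; real OSPF also considers checksum and age, and uses acknowledgements and retransmission. The names `install_lsa` and `flood` are hypothetical.

```python
def install_lsa(lsdb, lsa):
    """Store `lsa` if it is newer than the copy we hold. Returns True if the LSDB changed."""
    current = lsdb.get(lsa["origin"])
    if current is not None and current["seq"] >= lsa["seq"]:
        return False  # same or older copy: nothing to install, nothing to re-flood
    lsdb[lsa["origin"]] = lsa
    return True


def flood(neighbors, lsdbs, start, lsa):
    """Flood `lsa` from `start`; returns the routers whose LSDB changed (they must rerun SPF)."""
    changed = []
    queue = [start]
    while queue:
        router = queue.pop(0)
        if install_lsa(lsdbs[router], lsa):
            changed.append(router)
            queue.extend(neighbors[router])  # pass it on only if it was news to us
    return changed


neighbors = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2"]}
lsdbs = {router: {} for router in neighbors}

before = {"origin": "R2", "seq": 1, "links": {"R1": 10, "R3": 10, "UPSTREAM": 5}}
after = {"origin": "R2", "seq": 2, "links": {"R1": 10, "R3": 10}}  # upstream link lost

print(flood(neighbors, lsdbs, "R2", before))  # every router learns R2's links
print(flood(neighbors, lsdbs, "R2", after))   # every router replaces the old copy
print(flood(neighbors, lsdbs, "R2", before))  # stale copy is ignored everywhere
```

What travels is R2's description of its own links, not anyone's chosen routes. The routers that report a change are the ones that would schedule an SPF run.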

Why It Propagates Link State, Not Final Route Results

What makes OSPF different from many people’s intuitive idea of “dynamic routing” is that it does not primarily exchange final route results. It is more like saying:

Who am I connected to
What is the current state of those links
What is the cost of using those links

That has several important consequences:

Each router can compute the shortest path to each destination locally
Routers do not have to trust a neighbor’s “conclusion” blindly. They derive their own result from the same raw topology data
Once part of the topology changes, the affected routers can recompute fairly consistently

That is why OSPF’s stability depends so much on LSDB consistency. If routers are looking at different maps, their SPF results will quickly diverge.

Why Hello and Adjacency Matter Before Learning Routes

Many OSPF failures look like “why did I not learn this route?”, but the earlier question is usually:

Did the neighbors even discover each other
Did the adjacency actually form
Did database synchronization complete

OSPF is not a protocol where an Update arrives and routes can be installed immediately. Before that, several layers must already be in place:

Layer-2 and Layer-3 connectivity must be working
Hello parameters must be compatible
Neighbor state must advance correctly
The necessary database description and synchronization must finish

So if you start by staring at one prefix, you may miss the more important earlier boundary: adjacency never formed, so SPF and route installation never had a chance.
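
A rough way to picture "Hello parameters must be compatible" is to compare the fields two routers put into their Hello packets. The sketch below assumes a simplified `HelloParams` record and a hypothetical `hello_mismatches` helper. The field list follows the usual OSPFv2 checks (area, timers, authentication, network mask, stub flag), but treat it as an illustration: the mask check, for example, does not apply on point-to-point links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloParams:
    """Hypothetical, simplified view of what one side advertises in its Hellos."""
    area_id: str
    hello_interval: int
    dead_interval: int
    auth_type: int
    network_mask: str
    stub_area: bool


CHECKED_FIELDS = (
    "area_id",
    "hello_interval",
    "dead_interval",
    "auth_type",
    "network_mask",
    "stub_area",
)


def hello_mismatches(local, remote):
    """Return the names of the fields that differ between the two sides."""
    return [
        name
        for name in CHECKED_FIELDS
        if getattr(local, name) != getattr(remote, name)
    ]


local = HelloParams("0.0.0.0", 10, 40, 0, "255.255.255.0", False)
remote = HelloParams("0.0.0.0", 30, 120, 0, "255.255.255.0", False)
print(hello_mismatches(local, remote))
```

A non-empty result means the neighbors never get far enough to exchange databases, so no amount of staring at a single prefix will explain the missing route.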

Why Cost Is Only a Domain-Internal Preference, Not Absolute Truth

OSPF Cost is easy to interpret as “the higher-bandwidth link is always best because it has lower cost”. That is only partly true.

More accurately, cost is:

A locally chosen metric used by domain devices to compare internal paths

It may be based on bandwidth, but it is not the same as real latency, jitter, or business experience. In practice:

A high-bandwidth link is not necessarily the most stable one
A lower-cost path may simply be the path the operator wants preferred
If the cost design is messy, the domain will start showing strange traffic splits

So OSPF “shortest path” usually means shortest with respect to cost, not optimal with respect to every real-world metric.
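
To make "cost is a chosen metric" concrete: many implementations derive a default interface cost by dividing a reference bandwidth by the interface bandwidth. The sketch below assumes that scheme with integer division and a 100 Mbit/s reference, which is a common default but should be checked on your platform. `interface_cost` is a hypothetical helper.

```python
def interface_cost(bandwidth_bps, reference_bps=100_000_000):
    """Assumed default: reference bandwidth / interface bandwidth, never below 1."""
    return max(1, reference_bps // bandwidth_bps)


links = {
    "10M": 10_000_000,
    "100M": 100_000_000,
    "1G": 1_000_000_000,
    "10G": 10_000_000_000,
}

for name, bps in links.items():
    print(name, interface_cost(bps), interface_cost(bps, reference_bps=100_000_000_000))
```

With the 100 Mbit/s reference, the 100M, 1G, and 10G links all end up with cost 1, so SPF cannot tell them apart. Raising the reference or setting costs by hand is an operator decision, which is exactly why the number reflects preference rather than measured quality.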

Why Areas Exist for Scale and Convergence, Not Decoration

If you dump an entire large network into one OSPF area, the cost of flooding and SPF recomputation grows very quickly:

The database is larger
The change propagation scope is larger
Each recomputation touches more objects

Areas are the way OSPF keeps those costs under control. They are not there to make topology diagrams look neat. They trade off:

Richer topology visibility inside an area
Less unnecessary detail propagation across areas
Boundary routers taking on some summarization and abstraction responsibility

The benefits are clear:

Domain scale is easier to grow
Some local changes do not force the whole network to recompute

The tradeoffs are also real:

Area boundaries add more design constraints
Summarization and abstraction reduce visibility into internal details
If the design is poor, troubleshooting bounces between “intra-area problem” and “inter-area reachability”
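
The summarization tradeoff can be illustrated with Python's standard `ipaddress` module. This is a sketch under assumptions: `advertised_outside` is a hypothetical helper that models what a boundary router would announce to other areas when one summary range is configured, and the prefixes are made up.

```python
import ipaddress


def advertised_outside(area_prefixes, summary):
    """Prefixes visible outside the area when `summary` hides its components."""
    summary_net = ipaddress.ip_network(summary)
    nets = [ipaddress.ip_network(p) for p in area_prefixes]
    # Prefixes outside the summary range are still announced individually.
    outside = [n for n in nets if not n.subnet_of(summary_net)]
    # The summary is announced as long as at least one component exists.
    if any(n.subnet_of(summary_net) for n in nets):
        outside.append(summary_net)
    return sorted(str(n) for n in outside)


area1 = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24", "10.2.0.0/24"]
before = advertised_outside(area1, "10.1.0.0/22")

# One component subnet disappears inside the area.
area1_after_failure = [p for p in area1 if p != "10.1.2.0/24"]
after = advertised_outside(area1_after_failure, "10.1.0.0/22")

print(before)
print(after)
print(before == after)
```

The two announcements are identical, which shows both sides of the tradeoff: other areas are spared the churn, but they also keep sending traffic for the missing subnet toward the boundary router.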

Why DR/BDR Is a Real-World Compromise on Broadcast Networks

On Ethernet-style multiaccess broadcast networks, if every OSPF device formed a full adjacency with every other device, the adjacency count would grow quadratically with the number of routers, and control-plane overhead would grow with it. DR/BDR exists to keep adjacency complexity under control in that environment.

It is not about “who is more advanced”. It is about:

Preventing adjacency count from exploding on a broadcast segment
Making database sync and LSA exchange more orderly

So if a DR fails, the BDR is there to take over, and the symptom is not necessarily “OSPF is completely broken”. It may simply be that:

Synchronization efficiency and stability on that broadcast segment have dropped

That is one reason many OSPF mechanisms look complicated: they are all trying to keep the control plane from exploding once scale increases.
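
The adjacency arithmetic behind DR/BDR is easy to check. The sketch assumes a single broadcast segment with `n` routers where every non-DR/BDR router forms a full adjacency only with the DR and the BDR; the function names are hypothetical.

```python
def full_mesh_adjacencies(n):
    """Every router fully adjacent with every other router: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n):
    """Each other router pairs with the DR and the BDR, plus the DR-BDR pair itself."""
    if n < 2:
        return 0
    return 2 * (n - 2) + 1


for n in (2, 5, 10, 50):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
```

At 10 routers the segment needs 17 full adjacencies instead of 45, and the gap widens quickly because one count grows quadratically and the other linearly.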

Why It Must Be Kept Separate from BGP

OSPF and BGP often coexist in the same network, but they solve different layers of the problem.

OSPF is closer to answering:

Inside my autonomous system, how do routers share a common topology view and send traffic toward the correct exit or internal subnet

BGP is closer to answering:

Which prefixes should this autonomous system accept, choose, and advertise across administrative boundaries

If you blur those layers, a lot of misunderstandings appear:

The external prefix may already have been selected by BGP, but the IGP cost to the exit is wrong, so traffic leaves from an unexpected place
The internal OSPF topology may be fine, but cross-AS external reachability is not OSPF’s job at all

So OSPF is not an Internet-wide selection protocol. It is what keeps “how to reach internal networks and exits” stable inside one autonomous system.
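
The "traffic leaves from an unexpected place" case can be sketched as follows. It assumes BGP has already left two exits equally attractive for an external prefix, so the IGP cost to each exit decides. `pick_exit`, the exit names, and the costs are all hypothetical.

```python
def pick_exit(candidate_exits, igp_cost):
    """Among exits BGP considers equal, prefer the one with the lowest IGP cost."""
    reachable = [exit_ for exit_ in candidate_exits if exit_ in igp_cost]
    if not reachable:
        return None  # BGP may know the prefix, but the IGP cannot reach any exit
    # Tie-break on the name only to keep the example deterministic.
    return min(reachable, key=lambda exit_: (igp_cost[exit_], exit_))


exits_for_prefix = ["ASBR-EAST", "ASBR-WEST"]

# IGP costs from one internal router, as SPF would compute them.
intended = {"ASBR-EAST": 20, "ASBR-WEST": 40}
misconfigured = {"ASBR-EAST": 200, "ASBR-WEST": 40}  # one link cost set too high

print(pick_exit(exits_for_prefix, intended))
print(pick_exit(exits_for_prefix, misconfigured))
```

Nothing changed in BGP between the two calls. Only the interior cost changed, yet the traffic now exits through the other border router.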

Why the Spec Path and the Real-World Path Often Differ So Much

On paper, OSPF looks very clean:

Build neighbors
Flood link state
Synchronize databases
Run SPF
Install routes

But the most common real-world deviations come from:

Neighbor parameter mismatch causing adjacency to stall
One device not fully synchronized but still forwarding
Frequent link flaps causing SPF churn and route churn
Area design, summarization, or boundary devices hiding the real problem behind abstraction

That means the protocol model and the field reality need to be read separately:

OSPF running does not mean the adjacency is healthy
Healthy adjacency does not mean the LSDB is identical
Identical LSDB does not mean the installed FIB exactly matches your business expectation

So the hard part of OSPF is usually not “can you configure area 0”. It is keeping adjacency, database, SPF, and real forwarding aligned over time.

What to Look At in Packet Capture and Troubleshooting

Do not start by reading every LSA type. A more useful order is below.

First check whether the neighbors and adjacencies are really standing

Confirm:

Whether the neighbors discovered each other
Which adjacency state they are stuck in
Whether the Hello and dead intervals, area ID, network type, and authentication settings match

Many “I did not learn the route” cases are really adjacency failures.
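
As a reading aid for "which adjacency state they are stuck in", the sketch below maps the standard OSPF neighbor states to a first thing to check. The state names are the protocol's own; the hint texts and the `triage` helper are assumptions based on commonly seen causes, not an exhaustive diagnosis.

```python
# Hypothetical first-look hints per neighbor state.
STATE_HINTS = {
    "Down": "no Hello received: check connectivity and that OSPF runs on the interface",
    "Attempt": "NBMA only: Hellos are being sent to a configured neighbor with no reply",
    "Init": "Hello received but we are not listed in it: suspect one-way communication",
    "2-Way": "bidirectional Hellos, no database exchange yet",
    "ExStart": "database exchange not starting: MTU mismatch is a commonly cited cause",
    "Exchange": "database description in progress: look for loss or MTU problems",
    "Loading": "link-state requests outstanding: look for loss or a missing LSA",
    "Full": "databases synchronized on this adjacency",
}


def triage(state, both_drother=False):
    """Return a first-look hint for a neighbor stuck in `state`."""
    if state == "2-Way" and both_drother:
        # On a broadcast segment, two non-DR/BDR routers normally stay in 2-Way.
        return "expected: neither side is DR or BDR, so they stop at 2-Way"
    return STATE_HINTS.get(state, "unknown state")


print(triage("Init"))
print(triage("2-Way", both_drother=True))
print(triage("ExStart"))
```

The `both_drother` flag is the useful detail: a neighbor sitting in 2-Way is a fault on a point-to-point link but normal between two non-DR/BDR routers on a broadcast segment.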

Then check whether the database synchronized and SPF reran as expected

Look at:

Whether the LSDB is roughly consistent across routers in the same area
Whether a topology change triggered updates and SPF computation
Whether the result really created the expected routes

If this layer is not stable, “why did it choose this path” is often just the symptom.
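
Checking whether two routers hold the same database can be reduced to comparing LSA identities and sequence numbers. The sketch assumes each LSDB has been exported as a dictionary keyed by (LS type, link-state ID, advertising router) with the sequence number as the value. How you obtain that export is platform-specific and not shown, and `lsdb_diff` is a hypothetical helper.

```python
def lsdb_diff(lsdb_a, lsdb_b):
    """Return {lsa_key: (seq_in_a, seq_in_b)} for every LSA that differs or is missing."""
    diffs = {}
    for key in sorted(set(lsdb_a) | set(lsdb_b)):
        seq_a = lsdb_a.get(key)  # None means the LSA is absent on that router
        seq_b = lsdb_b.get(key)
        if seq_a != seq_b:
            diffs[key] = (seq_a, seq_b)
    return diffs


r1_lsdb = {
    ("router", "1.1.1.1", "1.1.1.1"): 0x80000005,
    ("router", "2.2.2.2", "2.2.2.2"): 0x80000009,
    ("router", "3.3.3.3", "3.3.3.3"): 0x80000004,
}
r3_lsdb = {
    ("router", "1.1.1.1", "1.1.1.1"): 0x80000005,
    ("router", "2.2.2.2", "2.2.2.2"): 0x80000008,  # older copy of R2's LSA
}

for key, (seq_a, seq_b) in lsdb_diff(r1_lsdb, r3_lsdb).items():
    print(key, seq_a, seq_b)
```

An empty result for two routers in the same area is the precondition for asking why SPF picked a given path. A non-empty one means the routers are computing from different maps.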

Finally check whether the issue is cost, area boundary, or FIB installation

These cases often carry the most signal:

The route was learned, but the path selection is not what you expected
One area is fine, but across areas the path looks wrong or black-holed
The control plane has converged, but the data plane still takes the old path

These problems are often not caused by one bad packet. They happen because the OSPF model and the actual network boundaries did not line up.
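
The last case, a converged control plane with a data plane still on the old path, comes down to comparing two tables. The sketch assumes you can export the next hops OSPF computed and the next hops actually programmed for forwarding as two dictionaries. `fib_mismatches` and the sample data are hypothetical.

```python
def fib_mismatches(expected_next_hop, fib_next_hop):
    """Return prefixes whose programmed next hop differs from what SPF expects."""
    report = {}
    for prefix, expected in expected_next_hop.items():
        actual = fib_next_hop.get(prefix)  # None means the prefix is not programmed
        if actual != expected:
            report[prefix] = {"expected": expected, "actual": actual}
    return report


# Next hops after SPF reran (control plane).
spf_result = {
    "10.1.0.0/24": "192.0.2.2",
    "10.1.1.0/24": "192.0.2.6",
    "10.1.2.0/24": "192.0.2.6",
}
# What forwarding is actually using (data plane).
fib = {
    "10.1.0.0/24": "192.0.2.2",
    "10.1.1.0/24": "192.0.2.2",  # still pointing at the old path
}

for prefix, detail in fib_mismatches(spf_result, fib).items():
    print(prefix, detail)
```

Any entry in the report is a place where OSPF has done its job and the problem sits between route installation and forwarding.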

What Engineers Should Actually Keep in Mind About OSPF Today

Do not think of OSPF as routers copying route tables. It first shares link state inside the domain, then each router calculates its own path
Do not treat “learned a route” as the first checkpoint. Neighbor discovery, adjacency formation, and database synchronization often decide success earlier
Do not think of Cost as absolute real-world quality. It is only one metric for comparing domain-internal paths
Do not underestimate area design. Areas are not decoration. They are a tradeoff among scale, convergence, and visibility
Do not confuse OSPF with BGP. The former keeps domain-internal topology and exit reachability stable; the latter handles inter-domain prefix propagation and policy
Do not only inspect the control-plane output. The final data-plane FIB behavior still needs to be validated separately