One of the most valuable skills for any network engineer, or aspiring engineer, is the ability to isolate and correct a network failure quickly and efficiently. In fact, troubleshooting skills are often the one thing that separates a good technician from a great one.
In this article, we are going to look at how to troubleshoot OSPF. OSPF is one of our more complicated routing protocols, and it can be pretty intimidating, especially at first. But once you have mastered how to configure this link-state protocol, the first thing you realize is that it is built by the numbers, and the good news is that it also "breaks" by the numbers. By that, I mean troubleshooting OSPF boils down to just a handful of possible issues that normally cause one of two types of problems. What are the problems, you might ask? The first type of problem is the failure of two directly connected OSPF speakers to become neighbors and form an adjacency; the second is that, once a neighbor relationship has formed, we fail to see routes in the routing table. This makes it sound simple, but there are many common configuration errors that can cause either or both of these issues. Here we are going to discuss the most common of these problems and what causes them.
Won't You Be My Neighbor?
The first step of any OSPF configuration has to be the establishment of a neighbor relationship. This is, hands down, the most critical thing we need to nail quickly. When two or more OSPF routers are directly connected, they should immediately begin to exchange Hellos and form OSPF adjacencies. We have to keep in mind, however, that the nature of these adjacencies can vary. For instance, on point-to-point links an adjacency forms for every neighbor relationship, but that is not always true for routers connected across a common LAN segment. In that environment, a router forms full adjacencies only with what are referred to as the Designated Router (DR) and the Backup Designated Router (BDR).
The next part of the equation is what happens when an adjacency forms and what state changes the devices go through as they become fully adjacent. The first thing we need to understand is that while this process is taking place, our network will not operate correctly.
This convergence process, if successful, results in a stable network. We should expect to see the adjacencies we have discussed. On a router this looks something like this:
R6#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
172.16.202.202  255   FULL/DR         00:00:36    10.1.236.2      FastEthernet0/0
172.16.203.203    1   FULL/BDR        00:00:39    10.1.236.3      FastEthernet0/0
172.16.204.204    1   2WAY/DROTHER    00:00:32    10.1.236.4      FastEthernet0/0
In this example, R6 has formed a full adjacency with the DR and the BDR as we discussed. However, it also knows about another neighbor that it will not form an adjacency with. Here the 2WAY state simply means that there is "two-way communication" between R6 and this neighbor. In this particular instance this is normal behavior, because we are using a broadcast medium (FastEthernet) and neither router is the DR or BDR. If this were a point-to-point connection, a neighbor stuck in 2WAY would definitely mean we have a problem, because on that kind of link we should expect the state to change to FULL.
Apart from that 2WAY/DROTHER case, if a neighbor stays in any state other than FULL, we should be concerned. Examples of other states we could see and what they mean are as follows:
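The reasoning above can be turned into a quick check. What follows is a minimal sketch in Python, not an official tool: it assumes the neighbor table has been saved as text in the column layout shown in the R6 output above, and the names `parse_neighbors` and `needs_attention` are made up for this illustration. It treats FULL as healthy, treats 2WAY/DROTHER as the normal broadcast-segment case, and flags everything else.

```python
import re

# Sample rows copied from the R6 output discussed in the article.
SAMPLE = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
172.16.202.202  255   FULL/DR         00:00:36    10.1.236.2      FastEthernet0/0
172.16.203.203    1   FULL/BDR        00:00:39    10.1.236.3      FastEthernet0/0
172.16.204.204    1   2WAY/DROTHER    00:00:32    10.1.236.4      FastEthernet0/0
"""

# One table row. The state column may contain spaces after the slash
# (for example "FULL/  -" on point-to-point links), so allow for that.
ROW = re.compile(
    r"^(?P<neighbor_id>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<pri>\d+)\s+"
    r"(?P<state>[A-Z0-9]+/\s*\S+)\s+"
    r"(?P<dead_time>\d+:\d+:\d+)\s+"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<interface>\S+)$"
)


def parse_neighbors(text):
    """Return one dict per neighbor row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def needs_attention(row):
    """Apply the rule of thumb from the text to a single neighbor row."""
    adjacency, _, role = row["state"].partition("/")
    role = role.strip()
    if adjacency == "FULL":
        return False
    # Normal between two non-DR/BDR routers on a broadcast segment.
    if adjacency == "2WAY" and role == "DROTHER":
        return False
    return True


for row in parse_neighbors(SAMPLE):
    verdict = "check" if needs_attention(row) else "ok"
    print(row["neighbor_id"], row["state"], verdict)
```

A neighbor that lingers in INIT, EXSTART, EXCHANGE, or LOADING would be flagged by this sketch, as would a 2WAY neighbor on a point-to-point link.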
INIT: The router is seeing Hellos from its connected neighbor, but there is still no two-way communication.
- This usually means that your router is receiving Hellos, but the one at the other end is not seeing yours.
- Make sure that the interfaces at both ends are configured to support OSPF.
- There could also be a mismatch in the OSPF Network Type.
EXSTART/EXCHANGE: Things have started out okay, but there is a problem with the relationship that has formed. Issues known to cause this problem include:
- Vendor interoperability issues.
- If both devices are Cisco, then typically MTU mismatches or something related to large packet sizes will be the cause.
LOADING: If you see this, unfortunately you could have some kind of LSA corruption someplace.
- You may have to start taking pieces out of the network to fix this one.
The next question has to be "what happens when I do not see any adjacencies for any given neighbor's Router ID?"
This means that the router is not aware that it is supposed to be part of the OSPF process on the interface pointing to that neighbor. Usually this means that the device is not receiving Hellos. Things to check in this instance are pretty simple:
- Is the link even up?
- Can you ping the other end of the link?
- Make sure that the interfaces at both ends are configured to support OSPF
- There could be a mismatch in the OSPF Network Type
- Verify that Hello and Dead timers match on each end of the link
- Neighboring interfaces must be in the same OSPF Area
- If the device is in more than one area, then it MUST have at least one interface in Area 0.
If these variables are not correct then neighbors will never even form, which means that it will also be impossible to form adjacencies.
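The parameter checks in that list lend themselves to a side-by-side comparison. The sketch below is only an illustration under stated assumptions: the `OspfInterface` record and the `adjacency_mismatches` function are hypothetical names, and the values are assumed to have been copied by hand from each router's `show ip ospf interface` output. It covers only the items from the list that can be compared mechanically (area, network type, Hello and Dead timers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfInterface:
    """OSPF settings for one end of a link (hypothetical record, filled in by hand)."""
    area: str
    network_type: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds


def adjacency_mismatches(local, remote):
    """Return a human-readable list of settings that differ between the two ends."""
    problems = []
    if local.area != remote.area:
        problems.append(f"area mismatch: {local.area} vs {remote.area}")
    if local.network_type != remote.network_type:
        problems.append(
            f"network type mismatch: {local.network_type} vs {remote.network_type}"
        )
    if local.hello_interval != remote.hello_interval:
        problems.append(
            f"hello timer mismatch: {local.hello_interval}s vs {remote.hello_interval}s"
        )
    if local.dead_interval != remote.dead_interval:
        problems.append(
            f"dead timer mismatch: {local.dead_interval}s vs {remote.dead_interval}s"
        )
    return problems


# Example values are invented for illustration.
r1 = OspfInterface(area="0", network_type="BROADCAST", hello_interval=10, dead_interval=40)
r2 = OspfInterface(area="0", network_type="BROADCAST", hello_interval=5, dead_interval=20)

for problem in adjacency_mismatches(r1, r2):
    print(problem)
```

An empty result does not prove the link is healthy. The first two items on the list, whether the link is up and whether the far end answers a ping, still have to be checked on the routers themselves.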
Where are my OSPF Routes?
Now we are at the point where our neighbor adjacencies have formed properly. The OSPF Database is getting populated with the appropriate Link State Advertisements (LSA), but some or all of the routes are not appearing in the routing table. Do not panic! This usually means that we have a configuration error, and in this case it will most commonly be a topology mismatch somewhere.
How do we fix it? First, we need to identify whether the problem applies to all types of routes, or just to specific types such as summary routes or external routes.
- Summary Route Problems: These issues are often caused by a non-contiguous Area 0, meaning that Area 0 exists in two or more places in the network and the pieces are not connected logically or physically.
- External Route Problems: If you see an entry in the OSPF Database for a given external route, but it is not appearing in the routing table, it may have something to do with the Forwarding Address associated with the route. The Forwarding Address of an external prefix MUST be known as an internal route; if not, the prefix will never appear in the routing table!
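The Forwarding Address rule can be checked by hand with a few lines of Python. This is a sketch under assumptions: `forwarding_address_ok` is a made-up helper, and the list of internal prefixes is assumed to have been collected from the routing table by whoever is troubleshooting. It also relies on the OSPF convention that a Forwarding Address of 0.0.0.0 means "send traffic to the advertising router itself", in which case this particular check does not apply.

```python
import ipaddress


def forwarding_address_ok(forwarding_address, internal_prefixes):
    """Return True if the Forwarding Address falls inside a known internal prefix.

    forwarding_address: dotted-decimal string taken from the external LSA.
    internal_prefixes: iterable of prefix strings (such as "10.1.2.0/24") that
    the router has learned as internal OSPF routes.
    """
    address = ipaddress.ip_address(forwarding_address)
    # 0.0.0.0 means the advertising router is the next hop, so there is
    # no separate Forwarding Address to resolve.
    if address == ipaddress.ip_address("0.0.0.0"):
        return True
    return any(address in ipaddress.ip_network(prefix) for prefix in internal_prefixes)


# Prefixes below are invented for illustration.
internal = ["10.1.2.0/24", "10.1.236.0/24"]
print(forwarding_address_ok("10.1.2.2", internal))      # covered by 10.1.2.0/24
print(forwarding_address_ok("192.168.50.1", internal))  # not covered by any internal prefix
```

If the check fails for a route that is missing from the routing table, the next thing to look at is why the subnet containing the Forwarding Address is not being advertised as an internal route.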
OMG, My Routers have gone to WAR!
%OSPF-4-FLOOD_WAR: Process 1 flushes LSA ID 10.1.2.2 type-5 adv-rtr 10.0.0.2 in area 50
Apparently, the embassies have been shut down and the diplomats have been recalled, because this router has just cried havoc and let slip the dogs of war! This could get exciting!
Unfortunately, this message really does not boil down to much at all. What the IOS is telling us is that we have two or more OSPF routers in the topology that are using the same Router-ID. I know that is pretty disappointing, but the good news is that it is easy to isolate and correct once you understand what the message means. Just assign each router a unique Router-ID manually, and run clear ip ospf process so the new ID takes effect. In short, do whatever you are allowed to do to get rid of any duplicate Router-IDs, and your routers will go back to a peaceful state.
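Finding the duplicate is mostly bookkeeping. As a minimal sketch, assuming you keep (or can build) an inventory that maps each router's hostname to its configured Router-ID, the hypothetical helper below groups hostnames by Router-ID and reports any ID used more than once. The inventory shown is invented for illustration.

```python
from collections import defaultdict


def duplicate_router_ids(inventory):
    """Map each Router-ID that appears more than once to the hostnames using it."""
    hosts_by_id = defaultdict(list)
    for hostname, router_id in inventory.items():
        hosts_by_id[router_id].append(hostname)
    return {
        router_id: sorted(hosts)
        for router_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


# Hypothetical inventory: hostname -> Router-ID.
inventory = {
    "R1": "10.0.0.1",
    "R2": "10.0.0.2",
    "R3": "10.0.0.2",  # same Router-ID as R2
}

for router_id, hosts in duplicate_router_ids(inventory).items():
    print(f"Router-ID {router_id} is shared by: {', '.join(hosts)}")
```

The adv-rtr field in the console message is a sensible place to start, since it names the Router-ID that is being fought over.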
When we are first starting out with OSPF it can seem pretty intimidating, but keep in mind that it is a very organized and methodical protocol. Additionally, it provides us with so much information in the form of the OSPF states, console messages, and the Link State Database that it seems to almost troubleshoot itself. With OSPF, all we have to do is interpret the output and use it to make sure we can form the appropriate adjacencies between neighbors; once that is done, we want to make sure all of our routes make it to the routing table.