Foundation Topics

Resolving InterVLAN Routing Issues

As mentioned in Chapter 4, "Basic Cisco Catalyst Switch Troubleshooting," for traffic to pass from one VLAN to another VLAN, that traffic has to be routed. Several years ago, one popular approach to performing interVLAN routing with a Layer 2 switch was to create a router on a stick topology, where a Layer 2 switch is interconnected with a router via a trunk connection, as seen in Figure 5-1.

Figure 5-1 Router on a Stick

In Figure 5-1, router R1's Fast Ethernet 1/1/1 interface has two subinterfaces, one for each VLAN. Router R1 can route between VLANs 100 and 200, while simultaneously receiving and transmitting traffic over the trunk connection to the switch.

More recently, many switches have risen above their humble Layer 2 beginnings and started to route traffic. Some literature refers to these switches that can route as Layer 3 switches. Other sources might call such switches multilayer switches, because of the capability of a switch to make forwarding decisions based on information from multiple layers of the OSI model.

This section refers to these switches as Layer 3 switches because the focus is on the capability of the switches to route traffic based on Layer 3 information (that is, IP address information). Specifically, this section discusses troubleshooting Layer 3 switch issues and contrasts troubleshooting a Layer 3 switch versus a router.

Contrasting Layer 3 Switches with Routers

Because a Layer 3 switch performs many of the same functions as a router, it is important for a troubleshooter to distinguish between commonalities and differences in these two platforms.

Table 5-2 lists the characteristics that Layer 3 switches and routers have in common, as well as those characteristics that differ.

Table 5-2. Layer 3 Switch and Router Characteristics: Compare and Contrast

| Layer 3 Switch/Router Shared Characteristics | Layer 3 Switch/Router Differentiating Characteristics |
| --- | --- |
| Both can build and maintain a routing table using both statically configured routes and dynamic routing protocols. | Routers usually support a wider selection of interface types (for example, non-Ethernet interfaces). |
| Both can make packet forwarding decisions based on Layer 3 information (for example, IP addresses). | Switches leverage application-specific integrated circuits (ASIC) to approach wire speed throughput. Therefore, most Layer 3 switches can forward traffic faster than their router counterparts. |
| | A Cisco IOS version running on routers typically supports more features than a Cisco IOS version running on a Layer 3 switch, because many switches lack the specialized hardware required to run many of the features available on a router. |

Control Plane and Data Plane Troubleshooting

Many router and Layer 3 switch operations can be categorized as control plane or data plane operations. For example, routing protocols operate in a router's control plane, whereas the actual forwarding of data is handled by a router's data plane.

Fortunately, the processes involved in troubleshooting control plane operations are identical on both Layer 3 switch and router platforms. For example, the same command-line interface (CLI) commands could be used to troubleshoot an Open Shortest Path First (OSPF) issue on both types of platforms.

Data plane troubleshooting, however, can vary between Layer 3 switches and routers. For example, if you were troubleshooting data throughput issues, the commands you issued might vary between types of platforms, because Layer 3 switches and routers have fundamental differences in the way traffic is forwarded through the device.

First, consider how a router uses Cisco Express Forwarding (CEF) to forward traffic efficiently. CEF creates two tables that reside in the data plane: the forwarding information base (FIB) and the adjacency table. These tables are constructed from information collected from the router's control plane (for example, the control plane's IP routing table and Address Resolution Protocol [ARP] cache). When troubleshooting a router, you might check control plane operations with commands such as show ip route. However, if the observed traffic behavior seems to contradict information shown in the output of control plane verification commands, you might want to examine the router's CEF FIB and adjacency table. You can use the commands presented in Table 5-3 to view information contained in a router's FIB and adjacency table.

Table 5-3. Router Data Plane Verification Commands

| Command | Description |
| --- | --- |
| show ip cef | Displays the router's Layer 3 forwarding information, in addition to multicast, broadcast, and local IP addresses. |
| show adjacency | Verifies that a valid adjacency exists for a connected host. |

Example 5-1 and Example 5-2 provide sample output from the show ip cef and show adjacency commands, respectively.

Example 5-1. show ip cef Command Output

R4# show ip cef
Prefix              Next Hop             Interface
0.0.0.0/0           10.3.3.1             FastEthernet0/0
0.0.0.0/32          receive
10.1.1.0/24         10.3.3.1             FastEthernet0/0
10.1.1.2/32         10.3.3.1             FastEthernet0/0
10.3.3.0/24         attached             FastEthernet0/0
10.3.3.0/32         receive
10.3.3.1/32         10.3.3.1             FastEthernet0/0
10.3.3.2/32         receive
10.3.3.255/32       receive
10.4.4.0/24         10.3.3.1             FastEthernet0/0
10.5.5.0/24         10.3.3.1             FastEthernet0/0
10.7.7.0/24         10.3.3.1             FastEthernet0/0
10.7.7.2/32         10.3.3.1             FastEthernet0/0
10.8.8.0/24         attached             FastEthernet0/1
10.8.8.0/32         receive
10.8.8.1/32         receive
10.8.8.4/32         10.8.8.4             FastEthernet0/1
10.8.8.5/32         10.8.8.5             FastEthernet0/1
10.8.8.6/32         10.8.8.6             FastEthernet0/1
10.8.8.7/32         10.8.8.7             FastEthernet0/1
10.8.8.255/32       receive
192.168.0.0/24      10.3.3.1             FastEthernet0/0
224.0.0.0/4         drop
224.0.0.0/24        receive
255.255.255.255/32  receive
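
When reading a FIB such as the one in Example 5-1, it can help to remember that the entry used for a packet is the most specific (longest) prefix that contains the destination address. The following is a minimal Python sketch of that selection rule, not a model of how CEF is implemented. The FIB list holds only a subset of the entries from Example 5-1, and the lookup function name is an assumption made for illustration.

```python
import ipaddress

# A subset of the entries in Example 5-1: (prefix, next hop or action, interface).
FIB = [
    ("0.0.0.0/0", "10.3.3.1", "FastEthernet0/0"),
    ("10.1.1.0/24", "10.3.3.1", "FastEthernet0/0"),
    ("10.3.3.0/24", "attached", "FastEthernet0/0"),
    ("10.3.3.2/32", "receive", None),
    ("10.8.8.0/24", "attached", "FastEthernet0/1"),
    ("10.8.8.4/32", "10.8.8.4", "FastEthernet0/1"),
    ("224.0.0.0/4", "drop", None),
]


def lookup(destination):
    """Return the FIB entry with the longest prefix that contains destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop, interface in FIB:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop, interface))
    # The default route 0.0.0.0/0 matches every IPv4 address, so matches is never empty.
    network, next_hop, interface = max(matches, key=lambda entry: entry[0].prefixlen)
    return str(network), next_hop, interface


print(lookup("10.8.8.4"))   # host route on FastEthernet0/1
print(lookup("10.8.8.9"))   # falls back to the attached /24
print(lookup("172.16.9.9")) # only the default route matches
```

If the entry this rule selects differs from what show ip route suggests for the same destination, that discrepancy is a reason to look more closely at the data plane tables.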

Example 5-2. show adjacency Command Output

R4# show adjacency
Protocol  Interface                Address
IP        FastEthernet0/0          10.3.3.1(21)
IP        FastEthernet0/1          10.8.8.6(5)
IP        FastEthernet0/1          10.8.8.7(5)
IP        FastEthernet0/1          10.8.8.4(5)
IP        FastEthernet0/1          10.8.8.5(5)

Although many Layer 3 switches also leverage CEF to efficiently route packets, some Cisco Catalyst switches take the information contained in CEF's FIB and adjacency table and compile that information into Ternary Content Addressable Memory (TCAM). This special memory type uses a mathematical algorithm to very quickly look up forwarding information.

The specific way a switch's TCAM operates depends on the switch platform. However, from a troubleshooting perspective, you can examine information stored in a switch's TCAM using the show platform series of commands on Cisco Catalyst 3560, 3750, and 4500 switches. Similarly, TCAM information for a Cisco Catalyst 6500 switch can be viewed with the show mls cef series of commands.

Comparing Routed Switch Ports and Switched Virtual Interfaces

On a router, an interface often has an IP address, and that IP address might be acting as a default gateway to hosts residing off of that interface. However, if you have a Layer 3 switch with multiple ports belonging to a VLAN, where should the IP address be configured?

You can configure the IP address for a collection of ports belonging to a VLAN under a virtual VLAN interface. This virtual VLAN interface is called a Switched Virtual Interface (SVI). Figure 5-2 shows a topology using SVIs, and Example 5-3 shows the corresponding configuration. Notice that two SVIs are created: one for each VLAN (that is, VLAN 100 and VLAN 200). An IP address is assigned to an SVI by going into interface configuration mode for a VLAN. In this example, because both SVIs are local to the switch, the switch's routing table knows how to forward traffic between members of the two VLANs.

Figure 5-2 SVI Used for Routing

Example 5-3. SVI Configuration

Cat3550# show run
...OUTPUT OMITTED...
!
interface GigabitEthernet0/7
 switchport access vlan 100
 switchport mode access
!
interface GigabitEthernet0/8
 switchport access vlan 100
 switchport mode access
!
interface GigabitEthernet0/9
 switchport access vlan 200
 switchport mode access
!
interface GigabitEthernet0/10
 switchport access vlan 200
 switchport mode access
!
...OUTPUT OMITTED...
!
interface Vlan100
 ip address 192.168.1.1 255.255.255.0
!
interface Vlan200
 ip address 192.168.2.1 255.255.255.0

Although SVIs can route between VLANs configured on a switch, a Layer 3 switch can be configured to act more as a router (for example, in an environment where you are replacing a router with a Layer 3 switch) by using routed ports on the switch. Because the ports on many Cisco Catalyst switches default to operating as switch ports, you can issue the no switchport command in interface configuration mode to convert a switch port to a routed port. Figure 5-3 and Example 5-4 illustrate a Layer 3 switch with its Gigabit Ethernet 0/9 and 0/10 ports configured as routed ports.

Figure 5-3 Routed Ports on a Layer 3 Switch

Example 5-4. Configuration for Routed Ports on a Layer 3 Switch

Cat3550# show run
...OUTPUT OMITTED...
!
interface GigabitEthernet0/9
 no switchport
 ip address 192.168.1.2 255.255.255.0
!
interface GigabitEthernet0/10
 no switchport
 ip address 192.168.2.2 255.255.255.0
!
...OUTPUT OMITTED...

When troubleshooting Layer 3 switching issues, keep the following distinctions in mind between SVIs and routed ports:

  • A routed port is considered to be in the down state if it is not operational at both Layer 1 and Layer 2.
  • An SVI is considered to be in a down state only when none of the ports in the corresponding VLAN are active.
  • A routed port does not run switch port protocols such as Spanning Tree Protocol (STP) or Dynamic Trunking Protocol (DTP).

Router Redundancy Troubleshooting

Many devices, such as PCs, are configured with a default gateway. The default gateway parameter identifies the IP address of a next-hop router. As a result, if that router were to become unavailable, devices that relied on the default gateway's IP address would be unable to send traffic off their local subnet.

Fortunately, Cisco offers technologies that provide next-hop gateway redundancy. These technologies include HSRP, VRRP, and GLBP.

This section reviews the operation of these three first-hop redundancy protocols and provides a collection of Cisco IOS commands that can be used to troubleshoot an issue with one of these three protocols.

Note that although this section discusses router redundancy, keep in mind that the term router is referencing a device making forwarding decisions based on Layer 3 information. Therefore, in your environment, a Layer 3 switch might be used in place of a router to support HSRP, VRRP, or GLBP.

HSRP

Hot Standby Router Protocol (HSRP) uses virtual IP and MAC addresses. One router, known as the active router, services requests destined for the virtual IP and MAC addresses. Another router, known as the standby router, can service such requests in the event the active router becomes unavailable. Figure 5-4 illustrates a basic HSRP topology.

Examples 5-5 and 5-6 show the HSRP configuration for routers R1 and R2.

Figure 5-4 Basic HSRP Operation

Example 5-5. HSRP Configuration on Router R1

R1# show run
...OUTPUT OMITTED...
interface FastEthernet0/0
 ip address 172.16.1.1 255.255.255.0
 standby 10 ip 172.16.1.3
 standby 10 priority 150
 standby 10 preempt
...OUTPUT OMITTED...

Example 5-6. HSRP Configuration on Router R2

R2# show run
...OUTPUT OMITTED...
interface Ethernet0/0
 ip address 172.16.1.2 255.255.255.0
 standby 10 ip 172.16.1.3
...OUTPUT OMITTED...

Notice that both routers R1 and R2 have been configured with the same virtual IP address of 172.16.1.3 for an HSRP group of 10. Router R1 is configured to be the active router with the standby 10 priority 150 command. Router R2 has a default HSRP priority of 100 for group 10, and with HSRP, higher priority values are more preferable. Also, notice that router R1 is configured with the standby 10 preempt command, which means that if router R1 loses its active status, perhaps because it is powered off, it will regain its active status when it again becomes available.

Converging After a Router Failure

By default, HSRP sends hello messages every three seconds. Also, if the standby router does not hear a hello message within ten seconds by default, the standby router considers the active router to be down. The standby router then assumes the active role.

Although this ten-second convergence time applies for a router becoming unavailable for a reason such as a power outage or a link failure, convergence happens more rapidly if an interface is administratively shut down. Specifically, an active router sends a resign message if its active HSRP interface is shut down.

Also, consider the addition of another router to the network segment whose HSRP priority for group 10 is higher than 150. If it were configured for preemption, the newly added router would send a coup message, to inform the active router that the newly added router was going to take on the active role. If, however, the newly added router were not configured for preemption, the currently active router would remain the active router.
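
The interaction between priority and preemption described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of the HSRP state machine: the HsrpRouter class and the active_after_join function are hypothetical names, and the case of equal priorities is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class HsrpRouter:
    name: str
    priority: int = 100    # HSRP default priority
    preempt: bool = False  # preemption must be configured explicitly


def active_after_join(current_active, newcomer):
    """Return the router that is active after newcomer joins the group.

    A newcomer takes over (by sending a coup message) only if it has a
    higher priority AND is configured to preempt. Equal priorities are
    not modeled in this sketch.
    """
    if newcomer.preempt and newcomer.priority > current_active.priority:
        return newcomer
    return current_active


r1 = HsrpRouter("R1", priority=150, preempt=True)
r3_preempt = HsrpRouter("R3", priority=200, preempt=True)
r3_no_preempt = HsrpRouter("R3", priority=200, preempt=False)

print(active_after_join(r1, r3_preempt).name)     # R3
print(active_after_join(r1, r3_no_preempt).name)  # R1
```

The same rule explains why router R1 reclaims the active role from router R2 when it returns to service: R1 has the higher priority and is configured with the preempt option.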

HSRP Verification and Troubleshooting

When verifying an HSRP configuration or troubleshooting an HSRP issue, you should begin by determining the following information about the HSRP group under inspection:

  • Which router is the active router
  • Which routers, if any, are configured with the preempt option
  • What is the virtual IP address
  • What is the virtual MAC address

The show standby brief command can be used to show a router's HSRP interface, HSRP group number, and preemption configuration. Additionally, this command identifies the router that is currently the active router, the router that is currently the standby router, and the virtual IP address for the HSRP group. Examples 5-7 and 5-8 show the output from the show standby brief command issued on routers R1 and R2, where router R1 is currently the active router.

Example 5-7. show standby brief Command Output on Router R1

R1# show standby brief
                        P indicates configured to preempt.
                        |
Interface   Grp Prio P State    Active         Standby        Virtual IP
Fa0/0       10  150  P Active   local          172.16.1.2     172.16.1.3

Example 5-8. show standby brief Command Output on Router R2

R2# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active         Standby        Virtual IP
Et0/0       10  100    Standby  172.16.1.1     local          172.16.1.3
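
If you collect show standby brief output from several routers, a small script can pull out the fields listed above for comparison. The following Python sketch assumes the single-row layout shown in Examples 5-7 and 5-8, where the P flag is either present or absent; the function name parse_standby_brief_row is hypothetical, and other IOS versions might format this output differently.

```python
def parse_standby_brief_row(row):
    """Parse one data row of 'show standby brief' output into a dict."""
    fields = row.split()
    # The P column is blank when preemption is not configured, so a row
    # has eight fields with the flag and seven without it.
    preempt = len(fields) == 8 and fields[3] == "P"
    if preempt:
        del fields[3]
    interface, group, priority, state, active, standby, virtual_ip = fields
    return {
        "interface": interface,
        "group": int(group),
        "priority": int(priority),
        "preempt": preempt,
        "state": state,
        "active": active,
        "standby": standby,
        "virtual_ip": virtual_ip,
    }


r1_row = "Fa0/0       10  150  P Active   local          172.16.1.2     172.16.1.3"
r2_row = "Et0/0       10  100    Standby  172.16.1.1     local          172.16.1.3"

r1 = parse_standby_brief_row(r1_row)
r2 = parse_standby_brief_row(r2_row)

# Both routers should agree on the group number and the virtual IP address.
print(r1["group"] == r2["group"] and r1["virtual_ip"] == r2["virtual_ip"])
```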

In addition to an interface's HSRP group number, the interface's state, and the HSRP group's virtual IP address, the show standby interface_id command also displays the HSRP group's virtual MAC address. Issuing this command on router R1, as shown in Example 5-9, shows that the virtual MAC address for HSRP group 10 is 0000.0c07.ac0a.

Example 5-9. show standby fa 0/0 Command Output on Router R1

R1# show standby fa 0/0
FastEthernet0/0 - Group 10
  State is Active
    1 state change, last state change 01:20:00
  Virtual IP address is 172.16.1.3
  Active virtual MAC address is 0000.0c07.ac0a
    Local virtual MAC address is 0000.0c07.ac0a (v1 default)
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 1.044 secs
  Preemption enabled
  Active router is local
  Standby router is 172.16.1.2, priority 100 (expires in 8.321 sec)
  Priority 150 (configured 150)
  IP redundancy name is "hsrp-Fa0/0-10" (default)

The default virtual MAC address for an HSRP group, as seen in Figure 5-5, is based on the HSRP group number. Specifically, the virtual MAC address for an HSRP group begins with a vendor code of 0000.0c, followed with a well-known HSRP code of 07.ac. The last two hexadecimal digits are the hexadecimal representation of the HSRP group number. For example, an HSRP group of 10 yields a default virtual MAC address of 0000.0c07.ac0a, because 10 in decimal equates to 0a in hexadecimal.

Figure 5-5 HSRP Virtual MAC Address
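
The derivation of the default virtual MAC address can be expressed as a short calculation. The following Python sketch assumes the HSRP version 1 format described above (group numbers 0 through 255); the function names are hypothetical. The second helper strips separators so that a MAC address written in Cisco dotted notation can be compared with one written in the hyphenated notation a Windows client displays.

```python
def hsrp_v1_virtual_mac(group):
    """Return the default HSRP version 1 virtual MAC for a group number."""
    if not 0 <= group <= 255:
        raise ValueError("two hexadecimal digits can only encode groups 0-255")
    # Vendor code 0000.0c, HSRP code 07.ac, then the group number in hex.
    return "0000.0c07.ac{:02x}".format(group)


def normalize_mac(mac):
    """Drop separators and case so different MAC notations compare equal."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


router_mac = hsrp_v1_virtual_mac(10)
client_mac = "00-00-0c-07-ac-0a"  # hyphenated notation, as a Windows host shows it

print(router_mac)
print(normalize_mac(router_mac) == normalize_mac(client_mac))
```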

Once you know the current HSRP configuration, you might then check whether a host on the same subnet as the HSRP virtual IP address can ping that address. Based on the topology previously shown in Figure 5-4, Example 5-10 shows a successful ping from Workstation A.

Example 5-10. Ping Test from Workstation A to the HSRP Virtual IP Address

C:\>ping 172.16.1.3

Pinging 172.16.1.3 with 32 bytes of data:

Reply from 172.16.1.3: bytes=32 time=2ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255

Ping statistics for 172.16.1.3:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 2ms, Average = 1ms

A client can also be used to verify that the virtual MAC address it learned matches the virtual MAC address reported by the HSRP routers. Example 5-11 shows Workstation A's ARP cache entry for the HSRP virtual IP address of 172.16.1.3. Notice in the output that the MAC address learned via ARP matches the HSRP virtual MAC address reported by router R1 in Example 5-9.

Example 5-11. Workstation A's ARP Cache

C:\>arp -a


Interface: 172.16.1.4 --- 0x4
  Internet Address     Physical Address     Type
  172.16.1.3            00-00-0c-07-ac-0a     dynamic

You can use the debug standby terse command to view important HSRP changes, such as a state change. Example 5-12 shows this debug output on router R2 after router R1's Fast Ethernet 0/0 interface is shut down. Notice that router R2's state changes from Standby to Active.

Example 5-12. debug standby terse Command Output on Router R2: Changing HSRP to Active

R2#
*Mar  1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby: c/Active timer expired  (172.16.1.1)
*Mar  1 01:25:45.930: HSRP: Et0/0 Grp 10 Active router is local, was 172.16.1.1
*Mar  1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby router is unknown, was local
*Mar  1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby -> Active
*Mar  1 01:25:45.930: %HSRP-6-STATECHANGE: Ethernet0/0 Grp 10 state Standby ->  Active
*Mar  1 01:25:45.930: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Standby  -> Active
*Mar  1 01:25:48.935: HSRP: Et0/0 Grp 10 Redundancy group hsrp-Et0/0-10 state  Active -> Active
*Mar  1 01:25:51.936: HSRP: Et0/0 Grp 10 Redundancy group hsrp-Et0/0-10 state  Active -> Active

When router R1's Fast Ethernet 0/0 interface is administratively brought up, router R1 reassumes its previous role as the active HSRP router for HSRP group 10, because router R1 is configured with the preempt option. The output shown in Example 5-13 demonstrates how router R2 receives a coup message, letting router R2 know that router R1 is taking back its active role.

Example 5-13. debug standby terse Command Output on Router R2: Changing HSRP to Standby

R2#
*Mar  1 01:27:57.979: HSRP: Et0/0 Grp 10 Coup   in  172.16.1.1 Active  pri 150
  vIP 172.16.1.3
*Mar  1 01:27:57.979: HSRP: Et0/0 Grp 10 Active: j/Coup rcvd from higher pri
  router (150/172.16.1.1)
*Mar  1 01:27:57.979: HSRP: Et0/0 Grp 10 Active router is 172.16.1.1, was local
*Mar  1 01:27:57.979: HSRP: Et0/0 Grp 10 Active -> Speak
*Mar  1 01:27:57.979: %HSRP-6-STATECHANGE: Ethernet0/0 Grp 10 state Active -> Speak
*Mar  1 01:27:57.979: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Active  -> Speak
*Mar  1 01:28:07.979: HSRP: Et0/0 Grp 10 Speak: d/Standby timer expired (unknown)
*Mar  1 01:28:07.979: HSRP: Et0/0 Grp 10 Standby router is local
*Mar  1 01:28:07.979: HSRP: Et0/0 Grp 10 Speak -> Standby
*Mar  1 01:28:07.979: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Speak  -> Standby

VRRP

Virtual Router Redundancy Protocol (VRRP), similar to HSRP, allows a collection of routers to service traffic destined for a single IP address. Unlike HSRP, the IP address serviced by a VRRP group does not have to be a virtual IP address. The IP address can be the address of a physical interface on the virtual router master, which is the router responsible for forwarding traffic destined for the VRRP group's IP address. A VRRP group can have multiple routers acting as virtual router backups, as shown in Figure 5-6, any of which could take over in the event of the virtual router master becoming unavailable.

Figure 5-6 Basic VRRP Operation

GLBP

Gateway Load Balancing Protocol (GLBP) can load balance traffic destined for a next-hop gateway across a collection of routers, known as a GLBP group. Specifically, when a client sends an Address Resolution Protocol (ARP) request, in an attempt to determine the MAC address corresponding to a known IP address, GLBP can respond with the MAC address of one member of the GLBP group. The next such request would receive a response containing the MAC address of a different member of the GLBP group, as depicted in Figure 5-7. GLBP has one active virtual gateway (AVG), which is responsible for replying to ARP requests from hosts. However, multiple routers acting as active virtual forwarders (AVFs) can forward traffic.

Figure 5-7 Basic GLBP Operation
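
The way an AVG spreads clients across AVFs can be sketched as a simple rotation. This Python example is a conceptual illustration only, assuming that successive ARP requests are answered with each forwarder's MAC address in turn; the ActiveVirtualGateway class and the placeholder MAC labels are hypothetical and do not represent real GLBP addresses.

```python
from itertools import cycle


class ActiveVirtualGateway:
    """Answers ARP requests for the virtual IP, rotating through the AVFs."""

    def __init__(self, forwarder_macs):
        # cycle() repeats the list endlessly, giving a round-robin rotation.
        self._next_mac = cycle(forwarder_macs)

    def answer_arp(self):
        """Return the MAC address handed to the next client that sends an ARP request."""
        return next(self._next_mac)


# Placeholder labels stand in for the MAC addresses of three forwarders.
avg = ActiveVirtualGateway(["mac-of-R1", "mac-of-R2", "mac-of-R3"])
replies = [avg.answer_arp() for _ in range(4)]
print(replies)
```

Because each client caches the MAC address it received, different clients end up sending traffic to different routers even though all of them use the same gateway IP address.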

Troubleshooting VRRP and GLBP

Because VRRP and GLBP perform a similar function to HSRP, you can use a similar troubleshooting philosophy. Much like HSRP's show standby brief command, similar information can be gleaned for VRRP operation with the show vrrp brief command and for GLBP operation with the show glbp brief command.

Although HSRP, VRRP, and GLBP have commonalities, it is important for you as a troubleshooter to understand the differences. Table 5-4 compares several characteristics of these first-hop router redundancy protocols.

Table 5-4. Comparing HSRP, VRRP, and GLBP

| Characteristic | HSRP | VRRP | GLBP |
| --- | --- | --- | --- |
| Cisco proprietary | Yes | No | Yes |
| Interface IP address can act as virtual IP address | No | Yes | No |
| More than one router in a group can simultaneously forward traffic for that group | No | No | Yes |
| Hello timer default value | 3 seconds | 1 second | 3 seconds |

Cisco Catalyst Switch Performance Troubleshooting

Switch performance issues can be tricky to troubleshoot, because the problem reported is often subjective. For example, if a user reports that the network is running "slow," the user's perception might mean that the network is slow compared to what he expects. However, network performance might very well be operating at a level that is hampering productivity and at a level that is indeed below its normal level of operation. At that point, as part of the troubleshooting process, you need to determine what network component is responsible for the poor performance. Rather than a switch or a router, the user's client, server, or application could be the cause of the performance issue.

If you do determine that the network performance is not meeting technical expectations (as opposed to user expectations), you should isolate the source of the problem and diagnose the problem on that device. This section assumes that you have isolated the device causing the performance issue, and that device is a Cisco Catalyst switch.

Cisco Catalyst Switch Troubleshooting Targets

Cisco offers a variety of Catalyst switch platforms, with different port densities, different levels of performance, and different hardware. Therefore, troubleshooting one of these switches can be platform dependent. Many similarities do exist, however. For example, all Cisco Catalyst switches include the following hardware components:

  • Ports: A switch's ports physically connect the switch to other network devices. These ports (also known as interfaces) allow a switch to receive and transmit traffic.
  • Forwarding logic: A switch contains hardware that makes forwarding decisions. This hardware rewrites a frame's headers.
  • Backplane: A switch's backplane physically interconnects a switch's ports. Therefore, depending on the specific switch architecture, frames flowing through a switch enter via a port (that is, the ingress port), flow across the switch's backplane, and are forwarded out of another port (that is, an egress port).
  • Control plane: A switch's CPU and memory reside in a control plane. This control plane is responsible for running the switch's operating system.

Figure 5-8 depicts these switch hardware components. Notice that the control plane does not directly participate in frame forwarding. However, the forwarding logic contained in the forwarding hardware comes from the control plane. Therefore, there is an indirect relationship between frame forwarding and the control plane. As a result, a continuous load on the control plane could, over time, impact the rate at which the switch forwards frames. Also, if the forwarding hardware is operating at maximum capacity, the control plane begins to provide the forwarding logic. So, although the control plane does not architecturally appear to impact switch performance, it should be considered when troubleshooting.

Figure 5-8 Cisco Catalyst Switch Hardware Components

The following are two common troubleshooting targets to consider when diagnosing a suspected switch issue:

  • Port errors
  • Mismatched duplex settings

The sections that follow evaluate these target areas in greater detail.

Port Errors

When troubleshooting a suspected Cisco Catalyst switch issue, a good first step is to check port statistics. For example, examining port statistics can let a troubleshooter know if an excessive number of frames are being dropped. If a TCP application is running slow, the reason might be that TCP flows are going into TCP slow start, which causes the window size, and therefore the bandwidth efficiency, of TCP flows to be reduced. A common reason that a TCP flow enters slow start is packet drops. Similarly, packet drops for a UDP flow used for voice or video could result in noticeable quality degradation, because dropped UDP segments are not retransmitted.

Although dropped frames are most often attributed to network congestion, another possibility is that the cabling could be bad. To check port statistics, a troubleshooter could leverage a show interfaces command. Consider Example 5-14, which shows the output of the show interfaces gig 0/9 counters command on a Cisco Catalyst 3550 switch. Notice that this output shows the number of inbound and outbound frames seen on the specified port.

Example 5-14. show interfaces gig 0/9 counters Command Output

SW1# show interfaces gig 0/9 counters

Port            InOctets   InUcastPkts   InMcastPkts   InBcastPkts
Gi0/9           31265148         20003          3179             1

Port           OutOctets  OutUcastPkts  OutMcastPkts  OutBcastPkts
Gi0/9           18744149          9126            96             6

To view errors that occurred on a port, you could add the keyword of errors after the show interfaces interface_id counters command. Example 5-15 illustrates sample output from the show interfaces gig 0/9 counters errors command.

Example 5-15. show interfaces gig 0/9 counters errors Command Output

SW1# show interfaces gig 0/9 counters errors
Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err UnderSize
Gi0/9               0          0          0          0         0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen    Runts   Giants
Gi0/9           5603         0      5373          0         0        0        0

Table 5-5 provides a reference for the specific errors that might show up in the output of the show interfaces interface_id counters errors command.

Table 5-5. Errors in the show interfaces interface_id counters errors Command

Error Counter

Description

Align-Err

An alignment error occurs when frames do not end with an even number of octets, while simultaneously having a bad Cyclic Redundancy Check (CRC). An alignment error normally suggests a Layer 1 issue, such as cabling or port (either switch port or NIC port) issues.

FCS-Err

A Frame Check Sequence (FCS) error occurs when a frame has an invalid checksum, although the frame has no framing errors. Like the Align-Err error, an FCS-Err often points to a Layer 1 issue.

Xmit-Err

A transmit error (that is, Xmit-Err) occurs when a port's transmit buffer overflows. A speed mismatch between inbound and outbound links often results in a transmit error.

Rcv-Err

A receive error (that is, Rcv-Err) occurs when a port's receive buffer overflows. Congestion on a switch's backplane could cause the receive buffer on a port to fill to capacity, as frames await access to the switch's backplane. However, most likely, a Rcv-Err is indicating a duplex mismatch.

UnderSize

An undersize frame is a frame with a valid checksum but a size less than 64 bytes. This issue suggests that a connected host is sourcing invalid frame sizes.

Single-Col

A Single-Col error occurs when a single collision occurs before a port successfully transmits a frame. High bandwidth utilization on an attached link or a duplex mismatch are common reasons for a Single-Col error.

Multi-Col

A Multi-Col error occurs when more than one collision occurs before a port successfully transmits a frame. Similar to the Single-Col error, high bandwidth utilization on an attached link or a duplex mismatch are common reasons for a Multi-Col error.

Late-Col

A late collision is a collision that is not detected until well after the frame has begun to be forwarded. While a Late-Col error could indicate that the connected cable is too long, this is an extremely common error seen in mismatched duplex conditions.

Excess-Col

The Excess-Col error occurs when a frame experienced sixteen successive collisions, after which the frame was dropped. This error could result from high bandwidth utilization, a duplex mismatch, or too many devices on a segment.

Carri-Sen

The Carri-Sen counter is incremented when a port wants to send data on a half-duplex link. This is normal and expected on a half-duplex port, because the port is checking the wire, to make sure no traffic is present, prior to sending a frame. This operation is the carrier sense procedure described by the Carrier Sense Multiple Access with Collision Detect (CSMA/CD) operation used on half-duplex connections. Full-duplex connections, however, do not use CSMA/CD.

Runts

A runt is a frame that is less than 64 bytes in size and has a bad CRC. A runt could result from a duplex mismatch or a Layer 1 issue.

Giants

A giant is a frame size greater than 1518 bytes (assuming the frame is not a jumbo frame) that has a bad FCS. Typically, a giant is caused by a problem with the NIC in an attached host.

Mismatched Duplex Settings

As seen in Table 5-5, duplex mismatches can cause a wide variety of port errors. Keep in mind that almost all network devices, other than shared media hubs, can run in full-duplex mode. Therefore, if you have no hubs in your network, all devices should be running in full-duplex mode.

A new recommendation from Cisco is that switch ports be configured to autonegotiate both speed and duplex. Two justifications for this recommendation are as follows:

  • If a connected device supported only half-duplex, it would be better for a switch port to negotiate down to half-duplex and run properly than to be forced to run full-duplex, which would result in multiple errors.
  • The automatic medium-dependent interface crossover (auto-MDIX) feature can automatically detect if a port needs a crossover or a straight-through cable to interconnect with an attached device and adjust the port to work regardless of which cable type is connected. You can enable this feature in interface configuration mode with the mdix auto command on some models of Cisco Catalyst switches. However, the auto-MDIX feature requires that the port autonegotiate both speed and duplex.

In a mismatched duplex configuration, a switch port at one end of a connection is configured for full-duplex, whereas a switch port at the other end of a connection is configured for half-duplex. Among the different errors previously listed in Table 5-5, two of the biggest indicators of a duplex mismatch are a high Rcv-Err counter or a high Late-Col counter. Specifically, a high Rcv-Err counter is common to find on the full-duplex end of a connection with a mismatched duplex, while a high Late-Col counter is common on the half-duplex end of the connection.

To illustrate, examine Examples 5-16 and 5-17, which display output based on the topology depicted in Figure 5-9. Example 5-16 shows the half-duplex end of a connection, and Example 5-17 shows the full-duplex end of a connection.

Figure 5-9

Figure 5-9 Topology with Duplex Mismatch

Example 5-16. Output from the show interfaces gig 0/9 counters errors and the show interfaces gig 0/9 | include duplex Commands on a Half-Duplex Port

SW1# show interfaces gig 0/9 counters errors

Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err UnderSize
Gi0/9               0           0         0          0        0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen   Runts   Giants
Gi0/9           5603         0      5373          0         0       0        0
SW1# show interfaces gig 0/9 | include duplex
  Half-duplex, 100Mb/s, link type is auto, media type is 10/100/1000BaseTX

Example 5-17. Output from the show interfaces fa 5/47 counters errors and the show interfaces fa 5/47 | include duplex Commands on a Full-Duplex Port

SW2# show interfaces fa 5/47 counters errors


Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err UnderSize OutDiscards
Fa5/47              0       5248          0       5603        27           0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen     Runts    Giants
Fa5/47             0         0         0          0         0       227         0

Port       SQETest-Err Deferred-Tx IntMacTx-Err IntMacRx-Err Symbol-Err
Fa5/47               0           0            0            0          0

SW2# show interfaces fa 5/47 | include duplex
  Full-duplex, 100Mb/s

In your troubleshooting, if you suspect a duplex mismatch but have access to only one of the switches, you could change the duplex settings on the switch you do control. Then, you could clear the interface counters to see if the errors continue to increment. You could also repeat the activity (for example, a file transfer) that the user was performing when the performance issue was noticed. By comparing the current performance to the performance experienced by the user, you might be able to conclude that the problem has been resolved by correcting a mismatched duplex configuration.
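When many ports need to be checked, the counter comparison described in this section can be scripted. The following Python sketch parses the error-counter tables copied from Examples 5-16 and 5-17 and applies the rule of thumb given earlier (high Late-Col suggests the half-duplex end, high Rcv-Err suggests the full-duplex end). The function names and the threshold of 100 are illustrative assumptions, not values from Cisco documentation.

```python
# Error-counter tables copied from Examples 5-16 and 5-17.
SW1_OUTPUT = """\
Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err UnderSize
Gi0/9               0           0         0          0        0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen   Runts   Giants
Gi0/9           5603         0      5373          0         0       0        0
"""

SW2_OUTPUT = """\
Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err UnderSize OutDiscards
Fa5/47              0       5248          0       5603        27           0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen     Runts    Giants
Fa5/47             0         0         0          0         0       227         0
"""


def parse_error_counters(text):
    """Return {port: {counter_name: value}} from 'counters errors' tables."""
    counters = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Port":
            # A header row names the counters for the data rows that follow.
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            values = fields[1:]
            if all(v.isdigit() for v in values):
                counters.setdefault(fields[0], {}).update(
                    zip(header, map(int, values)))
    return counters


def duplex_mismatch_hint(port_counters, threshold=100):
    """Apply the Late-Col / Rcv-Err rule of thumb (threshold is arbitrary)."""
    if port_counters.get("Late-Col", 0) >= threshold:
        return "half-duplex end"
    if port_counters.get("Rcv-Err", 0) >= threshold:
        return "full-duplex end"
    return "no strong indication"
```

Passing the parsed Gi0/9 counters to `duplex_mismatch_hint` flags that port as the likely half-duplex end, and the Fa5/47 counters flag the likely full-duplex end. A hint of this kind only narrows the search; the duplex setting itself should still be confirmed on the interface.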

TCAM Troubleshooting

As previously mentioned, the two primary components of forwarding hardware are forwarding logic and backplane. A switch's backplane is rarely the cause of a switch performance issue, because most Cisco Catalyst switches have high-capacity backplanes. It is conceivable, however, that in a modular switch chassis, the backplane will not have the throughput to support a fully populated chassis, where each card supports the highest combination of port densities and port speeds.

The architecture of some switches allows groups of switch ports to be handled by separate hardware. Therefore, you might experience a performance gain by simply moving a cable from one switch port to another. However, to strategically take advantage of this design characteristic, you must be very familiar with the architecture of the switch with which you are working.

A multilayer switch's forwarding logic can impact switch performance. Recall that a switch's forwarding logic is compiled into a special type of memory called ternary content addressable memory (TCAM), as illustrated in Figure 5-10. TCAM works with a switch's CEF feature to provide extremely fast forwarding decisions. However, if a switch's TCAM is unable, for whatever reason, to forward traffic, that traffic is forwarded by the switch's CPU, which has a limited forwarding capability.

Figure 5-10

Figure 5-10 Populating the TCAM

The process of the TCAM sending packets to a switch's CPU is called punting. Consider a few reasons why a packet might be punted from a TCAM to its CPU:

  • Routing protocols, in addition to other control plane protocols such as STP, that send multicast or broadcast traffic will have that traffic sent to the CPU.
  • Someone connecting to a switch administratively (for example, establishing a Telnet session with the switch) will have their packets sent to the CPU.
  • Packets using a feature not supported in hardware (for example, packets traveling over a GRE tunnel) are sent to the CPU.
  • If a switch's TCAM has reached capacity, additional packets will be punted to the CPU. A TCAM might reach capacity if it has too many installed routes or configured access control lists.

From the events listed, the event most likely to cause a switch performance issue is a TCAM filling to capacity. Therefore, when troubleshooting switch performance, you might want to investigate the state of the switch's TCAM. Please be sure to check documentation for your switch model, because TCAM verification commands can vary between platforms.

As an example, the Cisco Catalyst 3550 Series switch supports a collection of show tcam commands, whereas Cisco Catalyst 3560 and 3750 Series switches support a series of show platform tcam commands. Consider the output from the show tcam inacl 1 statistics command issued on a Cisco Catalyst 3550 switch, as shown in Example 5-18. The number 1 indicates TCAM number one, because the Cisco Catalyst 3550 has three TCAMs. The inacl refers to access control lists applied in the ingress direction. Notice that fourteen masks are allocated, while 402 are available. Similarly, seventeen entries are currently allocated, and 3311 are available. Therefore, you could conclude from this output that TCAM number one is not approaching capacity.

Example 5-18. show tcam inacl 1 statistics Command Output on a Cisco Catalyst 3550 Series Switch

Cat3550# show tcam inacl 1 statistics
Ingress ACL TCAM#1: Number of active labels: 3
Ingress ACL TCAM#1: Number of masks   allocated:   14, available:  402
Ingress ACL TCAM#1: Number of entries allocated:   17, available: 3311
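To turn the allocated and available counts from Example 5-18 into a utilization percentage, a short Python sketch such as the following could be used. It assumes that total capacity equals allocated plus available, which is an interpretation of the output rather than a documented formula, and the function name is hypothetical.

```python
import re

# Output copied from Example 5-18.
SAMPLE = """\
Ingress ACL TCAM#1: Number of active labels: 3
Ingress ACL TCAM#1: Number of masks   allocated:   14, available:  402
Ingress ACL TCAM#1: Number of entries allocated:   17, available: 3311
"""

# Matches lines of the form "Number of <resource> allocated: N, available: M".
PATTERN = re.compile(
    r"Number of (\w+)\s+allocated:\s*(\d+), available:\s*(\d+)")


def tcam_utilization(text):
    """Return {resource: percent used}, assuming total = allocated + available."""
    usage = {}
    for resource, allocated, available in PATTERN.findall(text):
        allocated, available = int(allocated), int(available)
        usage[resource] = 100.0 * allocated / (allocated + available)
    return usage
```

Applied to the sample output, this gives roughly 3.4 percent of masks and 0.5 percent of entries in use, which agrees with the conclusion that TCAM number one is not approaching capacity.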

On some switch models (for example, a Cisco Catalyst 3750 platform), you can use the show platform ip unicast counts command to see if a TCAM allocation has failed. Similarly, you can use the show controllers cpu-interface command to display a count of packets being forwarded to a switch's CPU.

On most switch platforms, TCAMs cannot be upgraded. Therefore, if you conclude that a switch's TCAM is the source of the performance problems being reported, you could either use a switch with higher-capacity TCAMs or reduce the number of entries in a switch's TCAM. For example, you could try to optimize your access control lists or leverage route summarization to reduce the number of route entries maintained by a switch's TCAM. Also, some switches (for example, Cisco Catalyst 3560 or 3750 Series switches) enable you to change the amount of TCAM memory allocated to different switch features. For example, if your switch ports were configured as routing ports, you could reduce the amount of TCAM space used for storing MAC addresses, and instead use that TCAM space for Layer 3 processes.

High CPU Utilization Level Troubleshooting

The load on a switch's CPU is often low, even when the switch is forwarding a high volume of traffic, thanks to the TCAM. Because the TCAM maintains a switch's forwarding logic, the CPU is rarely tasked to forward traffic. The show processes cpu command, which you learned earlier for use on a router, can also be used on a Cisco Catalyst switch to display CPU utilization levels, as demonstrated in Example 5-19.

Example 5-19. show processes cpu Command Output on a Cisco Catalyst 3550 Series Switch

Cat3550# show processes cpu
CPU utilization for five seconds: 19%/15%; one minute: 20%; five minutes: 13%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         4          0  0.00%  0.00%  0.00%   0 Chunk Manager
   2           0       610          0  0.00%  0.00%  0.00%   0 Load Meter
   3         128         5      25600  0.00%  0.00%  0.00%   0 crypto sw pk pro
   4        2100       315       6666  0.00%  0.05%  0.05%   0 Check heaps
...OUTPUT OMITTED...

Notice in the output in Example 5-19 that the switch is reporting a 19 percent CPU load, with 15 percent of the CPU load used for interrupt processing. The difference between these two numbers is 4, suggesting that 4 percent of the CPU load is consumed with control plane processing.

Although such load utilization values might not be unusual for a router, these values might be of concern for a switch. Specifically, a typical CPU load percentage dedicated to interrupt processing is no more than five percent. A value as high as ten percent is considered acceptable. However, the output given in Example 5-19 shows a fifteen percent utilization. Such a high level implies that the switch's CPU is actively involved in forwarding packets that should normally be handled by the switch's TCAM. Of course, this value might only be of major concern if it varies from baseline information. Therefore, your troubleshooting efforts benefit from having good baseline information.

Periodic spikes in processor utilization are also not a major cause for concern if such spikes can be explained. Consider the following reasons that might cause a switch's CPU utilization to spike:

  • The CPU processing routing updates
  • Issuing a debug command (or other processor-intensive commands)
  • Simple Network Management Protocol (SNMP) being used to poll network devices

If you determine that a switch's high CPU load is primarily the result of interrupts, you should examine the switch's packet switching patterns and check the TCAM utilization. If, however, the high CPU utilization is primarily the result of processes, you should investigate those specific processes.
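The arithmetic behind this interrupt-versus-process comparison can be sketched in Python. The example below parses the summary line from Example 5-19 and applies the five percent and ten percent guidelines mentioned earlier; the function names and the returned labels are hypothetical and chosen only for illustration.

```python
import re

# Summary line copied from Example 5-19.
CPU_LINE = ("CPU utilization for five seconds: 19%/15%; "
            "one minute: 20%; five minutes: 13%")


def split_cpu_load(line):
    """Return (total, interrupt, process) percentages for the 5-second window."""
    match = re.search(r"five seconds: (\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("not a 'show processes cpu' summary line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Whatever is not interrupt-level load is process (control plane) load.
    return total, interrupt, total - interrupt


def assess_interrupt_load(interrupt_pct):
    """Classify interrupt load using the 5 and 10 percent guidelines."""
    if interrupt_pct <= 5:
        return "typical"
    if interrupt_pct <= 10:
        return "acceptable"
    return "investigate"
```

For the sample line, the split is 19 percent total, 15 percent interrupt, and 4 percent process, and the interrupt figure falls in the "investigate" range. As noted above, such a result matters most when it departs from your baseline.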

A high CPU utilization on a switch might be a result of STP. Recall that an STP failure could lead to a broadcast storm, where Layer 2 broadcast frames endlessly circulate through a network. Therefore, when troubleshooting a performance issue, realize that a switch's high CPU utilization might be a symptom of another issue.

Trouble Ticket: HSRP

This trouble ticket focuses on HSRP. HSRP was one of three first-hop redundancy protocols discussed in this chapter's "Router Redundancy Troubleshooting" section.

Trouble Ticket #2

You receive the following trouble ticket:

  • A new network technician configured HSRP on routers BB1 and BB2, where BB1 was the active router. The configuration was initially working; however, now BB2 is acting as the active router, even though BB1 seems to be operational.

This trouble ticket references the topology shown in Figure 5-11.

Figure 5-11

Figure 5-11 Trouble Ticket #2 Topology

As you investigate this issue, you examine baseline data collected after HSRP was initially configured. Examples 5-20 and 5-21 provide show and debug command output collected when HSRP was working properly. Notice that router BB1 was acting as the active HSRP router, whereas router BB2 was acting as the standby HSRP router.

Example 5-20. Baseline Output for Router BB1

BB1# show standby brief
                        P indicates configured to preempt.
                        |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   150    Active   local           172.16.1.3      172.16.1.4
BB1# debug standby
HSRP debugging is on
*Mar  1 01:14:21.487: HSRP: Fa0/1 Grp 1 Hello  in  172.16.1.3 Standby pri 100 vIP  172.16.1.4
*Mar  1 01:14:23.371: HSRP: Fa0/1 Grp 1 Hello  out 172.16.1.1 Active  pri 150 vIP  172.16.1.4
BB1# u all
All possible debugging has been turned off

BB1# show standby fa 0/1 1
FastEthernet0/1 - Group 1
  State is Active
    10 state changes, last state change 00:12:40
  Virtual IP address is 172.16.1.4
  Active virtual MAC address is 0000.0c07.ac01
    Local virtual MAC address is 0000.0c07.ac01 (v1 default)
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 1.536 secs
  Preemption disabled
  Active router is local
  Standby router is 172.16.1.3, priority 100 (expires in 9.684 sec)
  Priority 150 (configured 150)
  IP redundancy name is "hsrp-Fa0/1-1" (default)

BB1# show run
...OUTPUT OMITTED...
hostname BB1
!
interface Loopback0
 ip address 10.3.3.3 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.2.1 255.255.255.0
!
interface FastEthernet0/1
 ip address 172.16.1.1 255.255.255.0
 standby 1 ip 172.16.1.4
 standby 1 priority 150
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

Example 5-21. Baseline Output for Router BB2

BB2# show standby brief
                        P indicates configured to preempt.
                        |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   100    Standby  172.16.1.1      local           172.16.1.4

BB2# show run
...OUTPUT OMITTED...
hostname BB2
!
interface Loopback0
 ip address 10.4.4.4 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.2.2 255.255.255.0
!
interface FastEthernet0/1
 ip address 172.16.1.3 255.255.255.0
 standby 1 ip 172.16.1.4
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

As part of testing the initial configuration, a ping was sent to the virtual IP address of 172.16.1.4 from router R2 in order to confirm that HSRP was servicing requests for that IP address. Example 5-22 shows the output from the ping command.

Example 5-22. Pinging the Virtual IP Address from Router R2

R2# ping 172.16.1.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.4, timeout is 2 seconds:
!!!!!

As you begin to gather information about the reported problem, you reissue the show standby brief command on routers BB1 and BB2. As seen in Examples 5-23 and 5-24, router BB1 is administratively up with an HSRP priority of 150, whereas router BB2 is administratively up with a priority of 100.

Example 5-23. Examining the HSRP State of Router BB1's FastEthernet 0/1 Interface

BB1# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active         Standby        Virtual IP
Fa0/1       1   150    Standby  172.16.1.3     local          172.16.1.4

Example 5-24. Examining the HSRP State of Router BB2's FastEthernet 0/1 Interface

BB2# show standby brief
                        P indicates configured to preempt.
                        |
Interface   Grp Prio P State    Active         Standby        Virtual IP
Fa0/1       1   100    Active   local          172.16.1.1     172.16.1.4

Take a moment to look through the baseline information, the topology, and the show command output. Then, hypothesize the underlying cause, explaining why router BB2 is currently the active HSRP router, even though router BB1 has a higher priority. Finally, on a separate sheet of paper, write out a proposed action plan for resolving the reported issue.

Suggested Solution

Upon examination of BB1's output, it becomes clear that the preempt feature is not enabled for the Fast Ethernet 0/1 interface on BB1. The absence of the preempt feature explains the reported symptom. Specifically, if BB1 had at one point been the active HSRP router for HSRP group 1, and either router BB1 or its Fast Ethernet 0/1 interface became unavailable, BB2 would have become the active router. Then, when BB1 or its Fast Ethernet 0/1 interface once again became available, BB1 would assume a standby HSRP role despite its higher priority, because BB1's Fast Ethernet 0/1 interface was not configured for the preempt feature.
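The sequence of events just described can be replayed with a deliberately simplified Python model of the active-router election. This is a sketch for reasoning about preemption only: it ignores HSRP timers, hello messages, and the IP address tie-breaker, it assumes the highest-priority router starts as active, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int
    preempt: bool = False
    up: bool = True


def elect(routers):
    """Pick the highest-priority router that is up, or None if all are down."""
    candidates = [r for r in routers if r.up]
    return max(candidates, key=lambda r: r.priority, default=None)


def replay(routers, events):
    """Apply ("down" | "up", name) events and return the final active router."""
    by_name = {r.name: r for r in routers}
    active = elect(routers)
    for action, name in events:
        router = by_name[name]
        if action == "down":
            router.up = False
            if active is router:
                # The active router failed, so a remaining router takes over.
                active = elect(routers)
        elif action == "up":
            router.up = True
            if active is None:
                active = router
            elif router.preempt and router.priority > active.priority:
                # Only a router configured to preempt reclaims the active role.
                active = router
    return active.name if active else None
```

Replaying a failure and recovery of BB1, that is, `[("down", "BB1"), ("up", "BB1")]`, with priorities 150 and 100 leaves BB2 active when BB1 lacks preempt and returns BB1 to the active role when BB1 is created with `preempt=True`.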

To resolve this configuration issue, the preempt feature is added to BB1's Fast Ethernet 0/1 interface, as shown in Example 5-25. After enabling the preempt feature, notice that router BB1 regains its active HSRP role.

Example 5-25. Enabling the Preempt Feature on Router BB1's FastEthernet 0/1 Interface

BB1# conf term
Enter configuration commands, one per line.  End with CNTL/Z.
BB1(config)#int fa 0/1
BB1(config-if)#standby 1 preempt
BB1(config-if)#end
BB1#
*Mar  1 01:17:39.607: %HSRP-5-STATECHANGE: FastEthernet0/1 Grp 1 state Standby ->
  Active

BB1#show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   150  P Active   local           172.16.1.3      172.16.1.4