Re: High CPU load on 7609 from Radioactive Frog on 2011-11-02 (Ccielab archives 11/2011)

From: Radioactive Frog <pbhatkoti_at_gmail.com>
Date: Wed, 2 Nov 2011 22:42:01 +1100

Nice command, Pavel. I had never used it before.
remote command show tech and remote command show log are two other
important commands.

On Mon, Oct 31, 2011 at 10:27 PM, Pavel Bykov <slidersv_at_gmail.com> wrote:

> Hi Frog.
> If the CPU load is due to interrupts, there are no counters that tell you
> what exactly is causing the high CPU load.
> "show proc cpu" only lists which processes are taking CPU time, and then it
> shows what percentage of the overall load is the interrupt-level load.
> Interrupt-level load comes from working on packets - so it's not a process
> that takes care of it, but a packet itself interrupts the CPU and requires
> CPU cycles to go through forwarding (and other) tables.
> Therefore there is no counter for that - only a total load.
> 75%/73% in "show proc cpu" means that the huge list of all the processes
> actually take up only 2% of the CPU (75%-73%), but the rest of the load is
> generated by the packets.
> To see what is causing the load, you need to create a span session and
> have "source" set as RP cpu.
>
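The 75%/73% arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name split_cpu_load and the regular expression are my own assumptions, not anything IOS provides, and it only looks at the five-second field of the "CPU utilization" line.

```python
import re

def split_cpu_load(line):
    """Split an IOS 'CPU utilization' line into total, interrupt and process load.

    Hypothetical helper: it assumes the 'five seconds: X%/Y%' layout seen in
    the output posted in this thread, where X is the total load and Y is the
    interrupt-level share of it.
    """
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("no 'five seconds: X%/Y%' field found")
    total = int(match.group(1))
    interrupt = int(match.group(2))
    # Whatever is not interrupt-level load is spent in scheduled processes.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

line = "CPU utilization for five seconds: 75%/73%; one minute: 77%; five minutes: 77%"
print(split_cpu_load(line))
```

If the process share comes out small while the total is high, as it does here, the process list in "show proc cpu" will not explain the load, and the packets reaching the CPU are the place to look.
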
> Also, don't forget that these platforms have two CPUs - SP and RP. To see
> if your SP is overloaded, issue:
>
> "remote command switch show proc cpu"
>
> To troubleshoot what is causing interrupt load of the SP cpu, set your
> span source to SP cpu.
>
> On Mon, Oct 31, 2011 at 10:49 AM, Radioactive Frog <pbhatkoti_at_gmail.com> wrote:
>
>> So one thing that I noticed during the high CPU usage: 'show proc cpu'
>> didn't show what was causing the CPU spike. Is there any way to see it?
>>
>> On Mon, Oct 31, 2011 at 8:21 PM, Pavel Bykov <slidersv_at_gmail.com> wrote:
>>
>>> Hi.
>>> PBR is supported in hardware on the 7600/6500, but only with a very
>>> specific configuration. Any other configuration that is not stated in the
>>> documentation will result in CPU punting, as was mentioned.
>>> It is really important to make sure that you use the hardware as much as
>>> possible on those boxes, because it is easy to get carried away, think
>>> of the platform as too versatile, and overload your weak 600MHz SR71000 CPU.
>>>
>>> In this case it was pretty straightforward - you knew what the cause
>>> of the problem was.
>>>
>>> In cases where it's not that straightforward, you can realize that it's
>>> not the process-level load that is the problem, but the interrupt-level load
>>> (73% of CPU load in your "sh proc cpu" output came from interrupts).
>>> Interrupt-level CPU load is caused by security breaches, malfunctions
>>> (e.g. a bad IP checksum from another device), or configuration issues.
>>> To protect the CPU from excessive load, you can use control plane
>>> protection in the form of 10 available hardware policers, which protect the 1G
>>> pipes leading to the RP and SP CPUs. Software COPP does not really do the
>>> trick in this case, as it is more for process-level intensive operations
>>> (e.g. good for SNMP, BGP, TELNET, FTP etc). But for floods, COPP will put
>>> as much load on the CPU (as it is a software policer) as dispatching the packet
>>> - which defeats the purpose of COPP.
>>>
>>> In any case, if you really want to know what your box is using the CPU
>>> for, you can easily SPAN the pipe to the RP and SP CPU in-band, so you'll know
>>> what exactly your CPU is working on, and based on that information decide
>>> how you can protect it.
>>>
>>> What you did in the end to reduce the CPU workload doesn't seem bad, so
>>> I'm not sure why you're disappointed. The 6500/7600 is not a software platform,
>>> so the functionality is fairly limited to the PFC hardware capabilities. As I
>>> said, PBR is possible in hardware on these platforms, but only with a very
>>> specific command set.
>>>
>>> P.S.: CPU on 6500/7600 will never be able to handle more than 1G of
>>> packets, regardless of optimizations, as that is the speed of CPU interface
>>> on PORT-ASIC.
>>>
>>> On Thu, Oct 27, 2011 at 7:36 PM, Sajjad Najafizadeh <najafizadeh_at_gmail.com> wrote:
>>>
>>>> Friends,
>>>>
>>>> I have traffic from another network that needs to be sent to a packet
>>>> analyzer (about 2 Gbps); the packet analyzer then sends it back to the same
>>>> router (7609). First I used PBR, but the CPU load went very high.
>>>> I thought VRF might help, so I changed PBR to VRF: same issue, 80%+ CPU
>>>> load. I dropped the traffic down to 1 Gbps and downgraded the IOS from 12.2(33)
>>>> to 12.2(18): no luck, same thing ...
>>>> The only idea that worked for me was to remove the VRF, put the IP of
>>>> CRF to the packet analyzer, and make an L2 link from the other network to the
>>>> packet analyzer through the 7600. It reduced the CPU load from 80%+ to 30-40%.
>>>>
>>>> really disappointed ...
>>>>
>>>> Regards
>>>>
>>>> On Thu, Oct 27, 2011 at 8:52 PM, Yuri Bank <yuribank_at_gmail.com> wrote:
>>>>
>>>> > So is this a result of using PBR in VRFs? Or will using PBR, period,
>>>> > cause all packets (on that interface) to hit the CPU?
>>>> >
>>>> > On Thu, Oct 27, 2011 at 5:16 AM, Radioactive Frog <pbhatkoti_at_gmail.com> wrote:
>>>> >
>>>> >> >>>I can't believe, given the state of the world, that Cisco is still selling
>>>> >> products where a feature that can be configured can slow down the
>>>> >> system that much.
>>>> >>
>>>> >> Totally agreed, Joseph.
>>>> >> I'd expect it to at least be captured somewhere in the 'show proc cpu' command
>>>> >> output, so that you can see what is causing it. Once it's PBR'd (L3), all
>>>> >> packets are going to be punted to the CPU.
>>>> >> Maybe it's documented somewhere and we don't know that secret location on
>>>> >> CCO yet :)
>>>> >>
>>>> >> Sajjad -
>>>> >> >>>The only workaround that I could think of was to make it L2 toward the
>>>> >> next-hop and remove the VRF from the configuration.
>>>> >>
>>>> >> Do you mean you turned it into an L2 switch? If so, what a waste of a 760X! :(
>>>> >>
>>>> >>
>>>> >> On Thu, Oct 27, 2011 at 1:39 AM, Joseph L. Brunner <joe_at_affirmedsystems.com> wrote:
>>>> >>
>>>> >> > Very good to know that Sajjad.
>>>> >> >
>>>> >> > Thanks for posting back what did it.
>>>> >> >
>>>> >> > I can't believe, given the state of the world, that Cisco is still selling
>>>> >> > products where a feature that can be configured can slow down the
>>>> >> > system that much.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > -----Original Message-----
>>>> >> > From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of
>>>> >> > Sajjad Najafizadeh
>>>> >> > Sent: Wednesday, October 26, 2011 10:08 AM
>>>> >> > To: Radioactive Frog
>>>> >> > Cc: Cisco certification
>>>> >> > Subject: Re: High CPU load on 7609
>>>> >> >
>>>> >> > Hi all
>>>> >> >
>>>> >> > First of all, I've changed the IOS to 12.2(18), but the issue still exists.
>>>> >> > I used VRF-lite to send traffic to the next hop, as PBR killed the CPU before.
>>>> >> > The only workaround that I could think of was to make it L2 toward the next-hop
>>>> >> > and remove the VRF from the configuration.
>>>> >> > The issue was solved with this.
>>>> >> > I can't believe a 7600 router cannot handle VRF and BGP with some PBR at the
>>>> >> > same time, with max traffic of 4 Gbps.
>>>> >> >
>>>> >> > Thanks again to all for support.
>>>> >> >
>>>> >> > Regards
>>>> >> >
>>>> >> > On Wed, Oct 26, 2011 at 1:21 PM, Radioactive Frog <pbhatkoti_at_gmail.com> wrote:
>>>> >> >
>>>> >> > > Last week I had the same issue on a 6509. The weird thing is that nothing
>>>> >> > > was shown in the 'show proc cpu sorted' output.
>>>> >> > >
>>>> >> > > The root cause of my issue was that someone had added a route-map (matching
>>>> >> > > an ACL and setting the next hop). There were about 8000+ users! University
>>>> >> > > environment. The core 6509 was running like a dog!
>>>> >> > >
>>>> >> > > The fix: implement VRFs. After removing the route-maps, CPU was back to a
>>>> >> > > normal 40-55% (it was constantly at 95-100% for 10 days).
>>>> >> > >
>>>> >> > >
>>>> >> > > HTH
>>>> >> > >
>>>> >> > > Frog
>>>> >> > >
>>>> >> > >
>>>> >> > > On Wed, Oct 26, 2011 at 6:39 PM, Sajjad Najafizadeh <najafizadeh_at_gmail.com> wrote:
>>>> >> > >
>>>> >> > >> Hi all
>>>> >> > >>
>>>> >> > >> We have high CPU load on a 7609 router.
>>>> >> > >> There is no ip policy and no NAT on this router, but here is the CPU load.
>>>> >> > >> Could anyone suggest what to do?
>>>> >> > >>
>>>> >> > >> *Output of sho ip proc cpu sorted :*
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> CPU utilization for five seconds: 75%/73%; one minute: 77%; five minutes: 77%
>>>> >> > >>  PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
>>>> >> > >>  220      949800   3116922    304  1.59%  1.58%  1.43%   0 IP Input
>>>> >> > >>  256        2544   3638244      0  0.15%  0.16%  0.15%   0 Ethernet Msec Ti
>>>> >> > >>   83       80164     61809   1296  0.15%  0.02%  0.05%   2 Virtual Exec
>>>> >> > >>    2       30380      5976   5083  0.07%  0.05%  0.06%   0 Load Meter
>>>> >> > >>  219        1132    916608      1  0.07%  0.03%  0.02%   0 IP ARP Retry Age
>>>> >> > >>  372         576     43514     13  0.07%  0.00%  0.00%   0 FM core
>>>> >> > >>  162         632     29926     21  0.07%  0.02%  0.02%   0 Per-Second Jobs
>>>> >> > >>  191          52     29866      1  0.07%  0.00%  0.00%   0 CWAN CHOCX PROCE
>>>> >> > >>  260        1164    916616      1  0.07%  0.04%  0.05%   0 IPAM Manager
>>>> >> > >>   27        1120     29181     38  0.07%  0.02%  0.02%   0 IPC Periodic Tim
>>>> >> > >>  326        1292    126588     10  0.07%  0.03%  0.02%   0 TCP Timer
>>>> >> > >>  555       89680    494908    181  0.07%  0.11%  0.11%   0 SNMP ENGINE
>>>> >> > >>  379          72     29805      2  0.07%  0.00%  0.00%   0 PfR BR Learn
>>>> >> > >>   15           0         2      0  0.00%  0.00%  0.00%   0 ATM Idle Timer
>>>> >> > >>   14         316     31552     10  0.00%  0.00%  0.00%   0 ARP Background
>>>> >> > >>   17           0         1      0  0.00%  0.00%  0.00%   0 AAA_SERVER_DEADT
>>>> >> > >>   13       25684     36715    699  0.00%  0.04%  0.05%   0 ARP Input
>>>> >> > >>   12        3748     30473    122  0.00%  0.00%  0.00%   0 WATCH_AFS
>>>> >> > >>   16           0         1      0  0.00%  0.00%  0.00%   0 ATM ASYNC PROC
>>>> >> > >>   18           0         1      0  0.00%  0.00%  0.00%   0 Policy Manager
>>>> >> > >>   22          12      6010      1  0.00%  0.00%  0.00%   0 IPC Event Notifi
>>>> >> > >>   23          64     29182      2  0.00%  0.00%  0.00%   0 IPC Mcast Pendin
>>>> >> > >>   24           0       500      0  0.00%  0.00%  0.00%   0 IPC Dynamic Cach
>>>> >> > >>   11           0         2      0  0.00%  0.00%  0.00%   0 Timers
>>>> >> > >>   26          16       107    149  0.00%  0.00%  0.00%   0 PF_Split Sync Pr
>>>> >> > >>
>>>> >> > >> Regards
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> Blogs and organic groups at http://www.ccie.net
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> _______________________________________________________________________
>>>> >> > >> Subscription information may be found at:
>>>> >> > >> http://www.groupstudy.com/list/CCIELab.html
>>>
>>>
>>> --
>>> Pavel Bykov
>>>
>>>
>>>
>>
>
>
> --
> Pavel Bykov

Blogs and organic groups at http://www.ccie.net
Received on Wed Nov 02 2011 - 22:42:01 ART

This archive was generated by hypermail 2.2.0 : Thu Dec 01 2011 - 06:29:31 ART