Re: High interrupt cpu usage on 7206...

From: Carlos G Mendioroz <tron_at_huapi.ba.ar>
Date: Thu, 31 Jul 2014 08:10:35 -0300

Johnny,
the problem with the show cpu output is that the events generate spikes
that are only visible in the 5-second values.
I'm tracking the 5-minute CPU values (along with the 5-second interrupt
time values) via SNMP (OIDs 1.3.6.1.4.1.9.9.109.1.1.1.1.5.1
and 1.3.6.1.4.1.9.9.109.1.1.1.1.11.1) every 5 minutes, and those lines
look OK even when there are problems.
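
To put a number on why a short spike vanishes from a 5-minute figure, here
is a rough sketch. It assumes a plain arithmetic mean over made-up samples;
the router's 5-minute value is a decayed average, so the exact result will
differ, but the dilution effect is the same. The function name and the
sample values are hypothetical.

```python
def window_mean(samples):
    """Plain arithmetic mean of CPU samples (percent)."""
    return sum(samples) / len(samples)

# 60 five-second samples make up one 5-minute window (hypothetical numbers):
# 58 samples at a 20% baseline plus a 10-second burst at 100%.
samples = [20.0] * 58 + [100.0] * 2

mean_cpu = window_mean(samples)
peak_cpu = max(samples)
print(f"mean={mean_cpu:.1f}% peak={peak_cpu:.0f}%")  # mean=22.7% peak=100%
```

A 10-second burst at 100% moves the window mean by less than 3 points,
which is easy to miss on a graph.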

The router must keep the max of the last 5 minutes somewhere, because
it needs it to draw the graph of "show proc cpu hist".
I was not able to find its OID, though. Maybe it is not part of
the MIB. That would be a pity; I do not want to poll the router every 5
seconds...
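
If fine-grained samples did end up being collected somewhere (by polling or
any other feed), reducing them to a per-interval max off-box is simple. A
minimal sketch, assuming the samples arrive as (unix_time, cpu_percent)
pairs; the function name and the data are hypothetical, and this is not
something the router or the MIB provides.

```python
def peaks_per_interval(samples, interval=300):
    """Map each interval start (seconds) to the max CPU seen in it.

    samples: iterable of (unix_time_seconds, cpu_percent) pairs.
    """
    peaks = {}
    for ts, cpu in samples:
        bucket = ts - (ts % interval)  # start of the interval holding ts
        peaks[bucket] = max(cpu, peaks.get(bucket, cpu))
    return peaks

# Hypothetical 5-second samples spanning two 5-minute buckets.
samples = [(0, 21), (5, 97), (10, 23), (300, 20), (305, 22)]
result = peaks_per_interval(samples)
print(result)  # {0: 97, 300: 22}
```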

-Carlos

Johnny Morris @ 30/07/2014 21:40 -0300 dixit:
> You can use the following as this will not give you the detail you need:
>
> process cpu threshold type total rising (%) interval (seconds)
>
> along with:
>
> snmp-server enable traps cpu threshold
>
> OR
>
> use EEM and tie it to the SNMP OID for CPU.
>
> Here is an example that tests it with no event and prints the output to
> syslog, excluding anything at 0%. Of course, you can change the last
> action to mail you instead if you would like. There is also an action
> command to send an SNMP trap if you prefer.
>
> event manager applet TEST
> event none
> action 1.0 cli command "enable"
> action 2.0 cli command "sh processes cpu sorted 5min | e 0.00"
> action 3.0 syslog msg "$_cli_result"
>
> R1(config)#do event manager run TEST
> R1(config)#
> *Mar 1 00:06:24.603: %HA_EM-6-LOG: TEST:
> CPU utilization for five seconds: 0%/0%; one minute: 1%; five minutes: 1%
>  PID Runtime(ms) Invoked uSecs  5Sec  1Min  5Min TTY Process
>  170        4556    1020  4466 0.32% 0.57% 0.73%   0 Exec
>  134          28    3745     7 0.08% 0.08% 0.08%   0 RBSCP Background
>   45         244      78  3128 0.08% 0.07% 0.06%   0 Compute load avg
>
>
> Here is an EXAMPLE event for SNMP:
>
> event snmp oid "1.3.6.1.4.1.9.9.109.1.1.1.1.5.1" get-type exact entry-op ge entry-val "60" poll-interval 60
>
>
> HTH.
>
>
> On Wed, Jul 30, 2014 at 7:44 PM, Carlos G Mendioroz <tron_at_huapi.ba.ar>
> wrote:
>
>> The sequel...
>> well, it did not end there. Two days later, kaboom again.
>>
>> Again, SPD flushes, CPU through the roof (if you are there to see it),
>> adjacencies lost.
>> I was lucky to spot one event, and it was caused by a blast of UDP.
>> Turns out I had a bot in my network, and it was commanded to shoot
>> a blast of 80000 or so 60-byte packets at a victim.
>> My internal network was pumping 1Gb into the 7206, which obviously
>> is above its capacity.
>>
>> Ended up rate limiting the interface (on the switch), policing UDP and
>> shutting down the culprit (host).
>>
>> BTW, I have not found a way to get via SNMP the MAX CPU of the last 5
>> min interval (the number that corresponds to the asterisks in the
>> show proc cpu hist). Does anybody know?
>>
>> -Carlos
>>
>> Joe Sanchez @ 24/07/2014 12:01 -0300 dixit:
>>> Wow Carlos, nice dig and feedback. I'm sure we all can use this
>>> information.
>>> Appreciate the follow-up bro.
>>>
>>>
>>> Best Regards,
>>> Joe Sanchez
>>>
>>> On 7/24/14, 8:13 AM, "Carlos G Mendioroz" <tron_at_huapi.ba.ar> wrote:
>>>
>>>> Well, it seems that my expectations of the capacity of
>>>> the box were simply above its real power.
>>>> I ended up removing all the NBAR classification and we are back in
>>>> business (CPU at 20-25% for ~50Mbps).
>>>>
>>>> In the process, I learned something about the new 15.x CEF
>>>> implementation: PBR default NH was not supported by CEF before,
>>>> so default PBR (default route) flows were punted to the RP.
>>>> Now it is, but with a requirement that was not there:
>>>> NH has to be a real NH, i.e., it will not be recursively determined.
>>>> (recursive worked before; now it's optional for specific routes)
>>>>
>>>> The problem with that is that it creates a single point of failure. If
>>>> the NH goes down, your PBR creates a black hole.
>>>> So I used to use "recursive" NH with a static route that was tracked.
>>>> No joy in 15.x, so I went back to 12.4.
>>>>
>>>> Thanks for tuning in :)
>>>> -Carlos
>>>>
>>>> Carlos G Mendioroz @ 23/07/2014 10:56 -0300 dixit:
>>>>> Andrew,
>>>>> thanks. As most of the CPU is being chewed at interrupt time, there is
>>>>> no process in the process list that has any significant portion of it.
>>>>>
>>>>> That document (and many others) redirects to
>>>>>
>>>>> http://www.cisco.com/c/en/us/support/docs/routers/7500-series-routers/41120-highcpu-interrupts.html
>>>>>
>>>>> Which I'm starting to memorize already :-/
>>>>>
>>>>> -Carlos
>>>>>
>>>>> Andrew LaPorte @ 23/07/2014 10:51 -0300 dixit:
>>>>>> Carlos,
>>>>>>
>>>>>> I typically start out with a "show proc cpu" or "show proc cpu sorted"
>>>>>> to figure out which process is eating up the CPU. I then normally try
>>>>>> to figure out what the process is responsible for.
>>>>>>
>>>>>> Here is a doc that may help:
>>>>>>
>>>>>>
>>>>>> http://www.cisco.com/c/en/us/support/docs/routers/10000-series-routers/15095-highcpu.html?referring_site=bodynav
>>>>>>
>>>>>> I know it is not specific to the 7200 but it's a good read.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of Carlos G Mendioroz
>>>>>> Sent: Wednesday, July 23, 2014 9:27 AM
>>>>>> To: Cisco certification
>>>>>> Subject: OT: High interrupt cpu usage on 7206...
>>>>>>
>>>>>> is driving me nuts :(
>>>>>>
>>>>>> I'm trying to nail down what is bringing a 7206 with a G1 to its knees
>>>>>> with just about 50Mbps (75%/72% cpu).
>>>>>>
>>>>>> The config has many things, like WCCP, QoS, PBR, NBAR and OSPF + BGP,
>>>>>> but still, 50Mbps for a G1 should be a breeze, shouldn't it?
>>>>>>
>>>>>> I'm seeing some SPD flushes and trying to make sense of them.
>>>>>> But I would not mind advice on what to look for.
>>>>>> (I was hoping CSCtg42179 would save my day, but I upgraded to 12.4.24T8
>>>>>> and got the same thing...)
>>>>>>
>>>>>> TIA,
>>>>>> --
>>>>>> Carlos G Mendioroz <tron_at_huapi.ba.ar> LW7 EQI Argentina
>>>>>>
>>>>>>
>>>>>> Blogs and organic groups at http://www.ccie.net
>>>>>>
>>>>>>
>>>>>> _______________________________________________________________________
>>>>>> Subscription information may be found at:
>>>>>> http://www.groupstudy.com/list/CCIELab.html
>>>>>
>>>>
>>>> --
>>>> Carlos G Mendioroz <tron_at_huapi.ba.ar> LW7 EQI Argentina
>>
>> --
>> Carlos G Mendioroz <tron_at_huapi.ba.ar> LW7 EQI Argentina

-- 
Carlos G Mendioroz  <tron_at_huapi.ba.ar>  LW7 EQI  Argentina
Received on Thu Jul 31 2014 - 08:10:35 ART
