RE: Real world Scenario...Need Help...Losing $$$

From: Scott Morris (smorris@ipexpert.com)
Date: Sat Dec 08 2007 - 11:53:54 ART


So you are telling me that when your BGP flaps you have a large
reconvergence in OSPF??? If your routing is set up as you described (e.g.
ONLY a default route from ebgp) then that should not happen.
 
You would need to have multiple things not configured correctly/as-described
in order for a BGP flap to cause a complete reconvergence in OSPF. Are you
running OSPF with your Juniper routers as well? If both IGP and BGP are
flapping, that would suggest that either the Juniper router is rebooting or
disappearing, or you have a LINK flap in between the two, and for an extended
period of time.
 
If you have limited routes changing, debug ip routing will NOT have any
major impact on your devices. Even if it does cause issues (you only need to
run it through one convergence to get information), you really won't cause
additional problems. Per your statements, things go bad and make you lose
connection/packets anyway, so what's an additional couple of seconds in
order to gather the information you need to see exactly what is happening?
 
Too many people are afraid to do debugs in production networks. Here you
aren't going to be any worse off than you already are. I would not
recommend doing a "debug all". :) But targeted/necessary debugs done
appropriately are just fine.
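
A minimal sketch of what a targeted debug could look like on the 6500, assuming the debug output is sent to the local log buffer rather than the console (the buffer size below is an arbitrary choice, not a value from this thread):

```
! Keep debug output off the console and store it in the log buffer
configure terminal
 no logging console
 logging buffered 64000 debugging
 end
!
! Enable only the routing-table debug, then wait for one flap
debug ip routing
!
! After the flap: stop the debug and review what was captured
undebug all
show logging
```

Writing to the buffer instead of the console usually keeps the extra load small, since console logging tends to be the expensive part of a debug.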
 
You can pore through your configs on the Juniper and Cisco side and attempt
to diagnose your loop by configuration but if it isn't an obvious
misconfiguration, that won't help!
 
OSPF should not reconverge because of BGP flapping per your described config
though.
 

Scott Morris, CCIE4 (R&S/ISP-Dial/Security/Service Provider) #4713, JNCIE-M
#153, JNCIS-ER, CISSP, et al.
CCSI/JNCI-M/JNCI-ER
VP - Technical Training - IPexpert, Inc.
IPexpert Sr. Technical Instructor


  _____

From: ccie ccie [mailto:cciefun@gmail.com]
Sent: Saturday, December 08, 2007 3:07 AM
To: smorris@ipexpert.com
Cc: Cisco certification
Subject: Re: Real world Scenario...Need Help...Losing $$$

Hi Scott,
 
  Thanks for your reply. I can't run "debug ip routing":
 
1. We have lots of OSPF routes in the routing table.
2. If a BGP flap happens while I am debugging, I will have a BIG problem
on my hands, so I can't risk it.
 
Is there any other way to find the BGP routing loop using only show
commands?
Will a CPU SPAN help?
 
{ In many cases I have observed that only the EBGP session flaps when the CPU
spikes to 97-99%; the IBGP session doesn't flap. Can you please shed some
light on this? }
 
Regards
Mike

 
On 12/8/07, Scott Morris <smorris@ipexpert.com> wrote:

Do a "debug ip routing" on the 6500's. If you are just getting a default
route, not that much should change. However, I think you have introduced a
BGP routing loop. You'll see some messages about which route is causing
recursion and therefore killing the sessions if I'm right.
 
Scott
 
 

  _____

From: ccie ccie [mailto:cciefun@gmail.com]
Sent: Friday, December 07, 2007 10:01 PM
To: smorris@ipexpert.com
Subject: Re: Real world Scenario...Need Help...Losing $$$

 

Hi All,
 
 
What SUP are you running on the 6500's? ==> Sup 720
 
How much memory? ==> I'm not in the office and need to check, but definitely
more than 128MB. They have set good standards for memory, engineering IOS, etc.
 
What does "sh proc cpu" look like? ==> The EBGP hold timer expires (hello 2,
hold time 6 sec) and the session comes back up immediately. In "sh proc cpu
his" the flap time matches a CPU spike. The CPU spikes to 97-99% when a flap
happens, but only for a few seconds.
 
Are you getting any log messages on your switch (e.g. are you SURE you
aren't getting full routes or trying to?) ==> Only a default BGP route from
the Juniper router.
 
What I have checked:
 
1. BW on the link between the Juniper and core sw2 is OK.
2. CPU is at about 20% most of the time.
3. The logs are clear except for the BGP hold timer expiring when a flap occurs.
4. No input or output drops on the core sw2 interface to the Juniper.
5. The IBGP session between both switches never flaps. Only the EBGP sessions
flap, with BOTH Junipers. (So the Juniper is not the culprit, as usual.)
6. I suspect some kind of traffic from the LAN or WAN is hitting the switch and
causing the few-second CPU spike, which in turn flaps the EBGP session on core
switch 2. My observation in all DCs of my company: whenever the CPU hits
97-99%, only the EBGP session flaps, while IBGP as well as OSPF stay calm.
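
The checks above map roughly onto the following show commands, given here as a sketch. The neighbor address 192.0.2.1 is a placeholder, not an address from this network:

```
! CPU utilization history, to line up spikes with flap times
show processes cpu history
! Processes ordered by CPU use, to see what is busy during a spike
show processes cpu sorted
! Session state and uptime for every BGP peer
show ip bgp summary
! Hold time and keepalive interval in use for one peer
show ip bgp neighbors 192.0.2.1
! Buffered log messages, including hold timer expiry notifications
show logging
```

Comparing the session uptime in "show ip bgp summary" against the spike positions in the CPU history is one way to confirm that the two events coincide without enabling any debug.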
 
A common reason for EBGP flaps is an MTU mismatch on the uplink, but that is
not present here either.
 
Thanks in advance for any kind on help.
 
Regards,
Mike
 
 
 
 

 
On 12/8/07, Scott Morris <smorris@ipexpert.com> wrote:

What SUP are you running on the 6500's? How much memory? What does "sh
proc cpu" look like? Are you getting any log messages on your switch (e.g.
are you SURE you aren't getting full routes or trying to?)

Scott

-----Original Message-----
From: nobody@groupstudy.com [mailto:nobody@groupstudy.com] On Behalf Of ccie ccie
Sent: Friday, December 07, 2007 12:42 PM
To: Cisco certification
Subject: Real world Scenario...Need Help...Losing $$$

Hi Tech Lover,

                     Juniper R1                 Juniper J2
                        | |     / /     \ \     | |
Server Iron LB ======= 6500 sw1 ---ibgp--- 6500 sw2 ==== Server Iron LB
                        | |     / /     \ \     | |
                     4948 sw1                   4948 sw2

Juniper R1/J2: connect to the BB router and also face the Internet.
Juniper to 6500 links: each physical link runs its own EBGP session (hello 2,
hold 6). OSPF is also running on the same links.
6500 to 4948 links: the physical links are L2 port channels.
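
For reference, the EBGP timers described above (hello 2, hold 6) would typically be set per neighbor on the 6500 along these lines. This is only a sketch: the AS numbers and the neighbor address are placeholders, not values from this network.

```
router bgp 65001
 ! EBGP peer on a directly connected physical link (placeholder values)
 neighbor 192.0.2.1 remote-as 65000
 ! keepalive interval 2 seconds, hold time 6 seconds
 neighbor 192.0.2.1 timers 2 6
```

With these values the session is torn down after 6 seconds without a keepalive or update from the peer, so missing only three consecutive keepalives is enough to drop it.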

This is a scenario where I need your help. We can't run EBGP on loopbacks
due to some architectural concerns. The problem is only with the EBGP sessions
on core sw2.

Only on core switch 2, our EBGP sessions with Juniper 1 & 2 flap every few
hours, at irregular intervals. The IBGP session between core switch 1 & 2
never flaps. We get a hell of a lot of traffic. This site runs multicast
servers (but I'm sure that could not be the problem), web servers, DB servers,
etc. Core sw1 sessions are always stable. The OSPF neighborships on both
switches to the Juniper BB router have been up for more than 1 year.

1. BW on the link is not high.
2. CPU shows a 99% spike when BGP flaps. I think the CPU spike causes the BGP
flap. (I don't think a BGP flap can be the cause of the CPU spike.)
3. The core switch does not have full BGP routes.

(I tried to put "ip nbar protocol-discovery" on the EBGP link and the session
flapped because of me; I reverted it within a few seconds to save my job.)

Please suggest troubleshooting steps. As usual, a TAC case is already open and
Cisco is taking too much time.

Regards,
Mike


