RE: EEM to keep BGP peer shut during an interface flap

From: Jon Hartman <jon.hartman_at_verizon.net>
Date: Thu, 29 Aug 2013 00:30:49 -0400

Joe, there was a time when I would have happily joined your flame war on a
professional forum, but I'm above that now, and I hope someday you will be
too. TDM is still alive, and he didn't specify that he was tracking someone
else's interface. Your comments about the issues with intermediary devices
in Ethernet access are valid, but they aren't news, either. Likewise, "why
only EEM" isn't the same as "what is EEM," but I'll extend the benefit of
the doubt and assume you were having an off day.

Regarding the issue at hand, if the goal is to minimize flapping while
keeping your capacity up to par, a simple approach is a cumulative penalty
multiplier. I'm a fan of returning from failure scenarios in a controlled
manner during maintenance windows, but I've worked for clients that
consider simplex mode an outage or that have exceeded the 50% limit needed
to maintain redundancy.

The script below keeps track of failures and increases the penalty each
time. By playing with the values in actions 2.1 and 2.2, you can increase
the down-time to something more reasonable than the aggressive values I've
got below. Bear in mind that if the config is saved, the current
instability value will be saved as well. This could be solved in multiple
ways, such as an initialization script, using contexts instead, etc. I
couldn't see letting it get triggered more than once a minute or be down
for more than a day, but those limits could obviously be modified as well.

event manager environment instability 0

event manager applet HealthMonitor
 event syslog pattern "%BGP-5-NBR_RESET: Neighbor 10.10.23.3 reset" maxrun 87000 ratelimit 60
 ! EEM CLI sessions start in user EXEC, so enter privileged mode first
 action 0.9 cli command "enable"
 action 1.0 cli command "conf t"
 action 1.1 cli command "router bgp 2"
 action 1.2 cli command "neighbor 10.10.23.3 shutdown"
 ! Penalty is 60 seconds per recorded failure, capped at one day
 action 2.1 increment instability 1
 action 2.2 multiply $instability 60
 action 2.3 set punishment "$_result"
 action 2.4 if $punishment gt "86400"
 action 2.5  set punishment "86400"
 action 2.6 end
 action 2.7 syslog msg "Punishment is $punishment"
 ! Count down on a copy so $punishment is still intact for the final message
 action 2.8 set remaining "$punishment"
 action 3.1 while $remaining ge 1
 action 3.3  wait 1
 action 3.4  decrement remaining 1
 action 3.5 end
 action 4.0 syslog msg "Shutdown timer of $punishment elapsed. Re-enabling peer."
 ! Still in router config mode here; re-enable the peer and store the new count
 action 4.1 cli command "no neighbor 10.10.23.3 shutdown"
 action 4.2 cli command "event manager environment instability $instability"
 action 4.3 cli command "end"
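
For a feel of how the penalty grows, here is a minimal Python sketch of the
same arithmetic. It is not something that runs on the router, and the
function and constant names are made up purely for illustration; only the 60
(action 2.2) and the 86400 cap (actions 2.4-2.5) come from the applet above.

```python
# Sketch of the applet's penalty schedule: the hold-down grows by a fixed
# step per recorded failure and is capped at one day.
# All names below are illustrative assumptions, not EEM identifiers.
STEP_SECONDS = 60      # multiplier used in action 2.2
CAP_SECONDS = 86400    # ceiling enforced by actions 2.4-2.5 (one day)


def punishment(instability: int) -> int:
    """Return the hold-down in seconds after `instability` recorded resets."""
    return min(instability * STEP_SECONDS, CAP_SECONDS)


def failures_until_cap() -> int:
    """Smallest failure count whose hold-down reaches the cap."""
    return -(-CAP_SECONDS // STEP_SECONDS)  # ceiling division


if __name__ == "__main__":
    for n in (1, 2, 10, 60, 1440, 2000):
        print(f"failure {n:>4}: peer held down {punishment(n):>5} s")
    print(f"cap reached at failure {failures_until_cap()}")
```

With these values the first reset costs one minute, the tenth costs ten
minutes, and the one-day ceiling is only reached at the 1,440th recorded
reset, which is why the multiplier is worth tuning to your own tolerance.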

Bear in mind, pinging things outside of your administrative control makes
the behavior of your network beholden to an external entity's security
policies, which likely don't take such things into account.

-Jon

-----Original Message-----
From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of Jay McMickle
Sent: Thursday, August 15, 2013 10:01 PM
To: Christopher Rae
Cc: Joseph L. Brunner; jon.hartman_at_verizon.net; mathewfer_at_gmail.com;
marco207p_at_gmail.com; jneiberger_at_gmail.com; ccielab_at_groupstudy.com
Subject: Re: EEM to keep BGP peer shut during an interface flap

Joe is always trying to kick over rocks. Ignore him, it's a Jekyll and Hyde
thing. ;) BWhahahahaahaa!
Oh, and I think he meant CCIE lab rat, not rate.

Yes, Jon knows BGP well, and made me feel very small in our CCIE training
class in Sept 2011.

Regards,
Jay McMickle- 2x CCIE #35355 (R/S,Sec)
Sent from my iPhone 5

On Aug 15, 2013, at 10:15 AM, Christopher Rae <chris.rae07_at_me.com> wrote:

> What's a lab rate ccie?
>
> Cheers
> Chris Rae
>
> On 15/08/2013, at 11:08 PM, "Joseph L. Brunner" <joe_at_affirmedsystems.com> wrote:
>
>> Another lab rate ccie :)
>>
>> Cause Jon,
>>
>> ISPs are often useless post-office-style entities. We often can't rely
>> on them for much. In my experience (500+ BGP implementations with a
>> dual-homed site or colo) the carriers can do things like freeze up,
>> so you have to wait out the keepalive and dead times before the
>> secondary route(s) take over. BFD? I have not seen an ISP offer that.
>> We have Windstream (Paetec), TWC, Level3, Transbeam and Cogent to
>> choose from here in NYC. I have a hard enough time just getting the
>> peering session set up (the NOC guy at one of those carriers needed a
>> config, I kid you not).
>>
>> EEM can also send you an email when bad things happen, before your
>> users (or boss) come and tell you...
>>
>> Also, fast external failover is often useless. We are in the Ethernet
>> society... That feature was designed 15 years ago in the era of HDLC
>> and PPP connections, like a DS3/T3. Your interface will almost never
>> go down when your Ethernet ISP is "down". I know on my Level3
>> connections there are 2 Alcatel-Lucent boxes between us and the
>> Juniper router actually doing the BGP. No chance that will help.
>>
>> EEM is your final control of how the router functions under different
>> BGP and other conditions. Don't leave home without it...
>>
>>
>> ----- Original Message -----
>> From: Jon Hartman [mailto:jon.hartman_at_verizon.net]
>> Sent: Thursday, August 15, 2013 10:40 AM
>> To: Christopher Rae <chris.rae07_at_me.com>
>> Cc: Mathew <mathewfer_at_gmail.com>; Joe Sanchez <marco207p_at_gmail.com>;
>> Joseph L. Brunner; John Neiberger <jneiberger_at_gmail.com>; Cisco
>> certification <ccielab_at_groupstudy.com>
>> Subject: Re: EEM to keep BGP peer shut during an interface flap
>>
>> I'd have to think that features like BFD, BGP fast failover, interface
>> dampening, and BGP dampening would accommodate the issue at hand.
>>
>> Why the requirement to use EEM?
>>
>> Jon Hartman
>> CCIE #34941
>>
>> On Aug 15, 2013, at 4:14 AM, "Christopher Rae" <chris.rae07_at_me.com> wrote:
>>
>>> Hey Joseph,
>>>
>>> Yes, had BFD running with a few providers no worries.
>>>
>>> Cheers
>>> Chris
>>>
>>> -----Original Message-----
>>> From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf
>>> Of Mathew
>>> Sent: Thursday, August 15, 2013 3:47 PM
>>> To: Joe Sanchez
>>> Cc: Joseph L. Brunner; John Neiberger; Chris Rae; Cisco
>>> certification
>>> Subject: Re: EEM to keep BGP peer shut during an interface flap
>>>
>>> Hi,
>>>
>>> I just tried the below but I could not get it to work. The idea is
>>> to ping an IP and depending on the result to take action.
>>>
>>> I think line "action 11.2 regexp "(.*) (!\!\!\!\!) (.*)"
>>> "$_cli_result" _match _sub1" is NOT correct.
>>> As I am still building this applet, I run this manually.
>>>
>>> How do I get this regular expression correctly to match ping result?
>>>
>>> R2#show event manager version | in Event Manager Version
>>> Embedded Event Manager Version 3.00
>>> R2#
>>>
>>> !
>>> event manager applet CHECK-PING-STATUS
>>> event none
>>> action 11.1 cli command "ping 2.2.2.2"
>>> action 11.2 regexp "(.*) (!\!\!\!\!) (.*)" "$_cli_result" _match _sub1
>>> action 11.3 if $_regexp_result eq 1
>>> action 11.4 syslog msg "Ping is success"
>>> action 11.5 else
>>> action 11.6 syslog msg "Ping is failed"
>>> action 11.7 end
>>> !
>>>
>>> Mathew
>>>
>>> On Wed, Aug 14, 2013 at 11:09 PM, Joe Sanchez <marco207p_at_gmail.com> wrote:
>>>> Level 3 will, as long as you're homed to the right gateway boxes.
>>>>
>>>> Regards,
>>>> Joe Sanchez
>>>>
>>>> ( please excuse the brevity of this email as it was sent via a
>>>> mobile device. Please excuse misspelled words or sentence
>>>> structure.)
>>>>
>>>> On Aug 14, 2013, at 3:26 AM, "Joseph L. Brunner" <joe_at_affirmedsystems.com> wrote:
>>>>
>>>>> I have never seen an ISP that will run BFD with any customers...
>>>>> they seem to have enough issues just getting basic BGP set up
>>>>> (Cogent, anyone?)
>>>>>
>>>>> How about an EEM solution that shuts down BGP for a few hours and
>>>>> turns it back on after market hours? Yes it works... we use it :)
>>>>>
>>>>> kbro-voip-rt01#show run | sec event
>>>>>
>>>>> event manager directory user policy "flash:/"
>>>>> event manager policy sendmail.tcl
>>>>>
>>>>> event manager applet ShutdownCohereBGPNeighbor
>>>>> event track 10 state down
>>>>> action 1.0 info type routername
>>>>> action 2.0 cli command "enable"
>>>>> action 2.1 cli command "configure terminal"
>>>>> action 2.5 cli command "router bgp 65080"
>>>>> action 2.6 cli command "neighbor 208.71.93.213 shutdown"
>>>>> action 3.0 mail server "outbounds9.obsmtp.com" to "kbro-notif_at_affirmedsystems.com" from "kbro-voip-rt01_at_kbro.com" subject "Cohere VoIP Direct route down @ $_info_routername"
>>>>>
>>>>> event manager applet EnableCohereat8PM
>>>>> event timer cron name EnableCohereat8PM cron-entry "0 20 * * *"
>>>>> action 1.0 info type routername
>>>>> action 2.0 cli command "enable"
>>>>> action 2.1 cli command "configure terminal"
>>>>> action 2.5 cli command "router bgp 65080"
>>>>> action 2.6 cli command "no neighbor 208.71.93.213 shutdown"
>>>>>
>>>>> event manager applet NoShutCohere805PM
>>>>> event tag 1.0 track 10 state up
>>>>> event tag 2.0 timer cron name NoShutCohere805PM cron-entry "5 20 * * *"
>>>>> trigger occurs 1 delay 10
>>>>> correlate event 1.0 and event 2.0
>>>>> attribute tag 1.0 occurs 1
>>>>> attribute tag 2.0 occurs 1
>>>>> action 1.0 info type routername
>>>>> action 2.0 cli command "enable"
>>>>> action 2.1 cli command "configure terminal"
>>>>> action 2.5 cli command "router bgp 65080"
>>>>> action 2.6 cli command "no neighbor 208.71.93.213 shutdown"
>>>>> action 2.7 cli command "do clear ip nat translation *"
>>>>> action 3.0 mail server "outbounds9.obsmtp.com" to "kbro-notif_at_affirmedsystems.com" from "kbro-voip-rt01_at_kbro.com" subject "Cohere VoIP Direct route restored @ $_info_routername"
>>>>>
>>>>>
>>>>> event manager applet EnableCohereat7AM
>>>>> event timer cron name EnableCohereat7AM cron-entry "0 7 * * *"
>>>>> action 1.0 info type routername
>>>>> action 2.0 cli command "enable"
>>>>> action 2.1 cli command "configure terminal"
>>>>> action 2.5 cli command "router bgp 65080"
>>>>> action 2.6 cli command "no neighbor 208.71.93.213 shutdown"
>>>>>
>>>>> event manager applet KeepNoShutCohere705AM
>>>>> event tag 1.0 track 10 state up
>>>>> event tag 2.0 timer cron name KeepNoShutCohere705AM cron-entry "5 7 * * *"
>>>>> trigger occurs 1 delay 10
>>>>> correlate event 1.0 and event 2.0
>>>>> attribute tag 1.0 occurs 1
>>>>> attribute tag 2.0 occurs 1
>>>>> action 1.0 info type routername
>>>>> action 2.0 cli command "enable"
>>>>> action 2.1 cli command "configure terminal"
>>>>> action 2.5 cli command "router bgp 65080"
>>>>> action 2.6 cli command "no neighbor 208.71.93.213 shutdown"
>>>>> action 2.7 cli command "do clear ip nat translation *"
>>>>> action 3.0 mail server "outbounds9.obsmtp.com" to "kbro-notif_at_affirmedsystems.com" from "kbro-voip-rt01_at_kbro.com" subject "Cohere VoIP Direct route restored @ $_info_routername"
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On
>>>>> Behalf Of John Neiberger
>>>>> Sent: Tuesday, August 13, 2013 12:12 PM
>>>>> To: Chris Rae
>>>>> Cc: Mathew; Cisco certification
>>>>> Subject: Re: EEM to keep BGP peer shut during an interface flap
>>>>>
>>>>> This. Exactly. Use BFD for this. It already does what you're
>>>>> trying to do and it's a heck of a lot easier to configure.
>>>>>
>>>>>
>>>>> On Tue, Aug 13, 2013 at 6:53 AM, Chris Rae <chris.rae07_at_me.com> wrote:
>>>>>
>>>>>> Hey Matt,
>>>>>>
>>>>>> Why not just use BFD?
>>>>>> If the BFD peer is down (ie no keep alive or interface goes down)
>>>>>> BGP will immediately reroute via other peer.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 13/08/2013, at 7:52 PM, Mathew <mathewfer_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tested two EEM applet configs:
>>>>>>>
>>>>>>> - One checks syslog for an interface down and uses the CLI to shut
>>>>>>> down the BGP peer.
>>>>>>> - The second one no-shuts the BGP peer when a syslog entry is seen
>>>>>>> with the interface up.
>>>>>>>
>>>>>>> In fact the interface that I want to check is NOT being used for
>>>>>>> this BGP peering so there is no way to do it with BGP configuration.
>>>>>>>
>>>>>>> The above two EEM configs work, but the issue is that when this
>>>>>>> interface starts to flap, EEM keeps shutting and no-shutting the BGP
>>>>>>> peer. I want to avoid this as it results in a BGP flap.
>>>>>>>
>>>>>>> Has anybody tried an EEM solution to keep the BGP peer shut
>>>>>>> during an interface flap?
>>>>>>>
>>>>>>> I do not mind keeping BGP shut until the interface flapping is
>>>>>>> over, but how do we detect it with EEM?
>>>>>>>
>>>>>>> Thanks in advance for your replies.
>>>>>>>
>>>>>>> Mathew
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>>
>>>>>>> Mathew
>>>>>>>
>>>>>>>
>>>>>>> Blogs and organic groups at http://www.ccie.net
>>>>>>>
>>>>>>> _______________________________________________________________________
>>>>>>> Subscription information may be found at:
>>>>>>> http://www.groupstudy.com/list/CCIELab.html
>>>
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Mathew
>>>

Received on Thu Aug 29 2013 - 00:30:49 ART

This archive was generated by hypermail 2.2.0 : Sun Sep 01 2013 - 08:35:51 ART