Re: BGP Path Selection weirdness regarding next hops

From: Marko Milivojevic <markom_at_ipexpert.com>
Date: Fri, 30 Nov 2012 19:53:02 -0800

... assuming it's the route that will be used to evaluate the next-hop
for your BGP prefix.
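
For anyone who wants to see the effect without a lab, here is a rough
Python sketch of the step discussed in the quoted thread. It is a toy
model, not how IOS is implemented: it assumes the only thing compared is
the metric of whichever route resolves each next hop (longest match, any
protocol), and it skips every earlier best-path step. The function names
and the metric values (20, 30, 0, 100) are made up for illustration; only
the addresses come from John's description.

```python
import ipaddress

def resolving_route(rib, next_hop):
    """Longest-match RIB entry that covers next_hop, or None if unreachable."""
    addr = ipaddress.ip_address(next_hop)
    covering = [r for r in rib if addr in ipaddress.ip_network(r["prefix"])]
    return max(covering,
               key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen,
               default=None)

def pick_by_next_hop_metric(rib, next_hops):
    """Pick the next hop whose resolving route has the lowest metric.

    The protocol that supplied the resolving route is deliberately ignored.
    """
    candidates = []
    for nh in next_hops:
        route = resolving_route(rib, nh)
        if route is not None:
            candidates.append((route["metric"], nh))
    return min(candidates)[1] if candidates else None

# Hypothetical metrics; only the addresses are taken from the thread.
ospf_b = {"prefix": "4.4.4.4/32", "proto": "ospf", "metric": 20}
ospf_c = {"prefix": "5.5.5.5/32", "proto": "ospf", "metric": 30}
default = {"prefix": "0.0.0.0/0", "proto": "bgp", "metric": 0}
paths = ["4.4.4.4", "5.5.5.5"]

# Both loopbacks known via OSPF.
before = pick_by_next_hop_metric([ospf_b, ospf_c, default], paths)
# Router C's loopback is gone; 5.5.5.5 now resolves via the BGP default.
after_loss = pick_by_next_hop_metric([ospf_b, default], paths)
# Same situation, but with the MED of the default route raised above the OSPF cost.
raised_default = dict(default, metric=100)
after_med = pick_by_next_hop_metric([ospf_b, raised_default], paths)

print(before, after_loss, after_med)
```

With these made-up numbers the script prints "4.4.4.4 5.5.5.5 4.4.4.4":
the path via Router B wins at first, loses once 5.5.5.5 resolves through
the default route with metric 0, and wins again once the default route's
MED is higher than the OSPF cost to 4.4.4.4.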

--
Marko Milivojevic - CCIE #18427 (SP R&S)
Senior CCIE Instructor - IPexpert
On Fri, Nov 30, 2012 at 7:49 PM, John Neiberger <jneiberger_at_gmail.com> wrote:
> Ah! I misunderstood. I'll take off my current route maps and add one to
> change the MED of the default route.
>
>
> On Fri, Nov 30, 2012 at 8:38 PM, Marko Milivojevic <markom_at_ipexpert.com>
> wrote:
>>
>> Ah, but when you do that, you're subject to MED comparisons, which
>> have their own set of rules. I was referring to the MED on the
>> 0.0.0.0/0 route.
>>
>> --
>> Marko Milivojevic - CCIE #18427 (SP R&S)
>> Senior CCIE Instructor - IPexpert
>>
>> On Fri, Nov 30, 2012 at 7:12 PM, John Neiberger <jneiberger_at_gmail.com>
>> wrote:
>> > No, I increased the MED of the prefix in question, 100.100.100.0/24, in my
>> > case. The BGP-learned default route is staying at 0 MED.
>> >
>> > It seems weird to me, too!
>> >
>> >
>> > On Fri, Nov 30, 2012 at 8:07 PM, Yuri Bank <yuribank_at_gmail.com> wrote:
>> >>
>> >> So you increased the MED of the default route you're receiving? I find it
>> >> interesting that it's the actual metric of each protocol being compared,
>> >> regardless of the prefix length or AD.
>> >>
>> >> -Yuri
>> >>
>> >>
>> >> On Fri, Nov 30, 2012 at 7:02 PM, Marko Milivojevic
>> >> <markom_at_ipexpert.com>
>> >> wrote:
>> >>>
>> >>> I knew it was a good guess. That's one of my favorites with BGP. It
>> >>> gets people unawares all the time :-).
>> >>>
>> >>> Now, I think Cisco is well within their rights not to touch that part
>> >>> of the documentation. The next-hop is *usually* reachable via IGP.
>> >>> There are very rare circumstances when the next-hop is reachable via
>> >>> BGP *and* is valid for more than hold-down. It seems like you hit one
>> >>> of those :-)
>> >>>
>> >>> Fun.
>> >>>
>> >>> --
>> >>> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >>> Senior CCIE Instructor - IPexpert
>> >>>
>> >>> On Fri, Nov 30, 2012 at 6:55 PM, John Neiberger <jneiberger_at_gmail.com>
>> >>> wrote:
>> >>> > You are correct! I just did a test by creating a route map to bump up the
>> >>> > MED of the prefix in question and it changed the behavior. That proved that
>> >>> > even though one path now doesn't have an IGP metric to compare, it's still
>> >>> > being compared. Maybe Cisco needs to change their documentation to say that
>> >>> > one of the steps is to compare the metrics, not just "IGP metrics". :-)
>> >>> >
>> >>> > Thanks!
>> >>> > John
>> >>> >
>> >>> >
>> >>> > On Fri, Nov 30, 2012 at 7:37 PM, Marko Milivojevic
>> >>> > <markom_at_ipexpert.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Without going any deeper (some topology information is missing and my
>> >>> >> pod is otherwise too busy to try this, no matter how FUN it sounds), I'd
>> >>> >> venture a guess that yes, the "igp" metric is compared.
>> >>> >>
>> >>> >> The "igp metric" in this sense is really "the metric of the route used to
>> >>> >> reach the next hop, no matter what protocol that route comes from". In
>> >>> >> your case, one of these protocols happens to be BGP. You may want to test
>> >>> >> this hypothesis by tweaking the BGP MED value for the default route to
>> >>> >> make it numerically higher than the OSPF cost to reach the next hop of
>> >>> >> the other route.
>> >>> >>
>> >>> >> Funnily enough, this is one of the few places where numerical metric
>> >>> >> values of different protocols are directly compared, regardless of the
>> >>> >> AD and/or longest match.
>> >>> >>
>> >>> >> --
>> >>> >> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >>> >> Senior CCIE Instructor - IPexpert
>> >>> >>
>> >>> >> On Fri, Nov 30, 2012 at 6:21 PM, John Neiberger
>> >>> >> <jneiberger_at_gmail.com>
>> >>> >> wrote:
>> >>> >> > I posted this question to the Cisco NSP list and I've also talked to a
>> >>> >> > couple of guys from Cisco Advanced Services and I'm still stumped about
>> >>> >> > something. I'll try my best to phrase it in a way that makes sense.
>> >>> >> >
>> >>> >> > Router A is learning about a prefix from two route reflector clients. In
>> >>> >> > both cases, the next hop for the prefix is the loopback address of the
>> >>> >> > advertising router. Their loopback addresses are being advertised into
>> >>> >> > OSPF.
>> >>> >> >
>> >>> >> > So, from the perspective of Router A, its BGP table for this prefix has
>> >>> >> > two paths:
>> >>> >> >
>> >>> >> > 1. 4.4.4.4 (loopback address of Router B, learned via OSPF) * winner due
>> >>> >> >    to lower IGP metric
>> >>> >> > 2. 5.5.5.5 (loopback address of Router C, learned via OSPF)
>> >>> >> >
>> >>> >> > Now the weirdness begins. A network event occurs that causes the
>> >>> >> > loopback address of Router C to go away. This shouldn't affect Router A
>> >>> >> > because it is already selecting the shortest path to the network via
>> >>> >> > Router B (4.4.4.4).
>> >>> >> >
>> >>> >> > However, Router A is also learning a default route via BGP. That means
>> >>> >> > that even though 5.5.5.5 (loopback of Router C) disappeared and is
>> >>> >> > unreachable, the router is doing a recursive lookup and keeps the path in
>> >>> >> > the BGP table; 5.5.5.5 is still reachable, it thinks, by using the
>> >>> >> > default route.
>> >>> >> >
>> >>> >> > The weird thing is that this causes Router A to start using the wrong
>> >>> >> > path! It seems to be preferring a path with a next hop learned via BGP to
>> >>> >> > a path with a next hop learned via OSPF. Why would it do this? I see no
>> >>> >> > documentation that would explain why a BGP-learned next hop is preferred
>> >>> >> > over an IGP-learned next hop.
>> >>> >> >
>> >>> >> > Is the router still comparing IGP metrics even though the "wrong" path
>> >>> >> > now has no IGP metric?
>> >>> >> >
>> >>> >> > It's not changing due to router ID, cluster length, or neighbor IP
>> >>> >> > address. I checked. So, why is it switching?
>> >>> >> >
>> >>> >> > As soon as the BGP session from Router A to Router C times out, the
>> >>> >> > extraneous path gets removed from the BGP table and the router goes back
>> >>> >> > to using the correct path it should have been using all along.
>> >>> >> >
>> >>> >> > So, is a BGP-learned next hop preferred over an IGP-learned next hop? If
>> >>> >> > so, why? If not, any idea why my router switches paths? I've turned on
>> >>> >> > BGP debugging and IP routing debugging and haven't found a suitable
>> >>> >> > explanation for the switch.
>> >>> >> >
>> >>> >> > John
>> >>> >> >
>> >>> >> >
>> >>> >> > Blogs and organic groups at http://www.ccie.net
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > _______________________________________________________________________
>> >>> >> > Subscription information may be found at:
>> >>> >> > http://www.groupstudy.com/list/CCIELab.html
Received on Fri Nov 30 2012 - 19:53:02 ART

This archive was generated by hypermail 2.2.0 : Tue Jan 01 2013 - 09:36:52 ART