Re: BGP Path Selection weirdness regarding next hops

From: Marko Milivojevic <markom_at_ipexpert.com>
Date: Fri, 30 Nov 2012 20:09:54 -0800

No I mean, from this router here. Next-hop for 0.0.0.0/0 is 1.1.1.1.
Where's 1.1.1.1 - what's the IGP cost to reach it?
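
To make the recursion concrete, here is a rough Python sketch of how that
lookup might play out. It is only a model, not IOS behavior confirmed in this
thread: it assumes the metric compared at the "IGP metric" step is that of the
route where next-hop recursion ends, which would be 0 for a directly attached
1.1.1.1. The 1.1.1.0/24 connected prefix and the OSPF cost of 201 for 5.5.5.5
are made-up values; 101 and 250 come from the output quoted below.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str              # e.g. "4.4.4.4/32"
    protocol: str            # "connected", "ospf" or "bgp"
    metric: int              # metric as installed in the RIB
    next_hop: Optional[str]  # only followed for BGP routes in this model


def longest_match(rib: List[Route], address: str) -> Optional[Route]:
    """Return the most specific RIB route covering address, or None."""
    ip = ipaddress.ip_address(address)
    candidates = [r for r in rib if ip in ipaddress.ip_network(r.prefix)]
    return max(candidates,
               key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def metric_to_next_hop(rib: List[Route], next_hop: str, depth: int = 0) -> Optional[int]:
    """ASSUMPTION: recurse through BGP routes and use the metric of the
    route where recursion ends (a connected route has metric 0)."""
    route = longest_match(rib, next_hop)
    if route is None or depth > 8:       # unreachable, or a recursion loop
        return None
    if route.protocol == "bgp" and route.next_hop is not None:
        return metric_to_next_hop(rib, route.next_hop, depth + 1)
    return route.metric


def best_next_hop(rib: List[Route], bgp_next_hops: List[str]) -> Optional[str]:
    """Pick the BGP next hop with the lowest resolved metric
    (all earlier best-path steps are assumed to be tied)."""
    metrics: Dict[str, int] = {}
    for nh in bgp_next_hops:
        m = metric_to_next_hop(rib, nh)
        if m is not None:
            metrics[nh] = m
    return min(metrics, key=metrics.get) if metrics else None


# Hypothetical RIB for R2; 1.1.1.0/24 and the cost 201 are invented values.
base_rib = [
    Route("1.1.1.0/24", "connected", 0, None),
    Route("0.0.0.0/0", "bgp", 250, "1.1.1.1"),
    Route("4.4.4.4/32", "ospf", 101, None),
]
rib_before = base_rib + [Route("5.5.5.5/32", "ospf", 201, None)]
rib_after = base_rib  # 5.5.5.5/32 withdrawn from OSPF

paths = ["5.5.5.5", "4.4.4.4"]
print(best_next_hop(rib_before, paths))
print(best_next_hop(rib_after, paths))
```

Under those assumptions the path via 5.5.5.5 resolves through the default
route to a connected next hop with cost 0, which beats the OSPF cost of 101 to
4.4.4.4. That would also fit the missing "(metric ...)" on the 5.5.5.5 paths
in the show output.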

--
Marko Milivojevic - CCIE #18427 (SP R&S)
Senior CCIE Instructor - IPexpert
On Fri, Nov 30, 2012 at 8:08 PM, John Neiberger <jneiberger_at_gmail.com> wrote:
> It's directly attached to the router in question.  :-)   lol
>
>
> On Fri, Nov 30, 2012 at 9:06 PM, Marko Milivojevic <markom_at_ipexpert.com>
> wrote:
>>
>> What's the cost to reach 1.1.1.1? I can't believe this didn't occur to
>> me before (I'm beginning to feel ridiculous). I should've labbed it up...
>>
>> --
>> Marko Milivojevic - CCIE #18427 (SP R&S)
>> Senior CCIE Instructor - IPexpert
>>
>> On Fri, Nov 30, 2012 at 8:01 PM, John Neiberger <jneiberger_at_gmail.com>
>> wrote:
>> > I just tried it and didn't have the hoped-for results. Here is the BGP table
>> > before the "outage":
>> >
>> >    Network          Next Hop            Metric LocPrf Weight Path
>> > *> 0.0.0.0          1.1.1.1                250             0 7922 i
>> > * i100.100.100.0/24 5.5.5.5                  0    100      0 ?
>> > * i                 5.5.5.5                  0    100      0 ?
>> > *>i                 4.4.4.4                  0    100      0 ?
>> >
>> > Then, after removing 5.5.5.5 from OSPF:
>> >
>> > R2#show ip bgp 100.100.100.0
>> > BGP routing table entry for 100.100.100.0/24, version 6
>> > Paths: (3 available, best #1, table Default-IP-Routing-Table)
>> > Flag: 0x900
>> >   Advertised to update-groups:
>> >         1    2    3
>> >   Local, (Received from a RR-client)
>> >     5.5.5.5 from 5.5.5.5 (5.5.5.5)
>> >       Origin incomplete, metric 0, localpref 100, valid, internal, best
>> >   Local
>> >     5.5.5.5 from 23.23.23.3 (35.35.35.3)
>> >       Origin incomplete, metric 0, localpref 100, valid, internal
>> >       Originator: 5.5.5.5, Cluster list: 35.35.35.3
>> >   Local, (Received from a RR-client)
>> >     4.4.4.4 (metric 101) from 4.4.4.4 (4.4.4.4)
>> >       Origin incomplete, metric 0, localpref 100, valid, internal
>> >
>> > It still switched. It seems as if I have to change the MED of the routes in
>> > the BGP table, but that is a whole different mess and very complicated in
>> > the long run.
>> >
>> > This is a tough nut to crack! If this were IOS XR, it would be possible to
>> > use BGP next hop tracking and enforce a policy that the prefix length on
>> > routes used to match on next hops can't be less than a certain length, so I
>> > could, for example, make sure no prefix shorter than a /30 would suffice to
>> > be used for next hops in BGP. I have no idea how to solve this problem in
>> > IOS yet. lol
>> >
>> >
>> >
>> >
>> > On Fri, Nov 30, 2012 at 8:53 PM, Marko Milivojevic <markom_at_ipexpert.com>
>> > wrote:
>> >>
>> >> ... assuming it's the route that will be used to evaluate the next-hop
>> >> for your BGP prefix.
>> >>
>> >> --
>> >> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >> Senior CCIE Instructor - IPexpert
>> >>
>> >> On Fri, Nov 30, 2012 at 7:49 PM, John Neiberger <jneiberger_at_gmail.com>
>> >> wrote:
>> >> > Ah! I misunderstood. I'll take off my current route maps and add one to
>> >> > change the MED of the default route.
>> >> >
>> >> >
>> >> > On Fri, Nov 30, 2012 at 8:38 PM, Marko Milivojevic
>> >> > <markom_at_ipexpert.com>
>> >> > wrote:
>> >> >>
>> >> >> Ah, but when you do that, you're subject to MED comparisons, which
>> >> >> have their own set of rules. I was referring to the MED on the
>> >> >> 0.0.0.0/0 route.
>> >> >>
>> >> >> --
>> >> >> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >> >> Senior CCIE Instructor - IPexpert
>> >> >>
>> >> >> On Fri, Nov 30, 2012 at 7:12 PM, John Neiberger
>> >> >> <jneiberger_at_gmail.com>
>> >> >> wrote:
>> >> >> > No, I increased the MED of the prefix in question, 100.100.100.0/24, in my
>> >> >> > case. The BGP-learned default route is staying at 0 MED.
>> >> >> >
>> >> >> > It seems weird to me, too!
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Nov 30, 2012 at 8:07 PM, Yuri Bank <yuribank_at_gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> So you increased the MED of the default route you're receiving? I find it
>> >> >> >> interesting that it's the actual metric of each protocol being compared,
>> >> >> >> regardless of the prefix-length or AD.
>> >> >> >>
>> >> >> >> -Yuri
>> >> >> >>
>> >> >> >>
>> >> >> >> On Fri, Nov 30, 2012 at 7:02 PM, Marko Milivojevic
>> >> >> >> <markom_at_ipexpert.com>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> I knew it was a good guess. That's one of my favorites with BGP. It
>> >> >> >>> catches people unawares all the time :-).
>> >> >> >>>
>> >> >> >>> Now, I think Cisco is well within their rights not to touch that part
>> >> >> >>> of the documentation. The next-hop is *usually* reachable via IGP.
>> >> >> >>> There are very rare circumstances when the next-hop is reachable via
>> >> >> >>> BGP *and* is valid for more than hold-down. It seems like you hit one
>> >> >> >>> of those :-)
>> >> >> >>>
>> >> >> >>> Fun.
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >> >> >>> Senior CCIE Instructor - IPexpert
>> >> >> >>>
>> >> >> >>> On Fri, Nov 30, 2012 at 6:55 PM, John Neiberger
>> >> >> >>> <jneiberger_at_gmail.com>
>> >> >> >>> wrote:
>> >> >> >>> > You are correct! I just did a test by creating a route map to bump up the
>> >> >> >>> > MED of the prefix in question and it changed the behavior. That proved that
>> >> >> >>> > even though one path no longer has an IGP metric to compare, a metric is
>> >> >> >>> > still being compared. Maybe Cisco needs to change their documentation to say
>> >> >> >>> > that one of the steps is to compare the metrics, not just "IGP metrics". :-)
>> >> >> >>> >
>> >> >> >>> > Thanks!
>> >> >> >>> > John
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > On Fri, Nov 30, 2012 at 7:37 PM, Marko Milivojevic
>> >> >> >>> > <markom_at_ipexpert.com>
>> >> >> >>> > wrote:
>> >> >> >>> >>
>> >> >> >>> >> Without going any deeper (some topology information is missing and my
>> >> >> >>> >> pod is otherwise too busy to try this, no matter how FUN it sounds), I'd
>> >> >> >>> >> venture a guess that yes, the "igp" metric is compared.
>> >> >> >>> >>
>> >> >> >>> >> The "igp metric" in this sense is really "the metric to reach the
>> >> >> >>> >> next-hop, no matter which protocol supplies the route". In your case,
>> >> >> >>> >> one of these protocols happens to be BGP. You may want to test this
>> >> >> >>> >> hypothesis by tweaking the BGP MED value for the default route to make
>> >> >> >>> >> it numerically higher than the OSPF cost to reach the next-hop of the
>> >> >> >>> >> other route.
>> >> >> >>> >>
>> >> >> >>> >> Funnily enough, this is one of the few places where numerical metric
>> >> >> >>> >> values of different protocols are directly compared, regardless of the
>> >> >> >>> >> AD and/or longest-match.
>> >> >> >>> >>
>> >> >> >>> >> --
>> >> >> >>> >> Marko Milivojevic - CCIE #18427 (SP R&S)
>> >> >> >>> >> Senior CCIE Instructor - IPexpert
>> >> >> >>> >>
>> >> >> >>> >> On Fri, Nov 30, 2012 at 6:21 PM, John Neiberger
>> >> >> >>> >> <jneiberger_at_gmail.com>
>> >> >> >>> >> wrote:
>> >> >> >>> >> > I posted this question to the Cisco NSP list and I've also talked to a
>> >> >> >>> >> > couple of guys from Cisco Advanced Services and I'm still stumped about
>> >> >> >>> >> > something. I'll try my best to phrase it in a way that makes sense.
>> >> >> >>> >> >
>> >> >> >>> >> > Router A is learning about a prefix from two route reflector clients. In
>> >> >> >>> >> > both cases, the next hop for the prefix is the loopback address of the
>> >> >> >>> >> > advertising router. Their loopback addresses are being advertised into
>> >> >> >>> >> > OSPF.
>> >> >> >>> >> >
>> >> >> >>> >> > So, from the perspective of Router A, its BGP table for this prefix has
>> >> >> >>> >> > two paths:
>> >> >> >>> >> >
>> >> >> >>> >> > 1. 4.4.4.4 (loopback address of Router B, learned via OSPF) * winner due
>> >> >> >>> >> >    to lower IGP metric
>> >> >> >>> >> > 2. 5.5.5.5 (loopback address of Router C, learned via OSPF)
>> >> >> >>> >> >
>> >> >> >>> >> > Now the weirdness begins. A network event occurs that causes the
>> >> >> >>> >> > loopback address of Router C to go away. This shouldn't affect Router A
>> >> >> >>> >> > because it is already selecting the shortest path to the network via
>> >> >> >>> >> > Router B (4.4.4.4).
>> >> >> >>> >> >
>> >> >> >>> >> > However, Router A is also learning a default route via BGP. That means
>> >> >> >>> >> > that even though 5.5.5.5 (loopback of Router C) disappeared and is
>> >> >> >>> >> > unreachable, the router is doing a recursive lookup and keeps the path
>> >> >> >>> >> > in the BGP table; 5.5.5.5 is still reachable, it thinks, by using the
>> >> >> >>> >> > default route.
>> >> >> >>> >> >
>> >> >> >>> >> > The weird thing is that this causes Router A to start using the wrong
>> >> >> >>> >> > path! It seems to be preferring a path with a next hop learned via BGP
>> >> >> >>> >> > to a path with a next hop learned via OSPF. Why would it do this? I see
>> >> >> >>> >> > no documentation that would explain why a BGP-learned next hop is
>> >> >> >>> >> > preferred over an IGP-learned next hop.
>> >> >> >>> >> >
>> >> >> >>> >> > Is the router still comparing IGP metrics even though the "wrong" path
>> >> >> >>> >> > now has no IGP metric?
>> >> >> >>> >> >
>> >> >> >>> >> > It's not changing due to router ID, cluster length, or neighbor IP
>> >> >> >>> >> > address. I checked. So, why is it switching?
>> >> >> >>> >> >
>> >> >> >>> >> > As soon as the BGP session from Router A to Router C times out, the
>> >> >> >>> >> > extraneous path gets removed from the BGP table and the router goes
>> >> >> >>> >> > back to using the correct path it should have been using all along.
>> >> >> >>> >> >
>> >> >> >>> >> > So, is a BGP-learned next hop preferred over an IGP-learned next hop? If
>> >> >> >>> >> > so, why? If not, any idea why my router switches paths? I've turned on
>> >> >> >>> >> > BGP debugging and IP routing debugging and haven't found a suitable
>> >> >> >>> >> > explanation for the switch.
>> >> >> >>> >> >
>> >> >> >>> >> > John
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > Blogs and organic groups at http://www.ccie.net
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > _______________________________________________________________________
>> >> >> >>> >> > Subscription information may be found at:
>> >> >> >>> >> > http://www.groupstudy.com/list/CCIELab.html
Received on Fri Nov 30 2012 - 20:09:54 ART
