Re: BGP Path Selection weirdness regarding next hops

From: John Neiberger <jneiberger_at_gmail.com>
Date: Fri, 30 Nov 2012 21:01:07 -0700

I just tried it and didn't have the hoped-for results. Here is the BGP
table before the "outage":

   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0            1.1.1.1            250             0 7922 i
* i100.100.100.0/24   5.5.5.5              0    100      0 ?
* i                   5.5.5.5              0    100      0 ?
*>i                   4.4.4.4              0    100      0 ?

Then, after removing 5.5.5.5 from OSPF:

R2#show ip bgp 100.100.100.0
BGP routing table entry for 100.100.100.0/24, version 6
Paths: (3 available, best #1, table Default-IP-Routing-Table)
Flag: 0x900
  Advertised to update-groups:
        1 2 3
  Local, (Received from a RR-client)
    5.5.5.5 from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
  Local
    5.5.5.5 from 23.23.23.3 (35.35.35.3)
      Origin incomplete, metric 0, localpref 100, valid, internal
      Originator: 5.5.5.5, Cluster list: 35.35.35.3
  Local, (Received from a RR-client)
    4.4.4.4 (metric 101) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 0, localpref 100, valid, internal

It still switched. It seems as if I have to change the MED of the routes in
the BGP table, but that is a whole different mess and very complicated in
the long run.
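
As a rough way to picture the tie-break that seems to be deciding things in
the output above, here is a minimal Python sketch. It is not how IOS
implements best-path selection; Path, metric_to_next_hop and pick_best are
hypothetical names, most steps of the real algorithm are left out, and
treating a path shown without a "(metric N)" annotation as metric 0 is an
assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    next_hop: str
    local_pref: int
    med: int
    # Metric of whatever route resolves the next hop (hypothetical field).
    metric_to_next_hop: int


def pick_best(paths: List[Path]) -> Path:
    """Simplified tie-break: highest local-pref, then lowest MED, then
    lowest metric to the next hop. Weight, AS-path length, origin,
    router ID and the other real steps are deliberately omitted."""
    return min(paths, key=lambda p: (-p.local_pref, p.med, p.metric_to_next_hop))


# Values copied from the "show ip bgp 100.100.100.0" output above.
# ASSUMPTION: no "(metric N)" shown for 5.5.5.5 is modelled as metric 0.
paths = [
    Path("5.5.5.5", local_pref=100, med=0, metric_to_next_hop=0),
    Path("4.4.4.4", local_pref=100, med=0, metric_to_next_hop=101),
]

best = pick_best(paths)
print(best.next_hop)
```

Under those assumptions the 5.5.5.5 path wins on the last key alone, since
everything else is tied; the 4.4.4.4 path would only win if the other next
hop's metric were above 101.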

This is a tough nut to crack! On IOS XR, I could use BGP next hop tracking
with a policy that sets a minimum prefix length for any route used to
resolve a BGP next hop. For example, I could make sure that nothing shorter
than a /30 is ever used to resolve a next hop, which would rule out the
default route. I have no idea how to solve this problem in IOS yet. lol
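
To illustrate the minimum-prefix-length idea, here is a small Python sketch
of next-hop resolution with a length floor. This is only a model of the
concept, not IOS XR syntax or behavior; resolve_next_hop, min_len and the
RIB contents (including the /32 loopback) are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (prefix, source protocol)


def resolve_next_hop(next_hop: str, rib: List[Route], min_len: int = 0) -> Optional[Route]:
    """Longest-prefix match for next_hop, ignoring any route whose
    prefix length is shorter than min_len. Returns None if nothing
    qualifies, i.e. the next hop is treated as unreachable."""
    addr = ipaddress.ip_address(next_hop)
    candidates = [
        (prefix, proto)
        for prefix, proto in rib
        if ipaddress.ip_network(prefix).prefixlen >= min_len
        and addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


# Hypothetical RIB after 5.5.5.5/32 has dropped out of OSPF.
rib = [("0.0.0.0/0", "bgp"), ("4.4.4.4/32", "ospf")]

print(resolve_next_hop("5.5.5.5", rib))              # ('0.0.0.0/0', 'bgp')
print(resolve_next_hop("5.5.5.5", rib, min_len=30))  # None
print(resolve_next_hop("4.4.4.4", rib, min_len=30))  # ('4.4.4.4/32', 'ospf')
```

With no floor, 5.5.5.5 still resolves through the default route, which is
the situation described above. With a floor of 30, it fails to resolve, so
in this model the path through 5.5.5.5 would drop out of consideration.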

On Fri, Nov 30, 2012 at 8:53 PM, Marko Milivojevic <markom_at_ipexpert.com> wrote:

> ... assuming it's the route that will be used to evaluate the next-hop
> for your BGP prefix.
>
> --
> Marko Milivojevic - CCIE #18427 (SP R&S)
> Senior CCIE Instructor - IPexpert
>
> On Fri, Nov 30, 2012 at 7:49 PM, John Neiberger <jneiberger_at_gmail.com>
> wrote:
> > Ah! I misunderstood. I'll take off my current route maps and add one to
> > change the MED of the default route.
> >
> >
> > On Fri, Nov 30, 2012 at 8:38 PM, Marko Milivojevic <markom_at_ipexpert.com>
> > wrote:
> >>
> >> Ah, but when you do that, you're subject to MED comparisons, which
> >> have their own set of rules. I was referring to the MED on the
> >> 0.0.0.0/0
> >>
> >> --
> >> Marko Milivojevic - CCIE #18427 (SP R&S)
> >> Senior CCIE Instructor - IPexpert
> >>
> >> On Fri, Nov 30, 2012 at 7:12 PM, John Neiberger <jneiberger_at_gmail.com>
> >> wrote:
> >> > No, I increased the MED of the prefix in question, 100.100.100.0/24,
> >> > in my case. The BGP-learned default route is staying at 0 MED.
> >> >
> >> > It seems weird to me, too!
> >> >
> >> >
> >> > On Fri, Nov 30, 2012 at 8:07 PM, Yuri Bank <yuribank_at_gmail.com>
> >> > wrote:
> >> >>
> >> >> So you increased the MED of the default route you're receiving? I
> >> >> find it interesting that it's the actual metric of each protocol
> >> >> being compared, regardless of the prefix-length or AD.
> >> >>
> >> >> -Yuri
> >> >>
> >> >>
> >> >> On Fri, Nov 30, 2012 at 7:02 PM, Marko Milivojevic
> >> >> <markom_at_ipexpert.com>
> >> >> wrote:
> >> >>>
> >> >>> I knew it was a good guess. That's one of my favorites with BGP. It
> >> >>> gets people unawares all the time :-).
> >> >>>
> >> >>> Now, I think Cisco is well within their rights not to touch that
> >> >>> part of the documentation. The next-hop is *usually* reachable via
> >> >>> IGP. There are very rare circumstances when the next-hop is
> >> >>> reachable via BGP *and* is valid for more than hold-down. It seems
> >> >>> like you hit one of those :-)
> >> >>>
> >> >>> Fun.
> >> >>>
> >> >>> --
> >> >>> Marko Milivojevic - CCIE #18427 (SP R&S)
> >> >>> Senior CCIE Instructor - IPexpert
> >> >>>
> >> >>> On Fri, Nov 30, 2012 at 6:55 PM, John Neiberger
> >> >>> <jneiberger_at_gmail.com> wrote:
> >> >>> > You are correct! I just did a test by creating a route map to bump
> >> >>> > up the MED of the prefix in question and it changed the behavior.
> >> >>> > That proved that even though one path now doesn't have an IGP
> >> >>> > metric to compare, a metric is still being compared. Maybe Cisco
> >> >>> > needs to change their documentation to say that one of the steps
> >> >>> > is to compare the metrics, not just "IGP metrics". :-)
> >> >>> >
> >> >>> > Thanks!
> >> >>> > John
> >> >>> >
> >> >>> >
> >> >>> > On Fri, Nov 30, 2012 at 7:37 PM, Marko Milivojevic
> >> >>> > <markom_at_ipexpert.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> Without going any deeper (some topology information is missing
> >> >>> >> and my pod is otherwise busy to try this, no matter how FUN it
> >> >>> >> sounds), I'd venture a guess that yes, "igp" metric is compared.
> >> >>> >>
> >> >>> >> The "igp metric" in this sense is really "the metric to reach the
> >> >>> >> protocol, no matter what that protocol might be". In your case,
> >> >>> >> one of these protocols happens to be BGP. You may want to test
> >> >>> >> this hypothesis by tweaking the BGP's MED value for the default
> >> >>> >> route to make it numerically higher than OSPF cost to reach the
> >> >>> >> next-hop of the other route.
> >> >>> >>
> >> >>> >> Funnily enough, this is one of the few places where numerical
> >> >>> >> metric values of different protocols are directly compared,
> >> >>> >> regardless of the AD and/or longest-match.
> >> >>> >>
> >> >>> >> --
> >> >>> >> Marko Milivojevic - CCIE #18427 (SP R&S)
> >> >>> >> Senior CCIE Instructor - IPexpert
> >> >>> >>
> >> >>> >> On Fri, Nov 30, 2012 at 6:21 PM, John Neiberger
> >> >>> >> <jneiberger_at_gmail.com>
> >> >>> >> wrote:
> >> >>> >> > I posted this question to the Cisco NSP list and I've also
> >> >>> >> > talked to a couple of guys from Cisco Advanced Services and I'm
> >> >>> >> > still stumped about something. I'll try my best to phrase it in
> >> >>> >> > a way that makes sense.
> >> >>> >> >
> >> >>> >> > Router A is learning about a prefix from two route reflector
> >> >>> >> > clients. In both cases, the next hop for the prefix is the
> >> >>> >> > loopback address of the advertising router. Their loopback
> >> >>> >> > addresses are being advertised into OSPF.
> >> >>> >> >
> >> >>> >> > So, from the perspective of Router A, its BGP table for this
> >> >>> >> > prefix has two paths:
> >> >>> >> >
> >> >>> >> > 1. 4.4.4.4 (loopback address of Router B, learned via OSPF)
> >> >>> >> >    * winner due to lower IGP metric
> >> >>> >> > 2. 5.5.5.5 (loopback address of Router C, learned via OSPF)
> >> >>> >> >
> >> >>> >> > Now for the weirdness to begin. A network event occurs that
> >> >>> >> > causes the loopback address of Router C to go away. This
> >> >>> >> > shouldn't affect Router A because it is already selecting the
> >> >>> >> > shortest path to the network via Router B (4.4.4.4).
> >> >>> >> >
> >> >>> >> > However, Router A is also learning a default via BGP. That means
> >> >>> >> > that even though 5.5.5.5 (loopback of Router C) disappeared and
> >> >>> >> > is unreachable, the router is doing a recursive lookup and keeps
> >> >>> >> > the path in the BGP table; 5.5.5.5 is still reachable, it
> >> >>> >> > thinks, by using the default route.
> >> >>> >> >
> >> >>> >> > The weird thing is that this causes Router A to start using the
> >> >>> >> > wrong path! It seems to be preferring a path with a next hop
> >> >>> >> > learned via BGP to a path with a next hop learned via OSPF. Why
> >> >>> >> > would it do this? I see no documentation that would explain why
> >> >>> >> > a BGP-learned next hop is preferred over an IGP-learned next
> >> >>> >> > hop.
> >> >>> >> >
> >> >>> >> > Is the router still comparing IGP metrics even though the
> >> >>> >> > "wrong" path now has no IGP metric?
> >> >>> >> >
> >> >>> >> > It's not changing due to router ID, cluster length, or neighbor
> >> >>> >> > IP address. I checked. So, why is it switching?
> >> >>> >> >
> >> >>> >> > As soon as the BGP session from Router A to Router C times out,
> >> >>> >> > the extraneous path gets removed from the BGP table and the
> >> >>> >> > router goes back to using the correct path it should have been
> >> >>> >> > using all along.
> >> >>> >> >
> >> >>> >> > So, is a BGP-learned next hop preferred over an IGP-learned next
> >> >>> >> > hop? If so, why? If not, any idea why my router switches paths?
> >> >>> >> > I've turned on BGP debugging and IP routing debugging and
> >> >>> >> > haven't found a suitable explanation for the switch.
> >> >>> >> >
> >> >>> >> > John
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > Blogs and organic groups at http://www.ccie.net
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > _______________________________________________________________________
> >> >>> >> > Subscription information may be found at:
> >> >>> >> > http://www.groupstudy.com/list/CCIELab.html

Received on Fri Nov 30 2012 - 21:01:07 ART

This archive was generated by hypermail 2.2.0 : Tue Jan 01 2013 - 09:36:52 ART