BGP and MEDs - Cisco's response and "the rest of the story"

From: Howard C. Berkowitz (hcb@gettcomm.com)
Date: Sun Jan 05 2003 - 13:55:46 GMT-3


At 1:54 AM -0500 1/5/03, cebuano wrote:
>Hi Gang.
>I am enclosing the response I got from the feedback I made to the URL on
>CCO. I hope this helps those reviewing BGP now. This is what I
>received...
>
>Hello Elmer,
>
>Thanks for your feedback. In this document the paths are ordered from
>newest to oldest to explain how deterministic MED influences path
>selection. The way BGP path selection is implemented is far more
>complicated than the simplified way it is explained in the path selection
>doc. The oldest-path rule (point 10 in the path selection doc) was
>introduced to reduce flapping, so that a new route does not displace an
>old stable route and create instabilities in the network through route
>flapping. It also depends on a number of other factors, such as whether
>you already have a BGP best path in the BGP table and whether you have
>just two paths to select from or more than two.
>
>Hope this helps
>
>Regards
>Vivek Baveja

A good response, to which I'd like to add a bit of BGP research work
that might help understand why some of these knobs are being
introduced.

The idea of route flap comes up fairly often here, but I usually see
it in the context of a single bad link being communicated between two
adjacent AS. The global problem is more subtle than that.

You see, over the last few years, the fundamental topology of the
Internet has been changing. You used to see average AS path
lengths of 5-10, because there was a relatively small
number of upper-tier carriers that most people eventually connected
to. These upper-tier carriers also enforced hierarchy and hid
instabilities through aggregation.

The current problem, however, is that the old hierarchical model (not
the single core model of EGP, but that of BGP-4) is largely broken due
to operational trends. In Geoff Huston's terms, the net has "flattened".
AS path lengths tend to be more on the order of 2-3, due to much more
user-level multihoming. This is good from the standpoint of
protecting users against immediate upstream failure, and also allows
more traffic engineering.

It is bad, however, because when aggregation breaks down, you start
seeing a stale data problem much as you do in distance vector IGPs
when split horizon, holddown, etc., are turned off or set to high
timer values. Since a BGP speaker, under standard assumptions
(graceful restart helps with this problem), must withdraw all its
routes when it hears that a speaker has gone down or it can no longer
reach an AS, far more withdrawals are being announced. This is
subtly different from the problem that route flap dampening solves,
which is oscillation between advertisements and withdrawals.
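
As a rough illustration of what dampening reacts to, here is a minimal
Python sketch of the usual penalty-and-decay idea. It is a sketch under
stated assumptions, not any vendor's implementation: the constants are
illustrative values, and the names decay() and replay() are hypothetical.

```python
# Illustrative constants (assumptions, not taken from the message above).
HALF_LIFE = 15 * 60        # seconds for the penalty to decay by half
PENALTY_PER_FLAP = 1000    # penalty added for each flap
SUPPRESS_LIMIT = 2000      # route is suppressed above this penalty
REUSE_LIMIT = 750          # route is usable again below this penalty


def decay(penalty, elapsed):
    """Exponentially decay a penalty over `elapsed` seconds."""
    return penalty * 0.5 ** (elapsed / HALF_LIFE)


def replay(flap_times, now):
    """Replay flaps (times in seconds) and return (penalty, suppressed) at `now`."""
    penalty, suppressed, last = 0.0, False, 0.0
    for t in sorted(flap_times):
        penalty = decay(penalty, t - last)
        if penalty < REUSE_LIMIT:
            suppressed = False
        penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_LIMIT:
            suppressed = True
        last = t
    penalty = decay(penalty, now - last)
    if penalty < REUSE_LIMIT:
        suppressed = False
    return penalty, suppressed


# A single withdrawal never crosses the suppress limit.
print(replay([0], now=60))
# A route oscillating once a minute does.
print(replay([0, 60, 120, 180], now=240))
```

Under these assumed values a one-off withdrawal is never suppressed,
while a route that oscillates every minute is. A burst of one-time
withdrawals for many different prefixes therefore passes straight
through a mechanism of this kind.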

There had been an implicit operational assumption that "Bad news
travels fast" -- i.e., withdrawals propagate faster than
announcements. Detailed observations by Labovitz's team, CAIDA,
Huston, etc., indicate this isn't the case. What we see is not a
flap, but a huge number of often redundant announcements.
Deterministic MED helps rule these out, although there still is a
substantial increase in processing load to get to the MED decision.
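
To see why the order of comparison matters at all, here is a minimal
Python sketch. It is not Cisco's implementation. It assumes a
deliberately reduced rule set (MED compared only between paths from the
same neighbor AS, then eBGP over iBGP, then lowest router ID), and the
Path fields, the three paths, and the function names are all
hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Path:
    name: str
    neighbor_as: int
    med: int
    ebgp: bool
    router_id: int  # assumed unique per path


def better(a, b):
    """Pick the preferred of two paths under the simplified rules."""
    # MED is only comparable between paths from the same neighbor AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    if a.ebgp != b.ebgp:
        return a if a.ebgp else b
    return a if a.router_id <= b.router_id else b


def sequential_best(paths):
    """Compare pairwise in list order, carrying the winner forward."""
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p)
    return best


def deterministic_best(paths):
    """Group by neighbor AS, pick each group's best, then compare winners."""
    groups = {}
    for p in paths:
        groups.setdefault(p.neighbor_as, []).append(p)
    winners = [sequential_best(g) for g in groups.values()]
    return sequential_best(winners)


PATHS = [
    Path("P1", neighbor_as=100, med=200, ebgp=True, router_id=2),
    Path("P2", neighbor_as=500, med=150, ebgp=True, router_id=1),
    Path("P3", neighbor_as=500, med=100, ebgp=False, router_id=3),
]

# Try every arrival order and collect the winners each method produces.
seq = sorted({sequential_best(list(o)).name for o in permutations(PATHS)})
det = sorted({deterministic_best(list(o)).name for o in permutations(PATHS)})
print(seq)  # ['P1', 'P2', 'P3']
print(det)  # ['P1']
```

With these made-up paths, pairwise comparison in arrival order can pick
any of the three depending on which path arrived when, while grouping by
neighbor AS first always picks the same one. The grouping is also where
the extra processing comes from.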

Is there a well-understood solution? No. A few short-term proposals,
like Huston's new NOPEER well-known community, will help. But a
large part of the problem is that path vector does not scale well
with the evolving topology. There are discussions and research about
new global routing paradigms, but it is extremely difficult to get
research funding for a problem that could well cause a meltdown, but
probably not for 5 years or so.

Is this a full discussion? Of course not. The IRTF-RR mailing list,
NANOG, and some of the other research lists are where this is being
discussed. Cisco and Juniper are actively involved, but haven't put
large research funding into the problem -- a function of the economy
and stock market expectations.


