From: Howard C. Berkowitz (hcb@gettcomm.com)
Date: Sun Jan 05 2003 - 13:55:46 GMT-3
At 1:54 AM -0500 1/5/03, cebuano wrote:
>Hi Gang.
>I am enclosing the response I got from the feedback I made to the URL on
>CCO. I hope this helps those reviewing BGP now. This is what I
>received...
>
>Hello Elmer,
>
>Thanks for your feedback. In this document the paths are ordered from
>newest to oldest in order to explain how deterministic MED influences
>path selection.  The way BGP path selection is implemented is far more
>complicated than the simplified way it is explained in the path
>selection doc. The oldest-path rule (point 10 in the path selection
>doc) was introduced to reduce flapping: a new route should not
>displace an old, stable route and create instabilities in the
>network. The outcome also depends on a number of other factors, such
>as whether you already have a BGP best path in the BGP table and
>whether you have just two paths to select from or more than two.
>
>Hope this helps
>
>Regards
>Vivek Baveja
A good response, to which I'd like to add a bit of BGP research work 
that might help understand why some of these knobs are being 
introduced.
The idea of route flap comes up fairly often here, but I usually see 
it in the context of a single bad link being communicated between two 
adjacent ASes.  The global problem is more subtle than that.
You see, over the last few years, the fundamental topology of the 
Internet has been changing.  Typically, you used to see AS path 
lengths of 5-10 as an average, because there was a relatively small 
number of upper-tier carriers that most people eventually connected 
to.  These upper-tier carriers also enforced hierarchy and hid 
instabilities through aggregation.
The current problem, however, is that the old hierarchical model (not 
the single-core model of EGP, but that of BGP-4) is largely broken due to 
operational trends. In Geoff Huston's terms, the net has "flattened". 
AS path lengths tend to be more on the order of 2-3, due to much more 
user-level multihoming.  This is good from the standpoint of 
protecting users against immediate upstream failure, and also allows 
more traffic engineering.
It is bad, however, because when aggregation breaks down, you start 
seeing a stale data problem much as you do in distance vector IGPs 
when split horizon, holddown, etc., are turned off or set to high 
timer values. Since a BGP speaker, under standard assumptions 
(graceful restart helps with this problem), must withdraw all its 
routes when it hears that a speaker has gone down or it can no longer 
reach an AS, far more withdrawals are announced. This is 
subtly different from the problem that route flap dampening solves, 
which is oscillation between advertisements and withdrawals.
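For contrast, here is a minimal sketch of what dampening does about 
oscillation: each flap adds a penalty that decays exponentially, and 
the route is suppressed while the penalty is high. The class name, 
method names, and numeric defaults (900-second half-life, suppress at 
2000, reuse at 750, 1000 per flap) are illustrative assumptions, not 
taken from any particular implementation.

```python
class DampenedRoute:
    """Toy per-route flap dampening state (hypothetical names and defaults)."""

    def __init__(self, half_life=900.0, suppress=2000.0,
                 reuse=750.0, per_flap=1000.0):
        self.half_life = half_life  # seconds for the penalty to halve
        self.suppress = suppress    # penalty at which the route is suppressed
        self.reuse = reuse          # penalty below which it is usable again
        self.per_flap = per_flap    # penalty added for each flap
        self.penalty = 0.0
        self.last = 0.0             # time of the last penalty update
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since the last update.
        elapsed = now - self.last
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last = now

    def flap(self, now):
        # Record one flap at time `now` (seconds).
        self._decay(now)
        self.penalty += self.per_flap
        if self.penalty >= self.suppress:
            self.suppressed = True

    def usable(self, now):
        # A suppressed route comes back once the penalty decays below `reuse`.
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return not self.suppressed
```

Note that nothing in this mechanism reduces the number of withdrawals 
sent after a single real failure; it only penalizes a route that 
keeps going up and down.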
There had been an implicit operational assumption that "Bad news 
travels fast" -- i.e., withdrawals propagate faster than 
announcements. Detailed observations by Labovitz's team, CAIDA, 
Huston, etc., indicate this isn't the case.  What we see is not a 
flap, but a huge number of often redundant announcements. 
Deterministic MED helps rule these out, although there still is a 
substantial increase in processing load to get to the MED decision.
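To make the deterministic MED idea concrete, here is a minimal 
sketch. It assumes a drastically simplified decision process: MED is 
compared only between paths from the same neighbor AS, and a single 
IGP metric stands in for every later tie-breaker. The Path fields and 
function names are hypothetical, and real implementations apply many 
more steps.

```python
from dataclasses import dataclass
from itertools import groupby, permutations


@dataclass(frozen=True)
class Path:
    name: str
    neighbor_as: int
    med: int
    igp_metric: int  # stand-in for all later tie-breakers (assumption)


def better(a, b):
    # MED is only comparable between paths from the same neighbor AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp_metric <= b.igp_metric else b


def sequential_best(paths):
    # Pairwise comparison in arrival order: the result depends on the order.
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p)
    return best


def deterministic_best(paths):
    # Pick the best path per neighbor AS first, then compare the winners.
    by_as = sorted(paths, key=lambda p: p.neighbor_as)
    winners = [sequential_best(list(group))
               for _, group in groupby(by_as, key=lambda p: p.neighbor_as)]
    return sequential_best(winners)


paths = [
    Path("A", neighbor_as=500, med=150, igp_metric=10),
    Path("B", neighbor_as=100, med=200, igp_metric=20),
    Path("C", neighbor_as=500, med=100, igp_metric=30),
]

sequential_results = {sequential_best(list(p)).name for p in permutations(paths)}
deterministic_results = {deterministic_best(list(p)).name for p in permutations(paths)}
```

With these three made-up paths, the sequential comparison can pick a 
different winner depending on the order in which the paths arrived, 
while the grouped comparison gives the same answer for every 
ordering. The grouping pass is also where the extra processing comes 
from.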
Is there a well-understood solution? No. A few short-term proposals, 
like Huston's new NOPEER well-known community, will help.  But a 
large part of the problem is that path vector does not scale well 
with the evolving topology.  There are discussions and research about 
new global routing paradigms, but it is extremely difficult to get 
research funding for a problem that could well cause a meltdown, but 
not for 5 years or so.
Is this a full discussion? Of course not. The IRTF-RR mailing list, 
NANOG, and some of the other research lists are where this is being 
discussed. Cisco and Juniper are actively involved, but haven't put 
large research funding into the problem -- a function of the economy 
and stock market expectations.