There is no "real" ECMP limitation in the protocols per se (only in particular
software implementations); the true limiting factor is hardware
capability. All modern data-center switches (whether based on proprietary or
merchant silicon) support large ECMP fan-outs in hardware and, moreover,
support some form of FIB hierarchy, so that a single ECMP group can be
shared across multiple prefixes. For example, Broadcom Trident+
supports ECMP group sizes well over 64, beyond its physical port count.
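A minimal sketch of the FIB hierarchy described above, assuming a simplified two-level model: prefixes point at a group id, and the group holds the next-hop members. The names `EcmpGroup`, `Fib`, and the CRC32 flow hash are hypothetical stand-ins, not any vendor's SDK or the actual Trident+ hash. It shows why a 128-way fan-out is cheap once the group is shared: the members are stored once, no matter how many prefixes use them.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class EcmpGroup:
    """One shared next-hop group (hypothetical model of a hardware ECMP entry)."""
    members: list[str]

    def select(self, flow: tuple) -> str:
        # Hash the flow 5-tuple so every packet of a flow takes the same member.
        key = repr(flow).encode()
        return self.members[zlib.crc32(key) % len(self.members)]


@dataclass
class Fib:
    """Two-level FIB: prefix -> group id -> next-hop members."""
    groups: dict[int, EcmpGroup] = field(default_factory=dict)
    routes: dict[str, int] = field(default_factory=dict)

    def add_group(self, group_id: int, members: list[str]) -> None:
        self.groups[group_id] = EcmpGroup(members)

    def add_route(self, prefix: str, group_id: int) -> None:
        self.routes[prefix] = group_id

    def lookup(self, prefix: str, flow: tuple) -> str:
        return self.groups[self.routes[prefix]].select(flow)

    def next_hop_entries(self) -> int:
        # Entries actually stored: one copy of each group's member list.
        return sum(len(g.members) for g in self.groups.values())


# 128 spines as next hops, shared by 1000 leaf prefixes.
fib = Fib()
fib.add_group(1, [f"spine{i}" for i in range(128)])
for n in range(1000):
    fib.add_route(f"10.{n // 256}.{n % 256}.0/24", 1)

print(fib.next_hop_entries())  # 128, not 128 * 1000
```

Without sharing, each of the 1000 prefixes would carry its own 128-member list; with the indirection, adding a prefix costs one pointer to the existing group.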
2013/2/28 marc abel <marcabel_at_gmail.com>:
> One general question I have regarding Clos fabrics is how do you deal with so
> many equal-cost paths? Some of the topologies I see have maybe 32 or
> even 128 spine switches, giving you, say, 32 equal-cost paths, but most routing
> protocols usually support only 6, 8, or 16 max paths.
>
> Can anyone explain this to me?
>
>
> On Tue, Feb 26, 2013 at 4:53 AM, Carlos G Mendioroz <tron_at_huapi.ba.ar> wrote:
>
>> Ah, ok, now I see what you mean.
>> You do have independent Clos fabrics (much like storage A/B, but
>> potentially C/D/... :) and then, if one of the spines loses a destination,
>> it would continue to receive traffic for it because of the default route.
>> Thanks.
>>
>>
>> Petr Lapukhov @ 26/02/2013 04:34 -0300 dixit:
>>
>>> The problem is that in a "classic" folded Clos topology there is only a
>>> single L3 path from the "middle stage" (spine) to the input/output stage
>>> (leaf or ToR). Therefore, if a single link between a leaf and a spine fails,
>>> there is no other way around from that spine to that leaf.
>>>
>>> Imagine that you announce only a default route from a spine device to a
>>> leaf, and on that same spine the link to a different leaf fails. The
>>> first leaf switch would not know about the failure, since it only
>>> receives a default route, and will keep sending packets for the other leaf
>>> to the spine with the failed link as well, obliviously following all ECMP
>>> paths, thus effectively black-holing traffic.
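The black-holing scenario above can be sketched as a small simulation, assuming a hypothetical 4-spine folded Clos in which spine S2 has lost its only link to leaf L3. The spine and leaf names are made up, and `flow_id % len(candidates)` is a stand-in for the hardware flow hash, chosen so the arithmetic comes out exact. With a default route only, the source leaf keeps S2 in its ECMP set and a quarter of the flows die; with specific prefixes, S2 withdraws L3 and nothing is lost.

```python
# Hypothetical 4-spine folded Clos; S2 has lost its only link to leaf L3.
SPINES = ["S1", "S2", "S3", "S4"]
FAILED_LINKS = {("S2", "L3")}


def pick_spine(candidates: list[str], flow_id: int) -> str:
    # Stand-in for the hardware flow hash: spreads flows evenly over candidates.
    return candidates[flow_id % len(candidates)]


def delivered_fraction(dst_leaf: str, default_only: bool, flows: int = 10_000) -> float:
    """Fraction of flows from a source leaf to dst_leaf that reach it."""
    if default_only:
        # The source leaf sees only 0.0.0.0/0 from every spine, so all spines
        # stay in its ECMP set, including the one that cannot reach dst_leaf.
        candidates = SPINES
    else:
        # With specific prefixes, a spine that lost dst_leaf withdraws the route,
        # and the source leaf drops that spine from the ECMP set for dst_leaf.
        candidates = [s for s in SPINES if (s, dst_leaf) not in FAILED_LINKS]
    delivered = 0
    for flow_id in range(flows):
        spine = pick_spine(candidates, flow_id)
        # Only one spine-to-leaf link exists, so a failed link means no path.
        if (spine, dst_leaf) not in FAILED_LINKS:
            delivered += 1
    return delivered / flows


print(delivered_fraction("L3", default_only=True))   # 0.75: a quarter of flows black-holed
print(delivered_fraction("L3", default_only=False))  # 1.0
```

The loss fraction in the default-only case is simply (spines that lost the destination) / (total spines), which is why summarization on its own is unsafe in a single-link-per-spine Clos.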
>>>
>>> If you want to allow for route summarization in Clos topologies, you
>>> need to make sure there are at least two parallel paths from each spine to
>>> each leaf (by "compressing" the spine devices and mapping multiple links
>>> from a leaf onto the same spine device). This makes you resilient
>>> to a single link failure, but exposes you to a different problem:
>>> when one of the parallel paths fails, the other one has to pick
>>> up 2x the traffic, often creating congestion. There is always a
>>> tradeoff you have to make...
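The 2x figure above follows from a simple ratio, sketched here under the assumption that traffic was spread evenly over the k parallel spine-to-leaf links and is re-hashed evenly onto the survivors. The function name is hypothetical. Each surviving link carries k / (k - failed) times its normal load, so two parallel links double the load while wider bundles dilute the hit.

```python
def surviving_link_load(parallel_links: int, failed: int = 1) -> float:
    """Load on each surviving link, relative to normal, when `failed` of the
    parallel spine-leaf links go down and their traffic is spread evenly
    over the remaining links (an assumption, not a hardware guarantee)."""
    if failed >= parallel_links:
        raise ValueError("no surviving links: the leaf is cut off from this spine")
    return parallel_links / (parallel_links - failed)


# 2 links: 2.0x; 4 links: ~1.33x; 8 links: ~1.14x
for k in (2, 4, 8):
    print(k, surviving_link_load(k))
```

This is the tradeoff in numbers: more parallel links per spine soften the post-failure load, but each extra link consumes spine ports that would otherwise add fan-out.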
>>>
>>> 2013/2/25 Carlos G Mendioroz <tron_at_huapi.ba.ar>:
>>>
>>>> Interesting :)
>>>> Petr, can you please give me a hint on how a default-only route can lead
>>>> to black-holing (slide 24)? I fail to see how "default only", where by
>>>> definition there are no details, can create a hole.
>>>>
>>>> Thanks,
>>>> -Carlos
>>>>
>>>> Petr Lapukhov @ 23/02/2013 16:48 -0300 dixit:
>>>>
>>>>> There is even more fun when you add centralized routing control there,
>>>>> doing SDN-type stuff with BGP only :)
>>>>>
>>>>> 2013/2/23 Antonio Soares <amsoares_at_netcabo.pt>:
>>>>>
>>>>>>
>>>>>> I found this presentation made by Petr Lapukhov:
>>>>>>
>>>>>>
>>>>>> http://www.nanog.org/meetings/nanog55/abstracts.php?pt=MTk0MiZuYW5vZzU1&nm=nanog55
>>>>>>
>>>>>> BGP to the ToR. No OSPF, no vPC, no L2. Really excellent.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Antonio Soares, CCIE #18473 (R&S/SP)
>>>>>> amsoares_at_netcabo.pt
>>>>>> http://www.ccie18473.net
>>>>>>
>>>>>>
>>>>>> Blogs and organic groups at http://www.ccie.net
>>>>>>
>>>>>> _______________________________________________________________________
>>>>>> Subscription information may be found at:
>>>>>> http://www.groupstudy.com/list/CCIELab.html
>>>>>>
>>>> --
>>>> Carlos G Mendioroz <tron_at_huapi.ba.ar> LW7 EQI Argentina
>>>>
>> --
>> Carlos G Mendioroz <tron_at_huapi.ba.ar> LW7 EQI Argentina
>>
>
>
> --
> Marc Abel
> CCIE #35470
> (Routing and Switching)
>
-- 
Petr Lapukhov, petr_at_INE.com
CCIE #16379 (R&S/Security/SP/Voice)
CCDE #20100007
Internetwork Expert, Inc.
http://www.INE.com
Toll Free: 877-224-8987
Outside US: 775-826-4344

Received on Fri Mar 01 2013 - 08:10:43 ART
This archive was generated by hypermail 2.2.0 : Wed Apr 03 2013 - 19:06:18 ART