Re: Cat6500 High Availability Question

From: Marko Milivojevic (markom@markom.info)
Date: Mon Oct 27 2008 - 13:54:52 ARST


On Mon, Oct 27, 2008 at 15:40, Joseph Brunner <joe@affirmedsystems.com> wrote:
> The problem I have with single sup's is there is almost ALWAYS something
> that will not come back from a chassis failure... most likely it's something
> that is only on one "core" switch. This is true of smaller and larger
> organizations... where the core has become the fastest, most central place
> to plug in something that should be at the access layer (i.e. the san or the
> exchange cluster, etc.)

Almost a valid point; however, breaking the architecture is hardly a
reason for more investment. Then again, your architecture may be
designed to allow such exceptions. I usually fight them until I'm
escorted out of the room by fine gentlemen with batons. In 100% of the
cases where the architecture was broken, I had the pleasure of saying
"I told you so" a few months later.

Connecting a SAN or an Exchange cluster to the core is perfectly fine
(and it doesn't belong to the access layer, btw). For those things,
there is almost no excuse not to have them dual-connected. If you put
dual Sups in the core, how is that going to prevent a failure caused
by a cut cable between the SAN and the core? If you have two cores,
why not connect to both? Hardly an argument for 2x $45k+ ...
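
Just to illustrate what I mean by letting the design absorb the
failure - dual-home the device to both cores and run HSRP between
them for the gateway. Something roughly like this (VLAN, interface
and addresses are made up for the example, obviously):

  ! Core-A - normally the active gateway for the server/SAN VLAN
  interface Vlan100
   ip address 10.1.100.2 255.255.255.0
   standby 100 ip 10.1.100.1
   standby 100 priority 110
   standby 100 preempt

  ! Core-B - standby gateway, takes over if Core-A dies entirely
  interface Vlan100
   ip address 10.1.100.3 255.255.255.0
   standby 100 ip 10.1.100.1
   standby 100 priority 90
   standby 100 preempt

Lose a whole chassis, a Sup, or a cable and the other core picks up
the virtual IP. No second Sup required for that.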

> So considering I never want the failure of a chassis, because it will cause
> an outage, I always go with dual sups! I have never had an issue where a
> native IOS switch running SSO hangs when failing between sup's... on the
> contrary, I have ALWAYS had an issue where a "core switch" with only one sup
> has failed and something has lost its SOLE network connection. Be that a
> server or a SINGLE MPLS router, etc.

100 men, 100 opinions, of course. Like I said, I always prefer a
well-designed network architecture that can handle a failed node.
Heck, I had a core node failure in the backbone a few weeks ago and we
didn't even bother sending out a failure notice to customers - only a
"protection at risk" notice. The network simply rerouted around the
failure in ~70ms. And yeah, the failed node didn't boot back up
properly, but we had plenty of time to fix it, as no services were
affected.
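
For what it's worth, if you do go dual Sup, SSO plus NSF is the bare
minimum so the data plane keeps forwarding through the switchover.
Roughly this on a native IOS box (the OSPF process number is just an
example):

  redundancy
   mode sso
  !
  router ospf 1
   nsf
  !
  ! verify the standby Sup is actually in STANDBY HOT
  show redundancy states

If the standby never reaches STANDBY HOT, the second Sup is just an
expensive space heater.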

--
Marko
CCIE #18427 (SP)
My network blog: http://cisco.markom.info/
