RE: Cat6500 HighAvailability Question

From: Joseph Brunner (joe@affirmedsystems.com)
Date: Mon Oct 27 2008 - 13:40:20 ARST

Next message: Joseph Brunner: "RE: Traffic generator ( Need advise for the best so i can"
Previous message: Ahmed Elhoussiny: "Traffic generator ( Need advise for the best so i can perchase"
Maybe in reply to: Han Solo: "Cat6500 HighAvailability Question"
Next in thread: Marko Milivojevic: "Re: Cat6500 HighAvailability Question"

>With all that in mind, if you look into design guidelines for core and
>distribution, you will see that redundant sups are not really advertised
>heavily, even by Cisco. They are proponents of providing network-level
>redundancy for those layers of the network. With all the modern features
>available to a network designer, like MPLS-TE based FastReroute (MPLS
>features are just fine in the Enterprise core, not only SP), link and node
>protection, having nodes fully redundant is an unnecessary cost and
>complexity, better done without.

>Access layer is a different thing. Unless you can provide dual connections
>to every workstation and then, in a scalable way, solve all the misery with
>operating system drivers, broken NICs and all sorts of PEBKAC issues,
>network redundancy is not an option. This is the layer where you want to
>have redundancy, yet... this is where we least have it available. The first
>real redundancy is available in 4510-R (or was there 4507-R?) - smaller
>switches support none of it (3750 stacking adds capacity, but redundancy
>for ports on the failed switch is still absent).

The problem I have with single sups is that there is almost ALWAYS something
that will not come back from a chassis failure... most likely something
that is connected to only one "core" switch. This is true of smaller and
larger organizations... where the core has become the fastest, most central
place to plug in something that should be at the access layer (e.g. the SAN
or the Exchange cluster).

So, since I never want a chassis to fail, because it will cause an outage,
I always go with dual sups! I have never had an issue where a native IOS
switch running SSO hangs when failing over between sups... on the contrary,
I have ALWAYS had an issue where a "core switch" with only one sup has
failed and something has lost its SOLE network connection, be that a
server, a SINGLE MPLS router, etc.
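
One rough way to check for this exposure ahead of time is to list every device whose only attachment is a single core chassis. The sketch below is illustrative only: the device names, the `attachments` table and the `stranded_by` helper are all made up for the example and do not describe any real network.

```python
# Hypothetical inventory: device -> set of switches it is attached to.
# Every name here is an assumption made for illustration.
attachments = {
    "san": {"core1"},
    "exchange-cluster": {"core1"},
    "mpls-router": {"core2"},
    "dist-a": {"core1", "core2"},
    "dist-b": {"core1", "core2"},
}


def stranded_by(failed_switch, attachments):
    """Return the devices whose sole uplink is the failed switch."""
    return sorted(
        device
        for device, switches in attachments.items()
        if switches == {failed_switch}
    )


if __name__ == "__main__":
    for switch in ("core1", "core2"):
        print(switch, "->", stranded_by(switch, attachments))
```

Anything such a check reports is a device that network-level redundancy alone cannot protect, which is the case where a second sup in that chassis earns its keep.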

-Joe

-----Original Message-----
From: nobody@groupstudy.com [mailto:nobody@groupstudy.com] On Behalf Of
Marko Milivojevic
Sent: Monday, October 27, 2008 11:18 AM
To: Han Solo
Cc: Cisco certification
Subject: Re: Cat6500 HighAvailability Question

On Mon, Oct 27, 2008 at 14:55, Han Solo <emaillists@me.com> wrote:
> Interesting... I think a combination of the two is in order. For any type
> of HA you should have system redundancy but also build the network with
> dual uplinks, etc. My big question was really how often folks running,
> say, 12.2(18)SXF8 and above with SSO have issues where their core 6500s,
> set up with tested, properly configured HA, have partial failures in
> which SSO did not fully fail over and hung somewhere in the middle,
> causing all forms of HA to be lost. I had a situation with two core
> 6500s running 12.2(18)SXF13 with dual SUP720-3BXLs, SSO, all
> fabric-enabled 67XX line cards, redundant-mode 6000 watt PSUs, the works.
> We also do manual failover tests quarterly to remove the "Change Control"
> factor during each quarter, but one day we had an issue where the active
> sup on CORE1 failed in such a manner that the standby sup did not fully
> take over, and they were literally flapping back and forth. I was curious
> whether others have run into this, as we are now in the process of having
> to validate our design, which provides both system-level and
> network-level redundancy. My argument is that even though I have a
> fault-redundant network (physical links, routing protocols, features,
> etc.), if the system fails to fail over properly then this is a moot
> point...

That's the whole point. If you have redundancy, you need to make sure
that it actually works. If it doesn't, you have wasted both time and
money.

Redundant nodes work in 99% of the cases, but the one case where
something only partially fails is always a problem (the perfect example
being a flapping link). There is no cheap and easy way to address that
problem, other than to be aware of it. In my experience, it is always
better to have a device fail, than to have it failing.
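
To put rough numbers on that one bad case, here is a minimal sketch of expected yearly downtime for a chassis with a standby sup, where `coverage` is the fraction of active-sup failures in which the standby takes over cleanly. The model and every figure in it (failure rate, switchover time, repair time) are assumptions chosen purely for illustration, not measurements from any 6500.

```python
def expected_downtime_hours(failures_per_year, coverage,
                            switchover_hours, repair_hours):
    """Expected yearly downtime under a simple two-outcome model.

    Each active-sup failure either fails over cleanly (probability
    `coverage`, costing `switchover_hours`) or gets stuck (probability
    1 - coverage, costing `repair_hours` until someone intervenes).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    clean = coverage * switchover_hours
    stuck = (1.0 - coverage) * repair_hours
    return failures_per_year * (clean + stuck)


if __name__ == "__main__":
    # Assumed inputs: one failure every two years, a 5 second
    # switchover, and 4 hours to recover a stuck or dead chassis.
    rate, switchover, repair = 0.5, 5.0 / 3600.0, 4.0
    for coverage in (0.0, 0.99, 1.0):  # 0.0 behaves like a single sup
        hours = expected_downtime_hours(rate, coverage, switchover, repair)
        print(f"coverage={coverage:.2f}: {hours:.4f} h/year")
```

Under these assumed inputs, the 1% of failovers that get stuck contribute far more downtime than the 99% that work, which is why the partial-failure case deserves the attention.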

I would like to raise another issue, and that is complexity. While it's
quite sexy to say that you have a fully redundant network, all nodes
have dual sups, power supplies, blah blah blah, it adds incredible
complexity to the network. This is something that also needs careful
consideration and management, and blindly adding "redundancy" can
actually have a very adverse effect on the availability of the network.

With all that in mind, if you look into design guidelines for core and
distribution, you will see that redundant sups are not really
advertised heavily, even by Cisco. They are proponents of providing
network-level redundancy for those layers of the network. With all the
modern features available to a network designer, like MPLS-TE based
FastReroute (MPLS features are just fine in the Enterprise core, not
only SP), link and node protection, having nodes fully redundant is an
unnecessary cost and complexity, better done without.

Access layer is a different thing. Unless you can provide dual
connections to every workstation and then, in a scalable way, solve all
the misery with operating system drivers, broken NICs and all sorts
of PEBKAC issues, network redundancy is not an option. This is the
layer where you want to have redundancy, yet... this is where we least
have it available. The first real redundancy is available in 4510-R
(or was there 4507-R?) - smaller switches support none of it (3750
stacking adds capacity, but redundancy for ports on the failed switch
is still absent).

Just my 2c.

--
Marko
CCIE #18427 (SP)
My network blog: http://cisco.markom.info/


This archive was generated by hypermail 2.1.4 : Sat Nov 01 2008 - 15:35:23 ARST