RE: To dump the cat 5k or not -- Native IOS mode has no High Availability Failover

From: Colin Barber (Colin.Barber@xxxxxxxxxxxxxx)
Date: Tue Jul 09 2002 - 05:05:31 GMT-3


   
High availability is a problem with native IOS. It's not a problem on the
4000s, because they only have one supervisor, but you are looking at a
reload time of 1.5 minutes or more on a 6K.

Still, there are problems with hybrid mode too. Have you tried to use a
FlexWAN module? The WAN interfaces only appear on the active MSFC. You have
to boot up the second MSFC as primary, configure it, and then swap back. For
any new configuration on the routers, you have to copy the secondary's
config to your workstation, make the change, and TFTP it back, because if
you make the change on the router and wr mem, you will wipe the WAN config,
since it's not in its running config!!

How many times have you had a supervisor or MSFC failure that caused your
standby supervisor/MSFC to kick in? The only problems I have had were an
MSFC running at 99% CPU for 5 minutes, which suddenly cleared, and a
supervisor that started causing problems when a new Ethernet module was
installed. On both occasions, because the supervisor/MSFC was not
completely down, the secondary did not kick in, even though performance was
being affected! Both of these problems were in hybrid mode, not native.

-----Original Message-----
From: jsaxe@Crutchfield.com [mailto:jsaxe@Crutchfield.com]
Sent: 08 July 2002 18:22
To: ccielab@groupstudy.com
Subject: RE: To dump the cat 5k or not -- Native IOS mode has no High Availability Failover

Thank you for your points, Colin. I really do wish I could see 5-minute load
stats on an interface without an SNMP querier. Here's my humble opinion,
based on my experience, my wishes, and my employer's recent purchase of a
Cat6509 with MSFCs. We had been running for a few years on a very early
Cat6509 with no MSFC -- in fact, no PFC either, just the original Supervisor
1 with boring Layer 2 switching. It's been a great, but boring, switch, very
reliable and fast. (I just recently powered it off after more than 18 months
of continuous uptime.) This is not a rant, but it's kinda long; if you have
no interest in Cat6Ks, feel free to hit Delete right now. :-)

Anyway, I was excited to get the new L3-capable supervisors, and after
reading that part of the Cisco LAN Switching book and seeing that Cisco
seemed to be leaning toward IOS everywhere and phasing out CatOS, I thought
I'd try native mode. It takes quite a bit of trouble and time to actually
convert a supervisor between native and hybrid modes, because of all the
flash ROM downloading and stuff, and you have to do it again separately for
the second supervisor, but Cisco's documentation was quite good.

Note carefully: what's called "bootflash" in hybrid mode refers to either
the sup's on-board flash or the MSFC's completely separate on-board flash,
depending on whether you are logged in to the supervisor (CatOS) or the
MSFC (IOS). In native mode, "bootflash" means the MSFC's flash, and
"sup-bootflash" means the supervisor's flash. This is especially confounding
during the conversion process, since the boot parameters need to be changed
to just the right name at just the right time; otherwise you can get stuck
in a situation where you can't get convenient TFTP access to copy flash
images, and you have to break into the startup process with a console cable
and use XMODEM-1K to send a big image at 38,400 baud. Not pleasant. Follow
Cisco's directions carefully. It will help if you have a flash card to
store images temporarily, although you can do it without one, as I did.

So after this whole process, I decided to see what the new baby could do,
and I was *stunned* to discover an important core switch feature missing
from native mode: "high availability" failover of the supervisors. In hybrid
mode, you essentially have four devices in there: one supervisor/PFC that's
really doing wire-speed packet switching, one supervisor/PFC that's powered
up and self-tested but is just standing by, waiting for the active one to
fail, and two completely independent MSFCs that are both running and doing
as much routing as you want to throw their way. They are tightly integrated
in the sense that both MSFCs act as MLS routers for the active PFC, so an
MSFC routes the first packet of a flow, and the PFC "snaps the link" and
routes the other zillion packets in hardware. But basically both MSFCs are
independent, individually IP-addressed, separately configurable routers
running all the time. It is assumed that the customer will want to make the
configurations very similar, in fact almost identical except for slightly
different IP addresses on each VLAN/subnet and probably different HSRP
priorities for each VLAN (to split the upstream gateway workload), so Cisco
has a very nice configuration-sharing option in which you give statements
like...

interface VLAN 1
  ip address 10.1.1.251 255.255.255.0 alt ip address 10.1.1.252 255.255.255.0
  standby 1 ip 10.1.1.1
  standby 1 priority 105 alt standby 1 priority 100

The MSFC in module 1 then automatically gets the first address, and MSFC 2
gets the second listed address. MSFC 1 will be the active HSRP peer for
this VLAN, with MSFC 2 ready to step in if it fails. Not very hard to deal
with, really. The L2 supervisors see the MSFCs as really fast
routers-on-a-stick, and the MSFCs see the L2 switch ports as magical,
ISL-encapsulated VLANs all coming through one big honkin' multi-gigabit
trunk port.
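To make the "alt" sharing concrete, here is a small Python sketch of how one shared config could be split into the MSFC 1 and MSFC 2 versions. It is a hypothetical model only (the function name split_alt is made up, not anything from Cisco), assuming what is described above: the text before "alt" goes to MSFC 1, the text after it goes to MSFC 2, and lines without "alt" are used by both.

```python
def split_alt(config_lines):
    """Return (msfc1_lines, msfc2_lines) from one shared config.

    Hypothetical model of the hybrid-mode 'alt' keyword: a line of the form
    '<cmd> alt <cmd2>' gives <cmd> to MSFC 1 and <cmd2> to MSFC 2; any line
    without ' alt ' is used unchanged by both MSFCs.
    """
    msfc1, msfc2 = [], []
    for line in config_lines:
        # Preserve the original indentation for both halves.
        indent = line[: len(line) - len(line.lstrip())]
        body = line.strip()
        if " alt " in body:
            first, second = body.split(" alt ", 1)
            msfc1.append(indent + first)
            msfc2.append(indent + second)
        else:
            msfc1.append(line)
            msfc2.append(line)
    return msfc1, msfc2


shared = [
    "interface VLAN 1",
    "  ip address 10.1.1.251 255.255.255.0 alt ip address 10.1.1.252 255.255.255.0",
    "  standby 1 ip 10.1.1.1",
    "  standby 1 priority 105 alt standby 1 priority 100",
]
msfc1, msfc2 = split_alt(shared)
print("\n".join(msfc1))
print()
print("\n".join(msfc2))
```

With this split, MSFC 1 ends up with .251 and priority 105 (the active HSRP peer), and MSFC 2 with .252 and priority 100, while the shared virtual address 10.1.1.1 appears in both.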

If one supervisor card fails, there are two failovers to consider: the
switching failover to the other PFC card on the other supervisor, and the
loss of one MSFC. CatOS's "set highavailability enable" feature takes care
of the switching failover beautifully: it statefully transitions to the new
supervisor, with all of spanning tree intact, trunks still trunking,
EtherChannels still aggregated, etc., all in 3 seconds. (Yes, I tried it
many times. Very impressive.) The MSFC just disappears off the face of the
earth, but if HSRP is set up correctly, the other MSFC should assume all its
upstream gateway duties within 10 seconds, or even less if you tune the
"standby timers" shorter. So the end customers should see no more than 10
seconds of downtime, possibly as little as 5 or 3 seconds. Extremely
impressive, really.
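The 10-second figure comes from HSRP's default timers: hellos every 3 seconds and a 10-second hold time. The standby router takes over when it has not heard a hello for the hold time, so in the worst case (the active MSFC dies right after sending a hello) it waits the full hold time. A rough Python sketch of that arithmetic, which ignores gratuitous ARP and any spanning-tree or routing-protocol effects; the class and function names are illustrative only. Shorter values would be set on each MSFC with something like "standby 1 timers 1 3" (the 1 and 3 are just an example).

```python
from dataclasses import dataclass


@dataclass
class HsrpTimers:
    hello: int = 3   # seconds between hellos (HSRP default)
    hold: int = 10   # seconds without a hello before the active is declared dead (default)


def worst_case_takeover(timers: HsrpTimers) -> int:
    """Rough upper bound, in seconds, before the standby router takes over.

    Simplification: the active router fails right after sending a hello, so
    the standby's hold timer has just been reset and runs for the full hold
    time before it declares the active router down.
    """
    if timers.hold <= timers.hello:
        raise ValueError("hold time must be greater than hello time")
    return timers.hold


print(worst_case_takeover(HsrpTimers()))                 # 10 (defaults)
print(worst_case_takeover(HsrpTimers(hello=1, hold=3)))  # 3 (tuned timers)
```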

Native IOS mode is completely different. It has an active and a standby
supervisor, but the standby MSFC is dead to the world until the active one
fails. There's only one config to worry about, because management of the
PFC and MSFC is all in one, but you can't even talk to the standby MSFC
while it's in standby; it runs precisely the same configuration, with the
same IP addresses, etc. So if the primary fails, first of all the switching
does not fail over within 3 seconds: spanning tree goes down and has to
reconverge, and everything will experience at least 30 seconds of outage.
But worse, the second PFC/MSFC combination has to come up from scratch and
read the whole config, so even if the switching did come up in 3 seconds,
the new MSFC is just getting started with its routing protocols and would
have little idea how to route. It's absolutely not the same "hot" failover
that hybrid mode currently offers.
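The "at least 30 seconds" matches classic 802.1D spanning-tree timer arithmetic with default timers: a port that has to start over spends one forward delay (15 s) in listening and another in learning before it forwards. If the failure is not detected directly, the bridge first waits for the old BPDU information to age out (max age, 20 s), giving roughly 50 seconds. A Python sketch of that arithmetic only, not a model of exactly what a native-mode 6500 does on a switchover; the function name is illustrative.

```python
# Classic 802.1D default timers, in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_outage(direct_failure: bool = True,
               max_age: int = MAX_AGE,
               forward_delay: int = FORWARD_DELAY) -> int:
    """Approximate seconds before a port that must transition reaches forwarding.

    The port moves through listening and learning, each lasting forward_delay.
    If the failure is not seen directly (no link-down event), stored BPDU
    information must first age out, which takes max_age.
    """
    wait_for_aging = 0 if direct_failure else max_age
    return wait_for_aging + 2 * forward_delay


print(stp_outage())                      # 30
print(stp_outage(direct_failure=False))  # 50
```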

After discovering this, I went back through the time-consuming procedure to
return to hybrid mode, and now we're in production. For people who don't
even care to buy a second supervisor module (they are expensive), this isn't
an issue; if the one supervisor fails, you will have some downtime no matter
what. But, as at many other companies, this thing is at the center of our
network, and even a few minutes' disruption is going to be pretty painful,
so the lack of high availability in native mode was definitely a deal
breaker for me. I figure at some point Cisco will fold the high-availability
feature into the native IOS Cat6K software, but that might be a very tall
order: the way the standby supervisor "almost completely boots" and receives
its Layer 2 data structures from the active PFC may be very hard to combine
with the way the MSFC "almost completely boots". I'm not holding my breath,
and in the meantime hybrid mode is mildly confusing, but not bad overall.
I'm satisfied.

-----Original Message-----
From: Colin Barber [mailto:Colin.Barber@telewest.co.uk]
Sent: Saturday, July 06, 2002 5:11 AM
To: ccielab@groupstudy.com
Subject: RE: To dump the cat 5k or not

To use a Cat 6000/6500 in native mode, you need MSFC(s).

One of my native 6500s has 278 ports. You don't want to do a show run on
that; you will be there all day! Use TFTP to copy the config, look at the
config in CiscoWorks, etc.

To see one interface: show run interface fa3/33 (this works on any
IOS-based device).

To get a short summary of ports: show interface range (the output is the
same as CatOS).

You can bulk-program interfaces: (config)# interface range ........
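Bulk programming means one block of commands (for example after "interface range fastethernet 3/1 - 48") stands for the same commands on every port in the range. A hypothetical Python helper showing what such a block expands to; the spec format "fa3/1 - 48" is simplified to a single contiguous range on one module, and expand_range/bulk_config are made-up names, not IOS's actual parser.

```python
import re


def expand_range(spec: str) -> list[str]:
    """Expand a simplified 'fa<mod>/<first> - <last>' spec into interface names.

    Hypothetical helper: handles one contiguous range on one module only.
    """
    m = re.fullmatch(r"\s*([A-Za-z]+)(\d+)/(\d+)\s*-\s*(\d+)\s*", spec)
    if m is None:
        raise ValueError(f"unsupported range: {spec!r}")
    prefix, module = m.group(1), m.group(2)
    first, last = int(m.group(3)), int(m.group(4))
    if last < first:
        raise ValueError("range end is before range start")
    return [f"{prefix}{module}/{port}" for port in range(first, last + 1)]


def bulk_config(spec: str, commands: list[str]) -> list[str]:
    """Emit the per-interface config that one range block stands for."""
    lines = []
    for name in expand_range(spec):
        lines.append(f"interface {name}")
        lines.extend(f" {cmd}" for cmd in commands)
    return lines


for line in bulk_config("fa3/1 - 3", ["switchport access vlan 10"]):
    print(line)
```

On a 48-port module, one range block replaces 48 separate interface stanzas, which is the point of the feature on a 278-port chassis.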

Cisco have been saying that they will be migrating all their switches to
IOS, so there is one common CLI for all their products. This is why the
latest switches run IOS: the new 2950 and 3550 series and the Sup III for
the 4000. Once they remove the extra cost for native mode on the Cat 6K,
expect to see more of them.

I personally prefer native mode. It is a bit slower to configure but more
powerful. You can get load stats for an interface without having to use
SNMP. The 6K can do load balancing (extra license required), which acts like
a big LocalDirector. You have separate startup and running configs, so you
can easily roll back a change; you get the debug command; and you don't have
to configure the switch and the MSFC separately. I would use native IOS
more, but not all our switches have MSFCs, and there is the extra cost.

If you read the Cisco Press book Cisco LAN Switching (CCIE series), it talks
about the pros and cons of both switching routers (Cat 8500) and routing
switches (MLS), and says that native IOS has the benefits of both.


