From: Jonathan Hays (nomad@gfoyle.org)
Date: Mon Jun 07 2004 - 08:28:24 GMT-3
(Sorry about the off-topic post).
We had a 7513 hiccup and I'm wondering whether anyone has any ideas why
the router ended up at the rommon prompt instead of booting up on the
slave RSP.
This 7513 has a standard configuration with two RSP4 cards. The RSP4 in
slot 6 is normally the master and the RSP4 in slot 7 is the slave, which
is the standard Cisco configuration. We happened to verify this a couple
of days before the failure by glancing at the master/slave LEDs on the
cards.
Here's a link to "Configuring High System Availability" on the 7513 for
those interested.
http://www.cisco.com/univercd/cc/td/doc/product/core/cis7505/rte_swit/2662rsp4.htm#260398
There's another reference entitled "Configuring High System Availability
on the Cisco 7500 Series" under IOS Configuration Fundamentals
Configuration Guide:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios121/121cgcr/fun_c/fcprt2/fcd205.htm#1001558
After the box was back online I went through this link and verified
everything and the configuration looks okay. None of the fancy HSA
features (such as SLCR, RPR, RPR+, FSU, or SSO) are configured on this
7513; scroll down in the link above for more detail. It has been
configured as a standard cold boot failover, using a common config in
NVRAM. To quote CCO: "Two RSP cards in a router provide the most basic
level of increased system availability through a 'cold restart' feature.
A 'cold restart' means that when one RSP card fails, the other RSP card
reboots the router."
Here's a detailed description of what we saw and did (there were 3 of us
looking at this problem).
1. Monitoring software indicated this 7513 was offline (there are
other 7513s) and we could not access the unit remotely.
2. The documentation states that "if a crash has occurred, the RSP in
the odd slot becomes the active and the RSP in the even slot becomes the
standby" which we could verify because the master/slave LEDs on each RSP
were reversed from normal: the slot 6 RSP4 was the slave and the one in
slot 7 was the master.
3. We connected to the 'Y' console cable (connects to both RSPs but you
only see the master) and found the slot 7 master was at the rommon
prompt, with no indication it was trying to boot.
4. We typed 'boot' but that just brought us up to the setup dialog,
indicating the config file was being bypassed. (This may be a config
register problem - but why hadn't the router reached this point on its
own?)
5. Finally we cycled the power on the 7513 and it came up normally.
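To illustrate the config-register suspicion in step 4, here is a minimal Python sketch of how a Cisco configuration register value is commonly decoded. It assumes the usual layout (bits 0-3 are the boot field, bit 6 tells the router to ignore the NVRAM config). The function name decode_confreg and the sample values 0x2102, 0x2142 and 0x0 are illustrative only; they are not values read from this router.

```python
# Illustrative decoder for a 16-bit Cisco configuration register value.
# Assumed layout: bits 0-3 = boot field, bit 6 = ignore NVRAM startup config.
BOOT_FIELD_MASK = 0x000F
IGNORE_CONFIG_BIT = 0x0040


def decode_confreg(value: int) -> dict:
    """Return the boot behavior implied by a config register value."""
    boot_field = value & BOOT_FIELD_MASK
    if boot_field == 0x0:
        boot_action = "stay at ROM monitor"
    elif boot_field == 0x1:
        boot_action = "boot default/boot-helper image (platform-dependent)"
    else:
        boot_action = "boot per 'boot system' commands"
    return {
        "boot_field": boot_field,
        "boot_action": boot_action,
        "ignores_nvram_config": bool(value & IGNORE_CONFIG_BIT),
    }


if __name__ == "__main__":
    # Sample values only; not taken from the router in question.
    for sample in (0x2102, 0x2142, 0x0):
        print(hex(sample), decode_confreg(sample))
```

Under these assumptions, a boot field of 0 would match a unit sitting at rommon without trying to boot, and bit 6 being set would match a manual 'boot' landing in the setup dialog. Comparing the register value on each RSP would be one way to test that guess.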
As I said, after everything was up and back online, I checked the
various parameters and variables (show bootvar, show bootflash, show
slavebootflash, etc.) and everything seemed to be configured per the
Cisco guidelines in the links given above.
Needless to say, the guy responsible for this router (and other 7500s in
the company) is a bit nervous about this incident, since failover did
not work.
Does anyone have any ideas why the slave RSP ended up at the rommon
prompt or where to begin an investigation?
Thanks,
Jonathan