From: Joe Rothstein (ziutek@mac.com)
Date: Sat Aug 21 2004 - 04:58:45 GMT-3
Here's a real-life oddity that even Cisco can't come up with a good
explanation for.
Two 7206VXR routers (IOS 12.1(14)) go down at exactly the same time.
These routers are connected to two Cat 2950 switches, which are trunked
through a gigabit EtherChannel. Initial investigation shows that the
routers are continuously rebooting themselves about every minute. Cause:
software forced reboot. The routers won't even stay up long enough to
get a sh tech. The router configs are more or less identical, running 10
or so HSRP groups between the routers. Each router has two 2-port
FastEthernet modules. Only one port has HSRP running on it. This port
is of course a trunk port going to each switch, and the HSRP instances
are running over subinterfaces with dot1q encap.
So much for redundancy.
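For anyone picturing the setup, here is a rough sketch of what one HSRP
group on a dot1q subinterface typically looks like in IOS. The interface
number, VLAN ID, addresses, group number, and priority are all made up
for illustration; none of them are taken from the actual configs.

```
! Hypothetical values throughout; not the real config
interface FastEthernet0/0
 no ip address
!
interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 192.0.2.2 255.255.255.0
 standby 10 ip 192.0.2.1
 standby 10 priority 110
 standby 10 preempt
```

The peer router would normally carry the same group and virtual IP with
its own interface address and a lower priority.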
After several hours of trying different things, we finally got one of
the routers to come up and be stable. This was achieved by shutting
down the gig ports on the switch, and shutting down the trunk port to
the router. Then the router was rebooted and the four FastEthernet
ports were quickly shut down (if not, the router would reboot again).
The ports were then enabled one by one (including the subinterfaces)
until all four were up. The second router was eventually brought up,
but no traffic was flowing over it, and the gigabit ports were also
brought back online.
Now this all might not be so strange if this were a new installation,
rather than one that has been in place for more than a year with no
problems whatsoever. I know the IOS is old, but like I said, more than
a year of uptime with no problems.
After sending what info we had to Cisco, including the crashinfo files
on both routers, etc., Cisco said that they "felt" like this was a
hardware problem on the second router (the one that was not passing any
traffic) so we scheduled an outage, and replacement of the second
router.
Replacement time: All ready to go with the replacement. New parts
onsite, copy of the config put on the router, ready to go.
As soon as the "bad" router was switched off to do the replacement, the
"good" router crashes, and begins the same rebooting sequence all over.
The same attempts to get the "good" router back online fail. Nothing
this time seems to be able to get the second router to come up and stay
up. The "good" router is powered off and the new router is put into
production, and it also begins to crash in the same manner. A
completely new box.
This is when I left my shift after staying an additional two hours to
try and solve the problem. I called later in the day, and found out
that they did get the "good" router back into production, but did not
find out why.
All I know now is that both routers will be replaced completely, as
well as upgraded to the 12.2(26) Service Provider version.
Now can anyone begin to explain this problem? I thought it might be an
HSRP bug (and still think that), seeing as this is about the only
process going on directly between the two routers, but we had the
rebooting problem with both of the machines booted up by themselves. I
also could not find any kind of documented bug that might explain this.
And seeing as this IOS is pretty old, I would think this kind of bug
would have already been widely reported.
Any thoughts?
Regards to all,
Joe
--
There is more to life than increasing its speed. - Mahatma Gandhi
Joseph Rothstein
Ridlerstr. 32
80339 Munich
Germany
ziutek@mac.com
http://www.geocities.com/jozek444
http://www.rothstein.no-ip.org/
http://waywardgenuses.blogspot.com/
http://ziutek.journalspace.com/