RE: ASA High Availability - Stateful Failover with Dynamic Routing Protocols

From: Antonio Soares <amsoares_at_netcabo.pt>
Date: Fri, 26 Oct 2012 11:40:46 +0100

The timers I was looking for are the OSPF Throttle timers:

http://www.cisco.com/en/US/docs/ios/12_2s/feature/guide/fs_spftrl.html

These are the rather strange defaults we have (from show ip ospf):

Initial SPF schedule delay 5000 msecs
Minimum hold time between two consecutive SPFs 10000 msecs
Maximum wait time between two consecutive SPFs 10000 msecs
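A minimal Python sketch of how these three values interact. It assumes the behaviour described in the SPF throttling guide linked above: the first SPF waits the initial delay, the next one waits the hold time, and each further wait doubles until it reaches the maximum wait. The function name spf_wait_sequence is hypothetical, and the model deliberately ignores the reset of the timers after a quiet period. With the defaults shown, every SPF after the first is spaced 10 seconds apart, which would be consistent with the 10-second gap between the route delete and add in the logs quoted below.

```python
def spf_wait_sequence(start_ms: int, hold_ms: int, max_wait_ms: int, count: int) -> list[int]:
    """Return the delay (ms) before each of `count` back-to-back SPF runs.

    Simplified model of OSPF SPF throttling (an assumption, not IOS source code):
    - the first SPF waits start_ms after the triggering change;
    - the second waits hold_ms;
    - each later wait doubles the previous hold, capped at max_wait_ms.
    The reset back to start_ms after a quiet period is not modelled.
    """
    waits: list[int] = []
    hold = hold_ms
    for i in range(count):
        if i == 0:
            waits.append(start_ms)
        else:
            waits.append(min(hold, max_wait_ms))
            hold *= 2  # exponential backoff between consecutive SPFs
    return waits


if __name__ == "__main__":
    # Defaults reported by "show ip ospf" in this thread: 5000 / 10000 / 10000 ms.
    print(spf_wait_sequence(5000, 10000, 10000, 4))  # [5000, 10000, 10000, 10000]
    # Hypothetical, more aggressive values: 50 / 200 / 5000 ms.
    print(spf_wait_sequence(50, 200, 5000, 7))  # [50, 200, 400, 800, 1600, 3200, 5000]
```

If this model holds, the hold time (set on IOS with the "timers throttle spf" router command) is the value that would shorten the gap, not the hello/dead timers.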

Regards,

Antonio Soares, CCIE #18473 (R&S/SP)
amsoares_at_netcabo.pt
http://www.ccie18473.net

-----Original Message-----
From: Antonio Soares [mailto:amsoares_at_netcabo.pt]
Sent: Friday, October 26, 2012 00:45
To: 'Joseph L. Brunner'; 'ccielab_at_groupstudy.com'
Subject: RE: ASA High Availability - Stateful Failover with Dynamic Routing
Protocols

I can't change the design; there are several ASA failover pairs doing routing
in the DC.

I noticed that lowering the OSPF timers to 1 second gives a better result.
For example, if failover pair "A" fails over, failover pair "B" does not
rebuild its routing table:

(inside network)===IOS Switch===OSPF===ASA Failover Pair "A"===OSPF===ASA
Failover Pair "B"===(outside network)

The problem is that the IOS switch still rebuilds its routing table, and there
are exactly 10 seconds between deleting and re-adding the OSPF routes.

I can't remember where this timer comes from! Why does it take exactly 10
seconds? I tried tweaking the OSPF timers, but it makes no difference. I'm
talking about this:

+++++++++++++++++++++++++++++++++++++++
000190: *Mar 1 04:08:26: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from FULL to EXSTART, SeqNumberMismatch
000191: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from EXSTART to EXCHANGE, Negotiation Done
000192: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from EXCHANGE to LOADING, Exchange Done
000193: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from LOADING to FULL, Loading Done

000194: *Mar 1 04:08:32.277: RT: del 172.x.x.x/29 via 172.x.x.x, ospf
metric [110/21] <------ DELETES THE ROUTE

(...)

000275: *Mar 1 04:08:42.284: RT: add 172.x.x.x/29 via 172.x.x.x, ospf
metric [110/21] <------ ADDS THE ROUTE 10s LATER
+++++++++++++++++++++++++++++++++++++++
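To check how close that gap is to a 10000 ms timer, a small Python helper can compute the elapsed time between the two "RT:" timestamps in the log above. The helper name time_of_day_ms is hypothetical; it only parses the HH:MM:SS.mmm part of the IOS timestamp and assumes both entries fall on the same day.

```python
import re

# Matches the time-of-day part of an IOS log timestamp, e.g. "*Mar 1 04:08:32.277".
# Milliseconds are optional because some entries (e.g. 000190) have none.
_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?")


def time_of_day_ms(line: str) -> int:
    """Return the time of day of the first timestamp in `line`, in milliseconds."""
    m = _TS.search(line)
    if m is None:
        raise ValueError(f"no timestamp in: {line!r}")
    hh, mm, ss, frac = m.groups()
    # Right-pad the fraction so ".5" means 500 ms, not 5 ms.
    ms = int(frac.ljust(3, "0")) if frac else 0
    return ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + ms


delete_line = "000194: *Mar 1 04:08:32.277: RT: del 172.x.x.x/29 via 172.x.x.x, ospf"
add_line = "000275: *Mar 1 04:08:42.284: RT: add 172.x.x.x/29 via 172.x.x.x, ospf"

gap_ms = time_of_day_ms(add_line) - time_of_day_ms(delete_line)
print(f"route re-added {gap_ms} ms after deletion")  # route re-added 10007 ms after deletion
```

The result, 10007 ms, sits just above the 10000 ms hold time between consecutive SPFs reported by show ip ospf in this thread, which would fit the route being re-added by the next throttled SPF run.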

Thanks.

Regards,

Antonio Soares, CCIE #18473 (R&S/SP)
amsoares_at_netcabo.pt
http://www.ccie18473.net

-----Original Message-----
From: Joseph L. Brunner [mailto:joe_at_affirmedsystems.com]
Sent: Friday, October 26, 2012 00:07
To: Antonio Soares; ccielab_at_groupstudy.com
Subject: RE: ASA High Availability - Stateful Failover with Dynamic Routing
Protocols

Great observations Antonio!

Have you tried simply running BGP THROUGH the ASA, without having the ASA
participate in dynamic routing?

That was our standard extranet design for a number of years, going all the
way back to the PIX 515E: we had 6509 core devices eBGP peering with "EDGE"
3845s from BT Radianz outside our firewall perimeter. I can report no issues
with routing, connections, or BGP during failover.

thanks

-----Original Message-----
From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of
Antonio Soares
Sent: Thursday, October 25, 2012 6:34 PM
To: ccielab_at_groupstudy.com
Subject: ASA High Availability - Stateful Failover with Dynamic Routing
Protocols

Hello group,

ASA release 8.4.1 introduced a feature called "Stateful Failover with
Dynamic Routing Protocols":

"Routes that are learned through dynamic routing protocols (such as OSPF and
EIGRP) on the active unit are now maintained in a Routing Information Base
(RIB) table on the standby unit. Upon a failover event, traffic on the
secondary active unit now passes with minimal disruption because routes are
known. Routes are synchronized only for link-up or link-down events on an
active unit. If the link goes up or down on the standby unit, dynamic routes
sent from the active unit may be lost. This is normal, expected behavior."

http://www.cisco.com/en/US/docs/security/asa/roadmap/asa_new_features.html#wp43273

But this feature has many limitations. When a failover occurs and the ASA is
peering with an IOS router or switch, the IOS device detects that the
neighbor has changed, deletes everything learned from the ASA, and rebuilds
the routing table about 10 seconds later:

+++++++++++++++++++++++++++++++++++++++
000190: *Mar 1 04:08:26: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from FULL to EXSTART, SeqNumberMismatch
000191: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from EXSTART to EXCHANGE, Negotiation Done
000192: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from EXCHANGE to LOADING, Exchange Done
000193: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on
Vlanxxx from LOADING to FULL, Loading Done

000194: *Mar 1 04:08:32.277: RT: del 172.x.x.x/29 via 172.x.x.x, ospf
metric [110/21]

(...)

000275: *Mar 1 04:08:42.284: RT: add 172.x.x.x/29 via 172.x.x.x, ospf
metric [110/21]
+++++++++++++++++++++++++++++++++++++++

This causes an obvious 10-second outage, but worse than that, other ASAs in
the network tear down TCP connections because they lack routing
information:

+++++++++++++++++++++++++++++++++++++++
%ASA-6-110003: Routing failed to locate next hop for TCP from
outside:172.x.x.x/23 to inside:9.x.x.x/35365
%ASA-6-302014: Teardown TCP connection 3609 for inside:9.x.x.x/35365 to
outside:172.x.x.x/23 duration 0:01:00 bytes 50721 No valid adjacency
+++++++++++++++++++++++++++++++++++++++

Cisco has an enhancement to solve this (CSCsu90386), which is basically an
implementation of the Non-Stop Forwarding feature, but it seems it will be
months or years before it is available.

Basically, the current implementation of Stateful Failover is a joke. The
only workaround I have is to get rid of OSPF or EIGRP and use static
routing.

Has anyone had this problem and found any kind of workaround?

I have this set up in the lab if anyone is interested in more details:

(inside network)===IOS Switch===OSPF===ASA Failover Pair===OSPF===ASA
Failover Pair===(outside network)

Thanks.

Regards,

Antonio Soares, CCIE #18473 (R&S/SP)
amsoares_at_netcabo.pt
http://www.ccie18473.net

Received on Fri Oct 26 2012 - 11:40:46 ART

This archive was generated by hypermail 2.2.0 : Thu Nov 01 2012 - 10:53:34 ART