RE: ASA High Availability - Stateful Failover with Dynamic from Joseph L. Brunner on 2012-10-25 (Ccielab archives 10/2012)

From: Joseph L. Brunner <joe_at_affirmedsystems.com>
Date: Thu, 25 Oct 2012 23:06:33 +0000

Great observations Antonio!

Have you tried simply running BGP THROUGH the ASA, without having the ASA participate in dynamic routing?

That was our standard extranet design for a number of years, all the way back to the PIX 515E. We had 6509 CORE devices EBGP peering with "EDGE" 3845s from BT Radianz outside our firewall perimeter. I can report no issues with routing, connections or BGP during failover.

thanks

-----Original Message-----
From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of Antonio Soares
Sent: Thursday, October 25, 2012 6:34 PM
To: ccielab_at_groupstudy.com
Subject: ASA High Availability - Stateful Failover with Dynamic Routing Protocols

Hello group,

ASA release 8.4.1 introduced a feature called "Stateful Failover with Dynamic Routing Protocols":

"Routes that are learned through dynamic routing protocols (such as OSPF and EIGRP) on the active unit are now maintained in a Routing Information Base (RIB) table on the standby unit. Upon a failover event, traffic on the secondary active unit now passes with minimal disruption because routes are known. Routes are synchronized only for link-up or link-down events on an active unit. If the link goes up or down on the standby unit, dynamic routes sent from the active unit may be lost. This is normal, expected behavior."

http://www.cisco.com/en/US/docs/security/asa/roadmap/asa_new_features.html#wp43273

But this feature has many limitations. When a failover occurs and the ASA is peering with an IOS router or switch, the IOS device detects that the neighbor has changed, deletes everything it learned from the ASA, and rebuilds the routing table about 10 seconds later:

+++++++++++++++++++++++++++++++++++++++
000190: *Mar 1 04:08:26: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on Vlanxxx from FULL to EXSTART, SeqNumberMismatch
000191: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on Vlanxxx from EXSTART to EXCHANGE, Negotiation Done
000192: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on Vlanxxx from EXCHANGE to LOADING, Exchange Done
000193: *Mar 1 04:08:31: %OSPF-5-ADJCHG: Process 2011, Nbr 172.x.x.x on Vlanxxx from LOADING to FULL, Loading Done

000194: *Mar 1 04:08:32.277: RT: del 172.x.x.x/29 via 172.x.x.x, ospf metric [110/21]

(...)

000275: *Mar 1 04:08:42.284: RT: add 172.x.x.x/29 via 172.x.x.x, ospf metric [110/21]
+++++++++++++++++++++++++++++++++++++++
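
The roughly 10-second gap can be read straight off the "RT: del" and "RT: add" timestamps in the debug output. Here is a minimal Python sketch that does the arithmetic. The function name route_gap_seconds and the regular expression are only illustrative assumptions based on the two masked lines shown above, not on any documented IOS log format, so adjust them for real captures:

```python
import re
from datetime import datetime

# Two lines copied from the IOS debug output in this post (addresses masked).
LOG = """\
000194: *Mar 1 04:08:32.277: RT: del 172.x.x.x/29 via 172.x.x.x, ospf metric [110/21]
000275: *Mar 1 04:08:42.284: RT: add 172.x.x.x/29 via 172.x.x.x, ospf metric [110/21]
"""

# Assumed pattern: HH:MM:SS.mmm timestamp followed by "RT: add" or "RT: del".
RT_EVENT = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}): RT: (add|del) ")


def route_gap_seconds(log: str) -> float:
    """Seconds between the first route deletion and the last route addition."""
    first_del = None
    last_add = None
    for stamp, action in RT_EVENT.findall(log):
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if action == "del" and first_del is None:
            first_del = when
        elif action == "add":
            last_add = when
    if first_del is None or last_add is None:
        raise ValueError("log needs at least one 'RT: del' and one 'RT: add' line")
    return (last_add - first_del).total_seconds()


if __name__ == "__main__":
    print(f"{route_gap_seconds(LOG):.3f}")  # prints 10.007
```

The sketch assumes all events fall on the same day, since it only looks at the time of day.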

This causes the obvious 10 seconds of downtime but, worse than that, other ASAs in the network tear down TCP connections because they lack routing information:

+++++++++++++++++++++++++++++++++++++++
%ASA-6-110003: Routing failed to locate next hop for TCP from outside:172.x.x.x/23 to inside:9.x.x.x/35365
%ASA-6-302014: Teardown TCP connection 3609 for inside:9.x.x.x/35365 to outside:172.x.x.x/23 duration 0:01:00 bytes 50721 No valid adjacency
+++++++++++++++++++++++++++++++++++++++
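
To see how many connections are lost this way during a failover test, the teardown messages can be filtered by their reason text. This is a rough Python sketch. The regular expression is an assumption derived only from the single 302014 message shown above, and real messages may carry extra fields it does not handle. The names parse_teardown and routing_teardowns are hypothetical:

```python
import re

# Assumed layout, taken from the one sample message in this post.
TEARDOWN = re.compile(
    r"%ASA-6-302014: Teardown TCP connection (?P<conn_id>\d+) "
    r"for (?P<src>\S+) to (?P<dst>\S+) "
    r"duration (?P<duration>\d+:\d{2}:\d{2}) "
    r"bytes (?P<bytes>\d+) (?P<reason>.+)"
)


def parse_teardown(line):
    """Return the fields of a 302014 teardown message as a dict, or None."""
    match = TEARDOWN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["conn_id"] = int(record["conn_id"])
    record["bytes"] = int(record["bytes"])
    record["reason"] = record["reason"].strip()
    return record


def routing_teardowns(lines):
    """Keep only the teardowns whose reason is 'No valid adjacency'."""
    parsed = (parse_teardown(line) for line in lines)
    return [r for r in parsed if r and r["reason"] == "No valid adjacency"]


if __name__ == "__main__":
    sample = (
        "%ASA-6-302014: Teardown TCP connection 3609 for inside:9.x.x.x/35365 "
        "to outside:172.x.x.x/23 duration 0:01:00 bytes 50721 No valid adjacency"
    )
    for record in routing_teardowns([sample]):
        print(record["conn_id"], record["bytes"], record["reason"])
```

Counting the records returned for the failover window gives a quick number to compare between OSPF and static routing runs.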

Cisco has an enhancement to solve this, which is basically an implementation of the Non-Stop Forwarding feature (CSCsu90386), but it seems it will take months or years to become available.

Basically, the current implementation of Stateful Failover is a joke. The only workaround I have is to get rid of OSPF or EIGRP and use static routing.

Has anyone had this problem and found any kind of workaround?

I have this in the lab if someone is interested in more details:

(inside network)===IOS Switch===OSPF===ASA Failover Pair===OSPF===ASA Failover Pair===(outside network)

Thanks.

Regards,

Antonio Soares, CCIE #18473 (R&S/SP)
amsoares_at_netcabo.pt
http://www.ccie18473.net

Received on Thu Oct 25 2012 - 23:06:33 ART
