Re: OT: ASA Failover & Monitored Interfaces

From: Tony Singh <mothafungla_at_gmail.com>
Date: Mon, 25 Mar 2013 21:29:46 +0000

Has anyone raised the problem with TAC? It would be good to see what their view is.

--
BR
Tony
Sent from my iPad
On 25 Mar 2013, at 14:53, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
> Hi Joe,
> 
> Like I was saying, this happened at the same time on two different
> pairs of ASAs. One pair runs 8.2 code and the other pair 8.4 code.
> I can get you the specifics if you want, but I find it awfully strange
> that they would both have the same issue.
> 
> As for my test where both sides showed "unknown", I solved that issue.
> I re-ran the simulation as follows. I added a single host to the
> segment I am testing, then again pruned that VLAN on the switch on
> the primary side. Sure enough, the primary side then showed that
> interface as "failed" and the secondary side showed it as normal.
> That is what should have happened in production.
> 
> 
> On Mon, Mar 25, 2013 at 10:33 AM, Joe Sanchez <marco207p_at_gmail.com> wrote:
>> Joe, what code version was running in production when you had the issue?
>> 
>> Regards,
>> Joe Sanchez
>> 
>> ( please excuse the brevity of this email as it was sent via a mobile device.  Please excuse misspelled words or sentence structure.)
>> 
>> On Mar 25, 2013, at 8:58 AM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>> 
>>> Well, I think I know why both sides went unknown when I did the test.
>>> 
>>> The guide states that these 4 tests happen:
>>> 
>>> 1) Link status test. If this passes, we do the network tests
>>> 2) Network traffic test
>>> 3) ARP test
>>> 4) Broadcast ping
>>> 
>>> If both sides fail all tests, they both go unknown. In this case I
>>> think both sides would have failed because there is literally nothing
>>> on the VLAN except the two firewalls.
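A rough Python sketch of the per-interface decision described by the four tests above, assuming the tests only start once hellos on a monitored interface have been missed, and that in each network test a unit that saw no traffic is marked failed only if the peer did see traffic. The test labels, the UnitView type and the evaluate() function are hypothetical names for illustration, not Cisco's implementation. It just shows why a VLAN with nothing but the two firewalls on it ends up "unknown" on both units, while the same VLAN with one host on it produces "failed" on the isolated side.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "Normal"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


# Network tests in the order listed in the guide (labels are hypothetical).
NETWORK_TESTS = ("network_activity", "arp", "broadcast_ping")


@dataclass(frozen=True)
class UnitView:
    """What one unit observed on a single monitored interface."""
    link_up: bool
    # Names of the network tests during which this unit received any packets.
    received: frozenset[str] = frozenset()


def evaluate(primary: UnitView, secondary: UnitView) -> tuple[Status, Status]:
    """Return (primary_status, secondary_status) for one interface."""
    # Test 1: link up/down, checked on each unit independently.
    if not (primary.link_up and secondary.link_up):
        return (
            Status.NORMAL if primary.link_up else Status.FAILED,
            Status.NORMAL if secondary.link_up else Status.FAILED,
        )
    # Tests 2-4: stop at the first test in which at least one unit saw traffic.
    for test in NETWORK_TESTS:
        p_rx = test in primary.received
        s_rx = test in secondary.received
        if p_rx and s_rx:
            return Status.NORMAL, Status.NORMAL
        if p_rx or s_rx:
            # Only one unit saw traffic, so the silent one is marked failed.
            return (
                Status.NORMAL if p_rx else Status.FAILED,
                Status.NORMAL if s_rx else Status.FAILED,
            )
    # Neither unit saw traffic in any test: the status cannot be determined.
    return Status.UNKNOWN, Status.UNKNOWN


# Lab VLAN with nothing on it but the two firewalls, pruned on the primary side.
empty_vlan = UnitView(link_up=True)
print(*(s.value for s in evaluate(empty_vlan, empty_vlan)))  # Unknown Unknown

# Same test after adding one host that only the secondary can still reach.
with_host = UnitView(link_up=True, received=frozenset({"network_activity"}))
print(*(s.value for s in evaluate(empty_vlan, with_host)))  # Failed Normal
```

The key point of the model is the final fall-through: "unknown" is what you get when neither unit can prove the other wrong, which is exactly the empty-VLAN lab case.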
>>> 
>>> So it was unknown for about 15 minutes before I fixed it. Now, this
>>> is still a different problem from what happened in production. What
>>> happened there was that both sides continued to show "normal" when there
>>> is no way hellos could have been received on several data interfaces.
>>> Only when I manually failed over did those interfaces show "failed",
>>> which is disturbing.
>>> 
>>> Sent from my iPhone
>>> 
>>> On Mar 25, 2013, at 8:43 AM, Jay McMickle <jay.mcmickle_at_yahoo.com> wrote:
>>> 
>>>> Thanks!
>>>> 
>>>> How long did this interface remain as unknown? To Carlos' point, you need to make sure the timer expired (I'm sure you did, and I think it's 15 seconds). Can you tweak the poll time? Can you post your sh run fail output?
>>>> 
>>>> Regards,
>>>> Jay McMickle CCIE #35355
>>>> Sent from my iPhone
>>>> 
>>>> On Mar 24, 2013, at 10:56 PM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>>>> 
>>>>> Thanks for the reply Jay and good luck on your lab man! To answer your
>>>>> questions and give you some more info:
>>>>> 
>>>>> - This actually occurred on two different pairs.  One pair is 8.4 and
>>>>> the other is 8.2
>>>>> 
>>>>> - I did some testing and here is what I found.  I don't understand it,
>>>>> but here is what I found : )
>>>>> 
>>>>> One of the sub-interfaces we have on one of these pairs is not used
>>>>> for anything. Being a sub-interface, it is of course 802.1Q, and the
>>>>> switch connection on the other side is a trunk. So to simulate the
>>>>> issue, I simply pruned this VLAN from the trunk on the primary side so
>>>>> that the monitoring hellos on that interface would not reach the
>>>>> other side.
>>>>> 
>>>>> Interestingly, when running "show failover", the interface I was
>>>>> playing with does not show "failed"; it shows "unknown". Because it is
>>>>> not "failed", failover will never occur. The only thing I can find
>>>>> about status "unknown" in the documentation is that it is the initial
>>>>> state and that the status cannot be determined.
>>>>> 
>>>>> WTF is unknown?  I mean, if I lose IP connectivity on a monitored
>>>>> interface shouldn't that be considered a failure?  I've read all the
>>>>> ASA documentation about failover, failover triggers and health
>>>>> monitoring and I just don't get this.
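A simplified Python sketch of why "unknown" never triggers failover: the ASA's `failover interface-policy` threshold (default 1 interface, as Jay mentions in the reply quoted further down) counts interfaces in the failed state, and an interface stuck at "unknown" is not counted. The interface_policy_met() function and the interface names are hypothetical, and the sketch leaves out every failover trigger except the failed-interface count.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "Normal"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


def interface_policy_met(statuses: dict[str, Status], threshold: int = 1) -> bool:
    """True if enough monitored interfaces are FAILED to meet the policy.

    UNKNOWN is deliberately not counted; only FAILED interfaces count.
    """
    failed = sum(1 for s in statuses.values() if s is Status.FAILED)
    return failed >= threshold


# Hypothetical interface names: three unreachable interfaces, all stuck at "unknown".
stuck = {
    "inside": Status.NORMAL,
    "dmz1": Status.UNKNOWN,
    "dmz2": Status.UNKNOWN,
    "dmz3": Status.UNKNOWN,
}
print(interface_policy_met(stuck))  # False

# One interface actually marked failed is enough with the default threshold of 1.
marked = dict(stuck, dmz1=Status.FAILED)
print(interface_policy_met(marked))  # True
```

Under this model, three unreachable interfaces in "unknown" are weaker than a single interface marked "failed", which matches what was seen in the lab.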
>>>>> 
>>>>> On Sun, Mar 24, 2013 at 11:44 PM, Jay McMickle <jay.mcmickle_at_yahoo.com> wrote:
>>>>>> It should indeed, and I agree: I too had this issue on March 8, 2013 (yes, I remember the exact date, as it was catastrophic).
>>>>>> 
>>>>>> We have 5585-20s in HA A/S, with IPS 4270s inline, set to fail-closed as well.
>>>>>> 
>>>>>> I came unglued when this occurred, and Cisco stated this was a "software" issue but without attributing it to any bug. We were running 9.0.1 on the ASAs (now 9.1.1-4), and could not reproduce the issue on our 5585-20/4270 lab hardware with the same code. The default failover interface policy is 1 interface, but to your point, if the ASA doesn't mark the interface down, it won't fail over. In the lab, it worked as expected. In production, it did not, but now it does.
>>>>>> 
>>>>>> May I ask what code you're running on the ASAs?
>>>>>> 
>>>>>> BTW, the IPS units locked up due to a bug in the 7.1(6)E4 IPS engine when signature 694 was pushed. It would be interesting to see whether any of these variables were also present in your environment.
>>>>>> 
>>>>>> I'm taking the Security IE lab on Wednesday of this week, and these little nuances made me a bit nervous!
>>>>>> 
>>>>>> Cheers.
>>>>>> 
>>>>>> Regards,
>>>>>> Jay McMickle CCIE #35355
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> On Mar 24, 2013, at 10:17 PM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>>>>>> 
>>>>>>> I ran into an interesting situation tonight, and am trying to piece
>>>>>>> together what happened and what should happen.  Pretty simple setup:
>>>>>>> 
>>>>>>> Two ASAs running in active/standby. These ASAs have a number of
>>>>>>> interfaces and sub-interfaces, all of which are monitored in the
>>>>>>> failover configuration. There are IPS units physically inline between
>>>>>>> these ASAs on most of the interfaces. The failover interface itself
>>>>>>> is of course a straight connection between the ASAs.
>>>>>>> 
>>>>>>> So, the IPS on the primary side "locked up", and it is set to
>>>>>>> fail-closed. From the inside network, the interfaces on the primary
>>>>>>> ASA were completely unreachable; I was unable to ping them at all.
>>>>>>> I was expecting that when this happened, the ASAs would trigger
>>>>>>> failover because several of the monitored interfaces were not
>>>>>>> reachable, but they didn't. When logging into the standby unit,
>>>>>>> everything showed "normal"; all the monitored interfaces were
>>>>>>> normal. As soon as I manually failed over, that changed, and the
>>>>>>> interfaces that were unreachable on the other side showed up as
>>>>>>> "failed".
>>>>>>> 
>>>>>>> So basically, the ASAs were unable to communicate with each other
>>>>>>> over several of the monitored data interfaces, but the status still
>>>>>>> showed "normal" until a manual failover was done. If the failover link
>>>>>>> is fine but the ASAs cannot communicate via the monitored data
>>>>>>> interfaces, shouldn't that trigger a failover event?
>>>>>>> 
>>>>>>> --
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Joe Astorino
>>>>>>> CCIE #24347
>>>>>>> http://astorinonetworks.com
>>>>>>> 
>>>>>>> "He not busy being born is busy dying" - Dylan
>>>>>>> 
>>>>>>> 
>>>>>>> Blogs and organic groups at http://www.ccie.net
>>>>>>> 
>>>>>>> _______________________________________________________________________
>>>>>>> Subscription information may be found at:
>>>>>>> http://www.groupstudy.com/list/CCIELab.html
Received on Mon Mar 25 2013 - 21:29:46 ART

This archive was generated by hypermail 2.2.0 : Wed Apr 03 2013 - 19:06:19 ART