Re: OT: ASA Failover & Monitored Interfaces

From: Joe Astorino <joeastorino1982_at_gmail.com>
Date: Mon, 25 Mar 2013 10:53:56 -0400

Hi Joe,

As I was saying, this happened at the same time on two different
pairs of ASAs. One pair runs 8.2 code and the other pair runs 8.4 code.
I can get you the specifics if you want, but I find it awfully strange
that both pairs would have the same issue.

Now, as far as my "unknown" on both sides test, I solved that issue.
I re-ran the simulation as follows. I added a single host to the
segment I am testing. I then again pruned that VLAN on the switch on
the primary side. Sure enough the primary side then showed that
interface as "failed" and the secondary side showed it as normal.
That is what should have happened in production.
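
For anyone following along, here is a rough Python sketch of how I read the
interface test logic from the guide (the four tests quoted further down).
This is only a simplified model under my own assumptions, not how the ASA
actually implements it: the names InterfaceTests and classify are made up
for illustration, and timers and the failover interface threshold are left
out entirely.

```python
from dataclasses import dataclass


@dataclass
class InterfaceTests:
    """Results of the monitoring tests for one unit's interface (hypothetical model)."""
    link_up: bool              # 1) link status test
    saw_traffic: bool = False  # 2) network traffic test
    arp_reply: bool = False    # 3) ARP test
    ping_reply: bool = False   # 4) broadcast ping

    def network_ok(self) -> bool:
        # Any single passing network test counts as the interface being operational.
        return self.saw_traffic or self.arp_reply or self.ping_reply


def classify(local: InterfaceTests, peer: InterfaceTests) -> str:
    """Return the modelled status of the local interface."""
    if not local.link_up:
        return "Failed"
    if local.network_ok():
        return "Normal"
    # Local unit failed every network test: it is only "Failed" if the
    # peer's interface is still seeing traffic; otherwise nobody can tell.
    if peer.link_up and peer.network_ok():
        return "Failed"
    return "Unknown"


# Empty VLAN, pruned on the primary side: neither unit hears anything.
primary = InterfaceTests(link_up=True)
secondary = InterfaceTests(link_up=True)
print(classify(primary, secondary), classify(secondary, primary))

# Same prune, but a host now lives on the segment on the secondary side.
secondary = InterfaceTests(link_up=True, saw_traffic=True)
print(classify(primary, secondary), classify(secondary, primary))
```

In this model the empty VLAN leaves both sides "Unknown", and adding one
host makes the primary "Failed" while the secondary stays "Normal", which
matches both of my tests. It does not explain the production case, where
both sides kept showing "normal".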

On Mon, Mar 25, 2013 at 10:33 AM, Joe Sanchez <marco207p_at_gmail.com> wrote:
> Joe, what version of code is running on the production pair that had the issue?
>
> Regards,
> Joe Sanchez
>
> ( please excuse the brevity of this email as it was sent via a mobile device. Please excuse misspelled words or sentence structure.)
>
> On Mar 25, 2013, at 8:58 AM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>
>> Well, when I did the test and both sides went unknown, I think I know why.
>>
>> Guide states these 4 tests happen
>>
>> 1) link status test. If this passes we do network tests
>> 2) network traffic test
>> 3) arp test
>> 4) broadcast ping
>>
>> If both sides fail all tests they both go unknown. In this case I
>> think both sides would have failed because there is literally nothing
>> on the VLAN except the two firewalls.
>>
>> So it was unknown for like 15 minutes before I fixed it...now this is
>> still a different problem than what happened in production, though. What
>> happened there was that both sides continued to show "normal" when there
>> is no way hellos could have been received on several data interfaces.
>> Only when I manually failed over did those interfaces show failed, which
>> is disturbing.
>>
>> Sent from my iPhone
>>
>> On Mar 25, 2013, at 8:43 AM, Jay McMickle <jay.mcmickle_at_yahoo.com> wrote:
>>
>>> Thanks!
>>>
>>> How long did this interface remain as unknown? To Carlos' point, you need to make sure the timer expired (I'm sure you did, and I think it's 15 seconds). Can you tweak the poll time? Can you post your sh run fail output?
>>>
>>> Regards,
>>> Jay McMickle CCIE #35355
>>> Sent from my iPhone
>>>
>>> On Mar 24, 2013, at 10:56 PM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>>>
>>>> Thanks for the reply Jay and good luck on your lab man! To answer your
>>>> questions and give you some more info:
>>>>
>>>> - This actually occurred on two different pairs. One pair is 8.4 and
>>>> the other is 8.2
>>>>
>>>> - I did some testing and here is what I found. I don't understand it,
>>>> but here is what I found : )
>>>>
>>>> One of the sub-interfaces we have on one of these pairs is not used
>>>> for anything. Being a sub-interface it is of course 802.1Q and the
>>>> switch connection on the other side is a trunk. So to simulate the
>>>> issue, I simply pruned this VLAN from the trunk on the primary side so
>>>> that the monitoring hellos on that interface would not reach the
>>>> other side.
>>>>
>>>> Interestingly, when running "show failover", the interface I was
>>>> playing with does not show "failed"; it shows "unknown". Because it is
>>>> not "failed", failover will never occur. The only thing I can find
>>>> about status "unknown" in the documentation is that it is the initial
>>>> state and that the status cannot be determined.
>>>>
>>>> WTF is unknown? I mean, if I lose IP connectivity on a monitored
>>>> interface shouldn't that be considered a failure? I've read all the
>>>> ASA documentation about failover, failover triggers and health
>>>> monitoring and I just don't get this.
>>>>
>>>> On Sun, Mar 24, 2013 at 11:44 PM, Jay McMickle <jay.mcmickle_at_yahoo.com> wrote:
>>>>> It should indeed, but I agree, I too had this issue March 8, 2013 (yes, I remember the exact date as it was catastrophic).
>>>>>
>>>>> We have 5585-20's in HA A/S, with IPS 4270's inline with fail-close as well.
>>>>>
>>>>> I came unglued when this occurred, and Cisco stated this was a "software" issue but without any attributing bug. We were running 9.0.1 on the ASA's (now 9.1.1-4), and could not reproduce the issue on our 5585-20/4270 lab hardware with the same IOS. The default failover is set for 1 interface, but to your point, if it doesn't mark it down, it won't fail over. In the lab, it worked as expected. In production, it did not, but now does.
>>>>>
>>>>> May I ask what code you're running on the ASA's?
>>>>>
>>>>> BTW- the IPS's locked up due to a bug in the 7.1(6)E4 IPS engine when signature 694 was pushed. Interesting to see if any of these variables were constants in your environment.
>>>>>
>>>>> I'm taking the Security IE lab on Wednesday of this week, and these little nuances made me a bit nervous!
>>>>>
>>>>> Cheers.
>>>>>
>>>>> Regards,
>>>>> Jay McMickle CCIE #35355
>>>>> Sent from my iPhone
>>>>>
>>>>> On Mar 24, 2013, at 10:17 PM, Joe Astorino <joeastorino1982_at_gmail.com> wrote:
>>>>>
>>>>>> I ran into an interesting situation tonight, and am trying to piece
>>>>>> together what happened and what should happen. Pretty simple setup:
>>>>>>
>>>>>> 2 ASA's running in active standby. These ASA's have a number of
>>>>>> interfaces and sub-interfaces all of which are monitored in the
>>>>>> failover configuration. There are IPS units physically inline between
>>>>>> these ASA's on most of the interfaces. The failover interface itself
>>>>>> is of course a straight connection between the ASA's
>>>>>>
>>>>>> So, the IPS on the primary side "locked up" and it is set to
>>>>>> fail-closed. From the inside network, the interfaces on the primary
>>>>>> ASA were completely unreachable. I was unable to ping the interfaces
>>>>>> at all. I was expecting that when this happened, the ASA's would
>>>>>> trigger failover because several of the monitored interfaces were not
>>>>>> reachable but they didn't. When logging into the standby unit,
>>>>>> everything showed "normal" ...all the monitored interfaces were
>>>>>> normal. As soon as I manually failed it over, that changed and the
>>>>>> interfaces that were unreachable on the other side showed up as
>>>>>> "failed".
>>>>>>
>>>>>> So basically, the ASA's were unable to communicate with each other
>>>>>> over several of the monitored data-interfaces, but the status still
>>>>>> showed "normal" until a manual failover was done. If the failover link
>>>>>> is fine, but the ASA's cannot communicate via the monitored data
>>>>>> interfaces shouldn't that trigger a failover event?
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Joe Astorino
>>>>>> CCIE #24347
>>>>>> http://astorinonetworks.com
>>>>>>
>>>>>> "He not busy being born is busy dying" - Dylan
>>>>>>
>>>>>>
>>>>>> Blogs and organic groups at http://www.ccie.net
>>>>>>
>>>>>> _______________________________________________________________________
>>>>>> Subscription information may be found at:
>>>>>> http://www.groupstudy.com/list/CCIELab.html

-- 
Regards,
Joe Astorino
CCIE #24347
http://astorinonetworks.com
"He not busy being born is busy dying" - Dylan
Blogs and organic groups at http://www.ccie.net
Received on Mon Mar 25 2013 - 10:53:56 ART