Hi GS:
I've been banging my head against this one for some time. Neither Cisco nor
I have been able to replicate it anywhere except production - there are just
too many things playing off one another to recreate the problem in the lab.
Basically, we have two datacenters. Each datacenter is connected with two
WAN links to the same MPLS backbone. The datacenters are also connected to
each other with a pair of MetroE links.
   MPLS Cloud
    |      |
   R1------R2
    |      |
   S1------S2   [pair of 6500s]
    |      |
   S3------S4   [pair of Nexus 7Ks]
    |      |    [MetroEthernet links]
   S5------S6   [pair of Nexus 7Ks]
    |      |
   S7------S8   [pair of 4500s]
    |      |
   R3------R4
    |      |
   MPLS Cloud
Each datacenter is assigned a /16 prefix, and both datacenters advertise
both /16s to the MPLS. The idea is that if one datacenter completely lost
WAN connectivity, it would still remain reachable through the other
datacenter and the MetroE links.
The /16s are EIGRP summary routes generated by the Nexus 7Ks. Those
summaries get passed up to the routers which participate in both EIGRP and
BGP, and mutual redistribution occurs there on R1/R2 and R3/R4. We only
inject the /16 from EIGRP into BGP, and only inject a default route from BGP
into EIGRP.
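As an illustration only (not the actual production config), the redistribution policy described above could be sketched roughly like this on one of the routers. The BGP AS number, EIGRP process number, seed metric, and the prefix-list and route-map names are all placeholders, and the exact VRF address-family syntax varies by IOS release.

```
! Hypothetical names and numbers throughout; only the VRF name and
! the 10.15.0.0/16 prefix come from the output in this post.
ip prefix-list DC-SUMMARY seq 5 permit 10.15.0.0/16
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
route-map EIGRP-TO-BGP permit 10
 match ip address prefix-list DC-SUMMARY
!
route-map BGP-TO-EIGRP permit 10
 match ip address prefix-list DEFAULT-ONLY
!
router bgp 65000
 address-family ipv4 vrf TEST
  ! Only the /16 summary is taken from EIGRP into BGP
  redistribute eigrp 100 route-map EIGRP-TO-BGP
!
router eigrp 100
 address-family ipv4 vrf TEST autonomous-system 100
  ! Only the default route is taken from BGP into EIGRP
  redistribute bgp 65000 metric 10000 100 255 1 1500 route-map BGP-TO-EIGRP
```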
There are EIGRP peerings on every link in the picture above except the WAN
links to the cloud and the back-to-back (B2B) links R1-R2 and R3-R4.
There are iBGP peerings between each pair of routers and eBGP peerings
between each router and the MPLS.
What we saw was that one datacenter wouldn't advertise its own /16
to the MPLS. The result was that traffic to both locations flowed through a
single datacenter.
A BGP debug showed that it wasn't redistributing the /16 prefix from EIGRP
into BGP (see the "up but not redist, deleting" line below). Googling that
message yields nothing.
R1#clear ip route vrf TEST 10.15.0.0
Jun 26 00:38:36.399 GMT: BGP(4): route 1:1:10.15.0.0/16 down
Jun 26 00:38:36.399 GMT: BGP(4): add request for 1:1:10.15.0.0/16
Jun 26 00:38:36.399 GMT: BGP: TX VPNv4 Unicast Tab RIB walk done version 30903, added 2 topologies.
Jun 26 00:38:36.399 GMT: BGP(4): route 1:1:10.15.0.0/16 up but not redist, deleting   <<<<<<<<<<<<<<<
Jun 26 00:38:36.399 GMT: BGP: TX VPNv4 Unicast Tab RIB walk done version 30903, added 2 topologies.
The prefix would even show up in the BGP table, but as a RIB failure.
R1#sh ip bgp vpnv4 vrf TEST 10.15.0.0
BGP routing table entry for 1:1:10.15.0.0/16, version 26239
Paths: (1 available, no best path)   <<<<<<<<<<<<<<<<<<<<<<
Not advertised to any peer
11111 65406 65406 65406 65406 65406, (received-only)
10.128.0.42 from 10.128.0.42 (1.1.1.1)
Origin incomplete, localpref 100, valid, external
Extended Community: RT:935:1
The RIB failure would come from either the eBGP peering with the MPLS or the
iBGP peering with the B2B router. I understand it's expected that the eBGP AD
of 20 would be preferred over the EIGRP AD of 90. We tried the BGP backdoor
command, and that didn't fix it: the prefix was then installed as EIGRP in
the routing table, but the router still wouldn't redistribute that route into
BGP. In my labs, I don't even need the backdoor command - it seems to know to
prefer the EIGRP route in the BGP table, and redistributes perfectly.
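For anyone following along, a backdoor statement for this prefix generally looks something like the sketch below. It is not the exact production config: the AS number is a placeholder, and placing it under the VRF address family is an assumption based on the VRF shown in the output above.

```
router bgp 65000
 address-family ipv4 vrf TEST
  ! Treats the eBGP-learned 10.15.0.0/16 as a local-distance (AD 200)
  ! route so the EIGRP route (AD 90) wins in the routing table.
  ! A backdoor network is not itself originated into BGP.
  network 10.15.0.0 mask 255.255.0.0 backdoor
```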
The only thing that did fix it was to configure route maps that blocked
learning the /16 route from BOTH the eBGP MPLS peering and the iBGP B2B
peering. It
was as if the route only got injected from EIGRP into BGP if it didn't have
any competition in the BGP table in the first place. Then, it would
redistribute fine.
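A rough sketch of that kind of inbound filter is shown here, as an illustration rather than the exact config we applied. The eBGP neighbor address is the one from the show output above; the AS number, the iBGP neighbor address (192.0.2.2), and the prefix-list and route-map names are placeholders, and it assumes both peerings live under the same VRF address family.

```
ip prefix-list LOCAL-DC-SUMMARY seq 5 permit 10.15.0.0/16
!
! Drop our own /16 when it is learned back via BGP, permit everything else
route-map BLOCK-LOCAL-SUMMARY deny 10
 match ip address prefix-list LOCAL-DC-SUMMARY
route-map BLOCK-LOCAL-SUMMARY permit 20
!
router bgp 65000
 address-family ipv4 vrf TEST
  ! eBGP peering toward the MPLS
  neighbor 10.128.0.42 route-map BLOCK-LOCAL-SUMMARY in
  ! iBGP peering toward the B2B router (hypothetical address)
  neighbor 192.0.2.2 route-map BLOCK-LOCAL-SUMMARY in
```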
Oddly enough, this only happened for one datacenter's /16 prefix. The other
datacenter worked fine.
Does anyone have any ideas? Thanks very much,
Received on Sat Aug 13 2011 - 15:53:44 ART