Hi Jerry,
I've seen header synchronization errors during a topology migration in a live
environment, and the problem there was related to BGP segment corruption when
PMTUD was used to determine MSS sizes. However, the problem was very
intermittent and unpredictable; it affected multiple routers, but not at the
same time, even though they received the same BGP routes (so one or more other
factors must have been involved as well).
Packet captures showed that the BGP segments were being corrupted before they
were received, and the error message logged by the receiving router depended on
which part of the segment was corrupted. Whenever a router receives a bad BGP
segment, it also resets the session.
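
To illustrate why the error message varies with where the corruption lands,
here is a rough Python sketch of the header checks a BGP speaker makes. The
field layout and error names are taken from RFC 4271 (16-byte all-ones marker,
2-byte length, 1-byte type); the function name check_bgp_header is just
something I made up for the example, and a real implementation also applies
per-message-type length limits that I have left out.

```python
import struct

BGP_HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type
BGP_MAX_LEN = 4096           # maximum message size in RFC 4271
MARKER = b"\xff" * 16        # the marker field must be all ones
KNOWN_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def check_bgp_header(data: bytes) -> str:
    """Return the RFC 4271 header error a receiver would raise, or "ok".

    Hypothetical helper for illustration only.
    """
    if len(data) < BGP_HEADER_LEN:
        return "incomplete header"
    marker = data[:16]
    length, msg_type = struct.unpack("!HB", data[16:19])
    if marker != MARKER:
        # Corruption in the first 16 bytes: the header sync error.
        return "Connection Not Synchronized"
    if length < BGP_HEADER_LEN or length > BGP_MAX_LEN:
        # Corruption in the length field.
        return "Bad Message Length"
    if msg_type not in KNOWN_TYPES:
        # Corruption in the type field.
        return "Bad Message Type"
    return "ok"


# A valid KEEPALIVE is just the 19-byte header with type 4.
keepalive = MARKER + struct.pack("!HB", 19, 4)
bad_marker = b"\x00" + keepalive[1:]
bad_length = MARKER + struct.pack("!HB", 5, 4)
bad_type = MARKER + struct.pack("!HB", 19, 9)

for name, msg in [("keepalive", keepalive), ("bad_marker", bad_marker),
                  ("bad_length", bad_length), ("bad_type", bad_type)]:
    print(name, "->", check_bgp_header(msg))
```

Any of the three failures ends with a NOTIFICATION and a session reset, which
matches what we saw: the same underlying corruption, but different log messages
from one occurrence to the next.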
One of the links connecting the BGP neighbors had a 1500-byte MTU, whereas the
others were all much higher. Using the same MTU on all the links effectively
removed PMTUD from the MSS calculation, and resolved the issue.
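
The arithmetic behind that can be sketched in a few lines of Python. This
assumes IPv4 with no IP or TCP options (so MSS = MTU - 40), and it assumes the
router falls back to the 536-byte default MSS from RFC 1122 when PMTUD is off;
the exact fallback depends on the platform. The 9000-byte MTUs and the function
names are made up for the example, since all I can say about the real links is
that they were much higher than 1500.

```python
IPV4_HEADER_BYTES = 20   # assumes no IP options
TCP_HEADER_BYTES = 20    # assumes no TCP options
FALLBACK_MSS = 536       # RFC 1122 default; assumed fallback without PMTUD


def mss_from_mtu(mtu: int) -> int:
    """MSS that fits in one unfragmented IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def session_mss(link_mtus: list[int], pmtud_enabled: bool) -> int:
    """MSS for a session crossing link_mtus (first entry is the local link)."""
    if pmtud_enabled:
        # PMTUD converges on the smallest MTU along the path.
        return mss_from_mtu(min(link_mtus))
    return FALLBACK_MSS


def pmtud_changes_mss(link_mtus: list[int]) -> bool:
    """True if PMTUD has to lower the MSS below what the local link allows."""
    return mss_from_mtu(min(link_mtus)) != mss_from_mtu(link_mtus[0])


mixed = [9000, 1500, 9000]     # one small link in the path
uniform = [9000, 9000, 9000]   # same MTU everywhere

print(session_mss(mixed, pmtud_enabled=True))
print(session_mss(uniform, pmtud_enabled=True))
print(session_mss(mixed, pmtud_enabled=False))
print(pmtud_changes_mss(mixed), pmtud_changes_mss(uniform))
```

With uniform MTUs the path MTU equals the local interface MTU, so PMTUD never
has anything to adjust, and you keep the large MSS instead of dropping to the
fallback.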
If you have PMTUD enabled, try disabling it, clear the sessions, and see whether
the error recurs. Disabling PMTUD has the drawback of reducing the MSS, of
course, so using consistent MTUs is a better workaround for live environments.
Hope this helps,
Paul.
Received on Sat Jun 27 2009 - 08:56:59 ART