Around May 21st a large number of IPv6 servers got marked as unhealthy by the pool monitoring system.
It was fixed after a bit of debugging, once the problem was brought to the attention of the network and co-location facility hosting the central NTP Pool servers.
The cause was tracked down to a configuration change made during a router upgrade; it affected networks with particularly goofy BGP announcements, and IPv6 networks most of all.
So what happened…
We’ve been forklifting our LA core network from Cisco 7600 to ASR9K routers. Over the years we had developed a very specialized configuration for the 7600s to protect their limited resources as much as possible. Part of this configuration included a very restrictive maximum as-path filter.
Previously this wasn’t an issue because, even though we were discarding routes whose as-paths exceeded our filter length, the routers had static default route entries. If there was no matching route in the table, the router would just send the traffic upstream to be dealt with.
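To make the fallback behaviour concrete, here is a minimal sketch in Python. It is a toy model, not the actual router configuration: the limit of 3, the next-hop labels, and the function names `build_table` and `lookup` are all assumptions made for illustration, and the prefixes and AS numbers come from the ranges reserved for documentation.

```python
import ipaddress

MAX_AS_PATH_LEN = 3  # hypothetical limit, chosen only for illustration


def build_table(announcements, max_len=MAX_AS_PATH_LEN):
    """Keep only announcements whose AS path fits within the limit."""
    table = {}
    for prefix, as_path, next_hop in announcements:
        if len(as_path) <= max_len:
            table[ipaddress.ip_network(prefix)] = next_hop
        # Longer paths are discarded without any message, which mirrors
        # the silent behaviour described in this post.
    return table


def lookup(table, address, default_next_hop=None):
    """Longest-prefix match, falling back to the default route if one is set."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if matches:
        best = max(matches, key=lambda net: net.prefixlen)
        return table[best]
    return default_next_hop  # None means the packet has nowhere to go


# Illustrative announcements: the second one has a heavily prepended path.
announcements = [
    ("2001:db8:1::/48", [64500, 64501], "peer-a"),
    ("2001:db8:2::/48", [64500, 64502, 64502, 64502, 64502], "peer-b"),
]
table = build_table(announcements)

print(lookup(table, "2001:db8:2::10", "upstream"))  # filtered, default route saves it
print(lookup(table, "2001:db8:2::10"))              # filtered, no default route
```

With a working default route, the filtered prefix is still reachable via the upstream. Without one, the lookup returns nothing and the traffic is lost.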
Basically, a few things happened simultaneously to cause the issue and make it hard to troubleshoot:
- The maximum as-path limit was exceeded for several prefixes that included NTP Pool servers.
- During fiber relocations, default routes were not properly updated to their new interface locations.
- Unlike IOS, IOS-XR silently drops these prefixes instead of complaining loudly in the log.
This has all been remedied by adjusting the maximum as-path limits, correcting the default route entries, and making sure that proper logging is in place.
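As a rough illustration of why the logging part of the fix matters, the sketch below applies a maximum as-path filter and reports every prefix it rejects. It is written in Python rather than router configuration, and the function name `filter_with_logging`, the logger name, and the sample data are hypothetical.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("as_path_filter")  # hypothetical logger name


def filter_with_logging(announcements, max_len):
    """Split prefixes into accepted and rejected, logging each rejection."""
    accepted, rejected = [], []
    for prefix, as_path in announcements:
        if len(as_path) > max_len:
            logger.warning(
                "rejecting %s: AS path length %d exceeds limit %d",
                prefix, len(as_path), max_len,
            )
            rejected.append(prefix)
        else:
            accepted.append(prefix)
    return accepted, rejected


# Documentation prefixes and AS numbers, used only as sample input.
sample = [
    ("2001:db8:1::/48", [64500, 64501]),
    ("2001:db8:2::/48", [64500, 64502, 64502, 64502, 64502]),
]
accepted, rejected = filter_with_logging(sample, max_len=3)
```

Had each rejection been visible like this, the filtered prefixes would likely have been noticed well before the default routes went missing.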