Partial IPv6 monitoring outage May 21st

Around May 21st a large number of IPv6 servers got marked as unhealthy by the pool monitoring system.

After a bit of debugging, the issue was brought to the attention of the network and co-location facility hosting the central NTP Pool servers, and it was remedied not long after.

It was tracked down to a configuration change, related to a router upgrade, that affected networks with particularly goofy BGP announcements, IPv6 networks especially.

So what happened…

We’ve been forklifting our LA core network from Cisco 7600 to ASR9K routers. Over the years we had developed a very specialized configuration for the 7600s to protect their limited resources as much as possible. Part of this configuration included a very restrictive maximum as-path filter.
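For context, on classic IOS this kind of cap is usually set with the `bgp maxas-limit` command; the ASN and limit below are hypothetical, just a minimal sketch of the sort of restrictive filter described:

```
! Hypothetical sketch: discard any BGP update whose AS_PATH
! is longer than 38 hops (ASN and limit are made up here)
router bgp 64500
 bgp maxas-limit 38
```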

Previously this wasn’t an issue: even though we were discarding routes whose as-paths exceeded our filter length, the routers had static default route entries. If there was no matching route in the table, the router would just send the traffic upstream to be dealt with.
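A minimal sketch of what those static defaults look like in IOS, with documentation next hops standing in for the real ones:

```
! Hypothetical static default routes: traffic with no more
! specific match is simply handed to the upstream next hop
ip route 0.0.0.0 0.0.0.0 192.0.2.1
ipv6 route ::/0 2001:db8::1
```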

Basically, a few things happened simultaneously that caused the issue and made it hard to troubleshoot:

  1. The maximum as-path limit was exceeded for several prefixes that included NTP Pool servers.
  2. During fiber relocations, default routes were not properly updated to their new interface locations.
  3. Unlike IOS, IOS-XR silently drops these prefixes instead of complaining loudly in the log.

This has all been remedied by adjusting the maximum as-path limits, correcting the default route entries, and making sure that proper logging is in place.
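On IOS-XR the as-path cap lives in routing policy language rather than a single knob; here is a minimal sketch with a hypothetical policy name and a deliberately generous threshold (the logging changes aren’t shown):

```
route-policy UPSTREAM-IN
  if as-path length ge 64 then
    drop
  endif
  pass
end-policy
```

The `drop` is what made this hard to spot: unlike the IOS behavior, nothing shows up in the log when a prefix is discarded this way.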

Hi @ask,

Thanks for the NTP Pool, it is great!

I offer some kind of monitoring: I could set up a daemon in Germany that constantly checks all IPv4 and IPv6 servers in the pool and sends the data to the monitoring station in LA. Then LA could use this information to provide more precise results. 🙂

Oliver

My time server is being disparaged by the IPv6 monitoring system (see https://is.gd/IMt5Mw), though less so by the IPv4 one (see https://is.gd/N5bQhY). Is it my network or the monitoring system’s?

I ran `while true; do sleep 70; ntpdate -q -p 1 2605:6000:101e:97::123; done` for a few hours on a few servers I have, and noticed some “no server suitable for synchronization found” messages once in a while, for example today between 17:32:24 and 17:37:08 (UTC). I would not blame the monitoring system for this.
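In case it’s useful to anyone else, here is a slightly expanded sketch of that loop that timestamps the failures; the interval and message wording are arbitrary:

```sh
#!/bin/sh
# Query the server every 70 seconds; log a UTC timestamp on any
# failed query so outage windows are easy to correlate later.
while true; do
  sleep 70
  if ! ntpdate -q -p 1 2605:6000:101e:97::123 >/dev/null 2>&1; then
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') no response from server"
  fi
done
```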

What kind of server are you using? KVM VPS, OpenVZ VPS, or dedicated?

It seems to work, but apparently sometimes it doesn’t.

I just found it odd that things looked different on IPv4 than on IPv6, even though both are on the same wire and the same machine.

Actually, other services running on this machine are hiccuping too, but NTP caught my attention sooner because it’s the best monitored.

Thanks for checking it out.

It’s a physical machine dedicated to Internet services.
