Partial IPv6 monitoring outage May 21st

Around May 21st a large number of IPv6 servers got marked as unhealthy by the pool monitoring system.

After a bit of debugging, the issue was brought to the attention of the network and co-location facility hosting the central NTP Pool servers, and it was remedied not long after.

It was tracked down to a configuration change, related to a router upgrade, that affected networks with particularly goofy BGP announcements, IPv6 networks especially.

So what happened…

We’ve been forklifting our LA core network from Cisco 7600 to ASR9K routers. Over the years we had developed a very specialized configuration for the 7600s to protect their limited resources as much as possible. Part of this configuration included a very restrictive maximum as-path filter.
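For context, on classic IOS this kind of cap is usually set with the `bgp maxas-limit` command; the ASN and limit below are hypothetical, just a minimal sketch of the sort of restrictive filter described:

```
! Hypothetical sketch: discard any BGP update whose AS_PATH
! is longer than 38 hops (ASN and limit are made up here)
router bgp 64500
 bgp maxas-limit 38
```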

Previously this wasn’t an issue: even though we were discarding routes whose as-paths exceeded our filter length, the routers had static default route entries. If there was no matching route in the table, the router would just send the traffic upstream to be dealt with.
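A minimal sketch of what those static defaults look like in IOS, with documentation next hops standing in for the real ones:

```
! Hypothetical static default routes: traffic with no more
! specific match is simply handed to the upstream next hop
ip route 0.0.0.0 0.0.0.0 192.0.2.1
ipv6 route ::/0 2001:db8::1
```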

Basically, a few things happened simultaneously that caused the issue and made it hard to troubleshoot:

  1. The maximum as-path limit was exceeded for several prefixes that included NTP Pool servers.
  2. During fiber relocations, default routes were not properly updated to their new interface locations.
  3. Unlike IOS, IOS-XR silently drops these prefixes instead of complaining loudly in the log.

This has all been remedied by adjusting the maximum as-path limits, correcting the default route entries, and making sure that proper logging is in place.
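On IOS-XR the as-path cap lives in routing policy language rather than a single knob; here is a minimal sketch with a hypothetical policy name and a deliberately generous threshold (the logging changes aren’t shown):

```
route-policy UPSTREAM-IN
  if as-path length ge 64 then
    drop
  endif
  pass
end-policy
```

The `drop` is what made this hard to spot: unlike the IOS behavior, nothing shows up in the log when a prefix is discarded this way.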

Hi @ask,

Thanks for the NTP Pool, it is great!

I offer some kind of monitoring: I could set up a daemon in Germany that constantly checks all IPv4 and IPv6 servers in the pool and sends the data to the monitoring station in LA. Then LA could use this information to provide more precise results. 🙂

Oliver

My time server is being disparaged by the IPv6 monitoring system (see https://is.gd/IMt5Mw), though less so by the IPv4 one (see https://is.gd/N5bQhY). Is it my network or the monitoring system’s?

I ran `while true; do sleep 70; ntpdate -q -p 1 2605:6000:101e:97::123; done` for a few hours on a few servers I have, and noticed some “no server suitable for synchronization found” messages once in a while, for example today between 17:32:24 and 17:37:08 (UTC). I would not blame the monitoring system for this.
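In case it’s useful to anyone else, here is a slightly expanded sketch of that loop that timestamps the failures; the interval and message wording are arbitrary:

```sh
#!/bin/sh
# Query the server every 70 seconds; log a UTC timestamp on any
# failed query so outage windows are easy to correlate later.
while true; do
  sleep 70
  if ! ntpdate -q -p 1 2605:6000:101e:97::123 >/dev/null 2>&1; then
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') no response from server"
  fi
done
```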

What kind of server are you using? KVM VPS, OpenVZ VPS, or dedicated?

It seems to work, but apparently sometimes it doesn’t.

I just found it odd that things looked different on IPv4 than on IPv6, even though both are on the same wire and the same machine.

Actually, other services running on this machine are hiccuping too, but NTP caught my attention sooner because it’s the best monitored.

Thanks for checking it out.

It’s a physical machine dedicated to Internet services.
