Massive packet loss on be.pool.ntp.org since 9/09/2019

Hello, sorry if this is the wrong place/section to post.

I’ve been using be.pool.ntp.org to monitor time differences between local equipment and standard time. I’ve been doing this from a number of networks, and I’ve been having issues since 09/09/2019: most requests are not being answered by the pool.

tcpdump snippet (1.1.1.1 is a placeholder IP; the capture was taken on my internet line/WAN interface):
16:37:26.752409 IP 1.1.1.1.54521 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:29.767974 IP 1.1.1.1.10032 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:32.783754 IP 1.1.1.1.48557 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:35.799376 IP 1.1.1.1.10576 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:38.804825 IP 1.1.1.1.63090 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:41.733746 IP 1.1.1.1.55554 > 193.190.198.14.ntp: NTPv4, Client, length 48
16:37:41.756921 IP 193.190.198.14.ntp > 1.1.1.1.ntp: NTPv4, Server, length 48
16:37:41.820461 IP 1.1.1.1.64601 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:44.836180 IP 1.1.1.1.63118 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:44.856588 IP 45.87.76.3.ntp > 1.1.1.1.65224: NTPv1, Server, length 48
16:37:46.867375 IP 1.1.1.1.49814 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:49.883147 IP 1.1.1.1.59978 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:52.891051 IP 1.1.1.1.16481 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:55.906746 IP 1.1.1.1.49212 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:58.922342 IP 1.1.1.1.48866 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:38:01.938018 IP 1.1.1.1.6169 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:38:01.957319 IP 45.87.76.3.ntp > 1.1.1.1.65230: NTPv1, Server, length 48
16:38:03.969375 IP 1.1.1.1.hp-san-mgmt > 45.87.76.3.ntp: NTPv1, Client, length 48
16:38:06.984976 IP 1.1.1.1.33443 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:38:10.000729 IP 1.1.1.1.44963 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:38:13.016323 IP 1.1.1.1.47305 > 45.87.76.3.ntp: NTPv1, Client, length 48

NTP requests to other NTP servers are working without issues (tested with ntp.belnet.be).
I’m experiencing this at multiple locations, across different internet connections/types and firewalls.
Is anyone else experiencing issues?

At the moment it’s a minor issue for appliances, as it works sometimes. I noticed it in my monitoring because I only check twice a day with a specific number of requests. As this often failed to deliver results, it came onto my radar.
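For anyone who wants to tally a capture like the one above, here is a minimal Python sketch. It assumes the tcpdump text format shown in the post; the function name `summarize` and the five-line excerpt are only for illustration. It counts client requests to one server, counts that server's replies, and reports the mean gap between requests.

```python
import re
from datetime import datetime

# Five lines copied from the capture above (1.1.1.1 is the placeholder client IP).
CAPTURE = """\
16:37:41.820461 IP 1.1.1.1.64601 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:44.836180 IP 1.1.1.1.63118 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:44.856588 IP 45.87.76.3.ntp > 1.1.1.1.65224: NTPv1, Server, length 48
16:37:46.867375 IP 1.1.1.1.49814 > 45.87.76.3.ntp: NTPv1, Client, length 48
16:37:49.883147 IP 1.1.1.1.59978 > 45.87.76.3.ntp: NTPv1, Client, length 48
"""

# timestamp, source, destination, NTP mode
LINE_RE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d+) IP (\S+) > (\S+): NTPv\d, (Client|Server), length \d+$"
)


def summarize(capture, server):
    """Return (requests sent to server, replies from server, mean seconds between requests)."""
    request_times = []
    replies = 0
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an NTP line
        stamp, src, dst, mode = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if mode == "Client" and dst == f"{server}.ntp":
            request_times.append(when)
        elif mode == "Server" and src == f"{server}.ntp":
            replies += 1
    gaps = [(b - a).total_seconds() for a, b in zip(request_times, request_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return len(request_times), replies, mean_gap


if __name__ == "__main__":
    sent, answered, gap = summarize(CAPTURE, "45.87.76.3")
    print(f"requests={sent} replies={answered} mean gap={gap:.2f}s")
```

The same function can be pointed at a full capture file by passing its text instead of the excerpt. Note that it ignores midnight rollover, since the timestamps carry no date.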

Hello and welcome. One of the volunteer pool admins here. The server 45.87.76.3 is reporting a score of 20, so it looks fine. The other options are some kind of network/routing issue, or the server has decided not to reply to you. :wink: Looking at the log, if I’m reading it right, it looks like you are requesting time from it every three seconds, so I would guess it has probably blocked you as an abusive client! :slight_smile:

My suggestion would be either to query the pool server no more often than, say, once every 15 minutes, or to set up an NTP server on your site (that connects to the pool) and direct your queries to the local NTP server.

Most default ntpd server configurations enforce a minimum interval of 8 seconds (per IP); a client that polls faster can be ignored, rate-limited, or sent a kiss-o’-death (KoD) packet…

Default NTP client poll intervals typically range from a minimum of 64 s to a maximum of 1024 s…
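To illustrate why a 3-second poll can go unanswered while a 64-second poll is fine, here is a deliberately simplified Python sketch of a per-IP minimum-interval check. This is not ntpd's actual algorithm, and the class name `MinIntervalLimiter` is hypothetical. The 8-second figure is taken from the post above, and the choice to count dropped packets against the client is an assumption of this sketch.

```python
class MinIntervalLimiter:
    """Toy per-IP limiter: answer only if the previous packet from that IP
    arrived at least `min_interval` seconds earlier.

    Assumption: every packet, answered or not, resets the timer, so a client
    that keeps polling too fast keeps being dropped.
    """

    def __init__(self, min_interval=8.0):
        self.min_interval = min_interval
        self.last_seen = {}  # client IP -> arrival time of its last packet

    def allow(self, client_ip, now):
        last = self.last_seen.get(client_ip)
        self.last_seen[client_ip] = now
        return last is None or (now - last) >= self.min_interval


if __name__ == "__main__":
    limiter = MinIntervalLimiter()
    fast = [limiter.allow("192.0.2.10", t) for t in (0, 3, 6, 9)]
    slow = [limiter.allow("192.0.2.20", t) for t in (0, 64, 128)]
    print("3 s polling:", fast)
    print("64 s polling:", slow)
```

Under these assumptions only the first of the fast client's packets is answered, while every packet from the client polling at the usual 64 s minimum gets a reply.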

If you could do a traceroute that would also be helpful, both to the server you are having problems with and other BE servers.

Thank you for the feedback. I’ll collect more information.

To be clear, the tcpdump capture was taken while debugging what was happening (w32tm /stripchart /computer:be.pool.ntp.org /dataonly).
Normally only one request is made per server every 4 hours or more.

The server giving the most issues is 45.87.76.3
I failed to diagnose any further. I’m only getting 1 in 20 responses or fewer. All of the impacted networks are on the provider Telenet (but then again, most proper internet lines over here are). Even my home line has the issue, though other Telenet lines are not impacted at all.
I’ve moved the systems over to ntp.belnet.be for now.

I can do traceroutes if requested (I can’t post the results in public).

Understood about the TCPDump, but when you say “I’m only getting 1 in 20 responses or less” - is that when requesting once per 4 hours?

I’m guessing from your replies that you’ve got a number of hosts on site requesting time from the pool? Roughly how many hosts are requesting? Do they all have the same public IP? Do they each individually request time from the pool or do you have an NTP server on site?

Could you post the section from the config that references the pool servers?

45.87.76.3 seems to respond to ping, so that or traceroute may help diagnose if it’s an internet issue.

I tried polling the server from a few different hosts in Europe and I do see the issue on one of them. It looks like there is filtering/rate-limiting specific to the port on the border between the Aorta and NTT networks, most likely as a mitigation for amplification attacks. mtr sending UDP packets to port 123 reports huge packet loss. Other ports are OK.

 7. de-fra04d-rc1-ae33-0.aorta.net (84.116.135.5)          0.0%
 8. fr-par02b-rd1-ae102-0.aorta.net (84.116.130.102)       0.0% 
 9. ae-21.r00.frnkge13.de.bb.gin.ntt.net (129.250.9.29)   72.2%
10. ae-1.r25.frnkge08.de.bb.gin.ntt.net (129.250.4.16)    77.8%
11. ae-9.r25.amstnl02.nl.bb.gin.ntt.net (129.250.3.77)    58.8%
12. ae-8.r03.amstnl02.nl.bb.gin.ntt.net (129.250.2.109)   77.8%

Hi, I tried making a post with test/log results, but it keeps getting filtered by this site’s spam system.

My result was that I was getting loss on requests using IPv4, but IPv6 was working fine. I’ve put my post on a pastebin instead: https://justpaste.it/3smnk

I’m 100% sure now that something is wrong external to my systems. Since it only affects NTP packets and not ICMP, I’m fairly certain something is wrong inside the pool network.

Hi, thanks for the extra info.

The NTP pool is a collection of machines that people around the world volunteer - there’s no “pool network” in the sense that you describe. There are central machines that monitor the volunteers’ servers to check that they are supplying sensible time, and central DNS to resolve IPs of volunteer servers near to the end users, but “the pool network” is just the internet. :slight_smile: The pool has a suggested config for pool servers, but operators are free to configure their machines how they like - it may be that the operator has more restrictions on IPv4 than on IPv6 traffic. There’s a huge amount more IPv4 NTP traffic than IPv6. We don’t have access to or control over volunteers’ servers.

Sorry, I’m not quite clear yet from your description of how NTP is configured on your sites. I’m guessing there could be up to 50 devices that are each requesting time from the NTP pool that are all behind a single public IP? If so the NTP pool server they’re talking to will see 50 times the traffic of one device coming from one IP, so is quite likely to treat it as abusive and not reply. 50 devices making one request every 4 hours is on average a request every 4.8 minutes.
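The arithmetic behind that figure, as a small Python sketch (the function name `mean_gap_minutes` is only for illustration, and it assumes the devices' requests are spread evenly over the polling period):

```python
def mean_gap_minutes(devices, hours_between_polls):
    """Average minutes between requests seen from one shared public IP,
    assuming each device polls once per period and the polls are evenly spread."""
    return hours_between_polls * 60 / devices


if __name__ == "__main__":
    for count in (20, 50):
        gap = mean_gap_minutes(count, 4)
        print(f"{count} devices, one poll per 4 h: one request every {gap:.1f} min")
```

In practice the requests are unlikely to be evenly spread. If many devices poll at nearly the same moment, the server sees a burst from a single IP, which is more likely to trip a rate limit than the average suggests.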

With 20 - 50 devices on site it would be better to set up an NTP server/service (or two) on each site, point the devices to that NTP server, and that one NTP server to the external NTP pool.

Are the devices Windows servers? Could you post the NTP config you are using?

Hi Elljay,

Each site has a local NTP server, and machines sync with that. On top of that there is a monitoring system that checks whether machines are actually in sync with an external source (which was the BE pool). It detects misconfigurations.
I’ve halved the number of requests being made now (and I’ve moved on to the Belnet NTP source). In the future, I’ll set up a central NTP source for the monitoring system.

The tests in my previous post were from my home PC.
No NTP packets were going to the servers from my IP address before I ran my tests.
The first requests I made over IPv4 did not get answered. I would have expected the first request to work, and successive requests to fail, if rate limiting were configured.

I’m happy for this post to end here. I thought it was something worth reporting, but perhaps the issue is all mine.
Have a great weekend!