Good, thanks! The visible score trend is very interesting. There are a few monitors that have issues, but a sufficient number work well enough to keep an almost perfect score. That seems to suggest that the low scoring is (likely) not due to the location of the monitors and their traffic crossing international connections.
Yes, “full” NTP clients (e.g., chronyd, NTP classic, NTPsec), often running on servers or PCs, will keep polling a server for a while after initially resolving its IP address. They have mechanisms to continually track upstream servers and to make continuous corrections to the local clock (called “disciplining” the clock) so it stays smooth, without jumps in time. That is why they hold on to the IP addresses they got initially across many queries to the upstream servers, and that is what you are still seeing.
The IoT devices you mention tend to implement a simpler variant of NTP called SNTP. Those typically don’t continuously adjust their clocks; they poll upstream servers periodically and then set the time according to the response. That means that, when configured with a DNS name pointing to the pool, they re-resolve it for every poll and, depending on the timing, may get different IP addresses each time. So when a server leaves the pool, it no longer receives any requests from this type of client.
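To make the contrast concrete, here is a minimal Python sketch of the SNTP-style behaviour (the hostname and poll interval are just placeholders, not what any particular device uses). The key point is the fresh DNS lookup before every poll, whereas a full client resolves once and keeps polling the same addresses while disciplining the clock:

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800   # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)
POOL_HOSTNAME = "pool.ntp.org"  # example only; a real device would use its vendor zone

def sntp_query(hostname: str, timeout: float = 2.0) -> float:
    """Resolve the hostname, send one SNTP request, return the server time (Unix seconds)."""
    # Fresh DNS lookup on every call -- this is what spreads SNTP clients
    # across the pool and makes them "forget" servers that have left it.
    addr = socket.getaddrinfo(hostname, 123, socket.AF_INET, socket.SOCK_DGRAM)[0][4]
    packet = b"\x1b" + 47 * b"\x00"  # LI=0, VN=3, Mode=3 (client request)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, addr)
        data, _ = s.recvfrom(48)
    tx_seconds = struct.unpack("!I", data[40:44])[0]  # transmit timestamp, integer part
    print(f"answered by {addr[0]}")
    return tx_seconds - NTP_EPOCH_OFFSET

if __name__ == "__main__":
    # Poll once an hour and simply step the clock each time -- no disciplining.
    while True:
        print(time.ctime(sntp_query(POOL_HOSTNAME)))
        time.sleep(3600)
```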
But non-IoT devices may also have only an SNTP client, e.g., older versions of MS Windows (newer versions have “high precision timekeeping” support, which sounds like a full(er) NTP implementation), or systemd-timesyncd, which is becoming more common on Linux.
I encourage you (and others) to set up your own RIPE Atlas software probe. You earn credits that way, which you can eventually spend on your own measurements, and you also help improve the coverage of the RIPE Atlas project so it becomes more useful for investigations like ours.
The additional load, both CPU- and network-wise, is rather small, and the network throughput can be limited to as low as 10 kbit/s. The only thing is that the tools are currently a tad chatty as far as system logging is concerned…
It really depends on the circumstances. Pings can be a good rough indicator of the network transmission conditions that NTP packets will also encounter, but that always needs to be taken with a grain of salt.
Obviously, ICMP is not NTP, so network operators may treat the two differently, and ping does not assess the performance of the NTP server itself, e.g., whether it is overloaded or otherwise dropping packets (for instance due to rate limiting).
So if one sees a certain network transmission behavior for ICMP, it is likely that NTP is affected as well, or vice versa, which gives an indication of how to proceed further in the investigation, always keeping in mind that the two might also behave quite differently.
Case in point: while the NTP measurement to time2.cloud.tencent.com clearly shows packet loss, a ping measurement from the same four vantage points as the original NTP measurement doesn’t indicate any relevant level of packet loss. In other cases, the correlation is nicely visible.
Since it is quite easy to use, ping is typically the first tool I reach for when troubleshooting a wide range of networking issues (provided the target doesn’t block it).
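For a quick and dirty comparison of the two from a single vantage point, something along these lines works (a sketch: it assumes a Linux-style `ping -c`, uses the hostname above purely as an example target, and needs no special privileges since the ICMP part is delegated to the system’s ping):

```python
import socket
import subprocess

TARGET = "time2.cloud.tencent.com"  # the example server from above
COUNT = 10

def ntp_loss(host: str, count: int, timeout: float = 2.0) -> int:
    """Send `count` minimal NTP client requests and count the ones with no reply."""
    lost = 0
    for _ in range(count):
        try:
            addr = socket.getaddrinfo(host, 123, socket.AF_INET, socket.SOCK_DGRAM)[0][4]
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                s.settimeout(timeout)
                s.sendto(b"\x1b" + 47 * b"\x00", addr)  # NTPv3 client packet
                s.recvfrom(48)
        except (socket.timeout, OSError):
            lost += 1
    return lost

if __name__ == "__main__":
    print(f"NTP: {ntp_loss(TARGET, COUNT)}/{COUNT} requests unanswered")
    # ICMP for comparison; ping prints its own packet-loss summary.
    subprocess.run(["ping", "-c", str(COUNT), TARGET])
```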
Sorry, I didn’t mean to imply there is anything wrong with your server. This can simply happen if two network operators don’t cooperate very well, or if some operator filters NTP traffic (there was a time when NTP servers could be abused for amplification attacks).
E.g., my Alibaba-hosted server in Singapore for some reason seems unreachable from the monitor in Finland, at least via NTP.
Good idea, looking forward to the outcome of the experiment. I hope 10 Mbps will be sufficient to cope with the potential peaks you’ll see; keeping my fingers crossed.
Variations in the Singapore zone are quite pronounced during the daily cycle.
There has been a proposal for an API to programmatically control the “netspeed” setting, e.g., to automatically manage traffic volume/rate in the face of bandwidth limits or traffic quotas. But the right combination of know-how and resources/time hasn’t been available so far to make that happen.
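Just to illustrate the idea behind such an API (everything here is hypothetical: the interface name, the thresholds, the netspeed steps, and especially the API call itself, which is only a stub since no such endpoint exists today), an automated client could periodically read the interface counters and step the netspeed down whenever the rate approaches a quota:

```python
import time

IFACE = "eth0"                    # interface serving NTP traffic (assumption)
LIMIT_BITS_PER_S = 8_000_000      # e.g., stay below ~8 Mbit/s on a 10 Mbit/s link
WINDOW_S = 300                    # averaging window in seconds
NETSPEED_STEPS = [1000, 500, 250, 100, 50, 25, 10, 3, 1]  # Mbit values, illustrative only

def tx_bytes(iface: str) -> int:
    """Read the transmit byte counter for one interface from /proc/net/dev (Linux)."""
    with open("/proc/net/dev") as f:
        for line in f:
            name, _, rest = line.strip().partition(":")
            if name.strip() == iface:
                return int(rest.split()[8])  # 9th stats column = TX bytes
    raise ValueError(f"interface {iface!r} not found")

def set_netspeed(mbit: int) -> None:
    # Placeholder: the proposed management API does not exist yet, so this
    # only reports what an automated client would request.
    print(f"would set pool 'netspeed' to {mbit} Mbit")

if __name__ == "__main__":
    step = 0
    before = tx_bytes(IFACE)
    while True:
        time.sleep(WINDOW_S)
        now = tx_bytes(IFACE)
        rate = (now - before) * 8 / WINDOW_S  # average outgoing bit/s over the window
        before = now
        if rate > LIMIT_BITS_PER_S and step + 1 < len(NETSPEED_STEPS):
            step += 1
            set_netspeed(NETSPEED_STEPS[step])  # back off one step
```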
A sledgehammer method to regulate the load on a server would be to expressly use the monitoring system for that purpose, just as it already implicitly regulates the traffic by dropping servers from the pool when they get overloaded. That is a bit more difficult to exploit these days, now that we (thankfully) have a much larger and more diverse set of monitors. And the feedback loop might be a bit too slow in severely underserved zones if the traffic rate is to be managed (i.e., when there isn’t a “hard” bandwidth limitation, but, e.g., too high a bitrate could trigger protection mechanisms).
Same here
No worries!