Regulating the load of the NTP servers

There is a need to regulate the load an NTP server receives, not relative to other NTP servers as it is today, but in absolute terms. An NTP server owner must be able to declare the maximum number of queries their server is ready to handle during a given time interval. This would guarantee that the volunteer’s infrastructure does not get overloaded, which would otherwise eventually lead the volunteer to leave the pool.

That would also require a regulatory loop implemented in the pool infrastructure. The basic information required for this feedback loop is the absolute load of the NTP servers, already discussed in detail in this thread:

The ratio between the actual load and the declared maximum load of an NTP server would be a key factor for the geoDNS server when deciding how frequently the IP address of that NTP server appears in DNS replies to pool queries.
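For illustration, a sketch in Go of such a weighting could look like this (the function name and the linear fall-off are placeholders of my own, not the pool’s actual geoDNS logic):

```go
package main

import "fmt"

// SelectionWeight maps the load ratio (actual queries per second divided by
// the operator-declared maximum) to a relative probability of the server's
// IP address appearing in a DNS reply. This linear fall-off is just one
// possible shape, chosen here for simplicity.
func SelectionWeight(actualQPS, declaredMaxQPS float64) float64 {
	if declaredMaxQPS <= 0 {
		return 0 // no declared capacity: never hand the server out
	}
	ratio := actualQPS / declaredMaxQPS
	if ratio >= 1.0 {
		return 0 // at or over the declared maximum: stop handing out
	}
	return 1.0 - ratio // weight shrinks as the headroom shrinks
}

func main() {
	fmt.Println(SelectionWeight(200, 1000))  // 0.8: plenty of headroom
	fmt.Println(SelectionWeight(900, 1000))  // 0.1: nearly saturated
	fmt.Println(SelectionWeight(1200, 1000)) // 0: over capacity
}
```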

I am proposing an action plan item: discuss the implementation details of NTP server load feedback in the already quoted thread.


There are many questions to be answered and challenges to be overcome, but if it were possible, this would be a good thing, I think. It would help servers in some areas not get bored at the “3Gbit” setting, while it would help servers in other areas not get overloaded by many Mbit/s of traffic even at the currently lowest setting of “512kbit”. It could help grow the number of smaller servers added to the pool, despite structural challenges in some regions.

At the same time, this does not address one crucial challenge:

What to do when the demand is higher than the available capacity per such fixed settings? Where does the excess go? Can it just be ignored? Maybe hand out not four records per IP protocol version, but only one, so clients use fewer servers simultaneously (as far as the implementation supports it, and users don’t manually try to override it)?

Or…?


I do not think it is a crucial question, but it is still a valid one. There are multiple possible ways to go. One option, a radical one, is to give a DNS answer without address records for excess traffic requests. It is more important to protect the volunteer’s infrastructure than to provide the time service.
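For illustration only, a sketch of such a policy, combining the single-record idea from the previous post with the empty-answer option (the 75% threshold and the function name are arbitrary placeholders, not anything the pool implements):

```go
package main

import "fmt"

// RecordsToHandOut is a hypothetical policy: return the usual four address
// records while the zone has spare capacity, shrink to one as it tightens,
// and refuse address records entirely (the "radical" option) once demand
// exceeds the declared capacity.
func RecordsToHandOut(estimatedDemand, declaredCapacity float64) int {
	switch {
	case declaredCapacity <= 0 || estimatedDemand > declaredCapacity:
		return 0 // excess demand: empty answer, protect the volunteers first
	case estimatedDemand > 0.75*declaredCapacity:
		return 1 // getting tight: a single record per query
	default:
		return 4 // normal operation: four records, as today
	}
}

func main() {
	for _, demand := range []float64{500, 800, 1100} {
		fmt.Printf("demand %.0f of 1000: %d records\n",
			demand, RecordsToHandOut(demand, 1000))
	}
}
```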

Indeed, that would arguably be a somewhat radical approach: essentially refusing service to clients.

But it could simplify the implementation, at the cost of not being exact in the sense of a feedback loop controlling the actual traffic. Using the rate of DNS requests as a rough measure of current demand, calibrated periodically against the actual traffic seen by a subset of servers reporting their load, a mechanism similar to today’s could be used. It just wouldn’t hand out server IP addresses in proportion to the current netspeed setting versus the sum of all netspeed values, but would match them against the estimated rate.

I think some of the work done by Ask already goes somewhat in that direction, e.g., recording the number of times a server’s IP address is returned for queries and comparing that to the netspeed fraction (the relatively new “Client distribution” section on a server’s management page). That number of DNS responses containing a server’s IP address would then just need to be matched against the traffic rate it is estimated to cause on the server.
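A minimal sketch of such a calibration, with made-up numbers and hypothetical function names, not the pool’s actual code:

```go
package main

import "fmt"

// calibrationFactor estimates, from a subset of servers that report their
// actual NTP traffic, how many NTP queries per second one DNS response
// containing a server's address causes on average. Both slices are assumed
// to be the same length, one entry per reporting server.
func calibrationFactor(reportedNTPQPS, dnsResponsesPerSec []float64) float64 {
	var ntp, dns float64
	for i := range reportedNTPQPS {
		ntp += reportedNTPQPS[i]
		dns += dnsResponsesPerSec[i]
	}
	if dns == 0 {
		return 0
	}
	return ntp / dns
}

// estimatedLoad projects the factor onto a server that does not report its
// traffic, using only the rate at which its address appears in DNS answers.
func estimatedLoad(dnsResponsesPerSec, factor float64) float64 {
	return dnsResponsesPerSec * factor
}

func main() {
	// Hypothetical numbers for three servers that report their actual load.
	factor := calibrationFactor(
		[]float64{5000, 1200, 300}, // actual NTP queries/s they observe
		[]float64{40, 10, 2},       // DNS responses/s naming each of them
	)
	fmt.Printf("~%.0f NTP queries caused per DNS response\n", factor)
	// Project onto a non-reporting server appearing in 8 responses/s.
	fmt.Printf("estimated load: %.0f queries/s\n", estimatedLoad(8, factor))
}
```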


Some DNS requests would result in one NTP client, others in thousands of clients (or orders of magnitude more), depending on whether the query originates from someone’s own DNS resolver or from a big public resolver like 1.1.1.1 or 8.8.8.8.
Measuring DNS traffic with calibration servers is probably better than nothing, but I am not sure about its reliability.


Just brainstorming further: what if server operators could define the frequency with which the IP address of their server may appear in DNS answers, instead of defining the bandwidth relative to the global load as it is today? The server operators could really fine-tune their own load, eliminating the need for the complex logic to create a proxy for the actual load of the NTP servers.
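As a sketch of what that could mean on the geoDNS side, a per-server token bucket along these lines (hypothetical type and field names of my own, not existing pool code):

```go
package main

import (
	"fmt"
	"time"
)

// appearanceLimiter is a hypothetical per-server token bucket: the operator
// declares how often per second their server's address may appear in DNS
// answers, and the DNS server skips the server once the bucket is empty.
type appearanceLimiter struct {
	maxPerSec float64   // operator-declared appearances per second
	tokens    float64   // current budget
	last      time.Time // time of the last refill
}

// allow reports whether the server may appear in one more DNS answer now.
func (l *appearanceLimiter) allow(now time.Time) bool {
	// Refill proportionally to elapsed time, capped at one second's budget.
	l.tokens += now.Sub(l.last).Seconds() * l.maxPerSec
	l.last = now
	if l.tokens > l.maxPerSec {
		l.tokens = l.maxPerSec
	}
	if l.tokens < 1 {
		return false // budget exhausted: leave this server out of the answer
	}
	l.tokens--
	return true
}

func main() {
	l := &appearanceLimiter{maxPerSec: 2, tokens: 2, last: time.Now()}
	for i := 0; i < 4; i++ {
		fmt.Println(l.allow(time.Now())) // true, true, then false
	}
}
```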

Yes, sure, that is what is going on behind the scenes. But this fine-grained picture isn’t what actually drives this: given a large enough population, as can be assumed for the pool, including most zones, it will statistically boil down to some average value.

Sure, a closed-loop control system will typically be more accurate than an open-loop one. But it also takes more effort and is more complex.
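For comparison, a minimal closed-loop sketch: a proportional adjustment of a server’s hand-out weight towards a reported-load target (the gain of 0.5 is arbitrary and purely illustrative):

```go
package main

import "fmt"

// adjustWeight nudges a server's hand-out weight in proportion to the error
// between its reported load and the operator's target. A positive error
// (below target) raises the weight; a negative error lowers it.
func adjustWeight(weight, reportedQPS, targetQPS float64) float64 {
	if targetQPS <= 0 {
		return 0
	}
	err := (targetQPS - reportedQPS) / targetQPS
	weight *= 1 + 0.5*err // 0.5 is an illustrative gain, not a tuned value
	if weight < 0 {
		weight = 0
	}
	return weight
}

func main() {
	w := 1.0
	// An overloaded server converging towards a 1000 qps target.
	for _, qps := range []float64{1500, 1200, 1050, 980} {
		w = adjustWeight(w, qps, 1000)
		fmt.Printf("reported %.0f qps -> new weight %.2f\n", qps, w)
	}
}
```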

Sounds like what I proposed, except that the input target value has a different unit: “frequency with which the IP address of their server may appear in DNS answers” rather than “a bitrate value scaled to the frequency with which the IP address of their server may appear in DNS answers”. I.e., similar to what is currently called “netspeed”, misleadingly given in some kbit, Mbit, or Gbit, but with a better name, and not being relative.

That would get rid of the (complexity of the) automatic feedback loop and replace it with manual operator intervention, which could be good enough.

In a way, that is what we have today, except that currently, part of the equation is a bit of a moving target, at least in underserved zones, and the granularity of the inputs doesn’t really give operators full control. But in well-served zones, that is pretty much what we have today: set a “netspeed” value, observe the traffic, and adjust the netspeed value until you get the desired throughput. In a well-served zone, that should remain sufficiently stable for some time, until traffic patterns change, e.g., the effect you criticised above of how the DNS rate maps to actual traffic when the resolver landscape/usage changes.
