Gradually add/remove server to/from pool in parallel to score increase/decrease

PoolMUC · May 1, 2024, 2:14pm

Hi,

I was wondering whether it would be possible, as well as make sense, to have the pool gradually add/remove a server to/from the pool (i.e., scale the inclusion in DNS responses) in parallel to the server’s score increasing/decreasing, vs. the current binary on/off at score 10?

I’ve been wondering about that as part of recurring discussions on this forum related to challenges in underserved regions, but am now experiencing that myself first-hand (though not in a way that I couldn’t deal with by my own means).

E.g., instead of the netspeed setting being considered in a binary on/off fashion when the score crosses the 10 points boundary, more something like this:

0 < score < 10: fraction of inclusion in DNS = netspeed * 0% (= 0, as now)
15 <= score <= 20: fraction of inclusion in DNS = netspeed * 100% (= full “netspeed”, as now for the entire range 10-20)
10 <= score < 15: fraction of inclusion in DNS = netspeed * (score - 10) / 5 (new, somewhere between 0% and 100% of “netspeed”)
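
A minimal sketch of the mapping above, in bash with awk for the arithmetic. The function names are made up for illustration; only the thresholds 10 and 15 and the linear ramp come from the proposal, and this is not how the pool software is actually implemented.

```bash
#!/bin/bash

# Hypothetical helper: fraction of the configured netspeed that would count
# for DNS inclusion at a given score, per the proposal in this post.
inclusion_fraction() {
    local score="$1"
    LC_ALL=C awk -v s="$score" 'BEGIN {
        if (s < 10)       f = 0              # below 10: not in DNS, as now
        else if (s >= 15) f = 1              # 15..20: full netspeed, as now
        else              f = (s - 10) / 5   # 10..15: linear ramp (new)
        printf "%.2f\n", f
    }'
}

# Hypothetical helper: effective netspeed = configured netspeed * fraction
effective_netspeed() {
    local netspeed="$1" score="$2"
    LC_ALL=C awk -v n="$netspeed" -v f="$(inclusion_fraction "$score")" \
        'BEGIN { printf "%.0f\n", n * f }'
}
```

For example, `inclusion_fraction 12.5` gives half of the configured netspeed, so `effective_netspeed 1000 12.5` would come out at 500.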

One issue that was reported time and time again in various threads related to under-served zones was that adding a new server to such an under-served zone is a challenge because once the score crosses the boundary of 10 points, the server gets hit “right away” with the full traffic load corresponding to its netspeed share in that zone*. Enough to right away bring a less beefy server down again in the scoring, eventually/potentially leading to some kind of yo-yo effect for the server’s score, with related traffic pattern. Which in turn may have repercussions on other such servers in that zone, leading to a domino effect of servers dropping from the zone, as described a few times in this forum.

Having a more gradual “DNS share” increase does not solve the underlying problem of a zone being under-served, but might make it a bit easier to add servers to the zone, and might help in keeping the number of servers in such a zone steadier, helping all the servers in the zone.
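
To illustrate how a ramp would soften the initial hit, here is a rough sketch under a simplified model: a server's share of DNS responses in a zone is assumed to be its weight divided by the sum of all weights in the zone. That model is an assumption for illustration (the real process is more differentiated), and the function name is made up.

```bash
#!/bin/bash

# Simplified model (assumption): share = own weight / sum of all weights.
# own_weight is the server's effective netspeed, others_total is the sum of
# the effective netspeeds of all other servers in the zone.
zone_share_percent() {
    local own_weight="$1" others_total="$2"
    LC_ALL=C awk -v o="$own_weight" -v t="$others_total" 'BEGIN {
        total = o + t
        if (total <= 0) { print "0.0"; exit }   # empty zone: avoid dividing by zero
        printf "%.1f\n", 100 * o / total
    }'
}
```

With made-up numbers: a server with netspeed 1000 joining a zone whose other servers sum to 4000 would get about 20% of responses at full weight (`zone_share_percent 1000 4000`), but only about 4.8% at score 11, where the proposed ramp would count 20% of its netspeed (`zone_share_percent 200 4000`).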

@ask, you previously hinted at working on something like that, though at the time more in the context of dealing with “weird” server behavior. While I think the conclusion at the time was that the specific “tests” considered then might not have been useful (potentially triggering default rate limits of some server implementations), I think the functionality of such more nuanced inclusion in the pool would generally be useful in adequately-served zones as well (not only in under-served ones).

Namely by generally reducing exposure of clients to not-optimally scoring servers, be it because a server is on its (temporary) way out of the pool, i.e., while transitioning in -5 downward steps in scoring from high values to values below 10. Or be it because there are semi-persistent issues with a server, e.g., in connectivity or maintaining good offset, so the server’s score oscillates somewhere at the lower end of the 10…20 score range, or sometimes dips into the area below 10 points.

The above “formula” is just a proposal, trying to keep the general “in the pool above 10, out of the pool below 10” approach, limiting the gradual part to the lower half of the “in the pool” range to keep a sufficient score range where full “netspeed” is reached, and being relatively simple (linear relationship between score and DNS inclusion share in the transition area). But this could obviously be tweaked, e.g., as far as threshold values are concerned, especially after some potential real-life experience with such an approach.

* This description is a simplification, the actual process is a bit more differentiated, but still results in a potentially very steep/sudden increase in traffic load prone to cause issues, e.g., in my case “DDoS protection” outside of my control temporarily blocking all NTP traffic to my server.

paulgear · May 3, 2024, 8:51am

I really like this idea. Obviously it would need to be tested in practice to see how practical it is, but a graduated weighting makes a lot of sense if a host is drifting between 10 & 20.

Bas · May 4, 2024, 4:27pm

Zones are NOT under-serviced, you really have to stop making this point.
People should use pool.ntp.org and NOT local pools.

As the pool.ntp.org does have enough servers even for countries that have little or no own servers.
It doesn’t matter. It doesn’t.

@ask should point ALL local zones to the world-pool regardless.

In the past internet peers were expensive and not as good/fast as today, the local.pool.ntp.org is therefore obsolete.

In short, there is no such thing as areas that are under-served, really there isn’t any country that is starving of time.

People that still use it, should stop doing so.

There is just 1 pool: pool.ntp.org

Stop using outdated links, unless you are a company and you have your own pool-url.

My 2cts Bas.

PoolMUC · May 4, 2024, 5:01pm

Sure, as in the past, I agree, and have myself previously advocated for that. But as previously noted, until you have reached out to each and every client out there, and convinced them to change their ways, server operators in certain countries (and I am explicitly not using the term “zone”) are confronted with the challenge of being assigned too many clients, and need to deal with that. In a dream world, we wouldn’t have the issue. In reality, we do, and as it looks, for some time still.

Also, use of the “local” zones is not the only issue; the current GeoDNS approach is one as well. Even if you use the global zone, if the GeoDNS server gets an idea as to the country you are in, you’ll again end up limited to the servers of your country zone, even if you have configured the global zone, not a local one. See the sources mentioned here, and then try it yourself.

Bas · May 4, 2024, 5:21pm

Have you checked this? As is typical, the biggest WORLD servers get the most requests.

See my check for Belgium:

bas@workstation:~$ nslookup pool.ntp.org
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	pool.ntp.org
Address: 162.159.200.1
Name:	pool.ntp.org
Address: 45.87.77.15
Name:	pool.ntp.org
Address: 213.211.167.202
Name:	pool.ntp.org
Address: 162.159.200.123

bas@workstation:~$ nslookup 162.159.200.1
1.200.159.162.in-addr.arpa	name = time.cloudflare.com.

Authoritative answers can be found from:

Belgium isn’t starved. Also note, the last server is also Cloudflare.

Typically we are used as backup for major servers…does that matter? Nope.

Bas.

PoolMUC · May 4, 2024, 5:29pm

Yes.

A single query does not show you the entire set of servers that are allocated to a country zone, at least not if there are 4 or more servers in the zone. So it shows exactly nothing as far as the topic of under-served zones is concerned.

Then why do you bring it up? Because Belgium isn’t starved, other zones can’t be, either? I don’t understand.

PoolMUC · May 4, 2024, 5:34pm

See the list of servers returned for pool.ntp.org from 9.9.9.9, and the number of times, when queried from Singapore over the last few minutes:

  4 IPv4 Address: 172.104.44.120
  9 IPv4 Address: 139.162.96.56
 10 IPv4 Address: 119.28.230.190
 17 IPv4 Address: 209.58.185.100
 22 IPv4 Address: 47.241.41.246
 23 IPv4 Address: 144.126.242.176
 26 IPv4 Address: 133.243.238.163
 29 IPv4 Address: 129.250.35.250
 31 IPv4 Address: 167.172.70.21
 34 IPv4 Address: 94.237.79.110
 36 IPv4 Address: 203.123.48.1
 37 IPv4 Address: 133.243.238.243
 51 IPv4 Address: 129.250.35.251
 51 IPv4 Address: 45.125.1.20
 52 IPv4 Address: 167.71.195.165
 78 IPv4 Address: 61.239.100.17
180 IPv4 Address: 23.106.249.200
382 IPv4 Address: 52.148.114.188
515 IPv4 Address: 162.159.200.1
529 IPv4 Address: 162.159.200.123

Doesn’t look like the world to me. Even with the short sample period, “the world” should have more servers than that.

Will run the test overnight, let’s see tomorrow if there are significantly more servers in that list.

PoolMUC · May 4, 2024, 6:02pm

If I run the same test from a location in Germany, I get 150+ servers after less than two minutes of running it. Above 175 since writing the previous number, and still rising…

Would be interesting to see what the view from within Belgium looks like.

200+ and still rising, even if a bit slower now…

Singapore at 20 now for a while.

Badeand · May 4, 2024, 6:52pm

Well I got curious now and did dig +short pool.ntp.org >> pool.txt in a loop with a short sleep in it. From Norway:

136 162.159.200.1
122 162.159.200.123
113 195.64.118.26 NO, Norway
100 185.175.56.208 NO, Norway
98 185.175.56.95 NO, Norway
91 185.181.61.91 NO, Norway
85 192.36.143.130 SE, Sweden
82 80.203.110.169 NO, Norway
55 185.35.202.197 NO, Norway
38 91.189.182.32 NO, Norway
33 152.65.32.101 NO, Norway
31 193.150.22.56 NO, Norway
31 185.42.170.200 EE, Estonia
29 193.150.22.36 NO, Norway
19 62.101.228.30 NO, Norway
19 185.41.243.43 NO, Norway
9 79.160.225.150 NO, Norway
5 77.40.226.121 NO, Norway
4 80.89.32.121 NO, Norway
4 79.160.225.13 NO, Norway
3 62.92.229.27 NO, Norway
3 51.175.182.153 NO, Norway
2 80.89.32.122 NO, Norway

I got my own servers 198 times
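
For anyone wanting to repeat this, a rough sketch of such a sampling loop. The function names, the output file name, and the 5-second sleep are assumptions for illustration, not the exact commands used for the numbers above.

```bash
#!/bin/bash

# Append one round of answers for a name to a file. The optional third
# argument selects a resolver (for example 9.9.9.9); without it, dig uses
# the system default resolver.
sample_once() {
    local name="$1" out="$2" resolver="${3:-}"
    if [ -n "$resolver" ]; then
        dig +short "@$resolver" "$name" >> "$out"
    else
        dig +short "$name" >> "$out"
    fi
}

# Count how often each address was returned, most frequent first.
tally() {
    sort "$1" | uniq -c | sort -rn
}

# Example loop (not run automatically; stop it with Ctrl-C):
#   while true; do sample_once pool.ntp.org pool.txt; sleep 5; done
#   tally pool.txt
```

Keep the sleep reasonably long, since results are cached by the resolver for some time anyway and hammering the DNS servers adds nothing.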

PoolMUC · May 4, 2024, 7:20pm

Cool, thanks for sharing!

A bit less than the number of servers currently reported active for Norway, but running the test a bit longer would likely increase the number.

I’m at 34 and 355 for Singapore and Germany now. More than the active servers reported for Singapore, less than reported active for Germany.

By the way, how did you get the nice geolocating for the IP addresses?

Badeand · May 4, 2024, 7:25pm

Had ChatGPT write me a little script

#!/bin/bash

# File containing the sorted unique IPs, one "count IP" pair per line
input_file="sorted_pool.txt"

# Output file to store the results with country codes
output_file="ip_with_country.txt"

# Read the count and the IP address from each line
while read -r count ip _
do
    # Perform geoiplookup on the IP, extract the country code and name
    # (the part after the colon in the geoiplookup output)
    country=$(geoiplookup "$ip" | awk -F: '{if (NF>1) print $2}' | awk '{print $1, $2}')

    # Append the result in the format "count IP country"
    echo "$count $ip $country" >> "$output_file"

done < "$input_file"

echo "Data with country codes collected in $output_file."

sorted_pool.txt would be the output from sort and uniq

Want me to do that?

PoolMUC · May 4, 2024, 7:38pm

Ah, smart, hadn’t thought about that. Thanks!

Would add some first-hand/hands-on insight into the topic, complementing/underlining the systematic findings in the paper/blog referred to earlier. So in that sense, not strictly needed, but nice if not too much trouble.

But also potentially opens up another can of worms: Why does one see more servers in a zone than reported active for it? What if one sees (significantly) less? May touch upon the controversial topic of DNS TTL settings and DNS caching, and others.

Badeand · May 4, 2024, 8:02pm

I’m just lazy

3512 IP samples now, up from 1112. 26 unique IP addresses, 3 more than before, but still not all 30 in the “no” zone for IPv4.

382 162.159.200.123
358 162.159.200.1
314 195.64.118.26 NO, Norway
310 185.175.56.208 NO, Norway
299 80.203.110.169 NO, Norway
282 185.181.61.91 NO, Norway
275 185.175.56.95 NO, Norway
259 192.36.143.130 SE, Sweden
155 91.189.182.32 NO, Norway
124 193.150.22.56 NO, Norway
119 193.150.22.36 NO, Norway
117 185.35.202.197 NO, Norway
114 185.42.170.200 EE, Estonia
109 152.65.32.101 NO, Norway
96 185.41.243.43 NO, Norway
42 62.101.228.30 NO, Norway
38 77.40.226.121 NO, Norway
36 79.160.225.150 NO, Norway
25 79.160.225.13 NO, Norway
16 51.175.182.153 NO, Norway
13 217.170.194.241 NO, Norway
8 77.95.79.99 NO, Norway
7 80.89.32.121 NO, Norway
6 62.92.229.27 NO, Norway
4 85.166.50.189 NO, Norway
4 80.89.32.122 NO, Norway

Got my own servers 582 times, 16.6%. Granted, they’re on max net speed.
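
The percentage is simply 582 of the 3512 samples. A small sketch for computing such a share from a tally file in the "count IP ..." format shown above; the function name is made up, and the addresses in the usage note are documentation placeholders, not the actual servers.

```bash
#!/bin/bash

# Sum the counts of the given addresses in a "count address ..." tally file
# and print them as a percentage of all samples in that file.
own_share_percent() {
    local tally_file="$1"; shift
    LC_ALL=C awk -v own="$*" '
        BEGIN { n = split(own, a, " "); for (i = 1; i <= n; i++) mine[a[i]] = 1 }
        { total += $1; if ($2 in mine) hits += $1 }
        END { if (total > 0) printf "%.1f\n", 100 * hits / total; else print "0.0" }
    ' "$tally_file"
}
```

Usage would be along the lines of `own_share_percent ip_with_country.txt 192.0.2.1 192.0.2.2`, listing one's own server addresses after the file name.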

PoolMUC · May 4, 2024, 8:25pm

Thanks! Very interesting. I don’t have a reference at hand, with details, but seeing servers from neighboring countries is somehow baked into the system as well, I think. Like one server from Sweden and Estonia in your case, my client in Singapore gets servers from Japan, Hong Kong, and even the USA. Plus the global Cloudflare anycast addresses.

So one is not entirely limited to servers of one’s own zone, but also a far cry from having access to the global set of servers.

Need to re-read the paper/blog, to understand why (if I recall correctly) they really had zones where they got only one server. As per our finding, I would have expected for a very small number of servers from neighboring countries to seep in. But maybe in case of the main example in Africa, there aren’t any servers in neighboring countries sufficiently “close by”. Or maybe going through a recursive resolver, vs. directly to the origin server, makes a difference.

Anyway, something for another day. But either way, an overhaul of the current GeoDNS approach would be quite welcome…

PoolMUC · May 4, 2024, 8:34pm

One thread I had in mind seems to indicate that I mis-remembered. I.e., according to one statement, in this example referring to the German country zone:

But the thread is also for the inverse problem, why does a server see clients from another country when it isn’t part of that country zone.

But, that’s a topic for another day…

Badeand · May 4, 2024, 8:44pm

The Swedish server is part of multiple zones
https://www.ntppool.org/scores/192.36.143.130

The Estonian one is just in the Norwegian zone, so the geoip database in Debian is probably just wrong on this one
https://www.ntppool.org/scores/185.42.170.200

PoolMUC · May 5, 2024, 10:57am

Ah, true, thanks! I guess it was a bit too late already last night…

Here now the outcome of having my tests run overnight:

Nameserver	Domain	Client set ECS	Unique servers
8.8.4.4	pool.ntp.org	no	24
9.9.9.9	pool.ntp.org	no	51
9.9.9.9	pool.ntp.org	yes	49
9.9.9.9	sg.pool.ntp.org	no	24
9.9.9.9	asia.pool.ntp.org	no	152

So pool.ntp.org is better than the country zone in this case - at least with certain nameservers. It looks like 9.9.9.9 is causing the mixing in of a few servers from neighboring countries, I guess maybe due to their design for privacy, trying to hide clients’ locations from upstream nameservers. The country zone really only returns servers from that zone, even with 9.9.9.9. And the continent zone stays somewhat short of the full amount of servers that supposedly are in that zone. I guess the more servers there are in a zone, the more smaller ones (lower netspeed) kind of get “squeezed out” by the ones with larger netspeeds/shares.
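
A rough sketch of how a "Unique servers" figure like the ones in the table could be collected per nameserver and domain. This is not necessarily how the numbers above were produced: the function names are made up, and whether a given resolver honors a client-supplied EDNS Client Subnet option (dig's `+subnet`) is up to that resolver.

```bash
#!/bin/bash

# Query one resolver for one name, optionally attaching an EDNS Client
# Subnet option (e.g. 192.0.2.0/24), and append the answers to a file.
query_with_ecs() {
    local resolver="$1" name="$2" out="$3" subnet="${4:-}"
    if [ -n "$subnet" ]; then
        dig +short +subnet="$subnet" "@$resolver" "$name" >> "$out"
    else
        dig +short "@$resolver" "$name" >> "$out"
    fi
}

# Number of distinct addresses collected in a file so far
unique_servers() {
    sort -u "$1" | awk 'END { print NR }'
}

# Example (not run automatically), one output file per table row:
#   query_with_ecs 9.9.9.9 sg.pool.ntp.org sg_quad9.txt
#   unique_servers sg_quad9.txt
```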

Below the country mix for the servers returned by 9.9.9.9 for the global zone.

So it looks like the recommendation to use the global zone is still somewhat right, though it depends on the nameserver used: the result is not really much better than the country zone, and certainly way below the large number of servers in the global pool overall. I.e., certainly not the silver bullet some of us, including myself, were hoping for/expecting. The continent zone might be a bit better as far as the number of servers is concerned. But that also yielded some servers with delay greater than 300ms, and noticeable offset values of 30+ ms.

@bas, let’s hope we don’t need to wait too much longer to see our dreams of an improved GeoDNS service come true. Though I fully trust @ask is aware of the various issues, as hinted at in various threads in this forum, where he describes his ideas for changes, e.g., also addressing the concerns raised in the paper and illustrated by these tests. And that he is working diligently on addressing them, balancing his available resources with the priorities, first one being keeping the pool service stable, as in available.

1 45.76.218.37  JP, Japan
2 122.248.201.177  SG, Singapore
2 167.179.119.205  JP, Japan
2 18.180.64.47  JP, Japan
3 122.215.240.51  JP, Japan
3 223.255.185.3  HK, Hong Kong
4 118.143.17.82  HK, Hong Kong
4 202.181.103.212  JP, Japan
5 45.11.104.223  HK, Hong Kong
7 103.214.22.185  AU, Australia
7 157.119.101.135  HK, Hong Kong
9 203.9.150.169  HK, Hong Kong
16 172.105.204.105  JP, Japan
17 47.243.51.23  HK, Hong Kong
19 133.130.121.141  JP, Japan
30 172.104.34.44  SG, Singapore
30 45.77.20.103  JP, Japan
31 128.199.243.248  SG, Singapore
37 129.150.48.1  SG, Singapore
37 202.182.111.234  JP, Japan
42 106.10.186.201  SG, Singapore
49 160.16.113.133  JP, Japan
64 106.10.186.200  SG, Singapore
69 194.36.178.157  SG, Singapore
74 119.28.230.190  HK, Hong Kong
76 139.162.96.56  JP, Japan
80 165.173.8.64  SG, Singapore
102 51.79.159.86  SG, Singapore
129 45.77.243.81  SG, Singapore
147 137.184.250.82  SG, Singapore
213 118.189.187.101  SG, Singapore
245 17.253.84.253  HK, Hong Kong
428 209.58.185.100  HK, Hong Kong
458 47.241.41.246  SG, Singapore
466 144.126.242.176  SG, Singapore
472 133.243.238.243  JP, Japan
488 167.172.70.21  SG, Singapore
494 203.123.48.1  SG, Singapore
504 129.250.35.250  US, United States
518 129.250.35.251  US, United States
521 172.104.44.120  SG, Singapore
553 133.243.238.163  JP, Japan
564 94.237.79.110  SG, Singapore
573 45.125.1.20  HK, Hong Kong
881 167.71.195.165  SG, Singapore
924 218.186.3.36  SG, Singapore
962 61.239.100.17  HK, Hong Kong
3284 23.106.249.200  SG, Singapore
4139 52.148.114.188  SG, Singapore
8531 162.159.200.123  IP Address not found
8552 162.159.200.1  IP Address not found

Bas · May 5, 2024, 2:24pm

Yes and no, the nameserver can have a big effect on how often the answers from pool.ntp.org change.
If you use a nameserver that caches for a long time then you will end up with the same results more often.

The nameservers I typically use, also the fastest: 1.1.1.1 and 1.0.0.1

I only use others for backup.

As for squeezed out, that is correct, when you lower the netspeed you get fewer requests. As it should be.

How many servers do you want/need? As you do not know if a zone is having problems with time-keeping.

I doubt any of them do.

Again, I fail to see your point.

Also, many ISPs have their own NTP-servers and broadcast those via their modems/routers if you don’t change anything.

It’s too simplistic to count servers and presume there are time-serving issues.

My 2 cts,

Bas.

Bas · May 5, 2024, 2:32pm

That is where the multiple-monitors come into play, they test from all over the planet and servers that are bad are removed from the pool until they are good again.

Before there was just 1 monitor and the monitor ITSELF was unstable, thus marking servers as bad while they were good.

This isn’t the case anymore.

The pool only gives good and stable servers, and it really doesn’t matter if they are 50 or 500km away.

I use servers from all over Europe as reference to make sure my time is correct and not wrong.
However they are hand-picked stratum 1 servers, just for reference to my GPS.

davehart · May 5, 2024, 5:54pm

PoolMUC:

Here now the outcome of having my tests run overnight:

Nameserver	Domain	Client set ECS	Unique servers
8.8.4.4	pool.ntp.org	no	24
9.9.9.9	pool.ntp.org	no	51
9.9.9.9	pool.ntp.org	yes	49
9.9.9.9	sg.pool.ntp.org	no	24
9.9.9.9	asia.pool.ntp.org	no	152

Thanks for sharing this. I have a couple of comments that may be relevant.

I suspect all these nameservers are anycast, as are Cloudflare’s 1.1.1.1/1.0.0.1 and their IPv6 equivalents. That means the server you reach depends on where your queries are coming from. Essentially, you are querying the closest instance, not as the crow flies, but based on internet routing. @ask has described how the pool DNS service accounts for the locale of the query source, as seen by the pool.ntp.org authoritative nameservers. This requires any methodical survey of pool.ntp.org DNS to spread its queries around the world.

Moreover, I am aware of a project which by design attempts to harvest all the pool server IP addresses, and in supposedly-rare circumstances, query a substantial portion of them. I have expressed strong disapproval of this strategy for fear it would eventually destroy the pool as we know it. It seems conceivable that there are intentional or unintentional mitigations already in place that make such harvesting very difficult.
