I run a server that could easily handle 10TB of traffic monthly, but I get much less traffic than that from the pool.
So I was trying to determine where all the other traffic went, and it seems a very large part of it goes to time.cloudflare.com (at least for the NL zone). They seem to have a larger share than “normal” volunteers in the pool.
I think this is a bit strange. I expected the pool to consist of volunteers who do this as a hobby. But if the NTP Pool instead forwards most queries to big companies like Cloudflare, Google, etc., what incentive is left for people to choose the pool’s addresses instead of directly using “time.cloudflare.com”?
Is there any reason why the pool prefers large commercial companies over the volunteers? If it were only for zones where there is a shortage of volunteers, it would make sense.
I think it’s a limitation of the system that there is only one global speed per IP address instead of zone-specific speeds, and the reason for giving the Cloudflare servers a higher speed setting (2 Gb) is to help more in the underserved zones. If the system allowed zone-specific speeds, it would make sense to decrease their share in the well-served zones.
One way to get a bigger share of the pool traffic is to add multiple addresses for the same server. In some old deduplication tests I saw one server with 7 IPv4 addresses in the pool.
Strange, I thought the purpose of the pool was to aggregate NTP servers for public use behind a fixed single (or small number of) DNS domain names that can be burned into embedded devices and never managed by end users. If Cloudflare and Google can handle most of the pool’s traffic then that’s a good thing: it means less load on the other volunteers to achieve the end goal.
Who says Google and Cloudflare aren’t just a scaled up version of volunteering as a hobby? Do they get revenue from NTP packets? Is NTP a profit-making business for them? It’s more likely that public NTP service is a tiny cost compared to overheads they were already paying to run private NTP service for their own use.
Why do you want more than your share of NTP traffic? Do you get revenue from NTP packets?
10 TB/month is about 31 Mbit/s sustained. 4 Gbit/s would be roughly 1300 TB/month, which would put small volunteers on metered hosting offline in a matter of hours. Not something we want to enable on a web form where users can click the wrong button or typo an IP address. Better to have volunteers with exceptionally large pipes contact the project for a special exception to the bandwidth cap, or work around it by adding multiple IP addresses.
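The back-of-the-envelope conversion can be sketched like this (a minimal sketch assuming decimal units, 1 TB = 10^12 bytes, and a 30-day month; slightly different assumptions shift the figures by a few percent):

```python
# Convert a monthly transfer volume to a sustained bitrate, and vice versa.
# Assumes decimal units (1 TB = 10^12 bytes) and a 30-day month.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s

def tb_per_month_to_mbps(tb: float) -> float:
    """Sustained Mbit/s needed to move `tb` terabytes in one month."""
    bits = tb * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6

def gbps_to_tb_per_month(gbps: float) -> float:
    """Terabytes transferred in one month at a sustained `gbps` Gbit/s."""
    bytes_total = gbps * 1e9 / 8 * SECONDS_PER_MONTH
    return bytes_total / 1e12

print(f"10 TB/month ~= {tb_per_month_to_mbps(10):.1f} Mbit/s")   # ~= 30.9 Mbit/s
print(f"4 Gbit/s    ~= {gbps_to_tb_per_month(4):.0f} TB/month")  # ~= 1296 TB/month
```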
Clients should still use pool.ntp.org because even Google and Cloudflare have outages and limits to their reach. A client could also get time from Cloudflare and Google, because pool.ntp.org has outages and limits to its reach. A client using Cloudflare and Google directly would have to manage its list of server pools itself. If Cloudflare decided one day that NTP was too much hassle and turned their servers off, a directly configured client would need to change its config, while a client configured to use pool.ntp.org would be trivially redirected to remaining servers in the pool as the Cloudflare servers went dark.
It does harm pool diversity to send most of the traffic to a handful of entities (especially if Google does leap-second smearing or publishes incorrect time for some other reason on all their servers at once) but there’s no reason to exclude well-behaved consenting NTP servers of any size. The pool normally sends out 4 IP addresses per DNS query, and it could ensure that each IP in the response has a different server operator to keep diversity up (though I don’t know if it does do that).
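A per-response diversity rule like that could be sketched roughly as follows (hypothetical data and field names; this is not how the pool’s actual zone code is structured):

```python
import random

# Hypothetical server records: (ip, operator). Real pool data has more fields.
SERVERS = [
    ("192.0.2.1", "op-a"), ("192.0.2.2", "op-a"),
    ("198.51.100.1", "op-b"), ("198.51.100.2", "op-b"),
    ("203.0.113.1", "op-c"), ("203.0.113.9", "op-d"),
    ("203.0.113.20", "op-e"),
]

def pick_diverse(servers, n=4):
    """Pick up to n servers, each belonging to a different operator."""
    pool = list(servers)
    random.shuffle(pool)           # randomize which servers get chosen
    chosen, seen_ops = [], set()
    for ip, op in pool:
        if op not in seen_ops:     # skip operators already represented
            chosen.append(ip)
            seen_ops.add(op)
        if len(chosen) == n:
            break
    return chosen

print(pick_diverse(SERVERS))  # four IPs, all from different operators
```

As long as a zone has at least four distinct operators, every response keeps full operator diversity; smaller zones would simply return fewer addresses or fall back to allowing repeats.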
I hope no one adds extra IP addresses to the Pool, whether it’s to help by accepting a greater burden of traffic, or for less benign reasons.
What if someone looks up pool.ntp.org and gets nothing but your IP addresses? What if your server becomes inaccessible due to a routing issue or outage? They’d have a problem until their client – hopefully – finds other IP addresses.
What if someone gets 3 of your IP addresses and 1 other, and then something goes wrong and your server’s clock goes haywire? Their NTP client may think your “3 servers” are correct and the other 1 is wrong.
(And IPv4 addresses are in short supply anyway…)
(For better or worse, time.cloudflare.com also has 2 IPv4 and 2 IPv6 addresses in the Pool, but at least it has some redundancy.)
As an aside, as far as I know, time.google.com's IP addresses are only in the Pool for monitoring; they aren’t served to clients. (The scores page says “Not active in the pool, monitoring only”.) The Pool only uses regular UTC servers, not servers that do leap smearing. (Though we only find out how true that is if and when there’s a leap second…)
NTP leap second smearing occurred within the NTP pool for the December 31, 2016 leap second. See figure 18.
Looks like at least 50 NTP pool servers currently sync to Google’s servers, which may cause problems at the next leap second.
One of my “wishlist items” for the next time I pass through or refactor the zone generation code is to make the system put limits on how many servers a user will be offered from the same server operator, the same ASN, etc. (Another variation would be, in well-served areas, to artificially limit the “server speed” along the same buckets.)
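The second variation could be sketched like this (made-up data and thresholds; nothing like the actual zone generation code):

```python
from collections import defaultdict

# Hypothetical zone data: ip -> (asn, configured "net speed" in Mbit).
ZONE = {
    "192.0.2.1": ("AS64500", 2000),
    "192.0.2.2": ("AS64500", 2000),
    "198.51.100.1": ("AS64501", 100),
    "203.0.113.1": ("AS64502", 50),
}

def cap_per_asn(zone, max_share=0.5):
    """Scale each server's weight down so that no single ASN carries more
    than max_share of the zone's pre-cap total weight."""
    total = sum(speed for _, speed in zone.values())
    per_asn = defaultdict(int)
    for asn, speed in zone.values():
        per_asn[asn] += speed
    weights = {}
    for ip, (asn, speed) in zone.items():
        # ASNs under the cap keep scale 1.0; over-represented ones shrink.
        scale = min(1.0, max_share * total / per_asn[asn])
        weights[ip] = speed * scale
    return weights

print(cap_per_asn(ZONE))
```

Here the two AS64500 servers (4000 of the 4150 total weight) would be scaled down to half the pre-cap total between them, while the small operators keep their full weight.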
It may help in certain cases even for those kinds of NTP daemons. If multiple such systems on the same network boot at the same time (e.g. after power is restored), they will not all receive the same set of NTP servers from their shared caching DNS forwarder.
Sure, we cannot save the world; we can only make the NTP Pool better.
Why make a compromise on that? Just tell the DNS resolver: please do not cache what you just fetched. If another DNS client of yours needs the same thing, fetch it again, and it will get different IP addresses.
Out of curiosity: are there any NTP clients around that ask for SRV records?
If so, additionally adding SRV records with proper priority and weight values might be a good first step towards better load balancing.
But AFAICS there is no major NTP client implementation around that uses SRV.
Or am I missing some?
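For illustration, SRV records for NTP would look something like this in zone-file syntax (hypothetical names; as far as I know the pool does not currently publish SRV records):

```
; Hypothetical SRV records for NTP (service "ntp", protocol UDP, port 123).
; Format: priority weight port target. Lower priority is preferred;
; within the same priority, weight steers the share of the load.
_ntp._udp.example.pool.ntp.org. 150 IN SRV 10 60 123 big-server.example.net.
_ntp._udp.example.pool.ntp.org. 150 IN SRV 10 20 123 small-server.example.org.
_ntp._udp.example.pool.ntp.org. 150 IN SRV 20  0 123 backup.example.com.
```

With records like these, a weight-aware client would send big-server.example.net three times the traffic of small-server.example.org, and only fall back to the priority-20 backup if both priority-10 targets failed.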
1.) For what it’s worth, some of my NTP servers use one leap-smearing server as an upstream. They’ll show up on lists of potentially misconfigured servers, but it shouldn’t cause any harm – during an actual leap second, the smearing server would be outvoted, unless there was a total catastrophe. (And I would be monitoring things, and would consider removing the smearing server if there was a total catastrophe.)
2.) Decreasing the TTL is possible if the authoritative servers can handle the load.
But it definitely shouldn’t be set to 0. It’s against best practices and it would probably DoS the authoritative servers and some resolvers and it would make DNS people everywhere scream.
<= 30 second TTLs are somewhat popular (perhaps unfortunately) but there’s a big difference between poor caching and no caching.
It’s very unfortunate when a building full of computers boots simultaneously, all get the same NTP servers, use IPv4 NAT and get rate limited, but DNS doesn’t have a good way to solve that (except that enabling AAAA would probably help).
(Now that I think about it, adding more AAAA records would probably increase authoritative DNS traffic, since the current negative TTL is quite a bit higher.)
FYI, https://status.ntppool.org/ shows (partial?) authoritative DNS server query statistics. But it obviously doesn’t say what total capacity is.
It should be 80-90% of the total query statistics. The “peak” graph shows the busiest second in each 5-minute period on the day view; the weekly and monthly views get averaged out in some way that I can’t change in the statuspage.io system.