Why is Cloudflare in the pool?

I run a server that could easily handle 10 TB of traffic per month, but I get much less traffic than that from the pool.

So I was trying to determine where all the other traffic goes, and it seems a very large part of it goes to time.cloudflare.com (at least for the NL zone). It looks like they have a larger share than the “normal” volunteers in the pool.

I think this is a bit strange. I expected the pool to consist of volunteers who do this as a hobby. But if the NTP Pool instead forwards most queries to big companies like Cloudflare, Google, etc., what incentive is left for people to choose the pool’s addresses instead of using “time.cloudflare.com” directly?

Is there any reason why the pool prefers large commercial companies over the volunteers? If it were only for zones where there is a shortage of volunteers, it would make sense.

I think it’s a limitation of the system that there is only one global speed per IP address instead of zone-specific speeds, and the reason for giving the Cloudflare servers a higher speed (2 Gbit) is to help more in the underserved zones. If the system allowed zone-specific speeds, it would make sense to decrease their share in the well-served zones.

One way to get a bigger share of the pool traffic is to add multiple addresses for the same server. In some old deduplication tests I saw one server with 7 IPv4 addresses in the pool.

1 Like

Thanks for the explanation, I understand the reason for this issue a lot better now.

Maybe instead of zone-specific speeds (which would take some effort to implement), the quickest solution would be to allow volunteers to set speeds higher than 1 Gbit?

Because my real speed is 4 Gbit, but I could not specify that. If I could, it would allow me to take priority over Cloudflare with its 2 Gbit.

And maybe the best fix would be to declare a list of “fallback” servers like time.cloudflare.com, and simply not include those at all in well-served zones.

Strange, I thought the purpose of the pool was to aggregate NTP servers for public use behind a single fixed DNS domain name (or a small number of them) that can be burned into embedded devices and never managed by end users. If Cloudflare and Google can handle most of the pool’s traffic then that’s a good thing: it means less load on the other volunteers to achieve the end goal.

Who says Google and Cloudflare aren’t just a scaled up version of volunteering as a hobby? Do they get revenue from NTP packets? Is NTP a profit-making business for them? It’s more likely that public NTP service is a tiny cost compared to overheads they were already paying to run private NTP service for their own use.

Why do you want more than your share of NTP traffic? Do you get revenue from NTP packets?

10 TB/month is about 33 Mbit/s. 4 Gbit/s would be about 1265 TB/month, which would put small volunteers on metered hosting offline in a matter of hours. Not something we want to enable on a web form where users can click the wrong button or typo an IP address. Better to have volunteers with exceptionally large pipes contact the project for a special exception to the bandwidth cap, or work around it by adding multiple IP addresses.
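For anyone who wants to redo that arithmetic, here is a quick back-of-the-envelope in Python (the exact totals shift a little depending on decimal vs. binary units and the month length used):

```python
# Rough check of the figures above, using binary TB and a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 3600                 # 2,592,000 seconds

# 10 TB/month expressed as an average bitrate:
bits_per_month = 10 * 2**40 * 8                    # 10 TiB in bits
print(bits_per_month / SECONDS_PER_MONTH / 1e6)    # ~33.9 Mbit/s

# A 4 Gbit/s link saturated for a whole month:
bytes_per_month = 4e9 / 8 * SECONDS_PER_MONTH
print(bytes_per_month / 1e12)                      # ~1296 TB
```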

Clients should still use pool.ntp.org because even Google and Cloudflare have outages and limits to their reach. A client could also get time from Cloudflare and Google, because pool.ntp.org has outages and limits to its reach. A client using Cloudflare and Google directly would have to manage its list of server pools itself. If Cloudflare decided one day that NTP was too much hassle and turned their servers off, a directly configured client would need to change its config, while a client configured to use pool.ntp.org would be trivially redirected to remaining servers in the pool as the Cloudflare servers went dark.
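To make the contrast concrete, here is what the two approaches might look like in a chrony configuration (a sketch; the hostnames and the `maxsources` value are just illustrative):

```
# chrony.conf, pool-based: the pool hands out and rotates server IPs for you
pool pool.ntp.org iburst maxsources 4

# chrony.conf, direct: you maintain the vendor list yourself, and must edit
# it if one of these providers ever turns its NTP service off
server time.cloudflare.com iburst
server time.google.com iburst
```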

It does harm pool diversity to send most of the traffic to a handful of entities (especially if Google does leap-second smearing or publishes incorrect time for some other reason on all their servers at once) but there’s no reason to exclude well-behaved consenting NTP servers of any size. The pool normally sends out 4 IP addresses per DNS query, and it could ensure that each IP in the response has a different server operator to keep diversity up (though I don’t know if it does do that).
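The operator-diversity idea is simple enough to sketch. Something like the following (hypothetical code, not the pool’s actual selection logic, which among other things also weights by net speed):

```python
import random

def pick_answer(candidates, n=4):
    """Pick up to n server IPs for one DNS response, allowing at most
    one IP per operator. candidates is a list of (ip, operator) pairs."""
    random.shuffle(candidates)  # mutates the list; fine for a sketch
    answer, seen = [], set()
    for ip, operator in candidates:
        if operator in seen:
            continue
        answer.append(ip)
        seen.add(operator)
        if len(answer) == n:
            break
    return answer

# Example: the two Cloudflare entries can no longer land in the same answer.
zone = [("198.51.100.1", "cloudflare"), ("198.51.100.2", "cloudflare"),
        ("203.0.113.5", "alice"), ("203.0.113.9", "bob"),
        ("192.0.2.77", "carol")]
print(pick_answer(zone))
```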

2 Likes

Oh, I wasn’t aware of that situation. It wasn’t mentioned when I raised a similar question a while back: NTPpool to Cloudflare? - #8 by marco.davids

Sorry but you seem to misunderstand the purpose of a pool.

It’s designed to do a few things.

  1. Reduce the load on all servers to acceptable levels.
  2. Make sure everybody, all over the world, has time that is as accurate as possible.
  3. Let no single system monopolise the NTP time service.
  4. Serve all parts of the planet with the correct time.
  5. Stay away from politics and weird regimes that want to disturb it.
  6. Time is universally the same, and should not be manipulated or owned by anybody.

There are probably more reasons.

But all my servers run NTP at Stratum 1 and 2, just to make sure time stays a public thing, not to be messed with by anybody.

So if you want to have a high number of clients, set the net-speed setting high. But remember: if there are many servers in your area, you will probably never get the maximum load you expect.

I for one am really happy with all the clients I get, and that they have correct time thanks to my systems :wink:

Apart from the pool, my servers are also listed on other websites, where people go to select servers on their own.

Like this website:

https://support.ntp.org/bin/view/Servers/StratumOneTimeServers

Bas.

I hope no one adds extra IP addresses to the Pool, whether it’s to help by accepting a greater burden of traffic, or for less benign reasons.

What if someone looks up pool.ntp.org and gets nothing but your IP addresses? What if your server becomes inaccessible due to a routing issue or outage? They’d have a problem until their client – hopefully – finds other IP addresses.

What if someone gets 3 of your IP addresses and 1 other, and then something goes wrong and your server’s clock goes haywire? Their NTP client may think your “3 servers” are correct and the other 1 is wrong.

(And IPv4 addresses are in short supply anyway…)

(For better or worse, time.cloudflare.com also has 2 IPv4 and 2 IPv6 addresses in the Pool, but at least it has some redundancy.)

As an aside, as far as I know, time.google.com's IP addresses are only in the Pool for monitoring; they aren’t served to clients. (The scores page says “Not active in the pool, monitoring only”.) The Pool only uses regular UTC servers, not servers that do leap smearing. (Though we only find out how true that is if and when there’s a leap second…)

1 Like

NTP leap second smearing occurred within the NTP pool for the December 31, 2016 leap second; see figure 18.
It looks like at least 50 NTP pool servers currently sync to Google’s servers, which may cause problems at the next leap second.

4 Likes

One of my “wishlist items” for the next time I pass through or refactor the zone generation code is to make the system put limits on how many servers a user will be offered from the same server operator, the same ASN, etc. (Another variation would be, in well-served areas, to artificially limit the “server speed” along the same buckets, roughly as in the sketch below.)
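A hypothetical sketch of that speed-capping variation (the real zone generation code is structured differently; names and the Mbit values are made up for illustration):

```python
from collections import defaultdict

def cap_speed_per_asn(servers, asn_cap_mbit):
    """Scale each server's advertised net speed down so that the combined
    speed of any single ASN does not exceed asn_cap_mbit in this zone.
    servers: list of dicts with "ip", "asn" and "speed_mbit" keys."""
    totals = defaultdict(float)
    for s in servers:
        totals[s["asn"]] += s["speed_mbit"]
    capped = []
    for s in servers:
        total = totals[s["asn"]]
        factor = min(1.0, asn_cap_mbit / total) if total else 1.0
        capped.append({**s, "speed_mbit": s["speed_mbit"] * factor})
    return capped

# Example: an operator announcing 2000 Mbit in a zone capped at 1000 Mbit
# per ASN gets scaled to half, while a small volunteer keeps full weight.
zone = [{"ip": "198.51.100.1", "asn": 13335, "speed_mbit": 2000},
        {"ip": "203.0.113.5", "asn": 64500, "speed_mbit": 100}]
print(cap_speed_per_asn(zone, asn_cap_mbit=1000))
```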

You may want to change the TTL value from 150 to 0 in the GeoDNS as well. That would allow the NTP load to be distributed more evenly.

2 Likes

Not against this, but it might increase the load on the authoritative servers. And some NTP daemons will resolve once and keep the resolved IP addresses ‘forever’. Changing the TTL won’t help there.

Also, some resolver operators won’t allow TTLs below a certain value and will replace them with their own configured minimum.

And finally I am not sure if ‘0’ is a good value. Perhaps something like 60 would already improve things.
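For illustration, with a 60-second TTL the pool’s answers would look something like this (hypothetical records, using documentation addresses):

```
pool.ntp.org.   60  IN  A  192.0.2.10
pool.ntp.org.   60  IN  A  198.51.100.20
pool.ntp.org.   60  IN  A  203.0.113.30
pool.ntp.org.   60  IN  A  192.0.2.40
```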

Have I already mentioned that adding AAAAs on all four pool instances would also be a great idea? It would help greatly against the problem of bursts caused in CGNAT environments.

2 Likes

Guaranteed.

It may help in certain cases, even for those kinds of NTP daemons. If multiple such systems on the same network boot at the same time (e.g., when power comes back), they will not all receive the same set of NTP servers from their shared caching DNS forwarder.

Sure, we cannot save the world; we can only make the NTP Pool better.

Why make a compromise on that? Just tell the DNS resolver: please do not cache what you just fetched. If another of your DNS clients needs the same thing, fetch it again, and it will get different IP addresses.

1 Like

The reason is that if you set the TTL to 0, DNS is queried on every request, and not only the domain provider’s nameservers but ALL nameservers handling the requests get the extra load.

Your domain’s DNS provider is not going to like the extra traffic it creates.

Even DynDNS servers do not go that low. I doubt it’s allowed at all.

1 Like

Yep.

Think of the sudden burst at the top of every hour, when cronjobs with ntpdate in them fire off.
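Something like this classic (hypothetical) crontab line, where every host that has it queries in the same second:

```
# Runs at minute 0 of every hour, so many hosts fire simultaneously.
0 * * * * /usr/sbin/ntpdate pool.ntp.org
```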

Indeed Marco,

In my humble opinion it could trigger firewalls into seeing it as an attack, as the number of DNS requests would suddenly go sky-high and be treated as a DNS attack.

Especially Mikrotik and Barracuda (to name a few) will knock many people off the internet entirely if they send requests all the time.

Setting TTL to 0 is a (very) bad idea, if it is even possible.

Out of curiosity: are there any NTP clients around that ask for SRV records?
If so, additionally adding SRV records with proper priority and weight values might be a good first step towards better load balancing.
But AFAICS there is no major NTP client implementation around that uses SRV.
Or am I missing some?
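For reference, such records would look roughly like the following (hypothetical, since the pool publishes no SRV records today; in an SRV record, priority selects the group and weight splits load within it):

```
; _service._proto.name    TTL   class type priority weight port target
_ntp._udp.pool.ntp.org.   3600  IN    SRV  10       60     123  ntp1.example.net.
_ntp._udp.pool.ntp.org.   3600  IN    SRV  10       40     123  ntp2.example.net.
```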

Never heard of any using it.

1.) For what it’s worth, some of my NTP servers use one leap smearing server. They’ll show up on lists of potentially misconfigured servers, but it shouldn’t cause any harm – during an actual leap second, the smearing server would be outvoted, unless there was a total catastrophe. (And I would be monitoring things, and consider removing the smearing servers, unless there was a total catastrophe.)

2.) Decreasing the TTL is possible if the authoritative servers can handle the load.

But it definitely shouldn’t be set to 0. It’s against best practices, it would probably DoS the authoritative servers and some resolvers, and it would make DNS people everywhere scream.

<= 30 second TTLs are somewhat popular (perhaps unfortunately) but there’s a big difference between poor caching and no caching.

It’s very unfortunate when a building full of computers boots simultaneously, all get the same NTP servers, use IPv4 NAT and get rate limited, but DNS doesn’t have a good way to solve that (except that enabling AAAA would probably help).

(Now that I think about it, adding more AAAA records would probably increase authoritative DNS traffic, since the current negative TTL is quite a bit higher.)

FYI, https://status.ntppool.org/ shows (partial?) authoritative DNS server query statistics. But it obviously doesn’t say what total capacity is.

1 Like

Yeah, the DNS servers already do about 200k queries a second in the peak periods. It’s been a while since I experimented with the TTLs, but I expect it’d be much higher with a (say) 10 second TTL.

This can be fixed with more DNS servers, but running what’s there already accounts for much of the time that goes into managing the system.

That being said, it might be time to experiment with the TTLs again soon.

Also, I do hear the concern about some of the anycast NTP servers being disproportionately visible in the system.

3 Likes

It should be 80-90% of the total query statistics. The “peak” graph is the busiest second in that 5 minute period (on the day view, the weekly and monthly views get averaged out in some way that I can’t change in the statuspage.io system).

1 Like