CN pool collapses for a few hours every day

OK, so at least some servers in the CN pool have been disabled or deleted by their owners. That’s very bad indeed.

Yes, as long as the server operators are fine with this. Ideally, all servers could be put in one giant pool, and each server could report whether, and how badly, it is overloaded. The pool could then combine load and distance to hand out the best-suited servers.
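
To make the idea a little more concrete, here is a minimal Python sketch of how a pool could weigh per-server load reports against network distance. It assumes servers could report their own utilisation, which pool servers cannot do today; the Server fields, the weighting formula, and the 50 ms distance scale are all hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical fields: pool servers do not report anything like this today.
    address: str
    capacity: float      # relative capacity, e.g. an operator-chosen speed setting
    load: float          # self-reported utilisation, 0.0 (idle) to 1.0 (saturated)
    distance_ms: float   # estimated network distance to the client (RTT)


def selection_weight(s: Server) -> float:
    """Combine spare capacity and network distance into one weight.

    Spare capacity drops to zero as the server approaches saturation,
    and the weight falls off as network distance grows.
    """
    spare = s.capacity * max(0.0, 1.0 - s.load)
    # The 50 ms scale is an arbitrary assumption for this sketch.
    return spare / (1.0 + s.distance_ms / 50.0)


def pick_servers(servers, k, rng=random):
    """Draw up to k distinct servers, each draw weighted by selection_weight."""
    candidates = [s for s in servers if selection_weight(s) > 0]
    chosen = []
    while candidates and len(chosen) < k:
        weights = [selection_weight(s) for s in candidates]
        s = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(s)
        candidates.remove(s)
    return chosen
```

A saturated server gets weight zero and is never handed out, while a nearby server with spare capacity is strongly preferred over a distant, half-loaded one.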

Maybe there is a much simpler solution: just delete all overloaded servers in the CN pool, and ask people to reconfigure their clients to use, e.g., the global pool.

This already exists: it’s the global zone.

Such aspects are being discussed, and because that is somewhat tricky, I suspect it may be one reason why it is taking so long to come up with, and implement, a well-rounded concept of how it could work in practice.

To some degree, that is what the global zone, or even the continent zones, are already supposed to do today. That is, there is no direct load feedback from individual servers, as that would not be easy to implement, but network topology is at least taken into account.

In a way, that is what the monitors already do automatically (and what they are supposed to do, though given the many factors involved, it doesn’t always work optimally). But deleting all overloaded servers would lead to the complete breakdown of the zone. And this is already happening to some extent, with server operators considering (as in the case that triggered this thread) or deciding to remove their servers from the pool, which may in turn prompt further operators to consider or take that step. Hence the chicken-and-egg problem I referred to earlier.

Sure, that is what other people are promoting, and what is also hinted at on the pool pages: if you know a good server “near you”, there are good reasons you might want to use that one rather than the pool. Manufacturers of large numbers of devices, and operators of large networks in particular, are encouraged to run their own servers and have their devices use those.
But to some degree, depending on circumstances, I feel that is throwing out the baby with the bathwater. Simply configuring clients to use the global pool instead of any regional pool would be another way to get around the issues with specific regional pools, issues that are to some extent inherent to the concept of regional pools.
That is one reason why @ask is considering functionally replacing them with an entirely geolocation-based mechanism (and while the “geo” in its name again refers to actual geography, I would expect, or at least hope, that it will rather mean “network geography”).

Here are a few reasons why:

  1. China’s laws and operators stipulate that home networks cannot be used for public services, so if we want to set up an NTP server in mainland China, we have to buy either a VPS or a commercial network connection.
  2. Because of our policy of using commercial network rates to subsidize home network rates, commercial network rates are extremely high.
  3. So it is basically impractical for an individual to buy a commercial network connection.
  4. And if you buy a VPS, the traffic and bandwidth costs are extremely high.

@Coelacanthus, thanks for those insights; that sheds some light on the background. At the same time, it doesn’t make the technical problem go away. Having more users use the global zone instead of regional zones may help by spreading the load over a larger server population. Or, if there are known, good local servers, use those directly instead of the pool.
The pool was conceived at a time when there were just a few public servers, which were getting overloaded. So the purpose of the pool, and its technical implementation, are geared towards automatically spreading the load over a large server population, rather than having a few well-known servers hammered. Conversely, it will work suboptimally, and actually be superfluous, if only a small number of servers are available in a region but those have enough capacity to absorb the region’s entire load.
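
The load-spreading argument is simple proportional arithmetic. As a rough Python sketch, assuming each server receives a share of a zone’s queries proportional to a weight (for instance the speed setting an operator picks), the per-server load falls as the server population grows. The function name and the query rates in the usage note are made up for illustration, not pool measurements.

```python
def per_server_qps(total_qps: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a zone's total query rate across servers in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("zone has no active servers")
    return {name: total_qps * w / total_weight for name, w in weights.items()}
```

With a hypothetical 10000 queries per second, two equally weighted servers each get 5000, while spreading the same load over 100 equally weighted servers leaves each with 100.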

Why can’t we just abandon country pools and use the global pool instead?

Provocative question: What’s keeping you (and others) from using the global zone, and is anything/anyone forcing you to use the country zones?
The view in this forum generally seems to be to recommend use of the global zone and discourage use of the regional zones. The challenge is one of spreading the word. (Obviously, the fact that the pool pages in some languages still promote continent or even country zones is part of the problem.)
Work on making this happen transparently on the infrastructure side is being planned, but it is not as easy as it might sound to simply remap the regional zones to the global one.

Same thought.
You raised some very valid points. We need to promote the use of the global pool: just put the global pool address on the website, and add some prominent reminders so that all visitors see how the pool prefers to be used.

The website source is publicly available, and proposals can be raised there. The challenge on that side is getting them actually deployed; see, e.g., the long-standing attempt to recommend the pool directive instead of the server directive.

I think it’s easy: just add a batch of CNAME records pointing all country and region subdomains to the global zone. Per the DNS standard, clients that query a regional zone will then be directed to the global zone.
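
For a static zone, the proposal would amount to records like those produced by this Python sketch. The TTL, the helper name, and the assumption that only the bare name plus the 0 to 3 numbered sub-zones need mapping are illustrative. Note also that under the DNS rules a name carrying a CNAME may not carry other record types, so the regional A/AAAA records would have to go; and since the real zones are generated dynamically, this would have to live in the pool’s zone-generation code rather than in a hand-written file.

```python
def regional_cname_records(zones, target_domain="pool.ntp.org.", ttl=3600):
    """Build zone-file CNAME lines mapping each regional zone to the global one.

    For every regional label (e.g. "cn") the bare name and the numbered
    sub-zones 0-3 are pointed at the matching global names.
    """
    records = []
    for zone in zones:
        # cn.pool.ntp.org. -> pool.ntp.org.
        records.append(f"{zone}.{target_domain}\t{ttl}\tIN\tCNAME\t{target_domain}")
        for n in range(4):
            # 0.cn.pool.ntp.org. -> 0.pool.ntp.org., and so on
            records.append(
                f"{n}.{zone}.{target_domain}\t{ttl}\tIN\tCNAME\t{n}.{target_domain}"
            )
    return records
```

Calling regional_cname_records(["cn"]) yields five lines, from "cn.pool.ntp.org." pointing at "pool.ntp.org." through "3.cn.pool.ntp.org." pointing at "3.pool.ntp.org.".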

This sounds like a good idea actually!

If it were a static zone, that would be true. But the zones are dynamically generated by the pool infrastructure as servers come and go, or are reconfigured. So that code would need to be adapted to take that into account.
Some other technical and non-technical questions need to be considered as well. E.g., as mentioned, other zones have smaller servers operating without issue because of a “good” client-to-server ratio in those zones. This needs careful management to avoid overloading them, which could easily happen if the load of some larger zones suddenly hit the wider population at the flip of a switch. A more gradual approach is needed, e.g., more and more people switching to the global zone over time, or the infrastructure having a mechanism to slowly ramp up the share of traffic distributed outside the originating zone.
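
The ramp-up idea could look roughly like the following Python sketch: the share of a regional zone’s queries answered with servers from the global pool grows linearly over a chosen number of days. The linear schedule, the helper names, and the per-query coin flip are hypothetical and not part of the pool’s code; a real design might ramp per client subnet or adapt to measured server load instead.

```python
import random


def global_share(days_since_start: float, ramp_days: float) -> float:
    """Fraction of a regional zone's queries to answer from the global pool.

    Rises linearly from 0 to 1 over ramp_days, clamped to [0, 1].
    """
    if ramp_days <= 0:
        return 1.0
    return min(1.0, max(0.0, days_since_start / ramp_days))


def answer_zone(regional_zone: str, days_since_start: float, ramp_days: float,
                rng=random) -> str:
    """Decide which server set answers one query aimed at regional_zone."""
    if rng.random() < global_share(days_since_start, ramp_days):
        return "global"
    return regional_zone
```

Halfway through a 90-day ramp, roughly half of the queries for the regional zone would be answered from the global server set, so the smaller servers elsewhere see the extra load arrive gradually.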

As a first step, I think it would be good to allow server owners to modify the zones of their servers in the management panel.

Sure, I’d welcome such a change, too. But given the bottleneck in getting things implemented, if not everything can be done at once, I’d rather have what looks like a thorough fix implemented first than what seems more like a temporary patch (though the patch would remain relevant even if the “fix” is done first).
At this point, I agree I’d be happy if anything could be done at all, and if the “fix” is “too big” to get done anytime in the “near” future (“too big” covering any number of issues, be it conceptual questions too complex to resolve soon, the amount of code change needed in one go, or anything else), I’d be happy to see the “temporary patch” done first instead.
I think I have a qualitative appreciation of the potential challenges that come with any change to the system, but I am far from claiming enough understanding for a quantitative assessment of the effort and the impact on prioritization, and I fully defer to @ask on that.

And @ask has some plans for it.

Yes, those are what I keep referring to as the technical solution, and various aspects of it have been discussed in other threads over time since then, but it is unclear to me where we stand with those plans today.

When the pool provides <region code> domains, users naturally tend to use the one that matches their region. Only those with some insight into how NTP and the NTP pool work may think differently. So for the majority of users who set out to “configure an NTP server”, using their regional pool is the de facto standard, and many tutorials and documentation pages say so.

Clearly there’s not much we can do to change which domain users choose, not to mention that most clients are configured not by end users but by product vendors.

I agree that using the global pool for all regions is a good approach, and that could/should be easily implemented on the DNS layer: just CNAME all regional zones to the global one.

PS: I’m the one who recently added some far-away servers to the pool and was refused when trying to add them to the CN zone due to physical distance. I’d be more than happy to see a feature allowing server operators to set the zones of their servers in the management panel.

Another thing that comes to mind in this context: for now, use of the zones prefixed with “2.” should also be promoted, as only those contain IPv6 servers. IPv6 servers are often underutilized compared with IPv4 servers, so using the “2.” zones also helps offload a bit of traffic from the IPv4 servers. (See various other threads in this forum for background discussion on that topic.)
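
One way to see the difference for yourself is to resolve a numbered pool name and count the address families returned. The Python sketch below only summarizes a socket.getaddrinfo() result; the helper name family_counts is hypothetical, and what the DNS actually returns depends on the pool’s current configuration and your resolver.

```python
import socket
from collections import Counter


def family_counts(addrinfo):
    """Count distinct IPv4 and IPv6 addresses in a socket.getaddrinfo() result."""
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    seen = {(family, sockaddr[0]) for family, _type, _proto, _canon, sockaddr in addrinfo}
    counts = Counter(family for family, _addr in seen)
    return {"ipv4": counts[socket.AF_INET], "ipv6": counts[socket.AF_INET6]}
```

For example, family_counts(socket.getaddrinfo("2.pool.ntp.org", 123, proto=socket.IPPROTO_UDP)) can be compared with the same call for "1.pool.ntp.org"; per the point above, only the “2.” name should include IPv6 addresses.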

In fact, I don’t know whether it’s necessary to have 0/1/2/3 in front of the global pool. Why not just use the global pool name without a prefix, and have both IPv6 and IPv4 enabled for it?

If the regional pools are eliminated, users cannot use them, so the global pool will be the default.

There is a separate ongoing thread on that, besides various older ones.