You’re right about that. It may not be the best idea after all. Here is my question: what if the China zone were implemented such that the score of every server in the zone was always 20? I’m not suggesting that we do that, but seeing how many requests there are, I wonder whether it’s reasonable to believe that even with the entire zone active, the servers would not be able to handle the requests. Perhaps what we really need to do is push strongly for some alternate solution, whatever that may be. (Personally I’m looking forward to ISPs hosting stratum 3 servers and using the NTP DHCP option, but basically no one supports receiving it and no one sends it, so that is not the right thing to do right now.)
faelix - The problem with the CN zone can probably best be told by individuals with servers in that zone. It’s my guess that the one monitoring server is having issues and getting improper time, either due to the “great firewall” or just the vast distance and numerous network hops. In which case having a monitor server in China (and a few more geographically distributed around the world) would be quite helpful to reduce incorrect scores and dropped servers.
Adding servers to the CN zone that aren’t physically located in CN isn’t a fix at all to me. Yes, if a zone is under-served then load-balancing should be subsidized by adjacent zones and continue outwards as necessary to meet demand. But for China I can only assume that the network issues outside monitoring servers have when reaching servers inside the firewall would cause just as much of an issue for CN clients trying to receive accurate time from outside the firewall.
Excluding monitoring from CN servers is a very dangerous proposition and just asking for trouble.
I can’t comment on the DNS cycling rate as I have no idea how fast it operates currently. However IMO, when you have a reduced number of clients in a zone they should not solely be carrying the burden of the load. If the DNS server did some query tracking and basic statistics then depending on the amount of “bandwidth” available from servers in a country vs DNS queries it could mix in servers from adjacent zones to meet demand.
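To make the "mix in servers from adjacent zones" idea a bit more concrete, here is a minimal sketch in Python. It is not how the pool's DNS server works today; the `Zone` fields, the `plan_borrowing` function and the idea of expressing capacity and demand in queries per second are all hypothetical names and assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Zone:
    name: str
    capacity_qps: float  # assumed: total load the zone's servers can take
    demand_qps: float    # assumed: load estimated from DNS query tracking


def plan_borrowing(home: Zone, neighbours: List[Zone]) -> Dict[str, float]:
    """Return the fraction of the home zone's demand each zone should serve.

    The home zone serves as much as its capacity allows. Any shortfall is
    handed to the neighbours in the order given (nearest first), limited by
    each neighbour's spare capacity. Whatever is left is reported as
    "unserved".
    """
    if home.demand_qps <= 0:
        return {home.name: 1.0}

    served_locally = min(home.capacity_qps, home.demand_qps)
    shares = {home.name: served_locally / home.demand_qps}
    shortfall = home.demand_qps - served_locally

    for zone in neighbours:
        if shortfall <= 0:
            break
        spare = max(zone.capacity_qps - zone.demand_qps, 0.0)
        taken = min(spare, shortfall)
        if taken > 0:
            shares[zone.name] = taken / home.demand_qps
            shortfall -= taken

    if shortfall > 0:
        shares["unserved"] = shortfall / home.demand_qps
    return shares
```

The DNS server could then weight its answers by these fractions, so an under-served zone only borrows as much as it needs and only from neighbours that have room to spare.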
The geographic proximity between server & client IMO is important. Too far of a distance and, thanks to random network delays, odds are the packets you are sending are all for naught when your server is excluded from NTP’s selection algorithm on the client. If it’s just an SNTP client polling a single server, it makes do with what it receives.
Exactly my point. Until monitoring of in-China CN-zone nodes is accurate, we will never keep those nodes in the pool… and the efforts of slinging packets around the world might prop things up, but ultimately will be a detriment to the service of clients in China. For that reason we pulled our two ECMP clusters out of the CN pool a few days ago. The traffic wasn’t the problem — we’ve had those clusters serving the zone for about a year now — it’s that those clusters aren’t the solution.
Take, for instance https://www.ntppool.org/user/cate83vctnviet3dbbzj
Now come on a minute, Tencent isn’t a little one-man hosting company in China. @maxmeng probably provisioned something which would cope with the scale of traffic expected. And yet their NTP nodes “perform badly” as viewed by the NTP Pool Project and so many of them aren’t actually serving any users in China.
Hope I can help clarify the problem. As a Chinese internet user, I see the problem to be both about congested international internet routes and the GFW (will explain later). Adding a monitoring station here would definitely help.
As you can see in Tencent’s servers, there are two Hong Kong servers that keep getting much higher scores, because they have much better international connectivity. In the Hosting/VPS world, many service providers targeting the Chinese market use words like “CN2”, “Asia-optimized”, “Direct route to China” to advertise, because the general international connectivity to the main ISPs in China mainland is so terrible. If the monitoring system doesn’t have an optimized network path to the Chinese nodes, the situation is not likely to improve.
The term “CN2” usually means AS4809, a famous choice for better international connectivity to China mainland, but of course they sell bandwidth at a much higher price. Currently, if one wants their servers to stay in the pool and, at the same time, be actually usable in China mainland, they have to spend much more money only to work around the monitoring system. The said Tencent Hong Kong servers are real examples: they have CN2 access to China mainland so the connectivity is actually good, but Tencent has to pay much more for this.
I once set up a server located in China with AS9929 access (somewhat similar to AS4809) and got a high score in the monitoring system immediately. Unfortunately my bandwidth is really limited, so it got kicked out almost instantly after starting to serve the zone. Options like this exist but are really expensive, which is probably why we are left with this unfortunate state.
For the GFW part, I believe the thread by tomli (Overcoming Great Obstacle of NTP in China) still makes sense, although the GFW is a largely unknown system and we often mistake ISP-level QoS for part of it. It is still unknown what ultimately caused the UDP drops.
I am using a simple workaround for myself currently: “ntp.felixc.at”, a geodns (gdnsd) that keeps running simple ntpdate validations against Tencent and Alibaba’s servers and returns them for requests coming from China mainland, and CNAMEs to pool.ntp.org for the rest. My NS servers are carefully chosen to always have not-so-bad internet connectivity to China mainland. And yet some of the NTP servers still get kicked repeatedly, and manual testing shows complete UDP drops.
Attaching an mtr screenshot of the path to one of Tencent’s servers. The international packet loss is always this high, and the situation is not likely to improve in the near future:
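For anyone wanting to run a similar validation themselves, below is a rough Python sketch of a single-query health check. It is not the actual ntpdate/gdnsd setup described above; the function names, the 2 second timeout and the 1 second offset limit are assumptions picked for illustration, and the offset check is crude because it ignores network delay.

```python
import socket
import struct
import time

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    # First byte: LI=0, version=4, mode=3 (client). The rest stays zero.
    return bytes([0x23]) + bytes(47)


def parse_reply(data: bytes) -> dict:
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first, stratum = data[0], data[1]
    # Transmit timestamp: 32-bit seconds and 32-bit fraction at offset 40.
    secs, frac = struct.unpack("!II", data[40:48])
    return {
        "leap": first >> 6,
        "mode": first & 0x07,
        "stratum": stratum,
        "unix_time": secs - NTP_UNIX_OFFSET + frac / 2**32,
    }


def looks_healthy(reply: dict, now: float, max_offset: float = 1.0) -> bool:
    return (
        reply["mode"] == 4                 # 4 = server reply
        and reply["leap"] != 3             # 3 = clock not synchronised
        and 1 <= reply["stratum"] <= 15
        and abs(reply["unix_time"] - now) <= max_offset
    )


def check_server(host: str, timeout: float = 2.0) -> bool:
    """Send one query; a timeout (a dropped packet) counts as a failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, 123))
            data, _ = sock.recvfrom(512)
        except OSError:
            return False
    try:
        reply = parse_reply(data)
    except ValueError:
        return False
    return looks_healthy(reply, time.time())
```

Calling `check_server()` a few times per candidate and only handing out the ones that answer consistently would give roughly the behaviour described, with the caveat that a single lost UDP packet says little on its own.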
An excellent explanation, thank you @felixonmars.
Do you feel, therefore, that the best thing for the CN zone would be:
- in-China NTP servers?
- in-China monitoring?
- and not international NTP servers in the CN pool?
All three, exactly. I haven’t been able to find a nearby stratum 1 source, otherwise I’d try to provide a monitoring station myself.
In-China monitoring is the most important. We already have many (about 40) servers in China provided by CERNET, AliCloud, Tencent etc. According to my data (I monitored them for two weeks from within China), they all maintain good quality.
Just 3 to 4 servers left in .cn
One of ours is working at 100% wire speed.
What was the plan B again?
Are you looking for servers that are actually in china? I might be able to help but I have to see if some funds arrive that are in limbo. This might take a couple weeks to a few months but I will probably be able to dedicate a year initially if you are still in need of it.
Ok, added to the .CN zone.
It would. Thanks for joining. I have added it to .CN pool.
Thanks for helping out, it’s added.
Seems my 10Mbit connection would not be enough for the cn pool.
And a few minutes later you are out…
Is monitoring going to be fixed or what?
This is getting very annoying. Most .cn zone volunteering efforts are being pissed away due to some unexplained desire to automate what cannot be automated.
I think the #1 issue is to get a monitoring source inside china so that those NTP servers within the great firewall can be verified properly and not constantly dropped due to firewall / monitoring issues.
The #2 issue is, right now there are single-digit IPv4 servers serving the country. The few that @iocc added earlier I’m sure got pummeled by demand and now they are experiencing the see-saw effect in load as their score puts them in & out of the pool…
I think at minimum 30-40 servers should be added to China at once (like the first post was all about), otherwise I think any requests to the china zone should redirect to the greater asia zone to balance the load for the time being…
This. I’m getting a bit annoyed that people automatically blame the monitoring server without checking if the problem is actually at the NTP server’s end. Stateful firewalls have been a common problem in the past, and I don’t think that problem has gone anywhere. The servers and firewalls may be able to handle, like, 500qps just fine, but when the traffic grows to tens of thousands of queries per second, the stateful firewall can’t keep track of all of them and starts dropping packets. The proper way to fix this problem is to not track the connection state of NTP packets at the firewall (both incoming and outgoing).
My NTP server in the .cn zone can handle 100Mbps of NTP requests and there is no firewall in front of it, so that is not the problem. The problem is that there is well beyond 130Mbps of incoming traffic this minute, so a lot of traffic is thrown away since it cannot be processed.
Once in a while it is a packet from the monitoring server that is dropped, and my NTP server’s rating is severely punished for it. When its rating drops below 10 it is kicked out, and the few hosts that are left have to share this load as well. Personally I think there is more than 600Mbps of NTP traffic in total bombarding the NTP hosts in .cn right now.
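To illustrate how quickly a few lost monitoring packets can push a server under the threshold, here is a small Python simulation. Only the ceiling of 20 and the cut-off at 10 come from this thread; the update rule (multiply by 0.95, then add +1 for a good reply or -5 for a timeout) is my assumption of how such a score could behave, not a statement of what the monitor actually runs.

```python
def update(score: float, ok: bool, decay: float = 0.95,
           reward: float = 1.0, penalty: float = -5.0) -> float:
    """Assumed scoring step: decay the old score, then add a reward or penalty."""
    return score * decay + (reward if ok else penalty)


def misses_until_dropped(score: float = 20.0, threshold: float = 10.0) -> int:
    """Count consecutive lost monitoring packets until the score is below threshold."""
    misses = 0
    while score >= threshold:
        score = update(score, ok=False)
        misses += 1
    return misses


def good_checks_to_recover(score: float, threshold: float = 10.0) -> int:
    """Count consecutive good replies needed to get back to the threshold."""
    checks = 0
    while score < threshold:
        score = update(score, ok=True)
        checks += 1
    return checks
```

With these assumed constants, a server sitting at 20 goes to 14 after one lost packet and to 8.3 after a second one, so two consecutive losses are enough to drop it, and it then needs four good replies in a row to get back above 10. On an overloaded server where monitoring packets compete with everything else, that would produce exactly the in-and-out pattern people are describing.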
Hopefully I have fixed the firewall issues in my server. Now it’s serving ~80k QPS at ~9MiB/s without an issue. Let’s see if it can stay longer this time…
Please explain these two images then.
Monitoring packets do get lost because they are in contention with the other 400 million requests per hour being taken care of. 400 million went through, and who knows how many were dropped.
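For a sense of scale, the hourly figure can be converted to queries per second and bandwidth with a few lines of Python. The per-packet size is an assumption: a plain 48-byte NTP packet plus UDP, IPv4 and Ethernet headers, ignoring the Ethernet FCS, preamble and any NTP extension fields.

```python
NTP_PAYLOAD = 48      # NTP packet without extension fields
UDP_HEADER = 8
IPV4_HEADER = 20      # no IP options
ETHERNET_HEADER = 14  # assumed framing; FCS and preamble not counted

BYTES_PER_PACKET = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER + ETHERNET_HEADER


def qps_from_hourly(requests_per_hour: float) -> float:
    return requests_per_hour / 3600.0


def mbit_per_second(qps: float, bytes_per_packet: int = BYTES_PER_PACKET) -> float:
    """Bandwidth in one direction, in megabits per second."""
    return qps * bytes_per_packet * 8 / 1_000_000
```

Under these assumptions, 400 million requests per hour is about 111,000 queries per second, or roughly 80 Mbit/s in each direction, and that only counts the requests that were actually answered.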
To be honest, end users are way more important than stupid robot drawing pretty charts. The robot prevents you from serving users so that it can go through.
I’ll give it until Monday and will pull servers from .cn
This is such a waste.