Oh dear, this explains it. 6 servers left in CN zone. There were 50 just a few months ago.
We saw LeoBodnar’s tweet yesterday and so added a stratum-2 node (preferring our LeoNTP GPS receiver on the same network) to the China pool, with a notional bandwidth setting of 1Mbit/sec. I understand that this is a “weight”, and I didn’t want to pick a setting which would pull loads of traffic straight to our new node. That’s also why we run this NTP server on a separate IP address, in case we need to nullroute it with upstreams due to DDoS.
With 1Mbit/sec set, we’re currently receiving around 10k requests per second, peaking at 40k requests per second. That’s a bandwidth usage of around 25Mbit/sec at 95th percentile (we are an ISP, we pay for transit links with CDRs and 95%iles) — which equates to about 8Tbytes/month for people billed that way by VPS providers.
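For anyone wanting to check the monthly figure, here is a rough sketch of the arithmetic. It assumes a 30-day month, decimal terabytes, and that the 95th percentile rate is sustained the whole time (so it is closer to an upper estimate than a real average). The function name is just for illustration.

```python
def monthly_transfer_tb(rate_mbit_s: float, days: int = 30) -> float:
    """Data moved in `days` days at a sustained rate, in decimal terabytes."""
    bytes_per_second = rate_mbit_s * 1_000_000 / 8  # Mbit/s -> bytes/s
    seconds = days * 24 * 60 * 60                   # length of the billing period
    return bytes_per_second * seconds / 1e12        # bytes -> TB

# 25 Mbit/s is the 95th percentile figure quoted above.
print(f"{monthly_transfer_tb(25):.1f} TB/month")
```

At 25Mbit/sec this comes out at roughly 8 TB, which matches the figure above.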
Could one of the problems be that time servers actually in China, like the ones xushuang listed above, are being dropped from the pool because the pool’s monitoring (which potentially has to traverse the Great Firewall of China) is seeing packet loss? They might be absolutely fine within China, but from the vantage point of the quality monitoring they appear to have problems, caused by filtering or DPI capacity limits?
Certainly I’ve seen the behaviour of UDP traffic change suddenly in interesting ways, presumably because Chinese filters are adapting to different VPN services, etc. Would that explain the sudden drop-off in China? Has anybody got access to the pool monitoring data for any in-China NTP servers which have fallen out of the pool, to confirm?
And if this is the case, we haven’t made things better for Chinese users at all: the NTP servers which they would be able to reach have fallen out of the pool, and the NTP services we are offering them from outside China (which are suffering packet loss as UDP 123 transits the GFW) are actually a degraded service, even before you consider how overloaded they are.
@tomli took the initiative to talk about this earlier in this thread, but I am not sure whether anything came of it.
Personally I think the pool needs to change the setup to have multiple monitoring stations, and to accept that not all NTP servers will be reachable from all monitoring stations.
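To make the idea concrete, here is a minimal sketch of what "not every monitor has to see every server" could look like. This is not the pool's actual scoring code: the monitor names, the threshold of 10 and the `min_ok` parameter are all assumptions for illustration.

```python
from typing import Mapping

def is_usable(scores_by_monitor: Mapping[str, float],
              threshold: float = 10.0,  # assumed cut-off score, illustrative only
              min_ok: int = 1) -> bool:
    """Keep a server if at least `min_ok` monitors score it at or above `threshold`."""
    ok = sum(1 for score in scores_by_monitor.values() if score >= threshold)
    return ok >= min_ok

# Hypothetical server inside China: bad from a monitor outside the GFW,
# fine from a monitor inside it.
scores = {"monitor-outside-cn": -20.0, "monitor-inside-cn": 19.5}
print(is_usable(scores))                                # multi-monitor view
print(is_usable({"monitor-outside-cn": -20.0}))         # single-monitor view
```

With a single outside monitor the server would be dropped; with a second vantage point it stays in. A real system would also need to decide which zones a server is handed out to, which this sketch ignores.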
It looks like some servers are recovering:
16 (+8) active 1 day ago
15 (+9) active 7 days ago
Yeah, that’s the plan. It’s mostly already supported on the beta system. The beta code is running in Kubernetes and there’s a bit more work to do before everything is working properly (so I can move the production code to the same branch and run it in the same way). There’s also (still…) some work to do to better manage the increase in monitoring data.
I’ve been focused on an update to the DNS server; we rolled it out to most of the servers over the last couple of weeks so I should soon be able to focus on this again. (And the work we’ve talked about elsewhere on the forum around “backfill servers”).
The Asia pool is now facing the same problem as 8 days ago: the continental pool lost about one third of its IPv4 servers in a day, 157 -> 107. Notable country pools:
- China: 19 -> 7
- India: 8 -> 3
- Japan: 27 -> 17
- Taiwan: 7 -> 2
- Hong Kong: 9 -> 2
Is there something wrong with the monitoring system?
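To put the drops listed above on the same scale, here is a quick sketch that turns the before/after counts into percentages. The numbers are the ones quoted in this post; the helper name is just for illustration.

```python
def percent_lost(before: int, after: int) -> float:
    """Share of servers that disappeared, as a percentage of the earlier count."""
    return (before - after) / before * 100

# (before, after) server counts as quoted above.
zones = {
    "Asia (IPv4)": (157, 107),
    "China": (19, 7),
    "India": (8, 3),
    "Japan": (27, 17),
    "Taiwan": (7, 2),
    "Hong Kong": (9, 2),
}

for zone, (before, after) in zones.items():
    print(f"{zone}: {before} -> {after} ({percent_lost(before, after):.1f}% lost)")
```

The smaller country zones lose a much larger share than the continental zone as a whole, which is what you would expect if a single vantage point stopped seeing a specific set of networks.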