Adding servers to the China zone

faelix · May 11, 2018, 1:34pm

Could one of the problems be that time servers actually in China, like the ones xushuang listed above, are being dropped from the pool because the pool’s monitoring (which potentially has to traverse the Great Firewall of China) is seeing packet loss? They might be absolutely fine within China, but from the vantage point of the quality monitoring, do we only think there are problems because of filtering or DPI capacity limits?

http://www.pool.ntp.org/scores/120.132.6.211
http://www.pool.ntp.org/scores/120.132.6.225

Certainly I’ve seen the behaviour of UDP traffic change suddenly, presumably because of Chinese filters adapting to different VPN services, etc. Would that explain the sudden drop-off in China? Has anybody got access to the pool monitoring data of any in-China NTP servers which have fallen out of the pool to confirm?

And if this is the case, we haven’t made things better for Chinese users at all: the NTP servers which they would be able to reach have fallen out of the pool, and the NTP services we are offering them from outside China (which are suffering packet loss as UDP 123 transits the GFW) are actually a degraded service, even before you consider how overloaded they are.

Hedberg · May 12, 2018, 3:49pm

@tomli took the initiative to talk about this earlier in this thread, but I am not sure whether anything came of it.

Personally I think the pool needs to change the setup to have multiple monitoring stations, and to accept that not all NTP servers will be reachable from all monitoring stations.
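As a rough illustration of the multi-monitor idea, here is a minimal sketch in which a server stays in the pool as long as enough monitoring stations score it at or above the threshold of 10 mentioned in this thread. The monitor names, the `quorum` parameter, and the function itself are hypothetical; this is not how the pool is actually implemented.

```python
POOL_THRESHOLD = 10.0  # a server scoring below 10 is dropped from the pool DNS


def stays_in_pool(scores_by_monitor: dict[str, float],
                  quorum: int = 1,
                  threshold: float = POOL_THRESHOLD) -> bool:
    """Keep a server if at least `quorum` monitors consider it healthy.

    `quorum` is an assumed policy knob: 1 means a single well-placed
    monitor is enough to keep the server in rotation.
    """
    healthy = sum(1 for score in scores_by_monitor.values() if score >= threshold)
    return healthy >= quorum


# Hypothetical scores for an in-China server: unreachable from a monitor
# outside the GFW, but fine from two monitors closer to its clients.
scores = {"monitor-us": -3.2, "monitor-cn": 19.5, "monitor-sg": 18.1}
print(stays_in_pool(scores))            # True
print(stays_in_pool(scores, quorum=3))  # False
```

With a single monitor, the first score alone would have removed this server from the zone.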

publicarray · May 13, 2018, 10:12pm

It looks like some servers are recovering:

16 (+8) active 1 day ago
15 (+9) active 7 days ago

http://www.pool.ntp.org/zone/cn

ask · May 16, 2018, 2:51am

Yeah, that’s the plan. It’s mostly already supported on the beta system. The beta code is running in kubernetes and there’s a bit more work to do before everything is working properly (so I can move the production code to the same branch and to run in the same way). There’s also (still…) some work to do to better manage the increase in monitoring data.

I’ve been focused on an update to the DNS server; we rolled it out to most of the servers over the last couple of weeks so I should soon be able to focus on this again. (And the work we’ve talked about elsewhere on the forum around “backfill servers”).

alica · May 16, 2018, 5:35pm

The Asia pool now faces the same problem as 8 days ago: the continental pool lost one third of its IPv4 servers in a day, 157 -> 107. Notable country pools:

China: 19 -> 7
India: 8 -> 3
Japan: 27 -> 17
Taiwan: 7 -> 2
Hong Kong: 9 -> 2

Is there something wrong with the monitoring system?

faelix · June 3, 2018, 6:47am

Indeed, after a bit of a recovery (maybe almost 20 servers in the zone), .cn is now back to single digits!

There are 9 active servers in this zone.
19 (-10) active 1 day ago

Is there any way to find out which 10 NTP servers left the zone, and if so, whether they themselves opted to leave the zone or whether monitoring booted them?

avij · June 4, 2018, 7:15am

Speaking only for myself, but at the current rate it looked like my server in Singapore (also in the .cn pool) would have sent some 6.5TB of NTP traffic this month and that would have exceeded my 4TB quota. I’ve grown tired of watching the traffic amounts manually, so I’ve now scripted things a bit and my server will now drop probes from the pool monitoring servers if the estimated monthly traffic (according to vnstat) exceeds 3.90 TB. This does not affect other clients, but will cause the score of my server to drop below 10, thus dropping the server from the pool DNS. When the estimated monthly traffic drops below 3.90 TB, the server will start responding to probes from the pool monitoring servers again. If a hard limit of 3.95TB of monthly transmitted data is exceeded, all NTP traffic will be dropped until next month and new quota limits.
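The decision logic described above could be sketched roughly as follows. This is only an illustration of the two thresholds from the post, not avij's actual script: the function name and the returned labels are made up, and the two inputs are assumed to come from something like vnstat's monthly estimate and monthly transmit counter.

```python
SOFT_LIMIT_TB = 3.90  # estimated monthly traffic above this: stop answering monitors
HARD_LIMIT_TB = 3.95  # actual monthly transmitted data above this: drop all NTP


def ntp_policy(estimated_month_tb: float,
               sent_so_far_tb: float,
               soft_limit_tb: float = SOFT_LIMIT_TB,
               hard_limit_tb: float = HARD_LIMIT_TB) -> str:
    """Return which traffic to drop for the rest of the month (hypothetical labels)."""
    if sent_so_far_tb > hard_limit_tb:
        # Quota is effectively used up: drop everything until next month.
        return "drop_all_ntp"
    if estimated_month_tb > soft_limit_tb:
        # Still serve clients, but let the pool score fall below 10 so the
        # pool DNS stops handing out this server.
        return "drop_monitor_probes"
    return "serve_everything"


print(ntp_policy(estimated_month_tb=6.5, sent_so_far_tb=1.2))   # drop_monitor_probes
print(ntp_policy(estimated_month_tb=3.5, sent_so_far_tb=2.0))   # serve_everything
print(ntp_policy(estimated_month_tb=4.0, sent_so_far_tb=3.96))  # drop_all_ntp
```

The limits are parameters so that a server with a different quota can reuse the same logic with its own numbers.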

My U.S. server (also in .cn pool) has a similar setup, but the numbers are different. It has a 3TB monthly quota and at the current rate it would have sent 4.68 TB this month. Both of these servers are configured as 384kbit/s in the pool.

So, the current low scores for my servers are intentional:
http://www.pool.ntp.org/scores/94.237.64.20
http://www.pool.ntp.org/scores/173.255.246.13

avij · June 6, 2018, 5:07pm

As of now, the China zone says:

“IPv4: There are 5 active servers in this zone”

This might be bad. Those five servers may get quite a lot of traffic at the moment. One of my servers will join the .cn pool again in a few hours when its score reaches 10 again (see above message for why it was below 10), so I’ll get to see this myself. Stats here: http://biisoni.miuku.net/stats/ntppackets.html

Edit: 4 active servers now.

faelix · June 6, 2018, 5:21pm

“IPv4: There are 5 active servers in this zone”

Yeah… just starting to see the bandwidth of incoming packets get rather brutal!

Hedberg · June 9, 2018, 8:09pm

There are 8 active servers right now, and 5.103.139.163 has been receiving more than 60Mbps on average over the last 24 hours.

edit: wow… just went down to 5 and traffic increased by about 20Mbps

Hedberg · June 10, 2018, 5:57pm

Seems like the zone collapsed now. It is down to three servers.

faelix · July 1, 2018, 10:16am

I speculate that, in addition to GFW potentially affecting the monitoring, one of the problems now is that it’s very easy to get a level of traffic that a single-threaded ntpd process cannot handle. This happened to us yesterday and because that single core became overloaded, it started sending fewer replies. The result of that was that we got even more queries. At one point our NTP server’s inbound traffic was about six times the outbound. Normally we monitor the PPS and bytes in:out ratio to see if we are being abused for DDoS type attacks. In this instance it looked like we were being attacked.
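A minimal sketch of the kind of in:out check described above, assuming packets-per-second counters are already being collected from somewhere. The alert threshold of 3.0 and the function names are assumptions for illustration; an NTP server that keeps up answers roughly one reply per query, so the ratio normally sits near 1:1.

```python
ALERT_RATIO = 3.0  # hypothetical threshold; tune against your own baseline


def in_out_ratio(inbound_pps: float, outbound_pps: float) -> float:
    """Queries received per reply sent; infinity if nothing is being answered."""
    if outbound_pps <= 0:
        return float("inf")
    return inbound_pps / outbound_pps


def needs_attention(inbound_pps: float, outbound_pps: float,
                    alert_ratio: float = ALERT_RATIO) -> bool:
    """Flag the server when far more packets arrive than are answered."""
    return in_out_ratio(inbound_pps, outbound_pps) >= alert_ratio


print(needs_attention(60_000, 10_000))  # True  (6:1, as in the overload above)
print(needs_attention(20_000, 10_000))  # False (2:1)
```

The ratio alone cannot tell an attack from an overloaded ntpd that has stopped answering, which is why the overload looked like an attack at first.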

The problems for us all started around the time that the DNS master went down — 09:30 UTC on 30th June — but got really bad around 03:30 UTC on 1st July, and then I started to get alerts from our network monitoring setup.

The times on this PPS graph are UTC+02, with inbound in blue and outbound in green. The “flat cap” to the blue is when we start hitting the PPS limit of the firewall!

Thankfully it didn’t take that long to convert one quad-core VM (of which only one core was being used by ntpd) into four single-core VMs. Within our network we are now using ECMP on our routers to spread the incoming requests across what has become a four-node NTP “anycast cluster” serving our IP address that is in the CN Pool. The result was almost instantaneous (just before 07:00 on that graph). We are now getting <50% CPU usage on each VM, and our inbound traffic has subsided by about 80%. We’re still seeing about a 2:1 ratio of in:out, but that’s probably just a ton of misconfigured clients we have picked up.
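To show why ECMP spreads this kind of load evenly, here is a simplified sketch of per-flow hashing across four nodes. Real routers use their own (vendor-specific) hash functions and fields; SHA-256, the node names, and the example addresses below are assumptions chosen only to make the idea runnable.

```python
import hashlib
from collections import Counter

NODES = ["ntp-a", "ntp-b", "ntp-c", "ntp-d"]  # hypothetical names for the four VMs


def pick_node(src_ip: str, src_port: int, dst_ip: str,
              dst_port: int = 123, proto: int = 17) -> str:
    """Map a flow's 5-tuple to one node; the same flow always gets the same node."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# Simulate many distinct clients querying one pool address (documentation IPs).
flows = [(f"10.0.{i // 256}.{i % 256}", 1024 + i) for i in range(4000)]
counts = Counter(pick_node(ip, port, "192.0.2.1") for ip, port in flows)
print(counts)  # each node ends up with roughly a quarter of the flows
```

Because NTP pool traffic comes from a very large number of distinct clients, per-flow hashing gives each node a similar share without any shared state between the VMs.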

Anyway, it could be that some servers have fallen out of the CN zone not because of bandwidth but because of CPU throttling — which turns into a vicious cycle that causes bandwidth problems, and then NTP Pool Monitoring throws the node out. And that just piles the load back on to the rest…

Hedberg · July 13, 2018, 2:20pm

Can I have this server 5.103.128.88 added to the CN pool?

Thanks,

ChrisW · July 13, 2018, 3:14pm

May I politely suggest that the problems with the China zone and its disappearing servers are - at least to some degree - also caused by the pool monitoring system and its flaky network connection?

We have 2 perfectly well working servers in a perfectly well working network that nonetheless keep getting bumped out of the pool several times a month simply because the monitoring system seems to have trouble reaching them. And I’m not the only one with that problem.

And while we sit in Europe, I would not be surprised to learn this affects Chinese networks as well. And this is a way to lose servers - if an operator sees his server getting removed again and again and again without being able to do anything about it, he might very well conclude that the NTP Pool Project has such an ample supply of server capacity that it can afford this largesse and doesn’t need his servers anyway.

Hedberg · July 15, 2018, 12:09pm

Chinese zone collapsed - 3 hosts left. My own server will also be pushed out at some point, since it now receives more traffic than it can answer (100+ Mbit).

faelix · July 15, 2018, 2:57pm

…aaaand we’ve just gone below score of 10 as well.

“IPv4 There is 1 active server in this zone.”

faelix · July 15, 2018, 3:34pm

We’re now at the stage where e.g. LeoBodnar’s GPS devices are bouncing in and out of the pool because as soon as they get added they’re ramping up to >100Mbit/sec traffic, start dropping packets, and their reputation slides back down again. There are too few members of the pool to share the load adequately. As @Hedberg says, the zone has collapsed.

Hedberg · July 15, 2018, 4:19pm

@Ask - Could the monitor be configured to be slightly more forgiving for hosts in the CN zone? E.g. when a host doesn’t reply every time?
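To make "more forgiving" concrete, here is a toy scoring model. The update rule and all of its parameters (a 0.95 decay, +1 per answered probe, -5 per missed probe) are assumptions for illustration and are not taken from the pool's code; only the threshold of 10 comes from this thread.

```python
POOL_THRESHOLD = 10.0


def simulate_score(replies, decay=0.95, reward=1.0, penalty=-5.0, start=0.0):
    """Apply the assumed update rule once per probe (True = probe answered)."""
    score = start
    for answered in replies:
        score = score * decay + (reward if answered else penalty)
    return score


def steady_state(loss_rate, decay=0.95, reward=1.0, penalty=-5.0):
    """Long-run average score when a fixed fraction of probes is lost."""
    average_step = (1 - loss_rate) * reward + loss_rate * penalty
    return average_step / (1 - decay)


# 10% probe loss, e.g. from a lossy path between monitor and server.
print(steady_state(0.10))                # about 8.0: below 10, out of the pool
print(steady_state(0.10, penalty=-2.0))  # about 14.0: stays in the pool
```

Under these assumed parameters, a server settles below 10 once more than about 8% of probes are lost; softening the penalty to -2 moves that point to roughly 17%.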

faelix · July 15, 2018, 5:00pm

@hedberg - sounds like a very simple (maybe temporary) fix for the problems that we are currently theorising the .cn zone to have had - e.g. earlier posts by @ChrisW, myself, and others.

littlejason99 · July 15, 2018, 9:16pm

What happens when there are no servers in a zone? Do requests just get bumped up one level?
