What happens when there are no servers in a zone? Do requests just get bumped up one level?
We might find out later today, given the problem is recurring. I’ve just woken up to find we are one of three NTP servers left for China. Make that two now, because I’ve preemptively removed our server from the zone, before it gets to the point where our NTP server’s uplink is saturated and incoming traffic then escalates even further because it starts dropping responses.
It is very difficult to maintain a presence in the pool right now, or to get into it. My own server was kicked out because it began to drop packets when it reached 80Mbps+, and it is “fighting” to get back in now.
Even though it is kicked out, it still processes ~40Mbps of NTP traffic.
I’m just spinning up an 8-node anycast/ECMP cluster within our Manchester, UK, network. That should be able to cope with many hundred Mbit/sec. I don’t know how long it’s going to take for its stats to come into the CN pool: https://www.ntppool.org/scores/126.96.36.199
@hedberg out of interest, did it start dropping packets at 80Mbit/sec because ntpd was running at 100% of one core (because good old ntpd is a single-threaded process)? Or do you think it was because you had a saturated link?
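To put the CPU-versus-link question in perspective, here is a rough sketch that converts a one-way bit rate into an approximate packet rate. It assumes every packet is a plain 48-byte NTP packet over IPv4 and Ethernet with no extension fields. The function name and the choice to count Ethernet framing are illustrative assumptions, not something measured in this thread, and how each poster's bandwidth figure was measured is unknown.

```python
# Rough per-packet sizes, assuming plain NTP over IPv4/Ethernet (assumption).
NTP_PAYLOAD = 48                      # bytes, basic NTP packet without extensions
UDP_HEADER = 8                        # bytes
IPV4_HEADER = 20                      # bytes, no IP options
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12   # header + FCS + preamble + inter-frame gap


def packets_per_second(mbit_per_sec: float, include_ethernet: bool = True) -> float:
    """Estimate one-way packets/sec for a given bit rate (hypothetical helper)."""
    size = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER
    if include_ethernet:
        size += ETHERNET_OVERHEAD
    return mbit_per_sec * 1_000_000 / (size * 8)


if __name__ == "__main__":
    for rate in (40, 80, 100):
        print(f"{rate} Mbit/s is roughly {packets_per_second(rate):,.0f} packets/s")
```

Under these assumptions, 80Mbit/sec in one direction is on the order of 90,000 to 130,000 packets per second, depending on whether the framing overhead is counted.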
@faelix I am impressed with the setup you are throwing at this
188.8.131.52 is a LeoNTP server. It is a saturated link, or at least I think so.
I hope this anycast cluster can soak up a bunch of traffic so that other nodes will recover. We can bring online a significant transit upgrade from Cogent — it’s all ready except for arranging the maintenance window with Cogent’s NOC to move the BGP from our old cross-connect to this new one. The plan had been to swing that over in August, to coincide with a new customer… but if we find that there’s a bit too much heat from adding this cluster to the .cn pool, we might activate it a bit earlier.
Our anycast/ECMP node in the UK just hit the .cn pool. Almost instantly: 100Mbit/sec of traffic.
Four-node and eight-node ECMP clusters. Traffic has subsided a little (maybe China is going to bed?). Only around 80-100Mbit/sec of traffic across both clusters now (previously it was about 100+100Mbit/sec).
Hope this helps alleviate some pressure on the rest of the .cn volunteer timelords.
I think the biggest problem for the China region is the network path to China from the LA monitoring station (certainly not the GFW). China is a populous country, so its international links are always congested, and ISPs have to apply QoS so that everyone can still reach the international network, which causes high latency and packet loss.
So the old LA monitoring server keeps removing servers located in China from the pool because of this international congestion, which leaves many Chinese volunteers with no option but to add servers outside China. Besides, if we want to guarantee connectivity out of China (to prevent the monitoring server from removing us from the pool), we can only buy a really expensive premium carrier service ($100/Mbps/month), typified by China Telecom’s CN2.
Generally speaking, solving this problem could significantly increase the availability and stability of the NTP pool in the China region. Maybe someone can reach the maintainer of POOL.NTP.ORG, so that we can provide some monitoring servers located in China, or adjust the scoring algorithm for this region?
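To illustrate why loss on the path to a distant monitor can push a healthy server out of the pool, here is a minimal sketch of a decaying score. The update rule and every number in it (decay of 0.95, +1 for an answered check, -5 for a lost one, a threshold of 10) are assumptions chosen for illustration. Nothing in this thread states the pool's actual scoring parameters.

```python
def steady_state_score(loss_rate: float,
                       decay: float = 0.95,
                       ok_step: float = 1.0,
                       fail_step: float = -5.0) -> float:
    """Long-run average score if each check is lost with probability loss_rate.

    Assumed update rule (illustrative only): score = score * decay + step.
    Its expected fixed point is mean_step / (1 - decay).
    """
    mean_step = (1.0 - loss_rate) * ok_step + loss_rate * fail_step
    return mean_step / (1.0 - decay)


if __name__ == "__main__":
    THRESHOLD = 10.0  # assumed minimum score to stay in the pool
    for loss in (0.0, 0.05, 0.10, 0.20):
        score = steady_state_score(loss)
        state = "in" if score >= THRESHOLD else "out"
        print(f"loss {loss:4.0%}: average score {score:6.1f} -> {state}")
```

With these assumed numbers, roughly 8% packet loss between the monitor and the server is enough to hold the average score below the threshold, even if the server answers nearby clients perfectly.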
It helped a lot - thanks.
Glad it helped - I guessed something had improved because we’re currently seeing less traffic than yesterday, and it’s great to see the zone is back up to 9 this morning
It is a slow recovery. There are 9 hosts in cn right now and my 100Mbit connection is still congested, but nowhere near as much as yesterday.
Is your LeoNTP on 100Mbit in the zone with “100Mbit/sec bandwidth”? I’m also wondering whether the “balancing” is being a bit unfair on people. I know there are a few nodes that have “dipped their toe” in .cn and put themselves down at 384kbit/sec (their post said they were getting about 40kpps with that!). So maybe if you’ve put yourself down as 100M, you’re getting more than your “fair share”…? The whole balancing algorithm fails if there are only a few of us in the zone.
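As a sketch of the “fair share” concern, suppose the zone's traffic is split in proportion to each server's configured bandwidth setting. That proportional rule is an assumption for illustration (the actual balancing algorithm is not described in this thread), and the server names below are hypothetical.

```python
def traffic_share(netspeeds_kbit: dict[str, float]) -> dict[str, float]:
    """Fraction of zone traffic per server, assuming a purely proportional split."""
    total = sum(netspeeds_kbit.values())
    return {name: speed / total for name, speed in netspeeds_kbit.items()}


# Hypothetical three-server zone: one server at 100Mbit, two at 384kbit.
zone = {"server_a": 100_000, "server_b": 384, "server_c": 384}
shares = traffic_share(zone)

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.2%} of zone traffic")
```

Under that assumption the 100Mbit server would receive over 99% of the zone's queries, whatever the real demand is, which is why the setting stops acting as a bandwidth cap when only a handful of servers remain.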
Maybe Ask could talk to the people that manage the RIPE Atlas probes & anchors and figure out a more robust monitoring system? They have over 10,000 probes and 300 anchors.
They already have the ability to monitor NTP, and a CLI toolset… There is a section about commercial usage, but I didn’t see anything about non-profit use…
It is usually configured with 1000Mbit/sec, but it doesn’t really make any difference in a zone like .cn when there are so few hosts compared to the demand.
Right now it is configured with 100Mbit/sec, and that is what it receives, so it is congested.
It’s quite late, but I’d still like to note that my servers in JP are all okay to be added to the CN zone. (https://www.ntppool.org/user/AstroProfundis)
The CN zone is still getting quite a large number of queries per server, which means I can’t afford the bandwidth fees. Once we have enough working nodes (and the load per node becomes low enough), I’ll be able to add at least 2 servers located in China.
And as it’s not that easy to set up a monitoring server in mainland China, somewhere in HK, JP or KR may generally have good connections to those servers “inside”. I believe setting up a monitoring node nearby could improve the monitoring quality a lot.