Aha, I see my servers listed in @studentmain’s test results, with quite a high loss rate. But that’s actually intended.
I have implemented a rate limit on my servers in iptables (not in ntp.conf) to make sure they get removed from the pool periodically. That way I won’t use up all my traffic allowance and lose my other services on them in the middle of the month (or have to pay a thousand-dollar bill). I see @avij here is having a similar problem and is using a similar workaround as well.
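For anyone wondering how such a limit turns into the "loss" the monitors report: the iptables `limit` match is documented as a token bucket filter. Below is a minimal Python sketch of a token bucket, assuming a rule of that style; `TokenBucket`, `accepted_fraction`, and all the numbers are hypothetical and are not the actual rule used on these servers.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical limiter: `rate` packets/s sustained, up to `burst` at once."""
    rate: float
    burst: float
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` (seconds) is accepted."""
        # Refill in proportion to the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the packet is dropped


def accepted_fraction(rate: float, burst: float, offered_pps: float, seconds: float) -> float:
    """Send evenly spaced packets at `offered_pps` and return the fraction accepted."""
    bucket = TokenBucket(rate, burst)
    total = int(offered_pps * seconds)
    accepted = sum(bucket.allow(i / offered_pps) for i in range(total))
    return accepted / total
```

With traffic at twice the configured rate (for example `accepted_fraction(100, 10, 200, 10)`), roughly half of the packets are dropped, and the pool monitor sees that as packet loss; once traffic falls below the rate, nothing is dropped.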
Actually, I quite agree with @LeoBodnar that servers in the .cn pool should only be removed from DNS responses on total failure, not on a high packet drop rate. The original problem with the .cn pool is that we have insufficient resources to handle all the requests from clients in this zone, so packet loss simply cannot be prevented.
It’s clearly better for all clients to suffer a relatively low loss rate, with many servers staying in the pool to share the traffic (each dropping some packets), than to have every server whose inbound traffic exceeds its capacity kicked out quickly, leaving only a very few servers to carry all the traffic and every client suffering a high loss rate.
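A back-of-the-envelope model makes this comparison concrete. The Python sketch below assumes DNS spreads queries evenly over the servers in the pool and that each server answers at most a fixed capacity; the real pool weights servers by their configured netspeed, and `per_client_loss` and all the numbers are made up for illustration.

```python
def per_client_loss(total_qps: float, servers: int, capacity_qps: float) -> float:
    """Loss rate seen by clients when `total_qps` is split evenly across
    `servers` identical servers that each answer at most `capacity_qps`."""
    if servers == 0:
        return 1.0  # nobody left in the pool: every query is lost
    per_server = total_qps / servers
    # Whatever exceeds a server's capacity is dropped; no overload means no loss.
    return max(0.0, 1.0 - capacity_qps / per_server)
```

With 1000 queries/s of demand and servers that can each answer 80 queries/s, keeping all 10 servers gives about 20% loss (`per_client_loss(1000, 10, 80)`), while kicking 7 of them out leaves the remaining 3 with about 76% loss (`per_client_loss(1000, 3, 80)`).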
As I live in China myself, I know the pain on the client side very well. I have set up stratum 1 (st1) servers for my own use and for some of my servers inside China. But I just can’t provide those servers to the pool, because I can’t afford the bandwidth and traffic bills.
This is a chicken-and-egg situation. The only way to solve it is to provide enough capacity to handle the entire zone, so that new servers can be added without being “DDoSed” to death. We did that, as described in the OP, but it just didn’t work out: when the pool doesn’t really have enough resources, servers with packet loss (again, impossible to prevent in this situation) get kicked out, the extra traffic then causes a higher loss rate on the remaining servers, more servers get kicked… and the whole zone collapses.
So how about we keep a constant number of servers in the pool regardless of their dropped packets (only removing those that haven’t answered any probes for a long time, while keeping ones that occasionally fail two or three probes), and gracefully add more servers to handle the requests, until we finally reach the point where total capacity exceeds total requests and servers in the pool don’t need to run at full bandwidth, as in other zones like .us or .eu?
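The cascade described above, and the proposed alternative, can be illustrated with a toy simulation. This is a hypothetical Python model, not how the pool monitors actually score servers: DNS splits traffic evenly, each server answers up to its own capacity, and under a threshold policy any server whose own loss exceeds `max_loss` is removed each round. Passing `max_loss=None` stands for the proposal of keeping every server that still answers some probes, optionally adding one new server per round via `added_capacity`. All names and numbers are assumptions for illustration.

```python
from typing import List, Optional


def served_fraction(capacities: List[float], total_qps: float) -> float:
    """Fraction of queries answered when `total_qps` is spread evenly over
    the pool and each server answers at most its own capacity."""
    if not capacities:
        return 0.0
    share = total_qps / len(capacities)
    return sum(min(c, share) for c in capacities) / total_qps


def run(capacities: List[float], total_qps: float, max_loss: Optional[float],
        rounds: int, added_capacity: float = 0.0) -> List[float]:
    """Return the client-side loss rate observed in each monitoring round.

    max_loss: remove a server when its own loss rate exceeds this value;
              None keeps every server (nobody here is completely dead).
    added_capacity: capacity of one new server joining each round (0 = none).
    """
    pool = list(capacities)
    history = []
    for _ in range(rounds):
        history.append(1.0 - served_fraction(pool, total_qps))
        if pool and max_loss is not None:
            share = total_qps / len(pool)
            # A server's loss is the part of its share it cannot answer.
            pool = [c for c in pool if max(0.0, 1.0 - c / share) <= max_loss]
        if added_capacity > 0:
            pool.append(added_capacity)
    return history
```

With server capacities of 60, 70, …, 150 queries/s against 1000 queries/s of demand, `run(caps, 1000, 0.1, 4)` goes from about 10% loss to a completely empty pool by the fourth round, while `run(caps, 1000, None, 6, added_capacity=100)` lowers the loss in every round as servers are added.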