Suggestions for the offset threshold for pool servers

I’m picking this thread back up since the consensus seems to be in favor of a lower offset threshold, but no conclusion has been reached.

Planning for internet congestion or an outage of an entire country feels out of scope to me. The problems of a complete outage would be the same no matter how high or low the offset threshold is set. If the country only has an unstable or congested link to the wider internet, lost packets would still impact the scoring the same way they do now. The only real change comes when routing or congestion results in asymmetric packet travel times, and even then it only becomes a problem if the asymmetry gets bad enough to induce more than 50ms of error.
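
To make that concrete, here is a rough sketch (plain Python, not pool code) of the standard NTP offset calculation, assuming a perfectly synchronized server clock: the computed offset is biased by half the difference between the outbound and return delays, so the one-way asymmetry has to exceed 100ms before the induced error passes 50ms.

```python
# Minimal sketch of how asymmetric path delays bias the offset a monitor
# computes. Timestamps are in seconds; the server clock is assumed perfect.

def ntp_offset(t0, t1, t2, t3):
    """Standard NTP offset estimate from the four exchange timestamps."""
    return ((t1 - t0) + (t2 - t3)) / 2

def simulate_exchange(outbound_delay, return_delay):
    """One request/response against a perfectly synchronized server."""
    t0 = 0.0                      # client transmit
    t1 = t0 + outbound_delay      # server receive
    t2 = t1                       # server transmit (processing time ignored)
    t3 = t2 + return_delay        # client receive
    return ntp_offset(t0, t1, t2, t3)

# Symmetric 100 ms round trip: no offset error.
print(simulate_exchange(0.050, 0.050))   # -> 0.0

# Same 100 ms round trip, but 90 ms out / 10 ms back:
# the estimate is off by half the 80 ms asymmetry, i.e. ~40 ms.
print(simulate_exchange(0.090, 0.010))   # -> ~0.04
```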
In any of those cases, the monitoring system cannot really tell what is going on inside the affected network. But the decision it makes is relevant not only for clients inside the affected area but also for clients outside it. So dropping the servers seems like the right decision to me for the current architecture of the pool.

I think the goal here should not be to check whether the server could be synchronized to a correct clock and is merely behind a bad network, but whether the server can be used as a reliable and precise time source. A server that my (synchronized) client sees with 100ms latency and 80ms offset might be working correctly, but it is still not a good time source for my client. So I don’t think we should consider the latency.
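
To illustrate the distinction, a small sketch with hypothetical names and a made-up 50ms threshold: one check judges the server purely on the offset clients would actually see, the other gives it credit for the delay/2 uncertainty and only asks whether the server’s clock could still be correct.

```python
# Hypothetical checks, not the monitor's actual code.

def usable_offset_only(offset, threshold):
    """Judge the server on the measured offset alone:
    that offset is what clients will actually get."""
    return abs(offset) <= threshold

def usable_latency_adjusted(offset, rtt, threshold):
    """Alternative: subtract the delay/2 uncertainty first, i.e. ask
    whether the server *could* still have a correct clock."""
    return max(0.0, abs(offset) - rtt / 2) <= threshold

# The example from above: 100 ms latency, 80 ms offset,
# with a hypothetical 50 ms threshold.
offset, rtt, threshold = 0.080, 0.100, 0.050

print(usable_offset_only(offset, threshold))             # False: not a precise source
print(usable_latency_adjusted(offset, rtt, threshold))   # True: clock *might* be fine
```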

Continuously applying a relative threshold would over time empty the pool. We would measure, then drop the worst 2-5% of the servers, then measure again; now the threshold is based on the measurements of the remaining 95% of servers, so we again drop the worst 2-5%. Repeat this often enough and the pool size slowly but surely shrinks until only a handful of servers remain.
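
A quick back-of-the-envelope simulation (made-up starting size of 4000 servers, fixed 5% cut per round, just to show the geometric shrinkage) makes the compounding visible:

```python
def surviving_servers(initial, drop_fraction, rounds):
    """Expected pool size after `rounds` of culling the worst `drop_fraction`."""
    return initial * (1 - drop_fraction) ** rounds

# Starting from a hypothetical 4000 servers, dropping the worst 5% each round.
for rounds in (10, 50, 100):
    print(rounds, round(surviving_servers(4000, 0.05, rounds)))
# 10  -> ~2395
# 50  -> ~308
# 100 -> ~24
```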