Hi,
I was wondering whether it would be possible, and whether it would make sense, to have the pool add or remove a server gradually (i.e., scale its inclusion in DNS responses) as the server’s score rises or falls, rather than the current binary on/off switch at score 10?
I’ve been wondering about this as part of the recurring discussions on this forum about challenges in under-served regions, and am now experiencing it first hand myself (though not in a way that I couldn’t deal with by my own means).
E.g., instead of the netspeed setting being applied in a binary on/off fashion when the score crosses the 10-point boundary, something more like this:
- score < 10: fraction of inclusion in DNS = netspeed * 0% (= 0, as now)
- 10 <= score < 15: fraction of inclusion in DNS = netspeed * (score - 10) / 5 (new, somewhere between 0% and 100% of “netspeed”)
- 15 <= score <= 20: fraction of inclusion in DNS = netspeed * 100% (= full “netspeed”, as now for the entire range 10-20)
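To make the three cases above concrete, here is a minimal Python sketch of the proposed ramp. It only illustrates the proposal and is not how the pool actually computes DNS weights; the function name `dns_weight` and the parameters `lower` and `full` are hypothetical names chosen for this example.

```python
def dns_weight(score: float, netspeed: float,
               lower: float = 10.0, full: float = 15.0) -> float:
    """Effective DNS weight of a server under the proposed ramp.

    Hypothetical helper for illustration only, not the pool's code.
    `lower` is the score below which the server is excluded,
    `full` is the score from which the full netspeed applies.
    """
    if score < lower:
        # Below the threshold: not included in DNS responses (as now).
        return 0.0
    if score >= full:
        # Upper part of the range: full configured netspeed (as now).
        return netspeed
    # Transition area: linear ramp from 0% to 100% of netspeed.
    return netspeed * (score - lower) / (full - lower)


# Print the effective weight for a few sample scores.
for s in (5, 10, 12.5, 15, 20):
    print(s, dns_weight(s, netspeed=1000))
```

With these thresholds, a server sitting exactly at score 10 still gets a weight of 0 and only starts receiving traffic as its score climbs above that, reaching its full netspeed at score 15.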
One issue reported time and again in various threads about under-served zones is that adding a new server to such a zone is a challenge: once the score crosses the 10-point boundary, the server is hit “right away” with the full traffic load corresponding to its netspeed share in that zone*. That can be enough to immediately push a less beefy server’s score back down, potentially leading to a kind of yo-yo effect in the server’s score, with a matching traffic pattern. This in turn may have repercussions for other such servers in the zone, leading to a domino effect of servers dropping out of it, as described a few times on this forum.
Having a more gradual “DNS share” increase does not solve the underlying problem of a zone being under-served, but might make it a bit easier to add servers to the zone, and might help in keeping the number of servers in such a zone steadier, helping all the servers in the zone.
@ask, you previously hinted at working on something like that, though at the time more in the context of dealing with “weird” server behavior. While I think the conclusion at the time was that the specific “tests” considered then might not have been useful (potentially triggering the default rate limits of some server implementations), I think a more nuanced inclusion in the pool would generally be useful in adequately-served zones as well (not only in under-served ones).
Namely, it would generally reduce clients’ exposure to servers that are not scoring optimally. That may be because a server is on its (temporary) way out of the pool, i.e., transitioning in downward steps of -5 from high score values to values below 10. Or it may be because there are semi-persistent issues with a server, e.g., in connectivity or in maintaining a good offset, so that its score oscillates at the lower end of the 10…20 range, or sometimes dips below 10 points.
The above “formula” is just a proposal. It tries to keep the general “in the pool above 10, out of the pool below 10” approach, limits the gradual part to the lower half of the “in the pool” range so that a sufficient score range remains where the full “netspeed” is reached, and is relatively simple (a linear relationship between score and DNS inclusion share in the transition area). But this could obviously be tweaked, e.g., as far as the threshold values are concerned, especially after some real-life experience with such an approach.
* This description is a simplification; the actual process is a bit more differentiated, but it still results in a potentially very steep/sudden increase in traffic load that is prone to cause issues, e.g., in my case, “DDoS protection” outside of my control temporarily blocking all NTP traffic to my server.