Hi all,
as far as I am aware, the current default server distribution works like this: Each active server is included in the zone for the country it is in, the zone for the continent it is on, and (at sufficient netspeed) the global zone.
When a client asks for servers, the response is:
- if there is at least one server in the country zone of the country the client is asking from, up to four servers from the same country,
- if there are no servers in the country zone but there are servers in the encompassing continental zone, up to four servers from the continental zone,
- if there are servers in neither the country zone nor the continental zone, up to four servers from the global zone.
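To make the fallback order concrete, here is a rough Python sketch of my understanding. It is not the actual GeoDNS code: the function name, the data layout, and the "@" label for the global zone are made up for illustration, and it picks servers uniformly instead of taking netspeed into account.

```python
import random

GLOBAL_ZONE = "@"  # hypothetical label for the global zone


def pick_servers(client_country, continent_of, servers_by_zone, count=4, rng=random):
    """Return up to `count` servers, falling back country -> continent -> global.

    continent_of:    dict mapping country code -> continent zone name
    servers_by_zone: dict mapping zone name -> list of server identifiers
    """
    candidates = [
        client_country,
        continent_of.get(client_country),
        GLOBAL_ZONE,
    ]
    for zone in candidates:
        pool = servers_by_zone.get(zone, [])
        if pool:
            # The first non-empty zone wins, even if it holds fewer than `count` servers.
            return rng.sample(pool, min(count, len(pool)))
    return []
```

The important detail is the "first non-empty zone wins" rule: a country with a single server never falls through to the continental zone, which is exactly why small zones get overwhelmed.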
The main issue for this thread is that the different countries do not balance each other out. In countries with few servers, these are often overwhelmed, and sometimes an entire zone collapses when the servers can't keep up with the request rate and get demoted by the monitoring. Meanwhile, in countries with lots of servers, the servers still have unused capacity. As an interesting side effect, coverage in countries with no servers is sometimes more stable than in countries with a small number of servers, since the load gets distributed across the entire continent. All of this has been discussed in different threads here before.
The goal of this thread is to collect and discuss different approaches on how to distribute the available servers across the country zones, so we achieve a better coverage and better utilization of the resources made available in the pool.
Parameters to think about:
- How do we select servers to support underserved zones? Simply by continental grouping? Geographic adjacency of countries?
- What is the impact on the servers used as support? How do we prevent a scenario where we degrade a not yet underserved zone by adding too much load from nearby underserved zones?
- How do we handle the difference between zones? Does an optimization approach fit for large and small zones alike?
- How do we balance higher availability from adding more servers to a zone against higher latency from adding servers that are farther away?
- How do we define which zones are actually “underserved”? A certain number of DNS requests per server? A certain number of DNS requests in proportion to netspeed? The number of servers?
- How much complexity do we want to introduce?
I want to explicitly exclude the topic of IPv4 and IPv6 from the scope of this thread. Any server distribution algorithm should work well for both protocol versions, as I assume there will be both IPv4-only and IPv6-only clients for the foreseeable future, where the distribution of servers in one protocol version is irrelevant to the quality of distribution in the other.
I want to present two options that I have thought about. Both are designed purely on the NTP Pool management level without having to introduce new features to the GeoDNS servers.
The first approach was suggested by ask in “Minor new features on the website” (post #9): define a minimum number of servers that should be in a zone. If a given zone does not have enough servers, servers from the surrounding zones could be added.
For the specific implementation: The Pool could calculate the average netspeed of the servers directly mapped to the zone, then add all servers from the surrounding continent (or at least all that are not themselves in an underserved zone) to the zone. Their netspeeds would be scaled down so that, in total, they add the netspeed that the “missing” servers would contribute if each had the average netspeed.
Pros: Low complexity, works with information already present in the core Pool database, no change for zones that are already empty or have a large number of servers.
Cons: Static, might overload other zones in the same continent.
Variation: Define a list with the minimum server count per zone based on the DNS statistics to account for differences in Pool usage by country.
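A minimal sketch of that calculation, assuming netspeeds are plain numbers and the donor servers have already been selected. The function and parameter names are made up for illustration and nothing here reflects existing Pool code.

```python
def support_netspeeds(own, donors, min_servers):
    """Scale donor netspeeds so they add up to the netspeed of the "missing" servers.

    own:         dict server -> netspeed, servers directly mapped to the zone
    donors:      dict server -> netspeed, candidate servers from the continent
    min_servers: the configured minimum server count for the zone
    Returns a dict donor server -> scaled-down netspeed inside this zone.
    """
    missing = min_servers - len(own)
    # Empty zones and zones that already meet the minimum stay unchanged.
    if missing <= 0 or not own or not donors:
        return {}

    average = sum(own.values()) / len(own)
    target_total = missing * average  # netspeed the missing servers would have added

    donor_total = sum(donors.values())
    if donor_total <= 0:
        return {}

    factor = target_total / donor_total
    return {name: speed * factor for name, speed in donors.items()}
```

For example, a zone with two servers (netspeeds 1000 and 3000) and a minimum of five is missing three servers at an average of 2000 each, so the donors together contribute 6000, split in proportion to their own netspeeds.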
Another approach could be to calculate the “load” on a zone as the number of DNS requests the zone gets in proportion to the total netspeed it has. Servers from zones with below-average load are then added to nearby (= same continent?) zones with above-average load, again with a scaled-down netspeed, so that they support the above-average-loaded zones without needlessly dominating them. This could again be mapped on a continental level, and could even weight countries with a lower relative load to take on more of the support load.
Pros: Dynamic, scales with zone usage and server counts.
Cons: Need to implement DNS metrics in the Pool management algorithms, zone generation now depends on an “external” factor, might degrade service quality for “below average” zones that were already well served.
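Roughly, the load calculation could look like the sketch below. Two assumptions of mine go beyond the description above: "average" is taken as the unweighted mean of the per-zone loads, and each donor zone's share of the support work is proportional to how far it sits below that average. All names are hypothetical.

```python
def zone_loads(dns_requests, netspeed):
    """Load per zone = DNS requests divided by total netspeed of the zone."""
    return {
        zone: dns_requests[zone] / netspeed[zone]
        for zone in dns_requests
        if netspeed.get(zone, 0) > 0  # zones without servers have no defined load
    }


def split_by_load(loads):
    """Return (average, donor shares, receiver zones).

    Donor shares sum to 1; zones further below average get a larger share.
    """
    if not loads:
        return 0.0, {}, []
    average = sum(loads.values()) / len(loads)  # assumption: unweighted mean

    headroom = {z: average - l for z, l in loads.items() if l < average}
    receivers = sorted(z for z, l in loads.items() if l > average)

    total = sum(headroom.values())
    shares = {z: h / total for z, h in headroom.items()} if total > 0 else {}
    return average, shares, receivers
```

With requests of 100, 300, and 200 for de, lu, and fr and netspeeds of 100, 50, and 100, the loads are 1, 6, and 2. The average is 3, so lu is the only receiver, and de takes twice the support share of fr.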
For any approach, instead of just using all available servers of the continental zone, we could define a distance matrix that expresses the distance between countries as a simple number abstracting geographical distance and internet interconnections. If a zone needs support, the support is provided by servers from all countries, but scaled so that servers from nearby countries receive a higher netspeed.
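One possible way to apply such a matrix is sketched below. Dividing each server's netspeed by the distance is only my assumption for "nearby countries get a higher netspeed"; other weightings would work too. The names and the `budget` parameter (the total support netspeed decided by one of the approaches above) are hypothetical.

```python
def distance_weighted_support(target, donors_by_country, distance, budget):
    """Split `budget` netspeed across donor servers, favouring nearby countries.

    target:            country code of the zone that needs support
    donors_by_country: dict country -> dict server -> netspeed
    distance:          dict country -> dict country -> positive distance value
    budget:            total support netspeed to hand out in the target zone
    """
    raw = {}
    for country, servers in donors_by_country.items():
        if country == target:
            continue  # the zone's own servers are not support servers
        d = distance[target][country]  # must be > 0
        for name, speed in servers.items():
            raw[name] = speed / d  # assumption: inverse-distance weighting

    total = sum(raw.values())
    if total <= 0:
        return {}
    return {name: budget * weight / total for name, weight in raw.items()}
```

With a budget of 500, a donor at distance 1 and another of equal netspeed at distance 4 would end up with 400 and 100 respectively.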
What are your thoughts on this topic? Do you have further input or ideas on how to improve the server distribution?
