Who watches the watchers?

BACKGROUND

My GPS+PPS disciplined NTP server is being monitored by the pool.

I read that ntppool uses SNTP for monitoring, which does not take asymmetric latencies into account, but I was curious why some candidate monitors are calculating negative scores when all the other monitors are scoring at or near 20.

I don’t think compensating for path asymmetries is a feature of NTP itself (vs. SNTP), but rather a feature of specific implementations. Though indeed, keeping some history, as “full” NTP implementations do, would arguably help with that, making SNTP clients less suitable from that perspective.
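For illustration, here is a minimal sketch (not pool code; the timestamps and delays are made-up numbers) of why a single SNTP-style exchange cannot distinguish clock offset from path asymmetry: half of the asymmetry shows up directly as apparent offset.

```python
# Minimal sketch: how a single SNTP-style measurement is biased by path
# asymmetry. Timestamps follow the usual RFC 5905 naming: t1 = client
# transmit, t2 = server receive, t3 = server transmit, t4 = client receive.
# All values are in seconds; the numbers below are invented for the example.

def sntp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

true_offset = 0.0            # assume server and client are perfectly in sync
d_out, d_in = 0.080, 0.020   # asymmetric path: 80 ms out, 20 ms back

t1 = 100.000
t2 = t1 + true_offset + d_out   # server receive (server clock)
t3 = t2 + 0.001                 # server processing time
t4 = t3 - true_offset + d_in    # client receive (client clock)

offset, delay = sntp_offset_and_delay(t1, t2, t3, t4)
print(f"measured offset: {offset*1000:.1f} ms")   # ~ +30 ms = (d_out - d_in) / 2
print(f"round-trip delay: {delay*1000:.1f} ms")   # ~ 100 ms
```

With 80 ms out and 20 ms back, a perfectly synchronized server appears to be 30 ms off, even though the round-trip delay itself looks unremarkable.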

I guess that is just a “feature” of the wild, big Internet. Based on my own experience, there are always issues somewhere. In this case, the packets might simply take a route that is overloaded, so a sufficient number of them get dropped for the score to be affected. In some cases, there simply isn’t any connectivity between customers of certain providers because they can’t agree on connecting to each other. E.g., Cogent refusing to peer with Hurricane Electric, and HE not being willing to pay Cogent for peering, prevents any and all IPv6 communication between the two. IPv4 traffic between the two finds a path, but it is far from optimal, potentially impacting NTP traffic.

Would limiting monitors to the same country or zone reduce occurrence of false negative scores?

Normally, only monitors that score you well get assigned.
Those will become active for your server.
All others will be ignored.

Your score only drops when active monitors score you badly. Typically, this won’t happen while your server is online, except when you are working on something so that it can’t be reached, or your server is ticking wrongly.

Other than that, your score will be perfect, and monitors that can see your server are selected.

When you add a new server to the pool, it also takes some time before monitors are assigned to your system.

From what most of us see, this system works quite well.

Keep in mind, only the ACTIVE monitors determine your score; all others have NO impact, regardless of their score.

And yes, active monitors can be added when candidates score you better.
And yes, active monitors can be removed when actives score worse than candidates.

Normally only the best monitors are assigned to your server. It’s an algorithm that determines this automatically.

The task of a monitor is to see whether you are online and ticking correctly; the ms figure you see is a ping time and not important. A monitor with a higher ping can be very good for your server, while a nearby monitor may not even reach you.

Also, the monitors themselves are checked by the system and internally by the monitor daemon itself: before it starts checking other servers, the monitor checks the system time of its own machine, and when that is off, it will stop checking others.
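As a rough illustration of that gate (this is not the actual monitor daemon; the reference servers and the threshold below are made up for the example), such a self-check could look something like this:

```python
# Hedged sketch of a "check my own clock before checking others" gate.
# NOT the real ntppool monitor code; reference servers, threshold, and
# structure are assumptions for illustration only.
import statistics
import ntplib

REFERENCE_SERVERS = ["time.cloudflare.com", "ptbtime1.ptb.de", "time.nist.gov"]
MAX_OWN_OFFSET = 0.010  # seconds; example threshold, not the pool's value

def own_clock_ok() -> bool:
    """Return True only if the local clock agrees with the references."""
    client = ntplib.NTPClient()
    offsets = []
    for host in REFERENCE_SERVERS:
        try:
            offsets.append(client.request(host, version=4).offset)
        except (ntplib.NTPException, OSError):
            continue  # skip unreachable references
    return bool(offsets) and abs(statistics.median(offsets)) <= MAX_OWN_OFFSET

if __name__ == "__main__":
    if own_clock_ok():
        print("local clock looks sane; proceed to monitor pool servers")
    else:
        print("local clock suspect or references unreachable; pause monitoring")
```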

Not sure what the issue with “false” negative scores is, apart from an “optical” thing (doesn’t look good in the graphs). I wouldn’t even call them “false”, because the monitors see what they see, nothing “false” about that. And one has to consider that the score is not only about the server, but it is reflecting the relationship between the monitors, and the server. As an incomplete representation of what “normal” clients would see. Which is what the monitoring system is all about. Not about getting best grades for the server itself.

To some degree, the system already does what you suggest, by giving preference to monitors that are closer (RTT-wise) to the server. The only thing it does not do is ignore monitors that are farther away (it has been suggested, even before the new monitoring system was put in place, to give less prominence in the graphs to severe outliers).

And from a practical point of view, it wouldn’t work, because the majority of countries/zones probably don’t have any monitors in them. Even continent zones sometimes have fewer than the currently required number of monitors (7) in them.

So from the comfortable position of a server operator in Europe or North America (and some select other places), sure, one could be more “strict” (different aspects of “strictness” have been discussed before, e.g., tightening the performance requirements on servers). However, if the Pool still aspires to be a truly global project, then I don’t think further reinforcing an already existing imbalance/bias (with many causes) is the way to go.

This is not true; monitors that consistently give good results are selected.
It does not matter where they are.
A monitor that keeps testing well will be made active.

Sorry, but what you say doesn’t matter; the ping time has no impact.

Monitors check your time and whether you are online…if they keep reporting that as good, they are selected…nothing more to it.
Monitors that cannot verify your correct time are not assigned. Simple as that.

Yes, that is the first preference, which takes precedence. But the second is RTT, with mechanisms in place to avoid switching active and testing monitors too often, so there may be situations where testing monitors have better values in one or both categories than active ones.

Sorry to say, but it does. Just compare the typical RTT values of active, testing, and candidate monitors. There may be exceptions, but you should be able to see a pattern.

And yes, score has first priority, but since the proposal was to consider some metric roughly related to RTT (country/zone), that is what I focused on, omitting the aspect about the scores for brevity and clarity (which apparently did not work as intended…).

Here’s an example for a server in India. Do you see a pattern (with exceptions) regarding where the monitors are located (as far as the country code in their name can be trusted), and regarding RTT values (considering that monitors in each section are ordered first by score, and among monitors with the same score secondly by RTT, i.e., hkhkg1-3strqkc’s RTT is the lowest among Candidate monitors, and twtpe2-2kg9ezv’s RTT already exceeds 100 ms)?

Here is an example of a server in Japan which had some issues very recently, hence the varying scores from the monitors, providing some diversity. But again, do you see some sort of pattern here regarding monitor location (with one exception, LA and the West Coast of the USA typically being well connected towards Asia), and regarding RTT?

Ok, one example only, could be an outlier. But look at as many servers as you like, and you should be able to discern a pattern.

I don’t even think this would help in your case. Looking at the full set of monitor scores for the server you indicated, it seems your server is in the USA, and the offending monitors with negative scores are as well. Thus, limiting monitors to the same country or zone as the server would not get rid of those offending monitors for you.

Nice graphs by the way! :grinning_face:

I agree here, but before the change the monitor (and the beta monitors) were just unable to reach the system in time, and your system was taken out of the pool because of that.

Monitors that do this too much will be replaced with others that are acting better.

As such, the RTT only matters for monitor selection; it should not matter for scoring a system.

Also, it’s not the case that a single monitor can bring your score down; a badly scoring monitor will be taken out before anything drastic happens, like your server(s) being removed from the pool.

That was the entire point of the new system, to prevent that from happening.

I do not see that happening with the servers in the above samples.

I have the impression that two, or even more, related but separate things are getting confused here:

  • The criteria for including an NTP server in the pool’s DNS rotation, i.e., the score, which is determined based on reachability, accuracy (i.e., offset), and some other criteria (e.g., stratum, KISS codes). RTT indeed doesn’t play a direct role here, but nobody suggested it did. The only thing is that the longer the path, the higher the likelihood of packet loss (impacting the reachability aspect), and the larger path asymmetries and the offsets they cause can get. So there is at least an indirect potential impact of RTT on the score, but it is not a direct input into the score calculation.
  • The criteria for selecting Active monitors for an NTP server, where the score (reflecting the aspects mentioned in the previous bullet) is the primary criterion, and RTT secondary (see the sketch after this list). I.e., over time, monitors closer to the server tend to be favored as long as there are no other factors impacting their score. Even when adding a new server, when there isn’t sufficient measurement data yet to calculate a solid score to sort monitors into the different categories, I think the system deliberately picks monitors with lower RTT as the initial “Testing” monitors over monitors with higher RTT.
  • How a monitor “going bad” affects a server’s score. Yes, a monitor that becomes unavailable (doesn’t send any data anymore), or considers its own “clock” unreliable, will not generate new data points/scores for a server anymore, but existing scores will still be considered for that server. And as mentioned before, the score is not about the server only, but reflects the relationship between monitor and server. I.e., the monitor could still be considered “good” and provide data for a server that is also “good”, but due to some issue between monitor and server, out of the control of either, the monitor’s score for the server can still drop, potentially affecting the server’s overall score. But how, or rather how fast, that affects the overall server score is mostly governed by the intentional inertia of the system as far as switching monitors is concerned. I.e., it will take some time before a monitor which starts generating “bad” scores for a server is replaced by a monitor with a better score, so as to avoid too many switches potentially causing system instability.
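To make the second bullet a bit more concrete, here is a minimal sketch (not the pool’s actual implementation; the data structure, the monitor names, the number of active monitors, and the hysteresis bonus are assumptions for illustration) of a “score first, RTT second” selection with some inertia against switching:

```python
# Simplified sketch of "score first, RTT second" monitor selection with
# some hysteresis against churn. NOT the pool's real algorithm; the field
# names, the bonus, and n_active are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Monitor:
    name: str
    score: float   # this monitor's score for the server in question
    rtt_ms: float  # median round-trip time from monitor to server
    active: bool   # currently one of the server's active monitors

def select_active(monitors, n_active=5, keep_bonus=1.5):
    """Rank by score (with a small bonus for already-active monitors to
    avoid frequent switching), then by RTT, and keep the top n_active."""
    def key(m: Monitor):
        effective_score = m.score + (keep_bonus if m.active else 0.0)
        return (-effective_score, m.rtt_ms)
    return sorted(monitors, key=key)[:n_active]

monitors = [
    Monitor("usnyc1-aaaa", score=20.0, rtt_ms=35.0, active=True),
    Monitor("defra2-bbbb", score=20.0, rtt_ms=110.0, active=False),
    Monitor("uslax1-cccc", score=19.2, rtt_ms=60.0, active=False),
    Monitor("sgsin1-dddd", score=-15.0, rtt_ms=250.0, active=False),
]
print([m.name for m in select_active(monitors, n_active=2)])
# -> ['usnyc1-aaaa', 'defra2-bbbb']: the negative-scoring candidate never
#    gets selected, so it has no impact on the server's overall score.
```

The small bonus for already-active monitors is one simple way to model the inertia described above: a testing monitor has to be clearly better, not just marginally better, before it displaces an active one.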

I don’t think that was the “entire” point of the new system, not least because already the previous system (with a closed group of monitor hosts) was doing it that way in my understanding, so not really much change there. Not sure whether it was explicit then already, and just not highlighted as it is now with the RTT being shown in a server’s monitor table, but I well remember that over time, at least my servers would predominantly be scored by monitors “close” to them geographically.

I think the main (but probably not only) points were to counter the attrition in monitors, and to provide more diversity regarding monitor locations. I believe the previous system was already considering the RTT as one factor when selecting monitors for a server, as it has become clear that while high RTT does not necessarily impact the metrics that ultimately determine the score for a server, the likelihood that it does increases as the RTT increases, as mentioned above. That is besides other factors, not necessarily directly related to RTT, potentially impacting the score, e.g., non-negligible parts of the Internet in India apparently having asymmetric paths to the rest of the world, causing remote monitors to see higher apparent offsets than local monitors do.

No, but that also was not the point of the original poster in my understanding. Rather, that for reasons out of scope for this comment, some unspecified uneasiness about what was called “false negatives” led to the desire to not have them considered for a server (whatever that means, because, as you rightly point out as well, given the ample number of monitors that have good scores for the server, the “false negatives” do not play any role whatsoever in determining the overall score of the server, and regardless of that, in that particular example the proposed mitigation would not even have helped).
