More precise (sensible, sensitive) server monitoring score

I am wondering: does this retry logic still make sense?
With the new multi-location monitoring system in place, servers get very good scores in general. We may want more precise measurement, even knowing about a single packet loss and not smoothing the offset value. So I suggest changing the line:

cfg.Samples = 3

to be:

cfg.Samples = 1

in monitor/client/monitor/monitor.go (main branch of ntppool/monitor on GitHub)


servers get very good scores in general

Always a good reminder that NA/EU networking is pretty good, but outside of that, fluctuations will happen.

Removing retries might make sense once more global PoPs exist.


Rather, there are fluctuations as soon as packets hop onto an undersea cable, even between the EU and NA.

The reason being that different probes may go through wildly different, long-latency routes at different times.


I agree with @NTPman that more precise monitoring would be better. Under the assumption that the monitors represent the pool's clients, any packet loss should be taken into account. Even if the cause of packet drops is outside a server operator's influence, it still potentially impacts clients.

To account for sporadic packet drops and the resiliency of NTP clients against such drops, the point penalty for a network timeout could be decreased if the decision is made not to retry unanswered queries.

Which leads to the more general question: How harsh should the monitoring punish packet drops?

  • After how many consecutive unanswered queries should a server be considered offline and dropped from the pool? (currently: 6, in 2 bursts of 3 packets each)
  • How many packet drops should be allowed on average until a server is not considered reliable enough for the pool? (currently: up to 2/3 of packets can be lost without consequence…)

Let me revive this old topic (the issue is painful to me).

My server has a score of 20 from 95 monitors, and a score below 20 from only 17 monitors.

A score of 20 from a particular monitor means there were no timeouts across many, many monitoring runs.

I do not think that Internet quality has improved so much since the introduction of the new monitoring system that NTP packet loss has become such a rare event.

I think the current version of the monitoring system hides valuable data.

In one particular run from one monitor to one server, multiple probes are sent (3 at the moment), and if any probe succeeds, the full set of samples is considered a success.

What data is hidden, or lost? The distinction between two monitors: one where all three probes always succeed, and another where, for a given NTP server, only some of the three probes succeed. In the long run both monitors give a score of 20, but that is unfair.

Up to this point this may sound theoretical, so let's take an example from the real world.

I selected an NTP server that is reachable, but not perfectly from my test monitor (frlys1-355n9ds) in the beta system: 111.198.57.33.

tumbleweed:~ # tcpdump -nn -r ntp1.pcap | grep -E '^(06:5|07:0).*111.198.57.33'
reading from file ntp1.pcap, link-type EN10MB (Ethernet), snapshot length 262144
06:51:20.083038 IP 192.168.1.2.53697 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:20.223770 IP 111.198.57.33.123 > 192.168.1.2.53697: NTPv4, Server, length 48
06:51:22.224914 IP 192.168.1.2.37661 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:22.483003 IP 111.198.57.33.123 > 192.168.1.2.37661: NTPv4, Server, length 48
06:51:24.484301 IP 192.168.1.2.60743 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:24.730062 IP 111.198.57.33.123 > 192.168.1.2.60743: NTPv4, Server, length 48
06:55:34.676930 IP 192.168.1.2.38994 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:34.886060 IP 111.198.57.33.123 > 192.168.1.2.38994: NTPv4, Server, length 48
06:55:36.887033 IP 192.168.1.2.34582 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:37.044173 IP 111.198.57.33.123 > 192.168.1.2.34582: NTPv4, Server, length 48
06:55:39.045276 IP 192.168.1.2.43220 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:39.240575 IP 111.198.57.33.123 > 192.168.1.2.43220: NTPv4, Server, length 48
06:59:42.921494 IP 192.168.1.2.51422 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:43.068721 IP 111.198.57.33.123 > 192.168.1.2.51422: NTPv4, Server, length 48
06:59:45.069648 IP 192.168.1.2.45790 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.072415 IP 192.168.1.2.56603 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.219855 IP 111.198.57.33.123 > 192.168.1.2.56603: NTPv4, Server, length 48
07:04:14.799439 IP 192.168.1.2.51640 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:14.950563 IP 111.198.57.33.123 > 192.168.1.2.51640: NTPv4, Server, length 48
07:04:16.951960 IP 192.168.1.2.54686 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:17.165959 IP 111.198.57.33.123 > 192.168.1.2.54686: NTPv4, Server, length 48
07:04:19.167400 IP 192.168.1.2.38934 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:19.383559 IP 111.198.57.33.123 > 192.168.1.2.38934: NTPv4, Server, length 48
07:08:21.774295 IP 192.168.1.2.55438 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:21.931311 IP 111.198.57.33.123 > 192.168.1.2.55438: NTPv4, Server, length 48
07:08:23.931737 IP 192.168.1.2.40330 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:24.082025 IP 111.198.57.33.123 > 192.168.1.2.40330: NTPv4, Server, length 48
07:08:26.082575 IP 192.168.1.2.33998 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:26.294138 IP 111.198.57.33.123 > 192.168.1.2.33998: NTPv4, Server, length 48
tumbleweed:~ # 

and

tumbleweed:~ # curl -s 'https://web.beta.grundclock.com/scores/111.198.57.33/log?limit=200&monitor=frlys1-355n9ds' | grep -E ' (06:5|07:0)'
1765696106,2025-12-14 07:08:26,0.006772629,1,19.999845505,128,frlys1-355n9ds,150.288,,
1765695859,2025-12-14 07:04:19,-0.002198036,1,19.999837875,128,frlys1-355n9ds,151.12,,
1765695590,2025-12-14 06:59:50,-0.000376309,1,19.999828339,128,frlys1-355n9ds,147.315,,
1765695339,2025-12-14 06:55:39,0.003695062,1,19.999820709,128,frlys1-355n9ds,157.169,,
1765695085,2025-12-14 06:51:25,0.002075839,1,19.999811172,128,frlys1-355n9ds,140.879,,
tumbleweed:~ # 

The sample at 06:59:50 is considered good. However, in the packet capture you can see that the reply to the second, middle probe was lost. (The default packet spacing is two seconds, plus three seconds of waiting for the reply packet; that accounts for the 5-second spacing to the next probe: 06:59:50.072415 − 06:59:45.069648 ≈ 5.003 seconds, i.e. 5 seconds plus a few milliseconds of processing time.)

The score of the NTP server 111.198.57.33 is 20 from the monitor frlys1-355n9ds in the beta system, when it shouldn’t be.

I suggest the following change to the monitoring code: make the number of probes a run-time configurable parameter. The code should run properly when this parameter equals three (as today), and run properly as well when it equals one.

Then, as a next step, for the production deployment the monitors use parameter value 3 (not affecting production), while the beta monitors use parameter value 1 (to gain experience in the beta/test environment).

I think the parameter can already be configured at run time, i.e., it does not require changes to the client binaries.

But you’re still ignoring my concern from the other topic: by changing the number of queries to one, we would not be able to detect whether a server applies overly aggressive rate limiting. For example, chrony on Rocky Linux 9 with the iburst option seems to send five queries at a two-second interval at startup. Besides, our great leader Ask said: “One reason we do multiple queries is to detect servers with overly aggressive rate limits.”

I’d still rather not touch the current scoring algorithm or the number of queries sent in a batch. But if more data is desired, the monitors could report to the pool monitor management server the number of queries sent, the number of good responses received and the number of error/timeout responses. This would be only for collecting more data, without touching the scoring at all yet.

No we don’t.
As this will mean the monitors that are not in charge of your scoring have to work harder.
Why?

The monitors that score your server give enough data.

I do not want to see my monitor being overloaded just because of this.

An active monitor needs to examine you being a good ticker or not.
You can not make it more precise this way. As ping-time is corrected.

Making it ‘ping’ more won’t help. I fail to see your point.

The only way to make it more precise is by making the offset figure more strict.
But will this make it more accurate, or just drop more servers?

Make scoring more sensible, yes that may help, make the rms-freq-error more strict for scoring.

Will this help? Maybe.

Question is, how much are servers off-time at all?

Less work: one third the number of packets sent, relative to today's situation.

How? Currently the non-active monitors send about 1/3 of the packets that the active monitors do, when you check the graphs.

However, how will this improve accuracy?

I’m trying to understand your point of view.

Polling more won’t change accuracy. I do not see how.

Three queries are sent, and if only one reply is received, the sample is considered perfect (if there is no KoD packet). I fail to see how aggressive rate limiting translates into a decrease in score.

That does not change.

We would see the difference between very good monitor–server connectivity and medium-quality monitor–server connectivity (such as sporadically occurring packet loss). Today the second case is reported as a very good (no packet loss) situation.

Monitors are checked…

Why do you think the monitors are wrong/off?

Packet-loss is an internet problem…not a monitor problem.

I fail to see your issue.

I have to admit that I forgot the system was changed to behave that way some months ago. Previously, all the sent requests were expected to be replied to.

But still, how did you think you would detect rate-limiting issues with only one query? I'd think it would require several queries sent at two-second intervals or so.

What do you think of my proposal to keep the current number of queries in a batch, but report them independently to the pool management server? I'd think it'd be the best of both worlds, i.e., it would detect rate-limiting issues AND it would detect single requests getting dropped.

Hmm, strictly speaking, it means that the server meets the quality criteria currently encoded in the monitors and scoring system to the highest degree, but not necessarily that it is perfect. The NTP protocol is somewhat immune to some disturbance, and as the monitoring system's role is to assess each server's suitability to serve time to clients, that is what the system reflects. A sanity check, if you will: e.g., that there isn't so much packet loss that it would impair getting time, that the time isn't too far off, and that there is no other anomaly. And it is certainly not a beauty contest or a competition as to who gets the best scores, though I concede that that probably plays a role in motivating server operators to add servers to the pool and to maintain them in good condition.

I don’t see how that data would enhance the purpose of the monitoring system, which is to assess the suitability of a server to serve time and be included in the pool.

I understand the point, and think it is important to address this perceived unfairness. But I don't think making each individual probe count and have impact is the way to go. From a Global North perspective, that may work very well. But while the pool still aspires to be a truly global project, conditions in the Global South, where infrastructure is in fact more expensive and less reliable, need to be considered too. I fear that change would kick many a server out of the pool, aggravating an already bad situation in arguably the largest part of the world (by population). Just two or three lost packets would kick a server out of the pool.

Usually, I am all for just trying things out, which gives better answers than theoretical debate. But to be fair, in this context it would need a level playing field, with good representation from each place that the pool aspires to serve. And I see that even less in the beta system than in the production system.

Not related to a server, but as an analogy to highlight the problem: some of my monitors, particularly in Asia, more or less frequently take themselves out of monitoring and thus don't contribute to the scoring of local and regional servers, despite their time being fine compared to good local reference clocks. Why? Because they are being assessed against reference servers far away, sometimes halfway around the world, with one saying they are +20 ms off and another −20 ms, and only one can be true. So this already introduces a serious bias into the whole system.

Though I recognize that differentiating a bit more would be considered fairer, so how about scaling the steps that the scores can take? E.g., subtract not the full 5 points when one out of three packets gets lost, but only about half a point. Or add less than a full point.

PS: I am currently seeing at least one of my monitors sending four probe packets, not three.

Preload the server with an arbitrary number of packets, spaced 2 seconds apart, and take only the last packet into consideration for scoring.

In fact, a single packet loss is already being reflected: the score decreased from 19.982591629 to 19.959260941. But that decrease is hardly visible.