More precise (sensible, sensitive) server monitoring score

Let me revive this old topic (the issue is painful to me).

My server has a score of 20 from 95 monitors, and a score below 20 from only 17 monitors.

A score of 20 from a particular monitor means there were no timeouts over many, many monitoring runs.
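To make the "score 20" remark concrete: the score column in the beta log excerpt further down is roughly consistent with a simple exponential recurrence, score ← 0.95 · score + step, with step = 1 for a good sample. That recurrence is my own assumption inferred from those numbers, not the pool's actual scoring code (the real scorer may weight samples differently, and consecutive logged values do not match 0.95 to every digit). Under this assumption, 20 = 1 / (1 - 0.95) is the fixed point, so a monitor only reports (almost exactly) 20 after a long, uninterrupted series of good samples.

```python
from typing import Iterable

DECAY = 0.95  # assumed decay factor, inferred from the logged scores


def update_score(score: float, step: float, decay: float = DECAY) -> float:
    """One monitoring sample: decay the old score, then add this sample's step."""
    return decay * score + step


def score_after(steps: Iterable[float], start: float = 0.0) -> float:
    """Apply a sequence of per-sample steps, starting from `start`."""
    score = start
    for step in steps:
        score = update_score(score, step)
    return score


if __name__ == "__main__":
    # 200 good samples in a row: the score approaches the fixed point
    # 1 / (1 - 0.95) = 20 from below.
    print(score_after([1.0] * 200))
    # One good sample applied to the logged 19.999811172 lands close to
    # the next logged value, 19.999820709.
    print(update_score(19.999811172, 1.0))
```

The step value for a failed sample is deliberately left out, since it cannot be read from the log excerpt.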

I do not believe that Internet quality has improved so much since the new monitoring system was introduced that NTP packet loss has become such a rare event.

I think the current version of the monitoring system hides valuable data.

In a single run from one monitor to one server, multiple probes are sent (currently three), and if any one of them succeeds, the whole sample is counted as a success.

What data is hidden, or lost? The difference between two monitors for a given NTP server: one where all three probes always succeed, and another where only some of the three probes succeed. In the long run both monitors give a score of 20, which is unfair.
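A minimal sketch of the information that is lost: under the any-probe-succeeds rule described above, both monitors produce an identical good sample, while a per-run success fraction (a hypothetical alternative metric, not something the monitor reports today) tells them apart.

```python
from typing import Sequence


def run_ok_any(probes: Sequence[bool]) -> bool:
    """Current rule as described above: the run is good if any probe got a reply."""
    return any(probes)


def run_success_fraction(probes: Sequence[bool]) -> float:
    """Hypothetical alternative: the fraction of probes that got a reply."""
    if not probes:
        raise ValueError("a run needs at least one probe")
    return sum(probes) / len(probes)


if __name__ == "__main__":
    clean = [True, True, True]   # monitor A: every reply arrives
    lossy = [True, False, True]  # monitor B: the middle reply is lost
    for name, run in (("A", clean), ("B", lossy)):
        print(name, run_ok_any(run), round(run_success_fraction(run), 3))
```

Both runs are "good" for the current rule; only the fraction shows that monitor B sees one reply in three go missing.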

So far this may sound theoretical, so let’s take an example from the real world.

I selected an NTP server, 111.198.57.33, that is reachable from my test monitor in the beta system (frlys1-355n9ds), but not perfectly:

tumbleweed:~ # tcpdump -nn -r ntp1.pcap | grep -E '^(06:5|07:0).*111.198.57.33'
reading from file ntp1.pcap, link-type EN10MB (Ethernet), snapshot length 262144
06:51:20.083038 IP 192.168.1.2.53697 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:20.223770 IP 111.198.57.33.123 > 192.168.1.2.53697: NTPv4, Server, length 48
06:51:22.224914 IP 192.168.1.2.37661 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:22.483003 IP 111.198.57.33.123 > 192.168.1.2.37661: NTPv4, Server, length 48
06:51:24.484301 IP 192.168.1.2.60743 > 111.198.57.33.123: NTPv4, Client, length 48
06:51:24.730062 IP 111.198.57.33.123 > 192.168.1.2.60743: NTPv4, Server, length 48
06:55:34.676930 IP 192.168.1.2.38994 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:34.886060 IP 111.198.57.33.123 > 192.168.1.2.38994: NTPv4, Server, length 48
06:55:36.887033 IP 192.168.1.2.34582 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:37.044173 IP 111.198.57.33.123 > 192.168.1.2.34582: NTPv4, Server, length 48
06:55:39.045276 IP 192.168.1.2.43220 > 111.198.57.33.123: NTPv4, Client, length 48
06:55:39.240575 IP 111.198.57.33.123 > 192.168.1.2.43220: NTPv4, Server, length 48
06:59:42.921494 IP 192.168.1.2.51422 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:43.068721 IP 111.198.57.33.123 > 192.168.1.2.51422: NTPv4, Server, length 48
06:59:45.069648 IP 192.168.1.2.45790 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.072415 IP 192.168.1.2.56603 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.219855 IP 111.198.57.33.123 > 192.168.1.2.56603: NTPv4, Server, length 48
07:04:14.799439 IP 192.168.1.2.51640 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:14.950563 IP 111.198.57.33.123 > 192.168.1.2.51640: NTPv4, Server, length 48
07:04:16.951960 IP 192.168.1.2.54686 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:17.165959 IP 111.198.57.33.123 > 192.168.1.2.54686: NTPv4, Server, length 48
07:04:19.167400 IP 192.168.1.2.38934 > 111.198.57.33.123: NTPv4, Client, length 48
07:04:19.383559 IP 111.198.57.33.123 > 192.168.1.2.38934: NTPv4, Server, length 48
07:08:21.774295 IP 192.168.1.2.55438 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:21.931311 IP 111.198.57.33.123 > 192.168.1.2.55438: NTPv4, Server, length 48
07:08:23.931737 IP 192.168.1.2.40330 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:24.082025 IP 111.198.57.33.123 > 192.168.1.2.40330: NTPv4, Server, length 48
07:08:26.082575 IP 192.168.1.2.33998 > 111.198.57.33.123: NTPv4, Client, length 48
07:08:26.294138 IP 111.198.57.33.123 > 192.168.1.2.33998: NTPv4, Server, length 48
tumbleweed:~ # 

and

tumbleweed:~ # curl -s 'https://web.beta.grundclock.com/scores/111.198.57.33/log?limit=200&monitor=frlys1-355n9ds' | grep -E ' (06:5|07:0)'
1765696106,2025-12-14 07:08:26,0.006772629,1,19.999845505,128,frlys1-355n9ds,150.288,,
1765695859,2025-12-14 07:04:19,-0.002198036,1,19.999837875,128,frlys1-355n9ds,151.12,,
1765695590,2025-12-14 06:59:50,-0.000376309,1,19.999828339,128,frlys1-355n9ds,147.315,,
1765695339,2025-12-14 06:55:39,0.003695062,1,19.999820709,128,frlys1-355n9ds,157.169,,
1765695085,2025-12-14 06:51:25,0.002075839,1,19.999811172,128,frlys1-355n9ds,140.879,,
tumbleweed:~ # 

The sample at 06:59:50 is counted as good. However, the packet capture shows that the reply to the second (middle) probe was lost. (The default spacing between probes is two seconds; when a reply is lost, the monitor first waits three seconds for it, which explains the 5-second gap before the next probe: 06:59:50.072415 - 06:59:45.069648 = 5.002767 seconds, i.e. 5 seconds plus about 0.003 s of processing time.)
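The same check can be done mechanically on the tcpdump text output. This is a hypothetical helper, not part of any monitoring tool: it assumes the exact one-line format shown in the capture above and a capture short enough that client source ports are not reused. It lists client requests to a server that never got a reply, together with the gap to the next request.

```python
import re
from datetime import datetime

# Matches lines like:
# 06:59:45.069648 IP 192.168.1.2.45790 > 111.198.57.33.123: NTPv4, Client, length 48
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>[\d.]+)\.(?P<sport>\d+) > (?P<dst>[\d.]+)\.(?P<dport>\d+): "
    r"NTPv4, (?P<mode>Client|Server)"
)


def find_lost_replies(lines, server):
    """Return (request_time, seconds_to_next_request) for each unanswered request."""
    requests = []     # (timestamp, client source port) in capture order
    answered = set()  # client ports that received a reply from `server`
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip tcpdump banners, shell prompts, other traffic
        ts = datetime.strptime(m["time"], "%H:%M:%S.%f")
        if m["mode"] == "Client" and m["dst"] == server:
            requests.append((ts, m["sport"]))
        elif m["mode"] == "Server" and m["src"] == server:
            answered.add(m["dport"])

    lost = []
    for i, (ts, port) in enumerate(requests):
        if port in answered:
            continue
        gap = None  # stays None if the unanswered request is the last one
        if i + 1 < len(requests):
            gap = (requests[i + 1][0] - ts).total_seconds()
        lost.append((ts.strftime("%H:%M:%S.%f"), gap))
    return lost


if __name__ == "__main__":
    capture = """\
06:59:42.921494 IP 192.168.1.2.51422 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:43.068721 IP 111.198.57.33.123 > 192.168.1.2.51422: NTPv4, Server, length 48
06:59:45.069648 IP 192.168.1.2.45790 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.072415 IP 192.168.1.2.56603 > 111.198.57.33.123: NTPv4, Client, length 48
06:59:50.219855 IP 111.198.57.33.123 > 192.168.1.2.56603: NTPv4, Server, length 48
""".splitlines()
    for when, gap in find_lost_replies(capture, "111.198.57.33"):
        gap_text = f"{gap:.6f} s" if gap is not None else "n/a"
        print(f"no reply to request at {when}; next request after {gap_text}")
```

Feeding it the 06:59 run flags only the request sent at 06:59:45.069648, with the roughly 5-second gap described above.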

The score of NTP server 111.198.57.33 from monitor frlys1-355n9ds in the beta system is (practically) 20, although it shouldn’t be.

I suggest the following change to the monitoring code: make the number of probes a run-time configurable parameter. The code should work correctly both when this parameter is three (as today) and when it is one.
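A minimal sketch of what a configurable probe count could look like. The environment variable `MONITOR_PROBES`, the function names, and the injected `query` callable are all hypothetical; the real monitor is structured differently. The sketch only shows that one loop can serve both the 3-probe and the 1-probe case, and that it can report per-probe counts alongside today's any-reply verdict.

```python
import os
import time
from typing import Callable, List, Mapping, Optional

PROBES_ENV = "MONITOR_PROBES"  # hypothetical setting name
DEFAULT_PROBES = 3             # today's behaviour


def probe_count(env: Mapping[str, str] = os.environ) -> int:
    """Read the number of probes per run, defaulting to 3."""
    n = int(env.get(PROBES_ENV, DEFAULT_PROBES))
    if n < 1:
        raise ValueError(f"{PROBES_ENV} must be at least 1, got {n}")
    return n


def run_once(
    query: Callable[[], Optional[float]],
    probes: int,
    spacing: float = 2.0,
) -> List[Optional[float]]:
    """Send `probes` queries, waiting `spacing` seconds between them.

    `query` returns the measured offset in seconds, or None on timeout.
    """
    results: List[Optional[float]] = []
    for i in range(probes):
        if i:
            time.sleep(spacing)
        results.append(query())
    return results


def summarize(results: List[Optional[float]]) -> dict:
    """Report today's any-reply verdict together with the per-probe counts."""
    replies = sum(r is not None for r in results)
    return {"ok": replies > 0, "replies": replies, "sent": len(results)}


if __name__ == "__main__":
    # Fake server with made-up offsets: the second query times out,
    # like the middle probe in the capture above.
    answers = iter([0.0037, None, -0.0004])
    results = run_once(lambda: next(answers), probes=3, spacing=0.0)
    print(summarize(results))
```

With `probes=1`, each run reports exactly one probe, so a lost reply becomes a failed sample instead of being masked by the other probes in the same run.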

Then, as a next step, the production monitors would use the value 3 (leaving production unaffected), while the beta monitors would use the value 1 (to gain experience in the beta/test environment).