More precise (sensible, sensitive) server monitoring score

I am wondering whether this retry logic still makes sense.
With the new multi-location monitoring system in place, servers get very good scores in general. We may want more precise measurement, one that registers even a single lost packet and does not smooth the offset value. So I suggest changing the line:

cfg.Samples = 3

to be:

cfg.Samples = 1

in monitor/client/monitor/monitor.go on the main branch of ntppool/monitor on GitHub.


"servers get very good scores in general"

It is always a good reminder that NA/EU networking is pretty good, but outside of those regions fluctuations will happen.

Removing retries may make sense once more global PoPs exist.


Rather, there are fluctuations as soon as packets hop on an undersea cable, even between the EU and NA.

The reason is that different probes may go through wildly different, long-latency routes at different times.


I agree with @NTPman that more precise monitoring would be better. Under the assumption that the monitors represent the pool clients, any packet loss should be taken into account. Even if the cause of the packet drops is outside of a server operator's influence, it still potentially impacts clients.

To account for sporadic packet drops and the resilience of NTP clients against such drops, the point penalty for a network timeout could be decreased if the decision is made not to retry unanswered queries.

Which leads to the more general question: How harshly should the monitoring punish packet drops?

  • After how many consecutive unanswered queries should a server be considered offline and dropped from the pool? (currently: 6, in 2 bursts of 3 packets each)
  • How many packet drops should be allowed on average until a server is not considered reliable enough for the pool? (currently: up to 2/3 of packets can be lost without consequence…)