More precise (sensible, sensitive) server monitoring score

Routers may give lower priority to responding to traceroute packets, i.e., responding to those may take longer. This will show up as longer RTTs. Another funny thing you may have noticed when doing traceroutes to hosts far away (i.e., >10 hops or so): it is possible that traceroute reports 14 ms for the first few hops, then maybe 10 ms for the next hops, and then 20 ms for the last hops. This should be “impossible”, but it really isn’t, because there’s no guarantee that traceroute requests are replied to immediately. Then there’s also the possibility that packets take one path when going towards the destination and another path when coming back to you. In short, take the results of traceroute RTTs with a large grain of salt.

As for the RTT in the NTP and NTP Pool contexts, it really is the round-trip time. The relation between ping times and NTP RTT measurements may be more visible when observing hosts further away. Let’s take one random example NTP server from Australia. Its RTT for fihel4 is (currently) shown as 309.2 ms, and when I ping the same server from fihel4 I get “rtt min/avg/max/mdev = 309.163/309.435/309.517/0.114 ms”. So I’d say the NTP Pool’s RTT matches the ping time pretty closely.


The monitor is on GitHub, as is the NTP client used by the monitor. You don’t have to guess what the value is: ntp/ntp.go at v1.5.0 · beevik/ntp · GitHub

The RTT is the receive time minus transmit time.

> Its RTT for fihel4 is (currently) shown as 309.2 ms, and when I ping the same server from fihel4 I get “rtt min/avg/max/mdev = 309.163/309.435/309.517/0.114 ms”. So I’d say the NTP Pool’s RTT matches the ping time pretty closely.

When doing multiple queries the monitoring client picks the response with the lowest RTT (assuming it’s the one least affected by network latency), so 309.163 ms and 309.2 ms match pretty closely indeed. :slight_smile:
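As a rough illustration of that selection logic (not the monitor’s actual code, just a sketch using the beevik/ntp client the monitor is built on; the server address is the Australian example from this thread):

```go
package main

import (
	"fmt"
	"time"

	"github.com/beevik/ntp"
)

func main() {
	// Query the server a few times and keep the response with the
	// lowest RTT, assuming it is the one least affected by queueing
	// delays along the path.
	var best *ntp.Response
	for i := 0; i < 3; i++ {
		resp, err := ntp.Query("110.232.114.22")
		if err != nil {
			continue
		}
		if best == nil || resp.RTT < best.RTT {
			best = resp
		}
		time.Sleep(time.Second)
	}
	if best != nil {
		fmt.Printf("lowest RTT %v, offset %v\n", best.RTT, best.ClockOffset)
	}
}
```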

Ok try again, I just tested:

```
bas@workstation:~$ ntpdate -q 110.232.114.22
2025-12-24 21:45:21.247389 (+0100) -68.074959 +/- 0.170928 110.232.114.22 s2 no-leap
bas@workstation:~$ ping 110.232.114.22
PING 110.232.114.22 (110.232.114.22) 56(84) bytes of data.
64 bytes from 110.232.114.22: icmp_seq=1 ttl=49 time=330 ms
64 bytes from 110.232.114.22: icmp_seq=2 ttl=49 time=329 ms
64 bytes from 110.232.114.22: icmp_seq=3 ttl=49 time=329 ms
```

Yes, the time from me to it is 330 ms. So what?

The active monitors have lower RTTs, and score it 20. My own time check on it is also good.

So what is the point? Its time is correct, regardless of the “RTT time”. Meaning this RTT DOES NOT MATTER.

I’ve been saying this all along: the “ping time” has no impact on timekeeping.

The problem is/was that in this topic the TS said we could get more precise time by adding more monitors in local ASes. The thing is… it doesn’t matter.

If I query a server FAR away, the time is still good, unless the routes change all the time or time out, in which case it can’t be calculated anymore.
But because we have many monitors now, we can be sure ALL time servers are scored correctly within limits. And those limits are (hopefully) <5 ms… ergo we produce time at 0.005 s accuracy.

Isn’t that the goal of the system? Or am I missing something? Sure, I’d like 1 ns… but that’s not realistic. Mind you, we serve the globe GOOD time… so 0.005 s is pretty good in my opinion.

My 2cts.

I would call that a timeout! Even at 100 ms. Maybe you set the response timeout too high.

Just my opinion.

I would dismiss any NTP server that doesn’t respond within 100 ms, like the San Jose monitor did before (if I’m not mistaken). That makes it easier to dismiss monitors from being considered or even becoming active.

100 ms would be a good figure for the timeout, in my opinion.

For NTP traffic, RTT may depend on the client UDP port. This manifests as multiple bands in RTT plots, due to multiple paths. A typical traceroute uses multiple UDP ports and may show similar path variations.

For ICMP traffic (e.g., Echo Request / Echo Reply), there are no ports.
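For the curious, here’s a minimal sketch of how one might probe for this per-port banding: it sends a bare SNTP request from a few different local UDP ports and times the replies. The port range and server are arbitrary examples, and a real measurement would need many samples per port:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	raddr, err := net.ResolveUDPAddr("udp", "110.232.114.22:123")
	if err != nil {
		panic(err)
	}

	for port := 40000; port < 40008; port++ {
		conn, err := net.DialUDP("udp", &net.UDPAddr{Port: port}, raddr)
		if err != nil {
			fmt.Printf("port %d: %v\n", port, err)
			continue
		}

		// Bare SNTP request: LI=0, VN=4, Mode=3 (client), rest zero.
		req := make([]byte, 48)
		req[0] = 0x23

		start := time.Now()
		if _, err := conn.Write(req); err != nil {
			fmt.Printf("port %d: %v\n", port, err)
			conn.Close()
			continue
		}
		conn.SetReadDeadline(time.Now().Add(2 * time.Second))
		buf := make([]byte, 48)
		_, err = conn.Read(buf)
		rtt := time.Since(start)
		conn.Close()

		if err != nil {
			fmt.Printf("port %d: no reply\n", port)
			continue
		}
		fmt.Printf("port %d: RTT %.1f ms\n", port, float64(rtt.Microseconds())/1000.0)
	}
}
```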

Network paths can be very asymmetric, though that is not common. Recently I found a network path where traffic from the NTP server (Australia) towards my NTP client (in a different city in Australia) was low latency, while traffic from my NTP client towards the NTP server travelled from Australia to Japan, to Los Angeles, and back to Australia. This is a good example of the maxim “time transfer uncertainty is at least 1/2 the RTT”.

Separately, the NTP round-trip time may differ from client Rx time minus client Tx time. NTP servers may internally buffer NTP requests. Normally the internal buffer time is sub-millisecond, but I took a random sample just now and see several servers with internal delays of multiple milliseconds. I’m omitting some messy details.


So what you’re basically saying is that you’d simply kick out pretty much all servers in Oceania, such as this one:

Because once a seventh active monitor has again been added from among the testing monitors, the above server, for example, would be kicked out according to your proposal of treating any RTT beyond 100 ms as a timeout.

And this one right away as well:

Or this one:

Or this one:

Or this one in South Africa:

Or this one in Argentina:

Or this one in Peru:

And so on and so forth.

I did not say that. I said that monitors measuring RTTs that high, above e.g. 100 ms, should be dismissed for that NTP server, so the monitor is taken off monitoring that NTP server and better ones are selected.
I used 100 ms as an example because San Jose did remove servers from the pool, but it was the only monitor deciding.
In the current system there are 99 monitors, so servers stay in the pool unless ALL monitors say the NTP server is bad. Ergo, all your listed servers would stay in the pool; just different monitors would be assigned, ones where the response time is less than e.g. 100 ms.

A score from a monitor on a wobbly path to a server isn’t a good score; you shouldn’t want to be scored based on that. I wouldn’t.

Well, while you dismiss it at the beginning of your sentence, you actually repeat it in the second half. Maybe actually take a look at the data I shared, and you would easily see that your proposal would result in pretty much all monitors being dismissed for the examples (and many more in the referenced parts of the world), as fewer than the needed quorum of seven monitors would be remaining, sometimes much fewer.

Well, I did that: I entered your sample server in Chrony and let it run for a bit. The offset is bigger than with my own servers, but that is to be expected.

Sourcestats:

```
bas@workstation:~$ chronyc sourcestats 
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
heppen.be                   9   6   395     +0.544      1.018    +34us    83us
heppen.be                   8   7   391     +0.666      3.218   -458us   179us
voip.sprintweb.be          10   7   395     +1.555     20.794  +2319us  1718us
ip-217-103-55-36.ip.prio>   8   5   330     +1.013      3.585   -492us   230us
mansfield.id.au             8   5   452     -3.838     77.779  +4673us  5252us
```

Sources:

```
bas@workstation:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* heppen.be                     1   6   377    12   -159us[ -199us] +/- 1795us
^- heppen.be                     2   6   377    13   -671us[ -711us] +/-   18ms
^- voip.sprintweb.be             2   6   377    12    +66us[  +27us] +/-   23ms
^- ip-217-103-55-36.ip.prio>     2   6   333    76   -947us[ -994us] +/-   25ms
^- mansfield.id.au               2   6   377    19    +16ms[  +16ms] +/-  170ms
```

The error bound seems to be 170 ms, but then, it’s coming from the other side of the globe.
Timing seems pretty good.

So where are the RTTs coming from? As I see my own monitor scoring it a perfect 20: belgg3-19sfa9p 20

I did a UDP port scan… and yes, it’s the same number:

```
bas@workstation:~$ sudo nmap -sU -p 123 110.232.114.22 
Starting Nmap 7.94SVN ( https://nmap.org ) at 2025-12-26 17:50 CET
Nmap scan report for mansfield.id.au (110.232.114.22)
Host is up (0.32s latency).

PORT    STATE SERVICE
123/udp open  ntp

Nmap done: 1 IP address (1 host up) scanned in 0.89 seconds
```

Latency 0.32 s. So I believe the value has no meaning; just the error should be given.
If the error margin is too high, I would expect the monitor to be dismissed from being active for that NTP server.

As I see little error for a server so far away, I also fail to see why the ping time matters.
It looks to me like the error in time is far more important, as it increases with every hop if the peering between the hops is unstable. Apart from that, I do not see why my monitor should be selected (it isn’t) yet still test that server twice a day or serve as a backup tester, as the error between us is far too high in my opinion, as you can see compared to my other servers.

Maybe it’s best to replace the ping time with the error margin? Just my opinion.

The main difference is that the RTT is easy to measure with only one probe – send one query and record the time it takes for the response to arrive. Chrony’s estimated error, in contrast, is not based on a single measurement but takes multiple recent measurements into account, including the stability of the time.

This means that replacing “ping time” with “error margin” is not quite as straightforward as one might think. There is a relation between RTT and estimated error. I believe changing the monitor ranking to be based on estimated error would not change the order of the selected monitors. Therefore I don’t see a need to put any effort into making such a change.

Edit: Some data regarding our guinea pig NTP server 110.232.114.22 in Australia. The list is primarily sorted by RTT. You’ll notice that the sorting would remain the same if sorted by estimated error. It is possible that with a large enough list of monitoring hosts there could be some differences (i.e., some monitoring hosts might swap places in the list), but I believe the differences would most probably be minor. The monitoring hosts in the top 10 would still likely be in the top 10, regardless of how they are sorted.

| Host | RTT (ms) | Estimated error (ms) | Ratio |
|----------|--------:|-----:|-----:|
| au | 2.68 | 4.68 | 0.57 |
| sg | 153 | 80 | 1.91 |
| ph | 181.12 | 94 | 1.93 |
| us | 198.07 | 102 | 1.94 |
| nl | 308.82 | 158 | 1.95 |
| fi-1 | 309.12 | 158 | 1.96 |
| pl | 327.42 | 167 | 1.96 |
| be (Bas) | 329 | 170 | 1.94 |
| fi-2 | 339.58 | 172 | 1.97 |
Edit2: Another table of results, this time for a server in Germany (185.248.189.10). As with the above table, the sorting remains unaffected whether we sort by RTT or estimated error.

| Host | RTT (ms) | Estimated error (ms) | Ratio |
|------|--------:|-----:|-----:|
| nl | 8.09 | 4.04 | 2.00 |
| pl | 24.77 | 12 | 2.06 |
| fi1 | 25.69 | 13 | 1.98 |
| fi2 | 41.60 | 19 | 2.19 |
| us | 95.58 | 48 | 1.99 |
| sg | 178.05 | 81 | 2.20 |
| ph | 201.01 | 106 | 1.90 |
| au | 290.99 | 151 | 1.93 |

Edit3: Based on these numbers you can estimate the estimated error from RTT with 0.429 * RTT^1.03 + 1.02. YMMV.
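In code, that back-of-the-envelope fit would look like this (the coefficients are just the rough ones quoted above, so the same YMMV applies):

```go
package main

import (
	"fmt"
	"math"
)

// estError approximates the estimated error (in ms) from the RTT (in ms),
// using the rough fit quoted above: 0.429 * RTT^1.03 + 1.02.
func estError(rttMs float64) float64 {
	return 0.429*math.Pow(rttMs, 1.03) + 1.02
}

func main() {
	// A few RTTs from the tables above, for comparison.
	for _, rtt := range []float64{8.09, 95.58, 309.12} {
		fmt.Printf("RTT %7.2f ms -> estimated error ~%5.1f ms\n", rtt, estError(rtt))
	}
}
```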

The “proposal” also illustrates a certain level of ignorance as to how the RTT is currently actually used by the system. Namely, it is not used for deriving the score itself, but only to select which samples go into score calculation, as well as in monitor selection.

The only thing is, request time/rtt/whatever you call it…does not matter.

What matters is the correct time + stable path.

Monitors should be selected on that only.
Regardless of whether they are 300 ms away, as long as they stay at 300 ms all the time.

The problem is wobbly paths, as you can’t correct for those.

This happened with the single San Jose monitor before: it had wobbly paths, making your server look good one day and bad the next.

Monitors should be selected on stability, not on RTT (request-time) or distance. Why is this so important for some? I do not get it.

If the path is stable at 300ms, the monitor will score you good. If the path is bad, the monitor won’t work for you. Period, end of story.

In short, and put this in your head: it does not matter one bit where the monitor is IF the path is stable. It can and will compensate for the path delay, and stable gives GOOD time.

RTT is bullshit when it’s not stable. Why do you not get this? And taking too long is a timeout.
This is not hard to understand.

The problem is unstable paths, regardless of the distance. One day 100 ms, the next 200 ms: you cannot calculate good time from that. That is the problem for poorly scoring servers.

I understand and almost kind of agree with you in principle, but bear this in mind: The probability of error increases with increasing RTT. In other words, when the RTT is high, it is probable that the estimated error is also high. Here’s a plot of the numbers in my above tables:

Keeping this in mind we can simplify the monitor selection by picking those monitors with a low RTT to the target server, because it is probable that their estimated error is also low. I believe the system essentially works this way now.

In addition to time stability there’s another point of view – time offset caused by asymmetric routing. It is very possible that the RTT to some server stays stable at around 100 ms, but there may be a constant 5 ms offset if the packets are routed so that it takes 45 ms for the request to reach the server and 55 ms for the response to come back. In this case the estimated error could be low, but the monitor would think that the server’s clock is significantly off. I don’t think that’s a good outcome either. The effects and probability of asymmetric routing increase when the monitor and NTP server are further away from each other, i.e., higher RTT and more network hops.
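To make the arithmetic in that example explicit (illustrative numbers, not measurements): NTP assumes the one-way delay is RTT/2, so any difference between the two directions shows up directly as an apparent clock offset.

```go
package main

import "fmt"

func main() {
	// Illustrative one-way delays from the example above, in ms.
	toServer, fromServer := 45.0, 55.0

	rtt := toServer + fromServer
	// NTP assumes a symmetric path (one-way delay = RTT/2), so the
	// asymmetry appears as a clock offset even if the server is perfect.
	apparentOffset := (toServer - fromServer) / 2

	fmt.Printf("RTT %.0f ms, apparent offset %.0f ms\n", rtt, apparentOffset)
	// Output: RTT 100 ms, apparent offset -5 ms
}
```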

Do you have some examples (actual measurements) where you think using the estimated error would be better than using RTT? Using the estimated error instead of RTT would not have changed anything in my above example graph/tables.


The point of the monitoring is not to get the best possible score. The point of the monitoring is to assess a server as best as can be done from the point of view of local clients, or at least an approximation of that. As such, a monitor half-way around the world is of little use, even if it is good, because it does not reflect, in general, what local clients would see. Sure, if a far-away monitor scores a server well, then it is likely that also local clients may get good service. But even that is not a given, depending on the local and regional networking.

E.g., a server may be sitting in a datacenter close to an IXP with good international connectivity, so score well with far-away monitors. But maybe a lot of clients are with providers that do not connect well to that IXP. E.g., Deutsche Telekom in Germany has been accused of not providing good connectivity with some other national providers because DTAG does not connect to open IXPs, but is asking money from other providers to connect to them. And if another provider doesn’t pay up, packets take a detour and/or cross underprovisioned and overloaded interconnects, raising the likelihood for packet drops, path asymmetries, and higher jitter.

So despite the good score from a far-away monitor, local clients might still not get good service. A local monitor has a better chance of reflecting that. And even more so in the inverse case, i.e., some far away monitors scoring badly, but local clients getting excellent service.

Thus, the system makes heavy use of the RTT when selecting monitors, but the score itself still has the higher impact. It is hard to tell what the reason for a low score is, so to be on the safe side and not kick out servers that actually perform well but where some monitors may have issues, the score gets precedence when selecting monitors. There is also the gaming factor for operators: a low score is rather demotivating, as is being raised in one way or another time and time again (see the original topic of this thread, for example). So having a good score overall, and many good scores at the top of the monitor list, is important motivation for server operators.

Then, in general, every NTP client makes heavy use of the RTT, in the form of the root delay. Sure, a high RTT does not automatically and always mean bad connectivity (aka packet loss), or asymmetry and jitter. But as has been pointed out time and time again, most recently by @avij in his last post above, the chances of something going wrong simply increase with the distance/RTT. Not sure why you don’t get that, and that this is one reason why the pool prefers to serve clients from local servers, i.e., currently from within the same country zone. The current country zones are a simplification for the system, similar to what @avij highlights: in many cases, peers in the same country are close to each other, though not always, and being in a different country also doesn’t necessarily mean a large distance. But it is a quick and especially simple approximation, and the plans to loosen the current strict zone concept will still try to honor that. Just as NTP clients typically prefer upstreams with lower root delay, i.e., lower RTTs to the upstream servers, because it simply reduces the likelihood of issues, and puts a tighter bound on the maximal path asymmetry that can occur. Not sure why you still don’t get that.

And lastly, NTP clients also typically prefer samples with lower RTT over samples from about the same time but with higher RTT, because the assumption is those will typically have lower path asymmetries. Again, not always, but as a rule of thumb, and for simplification, because there is no good way to measure actual path asymmetry in a strict and anonymous client-server relationship. That is what @ask has been explaining above with respect to the system picking the sample with the lowest RTT - just as a typical NTP client would.

I don’t agree with you, sorry.
You do not know what the peering from and to your ISP is like.
I know that in Belgium we often have better peering with Germany/France/Holland than we have inside Belgium.

Linux Mint/Ubuntu has a nice tool with which you can measure the fastest APT source to get updates from.

But to show you how bad it is, and why I run my servers mostly outside Belgium:

So by looking locally only, you may not get the best server.
Mostly my ping times are better outside Belgium than inside.

But those servers do not serve Belgium unless I ask for them to be added.

The only purpose of a monitor is to check whether an NTP server is on time and online; it cannot know where the clients are.

I do agree that clients prefer faster responses and steady delay. But that can’t be the job of a monitor, as it simply does not know where the clients are. The monitors like mine are at the same location as my servers (the same machine, in fact), so they are excluded from scoring servers on the same network, as they should be.

I expect the monitors to score whether my systems are online and ticking properly. The rest is speculation about what is best for the clients. When testing various servers as backups on my own servers, German servers often score better.

To nitpick belatedly, it’s actually:

(server receive time - client transmit time)
+
(client receive time - server transmit time)

In other words, the server’s processing time between receipt of request and transmission of response is excluded. Also, the differing clocks of the server and client cancel out – either difference can be negative, and neither difference represents a one-way time.
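A small sketch of that arithmetic with the four standard NTP timestamps (the values are made up for illustration):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	base := time.Now()
	t1 := base                            // client transmit
	t2 := base.Add(46 * time.Millisecond) // server receive
	t3 := base.Add(47 * time.Millisecond) // server transmit
	t4 := base.Add(92 * time.Millisecond) // client receive

	// delay = (t2 - t1) + (t4 - t3): the server's processing time
	// between t2 and t3 drops out, and because each difference mixes
	// the two clocks, a constant clock offset cancels out as well.
	delay := t2.Sub(t1) + t4.Sub(t3)
	fmt.Println("round-trip delay:", delay) // 91ms
}
```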

You can’t rule that out, as only with hardware timestamping do you know when the packet was received and sent by the NIC. As most servers don’t have hardware timestamping, it’s the kernel doing the stamping.
That time is later than when the NIC would do it, so processing can’t be fully excluded.

It’s no more than a rough guess at best. It really doesn’t have much meaning other than showing whether a route is poor, bad, or good.

A stable path at 100 ms (e.g., via satellite) is far better than a wobbly path at 10 ms one second and 1000 ms the next. I presume that will also show in the time accuracy.

Maybe min/max RTT values should be given, so you can see how stable the path is, with an average value as the middle value. Because if min and max are wide apart, it is an unstable path (unless you caused it yourself: reboots, testing, etc.).
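A sketch of what that could look like, again using the beevik/ntp client (the sample count, interval, and server are arbitrary choices for illustration):

```go
package main

import (
	"fmt"
	"time"

	"github.com/beevik/ntp"
)

func main() {
	const samples = 5
	var min, max, sum time.Duration
	n := 0

	for i := 0; i < samples; i++ {
		resp, err := ntp.Query("110.232.114.22")
		if err != nil {
			continue // a lost reply also says something about stability
		}
		if n == 0 || resp.RTT < min {
			min = resp.RTT
		}
		if resp.RTT > max {
			max = resp.RTT
		}
		sum += resp.RTT
		n++
		time.Sleep(time.Second)
	}

	if n > 0 {
		// A wide min/max spread suggests an unstable path.
		fmt.Printf("rtt min/avg/max = %v/%v/%v (%d of %d replies)\n",
			min, sum/time.Duration(n), max, n, samples)
	}
}
```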