Currently, the pool monitoring and scoring system's handling of time offset for an otherwise functional server boils down to a magic number: 125 milliseconds. A server persistently below this threshold will be included in the pool; a server persistently above it will drop out.
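To make the "persistently below/above" part concrete, here is a simplified sketch of a step-based score with exponential decay, roughly in the spirit of the pool's scoring. The constants (step sizes, decay factor, cap) are my assumptions for illustration, not necessarily the production values:

```python
# Illustrative sketch of a decaying health score with per-check steps.
# Constants here are assumptions, not the pool's actual values.

MAX_OFFSET = 0.125  # seconds; the "magic number" under discussion

def update_score(score: float, offset: float, reachable: bool = True) -> float:
    """Blend the previous score toward a per-check step value."""
    if reachable and abs(offset) < MAX_OFFSET:
        step = 1.0    # good check
    else:
        step = -4.0   # offset too large (or unreachable): heavy penalty
    return min(20.0, score * 0.95 + step)

# A server persistently above the threshold sinks quickly and drops out;
# persistently below it, the score climbs back toward the cap.
score = 15.0
for _ in range(30):
    score = update_score(score, offset=0.150)  # 150 ms: always a "bad" check
print(score < 0)  # True
```

The point of the decay is that a single bad measurement does not eject a server; only a sustained pattern does.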
In a recent pull request for the ntp pool code, @ask mentioned that it might be an opportunity to tighten this requirement.
So: What are your opinions on what the maximum allowed constant offset should be?
Some food for thought:
As mentioned above, the current value is 125ms. I was not able to trace back where this number came from - if anyone knows, that would be interesting.
I did some basic data science on an hour of monitoring data:
more than 90% of the monitored offsets were below 10ms
more than 95% of the monitored offsets were below 20ms
more than 98% of the monitored offsets were below 50ms
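The percentages above come from a computation along these lines (the sample offsets below are made up for illustration; in practice the input would be the full dump):

```python
# Fraction of measured offsets below a few thresholds. The sample data
# here is invented; real input would be the hour of monitoring data.
def fraction_below(offsets, threshold_s):
    """Share of measurements with |offset| under threshold_s seconds."""
    return sum(1 for o in offsets if abs(o) < threshold_s) / len(offsets)

sample = [0.002, -0.004, 0.011, 0.0005, -0.018, 0.047, 0.130, 0.003]

for ms in (10, 20, 50, 125):
    print(f"below {ms:>3} ms: {fraction_below(sample, ms / 1000):.0%}")
```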
I also read through this post that had pretty much the same question in it:
What level of accuracy should users expect from sources like time.windows.com, time.apple.com, or even our beloved NTP pool?
but I would summarize the answers there with “it depends”.
My personal take: if my servers showed offsets beyond 20ms, I'd investigate. Allowing some leeway for long routes and networking strangeness, I'd say a server in the pool should be able to keep its offset below 50ms.
The current generous allowance was from when there was just one monitor. I think we can have more confidence in the offsets now with more monitors.
@Sebhoster The system doesn’t directly track “active” monitors in the monitoring logs, but if you are willing to do more analytics I’m happy to help do what I can to make sure you have enough data.
@Kets_One longer round trip times don't necessarily mean the time offsets are higher (as you probably know!)
To clarify: if the ntp servers that I operate in the pool showed offsets beyond 20ms in the pool monitoring, I would investigate, since I know that this is outside the range they usually achieve.
The dataset was about 180,000 measurements. This fits expectations: the status page logs somewhere around 3,000 checks per minute; times 60 minutes, that comes to about 180,000.
I dumped one hour of monitoring offset measurements from the pool monitoring database, filtered out “null”, filtered out any measurements with an error message, and filtered out the monitor_id that is reserved for the overall score since that monitor does not do measurements itself. I’d consider it kind of representative, since it contains a lot of data from all monitors and all servers and the point in time was chosen at random. However, it is biased towards lower offsets since the monitors do more measurements for servers that they are “active” for, which is determined by better scores, which is influenced by lower offsets.
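The filtering steps amount to something like the following sketch, run over rows shaped like the monitoring log (the field names and the reserved monitor id are my assumptions about the schema):

```python
# Sketch of the filtering described above. Field names and the reserved
# monitor id are assumptions, not the actual database schema.
OVERALL_SCORE_MONITOR_ID = 1  # hypothetical id reserved for the overall score

rows = [
    {"monitor_id": 3, "offset": 0.004, "error": None},
    {"monitor_id": 5, "offset": None,  "error": None},          # no measurement
    {"monitor_id": 7, "offset": 0.200, "error": "i/o timeout"}, # failed check
    {"monitor_id": 1, "offset": 0.003, "error": None},          # overall-score row
]

usable = [
    r for r in rows
    if r["offset"] is not None
    and not r["error"]
    and r["monitor_id"] != OVERALL_SCORE_MONITOR_ID
]
print(len(usable))  # 1: only the first row survives all three filters
```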
I have not looked at ipv4 vs ipv6, or any other possible correlation. That is an interesting idea, but in my opinion not relevant for the question at hand since we will have both address types in the pool for the foreseeable future.
Thanks @Sebhoster. @ask Of course you are totally right. My mistake.
In that case I expect the offset to be much lower than 125ms for the servers under my control (extremes are within a bandwidth of +4ms to -4ms of GPS time). What is a good way of finding this out and monitoring it for myself?
The pool monitoring already does this for you. Just look at the page for your server under https://manage.ntppool.org . The graph there shows you the offset measurements from the different monitors. Additionally, at the bottom of the page there is a link to the “csv protocol”, which is basically an API where you can easily access the recent monitoring data from all monitors for this server.
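Once you have a csv response, checking your own worst recent offset is a few lines. The sample below is made up to illustrate the idea; the real column names may differ, so check the actual csv link on your server's page:

```python
import csv
import io

# Parse a (made-up) response in the style of the "csv protocol" linked
# at the bottom of a server's page on manage.ntppool.org. Real column
# names may differ from this sample.
sample_csv = """\
ts,offset,score,monitor_id
2024-05-01T12:00:00Z,0.0031,20.0,3
2024-05-01T12:10:00Z,-0.0044,20.0,5
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
worst = max(abs(float(r["offset"])) for r in rows)
print(f"worst recent offset: {worst * 1000:.1f} ms")  # 4.4 ms
```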
Ah thanks, that's what I already figured.
To come back to my first post in the conversation above: that's what I mistakenly referred to as “ping”; I meant offset, of course.
This has been thought out. First of all, the monitor servers should sync their clock from reliable sources, like from GPS or other good NTP servers. There’s also a safeguard mechanism in the monitoring software that periodically checks the time from a dozen (hardcoded?) well-known NTP servers. If there is a large enough time difference (>10ms) between the monitor server and several reference NTP servers, the monitor won’t check any pool NTP servers until the clock is in sync again.
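The safeguard described above can be sketched roughly like this. The 10ms limit is from the description; the "how many references must disagree" count and the function shape are my simplifications, not the monitoring software's actual logic:

```python
# Simplified sketch of the monitor's sanity check: if the local clock
# disagrees with several well-known reference NTP servers, pause pool
# checks. The agreement count is an assumption for illustration.
SANITY_LIMIT = 0.010  # 10 ms, per the description above
MIN_DISAGREEING = 3   # how many references must disagree before pausing

def should_pause(reference_offsets):
    """reference_offsets: local-clock offsets (s) measured against references."""
    bad = [o for o in reference_offsets if abs(o) > SANITY_LIMIT]
    return len(bad) >= MIN_DISAGREEING

# Local clock ~12 ms off according to most references: stop monitoring
# until the clock is back in sync.
print(should_pause([0.012, 0.013, 0.011, 0.002]))  # True
```

Requiring several references to agree protects against a single bad reference server falsely halting the monitor.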
To add to what @avij said, the quality of the time the monitors base their comparisons on is also vetted by screening potential monitor operators: they have to provide pool server(s) for some time first. (This part is speculation on my part.)
I think somewhere around 100 ms is probably appropriate. I see the role of the Pool as being “better than nothing”, or rather better than not using NTP (or similar) at all. Without NTP, a human administrator might set a system clock by hand from a time signal, and they can probably get within about 100 ms like that. So if an NTP server is better than a human sitting at a keyboard, I think it’s good enough for the Pool.
With increased computing demands (often) come increased timing demands.
In my opinion we should aim somewhat lower than 100ms, since that would hardly be an improvement over the current 125ms.
50 ms sounds really good to me. Having a very small number of servers sometimes drop out for a while because their offset is too bad sounds like a sensible price to pay for protecting pool users from unexpectedly high server offsets. If a server has offsets that are worse than more than 98 % of all measured offsets for a long enough time to actually drop out of the pool, the operator should really take a look at that.
What’s a reasonable worst-case scenario? What if, for example, a small country with some NTP servers and no monitors has a bad fiber cut and their international connectivity gets congested or is routed strangely?
Many of the monitors might show significant latency and offsets.
(Stratum 2 servers in the country using foreign upstream servers could also have significantly bad time.)
We could mitigate that, at the expense of making it more complicated to figure out what is going on, by for example tying the maximum allowed offset to the median (or maximum) latency of the active monitors, or something like that.
(Which monitors are “active” is in part based on their latency to the server.)
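One way the latency-relative idea could look, as a sketch: take a fixed floor and let the limit grow with the latency the active monitors actually observe toward the server. The floor value and the scaling factor are purely illustrative, not a proposal for specific constants:

```python
import statistics

# Sketch of a latency-relative offset limit: under normal connectivity
# a fixed floor applies; when routes to the server are congested or
# strange, the limit scales with observed latency. Constants are
# illustrative only.
BASE_LIMIT = 0.050  # 50 ms floor, one of the values discussed above

def allowed_offset(active_monitor_rtts):
    """Allow the larger of the base limit and half the median RTT (seconds)."""
    median_rtt = statistics.median(active_monitor_rtts)
    return max(BASE_LIMIT, median_rtt / 2)

# Normal connectivity: the 50 ms floor applies.
print(allowed_offset([0.020, 0.030, 0.040]))  # 0.05
# Fiber cut, congested international routes: the limit relaxes.
print(allowed_offset([0.180, 0.250, 0.400]))  # 0.125
```

Using the median rather than the maximum keeps one outlier monitor from relaxing the limit for everyone.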