What makes an NTP server a good server, and when is it 'good enough'?

So… in theory, NTP can achieve quite high accuracy—often far beyond what’s needed for most applications (like keeping my calendar in sync). But sometimes, applications demand much higher accuracy and precision. That got me thinking the other day… Is there anything definitive we can say about this? Can we really make any meaningful claims about the accuracy of public NTP servers? What level of accuracy should users expect from sources like time.windows.com, time.apple.com, or even our beloved NTP pool?

Any thoughts?

I think the same way, but I don’t know of any such application.
Could you list some applications, just as examples?

Assuming the source servers, and every upstream NTP server between them and the reference clocks, implement the protocol correctly, and that the reference clocks are correctly configured, the root dispersion of your local NTP daemon should give you an error bound. In practice, however, given anonymous servers, the mysterious upstream sources they use, and how easy it is to misconfigure a stratum 1 server, there’s not much you can claim beyond what the pool monitoring system shows you.

I’d be pretty confident that if I pointed ntpd at time.cloudflare.com alone, its root dispersion would be a good error bound at any given instant. Sadly, ntpd doesn’t offer a way to log the root dispersion over time, though you could do it programmatically using

ntpq -c "rv rootdisp"

in a script loop pretty easily.
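A minimal Python sketch of such a loop follows. It assumes `ntpq` is on the PATH and can query the local ntpd, and that the reply contains a `rootdisp=<value>` pair, which ntpq reports in milliseconds. The 64-second interval and the comma-separated output are arbitrary choices for the sketch, not anything ntpd prescribes.

```python
import re
import subprocess
import sys
import time
from datetime import datetime, timezone

# Assumed ntpq output shape: "rootdisp=<value>" in milliseconds, possibly
# alongside other name=value pairs.
ROOTDISP_RE = re.compile(r"rootdisp=([0-9]+(?:\.[0-9]+)?)")


def parse_rootdisp(output: str) -> float:
    """Extract the root dispersion (ms) from ntpq's text output."""
    match = ROOTDISP_RE.search(output)
    if match is None:
        raise ValueError(f"no rootdisp in ntpq output: {output!r}")
    return float(match.group(1))


def poll_once() -> float:
    """Run ntpq once against the local ntpd and return rootdisp in ms."""
    result = subprocess.run(
        ["ntpq", "-c", "rv rootdisp"],
        capture_output=True, text=True, check=True,
    )
    return parse_rootdisp(result.stdout)


def main(interval_s: float = 64.0) -> None:
    """Print "UTC timestamp,rootdisp_ms" once per interval, forever."""
    while True:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        try:
            print(f"{stamp},{poll_once():.3f}", flush=True)
        except (OSError, subprocess.CalledProcessError, ValueError) as err:
            # Keep logging even if one poll fails (e.g. ntpd restarting).
            print(f"{stamp},error: {err}", file=sys.stderr, flush=True)
        time.sleep(interval_s)
```

Calling `main()` from a small script and redirecting stdout to a file gives a time series you can plot later; stop it with Ctrl-C.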

I lost some of my faith in those a while ago, after occasionally seeing them off by 30 ms or more, with double-digit jitter, for several hours at a time, and in different regions of the world. I guess that may be because they’re anycast, so the instances actually reachable may keep changing. Or the routing may just be off or asymmetric for extended periods, regardless of the anycast aspect.

Anyway, such networking aspects would be my biggest concern, besides the server implementation and configuration ones. I’m not sure how much of that would be reflected in metrics such as the dispersion (math isn’t a strong suit of mine, so I think I have some intuition, but I could be way off).

Or server overload, which isn’t a concern in my area but, as various threads in this forum attest, is very common in other areas.
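On the routing question, the standard NTP on-wire calculation (RFC 5905) shows why asymmetry is hard to detect: the client only sees the total round-trip delay, so any difference between the outbound and return paths turns into an offset error of half that difference, which can be at most half the round-trip delay. That is why the usual error bound includes half the root delay and not only the root dispersion. The sketch below simulates one exchange; the delay values are made up for illustration.

```python
from __future__ import annotations


def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """RFC 5905 offset and round-trip delay from the four timestamps.

    t1: client transmit, t2: server receive, t3: server transmit,
    t4: client receive (t1/t4 on the client clock, t2/t3 on the server clock).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset: float, d_out: float, d_back: float,
             processing: float = 0.0001) -> tuple[float, float]:
    """One exchange where the server clock is true_offset seconds ahead of
    the client, with one-way delays d_out (client to server) and d_back."""
    t1 = 0.0
    t2 = t1 + d_out + true_offset           # server clock when the request arrives
    t3 = t2 + processing                    # server clock when the reply leaves
    t4 = t1 + d_out + processing + d_back   # client clock when the reply arrives
    return offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Symmetric 10 ms paths: the measured offset matches the true offset (0).
    print(simulate(0.0, 0.010, 0.010))
    # 40 ms out, 10 ms back: a perfectly synced server appears about 15 ms
    # off, i.e. (d_out - d_back) / 2, which stays within delay / 2 = 25 ms.
    print(simulate(0.0, 0.040, 0.010))
```

Nothing in a single exchange distinguishes the second case from a server that really is 15 ms off, which is why asymmetric routing shows up as plausible-looking offsets rather than as extra dispersion.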

Looking at expectations of the pool from another angle, I always think about what triggered its creation, and what is still highlighted on the pool’s documentation pages: that its original (and maybe still its main) purpose was to share load.

So I’d indeed think the pool is good enough for getting time for my calendar, my webcam or another IoT/smart home device, or even my PC. But if I had a specific use case with particular requirements, I’m not sure I’d turn to the pool. And that’s not so much about accuracy as such, but more about reliability and availability, especially in zones that are not as comfortably equipped with servers as my own “home zone”.


There are some obvious examples at the far end of the spectrum, like ‘high frequency trading’, TDoA (Time Difference of Arrival) in LoRaWAN and others, that require very accurate timestamps. Some academic applications also rely on extremely accurate clocks. But naturally they will not use a service like the NTP pool. Often they don’t even use NTP at all (they use PTP or White Rabbit instead, for instance).

On the other end of the spectrum are applications that need only ‘some understanding’ of the right time, like DNSSEC validation, TOTP and many others. An error of a few seconds doesn’t matter to them, and for some of them even minutes don’t.
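How much skew TOTP tolerates follows from RFC 6238, which derives each code from the time-step counter T = floor((unix_time - T0) / X), with X = 30 s and T0 = 0 by default. The sketch below additionally assumes a verifier that accepts its own step plus one step either side, a common policy but an assumption here; `accepted` is a hypothetical helper, not any library's API. Under that policy, a skew of up to 30 s is always accepted, 30 to 60 s depends on where in the step the check happens, and a full minute or more is always rejected.

```python
# Sketch assumptions: RFC 6238 defaults (X = 30 s, T0 = 0) and a verifier
# that accepts its own time step plus one step either side.
TIME_STEP_S = 30
T0 = 0


def totp_counter(unix_time: int) -> int:
    """RFC 6238 time-step counter T = floor((unix_time - T0) / X)."""
    return (unix_time - T0) // TIME_STEP_S


def accepted(server_time: int, client_skew: int, window: int = 1) -> bool:
    """Hypothetical check: would a code from a clock off by client_skew
    seconds fall within +/- window steps of the verifier's counter?"""
    client_counter = totp_counter(server_time + client_skew)
    return abs(client_counter - totp_counter(server_time)) <= window


if __name__ == "__main__":
    now = 1_700_000_000  # arbitrary fixed instant
    for skew in (5, 25, 45, 90, 300):
        print(f"clock off by {skew:>3} s -> accepted: {accepted(now, skew)}")
```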

So the interesting area, in relation to this topic, is between these two extremes. Do examples exist there? I think so, yes.

For instance, precise correlation of system logs (for forensic research or debugging) benefits from fairly good time accuracy. Another example is a distributed, fair ‘first come, first served’ application, where many people try to claim something at about the same time.

So yes, I believe these examples exist.

But I haven’t seen many documents that provide guidance here, for example good recommendations for network engineers or SOCs (MiFID II has some hard numbers I know of, but that’s at the high end of the spectrum, not in the ‘grey area’).

Long story short: where on the spectrum are public NTP servers, and the NTP pool in particular? What is the recommended or intended scope of application of the pool? Is it safe to use in enterprise networks? Does the pool make any claims? What are the thresholds of the pool’s monitoring system, for instance? Is it fair to complain if the pool is >30 ms off?

This bit at least I can answer: the scoring code has a sliding scale of goodness based on the server’s offset relative to the monitor. Offsets below 75 ms get no penalty. Offsets between 75 ms and 250 ms cause the server’s score to rise more slowly than usual. Offsets between 250 ms and 3 s cause the server’s score to drop so it will eventually be removed from the pool, and offsets over 3 s cause the server’s score to be capped at -20, which will cause it to be removed from the pool as soon as enough monitors agree that there’s a problem.
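The offset bands described above can be written down as a small classifier. This is only a sketch of the thresholds as described (75 ms, 250 ms, 3 s): it is not the pool's scoring code, it says nothing about how large each score step is, and exactly which band a boundary value falls into is an assumption. By these numbers, a server that is 30 ms off draws no penalty at all.

```python
def offset_band(offset_s: float) -> str:
    """Classify a monitor-measured offset (seconds, either sign) into the
    bands described above. Boundary handling is an assumption."""
    magnitude = abs(offset_s)
    if magnitude < 0.075:
        return "no penalty"
    if magnitude < 0.250:
        return "score rises more slowly"
    if magnitude <= 3.0:
        return "score drops"
    return "score capped at -20"


if __name__ == "__main__":
    for offset in (0.030, -0.120, 0.800, 12.0):
        print(f"{offset:+.3f} s: {offset_band(offset)}")
```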


Intentions are mostly a curiosity of history. In practice, most of the load I see seems to be coming from Linux VMs, many of them on AWS. I’m glad AWS offers internal NTP servers and is starting to offer PTP on some VM types, but I wish their recommended images had been using those internal NTP servers for the last few years.


Yes, banking, stock markets, etc.

But they use their own clocks, typically far more precise than we need.

There are digital ham-radio transmissions that need ‘high’ accuracy, though we’re talking about 0.1 s accuracy here.
They are called JT65, FT4, FT8, WSPR, etc.

They need this ‘accuracy’ because those transmissions start and stop at the same time all over the world, and they do not contain start/stop bits.
As such, the computer needs to know when each one starts and stops.
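To make the timing requirement concrete: each of these modes transmits in fixed, UTC-aligned periods (15 s for FT8, 7.5 s for FT4, 1 min for JT65, 2 min for WSPR, the latter starting on even minutes). The software works out where each slot begins from the local clock, so any clock error shifts its transmit and decode windows by the same amount. The helper names below are hypothetical, chosen for this sketch.

```python
import math

# T/R period lengths in seconds for the modes mentioned above.
PERIODS_S = {"FT8": 15.0, "FT4": 7.5, "JT65": 60.0, "WSPR": 120.0}


def slot_start(unix_time: float, mode: str) -> float:
    """Start (Unix seconds) of the slot that contains unix_time.

    Unix time has no leap seconds, so multiples of 60 fall on UTC minute
    boundaries and the slots line up with UTC as the modes require.
    """
    period = PERIODS_S[mode]
    return math.floor(unix_time / period) * period


def seconds_into_slot(unix_time: float, mode: str) -> float:
    """How far into the current slot the given clock reading is."""
    return unix_time - slot_start(unix_time, mode)


if __name__ == "__main__":
    true_time = 1_700_000_000.0
    for clock_error in (0.0, 0.1, 1.0):
        local = true_time + clock_error  # what a station with this error reads
        print(f"error {clock_error:>4} s: FT8 slot offset "
              f"{seconds_into_slot(local, 'FT8'):.1f} s")
```

A station whose clock is 1 s fast believes it is 1 s further into the slot than it really is, and starts and stops its transmit and decode windows 1 s early accordingly.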

With these signals you can cover large distances without much transmitting power, with WSPR as the extreme case.
To give an idea: on the 7 MHz band, my record is 6800 km with just 1 milliwatt of power.

As you know, typical 2.4 GHz Wi-Fi already transmits at 100 to 1000 mW!

So yes, ham radio needs ‘high’ accuracy, and what we provide is more than accurate enough.

When they need higher accuracy, they typically use a Bodnar GPS reference to get an accurate frequency, like they do here:

That keeps the receiver frequency accurate and stable. But that is extreme :rofl:

Google Spanner, for example.


In another thread, @stevesommars linked to an interesting paper of his that explored some aspects of long-term monitoring:

One thing I recall noticing is that occasionally some servers gave very incorrect time, off by years.

But I feel like you could, with enough data, probably make a statement like, “The 90th-percentile accuracy of pool servers was within 250ms” or something. (To be clear, that statistic is a completely made-up example.)
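Given a set of monitor measurements, such a statement is a straightforward percentile calculation. The sketch below uses the nearest-rank method on absolute offsets; the sample values in the usage block are invented purely to exercise the function. A percentile, unlike a mean, keeps a handful of servers that are off by years from dominating the result.

```python
from __future__ import annotations

import math
from collections.abc import Iterable


def percentile_abs_offset(offsets_s: Iterable[float], pct: float) -> float:
    """Nearest-rank percentile of |offset|, in the same unit as the input."""
    ordered = sorted(abs(x) for x in offsets_s)
    if not ordered:
        raise ValueError("need at least one offset sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Invented sample offsets in seconds, including one server a year off.
    samples = [0.001, -0.002, 0.003, 0.004, -0.005,
               0.006, 0.007, 0.008, 0.009, 86400.0 * 365]
    print(percentile_abs_offset(samples, 90))            # unaffected by the outlier
    print(sum(abs(x) for x in samples) / len(samples))   # mean is dominated by it
```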

I wonder whether the pool monitoring data used for the graphs could give some answers here.

Dave’s comment about root dispersion as a reasonable benchmark, and my side project to get all the stuff I manage into Ansible, led me to grab this data:

% ansible all -m ansible.builtin.shell -a "sudo chronyc tracking | grep dispersion"
nyc-2g | CHANGED | rc=0 >>
Root dispersion : 0.004316320 seconds
mia-2g | CHANGED | rc=0 >>
Root dispersion : 0.001929239 seconds
las-2g | CHANGED | rc=0 >>
Root dispersion : 0.001757243 seconds
lux-2g | CHANGED | rc=0 >>
Root dispersion : 0.000888808 seconds
mumbai-1g | CHANGED | rc=0 >>
Root dispersion : 0.001670093 seconds
korea-1g | CHANGED | rc=0 >>
Root dispersion : 0.000842893 seconds
singapore-ls | CHANGED | rc=0 >>
Root dispersion : 0.000610336 seconds
capetown | CHANGED | rc=0 >>
Root dispersion : 0.001687264 seconds
malaysia-ptp | CHANGED | rc=0 >>
Root dispersion : 0.000001237 seconds
sao-paulo | CHANGED | rc=0 >>
Root dispersion : 0.001140296 seconds
bangalore-do | CHANGED | rc=0 >>
Root dispersion : 0.002006466 seconds
singapore-do | CHANGED | rc=0 >>
Root dispersion : 0.001366011 seconds

With the exception of malaysia-ptp, these are all stratum 2+ servers getting time over the Internet. At the same time, they’re all VMs inside data centers where I’ve taken some care to configure them with several good, nearby NTP sources. Someone using wifi on a laptop through their cable modem would probably have worse numbers.
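When reading numbers like these, note that chrony's documentation gives the bound on the local clock error as |system time offset| + root dispersion + 0.5 * root delay, so the dispersion alone somewhat understates it. The sketch below computes that bound from the full `chronyc tracking` text (the `grep dispersion` above keeps only one of the three fields). It assumes chronyc's usual `Label : value seconds` layout; the function names are illustrative.

```python
from __future__ import annotations

import re

# Matches e.g. "Root delay      : 0.000512345 seconds" and
# "System time     : 0.000000012 seconds slow of NTP time".
_FIELD_RE = re.compile(
    r"^\s*(System time|Root delay|Root dispersion)\s*:\s*([0-9.]+) seconds",
    re.MULTILINE,
)
_REQUIRED = {"System time", "Root delay", "Root dispersion"}


def parse_tracking(text: str) -> dict[str, float]:
    """Pull the three fields needed for the error bound out of the output."""
    fields = {name: float(value) for name, value in _FIELD_RE.findall(text)}
    missing = _REQUIRED - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields


def max_clock_error(text: str) -> float:
    """|system time offset| + root dispersion + 0.5 * root delay, in seconds."""
    f = parse_tracking(text)
    # "System time" is printed as a magnitude followed by "fast"/"slow",
    # so the parsed value is already non-negative.
    return f["System time"] + f["Root dispersion"] + 0.5 * f["Root delay"]
```

On a host, the input could be `subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout`; across a fleet, the same Ansible ad-hoc call without the `grep` would collect the full text for parsing in one place.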

There are many aspects to describing a “good NTP server.” Briefly, there are NTP servers that many people will agree are “good” and others that many would agree are “bad.” Within the gray area, an NTP server may suffice for some uses but not others. A client with stringent needs may find the NTP Pool completely inadequate.

One criterion should be whether the timestamps in a server’s NTP responses are consistent with its advertised root dispersion.
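One way to make that criterion concrete, sketched under simplifying assumptions: measure the server's offset against a reference you trust, then check whether the error stays within what the server itself advertises. The check below uses the server's root delay / 2 + root dispersion (a simplified form of RFC 5905's root distance, ignoring its jitter and drift terms) and allows extra slack of half the measurement's own round-trip delay, since path asymmetry can hide up to that much. The function name and parameters are hypothetical.

```python
def consistent_with_root_distance(measured_offset: float,
                                  measurement_delay: float,
                                  server_root_delay: float,
                                  server_root_dispersion: float) -> bool:
    """All values in seconds. measured_offset is the server's offset from a
    trusted reference; the other three come from the same exchange."""
    # What the server claims as its maximum error relative to its reference clock.
    advertised = server_root_delay / 2 + server_root_dispersion
    # Uncertainty of our own measurement due to possible path asymmetry.
    measurement_slack = measurement_delay / 2
    return abs(measured_offset) <= advertised + measurement_slack


if __name__ == "__main__":
    # 5 ms off with a 9 ms allowance: consistent.
    print(consistent_with_root_distance(0.005, 0.010, 0.004, 0.002))
    # 30 ms off with the same allowance: the server under-reports its error.
    print(consistent_with_root_distance(0.030, 0.010, 0.004, 0.002))
```

A single failure can be noise on the measurement side; a server that fails repeatedly is advertising error bounds it does not meet.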

Also, what if a server does not support NTPv1? I believe ntpd-rs is an example of this.

Is a minimum NTP version a requirement for participating in the pool?

Given that the NTP timestamp format will overflow in 2036, should the Pool maintain backward compatibility with old clients that are not aware of the NTP era and date format introduced in NTPv4?

NTPv4 will be 26 years old by 2036. Should any client still rely on another version by then, it will deserve everything that comes its way.


The pool monitoring sends mode 3 (client) requests to the servers and expects mode 4 (server) replies. The mode field was introduced in NTPv2, so that is the minimum version a server needs to be able to join the pool.

Since the information exchanged between server and client does not include the era number, there is nothing that can be done from the server side. Either the client is able to handle the overflow, or it isn’t.
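On the client side, the 32-bit seconds field of era 0 runs out 2^32 seconds after 1900-01-01, at 2036-02-07 06:28:16 UTC. A common client-side approach is to resolve the era relative to a time the client already roughly knows (a "pivot", such as its build date or a battery-backed clock) and pick the era that lands closest to it. The sketch below assumes such a pivot is available; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
ERA_SECONDS = 2**32  # span of one NTP era (32-bit seconds field)


def era0_rollover() -> datetime:
    """Instant at which the era-0 seconds field wraps to zero."""
    return NTP_EPOCH + timedelta(seconds=ERA_SECONDS)


def resolve_ntp_seconds(ntp_seconds: int, pivot: datetime) -> datetime:
    """Map a 32-bit NTP seconds value to the full date closest to pivot."""
    if not 0 <= ntp_seconds < ERA_SECONDS:
        raise ValueError("ntp_seconds must fit in 32 bits")
    pivot_s = int((pivot - NTP_EPOCH).total_seconds())
    era = pivot_s // ERA_SECONDS
    # The right era is the pivot's own era or one of its neighbours.
    candidates = [(era + k) * ERA_SECONDS + ntp_seconds for k in (-1, 0, 1)]
    best = min(candidates, key=lambda s: abs(s - pivot_s))
    return NTP_EPOCH + timedelta(seconds=best)


if __name__ == "__main__":
    print(era0_rollover())  # end of era 0
    # 100 s into era 1, seen by a client whose pivot is 1 January 2035.
    print(resolve_ntp_seconds(100, datetime(2035, 1, 1, tzinfo=timezone.utc)))
```

This only gives the right answer while the true time is within about 68 years (half an era) of the pivot, which is why the burden sits with the client, as noted above.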

Version compatibility will probably become an interesting topic once NTPv5 is finished; the current draft contains changes to the message format.