Status page not working?

Looking at the status page: NTP Pool System Status

It looks like the monitoring probes value dropped off a cliff a few hours ago, then all values stopped being updated. Is anyone else seeing this?

Is there something else broken here too? We use ntpdate against pool.ntp.org to sync our server times, and we had about 10 servers jump to random times in the future this morning.

The first group jumped at around 5:30am US Eastern
The second group jumped at around 8:00am US Eastern
The third group jumped at around 8:30am US Eastern

The first group jumped at around the same time this website started having issues.


Do note that the pool consists of thousands of NTP servers run by volunteers. Maybe the clock on one of them is not synced properly. Normally this would be detected by pool monitoring and the offending server would get removed from the pool automatically. The current outage may affect monitoring as well, leaving such falsetickers serving time in the pool.

It would help if you could identify the IP address of the NTP server that served incorrect time.

We faced this issue as well. Here is the info for at least one of those way-out-of-sync servers:
2.xxxxxxxxxx.pool.ntp.org polled 6m9s ago
selected 1h16m29s ago
delta +0.001±0.004
address 45.55.126.202
stratum 2
action jump to 2037-03-02 06:14:08.717 +0000 UTC


As of now, that NTP server seems to serve correct time. It may have served incorrect time in the past, though.

The general recommendation is to use multiple NTP sources to avoid situations like this. If you have several servers that need correct time, it would be a good idea to set up an internal NTP server within your LAN (with chrony or ntpd or what have you) and configure your other servers to sync the time from your internal NTP server. The internal NTP server can and should use multiple NTP sources, including servers from the NTP pool. This would also reduce the load placed on NTP Pool resources.
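
A minimal sketch of what the internal server’s chrony.conf could look like (the LAN range and paths are placeholders for your own environment):

pool pool.ntp.org iburst
allow 192.168.0.0/16
makestep 1.0 3
driftfile /var/lib/chrony/drift

The pool directive pulls in several pool servers at once, allow lets clients on the internal network query this host, and makestep steps the clock at startup if it is badly off.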

I use an independent system to monitor NTP Pool servers. 45.55.126.202 has had good accuracy during July 2025, no issues seen.

We don’t have all the logs, but we do use the NTP pool and typical NTP infrastructure (which would have picked the best time from across all reporting NTP instances and hosts, [0-3].xxxxxxxxxx.pool.ntp.org). Some of the NTP instances reported times as far in the future as 2044-xx-xx and corrupted our devices’ time, causing a widespread outage. Just to be clear, the corrupted time did recover a few hours back, but I am not sure what steps the NTP Pool project is taking to avoid a repeat.


What type of NTP client are you using? And in which country are the NTP clients located?

We use ntpdate, and the impacted servers were spread across the central United States.

Which version of ntpdate? Do you have logs for which NTP servers were used? What you saw is surprising and unusual.

@jsh1234 We have a lot of monitoring, including monitoring set up independently from the system that was down.

At least classic ntpdate should be somewhat immune to issues with a single server, or a small subset of servers. While deprecated, and not a full NTP implementation, it queries all IP addresses behind a name and seems to apply some sanity checking before selecting a source to get the time from.

debian@debian:~$ ntpdate -q 2.pool.ntp.org
server 2001:ba0:21f:4900::1, stratum 2, offset -0.000296, delay 0.02628
server 2001:ba0:21f:4900::4, stratum 2, offset -0.000329, delay 0.02625
server 2606:4700:f1::1, stratum 3, offset +0.001125, delay 0.06578
server 2a0d:5440::24, stratum 2, offset +0.000414, delay 0.05992
server 5.250.184.159, stratum 2, offset -0.000007, delay 0.02568
server 94.143.139.219, stratum 2, offset +0.000291, delay 0.02687
server 195.95.153.43, stratum 1, offset +0.000096, delay 0.03296
server 178.215.228.24, stratum 2, offset -0.000033, delay 0.05917
 4 Jul 08:50:45 ntpdate[2374419]: adjust time server 195.95.153.43 offset +0.000096 sec
debian@debian:~$ ntpdate -v
 4 Jul 08:57:21 ntpdate[2374495]: ntpdate 4.2.8p18@1.4062 Mon Jul  8 17:59:00 UTC 2024 (2)
 4 Jul 08:57:21 ntpdate[2374495]: no servers can be used, exiting

Thanks. And what parameters were used for the ntpdate command?

We’re currently using ntpdate 4.2.8p15@1.3728-o. The command we were using was:

ntpdate -u pool.ntp.org

Unfortunately I don’t think there are any records of which servers were used when the time jumped. We’ve since amended the command to include the -s flag so that this data is logged to syslog.
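
The amended invocation is basically:

ntpdate -s -u pool.ntp.org

(The -s option diverts ntpdate’s output, including which server was selected, to syslog instead of stdout.)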

Not sure what your reason is for using ntpdate vs. having ntpd or chronyd running continuously in the background, so I’m not sure whether using ntpd with the -q option (“Set the time and quit.”) instead might be an option. The validation done by ntpdate seems somewhat simplistic. I would expect ntpd to be more thorough, even when run with the -q option, but admittedly I haven’t verified that myself.

chronyd similarly supports a -q option to set time and quit, but I haven’t used that myself yet, either, at least not for setting time.
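
For illustration (I haven’t tested these in this one-shot role), the invocations would look roughly like:

ntpd -g -q
chronyd -q 'pool pool.ntp.org iburst'

ntpd -q uses the servers configured in ntp.conf, with -g allowing a large initial correction; in the chronyd case the sources can be given as configuration directives right on the command line, and it exits after setting the clock once.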

I think this is the first and most important issue to understand for @bws. Typical practice would be to run ntpd or chrony on each machine, so its clock can be gently steered towards accurate time rather than violently “stepped” every once in a while by ntpdate, which can lead to time warps where your servers’ logs show the same timestamps occurring at two different real-world times, thanks to ntpdate stepping the clock back a bit.

Using a pool of thousands of volunteer servers with ntpdate or an SNTP client to jam in a new time on a regular basis, and then complaining to the pool volunteers when one of them apparently gave some of your servers a bad estimate of the time, is just asking for trouble. It’s not as simple as your server asking the pool server “what time is it?” and setting the clock to that. There’s an NTP dance involved, and, as mentioned, who knows which of the thousands of IPv4 and thousands more IPv6 servers your servers were querying.

Having a whole bunch of your machines hitting the pool is also pretty unfriendly to the pool volunteers. Run ntpd or chrony on one or a few servers querying multiple pool servers, and have the rest of your machines sync to them. Then if your machines have a problem, you can investigate why your NTP server(s) and your NTP clients are misbehaving.
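
On the client side that is only a couple of lines, e.g. in chrony.conf (the hostnames here are just placeholders for your internal servers):

server ntp1.internal.example iburst
server ntp2.internal.example iburst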


FWIW, in my monitoring data I see that 23.155.40.38/2602:2eb:2:95:1234:5678:9abc:def0 had a strange fault around 2025-07-03 9:00-12:30 UTC, where either its RX or TX timestamp jumped a few decades into the future. This would cause a very large NTP offset and delay. I’m not sure if ntpdate rejects samples with a large delay.


The graph for that server looks pretty amazing: pool.ntp.org: Statistics for 2602:2eb:2:95:1234:5678:9abc:def0

The IPv4 server was behaving appropriately a little longer, but was turned off as soon as the database came back up the other day.

I agree with you that ntpd is the right answer to this problem, but it hasn’t worked consistently for us across our systems, whereas ntpdate has worked pretty flawlessly for us up until last Thursday.

We operate several hundred servers, most of which are virtualized in datacenters around the world. We do not control the hypervisor, and not all of our providers are excellent at their jobs. So we have to deal with some challenging environments, including:

  • Time skewing faster than ntpd can correct
  • Outgoing traffic from port 123 being blocked (ntpdate -u works around this)

@davehart my post was not a complaint; I was seeking information about the outage and how it may have caused the time on my servers to jump so drastically, or whether there was some other issue I needed to quickly investigate.

I realize you have far more experience with NTP than I do (I read your profile), but I think you’re overstating the negative impacts of ntpdate. For our business, we’ve decided to accept server clocks with an accuracy of +/- 5 seconds, and we don’t care if log lines have duplicate timestamps. The world isn’t perfect, and we’re willing to accept that.

As far as the impact on the NTP pool, my understanding is that ntpd will vary its check interval anywhere from 64 to 1024 seconds. Using ntpdate every 30 minutes (1800 seconds) is less frequent, or am I misunderstanding?

@mlichvar thanks for sharing the server you found. We don’t have logs, so we won’t know if that’s the one our servers used, but it’s interesting to see what can happen.

It’s complicated! 🙂

ntpd (or chronyd and other similar implementations) will make DNS queries much less often, but yes, likely more NTP queries.

The DNS infrastructure is less distributed and has less capacity, but it also benefits from massive caching by the DNS resolvers that people use, so it’s unclear whether an extra query from one of your systems actually reaches the DNS infrastructure or gets answered by a cache.
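
As a rough back-of-the-envelope comparison (assuming four sources and default polling): ntpd sitting at its 1024-second maximum poll interval sends about 86400/1024 ≈ 84 queries per source per day, so roughly 340 NTP queries in total, while resolving the pool hostnames only occasionally. An ntpdate run every 1800 seconds is 48 invocations per day, each of which does a fresh DNS lookup and, by default, sends several probe packets to every address the name resolves to.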