uk.pool.ntp.org delivering wrong time

Hi there. Since Friday morning we’ve been getting incorrect times from uk.pool.ntp.org across multiple sites. The time has been off by 62 years in some cases, by 61 minutes in others, and sometimes by just a few seconds. It seems to fix itself the next time we ask for the time (every minute), but by that point the damage has been done in our software (which relies on second-accurate clocks).

Has anyone else seen this?

Thanks.

I monitor the NTP pool closely and haven’t seen any noticeable recent changes in UK servers. Please provide some NTP server IP addresses and I’ll run detailed checks on my log files.

@j3132uk it sounds like a client that’s reacting poorly to, for example, rate limit responses.

What is the client software? As @stevesommars said, if you can provide IPs of the servers we can check the monitoring data.

Hi, we are also experiencing issues with incorrect time since 09/08.

12/08/2024 23:55 Info : The time has been corrected by 450.649 seconds (Clock stepped)
12/08/2024 23:48 Info : SNTP Client connecting to 3.uk.pool.ntp.org
12/08/2024 23:33 Info : The time has been corrected by -450.081 seconds (Clock stepped)
12/08/2024 23:40 Info : SNTP Client connecting to 3.uk.pool.ntp.org

As was mentioned to the original poster, you need to tell us the IP addresses of the servers you are querying. We don’t know which server you queried at that exact time, as this is a pool of volunteer servers and several of them are offered to you.
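
One way to see which addresses a pool name currently hands out is a quick lookup like the sketch below. It is only an illustration: `pool_addresses` and its `resolver` parameter are names made up for this example. Because the pool rotates its DNS answers, a lookup made now will not necessarily show the server your client used earlier, so having the client log the address at query time is more reliable.

```python
import socket


def pool_addresses(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to.

    `resolver` is a hypothetical hook so the function can be exercised
    without network access; by default it is the standard library lookup.
    """
    infos = resolver(name, 123, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `pool_addresses("3.uk.pool.ntp.org")` on a machine with working DNS should list the handful of servers offered at that moment.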

Also, you are using SNTP not NTP. SNTP is inherently vulnerable to being served an incorrect time from one source and having no choice but to believe it. If you were using an NTP client then it would consult several sources and detect bad clocks.
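
To illustrate the difference, here is a deliberately simplified sketch of cross-checking several sources. It is not the selection algorithm that ntpd or chrony use (those are considerably more elaborate); the function name `pick_offset` and the `tolerance` and `min_agree` parameters are assumptions made for the example.

```python
from statistics import median


def pick_offset(offsets, tolerance=1.0, min_agree=3):
    """Combine offsets (in seconds) measured against different servers.

    Returns the mean of the samples that agree with the median, or None
    when too few sources agree to trust any of them.
    """
    if len(offsets) < min_agree:
        return None  # a single source cannot be cross-checked
    mid = median(offsets)
    agreeing = [o for o in offsets if abs(o - mid) <= tolerance]
    # Require both an absolute minimum and a strict majority of sources.
    if len(agreeing) < min_agree or len(agreeing) * 2 <= len(offsets):
        return None
    return sum(agreeing) / len(agreeing)
```

With samples such as `[0.01, -0.02, 450.6, 0.0]`, the 450-second outlier is discarded and the result stays close to zero; with only one sample the function declines to answer, which is the situation an SNTP client is always in.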

I recommend that you configure an NTP client for your platform if possible. What is the platform you are using here?

In my logs I see some servers with an offset of 450 and 400 seconds (all in 155.138.0.0/16), but they are in the Canadian zone and they were removed from the pool on the 6th.

Well, 57.128.182.127 has very strange replies. I have not done a packet capture yet; this post is just a hint about a possibly broken server (or connection to it).

tumbleweed:~ # ntpdate -q -d 57.128.182.127
13 Aug 13:48:53 ntpdate[5175]: ntpdate 4.2.8p17@1.4004-o Thu Feb 22 14:01:27 UTC 2024 (1)
transmit(57.128.182.127)
receive(57.128.182.127)
receive: packet length 0
57.128.182.127: Server dropped: no data

13 Aug 13:48:55 ntpdate[5175]: no server suitable for synchronization found
tumbleweed:~ #

Two problems were reported in this thread. We don’t know whether they have related causes.

Dave2724’s NTP client may be running on a VM. I’ve seen time jumps on a couple of my Digital Ocean VMs.
Oct 26 12:23:07 chronyd[1852701]: Forward time jump detected!
Oct 26 12:25:13 chronyd[1852701]: System clock wrong by 42.949713 seconds
Upon investigation, it turned out the VM had stopped for 43 seconds for an unknown reason. The time daemon (chrony here) was able to correct the time. Bottom line: this was a VM problem, not a chrony/ntpd or NTP pool problem.

I looked at over 100 uk.pool.ntp.org NTP servers from Aug 1 to Aug 12 and saw no systematic failures.

Sorry for the lack of reply to the questions. My account on the forum was locked for some reason. We’re hitting 195.171.43.10. I can see the time has drifted by 713 seconds today.

A zero-length response from this server could explain it. A client that doesn’t check the length correctly could be calculating a random offset from whatever values happen to be in memory (e.g. on the stack) where it expects the timestamps from the response.
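
As a rough sketch of the kind of check meant here, the following parses a reply only after confirming it is long enough, and otherwise reports it as unusable. The names `parse_reply`, `ntp_to_unix` and the two constants are made up for this example, and the sketch leaves out things a real client should also do, such as matching the origin timestamp against the request it sent.

```python
import struct

NTP_EPOCH_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01
MIN_NTP_PACKET = 48           # NTP header without extension fields


def ntp_to_unix(seconds, fraction):
    """Convert a 32.32 fixed-point NTP timestamp to Unix seconds."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def parse_reply(data, t1, t4):
    """Return the clock offset in seconds, or None if the reply is unusable.

    t1 is the client's transmit time and t4 its receive time (Unix seconds).
    """
    if len(data) < MIN_NTP_PACKET:
        return None  # empty or truncated reply: treat as "no response"
    leap = data[0] >> 6
    mode = data[0] & 0x07
    stratum = data[1]
    if mode != 4 or leap == 3 or stratum == 0:
        return None  # not a server reply, unsynchronised, or kiss-o'-death
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    if tx_s == 0 and tx_f == 0:
        return None  # a zero transmit timestamp is never valid
    t2 = ntp_to_unix(rx_s, rx_f)  # server receive time
    t3 = ntp_to_unix(tx_s, tx_f)  # server transmit time
    return ((t2 - t1) + (t3 - t4)) / 2
```

With this shape, `parse_reply(b"", t1, t4)` returns `None` rather than an offset computed from garbage, and the stratum check also rejects rate-limit (kiss-o'-death) replies.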


Could the monitoring system check for invalid reply packets and remove broken servers from the pool, just for the sanity of the service the pool provides?

Hi grifferz, we are using a Windows platform with a third-party NTP client app. I cannot see the IP address of the server the app is querying.

I agree concerning 57.128.182.127: its responses have no payload. I first saw this at 2024-08-08 22:48. This appears to be a problem with the NTP server.

Since the response has no payload, NTP clients should treat this as “no response”.

My monitoring indicated that 195.171.43.10 was within 1-2 msec of UTC since August 2.

Please describe your host and NTP software.

Your log indicates it’s doing an SNTP (Simple NTP) query, which isn’t NTP. Even if we do find a server in the pool that has wrong time that the pool’s monitor somehow didn’t detect, you really should be using NTP not SNTP as otherwise you are vulnerable to this sort of thing happening again.

I feel sure there are NTP clients for Windows, but I don’t use Windows so I couldn’t recommend one. Maybe even what you are using can be told to do NTP as opposed to SNTP.

Thanks grifferz, we will surely look into moving away from SNTP.

ntpd classic also runs on Windows, @davehart is looking after that. Meinberg used to build it for Windows and provide a packaged version of that build. But at this time, that build is lagging behind ntpd classic’s published releases. At some point, I even built it myself, so can’t be that hard (used Meinberg’s installer to get things set up, e.g., starting it as a service, then replaced the binaries as new ntpd releases became available)…

Recent versions of Windows natively support higher-accuracy time synchronization than older ones. But I don’t know whether that is a full NTP client (in the sense of tracking multiple upstream time sources simultaneously and deriving “best” time from them, weeding out potential outliers/falsetickers), or again just an SNTP client (in the sense of relying on a single time source at any one point in time, and blindly trusting that one). Nor do I know whether, in the latter case, it has some sanity checks that would reject the odd implausible time sample from an upstream server, avoiding local time jumping back and forth by several seconds (“spike protection”).

Looks like the pool monitoring system is interpreting this as a “perfect” offset of exactly zero seconds.

Given the relatively high “bandwidth”/DNS share (4%/2.7%), I see how SNTP clients can get confused rather frequently…


Ouch! Miroslav and PoolMUC have identified a stinker of a bug in the monitoring. As you can see from the .csv PoolMUC points to:

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,leap,error
1723614741,2024-08-14 05:52:21,,1,19.00510025,24,recentmedian,,
1723614741,2024-08-14 05:52:21,0,1,18.477138519,56,plszy1-361g4fk,,

The observed offset belongs in the field between 05:52:21 and 1: it is empty for recentmedian, but plszy1-361g4fk recorded 0.

There is no offset provided by the server with a zero-payload response, but the monitors are treating that as an impossibly perfect sync resulting in a zero measured offset. Better would be to treat it as if the response showed leap 0b11 (3). It may be desirable to log it with a unique error message to make it distinguishable from an unsync’ed server, but the key point is it should be treated as unusable.
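
Until that is fixed, rows like this can be picked out of a downloaded monitor log with something like the sketch below. It assumes the column names shown in the .csv header above; `suspicious_rows` is a name made up for this example, and an offset of exactly zero is only a heuristic for “no offset was actually measured”.

```python
import csv
import io


def suspicious_rows(csv_text):
    """Return rows whose offset is exactly zero and that carry no error text."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        offset = row["offset"].strip()
        # An empty offset means "not measured" and is left alone here.
        if offset and float(offset) == 0.0 and not row["error"].strip():
            flagged.append(row)
    return flagged
```

Run against the two rows quoted above, it flags only the plszy1-361g4fk entry, since the recentmedian row has an empty offset rather than a zero.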

@Ask or other monitor code maintainers please take a look.

Short of going for a full-blown NTP implementation, there are two things where the currently used SNTP client implementation can be improved, or needs to be fixed:

  • It could do some plausibility checks of received time samples instead of blindly trusting them, and blindly setting the clock to whatever is received. Apart from the sophisticated source grooming mechanisms of full-blown NTP implementations, they also have a (typically configurable) mechanism to prevent large clock jumps, except at startup (where initially stepping the clock may be necessary depending on RTC availability/accuracy). Something along those lines could also be done in an SNTP implementation. Maybe a bit simpler than what full-blown NTP clients do in that respect.
  • While it is not clear yet whether the “empty” NTP packets from that one server really caused the jumps, it might be the cautious thing to do to verify that the current SNTP implementation handles such cases properly, and fix it if not.
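
For the first point, a plausibility check could look roughly like the sketch below. It is not how ntpd or chrony implement their step limits; `StepGuard`, `max_step` and `confirmations` are hypothetical names and defaults chosen for illustration. The idea is to allow any correction at startup, accept small corrections afterwards, and only follow a large jump once several consecutive samples agree on it.

```python
class StepGuard:
    """Decide whether a measured offset (seconds) should be applied."""

    def __init__(self, max_step=1.0, confirmations=3):
        self.max_step = max_step            # largest offset accepted outright
        self.confirmations = confirmations  # agreeing samples needed for a big step
        self.synced = False                 # becomes True after the first correction
        self.suspect = []                   # consecutive large offsets seen so far

    def accept(self, offset):
        if not self.synced or abs(offset) <= self.max_step:
            # Startup, or an ordinary small correction.
            self.synced = True
            self.suspect.clear()
            return True
        # Large offset after startup: restart the count if it disagrees
        # with the previous suspect sample.
        if self.suspect and abs(offset - self.suspect[-1]) > self.max_step:
            self.suspect.clear()
        self.suspect.append(offset)
        if len(self.suspect) >= self.confirmations:
            self.suspect.clear()
            return True
        return False
```

A single 450-second sample between normal ones would be ignored, while a clock that really is 450 seconds off would still be corrected after three consecutive polls.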