Strange Behavior of 0.debian.pool.ntp.org (84.255.251.205:123)

Hello everyone,

I want to report an issue we’ve encountered with one of the Debian NTP pool servers: 0.debian.pool.ntp.org (84.255.251.205:123).

Our company operates multiple IoT devices, all of which are configured to sync their system time via NTP. However, we’ve observed that some of these devices received completely incorrect timestamps upon their first-time synchronization.

Each row in the table below represents a different device. The first column indicates the actual timestamp when synchronization occurred, while the second column shows the system time after synchronization with the NTP server.

+------------------------+------------------------+
|      synced at        |       synced to        |
+------------------------+------------------------+
| 20.1.2024 @ 02:01:54  | 01.12.2024 @ 14:28:57  |
| 27.1.2025 @ 02:03:47  | 01.12.2024 @ 14:28:56  |
| 27.1.2025 @ 02:03:50  | 01.12.2024 @ 14:29:00  |
| 27.1.2025 @ 02:04:08  | 01.12.2024 @ 14:28:41  |
| 27.1.2025 @ 02:04:10  | 01.12.2024 @ 14:28:42  |
| 27.1.2025 @ 02:04:12  | 01.12.2024 @ 14:28:45  |
| 27.1.2025 @ 02:04:13  | 01.12.2024 @ 14:28:47  |
| 27.1.2025 @ 02:04:14  | 01.12.2024 @ 14:28:47  |
| 28.1.2025 @ 17:50:14  | 01.12.2024 @ 14:29:27  |
+------------------------+------------------------+

For example, the last device in the table logged the following message:

Dec 01 14:29:27 xxx systemd-timesyncd[23348]: Synchronized to time server for the first time 84.255.251.205:123 (0.debian.pool.ntp.org).

However, this log was actually recorded two days ago, meaning the system time was set incorrectly by nearly two months.

I understand that Debian NTP pool servers are not intended for production environments, but I find it surprising that this kind of issue can even occur. I’d like to understand more about how this is possible and whether similar behavior could happen with other pools, such as europe.pool.ntp.org.

Thanks in advance for any insights!

(this was flagged by Discourse and I just unflagged it; apologies @johnny141).

The monitoring system has twice seen this server wildly inaccurate in January (and some dozen times over the years, out of ~1.3 million checks). Overwhelmingly the server works well, but it is also, several times a year, briefly and wildly inaccurate.

@apuls, will you email the operator and see if they’d be up for sharing what software/hardware they are running and maybe join us on the forum here to see if we can figure out what’s going on?

1 Like

To answer the other question: No, changing the time source to europe.pool.ntp.org would not have helped. NTP server operators do not get to choose if their servers are included in some specific vendor zone, nor does the NTP Pool management make such decisions. In other words, this particular server is not limited to serving time only in the “Debian NTP Pool”. This particular server has been listed in the Europe pool as well.

It would be good practice to have a local NTP server, pulling time from multiple sources, and using that to serve time to your local clients. That way, if one upstream server does this, it will be ignored.
It also reduces bandwidth use and load for the volunteers running the pool servers. Some servers may also rate-limit you when many NTP clients appear from a single IP address, some more aggressively than others, so some of your clients might not be able to get the time at all for a while.
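A minimal chrony configuration for such a local relay server might look like the fragment below. This is a sketch, not a recommendation: the pool name and the allowed subnet are placeholders to adapt to your network.

```
# Sketch of /etc/chrony/chrony.conf for a local NTP relay serving IoT clients.
# Pull time from several pool servers so one bad upstream gets outvoted.
pool 2.debian.pool.ntp.org iburst maxsources 4

# Allow clients on the local network (example subnet) to query this server.
allow 192.168.0.0/16
```

With this in place, each IoT device points only at the local relay, so the pool sees one client instead of a fleet.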

As noted, there is no practical difference between the NTP pool and the Debian NTP pool at this time.

More importantly, the fundamental problem you have is not with the pool, but with the timesyncd software that is being used on your IoT devices to set the clock. It is trusting that none of the thousands of volunteer-run and unauthenticated NTP servers in the pool is ever reporting time that is wildly wrong for even a brief time. This generally works, but is very fragile.

If your IoT devices have battery-backed RTC chips and boot with time correct within a few seconds, I suggest disabling timesyncd in favor of a full-fledged NTP client such as ntpd or chrony which will ensure agreement of multiple servers before adjusting the local clock.

If your devices boot with a time of 1970 or 1980, you can still get their clocks set quickly early in boot without this single-shot fragility, but it might take a bit of work to understand your OS's boot scripts and find a good place to run ntpd -gq 0.debian.pool.ntp.org. 1.debian.pool.ntp.org. 2.debian.pool.ntp.org. 3.debian.pool.ntp.org. or the similar but unmaintained ntpdate 0.debian.pool.ntp.org. 1.debian.pool.ntp.org. 2.debian.pool.ntp.org. 3.debian.pool.ntp.org.

There is undoubtedly a similar chrony invocation available.
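One candidate, assuming chrony is installed, is chronyd's one-shot mode, which accepts configuration directives on the command line. A sketch (to be checked against the chronyd man page for your version; needs root and network access):

```shell
# One-shot clock set with chrony, analogous to "ntpd -gq".
# Queries several pool servers, steps the clock once, then exits.
chronyd -q 'pool 2.debian.pool.ntp.org iburst'
```

Like ntpd -gq, this consults multiple servers before setting the clock, avoiding the trust-a-single-response fragility described above.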

2 Likes

Probably irrelevant, but I was intrigued by the reverse DNS of this IP address:

4.pool.chrony.eu.

@ask Sure :slight_smile:
@marco.davids yeah had done this too. Looks like the “chrony pool project”

1 Like

The general recommendation is to synchronize using disparate time sources.

I monitor the NTP Pool, including 84.255.251.205, independent of the pool’s distributed monitors. This server has a behavior which might be related to the report.

Usually this server runs at stratum 1, occasionally dropping to stratum 2 or 3, and the reported time accuracy is good. Infrequently, though, the server responds indicating stratum 6 with reference ID 127.127.1.1. When these stratum-6 responses were seen, the reported time was in error by days (as much as 92 days).

The common convention is that a Reference ID of 127.127.1.1 means the NTP server is synchronized only to the host's own clock and is unable to get time from a reference clock (e.g., GNSS) or from other NTP servers. I don't know why this NTP server's administrator chose stratum 6 for that situation; it's more common to see stratum 10 returned.
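As an aside, the Reference ID is just a 32-bit field that is conventionally printed as a dotted quad. A quick way to decode the raw value (0x7F7F0101 here is chosen for illustration) is:

```shell
# Decode a raw 32-bit NTP Reference ID into dotted-quad form.
# 0x7F7F0101 corresponds to 127.127.1.1, the ntpd local clock driver.
refid=0x7F7F0101
printf '%d.%d.%d.%d\n' \
  $(( (refid >> 24) & 0xFF )) $(( (refid >> 16) & 0xFF )) \
  $(( (refid >>  8) & 0xFF )) $((  refid        & 0xFF ))
```

This prints 127.127.1.1.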

1 Like

10 is what I recommend for orphan mode, which is much preferable to the local clock driver. 127.127.1.1 reads as the local clock driver to me, which ntpd treats as a zero-delay, minimum-dispersion source of time when no other source is available. It has been deprecated for over a decade, but the message doesn’t seem to reach those promulgating example ntp.conf files. Orphan mode won’t show synchronized if it’s never found a real source, the local driver will. Orphan mode continues to increment the dispersion, the local clock driver does not.

If the server didn’t indicate it’s synchronized, how would the clients synchronize to it? IIRC, with a single ntpd server the main difference between the local driver and orphan configuration is the length of the delay before it activates (4 poll intervals vs 300 seconds?).

In chrony there is the local reference mode enabled by the local directive. It’s not a driver. It has an orphan option for compatibility with the ntpd orphan-specific source selection. The documentation clearly says it makes the server appear synchronized when it’s not. Enabling it on a public server is a bad idea.
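For reference, the chrony directive being discussed looks like this. An illustrative fragment only; as noted above, do not put this in a public server's chrony.conf:

```
# chrony.conf fragment: local reference mode with the orphan option.
# This makes the server appear synchronized (here at stratum 10)
# even when it has no real time source.
local stratum 10 orphan
```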

You’re correct, mlichvar, I was mistaken in saying orphan mode won’t show synchronized if it has never been.

Orphan mode should not be used on public-facing NTP servers. Private servers can do whatever they want.

1 Like

Hello All,

Boris here. I maintain this set of public servers at https://chrony.eu. From time to time a server goes out of sync. Each server in this cluster has its own GPS time module and the following sources:

# chronyc -n sources

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS0                          0   5    37    48   -965ns[-4072ns] +/-  104ns
#x GPS0                          6   1   377     2   +167ms[ +167ms] +/-   25ms
#+ GPS1                          1   4   377    22  -2984ns[-2984ns] +/-  116ns
^- 84.255.251.188                2   5   377    61    -55us[  -55us] +/- 2154us
^- 2a01:260:4068:1::188          2   5   377    32    -21us[  -21us] +/- 2092us
^- 84.255.251.189                2   5   377    62   -189us[ -189us] +/- 1958us
^- 2a01:260:4068:1::189          2   5   377    65   -468us[ -471us] +/- 1683us
^- 84.255.251.204                1   5   377    60   -444ns[ -444ns] +/-   63us
^- 2a01:260:4057:4::204          1   5   377    48    +17us[  +17us] +/-  103us
^- 193.2.1.117                   1   5   377    28   -242us[ -242us] +/-  976us
^- 2001:1470:8000::117           1   6    77    31   -206us[ -206us] +/- 1503us
^- 193.2.1.92                    1   5   377    27   -286us[ -286us] +/- 1625us
^- 2001:1470:8000::92            1   6    77    33   -445us[ -448us] +/- 1222us
^- 193.2.4.2                     2   5   377    28   +161us[ +161us] +/- 3738us
^- 2001:1470:ff80::11            2   6    77    33   +172us[ +169us] +/- 3626us
^? 46.54.224.12                  0   6     0     -     +0ns[   +0ns] +/-    0ns
^? 2a02:2590:0:224::aaaa:12      0   6     0     -     +0ns[   +0ns] +/-    0ns

# chronyc tracking
Reference ID    : 50505330 (PPS0)
Stratum         : 1
Ref time (UTC)  : Fri Mar 07 13:43:50 2025
System time     : 0.000000247 seconds fast of NTP time
Last offset     : +0.000000448 seconds
RMS offset      : 0.000001780 seconds
Frequency       : 7.619 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.016 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000022400 seconds
Update interval : 32.0 seconds
Leap status     : Normal

# chronyd -version
chronyd (chrony) version 4.6.1 (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER -SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)

Sources:

  • PPS0: pulse per second from GNSS time module,
  • GPS0: is RS232 serial line from time module as backup,
  • GPS1: is USB connection to other time module (3->4; 4->3),

The only possible explanation I can see is some startup issue while it synchronizes with other servers. The setup is configured to drop to stratum 11 while not in sync with any external source, reading the clock from the RTC module.

All four servers:

  • have exactly the same HW and SW configuration, except for network-related parameters,
  • do not use any system software for time sync other than chronyd,
  • are clustered in pairs with keepalived: 3 + 4 and 1 + 2,
  • cluster pairs are at different locations,
  • if one fails, the other takes over within 50 ms.

As a non-elegant solution I will implement a periodic check of the time and react if it is out of sync (though this should happen automatically). That still does not explain why the server stayed in this “limbo state” for so long.

Anyhow, thanks for the reminder...

	Regards

		Boris

Does any of those servers have the local directive in chrony.conf? That needs to be removed to avoid serving clients time from an unsynchronized clock.

I looked at a subset of my 2025 monitoring data for chrony.eu and selected the NTP samples that were in error by over 0.5 seconds. The leap indicator (LI, or alarm) was set to 0 for all samples, with or without errors.

Interestingly, when the reference ID was 127.127.1.1, the stratum was 6, 8, 10, or 11.

All error samples had offsets between 42 and 96 days and had the reference ID set to 127.127.1.1. Here are three samples from 2a01:260:4057:4::cc:

  client time          server time            difference (days)
  1737333053.433599    1733059767.478127636   -49.4593
  1738082992.961167    1733059746.562753106   -58.1394
  1741356656.843867    1733059753.442727741   -96.029

All times are given as seconds in the Unix epoch and correspond to reference ID 127.127.1.1. Note the server time: it hardly changes even though the sample dates range from January 2025 to March. Something is wrong with the server’s local clock.
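For anyone wanting to reproduce the arithmetic, the day offsets are just (server - client) / 86400. A sketch using the first sample:

```shell
# Offset in days between server-reported time and true client time,
# both as Unix-epoch seconds (first sample from the table above).
awk -v c=1737333053.433599 -v s=1733059767.478127636 \
    'BEGIN { printf "%.4f\n", (s - c) / 86400 }'
```

This prints -49.4593, matching the first row.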

Local clocks, orphan mode, etc should not be used by an NTP pool server. If the server loses touch with time references it should set alarm to 3 and stratum to 0.

Hello,

These lines are the most essential part of the configuration for sources:

  refclock PPS /dev/pps0 refid PPS0 precision 1e-9 stratum 0 poll 5 lock GPS0
  refclock SHM 0         refid GPS0 precision 1e-6 stratum 6 poll 1
  refclock SHM 1         refid GPS1 precision 1e-8 stratum 1 poll 4
  rtcdevice /dev/rtc0
  hwclockfile /etc/adjtime
  rtcsync
  makestep 1 3
  leapsecmode step

Maybe the local RTC chip has a battery problem. I will replace the battery.

Behavior is like this:

  • A restart may take 15-20 seconds.
  • chronyd starts.
  • It syncs to the local RTC clock.
  • It takes around 8-10 seconds to find network sources and synchronize to a stratum 1 or 2 server.
  • In the meantime, GNSS initialization completes and chronyd syncs to it.

No big science.

Boris

Assuming there are 4 servers, could each battery have failed in the same way?

Rather than go into deeper analysis it would be simpler to remove the local/orphan mode settings.

Thanks for the explanation. We considered migrating to ntpd, but given our limited experience, it felt like too much work. Updating the existing timesyncd configuration also seemed risky, since we have custom logic for handling long offline periods. In the end, we opted for a simpler solution: setting time.aws.com as the primary time server and time.google.com as a backup.
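For completeness, that setup corresponds to a timesyncd configuration along these lines. A sketch; check timesyncd.conf(5) for your systemd version. Note that systemd-timesyncd tries the NTP= servers one at a time and moves on when one fails to respond:

```
# /etc/systemd/timesyncd.conf (sketch of the setup described above)
[Time]
NTP=time.aws.com time.google.com
```

This keeps the single-source fragility discussed earlier in the thread, but at least ties it to anycast services run by a single operator each, rather than to thousands of volunteer servers.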

Hello,

@stevesommars unfortunately this is not possible, because the “local” directive does not exist in the configuration.

:slight_smile:

Boris

You said earlier:

I think that setup is what you need to remove. A system that is not synchronised should not be reporting a stratum of 11.