Server consistently off

z.a and z.b share the same configuration file, except for the peers and servers. The other parameters disable logging and set the drift file, the restrictions, and one tinker option I added a couple of days ago to see whether it would bring z.a and z.b closer: huff-n'-puff.

After connecting z.a’s second interface to a different switch port with a different cable, it still shows the same persistent offset:

 remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-192.168.z.c     111.111.111.222  3 u  600 1024  377    0.519    7.749   2.682
+192.168.z.b     111.7.1.66       2 s   51   64  377    0.394    8.372   0.816
*2222:ffff::123: .GPS.            1 u  760 1024  377   34.453    8.054   1.973
-2222:111:bbbb:e 222.111.2.5      2 u  574 1024  377   31.686   10.712   5.204
+2222:0:eee:4444 222.7.222.33     2 u  400 1024  377   35.810    8.849   3.365

Did you check that there is no chronyd, timesyncd, or ntpdate running from cron, or anything else controlling the clock?

There can be only one listener on a UDP port but, no, none of those daemons are running in the background, and there is no scheduled time-reset job.

An NTP server needs to bind to the UDP port, but not a client.
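The following minimal Python sketch illustrates that point. It is an illustration only, not how any of the daemons above are implemented. A UDP socket that is never explicitly bound gets an ephemeral local port from the kernel. A time client can therefore talk to a server’s port 123 without holding that port itself. The 127.0.0.1 target is arbitrary, and connect() on a UDP socket sends no packet.

```python
import socket


def client_socket(server=("127.0.0.1", 123)):
    """Create a UDP socket that is never bound and 'connect' it to a server.

    connect() on a UDP socket only records the default destination; because
    the socket was not bound explicitly, the kernel picks an ephemeral local
    port for it at this point.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(server)
    return s


a = client_socket()
b = client_socket()
port_a = a.getsockname()[1]
port_b = b.getsockname()[1]
print(port_a, port_b)  # two different ephemeral ports, neither of them 123
a.close()
b.close()
```

So the fact that port 123 has only one listener doesn’t by itself rule out another client adjusting the clock.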

It’s not possible for a working control loop to have a constantly positive input and not eventually hit the frequency limit (±500 ppm). One explanation might be that ntpd is running in a container with no permission to set the clock, or that the discipline is disabled.
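To make the control-loop argument concrete, here is a toy Python model. It assumes a simple integrator with a hypothetical gain and is not ntpd’s actual discipline algorithm. If something outside the loop keeps undoing the corrections, the measured offset stays positive. The frequency then ratchets up until it sits at the ±500 ppm clamp.

```python
def simulate_frequency(offset_us, gain_ppm_per_us, steps, limit_ppm=500.0):
    """Toy integrator: each update nudges the frequency by gain * offset.

    Hypothetical, heavily simplified model of the frequency part of a clock
    discipline. The offset is held constant, as it would be if an outside
    agent kept resetting the clock, so the loop's correction never shows up
    in the next measurement.
    """
    freq = 0.0
    history = []
    for _ in range(steps):
        freq += gain_ppm_per_us * offset_us
        # Clamp at the frequency tolerance, as ntpd and the kernel do at 500 ppm.
        freq = max(-limit_ppm, min(limit_ppm, freq))
        history.append(freq)
    return history


h = simulate_frequency(offset_us=8000.0, gain_ppm_per_us=0.001, steps=100)
print(h[0], h[-1])
```

With an 8 ms offset (roughly what the billboard above shows) and this made-up gain, the frequency reaches the clamp after about 63 updates and stays there. A loop whose corrections actually take effect would instead see the offset shrink and the frequency settle.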

If you want to debug this further, I’d suggest posting your ntp.conf and the loopstats log.

The ntpd distributions from ntp.org and ntpsec.org include a program called “ntptime”.
$ ntptime
ntp_gettime() returns code 0 (OK)
time e2bace6b.a63dd510 Thu, Jul 16 2020 8:02:35.649, (.649381825),
maximum error 112136 us, estimated error 55 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 41.186 us, frequency -6.451 ppm, interval 1 s,
maximum error 112136 us, estimated error 55 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 500 ppm,

Try running ntptime once a minute. Do the offset and frequency change?
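One way to do that is a small Python helper that runs ntptime on a schedule and extracts the offset, frequency and status word. The regular expressions assume the output format shown above. watch() and parse_ntptime() are hypothetical names, and the script assumes ntptime is on the PATH.

```python
import re
import subprocess
import time

# Patterns for the "offset ... us, frequency ... ppm" and "status 0x..." lines
# of ntptime output (format taken from the sample above).
OFFSET_RE = re.compile(r"offset\s+(-?[\d.]+)\s+us,\s+frequency\s+(-?[\d.]+)\s+ppm")
STATUS_RE = re.compile(r"status\s+(0x[0-9a-fA-F]+)")


def parse_ntptime(text):
    """Return (offset_us, frequency_ppm, status) parsed from ntptime output."""
    m = OFFSET_RE.search(text)
    s = STATUS_RE.search(text)
    if not m or not s:
        raise ValueError("unrecognised ntptime output")
    return float(m.group(1)), float(m.group(2)), int(s.group(1), 16)


def watch(samples=10, interval_s=60):
    """Run ntptime every interval_s seconds and print offset, frequency, status."""
    for _ in range(samples):
        out = subprocess.run(["ntptime"], capture_output=True, text=True).stdout
        offset, freq, status = parse_ntptime(out)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} offset {offset:10.3f} us  freq {freq:8.3f} ppm  status {status:#06x}")
        time.sleep(interval_s)
```

Call watch(), for example with samples=60, and compare successive lines. If the loop is working, the offset should shrink over time rather than staying on one side of zero.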

By the way, it’s recommended to include the leap seconds file in ntp.conf:

leapfile        /etc/ntp/leap-seconds.list

Then download the leap seconds file from the Paris Observatory twice a year:

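#!/bin/sh
# Update the leap seconds list from IERS & Paris Observatory.
# https://datacenter.iers.org/productMetadata.php?id=16
# http://hpiers.obspm.fr/eop-pc/index.php?index=bulletins&lang=en
# ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat
set -e
cd /etc/ntp
# Download under a temporary name, then replace the old file. Plain wget would
# save the new copy as leap-seconds.list.1 once the file already exists.
/usr/bin/wget -q -O leap-seconds.list.new https://hpiers.obspm.fr/iers/bul/bulc/ntp/leap-seconds.list
mv leap-seconds.list.new leap-seconds.list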
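Because the list carries its own expiry date, a small Python check can tell when the twice-yearly download is overdue. This assumes the usual leap-seconds.list layout, where the “#@” line holds the expiry time as NTP seconds since 1900. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def leap_expiry_from_lines(lines):
    """Return the expiry date from the '#@' line of a leap-seconds.list."""
    for line in lines:
        if line.startswith("#@"):
            ntp_seconds = int(line[2:].split()[0])
            return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)
    raise ValueError("no '#@' expiry line found")


def leapfile_expiry(path="/etc/ntp/leap-seconds.list"):
    """Read the file named in the leapfile directive and return its expiry."""
    with open(path) as f:
        return leap_expiry_from_lines(f)
```

Comparing leapfile_expiry() with datetime.now(timezone.utc) in a cron job is one way to get a warning before the file goes stale.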

Here are three runs of ntptime, taken five minutes apart:

ntp_gettime() returns code 0 (OK)
  time e2bb4f53.c35e418c  Thu, Jul 16 2020 17:12:35.763, (.763157937),
  maximum error 58154 us, estimated error 4145 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 6310.265 us, frequency 5.446 ppm, interval 1 s,
  maximum error 58154 us, estimated error 4145 us,
  status 0x2001 (PLL,NANO),
  time constant 10, precision 0.001 us, tolerance 500 ppm,

ntp_gettime() returns code 0 (OK)
  time e2bb507f.c456c4c0  Thu, Jul 16 2020 17:17:35.766, (.766949040),
  maximum error 208154 us, estimated error 4145 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 5864.555 us, frequency 5.446 ppm, interval 1 s,
  maximum error 208154 us, estimated error 4145 us,
  status 0x2001 (PLL,NANO),
  time constant 10, precision 0.001 us, tolerance 500 ppm,

ntp_gettime() returns code 0 (OK)
  time e2bb51ab.c536d60c  Thu, Jul 16 2020 17:22:35.770, (.770368526),
  maximum error 19500 us, estimated error 0 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset -2323.382 us, frequency 4.125 ppm, interval 1 s,
  maximum error 19500 us, estimated error 0 us,
  status 0x2001 (PLL,NANO),
  time constant 6, precision 0.001 us, tolerance 500 ppm,

ntpd is running on bare metal: no VM, no container. How can I check that the kernel discipline is enabled?
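As I understand it, one way to check is the status word that ntptime prints, which is the kernel’s timex status. STA_PLL set with STA_UNSYNC clear means the kernel PLL is enabled and the clock is not flagged as unsynchronized. ntptime already decodes the flags in parentheses. The Python sketch below does the same for scripting, using the bit values from the Linux <linux/timex.h> header (other kernels may differ). The ntpdc sysinfo “system flags” line is another indicator: it lists “kernel” when ntpd uses the kernel discipline.

```python
# Kernel clock status bits (STA_* constants from Linux <linux/timex.h>).
STA_BITS = {
    0x0001: "PLL",        # STA_PLL: PLL updates enabled
    0x0002: "PPSFREQ",
    0x0004: "PPSTIME",
    0x0008: "FLL",
    0x0010: "INS",
    0x0020: "DEL",
    0x0040: "UNSYNC",     # STA_UNSYNC: clock not synchronized
    0x0080: "FREQHOLD",
    0x0100: "PPSSIGNAL",
    0x0200: "PPSJITTER",
    0x0400: "PPSWANDER",
    0x0800: "PPSERROR",
    0x1000: "CLOCKERR",
    0x2000: "NANO",
    0x4000: "MODE",
    0x8000: "CLK",
}
STA_PLL = 0x0001
STA_UNSYNC = 0x0040


def decode_status(status):
    """Return the names of the STA_* bits set in a timex status word."""
    return [name for bit, name in STA_BITS.items() if status & bit]


def kernel_pll_active(status):
    """True if the kernel PLL is enabled and the clock is not flagged unsynchronized."""
    return bool(status & STA_PLL) and not (status & STA_UNSYNC)


print(decode_status(0x2001))  # ['PLL', 'NANO']
```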

On Debian you can use /usr/share/zoneinfo/leap-seconds.list from the tzdata package.

What is the output of the ntpdc -c sysinfo command?

There are several ways to disable the loop completely, e.g. “disable ntp” or the “noselect” option in ntp.conf. The kernel loop can be disabled with the -x option, “tinker step”, or “disable kernel”. That’s why we ask for ntp.conf. It would help if you could enable the loopstats log and post a day’s worth of data. That would confirm whether the loop is getting only positive offsets and show how the frequency is changing.
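Once loopstats is enabled, a Python sketch like this can answer both questions from a day’s file. It reports what fraction of offsets are positive and how far the frequency moved. It assumes the standard loopstats columns: MJD, seconds past midnight, offset in seconds, frequency in ppm, RMS jitter, wander and time constant. summarize_loopstats() and LoopSummary are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoopSummary:
    samples: int
    positive_fraction: float
    freq_min_ppm: float
    freq_max_ppm: float


def summarize_loopstats(lines):
    """Summarize loopstats records.

    Expected columns: MJD, seconds, offset (s), frequency (ppm),
    RMS jitter (s), wander (ppm), time constant.
    """
    offsets, freqs = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        offsets.append(float(fields[2]))
        freqs.append(float(fields[3]))
    if not offsets:
        raise ValueError("no loopstats records")
    positive = sum(1 for o in offsets if o > 0)
    return LoopSummary(len(offsets), positive / len(offsets), min(freqs), max(freqs))
```

Usage, assuming the daily file naming produced by “filegen loopstats ... type day” (the exact path depends on statsdir): with open("loopstats.20200717") as f: print(summarize_loopstats(f)). A positive_fraction close to 1.0 together with a frequency pinned near 500 ppm would match the “constantly positive input” scenario.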

It could also be a kernel bug. What OS, kernel version and ntpd version are you running? If you don’t provide any information, we can only guess what’s wrong.

$ ntpdc -c sysinfo
system peer:          ...
system peer mode:     client
leap indicator:       00
stratum:              2
precision:            -23
root distance:        0.03351 s
root dispersion:      0.03400 s
reference ID:         [61.150.110.96]
reference time:       e2bc5040.9c7267d5  Fri, Jul 17 2020 11:28:48.611
system flags:         auth monitor ntp kernel stats 
jitter:               0.000214 s
stability:            0.000 ppm
broadcastdelay:       0.000000 s
authdelay:            0.000026 s

See the first post above (“Server consistently off”).

Here’s the ntp.conf file:

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift

# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# Router
server 192.168.z.c iburst
# Peers
peer 192.168.z.b maxpoll 6
# Source #1
server ... preempt 
# Source #2
server ... iburst preempt 
server ... iburst preempt 

# Default
discard average 5
restrict default limited kod nopeer noquery notrap

# Local net
restrict 192.168.z.0         mask 255.255.255.0         limited kod
restrict 2222:1111:cc:5555:: mask ffff:ffff:ffff:fff0:: limited kod

restrict 169.254.0.0         mask 255.255.0.0           limited kod nopeer 
restrict fe80::              mask ffe0::                limited kod nopeer 

# Local users
restrict 127.0.0.1
restrict ::1

I’ll enable loopstats collection and share the data later.

It’s difficult to know which addresses are real and which are obfuscated, but if reference ID [61.150.110.96] is real, whois shows it as a Chinese IP address that’s not in the NTP Pool. Is that expected?

I do not understand one thing. For the host 192.168.z.a:

In your very first post:

I see no host that could match the IP 61.150.110.96. The first two hosts are on the local network with private IP addresses, and the last three are IPv6 hosts. There is no server/peer line that could be the public IP 61.150.110.96.

Folks, when the peer is IPv6, methinks the reference ID is a 32-bit hash.

Update:

Since NTP was designed when IPv4 was the only game in town, using the IP address to detect time loops was a really great idea. For S2+ packets, we displayed the refid as an IPv4 address, which meant it was obvious where each machine was getting its time from. But then came IPv6, and a way needed to be found to translate the 128 bits that make up an IPv6 address into a 32-bit field. The solution that was decided on was to use the first 4 bytes (32 bits) of the MD5 hash of the IPv6 address.

(Source: https://www.nwtime.org/ntps-refid/)
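Here is a Python sketch of that scheme. It is based only on the description quoted above and has not been checked against ntpd’s source. It hashes the 16-byte IPv6 address with MD5 and shows the first four digest bytes as a dotted quad. That is why a refid such as 61.150.110.96 need not belong to any real IPv4 host.

```python
import hashlib
import ipaddress


def ipv6_refid(addr):
    """Refid for an IPv6 source per the description above.

    Takes the first 4 bytes of the MD5 digest of the 16-byte address and
    formats them as a dotted quad.
    """
    packed = ipaddress.IPv6Address(addr).packed  # 16 bytes, network byte order
    digest = hashlib.md5(packed).digest()
    return ".".join(str(b) for b in digest[:4])
```

If the description is accurate, feeding in the system peer’s full IPv6 address should reproduce the refid that this server’s own clients see.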

FWIW, here’s how ntpd is launched:
/usr/sbin/ntpd -p /var/run/ntpd.pid -gLN -u xx:yy

Is this the same stratum 1 server on z.a and z.b?

server         remote           refid      st t when poll reach   delay   offset  jitter
_xxxx::yy   *xxxx::123:     .GPS.           1 u  459 1024  377   36.360    6.783   2.814
192.168.z.b +2607:ffff::123: .GPS.          1 u  287 1024  377   35.063    0.367   2.363

If the loopstats output doesn’t help, my suggestion would be to configure z.a exactly the same way as z.b or z.c to eliminate as many differences as possible and use a known working configuration to see what happens.

Folks,

I’m embarrassed to say that there was indeed another interloper resetting the system time. None of the known daemons were competing with ntpd for the system time, but the NAS UI software had an option, buried somewhere, to keep the time in sync with “Internet time”. After I disabled it, ntpd keeps the system time properly synchronized with the other servers:

 remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+192.168.z.c     222.55.111.111   2 u  339 1024  377    0.312   -0.236   0.463
-192.168.z.b     66.111.111.99    2 s   12   64  377    0.150   -0.686   0.095
*2222:ffff::123: .GPS.            1 u  912 1024  377   34.440   -0.621   0.776
+2222:111:bbbb:e 222.111.2.77     2 u  346 1024  377   28.605   -0.518   0.826
-2222:0:eee:4011 222.7.222.33     2 u  979 1024  377   35.972    0.983   0.832

As for why it once worked, I honestly can’t account for it, since I don’t remember setting this option in the NAS UI.

I appreciate your patience and insight.
