What on earth is wrong with my stratum 1 servers

Hi Everyone,

Following on from my previous thread, which so many people helped with (thanks), I am zooming in on problems with my S1 servers, which are currently set to get time only from GPS satellites. I want to see this working before adding other servers for resiliency, so for current purposes each is a standalone device.

3 x Raspberry Pi 4 with Adafruit Ultimate GPS HAT, sharing 1 marine GPS antenna, running the latest Raspbian
1 x Raspberry Pi 3B+ with Adafruit Ultimate GPS HAT, 1 magnetic puck antenna, running the latest Raspbian

They all behave exactly the same, which makes me suspect my config or a software bug… Most likely my mistakes.

My Config

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
leapfile /usr/share/zoneinfo/leap-seconds.list
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 127.127.20.0 mode 84 minpoll 3 iburst
fudge 127.127.20.0 stratum 0 flag1 1 flag2 0 flag3 0 flag4 0 time1 0.00 time2 +0.537 refid GPS0
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
restrict 127.0.0.1
restrict ::1
restrict source notrap nomodify noquery

Before the problem occurs:
The output below was recorded about 10 minutes after ntpd was restarted, once the offsets and rootdisp had settled. Everything looks good.

root@raspberrypi:/etc# ntpq -pn -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    5    8  377    0.000    0.001   0.001

associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.97-v7+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=1.075, refid=GPS0,
reftime=e221cfef.55b3ebca  Sun, Mar 22 2020 14:52:15.334,
clock=e221cff4.d42ce8bb  Sun, Mar 22 2020 14:52:20.828, peer=20443, tc=3,
mintc=3, offset=0.001064, frequency=-2.489, sys_jitter=0.000954,
clk_jitter=0.006, clk_wander=0.004, tai=37, leapsec=201701010000,
expire=202006280000

Associations

root@raspberrypi:/etc# ntpq -pn -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    5    8  377    0.000   -0.001   0.003

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 20443  971a   yes   yes  none  pps.peer    sys_peer  1
root@raspberrypi:/etc# 

PPSTest

root@raspberrypi:/etc# ppstest /dev/pps0 
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1584878216.999999638, sequence: 7451 - clear  0.000000000, sequence: 0
source 0 - assert 1584878217.999998981, sequence: 7452 - clear  0.000000000, sequence: 0
source 0 - assert 1584878218.999999312, sequence: 7453 - clear  0.000000000, sequence: 0
source 0 - assert 1584878219.999999119, sequence: 7454 - clear  0.000000000, sequence: 0
source 0 - assert 1584878220.999999913, sequence: 7455 - clear  0.000000000, sequence: 0

Then after about 60 minutes everything goes wrong:
Notice the “no_sys_peer” entry and x on the server entry.

root@raspberrypi:/etc# ntpq -pn -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x127.127.20.0    .GPS0.           0 l    1    8  377    0.000  -20.345   1.154

associd=0 status=0018 leap_none, sync_unspec, 1 event, no_sys_peer,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.97-v7+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=902.523, refid=GPS0,
reftime=e221e077.55ac2657  Sun, Mar 22 2020 16:02:47.334,
clock=e221e1e0.523212b0  Sun, Mar 22 2020 16:08:48.321, peer=0, tc=3,
mintc=3, offset=-8.531103, frequency=30.136, sys_jitter=431.788750,
clk_jitter=38.617, clk_wander=12.217, tai=37, leapsec=201701010000,
expire=202006280000

Associations

root@raspberrypi:/etc# ntpq -pn -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x127.127.20.0    .GPS0.           0 l    6    8  377    0.000  -23.204   1.160

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 20443  911a   yes   yes  none falsetick    sys_peer  1
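
The status column above (971a while healthy, 911a once things go wrong) can be unpacked by hand. The following is a minimal Python sketch, assuming the usual ntpq peer status word layout (5 status bits, a 3-bit select field, a 4-bit event count and a 4-bit event code). The select names are taken from the tally table quoted later in this thread; the function name decode_peer_status is made up for illustration.

```python
# Select-field names and tally characters, as listed in the table quoted
# later in this thread.
SELECT = {
    0: ("sel_reject", " "),
    1: ("sel_falsetick", "x"),
    2: ("sel_excess", "."),
    3: ("sel_outlier", "-"),
    4: ("sel_candidate", "+"),
    5: ("sel_backup", "#"),
    6: ("sel_sys.peer", "*"),
    7: ("sel_pps.peer", "o"),
}


def decode_peer_status(word):
    """Split a 16-bit peer status word (hex string) into its fields.

    Assumed layout: bits 15-11 status flags, bits 10-8 select,
    bits 7-4 event count, bits 3-0 last event code.
    """
    value = int(word, 16)
    name, tally = SELECT[(value >> 8) & 0x7]
    return {
        "configured": bool(value & 0x8000),  # "conf" column
        "reachable": bool(value & 0x1000),   # "reach" column
        "select": name,
        "tally": tally,
        "event_count": (value >> 4) & 0xF,
        "event_code": value & 0xF,
    }
```

With the two words from this thread, decode_peer_status("971a") reports sel_pps.peer (tally o) and decode_peer_status("911a") reports sel_falsetick (tally x), which matches the condition column ntpq prints. Only the select field changed between the two captures.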

PPSTest
PPS is now out of whack.

root@raspberrypi:/etc# ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1584882546.021158350, sequence: 11780 - clear  0.000000000, sequence: 0
source 0 - assert 1584882547.021191766, sequence: 11781 - clear  0.000000000, sequence: 0
source 0 - assert 1584882548.021221952, sequence: 11782 - clear  0.000000000, sequence: 0
source 0 - assert 1584882549.021255523, sequence: 11783 - clear  0.000000000, sequence: 0
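
To put a number on "out of whack", the ppstest lines can be reduced to the distance of each assert timestamp from the nearest whole second, and the rate at which that distance changes. This is a rough Python sketch; the function names are made up for illustration, and it assumes the ppstest line format shown above.

```python
import re

# Matches e.g. "assert 1584882546.021158350, sequence: 11780"
ASSERT_RE = re.compile(r"assert (\d+)\.(\d{9}), sequence: (\d+)")


def pps_offsets_ns(lines):
    """Return (sequence, offset_ns) pairs for each ppstest assert line.

    offset_ns is the signed distance of the system-clock timestamp from
    the nearest whole second, in nanoseconds (integers, no float rounding).
    """
    offsets = []
    for line in lines:
        match = ASSERT_RE.search(line)
        if match is None:
            continue
        nanos = int(match.group(2))
        if nanos >= 500_000_000:
            nanos -= 1_000_000_000  # just before the second: negative offset
        offsets.append((int(match.group(3)), nanos))
    return offsets


def drift_ppm(offsets):
    """Average change in offset per pulse between first and last sample.

    One pulse is one second, and 1000 ns per second equals 1 ppm.
    """
    (first_seq, first_ns), (last_seq, last_ns) = offsets[0], offsets[-1]
    return (last_ns - first_ns) / (last_seq - first_seq) / 1000.0
```

Fed the four "after" lines above, the pulses land about 21 ms after the system clock's second boundary and that gap grows by roughly 32 microseconds per pulse, that is about 32 ppm. The "before" capture stays within a microsecond of the boundary. The system clock is the one doing the timestamping here, so this shows the clock and the PPS moving apart; it does not by itself say which of the two is wrong.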

Can anybody suggest how I can solve this?

I have tried flag3 with both options, clock discipline and kernel discipline. It doesn't make any difference.

Thanks in advance,

Alex

Are you logging your GPS data? Is it losing satellites? Is the GPS configured to continue giving a PPS pulse even when it loses lock?

Does iburst even work on local refclocks?

How did you determine the time2 offset of +0.537? A half-second offset kind of makes me think you are triggering on the wrong edge of the PPS.

Hi,

Thanks for the interest.

I determined the time2 offset by doing a 60 minute collection as below:

watch -n0.5 "ntpq -pn | grep '.GPS0. 1 1 1' | tee --append GPSD_Offset.txt"

awk '{total=total+$9; count=count+1} END {print "Total:"total; print "Count:"count; print " Avg:"total/count}' GPSD_Offset.txt

I can try the other edge of PPS easily enough, that is flag2.

I have no idea about iburst on refclocks. Let me remove it and see what happens. My GPS does NOT continue PPS when it loses satellite lock. Whenever I have had this situation I have never seen a loss of satellites. Also, it's quite predictable: it happens about 60 minutes after starting ntpd. If I restart ntpd, it works again for a while and then stops after 60 minutes.

I don't think I am losing satellites. I am not logging anything about them (not sure how to), but see the signal levels below (I am in the tropics, almost on the equator, with the antenna placed outside in clear view).

Thanks,

Alex

Have you considered running ntpviz? It might help narrow down whether there’s an intermittent GPS reception issue. I’ve recently put it on my Debian box that has a Garmin GPS-18 LVC puck. My graphs are here: https://ljay.org.uk/ntpviz/day/

I’ve definitely got an occasional issue where reception drops off causing TDOP to spike!

The ntpviz details are here: https://docs.ntpsec.org/latest/ntpviz.html

I know you’re not running gpsd, but this page gives a few good all round hints and tips: https://gpsd.gitlab.io/gpsd/gpsd-time-service-howto.html including calculating the GPS offset from the ntpd peerstats file.

Maybe it’s worth running ppswatch instead of ppstest. You can use “-a” or “-c” to change whether it watches the leading or trailing edge of the PPS signal. When you ^C it it gives some stats too.

Hi,

Regarding ntpviz it seems to come as part of the ntpsec package and not the regular ntp package.

I have installed the file from github, but running throws errors:

root@raspberrypi:~# ntpviz 
ntpviz: ERROR: can't find Python NTP library.
No module named ntp.statfiles
Check your PYTHONPATH
root@raspberrypi:~#

Before I go running all over to fix that, can you confirm that ntpviz works with regular ntp package not just ntpsec package? Doing apt install ntpsec works to install ntpviz but also wants to remove ntp.

Any other tips to get it going? (I have installed liberation font, gnuplot, python3-ntp and so on.)

Thanks,

Alex

Hi, yes, I’m running plain ntpd, not ntpsec. On my Debian box it says it depends on:

adduser
fonts-liberation
gnuplot
python3 (>= 3.3)
python3-ntp
systemd | cron | cron-daemon

OK, and how did you install ntpviz? Via git clone?

Using aptitude so it sorted out the dependencies for me :grinning:

So here is an update, what does the no_sys_peer mean?

root@ntp-s1-1:/home/pi# ntpq -pn -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    4    8  377    0.000   -0.019   0.002

associd=0 status=0418 leap_none, sync_uhf_radio, 1 event, no_sys_peer,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.75-v7l+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=1.045, refid=GPS0,
reftime=e2222e67.b59b4aaf  Sun, Mar 22 2020 18:35:19.709,
clock=e2222e6b.5a1725a3  Sun, Mar 22 2020 18:35:23.351, peer=31651, tc=3,
mintc=3, offset=-0.019002, frequency=-13.755, sys_jitter=0.002317,
clk_jitter=0.002, clk_wander=0.011, tai=37, leapsec=201701010000,
expire=202006280000

Yet the NMEA driver is still selected, and the numbers look OK. I mean, the offset is not the best, but it's OK, isn't it?

Alex

Ummm… Exactly how is that supposed to tell you anything relevant? You would need a second (accurate) source to compare time with.

From previous posting:

The page linked says the “o” status means “PPS peer (when the prefer peer is valid)”

The suggestion from the GPSD page is:

Start ntpd and let it run for at least four hours. Periodically check progress with “ntpq -p” and wait until change has settled out.

Calculate the average GPS offset using this script (a copy is included as contrib/ntpoffset in the GPSD distribution):

awk '
/127.127.28.0/ { sum += $5 * 1000; cnt++; }
END { print sum / cnt; }
' </var/log/ntpstats/peerstats

This prints the average offset.

  1. Adjust the “time1” value for unit 0 of your ntp.conf (the non-PPS channel) by subtracting the average offset from step 4.
  2. Restart ntpd.
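
For anyone who would rather not use awk, the same averaging can be sketched in Python. This assumes the standard peerstats layout (day, seconds, address, status, offset in seconds, then delay, dispersion and jitter); the function name is made up for illustration. Unlike the awk pattern, which matches the address anywhere in the line, this compares the third field exactly. Note that the GPSD example uses 127.127.28.0 (the shared-memory driver), whereas the NMEA driver in the original post is 127.127.20.0.

```python
def average_offset_ms(lines, address):
    """Average the peerstats offset column for one source, in milliseconds.

    lines   -- iterable of peerstats lines (for example an open file)
    address -- refclock or server address to select, e.g. "127.127.20.0"
    """
    total = 0.0
    count = 0
    for line in lines:
        fields = line.split()
        # Field 3 is the address, field 5 the offset in seconds.
        if len(fields) < 5 or fields[2] != address:
            continue
        total += float(fields[4]) * 1000.0
        count += 1
    if count == 0:
        raise ValueError("no peerstats samples found for " + address)
    return total / count
```

A typical call would open /var/log/ntpstats/peerstats (the statsdir from the config at the top of the thread) and pass the file object as lines. As the replies point out, the result is only meaningful as a fudge value when ntpd is being steered by some other source while the samples are collected.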

But their minimal configuration has a pool command so your machine is able to compare time against other servers…

That was kind of my point. You need other time sources in the list so NTP will steer the clock towards one of those. Otherwise if you have only one source, and are recording the offset value, it’s merely telling you how much NTP is lagging behind keeping the clock in sync with the one source.

Hi Alex,

Serial delay and stability are two things that USB-attached GPS devices can suffer from.

I am one of the lurkers that doesn’t ever post, but I looked at your output and noted the jitter is way up and had to speak on my experience. I always like to dig out the table that explains the flags. Your clock was getting the x and the o and the * when it was stable.

Code  Message        T  Description
0     sel_reject        discarded as not valid (TEST10-TEST13)
1     sel_falsetick  x  discarded by intersection algorithm
2     sel_excess     .  discarded by table overflow (not used)
3     sel_outlier    -  discarded by the cluster algorithm
4     sel_candidate  +  included by the combine algorithm
5     sel_backup     #  backup (more than tos maxclock sources)
6     sel_sys.peer   *  system peer
7     sel_pps.peer   o  PPS peer (when the prefer peer is valid)

Anything other than a * (splat) will generate a no_sys_peer if you don’t have any other reference clocks, servers or peers it can access.
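
The status= word on the first line of the rv output can be unpacked the same way as the peer word. Below is a small Python sketch, assuming the usual ntpq system status layout (2-bit leap indicator, 6-bit sync source, 4-bit event count, 4-bit last event code). Only the names that actually appear in this thread are filled in, and the function name is made up for illustration.

```python
# Only the codes seen in this thread are named; anything else is shown
# numerically rather than guessed.
SOURCES = {0: "sync_unspec", 4: "sync_uhf_radio"}
EVENTS = {5: "clock_sync", 8: "no_sys_peer"}


def decode_system_status(word):
    """Split a 16-bit system status word (hex string) into its fields.

    Assumed layout: bits 15-14 leap, bits 13-8 sync source,
    bits 7-4 event count, bits 3-0 last event code.
    """
    value = int(word, 16)
    source = (value >> 8) & 0x3F
    event = value & 0xF
    return {
        "leap": (value >> 14) & 0x3,
        "source": SOURCES.get(source, "source %d" % source),
        "event_count": (value >> 4) & 0xF,
        "last_event": EVENTS.get(event, "event %d" % event),
    }
```

Applied to the three words posted so far: 0415 decodes to sync_uhf_radio with last event clock_sync, 0018 to sync_unspec with last event no_sys_peer, and 0418 to sync_uhf_radio with last event no_sys_peer. Under this reading the trailing field is the most recent event rather than the current state, so in the 0418 capture the source field shows the clock is synchronised again while no_sys_peer is simply the last thing that was logged.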

I have enough clocks locally that I don't slave off the open NTP stratum 1 servers as much. The trouble I always ran into was the hardware inside the USB-to-serial converter. The brand (chipset and microcode) makes a huge difference in the stability of the circuit.

I keep the configs simple, and find that adding too many options can complicate things. Also, the documentation indicates that iburst can be used with any server, and a reference clock is one type of server (someone asked on the thread). On my servers, iburst didn't seem to do much for a reference clock because, on a restart, the reference clock is polled and acquired first, so I don't use the option.

After it has started and been running for a few minutes, I can see it start to poll the others. The algorithms will switch away from the reference clock and select another server until the local reference clock signal steadies out, and then I usually see it stay put after a while. The cool thing about the NTP daemon is that it will switch among the peers and servers and select the best one, and will continue to serve time even if a reference source of any kind is lost, as long as there is a pool to select from.

I have 4 Symmetricom dedicated time servers and 5 Unix servers. I tie the hosts to the output of the serial ports on the clocks using a USB to serial converter.

One manufacturer's converter was so unstable that the time fluctuated by as much as 5-10 ms, and the jitter never fell below 1 and was always between 1 and 10. Having watched the output for years, I can say NTP does not like it when reference clocks move around too sharply. Hence the jitter you see is causing the clock to become unstable and, with no other servers to fall back on, causing the no-peer condition.

I now am running Sabrent USB to Serial USB 2.0 converters and the circuit is not only way more stable, but the delay induced is far less by a factor of 100.

Here is one of my clocks output with the same command line you used.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.5.0     .GPS.            0 l    1   16  377    0.000   -0.202   0.189
+192.168.1.10    .GPS.            1 u    2   32  377    0.230    0.052   0.101
-192.168.1.32    .GPS.            1 u    2   32  377    0.222    1.840   0.155
+192.168.1.31    .GPS.            1 u   20   32  377    0.219    0.123   0.129
-192.168.1.9     .IRIG.           1 u   16   32  377    0.345    0.105   0.128
-140.142.234.133 172.22.16.38     2 u    6   64  377   40.418    3.175   0.330

associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="x86_64",
system="Linux/5.3.0-42-generic", leap=00, stratum=1, precision=-24,
rootdelay=0.000, rootdisp=1.015, refid=GPS,
reftime=e2280625.006e2e48 Thu, Mar 26 2020 21:57:09.001,
clock=e2280626.c262d1da Thu, Mar 26 2020 21:57:10.759, peer=62365, tc=4,
mintc=3, offset=-0.035920, frequency=-3.990, sys_jitter=0.365607,
clk_jitter=0.106, clk_wander=0.020, tai=37, leapsec=201701010000,
expire=202006280000

Here is my config for that clock, which uses the older converter (it works):

server 127.127.5.0 minpoll 4 # Locally attached TrueTime GPSDO-Rubidium PPS
fudge 127.127.5.0 refid GPS # Reference to S200 GPSDO-Rubidium GPS
fudge 127.127.5.0 time1 0.001355 # Fudge factor for serial port delay
server timex.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S200 GPSDO-Rubidium GPS
server pulsar.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S350 GPSDO-Rubidium GPS
server rolex.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S350 GPSDO-Rubidium GPS
server omega.anomalink.com minpoll 4 # Local Stratum 1 Server Symmetricom S250i IRIG-B, 10 MHz, + References
server bigben.cac.washington.edu # Internet based stratum 1 server GPS
#server 192.168.1.17 minpoll 4 # Local Stratum 1 Server Jackson Labs GPSDO-TCXO PPS NMEA
#server india.colorado.edu # Internet based Stratum 1 Server NIST

Here is the other one that has the Sabrent converter

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.5.0     .GPS.            0 l    9   16  377    0.000    0.069   0.128
-192.168.1.10    .GPS.            1 u    -   16  377    0.523   -0.181   0.075
-192.168.1.32    .GPS.            1 u   14   16  377    0.338    1.672   0.140
+192.168.1.31    .GPS.            1 u   11   16  377    0.411   -0.035   0.114
+192.168.1.9     .IRIG.           1 u    1   16  377    0.351   -0.098   0.101
-140.142.2.8     172.22.16.38     2 u   25   64  377   32.864    2.331   1.132

associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="x86_64",
system="Linux/5.3.0-42-generic", leap=00, stratum=1, precision=-24,
rootdelay=0.000, rootdisp=1.150, refid=GPS,
reftime=e2281831.000703a2 Thu, Mar 26 2020 23:14:09.000,
clock=e228183a.f5de065b Thu, Mar 26 2020 23:14:18.960, peer=2284, tc=4,
mintc=3, offset=-0.013202, frequency=-7.265, sys_jitter=0.178096,
clk_jitter=0.108, clk_wander=0.020, tai=37, leapsec=201701010000,
expire=202006280000

and here is the config, but note time1 value on this server:

server 127.127.5.0 minpoll 4 # Locally attached TrueTime GPSDO-Rubidium PPS
fudge 127.127.5.0 refid GPS # Reference to S200 GPSDO-Rubidium GPS
fudge 127.127.5.0 time1 0.000055 # Fudge factor for serial port delay
server timex.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S200 GPSDO-Rubidium GPS
server pulsar.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S350 GPSDO-Rubidium GPS
server rolex.anomalink.com minpoll 4 iburst # Local Stratum 1 Server Symmetricom S350 GPSDO-Rubidium GPS
server omega.anomalink.com minpoll 4 # Local Stratum 1 Server Symmetricom S250i IRIG-B, 10 MHz, + References
server bigben.cac.washington.edu # Internet based stratum 1 server GPS
#server 192.168.1.17 minpoll 4 # Local Stratum 1 Server Jackson Labs GPSDO-TCXO PPS NMEA
#server india.colorado.edu # Internet based Stratum 1 Server NIST

Note: I just slew the clock with “time1” and leave the rest of the options default.

The way I tuned my servers was to use ntpq with the MD5 keys to make the adjustments on the fly instead of restarting the daemon.

So, we would need to understand what your circuit looks like, and how it is connected. I used the same procedures and config with GPSD NMEA clocks attached serially as well.

So perhaps it's the hardware? I think you will learn a lot more once you add some other peers or servers to the mix.

Anyway, this may not be your issue, but at least you have my experience.

Allyn
