What happened here? Bad Stratum 7 13 and 15

Prior to 2020-03-09 22:00 UTC the three stratum 1’s had a root dispersion of 150 msec. After that time the stratum 1’s improved to usually under 5 msec root dispersion, but sometimes as bad as 25 msec. This suggests the stratum 1’s are struggling. If the GPS signal is good, this could be a result of variation in CPU load / temperature on the RPi’s.

Post 2020-03-09 22:00 the stratum 2 servers are generally good, but sometimes drop to stratum 3.

I would focus on getting the stratum 1 servers working well.

Hello Everyone,

Well, firstly thanks to everyone here who contributed. It was a massive outpouring of knowledge, experience and support.

Net effect was that you all sent me back down the rabbit hole, and I now resurface with some results to show.

@stevesommars - yes, I saw that using time1 to fudge the GPS time close to internet time was making S2 show an est. error of 162 ms or so. Using a time2 fudge instead has brought this down to a very few microseconds. I still have some other, rather larger than expected, offsets on the S1 that I am working on.

@NTPman I have gone for option 2 and have selected some “known good” servers nearish me.

@mlichvar - I have done as you suggest, to remove the loop and have A <=> B <=> C, and this seems to help, though sometimes if an S2 chooses another S2, then my stratum will rise to 3 instead of 2. Is there any value in fudging a higher stratum level for the S2 to S2 peers?

@lammert - Thanks for spotting the issues with PEER in Chrony. Very helpful, and I certainly wasn’t reading that correctly, though I feel like I read that section many times.

Salaams,

Alex

So, for the GPS signal, I suspect (how can I check?) that I am getting nasty reflections from the metal roof that the antenna is only slightly above.

Temp and CPU load are for sure not the issue. Raspberry pis are also forced to cpufreq profile of performance.

No, there is no value in fudging a stratum number (especially S3 to S2)… The internals of NTP work based on a number of factors, mostly the calculated ‘root dispersion’.

You can always add a ground plane or choke ring, or raise the antenna higher if possible. I’ve seen plenty of people mount a pizza pan under small antennas (like the magnetic or flush mount).

Thanks for the info. I will read on ground planes more. Prolly just easier for me to raise it.

Interestingly it looks like mode 84 (NMEA Sentence GPGLL) works better for PPS than mode 88 (NMEA GPZDA)

GPS offsets are down to the very low expected levels now.

I have left one machine to run for a while, so let’s see what happens.

SOURCE: https://forums.adafruit.com/viewtopic.php?f=50&t=70133&hilit=stratum+1+ntp&start=60

Before

    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    7    8  377    0.000    0.706   0.841

after:

    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    8    8  377    0.000   -0.002   0.001

Something is still wrong with the stratum 1’s.
160.119.217.233 was in error by several hundred msec.

160.119.217.234 had a brief gap. But the root dispersion is still way too high. I expect root dispersion to be well under 1 msec.

What does ppstest report?

160.119.217.235 is similar to 160.119.217.234.

I run my GPSs using gpsd. That makes it easy to capture the NMEA sentences and to watch the satellites visually (xgps).

I’m unsure what monitoring is available when ntpd directly captures the PPS.

You can watch the root dispersion by periodically running ntpq -np.
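For anyone who wants to log this rather than eyeball it, here is a minimal Python sketch that pulls rootdisp out of the text printed by ntpq -c rv (the system variables dump, as in the ntpq -pn -c rv outputs posted in this thread). The function name parse_rv is my own, and the sample text is copied from one of those dumps; treat the regular expression as an assumption about the usual name=value layout, not a full parser.

```python
import re

def parse_rv(text):
    """Parse name=value pairs from 'ntpq -c rv' style output into a dict of strings.

    parse_rv is a hypothetical helper name, not part of ntpq.
    Values may be quoted ("...") or bare (terminated by a comma or whitespace).
    """
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}

# Sample copied from an rv dump posted in this thread.
sample = '''associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
system="Linux/4.19.75-v7l+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=22.721, refid=GPS0,'''

fields = parse_rv(sample)
rootdisp_ms = float(fields["rootdisp"])  # ntpq reports rootdisp in milliseconds
print(f"stratum={fields['stratum']} rootdisp={rootdisp_ms} ms")
```

Feeding it the captured output of ntpq every minute or so and appending rootdisp to a file would give a simple time series to plot.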

ntpd also supports statistics which might help.

On my stratum 1’s, I like to include a couple of nearby well-managed NTP stratum 1’s using the “noselect” directive.

I’d suggest including some servers outside your RPi systems. Here are a couple:

146.64.58.41 tick.meraka.csir.co.za Pretoria_SouthAfrica
146.64.8.7 tock.meraka.csir.co.za Pretoria_SouthAfrica

Hi,

Thanks for this. Interesting graphs. How do you generate those?

Yes, I am struggling. I can’t see why though. I am an expert in running watch -n 5 ntpq -pn -c rv but obviously not an expert in interpreting it…

I have moved, raised and improved the GPS antenna. Didn’t see any difference. See current results below. Your thoughts much appreciated. I will add the extra entries.

See attached the GPS Status. Is this not OK?

And a monster offset. Ouch!

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    7    8  377    0.000   10.392   9.271

associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.75-v7l+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=22.721, refid=GPS0,
reftime=e21608e2.2e58ec74  Fri, Mar 13 2020 13:28:02.181,
clock=e21608e9.c761489b  Fri, Mar 13 2020 13:28:09.778, peer=49898, tc=3,
mintc=3, offset=10.391946, frequency=-14.205, sys_jitter=9.270779,
clk_jitter=4.514, clk_wander=2.825, tai=37, leapsec=201701010000,
expire=202006280000

Config:

driftfile /var/lib/ntp/ntp.drift
leapfile /usr/share/zoneinfo/leap-seconds.list
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 127.127.20.0 mode 88 minpoll 3 iburst prefer
fudge 127.127.20.0 stratum 0 flag1 1 flag2 0 flag3 1 flag4 0 time1 0.00 time2 0.105 refid GPS0
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
restrict 127.0.0.1
restrict ::1
restrict source notrap nomodify noquery

PPSTEST Results

root@ntp-s1-0:/etc# ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1584107147.499433791, sequence: 367465 - clear  0.000000000, sequence: 0
source 0 - assert 1584107148.499435367, sequence: 367466 - clear  0.000000000, sequence: 0
source 0 - assert 1584107149.499436516, sequence: 367467 - clear  0.000000000, sequence: 0
source 0 - assert 1584107150.499435221, sequence: 367468 - clear  0.000000000, sequence: 0
source 0 - assert 1584107151.499432056, sequence: 367469 - clear  0.000000000, sequence: 0
source 0 - assert 1584107152.499432928, sequence: 367470 - clear  0.000000000, sequence: 0
source 0 - assert 1584107153.499430226, sequence: 367471 - clear  0.000000000, sequence: 0
source 0 - assert 1584107154.499433209, sequence: 367472 - clear  0.000000000, sequence: 0
source 0 - assert 1584107155.499433525, sequence: 367473 - clear  0.000000000, sequence: 0
source 0 - assert 1584107156.499432600, sequence: 367474 - clear  0.000000000, sequence: 0

Your GPS looks good at that snapshot, plenty of satellites over 25 dB SNR, and it has included 9 out of 11 in the lock solution.

offset=10.391946, … sys_jitter=9.270779

Both of those are high, but is it going down? How long had ntpd been running?

NTP packet captures collected at my client were parsed and plotted via gnuplot. If you send email to me I can describe what I do in more detail.

Please clarify the GPS configuration. Is the GPS signal amplified and split to three GPS hat’s, one on each RPi?

I’ve converted all my RPi’s to use GPSD, it is a solid framework for receiving and distributing NMEA information and the PPS timestamps. I don’t use the type 20 driver.

The three stratum 1’s still show root dispersion above 1 msec. I have an RPi2 + Adafruit hat with r.d. of a few microseconds. This is why I suspect PPS isn’t working. Also look at your ppstest output. Unless you’ve done some special GPS configuration, the PPS signal should arrive close to the second; yours is arriving about 0.4994 s (499.4 msec) after the second. My Adafruit gives:

ppstest /dev/pps0

trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1584225427.999997187, sequence: 1704022 - clear 0.000000000, sequence: 0
source 0 - assert 1584225428.999997197, sequence: 1704023 - clear 0.000000000, sequence: 0
source 0 - assert 1584225429.999999810, sequence: 1704024 - clear 0.000000000, sequence: 0
source 0 - assert 1584225430.999998256, sequence: 1704025 - clear 0.000000000, sequence: 0
source 0 - assert 1584225431.999999827, sequence: 1704026 - clear 0.000000000, sequence: 0
source 0 - assert 1584225432.999999058, sequence: 1704027 - clear 0.000000000, sequence: 0

Note that the time stamps are within a couple of microseconds of the second.
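To make that comparison less of an eyeball exercise, here is a small Python sketch that reads a ppstest line and reports how far the assert timestamp is from the nearest whole second. The function name pps_offset_us is hypothetical, and the regular expression assumes the "assert SECONDS.NANOSECONDS" layout shown in the outputs in this thread. Note that this only says something about the PPS edge once the system clock itself is already close to true time.

```python
import re

# Matches the "assert <seconds>.<9-digit nanoseconds>" part of a ppstest line.
ASSERT_RE = re.compile(r"assert (\d+)\.(\d{9})")

def pps_offset_us(line):
    """Signed distance in microseconds from the assert timestamp to the
    nearest whole second, or None if the line has no assert timestamp.

    pps_offset_us is a hypothetical helper, not part of pps-tools.
    """
    m = ASSERT_RE.search(line)
    if m is None:
        return None
    nanos = int(m.group(2))
    # Fold the upper half of the second onto negative offsets, so that
    # .999997187 is reported as about -2.8 us rather than +999997 us.
    if nanos >= 500_000_000:
        nanos -= 1_000_000_000
    return nanos / 1000.0

# Both lines are copied from ppstest outputs posted in this thread.
good = "source 0 - assert 1584225427.999997187, sequence: 1704022 - clear 0.000000000, sequence: 0"
bad = "source 0 - assert 1584107147.499433791, sequence: 367465 - clear  0.000000000, sequence: 0"

print(pps_offset_us(good))  # a few microseconds before the second
print(pps_offset_us(bad))   # roughly half a second off
```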

Did you follow one of the RPi+GPS cookbooks? David Taylor assembled a lot of useful information, see http://www.satsignal.eu/ntp/Raspberry-Pi-quickstart.html

Hi Everyone,

Thanks again for continued support. I think I am making some headway now.

PPStest (now that I know how to read it) shows:

ok, found 1 source(s), now start fetching data...
source 0 - assert 1584283267.000003067, sequence: 90038 - clear  0.000000000, sequence: 0
source 0 - assert 1584283268.000000979, sequence: 90039 - clear  0.000000000, sequence: 0
source 0 - assert 1584283269.000002355, sequence: 90040 - clear  0.000000000, sequence: 0
source 0 - assert 1584283270.000002119, sequence: 90041 - clear  0.000000000, sequence: 0
source 0 - assert 1584283271.000003290, sequence: 90042 - clear  0.000000000, sequence: 0
source 0 - assert 1584283272.000002571, sequence: 90043 - clear  0.000000000, sequence: 0
source 0 - assert 1584283273.000000165, sequence: 90044 - clear  0.000000000, sequence: 0

Which I think is now very close to the second. Pls confirm.

Build details:
1 Single Marine Hi DB GPS Antenna connected to a 1:4 GPS Splitter, connected to three separate adafruit ultimate GPS Hats with battery, connected to 3 x raspberry pi 4.

I started with the Satsignal how-to, and in fact I find the NMEA Driver 20 much easier to manage than going via GPSD, but that’s just me. I also like the concept of minimal extra software running.

I made progress by doing the following:

Reset gps to factory defaults with the gpsinit tool, and then load the latest EPO database.

Then redo David Taylor’s technique of measuring the offset (5000 samples) to determine the best time2 offset (it’s now +0.537).

Then use the gpsinit script to set system time before starting NTP, and this is where I think I saw the most gain: Like this:

gpsinit -s 115200 -i set_system_clock

You can find the gpsinit tools here: https://github.com/f5eng/mt3339-utils

My ntp.conf now looks like:

server 127.127.20.0 mode 84 minpoll 3 iburst
fudge 127.127.20.0 stratum 0 flag1 1 flag2 0 flag3 0 flag4 0 time1 0.00 time2 +0.537 refid GPS0
server tick.meraka.csir.co.za iburst noselect
server tock.meraka.csir.co.za iburst noselect

Additional nearby servers
Unfortunately the csir.co.za servers don’t like my defaults and I am getting rate exceeded, so I guess I need to increase min/max poll but can’t think of a sane setting to try yet.
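As background for picking minpoll/maxpoll values: in ntpd these are exponents of two, in seconds, so minpoll 3 means an 8 s interval, and the stock defaults are minpoll 6 and maxpoll 10. A quick Python sketch of the conversion (poll_seconds is just an illustrative name); it does not say which setting the remote servers will tolerate.

```python
def poll_seconds(exponent):
    """Convert an ntpd minpoll/maxpoll exponent to an interval in seconds."""
    return 2 ** exponent

# minpoll 3 is what the refclock line in this thread uses;
# 6 and 10 are the ntpd defaults for minpoll and maxpoll.
for label, exponent in [("minpoll 3", 3), ("minpoll 6", 6), ("maxpoll 10", 10)]:
    print(f"{label}: one poll every {poll_seconds(exponent)} s")
```

The "poll" column in ntpq -pn shows the current interval already converted to seconds.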

Current readings:

 ntpq -pn -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.0    .GPS0.           0 l    6    8  377    0.000    0.000   0.003
 146.64.58.41    .GPS.            1 u   30   64  377   58.046    0.106   1.424
 146.64.8.7      .GPS.            1 u   58   64  377   64.923    3.865   1.108

associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.97-v7l+", leap=00, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=1.090, refid=GPS0,
reftime=e218bf42.5338759f  Sun, Mar 15 2020 14:50:42.325,
clock=e218bf48.a8e6fa33  Sun, Mar 15 2020 14:50:48.659, peer=43510, tc=3,
mintc=3, offset=0.000456, frequency=-14.437, sys_jitter=0.002518,
clk_jitter=0.002, clk_wander=0.001, tai=37, leapsec=201701010000,
expire=202006280000
root@ntp-s1-0:/home/pi# 

However I note root disp is still more than one. All three (plus a 4th development system on a raspi 3b+) are all showing near identical rootdisp which swings between 1.000 and 1.090ish.

Is that rootdisp acceptable?

Best,

Alex

My terminology was sloppy/imprecise, sorry.

The stratum 0 clock (GPS module) should ideally have a root dispersion of a few microseconds. If the RPi has temperature swings, it can be significantly worse.

Root dispersion of 1 msec for the stratum 1 system (which includes all the NTP software) is in the right range.

rootdisp increases every second by the configured “tinker dispersion” (see “Miscellaneous Commands and Options” in the ntpd documentation; the constant is CLOCK_PHI, which defaults to 15 ppm or 0.000015 s/s).

rootdisp is set to the minimum dispersion value (MINDISPERSE, default 1ms) every time the clock is set (the “reftime” timestamp).

Since it is 6 seconds after reftime in your snapshot, your root dispersion is 1ms + (0.000015 s/s * 6s) = 1ms + 0.090ms = 1.090ms

This is an acceptable rootdisp for the vast majority of users
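The arithmetic above can be sketched in a few lines of Python. This is only the simplified model described in this post (a 1 ms floor at each clock update plus 15 ppm growth), not ntpd’s full root dispersion calculation, and the function and constant names are mine.

```python
MINDISP_S = 0.001     # floor applied when the clock is set (1 ms default, per the post)
PHI_S_PER_S = 15e-6   # dispersion growth rate (15 ppm default, per the post)

def expected_rootdisp_ms(seconds_since_reftime, mindisp=MINDISP_S, phi=PHI_S_PER_S):
    """Simplified estimate of rootdisp in milliseconds for a stratum 1
    server, given the seconds elapsed since reftime."""
    return (mindisp + phi * seconds_since_reftime) * 1000.0

# 6 s after reftime, as in the snapshot above.
print(round(expected_rootdisp_ms(6), 3))  # 1.09
```

With minpoll 3 the clock is updated every 8 s, so under this model rootdisp should swing between roughly 1.000 and 1.120 ms.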

Going back to the start, just wondering why mode 88?

The docs say that’s 115200 baud and process $GPZDA or $GPZDG.

Found a couple of other adafruit hat pages and they suggest other mode settings. The pages are from 2012 so maybe things have moved on, but curious how you settled on 88?

Hi,

I initially followed David Taylor (www.satsignal.eu) guide. But struggled with the gpsd.

I then followed this guide: https://forums.adafruit.com/viewtopic.php?f=50&t=70133&start=60

This made much more sense to me. Cut out the offsets created by gpsd, and have NMEA driver read the serial line directly, and then add the PPS flag1.

In that thread there is a link to the GPSINIT tools for the adafruit ultimate gps (here: GitHub - gtjoseph/mt3339-utils ) which allow you to tune your GPS Receiver directly.

These tools work, and allow you to select which NMEA sentences are sent by the GPSR. So again, I liked the fact that you could reduce the number of sentences and increase the baud rate, so that the chances of ntpd getting the data it needs are maximised.

So mode 84 (I am not using mode 88 anymore, as I wanted more data than GPZDA offered) is serial line rate of 115200 and NMEA sentence GPGLL, so NTP should be receiving only what it needs from the GPSr at max speed.
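For reference, the mode number is a bit field. The Python sketch below decodes it using the layout as I understand it from the classic driver 20 documentation (low four bits select sentences, bits 4 to 6 select the line speed); please check it against the driver page for your ntpd version. The names decode_mode, SENTENCES and BAUD are illustrative only.

```python
# Sentence selection bits (assumed layout, see the driver 20 documentation).
SENTENCES = {1: "$GPRMC", 2: "$GPGGA", 4: "$GPGLL", 8: "$GPZDA/$GPZDG"}
# Line speed encoded in bits 4-6 (assumed layout).
BAUD = {0: 4800, 16: 9600, 32: 19200, 48: 38400, 64: 57600, 80: 115200}

def decode_mode(mode):
    """Split an NMEA driver mode value into (baud rate, enabled sentences)."""
    sentences = [name for bit, name in SENTENCES.items() if mode & bit]
    baud = BAUD.get(mode & 0x70)  # None if the speed bits are not in the table
    return baud, sentences

print(decode_mode(84))  # (115200, ['$GPGLL'])
print(decode_mode(88))  # (115200, ['$GPZDA/$GPZDG'])
```

So 84 = 80 + 4 and 88 = 80 + 8, which matches the descriptions of the two modes given in this thread.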

I don’t need to view sat status as many people prefer. I am confident in my sat signals, and this is made easier considering that I am right on the equator.

So that’s been my thought process, but if you have been reading these threads you will see that I am still struggling, so ymmv.

Salaams,

Alex

Some info on the Generic NMEA Driver is found here (note I am using ntpsec package) Generic NMEA GPS Receiver

Take particular note of:

This driver supports GPS receivers with the $GPRMC , $GPGLL , $GPGGA , $GPZDA and $GPZDG NMEA sentences by default. Note that Accord’s custom NMEA sentence $GPZDG reports using the GPS timescale, while the rest of the sentences report UTC. The difference between the two is a whole number of seconds which increases with each leap second insertion in UTC. To avoid problems mixing UTC and GPS timescales, the driver disables processing of UTC sentences once $GPZDG is received.

The driver expects the receiver to be set up to transmit at least one supported sentence every second.

Just a quick note here:

My ntp.conf refclock setting in /etc/ntpsec/ntp.conf is:

refclock nmea baud 115200 mode 4 flag1 1 minpoll 3 maxpoll 3

This is rather different than the old driver 20 method in ntp classic, which would look like this:

server 127.127.20.0 mode 84 minpoll 3 maxpoll 3
fudge 127.127.20.0 flag1 1 time2 +0.460 refid GPS0

Alex

Hi @ddrown

rootdisp increases every second by the configured “tinker dispersion” (see “Miscellaneous Commands and Options” in the ntpd documentation; the constant is CLOCK_PHI, which defaults to 15 ppm or 0.000015 s/s).

Thanks very much for this tidbit… Very useful.

To expand further, this is controlled by tos mindist in the ntp.conf file, and defaults to 1ms or 0.001s. So the 1.xxx in my result is coming from the default mindist hard coded in to ntpd.

Setting tos mindist 0.010 would give a rootdisp of 10.xxx plus whatever else is calculated.

Alex

Hi @alex

Is everything stable now? Not quite sure what the latest is… :grinning:

No… Not yet…

Its really a journey for me…

Do you know what spike_detect means?

associd=0 status=c413 leap_alarm, sync_uhf_radio, 1 event, spike_detect,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="armv7l",
system="Linux/4.19.97-v7l+", leap=11, stratum=1, precision=-20,
rootdelay=0.000, rootdisp=355.974, refid=GPS0,
reftime=e2273df8.58e382db  Thu, Mar 26 2020 14:43:04.347,
clock=e2273dfc.e76a9f7f  Thu, Mar 26 2020 14:43:08.903, peer=38904, tc=3,
mintc=3, offset=0.000000, frequency=-16.036, sys_jitter=0.000000,
clk_jitter=0.001, clk_wander=0.000, leapsec=201701010000,
expire=202006280000

The documentation is spectacularly vague, it says spike_detect means spike detected… Yay. See here: https://docs.ntpsec.org/latest/decode.html

So, a spike of what?
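It may at least help to see how the status word breaks down. The Python sketch below splits the 16-bit system status into leap indicator, clock source, event count and last event code, using the field layout as I understand it from the ntp decode documentation (2 + 6 + 4 + 4 bits). The lookup tables are deliberately partial and the names are mine; it does not explain what triggered the spike.

```python
# Partial lookup tables (assumed from the ntp "decode" documentation).
LEAP = {0: "leap_none", 1: "leap_add_sec", 2: "leap_del_sec", 3: "leap_alarm"}
SOURCE = {1: "sync_pps", 4: "sync_uhf_radio", 5: "sync_local", 6: "sync_ntp"}
EVENT = {1: "freq_not_set", 2: "freq_set", 3: "spike_detect", 4: "freq_mode",
         5: "clock_sync", 6: "restart", 7: "panic_stop", 12: "clock_step"}

def decode_system_status(status_hex):
    """Decode the 'status=' word from 'ntpq -c rv' (system variables)."""
    word = int(status_hex, 16)
    leap = (word >> 14) & 0x3     # top 2 bits: leap indicator
    source = (word >> 8) & 0x3F   # next 6 bits: clock source
    count = (word >> 4) & 0xF     # next 4 bits: event counter
    event = word & 0xF            # low 4 bits: last event code
    return (LEAP[leap],
            SOURCE.get(source, f"source_{source}"),
            count,
            EVENT.get(event, f"event_{event}"))

print(decode_system_status("0415"))  # ('leap_none', 'sync_uhf_radio', 1, 'clock_sync')
print(decode_system_status("c413"))  # ('leap_alarm', 'sync_uhf_radio', 1, 'spike_detect')
```

Both results match the text ntpq printed next to status=0415 and status=c413 in this thread.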

Hi @alex, oh dear… pesky computers! :man_facepalming: :grinning:

That status report doesn’t look happy at all - leap_alarm, leap=11 and no tai= looks like the daemon only just started and hasn’t settled down yet. How long has the daemon been running on that Pi?

What issues are you currently seeing? Please could you post a ntpq -pn -c rv

Did you have any joy getting ntpviz running? Without a baseline or some log output it’s difficult to see whether issues are intermittent or diagnose…

mindist also has other side effects aside from the root dispersion, which could lead to a spike detect I think if you have it set to 10 microseconds.