Get more stability at no cost

@Mpegger, for you I have been playing with the J1900 CPU that my time server (stratum 1) uses.
It’s about the same as an RPi4 or better.

I noticed a lot of swings. However, the crystal is hopeless, just like the Pi’s.

So I wondered, how can we improve?

See what happened after 13:30…I changed a lot:

What I did is exchange the normal kernel for the RT-kernel.

Step 1: Install the realtime kernel with apt-get install linux-image-rt-amd64, then reboot.

Step 2: change Chrony params, works for 4.6.x or newer…

# Maximum assumed drift (frequency error) of the system clock, in ppm.
maxdrift 50
# This directive specifies the file in which chronyd stores the rate information.
driftfile /var/lib/chrony/chrony.drift

# Hardware timestamping on NIC, either hwtimestamp eth0 or * for all
hwtimestamp *

# Run chronyd under the real-time scheduler at this priority
sched_priority 40

# Lock chronyd into RAM so it is never paged out
lock_all

That will help a lot.

Step 3: Alter the systemd unit for GPSD: nano /etc/systemd/system/gpsd.service

Change this:

[Service]
CPUSchedulingPolicy=rr
CPUSchedulingPriority=40
IOSchedulingClass=realtime
Nice=-10
EnvironmentFile=-/etc/default/gpsd
#EnvironmentFile=-/etc/sysconfig/gpsd
ExecStart=/usr/sbin/gpsd $GPSD_OPTIONS $DEVICES

When you check ‘top’ you will see it’s using PR -41, meaning it’s near realtime.
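The -41 follows from how top displays real-time tasks: PR is -1 minus the real-time priority, so CPUSchedulingPriority=40 shows up as -41. A minimal Python sketch of that mapping, assuming Linux; the helper names top_pr and describe are made up for illustration:

```python
import os

# Scheduling policies that count as real-time on Linux.
RT_POLICIES = {os.SCHED_FIFO: "SCHED_FIFO", os.SCHED_RR: "SCHED_RR"}


def top_pr(rt_priority):
    """PR value that top displays for a real-time task with this priority."""
    return -1 - rt_priority


def describe(pid):
    """Return (policy name, rt priority, expected top PR) for a pid; 0 means this process."""
    policy = os.sched_getscheduler(pid)
    prio = os.sched_getparam(pid).sched_priority
    if policy in RT_POLICIES:
        return RT_POLICIES[policy], prio, top_pr(prio)
    # Ordinary (non-real-time) tasks are shown by top with a positive PR instead.
    return "non-realtime", prio, None


if __name__ == "__main__":
    print(top_pr(40))   # the priority used in the unit file above
    print(describe(0))  # replace 0 with the pid of gpsd to inspect it
```

Passing the pid of gpsd (for example from pidof gpsd) instead of 0 should report SCHED_RR with priority 40 if the unit file took effect.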

That is what I did to make the CPU more stable. Beware, I also removed CPU-speed-changing…it’s running at max-performance 24/7.
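The post does not say how the speed changing was removed. Assuming it was done through the Linux cpufreq governor, a small Python sketch like this could confirm that every core is pinned to the performance governor. The function names are hypothetical; the sysfs path is the standard cpufreq location:

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (cpu0, cpu1, ...) to its scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    print(found)
    print("all cores at performance:", all_performance(found))
```

If the dictionary comes back empty, the machine exposes no cpufreq interface at that path and the check says nothing either way.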

I have not tested this for long, but the graph is very good after several hours.
Beware, an RT kernel gives real-time software more time to process.

Chrony gets enough time to work it out, and GPSD enough time to pass data from source to processing via Chrony.

I do not know if this works well, but the graph looks good after 3 hours.

The system in question is an Intel J1900…not the fastest CPU you can get.

Enjoy. Let me know if it works for you…or if it doesn’t. :zany_face:

Was pondering running this test on FreeBSD also with rtprio - but I have a few more things I want to test first

This may also be interesting to run on my Pi5 CMIO with NIC hardware timestamping, PHC, and satpulse

Looks like they built in a rt kernel into their OS

Wonder how much more accurate that clock can get

I also did a lot of reading on the different connectors to GPS and PPS etc.

It turns out that for Chrony (NTPd can go wrong) shared memory (SHM) is the fastest way to get data from GPSD to Chrony.

Also, I now know why the PPS and GPS signals are separated: it makes transport between programs even faster.

The reason is that it’s lacking overhead in total, it’s just a message-board in memory and the poster posts it, the other software reads it.
That is also why you can use e.g. Chrony and GPSMON at the same time, they read the same messages.
And those messages are stored in RAM (maybe even cache), as such it’s the fastest transfer you can use if your OS has it, Linux does.

Reading the kernel PPS directly is slower than SHM. I was puzzled why the other ways, like sockets or PPS, gave my PPS signal such jumps in reporting. SHM gives less than 5 ns almost all the time, and with a clear sky even less than 1 ns (according to Chrony).

As for hardware timestamping, yeah, stamping via a NIC is good, but it also helps to tune the serial FIFO with setserial to low_latency.

in /var/lib/setserial

/dev/ttyS0 uart 16550A port 0x03f8 irq 4 baud_base 115200 spd_normal skip_test low_latency
/dev/ttyS1 uart 16550A port 0x02f8 irq 3 baud_base 115200 spd_normal skip_test low_latency

Run service setserial status, it should show this:

dec 03 18:15:03 server systemd[1]: Starting setserial.service - controls configuration of serial ports...
dec 03 18:15:03 server setserial[1870665]: Loading the saved-state of the serial devices...
dec 03 18:15:03 server setserial[1870674]: /dev/ttyS0 at 0x03f8 (irq = 4) is a 16550A (low_latency)
dec 03 18:15:03 server setserial[1870677]: /dev/ttyS1 at 0x02f8 (irq = 3) is a 16550A (low_latency)
dec 03 18:15:03 server systemd[1]: Finished setserial.service - controls configuration of serial ports.

That will help too. By default low_latency is NOT enabled, and if you don’t run:

dpkg-reconfigure setserial

and set it to manual, the file you changed will be overwritten at every reboot.

And last, the computer clock isn’t much of a problem…the crystal in the GPS is, as cheap GPS receivers like u-blox typically use poor-quality crystals, and that affects your entire system.

The Garmin is spot on. I tested u-blox versus Garmin, and the Garmin is so much better.
Sure, it sees fewer satellites, but who cares? It’s far better at receiving indoors, and you only need 1 satellite to provide time (3 or more for a 3D lock). Receiving 80 sats is nuts, as the receiver will try to send all their info as well. I receive about 12 sats all the time, more than enough to get good time.

The Chrony documentation below:

I don’t understand this one. In GPSD SHM (like SOCK) gets its pulse data either from kernel or user-space PPS - you can review the code here:
https://gitlab.com/gpsd/gpsd/-/blob/master/gpsd/ppsthread.c

I’m curious what metric you’re looking at in Chrony - a screenshot would be helpful

Neither do I. SHM or SOCK is primarily about how the data is transferred between gpsd and the time daemon. It has nothing to do with how gpsd obtains its data in the first place, i.e., how accurate the data is (apart from potentially different defaults for precision), how much delay there is in the data, …

Not sure what the impact on timekeeping accuracy is supposed to be. The data transferred consists of a timestamp of when an event was assumed to have happened (e.g., top of the second for a PPS pulse), and of what the actual system time was when the event occurred. As long as the system clock isn’t so far off frequency-wise that this relationship would be significantly different by the time the time daemon processes the data, it doesn’t really matter how fast the transfer is (as long as we are talking about just a few milliseconds), and the time daemon will be able to figure out the difference between the two and discipline the local clock accordingly.

No, they don’t. The protocol, and transfer mechanism between gpsd and its clients such as gpsmon are quite different from the SHM- and SOCK-based protocols between gpsd and a time daemon. If anything, ntpd with its GPSD-NG client driver would be talking the same protocol towards gpsd as gpsmon does. Which could be leveraged by chronyd via Miroslav’s ntp-refclock, a wrapper for ntpd reference clock drivers. But then again, one of the other mechanisms to get data from gpsd supported by chronyd might be more efficient and less cumbersome to set up.

It’s less intensive for the CPU, it means less work.
Google for it.

I noticed that when using PPS instead of SHM, the sourcestats for the PPS signal jumped a lot.
With SHM it’s far more stable.

No idea why, unless sourcestats isn’t correct.

I stand corrected, gpsmon uses TCP.

As for NTPd, look here: https://www.ntp.org/documentation/drivers/driver28/

There are some tricky points when using the SHM interface to interface with GPSD, because GPSD will use two SHM clocks, one for the serial data stream and one for the PPS information when available. Receivers with a loose/sloppy timing between PPS and serial data can easily cause trouble here because NTPD has no way to join the two data streams and correlate the serial data with the PPS events.

See here about it:

Advantages of Shared Memory IPC

  • Speed: The Shared memory IPC is much faster than other IPC methods like message passing because processes directly read and write to the shared memory location.

  • Low Overhead: It eliminates the overhead associated with the message passing where data has to be copied from the one process to another.

Less overhead means less latency or jitter…as sourcestats showed me when I tested for the best ‘port’ to use.

I don’t really think there is a difference that matters. Compared to handling actual NTP packets, I don’t think the mechanism by which gpsd sends data makes any noticeable performance difference between the two, or has any impact worth mentioning. SOCK is also just a construct in memory, it is just handled differently. And the amount of data shared is too small to have an impact.

You could be right.
But it’s also sending GPS data all the time.
Is there a way to see how much data is really sent?

Because when I run GPSMON it’s going quite fast, meaning it’s also sending a lot of interrupts to the CPU.

To my knowledge the CPU (Intel and AMD, not ARM) cannot ignore these IRQs.
As such I think every little bit helps to unload the CPU and keep it at a steady pace.
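On the question of how much data is really sent: an upper bound for the serial side can be estimated from the baud rate alone. A back-of-the-envelope Python sketch, assuming 8N1 framing (10 bits on the wire per byte) and a hypothetical FIFO trigger level; neither value is confirmed anywhere in this thread:

```python
def max_bytes_per_second(baud, bits_per_byte=10):
    """Upper bound on serial throughput; 10 bits per byte assumes 8N1 framing."""
    return baud / bits_per_byte


def max_interrupts_per_second(baud, fifo_trigger_bytes, bits_per_byte=10):
    """Worst-case receive interrupt rate if the UART fires every fifo_trigger_bytes."""
    return max_bytes_per_second(baud, bits_per_byte) / fifo_trigger_bytes


if __name__ == "__main__":
    baud = 9600  # the BAUDRATE value quoted later in the thread
    print(max_bytes_per_second(baud))
    for trigger in (1, 8):  # hypothetical FIFO trigger levels
        print(trigger, max_interrupts_per_second(baud, trigger))
```

At 9600 baud the line cannot carry more than 960 bytes per second, so even one interrupt per byte stays below a thousand interrupts per second. The real rate depends on how many sentences the receiver actually emits.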

That is also why I installed the RT kernel, as my server does more than just run Chrony :zany_face:

I know many of you are running dedicated NTP-servers, I can’t.

BTW, the DrayTek is out and a MikroTik is now doing the routing. So far no slowdowns, except that IPv6 isn’t working yet.

I faintly remember you insisting on accessing the PPS twice, and this statement suggests that you might also be using such a configuration in the case of accessing PPS directly. When two clients access the same PPS simultaneously, I think that is likely to cause issues, vs. just gpsd accessing it and transferring the data to chronyd, like in the SHM case you refer to.

I don’t think that is really relevant with gpsd. In my understanding, the SHM segment used for the PPS signal will not be PPS only, but it will have the serial data mixed in already as well. So using both, or even combining them, is not necessary. The only reason why one might want to use both memory segments is that when the receiver stops providing PPS, e.g., because its quality criteria aren’t met, gpsd will completely stop the stream that contains it. But the receiver might still output serial data, and depending on circumstances, one might want to use that, rather than having no reference clock at all.

It is accessing it twice.
The kernel does it, creating a PPS device.
Then GPSD takes that and also reads the serial port.

DEVICES="/dev/ttyS0"
BAUDRATE=9600

Edit: -Somebody told me the kernel-module is faster-
Runs fine without, so I removed it, as well as pps-ldisc.

See, PPS +1ns via SHM.

Lines that I use in Chrony for my Garmin:

refclock SHM 0 refid GPS offset 0.072 delay 0.2
refclock SHM 1 refid PPS poll 1 precision 1e-9 maxlockage 64 lock GPS prefer

We’ve been over this before: When the serial port carries the PPS signal already, there is no need to add a PPS device to gpsd’s configuration. gpsd will check whether there is a PPS signal on the port, attach the PPS line discipline to it, and use it. The PPS device appearing is a side effect of gpsd attaching the PPS line discipline to the port. I think it might even be inviting potentially more trouble, because for the PPS device to appear in the first place so that gpsd can access it when it starts, the line discipline needs to be attached prior to starting gpsd. So there is a risk of ldattach and gpsd stepping on each other’s toes when effectively trying to do the same thing.

And no, when the system is configured properly, the PPS device is NOT being accessed twice, but either only by chronyd directly (in which case gpsd cannot have the serial port open, and the line discipline needs to be attached manually as gpsd not using the port does not take care of that), or by gpsd only. With the latter option being better in the sense that it grasps both the PPS as well as the serial data and combines them into one for chronyd to consume. (ntpd can also get both from the same port directly, but with chronyd not supporting decoding of the serial stream, gpsd is needed for preprocessing/aggregating PPS and serial stream into one data stream for chronyd to consume.)

And the lock in your second config line is not needed because SHM1 provided by gpsd already combines the PPS signal and the serial data stream into one.

Do you know how many guides to PPS are online?

GPSD does create 2 SHM streams, no matter how you do it. As far as I understood, Chrony combines them with the lock.
Where NTPd doesn’t.

That is the thing with SHM, GPSD does create 2 streams. At least the GPSD website states this, and so do the NTPd driver pages.

Where do you get that it’s combined into 1 SHM? Or I’m very very wrong…could be.

Yes, sorry I didn’t emphasize that enough: That is indeed the case, I never disputed that. That does not change, however, that the SHM carrying the PPS already includes the input from the serial stream, which is available on its own on the other SHM.

That is what would need to be done if there is a separate PPS device that chronyd accesses directly, and which is thus missing the time information. And the serial port is serial data only, no PPS (or PPS as well, but then you don’t really use the PPS on the serial port, chronyd will use the PPS it accesses directly, and will combine with the already PPS-augmented serial stream).

Yes, no dispute whatsoever that that is what gpsd does. But where I think neither daemon’s description is sufficiently clear, or is even misleading in my understanding (and those millions of recipes out there, where one copies the unnecessary stuff from the other, ever perpetuating it), is that since gpsd already combines PPS and serial data (aka, time) into one, the time daemon does not need to do that as well, on top of what gpsd already did.

I didn’t get my Garmin to work with PPS without letting the kernel module decode the DCD pulses and feed them into GPSD.

And I’m not the only one.

Look here…

Many pages say you can let the kernel do it, or some other way.
By just giving the serial port to GPSD I never got PPS.

I had to add the /dev/pps2

Maybe it works that way for U-blox or so. I have been fighting PPS for so long, that I leave it the way it is.
And I noticed that SHM for both gives me the most stable PPS performance.

Indeed, because that is what kernel PPS is all about: The kernel module “decoding” the DCD pulses from the serial port, and making them available via a PPS device, via a specific API.

gpsd also has user-mode PPS, e.g., when it is not started as root or otherwise cannot access the PPS device created as consequence of attaching the line discipline to the serial port. Or when the PPS development files were not available when gpsd was built (which is a way to force user-mode PPS, e.g., when the pulse is not on DCD, which is the only one that typical serial device kernel modules support, while gpsd also can use others). But that mode has noticeably more lag, unsurprisingly. Thus kernel PPS should be preferred whenever possible.

Yes, again, no dispute whatsoever about that. The question is how one accesses what the kernel does for one.

Obviously don’t know your setup and what you did, but it should pick it up. That said, gpsd is not perfect, and I’ve had issues as well where it didn’t work. But if one really wanted a clean solution, the only way would be to dig into why it did not work for you. Attaching the line discipline separately, and then have gpsd access the resulting PPS device, is just a kludgy work-around. But no dispute that it also works.

Note, e.g., the gpsd configuration used in Lammert Bies’ tutorial: does it specify a PPS device separately?

There is nothing u-Blox- or Garmin-specific about the PPS signal. And I’ve got it to work also with non-u-Blox types of devices.

Sure, never change a running system. But that doesn’t change the fact that that is not how it should actually work, and that people starting with the topic should not yet again copy all that superfluous and complicating and obfuscating stuff that keeps getting perpetuated by getting copied one tutorial/recipe to the next without ever reflecting as to what exactly is being done, and why. And since the kludge actually works, too, why should one ever question whether it, and all the complexity it brings, is actually needed.

Sure, I don’t dispute that. I just don’t think that it is because SHM is a more efficient transfer mechanism than a pipe, and differences in CPU load either of them causes. Though in some way, it could have to do with that difference, but more from a functional point of view, not so much performance.

I agree with most of what you write.
Yeah, they all say it ‘should’ pick up the PPS.

Maybe it’s better now, but I never got that ‘way’ to work.
By making it use the /dev/pps2 it does.

All I am trying to do is get the most stable/precise time source, and yes, I test a lot.
Various GPS receivers, different ways to feed Chrony. Sorry, I gave up on NTPd after it kept saying I was a falseticker no matter what I did to correct it.

My only interest is, what is the best way to do it? I just present mine…is it the best?
I hope so…but then, I could be sooooooo wronggggggg :grin:

I’m trying to find out if I do it right, or where I go wrong…as Chrony does tell me it’s ticking fairly well. And all my changes did give me better results in Chrony.

Am I doing okay? I hope so, and want to spread the word if my way is a good way. If not, show me how and why I’m wrong.

Did you ever recently try it “the other way”? From past discussions, I gathered that you perhaps never did. And when one starts by attaching the line discipline to the serial port manually before gpsd can do it, then maybe that is where it already goes wrong with gpsd picking up the PPS signal from the serial port. Because unfortunately, gpsd is quite picky and sensitive regarding the PPS signal. I.e., the slightest issue, and it bails for good.

Just run gpsd with higher log levels, and see what it says. Or just really try without ldattach and handing gpsd the PPS device the kernel generates in response to the line discipline being attached. Again with higher log levels, to see what gpsd is trying to do, and where maybe things go wrong.

But again, entirely up to you. Never change a running system.

There’s two related, but separate topics now, that I have the feeling start getting mixed: How to access the PPS signal, and how to transfer whatever data gpsd generates to the time daemon.

The above in this current post was only about the former. And I personally don’t think it is “best” because it is overly complicated, and thus is prone to cause more issues than it solves. E.g., multiple programs trying to access the same devices rarely is a good idea.

Regarding the latter, I don’t doubt that you might have gotten different results between using SHM and SOCK, just saying it is likely for reasons other than SHM being more efficient and using fewer CPU cycles than SOCK.

I’ve explained that in this thread, and elsewhere before. And I am not saying it is “wrong” how you do it, as it obviously works for you, and probably many others. Just saying it is not how it should work, and seems to me like a kludgy work-around for issues that at least I would want to understand a bit better before going that way.

Regarding whether SHM0 and SHM1 need to be combined, or whether just SHM1 is enough, just give it a try by removing the SHM0-related line from your config, as well as anything related to locking the two inputs from the SHM1-related line, and see what happens.

chronyd seems sufficiently smart to even combine a PPS-only input with time from non-reference clock sources (aka, NTP sources). So I think the lock command is mostly when wanting to combine with a specific time source, rather than leaving it to chronyd which other time source for numbering the seconds it combines with the PPS signal.
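What “numbering the seconds” means can be sketched generically. A bare pulse only marks the top of some second, so a coarse time source is needed to decide which one. The Python below is an illustration under that assumption, not chronyd’s actual algorithm, and the function name is hypothetical:

```python
import math


def number_pulse(pulse_system_time, coarse_offset):
    """
    Decide which whole second a bare pulse belongs to.

    pulse_system_time: system clock reading when the pulse arrived
    coarse_offset:     rough estimate of (system time - true time), taken from
                       another source such as NTP or the serial data
    Returns (true second of the pulse, precise system clock offset).
    """
    estimated_true = pulse_system_time - coarse_offset
    second = math.floor(estimated_true + 0.5)  # nearest whole second
    return second, pulse_system_time - second


if __name__ == "__main__":
    # The system clock is really 0.3 s ahead; the coarse source says 0.25 s.
    print(number_pulse(1000.3, 0.25))
    # A coarse source that is off by more than half a second picks the wrong second.
    print(number_pulse(1000.3, -0.3))
```

In this simplified model any coarse source accurate to better than half a second numbers the pulse correctly. A real daemon may well apply a tighter limit.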

Are you joking? As SHM1 needs a reference clock, you can not remove it.
As then you rely on the RTC/system clock to be correct.

You need a reference within 200ms or so, so Chrony knows it’s ticking correctly and the pulse is OK.

GPSD does not combine both signals into 1. Where do you get it does for SHM?

Chrony combines both SHM streams into correct time. GPSD makes 2 streams in SHM, always as far as the docs say. Tell me where GPSD combines them in 1 SHM.

Or sooooo many docs are wrong, including Chrony-docs.