Chrony comparison - Ubuntu vs FreeBSD

Here are some snips of the last couple days of testing at polling interval 10 (1024 s)

Before this I did some preliminary testing and moved GPS receivers between COM ports to see whether the behavior was due to the GPS receivers or the COM ports. I’ve convinced myself the discrepancies are 100% with the COM ports, not the receivers. Essentially, the COM ports with software timestamping have microsecond-scale precision, and each COM port is different, whereas the LEA-M8Ts have nanosecond-scale precision (as tested on a Raspberry Pi 5 with satpulse, chrony, and hardware timestamping to the PHC).

GPS1-3 set to noselect, using only GPS4 (COM 4, even with IRQ sharing with COM 3, has much better precision than the others)

As shown in the Error Margin graph, COM1 (red) hangs out around 2us precision, COM2 (purple) sticks around 200 ns and occasionally spikes to over 1 us, COM3 (green) is all over the place between 2 us and 150 ns, and COM4 (yellow) holds steady around 500 ns.

But I need to fix this problem with the offsets caused by the COM ports:


These lines need to be on top of each other, not spaced tens of us apart…
Otherwise when chrony decides to change refclocks, the RMS offset will spike

So, I looked at a few areas where the error margins for COMs 2-4 were all low at the same time, and recorded their offsets - and then tweaked the offset directives in the chrony config file.

Since GPS3 always had the lowest offset, I’m adjusting the others down to GPS3. It’s anyone’s guess where the exact second is, so the best I can do is bring the sources I have into alignment: COM1 is adjusted by 13 us, COMs 2 and 4 by 30-40 us.

Also added precision directives to each refclock to account for the error margin I’m seeing

Resetting the refclocks back to their default polling interval, and removing the noselects, will see how this goes…

refclock SOCK /var/run/chrony.cuau0.sock refid GPS1 offset +0.0000137 precision 2e-6
refclock SOCK /var/run/chrony.cuau1.sock refid GPS2 offset -0.0999666 precision 1e-6
refclock SOCK /var/run/chrony.cuau2.sock refid GPS3 offset -0.2000000 precision 2e-6
refclock SOCK /var/run/chrony.cuau3.sock refid GPS4 offset -0.2999625 precision 0.5e-6
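To make the adjustments in those four lines easier to read, here is a minimal Python sketch that splits each configured offset into a whole-stagger part and a small residual. It assumes the -0.1/-0.2/-0.3 s components correspond to a deliberate 100 ms stagger between receivers; the names `STAGGER_S` and `split_offset` are made up for illustration and are not part of chrony.

```python
# Offsets copied from the refclock lines above, in seconds.
configured = {
    "GPS1": +0.0000137,
    "GPS2": -0.0999666,
    "GPS3": -0.2000000,
    "GPS4": -0.2999625,
}

STAGGER_S = 0.100  # assumption: each receiver is delayed 100 ms more than the previous one


def split_offset(index, offset_s):
    """Return (stagger part in seconds, residual in microseconds)."""
    stagger = 0.0 - index * STAGGER_S
    residual_us = round((offset_s - stagger) * 1e6, 1)
    return stagger, residual_us


residuals = {}
for i, (refid, offset_s) in enumerate(configured.items()):
    stagger, residual_us = split_offset(i, offset_s)
    residuals[refid] = residual_us
    print(f"{refid}: stagger {stagger * 1e3:+.0f} ms, residual {residual_us:+.1f} us")
```

Under that assumption the residuals are the per-port corrections described above: roughly 13 us for COM1, 30-40 us for COMs 2 and 4, and zero for GPS3 as the reference.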

The problem is that it doesn’t matter: the RS-232 chip will trigger an IRQ to the CPU regardless of your software settings.
So if 2 GPS receivers trigger it, the CPU will stop what it’s doing and run the software that handles this IRQ.

These ports don’t timestamp, they force an IRQ that can’t be ignored. Heck, you do not even need to run any software at all. Every trigger on the RS-232 pins triggers an IRQ.
That is the problem.

The only way to stop the IRQs is to disconnect the hardware from the ports. Whatever you do in software does not matter. Every FIFO on the IRQ will be read, whether or not software wants it. The CPU will turn its attention to the IRQ and the ports attached to it.
This is a hardware matter, not software.

Modern IRQ-routing is a bit better, but it still makes the CPU look at it. You can only solve it by disconnecting the device.

I do not have overlapping interrupts - I don’t think the problem is because of IRQ conflicts

Each receiver is set to 115,200 baud:

And transmits these binary messages totaling 730 bytes once per second:

At 115,200 baud and 10 bits per byte (1 start bit + 8 data bits + 1 stop bit), that’s 63 ms per transmission.
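The 63 ms figure can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch of the wire time; the function name `frame_time_ms` is made up for illustration.

```python
def frame_time_ms(payload_bytes, baud, bits_per_byte=10):
    """Time on the wire in milliseconds for an 8N1 serial stream."""
    return payload_bytes * bits_per_byte / baud * 1000.0


# 730 bytes per second of UBX messages at 115,200 baud, as described above.
burst_ms = frame_time_ms(730, 115_200)
print(f"{burst_ms:.1f} ms per transmission")
```

It covers only the time the bytes occupy the line, not any interrupt handling on the receiving side.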

The pulse width is configured to 1,000 us (1 ms)
And the receivers are staggered 100,000,000 ns (100 ms) apart

Which should conceptually look like this:

That is only the PPS-pulse, what about the rest of the data?
As it doesn’t matter what pin you trigger on the RS-232, it will send an IRQ as soon as the buffer fills in the UART.
So the pulses may be separated, but how about the other data?

To be sure, you really need to disconnect them or stop them from receiving.

IRQ’s are triggered on the ‘attention-pins’ like CTS/RTS/DCD but also when the RX/TX buffer is filling up and needs attention. As far as my memory goes on UART’s.

Another way is to disable the LPT ports: just turn them off in the BIOS, and then IRQ 5 and 7 are also free.

So you could set:

Com1 IRQ4
Com2 IRQ3
Com3 IRQ5
Com4 IRQ7

Also, with setserial, make sure all of them use the right IRQ; systemd should then show you something like this in its status output:

dec 05 12:41:15 server setserial[627]: /dev/ttyS0 at 0x03f8 (irq = 4) is a 16550A (low_latency)
dec 05 12:41:15 server setserial[633]: /dev/ttyS1 at 0x02f8 (irq = 3) is a 16550A (low_latency)
dec 05 12:41:15 server setserial[627]: /dev/ttyS2 at 0x03e8 (irq = 5) is a 16550A (low_latency)
dec 05 12:41:15 server setserial[633]: /dev/ttyS3 at 0x02e8 (irq = 7) is a 16550A (low_latency)

Above is a sample as I do not have 4 ports in use. The BIOS should allow you to use them once the printer-ports are disabled.

The sentences are transmitted after the pulse

Yes - that’s what I’m trying to do - for some reason I’m not being successful with assigning independent IRQs

Can you access the BIOS of the server?
As you need to set it in the BIOS.

That doesn’t matter.

The IRQ is given at any time the UART needs attention, can be PPS, databuffer-filling-up, can be anything else.
As soon as the UART has something to say, it generates an IRQ.

DCD was originally Data-Carrier-Detect, meaning the modem is being called to start transferring data, a bit like Ring-detect. This means that the CPU MUST pay attention to the UART and stop everything else it’s doing.

This happens on everything the UART(s) is(are) receiving.

When it’s sending sentences, the FIFO buffer will fill up (it is not large), and that will trigger a new IRQ.

As such UART’s are good for realtime-response, but they are bad for multitasking.

But you put 4 GPS on them with 2 IRQ’s…sorry mate…real bad idea, you need to mess with the BIOS or remove (physical) com3/4 from generating IRQ’s.

Or you could try to lower the COM speed to e.g. 9600, to generate fewer IRQs and hopefully keep the buffers from filling too quickly. Faster isn’t always better in this case.

In my opinion if you can’t reassign the IRQ’s in the BIOS, stick to 2 devices.

Yes - I can access the BIOS, I can change the COM ports from automatic IRQ assignments to manual - but I can only select a range, the BIOS does not let me select specific IRQ’s

The method I’m aware of to do this in FreeBSD (using the following in /boot/device.hints) does not seem to work and I’ve been troubleshooting it for a while…

hint.uart.0.at="isa"
hint.uart.0.port="0x3f8"
hint.uart.0.irq="4"
hint.uart.1.at="isa"
hint.uart.1.port="0x2f8"
hint.uart.1.irq="3"
hint.uart.2.at="isa"
hint.uart.2.port="0x3e8"
hint.uart.2.irq="5"
hint.uart.3.at="isa"
hint.uart.3.port="0x2e8"
hint.uart.3.irq="9"

I’ve got 4 GPS on 3 IRQ’s

I’m not sure I understand how a lower baud rate would generate fewer IRQs

Also - with the spacing I’ve employed, I don’t see how there would be IRQ conflicts
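The spacing argument can be checked with a rough Python sketch. It assumes each receiver is busy for its 1 ms pulse plus one 730-byte burst at 115,200 baud starting right after the pulse, and it models only time on the wire, not interrupt latency or service time. The names below are made up for illustration.

```python
STAGGER_MS = 100.0                          # receivers staggered 100 ms apart
PULSE_MS = 1.0                              # PPS pulse width
BURST_MS = 730 * 10 / 115_200 * 1000.0      # 730 bytes, 10 bits per byte


def busy_window(k):
    """(start, end) in ms within the second for receiver k (0-based)."""
    start = k * STAGGER_MS
    return start, start + PULSE_MS + BURST_MS


windows = [busy_window(k) for k in range(4)]
# Idle time between the end of one receiver's burst and the next receiver's pulse.
gaps = [windows[k + 1][0] - windows[k][1] for k in range(3)]
overlap = any(gap < 0 for gap in gaps)

for k, (start, end) in enumerate(windows):
    print(f"GPS{k + 1}: {start:6.1f} ms to {end:6.1f} ms")
print("overlap:", overlap)
```

With these assumptions each burst finishes well before the next receiver's pulse, so the windows do not overlap on the wire.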

I have updated the BIOS for my board (Supermicro X11SSN-H-WOHS) from rev 2.5 to rev 2.6 - and I think it has helped settle the Last Sample Error Margin in chrony

And I’ve managed to get the offsets from the 4 receivers (which I’m assuming are due to the serialized IRQ scheme the SuperIO chip uses to send data to the CPU) to line up on top of each other as best I can, so there is minimal difference between the receivers

refclock SOCK /var/run/chrony.cuau0.sock refid GPS1 offset -0.0000040
refclock SOCK /var/run/chrony.cuau1.sock refid GPS2 offset -0.1000030
refclock SOCK /var/run/chrony.cuau2.sock refid GPS3 offset -0.2000075
refclock SOCK /var/run/chrony.cuau3.sock refid GPS4 offset -0.3000000

This is helping to keep RMS offset below 1 us

Here’s some snips of the last 3 hrs



Here’s the individual Last Sample Error Margins…

Interesting that they all seem to settle around 500 ns, with occasional spikes to 3us.
Also interesting that some ports are more well behaved than others.

I think if I can flatten these 4 lines that would help increase precision of the system.

As you can see GPS Com1/2 seem to work fine, same for Com3, but because of Com4 using same IRQ as Com3 it’s spiking a lot.

The BIOS you use can be tricky. I believe you didn’t disable the LPT/parallel port; do that, then save and reboot.
Then go into the BIOS again and see if IRQ7 can be used. It should have been freed, as the printer port is disabled.

As for messages from the GPS, I would reduce them as much as you can and lower the speed, so the messages don’t fill buffers too much and then burst-into your ports/cpu’s.

Not saying that will solve it, but UARTs are not really good at high speeds. They were never designed to do much above 57K6; while they can do it, there is no need to go above 38K4.
Beware: if the GPS or UART errors on transmission, the stream has to be restarted.
That may explain the spikes.

Not saying it does. But it’s worth a try.

LPT was mentioned in the SuperIO chip’s datasheet, however the PC does not have a parallel port, so there is no LPT to disable in BIOS or elsewhere

Makes sense, and I’ve seen this recommended elsewhere.
I had been running 11 UBX sentences totaling 730 bytes - this was averaging 82 interrupts per second per COM port.

Interestingly, the shared IRQ was only showing 155 interrupts per second. I would have expected 164 (2 x 82), and I can’t explain the discrepancy…

So I’ve cut down on the number of sentences from 11 to 2, and total size is down from 730 bytes to 132 bytes, interrupt rate has dropped down significantly - from 82 interrupts per second to 17 - and the math now makes more sense on the shared IRQ.
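To put those interrupt counts in perspective, here is a small Python sketch that converts them into average bytes per interrupt and compares the shared-IRQ count with twice the single-port rate. It uses only the numbers quoted above; the variable names are made up for illustration.

```python
# (label, bytes per second, observed interrupts per second per COM port)
observations = [
    ("11 UBX messages", 730, 82),
    ("2 UBX messages", 132, 17),
]

bytes_per_irq = {}
for label, nbytes, irqs in observations:
    bytes_per_irq[label] = nbytes / irqs
    print(f"{label}: {bytes_per_irq[label]:.1f} bytes per interrupt")

# Shared IRQ with the original 11-message configuration.
expected_shared = 2 * 82
observed_shared = 155
shortfall = expected_shared - observed_shared
print(f"shared IRQ: expected {expected_shared}, observed {observed_shared}, "
      f"shortfall {shortfall} ({shortfall / expected_shared:.1%})")
```

Both configurations land near 8 bytes per interrupt, which would be consistent with a receive FIFO trigger level of around 8 bytes. That is an inference from these figures, not something measured here.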

Running GPSD with the -b flag (so it doesn’t change the settings):

So with this single change - I did have to change the offsets in the chrony config:

refclock SOCK /var/run/chrony.cuau0.sock refid GPS1 offset -0.000014118
refclock SOCK /var/run/chrony.cuau1.sock refid GPS2 offset -0.100004390
refclock SOCK /var/run/chrony.cuau2.sock refid GPS3 offset -0.200025739
refclock SOCK /var/run/chrony.cuau3.sock refid GPS4 offset -0.300002397

but it looks like I now have worse precision - error margins have both increased and are more spikey and RMS offset never gets near, much less below 1us



Here’s the individual Last Sample Error Margins…

Crazy that one of the COM ports on a shared IRQ is actually performing better than the others, and at a lower IRQ rate with fewer/smaller message payload everything seems to have gotten worse - I have no idea how to explain that…

Maybe the NAV messages are the problem.
I would try GPS and Galileo only.

On the u-blox forum they complain about NAV messages a lot.

As for IRQ5, it’s interesting: maybe the CPU scans both ports faster without being interrupted an extra time.
If that is the case, you should set them all on the same IRQ, but I would try different messages first.

However, I would try without offsets and delays, just the u-blox defaults but with only GPS+Galileo messages, as NAV is about navigation precision, not timing.

See what happens.

UBX-NAV-PVT reports UTC time; it’s the equivalent of the NMEA GGA and RMC sentences

Here is the payload for UBX-NAV-TIMELS

I’m not using IRQ 5, the IRQs in use are below:

Will do

Typo, sorry :smile:

Time reporting doesn’t say anything about accuracy.

We do know (as far as I know) that GPS and Galileo are the most exact at timekeeping.

Therefore my suggestion: see what they do compared to NAV timing.

If the rubbish is the same…we know it’s not the different sats/messages.
If it changes…well it’s the sats/messages.

Simple as that. I typically prefer GPS+Galileo, and I disable all others when possible.

What I’ve seen is different - here is a 24 hour snip of a different machine - Raspberry Pi CM5

Instead of feeding the GPS receiver into a COM port with the pulses on the DCD pin - the sentences are fed into the uart, and the pulses are sent to the SDP driving the PHC on the NIC. Instead of gpsd it is using satpulse to create the SOCK for chrony to read

The same receiver (LEA-M8T) with the same config, same antenna, and same constellations (GPS, Galileo, and GLONASS) achieves RMS offsets less than 200 ns.

That is fast, my RS-232 (DCD) is at 800.
I can do PHC, but I’m too lazy to do it…my NICs have it. :rofl:

But it seems you don’t use NAV anymore.

The snip where RMS offset was between 50-200 ns is from a Pi (tock) with the UBX NAV sentences - it was just for comparison - I think a lot of these issues are from the COM ports and not necessarily the sentences used. The 4 port COM machine is tick

I reset the ublox receivers all to defaults, enabled only GPS and Galileo, and GPSD operating with -b flag. No staggered offset user delays, no fixed survey location, default baud rate (9600)

My interrupts:

I did need to re-center the offsets in chrony (COMs 2 and 3 are close, but 1 and 4 are outliers).

refclock SOCK /var/run/chrony.cuau0.sock refid GPS1 offset +0.000036674
refclock SOCK /var/run/chrony.cuau1.sock refid GPS2 offset -0.000000359
refclock SOCK /var/run/chrony.cuau2.sock refid GPS3 offset +0.000000919
refclock SOCK /var/run/chrony.cuau3.sock refid GPS4 offset +0.000016856

Here are the rest of the stats:



RMS offset has lowered into the 1us region

If I can control the spikes in the error margin metrics I think that will help improve the RMS offset, but I don’t know what’s causing them.

For example:

I will run without the -b flag so GPSD will switch from NMEA mode to UBX mode and autoconfigured sentences - see if that changes anything we’re seeing here…

Running gpsd without the -b flag, so instead of reporting default NMEA sentences the receivers feed UBX sentences to GPSD to parse before sending data to the chrony SOCK. Some interesting observations:

Using GPSD’s default UBX instead of Ublox’s default NMEA - interrupt rate has gone down slightly from 70 to 65 interrupts per COM port per second



Doesn’t look like much change in the error margin graphs.

FreeBSD allows changing the FIFO trigger level:

In /boot/device.hints:
       hint.uart.0.disabled="1"
       hint.uart.0.baud="38400"
       hint.uart.0.port="0x3f8"
       hint.uart.0.flags="0x10"

       With flags encoded as:
       0x00010   device is potential system console
       0x00080   use this port for remote kernel debugging
       0x00100   set RX FIFO trigger level to ``low'' (NS8250 only)
       0x00200   set RX FIFO trigger level to ``medium low'' (NS8250 only)
       0x00400   set RX FIFO trigger level to ``medium high'' (default, NS8250 only)
       0x00800   set RX FIFO trigger level to ``high'' (NS8250 only)

I’ll try changing from the default “medium high” to “high” on the 4 COM ports - whatever that means…
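As a reading aid for the flag table quoted from the manpage, here is a minimal Python sketch that decodes a flags value into the listed meanings and formats the matching hint line. The bit values come straight from the excerpt above; the helper names `decode_flags` and `hint_line` are made up for illustration, and the sketch says nothing about whether a given board honors the setting.

```python
# Bit meanings as listed in the manpage excerpt above.
UART_FLAGS = {
    0x00010: "device is potential system console",
    0x00080: "use this port for remote kernel debugging",
    0x00100: "RX FIFO trigger level low",
    0x00200: "RX FIFO trigger level medium low",
    0x00400: "RX FIFO trigger level medium high (default)",
    0x00800: "RX FIFO trigger level high",
}


def decode_flags(flags):
    """Return the descriptions of every listed bit set in flags."""
    return [desc for bit, desc in UART_FLAGS.items() if flags & bit]


def hint_line(unit, flags):
    """Format a device.hints line for the given uart unit number."""
    return f'hint.uart.{unit}.flags="0x{flags:x}"'


print(decode_flags(0x10))
print(decode_flags(0x800))
for unit in range(4):
    print(hint_line(unit, 0x800))
```

Going by the table, selecting “high” on all 4 ports would mean a flags value of 0x800 for uart units 0 through 3.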

That won’t work, as it’s for the old 8250 chip; yours is the newer 1655x, which has a far bigger buffer.
I would be very surprised if your modern board will have this stone-age chip :smile:

Maybe silly, but you run Chrony over Socks, what happens if you run them over SHM?
Is that possible with FreeBSD?

As then you only need 1 NMEA clock/GPS and you can lock 3/4 PPSs to it.
As the SHM stream does not carry NMEA and PPS together, they are separated.

Or use the /dev/ppsx devices that GPSD creates, they only carry PPS not NMEA.
However you need to lock it, else it doesn’t know the average time.

Maybe something worth testing? Maybe the errors are created because of the combined data over socks? Just another thought.

I thought the same thing when I first read the manpage, however I think the note is just to distinguish the 8250 family from the SCC family

Running with the FIFO trigger level changed from “medium high” (default) to “high” did change the interrupt rate from 65 interrupts per second per com port to 38 interrupts per second per com port
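The size of that drop can be compared with what the trigger levels alone would predict. The Python sketch below assumes the usual 16550A receive trigger levels of 8 bytes for “medium high” and 14 bytes for “high”; that mapping is an assumption here and was not read back from the hardware.

```python
# Assumed 16550A RX FIFO trigger levels in bytes (not verified on this board).
MEDIUM_HIGH_BYTES = 8
HIGH_BYTES = 14

observed_before = 65   # interrupts per second per COM port at "medium high"
observed_after = 38    # interrupts per second per COM port at "high"

# If every interrupt drained exactly one trigger level's worth of bytes,
# the rate would scale by the ratio of the two levels.
predicted_after = observed_before * MEDIUM_HIGH_BYTES / HIGH_BYTES
observed_ratio = observed_after / observed_before
predicted_ratio = MEDIUM_HIGH_BYTES / HIGH_BYTES

print(f"predicted {predicted_after:.1f}/s, observed {observed_after}/s")
print(f"ratio: predicted {predicted_ratio:.2f}, observed {observed_ratio:.2f}")
```

The prediction of about 37 interrupts per second is close to the observed 38, which suggests the flag is taking effect on these ports; the small remainder could plausibly be pulse edges and FIFO timeouts that do not scale with the trigger level.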

Yes - I think I have been able to run with SHM instead of SOCK. I prefer SOCK because I only need 1 refclock instead of 2 per receiver: the SOCK combines both the sentences and the pulses, whereas I think with SHM I would need a line for each. I think I have some things I can continue to tweak in SOCK before changing to SHM.

I did some preliminary testing with this idea, trying to run some of the receivers in pulse-only mode. I did this by disabling all the sentences altogether and only running pulses in the receiver. The problem is that I think GPSD interprets the receivers as not having a valid lock, and therefore does not send time over the SOCK, even though the pulse is present. These are the devices I have available; note there is no PPS device:


In FreeBSD, GPSD does not create a PPS device, it is part of the cuau

Agreed, it’s worth at least a test. I anticipate there isn’t much difference between SOCK and SHM, but would be good to verify.

Here are the last 3 hours of using the highest FIFO buffer - Frequency and RMS offset graphs do look much improved (between 100-800 ns):




You do not need to pair SHM or /dev/ppsX with the (NMEA) normal time-source.
You can even pair it with a remote server that ticks correctly.

The PPS pulse via pps/shm only needs the time to be correct to within 40ms in order to make it more precise.

As SHM seems to be the fastest method (according to many), it’s the shortest path for the PPS to be sent to Chrony.

Every CPU cycle takes time, so the fewer CPU cycles you waste getting from A->B, the better.

Ergo, instead of processing 4x NMEA+PPS, this way you give 1x NMEA + 4x PPS to Chrony and lock all PPS to 1 NMEA, so Chrony doesn’t waste time processing all 4x NMEA.

That is the point I’m trying to make: why process something you do not need? Combining forces you to process them all. In short, CPU cycles are lost.