Is your NTP IPv4 blue line yo-yoing? How I fixed it (worked for me)

Hi all,

It took me months and months and an incredible amount of reading, testing and replacing hardware.

I finally fixed the time-outs in my CSV log; see here:

https://web.beta.grundclock.com/scores/77.109.90.72/log?limit=400&monitor=*

Three monitors, not a single time-out; all packets are accounted for.

You can see my server's scores here:

https://web.beta.grundclock.com/scores/77.109.90.72

Ignore the orange dots; they are my fault and don't matter.
They come from restarts and offset mistakes on my side, but every UDP packet shows up in the log, which keeps the blue line at the top.

How did I do this? Here are my settings:

1: Change your IPv4 network settings to use an MTU of 1488 instead of 1500. With 1500, the packet plus headers comes to about 1512 bytes; IPv6 cannot handle this size and drops the UDP packet.
TCP doesn't have this problem because it gets a "packet too big" message back, but UDP doesn't.
Check the MTU with netstat -i.

It looks like this:

root@server:~# netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
enp2s0 1488 17212099 0 94835 0 18826293 0 0 0 BMRU
lo 65536 22003579 0 0 0 22003579 0 0 0 LRU
root@server:~#

Do NOT change it for IPv6, only for IPv4. On Linux with ifupdown (/etc/network/interfaces), this is enough:

# The primary network interface

allow-hotplug enp2s0
iface enp2s0 inet static
address 192.168.1.50
netmask 255.255.255.0
gateway 192.168.1.1
dns-nameserver 192.168.1.1
dns-nameserver 1.1.1.1
dns-nameserver 212.71.0.33
mtu 1488
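To check the numbers yourself, here is a minimal Python sketch. It assumes the `netstat -i` text is the sample shown above, and that the IP headers are the minimum IPv4 header (20 bytes, no options) and the fixed IPv6 header (40 bytes, no extension headers). The function names are made up for illustration. It reads the MTU per interface and shows how much UDP payload fits in one unfragmented packet at that MTU.

```python
# Sketch: parse `netstat -i` output (sample copied from the post) and show
# the largest UDP payload that fits in one packet at each interface MTU.

IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header without extension headers
UDP_HEADER = 8    # bytes

SAMPLE = """\
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
enp2s0 1488 17212099 0 94835 0 18826293 0 0 0 BMRU
lo 65536 22003579 0 0 0 22003579 0 0 0 LRU
"""


def parse_netstat_mtus(text: str) -> dict[str, int]:
    """Return {interface: MTU}, locating columns by the header row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = next(r for r in rows if r[:1] == ["Iface"])
    iface_col, mtu_col = header.index("Iface"), header.index("MTU")
    mtus = {}
    for row in rows[rows.index(header) + 1:]:
        if len(row) == len(header):  # only full data rows
            mtus[row[iface_col]] = int(row[mtu_col])
    return mtus


def max_udp_payload(mtu: int, ip_version: int) -> int:
    """Largest UDP payload in one IP packet; the MTU includes the IP header."""
    ip_header = IPV4_HEADER if ip_version == 4 else IPV6_HEADER
    return mtu - ip_header - UDP_HEADER


if __name__ == "__main__":
    for iface, mtu in parse_netstat_mtus(SAMPLE).items():
        print(f"{iface}: MTU {mtu}, max UDP payload "
              f"{max_udp_payload(mtu, 4)} (IPv4) / {max_udp_payload(mtu, 6)} (IPv6)")
```

On a live Linux machine the same value is also visible in `/sys/class/net/enp2s0/mtu` or in the output of `ip link show enp2s0`. Note that the MTU covers the IP packet only; the 14-byte Ethernet header is added on top of it.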

2: If you have a wobbly TSC like my Intel machine, you need to use chrony.

Use about 6 to 8 sources or more; you really need them.

A cheap PPS GPS is configured something like this:

########### GPS settings to work with GPSD
refclock SHM 0 offset 0.135 delay 0.2 refid GPS
refclock SHM 1 offset 0.0005 refid PPS lock GPS prefer
###########
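The `lock GPS` option on the PPS line ties the two refclocks together. As I read chrony's documentation, the locked source's samples are used to fix the whole number of seconds of the PPS samples, because a PPS pulse only marks the start of *a* second, not which one. A minimal Python sketch of that idea (a hypothetical helper, not chrony's code), assuming the coarse GPS offset, after its `offset 0.135` correction, is within ±0.5 s of the truth:

```python
def lock_pps(coarse_offset: float, pps_offset: float) -> float:
    """Resolve a PPS offset, known only modulo 1 s, using a coarse offset.

    coarse_offset: offset (s) from the GPS/NMEA refclock, assumed within +/-0.5 s.
    pps_offset:    offset (s) measured at the PPS edge, assumed in [-0.5, 0.5).
    """
    # The coarse source decides which whole second the pulse belongs to...
    whole_seconds = round(coarse_offset - pps_offset)
    # ...and the PPS edge supplies the precise fraction.
    return whole_seconds + pps_offset


if __name__ == "__main__":
    print(lock_pps(0.137, 0.0004))   # NMEA is late, PPS is sharp
    print(lock_pps(2.1, -0.0003))    # clock is about 2 s off
```

The design point is that neither source is good enough alone: NMEA text arrives with a large, jittery delay, while PPS is precise but ambiguous by whole seconds.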

makestep 1 10 # Step the clock only if it is off by more than 1 s, and only during the first 10 updates; after that, slew gradually.
minsources 4 # Require at least 4 selectable sources before the clock is updated; that stops it from jumping (the green/orange dots).
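Going by chrony's documentation for these two directives, `makestep 1 10` allows a step only when the error exceeds 1 second and only during the first 10 clock updates (after that the clock is always slewed), and `minsources 4` keeps the clock from being updated at all until at least 4 sources are selectable. Here is a simplified Python model of that decision. The function names and the "hold/step/slew" labels are hypothetical, and chrony's real selection logic has many more inputs.

```python
def should_step(offset: float, update_number: int,
                threshold: float = 1.0, limit: int = 10) -> bool:
    """makestep threshold limit: step if |offset| > threshold, but only during
    the first `limit` clock updates (1-based); a negative limit means no limit."""
    in_window = limit < 0 or update_number <= limit
    return abs(offset) > threshold and in_window


def may_update_clock(selectable_sources: int, minsources: int = 4) -> bool:
    """minsources: update the clock only if enough sources are selectable."""
    return selectable_sources >= minsources


def clock_action(offset: float, update_number: int, selectable_sources: int) -> str:
    if not may_update_clock(selectable_sources):
        return "hold"  # too few usable sources: leave the clock alone
    if should_step(offset, update_number):
        return "step"  # large error shortly after start: jump
    return "slew"      # otherwise correct gradually


if __name__ == "__main__":
    print(clock_action(2.5, 1, 6))   # big error right after start
    print(clock_action(2.5, 11, 6))  # same error later on
    print(clock_action(2.5, 1, 3))   # not enough sources
```

This is also why having 6 to 8 sources helps: with `minsources 4`, losing one or two sources still leaves enough for the clock to keep being disciplined.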

That is IT!!!

It will turn this:

[Image: jumps]

into this:

[Image: Screenshot from 2020-02-29 19-03-52]

This works for any stratum; just make sure you have enough sources for it to compare against.
The MTU of 1488, however, is what removes all the time-outs.

I have been testing for 8 hours: 0% time-outs. My machine is very busy, and timekeeping is just a side task.

Without these changes, my blue line drops within a few hours.
The monitor is fine and there is no filtering going on; it's just IPv6 paths that kill our packets because they are too large.
This is one of the big problems with IPv6 when carrying IPv4 packets.

Please let me know if this works for you; I have been testing for more than 6 months.

Bas.

PS: If an MTU of 1488 doesn't solve it, try 1400; you may be too many hops away.

NTP packets are just a few dozen bytes. How would any MTU value cause problems for that?

For TCP it's typical to need a smaller MTU value over IPv6, but that's not even what you did. A quick Google search found someone who needed to make a similar change on your ISP because of its particular network setup (but again, not related to UDP or NTP): https://yeri.be/linux-gatewayrouter-unable-to-access-certain-https-sites

I’m glad your server is performing better now, but the MTU value itself wouldn’t have changed anything.
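To put numbers on that point: an NTP packet without extension fields or authentication is 48 bytes (RFC 5905). With an 8-byte UDP header and a 20-byte IPv4 or 40-byte IPv6 header (no options or extension headers assumed), it stays far below 1280 bytes, the minimum link MTU that every IPv6 link must support (RFC 8200). A small Python check of that arithmetic; the constant and function names are illustrative only:

```python
NTP_PACKET = 48             # bytes: NTP header without extension fields or MAC
UDP_HEADER = 8              # bytes
IP_HEADER = {4: 20, 6: 40}  # bytes: minimal IPv4 header, fixed IPv6 header
IPV6_MIN_MTU = 1280         # bytes: minimum link MTU required for IPv6


def ntp_packet_size(ip_version: int) -> int:
    """On-the-wire IP packet size of a plain NTP request or response."""
    return NTP_PACKET + UDP_HEADER + IP_HEADER[ip_version]


if __name__ == "__main__":
    for version in (4, 6):
        size = ntp_packet_size(version)
        print(f"IPv{version}: {size} bytes, "
              f"fits in a {IPV6_MIN_MTU}-byte MTU: {size <= IPV6_MIN_MTU}")
```

Whether the interface MTU is 1500, 1488 or 1400, a packet of under 100 bytes is never large enough to be affected.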


And we dropped again on 2 of the 3 monitors.
LA kept humming along.

So I added a new directive to chrony:

ratelimit interval 2 burst 16

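Somebody told me the monitor's queries arrive within 4 to 5 seconds of each other, so chrony's default minimum interval of 8 seconds is too long.
It now allows one response every 2^2 = 4 seconds with bursts of up to 16; the default is 2^3 = 8 seconds with bursts of up to 8.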
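To see why the interval matters, here is a simplified token-bucket model of `interval` and `burst` in Python. It is a sketch under stated assumptions, not chrony's implementation: it tracks a single client, gains one token every 2^interval seconds up to `burst`, and ignores chrony's `leak` option (which, per the documentation, randomly lets some otherwise-limited responses through). The 4-second query period is the figure from the post, used here only as an example.

```python
class TokenBucket:
    """Simplified per-client limiter: one token per 2**interval_log2 s, at most `burst` saved."""

    def __init__(self, interval_log2: int, burst: int) -> None:
        self.rate = 1.0 / (2 ** interval_log2)  # tokens gained per second
        self.burst = burst
        self.tokens = float(burst)              # start with a full bucket
        self.last = 0.0                         # time of the previous request

    def allow(self, now: float) -> bool:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # respond
        return False      # drop: the client sees a time-out


def count_drops(interval_log2: int, burst: int, period: float, requests: int) -> int:
    """Number of dropped requests for a client querying every `period` seconds."""
    bucket = TokenBucket(interval_log2, burst)
    return sum(not bucket.allow(i * period) for i in range(requests))


if __name__ == "__main__":
    print("interval 3, burst 8: ", count_drops(3, 8, 4.0, 100), "drops")
    print("interval 2, burst 16:", count_drops(2, 16, 4.0, 100), "drops")
```

In this model, a client querying every 4 seconds against the defaults gets its first 15 answers, after which about every other request is dropped; with `interval 2 burst 16` the bucket refills as fast as it drains and nothing is dropped.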

As usual the documentation is poor and complicated :slight_smile:

Maybe it's the rate limiter causing these problems?

I know UDP is the "send but never receive" joke protocol.

However, the time-outs are never caused by bad timekeeping; that part is solved by these:

maxupdateskew 1000
makestep 1 10
minsources 4
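`maxupdateskew` is given in parts per million. According to chrony's documentation, it is the threshold above which the estimated skew (the uncertainty of the clock's frequency estimate) is considered too unreliable to be used for updating the clock, and 1000 ppm is also its documented default. A small Python sketch of the unit conversion and the comparison; the helper names are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """A frequency error of `ppm` parts per million, as seconds gained or lost per day."""
    return ppm * SECONDS_PER_DAY / 1_000_000


def skew_usable(skew_ppm: float, maxupdateskew_ppm: float = 1000.0) -> bool:
    """True if a frequency estimate with this skew would be trusted for a clock update."""
    return skew_ppm <= maxupdateskew_ppm


if __name__ == "__main__":
    print(ppm_to_seconds_per_day(1000), "s/day at 1000 ppm")
    print(ppm_to_seconds_per_day(10), "s/day at 10 ppm")
```

For scale, 1000 ppm corresponds to about 86.4 seconds per day, so this threshold only rejects very poorly settled frequency estimates.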

Fingers crossed again; it's happening less than before, and I'm running it on the troubled system.