Sawtooth graph - every time score is over 10 the next check times out

janderson · October 17, 2020, 3:31pm

New admin here. I’ve tried two different servers, an RPi stratum 2 and a Galleon GPS stratum 1. Every time my score goes over 10 the next check times out. This has been happening since I started trying to participate, several days now. Has anyone seen this before? I’m on an ATT gigafiber link with an IP that hasn’t changed in 2 years and a Cisco 5515-X firewall.

apuls · October 17, 2020, 5:29pm

Hi and welcome

What “Net Speed” setting did you set up on the pool management site for your server? Try setting it to the lowest value (384kb), then wait and observe whether it gets better.

You can also test the route to the monitoring server:

IPv6
mtr -u -P 123 2604:1380:2:6000::15
IPv4
mtr -u -P 123 139.178.64.42

And there is the Beta Site with additional monitors.

You can register / add your server too - it’s only for monitoring.

janderson · October 17, 2020, 11:34pm

I had it set to 100mb, I’ve reduced it to 384. Will report back, thanks!

stevesommars · October 18, 2020, 2:34pm

In addition to reachability problems this server currently (2020-10-18 14:20) has a time offset of ~100 msec, which is not normal for an NTP server synchronized to GPS+PPS. If this server lacks a PPS it will be difficult to provide accurate time.

janderson · October 19, 2020, 12:19am

The only reason it’s even that close is because I have “fudge 127.127.28.1 time1 -0.300” set on it. I just set it to -0.400.

I’m trying to determine how/if it’s using the PPS reference. When I open up the case there’s a custom board with the GPS antenna plugged into it, and it’s got a PPS LED blinking at the rate one would expect. That board is connected to the MB via 2 serial ports.

elljay · October 19, 2020, 11:35am

ntpq -p will show you which sources are used: “*” is the active one, “+” are possible candidates and “-” are discounted.

Something like gpsmon will give you a display of the status of the GPS. There should be a “PPS” entry showing the current offset from the PPS signal if all is ok.
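
For anyone who wants to script that check, here is a minimal Python sketch that reads the first character of each peer line in `ntpq -p` style output and maps it to the meanings described above. The sample text, the peer addresses and the function name `classify_peers` are made up for illustration; only the three tally characters come from this post.

```python
# Tally characters and their meanings as described in the post above.
TALLY = {
    "*": "active source",
    "+": "possible candidate",
    "-": "discounted",
}

# Hypothetical sample in the usual `ntpq -p` layout (not real output).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*SHM(1)          .PPS.            0 l   12   16  377    0.000    0.002   0.001
+192.0.2.10      .GPS.            1 u   33   64  377   12.345    0.120   0.050
-192.0.2.20      198.51.100.1     2 u   40   64  377   25.000    1.500   0.300
 192.0.2.30      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""


def classify_peers(ntpq_output):
    """Map each peer name to the meaning of its leading tally character."""
    peers = {}
    for line in ntpq_output.splitlines():
        # Skip blank lines, the column header and the "====" separator.
        if not line.strip():
            continue
        if line.lstrip().startswith("remote") or line.startswith("="):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        # The tally character is always the first column of the line.
        peers[fields[0]] = TALLY.get(line[0], "other / not selected")
    return peers


if __name__ == "__main__":
    for name, meaning in classify_peers(SAMPLE).items():
        print(name, "->", meaning)
```

On a real server you would feed it the text printed by ntpq -p instead of SAMPLE.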

janderson · October 20, 2020, 1:56pm

I’ve replaced the Galleon GPS device with an EndRun TempusLX (CDMA) - it’s having trouble picking up the CDMA-PCS signal in my area but I have it configured with some Stratum1 servers and it’s holding time at Stratum 2 pretty stable for the last 18 hours. I believe it has an internal oscillator that’s contributing to this. I’m going to adjust the antenna placement today to see if I can pick up the signal and move to Stratum 1. I also adjusted the Net Speed up to 1mb and am not seeing the original (sawtooth) issue yet. I’ll keep adjusting up until I find the breakpoint.

stevesommars · October 20, 2020, 4:37pm

The server time stability and accuracy are better now that it’s at stratum 2.

I’m very interested in the Net Speed experiments.

In the US CDMA service is being phased out by the major cellular carriers. Several NTP Pool volunteers have seen coverage deteriorate in the last year. See https://usatcorp.com/verizon-extends-cdma-network-date/

thom · October 21, 2020, 2:51pm

I am kind of experiencing the same. My NTP server is behind a Cisco ASA 5506 on a 50 Mbit fiber connection. The graphs became much worse after upgrading Ubuntu 18.04 LTS to Ubuntu 20.04 LTS.

I ran the ntpperf test on the local network and the server itself can handle 55k+ requests per second without dropping a request.

mlichvar · October 21, 2020, 4:28pm

Is connection tracking disabled for NTP on that firewall? If not, that could cause the server to be kicked out as soon as it gets a larger number of clients.

thom · October 22, 2020, 9:15am

Connection tracking is disabled for NTP traffic.

NTPman · October 22, 2020, 9:58am

Is connection tracking disabled not only on the ASA router, but on the server as well?

thom · October 22, 2020, 10:38am

The conntrack package is not installed, so that should not be the problem.

mlichvar · October 22, 2020, 11:04am

A conntrack package might be just some extra tools, not required for the kernel support. Try

wc -l /proc/net/nf_conntrack

to make sure the number of connections is small.
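
If you want a bit more detail than the line count, a rough Python sketch along these lines counts all entries and the ones that involve UDP port 123. The helper names are made up for illustration, and the sketch assumes the usual one-entry-per-line layout of that file with `sport=` and `dport=` fields; it returns None when the file does not exist.

```python
from pathlib import Path


def count_entries(lines):
    """Return (total_entries, ntp_entries) for conntrack-style lines."""
    total = 0
    ntp = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        total += 1
        # Count UDP entries where either side uses port 123 (NTP).
        if "udp" in fields and ("sport=123" in fields or "dport=123" in fields):
            ntp += 1
    return total, ntp


def count_conntrack(path="/proc/net/nf_conntrack"):
    """Count entries in the conntrack table file, or None if it is absent."""
    table = Path(path)
    if not table.exists():
        return None
    with table.open() as fh:
        return count_entries(fh)


if __name__ == "__main__":
    result = count_conntrack()
    if result is None:
        print("conntrack table file not present")
    else:
        print("total entries: %d, NTP entries: %d" % result)
```

Reading the file normally needs root, so run it with sudo.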

thom · October 22, 2020, 7:05pm

$ sudo wc -l /proc/net/nf_conntrack
wc: /proc/net/nf_conntrack: No such file or directory

stevesommars · October 22, 2020, 9:53pm

FYI, the poor reachability began 2020-09-29.

Try running
mtr --udp -P 123 139.178.64.42
from the NTP server when the Newark score is low. If the problem is NTP loss in the network, this command may show where it is happening.

thom · October 23, 2020, 3:59am

I have never used the mtr command before. What is weird to me is that the route changes between runs 3 minutes apart. Is this normal?

$ sudo mtr -w -c100 --udp -P 123 139.178.64.42
Start: 2020-10-23T03:03:03+0000
HOST: ntp                                 Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1.|-- 139.156.151.67                      0.0%   100    3.7    4.8    2.4  121.7   12.4
 2.|-- ???                                100.0   100    0.0    0.0    0.0    0.0    0.0
 3.|-- asd2-rou-1022.NL.eurorings.net     77.0%   100    4.8    5.1    4.6    9.6    1.0
 4.|-- asd-s7-rou-1042.NL.eurorings.net   28.0%   100    5.5    6.0    4.3    8.8    0.8
 5.|-- asd-s17-rou-1041.NL.eurorings.net  50.0%   100    5.0    9.1    5.0   85.3   15.7
 6.|-- 213.46.191.62                      79.0%   100    4.6    8.9    4.2   86.8   18.0
 7.|-- ae11.cs3.ams10.nl.zip.zayo.com     98.0%   100   83.4   83.7   83.4   84.0    0.4
 8.|-- ae5.cs3.lga5.us.eth.zayo.com       96.0%   100   85.2   84.2   83.3   85.2    0.8
 9.|-- ae15.er1.lga5.us.zip.zayo.com      71.0%   100   89.0   83.9   83.2   89.8    1.6
10.|-- ae15.er1.lga5.us.zip.zayo.com      55.0%   100   84.0   89.5   83.8  147.7   11.5
11.|-- 0.et-0-0-1.bsr2.ewr1.packet.net    12.0%   100   91.0   91.4   82.7  170.4   13.3
12.|-- 64.125.54.26.available.above.net   31.0%   100   94.3   94.6   89.7  204.8   15.8
13.|-- 901.et-0-0-7.bsr1.ewr1.packet.net  33.0%   100   93.4  101.6   92.9  187.4   16.5
14.|-- 147.75.98.105                      73.0%   100  110.2  112.3   99.0  150.3    8.7
15.|-- 147.75.98.105                      58.0%   100  107.2  112.3  101.1  202.8   17.9
16.|-- ???                                100.0   100    0.0    0.0    0.0    0.0    0.0

$ sudo mtr -w -c100 --udp -P 123 139.178.64.42
Start: 2020-10-23T03:05:53+0000
HOST: ntp                                 Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1.|-- 139.156.151.67                      0.0%   100    2.6    2.9    2.4    8.5    0.9
 2.|-- ???                                100.0   100    0.0    0.0    0.0    0.0    0.0
 3.|-- asd2-rou-1022.NL.eurorings.net     79.0%   100    4.7    5.0    4.7    6.9    0.6
 4.|-- asd-s7-rou-1042.NL.eurorings.net   31.0%   100    6.4    6.3    4.8   17.8    1.8
 5.|-- nl-ams04a-ri3-ae-8-0.aorta.net     54.0%   100    5.4    6.1    5.0   15.7    2.3
 6.|-- 213.46.191.62                      83.0%   100    4.3   14.8    4.2   87.8   26.8
 7.|-- ae11.cs3.ams10.nl.zip.zayo.com     93.0%   100   84.4   84.7   83.1   88.4    1.8
 8.|-- ae10.cs1.lhr15.uk.eth.zayo.com     99.0%   100   83.0   83.0   83.0   83.0    0.0
 9.|-- ae15.er1.lga5.us.zip.zayo.com      80.0%   100   83.4   83.9   83.2   90.6    1.6
10.|-- ae15.er1.lga5.us.zip.zayo.com      60.0%   100   83.9   86.3   83.8   92.1    3.2
11.|-- 0.et-0-0-1.bsr2.ewr1.packet.net    14.0%   100   93.6   89.4   82.6  143.6    9.5
12.|-- 64.125.54.26.available.above.net   38.0%   100   89.8   95.3   89.7  183.0   16.4
13.|-- 901.et-0-0-7.bsr1.ewr1.packet.net  29.0%   100  116.3  101.9   92.9  170.8   14.9
14.|-- 147.75.98.105                      67.0%   100  113.2  110.2  104.7  115.6    3.4
15.|-- 147.75.98.105                      61.0%   100  111.4  109.2   98.7  122.2    4.1
16.|-- ???                                100.0   100    0.0    0.0    0.0    0.0    0.0

stevesommars · October 23, 2020, 3:32pm

The varying path is normal. The traceroute and mtr commands use varying UDP source and destination ports (these can be configured).

A particular router may have multiple egress ports available for a particular IP destination. [The multiple paths often exist for capacity reasons.] The router often uses the tuple (IP src, IP dst, UDP/TCP srcport, UDP/TCP dstport) to determine which egress port to use. Using multiple egress ports for one application flow may create missequencing that causes problems.
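
To illustrate the idea, here is a small Python sketch of hash-based egress selection. It is only a model: real routers use their own vendor-specific hash functions, and the SHA-256 hash, the function name `pick_egress`, the 192.0.2.1 source address and the four-link setup are all assumptions made for the example.

```python
import hashlib


def pick_egress(src_ip, dst_ip, src_port, dst_port, n_links):
    """Pick one of n_links egress ports from a hash of the flow fields.

    SHA-256 is used only for illustration; actual router hashes differ.
    """
    key = "{}|{}|{}|{}".format(src_ip, dst_ip, src_port, dst_port).encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 4 bytes of the digest to a link index.
    return int.from_bytes(digest[:4], "big") % n_links


if __name__ == "__main__":
    # A fixed flow always maps to the same link ...
    print(pick_egress("192.0.2.1", "139.178.64.42", 123, 123, 4))
    # ... while probes with varying source ports can spread over links.
    for sport in range(33000, 33005):
        print(sport, pick_egress("192.0.2.1", "139.178.64.42", sport, 123, 4))
```

Because the ports are part of the hash input, probes that differ only in port number can leave on different links, which is why consecutive mtr runs can show different intermediate hops.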

MartianRabbit · October 23, 2020, 8:49pm

Hi Janderson,

The time1 offset you set up on the original Osc needs to be tuned better. Take the output of the Osc stats and average 24 hours’ worth of data together. Take this value and likely subtract it from what you currently have set. The current offset on your graphs is way too far in the positive direction.

If you can get the variance to swing around zero, the daemon will work better. This of course can depend on the Osc, but in most cases you can get it to average out, swinging between negative and positive.

Serial adaptors can be very unstable depending on the manufacturer; I use Sabrent serial adaptors. The time1 offset is meant to remove the delay of the physical interface, but it needs to be adjusted in small increments of microseconds, not tenths of a second.

Not sure if this scenario will work with your setup, but is something to consider.
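
As a rough sketch of that averaging step in Python: the function name `suggest_time1` and the sample numbers are made up, the samples are assumed to be offsets in seconds collected over about 24 hours, and the subtraction follows the "likely subtract" suggestion above, so check the sign against your own stats before applying anything.

```python
from statistics import mean


def suggest_time1(current_time1, offsets_s):
    """Suggest a new time1 value in seconds from offset samples in seconds.

    Assumption: the mean offset is subtracted from the current time1.
    The result is rounded to microsecond resolution.
    """
    if not offsets_s:
        raise ValueError("need at least one offset sample")
    return round(current_time1 - mean(offsets_s), 6)


if __name__ == "__main__":
    # Hypothetical samples; real data would come from the server's stats.
    samples = [0.100, 0.110, 0.090]
    print("suggested time1:", suggest_time1(-0.300, samples))
```

Once the average is close to zero, the same calculation gives the small microsecond-level corrections.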
