Only one monitoring server can connect

Since about 24 hours ago, only the monitoring server ussjc1-1a6a7hp can connect to my NTP servers:
https://www.pool.ntp.org/scores/46.22.24.205
https://www.pool.ntp.org/scores/2a02:1368:6400:cd10::10
All other monitors get an i/o timeout.
A Wireshark investigation showed that my server still responds to incoming queries at a rate of about 600 packets/s. Does anyone have an idea what is happening?

@fratzr , welcome to the community!

When I monitor your server with my tool ntpmon (GitHub: bruncsak/ntpmon, NTP server reachability monitoring), it seems that the drops are periodic:

[root@localhost ~]# tail 1687332715_*.log ; echo
==> 1687332715_2a02:1368:6400:cd10::10.log <==
06-21 07:30    20 @@@.................@@@................@.@@................@@.@................@@@................@.@@..........
06-21 07:32    20 ......@@.@................@@.@................@@.@................@@.@................@@@................@.@@...
06-21 08:00    20 .............@@.@................@@.@........
==> 1687332715_46.22.24.205.log <==
06-21 07:27 19.99 @@@.................@@@.................@@@................@.@@................@@.@................@@@..........
06-21 07:46 -79.3 ......@.@@................@.@@...............@..@@...............@..@@...............@.@.@..............@..@.@..
06-21 07:58 19.99 ............@.@..@..............@.@..@.......
[root@localhost ~]#

It looks like you are having some local problem. It would be great to know what kind of device you are using; I guess Unix with ntpd or chrony. Do you have any rate limit configured? It is unlikely to be the problem, though, as in each batch we always get exactly three replies. Otherwise, the most likely cause is resource exhaustion on the router or on the time server itself. Have you disabled connection tracking for the NTP traffic?

By the way, I do not think that the monitor ussjc2-1a6a7hp really has good reachability to your server. It is rather an artifact: the last active monitor cannot be removed, or its score is not updated properly. @ask, what is your take on that?

Hi NTPman, thanks for investigating. Yes, I'm using Linux with chrony. My ratelimit is set to interval 4 burst 4.
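For reference, that corresponds to this directive in chrony.conf (a sketch; in chrony, the interval value is the base-2 logarithm of the minimum average response interval in seconds, so interval 4 means roughly one response per 16 s per client once the burst allowance is used up):

```
# /etc/chrony.conf
# allow bursts of up to 4 responses, then at most
# one response per 2^4 = 16 s per client on average
ratelimit interval 4 burst 4
```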

I’ve been using these settings (including firewall, router, and server) for years without any change.

For testing I commented out “ratelimit” and restarted chrony. So far it does not seem to change anything.

Thanks for testing that option. Have you disabled connection tracking? It can be done with iptables firewall rules (in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables):

-A PREROUTING -p udp --dport 123 -j CT --notrack
-A OUTPUT -p udp --sport 123 -j CT --notrack
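A note for anyone copying these rules: the CT target is only valid in the raw table, so in the iptables-save format used by those files the rules sit under a *raw section. A minimal sketch (IPv4; the ip6tables file is analogous):

```
# /etc/sysconfig/iptables
*raw
-A PREROUTING -p udp --dport 123 -j CT --notrack
-A OUTPUT -p udp --sport 123 -j CT --notrack
COMMIT
```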

The router may be having trouble, too.

I think I was wrong in my assumption that the ussjc2-1a6a7hp monitor's results are an artifact. It shows nicely variable offsets for each measurement.


Yes, just double-checked: connection tracking is switched off (four rules, PREROUTING and OUTPUT, for both IPv4 and IPv6).

Maybe it helps to see three packet exchanges. From the time between the conversations you can see that we still receive and answer about 600 packets/second:

tcpdump -n -i eno1 port 123

16:02:19.723641 IP 62.2.105.242.61296 > 192.168.10.10.ntp: NTPv4, Client, length 48
16:02:19.723718 IP 192.168.10.10.ntp > 62.2.105.242.61296: NTPv4, Server, length 48
16:02:19.723922 IP 14.119.194.69.radclientport > 192.168.10.10.ntp: NTPv4, Client, length 48
16:02:19.723960 IP 192.168.10.10.ntp > 14.119.194.69.radclientport: NTPv4, Server, length 48
16:02:19.723991 IP6 2a01:2a8:a13c:1:6801:e6ff:feec:2722.59125 > 2a02:1368:6400:cd10::10.ntp: NTPv4, Client, length 48
16:02:19.724029 IP6 2a02:1368:6400:cd10::10.ntp > 2a01:2a8:a13c:1:6801:e6ff:feec:2722.59125: NTPv4, Server, length 48
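For context, those “length 48” packets are bare NTPv4 headers. A minimal sketch (Python, with a hypothetical helper name) of the mode-3 client request that monitors and clients send:

```python
import struct

def make_ntp_client_packet() -> bytes:
    """Build a minimal 48-byte NTPv4 client (mode 3) request,
    matching the 'NTPv4, Client, length 48' packets in the capture."""
    # First byte: LI = 0 (no warning), VN = 4, Mode = 3 (client) -> 0x23
    first_byte = (0 << 6) | (4 << 3) | 3
    # The remaining 47 bytes (stratum, poll, precision, timestamps)
    # may be left zero for a simple request.
    return struct.pack("!B47x", first_byte)

packet = make_ntp_client_packet()
assert len(packet) == 48
```

Sending this with socket.sendto(packet, (server_address, 123)) and timing the server's mode-4 reply is essentially what the pool monitors do.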

Is the packet loss on the inbound or outbound path? Could you do a packet capture for the IP 156.106.202.71, please? There should be one packet every 8 seconds. What kind of router are you using?
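For example, a capture limited to that monitor (standard tcpdump filter syntax, interface name taken from your earlier command) could look like:

```
tcpdump -n -i eno1 host 156.106.202.71 and port 123
```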

NTPman, you are great. By pointing me to this source address, I was able to figure out the problem.
I use pfSense as firewall/router and had set the “Maximum state entries per host” for UDP port 123 to 4.
This was intended to reduce traffic from bogus hosts, but it blocked many of the requests.
I removed this restriction on my router, and now it works. So the problem is solved for me.
Maybe it should be mentioned that the new monitoring system sends queries more often.
Thank you very much for your help.
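For anyone hitting the same issue, a rough back-of-the-envelope check of why a per-host cap of 4 state entries bites. The state lifetime is an assumption (pfSense's default for established UDP states is on the order of 60 seconds); the query interval is the roughly 8 seconds mentioned above:

```python
# Assumed values -- check your firewall's actual UDP state timeout
udp_state_lifetime_s = 60  # assumed pfSense UDP state lifetime
query_interval_s = 8       # one monitor packet roughly every 8 s

# Each query from a fresh source port opens a new state that lingers
# for the full lifetime, so this many states per monitoring host
# coexist on average:
concurrent_states = udp_state_lifetime_s / query_interval_s  # 7.5

# Well above a per-host cap of 4, so later queries get dropped.
assert concurrent_states > 4
```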


I have NTP monitors that are not part of the NTP pool monitoring system located in the US (Illinois, Colorado, New York) and London. All of them showed 99%+ response rate from the given IPv4 and IPv6 addresses for several days. A second monitor also located in Illinois showed only ~75% goodput.

Could be a path-dependent loss.

Nah, seems like it was the pfSense.

The path is OK from FRA (pool monitor, Germany) towards CH (Switzerland).
Sadly only over transit.

  1. redacted (redacted)  0%   0.3 ms   [REDACTED] REDACTED, DE
  2. ???   100% *   (No reply)
  3. 100ge0-77.core2.zrh2.he.net (184.105.65.29)   20% 8.8 ms   [AS6939] HURRICANE, US
  4. arcade-solutions-ag.10gigabitethernet10-1.core1.zrh2.he.net (216.66.93.130)   0%   7.1 ms   [AS6939] HURRICANE, US
  5. 10.119.154.5    0%  7.8 ms    BOGON  rfc1918 (Private Space)
  6. smtp.irtech.ch (46.22.24.205)  0%  8.2 ms  [AS51873] AS-ARCADE, CH
