Only one monitoring server can connect

Since about 24 hours ago, only the monitoring server ussjc1-1a6a7hp can connect to my NTP servers:
https://www.pool.ntp.org/scores/46.22.24.205
https://www.pool.ntp.org/scores/2a02:1368:6400:cd10::10
All other monitors get an i/o timeout.
A Wireshark investigation showed that my server still responds to incoming queries at a rate of about 600 packets/s. Does anyone have an idea what is happening?

@fratzr , welcome to the community!

When I monitor your server with my tool ntpmon (GitHub: bruncsak/ntpmon, NTP server reachability monitoring), it seems that the drops are periodic:

[root@localhost ~]# tail 1687332715_*.log ; echo
==> 1687332715_2a02:1368:6400:cd10::10.log <==
06-21 07:30    20 @@@.................@@@................@.@@................@@.@................@@@................@.@@..........
06-21 07:32    20 ......@@.@................@@.@................@@.@................@@.@................@@@................@.@@...
06-21 08:00    20 .............@@.@................@@.@........
==> 1687332715_46.22.24.205.log <==
06-21 07:27 19.99 @@@.................@@@.................@@@................@.@@................@@.@................@@@..........
06-21 07:46 -79.3 ......@.@@................@.@@...............@..@@...............@..@@...............@.@.@..............@..@.@..
06-21 07:58 19.99 ............@.@..@..............@.@..@.......
[root@localhost ~]#

It looks like you are having some local problem. It would be great to know what kind of device you are using; I guess Unix with ntpd or chrony. Do you have any rate limit configured? It is unlikely to be the problem, though, as in each batch we always get exactly three replies. Otherwise, the most likely cause is resource exhaustion on the router or on the time server itself. Have you disabled connection tracking for the NTP traffic?

By the way, I do not think that the monitor ussjc2-1a6a7hp really has good reachability to your server. It is rather an artifact: the last active monitor cannot be removed, or its score is not updated properly. @ask, what is your take on that?

Hi NTPman, thanks for investigating. Yes, I'm using Linux with chrony. My ratelimit is set to interval 4 burst 4.
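For reference, that corresponds to this directive in chrony.conf (a sketch; in chrony, the interval value is the base-2 logarithm of the minimum average response interval in seconds, so interval 4 means roughly one response per 16 s per client once the burst allowance is used up):

```
# /etc/chrony.conf
# allow bursts of up to 4 responses, then at most
# one response per 2^4 = 16 s per client on average
ratelimit interval 4 burst 4
```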

I’ve been using these settings (including firewall, router, and server) for years without any change.

For testing I commented out “ratelimit” and restarted chrony. So far it does not seem to change anything.

Thanks for testing that option. Have you disabled connection tracking? It can be done with iptables firewall rules (in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables):

-A PREROUTING -p udp --dport 123 -j CT --notrack
-A OUTPUT -p udp --sport 123 -j CT --notrack
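A note for anyone copying these rules: the CT target is only valid in the raw table, so in the iptables-save format used by those files the rules sit under a *raw section. A minimal sketch (IPv4; the ip6tables file is analogous):

```
# /etc/sysconfig/iptables
*raw
-A PREROUTING -p udp --dport 123 -j CT --notrack
-A OUTPUT -p udp --sport 123 -j CT --notrack
COMMIT
```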

The router may be having trouble, too.

I think I was wrong in my assumption that the ussjc2-1a6a7hp monitor's results are an artifact. It shows nicely variable offsets for each measurement.


Yes, just double-checked: connection tracking is switched off (four rules, PREROUTING and OUTPUT, for both IPv4 and IPv6).

Maybe it helps to see three packet exchanges. From the time between the conversations you can see that we still receive and answer about 600 packets/second:

tcpdump -n -i eno1 port 123

16:02:19.723641 IP 62.2.105.242.61296 > 192.168.10.10.ntp: NTPv4, Client, length 48
16:02:19.723718 IP 192.168.10.10.ntp > 62.2.105.242.61296: NTPv4, Server, length 48
16:02:19.723922 IP 14.119.194.69.radclientport > 192.168.10.10.ntp: NTPv4, Client, length 48
16:02:19.723960 IP 192.168.10.10.ntp > 14.119.194.69.radclientport: NTPv4, Server, length 48
16:02:19.723991 IP6 2a01:2a8:a13c:1:6801:e6ff:feec:2722.59125 > 2a02:1368:6400:cd10::10.ntp: NTPv4, Client, length 48
16:02:19.724029 IP6 2a02:1368:6400:cd10::10.ntp > 2a01:2a8:a13c:1:6801:e6ff:feec:2722.59125: NTPv4, Server, length 48
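For context, those “length 48” packets are bare NTPv4 headers. A minimal sketch (Python, with a hypothetical helper name) of the mode-3 client request that monitors and clients send:

```python
import struct

def make_ntp_client_packet() -> bytes:
    """Build a minimal 48-byte NTPv4 client (mode 3) request,
    matching the 'NTPv4, Client, length 48' packets in the capture."""
    # First byte: LI = 0 (no warning), VN = 4, Mode = 3 (client) -> 0x23
    first_byte = (0 << 6) | (4 << 3) | 3
    # The remaining 47 bytes (stratum, poll, precision, timestamps)
    # may be left zero for a simple request.
    return struct.pack("!B47x", first_byte)

packet = make_ntp_client_packet()
assert len(packet) == 48
```

Sending this with socket.sendto(packet, (server_address, 123)) and timing the server's mode-4 reply is essentially what the pool monitors do.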

Is the packet loss on the inbound or outbound path? Could you do a packet capture for the IP 156.106.202.71, please? There should be one packet every 8 seconds. What kind of router are you using?
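For example, a capture limited to that monitor (standard tcpdump filter syntax, interface name taken from your earlier command) could look like:

```
tcpdump -n -i eno1 host 156.106.202.71 and port 123
```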

NTPman, you are great. By pointing me to this source address, I was able to figure out the problem.
I use pfSense as firewall/router and had set the “Maximum state entries per host” for UDP port 123 to 4.
This was intended to reduce traffic from bogus hosts, but it blocked many of the requests.
I removed this restriction on my router, and now it works. So the problem is solved for me.
Maybe it should be mentioned that the new monitoring system sends queries more often.
Thank you very much for your help.
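For anyone hitting the same issue, a rough back-of-the-envelope check of why a per-host cap of 4 state entries bites. The state lifetime is an assumption (pfSense's default for established UDP states is on the order of 60 seconds); the query interval is the roughly 8 seconds mentioned above:

```python
# Assumed values -- check your firewall's actual UDP state timeout
udp_state_lifetime_s = 60  # assumed pfSense UDP state lifetime
query_interval_s = 8       # one monitor packet roughly every 8 s

# Each query from a fresh source port opens a new state that lingers
# for the full lifetime, so this many states per monitoring host
# coexist on average:
concurrent_states = udp_state_lifetime_s / query_interval_s  # 7.5

# Well above a per-host cap of 4, so later queries get dropped.
assert concurrent_states > 4
```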


I have NTP monitors that are not part of the NTP pool monitoring system located in the US (Illinois, Colorado, New York) and London. All of them showed 99%+ response rate from the given IPv4 and IPv6 addresses for several days. A second monitor also located in Illinois showed only ~75% goodput.

Could be a path-dependent loss.

Nah, seems like it was the pfSense.

The path is OK from FRA (pool monitor, Germany) towards CH (Switzerland).
Sadly only over transit.

  1. redacted (redacted)  0%   0.3 ms   [REDACTED] REDACTED, DE
  2. ???   100% *   (No reply)
  3. 100ge0-77.core2.zrh2.he.net (184.105.65.29)   20% 8.8 ms   [AS6939] HURRICANE, US
  4. arcade-solutions-ag.10gigabitethernet10-1.core1.zrh2.he.net (216.66.93.130)   0%   7.1 ms   [AS6939] HURRICANE, US
  5. 10.119.154.5    0%  7.8 ms    BOGON  rfc1918 (Private Space)
  6. smtp.irtech.ch (46.22.24.205)  0%  8.2 ms  [AS51873] AS-ARCADE, CH
