At this moment I have three systems in the pool: one with both IPv4 and IPv6, and two with IPv6 only. All systems are connected to the same router.
The scores for the IPv6 servers are good: the overall score is a solid 20, and nearly all individual monitor scores are 20 as well. Maybe once or twice a day there is a single lost packet, mostly at monitors far away.
The IPv4 score is, let’s say, more dynamic. There seems to be a fair amount of packet loss. Most of the time the overall score is still at 20, but sometimes it drops to lower levels, and occasionally (briefly) even below 10.
And the strangest thing is: the monitors that are closest to me (both geographically and in terms of network latency) have the worst scores (4.2 and 5.9 at the time of writing). Most other monitors are at 20, but some distant ones are just below 20.
I don’t experience any packet loss with other protocols. I do see a spike in NTP traffic around the top of the hour, but in the CSV log I see that the timeouts are scattered around the clock. I don’t see a correlation with NTP traffic or other traffic through my router.
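For what it’s worth, this is roughly how I checked that the timeouts are spread around the clock rather than clustered at the top of the hour. It assumes the CSV log has a `ts` timestamp column (taken to be UTC) and an `error` column that is non-empty for failed probes; the actual column names may differ, so adjust to the real header.

```
import csv
from collections import Counter
from datetime import datetime

def timeouts_per_hour(path):
    """Count failed probes in the CSV score log, bucketed by hour of day."""
    buckets = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed columns: 'ts' (timestamp, treated as UTC) and 'error'
            # (non-empty when the probe failed); adjust to the actual header.
            if not row.get("error"):
                continue
            ts = datetime.fromisoformat(row["ts"].replace("Z", "+00:00"))
            buckets[ts.hour] += 1
    return buckets

if __name__ == "__main__":
    for hour, count in sorted(timeouts_per_hour("log_scores.csv").items()):
        print(f"{hour:02d}:00  {count:4d}")
```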
I get approximately 2k NTP requests per second in total. I tried lowering the bandwidth setting to reduce the request rate, but this doesn’t seem to help with the scores of the nearby monitors.
Should I consider this normal and just ignore it?
Hi, what is your current bandwidth setting, and by how much did you decrease it?
Also, after lowering the setting it takes time for the traffic flow to subside.
At this moment, the IPv4 server is at 2 Gbps and the IPv6 servers are at 3 Gbps.
I tried as low as 100 Mbps. But even after several days, there is still some packet loss.
I’ll try again and report back.
You haven’t indicated which server(s) we are talking about (possibly on purpose), but in case that wasn’t deliberate, it would help if you could say which they are.
My guess is that the problem is in your router.
For IPv4 I guess you are using port forwarding/DNAT to a private IP on your network?
Then most likely the NAT table in your router overflows, because the router tries to keep connection state in memory. For NTP over UDP you can try disabling stateful forwarding, since it is not needed.
IPv6 does not have this problem because your machines have their own public IPs, and you have most likely set up just a few firewall rules that don’t need to keep state and don’t tax the router’s CPU and memory much.
This was also my first guess (and I did hit the limits at first), but after increasing the maximum number of states and decreasing the time a state is kept in memory, I can see that the number of states stays far below the limit.
At this moment (with the lowered bandwidth setting) there are around 12k states, while the limit is at 500k. With the previous higher bandwidth setting, the number of states was around 50k IIRC. Still quite low compared to the maximum.
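To make sure I’m not just looking at a quiet moment, I watch the state count over time instead of taking a single snapshot. A minimal sketch of what I run on the router; it parses the “current entries” line of `pfctl -si`, so it assumes OpenBSD’s pfctl output format and needs root:

```
import re
import subprocess
import time

def current_states():
    """Read the current pf state-table size from 'pfctl -si' (needs root)."""
    out = subprocess.run(["pfctl", "-si"], capture_output=True,
                         text=True, check=True).stdout
    match = re.search(r"current entries\s+(\d+)", out)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    # Log the state count once a minute, so peaks around the top of the hour
    # (or at any other time) show up instead of a single snapshot.
    while True:
        print(time.strftime("%Y-%m-%dT%H:%M:%S"), current_states(), flush=True)
        time.sleep(60)
```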
And yes, the IPv4 server is on a private IPv4 address on my network. Because of NAT I cannot disable the creation of states (state is needed for the translation of packets, at least on OpenBSD).
But even if I am overlooking something at the router (which is very possible), why does the packet loss occur mainly at the two nlams monitors closest to me?
Seeing that a few monitors have problems reaching your server while the other monitors give you a perfect score of 20, I find it unlikely that the problem is in your server or your router. As mlichvar wrote above, it’s probably some sort of routing/peering issue somewhere along the path. I would not be too concerned about this, as long as the overall score generally stays above 10.
The two monitors in the Netherlands have consecutive IP addresses, so they almost certainly sit in the same network and share the same path to you. That might explain why both give you bad scores, compared to a hypothetical, more diverse deployment within the Netherlands.
After about half a day on the lower bandwidth setting, the packet loss reported by the monitors shows no improvement, even though I do get significantly fewer NTP requests.
So I also think it has to do with routing/peering between the NL monitors and my ISP.
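To check where the forward paths to the two monitors run, I compare traceroutes and look at how far they overlap. A rough sketch; the monitor addresses below are placeholders, not the real ones, and the return path can of course differ from what traceroute shows:

```
import subprocess

# Placeholder addresses for the two nlams monitors; substitute the real ones.
MONITORS = {"nlams1": "192.0.2.10", "nlams2": "192.0.2.11"}

def hops(addr):
    """Run a numeric traceroute and return each hop's responding address ('*' if none)."""
    out = subprocess.run(["traceroute", "-n", addr],
                         capture_output=True, text=True).stdout
    path = []
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():   # hop lines start with the hop number
            path.append(fields[1] if len(fields) > 1 else "*")
    return path

if __name__ == "__main__":
    for name, addr in MONITORS.items():
        print(f"{name}: {' -> '.join(hops(addr))}")
```

If both traces follow the same route up to the peering point with the monitors’ network, that at least fits the peering theory.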
I will leave it as it is for now and will turn up the bandwidth again soon.