Some client really can't behave

I understand your concern. I’d love to see the “use the pool” page stop suggesting “server”.

If you want a server IP resolved once to never be re-resolved, why not use an IP address?


That’s indeed what I’d use in a “long-term relationship”, i.e., once I picked a specific server. Or if ntpd were to change its behavior without an option to opt out, or without the need to explicitly enable such changed behavior.

But sometimes, the selection at the level of a name is good enough, and I want to be able to capture/become aware of potential infrequent updates to the addresses that a name points to that I would not necessarily notice when specifying server IP addresses.

However, my main point was that in this forum, the perspective is obviously that of a pool server operator. But I am not sure that ntpd (or chrony or NTPsec) is causing the issue that started this thread.

So changing those implementations in an attempt to mitigate such issues would probably not solve them.

On the other hand, those implementations are used in contexts other than as pool clients, especially as servers themselves (which, e.g., systemd’s timesyncd or Windows’ SNTP implementation and others are not). So they should cater to those needs as well, including a wider range of options for tweaking their behavior for specific use cases (other than only being a pool client).

Having ntpd not hang on to resolved IPs forever has been a frequently requested change since at least 2009, most likely years earlier. I just mentioned it as a counterpoint to the idea that some DNS software is misbehaving.


The reason I push back is that adding a mechanism to opt in to retaining the existing behavior of never re-resolving “server” associations adds complexity to the code and to the testing of the changes, and creates documentation requirements and future questions about the new knob. I don’t want to do that if there’s not a compelling reason.

Hi Bas,

@Bas
I must say I’m sorry you are experiencing the problems you describe.
For what it’s worth, I also have a Fritzbox and two IPv6 servers running (one at 3 Gbit and another at 500 Mbit netspeed). The total number of requests is around 350/sec. This is the typical resulting bandwidth graph:

As you can see (and as was already suggested in this thread), the upstream and downstream are pretty comparable in bandwidth consumption.
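As a rough cross-check of why both directions look alike, here is a back-of-the-envelope sketch. It assumes plain 48-byte NTP packets (no extension fields or MAC), exactly one reply per request, and it ignores link-layer overhead; the function name is made up for illustration.

```python
# Rough estimate of NTP bandwidth per direction over IPv6.
# Assumptions: basic 48-byte NTP packet, one reply per request,
# link-layer (Ethernet/DSL) overhead not counted.
NTP_PAYLOAD_BYTES = 48
UDP_HEADER_BYTES = 8
IPV6_HEADER_BYTES = 40


def ntp_bits_per_second(requests_per_second):
    """Return the approximate bit rate in one direction."""
    packet_bytes = NTP_PAYLOAD_BYTES + UDP_HEADER_BYTES + IPV6_HEADER_BYTES
    return requests_per_second * packet_bytes * 8


# 350 requests/sec, the figure mentioned above
print(ntp_bits_per_second(350) / 1000, "kbit/s per direction")
```

Since a request and its reply have the same size under these assumptions, the estimate is identical for upstream and downstream.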

To be clear: these are both IPv6-only servers. I’ve stopped providing IPv4 servers since severe peak load was occasionally clogging up my internet connection (200 Mbit connection) and preventing smooth internet access from clients in my home network. Maybe that’s an idea for you as well.


My problem is that I have other services running too, which create a lot of upload traffic.
However, NTP was set to realtime priority and it was hurting the other services.
It doesn’t look like much, but background applications started to fail, like email.

So I set a rate limit and reduced the priority, and that seems to have solved the problem.

As for the NTP server config line, I find it strange that it never does a DNS lookup again unless the server stops responding.
In my humble opinion it should do a lookup by default every 2 weeks, regardless of whether the server still works or not.
Only IPs should go without lookups.
And please make this hardcoded and not optional, just to avoid it hitting the same server(s).
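To make the proposal concrete, here is a minimal sketch of the idea: re-resolve a server name once the cached answer is two weeks old, and never look up literal IP addresses. This is not how ntpd works today; the class name, the injectable `resolver` and `clock` parameters, and the use of `socket.gethostbyname` (IPv4 only) are assumptions made purely for illustration.

```python
import ipaddress
import socket
import time

TWO_WEEKS_SECONDS = 14 * 24 * 60 * 60


class ReResolvingServer:
    """Hypothetical helper: caches a resolved address and refreshes it
    once it is older than max_age. Literal IPs are never looked up."""

    def __init__(self, name, max_age=TWO_WEEKS_SECONDS,
                 resolver=socket.gethostbyname, clock=time.monotonic):
        self._name = name
        self._max_age = max_age
        self._resolver = resolver
        self._clock = clock
        self._address = None
        self._resolved_at = None
        try:
            # A literal IP address needs no DNS lookup, ever.
            ipaddress.ip_address(name)
            self._is_literal = True
        except ValueError:
            self._is_literal = False

    def address(self):
        """Return the address to poll, re-resolving if the cache expired."""
        if self._is_literal:
            return self._name
        now = self._clock()
        expired = (self._address is None
                   or now - self._resolved_at >= self._max_age)
        if expired:
            # Re-resolve regardless of whether the old address still answers.
            self._address = self._resolver(self._name)
            self._resolved_at = now
        return self._address
```

A client would call `address()` before each poll; with a pool name, that would move it to a freshly resolved server at most every two weeks instead of sticking to the first answer indefinitely.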

If people set the pool as a server line, it never does a new lookup and keeps polling the same server for a long time.
This may be part of the problem.

Am I correct that you receive a lot of undesired requests from one IPv4 address?
If no other solution is available and you still want to keep the NTP server, you could consider the “ippeerlimit” directive, which limits the number of requests from the same IP:

The ippeerlimit directive limits the number of peer requests for each IP to int , where a value of -1 means “unlimited”, the current default. A value of 0 means “none”. There would usually be at most 1 peering request per IP, but if the remote peering requests are behind a proxy there could well be more than 1 per IP.

Yes, it’s a few IPv4 addresses; typically they poll hundreds of times a second from 1 IP.
When I check, it’s always the same: a NAT-type network that is routed via 1 IP.

Where most IPs just poll a few times a day, those poll like 185 times a second, from 1 IP only.

And it’s a couple of those that put serious load on the server, not the amount that I would expect.

Don’t want to hijack your topic, but my problem is that I get frequent drops in monitoring score because my server can’t be reached (i/o timeout). Especially German monitoring servers are giving me problems.

I’m trying to figure out what causes this. Things I tried:

  • Replacing Fritzbox
  • Changing MTU
  • Changing DSCP value (tried various ones, even EF)
  • Adjusting NTP traffic priority in router

Does anybody have an idea?

I’m dropping too…this is not our fault.

Something is wrong and it’s not on your or my side.

I don’t think ippeerlimit is the droid you’re looking for. It specifically has to do with automatically-spun peer/client associations, the kind that show up in the ntpq -p peers billboard, such as from peer and pool. Rate-limiting via restrict kod is probably the right tool for limiting client time requests from a given remote IP address.
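To illustrate what per-source rate limiting does in principle, here is a generic token-bucket sketch keyed by remote IP. It is not ntpd’s actual implementation; the class name, parameters, and the documentation-range addresses are hypothetical and chosen only for the example.

```python
class PerIpRateLimiter:
    """Generic token bucket per source IP (illustrative only,
    not the algorithm ntpd uses internally)."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second   # tokens refilled per second
        self.burst = burst            # maximum tokens a source can hold
        self._tokens = {}             # ip -> tokens currently available
        self._last_seen = {}          # ip -> time of the previous request

    def allow(self, ip, now):
        """Return True if a request from ip at time now may be answered."""
        last = self._last_seen.get(ip, now)
        self._last_seen[ip] = now
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(self.burst,
                     self._tokens.get(ip, self.burst) + (now - last) * self.rate)
        if tokens >= 1.0:
            self._tokens[ip] = tokens - 1.0
            return True
        self._tokens[ip] = tokens
        return False


# One source sending 185 requests within a single second, as described above.
limiter = PerIpRateLimiter(rate_per_second=1.0, burst=2.0)
answered = sum(limiter.allow("192.0.2.1", i / 185) for i in range(185))
print(answered, "of 185 requests answered")
```

A well-behaved client polling a few times a day never runs out of tokens, while a source behind a busy NAT gets most of its requests dropped. That is also the downside: all clients behind that one IP share a single budget.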

Make sure your router is not trying to keep state for NTP traffic, as is commonly done to allow UDP “connections” to work. I believe Linux refers to this as conntrack.


I haven’t done any systematic testing, but my impression is that, at least for IPv4, the Fritz Box simply isn’t designed to handle a high rate of small packets efficiently (IPv4 typically needs DNAT, which is not needed in the IPv6 case). Far from saturating the uplink, I started seeing round-trip times and jitter increase noticeably with higher bandwidth settings. And sometimes the Fritz Box wouldn’t respond on the LAN side at all anymore. On the other hand, I have no trouble saturating the uplink (in both directions) with typical TCP-based traffic (packets much larger on average): round-trip times and jitter obviously increased a bit, but the Fritz Box as such was always reachable and functional in those cases.

I think this ties in with the discussion we had in another thread just a few days ago about whether to keep connection tracking enabled for NTP traffic. The Fritz Box, being a consumer device, unfortunately does not give one control over that aspect. At the same time, I am not sure whether disabling connection tracking completely would even work when DNAT functionality is required, as it often is with consumer Internet connectivity.

Sorry, trying to stay on-topic, my phrasing got a more absolute tint than what my view actually is. My point is (sorry for likely repeating once more) that I don’t think that typically well-behaving clients such as ntpd and their behavior of sticking to an IP address once resolved initially (with the server directive) are causing the issue that triggered this thread. Thus I was not sure whether that issue was a good enough reason for making the change you consider in ntpd (as it would likely not solve the problem).

Though now hearing/reading that others are seeing such high residual loads after dropping from the pool has the potential to change my view on that (as my Fritz Box router simply can’t handle higher rates of small NTP packets anyway, I never got in a position to experience such high residual loads myself).

But I fully understand that there might be other reasons to want such functionality. In fact, I have sometimes wished for this type of more frequent re-resolving of server names myself (at least in cases where a server stops responding).

I hear you and concur. Just highlighting that such a change in default functionality needs to be weighed carefully. E.g., it is not without reason that so-called enterprise Linux distributions typically run such outdated software versions, simply because in some contexts, even such small changes may have large ramifications, so they typically only backport security and essential bugfixes but otherwise keep functionality unchanged. E.g., Debian 10 just offered me yet another iteration of ntpd 4.2.8p12.

But obviously, that point could also be taken to argue the other way: As enterprise-like Linux distributions and others anyhow stick to some eventually outdated version of ntpd for the lifetime of a certain OS major version, it could be acceptable to have default functionality change in ntpd, and let distributions pick up the changed functionality with a future major OS release.

This will also not lower the traffic instantly; it will take some time.