I went through this path quite a few years ago until I figured out that not tracking the connections at all is the ultimate solution.
However, yes, decreasing that value will probably ease the pain somewhat. As for NTP I’d say something like 3 seconds would suffice, but bear in mind that this timeout is not only for NTP, it’s for all UDP protocols. In particular, DNS is primarily UDP so you may want to use a timeout value that is long enough for all DNS queries.
YMMV, but:
```
$ time dig @192.0.2.33 example.com
;; communications error to 192.0.2.33#53: timed out
;; communications error to 192.0.2.33#53: timed out
;; communications error to 192.0.2.33#53: timed out
; <<>> DiG 9.18.33 <<>> @192.0.2.33 example.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached
real 0m15.047s
```
So in this case I might pick something like 16 seconds. If you have or use other services that are primarily UDP, you may want to take those into account as well.
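A minimal sketch of the reasoning above: take the longest time any UDP client on the network keeps waiting for a reply and add a small margin. The per-service figures are assumptions (dig's defaults of 5 s per try and 3 tries, which match the roughly 15 s measured above, and the 3 s suggested for NTP); substitute measurements of your own clients. Using the client's total wait is conservative: if its retries reuse the same source port, each retry refreshes the idle timer anyway.

```python
# Sketch: derive a minimum UDP idle timeout from how long each UDP client
# keeps waiting for a reply. The per-service numbers are assumptions.

def worst_case_wait(per_try_timeout_s: float, tries: int) -> float:
    """Total time a client keeps retrying before it gives up."""
    return per_try_timeout_s * tries

def pick_udp_timeout(waits: dict[str, float], margin_s: float = 1.0) -> float:
    """Idle timeout long enough for the slowest UDP client, plus a margin."""
    return max(waits.values()) + margin_s

waits = {
    # dig defaults (assumed): 5 s per try, 3 tries, ~15 s as measured above
    "dns (dig)": worst_case_wait(5, 3),
    # NTP: one request, one reply; 3 s is the rough figure suggested above
    "ntp": 3.0,
}

print(pick_udp_timeout(waits))  # 16.0
```

Adding another primarily-UDP service is just another entry in `waits`.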
Well, I also had this error a lot of times, because the tables filled up and the router could not reach DNS servers on the net (see somewhere above in the topic). DNSmasq partly solved this by caching.
And yes, I’m aware it impacts other stuff like VoIP as well.
On the MikroTik forum they talk about 15 seconds, some even 8.
I’m testing at 30 s now, and I have set my pool speed to 1.5 Mbit; let’s see how it goes.
The example was for a non-existent DNS server. 192.0.2.x is reserved for examples and I wanted to test how long it took for the DNS client to recognize that the server isn’t going to reply.
As a side remark – if you want to reduce the number of DNS queries going through your router, use only one DNSmasq instance to make the DNS cache hit ratio better (unlikely to really matter, but hey, anything that helps). I think you mentioned earlier that you had two DNSmasq instances running.
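A toy illustration of that side remark, assuming queries are spread evenly (round-robin) over the instances and ignoring TTLs and cache size: each independent cache has to fetch every name once on its own, so splitting the same traffic over two caches can roughly double the upstream lookups. The query list is made up.

```python
# Illustration only: one shared DNS cache vs. independent caches that each
# receive a share of the queries. TTLs and eviction are ignored.

def misses(queries: list[str], n_caches: int) -> int:
    """Count cache misses when queries are spread round-robin over n caches."""
    caches = [set() for _ in range(n_caches)]
    missed = 0
    for i, name in enumerate(queries):
        cache = caches[i % n_caches]
        if name not in cache:
            missed += 1      # must be forwarded upstream through the router
            cache.add(name)  # later lookups hit, but only in this cache
    return missed

queries = ["a.example", "b.example", "c.example"] * 4  # 12 lookups, 3 names

print(misses(queries, 1))  # 3
print(misses(queries, 2))  # 6
```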
Some statements for the nitpickers:
I said earlier that the “best way to fix NAT problems is to have no NAT at all”. There may be other ways to “fix” NAT including shortening the timeouts, but I would not call that the best solution. NAT is fine if you can make it work. If not, get rid of NAT.
Stateless NAT (if the router supports it) would work fine for NTP server purposes, but ntppool-agent (monitor software) does require keeping some state because it uses random high source ports. The router might have room for these sessions, though.
I also contacted DrayTek support, and they confirmed that their safe UDP timeout setting of 180 seconds causes problems for users like me who have many UDP connections.
But for most ‘normal’ users it’s a good setting.
Hopefully they will document it better in the manual.
I hope this is the solution; we will know soon. Typically the problems start within a few days, often within hours. So far, no issues.
While you’re talking to DrayTek, you could ask them whether they can add an option to disable keeping state for port forwarding (reverse NAT) sessions. There is no reason to keep state: all the information needed to rewrite the packets in both directions is already in what you fill in to set up the port forwarding.
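A sketch of that point, with made-up field names and documentation addresses: given a static port-forwarding rule, both directions can be rewritten by a pure function of the packet and the rule, with no session table. It ignores checksum updates and, per the ntppool-agent caveat above, does not cover connections the LAN host opens itself from random high ports, which still need state.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

@dataclass(frozen=True)
class PortForward:
    """Static rule as typed into a router UI (hypothetical field names)."""
    public_ip: str
    public_port: int
    private_ip: str
    private_port: int

def inbound(rule: PortForward, p: Packet) -> Packet:
    """WAN -> LAN: rewrite the destination if it matches the forward."""
    if (p.dst_ip, p.dst_port) == (rule.public_ip, rule.public_port):
        return replace(p, dst_ip=rule.private_ip, dst_port=rule.private_port)
    return p

def outbound(rule: PortForward, p: Packet) -> Packet:
    """LAN -> WAN: rewrite the source of replies from the forwarded server."""
    if (p.src_ip, p.src_port) == (rule.private_ip, rule.private_port):
        return replace(p, src_ip=rule.public_ip, src_port=rule.public_port)
    return p

rule = PortForward("198.51.100.1", 123, "192.168.1.10", 123)
query = Packet("203.0.113.5", 40000, "198.51.100.1", 123)
to_server = inbound(rule, query)
# The server answers by swapping source and destination of what it received.
reply = Packet(to_server.dst_ip, to_server.dst_port, query.src_ip, query.src_port)
print(outbound(rule, reply))
# Packet(src_ip='198.51.100.1', src_port=123, dst_ip='203.0.113.5', dst_port=40000)
```

Nothing is remembered between packets: the reply is translated back correctly using only the rule.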
I don’t think they are going to change the firmware. When I asked for an option to set the timeout in the GUI, they replied that it’s not going to change. But I can alter it in the CLI. No biggie.
On the Fritzbox you can’t alter it at all.
As for the DNS queries, according to your server status page, the script does not use the -n option. Maybe set CHRONY_ALLOW_DNS_LOOKUP="no" in the script.
Generally speaking, I’d argue that it’d be better if the default was “no” and the script user would need to explicitly set it to “yes” if needed.
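A minimal sketch of the opt-in default argued for above. Only the variable name CHRONY_ALLOW_DNS_LOOKUP and chronyc's -n option (which stops chronyc from resolving addresses to hostnames) come from the discussion; the helper function and the choice of the `sources` subcommand are hypothetical, since the actual script isn't shown here.

```python
import os
from collections.abc import Mapping

def chronyc_command(env: Mapping[str, str] = os.environ) -> list[str]:
    """Build a chronyc call (hypothetical helper); DNS lookups are opt-in."""
    # Anything other than an explicit "yes" keeps lookups disabled.
    allow_dns = env.get("CHRONY_ALLOW_DNS_LOOKUP", "no").strip().lower() == "yes"
    cmd = ["chronyc"]
    if not allow_dns:
        cmd.append("-n")  # print IP addresses instead of resolving hostnames
    cmd.append("sources")
    return cmd

print(chronyc_command({}))  # ['chronyc', '-n', 'sources']
print(chronyc_command({"CHRONY_ALLOW_DNS_LOOKUP": "yes"}))  # ['chronyc', 'sources']
```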
Why not? The time limit is not the maximum length of the session, but the time the session is idle before it gets cleared. The VoIP connection state will stay active in the router as long as there’s a call going on. I suggested 16 seconds earlier (due to DNS timeouts). Maybe try that value and see if everything still works.
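A simplified sketch of why a 16 s idle timeout does not cut off a long call: every packet refreshes the entry, so only a silence longer than the timeout clears it. It assumes a single UDP timeout and a steady 20 ms packet interval (an assumed, typical voice packetization); real routers may differ, and Linux, for instance, has separate nf_conntrack_udp_timeout and nf_conntrack_udp_timeout_stream settings.

```python
# Simplified single-timeout model of a connection-tracking entry.
# Times are in milliseconds; this is an illustration, not router code.

def expires_at(packet_times_ms: list[int], idle_timeout_ms: int) -> int:
    """Return when the entry created by the first packet is cleared."""
    last = packet_times_ms[0]
    for t in packet_times_ms[1:]:
        if t - last > idle_timeout_ms:
            break  # entry timed out in the gap; this packet would start a new one
        last = t  # every packet that arrives in time refreshes the idle timer
    return last + idle_timeout_ms

timeout = 16_000  # the 16 s suggested above

# VoIP: one packet every 20 ms during a 10-minute call
call = list(range(0, 600_000, 20))
print(expires_at(call, timeout))  # 615980, i.e. 16 s after the call's last packet

# NTP: one query, one reply 50 ms later
print(expires_at([0, 50], timeout))  # 16050
```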
Sorry for the late answer, seems like I have botched the notification settings on my end.
Is Germany underserved? Certainly not. But my two instances of ntpd-rs and chrony combined get about 300 queries per second, with a peak of up to 500 qps every 15 minutes (SNTP clients, I guess). My router doesn’t break a major sweat though, even with flow logging and Suricata IDS enabled. Sure, I have also lowered the UDP timeout to 10 seconds, since most clients exchange just 2 packets (one query, one reply).
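To put those numbers in perspective, a rough Little's-law estimate of how many UDP entries sit in the connection tracking table at once: new entries per second times how long each one lingers. The assumptions are loud ones: every NTP query creates its own entry, and each entry stays for the full timeout; the 180 s figure is the DrayTek default mentioned earlier.

```python
# Back-of-the-envelope estimate via Little's law:
#   concurrent entries ~= new entries per second * seconds each entry lives.
# Assumes one conntrack entry per NTP query, kept for the full UDP timeout.

def concurrent_entries(queries_per_s: int, udp_timeout_s: int) -> int:
    return queries_per_s * udp_timeout_s

for qps in (300, 500):            # average and peak rate quoted above
    for timeout_s in (10, 180):   # lowered timeout vs. DrayTek's 180 s
        print(f"{qps} qps, {timeout_s} s -> ~{concurrent_entries(qps, timeout_s)} entries")
```

At 300 qps this gives about 3,000 entries with a 10 s timeout versus about 54,000 with 180 s; the conntrack count posted further down (3315 port-123 entries at a 10 s timeout) is in the same range as the first figure.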
A pool monitor is also running. Every server has one IPv6 address in the pool, and the incoming IPv4 address is currently NATed to the ntpd-rs instance.
I guess you just had bad luck with the routers you tried. I bought the Ubiquiti one for the wife acceptance factor, but it’s charming that it is nice-looking hardware running Debian inside, with an SSH shell if you want one. But I did not make any changes at the shell level to host ntpd at home; the UI had all the knobs needed.
Yeah, but only because the UI doesn’t expose the notrack option for the DNAT rule. So sure, I lowered the UDP timeout globally. But there are still enough entries in the kernel’s connection tracking table to kill your devices, I guess.
```
root@Cloud-Gateway-Max:~# conntrack -L | grep "dport=123" | wc -l
conntrack v1.4.6 (conntrack-tools): 3754 flow entries have been shown.
3315
```
So of all the currently tracked connections (I also host my web server, SSH and VPN), over 3k are incoming UDP connections with destination port 123, with a UDP timeout of 10 seconds. And there’s still memory free in the router for plenty more; nf_conntrack_max is 131072 by default on this device.
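A plain `grep "dport=123"` also matches ports such as 1234 or 12345, and it looks at both the original and the reply tuple of each entry, so it can overcount slightly. A small sketch that counts only entries whose original-direction destination port is exactly 123, assuming the usual `conntrack -L` line layout (original tuple first, then the reply tuple). The sample lines are made up, using documentation addresses.

```python
from typing import Iterable, Optional

def original_dport(line: str) -> Optional[str]:
    """Return the dport of the original-direction tuple (first dport= field)."""
    for field in line.split():
        if field.startswith("dport="):
            return field.split("=", 1)[1]
    return None

def count_dport(lines: Iterable[str], port: int) -> int:
    """Count entries whose original destination port is exactly `port`."""
    return sum(1 for line in lines if original_dport(line) == str(port))

sample = [
    # incoming NTP query, DNATed to the LAN server: should be counted
    "udp 17 9 src=203.0.113.5 dst=198.51.100.1 sport=40000 dport=123 "
    "src=192.168.1.10 dst=203.0.113.5 sport=123 dport=40000 mark=0 use=1",
    # outgoing DNS query whose reply port happens to be 12345: not NTP
    "udp 17 25 src=192.168.1.20 dst=192.0.2.53 sport=12345 dport=53 "
    "src=192.0.2.53 dst=198.51.100.1 sport=53 dport=12345 mark=0 use=1",
]

print(sum("dport=123" in line for line in sample))  # 2 (what grep would count)
print(count_dport(sample, 123))                     # 1
```

On the router, the lines could come from `conntrack -L 2>/dev/null` piped into a script that calls `count_dport(sys.stdin, 123)`.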
I played around with setting the iptables rule with the notrack option myself in the shell, but for my current volume it wasn’t worth it. The UI deletes all rules and repopulates them from its database on every change, so handwritten rules get wiped frequently, and I saw no real benefit in lower CPU or memory usage on the router.
Yeah, I didn’t look into it further after a major firmware/software bump. Earlier versions had a well-documented way of incorporating user changes, but Ubiquiti started to dumb the device down a bit.
Maybe there is another way to save user changes now, but I am afraid it will break a year from now. It’s not only a router: it also manages the three WiFi access points, switches and cameras at home, and it has an SSD and plays the role of network video recorder. I’ve gotten old enough to stop playing around with such gear; I appreciate that everything “just works”.
I don’t want to sweat during updates right now and check whether everything (my internet access, WiFi, internal VLANs, VPN connections, home security and my hosted services) still works after a firmware update. So, like you, I went with lower UDP timeouts as the easiest option right now. But with 3 GB of RAM and, I think, a 4-core 1.5 GHz CPU, that slim white box does everything I need it to, including hosting two pool servers and a monitor at home without slowing down or killing DNS, while still serving a few hundred NTP clients per second (500 Mbps setting in the pool, Germany).