Modem/Routers slowdown on heavy traffic

Is it? I see this next one all the time too.

jan 08 18:23:59 server ntppool-agent[3508]: level=WARN msg="local-check failure" env=prod ip_version=v4 server=ntp.nict.jp ip=133.243.238.244 err="offset too large: 13.160382ms"

It’s there all the time. I understand that correct-local-time must be checked.
But these keep popping up to no use, as they ‘error’ one way or the other.

I would expect the monitor server to remove those and stop sending them.

I don’t expect this to be my problem, but it looks strange that it’s sending DNS names and not IPs, when it’s said the pool works with IPs only.

I would expect the monitor to return the error and if e.g. 10 monitors report this the server in question is removed from local-check.
Or at least send an email to the owner letting them know it’s in error.

I know this. But you can monitor them via the normal pool way, and email them if they are gone, we all get those emails.
But as these are special, I would expect Ask to be mailed as well and if they don’t fix it, take them out.
All monitors catch useless traffic over this, and on top, if gone they serve no purpose.

No, as it doesn’t really solve my problem.

Still looking…but for the moment it’s ok.

NTP-server is set to monitor-only, and monitor is running…while checking DNS-servers.

I was referring only to the process of how the sanity check NTP servers are used. The other 1000+ pool servers are still checked without using DNS, as explained earlier in this topic and many times elsewhere.

As for ntp.nict.jp, welcome to the world of asymmetric routing. Here are measurements to that server (edit: specifically that IP address 133.243.238.244 as this server has multiple IPs) from a few places around the world:

Poland: -0.004486 seconds
Finland1: 0.008363 seconds
Finland2: 0.010310 seconds
Australia: 0.012873 seconds
Netherlands: -0.004205 seconds
US (Chicago): -0.001034 seconds
Philippines: -0.010562 seconds
Singapore: -0.002144 seconds

So even if that server does not seem to be within ± 10 ms from you, it is a perfectly fine sanity check server for some other monitors. I’d imagine that some sanity check servers in Europe are similarly “off” when queried from Asia.
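To make the asymmetric routing point concrete, here is a minimal Python sketch of the standard NTP on-wire offset calculation. The `simulate` helper and the one-way delays (150 ms out, 124 ms back) are made-up values for illustration, not measurements of ntp.nict.jp. It shows that a perfectly synchronised server still appears offset by half the difference between the outbound and return delays.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, fwd_delay, back_delay, server_processing=0.0):
    """Hypothetical helper: one request/response exchange.

    The server clock is assumed to read client time + true_offset.
    All values are in seconds.
    """
    t1 = 0.0
    t2 = t1 + fwd_delay + true_offset           # arrival, read on the server clock
    t3 = t2 + server_processing                 # reply sent, server clock
    t4 = (t3 - true_offset) + back_delay        # arrival, read on the client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Assumed path: server exactly on time, 150 ms there, 124 ms back.
offset, delay = simulate(true_offset=0.0, fwd_delay=0.150, back_delay=0.124)
print(f"measured offset: {offset * 1000:.3f} ms, round trip: {delay * 1000:.3f} ms")
```

With these assumed delays the measured offset comes out at about 13 ms even though the simulated server has no clock error at all, which is the same order as the value in the log line quoted earlier. The client cannot tell the two cases apart from a single path, so the ± 10 ms check depends on where the monitor sits.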


For all I know it’s CGNAT causing the problems.

I mean, I have enough STR2 servers, it won’t matter if I let the pool just monitor my STR1.
As it serves time to the others.

All I want to know is, what causes this? If that means I need to keep my home-server out of the pool, so be it. Indirectly it serves via all the others.

I do believe NAT-tables are the problem, too many requests, either DNS or NTP.
Routers with NAT simply can’t handle it. Best I can describe it for the moment.

I have only 1 IPv4 address, so NAT is needed. I have multiple servers here at home.

I stopped using the MikroTik as it has too many other problems, and I’m not able to solve them, and the MikroTik forum doesn’t have a solution.

The DrayTek can do it all too, there I use port-forwarding, but that is still NAT, and it only has 1024 NAT-table-entries.

It should be able to handle 30K NAT translations, but it looks to me like it can’t when they come from all directions.

This is not a single-IP rackserver, it has to route an entire network behind it.

As said before, my ‘rack-servers’ have no problems, but they are not NAT’ed.

It’s been running for more than 24 hours now, and the problem is the number of NTP-connections.

The monitor works fine and doesn’t slow down my network.

I have not seen any slowdowns anymore. Oh well, it’s just 1 server less, but it does monitoring, so it does help the pool.

I will keep my eye on it for another week. But I’m pretty sure it’s caused by massive requests caused by CGNAT where ISPs probably cache the pool and send all clients to 1 server.

As it’s constantly coming from the major ISPs here in Belgium, and they hit hard.

Let them hit my other servers :grin:

For the benefit of other readers of this topic, can you clarify what kind of query rates per second you classify as being “hit hard”? Is it still around 20 requests/second?

In any case, as you’ve noticed the problem is the number of sessions and NAT. The best way to fix the session problems is to have no sessions at all. The best way to fix NAT problems is to have no NAT at all (except maybe for your “regular” home computers). This can be achieved by the following setup:

MODEM

  • Set to bridging mode, i.e. it just passes packets back and forth from VDSL to Ethernet.
  • No firewall, no NAT, no rate limiting, no DHCP, no internal DNS, no WiFi, no anything
  • The modem does not have a public IP address of its own
  • The modem’s LAN side has a private IP address for management, like 192.168.99.1 (but you’ll rarely if ever need to access the modem after initial setup, maybe for firmware updates)

HOME SERVER / GATEWAY

  • Probably some sort of Linux
  • Two network interfaces: One (WAN) that has your public IP address, the other (LAN) has a private IP address like 192.168.100.1
  • Runs chrony, ntppool-agent, caching DNS server, possibly DHCP server for LAN, possibly a web server if you want to share NTP server stats
  • GPS attached to this server
  • Firewall config that does not track incoming NTP requests (“iptables -t raw -A PREROUTING -p udp --dport 123 -j CT --notrack”, “iptables -t raw -A OUTPUT -p udp --sport 123 -j CT --notrack”)
  • Firewall config that does NAT for traffic coming to/from your LAN, ip_forward setting enabled

SWITCH / WIFI ROUTER

  • Attached to the LAN port of your home server
  • If you need WiFi this device can be a WiFi router with some Ethernet ports for some wired connections. If not, a plain regular switch suffices.

ADVANCED TOPICS

  • This setup allows giving your LAN devices their own IPv6 addresses from your designated IPv6 network.

(edit: Details about WiFi omitted for now, I need to test this kind of WiFi setup myself first)

I have more running than just NTP.

But running the pool-agent only isn’t giving problems.

I have been looking into pfsense, endian firewall, and many others.

Not my cup of tea, far too complex to configure, like MikroTik, what a mess.

The DrayTek tells me it’s constantly between 200-300 sessions at the moment, doing port 123.

When I enable the ntp-server for the pool again my session-table is constantly filled with 1024 sessions, ergo it’s full.
The thing is, I need NAT, therefore I looked into other router-software to hook Chrony on the WAN-IP, but none of them have easy ways to do it. Most are BSD, not my thing. The Linux ones don’t have a proper package manager to install chrony and gpsd.
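For other readers, a rough back-of-the-envelope sketch in Python of why a 1024-entry session table can fill up at modest request rates. It uses Little's law (entries in the table ≈ new flows per second × how long each entry lives). The 20 requests/second figure is the rate asked about earlier in this topic. The UDP timeouts are assumptions for illustration, not documented DrayTek values, and the sketch assumes every request comes from a new client IP/port and only leaves the table by timing out.

```python
TABLE_SIZE = 1024  # NAT session table size mentioned in this topic


def steady_state_entries(new_flows_per_second, udp_timeout_seconds):
    """Little's law: average entries = arrival rate x lifetime of one entry."""
    return new_flows_per_second * udp_timeout_seconds


def max_new_flow_rate(table_size, udp_timeout_seconds):
    """Highest rate of new flows the table can absorb before it is full."""
    return table_size / udp_timeout_seconds


# Assumed UDP session timeouts in seconds (hypothetical, check your own router).
for timeout in (30, 60, 180):
    entries = steady_state_entries(20, timeout)
    limit = max_new_flow_rate(TABLE_SIZE, timeout)
    status = "FULL" if entries >= TABLE_SIZE else "ok"
    print(f"timeout {timeout:3d}s: {entries:5d} entries at 20 new flows/s "
          f"({status}), table fills at {limit:.1f} new flows/s")
```

Under these assumptions, a 60-second timeout already needs 1200 entries at 20 new flows per second, more than the table holds. A shorter timeout or untracked NTP traffic changes the picture completely, so treat the numbers as an illustration only.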

Building my own router, no thanks. Not going to happen. Looks to me I need to forget running as Pool-server from home.

I keep running the Agent behind NAT as it has no issues.

I have already done a lot of reading on NAT-routers and not many handle loads of NEW sessions all the time. They do run a lot of sessions, but not massive amounts of NEW sessions.
I would need a CGNAT router for that, but they are expensive, 1000-2000 euro.

Sure DrayTek says 60K sessions, but not NEW ones. Certainly not UDP.
After reading more, if you turn on monitors like flow-monitor, DDoS defences etc., it runs into trouble fast.

Like I also saw with the Fritzboxes (several), MikroTik and DrayTek.

So in short, when running an NTP pool server, don’t do it from home behind NAT.

Don’t worry, I still have 7 servers running.

I simply run too much stuff :rofl:

See what happens when I enter the pool again with my home-ntp-server:

As you can see it spikes bad from time to time.

Ehm. I run my NTP server on a home fiber connection, using a MikroTik router, and the server is behind NAT. No issues at all. I think it’s your crappy DSL line that causes all of this.

Same I run a few servers and a monitor from home behind a Vodafone Cable modem in bridge mode and a Unifi Cloud Gateway Max without any problem apart from the occasional drop of the DOCSIS infrastructure leaving me without connection for a few hours. But that’s not a problem of NTP pool or the monitor.

It is possible to have stateless NAT (DNAT) without session tracking. However, it still adds some minimal packet processing time.


I said several times, I run more than NTP alone.

If it was the DSL-line, it would not give the router problems.

As the load isn’t high enough, it’s the sessions combined with other stuff I run.

Shit happens. The MikroTik I used also ran into troubles.

Is your country underserved like BE? Then your server/router will be hit hard.
I doubt it, as you are in Germany.

I think that, as is often the case, whether an NTP server can work behind NAT depends on various factors.

For underserved countries, it will certainly be a challenge, but depending on the hardware and software used, the bandwidth setting, and the zone you are in, there are certainly cases where it works fine.

I myself run an NTP server behind an OpenBSD (NAT) router. With about 2000 req/sec and a few tweaks to the firewall, it works flawlessly.

So it’s a bit simplistic to say: don’t do NTP behind NAT.

@Bas

You think I don’t? I mirror 20 Linux/UNIX distros and my server transfers 2-3 Terabytes per day! Yes, Terabytes! No issues here with NTP, router, modem or anything else.

Here are my mirrors: https://mirror-services.net/

Your mirrors are TCP-traffic, not UDP like NTP is.
UDP is far more difficult for NAT than TCP.

Not the same thing. I have not seen NAT failing on TCP.

As folks study these issues, it may be worth monitoring how many unique IPs are using the NTP server at a given time. Here is recent data from an NTP server located near London.

If the client UDP port matters, then the number of {IP+Port} combinations would be larger.
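A minimal Python sketch of that comparison, assuming you already have a list of (source IP, source port) pairs for the NTP requests seen in some time window. The `count_clients` function name and the sample data are hypothetical; the addresses come from the documentation ranges, not from a real server.

```python
def count_clients(packets):
    """Return (unique source IPs, unique IP+port combinations).

    packets: iterable of (source_ip, source_port) tuples, one per request.
    """
    packets = list(packets)
    unique_ips = {ip for ip, _port in packets}
    unique_flows = {(ip, port) for ip, port in packets}
    return len(unique_ips), len(unique_flows)


# Made-up sample: one address (e.g. a CGNAT exit) sending from several ports.
sample = [
    ("192.0.2.10", 123),
    ("192.0.2.10", 40001),
    ("192.0.2.10", 40002),
    ("198.51.100.7", 123),
    ("203.0.113.5", 51000),
    ("203.0.113.5", 51000),  # repeat request, same IP and port
]

ips, flows = count_clients(sample)
print(f"unique IPs: {ips}, unique IP+port combinations: {flows}")
```

A NAT device that tracks sessions per IP+port would need the larger of the two numbers as table entries, which matters most when many clients sit behind one shared address.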