Joining the pool kills my Internet

It seems it is indeed possible to use nftables for a stateless NAT limited to NTP packets.

I did some tests with iptables and nftables to get a better idea of the impact of NAT and connection tracking. I used an older consumer-grade router, a TP-Link WDR4900 v1 running OpenWrt 19.07, connected between an NTP server and a host running ntpperf to simulate a large number of clients sending requests at a specified rate.

First, I optimized the router configuration a bit:

  • Increased nf_conntrack_buckets to be closer to nf_conntrack_max, which made the hashtable chains shorter and improved performance with connection tracking by about 10% (see the sketch after this list)
  • Disabled the ethernet+WLAN bridge, which improved performance by about 12%
  • Switched to a manual firewall configuration and moved the NTP-specific rules to the beginning of the chains, which improved performance by about 20%
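
For anyone who wants to try the first tweak: on most kernels the hashtable size is a module parameter rather than a regular sysctl. A minimal sketch, with example values rather than the exact ones from my tests:

    # Raise the entry limit and resize the hashtable so chains stay short.
    # Values are examples only - size them to your RAM budget.
    sysctl -w net.netfilter.nf_conntrack_max=16384
    echo 8192 > /sys/module/nf_conntrack/parameters/hashsize
    # Verify the resulting bucket count:
    sysctl net.netfilter.nf_conntrack_buckets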

Then I compared the performance in several different iptables and nftables configurations:

  • iptables with stateful forwarding (no NAT) - 16 kpps
  • iptables with stateless forwarding (no NAT) - 36 kpps (a sketch of this variant follows the list)
  • iptables with stateful NAT - 15.5 kpps
  • nftables with stateful NAT - 16 kpps
  • nftables with stateless NAT - 30 kpps
  • no firewall - 47 kpps
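
For the “stateless forwarding” iptables case, here is the kind of notrack rule involved; this is an illustration using the raw table’s CT target, not the exact rules from the test:

    # Bypass connection tracking for NTP in both directions (illustrative only;
    # interface names match the nftables example below).
    iptables -t raw -A PREROUTING -i eth0.2 -p udp --dport 123 -j CT --notrack
    iptables -t raw -A PREROUTING -i eth0.1 -p udp --sport 123 -j CT --notrack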

So, it seems switching to nftables can improve the performance by a factor of two and avoid wasting conntrack memory at the same time. It might be worth the hassle if you want to increase your pool speed but don’t want to switch to a more powerful router. At least on OpenWrt it wasn’t too bad, as the nft kernel modules are available in its package repository.

Here is the raw table I used if anyone is looking for an example:

table ip raw {
        chain prerouting {
                type filter hook prerouting priority -300;

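                # Incoming requests: rewrite the public destination address to
                # the internal NTP server and skip conntrack (stateless DNAT)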
                iif eth0.2 udp dport 123 ip daddr 192.168.123.1 ip daddr set 192.168.123.99 notrack
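                # Replies from the internal server: skip conntrack as well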
                iif eth0.1 udp sport 123 ip saddr 192.168.123.99 notrack
        }

        chain postrouting {
                type filter hook postrouting priority -300;

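                # Outgoing replies: rewrite the source back to the public
                # address, mirroring the prerouting rewrite (stateless SNAT)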
                oif eth0.2 udp sport 123 ip saddr 192.168.123.99 ip saddr set 192.168.123.1
        }
}

@mlichvar Thanks for doing that testing - that is very useful information, especially given how often people post here with conntrack performance issues.

Did you happen to do any testing over a longer period (with conntrack on) for comparison? I’d be interested to know what the figure looks like for that router once the state table has reached a steady state (either where entries are timing out at the same rate as being added, or where the table is full).

That’s more or less what I was trying to test. The number of simulated clients was 65536, but their source port was random, so there was practically an infinite number of connections to be made. The test ran several times for 5 seconds each. I tried it with the default (for 128MB of RAM) nf_conntrack_max of 16384 and also 131072, which is the practical maximum, consuming more than half of the RAM.

For the NTP performance, it doesn’t seem to matter much whether the entries are timing out or being replaced. What matters is the length of the hash chain that has to be traversed, which in the worst case should on average be the ratio of nf_conntrack_max to nf_conntrack_buckets (e.g. 16384/2048 gives an average worst-case chain length of 8).

With 16384/2048 (the default) it handled about 13.5 kpps. Changing it to 16384/8192 improved the performance to about 15.5 kpps. With 131072/65536 it was the same.

However, with shorter chains there is a higher chance that the chain doesn’t have another UDP connection that could be replaced by the new one, which causes the packet to be dropped with the “nf_conntrack: table full, dropping packet” message.

Ideally, the maximum number of connections should be so high and/or the UDP timeouts so short that the table doesn’t fill up even under the maximum load. For this router, that would mean UDP timeouts shorter than 8 seconds at the maximum of 131072 connections.
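
If you want to experiment with that, the UDP timeouts are runtime sysctls. A sketch using the 8-second figure estimated above:

    # Shorten UDP conntrack timeouts so entries expire before the table fills.
    # 8 s is the estimate for this router under maximum load; tune for your own.
    sysctl -w net.netfilter.nf_conntrack_udp_timeout=8
    sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=8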


Thanks @mlichvar - very helpful!

Hello, I don’t have any more specific questions on the commercial-grade routers. I’ve started reading, and it seems they just have better CPUs with more cores, higher clock speeds, and more memory.

The transit routers you mention with no state tracking sound more like switches than routers, but I’m sure they perform routing functions beyond what a switch could. I’m learning that the ceiling of complexity is high when investigating hardware beyond consumer grade :sob: :sob: :sob:

Outstanding, @mlichvar, this is quite helpful.

I’m just barely following your test procedures and am pretty sure I could not replicate them. But you’ve clearly shown that stateless forwarding on that router supports about twice as much throughput as stateful forwarding (36 kpps vs. 16 kpps).

Also, I’ve been wondering about dropping the timeout on UDP message tracking below the commonly-seen 30-second timeout. You’ve shown that a much lower timeout (like 8 seconds or less) could be helpful. I can set the UDP connection timeout to 8 or 5 seconds. My consumer-grade Linksys WRT1900AC with dd-wrt only allows 65535 trackable connections, so the maximum number of connections would surely be less than your 131072.


Having said that, I’ve migrated to an Ubiquiti Security Gateway (USG-3P) as the primary router connected to the residential cable modem. The WRT1900AC is now a wireless access point and is no longer the bottleneck on wired NTP traffic.

The Ubiquiti USG has Debian 7 Wheezy as its OS and seems to work well, but it still experiences the same overloading condition when turning up the NTP Pool’s Net Speed setting. The “internet just stops working”, which really means UDP DNS traffic stops: something as simple as nslookup www.google.com fails. I must log in to the NTP Pool page via the cellular network on my phone to turn down the Net Speed setting. After a few minutes the connection tracking table clears and the “internet works again”, because the UDP connection table is no longer overloaded.


So @mlichvar:
Debian 7 on the Ubiquiti Security Gateway has some nftables support already, so I might be able to achieve stateless NAT. The nft command is not available, so it’s not using nftables entirely, but it does report some nf parameters with sysctl -a (similar to here). Debian 7 apparently uses iptables (one way to check is sketched below).
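
One way to check from a shell, assuming the usual tools are present (the backend suffix from iptables -V only appears in newer builds):

    # Which packet-filtering backend is the kernel actually using?
    lsmod | grep -E 'nf_tables|ip_tables'
    iptables -V    # newer builds append "(legacy)" or "(nf_tables)"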

What steps did you use to achieve stateless NAT for port 123 only? Can you provide pointers or the basic sequence for “nftables with stateless NAT” in your test?

Switching is shunting frames around between interfaces at OSI layer 2. Most commonly this will be Ethernet frames, with the egress interface being whichever interface the destination MAC address of each frame was last seen on.

Routing is shunting packets around between interfaces at layer 3. Most commonly this means IP packets, with the egress interface determined by a predefined table of address prefixes - the packet gets sent out whichever interface has the longest matching prefix.

Neither switching nor routing implicitly requires any state information about the packets being forwarded, but some other services which are commonly run on the same devices might (e.g. NAT, to allow interfacing public and private address spaces; QoS; traffic inspection; firewalling; accounting; etc.).

In the case of a home or small business router, you’re generally talking about just one public interface, and 1-2 internal ones. The routing table is very simple, and usually static - anything destined for a local subnet goes there, and anything else left over matches 0.0.0.0/0 and goes out the internet interface. Because you don’t normally have public IP addresses assigned to your internal clients, that device will usually need to do NAT as well - but note that NAT is not routing; it’s an entirely separate function. NAT is, however, required at any point where a packet with a private source or destination IP crosses the boundary to the public internet - otherwise the traffic will simply be dropped.

In the case of a transit router, or a customer router with multiple ISPs, the routing table will often be built via BGP, with the router configured to exchange information with other routers connected to its interfaces about where various prefixes can best be found. Larger businesses will also often use multiple internal routers as well, which will communicate subnet information using any of a variety of dynamic routing protocols.

Depending on which device and protocol you’re talking about, the boundaries can also become a bit blurred - for example MPLS, which doesn’t really fit neatly into either, but provides aspects of both. Some transit providers may use MPLS to carry traffic within their own networks, and only make actual IP routing decisions at the points where they interconnect to other parties.

[Ed: This is obviously grossly simplified, and leaves a lot out - but hopefully helps to clear up the major points of difference that might be confusing you.]


@erayd, well, I either need NAT to the stratum 1 NTP server on the LAN or need to run chrony as the stratum 1 server on the router itself. chrony is fed a PPS signal from a USB-based GPS, so if chrony is to run on the router, that device needs a USB port. The Ubiquiti Security Gateway (USG-3P) does not have an external USB port, but an ER-4 does.

The point of moving to more capable router hardware was to eliminate the slow access to a large connection tracking table, but it seems the improved hardware has not solved that problem.

@mlichvar, I’ve migrated to OpenWrt 19.07.7 and gotten as far as testing a prototype nftables config file from the instructions.

But there seems to be considerable uncertainty around successfully migrating to nftables. Experienced users have difficulty and cannot quite get it all working, though some can.

My errors look mostly like this, but they occur even without the flush command. It seems the migration to nftables in OpenWrt is not yet mature or reliable.

Did you compile from SNAPSHOT or use a precompiled installation firmware?

I used one of the 19.07 images they provide, not sure which version exactly.

I did hit the issue with flushing, but flushing in a separate command before loading the rules worked for me (see below). I have not tried anything more advanced such as QoS, just a minimal configuration to show that the stateless NAT works.
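
In other words, something like this, with the filename just an example:

    # Flush in a separate invocation first, then load the ruleset
    nft flush ruleset
    nft -f /etc/ntp-stateless-nat.nft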

I recently upgraded some routers to 21.02-rc2 and it seems to work nicely for me. I’d suggest giving it a shot if nftables on 19.07 doesn’t work as expected for you. Some major Linux distributions have already switched to nftables in the meantime, so I’d expect it to be well tested at this point.
