Ratio of inbound traffic vs. outbound traffic in NTP operations

Hi all,

When I set up the NTP server I did the amplification attack mitigation stuff:

restrict default limited kod nomodify notrap nopeer noquery
restrict -6 default limited kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::

ntpq -c rv returns nothing. Things seem to work.

But I am seeing 6-7x more inbound traffic than outbound. I expected it to be more symmetrical than 7:1. Could this be a side effect of kod being enabled and not playing nicely with many clients?

What ratios of inbound vs. outbound traffic do y’all see?

Same behaviour here on different servers / zones.

E.g. my server in the DE zone is running really fine. The server in LT is acting like yours.

There is also a thread in the forum about FortiGate firewalls that are hammering like hell because the firmware contains a bug: "NTP bursts from FortiGate firewalls".

This thread makes me want to try not using kod and see if it changes anything. Interesting that you have different results too on different servers.

Unless someone will spoon feed me here… Next stop: wireshark.

Kick your kids off Netflix :slight_smile:

Nah that is not it.
The device is in a rack in a datacenter and the traffic shown in that graph is filtered for port 123.

Did some more digging around and it’s clearly not normal.

Here are the stats from other nodes in other countries:

Tokyo

Bytes In 434.60 GiB
Bytes Out 402.42 GiB

Singapore

Bytes In 61.52 GiB
Bytes Out 62.97 GiB

Seoul

Bytes In 476.97 GiB
Bytes Out 444.88 GiB

Now we look at Manila:

Bytes In 1.04 TiB
Bytes Out 115.43 GiB
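
To put the four sites on the same scale, here is a small Python sketch that turns the counters above into in/out ratios. The only assumption is the unit conversion (1 TiB = 1024 GiB); the numbers are copied from the stats as posted.

```python
# Byte counters from the stats above, normalised to GiB.
GIB_PER_TIB = 1024.0  # assumption: the graphs use binary units throughout

sites = {
    "Tokyo": (434.60, 402.42),
    "Singapore": (61.52, 62.97),
    "Seoul": (476.97, 444.88),
    "Manila": (1.04 * GIB_PER_TIB, 115.43),
}


def in_out_ratio(bytes_in, bytes_out):
    """Return inbound bytes divided by outbound bytes."""
    return bytes_in / bytes_out


ratios = {name: in_out_ratio(i, o) for name, (i, o) in sites.items()}

for name, ratio in ratios.items():
    print(f"{name:10s} in/out = {ratio:.2f}")
```

Tokyo, Singapore and Seoul all land within about 10% of 1:1, while the Manila totals work out to roughly 9:1.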

The server is the same version with the same config in all cases, so the clients or the network seem to be doing something there. It’s a developing country with many shitty consumer devices that have 5-year-old firmware. Maybe this is an issue like the one the Dutch operators observed, where some ISP had crap firmware in some devices and they were hammering the pool? BTW I am alone in that region. There are just my team and Cloudflare. If Cloudflare sneezes, I will die. Even set to the minimum speed on NTPpool.org, I get 1-2MB/s sustained.

Edit:

Aha! If I remove kiss of death, there is a semblance of symmetry.

What is going on in ph.pool.ntp.org that makes this happen? Clearly some craziness with client implementations?! I have kod enabled in all regions and it's only a problem in this country.

Recently I experienced the same situation on my NTP server at Digital Ocean. The graphs showed a high amount of incoming network traffic but a considerably smaller amount of outbound traffic. In my case, the problem was caused by the response time of NTP requests: a virtual firewall was filtering traffic from my virtual machine, and for every 10 requests, about 2 were not answered by the server.

After removing the virtual firewall, the inbound and outbound rates of network traffic became more consistent.

Another likely cause of high inbound traffic and low outbound traffic on an NTP server is slow CPU response due to host processor exhaustion/congestion (a phenomenon known as CPU steal). If you are running your server on a machine that uses virtual processor cores, this is likely to be the cause.

Even if you are running the server directly on a dedicated machine, it is still possible that the NTP service is unable to answer all requests due to several factors, including the UDP buffer of the Linux kernel.
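
One way to check the UDP buffer theory on Linux is to look at the `RcvbufErrors` counter in `/proc/net/snmp`, which counts datagrams dropped because a socket receive buffer was full. A minimal Python sketch of reading it follows; the sample text and its numbers are made up for illustration, and the counter is system-wide, so it is only a hint and not specific to port 123.

```python
def udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp into a dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # does not match 'UdpLite:'
    if len(rows) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))


# Made-up sample in the /proc/net/snmp layout (header line, then value line).
SAMPLE = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 1000 3 200 790 200 0 0 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
UdpLite: 0 0 0 0 0 0 0 0
"""

counters = udp_counters(SAMPLE)
print("InDatagrams:", counters["InDatagrams"],
      "RcvbufErrors:", counters["RcvbufErrors"])

# On a real Linux host, read the live file instead of SAMPLE:
#   with open("/proc/net/snmp") as f:
#       counters = udp_counters(f.read())
```

If `RcvbufErrors` keeps climbing between two readings while the server is under load, requests are being dropped before the NTP daemon ever sees them.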

I recommend doing a stress test on your NTP server using the “NTPTool” software. It simulates multiple requests over a period of time to identify whether your server is able to respond to all of them.
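
For a rough idea of what such a test does, here is a minimal Python sketch. It is not NTPTool and makes no claim about how that tool works; `probe`, its parameters, and the default spacing are hypothetical. It sends plain NTPv4 client requests and counts how many get a server-mode reply, treating stratum 0 replies as kiss-of-death.

```python
import socket
import struct
import time


def build_client_packet():
    """48-byte NTP request: LI=0, VN=4, Mode=3 (client), all other fields zero."""
    return struct.pack("!B47x", (0 << 6) | (4 << 3) | 3)


def probe(host, count=20, interval=16.0, timeout=1.0, port=123):
    """Send `count` requests; return (normal_replies, kod_replies, unanswered)."""
    normal = kod = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(count):
            sock.sendto(build_client_packet(), (host, port))
            try:
                data, _addr = sock.recvfrom(512)
            except socket.timeout:
                data = b""
            if len(data) >= 48 and (data[0] & 0x07) == 4:  # mode 4 = server
                if data[1] == 0:  # stratum 0 marks a kiss-of-death packet
                    kod += 1
                else:
                    normal += 1
            time.sleep(interval)
    return normal, kod, count - normal - kod


# Example (needs network access; replace the host with your own server):
#   print(probe("ntp.example.org", count=10))
```

Keep the spacing between requests above the server's own rate limit. Otherwise the probe itself gets rate-limited and the result says nothing about firewall or CPU problems.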

Thanks @Clock - some good pointers here, too. In this situation it is happening on virtual and bare metal. Only in the PH zone and goes away as soon as I disable KOD. Running wireshark right now with KOD and without KOD to compare what is being talked about in all them packets :slight_smile: We shall know more soon.

I suspected the FortiGate bug, but that was incorrect. I received a couple of packet captures from this server. Two main points.

There were five clients with systemd-timesyncd patterns. This caused a lot of pointless NTP requests.

The server runs the NTF (NTP reference implementation) code. When KOD is enabled, responses to the high-rate requests are suppressed or stopped. [This is the best option, IMHO] This explains the inbound vs outbound difference. I suspect that rate tracking requires extra CPU load.
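
To see how a handful of high-rate clients can skew the ratio once their requests go unanswered, here is a toy Python model. It is not the reference implementation's algorithm: it only drops a request when it arrives less than `min_gap` seconds after the previous request from the same client, and the client counts, intervals, and `min_gap` value are hypothetical.

```python
def simulate(client_intervals, duration, min_gap):
    """Toy rate limiter.

    client_intervals: one entry per client, seconds between its requests.
    Returns (requests_in, responses_out) over `duration` seconds.
    """
    requests_in = responses_out = 0
    for interval in client_intervals:
        previous = None  # arrival time of this client's previous request
        for t in range(0, duration, interval):
            requests_in += 1
            if previous is None or t - previous >= min_gap:
                responses_out += 1  # answered; otherwise silently dropped
            previous = t
    return requests_in, responses_out


# Hypothetical mix: 5 clients polling every second, 100 polling every 64 s.
clients = [1] * 5 + [64] * 100
requests_in, responses_out = simulate(clients, duration=3600, min_gap=2)
print(f"in={requests_in} out={responses_out} "
      f"ratio={requests_in / responses_out:.2f}")
```

Plain NTP requests and responses are the same size (48 bytes of payload), so a packet-count ratio like this maps more or less directly onto the byte ratio in the graphs.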


Thank you @stevesommars for helping decipher and analyze our traffic.

Disabled KOD for now because that site has the bandwidth to handle the traffic, but tracking rate limits for a bunch of buggy clients takes more resources than we can spare. The situation should normalize when more sites come back online. The typhoon there wiped out some sites, so right now the affected site is alone with Cloudflare for all of .ph - the added load revealed this issue.