Ratio of inbound traffic vs. outbound traffic in NTP operations

zeroav · December 28, 2021, 1:27pm

Hi all,

When I set up the NTP server I did the amplification attack mitigation stuff:

restrict default limited kod nomodify notrap nopeer noquery
restrict -6 default limited kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::

ntpq -c rv returns nothing. Things seem to work.

But I am seeing 6-7x more inbound traffic than outbound. I expected it to be more symmetrical than 7:1. Could this be a side effect of kod being enabled and not playing nice with many clients?

What ratios of inbound vs. outbound traffic do y’all see?

apuls · December 28, 2021, 2:14pm

Same behaviour here on different server / zones.

E.g. my servers in the DE zone are running really fine. The server in LT is acting like yours.

There is also a thread in the forum about FortiGate firewalls, which are hammering like hell because the firmware contains a bug: "NTP bursts from FortiGate firewalls".

zeroav · December 28, 2021, 2:27pm

This thread makes me want to try not using kod and see if it changes anything. Interesting that you have different results too on different servers.

Unless someone spoon-feeds me here… Next stop: Wireshark.

Bas · December 28, 2021, 3:36pm

Kick your kids off Netflix

zeroav · December 28, 2021, 4:22pm

Nah that is not it.
The device is in a rack in a datacenter and the traffic shown in that graph is filtered for port 123.

zeroav · December 28, 2021, 5:09pm

Did some more digging around and it’s clearly not normal.

Here are the stats from other nodes in other countries:

Tokyo

Bytes In	434.60 GiB
Bytes Out	402.42 GiB

Singapore

Bytes In	61.52 GiB
Bytes Out	62.97 GiB

Seoul

Bytes In	476.97 GiB
Bytes Out	444.88 GiB

Now we look at Manila:

Bytes In	1.04 TiB
Bytes Out	115.43 GiB
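
To make the comparison concrete, the byte counters above can be turned into inbound:outbound ratios. This is only a small sketch: the numbers are copied from this post, the names `sites` and `in_out_ratio` are made up for illustration, and 1 TiB is taken as 1024 GiB.

```python
# Byte counters copied from the stats above (GiB); 1 TiB = 1024 GiB.
GIB_PER_TIB = 1024

sites = {
    "Tokyo": (434.60, 402.42),
    "Singapore": (61.52, 62.97),
    "Seoul": (476.97, 444.88),
    "Manila": (1.04 * GIB_PER_TIB, 115.43),
}


def in_out_ratio(bytes_in, bytes_out):
    """Return inbound bytes divided by outbound bytes."""
    return bytes_in / bytes_out


ratios = {name: in_out_ratio(b_in, b_out) for name, (b_in, b_out) in sites.items()}

for name, ratio in ratios.items():
    print(f"{name:<10} {ratio:5.2f} : 1")
```

Tokyo, Singapore and Seoul all come out close to 1:1, while the cumulative Manila counters work out to roughly 9:1.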

Same server version and same config in all cases, so the clients or the network seem to be doing something there. It’s a developing country with many shitty consumer devices running 5-year-old firmware. Maybe this is an issue like the one the Dutch operators observed, where some ISP had crap firmware in some devices and they were hammering the pool? BTW, I am nearly alone in that region: there are just my team and Cloudflare. If Cloudflare sneezes, I will die. Even set to the minimum speed in NTPpool.org, I get 1-2 MB/s sustained.

Edit:

Aha! If I remove kiss of death, there is a semblance of symmetry:

What is going on in ph.pool.ntp.org that makes this happen? Clearly some craziness with client implementations?! I have kod enabled in all regions and it's only a problem in this country.

Clock · December 29, 2021, 2:19am

Recently I experienced the same situation on my NTP server at Digital Ocean. The indicators showed a high amount of incoming network traffic but a considerably smaller amount of outbound traffic. In my case, the problem was caused by the response time of NTP requests, due to a virtual firewall that filtered traffic from my virtual machine. For every 10 requests, about 2 were not answered by the server.

After removing the virtual firewall, the inbound and outbound rates of network traffic became more consistent.

Another likely cause of high inbound traffic and low outbound traffic on an NTP server is high CPU response time due to host processor exhaustion/congestion (a phenomenon known as CPU steal). If you are running your server on a machine that uses virtual processor cores, this is likely to be the cause.

Even if you are running the server directly on a dedicated machine, it is still possible that the NTP service is not able to fulfill all requests due to multiple factors, including the UDP buffer of the Linux kernel.
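
One way to check the UDP buffer theory on Linux is to look at the `Udp:` counters in `/proc/net/snmp`. The sketch below only parses that file; the function names are made up for illustration, and it assumes the usual layout of one `Udp:` header line followed by one `Udp:` value line.

```python
def parse_udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp into a dict."""
    udp_lines = [
        line.split()
        for line in snmp_text.splitlines()
        if line.startswith("Udp:")  # does not match 'UdpLite:'
    ]
    if len(udp_lines) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def read_udp_counters(path="/proc/net/snmp"):
    """Read the live counters from the kernel (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

If `read_udp_counters()["RcvbufErrors"]` keeps growing between two reads taken a few minutes apart, datagrams are being dropped because a socket receive buffer was full, before the NTP daemon ever saw them.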

I recommend doing a stress test on your NTP server using the “NTPTool” software. It simulates multiple requests over a period of time to identify whether your server is able to respond to all requests.
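
For a rough idea of what such a test measures, here is a minimal probe sketch. It is not NTPTool and makes no claim about how that tool works; `build_request` and `answered_fraction` are hypothetical names. It sends plain NTPv4 client requests over IPv4 and counts how many get any reply within a timeout.

```python
import socket
import struct
import time

NTP_PORT = 123
CLIENT_HEADER = 0x23  # LI=0, VN=4, Mode=3 (client)
NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """Return a 48-byte NTP client request with the transmit seconds set."""
    packet = bytearray(48)
    packet[0] = CLIENT_HEADER
    ntp_seconds = (int(time.time()) + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    struct.pack_into("!I", packet, 40, ntp_seconds)
    return bytes(packet)


def answered_fraction(host, count=20, interval=2.0, timeout=1.0, port=NTP_PORT):
    """Send `count` requests and return the fraction that received a reply."""
    if count <= 0:
        raise ValueError("count must be positive")
    answered = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(count):
            sock.sendto(build_request(), (host, port))
            try:
                data, _addr = sock.recvfrom(512)
                if len(data) >= 48:  # any full-size reply counts, KoD included
                    answered += 1
            except socket.timeout:
                pass  # no reply within the timeout
            time.sleep(interval)
    return answered / count
```

Calling `answered_fraction("your.server.example", count=20)` against your own server gives a crude answered/unanswered figure. Keep the interval polite: probing faster than the server's rate limit will itself cause dropped replies and skew the result.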

zeroav · December 29, 2021, 2:26am

Thanks @Clock - some good pointers here, too. In this situation it is happening on virtual and bare metal. Only in the PH zone, and it goes away as soon as I disable KOD. Running Wireshark right now with KOD and without KOD to compare what is being talked about in all them packets. We shall know more soon.

stevesommars · December 29, 2021, 5:06pm

I suspected the FortiGate bug, but that was incorrect. I received a couple of packet captures from this server. Two main points.

There were five clients with systemd-timesyncd patterns. This caused a lot of pointless NTP requests.

The server runs the NTF (NTP reference implementation) code. When KOD is enabled, responses to the high-rate requests are suppressed or stopped. [This is the best option, IMHO] This explains the inbound vs outbound difference. I suspect that rate tracking requires extra CPU load.
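
The effect can be illustrated with a toy model. This is a simplified sketch, not the reference implementation's actual algorithm: the 2-second threshold, the class name and the client mix are all assumptions chosen for illustration. Every request counts as inbound, but a reply is only sent if the client's previous request was at least `min_interval` seconds ago.

```python
MIN_INTERVAL = 2.0  # seconds; illustrative threshold, not read from any ntpd config


class RateLimiter:
    """Suppress replies to clients whose requests arrive too close together."""

    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self.last_seen = {}  # client -> arrival time of its previous request

    def should_reply(self, client, now):
        previous = self.last_seen.get(client)
        self.last_seen[client] = now  # per-client state: this is the tracking cost
        return previous is None or (now - previous) >= self.min_interval


def simulate(duration, clients):
    """clients maps a client name to its polling interval in seconds.

    Returns (inbound, outbound) packet counts over `duration` seconds.
    """
    limiter = RateLimiter()
    events = sorted(
        (step * interval, name)
        for name, interval in clients.items()
        for step in range(int(duration / interval))
    )
    inbound = outbound = 0
    for now, name in events:
        inbound += 1
        if limiter.should_reply(name, now):
            outbound += 1
    return inbound, outbound
```

With five clients polling every 64 s and five hypothetical misbehaving clients polling every second, `simulate(640, ...)` returns `(3250, 55)`: nearly all inbound packets come from the fast pollers, and almost none of them are answered. The `last_seen` table also grows with the number of distinct clients, which is one plausible reason rate tracking costs resources.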

zeroav · December 31, 2021, 3:59am

Thank you @stevesommars for helping decipher and analyze our traffic.

Disabled KOD for now because that site has the bandwidth to handle the traffic, but tracking rate limits for a bunch of buggy clients takes more resources than we can spare. The situation should normalize when more sites come back online. The typhoon there wiped out some sites, so right now the affected site is alone with Cloudflare for all of .ph, and the added load revealed this issue.

system · January 30, 2022, 3:59am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
