But I am seeing 6-7x more inbound traffic than outbound. I expected it to be more symmetrical than 7:1. Could this be a side effect of KoD being enabled and not playing nicely with many clients?
For example, my servers in the DE zone are running really fine. The server in LT is acting like yours.
There is also a thread in the forum about FortiGate firewalls that are hammering like hell because their firmware contains a bug: "NTP bursts from FortiGate firewalls".
Did some more digging around and it’s clearly not normal.
Here are the stats from other nodes in other countries:
Tokyo: Bytes In 434.60 GiB, Bytes Out 402.42 GiB
Singapore: Bytes In 61.52 GiB, Bytes Out 62.97 GiB
Seoul: Bytes In 476.97 GiB, Bytes Out 444.88 GiB
Now we look at Manila:
Bytes In 1.04 TiB, Bytes Out 115.43 GiB
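To put a number on the asymmetry, here is a quick sketch that turns the counters quoted above into in/out ratios. The only assumption is the unit conversion (1 TiB = 1024 GiB); the Manila figure is rounded to two decimals in the source, so its ratio is approximate.

```python
# Byte counters quoted above, expressed in GiB (1 TiB = 1024 GiB).
stats_gib = {
    "Tokyo": (434.60, 402.42),
    "Singapore": (61.52, 62.97),
    "Seoul": (476.97, 444.88),
    "Manila": (1.04 * 1024, 115.43),
}


def in_out_ratio(bytes_in, bytes_out):
    """Inbound bytes divided by outbound bytes."""
    return bytes_in / bytes_out


ratios = {site: in_out_ratio(i, o) for site, (i, o) in stats_gib.items()}

for site, ratio in ratios.items():
    print(f"{site:10s} in/out = {ratio:.2f}")
```

Tokyo, Singapore and Seoul all come out close to 1:1, while Manila lands at roughly 9:1 on these cumulative counters.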
Same server version and same config in all cases, so the clients or the network seem to be doing something there. It’s a developing country with many shitty consumer devices running 5-year-old firmware. Maybe this is an issue like the one the Dutch operators observed, where some ISP had crap firmware in some devices and they were hammering the pool? BTW, I am alone in that region: there are just my team and Cloudflare. If Cloudflare sneezes, I will die. Even set to the minimum speed on NTPpool.org, I get a sustained 1-2 MB/s.
Edit:
Aha! If I remove kiss of death, there is a semblance of symmetry.
What is going on in ph.pool.ntp.org that makes this happen? Clearly some craziness with client implementations?! I have KoD enabled in all regions and it's only a problem in this country.
Recently I experienced the same situation on my NTP server at Digital Ocean. The indicators showed a high amount of incoming network traffic but a considerably smaller amount of outbound traffic. In my case, the problem was caused by the response time of NTP requests, because a virtual firewall was filtering traffic from my virtual machine. For every 10 requests, about 2 were not answered by the server.
After removing the virtual firewall, the inbound and outbound rates of network traffic became more consistent.
Another likely cause of high inbound and low outbound traffic on an NTP server is slow CPU response due to host processor exhaustion or congestion (a phenomenon known as CPU steal). If you are running your server on a machine that uses virtual processor cores, this is likely to be the cause.
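If you want to check for CPU steal yourself, here is a minimal sketch that assumes a Linux guest exposing /proc/stat. The function names are mine, and the 5 second sampling interval is an arbitrary choice.

```python
import time


def cpu_times(stat_text):
    """Return the aggregate CPU counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before, after):
    """Fraction of CPU time reported as steal between two snapshots."""
    delta = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so skip them.
    total = sum(delta[:8])
    steal = delta[7] if len(delta) > 7 else 0
    return steal / total if total else 0.0


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (assumed value) and compare."""
    with open(path) as f:
        before = cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_times(f.read())
    return steal_fraction(before, after)
```

Calling sample_steal() on the VM while the pool load is high gives a number between 0 and 1. A value that stays near zero makes steal an unlikely explanation.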
Even if you are running the server directly on a dedicated machine, it is still possible that the NTP service is unable to fulfill all requests due to multiple factors, including the UDP buffer of the Linux kernel.
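One way to see whether the kernel's UDP receive buffer is overflowing is to watch the RcvbufErrors counter in /proc/net/snmp. The following is a small sketch assuming a Linux host; the helper name is mine, and the counters are system-wide, not specific to the NTP socket.

```python
def udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp text."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # does not match 'UdpLite:'
    if len(rows) < 2:
        raise ValueError("expected a header line and a value line for Udp")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return udp_counters(f.read())
```

If RcvbufErrors keeps climbing between two reads under load, datagrams are being dropped before the NTP daemon ever sees them, which would also show up as more bytes in than out.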
I recommend stress-testing your NTP server with the “NTPTool” software. It simulates multiple requests over a period of time to show whether your server is able to respond to all of them.
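If you just want a rough answered/unanswered count without extra software, something like the sketch below could work. To be clear, this is not NTPTool and does not reproduce its behaviour; it is a bare SNTP probe with hypothetical function names, IPv4 only, and the count, interval and timeout defaults are my own assumptions. Keep the interval generous, otherwise a server with rate limiting enabled will start limiting the probe itself.

```python
import socket
import time


def build_request():
    """48-byte SNTP client request: LI=0, VN=4, Mode=3 (client)."""
    return bytes([0x23]) + bytes(47)


def probe(host, count=10, interval=10.0, timeout=1.0, port=123):
    """Send `count` requests and return (answered, kod, unanswered)."""
    answered = kod = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(count):
            sock.sendto(build_request(), (host, port))
            try:
                reply, _addr = sock.recvfrom(512)
            except socket.timeout:
                pass  # no reply within the timeout: counted as unanswered
            else:
                if len(reply) >= 48:
                    if reply[1] == 0:  # stratum 0 marks a kiss-o'-death packet
                        kod += 1
                    else:
                        answered += 1
            time.sleep(interval)
    return answered, kod, count - answered - kod
```

Run probe() against your own server's address from a machine outside its network. Roughly 2 unanswered out of 10 would match what I saw behind the virtual firewall.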
Thanks @Clock - some good pointers here, too. In this situation it is happening on both virtual and bare metal, only in the PH zone, and it goes away as soon as I disable KoD. Running Wireshark right now with KoD and without KoD to compare what is being talked about in all them packets. We shall know more soon.
I suspected the FortiGate bug, but that was incorrect. I received a couple of packet captures from this server. Two main points.
There were five clients with systemd-timesyncd patterns. This caused a lot of pointless NTP requests.
The server runs the NTF (NTP reference implementation) code. When KoD is enabled, responses to the high-rate requests are suppressed or stopped. [This is the best option, IMHO] This explains the inbound vs outbound difference. I suspect that rate tracking adds extra CPU load.
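To illustrate how a handful of chatty clients plus rate limiting can produce this kind of in/out gap, here is a toy model. It is not the reference implementation's algorithm: it simply drops the reply whenever a client's request arrives sooner than min_interval after that client's previous one, and it ignores the occasional KoD packet a real server may send. The client mix and all the numbers are made up for the example, not taken from the captures.

```python
def simulate(poll_intervals, duration, min_interval=None):
    """Toy model: one entry in `poll_intervals` per client, in seconds.

    Returns (requests_in, replies_out) over `duration` seconds.
    """
    requests = replies = 0
    for poll in poll_intervals:
        last_seen = None
        t = 0.0
        while t < duration:
            requests += 1
            limited = (min_interval is not None
                       and last_seen is not None
                       and t - last_seen < min_interval)
            if not limited:
                replies += 1
            last_seen = t  # every request is tracked, answered or not
            t += poll
    return requests, replies


# Hypothetical mix: 100 well-behaved clients polling every 64 s,
# 5 misbehaving clients polling every 0.5 s, observed for one hour.
clients = [64.0] * 100 + [0.5] * 5

with_limit = simulate(clients, 3600, min_interval=2.0)
without_limit = simulate(clients, 3600)

print("rate limiting on: ", with_limit)
print("rate limiting off:", without_limit)
```

In this made-up scenario the five fast clients account for most of the inbound packets and get almost no replies while limiting is on; with limiting off, every request is answered and the counts are symmetric again.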
Thank you @stevesommars for helping decipher and analyze our traffic.
Disabled KoD for now because that site has the bandwidth to handle the traffic, but tracking rate limits for a bunch of buggy clients takes more resources than we can spare. The situation should normalize when more sites come back online. The typhoon there wiped out some sites, so right now the affected site is alone with Cloudflare for all of .ph - the added load revealed this issue.