I wrote a small script that fetches the clients from Chrony that sent more than X packets in the last hour, and automatically adds them to the DENY list.
My only doubt now is: what would be a reasonable value for X? Some legitimate clients could be behind a NAT and I don’t want them to be blocked.
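For the curious, a minimal sketch of what such a script might look like (this is a reconstruction, not the original: the `THRESHOLD` default, the `over_threshold` helper name, and the assumption that `chronyc -c clients` prints CSV with the address in column 1 and the NTP packet count in column 2 are all mine — verify the column layout against your chrony version; the counters are also cumulative, so a real "last hour" check would diff hourly snapshots):

```shell
#!/bin/sh
# Sketch: deny clients whose NTP packet count exceeds a threshold.
# The CSV column layout of `chronyc -c clients` is an assumption here,
# and the hourly-snapshot diffing the original script does is elided.
THRESHOLD=${THRESHOLD:-10000}

# Filter: print addresses from CSV client stats that exceed the threshold.
over_threshold() {
    awk -F, -v t="$THRESHOLD" '$2 + 0 > t + 0 { print $1 }'
}

# Production use (commented out so the sketch is side-effect free):
#   chronyc -c clients | over_threshold |
#       while read -r addr; do chronyc deny "$addr"; done
```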
Okay, thanks! The reason I was in doubt is that I have quite a lot of clients that send me over 10,000 requests. Each hour there are multiple clients who do this, so that banlist will grow quickly. Some go over 100,000, and I even encountered over 1,000,000 twice this week.
Are there really so many badly configured clients in the world? I also read about a Fortigate bug which causes this, but I still expected the number of misbehaving clients to be a lot lower.
Is 10,000 per hour a good threshold, for example?
A better question to ask would be: Is this actually causing your server a problem? If it’s not, why bother blocking them? Like other replies have said - it could be a number of legitimate hosts behind a NAT gateway.
Personally, if you are in the pool and it’s a problem, you shouldn’t be in the pool. If you join the pool, you open your server for ALL to use. Blocking people is really not something you should be doing as a pool operator, unless it’s a DDoS-level effect on your systems.
It is still possible to misuse an NTP server for a reflection attack (an amplification attack with an amplification factor equal to 1). I recommend rate-limiting the outgoing packets.
What is the number of users/households behind a typical CGNAT IP? Could it raise the legitimate queries per hour above 10,000?
I personally add anything that hits me with more than 1000 reqs/sec to a 24hr blocklist at my network edge (prevents things like the fortinet bursters sending problematic levels of traffic as far as my server, although I do still save the blocked requests for later analysis). They get a 3000 request burst before the policy is applied. Anything slower than that, I don’t worry about.
If reflection abuse becomes a significant problem, then I will revise that limit downwards.
My main rationale behind allowing that much traffic is to address the scenario where e.g. a few thousand shitty IoT clients try to synchronise from behind NAT all at the same time, or if there are few buggy devices that try to sync time in a fast loop. I doubt it happens often, but I’d still like to have my server available to service those requests, and to service the requests of other devices sharing a public IP with something buggy.
ratelimit [option]…
This directive enables response rate limiting for NTP packets. Its purpose is to reduce network traffic with misconfigured or broken NTP clients that are polling the server too frequently. The limits are applied to individual IP addresses. If multiple clients share one IP address (e.g. multiple hosts behind NAT), the sum of their traffic will be limited. If a client that increases its polling rate when it does not receive a reply is detected, its rate limiting will be temporarily suspended to avoid increasing the overall amount of traffic. The maximum number of IP addresses which can be monitored at the same time depends on the memory limit set by the clientloglimit directive.
The ratelimit directive supports a number of options (which can be defined in any order):
interval interval
This option sets the minimum interval between responses. It is defined as a power of 2 in seconds. The default value is 3 (8 seconds). The minimum value is -19 (524288 packets per second) and the maximum value is 12 (one packet per 4096 seconds). Note that with values below -4 the rate limiting is coarse (responses are allowed in bursts, even if the interval between them is shorter than the specified interval).
burst responses
This option sets the maximum number of responses that can be sent in a burst, temporarily exceeding the limit specified by the interval option. This is useful for clients that make rapid measurements on start (e.g. chronyd with the iburst option). The default value is 8. The minimum value is 1 and the maximum value is 255.
leak rate
This option sets the rate at which responses are randomly allowed even if the limits specified by the interval and burst options are exceeded. This is necessary to prevent an attacker who is sending requests with a spoofed source address from completely blocking responses to that address. The leak rate is defined as a power of 1/2 and it is 2 by default, i.e. on average at least every fourth request has a response. The minimum value is 1 and the maximum value is 4.
An example use of the directive is:
ratelimit interval 1 burst 16
This would reduce the response rate for IP addresses sending packets on average more than once per 2 seconds, or sending packets in bursts of more than 16 packets, by up to 75% (with default leak of 2).
It will stop responding to bad clients, and even more so if there are many of them behind a NAT.
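As a quick sanity check on the arithmetic in the quoted example: `interval 1` means a minimum spacing of 2^1 = 2 seconds between responses, and the default `leak 2` means 1 in 2^2 = 4 limited requests still gets a response, hence the "up to 75%" reduction. In plain shell arithmetic (variable names are mine):

```shell
# Power-of-2 encodings from the quoted ratelimit example.
min_interval=$((1 << 1))   # interval 1 -> 2^1 = 2 seconds between responses
leak_denom=$((1 << 2))     # leak 2     -> 1 in 2^2 = 4 requests answered
reduction=$((100 - 100 / leak_denom))   # worst-case response reduction in %

echo "min interval: ${min_interval}s, reduction up to ${reduction}%"
```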
I’m familiar with the ratelimit option, but it limits every client from the start. I only want to limit clients after it has become clear that they are abusive. With my script I postpone that decision by at least an hour, and no limiting takes place in the meantime.
Another problem with ratelimit is that it removes the limit if it detects that the other side increases its rate even further in response to the limiting. It does this because its only goal is to minimize traffic, not to prevent DDoS attacks. My goal is rather the opposite: I am not trying to minimize traffic, I just want to block abuse and attacks.
I saw some abusive clients with a negative interval (more than 1 packet/sec), so I rewrote your script to ban offenders with iptables+ipset instead of chronyc deny.
After running it for a day with a 3600 req/hour limit, the result is rather strange.
The comment is timestamp | packets/hour | packet interval.
I added the timestamp to the comment because I was going to remove bans automatically after some time, but it seems that’s not necessary.
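For reference, the ipset side of that approach might look roughly like this (a sketch with assumed set name and example values, requires root; note the set must be created with `comment` support for the timestamp | packets/hour | interval annotation to work):

```shell
# One-time setup: a set of banned IPs plus a DROP rule that matches it.
ipset create ntp_abusers hash:ip comment
iptables -I INPUT -p udp --dport 123 \
    -m set --match-set ntp_abusers src -j DROP

# Banning one offender, annotated as: timestamp | packets/hour | interval
ipset add ntp_abusers 192.0.2.10 comment "$(date +%s)|120000|0.03"
```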
I prefer to block abusers at my network edge, rather than allow their traffic to hit my server at all. Chrony cannot achieve that, but firewall rules can. So the firewall approach is what I’ll be keeping.
My first post is visible now, finally.
I guess a first post with lots of addresses looked suspicious to the bot.
5 more hosts were added in the two days after my previous post, then nothing for three days.
Out of 57 banned hosts, only 1 is not from Amazon EC2.
Anyone seeing something like this from AWS?
Btw, I’m in the Asia/KR zone with only 384Kbit net speed,
so maybe what I see is only a small part of the abusive clients in this zone.
I’ve been doing something like that with iptables for some time. It doesn’t need a script, and it blocks the traffic at the edge of the network layer I can control.
-A INPUT -m state --state NEW -m udp -p udp --dport 123 -m hashlimit --hashlimit-upto 200/second --hashlimit-burst 225 --hashlimit-mode srcip --hashlimit-srcmask 32 --hashlimit-name ntp -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 123 -m hashlimit --hashlimit-above 200/second --hashlimit-burst 225 --hashlimit-mode srcip --hashlimit-srcmask 32 --hashlimit-name ntp -j DROP
A marginal amount of traffic gets blocked, but I guess those are all misconfigured clients or even attacks, since even a shared network behind a single NAT address shouldn’t have problems, thanks to the DNS round robin of the pool.