Collapse of Russia country zone

@umike would blocking ICMP traffic entirely not help?

yes

The server is getting better, but 1.8 million is too much for me.

Yes, I try to limit them with iptables, like:

  1. drop all from bad_icmp set source
  2. if icmp type 3 arrives add them to bad_icmp set for some time

After 1-2 minutes the set contains 700,000 IPs and keeps growing.
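A sketch of those two rules using ipset + iptables (the set name `bad_icmp` and the 10-minute timeout are my assumptions, not the poster's exact values):

```shell
# Assumed set name and timeout; adjust to taste.
ipset create bad_icmp hash:ip timeout 600

# 1. Drop everything from addresses already in the bad_icmp set.
iptables -A INPUT -m set --match-set bad_icmp src -j DROP

# 2. When an ICMP type 3 (destination unreachable) arrives, remember
#    its source address in the set for the timeout period.
iptables -A INPUT -p icmp --icmp-type destination-unreachable \
  -j SET --add-set bad_icmp src
```

With this ordering, already-listed addresses are dropped before the SET rule is reached, so their entries age out naturally after the timeout.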

There is one more nuance here: part of the ICMP comes from transit routers/firewalls and looks like

TransitHost->me ICMP host NTPquerier port zzzz unreachable.

where the ICMP source IP (TransitHost) does not match the NTPquerier IP. I don't know how this can be processed in the firewall. Any analysis in user space is too expensive.

I will try increasing clientloglimit, but… really I don't see many queries from single IPs. Therefore, I don't think that daemon rate limits or iptables limits will help me much.

1 Like

Has the article been approved yet?

Where is it? Can you provide a link so we can vote for it? I see nothing relevant in the Sandbox, and zero articles or comments in your profile.

No.

Scheduled for publication on 24 November 2024 at 11:15

I've sent you an invite so that you can post articles without pre-moderation.

2 Likes

Oh, thanks! ^^ What a small world.

Years ago it was possible for the admin of a server to request that it be manually added to an underserved zone, but it seems that fell out of favor. Would the pool admins be amenable to allowing admins of servers in Europe to temporarily have their servers put in the Russian zone to try to help stabilize things?

I think this is clear evidence, along with the sheer volume of requests, that ru.pool.ntp.org is under sustained DDoS and that is the issue that needs to be resolved.

The fact that you're seeing so many port unreachables strongly suggests to me that the DDoS is happening via source IP spoofing from an ISP that doesn't implement BCP 38 info (note that's an HTTP-only link, no HTTPS available). You can also see the actual IETF document at IETF BCP 38 RFC.

Quite a few ISPs do ensure spoofed source IPs don't leave their eyeball (consumer) networks, but there are exceptions. It's possible the spoofing is coming entirely from one or a few machines with a fast connection, sending each packet with a different random spoofed IP. It's also possible the actual sources are compromised devices, but only some fraction of a botnet is going to be connected to spoofing-friendly ISPs.

With a spoofed source IP address, the IP receiving the NTP server's response didn't send the NTP request, so it doesn't have the request's claimed port open waiting for a response, and so it responds with port unreachable.

I am curious whether you're seeing only ICMP port unreachables, or both those and ICMP host unreachables. With spoofed source addresses, if they're generated randomly I'd expect some to not actually have a live machine at that address, so a host unreachable from a router in the target ISP (AS, autonomous system) would make sense. They could be careful to pick from a list of known-connected IPs, but I'm guessing they wouldn't bother.

The bad news is that tracking down the actual source of spoofed-source traffic is difficult and sometimes practically impossible. You need the cooperation of each ISP along the path back from a target IP to the source, with each one needing to look for huge flows of NTP mode 3 requests and figure out where they enter its network, to point back to the next AS/ISP on that path.

Correct.

You can disable the processing overhead of maintaining the MRU list in ntpd by adding disable monitor to ntp.conf and ensuring none of the restrict lines have limited. I don't think kod alone will enable MRU list maintenance, but then kod without limited in a restrict entry does nothing, and will produce a warning in the log to that effect.
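A minimal ntp.conf fragment reflecting that advice might look like this (illustrative only; the restrict flags shown are a common baseline, not a recommendation for every setup):

```
# Skip MRU list maintenance entirely.
disable monitor

# Note: no "limited" and no "kod" on the restrict line.
restrict default nomodify notrap nopeer noquery
restrict 127.0.0.1
```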

The default maximum memory for the MRU list is 1 MB. Using the authentication-required ntpq command monstats you can see MRU list stats. On an ntpd with no mru configuration in ntp.conf on x64, I get:

C:\Users\daveh>nq -c "monstats"

enabled:              0x3
addresses:            19
peak addresses:       20
maximum addresses:    11915
reclaim above count:  600
reclaim older than:   64
kilobytes:            2
maximum kilobytes:    1024

C:\Users\daveh>

So you can see each entry on x64 consumes 1 MB / 11915, or about 88 bytes. A gigabyte of memory would allow up to 12.2 million different addresses; however, it would require tuning the "reclaim above count" and "reclaim older than" limits, which cause ntpd to not grow the number of entries if the total count is more than 600 or the oldest is more than 64 seconds. That's done with mru options in ntp.conf, maxdepth and maxage respectively.
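Raising those limits might look like this in ntp.conf (the values here are illustrative examples, not recommendations):

```
# Keep up to 200,000 entries, allow recycling only of entries older
# than an hour, and raise the memory cap to 32 MB (maxmem is in KB).
mru maxdepth 200000 maxage 3600 maxmem 32768
```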

The MRU list is maintained as a doubly-linked list indexed by an IP address hash table to minimize the per-packet work. This means the work is localized to the two hash table entries (lists) for the outgoing and incoming IP address, plus the back and forward list pointers of the entry being recycled to move it to the most-recent position in the MRU list. I've successfully configured it to keep at least 200,000 entries without noticeable impact to the processing speed on a system handling 1-2 Mbps of NTP traffic. It will be a bit slower, triggering more CPU cache misses manipulating various pages of the 17.6 MB of memory a 200,000-entry MRU list occupies.
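The structure described (a hash table indexing a doubly-linked recency list, with O(1) expected work per packet and recycling of the oldest entry once the list is full) can be sketched in Python. This is an illustrative model, not ntpd's actual implementation; all names here are made up:

```python
class MRUList:
    """Toy model of an MRU list: dict for O(1) lookup by address,
    doubly-linked list ordered most- to least-recently seen."""

    class Node:
        __slots__ = ("addr", "count", "prev", "next")
        def __init__(self, addr):
            self.addr, self.count = addr, 0
            self.prev = self.next = None

    def __init__(self, maxdepth=600):
        self.maxdepth = maxdepth
        self.index = {}      # addr -> Node (the hash table)
        self.head = None     # most recently seen
        self.tail = None     # least recently seen

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev

    def _push_front(self, node):
        node.prev, node.next = None, self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def seen(self, addr):
        """Record a packet from addr; O(1) expected per packet."""
        node = self.index.get(addr)
        if node is not None:
            self._unlink(node)              # move to most-recent slot
        elif len(self.index) >= self.maxdepth:
            node = self.tail                # recycle the oldest entry
            self._unlink(node)
            del self.index[node.addr]
            node.addr, node.count = addr, 0
            self.index[addr] = node
        else:
            node = MRUList.Node(addr)
            self.index[addr] = node
        node.count += 1
        self._push_front(node)
        return node
```

Because recycling reuses the tail node rather than allocating, a full list does steady-state work touching only a handful of pointers per packet, which is the property the post is describing.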

Incidentally, the default mru maxdepth 600 of ntpd is a holdover from the long-ago-removed ntpdc command monlist, as ntpd's response could only send 600 responses in a blast of packets likely to make it through to a remote host without any being dropped. That monlist response functionality was the infamous ntpd traffic amplification that was widely exploited in 2014/2015, before people either updated to a newer ntpd without that functionality, or configured an older ntpd to drop ntpq and ntpdc requests via restrict ... noquery. It's probably time to increase that default maxdepth to something closer to the number that fits in the default maximum memory of 1 MB, or at least a more generous number like 2000.

[1] I tried to figure out the link syntax to use here that would let me change the link text, no luck yet, my apologies. EDIT: Thanks to @n1zyy for pointing out the correct syntax to me in a PM. He also pointed out it's Markdown, but I knew that and had searched for how to link in Markdown. Maybe I got my () and [] confused, but it might also be that Discourse is picky about the order of the text vs. the link, which apparently isn't always the case in the ever-slippery Markdown universe.

1 Like

Following up on messages in another thread, really about the problems for Russian pool server operators: thanks to @timz and @kkursor for verifying Cloudflare's anycast servers are working from within Russia.
For reference see my post followed by two responses.

The upshot is that while the flood of abusive queries to *.ru.pool.ntp.org is causing pain for most pool server operators in Russia, it's only degrading service, leaving the zone's usefulness essentially entirely reliant on Cloudflare. For those relying on that zone to maintain their clocks, it appears Cloudflare's infrastructure can handle the flood one way or another. They may have tracked it back to a particular AS they peer with and filtered NTP queries from that AS, or they may have some peer-facing firewalling that's dropping the abusive traffic before it hits their NTP servers. Given that providing a DDoS-proof web CDN is one of their core businesses, I'm sure they have all sorts of expertise and tools at their disposal to manage the problem.

Operators of pool servers may want to switch to monitoring-only mode as long as this mostly-futile attack continues. Or they may want to reach out to their ISPs to explain the situation and ask for their help back-tracing the flood to its sources.

1 Like

ā€¦ or it could be just a bug.

As mentioned earlier, kkursor posted about this on Habr and something interesting showed up in the comments.

With the help of some machine translation:

"On the night of October 24, the number of UDP broadcasts on all nodes at once increased sharply. That is, this is not just one node. […] A lot of sessions on port UDP 123. I took a specific subscriber and found out that a Yandex station is requesting NTP servers 6 times every 5 seconds. […] P.S. I checked it on my home 'Alice'. Exactly every 5 seconds, 4 NTP requests."

A followup response:

"The number of Yandex stations sold is 8 million by 2023 and +3.3 million in 2024 = 11.3 million.

Let's assume that the phenomenon is widespread, and each one makes 4 NTP requests every 5 seconds.

This is 720 (3600 / 5) bursts per hour, or (11,300,000 * 4 * 720) = 32.544 billion requests per hour, or 9,040,000 requests to NTP servers per second."
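The quoted back-of-the-envelope arithmetic checks out; as a quick sanity check:

```python
devices = 11_300_000          # estimated Yandex stations sold (8M + 3.3M)
reqs_per_burst = 4            # NTP requests every 5 seconds
bursts_per_hour = 3600 // 5   # = 720

per_hour = devices * reqs_per_burst * bursts_per_hour
per_second = per_hour / 3600

print(per_hour)    # 32544000000, i.e. ~32.5 billion requests/hour
print(per_second)  # 9040000.0 requests/second
```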

I would suggest investigating if those Yandex stations are to blame. You may need to contact the abuse address of some friendly ISP to troubleshoot this further, possibly with some tcpdumps of the offending traffic.

3 Likes

At a glance, this didnā€™t look like a DDoS to me, because:

  • Heavy traffic load appears only when the monitoring system includes a server in the pool.
  • Traffic drops to negligible values once the pool no longer includes the server.

In my understanding, this looks like legitimate clients making their first-time requests. If it were a DRDoS, the traffic would remain indefinitely once the "attackers" became aware of the server's existence.

However, inspecting the MRU list gave me some thoughts:

$ ntpq -c 'mrulist sort=-count'
lstint avgint rstr r m v  count rport remote address
==============================================================================
     0      0  3d0 L 3 3  82834   437 171.22.215.174 (RLINE1 = AS35608)
     0      0  bd0 K 3 4  43905 46178 80.76.106.190 (dynip6-190.tdsplus.ru)
     1      0  3d0 L 3 4  41390 39532 80.76.96.53 (TDS+ = AS51547)
     3      0  3d0 L 3 4  40596 52981 80.76.96.43 (TDS+ = AS51547)
     0      0  3d0 L 3 3  38341   294 45.141.93.253 (RLINE1 = AS35608)
     1      0  3d0 L 3 4  30959 40020 80.76.110.197 (dynip10-197.tdsplus.ru)
     3      0  3d0 L 3 4  21990 22305 80.76.96.37 (etra-plus.ru)
     1      0  3d0 L 3 4  21932 50388 80.76.96.35 (dkkonversiya.ru)
     2      0  3d0 L 3 4  17815 42734 80.76.110.195 (dynip10-195.tdsplus.ru)
     2      0  3d0 L 3 4  17723 33731 80.76.96.33 (TDS+ = AS51547)
     5      0  3d0 L 3 4  17013 50980 80.76.96.39 (TDS+ = AS51547)
     1      0  3d0 L 3 3  15484 19523 171.22.213.22 (RLINE1 = AS35608)

First of all, the most frequent addresses are from a small bunch of domestic ISPs. This fact alone does not indicate anything, as many users in Russia are behind NAT and thus share the same IP addresses. However, the ISPs appearing here are, AFAIK, nowhere near popular enough to generate such an amount of traffic, while none of the really popular ISPs showed up in the logs. This makes me think that some ISPs may be the target of an attack, or may host IoT devices that have gone out of control, etc.
Second, many requests "from" those clients have strange source port numbers: neither 123 nor 32768-65535, and sometimes even below 1024. I decided to block such requests on the firewall to decrease the probability of reflection attacks on third-party infrastructure (or at least to halve their intensity if the "source" port is chosen randomly by a spoofer). If those are legitimate legacy systems using ports starting from 1024, then I think it is acceptable "collateral damage" in the current desperate circumstances.
PS. Well, after inspecting the firewall logs during "peacetime", I reconsidered and enabled ports 1024-32767 as well.
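A hypothetical iptables version of that source-port filter, reflecting the final state after the PS (only source ports below 1024, other than 123 itself, remain blocked):

```shell
# Drop NTP queries whose UDP source port is below 1024, except 123.
# Ports 1024 and above were re-allowed after inspecting the logs.
iptables -A INPUT -p udp --dport 123 --sport 0:122    -j DROP
iptables -A INPUT -p udp --dport 123 --sport 124:1023 -j DROP
```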

JFYI. For me, the bottleneck is not the ntpd server itself (although its Atom D2500 is nearly fully loaded when incoming traffic reaches 20 to 50 Mbit/s), but the pfSense router based on a Celeron G3900 and an Intel NIC, which seems to generate a lot of interrupts, so that a single core is almost entirely consumed handling them.

One big Russian hosting provider read the Habr article, contacted me to offer assistance, and is providing 30 free VPSes to serve the RU zone.
Maybe the zone will be resurrected soon.

5 Likes

This would be consistent with the attack targeting a hostname in *.ru.pool.ntp.org rather than IP addresses.

It's not unusual for clients to use any UDP source port. Typically Linux systems would query from 123 or a port above 1024, but any source port is possible.

As far as the flood coming from less-popular ISPs, those might be ISPs which don't protect against their customers spoofing others' IP addresses. The unusually high level of ICMP unreachables suggests forged source IP addresses.

1 Like

It is possible to set netspeed lower than 512k. Look at the GET requests that the 'Manage servers' page sends.
The linked page (pool.ntp.org: the internet cluster of ntp servers) will set 1 kbps. I set 1 kbps and it is easy to handle.

upd: set 30 kbps, it gets higher: 50% CPU, about 70k ppm, ~70 Mbit/s. Set 15 kbps, ~45 Mbit/s, comfortable load.

Maybe due to the "huge" number of new servers in the RU zone you will appear in the DNS rotation less often, and as a result you will get less traffic.

But funny that you can set the speed via GET :smiley:

And even clients using source UDP/123 will be translated to some other port number if behind a NAT gateway anyway.

In my experience, I suspect that the pool serves as a way to mine server addresses. My test involved adding a server to the pool and observing waves of pokes at several popular ports, such as SSH, SMB, Telnet, etc., besides NTP. If my anecdotal experience applies, the Russian pool would be an obvious resource for mining addresses of servers in Russia.

All of my public IPs receive this 24/7 whether they are NTP servers or not, and there are paid-for services like Shodan which scan the whole Internet to compile a database of open ports.

Are you sure that you see an increase in this immediately after you add IPs to the NTP pool?

The idea that people are doing DNS queries to gather lists of NTP servers (in a given region?) and then subjecting them to further scanning seems strange to me, when one can, for example, just download a list of all IP addresses allocated to entities in RU and scan those (or pay someone who has already scanned them).

Yes, I am.

Why not? I would, if I were thusly inclined. It's very low-hanging fruit, especially for script kiddies.