Collapse of Russia country zone

@umike would blocking ICMP traffic entirely not help?

yes

The server is getting better, but 1.8 million is too much for me.

Yes, I try to limit them with iptables, like:

  1. drop all from bad_icmp set source
  2. if icmp type 3 arrives add them to bad_icmp set for some time

After 1-2 minutes the set contains 700,000 IPs and keeps growing.
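A sketch of those two rules using ipset + iptables (the set name `bad_icmp` and the 10-minute timeout are my assumptions, not the poster's exact values):

```shell
# Assumed set name and timeout; adjust to taste.
ipset create bad_icmp hash:ip timeout 600

# 1. Drop everything from addresses already in the bad_icmp set.
iptables -A INPUT -m set --match-set bad_icmp src -j DROP

# 2. When an ICMP type 3 (destination unreachable) arrives, remember
#    its source address in the set for the timeout period.
iptables -A INPUT -p icmp --icmp-type destination-unreachable \
  -j SET --add-set bad_icmp src
```

With this ordering, already-listed addresses are dropped before the SET rule is reached, so their entries age out naturally after the timeout.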

There is one more nuance here: part of the ICMP comes from transit routers/firewalls and looks like

TransitHost->me ICMP host NTPquerier port zzzz unreachable.

where the ICMP source IP (TransitHost) does not match the NTPquerier IP. I don't know how this can be processed in the firewall. Any analysis in user space is too expensive.

I will try increasing clientloglimit, but… really I don't see many queries from single IPs. Therefore, I don't think that daemon rate limits or iptables limits will help me much.

1 Like

Has the article been approved yet?

Where is it? Can you provide a link so we can vote for it? I see nothing relevant in the Sandbox, and zero articles or comments in your profile.

No.

Scheduled for publication on 24 November 2024 at 11:15

I've sent you an invite so that you can post articles without pre-moderation.

2 Likes

Oh, thanks! ^^ What a small world.

Years ago it was possible for the admin of a server to request that it be manually added to an underserved zone, but it seems that fell out of favor. Would the pool admins be amenable to allowing admins of servers in Europe to temporarily have their servers put in the Russian zone to try to help stabilize things?

I think this is clear evidence, along with the sheer volume of requests, that ru.pool.ntp.org is under sustained DDoS and that is the issue that needs to be resolved.

The fact that you're seeing so many port unreachables strongly suggests to me that the DDoS is happening via source IP spoofing from an ISP that doesn't implement BCP 38 info (note that's an HTTP-only link, no HTTPS available). You can also see the actual IETF document at IETF BCP 38 RFC.

Quite a few ISPs do ensure spoofed source IPs don't leave their eyeball (consumer) networks, but there are exceptions. It's possible the spoofing is coming entirely from one or a few machines with a fast connection, sending each packet with a different random spoofed IP. It's also possible the actual sources are compromised devices, but only some fraction of a botnet is going to be connected to spoofing-friendly ISPs.

With a spoofed source IP address, the IP receiving the NTP server's response didn't send the NTP request, so it doesn't have the request's claimed port open waiting for a response, and so it responds with port unreachable.

I am curious whether you're seeing only ICMP port unreachables, or both those and ICMP host unreachables. With spoofed source addresses, if they're generated randomly I'd expect some to not actually have a live machine at that address, so a host unreachable from a router in the target ISP (AS, autonomous system) would make sense. They could be careful to pick from a list of known-connected IPs, but I'm guessing they wouldn't bother.

The bad news is that tracking down the actual source of spoofed-source traffic is difficult and sometimes practically impossible. You need the cooperation of each ISP along the path back from a target IP to the source, with each one needing to look for huge flows of NTP mode 3 requests and figure out where they enter its network, to point back to the next AS/ISP on that path.

Correct.

You can disable the processing overhead of maintaining the MRU list in ntpd by adding disable monitor to ntp.conf and ensuring none of the restrict lines have limited. I don't think kod alone will enable MRU list maintenance, but then kod without limited in a restrict entry does nothing, and will produce a warning in the log to that effect.
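A minimal ntp.conf fragment reflecting that advice might look like this (illustrative only; the restrict flags shown are a common baseline, not a recommendation for every setup):

```
# Skip MRU list maintenance entirely.
disable monitor

# Note: no "limited" and no "kod" on the restrict line.
restrict default nomodify notrap nopeer noquery
restrict 127.0.0.1
```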

The default maximum memory for the MRU list is 1 MB. Using the authentication-required ntpq command monstats you can see MRU list stats. On an ntpd with no mru configuration in ntp.conf on x64, I get:

C:\Users\daveh>nq -c "monstats"

enabled:              0x3
addresses:            19
peak addresses:       20
maximum addresses:    11915
reclaim above count:  600
reclaim older than:   64
kilobytes:            2
maximum kilobytes:    1024

C:\Users\daveh>

So you can see each entry on x64 consumes 1 MB / 11915, or about 88 bytes. A gigabyte of memory would allow up to 12.2 million different addresses; however, it would require tuning the "reclaim above count" and "reclaim older than" limits, which cause ntpd to not grow the number of entries if the total count is more than 600 or the oldest is more than 64 seconds. That's done with mru options in ntp.conf, maxdepth and maxage respectively.
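Raising those limits might look like this in ntp.conf (the values here are illustrative examples, not recommendations):

```
# Keep up to 200,000 entries, allow recycling only of entries older
# than an hour, and raise the memory cap to 32 MB (maxmem is in KB).
mru maxdepth 200000 maxage 3600 maxmem 32768
```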

The MRU list is maintained as a doubly-linked list indexed by an IP address hash table to minimize the per-packet work. This means the work is localized to the two hash table entries (lists) for the outgoing and incoming IP address, plus the back and forward list pointers of the entry being recycled to move it to the most-recent position in the MRU list. I've successfully configured it to keep at least 200,000 entries without noticeable impact to the processing speed on a system handling 1-2 Mbps of NTP traffic. It will be a bit slower, triggering more CPU cache misses manipulating various pages of the 17.6 MB of memory a 200,000-entry MRU list occupies.
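The structure described (a hash table indexing a doubly-linked recency list, with O(1) expected work per packet and recycling of the oldest entry once the list is full) can be sketched in Python. This is an illustrative model, not ntpd's actual implementation; all names here are made up:

```python
class MRUList:
    """Toy model of an MRU list: dict for O(1) lookup by address,
    doubly-linked list ordered most- to least-recently seen."""

    class Node:
        __slots__ = ("addr", "count", "prev", "next")
        def __init__(self, addr):
            self.addr, self.count = addr, 0
            self.prev = self.next = None

    def __init__(self, maxdepth=600):
        self.maxdepth = maxdepth
        self.index = {}      # addr -> Node (the hash table)
        self.head = None     # most recently seen
        self.tail = None     # least recently seen

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev

    def _push_front(self, node):
        node.prev, node.next = None, self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def seen(self, addr):
        """Record a packet from addr; O(1) expected per packet."""
        node = self.index.get(addr)
        if node is not None:
            self._unlink(node)              # move to most-recent slot
        elif len(self.index) >= self.maxdepth:
            node = self.tail                # recycle the oldest entry
            self._unlink(node)
            del self.index[node.addr]
            node.addr, node.count = addr, 0
            self.index[addr] = node
        else:
            node = MRUList.Node(addr)
            self.index[addr] = node
        node.count += 1
        self._push_front(node)
        return node
```

Because recycling reuses the tail node rather than allocating, a full list does steady-state work touching only a handful of pointers per packet, which is the property the post is describing.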

Incidentally, the default mru maxdepth 600 of ntpd is a holdover from the long-ago-removed ntpdc command monlist, as ntpd's response could only send 600 responses in a blast of packets likely to make it through to a remote host without any being dropped. That monlist response functionality was the infamous ntpd traffic amplification that was widely exploited in 2014/2015, before people either updated to a newer ntpd without that functionality, or configured an older ntpd to drop ntpq and ntpdc requests via restrict ... noquery. It's probably time to increase that default maxdepth to something closer to the number that fits in the default maximum memory of 1 MB, or at least a more generous number like 2000.

[1] I tried to figure out the link syntax to use here that would let me change the link text, no luck yet, my apologies. EDIT: Thanks to @n1zyy for pointing out the correct syntax to me in a PM. He also pointed out it's Markdown, but I knew that and had searched for how to link in Markdown. Maybe I got my () and [] confused, but it might also be that Discourse is picky about the order of the text vs. the link, which apparently isn't always the case in the ever-slippery Markdown universe.

1 Like

Following up on messages in another thread, really about the problems for Russian pool server operators: thanks to @timz and @kkursor for verifying Cloudflare's anycast servers are working from within Russia.
For reference see my post followed by two responses.

The upshot is that while the flood of abusive queries to *.ru.pool.ntp.org is causing pain for most pool server operators in Russia, it's only degrading service, leaving the zone's usefulness essentially entirely reliant on Cloudflare. For those relying on that zone to maintain their clocks, it appears Cloudflare's infrastructure can handle the flood one way or another. They may have tracked it back to a particular AS they peer with and filtered NTP queries from that AS, or they may have some peer-facing firewalling that's dropping the abusive traffic before it hits their NTP servers. Given that providing a DDoS-proof web CDN is one of their core businesses, I'm sure they have all sorts of expertise and tools at their disposal to manage the problem.

Operators of pool servers may want to switch to monitoring-only mode as long as this mostly-futile attack continues. Or they may want to reach out to their ISPs to explain the situation and ask for their help back-tracing the flood to its sources.

1 Like

ā€¦ or it could be just a bug.

As mentioned earlier, kkursor posted about this on Habr and something interesting showed up in the comments.

With the help of some machine translation:

"On the night of October 24, the number of UDP broadcasts on all nodes at once increased sharply. That is, this is not just one node. […] A lot of sessions on port UDP 123. I took a specific subscriber and found out that a Yandex station is requesting NTP servers 6 times every 5 seconds. […] P.S. I checked it on my home 'Alice'. Exactly every 5 seconds, 4 NTP requests."

A followup response:

"The number of Yandex stations sold is 8 million by 2023 and +3.3 million in 2024 = 11.3 million.

Let's assume that the phenomenon is widespread, and each one makes 4 NTP requests every 5 seconds.

This is 720 (3600 / 5) bursts per hour, or (11,300,000 * 4 * 720) = 32.544 billion requests per hour, or 9,040,000 requests to NTP servers per second."
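The quoted back-of-the-envelope arithmetic checks out; as a quick sanity check:

```python
devices = 11_300_000          # estimated Yandex stations sold (8M + 3.3M)
reqs_per_burst = 4            # NTP requests every 5 seconds
bursts_per_hour = 3600 // 5   # = 720

per_hour = devices * reqs_per_burst * bursts_per_hour
per_second = per_hour / 3600

print(per_hour)    # 32544000000, i.e. ~32.5 billion requests/hour
print(per_second)  # 9040000.0 requests/second
```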

I would suggest investigating if those Yandex stations are to blame. You may need to contact the abuse address of some friendly ISP to troubleshoot this further, possibly with some tcpdumps of the offending traffic.

3 Likes

At a glance, this didnā€™t look like a DDoS to me, because:

  • Heavy traffic load appears only when the monitoring system includes a server in the pool.
  • Traffic drops to negligible values once the pool no longer includes the server.

In my understanding, this looks like legitimate clients making their first-time requests. If it were a DRDoS, the traffic would remain indefinitely once the "attackers" became aware of the server's existence.

However, inspecting the MRU list gave me some thoughts:

$ ntpq -c 'mrulist sort=-count'
lstint avgint rstr r m v  count rport remote address
==============================================================================
     0      0  3d0 L 3 3  82834   437 171.22.215.174 (RLINE1 = AS35608)
     0      0  bd0 K 3 4  43905 46178 80.76.106.190 (dynip6-190.tdsplus.ru)
     1      0  3d0 L 3 4  41390 39532 80.76.96.53 (TDS+ = AS51547)
     3      0  3d0 L 3 4  40596 52981 80.76.96.43 (TDS+ = AS51547)
     0      0  3d0 L 3 3  38341   294 45.141.93.253 (RLINE1 = AS35608)
     1      0  3d0 L 3 4  30959 40020 80.76.110.197 (dynip10-197.tdsplus.ru)
     3      0  3d0 L 3 4  21990 22305 80.76.96.37 (etra-plus.ru)
     1      0  3d0 L 3 4  21932 50388 80.76.96.35 (dkkonversiya.ru)
     2      0  3d0 L 3 4  17815 42734 80.76.110.195 (dynip10-195.tdsplus.ru)
     2      0  3d0 L 3 4  17723 33731 80.76.96.33 (TDS+ = AS51547)
     5      0  3d0 L 3 4  17013 50980 80.76.96.39 (TDS+ = AS51547)
     1      0  3d0 L 3 3  15484 19523 171.22.213.22 (RLINE1 = AS35608)

First of all, the most frequent addresses are from a small bunch of domestic ISPs. This fact alone does not indicate anything, as many users in Russia are behind NAT and thus share the same IP addresses. However, the ISPs appearing here are, AFAIK, nowhere near popular enough to generate such an amount of traffic, while none of the really popular ISPs showed up in the logs. This makes me think that some ISPs may be the target of an attack, or may host IoT devices that have gone out of control, etc.
Second, many requests "from" those clients have strange source port numbers: neither 123 nor 32768-65535, and sometimes even below 1024. I decided to block such requests on the firewall to decrease the probability of reflection attacks on third-party infrastructure (or at least to halve their intensity if the "source" port is chosen randomly by a spoofer). If those are legitimate legacy systems using ports starting from 1024, then I think it is acceptable "collateral damage" in the current desperate circumstances.
PS. Well, after inspecting the firewall logs during "peacetime", I reconsidered and enabled ports 1024-32767 as well.
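A hypothetical iptables version of that source-port filter, reflecting the final state after the PS (only source ports below 1024, other than 123 itself, remain blocked):

```shell
# Drop NTP queries whose UDP source port is below 1024, except 123.
# Ports 1024 and above were re-allowed after inspecting the logs.
iptables -A INPUT -p udp --dport 123 --sport 0:122    -j DROP
iptables -A INPUT -p udp --dport 123 --sport 124:1023 -j DROP
```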

JFYI. For me, the bottleneck is not the ntpd server itself (although its Atom D2500 is nearly fully loaded when incoming traffic reaches 20 to 50 Mbit/s), but the pfSense router based on a Celeron G3900 and an Intel NIC, which seems to generate a lot of interrupts, so that a single core is almost entirely consumed handling them.

One big Russian hosting provider read the Habr article, contacted me to offer assistance, and is providing 30 free VPSes to serve the RU zone.
Maybe the zone will be resurrected soon.

5 Likes

This would be consistent with the attack targeting a hostname in *.ru.pool.ntp.org rather than IP addresses.

It's not unusual for clients to use any UDP source port. Typically Linux systems would query from 123 or a port above 1024, but any source port is possible.

As far as the flood coming from less-popular ISPs, those might be ISPs which don't protect against their customers spoofing others' IP addresses. The unusually high level of ICMP unreachables suggests forged source IP addresses.

1 Like

It is possible to set netspeed lower than 512k. Look at the GET requests that the 'Manage servers' page sends.
The linked page (pool.ntp.org: the internet cluster of ntp servers) will set 1 kbps. I set 1 kbps and it is easy to handle.

upd: set 30 kbps, it gets higher: 50% CPU, about 70k ppm, ~70 Mbit/s. Set 15 kbps, ~45 Mbit/s, comfortable load.

Maybe due to the "huge" number of new servers in the RU zone you will appear in the DNS rotation less often, and as a result you will get less traffic.

But funny that you can set the speed via GET :smiley:

And even clients using source UDP/123 will be translated to some other port number if behind a NAT gateway anyway.

In my experience, I suspect that the pool serves as a way to mine server addresses. My test involved adding a server to the pool and observing waves of pokes at several popular ports, such as SSH, SMB, Telnet, etc., besides NTP. If my anecdotal experience applies, the Russian pool would be an obvious resource for mining addresses of servers in Russia.

All of my public IPs receive this 24/7 whether they are NTP servers or not, and there are paid-for services like Shodan which scan the whole Internet to compile a database of open ports.

Are you sure that you see an increase in this immediately after you add IPs to the NTP pool?

The idea that people are doing DNS queries to gather lists of NTP servers (in a given region?) and then subjecting them to further scanning seems strange to me, when one can, for example, just download a list of all IP addresses allocated to entities in RU and scan those (or pay someone who has already scanned them).

Yes, I am.

Why not? I would, if I were thusly inclined. It's very low-hanging fruit, especially for script kiddies.