NTP server blacklisted?

As soon as my servers reach a score of 10 and become active members of the pool, they lose connectivity for extended durations. It looks like they got blacklisted (by Atlas or some other global sensor).

Is there a way to check the status of my servers in the various global DoS protection databases, and if so, how?

My servers are running ntpd; here is the relevant part of the rate limit for outgoing packets:

restrict default nomodify notrap nopeer noquery limited
discard monitor 100

Looking at the domain of your servers, I think it’s possible that the “DoS countermeasures” are somewhere within your own organization. I’d start by asking your friendly network admins whether they have an idea of what’s going on.

@avij thanks, I’ll have to check that.
The servers do not lose all the traffic (at least inbound). Even while out of the pool, some residual traffic still reaches them:
[MRTG traffic graphs]

Maybe this helps…

bas@workstation:~$ ping 156.106.214.52
PING 156.106.214.52 (156.106.214.52) 56(84) bytes of data.
^C
--- 156.106.214.52 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8171ms

bas@workstation:~$ ping 156.106.214.48
PING 156.106.214.48 (156.106.214.48) 56(84) bytes of data.
^C
--- 156.106.214.48 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8190ms

Some firewall dude at your site doesn’t want anything to happen and blocks even simple pings.

That leads me to believe they use something that counts requests and closes the inbound side after x requests, and even counts pings.

It looks to me like a little over 350 requests/sec triggers their firewall to cut off 50% of the requests.
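For illustration, a cut-off like that could be produced by a firewall rule of roughly this shape. This is a hypothetical nftables sketch, not the actual config at the affected site; the 350/second threshold is only the estimate from above:

```
# Hypothetical sketch: drop inbound NTP queries once they exceed
# ~350 packets/second. Excess packets are silently discarded, which
# would look exactly like "50% of requests cut off" from outside.
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    udp dport 123 limit rate over 350/second counter drop
  }
}
```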

However, I also see this happening on my servers when they run as VPSes and IPv4 is heavily used; at the same time, IPv6 hums along like nothing is wrong.

IPv4 seems to have more problems than IPv6 when traffic goes up. Routers not fast enough?

Our security team’s policy is to deny all ICMP, so it is not surprising that ping does not work. We have enough routing capacity.

One can see that the offset is around 16 ms from San Jose. When the packets went through a different ISP (offset around 7 ms), we had a score near 20 continuously, with only a rare timed-out sample. My theory is that this specific new ISP in the middle is broadcasting an incorrect alert all over the Internet to other ISPs.

curl -s 'https://www.ntppool.org/scores/156.106.214.52/log?limit=2000&monitor=9' | cut -d, -f2,3,5 | grep 2022-05-18
"2022-05-18 23:52:42",0,-16.2
"2022-05-18 23:39:16",0,-11.8
"2022-05-18 23:26:18",0,-7.2
"2022-05-18 23:12:01",0,-2.3
"2022-05-18 22:57:03",0,2.9
"2022-05-18 22:43:59",0,8.3
"2022-05-18 22:31:24",0,14
"2022-05-18 22:18:30",0.007862721,20
"2022-05-18 22:04:39",0.007801146,20
"2022-05-18 21:50:41",0.006818914,20
"2022-05-18 21:37:42",0.004033139,20
"2022-05-18 21:24:37",0.008080867,20
"2022-05-18 21:11:31",0.006564996,20
"2022-05-18 20:57:04",0.006537837,20
"2022-05-18 20:42:14",0.007931453,20
"2022-05-18 20:28:50",0.008203841,20
"2022-05-18 20:16:00",0.006689921,20
"2022-05-18 20:02:03",0.004346407,20
"2022-05-18 19:48:41",0.004442483,20
"2022-05-18 19:34:56",0.005602809,20
"2022-05-18 19:22:05",0.008135739,20
"2022-05-18 19:08:53",0.006890307,20
"2022-05-18 18:54:46",0.006868908,20
"2022-05-18 18:41:30",0.008103705,19.9
"2022-05-18 18:28:02",0.008580185,19.9
"2022-05-18 18:13:16",0.005965046,19.9
"2022-05-18 17:59:54",0.004815361,19.9
"2022-05-18 17:46:23",0.009523904,19.9
"2022-05-18 17:32:06",0.008323988,19.9
"2022-05-18 17:19:07",0.007010948,19.9
"2022-05-18 17:04:58",0.006920376,19.9
"2022-05-18 16:50:50",0.00674642,19.9
"2022-05-18 16:36:29",0.006680381,19.9
"2022-05-18 16:22:19",0.007977072,19.9
"2022-05-18 16:08:13",0.007887176,19.9
"2022-05-18 15:54:26",0.006636645,19.9
"2022-05-18 15:40:28",0.003784333,19.9
"2022-05-18 15:26:15",0.006056124,19.9
"2022-05-18 15:12:17",0.003845293,19.9
"2022-05-18 14:58:18",0.006783818,19.9
"2022-05-18 14:44:25",0.00616746,19.9
"2022-05-18 14:30:33",0.003886172,19.9
"2022-05-18 14:15:40",0.007746599,19.9
"2022-05-18 14:02:00",0.005383428,19.9
"2022-05-18 13:48:17",0.005471019,19.8
"2022-05-18 13:34:35",0.005588513,19.8
"2022-05-18 13:20:06",0.006636532,19.8
"2022-05-18 13:05:12",0.006676442,19.8
"2022-05-18 12:51:07",0.007606232,19.8
"2022-05-18 12:36:40",0.006729631,19.8
"2022-05-18 12:22:48",0.00790966,19.8
"2022-05-18 12:08:56",0.007027382,19.8
"2022-05-18 11:54:29",0.007963715,19.8
"2022-05-18 11:40:50",0.006798359,19.8
"2022-05-18 11:27:43",0.007960031,19.7
"2022-05-18 11:14:43",0.006983988,19.7
"2022-05-18 10:59:51",0.006796026,19.7
"2022-05-18 10:46:19",0.008099731,19.7
"2022-05-18 10:32:42",0.004228404,19.7
"2022-05-18 10:18:57",0.006560396,19.7
"2022-05-18 10:04:50",0.008083211,19.7
"2022-05-18 09:50:32",0.008049165,19.6
"2022-05-18 09:36:41",0.00654796,19.6
"2022-05-18 09:23:45",0.004178042,19.6
"2022-05-18 09:09:54",0.006385475,19.6
"2022-05-18 08:56:55",0.006586986,19.6
"2022-05-18 08:43:14",0.006658423,19.5
"2022-05-18 08:30:55",0.006677522,19.5
"2022-05-18 08:18:15",0.004403919,19.5
"2022-05-18 08:05:23",0.006847005,19.5
"2022-05-18 07:51:46",0.008074777,19.4
"2022-05-18 07:37:13",0.007145587,19.4
"2022-05-18 07:23:52",0.006761529,19.4
"2022-05-18 07:11:13",0.007025819,19.3
"2022-05-18 06:57:50",0.0078394,19.3
"2022-05-18 06:43:16",0.007741587,19.3
"2022-05-18 06:28:38",0.006906522,19.2
"2022-05-18 06:15:00",0.007818337,19.2
"2022-05-18 06:01:26",0.007641627,19.1
"2022-05-18 05:47:34",0.004002031,19.1
"2022-05-18 05:33:10",0.006717447,19
"2022-05-18 05:19:38",0.006828889,19
"2022-05-18 05:05:11",0.007565161,18.9
"2022-05-18 04:51:21",0.006531512,18.9
"2022-05-18 04:37:31",0.007939724,18.8
"2022-05-18 04:22:08",0.008008638,18.8
"2022-05-18 04:09:06",0.007036715,18.7
"2022-05-18 03:55:34",0.008039401,18.6
"2022-05-18 03:41:50",0.007914564,18.5
"2022-05-18 03:28:03",0.008316757,18.5
"2022-05-18 03:14:04",0.008310484,18.4
"2022-05-18 03:01:09",0.006660226,18.3
"2022-05-18 02:48:26",0.008134513,18.2
"2022-05-18 02:35:01",0.004574117,18.1
"2022-05-18 02:21:26",0.00700869,18
"2022-05-18 02:07:52",0.00725353,17.9
"2022-05-18 01:54:50",0.008562962,17.8
"2022-05-18 01:42:04",0.00698099,17.7
"2022-05-18 01:28:47",0.007323966,17.6
"2022-05-18 01:14:55",0.005939204,17.4
"2022-05-18 01:00:34",0.007183402,17.3
"2022-05-18 00:47:41",0.007142566,17.2
"2022-05-18 00:34:42",0.007027033,17
"2022-05-18 00:21:04",0.007294048,16.9
"2022-05-18 00:07:51",0.007652317,16.7

Maybe they drop NTP UDP in favour of more important packets like VoIP?

I have seen this behavior on my IPv4 ports too, but hardly ever on IPv6 ports.

It’s typical for IPv4 to show a lot of drops while IPv6 acts like nothing is wrong.

I do not know why this happens but it does.

I have ordered an Intel hardware-timestamping NIC to see if it makes any difference, as I notice a lot of jitter on my IPv4 while, oddly, others have far fewer problems.

I suspect HW timestamping can make a difference, as the NIC itself stamps the time into the packet rather than the kernel or something else, so the receiver knows exactly when it left the system.
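Whether a NIC and its driver actually support hardware timestamping can be checked before buying new hardware. A sketch using standard ethtool; the interface name `eth0` is a placeholder:

```shell
# Show the time-stamping capabilities the driver reports.
# Look for "hardware-transmit" and "hardware-receive" in the
# "Capabilities" section; software-only NICs list only "software-*".
ethtool -T eth0
```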

On my VPS servers I also see the same behaviour from time to time, and they don’t offer HW timestamping either.

In my opinion (I could be 100% wrong) it may have to do with HW timestamping and packets being in transit too long and thus dropped for whatever reason. I do not know the details of how SW timestamping works, but I suspect it can be a factor.

Intel has cheap two-port cards for about 35 euros, so it’s worth a shot to test.

I ordered me this one: https://www.amazon.de/gp/product/B09D3JL14S/ref=ppx_yo_dt_b_asin_title_o00_s00

Bas.

Denying all ICMP is very dumb. You can block unknown ICMP types/codes, block oddly sized ICMP, rate-limit ICMP (in pps) per type and/or overall, or block just type 8 (echo request) from the WAN. At the very least, blocking all ICMP causes PMTU blackholes.

Anyway, it seems those failures are triggered by your security system. Have you tried setting up some outside monitoring for your NTP server? The “mtr” tool in UDP traceroute mode (port 33434) may also help you determine where the problem is located.
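For example, using standard mtr flags against the server address posted earlier in the thread:

```shell
# UDP traceroute towards the NTP server: -u selects UDP probes,
# -P sets the destination port (33434 is the traceroute default),
# -c 10 --report sends ten cycles and prints a per-hop loss summary.
mtr -u -P 33434 -c 10 --report 156.106.214.52

# Probing the NTP port itself can show at which hop port-123
# traffic specifically starts being dropped.
mtr -u -P 123 -c 10 --report 156.106.214.52
```

Comparing the two reports hop by hop would show whether the loss is port-123-specific (a filter) or affects all UDP equally (congestion).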


I think this is an issue only for IPv6; IPv4 has no problem with it.

There are multiple monitoring points on Site24x7, and many of them regularly report the problem when blacklisting occurs.

Even the neighboring ISP in the same country is blacklisting the connections, but not all ISPs are doing it. About one-third to one-half of the traffic is dropped that way; you can see it on the MRTG graph posted earlier. The residual clients (mostly NTP servers with a configured server statement, running without reboot) generate the remaining traffic. The QPS drops from ~300 to ~200 when blacklisting starts, but as soon as the blacklisting is removed, the QPS jumps back to the earlier ~300 immediately.

You are not blacklisted anywhere as far as I can see:

In my opinion your own system/firewall causes it, as the beta system shows the same problems.

https://web.beta.grundclock.com/scores/156.106.214.52

I doubt your ISP has just one peer or would block some NTP traffic.

I would like to let you know that Palo Alto pushed a signature update that fixed an issue where NTP traffic was falsely classified as BitTorrent. Since the signature update, the servers have been at score 20.
