IPv4 -v- IPv6 monitoring

I run a single NTP server (physical box, sole task, native IPv4 and native IPv6 connections through the same line), and especially over the last few days I’ve noticed that the IPv4 monitoring has been a bit crazy while the IPv6 monitoring has been rock solid. Given that there’s only one server, one physical connection, etc., is there something specifically different about the way the monitoring works between the two addressing schemes?

[Attached score graphs: ntp_ipv4, ntp_ipv6]


Hi, most likely a dodgy IPv4 router somewhere between the monitor and yourself. You could try an "mtr --udp --port 123 <monitor IP>" to the monitor, but you’d have to be looking at just the right moment to see the problem, given the issue looks intermittent. The monitor IPs are here: https://dev.ntppool.org/monitoring/network-debugging/
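For example, a single run against the Newark monitor address that also appears in the traceroutes below (substitute whichever monitor is failing for you):

mtr --udp --port 123 139.178.64.42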

Ah, as we were – that was the answer and it turns out to be ZAYO again… Ta!

[14:51] > traceroute --resolve-hostnames 139.178.64.42
traceroute to 139.178.64.42 (139.178.64.42), 64 hops max
1 84.45.170.209 (rtext) 1.223ms 1.074ms 1.056ms
2 78.33.253.11 (lns4.inx.dsl.enta.net) 26.039ms 26.701ms 26.197ms
3 78.33.253.1 (100.bundle-ether2.inx.dsl.enta.net) 26.790ms 26.743ms 27.170ms
4 188.39.127.242 (bundle-ether1.interxion3.core.enta.net) 26.978ms 26.512ms 26.666ms
5 188.39.127.102 (bundle-ether100.telehouse-east4.core.enta.net) 26.904ms 26.942ms 27.966ms
6 195.66.224.76 (ge-2-1-0.mpr1.lhr2.uk.above.net) 28.442ms 26.634ms 28.457ms
7 64.125.30.52 (ae11.mpr2.lhr2.uk.zip.zayo.com) 29.040ms 26.878ms 26.524ms
8 * * *
9 64.125.29.126 (ae5.cs3.lga5.us.eth.zayo.com) 93.323ms * *
10 64.125.29.221 (ae15.er1.lga5.us.zip.zayo.com) 93.012ms 92.891ms 92.476ms
11 64.125.54.26 (64.125.54.26) 138.747ms 94.022ms 94.696ms
12 198.16.6.237 (0.et-0-0-1.bsr2.ewr1.packet.net) 95.037ms 96.124ms 94.893ms
13 198.16.4.213 (0.ae2.dsr1.ewr1.packet.net) 114.624ms 109.102ms 110.425ms
14 147.75.98.105 (147.75.98.105) 107.499ms 113.299ms 186.843ms
15 139.178.64.42 (monewr1.ntppool.net) 96.013ms 95.163ms 95.076ms
[14:52] > traceroute --resolve-hostnames 139.178.64.42
traceroute to 139.178.64.42 (139.178.64.42), 64 hops max
1 84.45.170.209 (rtext) 1.130ms 1.073ms 1.081ms
2 78.33.253.11 (lns4.inx.dsl.enta.net) 26.164ms 26.288ms 25.706ms
3 78.33.253.1 (100.bundle-ether2.inx.dsl.enta.net) 27.231ms 27.473ms 26.727ms
4 188.39.127.242 (bundle-ether1.interxion3.core.enta.net) 27.624ms 27.458ms 26.631ms
5 188.39.127.102 (bundle-ether100.telehouse-east4.core.enta.net) 27.171ms 27.945ms 27.934ms
6 195.66.224.76 (ge-2-1-0.mpr1.lhr2.uk.above.net) 26.778ms 27.264ms 26.523ms
7 64.125.30.52 (ae11.mpr2.lhr2.uk.zip.zayo.com) 27.973ms 27.516ms 27.476ms
8 * * *
9 * * *
10 64.125.29.221 (ae15.er1.lga5.us.zip.zayo.com) 92.449ms 92.855ms 92.375ms
11 64.125.54.26 (64.125.54.26) 94.519ms 155.184ms 94.148ms
12 198.16.6.237 (0.et-0-0-1.bsr2.ewr1.packet.net) 94.701ms 94.995ms 95.942ms
13 198.16.4.213 (0.ae2.dsr1.ewr1.packet.net) 114.130ms 111.512ms 148.829ms
14 147.75.98.105 (147.75.98.105) 109.034ms 106.901ms 110.455ms
15 139.178.64.42 (monewr1.ntppool.net) 94.831ms 94.781ms 94.975ms

Will follow any further discussion on the earlier thread.
Alison

(mtr, btw, not showing anything special. As you say it’s a question of catching it…)

What are the IPv4 and IPv6 addresses of the monitoring stations in NL and US, please?

Thank you.

https://dev.ntppool.org/monitoring/network-debugging/

I don’t know the IPs for the beta monitors… Sorry.

I’d like to know those too, because they sometimes also error out with “RATE” as the reason in the CSV log. I assume that it’s because I have the “kod” restriction by default, so they may need a special rule without this restriction. At least the production monitors seem happy with the new rule specific to them.
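For illustration, a minimal ntp.conf sketch of that kind of setup, assuming ntpd; 192.0.2.10 is purely a placeholder standing in for a monitor address:

# default policy: rate-limit clients and answer over-limit ones with a KoD
restrict default kod limited nomodify nopeer noquery
restrict -6 default kod limited nomodify nopeer noquery
# hypothetical exception for a monitoring server: no "limited kod", so it is
# never rate limited
restrict 192.0.2.10 nomodify nopeer noquery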

Thank you.

You can try messaging @Ask directly, he would know the IPs.

I sent you a message with the Amsterdam IPs.

The Amsterdam monitor used to do more samples (3) than the Newark one (1), but they both do 3 samples now so I am surprised the rate limiting works out differently.

In any case 3 (or even 10 …?) NTP queries should be expected and supported, I believe. Think people behind NAT, etc.

At the moment, I have only the IPv6 server running, so NAT is not an issue.

Also, do the monitor servers abide by this best practice in RFC4330 §10?

  1. A client MUST NOT under any conditions use a poll interval less than 15 seconds.

Why would they be getting a RATE KOD?

However, how much sense does KOD make? Only canonical clients would honor it, but canonical clients usually don’t exceed the rate limit. Rather, I suppose that only non-canonical clients would abuse the rate limit, and they probably just ignore KOD.

Should or should not KOD be used in restriction rules?

Thank you.

I use it in my configurations. Yes, there is a small number of devices out there that don’t honor it, but it’s better to have it enabled for the majority. At the very least they are not receiving time when they get rate limited.

The discard setting gives control over NTP’s rate limiting: Access Control Commands and Options

In reality an 8 s average is allowed, with a 2 s minimum. The reason for this is clients that use the burst setting.
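Spelled out as a sketch in ntp.conf terms (discard average is in log2 seconds, so 3 means 2^3 = 8 s; minimum is in plain seconds), those defaults are:

# 8 s minimum average headway, 2 s minimum guard time between packets;
# only enforced against sources matched by a "limited" (and "kod") restriction
discard average 3 minimum 2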

Having spotted clients like this, I don’t think that I can afford to keep the server without the limited and kod restrictions:

remote address            port  local address                count m ver rstr avgint  lstint
============================================================================================
xxxx:xxxx:xxxx:xxxx::xxxx 34028 xxxx:xxxx:xxxx:xxxx::xxxx     1562 3 4    178      4       1
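For reference, a listing like the one above can be pulled from a running ntpd; on ntpd 4.2.8 or later something like the following should work (older versions exposed a similar table via ntpdc's monlist):

# dump the most-recently-used client list with per-client counts, restriction
# flags and average/last intervals
ntpq -c mrulist localhost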

@ebahapo what was the time period for the 1562 requests?

Does adding limited / kod actually help or does it just make the server send the same number of responses, but now some of them rate limited?

The monitoring system is set up to send up to 4 or 5 queries, 2 seconds apart. (Steve has spotted in the tcpdump diagnostics he’s doing that it sometimes seems to double up the queries; I haven’t had time to debug and fix this though.)

Sorry, may be off topic here.

Isn’t that 3 queries per server?

Is it doubling up the number of queries or doubling up the delay between the queries? Why am I asking? Reading the monitoring code, if one query times out (2 seconds elapsed) the calling procedure still sleeps for an additional 2 seconds before the next query to the same server.

I just grabbed that information when I posted. Here’s another snapshot as of now:

remote address            port  local address                count m ver rstr avgint  lstint
============================================================================================
xxxx:xxxx:xxxx:xxxx::xxxx 51252 xxxx:xxxx:xxxx:xxxx::xxxx      909 3 4    158      6       1
xxxx:xxxx:xxxx:xxxx::xxxx 35674 xxxx:xxxx:xxxx:xxxx::xxxx      733 3 4    158      6       3

These are rogue clients. Obviously they ignore the KODs and just bang on the system for over an hour. I suspect that they are malicious bots or drones. I should rate-limit them in the firewall though, as ntpd assumes compliant clients and cannot properly deal with them.

However, since I added rate-limit exceptions for the IP addresses of the monitoring servers, they have not been sent any more KODs.

Right, not all clients will respect a KOD, and in fact, when poorly coded clients don’t receive a proper time reply, they simply start querying more! Thankfully NTP packets are tiny, so it’s not really a burden, just an annoyance at someone else’s ignorance.

However, there are a few legitimate reasons some IPs might end up querying more than expected; one that comes to mind is a proxy. In fact, I had issues with some IPs that, after some back-and-forth communication, turned out to be used by Tesla. Not only were they proxying through a small subnet, but their client had a bug causing a higher query rate than intended. But then I’ve also had some clients doing sub-second querying from AWS… which I eventually blocked.

Another factor is whether you are using NTPD or Chrony. With NTPD, even when a client exceeds the rate limit, there is still a percentage of packets that NTPD answers normally instead of with a KOD. I don’t know how Chrony behaves.

I use ‘hashlimit’ with iptables in order to rate-limit by source IP. Be aware that, depending on how many QPS you get, you might have to bump up your conntrack_max in sysctl.conf and hashsize in modprobe.conf… I have my hashlimit set to a burst of 8 and an average of 4/min with a 2/min expire (that’s all that is really necessary). I find most clients are either very well behaved or wildly abusive… there’s no in-between… lol. About 1/6th of the traffic gets dropped with the above settings.
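A sketch of that kind of hashlimit rule; the exact numbers below are illustrative rather than the poster’s precise configuration:

# accept NTP queries from a source only while it stays under ~4 packets/minute
# with a burst allowance of 8; the per-source entry expires after 2 minutes
iptables -A INPUT -p udp --dport 123 \
    -m hashlimit --hashlimit-name ntp --hashlimit-mode srcip \
    --hashlimit-upto 4/minute --hashlimit-burst 8 \
    --hashlimit-htable-expire 120000 \
    -j ACCEPT
# everything over the limit is dropped
iptables -A INPUT -p udp --dport 123 -j DROP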

Chrony has the same behavior. At the maximum rate limit setting, still 1 in every 16 NTP packets is responded to. The manual defends this behavior as a way to prevent completely cutting off a DDOS-ed address.

This is necessary to prevent an attacker who is sending requests with a spoofed source address from completely blocking responses to that address.
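In chrony.conf terms, the directive being described looks roughly like this; leak 4 is the maximum setting and corresponds to still answering about 1 in 16 over-limit requests:

# 2^3 = 8 s average interval, burst of 8; leak 4 => roughly 1 in 2^4 over-limit
# requests still receive a normal response
ratelimit interval 3 burst 8 leak 4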

The patterns are interesting. Polls for some servers come in groups of 3 separated by 2 seconds (if a response arrives) or 5 seconds (if there is no response).

The other major pattern has polls in groups of 3 alternating with polls in groups of 5. [This is a simplification.] About 10% of the hosts have this second pattern. I suggested that a second monitor of older vintage might inadvertently be running.

Hi all,

Experiencing the same issues here. IPv6 is working perfectly fine; however, IPv4 is extremely inconsistent and has been for some time.

Would it be possible to get some further assistance for troubleshooting?

Thanks

Hi and welcome! My suggestions are:

  • check that your server is working as you expect and that it can be accessed from the internet (Google a “check my ntp server” site)
  • run "mtr --udp --port 123 " to the monitor and if there’s something obvious report it to your ISP to sort out. The monitor IPs are here: https://dev.ntppool.org/monitoring/network-debugging/
  • if your score is >10, ignore it :-)
  • if it’s intermittent and it’s bugging you, I would work out when the checks from the monitor are due (you can watch them come in with a tcpdump recipe like the sketch below), then fire up mtr around the same time and see whether the monitor packets arrive, get an answer, or hit an obvious drop along the route at that time.
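A sketch of that kind of recipe; the interface name is a placeholder, and the monitor address is the Newark one from the traceroutes earlier in the thread:

# watch incoming NTP queries to see when the monitor's checks arrive
tcpdump -ni eth0 udp dst port 123
# around the expected check time, trace toward the monitor and look for loss
# or latency jumps at a particular hop
mtr --udp --port 123 --report --report-cycles 60 139.178.64.42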