IPv4 -v- IPv6 monitoring

You can try messaging @Ask directly, he would know the IPs.

I sent you a message with the Amsterdam IPs.

The Amsterdam monitor used to do more samples (3) than the Newark one (1), but they both do 3 samples now so I am surprised the rate limiting works out differently.

In any case 3 (or even 10 …?) NTP queries should be expected and supported, I believe. Think people behind NAT, etc.

At the moment, I have only the IPv6 server running, so NAT is not an issue.

Also, do the monitor servers abide by this best practice in RFC4330 §10?

  1. A client MUST NOT under any conditions use a poll interval less
    than 15 seconds.

Why would they be getting a RATE KOD?

How much sense does KOD make, though? Only compliant clients would honor it, but compliant clients usually don’t violate the rate limit in the first place. Conversely, I suppose that only non-compliant clients would abuse the rate, and those probably just ignore KOD.

Should KOD be used in restriction rules or not?

Thank you.

I use it in my configurations. Yes, there are a small number of devices out there that don’t honor it, but it’s better to have it enabled for the majority. At the very least they are not receiving time when they get rate limited.

The discard setting gives control over NTP’s rate limiting: https://www.eecis.udel.edu/~mills/ntp/html/accopt.html

In reality an 8-second average interval is allowed by default, with a 2-second minimum between packets. The reason the minimum is that low is clients that use the burst option.
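As a rough illustration, here is a minimal ntp.conf sketch of that setup; the exempted address is a documentation placeholder rather than a real monitoring IP, and the exact discard units and defaults vary between ntpd versions (see the accopt.html page linked above):

  # answer over-limit clients with a RATE KoD instead of a normal reply
  restrict default limited kod nomodify notrap noquery
  # the thresholds can be tuned with "discard average ... minimum ..." if needed
  # exempt a monitoring station from rate limiting (placeholder address)
  restrict 2001:db8::123 nomodify notrap noquery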

Having spotted clients like this, I don’t think I can afford to keep the server running without the limited and kod restrictions:

remote address            port  local address                count m ver rstr avgint  lstint
============================================================================================
xxxx:xxxx:xxxx:xxxx::xxxx 34028 xxxx:xxxx:xxxx:xxxx::xxxx     1562 3 4    178      4       1
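For context, output in that column layout is what classic ntpd prints for its MRU (most-recently-used clients) list; a hedged guess at the commands that produce it:

  ntpdc -c monlist    # older ntpd; the column headers match the listing above
  ntpq -c mrulist     # rough equivalent on newer ntpd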

@ebahapo what was the time period for the 1562 requests?

Does adding limited / kod actually help or does it just make the server send the same number of responses, but now some of them rate limited?

The monitoring system is set up to send up to 4 or 5 queries 2 seconds apart. (Steve has spotted in the tcpdump diagnostics he’s doing that it sometimes seems to double up the queries; I haven’t had time to debug and fix this, though.)

Sorry, may be off topic here.

Isn’t that 3 queries per server?

Is it doubling up the number of queries or doubling up the delay between the queries? I ask because, reading the monitoring code, if one query times out (after 2 seconds of elapsed time) the calling procedure still sleeps for an additional 2 seconds before the next query to the same server.

I just grabbed that information when I posted. Here’s another snapshot as of now:

remote address            port  local address                count m ver rstr avgint  lstint
============================================================================================
xxxx:xxxx:xxxx:xxxx::xxxx 51252 xxxx:xxxx:xxxx:xxxx::xxxx      909 3 4    158      6       1
xxxx:xxxx:xxxx:xxxx::xxxx 35674 xxxx:xxxx:xxxx:xxxx::xxxx      733 3 4    158      6       3

These are rogue clients. Obviously, they ignore the KODs and just hammer the system for over an hour. I suspect that they are malicious bots or drones. I should rate limit them in the firewall, though, as ntpd assumes compliant clients and cannot properly deal with them.

However, since I added rate exceptions for the IP addresses of the monitoring servers, they have not been sent KODs anymore.

Right, not all clients will respect a KOD; in fact, when poorly coded clients don’t receive a proper time packet in reply, they will simply start querying more! Thankfully NTP packets are tiny, so it’s not really a burden, just an annoyance at someone else’s ignorance.

However, there are a few legitimate reasons some IPs might end up querying more than expected; one that comes to mind is that the address could be a proxy. In fact, I had issues with some IPs where, after some communication back and forth, we found out they were being used by Tesla. Not only were they proxying through a small subnet, but their client had a bug causing it to query at a greater rate than it should have. But then I’ve also had some clients doing sub-second querying that came from AWS… which I eventually blocked.

Another factor is whether you are using NTPD or Chrony. With NTPD, even when a client exceeds the rate limit, there is still a percentage of packets that NTPD will answer correctly instead of with a KOD. I don’t know how Chrony behaves.

I use ‘hashlimit’ with iptables in order to rate-limit by source IP. Be aware that, depending on how many QPS you get, you might have to bump up conntrack_max in sysctl.conf and hashsize in modprobe.conf… I have my hashlimit set to a burst of 8 and an average of 4/min with a 2/min expire (that’s all that is really necessary). I find most clients are either very well behaved or wildly abusive… There’s no in-between… lol. About 1/6th of the traffic gets dropped with the above settings.
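A hedged sketch of what such rules might look like; the exact flags and the interpretation of the expiry as roughly 2 minutes are my assumptions, not a copy of the poster’s rule set:

  # allow ~4 NTP requests per minute per source IP, with a burst of 8; idle
  # entries fall out of the hash table after ~2 minutes (expire is in ms)
  iptables -A INPUT -p udp --dport 123 -m hashlimit \
      --hashlimit-name ntp --hashlimit-mode srcip \
      --hashlimit-upto 4/minute --hashlimit-burst 8 \
      --hashlimit-htable-expire 120000 -j ACCEPT
  iptables -A INPUT -p udp --dport 123 -j DROP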

Chrony has the same behavior. Even at the maximum rate-limit setting, 1 in every 16 NTP packets is still responded to. The manual defends this behavior as a way to avoid completely cutting off a DDoS-ed address:

  This is necessary to prevent an attacker who is sending requests with a spoofed source address from completely blocking responses to that address.
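For chrony, a minimal chrony.conf sketch of the corresponding setting (values are illustrative, not a recommendation):

  # interval is a power of two in seconds (3 = 8 s per source); leak 4 means
  # roughly 1 in 2^4 = 16 over-limit requests still gets a normal response,
  # matching the behavior described above
  ratelimit interval 3 burst 8 leak 4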

The patterns are interesting. Polls for some servers come in groups of 3 separated by 2 seconds (if response arrives) or 5 seconds (no response).

The other major pattern has polls in groups of 3 alternating with polls in groups of 5. [This is a simplification.] About 10% of the hosts have this second pattern. I suggested that a second monitor of older vintage might inadvertently be running.

Hi all,

Experiencing the same issues here. IPv6 is working perfectly fine; IPv4, however, is extremely inconsistent and has been for some time.

Would it be possible to get some further assistance for troubleshooting?

Thanks

Hi and welcome! My suggestions are:

  • check that your server is working as you expect and that it can be accessed from the internet (Google a “check my ntp server” site)
  • run "mtr --udp --port 123 " to the monitor and if there’s something obvious report it to your ISP to sort out. The monitor IPs are here: https://dev.ntppool.org/monitoring/network-debugging/
  • if your score is >10 ignore it :slight_smile:
  • if it’s intermittent and it’s bugging you, I would work out when the checks from the monitor are due (you can watch them come in with the appropriate tcpdump recipe; see the example commands after this list), then fire up mtr around the same time and see whether the monitor packets arrive, get an answer, or there’s an obvious drop along the route at that time.
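For reference, a hedged sketch of the commands meant above; the interface name and <monitor IP> are placeholders to fill in from the page linked in the second bullet:

  tcpdump -ni eth0 udp port 123        # watch the monitor's queries arrive (eth0 is a placeholder interface)
  mtr --udp --port 123 <monitor IP>    # probe the path toward the monitor with NTP-like UDP packets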

I’ve worked on a number of NTP loss problems. Tools like mtr and traceroute can sometimes identify where packets are being dropped. Tracing the NTP request or response may be needed, depending on which is lost. For NTP request losses those tools should be run from the NTP client. For NTP response losses those tools should be run from the NTP server.

I typically run traceroute towards two different target UDP ports, e.g., 123 and 53. Comparing the two may show evidence of blockage / rate limits.
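A sketch of that comparison, assuming the Linux traceroute and a placeholder target address:

  # if the port-123 trace stalls where the port-53 trace does not, some hop is
  # likely filtering or rate limiting NTP specifically
  traceroute -U -p 123 <target IP>
  traceroute -U -p 53  <target IP>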

The bigger question is “what next?” I’ve seen NTP filtering problems with CenturyLink, Telia, Zayo and others. They don’t respond to my emails. The people doing the deliberate NTP filtering (rate limiting) feel that it is a necessary DDoS mitigation. There seems to be no dialogue with the NTP community to lessen the time transfer impact.

Note: using the "--udp --port 123" options, mtr version 0.92 worked as expected. An older version, 0.85, did not send UDP port 123 probes. I don’t know when the behavior changed.

It’s not that hard to answer: there need to be more monitors.
Newark is a good monitor for some, but terribly bad for others like me.


Sounds very sensible. Maybe it’s time to update the automated email that goes out when the monitor can’t reach a server. The current one points the finger a bit at the server being the problem, rather than at either the link or the server, and it’s a bit light on what steps to take to diagnose and fix the problem.

Aye. We seem to know what a (the?) cause of the issue is, but we’re not having much joy fixing the underlying cause. Is there a trade body that represents ISPs that we could talk to, to get the message out across all ISPs? Rate-limiting port 123 seems to be the current default ISP position. Not sure what’s needed to change their minds.

I guess adding more monitors would work around the issue, as long as the end-user <-> pool NTP traffic stays local to the monitor and doesn’t cross any of the rate-limiting ISPs…

Thanks for the advice everyone.

As others have said, checking with mtr isn’t great since it’s a case of catching it at the right moment. Traceroutes appear to point to Zayo dropping connections too. Funnily enough, on the beta site both Amsterdam and L.A. report accurately, yet Newark is inconsistent. Any idea what command the monitor is running to perform the checks?


Hi. Not sure what you mean by “what command”? You can watch the packets come in with tcpdump…

Sorry, early-morning brain. I was just wondering if the server in Newark is running some NTP-specific command for the polling (how many checks to perform, how often, etc.).

Does anyone have any idea what the IP addresses are for the servers in L.A and Amsterdam in the beta pool?
