What is the cause of the demand spikes?

Not an issue, but I’m just curious…

I am new to hosting a couple of servers for the pool, and on both of them I see some quite large spikes in requests every now and then, with no obvious pattern. What causes them?

I wonder whether PAT gateways with a lot of users sharing one IP address could be part of the explanation, or very busy nameservers handing out my address to many clients at once? On one of the servers, the NTP server sits behind a NAT gateway, and when I look at the number of unique IP addresses in that gateway’s connection tracker (the orange plot), it doesn’t rise with the peak in requests, which makes me think that “big nameservers” can’t be the cause.

Another possibility could be users making multiple timesync requests over a short time, but why? Peaks are usually 3-5 times normal load levels.

Not an issue, just curious… Thanks - Olly.

I think it’s CGNAT, especially since you already observed that the number of unique IP addresses doesn’t rise with the spike.

Can you see or monitor if it’s IPv4 or IPv6 sources? If it’s CGNAT it will be IPv4.
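If you can export a list of source addresses from your logs or connection tracker, a rough IPv4/IPv6 split is easy to get. This is only a sketch: the `sample` list is made up, and `count_by_family` is a hypothetical helper name, not part of any NTP tool.

```python
import ipaddress
from collections import Counter

def count_by_family(addresses):
    """Count source addresses by IP version.

    IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) are counted as IPv4,
    since that is how a dual-stack socket may report IPv4 clients.
    """
    counts = Counter()
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if addr.version == 6 and addr.ipv4_mapped is not None:
            counts["IPv4"] += 1
        else:
            counts[f"IPv{addr.version}"] += 1
    return counts

# Made-up sample using documentation address ranges.
sample = ["192.0.2.10", "2001:db8::1", "::ffff:198.51.100.7", "192.0.2.10"]
family_counts = count_by_family(sample)
print(family_counts)
```

Comparing this split inside a spike with the split during normal load would show whether the extra traffic is IPv4 only.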

My graphs look similar:

[Graph: chrony_serverstats-day]

And most of it is IPv4. But that’s a different topic :zipper_mouth_face:

[Graph: nft_counters_packets-day]

Yeah, it may well be part of it. I have a v4 only and a dual stack server. I see the spikes on both, but TBH I don’t see much v6 traffic on the dual stack server (at least compared to v4).

I have some experience with CGNAT (I used to work for ISPs), and the examples I saw had a very small number of users behind each address, 8 or so, so that fixed port ranges could be used for each customer to assist with lawful intercept etc. Cellular providers do NAT at much higher scale, and of course some of it might be v6-only carriers doing MAP-T or similar translation, but I don’t think there are many of those, not in Europe at least (where my servers are).

Perhaps I will try to do some captures and analysis, but the spikes are hard to predict so it will be a lot of packets I need to capture :slight_smile:

I suppose I could wind down the bandwidth setting first…

Why should CGNAT cause a spike? Sure, there are many people behind one IP address, but each of those will still be requesting time at random intervals.

Well, that depends on the type of device. In the post I linked, @marco.davids mentions IoT devices syncing at the top of each hour. It all pretty much depends on how well NTP is implemented. There’s a known case of a large Dutch provider that configured hundreds of devices to use the NTP pool, but had them send requests to port 37 (time) instead of 123 (ntp). Fortunately, that was fixed, although you can still see leftovers of that cock-up in my second graph above.

I’d say the same holds for all kinds of network-enabled devices that need time, but whose developers know or care little about proper NTP implementation. Heck, I know of an internet radio device that, though it claims to use NTP, is simply not able to observe daylight saving time. It still needs to be adjusted manually twice a year… :roll_eyes:
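To illustrate why the polling schedule matters more than the number of devices, here is a toy simulation. All numbers are invented for illustration (3600 devices, one request per hour, 2 seconds of clock skew); it is not a model of any real client.

```python
import random
from collections import Counter

def peak_per_second(offsets):
    """Return the largest number of requests landing in any one-second bucket."""
    return max(Counter(int(t) for t in offsets).values())

rng = random.Random(1)  # fixed seed so the run is repeatable
clients = 3600          # hypothetical number of devices behind one address
interval = 3600.0       # each device polls once per hour

# Well-behaved: each device has its own random phase within the hour.
spread = [rng.uniform(0, interval) for _ in range(clients)]

# Badly behaved: every device fires at the top of the hour,
# spread only by about 2 seconds of clock skew.
aligned = [rng.uniform(0, 2.0) for _ in range(clients)]

print("peak req/s, random phase:", peak_per_second(spread))
print("peak req/s, top of hour: ", peak_per_second(aligned))
```

Both groups send the same average of one request per second over the hour, but the aligned group delivers nearly all of it within a couple of seconds. Behind a CGNAT address, that shows up as a spike from a single source IP.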

@decibel - Yeah, I think it might be part of the explanation, but it could be just one of several reasons. It’s certainly possible to put thousands of devices behind a single address (enterprises often do, and use a single recursive DNS server for all of those devices too), so you’d certainly expect multiple requests per second, but sometimes the peaks are really big. I only sample the data every 30 seconds, but it seems the spikes only last around a minute, which ties in with the DNS TTL of 150s.

@pengiunpee - I’ve seen IoT devices request time every 60s consistently, so I’m sure a lot of NTP load is from devices that don’t play nicely…

This is the biggest spike I’ve seen in the last 30 days or so… It’s big! 5.5 Mbps of traffic is quite a lot of NTP! :slight_smile:
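For a rough sense of scale, that bitrate can be converted to packets per second. This is back-of-envelope only, and it assumes things the graph doesn't tell us: plain 48-byte NTP packets over IPv4, a counter that includes the 14-byte Ethernet header but not FCS or preamble, and that 5.5 Mbps is one direction only. If the figure is requests plus responses combined, halve the result.

```python
NTP_PAYLOAD = 48   # bytes, NTP packet without extension fields
UDP_HEADER = 8     # bytes
IPV4_HEADER = 20   # bytes, no IP options
ETHERNET = 14      # bytes, header only (assumption about what the counter sees)

def packets_per_second(bits_per_second, frame_bytes):
    """Convert a bitrate into a packet rate for fixed-size frames."""
    return bits_per_second / (frame_bytes * 8)

frame = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER + ETHERNET  # 90 bytes
pps = packets_per_second(5.5e6, frame)
print(round(pps))  # → 7639
```

So under those assumptions the peak is in the region of 7,600 packets per second.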

Yes, these spikes are surely the result of badly configured devices of some sort, and having many of them “hiding” behind CGNAT makes it very hard to identify the culprits.

There will be more tears before the guilty party/parties are found, I am sure.

systemd-timesyncd can create bursts of NTP requests.
FortiGate firewalls have been creating NTP bursts for several years.

Packet captures may help track down the sources.
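As a starting point, here is a small sketch for finding the busiest sources in a capture. It assumes the capture was printed as text with something like `tcpdump -tt -n udp dst port 123`, which gives one line per packet starting with a Unix timestamp. `top_talkers` is a hypothetical helper name and the sample lines are made up.

```python
import re
from collections import Counter

# Matches e.g. "1700000000.100000 IP 192.0.2.1.40000 > 198.51.100.1.123: ..."
# Group 1 is the whole-second timestamp, group 2 is "address.port" of the source.
LINE = re.compile(r"^(\d+)\.\d+ .*?\bIP6? (\S+) > ")

def top_talkers(lines, n=3):
    """Return the n source addresses with the most packets, as (address, count)."""
    per_source = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip anything that is not a packet line
        src = m.group(2).rsplit(".", 1)[0]  # drop the trailing ".port"
        per_source[src] += 1
    return per_source.most_common(n)

# Made-up sample using documentation address ranges.
sample = [
    "1700000000.100000 IP 192.0.2.1.40000 > 198.51.100.1.123: NTPv4, Client, length 48",
    "1700000000.200000 IP 192.0.2.1.40001 > 198.51.100.1.123: NTPv4, Client, length 48",
    "1700000000.300000 IP6 2001:db8::5.50000 > 2001:db8::1.123: NTPv4, Client, length 48",
    "1700000001.100000 IP 192.0.2.1.40002 > 198.51.100.1.123: NTPv4, Client, length 48",
]
print(top_talkers(sample))
```

A single address with many different source ports during a spike would be consistent with many devices behind one NAT, though it can't prove it.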
