Some clients really can't behave

Hi all,

I have changed my servers to drop clients that do not behave.

I set two parameters in Chrony, and I see lots of abusive clients dropping like flies.

ratelimit interval 6 burst 6
clientloglimit 4194304
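
Roughly what those two lines do, as I read the chrony documentation (the interval value is a power-of-two exponent):

# Respond to a given address with at most ~1 packet per 2^6 = 64 seconds
# on average, allowing bursts of up to 6 packets before limiting kicks in.
ratelimit interval 6 burst 6
# Allow up to 4 MiB of memory for the per-client access log that the
# rate limiter needs in order to track addresses.
clientloglimit 4194304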

There is no reason why people should poll loads of times; some even poll ten times a second from the same address.
So I stopped allowing this, as my upload goes nuts even when a 512 kbit speed limit is set.

It looks like many routers do not run their own NTP server and do not advertise one on their own network.

A Fritzbox does this by default, for example, but it seems many others don’t, and they let their clients keep polling the pool for no reason other than being too lazy to proxy NTP and reduce the load.

Providers and router manufacturers should implement an NTP proxy and advertise it on the local network, as this would save us a lot of unneeded and unwanted traffic.
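
For reference, a minimal sketch of what a router acting as the local time source could run in chrony.conf (the 192.168.0.0/16 range is just an example, adjust to your own network):

# The router syncs itself against a few upstream pool servers.
pool pool.ntp.org iburst
# Let LAN clients query the router; chrony denies all other clients by default.
allow 192.168.0.0/16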

For now, I knock those lazy networks out and drop their requests.

3 Likes

Is this causing real issues, or is this more a matter of principle?

2 Likes

Yes, it’s causing problems, as my entire upload is being saturated several times a day.
This shouldn’t happen.

Within seconds of restarting, thousands of clients request time, just from a few providers.

I find it silly that providers do DNS caching but do not supply NTP services by default, nor configure their routers to advertise NTP on local networks.

Because the DNS answer is cached, all clients that request time get the same time server; it’s not rotated.
This causes something close to a DDoS on my upload.

Keep in mind these ISPs have millions of clients, almost all of them use the ISP’s DNS servers, and when they ask the pool they get the cached answer from the last query that worked.

Meaning if it caches my server, I get hit by loads of them. I can see it spike in my load.

Look at the green spikes, that is NTP alone, and it should not be accessed that heavily.

It often disrupts other services. I should not be hit this hard.

I ran nethogs in -C mode and it shows constant UDP traffic; the only UDP service I run here is NTP.
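
For anyone who wants to reproduce this, the invocation is simply the following (eth0 standing in for your uplink interface, and if I remember the flag correctly):

# -C makes nethogs capture UDP as well as TCP (by default it only counts TCP).
nethogs -C eth0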

See how many it drops. That is a lot.

chronyc> serverstats 
NTP packets received       : 545202
NTP packets dropped        : 37835
Command packets received   : 71
Command packets dropped    : 0
Client log records dropped : 139498
NTS-KE connections accepted: 0
NTS-KE connections dropped : 0
Authenticated NTP packets  : 0
Interleaved NTP packets    : 16
NTP timestamps held        : 67
NTP timestamp span         : 9212

This is after running for less than an hour.

2 Likes

Do you remember I was proposing to decrease the TTL of the DNS answers? That would lead to a smoother load distribution across the NTP servers, at the cost of higher DNS traffic.
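
For anyone wanting to check the current behaviour, it is easy to see with dig; the second column of each answer line is the remaining TTL in seconds, and the answer normally carries four addresses:

dig +noall +answer pool.ntp.org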

3 Likes

But that won’t help much with ISPs that have millions of clients.
It will help for local networks.

I have checked the number of requests over a few seconds and it’s in the thousands.
It may lower the numbers, sure, but will it help against lazy ISPs? I doubt that.

As such I simply reduced the number of replies from the same IP.

They supply everything via DHCP except NTP; they proxy all traffic except NTP.
They even keep mail logs for years.

The routers they supply (forced on you, else no support) do not contain a proper local-network NTP server.

The solution is simply to prevent them from polling too fast and too often.
We cannot fix their laziness by lowering the TTL, as that would stress Ask’s servers, which is not a good option either.

In my opinion a client should not poll more than once every 15 minutes, ideally only once a day.
Polling more often should only be allowed for NTP servers that serve others, like we do.

1 Like

To give you a sample of my problem:

A Belgian company that has loads of computers:

37.230.125.225 370 131 9 6 489 0 0 - -

They made 370 requests and 131 were dropped, because all their machines use the pool.
Yet they have a router in front of their own local network, but they don’t run NTP on it.

So they ‘attack’ my server for time, which is unnecessary, as they have plenty of resources and hardware to make their router/firewall run and advertise an NTP proxy.
They are just too lazy to do it; it’s just one line in the system that serves DHCP.

Any decent router can do this, so they wouldn’t request 370 times but just once an hour or so.

We really need to educate IT people to do this, or the load will just keep going up and people will stop serving time because of the high traffic levels.

IT departments need to take NTP seriously and set up a local proxy; it’s not rocket science these days :rofl:

1 Like

That does not make sense. Sure, the scales in the figure you included are different for uplink and downlink. Still, with NTP being pretty much symmetric, and unless you have doctored the graph, there should be at least the same throughput on the downlink as on the uplink. And that should show up in the graph despite different scales, if, as you claim, the uplink is pretty much NTP only.

I believe the pool is about serving and supporting clients, making it easy for them. While ISPs impact how the pool is used by clients, e.g., by CGNAT being used, that’s a fact of life, not sure what good it is to rail against ISPs. Many of them already run NTP servers, and there are a few big NTP providers out there as well. The challenge is getting clients to use those. And if they consistently would use them, the pool would be obsolete.

So the challenge would be to adapt the pool as best as can be done to the realities. E.g., up to Ask to decide how low the TTL on DNS records could go to balance the load on the infra against effects as you describe (peaks in NTP load on individual servers).

Not sure how much that would help. Only a subset of clients pick up NTP servers from DHCP. Many do not. And even those that do, e.g., Linux and similar systems, pick the DHCP-supplied server(s) in addition to default ones, not instead of/replacing them. So you’d need to educate the user to fiddle with these settings.

And again, if that were done consistently, what would be the purpose of having the pool?

I tend to disagree. As long as a single actual client (vs. the aggregated ones behind a single IP address) isn’t “abusive” as per common best practices, I don’t see why this should be policed. We are serving time to clients who request it, I don’t think we have business judging the needs of clients.

Apart from it not being realistic to expect ISPs to heed your demands, or the majority of users to behave accordingly, I think anyone who has issues contributing to the pool, despite the pool making the best possible provisions to address the issues it can address (and that is subject to continuous discussion as issues arise), maybe shouldn’t be in the pool. It’s a voluntary contribution, subject to the terms laid out on the project’s page, including how clients are expected to behave, which governs what contributing servers need to be willing to accept.

1 Like

Sorry, but in my opinion ISPs should run NTP proxies and should configure their router hardware to hand out that proxy via DHCP.

Yet they do not.

But know this: in Belgium they force their own modems/decoders on you, and typically these do not implement an NTP proxy but just point everything at the pool via their DHCP/DNS settings, so all devices target the pool.

Routers/modems should not do this unless YOU explicitly tell them to do otherwise.
However, you seldom have the option to change it.

Any router hardware can run Chrony/NTPD for 1000+ local clients; it doesn’t take a lot of CPU power.

Yet they do not do this, and instead hand the same NTP server to all clients at the same time.

ISPs are lazy; they could also run their own NTP server and push that, yet they don’t.

Sure, anybody is welcome to rely on the pool, but they just push everybody to the pool, and that is being lazy.

As such I have restricted my responses; lazy ISPs/IT teams will not get responses anymore.

Therefore I argue for changing the rules and restricting responses to abusive IPs, regardless of the source.
Nobody needs more than 1 or 2 updates a day, unless you serve 1000+ clients, but even then once an hour should do it.

If you run a proxy, you could poll every 10 minutes; that is just 1 request, not 1000 from 1 IP.

DHCP has an option to solve this (see the sketch below), but it needs to be backed by an NTP proxy, and often it is not.
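
The DHCP option in question is option 42 (NTP servers). As a sketch, with 192.168.1.1 standing in for the router’s own address, in dnsmasq it would be:

# Hand the router's own address to DHCP clients as their NTP server.
dhcp-option=option:ntp-server,192.168.1.1

and in ISC dhcpd:

option ntp-servers 192.168.1.1;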

2 Likes

Is that the output from chronyc clients? Because if I’m matching the columns up correctly, it says that client’s average request interval is 512 (2^9) seconds, and the last query from it was 489 seconds ago. Not really a very effective “attack”.
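
For reference, lining that row up with the chronyc clients header as I read it:

Hostname         NTP  Drop  Int  IntL  Last  Cmd  Drop  Int  Last
37.230.125.225   370   131    9     6   489    0     0    -     -

That is 370 NTP requests received, 131 dropped, an average request interval of 2^9 = 512 seconds, and the last request 489 seconds ago.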

2 Likes

I think there is some confusion here. The ISP/IT team will not even notice. It is the users you are hurting.

Have you read the part I wrote about most clients not even picking up NTP servers via DHCP?

Also still wondering how it is that there should be almost 12 Mbit/s uplink NTP traffic without any simultaneous downlink traffic whatsoever.

2 Likes

The users won’t be hurt: when their NTP requests don’t get an answer, they try the next DNS entry.
Don’t forget that the pool gives 4 answers.

So the client will select the next one and get an answer.
In a way, we steer them to use the next DNS entry and not only the first.

I know most clients don’t pick up NTP via DHCP; Windows doesn’t, but Linux does.
And most Linux-based clients that use DHCP also use the NTP option, and there are a lot more of those devices than there are Windows clients.

We do not hurt anybody by making the rules stricter; they just use the next DNS entry.

That is what Round-Robin is for.

If we all respond less, they take the next one, and keep it as long as it’s working.

We do not NEED to reply all the time, as they get 4 servers at every request.

It’s a sample of 1 IP making 370 requests with 131 dropped, and this is within a few minutes.
The IP belongs to a BIG company in Belgium; they should know better!!!

That is my point: they should NOT point all their systems at the pool but at their own router running an NTP server.

It’s this company: https://www.fluvius.be

And they install digital electricity meters for millions of people. Seriously? They keep polling me this much? Keep in mind this is just one sample; there are many more.

Lazy IT-teams :sleepy:

1 Like

Just for fun, I set up my internal firewall to redirect outgoing NTP packets to the internal NTP server. I was amused at how many devices, especially IoT, have seemingly hard-coded the standard pool.ntp.org rather than using their own vendor zone or servers. As a matter of fact, commercial routers, especially those based on OpenWRT, should do the same by default, while providing an option to let NTP through, especially because many clients ignore NTP servers from the DHCP server or the users specify their own NTP servers.
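
A redirect like that boils down to a single NAT rule; roughly, in iptables syntax (br-lan standing in for the LAN interface, and the router itself running an NTP server on port 123):

# Redirect all outgoing NTP (UDP port 123) from the LAN to the router's own NTP server.
iptables -t nat -A PREROUTING -i br-lan -p udp --dport 123 -j REDIRECT --to-ports 123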

2 Likes

The problem seems to be self-solving: if an NTP pool server can’t keep up with requests above a certain rate, it will stop replying to requests above that rate (since there is no alternative, by definition). Daemon clients will move on and eventually find a server that can keep up with their request load; eventually they’ll hit one of Cloudflare’s servers, and stop looking. Zero humans will be affected by this (until the zone doesn’t have enough working servers left and collapses under the load). One-off clients will experience a delay or service failure, but that’s the risk a client takes when it doesn’t have a battery-backed RTC in its design.

It seems unfair to limit the request rate by client IP address. It would be better to apply the rate limit to all requests equally, since it is no longer possible to distinguish clients by IP address. But it doesn’t matter how or why you drop requests. If you need to reduce your monthly load by 50%, you could simply turn off your NTP server for 12 hours a day, or run it from the 10th to the 25th of each month.

I’ve done that, but I wouldn’t want to do that to anyone else, and I certainly do not want it done to me.

Manufacturers do not choose router SoCs for their ability to maintain accurate time. You can get better accuracy from a server on the other side of the planet. The client behind the NTP firewall loses diversity, so it can’t even measure how inaccurate its clock is, since every clock it can reach agrees exactly on the time because it’s the same clock.

At scale there may be no choice, as happened with SMTP port filtering, but I don’t think we’re there yet.

That is my point.

They are lazy…and simply point to the pool.

Often they don’t even use the resolvers handed out via DHCP.

Worse, there are ISPs that don’t proxy NTP at all.

Everyone wants time, but they don’t give a shit about traffic or server load.

3 Likes

Not just ISPs but also Amazon’s cloud; I have been seeing a lot of storms from AWS lately. And it seems that once someone spins up a virtual machine there, it will never query for new servers. I took my IPv4 server down for 2 weeks to do OS updates and kept getting hits from one of their Midwest USA zones.

1 Like

I think we all agree that a stronger reliance on local ntp servers would benefit almost everyone.

But I don’t think that this is a problem that can or should be solved by ISPs. If I send an NTP request out to a server, I expect that server to answer me, not some proxy.

In my experience the main problem is that a single broken client is not distinguishable from a bunch of correct clients that happen to share a public IP. And since CGNAT exists and is extensively used, the second case is definitely something to consider when offering a public service like NTP.

I would say this is the larger problem. I did some digging through the client logs on one of my servers, and the vast majority of dropped requests come from just a few IPs in the EC2 ranges. No idea if those are bad clients or just the public IPs of a lot of NATed instances?

1 Like

This is stricter than the defaults and can already trigger when there are just two RFC-conforming NTP clients behind one IP address.
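
For comparison, if I remember the chrony documentation correctly, enabling rate limiting with the defaults is equivalent to:

# Defaults: at most ~1 response per 2^3 = 8 seconds on average per address,
# bursts of up to 8 packets allowed, and roughly 1 in 2^2 = 4 limited
# packets still gets a reply (leak).
ratelimit interval 3 burst 8 leak 2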

They are hurt, as they waste bandwidth and processing time on a server that won’t respond, and clients that use multiple peers lose accuracy. One could even argue that the total amount of NTP traffic will increase, since legitimate clients now have to make a second request.

Totally irrelevant since those are already randomized.

No, that is not what Round-Robin is for. Round-Robin is there to spread load across multiple working servers. It is not intended to use as an excuse to not answer requests.

You can’t just decide not to reply because there are other servers in the pool to pick up the slack. It is literally the one thing expected of an NTP server in the pool: that it answers NTP requests.

I appreciate anyone who adds servers to the pool, and I know that there is no rule regarding rate-limiting settings or service quality, but please consider using at least the default rate limits.
And to repeat a point made above: if the network throughput screenshot you posted was accurate, then NTP is not what is clogging your bandwidth. Your NTP server should never have higher outgoing than incoming traffic.

3 Likes

I’m going out on a limb speculating here, but AWS finally added an internal NTP service (with leap smearing) in 2017. I don’t know how quickly they rolled it out, but it’s likely that many AWS clients using the NTP Pool in 2023 are, by definition, old and clunky, or otherwise unusual, some of them using ntpd with “server 0.amazon.pool.ntp.org” and such.

2 Likes

According to the EC2 documentation, the internal NTP service is preconfigured on Amazon Linux 2, the Amazon Linux AMI and the Amazon Windows AMIs. For other distros (and all old instances) it is up to the server operator to set up time sync, or it will use the distro defaults. So the rollout is still far from finished. And even if you follow the official manual, that just ensures that you add the local NTP service to the config, next to the other servers…
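
For anyone configuring it by hand, the relevant chrony.conf line is just the link-local Amazon Time Sync address, with options along the lines of what the EC2 documentation suggests:

# Amazon Time Sync Service, reachable from any EC2 instance.
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4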

1 Like