Increase in traffic, not just Australian servers

monitoring

#1

I’ve just started seeing a significant increase in traffic to selected pool servers: https://libertysys.com.au/imagebin/NTP/pool-increase-20180630.html (graphs show one week of UDP packet counts; the low traffic on the first one was covered in the thread "Server score took a dive [oceania]").

It seems a little odd that the drop-off on one AU/Oceania zone server almost exactly corresponded to an increase on the other, despite no corresponding score change (see http://www.pool.ntp.org/user/b3sczh4en5vgob2d4uxp). However, there was no corresponding dip on the first UK/Europe server despite an increase on the other.

Each pair of servers is set to the same pool bandwidth (although the UK pair's setting is much higher than the AU pair's).


#2

The DNS rotation system is down (see https://ntppool.statuspage.io/#). The management server also appears to be down (or, to be more exact, the server that hosts its JS and CSS files).


#3

Thanks - that explains it!


#4

It’s causing us enough pain that we’re considering pulling out of one of the badly under-served pools: traffic to our server in the China pool has quadrupled. It seems that many of the clients we just gained do not behave properly when given a rate-limiting KOD :frowning:


#5

…and it was a bit worse than just clients that don’t respect KOD. ntpd then gradually climbed to 100% CPU, which meant that we weren’t always responding, which meant we got more queries, which exacerbated the problem. Throwing more cores at that VPS (which already had four cores) doesn’t help, because ntpd is single-threaded. Instead I’ve reduced the number of cores on that VPS to just one… and added three more single-core VPSs. I’ve then set up equal-cost multipath routing for the IP address of our China NTP Pool node, to spread out the query traffic across four single-core servers.

Currently hitting about 55% CPU usage on each VPS. Traffic levels are gradually reducing again, though still seeing an imbalance of in:out (suggesting some devices are being rate-limited for other reasons?)
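A rough illustration of the equal-cost multipath idea described above: the router hashes fields of each incoming packet and uses the result to pick one of the equal-cost next hops, so a given client keeps landing on the same server while the client population as a whole is spread out. This is only a sketch. The hash function, the choice of fields, the simulated client addresses and the backend names (`vps-1` to `vps-4`) are all assumptions for illustration; real routers use their own hashing.

```python
import hashlib
from collections import Counter


def pick_backend(src_ip: str, src_port: int, backends: list[str]) -> str:
    """Map a flow to one backend by hashing its source address and port."""
    key = f"{src_ip}:{src_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to an index into the list of equal-cost next hops.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical names for the four single-core servers.
backends = ["vps-1", "vps-2", "vps-3", "vps-4"]

# Simulate 40,000 distinct clients, all using source port 123.
counts = Counter(
    pick_backend(f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}", 123, backends)
    for i in range(40_000)
)

for name in backends:
    print(name, counts[name])
```

Because the mapping depends only on the packet's own fields, no shared state is needed between the servers, which suits a stateless request/response protocol like NTP.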


#6

On my servers I see that the pattern of NTP traffic changed yesterday (and for a couple of hours on Wednesday as well), but the average rate hasn’t changed much. The packet rate is more stable; there are no longer cycles switching between a high rate and a low rate. To me it looks like the addresses returned by DNS are now randomized for each query. I’m not sure if it’s intentional or not, but I like it.


#7

Some people could serve more clients by running a multi-threaded Rust implementation: https://github.com/mlichvar/rsntp

ref: Getting beyond 10k qps?


#8

Same DNS issue again, starting about midnight BST?


#9

Yes, I see it on my servers too. Here are graphs from two servers:


The average rate of incoming requests didn’t change much. The rate of dropped packets (due to rate limiting) increased on one, but decreased on the other. I’m wondering what’s going on.

Edit: It turned out the second server didn’t have rate limiting enabled.
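For readers unfamiliar with why rate limiting shows up as dropped packets, here is a minimal per-client token bucket sketch. It is not the algorithm used by ntpd or chrony, and the class name `TokenBucketLimiter` and the `rate` and `burst` values are assumptions chosen for illustration. Each client earns tokens at a steady rate up to a cap; a request is answered only if a token is available, otherwise it is dropped.

```python
from collections import defaultdict


class TokenBucketLimiter:
    """Illustrative per-client token bucket (not ntpd's or chrony's algorithm)."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # tokens earned per second
        self.burst = burst    # maximum tokens a client can hold
        self.tokens = defaultdict(lambda: self.burst)  # new clients start full
        self.last_seen = {}

    def allow(self, client: str, now: float) -> bool:
        """Return True if the request should be answered, False if dropped."""
        elapsed = now - self.last_seen.get(client, now)
        self.last_seen[client] = now
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, self.tokens[client] + elapsed * self.rate)
        if tokens >= 1.0:
            self.tokens[client] = tokens - 1.0
            return True
        self.tokens[client] = tokens
        return False


limiter = TokenBucketLimiter(rate=0.5, burst=4.0)

# A polite client polling every 64 seconds versus one sending 10 packets/s.
polite = sum(limiter.allow("polite", t * 64.0) for t in range(10))
chatty = sum(limiter.allow("chatty", t * 0.1) for t in range(100))

print("polite answered:", polite, "of 10")
print("chatty answered:", chatty, "of 100")
```

With limiting disabled every request is answered, so a server in that state reports no drops regardless of how clients behave.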


#10

Seems like this has started again, even though https://ntppool.statuspage.io/ says everything is fine. My UK pool servers:


#11

What changed on Jul 10th?
[graph: jp_pool]