Disable monitoring emails for a single server

Just as a reference: with a 3 Mbit/s setting for IPv4, I get peaks of up to 6 Mbit/s (sometimes a bit more) on one of my two instances in Singapore (including less than 250 kbit/s of IPv6 traffic at the 3 Gbit/s setting).

On another instance, with the minimum setting of 512 kbit/s, I get peaks above 1 Mbit/s. That exceeds the contracted bandwidth, which is why IPv4 is disabled on that instance and only IPv6 is enabled, at the 3 Gbit/s setting.

Looking at https://www.ntppool.org/scores/5.223.49.159/log?limit=20000&monitor=recentmedian, one can see that your server has passed a score of 10 several times. I haven’t thoroughly checked every instance, but it looks as if the score always dropped almost immediately after the threshold of 10 was crossed. That strongly suggests that the issue is related to traffic load.

Unlike in other cases, however, the drop in score is far more pronounced, going well into negative territory (as is nicely visible in the graph as well). That strongly suggests that this is not just some overload that goes away once the load eases a bit, which would cause a kind of saw-tooth pattern with a relatively small amplitude. Rather, something is seriously blocking traffic in response to this overload, and probably for longer than the actual overload condition itself persists*.

I.e., my suspicion would also be that there is some “protection” mechanism in place somewhere that kicks in upon the sudden onslaught of traffic when the score threshold of 10 is crossed*.

So I concur with @avij that the way to troubleshoot this systematically is to start with the lowest “netspeed” setting available and take it from there. I.e., see what happens, slowly increase the setting until the characteristic pattern starts again, and then investigate what is going on around that threshold.

And just to re-emphasize what @avij already stated as well: the mere bandwidth of the link is no indicator of how well the system can handle loads of NTP traffic. NTP traffic is not volume carried in relatively few but large packets; it consists of many very small packets from a very large number of different sources.

High rates of small packets are typically harder for network equipment to handle than the same volume in larger packets, because more lookups and more processing are needed for the same amount of data transferred. Where a single 1500-octet packet, as is common for, e.g., video streaming, needs only one forwarding lookup and one pass up and down the protocol stack, the system needs to do almost 20 lookups when carrying the same amount of data in small NTP packets.
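To make the "almost 20" figure a bit more tangible, here is a rough back-of-the-envelope sketch. It assumes a plain 48-byte NTP packet (no extension fields or MAC) over UDP and IPv4 without options, and it ignores link-layer framing; the function names are made up for illustration, and the 6 Mbit/s value is just the peak mentioned above.

```python
# Rough packet arithmetic for NTP traffic (assumptions: plain NTP packet,
# UDP over IPv4 without options, link-layer overhead ignored).
NTP_PAYLOAD = 48   # bytes, NTP packet without extension fields or MAC
UDP_HEADER = 8     # bytes
IPV4_HEADER = 20   # bytes, no IP options


def ntp_packet_size() -> int:
    """Size of one NTP packet at the IP layer, in bytes."""
    return NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER


def packets_for(data_bytes: int, packet_size: int) -> float:
    """How many packets of packet_size carry data_bytes of IP-layer data."""
    return data_bytes / packet_size


def packets_per_second(bits_per_second: float, packet_size: int) -> float:
    """Packet rate that corresponds to a given bit rate at a given packet size."""
    return bits_per_second / (packet_size * 8)


if __name__ == "__main__":
    size = ntp_packet_size()
    print(size)                                           # 76
    print(round(packets_for(1500, size), 1))              # 19.7
    print(round(packets_per_second(6_000_000, size)))     # 9868
```

So under these assumptions, one 1500-octet packet's worth of data corresponds to roughly 19.7 NTP packets, and a 6 Mbit/s peak corresponds to somewhere around 10,000 packets per second, each needing its own lookup.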

On top of that, the connection tracking of stateful firewalls/NAT/port forwarding mechanisms can very easily get overloaded by the sheer number of traffic sources a server sees when it is included in the pool.
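As a rough illustration of why connection tracking is so sensitive to the number of sources: the number of entries a stateful device holds at any moment is approximately the rate of new sources times the time each entry is kept (Little's law). The sketch below uses purely hypothetical numbers for the source rate, the entry timeout, and the table size; none of them are measurements from this case, and the function names are invented for the example.

```python
# Estimate of connection-tracking table occupancy via Little's law:
# concurrent entries ~= arrival rate of new flows * time each entry is held.
# All numbers in the example below are hypothetical.

def conntrack_entries(new_sources_per_second: float, timeout_seconds: float) -> float:
    """Approximate steady-state number of tracked entries."""
    return new_sources_per_second * timeout_seconds


def fits_in_table(entries: float, table_size: int) -> bool:
    """True if the estimated number of entries fits into the tracking table."""
    return entries <= table_size


if __name__ == "__main__":
    rate = 5_000        # hypothetical: new client addresses per second
    timeout = 30        # hypothetical: seconds an idle UDP entry is kept
    table = 65_536      # hypothetical: maximum number of tracked entries

    needed = conntrack_entries(rate, timeout)
    print(needed)                         # 150000
    print(fits_in_table(needed, table))   # False
```

With numbers like these, the table would fill up long before the link bandwidth becomes an issue, which is the kind of effect to look for around the "netspeed" threshold where the pattern starts.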

Note that ISPs may have different traffic limits in different regions, even if everything else regarding the setup appears to be the same. For example, I have the impression that bandwidth is more expensive in Asia than in Europe. I only found out by re-reading the fine print for the first instance mentioned above that, for the Singapore site and another site of that hoster in the wider region, there is a traffic volume limit in place that does not exist in the European and North American data centers. That is another reason why the instance is only at the 3 Mbit/s setting, even though the 1 core/1 GB memory instance could handle loads of 20 Mbit/s of actual traffic and more.

* Note in this context that due to past exploits of NTP for traffic reflection and amplification attacks, many “protection” systems (and admins) even nowadays are very sensitive to NTP traffic, even when current NTP implementations are no longer susceptible to that kind of attack. E.g., that is one reason why my IPv4 instance is limited to the 3Mbit/s setting only, as the next higher value would drive the traffic volume into regions where my ISP’s protection mechanisms occasionally trigger traffic blocking for 15 minutes because of such a suspected reflection/amplification attack (even though the traffic itself clearly does not fit the pattern of such an attack, e.g., no amplification whatsoever).
