Just some quick feedback: Looking at the traffic monitor you kindly shared, the current limiting factor does not seem to be that the system can't handle the load. Rather, it looks as if someone (the IDC operator?) is explicitly dropping packets. I suspect, again, some kind of “protection” mechanism kicking in.
Note how in the 5-minute graph, after the traffic peaks and then suddenly stops, it drops to almost zero. After a while it goes back up again to near 1 Mbit/s. This is also nicely visible in the 5-minute table, except that one has too short a history, and at this very moment the near-zero reading (actually ~36 kbit/s upstream and downstream combined) is about to move out of the table's scope.
If it were just overload in some component, e.g. the network interface card or a connection-tracking entity no longer able to cope with the bit rate or packet rate, the drop would not be so sudden: the traffic would stay at a roughly steady, high level for a while before starting to fall (once the load decreases because the server has been taken out of the pool). And it wouldn't drop to near zero and stay there for some time. Note the jump back up to almost 1 Mbit/s. At that time the server is not in the pool, so it wouldn't get any new traffic; the jump is the residual NTP traffic reaching the VM again once the explicit block is lifted. It is as if a switch were flipped off, stopping NTP traffic, and some time later flipped back on, allowing traffic again.
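One way to confirm this interpretation would be to count inbound NTP packets directly on the VM and compare that against the monitor graph: if the counts also collapse to near zero during the dip, the packets are being dropped upstream of the VM rather than the server failing to answer them. Here is a minimal sketch of what I mean, assuming Python with the third-party scapy package (needs root for sniffing); the interface name and the one-hour capture window are just placeholders:

```python
#!/usr/bin/env python3
"""Count inbound NTP packets per minute on the VM itself.

If these counts drop to ~zero in the same window where the IDC's
traffic monitor shows the dip, the packets never reach the VM,
i.e. they are dropped upstream, not by the server.
"""
import time
from collections import Counter
from scapy.all import sniff, IP, UDP  # third-party: pip install scapy

counts = Counter()

def tally(pkt):
    # Defensive check; the BPF filter below should already match only
    # UDP packets destined for port 123 (NTP).
    if IP in pkt and UDP in pkt and pkt[UDP].dport == 123:
        counts[int(time.time()) // 60] += 1  # bucket by minute

# "eth0" is a placeholder; adjust to the VM's actual interface.
sniff(iface="eth0", filter="udp dst port 123",
      prn=tally, store=False, timeout=3600)

for minute in sorted(counts):
    print(time.strftime("%H:%M", time.localtime(minute * 60)), counts[minute])
```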
So, if you're interested in pursuing this, you could try to find out what is causing this block, e.g. whether the IDC has some protection mechanism in place, and whether it can perhaps be disabled or its trigger threshold increased.
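You could also watch the on/off pattern from the outside, independently of the pool monitor, by probing the server with plain NTP queries from another host and logging when the responses stop and resume; the timestamps of the transitions might help when talking to the IDC. A minimal sketch, assuming Python with the third-party ntplib package; the server name is of course a placeholder:

```python
#!/usr/bin/env python3
"""Query an NTP server once a minute from an outside host and log
whether it answers, to capture when the suspected block starts
and when it is lifted."""
import time
import ntplib  # third-party: pip install ntplib

SERVER = "ntp.example.org"  # placeholder: your server's address
client = ntplib.NTPClient()

while True:
    try:
        resp = client.request(SERVER, version=3, timeout=5)
        status = f"OK   offset={resp.offset:+.4f}s"
    except ntplib.NTPException:
        status = "NO RESPONSE"
    except OSError as exc:
        status = f"ERROR {exc}"
    print(time.strftime("%Y-%m-%d %H:%M:%S"), status, flush=True)
    time.sleep(60)
```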
But there is obviously always the risk that even if this particular issue can be overcome, the next one comes up (e.g., actual capacity limits). So it may take a few more rounds of investigation before your NTP server works as part of the pool, and an issue, even once identified, may turn out to be impossible to overcome.
EDIT: For documentation purposes, I took the liberty of taking a screenshot of the graph referred to above, to preserve the pattern described (the links above point to graphs that are updated continuously). Note that the very last peak tapers off more slowly (continuing beyond the time window shown in the screenshot) because the server was manually taken out of the pool at about that time.