We’ve been running a pair of NTP servers for the pool for several years now. One of them is under high load (2 Gb setting, ~6 Mbps average); the second is lightly loaded (and is basically supposed to be switched in should #1 fail).
These boxes have been up 24/7 for several years now, in a fully redundant data center with pretty good connectivity, and with standard operational monitoring.
So imagine my surprise when Ask’s robot emailed me this morning telling me that Server #1 has been removed from the pool because of a low score. First I assumed the box had crashed, but it hadn’t. It was up (> 1000 days uptime right now), and ntpd was up and serving. Synchronization was OK too: it was synchronized to a DCF77 Stratum1 box ~200 km north, and had a GPS-based Stratum1 ~200 km southeast as a candidate. Both of those Stratum1s are reliable Meinberg boxes.
And it was claimed to have a score of -13.3. Checking on the other one, it was also scored pretty badly, at plus 11.something (and has since fallen to 4.8).
Being completely out of ideas, I added another external Stratum1 source and restarted ntpd on both boxes. And while box 1 is now very slowly creeping back to the 0 line, box 2 has since fallen way below the acceptability threshold…
Another thing I see is that they are monitored from the US West Coast. Both of these boxes are located in Central Europe (Frankfurt, Germany). Could it be that we are seeing connectivity problems towards the US West Coast rather than problems with my boxes?
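To tell a server problem apart from a path problem, the boxes could be probed from hosts in different locations and the reply rates compared. Below is a minimal sketch of such a probe, using only the Python standard library. The function names, the timeout and the returned fields are my own choices for illustration, and this is not a description of how the pool monitor actually works.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request(version=4):
    """Build a 48-byte client-mode (mode 3) NTP request with all other fields zero."""
    first_byte = (0 << 6) | (version << 3) | 3  # LI=0, VN=version, Mode=3
    return bytes([first_byte]) + bytes(47)


def parse_response(packet):
    """Extract leap indicator, version, mode, stratum and transmit time from a reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    first_byte, stratum = packet[0], packet[1]
    # The transmit timestamp occupies bytes 40..47: 32-bit seconds, 32-bit fraction.
    tx_seconds, tx_fraction = struct.unpack("!II", packet[40:48])
    return {
        "leap": first_byte >> 6,          # 3 means the server reports itself unsynchronized
        "version": (first_byte >> 3) & 0x7,
        "mode": first_byte & 0x7,         # 4 is a server reply
        "stratum": stratum,
        "transmit_unix": tx_seconds - NTP_EPOCH_OFFSET + tx_fraction / 2**32,
    }


def query(host, timeout=2.0):
    """Send one request to host:123; return the parsed reply with RTT, or None on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_request(), (host, 123))
            packet, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name-resolution errors
            return None
        reply = parse_response(packet)
        reply["rtt_ms"] = (time.monotonic() - start) * 1000
        return reply
```

Calling `query()` a few dozen times against each box from a host in Europe and from one in the US, and counting how many calls return `None`, would show whether replies are lost only on the long path. A reply with `leap` equal to 3 would instead point at the server itself.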
Any ideas? What can I do?